Speedy Spider
If you've come to this page, chances are your site has been visited by our web spider, Speedy Spider (called Speedy for short below). This page aims to answer any questions you may have about this visitor to your site.
- What is the purpose of Speedy?
- How do I prevent my site (or parts thereof) from being crawled?
- Can I set the minimum delay between requests to my site?
- From which IP addresses/hostnames does Speedy operate?
- I have multiple hosts running on the same IP. How can I prevent too many requests from being made to this IP across the various hosts?
- Will Speedy download dynamic pages?
- Does Speedy support HTTP compression?
- My question is not answered here. / How can I get in touch with Entireweb concerning Speedy?
What is the purpose of Speedy?
Speedy is an automated web crawler used to build the search engine index at Entireweb.
How do I prevent my site (or parts thereof) from being crawled?
Speedy obeys the Robots Exclusion Standard. To prevent a particular file or directory
of your site (or the entire site) from being crawled by Speedy, place a file named robots.txt
in the root directory so that it may be accessed as http://www.mysite.com/robots.txt.
In this file you may specify disallowed resources in the following format:
User-agent: Speedy
Disallow: /secret
Disallow: /search?
Here we have prevented access to every URL whose path begins with /secret (which includes all files located in the directory /secret) and to the dynamic module /search?.
To apply these rules to all spiders (not just Speedy), replace Speedy with an asterisk (*).
Please see robotstxt.org for more information on robots.txt files.
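If you would like to check how a set of rules is likely to be interpreted before publishing it, a short script can help. The following is a minimal sketch using Python's standard urllib.robotparser module. It is not Speedy's own parser, and the helper name is_allowed and the test paths are illustrative assumptions.

```python
from urllib.robotparser import RobotFileParser

# The example rules from this page, as they would appear in robots.txt.
ROBOTS_TXT = """\
User-agent: Speedy
Disallow: /secret
Disallow: /search?
"""


def is_allowed(user_agent: str, url: str, robots_txt: str = ROBOTS_TXT) -> bool:
    """Return True if the given rules permit user_agent to fetch url."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.can_fetch(user_agent, url)


if __name__ == "__main__":
    # Hypothetical paths on the example site used above.
    for path in ("/index.html", "/secret/plans.html"):
        url = "http://www.mysite.com" + path
        verdict = "allowed" if is_allowed("Speedy", url) else "disallowed"
        print(path, verdict)
```

Because the rules name only Speedy, a robot with a different user agent is not restricted by them in this sketch.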
If you're unable to use the robots.txt method, you may also disallow pages by inserting appropriate meta tags in
the page's HTML code. To prevent Speedy from indexing a certain page, add the following line to the head section:
<meta name="speedy" content="noindex">
Speedy discovers new pages by following the links it finds on the web. If you do not wish Speedy
to process the links it finds on your pages, add the line
<meta name="speedy" content="nofollow">
Both these restrictions can be written as the single line
<meta name="speedy" content="noindex,nofollow">
To prevent all bots (i.e. not just Speedy) from indexing/following links, simply change "speedy" to "all".
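To confirm that a page really carries the meta tags you intended, you can extract them with a few lines of code. The sketch below uses Python's standard html.parser module and collects the directives found in meta tags named "speedy" or "all". The class and function names are hypothetical, and this is only a verification aid, not a description of how Speedy itself reads pages.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collect directives from <meta name="speedy"> and <meta name="all"> tags."""

    def __init__(self, names=("speedy", "all")):
        super().__init__()
        self.names = names
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        # Attribute values may be None when an attribute has no value.
        attributes = {key: (value or "") for key, value in attrs}
        if attributes.get("name", "").lower() in self.names:
            for item in attributes.get("content", "").split(","):
                if item.strip():
                    self.directives.add(item.strip().lower())


def meta_directives(html: str) -> set:
    """Return the set of robot directives found in the given HTML text."""
    parser = RobotsMetaParser()
    parser.feed(html)
    return parser.directives


if __name__ == "__main__":
    page = '<html><head><meta name="speedy" content="noindex,nofollow"></head></html>'
    print(sorted(meta_directives(page)))
```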
Can I set the minimum delay between requests to my site?
Yes! Speedy supports the Crawl-Delay robot directive. The following robots.txt lines will
limit Speedy to downloading one page every five seconds (of course you may replace 5 with any delay you like):
User-agent: Speedy
Crawl-Delay: 5
Note that Speedy will automatically adjust its request rate to the speed of your site, so this directive should normally not be needed.
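As a quick sanity check, the delay in these lines can be read back with Python's standard urllib.robotparser module (its crawl_delay method is available in Python 3.6 and later). This is a minimal sketch with a hypothetical helper name, and it shows only how a generic parser reads the directive, not Speedy's internal scheduling.

```python
from urllib.robotparser import RobotFileParser

# The example Crawl-Delay rules from this page.
ROBOTS_TXT = """\
User-agent: Speedy
Crawl-Delay: 5
"""


def crawl_delay_for(user_agent: str, robots_txt: str = ROBOTS_TXT):
    """Return the crawl delay in seconds for user_agent, or None if unset."""
    parser = RobotFileParser()
    parser.parse(robots_txt.splitlines())
    return parser.crawl_delay(user_agent)


if __name__ == "__main__":
    print("Speedy:", crawl_delay_for("Speedy"))
    print("OtherBot:", crawl_delay_for("OtherBot"))
```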
From which IP addresses/hostnames does Speedy operate?
Entireweb's crawlers primarily operate from the IP range 88.131.106.0/26. These IP addresses map to hostnames of the form cNN.entireweb.com, where NN is a hexadecimal number.
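If you want to check entries in your access log against this range, the following sketch uses Python's standard ipaddress and re modules. The function names are hypothetical, and the hostname pattern assumes that NN means one or more hexadecimal digits in either case. Since the crawlers only primarily operate from this range, a negative result does not prove that a request did not come from Speedy.

```python
import ipaddress
import re

# The IP range stated on this page (covers 88.131.106.0 to 88.131.106.63).
SPEEDY_NETWORK = ipaddress.ip_network("88.131.106.0/26")

# Assumption: NN is one or more hexadecimal digits, upper or lower case.
HOSTNAME_PATTERN = re.compile(r"c[0-9a-f]+\.entireweb\.com", re.IGNORECASE)


def in_speedy_range(ip: str) -> bool:
    """Return True if ip lies inside the published Speedy range."""
    return ipaddress.ip_address(ip) in SPEEDY_NETWORK


def looks_like_speedy_host(hostname: str) -> bool:
    """Return True if hostname has the form cNN.entireweb.com."""
    # Reverse DNS results sometimes carry a trailing dot, so drop it first.
    return HOSTNAME_PATTERN.fullmatch(hostname.rstrip(".")) is not None


if __name__ == "__main__":
    for ip in ("88.131.106.10", "88.131.106.64"):
        print(ip, in_speedy_range(ip))
```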
I have multiple hosts running on the same IP. How can I prevent too many requests from being made to this IP across the various hosts?
No need! Speedy automatically recognizes virtual hosts on the same IP and limits the number of requests made to that particular IP.
Will Speedy download dynamic pages?
Yes, but since dynamic pages are often database-driven, we limit the number of dynamic pages we download from a particular location during a crawl.
Does Speedy support HTTP compression?
Yes! Speedy always downloads pages using gzip or deflate compression if the web server supports this. Using HTTP compression, we're able to reduce the load we put on web servers even more. We advise you to enable HTTP compression in your web server if you have not done so already, as that will enable you to serve pages to all of your visitors a lot faster.
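For readers curious about what this involves on the client side: a client that requests gzip or deflate must decode the response body according to the Content-Encoding header it receives. The sketch below illustrates that step with Python's standard gzip and zlib modules. The function name decode_body is hypothetical and this is not Speedy's code. The demo at the end simply prints the size of a sample page before and after gzip compression.

```python
import gzip
import zlib


def decode_body(raw: bytes, content_encoding: str) -> bytes:
    """Decode a response body according to its Content-Encoding value."""
    encoding = (content_encoding or "").strip().lower()
    if encoding == "gzip":
        return gzip.decompress(raw)
    if encoding == "deflate":
        try:
            # "deflate" is normally a zlib-wrapped stream.
            return zlib.decompress(raw)
        except zlib.error:
            # Some servers send a raw deflate stream without the zlib header.
            return zlib.decompress(raw, -zlib.MAX_WBITS)
    # No compression (or an encoding this sketch does not handle).
    return raw


if __name__ == "__main__":
    # A made-up, repetitive page; real pages will compress differently.
    page = b"<html><body>" + b"<p>Hello, visitor!</p>" * 200 + b"</body></html>"
    compressed = gzip.compress(page)
    print("original bytes:", len(page))
    print("gzip bytes:", len(compressed))
    print("round trip ok:", decode_body(compressed, "gzip") == page)
```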
My question is not answered here. / How can I get in touch with Entireweb concerning Speedy?
Speedy is currently in a Beta stage, but we believe it to be a very polished and well-behaved Web robot.
However, in the unlikely event that you experience any problems with it, we would love to hear from you so that we may eliminate any bugs in our systems.
You're also welcome to contact us if you have general questions about Speedy Spider or Entireweb.
You can reach our technical crew directly here.
