You’re nearing the end of the book, so you should have a pretty good handle on what exactly robots, spiders, and crawlers are, right? No doubt you do, but did you know that there is much more to these Internet creatures than just the fact that they crawl from one web site to another? Spiders, robots, crawlers, or whatever else you choose to call them can determine how well you rank in search engines, so it’s best to make friends with them as quickly as possible. Certain strategies will help you find favor with these crawlers (which is the name we’ll use to lump them all together), and others, unfortunately, will help you find your way right out of search engine rankings.
You should already have a general understanding that a robot, spider, or crawler is a piece of software programmed to "crawl" from one web page to another by following the links on those pages. As the crawler makes its way around the Internet, it collects content (such as text and links) from web sites and saves it in a database, where it is indexed and ranked according to the search engine's algorithm. When a crawler is first released on the Web, it is usually seeded with a few web sites, and it begins on one of them. The first thing it does on that first site is take note of the links on the page. Then it "reads" the text and begins to follow the links it collected. The list of links the crawler has discovered but not yet visited is called the crawl frontier; it is the territory the crawler explores in a very systematic way. The links in the crawl frontier sometimes lead the crawler to other pages on the same web site and sometimes take it away from the site completely. The crawler follows links until it hits a dead end, then backtracks and picks up the next unvisited link, repeating the process until every link on a page has been followed.
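To make the crawl frontier and the "follow until a dead end, then backtrack" behavior concrete, here is a minimal sketch in Python. It is an illustration under stated assumptions, not how any particular search engine works: the "Web" is a small hypothetical in-memory dictionary (the example.com-style URLs are made up), pages are never fetched over the network, and the `crawl` function and `WEB` data are invented names. The frontier is kept as a stack, which produces the depth-first, backtracking order described above.

```python
# Hypothetical toy "Web": each URL maps to (page text, outgoing links).
# A real crawler would fetch these pages over HTTP instead.
WEB = {
    "http://a.example/": ("Home page", ["http://a.example/about", "http://b.example/"]),
    "http://a.example/about": ("About us", ["http://a.example/"]),
    "http://b.example/": ("Another site", ["http://b.example/dead-end"]),
    "http://b.example/dead-end": ("No links here", []),
}


def crawl(seeds, web):
    """Depth-first crawl from the seed URLs (hypothetical helper).

    Returns a dict mapping each visited URL to the text collected from it,
    in the order the pages were visited.
    """
    frontier = list(reversed(seeds))  # stack of discovered-but-unvisited links
    visited = {}  # stands in for the search engine's database
    while frontier:
        url = frontier.pop()  # most recently discovered link first (depth-first)
        if url in visited or url not in web:
            continue  # already crawled, or a broken link: backtrack
        text, links = web[url]
        visited[url] = text  # "read" the page and store its content
        # Push in reverse so the first link on the page is followed first.
        for link in reversed(links):
            if link not in visited:
                frontier.append(link)
    return visited


for url, text in crawl(["http://a.example/"], WEB).items():
    print(url, "->", text)
```

Starting from the single seed, the sketch visits the home page, follows its first link to the About page, finds only a link back home (already visited), backtracks to the second link on the home page, and continues on to the other site until it reaches the page with no links. Many real crawlers order their frontier differently, for example as a first-in, first-out queue (breadth-first) or by priority, but the idea of a frontier of collected, not-yet-visited links is the same.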