
Search engines rely on special software (robots) that constantly scans content on the Internet. One small but very important correction is needed here: robots read mainly text, i.e. web pages in formats such as html, htm, shtml and xml. Other files (archives, graphics, music, video) are generally left untouched by the robots. The word "robot" is often used interchangeably with "search engine", although that is not accurate. In simplified form, a search engine can be described as a set of interrelated elements, which necessarily includes:

1. Crawler
2. Database
3. User interface (web site)

So as not to confuse readers, I have intentionally left out of this list such items as the query processor and the various additional services that each search engine has.

Why robots?

The Internet is a vast network that contains a wealth of information of every possible kind, but you need to be able to navigate it, i.e. to find the right data at the right time. This is exactly what search engines are for. For a search engine to know what is located at which address on the Internet, it first needs to look through all the sites and place their content in its own database.
This is exactly what the search robot does. Then, when a query arrives, the search engine searches its own database and returns the results to the user.

It might seem that there is a lot of fuss over a program visiting a site once and reading it. But robots do not browse sites just once or twice; they do it constantly, because information on the network keeps changing: new sites appear, some stop working, and some pages are changed. The search engine's database therefore has to keep recording all the changes that occur on the network. Otherwise, within a month the results returned in response to queries would be out of date and therefore unsatisfactory.

The more powerful the computer on which the robot software runs, the more pages it can view per unit of time (e.g. per hour or per day). This viewing of pages is called indexing. When the robot has gone through all the pages of a site, the site is said to be indexed.

But the Internet has a huge number of web pages, so how does the robot manage to get around them all?

Robots are configured to visit different sites at different intervals. If a site is updated very often, the robot visits it once a day or more. If, however, the robot visits the same site again and again and finds no changes or additions, the frequency of its visits to that site is gradually reduced. As a result, such a site may be indexed only once a month or even less often.

How does the robot find its way around the network?

It moves from site to site by following links. When the robot checks a site for updates again, it reads all the links on it; some of them are already known to it (i.e. the addresses of those sites are already in its database), and some it sees for the first time. In the latter case the robot either follows the new link immediately or adds it to its "job" list and returns to it after a while.
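The two ideas described above, following links through a "job" list and visiting unchanged sites less often, can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not how any real search engine works: the names crawl, next_interval and FAKE_WEB are invented for the example, the "web" is a small in-memory dictionary instead of real HTTP requests, and the rule of halving or doubling the interval between 1 and 30 days is an arbitrary choice.

```python
from collections import deque
from html.parser import HTMLParser


class LinkParser(HTMLParser):
    """Collects the href value of every <a> tag in a page."""

    def __init__(self):
        super().__init__()
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag == "a":
            for name, value in attrs:
                if name == "href" and value:
                    self.links.append(value)


def extract_links(html):
    """Return all link targets found in the given HTML text."""
    parser = LinkParser()
    parser.feed(html)
    return parser.links


def crawl(start_url, fetch):
    """Follow links breadth-first; fetch(url) returns HTML text or None."""
    database = {}               # url -> page text (the engine's own database)
    queue = deque([start_url])  # the robot's "job" list
    known = {start_url}         # addresses the robot has already seen
    while queue:
        url = queue.popleft()
        html = fetch(url)
        if html is None:        # page does not exist or is unreachable
            continue
        database[url] = html
        for link in extract_links(html):
            if link not in known:   # a link seen for the first time
                known.add(link)
                queue.append(link)  # come back to it later
    return database


def next_interval(days, changed, shortest=1, longest=30):
    """Assumed rule: halve the revisit interval after a change, else double it."""
    if changed:
        return max(shortest, days // 2)
    return min(longest, days * 2)


# Hypothetical three-page "web" used instead of real network access.
FAKE_WEB = {
    "a.html": '<a href="b.html">B</a> <a href="c.html">C</a>',
    "b.html": '<a href="a.html">A</a>',
    "c.html": "no links here",
}

if __name__ == "__main__":
    print(sorted(crawl("a.html", FAKE_WEB.get)))
    print(next_interval(4, changed=False))
```

Starting from a.html, the sketch reaches all three pages even though b.html links back to a.html, because addresses that are already known are not queued again. A real robot would also have to resolve relative addresses, respect each site's crawling rules and store when each page was last visited.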