Last updated on March 6, 2023 by Antti Koskenrouta

I originally wrote this blog post for the Public Safety and Homeland Security Bureau’s internal blog. Its purpose was to explain the very basics of how search engines, such as Google, actually work. It was published on March 29, 2010.

Using search engines has become an integral part of Internet surfing, and Google dominates the market. One sign of this is that “to Google” has become a synonym for doing a web search. Have you ever heard anyone say “Yahoo that”?

But have you ever thought about why Google shows the results it does for your search terms? To understand this, first we need to look into what is happening behind the scenes of Google search.

At any given time, Google’s robots, or ‘bots’ for short, tirelessly crawl the Internet to find new pages and detect updates to pages they already know about. A bot analyzes each page (its body text, headings, images, everything) and stores the information on Google’s own computers. This is called indexing. Think of indexing as the overzealous librarian who goes from shelf to shelf, reads every book, and remembers exactly where each one can be found in her library.
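To make the librarian analogy a little more concrete, here is a minimal sketch in Python of one common way to build such notes: an inverted index, which maps each word to the pages that contain it. This is only an illustration under simplifying assumptions, not a description of how Google stores its data. The function names (build_index, search) and the example.com pages are made up for this example.

```python
import re
from collections import defaultdict


def build_index(pages):
    """Map each word to the set of page URLs whose text contains it."""
    index = defaultdict(set)
    for url, text in pages.items():
        # Lowercase the text and split it into simple alphanumeric words.
        for word in re.findall(r"[a-z0-9]+", text.lower()):
            index[word].add(url)
    return index


def search(index, query):
    """Return the URLs that contain every word of the query, sorted by URL."""
    words = re.findall(r"[a-z0-9]+", query.lower())
    if not words:
        return []
    results = set(index.get(words[0], set()))
    for word in words[1:]:
        results &= index.get(word, set())
    return sorted(results)


# Hypothetical pages that a bot might have crawled.
pages = {
    "example.com/cats": "Cats are small furry pets",
    "example.com/dogs": "Dogs are loyal pets",
}
index = build_index(pages)
```

With this toy index, search(index, "furry pets") finds only the cats page, while search(index, "pets") finds both. Note that the sketch only finds matching pages; it says nothing yet about which one should be shown first.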

So now Google has made meticulous notes about the pages on the Internet, but that still does not explain which pages Google shows for your search. Enter the Algorithm. We’ll use a capital A, because it is the driving force of Google’s search.

A dictionary definition of algorithm is:
a precise rule (or set of rules) specifying how to solve some problem

http://wordnetweb.princeton.edu/perl/webwn?s=algorithm

In Google’s case, the Algorithm is a complex mathematical formula whose ‘problem’ is to find the pages in its index that are most relevant to the search terms. It is constantly evolving, and it is estimated to have hundreds of variables that determine the result. The Algorithm’s specifics are a secret comparable to Coca-Cola’s recipe or Fort Knox’s door codes.

However, through reverse engineering, people in the industry have been able to work out many of the variables. They largely agree that one of the biggest factors, if not the most important one, is the links pointing to a page. This is also what separated Google from other search engines in the beginning: the founders of Google, Sergey Brin and Larry Page, got the idea from the academic world, where the most-cited articles are usually considered the best on a given topic.

Here is an overly simplified example of how it works in Google: if there were two pages on topic A, and Page 1 had three links pointing to it and Page 2 only one, Google would conclude that Page 1 must be better because more people link to it, and would show it above Page 2 in the search results. In reality it is not this simple because, according to Google, not all inbound links are created equal. The Algorithm evaluates the quality of each link based on a set of characteristics. These include how closely related the topics of the two pages are and how popular the linking page itself is (measured, in turn, by how many links point to that page and the quality of those links).
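The idea that a link counts for more when the linking page is itself popular can be sketched in a few lines of Python. The code below is a simplified version of the PageRank calculation that Brin and Page described publicly; it is not Google’s actual Algorithm, and it ignores topic relevancy and every other factor. The function name rank_pages, the damping value of 0.85 and the page names are assumptions chosen for this illustration.

```python
def rank_pages(links, damping=0.85, iterations=50):
    """Score pages by their inbound links.

    links maps each page to the list of pages it links to.
    Returns a dict of page -> score; the scores add up to 1.
    """
    # Collect every page that links somewhere or is linked to.
    pages = set(links)
    for targets in links.values():
        pages.update(targets)
    n = len(pages)

    # Start with every page equally popular.
    score = {page: 1.0 / n for page in pages}

    for _ in range(iterations):
        # Every page gets a small baseline score...
        new_score = {page: (1 - damping) / n for page in pages}
        for page in pages:
            targets = links.get(page, [])
            if targets:
                # ...plus an equal share of the score of each page linking to it.
                share = damping * score[page] / len(targets)
                for target in targets:
                    new_score[target] += share
            else:
                # A page with no outbound links spreads its score over all pages.
                share = damping * score[page] / n
                for other in pages:
                    new_score[other] += share
        score = new_score
    return score


# The example from the text: three pages link to Page 1, one links to Page 2.
links = {
    "A": ["Page 1"],
    "B": ["Page 1"],
    "C": ["Page 1"],
    "D": ["Page 2"],
}
scores = rank_pages(links)
```

In this toy web, Page 1 ends up with a higher score than Page 2, matching the example above. Because a page’s score feeds into the pages it links to, one link from a popular page can outweigh several links from obscure ones.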

In addition to these external factors, Google also evaluates so-called on-page factors. For best results, the key is to make the page concise, on-topic and well-structured, and to code it in a logical way so that the bots can easily crawl and understand the content.

As with so many everyday things, we often pay very little mind to how search engines work. Now that we’ve covered the basics, the next step would naturally be to cover how a webmaster, or anybody who makes websites, can increase the visibility of their pages on Google. That is called Search Engine Optimization, or SEO, but that will be the topic for a later post.


Glossary
Indexing = The action of Google’s robots skimming through millions of web pages and making note of their content
SEO = Search Engine Optimization, the actions taken to make a page rank better in Google
Algorithm = Google’s very secret, complex mathematical formula, which determines what pages should be returned as search results based on the searched keywords
