Search engines are, undoubtedly, incredibly complex beasts. But if we break down their process into three key steps, it can be easier to understand what's going on and how that affects marketing practices like SEO.
Sam is an experienced digital marketing consultant with a specialism in search engine optimisation (SEO). He's led, created and managed the implementation of search marketing strategies for companies, big and small, across a variety of sectors.
Knowing how a search engine works is the first step towards understanding search engine optimisation (SEO). And understanding SEO is paramount to successfully leveraging it as a marketing channel.
So that’s what we’ll look at in this post.
But, fear not, we won’t go too deep. Sure, it would help to know all about machine learning, natural language processing, and information retrieval. But that’s not necessary for our purposes here.
We just need to understand the main components and how they fit together. So when we discuss concepts such as crawling, indexing and ranking with others, we have enough shared knowledge to work together effectively.
Personally, I think this is really important. As an SEO consultant, part of my job is to educate my clients. Over the years I’ve found that by helping my clients learn more about SEO, they become more engaged. They start to understand why I might be recommending something. They buy into the solutions I propose. They bring new ideas to me and figure out ways they can help. They begin to properly value SEO.
This has benefits not only for a consultant-client relationship but also between departments (marketing and development, for example) or, perhaps most obviously, within teams (such as junior and senior SEOs).
So, how does a search engine work?
Ignoring all the complexities in algorithms and servers, a search engine boils down to three core functions:

1. Crawling – finding new pages
2. Indexing – understanding and storing pages
3. Retrieval – ranking and returning pages for a search
Crawling is a simple concept. A search engine navigates around the internet looking for new web pages. Each time it finds a new page, it adds it to a list.
To help visualise this process, picture a website as a house. Every page on the website is a room inside the house.
Now, imagine that someone (let’s call them ‘Jim’) has the job of walking around the UK and making a list of not just every house they find, but every room in every house. That’s all crawling is, but instead of a person, you have something called a “bot” or a “spider” tasked with creating the list. For ease, we’ll just call it “the crawler”.
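To make the idea concrete, here’s a toy sketch of crawling in Python. The “web” here is just an in-memory dictionary of made-up URLs, and real crawlers fetch pages over HTTP, parse HTML for links, respect politeness rules, and run at enormous scale — but the core loop of “visit a page, note any new pages you find, repeat” looks roughly like this:

```python
from collections import deque

# A toy "web": each URL maps to the links found on that page.
# (These URLs are made up for illustration.)
TOY_WEB = {
    "example.com/": ["example.com/about", "example.com/blog"],
    "example.com/about": ["example.com/"],
    "example.com/blog": ["example.com/blog/post-1"],
    "example.com/blog/post-1": ["example.com/"],
}

def crawl(start_url):
    """Breadth-first crawl: visit each newly discovered page once."""
    found = [start_url]          # the crawler's growing list of pages
    queue = deque([start_url])
    seen = {start_url}
    while queue:
        url = queue.popleft()
        for link in TOY_WEB.get(url, []):
            if link not in seen:  # only add pages we haven't seen before
                seen.add(link)
                found.append(link)
                queue.append(link)
    return found

pages = crawl("example.com/")
# Every page in the toy web ends up on the crawler's list.
```

The output list is exactly what the crawler hands to the next stage: indexing.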
Next up is indexing.
Here, the search engine consults the list of web pages created by the crawler. It revisits each page for a deeper look. It examines all the headings, copy, imagery, and whatever else is there, to try to understand what the page is about.
When it’s done, the search engine stores the page, and everything it’s learnt about it, to a database. We call this database the “index”. Only once a page has been “indexed” (i.e. added to the index) does it have a chance of appearing in the search results.
Let’s go back to our example. Once Jim has finished making their list of houses and rooms, along comes a second person, Pam. She takes Jim’s list and revisits each house and each room. But, this time, Pam makes detailed notes about the layout, the colours, the items, and so on. Based on what she finds, Pam tries to categorise the room (e.g. is it a kitchen, a living room, a bedroom, etc?).
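A heavily simplified way to picture what the index looks like is an “inverted index”: a map from each word to the pages that contain it. The page URLs and text below are invented for illustration, and a real index stores far more than words (topics, links, freshness, and so on), but the basic structure is something like this:

```python
from collections import defaultdict

# Hypothetical page content; real indexers parse full HTML,
# images, links, structured data, and much more.
PAGES = {
    "example.com/blog/espresso": "how to brew espresso at home",
    "example.com/blog/french-press": "french press coffee brewing guide",
    "example.com/about": "about our coffee company",
}

def build_index(pages):
    """Map each word to the set of pages containing it (an inverted index)."""
    index = defaultdict(set)
    for url, text in pages.items():
        for word in text.split():
            index[word].add(url)
    return index

index = build_index(PAGES)
# index["coffee"] holds the two pages that mention "coffee",
# ready to be looked up instantly at search time.
```

Storing pages this way is what lets a search engine answer a query in milliseconds rather than re-reading the whole web every time.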
Continuous crawling and indexing
For simplicity’s sake, we have described a scenario where crawling and indexing happen in an orderly fashion.
In reality, it’s a lot more fluid. The crawler will constantly search for new web pages. The indexer will constantly visit pages the crawler has found.
Both will also regularly revisit pages they’ve been to previously, just to see if anything’s changed.
The final function of a search engine is to retrieve relevant pages from its index when requested by a user.
Let’s go back to our example once more. A third person, Dwight, asks Pam to produce a list of the rooms with a coffee table. Pam consults the list and picks out all the rooms that match. She decides to order the list based on what she thinks Dwight will be most interested in. To help, she uses everything she knows about Dwight, coffee tables, and other people who have asked about coffee tables.
Although simplified, this is pretty much what happens in the retrieval process.
A user types (or says) a search query, which we call a keyword or keyphrase.
The search engine then filters through the index to find every page it has that relates to that keyword.
It applies an algorithm to rank the pages it found in a suitable order and displays the final list to the user.
The algorithm processes thousands of signals about the different pages it found, the user, and other users to determine an order.
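Sticking with our coffee-table example, retrieval can be sketched as two steps: find the pages that match every word in the query, then sort them by a score. The index, page names, and single “score” per page below are all made up — a real engine weighs thousands of signals — but the shape of the process is this:

```python
# Toy index and ranking signals; real engines weigh thousands of signals.
INDEX = {
    "coffee": {"a.com", "b.com", "c.com"},
    "table": {"b.com", "c.com", "d.com"},
}
SIGNALS = {  # hypothetical per-page scores (e.g. popularity, freshness)
    "a.com": 0.9, "b.com": 0.5, "c.com": 0.7, "d.com": 0.2,
}

def retrieve(query):
    """Find pages matching every query word, then rank by score."""
    words = query.split()
    matches = set.intersection(*(INDEX.get(w, set()) for w in words))
    return sorted(matches, key=lambda url: SIGNALS.get(url, 0), reverse=True)

results = retrieve("coffee table")
# Only b.com and c.com match both words; c.com ranks first (0.7 > 0.5).
```

In practice the scoring is the hard (and secret) part — it’s personalised, query-dependent, and constantly changing — which is why SEO focuses on the signals we can influence.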
Breaking a search engine into these three separate functions instantly helps us diagnose where we might encounter problems further down the line.
If the crawler has an issue accessing our pages, they might not be added to the list for indexing, which would result in them not being indexed.

If the indexer has an issue understanding our pages, the system might not know what keywords and phrases our pages should rank for.

If the retrieval system’s algorithm decides other pages should be ranked above ours, we might appear lower in the results (or not at all).
Fortunately, this is where SEO comes in.
We can make our website easy for a search engine to crawl, so it reliably finds all our pages.

We can improve our content, so the indexer thoroughly understands the topic and purpose of each page.

And we can do things that might improve how the ranking algorithm calculates our position, such as making the site mobile-friendly or quick to load.
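As one small, concrete example of helping the crawler, most sites publish a robots.txt file at their root. It tells crawlers which paths they may visit and where to find a sitemap listing the pages you want indexed. The paths and domain below are hypothetical:

```text
# robots.txt – lives at https://www.example.com/robots.txt
User-agent: *
Disallow: /admin/
Allow: /

# Point crawlers at a sitemap listing every page you want indexed
Sitemap: https://www.example.com/sitemap.xml
```

This won’t make pages rank, but it helps make sure they get onto the crawler’s list in the first place.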