As SEOs, we well understand the value links contribute to ranking websites in search results.
At their most basic, links are counted as “votes” of popularity for search engines to rank websites. Beyond this, search engineers have long worked to extract a large number of signals from the simple link, including:
- Trustworthiness – Links from trusted sites may count as an endorsement
- Spamminess – Links from known spam sites may count against you
- Link Manipulation – Looking at signals such as over-optimization and link velocity, search engines may be able to tell when webmasters are trying to “game” the system
One of the most important signals engineers have worked to extract from links is topical relevance. This allows search engines to answer questions such as “What is this website about?” by examining incoming links.
Exactly how search engines use links to measure and weigh topical relevance is subject to debate. Rand has addressed it eloquently here, and again here. Over the years, several US patent filings from Google engineers have described how this process may work. It’s important to look at these concepts to better understand how incoming links may influence a website’s ability to rank.
This is the “theory” part of SEO. As usual with these types of posts, a huge thanks to Bill Slawski and his blog SEO by the Sea, which acted as a starting point of research for many of these concepts.
1. Hub and authority pages
In the beginning, there was the Hilltop algorithm.
In the early days of Google, not long after Larry Page figured out how to rank pages based on popularity, the Hilltop algorithm worked out how to rank pages on authority. It accomplished this by looking for “expert” pages linking to them.
An expert page is a document that links to many other topically relevant pages. If a page is linked to from several expert pages, then it is considered an authority on that topic, and may rank higher.
A similar concept using “hub” and “authority” pages was put forth by Jon Kleinberg, a Cornell professor with grants from Google and other search engines. Kleinberg explains:
“…a good hub is a page that points to many good authorities; a good authority is a page that is pointed to by many good hubs.”
– Authoritative Sources in a Hyperlinked Environment (PDF)
These were elegant solutions that produced superior search results. While we can’t know the degree to which these concepts are used today, Google acquired the Hilltop algorithm in 2003.
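For those who like to see the moving parts, Kleinberg’s mutual reinforcement between hubs and authorities can be sketched in a few lines of Python. This is only a minimal illustration of the idea in the quote above, not how Google or Hilltop implemented anything. The `hits` function name and the page names in the tiny link graph are made up for the example.

```python
from math import sqrt

def hits(links, iterations=50):
    """links maps each page to the list of pages it links out to."""
    pages = set(links) | {t for targets in links.values() for t in targets}
    hub = {p: 1.0 for p in pages}
    auth = {p: 1.0 for p in pages}
    for _ in range(iterations):
        # A page's authority is the sum of the hub scores pointing at it.
        new_auth = {p: 0.0 for p in pages}
        for src, targets in links.items():
            for t in targets:
                new_auth[t] += hub[src]
        # A page's hub score is the sum of the authority scores it points to.
        new_hub = {p: sum(new_auth[t] for t in links.get(p, [])) for p in pages}
        # Normalize so the scores stay bounded from one round to the next.
        a_norm = sqrt(sum(v * v for v in new_auth.values())) or 1.0
        h_norm = sqrt(sum(v * v for v in new_hub.values())) or 1.0
        auth = {p: v / a_norm for p, v in new_auth.items()}
        hub = {p: v / h_norm for p, v in new_hub.items()}
    return hub, auth

# Hypothetical link graph: three "guide" pages linking out to pizza pages.
links = {
    "guide_a": ["pizza_shop", "pizza_blog"],
    "guide_b": ["pizza_shop", "pizza_blog"],
    "guide_c": ["pizza_shop"],
}
hub, auth = hits(links)
print(sorted(auth, key=auth.get, reverse=True)[:2])
```

In this toy graph, the page linked to by all three guides ends up with the highest authority score, and the guides that link to more authorities end up as the stronger hubs.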
2. Anchor text
Links contain a ton of information. For example, if you link out using the anchor phrase “hipster pizza,” there’s a great chance the page you’re linking to is about pizza (and maybe hipsters).
That’s the idea behind several Google PageRank patents. Earning links with the right anchor text can help your page to rank for similar phrases.
This also explains why you should use descriptive anchor text when linking, as opposed to generic “click here” type links.
Beyond the anchor text, other signals from the linking page — including the title and text surrounding the link — could provide contextual clues as to what the target page is about. While the importance of anchor text has long been established in SEO, the influence of these other elements is harder to prove.
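To make the anchor text idea concrete, here is a rough Python sketch that tallies the words used in links pointing at each page while ignoring generic phrases. The `anchor_terms` function, the `GENERIC` word list, and the sample links are all assumptions invented for illustration; no search engine publishes how it actually weighs anchor text.

```python
from collections import Counter, defaultdict

# Hypothetical list of words that say nothing about the target page.
GENERIC = {"click", "here", "read", "more", "this"}

def anchor_terms(inbound_links):
    """inbound_links is a list of (target_url, anchor_text) pairs."""
    terms = defaultdict(Counter)
    for target, anchor in inbound_links:
        for word in anchor.lower().split():
            if word not in GENERIC:
                terms[target][word] += 1
    return terms

inbound = [
    ("/pizza", "hipster pizza"),
    ("/pizza", "best pizza in town"),
    ("/pizza", "click here"),
]
terms = anchor_terms(inbound)
print(terms["/pizza"].most_common(3))
```

The descriptive anchors all contribute a topical hint about the target page, while the “click here” link contributes nothing.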
3. Topic-sensitive PageRank
Despite rumors to the contrary, PageRank is very much alive (though Toolbar PageRank is dead).
PageRank technology can be used to distribute all kinds of different ranking signals throughout a search index. While the most common examples are popularity and trust, another signal is topical relevance, as laid out in this paper by Taher Haveliwala, who went on to become a Google software engineer.
The concept works by grouping “seed pages” by topic (for example, the Politics section of the New York Times). Every link out from these pages passes on a small amount of topic-sensitive PageRank, which is passed on through the next set of links, and so on.
When a user enters a search, those pages with the highest topic-sensitive PageRank (associated with the topic of the search) are considered more relevant and may rank higher.
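The flow described above can be sketched in Python as a PageRank calculation whose “random jump” always lands on the seed pages for a topic. This is a simplified sketch, not Haveliwala’s or Google’s actual implementation. The `topic_pagerank` function, the damping value of 0.85, and the page names are assumptions chosen for the example.

```python
def topic_pagerank(links, seeds, damping=0.85, iterations=50):
    """links maps page -> list of outlinks; seeds is a set of topic seed pages."""
    pages = set(links) | {t for ts in links.values() for t in ts} | set(seeds)
    # Random jumps land only on seed pages, which biases rank toward the topic.
    teleport = {p: (1.0 / len(seeds) if p in seeds else 0.0) for p in pages}
    rank = dict(teleport)
    for _ in range(iterations):
        new_rank = {p: (1 - damping) * teleport[p] for p in pages}
        for src in pages:
            targets = links.get(src, [])
            if targets:
                # Split this page's rank evenly across its outlinks.
                share = damping * rank[src] / len(targets)
                for t in targets:
                    new_rank[t] += share
            else:
                # Pages with no outlinks hand their rank back to the seeds.
                for s in seeds:
                    new_rank[s] += damping * rank[src] / len(seeds)
        rank = new_rank
    return rank

# Hypothetical graph: one politics seed page and an unrelated recipe cluster.
links = {
    "nyt_politics": ["senate_blog", "campaign_news"],
    "senate_blog": ["policy_wonk"],
    "recipe_hub": ["cookie_page"],
}
rank = topic_pagerank(links, seeds={"nyt_politics"})
for page in sorted(rank, key=rank.get, reverse=True):
    print(page, round(rank[page], 3))
```

Pages reachable from the politics seed pick up politics-flavored rank that fades with each hop, while the recipe pages, which no seed links toward, receive none.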
4. Reasonable surfer
All links are not created equal.
The idea behind Google’s Reasonable Surfer patent is that certain links on a page are more important than others, and are thus assigned increased weight. Examples of more important links include:
- Prominent links, higher up in the HTML
- Topically relevant links, related to both the source document and the target document.
Conversely, less important links include:
- “Terms of Service” and footer links
- Banner ads
- Links unrelated to the document
Because the important links are more likely to be clicked by a “reasonable surfer,” a topically relevant link can carry more weight than an off-topic one.
“…when a topical cluster associated with the source document is related to a topical cluster associated with the target document, the link has a higher probability of being selected than when the topical cluster associated with the source document is unrelated to the topical cluster associated with the target document.”
– United States Patent: 7716225
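One way to picture this is to give every link on a page a weight based on its features, then turn those weights into the probability that a reasonable surfer clicks each link. The Python sketch below does exactly that. Every number in it is a made-up assumption, since the patent does not publish real values, and the `click_probabilities` function and the feature names are hypothetical.

```python
# Hypothetical multipliers; the patent does not disclose actual values.
POSITION_WEIGHT = {"body": 1.0, "sidebar": 0.5, "footer": 0.2}

def click_probabilities(links):
    """links is a list of dicts describing each link on one page."""
    weights = []
    for link in links:
        w = POSITION_WEIGHT[link["position"]]
        if link["on_topic"]:
            w *= 2.0   # assumed boost for topically related links
        if link["is_ad"]:
            w *= 0.1   # assumed penalty for banner ads
        weights.append(w)
    total = sum(weights)
    # Normalize so the probabilities across the page add up to 1.
    return {link["url"]: w / total for link, w in zip(links, weights)}

page_links = [
    {"url": "/recipes/margherita", "position": "body", "on_topic": True, "is_ad": False},
    {"url": "/terms", "position": "footer", "on_topic": False, "is_ad": False},
    {"url": "/ads/banner", "position": "sidebar", "on_topic": False, "is_ad": True},
]
probs = click_probabilities(page_links)
for url, p in probs.items():
    print(url, round(p, 3))
```

Under these assumed weights, the prominent, on-topic link in the body receives most of the page’s link value, with the footer link and the banner ad far behind.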
5. Phrase-based indexing
Not going to lie. Phrase-based indexing can be a tough concept to wrap your head around.
What’s important to understand is that phrase-based indexing allows search engines to score the relevancy of any link by looking for related phrases in both the source and target pages. The more related phrases, the higher the score.
In addition to ranking documents based on the most relevant links, phrase-based indexing allows search engines to do cool things with less relevant links, including:
- Discounting spam and off-topic links: For example, an injected spam link to a gambling site from a page about cookie recipes will earn a very low outlink score based on relevancy, and would carry less weight.
- Fighting “Google Bombing”: For those who remember, Google bombing is the art of ranking a page highly for funny or politically motivated phrases by “bombing” it with anchor text links, often unrelated to the page itself. Phrase-based indexing can stop Google bombing by scoring the links for relevance against the actual text on the page. This way, irrelevant links can be discounted.
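A heavily simplified way to picture the scoring: take a list of phrases related to the topic, and count how many appear on both the linking page and the page being linked to. The Python sketch below assumes plain substring matching and a hand-written phrase list, both of which are stand-ins for the far more involved process the patents describe. The `outlink_score` function and the sample text are hypothetical.

```python
def outlink_score(source_text, target_text, related_phrases):
    """Count related phrases that appear on both the source and target page."""
    source = source_text.lower()
    target = target_text.lower()
    shared = [p for p in related_phrases if p in source and p in target]
    return len(shared)

# Hypothetical phrases related to a cookie recipe page (all lowercase).
related = ["chocolate chip", "baking sheet", "preheat the oven", "cookie dough"]

cookie_page = ("Preheat the oven, then scoop the cookie dough onto a "
               "baking sheet. Add chocolate chip pieces.")
baking_blog = ("Our cookie dough guide covers chocolate chip ratios and "
               "how to line a baking sheet.")
gambling_site = "Play online poker and slots for real money."

relevant_score = outlink_score(cookie_page, baking_blog, related)
spam_score = outlink_score(cookie_page, gambling_site, related)
print(relevant_score, spam_score)
```

The link to the baking blog shares several related phrases with the cookie page, while the injected gambling link shares none and would be an easy candidate for discounting.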
6. Local inter-connectivity
Local inter-connectivity refers to a reranking concept that reorders an initial set of search results based on how often each page is linked to by the other pages in that set.
To put it simply, when a page is linked to from a number of high-ranking results, it is likely more relevant than a page with fewer links from the same set of results.
This also provides a strong hint as to the types of links you should be seeking: pages that already rank highly for your target term.
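As a rough Python sketch of the reranking step, assume we already have an initial list of results and know which pages link to which. Counting only the links that come from other pages in that same result set gives a simple local score to sort by. The `rerank_by_local_links` function and the page names are hypothetical, and a real system would blend this score with the original ranking rather than replace it.

```python
def rerank_by_local_links(results, links):
    """results is the initial ranking; links maps page -> list of outlinks."""
    result_set = set(results)
    local_inlinks = {page: 0 for page in results}
    for src in results:
        # Count each linking page once, and only links that stay inside the set.
        for target in set(links.get(src, [])):
            if target in result_set and target != src:
                local_inlinks[target] += 1
    # sorted() is stable, so ties keep their original order.
    return sorted(results, key=lambda page: -local_inlinks[page])

initial_results = ["page_a", "page_b", "page_c", "page_d"]
links = {
    "page_a": ["page_c"],
    "page_b": ["page_c", "page_d"],
    "page_d": ["page_c", "outside_page"],
}
reranked = rerank_by_local_links(initial_results, links)
print(reranked)
```

Here the page that three of the other results link to moves to the top, even though it started in third place.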
7. The Golden Question
If the concepts above seem complex, the good news is that you don’t have to fully understand them to build links to your site.
To understand if a link is topically relevant to your site, simply ask yourself the golden question of link building: Will this link bring engaged, highly qualified visitors to my website?
The result of the golden question is exactly what Google engineers are trying to determine when evaluating links, so you can arrive at a good end result without understanding the actual algorithms.
About those links between sites you control…
One important thing to know is this: in nearly all of these Google patents and papers, every effort is made to count only “unbiased” links from unassociated sites, and discount links between sites and pages related to one another through preexisting relationships.
This means that both internal links and links between sites you own or control will be less valuable, while links from non-associated sites will carry far more weight.
Researching the impact of topical links
While it’s difficult to measure the direct effect these principles exert on Google’s search results (or even if Google uses them at all), we are able to correlate certain linking characteristics with higher rankings, especially around topical anchor text.
Below is a sample of results from our Search Engine Ranking Factors study that shows link features positively associated with higher Google rankings. Remember the usual caveat that correlation is not causation, but it sure is a hint.
It’s interesting to note that while both partial and exact match anchor text links correlate with higher rankings, they are both trumped by the overall number of unique websites linking to a page. This supports the notion that it’s best to have a wide variety of link types, including topically relevant links, as part of a healthy backlink profile.
Practical tips for topically relevant links
Consider this advice when thinking about links for SEO:
- DO use good, descriptive anchor text for your links. This applies to internal links, outlinks to other sites, and links you seek from non-biased external sites.
- AVOID generic or non-descriptive anchor text.
- DO seek relationships from authoritative, topically relevant sites. These include sites that rank well for your target keyword, and “expert” pages that link to many authority sites. (For those interested, Majestic has done some interesting work around Topical Trust Flow.)
- AVOID over-optimizing your links. This includes repetitive use of exact match anchor text and keyword stuffing.
- DO seek links from relevant pages. This includes examining the title, body, related phrases, and intent of the page to ensure its relevancy to your target topic.
- DO seek links that people are more likely to click. The ideal link is often both topically relevant and placed in a prominent position.
- AVOID manipulative link building. Marie Haynes has written an excellent explanation of the kinds of unnatural links that you likely want to avoid at all costs.
Finally, DO try to earn and attract links to your site with high quality, topically relevant content.