Cutting-Edge Data Study: Anchor Text Optimisation in ’17

Tom Smith 3 years ago

Anchor text has always been a crucial part of any off-page SEO strategy. Historically there has been a fine line between acceptable and manipulative levels of exact match anchor text, but a two-year gap between Penguin refreshes has seen webmasters once again using exact match anchor text much more liberally.

This post aims to understand the state of anchor text usage in 2017, and specifically whether exact match anchor text is still being penalised as severely as it was under the first Penguin algorithm, or whether a more aggressive use can benefit a site’s rankings.

To inform our analysis we have conducted the biggest correlation study ever on the topic of anchor text. We took the top ten rankings for 100 of the biggest keywords, which together total 14,599,910 searches per month. We then analysed all the referring domains for our 1,000 ranking URLs, which gave us a sample size of well over 100,000 backlinks.

If you'd like to use our free anchor text template as you go along, click on the button below.

_Blog - Mastermind Anchor Text Button

We’ve included the methodology below the study but for now let’s just dive right into the data.

The Study

We have used Pearson’s correlation coefficient to understand whether there is any correlation between the data and increased rankings. For those not familiar with Pearson’s, it is a correlation measure that will spit out a result between -1 and +1, where values closer to +1 or -1 signal a stronger positive (or negative) correlation and values near 0 signal little or no correlation.
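To make the measure concrete, here is a minimal Python sketch of Pearson’s coefficient. The function name `pearson` and the sample figures are illustrative assumptions, not data or tooling from the study.

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson's correlation coefficient between two equal-length sequences."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two sequences of equal length (at least 2 values)")
    mean_x = sum(xs) / len(xs)
    mean_y = sum(ys) / len(ys)
    # Sum of co-deviations, then the two sums of squared deviations.
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined when one sequence is constant")
    return cov / sqrt(var_x * var_y)

# Made-up example: a metric that falls as the ranking position gets worse.
positions = list(range(1, 11))
metric = [300, 260, 250, 200, 190, 150, 140, 100, 90, 60]
r = pearson(positions, metric)  # strongly negative for this sample
```

A result close to -1 here means the metric is highest at position one and falls away towards position ten, which is the pattern this study looks for.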

For the purpose of this study we’re looking for a strong negative correlation, i.e. a downward trend with higher figures at the better (numerically lower) positions, to demonstrate a correlation between anchor text and rankings.

Pearson’s is sensitive to outliers in the input data, so we have included both a trimmed mean and a median interpretation of the results. This is an important step, as without it sound conclusions cannot be drawn from the data.

To begin with, below is a plot of the mean vs the median of unique referring domains pointing towards our 1000 domains, where the blue line represents the mean and the orange line represents the median:

Referring domains

Here we can see both lines have a strong negative trend, down and to the right. This suggests a high correlation between the number of referring domains and rankings. Both the mean and median have a correlation score of -0.85.

We appreciate that correlation does not imply causation, and that any conclusion inferred from this study is not definitive. However, we can already see that our data supports what is widely considered to be the biggest ranking factor, so it is reasonable to infer that other strong correlations demonstrated in this study also carry some weight in Google’s ranking algorithm.

Exact Match

Exact match anchor text refers to a link where an exact word or phrase that a referred page is primarily trying to rank for is used as the link text. For example:

Zazzle Media is an award-winning content marketing agency.

Influence of Exact Match on Rankings

Exact match

Whilst both mean and median have a spike in position three, we can still see a correlation down and to the right. The median correlation is strong, at -0.81 whereas the mean is weaker at only -0.34.

We can visually see that the median is pretty unreliable, in as much as the numbers drop off to zero from position six onwards.

We can also see that position one does tend to have a higher percentage of exact match anchor text, with somewhere between 5% and 8% exact match anchor text correlating with strong rankings (positions one to three), whereas lower rankings have on average about 3%.

As this data set has excluded outliers, we can’t see the influence of over-optimisation on rankings. Once the outliers are included, the results are startling:

Exact match results

This is an exact opposite trend line, with a correlation of +0.82. We can see that websites whose link profile is saturated with exact match anchor text perform significantly worse than those whose profile is not.

The data suggests there is a fine line between optimisation and over-optimisation, with no more than 8% exact match appearing to be acceptable.

It is important to understand your own niche’s anchor text distribution, and our methodology section gives you all the tools to recreate this study for yourself. But back to the study.

Phrase Match

Phrase Match anchor text contains the exact word or phrase, but the anchor also contains additional phrases.

Zazzle Media is an award-winning content marketing agency.

This type of anchor text can also be a compound anchor text which includes both the brand and the anchor text.

Anchor text study by Zazzle Media content marketing agency.

Influence of Phrase Match on Rankings

Influence of phrase match

The results here are much more volatile. On average, 19% of the anchor text profile of our 1,000 domains is made up of phrase match anchor text, so webmasters use it more liberally than exact match, but to no real benefit. The correlation is anywhere between -0.09 and -0.14, which is very weak.

This does seem unusual; while using a keyword as the anchor would benefit rankings, including the term within a string has little effect whatsoever.

One possible explanation for this is that phrase match anchor text tends to utilise quite long anchor text. Both our examples were five words long, which is already pretty long for a link. It seems trying to cram in as many key terms as possible would inflate the anchor text beyond natural levels, and this is something the machine learning algorithm would pick up on - and demote.

If we exclude any anchor text longer than five words, the data is actually more supportive:

Phrase match

It’s still volatile but the correlation is now between -0.24 and -0.13 which does show that shorter anchor texts containing exact match keyword usage correlate better with high rankings, compared to longer anchor text.

It is also worth noting at this point that shorter word count overall correlates with rankings at a rate of -0.19.

Partial Match

Partial Match anchor text contains part of the word or phrase that the referred page is primarily trying to rank for.

Zazzle Media has won awards for its content marketing which has helped to connect brands with their users.

For the sake of our analysis we will also include partial phrase match within this definition:

Zazzle Media has won awards for its content marketing which has helped to connect brands with their users.
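The three definitions above can be sketched as a small Python classifier. This is only one possible reading of them, not the logic of the study’s spreadsheet: the function names are hypothetical, matching is done on lowercased word tokens, and "partial" is assumed to mean that the anchor shares at least one word with the keyword.

```python
import re

def tokenize(text):
    """Lowercase a string and split it into alphanumeric word tokens."""
    return re.findall(r"[a-z0-9]+", text.lower())

def contains_phrase(anchor_words, keyword_words):
    """True if keyword_words appears as a contiguous run inside anchor_words."""
    n = len(keyword_words)
    return any(anchor_words[i:i + n] == keyword_words
               for i in range(len(anchor_words) - n + 1))

def classify_anchor(anchor, keyword):
    """Return 'exact', 'phrase', 'partial' or 'other' for one anchor text."""
    anchor_words = tokenize(anchor)
    keyword_words = tokenize(keyword)
    if not anchor_words or not keyword_words:
        return "other"
    if anchor_words == keyword_words:
        return "exact"    # the anchor is the keyword and nothing else
    if contains_phrase(anchor_words, keyword_words):
        return "phrase"   # the whole keyword plus additional words
    if set(anchor_words) & set(keyword_words):
        return "partial"  # assumption: at least one keyword word is shared
    return "other"

def match_shares(anchors, keyword):
    """Share of each match type across a list of anchor texts."""
    counts = {"exact": 0, "phrase": 0, "partial": 0, "other": 0}
    for anchor in anchors:
        counts[classify_anchor(anchor, keyword)] += 1
    total = len(anchors)
    return {kind: (count / total if total else 0.0) for kind, count in counts.items()}
```

For the keyword "content marketing agency", the anchor "award-winning content marketing agency" would be classed as phrase match and "its content marketing" as partial match. A keyword containing very common words would need a stop-word list before the partial check is meaningful.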

Influence of Partial Match on rankings

Influence of partial match

Here we have no real correlation. The mean and median give us a range between +0.05 and -0.19, which provides no real evidence to support using partial variations of exact match anchor text.

It is worth noting that partial match is used very liberally, with about 45% of all links containing part of the keyword within the anchor, to no detrimental effect; even with outliers included, the trend moved further towards a negative correlation. We can conclude that partial match usage doesn’t appear to be penalised by Google.


Co-citation

Based upon Google’s Hilltop algorithm, co-citation is where multiple pages are referred to by a single webpage or site. A simplistic way of putting it is this: if site A is topically relevant to the term which is mentioned on the referring page, and site B is also referenced, then site B must also be topically relevant.


Co-citation feels like it is an extremely natural way to link. When talking about a subject it is logical to link to multiple sites on the same topic, but does it have a correlation with rankings?

Influence of Co-citation on Rankings

Co-citation of rankings

Now the results here surprised me - there is no real correlation here either. With a correlation score of between +0.28 and -0.21 we can’t really infer whether co-citation has an influence on rankings.

One possible explanation for this data is that if a large proportion of your links are shared with others in the top ten, then the ranking benefit will be split between you and your competitors, and therefore you will not be able to outrank them with these links alone. This will only be possible through building unique links. Understanding co-citation as a ratio of total links is therefore flawed.

If we look at the data by just raw number of co-citation links, it paints a different picture:

Cocitation volume

Here we have a much stronger correlation. With a mean correlation of -0.8 and a median correlation of -0.85, the number of co-citation links correlates strongly with position one rankings, and appears to be as strong a ranking factor as links from unique referring domains.

The mean number of co-citation links is 74, and the median is 13 (compared to mean/median referring domain counts of 205/66). This is incredible data, and may save webmasters much time if they focus on building co-citation links, as they appear to be very valuable.

There is a note of caution with the data though, and again it is the very flaw in any correlation study. It is difficult to conclude whether sites have high rankings because they have lots of co-citation links, or if sites with lots of referring domains naturally have lots of co-citation links.

I tried to analyse only the sites with lower than average referring domains which also had above average co-citation links, but the data set became tiny (fewer than 20 sites). The average position did improve slightly (from 5.5 to 4.9), but further data analysis will be needed to conclusively answer the questions posed in this study.
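The subset check described above could be sketched in Python as follows. The field names (`position`, `referring_domains`, `cocitation_links`) and the six sample sites are made up for illustration, and "average" is assumed to mean the plain mean across all sites.

```python
from statistics import mean

def average_position(sites):
    """Mean ranking position of a list of site records."""
    return mean(site["position"] for site in sites)

def low_rd_high_cocitation(sites):
    """Sites with below-average referring domains and above-average co-citation links."""
    avg_rd = mean(site["referring_domains"] for site in sites)
    avg_cc = mean(site["cocitation_links"] for site in sites)
    return [site for site in sites
            if site["referring_domains"] < avg_rd
            and site["cocitation_links"] > avg_cc]

# Illustrative records only; not data from the study.
sites = [
    {"position": 1, "referring_domains": 400, "cocitation_links": 150},
    {"position": 2, "referring_domains": 90, "cocitation_links": 120},
    {"position": 3, "referring_domains": 300, "cocitation_links": 60},
    {"position": 4, "referring_domains": 80, "cocitation_links": 90},
    {"position": 5, "referring_domains": 150, "cocitation_links": 20},
    {"position": 6, "referring_domains": 60, "cocitation_links": 10},
]
subset = low_rd_high_cocitation(sites)
```

Comparing `average_position(subset)` with `average_position(sites)` gives the same kind of before and after figure quoted above, and `len(subset)` shows how small the filtered sample has become.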


Co-occurrence

Co-occurrence refers to the presence of the exact match anchor text somewhere on the page, but not specifically marked up as anchor text. This makes sense; after all, if a page is talking in-depth about blue widgets, and refers to a page as an authority on the topic, but without specifically referencing the key phrase in the anchor, is this not more relevant than an article which does not reference blue widgets at all, apart from in the anchor?

Influence of Co-occurrence on Rankings

Influence of Co-occurrence

Again the results here are not particularly interesting. With a mean/median correlation of -0.14 to +0.28 we can’t see any sort of correlation between mentioning the keyword on the page and increased rankings.

It is worth noting that usage is higher than for exact and phrase match anchor text, with the overall average being between 40% and 53%, but this does still seem quite low. One possible explanation would be that, while the topic is covered on the page, there is no exact match usage of the keyword, and only partial or LSI uses on the page.

Providing data on partial and LSI usage at this scale would be near-enough impossible, but it is achievable on a smaller data set and is still worth exploring. The following section will give you all the tools required to do this, as well as to reproduce all the other data in this study.


Methodology

What we’ve discovered in this study is that exact match anchor text correlates well with position one rankings, and that the optimal level of exact match anchor text across all our 100 position one sites is anywhere between 5% and 8%, but if we take a deep dive into industries we can see that this level fluctuates massively.

Some industries, such as gambling, seem to get away with significantly more exact match anchor text than, say, the medical industry. The biggest manipulation red flag the machine learning algorithm will be able to spot is not exact match anchor text in excess of an overall average, but in excess of the industry average.

I’ve created this template to allow you to measure your own anchor text ratios against others in your industry.

You’ll need to have the top ten rankings for your chosen keyword (or keywords), an export of all of those sites’ backlinks and the link anchor for those referring pages. I got my information from Ahrefs but any other tool which provides this will be sufficient.

Fire up the template and input data into the sheets from right to left, starting with the Data sheet. Please be aware that I have disabled automatic calculation of formulas, as otherwise the workbook will be very slow. I would advise first copying all the data into the first sheet, then calculating the formulas from right to left, before moving on to the next sheet.

You will need to input the ranking URL and the associated keyword into this sheet in columns A and F, then finally add the keyword again into column I and split it into separate cells using Text to Columns with a space as the delimiter. The spreadsheet here is looking for partial match keywords and will automatically calculate the length of the keyword, so it will know how many cells of data to search in.

Word count

Should your keyword only be one word long, it will return the same result as for phrase match keywords. The formula can only handle keywords up to four words in length, as otherwise performance will drop too much for large data sets, so if you have lots of long tail keywords it might be better to contact us to get a bespoke report created.

The Sites sheet will only require you to input the ranking URLs and their associated keywords. Make sure each URL corresponds to the correct rank, otherwise your data will be invalid. If you want to run this off multiple keywords then simply extend the associated rank column downwards.

Website positioning

Finally, the median and mean sheets will populate themselves once you run the formulas on them. These are array formulas (as denoted by the {} braces), so when recalculating the sheet ensure that the formulas are still contained within braces; otherwise you will need to press CTRL + SHIFT + ENTER (or CTRL + CMD + ENTER) for these to work.

You should now have all of your data.

Position totals

The mean spreadsheet is working out a trimmed version of the mean, which excludes the top and bottom 5% of results, whereas the median is just a straight median.
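As a rough illustration of why both figures are reported, the sketch below computes a trimmed mean that drops the lowest and highest 5% of values, alongside a plain mean and a median. The function name, the rounding-down of the trim count and the sample percentages are assumptions for this example; the template’s own formula may round differently.

```python
from statistics import mean, median

def trimmed_mean(values, trim_fraction=0.05):
    """Mean after dropping trim_fraction of the values from each end."""
    if not values:
        raise ValueError("values must not be empty")
    if not 0 <= trim_fraction < 0.5:
        raise ValueError("trim_fraction must be in [0, 0.5)")
    ordered = sorted(values)
    # Number of values removed from EACH end, rounded down (an assumption).
    k = int(len(ordered) * trim_fraction)
    return mean(ordered[k:len(ordered) - k])

# Made-up exact match percentages: 19 typical sites and one saturated profile.
exact_match_pct = [3, 4, 4, 5, 5, 5, 6, 6, 6, 6, 7, 7, 7, 8, 8, 8, 9, 9, 10, 95]

plain = mean(exact_match_pct)            # pulled upwards by the outlier
trimmed = trimmed_mean(exact_match_pct)  # lowest and highest value removed
middle = median(exact_match_pct)
```

With this sample the plain mean is dragged well above the trimmed mean and the median by the single saturated profile, which is the distortion the trimming is meant to remove.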

At the bottom of the spreadsheet there will be an average across the top ten, as well as the correlation between the metric and the position.
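The per-position summary can be approximated outside the spreadsheet too. The sketch below groups one metric by ranking position, takes the median for each position and then averages those medians; the row layout of `(position, value)` pairs and the sample figures are assumptions for illustration.

```python
from collections import defaultdict
from statistics import mean, median

def median_by_position(rows):
    """Group (position, value) rows and return {position: median value}."""
    grouped = defaultdict(list)
    for position, value in rows:
        grouped[position].append(value)
    return {position: median(grouped[position]) for position in sorted(grouped)}

# Made-up referring domain counts for two keywords, positions one to five.
rows = [
    (1, 120), (1, 90),
    (2, 80), (2, 100),
    (3, 60), (3, 70),
    (4, 40), (4, 50),
    (5, 20), (5, 30),
]
per_position = median_by_position(rows)
overall_average = mean(list(per_position.values()))
```

The correlation figure is then Pearson’s coefficient between the positions (the dictionary keys) and the per-position values.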

This gives you an incredible amount of data. If you’re ranking outside the top ten you can see how many links on average you would need to rank (average row in the total column).

You can see how much exact match anchor text your competitors are using in the Exact column.

You can see just how many co-citation links your competitors have, and you can then go back into the Data spreadsheet and see which sites link out the most to your competitors by filtering the Co-citation column by anything above 0.
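For anyone working from a raw backlink export rather than the template, a co-citation count could be sketched like this in Python. It assumes that a co-citation link is a link from a referring domain that also links to at least one other site ranking for the same keyword; the function names and example domains are hypothetical.

```python
from collections import defaultdict

def cocitation_counts(backlinks):
    """Map each referring domain to the number of distinct ranking sites it links to.

    backlinks is a list of (referring_domain, ranking_site) pairs. Only domains
    that link to two or more ranking sites are returned.
    """
    targets = defaultdict(set)
    for referring_domain, ranking_site in backlinks:
        targets[referring_domain].add(ranking_site)
    return {domain: len(sites) for domain, sites in targets.items() if len(sites) >= 2}

def cocitation_links_per_site(backlinks):
    """Count, for each ranking site, its links from co-citing referring domains."""
    backlinks = list(backlinks)
    shared = cocitation_counts(backlinks)
    per_site = defaultdict(int)
    seen = set()
    for referring_domain, ranking_site in backlinks:
        pair = (referring_domain, ranking_site)
        # Count each referring domain once per ranking site.
        if referring_domain in shared and pair not in seen:
            seen.add(pair)
            per_site[ranking_site] += 1
    return dict(per_site)

# Illustrative export rows only.
backlinks = [
    ("news.example", "a.example"),
    ("news.example", "b.example"),
    ("blog.example", "a.example"),
    ("dir.example", "a.example"),
    ("dir.example", "b.example"),
    ("dir.example", "c.example"),
    ("news.example", "a.example"),  # duplicate link, counted once
]
shared = cocitation_counts(backlinks)
most_generous = sorted(shared.items(), key=lambda item: item[1], reverse=True)
```

Sorting `shared` by count, as in `most_generous`, surfaces the referring domains that link out to the most competitors.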

In Summary

If you take away anything from this blog post it should be to learn to monitor your anchor text profile. We've spectacularly demonstrated that there is a fine line between optimal and manipulative, and if you're not monitoring this you can easily get punished.

Without a concrete understanding of what works for your industry, you could get left behind.
