Surfacing Unnatural Links > Why You’re Doing it Wrong

We are now well into 2013 and, as expected, Google is upping its game in the ‘fight’ to reduce visibility for badly designed sites and those ‘powered’ by spammy links and link networks.

For those impacted by that new war on ‘spam’, the reality of escaping the clutches of a Google penalty is proving more difficult than most expected. Link removal campaigns and use of the Disavow tool have done little to improve visibility for those suffering from filtering, and while Google’s ‘penalties’ have moved away from whole-of-site hits to more focused impact, it still hurts.

A New Approach to Link Removal

In this article I want to cover a more structured, and dare I say more successful, way to detect unnatural links. There is a lot of advice on how to find unnatural links on the web but I don’t think any of it is quite right.

We are now a year on from the original Penguin (see what I did there!) update, yet 81% of websites have still not recovered, according to Search Engine Roundtable’s article. There are also a huge number of people under manual penalties who are consistently removing/disavowing links and filing reconsideration requests with Google with no success. This tells us two things:

    1. It is difficult to recover from these penalties, and
    2. The unnatural-link detection method has not been perfected.

Before I go into exactly how this new unnatural link detection works I need to talk about the current methods used and why they are not quite right.

The Current Methods Don’t Work

The current methods for detecting unnatural links mostly involve using the following metrics:

  • Over Optimised Anchor Texts
  • High volume of low quality links
  • High volume of one particular type of link (i.e. blog comments, directory links)
  • Site wide links

The above metrics are the most popular and well-documented techniques for finding unnatural links. I completely agree with these metrics, and I have written about these techniques in my Penguin Recovery Guide, but going by the current statistics on how many websites have actually recovered from the Penguin penalty, something is clearly being missed.

The current methods are good at finding links from websites that have gone overboard on their link building, but unnatural links that use branded anchor text can easily pass through undetected. This got me working hard to find other ways to discover these unnatural links.

It’s not about Over Optimised Anchor Text; it’s all about Trust.

After weeks of researching, testing, failing and testing again, I have come up with an unnatural-link detection method that is both structured for speed and ignores the overcooked metrics currently in use. This new method focuses on the ‘trust’ of the websites that link to you.

To understand why you can find unnatural links just by working out the inherent ‘trust’ of a website, you first need to understand how the entire web links together: essentially, what Google sees when it looks at the overall link architecture of the web.

The entire link graph is quite a complicated space, but to bring it down to a very basic level, the following points are true.

  • Authoritative sites link to other authoritative sites.
  • Spammy sites link to other spammy sites.

These are two very basic statements that most people will know, but there is a lot we can learn from them. Most authoritative sites will only link out to extremely relevant sites that are also authorities in the niche. And because of current and old-school link building methods, spammy sites link to other spammy sites in the niche.

Looking at it another way, just as in the real world, there are ‘good neighbourhoods’ and ‘bad neighbourhoods’, and if your links are found within the latter’s ‘cluster’, you have a problem (see below for an example of what a cluster looks like, from CognitiveSEO’s awesome Link Visualiser).

[Image: an example link cluster from CognitiveSEO’s Link Visualiser]

When you think about it, many of the old link building methods were very automated, creating quantities of links from things like blog networks, blog commenting, forums and article sites. With the volume of links being created, this generated a huge number of links pointing from many low-trust sites to the ‘money’ sites.

If you picture this as a link graph, as above, you have many low-quality, low-trust pages linking to the sites that are trying to game the system via these automated methods. This makes it very easy for Google to see which sites are gaining a lot of PageRank from sites that only have low trust, and it is a very easy way of detecting which sites to penalise.

The key is to look at the trust of the sites linking to you.

How to find the Trust of the Sites Linking to you

Now that we understand why we must look at the trust of sites rather than the overcooked metrics, we can look at exactly how to do that.

The process really is very simple, thanks to Majestic SEO’s link metrics, Citation Flow and Trust Flow, which are very similar to what Google uses to judge links. Again, to keep it very simple: Google uses PageRank and its secretive TrustRank, and Majestic has the equivalents, Citation Flow and Trust Flow.

How to use Citation Flow and Trust Flow to find spammy links

To use Majestic’s metrics to find links it is important to briefly understand what each is.

Citation Flow

Citation Flow is roughly analogous to PageRank: it shows how much link equity a URL has and how much power it can pass on.

Trust Flow

Trust Flow measures how trusted a link actually is, based on the links its domain has. Because Majestic seeds Trust Flow from a set of manually reviewed, trusted sites, it is a very reliable metric.

Taking it back to Google looking for sites that gain a lot of PageRank from low-trust sites, we can do essentially the same thing using Majestic’s metrics: if a link has a high Citation Flow and a low Trust Flow, then most of the time that link is going to be spammy.

Create a Trust Ratio from Majestic’s Metrics

What we can do now is use these two link metrics to create our own trust ratio for a site. It is very simple:

Trust Flow / Citation Flow = The Trust Ratio

A very simple calculation, but very useful for finding unnatural links. To prove this works, let’s look at an authoritative site and a spammy site and compare the difference.

Authoritative Site

http://www.cancerresearchuk.org

Domain Citation Flow: 65

Domain Trust Flow: 76

Trust Ratio: 76 / 65 = 1.169

‘Spammy’ Site

http://packersxxxxclub.com/

Domain Citation Flow: 30

Domain Trust Flow: 6

Trust Ratio: 6 / 30 = 0.2

Of the two sites selected, one is a reputable, trusted charity and the other is a random blog network site I discovered. This is an extreme comparison, but you can clearly see that the higher the trust ratio, the less spammy the site. If the ratio is above 1, the Trust Flow is higher than the Citation Flow, which is a very good indicator of a trustworthy site.
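To make the calculation concrete, here is a minimal Python sketch of the ratio (trust_ratio is just an illustrative name), including a guard for domains whose Citation Flow is 0, which we will need later:

def trust_ratio(trust_flow, citation_flow):
    # Trust Flow divided by Citation Flow; 0 when Citation Flow is 0.
    if citation_flow == 0:
        return 0.0
    return trust_flow / citation_flow

# The two example sites from above:
print(trust_ratio(76, 65))  # cancerresearchuk.org -> ~1.169 (trustworthy)
print(trust_ratio(6, 30))   # blog network site -> 0.2 (spammy)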

Now we can use this trust ratio to work out what links are spammy on a penalised website.

Not Every Niche is the same

Before we start looking at the trust ratios of the domains pointing to a site, we have to work out exactly how much trust a specific niche has. The reasoning is quite simple: we could look at a site in one niche and decide that any domain with a trust ratio lower than, say, 0.3 needs removing, but the right cut-off may be completely different in another niche.

Some niches are quite trustworthy, but others, such as “electronic cigarettes” or “payday loans”, are very spammy and untrustworthy, and we must reflect that in our trust ratio.

Find the Average Trust Ratio of a Niche

Tools Needed: SEO Quake

To find the average trust ratio of a niche we can use a tool such as SEO Quake to export data from Google’s SERPs. I am not going to explain what SEO Quake is or how to install it, as that is beyond the scope of this article, but you can find installation details here.

1. Download and Install SEO Quake for your browser

2. Set it up so that the Google SERPs overlay is installed.

3. In Google’s search setting, up the amount of SERPs displayed on one page to 30.

Once SEO Quake is installed and Google is showing 30 results per page, search for your main target term in the niche. For this article I am going to use the term “electronic cigarette”. Google will show the top 30 results on one page and SEO Quake will fetch the metrics you have configured.

All we want SEO Quake to do is export the top 30 results. You will see a “Show as CSV” button at the top of the SERPs. Click it and you can copy the results from a text box and paste them into any spreadsheet or text editor such as Notepad.

1. In the spreadsheet, create a “Trust Ratio” column and enter the Trust Flow / Citation Flow formula for the first URL.

[Image: working out the trust ratio in a spreadsheet]

2. Then drag-replicate the formula down for each URL so that you have the trust ratio for every URL.

You will see that some of the trust ratios will say “#DIV/0!”. This simply means the Citation Flow is 0 (usually both metrics are 0), so the division cannot be performed. Replace these figures with 0 so that the next step works correctly.

[Image: trust ratios calculated for all URLs]

3. Then work out the average trust ratio of all the URLs at the bottom of the spreadsheet using the formula =AVERAGE(P2:P31), as shown below.

[Image: average trust ratio for the niche]

We now have the average trust ratio of the specific niche, giving us an idea of the trust ratios of the sites that rank well in it. We can now use this data for comparison against our own sites.
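If you would rather script this step than work in a spreadsheet, here is a minimal Python sketch of the same calculation. The filename and column headers are assumptions; check your actual SEO Quake export and rename them to match:

import csv

CF_COL, TF_COL = "Citation Flow", "Trust Flow"  # assumed header names

def niche_average_trust_ratio(csv_path):
    # Average the trust ratio across all exported SERP rows.
    ratios = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            cf = float(row.get(CF_COL) or 0)
            tf = float(row.get(TF_COL) or 0)
            # Mirror the spreadsheet fix: treat the "#DIV/0!" cases
            # (Citation Flow of 0) as a ratio of 0.
            ratios.append(tf / cf if cf else 0.0)
    return sum(ratios) / len(ratios) if ratios else 0.0

print(niche_average_trust_ratio("top30_serps.csv"))  # hypothetical filename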

We looked at a penalised client in the e-cigarette niche with a Citation Flow of 31 and a Trust Flow of 10, which gives a trust ratio of 0.322. We can clearly see that this site does not have enough trust to rank in the industry; hence it was penalised for its low-trust, spammy links.

Now onto using this data to find the spammy domains pointing to your site.

Find the Low Trust Ratio Domains pointing to your Penalised Site

Now that we know the average trust ratio of the niche, we can set a threshold: any domain linking to the site with a trust ratio lower than this figure will be removed or disavowed as spammy.

This is the most manual part of the process, as you may have to play with the ratio slightly. As a rule, I usually halve the average ratio and start there, so in this case I would look at removing domains with a ratio lower than 0.352. Let’s put it to the test.

1. Using Majestic Site Explorer, enter the domain of the penalised site you want to find spammy links for. Make sure the domain option is selected rather than URL, as we want to look at all of the links to the domain. Then select the “Ref Domains” tab to see all referring domains, and export the data by clicking the “Download CSV” button.

*We are looking at referring domains rather than individual links for one simple reason: the standard Majestic SEO account only lets you export the first 2,500 links of a site but allows you to export 1,000 domains. By exporting the referring domains you can capture a much larger portion, if not all, of the link profile.

Now we have a spreadsheet with all the referring domains of the site.

2. Next, create a new column to the right of the “TrustFlow” column called “Trust Ratio”, and enter the calculation to work out the trust ratio: =Q2/P2.

[Image: trust ratios of the referring domains]

Now we have the trust ratio of all the domains. At this point we can see how trustworthy or spammy the domains pointing to the site are.

3. Add another column to the right of Trust Ratio called “Spammy”. Then in the cell below add the following: =IF(R2<0.352,"Yes","No"). The diagram below shows this:

[Image: creating the IF statement]
What this formula does is check whether the trust ratio is lower than the threshold we decided marks a domain as spammy, and if so, label it as a spammy domain.

Replicate this formula across all your domains and you will soon have a yes or no answer for each domain on whether it is spammy (a scripted version of this labelling is sketched after these steps).

4. Now spot-check plenty of the domains to make sure the results are accurate; you do not want to remove any trustworthy domains. If some trustworthy sites are being flagged (it is unlikely), simply lower the trust ratio threshold to tighten the results. Once you are happy, you have a list of spammy domains.

5. Add auto filtering to the spreadsheet (see here for how to add auto filtering) so you can select only the domains labelled as spammy.
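For those who prefer to script the labelling rather than use the Excel IF statement, here is a minimal Python sketch over the Majestic referring-domains export. The column headers and filename are assumptions; rename them to match your actual CSV:

import csv

DOMAIN_COL, CF_COL, TF_COL = "Domain", "CitationFlow", "TrustFlow"  # assumed headers

def flag_spammy_domains(csv_path, threshold=0.352):
    # Return referring domains whose trust ratio falls below the threshold.
    spammy = []
    with open(csv_path, newline="") as f:
        for row in csv.DictReader(f):
            cf = float(row.get(CF_COL) or 0)
            tf = float(row.get(TF_COL) or 0)
            ratio = tf / cf if cf else 0.0  # same divide-by-zero guard as before
            if ratio < threshold:
                spammy.append(row[DOMAIN_COL])
    return spammy

for domain in flag_spammy_domains("ref_domains.csv"):  # hypothetical filename
    print(domain)

Because the threshold is just a parameter, tightening or loosening the detection for a particular niche is a one-line change.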

I stuck with the ratio of 0.352 for the penalised e-cigarette client, and there was only one site I had to relabel as not spammy (basically a supplier who had used bad link building techniques and was low trust themselves). The rest were absolutely terrible sites; just from the 5 selected below you can see how effective the method is at picking up spammy domains pointing to this site.

http://conxxxion-remedies-info.info/

http://drugaxxxhab.us/

http://cleanixxxf.com/

http://www.myfitnexxxe.com/

(elements of the domain removed for confidentiality reasons)

Deal with the Spammy Domains

Now that we have a checked list of spammy domains, we can look into removing them.

If you wish, you can go back and find the actual links from each spammy domain so you can contact the sites and request removal. But given the generally poor success rate of contacting the webmasters of these domains, you can simply enter each domain into a text file and submit it to the Google Disavow tool. The example below shows what you would put into the disavow file (note that comment lines must begin with “#”):

# We have found the following domains that have unnatural links pointing to our site and we wish to have no association with them.
domain:drugxxxxhab.us
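If you generated the spammy list with the earlier script sketch, writing the disavow file is only a few more lines (again a sketch; flag_spammy_domains is the hypothetical helper defined above):

def write_disavow_file(domains, path="disavow.txt"):
    # One "domain:" entry per spammy domain, with a "#" comment header.
    with open(path, "w") as f:
        f.write("# Domains with unnatural links pointing to our site; "
                "we wish to have no association with them.\n")
        for domain in domains:
            f.write("domain:" + domain + "\n")

write_disavow_file(flag_spammy_domains("ref_domains.csv"))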

Going Forward

Once you have disavowed, you then need to submit a reconsideration request to Google, or wait for the next algorithmic penalty refresh, to see if this was enough to revoke the penalty applied.

If it isn’t enough, you can go back to your spreadsheet and tighten the ratio. Try a higher trust ratio in the IF statement put into excel to capture more domains, check them and disavow/ remove the links from them also. Continue doing this until you are revoked from your penalty.

Conclusion

Forget over-optimised anchor text. Forget whether all your links are from directory sites. Use this method to look at the actual ‘trust’ your backlinks have, and use that to find the sites you shouldn’t have links from.

It is a very specific process, but so far in testing it has proved really effective in improving the profiles of sites hit by any link-based penalty. What we need now is an army of testers to apply the theory across a larger data set, which is where you come in! We would love some of you to try this out and let us know your results.

And if you need any help with finding your spammy links via this method, get in touch and we can help and send a list back to you.

 

  • markba55

    Adam, is there a danger that focusing on just trust might overlook some other key penalty causes like anchor text and site type?
     
    Awesome to see such an innovative way of detecting links, especially when the current methods are struggling. Great post.

    • AdamJamesMason

      @markba55 Thanks Mark. It is still important to make sure that things like anchor text ratio are correct, however it’s looking like the anchor text ratios are hit by a separate filter rather than these manual and other link based penalties.
       
      When you look at the link profile from a trust point of view you can very quickly discover the spammy blog networks that have used unethical tactics to game the system. If you think about it, removing these very low-trust domains from your link profile will ultimately increase the trust of your site, which Google is watching and analysing very carefully now.

      • markba55

        @AdamJamesMason and I guess things like anchor text etc are ‘short term’ fixes as they can eventually be re-manipulated over time quite easily… whereas trust is much more long term. Another win for me here is that if you do this and increase the trust of your site, when the penalty gets lifted you have a much stronger platform to rebuild from.

        • Elaine Clay

          @markba55 @AdamJamesMason Chances are by removing the spammy domains you’re tackling the anchor text ratio issue anyway, because they’re quite likely to have linked to you using keyword-rich anchor text.

        • AdamJamesMason

          @Elaine Clay  @markba55  @AdamJamesMason Thanks Elaine. That’s correct  :) The point of this new method is it ignores the links’ anchor texts. There are some sites now that are very smart to this and are using branded anchor texts to try to stay under the radar. Using this trust based technique to find spammy domains, even they cannot hide :)

        • marcusbowlerhat

          @AdamJamesMason  @Elaine Clay  @markba55 I’ll second that and I have seen sites with almost purely branded anchors still clearly under the wing of the penguin.

  • AdamJamesMason

    This is quite a big piece with a lot of info in it. Genuinely loved writing it :) If anyone has any questions let me know and I will happily help.

  • marcusbowlerhat

    Hey Adam, certainly in our experience there is a lot more to this than just anchor text, and I have seen clearly penalised sites with what would look at first glance to be a nice, clean branded anchor profile. But when you dig in, via Majestic or the Link Detox tool from Link Research Tools, which we have been using, you still see that pattern of links from low quality sites. 
     
    Anchor text seems to be part of the problem, one that is actually fairly easily fixed. I have resolved a few anchor text issues for people that got them back up on one or two heavily hit terms, but still the overall traffic was decimated. 
     
    Very cool.  :)

    • AdamJamesMason

      @marcusbowlerhat
      Thanks very much :) Yes I agree. We have used the link detox tool and seen good results from that also. 
       
      We have looked at over optimised anchor text ratios a lot. I wrote a post on SEJ http://goo.gl/PKg9h that shows the targeted/ branded/ generic anchor texts we use. You may be interested to see that and see how it compares to what you use. But overall, we have seen more clients hit by an algorithmic filter than an actual penalty when it comes to over optimised anchor texts, hence why I started looking for another method.
       
      Please try this method out yourself. The more people test this the better. :)

      • marcusbowlerhat

        @AdamJamesMason Hey, yeah, sorry, terminological gaffe there; this seems in most cases I have seen to be algorithmic, and the anchor aspect seems related but separate to the quality of the linking sites for sure. Will give this a blast though and let you know how I get on.
         
        Sadly, I am a little excited to give it a go, which shows I don’t get out much. ;)

        • AdamJamesMason

          @marcusbowlerhat Yes I think the anchor texts are related in the sense people have overdone anchor text in the past to game the system. 
           
          Haha, you should have seen me when I was first testing this method out. I nearly hit the roof when it exported a huge list of spammy blog networks. Never before discovered using the old method we had.
           
          Let me know what happens.

        • marcusbowlerhat

          @AdamJamesMason I have had some tussles with clients regarding links to remove: sites under brutal penalties still hanging on to links on sites that look okay, but are to all intents and purposes just dead in the water. 
           
          I tend to think you should be pretty brutal: if it’s not a real link on an active site then kill it. 
           
          Will have a play with this though for sure and shout you back. :)

  • ProntoTimmy

    Hey Adam, great article and really interesting ideas! I’m excited to try them out myself, but I’m a little unclear on one thing. How did you get SEO Quake to pull data from Majestic? Did you create your own custom parameter for this?
     
    Thanks!
    Tim

  • Marc_McDermott

    Hey great article. I think I know the answer to this question but just want to confirm.
     
    When you look at the maj metrics of the top 30 rankings domains for the niche, are those the metrics of the ranking domain/page or those of the links pointing to the ranking domain/page?

    • AdamJamesMason

      @Marc_McDermott Hi Marc, thanks for commenting. It is actually the metrics of the domain/page themselves. That way you can find a good average of what ranks in that niche.
       
      Ultimately though, the metrics of the domain are based on what links to it anyway so it is all the same thing really.
       
      I hope this helps.

      • Marc_McDermott

        @AdamJamesMason  @Marc_McDermott Any reason why you go with Majestic metrics over SEOmoz’s MozTrust and MozRank? Have you found Majestic to be better?

        • http://www.zazzlemedia.co.uk/ zazzlemedia

          @Marc_McDermott  @AdamJamesMason Hi Marc. Moz metrics are solid but increasingly fresher data sets are needed with so much changing so rapidly. Majestic’s data is better and their two new metrics feed off that well. Hope that makes sense?!

        • AdamJamesMason

          @Marc_McDermott As already stated we have found Majestic to be fresher. You are welcome to try the other link metrics also. It should work, however you may need to play around with the ratios slightly.

  • AndyPalmerSEO

    Hi Adam,
     
    How have your results been after implementing this technique?
     
    Cheers

    • AdamJamesMason

      @AndyPalmerSEO Hi Andy, thanks for commenting. The results have been very consistent across the sites we have used this on. It does vary slightly per niche but it immediately picks out the low quality/ spammy domains i.e. link networks.
       
      What I usually do is start by halving the average trust ratio of the niche (as described in the article), then look at a good number of the domains to make sure it is correct, then remove/disavow the links and submit a reconsideration request. If Google still comes back saying there are unnatural links, you can simply raise the trust ratio slightly to detect slightly higher-trust links that could still be causing issues. The beauty of this method is you can easily tweak the aggressiveness of the detection depending on the niche.
       
      I hope this helps.

  • AndyPalmerSEO

    Sorry, it’s me again! Would you only do this for clients that have actually received a notification via Webmaster Tools? I just thought that if someone has had a drop in rankings, this isn’t necessarily due to existing poor links but maybe the absence of decent links, so using this may cause issues?

    • AdamJamesMason

      @AndyPalmerSEO Hi Andy, that’s fine ask as many question as you wish.
       
      Yes, this method is designed mainly for clients hit with Penguin or a manual links penalty. For cases where there is just a drop in rankings, I am not sure this would cause an issue as such. When we see a drop in rankings we usually associate it with some of their links losing value (directory links etc.). In that case this method would find those links and remove them, but doing so would not actually give any kind of gain.

  • MarketStomper

    Great read Adam, really interesting concept. I like the trust flow calculation method to determine spammy links in your profile. What have your results been? Have you seen any improvements from doing this?
    Cheers
    Danny Howard

    • AdamJamesMason

      @MarketStomper Hi Danny, thanks for commenting. Yes, we have seen great results from using this method. We have used it as part of our link removal process and seen success with that, which has led to increases in rankings.

  • thomasr

    Fantastic post, I Google+’d it for later reference. The bummer is my large company does not do any link building, yet we still got a Penguin 2.0 unnatural link warning. So now we have to dig through thousands of URLs and go through all this trouble just to get our traffic back, even though we did nothing in the first place; it is mainly sites scraping us or automated crap.

  • peterwatson12

    Ok, so taking everything into account that you mention above, I ran a competitor site through Majestic today because after Penguin he boomed up the rankings. 
    His score was:
    Citation: 28
    Trust: 9
    If I were cleaning up a link profile and found this site was linking to me I would disavow it immediately!
    My question is, why is this site kicking ass? The on-page factors are very weak, and you can see their off-page scores mentioned above.

  • shansta

    I have hundreds of backlinks from sites like user.blogger.com, user.tumblr.com, user.wordpress.com – a lot have Citation:0, Trust:0. Would you recommend adding these to my disavow list?

  • rkamreja

    Hi Adam,
    A few of my clients got hit by the 2.1 update, and I was thinking of using the CF and TF of backlinking domains. I was looking for a tool to automate finding the CF & TF of backlinks and found your post a few days ago. I am very much pleased with the info provided. Now I am in the process of trying out your method on one website to check the effects, and I will update as soon as I see a notable result. 

    I am not using the paid version of Majestic SEO, but rather a free registered account; I can verify the site and am able to download the CF & TF data of linking domains.

    My question is:
    when looking at CF and TF, should I use Historic Index data or Fresh Index (last 90 days) data to put into the spreadsheet?