We break down the factors that Google gives the most weight to
Target audience: Marketing professionals, SEO specialists, PR pros, brand managers, businesses, nonprofits, educators, Web publishers, journalists. This article originally appeared at Moz and is republished with permission.
By Matt Peters
Chief Data Scientist, Moz
Every two years, the SEO site Moz runs a Ranking Factors study to determine which attributes of pages and sites have the strongest association with ranking highly in Google. The study consists of two parts: a survey of professional SEOs and a large correlation study.
According to our survey respondents, here is how Google’s overall algorithm breaks down — see the chart above. We see:
- Links are still believed to be the most important part of the algorithm (approximately 40%).
- Keyword usage on the page is still fundamental, and other than links is thought to be the most important type of factor.
- SEOs do not think social factors are important in the 2013 algorithm (only 7%), in contrast to the high correlation I’ll outline below.
Page Authority, social signals correlate strongly with higher rankings
We’ll dive into the data in a minute, but here are five key conclusions:
- Page Authority correlates higher than any other metric we measured.
- Social signals, especially Google +1s and Facebook shares, are highly correlated.
- Despite Google’s Penguin release, anchor text correlations remain as strong as ever.
- New correlations were measured for schema.org and structured data usage.
- More data was collected on external links, keywords, and exact match domains.
Cyrus Shepard and Matt Brown organized this year’s survey of 120 SEOs. The survey asked respondents to rate many different factors on a scale of 1-10 according to how important they thought they were in Google’s ranking algorithm. We present the average score across all responses. The highest-rated factors in our survey had average scores of 7-8 with less-important factors generally ranging from 4-6.
To compute the correlations, we started with a large set of keywords from Google AdWords (14,000+ this year) that spanned a wide range of search volumes across all topic categories. Then, we collected the top 50 organic search results from Google-US in a depersonalized way. All search engine results pages (SERPs) were collected in early June, after the Penguin 2.0 update.
When interpreting the correlation results, it is important to remember that correlation does not prove causation.
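To make the correlation step concrete, here is a minimal Python sketch. It assumes the statistic is Spearman's rank correlation, computed separately for each keyword's results page and then averaged across keywords; this section of the article does not name the statistic, so treat that choice as an assumption. The function names and the toy input are hypothetical, not part of Moz's tooling.

```python
from statistics import mean


def average_ranks(values):
    """Return 1-based ranks; tied values share the average of their positions."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    ranks = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        # Extend j over the run of values tied with values[order[i]].
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        shared_rank = (i + j) / 2 + 1
        for k in range(i, j + 1):
            ranks[order[k]] = shared_rank
        i = j + 1
    return ranks


def pearson(xs, ys):
    """Pearson correlation, or None when either series has no variance."""
    mx, my = mean(xs), mean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var_x = sum((x - mx) ** 2 for x in xs)
    var_y = sum((y - my) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        return None
    return cov / (var_x * var_y) ** 0.5


def spearman(xs, ys):
    """Spearman correlation is the Pearson correlation of the ranks."""
    return pearson(average_ranks(xs), average_ranks(ys))


def mean_serp_correlation(serps):
    """Average the per-SERP correlation between a metric and ranking position.

    serps: list of result pages; each page is a list of metric values
    ordered by position (index 0 is the top result).
    """
    correlations = []
    for metric_values in serps:
        # Negate positions so a positive correlation means
        # "higher metric value goes with a better ranking".
        negated_positions = [-(p + 1) for p in range(len(metric_values))]
        rho = spearman(negated_positions, metric_values)
        if rho is not None:  # skip pages where the metric never varies
            correlations.append(rho)
    return mean(correlations) if correlations else None
```

In this sketch, a page whose metric values fall steadily from the top result to the last scores +1, and a page where they rise steadily scores -1. A mean near zero across many keywords would mean the metric tells you little about position, which, as noted above, still says nothing about causation either way.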
Enough of the boring methodology, I want the data!
Here’s the first set, Mozscape link correlations:
Correlations: Page level
Correlations: Domain level
Page Authority is a machine learning model inside our Mozscape index that predicts ranking ability from links, and it is the highest correlated factor in our study. As in 2011, metrics that capture the diversity of link sources (C-blocks, IPs, domains) also have high correlations. At the domain/sub-domain level, sub-domain correlations are larger than domain correlations.
In the survey, SEOs also thought links were very important:
Over the past two years, we’ve seen Google crack down on over-optimized anchor text. Despite this, anchor text correlations for both partial and exact match were also quite large in our data set:
Interestingly, the surveyed SEOs thought that an organic anchor text distribution (a good mix of branded and non-branded) is more important than the number of links:
The anchor text correlations are one of the most significant differences between our results and the Searchmetrics study. We aren’t sure exactly why this is the case, but suspect it is because we included navigational queries while Searchmetrics removed them from its data. Many navigational queries are branded, and will organically have a lot of anchor text matching branded search terms, so this may account for the difference.
Are keywords still important on page?
We measured the relationship between the keyword and the document both with the TF-IDF score and the language model score and found that the title tag, the body of the HTML, the meta description and the H1 tags all had relatively high correlation:
See my blog post on relevance vs. ranking for a deep dive into these numbers (but note that this earlier post uses an older version of the data, so the correlation numbers are slightly different).
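For readers unfamiliar with TF-IDF, the sketch below shows one common way to score how well a keyword matches a single page field such as the title tag. TF-IDF has many variants, and the article does not say which weighting or tokenization the study used, so the smoothed IDF formula, the whitespace tokenizer, and the function names here are all assumptions made for illustration. The language model score is not sketched.

```python
import math
from collections import Counter


def tokenize(text):
    """Hypothetical tokenizer: lowercase and split on whitespace."""
    return text.lower().split()


def tf_idf_score(query, field_text, corpus):
    """Sum the TF-IDF weights of the query terms within one field.

    query:      the search keyword, e.g. "running shoes"
    field_text: the text of one field (title, body, H1, ...) of one page
    corpus:     the same field taken from many pages, used for the IDF part
    """
    field_tokens = tokenize(field_text)
    if not field_tokens:
        return 0.0
    counts = Counter(field_tokens)
    doc_token_sets = [set(tokenize(doc)) for doc in corpus]
    n_docs = len(corpus)

    score = 0.0
    for term in set(tokenize(query)):
        # Term frequency: share of the field's tokens equal to this term.
        tf = counts[term] / len(field_tokens)
        # Document frequency: how many corpus documents contain the term.
        df = sum(1 for tokens in doc_token_sets if term in tokens)
        # Smoothed inverse document frequency (one of several common forms).
        idf = math.log((1 + n_docs) / (1 + df)) + 1
        score += tf * idf
    return score
```

A title that contains the query terms scores above zero, and a title that shares no terms with the query scores exactly zero. Terms that appear in fewer corpus documents get a larger IDF, so rarer query words count for more.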
SEOs also agreed that the keyword in the title and on the page were important factors:
Survey: On page
We also computed some additional on-page correlations to check whether structured markup (schema.org or Google+ author/publisher) had any relationship to rankings. All of these correlations are close to zero, so we conclude that they are not used as ranking signals (yet!).
Exact/partial match domain
The ranking ability of exact and partial match domains (EMD/PMD) has been heavily debated by SEOs recently, and it appears Google is still adjusting their ranking ability (e.g. this recent post by Dr. Pete). In our data collected in early June (before the June 25 update), we found exact match domain correlations to be relatively high at 0.17 (0.20 if the exact match domain is also a dot-com), just about on par with the value from our 2011 study:
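As a rough illustration of the exact/partial distinction, the Python sketch below classifies a hostname against a keyword. The matching rules are assumptions chosen for simplicity: "exact" means the domain name equals the keyword with spaces and hyphens removed, and "partial" means every keyword word appears somewhere in the name. The study's actual definitions may differ, and the function names are hypothetical.

```python
import re


def classify_domain_match(keyword, hostname):
    """Return 'exact', 'partial', or 'none' for a keyword/hostname pair."""
    labels = hostname.lower().rstrip(".").split(".")
    if labels and labels[0] == "www":
        labels = labels[1:]
    # Simplification (assumption): treat the first remaining label as the
    # domain name. Real sub-domains and multi-part TLDs need more care.
    name = re.sub(r"[^a-z0-9]", "", labels[0]) if labels else ""
    words = re.findall(r"[a-z0-9]+", keyword.lower())
    if not name or not words:
        return "none"
    if name == "".join(words):
        return "exact"
    if all(word in name for word in words):
        return "partial"
    return "none"


def is_dot_com(hostname):
    """True when the hostname ends in the .com top-level domain."""
    return hostname.lower().rstrip(".").endswith(".com")
```

Under these rules, `classify_domain_match("cheap flights", "www.cheapflights.com")` returns `"exact"`, `"bestcheapflights.net"` returns `"partial"`, and `"example.com"` returns `"none"`. A hyphenated name such as `cheap-flights.com` also counts as exact here, which is a design choice of this sketch rather than something the article states.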
As in 2011, social signals were some of our highest correlated factors, with Google+ edging out Facebook and Twitter:
SEOs, on the other hand, do not think that social signals are very important in the overall algorithm:
This is one of those places where the correlation may be explainable by other factors such as links, and there may not be direct causation.
Back in 2011, after we released our initial social results, I showed how Facebook correlations could be explained mostly by links. We expect Google to crawl their own Google+ content, and links on Google+ are followed so they pass link juice. Google also crawls and indexes the public pages on Facebook and Twitter.
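One standard way to check whether a correlation is "explained by" a third factor is a first-order partial correlation, sketched below in Python. This is offered only as an illustration of the idea; it is not necessarily the method used in the 2011 analysis, and the numbers in the usage note are hypothetical, not results from the study.

```python
import math


def partial_correlation(r_xy, r_xz, r_yz):
    """Correlation between x and y after controlling for z.

    r_xy: correlation between x (e.g. social shares) and y (e.g. ranking)
    r_xz: correlation between x and the control z (e.g. link counts)
    r_yz: correlation between y and the control z
    """
    denominator = math.sqrt((1 - r_xz ** 2) * (1 - r_yz ** 2))
    if denominator == 0:
        raise ValueError("partial correlation is undefined when |r_xz| or |r_yz| is 1")
    return (r_xy - r_xz * r_yz) / denominator
```

With made-up inputs `r_xy=0.3`, `r_xz=0.6`, and `r_yz=0.5`, the result is 0: in that hypothetical case the whole shares-to-ranking correlation would be accounted for by links. A partial correlation that stays well above zero would suggest the social metric carries information beyond what links already capture, though it still would not prove causation.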
The future of search: An analysis of a site’s value to users
Looking into the future, SEOs see a shift away from traditional ranking factors (anchor text, exact match domains, etc.) to deeper analysis of a site’s perceived value to users, authorship, structured data, and social signals:
Finally, my MozCon slides contain some more details and data: