Happy New Year! I have been taking a nice long holiday break and am now ready to get back into blogging. I haven’t been idle over the break. I’ve mostly been writing for other projects, such as my InformIT page and a video lecture series I am getting close to releasing.
In the course of the research for these two projects, I made a startling discovery: The Google Panda algorithm is a radical attempt to equate content quality with SEO, as much as an automated system can do so. I knew that Google said this about content quality when it released Panda, and I even wrote about it on this blog. But I didn’t understand the inner workings of how Google makes this happen. Plus, I didn’t really believe that you could develop an algorithm that truly favored content quality until I started researching the way Panda is built.
Savvy readers will notice I used the present tense in the previous paragraph. That was intentional. Google has developed Panda to be continually improved. Panda has reached version 2.6 since its initial release in March of 2011. Google issued 30 new improvements in December alone. The approach is called machine learning, a method borrowed from artificial intelligence that describes how to train a system for continual improvement. It’s similar to the method that IBM used to hone Watson’s Jeopardy!-playing skills.
Automation is necessary because of the sheer volume of content on the web. But the real key lies in the inputs Google gave to the algorithm, and the way it analyzed those inputs, before honing it through machine learning. The inputs were derived from perhaps hundreds of quality testers, who ranked thousands of pages for content quality. Google took this data and crunched the numbers to derive some rules. Then the machine-learning program honed the rules, and it continues to hone them over time, getting ever more accurate. The end result is an algorithm that places a premium on content quality over the simplistic checklists and other tools common to traditional SEO.
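To make that loop a little more concrete, here is a minimal sketch in Python of the general idea: human ratings go in, a first set of rules (weights) comes out, and further ratings keep refining them. This is not Google's code or Panda's actual method. The three page features, the sample ratings, and the function names are all hypothetical, and I have swapped in a plain logistic regression as a stand-in for whatever Google really uses.

```python
import math

# Hypothetical rater data: (features, label), where 1 = high quality, 0 = low.
# The three features are invented for illustration only:
# [original_text_ratio, ad_density, duplicate_ratio]
RATED_PAGES = [
    ([0.9, 0.1, 0.0], 1),
    ([0.8, 0.2, 0.1], 1),
    ([0.7, 0.1, 0.2], 1),
    ([0.2, 0.8, 0.9], 0),
    ([0.3, 0.7, 0.6], 0),
    ([0.1, 0.9, 0.8], 0),
]


def predict(weights, bias, features):
    """Return a score between 0 and 1; higher means 'more likely high quality'."""
    z = bias + sum(w * x for w, x in zip(weights, features))
    return 1.0 / (1.0 + math.exp(-z))


def train(pages, weights=None, bias=0.0, epochs=200, lr=0.5):
    """Fit or refine the weights with simple stochastic gradient steps."""
    if weights is None:
        weights = [0.0] * len(pages[0][0])
    else:
        weights = list(weights)  # copy so the caller's list is not mutated
    for _ in range(epochs):
        for features, label in pages:
            error = label - predict(weights, bias, features)
            weights = [w + lr * error * x for w, x in zip(weights, features)]
            bias += lr * error
    return weights, bias


# Step 1: derive initial "rules" (weights) from the human ratings.
weights, bias = train(RATED_PAGES)

# Step 2: as new (hypothetical) ratings arrive, keep honing the same weights
# instead of starting over.
NEW_RATINGS = [([0.6, 0.3, 0.2], 1), ([0.4, 0.6, 0.7], 0)]
weights, bias = train(RATED_PAGES + NEW_RATINGS, weights, bias, epochs=50)

# Score a page the raters never saw.
score = predict(weights, bias, [0.85, 0.15, 0.05])
```

The point of the sketch is the shape of the process rather than the math: the human judgments define what "quality" means, and the automated part only generalizes from them and keeps adjusting as more judgments come in.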
SEOmoz’s Rand Fishkin does the best job of explaining what this means for SEOs in the future.
Note: Content quality is as much a function of the whole collection of pages and other assets as it is of a particular page. It won’t do to focus all your attention on the top-level portal and neglect the lower-level pages. The quality of your whole corporate site stands or falls as a collection. Though some individual pages can do better than others, poor-quality pages, duplicates, and other features common to poor content strategy pull the whole collection down. The antidote to poor content quality is good content strategy.