The 3 I’s of Smarter Content: Instrumented, Interconnected, Intelligent
As you may know, I work for IBM, primarily as a content strategist specializing in search. My first assignment in this role was the Smarter Planet website. In its purest form, the site is a celebration of all the great work people are doing to prepare our planet for the emerging challenges (some might say crises) it faces: population growth, limited food and water supplies, global warming, ever more virulent diseases, financial instability and so on. All these challenges require infrastructure modernization around the world: transportation, energy, health care, government, finance, education, agriculture, water management and more.
Information is the core of this modernization. When we systematically monitor our infrastructure, connect these systems together, and gather and analyze the data they produce, we can make more intelligent decisions, and ultimately overcome the world’s challenges. This is the optimism that Smarter Planet inspires in governments, NGOs and corporations around the world. The systems can be as small as an embedded sensor or as large as a mainframe. This network of instrumented, connected systems is often called the Internet of Things.
As I helped develop the content around this concept for IBM, it often occurred to me that the very process we use to make better digital experiences for our audiences needs to become more instrumented, interconnected and intelligent. We need to use the data we gather about our audience to better address their needs. We need to learn more about how these sites and their assets relate to one another so that our audience can make better sense of the whole collection of content–not just the individual pages in isolation. We need to develop systems that use this information to help us craft more relevant content with our limited resources.
Are we there yet? No. We have a lot of work to do to make our content smarter. Even though Smarter Planet content is smarter story by story and sprint by sprint, the rest of IBM content needs work. That is a long process. But we at least have a strategy or a set of directions for getting there. Let me share them with you.
There’s a whole chapter in our book on using web analytics to make better content decisions. One of my first blog posts on this site was on the topic. The basic metrics are ranking, referrals, bounces and engagements/conversions. As mundane as it might seem, you would be surprised at how few people actually track these in an end-to-end way. I have to train analysts who are inclined to use just visits/visitors and time on page as their primary data points. Visitors is a decent stat, especially if you can segment it by new or repeat visitors. But visits and especially time on page don’t tell me much about what I need to improve on the site. Does a longer-than-usual average time on page mean users liked the content or that they got lost?
To develop a decent set of action items, you need a sense of the audience you expect to attract, to what extent you are attracting them, and what they do when they come to your pages. That’s what we attempt to help you with in our book. It starts with keyword intelligence: What words and phrases does your target audience enter into search queries (or use as hashtags)? Based on that data, you should know how many referrals you can expect if you build an optimal page for a keyword (or hashtag). If they don’t click, you need a better search snippet (or tweet). If they are referred but bounce at a high rate, you are not doing a good enough job of demonstrating the relevance of the page to the referring source. If they don’t bounce, you can get really particular about what clicks they took and how to improve their engagement.
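The diagnostic chain in the previous paragraph can be sketched in a few lines of Python. This is only an illustration, not the tooling we use: the `PageMetrics` fields, the `diagnose` function and both thresholds are hypothetical, and the right cutoffs will depend on your own site and keywords.

```python
from dataclasses import dataclass


@dataclass
class PageMetrics:
    """Hypothetical per-page counts for one keyword over some period."""
    impressions: int  # times the page was shown in search results
    referrals: int    # clicks from those results through to the page
    bounces: int      # referred visits that left without engaging


def diagnose(m: PageMetrics, min_ctr: float = 0.02, max_bounce: float = 0.6) -> str:
    """Return the first problem found, walking the funnel from search to page.

    min_ctr and max_bounce are illustrative thresholds, not recommendations.
    """
    if m.impressions == 0:
        return "no impressions: revisit keyword targeting and ranking"
    ctr = m.referrals / m.impressions
    if m.referrals == 0 or ctr < min_ctr:
        return "low click-through: improve the search snippet"
    bounce_rate = m.bounces / m.referrals
    if bounce_rate > max_bounce:
        return "high bounce: show relevance to the referring source sooner"
    return "engaged: study click paths to improve engagement"
```

Calling `diagnose(PageMetrics(impressions=1000, referrals=100, bounces=80))` would flag the bounce rate, because the page earns clicks but loses most of them on arrival.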
I breezed through that because I didn’t want to seem repetitive relative to the links two graphs above. If you’re confused, click those links, especially the blog post. If this is old hat, it gets better. The model outlined in those links was developed almost 18 months ago. A lot has changed since then. The main thing is, we are creating one tooling platform where all these metrics can be viewed and interpreted together. I’ll highlight one kind of insight we hope to glean with this tooling in particular.
When you look at referral data, ever notice that the null referrer (direct load) is often proportional to search referrals? Here’s an example from the current site I work on:
The first thing you’ll notice is a huge bump in referrals starting in April. Why? Well, we redesigned and relaunched the site on April 14th, optimizing the page for some very valuable keywords. That work resulted in a lot of search referrals. Less obvious is that, as the search referrals (gray) increased, so did the direct load (yellow) referrals. Why is that? Our hypothesis involves this use case: People come to our site from search and realize there’s more on the page than they can consume in one visit. So they bookmark it for future reference. Most of the null referrers are people selecting the site from bookmarks.
How can we prove our hypothesis? This is where new and repeat visitors come in. Suppose search referrals are typically counted as new visitors and bookmarked referrers are typically counted as repeat visitors. When you chart new visitors against repeat visitors, the graph should match the graph of search referrals against direct loads within a certain margin of error. If it does, we have good evidence for our hypothesis. What would that mean? It would mean that our organic search efforts are even more effective than raw referral data suggest. Repeat visitors are even more valuable than new visitors because they are more likely to engage and ultimately convert.
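One rough way to check whether two graphs "match up" is to correlate the underlying series. The sketch below is an assumption about how such a check could be done, not a description of our tooling: `pearson`, `hypothesis_supported` and the 0.9 threshold are all hypothetical, and each input is assumed to be a list of counts for the same sequence of periods (say, weeks). A high correlation is consistent with the bookmark hypothesis but does not rule out other explanations.

```python
from math import sqrt


def pearson(xs, ys):
    """Pearson correlation coefficient of two equal-length numeric series."""
    if len(xs) != len(ys) or len(xs) < 2:
        raise ValueError("need two series of equal length, at least 2 points")
    n = len(xs)
    mean_x = sum(xs) / n
    mean_y = sum(ys) / n
    cov = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    var_x = sum((x - mean_x) ** 2 for x in xs)
    var_y = sum((y - mean_y) ** 2 for y in ys)
    if var_x == 0 or var_y == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / sqrt(var_x * var_y)


def hypothesis_supported(search, new, direct, repeat, threshold=0.9):
    """True if search referrals track new visitors AND direct loads track
    repeat visitors. The threshold is an illustrative cutoff only."""
    return (pearson(search, new) >= threshold
            and pearson(direct, repeat) >= threshold)
```

You would feed it four series exported from your analytics package for the same date range. If either pair fails to track, the bookmark story needs another look.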
I only bring up this example because it shows the power of combining data from a variety of sources to draw insights about what’s working and what isn’t on your site. In any event, the whole spectrum of web analytics data is essential to building a system of instrumented content. This includes data from outside your site (keyword and social data), referral data, click data, A/B testing data, and other more qualitative data sources such as user surveys and user experience testing. The more data you have, the better your content decisions, and the better your results.
Tim Berners-Lee called it the World Wide Web because the map of the information resembled a web. In an interconnected, hypertext world, the connections between pieces of content become as important as the individual pieces themselves. This is a principle we use in the book:
A piece of content is only valuable to the extent that it is connected to other relevant pieces of content.
Yet most content efforts I audit pay little or no attention to these connections. Almost all of the focus rests on the individual page or asset they are developing. Even on my team, I have to continually interject the bigger picture in our meetings because team members are inclined to focus just on the user experience of the page at hand. Perhaps it is a remnant of print publishing in which each publication is a self-contained unit. On the web, pages are utterly irrelevant if they are self-contained units. They are consumed only in reference to other relevant content.
Time and again, I see duplication, overlaps and gaps in content collections because teams only focus on the nodes of the web and not its fibers. This is why I was inclined to start with Interconnected in my tour of the three I’s: It is a starting point for any effective content effort. In order to create good new content, you have to understand the landscape of existing content in your topic area. We spent three chapters on this in our book because it is so important.
How do you focus on the fibers of the web as well as the nodes? In a word, research. It starts with a content audit on your site. But it quickly evolves into an audit of the whole web (social and static) around the topic. What (or who) are the most influential nodes in the web for your topic? How are they interconnected? What does the cloud of search terms around your topic look like? Who is ranking for these words and why? Again, how are they interconnected?
We ask a whole host of such questions in our book. And we give the reader a roadmap to answering the questions. But it is a long and laborious job. Deadlines don’t often allow for the kind of extensive research we recommend in our book. How do we build intelligent systems to enable this research at the pace of content development efforts?
The tooling we are helping to build will automate this research so that development efforts can better focus on filling gaps and connecting them to relevant nodes in the common content collection known as the web. It will also help us do a better job of developing the right content given our audience and business priorities. We can’t fill every gap, nor should we. We also acknowledge that we have a unique perspective and our job is to influence our audience to come to a shared agenda, not merely to fill an information gap. This is how we stake a claim to our place in the landscape.
If we understand our place in the landscape, we can understand how to create new relevant connections (links, shares) and user acceptance patterns (likes, +1s) around our content efforts. This is how you boost the value of those efforts.
Intelligent content is just the combination of these two sources of information–web analytics and content research–as primary inputs into one content management system. I say “just,” but you know it is a bigger challenge than a simple flow chart suggests. We are talking about a massive and complex data set. The content strategist’s view is similar to the astronomer’s–constellations filled with every kind of thing in the universe. And like the universe, the web is ever expanding. There is really no point in encapsulating it all. What we need is a way to organize it. This is the point of the Semantic Web.
This data set is not only daunting because of its sheer size. It is also quite complex. Human languages are living, breathing ecosystems with multiple meanings for every word and phrase, multiple words for objects and concepts, sarcasm, irony, you name it. It is highly complex in a single language. At IBM, we publish in more than 40. How can we expect to model it in a way that helps us automate our content management decisions in near real time? Linguists and cognitive scientists have been trying to solve this problem for decades.
I believe Watson offers an opportunity to help us solve this problem. Watson is a system that not only parses the semantics of natural language but also helps us understand the context of the language through the use of pragmatics. Semantics can only help us sort out ambiguities in the meaning of words. It can’t help us understand all the nuances in how we use words in context. This is where pragmatics is necessary.
In the simplest terms, semantics is the science of meaning and pragmatics is the science of context. In the future, content strategists will depend on systems that understand both meaning and context.
Because Watson was designed to solve the unique challenges of Jeopardy!, it can understand both the semantics and the pragmatics of natural language. The game requires the player to have the information needed to answer questions plus the ability to understand cryptic clues. These two cognitive abilities are needed for a system that inputs listening, keyword research and analytics data and outputs content recommendations. Like Jeopardy! clues, social and keyword data is fragmentary, vague and often ambiguous. It requires Watson’s contextual intelligence to understand the questions audiences need answers to. Answering the questions in a way that makes sense to the audience requires Watson’s natural language semantic intelligence.
A Watson-like system by itself is not enough to create an intelligent content system. It would help us understand the language we should use to better communicate with our audiences. But you also need intelligent content governance to eliminate duplicates and fill gaps. Google says there are nearly 8,000 keywords IBM should care about. Ideally, we would have one asset per keyword and user task. If you count both pre-sales and post-sales content tasks, there might be 10 user tasks per keyword. So an ideal state might be 80,000 assets. Suffice it to say we have a lot more assets than this already. And we are nowhere near filling all the gaps. How do we move towards the ideal state while we’re creating new content all the time?
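The arithmetic here is simply 8,000 keywords times 10 tasks, or 80,000 keyword/task slots. A minimal sketch of how duplicates and gaps could be counted against those slots follows. It assumes each existing asset has already been tagged with one keyword and one user task, which is the hard part in practice; the `audit` function and the sample data in the usage note are hypothetical.

```python
from collections import Counter
from itertools import product


def audit(keywords, tasks, assets):
    """Compare existing assets against the one-asset-per-slot ideal.

    assets is an iterable of (keyword, task) pairs, one per existing asset.
    Returns (ideal_count, gaps, duplicates), where gaps lists the empty
    slots and duplicates maps each over-filled slot to its asset count.
    """
    counts = Counter(assets)
    slots = list(product(keywords, tasks))  # every keyword/task combination
    gaps = [slot for slot in slots if counts[slot] == 0]
    duplicates = {slot: counts[slot] for slot in slots if counts[slot] > 1}
    return len(slots), gaps, duplicates
```

For example, with keywords `["cloud", "analytics"]`, tasks `["learn", "buy"]` and three assets of which two are tagged `("cloud", "learn")`, the ideal count is 4, and the audit reports one duplicated slot alongside whichever slots have no asset at all.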
We are working on the business rules that minimize conflicts between content owners. But we will need some kind of board or council to break all ties. We will also need people to implement all the recommendations. Our tooling can help us automate some of that process as well. But automation doesn’t replace humans; it helps them do their jobs better, faster and more efficiently.
The whole point of smarter content is to free content strategists up to think about ways to create better content, rather than bogging them down in manual minutiae. I’m an IBMer and that’s what I’m working on.
James Mathewson is search strategy lead for IBM.