The quality of user-generated content varies widely. As I discussed in an earlier post, it's possible to separate the wheat from the chaff using combinations of explicit and implicit metadata. But once you've identified the good stuff, you start to find more and more of it. User-generated content on successful sites accumulates in real time--lots of it. How do you present it in meaningful ways? How do you keep the presentation of "best" content fresh? How do you make it findable, rememberable, parsable? You need to set it up to self-organize, and creating a folksonomy is a great way to start.
In a traditional folksonomy (there are several uncommon kinds I won't get into now), users add "tags" or labels to individual content objects. These tags become the basis for a living, breathing categorization scheme that informs search and navigation. On sites like Flickr, folksonomy is used in powerful ways to organize photos into a multi-level hierarchy, which can be filtered by "interestingness" (Flickr's quality concept), by location, by camera, and more to produce dazzlingly multifaceted content organization. Take a look at this page, a Flickr "tag cluster" filtered by interestingness:
One strength of tagging systems is that they can organize content across an unlimited number of pivots (though the value of that capability, in terms of informing navigation, decreases as the number of pivots increases). For example, an apple can be tagged with both "fruit" and "red," making it findable within category schemes based on either food type or color.
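To make the pivot idea concrete, here's a minimal sketch of a tag index in Python. The names (`build_tag_index`, `catalog`) and the sample data are my own inventions for illustration, not anything a particular site uses. It simply shows how one object becomes findable from more than one tag.

```python
from collections import defaultdict


def build_tag_index(tagged_objects):
    """Map each tag to the set of object names that carry it."""
    index = defaultdict(set)
    for name, tags in tagged_objects.items():
        for tag in tags:
            index[tag].add(name)
    return index


# Hypothetical sample data, invented for illustration.
catalog = {
    "apple": {"fruit", "red"},
    "banana": {"fruit", "yellow"},
    "fire truck": {"vehicle", "red"},
}

tag_index = build_tag_index(catalog)
print(sorted(tag_index["fruit"]))  # pivot on food type
print(sorted(tag_index["red"]))    # pivot on color
```

The apple shows up under both pivots, which is the whole point: the same object participates in as many category schemes as it has tags.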
This is really wonderful stuff. But folksonomy as rendered by tags has its limitations, especially in contexts where there are fewer content objects or less incentive for users to take action to tag them. In such situations, creating dynamic relationships between objects based on combinations of explicit and implicit metadata adds new layers of meaning, helping users discover content of interest.
There are lots of ways to accomplish this. I'll describe a few in this post, but what it all boils down to is increasing discoverability by grouping objects and presenting them in association with each other. If you're interested in one battery charger, for example, there's a decent chance you'll be interested in another. But from there it gets a little more complicated.
Basic Similarity
When I say basic similarity, I actually have in mind a specific kind of rule governing the association of content objects, namely, that they share an attribute. For example, when I view a video on a social media site, the system might suggest other videos I might want to see based on a common tag, a shared word in the title, or a common creator.
But in systems with user-generated content, there are often a huge number of objects. Most often, there needs to be a threshold of similarity applied in order to narrow the number of similar items, such as a certain number of common tags applied, shared tags within taxonomic groupings, or association within purchase patterns.
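Here's a rough sketch of what a similarity threshold might look like, assuming each object carries a set of tags. The function name `similar_by_tags`, the `min_shared` parameter, and the sample videos are all hypothetical. A real system would likely weigh other attributes too.

```python
def similar_by_tags(target, tagged_objects, min_shared=2):
    """Return (name, shared_count) pairs for objects that share at least
    min_shared tags with the target, most shared tags first."""
    target_tags = tagged_objects[target]
    matches = []
    for name, tags in tagged_objects.items():
        if name == target:
            continue
        shared = len(target_tags & tags)
        if shared >= min_shared:
            matches.append((name, shared))
    # Sort by shared count (descending), then by name for stable output.
    return sorted(matches, key=lambda pair: (-pair[1], pair[0]))


# Hypothetical sample data, invented for illustration.
videos = {
    "cat_piano": {"cat", "music", "funny"},
    "cat_drums": {"cat", "music", "funny", "drums"},
    "dog_skate": {"dog", "funny", "skateboard"},
    "cat_nap": {"cat", "sleep"},
}

similar_videos = similar_by_tags("cat_piano", videos, min_shared=2)
print(similar_videos)
```

With a threshold of two shared tags, only `cat_drums` makes the cut. Drop the threshold to one and the list grows to include everything that shares a single tag, which is exactly the noise the threshold is there to remove.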
Complex Similarity
Basic similarity is rarely enough. Imagine shopping, for example, for a camera lens. Looking at a detail page for a particular lens, you see a list of "related items." If this list were to include every other lens on Amazon.com, you'd have a gigantic list that wouldn't be helpful. Likewise with a list of all Canon products. A list with multiple shared attributes, such as "lens" and "Canon," is potentially more useful. And you can take it even further than that by layering in implicit metadata--information provided by people.
In this screenshot from Amazon, there are implicit and explicit metadata layers added to the basic similarity construct. In this case, the user-driven similarity is among search queries. The set of similar queries describes a set of user sessions in which purchases were completed from among the objects returned by the searches. The objects purchased are therefore similar.
You might wonder: Couldn't you get to this set of relationships using simple metadata from within a controlled vocabulary? The answer is no, because the similarity is ultimately constructed of value judgements by humans. People interested in the same stuff as me decided to buy these items. That's a layer of social information you can only get with robust behavioral metadata. Here's an abstract picture of how this looks:
Each of the ovals represents a content object, and each line represents a set of shared attributes. The attributes shared are the same in every case.
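Here's one way the layering could be sketched, with the caveat that this is my own simplification and not a description of how Amazon actually does it. I'm assuming a hypothetical session log of search queries and completed purchases, and I'm treating "similar queries" as identical query strings to keep things short. The explicit layer is the shared `category` and `brand`. The implicit layer is the purchase behavior.

```python
from collections import Counter


def related_items(target, items, sessions, required_attrs=("category", "brand")):
    """Rank items that share the required explicit attributes with the target
    AND were purchased in sessions using a query that also led to a purchase
    of the target."""
    target_attrs = items[target]
    # Implicit layer: queries from sessions that ended in buying the target.
    target_queries = {s["query"] for s in sessions if target in s["purchased"]}
    counts = Counter()
    for session in sessions:
        if session["query"] not in target_queries:
            continue
        for item in session["purchased"]:
            if item == target:
                continue
            # Explicit layer: every required attribute must match.
            if all(items[item].get(a) == target_attrs.get(a) for a in required_attrs):
                counts[item] += 1
    return [name for name, _ in sorted(counts.items(), key=lambda kv: (-kv[1], kv[0]))]


# Hypothetical catalog and session log, invented for illustration.
items = {
    "canon_50mm": {"category": "lens", "brand": "Canon"},
    "canon_85mm": {"category": "lens", "brand": "Canon"},
    "canon_24mm": {"category": "lens", "brand": "Canon"},
    "nikon_50mm": {"category": "lens", "brand": "Nikon"},
    "canon_bag": {"category": "bag", "brand": "Canon"},
}
sessions = [
    {"query": "canon portrait lens", "purchased": ["canon_50mm"]},
    {"query": "canon portrait lens", "purchased": ["canon_85mm"]},
    {"query": "canon portrait lens", "purchased": ["canon_85mm", "canon_bag"]},
    {"query": "50mm lens", "purchased": ["nikon_50mm"]},
    {"query": "wide angle", "purchased": ["canon_24mm"]},
]

related = related_items("canon_50mm", items, sessions)
print(related)
```

Notice that `canon_24mm` matches on both explicit attributes but never appears, because no shopper behavior connects it to the target. That's the social layer doing its job.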
Complementarity
Complementarity is not the same thing as similarity, and it's very useful to think specifically about the difference. Objects that are complementary are not "like each other" in the sense that similar objects are. Instead, they sort of... "go together."
But what does it mean for objects to go together? How can we understand the relationships between peanut butter and jelly, peanut butter and honey, peanut butter and bananas? Each peanut butter complement has a relationship with peanut butter, but they don't share the same relationship with each other (I'm sure someone out there eats honey and jelly sandwiches, but that's not complementarity, it's surrealism).
So the relationships among complementary items are differently structured than the relationships among similar items. Whereas items similar to a given object are also generally similar to each other, that's not generally the case among complementary items. You can picture webs of similarity, but complementarity, from a structural perspective, looks more like spokes on a wheel.
Here's an example from Amazon to illustrate the point:
This illustration is from the detail page of a camera lens. The lens is the primary content object on this page. Each of the items in this list is a secondary content object--each "goes with" the lens. The secondary objects are related to the primary object in a singular relationship, and they are meaningfully related to each other mainly by virtue of their parallel relationships with the primary object.
The lens is the hub of the wheel, and each of the items pictured above is connected to the hub via a spoke.
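The hub-and-spoke shape is easy to see in a small data structure. This is just a sketch using the peanut butter example from earlier. The `complements` mapping and the helper names are hypothetical, and the data is invented.

```python
# Each key is a primary object; each value is the set of things that "go with" it.
# Hypothetical data, invented for illustration.
complements = {
    "peanut butter": {"jelly", "honey", "bananas"},
    "jelly": {"peanut butter"},
    "honey": {"peanut butter"},
    "bananas": {"peanut butter"},
}


def goes_with(a, b, graph):
    """True if b is recorded as a complement of a."""
    return b in graph.get(a, set())


def linked_spokes(hub, graph):
    """Return pairs of the hub's complements that also complement each other."""
    spokes = sorted(graph.get(hub, set()))
    return [
        (a, b)
        for i, a in enumerate(spokes)
        for b in spokes[i + 1:]
        if goes_with(a, b, graph)
    ]


print(goes_with("peanut butter", "honey", complements))
print(linked_spokes("peanut butter", complements))
```

Every spoke connects to the hub, but `linked_spokes` comes back empty: jelly, honey, and bananas have no recorded relationship with one another. In a web of similar items, you'd expect most of those pairs to be linked.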
But if complementarity is a wheel-shape, how do we understand the nature of the spokes in a way that allows us to build complementary relationships into a web site architecture?
Complementarity is about supporting a core function of, completing, or adding value to an object. So to architect complementarity we need to understand a type of "aboutness," the thing that the object is good for.
Behavioral metadata doesn't tell the whole story of complementarity. Here's a case where hybrids of taxonomy and folksonomy come in handy.
Preference Among Similar
Adding yet another layer of social data to groups of similar objects, you can create another kind of value. In this example from Amazon, similar items are ranked by strength of correlation between object views and purchases. Here an implicit judgement, expressed through sales conversion, shows which of the similar objects is most preferred by other users.
This is very useful for comparison shopping, especially among groups of complex, similar, or specialized objects (like digital cameras). Here's what this type of relatedness looks like in an abstract architectural view:
The objects are identified as similar by virtue of their shared attributes. The percentage indicated on each object indicates its percentage of total purchases within the group as a whole.
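The percentage calculation itself is simple arithmetic. Here's a minimal sketch, assuming you already have a group of similar objects and a purchase count for each. The function name and the numbers are made up.

```python
def purchase_share(purchases):
    """Rank objects in a similar group by their share of the group's purchases.

    Returns (name, percent) pairs, highest share first. Percentages are
    rounded, so they may not sum to exactly 100.
    """
    total = sum(purchases.values())
    if total == 0:
        return []
    ranked = sorted(purchases.items(), key=lambda kv: (-kv[1], kv[0]))
    return [(name, round(100 * count / total)) for name, count in ranked]


# Hypothetical purchase counts within one group of similar cameras.
camera_purchases = {"camera_a": 60, "camera_b": 25, "camera_c": 15}

shares = purchase_share(camera_purchases)
print(shares)  # [('camera_a', 60), ('camera_b', 25), ('camera_c', 15)]
```

The hard part in practice isn't this math, it's deciding which objects belong in the group and which purchases count toward it.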
Preference needn't always be based on implicit metadata like sales conversion, though. Here's an example of preference among similar from YouTube that uses ratings. After a video plays all the way through, the YouTube video player offers up some suggestions about what to watch next. The suggestions are similar to the video that just played, in this case on the basis of their shared authorship and title words. Preference is expressed via ratings, so the suggestions are the top-rated similar videos.
And this makes perfect sense: If you watch a video all the way through, there's a decent chance you liked it and would be interested in discovering similar videos of high quality.
Affinity Recommendations
Simply put, affinity recommendations are recommendations based on people with whom you have preferences in common. The logic goes: We both loved Friday the 13th Parts 1 through 6; you've seen Halloween IV and liked it; therefore there's a decent chance I'll like Halloween IV.
Netflix has made a huge investment in its recommendation engine, and affinity recommendations are a huge part of how it works. Netflix has recognized that choosing a movie to rent is very often a socially-driven activity. Faced with thousands of choices, we turn to friends for advice. But the best recommendations aren't made just by friends--they're made by people with whom we share a common taste in movies. Netflix makes the degree of commonality explicit. Here's how it looks:
Netflix has built in a number of social features around explicit relationships with the people we know, and they're constantly tinkering. But affinity recommendations aren't always situated within existing relationships. In many cases the shared preferences are enough (including on Netflix, in the absence of "friends").
Here's what this looks like in the abstract:
In this diagram, the big bubbles represent people. The people are color-coded to indicate a profile of preferences. In the Netflix example, these preferences are explicit--ratings of particular movies. (I'm not sure whether Netflix also looks at rating patterns within classes of similar movies--but they certainly could if they needed to build a more extrapolated flavor of affinity. My guess is that they have sufficient volume of ratings that they don't need to extrapolate.) But preferences needn't be explicit in all situations. Preferences can also be gleaned through behavioral metadata and through algorithmic combinations of explicit and implicit metadata.
In this diagram, the two people represented by red bubbles share preferences for objects represented by small bubbles A, B, C, D, and E. Because person 2 also liked objects F and G, the system can present affinity recommendations of objects F and G to person 1.
Obviously, relatedness gets pretty complicated at this level. For example, if person 1 has already expressed a non-preference for objects F and G, they'll be annoyed if you keep recommending them. So you need to build controls for that kind of scenario.
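Here's a bare-bones sketch of the diagram's logic, including a control for expressed non-preferences. To be clear, this is not how Netflix does it. The function, the `min_overlap` threshold, and the data are all assumptions of mine, and real engines are far more sophisticated.

```python
def affinity_recommendations(user, likes, dislikes=None, min_overlap=3):
    """Recommend objects liked by people who share at least min_overlap
    liked objects with the user, skipping anything the user already
    likes or has explicitly rejected."""
    dislikes = dislikes or {}
    own_likes = likes[user]
    rejected = dislikes.get(user, set())
    scores = {}
    for other, other_likes in likes.items():
        if other == user:
            continue
        overlap = len(own_likes & other_likes)
        if overlap < min_overlap:
            continue  # not enough taste in common
        for obj in other_likes - own_likes - rejected:
            # Weight each candidate by how much taste the two people share.
            scores[obj] = scores.get(obj, 0) + overlap
    return sorted(scores, key=lambda obj: (-scores[obj], obj))


# Hypothetical preference data mirroring the diagram.
likes = {
    "person_1": {"A", "B", "C", "D", "E"},
    "person_2": {"A", "B", "C", "D", "E", "F", "G"},
    "person_3": {"A", "H"},
}

recs = affinity_recommendations("person_1", likes)
recs_after_dislike = affinity_recommendations(
    "person_1", likes, dislikes={"person_1": {"G"}}
)
print(recs)                # ['F', 'G']
print(recs_after_dislike)  # ['F']
```

Person 3 shares only one liked object with person 1, so object H never gets recommended. And once person 1 rejects G, it drops off the list for good.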
Nonetheless, the payoff for a strong system of affinity recommendations can be huge, in terms of overall perceived quality, conversion, and social collateral. If you're working with a system that includes a strong base of dedicated users and many content objects, you can add a lot of value.
The Devil in the Details
As with all social systems, even the most carefully built system is likely to function a little differently than you imagine after you let a bunch of unpredictable humans play with it for a while.
Keep close tabs on the health of your related items engine. Plan for and retain budget for ongoing tweaks. Establish KPIs to measure system health, and run A/B tests to optimize performance.
Above all, as always, have fun with your metadata!



As a non-geek with a growing interest in such self-organizing and online-based Wisdom of the Crowds methods your pithy primer here was invaluable- even though i had to read it 4 times it did become understandable. You got me interested enough to want to understand it. Thank you.
Have you found Here Comes Everybody a helpful book btw (I did) and do you have other reading to rec for those of us who are speaker/author/newbie bloggers who work with client with social media-based businesses?
Posted by: KareAnderson | March 05, 2009 at 09:50 AM
Hi! Sure, try Wikinomics (http://www.amazon.com/Wikinomics-Mass-Collaboration-Changes-Everything/dp/1591841933/ref=sr_1_1?ie=UTF8&s=books&qid=1236295063&sr=1-1) and Linked (http://www.amazon.com/Linked-Everything-Connected-Else-Means/dp/0452284392). Happy reading!
Posted by: Ryan | March 05, 2009 at 03:18 PM