Users create tons of terrible content!
We must not say so. We'll say instead, users create content that ranges widely in terms of its degree of interest to other users.
I've been doing a ton of work on quality the past few weeks, in several different contexts. Consider the following challenges:
- Our design for a client's corporate group blog includes a prominent list of "Most Popular" posts. In this context, where the blog is fully integrated with the main site via primary navigation, we expect many non-regular blog readers to wander over to the blog landing page, and we want to increase the likelihood that they'll see something both compelling and fresh.
- Another client's technical support forums include thousands of posts, many of them addressing the same or similar issues, and only one of them providing the single best solution for a particular user's problem at a particular time. The most common handful of solutions address 90% of users' issues.
- A client's social media site uses a set of algorithms based on user behavior to rank user-created content objects, and the best are exposed on the home page of the site. However, their exposure on the home page garners them more attention than average, strengthening their ranking and producing static "best" lists. The rich get richer.
- A client's community site is wide-open, allowing posting on a range of subject areas. The home page of the site shows the most popular posts, but the popular posts have tended to be less than germane to the business rationale behind the site.
Let me take a step back, though, and talk a bit about the general approaches to organizing emergent user-generated content:
Emergent Content
Emergent content is content that shows up automatically, based on a set of rules. No single person decides what content is visible and what content isn't. Instead, content objects appear in various contexts based on attributes encoded in their metadata. Sometimes this means the good stuff shows up, sometimes it means stuff shows up based on similarity to other stuff, sometimes it means stuff shows up because it complements other stuff, and so on.
In the business, we call it "The Magic of the Web."
Algorithmic Approaches
In many contexts, especially on web sites with huge volumes of user-created content, quality is derived algorithmically, based on user behavior. User behavior here includes both explicit behavior, like voting and rating, and implicit behavior. For example, the number of people viewing a video object can reflect its quality--and beyond that, quality measures can include views of the entire video, comments on the video, time on page for the video, qualified page views, and more. Other measures can reflect negatively on the video's quality--incomplete views, time on page below a qualifying threshold, bounce rate, and more.
Bits and pieces of behavioral metadata stick to content objects--less like fingerprints than like the folded corners, weakened binding, marginalia, and coffee stains on an often-loaned book--and provide a basis for us to construct the rules that govern how the objects behave.
Algorithmic approaches to measuring quality take into account a combination of these measures in ways appropriate to the type of object and culture of the site. Which measures are more important (and how much more important) depend on the context of user needs and business requirements--every site is unique. "Quality" is quantifiable only with the right formula.
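To make the idea of a "formula" concrete, here is a minimal sketch of a weighted quality score for a video object. Everything in it is an assumption for illustration: the VideoStats fields, the WEIGHTS values, and the quality_score function are hypothetical, not any particular site's algorithm. The point is only that positive and negative behavioral signals get normalized and combined with weights that you tune per site.

```python
from dataclasses import dataclass


@dataclass
class VideoStats:
    """Behavioral metadata attached to one video (hypothetical fields)."""
    views: int
    complete_views: int
    comments: int
    bounces: int


# Hypothetical weights; a real site would tune these to its own
# user needs and business requirements.
WEIGHTS = {
    "complete_rate": 3.0,       # watched to the end: strong positive signal
    "comments_per_view": 2.0,   # engagement: positive signal
    "bounce_rate": -2.5,        # left immediately: negative signal
}


def quality_score(stats: VideoStats, weights=WEIGHTS) -> float:
    """Combine per-view rates into one weighted score."""
    if stats.views == 0:
        return 0.0  # no behavior yet, so no evidence either way
    signals = {
        "complete_rate": stats.complete_views / stats.views,
        "comments_per_view": stats.comments / stats.views,
        "bounce_rate": stats.bounces / stats.views,
    }
    return sum(weights[name] * value for name, value in signals.items())
```

Dividing by views keeps raw popularity from dominating the score, which matters on sites where exposure itself drives attention.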
The most bad-ass example of algorithmically-derived quality is, no doubt, the Explore category on Flickr. Based on its concept of "Interestingness," Flickr finds photos that are interesting and shows them in views filterable by date, camera, location, tag, and tag cluster.
What's so amazing about the Flickr Explore feature is that it's constantly refreshed and the photos are always, indeed, interesting. Flickr has an advantage: Millions of content objects. Naturally, the best hundred are pretty darn good.
Editorial Approaches
As appealing as the idea of a bottom-up, self-organizing content architecture is, editorial participation has a key role to play in many branded online communities. First and foremost, an editorial presence can provide a human face for the company, a point of social contact between the community and the brand. An editorial presence can also model appropriate behavior, highlight the kind of contributions that best serve the needs of the community, and speak for the brand without being perceived as a constraining or intervening voice. It just has to be done with a delicate touch.
A while back I wrote about a quality problem on the social media site Treemo. Since I wrote that post, Treemo has rearchitected its home page, which used to highlight popular content, to feature editorially-selected content. The result is a much stronger affordance on the home page for content that aligns with the site's mission. Over time, that affordance will help improve the overall quality of content on the site--though Treemo's too-simplistic separate display of most viewed, favorited, and commented still renders horrors upon the unsuspecting eye.
Hybrid Approaches
Combination approaches can offer the benefits of both algorithmic and editorial quality measures. Typically, an initial level of quality valuation emerges from user behavior, qualifying a top tier of content for editorial review.
A great example of the hybrid approach is JPGmag.com. JPG is a social media site that announces a theme for a period of time and invites users to submit photos related to the theme. Users view photos one at a time and vote whether each photo is "good for (theme)." The experience is compelling, partly because the emergence of photos is affected by the votes of other users, so the photos that aren't voted up tend to disappear quickly. On the surface, the interaction is extremely simple. But there's some good, solid magic going on behind the scenes.
Photos that rise to the top based on user voting are qualified for review by the editorial panel, which selects photos for inclusion in the theme-based print edition of the magazine.
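The two-stage pattern described above can be sketched roughly as follows. This is an illustration under my own assumptions, not JPG's actual process: the function names, the min_votes floor, and the top_fraction cutoff are all hypothetical. User votes produce a shortlist, and a human panel makes the final call from that shortlist only.

```python
def qualify_for_review(photos, min_votes=20, top_fraction=0.1):
    """Stage 1: user behavior picks the shortlist.

    photos is a list of (photo_id, yes_votes, total_votes) tuples.
    min_votes and top_fraction are hypothetical tuning knobs.
    """
    # Ignore photos without enough votes to judge fairly.
    eligible = [p for p in photos if p[2] >= min_votes]
    # Rank by share of "good for theme" votes, best first.
    ranked = sorted(eligible, key=lambda p: p[1] / p[2], reverse=True)
    # Keep the top slice, but at least one photo if any are eligible.
    keep = max(1, int(len(ranked) * top_fraction)) if ranked else 0
    return [p[0] for p in ranked[:keep]]


def editorial_selection(shortlist, editor_picks):
    """Stage 2: editors choose, but only from the qualified shortlist."""
    return [photo_id for photo_id in shortlist if photo_id in editor_picks]
```

The editors never see the long tail, and the crowd never gets the last word, which is the whole appeal of the hybrid.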
Beyond the Top-10 List
Ranking content objects based on quality makes possible some powerful uses of quality content--including but greatly improving on the typical idea of the "top 10" list. Quality can be superimposed on other metadata to produce content presentations that are both highly faceted and of high quality. For example, you can search Flickr for photos tagged "gorilla" within a geographic range limited to Rwanda, then sort the results by interestingness, and even filter by the type of camera you own. You can see the best photos of Rwanda's gorillas taken with your particular camera model, and see how your own adventure travel pictures measure up.
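Here is a small sketch of what superimposing quality on other metadata might look like. It is not Flickr's API or its Interestingness algorithm; the best_matching function and the tags, country, camera, and quality fields are assumptions made up for the example. Facets narrow the set, and the quality score orders whatever is left.

```python
def best_matching(photos, tag=None, country=None, camera=None, limit=10):
    """Filter by optional facets, then sort by a precomputed quality score.

    Each photo is a dict with hypothetical keys:
    'id', 'tags' (a set), 'country', 'camera', 'quality'.
    """
    def matches(photo):
        # A facet left as None is simply not applied.
        return (
            (tag is None or tag in photo["tags"])
            and (country is None or photo["country"] == country)
            and (camera is None or photo["camera"] == camera)
        )

    hits = [p for p in photos if matches(p)]
    return sorted(hits, key=lambda p: p["quality"], reverse=True)[:limit]
```

Because quality is just one more attribute on the object, the same ranking works behind any combination of facets.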
Amazon's customer reviews page is another great example. Customer-written product reviews are ranked based on a simple "usefulness" vote by users. Amazon highlights the most helpful reviews above and below a rating threshold. The most helpful favorable and unfavorable reviews appear side by side, an instant conversation that socializes the shopping experience, adds credibility to the buying process, and creates a unique value for online shoppers.
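A rough sketch of the side-by-side pattern, under stated assumptions: the spotlight_reviews function, the review fields, and the rating_threshold default are hypothetical, and this is not how Amazon actually computes it. Reviews are split at a star-rating threshold, and the most helpful review on each side is surfaced.

```python
def spotlight_reviews(reviews, rating_threshold=3):
    """Return (most helpful favorable, most helpful critical) reviews.

    Each review is a dict with hypothetical keys:
    'id', 'stars', 'helpful_votes'. Either slot may be None.
    """
    favorable = [r for r in reviews if r["stars"] > rating_threshold]
    critical = [r for r in reviews if r["stars"] <= rating_threshold]

    def most_helpful(group):
        # default=None covers the case where one side has no reviews.
        return max(group, key=lambda r: r["helpful_votes"], default=None)

    return most_helpful(favorable), most_helpful(critical)
```

One simple vote does double duty here: it ranks reviews within each group and decides which two get to face each other.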
Quality in Action
All of the challenges I described earlier require a solution with some kind of quality measure. Over the past three or four years, I've found that my information architecture work is less about labeling and organizing content and more about creating mechanisms to apply descriptors and rules to govern emergence. As user-created content becomes more important, the IA work happens at a remove. It also feels more critical to get it right and to design flexible systems that can adapt to unpredictable user behavior.
Likewise, new possibilities to create value have emerged--and quality is just the beginning. Let me know what you think, and stay tuned for more posts on this topic.




Great to see this kind of summary of your work in this area--lots of interesting observations in there.
One comment on your IA comment at the end--you said:
"I've found that my information architecture work is less about labeling and organizing content and more about creating mechanisms to apply descriptors and rules to govern emergence."
Unless you have a totally fixed (e.g., retrospective) set of things you are working with in a system, labeling and organizing could itself be described as another technique for creating mechanisms and rules to govern and/or influence the pattern of emergence.
For example, from when I worked in a library, I wouldn't say that the catalog was less emergent than Flickr tags. Rather, I'd say that the catalog's pattern of emergence worked at a slower pace.
So, for example, given this more limited pattern of change, one could manually create an info organization for "interesting" that might keep pace with the amount of interesting stuff being added to the catalog. But, really, even then, the same issues would come up--there was a need for higher level mechanisms and rules through which one could govern without item-level control.
(I'm pretty sure I've shown you this before--maybe of interest, a presentation proposal I wrote: Between Cathedrals and Bazaars: Complementary Architectures for Control and Freedom of Information - http://tinyurl.com/2ryl7j )
Posted by: Jay Fienberg | December 11, 2007 at 06:19 PM
Jay kicked knowledge thus:
"Unless you have a totally fixed (e.g., retrospective) set of things you are working with in a system, labeling and organizing could itself be described as another technique for creating mechanisms and rules to govern and/or influence the pattern of emergence."
In the abstract sense, I can't agree more. Information systems of all kinds are fundamentally emergent. In the concrete sense of scoped web sites, which rarely are as dynamic in the fast-motion knowledge domain of the web as real libraries are in the slow-motion knowledge domain of the real world, though, I feel more inclined toward a basic confusion, not so different from satisfaction, that says the little I know is, finally, of some use, whether I understand it "broadly enough" or not.
Is there something different about user-generated content on the web(s), or am I kidding myself? Maybe what I'm paying attention to is the prevalence of content that is "of limited interest," i.e. personal, targeted, tangential even--relative to the zone of alignment between marketing goals and customer value, that appears on the web.
Thanks Jay. Sorry for teasing you so much in the process of conceding your point. Honestly, it's because I really love the stuff you say and hope you keep saying more of it.
rt
Posted by: Ryan | December 11, 2007 at 08:12 PM
"Is there something different about user-generated content on the web(s)...?"
I was going to explain things, but I couldn't figure out a phrase to describe the "users" who generate the content that isn't "user-generated content". What do you call them?
Anyway, part of what I was wondering was: when you start thinking about mechanisms and rules for structuring content, whether "user-generated" or that traditional kind, I think the types of rules are actually not that different. But, they do need to be effective at different speeds (and, times).
Posted by: Jay Fienberg | December 13, 2007 at 01:15 PM
Yep, that's a great point. I also think the inputs for those mechanisms are different, in that they're artifacts of the objects' usage. So maybe it's not who's "generating" the object that matters, it's who's consuming, describing, evaluating, etc., the object.
I suspect also part of the speed issue you mention at some point becomes an issue of rote vs. spontaneous architectures.
And yeah, we all hate the phrase "user-generated content." I don't have anything better, though I do try to say participant rather than user.
Thanks for the comments, Jay. Always thought-provoking.
Posted by: Ryan | December 13, 2007 at 03:38 PM