TCS Daily

Content Is Crap

By Arnold Kling - January 13, 2003 12:00 AM

"So we also can be grateful this week for the launch Monday of some smart software from a Stanford University-based organization called Creative Commons" --Dan Gillmor

Creative Commons is an Internet service founded by Lawrence Lessig, a lawyer and author specializing in Internet issues. It allows the creator of music, text, or software to put a tag on the content that specifies terms of use.

While there are many Net-heads who share Dan Gillmor's enthusiasm for Creative Commons, I do not. It has little or no significance, because it is based on a strikingly naive 60's-retro ideological view of how content intermediaries function. The Commons enthusiasts believe that content publishers earn their profits by using copyright law to steal content from its creators and charge extortionary prices to consumers.

In contrast, I believe that it is important to recognize that publishers perform a valid economic function of filtering content and effectively distributing and selling it to consumers. Today's media companies deserve plenty of contempt, as I have argued many times. However, although we can get along without today's publishers, we cannot get along without the function that they perform.

The Sewage Treatment Problem

The public water system is somewhat unpleasant to think about. Basically, the stuff you flush down the toilet gets sent through a filtering system. That system "treats" the sewage until what remains is sufficiently pure to send back to you as drinking water.

As content intermediaries, publishers perform an analogous function. Individual software writers, authors, and musicians produce something close to raw sewage. The computer programs, books, and music that people buy are closer to drinkable water.

What Creative Commons lets you do as an author is label your stuff before you flush it down the toilet. If you don't want the sewage treatment plant to filter your stuff and sell the water on its usual terms, Creative Commons lets you have your way. If you think that publishers are stealing your crap, you can stop them.

Bayesian Intermediaries

In reality, publishers are adding value, not simply stealing. They add value by filtering out content that people do not want and by having established mechanisms for collecting revenue and distributing royalties to authors. A really meaningful attack on the publishing industry has to perform those functions and do so more efficiently than the incumbent firms.

I am optimistic that the Internet provides the infrastructure that allows for new, more efficient forms of content intermediation. However, the key is to solve the problems of filtering and revenue generation. In other words, new intermediaries must add significant value and find a way to charge for it.

The Net has spawned many alternative publishing models, but they do not add enough value. My guess is that the major gains in value added will come from the implementation of what are called Bayesian filters.

Bayesian filters are under development for addressing the problem of spam. If you accept the assumption that raw content is like raw sewage, then it becomes clear that the challenge of adding value in content publishing is like the problem of filtering spam. You need to let the right messages in, but find a way to keep out the junk.

A Bayesian filter differs from an ordinary rule-based keyword filter in two ways. One way is that the Bayesian filter uses flexible weights on keywords rather than rules. The other way is that a Bayesian filter can be "trained" by me as an individual to filter my own mail, based on how I sort mail myself.

For example, I have a friend in Philadelphia who sends me personal email and also forwards along two types of emails - commentary on the Middle East and jokes. When I get email from this friend, I examine the header to try to figure out which type of email it is. I want to read the personal mail and the Middle East commentary, but I want to delete the jokes without reading them.

It is almost impossible to use a rule-based filter to sort through my friend's mail. Obviously, I cannot implement a rule that says "treat all mail from this sender as spam." On the other hand, a Bayesian filter, which learns to weigh various factors in the email, could sort the email effectively. In fact, the Bayesian filter might be less crude than my personal filtering. Under my own crude sorting, I read some uninteresting Middle East commentary and probably miss some good jokes; a Bayesian filter could be subtle enough to detect the good jokes and screen out the uninteresting commentary.
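The weighting idea described above can be sketched as a tiny naive Bayes classifier. This is only an illustration of the general technique, not any particular spam product's implementation; the class names, labels ("keep"/"discard"), and training phrases are all hypothetical, standing in for mail the user has already sorted by hand.

```python
from collections import Counter
import math

class NaiveBayesFilter:
    """Minimal naive Bayes text filter: learns word weights from
    examples the user has already sorted into 'keep' and 'discard'."""

    def __init__(self):
        self.word_counts = {"keep": Counter(), "discard": Counter()}
        self.doc_counts = {"keep": 0, "discard": 0}
        self.vocab = set()

    def train(self, text, label):
        # Each hand-sorted message nudges the word weights for its class.
        words = text.lower().split()
        self.word_counts[label].update(words)
        self.doc_counts[label] += 1
        self.vocab.update(words)

    def _score(self, text, label):
        # Log prior plus log likelihood of each word, with add-one
        # smoothing so unseen words do not zero out the score.
        total_docs = sum(self.doc_counts.values())
        logp = math.log(self.doc_counts[label] / total_docs)
        n = sum(self.word_counts[label].values())
        v = len(self.vocab)
        for w in text.lower().split():
            logp += math.log((self.word_counts[label][w] + 1) / (n + v))
        return logp

    def classify(self, text):
        return max(("keep", "discard"), key=lambda lbl: self._score(text, lbl))

# Hypothetical training data, standing in for mail I sorted myself.
f = NaiveBayesFilter()
f.train("analysis of the middle east peace process", "keep")
f.train("commentary on middle east negotiations", "keep")
f.train("funny joke forward lol humor", "discard")
f.train("hilarious joke of the day forward", "discard")

print(f.classify("new middle east commentary"))  # → keep
print(f.classify("another joke forward"))        # → discard
```

No rule mentions the sender at all; the filter separates the two streams purely from the flexible word weights it learned, which is exactly what a rigid rule-based filter cannot do here.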

I believe that weblogs serve a filtering function. However, this function could be enhanced by Bayesian filters. As much as I like Asymmetrical Information and Brad DeLong's Semi-Daily Journal, they contain much material that I do not care to read. Meanwhile, I miss interesting items on other blogs. If I could train a Bayesian filter, I could use my blog-reading time more efficiently.

Even Googling could be enhanced by Bayesian filters. Everybody who uses Google has different strategies for entering keywords and different reactions to the results. It is easy for two people to enter the same keywords and be seeking different things. If Google used Bayesian filters to learn more about how I use its search engine, it could present me with more relevant results.

Economics vs. Ideology

If you are looking for a technology to bet on in content intermediation, I would bet on Bayesian filtering. I would not bet on Creative Commons.

Creative Commons is based on a naive ideology that believes that raw content is gold, which then gets stolen by the evil media companies. In reality, the economics of content are that most of the value-added comes from the filtering process, not the creation process. If you want to overthrow incumbent publishers with Internet-based alternatives, you are better off starting from the assumption that Content is Crap.
