You can use RSS to follow a few high-profile websites, and link sharing services like Slashdot or Digg to discover popular web content. But that’s like reading a classic newspaper and some magazines: The information provided may have a higher chance of being relevant to you, but there’s still a lot of noise that wastes your time.
In this article, I’ll discuss the shortcomings of link sharing services using Dzone as an example. Dzone is a relatively small-scale service targeted at software developers and one of my most important sources of information.
On Dzone, users publish links along with a short description and a few tags from a limited set of ca. 50 predefined tags. Users rate entries using "up"/"down" votes, and popular entries appear on the front page. Unlike with Slashdot, Reddit, or Digg, a web page linked from Dzone typically receives just a few hundred, sometimes a few thousand, page views. Because of this, there is little incentive for content producers to game the system. That’s one of the reasons I like Dzone.
Limiting tag choice is unusual for Web 2.0 sites, but it simplifies automatic processing (see my article on Yahoo! Pipes). A taxonomy would be more appropriate and more powerful, but you can reverse-engineer one based on the tags.
There are two interesting metrics connected to each entry, of which Dzone supports only the first:
- Quality of the entry (aka popularity)
- Relevancy of the entry to me
Quality is determined by user votes, which works reasonably well for Dzone. I have seen little abuse, but as in many Web 2.0 sites (see MusicBrainz, for example), there is little incentive to vote. Typically, a dedicated minority of users contributes most of the votes. Sites often try to create an incentive by building karma systems ("top voters of the week" etc.). Sometimes this works, most of the time it doesn’t.
The second metric, relevancy, is more interesting. It is a completely subjective measure and thus doesn’t work well with the "wisdom of the crowd" approach. To get around this, some link sharing sites build focused sub-communities ("programming", "politics", "cat pictures"). This is mostly a workaround though. What you really want is personalization: I expect the system to present entries that match my interests and are thus relevant to me.
In Dzone, you can subscribe to tag feeds ("java", "web design", etc.), but if you subscribe to multiple feeds you end up with duplicates, because one entry may appear in more than one feed. Dzone currently doesn’t offer aggregated feeds based on user-selected tags. Because of this, I’m using Yahoo! Pipes to filter the front page feed. This leaves me with ca. 40% of the original stories, of which more than half are still irrelevant.
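To make the duplicate problem concrete, here is a minimal sketch of what an aggregated, user-defined feed could do. It is not how Dzone or Yahoo! Pipes work internally. The entry format (a dict with a "link" and a list of "tags") and the function names are assumptions made for illustration.

```python
def merge_tag_feeds(feeds):
    """Merge several tag feeds into one list, keeping each entry only once.

    `feeds` maps a tag name to a list of entries. Each entry is a dict with
    a "link" and a list of "tags" (a hypothetical format, not Dzone's).
    """
    seen_links = set()
    merged = []
    for entries in feeds.values():
        for entry in entries:
            # The link identifies an entry, so a repeated link is a duplicate.
            if entry["link"] not in seen_links:
                seen_links.add(entry["link"])
                merged.append(entry)
    return merged


def filter_by_tags(entries, wanted_tags):
    """Keep only entries that carry at least one of the wanted tags."""
    wanted = set(wanted_tags)
    return [entry for entry in entries if wanted & set(entry["tags"])]
```

Merging by link removes the duplicates, and filtering by tag is roughly what I do with the front page feed. Neither step knows anything about my interests beyond the tag names, which is why so many irrelevant stories remain.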
While user-defined feeds would be a big step forward, there is only one real solution: The system has to learn from my actions (clicks and votes) and present only those entries that are interesting to me personally. A system like this is called a Recommender System. Basically, there are content-based recommender systems that select entries (called items in the literature) based on my user profile and properties of the entries. And there are collaborative filtering systems that recommend items based on what users with a similar profile found interesting. Recommender systems are an active research topic. See this survey paper for a good overview.
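The two approaches can be sketched in a few lines each. This is a toy illustration under simple assumptions, not a description of any production system: the profile is just a count of tags from entries I clicked or voted up, similarity between users is Jaccard overlap of their liked items, and all names and data shapes are hypothetical.

```python
from collections import Counter
from math import sqrt


def build_profile(liked_entries):
    """Content-based: count the tags of entries the user clicked or voted up."""
    profile = Counter()
    for entry in liked_entries:
        profile.update(entry["tags"])
    return profile


def content_score(profile, entry):
    """Cosine similarity between the user's tag counts and the entry's tags."""
    tags = set(entry["tags"])
    if not profile or not tags:
        return 0.0
    dot = sum(profile[tag] for tag in tags)
    norm = sqrt(sum(c * c for c in profile.values())) * sqrt(len(tags))
    return dot / norm


def jaccard(a, b):
    """Overlap of two sets, from 0.0 (disjoint) to 1.0 (identical)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def collaborative_scores(me, likes_by_user):
    """Collaborative filtering: score items liked by users similar to me.

    `likes_by_user` maps a user name to the set of item ids that user liked.
    """
    mine = likes_by_user[me]
    scores = Counter()
    for user, liked in likes_by_user.items():
        if user == me:
            continue
        similarity = jaccard(mine, liked)
        if similarity == 0:
            continue  # users with nothing in common contribute nothing
        for item in liked - mine:
            scores[item] += similarity
    return scores
```

The content-based part needs only my own history and the tags of a new entry, so it can score an entry the moment it is posted. The collaborative part never looks at tags at all, but it cannot say anything about an entry until other users have reacted to it.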
While collaborative filtering is more difficult to scale and doesn’t handle item churn well (see the Google News paper for details), content-based recommenders are a lot easier to scale. However, content-based schemes often suffer from over-specialization and can be more difficult to implement.
Recommender systems have been in use on the web for more than a decade, so it’s surprising that none of the popular link sharing services has implemented personalization features yet. The first service to offer this would likely have an advantage over its competitors. My time and attention are limited, so the other services would quickly become irrelevant. At least to me.