Participation Inequality on the Web

Jakob Nielsen, in his usual stellar style, has published an excellent article on participation inequality in Internet communities:

User participation often more or less follows a 90-9-1 rule:

  • 90% of users are lurkers (i.e., read or observe, but don’t contribute).
  • 9% of users contribute from time to time, but other priorities dominate their time.
  • 1% of users participate a lot and account for most contributions: it can seem as if they don’t have lives because they often post just minutes after whatever event they’re commenting on occurs.


There are about 1.1 billion Internet users, yet only 55 million users (5%) have weblogs according to Technorati. Worse, there are only 1.6 million postings per day; because some people post multiple times per day, only 0.1% of users post daily.
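As a quick sanity check on the figures quoted above, here is a minimal Python sketch of the arithmetic. The variable names are my own, and the "one post per poster" ceiling is an illustrative assumption rather than anything stated in the article.

```python
# Figures taken from the quoted passage.
internet_users = 1_100_000_000
blog_owners = 55_000_000
posts_per_day = 1_600_000

# Share of Internet users who have a weblog.
blog_share = blog_owners / internet_users

# Upper bound on the share of users who post on a given day,
# assuming (hypothetically) that every post comes from a different user.
max_daily_poster_share = posts_per_day / internet_users

print(f"Users with a blog: {blog_share:.1%}")
print(f"Ceiling on daily posters: {max_daily_poster_share:.2%}")
```

The ceiling comes out just under 0.15%. Since some people post several times a day, the real share has to be lower, which is consistent with the 0.1% quoted.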

Here is a shameless plug for our older post on the size of the blogosphere (Jakob Nielsen’s numbers match ours pretty well).

Some of the participation inequality is driven by human nature. As a result, even physical communities display lopsided participation. The Internet, though, I think exacerbates this problem. Here are some of the reasons:

  • People are inherently selfish. Contributing to a community does not come naturally to most people unless there is a reward associated with contributing. The mechanisms for providing reward for participation are largely missing from the web at this point in time.
  • The default substrate for interactions on the Internet is anonymity. It takes extra effort to get users to drop the cloak of anonymity and express their opinions. This happens only when they feel really strongly about the topic of discussion.
  • Bad UI design discourages users from participating. One of the most annoying issues here is requiring users to register before they can leave a comment.
  • Read-only interactions used to be the norm on the web 1.0. We are just starting to focus on community participation and user-generated content, and I suspect that over time these statistics will shift toward less inequality.

Jakob Nielsen focuses on the effects of participation inequality on the web:

  • Customer feedback. If your company looks to Web postings for customer feedback on its products and services, you’re getting an unrepresentative sample.
  • Reviews. Similarly, if you’re a consumer trying to find out which restaurant to patronize or what books to buy, online reviews represent only a tiny minority of the people who have experiences with those products and services.
  • Politics. If a party nominates a candidate supported by the “netroots,” it will almost certainly lose because such candidates’ positions will be too extreme to appeal to mainstream voters. Postings on political blogs come from less than 0.1% of voters, most of whom are hardcore leftists (for Democrats) or rightists (for Republicans).
  • Search. Search engine results pages (SERP) are mainly sorted based on how many other sites link to each destination. When 0.1% of users do most of the linking, we risk having search relevance get ever more out of whack with what’s useful for the remaining 99.9% of users. Search engines need to rely more on behavioral data gathered across samples that better represent users, which is why they are building Internet access services.
  • Signal-to-noise ratio. Discussion groups drown in flames and low-quality postings, making it hard to identify the gems. Many users stop reading comments because they don’t have time to wade through the swamp of postings from people with little to say.

He also goes into what can be done to address the situation somewhat:

  • Make it easier to contribute. The lower the overhead, the more people will jump through the hoop. For example, Netflix lets users rate movies by clicking a star rating, which is much easier than writing a natural-language review.
  • Make participation a side effect. Even better, let users participate with zero effort by making their contributions a side effect of something else they’re doing. For example, Amazon’s “people who bought this book, bought these other books” recommendations are a side effect of people buying books. You don’t have to do anything special to have your book preferences entered into the system. Will Hill coined the term read wear for this type of effect: the simple activity of reading (or using) something will “wear” it down and thus leave its marks — just like a cookbook will automatically fall open to the recipe you prepare the most.
  • Edit, don’t create. Let users build their contributions by modifying existing templates rather than creating complete entities from scratch. Editing a template is more enticing and has a gentler learning curve than facing the horror of a blank page. In avatar-based systems like Second Life, for example, most users modify standard-issue avatars rather than create their own.
  • Reward — but don’t over-reward — participants. Rewarding people for contributing will help motivate users who have lives outside the Internet, and thus will broaden your participant base. Although money is always good, you can also give contributors preferential treatment (such as discounts or advance notice of new stuff), or even just put gold stars on their profiles. But don’t give too much to the most active participants, or you’ll simply encourage them to dominate the system even more.
  • Promote quality contributors. If you display all contributions equally, then people who post only when they have something important to say will be drowned out by the torrent of material from the hyperactive 1%. Instead, give extra prominence to good contributions and to contributions from people who’ve proven their value, as indicated by their reputation ranking.

I couldn’t agree with him more. Providing the right incentives to participants is the key to cracking this tough nut.

Jigsaw Data

Interesting article in the SF Chronicle about Jigsaw Data, a company with an unusual social network for buying and selling business contacts:

Here’s how Jigsaw works: You can pay a subscription of $25 per month to access the database or you can enter 25 contacts per month. Members get two contacts back for each one they enter. All information is entered anonymously.

It seems like they are expecting strong growth:

Since it started operations on Jan. 1, 2004, Jigsaw has amassed a database of 3 million contacts at 150,000 companies, and the company expects that to grow to 5 million by year’s end. Only 131 of its 105,000 members sell points, Fowler said. “Almost all trade data to get data.”

It just seems like the wrong way to build a social network. Getting your contact information added to Jigsaw feels like a breach of trust. A typical way your information can get into Jigsaw is this: you send a communication to a professional contact, who in turn sells your information to Jigsaw (similar to somebody selling your information to a spammer). Sure, you can opt out if you like, but I would rather have an explicit opt-in mechanism.

Anyone, even nonmembers, can go to the site to see if they’re listed. If they are, they can set parameters for how they wish to be contacted. A person could even say: “Never contact me.” Fowler’s own guidelines tell people never to call his mobile phone, keep e-mails short and not pitch wealth management or other financial services.

I guess the basic assumption that Jigsaw is making is that all professionals, just because they are employed, are going to want to be contacted by other professionals. I personally can see the value of sharing my contact information with people in the same field. But I can imagine other people who will object. Also, who is going to make sure that the information is not misused? What is there to prevent Jigsaw from becoming a super database for spammers?

There is also a valid question about the quality of the data, raised by Bob Blakley in his post on the subject, along with his new business card (:-)).

BTW, you can check whether you are in Jigsaw on their site.

Social networks and the profit motive

The Blog Herald had an interesting review called “Diggs for sale in organized fashion”:

User/Submitter is a new service that connects publishers with diggers. Have a story? Then hand over the cash and User/Submitters’ dig users will digg it for you – and you might even end up on the frontpage of Digg. That’s serious stuff, The Blog Herald knows that.

Publishers get to pay $20 and an additional $1 per dig, and digg users can get paid $0.50 for every 5 stories they digg. It seems real enough but I don’t really know, we’ll find out soon I’d reckon since the blogosphere tends to flush out the frauds.
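To get a feel for the economics described above, here is a rough Python sketch of the pricing. It assumes the quoted prices are accurate and that digger payouts can be prorated to $0.10 per digg. The function and constant names are hypothetical and not part of the actual service.

```python
# Prices from the quoted passage, in cents to avoid float rounding.
SETUP_FEE_CENTS = 2000          # $20 per story
FEE_PER_DIGG_CENTS = 100        # $1 per digg
PAYOUT_PER_BATCH_CENTS = 50     # $0.50 ...
DIGGS_PER_BATCH = 5             # ... for every 5 stories dugg

# Assumption: payouts are prorated, i.e. $0.10 per digg.
PAYOUT_PER_DIGG_CENTS = PAYOUT_PER_BATCH_CENTS // DIGGS_PER_BATCH


def publisher_cost_cents(diggs: int) -> int:
    """What a publisher pays to buy `diggs` diggs for one story."""
    return SETUP_FEE_CENTS + FEE_PER_DIGG_CENTS * diggs


def digger_payout_cents(diggs: int) -> int:
    """What is paid out to digg users for those same diggs."""
    return PAYOUT_PER_DIGG_CENTS * diggs


diggs = 100
cost = publisher_cost_cents(diggs)
payout = digger_payout_cents(diggs)
print(f"Publisher pays ${cost / 100:.2f}, diggers receive ${payout / 100:.2f}")
```

Under these assumptions, buying 100 diggs costs the publisher $120 while only $10 reaches the digg users, so the service keeps the bulk of the money.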

Similarly there was another startup called PayPerPost profiled at TechCrunch.

The service is a marketplace for advertisers to pay bloggers to write about products for a fee. Commenters to our original post were polarized into those violently for and those against the product. The key area of controversy is the fact that advertisers can mandate that posts be positive on the product, and disclosure of payment is optional for the blogger (screen shot at end of post shows sample available writing opportunities).

The main issue here is how we can accommodate the profit motive in social networks. If a social network becomes useful enough, somebody will try to make money from it. Take comment spam as an example: no sooner had blogs gained some popularity than spammers were creating splogs and comment spam to make money from them. So what can be done about it?

I think the answer is that not a whole lot can be done to completely stop people from trying to make money from social networks. The issue is that in physical communities, users have to identify themselves in all interactions. In online communities, it’s easier for users to participate without the constraint of location and without even identifying themselves. The communities pay for this ease of use and participation by enabling various profit-driven actors. There are still a few simple things that can be done in the design of communities to handle profit-driven participants more effectively:

  • Community oriented monitoring of content
  • Incentivize positive participation
  • Penalize negative participation

This is not going to eliminate the profit motive from social networks, but it will help communities deal with the issue more effectively. Slashdot did all these things and as a result is a whole lot more spam-proof than Digg. As expected, Slashdot pays a price for its sophisticated community design: a more complex user interaction model that reduces ease of use.

User Owned Product Preferences

Interesting piece from Kim Cameron and Dave Winer of Scripting News:

Doc talks about a Vendor Management Systems, to balance the other side’s Customer Management Systems. I, of course, like. A prototype for this is a movie review system where I own and control my data. Today, I rate movies on Netflix and Yahoo, but I can’t get them to share the data with each other, so they make recommendations without info the other one has. If I had a place where I kept my movie ratings and gave each of them a pointer to it, they could read it and I would control the data. It would be very easy to set up, the technology is no trick at all. The hard part is getting enough users to do it this way to gain critical mass. This is also the idea behind Edgeio and Marc Canter’s People Aggregator. Open systems, users own the data, silos smell of sulfur.
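To illustrate the idea in the quote above, here is a minimal Python sketch of a user-owned ratings document that any vendor could read. The JSON layout, field names and helper functions are entirely hypothetical. No such shared format is described in the post, and a real system would fetch the document from a URL the user controls rather than from an inline string.

```python
import json

# Hypothetical ratings document, owned and hosted by the user.
USER_RATINGS_JSON = """
{
  "owner": "example-user",
  "ratings": [
    {"title": "Movie A", "stars": 5, "rated_on": "netflix"},
    {"title": "Movie B", "stars": 2, "rated_on": "yahoo"},
    {"title": "Movie C", "stars": 4, "rated_on": "yahoo"}
  ]
}
"""


def load_ratings(document: str) -> dict:
    """Parse the user's document into a {title: stars} mapping."""
    data = json.loads(document)
    return {entry["title"]: entry["stars"] for entry in data["ratings"]}


def liked_titles(ratings: dict, min_stars: int = 4) -> list:
    """Titles a vendor could use as input for recommendations."""
    return sorted(title for title, stars in ratings.items() if stars >= min_stars)


# Any vendor given a pointer to the document sees the same, complete data,
# regardless of which site each rating was originally entered on.
ratings = load_ratings(USER_RATINGS_JSON)
print(liked_titles(ratings))
```

The point of the sketch is only that the data lives in one place under the user's control. Each vendor reads it instead of keeping its own silo.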

A user’s ownership of his or her own data sounds like a great idea. It will work well for interactions where the services provided by vendors are commoditized and the user’s preferences are well established: for example, travel preferences (what kind of seat you prefer, e.g. aisle, which airlines, etc.), on-line shopping for clothes and shoes (sizes, colors, cuts), movie rentals, and computer upgrades. This would lead to user-advertised needs rather than supplier-generated demand…

Doc Searls frames the problem:

We need to serve market (not marketing) relationships that arise from decisions customers have already made to buy something. They have money in hand, and the intention to book a hotel, rent a car, buy a basketball backboard. Whatever they want, marketing’s job is done. Sales needs to show up now. But how? That’s the question. And the answers that work can’t come from the sell side. We need new means to the buyer’s ends, coming from the buyer’s side.
What I want is for vendors in an open and free market (not a proprietary silo like eBay or Amazon or Travelocity or some other intermediator with a walled garden) to respond to the intentions (or gestures, or expressions, or whatever) of the customer. On customers’ terms. I want to turn the tables on the lame customer management systems every big vendor has, and which have no idea how to relate. Especially to humans who would rather not be “managed”, thank you.
To be fair, until now the full burden of customer relationship management fell on vendors. They had no choice about being lame, because they had to relate to everybody, and to limit the variables involved. I want to change that, from the customer’s side, with a Vendor Management System under the customer’s control that is so richly useful, and capable, that vendors have no choice but to relate to it — on customer terms that will prove mutually beneficial out the wazoo.

Doc’s model sounds great for fulfilling demand for simple products, but how do businesses make money in such an environment? Typically businesses want to create product/service differentiations so that they can charge a premium. How will complex products, where multiple trade-offs are involved, be sold? And how did the demand get created in the first place? My guess would be that some business invested money for creating the demand. Would they have invested the money if they did not expect to be able to recoup that investment via higher margins and premium pricing for their unique features? And without the marketing arguments for new products that can improve people’s lives, would we have the Internet or even the telephone?

Yahoo! Single Sign On APIs Release

Yahoo! has just released a new set of APIs for enabling SSO on the web. TechCrunch has a good review.

There are two pieces to BBAuth. The first is a single sign on tool to authenticate the user. The second piece is a set of APIs to get into specific Yahoo services and interact with user data. For example, the Yahoo Photos API allows other applications to, among other things, upload photos, tag photos, and modify titles and descriptions. Yahoo is also opening up Yahoo Mail through BBAuth.

I am not sure what Yahoo! data the 3rd party apps will be able to access. For example, will the 3rd party apps be able to access information related to Yahoo! Shopping? Will they be able to access user address information? What kind of systems will be put in place to validate the 3rd party application providers to ensure that the user information is not misused?

This move makes a lot of sense for Yahoo!, as they get to be the central repository of all user information and to participate in a whole lot of transactions unrelated to their properties. I am not sure, though, that it makes sense for 3rd party apps or users. Typically users and 3rd party application providers are reluctant to have an intermediary that does not add any direct value in the middle of a transaction. There have been a few initiatives like this in the past: Microsoft Passport and Six Apart’s TypeKey, to name two. Such SSO initiatives have not done very well so far. Let’s see how Yahoo!’s move fares.

Identity in the Blogosphere

The blogosphere is exploding with activity. Driven by empowered users who are leveraging easy-to-use tools, blogging is transforming the publishing industry. CNN had an interesting article on the number of blogs in China.

BEIJING, China (Reuters) — The number of blog sites in China reached 34 million in August, a 30-fold increase from four years ago, state media said on Tuesday, despite a series of curbs on media and dissent.

China has more than 17 million people writing blogs (short for Web logs) and more than 75 million people reading them, Xinhua news agency said.

David Sifry of Technorati in his latest state of the blogosphere post says that they are tracking 50 million blogs and that the size of blogosphere is doubling every 6 months.

  • Technorati is now tracking over 50 Million Blogs.
  • The Blogosphere is over 100 times bigger than it was just 3 years ago.
  • Today, the blogosphere is doubling in size every 200 days, or about once every 6 and a half months.
  • From January 2004 until July 2006, the number of blogs that Technorati tracks has continued to double every 5-7 months.
  • About 175,000 new weblogs were created each day, which means that on average, there are more than 2 blogs created each second of each day.
  • About 8% of new blogs get past Technorati’s filters, even if it is only for a few hours or days.
  • About 70% of the pings Technorati receives are from known spam sources, but we drop them before we have to send out a spider to go and index the splog.
  • Total posting volume of the blogosphere continues to rise, showing about 1.6 Million postings per day, or about 18.6 posts per second.
  • This is about double the volume of about a year ago.
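The per-second and doubling-time figures in the list above can be checked with a few lines of Python. This is just a back-of-the-envelope sketch: the variable names are mine, and the average month length is an assumption.

```python
import math

SECONDS_PER_DAY = 24 * 60 * 60
DAYS_PER_MONTH = 365.25 / 12  # assumed average month length

# "About 175,000 new weblogs were created each day"
new_blogs_per_second = 175_000 / SECONDS_PER_DAY

# "about 1.6 Million postings per day"
posts_per_second = 1_600_000 / SECONDS_PER_DAY

# "doubling in size every 200 days"
doubling_months = 200 / DAYS_PER_MONTH

# Average doubling time implied by "over 100 times bigger than 3 years ago"
implied_doubling_days = 3 * 365.25 / math.log2(100)

print(f"New blogs per second: {new_blogs_per_second:.2f}")      # ~2.03
print(f"Posts per second: {posts_per_second:.2f}")              # ~18.52
print(f"Doubling time in months: {doubling_months:.2f}")        # ~6.57
print(f"Implied doubling time: {implied_doubling_days:.0f} days")  # ~165
```

The numbers hang together: 1.6 million posts per day works out to roughly 18.5 per second (a shade under the quoted 18.6), and a 100-fold increase in 3 years implies an average doubling time of about 165 days, inside the quoted 5-7 month range.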

In the article David mentions that only 12% of the posts are in Chinese, compared to 41% in English. This suggests that they are under-counting blogs in languages other than English. They are also likely under-counting the blogs at community sites such as MySpace. I believe a more accurate picture is presented by the Blog Herald survey from February (it’s a bit dated but still very instructive; I hope they come up with another survey soon):

The good news: the blogosphere continues to boom. This month I estimate there to be 200 million blogs in existence.

The sums by country add up to approx 154 million blogs and by host 185 million, but this doesn’t take into account a pile of places + smaller hosts + self hosted blogs. Hence I’m calling the figure 200 million blogs.

[Charts not reproduced: blog counts broken down by host (from Marketingfacts) and by country.]

Overall, my guess is that the number of blogs is well above 250 million (based on a review of the Blog Herald sources to get an updated count). Of these, about 30% are spam blogs (some estimates are higher, but taking into account all the estimates for hosting sites etc., this seems like a reasonable number) and another 35-40% are inactive blogs, in which users stopped posting after creating the blog (refer to David Sifry’s post for inactive blog rates; the rates are probably lower for community blogs because of their ease of use). This leaves us with about 80 million blogs. Taking into account world Internet usage statistics, this means that about 4-8% of the people on-line are blogging (assuming some users have multiple blogs). To get a better understanding of who is creating these blogs, let’s break blog usage into three categories.

  1. Community Blogs: These are blogs created to participate in an existing community where blogging is the main method of communication. Often the authors of such blogs don’t even know that they are blogging. Examples of such sites are MySpace, LiveJournal, MSN Spaces, Xanga etc. These kinds of blogs make up the majority of the blogosphere. MySpace has more than 100 million members now. I don’t think Technorati indexes most of these blogs. One can argue that these should not even be considered blogs, but if they enable users to publish information in an easy and democratic way, I think they should be.
  2. Personal Blogs: These blogs are created by individuals to express a point of view and to interact with other like-minded bloggers in an open community. These are blogs like this one, hosted by individuals or on blog hosting sites like WordPress, TypePad etc. The community interaction on such blogs is less closed, and the level of technical sophistication required to manage such a blog is a lot higher than for community blogs. Such blogs form a small part of the blogosphere (less than 15% would be my guess).
  3. Corporate Blogs: These blogs are created by companies to propagate or enhance the company’s positioning. They also serve to humanize the company (e.g. Microsoft blogs or Google employee blogs). Not many companies have created formal structures for blogging yet, but these are likely to come down the line. They might include corporate hosting of blogs or universal branding etc. Such blogs form a small part of the blogosphere (less than 5% would be my guess).
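Going back to the headline estimate that precedes this breakdown (250 million blogs, 30% spam, 35-40% inactive), the arithmetic can be sketched in Python as follows. The Internet population of 1.1 billion is borrowed from the Nielsen quote earlier on this page, and the range of one to two blogs per blogger is my own illustrative assumption. Neither is a measured figure.

```python
TOTAL_BLOGS = 250_000_000
SPAM_SHARE = 0.30
INACTIVE_SHARE_RANGE = (0.35, 0.40)

INTERNET_USERS = 1_100_000_000    # assumption: figure from the Nielsen quote
BLOGS_PER_BLOGGER_RANGE = (1, 2)  # assumption: some users run several blogs


def active_blogs(inactive_share: float) -> float:
    """Blogs left after removing spam and inactive blogs."""
    return TOTAL_BLOGS * (1 - SPAM_SHARE - inactive_share)


def blogger_share(blogs: float, blogs_per_blogger: float) -> float:
    """Share of Internet users who blog, given blogs per blogger."""
    return blogs / blogs_per_blogger / INTERNET_USERS


low_active = active_blogs(max(INACTIVE_SHARE_RANGE))   # about 75 million
high_active = active_blogs(min(INACTIVE_SHARE_RANGE))  # about 87.5 million

lowest_share = blogger_share(low_active, max(BLOGS_PER_BLOGGER_RANGE))
highest_share = blogger_share(high_active, min(BLOGS_PER_BLOGGER_RANGE))

print(f"Active blogs: {low_active / 1e6:.1f}M to {high_active / 1e6:.1f}M")
print(f"Share of users blogging: {lowest_share:.1%} to {highest_share:.1%}")
```

Under these assumptions the active count lands between 75 and 87.5 million, and the share of on-line users who blog falls between roughly 3.4% and 8%, in line with the 80 million and 4-8% figures above.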

Vertically speaking, the non-community blogosphere can be broken down into six main areas:

  1. Technology
  2. Politics and world events
  3. Arts (celebrity, jokes, films, music, TV etc.)
  4. Sports (Don’t think this one is a huge segment yet)
  5. Personal
  6. All other

Identity in a community blog site is typically governed by the community owner. They make each user sign up and provide some basic identity information that is shared.

In the personal and corporate blogosphere, identity is a problem. Some hosting sites try to address the issue by requiring users to log in and create a blog before they can interact with the blogs on their site, but this does not help the identity situation if the users don’t post anything on their blogs. In reality, anonymity is the norm for most interactions in the personal and corporate blogosphere. This default of anonymity provides the wrong incentives for participation in communities and thereby degrades the quality of conversations.

BusinessWeek article on Click Fraud

Last week, BusinessWeek had a great article on click fraud. The writers did a great job detailing how paid-to-click (PTC) businesses work with domain parking services to make click fraud happen. One additional angle I would have liked to see in the article is competitive click fraud. Competitive click fraud is when company A pays somebody to click on ads for company B in order to drain company B’s resources. I am not sure such an arrangement would even be illegal, and it would be difficult to prosecute in any case.

After reading the article, I wanted to leave a comment at the BW site, but they have comment moderation turned on. So after leaving the comment I got a message saying that my comment would be reviewed by somebody within 24 hours. There isn’t much I hate more than having to wait 24 hours to get into a conversation. Has anybody else had the same experience? On the other hand, I guess BW has to be careful about spammers. Also, being an established old business, they probably prefer erring on the side of caution to free-flowing discussion. I am not really upset with BW, as this is an issue facing most established brands: short of moderating or censoring the discussion, there really isn’t a way to ensure good-quality discussion.

Taking passwords to the grave

Interesting article on CNET about the issues with estate planning in the on-line world. The problem is getting messier as users move more of their financial and organizational information onto the web. One of the suggestions in the article is that on-line users add all their passwords and account information to their estate plans. This does not really work if it means updating your estate plan every time you create a new account or change a password. The problem is worse if users are trying out new sites, especially with so many cool services like GMail, GCal or stock trading companies coming on-line. What we need is a universal mechanism to store all the passwords and other on-line identities in a central location. Access to this central location is what should be passed on, in a structured manner. There are systems like InfoCard, OpenID and SXIP that provide these services…let’s just hope that these systems see strong adoption.

Privacy and Social networks

There was a time a few years back when privacy was a huge issue on the web. Consumer advocates were up in arms about companies not guarding customers’ information or even selling it to other companies. The concern was that companies and spammers would use that information to steal customers’ identities and cause them financial harm, or to send unsolicited communications.

With the advent of social networks, things seem to have changed.

  • There are now 110 million profiles on MySpace.
  • There are millions of users sharing their deepest thoughts on YouTube.
  • There are over 60 million blogs where users are publishing their thoughts and at times their identification information like email and address.

A lot of the information on these sites makes the job of spammers/companies a lot easier. In addition to providing contact information, this user generated content also provides a great deal of information which can be used by spammers/companies to better target their offers. The strange thing, though, is that the users creating this content do not seem to care. What is going on?

I think what is going on is simple utility optimization. As my Economics 101 professor would have said, the utility users derive from participating in these communities is greater than any downside in terms of privacy. Another reason could be that users of these social sites are sharing non-transactional information (as opposed to transactional information like credit cards, SSN etc.) that cannot easily be used to cause financial harm. My guess is that in the busy world we live in, people are starved for attention. As a result, users of these social sites might actually welcome targeted offers or communications from people who take the time to read through all the information they have published. For social network users this is a way to fulfill the basic need of connecting with other humans. In this sense, social networks are replacing real-world communities and relationships. It could also be that the tools available right now make it hard to limit access to the information to a smaller community. I guess that is what Six Apart is trying to address with their new Vox platform.

The big problem with the current system is that information in online communities makes users a lot more vulnerable than they are in real-world communities. The reason is that in online communities all information is logged and is available to the all-seeing eyes of Google and Technorati in perpetuity. See the interesting post from Eric Norlin on the topic (he defines the cool-sounding “Norlin’s maxim”). Thoughts?