How do I know you are who you say you are?

Several new identity verification services are trying to bring some order to the wild world of the Internet.

You can read my take over on the Read/WriteWeb (Thanks Richard, for the great job editing and laying out the piece!).

Citizendium

Citizendium is a new wiki project started by Larry Sanger, one of the co-founders of Wikipedia. They define the project as follows:

The Citizendium (sit-ih-ZEN-dee-um), a “citizens’ compendium of everything,” is an experimental new wiki project. The project, started by a founder of Wikipedia, aims to improve on the Wikipedia model by adding “gentle expert oversight” and requiring contributors to use their real names. It has taken on a life of its own and will, perhaps, become the flagship of a new set of responsibly-managed free knowledge projects. We avoid calling it an “encyclopedia” until the project’s editors feel comfortable putting their reputations behind this description.

With the new rules requiring editors to use their real names, and with “gentle oversight”, Citizendium is trying to ensure that community members have the right incentives. I guess the hope is that if people identify themselves with their real names, they are less likely to vandalize or edit content for short-term gratification. The goal is to avoid cases like the one Stephen Colbert caused when he talked about Wikipedia on one of his shows.

This is a noble goal and these new policies will no doubt help create a better community environment. The question still remains, though, of how Citizendium will ensure that all editors do in fact use their real names while editing content. Right now they seem to be asking individuals to use their real names as part of the user agreement, but what if somebody still uses a fake name? I am sure that if Citizendium generates enough credible content, there will be incentives for people to use fake identities to edit it. One way could be to heap shame and scorn on any user breaking the terms of service, but again, if somebody does break the terms of service, how are they going to know the real identity of that person? Also, if they reach critical mass, I am sure they will be open to attacks from PayPerPost-like services intent on modifying certain content for financial gain.

So overall, I think this is a step in the right direction, but I am not sure it is going to be enough to generate a 100% reliable body of knowledge. Let’s see how it develops.

How many identities do you have?

I have been thinking about the differences between teenage and adult behavior in the sphere of creating and managing identities.

You can read my take over on the Read/WriteWeb (Thanks Richard, for the great job editing and laying out the piece!).

Agent Smith Effect

(For those of you who haven’t seen the “Matrix” movies, Agent Smith was Neo’s main antagonist. His special ability, after a mutation, was that he could create as many copies of himself as he liked. Also, you really need to watch the Matrix movies…I promise they’ll blow your mind.)

Bill Thompson, the regular columnist at the BBC, had an interesting piece about how young people use identities in social media.

…young people who forget their MySpace logins are just as likely to make a new account as fret over their lost friends or painstakingly constructed homepage decorations.

Recent work by US-based social media researcher Danah Boyd, one of the more astute observers of network behaviour, indicates that it is a more general attitude.

Her observations of young net users have led her to believe that “many teens are content (if not happy) to start over with most of their accounts in most places”, and she has noted that for young people an online profile is “not seen as something to build an extensive identity around, but something to use to talk to friends in the moment”.

Now this is very different from what I do. I have had the same myYahoo account for 7 years and have only made minor tweaks to the layout over that period. I hate losing access to an account that I created, as I think I am losing a part of myself by abandoning an account. As such, I try and get the same login name and password for all my accounts, to ensure that I can remember and maintain access to them.

…but perhaps teenagers, experimenting with their identity in relationships, clothing styles and all other aspects of life are simply extending this playfulness to the virtual realm.

Not all young users are casual about their online identity, of course, and Boyd is at pains to point out that many young people invest heavily in aspects of their online activities. However, the willingness to abandon a profile as a work-in-progress and start over is definitely something I’ve observed in my children and their friends.

This approach to online identity has a number of implications for anyone trying to understand the way the internet is growing, and also carries an important lesson for those trying to build services or make money out of them.

One positive aspect is that it will make it harder to pin online activity onto a real person, since accounts that are created and quickly discarded will contain fewer identifying details.

…

More importantly, this casualness clearly renders any statistics about the number of signed-up users effectively meaningless, and this could be a big problem for the sites themselves as companies vie for investment and point to sign-ups as an indicator of popularity and future success.

Commentator Clay Shirky has been waging a campaign against the sloppy journalism of those who quote Linden Labs figures for Second Life “residents”. He points out that many happily accept the headline figure of two million users without considering that only 36,000 of those are paid-for accounts while a high but indeterminate proportion of the remainder are inactive, set up for free by people who tried out the service and then moved on.

It is the same with MySpace, Bebo or any of the other social sites, of course, and shows how poor we are at measuring what really goes on online.

Websites, having struggled for years to adapt to the idea of the pageview instead of the server request as the key measure of site activity, are now building interactive pages that occupy user attention and time but don’t generate hits or page views – and they don’t know how to measure this usage. Now it seems that the millions of signups on MySpace, Bebo and the other social network sites could be the same set of forgetful teenagers coming back again.

And again.

This is an interesting observation and adds another argument for better metrics to measure the value of online interactions. We have provided some ideas on the topic here, here and here. Danah Boyd, a PhD student at UC Berkeley, explains this phenomenon as follows:

Adults often worry about the amount of time that youth spend online, arguing that the digital does not replace the physical. Most teens would agree. It is not the technology that encourages youth to spend time online – it’s the lack of mobility and access to youth space where they can hang out uninterrupted.

…

Teens have increasingly less access to public space. Classic 1950s hang out locations like the roller rink and burger joint are disappearing while malls and 7/11s are banning teens unaccompanied by parents. Hanging out around the neighborhood or in the woods has been deemed unsafe for fear of predators, drug dealers and abductors. Teens who go home after school while their parents are still working are expected to stay home and teens are mostly allowed to only gather at friends’ homes when their parents are present.

Additionally, structured activities in controlled spaces are on the rise. After school activities, sports, and jobs are typical across all socio-economic classes and many teens are in controlled spaces from dawn till dusk. They are running ragged without any time to simply chill amongst friends.

By going virtual, digital technologies allow youth to (re)create private and public youth space while physically in controlled spaces. IM serves as a private space while MySpace provide a public component. Online, youth can build the environments that support youth socialization.

So multiple throwaway identities are another manifestation of teenagers experimenting with new looks and so on. Another question comes to mind: is the phenomenon of throwaway identities limited to teenagers? One place to look for an answer is the blogosphere. On serious blogs, commenters can leave comments under any name they like. Now, I have always used my own name when leaving a comment. But does anybody have stats on how many people use fake or context-sensitive names (like using the name ILOVEAPPLE when leaving a comment favorable to Apple)? My guess is that the use of fake identities is a lot less prevalent in the serious blogosphere than in teenage-oriented social media.

I would imagine that as these teenagers mature and settle on an identity they are comfortable with, they will focus on building their reputation around that identity. The lack of a mechanism by which users in the blogosphere and other social media can build a reputation around their identities might be contributing to the proliferation of these throwaway identities. Maybe all we are lacking are incentives for participants in online social media to maintain the same identity. What do you think?

Attributor

Interesting piece in the WSJ about a start-up in Redwood City called Attributor.

Attributor analyzes the content of clients, who could range from individuals to big media companies, using a technique known as “digital fingerprinting,” which determines unique and identifying characteristics of content. It uses these digital fingerprints to search its index of the Web for the content. The company claims to be able to spot a customer’s content based on the appearance of as little as a few sentences of text or a few seconds of audio or video. It will provide customers with alerts and a dashboard of identified uses of their content on the Web and the context in which it is used.

The company is looking to ensure that all content reproduction or other uses are properly attributed and paid for. This sounds fantastic…although a bit hard to believe. I guess the content fingerprint is based on an analysis of each piece of content, and a match is found by searching for unattributed pieces that resemble the original. What happens if somebody just modifies a verb in the unattributed content? Would Attributor be able to catch that? Maybe they just do a statistical analysis of the content? I am really curious to find out more.
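To make the "modified verb" question concrete, here is a minimal sketch of one common family of text fingerprints: overlapping word windows ("shingles") compared with Jaccard similarity. This is purely my illustration and an assumption about how such a system could work; the article does not say what Attributor actually does, and the function names and the window size k=3 are made up for the example.

```python
from __future__ import annotations

import hashlib
import re


def shingles(text: str, k: int = 3) -> set[str]:
    """Return the set of overlapping k-word windows found in the text."""
    words = re.findall(r"[a-z0-9']+", text.lower())
    if len(words) < k:
        return {" ".join(words)} if words else set()
    return {" ".join(words[i:i + k]) for i in range(len(words) - k + 1)}


def fingerprint(text: str, k: int = 3) -> set[str]:
    """Hash every shingle so the fingerprint does not hold the raw text."""
    return {
        hashlib.sha1(s.encode("utf-8")).hexdigest()[:12]
        for s in shingles(text, k)
    }


def similarity(a: str, b: str, k: int = 3) -> float:
    """Jaccard similarity of two fingerprints: shared hashes / all hashes."""
    fa, fb = fingerprint(a, k), fingerprint(b, k)
    if not fa and not fb:
        return 1.0
    return len(fa & fb) / len(fa | fb)


original = ("The company claims to be able to spot a customer's content "
            "based on a few sentences of text")
# Same sentence with a single verb swapped ("spot" -> "find").
edited = ("The company claims to be able to find a customer's content "
          "based on a few sentences of text")
unrelated = "Teenagers often create a new profile when they forget an old password"

print(round(similarity(original, edited), 2))     # high, but below 1.0
print(round(similarity(original, unrelated), 2))  # no shared shingles
```

With a scheme like this, changing one word only disturbs the few windows that contain it, so a lightly edited copy still scores well above unrelated text. Heavier paraphrasing would defeat it, which is part of why I am curious about the real approach.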

Wouldn’t a writer fingerprint be a better and more workable idea? I can imagine a writer having a fingerprint in terms of favorite words, syntax etc. Such a system would be able to handle situations such as the one being discussed by Valleywag and VentureBeat. It could also be very useful in social media as a way to establish the identity of a user.
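As a rough sketch of what I mean by a writer fingerprint, one could count how often an author uses a handful of common function words and compare two such profiles with cosine similarity. The word list below is an arbitrary illustration that I picked, not a validated feature set, and any real system would need far more text and many more features than this.

```python
import math
import re
from collections import Counter

# Illustrative list only; a real system would use a much larger feature set.
FUNCTION_WORDS = ("the", "a", "of", "and", "to", "in",
                  "that", "is", "i", "it", "but", "for")


def style_profile(text: str) -> list:
    """Relative frequency of each function word in the text."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = Counter(words)
    total = len(words) or 1  # avoid dividing by zero on empty text
    return [counts[w] / total for w in FUNCTION_WORDS]


def cosine(u, v) -> float:
    """Cosine similarity of two profiles; 0.0 if either one is all zeros."""
    dot = sum(x * y for x, y in zip(u, v))
    norm = math.sqrt(sum(x * x for x in u)) * math.sqrt(sum(y * y for y in v))
    return dot / norm if norm else 0.0


sample_a = "I think that the point of the piece is that it is hard to do"
sample_b = "The goal of the service is to get rid of spam, but it is not for me"
print(round(cosine(style_profile(sample_a), style_profile(sample_b)), 2))
```

Two short samples like these say very little about authorship; the point is only that the fingerprint belongs to the writer's habits rather than to any one piece of content.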

10 Minute Mail

10 Minute Mail is a new service for creating temporary email addresses. These addresses can be used for registering on sites that require users to provide an email address. The goal is to rid users of a lot of unsolicited spam emails. Chris Null from Yahoo! has a review of the service:

Well here’s a brain-dead simple solution to the problem: 10 Minute Mail (Note: Web traffic from this story may be causing the 10 Minute Mail site to crash. If it doesn’t load, try it again later.), which provides, for free, exactly what is promised in the name: An email address that vanishes after 10 minutes. There’s no registration, no verification. Just click over to the site and hit “Get my 10 Minute Mail e-mail address.” You’ll instantly be given an address that ceases to exist after 10 minutes. You can then use this address in filling out web forms or whatnot, and a very simple web-based interface gives you full access to any mail the account receives. You can reply to any messages, but you can’t send mail to an account that hasn’t already emailed you. If you can’t get the job done in 10 minutes, you can reset the timer to 10 minutes at any time. There’s no need to login, no password to remember.

For safe surfing and spam avoidance, I haven’t found a simpler, more elegant solution than 10 Minute Mail. It works flawlessly and couldn’t be easier to use. It’s earned a place in my Favorites folder. Give it a spin and see what you think!

I can see this being useful when you want to register for some event or something but you don’t want to receive any follow-on emails… Typically, though, most users (including me) already have an email address just for the purpose of registering for services that could send spam emails.

Now, what happens if a site requires users to give a valid email address as part of its terms of service (TOS)? Isn’t using 10minutemail-generated addresses a violation of such terms? Also, all the email addresses that this service generates are from the domain 10minutemail.com… Couldn’t the sites that ask for a user’s email address just reject addresses with the 10minutemail.com domain as part of email validation?
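The domain check I have in mind would be trivial for a site to add. Here is a minimal sketch, assuming the site keeps its own blocklist; the function name and the blocklist contents are hypothetical, and the only domain taken from the post is 10minutemail.com.

```python
# Hypothetical blocklist maintained by the site doing the validation.
BLOCKED_DOMAINS = {"10minutemail.com"}


def is_disposable(email: str, blocked=BLOCKED_DOMAINS) -> bool:
    """Return True if the address belongs to a blocked domain or subdomain."""
    _, sep, domain = email.rpartition("@")
    if not sep or not domain:
        raise ValueError("not an email address: %r" % email)
    domain = domain.strip().lower().rstrip(".")
    return any(domain == d or domain.endswith("." + d) for d in blocked)


print(is_disposable("abc123@10minutemail.com"))  # True
print(is_disposable("someone@example.com"))      # False
```

Of course, a check like this only works as long as the service keeps using a known domain, so it would likely turn into a cat-and-mouse game.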

Overall, it just seems like the wrong solution to the problem. The real solution is to punish businesses or service providers that spam their users, by signing out or boycotting them. Trying to fake one’s identity to avoid potential spam mail just does not seem like the right way to address this issue.

Anti Click Fraud

I did a post on click fraud some time back but I want to get back to the topic. Last week, VentureBeat profiled ClickFacts, a company focused on providing more accurate information to brand-owners about who clicks on their on-line ads.

ClickFacts offers a Web-hosted product. It doesn’t manually track the IP addresses of the people who are making the clicks, like many other click-fraud companies do; that doesn’t suffice anymore, because fraudsters are more sophisticated, using programs that cover up their IP addresses. Instead, ClickFacts has developed an analytics program that looks at other variables.

–It looks at customer’s keywords and analyzes which ones have a higher propensity for click-fraud.
–It compares the sites of competing companies, and shows which IP addresses are hitting each of those sites, and tries to assess patterns, such as whether the timing of clicks appear programmed, for example hitting every 3.4 seconds. But it’s also measuring traffic anomolies beyond IP address patterns, to test for “proxy” and others fraud sources.
–It tests for group activity, in case several publishing sites or other people have colluded to click on each other’s sites or other sites.
–It shows which Web sites are consistently seeing lots of clicks on an advertisement, but where those clicks aren’t leading to continued activity within an advertiser’s site.

This is somewhat different from what some of their competitors are doing…

An incumbent player in this area is Optimal iQ, which is owned by ClickForensics, but its approach relies more on reading logs.
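The timing pattern mentioned in the list above (clicks arriving every 3.4 seconds) is easy to picture in code. The sketch below flags a click stream whose gaps are almost perfectly regular, using the coefficient of variation of the gaps. It is my own illustration, not ClickFacts’ method, which the profile does not describe, and the thresholds are invented for the example.

```python
from statistics import mean, pstdev


def looks_programmed(timestamps, max_cv=0.05, min_clicks=5) -> bool:
    """Flag click times (in seconds) whose gaps are suspiciously regular.

    max_cv and min_clicks are made-up thresholds for illustration.
    """
    if len(timestamps) < min_clicks:
        return False  # too few clicks to say anything
    ordered = sorted(timestamps)
    gaps = [later - earlier for earlier, later in zip(ordered, ordered[1:])]
    average = mean(gaps)
    if average == 0:
        return True  # every click at the same instant
    # Coefficient of variation: spread of the gaps relative to their mean.
    return pstdev(gaps) / average <= max_cv


bot_clicks = [i * 3.4 for i in range(10)]              # one click every 3.4 s
human_clicks = [0, 2.1, 9.7, 11.0, 30.5, 31.2, 58.9]   # irregular gaps

print(looks_programmed(bot_clicks))    # True
print(looks_programmed(human_clicks))  # False
```

A fraudster who adds random delays would slip past a check like this, which is presumably why the profile talks about combining several signals rather than relying on one.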

It’s still not clear, though, what brand-owners can do with the data generated by these companies. Can they go back to Google and ask for a refund (apparently Google and Yahoo! only refund 1% of your advertising $$ spend)? As expected, Google has come out critically against ClickFacts and its VentureBeat profile (posted as updates to the story).

Update: Google has responded to this here and here, sending some reports arguing that ClickFacts has counted as fraudulent clicks some clicks that appear benign. It has to do with when users use the back arrow after clicking through to a advertisers page: When you click on a Kodak ad, for example, and land on the Kodak page, and then click on a specific camera there, and then click back to go look at another camera, ClickFacts is wrongly counting that as a bad click.

Update II: ClickFacts has responded, in turn, saying it is surprised Google is bringing this up. ClickFacts fixed this “back arrow” problem in June, after Google first brought it to ClickFact’s attention, Caruso tells VentureBeat. In fact, he showed us a demo that strongly suggests to us ClickFacts is no longer counting those clicks as bad. He said Google’s reports above refer to an analysis of ClickFacts in February, and that Google knows ClickFacts has solved it, and so this should be a moot point. He said he wants to work with Google on other click-fraud matters, too.

The real issue here is that we need a good and universally accepted way to establish the identity of a user. It’s a hard problem. Another manifestation of this problem is Digg with its SpikeTheVote-related issues… There might not be any quick fixes, as this might require a very expensive infrastructure solution. Still, the importance of solving this problem cannot be overstated. The future of on-line commercial and community activity may depend on it.

Anonymity is not privacy

I am quickly becoming a fan of Dave Kearns and his Identity Management Newsletter in Network World. Dave discusses complex identity-related issues but manages to write in a very simple and easy-to-read style. In his latest installment, Dave talks about the difference between privacy and anonymity:

I’d like to begin a discussion on anonymity as it relates to identity and technology. As noted last month, anonymity and privacy are frequently confused. One difference though is that privacy is almost always absolute (either something is private or it is not) while anonymity can be relative. If you look up “anonymity” at answers.com, you’ll find some variations in definition:

* “The quality or state of being unknown or unacknowledged.” (The American Heritage Dictionary of the English Language, Fourth Edition)

* “The quality or state of being obscure.” (Roget’s II: The New Thesaurus, Third Edition)

Anonymity is characteristic of interactions in a specific context…Like you getting a coffee from a coffee shop or leaving a comment with a made up name on a forum.

If I join a chatroom where I’m only known as “SillyGrrl” I may think I’m anonymous because I think no one knows my true identity. But the chatroom has the IP address I use to converse and my ISP knows who was using that IP address at that time. Even if I go to a library terminal or an Internet café, there are records of who used which machine and IP address at any given time. Privacy considerations may lead to those records being destroyed periodically – monthly, weekly, daily – even hourly. But anyone with the wherewithal to be watching while I connect (just as the police were watching outside the coffee shop) can shatter the façade of anonymity and connect the activity to me.

In the course of our lives and throughout our day, we go in and out of various contexts in various states of anonymity. We might assume that our status in a particular context is anonymous, depending on whether we share uniquely identifiable information in that context. But as Dave points out, and as outlined in this excellent video from a Google tech talk, “You Are What You Say: Privacy Risks of Public Mentions“ (thanks Nitin for pointing this out), the risk to your identity from somebody taking the time to collapse and search across such contexts is severe. Anybody remember the AOL search data release fiasco? I guess, with this background, we can define privacy as a guarantee that your data will be kept silo-ed and not shared or merged with other contexts.
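To show how little it takes to collapse two contexts, here is a toy sketch that links a pseudonym in one context to an account in another purely by the overlap in what each has publicly mentioned. All of the data, account names and the min_overlap threshold are invented for illustration (only the “SillyGrrl” handle comes from Dave’s example), and this is not a description of the method from the talk.

```python
from typing import Dict, Optional, Set, Tuple


def best_match(profile: Set[str],
               candidates: Dict[str, Set[str]],
               min_overlap: int = 2) -> Tuple[Optional[str], int]:
    """Return the candidate sharing the most mentions with the profile.

    Returns (None, overlap) when the best overlap is below min_overlap.
    """
    best_name, best_overlap = None, 0
    for name, items in candidates.items():
        overlap = len(profile & items)
        if overlap > best_overlap:
            best_name, best_overlap = name, overlap
    if best_overlap < min_overlap:
        return None, best_overlap
    return best_name, best_overlap


# Context 1: things a pseudonymous chatroom user has talked about.
chatroom = {"SillyGrrl": {"indie film x", "corner cafe y", "garage band z", "coffee"}}
# Context 2: public reviews posted under other accounts (hypothetical).
reviews = {
    "reviewer_17": {"indie film x", "corner cafe y", "garage band z"},
    "reviewer_42": {"coffee", "blockbuster q"},
}

print(best_match(chatroom["SillyGrrl"], reviews))  # ('reviewer_17', 3)
```

Mentions of rare things do most of the work here: plenty of people mention coffee, but very few mention the same three obscure items.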

The upshot: we should be careful about what we say in public forums, because even with a rudimentary search across contexts, people may be able to find out a lot about us. Even scarier is the idea of somebody forming a company just to search across various public contexts on behalf of clients… In fact, I am pretty sure such companies already exist. So be careful.

KarmaWeb

Personal Weblog of Jitendra Gupta

Category Archives: Anonymity