Wednesday, December 15, 2010

There's the ripple: From Gawker comments to personally identifying information

Okay, so Gawker was hacked. One might ask, "Why?" But I think the more interesting question is, "So what?"

On December 12, hackers posted a list of usernames and passwords taken from a batch of more than a million user accounts on Gawker Media web sites. (Gawker includes online media properties such as Gizmodo.) According to the Wall Street Journal, the passwords were encrypted, but the hackers decoded 188,279 of them and published them. A list of the 50 most used passwords among those decoded was also published. One commentary complains that the most used passwords are extremely weak. But let's keep in mind that there are about 800,000 passwords the hackers didn't publish, and the reason might be that they were too difficult to decode, or at least would take more time to decrypt. The top 5 in the decoded dataset were


The top choice, 123456, came in at over 3,000 uses within the dataset of 188,279. Taken together, the top 5 appear to cover about 7,000 passwords in the decoded, published dataset, or roughly 1 in 27 passwords.
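
As a quick sanity check on that proportion, here is a minimal Python sketch that uses only the rounded figures from the paragraph above. The variable names are mine, and 7,000 is the estimate given in this post, not an exact count.

```python
# Rounded figures taken from the post; not exact counts.
decoded_total = 188_279  # decoded passwords that were published
top5_uses = 7_000        # estimated combined uses of the top 5 passwords

# Share of the decoded dataset covered by the top 5 passwords.
share = top5_uses / decoded_total
one_in_n = round(decoded_total / top5_uses)

print(f"Top 5 share of decoded passwords: {share:.1%}")
print(f"That is about 1 in {one_in_n} passwords")
```

With these inputs the share comes out to a few percent of the decoded passwords, so the weakest choices are common but far from the majority.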

So people have weak passwords on Gawker sites. So what?
Does it matter here? These are accounts set up so people can leave comments on online articles, thus preventing most spammers from taking over the comments. It's not like it's a big deal. The accounts are a hall pass for access. The analysis of the leaked list goes on to examine whether there are differences in password usage by email provider (I assume they're going by the domain in the email addresses used as usernames for Gawker accounts). I think that is missing the point.

There are a few problems with the practice of requiring accounts for comments to prevent spam. First, it puts the burden of keeping a clean site onto the users, rather than implementing stronger security on the server side. Second, requiring a password to leave comments may stifle some would-be brilliant insights because people don't want to register on the site, which is not a great way to encourage engagement. Third, people use the same passwords on many sites. I don't blame users for this when there are sites like Gawker's that require accounts to do basic things that wouldn't normally risk users' privacy and identity.

Let's talk a bit more about the first and third problems. They have some things in common.

Account management is for the convenience of site owners, not the protection of the users
Gawker Media and other media web sites have forced commenters to create accounts on their sites to prevent spammers from taking over the commenting space. Not having spammers makes it much easier to moderate the comments (if you're going to at all). So, the sites have traded the convenience of their readers for their own. That is, rather than employ someone or some technology to deal with the spam (or a combination), they implement an account management system, thus putting the burden on their readers to prevent comment spam.

Account management systems are often implemented to make it easier for IT and Security to do their jobs. While dealing with password maintenance issues has a cost, the cost is higher for users than for the organization. For the organization that is looking just at saving IT money, it's a win. For the organization that wants to create a loyal audience, registering on a site and maintaining a password create obstacles to participation.

By putting the burden on users to stop comment spam, media companies actually make their users' data less secure
As we know, people use the same usernames and passwords in as many places as they can. On average, people have between 15 and 25 username-password combinations they use every day. People who work with complex systems often have many, many more. So, when users make the tradeoffs between respecting security policy and getting to their goal, they make reasonable choices, usually in favor of their own efficiency. The result is that they use the same username and password in multiple places, for both very risky, highly personal situations such as online banking and low-security, low-risk scenarios like leaving comments on Gizmodo.

A requirement like Gawker's has possibly inadvertently compromised the personal security of more than a million of its readers. When a hacker knows one username and password for you, along with anything else about you, it is fairly easy to break into all kinds of accounts you access online.

The security experience is the worst part of using nearly every site
IT has owned login and registration for so long that designers and users alike have been trained to put up with whatever security engineers say needs doing. We rarely question the purpose of a security policy, what it is in response to, what the tradeoffs are, how it fits into the larger security plan of an organization, and what we want the security experience to be for users. Most of the implementations are made without any user research or usability data at all.

As is the case with many security decisions in organizations, each issue is treated in isolation. Who would have thought that comment spam would interact with a) the security of the servers and b) the security of users' personally identifying information?

See also:
from Jeff Atwood

from Richi Jennings at ComputerWorld

the announcement from Gawker 

ADDED 15 December 2010 at 3:30pm EST, from Karen Bachmann, from an email to me: 

Good points, Dana. I recently worked with a client who has a strong financial need to know that they are reaching the right audience, highly specialized professionals looking for detailed technical information. Visitors are currently required to set up an account and have to log in with username and password each time they visit. The information, though, is not restricted. Anyone can create an account.

When I interviewed members of their intended audience, having to log in to get to non-sensitive information was a huge but familiar barrier to entry. Most people were resigned to this with all websites of this type in their field, but none were happy about it. During the interviews, I actually asked people to interact with the site and saw several problems regularly. 1) Some of those who had created accounts forgot completely that they had, because of the time between visits. 2) Visitors who knew they had accounts forgot their credentials, a problem they indicated was common in their interactions with other sites of this type. 3) The most Web-savvy stated that they had a standard "throwaway" set of credentials that they would always use on a site like this. When asked about their likely use of the site, most said that they would usually just go elsewhere for the information when the credentials got in their way.

Since the basic use wasn't really about guarding access to the information, I recommended to my client that they simply request an email address as a short-term solution, and omit full credentials for access. If they still required an account (more details about the user), account management would require a login. However, in the longer term, technology could actually take most of the burden from users. The company has a huge database of contacts that could be used to cross reference emails entered. Handling this on their servers would actually provide them with even more data about types of users than the account information they collected.

The managers at the client indicated they really just assumed that the only way they could gather their information was with an account and credentials model. They had not really considered their real needs against the user perception and goals. They intend to make the change I recommended as part of their redesign, which is still pending.


  1. Great points all, but what do you suggest to combat bot-delivered spam?

    Human moderation only works for properties large enough to fund it, and comment spam affects even tiny author-operated blogs.

    CAPTCHA is already broken a few ways, tremendously user-hostile, and only a rear-guard action even if it worked superbly today.

    You can't reasonably block every bot IP address in the world, and detecting and blocking is work akin to moderation anyway.

    So...what? It's easier to ID the problem than to fix it.

  2. Fair enough. It's true, I don't know the answer. And I agree that CAPTCHA is hostile in a lot of ways.

    I guess I'd like to see some creative research on how to deal with the problem of spam bots.

    I do know that it's just lucky that it happened on Gawker and not on or Because when it does, it will be much more ugly for people, and the ripple will be much more like a tidal wave.

  3. Off the cuff, I can think of some hybrid solutions to the bot problem, JimmieDave, but all of my ideas put at least some burden on the users: email confirmations on first post, a variation on a community ranking/policing system currently used to manage trolls, giving users quick access to all their posts and perpetual editing/delete rights. These may be preferable because they demand less effort than logging in each time and reduce risky re-use of credentials, but none are perfect.

    Having seen how trolls are dealt with in some online communities, users may not object to a role in policing bots and spam--if that is the actual goal of credentials. Unfortunately, the credential system that most sites use is, well, a lot like the TSA. The innocent majority suffer with burdensome rules targeting (usually ineffectively) the guilty minority.

    While user research can help determine the tolerance for spam vs. acceptable responsibility as a member of a community, I think that the optimal design approach *generally* will be a system of solutions that blend technology and human interventions. But someone far more clever than I may suggest the elegant single solution that eludes me so far. :)