What’s Worse: Facebook Breaches Or Scraping?

The latest Facebook-related privacy breach is much ado about little, according to several observers. Larry Magid, writing at Huffington Post Technology, explains what happened this way: “The culprit in this case is not bad intentions but bad Internet code.” Magid says the cause of the problem was “referer” code, something that has been around as long as Web links have.

Referer code, as The Wall Street Journal noted in its article about the issue, “passes on the address of the last page viewed when a user clicks on a link. On Facebook and other social-networking sites, referers can expose a user’s identity.” The applications that were doing this “were passing the User ID,” a violation of Facebook’s clearly stated privacy policy that “prohibits application developers from disclosing ‘user information to ad networks and data brokers.'”
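To make the mechanics concrete, here is a minimal sketch of how a referer can leak an identity. It assumes a made-up site, `social.example`, that puts a numeric user ID in a profile URL's `id` parameter; that URL shape and both function names are my own illustration, not Facebook's actual code or any application's.

```python
from urllib.parse import urlparse, parse_qs


def build_request_headers(current_page_url):
    """Mimic a browser following a link: the address of the page
    being left is sent along as the Referer header."""
    return {"Referer": current_page_url}


def extract_user_id(headers, param="id"):
    """Show what a third party (say, an ad network) could read from
    the header it receives. The 'id' parameter name is an assumption."""
    referer = headers.get("Referer", "")
    query = parse_qs(urlparse(referer).query)
    values = query.get(param)
    return values[0] if values else None


# Hypothetical profile page the user was viewing when they clicked a link.
profile_page = "https://social.example/profile.php?id=123456789"

headers = build_request_headers(profile_page)
print(extract_user_id(headers))  # prints 123456789
```

Nobody in this sketch has to do anything underhanded for the ID to travel: the browser sends the header on its own, which is Magid's point about the Web's plumbing.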

“What’s at issue here,” Magid says, “is a technical bug in the Web’s plumbing that doesn’t just affect Facebook’s App developers but other companies as well. And while Facebook — one of the world’s largest web properties with 500 million users — ought to be doing all it can to fix this problem, the company is far from the only culprit.”

So it appears that some of the weeping and teeth-gnashing over Facebook has been unnecessary.

On the other hand, scraping is not an accident of code, but an intentional method of getting information on individuals by, for example, copying every message on a private message board. A story in the WSJ’s ongoing series, What They Know, revealed how Nielsen scraped info from a PatientsLikeMe private discussion board in which people share intimate details about their medical conditions, prescriptions used, and more. Nielsen, the article says, “monitors online ‘buzz’ for clients, including major drug makers, which buy data gleaned from the Web to get insight from consumers about their products.”

Scrapers are looking for identifying information. It’s their business, and business is good, based on the WSJ article: “Spending on data from online sources is set to more than double, to $840 million in 2012 from $410 million in 2009.”

The mission of one scraper, in particular, bothered me. “New York-based PeekYou LLC has applied for a patent for a method that, among other things, matches people’s real names to the pseudonyms they use on blogs, Twitter and other social networks.” PeekYou says it hands over only demographic information, not personally identifying info. But the nirvana for marketers is identity, which enables one-to-one targeting. Scraping takes some technical knowledge but is not difficult, so I have to wonder how long companies that are good at it will be able to hold off demands for specific identification.

Scraping attacks are increasing. “At Monster.com, the jobs website that stores résumés for tens of millions of individuals, fighting scrapers is a full-time job, ‘every minute of every day of every week,’ says Patrick Manzo, global chief privacy officer of Monster Worldwide Inc.” Sites that used to deter 1K to 2K attacks a month are now fighting from three to ten times as many, the Journal says.

There’s much more – and much more disturbing – info in the WSJ article, so it is worth reading.

What I am left thinking, after reading about scraping and about the latest Facebook issue, is that deliberate scraping attacks present a much more serious threat to me, and one I can do virtually nothing about. Even if I deleted my virtual identity altogether, everything already out there would stay there. Facebook presents privacy problems, that’s clear, but I joined and I have to accept some responsibility for what I post there – not, of course, for what gets stolen. Scraping, for now anyway, operates in what appears to be a gray area legally, though not ethically, IMO. Just because you can doesn’t mean you should, even if there is an increasing demand that you do.