The New Yorker Serves Up Spam

Just yesterday I was thinking to myself that I wanted to know more about spam (the internet variety). I get a few hundred junk comments in this site’s spam filter every day, and while most of them are either about sluts or Paxil, some have a distinctly literary quality to them. It made me wonder about the humans behind spam, and I thought that I should research the phenomenon. Within 24 hours, I ran across this article in the new issue of the New Yorker about “the losing war on junk email.” It doesn’t have quite the humanist edge that this astounding story did (about an intelligent but monumentally gullible Massachusetts psychotherapist who fell prey again and again to those Nigerian email scammers who want you to wire them money), but it’s a pretty eye-opening story about the tricky world of smoked ham.

A spammer’s job is to confound the filters. The spellings “V1agra” or “Vi-agr@” mean nothing to a machine, but almost any human reader gets the point. In 2002, the programmer Paul Graham wrote an essay called “A Plan for Spam,” which became an intellectual manifesto for the thousands of researchers trying to find a way to clean up the Internet. “I think it’s possible to stop spam, and that content-based filters are the way to do it,” he wrote. “The Achilles’ heel of the spammers is their message. They can circumvent any other barrier you set up. But they have to deliver their message, whatever it is. There is no way they can get around that.”

Graham compared every character—dashes, apostrophes, numbers, symbols—in thousands of genuine e-mails with those in thousands of pieces of spam. He was able to train his software to use the context of a message to guess how likely it was that an e-mail containing certain words in relation to each other was spam. The words “republic” and “madam” seem innocent enough, but when they appear together in an e-mail they are often from a Nigerian huckster who has addressed his e-mail “Dear Sir or Madam.” Mail like that is invariably spam.

As filters become more sophisticated, spam becomes more elusive. There are millions of ways to write a word using punctuation, numbers, and other symbols. One mathematically minded blogger who looked into it found that there are 600,426,974,379,824,381,952 ways to spell Viagra. “If I thought that I could keep up current rates of spam filtering, I would consider this problem solved,” Graham wrote. “But it doesn’t mean much to be able to filter out most present-day spam, because spam evolves.” Indeed, most anti-spam techniques so far have been like pesticides that do nothing other than create a more resistant strain of bugs.
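For the curious, here is a rough sketch of the kind of content-based filtering the excerpt describes: count how often each token shows up in known spam versus known genuine mail, turn that into a per-token spam probability, and combine the most telling tokens into one score. This is a simplified naive-Bayes-style toy under my own assumptions, not Graham's actual code. The function names (`tokenize`, `train`, `spam_score`), the 0.01/0.99 clamps, the 0.4 default for unseen words, and the 15-token cutoff are all illustrative choices.

```python
import re
from collections import Counter

# Assumption: letters, digits, and a few symbols count as part of a token,
# so obfuscated spellings like "v1agra" or "vi-agr@" survive as single tokens.
TOKEN_RE = re.compile(r"[A-Za-z0-9'$@!-]+")


def tokenize(text):
    """Split a message into lowercase tokens."""
    return [t.lower() for t in TOKEN_RE.findall(text)]


def train(spam_msgs, ham_msgs):
    """Estimate, per token, how strongly it points toward spam.

    Both lists must be non-empty. Each token is counted once per message.
    """
    spam_counts = Counter(t for m in spam_msgs for t in set(tokenize(m)))
    ham_counts = Counter(t for m in ham_msgs for t in set(tokenize(m)))
    n_spam, n_ham = len(spam_msgs), len(ham_msgs)

    probs = {}
    for tok in set(spam_counts) | set(ham_counts):
        s = spam_counts[tok] / n_spam  # fraction of spam containing tok
        h = ham_counts[tok] / n_ham    # fraction of genuine mail containing tok
        # Clamp so that no single token is treated as absolute proof.
        probs[tok] = min(0.99, max(0.01, s / (s + h)))
    return probs


def spam_score(text, probs, unknown=0.4, top_n=15):
    """Combine the top_n most telling token probabilities into one score in [0, 1]."""
    ps = [probs.get(t, unknown) for t in set(tokenize(text))]
    # "Most telling" here means farthest from the neutral value 0.5.
    ps.sort(key=lambda p: abs(p - 0.5), reverse=True)

    spam_like = 1.0
    ham_like = 1.0
    for p in ps[:top_n]:
        spam_like *= p
        ham_like *= 1.0 - p
    return spam_like / (spam_like + ham_like)


if __name__ == "__main__":
    spam = [
        "Dear Sir or Madam, the republic needs your bank details",
        "Buy V1agra now, dear madam",
    ]
    ham = [
        "Lunch tomorrow at noon?",
        "The meeting notes are attached, see you tomorrow",
    ]
    probs = train(spam, ham)
    print(spam_score("Dear Madam, wire money to the republic", probs))
    print(spam_score("See you at lunch tomorrow", probs))
```

With a training set this tiny the scores are close to all-or-nothing, but it shows why “republic” and “madam” together can sink a message. It also shows the weakness the excerpt points to: a spelling the filter has never seen starts out near neutral until somebody retrains it.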

via Metafilter

Comments (3) to “The New Yorker Serves Up Spam”

  1. Many thanks for the pointer

  2. chas, what’s your personal email address?

  3. The guy in the New Yorker story….what a train wreck!
    How are you going to protect a guy from himself when he still believes his swindlers were on the level nice guys?