# Anti-anti-spam measures



## D. Strout (Jun 4, 2013)

Like it or not, we often have to post our e-mail addresses online, and that attracts spammers. I've seen some pretty elaborate "anti-spam" measures for disguising an e-mail address. But these measures forget some simple principles of how e-mail addresses look: they use a limited set of characters, and the stand-ins for the period and @ symbol are pretty standard. As a proof of concept, I made an anti-anti-spam script that takes your "spam-proof" address and returns it to its normal form.

Mostly, I wrote it in response to the elaborate e-mail disguise I saw at the end of HTTP Zoom's recent offer. All the slashes and brackets can be discarded, since you don't see those in e-mail addresses. Ditch the spaces and you're good to go. They didn't even change the @ and period. You get my point.

You can find the script *here*. Try feeding it your e-mail address as you usually write it online and see if it holds up. If it does, good for you; post your method here. If not, check out the source and see how it works so you can improve. It is by nature rather "greedy": it will convert anything it can. Spammers can afford to miss a fair number of addresses; they don't care. If you really want to evade this sort of thing, put a slash in your e-mail address. It is valid, between quotes.
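A minimal sketch of the greedy idea, in Python, assuming the input is split on whitespace. The name `deobfuscate` and the exact separator and junk-character sets are illustrative assumptions, not taken from the linked script. It also shows why the slash trick works: a greedy pass throws away the `/` and the quotes, so it returns the wrong address instead of the real one.

```python
import re

# Hypothetical sketch of the "greedy" approach, not the linked script itself.
# Whitespace-separated tokens that stand in for "@" or "." are swapped back,
# ignoring any brackets or parentheses around them. Matching is case-sensitive,
# so only uppercase "AT"/"DOT" count as separators.
AT_WORDS = {"@", "AT", "A-T"}
DOT_WORDS = {".", "DOT", "D-O-T"}

# Characters the greedy pass discards on the theory that "you don't see those
# in e-mail addresses". This is a simplification: some of them, such as "/",
# can appear in real addresses.
JUNK = re.compile(r'[\[\](){}<>/\\"\s]')


def deobfuscate(text: str) -> str:
    pieces = []
    for token in text.split():
        core = token.strip("[](){}<>")  # "[@]" -> "@", "(DOT)" -> "DOT"
        if core in AT_WORDS:
            pieces.append("@")
        elif core in DOT_WORDS:
            pieces.append(".")
        else:
            pieces.append(token)
    # Glue the pieces together and strip everything considered junk.
    return JUNK.sub("", "".join(pieces))
```

For example, `deobfuscate("myemail [@] myemail [.] com")` gives `myemail@myemail.com`, while `deobfuscate('"my/name"@example.com')` gives `myname@example.com`, which is not the real address.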

Some examples of addresses that will be made "spammable" by this:

MyEmail [remove] AT my (no-spam) e_mail DOT n-e-t

[removethis] my-email [A-T] _ something DOT com

myemail AT myemail DOT com

myemail [@] myemail [.] com

The biggest issue is determining which strings to remove. Right now it removes "remove", "removethis", and "nospam", but you could use something else. If I were to beef this up, I could probably tell it to remove anything between brackets. Or I could get _really_ fancy and check whether most of the characters are lowercase, and if so, remove the uppercase ones. The point is, once you get past ditching brackets, spaces, and the like, it becomes hard for the computer to determine what else doesn't belong.
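One way to sketch the "remove anything between brackets" idea without breaking addresses like `myemail [@] myemail [.] com` is to keep a bracketed group only when it holds a stand-in for @ or the period. This is a hypothetical pre-pass: the name `strip_decoys`, the separator set, and the decoy-word pattern are assumptions, and its output would still need the usual AT/DOT replacement afterwards.

```python
import re

# Hypothetical pre-pass, not the linked script. Bracketed or parenthesized
# groups are dropped unless their content looks like a stand-in for "@" or "."
# (otherwise "[@]" and "[.]" would be destroyed along with "[remove]").
SEPARATORS = {"@", ".", "AT", "A-T", "DOT", "D-O-T"}

# The decoy words mentioned in the post, matched case-insensitively as whole
# words, with an optional hyphen in "no-spam".
DECOY_WORDS = re.compile(r"\b(?:remove(?:this)?|no-?spam)\b", re.IGNORECASE)

BRACKETED = re.compile(r"\[([^\[\]]*)\]|\(([^()]*)\)")


def strip_decoys(text: str) -> str:
    def keep_or_drop(match):
        # group(1) is the [...] content, group(2) the (...) content.
        inner = match.group(1) if match.group(1) is not None else match.group(2)
        # Keep separator stand-ins such as "[@]" or "[A-T]"; drop everything else.
        return match.group(0) if inner.strip().upper() in SEPARATORS else " "

    text = BRACKETED.sub(keep_or_drop, text)
    text = DECOY_WORDS.sub(" ", text)
    # Collapse the whitespace left behind by the removals.
    return " ".join(text.split())
```

For instance, `strip_decoys("[removethis] my-email [A-T] _ something DOT com")` returns `my-email [A-T] _ something DOT com`. The stray `_` that survives is exactly the kind of leftover that is hard to classify automatically.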


----------



## KuJoe (Jun 4, 2013)

Just use images.


----------



## D. Strout (Jun 4, 2013)

OCR: run that image through http://www.i2ocr.com/. The result is close enough that, once it's run through my script, you have the address. Maybe don't name the image "email.png" on the server?


----------



## NodeBytes (Jun 4, 2013)

recaptcha - http://www.google.com/recaptcha/mailhide/d?k=01IcAt8nYLhCSDhXfQn5eQ_Q==&c=cOaxb0PXqc45znKdT7WTP8jYIjKS8h9hGUZCg9j9VPo=


----------



## shovenose (Jun 4, 2013)

bcarlsonmedia said:


> recaptcha - http://www.google.com/recaptcha/mailhide/d?k=01IcAt8nYLhCSDhXfQn5eQ_Q==&c=cOaxb0PXqc45znKdT7WTP8jYIjKS8h9hGUZCg9j9VPo=


useless


----------



## NodeBytes (Jun 5, 2013)

@shovenose - It's an interesting idea.


----------



## drmike (Jun 5, 2013)

The solution to prevent this harvesting is to make the email an image, and to do so with a non-traditional font: something readable by humans, but not so simple to OCR.

I never push emails out in public. But I've been known to push folks to a contact form. That works and gets around the initial public email handout.


----------



## Damian (Jun 5, 2013)

http://www.google.com/recaptcha/mailhide/ is good if your address will be on a website... not so good when you have to input it somewhere.

I think back to The Old Days where email addresses were everywhere. I remember them being used as identifiers on forums, on PHP.net documentation comments, etc. We were so innocent back then.


----------



## JDiggity (Jun 5, 2013)

Ahhh I just block the stuff and keep moving.


----------



## mojeda (Jun 5, 2013)

reCAPTCHA was cracked a long time ago, wasn't it?


----------



## D. Strout (Jun 5, 2013)

mojeda said:


> recaptcha was cracked a long time ago wasn't it?


Yes and no. This article summarizes how the crack worked, but it also says it was fixed by Google before the crackers even got a chance to demo their work.


----------



## nunim (Jun 5, 2013)

D. Strout said:


> Yes and no. This article summarizes how the crack worked, but it also says it was fixed by Google before the crackers even got a chance to demo their work.


Neat article. I didn't know you only needed to enter one word; it seems like that would make it easier to crack.


----------



## mojeda (Jun 5, 2013)

The original creator/founder of reCAPTCHA did an AMA on Reddit a few days ago, I think; he answered a few questions about CAPTCHA/reCAPTCHA.

I'll look for the link later when I get home.


----------



## MCH-Phil (Jun 5, 2013)

nunim said:


> Neat article, I didn't know you only needed one word to be entered, seems like that would make it easier to crack.


I used that lil trick for a while to save time.  It stopped working a few months ago.  I've not tried since.


----------



## D. Strout (Jun 5, 2013)

nunim said:


> Neat article, I didn't know you only needed one word to be entered, seems like that would make it easier to crack.


The whole point of the ReCAPTCHA project was to digitize books. The premise was simple: take a word that a computer could successfully OCR and pair it with one it could not. If the user entered the known word correctly, you could be relatively certain that their answer for the unknown word was correct too. If you showed the same prompt to two or three folks and they all answered the second word the same or similarly, that's one more word of a book digitized. That's why the above trick works (worked?): it entered the known word correctly, and a glitch made the unknown word unnecessary, presumably through some trickery with the two spaces. As @MCH-Phil mentioned, it no longer works; I assume they realized that both words should be required.

So, in case you didn't know: when you bypass a ReCAPTCHA, you slow the progress of book digitization. That said, since the Google takeover I don't mind; they've made those things so hard to read that I'd be happy to bypass them any way I can.
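The two-word scheme described above can be modeled roughly as follows. This is a hypothetical sketch, not Google's implementation: the class name, the case-insensitive exact match (standing in for "the same or similarly"), and the agreement threshold of three are all assumptions. The check that rejects an empty second word models the kind of fix D. Strout suspects stopped the one-word trick.

```python
from collections import Counter
from typing import Optional

AGREEMENT_NEEDED = 3  # assumption: "two or three folks" must give the same answer


class WordPairChallenge:
    """Hypothetical model: one control word with a known answer, one unknown word."""

    def __init__(self, control_answer: str) -> None:
        self.control_answer = control_answer.lower()
        self.unknown_votes = Counter()  # transcription -> number of users who gave it

    def submit(self, control_guess: str, unknown_guess: str) -> bool:
        """Return True if the user passes the challenge."""
        if control_guess.strip().lower() != self.control_answer:
            return False  # failed the known word: reject and ignore the other guess
        if not unknown_guess.strip():
            return False  # require both words, so the one-word trick no longer passes
        # The known word was right, so trust this user's reading of the unknown word.
        self.unknown_votes[unknown_guess.strip().lower()] += 1
        return True

    def digitized_word(self) -> Optional[str]:
        """Return the agreed transcription once enough users match, else None."""
        if not self.unknown_votes:
            return None
        word, count = self.unknown_votes.most_common(1)[0]
        return word if count >= AGREEMENT_NEEDED else None
```

In this model, answers from users who miss the control word never count toward the unknown word, which is what lets the project trust the crowd's transcription.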


----------



## nunim (Jun 6, 2013)

I tested it on the official reCAPTCHA site right before I posted, and it worked just fine.


----------



## Jono20201 (Jun 6, 2013)

Luckily, my name, jon*at*han, breaks your script. However, that doesn't make my mailbox foolproof. Luckily, Gmail usually does a good filtering job.


----------



## D. Strout (Jun 6, 2013)

Jono20201 said:


> Luckily my name jon*at*han breaks your script, however that doesn't full proof my email box. Luckily gmail usually does a good filtering job.


It depends on how you obfuscate your address. If you were to write something like "[jonathan] AT [gmail] DOT [com]" or "[jonathan] @ [gmail.com]", it would go through correctly. It only breaks if you turn the @ into a lowercase word.
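To illustrate why a lowercase "at" is the problem case, here are two hypothetical ways a de-obfuscator might look for a spelled-out separator; neither is claimed to be the linked script, and both function names are made up for this example. A case-insensitive substring match mangles a name like "jonathan", while an uppercase, whole-word match leaves the name alone but cannot recognize a lowercase "at" at all.

```python
import re


def naive_replace(text: str) -> str:
    # Case-insensitive substring match: also hits the "at" inside "jonathan".
    text = re.sub(r"at", "@", text, flags=re.IGNORECASE)
    text = re.sub(r"dot", ".", text, flags=re.IGNORECASE)
    return text.replace(" ", "")


def word_replace(text: str) -> str:
    # Whole-word, uppercase-only match: "AT" and "DOT" must stand alone.
    text = re.sub(r"\bAT\b", "@", text)
    text = re.sub(r"\bDOT\b", ".", text)
    # Then strip brackets and whitespace.
    return re.sub(r"[\[\]\s]", "", text)
```

Here `naive_replace("jonathan at gmail dot com")` yields `jon@han@gmail.com`, while `word_replace("[jonathan] AT [gmail] DOT [com]")` and `word_replace("[jonathan] @ [gmail.com]")` both yield `jonathan@gmail.com`. Given the lowercase form, `word_replace` just produces `jonathanatgmaildotcom`.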


----------

