Anti-anti-spam measures

D. Strout

Resident IPv6 Proponent
Like it or not, we often have to post our e-mail addresses online, and that attracts spammers. I've seen some pretty elaborate "anti-spam" measures to disguise one's e-mail address from spammers. But these anti-spam measures forget some pretty simple principles of how e-mail addresses look. Addresses draw on a limited set of characters, and the period and @ symbols are usually replaced with pretty standard stand-ins. As a proof of concept, I made an anti-anti-spam script that takes your "spam-proof" address and returns it to its normal form.

Mostly it was in response to the elaborate e-mail disguising which I saw at the end of HTTP Zoom's recent offer. All the slashes and brackets and such can be discarded, since you don't see those in e-mail addresses. Ditch spaces and you're good to go. They didn't even change the @ and period. You get my point.

You can find the script here. Try throwing it your e-mail as you usually write it online and see if it holds up. If it does, good for you; post your method here. If not, check out the source and see how it works so you can improve. It is by nature rather "greedy": it will convert anything it can. Spammers can afford to miss a fair number of e-mail addresses; they don't care. If you really want to evade this sort of thing, put a slash in your e-mail. A slash is valid in the part before the @, and certainly between quotes.
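To show why a slash helps, here is a minimal sketch in Python. It assumes a harvester that throws away every character outside a small "address alphabet"; the function name and the example address are made up for illustration and are not taken from the actual script.

```python
import re

def strip_decoration(text: str) -> str:
    # Assumed harvester behaviour: keep only letters, digits,
    # "@", ".", "_" and "-", and drop everything else.
    return re.sub(r"[^A-Za-z0-9@._-]", "", text)

posted = "john/smith@example.com"   # hypothetical address, valid as written
harvested = strip_decoration(posted)

print(harvested)            # johnsmith@example.com
print(harvested == posted)  # False
```

The slash gets treated as decoration, so the harvested address points at a different mailbox than the real one.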

Some examples of addresses that will be made "spammable" by this:

MyEmail [remove] AT my (no-spam) e_mail DOT n-e-t

[removethis] my-email [A-T] _ something DOT com

myemail AT myemail DOT com

myemail [@] myemail [.] com
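For anyone who wants to see the idea in code, the following is a rough Python sketch that handles the four examples above. It is not the actual script: the regular expressions, the function name, and the decision to drop hyphens are all assumptions made for illustration. Only the filler words "remove", "removethis", and "nospam" come from this thread.

```python
import re

# Filler strings named in the thread; longest first so "removethis"
# is not left behind as "this".
FILLERS = ("removethis", "remove", "nospam")

# Bracketed stand-ins such as [@], [A-T], (dot); matched in any case.
BRACKETED_AT = re.compile(r"[\[({<]\s*(?:@|a-?t)\s*[\])}>]", re.IGNORECASE)
BRACKETED_DOT = re.compile(r"[\[({<]\s*(?:\.|d-?o-?t)\s*[\])}>]", re.IGNORECASE)
# Bare words only count in uppercase, as whole words.
BARE_AT = re.compile(r"\bAT\b")
BARE_DOT = re.compile(r"\bDOT\b")

def deobfuscate(text: str) -> str:
    text = BRACKETED_AT.sub("@", text)
    text = BRACKETED_DOT.sub(".", text)
    text = BARE_AT.sub("@", text)
    text = BARE_DOT.sub(".", text)
    # Greedy step: discard spaces, brackets, hyphens, slashes and so on.
    text = re.sub(r"[^A-Za-z0-9@._]", "", text)
    for filler in FILLERS:
        text = re.sub(filler, "", text, flags=re.IGNORECASE)
    # Underscores touching "@" or "." are treated as leftover decoration.
    return re.sub(r"_*([@.])_*", r"\1", text)

print(deobfuscate("MyEmail [remove] AT my (no-spam) e_mail DOT n-e-t"))
# MyEmail@mye_mail.net
print(deobfuscate("[removethis] my-email [A-T] _ something DOT com"))
# myemail@something.com
print(deobfuscate("myemail AT myemail DOT com"))
# myemail@myemail.com
print(deobfuscate("myemail [@] myemail [.] com"))
# myemail@myemail.com
```

Dropping hyphens is what turns "n-e-t" into "net", but it also mangles addresses that really contain a hyphen. That is the "greedy" trade-off: a harvester can live with some wrong guesses.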

The biggest issue is determining strings to remove. Right now it removes "remove", "removethis", and "nospam", but you could use something else. If I were to beef this up, I could probably tell it to remove anything between brackets. Or I could get really fancy and see if most of the characters are lowercase, and if so, remove uppercase characters. The point is, once you get past ditching brackets and spaces and stuff, it becomes hard for the computer to determine what else doesn't belong.
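The two "beefed up" ideas could look something like this in Python. This is only a sketch of the ideas as described; neither function exists in the script, and the 0.8 threshold for "most of the characters are lowercase" is an arbitrary assumption.

```python
import re

def strip_bracketed(text: str) -> str:
    """Delete any [...], (...), {...} or <...> segment outright."""
    return re.sub(r"\[[^\]]*\]|\([^)]*\)|\{[^}]*\}|<[^>]*>", "", text)

def drop_minority_uppercase(text: str, threshold: float = 0.8) -> str:
    """If most letters are lowercase, treat uppercase letters as noise."""
    letters = [c for c in text if c.isalpha()]
    if not letters:
        return text
    lower_share = sum(c.islower() for c in letters) / len(letters)
    if lower_share >= threshold:
        return "".join(c for c in text if not c.isupper())
    return text

print(strip_bracketed("[removethis] my-email"))           # " my-email"
print(drop_minority_uppercase("myXemailQ@exampleZ.com"))  # myemail@example.com
print(drop_minority_uppercase("MyEmail@Example.COM"))     # unchanged
```

Note that removing everything between brackets would also eat a "[@]" or "[.]", so those stand-ins would have to be converted before this step runs.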
 

D. Strout

Resident IPv6 Proponent
OCR - run that image through http://www.i2ocr.com/. It comes out close enough that once it's run through my script, you have the address. Maybe if you don't name the image "email.png" on the server?
 

drmike

100% Tier-1 Gogent
The solution to prevent this harvesting is to make the email an image, and to do so with a non-traditional font.  Something readable by humans, but not so simple to OCR.

I never push emails out in public.  But I've been known to push folks to a contact form :)  That works and gets around the initial public email handout.
 

Damian

New Member
Verified Provider
http://www.google.com/recaptcha/mailhide/ is good if your address will be on a website... not so good when you have to input it somewhere.

I think back to The Old Days when email addresses were everywhere. I remember them being used as identifiers on forums, on PHP.net documentation comments, etc. We were so innocent back then.
 

mojeda

New Member
The original creator/founder of reCaptcha did an AMA on reddit a few days ago, I think. He answered a few questions about captcha/reCaptcha.

I'll look for the link later when I get home.
 

MCH-Phil

New Member
Verified Provider
Neat article, I didn't know you only needed one word to be entered, seems like that would make it easier to crack.
I used that lil trick for a while to save time.  It stopped working a few months ago.  I've not tried since.
 

D. Strout

Resident IPv6 Proponent
Neat article, I didn't know you only needed one word to be entered, seems like that would make it easier to crack.
The whole point of the ReCAPTCHA project was to digitize books. The premise was simple: take a word that a computer could successfully OCR and put it up against one it could not. If the user entered the known word correctly, then you could be relatively certain that the unknown word was correct too. If you showed the same prompt to two or three folks and they all answered the second word the same or similarly, there's one more word of a book digitized. That's why the above works (worked?). It was entering the known word correctly, and a glitch had rendered the unknown word unnecessary, presumably due to some trickery with the two spaces. I assume that, as @MCH-Phil mentioned, it no longer works because they realized that at least two words should be there.
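Here is a toy Python model of that premise, just to make the logic concrete. It is not how the real service is implemented: the function name, the in-memory vote tally, and the choice of three matching answers are all assumptions for illustration.

```python
from collections import Counter, defaultdict

VOTES_NEEDED = 3  # assumption; the post only says "two or three folks"
votes = defaultdict(Counter)  # unknown-word id -> tally of guesses

def submit(control_truth, control_answer, unknown_id, unknown_guess):
    """Return (passed, accepted_word_or_None) for one solved prompt."""
    if control_answer.strip().lower() != control_truth.lower():
        return False, None   # failed the word the computer already knows
    guess = unknown_guess.strip().lower()
    if not guess:
        return False, None   # one-word answers are rejected
    votes[unknown_id][guess] += 1
    word, count = votes[unknown_id].most_common(1)[0]
    # The unknown word counts as digitized once enough answers agree.
    return True, (word if count >= VOTES_NEEDED else None)
```

The empty-guess check is the part the old trick got around: only the known word was actually being verified.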

So if you didn't know, when you bypass a ReCAPTCHA, you are slowing the progress of book digitization. Although with the Google takeover, I don't mind: they've made those things so impossible to read that I'd be happy to bypass them any way I can.
 

Jono20201

New Member
Verified Provider
Luckily my name, jonathan, breaks your script; however, that doesn't make my inbox foolproof. Luckily gmail usually does a good filtering job.
 

D. Strout

Resident IPv6 Proponent
Luckily my name, jonathan, breaks your script; however, that doesn't make my inbox foolproof. Luckily gmail usually does a good filtering job.
It depends on how you obfuscate your address. If you were to do something like "[jonathan] AT [gmail] DOT [com]" or "[jonathan] @ [gmail.com]" it would go through correctly. Only if you make the @ into a lowercase word does it break.
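One plausible way a lowercase "at" can trip up a converter, sketched in Python. This is a guess at the failure mode, not a description of the actual script; both function names are made up, and the word-boundary version is just one possible fix.

```python
import re

def naive(text: str) -> str:
    # Squeeze out spaces first, then replace every "dot" and "at" substring.
    squeezed = text.replace(" ", "")
    squeezed = re.sub("dot", ".", squeezed, flags=re.IGNORECASE)
    return re.sub("at", "@", squeezed, flags=re.IGNORECASE)

def word_aware(text: str) -> str:
    # Replace "dot" and "at" only when they stand alone as words.
    text = re.sub(r"\bdot\b", ".", text, flags=re.IGNORECASE)
    text = re.sub(r"\bat\b", "@", text, flags=re.IGNORECASE)
    return text.replace(" ", "")

print(naive("jonathan at gmail dot com"))       # jon@han@gmail.com
print(word_aware("jonathan at gmail dot com"))  # jonathan@gmail.com
```

The name itself contains "at", so a substring match hits it as well. Matching whole words before removing spaces avoids that, at the cost of missing disguises like "jonathanATgmail".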
 