Monday, July 20, 2009

Spam vs. CAPTCHA, the lesser of two evils

For quite a while now, one of the greatest annoyances I've encountered on the net is something we've come
to accept as "the lesser of two evils": a spambot roadblock known as "CAPTCHA." (The acronym actually
has a meaning, which is "Completely Automated Public Turing test to tell Computers and Humans Apart.")

Now, you might ask me, "So what, you fool? Would you prefer getting every one of your forms exploited by spambots?"

Of course not. There is nothing I despise more than getting countless well-meaning offers of masculine-organ-gargantuafication. (And that is not entirely because rendering those areas any larger would be more of a nuisance than anything else. (Bad puns end here. (Nesting ftw!)))

As much as I want to avoid those mails, I can't help feeling a great irritation every time an incomprehensible image pops up,
declaring me a fifty-line script for not realizing that S was actually a 5. More than once, this frustration has lost a forum or blog
a comment from yours truly, and probably many more from others.

When talking about these matters in a corporate fashion, you use the term "conversion ratio." Simply put, it's the percentage of visitors
who actually follow through with the action that you as the author wish them to take, whether that is filling out a form, signing up as a member, or perhaps purchasing a certain product or service. And, as you've probably figured out by now, the use of CAPTCHAs might have a negative impact on this ratio.
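
To make the term concrete, here is a minimal sketch of the arithmetic. The function name and the visitor numbers are made up purely for illustration; they are not taken from any of the posts linked here.

```python
def conversion_ratio(conversions: int, visitors: int) -> float:
    """Return the percentage of visitors who completed the desired action."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return 100.0 * conversions / visitors


# Hypothetical numbers, for illustration only:
# 40 out of 1,000 visitors actually submit the form.
print(conversion_ratio(40, 1000))  # 4.0
```

If adding a CAPTCHA made some of those 40 people give up, the ratio would drop, and that drop is what you weigh against the spam you avoided.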

At least, that was what a recent post on the SEOmoz.org blog was all about. The author put together some very clear and impressive statistics, showing that the use of CAPTCHAs yielded an 88% reduction in spam, but at the same time the number of failed "conversions" rose drastically. And the amount of spam was not that great to begin with.
[You can read the full, very interesting post here: http://www.seomoz.org/blog/captchas-affect-on-conversion-rates]

So, when putting the conversion ratio first, not implementing a CAPTCHA seems to yield more favorable results. But really, we do not want that spam!

The same post as mentioned above provided a link to a soon-to-be three-year-old alternative solution to the problem, called the "Honeypot CAPTCHA."
The general idea of this solution is that when a spambot traverses your page, it looks for and attacks any tasty-looking form, but rarely pays any attention to user-oriented code, that is, the stylesheet. So, what if we put a field in our form that code-wise appears to be a completely normal input field, but is invisible to the real user? Get it? If that field, which a real user wouldn't fill out, actually *is* filled out, we can deduce that this was the work of something less intelligent: a couple of dirty lines of code. At the end of this post, I've written a simple piece of example code.
[The blog post in which this solution, as well as two other interesting ones, was originally published can be found here: http://haacked.com/archive/2007/09/11/honeypot-captcha.aspx]

Opinions voiced against this method primarily concern the very important matter of accessibility: accessing a form with a field like this using a screen reader or text-based browser could confuse valid users or leave them unable to use the form. However, supplying a proper explanatory note for the field should solve this matter. And also, how *does* a screen reader or text browser handle regular CAPTCHAs, anyway?

But facing the cold, hard facts, we can't fool ourselves into believing that spambots will stay silly forever. In fact, there should already be quite a few sophisticated ones out there. The battle against spam has been raging since the olden days, and just to provide an example I'd like to toss in a link to this very informative post by an anti-spam software developer, written in early '06. [Go ahead and read: http://unknowngenius.com/blog/archives/2006/01/30/the-state-of-spam-karma.] He discontinued working on his project, SpamKarma2, in mid-'08, and put the code up on Google Code under a standard GPLv2 license, where it's still being developed today.

Back to the point: in the post I linked to above, he points out that even then he had observed an increase in spambot sophistication, with bots making their access look more human-like, following links in a "common" manner, and even bypassing JavaScript filters. A programmer who can implement a JavaScript parser in his spambot would hardly be challenged to create one for stylesheets as well; the reason there haven't been any indications of one yet is simply that there hasn't been any need for it. Thus, the honeypot solution, if widely adopted, would probably be surmounted with relative ease.

If I haven't frustrated you enough yet, breaking all the good parts of the "solution" before you've even had a chance to code it into your site, here's one more: "OCR." Utilizing this technique, invented to turn scanned images into normal text, the quite famous XRumer bot was able to break Hotmail and Gmail CAPTCHAs in late '08. So the race is, by all measures, a tight one. Obfuscated CAPTCHAs, however, still seem to hold pretty high ground, and thus they are indeed the most effective way to avoid spam. But (back to square one) they are user-unfriendly and perhaps have a negative commercial impact.

So to sum things up:
  1. Using the honeypot CAPTCHA and common sense, a "low-value" target would probably be able to avoid practically all spam without implementing intrusive techniques such as regular, hard-to-OCR CAPTCHAs.
  2. For the best possible security, a hard-to-OCR CAPTCHA is the way to go, unfortunate but true. One nice system I'd like to push for is the reCAPTCHA service, which turns the pestering work into a good deed by using your human processing cycles to digitize old books and publications. [For more information on this, visit http://recaptcha.net]
  3. The battle rages on. If you've got any information regarding this topic I'd more than love to hear from you. Especially if you hold some information about the workings of more sophisticated spambots. Ignorance might be bliss, but living in the grey-zone in between is pure hell.

Thanks for sticking through, hope you found this somewhat useful.

------------------------------------------
Honeypot CAPTCHA simple example:

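/* In your stylesheet: hide the wrapper around the fake field */
#letshidethis { display: none; }

<!-- In your page -->
<form>
<input ...>
<input ...>
<textarea ...></textarea>
<div id="letshidethis">
<!-- "user_info", or some other tasty-looking faux name -->
<input name="user_info" ...>
</div>
<input type="submit">
</form>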

Then in your code, you would simply check if user_info contains any data. If it does, it might very well be spam.

One thing to note here: don't give the fake input a completely unintelligible name, since some (many?) spambots seemingly look for a set of known field names to post into.
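
The server-side check could look something like the sketch below. It is only an illustration under a few assumptions: the form is submitted as a regular URL-encoded body, the hidden field is named user_info as in the example above, and the function name looks_like_spam is made up for this sketch. Adapt it to whatever language and framework your site actually uses.

```python
from urllib.parse import parse_qs

# Name of the hidden field from the example form above.
HONEYPOT_FIELD = "user_info"


def looks_like_spam(form_body: str) -> bool:
    """Return True if the hidden honeypot field was filled in.

    form_body is assumed to be a raw URL-encoded form submission,
    e.g. "comment=Hello&user_info=".
    """
    fields = parse_qs(form_body, keep_blank_values=True)
    values = fields.get(HONEYPOT_FIELD, [])
    # A real visitor never sees the field, so it should be empty or missing.
    return any(value.strip() for value in values)


print(looks_like_spam("comment=Nice+post&user_info="))  # False
print(looks_like_spam("comment=Buy+now&user_info=http%3A%2F%2Fexample.com"))  # True
```

What you do with a flagged submission is up to you. Quietly dropping it, or holding it for moderation, is probably kinder than showing an error, in case a real user somehow ended up filling in the field.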
