Sunday, October 4, 2009

Sending mail from outside your network

After setting up a mail server for the company I work at (I'm a technician at an SEO, PPC and web marketing company) we quickly ran into a problem. One of our employees tried to send mail through our server from outside our network, which resulted in an error.

The first problem, I figured, was that pretty much all ISPs in Sweden block port 25 (SMTP). That was quickly solved by routing incoming port 587 to internal port 25 on our server. But it only solved the first problem: once we could actually connect to the server from outside the network, we got an authentication problem instead. Duh, I should've known.

The server is supposed to block attempts to relay spam through it from outside, but I never considered that it would block legitimate mail as well. What I had to do was obviously set up some kind of authentication, and the way to do it is SASL. I use Dovecot on the server, so Dovecot SASL is what I'll be using in this guide. If you followed my guide to Postfix+Dovecot your server should be compiled with support for it already.

First of all we need to enable it in Postfix. To do that simply add the following lines to your Postfix configuration, some place appropriate:
broken_sasl_auth_clients = yes
smtpd_sasl_type = dovecot
smtpd_sasl_path = private/auth
smtpd_sasl_auth_enable = yes

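You will also want to add permit_sasl_authenticated to the "smtpd_recipient_restrictions" list, before reject_unauth_destination, so that authenticated clients are accepted before the relay check can reject them.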

Next up is Dovecot, so open up your dovecot.conf. You can start by adding "login" to the mechanisms row, so it'll look like this:
mechanisms = plain login

After that change the client path row to:
path = /var/spool/postfix/private/auth
I also changed both user and group to postfix instead of dovecot.

That should be all you need to do to enable SASL for your mail server. Now all that's left is to test it. Run the following four commands to completely stop and start the mail server:
postfix stop
pkill -9 dovecot
postfix start
dovecot

To test the SASL authentication, the first thing we have to do is base64-encode our username and password so we can send them to the server. Normally the mail client does this, but since we're using telnet we have to do it ourselves. Run the following command in the terminal:
perl -MMIME::Base64 -e 'print encode_base64("\000user\@domain.tld\000password")'
The string of seemingly random characters it returns is the base64-encoded version of "\0user@domain.tld\0password".
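For comparison, the same token can be built in Python. This is only a minimal sketch, not part of the original setup: the function name sasl_plain_token is made up for illustration, and the credentials are the placeholder ones used above. The result is the string you would send after "AUTH PLAIN" in the SMTP session.

```python
import base64


def sasl_plain_token(username: str, password: str) -> str:
    """Return the base64 string that follows AUTH PLAIN.

    The PLAIN mechanism sends: authorization-id NUL username NUL password.
    The authorization id is left empty here, as in the Perl one-liner.
    """
    raw = b"\0" + username.encode("utf-8") + b"\0" + password.encode("utf-8")
    return base64.b64encode(raw).decode("ascii")


token = sasl_plain_token("user@domain.tld", "password")
print("AUTH PLAIN " + token)
```

Decoding the token again should give back the NUL-separated string, which is a quick way to check that nothing went wrong with the escaping.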

Now let's try authenticating ourselves when sending a mail through telnet!
telnet localhost 25
Connected to localhost.
Escape character is '^]'.
220 server.domain.tld ESMTP Postfix

235 2.7.0 Authentication successful
From here on send a mail as usual, with "MAIL FROM", "RCPT TO" and "DATA".

That's all there was to it - now you, like me and my company's employees, should be able to send mail from outside the local network. ;)

Friday, September 18, 2009

Facepalm supreme - Deceived by a button

Okay, so I've produced a piece of code looking like this. It's the peak, the epitome of simplicity, no doubt about it.

// We only want to be able to delete the item if we either:
// [1] have accesslevel 0 (admin privs), or
// [2] are the author of the item (identified by session and database id)

   $delete_id = // Sanitation removed for simplicity

   $author_id = // DB operations removed for simplicity

   if($_SESSION['accesslevel'] == 0 || $author_id == $_SESSION['user_id']){

      if(mysql_query("DELETE FROM table WHERE id=$delete_id") or $output = 'Error removing item '. $delete_id)
         $output = 'Item ' . $delete_id . ' deleted!';
   } else {
      $output = 'You do not have the privileges required to remove this item!';
   }

Then we have a couple of unrelated clauses, and the output of the list of items, which of course offers the option to delete an item if the user has the required privileges. It looks as follows:

$result = // Query for all relevant items

while($row = mysql_fetch_array($result)){
   $button_delete = ''; // Empty if unprivileged
   if( /* $_SESSION['user_id'] is privileged enough */ ){
      $button_delete = "<a href='?page=$_GET[page]&delete_id=$row[id]'><input type='button' value='Delete'></a>";
   }

   echo "<tr><td>$row[item_name]</td> [...] <td>$button_delete</td></tr>";
}

// And finally the code to output the deletion-status, down here
echo $output;

Assuming all the code I decided to leave out is free of errors and definitely doesn't interfere with the code above, can you find any errors?
My original source-file contained roughly 150 lines of code, and I was baffled when I found a very, very strange bug.

  • If I'm not privileged enough, I get the correct message that I lack privileges and the query is, of course, not executed.
  • If I'm privileged, the query is indeed executed, meaning we've entered the first IF-clause. ALAS, I get the same message as above (!!)

We simultaneously execute the content of both the IF and the ELSE clauses. Whoa.

Isolating only the logical comparison and running it, I was unable to reproduce the error. So some other part of the code had to be interfering. But there were no conflicts; there *could not be* any conflicts. Commenting my way through the code, I was assured of that. I then spammed my code with outputs in the different clauses, which only led me closer to the brink of insanity: even though the condition seemed to evaluate to true (the query was executed), I did not get any output. At long last, I stumbled upon the error.

<a href='?page=$_GET[page]&delete_id=$row[id]'><input type='button' value='Delete'></a> This here, ladies and gentlemen, was the perpetrator. It caused me such grief. Even though it was not supposed to have any effect on the code's workings (it was not in a form, its value was never registered), it somehow managed to set some property amiss. Now, I'm not very well versed in the Document Object Model, but that my query string was somehow "invisibly altered", and on top of that, that my conditions were evaluated and their respective clauses executed before some variables were set (or set over again, as it appeared...), sounds very unreasonable, or at the very least, odd.

Just a short note regarding the debugging. To validate a user as author, I performed a simple query selecting the author's ID and stored it in a variable. When outputting this variable before any DELETE queries or basically anything at all was executed, with unfulfilled privileges, I got the output correctly. But when outputting it with fulfilled privileges, it was somehow empty. Like the post had been deleted before the query was executed.
I'll write that again.

  1. Get author ID
  2. Echo this ID for debugging
  3. Compare to current user
  4. If true, perform DELETE
  5. If false, set error message

If #3 was false, we got the error message, and the author ID was echoed.
If #3 was true, we STILL got the error message, but the author ID was empty. The post was deleted before we performed the evaluation allowing the program to delete it, but ONLY when we actually DID have the permissions to do so.
My head still hurts...

Got any explanations to this? I'd love to hear about it!
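One guess that would fit these symptoms, and it is only a guess that nothing in the post confirms, is that the delete URL was requested twice for a single click. The first request deletes the row; the second finds no author, fails the comparison and prints the error, and its output is the one that ends up on screen. The sketch below replays the five steps twice against an in-memory SQLite table. The table, column names and the handle_delete function are made up for illustration, and the admin check is left out.

```python
import sqlite3


def handle_delete(db, session_user_id, delete_id):
    """Mimic the five steps above for one incoming request."""
    row = db.execute(
        "SELECT author_id FROM items WHERE id = ?", (delete_id,)
    ).fetchone()
    author_id = row[0] if row else None  # empty once the row is gone
    if author_id == session_user_id:
        db.execute("DELETE FROM items WHERE id = ?", (delete_id,))
        return author_id, "Item %d deleted!" % delete_id
    return author_id, "You do not have the privileges required to remove this item!"


db = sqlite3.connect(":memory:")
db.execute("CREATE TABLE items (id INTEGER PRIMARY KEY, author_id INTEGER)")
db.execute("INSERT INTO items VALUES (1, 42)")

first = handle_delete(db, 42, 1)   # the request nobody gets to see
second = handle_delete(db, 42, 1)  # the request whose output is displayed
print(first)
print(second)
```

An unprivileged user would see the same thing on both requests (author ID echoed, error message shown), which matches the observation that only the privileged case looked broken.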

MySQL triggers - automated magic

Working on a tenzui software project, I found a need to register every change occurring in the database, and it did not take long to realize that adding code for such monitoring to every single page, including ones generated by the client later on, would be cumbersome indeed. After some code-site traversing, I had my solution worked out. The glory of the open-source MySQL database shines through again - with a simple statement, a trigger, any desired action could easily be caught and acted upon. Diving in:

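CREATE TRIGGER on_delete_from_archive
AFTER DELETE ON archive FOR EACH ROW BEGIN
INSERT INTO changelog (action, table_name, item_name, user_id)
-- OLD.item_name is an assumed column name; the original argument was lost
VALUES ('DELETE', 'archive', OLD.item_name, $_SESSION['userID']); END//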

A very simple, yet powerful example.

Suppose we have a table named archive, holding information about meeting minutes, publicly available documents etcetera. In your organization, you have a limited number of people who are able to add and remove content in this table. All good, until that day someone sets her UI password to "god", and Pandora's box is opened. Now, some pesky kid playing deity quickly vaporizes your collection of .pdfs, leaving you devastated. Or not really, because you've got your backups, of course. But how did it happen?

By checking your changelog table, you can now see who deleted any post from your archive table, and at what time. Once you have pinpointed the haphazardly protected user, it's a small task to fix the hole. All with just a few lines of SQL, covering every script that makes changes to the specified table. I find it pretty useful.

So, what do triggers support, really?
A trigger acts on the execution of any INSERT, DELETE or UPDATE statement upon the specified table. The event you want performed can take place either BEFORE the query has been run, or AFTER it. It also has the ability to handle "passing data," that is, the OLD and NEW data (which also happens to be the keywords for said operations).

These statements give you ample opportunity to perform comparisons on two related pieces of data before performing the query activating the trigger, or as in our example above, log the successful execution of the query, together with related interesting data.
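To see AFTER and OLD in action without a MySQL server at hand, here is a small sketch using Python's built-in sqlite3 module. SQLite's trigger syntax is close to MySQL's but not identical, so treat it as an illustration of the idea only. The column names (item_name, changed_at) and the sample file name are assumptions, and the user ID column is left out because a trigger has no access to the web session.

```python
import sqlite3

db = sqlite3.connect(":memory:")
db.executescript("""
CREATE TABLE archive (id INTEGER PRIMARY KEY, item_name TEXT);
CREATE TABLE changelog (
    action TEXT, table_name TEXT, item_name TEXT,
    changed_at TEXT DEFAULT CURRENT_TIMESTAMP
);
-- AFTER DELETE: runs once per removed row; OLD holds the row's former values.
CREATE TRIGGER on_delete_from_archive AFTER DELETE ON archive
FOR EACH ROW BEGIN
    INSERT INTO changelog (action, table_name, item_name)
    VALUES ('DELETE', 'archive', OLD.item_name);
END;
""")

db.execute("INSERT INTO archive (item_name) VALUES ('minutes-2009-09.pdf')")
db.execute("DELETE FROM archive WHERE item_name = 'minutes-2009-09.pdf'")
log = db.execute("SELECT action, table_name, item_name FROM changelog").fetchall()
print(log)
```

The script that issues the DELETE never mentions the changelog table; the row there is written by the trigger alone.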

A thing of importance to take notice of is that triggers *only* react to SQL statements. APIs etc. can perform tasks on the tables without "triggering."

As you saw above, a trigger is created by invoking the "CREATE TRIGGER" statement. Up until recently, one would require the SUPER privilege to execute this, but with MySQL version 5.1.6, a new GRANT was added for this purpose: GRANT TRIGGER.

GRANT TRIGGER ON database.table TO 'user'@'localhost';

Thanks for your time! I hope you found these short notes useful, and that you'll make use of the awesome TRIGGER-statement in the future.


Sunday, September 13, 2009

Guide to Postfix+Dovecot+MySQL Mail Server

Last week I spent a lot of hours researching and experimenting with Postfix (used to deliver mail, essentially an SMTP server) and Dovecot (POP and IMAP server, for handling the mails) trying to set them up properly with virtual users in a MySQL database. In this guide I'm going to go through the steps I ended up with in an easy-to-follow manner without going into too much detail. The systems I used are Slackware 12.1 and Slackware64 13.0 with Postfix 2.6.5, Dovecot 1.2.4 and MySQL version 5 or later. I have yet to set up SSL and things like that, but once I do I might edit this guide or post a new one for that purpose.


First of all we need to install all the software we need. Note that I assume you will use root privileges at the right times, such as when installing or starting/stopping a service. I won't explicitly say when you have to be root.
I'm going to assume you have MySQL and its libraries installed already; if not, you'll have to find a separate guide for that (there should be plenty around).
Download the tarball of the latest Postfix version (currently 2.6.5) and unpack it with tar xvzf postfix-2.6.5.tar.gz. Go into the folder with cd postfix-2.6.5. Something that slightly annoys me here is that Postfix doesn't use flags to a configure script to compile with e.g. MySQL; instead you have to use a make command with unintelligible arguments before you do the compiling. To compile Postfix with TLS, MySQL support and Dovecot SASL (I think ;)) use this command:
make -f Makefile.init makefiles 'CCARGS=-DUSE_TLS -DHAS_MYSQL -I/usr/include/mysql -DUSE_SASL_AUTH -DDEF_SERVER_SASL_TYPE=\"dovecot\"' 'AUXLIBS=-L/usr/lib/mysql -lmysqlclient -lz -lm -lssl -lcrypto'
You might have to change the paths to what your system is using, those are what I used on Slackware and they seem to work just fine.

Before we can compile we have to make the user and groups Postfix will use. I made them with the following IDs and names:
groupadd postfix -g 2000
groupadd postdrop -g 2001
useradd postfix -u 2000 -g 2000

Now that we've done all that we can start compiling. Run make and make install. When it gets to the end it'll ask a lot of questions - you can just use the defaults (make sure the user and group is set correctly) unless you have any paths you want to change. After the make install is done Postfix should be installed and you can start it with postfix start to see if it works. Right about now it's probably a good idea to have a terminal with tail -f /var/log/maillog open. In there you will see all errors you get, in case you get any.
Download the tarball of the latest Dovecot version (currently 1.2.4) and unpack it with tar xvzf dovecot-1.2.4.tar.gz. Go into the folder with cd dovecot-1.2.4. Fortunately Dovecot is kind enough to provide a configure script, which makes it easy to install with MySQL support. Run ./configure --with-mysql --with-sql-drivers. Now all you have to do is run make and make install and you should have Dovecot installed. You may want to create a dovecot user right about now:
groupadd dovecot -g 3000
useradd dovecot -u 3000 -g 3000
Try starting Dovecot with the command dovecot and check your terminal with maillog for any errors.


Now that we've got all the software installed all that is left is to configure them! There's a lot more to do here than in the installing part, but it's nothing difficult so just follow along and it shouldn't be a problem.
It's in MySQL we are going to store all the e-mail addresses and passwords and stuff we're going to have on our server. To get basic functionality what we need is a table for domains, a table for aliases and a table for mailboxes. We also need a user to connect with and a database to store the tables in. To create them log into mysql as root (mysql -u root -p) and issue the command CREATE DATABASE mail. To create a user called mail, with the password mail, that has full privileges on the database mail, run this command:
GRANT ALL ON mail.* TO 'mail'@'localhost' IDENTIFIED BY 'mail';

Without going into too many details I'll post the specifications of the tables below. You can just paste them into a file and use mysql -umail -pmail mail < mail.sql to have it create the tables for you.
CREATE TABLE `domain` (
`id` int(11) NOT NULL auto_increment,
`domain` varchar(255) NOT NULL,
`transport` varchar(255) NOT NULL default 'virtual',
`active` tinyint(1) NOT NULL default '1',
PRIMARY KEY (`id`),
UNIQUE KEY `domain` (`domain`)
);
CREATE TABLE `alias` (
`id` int(11) NOT NULL auto_increment,
`alias` varchar(255) NOT NULL,
`target` varchar(255) NOT NULL,
`domain` varchar(255) NOT NULL,
`active` tinyint(1) NOT NULL default '1',
PRIMARY KEY (`id`),
KEY `domain` (`domain`)
);
CREATE TABLE `mailbox` (
`id` int(11) NOT NULL auto_increment,
`username` varchar(255) NOT NULL,
`password` char(32) NOT NULL,
`maildir` varchar(255) NOT NULL,
`domain` varchar(255) NOT NULL,
`active` tinyint(1) NOT NULL default '1',
PRIMARY KEY (`id`),
UNIQUE KEY `username` (`username`),
KEY `domain` (`domain`)
);
The default Postfix configuration file has a ton (I really mean A LOT) of comments that we don't really need. The final configuration file I ended up with is only about 15 rows. I removed pretty much everything and wrote one nearly from scratch; you might want to do the same. If you prefer going through those hundreds of lines of comments, be my guest, though. :) The file in question is /etc/postfix/ Use your favorite editor (such as vim or emacs) and open it up. I'll just post mine and leave any additional configuring you might want (such as different delays or a different greeting etc.) to you. The lines you have to change are the ones with myhostname = server.domain.tld and mydomain = domain.tld. Additionally you might need to add an SMTP relay since almost all ISPs block port 25 (and instead provide an SMTP server for you to use). I've commented out the line; remove the hash sign and add the address of your ISP's SMTP relay server if you need it. Note that I removed a lot of lines where the setting was the default, see /etc/postfix/
myhostname = server.domain.tld
mydomain = domain.tld
myorigin = $mydomain
mydestination =
alias_maps = hash:/etc/postfix/aliases
alias_database = hash:/etc/postfix/aliases
#relayhost =
smtpd_recipient_restrictions = permit_mynetworks, reject_non_fqdn_hostname, reject_non_fqdn_sender, reject_non_fqdn_recipient, reject_unauth_destination, reject_unauth_pipelining, reject_invalid_hostname
disable_vrfy_command = yes
virtual_mailbox_base = /var/mail/virtual
virtual_minimum_uid = 2000
virtual_uid_maps = static:2000
virtual_gid_maps = static:2000
virtual_alias_maps = mysql:/etc/postfix/
virtual_mailbox_maps = mysql:/etc/postfix/
virtual_mailbox_domains = mysql:/etc/postfix/
Next we need to create the MySQL configuration files for Postfix (the three listed above). They tell Postfix where to look for domains, aliases and users and their directories. There is one block per file below; the table line shows which lookup each one serves.

user = mail
password = mail
dbname = mail
hosts = localhost
table = domain
select_field = domain
where_field = domain
additional_conditions = and active = 1

user = mail
password = mail
dbname = mail
hosts = localhost
table = alias
select_field = target
where_field = alias

user = mail
password = mail
dbname = mail
hosts = localhost
table = mailbox
select_field = maildir
where_field = username
additional_conditions = and active = 1

That's all there is to configuring Postfix. :)
Try restarting to see if the new configuration works.
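To make it clearer what the table, select_field, where_field and additional_conditions settings amount to, here is a rough Python sketch of the SELECT statement they describe. It is an illustration under assumptions, not Postfix's actual code: build_lookup_query is a made-up helper, and the quoting is deliberately simplistic.

```python
def build_lookup_query(cfg, key):
    """Assemble the SELECT described by table, select_field and where_field."""
    # Simplistic quoting for illustration only; real code should use bound parameters.
    escaped = key.replace("\\", "\\\\").replace("'", "\\'")
    query = "SELECT {0} FROM {1} WHERE {2} = '{3}'".format(
        cfg["select_field"], cfg["table"], cfg["where_field"], escaped
    )
    extra = cfg.get("additional_conditions", "")
    return (query + " " + extra).strip()


mailbox_cfg = {
    "table": "mailbox",
    "select_field": "maildir",
    "where_field": "username",
    "additional_conditions": "and active = 1",
}
alias_cfg = {"table": "alias", "select_field": "target", "where_field": "alias"}

mailbox_query = build_lookup_query(mailbox_cfg, "user@domain.tld")
alias_query = build_lookup_query(alias_cfg, "user2@domain.tld")
print(mailbox_query)
print(alias_query)
```

Running the printed queries by hand in the mysql client is a handy way to check that a lookup returns what you expect before blaming Postfix.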
The Dovecot configuration file is similar to Postfix in the way that it's got a whole ton of comments. I got rid of them and I'll simply post my configuration file here. It should work without any changes. Note that mine was located in /usr/local/etc/dovecot.conf, not /etc/dovecot.
base_dir = /var/run/dovecot/
listen = *
syslog_facility = mail
ssl = no
disable_plaintext_auth = no
login_dir = /var/run/dovecot/login
login_chroot = yes
login_user = dovecot
first_valid_uid = 2000
first_valid_gid = 2000
protocols = imaps imap pop3s pop3
mail_location = maildir:/var/mail/virtual/%d/%n
auth default {
  mechanisms = plain
  user = root
  userdb sql {
    args = /etc/dovecot/dovecot-mysql.conf
  }
  passdb sql {
    args = /etc/dovecot/dovecot-mysql.conf
  }
  socket listen {
    master {
      path = /var/run/dovecot/auth-master
      mode = 0600
      user = dovecot
    }
    client {
      path = /var/run/dovecot/auth-client
      mode = 0660
      user = dovecot
      group = dovecot
    }
  }
}
That's the longest configuration file so far; I haven't optimised it by relying on default values or anything, which could be why. Next we have to set Dovecot up to know how to use the MySQL tables. It's done with /etc/dovecot/dovecot-mysql.conf as such:
driver = mysql
connect = host=localhost dbname=mail user=mail password=mail
default_pass_scheme = PLAIN-MD5
password_query = SELECT password FROM mailbox WHERE username = '%u'
user_query = SELECT maildir, 2000 AS uid, 2000 AS gid FROM mailbox WHERE username = '%u' AND active = 1
That should be all the configuring you have to do, so try restarting Dovecot with pkill -HUP dovecot and see if it'll start and connect to MySQL properly!

Setting up users

If everything is working all that's left is to add some accounts to your database and test it!

To insert a domain into the database connect with mysql -u mail -p mail and enter:
INSERT INTO domain (domain) VALUES ('domain.tld');

To add a mailbox (an actual inbox on the computer) issue the following query in MySQL:
INSERT INTO mailbox (username, password, maildir, domain) VALUES ('user@domain.tld', md5('password'), 'domain.tld/user/', 'domain.tld');
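The md5() call is why the password column is a char(32): an MD5 digest written in hex is always 32 characters, and that is the form the PLAIN-MD5 scheme in the Dovecot configuration expects. If you ever need to produce the same value outside MySQL, a minimal Python sketch could look like this (plain_md5 is just an illustrative name):

```python
import hashlib


def plain_md5(password: str) -> str:
    """Hex MD5 digest, the same form MySQL's md5() stores in the char(32) column."""
    return hashlib.md5(password.encode("utf-8")).hexdigest()


digest = plain_md5("password")
print(digest, len(digest))
```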

Additionally I always add an alias for users, even if it points to the same address. In the alias table you can create aliases that point to an external address; for example, user2@domain.tld can be set to point to a mailbox at another provider. You can have one alias deliver to several addresses by adding several rows with the same alias but different targets. To create a catch-all (i.e. where any mail sent to an address at your domain that doesn't already have an alias ends up) enter the alias as @domain.tld. Here are some examples:
INSERT INTO alias (alias, target, domain) VALUES ('user@domain.tld', 'user@domain.tld', 'domain.tld');
INSERT INTO alias (alias, target, domain) VALUES ('user2@domain.tld', '', 'domain.tld');
INSERT INTO alias (alias, target, domain) VALUES ('user2@domain.tld', 'user2@domain.tld', 'domain.tld');
INSERT INTO alias (alias, target, domain) VALUES ('@domain.tld', 'catch-all@domain.tld', 'domain.tld');
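As a rough mental model of how these rows are used, the sketch below looks up the full address first and only falls back to the @domain.tld row when nothing matches. It is a simplification in plain Python, not Postfix's real lookup code; the resolve function and the backup@domain.tld target are made up for the example.

```python
ALIASES = [
    ("user@domain.tld", "user@domain.tld"),
    ("user2@domain.tld", "user2@domain.tld"),
    ("user2@domain.tld", "backup@domain.tld"),  # hypothetical second target
    ("@domain.tld", "catch-all@domain.tld"),
]


def resolve(address):
    """Return every target for address, falling back to the catch-all row."""
    targets = [target for alias, target in ALIASES if alias == address]
    if targets:
        return targets
    domain = address.split("@", 1)[1]
    return [target for alias, target in ALIASES if alias == "@" + domain]


print(resolve("user2@domain.tld"))
print(resolve("nobody@domain.tld"))
```

It also hints at why a self-pointing alias per mailbox is a good habit: without one, the catch-all row would be the only match for that address.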

To make administering this easier I put together my own administration interface in PHP in which you can easily add, edit and delete domains, aliases and mailboxes using the database I've posted here. It doesn't have any error-checking or security, but it does its job, trusting the user not to screw things up. It's probably a pretty good idea to password-protect it though, if you intend to have it on a server public to the Internet. .htaccess and .htpasswd will do the job just fine, I guess, so Google that.
Here is a download link: nmailadm-0.1.tar.gz


Now that you've got your mail server set up, it's about time you try sending and reading some mails using Telnet. If it works, you can try using a mail client (such as using POP to fetch mails with Gmail, or Outlook to both send and receive mails).

The easiest way to test is to use telnet. Here's how to send a mail using SMTP:
telnet localhost 25
Connected to localhost.
Escape character is '^]'.
220 server.domain.tld ESMTP Postfix

mail from: user@domain.tld
250 2.1.0 Ok
rcpt to: user2@domain.tld
250 2.1.5 Ok
data
354 End data with <CR><LF>.<CR><LF>
This is a test mail. :)
.
250 2.0.0 Ok: queued as **********
quit
221 2.0.0 Bye
Connection closed by foreign host.
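Once the manual session works, the same test can be scripted. The sketch below uses Python's standard smtplib and assumes Postfix is listening on localhost port 25, as in the telnet session; the function names and the subject line are made up for illustration. The actual send is left commented out so the snippet can be run anywhere without a mail server.

```python
import smtplib
from email.message import EmailMessage


def build_test_message(sender, recipient):
    """Create the same one-line test mail that was typed into telnet."""
    msg = EmailMessage()
    msg["From"] = sender
    msg["To"] = recipient
    msg["Subject"] = "Test"
    msg.set_content("This is a test mail. :)")
    return msg


def send_test_message(msg, host="localhost", port=25):
    """Hand msg to the SMTP server; raises an exception if it is refused."""
    with smtplib.SMTP(host, port) as smtp:
        smtp.send_message(msg)


message = build_test_message("user@domain.tld", "user2@domain.tld")
# send_test_message(message)  # uncomment when running on the mail server itself
print(message)
```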

Now, check your maillog and you should see a couple of lines following you through the process you just did. Here is an example:
Sep 13 17:20:27 server postfix/smtpd[11333]: connect from localhost[]
Sep 13 17:21:33 server postfix/smtpd[11333]: **********: client=localhost[]
Sep 13 17:22:31 server postfix/cleanup[12236]: **********: message-id=<20090913152133.**********@server.domain.tld>
Sep 13 17:22:31 server postfix/qmgr[30203]: **********: from=, size=355, nrcpt=2 (queue active)
Sep 13 17:22:31 server postfix/virtual[13036]: **********: to=, relay=virtual, delay=81, delays=81/0.01/0/0.02, dsn=2.0.0, status=sent (delivered to maildir)
Sep 13 17:23:22 server postfix/smtpd[11333]: disconnect from localhost[]

Now that we've sent a message, let's try connecting with POP to list the messages we have!
telnet localhost 110
Connected to localhost.
Escape character is '^]'.
+OK Dovecot ready.

user user@domain.tld
pass password
+OK Logged in.
list
+OK 1 messages.
1 450
.
quit
+OK Logging out.
Connection closed by foreign host.

Yep, that's about it. If you followed this and everything worked you should have a fully functional mail server with virtual users running! Next is to connect using a mail client such as Outlook so you won't have to send mails using telnet.

If you have any questions or problems, feel free to send me an e-mail or post a comment and I will do my best to help. :)

Friday, September 11, 2009

Google Chrome Extensions Usage Guide

There are probably a lot of people who can't live without their Firefox extensions and are therefore missing out on what is by far (in my opinion) the best browser currently available: Google Chrome. It was a problem for me too, until I found out by accident that the developer channel of Google Chrome actually supported extensions, though they were incomplete and even required a special flag to be set when running the Chrome executable. Even so, I felt I had to check this exciting feature out, and boy was I surprised when I found out how easy they are to make and use and how well they work! I quickly made a few extensions using most of the techniques available. The most popular is the PageRank Display extension, the first one I wrote, which has users all over the world and linkbacks from all the way over in Japan and Korea.

Even though there are a few online communities about Chrome extensions, a lot of people are probably missing out on them, first because they're not enabled at all in the stable channel and second because you previously needed a special flag in the shortcut even on the dev channel. If you haven't already, I strongly urge any Chrome user to switch to the dev channel; it is much further along in development, and though it is said to be very unstable at times, that has been extremely rare for me. It can be done here: switch between stable/beta/dev channel. The flag in the shortcut is no longer needed on the dev channel either: last week they finally made extensions enabled by default there (along with some UI changes I've been looking forward to).

Now, how do you use extensions in Chrome? It's really very simple. All you have to do is find a link to an extension and click on it (or if you have an extension file (.crx), you can just drag it to your Chrome shortcut). After accepting the download Chrome will pop up a message asking you if you want to install the extension, which of course you do (unless you distrust the source). After that the extension is installed instantly (no need to restart Chrome like you do with Firefox) and you should see any UI the extension adds. Regarding UI, there are four types:
Toolstrips:
The toolstrip is a bar at the bottom of your browser window. It automatically allocates just enough space to fit the extension, since Chrome values the screen real estate of its users, something I greatly appreciate. The toolstrip can contain basically anything: buttons, links, information, checkboxes, etc. It can be hidden with the shortcut ctrl+alt+b. It's the most common form of extension UI.
Moles:
Moles are pretty much an extension of the toolstrip. When you hover over an extension's allocated space in the toolstrip the name comes up. When you click the name you can open a mole (if the extension has created one), which is kind of like a popup page that shows up above the toolstrip. You can, for example, use the toolstrip to display some information and put a form with checkboxes etc. in the mole for setting up what to display and how.
Content scripts:
Content scripts are pretty much user scripts (like Greasemonkey scripts: JavaScript files that run on selected pages) packaged as an extension. Chrome has regular user script compatibility too, where you download a script and put it in a folder. Besides making it possible to combine user scripts with other extension features, packaging them as content scripts also gives you auto-update, easier installation, etc.
Page actions:
Page actions are used when you want the option to do things with a certain page. What it does is give you an icon in the omnibox (address bar) for the extension on the pages where it can be run. When you click on the icon the extension does something with the page. For example the extension can detect when a page has an RSS feed available and enable the page action for the page. An RSS icon will show up in the address bar which will e.g. add the feed to your Google Reader account when clicked.

An extension can combine any number of them in any combination and can communicate between them (as long as they belong to the same extension).

Once you've got the extensions there might be some you want to remove again, or you might simply want to see what extensions you have installed, versions, etc. To do all of this all you have to do is browse to chrome://extensions. Once there you can remove, add extensions from file, add unpacked extensions (they're packed in .crx files, before that they are folders with html and javascript files in them) and force check for updates. You will see a list of all installed extensions as well as their version and a description.

To test how extensions work I suggest you visit the official sample extensions. As of now there are three sample extensions there:

Gmail Checker

Displays the number of unread mail in the toolstrip.

Subscribe in Feed Reader

Uses page actions to subscribe to a feed.

BuildBot Monitor

Shows the status of the Chromium Build Bot (probably not interesting to a regular user).

These are two simple extensions that I wrote for some basic SEO work:

PageRank Display

Shows the PageRank of the open site in the toolstrip.

Alexa Rank Display

Shows the Alexa Rank of the open site in the toolstrip.

To find more Google Chrome extensions I suggest you visit Chrome Plugins. They've got discussions and a lot of extensions posted by users, for example for mouse gestures, translation, smooth scrolling, session saving, weather, alarm clock, video downloader, etc.

I hope this will get anyone who sticks with another browser only because of the extensions they need to switch to Chrome. It already has a fair number of great extensions, and an excellent and easy-to-use extensions API for creating new ones in case none exists yet with the functionality you require. :)

Sunday, July 26, 2009

Googling the Right Way (a repost)


This is a republication of an article I wrote in early 2008, touching lightly upon the topic of Google queries and the mighty search giant's history. I first published the text on my then pet project site, pworks (which by the way now redirects to tenzui), and from there it got mirrored on a couple of technology-focused sites and forums.

Since I wrote this, both I and the world of search have progressed greatly, and I hope to be able to write a follow-up, more in-depth post on the topic of searching in the future. So if you're interested in that - stay connected. ;)

Short History

Right, this will be a really, really short history lesson. If you're interested, check out for yourself what the people over there have written. (Link at page bottom)

So, Google was created by the duo Larry Page and Sergey Brin, two Stanford grad students who, although they didn't see eye-to-eye on many topics, were determined to crack the quite boring nut of organizing all that information that was spread out on the web. By 1997, their BackRub search engine had started gaining a sparkling reputation for its unique way of analyzing and ranking webpages through "back links", links pointing to a page from other pages. The system also gained attention for its interesting server environment: contrary to the "normal" high-end servers, BackRub ran on a collection of simpler PCs, collected from the campus' nooks and crannies.

From there, the story is one of unfathomable success ("Instead of discussing all the details, why don't I just write you a check?"), leading to the status of The One Search-engine we all know, love and envy.


"Back links?" you think. Yeah, Google's way of deciding which pages are worth your reading-cycles differed from that of all other search engines at the time. The PageRank algorithm gives every site a rank between 0 and 10, based on how many other pages link to the site and what value the linking pages have.
If you are interested in the mathematics behind the PageRank algorithm, I suggest you read about it on Wikipedia. The logic behind PR is outside the scope of this article.

From this information, you can probably figure out the basics of SEO, Search Engine Optimization. Get your page linked to by the big boys. Of course, some people just can't be content with playing by the rules, and the PR-algorithm isn't perfect, so from time to time someone manages to fool the algorithm, an example being the 302 Google Jack, redirecting the new, zero-ranked page to a rank ten page, like Google itself. When Google updates the PageRanks, the new page will get the same rank as the page it linked to. Other people buys and sells high-valued links, really a kind of advertising, but with a big debate buzzing in the background. Google has requested that such links use the HTML attribute "nofollow", rendering the page linked to to be ignored when re-ranking.

The above-mentioned kinds of tricks, as well as many others, can of course get your page devalued, meaning that it will not be ranked at all. Play safe!


Every Joe Schmoe knows that search engines like Google don't take kindly to long sentences and free-text, but he probably never bothered reading up on how the magical searchbox actually works, something he should be severely punished for. Let's leave Joe to his fate, and rise far above him, to the lands without stupid questions.
Even in the "basic" syntax collection I'm sure you are able to find a few sparkling gems you didn't know about, so skim through it even if you feel confident in your Google-Fu.

So, top down, a standalone word yields pages containing that word, a sentence enclosed with quotation-marks (" ") similarly yields pages that contain that exact phrase. If you have ever created an SQL-query for some database, I'm sure you will find a lot of similarities as we go on now. Google is actually "just a database", remember?

Command               Example                             Result

AND [&] (ampersand)   Slackware AND Linux                 Shows pages containing both arguments. NOTE: this is the default operator, no need to include it
OR [|] (pipe)         laptop OR Desktop                   Shows pages containing either argument
- (minus)             Hamburger -McDonalds                Shows pages containing the word "Hamburger", but only if they don't mention "McDonalds"
+ (plus)              +coke                               Contrary to the "includes" belief, this limits the results to the given form only, no plurals or other tenses
~ (tilde)             ~Hacker                             Results include everything deemed similar to "Hacker"
* (asterisk)          Fish * Chips                        The wildcard (*) is replaced by one or more words/characters (and, n, 'n, &)
define:               define:Nocturnal                    A personal favorite, looks up the meaning of the word
site:                 Phreaking                           Limits the search to a specific site
#..#                  zeroday 2007..2008                  Search results include a value within the given range
info:                                                     Shows information about the site
related:                                                  Shows pages similar/related to argument
link:                                                     Shows sites linking to the argument
filetype:             phrack filetype:pdf                 Results are limited to given filetype
([?])                 Cyber (China & America)             Nesting combines several terms in the same query
[?A] in [?B]          1 dollar in yen                     Converts argument A to argument B
daterange:            daterange:2452122-2452234           Results are within the specified date range. Dates are given as Julian day numbers
movie:                movie:Hackers                       Movie reviews, can also find movie theaters running the movie in U.S. cities
music:                music:"Weird Al"                    Hits relate to music
stock:                stock: goog                         Returns stock information (NYSE, NASDAQ, AMEX)
time:                 time: Stockholm                     Shows the current time in requested city
safesearch:           safesearch: teen                    Excludes pornography
allinanchor:          allinanchor: "Best webcomic ever"   Results are called argument by others
inanchor:             foo bar inanchor:jargon             As above, but not for all terms. The corresponding forms of the operators below behave the same way
allintext:            allintext:8-bit music               Argument exists in text
allintitle:           allintitle: Portfolio               Argument exists in title
allinurl:             allinurl:albino sheep               Argument exists in URL
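
The numbers that daterange: expects are Julian day numbers, which are awkward to work out by hand. Below is a minimal Python sketch for producing them from ordinary dates. The function names are made up for this example, and the offset constant relies on Python's own date ordinal (day 1 = 0001-01-01), so treat it as an illustration rather than anything Google provides.

```python
from datetime import date

# Offset between Python's date ordinal (0001-01-01 is day 1) and the
# Julian day number of the same calendar day.
JDN_OFFSET = 1721425


def to_julian_day(d):
    """Return the Julian day number for a calendar date."""
    return d.toordinal() + JDN_OFFSET


def daterange(start, end):
    """Build a daterange: term (hypothetical helper) from two dates."""
    return "daterange:{}-{}".format(to_julian_day(start), to_julian_day(end))


if __name__ == "__main__":
    print(daterange(date(2005, 1, 31), date(2005, 2, 10)))
```

The resulting term can be pasted after any other search terms, the same way as the other operators in the table.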

GET-variable breakdown
as_q=test (query string)
&hl=en (language)
&num=10 (number of results [ 10,20,30,50,100 ])
&as_epq= (complete phrase)
&as_oq= (at least one)
&as_eq= (excluding)
&lr= (language results. [ lang_countrycode ])
&as_ft=i (filetype include or exclude. [i,e])
&as_filetype= (filetype extension)
&as_qdr=all (date [ all,M3,m6,y ])
&as_nlo= (number range, low)
&as_nhi= (number range, high)
&as_occt=any (terms occur [ any,title,body,url,links ])
&as_dt=i (restrict by domain [ i,e ])
&as_sitesearch= (restrict by [ site ])
&as_rights= (usage rights [ cc_publicdomain, cc_attribute, cc_sharealike, cc_noncommercial, cc_nonderived ])
&safe=images (safesearch [ safe=on,images=off ])
&as_rq= (similar pages)
&as_lq= (pages that link)
&as_qdr= (get only recently updated pages d[ i ] | w[ i ] | y[ i ])
&gl=us (country)
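
To show how the variables fit together, here is a small Python sketch that assembles a search URL from a few of them. The endpoint address and the helper name are assumptions made for this example (the breakdown above lists only the variables), and there is no guarantee Google still honors every variable.

```python
from urllib.parse import urlencode

# Assumed endpoint, not taken from the breakdown above.
BASE_URL = "https://www.google.com/search"


def build_search_url(query, exact_phrase="", exclude="", site="",
                     filetype="", num=10, lang="en"):
    """Assemble a search URL from some of the GET variables listed above."""
    params = {
        "as_q": query,            # query string
        "hl": lang,               # language
        "num": num,               # number of results
        "as_epq": exact_phrase,   # complete phrase
        "as_eq": exclude,         # excluding
        "as_sitesearch": site,    # restrict by site
        "as_filetype": filetype,  # filetype extension
    }
    # Leave out variables that were not given a value.
    params = {key: value for key, value in params.items() if value != ""}
    return BASE_URL + "?" + urlencode(params)


if __name__ == "__main__":
    print(build_search_url("phrack", filetype="pdf"))
```

urlencode takes care of escaping spaces and special characters, so phrases can be passed in as-is.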


So, Google gives us all those handy tools for filtering away what we don't want to see. How can we use this to help secure our own systems?

Well, for example, we could use the neat Google Hacking Database, a project where people have submitted a huge collection of queries yielding results that the unskilled webmaster (the Googledork) wishes weren't there. Everything from vulnerable login-forms to passwords surfaces with some cleverly engineered queries.


Goolag is a vulnerability scanner (and a politically involved protest..) made by the famous Cult of the Dead Cow. It builds on the above-mentioned GHDB, scanning for the vulnerabilities listed in the database. At the moment there is only a Windows version of the program. The Goolag project is also a campaign against Google's (and a few other big players') choice to comply with the Chinese censorship policy.

Useful Queries

-inurl:htm -inurl:html intitle:"index of" "Last modified" mp3
    mp3-file indexes, add desired artist
-filetype:zip OR rar daterange:2453402-2453412
    zip files on rapidshare uploaded on specified date
Query results updated within one day



Monday, July 20, 2009

Spam vs. CAPTCHA, the lesser of two evils

For quite a while now, one of the greatest annoyances I've encountered on the net is something we've come to accept as "the lesser of two evils," a spambot-roadblock known as "CAPTCHA." (This acronym actually has a meaning, which is "Completely Automated Public Turing test to tell Computers and Humans Apart.")

Now, you might ask me, "So what, you fool? Would you prefer getting every one of your forms exploited by spambots?"

Of course not, there is nothing I despise more than getting countless well-meaning offers of masculine-organ-gargantuafication. (And that is not entirely because rendering those areas any larger would be more of a nuisance than anything else.. (Bad puns end here. (Nesting ftw!)))

As much as I want to avoid those mails, I can't help feeling a great irritation every time an incomprehensible image pops up,
declaring me a fifty-line script for not realizing that S was actually a 5. More than once, this frustration has lost a forum or blog
a comment from yours truly, and probably many more from others.

When talking about these matters in a corporate fashion, you use the term "conversion ratio." Simply put, it's the percentage of visitors that actually follow through with the action that you as author wish for them to take, whether that is filling out a form, signing up as a member, or perhaps purchasing a certain product or service. And, as you've probably figured out by now, the use of CAPTCHAs might have a negative impact on this ratio.

At least, that was what a recent post on the was all about. The author of this post put together some very clear and impressive statistics, showing that the use of CAPTCHAs yielded an 88% reduction in spam, but at the same time the figure of failed "conversions" rose drastically. And the figure of spam was not that great to begin with.
[You can read the full, very interesting post here:]

So, when putting the conversion ratio first, not implementing a CAPTCHA seems to yield more favorable results. But really, we do not want that spam!

The same post as mentioned above provided a link to an almost three-year-old alternative solution to the problem, called the "Honeypot CAPTCHA."
The general idea of this solution is that when a spambot traverses your page, it looks for and attacks any tasty-looking form, but rarely pays any attention to user-oriented code, that is, the stylesheet. So, what if we put a field in our form that code-wise appears as a completely normal input field, but is invisible to the real user? Get it? If that field, which a real user wouldn't fill out, actually *is* filled out, we can deduce that this was the work of something less intelligent: a couple of dirty lines of code. At the end of this post, I've included a simple example piece of code.
[The blog in which this solution, as well as two other interesting ones were originally posted can be found here:]

Opinions voiced against this method primarily concern the very important matter of accessibility - accessing a form with a field like this with a screen-reader or text-based browser would confuse and/or render the valid user unable to use the form. However, supplying proper commentary about the field should solve this matter. And also, how *does* a screen-reader/text-browser go about regular CAPTCHAs, anyways?

But facing the cold, hard facts, we can't fool ourselves into believing that spambots will stay silly forever. In fact, there should already be quite a few sophisticated ones out there. The battle against spam has been raging since the olden days, and just to provide an example I'd like to toss in a link to this very informative post by an anti-spam software developer, written in early '06. [Go ahead and read:] He discontinued working on his project, SpamKarma2, in mid-'08, and put the code up on Google Code under a standard GPLv2 license, where it's still being developed today.

Back to the point - he points out in the post I linked to above that he had already then observed an increase in spambot efficiency: making the access look more human-like, following links in a "common" manner, and even bypassing javascript-filters. A programmer who can implement a javascript parser in his spambot would hardly be challenged to create one for stylesheets as well; the reason there haven't been any indications of one yet is simply that there hasn't been any need for it. Thus, the honeypot-solution, if widely spread, would probably be surmounted with relative ease.

If I haven't frustrated you enough yet, breaking all the good parts of the "solution" before you've even had a chance to code it into your site, here's one more. "OCR." Utilizing this technique, invented to turn scanned images into normal text, the quite famous XRumer bot was able to break Hotmail and Gmail CAPTCHAs in late '08. So the race is, by all measures, a tight one. Obfuscated CAPTCHAs, however, still seem to hold pretty high ground, and thus they are indeed the optimal way to avoid spam. But (back to square one) they are user-unfriendly and perhaps have a negative commercial impact.

So to sum things up:
  1. Using the honeypot CAPTCHA and common sense, a "low-value" target would probably be able to avoid practically all spam without implementing intrusive techniques such as regular, hard-to-OCR CAPTCHAs.
  2. For the best possible security, a hard-to-OCR CAPTCHA is the way to go, unfortunate but true. One nice system I'd like to push for is the reCAPTCHA service, which turns the pestering work into a good deed by using your human processing cycles to digitize old books and publications. [For more information on this, visit]
  3. The battle rages on. If you've got any information regarding this topic I'd more than love to hear from you. Especially if you hold some information about the workings of more sophisticated spambots. Ignorance might be bliss, but living in the grey-zone in between is pure hell.

Thanks for sticking through, hope you found this somewhat useful.

Honeypot CAPTCHA simple example:

#letshidethis { display: none; }

<input ...>
<input ...>
<textarea ...></textarea>
<div id="letshidethis">
<input name="user_info" ...> <!-- or some other, tasty-looking faux name -->
</div>
<input type="submit">

Then in your code, you would simply check if user_info contains any data. If it does, it might very well be spam.

One thing to note here: don't give the fake input a completely unintelligible name, since some (many?) spambots seemingly look for a collection of known names to post into.
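
The server-side check could look something like the following Python sketch. It assumes your framework hands you the submitted form as a dictionary-like object. The function name is made up for this example, and user_info is the fake field from the form above.

```python
def looks_like_spam(form, honeypot_field="user_info"):
    """Return True if the hidden honeypot field was filled in.

    `form` is assumed to be a dict-like mapping of field names to the
    submitted string values; a real visitor never sees the honeypot
    field, so it should arrive empty or not at all.
    """
    value = form.get(honeypot_field, "")
    # Whitespace alone is treated as empty.
    return bool(value.strip())


if __name__ == "__main__":
    print(looks_like_spam({"comment": "Nice post!", "user_info": ""}))
    print(looks_like_spam({"comment": "Buy now", "user_info": "Bot McBotface"}))
```

What you do with a flagged post is up to you. Silently dropping it is probably better than showing an error, since an error tells the bot's author exactly what to fix.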