This article is part of a series on exploring evented programming by building a distributed IRC bot.
IRC messages are not very complicated things. Each one is a single line of at most 512 characters, terminated by a carriage return and newline (that's "\r\n", not simply "\n"), and the terminator counts toward that limit. A message consists of three sections: an optional prefix, a command and the command's parameters. While the IRC RFC does define some rather rigid restrictions on which characters may occur in each section, modern IRC implementations seem to ignore these for the most part. So to really parse them, you just need to look at the separators.
IRC With Telnet
Most operating systems come with a Telnet client. While Telnet is not a "raw" socket, you can use the Telnet client as a raw socket connection, or you can use a program like netcat. What we'll be doing is connecting to an IRC server and seeing the IRC protocol first hand. We'll be issuing a few IRC commands manually to get our feet wet, and we'll be looking closely at the IRC messages the server sends back at us.
The two IRC commands you need to know to get connected are NICK and USER. The nick command chooses a nickname, which must be unique: no one else can be using that nickname on that server or network at the same time. The user command sets a few things such as the username, host and real name for your time on the server. Since this is a quick test, simply pick something unique for your nick, and we'll leave the user command mostly blank.
When sending a command to the server, the format is as follows:
COMMAND arg1 arg2 argn :Final arg, but this one can have spaces\r\n
Commands are written in all caps by convention, so while I refer to the "nick command," what I really mean is the "NICK command." The arguments are all simple words or numbers. They're not quoted and no escape sequences work. Only the last argument can contain spaces: if any argument begins with a colon (:) character, it is considered to be the last argument and extends to the very end of the line. So let's take a look at a successful handshake command set consisting of the nick and user commands.
NICK AboutRubyTest
USER AboutRubyTest . . :About.com Ruby Test
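To make the formatting rules concrete, here is a minimal Ruby sketch that builds a command line from its parts. The `irc_line` helper and the `MAX_IRC_LINE` constant are names made up for this illustration, not part of any library. It assumes the rules described above: only the final argument may contain spaces, and every line ends in "\r\n".

```ruby
# Hypothetical helper for illustration only; not part of any IRC library.
MAX_IRC_LINE = 512 # the limit includes the trailing "\r\n"

# Builds one outgoing IRC line from a command and its arguments.
def irc_line(command, *args)
  *middle, last = args.map(&:to_s)

  # Every argument except the last must be a single word.
  middle.each do |arg|
    if arg.empty? || arg.include?(" ") || arg.start_with?(":")
      raise ArgumentError, "only the final argument may contain spaces: #{arg.inspect}"
    end
  end

  parts = [command.to_s.upcase, *middle]
  unless last.nil?
    # The final argument gets a leading colon only when it needs one.
    needs_colon = last.empty? || last.include?(" ") || last.start_with?(":")
    parts << (needs_colon ? ":#{last}" : last)
  end

  line = parts.join(" ") + "\r\n"
  raise ArgumentError, "line is longer than #{MAX_IRC_LINE} characters" if line.bytesize > MAX_IRC_LINE
  line
end

print irc_line("NICK", "AboutRubyTest")
print irc_line("USER", "AboutRubyTest", ".", ".", "About.com Ruby Test")
```

The two calls at the end reproduce the handshake lines shown above, each followed by the carriage return and newline.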
The next command we'll look at is the join command. The join command lets you join a channel, after which the server sends you any messages sent to that channel by other clients. In its simplest form it takes a single parameter, the channel name. We'll join the #ruby-lang channel.
JOIN #ruby-lang
And finally, the quit command. IRC servers prefer that you send the quit command and let them disconnect the socket, rather than rudely disconnecting or simply leaving the socket open until it eventually times out. The quit command takes a single parameter: a message sent to everyone, known as your "quit message." It's sometimes used to give a reason why you're quitting, but often it just advertises the IRC client being used.
QUIT :Testing finished
And now to try it out, but first you need to figure out how to save the data. I'm using the nc (netcat) command and piping to the tee program, which saves output to a file and also reproduces it on standard output. If you're using telnet and pipe to tee, you may need to edit out your commands (sometimes they can appear in the middle of a line doing this) or turn echo off. You can also simply copy and paste out of the terminal window, but make sure each command ends up on a single line, since commands can't be split between two lines. What we want here is to play with the IRC protocol a bit and get some data to test in the next article. If you really can't figure it out, I've uploaded a sample IRC log here.
When you first connect, you'll see a few messages pop up. The server is looking up your hostname and doing a few other things, then it'll stop. This is the time to enter the nick and user commands. If you do it correctly, you'll see a huge number of messages scroll by. Most of this is the "MOTD" or "Message of the Day," a set of messages from the server that welcomes you, tells you the server rules and gives you the latest news. The burst begins with the 001 command (not all commands have names, some just have numbers). After that, you can go ahead and enter your join command, wait for a few messages to come in and then enter the quit command.
$ nc chat.freenode.net 6667 | tee output
:moorcock.freenode.net NOTICE * :*** Looking up your hostname...
:moorcock.freenode.net NOTICE * :*** Checking Ident
:moorcock.freenode.net NOTICE * :*** Found your hostname
NICK RubyTest
USer RubyTest . .:moorcock.freenode.net NOTICE * :*** No Ident response
.
:moorcock.freenode.net 001 RubyTest :Welcome to the freenode Internet Relay Chat Network RubyTest
:moorcock.freenode.net 002 RubyTest :Your host is moorcock.freenode.net[184.108.40.206/6667], running version ircd-seven-1.1.3
…
JOIN #ruby-lang
:RubyTest!~RubyTest@ip.maine.res.rr.com JOIN #ruby-lang
…
QUIT :Test over
:RubyTest!~RubyTest@ip.maine.res.rr.com QUIT :Client Quit
ERROR :Closing Link: ip.maine.res.rr.com (Client Quit)
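If you'd rather script the session than type it by hand, something along these lines might work. This is only a sketch under a few assumptions: `run_session` is a name invented for this example, it quits as soon as it sees the first JOIN message rather than waiting for several messages, and it does not answer PING messages, which some servers may require.

```ruby
require "socket"

# Hypothetical helper for illustration: sends NICK and USER, joins a channel
# once the 001 welcome arrives, then quits after the first JOIN message.
# `reader` and `writer` are IO-like objects; with a live connection both
# would be the same TCPSocket.
def run_session(reader, writer, nick:, channel:, quit_message: "Testing finished")
  writer.write("NICK #{nick}\r\n")
  writer.write("USER #{nick} . . :#{nick}\r\n")

  log = []
  joined = false
  reader.each_line("\r\n") do |raw|
    line = raw.chomp("\r\n")
    log << line

    # The command is the second word when the line starts with a prefix.
    words = line.split(" ")
    command = line.start_with?(":") ? words[1] : words[0]

    if command == "001" && !joined
      writer.write("JOIN #{channel}\r\n")
      joined = true
    elsif command == "JOIN"
      writer.write("QUIT :#{quit_message}\r\n")
    end
  end
  log # every line received until the server closed the connection
end

# Usage against a live server (not run here):
#   socket = TCPSocket.new("chat.freenode.net", 6667)
#   File.write("output", run_session(socket, socket, nick: "RubyTest", channel: "#ruby-lang").join("\n"))
```

Taking the reader and writer as separate arguments is a design choice for this sketch: it lets you try the logic against canned input (for example a StringIO) before pointing it at a real server.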
When you're sending commands, you generally don't use a prefix. However, when you receive messages from the server they usually start with a prefix. A prefix usually just describes who sent the command. Did it come directly from the server, or from another user? The prefix comes before the command (hence the name), starts with a colon (:) and ends with the space separating it from the command. It's usually either the name of the server sending you the message, or the user who sent a private message, channel message, quit, part, etc. For example, here are the first few lines freenode sends after a successful handshake.
:moorcock.freenode.net 001 RubyTest :Welcome to the freenode Internet Relay Chat Network RubyTest
:moorcock.freenode.net 002 RubyTest :Your host is moorcock.freenode.net[220.127.116.11/6667], running version ircd-seven-1.1.3
:moorcock.freenode.net 003 RubyTest :This server was created Mon Dec 31 2012 at 15:37:06 CST
The prefixes on these messages say they're from the "moorcock.freenode.net" server. The commands are 001, 002 and 003 respectively (numeric replies that the server sends as part of its welcome, right after the handshake succeeds). The two parameters to each of these commands are the recipient (RubyTest, in this example) and some text (starting with a colon, signifying it's the last parameter and that it may have spaces in it). At the end of each line (not shown) is a carriage return and newline sequence ("\r\n" in Ruby-speak).
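As a rough illustration of "just look at the separators," the following Ruby sketch splits a line into its three sections using nothing but string splitting. The `split_irc_message` name is made up for this example, and it deliberately skips the character restrictions in the RFC.

```ruby
# Hypothetical helper for illustration: splits one IRC line into its
# prefix, command and parameters by looking only at the separators.
def split_irc_message(line)
  rest = line.chomp("\r\n")

  # An optional prefix starts with a colon and runs to the first space.
  prefix = nil
  if rest.start_with?(":")
    prefix, rest = rest[1..-1].split(" ", 2)
  end

  # " :" introduces the final parameter, which may contain spaces.
  head, trailing = rest.to_s.split(" :", 2)
  words = head.to_s.split(" ")
  command = words.shift
  words << trailing unless trailing.nil?

  { prefix: prefix, command: command, params: words }
end

message = split_irc_message(":moorcock.freenode.net 001 RubyTest :Welcome to the freenode Internet Relay Chat Network RubyTest\r\n")
puts message[:prefix]      # the server name
puts message[:command]     # the numeric command
puts message[:params].last # the welcome text
```

For a line without a prefix, such as the QUIT command you send yourself, the `:prefix` entry is simply nil.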
That's more or less all of the IRC you need to know to get started. Knowing how IRC messages are structured is important for the next article though, where we'll finally be parsing them using named capture groups. If you ever want to know more, you can refer to RFC 1459.
If you'd like to continue reading, the next article in this series is Parsing IRC Messages.