SaffronSierra & Gmail Classification
I’ve recently added some sample code to our “examples” repository that demonstrates how to use SaffronMemoryBase running on SaffronSierra to do basic email classification. The example leverages the convenience of labels within Gmail to provide the “labels” for classifying future emails.
If you have a Gmail account (or a Google Apps account) then you already know that as emails come in you can associate them with labels. You might have labels such as “accounts”, “soccer”, “music”, “work”, etc… (those are some of mine anyway). As I started thinking about building an email classification example it occurred to me that the labels within Gmail would provide a nice, easy way of doing classification. By “labeling” emails in Gmail I’ve already made the statement, “This email is about work”, or “This email is about soccer”. Why not leverage that hard work?
In its current form the classification example will connect to Gmail (using the JavaMail API), grab messages from each of your labels (the number of messages is configurable), initialize a SaffronMemoryBase space on your SaffronSierra service, build the necessary observation resources, and finally push them into SaffronSierra. Then you can use the sample code to see what SaffronMemoryBase thinks the labels for the messages currently in your inbox should be, based on what has been observed. Sounds pretty simple, right?
The code is pretty basic, but will hopefully be a useful “starting point” for developers who are interested in solving these types of problems using SaffronSierra. There are some natural extension points in the code where the classifications could be made more accurate. Currently the “to”, “from”, and “reply-to” addresses, along with the “X-” headers, are used to build the resources that get observed. No science has gone into the selection of these attributes. Any developer who “knows” email will probably be able to tweak the code to create resources that are far more “representative” or “telling” when classifying messages.
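To make the attribute selection a bit more concrete, here is a minimal sketch of pulling those headers out of a message. It is not the code from the examples repository: the real sample reads messages through JavaMail, while this version works on a raw header string so it runs with nothing but the JDK. The class name HeaderAttributeSketch and the method extract are hypothetical, and the step that turns the attributes into SaffronMemoryBase observation resources is left out entirely.

```java
import java.util.ArrayList;
import java.util.LinkedHashMap;
import java.util.List;
import java.util.Locale;
import java.util.Map;

// Hypothetical helper for illustration; not the GmailClassification class from the examples repository.
class HeaderAttributeSketch {

    /**
     * Collects the To, From, Reply-To and X- headers from a raw header block
     * into a map of lower-cased header name -> list of values.
     */
    static Map<String, List<String>> extract(String rawHeaders) {
        Map<String, List<String>> attributes = new LinkedHashMap<>();
        // Unfold continuation lines: a line break followed by a space or tab continues the previous header.
        String unfolded = rawHeaders.replaceAll("\\r?\\n[ \\t]+", " ");
        for (String line : unfolded.split("\\r?\\n")) {
            if (line.isEmpty()) {
                break; // a blank line ends the header block; the body is ignored
            }
            int colon = line.indexOf(':');
            if (colon <= 0) {
                continue; // not a "Name: value" header line
            }
            String name = line.substring(0, colon).trim().toLowerCase(Locale.ROOT);
            String value = line.substring(colon + 1).trim();
            boolean wanted = name.equals("to") || name.equals("from")
                    || name.equals("reply-to") || name.startsWith("x-");
            if (wanted && !value.isEmpty()) {
                attributes.computeIfAbsent(name, k -> new ArrayList<>()).add(value);
            }
        }
        return attributes;
    }

    public static void main(String[] args) {
        // Made-up sample message, for demonstration only.
        String sample = "From: coach@example.org\r\n"
                + "To: me@example.org\r\n"
                + "Subject: Saturday practice\r\n"
                + "X-Mailer: ExampleMailer 1.0\r\n"
                + "\r\n"
                + "Body text is ignored.";
        System.out.println(extract(sample));
    }
}
```

Keeping the selection in one small method like this is the sort of extension point mentioned above: adding or dropping a header (say, the subject line) is a one-line change, and you can see how it affects the suggested labels.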
This sample code is provided under an MIT-style license. So, have at it. Change it. Do whatever you want with it. When you check out the sample code from Subversion, you’ll be looking for the “GmailClassification” class in the “src/java” directory. The command-line options supported by the code are defined in the comment header of the file.
To run this example, make sure you’re running the latest SaffronMemoryBase version in SaffronSierra. If you run into problems or have questions, just head over to our support site and shoot us a message.
Tags: classifications, development, email, gmail, Java, SaffronSierra, smb
This entry was posted on Friday, April 30th, 2010 at 10:52 am and is filed under SaffronMemoryBase, SaffronSierra.