What?
SPAMICITY is an implementation of a naive Bayesian spam filter that displays spam messages and marks the words in each message that have high spamicity (the words the computer relies on to identify the email as spam). From these high-spamicity words, SPAMICITY creates a visualization overlaid on the spam message. If you want to help SPAMICITY identify spam, click on words to toggle their spamicity between low and high.
Why?
We receive a lot of spam every day, and the only sign of its existence is the spam folder's ever-increasing counter. I wanted to create something that would make people interested in what they discard and ignore. In the process, I also managed to create a window into how Bayesian spam filtering works and into what the computer sees when it evaluates spam.
How?
I created this project by using and modifying code from Daniel Shiffman's lesson on Bayesian filtering. The project is built with Processing and RiTa. Lastly, thank you to my class for all of the help and ideas!
Goals and Problems
- Finding and organizing large amounts of email and spam for training and display.
This was fairly difficult, but I eventually collected my own email and a spam archive to build the training files and the spam messages on display.
- Enabling file I/O.
I ended up using Processing's loadStrings() function for file I/O. Because it returns a file as an array of lines, this led to some string manipulation later on in the spam filter.
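Since loadStrings() hands back one String per line, the lines have to be joined before a message can be split into words. The following is a minimal sketch in plain Java rather than Processing, so it takes a String[] standing in for what loadStrings() would return. The class and method names are hypothetical, and the token rule (letters, digits, dashes, apostrophes and dollar signs count as word characters, as suggested in Paul Graham's "A Plan for Spam") plus the lowercasing are assumptions, not necessarily what SPAMICITY does.

```java
import java.util.ArrayList;
import java.util.List;
import java.util.Locale;

public class Tokenizer {
    // Joins the per-line array (the shape loadStrings() returns) into one
    // message, then splits it into lowercase word tokens.
    public static List<String> tokenize(String[] lines) {
        String message = String.join(" ", lines);
        List<String> tokens = new ArrayList<>();
        // Assumed rule: anything other than letters, digits, $, ' or - separates tokens.
        for (String raw : message.split("[^A-Za-z0-9$'\\-]+")) {
            if (!raw.isEmpty()) {
                tokens.add(raw.toLowerCase(Locale.ROOT));
            }
        }
        return tokens;
    }

    public static void main(String[] args) {
        String[] lines = {"FREE money!!!", "Click here now"};
        System.out.println(tokenize(lines)); // [free, money, click, here, now]
    }
}
```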
- Creating a naive Bayesian spam filter.
As mentioned above, I used Daniel Shiffman's guide to Paul Graham's naive Bayesian spam filter. I made some significant tweaks and added access points to the word probabilities, which the visualization uses.
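The author's specific tweaks aren't described here, so the sketch below shows only the unmodified baseline from Graham's "A Plan for Spam": good-mail counts are doubled to bias against false positives, words seen fewer than 5 times (after doubling) get a default of 0.4, per-word probabilities are clamped to [0.01, 0.99], and the 15 probabilities furthest from 0.5 are combined as prod(p) / (prod(p) + prod(1 - p)). The class and method names are hypothetical; a public spamicity(word) method is one way to expose per-word probabilities to a visualization, but it is not SPAMICITY's actual API.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.List;
import java.util.Map;

public class GrahamFilter {
    private final Map<String, Integer> goodCounts = new HashMap<>();
    private final Map<String, Integer> badCounts = new HashMap<>();
    private int goodMessages = 0;
    private int badMessages = 0;

    // Counts every occurrence of every token in one training message.
    public void train(List<String> tokens, boolean isSpam) {
        Map<String, Integer> counts = isSpam ? badCounts : goodCounts;
        for (String t : tokens) {
            counts.merge(t, 1, Integer::sum);
        }
        if (isSpam) badMessages++; else goodMessages++;
    }

    // Per-word spamicity in Graham's style; this is the kind of accessor
    // a visualization could read to decide which words to mark.
    public double spamicity(String word) {
        double g = 2.0 * goodCounts.getOrDefault(word, 0); // doubled good count
        double b = badCounts.getOrDefault(word, 0);
        if (g + b < 5 || goodMessages == 0 || badMessages == 0) {
            return 0.4; // too little evidence: slightly innocent default
        }
        double bad = Math.min(1.0, b / badMessages);
        double good = Math.min(1.0, g / goodMessages);
        double p = bad / (good + bad);
        return Math.max(0.01, Math.min(0.99, p));
    }

    // Combines the 15 most "interesting" distinct words of a message.
    public double messageSpamicity(List<String> tokens) {
        List<Double> probs = new ArrayList<>();
        for (String t : new LinkedHashSet<>(tokens)) { // each word counted once
            probs.add(spamicity(t));
        }
        // Most interesting first: largest distance from the neutral 0.5.
        probs.sort((x, y) -> Double.compare(Math.abs(y - 0.5), Math.abs(x - 0.5)));
        double prodP = 1.0;
        double prodNotP = 1.0;
        for (int i = 0; i < Math.min(15, probs.size()); i++) {
            prodP *= probs.get(i);
            prodNotP *= 1.0 - probs.get(i);
        }
        return prodP / (prodP + prodNotP);
    }
}
```

Graham's essay treats a combined score above 0.9 as spam; an empty message comes out at a neutral 0.5 here.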
- Allow for user interaction.
I thought that letting people interact with the machine-generated spamicity levels would make for a fun experience: wanting to see which other words the machine would mark, and whether the words a user selected would ever reappear in a spam message to vindicate toggling them in the first place.
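One way to model the click-to-toggle behavior is to keep the filter's scores untouched and record the user's flips separately, keyed by word so that a toggled word stays toggled when it shows up again in a later message. This is a minimal sketch under assumptions: the class name, the 0.9 "high" cutoff, and the 0.4 default for unscored words are all hypothetical, since the write-up doesn't state how SPAMICITY stores toggles or where it draws the line between low and high.

```java
import java.util.HashMap;
import java.util.HashSet;
import java.util.Map;
import java.util.Set;

public class SpamicityToggles {
    // Hypothetical cutoff separating "low" from "high" spamicity.
    static final double HIGH_THRESHOLD = 0.9;

    private final Map<String, Double> filterScores;      // per-word scores from the filter
    private final Set<String> flipped = new HashSet<>(); // words the user has clicked

    public SpamicityToggles(Map<String, Double> filterScores) {
        this.filterScores = new HashMap<>(filterScores);
    }

    // The filter's own verdict, before any user input.
    private boolean machineSaysHigh(String word) {
        return filterScores.getOrDefault(word, 0.4) >= HIGH_THRESHOLD;
    }

    // A click flips the word between low and high; a second click undoes it.
    public void toggle(String word) {
        if (!flipped.remove(word)) {
            flipped.add(word);
        }
    }

    // What gets highlighted: the machine's verdict, inverted if the user flipped it.
    public boolean isHigh(String word) {
        return machineSaysHigh(word) != flipped.contains(word);
    }
}
```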
- Creating visual elements.
I implemented these visual elements as playful fractal cellular automata, and made the code extensible so that I can create more varied cellular automata down the road.
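The write-up doesn't say which automata SPAMICITY draws, so the sketch below is only an illustration of the extensible idea. A small Rule interface is the extension point, and an elementary one-dimensional automaton with Wolfram's Rule 90 (each cell becomes left XOR right) grows a Sierpinski triangle, a classic fractal, from a single live cell. All names are hypothetical, and the console rendering stands in for Processing drawing code.

```java
public class Automaton {
    // Extension point: any rule mapping a 3-cell neighborhood to a new state.
    public interface Rule {
        boolean next(boolean left, boolean center, boolean right);
    }

    // Builds an elementary rule from its Wolfram code (e.g. 90 or 30).
    public static Rule elementary(int code) {
        return (l, c, r) -> {
            int index = (l ? 4 : 0) | (c ? 2 : 0) | (r ? 1 : 0);
            return ((code >> index) & 1) == 1;
        };
    }

    // Computes the next generation; cells beyond the edges count as dead.
    public static boolean[] step(boolean[] cells, Rule rule) {
        boolean[] next = new boolean[cells.length];
        for (int i = 0; i < cells.length; i++) {
            boolean left = i > 0 && cells[i - 1];
            boolean right = i < cells.length - 1 && cells[i + 1];
            next[i] = rule.next(left, cells[i], right);
        }
        return next;
    }

    // Text rendering: '#' for live cells, '.' for dead ones.
    public static String render(boolean[] cells) {
        StringBuilder sb = new StringBuilder();
        for (boolean c : cells) {
            sb.append(c ? '#' : '.');
        }
        return sb.toString();
    }

    public static void main(String[] args) {
        boolean[] cells = new boolean[31];
        cells[15] = true; // single live seed in the middle
        Rule rule90 = elementary(90);
        for (int gen = 0; gen < 16; gen++) {
            System.out.println(render(cells));
            cells = step(cells, rule90);
        }
    }
}
```

A new kind of automaton only needs another Rule, for example elementary(30) or a hand-written lambda, without touching step() or the rendering.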
- Make an appealing website.
What do you think?