After reading Kiri’s analysis of her email, I was curious about my email. But first…
Dr. T sent me mail earlier today because I have hit the Big Time. Yes, my obscure, PG-13 blog is deliberately blocked by a Fortune 500 company:
Access Denied (policy_denied)
Your system policy has denied access to the requested URL.
The above Website is blocked. For assistance please Call XXX 853 5555-Option 1 and then 5 – Network Support Team
The best working theory I have is I’ve insulted the product manager by making unflattering observations about marketing partnerships, 5-gallon buckets of paint, or dryers.
But which?
Since I’m already banned, I might as well get on with the topic today. For the last week, I’ve been analyzing my personal mailbox to determine:
Will you pick me up a flux capacitor on your way to work?
We might go back and forth a few times:
How large? How many teslas?
Standard size. At least ten. Twenty if they’re not too expensive.
Is it my turn to buy milk this week?
No
It counts as only one “conversation.”

I’ve had this disclaimer around for at least a year to set expectations, but it was still surprising to see how low my response rate actually is. I try to be prompt with friends, but if it requires a thoughtful response, it could take a while for me to carve out a block of time. (Again: don’t be offended if I’m slow in responding.)
For the spam category, I looked at several things:

The gibberish text is culled from a variety of sources, sometimes even passages from legitimate news articles. At times, it has a strange, nearly artistic beauty to it, like Carhenge. The best will make its way into Spam Poetry competitions.
Approximately ten percent of the spams were written in a UTF-8, Asian-looking character set. The only non-scripty thing was the web site link.

When should you stop taking [These Stupid] Pills?
Whenever you feel comfortable with the way you look just stop taking our product.
In distant second place were “lonely women” with webcams seeking “enhancement” in the recurring subscription revenue sense of the word. The remaining spams were a smattering of “discount” software, refinancing/phishing schemes, and the old favorite, the 419 scheme.
I’d think it’d take a long time to rummage through both my earthlink and yahoo accounts to determine spam counts. How long did this analysis take you?
I think I probably get more “Asian” spam than you, because I have posted to Asian-related fora in my past. I have set up several filters to screen for Asian languages. For example, if a message has a capital C with a hook under it (I forgot what this is called), it’s probably in Korean and it’s really not personally for me.
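A rough sketch of that kind of filter, in Python rather than in any particular mail client's rule syntax. The function names and the list of script prefixes are assumptions made up for illustration. The first check looks for the scripts directly in correctly decoded text; the second mimics the "capital C with a hook" rule, which tends to fire when Korean text encoded as EUC-KR is displayed as if it were Latin-1.

```python
import unicodedata

# Unicode name prefixes treated as "East Asian" here; this list is an assumption.
SCRIPT_PREFIXES = ("CJK", "HANGUL", "HIRAGANA", "KATAKANA")


def has_east_asian_text(text):
    """True if any character's Unicode name starts with one of the prefixes."""
    return any(
        unicodedata.name(ch, "").startswith(SCRIPT_PREFIXES) for ch in text
    )


def has_capital_c_with_hook(text):
    """The single-character rule from the comment: flag text containing U+00C7."""
    return "\u00c7" in text


# A Korean subject line as it looks when EUC-KR bytes are shown as Latin-1.
garbled_subject = "한글".encode("euc_kr").decode("latin-1")
```

The single-character rule will also catch legitimate French or Portuguese mail that happens to contain that capital letter, so it is best treated as a hint rather than a verdict.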
I love the title of this post… clever.
Your graphs are prettier than mine. Either your Excel foo exceeds mine (very likely) or you’re using a nicer tool (also likely).
It’s interesting to compare our results! I guess with gmail, a “file” category isn’t relevant?
I’m impressed that you actually analyzed the spam. I didn’t have the stomach (or time) for it.
Claire: It took about an hour. I pasted the email headers and content previews into a delimited file so I could slice and dice them. I should investigate filtering on those character sets.
Kiri: Although gmail tries to encourage retention of email, I regularly delete and purge because it will otherwise clutter up search.
As for the spamalysis, it was out of morbid curiosity. Yes, I felt I needed to wash my hands afterwards.
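For anyone who wants to try the same slice-and-dice without a spreadsheet, here is a minimal Python sketch. It assumes a tab-delimited file with a header row and a hand-assigned "category" column; the column names and the sample rows are hypothetical, not my actual data.

```python
import csv
import io
from collections import Counter

# Hypothetical sample rows standing in for the pasted headers and previews.
SAMPLE = (
    "sender\tsubject\tcategory\n"
    "friend@example.com\tFlux capacitor?\tpersonal\n"
    "pills@example.net\tYou will be happy\tspam\n"
    "deals@example.org\tDiscount software\tspam\n"
)


def tally_by_column(handle, column, delimiter="\t"):
    """Count how many rows carry each value of the given column."""
    reader = csv.DictReader(handle, delimiter=delimiter)
    return Counter(row[column] for row in reader)


def percentages(counts):
    """Convert raw counts to percentages of the total, rounded to one decimal."""
    total = sum(counts.values())
    return {key: round(100 * n / total, 1) for key, n in counts.items()}


counts = tally_by_column(io.StringIO(SAMPLE), "category")
shares = percentages(counts)
```

Swapping `io.StringIO(SAMPLE)` for an opened file gives the same tallies on a real export, and passing a different column name (say, "sender") slices the same data another way.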
Would you mind doing this analysis at my company?
I’m so happy I don’t have to actually go through corporate spam, one-by-one. We let the users take care of whatever they get, after Abaca has had its way with it.
I find since I’ve switched from Yahoo to gmail that my sp@m has 1) gone to nearly nil, and 2) is up on the one gmail account that I use for business-related things where – shockingly – I’m actually subscribed to several newsletters. I still think the best sp@m e-mail I ever got had the subject line “You will be happy.” I never opened it because I wanted the subject to be true.