Category Archives: business

The Mintern

In retrospect, there were signs that The Intern might not work out:

  • The child of a friend of the family of a Senior Executive
  • … touted as an upperclassman from a well-known engineering school, but was just starting a search for an internship in mid-June
  • … hasn’t been vetted or spoken with the hiring manager, but was slated to start Tuesday.  The IT folks love the smell of an unnecessary fire drill in the morning!
  • Intern shows up and wants to do Data Science!
  • …using only the skills he currently has (texting, Google queries, watching cat videos)
  • …because he’s actually a freshman
  • …and is only doing this because His Mom pushed him to get a job through a friend of the family.

Despite being given a clear set of tasks and the offer of tutoring from the team, this was not the experience The Intern had in mind.  He left after day three.  A weekend conversation between His Mom and the Senior Executive filled in some of the details.  The Intern would not be bothered with the courtesy of sending the hiring manager an explanation, even in email.

I have never seen our IT staff so giddy to delete an account.


Image credit: Melrose Municipal Schools



I purchased a SanDisk 16GB USB stick drive (~$10) to transfer large files between machines rather than having to burn multiple DVDs.  (Also because I ditched my optical drive to install a second hard disk in my ancient MacBook.)  While I was shuttling files, the USB stick went into some kind of lock-out mode.

The consensus from the Interwebs was the device committed seppuku and was a lost cause.  Rather than toss it into the abyss, I thought I’d see how long the warranty was on these devices.  To my delight, I was only halfway through it.  I should at least try filing a warranty claim.

Finding the customer support contact mechanism is always difficult.  With each two steps forward came one interstitial step back to read some frequently answered question.  Invariably, these all devolve into “Is your computer up to date?”, “Is the device plugged into the computer?” and “Are you a moron?”  Nonplussed, I eventually found the sUpEr SiKrEt form to request an RMA (return merchandise authorization).  The form asked for a lot of information, including a copy of the receipt and the device serial number, etched in a subatomic font.

A day later, a customer support person contacted me asking for “supporting documentation”: photos of both sides of the USB stick, because why the hell not.  Oh, and would I please provide a copy of my receipt?  Because I could respond via e-mail, I supplied the largest-resolution, highest-“quality” photo I could take of each side.  I also discovered something really useful: the phone’s camera makes an excellent magnifying glass.  Also: two can play at this game.

The next morning, I had three emails.  The first acknowledged receipt of my supporting documentation.  The second indicated a verdict was reached by the warranty cabal.  The third included RMA instructions with a UPS label that I’d print out and slap on my extremely well-packaged sarcophagus sent to their return center.  Once received, there would be another evaluation period, after which the disposition would be determined and revealed to me in a deeply symbolic dream.  Okay, maybe I imagined the dream part.

After a few days, I received an email update as my deceased USB drive completed each stage of its journey to the warranty afterworld.  When it eventually reached the Asphodel Meadows, the final notification indicated a new stick drive would be on its way.

Tonight, this arrived:

Huge Envelope

The replacement USB stick was enclosed within another envelope (not shown) in the larger, yellow, padded envelope.  For scale, I have placed a quarter next to the stick drive.

I estimate the cost of this exchange exceeded the price of the original product:  $14 for UPS both ways, $1 packaging, $2 for the device, $5 for people to interact (the dude sending me email, packaging, typing stuff), or roughly $22 in all.

What Is Your OTT Strategy?

As I was walking through the exhibits of O’Reilly’s “Making Data Work” Conference a few weeks ago, a vendor stepped in my path:

Vendor: “What is your organization’s Hadoop strategy?”

Having done a metric crap-ton of events as a vendor, I was sympathetic to what he was trying to do.  However, his premise that a tool is the One True Technology (OTT), which my business would be expected to have embraced as a strategy, struck me as absurd.  He might as well have asked what we’re doing about Microsoft PowerPoint or vim.

Jim: “We’ve been using it the last five years as a floor wax. However, I recently discovered it also makes an excellent dessert topping.”

Behold: The One True Yellowphant
Now in tasty floor wax flavor!

Flummoxed by a non sequitur his Solution Selling course could not prepare him for, he disengaged, letting me go about my business.  There seemed to be a lot of businesses offering Hadoop Business Strategy Optimization.

It was great being out of the office again, talking to people working on similar types of data problems but in completely different environments.  This is one of the things I’ve missed most since leaving Tecplot.

I’m still compiling my notes for an internal presentation, but several of the keynote presentations have been posted.  My top three:

  • David McRaney, Survivorship Bias and the Psychology of Luck, which is a redux of his podcast, but still a great talk.  The premise is that when failures become invisible, you tend to focus more on successes, not realizing that you’re missing some vital pieces of information.

    Illustration by Brad Clark
  • David Epstein, Small Data in Sports: Little Differences That Mean Big Outcomes – this was timely given the Olympics starting.  The performance difference between the winner and the runner-up is typically less than 0.5%.  While there are efforts to gather lots and lots of data, there are successful applications using small data, reducing a sport to a small handful of things they could affect.  (His longer talk went into this in more depth.)  The punchline: the 10,000-hour rule is missing the +/- 10,000 hours.
  • Rodney Mullen, The Art of Good Practice.  What I liked most about this was the meta-message that “Everyone from the community comes with their own backgrounds, own attributes, something you don’t have.”  Though it would have been easy to gloss over the presentation because of the skateboarding vernacular (“bracketing the feeble grind”), he offered some interesting ideas about focusing the type of practice.

The tutorials were generally good. I’d planned to sit in on the MLBASE track, but at the last minute switched to John Foreman’s Dissecting Data Science Algorithms Using Spreadsheets, based on his book Data Smart, in which he provides an overview of a handful of important algorithms using Excel.

Excel.  For Machine Learning? (image credit: John Foreman's blog)
Yes, Excel  (image credit: John Foreman’s blog)

While you’d be unlikely to use Excel for any non-trivial problem, it lets you learn the underlying algorithm (so you can apply it in the appropriate business context) without first having to learn a programming language.  For the non-trivial case, you’d likely use R, Weka, or the Berkeley Data Analytics Stack (BDAS).

Intellectual Property

Intellectual Property?
Oh, really?

I’m not sure what was more puzzling:

  • They insist on strong passwords, but are able to handle only two non-alphanumeric characters.  For example, this-password-restriction-is-poopy-doodoo-7F92PChQXHkR=mz{mzTfs6x6z”, the sequence emitted when my head landed on my keyboard, violates most of their rules.
  • This information is intellectual property.


Email patterns

Despite a concerted effort to keep my inbox tamed, it’s now back above 30 undealt-with emails.  While falling behind, I’ve noticed some recurring – and annoying – behavioral patterns.  I’m sure the list is incomplete, so feel free to share!

  • “The two-for” – a person who always (always!) sends a second mail with the attachment they forgot to include the first time.  I can’t tell if the person is genuinely a flake or if they’re just pining for the return of corporate instant messaging.  One wants to post a sign on their monitor: always wait at least 30 minutes after eating (er, receiving the first message) before swimming (er, responding).
  • “I must copy my manager on everything” – the sender wants to ensure their manager knows they’ve made a token effort to be “proactive.”  Note the air quotes.  (It’s also possible the sender’s manager “wants to be kept in the loop for minutiae” (cough: micromanages) or needs a high message count to justify her Crackberry.)  I used to whittle the Cc list down, but have since decided that it’s best to keep the list going.
  • “I must copy your manager on everything, too.” – If I’m the sole entry on the “To:” list, the sender, by overtly creating an audit trail, is implying I won’t respond to their request.  Their manager is copied, too, as if to say, “See, I warned them to check their blood pressure / Beware of the Ides of March / Soylent Green is people, but they did not listen.”  What usually happens is that my over-detailed, super-helpful response elicits a walk over for the executive summary.
  • “The Escalating Cc:” – two people in an email discussion have differing opinions.  Instead of, like, actually walking down the hall and having a conversation, they start adding more people to the discussion.  Sometimes this will devolve into a passive-aggressive tone. Paraphrasing an exchange that might hypothetically have gone out to an entire department:

    “I don’t want to blame anybody, I want to fix the problem.”  [three sentences later] “[…] but Bob was the last one to touch it.  I will ask Bob when Bob comes in what Bob did to cause the system to become hopelessly broken.”

    By this point, I will walk over to Bob, whose work I respect, and ask to borrow the Covey Convincer (a 2×4 with “Synergize” written on one side) to put an end to this thread.  Or I just delete messages on the thread until my name is in the blame stream.

  • “The passive-aggressive” – this is more of an attitude.

    Example #1: suppose a coworker was supposed to send you a TPS report draft yesterday, but didn’t.  A chronic procrastinator employing the hat trick of obstructionism, ambiguity and blame might respond: “I can’t give you a draft of the TPS report because you haven’t approved the table of contents.”  Of course, they didn’t tell you they were waiting on you, nor does it necessarily matter.  In reality, they were spending all their time leaving anonymous notes in public places.

    Example #2: the “two-for” sends a link in a form you can’t use: forward slashes (*nix) instead of backslashes (Windows), a document on a machine you don’t have permission to access, or a document created in a proprietary software format like AutoCAD or SmartDraw.
  • “More information than you require” – when providing a response, assume the reader knows nothing about the subject and, in a non-sarcastic tone, explain in laborious detail.  This is one I’m guilty of, often a sign that I’m responding late at night, when there are few enough distractions to compose a lengthy tome.  I know you won’t read it, you know you won’t read it, everyone knows you won’t read it.

WPEngine experiment

Executive summary: Dissatisfied with my blog’s response time, I looked into a variety of options.  For about a month, I tried an experiment in which I moved everything to WPEngine.  It didn’t work out, but I learned a lot along the way.

I’ve been dissatisfied with the page load times of my blog for a long time. After some pent-up annoyance, I’d futz with things like swapping out the theme or not running fifty-seven plugins (especially the one that translates prose to the binary language of load lifters). Like a thump on the side of a recalcitrant TV, this would confer sufficient and unsubstantiated improvement that I could focus on other things.

Then, in November, someone pinned one of my posts on Pinterest. Traffic shot up 5x for a couple of days before returning to normal.  The same post has been repinned every three weeks. While I am thrilled that something I wrote is being referenced by the erudite, mannered and comely pinhabitants, the pinstorms were pincreasing my page load times to a ghastly 15 seconds.

I needed to figure out why it was so slow.  Some DuckDuckGoing yielded a set of blog posts by Steve Souders, formerly of Yahoo (now of Google), with best practices for improving performance.  Many of these were codified into YSlow.  Nearly everything it tests for, I was doing wrong.

WebPageTest, which also lets you examine your site from different locations around the world, reached the same conclusion.  It gave me a better appreciation for the benefits of content delivery networks.  I remain flummoxed about how to address the recommendation it always dings me on: serving images from a cookie-free domain.

Zoompf - saving me 2.5% bandwidth
ZOMG, 13.6363% savings by going aggressive!

Zoompf offers a free performance scan as a lead-in to their enterprise reporting service, targeted at large sites that won’t flinch at the $100-200/month price tag.  (Me: flinch)  The sample report covers many of the same concepts as YSlow, but in an effort to impress, it just lists pages and pages of stuff sprinkled with upsell.  Frankly, I found this presentation unactionable and of dubious value.  For example, at the top of the performance list, under the “critical and high impact” category, was a page full of suggestions that I replace every PNG with a JPG.  An example reduction from 1294 to 777 bytes would yield a “39.954% improvement.”

Um, riiiiiiight.  I have this rule of thumb that the more digits of precision someone (or some company) lists, the less likely it is they know what they’re talking about.  This image (the “Add media” icon) is fine as a PNG file.  And let’s be realistic: the 517-byte savings is not going to amount to much improvement.  It would be more effective to make the image part of a CSS sprite and save the extra connection.
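For scale, here is the arithmetic behind Zoompf’s figure as a short Python sketch: the percentage is simply the bytes saved divided by the original size, while the absolute saving is only 517 bytes. The byte counts come from the report quoted above; the `savings` helper is hypothetical.

```python
def savings(original_bytes: int, optimized_bytes: int) -> tuple[int, float]:
    """Return (bytes saved, percent saved) for a file-size reduction."""
    saved = original_bytes - optimized_bytes
    percent = 100.0 * saved / original_bytes
    return saved, percent


saved, percent = savings(1294, 777)
print(f"{saved} bytes saved ({percent:.3f}%)")  # prints: 517 bytes saved (39.954%)
```

Three decimal places reproduce the report’s number exactly, which looks precise but describes a saving of about half a kilobyte.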

An option I hadn’t considered before came via a Hacker News story where some A-list blogger was absolutely gushing about the principals of WPEngine. The premise is that they are the WordPress equivalent of the Super Friends, who have banded together to defeat Giganta (er, to offer managed WordPress installations).

When stated as a function of the opportunity cost of “things I can do if I don’t have to coax acceptable performance out of my blog,” the $29/month entry-level plan seemed worth looking into.  Though I host a vanity blog in an anonymous cookie-cutter neighborhood somewhere far east of town, past the phone pole with the red shoes dangling from the wires, I was concerned about exceeding the limits of their entry-level plan, especially if I kept being repinned.  The price difference between their entry-level ($29/month) and middle-tier ($99/month) plans is steep.

They assured me they were most concerned about bandwidth and general resource allocation (issues partially mitigated by using CloudFlare). I viewed this as a good sign: instead of adopting the shared-hosting promise of “infinite bandwidth” and overselling it (which is not unreasonable, since everyone overestimates what they’ll use), they were metering resources to provide quality service for everyone.  They also have a reasonable approach to counting visitors.

Migrating my blog to WPEngine was very easy:

  1. Export a backup of posts from my DreamHost-hosted blog.
  2. While migrating, I used my vim-fu to modify the backup so the images pointed to a second domain.  This would let me decouple the images from the blog content.  (Doing this was an opportunity to clean out all of the non-content cruft that I’ve accumulated since 2002, when I was on Movable Type, before switching to WordPress, back to MT, and again, finally, to WordPress.)
  3. Install my Pagelines theme and plugins.
  4. Tell CloudFlare to use the new server.
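Step 2 was done in vim; as an alternative, here is a minimal Python sketch of the same substitution on an exported WordPress file. It assumes images live under WordPress’s default wp-content/uploads/ path, and both domain names are placeholders, not this blog’s real hosts.

```python
from pathlib import Path

# Placeholder hosts (hypothetical): substitute the real blog and image domains.
OLD_PREFIX = "http://blog.example.com/wp-content/uploads/"
NEW_PREFIX = "http://images.example.com/wp-content/uploads/"


def rewrite_image_urls(export_text: str) -> tuple[str, int]:
    """Repoint uploads URLs in a WordPress export; return (new_text, count)."""
    count = export_text.count(OLD_PREFIX)
    return export_text.replace(OLD_PREFIX, NEW_PREFIX), count


def rewrite_export_file(src: str, dst: str) -> int:
    """Rewrite the export at src into dst and return how many URLs changed."""
    text = Path(src).read_text(encoding="utf-8")
    new_text, count = rewrite_image_urls(text)
    Path(dst).write_text(new_text, encoding="utf-8")
    return count
```

A plain string replace is deliberately dumb: only URLs that begin with the exact uploads prefix move, so links to posts stay on the blog domain. Writing to a separate file keeps the original export as a fallback.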

The switchover took about two hours, plus a few hours spread out over the next few nights reconfiguring Pagelines and plugins to look acceptable.  I really appreciated their site backup/staging feature – it let me try plugins without taking down the original.  It’s speedy and seamless.

They install, but do not activate, a handful of curated plugins.  Being so separated from WP-cool, it was interesting to browse through what’s new and available.  My head almost exploded trying to ponder the search engine optimization one.  (Too many options.)

Using WebPageTest, my grades went from FABF✓ to CAAF✓.  Page load time decreased to about 6 seconds.  Shortly after my trial started, DreamHost sent me this:

During a recent security scan we have identified that one or more of your hosted sites have been compromised and may have an open security flaw which allowed malicious parties to abuse your site for unscrupulous purposes. Further details regarding what security concerns we found can be found listed below:


After nearly shitting my pants, I determined there was no real damage because the web site was inactive.  What appears to have happened is that another user on DreamHost was infected.  This infection, run via shell, surfed user directories looking for WordPress installs, specifically subdirectories that were world-writable.  It managed to traverse my directory structure, find that the open-source theme (WonderFlux) had world-writable directory permissions, and deposit a kiddie script in each such directory.  The scripts were a relatively old hack that fetches a set of directives from a hard-coded IP address. In essence, it installs a shim to turn the site into part of a botnet. Pass the Canadian-grown V1@gr@!

To be safe, I blew away everything and restored from the known good backup taken when I moved the site over to WPEngine.  I also contacted WonderFlux’s proprietor suggesting that the file permissions might be reined in a notch, changed passwords using my handy-dandy password manager, and used my scripting fu to audit file permissions.
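The audit script itself isn’t shown, so here is a minimal Python sketch of that kind of check (not the author’s actual script): walk the site directory and flag anything with the “other” write bit set, the same property the infection went looking for. The function name and the optional `fix` flag are illustrative, and it assumes a POSIX host like DreamHost’s.

```python
import os
import stat
from pathlib import Path


def world_writable(root: str, fix: bool = False) -> list[Path]:
    """List directories and files under root that any user can write to.

    With fix=True, also clear the "other" write bit on each hit.
    """
    hits = []
    for dirpath, _dirnames, filenames in os.walk(root):
        # Check the directory itself plus every file in it; os.walk visits
        # each (non-symlinked) directory once as dirpath, root included.
        for path in [Path(dirpath)] + [Path(dirpath) / f for f in filenames]:
            try:
                mode = path.lstat().st_mode
            except OSError:
                continue  # removed or unreadable mid-walk; skip it
            if stat.S_ISLNK(mode):
                continue  # a symlink's own mode bits are not what matters
            if mode & stat.S_IWOTH:
                hits.append(path)
                if fix:
                    os.chmod(path, stat.S_IMODE(mode) & ~stat.S_IWOTH)
    return hits
```

Running it first without `fix` gives a report to eyeball (a hypothetical call would be `world_writable(os.path.expanduser("~/example.com"))`); clearing only the “other” bit leaves owner and group permissions as they were.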

I have a few simple takeaways from this experience:

  1. Always have a backup no older than the amount of data loss you’re willing to incur.
  2. Don’t expect help from your web hosting provider.  DreamHost deals with hundreds of these things a day.  Their support people are happy to run their scanner again, but that’s about all.  The “site restore” option buried deep within their control panel is ornamental.
  3. Even though WordPress components may be inactive or unused, they still represent potential security problems. I’d checked WonderFlux for PHP shenanigans, but didn’t think to validate its file permissions.  (This was clearly my bad.)  I should also have deleted it when I was done.
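Takeaway 1 can be turned into a check that could run from cron: compare the age of the newest file in the backup directory against the data-loss window you’re willing to accept. A minimal sketch, assuming backups land as files in a single directory; the directory and the 24-hour window in the usage note are placeholders.

```python
import time
from pathlib import Path
from typing import Optional


def newest_backup_age_hours(backup_dir: str) -> Optional[float]:
    """Hours since the newest file in backup_dir was modified, or None if there are none."""
    files = [p for p in Path(backup_dir).iterdir() if p.is_file()]
    if not files:
        return None
    newest = max(p.stat().st_mtime for p in files)
    return (time.time() - newest) / 3600.0


def backup_is_fresh(backup_dir: str, max_loss_hours: float) -> bool:
    """True if the newest backup is no older than the data loss you'd tolerate."""
    age = newest_backup_age_hours(backup_dir)
    return age is not None and age <= max_loss_hours
```

For example, `backup_is_fresh("/home/me/backups/blog", max_loss_hours=24)` (hypothetical path) returns False when the directory is empty or the newest dump is more than a day old, which is the moment to take a new one.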

So up to this point, I was feeling a lot of love for giving WPEngine a try.

The first two weeks on WPEngine were great.  The three technical questions I had were answered promptly and with great information – Christopher and Donovan had WP-fu.  Even with the biggest pinstorm I’d seen yet, page load times were still hovering around the 6 second range.

Performance on WPEngine, late March 2012

Then on March 21st, I tried logging into my console to write:

Because I ain’t going to tell you why.  That’s why.

There was nothing in the error logs, so I reported this to support.  One of their technicians gave me a “try it now” and chalked it up to a timeout. I’d later find out this was part of a broader set of issues.  No specific root cause was noted.

A week later, it happened again.  It hit me on the wrong day, and my support request was amped up on frustration.  Their junior support person responded within 20 minutes saying she couldn’t reproduce it and suggested it might be a “browser issue.”  (Browser Issue is becoming the new “Update Windows.”)  Since they had had a similar site problem earlier that day, I had my doubts, but just in case, I tested it with Chrome, Firefox and Safari.  I also logged in from work using IE.  Same result.

The following evening, I was still unable to log in and had heard nary a peep from them.  I started moving all of my stuff back to DreamHost.  Reloading the blog database on DreamHost took freaking forever, but I was back online.  After one more check confirmed they hadn’t responded, I requested that they cancel my account and refund the first month per their 60-day guarantee.  A few days later, their blog showed more of these 50x problems.  And again.

Clearly, they are having some growing pains.  I hope they can work past these, because the value proposition of not spending evenings farting around with WordPress/Apache/Caching plugins or dealing with the ongoing security issues with PHP and WordPress makes their offering interesting.

Performance on Dreamhost, post-move. April 2012

In the meantime, I’ve been spending my evenings farting around with WordPress/Apache/Caching plugins, coaxing out an improvement over what it was doing before.  Now if I can just get back to writing again…