I spent today at the Hadoop Summit 2009 (hadoopsummit09). Although I paid my $100 registration fee in advance, I made the mistake of arriving a few minutes after 9am with the first keynote already underway, and joined a thick knot of hangers-on in one of the doorways to the ballroom (don’t tell the fire marshal). Finally, at 10:30, they opened the partition wall separating the adjacent salon and expanded the ballroom.
The first keynote I listened to was a long sales pitch from Sun. I don’t know why you would give a room full of hundreds of engineers and scientists a basic sales pitch for cloud computing, but it was a waste of time for both Sun and the audience.
Enough whining; the rest of the conference was very thought-provoking. Some quick impressions:
- If Moore’s Law has delivered roughly a million-fold improvement in computing performance in the last 40 years, Hadoop, Zookeeper, and similar orchestration layers allow another thousand-fold improvement for suitable problems.
- Amdahl’s Law trumps Moore’s in many situations, but some of the problems now being solved were intractable five or ten years ago, at least on a reasonable budget, if not unthinkable.
- To put a million-fold increase in perspective, that’s a lifetime of calculation (40 hours a week, 50 weeks a year, 40 years working lifetime) compressed into 288 seconds, 12 seconds shy of five minutes.
- The ability to orchestrate a thousand to ten thousand machines on a problem (admittedly you need a suitable problem) means we are looking at project CPU budgets measured not in CPU years but CPU millennia.
- This is not entirely new: certainly there were DoD and NSA projects working at that level with specialized hardware two and perhaps three decades ago. Pixar announced in 2006 that their movie Cars took 23 CPU millennia to produce, again with specialized hardware.
- But Amazon EC2 uses commodity hardware and makes CPU hours available for a dime with a credit card. Admittedly a CPU millennium will set you back roughly $876,000 at current prices.
- There were several requests or comments on the need for fractional-hour billing, which I took as at least anecdotal evidence that many of these tasks parallelize well.
- Amazon reminded us that they are quite skilled at accepting (and shipping back) physical media containing data sets for their Elastic Compute Cloud.
- “Never underestimate the bandwidth of a station wagon full of tapes hurtling down the highway.” — Andrew Tanenbaum
- The show reminded me a lot of INTEROP 88, the year that Interop transitioned from workshop to trade show with a few dozen vendors at the Santa Clara Convention Center. The vendor ecosystem for Hadoop is not yet as diverse, but the focus was clearly on system administration and technology, with the applications discussed in highly technical language. The crowd seemed to be researchers and system programmers for the most part, but the potential business impacts are starting to become a lot clearer.
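The back-of-the-envelope figures above are easy to check; here is a minimal sketch of the arithmetic (assuming EC2's dime-per-CPU-hour rate mentioned earlier):

```python
# A working lifetime of calculation: 40 hours/week, 50 weeks/year, 40 years.
lifetime_hours = 40 * 50 * 40                # 80,000 hours

# Compressed a million-fold, expressed in seconds.
compressed_seconds = lifetime_hours * 3600 / 1_000_000
print(compressed_seconds)                    # 288.0 — 12 seconds shy of five minutes

# A CPU millennium at $0.10 per CPU-hour.
cpu_millennium_hours = 1000 * 365 * 24       # 8,760,000 CPU hours
cost = cpu_millennium_hours * 0.10
print(f"${cost:,.0f}")                       # $876,000
```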
Postscript June 14: Jinesh Varia made a remark during his keynote along the lines of “please check out our security whitepaper; some firms are building HIPAA-compliant applications,” so I did, and found this paragraph, which clearly telegraphs their strategic intent to move to the heart of enterprise applications:
Certifications and Accreditations
To provide customers with assurance of the security measures implemented, AWS is working with a public accounting firm to ensure continued Sarbanes Oxley (SOX) compliance, and attain certifications and unbiased Audit Statements such as recurring Statement on Auditing Standards No. 70: Service Organizations, Type II (SAS70 Type II). AWS will continue efforts to obtain the strictest of industry certifications in order to verify its commitment to provide a secure, world-class cloud computing environment. The flexibility and customer control that the AWS platform provides permits the deployment of solutions that meet industry-specific certification requirements. For instance, customers have built HIPAA-compliant healthcare applications on AWS.