I found myself wandering on a dark road in the rain last night, looking for a meetup organized by Tatyana Kanzaveli that featured a talk by Rob Weltman on “Cloud Computing: Using the Open Source Hadoop to Generate Data-Intensive Insights.” The event was in a banquet room at the Samovar Deli at 1077 Independence Ave in Mountain View, tucked into a light industrial pocket that’s dark at night (don’t be fooled by Tri-City Glass at 1077 #C; there is another 1077, and that’s the Samovar). Inside it was bright and warm, and about two dozen folks were already eating.
It was a friendly group. After I had sat down and warmed up, I thought to myself: here is a microcosm of Silicon Valley, twenty very technical people gathering for a good, inexpensive meal and talking about the future of technology, all with an eye to influencing it in some way. Silicon Valley keeps on going despite any downturn or recession. I was reminded of J.R.R. Tolkien’s “Riddle of Strider”:
“All that is gold does not glitter,
Not all those who wander are lost;
The old that is strong does not wither,
Deep roots are not reached by the frost.”
This was my first time meeting Tatyana Kanzaveli, who proved to be a dynamo, and the buffet at the Samovar was very good. I also had a chance to meet Christophe Bisciglia of Cloudera, who was in the audience. Cloudera offers enterprise-level support for Apache Hadoop users. An ex-Googler (there seem to be more around these days; last year it was almost unthinkable), Christophe offered a perspective on Google’s use of Hadoop. He and Rob Weltman agreed that network design had had a much larger impact on cloud performance than early infrastructure designers had anticipated.
Cloudera’s Hadoop & Big Data blog published a “State of the Elephant 2008” post on Jan-5-2009 that gives a good perspective on the rate of change in the technology (Hadoop is just at version 0.19). Here are two excerpts, but the whole thing is worth reading if you are following cloud computing:
At the beginning of the year, Hadoop was a sub-project of Lucene. In January, Hadoop became a Top Level Project at Apache, in recognition of its success and diversity of community. This allowed sub-projects to be added, the first of which was HBase, previously a contrib project. ZooKeeper, a service for coordinating distributed systems, and which had been hosted at SourceForge, became a Hadoop sub-project in May. Then in October, Pig (a platform for analyzing large datasets) graduated from the Apache Incubator to become another Hadoop sub-project. Finally, Hive, which provides data warehousing for Hadoop, moved from being a Hadoop Core contrib project to its own sub-project in November.
The number of open source projects in the distributed computing space continues to grow relentlessly. Here are some that came to prominence in 2008, and have some connection to Hadoop (if only because they are used in conjunction with Hadoop, or perform similar functions). In no particular order:
- Mahout, an Apache Lucene sub-project to create scalable machine learning libraries that run on Hadoop
- Jaql, a query language for JSON data
- CloudBase, a data warehouse system built on Hadoop
- Cassandra, a distributed storage service
- Cascading, an API for building dataflows for Hadoop MapReduce
- Scribe, a service for aggregating log data
- Tashi, an Apache incubator project for cloud computing for large datasets
- Disco, a MapReduce implementation in Erlang/Python
- Hypertable, a distributed data storage system, modeled on Google’s BigTable
- CloudStore, a distributed filesystem with Hadoop integration (formerly Kosmos filesystem)
By coincidence, I received two other Hadoop-related announcements today:
From Ajay Anand:
The next Bay Area Hadoop User Group meeting is Wed-Feb-18 at Yahoo! 2811 Mission College Blvd, Santa Clara, Building 2, Training Rooms 5 & 6 from 6:00-7:30 pm. Agenda:
- Fair Scheduler for Hadoop – Matei Zaharia
- Interfacing with MySQL – Aaron Kimball
And from Chris Wensel:
Our Hadoop 2 day Boot Camp, to be held on March 5th and 6th, is open for public registration. For a course overview: www.scaleunlimited.com/courses/hadoop-boot-camp. There are only 12 seats in the class. For pricing and registration see www.scaleunlimited.com/courses/hadoop_course_march_5th_and_6th