Saturday, December 25, 2010

“Strata Gems: Whirr makes Hadoop and Cassandra a snap - Get control over cloud resources”


Posted: 25 Dec 2010 01:16 AM PST

We're publishing a new Strata Gem each day all the way through to December 24. Yesterday's Gem: DIY personal sensing and automation.

The cloud makes clusters easier to come by, but for rapid prototyping, bringing one up still involves quite a bit of effort. It's getting easier by the day, though, as a variety of tools emerge to simplify commissioning and managing cloud resources.

Whirr is one such tool: a simple utility and a Java API for running cloud services. It presents a uniform interface to cloud providers, so you don't need to learn each provider's API or negotiate its peculiarities. Whirr also abstracts away the repetitive steps of setting up services such as Hadoop or Cassandra.

Whirr's command-line tool can be used to bring up clusters in the cloud. Bringing up a Hadoop cluster is as easy as this one-liner:

  whirr launch-cluster \
      --service-name=hadoop \
      --cluster-name=myhadoopcluster \
      --instance-templates='1 jt+nn,1 dn+tt' \
      --provider=ec2 \
      --identity=$AWS_ACCESS_KEY_ID \
      --credential=$AWS_SECRET_ACCESS_KEY \
      --private-key-file=~/.ssh/id_rsa
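If you launch clusters from a script rather than by hand, the same command can be assembled from parameters. The following is a minimal sketch, assuming the `whirr` executable is on your PATH and the AWS keys are in the same environment variables the one-liner uses; it passes only the flags shown above. The helper names `whirr_launch_args` and `launch_cluster` are hypothetical.

```python
import os
import subprocess


def whirr_launch_args(cluster_name, instance_templates, identity, credential,
                      service_name="hadoop", provider="ec2",
                      private_key_file="~/.ssh/id_rsa"):
    """Build the argv for `whirr launch-cluster`, using only the flags shown above."""
    return [
        "whirr", "launch-cluster",
        f"--service-name={service_name}",
        f"--cluster-name={cluster_name}",
        # A single list element, so the space in "1 jt+nn,1 dn+tt" needs no quoting.
        f"--instance-templates={instance_templates}",
        f"--provider={provider}",
        f"--identity={identity}",
        f"--credential={credential}",
        # Expand "~" ourselves: no shell is involved when a list goes to subprocess.
        f"--private-key-file={os.path.expanduser(private_key_file)}",
    ]


def launch_cluster(cluster_name, instance_templates):
    """Run the launch command, reading the AWS keys from the environment."""
    args = whirr_launch_args(
        cluster_name=cluster_name,
        instance_templates=instance_templates,
        identity=os.environ["AWS_ACCESS_KEY_ID"],
        credential=os.environ["AWS_SECRET_ACCESS_KEY"],
    )
    subprocess.run(args, check=True)  # raises CalledProcessError if whirr fails
```

Calling `launch_cluster("myhadoopcluster", "1 jt+nn,1 dn+tt")` would run the equivalent of the one-liner. Building an argument list instead of a shell string sidesteps quoting problems with the instance-templates value.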

When the cluster has launched, a script (~/.whirr/myhadoopcluster/hadoop-proxy.sh) is created, which will set up a secure tunnel to the remote cluster, letting the user execute regular Hadoop commands from their own machine.
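A small helper can locate and start that proxy script. This is a hedged sketch: it relies only on the path quoted above and assumes the script runs in the foreground for as long as the tunnel is open (as a plain SSH tunnel would), so it is started in the background here. The function names are hypothetical, and running the script with `bash` is an assumption.

```python
import subprocess
from pathlib import Path


def proxy_script(cluster_name, home=None):
    """Return the path of the proxy script Whirr writes for a launched Hadoop cluster."""
    base = Path(home) if home is not None else Path.home()
    return base / ".whirr" / cluster_name / "hadoop-proxy.sh"


def start_proxy(cluster_name, home=None):
    """Start the proxy script in the background and return its process handle.

    Assumes the script keeps running while the tunnel is open; call
    terminate() on the returned handle to close the tunnel.
    """
    script = proxy_script(cluster_name, home)
    if not script.is_file():
        raise FileNotFoundError(f"no proxy script at {script}; has the cluster launched?")
    return subprocess.Popen(["bash", str(script)])
```

With the tunnel up, regular Hadoop commands run on your own machine can reach the remote cluster; stopping the returned process tears the tunnel down again.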

Whirr's service-name and instance-templates parameters are the key to running different services. Instance templates are a concise notation for specifying the contents of a cluster, and are defined on a per-service basis. The Hadoop example above, 1 jt+nn,1 dn+tt, specifies one node with the roles of jobtracker (jt) and namenode (nn), and one node with the roles of datanode (dn) and tasktracker (tt).
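To make the notation concrete, here is a minimal sketch of how a string like 1 jt+nn,1 dn+tt breaks down: comma-separated groups, each a node count followed by "+"-joined role abbreviations. This is not Whirr's own parser; `parse_instance_templates` is a hypothetical helper, and its abbreviation table covers only the four Hadoop roles named above. By the same notation, a template such as 1 jt+nn,5 dn+tt would presumably ask for five worker nodes.

```python
# Abbreviations taken from the Hadoop example above; other services define their own.
ROLE_NAMES = {
    "jt": "jobtracker",
    "nn": "namenode",
    "dn": "datanode",
    "tt": "tasktracker",
}


def parse_instance_templates(spec):
    """Parse e.g. '1 jt+nn,1 dn+tt' into [(count, [role, ...]), ...]."""
    templates = []
    for group in spec.split(","):
        # Each group is "<count> <role>+<role>..."; split off the count first.
        count_text, roles_text = group.strip().split(maxsplit=1)
        count = int(count_text)
        if count < 1:
            raise ValueError(f"instance count must be positive: {group!r}")
        # Unknown abbreviations are passed through unchanged.
        roles = [ROLE_NAMES.get(r.strip(), r.strip()) for r in roles_text.split("+")]
        templates.append((count, roles))
    return templates
```

Summing the counts gives the number of machines the cluster needs, which is two for the example above.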

Services currently supported by Whirr include:

  • Hadoop (both Apache and Cloudera's Distribution for Hadoop)
  • Cassandra
  • ZooKeeper

Adding a new service involves providing initialization scripts and implementing a small amount of Java code. Whirr is open source, currently hosted as an Apache Incubator project, and its development is being led by Cloudera engineers.
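The general shape of such a plug-in, where a service names its roles and each role maps to an initialization script, can be modeled in a few lines. This is an illustrative Python sketch only, not Whirr's Java API: `ServiceDefinition`, `register`, `init_plan`, the role name `cassandra`, and the script name `install_cassandra.sh` are all hypothetical.

```python
from dataclasses import dataclass, field


@dataclass
class ServiceDefinition:
    """Hypothetical stand-in for a service: a name plus one init script per role."""
    name: str
    role_scripts: dict = field(default_factory=dict)  # role -> init script name


REGISTRY = {}


def register(service):
    """Make a service available by name, refusing duplicates."""
    if service.name in REGISTRY:
        raise ValueError(f"service {service.name!r} already registered")
    REGISTRY[service.name] = service


def init_plan(service_name, templates):
    """For each (count, roles) group, list the init scripts its instances would run."""
    service = REGISTRY[service_name]
    plan = []
    for count, roles in templates:
        missing = [r for r in roles if r not in service.role_scripts]
        if missing:
            raise ValueError(f"{service_name} has no init script for roles {missing}")
        plan.append((count, [service.role_scripts[r] for r in roles]))
    return plan


# A hypothetical single-role service: every node runs the same script.
register(ServiceDefinition("cassandra", {"cassandra": "install_cassandra.sh"}))
```

In this model, supporting a new service means writing its scripts and registering one definition, which mirrors the "scripts plus a little code" split described above.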

  • For in-person instruction on getting started with Hadoop or Cassandra, check out the Strata 2011 Tutorials.
