Interview: Mesos Cluster manager Proposed as Apache Project
Mesos provides finer grain resource sharing, better support for data intensive applications, and is more flexible than existing cluster schedulers.
Ion Stoica is an Associate Professor in the EECS Department at University of California at Berkeley, where he does research on cloud computing and networked computer systems. Past work includes the Chord DHT, Dynamic Packet State (DPS), Internet Indirection Infrastructure (i3), declarative networks, replay-debugging, and multi-layer tracing in distributed systems. His current research includes resource management and scheduling for data centers, cluster computing frameworks for iterative and interactive applications, and network architectures. In 2006, he co-founded Conviva, a startup to commercialize technologies for large scale video distribution.
JAXenter speaks to Ion Stoica, on the Mesos project proposal, and why it hopes to become a part of the Apache ecosystem.
JAXenter: Mesos has just been proposed as an Apache Incubator project. Can you give us an introduction to the Mesos project?
Ion Stoica: Mesos is a cluster manager that allows multiple applications, such as MapReduce, HBASE and MPI, to share a cluster, the same way a traditional operating system allows applications to share a single machine. Sharing improves cluster utilization by allowing an application to take advantage of another application’s resources that otherwise would remain unutilized. For example, an analytics job can take advantage of the resources of a frontend application during the off-peek hours. Mesos also enables applications to share the same data set. The alternative of running an application per cluster could be prohibitive for very large data sets, since it would require the applications to either replicate or access the data remotely. Finally, Mesos allows administrators to perform rolling upgrades by slowly ramping up the new version of an application.
To support the sophisticated schedulers of today’s cluster computing applications, such as Hadoop or Dryad, Mesos uses a distributed two-level scheduling mechanism. At the first level, Mesos decides how many resources to offer each application; at the second level, applications themselves decide which resources to accept (of the ones offered by Mesos) and which computations to run on them.
JAXenter: What are the potential problems of having one instance control a whole cluster? And how does Mesos solve this issue?
Ion Stoica: There are three key challenges that Mesos needs to address: isolation, scale, and reliability. At the cluster level, Mesos provides isolation by controlling how many resources each application receives. At the node level, Mesos ensures that an application does not use more resources than allocated by leveraging existing isolation mechanisms such as VMs and Linux Containers. Messos is highly scalable as it only schedules resources across applications, while letting the applications themselves deal with the more complex job of deciding which computation to run, and where. Finally, to ensure reliability, Mesos uses Zookeeper to detect and pick a new replica to recover from failures.
JAXenter: What edge does Mesos have over existing cluster schedulers, for example Sun Grid Engine and Torque?
Ion Stoica: Mesos provides finer grain resource sharing, better support for data intensive applications, and is more flexible than existing cluster schedulers. In part, these properties are a consequence of targeting a different environment and workload. Mesos targets data intensive applications running on commodity clusters, where storage is distributed across machines. In contrast, existing cluster schedulers typically assume specialized hardware, such as Infiniband, and SANs, and target computation intensive applications.
Mesos allocates resources at a task granularity where a task (e.g., map or reduce) can take less than a minute to complete. This fine grain sharing allows applications to achieve good data locality by taking turns on the nodes where their input data is located. In contrast, traditional cluster schedulers share resources at the job granularity, where a job may take several hours to complete, and assume that data can be efficiently accessed remotely over a high speed network.
Mesos is flexible in that it allows applications to make their own scheduling decisions. Systems, such as Sun Grid Engine and Torque, ask applications to specify their requirements and employ a centralized scheduler to make all scheduling decisions.
JAXenter: How will becoming an Apache Incubator project, benefit Mesos?
Ion Stoica: We hope that by becoming an Apache incubator, Mesos will attract both users and developers. We have had good experience as a research team working with the open source community (to add several scheduling features to Hadoop), and thus we wanted to make Mesos part of the community from the beginning. We hope this will enable developers from organizations already using Mesos to become active contributors. Ultimately, we believe that Apache’s model of encouraging a community with varied goals and interests around a software project is critical to Mesos’ adoption.