Introducing Agentless Cloud Monitoring

In my thesis work I’m developing a framework built on top of KVM and QEMU that adds cloud-wide agentless monitoring. If you’re interested in this line of thinking, read on for a high-level introduction, and please chime in with comments!

Three properties of the Virtual Machine (VM) abstraction enable and distinguish modern cloud computing: strong isolation, virtualized hardware, and soft-state provisioning. Strong isolation separates a VM from its host, and from other VMs executing on the same host. Because of strong isolation, separate entities may share the same host without knowledge of each other in a multi-tenant environment. Virtualized hardware frees a VM from its underlying hardware architecture and devices. This freedom lets operators consolidate workloads, now untethered from their hosts, by migrating them as work intensity varies and assigning resources only when needed. Soft-state provisioning reduces the time to deploy a running service. Requested resources can tightly match the current workload, and as the demands of the workload change over time, resources are elastically scaled. Continue reading

The Curious Case of a Sick Google Glass

During recent experiments for a research paper, my research group observed very strange symptoms from our Google Glass. Most of our experiments studied the impact of latency on cognitive assistance applications, such as programs designed to remind you who is in front of you, or to notify you that it is safe to cross the street. We observed a large variation in latency that could not be explained by the usual culprits, such as poorly performing WiFi networks. We had ruled out every possible source outside of the Google Glass, but the unknown source of latency jitter was still ruining our experimental results. At this point, we knew we had to figure out what was going on inside the Google Glass itself.

Continue reading

More with Less: Deduplication

Deduplication is a critical technology for modern production and research systems. In many domains, such as cloud computing, it is often taken for granted [0]. Deduplication multiplies the effective amount of data you can store in memory [1], on disk [2], and transmit across the network [3]. It comes at the cost of more CPU cycles, and potentially more IO operations at the origin and destination storage backends. Microsoft [4], IBM [5], EMC [6], Riverbed [7], Oracle [8], NetApp [9], and other companies tout deduplication as a major feature and differentiator across the computing industry. So what exactly is deduplication?
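The full post digs into the details, but the core mechanism can be sketched in a few lines: split data into chunks, fingerprint each chunk with a cryptographic hash, and store each unique chunk only once. This is a minimal sketch assuming fixed-size chunking and SHA-256 fingerprints; real systems often use content-defined chunking and more compact indexes:

```python
import hashlib

def dedup_store(data: bytes, chunk_size: int = 4096):
    """Split data into chunks; keep one copy of each unique chunk."""
    store = {}   # content hash -> chunk bytes (unique chunks only)
    recipe = []  # ordered list of hashes needed to rebuild the data
    for i in range(0, len(data), chunk_size):
        chunk = data[i:i + chunk_size]
        digest = hashlib.sha256(chunk).hexdigest()
        store.setdefault(digest, chunk)  # duplicates cost only a hash entry
        recipe.append(digest)
    return store, recipe

def rebuild(store, recipe) -> bytes:
    """Reassemble the original data from the recipe of hashes."""
    return b"".join(store[h] for h in recipe)

# 16 KB of data containing lots of repetition: 4 chunks, only 2 unique
data = b"A" * 8192 + b"B" * 4096 + b"A" * 4096
store, recipe = dedup_store(data)
assert len(recipe) == 4 and len(store) == 2
assert rebuild(store, recipe) == data
```

The same trick works on the wire: if the receiver already holds a chunk with a given hash, the sender transmits just the fingerprint instead of the bytes, trading CPU for bandwidth exactly as described above.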

Continue reading

Eyes Clouded by Distributed Systems

You are probably reading this article with a dual- or quad-core processor, and perhaps with even more cores. Your computer is already a distributed system, with multiple computing components—cores—communicating with each other via main memory and other channels such as physical buses—or wires—between them. As you browse multiple web pages you are interacting with the largest distributed system ever created—the Internet. We recently celebrated IPv6 Day [0]: IPv6 is a new addressing scheme for devices connected to the Internet, needed because the Internet’s sheer scale has outgrown the previous standard IPv4’s supply of addresses—all 4+ billion of them. Every Internet company depends on distributed systems, and, by extension, the economies of the world are now tied to them.
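The 4+ billion figure follows directly from IPv4’s 32-bit address width; IPv6 widens addresses to 128 bits, which is why it solves the exhaustion problem so decisively:

```python
# IPv4 addresses are 32 bits wide, IPv6 addresses are 128 bits wide
ipv4_addresses = 2 ** 32
ipv6_addresses = 2 ** 128

assert ipv4_addresses == 4_294_967_296        # the "4+ billion" total
assert ipv6_addresses == ipv4_addresses ** 4  # 96 more bits of address space
```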

Companies such as Google, Facebook, and Amazon are all interested in building highly efficient large-scale distributed systems to power their businesses. Over the past decade, Google has described its Google File System (GFS) [1]—a file system spanning thousands of computers to store more data than any single machine could—and MapReduce [2], a technology that has shaped almost every form of large-scale computing since its publication. MapReduce is distributed computing for the masses because it distills everything down to two functions—Map and Reduce—and once they are specified it handles all other aspects of coordinating thousands of computers on behalf of the programmer. Facebook has released open source projects such as Thrift [3] for implementing communication between programs written in different programming languages. Amazon built the first, and largest, public cloud, EC2 [4], by inventing new distributed systems designed to bring datacenter scale to the masses—with EC2 you can easily start 100 servers within minutes. Amazon has offered many other services to enhance its overall cloud, such as a storage substrate called S3 [5]—think of it as a building block for a GFS—and CloudFront [6], a content distribution network (CDN) designed to distribute data around the world for low latency and high bandwidth access. Akamai [7] also helps deliver the web’s content with one of the largest CDNs in the world. Netflix has its own distributed CDN [8], having outgrown the solutions provided by Akamai and Amazon.
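To make the Map and Reduce division concrete, here is a minimal single-machine sketch of the programming model, using the classic word-count example. The function names and the in-memory shuffle are illustrative only—Google’s actual implementation distributes these phases across thousands of machines:

```python
from collections import defaultdict
from itertools import chain

def map_fn(doc: str):
    """Map: turn one input document into intermediate (key, value) pairs."""
    return [(word, 1) for word in doc.split()]

def reduce_fn(word: str, counts: list):
    """Reduce: aggregate all values observed for one key."""
    return sum(counts)

def mapreduce(docs):
    # Shuffle phase: group every intermediate pair by its key,
    # so each key's values land at a single reducer.
    groups = defaultdict(list)
    for key, value in chain.from_iterable(map_fn(d) for d in docs):
        groups[key].append(value)
    # Reduce phase: one call per distinct key.
    return {k: reduce_fn(k, v) for k, v in groups.items()}

result = mapreduce(["the cat", "the dog the cat"])
assert result == {"the": 3, "cat": 2, "dog": 1}
```

The programmer writes only `map_fn` and `reduce_fn`; everything else—partitioning input, shuffling intermediate data, and retrying failed work—is the framework’s job, which is exactly what makes the model accessible.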

Continue reading