Introducing Agentless Cloud Monitoring

In my thesis work I’m developing a framework built on top of KVM and QEMU that adds the capability of cloud-wide agentless monitoring. If you’re interested in this line of thinking, read on for a high-level introduction, and please chime in with a comment!

Three properties of the Virtual Machine (VM) abstraction enable and distinguish modern cloud computing: strong isolation, virtualized hardware, and soft-state provisioning. Strong isolation separates a VM from its host and from other VMs executing on the same host. Because of strong isolation, separate entities may share the same host in a multi-tenant environment without knowledge of each other. Virtualized hardware frees a VM from its underlying hardware architecture and devices. Workloads, now untethered from their hosts, can be consolidated by migrating them as work intensity varies and by assigning resources only when needed. Soft-state provisioning reduces the time to deploy a running service. Requested resources can tightly match current workloads, and as the demands of a workload change over time, resources are elastically scaled.

The VM abstraction places users of clouds under the powerful and valuable illusion that they have complete control of real, hardware-based servers. This illusion means cloud servers are used in exactly the same manner as non-cloud, or classical, hardware-based servers. Experience and investments devoted to configuring and maintaining classical servers are not lost by transitioning into a cloud. The same management tools, the same monitoring tools, the same workloads, and the same routines apply in clouds. Thus, it comes as no surprise that the cloud model, with its familiarity and the additional benefit of elasticity, is now the de facto standard for centralized computing resources.

Of course, familiarity implies that the same problems, the same troubleshooting, and the same monitoring needs arise in the cloud model as with classical servers. In particular, monitoring accounts for a large portion of the required upkeep of individual systems and is a key problem facing administrators of any computing system. Examples of monitoring include routine virus scanning, intrusion detection, log file analysis, and configuration auditing. In the classical model, which for monitoring purposes is the same as the cloud model, state-of-the-art monitoring is accomplished with agents. Agents, often supplied by third parties, are processes executing inside each VM that perform a useful monitoring task. But agents have their costs.

Embedding a potentially untrusted, third-party agent within the boundary of an enterprise increases its attack surface: the agent may be hijacked by an intruder and used maliciously. In addition, agents consume valuable resources and are unpredictable in their resource consumption. An agent using excessive memory causes memory pressure and potentially paging to disk, slowing down or even halting critical services. Finally, if a system is compromised, the agent may be tampered with or deliberately fed false information. In the case of a compromise or misconfiguration that is initially undetected, agents cannot be trusted at all, and may hide malicious activity. Today, agents are a necessary evil.

If you are unsatisfied with the status quo, imagine an agentless world. Enterprises do not increase their attack surface by embedding the ticking time bomb of an extra attack vector into each managed system. Critical services execute with unfettered access to resources, no longer competing with non-critical agents. Compromised systems cannot report false information, because monitoring is no longer under their control. If feasible, efficient, and scalable, agentless monitoring would represent a new paradigm for constructing and deploying monitoring applications. Agentless, cloud-wide monitoring would disrupt a billion-dollar agent-based industry.
