As the industry settles into a market where virtualization has major offerings from Microsoft, VMware, and the Open Source communities, administrators and security engineers are beginning to grapple with the implications of virtualization for infrastructure security. Fortunately, virtualization is really nothing new. While the utility (and cost savings!) of infrastructure consolidation is recognized more widely every day, infrastructures built on virtualization and virtual partitioning platforms extend back to the days of the mainframe, and hypervisor-based platforms have been adopted throughout the late 90s and this decade. Security professionals are not without lessons learned to apply in building the next generation of Hyper-V driven infrastructures; securing your enterprise starts with recognizing how to adapt those lessons to today's intensive needs.
In order to understand the security model in a Hyper-V virtualized environment, the administrator or security professional needs to examine the most basic mechanics of virtualization and the relationship that the Guest Virtual Machines (VMs, also called Hosted Instances or Logical Partitions -- LPARs -- in some environments), the Host operating system, and the hypervisor kernel share in Microsoft's Hyper-V architecture.
In Windows Server 2008, the Hyper-V virtualization layer is actually running "below" the level of the Host Operating System and directly handles the low level functions which make a computer operate -- as well as mediating the access requests between the host operating system components and the Guest VMs.
Those who have worked with Virtual Server 2005 will recognize this as a major reconstruction of how the virtualization layer operates. In Virtual Server 2005, the virtualization layer ran "on top" of the operating system, as a series of Windows system services. This resulted in sometimes harsh performance losses: any input or output request from a guest VM originated in the application on the guest needing the access, had to traverse the operating system "stack" on the guest, and was then relayed by the VM to the Virtual Server software service(s) through the appropriate emulated hardware. The virtual service in question would then make the necessary requests of the underlying host operating system. The host operating system running Virtual Server would then need to complete its entire stack of software functions to actually make the necessary request of the hardware the server was running on, before passing the resultant data all the way back up through all of these layers.
In Hyper-V the virtualized guest's resource request never has to go through the Host operating system! This is a very important point in Hyper-V: it not only results in vastly improved efficiency for I/O-intensive operations on guest VMs but is also critical to understand when protecting the consolidated environment. The Hypervisor microkernel, not the Host Operating System, is the only component between the guest VMs and the hardware. Along with this efficiency gain (and its attendant increase in the attractiveness of Hyper-V for enterprise consolidation implementations) comes the consideration that a security engineer or network administrator needs to be conscious of all four components: the hardware, the hypervisor, the host operating system, and the guest VMs.
When a security engineer examines a Hyper-V virtualized environment, two primary divisions of strategy spring to mind for the consolidated infrastructure. The first is that the same security principles that apply anywhere else in your environment need to apply to both host and guests in Hyper-V hosting. The second is that virtualization has special strategy demands, which require closer attention to detail in some areas to account for the effect that a unified server has on the multiple services run in independent VMs on the same underlying hardware.
The Basics Still Apply
Maintain Physical Security
As with any information technology infrastructure, once an attacker has physical access to the hardware that a server is running on, the attacker can easily compromise nearly any level of the machine's operation through offline threats: key loggers, local network monitors, forensic analysis of the hard drives, or booting into a "Live CD" with a built-for-the-purpose set of tools. In consolidated environments, it is true that the guests' data is often not stored on Direct Attached Storage. Keep in mind, however, that if offline tools allow an attacker to obtain administrative credentials on the host and then return the server to service, the attacker can use that host-level access to enable further attacks on the guests, including making shadow copies of the guest VMs' storage, which can themselves be subjected to further offline attacks.
To protect physical security of the server, in an enterprise environment, consider investing in remote management hardware such as physical remote management cards which allow server-level access to KVM and drive functions, a consolidated KVM-over-IP solution, and implementing separation of duties. These days, remote management tools are sophisticated enough to ensure that a remote administrator can start a session with a remote management unit and have comparable access to the base keyboard, mouse, and console display (and sometimes even the CD/DVD drive) that the administrator would have accessing the box in person. Note the word 'and' in 'and implementing separation of duties'. Giving your administrators the ability to remotely administer the box is effective only if you also remove their physical access to the data center. In the enterprise, data centers often have specifically employed individuals who can address the operational needs of cabling, racking hardware, installing new hardware, and possibly hitting the power button for a power cycle in the case of some catastrophic failure.
Separating the duties of those with logical access to the server (and administrative ownership over the function of the operating system and applications running on the server) from those with physical access to the server for operational requests allows you to reduce risk of physical access-based attacks through minimizing the number of people with that access. The data center administrators have physical access to the server, but not the authorization information to change the operating system, and the network administrator or engineer has necessary authorization to operate the software running on the server without enabling another entire class of users to have direct access to the data center.
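The separation described above can be checked mechanically against whatever access rosters the organization keeps. The following is a minimal Python sketch, assuming two hypothetical rosters (the names and the set-based format are illustrative, not taken from any Microsoft tool); it simply reports anyone who appears on both lists.

```python
# Hypothetical access rosters; names and roles are illustrative only.
physical_access = {"dc_tech_1", "dc_tech_2"}            # badge access to the data center
logical_access = {"admin_a", "admin_b", "dc_tech_2"}    # administrative rights on the host

def separation_violations(physical, logical):
    """Return the people who hold both physical and logical access."""
    return sorted(physical & logical)

violations = separation_violations(physical_access, logical_access)
print(violations)
```

Any name this check returns is a candidate for having one of the two kinds of access removed.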
Less Software Means Less Attack Surface
Given that the host operating system in a Hyper-V server acts as little more than a platform for administration tools, the administrator has a new ability to trim down the software packages that run on the server itself. Each installed role or package which is surplus to the Hyper-V server provides another set of code which could either potentially be used directly by an attacker against the host itself (particularly in the case of superfluous pre-staged administration tools), or could contain vulnerabilities which, in the case of running services, could allow an attack vector in the first place.
An administrator who needs to make changes on the Hyper-V server can reach the server remotely through Server Manager for the most common administration tasks. For less common tasks, once logged into the server, the administrator can map a drive to a shared folder of utilities, or copy an installer or zip file over from a share, use it for the task at hand, and then remove it.
In an enterprise environment where the Hyper-V server is expected to host instances of production services, the host Windows Server 2008 installation should be made with Server Core! Server Core provides a somewhat hardened environment for the virtualization services at the expense of removing many of the most common administration tools. Guides are available across the web, including my own, on how to install Hyper-V on a Windows Server 2008 Server Core machine. Additionally, Microsoft has made ample documentation available on how to use the command line and PowerShell to provide advanced from-the-console administrative actions.
Updating the System is Key
Implementing an aggressive firewall on the server and restricting direct access to the machine will be of little avail if the firewall code, or any of the services which are exposed (such as the terminal services instance in most installations), is found to have a vulnerability in the software which remains unpatched on your server. Updating your software on the host as well as each of the guests should be part of the planned architecture of the environment. In some cases, this has policy implications as well. How will your change management work to allow you to reboot the host after applying updates, which also takes down the guest instances? Do you have a universally applicable change window? Do you need to plan a change ticket?
Virtualization does not affect the tenet that your software should be updated! Virtualization simply complicates the planning for host downtime but the update process must still be considered and planned for.
Use Domain Based Credentials to Minimize Threat from Turnover
Managing access to the environment requires attention to your authorization and identity strategy. Credentials which are based on the standalone server provide an opportunity for mistakes in de-provisioning, creating a series of active accounts with privileges possibly as high as local administrator on your host Hyper-V servers and your guest VMs. Just as in any other information technology environment, the virtualized server environment can be easily compromised by these obsolete user accounts.
Engineers should minimize the number of standalone credentials created on the server itself, since each one is a potential source of problems. Reduce host services which require custom local accounts to be established. Create a strong password for the local administrator account. If your organization uses a common password across the environment for administrative access, change the password periodically. Rather than creating local accounts, join the server to the domain early in the setup process and grant the necessary permissions to a domain-based group or account.
Apply the Concept of Least Privilege
Accounts which are granted privilege to the environment, regardless of the origin of the account (local or domain-based), should only have the privileges on the local system which are required for their needs. During installation, the account which will enable the Hyper-V role needs to have administrator privileges, either as the local administrator account or as a domain administrator. Once the actual installation is done, and sub-administrators and end-users are working with the guest VMs, administrator privileges are no longer required for most operations IF you have invested the time to set up the local permissions.
There is a great blog post specifically on the subject on how to delegate permissions to work with VMs without host privileges, by Lukas Beeler:
http://projectdream.org/wordpress/2008/07/03/delegating-hyper-v-virtual-machines/
Some basic tenets apply. To the extent possible, do not grant permissions on the host data storage except to the directories where the individual is supposed to be able to save ISOs and other files. Use a file share if possible to grant access to the storage location for ISOs and other locally hosted materials that the VM should have direct access to. Avoid giving any users the ability to log on directly to the host via RDP if they do not absolutely need that ability (and force users to justify that request by providing the exact delta of what they need to do over and above what is provided by a file share and use of Server Manager). Using the approach in the blog post above, grant end-users permissions only to the VMs to which they should have rights and control.
Virtualization has Special Strategy Demands
Documentation of your Environment is Critical to Success
In a virtualized environment, the relationship of logical servers to physical hardware is no longer 1:1. Many infrastructure documentation systems expect each server to be associated with a physical hardware platform and hence make no provision for a logical server to be associated with a host server. In order to properly maintain your environment, documentation needs to be readily available on which guest VMs are hosted on which physical machines, as well as (critically!) the person or contact for the "owner" of each guest VM! In my experience with virtualized environments, a number of different businesses or administrators will "own" individual virtual machines hosted on the server. In order to prepare for downtime on a given host, the organization must be able to properly notify affected teams of needed downtime or changes to the underlying virtual host.
If the organization does not maintain an enterprise-level documentation system for the infrastructure environment, individual administrators can counter this vulnerability by using locally hosted documentation. Create a directory on the server called "Documentation" in a standardized location and use locally hosted text files to store information about the individual VM names, basic configuration information, and the contact information of the owning administrator or business unit.
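As one way to keep such locally hosted text files consistent, the sketch below writes one plain-text record per guest VM into a "Documentation" directory and reads it back. The field names (Host, VM, Owner, Contact, Notes), the one-file-per-VM layout, and the function names are assumptions made for illustration; any standardized format would serve.

```python
from pathlib import Path

def write_vm_record(doc_dir, host, vm_name, owner, contact, notes=""):
    """Write one plain-text record per guest VM and return its path."""
    doc_dir = Path(doc_dir)
    doc_dir.mkdir(parents=True, exist_ok=True)
    record = doc_dir / f"{vm_name}.txt"
    lines = [
        f"Host: {host}",
        f"VM: {vm_name}",
        f"Owner: {owner}",
        f"Contact: {contact}",
        f"Notes: {notes}",
    ]
    record.write_text("\n".join(lines) + "\n", encoding="utf-8")
    return record

def read_vm_record(path):
    """Parse a record back into a dict of field name -> value."""
    fields = {}
    for line in Path(path).read_text(encoding="utf-8").splitlines():
        key, _, value = line.partition(": ")
        fields[key] = value
    return fields
```

Keeping the records as simple "Field: value" lines means they stay readable in Notepad on the host while still being easy to parse when building a notification list.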
Planning for Downtime affects More than the Host
The relationship of the host to its guest VMs means at its base that as goes the host, so go the guests. This means downtime has direct effects. The change management, business continuity, and disaster recovery strategies of the organization may need to be modified to take into account the cascading effect that downtime on the Hyper-V host implies. As noted previously, the documentation that your organization maintains should provide the basis for a procedure of approvals or notifications when a host should be taken down or is experiencing a downtime event. Each of the guest VM owners needs to be involved in the downtime discussions as well as the necessary groups which administer the service of the Hyper-V host itself.
In many enterprise environments, this means ensuring that there is some kind of tagging applicable to the Hyper-V hosts themselves to indicate additional teams that are involved in change requests as well as operational functions related to downtime itself. Formalizing this approach in a policy could mean that in the case of an unexpected downtime event, a representative from each operational group affected by the host downtime would be present on a call or system to test and validate the application's return to service as well as to represent the importance of the downtime itself and the criticality of resolving that component in a case where more extended prioritization is required.
In smaller environments, this could take the form of simply expanding the existing process to include additional business owners. Adding a rep from each "owning" team to an email thread and asking for a one-line approval statement from each could be one of the simplest ways to implement the same form of interaction.
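The notification and approval step lends itself to a small helper. The following Python sketch assumes a hypothetical inventory of (host, guest VM, owner contact) tuples; the host names and addresses are made up. It derives who must be contacted before a given host goes down and who has not yet approved.

```python
# Hypothetical inventory: (host, guest VM, owning team contact).
INVENTORY = [
    ("HV01", "web01", "web-team@example.com"),
    ("HV01", "sql01", "dba-team@example.com"),
    ("HV02", "web02", "web-team@example.com"),
]

def owners_to_notify(inventory, host):
    """Return the de-duplicated, sorted contacts for guests on one host."""
    return sorted({contact for h, _vm, contact in inventory if h == host})

def pending_approvals(required, approved):
    """Return the contacts who have not yet approved the change."""
    return sorted(set(required) - set(approved))

required = owners_to_notify(INVENTORY, "HV01")
print(pending_approvals(required, ["web-team@example.com"]))
```

The change proceeds only once `pending_approvals` returns an empty list.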
The Hypervisor has Access to Near Everything
The micro-kernel which provides the hypervisor services between the host and guests and the underlying hardware interprets all of the resource requests for the system. As a result, it is the linchpin which has direct, low-level access to every bit and byte of data that will be returned to either the host operating system or the guest VMs. Once an organization introduces Hyper-V into its software stack, the primary administrators or architects in the environment need to keep themselves regularly informed about updates to the Hyper-V platform and understand how the attendant vulnerabilities work.
Once a vulnerability that can target the Hypervisor is disclosed, the internal administrator will want to be able to escalate the resulting repair work as soon as a patch is released, so that the patch can be applied to the environment as quickly as possible.
The administrator also needs to be in a position to protect the hypervisor to the extent possible. Because there is little an administrator can do to proactively harden the hypervisor layer itself, mitigation depends on operational awareness: if a high-severity issue that may affect the hypervisor is discovered in the industry, workarounds should be implemented in the environment immediately. Establish a process whereby one or more engineers are responsible for staying absolutely up to date on the security of the Hyper-V platform, its patches, and security discussions in major communities. Ensure that there is a defined and documented process that the selected engineer(s) or administrator(s) can use to immediately raise the visibility of the issue to the appropriate project, management, or change teams that can authorize necessary fixes or begin a "fast-tracked" engineering process to determine an acceptable workaround in the environment that takes into account the risk of the potential threat.
Data Classification Assigned to the Hypervisor/Host Should be No Less than the Most Sensitive Guest
Given that the hypervisor micro-kernel has access to all data passing back and forth in streams from the host OS and guest VMs to the underlying hardware platforms, an implied corollary would then dictate that the data classification which is applied to the Hypervisor and Host OS should be no less sensitive in classification than that of the guest VM storing or using the most sensitive data. If your hypervisor is compromised, even though the data being stored in storage partitions of the host OS may be seemingly trivial, it is critical to realize that the host OS has access to everything! Not just its own loaded utilities and tools! In cases of VMs which are using iSCSI LUNs for storage, you also have the consideration that the LUN itself is directly loaded on the host OS as a drive in order to make it available as a passed-through drive to the guest VM.
This approach to using iSCSI LUNs means that the host OS and the hypervisor have the ability to obtain direct and complete access to all of the data stored on that LUN.
In enterprise environments subject to regulatory requirements, your system of documentation should record the host OS of your Hyper-V server at this heightened level of sensitivity so that your organization can respond appropriately in the case of a compromise. Recording the heightened sensitivity also supports security settings audits and helps ensure that policy or application requirements for high-sensitivity infrastructure are properly applied to your Hyper-V host machines.
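The rule that the host is classified no lower than its most sensitive guest reduces to taking a maximum. The sketch below assumes a hypothetical four-level classification scheme; substitute whatever levels your organization actually uses.

```python
from enum import IntEnum

class Sensitivity(IntEnum):
    # Hypothetical classification levels, ordered from least to most sensitive.
    PUBLIC = 0
    INTERNAL = 1
    CONFIDENTIAL = 2
    RESTRICTED = 3

def host_classification(guest_levels, host_own_level=Sensitivity.INTERNAL):
    """Classify the host at least as high as its most sensitive guest."""
    return max([host_own_level, *guest_levels])

guests = [Sensitivity.PUBLIC, Sensitivity.RESTRICTED, Sensitivity.INTERNAL]
print(host_classification(guests).name)
```

The `host_own_level` default covers the case of a host with no guests yet, so the host never drops below its own baseline.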
Virtualization Enables More than Consolidation Savings!
Implementing Hyper-V to consolidate your environment drives savings in the enterprise that go beyond reducing the server count. A security engineer examining the virtualization environment to be implemented needs to be aware of the disaster recovery and business continuity benefits available in Hyper-V. Designing the infrastructure to take advantage of System Center Virtual Machine Manager 2008, as well as the backup possibilities available in the virtualized environment, is best done during the initial design rather than by attempting to return and retrofit these technology approaches into an environment that has already been established.
Hyper-V installations accompanied by System Center Virtual Machine Manager 2008 bring to the table the ability to apply high availability to the environment, including virtualized clustering and V2V migration, as well as a suite of features to monitor performance and localized resource use. There are several key applications for VMM in the environment. The first and most basic is using the performance monitoring capabilities to determine a baseline for the expected range of resource use in the environment. By establishing this baseline and then configuring alerts based on significant performance deviations, the administrator or engineer could detect when a Denial of Service, either purposeful or incidental to expected traffic, is in place that is preventing the function of one or more guest VMs in the environment.
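The baseline-and-deviation idea can be illustrated independently of any monitoring product. The following Python sketch does not use the SCVMM interface; it assumes utilization samples have already been exported as plain numbers, and the three-standard-deviation threshold is an arbitrary starting point rather than a recommended value.

```python
from statistics import mean, stdev

def build_baseline(samples):
    """Summarize historical utilization samples as (mean, standard deviation)."""
    return mean(samples), stdev(samples)

def is_anomalous(value, baseline, threshold=3.0):
    """Flag a reading more than `threshold` standard deviations from the mean."""
    avg, sd = baseline
    if sd == 0:
        # A perfectly flat history: any different value is a deviation.
        return value != avg
    return abs(value - avg) / sd > threshold

# Hypothetical CPU utilization percentages gathered during normal operation.
history = [40, 42, 38, 41, 39]
baseline = build_baseline(history)
print(is_anomalous(95, baseline), is_anomalous(41, baseline))
```

In practice the threshold would be tuned per guest so that expected peaks (backups, month-end batch runs) do not raise alerts.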
The SCVMM product provides more than simply the ability to monitor performance and utilization in the environment. Virtual Machine Manager also gives the security administrator the ability to cluster virtual hosts so that in the case of a downtime event, virtual machines can be manually failed over and brought up on another similarly configured host. This approach provides something of a warm standby at all times in environments where there is a unified storage model such as that provided by a shared SAN. The V2V and P2V migration in the current generation SCVMM product, combined with the stated intention for the next version of SCVMM to provide live migration capability, gives security professionals and architects a strong roadmap for building a consolidation environment with immediate and future improvements in reliability.
Snapshot as Practical Storage Considerations Allow
Hyper-V introduces the capability to take live snapshots of the state of the virtual machine at a point in time. Similar to the mechanics of the snapshot process for a NetApp filer in a storage environment, the snapshot concept in Hyper-V extends the undo disks of Virtual PC and Virtual Server 2005 to provide the ability to capture multiple states and return to any of them according to the needs of the owning organization. The engineer or architect for the consolidation environment should plan on the ability to host snapshots at least to be able to return to the state of the guest VM at the time of initial complete application configuration. In development environments, snapshots provide the ability to return to a pre-change state relatively quickly when a developer is applying an unverified hotfix or software patch to the applications hosted on the guest VM.
When you start taking these kinds of snapshots, however, it is important to recognize that snapshots consume storage, and space must be allocated for them in order to use the snapshot capability. Because a snapshot records only the differences from the virtual hard disk, it can be significantly smaller than the underlying VHD. The drawback is that snapshot growth depends on the amount of change between snapshots and the underlying VHD, which makes the exact amount of space to set aside difficult to estimate in advance.
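Although exact growth is hard to predict, a rough planning figure can be derived from an assumed rate of change. The sketch below is a back-of-the-envelope model, not a description of how Hyper-V sizes its differencing disks: the daily change rate, snapshot interval, and 25% headroom are all assumptions, and it ignores any saved memory state and metadata.

```python
def estimate_snapshot_space_gb(vhd_size_gb, daily_change_gb,
                               days_between_snapshots, snapshot_count,
                               headroom=0.25):
    """Estimate space to reserve for a chain of differencing snapshots.

    Each snapshot is assumed to grow by the data changed before the next
    snapshot is taken, and can never exceed the size of the VHD itself.
    """
    per_snapshot = min(daily_change_gb * days_between_snapshots, vhd_size_gb)
    return per_snapshot * snapshot_count * (1 + headroom)

# Hypothetical guest: 100 GB VHD, about 2 GB of changed data per day,
# weekly snapshots, four snapshots retained.
print(estimate_snapshot_space_gb(100, 2, 7, 4))
```

Measuring the real change rate on a few representative guests and feeding it back into the estimate is more reliable than guessing the inputs.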
Applying SOA with Distributed Guests Can Provide More Security in a Virtual Environment
In Service Oriented Architecture, the components of the overall solution being developed can, and often do, exist on independent physical servers. When consolidating the data center using a virtualization platform, these sometimes not-fully-utilized server instances are often an excellent target for virtualization. Larger environments that involve a number of hosts provide the theoretical capacity to design additional security into the environment by partitioning the physical layer on which each layer of the service oriented architecture would be hosted. In theory, if a single host were to host 3 tiers of a multi-tiered service-based application, if the underlying physical server were to be compromised (or the hypervisor residing thereon), all three guests are highly open to compromise and essentially would be assumed to potentially be compromised in short order.
In a larger Hyper-V based architecture, the engineer or architect can plan to distribute the tiers that comprise the entire application across multiple independent servers. This extends the modular approach to protect against simultaneous compromise by, at minimum, partitioning the individual tiers onto different host platforms. While the organization needs to be careful not to fall back into the anti-practice of attempting to establish security by obscurity, there is the potential for very real value from the segregation in this solution. If one host, for example, could not apply a critical patch for some reason and is subsequently compromised, the separation means that in our 3-tier example above, the other two tiers may not be compromised, depending on the design and architecture of the application being hosted. This separation offers the potential to extend the overall security of the architecture slightly, simply by being careful not to group all instances of an application's services on the same physical host.
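A placement plan can be checked for tiers that share a host before anything is deployed. The following Python sketch assumes a hypothetical mapping of guest VM to (application, tier, host); the VM and host names are illustrative only.

```python
from collections import defaultdict

# Hypothetical placement: guest VM -> (application, tier, host).
PLACEMENT = {
    "shop-web": ("shop", "web", "HV01"),
    "shop-app": ("shop", "app", "HV02"),
    "shop-db":  ("shop", "db",  "HV02"),
}

def colocated_tiers(placement):
    """Return {(application, host): [tiers]} where a host holds 2+ tiers of one app."""
    grouped = defaultdict(list)
    for app, tier, host in placement.values():
        grouped[(app, host)].append(tier)
    return {key: sorted(tiers) for key, tiers in grouped.items() if len(tiers) > 1}

print(colocated_tiers(PLACEMENT))
```

An empty result means that no single host compromise exposes more than one tier of any application in the plan.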