Posts Tagged ‘VSphere’

XenDesktop 5, vSphere 4.1 and VNX Reference Architecture

May 23rd, 2011

A common theme that I see coming up up time and time again with customers is VDI using Citrix XenDesktop and vSphere 4.1.  It’s popularity generally stems from the previous success companies have had with the more traditional Citrix products such as Presentation Server / XenApp.  I know when I was looking at VDI solutions I was very much in favour of Citrix due to one thing, the ICA protocol.  It works and it works well over long distances, in a lot of companies it has proven itself over a long period of time, it is a protocol they trust to deliver.

Following a customer meeting recently I was desperately searching for an EMC reference architecture (RA) for a XenDesktop / vSphere deployment.  At the time it turned out we didn’t have a completed one,  we did however have one in draft format that was going through the final stages of review.  That RA has now been completed and released for public consumption, an overview of the documents purpose is below.

The purpose of this reference architecture is to build and demonstrate the functionality, performance and scalability of virtual desktops enabled by the EMC VNX series, VMware vSphere 4.1 and Citrix XenDesktop 5.  This solution is built on Machine Creation Services (MCS) in XenDesktop 5and a VNX5300 platform with multiprotocol support, which enabled FC block-based storage for the VMware vStorage Virtual Machine File System (VMFS) and CIFS-based storage for user data.

The RA covers the technologies listed below and details why the VNX array with FAST cache enabled is a perfect match for your Citrix VDI deployment.  One other interesting area that is discussed is the use of Citrix Machine Creation Services (MCS) which is a new feature XenDesktop 5 and provides an alternative to Citrix Provisioning Server (PVS).  For those new to MCS I suggest you have a read through the following Citrix blog post as their are some additional design considerations around IOPS that need to be considered.

  • Citrix XenDesktop 5
  • Microsoft Windows 7 enterprise (32-bit)
  • Citrix Machine Creation Services (MCS)
  • VMware vSphere 4.1
  • EMC VNX 5300 – (Fast Cache & VAAI enabled)
  • EMC virtual storage integrator (VSI) – Free on EMC PowerLink

If you are considering XenDesktop 5 and vSphere 4.1 then I suggest you download and have a read through the RA linked below.

EMC Infrastructure for Virtual Desktops enabled by EMC VNX Series (FC), VMware vSphere 4.1 and Citrix XenDesktop 5

Citrix, EMC, vSphere , ,

Whats new in vSphere 4.1 Storage

September 2nd, 2010

So I haven’t done a lot of real time blogging at VMworld this year as I’ve been busy trying to see and soak up as much as possible.  It’s not every day that you get access to the likes of Chad Sakacc (VP EMC / VMware alliance) Scott Drummond (EMC – ex VMware performance team) and a whole host of other technology movers and shakers. As you can imagine I took full advantage of these opportunities and blogging became a bit of secondary activity this week.

However, I’ve now had time to reflect and one of the most interesting areas I covered this week which was the new storage features in vSphere 4.1. I had the chance to cover these in multiple sessions, see various demo’s and talk about it with the VMware developers and engineers responsible. There are two main features I want to cover in depth as I feel they are important indicators of the direction that storage for VMware is heading.

SIOC – Storage I/O Control

SIOC had been in the pipeline since VMworld 2009, I wrote an article on it previously called VMware DRS for Storage, slightly presumptuous of me at the time but I was only slightly off the mark. For those of you who are not aware of SIOC, to sum it up again at a very high level let’s start with the following statement from VMware themselves.

SIOC provides a dynamic control mechanism for proportional allocation of shared storage resources to VMs running on multiple hosts

Though you have always been able to add disk shares to VM’s on an ESX host, this only applied to that host, it was incapable of taking account of VM I/O Behaviour of other VMs on other hosts. Storage I/O control is different in that it is enabled on the datastore object itself, disk shares can then be assigned per VM inside that datastore. When a pre-defined latency level is exceeded on a VM it begins to throttle I/O based on the shares assigned to each VM.

How does it do this, what is happening in the background here? Well SIOC is aware of the storage array device level queue slots as well as the latency of workloads.  During periods of contention it decides how it can best keep machines below the predefined latency tolerance by manipulating all the ESX Host I/O Queues that affect that datastore.

In the example below you can see that based on disk share value all VM’s should ideally be making the same demands on the storage array device level queue slots.  Without SIOC enabled that does not happen. With SIOC enabled it begins throttling back the use of the second ESX host’s I/O queue from 24 slots to 12 slots, thus equalising the I/O across the hosts.

Paul Manning (Storage Architect – VMware product marketing) indicated during his session that there was a benefit to turning SIOC on and not even amending default share values.  This configuration would immediately introduce an element of I/O fairness across a datastore as shown in the example described above and shown below.


So this functionality is now available in vSphere 4.1 for Enterprise Plus licence holders only.  There are a few immediate caveats to be aware of, it’s only supported with block level storage (FC or ISCSI) so NFS datastores are not supported. It also does not support RDM’s or datastores constructed of extents, it only supports a 1:1 LUN to datastore mapping. I was told that extents can cause issues with how the latency and throughput values are calculated,  which could in turn lead to false positive I/O throttling, as a result they are not supported yet.

It’s a powerful feature which I really like the look of. I personally worry about I/O contention and the lack of control I have over what happens to those important mission critical VM’s when that scenario occurs. The “Noisy Neighbour” element can be dealt with at CPU and Memory level with shares but until now you couldn’t at a storage level. I have previously resorted to purchasing EMC PowerPath/VE to double the downstream I/O available from each host and thus reduce the chances of contention.  I may just rethink that one in future because of SIOC!

Further detailed information can be found in the following VMware technical documents

SIOC – Technical Overview and Deployment Considerations

Managing Performance Variance of applications using SIOC

VMware performance engineering – SIOC Performance Study

VAAI – vStorage API for Array Integration

Shortly before the vSphere 4.1 announcement I listened to an EMC webcast run by Chad Sakacc.  In this webcast he described EMC’s integration with the new vStorage API, specifically around offloading tasks to the array. So what does all this mean, what exactly is being offloaded?

So what do these features enable? Let’s take a look at them one by one.

Hardware assisted locking as described above provides improved LUN metadata locking.  This is very important for increasing VM to datastore density.  If we use the example of VDI boot storms, if only the blocks relevant to the VM being powered on are locked then you can have a more VM’s starting per datastore.  The same applies in a dynamic VDI environment where images are being cloned and then spun up; the impact of busy cloning periods, i.e. first thing in the morning is mitigated.

The full copy feature would also have an impact in the dynamic VDI space, cloning of machines taking a fraction of the time as the ESX host is not involved. What I mean by that is when a clone is taken now, the data has to be copied up to the ESX server and then pushed back down to the new VM storage location.  The same occurs when you do a storage vMotion, doing it without VAAI takes up valuable I/O Bandwidth and ESX CPU clock cycles. Offloading this to the array prevents this use of host resource and in tests has resulted in a saving of 99% on I/O traffic and 50% saving on CPU load.

In EMC Labs a test of storage vMotion was carried out with VAAI turned off, it took 2 mins 21 seconds.  The same test was tried again with VAAI enabled, this time the storage vMotion took 27 seconds to complete. That is a 5x improvement, and EMC have indicated that they have had a 10x improvement in some cases. Check out this great video which shows a storage vMotion and the impact on ESX and the underlying array.

There is also a 4th VAAI feature which has been left in the vStorage API but is currently unavailable, Mike Laverick wrote about it here. Its a Thin Provisioning API and Chad Sakacc explained during the group session that its main use is for Thin on Thin storage scenarios. The vStorage API will in the future provide vCenter insight into array level over provisioning as well as the VMware over provisioning.  It will also be used to proactively stun VM’s as opposed to letting them crash as currently happens.

As far as I knew EMC was the only storage vendor offering array compatibility with VAAI. Chad indicated that they are already working on VAAI v2 looking to add additional hardware offload support as well as NFS Support. It would appear that 3Par offer support, so that kind of means HP do to, right? Vaughan Stewart over at NetApp also blogged about their upcoming support of the VAAI, I’m sure all storage vendors will be rushing to make use of this functionality.

Further detailed information can be found at the following locations.

What does VAAI mean to you? – Chad Sakac EMC

EMC VAAI webcast – Chad Sakac EMC

Storage DRS – the future

If you’ve made it this far through the blog post then the fact we are taking about Storage DRS should come as no great surprise.  We’ve talked about managing I/O performance through disk latency monitoring and talked about array offloaded features such as storage vMotion and hardware assisted locking. These features in unison make Storage DRS an achievable reality.

SIOC brings the ability to measure VM latency, thus giving a set of metrics that can be used for storage DRS.  VMware are planning to add capacity to the storage DRS algorithm and then aggregate the two metrics for placement decisions.  This will ensure a storage vMotion of an underperforming VM does not lead to capacity issues and vice versa.

Hardware Assisted Locking in VAAI means we don’t have to be as concerned about the number of VM’s in a datastore, something you have to manage manually at the moment.  This removal of limitation means we can automate better, a storage DRS enabler if you will.

Improved Storage vMotion response due to VAAI hardware offloading means that the impact of storage DRS is minimised at the host level. This is one less thing for the VMware administrator to worry about and hence smoothes the path for storage DRS Adoption.  As you may have seen in the storage vMotion video above the overhead on the backend array also appears to have been reduced, so you’re not just shifting the problem somewhere else.

For more information I suggest checking out the following (VMworld 2010 account needed)

TA7805 – Tech Preview – Storage DRS


There is so much content to take in across all three of these subjects I feel that I have merely scratched the surface.  What was abundantly clear from the meetings and session I attended at VMworld is that VMware and EMC are working closely to bring us easy storage tiering at the VMware level.  Storage DRS will be used to create graded / tiered data pools at the vCenter level, pools of similar type datastores (RAID, Disk type). Virtual machines will be created in these pools; auto placed and then moved about within that pool of datastores to ensure capacity and performance. 

In my opinion it’s an exciting technology, one I think simplifies life for the VMware administrator but complicates life for the VMware designer. It’s another performance variable to concern yourself with and as I heard someone in the VMworld labs comment “it’s a loaded shotgun for those that don’t know what they’re doing”.  Myself, I’d be happy to use it now that I have taken the time to understand it; hopefully this post has made it a little clearer for you to.

Gestalt-IT, Storage, VMware, vmworld , , ,

SNAPVMX – View your Snapshots at VMFS/virtual disk level

June 9th, 2010

Following a recent implementation of VMware Data Recovery manager we ran into a few issues.  We eventually had to kill the virtual appliances due to the issue we were having and as a result we had a couple of virtual machines with outstanding snapshots.  These snapshots were taken by VDR and as a result could not be viewed or deleted using the snapshot manager.

We raised a call with VMware support and they started a WebEx session to look at the issue.  I always love watching VMware support personnel operating at the service console level, I always pick up a command or two that I didn’t know before.  On this occasion the support engineer was using something called SnapVMX to view the hierarchy of snapshots at the virtual disk level.

At first I thought this was an inbuilt VMware command but it turns out it’s not. It was actually a little piece of code that was written by Ruben Garcia.  What does it do?  well the following extract from the download pages explains it pretty well.

  • Displays snapshots structure and size of snapshots for every disk on that VM
  • Calculates free space needed to commit snapshots for the worst case scenario
  • Checks the CID chain of the analysed files and displays a warning if broken.

I’ve included a little demo screenshot to show what it can do. On the left hand side is  a screenshot from Snapshot Manager within vCenter.  On the right hand side is the same VM being viewed with SnapVMX in the service console.  Put the two together and you get a better idea of the snapshot disk hierarchy and the size of each snapshot.


The other interesting feature is that it tells you what space is required to commit the snapshots.  So for example, say you had taken 5 snapshots of a machine as it was being built and configured.  Say that the overall effect of those 5 snapshots is to fill up your VMFS datastore completely. Chances are that you’re not going to be able to commit the snapshots within the current VMFS datastore.  SnapVMX will be able to tell you the worse case scenario on how much space would be required to commit the snapshots.  Armed with this information you could cold migrate to another datastore that has at least that amount of free space in order to allow you to commit the snapshots.  The screenshot below isn’t the best but the best I could do due to the length of the statement.


For the download and full documentation on how to use this piece of code head over to the following web site. Worth a look if you’re a big user of snapshots.

While searching for a link to Ruben Garcia to put on this article I found that he has a blog site and within that I found a link to a superb troubleshooting VM snapshot problems article which I will definitely be keeping a link to and suggest you check out.  Truly excellent stuff Ruben!

General, Gestalt-IT, VMware , , , ,

Virtual Storage Integrator 3.0.0 for vSphere and EMC Storage

May 11th, 2010

I have been trying desperately this week to keep up to date with the latest announcements coming out of EMC World 2010.  Problem is they appear to be making them and blogging about them faster than I can read and assimilate them.

One blog post that did catch my attention was a post by EMC’s Chad Sakac. Chad constantly amazes me, he generates a massive amount of super high quality technical content for the EMC and VMware community. His blog post was entitled “EMC’s next generation vCenter Plugins” and details the latest and greatest offerings from EMC’s Free vCenter plugins.

The Virtual Storage Integrator (VSI) V3.0.0 is a renaming of the existing EMC Storage Viewer 2.1 plugin that has been available for a while.  Why the rename? Well EMC are introducing tighter integration by enabling storage provisioning from within vCenter, it’s now surpassed being just a storage viewer.  The storage provisioning integration works with the CLARiiON across all protocols (FC, ISCSI, FCOE) and it also works with NFS on the Celerra. It also adds greater degrees of simplicity and reduces risk by automating all the tasks involved in provisioning and presenting storage to your vSphere cluster.

Chad explains it in much more detail and much better than I ever could in the following video.

I personally feel that the benefits of EMC’s ownership and tight working relationship with vSphere are beginning to shine through.  Such tight levels of integration are now being delivered and future development doesn’t look likely to slow down either. The quote from Chad below show’s how aggressively his team are working to constantly bring new features to the table and best of all, there completely free!

EMC Virtual Storage Integrator (or VSI) is the main EMC vCenter plug-in.  Over time, more and more functions that currently require installing one or more additional plugins will fold into the EMC Virtual Storage Integrator.  We’re beefing up the team behind it, and they have a very aggressive roadmap (wait till you see what’s next!!!)

Click the link below to find out more about what vCenter plugins are available, what they’re designed to do and where you can download them from in EMC Powerlink.


Storage, VMware, vSphere , , , ,

Storage I/O control – SIOC – VMware DRS for Storage

May 10th, 2010

Following VMworld in 2009 a number of articles were written about a tech preview session on IO DRS – Providing performance Isolation to VMs in Shared Storage Environments. I personally thought that this particular technology was a long way off, potentially something we would see in ESX 4.5. However I recently read a couple of articles that indicate it might not be as far away as first thought.

I initially came across an article by VMware’s Scott Drummond in my RSS feeds.  For those that don’t follow Scott, he has his own blog called the Pivot Point which I have found to be a invaluable source of VMware performance related content. The next clue was an article entitled ESX 4.1 feature leak article, I’m sure you can probably guess what the very first feature listed was? It was indeed Storage I/O Control.

Most people will be aware of VMware DRS and it’s usage in measuring and reacting to CPU and Memory contention. In essence SIOC is the same feature but for I/O, utilising I/O latency as the measure and device queue management as the contention control. In the same way as the current DRS feature for memory and CPU, I/O resource allocation will be controlled through the use of share values assigned to the VM.


I hadn’t realised this until now but you can already control share values for VM disk I/O within the setting of a virtual machine (shown above).  The main problem with this is that it is server centric as you can see from the statement below from the VI3 documentation.

Shares is a value that represents the relative metric for controlling disk bandwidth to all virtual machines. The values Low, Normal, High, and Custom are compared to the sum of all shares of all virtual machines on the server and the service console.

Two main problems exist with this current server centric approach.

A) In a cluster, 5 hosts could be accessing VM’s on a single VMFS volume, there may be no contention at host level but lots of contention at VMFS level. This contention would not be controlled by the VM assigned share values.

B) There isn’t a single pane of glass view of how disk shares have been allocated across a host, it appears to only be manageable on a per VM basis.  This makes things a little trickier to manage.

Storage I/O Control (SOIC) deals with the server centric issue by introducing I/O latency monitoring at a VMFS volume level. SOIC reacts when a VMFS volume’s latency crosses a pre-defined level, at this point access to the host queue is throttled based on share value assigned to the VM.  This prevents a single VM getting an unfair share of queue resources at volume level as shown in the before and after diagrams Scott posted in his article.

   queues_before_sioc              queues_after_sioc

The solution to the single pane of glass issue is pure speculation on my part. I’d personally be hoping that VMware add a disk tab within the resource allocation views you find on clusters and resource groups.  This would allow you to easily set I/O shares for tiered resource groups, i.e. Production, Test, Development. It would also allow you to further control I/O within the resource groups at a virtual machine level.

Obviously none of the above is a silver bullet! You still need to have a storage system with a fit for purpose design at the backend to service your workloads. It’s also worth remembering that shares introduce another level of complexity into your environment.  If share values are not assigned properly you could of course end up with performance problems caused by the very thing meant to prevent them.

Storage I/O Control (SOIC) looks like a powerful tool for VMware administrators.  I know in my own instance, I have a cluster that is a mix of production and testing workloads.  I have them ring fenced with resource groups for memory and CPU but always have this nagging doubt about HBA queue contention.  This is one of the reasons I wanted to get EMC PowerPath/VE implemented, i.e. use both HBA’s and all available paths to increase the total bandwidth.  Implementing SOIC when it arrives will give me a peace of mind that production workloads will always win out when I/O contention occurs.  I look forward to the possible debut of SOIC in ESX 4.1 when it’s released.


Duncan Epping over at Yellow Bricks has located a demo video of SOIC in action.  Although a very basic demonstration,  it gives you an idea of the additional control SOIC will bring.

Gestalt-IT, New Products, VMware , , , ,

VMware PVSCSI Adapter performance and low I/O Workloads

February 21st, 2010

I’ve recently been implementing a vSphere deployment and have been looking at the new features introduced as part of Virtual Machine Hardware 7.  Obviously one of the major new components is the new Para Virtualised SCSI (PVSCSI) adapter which I wrote about way back in May 2009.  When it first came out there were a number of posts regarding the much improved I/O Performance and latency reduction this new adapter delivered, such as Chad Sakac’s I/O vSphere performance test post.

So the other day I stumbled across a tweet from Scott Drummond who works in the VMware Performance Engineering team. Following a little reading and a bit of digging around it appears that the use of PVSCSI comes with a small caveat.  It would appear that if you use the PVSCSI adapter with low I/O workloads you can actually get higher latency than you get with the LSI Logic SCSI adapter (see the quote below)

The test results show that PVSCSI is better than LSI Logic, except under one condition–the virtual machine is performing less than 2,000 IOPS and issuing greater than 4 outstanding I/Os.

This particular caveat has come to light following some more in-depth testing of the PVSCSI adapter performance.  The full whitepaper can be found at the following link.

PVSCSI whitepaper

For those who don’t want to read the technical whitepaper, a summary of the issue can be found in the following VMware KB article.

VMware KB 1017652

So basically, as opposed to just using the PVSCSI adapter as default with VMs running version 7 of the virtual hardware have a think about it’s I/O profile and whether the PVSCSI or LSI logic adapter would be best.

Gestalt-IT, VMware, vSphere , , , , ,

vSphere vMotion Processor Compatibility and EVC Clusters

January 25th, 2010

In today’s economic climate it’s currently the done thing to sweat existing assets for as long as you possibly can.  At the moment I am working on a vSphere deployment and we are recycling some of our existing ESX 3.5 U4 hosts as part of the project.  So over the weekend I was testing out vMotion between a new host with the Intel Xeon X7460 processor and an old host with the Xeon 7350 processor.  I was getting the following error message displayed which pointed to a feature mismatch relating to the SSE4.1 instruction set.  Thankfully the error pointed me to VMware KB article 1993


Within this KB article it immediately refers you to using Enhanced vMotion Compatibility (EVC) to overcome CPU compatibility issues.  I had never used EVC in anger and wanted to read up on it a bit more before making any further changes.  A quick read of page 190 on the vSphere basic configuration guide gives a very good brief overview for those new to EVC. 

So I was referred to VMware KB article 1003212 which is the main reference for EVC processor support.  Quite quickly I was able to see that EVC was supported for the Intel Xeon 7350 and 7460 using the Intel® Xeon® Core™2 EVC baseline.  In essence as far as vMotion is concerned all processors in the cluster would be equal to an Intel® Xeon® Core™2 (Merom) processor and it’s feature set.  This basically masks the SSE4.1 instruction set on the Intel Xeon 7460 that was causing me the problem.

So I set about enabling my current cluster for EVC,  however when I went to apply the appropriate baseline I was getting the following error displayed. The error related to the host that was currently running 3 Windows 2008 R2 x64 servers.  These servers were obviously using using the advanced features of the Intel Xeon 7460 and as such that host could not be activated for EVC.


The vSphere basic configuration guide (Page 190) makes the following recommendation for rectifying this issue, the example matched my situation exactly.

All virtual machines in the cluster that are running on hosts with a feature set greater than the EVC mode you intend to enable must be powered off or migrated out of the cluster before EVC is enabled. (For example, consider a cluster containing an Intel Xeon Core 2 host and an Intel Xeon 45nm Core 2 host, on which you intend to enable the Intel Xeon Core 2 baseline. The virtual machines on the Intel Xeon Core 2 host can remain powered on, but the virtual machines on the Intel Xeon 45nm Core 2 host must be powered off or migrated out of the cluster.)

Now here is the catch 22, my new vCenter server is virtual and sits on the ESX host giving me the EVC error message.  I had to power it off to configure EVC but I can’t configure the EVC setting on the cluster without vCenter,  how was I going to get round this?  Luckily VMware have another KB Article dealing with exactly this situation.  The aptly titled “Enabling EVC on a cluster when vCenter is running in a virtual machinewas exactly what I was looking for. Although it involved creating a whole new HA / DRS cluster complete with new resource groups, etc it was a lot cheaper than buying a large number of expensive Intel processors. It worked perfectly, rectifying my issue and allowing me to use all servers as intended.

Moral of the story…..… Check out VMware KB article 1003212 for processor compatibility before buying servers and always configure your EVC settings on the cluster before adding any hosts to the cluster.  If it’s to late and you have VMs created already,  well just follow the steps above and you should be fine.

vCenter, VMware, vSphere , , , , , , ,

Using the VMware PVSCSI adapter on a boot disk

January 24th, 2010

Having put in the first of a number of new vSphere ESX 4 Update 1 hosts a colleague today set about building our new Windows 2008 R2 64 bit vCenter server.  He later informed me that he was not able to boot the Windows 2008 virtual machine using the PVSCSI adapter (Paravirtualised SCSI) 

I was absolutely positive I had read that this was now supported in Update 1. As it turns out it is, however its not quite as straight forward as just adding the PVSCSI adapter and installing windows.  In fact windows will not recognise the boot disk as it has no native driver for the VMware PVSCSI adapter.

There are a couple of ways to get round this,  the first involves creating your VM with a normal SCSI adapter that Windows 2008  does support and then installing Windows.  Once the installation is complete add a second virtual disk with a second controller set up as PVSCSI and then install VMware tools.  VMware tools will deploy the driver required for the PVSCSI adapter,  once installed you can safely reconfigure the original SCSI controller to be PVSCSI and remove the secondary controller and virtual disk.  Now when you reboot your machine you won’t be met with a blue screen of death, instead you will have a fully working Windows 2008 server using all the benefits of the PVSCSI adapter.

For full step by step instructions to complete the above process I recommend using Alan Renouf’s article. For those who prefer to use powershell scripting to make their changes, check out this fantastic script from LucD’s website which will do it all for you.

The above method is one way to get the PVSCSI adapter working on the boot drive but to be honest it’s a bit of a hassle to be doing this every time you deploy a Windows 2008 VM.  So I had a look to see what was involved in obtaining the driver files for loading prior to installation.

First you need to extract the driver files from your vCenter server. You can find the relevant driver files located in the following directory.

C:\Program Files\VMware\VMware Tools\Drivers\pvscsi\

Take the 3 files contained within and make a floppy image, I used UltraISO for this particular task but something like WinImage works just as well. Now you need to boot your VM and once the windows installation files have loaded attach your floppy image.

As you can see in the screenshot below the Windows VM does not pick up the attached virtual disk due to the lack of native driver support.


Once you’ve pointed windows to the floppy image it picks up the VMware PVSCSI controller driver contained upon it.  Click next to apply the driver.


Once the system has applied the driver you can see the virtual disk for your installation.


For those who are looking to add this driver or any other VMware tool driver into a Windows 2008 Pre-Installation environment,  this VMware KB article on how to do it could be handy.

VMware, vSphere , , , , ,

Virtual Distributed Switch and vCenter Server failure

December 20th, 2009

I’m currently working with my colleagues on an upgrade of our VI 3.5 infrastructure to vSphere Enterprise Plus.  We have recently been mulling over some of the design elements we will have to consider and one of the ones that came up was virtual Distributed Switches (vDS).  We like the look of it,  it saves us having to configure multiple hosts with standard vSwitches and it also has some nice benefits such as enhanced network vMotion support, inbound and outbound traffic shaping and Private VLANs.

vDSOne of the questions that struck me was,  what happens if your vCenter server fails? what happens to your networking configuration? Surely your vCenter server couldn’t be a single point of failure for your virtual networking, could it?

Well I did a bit of digging about, chatted to a few people on twitter and the answer is no it would not result in a loss of virtual networking.  In vSphere vDS the switch is split into two distinct elements,  the control plane and the data plane. Previously both elements were host based and configured as such through connection to the host, either directly using the VI client or through vCenter. In vSphere because the control plane and data plane have been separated,  the control plane is now managed using vCenter only and the data plane remains host based.  Hence when your vCenter server fails the data plane is still active as it’s host based where as the control plane is unavailable as it’s vCenter based.

One thing I was not aware of was where all this vDS information is stored . Mike Laverick over at RTFM informed me that the central config for a vDS is stored on shared VMFS within a folder called the .dvsData folder. I’ve since learnt that this location is chosen automatically by vCenter and you can use the net-dvs command to determine that location. It will generally be on shared storage that all ESX hosts participating in the vDS have access to.  As a back up to this .dvsData folder a local database copy is located in /etc/vmware/dvsData.db which I imagine only comes into play if your vCenter server goes down or if your ESX host loses connectivity to the shared VMFS with the .dvsData folder.  You can read more about this over at RTFM

Interesting links if your considering VMware Distributed Switches

VMware’s demo video of vDS in action, for those who want to learn more about vDS

Mike Laverick’s great reasoning on whether you should use vDS or not

Eric Sloof’s vDS caveats and best practices article

VMware’s vSphere Networking white paper explaining new vDS features

VMware’s vSphere Networking white paper on vDS network migrations

Jason Boche’s very interesting article on a specific vCenter + vDS issue

ESX, vCenter ,

VMware VMSafe – Are there any actual products yet?

November 29th, 2009

I was doing some work out of hours the other night on my employers Virtual Infrastructure when bang on time the little red triangles started popping up against certain ESX hosts in vCenter.  Why you ask? well it’s AV scanning time on our VM’s of course, or the Sophos summit as we affectionately call it due to its uncanny resemblance to a mountain range when you look at the CPU performance stats in vCenter.

It got me thinking, has any one vendor actually got a product out there utilising the VMSafe API that could help me rid our virtual infrastructure of this problem?

My first stop was of course the main VMSafe page where I did find a large list of official partners who are working on developing products to utilise the VMSafe API. The pleasing thing to see was that there are plenty of mainstream security vendors taking part.  However I’ve still to see any of them releasing a product to market that actually utilises VMSafe.

Earlier this year in Glasgow I heard Mcafee talk about VMSafe as part of the VMware vSphere launch road show.  They talked about building a vApp that could sit in your Virtual Infrastructure and take care of AV scanning with the aim of reducing the CPU overhead that AV scanning introduces. I did a little trawl of the web and couldn’t find anything official, I did however find the following forum post (quoted below) which is definitely the unofficial line.

Virus Scan for Offline images is available, which uses VMSafe APIs to scan offline disks accessed via ESX

Nothing is currently road mapped for on-access scanning – no AV vendor has this technology available (or even road mapped as far as I’m aware) yet.

I did a bit more digging on this “scan offline disks” comment and found a recent article by VMware’s Richard Garsthagen.  This article reveals that a piece of software called the VMware Virtual Disk Development Kit (VDDK) can be used to conduct an offline scan of disks attached to powered on or off virtual machines (quoted below). 

VMware VDDK (also being seen as part of the VMsafe initiative, but has been available for longer). The VDDK is an disk API, that allows other programs to access a virtual machine’s hard disk like the VMware Consolidated Backup solution does. It does not matter is the VM is powered on of off, but a disk can just be ‘extra’ mounted to another virtual machine that for instance runs a virus scanner. The clear downside of VDDK is that nothing is real time.

Surely this would rid me of my daily scheduled Sophos summit, wouldn’t it? Think of a hypothetical scenario where you have a VDI setup with 1000 windows XP VM’s,  imagine the strain put on your ESX clusters by 1000 machines kicking off a scheduled daily AV Scan. Would an appliance that could offline scan disks reduce the strain? Well thinking about it, possibly not.  It would still have to conduct a scan of 1000+ virtual disks, only this time it wouldn’t have nearly as many CPU cycles available to churn through the work. All it would have is the resources assigned to the vApp which is likely to be completely inadequate for such a large task. With this in mind it’s likely that it would probably take a large amount of time to complete.  It could even take longer than a day which wouldn’t be much use for a daily AV scan. I’m sure some companies would rather suffer the ESX CPU resource pain point as opposed to sacrificing security through ineffective or untimely AV Scanning.

Richard’s article along with the solutions tab on the VMSafe webpage did however reveal that a couple of products that use VMSafe have made it to market.  One is called vTrust from Reflex Systems which appears to be a multi faceted application, which according to their site provides dynamic policy enforcement and management, virtual segmentation, virtual quarantine and virtual networking policies.  The other application is a hypervisor based firewall appliance from Altor that supports virtual segmentation and claims to provide better throughput by using the Fast Path element of the VMSafe API.

So it would appear on the surface that progress has been slow.  To only find two VMware certified appliances in the market place was, I have to admit, quite a surprise!  It looks like it’s going to be a while before we see VMsafe being fully utilised by vendors, even then we will  have those wary individuals who will never quite be convinced.

Neil Macdonald of Gartner makes a good point about the potential for VMSafe appliances to introduce possible security vulnerabilities at a lower level in the infrastructure.

If I’m responsible for VM security, I’ll consider it after the APIs ship, after the vendors finally ship their VMSafe-enabled solutions, after I’ve got a level of comfort that these VMSafe-enabled security solutions don’t in of themselves introduce new security vulnerabilities

Edward L Haletky who is very much focused on virtualisation security also makes a good point about low level vulnerabilities and the interaction of multiple VMSafe appliances. 

I fully expect VMware to not only ensure the VMSafe fastpath drivers do nothing harmful to the virtual environment, but also address interaction issues between multiple VMSafe fastpath drivers. In addition, I would like such reports made available to satisfy auditing requirements.

So was VMSafe simply something to bolster the vSphere marketing launch,  an announcement made before it should have been?  Usually VMware are quite good at keeping these kind of things under wraps and releasing them when they are a little more mature and ready for use in real world scenarios.  Now I don’t know what work was done with partners in advance but I would have liked to have seen a couple of the major security vendors releasing appliances at the same time as VMSafe was announced.  For me that certainly would have installed a little more confidence in VMSafe than writing this article has.

If anyone out there is writing appliances utilising the VMSafe API and wants to comment, please do.  I would love to hear some news from the front line as to what is being developed, where it will be applied and when we can expect to see it.

General, Gestalt-IT, vSphere ,