Archive

Archive for the ‘Gestalt-IT’ Category

Blending IaaS, IaaS+ and PaaS to deliver today’s IT

March 2nd, 2014

So, I have to start off by thanking Brian Gracely (a fellow EMCer) for starting me off on this runaway train of thought. A few weeks ago I read his article entitled “Will IaaS+ happen before PaaS?” The question posed in the title is very apt, and one which only reinforces the fact that we work in a rapidly changing IT world, one where the battle for market and mind share continues to alter the landscape on a monthly, if not daily, basis.

IaaS, PaaS, SaaS…   Reality!!

I remember when I first started at EMC as a vSpecialist, we were all about building private Infrastructure as a Service (IaaS) capabilities. Virtualise everything on VMware, utilise vCloud Director to create pools of resource, securely divide them between tenants, deliver a self-service catalogue through a portal and conduct chargeback against all of it. We also talked about extending that private cloud to the public cloud through a federated hybrid cloud model, but ultimately, when you strip away the bells and whistles, what we were all talking about from an end user perspective was the basic provisioning of VMs to the OS layer.

Roll forward 3 years and things have moved up a gear significantly. Platform as a Service (PaaS) has become the new hot topic, and the ability to write applications without worrying about the underlying infrastructure is well and truly upon us. Pivotal Cloud Foundry, Red Hat OpenShift, AWS Elastic Beanstalk, Microsoft Azure and Salesforce.com’s Heroku are some of the key players in today’s PaaS market. Some of these PaaS offerings are open and can sit on top of multiple cloud infrastructures, others are more proprietary and locked in. All of them, however, are trying to capture a share of the customers who broadly speaking fall into the following categories.

Innovate or die — PaaS offers a way to “leapfrog” the competition with the ability to quickly integrate the latest innovations in software development and scale them quickly. Customers get that pie-in-the-sky seamless experience, which is a win for everybody.

Agility is key — PaaS is a strong entry point to embracing the DevOps mindset with minimal investment, helping organizations work toward agile development. When you don’t have to worry about the underlying infrastructure, it becomes a lot easier to achieve continuous deployment and quick, responsive feature updates. Developers don’t need to handle operations and operations don’t need to know how to code in order to take advantage of a PaaS.

Build once, deploy anywhere — This is relatively specific to the open source players, but the ability to build an application on a common platform means that you can write it once and deploy it on any infrastructure you’d like. In other words, if you build an application on Cloud Foundry, that application will run the exact same way on any other instance of Cloud Foundry, with the implication that you can ideally move from the public cloud to the private cloud or between public cloud providers with lessened fear of lock-in.

Quoting Dekel Tankel, Director of Pivotal’s Cloud Foundry Product Team, in CiteWorld

The reality (in my opinion)… well, it’s a mixed bag of course. Today I still see customers and IT teams striving to provide basic IaaS for their internal users. In truth developers wanted that simple capability from internal IT over 2 years ago; it is one of the main reasons that many public cloud providers, such as AWS, are what they are today. They offered a viable and quick alternative for infrastructure while internal IT teams were either still trying to work out how to do IaaS themselves or were simply asleep at the wheel, not realising that some very real competitors existed out there.

PaaS is an exciting shift in the industry; it’s allowing businesses to move up the stack, forget about the infrastructure and concentrate on building the applications that differentiate them from their competitors. It’s still early days for PaaS, though the momentum is building, not least with the Cloud Foundry Foundation announcement this week, which saw some of the industry heavy hitters commit to developing the Cloud Foundry project.

I certainly don’t think anyone can argue with the concepts of PaaS, or the fact it will rapidly take share in the coming years as development methods change. I do however feel that it won’t suit everyone immediately. It’s great for greenfield start-ups who are not constrained by legacy IT and want to operate in the public cloud, but how will it be adopted into existing businesses? How quickly will it be adopted into the enterprise? All enterprise customers should be taking a look at this today, working out how they can integrate PaaS into their IT function to fundamentally change how they manage the software development lifecycle.

I personally think we’re at another one of those interesting inflexion points. The business and the developers that work for them want to move ever faster. Internal IT has maybe only just got to grips with IaaS and can now service them with the VMs they want quickly. However the developers have moved on and already want more; they want to consume services and not just VMs. They look to the public cloud and they see rich services being layered on top of cloud infrastructure: messaging services, database as a service, in-memory capabilities, etc. The developers and the business are again demanding more than internal IT are currently delivering. Sounds familiar, doesn’t it! It sounds like the IaaS story from a few years back all over again.

Delivering IaaS, IaaS+ and PaaS

The question I asked myself after reading Brian’s article was: what are EMC doing to assist our customers to deliver in this crazy, fast-moving IT world of ours? When I say “customers” I mean that in the broadest sense; it could be assisting small IT departments, enterprise customers or service providers looking to deliver services back to businesses and public consumers.

At EMC we’ve been talking a lot about the 1st, 2nd and 3rd platforms. Some of you may have seen the picture below before; it sums up today’s modern IT world very well I think. I certainly speak to customers who operate somewhere in all three of these platforms, and I can safely say that I don’t see that model changing overnight. However, I do see companies striving hard to leave the legacy behind and leapfrog straight into the 3rd platform.

[Figure: the 1st, 2nd and 3rd platforms]

If we break it down, what businesses are technically looking to achieve today is optimising the 2nd platform while at the same time enabling the 3rd platform. This is where I think a blended model of IaaS, IaaS+ and PaaS will cover the majority of use cases. IaaS and IaaS+ will help optimise legacy and new 2nd platform applications and change how they are delivered. IaaS+ and PaaS will find themselves used for new application requirements in both the 2nd and 3rd platforms.

A picture speaks a thousand words, so I’ve attempted to draw out below what’s in my head on this subject. Thanks to Eric Wright for the graphic inspiration in his recent blog post. In theory, by mixing traditional IaaS with PaaS, and using that combined stack to layer services on top of IaaS, whether they are PaaS delivered (MemDB aaS or Messaging aaS) or more traditionally delivered application blueprints (DBaaS straight on top of IaaS), we eventually come up with a hybrid model that caters to lots of different requirements.

[Figure: blended IaaS, IaaS+ and PaaS model]

So if we then take that one step further, what does the EMC federation have to offer in this space to help customers achieve this blended model of IaaS, IaaS+ and PaaS? I came up with the diagram below. Pretty busy, isn’t it!

[Figure: EMC federation IaaS, IaaS+ and PaaS offerings]

Let’s break it out from the top down.

Services – There is a plethora of services that theoretically could be consumed. Some you may put together yourself (via application blueprints or custom buildpacks for Pivotal CF), others may be pre-packaged for you (either blueprints or packaged services). They may be consumed on PaaS (buildpacks) or deployed straight onto IaaS (blueprints). Pivotal services such as GemFire, RabbitMQ and tc Server are some of the offerings that can be deployed either way today.

vCloud Automation Center – vCAC, which has recently had the vFabric Application Director functionality folded in, can be utilised to deploy VMs, operating systems and application blueprints straight onto physical infrastructure, multiple hypervisors and multiple cloud platforms. I’ve included Puppet Labs on the diagram as the integration with vCAC has greatly expanded the capability for service deployment, with vCAC being able to take advantage of the Puppet module library. I think once VMware get vCAC fully plugged into Pivotal CF and the vCloud Hybrid Service (hopefully both aren’t too far away) it will make it an exceptionally powerful tool for automation, whether that be in a VMware or a heterogeneous cloud environment.

Pivotal Cloud Foundry – The open source Platform as a Service offering that is today compatible with multiple cloud environments (VMware, OpenStack, AWS and, I’m hoping, vCloud Hybrid Service soon). It currently comes in an open source and a Pivotal enterprise flavour; other custom variations will undoubtedly appear, IBM Bluemix being a recent CF-based PaaS that I’ve been reading about. Cloud Foundry is creating a real stir in the IT world at the moment, with a lot of the IT heavyweights throwing their backing behind this open source project, prompting comments such as the one below.

“Cloud Foundry is on an absolute tear. The number of companies that have bought into the initiative, the amount of code being contributed, the customer wins that ecosystem members are enjoying suggest that Cloud Foundry is preeminent among all the open source PaaS initiatives.”

— Ben Kepes, Forbes


VMware Software Defined Data Centre – I think everyone knows the story with this; if you don’t, you really should :-)  VMware, in combination with its partners (EMC included), are working hard on delivering the software defined data centre. The basic software defined compute layer is where VMware earned its stripes; that area of the SDDC needs no introduction.

Software defined networking became a mainstream topic with the VMworld 2013 announcement of VMware NSX. With NSX, VMware are bringing the same consolidation, control and flexibility benefits to the network world as they did to the server world. Granular network policies that can follow a VM around regardless of its location (private or public clouds) are a key factor in enabling widespread SDDC and hybrid cloud adoption.

The last element is software defined storage, something VMware and EMC are working very hard on. EMC announced our ViPR offering at EMC World in 2013, a fundamental change in thinking around storage. Abstracting the control plane and enabling a single point of management for EMC storage, other vendors’ storage, as well as commodity and cloud storage, was a major change of direction for EMC. Providing software-only data services is another fundamental shift in mindset, but an essential one as the storage world slowly becomes more commoditised. Today EMC offer HDFS and Object data services through ViPR; in the future there will be a lot more as EMC focus efforts on producing more abstracted software features. VMware have also gone down the commodity route with VSAN, still in beta but due for release soon; it will prove popular for those VMware-only shops who want to consume commodity server direct attached storage.

VCE Vblock – The leading converged infrastructure stack, created as a joint venture between EMC, Cisco, VMware and Intel. Converged infrastructure stacks are the easiest means of quickly deploying private cloud infrastructure within your own data centre. Built from best-of-breed components, available in multiple T-shirt-size offerings, fully integrated, built in the factory, certified and ready to roll in 45 days or less, it is the perfect way to consume infrastructure and underpin a blended IaaS, IaaS+ and PaaS model.

vCloud Hybrid Service (vCHS) – This week saw VMware announce their new vCHS service in Europe, based in the UK. This is a key announcement for VMware and one that is sure to get customers excited. Lots of customers have VMware in their own private cloud deployments today, and enabling those customers to extend their internal VMware environments while using the same automation and monitoring tooling will be very appealing. The capability to move your VMs back and forth between an on-prem VMware private cloud and vCHS, or simply to deploy your existing VM templates directly into a public cloud offering without upheaval, is a huge plus for vCHS. With IPsec VPN or direct connect network capabilities on offer, vCHS offers the simplest means of extending your data centre to consume as required. Once the offering beds in and a few more bells and whistles are added (I’m thinking Pivotal CF PaaS here) it will be an even more compelling offering and a large part of VMware’s future business.

Summary

This has been a very long post, written over a number of weeks and about a moving target (I’m thinking of the Cloud Foundry Foundation announcement and the vCHS launch here). It basically comes down to businesses and developers wanting to constantly innovate and do more, and to IT struggling to keep up with that insatiable appetite and deliver what they want. I believe IT departments want to help them do that, but they have to balance delivering with all the challenges that come with existing IT legacy and the requirement to maintain secure and compliant environments as per the rules and regulations that govern their business.

I believe the answer is a mixture of platform 2 and platform 3 solutions; it’s a mixture of legacy and new world applications, and a mix of legacy IT infrastructure, IaaS, IaaS+ and PaaS as I outlined above. With the work going on at EMC + VMware + Pivotal, it’s a no-brainer that this particular federation of companies is in a perfect position to help businesses, developers and IT infrastructure teams with that journey to innovate and change how they do what they do!

EMC, Gestalt-IT, Pivotal, VMware

Microsoft VDI licensing – VDA and Microsoft WinTPC

April 18th, 2011

Some time ago I wrote a blog post about Microsoft Virtual Desktop Access (VDA) licensing that was introduced back in July 2010. For those that don’t want to read the whole article the summary of VDA was as follows.

  • You need to license the endpoint accessing a Windows VDI desktop.
  • It’s £100 per year per endpoint.
  • Multiple endpoints each need a licence, i.e. home PC, office thin client, iPad.
  • VDA is included if the endpoint is Windows and covered by Software Assurance.

I remember at the time thinking that this was going to hinder VDI deployment projects. The additional ongoing cost of licensing every potential endpoint a user may use was going to push TCO up, increase the time for ROI to be realised and generally make VDI a very unappealing prospect. Don’t even get me started on how difficult this makes it for service providers to create a Windows desktop as a service offering.

Recently one of my esteemed colleagues at EMC (another vSpecialist by the name of Itzik Reich, whose blog you can find here) sent out an email about Microsoft releasing a community technology preview (CTP) of a product called Windows Thin PC (WinTPC). In summary, this is a slimmed-down version of Windows 7 designed for re-purposing old PC equipment as thin client devices.

[Figure: Windows Thin PC]

It has a few features worth mentioning for those technically minded people out there.

  • RemoteFX support for a richer, higher fidelity hosted desktop experience.
  • Support for System Center Configuration Manager, to help deploy and manage.
  • Write filter support helps prevent writes to disk, improving endpoint security.


WinTPC and / or VDA

So how does this new product fit in with the rather expensive VDA licensing? Well, the good news is that WinTPC can be used to access a VDI desktop without the need for a VDA licence. On the downside, WinTPC will only be available as a Software Assurance benefit for volume licensees. Now, seeing as the VDA licence doesn’t apply to an endpoint that is Windows-based and covered by Software Assurance, it makes no real difference from a licensing point of view which option you go for. So if you have Software Assurance the choice is yours; if you don’t, well, coughing up for VDA licences each year is your only option I’m afraid.
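To make those rules concrete, here’s a minimal sketch of the decision logic (my own illustration of the rules described above, not anything Microsoft publishes):

```python
def vdi_licence_needed(endpoint_os: str, has_software_assurance: bool) -> str:
    """Return what an endpoint needs in order to access a Windows VDI desktop.

    Based on the rules discussed above: an SA-covered Windows endpoint
    (or WinTPC, itself an SA benefit) needs nothing extra; any other
    endpoint needs an annual VDA subscription.
    """
    if endpoint_os.lower().startswith("windows") and has_software_assurance:
        return "No extra licence needed (VDA rights are an SA benefit)"
    return "Windows VDA subscription (annual, per endpoint)"

print(vdi_licence_needed("Windows 7", True))       # SA-covered corporate PC
print(vdi_licence_needed("ThinClientOS", False))   # thin client without SA
```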

What WinTPC does allow companies to do is maximise existing PC hardware investments. This should allow companies to offset some of the initial upfront cost often associated with VDI projects. Microsoft’s idea is that companies can try out VDI using WinTPC and existing PC assets; when these PCs reach end of life they can swap over to using Windows Embedded devices without needing to change the management tools. Now, VDI is not cheap. Capital costs can be high, and savings are usually made in operational and management costs later down the VDI journey. As I mentioned at the start of this post, the VDA licence has not helped VDI adoption as it increases both capital and operational costs due to its annual subscription cost model. Will this new release from Microsoft help reduce costs?

Just Saying!

My opinion? I personally think Microsoft are in a tricky position. They’re somewhat behind the curve on the VDI front, and I always felt the VDA licence was designed to slow VDI adoption while they gained some ground on the competition. If anyone chose to forge ahead regardless, well, Microsoft would generate some nice consistent revenue through the VDA licence. So the prospect of a WinTPC release is a nice touch by Microsoft during these hard economic times, but not everyone can benefit. What I would like to see is Microsoft offer this outwith Software Assurance, selling it as a single one-off licence cost as an alternative to the annual subscription model used with the VDA. Give your customers the choice and let them get on with their VDI journey; be part of it as opposed to being the road block!

Links

If you are interested in learning more, check out the links below. To download the CTP version of WinTPC, go to Microsoft Connect and sign up; I would love to hear what you think.

http://www.microsoft.com/windows/enterprise/solutions/virtualization/products/thinpc.aspx

https://connect.microsoft.com/

Gestalt-IT, Microsoft, New Products

What’s new in vSphere 4.1 Storage

September 2nd, 2010

So I haven’t done a lot of real-time blogging at VMworld this year as I’ve been busy trying to see and soak up as much as possible. It’s not every day that you get access to the likes of Chad Sakac (VP, EMC / VMware alliance), Scott Drummond (EMC, ex-VMware performance team) and a whole host of other technology movers and shakers. As you can imagine I took full advantage of these opportunities, and blogging became a bit of a secondary activity this week.

However, I’ve now had time to reflect, and one of the most interesting areas I covered this week was the new storage features in vSphere 4.1. I had the chance to cover these in multiple sessions, see various demos and talk about them with the VMware developers and engineers responsible. There are two main features I want to cover in depth as I feel they are important indicators of the direction that storage for VMware is heading.

SIOC – Storage I/O Control

SIOC had been in the pipeline since VMworld 2009; I wrote an article on it previously called VMware DRS for Storage, slightly presumptuous of me at the time, but I was only slightly off the mark. For those of you who are not aware of SIOC, to sum it up again at a very high level, let’s start with the following statement from VMware themselves.

SIOC provides a dynamic control mechanism for proportional allocation of shared storage resources to VMs running on multiple hosts

Though you have always been able to add disk shares to VMs on an ESX host, this only applied to that host; it was incapable of taking account of the I/O behaviour of VMs on other hosts. Storage I/O Control is different in that it is enabled on the datastore object itself, and disk shares can then be assigned per VM inside that datastore. When a pre-defined latency level is exceeded on a VM, SIOC begins to throttle I/O based on the shares assigned to each VM.

How does it do this; what is happening in the background here? Well, SIOC is aware of the storage array device level queue slots as well as the latency of workloads. During periods of contention it decides how it can best keep machines below the predefined latency tolerance by manipulating the I/O queues of all the ESX hosts that affect that datastore.

In the example below you can see that, based on disk share values, all VMs should ideally be making the same demands on the storage array device level queue slots. Without SIOC enabled that does not happen. With SIOC enabled it begins throttling back the use of the second ESX host’s I/O queue from 24 slots to 12 slots, thus equalising the I/O across the hosts.

Paul Manning (storage architect, VMware product marketing) indicated during his session that there is a benefit to turning SIOC on without even amending the default share values. This configuration would immediately introduce an element of I/O fairness across a datastore, as shown in the example described above and pictured below.

[Figure: SIOC fairness example]
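To illustrate the mechanism, here’s a simplified sketch of my own (not VMware’s actual algorithm): divide a fixed pool of device queue slots across hosts in proportion to the aggregate disk shares of the VMs running on each host.

```python
def allocate_queue_slots(total_slots: int, shares_per_host: dict) -> dict:
    """Divide an array's device queue slots across ESX hosts in
    proportion to the total VM disk shares running on each host."""
    total_shares = sum(shares_per_host.values())
    return {
        host: round(total_slots * shares / total_shares)
        for host, shares in shares_per_host.items()
    }

# With equal shares, a host hogging 24 contended slots gets pulled back
# so each host ends up with 12, as in the example above.
print(allocate_queue_slots(24, {"esx01": 1000, "esx02": 1000}))  # {'esx01': 12, 'esx02': 12}
# With 2:1 shares, the allocation skews accordingly.
print(allocate_queue_slots(36, {"esx01": 2000, "esx02": 1000}))  # {'esx01': 24, 'esx02': 12}
```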

So this functionality is now available in vSphere 4.1, for Enterprise Plus licence holders only. There are a few immediate caveats to be aware of: it’s only supported with block level storage (FC or iSCSI), so NFS datastores are not supported. It also does not support RDMs or datastores constructed of extents; it only supports a 1:1 LUN to datastore mapping. I was told that extents can cause issues with how the latency and throughput values are calculated, which could in turn lead to false positive I/O throttling; as a result they are not supported yet.

It’s a powerful feature which I really like the look of. I personally worry about I/O contention and the lack of control I have over what happens to those important mission critical VMs when that scenario occurs. The “noisy neighbour” element can be dealt with at the CPU and memory level with shares, but until now you couldn’t do the same at the storage level. I have previously resorted to purchasing EMC PowerPath/VE to double the downstream I/O available from each host and thus reduce the chances of contention. I may just rethink that one in future because of SIOC!

Further detailed information can be found in the following VMware technical documents

SIOC – Technical Overview and Deployment Considerations

Managing Performance Variance of applications using SIOC

VMware performance engineering – SIOC Performance Study

VAAI – vStorage API for Array Integration

Shortly before the vSphere 4.1 announcement I listened to an EMC webcast run by Chad Sakac. In this webcast he described EMC’s integration with the new vStorage API, specifically around offloading tasks to the array. So what does all this mean; what exactly is being offloaded?

[Figure: VAAI feature overview]
So what do these features enable? Let’s take a look at them one by one.

Hardware assisted locking, as described above, provides improved LUN metadata locking. This is very important for increasing VM to datastore density. If we use the example of VDI boot storms: if only the blocks relevant to the VM being powered on are locked, then you can have more VMs starting per datastore. The same applies in a dynamic VDI environment where images are being cloned and then spun up; the impact of busy cloning periods, i.e. first thing in the morning, is mitigated.

The full copy feature also has an impact in the dynamic VDI space, with the cloning of machines taking a fraction of the time as the ESX host is not involved. What I mean by that is that when a clone is taken today without VAAI, the data has to be copied up to the ESX server and then pushed back down to the new VM storage location. The same occurs when you do a Storage vMotion; doing it without VAAI takes up valuable I/O bandwidth and ESX CPU clock cycles. Offloading this to the array prevents this use of host resource and in tests has resulted in a saving of 99% on I/O traffic and a 50% saving on CPU load.

In the EMC labs a test of Storage vMotion was carried out with VAAI turned off; it took 2 minutes 21 seconds. The same test was tried again with VAAI enabled, and this time the Storage vMotion took 27 seconds to complete. That is over a 5x improvement, and EMC have indicated that they have seen a 10x improvement in some cases. Check out this great video which shows a Storage vMotion and the impact on ESX and the underlying array.
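As a quick sanity check on those figures (just arithmetic on the numbers quoted above):

```python
without_vaai = 2 * 60 + 21   # 2 mins 21 secs = 141 seconds
with_vaai = 27               # seconds
print(f"{without_vaai / with_vaai:.1f}x faster with VAAI")  # 5.2x faster with VAAI
```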

There is also a 4th VAAI feature which has been left in the vStorage API but is currently unavailable; Mike Laverick wrote about it here. It’s a thin provisioning API, and Chad Sakac explained during the group session that its main use is for thin-on-thin storage scenarios. The vStorage API will in the future provide vCenter with insight into array level over-provisioning as well as the VMware over-provisioning. It will also be used to proactively stun VMs as opposed to letting them crash, as currently happens.

As far as I knew, EMC were the only storage vendor offering array compatibility with VAAI. Chad indicated that they are already working on VAAI v2, looking to add additional hardware offload support as well as NFS support. It would appear that 3PAR offer support too, so that kind of means HP do too, right? Vaughn Stewart over at NetApp also blogged about their upcoming support of the VAAI; I’m sure all storage vendors will be rushing to make use of this functionality.

Further detailed information can be found at the following locations.

What does VAAI mean to you? – Chad Sakac EMC

EMC VAAI webcast – Chad Sakac EMC

Storage DRS – the future

If you’ve made it this far through the blog post then the fact we are talking about Storage DRS should come as no great surprise. We’ve talked about managing I/O performance through disk latency monitoring, and we’ve talked about array offloaded features such as Storage vMotion and hardware assisted locking. These features in unison make Storage DRS an achievable reality.

SIOC brings the ability to measure VM latency, thus giving a set of metrics that can be used for Storage DRS. VMware are planning to add capacity to the Storage DRS algorithm and then aggregate the two metrics for placement decisions. This will ensure a Storage vMotion of an underperforming VM does not lead to capacity issues, and vice versa.
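VMware hadn’t shared the actual algorithm, so purely as a hypothetical sketch (the weights, field names and numbers below are my own invention), aggregating the two metrics for a placement decision might look something like this:

```python
def placement_score(ds, latency_weight=0.5, capacity_weight=0.5):
    """Score a candidate datastore (lower is better) by combining
    normalised latency with capacity utilisation, so a hot VM is
    never moved onto a datastore that is about to fill up."""
    latency_score = ds["avg_latency_ms"] / ds["latency_threshold_ms"]
    capacity_score = ds["used_gb"] / ds["total_gb"]
    return latency_weight * latency_score + capacity_weight * capacity_score

candidates = [
    {"name": "ds01", "avg_latency_ms": 28, "latency_threshold_ms": 30, "used_gb": 300, "total_gb": 500},
    {"name": "ds02", "avg_latency_ms": 12, "latency_threshold_ms": 30, "used_gb": 450, "total_gb": 500},
    {"name": "ds03", "avg_latency_ms": 15, "latency_threshold_ms": 30, "used_gb": 250, "total_gb": 500},
]
print(min(candidates, key=placement_score)["name"])  # ds03: good latency AND headroom
```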

Hardware assisted locking in VAAI means we don’t have to be as concerned about the number of VMs in a datastore, something you have to manage manually at the moment. This removal of a limitation means we can automate better; a Storage DRS enabler, if you will.

Improved Storage vMotion response due to VAAI hardware offloading means that the impact of Storage DRS is minimised at the host level. This is one less thing for the VMware administrator to worry about and hence smooths the path for Storage DRS adoption. As you may have seen in the Storage vMotion video above, the overhead on the backend array also appears to have been reduced, so you’re not just shifting the problem somewhere else.

For more information I suggest checking out the following (VMworld 2010 account needed)

TA7805 – Tech Preview – Storage DRS

Summary

There is so much content to take in across all three of these subjects that I feel I have merely scratched the surface. What was abundantly clear from the meetings and sessions I attended at VMworld is that VMware and EMC are working closely to bring us easy storage tiering at the VMware level. Storage DRS will be used to create graded / tiered data pools at the vCenter level, pools of similar type datastores (RAID level, disk type). Virtual machines will be created in these pools, auto-placed and then moved about within that pool of datastores to ensure capacity and performance.

In my opinion it’s an exciting technology, one I think simplifies life for the VMware administrator but complicates life for the VMware designer. It’s another performance variable to concern yourself with, and as I heard someone in the VMworld labs comment, “it’s a loaded shotgun for those that don’t know what they’re doing”. Myself, I’d be happy to use it now that I have taken the time to understand it; hopefully this post has made it a little clearer for you too.

Gestalt-IT, Storage, VMware, vmworld

Symantec Application HA for VMware – VMworld 2010

August 16th, 2010

I was lucky enough last week to be involved in a Gestalt IT conference call with Symantec.  The conference call was designed to give us all a sneak preview of what Symantec were planning to announce at VMworld 2010 in a couple of weeks.  Unfortunately it was under embargo, that is until today!

There were a couple of announcements being made, Symantec introduced a new NFS storage product called VirtualStore and made some further announcements about NetBackup 7 and new VMware specific features.  However the most interesting announcement on the call for me was the release of Symantec Application HA for VMware.

Symantec have been looking at why customers are not going “the last mile” with virtualisation. Why are customers not deploying their Tier 1 applications on their virtual platforms? Symantec’s view on this was that customers still have issues with application level failure within guest VMs. This product has been designed to fill that void and at present is a product with no real competitors.

As the call progressed, the current HA options were described by Symantec and discussed by the group. The obvious one is VMware HA, which covers a physical host failure event. Within the VMware HA product there is also VM monitoring, which covers you in the event of an OS level failure, such as a blue screen. Then you can of course employ other technologies such as OS level clustering; however, you then have to take heed of caveats that hinder the ability to use features such as vMotion and DRS.

I’m always sceptical when I see new virtualisation products; one of my fears is that companies are attempting to just jump on the crest of the wave that is virtualisation. Symantec are obviously a bit more established than your average company, but as always the jury is out until we see a final product doing the business for real. It transpired during the call that the product is actually based on Symantec Veritas Cluster Server, a product with a long history in application availability.

Veritas Cluster Server has a lot of built-in trigger scenarios for common products such as Microsoft SQL Server, Exchange Server and IIS. On top of this out-of-the-box support, Symantec also have a VCS development kit allowing custom scenarios to be written. I like this approach; it reminds me of F5 Networks’ use of the customer community to support the writing of custom rules and features for their product. If a custom rule or feature has enough demand then they spend the time developing it into their product range. Perhaps Symantec could look at leveraging their customer base and community in this way to improve the support around VCS trigger scenarios. One other potential use of the VCS SDK that springs to mind is for application vendors who are making specialist software: CRM, ERP, finance systems, etc. They could look to build Application HA into pre-configured virtual appliances; that would be a great selling point for any software vendor.

The deployment of the product itself takes the form of a guest deployment / agent. Technical deep dive information on the exact integration between the Symantec product and VMware was thin on the ground. However, there was mention of Symantec’s integration with the VMware HA API, something that I don’t think has been announced by VMware just yet. The description given to us during the call was that if Symantec Application HA failed to restart the application, it could send a downstream API call to VMware HA and ask it to restart the VM’s operating system. An interesting concept, and something I am sure we’ll hear more about at VMworld.

Licensing for this new product is quite competitive at $350 per virtual machine, a small price to pay for ensuring your Tier 1 application recovery is automated. Symantec have promised full integration with vCenter Server, and the screenshot below shows Symantec Application HA in action monitoring a SQL 2008 server; click on the thumbnail to see a full size image.

If you would like to learn more about Application HA, then get along to VMware and Symantec’s break out session at VMworld. – http://www.vmworld.com/docs/DOC-4658

Alternatively you can listen to a Podcast from Symantec’s Niraj Zaveri discussing the new product.  – http://www.symantec.com/podcasts/detail.jsp?podid=ent_application_ha

General, Gestalt-IT, New Products, VMware

Gestalt IT Tech Field Day Seattle – NEC HYDRAstor

July 16th, 2010

Following my return from my first Tech Field Day I have been reading through my notes and reflecting on the vendors I saw in Seattle. The one that surprised me most was NEC; everyone has heard of them, but not everyone actually knows what they do or what products they make. As we found out during our visit, NEC have a broad technology portfolio and quite an interesting offering in the storage space.

Here are some basic facts about NEC that you may or may not know:

- Founded in 1899
- Fortune 200 company with over 143,000 staff
- Revenues of $43 Billion in 2009
- $3 Billion spent in R&D each year across 12 R&D global labs
- 48,000 patents worldwide.
- Have been in storage since 1950 

So, with that little history lesson over, on to the main focus of our visit: NEC’s HYDRAstor. This is their modular grid storage offering for customers with backup and archive storage in mind. It’s marketed as “Grid storage for the next 100 years”, which may sound a little far-fetched, but data growth and data retention periods are ever increasing. From what I saw and heard, the HYDRAstor could very well live up to this bold claim.

There was a lot of content delivered on the day and the session went on for 4 hours, so I’ve tried to wrap up some of the key features below. I have expanded on the key elements of the HYDRAstor that really caught my attention as I think they are worth exploring in more detail.

Key Features

- 2 tier architecture based entirely on best of breed Intel Xeon 5500 based servers

- 2 tier architecture consists of front end accelerator nodes and back end storage nodes

- Shipped as a turnkey solution, though entry level can be bought for self racking.

- Supports a maximum of 165 nodes: 55 accelerator nodes and 110 storage nodes

- All interconnects based on 1Gb Ethernet networking (NEC network switches included)

- Supports old and new node modules in the same Grid for easy node upgrade and retirement.

- Supports volume presentation with NFS and CIFS (SMB Version 1)

- Non-disruptive auto reallocation of data across any additional grid capacity – DynamicStor

- Higher levels of resilience than RAID with a reduced capacity overhead (see DRD below)

- WAN optimised grid to grid replication minimises network bandwidth requirements – RepliGrid

- WORM Support for secure retention / compliance governance – HYDRAlock

- Efficient drive rebuilds: only the actual data is rebuilt, not the whole drive

- Global inline de-duplication across the entire grid – DataRedux™

- Tight backup vendor integration – strips out backup metadata to improve de-dupe ratios

- Mini HYDRAstor appliance available for remote offices or offsite DR replication.

Data Protection – Distributed Resilient Data™ (DRD)  

The resilience provided by HYDRAstor really caught my eye, primarily because it was so different from anything I had ever seen before.  Distributed Resilient Data (DRD) uses something known as erasure coding to provide extremely high levels of resilience. Now you may think that this would come with a considerable storage and performance overhead, but you’d be wrong.

The HYDRAstor provides 6 levels of protection (1 – 6), all with differing levels of protection and capacity overhead. With the default level 3 selected, NEC’s implementation of erasure coding splits each data chunk into 12 parts: 9 data and 3 parity. The use of erasure coding means that it only ever needs 9 parts to reconstruct a complete data chunk. So if a data chunk is spread over 12 disks in a single storage node, it can withstand 3 disk failures; if those 12 parts are spread over 12 storage nodes, then you can withstand 3 complete node failures.

This default level 3 protection requires a 25% capacity overhead, much like RAID 5. However, by surviving 3 disk failures it provides three times the failure tolerance of RAID 5 and 1.5 times that of RAID 6. If you want to go to the highest level of protection (level 6) then there is a 50% capacity overhead, as with RAID 1; however, you can withstand the failure of 6 disks or 6 nodes.
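As a rough illustration of that arithmetic (my own sketch, assuming each protection level simply means that many parity fragments out of 12 in total, which matches the level 3 and level 6 figures above):

```python
def drd_profile(level: int, total_fragments: int = 12) -> dict:
    """Overhead and failure tolerance for a DRD protection level,
    assuming level N = N parity fragments out of 12, so any N disk
    (or node) failures can be survived."""
    return {
        "data": total_fragments - level,
        "parity": level,
        "tolerates_failures": level,
        "capacity_overhead": f"{level / total_fragments:.0%}",
    }

print(drd_profile(3))  # 9 data + 3 parity, survives 3 failures, 25% overhead
print(drd_profile(6))  # 6 data + 6 parity, survives 6 failures, 50% overhead
```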

The following video describes Distributed Resilient Data™ (DRD) at the default level 3

 

High Performing

The demonstration NEC gave us was based on their lab setup of 20 accelerator nodes and 40 storage nodes. This was a 4 rack setup, which as you can see from the photo below is not a small setup. What it is, though, is a very high performing storage solution.

[Photo: the 4 rack, 20 accelerator node / 40 storage node lab setup]

NEC demonstrated a data copy that utilised a full 10GB per second of throughput, which worked out at about 540MB per second per front end accelerator node. The screenshot from the management GUI below shows the total throughput achieved.

The maximum HYDRAstor configuration consists of 11 racks and is capable of 25GB per second, or 90TB per hour. This works out at roughly 2PB in a 24 hour period, which is an astounding amount of data throughput; surely a level of throughput to deal with even the most demanding backup or archiving use case.
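Checking that back-of-the-envelope arithmetic (decimal units throughout):

```python
throughput_gb_per_sec = 25
tb_per_hour = throughput_gb_per_sec * 3600 / 1000   # 90 TB/hour
pb_per_day = tb_per_hour * 24 / 1000                # ~2.16 PB/day
print(tb_per_hour, round(pb_per_day, 2))            # 90.0 2.16
```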
 

[Screenshot: management GUI showing total throughput]

There were a few negative aspects that I picked up on during our visit, thankfully all ones I feel can be addressed by NEC over time.

User Interface

I felt the user interface was a little dated (see screenshot above); it served its basic purpose but wasn’t going to win any awards. It was a stark contrast when compared with the very nice and easy to use GUIs we saw from Nimble Storage and Compellent. That said, if the HYDRAstor is only being used for backup and archive storage and not primary storage, does it actually need to have the world’s best GUI? Possibly not.

Solution Size

The HYDRAstor came across as a large solution, though I’m not sure why. When I think about it, any storage solution that provides 10GB/sec throughput and 480TB of raw storage is likely to take up 4 racks, and in some instances probably a lot more. Maybe it was the sheer number of network interconnects; perhaps some consolidation with 10Gb Ethernet could assist in making the solution appear smaller. NEC could also look at shrinking down the server sizes, though probably only for the accelerator nodes, as the storage nodes need 12 x 1TB disks, so there’s not a lot of scope for size reduction there.

Marketing

A general consensus among delegates was: why have NEC marketing not been pushing this harder, and why had so many of us in the room not heard about it? I suppose that was one of the reasons we were there: to hear about it, discuss it and ultimately blog about it, as I’m doing now. There are some specific target markets that NEC maybe need to look at for this product, possibly using worldwide data retention regulations as a means of identifying potential markets and clients. More noise needs to be made by NEC about their efficient de-dupe integration with enterprise backup products such as CommVault Simpana, Symantec NetBackup, TSM and EMC NetWorker. More comments such as the one below wouldn’t hurt.

with the application aware de-duplication for CommVault we’ve optimized storage efficiency with a four times improvement in space reduction.
Pete Chiccino, Chief Information Officer, Bancorp Bank

EMEA availability

NEC told us that this product is not being actively pushed in the EMEA region. Currently the product is only available for purchase in North America and Japan. One of the points I made to NEC was that the HYDRAstor appeared to me to be a product that would have a lot of applications in the European marketplace, possibly more so in the UK. I made specific reference to FSA regulation changes where financial companies are now required to keep all electronic communications for up to 7 years. NEC’s HYDRAstor, with its high tolerance for failure, global de-duplication across all nodes and grid-like extensibility, is perfect for storing this kind of key critical compliance data. That is a very specific example; others are insurance companies who have longer retention requirements and museums digitising historical documents and books which have a “keep forever” retention requirement.

NEC contacted me via Twitter after the event to say that although the HYDRAstor is not on sale in EMEA, a company with a presence in the US will be able to explore purchasing it through NEC America.

Summary

I had no idea what to expect when we arrived at NEC’s offices; sure, I knew who they were, but I had no idea what they were doing in the storage space. Gideon Senderov at NEC certainly saw to it that we had all the information needed to form an opinion; his knowledge of his product was simply outstanding.

NEC HYDRAstor is a product that is quite unique. It’s easy to scale up and scale out, it has high levels of redundancy without the normal capacity penalty and, of course, exceptional levels of performance. It strikes me as a product that any IT professional responsible for backup, archiving and long term data retention would be very, very interested in.

Note: Tech Field Day is a sponsored event. I receive no direct compensation and take personal leave to attend; however, all event expenses are paid by the sponsors via Gestalt IT Media LLC. The views and content expressed here are my own and are in no way influenced by the sponsors of this event.

Events, Gestalt-IT, Storage

Gestalt IT Seattle Tech Field Day – Day 2 Summary

July 16th, 2010

It’s now been a couple of days since the second day of the Gestalt IT Tech Field Day; I’m actually taking the opportunity to write this on the plane on the way back from Seattle. So once again I thought I would do a summary post until I get the chance to write up a detailed post on each vendor.


Compellent were one of the main sponsors for the Seattle Tech Field Day and were responsible for us getting access to the Microsoft Campus. So a big thank you to Compellent for their support of Tech Field Day.

Compellent are a company I have had dealings with before; I looked at buying one of their storage devices back in 2008 and was very impressed by the product they had on offer at the time. This was a great chance for me to revisit Compellent two years on and see how things had changed.

Compellent in general still appears to be much the same product that I liked so much back in 2008. Their pooled storage model, software controlled RAID write down, space efficient snapshots and WAN optimised thin replication are all superb features. Their main differentiator back in 2008 was their ability to do automated storage tiering (Data Progression™), something that others in the industry are starting to catch up on (EMC FAST). Compellent’s Data Progression technology is one that many customers actively use with good results; I was slightly disappointed, though, to learn that their data movement engine only executes once every 24 hours and cannot be made more frequent. I’m not sure how that compares to EMC FAST, but it is something I’ll include in a more expansive post.

A feature I had heard of but didn’t quite understand previously was Compellent’s Live Volume. It’s another unique feature for Compellent, and one of my fellow delegates even described it as “EMC vPlex that you could actually afford”. Compellent implement the Live Volume feature at the software level, as opposed to a hardware based implementation like EMC vPlex. Compellent are able to present the same volume, with the same identity, in two different locations; they do this using the underlying WAN optimised asynchronous replication. One point of note was that this is not an active / active DR-like setup; this is a setup for use in a controlled maintenance scenario, such as SAN fabric maintenance or a DC power down test.

Compellent also took the opportunity to share some roadmap information. Highlights included the release of the 64 bit Series 40 controller based on the Intel Nehalem, an encrypted USB device for seeding replication, a move to smaller 2.5” drives and 256 bit full disk encryption, among others.

Although we were situated on Microsoft’s campus for a large part of Tech Field Day, we were never presented to by Microsoft, which was a shame. We did however get the chance to visit the Microsoft store, which is for employees only. It gave us all a chance to buy some discounted Microsoft software and souvenirs of our visit to Redmond, which we all took advantage of.


Tech Field Day delegates Kevin Houston, Stephen Foskett and Jason Boche using their iPhones and iPads in the heart of the Microsoft campus. Note Jason Boche using an iPad and wearing his VMware VCDX shirt, brilliant!


Our afternoon session was spent a short bus ride away from Microsoft at NEC America’s Seattle office. We were there to hear about NEC’s storage offering (I had no idea they even did storage) and more specifically the NEC HYDRAstor range. We had a very in-depth session on this fascinating product with Gideon Senderov, Director of Product Management for the HYDRAstor range.

NEC have taken an innovative approach with this product, one I was not expecting. They utilise full blown NEC servers to provide a two tier architecture made up of front end accelerator nodes and back end storage nodes. On top of this, they don’t use the traditional RAID model, instead using something known as erasure coding to provide improved data protection. I will deep-dive this particular data protection method in another article, but it was a very interesting and different approach to what I’m used to.

The HYDRAstor grid is marketed as “Storage for the next 100 years”, and with its grid architecture it’s reasonably easy to see how that statement could be realised. You can add additional nodes into the grid and it will automatically redistribute itself to take advantage of the capacity. You can also mark nodes for removal, the system evacuating the data to enable those nodes to be removed from the grid. This, combined with the ability to run old and new HYDRAstor nodes side by side, shows why it’s a good storage location for data with a very long term retention requirement.

It appeared to me that HYDRAstor was designed specifically as a location for the output of archive or backup data and not as a primary data storage solution. The reason I say this is that when we discussed in-line de-duplication, the product was already integrated with the major backup vendors (Symantec NetBackup, CommVault Simpana, Tivoli Storage Manager and EMC NetWorker). NEC are being very clever here, stripping out metadata from these backup vendors to improve the level of de-dupe that can be achieved when storing backup data.

I will revisit the HYDRAstor; once I have had a chance to go over my notes I fully intend to dedicate a full article to it, as I was very impressed.


Rodney Haywood and Gideon Senderov white boarding the configuration of the NEC HYDRAstor

Note: Tech Field Day is a sponsored event. I receive no direct compensation and take personal leave to attend; however, all event expenses are paid by the sponsors via Gestalt IT Media LLC. The views and content expressed here are my own and are in no way influenced by the sponsors of this event.

Events, Gestalt-IT, Storage

Gestalt IT Seattle Tech Field Day – Day 1 Summary

July 15th, 2010

So that is Day 1 of the Seattle Tech Field Day out of the way, and what a day it has been. We’ve been out to Microsoft’s Redmond HQ, or “the temple” as John Obeto calls it. We saw some new products from Veeam and were privileged enough to be the first port of call for a new and very exciting storage start-up, Nimble Storage.

There has been a lot of information flowing about today, an awful lot. My plan is to spend some time assimilating all the information and doing more detailed posts on everyone we’ve seen, so for now I think a summary will suffice.


Veeam are a company that needs very little introduction.  They’ve not been around long (3 years to be exact) but they are a well known and well respected brand in the virtualisation space.  Today Veeam were announcing a new product / concept that they have at the development stage, one that got delegates quite excited.

Veeam were introducing vPower, a new product made up of 3 products: SureBackup, Instant Restore and CDP (a much debated point). What stood out most for Tech Field Day delegates was some of the Instant Restore functionality; the ability to run your VM direct from the backup image was well received. My personal thought at the time was: who wouldn’t want a mechanism available to test that your backups actually work? The added bonus was that Veeam also provide network isolation and an almost Lab Manager-like ability to create groups of machines that should be recovered together. The idea of verifying your backups by running them from the backup storage was one thing; Veeam had, however, written their own NFS server in order to do this. This means that technically, in the event of an outage, you can run your machine directly from the Veeam backup server’s NFS datastore. It isn’t going to be fast, but it’s running, which is the main thing you should be concerned about. It was all good stuff, and the general consensus was that it was a step in the right direction and quite a shift in the VM backup space.


Our surprise for the day was a new tech start-up who were launching themselves and their product for the very first time. Nimble Storage is a new company consisting of a number of high pedigree employees with proven track records at companies such as NetApp and Data Domain. This is further backed up with an experienced board of directors, top venture capital investment and, last but not least, a pretty good product at a good price point.

Without going into too much detail, Nimble Storage have produced a new array that probably reshapes the way people think about primary and backup storage, as well as the use of flash storage within an array. Right at the outset they stated that their aim was to introduce flash storage to the mid-size enterprise while also utilising a lot of the features being pioneered by other vendors. Nimble’s approach is different in that it provides a converged appliance, one that does primary and secondary storage within the same device while also introducing flash caching to provide high performance. Through the use of inline compression, flash cache, sequential write down to disk, efficient snapshots and replication, as well as zero space cloning, Nimble is packing a lot into their product. At the top end you are paying a list price of $99,000 + $6,000 annual maintenance. For this you are looking at 18TB of primary storage (not including flash cache) plus 15,000 IOPS from a SATA / flash mix. They were also talking about 216TB of backup capacity within that same device, driven primarily by their use of space efficient snapshots. I have a lot of notes on this particular presentation and will be expanding upon this in the coming weeks.
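Some quick arithmetic on that list price (my own rough numbers, ignoring the flash cache and the snapshot-driven backup capacity):

```python
list_price = 99_000          # USD, top-end model
annual_maintenance = 6_000   # USD per year on top
primary_tb = 18
rated_iops = 15_000

print(f"${list_price / primary_tb:,.0f} per TB of primary storage")  # $5,500 per TB
print(f"${list_price / rated_iops:.2f} per IOPS")                    # $6.60 per IOPS
```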


Now F5 was a company I was really interested to see, primarily because I wasn’t entirely sure what they offered. Sure, I knew they were into networking, but even then, what did they do in the networking space? I had no idea. We were treated to 4 different presentations that covered the following.

  • WAN optimised geographical vMotion
  • Coding of iRules and iControl for the BIG-IP appliances
  • Intelligent client VPN connectivity via BIG-IP’s Edge gateway module.
  • Data Management and Routing using F5’s ARX appliance, file system virtualisation.

 

All were very impressive, and I will definitely be looking to dig a little deeper and examine in full some of the technology presented and discussed. I was particularly impressed with F5’s vision for data management / file level virtualisation, as they seem to be one of the only companies in this space that I am aware of. This vision was demonstrated to us as a mix of onsite primary tier 1 storage and offsite cloud storage. The ARX appliance would sit as a director presenting a unified view of the storage to the end user, while internally keeping a routing table of up to a billion files. This allows IT departments to place files across multiple types of storage, whether that be differing internal storage devices or storage in the cloud. The concept sits well with the current cloud strategies being developed by most major IT companies; what’s surprising is that nobody else is doing it. There is a lot more to be said about F5; I plan to delve a little deeper and write some more.

Summary

It’s been a very busy day, one however that has been exceptionally rewarding. Tech Field Day has been everything I expected it to be so far; there has been a wealth of information shared and a lot of feedback given. The biggest win for me, though, is getting the time to learn more about vendors and their product offerings, that and hearing the comments of my fellow delegates. There is a good mix of intelligent people from varied backgrounds, and that has only added to the experience so far.

We ended the night with a tour of the Boeing Museum of Flight and a couple of drinks with dinner. It’s now midnight, and after just 6 hours of sleep last night and with a busy schedule ahead for tomorrow, I am going to call it a night there.

Note: Tech Field Day is a sponsored event. I receive no direct compensation and take personal leave to attend; however, all event expenses are paid by the sponsors via Gestalt IT Media LLC. The views and content expressed here are my own and are in no way influenced by the sponsors of this event.

Events, General, Gestalt-IT, Tech Field Day

Windows Virtual Desktop Access Licensing – What is it?

June 24th, 2010

I try and avoid licensing at all costs; it’s a horrible subject and one that strikes fear into many. When you add virtualisation into the mix it tends to get a little more complicated, and you often find that the rules change on a reasonably regular basis. I was involved in a discussion today about Citrix XenDesktop, and an interesting point came up when discussing licensing virtual PCs. Someone mentioned something called the Microsoft VDA; I hadn’t a clue what they were talking about, so I did a little digging around to find out more.

In summary, this is what I found, and it’s not pretty reading. As of the 1st of July 2010, Microsoft is changing the way it licenses the Windows OS in VDI environments. The following changes will take place:

Windows® Virtual Enterprise Centralized Desktop (Windows VECD) and Windows VECD for Software Assurance (SA) will no longer appear on the price list.

Virtual desktop access rights will become a Windows Client Software Assurance benefit. Customers who intend on using PCs covered under SA will now be able to access their Virtual Desktop Infrastructure (VDI) desktops at no additional charge.

Customers who want to use devices such as thin clients that do not qualify for Windows Client SA would need to license those devices with a new licence called Windows Virtual Desktop Access (Windows VDA) to be able to access a Windows VDI desktop. Windows VDA is also applicable to third party devices, such as contractor or employee-owned PCs.

What does it all mean?

In its simplest terms, you don’t license the Windows virtual machine itself; you instead license the endpoint it’s being accessed from. To further break this down, there are two distinct endpoint categories to consider.

1. The endpoint is a Windows OS covered by Software Assurance (SA)

2. The endpoint is a non-Windows device, or a Windows device without SA

In the first category you are covered to access a Windows virtual machine, as Virtual Desktop Access (VDA) is included as a Software Assurance benefit. In the second category, however, you need to purchase a VDA subscription for each end point device. Unfortunately this is not a one-off purchase either; it is a $100 per device per year subscription cost.

As an example, say you have a salesperson who uses a company laptop and a company smartphone to access their VDI virtual machine. You would need to have the laptop installed with a Software Assured copy of Windows and buy a VDA subscription for the smartphone. Alternatively, if you have a non-SA copy of Windows on the laptop, you need two VDA subscription licences to cover both devices. This latter example would obviously be the same if the laptop was Mac OS or Linux based.
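
To make the counting rules concrete, here is a trivial back-of-the-envelope sketch in Python. The $100-per-device-per-year figure is the subscription cost mentioned above; the helper function and device names are purely illustrative and have nothing to do with any Microsoft tooling.

    # Illustrative only: count VDA-billable endpoints per the rules above.
    # Endpoints running SA-covered Windows are covered; everything else
    # (non-SA Windows, Mac, Linux, smartphones) needs a VDA subscription.
    VDA_COST_PER_DEVICE_PER_YEAR = 100  # USD, as quoted above

    def annual_vda_cost(endpoints):
        """endpoints: list of (device_name, has_windows_sa) tuples."""
        billable = [name for name, has_sa in endpoints if not has_sa]
        return billable, len(billable) * VDA_COST_PER_DEVICE_PER_YEAR

    # The salesperson example: an SA-covered laptop plus a smartphone.
    devices = [("company laptop (Windows + SA)", True),
               ("company smartphone", False)]
    print(annual_vda_cost(devices))   # (['company smartphone'], 100)

    # Swap in a non-SA (or Mac/Linux) laptop and both devices need VDA.
    devices[0] = ("company laptop (no SA)", False)
    print(annual_vda_cost(devices))   # (both devices, 200)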

There is some good news, though, in that Microsoft has something called extended roaming rights with the Windows VDA licence. In short, the primary user of a VDA-licensed device can access their VDI desktop from any device that is not owned by the user’s company. Examples would be a user’s home PC, an airport kiosk or a hotel business centre.

There is a lot to take in with licensing, especially in the VDI space. I suggest everyone running or planning to deploy VDI takes a look at the recent changes and considers how they affect existing or planned deployments. Some people will see this as Microsoft stifling the growth of Virtual Desktop Infrastructure; others will argue that it may actually act as an enabler. In truth I’m just not sure: I’m still digesting what it all means and playing through the various scenarios and combinations of VDI access. On the surface, though, I can see it hindering rather than helping this growing virtualisation sector.

For additional information I’d recommend checking out the following Microsoft FAQ article, and for those of you who are Gartner customers, the linked article below breaks it all down quite nicely into simple terms.

Microsoft VDI suites & Windows VDA Frequently Asked Questions PDF

Gartner – Q&A for understanding Microsoft Licensing Requirements before deploying HVDs

General, Gestalt-IT, Microsoft

SNAPVMX – View your Snapshots at VMFS/virtual disk level

June 9th, 2010

Following a recent implementation of VMware Data Recovery (VDR) we ran into a few issues. We eventually had to kill the virtual appliances, and as a result were left with a couple of virtual machines with outstanding snapshots. These snapshots had been taken by VDR and so could not be viewed or deleted using the snapshot manager.

We raised a call with VMware support and they started a WebEx session to look at the issue. I love watching VMware support personnel operating at the service console level, as I always pick up a command or two that I didn’t know before. On this occasion the support engineer was using something called SnapVMX to view the hierarchy of snapshots at the virtual disk level.

At first I thought this was an inbuilt VMware command, but it turns out it’s not. It was actually a little piece of code written by Ruben Garcia. What does it do? Well, the following extract from the download pages explains it pretty well.

  • Displays snapshots structure and size of snapshots for every disk on that VM
  • Calculates free space needed to commit snapshots for the worst case scenario
  • Checks the CID chain of the analysed files and displays a warning if broken.

I’ve included a little demo screenshot to show what it can do. On the left-hand side is a screenshot from Snapshot Manager within vCenter; on the right-hand side is the same VM being viewed with SnapVMX in the service console. Put the two together and you get a better idea of the snapshot disk hierarchy and the size of each snapshot.

[Screenshots: Snapshot Manager in vCenter (left) and SnapVMX output (right)]
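
Incidentally, if you want a feel for what the CID chain check involves, you can walk the chain by hand: each virtual disk has a small plain-text VMDK descriptor file containing CID, parentCID and parentFileNameHint entries, and a child’s parentCID must match its parent’s CID. The sketch below is my own rough illustration of that walk in Python; it is not SnapVMX’s code.

    # My own rough illustration of a CID-chain walk, not SnapVMX itself.
    # VMDK descriptor files contain lines such as:
    #   CID=fffffffe
    #   parentCID=ffffffff              (ffffffff means "no parent")
    #   parentFileNameHint="myvm.vmdk"
    import os
    import re

    def read_descriptor(path):
        """Pull the CID / parentCID / parentFileNameHint fields out of a descriptor."""
        fields = {}
        with open(path) as f:
            for line in f:
                m = re.match(r'(CID|parentCID|parentFileNameHint)\s*=\s*"?([^"\s]+)"?',
                             line.strip())
                if m:
                    fields[m.group(1)] = m.group(2)
        return fields

    def walk_chain(vmdk_path):
        """Follow parentFileNameHint links from the newest delta back to the base disk."""
        while True:
            d = read_descriptor(vmdk_path)
            print(vmdk_path, "CID=%s parentCID=%s" % (d.get("CID"), d.get("parentCID")))
            hint = d.get("parentFileNameHint")
            if not hint or d.get("parentCID") == "ffffffff":
                break  # reached the base disk
            parent = os.path.join(os.path.dirname(vmdk_path), hint)
            if read_descriptor(parent).get("CID") != d.get("parentCID"):
                print("WARNING: broken CID chain between %s and %s" % (vmdk_path, parent))
                break
            vmdk_path = parent

    # e.g. walk_chain("/vmfs/volumes/datastore1/myvm/myvm-000002.vmdk")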

The other interesting feature is that it tells you what space is required to commit the snapshots. So, for example, say you had taken 5 snapshots of a machine as it was being built and configured, and that the overall effect of those 5 snapshots was to fill up your VMFS datastore completely. Chances are you’re not going to be able to commit the snapshots within the current VMFS datastore. SnapVMX will tell you, in the worst-case scenario, how much space would be required to commit the snapshots. Armed with this information you could cold migrate to another datastore with at least that amount of free space, which would then allow you to commit the snapshots. The screenshot below isn’t great, but it was the best I could do given the length of the statement.

[Screenshot: SnapVMX reporting the space required to commit the snapshots]
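
To make the idea concrete, here is a deliberately naive back-of-the-envelope sketch. I don’t know the exact formula SnapVMX uses, so this is not its calculation; it simply assumes, pessimistically, that every delta’s data has to be merged with no overlap between snapshots, and compares that against the datastore’s free space.

    # A naive illustration of the idea, NOT SnapVMX's actual formula:
    # pessimistically assume every delta's data must be merged in full
    # (no overlapping blocks between snapshots) before the chain is freed.
    def worst_case_commit_gb(delta_sizes_gb):
        return sum(delta_sizes_gb)

    deltas = [4, 6, 2, 3, 5]   # five snapshots taken during a build
    free_gb = 10               # space left on the VMFS datastore
    needed = worst_case_commit_gb(deltas)
    print("worst case needed: %d GB" % needed)   # 20 GB
    if needed > free_gb:
        print("not enough room - cold migrate to a datastore with"
              " >= %d GB free before committing" % needed)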

For the download and full documentation on how to use this piece of code, head over to the following website. It’s worth a look if you’re a big user of snapshots.

http://geosub.es/vmutils/SnapVMX.Documentation/SnapVMX.Documentation.html

While searching for a link to Ruben Garcia to put in this article, I found that he has a blog, and within it a superb article on troubleshooting VM snapshot problems, which I will definitely be keeping a link to and suggest you check out. Truly excellent stuff, Ruben!

General, Gestalt-IT, VMware

Storage I/O control – SIOC – VMware DRS for Storage

May 10th, 2010

Following VMworld in 2009, a number of articles were written about a tech preview session on IO DRS – Providing Performance Isolation to VMs in Shared Storage Environments. I personally thought this particular technology was a long way off, potentially something we would see in ESX 4.5. However, I recently read a couple of articles that indicate it might not be as far away as first thought.

I initially came across an article by VMware’s Scott Drummond in my RSS feeds. For those that don’t follow Scott, he has his own blog called the Pivot Point, which I have found to be an invaluable source of VMware performance-related content. The next clue was an article detailing an ESX 4.1 feature leak; I’m sure you can guess what the very first feature listed was. It was indeed Storage I/O Control.

Most people will be aware of VMware DRS and its use in measuring and reacting to CPU and memory contention. In essence SIOC is the same feature but for I/O, using I/O latency as the measure and device queue management as the contention control. In the same way as the current DRS feature for memory and CPU, I/O resource allocation will be controlled through share values assigned to each VM.

[Screenshot: disk share values in the virtual machine settings]

I hadn’t realised this until now, but you can already control share values for VM disk I/O within the settings of a virtual machine (shown above). The main problem is that this is server-centric, as you can see from the statement below from the VI3 documentation.

Shares is a value that represents the relative metric for controlling disk bandwidth to all virtual machines. The values Low, Normal, High, and Custom are compared to the sum of all shares of all virtual machines on the server and the service console.

Two main problems exist with this current server-centric approach.

A) In a cluster, 5 hosts could be accessing VMs on a single VMFS volume. There may be no contention at the host level but lots of contention at the VMFS level, and that contention would not be controlled by the share values assigned to the VMs.

B) There isn’t a single-pane-of-glass view of how disk shares have been allocated across a host; it appears to be manageable only on a per-VM basis, which makes things a little trickier to manage.

Storage I/O Control (SIOC) deals with the server-centric issue by introducing I/O latency monitoring at the VMFS volume level. SIOC reacts when a VMFS volume’s latency crosses a pre-defined level; at that point access to the host queue is throttled based on the share values assigned to the VMs. This prevents a single VM getting an unfair share of queue resources at the volume level, as shown in the before and after diagrams Scott posted in his article.

[Diagrams: device queues before and after SIOC throttling]
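
To make the mechanism concrete, here is a toy sketch of my own in Python; it is not VMware’s implementation, and the 30 ms threshold and queue depth of 32 are illustrative assumptions. The idea is simply that once measured datastore latency crosses the threshold, the device queue slots are carved up across VMs in proportion to their share values.

    # My own toy sketch of the idea behind SIOC, not VMware's implementation:
    # when measured datastore latency crosses a threshold, carve the device
    # queue depth across VMs in proportion to their disk share values.
    LATENCY_THRESHOLD_MS = 30   # illustrative assumption
    QUEUE_DEPTH = 32            # illustrative assumption

    def allocate_queue_slots(vm_shares, latency_ms):
        if latency_ms <= LATENCY_THRESHOLD_MS:
            # no contention: every VM can use the full queue
            return {vm: QUEUE_DEPTH for vm in vm_shares}
        total = sum(vm_shares.values())
        # contention: throttle each VM to its proportional slice (min 1 slot)
        return {vm: max(1, QUEUE_DEPTH * s // total) for vm, s in vm_shares.items()}

    shares = {"prod-db": 2000, "prod-web": 1000, "test-box": 500}
    print(allocate_queue_slots(shares, latency_ms=12))  # uncontended: all get 32
    print(allocate_queue_slots(shares, latency_ms=45))  # throttled: 18 / 9 / 4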

The solution to the single-pane-of-glass issue is pure speculation on my part. I’d personally be hoping that VMware add a disk tab within the resource allocation views you find on clusters and resource groups. This would allow you to easily set I/O shares for tiered resource groups, e.g. production, test and development, and to further control I/O within the resource groups at the virtual machine level.

Obviously none of the above is a silver bullet! You still need a storage system with a fit-for-purpose design at the back end to service your workloads. It’s also worth remembering that shares introduce another level of complexity into your environment: if share values are not assigned properly, you could of course end up with performance problems caused by the very thing meant to prevent them.

Storage I/O Control looks like a powerful tool for VMware administrators. In my own environment I have a cluster that is a mix of production and testing workloads. I have them ring-fenced with resource groups for memory and CPU, but I always have a nagging doubt about HBA queue contention. This is one of the reasons I wanted to get EMC PowerPath/VE implemented, i.e. to use both HBAs and all available paths to increase the total bandwidth. Implementing SIOC when it arrives will give me peace of mind that production workloads will always win out when I/O contention occurs. I look forward to the possible debut of SIOC in ESX 4.1 when it’s released.

**UPDATE**

Duncan Epping over at Yellow Bricks has located a demo video of SIOC in action. Although a very basic demonstration, it gives you an idea of the additional control SIOC will bring.

Gestalt-IT, New Products, VMware