Sharing vCloud Director’s snapshot functionality. This slide deck is based on vCAT 3.1 and is part of my preparation for VCAP-CID. Please check the version and date on the deck; they are there to show whether anything has changed and when those changes were made.
In the previous post I discussed various sizing approaches and how they are influenced by allocation models. Let me continue from there. We have already decided on the allocation models and how much each Organization vDC will use and in what way, so let’s move further.
The previous post missed one very important aspect: the relation between the catalog and the VM offering. From the table above it is not clear how many VMs, and of which sizes, are available to a particular BU. That leaves a lot of the calculation to the consumer. To address this, you can tell them how many small, medium and large VMs they can create from their allocation, and then help them fix the VM templates accordingly. There is a screen (shown below) in vCloud Director which shows this status, but only the vCloud Administrator (the only user who can create an Organization vDC) can see it. It is of little use to the Organization Administrator.
So how do we help an organization decide which VMs and sizes it can offer, and get the most out of its capacity? Let’s try to answer the question below.
“How many VMs can we fit in the HR BU, and of which sizes?”
I have created a tiny Excel sheet with simple formulas. The part below has the Virtual Machine catalog defined. The goal here is to fit VMs within either 256 GB RAM or 256 GHz. You can change the number of VMs, or change the VM catalog, until you consume as much as possible of both compute resources, i.e. CPU and memory.
In the figure above I have populated the Virtual Machine catalog using the standard T-shirt size naming convention. It is a very simple Excel sheet which picks up the vCPU and memory defined in the Virtual Machine Catalog table.
For example, as highlighted in the table above, the M size has 2 vCPU and 4 GB RAM configured, so 5 VMs need 10 vCPU and 20 GB RAM. The other VM sizes follow the same formula. In the end you get 44 VMs in total, with 128 vCPU and 256 GB RAM.
To fit the compute (256 GHz or 256 GB RAM) you can change the VM catalog sizes and continue to do so until either of the limits is reached. In the table above we have reached the 256 GB RAM limit, but you can just as well try to reach the CPU limit by manipulating the VM sizes in the catalog.
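The spreadsheet logic described above can be sketched in a few lines of Python. This is only an illustration of the arithmetic, not the actual sheet: the M size (2 vCPU, 4 GB RAM) and the 256 GHz / 256 GB limits come from this post, while the S and L sizes, the 1 GHz vCPU speed and the function names `totals` and `fits` are my assumptions for the example.

```python
# Sketch of the spreadsheet logic: total vCPU and RAM for a chosen VM mix.
# Only the M size (2 vCPU, 4 GB RAM) and the 256 GHz / 256 GB limits come
# from the post; the S and L sizes and the 1 GHz vCPU speed are assumptions.

CATALOG = {
    "S": {"vcpu": 1, "ram_gb": 2},  # hypothetical size
    "M": {"vcpu": 2, "ram_gb": 4},  # size quoted in the post
    "L": {"vcpu": 4, "ram_gb": 8},  # hypothetical size
}


def totals(vm_counts, catalog=CATALOG):
    """Return total VM count, vCPU and RAM (GB) for a {size: count} mix."""
    return {
        "vms": sum(vm_counts.values()),
        "vcpu": sum(catalog[size]["vcpu"] * n for size, n in vm_counts.items()),
        "ram_gb": sum(catalog[size]["ram_gb"] * n for size, n in vm_counts.items()),
    }


def fits(vm_counts, vcpu_speed_ghz=1.0, cpu_limit_ghz=256, ram_limit_gb=256,
         catalog=CATALOG):
    """True if the mix stays within both the CPU (GHz) and RAM (GB) limits."""
    t = totals(vm_counts, catalog)
    return (t["vcpu"] * vcpu_speed_ghz <= cpu_limit_ghz
            and t["ram_gb"] <= ram_limit_gb)


if __name__ == "__main__":
    print(totals({"M": 5}))  # the worked example: 10 vCPU and 20 GB RAM
    print(fits({"M": 64}))   # 64 x 4 GB = 256 GB, exactly at the RAM limit
```

Changing the counts or the catalog sizes and calling `fits` again mirrors the trial-and-error you would otherwise do in the sheet.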
I have uploaded the Excel sheet to my Google Drive; it is available on request. I’m doing it this way just to know how popular it is getting.
Once you are done with these permutations and combinations, you will come up with some standard sizes. Just ensure you create catalogs of only those sizes. This will allow you to control and monitor the capacity of the organization.
So the key takeaway is that you can definitely create VMs of your own sizes to make the most of the capacity you are buying. I have provided an Excel-sheet-based formula which will make things a bit easier.
Here are my notes on vCenter Chargeback Manager, based on vCAT 3.1, in the form of a PPT.
I thought a PPT would be a good way to share them with you all.
Hope it helps. If you need a copy, just let me know via a comment.
Capacity planning for vCloud is extremely important. It is quite complicated, as you will see if you follow me through this post, especially if you are moving from a vSphere stack. Let me explain how it differs from vSphere capacity planning, and then I aim to explain ways to make it simple. Capacity planning is the basic foundation of the various service offerings and the QoS you provide as a service provider.
Let’s look at the vSphere side first. Sizing in vSphere, at a broad level, can be put as
We pretty much get a good consolidation ratio and over-subscription as well. This works 100%: here you don’t have to worry about how many VMs will be powered on, as configured CPU and memory are not always what capacity planning is based on. What counts is active memory, peaks and overall utilization. vSphere resource pools are hardly ever used for setting CPU and memory limits and reservations; they are used for prioritizing workloads. This is what changes in vCloud. When an organization vDC is created, a resource pool gets created behind the scenes. It gets reservations and limits set via the allocation model, which in turn limits the total capacity available for new VMs and resource pools.
The resource allocation model is selected while creating the organization vDC. It cannot be changed once selected. So the selection of the resource allocation model has a significant impact on vCloud sizing, in terms of what kind of VM sizing we can offer.
OK, so when you create an organization vDC, a resource pool is created behind the scenes in the vSphere cluster. This resource pool’s CPU and memory reservations and limits are set based on your allocation model. The vSphere cluster where these resource pools are created is associated with the Provider vDC. If you have been with me so far, you will have understood that it is the allocation model which influences how the capacity of the Provider vDC is distributed. It is like a pie: slices are cut out of the Provider vDC’s resources as you keep creating organization vDCs, and how each slice is consumed is determined by the allocation model. So the selection of the allocation model has a significant impact on the vCloud offering.
There are two ways you can size vCloud compute resources.
Case:01 Standardized Offering
In this case we decide what the service offering for our customers will be and carve out Organization vDCs accordingly. For the sake of discussion we choose only two allocation models, to keep things simple.
We select the Pay-As-You-Go allocation model and plan to reserve 50% of the Provider vDC capacity for it. Let me remind you that we are only planning. We are not yet carving this out into organizations and organization vDCs, because an organization cannot be created until we know who the customer is. We will do that only when we have a client, and we will provision an organization vDC with the PAYG model only if the client asks for it. To put it simply, we can offer services to the client based on what we have planned: “We have 50% of the capacity reserved for the PAYG model. Please tell us if you have use cases for PAYG and, if yes, how much you wish to buy from it.” We can offer a standard offering influenced by the two PAYG parameters explained in the previous post
The CPU and memory quota parameters then determine what percentage of that 50% the customer needs.
You can do a similar thing for the reservation pool model. It is a complete block of resources (CPU, memory) dedicated to the customer. Here the customer has full freedom in how to carve up the block.
Case:02 Customer’s custom requirement
A customer comes to you and requests that 50% of the resources be reserved for the PAYG model, as they have a lot of transient workloads. The remaining 50% should be allocated to dedicated business units as mentioned below
Marketing (25 %)
The customer has the freedom to use these resources the way they want. They can internally standardize the VM offering in terms of CPU and speed. This standardization can be enforced by using catalogs. There is a relation between allocation model, catalog and VM sizing, which I explain in the next post.
Below is a block-level view of how the organization would look based on the customer’s requirement
The Provider vDC is divided into 4 organization vDCs as per the customer’s requirement
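The split in Case 02 is simple percentage arithmetic, which can be sketched as below. This is only an illustration: the Provider vDC capacity (1000 GHz, 1024 GB) and the function name `split_capacity` are assumptions of mine, and only the two shares stated in this post (PAYG 50%, Marketing 25%) are used, with the rest reported as unassigned.

```python
# Sketch: carve a Provider vDC into organization vDCs by percentage share.
# The capacity figures below are hypothetical; only the PAYG (50%) and
# Marketing (25%) shares are taken from the post.

def split_capacity(total_cpu_ghz, total_mem_gb, shares_pct):
    """Return ({name: {"cpu_ghz", "mem_gb"}}, unassigned_pct)."""
    assigned = sum(shares_pct.values())
    if assigned > 100:
        raise ValueError("shares add up to more than 100% of the Provider vDC")
    plan = {
        name: {
            "cpu_ghz": total_cpu_ghz * pct / 100,
            "mem_gb": total_mem_gb * pct / 100,
        }
        for name, pct in shares_pct.items()
    }
    return plan, 100 - assigned


if __name__ == "__main__":
    plan, unassigned = split_capacity(1000, 1024, {"PAYG": 50, "Marketing": 25})
    for name, res in plan.items():
        print(name, res)
    print("unassigned %:", unassigned)
```

Keeping the unassigned percentage visible makes it obvious how much of the Provider vDC is still left for the remaining business units.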
The idea here is that you can split a Provider vDC into organization vDCs.
Capacity planning is influenced by the allocation model
The allocation model in turn influences VM sizing
It is strongly recommended to plan the allocation model per Provider vDC. It will give good insight into how the resources will be consumed and how much will be available, and will also help you tackle over-subscription
If a Provider vDC is going to have all three allocation models, it will complicate how resources are used and when capacity needs to be replenished
In my opinion, plan in advance; do not simply start filling a Provider vDC with organization vDCs of any allocation model
There are various high availability options available, but which one is the best fit for your application is a decision we as architects always have to make.
To make this decision, you must understand the business requirement.
Case:01 Zero Downtime
It means you cannot take any downtime during
OS Upgrade and maintenance
DB upgrade and maintenance
Application Update and maintenance
So we need to design for failure at the OS, DB and application level, and in the virtualization environment even for ESXi host-level failure
vSphere HA will protect against ESXi host failure
OS and application-level failures will be protected against using in-guest clustering, e.g. MSCS or Veritas Cluster, or FT (only for applications which can be scaled horizontally on a single vCPU)
Protection against application-level failure alone can be achieved using vSphere HA application protection, introduced in 5.5.
You can also use Symantec Application HA. You can find the list of supported applications here.
Note: In some environments, the operations team may not want automatic restart of the application in the event of only an application error. Instead, immediate notification and manual intervention may be preferred to determine the root cause of the problem.
Scenario: 01 Suppose a vSphere HA event is triggered by an ESXi host failure, which in turn takes down the OS. MSCS/Veritas cluster detects that the OS has failed and moves the application to the other node. During this failover services won’t be available. The time taken is generally within seconds, no data is lost and users do not experience any appreciable outage.
Scenario: 02 Suppose the OS inside the VM fails. MSCS/Veritas detects it and fails the application over to another node. During this failover services won’t be available. The time taken is generally within minutes, no data is lost and users do not experience any appreciable outage, but it is again clearly governed by the failover time. So calling it zero downtime is not the right term.
Scenario: 03 Suppose the application service stops. MSCS/Veritas detects it and tries to restart it; if the restart fails, it moves the application to another node. During this failover services won’t be available. The time taken is generally within minutes, no data is lost and users do not experience any appreciable outage, but it is again clearly governed by the failover time. So calling it zero downtime is not the right term.
Scenario: 04 Suppose the OS or application needs a maintenance window. Simply fail the application over to another node using MSCS/Veritas.
In scenarios 02, 03 and 04 discussed above, the downtime is the time needed to fail over services, which can vary from a few seconds to a few minutes. So here complete protection is provided at the hardware, OS and application levels. If any of the layers fails, it won’t impact end users. The biggest reason people on vSphere prefer an in-guest agent is to get rolling upgrades. We will discuss this in more detail below.
Hosting Business Critical Application in Cloud
In-guest clustering is very complicated to configure. This complexity increases further when you wish to host such an application inside vCloud Director.
As of 5.1 (I haven’t seen anything for 5.5 yet)
There is no support for clustering inside vCloud Director
There is no support for RDMs in vCloud Director
So you can configure a cluster using vSphere, but the moment vCloud Director comes into the picture we face technical limitations. Please note that in vCloud Director a VM is created via the vCloud portal, not via vCenter
Update: One of the vCloud Director experts contacted me and explained why vCD doesn’t support RDMs and why there is no plan for it. Bringing RDMs into the cloud breaks the principle of portability in case a customer wishes to move workloads between clouds.
Business-critical applications which use Oracle bring another challenge with them. Oracle’s licensing policy is most inflexible. Hosting an Oracle database inside the cloud means dedicating hosts to Oracle, which is simply not going to deliver economies of scale. We can use VM-Host affinity rules to do so, but this approach is neither explicitly accepted nor denied by Oracle. You need to read a lot into the license agreement, as mentioned by Michael Webster here
Understanding Oracle Certification, Support and Licensing for VMware Environments white paper published by VMware
Case of In-Guest Clustering (Why?)
Rolling upgrades are the only use case for which I recommend an in-guest cluster agent. During a rolling upgrade the application remains online, i.e. from the application’s perspective there is zero downtime. I always ask my customers: don’t we have a scheduled maintenance window? If the answer is yes, then in 9 out of 10 cases I have not recommended in-guest clustering, as all planned upgrades and changes to the OS and database can be done during this window.
Over and above this, the following points further strengthen my case against in-guest clustering
1. We can use snapshot technology when upgrading the OS or database, which gives us a rollback point.
2. vSphere 5.0 has improved HA functionality which smartly detects whether a host is isolated or has gone down. At most, the VM comes up within 15 minutes (along with its applications, what I refer to as “Ready to Serve”) when an HA event occurs. So just for 15 minutes of downtime (an extremely conservative estimate, refer here), I don’t want the operations team to carry the overhead of configuring in-guest clustering and the complexity it brings on a virtualized platform.
3. How many times in a year has the application failed such that it needs monitoring? If the failure rate is close to zero, then again this makes the case against in-guest clustering.
The final design choice will be ruled by how much downtime a business is ready to tolerate, and the cost they are willing to invest in the extra resources and skills to install and operate software that provides application monitoring. It is a trade-off.
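To put the 15-minute figure into perspective, the yearly downtime and availability can be worked out with a small sketch. The 15 minutes per HA event is the conservative estimate from this post; the number of HA events per year is purely an input you have to assume for your own environment, and the function names are mine.

```python
# Sketch: yearly downtime and availability if every HA event costs
# "restart_minutes" of downtime. 15 minutes is the conservative figure
# from the post; the number of events per year is an assumed input.

MINUTES_PER_YEAR = 365 * 24 * 60  # 525,600 minutes, ignoring leap years


def downtime_minutes(ha_events_per_year, restart_minutes=15):
    """Total minutes of downtime per year caused by HA restarts."""
    return ha_events_per_year * restart_minutes


def availability_pct(ha_events_per_year, restart_minutes=15):
    """Yearly availability in percent, counting only HA restart downtime."""
    down = downtime_minutes(ha_events_per_year, restart_minutes)
    return 100.0 * (1 - down / MINUTES_PER_YEAR)


if __name__ == "__main__":
    for events in (1, 2, 4):  # hypothetical event counts
        print(events, downtime_minutes(events), round(availability_pct(events), 4))
```

Comparing the resulting number with what the business says it can tolerate is a quick way to frame the trade-off before committing to in-guest clustering.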
This is the last allocation model we will be discussing in this post. The previous two resource allocation models discussed were Pay-As-You-Go and the Allocation Pool model. The Reservation Pool model is perhaps the simplest to understand and to implement. That said, from a resource allocation point of view it is also very costly. Reservation Pool, as the name says, reserves resources. These resources are reserved even if the VMs are powered off.
Reservations in this model are guaranteed and are set to 100%
Let’s see what happens at resource pool level. An organization vDC is created with the following values
CPU allocation = 4 GHz
Memory allocation = 2 GB
These are the only two settings you need to configure for the reservation pool, as can be seen below.
Figure:01 Reservation Pool allocation model and Resource Pool Settings
On the right-hand side of the image there is a screen showing the settings of the resource pool which was created. We can see that a reservation of 4000 MHz and 2048 MB is applied, and the same values are used to set the limits. So not only reservations but also limits are applied to the resource pool.
It is like cutting a slice of pie from the available resources and making it 100% available to the Organization vDC up front. As this is guaranteed, it is costly. Resources get reserved irrespective of whether they are used or not.
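The mapping from the organization vDC allocation to the resource pool settings described above can be sketched as follows. This is not a vCloud Director API; the function and key names are hypothetical and only illustrate that, for the reservation pool model, reservation and limit are both set to the full allocation (4 GHz becomes 4000 MHz, 2 GB becomes 2048 MB).

```python
# Sketch: resource pool settings implied by a reservation pool allocation.
# Function and key names are hypothetical, not a vCloud Director API.
# Reservation and limit are both set to 100% of the allocated values.

def reservation_pool_settings(cpu_alloc_ghz, mem_alloc_gb):
    """Return the resource pool reservation and limit values in MHz and MB."""
    cpu_mhz = round(cpu_alloc_ghz * 1000)  # GHz -> MHz
    mem_mb = round(mem_alloc_gb * 1024)    # GB -> MB
    return {
        "cpu_reservation_mhz": cpu_mhz,
        "cpu_limit_mhz": cpu_mhz,
        "mem_reservation_mb": mem_mb,
        "mem_limit_mb": mem_mb,
    }


if __name__ == "__main__":
    # Values from the example in this post: 4 GHz CPU and 2 GB memory.
    print(reservation_pool_settings(4, 2))
```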
Compared with the other two models, this one has a number of options missing, which makes it easier to configure.
- You don’t have the option to choose vCPU speed up front
- You don’t have the option to reserve a % of resources. It is always 100%
- CPU and memory both get allocated up front and are charged by vCenter Chargeback Manager based on these values
- Most important feature: the consumer has the option to choose reservations per VM. This gives the user complete freedom to prioritize resources for the workload.
Figure:02 Per VM reservation options for Consumers
- Simple to understand and explain to the consumer
- 100% of the resources are guaranteed to the organization vDC. In other words, these resources are not available for other organization vDCs to use.
- Consumers get the option to configure resources on a per-VM basis, which provides the same flexibility as vSphere administrators get