Wednesday, December 02, 2009

Top 10 VMware Consumers Report

If you have even a small VMware environment, you will begin to worry about resource consumption in your VM clusters. Specifically, I was concerned about one or more systems behaving poorly and stealing all the CPU, memory, disk, and network resources. So the idea of a top 10 report that uses the vCenter performance statistics came to mind.

Step 1: Find the Cluster ID

This assumes you have more than one cluster in your environment, so you will need to select from them. A simple SQL query will give you the name and ID of the various clusters.

SELECT ID, Name
FROM vpxv_entity
WHERE Type_ID = 3
ORDER BY Name
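
If you would rather look the cluster up by name than copy the ID by hand, something like the sketch below should work. The cluster name 'Production Cluster' is a made-up placeholder, and declaring the variable as INT is my assumption about the ID column, so adjust both to fit your database.

```sql
-- Sketch: capture one cluster's ID in a variable for the later queries.
-- 'Production Cluster' is a hypothetical name; use one returned by the query above.
DECLARE @ClusterID INT;  -- assumes vpxv_entity.ID fits in an INT

SELECT @ClusterID = ID
FROM vpxv_entity
WHERE Type_ID = 3
  AND Name = 'Production Cluster';

-- A NULL result means no cluster matched that name.
SELECT @ClusterID AS ClusterID;
```

In an SSRS report you would more likely feed the ID in as a report parameter, but the variable is handy when testing in a query window.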

Step 2: Find the top 10 consumers

Now that you know which cluster you want, you can put the ID in a variable named @ClusterID. The query below looks at the average CPU usage samples (usagemhz) for the last 7 days for all VMs in your cluster and returns the top 10 consumers, ranked by the sum of those samples. This query performs a lot of work, and if your database isn't working optimally it will take a long time to complete, so be careful when you first run it.

SELECT TOP 10 v.VMID
FROM vpxv_VMs v (NOLOCK)
INNER JOIN vpxv_entity e (NOLOCK) ON v.HostID = e.ID
INNER JOIN vpxv_entity_moid m (NOLOCK) ON m.EntityID = v.VMID
INNER JOIN dbo.VPXV_HIST_STAT_WEEKLY sd (NOLOCK) ON sd.ENTITY = m.MOID
WHERE e.type_id = 1 AND e.Parent_ID = @ClusterID
  AND sd.stat_name = 'usagemhz'
  AND sd.STAT_ROLLUP_TYPE = 'average'
  AND sd.sample_time > getdate()-7
GROUP BY v.VMID, v.Name
ORDER BY sum(sd.stat_value) DESC
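
Before charting, it can help to see the names and numbers behind the ranking. The sketch below is the same join with the VM name, the summed value, the average, and the sample count added to the output. The @ClusterID value of 123 and the column aliases are placeholders of mine, not values from the vCenter schema.

```sql
-- Sketch: the Step 2 ranking with the supporting numbers shown.
DECLARE @ClusterID INT;
SET @ClusterID = 123;  -- hypothetical ID; use a value returned in Step 1

SELECT TOP 10
       v.VMID,
       v.Name,
       SUM(sd.stat_value) AS TotalUsageMhz,  -- the value the ranking sorts on
       AVG(sd.stat_value) AS AvgUsageMhz,    -- truncated if stat_value is an integer type
       COUNT(*)           AS SampleCount     -- VMs with fewer samples rank lower on the sum
FROM vpxv_VMs v (NOLOCK)
INNER JOIN vpxv_entity e (NOLOCK) ON v.HostID = e.ID
INNER JOIN vpxv_entity_moid m (NOLOCK) ON m.EntityID = v.VMID
INNER JOIN dbo.VPXV_HIST_STAT_WEEKLY sd (NOLOCK) ON sd.ENTITY = m.MOID
WHERE e.type_id = 1 AND e.Parent_ID = @ClusterID
  AND sd.stat_name = 'usagemhz'
  AND sd.STAT_ROLLUP_TYPE = 'average'
  AND sd.sample_time > getdate()-7
GROUP BY v.VMID, v.Name
ORDER BY SUM(sd.stat_value) DESC;
```

If the sample counts differ a lot between VMs (for example, a VM that was powered off for part of the week), sorting on the average instead of the sum may give a fairer ranking.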

Step 3: Chart the top 10 consumers

Now that we know which cluster and which systems are the top consumers, we can graph them. The query below embeds the query from Step 2 to limit its results, and then returns the name of the VM, sample time, and sample value for the last 7 days. Using SQL Server Reporting Services (SSRS), you can pipe this into a pretty graph and email it out automatically every week.

SELECT e.Name 'Host', sd.stat_name, sd.sample_time, sd.stat_value
FROM vpxv_entity e (NOLOCK)
INNER JOIN vpxv_entity_moid m (NOLOCK) ON m.EntityID = e.ID
INNER JOIN dbo.VPXV_HIST_STAT_WEEKLY sd (NOLOCK) ON sd.ENTITY = m.MOID
WHERE e.type_id = 0
  AND e.ID IN
      (SELECT TOP 10 v.VMID
       FROM vpxv_VMs v
       INNER JOIN vpxv_entity e ON v.HostID = e.ID
       INNER JOIN vpxv_entity_moid m (NOLOCK) ON m.EntityID = v.VMID
       INNER JOIN dbo.VPXV_HIST_STAT_WEEKLY sd (NOLOCK) ON sd.ENTITY = m.MOID
       WHERE e.type_id = 1 AND e.Parent_ID = @ClusterID
         AND sd.stat_name = 'usagemhz'
         AND sd.STAT_ROLLUP_TYPE = 'average'
         AND sd.sample_time > getdate()-7
       GROUP BY v.VMID, v.Name
       ORDER BY sum(sd.stat_value) DESC)
  AND sd.stat_name = 'usagemhz'
  AND sd.STAT_ROLLUP_TYPE = 'average'
  AND sd.sample_time > getdate()-7
ORDER BY sd.sample_time


Once the CPU graph is done, you can do the same for memory, disk, network, and many other data points. The monitors available to report on differ based on the statistics level you have configured in vCenter, so if you don't see something you want, check whether you can add it.
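
To find out which counters you can swap in for 'usagemhz', one option is to ask the weekly statistics view what it actually contains. This is a rough sketch that uses only the columns from the queries above. The names it returns depend on your environment, so I am not listing any expected values here.

```sql
-- Sketch: list the counter names and rollup types recorded in the last 7 days.
SELECT DISTINCT sd.stat_name, sd.STAT_ROLLUP_TYPE
FROM dbo.VPXV_HIST_STAT_WEEKLY sd (NOLOCK)
WHERE sd.sample_time > getdate()-7
ORDER BY sd.stat_name, sd.STAT_ROLLUP_TYPE;
```

Pick a stat_name and rollup type from the output and substitute them into both places they appear in the Step 3 query.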

Below is a sample graph that came from one of my clusters. As you can see, there are 2 VMs consuming a majority of CPU resources, and for the most part the consumption appears flat across the entire week. This suggests that there is either a long-running job, or something is wrong with the VMs.



Wednesday, November 25, 2009

VMware Type IDs

I am looking into the VMware database to try to set up some automated reporting, and the first step is to find the various objects in the database.

Looking at the views I see VPXV_ENTITY, a good place to start looking. It appears that every object (or entity) in VMware is listed here. Now to segment them out based on object type, or TYPE_ID.

Below is a list of the TYPE_ID values I have identified in my environment. I'm not sure whether these stay the same between versions and installations, but this is the best I have for now.

0 VM
1 host
3 cluster
4 resources
5 VM folder
6
7
8 Datacenter
16 datastore folder
17 Network folder
18 Datastore
19 Network
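
To check this list against your own installation, a quick sketch is to count the entities under each TYPE_ID and pull one example name for each. The aliases EntityCount and ExampleName are just labels I made up for the output.

```sql
-- Sketch: summarize which TYPE_ID values exist and show one sample entity for each.
SELECT Type_ID,
       COUNT(*)  AS EntityCount,
       MIN(Name) AS ExampleName  -- alphabetically first name, as a hint to the type
FROM vpxv_entity
GROUP BY Type_ID
ORDER BY Type_ID;
```

The example names are often enough to guess what an unlabeled type represents.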

Wednesday, November 18, 2009

ETrust updates not working on Windows 7

I upgraded my system to Windows 7, but noticed that ETrust was not downloading updates. Turns out the ETrust downloader needs to run in Vista compatibility mode.

To fix this, do the following:
  1. Open file explorer and go to C:\Program Files\CA\SharedComponents\ScanEngine
  2. Right-click the file ITMDIST.EXE and select Properties
  3. On the Compatibility tab, click Change settings for all users
  4. Change Compatibility mode to Windows Vista (Service Pack 1)
  5. Click OK and OK
The updates should now install properly.

Thursday, November 12, 2009

Setting Custom Attributes in VMware programmatically

If you have more than a few VMs in your ESX environment, you have already found a need to properly organize the VMs with folders and hierarchies. However, this structure goes away when you view all the VMs in your datacenter, making your carefully created tree structure useless.

The good news is that VMware has a "Custom Attribute" option for each VM. Displayed next to the Notes field, this allows you to define attributes like Customer, Department, Owner, Production Status, or anything else you can imagine to tag every system in your environment. The question is how to do this without manually typing in the attribute for each VM, and that is where PowerShell comes in.

Assuming you want to create an attribute to match your folder structure, you can use the Get-Vm -Location PowerShell command to retrieve a list of all VMs in a folder (and its subfolders). Pipe this output into the Set-CustomField command and let the computer do the work for you. An example of this is below:

Get-Vm -Location 'App X' | Set-CustomField -Name 'System Function' -Value 'App X'

You can use other switches with Get-Vm to filter on name, datastores, host servers, and other options.

Wednesday, November 11, 2009

VMware guest level monitoring and alerting

Probably 90% of the monitoring needed in any environment consists of extremely basic measures: CPU utilization, memory utilization, disk throughput, network throughput, etc. Defining thresholds for these and alerting on them provides immeasurable insight into an environment and quickly identifies any problems or bottlenecks. Amazingly, VMware provides many of these basic system monitors out of the box.

Out of the box, ESX contains 2 VM monitors. Unfortunately, no alerting or other action plans are defined for them. The first monitor is for the virtual CPU utilization and triggers a warning when it has reached 75% for more than 5 minutes, and critical when it reaches 90% for more than 5 minutes. The second monitor is for the virtual memory utilization and triggers a warning when it has reached 75% for more than 5 minutes, and critical when it reaches 90% for more than 5 minutes.

Those 2 monitors identify the most common causes of system slowness I have ever seen. When either of those reaches 80% or more, a huge bottleneck occurs and can cascade into a completely unusable system. Now you can be alerted and preemptively resolve the issues, focusing your time and money on the problems that truly affect your environment. Simply configure an action plan to email you when these events are triggered and you're halfway there.


 

There are plenty of other monitors/triggers for the Virtual Machines in your ESX environment. Below is a list of available triggers and their default settings. If you are seeing a potential problem area – such as unreliable or slow disk – then feel free to test those triggers and see if they provide insight into how your environment is working, and how it isn't working.

Trigger Type               | Condition   | Warning                | Condition Length | Alert        | Condition Length
VM CPU Ready Time (ms)     | Is above    | 4000                   | for 5 min        | 8000         | for 5 min
VM CPU Usage (%)           | Is above    | 75                     | for 5 min        | 90           | for 5 min
VM Disk Aborts             | Is above    | 10                     | for 5 min        | 25           | for 5 min
VM Disk Resets             | Is above    | 10                     | for 5 min        | 25           | for 5 min
VM Disk Usage (KBps)       | Is above    |                        | for 5 min        |              | for 5 min
VM Fault Tolerance Latency | Is equal to | Moderate               | n/a              | High         | n/a
VM Heartbeat               | Is equal to | Intermittent Heartbeat | n/a              | No Heartbeat | n/a
VM Memory Usage (%)        | Is above    | 75                     | for 5 min        | 90           | for 5 min
VM Network Usage (kbps)    | Is above    |                        | for 5 min        |              | for 5 min
VM Snapshot Size (GB)      | Is above    |                        | n/a              |              | n/a
VM State                   | Is equal to | Powered On             | n/a              | Powered Off  | n/a
VM Total Disk Latency (ms) | Is above    | 50                     | for 5 min        | 75           | for 5 min
VM Total Size on Disk (GB) | Is above    |                        | n/a              |              | n/a