Since we started using Avamar for VMDK backups, I have been annoyed at having to deal with at least one failure every night. Resolving it normally means logging into the Avamar console, finding the affected machine(s), and relaunching the backup, and that assumes the system isn't production-critical and can be backed up during the day. Add to this the relatively short retention cycle we have defined (2 weeks), and we want to make sure every backup is kept current and viable.
After a little research, I found that Avamar has a command line client, mccli, that runs on Linux. The MCCLI Programmer Guide includes simple command lines for launching on-demand backups. That gave me a semi-automated way to relaunch failed backups; I just needed to identify which backups had failed.
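To relaunch a single VM by hand, the same mccli subcommand and flags that the query below generates can be wrapped in a small shell function. This is a sketch: the MCCLI path is copied from the query, while VC_NAME and the example group and VM names are placeholders (assumptions) to replace with your own.

```shell
#!/bin/sh
# Sketch: relaunch one VM's on-demand backup with mccli.
# MCCLI is the path used in queryCMD; VC_NAME is a placeholder (assumption).
MCCLI=${MCCLI:-/usr/local/avamar/5.0.3-29/bin/mccli}
VC_NAME=${VC_NAME:-vcenter.example.com}

# rerun_vm GROUP VM
# Every argument is quoted so group and VM names containing spaces
# reach mccli as single arguments.
rerun_vm() {
  "$MCCLI" client backup-group-dataset \
    --domain="$VC_NAME/VirtualMachines" \
    --group-domain="$VC_NAME" \
    --group-name="$1" \
    --name="$2"
}
```

For example, `rerun_vm "Tier 5 VM Daily" web01` would relaunch the backup of a hypothetical VM web01 in a hypothetical group named Tier 5 VM Daily.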
The Avamar grid stores its activity history in a PostgreSQL database. I am not sure whether admins are meant to use it directly, but it is fairly well laid out and navigable with pgAdmin. After some trial and error, I was able to craft a SQL statement that reports all VMDK backup failures and generates the commands to relaunch them.
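Before trusting generated commands, it can help to look at the failures themselves. A minimal sketch, assuming the PG* connection variables from queryDB.sh are already exported: it uses the same view, columns, and filters as queryCMD, but prints the VM, group, time, and status instead of mccli commands. The PSQL variable is a hypothetical hook so another client path (or a stub) can be substituted.

```shell
#!/bin/sh
# Sketch: list each Tier 5 VM whose latest activity in the last 6 days
# did not complete successfully. Assumes PGHOST, PGPORT, PGDATABASE,
# PGUSER and PGPASSWORD are exported as in queryDB.sh.
PSQL=${PSQL:-psql}

preview_failures() {
  # -X ignores ~/.psqlrc, -A gives unaligned output, -F sets the separator
  "$PSQL" -X -A -F ' | ' -c "
    select display_name, group_name, recorded_date_time, status_code_summary
    from v_activities_2
    where (display_name, recorded_date_time) in
      (select display_name, max(recorded_date_time)
       from v_activities_2
       where group_name like 'Tier 5 VM%'
         and recorded_date_time > current_timestamp - interval '6 day'
       group by display_name)
      and status_code_summary <> 'Activity completed successfully.'
    order by display_name;"
}
```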
Below are the script and query I used to automate this. The script runs the query and writes the results to a file called rerun.sh. Cron launches queryDB.sh and, a few minutes later, runs rerun.sh.
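The two cron jobs described above could also be collapsed into one, so the reruns start as soon as the query finishes rather than after a fixed delay. A minimal sketch: the script locations are placeholders (assumptions), and it skips the rerun step when rerun.sh contains no commands.

```shell
#!/bin/sh
# Sketch: run the query, then the generated reruns, from one cron entry.
# Both paths are placeholders (assumptions); point them at your copies.
QUERY_SCRIPT=${QUERY_SCRIPT:-/opt/avamar-rerun/queryDB.sh}
RERUN_SCRIPT=${RERUN_SCRIPT:-/opt/avamar-rerun/rerun.sh}

run_nightly() {
  # Regenerate rerun.sh; stop if queryDB.sh exits non-zero.
  sh "$QUERY_SCRIPT" || { echo "query failed" >&2; return 1; }

  # Count non-blank lines; zero means there are no failed backups.
  count=$(grep -c '[^[:space:]]' "$RERUN_SCRIPT" 2>/dev/null)
  if [ "${count:-0}" -eq 0 ]; then
    echo "no failed backups to rerun"
    return 0
  fi

  echo "relaunching $count backup(s)"
  sh "$RERUN_SCRIPT"
}
```

Saved with a final `run_nightly` line added, it could be scheduled with a single crontab entry such as `0 7 * * * /bin/sh /opt/avamar-rerun/run_nightly.sh` (the time and path are only examples).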
queryDB.sh
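#!/bin/sh
# Connection settings for the grid's mcdb database (read-only view user)
export PGHOST=<FQDN of grid here>
export PGPORT=5555
export PGDATABASE=mcdb
export PGUSER=viewuser
export PGPASSWORD=viewuser1
# -X skips ~/.psqlrc, -A drops column padding, -t prints rows only (no
# headers or row count), so each result row becomes one line of rerun.sh
psql -X -A -t -f /<Path to SQL command>/queryCMD -o /<Path to rerun command>/rerun.sh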
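queryDB.sh exports the database password in plain text. libpq, which psql uses, can read it from a password file instead. A sketch, assuming the same port, database, and user as above; the file location is your choice (~/.pgpass by default, or whatever PGPASSFILE points to).

```shell
#!/bin/sh
# Sketch: write a libpq password file for the read-only view user.
# Each line has the form hostname:port:database:username:password.
# libpq ignores the file if group or world can access it, hence mode 600.

# write_pgpass FILE HOST PASSWORD (overwrites FILE)
write_pgpass() (
  umask 077   # runs in a subshell, so the caller's umask is unchanged
  printf '%s:5555:mcdb:viewuser:%s\n' "$2" "$3" > "$1"
  chmod 600 "$1"
)
```

With the file in place, the `export PGPASSWORD=...` line can be dropped from queryDB.sh.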
PostgreSQL statement, queryCMD
-- Build one mccli on-demand backup command per VM whose most recent
-- Tier 5 VM activity in the last 6 days did not complete successfully.
-- Both names are quoted so values containing spaces stay intact.
select distinct '/usr/local/avamar/5.0.3-29/bin/mccli client backup-group-dataset '
|| '--domain=<VC Name>/VirtualMachines '
|| '--group-domain=<VC Name> '
|| '--group-name="' || group_name || '" --name="' || display_name || '"'
from v_activities_2
where (display_name, recorded_date_time) in
-- latest activity per VM in the Tier 5 VM groups
(select b.display_name, max(b.recorded_date_time)
from v_activities_2 b
where b.group_name like 'Tier 5 VM%'
AND b.recorded_date_time > CURRENT_TIMESTAMP - interval '6 day'
group by b.display_name)
AND status_code_summary <> 'Activity completed successfully.';
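Because cron executes rerun.sh unattended, a cautious variant could run a filtered copy that keeps only lines invoking the expected mccli binary and drops anything else, such as blank lines or stray psql output. A sketch under that assumption; the path matches the one in queryCMD and should be adjusted if your Avamar version differs.

```shell
#!/bin/sh
# Sketch: copy only mccli invocations from a generated rerun file.
MCCLI_PATH=${MCCLI_PATH:-/usr/local/avamar/5.0.3-29/bin/mccli}

# filter_reruns IN OUT
filter_reruns() {
  awk -v p="$MCCLI_PATH " '
    { sub(/^[ \t]+/, "") }        # strip leading whitespace
    index($0, p) == 1 { print }   # keep lines that start with the mccli path
  ' "$1" > "$2"
}
```

For example, cron could run `filter_reruns rerun.sh rerun.checked.sh` and then execute rerun.checked.sh (a hypothetical file name) instead of rerun.sh.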