Rerun failed Avamar VMDK backup jobs

Since we started using VMDK backups via Avamar, I have been annoyed at having to deal with at least one failure every night. Resolving it normally requires logging into the Avamar console, finding the affected machine(s), and relaunching the backup, assuming the machine isn’t a production-critical system and can be backed up during the day. Combined with the relatively short retention cycle we have defined (2 weeks), this means we want to make sure all backups are kept current and viable.
After a little research, I found that there is a command-line client for Avamar (MCCLI) that runs on Linux. The MCCLI Programmer Guide documents simple command lines that can launch on-demand backups. That gave me a semi-automated way to relaunch failed backups; I just needed to identify which backups had failed.
The Avamar grid stores activity in a Postgres database. I am not sure whether admins are supposed to use it directly, but it is fairly well laid out and navigable using pgAdmin. With a little trial and error, I was able to craft a SQL statement that reports all VMDK backup failures and lets me relaunch them.

Below are the script and query I used to automate this. The script runs the query and writes the result to a file called rerun.sh. Cron initiates the queryDB script and, a few minutes later, the rerun script.
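As an alternative to spacing two cron entries a few minutes apart, the two steps could be chained in a single wrapper so that the rerun step only starts once the query step has finished. This is a minimal sketch, not what I actually run: `run_pipeline` and its two arguments are hypothetical names, and it assumes the query script exits non-zero when it fails.

```shell
#!/bin/sh
# Hypothetical wrapper: run the query step, then the rerun step only if the
# query step succeeded and produced at least one command.

run_pipeline() {
    query_script=$1   # path to the query script (e.g. queryDB.sh)
    rerun_file=$2     # path to the generated file (e.g. rerun.sh)

    # Stop here if the query step fails, so a stale rerun file is never executed.
    sh "$query_script" || { echo "query step failed" >&2; return 1; }

    # A file with no non-blank lines means there is nothing to relaunch.
    if ! grep -q '[^[:space:]]' "$rerun_file" 2>/dev/null; then
        echo "no failed backups to rerun"
        return 0
    fi

    # Execute the generated commands.
    sh "$rerun_file"
}
```

A single cron entry could then call a small script that sources this function and invokes it with the two real paths.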

queryDB.sh

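#!/bin/sh
# Connection settings for the Avamar activity database (read by psql).
export PGHOST=<FQDN of grid here>
export PGPORT=5555
export PGDATABASE=mcdb
export PGUSER=viewuser
export PGPASSWORD=viewuser1
# -t prints rows only (no headers/footers), -f reads the query from a file,
# -o writes the generated mccli commands to rerun.sh.
psql -tf /<Path to SQL command>/queryCMD -o /<Path to rerun command>/rerun.sh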

PostgreSQL statement, queryCMD

-- Build one mccli command line per VM whose most recent backup did not succeed.
SELECT DISTINCT '/usr/local/avamar/5.0.3-29/bin/mccli client backup-group-dataset '
    || '--domain=<VC Name>/VirtualMachines '
    || '--group-domain=<VC Name> '
    || '--group-name="' || group_name || '" --name=' || display_name
FROM v_activities_2
WHERE (display_name, recorded_date_time) IN
    -- Latest activity per VM in the 'Tier 5 VM%' groups within the last 6 days.
    (SELECT DISTINCT b.display_name, max(b.recorded_date_time)
    FROM v_activities_2 b
    WHERE b.group_name LIKE 'Tier 5 VM%'
    AND recorded_date_time > CURRENT_TIMESTAMP - interval '6 day'
    GROUP BY b.display_name)
AND status_code_summary <> 'Activity completed successfully.';
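Because rerun.sh is generated from database content and then executed, it may be worth checking that every line is actually an mccli command before running it. The following is a minimal sketch under that assumption; `validate_rerun` is a hypothetical helper, and the prefix to check is whatever mccli path the query emits.

```shell
#!/bin/sh
# Hypothetical check: succeed only if every non-blank line of the generated
# file starts with the expected command prefix.

validate_rerun() {
    rerun_file=$1   # generated file to inspect
    prefix=$2       # expected start of every command line

    while IFS= read -r line || [ -n "$line" ]; do
        # Trim leading whitespace (psql output may be padded).
        trimmed=${line#"${line%%[![:space:]]*}"}

        # Skip blank lines.
        [ -z "$trimmed" ] && continue

        case $trimmed in
            "$prefix"*) ;;   # line looks like an expected command
            *) echo "unexpected line: $trimmed" >&2; return 1 ;;
        esac
    done < "$rerun_file"

    return 0
}
```

The rerun step would then be run only when this check returns success.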

