Our SQL DBA reported that occasionally he will see errors in the SQL Error Logs similar to:
SQL Server has encountered 45 occurrence(s) of IO requests taking longer than 15 seconds to complete on file [T:MSSQLDatatempdev6_Data.NDF] in database [tempdb] (2). The OS file handle is 0x000005B4. The offset of the latest long IO is: 0x0000000c7b2000
Monitoring for this alert at first sounded simple, but as I dug into it I realized it could be much more difficult than I expected. First off, this error is reported in the SQL Error Logs and only in the SQL Error Logs, so we cant simply use the NT Event log to alert us. Secondly, depending on how you install SQL, these error logs could be anywhere on the system. And lastly, this is needed to be monitored on SQL 2000 and SQL 2005 systems, so xp_ReadErrorLog won’t work for us.
So the breakdown seems fairly simple: find out where the logs are, use the text log monitor, and search for the string.
1. Find out where the SQL Error logs are stored. This first step turns out to be fairly simple, when the SQL DB Engine is discovered, one of the attributes it discovers is “$MPElement[Name=’SQL!Microsoft.SQLServer.DBEngine’]/ErrorLogLocation$“. This should tell us where the logs are stored on each server regardless of how they were built or configured.
2. Use the text log monitor to search the SQL Error logs. Using the “Matches Regex” function, I should be able to simply search for the string “of IO requests taking longer than 15 seconds to complete on file” and report an alert on this
Sounds simple, reality however is rarely simple. The ErrorLogLocation attribute is stored as a single string such as “d:MSSQLlogERRORLOG”, but the text log monitor requires the path and file name to be separate components and there is no method within SCOM (that I am aware of) to split these components from a single string.
Other options:
- Hard code the error log path and name
- This differs on each box
- It could be done with an override, but that would be a nightmare
- Create an extended SQL DB Engine class that includes this setting as 2 separate attributes
- That’s a lot more work than I want to tackle, plus it’s a maintenance nightmare
- Create a new text log monitor that takes only 1 attribute for the path and name
- Turns out this is included in SCOM as a binary (i.e. in a DLL) and would have no idea where to start with that
- Create a script that searches the log file for a RegEx
- This is possible, but potentially filled with problems
Using a script ultimately seems to be the best option, with it I can pass whatever options are needed (log path, how long to go back, the string to search for, and anything else needed) and it can be somewhat expandable in the future. In this case I decided to drop an NT Event message of the error and then use a separate rule to pick up this event and alert on it. I also added another parameter of DBVersion because SQL 2000 and SQL 2005 store their files in different formats
‘ SQLErrorLog.vbs ‘ ‘ param 0 – path to errorlog $Target/Property[Type=”SQLServer!Microsoft.SQLServer.DBEngine”]/ErrorLogLocation$ ‘ param 1 – time in minutes to include 30 ‘ param 2 – string to match “of IO requests taking longer than” ‘ param 3 – db version $Target/Property[Type=”SQLServer!Microsoft.SQLServer.DBEngine”]/Version$
Const ForReading = 1, ForWriting = 2, ForAppending = 8 Const TristateUseDefault = -2, TristateTrue = -1, TristateFalse = 0 CONST EVENT_SUCCESS = 0, EVENT_ERROR = 1, EVENT_WARNING = 2, EVENT_INFORMATION = 4
SET oArgs = WScript.Arguments SET oShell = CreateObject(“Wscript.Shell”) set fso = CreateObject(“Scripting.FileSystemObject”)
errorLog = oArgs(0) iTime = oArgs(1) sMatch = oArgs(2) dbVer = oArgs(3)
‘oShell.LogEvent EVENT_SUCCESS, “Beginning check, errorLog: ” & errorLog & “, iTime: ” & iTime & “, sMatch: ” & sMatch & “, dbVer: ” & dbVer ‘ read text file IF LEFT(dbVer,1) = 9 THEN set f = fso.OpenTextFile(errorLog, ForReading,,TristateTrue) ELSE set f = fso.OpenTextFile(errorLog, ForReading,,TristateFalse) END arLines = split(f.ReadAll,vbCrLf)
‘ find lines that are iTime minutes old for i = UBound(arLines)-1 to 0 step -1 line = arLines(i) lineTime = CDate(Left(line,19)) IF lineTime < DateAdd(“n”,- iTime, Now) THEN EXIT ELSE IF RegExTest(sMatch, line) THEN ‘ generate alert oShell.LogEvent EVENT_ERROR, “SQL I/O Error” & vbCrLf & line wscript.quit END END NEXT
Function RegExTest(sPattern, sString) SET regEx = new RegExp regEx.Pattern = sPattern regEx.IgnoreCase = TRUE regEx.Global = TRUE SET Matches = regEx.Execute(sString) IF Matches.Count > 0 THEN RegExTest = true ELSE RegExTest = false END END |
Comments
Post a Comment