Setting Up Weekly Email Reports for SQL Agent Job Failures: The Proactive DBA's Guide
Introduction: Why Job Failure Reporting Matters
SQL Server Agent jobs form the backbone of database automation, handling everything from nightly backups to critical ETL processes. When these jobs fail silently, problems can snowball before anyone notices. This guide will walk you through setting up a comprehensive weekly email report that gives you complete visibility into job failures across your SQL Server environment.
What You'll Get from This Solution
-
Consolidated view of all failed jobs across multiple servers
-
Actionable details about each failure (error messages, duration, etc.)
-
Trend analysis to spot recurring problems
-
Professional HTML formatting that looks great in email clients
-
Customizable thresholds to focus on what matters most
Step 1: Create the Centralized Monitoring Database
-- Create a dedicated database to store job history
CREATE DATABASE JobMonitoring;
GO
USE JobMonitoring;
GO
-- Table to store job failure details
CREATE TABLE dbo.JobFailureHistory (
FailureID INT IDENTITY(1,1) PRIMARY KEY,
ServerName NVARCHAR(128) NOT NULL,
JobName NVARCHAR(128) NOT NULL,
StepName NVARCHAR(128) NULL,
RunDate DATETIME NOT NULL,
RunDuration INT NULL,
ErrorMessage NVARCHAR(MAX) NULL,
AlertSent BIT DEFAULT 0,
RecordCreated DATETIME DEFAULT GETDATE(),
INDEX IX_JobName (JobName),
INDEX IX_RunDate (RunDate)
);
GO
-- Table to track which servers we're monitoring
CREATE TABLE dbo.MonitoredServers (
ServerID INT IDENTITY(1,1) PRIMARY KEY,
ServerName NVARCHAR(128) NOT NULL UNIQUE,
IsActive BIT DEFAULT 1,
LastChecked DATETIME NULL
);
GO
Step 2: Create the Data Collection Procedure
CREATE PROCEDURE dbo.usp_CollectJobFailures
AS
BEGIN
SET NOCOUNT ON;
DECLARE @ServerName NVARCHAR(128) = @@SERVERNAME;
UPDATE dbo.MonitoredServers
SET LastChecked = GETDATE()
WHERE ServerName = @ServerName;
INSERT INTO dbo.JobFailureHistory (
ServerName,
JobName,
StepName,
RunDate,
RunDuration,
ErrorMessage
)
SELECT
@ServerName,
j.name,
h.step_name,
msdb.dbo.agent_datetime(h.run_date, h.run_time),
((h.run_duration/10000*3600) + ((h.run_duration%10000)/100*60) + (h.run_duration%100)),
h.message
FROM msdb.dbo.sysjobs j
INNER JOIN msdb.dbo.sysjobhistory h ON j.job_id = h.job_id
WHERE h.run_status = 0
AND msdb.dbo.agent_datetime(h.run_date, h.run_time) >= DATEADD(DAY, -7, GETDATE())
AND NOT EXISTS (
SELECT 1 FROM dbo.JobFailureHistory f
WHERE f.ServerName = @ServerName
AND f.JobName = j.name
AND f.RunDate = msdb.dbo.agent_datetime(h.run_date, h.run_time)
);
SELECT @@ROWCOUNT AS NewFailuresFound;
END
GO
Step 3: Create the Email Reporting Procedure
CREATE PROCEDURE dbo.usp_SendWeeklyJobFailureReport
@RecipientEmail NVARCHAR(255) = 'dba-team@yourcompany.com'
AS
BEGIN
SET NOCOUNT ON;
DECLARE @HTML NVARCHAR(MAX);
DECLARE @Subject NVARCHAR(255);
DECLARE @FailureCount INT;
SELECT @FailureCount = COUNT(*)
FROM dbo.JobFailureHistory
WHERE RunDate >= DATEADD(DAY, -7, GETDATE());
SET @Subject = CONCAT(
'SQL Agent Job Failure Report - ',
FORMAT(GETDATE(), 'yyyy-MM-dd'),
' (', @FailureCount, ' failures)'
);
SET @HTML = CONCAT(
N'<!DOCTYPE html>
<html>
<head>
<style>
body { font-family: Arial, sans-serif; }
h1 { color: #2a5885; }
table { border-collapse: collapse; width: 100%; }
th { background-color: #2a5885; color: white; text-align: left; padding: 8px; }
td { border: 1px solid #ddd; padding: 8px; }
tr:nth-child(even) { background-color: #f2f2f2; }
.critical { background-color: #ffcccc; }
.warning { background-color: #ffffcc; }
</style>
</head>
<body>
<h1>SQL Agent Job Failure Report</h1>
<p>Reporting period: ', FORMAT(DATEADD(DAY, -7, GETDATE()), 'yyyy-MM-dd'), ' to ', FORMAT(GETDATE(), 'yyyy-MM-dd'), '</p>
<p>Total failures: <strong>', @FailureCount, '</strong></p>',
CASE WHEN @FailureCount > 0 THEN
N'<table>
<tr>
<th>Server</th>
<th>Job Name</th>
<th>Step</th>
<th>Failure Time</th>
<th>Duration</th>
<th>Error Message</th>
</tr>' +
(SELECT
CONCAT(
N'<tr class="',
CASE
WHEN CHARINDEX('timeout', ErrorMessage) > 0 THEN 'warning'
WHEN CHARINDEX('deadlock', ErrorMessage) > 0 THEN 'critical'
ELSE ''
END,
'"><td>', ServerName, '</td>
<td>', JobName, '</td>
<td>', ISNULL(StepName, ''), '</td>
<td>', FORMAT(RunDate, 'yyyy-MM-dd HH:mm:ss'), '</td>
<td>',
CASE
WHEN RunDuration IS NULL THEN ''
ELSE CONCAT(
FLOOR(RunDuration/3600), 'h ',
FLOOR((RunDuration%3600)/60), 'm ',
RunDuration%60, 's'
)
END,
'</td>
<td>', LEFT(REPLACE(ErrorMessage, '<', '<'), 300), '</td></tr>'
)
FROM dbo.JobFailureHistory
WHERE RunDate >= DATEADD(DAY, -7, GETDATE())
ORDER BY RunDate DESC
FOR XML PATH(''), TYPE).value('.', 'NVARCHAR(MAX)') +
N'</table>'
ELSE
N'<p>No job failures detected in the reporting period.</p>'
END +
N'<p>Report generated: ', FORMAT(GETDATE(), 'yyyy-MM-dd HH:mm:ss'), '</p>
</body>
</html>'
);
EXEC msdb.dbo.sp_send_dbmail
@profile_name = 'DBA_Notifications',
@recipients = @RecipientEmail,
@subject = @Subject,
@body = @HTML,
@body_format = 'HTML';
UPDATE dbo.JobFailureHistory
SET AlertSent = 1
WHERE AlertSent = 0
AND RunDate >= DATEADD(DAY, -7, GETDATE());
END
GO
Step 4: Set Up the Collection and Reporting Jobs
1. Central Collection Job (Run Daily)
-- Job name: Collect_Job_Failures_From_All_Servers
-- Subsystem: PowerShell
-- Command:
$servers = Invoke-Sqlcmd -Query "SELECT ServerName FROM JobMonitoring.dbo.MonitoredServers WHERE IsActive = 1" -ServerInstance "YourCentralServer"
foreach ($server in $servers) {
try {
$result = Invoke-Sqlcmd -Query "EXEC JobMonitoring.dbo.usp_CollectJobFailures" -ServerInstance $server.ServerName
Write-Output ("Collected " + $result.NewFailuresFound + " new failures from " + $server.ServerName)
}
catch {
Write-Output ("Failed to collect from " + $server.ServerName + ": " + $_.Exception.Message)
}
}
2. Weekly Reporting Job (Run Monday Mornings)
-- Job name: Send_Weekly_Job_Failure_Report
-- Step: EXEC JobMonitoring.dbo.usp_SendWeeklyJobFailureReport
-- Schedule: Weekly, Mondays at 6:00 AM
Advanced Customization Options
1. Add Server Importance Weighting
ALTER TABLE dbo.MonitoredServers
ADD Importance TINYINT DEFAULT 5 CHECK (Importance BETWEEN 1 AND 10);
Update the HTML generation logic to emphasize high-importance servers.
2. Failure Trend Analysis
Add a section in your report to compare job failures over two weeks:
SELECT JobName,
COUNT(*) AS FailureCount,
DATEPART(WEEK, RunDate) AS WeekNum
FROM dbo.JobFailureHistory
WHERE RunDate >= DATEADD(DAY, -14, GETDATE())
GROUP BY JobName, DATEPART(WEEK, RunDate);
3. Automatic Ticket Creation
IF EXISTS (SELECT 1 FROM dbo.JobFailureHistory
WHERE AlertSent = 0 AND CHARINDEX('critical', ErrorMessage) > 0)
BEGIN
EXEC dbo.usp_CreateTicketFromFailure @FailureID = ...
END
Best Practices for Job Failure Reporting
-
Start simple – Implement basic reporting first, then add enhancements
-
Test thoroughly – Verify email formatting and delivery
-
Review regularly – Actually read the reports and take action
-
Refine over time – Adjust thresholds and filters based on what matters
-
Document procedures – Ensure team knows how to respond to common failures
-
Archive reports – Store historical data for compliance and analysis
Conclusion: From Reactive to Proactive Monitoring
Implementing this weekly job failure reporting solution transforms how your team manages SQL Server automation:
-
Gain visibility into issues before they become critical
-
Spot patterns in failures that indicate systemic problems
-
Document reliability for compliance and SLA reporting
-
Reduce stress with organized, actionable notifications
The complete solution takes about an hour to set up but pays dividends in reduced downtime and improved system reliability. Start with the basic version today, then enhance it over time to match your organization's specific needs.
Comments
Post a Comment