Building a Comprehensive SQL Server Health Check Script: A DBA's Essential Tool
Introduction
As a SQL Server DBA, having a reliable health check script is like having a stethoscope for your databases it helps you quickly diagnose issues before they become critical problems. In this post, I'll walk you through building a custom SQL Server health check script that goes beyond basic checks to give you a complete picture of your server's wellbeing.
Why a Custom Health Check?
While SQL Server provides numerous DMVs and system views, pulling together the right information quickly during an incident can be challenging. A well-designed health check script:
-
Provides a consistent assessment framework
-
Saves time during troubleshooting
-
Helps identify problems before users notice
-
Creates documentation of your server's baseline state
The Foundation: Key Areas to Monitor
Our health check will examine these critical areas:
-
Instance Configuration
-
Database Status
-
Performance Metrics
-
Resource Utilization
-
Backup Status
-
Index Health
-
Security Checks
-
Drive Space
Building the Health Check Script
1. Instance Configuration Check
-- Basic instance information
SELECT
SERVERPROPERTY('ServerName') AS ServerName,
SERVERPROPERTY('Edition') AS Edition,
SERVERPROPERTY('ProductVersion') AS Version,
SERVERPROPERTY('ProductLevel') AS ProductLevel,
SERVERPROPERTY('IsClustered') AS IsClustered,
SERVERPROPERTY('Collation') AS Collation;
-- Important configuration settings
SELECT
name AS ConfigName,
value AS ConfigValue,
value_in_use AS CurrentValue,
description
FROM sys.configurations
WHERE name IN (
'max degree of parallelism',
'cost threshold for parallelism',
'max server memory (MB)',
'min server memory (MB)',
'optimize for ad hoc workloads',
'backup compression default'
)
ORDER BY name;
2. Database Status Check
-- Database status and recovery model
SELECT
name AS DatabaseName,
state_desc AS Status,
recovery_model_desc AS RecoveryModel,
compatibility_level AS CompatibilityLevel,
user_access_desc AS UserAccess,
is_auto_close_on,
is_auto_shrink_on,
page_verify_option_desc AS PageVerify
FROM sys.databases
ORDER BY name;
-- Database file information
SELECT
DB_NAME(database_id) AS DatabaseName,
name AS LogicalName,
type_desc AS FileType,
physical_name AS PhysicalPath,
CONVERT(DECIMAL(18,2), size/128.0) AS SizeMB,
CONVERT(DECIMAL(18,2), FILEPROPERTY(name, 'SpaceUsed')/128.0) AS UsedMB,
CONVERT(DECIMAL(18,2), (size - FILEPROPERTY(name, 'SpaceUsed'))/128.0) AS FreeMB,
growth AS GrowthPages,
is_percent_growth
FROM sys.master_files
ORDER BY DatabaseName, type_desc;
3. Performance Metrics
-- Top CPU-consuming queries (from cache)
SELECT TOP 20
qs.execution_count,
qs.total_logical_reads/qs.execution_count AS avg_logical_reads,
qs.total_elapsed_time/qs.execution_count AS avg_elapsed_time,
SUBSTRING(qt.text, (qs.statement_start_offset/2)+1,
((CASE qs.statement_end_offset
WHEN -1 THEN DATALENGTH(qt.text)
ELSE qs.statement_end_offset
END - qs.statement_start_offset)/2)+1) AS query_text,
qt.dbid,
DB_NAME(qt.dbid) AS database_name,
qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY qs.total_logical_reads DESC;
-- Wait statistics
SELECT TOP 20
wait_type,
wait_time_ms/1000 AS wait_time_sec,
CONVERT(DECIMAL(12,2), (wait_time_ms*100.0)/SUM(wait_time_ms) OVER()) AS pct,
signal_wait_time_ms/1000 AS signal_wait_time_sec
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (
'CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'RESOURCE_QUEUE',
'SLEEP_TASK', 'SLEEP_SYSTEMTASK', 'SQLTRACE_BUFFER_FLUSH',
'WAITFOR', 'LOGMGR_QUEUE', 'CHECKPOINT_QUEUE',
'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BROKER_TO_FLUSH',
'BROKER_TASK_STOP', 'CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT',
'DISPATCHER_QUEUE_SEMAPHORE', 'FT_IFTS_SCHEDULER_IDLE_WAIT',
'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
'ONDEMAND_TASK_QUEUE', 'BROKER_EVENTHANDLER', 'SLEEP_BPOOL_FLUSH',
'SLEEP_DBSTARTUP', 'DIRTY_PAGE_POLL', 'HADR_FILESTREAM_IOMGR_IOCOMPLETION'
)
ORDER BY wait_time_ms DESC;
4. Resource Utilization
-- Memory usage
SELECT
physical_memory_kb/1024 AS PhysicalMemoryMB,
committed_kb/1024 AS SQLServerCommittedMB,
committed_target_kb/1024 AS SQLServerTargetMB
FROM sys.dm_os_sys_memory;
-- Buffer pool usage by database
SELECT
DB_NAME(database_id) AS DatabaseName,
COUNT(*) * 8/1024 AS BufferSizeMB,
SUM(CASE WHEN is_modified = 1 THEN 1 ELSE 0 END) * 8/1024 AS DirtyMB
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY BufferSizeMB DESC;
5. Backup Status Check
-- Last backup dates
SELECT
d.name AS DatabaseName,
MAX(CASE WHEN b.type = 'D' THEN b.backup_finish_date ELSE NULL END) AS LastFullBackup,
MAX(CASE WHEN b.type = 'I' THEN b.backup_finish_date ELSE NULL END) AS LastDiffBackup,
MAX(CASE WHEN b.type = 'L' THEN b.backup_finish_date ELSE NULL END) AS LastLogBackup
FROM sys.databases d
LEFT JOIN msdb.dbo.backupset b ON d.name = b.database_name
WHERE d.database_id > 4 -- Skip system databases
GROUP BY d.name
ORDER BY d.name;
6. Index Health Check
-- Fragmented indexes
SELECT
OBJECT_NAME(ind.OBJECT_ID) AS TableName,
ind.name AS IndexName,
indexstats.avg_fragmentation_in_percent,
indexstats.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) indexstats
INNER JOIN sys.indexes ind ON ind.object_id = indexstats.object_id AND ind.index_id = indexstats.index_id
WHERE indexstats.avg_fragmentation_in_percent > 10
ORDER BY indexstats.avg_fragmentation_in_percent DESC;
-- Missing indexes
SELECT TOP 20
migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) AS improvement_measure,
mid.statement AS table_name,
mid.equality_columns,
mid.inequality_columns,
mid.included_columns,
migs.user_seeks,
migs.user_scans,
migs.last_user_seek,
migs.avg_total_user_cost,
migs.avg_user_impact
FROM sys.dm_db_missing_index_details mid
INNER JOIN sys.dm_db_missing_index_groups mig ON mid.index_handle = mig.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle
ORDER BY improvement_measure DESC;
7. Security Checks
-- Logins with sysadmin role
SELECT
l.name AS LoginName,
l.type_desc AS LoginType,
l.is_disabled
FROM sys.server_principals l
JOIN sys.syslogins s ON l.sid = s.sid
WHERE s.sysadmin = 1
AND l.name NOT LIKE 'NT AUTHORITY\%'
AND l.name NOT LIKE 'NT SERVICE\%'
ORDER BY l.name;
-- Orphaned users
EXEC sp_change_users_login 'Report';
-- Database roles and members
SELECT
DP1.name AS DatabaseUser,
DP2.name AS DatabaseRole
FROM sys.database_role_members DRM
RIGHT OUTER JOIN sys.database_principals DP1 ON DRM.member_principal_id = DP1.principal_id
LEFT OUTER JOIN sys.database_principals DP2 ON DRM.role_principal_id = DP2.principal_id
WHERE DP1.type = 'S'
AND DP2.name IS NOT NULL
ORDER BY DP2.name, DP1.name;
8. Drive Space Check
-- Drive space (requires xp_fixeddrives)
CREATE TABLE #DriveSpace (
Drive CHAR(1),
MBFree INT
);
INSERT INTO #DriveSpace
EXEC xp_fixeddrives;
SELECT
Drive,
MBFree/1024.0 AS GBFree
FROM #DriveSpace
ORDER BY Drive;
DROP TABLE #DriveSpace;
Enhancing Your Health Check Script
1. Add HTML Reporting
Convert your results to HTML for email reports:
DECLARE @HTML NVARCHAR(MAX) = N'
<style>
table { border-collapse: collapse; width: 100%; }
th, td { border: 1px solid #ddd; padding: 8px; }
th { background-color: #f2f2f2; }
.warning { background-color: #fff3cd; }
.critical { background-color: #f8d7da; }
</style>
<h2>SQL Server Health Check Report</h2>
<p>Server: ' + @@SERVERNAME + '</p>
<p>Generated: ' + CONVERT(VARCHAR(20), GETDATE(), 120) + '</p>';
-- Add each section to the HTML report
-- Example for database status:
SET @HTML = @HTML + N'
<h3>Database Status</h3>
<table>
<tr>
<th>Database</th>
<th>Status</th>
<th>Recovery Model</th>
<th>Compatibility</th>
</tr>';
SELECT @HTML = @HTML + N'
<tr' +
CASE WHEN state_desc != 'ONLINE' THEN ' class="critical"'
WHEN recovery_model_desc = 'SIMPLE' AND name NOT IN ('tempdb', 'model') THEN ' class="warning"'
ELSE '' END + '>
<td>' + name + '</td>
<td>' + state_desc + '</td>
<td>' + recovery_model_desc + '</td>
<td>' + CAST(compatibility_level AS VARCHAR) + '</td>
</tr>'
FROM sys.databases
ORDER BY name;
SET @HTML = @HTML + N'
</table>';
-- Output the HTML (or send via email)
SELECT @HTML AS HTMLReport;
2. Create a Stored Procedure
Package all checks into a stored procedure:
CREATE PROCEDURE dbo.usp_SQLServerHealthCheck
@SendEmail BIT = 0,
@EmailRecipients NVARCHAR(500) = NULL
AS
BEGIN
SET NOCOUNT ON;
-- Create temp tables for all results
-- Run all checks
-- Generate HTML report
IF @SendEmail = 1 AND @EmailRecipients IS NOT NULL
BEGIN
EXEC msdb.dbo.sp_send_dbmail
@profile_name = 'DBA Mail Profile',
@recipients = @EmailRecipients,
@subject = 'SQL Server Health Check - ' + @@SERVERNAME,
@body = @HTML,
@body_format = 'HTML';
END
ELSE
BEGIN
-- Return raw result sets
-- Execute each query here
END
END
3. Schedule Regular Checks
Create a SQL Agent job to run weekly:
USE msdb;
GO
EXEC dbo.sp_add_job
@job_name = 'Weekly Health Check';
GO
EXEC sp_add_jobstep
@job_name = 'Weekly Health Check',
@step_name = 'Run Health Check',
@subsystem = 'TSQL',
@command = 'EXEC dbo.usp_SQLServerHealthCheck
@SendEmail = 1,
@EmailRecipients = ''dba-team@yourcompany.com''';
GO
EXEC sp_add_schedule
@schedule_name = 'Weekly Saturday 2AM',
@freq_type = 8, -- Weekly
@freq_interval = 64, -- Saturday
@freq_recurrence_factor = 1,
@active_start_time = 020000; -- 2:00 AM
GO
EXEC sp_attach_schedule
@job_name = 'Weekly Health Check',
@schedule_name = 'Weekly Saturday 2AM';
GO
EXEC dbo.sp_add_jobserver
@job_name = 'Weekly Health Check';
GO
Advanced Enhancements
-
Add Historical Tracking: Create tables to store historical data for trend analysis
-
Threshold-Based Alerts: Configure different warning levels for different metrics
-
Custom Checks: Add checks specific to your environment (linked servers, replication, etc.)
-
PowerShell Integration: Wrap the script in PowerShell for multi-server execution
-
Dashboard Integration: Output data in formats compatible with Power BI or other dashboards
Conclusion
A comprehensive SQL Server health check script is an invaluable tool for any DBA. The script we've built provides:
-
Instant visibility into server health
-
Proactive issue detection
-
Documentation of server state
-
Customizable reporting
Remember to:
-
Test thoroughly in non-production first
-
Adjust thresholds for your specific environment
-
Review results regularly to establish baselines
-
Update the script as you identify new important metrics
By implementing this health check process, you'll significantly improve your ability to maintain stable, high-performing SQL Server environments.
Comments
Post a Comment