Building a Comprehensive SQL Server Health Check Script: A DBA's Essential Tool

Introduction

As a SQL Server DBA, having a reliable health check script is like having a stethoscope for your databases it helps you quickly diagnose issues before they become critical problems. In this post, I'll walk you through building a custom SQL Server health check script that goes beyond basic checks to give you a complete picture of your server's wellbeing.

Why a Custom Health Check?

While SQL Server provides numerous DMVs and system views, pulling together the right information quickly during an incident can be challenging. A well-designed health check script:

  • Provides a consistent assessment framework

  • Saves time during troubleshooting

  • Helps identify problems before users notice

  • Creates documentation of your server's baseline state

The Foundation: Key Areas to Monitor

Our health check will examine these critical areas:

  1. Instance Configuration

  2. Database Status

  3. Performance Metrics

  4. Resource Utilization

  5. Backup Status

  6. Index Health

  7. Security Checks

  8. Drive Space

Building the Health Check Script

1. Instance Configuration Check

-- Basic instance information
SELECT
    SERVERPROPERTY('ServerName') AS ServerName,
    SERVERPROPERTY('Edition') AS Edition,
    SERVERPROPERTY('ProductVersion') AS Version,
    SERVERPROPERTY('ProductLevel') AS ProductLevel,
    SERVERPROPERTY('IsClustered') AS IsClustered,
    SERVERPROPERTY('Collation') AS Collation;

-- Important configuration settings
SELECT
    name AS ConfigName,
    value AS ConfigValue,
    value_in_use AS CurrentValue,
    description
FROM sys.configurations
WHERE name IN (
    'max degree of parallelism',
    'cost threshold for parallelism',
    'max server memory (MB)',
    'min server memory (MB)',
    'optimize for ad hoc workloads',
    'backup compression default'
)
ORDER BY name;

2. Database Status Check

-- Database status and recovery model
SELECT
    name AS DatabaseName,
    state_desc AS Status,
    recovery_model_desc AS RecoveryModel,
    compatibility_level AS CompatibilityLevel,
    user_access_desc AS UserAccess,
    is_auto_close_on,
    is_auto_shrink_on,
    page_verify_option_desc AS PageVerify
FROM sys.databases
ORDER BY name;

-- Database file information
SELECT
    DB_NAME(database_id) AS DatabaseName,
    name AS LogicalName,
    type_desc AS FileType,
    physical_name AS PhysicalPath,
    CONVERT(DECIMAL(18,2), size/128.0) AS SizeMB,
    CONVERT(DECIMAL(18,2), FILEPROPERTY(name, 'SpaceUsed')/128.0) AS UsedMB,
    CONVERT(DECIMAL(18,2), (size - FILEPROPERTY(name, 'SpaceUsed'))/128.0) AS FreeMB,
    growth AS GrowthPages,
    is_percent_growth
FROM sys.master_files
ORDER BY DatabaseName, type_desc;

3. Performance Metrics

-- Top CPU-consuming queries (from cache)
SELECT TOP 20
    qs.execution_count,
    qs.total_logical_reads/qs.execution_count AS avg_logical_reads,
    qs.total_elapsed_time/qs.execution_count AS avg_elapsed_time,
    SUBSTRING(qt.text, (qs.statement_start_offset/2)+1,
        ((CASE qs.statement_end_offset
          WHEN -1 THEN DATALENGTH(qt.text)
          ELSE qs.statement_end_offset
          END - qs.statement_start_offset)/2)+1) AS query_text,
    qt.dbid,
    DB_NAME(qt.dbid) AS database_name,
    qp.query_plan
FROM sys.dm_exec_query_stats qs
CROSS APPLY sys.dm_exec_sql_text(qs.sql_handle) AS qt
CROSS APPLY sys.dm_exec_query_plan(qs.plan_handle) AS qp
ORDER BY qs.total_logical_reads DESC;

-- Wait statistics
SELECT TOP 20
    wait_type,
    wait_time_ms/1000 AS wait_time_sec,
    CONVERT(DECIMAL(12,2), (wait_time_ms*100.0)/SUM(wait_time_ms) OVER()) AS pct,
    signal_wait_time_ms/1000 AS signal_wait_time_sec
FROM sys.dm_os_wait_stats
WHERE wait_type NOT IN (
    'CLR_SEMAPHORE', 'LAZYWRITER_SLEEP', 'RESOURCE_QUEUE',
    'SLEEP_TASK', 'SLEEP_SYSTEMTASK', 'SQLTRACE_BUFFER_FLUSH',
    'WAITFOR', 'LOGMGR_QUEUE', 'CHECKPOINT_QUEUE',
    'REQUEST_FOR_DEADLOCK_SEARCH', 'XE_TIMER_EVENT', 'BROKER_TO_FLUSH',
    'BROKER_TASK_STOP', 'CLR_MANUAL_EVENT', 'CLR_AUTO_EVENT',
    'DISPATCHER_QUEUE_SEMAPHORE', 'FT_IFTS_SCHEDULER_IDLE_WAIT',
    'XE_DISPATCHER_WAIT', 'XE_DISPATCHER_JOIN', 'SQLTRACE_INCREMENTAL_FLUSH_SLEEP',
    'ONDEMAND_TASK_QUEUE', 'BROKER_EVENTHANDLER', 'SLEEP_BPOOL_FLUSH',
    'SLEEP_DBSTARTUP', 'DIRTY_PAGE_POLL', 'HADR_FILESTREAM_IOMGR_IOCOMPLETION'
)
ORDER BY wait_time_ms DESC;

4. Resource Utilization

-- Memory usage
SELECT
    physical_memory_kb/1024 AS PhysicalMemoryMB,
    committed_kb/1024 AS SQLServerCommittedMB,
    committed_target_kb/1024 AS SQLServerTargetMB
FROM sys.dm_os_sys_memory;

-- Buffer pool usage by database
SELECT
    DB_NAME(database_id) AS DatabaseName,
    COUNT(*) * 8/1024 AS BufferSizeMB,
    SUM(CASE WHEN is_modified = 1 THEN 1 ELSE 0 END) * 8/1024 AS DirtyMB
FROM sys.dm_os_buffer_descriptors
GROUP BY database_id
ORDER BY BufferSizeMB DESC;

5. Backup Status Check

-- Last backup dates
SELECT
    d.name AS DatabaseName,
    MAX(CASE WHEN b.type = 'D' THEN b.backup_finish_date ELSE NULL END) AS LastFullBackup,
    MAX(CASE WHEN b.type = 'I' THEN b.backup_finish_date ELSE NULL END) AS LastDiffBackup,
    MAX(CASE WHEN b.type = 'L' THEN b.backup_finish_date ELSE NULL END) AS LastLogBackup
FROM sys.databases d
LEFT JOIN msdb.dbo.backupset b ON d.name = b.database_name
WHERE d.database_id > 4 -- Skip system databases
GROUP BY d.name
ORDER BY d.name;

6. Index Health Check

-- Fragmented indexes
SELECT
    OBJECT_NAME(ind.OBJECT_ID) AS TableName,
    ind.name AS IndexName,
    indexstats.avg_fragmentation_in_percent,
    indexstats.page_count
FROM sys.dm_db_index_physical_stats(DB_ID(), NULL, NULL, NULL, NULL) indexstats
INNER JOIN sys.indexes ind ON ind.object_id = indexstats.object_id AND ind.index_id = indexstats.index_id
WHERE indexstats.avg_fragmentation_in_percent > 10
ORDER BY indexstats.avg_fragmentation_in_percent DESC;

-- Missing indexes
SELECT TOP 20
    migs.avg_total_user_cost * (migs.avg_user_impact / 100.0) * (migs.user_seeks + migs.user_scans) AS improvement_measure,
    mid.statement AS table_name,
    mid.equality_columns,
    mid.inequality_columns,
    mid.included_columns,
    migs.user_seeks,
    migs.user_scans,
    migs.last_user_seek,
    migs.avg_total_user_cost,
    migs.avg_user_impact
FROM sys.dm_db_missing_index_details mid
INNER JOIN sys.dm_db_missing_index_groups mig ON mid.index_handle = mig.index_handle
INNER JOIN sys.dm_db_missing_index_group_stats migs ON mig.index_group_handle = migs.group_handle
ORDER BY improvement_measure DESC;

7. Security Checks

-- Logins with sysadmin role
SELECT
    l.name AS LoginName,
    l.type_desc AS LoginType,
    l.is_disabled
FROM sys.server_principals l
JOIN sys.syslogins s ON l.sid = s.sid
WHERE s.sysadmin = 1
AND l.name NOT LIKE 'NT AUTHORITY\%'
AND l.name NOT LIKE 'NT SERVICE\%'
ORDER BY l.name;

-- Orphaned users
EXEC sp_change_users_login 'Report';

-- Database roles and members
SELECT
    DP1.name AS DatabaseUser,
    DP2.name AS DatabaseRole
FROM sys.database_role_members DRM
RIGHT OUTER JOIN sys.database_principals DP1 ON DRM.member_principal_id = DP1.principal_id
LEFT OUTER JOIN sys.database_principals DP2 ON DRM.role_principal_id = DP2.principal_id
WHERE DP1.type = 'S'
AND DP2.name IS NOT NULL
ORDER BY DP2.name, DP1.name;

8. Drive Space Check

-- Drive space (requires xp_fixeddrives)
CREATE TABLE #DriveSpace (
    Drive CHAR(1),
    MBFree INT
);

INSERT INTO #DriveSpace
EXEC xp_fixeddrives;

SELECT
    Drive,
    MBFree/1024.0 AS GBFree
FROM #DriveSpace
ORDER BY Drive;

DROP TABLE #DriveSpace;

Enhancing Your Health Check Script

1. Add HTML Reporting

Convert your results to HTML for email reports:

DECLARE @HTML NVARCHAR(MAX) = N'
<style>
    table { border-collapse: collapse; width: 100%; }
    th, td { border: 1px solid #ddd; padding: 8px; }
    th { background-color: #f2f2f2; }
    .warning { background-color: #fff3cd; }
    .critical { background-color: #f8d7da; }
</style>
<h2>SQL Server Health Check Report</h2>
<p>Server: ' + @@SERVERNAME + '</p>
<p>Generated: ' + CONVERT(VARCHAR(20), GETDATE(), 120) + '</p>';

-- Add each section to the HTML report
-- Example for database status:
SET @HTML = @HTML + N'
<h3>Database Status</h3>
<table>
    <tr>
        <th>Database</th>
        <th>Status</th>
        <th>Recovery Model</th>
        <th>Compatibility</th>
    </tr>';

SELECT @HTML = @HTML + N'
    <tr' +
    CASE WHEN state_desc != 'ONLINE' THEN ' class="critical"'
         WHEN recovery_model_desc = 'SIMPLE' AND name NOT IN ('tempdb', 'model') THEN ' class="warning"'
         ELSE '' END + '>
        <td>' + name + '</td>
        <td>' + state_desc + '</td>
        <td>' + recovery_model_desc + '</td>
        <td>' + CAST(compatibility_level AS VARCHAR) + '</td>
    </tr>'
FROM sys.databases
ORDER BY name;

SET @HTML = @HTML + N'
</table>';

-- Output the HTML (or send via email)
SELECT @HTML AS HTMLReport;

2. Create a Stored Procedure

Package all checks into a stored procedure:

CREATE PROCEDURE dbo.usp_SQLServerHealthCheck
    @SendEmail BIT = 0,
    @EmailRecipients NVARCHAR(500) = NULL
AS
BEGIN
    SET NOCOUNT ON;
    
    -- Create temp tables for all results
    -- Run all checks
    -- Generate HTML report
    
    IF @SendEmail = 1 AND @EmailRecipients IS NOT NULL
    BEGIN
        EXEC msdb.dbo.sp_send_dbmail
            @profile_name = 'DBA Mail Profile',
            @recipients = @EmailRecipients,
            @subject = 'SQL Server Health Check - ' + @@SERVERNAME,
            @body = @HTML,
            @body_format = 'HTML';
    END
    ELSE
    BEGIN
        -- Return raw result sets
        -- Execute each query here
    END
END

3. Schedule Regular Checks

Create a SQL Agent job to run weekly:

USE msdb;
GO

EXEC dbo.sp_add_job
    @job_name = 'Weekly Health Check';
GO

EXEC sp_add_jobstep
    @job_name = 'Weekly Health Check',
    @step_name = 'Run Health Check',
    @subsystem = 'TSQL',
    @command = 'EXEC dbo.usp_SQLServerHealthCheck
                @SendEmail = 1,
                @EmailRecipients = ''dba-team@yourcompany.com''';
GO

EXEC sp_add_schedule
    @schedule_name = 'Weekly Saturday 2AM',
    @freq_type = 8, -- Weekly
    @freq_interval = 64, -- Saturday
    @freq_recurrence_factor = 1,
    @active_start_time = 020000; -- 2:00 AM
GO

EXEC sp_attach_schedule
    @job_name = 'Weekly Health Check',
    @schedule_name = 'Weekly Saturday 2AM';
GO

EXEC dbo.sp_add_jobserver
    @job_name = 'Weekly Health Check';
GO

Advanced Enhancements

  1. Add Historical Tracking: Create tables to store historical data for trend analysis

  2. Threshold-Based Alerts: Configure different warning levels for different metrics

  3. Custom Checks: Add checks specific to your environment (linked servers, replication, etc.)

  4. PowerShell Integration: Wrap the script in PowerShell for multi-server execution

  5. Dashboard Integration: Output data in formats compatible with Power BI or other dashboards

Conclusion

A comprehensive SQL Server health check script is an invaluable tool for any DBA. The script we've built provides:

  • Instant visibility into server health

  • Proactive issue detection

  • Documentation of server state

  • Customizable reporting

Remember to:

  1. Test thoroughly in non-production first

  2. Adjust thresholds for your specific environment

  3. Review results regularly to establish baselines

  4. Update the script as you identify new important metrics

By implementing this health check process, you'll significantly improve your ability to maintain stable, high-performing SQL Server environments.

Comments

Popular posts from this blog

Migrating SQL Server to Azure SQL Database: A Step-by-Step Guide

MS SQL Server Performance Optimization: Best Practices & Tips

Common Causes of Slow Queries in SQL Server and How to Fix Them