MSSQL Connection Timeouts & Resource Exhaustion

1. Executive Summary

Your server experienced connection timeouts because the number of active database connections (1,745) exceeded the Linux operating system's default Open File Limit (ulimit) of 1,024. When this limit is reached, the OS prevents the SQL Server service from opening new network sockets, causing all subsequent connection attempts to hang or time out.


2. Diagnostics (How to verify)

To confirm if you are hitting these limits in the future, use the following commands:

  • Check Active Connections (OS Level): netstat -an | grep 1433 | wc -l (A high count, e.g., >1000, indicates a leak).

  • Check Connection States: netstat -an | grep 1433 | awk '{print $6}' | sort | uniq -c (High "ESTABLISHED" counts suggest application-level leaks).

  • Check OS Limits: ulimit -Sn (The "Soft Limit" is the ceiling that causes the timeout).


3. Immediate Resolution (Increasing Limits)

To allow MSSQL Enterprise to utilize its full capacity, you must increase the file descriptor limit for the service.

  1. Modify the Service Configuration:

    Bash

    sudo systemctl edit mssql-server
    
  2. Add the following configuration:

    Ini, TOML

    [Service]
    LimitNOFILE=64000
    
  3. Apply Changes:

    Bash

    sudo systemctl daemon-reload
    sudo systemctl restart mssql-server
    

    This increases the ceiling from 1,024 to 64,000, preventing immediate timeouts.


4. Root Cause: Addressing Connection Leaks

Increasing the limit is a temporary fix. 1,745 connections is abnormal for most workloads and indicates that the application is not closing connections properly.

  • Identify the Source IP: Use ss -atp | grep :1433 to find which remote server is opening excessive connections.

  • Code Review: Ensure all database connections in your application code use "Dispose," "Close," or "Using" blocks.

  • SQL Cleanup: Run the following query to identify "Sleeping" sessions that have been idle for a long time:

    SQL

    SELECT session_id, login_name, host_name, status 
    FROM sys.dm_exec_sessions 
    WHERE status = 'sleeping' AND is_user_process = 1;
    

    If sessions are 'sleeping' but not 'running', the application has forgotten to close them.


5. Enterprise Best Practices

  • Monitor Threadpool: In MSSQL Enterprise, monitor sys.dm_os_schedulers to ensure active_workers_count isn't reaching its limit, which can happen if connections continue to climb.

  • Connection Pooling: Adjust your application's connection string to set a Max Pool Size (default is 100) to prevent a single application instance from overwhelming the server.

Updated on