Two months ago, one of my clients called to tell me that his DB server was not responsive.
Their server is an HP DL580 with these specs:
- CPU: 4 x 12 cores
- Memory: 256 GB
- Storage: 2 x RAID 10 (8 x 600 GB SAS 15K)
- OS: Windows Server 2012 R2
- DBMS: SQL Server 2014 SP2
The situation was strange:
- The server still responded to ping
- IIS, SSMS and ODBC Data Sources could not connect to SQL Server
- RDP was not responsive
We decided to call the data center and ask them to restart the server by powering it off. After the restart, I started investigating what had happened, why, and when, using Event Viewer and the SSMS log file viewer. I found an Event ID 129 logged right before the SQL Server services stopped.
So I set out to find what this event means and what we could do to resolve it.
I found this MSDN blog post:
I applied its solution, but we had the same problem again a week later.
Then, I found this one:
This one helped me a lot in understanding what usually happens in the layered Windows I/O structure. From it, I concluded that the problem came from the RAID controller. We asked the hardware unit to replace it with a new one, and we have not had any issues since the replacement.
Based on this article: if you are seeing Event ID 129 errors in your event logs, you should start investigating the storage and fiber network.
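If you want to check quickly whether a server has logged this event, the following is a minimal sketch rather than a definitive tool. It assumes Python is available on the Windows server and shells out to the built-in `wevtutil` utility to list the newest System log entries with Event ID 129. The function names `build_event_query` and `recent_events` are my own, chosen for illustration. The query filters by event ID only, so other providers that also use ID 129 may show up in the results; check the source of each entry.

```python
import subprocess


def build_event_query(event_id=129, log_name="System", max_events=20):
    """Build a wevtutil command that lists the newest events with the given ID.

    Hypothetical helper for illustration; it only assembles the arguments.
    """
    # XPath filter on the event ID inside the <System> element of each event.
    xpath = f"*[System[(EventID={event_id})]]"
    return [
        "wevtutil", "qe", log_name,  # qe = query-events
        f"/q:{xpath}",               # filter expression
        "/f:text",                   # human-readable output instead of XML
        f"/c:{max_events}",          # maximum number of events to return
        "/rd:true",                  # reverse direction: newest events first
    ]


def recent_events(event_id=129, log_name="System", max_events=20):
    """Run the query on the local machine and return the raw text output."""
    result = subprocess.run(
        build_event_query(event_id, log_name, max_events),
        capture_output=True,
        text=True,
        check=True,  # raise CalledProcessError if wevtutil reports a failure
    )
    return result.stdout
```

Calling `print(recent_events())` on the server prints the matching entries with their timestamps, which you can then compare with the time the SQL Server services stopped.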