02 January 2013

Recommended actions for corrupt or suspect databases

Overview
Encountering a suspect database or corruption in a database is a rare thing. It can happen, however, most often due to faulty hardware or operational mistakes (like deleting a transaction log file).

More information
The points below are recommendations for handling a situation where you have some type of corruption in a database or if the database goes into suspect status. 



Details
  1. Ensure you have a backup strategy that you can use to recover from hardware failures (including corruption). I recommend performing both database and log backup in most situations.
  2. Look in the ERRORLOG file for for your SQL Server. This will tell you what the problem is, if SQL Server encountered the problem during startup. Like for instance a missing data or log file. Don't skip this step. Say you have a problem with your car, and somebody can tell you exactly what the problem is. Wouldn't you like to know that before you start repairing the car? Same thing applies here.
  3. Do SELECT name, database_id, state_desc FROM sys.databases. State will tell you a bit more of what is the problem. You will typically see RECOVERY_PENDING if a database missing and hindering SQL Server from performing automatic recovery at startup. Some versions back, we would instead see suspect status for the database. For these situations (database file missing), you are likely not able to perform DBCC CHECKDB and if it a log file which is missing you will not be able to perform the last log backup.
  4. If you can, run DBCC CHECKDB against the database: Search Books Online and the Net for the error numbers returned. There might be specific recommendations for your error messages. The NO_INFOMSGS option of DBCC is helpful, it makes DBCC return error messages only. Example:

    DBCC CHECKDB(dbname) WITH NO_INFOMSGS
  5. Find out why this happened. Check errorlog, eventlog, do HW diagnostics etc.; search Books Online and KB for those errors.
  6. If there is a hardware problem, ensure the faulty hardware is replaced.
  7. Backup the log, if possible. This is what we sometimes call "tail-log backup", where I like to call it "the last log backup". This assumes that log backup schedule is in place. If the database is suspect or RECOVERY_PENDING, you must use the NO_TRUNCATE option for the BACKUP command. Also, you might want to do a file backup of the mdf and ldf files, for extra safety.
    • Restore is the best thing to do now. If you managed to backup log in above step, then you will most probably have zero data loss. Then restore the latest clean database backup and the subsequent log backups, including the one taken in above step. As of SQL Server 2005, we have page level restore, meaning that we can restore only the damaged pages (instead of a full backup) and then the subsequent log backups.
    • If the database isn't suspect, then DBCC CHECKDB with a REPAIR option might be a secondary option but this will often result in loss of data. Additional solutions, depending on the errors, may be to manually rebuild non-clustered indexes, manually drop and reload a table if the data is static, and so on. In my opinion, these options are for the more experienced SQL Server DBA. If you feel uncertain, I suggest you get help.

If you feel uncertain with above steps, I recommend letting MS Support hand-hold you through the steps appropriate for your particular situation. Also, Paul Randal is an authority on this topic. Check his blog for great information, examples etc.

No comments: