When running ad-hoc queries against a production database it is paramount that a) you guard against any unintended data modifications, and b) you ensure that your “footprint” is as small as possible in order to guard against interferring with the normal operation of the server. It is to this end that I intend on starting a list of best practices for this scenario. Ultimately, best practice is to NEVER run ad-hoc queries against a production machine, but we all do it from time to time, usually out of necessity, sometimes out of sloppiness. So for those times (and i mean the necessary times, not the sloppy times), please please please follow these small tips.
Please feel free to add to this list:
a) Use the WITH NOLOCK query hint on all select statements. (For instance, SELECT TOP 100 <fieldlist> FROM AuditTable WITH NOLOCK)
Description: The WITH NOLOCK hint tells the SQL engine to NOT place any locks on the rows/tables/database when it performs the query. It also tells the SQL engine to read the table as-is – this means that it will read transactions that are in-progress, and could result in phantom rows or a “non-repeatable read” (look it up if you care about the details). WITH NOLOCK is synonymous with changing the transaction isolation level to READ UNCOMMITTED, but is much simpler to type out, so recommended for ad-hoc querying. While this may comprimize the integrity of your search results in high-transaction systems, it also means that it will never lock or block any other transactions. My guess is that in 99.9% of the ad-hoc querying that you will be doing on a transactional production system, you are more concerned about not messing with the live system than you are about ensuring that the data you’re retrieving is currently being modified, and thus may not be entirely accurate to-the-second.
b) Use the TOP keyword in your SELECT statements to minimize the data transfer and the disk reads required.
If you are querying a table of any significant size (and production systems often contain tables with millions or billions of rows), please use TOP to limit the scope of your select statement. Usually an ad-hoc query is trying to find out something specific, so don’t go running broad select statements that will clog a finely-tuned production system. Be as specific as possible, and only return the rows you need. For instance, if you’re looking through an audit log for the most recent person to modify a particular piece of data, adding TOP 1 to “SELECT UserId FROM AuditTable WHERE DataId=x ORDER BY [Date] DESC” could result in a HUGE improvement in query speed. In fact, if your [Date] column is indexed, especially if it has a clustered index, using TOP will prevent the server from even having to read or consider the rest of the rows! Comparing best-case scenario (clustered index on [Date]) to worst-case scenario (no indexes at all), the server would only have to read one record, instead of reading every record in the table, sort them all, and return them all. Give your database server a break. Use TOP (and indexes, but that’s a different topic) 🙂