I’m not a DBA, But I Play One on TV: Part 2 – CPU and RAM

In Part 1 I discussed SQL Server and Hard Disk configurations. Now let’s have a look at CPU and RAM. This topic is actually kind of easy. More is better… most of the time.

CPU

In my opinion, most development environments should have a minimum of four 2.5+ GHz processor cores. Whether that’s one socket with four cores or two sockets with two cores each doesn’t really make much of a difference. For a low-utilization production system you’ll need eight 2.5+ GHz cores. Look, you can get this level of chip in a mid- to high-end laptop. If you’re looking at a very high-utilization system, it’s time to think about 16 or 32 processors split across two or more sockets. Once you get to the land of 32 processors, advanced SQL Server configuration knowledge is required. In particular, you will need to know how to tune the MAXDOP (Maximum Degree of Parallelism) setting.

Here’s a great read on setting MAXDOP as a query hint: http://blog.sqlauthority.com/2010/03/15/sql-server-maxdop-settings-to-limit-query-to-run-on-specific-cpu/

And here are instructions for the system-wide setting: http://technet.microsoft.com/en-us/library/ms189094(v=sql.105).aspx

What does this setting do? It controls the maximum number of processors SQL Server will use to run a single query in parallel. So why don’t we want SQL Server to maximize parallelism all the time? Running a query in parallel isn’t free: the query optimizer has to decide which parts of the plan can run in parallel, and at run time SQL Server has to split the work across worker threads, coordinate them, and gather their results back together. In a very highly utilized SQL Server environment that coordination can get bogged down. Think of it like air traffic control at a large airport, except there’s only one controller in the tower and it’s Thanksgiving, the biggest air travel holiday in the US. That one controller has to assign the runway for every plane coming in and going out. Obviously, he or she becomes the bottleneck for the whole airport. If this individual only had one or two runways to work with, they wouldn’t be the bottleneck; the airport architecture would be. I have seen 32-processor systems grind to a halt with MAXDOP set to 0 (the default, which lets SQL Server use every available processor for a single query) because the parallelism coordination was overwhelmed.
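To make the setting a little more concrete, here’s a small Python sketch that picks a server-wide MAXDOP value and builds the T-SQL to apply it. The thresholds are an assumption on my part: they follow Microsoft’s published MAXDOP guidance for SQL Server 2016 and later (cap at 8 on a single NUMA node; on multiple NUMA nodes, stay at or below the processors per node, or use half a node capped at 16 when a node has more than 16). Older versions like 2008 R2 had somewhat different guidance, so treat the output as a starting point for testing, not a rule. The function names are hypothetical; sp_configure and RECONFIGURE are the real SQL Server commands.

```python
def suggest_maxdop(logical_cpus: int, numa_nodes: int = 1) -> int:
    """Suggest a server-wide MAXDOP (assumes Microsoft's SQL Server 2016+ guidance)."""
    if logical_cpus < 1 or numa_nodes < 1:
        raise ValueError("need at least one logical CPU and one NUMA node")
    # Logical processors per NUMA node (never less than one).
    per_node = max(1, logical_cpus // numa_nodes)
    if numa_nodes == 1:
        # Single NUMA node: use every processor up to 8, then cap at 8.
        return min(logical_cpus, 8)
    if per_node <= 16:
        # Multiple NUMA nodes with small nodes: stay within one node.
        return per_node
    # Multiple NUMA nodes with large nodes: half a node, capped at 16.
    return min(per_node // 2, 16)


def maxdop_script(maxdop: int) -> str:
    """Build the T-SQL that applies a server-wide MAXDOP setting."""
    # 'max degree of parallelism' is an advanced option, so expose those first.
    return (
        "EXEC sp_configure 'show advanced options', 1;\n"
        "RECONFIGURE;\n"
        f"EXEC sp_configure 'max degree of parallelism', {maxdop};\n"
        "RECONFIGURE;"
    )


if __name__ == "__main__":
    # Hypothetical box: 32 processors split across two NUMA nodes.
    value = suggest_maxdop(32, numa_nodes=2)
    print(value)
    print(maxdop_script(value))
```

For a single query, the equivalent is the OPTION (MAXDOP n) query hint appended to the statement, which is what the first link above walks through.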

For more information on how SQL Server processes queries in parallel: http://technet.microsoft.com/en-us/library/ms178065(v=sql.105).aspx

RAM

RAM is always a “more is better” situation. Keep in mind that if you don’t set the size and location of the page file manually, the O/S is going to try to take 1.5 times the amount of RAM from the O/S hard drive. The more RAM on the system, the less often the O/S will have to use the much slower page file. For a development system 8GB will probably be fine, but nowadays you can get a mid- to high-end laptop with 16GB, and even 32GB is getting pretty cheap. For production 16GB is the minimum, but I’d really urge you to get 24GB. And like I said, 32GB configurations are becoming very affordable.

I’m not a DBA, But I Play One on TV: Part 1 – Hard Drives

This is the first in a series of posts relating to hardware considerations for a SQL Server 2008 R2 or later server. In Part 1 – Hard Drives I’m going to discuss RAID levels and what works for the Operating System (O/S) versus what works for various SQL Server components.

As a consultant I always go through the same hardware spec dance. It sounds like this:

Q: How much disk space does your application database require?

A: Depends on your utilization.

Q: Ok, what’s the smallest server we can give you for a proof of concept or 30 day trial?

A: Depends on your utilization.

Q: Well, we have this VM with a 40 GB disk, 8 GB RAM, and a dual-core virtual processor available. Will that work?

A: Depends on your utilization, but I seriously doubt it.

SQL Server 2008 R2, depending on the edition, will run on just about any Windows Server O/S from Windows Server 2003 on, as well as Windows 7 and Windows 8. This isn’t really a discussion about the O/S, more about how the O/S services SQL Server hardware requests. At the hardware level the O/S has two main functions: managing memory and the hard disks, and servicing applications’ requests for those resources.

In a later post we’ll look at memory in a little more depth, but for the hard disk discussion we’ll need to understand the page file. The page file has been part of Microsoft’s O/S products since NT, maybe Windows for Workgroups, but I don’t want to go look it up. The page file is an extension of the physical memory that resides on one or more of the system’s hard disks. The O/S decides when to use this portion of the memory available to the services and applications (processes) requesting memory. Often, when a process requires more memory than is currently available, the O/S will use the page file to virtually increase the memory on the system, in a way that is transparent to the requesting process.

Let’s sum that up. The page file is a portion of disk space used by the O/S to expand the amount of memory available to processes running on the system. The implication is that when there isn’t enough chip RAM for the task, the O/S will perform some work meant for lightning-fast chip RAM on much slower hard disk virtual memory. By default the O/S wants to set aside 1.5 times the physical chip RAM in virtual memory disk space. For 16GB of RAM that’s a 24GB page file. On a 40GB drive that doesn’t leave much room for anything else. The more physical chip RAM on the server, the bigger the O/S will want to make the page file, but the O/S will actually access it less often.
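Here’s that page file arithmetic as a quick Python sketch. The 1.5x multiplier is the rule of thumb stated in this post, not a value read from Windows, and the function names are hypothetical.

```python
# Rule of thumb from this post (1.5x RAM), not a value queried from Windows.
PAGE_FILE_MULTIPLIER = 1.5


def default_page_file_gb(ram_gb: float) -> float:
    """Page file size the O/S would set aside under the 1.5x rule."""
    return ram_gb * PAGE_FILE_MULTIPLIER


def space_left_gb(drive_gb: float, ram_gb: float) -> float:
    """Space left on the O/S drive after the page file; negative means it won't fit."""
    return drive_gb - default_page_file_gb(ram_gb)


if __name__ == "__main__":
    for ram in (8, 16, 32):
        print(f"{ram}GB RAM -> {default_page_file_gb(ram):g}GB page file, "
              f"{space_left_gb(40, ram):g}GB left on a 40GB drive")
```

With 32GB of RAM the result goes negative: under this rule the page file alone wouldn’t even fit on that 40GB VM disk, never mind the O/S.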

Now let’s talk RAID settings! You may find voluminous literature arguing the case for software RAID versus hardware RAID. I’ll leave that to the true server scientists. I’m just going to give a quick list of which RAID configurations the O/S and SQL Server components will perform well with and which will cause issues. I’m going for understanding here. There are plenty of great configuration lists you can reference, but if you don’t understand how this stuff works you’re relying on memorization or constantly referencing the lists.

Summarized from: http://en.wikipedia.org/wiki/RAID

But this has better pictures: http://technet.microsoft.com/en-us/library/ms190764(v=SQL.105).aspx

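RAID 0 – Stripes data across multiple disks so they act like one. Capacity is the sum of all the identical disk sizes, and there isn’t any failover or redundancy: one disk dies and all the data on all the drives is lost.

RAID 1 – Mirrors the same data onto every disk so they act like one. Capacity is that of one of the identical disks in the array. Full failover and redundancy.

RAID 2 – Bit-level striping with Hamming-code error correction. Obsolete and essentially never used. Ha!

RAID 3 – Not very popular. Stripes data across the disks byte by byte and keeps parity information on one dedicated disk, so the array can survive the loss of one drive.

RAID 4 – Like RAID 3, but stripes data in blocks instead of bytes, again with one dedicated parity disk. The data disks can service reads independently, but they’re all accessed through one drive letter.

RAID 5 – Requires at least 3 identical drives. Data and parity information are striped across all of the drives, so capacity is that of all but one drive, and the array survives the failure of any one drive. There’s no idle backup disk; the parity is used to rebuild the missing data.

RAID 6 – Like RAID 5, except you need at least 4 identical disks and two disks’ worth of capacity goes to parity, so the array can survive two simultaneous drive failures.

RAID 10 or 1+0 – A tiered approach where RAID 1 mirrored arrays are combined into a RAID 0 striped array. So two fully redundant 500GB RAID 1 arrays, each made up of three 500GB disks, come together to form one 1TB RAID 0 array. Sounds expensive: 3TB in physical disks to get a 1TB accessible drive. (The more common two-disk mirrors cut that to 2TB of physical disks.)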
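The capacity and fault-tolerance trade-offs in the list above boil down to a few simple formulas. This Python sketch covers the levels you’ll actually run into (0, 1, 5, 6, and 10) for an array of identical disks. The function and class names are hypothetical, and for RAID 10 I assume every mirror has the same number of disks (mirror_width).

```python
from dataclasses import dataclass


@dataclass
class RaidSummary:
    usable_gb: float        # capacity the O/S sees
    failures_survived: int  # drive failures the array is guaranteed to survive


def _check(condition: bool, message: str) -> None:
    if not condition:
        raise ValueError(message)


def raid_summary(level: str, disks: int, disk_gb: float, mirror_width: int = 2) -> RaidSummary:
    """Usable capacity and guaranteed fault tolerance for identical disks."""
    if level == "0":
        _check(disks >= 2, "RAID 0 needs at least 2 disks")
        return RaidSummary(disks * disk_gb, 0)  # striping only, no redundancy
    if level == "1":
        _check(disks >= 2, "RAID 1 needs at least 2 disks")
        return RaidSummary(disk_gb, disks - 1)  # every disk holds a full copy
    if level == "5":
        _check(disks >= 3, "RAID 5 needs at least 3 disks")
        return RaidSummary((disks - 1) * disk_gb, 1)  # one disk's worth of parity
    if level == "6":
        _check(disks >= 4, "RAID 6 needs at least 4 disks")
        return RaidSummary((disks - 2) * disk_gb, 2)  # two disks' worth of parity
    if level == "10":
        _check(mirror_width >= 2, "each mirror needs at least 2 disks")
        _check(disks % mirror_width == 0 and disks // mirror_width >= 2,
               "RAID 10 needs at least two equal-sized mirrors")
        # Stripe across the mirrors; one disk per mirror holds unique data.
        return RaidSummary((disks // mirror_width) * disk_gb, mirror_width - 1)
    raise ValueError(f"unsupported RAID level: {level}")


if __name__ == "__main__":
    print(raid_summary("5", 4, 500))
    print(raid_summary("10", 4, 500))                  # two-disk mirrors
    print(raid_summary("10", 6, 500, mirror_width=3))  # three-disk mirrors
```

A RAID 10 array can survive more failures when they land in different mirrors, but only mirror_width - 1 is guaranteed, because losing every disk in one mirror takes the whole stripe down.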

At this point I’ll paraphrase the information found here: http://technet.microsoft.com/en-US/library/cc966534

SQL Server logs are written sequentially, one log record after the other, and SQL Server doesn’t issue random read requests against these files during normal operation. RAID 1 or 1+0 is recommended for this component for two reasons: 1. It keeps a fully redundant copy of the log files for disaster recovery. 2. RAID 1 mirrored drives support the sequential write I/O (I/O is short for disk read and write Input and Output. I’m not going to write that 50 times.) of the log file process better than parity-based RAID configurations such as RAID 5, which split one file over multiple disks and have to update parity on every write.

TempDB is the workhorse of SQL Server. When a query is sent to the database engine, sorts, joins, groupings, and aggregations that are too big to finish in memory spill into TempDB before the results are sent to the requestor, and temporary tables and table variables live there too. This makes TempDB a heavy write I/O process, so the popular recommendation is RAID 1+0. Here’s the consideration: TempDB is temporary, and that’s where it gets its name. SQL Server re-creates it every time the service starts, so redundancy isn’t required for disaster recovery. However, if the disk your TempDB files are on fails, no queries can be processed until the disk is replaced and TempDB is rebuilt. RAID 1+0 helps fast writes and ensures uptime. RAID 5 provides the same redundancy with fewer disks, but slower writes and decreased performance when a disk fails.

TempDB and the logs should NEVER EVER reside on the same RAID arrays, so if we’re talking about a minimum of two RAID 1+0 arrays, it might be more cost-effective to put TempDB on RAID 5.

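Application OLTP (Online Transaction Processing) databases are a good fit for RAID 5, which offers solid read performance and redundancy at a reasonable cost. Keep in mind that every RAID 5 write also has to update parity, so very write-heavy databases may be better served by RAID 1+0. Application databases should NEVER EVER reside on the same arrays as the log files, and co-locating them with TempDB is also not recommended.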
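One way to see why RAID 1+0 keeps coming up for TempDB and the logs, while RAID 5 gets weighed more carefully, is the common “write penalty” rule of thumb: each logical write costs 2 physical I/Os on a mirror (RAID 1 and 1+0), 4 on RAID 5 (read data, read parity, write data, write parity), and 6 on RAID 6, while a read costs 1 everywhere. This Python sketch applies that rule. The penalty table is the usual industry approximation rather than a measurement (controller caches can change the picture a lot), and the names and workload numbers are hypothetical.

```python
# Physical disk I/Os per logical write (common rule of thumb, not a measurement).
WRITE_PENALTY = {"0": 1, "1": 2, "10": 2, "5": 4, "6": 6}


def backend_iops(reads_per_sec: float, writes_per_sec: float, level: str) -> float:
    """Physical disk I/Os per second needed to serve a logical workload."""
    # Each read hits one disk; each write is multiplied by the RAID level's penalty.
    return reads_per_sec + writes_per_sec * WRITE_PENALTY[level]


if __name__ == "__main__":
    # Hypothetical write-heavy workload: 600 reads/s and 400 writes/s.
    for level in ("10", "5", "6"):
        print(f"RAID {level}: {backend_iops(600, 400, level):g} disk I/Os per second")
```

For that workload RAID 1+0 needs 1,400 disk I/Os per second while RAID 5 needs 2,200, which is the extra disk work you pay for RAID 5’s cheaper capacity. A read-only workload costs the same on every level.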

SQL Server comes with other database engine components like the master database and MSDB. These are SQL Server configuration components and mostly utilize read I/O. It’s good to have these components on a mirrored RAID configuration that doesn’t need a lot of write performance, like RAID 1.

A best practice production SQL Server configuration minimally looks like this:

Drive 1: O/S or C: Drive where the virtual memory is also serviced – RAID 1, 80 to 100 GB.

Drive 2: SQL Server Components (master, MSDB, and TempDB) data files – RAID 1+0, 100-240 GB

Drive 3: SQL Server Logs – RAID 1+0, 100-240 GB

Drive 4: Application databases – RAID 5, As much as the databases need…

Where to skimp on a development system? Maybe RAID isn’t available either?

Drive 1: O/S or C: Drive where the virtual memory is also serviced, 80 to 100 GB.

Drive 2: SQL Server Components (master, MSDB, and TempDB) data files and application database files, As much as the databases need…

Drive 3: SQL Server Logs, 100-240 GB

Optimal Production configuration?

Drive 1: O/S or C: Drive – RAID 1, 60 GB.

Drive 2: SQL Server Components (master, MSDB) data files – RAID 5, 100GB

Drive 3: SQL Server Logs – RAID 1+0, 100-240 GB

Drive 4: Application databases – RAID 5, As much as the databases need…

Drive 5: TempDB – RAID 1+0, 50–100 GB

Drive 6: Dedicated page file only – RAID 1, 40GB. You don’t want to see what happens to a Windows O/S when the page file is not available.
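The placement rules from earlier (logs NEVER EVER on the same array as TempDB or the application databases, and application databases preferably not sharing with TempDB) are easy to check mechanically. Here’s a hypothetical Python sketch that takes a drive layout like the ones above and reports rule violations; the component labels, rule table, and function name are made up for illustration.

```python
from typing import Dict, List, Set

Layout = Dict[str, Set[str]]  # drive name -> components stored on that drive

# Pairs of components that shouldn't share a drive, with how strongly.
RULES = [
    ("logs", "tempdb", "NEVER"),
    ("logs", "appdb", "NEVER"),
    ("appdb", "tempdb", "not recommended"),
]


def check_layout(layout: Layout) -> List[str]:
    """Return one message for every placement rule the layout breaks."""
    problems = []
    for drive, components in layout.items():
        for a, b, severity in RULES:
            if a in components and b in components:
                problems.append(f"{drive}: {a} and {b} together ({severity})")
    return problems


if __name__ == "__main__":
    best_practice = {
        "Drive 2": {"master", "msdb", "tempdb"},
        "Drive 3": {"logs"},
        "Drive 4": {"appdb"},
    }
    development = {
        "Drive 2": {"master", "msdb", "tempdb", "appdb"},
        "Drive 3": {"logs"},
    }
    print(check_layout(best_practice))  # []
    print(check_layout(development))
```

The development layout gets flagged for TempDB and the application databases sharing Drive 2, which is exactly the corner that layout deliberately cuts to save money.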

Buffer I/O is the bane of my existence. I have left no rock unturned on the internet trying to figure out how this process works, so if someone reading can leave a clarifying comment for an edit, I’d appreciate it. Here’s what I do know. SQL Server keeps the data pages it reads in its own in-memory cache, the buffer pool. When a query needs a page that isn’t in the buffer pool, SQL Server has to read it from disk, and the time the query spends waiting on that read shows up as Buffer I/O waits in the SQL Server Management Studio Activity Monitor. If your system is low on memory, pages get pushed out of the buffer pool and have to be read back from disk again and again, and if the O/S is also leaning on the page file, SQL Server’s reads and the O/S’s paging compete for the same disks. In fact, if the query result set is big enough, the whole system will begin to die a slow and horrible death, as information can’t move in and out of memory because so much information is going in and out of the page file. This is why I highly recommend splitting up the disks so that SQL Server does not have to fight with the page file for disk I/O.
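To illustrate why low memory turns into Buffer I/O waits, here’s a toy Python model of a page cache. It uses plain least-recently-used (LRU) eviction, which is an assumption for illustration only: SQL Server’s real buffer pool uses its own eviction algorithm and does far more than this. The point is just that the same workload needs many more physical disk reads once the cache is too small to hold the pages it keeps coming back to. The function name and workload are hypothetical.

```python
from collections import OrderedDict
from typing import Iterable


def disk_reads(page_requests: Iterable[int], cache_pages: int) -> int:
    """Count page requests that miss an LRU cache of the given size."""
    cache: "OrderedDict[int, None]" = OrderedDict()
    misses = 0
    for page in page_requests:
        if page in cache:
            cache.move_to_end(page)  # hit: mark as most recently used
            continue
        misses += 1  # miss: the page has to be read from disk
        cache[page] = None
        if len(cache) > cache_pages:
            cache.popitem(last=False)  # evict the least recently used page
    return misses


if __name__ == "__main__":
    # A query that scans the same 100 pages five times.
    workload = list(range(100)) * 5
    print(disk_reads(workload, cache_pages=200))  # 100: everything stays cached
    print(disk_reads(workload, cache_pages=50))   # 500: every request goes to disk
```

Halving the cache here doesn’t double the disk reads, it quintuples them: once the working set no longer fits, every pass over the data goes back to disk.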

Look, if you have 10 records in one table used by one user 2 times a day, that VM with a 40 GB disk, 8 GB RAM, and a dual-core virtual processor is going to do just fine. But you might as well save some cash and move that sucker onto Access or MySQL or some other non-enterprise-level RDBMS.

Sometimes You Gotta Clean That Thing

For my first professional job out of college I tried to scrape together any sort of experience I could put on my resume. I was applying to any IT department that would have me. One of the things on my cover letter was designing and following a hardware maintenance plan that included vacuuming the dust out of my personal computer case. I think back on how ridiculous that seemed at the time.

Just a couple of weeks ago the desktop at my other residence near Chicago, in the vicinity of the industry and steel mills that make up Northwest Indiana, was blue screening frequently. When I pulled it out and set it on the table you could look through the Plexiglas side door and see every surface covered in fuzzy dust.

I brought the dirty thing back to Fort Wayne and opened it up. It was worse than I ever imagined. I filled up the tank on my little Dyson rechargeable vac, and that wasn’t even the real trouble spot. The dust had gotten so caked up behind the CPU fan that it was blocking the airflow through the heat sinks. The situation was so bad the dust had actually baked onto the heat-dissipating fins to the point that it was brittle and had to be chiseled and blown out with compressed air rather than just vacuumed.

Now that I’m older I get to delegate some things, and I think it’s time my son started working on his technical experience.