Let’s say you are trying to determine the performance impact on an application of a neat database design change you have just devised. So you run some tests with the existing design, and the tests run for several hours. Coming back the next day, you make the change and re-run the same tests. The test results look fantastic. Now, before you jump up and down announcing to the world how great your new design trick is, double-check whether your change is the only variable responsible for the performance improvement.
If you are using a Storage Area Network (SAN) and storage is a significant factor in your tests, there is a danger that your conclusion may be built on shifting sand, because the performance of the drives provisioned from the SAN may have changed between your tests. Unbeknownst to you, the test results may primarily reflect not your design change, but some uncontrolled change inside the SAN.
I ran into a similar situation when I was checking the performance impact of disk partition alignment (or misalignment). The good thing is that the SAN change didn’t happen between tests of different configurations, but took place while I was repeating the same tests, so I caught it right away. The following chart shows that when I ran exactly the same SQLIO benchmark tests for the 3rd and 4th time, the I/O performance profile of the drive changed dramatically.
[Chart: SQLIO benchmark results showing the drive’s I/O performance profile shifting between the 3rd and 4th runs]
Your SAN performance may shift underneath you for many reasons, some good and some not so good. You can try to cozy up to your SAN folks, but that would not guarantee you inside knowledge of every change. Nor would you want to know all the nitty-gritty details of what goes on inside the SAN.
The best way to keep this situation from misleading you into an embarrassingly wrong conclusion is to randomize your test schedule. Alternate between the different test configurations as you carry out your tests.
Say you are testing the performance difference between config 1 and config 2. Instead of doing all the config 1 tests on day one and all the config 2 tests on day two, schedule your tests so that config 1 tests are interleaved with config 2 tests throughout the two days. If that’s not possible, conduct all your config 1 tests followed by all your config 2 tests as usual, but before you draw any conclusion, repeat some of your config 1 tests to verify that the results you obtained previously are still repeatable.
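If it helps to see the idea concretely, here is a minimal Python sketch of a randomized, interleaved schedule. The script names run_config1.cmd and run_config2.cmd are hypothetical placeholders for whatever actually drives your benchmark (a batch file that launches SQLIO with a fixed set of parameters, for instance); the only point is shuffling the run order.

```python
# Minimal sketch: randomize the order of config 1 and config 2 benchmark runs
# so that a SAN-side performance shift hits both configurations roughly equally
# instead of silently skewing one of them.
import random
import subprocess

RUNS_PER_CONFIG = 5

# The .cmd scripts below are hypothetical placeholders for your real test driver.
test_runs = (
    [("config 1", ["run_config1.cmd"])] * RUNS_PER_CONFIG
    + [("config 2", ["run_config2.cmd"])] * RUNS_PER_CONFIG
)

random.shuffle(test_runs)  # interleave instead of "config 1 on day one, config 2 on day two"

for label, cmd in test_runs:
    print(f"Starting {label} run: {' '.join(cmd)}")
    subprocess.run(cmd, check=True)  # each run logs its own results; collect them afterwards
```

Even a crude interleave like this beats a tidy two-day split, because whatever the SAN does on day two no longer lines up neatly with exactly one configuration.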
Now, don’t get me wrong. I think SAN has been a boon to our industry. It pushes storage down the infrastructure stack and abstracts us away from the nasty details of its management. The adoption of SAN moves us closer to the grand vision of utility computing (okay, not that we are real close to that vision).
Today, many enterprises have deployed a SAN of one kind or another, and some large enterprises rely exclusively on SAN for storage provisioning. You just have to learn to live with its idiosyncrasies.