Solaris ZFS vs. Linux with Hardware RAID
I’ve had to start using Xen virtualization for a current project we’re working on. I always hate switching back to Linux servers because all of our fancy tools and scripts for automation are written for Solaris since we only have a handful of Linux servers.
At any rate, I’ve got Xen all figured out and really started to dig into Linux’s LVM for the first time. There are some similarities between LVM and ZFS, but most notably, as far as I could tell, LVM doesn’t deal with RAID at all. You have to set up Linux software RAID manually and put a VolumeGroup on the RAID meta-device. So I set up a nice software RAID5 device, created a VolumeGroup, and off I went.
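A minimal sketch of that layering, for anyone following along. The device names (/dev/md0, /dev/sdb through /dev/sde), the volume group name vg_xen, and the logical volume guest1 are all hypothetical, and the script only prints the commands unless DRY_RUN=0 is set:

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 (as root) to actually run them. All names are hypothetical.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

build_lvm_on_md_raid5() {
    # 1. Software RAID5 meta-device across four member disks
    run mdadm --create /dev/md0 --level=5 --raid-devices=4 \
        /dev/sdb /dev/sdc /dev/sdd /dev/sde
    # 2. Turn the meta-device into an LVM physical volume
    run pvcreate /dev/md0
    # 3. Create the volume group on top of it
    run vgcreate vg_xen /dev/md0
    # 4. Carve out a logical volume, e.g. for one Xen guest
    run lvcreate -L 20G -n guest1 vg_xen
}

build_lvm_on_md_raid5
```

With hardware RAID the first step goes away and pvcreate is pointed at whatever block device the controller exposes.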
The write performance was horrendous.
So I begrudgingly went into the RAID controller BIOS and set up hardware RAID5 and put LVM on top of that. After the installation, I decided to see how fast this was compared to ZFS raidz1 (which is more or less RAID5).
The machines are identical:
- Dual 6 Core Opteron
- Sun STK RAID Controller (Adaptec) — 256MB cache, write-back cache mode enabled
- 16 Gigs of memory
Here are the results:
Linux — 21GB Write
# time dd if=/dev/zero of=/root/test bs=10240 count=2009600
2009600+0 records in
2009600+0 records out
20578304000 bytes (21 GB) copied, 146.226 seconds, 141 MB/s
real 2m26.377s
user 0m4.068s
sys 1m53.823s
Linux — 1GB Write
# time dd if=/dev/zero of=/root/test bs=10240 count=102400
102400+0 records in
102400+0 records out
1048576000 bytes (1.0 GB) copied, 2.69437 seconds, 389 MB/s
real 0m2.702s
user 0m0.108s
sys 0m2.584s
Solaris — 21GB Write
# time dd if=/dev/zero of=/zonepool/test bs=10240 count=2009600
2009600+0 records in
2009600+0 records out
20578304000 bytes (21 GB) copied, 55.3566 s, 372 MB/s
real 0m55.412s
user 0m0.913s
sys 0m27.012s
Solaris — 1GB Write
# time dd if=/dev/zero of=/zonepool/test bs=10240 count=102400
102400+0 records in
102400+0 records out
1048576000 bytes (1.0 GB) copied, 1.25254 s, 837 MB/s
real 0m1.257s
user 0m0.046s
sys 0m1.211s
837MB/s for burst writes on raidz1! ZFS is too awesome.
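As a sanity check, the MB/s figures can be recomputed from the byte counts and elapsed times that dd printed. This is only a small sketch; the function name throughput_mbs is made up for illustration, and it assumes dd's convention of 1 MB = 1,000,000 bytes:

```shell
#!/bin/sh
# Recompute throughput in MB/s (1 MB = 1,000,000 bytes) from bytes and seconds.
# Usage: throughput_mbs BYTES SECONDS
throughput_mbs() {
    awk -v b="$1" -v s="$2" 'BEGIN { printf "%.0f\n", b / s / 1000000 }'
}

echo "Linux   21GB: $(throughput_mbs 20578304000 146.226) MB/s"
echo "Linux    1GB: $(throughput_mbs 1048576000 2.69437) MB/s"
echo "Solaris 21GB: $(throughput_mbs 20578304000 55.3566) MB/s"
echo "Solaris  1GB: $(throughput_mbs 1048576000 1.25254) MB/s"
```

The recomputed values match the dd output above, which puts ZFS at roughly 2.6 times the Linux figure for the 21GB write and roughly 2.2 times for the 1GB write.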
Here are the controller configurations:
Linux Controller Configuration
Solaris Controller Configuration
now i’m curious what the write performance is with linux on the sun hardware raid card!
nevermind, i misread your post. is write caching enabled in your adaptec raid card?
@mark
Yes, write-back cache mode is enabled for both. I added the full controller configs to the end of the post as well.
Another item of note is that on the Solaris machine, the array is only split across 4 disks (with 1 as a spare) while the hardware array is split across 5.
you don’t mention the linux filesystem (unless I’m missing it). you might try dd’ing directly out to the raw device to eliminate the fs as a variable if you don’t want to test various examples. also, the block size is a bit small; I’d suggest something more like 32k or 64k.
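The block-size suggestion could be tried with something like the sketch below, which writes the same total number of bytes at 10k, 32k and 64k so the runs stay comparable. The function names and the target path are hypothetical, and it writes to a file rather than a raw device, since dd'ing to a raw device destroys whatever is on it:

```shell
#!/bin/sh
# Number of blocks needed to write TOTAL_BYTES at a given block size.
# Usage: dd_count TOTAL_BYTES BLOCK_SIZE
dd_count() {
    echo $(( $1 / $2 ))
}

# Write the same amount of zeros at several block sizes.
# Usage: bs_sweep TARGET_FILE TOTAL_BYTES
bs_sweep() {
    target=$1
    total=$2
    for bs in 10240 32768 65536; do
        count=$(dd_count "$total" "$bs")
        echo "bs=$bs count=$count"
        # dd prints its own summary (and throughput, on GNU dd) to stderr
        dd if=/dev/zero of="$target" bs="$bs" count="$count"
        rm -f "$target"
    done
}

# Example (hypothetical path), matching the 1GB run from the post:
# bs_sweep /root/test 1048576000
```

Both totals used in the post (1048576000 and 20578304000 bytes) divide evenly by all three block sizes, so no bytes are dropped by the integer division.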
LVM most certainly does both striping and mirroring.
Which filesystem did you test under Linux, ext3, ext4 or xfs?
What storage backend are you using for your VMs?
Did you try testing with a real test suite, like bonnie++?
Did you test from within the VMs?
There are a lot of subtleties to VM I/O and a lot of factors you don’t seem to have considered. I’d venture to say that your results, while perhaps interesting, do not say much about your intended use.
@Oshu
@Mike
This was by no means a real test of throughput. This was just a simple “I wonder what this would do” exercise. I understand that there are tons of variables. This simple test showed me that the I/O throughput on Linux is sufficient for what I needed, so I didn’t need to do anything too exhaustive. Don’t try to read too much into this. The filesystem was ext3, though.
I’ll have to take a deeper look into LVM. I didn’t see any options to stripe/mirror, but I didn’t look incredibly hard. Everything I found referenced putting an LVM volume on top of a software RAID metadevice.
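For reference, striping and mirroring in LVM are options to lvcreate. The sketch below is a dry run that only prints the commands; the volume group name vg_xen, the logical volume names, and the sizes are hypothetical, and the volume group is assumed to contain at least three physical volumes:

```shell
#!/bin/sh
# Dry run by default: commands are printed, not executed.
# Set DRY_RUN=0 (as root) to actually run them. All names are hypothetical.
run() {
    if [ "${DRY_RUN:-1}" = 1 ]; then
        echo "+ $*"
    else
        "$@"
    fi
}

# Usage: lvm_stripe_and_mirror VOLUME_GROUP
lvm_stripe_and_mirror() {
    vg=$1
    # -i 3: stripe across three physical volumes; -I 64: 64 KiB stripe size
    run lvcreate -i 3 -I 64 -L 20G -n striped_lv "$vg"
    # -m 1: keep one extra copy of the data (a two-way mirror)
    run lvcreate -m 1 -L 20G -n mirrored_lv "$vg"
}

lvm_stripe_and_mirror vg_xen
```

Note that plain striping with -i carries no parity, so on its own it is not a substitute for RAID5.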
By the way, ZFS does zero-page write detection, while ext3 and LVM don’t, so ZFS is not actually writing 21GB (or 1GB) to disk in your test. A fairer test would take some fixed data (e.g., from /dev/urandom), store it in memory, and then write it to disk using each filesystem.
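A rough sketch of such a test: generate the random data once into a memory-backed directory, then let dd copy it to the filesystem under test, so that the slow read from /dev/urandom stays out of the measured step. The function name and all paths are hypothetical; /dev/shm (Linux) and /tmp (Solaris) are assumed to be tmpfs, and the staged file has to fit in RAM:

```shell
#!/bin/sh
# Stage non-zero data in memory, then write it to the target filesystem.
# Usage: staged_write_test STAGE_DIR TARGET_FILE SIZE_MB
staged_write_test() {
    stage_dir=$1
    target=$2
    size_mb=$3
    staged="$stage_dir/staged.$$"

    # Untimed: fill the staging file with random data, 1 MiB at a time
    dd if=/dev/urandom of="$staged" bs=1024k count="$size_mb" 2>/dev/null || return 1

    # Measured step: dd prints its own summary (and throughput, on GNU dd)
    dd if="$staged" of="$target" bs=64k
    status=$?

    rm -f "$staged"
    return $status
}

# Examples (hypothetical paths):
# staged_write_test /dev/shm /root/test 1000       # Linux
# staged_write_test /tmp /zonepool/test 1000       # Solaris
```

Running it with the same size on both machines keeps the comparison like-for-like, and the data can no longer be collapsed as all zeros.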
What hard drives did you use in the test, and which Linux filesystem did you use?
The results are quite astonishing for ZFS, but the jump seems too high compared to Linux.