(updated) Distributed File System benchmark
Note: this is an update to my previous test
I'm investigating various distributed file systems (loosely termed here to include SAN-like solutions) for use in Docker, Drupal, etc., and couldn't find recent benchmark stats for some popular solutions, so I figured I'd put one together.
Disclaimer: This is a simple benchmark test with no optimization or advanced configuration so the results should not be interpreted as authoritative. Rather, it's a 'rough ballpark' product comparison to augment additional testing and review.
My Requirements:
- No single-point-of-failure (masterless, multi-master, or automatic near-instantaneous master failover)
- POSIX-compliant (user-land FUSE)
- Open source (non-proprietary)
- Production ready (version 1.0+, self-proclaimed, or widely recognized as production-grade)
- New GA release within the past 12 months
- Ubuntu-compatible and easy enough to set up via CloudFormation (for benchmark testing purposes)
Products Tested:
- GlusterFS 3.7.6 [2015-11-09] ('replicated volume' configuration) - CloudFormation script
- LizardFS 3.9.4 [2015-12-09]
- XtreemFS 1.5.1 [2015-03-12] - couldn't get write-replication (WqRq or WaR1) to work ("Input/output error") - CloudFormation script
- CephFS 9.2.0 [2015-11-06]
- SheepFS 0.9.3 [2015-11-05] - can't write any files to the client-mounted folder ("cannot touch 'text.txt': Function not implemented") - CloudFormation script
- SXFS 2.0 [2015-12-15]
Products Excluded (and why):
- Bazil is not production ready
- BeeGFS server-side components are not open source (EULA, Section 2)
- Behrooz (BFS) is not production ready
- Chirp/Parrot does not have a *.deb package
- Gfarm version 2.6.8 compiled from source kept returning x.xx/x.xx/x.xx for gfhost -H for any non-local filesystem node (and in general the documentation and setup process was terrible)
- GPFS is proprietary
- Hadoop's HDFS is not POSIX-compliant
- Lustre does not have a *.deb package and requires a patched kernel
- MaggieFS has a single point of failure
- MapR-FS is proprietary
- MooseFS only provides high availability in their commercial professional edition
- ObjectiveFS is proprietary
- OpenAFS kerberos requirement is too complex for CloudFormation
- OrangeFS is not POSIX-compliant
- Ori latest release Jan 2014
- QuantcastFS has a single point of failure
- PlasmaFS latest release Oct 2011
- Pomegranate (PFS) latest release Feb 2013
- S3QL does not support concurrent mounts and read/write from multiple machines
- SeaweedFS is not POSIX-compliant
- Tahoe-LAFS is not recommended for POSIX/fuse use cases
- TokuFS latest release Feb 2014
AWS Test Instances:
- Ubuntu 14.04 LTS paravirtual x86_64 (AMI)
- m1.medium (1 vCPU, 3.75 GB memory, moderate network performance)
- 410 GB hard drive (local instance storage)
Test Configuration:
Three master servers were used for each test, run with 2, 4, and 6 clients. Each client ran a small amount of background disk activity (file creation and updates) via this cron entry:
(crontab -l ; echo "* * * * * ( echo \$(date) >> /mnt/glusterfs/\$(hostname).txt && echo \$(date) > /mnt/glusterfs/\$(hostname)_\$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 25 | head -n 1).txt )") | sort - | uniq - | crontab -
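Expanded for readability, the job each client runs once a minute is just the cron entry above with the escaping removed: it appends a timestamp to a per-host file and then creates a new small file with a random 25-character name:
echo "$(date)" >> /mnt/glusterfs/$(hostname).txt
echo "$(date)" > /mnt/glusterfs/$(hostname)_$(cat /dev/urandom | tr -dc 'a-zA-Z0-9' | fold -w 25 | head -n 1).txt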
Results of the three tests were averaged. Benchmark testing was performed with bonnie++ 1.97 and fio 2.1.3.
Example Run:
$ sudo su -
# apt-get update -y && apt-get install -y bonnie++ fio
# screen
# bonnie++ -d /mnt/glusterfs -u root -n 1:50m:1k:6 -m 'GlusterFS with 2 data nodes' -q | bon_csv2html >> /tmp/bonnie.html
# cd /tmp
# wget -O crystaldiskmark.fio http://www.winkey.jp/downloads/visit.php/fio-crystaldiskmark
# sed -i 's/directory=\/tmp\//directory=\/mnt\/glusterfs/' crystaldiskmark.fio
# sed -i 's/direct=1/direct=0/' crystaldiskmark.fio
# fio crystaldiskmark.fio
Translation: "Login as root, update the server, install bonnie++ and fio, then run the bonnie++ benchmark tool in the GlusterFS-synchronized directory as the root user using a test sample of 1,024 files ranging between 1 KB and 50 MB in size spread out across 6 sub-directories. When finished, send the raw CSV result to the html converter and output the result as /tmp/bonnie.html. Next, run the fio benchmark tool using the CrystalDiskMark script by WinKey referenced here."
Important Notes:
1. Only GlusterFS and LizardFS could complete the intense multi-day bonnie++ test. The others failed with these errors:
   - CephFS (both kernel and fuse)
     - Can't write block.: Software caused connection abort
     - Can't write block 585215.
     - Can't sync file.
   - SXFS
     - Can't write data.
2. bonnie++ sequential and random create results (in seconds) for the two file systems that completed the test:

| | Seq Create (sec) | Rand Create (sec) |
|---|---|---|
| GlusterFS 3.7.6 | 173 | 164 |
| LizardFS 3.9.4 | 3 | 3 |
3. GlusterFS took at least twice as long as LizardFS to complete the bonnie++ tests (literally 48 hours!). Switching to xfs out of curiosity helped performance significantly (less than 24 hours); however, all reported tests were done with ext4 (the Ubuntu default). A rough sketch of the xfs setup follows these notes.
4. CephFS did not complete the "Rand-Read-4K-QD32" fio test
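For anyone repeating the xfs comparison from note 3, a rough sketch of formatting a data disk as xfs before setting up the file system (the device name /dev/xvdb and mount point /data/brick below are illustrative, not taken from my scripts):
# apt-get install -y xfsprogs
# mkfs.xfs -f /dev/xvdb
# mkdir -p /data/brick
# mount -t xfs /dev/xvdb /data/brick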
Results:
(Note: raw results can be found here)
_______________________________________________________
Concluding Remarks:
- Since GlusterFS and LizardFS were the only ones that could complete the more intense bonnie++ test, I would feel more confident recommending them as "production ready" for heavy, long-term loads.
- Also (as mentioned above), LizardFS was much faster than GlusterFS (at the cost of higher CPU usage).
- In terms of setup and configuration, GlusterFS was easiest, followed by LizardFS, then SXFS, and finally (in a distant last place) CephFS.
- SXFS shows promise but they'll need to simplify their setup process (especially for non-interactive configuration) and resolve the bonnie++ failure.
- My overall recommendation is currently ~~LizardFS~~ GlusterFS. (Update: I have stopped recommending LizardFS because metadata HA is not currently supported out of the box -- see comments below).
Comments:
Thank you for sharing your tests.
You're welcome, Mateus!
Greatly appreciate these
thanks for these, pretty informative
Thanks very much for not only providing some nice, neutral results but sharing the exact methods needed to reproduce this for others.
Hello Mr. Blue Coat! Thank you for sharing these results! :)
ReplyDeleteYou say that you use a "410 GB hard drive (local instance storage)" but the CloudFormation script for CephFS doesnt seem to have that configured. I'm sorry if this sounds noobish, but how do you set up such a thing in a cloudformation script?
Thanks again! :)
Hi Chris, the 410 GB hard drive is the default with my selected AWS instance: https://gist.github.com/anonymous/f9e36edf2c341db4d8c3#file-cephfs-json-L54-L64 You don't have to configure additional drive space, it just comes with it by default.
Is it possible to share performance results between LizardFS and MooseFS?
Thanks.
MooseFS is a commercial product and I don't have a license. Sorry.
MooseFS has two versions of the software: one is completely open source (GPLv2 license) and needs no license, and the other is commercial and requires one. The only difference between the two is that the commercial version comes with built-in HA support for the master server, while in the open source version you need to configure it yourself using corosync or ucarp, etc.
You can try comparing LizardFS and the open source version of MooseFS; LizardFS is a fork of the open source version of MooseFS.
Link to the open source MooseFS:
https://moosefs.com/download/sources.html
Hi Kiran, this guide is intended for busy and new sysadmins that simply want a single free HA product that meets their needs. My goal is not academic research-grade completeness by combining and configuring a suite of technical tools. Feel free to use my scripts above, though, and perform that additional study.
Thank you for this work, this is very useful and informative! I've been considering LizardFS myself; however, after chatting with one of their reps today I found that both the Windows Native Client and the Metadata HA mechanism are closed source and provided only as part of a commercial agreement. Both are important for my use case as we have a mixed Windows/Linux environment, and Metadata HA is kind of a basic requirement :)
I do wonder, how did you set up HA for the metadata servers? From a brief review of the CloudFormation script I see that you have 3 Masters: Master1 (Personality=master) and Master2+3 (Personality=shadow).
With such a configuration, the failover is not automatic; in case Master1 dies, one of the Shadow Masters needs to be promoted manually (as was explained to me today).
I guess something like corosync could be used to handle this, but I do wonder how you've managed this.
Thanks!
Danny
Good point Danny! It seems from https://github.com/lizardfs/lizardfs/issues/266#issuecomment-93455849 and https://github.com/lizardfs/lizardfs/issues/299 and https://github.com/lizardfs/lizardfs/issues/326 that Metadata HA is not reliable or requires additional tooling. I'll update my recommendation to prefer GlusterFS. Thanks!
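For reference, a rough (untested) sketch of the manual promotion Danny describes, assuming the default LizardFS config location /etc/mfs/mfsmaster.cfg, run on the shadow master being promoted:
# sed -i 's/^PERSONALITY.*/PERSONALITY = master/' /etc/mfs/mfsmaster.cfg
# mfsmaster reload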
ReplyDeleteAnother consideration is RozoFS. According to Wikipedia, it seems to fit most of your requirement:
ReplyDelete- No single-point-of-failure: Normally, the exportd is the SPOF (http://rozofs.github.io/rozofs/develop/AboutRozoFS.html#rozofs-data-protection), but the replicated metaservers provide fail over (http://rozofs.github.io/rozofs/develop/AboutRozoFS.html#rozofs-fundamentals).
- POSIX-compliant: Yes
- Open source: GNU General Public License v2
- Production ready: 2.5.1
- New GA release within the past 12 months: 26 February 2018
- Ubuntu-compatible and easy enough to set up via CloudFormation: I have no idea. :(
RozoFS is no longer open source: https://github.com/rozofs