Distributed File System benchmark
See my updated post here: http://mrbluecoat.blogspot.com/2016/02/updated-distributed-file-system.html
I'm investigating various distributed file systems (loosely defined here to include SAN-like solutions) for use with Docker, Drupal, etc., and couldn't find recent benchmark stats for some popular solutions, so I figured I'd put one together.
Disclaimer: This is a simple benchmark test with no optimization or advanced configuration so the results should not be interpreted as authoritative. Rather, it's a 'rough ballpark' product comparison to augment additional testing and review.
My Requirements:
- No single-point-of-failure (masterless, multi-master, or automatic near-instantaneous master failover)
- POSIX-compliant (user-land FUSE)
- Open source (non-proprietary)
- Production ready (version 1.0+, self-proclaimed, or widely recognized as production-grade)
- New GA release within the past 12 months
- *.deb package and easy enough to set up via CloudFormation (for benchmark testing purposes)
Products Tested:
- GlusterFS 3.6.2 [2015-01-27] ('replicated volume' configuration) - CloudFormation script (Ubuntu version)
- LizardFS 2.5.4 [2014-11-07]
- XtreemFS 1.5.1 [2015-03-12] - couldn't get the product to work ("Input/output error") - CloudFormation script (Ubuntu version)
Products Not Tested (and why):
- Bazil is not production ready
- BeeGFS server-side components are not open source (EULA, Section 2)
- Behrooz (BFS) is not production ready
- CephFS -- will be included in next round of testing
- Chirp/Parrot does not have a *.deb package
- Gfarm version 2.6.8 compiled from source kept returning x.xx/x.xx/x.xx for gfhost -H for any non-local filesystem node (and in general the documentation and setup were terrible)
- GPFS is proprietary
- Hadoop's HDFS is not POSIX-compliant
- Lustre does not have a *.deb package
- MaggieFS has a single point of failure
- MapR-FS is proprietary
- MooseFS only provides high availability in their commercial professional edition
- ObjectiveFS is proprietary
- OpenAFS's Kerberos requirement is too complex for CloudFormation
- OrangeFS is not POSIX-compliant
- Ori latest release Jan 2014
- QuantcastFS has a single point of failure
- PlasmaFS latest release Oct 2011
- Pomegranate (PFS) latest release Feb 2013
- S3QL does not support concurrent mounts and read/write from multiple machines
- SeaweedFS is not POSIX-compliant
- SheepFS -- will be included in next round of testing
- SXFS -- will be included in next round of testing
- Tahoe-LAFS is not recommended for POSIX/fuse use cases
- TokuFS latest release Feb 2014
AWS Test Instances:
- Debian 7 (wheezy) paravirtual x86_64 (AMI)
- m1.medium (1 vCPU, 3.75 GB memory, moderate network performance)
- 410 GB hard drive (local instance storage)
Test Configuration:
Two master servers were used in each test, and each product was run with 2, 10, and 18 clients; the results of the three tests were averaged. Benchmarking was performed with bonnie++ 1.96 and fio 2.0.8.
Example Run:
$ sudo su -
# apt-get update -y && apt-get install -y bonnie++ fio
# screen
# bonnie++ -d /mnt/glusterfs -u root -n 4:50m:1k:6 -m 'GlusterFS with 2 data nodes' -q | bon_csv2html >> /tmp/bonnie.html
# cd /tmp
# wget -O crystaldiskmark.fio http://www.winkey.jp/downloads/visit.php/fio-crystaldiskmark
# sed -i 's/directory=\/tmp\//directory=\/mnt\/glusterfs/' crystaldiskmark.fio
# sed -i 's/direct=1/direct=0/' crystaldiskmark.fio
# fio crystaldiskmark.fio
Translation: "Login as root, update the server, install bonnie++ and fio, then run the bonnie++ benchmark tool in the GlusterFS-synchronized directory as the root user using a test sample of 4,096 files (4*1024) ranging between 1 KB and 50 MB in size spread out across 6 sub-directories. When finished, send the raw CSV result to the html converter and output the result as /tmp/bonnie.html. Next, run the fio benchmark tool using the CrystalDiskMark script by WinKey referenced here."
Results:
(Note: raw results can be found here)
_______________________________________________________
Concluding Remarks:
Both GlusterFS and LizardFS had strong showings, with pros and cons for each, and both should work fine for production use. While not an endorsement, I will mention that GlusterFS had more consistent results (fewer spikes and outliers) across the tests, and I also like the fact that GlusterFS doesn't distinguish between master servers (master-master peers versus LizardFS's master-shadow [slave] configuration).
Update: GlusterFS requires your number of bricks to be a multiple of the replica count. This adds complexity to your scaling solution. For example, if you want two copies of each file kept in the cluster you must add/remove bricks in multiples of two. Similarly, if you want three copies of each file kept in the cluster you must add/remove bricks in multiples of three. And so on. Since they recommend one brick per server as a best practice, this will also likely add cost to your scaling solution. For this reason, I'm now preferring LizardFS over GlusterFS since it does not impose that limitation.
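To illustrate with the standard GlusterFS CLI (the hostnames and brick paths below are made-up placeholders):

# a 2-way replicated volume: the brick count (2) matches the replica count
gluster volume create gv0 replica 2 server1:/data/brick1/gv0 server2:/data/brick1/gv0
gluster volume start gv0
# expanding a replica-2 volume requires adding bricks two at a time
gluster volume add-brick gv0 server3:/data/brick1/gv0 server4:/data/brick1/gv0
# adding a single odd brick to this volume would be rejected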
P.S. Check out this related article by Luis Elizondo for further reading on Docker and distributed file systems.
Comments:
Maybe you should check release dates before claiming there have been no updates for some time?
Hi Bernd, I tried to find release dates for each product before making the claims above but I'm not perfect (and software updates happen all the time) so please feel free to suggest corrections. Thanks!
I'm biased here, as it is my previous work, but there are in any case quite frequent updates to BeeGFS, e.g. the changes in 2014.01-r14 (release date: 2015-03-23)
The problem, as ever, seems to be the version scheme - the date and the month specify the major update date (so far, network-protocol-incompatible updates), but minor releases (which can also be full of new features) are given by -rXY
Sorry, I mean the "Year and the month"...
Oh, I see. I'll update the entry for BeeGFS to remind me to check the Changelog date next time. Thanks again!
Unfortunately, BeeGFS is not fully open source: http://www.beegfs.com/wiki/FAQ#open_source
Yes, sorry, that is unfortunately true.
LizardFS is a fork of an *old* version of MooseFS (1.6). LizardFS 2.5.x and 2.6.x = MooseFS 1.6.x (e.g. in performance).
There is a *huge* difference in performance between MooseFS 2.0 and 1.6. Also, a lot of algorithms (e.g. the rebalance algorithm) have been improved since 1.6.
MooseFS is not dead software. It is updated more frequently than LizardFS, so your sentence "MooseFS was forked by LizardFS (tested above)" is unfair to MFS, because it suggests that MFS is not being developed anymore and that LizardFS has replaced it, which is not true.
You can find a lot of info about the new version, download the source code, etc. at https://moosefs.com (not .org).
Hi oxide94, thanks for the update! Unfortunately, MooseFS 2.x or 3.x will not work for my needs because I need an open source product *with* high availability. I've updated my MooseFS reasoning above.
Ok, I see ;)
ReplyDeleteThere has been a lot of GfarmFS releases since the Apr-13 date you mentioned above, in fact as recent as Aug-15:
http://sourceforge.net/projects/gfarm/files/?source=navbar
Hi Jane, does gfarm2fs (the fuse client) have a *.deb package?
https://packages.debian.org/source/wheezy/gfarm looks pretty outdated
ReplyDeleteTry this:
https://answers.launchpad.net/debian/+source/gfarm/+changelog
Hi Jane, I'll do a follow-up test soon using Ubuntu 15.10
ReplyDeleteSorry, Jane, but after wrestling with Gfarm for a couple days I gave up because the documentation is awful, there are missing critical files, and there's too much manual process (copy shared secret to hostb, copy gfarm2.conf to hostb, run the following command on hosta, etc.) to automate via CloudFormation. If you have a working CloudFormation template with master-slave metatdata synchronization I'll include it in future tests.
ReplyDeleteHello, could you please update the tests?
https://lizardfs.com/release-of-lizardfs-3-9-2/
https://lizardfs.com/download/
Hi Mateus, thanks for the heads up! LizardFS jumped from 2.6.0 to 3.9.2? Weird. Either way, I'll carve out some time soon to update the tests since technically LizardFS has moved from 2.x to 3.x
Hi Mateus, looks like 2.6.0 is the latest version in the Ubuntu repositories: http://packages.ubuntu.com/search?keywords=lizardfs&searchon=names&suite=wily&section=all
ReplyDeleteLet me know when that changes and I'll re-run the tests.
Would be nice to see a comparison to GPFS or Lustre. Despite the reasons you mention for not including them, they are probably the two most widely-used enterprise clustered file systems. Performance numbers from one or both of these would provide a good baseline for comparison, since most storage professionals are pretty familiar with the two products. Also, any notes you have on documentation (like you have in the comments for Gfarm) and support experiences you've had are valuable. While GPFS is proprietary, it's very stable, has excellent documentation, scales easily, and offers solid support avenues (phone (paid) and the GPFS forum).
Full disclosure, I've run GPFS for years both as a customer and employee of IBM. I now work for another company and am looking at testing different products for service offerings. The kind of testing you're doing here is really helpful.
Hi Doophy,
I too would like to expand the result set to more products and was disappointed that my fairly straightforward requirements didn't produce more matching options.
Regarding your suggestions (and thank you for your full disclosure), I generally distrust commercial offerings in this sector since I believe they're overpriced and overcomplicated, and obtaining GPFS would be difficult since IBM doesn't appear to offer it as a trial. Furthermore, there appear to be issues getting it to work on Debian/Ubuntu (which the Docker community favors). For example, see: https://help.ubuntu.com/community/SettingUpGPFSHowTo
Lustre, on the other hand, would be nice to include. I'm rather surprised they don't support Debian considering their product popularity and developer community size, but I guess they don't feel it's a priority. I'll do some research on how feasible it would be to convert and install via CloudFormation.
As for tech notes, I agree I could do better at highlighting the hidden dragons and tricky setup gotchas that are buried in the CloudFormation scripts. Since this posting seems to be gaining in popularity, I may flesh out the details a little more.
Hi Doophy,
Does Lustre require a patched Linux kernel? See: https://lists.01.org/pipermail/hpdd-discuss/2015-June/002286.html
If so, I'd say that would be a show-stopper for me.
Storage is big money, and only getting bigger! Companies have no problem leveraging open source code, but they hate giving their stuff away when money could be made!
FYI, GPFS actually comes with .deb packages now. The doc you linked to mentions v3.1, which hit EOL years ago. Deb packages were included somewhere in the 3.5 timeline, which began in 2012. As for getting copies of GPFS, it's sad that IBM doesn't offer it for personal use. Sad, but not surprising.
You wouldn't be alone in not wanting to patch the kernel. From what I've seen at various customer engagements, most enterprise storage installations exist in pretty closed-off environments. Hence, admins often don't see any need to keep things patched. This is particularly true when supporting scientific communities. Consequently, lots of code intended for enterprise use is only supported on stock kernels (RHEL#(.#) or SLES##(SP#)).
My experience comes mainly from GPFS, where kernel modules are installed. The process is very simple and generally supports updated kernels. I'm not sure about Lustre, but it wouldn't surprise me if it has similar requirements. I'm also not terribly surprised by its lack of Ubuntu support. While its popularity is growing in the enterprise sector, it's still a far cry from Red Hat's market share. I tend to think many companies support SUSE only because it's RPM-based and adapting a product from supporting RH to SUSE generally doesn't require much work.
Personally, I'm not averse to kernel patching - even in stateless installations. I wish more effort were put into DKMS, as that has the possibility of alleviating much of the pain of patching kernels!
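For anyone unfamiliar with it, the usual DKMS flow looks roughly like this (the module name and version are placeholders, and the module source is assumed to live under /usr/src/examplefs-1.0 with a dkms.conf):

# register, build, and install the module so DKMS rebuilds it on kernel upgrades
dkms add -m examplefs -v 1.0
dkms build -m examplefs -v 1.0
dkms install -m examplefs -v 1.0
dkms status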
Considering that GPFS 3.5 is still maintained, and 4.1 is the current recommended release, it's especially sad that IBM doesn't offer 3.5 for personal/small business/non-profit use.
Interesting comparison! Would love to see SXFS in the list.
ReplyDeleteIt's included in Skylable SX 2.0: http://www.skylable.com/get-started
Is it POSIX-compatible?
Unfortunately, SXFS won't work for this test due to incompatibility with CloudFormation (see the updated note in my post), and it leans too heavily on interactive mode (prompting the user for data input) for my tastes. In addition, it doesn't separate the metadata server role from the mounted client role, so it will consume resources on client instances, and the client setup is just as complex as the volume servers' (since they're the same thing).
ReplyDeleteHi All,
I finally found a way to securely share runtime secrets between cluster nodes in a CloudFormation template, so I'm in the process of creating an updated test with Ubuntu 14.04 LTS and the following products:
GlusterFS
LizardFS
XtreemFS
CephFS
SheepFS
SXFS
...stay tuned
Really looking forward to the above - staying tuned....
Updated test results have been posted: http://mrbluecoat.blogspot.com/2016/02/updated-distributed-file-system.html