Convert MyISAM to InnoDB (Drupal)


Convert Drupal MySQL engine from MyISAM to InnoDB using Drush:
drush updatedb -y && \
drush sql-query "SELECT ENGINE AS 'Engine Before:' FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME='node';" && \
($(drush sql-connect | sed -e 's/^mysql/mysqldump --no-data/' | sed -e 's/--database=/--databases /') | sed 's/ENGINE=MyISAM/ENGINE=InnoDB/' | sed 's/varchar(256)/varchar(255)/' > ~/schema.sql) && \
drush sql-dump -q --data-only --result-file=~/data.sql && \
drush sql-dump -y --result-file=~/backup.sql && \
drush sql-drop -y && \
($(drush sql-connect) < ~/schema.sql) && \
($(drush sql-connect) < ~/data.sql) && \
rm ~/schema.sql && \
rm ~/data.sql && \
drush sql-query "SELECT ENGINE AS 'Engine After:' FROM INFORMATION_SCHEMA.TABLES WHERE TABLE_NAME='node';" && \
drush cc all && \
echo "Remember to delete ~/backup.sql after verifying the site still works."

P.S. This is safer and faster, and puts much less strain on your database server, than running a lot of ALTER statements against the live database (especially on cluster solutions like Percona).
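
The schema rewrite at the heart of the one-liner above can be tried on its own, without touching a database. Here is a minimal sketch, assuming a made-up CREATE TABLE statement in place of real mysqldump output. The helper name convert_schema and the table `example` are hypothetical and only there for illustration.

```shell
#!/bin/sh
# convert_schema: the same two sed filters used in the one-liner above.
# Reads a schema dump on stdin and writes the rewritten schema to stdout.
# (Hypothetical helper name; not part of Drush or MySQL.)
convert_schema() {
  sed 's/ENGINE=MyISAM/ENGINE=InnoDB/' | sed 's/varchar(256)/varchar(255)/'
}

# Made-up sample input standing in for `mysqldump --no-data` output.
cat <<'EOF' | convert_schema
CREATE TABLE `example` (
  `id` int(10) unsigned NOT NULL,
  `title` varchar(256) NOT NULL DEFAULT ''
) ENGINE=MyISAM DEFAULT CHARSET=utf8;
EOF
```

The output should show the `title` column as varchar(255) and the table as ENGINE=InnoDB. Note that neither sed expression uses the g flag, so only the first match on each line is rewritten. That is enough for dumps that put one column definition per line.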

Distributed File System benchmark


See my updated post here

I'm investigating various distributed file systems (a term used loosely here to include SAN-like solutions) for use with Docker, Drupal, etc. I couldn't find recent benchmark stats for some popular solutions, so I figured I'd put a benchmark together myself.

Disclaimer: This is a simple benchmark test with no optimization or advanced configuration so the results should not be interpreted as authoritative.  Rather, it's a 'rough ballpark' product comparison to augment additional testing and review.

My Requirements:
  • No single-point-of-failure (masterless, multi-master, or automatic near-instantaneous master failover)
  • POSIX-compliant (user-land FUSE)
  • Open source (non-proprietary)
  • Production ready (version 1.0+, self-proclaimed, or widely recognized as production-grade)
  • New GA release within the past 12 months
  • *.deb package and easy enough to set up via CloudFormation (for benchmark testing purposes)

Products Tested:

AWS Test Instances:
  • Debian 7 (wheezy) paravirtual x86_64 (AMI)
  • m1.medium (1 vCPU, 3.75 GB memory, moderate network performance)
  • 410 GB hard drive (local instance storage)

Test Configuration:

Two master servers were used for each test of 2, 10, and 18 clients.  Results of the three tests were averaged.  Benchmark testing was performed with bonnie++ 1.96 and fio 2.0.8.
Example Run:
$ sudo su -
# apt-get update -y && apt-get install -y bonnie++ fio 
# screen
# bonnie++ -d /mnt/glusterfs -u root -n 4:50m:1k:6 -m 'GlusterFS with 2 data nodes' -q | bon_csv2html >> /tmp/bonnie.html
# cd /tmp
# wget -O crystaldiskmark.fio
# sed -i 's/directory=\/tmp\//directory=\/mnt\/glusterfs/' crystaldiskmark.fio
# sed -i 's/direct=1/direct=0/' crystaldiskmark.fio 
# fio crystaldiskmark.fio
Translation: "Log in as root, refresh the package lists, install bonnie++ and fio, then run the bonnie++ benchmark tool in the GlusterFS-synchronized directory as the root user using a test sample of 4,096 files (4*1024) ranging between 1 KB and 50 MB in size spread out across 6 sub-directories.  When finished, send the raw CSV result to the HTML converter and append the result to /tmp/bonnie.html.  Next, download the CrystalDiskMark script by WinKey referenced here, point its directory at the GlusterFS mount, switch direct I/O off (direct=0), and run it with the fio benchmark tool."
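
To make the `-n 4:50m:1k:6` argument easier to read, here is a small sketch that splits it into its four fields (files in multiples of 1024, maximum size, minimum size, number of directories). The helper name describe_n is hypothetical and not part of bonnie++. It only illustrates how the numbers in the translation above are derived.

```shell
#!/bin/sh
# describe_n: break a bonnie++ "-n" argument of the form
# files:max:min:dirs into its parts.
# (Hypothetical helper for illustration; not shipped with bonnie++.)
describe_n() {
  old_ifs=$IFS
  IFS=:
  set -- $1          # split the argument on ':' into $1..$4
  IFS=$old_ifs
  # The first field is given in multiples of 1024 files.
  echo "files=$(( $1 * 1024 )) max_size=$2 min_size=$3 dirs=$4"
}

describe_n 4:50m:1k:6
# prints: files=4096 max_size=50m min_size=1k dirs=6
```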

Results:

(Note: raw results can be found here)


Concluding Remarks:

Both GlusterFS and LizardFS had strong showings, with pros and cons for each.  Both should work fine for production use.  While not an endorsement, I will mention that GlusterFS had more consistent results (fewer spikes and outliers) between tests, and I also like the fact that GlusterFS doesn't distinguish between master servers (master-master peers versus LizardFS' master-shadow[slave] configuration).

Update: GlusterFS requires your number of bricks to be a multiple of the replica count.  This adds complexity to your scaling solution.  For example, if you want two copies of each file kept in the cluster you must add/remove bricks in multiples of two.  Similarly, if you want three copies of each file kept in the cluster you must add/remove bricks in multiples of three.  And so on.  Since the GlusterFS project recommends one brick per server as a best practice, this will also likely add cost to your scaling solution.  For this reason, I now prefer LizardFS over GlusterFS, since it does not impose that limitation.
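
The multiple-of-replica rule is easy to see with a little arithmetic. Here is a minimal sketch, assuming you know how many bricks you would like to add and the replica count of the volume. The helper name bricks_to_add is hypothetical and is not a GlusterFS command.

```shell
#!/bin/sh
# bricks_to_add: round a desired number of new bricks up to the next
# multiple of the replica count.
# (Hypothetical helper for illustration; not part of GlusterFS.)
bricks_to_add() {
  wanted=$1
  replica=$2
  # Integer ceiling division, then multiply back by the replica count.
  echo $(( (wanted + replica - 1) / replica * replica ))
}

bricks_to_add 1 2   # replica 2: one extra brick is not enough, prints 2
bricks_to_add 4 3   # replica 3: four bricks round up, prints 6
```

With one brick per server, the rounded-up figure is also the number of servers you would have to provision.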

P.S. Check out this related article by Luis Elizondo for further reading on Docker and distributed file systems.