The return of SSI


Single System Image (SSI), sometimes referred to as a distributed operating system and not to be confused with Server Side Includes, is a compute cluster technology that was assumed dead many years ago.  It hails from prehistoric roots where mainframes ruled among dinosaurs and monolithic architectures were the norm.  In our snazzy modern society where diminutive microservice mammals like Docker have evolved to dominate the landscape and pets are not allowed, it would seem logical that ancient behemoths would simply fade away into historical obscurity.  The gradual die-off of most SSI solutions would certainly support that theory.  

However, one hope remains: Stateful Big Data


Microservices work well for stateless horizontal scaling.  Need more oomph?  Add a node (or 50).  Easy!

Things get trickier when you add state to the equation.  User sessions, personalized dashboards, ad-hoc queries...  Fortunately, load balancers can do clever things like sticky sessions and state can often be offloaded to scalable architectures like object storage and NewSQL databases.

So we're good, right?  Not quite.

Unfortunately, load balancers and offloaded state won't help when a single operation needs more resources than one node can provide.  At that point you generally have two choices: scale vertically or break the operation into smaller tasks that can be processed in parallel.
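To make the second choice concrete, here is a minimal sketch of breaking one big operation (summing a large list) into smaller tasks.  The names `split_into_chunks` and `parallel_sum` are made up for illustration, and a local thread pool stands in for what would really be separate nodes, so treat it as a picture of the idea rather than a way to get a speedup.

```python
from concurrent.futures import ThreadPoolExecutor


def split_into_chunks(items, chunk_size):
    """Yield consecutive slices of at most chunk_size items."""
    for start in range(0, len(items), chunk_size):
        yield items[start:start + chunk_size]


def parallel_sum(items, chunk_size=1000, workers=4):
    """Sum each chunk as its own task, then combine the partial results."""
    chunks = list(split_into_chunks(items, chunk_size))
    # Assumption: threads stand in for cluster nodes in this sketch.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        partial_sums = list(pool.map(sum, chunks))
    return sum(partial_sums)


total = parallel_sum(list(range(10_000)), chunk_size=2_500)
```

Note that this only works because a sum can be split and recombined.  Plenty of operations don't decompose this neatly, which is exactly where things get painful.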

Scaling vertically is much easier nowadays with big beefy cloud instance types and the ability to snapshot images or live-migrate instances.  However, there is a practical limit to how far you can scale vertically (past 2TB of memory, for example).  There's also additional risk if that one node goes down, and added complexity when you need to upgrade versions or move to another server.

Accordingly, many companies hoping to data mine, or troll their data lake (or whatever hip analogy is in vogue nowadays), turn to distributed MapReduce-based cluster solutions like Hadoop.  This architecture allows you to process and analyze massive amounts of data efficiently.  However, older non-cloud-ready applications cannot take advantage of this new computing paradigm without a significant code rewrite.  Adopting it also adds staffing overhead, as DevOps teams need to familiarize themselves with new, complex tools.
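For a feel of what that rewrite involves, here is a toy, single-process word count in the MapReduce style.  This is plain Python, not Hadoop's API, and the function names (`map_phase`, `shuffle`, `reduce_phase`, `word_count`) are illustrative assumptions.  In a real cluster the framework would run the map and reduce steps on many nodes and handle the shuffle for you.

```python
from collections import defaultdict


def map_phase(line):
    """Emit a (word, 1) pair for every word in one line of input."""
    return [(word.lower(), 1) for word in line.split()]


def shuffle(pairs):
    """Group values by key, the step a framework performs between map and reduce."""
    grouped = defaultdict(list)
    for key, value in pairs:
        grouped[key].append(value)
    return grouped


def reduce_phase(key, values):
    """Collapse all values for one key into a single count."""
    return key, sum(values)


def word_count(lines):
    """Run map, shuffle, and reduce in sequence on a list of lines."""
    pairs = [pair for line in lines for pair in map_phase(line)]
    return dict(reduce_phase(key, values) for key, values in shuffle(pairs).items())


counts = word_count(["big data big cluster", "data lake"])
```

The point is that the application logic has to be reshaped into independent map and reduce steps.  A legacy program written as one long stateful loop usually can't be dropped into this mold without real surgery.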

So, when it comes to stateful big data, SSI really begins to outshine the competition.  SSI provides both horizontal and vertical cluster scaling but exposes those resources so that, to the end user, the whole cluster appears to be a single server.  This is in contrast to other cluster architectures that simply manage a cluster of machines as separate entities.  As more nodes are added to the cluster, that single virtual server appears to magically grow more powerful.  All the complexity regarding memory management, load balancing, file management, etc. is handled for you transparently!  Since the interface appears to be a single server, user interaction is as intuitive as working on a local tower under your desk.  SSI also supports a far broader array of applications, including legacy applications not originally designed for cloud scalability.

So what's the catch?  Well, it's hard to do, so most people either give up trying or attempt to sell or license it.  Currently, the most active communities for SSI are LVS, the HP and IBM solutions, and MOSIX.

Only LVS is open source, but it's more like a loose collection of tools and methodologies than a single installable product.  The HP and IBM solutions are very complex and include vendor/hardware lock-in.  MOSIX appears to be the most user-friendly, but its closed source code, lack of high availability, Australian proprietary license, and failed commercial attempt have hampered community interest.
