NEWS & NOTES
FROM THE THOUGHT LEADERS OF DATA STORAGE
By Hollis Beall, Technical Director, X-IO Cloud Business Development
With the Citrix announcement of XenDesktop and XenApp versions 7.5, Citrix is taking a giant step forward in the vision of integrated public/private cloud management, and is continuing to drive down the cost of VDI (but they aren’t doing it alone). As a Citrix Ready Premier Alliance Partner, we do extensive testing with Citrix around storage performance to help reduce the cost of desktop and application virtualization deployments. Capacity isn’t the only requirement for storage in a VDI solution, as performance has a direct impact on the end user experience. I can’t tell you how many customers I talk to who have a “pilot” in place that goes great for the initial testing (<50 users), but when the solution is put into production, the storage hardware can’t keep up and it falls over with just a hundred (or so) users. Delivering high performance and high capacity at the same time is not something that traditional storage technology has typically done well, resulting in:
- A prohibitive amount of CapEx (and time) to implement
- Large warranty increases in years 4 and 5 (Up to 20% of original cost in each year!)
- Excessive power/heat requirements
- Increasing failures as the system grows
- Decreasing performance as the capacity utilization increases
These storage inefficiencies have a large impact on the “cost per user” that the system will be measured against. If you are doing this as a sellable solution (as a service), this is a major obstacle to profitability.
The ISE is the only Enterprise storage system that advertises full performance at full capacity, where other systems can start seeing performance problems when just 50% full. That degradation directly limits the number of VDI sessions a system can support, and increases the amount of equipment required to reach a target capacity level (i.e., more power, cooling, rack space, maintenance, etc.). This inefficiency is why storage hardware can be 40% to 50% of the solution cost. (Spoiler Alert: ISE can make this 20%.)
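To put rough numbers on the capacity argument, here is a minimal sketch of how the usable fraction of an array drives the amount of equipment needed. The 100 TB target and 20 TB array size are made-up figures for illustration; only the 50% and 100% fill levels come from the text above.

```python
import math

def arrays_needed(target_tb: float, raw_tb_per_array: float, usable_fraction: float) -> int:
    """Number of arrays required to hold target_tb when each array only
    performs acceptably up to usable_fraction of its raw capacity."""
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    effective_tb = raw_tb_per_array * usable_fraction
    return math.ceil(target_tb / effective_tb)

# Hypothetical figures: 100 TB target, 20 TB raw capacity per array.
full = arrays_needed(100, 20, 1.0)  # full performance at full capacity
half = arrays_needed(100, 20, 0.5)  # performance problems past 50% full
print(full, half)
```

Under these assumptions the half-usable system needs twice the arrays, and with them twice the power, cooling, and rack space.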
With the Seagate™ method of remanufacturing drives in place, and the ability to spare around bad components inside the drive, the ISE has the industry’s only 5-year hardware warranty included with every system. We believe the system is so reliable that we include 5 years of hardware maintenance for free. The ISE can completely change the cost structure for service providers, who can now fully amortize the cost over 5 years (instead of just 3). This can reduce the Cost/User/Month, due to storage, by up to 40%.
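The “up to 40%” figure follows from simple straight-line amortization: spreading the same hardware cost over 5 years instead of 3 leaves 3/5 of the monthly cost. The sketch below works that through. The dollar amount and user count are hypothetical, and it deliberately ignores financing costs and any paid warranty extensions.

```python
def storage_cost_per_user_month(hardware_cost: float, users: int, years: int) -> float:
    """Straight-line amortization of storage hardware cost per user per month."""
    return hardware_cost / (users * years * 12)

# Hypothetical figures: $180,000 of storage hardware serving 1,000 users.
three_year = storage_cost_per_user_month(180_000, 1_000, 3)
five_year = storage_cost_per_user_month(180_000, 1_000, 5)

# Relative saving from the longer amortization period: 1 - 3/5.
reduction = 1 - five_year / three_year
print(f"3-year: ${three_year:.2f}  5-year: ${five_year:.2f}  reduction: {reduction:.0%}")
```

The reduction depends only on the ratio of the two periods, not on the hardware cost or user count chosen.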
With this new functionality in XenApp and XenDesktop, customers will be able to “flex” their computing across mobile, thin client, Desktop VDI, and IaaS Public/Private Cloud environments, providing a new freedom and a more dynamic environment. XenDesktop, XenApp, and IaaS CloudStack solutions are built to manage these highly dynamic environments and, together with the ISE technology, can have a significant impact on the cost (and profitability) to the business.
When looking into storage reliability, it’s all too easy to get caught up in the “hard disk drives are unreliable” melodrama being created by most of the All-Flash Array vendors. To argue that one media is more reliable than another is analogous to arguing that cars are more reliable than trucks – they’re two different tools for two different jobs.
The good news is that when it comes to traditional hard drives, there’s actually more statistical evidence out there to help organisations architect solutions based upon balancing cost and risk. One more piece of evidence that’s recently emerged is a study by Backblaze, a cloud backup provider who’s been looking at their 25,000 odd disk drive implementation and plotting out lifecycles. In the past we’ve seen studies such as those from Google and Carnegie Mellon University, however Backblaze looked at both consumer and enterprise grade drives.
What they initially saw was of no great surprise to us here at X-IO. They showed in their first study that their SATA consumer grade drives had an Annual Failure Rate (AFR) of 5.1% in the first 18 months and that drive failures followed a “bathtub” curve with a dramatic increase in years four and five of the drives’ lifecycle. Why do you think the vast majority of storage vendors are happy to give a three year warranty but get a little jumpy when you ask for an inclusive five year one? Well here’s the answer in this graph right here, the drives are much more likely to fail once they get into later life.
Now your first thought may well be, “Ahhh, but these are consumer grade drives – things will be much better with enterprise grade!” Well this is where things get interesting, as Backblaze then went on to look at their enterprise grade drive implementation, and they plotted drive years of service against failure rates to show an AFR of 4.6% for enterprise drives against 4.2% for consumer drives. Now it’s not a direct comparison, as the enterprise drives are small in number and have only been installed for two years. There’s also an interesting point raised by Seagate on their blog that Backblaze created the “perfect storm” with their use case and physical mounting. This proves a point we’ve been making here at X-IO of “It ain’t what you do, it’s the way that you do it.”
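For readers who want to run the same sums on their own fleet, AFR is simply failures divided by drive-years of service. The sketch below is illustrative only: the failure counts and fleet size are invented, and the expected-failure estimate assumes a constant failure rate, which the “bathtub” curve described above says does not hold over a full drive lifecycle.

```python
def annual_failure_rate(failures: int, drive_years: float) -> float:
    """AFR expressed as failures per drive-year of service."""
    if drive_years <= 0:
        raise ValueError("drive_years must be positive")
    return failures / drive_years

def expected_failures(drive_count: int, afr: float, years: float = 1.0) -> float:
    """Expected failures for a fleet, assuming a constant AFR (no bathtub curve)."""
    return drive_count * afr * years

# Hypothetical observation: 46 failures across 1,000 drive-years of service.
afr = annual_failure_rate(46, 1_000)
print(f"AFR: {afr:.1%}")

# The two AFRs quoted above, applied to a hypothetical 1,000-drive fleet for one year.
print(expected_failures(1_000, 0.046), expected_failures(1_000, 0.042))
```

With small populations, such as the enterprise drives in the study, a handful of failures either way moves the AFR noticeably, which is one reason the comparison is not direct.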
Anyone can build a storage array. Pop down to your local PC supplies company, grab some drives, grab a server, get an OEM drive shelf enclosure, pop ‘em in, load up some open source software and hey presto – you’ve got an “Enterprise Grade Storage Array.” Well that’s what some manufacturers would have you believe anyway. The truth is that hard disk drives are sensitive little creatures. Take a look at an excellent video by Sun Microsystems (remember them?) back from a few years ago. The video was produced to show off their funky new software that could analyse drive latency but it proved the point that drives are sensitive to vibration – in this case an Australian engineer shouting at them. Vibration and noise aren’t the only drive killers – heat and density are a big factor too. Add in the error correcting capabilities of consumer grade drives and you start to see some of the AFRs that Backblaze saw.
So how come X-IO can afford to offer an inclusive five year warranty on its array? How is it that we have customers who’ve been using our Intelligent Storage Element (ISE) units for such time periods and have never swapped a single disk drive? Well that’s simply because we have an AFR of 0.1%.
The key here is good old fashioned hardware engineering, some simple applied logic and most importantly some very clever patented software invented back when we were part of Seagate. Firstly acknowledge that drives are sensitive to such elements and deal with them. Stop them vibrating, keep them cool and treat them with a little bit of respect by mounting them evenly and horizontally. If I held you vertically, jiggled you about and kept you at high temperatures for a few years, you’d probably feel a little poorly too.
The second step is to apply some logic. What happens when a drive goes back to base to be replaced? Well most of the time it just needs a software recondition – low level format, realign the heads, re-layout the tracks, etc. Well why does that need to fly around the world to happen? Why not just run that inside the array? These things are supposed to be intelligent, right? Why don’t all storage arrays do that? Well that’s where the third point listed above comes in. We are fortunate enough at X-IO to have some clever people who’ve written some very clever software to do exactly that. Instead of running the risk of a flat footed clumsy engineer swapping the wrong drive, knocking a cable out or hitting the EPO instead of the exit button in your data centre, we just let the array deal with its issue and not affect any workloads. Should there still be a physical defect on the drive, we work around it and only swap a drive pack should we be running low on spare space (that’s where the 0.1% AFR comes in – it’s extremely unlikely with X-IO but still can be a background task in any event).
So the next time someone says to you, “Oh hard disk drives, they’re mechanical devices and therefore fail – here’s the proof,” remind them that not all storage is created equal. Ask for their AFRs and then ask for PROOF that their arrays can last for the lifecycle claimed. If you can find another storage vendor with implementations over five years old that have never needed a service then we’d be stunned. Hey, we may even give you a prize (as long as you can provide proof as solid as ours!).
Gavin McLaughlin, Solutions Development Director, X-IO International
UK businesses are being bamboozled by overhyped marketing myths about the performance, reliability and power consumption of all-flash storage arrays, despite practical evidence and common-sense arguments to the contrary.
74% have a rose-tinted view…
According to a recent survey*, close to three-quarters (74%) of IT managers have a rose-tinted view of all-flash storage, despite misgivings about the cost, risk and general lack of need for all-flash arrays.
As many as 76% accepted the myth that all-flash arrays are faster than hybrid arrays. But while flash can undoubtedly assist with lowering latency for random reads, it can often be the same or even slower than well designed hard disk arrays under some workloads, particularly sequential writes.
A similar number accepted that all-flash solutions used less power and cooling compared to hybrid storage. But real-life testing by numerous users has found hybrid arrays use half the power of all-flash arrays in like-for-like evaluations. Flash modules/SSDs draw less power as raw components than HDD, but enterprise storage arrays are not just raw storage. They use processors and cache memory is also needed in some designs, both of which require power. On average, all-flash arrays draw double the amount of power and require more cooling than true hybrid arrays.
Two in five (40%) respondents believed all-flash arrays provided a higher degree of reliability than hybrid arrays, based on marketing messages from all-flash vendors that HDDs are unreliable. But this is not the case. Some enterprise storage vendors have been guilty of using the wrong tool for the job by relying on consumer grade SATA drives to cut supply chain costs, causing hard disk drives to be seen as problematic. When issues with flash media are taken into account, particularly with cell failure on NAND silicon, flash arrays usually have shorter duty cycles than hybrid storage.
It’s clear that IT managers are unaware of the truth behind these all-flash myths. Many vendors are trying to convince customers and resellers that hard disks are outdated technology and flash is the most appropriate media for all use cases. The truth is that hybrid storage is the practical option. It offers the best of both worlds by combining the advantages of flash with the established benefits of hard drives.
The survey was undertaken by independent research firm Vanson Bourne, which interviewed senior IT decision-makers at 100 large enterprises across the UK. The purpose of the survey, commissioned by hybrid storage vendor X-IO Technologies, was to compare the adoption plans of all-flash arrays in the enterprise against the hype being promoted by all-flash array vendors. It also shows where the market is educated about the promise of all-flash arrays and where it is misinformed.
With IT budgets currently under tight restrictions, IT managers need to implement a storage solution which provides the right amount of performance whilst remaining cost effective. In time, storage architects and buyers will realise that flash is a tool rather than a solution. But in the meantime, it won’t stop some users getting their fingers burnt in the world of storage sales, a place where people push hard to close deals that often fail to offer the best solution for a customer’s needs. It is up to the storage industry to help customers and resellers understand the strengths and weaknesses of each type of storage, helping people cut through the hype.
*The survey data was collected via an online survey completed by a nationally representative sample of 100 IT managers from key industry sectors such as financial services, manufacturing, retail, distribution and transport and the commercial sector from across the United Kingdom. The survey was prepared on behalf of Vanson Bourne and conducted in February 2013. The identity of the 100 respondents will remain confidential, in accordance with the Vanson Bourne Code of Conduct. The X-IO name was not revealed during the interview to ensure the data remained unbiased.
By Gavin McLaughlin, Solutions Development Director, X-IO International
Open any IT magazine, read a technology news website, or subscribe to any data storage mail lists and you’re probably fed up to the back teeth with headlines that make such bold statements as “How to Use SSDs in Your Business” or “Why Hard Disk Drives are Dead.” Whilst most intelligent readers will have the sense to take over-dramatised headlines with a pinch of salt, there are sadly many falling for the hype.
All marketing noise aside, there’s a concerning trend emerging here: too many vendors are selling storage components (be it in the form of flash, SSDs, hard drives, or cardboard boxes) rather than selling business-affecting solutions. “Customers don’t buy storage, they buy solutions” is an old cliché amongst sales teams; however, it suddenly seems to have been forgotten within the storage industry.
If I look back at some recent customer projects that I’ve been involved in, the opening lines from IT departments have been things such as . . .
- Our SAP system is running slow and we think the storage layer might be an issue
- Our system performance drops when we’re around 85% full
- We spend way too much operational time escorting service engineers in the building
With all of these, the customer was leading with a business challenge that may require differing toolsets to solve the underlying technical issue. Would running into their building shouting, “Hey, swap out all your HDDs for SSDs!” solve their problems? Well, whilst there’s always a chance someone could get lucky, they may well find that this is just a short-term solution, that they’ve shifted the bottleneck elsewhere in the stack, or even that they’ve just dramatically increased the risk of system failure.
What we’re seeing all of a sudden is a dramatic increase in the number of single-technology vendors, particularly all-flash-array sellers, trying to argue that their technology is the only tool for the job and that flash is therefore the saviour of all storage woes. To borrow a commonly used analogy: this is a great example of a salesman who only sells square pegs, trying to convince you that his product is a perfect fit for a round hole, and he’ll give a free hammer to you if you buy them. Of course the answer is to buy the right shaped peg for the appropriate hole, but he’ll still have a go at convincing you that the hammer option is much simpler and will save you time.
Flash is a great innovation for the storage industry, there’s no doubt about it; however, it’s just a tool, not a solution. The correct way to look at all these new storage approaches is to first look at the challenges your organisation faces, and then when it comes to building out a solution, see what toolsets can help you solve those challenges in the most efficient manner. Somehow, I doubt that the most efficient approach will involve a hammer.
— Storage Horizons Blog —
What kind of data access makes up an application, and why is this important when thinking about tiering (for performance) vs. basic HSM?
Application Data Attributes
One of the key points is about “reaction time” for tiering. Another way to look at this is “Dynamic Data Placement.” Can the data for an application’s use be in the right tier, at the right time, so as to make I/O consistent for all portions of the application?
All applications have data that have access patterns with the following traits:
- Tight Random—frequent R/W across a small range of the data set
- Random—seldom, non-localized access
In the history of arrays, the above aspects of application data sets have been covered by either up front, hand tuning of LUNs per data type, fixing the location of data into RAM, short stroking many HDDs, and SSDs (old and new). With hybrid tiering arrays, it’s all going to be about dynamic data placement, if it can be done without causing inconsistent performance.
Application data sets are rarely the size of the array’s storage capacity; meaning that for a customer, the desire is naturally to run multiple applications on the array to drive efficiency, however, “the devil is in the details.” Most arrays can’t handle multi-tenancy for applications as it causes the arrays to have inconsistent performance.
If storage for an application was like an ice cream parfait and cost was right, then everything would be simple . . . but as noted above, it is not.
In the world of Virtualization, VDI, and just plain, old small to medium businesses (trying to make the most of their IT storage purchases), the need for true, multi-tenancy storage that can adapt dynamically and provide automatic QoS, for all volumes across multiple applications, is VERY HIGH.
The problem to be solved is that of being able to ingest the entire data stream from multiple applications, simultaneously, and to properly analyze the attributes of the data and characterize them, based upon the usage patterns above. It’s the Ultimate Big Data problem, and X-IO has solved this with its unique Continuous Adaptive Data Placement (CADP) algorithms, layered upon all the unique ISE technology. After solving all the fundamental problems to drive reliability, availability, and capacity utilization, CADP is made simpler and more effective, as a result, for up-tiering, consistent high performance, and multi-tenancy for applications.
As compared to big data, or even remote sensing of weather for accurate weather forecasting, the aspects of application I/O density, the size of the area with a given density, and its location, among many other attributes, play the key roles in determining where to place data for optimum and consistent I/O, for an application or multiple applications, running against a storage device.
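To illustrate what “I/O density” and “location” mean in practice, here is a minimal sketch that buckets a trace of accessed block addresses into fixed-size regions and flags the busy ones. This is not the CADP algorithm, whose internals are not described here; the fixed region size, the simple count threshold, and the sample trace are all assumptions made for the example.

```python
from collections import Counter

def io_density_by_region(accessed_lbas, region_blocks):
    """Count accesses per fixed-size region of the logical block address space."""
    return Counter(lba // region_blocks for lba in accessed_lbas)

def hot_regions(density, min_accesses):
    """Region indexes whose access count meets the threshold, in address order."""
    return sorted(region for region, count in density.items() if count >= min_accesses)

# Hypothetical trace: a tight cluster of accesses near LBA 1000 ("Tight Random")
# plus a few scattered, seldom-touched addresses ("Random").
trace = [1000, 1004, 1010, 1001, 1020, 50_000, 1003, 900_000, 1015, 250_000]
density = io_density_by_region(trace, region_blocks=1024)
print(hot_regions(density, min_accesses=3))
```

A real placement engine would also have to track how density changes over time and across competing applications, which is where the difficulty described above comes from.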
So, X-IO effectively does solve the dynamic data placement problem for multiple applications, running simultaneously, against a Hyper ISE hybrid. Basically, data placement for right time/right place is done across the entire capacity of the hybrid array known as Hyper ISE, and multiple applications can be provided with consistent I/O and throughput. This means more VMs, more VDI users, more databases, etc. It’s all about being able to do more with less.
X-IO Hyper ISE has proven this with benchmarks such as Temenos and Redknee, running on Microsoft SQL 2012, beating out larger arrays with traditional tiering; and furthermore, with the best of Microsoft TechEd wins, two years in a row, X-IO is seen as the best storage on the planet.
— Storage Horizons Blog —
Tiering and Hybrid Arrays: What’s the truth about these?
Basic differences in performance between DRAM, SSD, HDD, and Nearline HDD (in X-IO terms) mean vastly different opportunities with respect to data tiering for Hierarchical Storage Management/Information Life Cycle Management (HSM or ILM) of data and backups, and also for performance.
Performance tiering is much different than life cycle management: performance tiering actively moves data up for a running application, while life cycle management is about getting data down to the lowest storage cost over time for data at rest, typically backups. However, many vendors are trying to mix the two during normal application operation, which leads to really non-linear performance for applications, headaches, overprovisioning, and a large amount of human intervention.
In consideration of data tiering that has gone on for years, let’s talk about the norm, and also the new tiers that now exist for storage, today.
Since about 1990, an array basically had two or three tiers. The first tier for data was the DRAM within the array. The second tier was the enterprise drives that were placed in enclosures (JBODs). In the early 2000s, a new tier, using SATA or Nearline drives (there are differences), was added. Arrays already had tiering between controller-based cache DRAM and enterprise drives. But now, with the addition of SATA or nearline drives, some storage companies decided to place HSM software within the arrays and, with some rudimentary migration, not only “down-tier” but also “up-tier” chunks of data within the volumes on the array, based on average activity.
HSM has been around a long, long time. Hierarchical Storage Management or Life Cycle Management of data was pretty much developed for “down-tiering,” over time of containers, of data for applications. It was initially meant for moving old copies of data sets, from applications, to slower, more archival storage. When first developed, HSM was basically from disk to tape. A few years ago, arrays added “tiering” or “fluid data,” and basically aberrated this initial meaning. Why do I say aberrated? It’s because these arrays were actually mixing a portion of the data set volumes between the enterprise and nearline disk data—“live.” This has multiple effects. Instead of just storing copies (clones, snapshots, etc.) made from the enterprise data set volume(s), for an application on the nearline storage, the volumes now mixed portions of the data, on the different tiers, while the application runs.
This has many effects, some with some merit, but most with some dubious and potentially detrimental effects. On the outside, it is a surefire marketing play to say you can save money, by placing live data on two tiers of storage, while the application is running. It sounds great. However, the issues involved in doing this are multi-fold:
- The amount of DRAM it takes ($, compute time) to represent the different tiers and manage the bookkeeping, of what and how much is where on the tiers, is complex. This leads to potential for more bugs and potential availability issues. That aside, the additional DRAM (to keep the bookkeeping data and the compute time to act upon tiering at the block level) SLOWS down the overall operation. This means the need for more expensive controllers, more storage, etc.
- The reliability differences, between the drive types, mean different RAID levels will most likely be used on the different tiers. RAID 1 or 5 would be used on the enterprise drives, while RAID 6 would be used on the nearline drives. This means not only different performance characteristics, but a good deal of over provisioning with respect to drives, of different types, in different JBOD enclosures, and more enclosures to cover single points of failure. This is very inefficient.
- The ability to “up-tier” based on activity for a given volume now has some non-linear performance effects if attempted dynamically, because of the hysteresis involved in moving the data and accesses during the transition. This is why, in many cases with tiered storage, manual intervention is required; and in many cases, to preserve performance, volumes are locked into a tier, defeating the purpose of tiering. Reaction time for up-tiering is essential for a system that wants to actively use multiple tiers.
- The complexity of the entire multi-tier environment is now greatly increased, which means software testing becomes very difficult to cover all of the cases involved. This drives down performance, drives up the potential for software bugs that hurt availability or data integrity, AND means more expensive processors that eat up $ and power.
— Storage Horizons Blog —
Hybrid and SSD Arrays
- What are they good for and what are they not?
- What are enterprise hybrids?
- What are just SANs in a can for SMB doing HSM/ILM?!
- What about All SSD arrays?
- Are they worth it?
All HDD Arrays, Hybrids, and all SSD Arrays: An Introduction
All HDD arrays are still the mainstay of the world today, but that may be changing, in the future, as SSDs come down some in price and the world’s demand, for quick business decisions, increases.
HDD arrays, such as the original ISE-1 and now the ISE-2, provide multiples of performance, over traditional arrays, with the best price/performance and lowest Total Cost of Ownership (TCO) on earth. This has been proven, time and again, all over the world and in real benchmarks. Our focus is on the fundamentals of storage that have been ignored for 20+ years! The industry has ignored them but, at X-IO, we strive for performance across all the capacity, reliability that is 100x over all others, half the power, and half (or less) the rack space, and acquisition costs that are competitive with any mainstream enterprise storage solution!
The hype of all-SSD arrays has really been the focus of the media today, but are the benefits real and the TCO really less? I believe that in some cases it is, but the number of use cases is small. Why? There are many reasons.
If SSD were the same price as HDD, then I’d be all over SSD because of its overall speed advantage. However, that’s not the case, so then, why all SSD in many companies? Enterprise SSD is 10-20x the $/GB of enterprise or nearline HDD. These prices will not converge anytime soon. Enterprise HDD will also come down in price which will keep the delta, between the technologies, significant.
The wild thing is that some all-SSD vendors are playing in the space where they should—high-end niches of trading floor applications. The problem is that many start-ups are trying to play in the basic enterprise where multi-application, high-capacity, and consistent I/O performance is required, not to mention where availability and reliability is needed. The numbers just are not there when it comes to cost, let alone TCO, when power, space, etc. are factored in.
When it comes to overcoming the TCO argument, some all-SSD vendors have added more “features” to be able to claim that their $/GB is the same or better than HDD. They do so by adding features, such as deduplication and compression. However, these features come with a price tag. The cost is for very high-powered servers to run the array, which drives up cost, but really drives up power! And the sheer usage of these features drives the application performance down, even with a bunch of SSD. The end result is dubious to the user, because most tier one applications either have dedupe or compression built in, or they just don’t need them, because there is not that much dedupe required.
The other aspect of adding such features to an all-SSD or tier one array is that of complexity. The complexity of the software goes up by magnitudes, driving the reliability of the software down, while also slowing down the system even further. There is no free lunch; but in this world of marketing, when ideas such as these keep on coming, good storage that is always available, gives consistent good performance, and is averse to service, is the key to all IT managers, in the end.
I believe, at this point in time, deduplication and compression are for data at rest, which is for backups and archived data. Performing these operations on tier one applications seems like a waste because it always depends on the application. But when the data is at rest, these features are in their perfect environment. Performance is not an issue here, and cost savings (by driving down capacity, currently used to free up for more data at rest) IS!
In the end, my thoughts on all-SSD arrays, today, go like this:
- If SSD was the price of enterprise HDD and supply was able to meet demand . . .
- And then if applications could use all the IOPS . . .
- Then SSD will replace HDD.
Until then, all-SSD arrays are either a niche or a shill on TCO with dubious value, in most cases.
So going forward, I’m going to leave SSD arrays for now and talk about tiering and hybrid arrays, as to me, they offer the most benefit, for years to come, in both price/performance and overall TCO reduction.
— Storage Horizons Blog —
What are the specific aspects that an array can and should have, to be efficient and TCO friendly? How does ISE meet and exceed these aspects by making the “whole greater than the sum of the parts”? I will deal with these questions across my next few blogs, but first consider the design of a storage product from the ground up. A storage array is a specialized computer system. It has a clear focus on data storage, but it’s also much more than that. A storage array has a few laws it must live by:
- It must protect data from at least a single failure
- It must never lose data after a power failure
- It must withstand a failure as a result of a power failure (see the first law above)
- Reads and writes should be expected and be capable of being performed, at a proper duty cycle, depending on the tier of storage (e.g., ISE is a Tier 0/1 device, meaning the duty cycle should be 100%, meaning anything at any time/all the time, and be low latency, high IOPS/throughput).
So, what makes up an array that meets these “laws” in such a way that it’s not just a small server or even a PC with a bunch of Band-Aids on top (or “perfume on a pig”!)?
Array Hardware and Its Effect on TCO
Given that a storage array typically has two controllers, aspects that make or break TCO include:
1. Are both controllers active at the same time for access to same data volumes? If they are not, then typically an active/passive system or one that is only active to some volumes, and the other to the rest of the volumes, causes availability issues and/or software reliability issues, driving cost. An active/passive system would most likely throw more hefty hardware at each controller, driving up power to make up for performance loss, in the normal case of both controllers operating. Also, in cases where active-active is not within an array, then software called multi-pathing drivers, must be put into play that add complexity, sometimes cost extra money, and drive the overall solution cost up—either way with storage companies seeking to recover development and support costs by hiding costs inside of high warranty costs.
2. Do both controllers have a communication link that has near zero latency? This makes a difference in case 1, above, when failover is to occur; but most importantly to solve issues with an application’s write workload with the lowest latency and overall cost. Mirroring of write data between controllers is the best method to ensure data integrity in the case of failure, and also for lowest latency across the widest range of host access patterns. True active-active operation with a dual controller array is possible when this communication link is fast enough. Not only does this allow for faster failover, in the event of a controller reboot or failure, but also additive performance to all volumes when both controllers are operational. In addition, servers no longer need special drivers to control multiple paths to the storage.
3. Related to case 2 is how the dynamic random access memory (DRAM) cache is used for writes and how it is protected. A good write-back cache can smooth out most application I/O “outliers” from the standpoint of overall access to the dataset for the application. A small amount of DRAM with non-volatility, as well as a very fast inter-controller communication link, allows for I/O latency to be reduced on the first order. Remember, DRAM is 1000x faster than SSD, which in turn is much faster than HDD (for random I/O). Using DRAM in the proper quantities can reduce TCO, but throwing a large amount at it without intelligence just drives up cost and power usage.
4. Good cache algorithms that can aggregate I/O, pre-fetch, and do full RAID stripe writes, atomic writes, parity caching, etc., are all aspects of a very cost-effective usage of a small amount of DRAM that points to all the back-end storage devices, which in the end must have I/O performed to and from them in the most efficient way possible for each back-end device type.
5. What kind of back-end device types should be considered? Nearline HDD (SATA or SAS), Enterprise HDD (10K or 15K), SSD in drive or plug-in card form factor? It all depends on the mission of the array. If it is price/performance and TCO, then my mind goes to how to use the 10K HDD, as well as MLC SSD for some applications, for the job. Nearline HDD has its place in very low performance or sequential I/O environments, mainly in backup and archive use cases, because the extremely low I/O density of these typically high-capacity drives prevents efficient utilization of their full capacity under heavier workloads. Remember though, low-cost, high-capacity drives have a different duty cycle than enterprise drives. For example, throwing multiple sequential workloads against high-cap drives is just like a random workload and will kill these drives prematurely, resulting in more service events, slower performance during long rebuilds, potential data loss, and sub-optimal performance.
6. Does the array have the ability to utilize all the capacity with I/O to all attached capacity? This is a key metric in effective TCO vs. the old adage of $/GB. If an array can utilize ALL the capacity under load, then efficiency drives down TCO. The ability to utilize all the capacity is a function of the data layout, effective utilization of back-end devices, and also how the caching and controller cooperation work. All of this can drive TCO way down or way up depending on how well it’s done.
7. Does the array have a warranty greater than three years? If so, then it’s either because the technology reduces service events OR it’s a sales tactic. If it’s the former, then it truly drives TCO down as more storage is purchased. If it’s the latter, then it’s “pay me now or pay me later.” Technology that provides for less service is based on a design for reliability and availability that goes far past just dealing with errors that occur in a system. It’s a system approach, similar to Six Sigma, that reduces variation in the system, which reduces the chance of failure. In an array, that means how the devices are packaged, how the removable pieces are grouped together, and how the software can deal with potential faults in the system and keep the application running without loss of QoS. A system that can do this drives TCO down because customers don’t have to design for failure, or in other words, design around the shortcomings of the array by over-provisioning. Many cloud providers have designed for failure with mass amounts of over-provisioned storage, n-way mirroring, etc. The industry has been trained around the shortcomings of array design and error recovery, so those that build their own datacenters just go for the cheapest design with the cheapest parts because of this. In contrast, a storage system that really does provide for magnitudes-greater reliability, availability, capacity utilization, and performance across that capacity, can actually change this mindset. However, it takes belief that a design of this nature is possible . . . and it has been done with the ISE from X-IO.
8. Does the array provide real-time tiering that maintains a consistent I/O stream for multiple applications across the largest amount of capacity possible? An array that can effectively do this with the highest I/O and largest capacity, at lowest cost, wins the TCO battle. Beware of marketing fear, uncertainty, and doubt (FUD) that sound the same, but the architecture and design of the product, as well as results, are what matter.
9. Does an array add features that, under the right circumstances, reduce capacity footprint via de-dupe or compression? If so, I smell snake oil, because in most tier 1 applications, compression and de-dupe just drive up the cost of the controller while giving dubious results. On paper it might look good for the $/GB, but other aspects like space, power, and utilization get worse. And if it’s done with all SSD, in order to artificially say the cost is less, all the worse.
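Points 5 and 6 above come down to simple arithmetic: what matters is the price per GB that can actually be used under load, and how many IOPS stand behind each of those GBs. The following is a minimal sketch of that arithmetic. The function names and all figures are hypothetical illustrations, not measurements of any product.

```python
def cost_per_usable_gb(price, raw_gb, usable_fraction):
    """Price divided by the capacity that can actually be used under load."""
    if not 0 < usable_fraction <= 1:
        raise ValueError("usable_fraction must be in (0, 1]")
    return price / (raw_gb * usable_fraction)


def io_density(iops, usable_gb):
    """IOPS available per usable GB (the 'I/O density' of the capacity)."""
    return iops / usable_gb


# Two hypothetical arrays with the same price and the same raw capacity.
# One sustains performance at full capacity, the other only up to 50% full.
full_use = cost_per_usable_gb(price=100_000, raw_gb=100_000, usable_fraction=1.0)
half_use = cost_per_usable_gb(price=100_000, raw_gb=100_000, usable_fraction=0.5)

print(full_use)  # 1.0 dollars per usable GB
print(half_use)  # 2.0 dollars per usable GB
```

With identical $/GB on paper, the array that can only be filled halfway costs twice as much per usable GB, before power, space, and warranty are counted.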
Why am I harping on the way that arrays are designed? It is because all of this drives the TCO up or down based on architecture and methods used to drive up performance, capacity utilization, reliability, and availability . . . or NOT!
Most arrays today are very wasteful when it comes to the:
- amount of compute power inside the array
- amount of actual usable capacity
- overall reliability (or aversion to service events)
- availability of the array to the application
Also, adding features such as those noted above, as well as many kinds of replication, makes the performance of the array inconsistent, causing IT architects to over-provision their gear and “work around the SAN.” SANs got a bad name for bloated, framed architectures with big iron, big license fees for every feature on the planet, poor performance, poor reliability, poor capacity utilization, etc., etc., etc. A SAN was originally meant to just put storage on a private network that servers could share. Oh, how things get polluted over time when a vendor’s greed takes over.
As noted before, putting the right amount of compute, against the right amount of storage, will drive costs down in power, space, and application efficiency.
Most arrays also have the mindset of “when in doubt, throw it out” when it comes to replaceable components within the system, also known as Field Replaceable Units (FRUs). This leads to more service events, higher warranty costs, as well as potential and real performance loss at the application, and even down time.
What Makes ISE Tick?
X-IO is now in its second generation of ISE, a balanced storage system that breaks all the molds of the traditional storage system. Unique aspects of ISE and its second generation are:
1. All the things ISE solves, including two to three times the I/O, per HDD, over any other array manufacturer.
2. Dual super-capacitor subsystems, in order to always be able to hold up both controllers, for up to 8 minutes, to flush the mirrored write-back cache on both controllers to a small SSD on each controller. This ENDS the issue of batteries or a UPS being needed either to hold up cache or to hold up the entire array while write-back cache is written out to a set of log disks. Reliability now goes up exponentially over a battery, which was already good. This approach not only keeps the price the same, but also makes the data readily available for server usage when power comes back on. (Note: Two super-caps are in each ISE but only one is necessary for hold-up. Two are provided for high availability and no single point of failure.)
3. Reliability that is increased tenfold, over the first generation ISE, for the back-end devices in datapacs that are using the new Hyper ISE 7-series (with additional groupings of HDDs). This extends the art of ISE-deferring-service, and includes the 5-year hardware warranty that X-IO extends on all its ISE systems.
4. Unique Performance Tiering in the Hyper ISE hybrid that allows for full use of the HDD capacity with a small % of SSD. The new 7-series extends this capability, with varying capacities of the Hyper ISE, as well as SSD capacity for application acceleration.
5. No features that are not necessary for application performance. ISE does NOT do de-duplication, as it’s not necessary if the application does it (which most do). Moreover, since we are the only company in the world that allows for full utilization of the storage purchased, de-duplication/compression is relegated to where it should be: for data at rest, NOT for tier 1 storage. Furthermore, features like thin provisioning are not necessary, as mainline operating systems such as Windows and Linux, let alone VMware, allow for proper grow and shrink of volumes, which ISE does support.
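The hold-up claim in item 2 can be sanity-checked with back-of-the-envelope arithmetic: the time to flush the cache is its size divided by the sustained write rate of the SSD, and that time has to fit inside the hold-up window. The sketch below assumes the 8-minute window quoted above. The cache size and SSD write rates are hypothetical figures chosen for illustration, not ISE specifications.

```python
HOLDUP_SECONDS = 8 * 60  # hold-up window quoted in item 2


def flush_seconds(cache_bytes, ssd_write_bytes_per_s):
    """Time needed to write the whole cache to the SSD at a sustained rate."""
    return cache_bytes / ssd_write_bytes_per_s


def fits_holdup(cache_bytes, ssd_write_bytes_per_s, holdup_s=HOLDUP_SECONDS):
    """True if the full cache can be flushed before hold-up power runs out."""
    return flush_seconds(cache_bytes, ssd_write_bytes_per_s) <= holdup_s


# Hypothetical example: 8 GiB of write-back cache, SSD sustaining 200 MB/s.
cache = 8 * 2**30
print(round(flush_seconds(cache, 200e6), 1))  # about 42.9 seconds
print(fits_holdup(cache, 200e6))              # True
```

Under these assumed numbers the flush finishes in well under a minute, which illustrates why a small cache paired with a small SSD leaves a wide margin inside the hold-up window.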
Read more on http://stevesicola.com.
— Storage Horizons Blog —
When it comes to storage systems, the cost to build the product and the subsequent acquisition cost are only two aspects of the overall cost of owning and operating the storage. The $/GB argument does not hold up anymore as the only important point in storage, because the enterprise and this world demand much, much more. Price/performance is very important, but other aspects, in this day and age, play equal roles in most cases. Aspects like how the storage array is designed, how much capacity can be utilized (e.g., getting I/Os to it), and then how the software is layered on it, make all the difference in the world to the total cost of operation (TCO) of storage. So many aspects make up a good array that provides performance, reliability, availability, and capacity utilization; and it’s not just about a specific aspect, but how they play together. It’s all about making “the whole greater than the sum of its parts.” Environmental aspects make up a huge part of the cost of owning and managing storage in a datacenter and many times are “invisible” costs because of departmental silos.
The aspects to consider in a storage system, today, when buying and then owning the system are:
- Cost of acquisition
- Cost of warranty service
- Cost of power
- Cost of space
- Cost of features with licenses, etc.
- Cost for managing and attaching the storage to the system/application
How the array is designed—from mechanicals and electronics to the software that runs it all—plays a key role to drive TCO up or down.
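The cost aspects listed above can be added up in a few lines. This is a minimal sketch, not a complete cost model: the class, its field names, and every dollar figure are hypothetical, and recurring costs are assumed to be flat per year (real warranty pricing often rises in later years).

```python
from dataclasses import dataclass


@dataclass
class StorageCosts:
    """One-time and yearly cost components, mirroring the list above."""
    acquisition: float
    warranty_per_year: float
    power_per_year: float
    space_per_year: float
    licenses_per_year: float
    management_per_year: float

    def tco(self, years: int) -> float:
        """Acquisition plus all recurring costs over the ownership period."""
        recurring = (
            self.warranty_per_year
            + self.power_per_year
            + self.space_per_year
            + self.licenses_per_year
            + self.management_per_year
        )
        return self.acquisition + recurring * years


# Hypothetical array: 100,000 to buy, 20,000 per year to own and run.
costs = StorageCosts(
    acquisition=100_000,
    warranty_per_year=5_000,
    power_per_year=3_000,
    space_per_year=2_000,
    licenses_per_year=4_000,
    management_per_year=6_000,
)
print(costs.tco(years=5))  # 200000
```

In this made-up case the purchase price is only half of the five-year total, which is the point of looking past acquisition cost.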
When I consider building storage, I look at what gives the biggest bang for the buck when it comes to performance, at the lowest cost. I also look at reliability, availability, and the usable capacity. Stan Zaffos of Gartner coined the term, “Available Performance,” which seems to sum it up pretty well. Can you make the storage available, all the time, with a consistent amount of performance? That ties together price/performance, reliability, availability, and usable capacity.
TCO is not just about $ per GB anymore, nor has it been for some time, but many storage companies still seem to focus on it. Then there are others that now seem to focus only on $ per I/O, which is like fishing with dynamite when using all RAM or SSD! It’s also NOT about putting every feature on the planet inside the storage system because today, most applications provide features that obviate the need for features within the array. Focusing on the wrong things drives TCO up, not down. Our online whitepaper about common mistakes that are made in storage purchases, “How to Minimize Data Storage Costs and Avoid Expensive Mistakes,” puts this all in a business perspective. Putting all of the data management/protection features inside the storage reduces the scalability of the storage, locks customers to a vendor, and also drives down the efficiency of the storage, in terms of consistent performance and capacity utilization, let alone reliability and availability.
When considering the build of a storage array, I look at multiple factors:
- Processor Speed and Capability: If a processor can have speed, as well as RAID acceleration, without the need of having multiple additional components or custom chips, it is the winner. New x64 processors, from Intel’s Jasper Forest to the new Sandy Bridge, provide that capability. Choosing the right processor is important, because too many times, the processor that is recommended is more than what is necessary and this drives up power costs, needlessly.
- Memory Capability: Dynamic random-access memory (DRAM) is still the fastest. It’s 1000x the speed of flash but, of course, it’s much more expensive. Using the right amount, for the job of buffering and caching, is a key to cost containment, as well as array efficiency.
- Write-back Cache: This feature is amazingly effective if the algorithms used smooth out the accesses to the back-end devices, whether they are HDD or SSD.
- Non-volatility and Mirrored Cache: This feature, for most applications, is a key point when it makes a storage subsystem appear to be faster than it really is. It also provides for data integrity and availability in first- and second-order benefits.
- Back-end Storage Device Choice (Enterprise HDD, Nearline/High Cap HDD, and SSD): Each of these choices has ramifications to all aspects of the array from cost, reliability, performance, and availability.
- Storage Tiering: Tiering has been around for a long time. It was initially coined as Hierarchical Storage Management (HSM), then Life Cycle Management, etc., and now tiering. But tiering can be different, depending on what the goal is. Is it the performance or some all-in-one desire to have tier-one storage mixed with tier-x storage? Is it within the array or across arrays?
- Design for Reliability and Availability: These are subtly different and relate to things, such as how many different pieces there are to the solution, and how the intelligent parts of the array allow for availability, in the event of failures (fault tolerance). Packaging of the devices and the different components—without cables and with fewer replaceable components—are keys to driving up reliability and availability, as well as driving TCO down. In the end, design for reliability is all about reducing service events that affect the storage consumer, in one way or another, while availability is all about making sure the storage is available for access, all the time.
The world of computing is complicated enough. We do not need to see so many start-ups confusing the basics of computing and storage with statements like “SSD for the cost of HDD,” or “one tier for all,” or “automatic QoS,” or “no caching,” even to the extreme of “No more HDDs.” It’s all a game to try and sell people on how price/performance could be, not what it SHOULD be!
Basically, architecture is everything. Brute force only works so far by “hiding the cheese” and adding features that mask the overall cost with dubious claims about cost savings (by de-dupe and compression). They are like the wares of the old “snake oil” salesmen of the 19th century. Are these aspects of a storage array that really make a difference?
Read more on http://stevesicola.com.
There are titanic shifts occurring in data storage requirements today, often resulting in buyers making expensive storage purchasing mistakes. The most significant disruption to traditional storage thinking is a new problem brought about by the appeal of all-SSD systems, and that is the over-provisioning of performance (IOPS), in order to achieve the proper capacity (TBs). At the same time, we are faced with the opposite problem of legacy storage, that is, the over-provisioning of capacity (TB) in order to have the performance required. Fortunately, with today’s technology, especially systems that combine the best of SSD and HDD, it is possible to find the balance, leading to outstanding financial and operational results. As I meet with our customers and prospects, I have noticed how the strong appeal of all-flash solutions, offering millions of IOPS, has caused them to settle on a very high $/GB solution for a workload that requires only 50,000 low-latency IOPS and could be served for half the cost. There is a considerable amount of capital savings to be realized by sizing the IOPS and GBs that are required and buying the storage that matches the need. The following paper outlines a structure to think about data storage and how to avoid those expensive mistakes.
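The two over-provisioning problems described above can be made concrete with a small sizing sketch. The number of units to buy is set by whichever requirement (capacity or IOPS) is harder to meet, and the other dimension ends up over-provisioned. The 50,000 IOPS requirement comes from the text. The 100 TB requirement and the per-unit capacity and IOPS figures are hypothetical, and the function names are illustrative only.

```python
import math


def units_needed(required_tb, required_iops, unit_tb, unit_iops):
    """Units required to satisfy BOTH the capacity and the IOPS target."""
    by_capacity = math.ceil(required_tb / unit_tb)
    by_iops = math.ceil(required_iops / unit_iops)
    return max(by_capacity, by_iops)


def overprovision(required_tb, required_iops, unit_tb, unit_iops):
    """How much capacity and performance is bought beyond what is needed."""
    n = units_needed(required_tb, required_iops, unit_tb, unit_iops)
    return {
        "units": n,
        "excess_tb": n * unit_tb - required_tb,
        "excess_iops": n * unit_iops - required_iops,
    }


# Workload: 100 TB (hypothetical) and 50,000 IOPS (from the text).
# Hypothetical all-flash unit: 10 TB, 500,000 IOPS. Capacity drives the count.
print(overprovision(100, 50_000, unit_tb=10, unit_iops=500_000))
# {'units': 10, 'excess_tb': 0, 'excess_iops': 4950000}

# Hypothetical legacy HDD unit: 50 TB, 5,000 IOPS. IOPS drives the count.
print(overprovision(100, 50_000, unit_tb=50, unit_iops=5_000))
# {'units': 10, 'excess_tb': 400, 'excess_iops': 0}
```

A balanced system is one where both excess figures stay small for the workload being sized.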