Here is a quick and simple new way to look at storage: stop buying flash arrays that offer a bunch of bells and whistles. Two main reasons: first, it increases your $/TB, and second, it locks you into the vendor's platform. Let's dive deeper.
1. If you go out and buy an All Flash Array (AFA) from one of the 50 vendors selling them today, you will see a wide spectrum not just in the media (eMLC, MLC, cMLC) but also in the features and functionality. These vendors are all scrambling to pack in as many features as possible in order to reach a broader customer base. As the customer comparing which AFA has this feature or is missing that one, the evaluation can become an Excel pivot table from hell to manage. Vendors then raise the price per TB on those solutions, arguing that the extra features give you more usable storage or better data protection. The reality is that you are paying the bills for the developers coding the next shiny feature in some basement, and that added cost is passed down to you in the purchase price.
2. The more features you use on a particular AFA, the harder it is to move to another platform if you want a different system. This is what we call ‘stickiness’. Vendors want you to use their features more and more, so that when they raise prices or push you to upgrade, it is harder for you to look elsewhere. If you have an outage and your boss comes in and says, “I want these <insert vendor name> boxes out of here,” are you going to reply that the whole company runs on them and it's going to take 12-18 months to do that?
I bet you're thinking, “I need those functions because I have to protect my data,” or “I get more storage out of them because I use this feature.” But what you can do is take those functions away from the media and bring them up into a virtual storage layer above it. This way you can move dumb storage hardware in and out as needed, based more on price and performance than on features and functionality. By moving the higher-level functionality into the virtual layer, the AFA can be swapped out easily, and you are free to always buy the lowest-priced system based solely on performance.
Now you're thinking the cost of licenses for this function and that feature in the virtualization layer is just moving the numbers around, right? Wrong! With IBM Spectrum Virtualize you buy a license for a certain number of TBs, and that license is perpetual. You can move storage in and out of the virtualization layer without increasing the number of licenses. For example, you purchase 100TB of licenses and virtualize a 75TB Pure system. Your boss comes in and says, “I need another 15TB for this new project that is coming online next week.” You can go out to your vendors, choose a dumb AFA, and insert it into the virtual storage layer, and you still get all of the features and functions you had before. Then a few years go by and you want to replace the Pure system with a nice IBM FlashSystem. No problem: with ZERO downtime you can insert the FlashSystem 900 under the virtual layer and migrate the data to the new flash, and the hosts do not have to be touched.
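To make the perpetual-license math concrete, here is a minimal sketch in Python of the capacity accounting described above. The class and array names are hypothetical illustrations, not an IBM tool or API; the only point is that arrays can be swapped in and out under one fixed entitlement.

```python
# Hypothetical sketch of perpetual per-TB license accounting.
# Names ("Pure", "FlashSystem-900", etc.) are illustrative only.

class VirtualizeLicense:
    def __init__(self, licensed_tb):
        self.licensed_tb = licensed_tb  # perpetual entitlement, bought once
        self.arrays = {}                # array name -> TB virtualized under it

    def virtualized_tb(self):
        return sum(self.arrays.values())

    def add_array(self, name, tb):
        # New hardware only needs to fit under the existing entitlement.
        if self.virtualized_tb() + tb > self.licensed_tb:
            raise ValueError("would exceed licensed capacity")
        self.arrays[name] = tb

    def remove_array(self, name):
        # Swapping an array out frees its capacity; no new license needed.
        self.arrays.pop(name)

lic = VirtualizeLicense(100)
lic.add_array("Pure", 75)             # the original 75TB array
lic.add_array("commodity-afa", 15)    # the boss's 15TB project, still under 100TB
lic.remove_array("Pure")              # years later, swap the Pure out...
lic.add_array("FlashSystem-900", 75)  # ...and its replacement in, same license
print(lic.virtualized_tb())  # 90
```

The swap at the end is the whole argument: the replacement array consumes the capacity the old one freed, so no additional license purchase is triggered.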
The cool thing I see with this kind of virtualization layer is the simplicity: no need to know how to program APIs, or to bring in a bunch of consultants for some long, drawn-out study that ends with “go to the cloud.” In a way, this technology creates a private cloud of storage for your data center. But the point here is that not having to buy feature licenses every time you buy a box lowers your $/TB and gives you true freedom to shop the vendors.
Cloud is changing the storage business in more ways than just price per unit. It is fundamentally changing how we design our storage systems and how we deploy, protect, and recover them. For the most fortunate companies, the ones just starting out, the cloud is easy: there are no legacy systems or tried-and-true methods, because everything has always been in the ‘cloud’.
For most companies trying to cut their storage cost while keeping some control of their storage, cloud seems to be the answer. But getting there is not an easy task, as most have seen: data has to be transferred, code has to be rewritten, and systems and processes all have to be changed just so they can report back to their CIO that they are using the cloud.
Now, there are many ways to get to the cloud, but the one I am excited about uses technology originally deployed back in the late 90s.
GPFS (errr, $1 in the naughty jar) Spectrum Scale is a parallel file system that can spread data across many different tiers of storage. From flash to spinning drives to tape, Scale can ease storage administration through policy-based movement of data. This movement is driven by metadata: data is written, moved, and deleted based on policies set by the storage admin.
So how does this help you get to the cloud? Glad you asked. IBM released a new plug-in for Scale that treats the cloud as just another tier of storage. This could be any of multiple cloud vendors, such as IBM Cleversafe, IBM SoftLayer, Amazon S3, or a private cloud (think OpenStack). The cloud provider is attached to the cloud node over Ethernet, and your Scale system can either write directly to the cloud tier or move data there as it ages and cools.
This will do a couple of things for you.
- Because we are looking at the last read date, data that is still needed but highly unlikely to be read can be moved automatically to the cloud. If a system needs the file/object, no re-coding is required because the namespace does not change.
- If you run out of storage and need to ‘burst’ out for some monthly or yearly job, you can move data around to free up space on-prem or write directly out to the cloud.
- Data protection such as snapshots and backups can still take place. This is valuable to many customers: they know the data does not change often, and they like not having to change their recovery process every time they add new technology.
- Cheap disaster recovery. Scale can replicate to another system, but as these systems grow into the multi-petabyte range, replication becomes more difficult. For the most part, you need to recover the most recent (~90 days) data that runs your business. Scale has the ability to create mirrors of data pools, and one of those mirrors could be the cloud tier, where your most recent data is kept in case there is a problem in the data center.
- It allows you to start small and work your way into a cloud offering. Part of the problem some clients have is that they want to take on too much too quickly. Because Scale allows customers to keep data in multiple clouds, you can start with a larger vendor like IBM and then, when your private OpenStack cloud is up and running, use both or just one. Migration is simple because both share the same namespace under the same file system, which frees the client from having to make changes on the front side of the application.
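The last-read-date tiering described above can be sketched in a few lines of Python. This is only an illustration of the policy logic under assumed names; in Spectrum Scale itself this would be expressed as an ILM policy rule, not Python, and the pool names and 90-day threshold here are hypothetical.

```python
import time

DAY = 86400  # seconds in a day

# Illustrative stand-in for a file's placement metadata; not a Scale API.
class TieredFile:
    def __init__(self, name, pool, last_read):
        self.name = name            # namespace entry; never changes on migration
        self.pool = pool            # e.g. "flash", "disk", "cloud"
        self.last_read = last_read  # epoch seconds of last access

def apply_policy(files, now, cold_after_days=90):
    """Move files not read in `cold_after_days` days to the cloud tier.

    Only the pool changes; the file name (namespace) stays the same,
    which is why applications need no re-coding."""
    for f in files:
        if f.pool != "cloud" and now - f.last_read > cold_after_days * DAY:
            f.pool = "cloud"
    return files

now = time.time()
files = [
    TieredFile("reports/q1-2015.csv", "disk", now - 200 * DAY),  # cold
    TieredFile("orders.db", "flash", now - 1 * DAY),             # hot
]
apply_policy(files, now)
print([(f.name, f.pool) for f in files])
```

Running the sketch moves the 200-day-old file to the cloud pool and leaves the hot database on flash; note that neither file's name changed, which is the namespace point made in the first bullet above.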
Today this feature is offered as an open beta only. The release is coming soon; the team is tweaking and fixing bugs before it becomes generally available. Here is the link to the DevWorks page that goes into more detail about the beta and how to download a VM that will let you test these features out.
I really believe this is going to help many of my customers move to a hybrid cloud platform. Take a look at the video below to see how it can help you as well.
So many things to talk about but a couple of notes of interest from today:
- DS8870 is a new system, not just an upgrade. IBM went from the P6 server to the P7, which should give it a huge performance bump. I heard there are some impressive SPC numbers coming soon.
- XIV gets a GUI improvement with the Multi-System Manager. This will help drive efficiency in managing environments with larger deployments.
- V7000 Unified gets compression for file. Same story as on block, but now for file objects.
Here are links to the hardware and software announcements from today.
- IBM System Storage DS8870 (Machine type 2423) Models 961 and 96E with three-year warranty
- IBM System Storage TS1060 Tape Drive offers an Ultrium 6 Tape Drive for the TS3500 Tape Library
- IBM Virtualization Engine TS7700 supports disk-based encryption
- IBM System Storage DS8870 (Machine type 2421) Models 961 and 96E with one-year warranty
- IBM System Storage DS8870 (Machine type 2424) Models 961 and 96E with four-year warranty
- IBM System Storage DS8000 series high-performance flagship – Function Authorizations for machine type 239x
- IBM System Storage DS8870 (Machine type 2422) Models 961 and 96E with two-year warranty
- IBM Systems Director Standard Edition for Linux on System z, V6.3 now manages zBX blades
- IBM Systems Director product enhancements provide tools to better manage virtual and physical networks
- XIV management is designed to enable more effective XIV deployments into private cloud computing environments and improve multi-system management
- IBM Storwize V7000 Unified V1.4 includes real-time compression, local authentication server support, four-way clustering, and FCoE support
- IBM Programmable Network Controller V3.0, when used with OpenFlow-enabled switches, provides architecture for centralized and simplified networking
- IBM SmartCloud Virtual Storage Center V5.1 offers efficient virtualization and infrastructure management to enable smarter storage
- IBM Tivoli Storage Manager V6.4 products deliver significant enhancements to manage data protection in virtual environments
- IBM Infoprint XT for z/OS, V3.1 provides support to transform Xerox data streams and highlight color resources for printing on AFP printers and enhances DBCS support
- IBM Security zSecure V1.13.1 products and solutions enhance mainframe security intelligence, compliance, administration and integration
- IBM Tivoli Storage FlashCopy Manager V3.2 extends application-aware snapshot management to IBM N series and NetApp devices and enables seamless disaster recovery
Here is a Q&A session with newly knighted IBM Fellow Jeff Jonas on his take on Big Data, data protection, and some of the complications associated with Big Data analytics.
Big Data Q&A for the Data Protection Law and Policy Newsletter
May be of interest to some of my readers.
1. Data protection challenge of the future: what is Big Data?
The three V’s – Volume, Velocity, and Variety – are the essential characteristics of “Big Data”. While data protection and privacy laws are still busy catching up with the technologies of yesterday, Big Data is growing at lightning speed on a daily basis. How can companies deal with the data protection challenges brought about by Big Data in order to truly benefit from the opportunities it introduces? First, one must truly grasp what Big Data is. We interview Jeff Jonas, Chief Scientist at IBM Entity Analytics, to obtain his perspective on and definition of Big Data, and his experience handling it.
2. When did data become big?
Big Data did not become big overnight. What I think happened is that data started getting generated faster than organizations could get their hands around it. Then one day you simply wake up and feel like you are drowning in data. On that day, data felt big.
3. Please explain and elaborate on the characteristics of Big Data?
Big Data means different things to different people.
Personally, my favorite definition is: “something magical happens when very large corpuses of data come together.” Some examples of this can be seen at Google, such as Google Flu Trends and Google Translate. In my own work, I witnessed this first in 2006: as the system ingested more data, its predictions got both higher quality and faster. This is counterintuitive. The easiest way to explain it is to consider the familiar process of putting a puzzle together at home. Why do you think the last few pieces are as easy as the first few – even though you have more data in front of you than ever before? The same thing is happening in my systems these days. It’s rather exciting, to tell you the truth.
To elaborate briefly on the new physics of Big Data, I pinpointed three phenomena in my blog entry – Big Data. New Physics – drawing from 14 years of personal experience designing and deploying a number of multi-billion-row, context-accumulating systems:
1. Better Prediction. Simultaneously lower false positives and lower false negatives
2. Bad data good. More specifically, natural variability in data including spelling errors, transposition errors, and even professionally fabricated lies – all helpful.
3. More data faster. Less compute effort as the database gets bigger.
Another definition of Big Data relates to organizations’ ability to harness data sets previously believed to be “too large to handle.” Historically, Big Data meant too many rows, too much storage, and too much cost for organizations that lacked the tools and ability to really handle data of such quantity. Today, we are seeing ways to explore and iterate cheaply over Big Data.
4. When did data become big for you? What is your “Big Data” processing experience?
As previously mentioned, for me, Big Data is about the magical things that happen when a critical mass is reached. To be honest, Big Data does not feel big to me unless it is hard to process and make sense of. A few billion rows here and a few billion rows there – such volumes once seemed like a lot of data to me. Then helping organizations think about dealing with volumes of 100 million or more records a day seemed like a lot. Today, when I think about the volumes at Google and Facebook, I think: “Now that really is Big Data!”
My primary personal interest and focus in Big Data these days is how to make sense of data in real time – fast enough to do something about a transaction while it is still happening. While you swipe that credit card, there are only a few seconds to decide whether that is you or someone pretending to be you. If an unauthorized user is inside your network and data starts getting pumped out, an organization needs sub-second “sense and respond” capabilities. An end-of-day batch process that produces great answers is simply too late!
5. What are the technologies currently adopted to process Big Data?
The availability of Big Data technologies seems to be growing by leaps and bounds and on many fronts. We are seeing large corporate investments resulting in commercial products – at IBM, two examples would be IBM InfoSphere Streams for Big Data in motion and IBM InfoSphere BigInsights for pattern discovery over data at rest. There are also many Big Data open source efforts under way, for example Hadoop, Cassandra, and Lucene. If one were to divide these into types, one would find some well suited for streaming analytics and others for batch analytics. Some help organizations harness structured data while others are ideal for unstructured data. One thing is for sure – there are many options, and there will be many more choices to come as Big Data continues to attract investment.
6. How can companies benefit from the use of Big Data?
I’d like to think consumers benefit too, just to be clear. To illustrate my point, I find it very helpful when Google responds to my search with “did you mean ______”. To pull off this very smart stunt, Google must remember the typographical errors of the world, and that, I do believe, qualifies as Big Data. Moreover, I think health care is benefiting from Big Data, or let’s hope so. Organizations such as financial institutions and insurance companies are also benefiting from Big Data, using its insights to run more efficient operations and mitigate risks.
We, you and I, are responsible in part for generating so much Big Data. These social media platforms we use to speak our mind and stay connected are responsible for massive volumes of data. Companies know this and are paying attention. For example, my friend’s wife complained on Twitter about a specific company’s service. Not long thereafter they reached out to her because they too were listening. They fixed the problem and she was as happy as ever. How did the company benefit? They kept a customer.
7. What is the trend of processing Big Data?
I think a lot of Big Data systems are running as periodic batch processes, for example once a week or once a month. My suspicion is that as these systems begin to generate more and more relevant insight, it will not be long before the users say: “Why did I have to wait until the end of the week to learn that? They already left the web site,” or, “I already denied their loan when it is now clear I should have granted it.”
8. What are the complications of dealing with the privacy implications brought about by Big Data, compared to average-sized data?
There are lots of privacy complications that come along with Big Data. Consumers, for example, often want to know what data an organization collects and the purpose of the collection. Something that further complicates this: I think many consumers would be surprised to know what is computationally possible with Big Data – for example, where you are going to be next Thursday at 5:35pm, or who your three best friends are, and which two of them are not on Facebook. Big Data is making it harder to have secrets. To illustrate using lines from my blog entry – Using Transparency As A Mask – ‘Unlike two decades ago, humans are now creating huge volumes of extraordinarily useful data as they self-annotate their relationships and yours, their photographs and yours, their thoughts and their thoughts about you … and more. With more data comes better understanding and prediction. The convergence of data might reveal your “discreet” rendezvous or the fact that you are no longer on speaking terms with your best friend. No longer secret is your visit to the porn store and the subsequent change in your home’s late-night energy profile, another telling story about who you are … again out of the bag, and little you can do about it. Pity … you thought that all of this information was secret.’
9. What are the privacy concerns & threats Big Data might bring about – to companies and to individuals whose data are contained in ‘Big Data’?
My number one recommendation to organizations is “Avoid Consumer Surprise.”
10. How are companies currently applying privacy protection principles before/after Big Data has been processed?
I think there are many best practices being adopted. One of my favorites involves letting consumers opt in instead of opting them in automatically and then requiring them to opt out. One new thing I would like to see become a best practice: a place on a web site – for example my bank’s – where I can see a list of the third parties my bank has shared my data with. I think this transparency would be good and would certainly make consumers more aware.
11. What is “Big Data”, according to Jeff Jonas?
Big Data is a pile of data so big – and harnessed so well – that it becomes possible to make substantially better predictions, for example, what web page would be the absolute best web page to place first on your results, just for you.