I have a closet in my house where I keep all kinds of computer gear. Most of it is left over from some fun project I was working on, or is technology that is past its prime. There is everything from Zip drives to coax terminators to an Ultra Wide SCSI interface for an external CD-ROM drive. Why do I keep these things in a box in a closet? Great question, and one that usually comes up about once a year from a family member who sticks their head in looking for a toy or a coat, or looking to make a point.
But on more than one occasion I have had to go to the closet of ‘junk’ to get something that helped me complete a project: a Cat5 cable for my son’s computer, an extra wireless mouse when my other one died. Yes, I could go through it all, sort it out and come up with some nice labels, but that takes time. It’s just easier to close the container lid and forget about it until I realize I need something, and then it’s easy enough to grab it.
Now, this is not a hoarding issue like the ones you see on TV, where people fill their houses, garages, sheds and barns with all kinds of things. The people who show up on TV have taken the ‘collecting’ business to another level, and some call them ‘hoarders’. But if you watch shows like “American Pickers” on the History Channel, you will notice that most of these ‘hoarders’ know what they have and where it is: they have a kind of metadata knowledge of their antiques.
When you look at how businesses are storing their data today, most are looking to keep as much as possible in production. Much of that data no longer serves a real purpose, but storage admins are too gun-shy to hit the delete button for fear of some VMware admin calling to ask why their Windows NT 4 server is not responding. If you have tools that can move data around based on its age or last access time, you have already made a big leap toward savings. But these older ILM systems cannot handle the growth of unstructured data we are seeing in 2017.
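Age-based tools like these start from file metadata such as the last-access time. As a minimal sketch (not any particular ILM product), the Python below walks a directory tree and reports regular files that have not been read in a given number of days. The function name find_cold_files and the threshold are illustrative assumptions, and access times are only as trustworthy as the mount options allow: many Linux systems mount with relatime or noatime, which update atime lazily or not at all.

```python
import os
import stat
import time
from typing import Iterator, Optional, Tuple

SECONDS_PER_DAY = 86_400


def find_cold_files(root: str, min_idle_days: float,
                    now: Optional[float] = None) -> Iterator[Tuple[str, float]]:
    """Yield (path, idle_days) for regular files under root that have not
    been accessed for at least min_idle_days (based on st_atime)."""
    now = time.time() if now is None else now
    for dirpath, _dirnames, filenames in os.walk(root):
        for name in filenames:
            path = os.path.join(dirpath, name)
            try:
                st = os.stat(path, follow_symlinks=False)
            except OSError:
                continue  # file vanished or is unreadable; skip it
            if not stat.S_ISREG(st.st_mode):
                continue  # ignore symlinks, sockets, device files, etc.
            idle_days = (now - st.st_atime) / SECONDS_PER_DAY
            if idle_days >= min_idle_days:
                yield path, idle_days
```

The output is a candidate list, not a delete list: a human or a policy engine still decides whether each cold file moves to a cheaper tier.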
Companies want to be able to create a container for the data and not have to worry whether the data is on-prem, off-prem, on disk or on tape. Set it and forget it is the basic rule of thumb. But this is difficult because the same data has very different value depending on who you ask. A two-year-old invoice is not as valuable to someone in Engineering as it is to the AR person who uses it as the basis for their next billing cycle.
One of the better ways to cut through the issue is a flexible platform that can move data from expensive flash down to tape and cloud without changing the way people access it. If users cannot tell where their data is coming from and do not have to change how they get to it, why not put the cold data on something low cost like tape or cloud tape?
This type of system can be built using the IBM Spectrum Scale platform. The file system has a global namespace across all of the different types of media and can even use the cloud as a place to store data without changing the way end users access it. File movement is policy-based, so admins do not have to ask users whether the data is still needed; the system simply moves it to a lower-cost tier as it gets older and colder. The best part is that, under a new licensing scheme, customers only pay the per-TB license for data on disk and flash. Data that sits on tape does not count toward the license cost.
For example, take 500 TB of data: 100 TB is less than 30 days old and 400 TB is older than 30 days. If it is stored on a Spectrum Scale file system with data older than 30 days migrated to tape, you only pay for the 100 TB on disk, not the 400 TB on tape. This greatly reduces the cost to store data while not taking features away from our customers.
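The arithmetic in that example is easy to write down. This is a sketch under stated assumptions: the tier names are hypothetical labels, only flash and disk count toward the license (as described above), cloud capacity is not modeled, and no per-TB price is assumed. Check IBM’s current license terms before relying on it.

```python
from typing import Dict

# Tiers that count toward the capacity license in this sketch.
# Tape is excluded, per the licensing scheme described above.
LICENSED_TIERS = frozenset({"flash", "disk"})


def licensed_tb(tb_by_tier: Dict[str, float]) -> float:
    """Return the capacity (TB) that counts toward the license."""
    return sum(tb for tier, tb in tb_by_tier.items() if tier in LICENSED_TIERS)


# The 500 TB example: 100 TB of recent data on disk, 400 TB of older data on tape.
usage = {"disk": 100.0, "tape": 400.0}
total = sum(usage.values())
print(f"total={total:g} TB, licensed={licensed_tb(usage):g} TB")
# prints: total=500 TB, licensed=100 TB
```

In this example only 20% of the stored capacity is licensed, and that share shrinks further as more data ages onto tape.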
For more information on IBM Spectrum Scale, follow this link to catch up.
Cloud is changing the storage business in more ways than just price per unit. It is fundamentally changing how we design our storage systems and how we deploy, protect and recover them. For the fortunate companies that are just starting out, the cloud is easy: there are no legacy systems or tried-and-true methods to work around, because everything has always been in the ‘cloud’.
For most companies that are trying to cut their storage costs while keeping some control of their storage, cloud seems to be the answer. But, as most have seen, getting there is not an easy task: data has to be transferred, code has to be rewritten, and systems and processes all have to change, just so someone can report back to the CIO that they are using the cloud.
Now, there are many ways to get to the cloud, but one I am excited about uses technology originally deployed back in the late 90s.
GPFS (errr, $1 in the naughty jar) Spectrum Scale is a parallel file system that can spread data across many different tiers of storage. From flash to spinning drives to tape, Scale can ease the storage administration burden through policy-based data movement. The movement is driven by file metadata: data is written, moved and deleted according to policies set by the storage admin.
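To make the idea concrete, here is a toy Python model of metadata-driven placement. Spectrum Scale itself expresses policies as SQL-like rules in its own policy language; this sketch only mimics the concept. It assumes rules are checked in order and the first match decides the target pool, and the rule names, pool names and 30/90-day cutoffs are all hypothetical.

```python
from dataclasses import dataclass
from typing import Callable, List, Optional


@dataclass(frozen=True)
class FileMeta:
    path: str
    days_since_access: float


@dataclass(frozen=True)
class MigrateRule:
    name: str
    target_pool: str
    where: Callable[[FileMeta], bool]  # condition evaluated on metadata only


def choose_pool(meta: FileMeta, rules: List[MigrateRule]) -> Optional[str]:
    """Return the target pool of the first matching rule, or None to leave the file where it is."""
    for rule in rules:
        if rule.where(meta):
            return rule.target_pool
    return None


# Hypothetical admin policy: coldest rule first, so first-match picks the cheapest tier.
RULES = [
    MigrateRule("cold_to_tape", "tape", lambda m: m.days_since_access > 90),
    MigrateRule("cool_to_disk", "disk", lambda m: m.days_since_access > 30),
]

for days in (5, 45, 120):
    print(days, choose_pool(FileMeta("/gpfs/data/report.dat", days), RULES))
# prints:
# 5 None
# 45 disk
# 120 tape
```

Rule order matters in a first-match scheme: if the 30-day rule came first, a 120-day-old file would stop at disk and never reach tape.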
So how does this help you get to the cloud? Glad you asked. IBM released a new plug-in for Scale that treats the cloud as another tier of storage. That tier can come from several cloud vendors, such as IBM Cleversafe, IBM SoftLayer or Amazon S3, or from a private cloud (think OpenStack). The cloud provider connects to a cloud node over Ethernet, which lets your Scale system either write directly to the cloud tier or move data there as it ages and cools.
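A toy model shows why the application does not notice the move. In the hypothetical TieredNamespace class below, migrating a file only changes which tier it is recorded on; the path the application uses never changes, and reading a file that sits on the cloud tier triggers a recall. This illustrates the concept under those assumptions and is not how the Scale cloud plug-in is implemented; the tier names and the choice to recall to ‘disk’ are made up.

```python
from dataclasses import dataclass, field
from typing import Dict


@dataclass
class TieredNamespace:
    """Toy model: one namespace whose files can live on different tiers.

    Paths never change when data moves; only the recorded tier does.
    """
    tier_of: Dict[str, str] = field(default_factory=dict)
    data: Dict[str, bytes] = field(default_factory=dict)
    recalls: int = 0

    def write(self, path: str, payload: bytes, tier: str = "flash") -> None:
        self.data[path] = payload
        self.tier_of[path] = tier

    def migrate(self, path: str, tier: str) -> None:
        # Only the location changes; the path the application uses stays the same.
        self.tier_of[path] = tier

    def read(self, path: str) -> bytes:
        # Reading a file on the cloud tier triggers a transparent recall
        # (to a hypothetical 'disk' tier in this model).
        if self.tier_of[path] == "cloud":
            self.recalls += 1
            self.tier_of[path] = "disk"
        return self.data[path]


ns = TieredNamespace()
ns.write("/gpfs/projects/q3-report.pdf", b"%PDF...")
ns.migrate("/gpfs/projects/q3-report.pdf", "cloud")
assert ns.read("/gpfs/projects/q3-report.pdf") == b"%PDF..."  # same path, same bytes
```

The caller’s code only ever sees paths, which is the property that lets data move to the cloud without re-coding the application.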
This does several things for you.
- Because policies can look at the last read date, data that must be kept but is unlikely to be read again can be moved to the cloud automatically. If a system needs the file or object, no re-coding is needed because the namespace doesn’t change.
- If you run out of storage and need to ‘burst’ out because of some monthly or yearly job, you can move data around to free up space on-prem or write directly out to the cloud.
- Data protection such as snapshots and backups can still take place. This matters to many customers: they know the data doesn’t change often, but they like that they do not have to change their recovery process every time they add new technology.
- Cheap Disaster Recovery. Scale can replicate to another system, but as systems grow to multiple petabytes and beyond, replication becomes more difficult. For the most part, what you need to recover is the most recent data (~90 days) that runs your business. Scale can create mirrors of data pools, and one of those mirrors could be the cloud tier, holding your most recent data in case there is a problem in the data center.
- It allows you to start small and work your way into a cloud offering. Part of the problem some clients have is that they want to take on too much too quickly. Because Scale allows customers to keep data in multiple clouds, you can start with a larger vendor like IBM and then, once your private cloud on OpenStack is up and running, use both or just one. The migration would be simple because both share the same namespace under the same file system. This frees the client from having to make changes to the front end of the application.
Today this feature is offered as an open beta only. The release is coming soon; IBM is tweaking it and fixing bugs before it becomes generally available. Here is the link to the DevWorks page that goes into more detail about the beta and how to download a VM that lets you test these features out.
I really believe this is going to help many of my customers move to a hybrid cloud platform. Take a look at the video below to see how it can help you as well.