Cloud is changing the storage business in more ways than just price per unit. It is fundamentally changing how we design our storage systems and how we deploy, protect and recover them. The most fortunate companies are the ones just starting out: with no legacy systems or tried-and-true methods to drag along, cloud is an easy move because everything has always been in the ‘cloud’.
For most companies trying to cut their storage costs while keeping some control of their storage, cloud seems to be the answer. But getting there is not an easy task, as most have seen: data has to be transferred, code has to be rewritten, and systems and processes all have to be changed just so they can report back to their CIO that they are using the cloud.
Now there are many ways to get to the cloud, but the one I am excited about uses technology originally deployed back in the late 90s.
GPFS (errr, $1 in the naughty jar) Spectrum Scale is a parallel file system that can spread data across many different tiers of storage. From flash to spinning drives to tape, Scale can ease storage administration through policy-based movement of data. That movement is driven by metadata: data is written, moved and deleted based on policies set by the storage admin.
So how does this help you get to the cloud? Glad you asked. IBM released a new plug-in for Scale that treats the cloud as just another tier of storage. The cloud could come from multiple vendors, such as IBM Cleversafe, IBM Softlayer, Amazon S3 or a private cloud (think OpenStack). The cloud provider is attached to the cloud node over Ethernet, which lets your Scale system either write directly to the cloud tier or move data there as it ages/cools.
This will do a couple of things for you.
- Because we are looking at the last read date, data that is still needed but highly unlikely to be read can be moved to the cloud automatically. If a system needs the file/object, no re-coding is required because the namespace doesn’t change.
- If you run out of storage and need to ‘burst’ out for some monthly/yearly job, you can move data around to free up space on-prem or write directly out to the cloud.
- Data protection such as snapshots and backups can still take place. This is valuable to many customers: they know the data doesn’t change often, and they like not having to change their recovery process every time they add new technology.
- Cheap disaster recovery. Scale does have the ability to replicate to another system, but as these systems grow beyond multiple petabytes, replication becomes more difficult. For the most part you are going to need to recover only the most recent (~90 days) of data that runs your business. Scale has the built-in ability to create mirrors of data pools, and one of those mirrors could be the cloud tier, where your most recent data is kept in case there is a problem in the data center.
- It allows you to start small and work your way into a cloud offering. Part of the problem some clients have is that they want to take on too much too quickly. Because Scale allows customers to have data in multiple clouds, you can start with a larger vendor like IBM and then, when your private cloud on OpenStack is up and running, use both or just one. Migration is simple because both share the same namespace under the same file system, which frees the client from having to make changes on the front side of the application.
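The tiering behavior described above is driven by Spectrum Scale's SQL-like policy language. As a rough sketch only (the pool names here are hypothetical, and in the cloud beta the cloud tier is actually defined as an external pool, so the exact form may differ by release), a rule that migrates files untouched for 90 days out to a cloud tier might look like:

```
/* Sketch only: pool names are hypothetical, and in the beta the
   cloud tier is defined as an external pool, so syntax may vary. */
RULE 'cool-to-cloud' MIGRATE
  FROM POOL 'system'
  TO POOL 'cloudtier'
  WHERE (DAYS(CURRENT_TIMESTAMP) - DAYS(ACCESS_TIME)) > 90
```

Because the file keeps its place in the namespace, applications never see the move.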
Today this feature is offered as an open beta only. The release is coming soon; the team is tweaking and fixing bugs before it is generally available. Here is the link to the DevWorks page that goes into more detail about the beta and how to download a VM that will let you test these features out.
I really believe this is going to help many of my customers move onto that hybrid cloud platform. Take a look at the video below to see how it can help you as well.
Here is a great little article talking about how NAS is giving way to object-based storage. Thanks to Mr. Backup at StorageSwiss.com for the article. Look for an update this week here on The Storage Tank about the Spectrum Scale product and object-based storage for hybrid clouds.
Today’s apps aren’t your father’s apps. Applications developed today take for granted things that were not even thinkable not that long ago — especially in the storage space. The scale that is needed by modern day applications was never envisioned when traditional shared storage was invented. (NFS and SMB were released in 1984, five years before Tim Berners-Lee would invent the world-wide web.)
It wasn’t that long ago that developers would assume an app would run on a single server and access a single filesystem. Once apps started to run on clusters, most “clusters” were really just active-active pairs, or even active-passive pairs. Of course, there were a few truly clustered applications that ran on several nodes. However, all of these systems assumed one thing: a filesystem or LUN that could be shared by all nodes, or synchronously replicated storage that mimicked that behavior.
Modern day developers assume they can…
View original post 416 more words
Yes, IBM is at it again with its storage innovation, receiving 12 new patents for tape systems. What? You thought tape was dead? Again? Tape is very much alive and kicking, and while you may be jaded one way or another, tape is still the cheapest, most reliable long-term storage platform out there.
IBM is known for its innovation and the patents it is awarded every year. For the last 23 years, it has been awarded more patents in the US than any other company. Just in 2015, IBM was awarded 7,355 patents, compared to 7,852 patents for Google, Microsoft, GE and HP combined. Roughly 40% of the 18,172 patents awarded went to IBM.
When you look at the 12 storage patents (listed here), you notice they are all from 2010 to 2014/15. They range from how the data is written to abrasion checking. The people behind these technologies are brilliant, to say the least, and it shows in the details of the filings. While they are sometimes hard to read, the technology being introduced will save IBM customers time and money down the road.
IBM also uses its patents as a revenue source. Just in the last year, IBM sold patents to both Pure Storage and Western Digital. Since Pure and IBM compete in the all-flash array market, IBM must have gotten a huge sum of money for those patents to offset giving up an edge over a competitor. Nonetheless, IBM monetizes its R&D investment by selling the technology to others who may be spending their money elsewhere (like marketing and selling).
If you want to learn more about the IBM Storage Patents, click over here to read about them in detail.
Great new blog from my friend Ravi Prakash. Follow him for all things Spectrum Control!
Today if you are a customer in a sector like financial, retail, digital media, biotechnology, science or government and you use applications like big data analytics, gene sequencing, digital media or scalable file serving, there is a strong possibility that you are already using IBM Spectrum Scale (previously called General Parallel File System or GPFS).
A question foremost in your mind may be: “If Spectrum Scale has its own element manager – the Scale GUI, what would I gain from using Spectrum Control?”
The Spectrum Scale GUI focuses on a single Spectrum Scale cluster. In contrast, the Spectrum Control GUI offers a single pane of glass to manage multiple Scale clusters: it gives you higher-level analytics, a view of the relationships between clusters, and the relationships between clusters and SAN-attached storage. In the future, we expect to extend this support to Spectrum Scale in hybrid cloud scenarios where Spectrum Scale may be backed…
View original post 382 more words
I got a great question the other day regarding VMware Raw Device Mappings:
If an RDM is a direct pass-through of a volume from storage device to VM, does the VM need MPIO software like a physical machine does?
The short answer is NO, it doesn’t. But I thought I would show why this is so, and in fact why adding MPIO software may help.
First up, to test this, I created two volumes on my Storwize V3700.
I mapped them to an ESXi server as LUN ID 2 and LUN ID 3. Note the serials of the volumes end in 0040 and 0041:
On ESX I did a Rescan All and discovered two new volumes, which we know match the two I just made on my V3700, as the serial numbers end in 40 and 41 and the LUN IDs are 2 and 3:
I confirmed that the…
View original post 356 more words
We have been getting this question about clustering the storage controllers on the V7000 Unified (V7kU) more and more as people expand their systems beyond the initial controllers. But let’s step back and understand what we are working with first.
- V7kU is a mixed-protocol storage platform. It uses Spectrum Scale as the file system and Storwize as the operating system. This is important as people get interested in how they can adopt a high-speed, parallel file system with grace and ease. The V7kU comes preloaded, so there is no need to understand the knobs and switches of installing and configuring Spectrum Scale (formerly known as GPFS). The V7kU supports SMB (CIFS), NFS, FC, FCoE and iSCSI, and can be used with other building blocks like OpenStack to support object storage too.
- V7kU can scale up to 20 disk enclosures per controller. The platform can cluster up to four controllers, giving customers the chance to max out at around 7.5 PB of storage. The best part is you can mix and match drive types and sizes: you can have flash drives in the same enclosure as SAS and NL-SAS drives.
- A single interface is the best part of this solution. You can provision both block and file access from the same GUI/CLI, along with data protection features like snapshots, FlashCopy, replication and remote cache copies.
- Policy-based data management. One of my favorite parts of the solution is that I can create policies to manage the data on the box. For example, I can create a policy that says: if my flash pool becomes 75% full, start moving the oldest data to the NL-SAS pool. Not only does this save me from managing the data movement myself, it frees up the flash pool and extends the buying power of the flash. Since flash is the most expensive part of the storage, I want the best bang for the buck there.
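That 75%-full example maps almost directly onto a Spectrum Scale migration rule. As a rough sketch (pool names are hypothetical), THRESHOLD(75,60) starts moving data when the flash pool hits 75% full and stops once it drains back to 60%, with the WEIGHT clause picking the least recently accessed files first:

```
/* Sketch only: pool names are hypothetical. */
RULE 'flash-overflow' MIGRATE
  FROM POOL 'flash' THRESHOLD(75,60)
  WEIGHT(CURRENT_TIMESTAMP - ACCESS_TIME)
  TO POOL 'nlsas'
```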
Now for the question: can we cluster these V7000s to make a bigger pool? Yes, we can. Not only can we cluster the systems (multiple I/O groups), we can mix file and block independently. The best part is that as you add I/O groups you add more performance and capacity, all while managing it from the same single interface.
This was taken from the V7000 Infocenter:
- Issue this CLI command to list the node candidates:
lsnodecandidate
This output is an example of what you might see after you issue the lsnodecandidate command:
id               panel_name UPS_serial_number UPS_unique_id    hardware
50050768010037DA 104615     10004BC047        20400001124C0107 8G4
5005076801000149 106075     10004BC031        20400001124C00C1 8G4
- Issue this CLI command to add the node:
addnode -panelname panel_name -name new_name_arg -iogrp iogroup_name
where panel_name is the name that is noted in step 1 (in this example, the panel name is 000279). The number is printed on the front panel of the node that you are adding back into the system. The new_name_arg is optional to specify a name for the new node; iogroup_name is the I/O group that was noted when the previous node was deleted from the system.
Note: In a service situation, add a node back into a clustered system using the original node name. As long as the partner node in the I/O group has not been deleted too, the default name is used if -name is not specified.
This example shows the command that you might issue:
addnode -panelname 000279 -name newnode -iogrp io_grp1
This output is an example of what you might see:
Node, id [newnode], successfully added
Attention: If more than one candidate node exists, ensure that the node that you add into an I/O group is the same node that was deleted from that I/O group. Failure to do so might result in data corruption. If you are uncertain about which candidate node belongs to the I/O group, shut down all host systems that access this clustered system before you proceed. Reboot each system when you have added all the nodes back into the clustered system.
- Issue this CLI command to ensure that the node was added successfully:
lsnode
This output is an example of what you might see when you issue the lsnode command:
id name  UPS_serial_number WWNN             status IO_group_id IO_group_name config_node UPS_unique_id    hardware
1  node1 1000877059        5005076801000EAA online 0           io_grp0       yes         20400002071C0149 8F2
2  node2 1000871053        500507680100275D online 0           io_grp0       no          2040000207040143 8F2
All nodes are now online.
Currently, I am working with a customer on their archive data, and we are discussing the better medium for data that never gets read back into their environment. They have about 200TB sitting on Tier 1 that is never accessed. The crazy part is that this data is growing faster than the database being accessed by their main program.
This is popping up more and more as unstructured data eats up storage systems while rarely being used. I have heard this called dark data or cold data. In this case it’s frozen data.
We started looking at what it would cost them over a 5-year period to store their data on both tape and cloud. Yes, that four-letter word is still a very good option for most customers. We wanted to keep the exercise simple, so we agreed that 200TB would be the size of the data and there would be no recalls. We know most cloud providers charge extra for recalls, and of course the tape system doesn’t have that extra cost, so leaving recalls out kept it an apples-to-apples comparison. As close as we could get.
For the cloud we used Amazon Glacier pricing which is about $0.007 per GB per month. Our formula for cloud:
200TB x 1000GB x $0.007 x 60 months = $84,000
The tape side of the equation was a little trickier, but we decided to compare just the tape media and tape library. I picked a middle-of-the-road tape library and the new LTO7 media.
Tape Library TS3200 street price $10,000 + 48 LTO7 tapes (@ $150 each) = $17,200
We then looked at the ability to scale and what would happen if they factored in their growth rate. They are growing at 20% annually, which translates to 40TB a year. Keeping the same platforms, what would their 5-year cost be? Cloud was:
(200TB + 3.33TB growth per month) x 1000GB x $0.007, summed over 60 months = $125,258
Tape was calculated at:
$10,000 for the library + (396TB / 6TB LTO7 capacity) x $150 per tape = $19,900
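If you want to rerun this back-of-the-napkin math with your own numbers, here is a quick sketch. It uses only the figures from this post ($0.007/GB/month cloud pricing, 3.33TB/month growth, a $10,000 library, $150 per 6TB LTO7) and, like the post, ignores recalls, power and labor:

```python
# Rough 5-year cost model using only the numbers from the post.
# Recalls, power, labor and compression are all ignored.
import math

GB_PER_TB = 1000
CLOUD_RATE = 0.007  # dollars per GB per month (Glacier-class pricing)
MONTHS = 60         # five-year term

def cloud_flat(tb):
    """Five-year cloud cost for a fixed footprint."""
    return tb * GB_PER_TB * CLOUD_RATE * MONTHS

def cloud_growing(start_tb, growth_tb_per_month):
    """Five-year cloud cost, summing each month's bill as data grows."""
    return sum((start_tb + growth_tb_per_month * m) * GB_PER_TB * CLOUD_RATE
               for m in range(MONTHS))

def tape(total_tb, library=10_000, cart_tb=6, cart_cost=150):
    """One library plus enough LTO7 cartridges to hold the data."""
    return library + math.ceil(total_tb / cart_tb) * cart_cost

print(round(cloud_flat(200)))           # 84000
print(round(cloud_growing(200, 3.33)))  # 125259
print(tape(396))                        # 19900
```

The growth scenario lands within a dollar of the post's $125,258 (the gap is just the rounding of 40TB/12 to 3.33), and the tape total matches the $19,900 above.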
We all hear how cloud is so much cheaper and easier to scale, but after doing this quick back-of-the-napkin math I am not so sure. I know what some of you are saying: we didn’t calculate the server costs or the 4 FTEs it takes to manage a tape system. I agree this is basic, but in this example it is a small-to-medium-size company trying to invest its money in getting its product off the ground. The tape library is fairly small and should be a set-it-and-forget-it type of solution. I doubt the tape solution will carry much more overhead than a cloud. Maybe not as cool or flashy, but for the $100,000 saved over 5 years they can go out and buy their 5-person IT staff a $100 lunch every day, all five years.
So to those who think tape is a four-letter word, that thing in the corner no one wants to deal with: I say embrace it and squeeze the value out of it. Most IT shops still have tape and can show their financial teams how to lower costs, without putting their data at risk in the cloud, with this: