As many respected industry analysts and players have said recently, 2014 looks like the year Object Storage gets real in the enterprise. And I agree!
Of course, IT is not going to instantly “get religion” and object stores will not infiltrate all enterprises overnight. That being said, I see some industry verticals (FSI, for example) where there is more than just curiosity: there is true buying motion. These markets have what I like to call “Object Storage Affinity”, with needs in active archiving, ILM, governance and compliance, and collaboration, to name a few.
Back at the dawn of object storage (somewhere in the very late 1990’s), a CTO promoting it was more akin to a preacher converting the miscreants than anything else. During that time, I was happy to put my holy scriptures aside, listen to the feedback of early adopters and focus on identifying the next stages of evolution for the technology.
As I have mentioned before, Object Storage must be rooted in Efficiency and abide by the tenets of Scalability, Performance, Accessibility and Reliability. When talking about the future of object storage, Reliability also has to include resiliency: not only ensuring that a system will rarely break, but that it will keep working even when something is broken. As I ponder “What’s next for Object Storage?”, I come to the conclusion that the focus has to be on the Application. Why? Simply because an object store should help the application it serves to be efficient too, to perform at scale and to be resilient. After all, what’s the point of putting an efficient, reliable, high-performance set of tires on a junker of a car? I will postulate that one technology answer to the question above is Embedded Metadata Management, and I believe it will have a definite positive effect on applications.
Embedded Metadata Management
Let me explain: one of the benefits of an object store (a well-designed one, of course!) is that it keeps an information object and all of its metadata as a cohesive entity with strong referential integrity; the store’s persistence layer does not “break apart” metadata from payload to physically store it on another type of media.
Take the example of a Lawful Interception use case, where the payload of the object is a voice recording (aka the .wav file) and the metadata elements are the fields of the call data record, such as incoming number and length of call, and perhaps also a list of keywords identified by a Speech-To-Text processing phase, or even a “sentiment indicator” obtained through sophisticated analysis. These fields are of the utmost importance to the law enforcement analyst in search of a “Person Of Interest”; in fact, they may be even more important than the recording itself.
In a traditional, legacy application architecture, the voice file is given a name and stored as a “flat file” in a traditional file system; the metadata is stored in a DB (sometimes complemented by an inverted index); and a “reference link” is added to the indexes to point to the flat file name. In most cases, different physical stores are used for payload and metadata, and many operational shenanigans are put in place to attempt to ensure the referential integrity and resiliency of each “reference link” to each voice file. In many cases, these don’t work, and one is left instead with plenty of voice files on a file system and a corrupted DB that may have “forgotten” all of the reference links. Add to that the poor scalability of file systems, and the overall performance and reliability of the system will be very poor.
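To make this failure mode concrete, here is a minimal sketch of the legacy layout, assuming a directory of .wav files plus a SQLite table whose `file_name` column plays the role of the “reference link” (the table, column and file names are illustrative only). The check flags both directions of broken referential integrity: DB rows pointing at files that no longer exist, and files that no row points at.

```python
import sqlite3
import tempfile
from pathlib import Path


def find_broken_links(db: sqlite3.Connection, root: Path):
    """Return (dangling, orphans): rows whose file is gone, and files no row references."""
    linked = {name for (name,) in db.execute("SELECT file_name FROM calls")}
    on_disk = {p.name for p in root.glob("*.wav")}
    return sorted(linked - on_disk), sorted(on_disk - linked)


with tempfile.TemporaryDirectory() as tmp:
    root = Path(tmp)
    db = sqlite3.connect(":memory:")
    # Metadata (call data record fields) plus the "reference link" live in the DB...
    db.execute("CREATE TABLE calls (file_name TEXT, incoming_number TEXT, length_s INTEGER)")
    for i in range(3):
        # ...while the payload lives as a flat file on the file system.
        (root / f"call_{i}.wav").write_bytes(b"RIFF")
        db.execute("INSERT INTO calls VALUES (?, ?, ?)", (f"call_{i}.wav", f"555-010{i}", 60 + i))
    # Simulate the two stores drifting apart: a DB row is lost and a file is deleted.
    db.execute("DELETE FROM calls WHERE file_name = 'call_1.wav'")
    (root / "call_2.wav").unlink()
    dangling, orphans = find_broken_links(db, root)
    db.close()

print(dangling, orphans)  # ['call_2.wav'] ['call_1.wav']
```

Nothing in this layout prevents the drift; the integrity check has to be bolted on and run separately, which is exactly the kind of operational shenanigan described above.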
Object Storage for Embedded Metadata Management
An object store (one with high reliability, measured across dimensions including security, availability, resilient infrastructure and good split-brain management) will prevent this situation, as the voice recording and its metadata will be persisted as one homogeneous entity in a given physical store. Scalability-wise, thanks to its flat namespace, the object store will not be burdened by the limits seen in traditional file systems. Sounds pretty cool and attractive, right?
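As an illustration of the “one entity” idea, the following is a toy in-memory store, not the WOS API: `ToyObjectStore`, `put` and `get` are hypothetical names. Each `put` persists payload and metadata together under a single object ID in a flat namespace (no directories), so there is no separate reference link to lose.

```python
import uuid
from dataclasses import dataclass, field


@dataclass(frozen=True)
class StoredObject:
    """Payload and metadata persisted together as one cohesive entity."""
    payload: bytes
    metadata: dict = field(default_factory=dict)


class ToyObjectStore:
    """Hypothetical in-memory object store with a flat namespace of object IDs."""

    def __init__(self):
        self._objects = {}

    def put(self, payload: bytes, metadata: dict) -> str:
        oid = uuid.uuid4().hex  # flat namespace: just IDs, no directory tree
        # Copy the metadata so later caller-side edits cannot drift from the stored copy.
        self._objects[oid] = StoredObject(payload, dict(metadata))
        return oid

    def get(self, oid: str) -> StoredObject:
        return self._objects[oid]


record = {"incoming_number": "+1-555-0100", "call_length_s": 142, "sentiment": "neutral"}
store = ToyObjectStore()
oid = store.put(b"RIFF....WAVE", record)
record["sentiment"] = "edited later"  # does not touch the stored copy
obj = store.get(oid)
print(obj.metadata["sentiment"])  # neutral
```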
Unfortunately, this only solves a small part of the problem. We really just replaced the “reference link”, from a file name pointing to a voice file, with an object ID pointing to the voice file and its metadata. This mainly helps quick recovery in case of DB failure, and it makes the voice recording store more scalable. In order to search, query and retrieve the voice recordings pertinent to a particular case, we still need to have a copy of the metadata in a DB (with associated indexing infrastructure). This brings about two very important challenges:
- Consistency issues, whenever the application makes an error, between the metadata copy in the DB and the one stored in the object store
- And, even more importantly, very costly and complex scalability issues for the DB. We are at the dawn of #bigdata, and the number of information objects we will need to manage in the future is no longer in the hundreds of millions but in the hundreds of billions, so the DB has to scale to those numbers!
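The first challenge can be sketched as a reconciliation pass between the metadata held in the object store and the DB copy. This is a hypothetical helper over plain dictionaries (object ID mapped to a metadata dict), not a feature of any product; note that it must walk every object on both sides, so its cost grows with the collection, which ties straight back to the second challenge.

```python
def find_inconsistent(store_meta: dict, db_copy: dict) -> dict:
    """Compare the metadata held in the object store with the copy kept in the DB.

    Both arguments map object ID -> metadata dict. The check walks every object
    on both sides, so its cost grows with the size of the collection.
    """
    return {
        "missing_in_db": sorted(oid for oid in store_meta if oid not in db_copy),
        "missing_in_store": sorted(oid for oid in db_copy if oid not in store_meta),
        "mismatched": sorted(
            oid for oid in store_meta
            if oid in db_copy and db_copy[oid] != store_meta[oid]
        ),
    }


store_meta = {
    "rec-001": {"sentiment": "happy"},
    "rec-002": {"sentiment": "angry"},
    "rec-003": {"sentiment": "neutral"},
}
db_copy = {
    "rec-001": {"sentiment": "happy"},
    "rec-002": {"sentiment": "neutral"},  # an application bug updated only the DB copy
    "rec-004": {"sentiment": "sad"},      # row left behind after the object was deleted
}
report = find_inconsistent(store_meta, db_copy)
print(report)
# {'missing_in_db': ['rec-003'], 'missing_in_store': ['rec-004'], 'mismatched': ['rec-002']}
```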
It is this particular metadata management scalability dragon that I believe Object Storage can help slay with Embedded Metadata Management. Most truly scalable, enterprise-class Object Stores are implemented as a collection of intelligent, self-contained nodes (as opposed to a pile of JBODs or a massive NAS). With such an architecture, it is possible to address the metadata management scalability issue by taking a divide-and-conquer approach. In this approach, each node is in charge of indexing and searching the metadata of the subset of information objects it holds, not the total number of objects in the store. Combined with a distributed query, and leveraging the inherent resiliency of the object store, this allows for a more scalable and higher-performing application, a lower total cost of operations and a more resilient infrastructure.
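The divide-and-conquer idea can be sketched as a scatter-gather query. In this hypothetical model (not how WOS is actually implemented), each `StorageNode` keeps an inverted index over only its own shard, objects are placed on a node by hashing their ID with CRC32 as a stand-in for real placement, and a coordinator fans the query out to all nodes in parallel and merges the partial results. Metadata values are assumed to be hashable scalars.

```python
import zlib
from collections import defaultdict
from concurrent.futures import ThreadPoolExecutor


class StorageNode:
    """Hypothetical self-contained node: holds a shard of objects and indexes only that shard."""

    def __init__(self, name):
        self.name = name
        self.objects = {}              # oid -> (payload, metadata), local shard only
        self.index = defaultdict(set)  # (field, value) -> local oids with that value

    def put(self, oid, payload, metadata):
        self.objects[oid] = (payload, metadata)
        for key, value in metadata.items():
            self.index[(key, value)].add(oid)

    def search(self, criteria):
        """Intersect the local posting lists; never looks at other nodes' objects."""
        postings = [self.index.get((key, value), set()) for key, value in criteria.items()]
        return set.intersection(*postings) if postings else set(self.objects)


def place(nodes, oid, payload, metadata):
    """Route each object to one node by hashing its ID (a stand-in for real placement)."""
    nodes[zlib.crc32(oid.encode()) % len(nodes)].put(oid, payload, metadata)


def distributed_search(nodes, criteria):
    """Scatter the query to every node in parallel, then gather and merge the partial hits."""
    with ThreadPoolExecutor(max_workers=len(nodes)) as pool:
        partial_hits = list(pool.map(lambda node: node.search(criteria), nodes))
    return sorted(set().union(*partial_hits))


nodes = [StorageNode(f"node{i}") for i in range(4)]
place(nodes, "rec-001", b"...", {"case_id": "C-17", "sentiment": "negative"})
place(nodes, "rec-002", b"...", {"case_id": "C-17", "sentiment": "neutral"})
place(nodes, "rec-003", b"...", {"case_id": "C-42", "sentiment": "negative"})
place(nodes, "rec-004", b"...", {"case_id": "C-17", "sentiment": "negative"})
hits = distributed_search(nodes, {"case_id": "C-17", "sentiment": "negative"})
print(hits)  # ['rec-001', 'rec-004']
```

Each node's index only grows with its own shard, so adding nodes adds indexing capacity alongside storage capacity, instead of funneling every object through one central DB.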
DDN is delivering Embedded Metadata Management with the release of our new WOS 360. This is an initial release: the number of metadata fields being actively managed (as opposed to just persisted) is modest, but the foundation to extend this capability to all metadata fields is there.
To stay with the example above, one can store voice files in WOS 360, each with a metadata field for “speaker accent”, one for “mood sentiment” and one for “time of day of recording”. WOS 360 will automagically manage that metadata and allow an application to issue a request such as “retrieve all recordings of speakers with a French accent whose mood is happy and that were recorded in the morning”. This would retrieve the pertinent voice files across the WOS 360 cloud without the use of an external DB to duplicate and index that metadata.
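Here is one way such a request could be expressed and evaluated against each object's metadata. The query format, the operator names and the field names (`speaker_accent`, `mood_sentiment`, `recording_hour`) are all hypothetical; WOS 360’s actual query interface is not described here. We also assume “time of day” is stored as an hour and that “morning” means 06:00 to 11:59.

```python
import operator

# Hypothetical query operators; not WOS 360's actual query language.
OPS = {"eq": operator.eq, "lt": operator.lt, "ge": operator.ge}


def matches(metadata: dict, query: list) -> bool:
    """True if every (field, op, value) predicate holds for this object's metadata."""
    return all(
        name in metadata and OPS[op](metadata[name], value)
        for name, op, value in query
    )


# "French accent, happy mood, recorded in the morning" (morning assumed to be 06:00-11:59).
query = [
    ("speaker_accent", "eq", "french"),
    ("mood_sentiment", "eq", "happy"),
    ("recording_hour", "ge", 6),
    ("recording_hour", "lt", 12),
]

recordings = {
    "rec-001": {"speaker_accent": "french", "mood_sentiment": "happy", "recording_hour": 9},
    "rec-002": {"speaker_accent": "french", "mood_sentiment": "happy", "recording_hour": 20},
    "rec-003": {"speaker_accent": "german", "mood_sentiment": "happy", "recording_hour": 8},
    "rec-004": {"speaker_accent": "french", "mood_sentiment": "angry", "recording_hour": 10},
}

hits = sorted(oid for oid, meta in recordings.items() if matches(meta, query))
print(hits)  # ['rec-001']
```

In an embedded design, each node would run a filter like `matches` (or an index lookup) over its own objects only, and the partial results would be merged across the grid.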
As we evolve WOS 360 in the coming months, you will see, dear reader, an increasing number of capabilities focused on application acceleration and efficiency. We will, of course, continue to refine the base store function, but you can expect to see more and more capabilities from higher up the application stack move inside the nodes and leverage our highly scalable peer-to-peer grid architecture.
If you’re interested in learning more about our next generation WOS, please join us for a webinar on March 11 where we will discuss just how we’ve built WOS to become the market’s first true object storage platform, designed to help companies tune their systems to meet the five key requirements of scalability, availability, reliability, efficiency and performance.