Jan Brass Takes a Hard Look at the Cloud

Managing A Distributed Database in the Cloud

Database theory has undergone a dramatic transformation in the past few years. For decades, the gold standard was the relational model proposed in 1970 by E. F. Codd and implemented by relational database management systems (RDBMS). It provides a mathematically grounded model for the storage of structured data that is scalable, reliable, and can be queried in a formal language, SQL. It works brilliantly. It is no exaggeration to say that the RDBMS is the backbone of a staggering number of applications worldwide, many of them high-performance and mission-critical.

The growth of the Internet has placed new demands on data storage systems that traditional relational databases struggle to cope with. To start, the sheer volume of data is immense. Terabytes are old hat. Facebook, for example, needs the capacity to store 180 PB of data every year! While traditional relational database management systems are fully capable of storing large quantities of data, this kind of volume is in another league altogether. So-called “big data” presents a unique set of challenges. Availability, consistency, and durability become almost impossible to manage with the old paradigms.

This has led companies to develop alternative technologies. A new class of database systems called “NoSQL” has emerged that ditches the traditional relational model. Data may still be organized into table-like collections, but without the expected normalization. Instead, the primary storage is “key-value” pairs, similar to a hash map. The rate of transactions is also mind-boggling, which requires large portions of the database to be held “in memory” for quick reads and writes rather than stored on disk. Again, old-fashioned databases are not built with this kind of performance in mind. NoSQL databases also allow for a great deal of optimization. Horizontal partitioning, or “sharding”, for example, stores logically segregated data sets separately, so that each partition can carry its own index and many queries need to run against only one partition rather than the entire data set.
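To make the key-value and sharding ideas concrete, here is a minimal sketch in Python. It is a toy, not how Redis, Memcached, or any real product partitions data: the class name `ShardedKeyValueStore`, its methods, and the hash-modulo placement rule are all assumptions made for illustration, and each “shard” is just an in-memory dictionary standing in for a separate server.

```python
import hashlib


class ShardedKeyValueStore:
    """Toy in-memory key-value store split across several shards (illustrative only)."""

    def __init__(self, shard_count: int) -> None:
        if shard_count < 1:
            raise ValueError("shard_count must be at least 1")
        # Each shard is a plain dict standing in for a separate server.
        self.shards = [{} for _ in range(shard_count)]

    def shard_index(self, key: str) -> int:
        # A stable hash keeps a key on the same shard across runs
        # (Python's built-in hash() is randomized per process for strings).
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        return int.from_bytes(digest[:8], "big") % len(self.shards)

    def put(self, key: str, value: object) -> None:
        self.shards[self.shard_index(key)][key] = value

    def get(self, key: str, default: object = None) -> object:
        # Only the one shard that owns the key is consulted.
        return self.shards[self.shard_index(key)].get(key, default)


store = ShardedKeyValueStore(shard_count=4)
store.put("user:1001", {"name": "Ada"})
store.put("user:1002", {"name": "Grace"})
print(store.get("user:1001"))
print([len(shard) for shard in store.shards])
```

The point of the sketch is that a lookup touches a single shard rather than the whole data set. One known weakness of plain hash-modulo placement is that changing the number of shards moves most keys, which is why production systems tend to use more elaborate schemes.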

Two of the most popular NoSQL data stores are Redis and Memcached. Large Internet companies such as Facebook, Twitter, and Instagram are built upon these platforms. They are ideally suited for distributed cloud applications. Unfortunately, setting up, configuring, and maintaining them is no trivial task and is mostly well beyond the capabilities of small-time developers. This has led to the growth of a company called Garantia Data, which provides developers with abstracted Redis and Memcached NoSQL databases that run in a variety of public cloud offerings such as Amazon’s AWS and Windows Azure.

The idea is to place the power of distributed and highly available NoSQL databases at the disposal of small-time cloud application developers with no capital investment and a “pay-as-you-go” pricing model. All the heavy lifting, including complex functionality like sharding, is done by a third-party provider.

Truth be told, not many enterprises today require the full-blown performance of a NoSQL database. For most, the traditional relational database management system, or RDBMS, will be more than sufficient. But for public cloud applications that have the potential to attract millions of users, there really is no substitute for a powerful NoSQL database. And if your organization is maintaining or planning to develop such an application, Garantia Data is willing to do the management for you.

About Jan Brass
