Sunday, 20th April, 2008

Amazon to offer persistent storage for EC2

Category: Storage , Compute Cloud , Amazon

There has been an ongoing request for persistent storage for Amazon's EC2 (Elastic Compute Cloud). The good news is that Amazon is working on a major upcoming feature: persistent storage for EC2.
The new feature will provide reliable, persistent storage volumes for all EC2 instances. A persistent storage volume will act as a raw, unformatted hard drive, which can be formatted and configured based on the needs of the application.
Using Amazon EC2 persistent storage, customers will be able to create volumes ranging from 1GB to 1TB, and will be able to attach multiple volumes to a single instance. Isn't that cool? It is a much-awaited feature for every EC2 customer.
Like EC2, persistent storage will also come with its own set of APIs to create and delete volumes and to create and delete snapshots. Yes, EC2 persistent storage will enable you to automatically create snapshots of your volumes.
If you are interested in participating in EC2 persistent storage, sign up here
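
Since the real API is not public yet, here is only a rough in-memory sketch of the operations described above: creating and deleting volumes within the 1GB to 1TB range, attaching several volumes to one instance, and taking snapshots. Every name in it (VolumeStore, create_volume, the "vol-" and "snap-" identifiers, the "i-demo" instance) is hypothetical and is not Amazon's API. It also assumes 1TB = 1024GB.

```python
from dataclasses import dataclass, field
from typing import Dict, List, Optional

MIN_GB = 1
MAX_GB = 1024  # assumption: 1TB counted as 1024GB


@dataclass
class Volume:
    volume_id: str
    size_gb: int
    attached_to: Optional[str] = None  # instance id, or None if detached
    snapshots: List[str] = field(default_factory=list)


class VolumeStore:
    """Hypothetical in-memory stand-in, NOT the real EC2 API."""

    def __init__(self) -> None:
        self._volumes: Dict[str, Volume] = {}
        self._next_id = 1

    def create_volume(self, size_gb: int) -> Volume:
        # Enforce the size range mentioned in the announcement.
        if not MIN_GB <= size_gb <= MAX_GB:
            raise ValueError(f"size must be {MIN_GB}..{MAX_GB} GB")
        vol = Volume(f"vol-{self._next_id}", size_gb)
        self._next_id += 1
        self._volumes[vol.volume_id] = vol
        return vol

    def delete_volume(self, volume_id: str) -> None:
        del self._volumes[volume_id]

    def attach(self, volume_id: str, instance_id: str) -> None:
        self._volumes[volume_id].attached_to = instance_id

    def volumes_of(self, instance_id: str) -> List[str]:
        # An instance may have any number of volumes attached.
        return [v.volume_id for v in self._volumes.values()
                if v.attached_to == instance_id]

    def create_snapshot(self, volume_id: str) -> str:
        vol = self._volumes[volume_id]
        snap_id = f"snap-{volume_id}-{len(vol.snapshots) + 1}"
        vol.snapshots.append(snap_id)
        return snap_id


store = VolumeStore()
small = store.create_volume(1)
large = store.create_volume(1024)
store.attach(small.volume_id, "i-demo")
store.attach(large.volume_id, "i-demo")
print(store.volumes_of("i-demo"))
print(store.create_snapshot(small.volume_id))
```

The real service will of course differ; the sketch only makes the shape of the feature (volumes, attachments, snapshots) easier to picture.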

Posted by Amaltas Bohra at 8:38 p.m.

Wednesday, 23rd January, 2008

Beyond MySQL, a paradigm shift from RDBMS

Category: Database , Storage

I think the future of storage for web applications is beyond traditional RDBMS. Companies like Amazon, Google & Yahoo are providing tools and APIs to developers to utilize their infrastructure and store data in their clouds. Let's think beyond RDBMS and explore other ways of data storage.

We have been using MySQL and other relational databases for years, and will continue to do so. There are myriad books on web development using MySQL/PostgreSQL. In fact, almost every web framework is built to work with MySQL, PostgreSQL or SQLite.

Every application requires some sort of data storage, and it's very important to think in terms of scalability, replication and clustering before you decide which data storage engine will be the best fit for your project.

I have been exploring various storage engines and have used a few of them based on the needs of my projects. In this post I will list the data storage options that I have used, apart from traditional RDBMSs such as MySQL, PostgreSQL, MS SQL Server, etc.

Every project has different storage needs. You might want to store data in XML format and query it using an XML query language like XQuery or XPath. You might wish to embed the database within your application, or maybe you just need to store key-value pairs, with read-only access and no frequent updates and inserts.
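
To give a feel for querying XML with path expressions, here is a small sketch using Python's standard xml.etree.ElementTree module. It is not an XML database, and it supports only a limited subset of XPath (no XQuery), so treat it as an illustration of the idea. The catalog data is made up for the example.

```python
import xml.etree.ElementTree as ET

# Hypothetical sample document, invented for this example.
CATALOG = """
<catalog>
  <book id="b1" year="2006"><title>Learning XML</title><price>29.99</price></book>
  <book id="b2" year="2008"><title>XQuery Basics</title><price>34.50</price></book>
  <book id="b3" year="2008"><title>Data Modeling</title><price>19.00</price></book>
</catalog>
"""

root = ET.fromstring(CATALOG)

# Path expression: every <title> directly under a <book>.
titles = [t.text for t in root.findall("./book/title")]

# Attribute predicate: books whose year attribute equals 2008.
recent_ids = [b.get("id") for b in root.findall("./book[@year='2008']")]

# ElementTree's XPath subset has no numeric comparisons,
# so the price filter is done in plain Python.
cheap = [b.findtext("title") for b in root.findall("./book")
         if float(b.findtext("price")) < 30]

print(titles)
print(recent_ids)
print(cheap)
```

A real XML database would let you push the price filter into the query itself with XQuery; here it has to be written in application code.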

So let's explore different storage mediums available for your data storage needs.

Different storage mediums

  • XML Storage: If your data is mainly in XML format, there are various XML databases available (both open source and commercial). Almost all XML databases comply with the XQuery and XPath specifications. Think of XQuery as being to an XML database what SQL is to an RDBMS. The benefit of using an XML database is that you don't have to map your XML data onto some other data structure; you simply store XML data and retrieve XML data back.
    Some of the popular XML databases include:
    • eXist: eXist-db is an open source database management system entirely built on XML technology. It stores XML data according to the XML data model and features efficient, index-based XQuery processing. eXist doesn't have transaction support.
    • Berkeley DBXML: Berkeley DBXML is an open source embeddable XML database, with XQuery-based access to documents stored in containers. The main advantage of using Berkeley DBXML is that it is an embedded database, which means that it runs in the same process as the application. As Berkeley DBXML is built on top of Berkeley DB, it inherits full ACID transactions. Berkeley DBXML comes with libraries in Java, C++, Python, Perl, PHP and Tcl.
  • Embedded database: By embedded database, we mean a database that is part of the application, with no separate database server instance running.
    • Berkeley DB: Berkeley DB is a high performance, scalable and reliable non-relational storage system. According to the Berkeley DB documentation,
      Berkeley DB is a tool for software developers, not for IT professionals or DBAs. It is intended to provide fast, reliable data storage for applications. The only way to use Berkeley DB is to write code. There is no standalone server and no SQL query tool for working with Berkeley DB databases
      Berkeley DB stores data in key-value pairs, so you have to implement complex queries yourself in code. Berkeley DB is ACID compliant and provides libraries in Java, C++ and various scripting languages.
    • Greendb: Greendb is written in C++ and built on top of Berkeley DB. Interfaces currently exist for Perl, Python, and Guile, using SWIG.
  • Text file storage: I have also used plain text files for storing tab-delimited data. Almost all programming languages provide support for text file processing. If you need a more advanced text-based storage mechanism, you can try:
    • txtdb: txtdb provides a simple, portable, transparent, persistent data store for online and offline application developers. All data is stored in text files that can be independently read or written by other programs.
      In txtdb, everything is a string, and data is stored in key-value pairs. Libraries are available in C and Python. txtdb is built on top of Berkeley DB.
  • Data in the cloud: All the data storage software mentioned above has both advantages and disadvantages. A few of them lack transaction support, and a few have scalability issues. Amazon recently launched Amazon SimpleDB. This service provides the ability to store and query data in the cloud, so as your user base and traffic grow, you don't have to worry about scalability. But unlike a relational database, Amazon SimpleDB does not store data in tables or XML; rather, it stores data as items, which have attribute-value pairs. Amazon provides APIs to CREATE, GET, PUT and DELETE items. More details on SimpleDB can be found here.
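
To make the embedded key-value idea from the list above concrete, here is a minimal sketch using Python's standard dbm module as a stand-in. It is not Berkeley DB itself, but it shows the same style of working: the store runs inside the application process, there is no server and no SQL, and a "query" is just code you write over keys and values. The function names, user ids and cities are invented for the example.

```python
import dbm
import os
import tempfile


def save_users(path, users):
    """Store user_id -> city pairs; keys and values are stored as bytes."""
    with dbm.open(path, "c") as db:  # "c": create the file if missing
        for user_id, city in users.items():
            db[user_id.encode("utf-8")] = city.encode("utf-8")


def users_in_city(path, city):
    """A hand-written 'query': scan every key and compare the value."""
    target = city.encode("utf-8")
    with dbm.open(path, "r") as db:  # "r": open read-only
        return sorted(key.decode("utf-8")
                      for key in db.keys() if db[key] == target)


# Demo in a temporary directory so nothing is left behind.
with tempfile.TemporaryDirectory() as tmp:
    db_path = os.path.join(tmp, "users")
    save_users(db_path, {"alice": "Pune", "bob": "Mumbai", "carol": "Pune"})
    print(users_in_city(db_path, "Pune"))
```

The full scan in users_in_city is the price of having no query language: anything beyond a lookup by key has to be implemented, and if needed indexed, by the application.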

Posted by Amaltas Bohra at 10:02 p.m.
