1. Technology
Send to a Friend via Email

Your suggestion is on its way!

An email with a link to:

http://ruby.about.com/od/nosqldatabases/a/nosql1.htm

was emailed to:

Thanks for sharing About.com with others!

You can opt-out at any time. Please refer to our privacy policy for contact information.

NoSQL and Document Oriented Databases

By

Chances are, you've spent your entire database life with relational databases. Data gets organized into columns, and stored in rows. You manipulate data with SQL (Structured Query Language) and there's really no other way to do it. There are competing SQL servers out there (MySQL, SQLite, etc), but they all more or less operate in the same way. But there's much more out there than SQL databases, especially with all the new software in the past few years.

NoSQL is not a specific piece of software, but rather a philosophy. NoSQL is just the concept of using databases other than SQL databases to store your data, and more specifically, databases other than relational databases. In general, NoSQL refers to document-oriented databases. But, you may ask, why? SQL is a good tool for the job!

Horizontally Scalable

The problem is that SQL doesn't scale well. In particular, it doesn't scale horizontally. If your SQL performance is poor, you can't just add more SQL servers to make it faster. In general, you need rather large computers to handle large databases, which means some very expensive hardware. In addition, since you need large computers, this doesn't fit well with the cloud model.

Document-oriented databases (such as CouchDB and MongoDB) are designed for horizontal scalability. This means as your database grows, you can simply add more commodity hardware, or more resources from the cloud. But how does it achieve this?

These types of databases operate on something similar to distributed hash tables (DHTs). DHTs store a key/value pair in hash buckets. These buckets hold a number of key/value pairs indexed by "hash value." This hash value is a number generated from the data in such a way that all key/value pairs are distributed evenly among the hash buckets. For example, if the DHT has 5 hash buckets and 50 key/value pairs are stored, each hash bucket should have about 10 key/value pairs.

One of the advantages here is that this is extremely easy to parallelize. Want more database servers? Just add more hash buckets. As your database grows, you just add more servers, and none of them need to be super-computers either. This is what it means to be "horizontally scalable."

Schema-less

Another defining feature of document-oriented databases is that they're schemaless. This is a hard pill to swallow if you've been using relational databases for a long period of time. Instead of each record existing in a row of carefully designed columns, each record exists in a document. Think of it as a file on the filesystem. This document can store any data it wants, it doesn't have to follow a schema.

Though, while these documents are schemaless, they're not freeform. Many databases opt to use the JSON format, which helps you store key/value pairs in a formatted way. A document can have any number of key/value pairs. Instead of using a schema, documents of the same time (for example, documents representing blog posts) all have a similar set of key/value pairs.

One example of this is compactness. Since all documents of the same type don't have to have the same set of key/value pairs, you can save space by leaving some of them off. So, if not all blog posts have associated links, you can simply leave that key/value pair out. Not just leave it empty, you can leave it out entirely.

But that has further implications. Not only does it save a bit of space, it makes adding features to a database relatively free. Doing an ALTER statement on a large SQL database can take hours of crunching. If it goes wrong, you'll have to restore a backup of the database, figure out what went wrong and try again. With document-oriented databases, you simply start adding new key/value pairs to your documents, it's as easy as that.

Cloud Model

The trend in web applications (and many other fields) is toward that of cloud computing. If you're not familiar with cloud computing, imagine a huge server farm. Your web application is on one of these servers, shared with many other applications. But, someone posts a link to your application on a popular website and you're suddenly inundated with traffic. On a traditional hosting platform, you'll reach the limitations of your virtual server and hit a brick wall. On a cloud, more servers can be dynamically allocated to deal with this traffic. Once the traffic is over with, the space on those servers return to the cloud. Nothing broke, and your web application didn't even slow down.

One of the problems with SQL servers is they don't work well in a cloud. As databases grow and as traffic increases, larger and faster computers are required. Load balancing can be achieved by mirroring the servers, but they still need to be large and fast. This just doesn't fit with the cloud model. NoSQL servers, on the other hand, can simply add more nodes.

  1. About.com
  2. Technology
  3. Ruby
  4. Tutorials
  5. NoSQL Databases
  6. NoSQL and Document Oriented Databases

©2014 About.com. All rights reserved.