SQL vs. NoSQL | Linux Journal
So, what is the big deal about NoSQL databases? For one, they’ve
introduced new ways (or perhaps re-introduced old ways) of thinking about
what databases are and what they
can do. For another, they’re shiny and new, and all the cool kids seem to
be using them. You could argue that Google’s BigTable is the database
that inspired the NoSQL movement. Or, maybe it was Amazon’s Dynamo. Both of
them are closed source, but they were (or are) impressive enough to
inspire open-source interpretations.
The current NoSQL field includes HBase, Cassandra, Redis, MongoDB,
Voldemort, CouchDB, Dynomite, Hypertable and several others. Some
have followed the model of BigTable, others follow Dynamo’s model, some are a
mix of the two, and others are charting their own path. Some of these
projects are more mature than others, but each of them is trying to
solve similar problems.
Instead of having tables with columns and rows like you would find
in a traditional RDBMS, most NoSQL databases are simple “key-value stores”. Each piece of data that goes into the database is given
a key, and when you want the data back, you use the key to get it. This
simplicity is beneficial, because it helps busy sites achieve extremely
low latency, even under high load, when paired with a large number of
servers and a fast network.
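To make the key-value idea concrete, here is a minimal sketch in Python. The class name, method names and key formats are made up for illustration and do not correspond to any particular NoSQL product. A real store adds persistence, networking and replication on top of the same basic put/get contract.

```python
class KeyValueStore:
    """Toy in-memory key-value store (illustrative only)."""

    def __init__(self):
        # A plain dict stands in for the storage engine.
        self._data = {}

    def put(self, key, value):
        # The store does not interpret the value; it just files it under the key.
        self._data[key] = value

    def get(self, key, default=None):
        # Retrieval is a single lookup by key, with no query planning or joins.
        return self._data.get(key, default)

    def delete(self, key):
        # Deleting a key that does not exist is treated as a no-op.
        self._data.pop(key, None)


kv = KeyValueStore()
kv.put("user:42:name", "Alice")
kv.put("session:abc123", b"\x00\x01opaque-bytes")

name = kv.get("user:42:name")      # the stored value comes back unchanged
missing = kv.get("user:99:name")   # unknown keys return the default (None)
```

Because every operation touches exactly one key, there is very little work for the database to do per request, which is where the low latency comes from.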
A step beyond simply having keys and values are the so-called document
databases. A document, in this case, is a collection of various fields
of information. Each individual document can have a different number
of fields of varying lengths. These databases are useful if you have a
lot of semi-structured data, and they are a good fit for object-oriented
programming models (for example, you can consider the database as a
storage area for objects).
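As a rough illustration of the document model, the following Python sketch treats each document as a dictionary with its own set of fields. The field names and the find helper are assumptions invented for this example, not the API of CouchDB, MongoDB or any other real document database.

```python
# Three documents in the same collection, each with a different set of fields.
documents = [
    {"_id": 1, "type": "post", "title": "SQL vs. NoSQL", "tags": ["databases", "nosql"]},
    {"_id": 2, "type": "comment", "post_id": 1, "author": "reader", "body": "Nice overview."},
    {"_id": 3, "type": "post", "title": "Untitled draft"},
]


def find(docs, **criteria):
    """Return the documents that contain every given field with the given value."""
    return [
        doc for doc in docs
        if all(field in doc and doc[field] == value for field, value in criteria.items())
    ]


# Documents that lack a field are simply skipped; no schema change is needed.
posts = find(documents, type="post")
by_reader = find(documents, author="reader")
tagged = [doc for doc in documents if "nosql" in doc.get("tags", [])]
```

Note that the third document has no tags field at all, and nothing had to be declared up front to allow that.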
Why do traditional database users dislike these
newcomers? D. Richard
Hipp, the creator of SQLite, in a talk given at my local LUG, derisively
called NoSQL databases “post-modern databases”, because instead of giving
you a definite answer to your question, they give you “an
opinion” or their
“best guess”. His purpose was to paint NoSQL databases in a bad light,
and for most of the old-school database world, the NoSQL, non-relational,
BASE model (see the What Is ACID? sidebar) is more than a bit heretical.
The heresy comes because historically, databases almost always
have tried to implement the relational model and be fully ACID-compliant.
If your transactions weren’t ACID, or your database wasn’t relational, the
argument went, you couldn’t call yourself a “real” database. Look
at the MySQL vs. PostgreSQL flame wars for ample evidence of this thinking.
The problem, though, is that being relational and ACID is not necessary
for some use cases and can add unnecessary overhead, which you don’t
want if you are running a popular, heavily trafficked Web site. Many
early users of MySQL knew this and were mocked for choosing MySQL over
“real” databases like PostgreSQL. It is ironic that, now that MySQL has
gained what every “expert” said it should have (ACID transactions),
a new movement has started up claiming that all the old database technology
isn’t actually necessary.
What is necessary for top-tier Web sites, according to proponents of NoSQL,
is massive scalability, low latency, the ability to grow the capacity
of your database on demand and an easier programming model. These,
and others, are things that, according to NoSQL proponents, SQL RDBMSes
just don’t provide in a cost-effective manner.
Most classic RDBMSes initially were designed to run on a single large
server. That is how it was done in the late 1970s and early
1980s, and the idea exists in the design of many RDBMSes to this day.
The P in CAP (see the What Is CAP? sidebar) is meaningless when the database is on a single server
(the server is either up or down, rarely or never only partly up),
and traditional RDBMSes have focused mainly on Consistency, aka ACID,
with Availability thrown in if you mirror between database servers or
use hardware with no single points of failure.
Some NoSQL databases also focus on the C and A parts of CAP. Unlike
traditional RDBMSes though, these databases are designed from the ground
up to be run on dozens, hundreds or even thousands of nodes in a single
data center. Partial partition tolerance for these databases is obtained by
mirroring database clusters between multiple data centers. The advantage
these databases have over a traditional RDBMS is that with the work
spread over all of those machines, you can achieve ultra-low latency
even when there are extremely high numbers of reads and writes, and
with all those machines, you can analyze massive amounts of data quickly.
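One common way to spread work over many machines is to hash each key and use the hash to pick a node. The article does not say which scheme any given database uses, so the sketch below is only an assumption for illustration. The Cluster class and the node names are hypothetical, and production systems usually prefer consistent hashing so that adding a node does not reshuffle most keys.

```python
import hashlib


class Cluster:
    """Toy cluster that assigns each key to one node by hashing the key."""

    def __init__(self, node_names):
        self._names = sorted(node_names)
        # Each node is modeled as its own independent dict.
        self.nodes = {name: {} for name in self._names}

    def node_for(self, key):
        # A stable hash guarantees the same key always maps to the same node.
        digest = hashlib.sha256(key.encode("utf-8")).digest()
        index = int.from_bytes(digest[:8], "big") % len(self._names)
        return self._names[index]

    def put(self, key, value):
        self.nodes[self.node_for(key)][key] = value

    def get(self, key):
        return self.nodes[self.node_for(key)].get(key)


cluster = Cluster(["node-a", "node-b", "node-c"])
for i in range(1000):
    cluster.put(f"user:{i}", {"id": i})

# Each node ends up holding only a share of the keys, so reads and writes
# for different keys can be served by different machines in parallel.
sizes = {name: len(data) for name, data in cluster.nodes.items()}
```

With the simple modulo scheme used here, changing the number of nodes changes where most keys live, which is one reason real systems use something more elaborate.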
Other NoSQL databases focus on the A and P parts of CAP and are
designed to span multiple data centers. True to CAP, these databases
cannot also guarantee strong consistency, and giving it up is an especially
heretical thought to the RDBMS old guard. Instead, these NoSQL databases
implement eventual consistency, whereby any changes are
replicated to the entire database eventually, but at any given time, a
single node or group of nodes may not have the latest data. Like the NoSQL
databases that focus on C and A, A and P databases focus on low latency,
high throughput and anything else that makes the Web site more responsive
and a richer experience for users.
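Eventual consistency is easier to see in a small simulation. The Python sketch below is a deliberately simplified model: the class names, the data center names and the single global version counter are all assumptions made for illustration. Real systems have no shared counter and rely on timestamps, vector clocks or similar mechanisms to order writes.

```python
class Replica:
    """One copy of the data, for example in one data center."""

    def __init__(self, name):
        self.name = name
        self.data = {}  # key -> (version, value)

    def read(self, key):
        entry = self.data.get(key)
        return entry[1] if entry else None

    def apply(self, key, version, value):
        # Keep only the newest version seen so far for each key.
        current = self.data.get(key)
        if current is None or version > current[0]:
            self.data[key] = (version, value)


class EventuallyConsistentStore:
    """Writes land on one replica at once and reach the others later."""

    def __init__(self, replica_names):
        self.replicas = [Replica(name) for name in replica_names]
        self.pending = []   # queued updates: (replica, key, version, value)
        self._version = 0   # simplification: one global counter orders writes

    def write(self, key, value, via=0):
        self._version += 1
        self.replicas[via].apply(key, self._version, value)
        for index, replica in enumerate(self.replicas):
            if index != via:
                self.pending.append((replica, key, self._version, value))

    def replicate(self):
        # Deliver every queued update; in real life this happens in the background.
        while self.pending:
            replica, key, version, value = self.pending.pop(0)
            replica.apply(key, version, value)


ec_store = EventuallyConsistentStore(["dc-east", "dc-west"])
ec_store.write("cart:42", ["book"], via=0)

fresh = ec_store.replicas[0].read("cart:42")      # the written replica has the new value
stale = ec_store.replicas[1].read("cart:42")      # the other replica has not seen it yet
ec_store.replicate()
converged = ec_store.replicas[1].read("cart:42")  # after replication, both agree
```

The window between the write and the call to replicate is exactly the period in which a node "may not have the latest data".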
In addition to sometimes abandoning consistency in favor of scalability
and latency, another way NoSQL databases break with tradition is in their
abandonment of the relational model. To be fair, some data truly does not
naturally fit the relational model. This could be because the data changes
form or size often, or because the data is completely unstructured.
The final break with tradition in NoSQL databases is the thing that gave
them their name. They don’t use SQL. The reasons for dropping SQL usually
revolve around it not fitting in with modern object-oriented development
processes or some perceived difficulty in working with SQL. Sometimes
the excuse given for not using SQL is a simple “SQL sucks”, which isn’t
really a reason. Stupid reasons aside, the SQL language was designed
for use with relational databases, and NoSQL databases are mostly
non-relational, so it makes sense that they don’t use it.