Here comes the anti-database “movement”
July 6, 2009 24 Comments
I’ve been seeing more and more articles like this one from Computerworld about abandoning SQL databases.
The meet-up in San Francisco last month had a whiff of revolution about it, like a latter-day techie version of the American Patriots planning the Boston Tea Party. The inaugural get-together of the burgeoning NoSQL community crammed 150 attendees into a meeting room at CBS Interactive. Like the Patriots, who rebelled against Britain’s heavy taxes, NoSQLers came to share how they had overthrown the tyranny of slow, expensive relational databases in favor of more efficient and cheaper ways of managing data.
NoSQLers? Oh boy, are we going to be in for it when they hear how critical databases are for the geospatial industry. To me this “revolution” sounds more like a backlash against the traditional SQL DBA who doesn’t want to change in the face of “Web 2.0”. Of course, it is very easy to move to a new data storage platform when you either have a ton of money or no product yet. While I do see technology such as Google’s BigTable and Amazon’s SimpleDB as an inevitable course for many web applications, wholesale abandonment of SQL and databases such as Oracle/SQL Server/PostgreSQL is absurd.

No-SQL "Patriots" dump RDBMS without a care to the implications...

What, you don’t want to spend the next year or two redesigning your apps for Hadoop and CouchDB?
Sure, you paying for that?
It’s all about balance.
Sure, there are more efficient role-specific data stores (look for pieces by Stonebraker, Monash, etc. on this), but the cost/benefit balance for the majority of these doesn’t tip unless you’re scaling to levels that most of us couldn’t dream of, or you only use in-house software and can afford to make decisions on efficiency alone.
If you care about being able to access your data from multiple applications without custom development, a SQL-based RDBMS is still generally your best choice.
I think it depends on the aims of your application.
For applications with light data structures where speed is not optional, Tokyo Cabinet, CouchDB, Amazon SimpleDB, et al. are the way to go.
But for applications with very complicated structures, the relational DB will never be abandoned, and SQL will continue to live.
my 2 cents
I don’t think it needs to be an either/or construct. The object store that runs GeoCommons/GeoIQ is not a relational database, but we also connect to SQL databases to interconnect data layers. For some things relational databases don’t make sense, especially when dealing with large amounts of data at Web scale, and I believe we’ll see more hybrids emerging going forward. If you think about all the lightweight geo-enabled data that is going to be created by mobile devices, sensors and web services, you’ll be hard pressed to make that work in a standard relational database. You don’t need to rebuild your apps at all, but can definitely augment them by tying into the services being deployed across the new data platforms.
I feel like I have been hearing about this for a while, from the very sophisticated CouchDB massive-document-store people on the one hand to web framework developers with scorn for SQL on the other. The problem is really not with the NoSQL technical presenters, except insofar as they convince the less sophisticated developers that SQL is really a bore and best ignored or interacted with via some abstraction layer that completely neglects query optimization, proper indexing, thoughtful modeling, etc. If you think SQL is slow, wait for SQL in the hands of someone who is taught to disdain it!
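To make the indexing point concrete, here is a minimal sketch using Python's built-in sqlite3 module. The `parcels` table, its columns, and the index name are made up for illustration, and SQLite merely stands in for whatever RDBMS you actually run. It asks the database how it plans to run the same query before and after an index exists.

```python
import sqlite3

def query_plan(conn, sql, params=()):
    """Return SQLite's plan for a query as one string (last column is the detail text)."""
    rows = conn.execute("EXPLAIN QUERY PLAN " + sql, params).fetchall()
    return " | ".join(row[-1] for row in rows)

conn = sqlite3.connect(":memory:")
# Hypothetical table, for illustration only.
conn.execute("CREATE TABLE parcels (id INTEGER PRIMARY KEY, county TEXT, area REAL)")
conn.executemany(
    "INSERT INTO parcels (county, area) VALUES (?, ?)",
    [("county_%d" % (i % 50), float(i)) for i in range(5000)],
)

sql = "SELECT id, area FROM parcels WHERE county = ?"

# Without an index the only option is to read every row.
plan_before = query_plan(conn, sql, ("county_7",))

# With an index on the filtered column the planner can seek directly.
conn.execute("CREATE INDEX idx_parcels_county ON parcels (county)")
plan_after = query_plan(conn, sql, ("county_7",))

print(plan_before)
print(plan_after)
```

The exact wording of the plan text differs between SQLite versions, but the first plan should describe a full scan and the second should name the index. An abstraction layer that never looks at this kind of output cannot tell the two cases apart.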
Isn’t a file geodatabase a ‘NOSQL’?
More techno-boosterism, bah.
Sure, there are some interesting “noSQL” alternatives out there, but people like Amazon, Google or Facebook have particular needs that are not really suited to the relational model in the first place. They have huge volumes of data, but that data is relatively simple in structure, and they don’t generally need the flexibility of SQL to run arbitrary queries with complex query conditions against their massively optimised data-sets, because they control the queries anyway. Which is probably just as well, because Google App Engine’s DataStore won’t let you include >1 “not equals” condition in a query and only returns up to 1000 results.
But the reason the relational model has been so successful is because it gives us a solid foundation for modelling arbitrarily complex data domains, and the RDBMS provides a robust and flexible implementation of that model. And it works pretty well – that’s why it has lasted more than 30 years and counting.
Most places I’ve worked (as a mainstream Oracle developer), the biggest problem is not that the relational model is not appropriate, but that the people building the systems don’t understand the model in the first place. And this has become a much bigger problem since OO languages like Java took over the middle tier application space, because there is still a tendency for inexperienced OO developers to regard the database as simply a glorified flat-file, rather than a massively powerful tool for managing your data in a robust and flexible manner. So they end up writing lots of complex processing to do stuff that a database is already designed to do for you really easily. Then they complain about the nasty old relational database.
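As a small illustration of that “glorified flat-file” habit, the sketch below computes the same answer twice with Python's built-in sqlite3 module: once by pulling every row into the application and aggregating by hand, and once by letting the database do it. The `orders` table and its data are invented for the example.

```python
import sqlite3
from collections import defaultdict

conn = sqlite3.connect(":memory:")
# Hypothetical table and rows, for illustration only.
conn.execute("CREATE TABLE orders (customer TEXT, amount REAL)")
conn.executemany(
    "INSERT INTO orders VALUES (?, ?)",
    [("alice", 10.0), ("bob", 5.0), ("alice", 2.5), ("carol", 7.0), ("bob", 1.0)],
)

# Flat-file style: fetch everything, then group, sum and filter in application code.
totals_app = defaultdict(float)
for customer, amount in conn.execute("SELECT customer, amount FROM orders"):
    totals_app[customer] += amount
big_app = sorted((c, t) for c, t in totals_app.items() if t > 6.0)

# Database style: one declarative query does the grouping, summing and filtering.
big_sql = conn.execute(
    "SELECT customer, SUM(amount) AS total FROM orders "
    "GROUP BY customer HAVING SUM(amount) > 6.0 ORDER BY customer"
).fetchall()

print(big_app)
print(big_sql)
```

Both approaches agree on this toy data. The difference shows up at scale, where the first version ships every row across the wire and the second lets the database use its own indexes and optimizer.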
As for GIS, there still seems to be a significant group of people who have yet to adopt these new-fangled database thingies in the first place.
Sure, as Sean says, there are plenty of applications where the standard relational model is less appropriate or simply less efficient, and data-warehousing and star schemas are one widely adopted response to that situation, just like Google et al with their own alternatives. So if those alternatives work for you, go for it.
But you will still find you have to design your data model to fit your non-relational platform, and each of the new platforms is different, and each has its own fairly drastic limitations – limited query options, weak or non-existent data-typing, lack of spatial support, and so on. Even the “NoSQL” crowd will find that they’re actually using SQL-like tools, such as Google’s GQL and so on, because you still need some way of running queries against your data store.
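A rough sketch of what “limited query options” means in practice, in Python. A plain dict stands in for a key-value store here; it is not the API of BigTable, SimpleDB or any other real product, and the sensor records are invented. Lookup by key is trivial, but any query on other attributes has to be written by hand as a scan (or backed by a secondary index you maintain yourself), whereas the relational version states the condition declaratively.

```python
import sqlite3

# A dict standing in for a generic key-value store (an assumption, not a real product API).
records = {
    "sensor:1": {"kind": "gps", "battery": 80},
    "sensor:2": {"kind": "gps", "battery": 35},
    "sensor:3": {"kind": "temp", "battery": 20},
    "sensor:4": {"kind": "gps", "battery": 10},
}

# What a key-value store is good at: fetch by key.
one = records["sensor:2"]

# Anything else: scan every value and filter in application code.
low_kv = sorted(
    key for key, rec in records.items()
    if rec["kind"] == "gps" and rec["battery"] < 50
)

# The same data in a relational table, queried with an ad hoc condition.
conn = sqlite3.connect(":memory:")
conn.execute("CREATE TABLE sensors (sensor_id TEXT PRIMARY KEY, kind TEXT, battery INTEGER)")
conn.executemany(
    "INSERT INTO sensors VALUES (?, ?, ?)",
    [(key, rec["kind"], rec["battery"]) for key, rec in records.items()],
)
low_sql = [
    row[0]
    for row in conn.execute(
        "SELECT sensor_id FROM sensors WHERE kind = 'gps' AND battery < 50 ORDER BY sensor_id"
    )
]

print(low_kv)
print(low_sql)
```

Real stores add their own query layers on top of the key lookup, which is exactly the point about GQL: sooner or later you need something query-like.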
I can see that some applications will work well with BigTable etc because these platforms answer the particular needs of those applications, and many others will work simply because they’re too trivial for it to make any difference what platform they run on. But for the majority of mainstream commercial database applications, “NoSQL” simply makes “NoSense” at all.
Bah. Now I’m going to be grumpy all day.
“Most places I’ve worked (as a mainstream Oracle developer), the biggest problem is not that the relational model is not appropriate, but that the people building the systems don’t understand the model in the first place… because there is still a tendency for inexperienced developers to regard the database as simply a glorified flat-file, rather than a massively powerful tool for managing your data in a robust and flexible manner. So they end up writing lots of complex processing to do stuff that a database is already designed to do for you really easily. Then they complain about the nasty old relational database.”
Amen!!
Funny thing is this sounds an awful lot like the response to those silly web maps a few years back.
@Sean: Far from it. I’m all for making the most of these new tools where appropriate. But there is a strong tendency in IT to leap onto the next bandwagon, regardless of whether it is actually appropriate.
Web mapping introduced powerful new tools for delivering new applications in the spatial sector. If used properly, those tools complement and extend existing ones, but they are not a complete replacement for the things that “traditional” GIS already does well.
Use the right tool for the job, and learn to use them well. Don’t ditch your existing tools just because somebody just sold you a shiny new toy.
@Kimo – I like that! Until now I hadn’t realized lack of SQL for a file gdb is really a feature.
My point is that we spend a good chunk of our time bashing new innovations and defending current tools, especially in the GIS world. Maybe we don’t need to jump on every IT bandwagon, but I’d say we could stand to be a bit more open-minded.
Even in the case of Web mapping – I’d bet a good chunk of money that “traditional” desktop GIS is going to be the equivalent of a mainframe in the not too distant future. Not that the science of GIS or the functionality will go away but the technology to deliver it will evolve. Just as data management mechanisms will evolve.
I’d also add the caveat that the NoSQL crowd did not write the article. A Computerworld reporter did, and a reporter wants lots of people to read it, which is easiest to achieve by creating controversy.
@Sean: “I’d say we could stand to be a bit more open minded…”
I agree. But that also includes being open-minded about the merits of the boring unfashionable toys as well, even the ones we don’t understand. And one difference between web-mapping and databases is that DBs are about “persistence”, i.e. the data sticks around and gets re-used for lots of different applications, long after you’ve moved your bleeding-edge web-mapping app onto Web 3.0 and so on. So you need a robust, flexible and sustainable platform to make the most of your data in the long term, whichever platform you choose.
Incidentally, if anybody else is curious about the differences between relational DBs and Amazon SimpleDB, there’s a good article on migrating from an RDBMS to SimpleDB at:
http://developer.amazonwebservices.com/connect/entry.jspa?externalID=1292
For what it’s worth, the NoSQL meetup was just that: a meetup. There was no dumping of tea, and I’m pretty sure every one of the presenters mentioned that their business is using the non-relational database *in addition to* a SQL database. Facebook is known to have thousands of MySQL servers. Google as well bases a large part of its infrastructure on MySQL. LinkedIn mentioned having a RAC installation in addition to Project Voldemort. I know of few serious contenders (if any) that have completely abandoned the RDBMS, and I imagine things will stay that way for quite some time.
Basically, the point I’m trying to make is that the ComputerWorld article was of dubious accuracy and certainly did not capture the mood of the event. Like you said, SQL (and the relational model) have their places. And there are sizes of data or rates of queries where most reasonably priced RDBMSes fall over – if you want to blow a couple million on a SAN full of 15k SAS drives and a large RAC installation, go ahead, but if you don’t need true consistency and relational modeling, one of the alternatives might be a better choice.
-Todd
@Todd:
Aw, now you’ve spoiled a good argument! Thanks for the clarification though.
I’m all for moving away from relational databases. I think it’s high time we all stopped thinking in 2 dimensions.
That said, though, I’m a long-time subscriber to the “If it ain’t broke, don’t fix it” school of thought. If a relational database gets the job done, then what more do we need to know? This does not mean, however, that we should keep doing things the way we’ve been doing them simply because it’s the way we’ve been doing them.
Years ago, a crusty old carpenter gave me some advice that I’ve applied, to great effect, to a variety of aspects of life: Use the right tool for the job at hand. Sometimes a relational database is the right tool. Sometimes it’s not.
Looks like the demise of the desktop may be sooner than we think?
http://googleblog.blogspot.com/2009/07/introducing-google-chrome-os.html
Well no comment from me really, this link says it all!
http://www.theregister.co.uk/2009/07/09/dzuiba_google_chrome_redux/
For those who might be interested, I ran across the link to the presentations/videos from the NoSQL community meeting.
http://blog.oskarsson.nu/2009/06/nosql-debrief.html
I’m not sure that “the demise of the desktop may be sooner than we think”.
However, this one line in the Google blog post was certainly instructive:
They want their computers to always run as fast as when they first bought them.
With that one line there is no doubt they are taking aim at MS and Windows. We’ll see how it turns out.
Hey James. I noticed your little picture doesn’t have PostgreSQL being dumped. Is this meant to be some sort of omen?
Hmm, I wonder if I was sending a message on my graphic….
I believe the last word on distributed erlangish key-value stores has been spoken, and it is here: http://browsertoolkit.com/fault-tolerance.png