What You Need To Know About Amazon SimpleDB
Well, after being under NDA for so long, I’m glad to be able to say that Amazon SimpleDB has gone into limited beta. Congratulations to everyone on the SDS / SimpleDB team; their several years of work on SimpleDB (formerly called SDS) have produced a brilliant piece of engineering.
What’s cool about SimpleDB
- Really large data sets
- Really fast
- Highly Available - It’s Amazon. Running Erlang. Whoa.
- On demand scaling - Like S3, EC2, with a sensible data metering pricing model
- Schemaless - major cool factor for me here; items are little hash tables containing sets of key-value pairs
Considerations you’ll want to think about
- Eventual Consistency - Data is not immediately propagated across all nodes. The latency is usually around a second, but for large data sets or heavy loads, you may experience more latency. On the plus side, your data isn’t lost!
- Queries are lexicographical - You’ll need to store data in lexicographically ordered form (zero-pad your integers, add a positive offset to sets of integers that include negatives, and convert dates into something like ISO 8601)
- Search Indexes - The SimpleDB query expressions don’t support text search, so you’ll have to construct your own inverted indexes to do “text search” properly. This is actually a nice, lightweight approach, and I’m sure many interesting indexing schemes will be possible.
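To make the lexicographical point concrete: compared as plain strings, "9" sorts after "10", so numbers have to be encoded before they are stored. Here is a minimal sketch of the zero-padding, offset, and ISO 8601 tricks. The function names, the 10-digit width, and the offset of one billion are my own assumptions for illustration; none of this comes from the SimpleDB API.

```python
from datetime import datetime, timezone

# Hypothetical parameters, chosen for illustration only.
INT_WIDTH = 10          # every encoded integer is padded to this many digits
INT_OFFSET = 10 ** 9    # added to every integer so negatives become non-negative


def encode_int(value, offset=INT_OFFSET, width=INT_WIDTH):
    """Encode an integer so that string order matches numeric order."""
    shifted = value + offset
    if shifted < 0 or len(str(shifted)) > width:
        raise ValueError("value outside the range this encoding can represent")
    return str(shifted).zfill(width)


def decode_int(text, offset=INT_OFFSET):
    """Reverse encode_int."""
    return int(text) - offset


def encode_datetime(moment):
    """Encode a timezone-aware datetime as a fixed-width ISO 8601 UTC string."""
    return moment.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")
```

With this encoding, sorting the encoded strings gives the same order as sorting the original numbers, which is what a lexicographical range query needs. The offset and width must be picked up front to cover the full range of values you expect, since changing them later means re-encoding everything already stored.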
Under the hood
According to the SimpleDB team, SimpleDB is built on top of Erlang. One of the developers, Jim Larson, and I worked together at Sendmail, and he was part of a team doing some amazing stuff with an Erlang message store way back in 2000.
While you don’t need to know Erlang to use SimpleDB, many people have visited here interested in its Erlang roots. If you are interested in learning Erlang, I can recommend Programming Erlang, written by Erlang’s creator - the best introduction you can find. I’ve associate-linked to it on Amazon; just for a little meta-fun.
The data model is simply:
- Large collections of items organized into domains.
- Items are little hash tables containing attributes (key-value pairs).
- Attributes can be searched with various lexicographical queries.
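As a rough picture of that model, and of the inverted indexes mentioned under the considerations, here is a small in-memory sketch. It only models the shape of the data with Python dictionaries and makes no calls to the real service; the item names, attribute names, and helper functions are all hypothetical.

```python
from collections import defaultdict

# In-memory stand-in for one domain: item name -> {attribute name -> set of values}.
# This models the data shape only; it is not the SimpleDB API.
documents = defaultdict(lambda: defaultdict(set))
documents["post-1"]["title"].add("Amazon SimpleDB goes beta")
documents["post-1"]["tag"].update({"amazon", "erlang"})
documents["post-2"]["title"].add("Learning Erlang")
documents["post-2"]["tag"].add("erlang")


def tokenize(text):
    """Split text into lowercase words (a deliberately naive tokenizer)."""
    return [word for word in text.lower().split() if word]


def build_inverted_index(domain, attribute):
    """Return a second 'domain' mapping each term to the items containing it."""
    index = defaultdict(lambda: defaultdict(set))
    for item_name, attributes in domain.items():
        for value in attributes.get(attribute, ()):
            for term in tokenize(value):
                index[term]["item"].add(item_name)
    return index


def search(index, term):
    """Look up one term and return matching item names in sorted order."""
    entry = index.get(term.lower())
    return sorted(entry["item"]) if entry else []


title_index = build_inverted_index(documents, "title")
```

The index is itself just another collection of items, each named after a term and holding a multi-valued attribute that points back at the documents. That is why the schemaless, multi-valued attribute model lends itself to this kind of lightweight text search.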
Now you can easily build:
- Search indexes
- Log databases / analysis tools
- Data mining stores
- Tools for World Domination
Further Reading
I also wrote a very basic Python module for SimpleDB to handle the XML and REST stuff (too bad it’s not JSON, at least for now), which I’ll release as soon as I figure out how much of the NDA is now lifted. There are a few floating around, so it shouldn’t be too long before they appear publicly.
Updates:
- Added a link to Nick Christenson’s paper on Sendmail’s Erlang message store - A great read for those of you building large scale messaging systems or anything in Erlang.
- Added a link to Werner Vogels’ article on eventual consistency - a great background behind SimpleDB’s consistency design choice.
- Whether or not SimpleDB and Dynamo are the same underlying technology has never been confirmed by an authoritative source. That’s all I’m allowed to say.
This is one of the cooler things that I’ve read about. I’m loving these web services–keep ‘em coming! Thanks for posting about it.
Comment by Eric Florenzano — December 14, 2007 @ 1:03 am
Hi there
We’ve just started using EC2 and S3 and the one thing that has been holding us up is the Mysql/database side of things and our concern for our data. What you describe looks great and I’d be very interested in seeing your Python script as we’re Python guys too. Thanks for the post
John
Comment by Tenders — December 14, 2007 @ 1:38 am
This is great news; we at Folknology have been waiting for an AWS db cloud facility. It means we can fast forward our migrations and new apps on AWS.
One question: given we are building in Erlang on EC2, is there an Erlang module/library we can use to access SimpleDB (pretty please)?
We are looking at using such a module and maybe adding mnesia caching to it.
(al at folknology)
regards
Al
Comment by Al — December 14, 2007 @ 4:43 am
Google is going to be all over this thing like a fat kid on a Twinkie.
Comment by Seanie — December 14, 2007 @ 7:21 am
Urgh! Erlang is horrible. Any rumours of a PHP interface? : )
Comment by Joe Aston — December 14, 2007 @ 9:50 am
Oops!
Didn’t see the developer’s guide. REST / SOAP interface? Awesome!
Very excited about this : )
Comment by Joe Aston — December 14, 2007 @ 9:52 am
I agree that inverted indices seem like they might be a good way to do text search, but the limit of 256 key/value pairs means that it might turn out rather contrived (I’m guessing you could have one data element plus one or more term elements that point to the data element to do inverted indices). The only other thing slightly annoying about the current system is that Query currently does not return a total number of matches (only a token to the next batch), which is unfortunately de rigueur for web apps thanks to Google, but I’m expecting that Amazon might be able to correct this for future releases… otherwise, it’s pretty cool, and having played with it a bit so far, it’s a lot of fun to boot…
Comment by Jacob Harris — December 14, 2007 @ 10:19 am
didn’t find any mention of sorting… are app developers supposed to grab any given thousand objects and sort them themselves? S3 also has this “small” issue. any ideas if this will be implemented in the future?
Comment by cpinto — December 14, 2007 @ 10:46 am
Here is a different perspective on SimpleDB:
http://marcelo.sampasite.com/brave-tech-world/Amazon-SimpleDB-What-nobody-is-t.htm
Comment by Marcelo Calbucci — December 14, 2007 @ 11:10 am
I can’t wait. I’m a big fan of all of AWS services. Well, I haven’t used Turk yet…lol
http://codershangout.com
Comment by cbmeeks — December 14, 2007 @ 12:10 pm
Running Erlang? Citation, please…
Comment by Joe — December 14, 2007 @ 12:27 pm
The fact that SimpleDB is built using Erlang comes direct from the A2Z team that built it. It’s pretty nifty!
Comment by Charles Ying — December 14, 2007 @ 3:01 pm
What impact will the 1 second lag have in the actual code - writing? It will be tremendously inefficient from a development perspective if we have to maintain tonnes of try-catch blocks just to ensure that the data is the most updated.
Comment by Ming Yeow Ng — December 14, 2007 @ 3:24 pm
This is good news, I’m really looking forward to playing around with it once the beta opens up.
More specifically, I needed a tool like this for the eventual world taking over of. I’m pretty thrilled to see that it finally has arrived.
Comment by Scott Deming — December 14, 2007 @ 7:45 pm
I am hoping that it really was written in Erlang.
What sucks is that there is NO support for Erlang from Amazon’s forums. Hell, most people never even heard of it.
That’s a shame.
Me, I’ve devoted an entire forum section to Erlang…lol
http://codershangout.com
Comment by cbmeeks — December 14, 2007 @ 10:16 pm
This seems like a really cool way for yet another big company to make money off of every transaction I make and every breath I take. Another way for a big company to know everything about me and all of my customers. Another way for a big company to make small companies dependent on them for their survival.
I don’t care how cool the technology is, If I can’t run it on my own server then I will not have anything to do with it.
Comment by Grant Robertson — December 15, 2007 @ 7:44 am
Grant you just made yourself something to do with it
Comment by Al — December 15, 2007 @ 10:40 am
Where did you see that (erlang)? This seems to be a public version of their Dynamo system described here:
http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html
Where it says:
“In Dynamo, each storage node has three main software components: request coordination, membership and failure detection, and a local persistence engine. All these components are implemented in Java.”
Comment by Rich — December 16, 2007 @ 8:18 am
simpledb will be alternative way of RDBMS, right?
Comment by m — December 17, 2007 @ 3:25 am
@Grant
A little paranoid are we? lol
Besides, small companies always depend on big companies for survival. Name one business where this is not true.
Hosting, electric, water, sewage, telecommunications, etc. Chances are that all small companies will use one of the above in some form. They are all big companies (or big governments).
http://codershangout.com
Comment by cbmeeks — December 17, 2007 @ 6:52 am
I read that overview of Dynamo and it sure doesn’t sound like it is based on Erlang. Is it possible that SimpleDB has been confused with CouchDB, which is definitely written on top of Erlang?
http://couchdb.org/
Comment by mark — January 3, 2008 @ 3:08 pm
Someone asked, “simpledb will be alternative way of RDBMS, right?” Yes, it will be the alternative: the alternative for illiterate programmers.
Some people just don’t get it. RDBMSs are based on **set theory** and, as such, support many operations that simpledb requires you to do manually. RDBMSs are to mathematics what simpledb is to basic math skills. Sure, you can do many things in the world only knowing addition and subtraction, but if you actually knew algebra, then you could do a lot more.
Do you guys really want simpledb because it actually is a good fit for your application, or are you just scared of learning SQL?
Comment by DB — March 26, 2008 @ 4:52 am
Personally, I’ve been working with SQL for ten years, building corporate OLTP databases. I definitely would not use something like SimpleDB for the stuff I do at work.
But that stuff has a lot of complex, interrelated data, has intricate reporting requirements including complex ad hoc queries, and doesn’t need huge scalability.
On the other hand, I’m working on some personal web projects. I’m hoping to need a lot of scalability, and I don’t have a lot of money to spend on it. My requirements are relatively simple, and I don’t have a fickle client imposing them on me. For these projects, I’m very interested in SimpleDB and the rest of AWS.
Right tools for the jobs, that’s all.
Comment by jeo — March 28, 2008 @ 9:29 am