satine.org

by Charles Ying

What You Need To Know About Amazon SimpleDB

December 13th, 2007

Well, after being under NDA for so long, I’m glad to be able to say that Amazon SimpleDB has gone into limited beta. Congratulations to everyone on the SimpleDB team (the service was formerly called SDS); their several years of work have produced a brilliant piece of engineering.

What’s cool about SimpleDB

  • Really large data sets
  • Really fast
  • Highly available – It’s Amazon. Running Erlang. Whoa.
  • On-demand scaling – Like S3 and EC2, with a sensible usage-metered pricing model
  • Schemaless – A major cool factor for me; items are little hash tables containing sets of key/value pairs

Considerations you’ll want to keep in mind

  • Eventual consistency – Writes are not immediately propagated to all nodes, so a read issued right after a write may return stale data. The delay is usually around a second, but with large data sets or heavy load it can be longer. On the plus side, your data isn’t lost!
  • Queries are lexicographical – Values are compared as strings, so you’ll need to store data in a form whose lexicographic order matches the order you want (zero-pad your integers, add a positive offset to sets that contain negative integers, and convert dates into something like ISO 8601).
  • Search indexes – You’ll need to build your own indexes for text search. SimpleDB query expressions don’t support full-text search, so you’ll have to construct inverted indexes to properly do “text search”. This is actually a great lightweight approach, and I’m sure many interesting indexing schemes will be possible.
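
To make the lexicographic point concrete, here is a minimal Python sketch of the encodings listed above. It assumes values are compared as plain strings; encode_int, decode_int, and encode_date are hypothetical helper names, and the offset and width values are illustrative choices you would size to your own data.

```python
from datetime import datetime, timezone

# Hypothetical helpers, assuming attribute values are compared as plain
# strings: each encoding makes string order match the intended value order.


def encode_int(value: int, offset: int = 0, width: int = 10) -> str:
    """Shift `value` by `offset` so it is non-negative, then zero-pad to `width`."""
    shifted = value + offset
    if shifted < 0:
        raise ValueError("offset is too small for this value")
    digits = str(shifted)
    if len(digits) > width:
        raise ValueError("width is too small for this value")
    return digits.zfill(width)


def decode_int(encoded: str, offset: int = 0) -> int:
    """Reverse encode_int: parse the padded digits and remove the offset."""
    return int(encoded) - offset


def encode_date(dt: datetime) -> str:
    """Fixed-layout ISO 8601 timestamp in UTC, so strings sort chronologically.

    `dt` should be timezone-aware; astimezone() treats naive datetimes as local time.
    """
    return dt.astimezone(timezone.utc).strftime("%Y-%m-%dT%H:%M:%SZ")


if __name__ == "__main__":
    values = [9, 10, -3, 250]
    print(sorted(str(v) for v in values))  # ['-3', '10', '250', '9'] (not numeric order)
    encoded = sorted(encode_int(v, offset=1000, width=5) for v in values)
    print(encoded)  # ['00997', '01009', '01010', '01250']
    print([decode_int(e, offset=1000) for e in encoded])  # [-3, 9, 10, 250]
```

The offset has to be larger than the magnitude of your most negative value, and the width has to fit your largest shifted value; pick both once per attribute and never change them, or old and new values will stop sorting together.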

Under the hood

According to the SimpleDB team, SimpleDB is built on top of Erlang. One of the developers, Jim Larson, and I worked together at Sendmail, where he was part of a team doing some amazing work on an Erlang message store way back in 2000.

While you don’t need to know Erlang to use SimpleDB, many people have visited here interested in its Erlang roots. If you are interested in learning Erlang, I recommend Programming Erlang, written by Erlang’s creator; it’s the best introduction you can find. I’ve associate-linked to it on Amazon, just for a little meta-fun.

The data model is simply:

  • Large collections of items organized into domains.
  • Items are little hash tables containing attributes as key/value pairs.
  • Attributes can be searched with various lexicographical queries.
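
The data model above can be mimicked in a few lines of Python. This is a toy in-memory sketch for illustration, not Amazon’s API: the Domain class and its put/select methods are hypothetical, and select stands in for a single lexicographic range comparison on one attribute. Because an attribute holds a set of values, an item matches if any one of its values falls in the range.

```python
from __future__ import annotations

from collections import defaultdict


class Domain:
    """Toy in-memory model of a SimpleDB domain (hypothetical, not Amazon's API).

    A domain holds named items; each item maps attribute names to a *set* of
    string values, so one attribute can carry several values.
    """

    def __init__(self, name: str) -> None:
        self.name = name
        self.items: dict[str, dict[str, set[str]]] = defaultdict(lambda: defaultdict(set))

    def put(self, item_name: str, **attrs: str) -> None:
        """Add values to an item; existing values of the same attribute are kept."""
        for key, value in attrs.items():
            self.items[item_name][key].add(value)

    def select(self, attr: str, lower: str, upper: str) -> list[str]:
        """Names of items with at least one `attr` value v where lower <= v < upper.

        Comparison is plain string (lexicographic) ordering, which is why numbers
        and dates need order-preserving string encodings.
        """
        return sorted(
            name
            for name, attributes in self.items.items()
            # .get() does not insert missing keys into the inner defaultdict
            if any(lower <= v < upper for v in attributes.get(attr, ()))
        )


if __name__ == "__main__":
    books = Domain("books")
    books.put("item1", title="Example A", year="2007", tag="aws")
    books.put("item1", tag="database")  # second value for the same attribute
    books.put("item2", title="Example B", year="1999")
    print(books.select("year", "2000", "2010"))  # ['item1']
    print(sorted(books.items["item1"]["tag"]))  # ['aws', 'database']
```

The sample item names and values are made up; the point is that "1999" falls outside the range only because four-digit years happen to sort correctly as strings.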

Now you can easily build:

  • Search indexes
  • Log databases / analysis tools
  • Data mining stores
  • Tools for World Domination
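
Here is one way the inverted-index idea could look, sketched in plain Python under stated assumptions: the tokenizer is deliberately naive, build_index and search are hypothetical names, and mapping each term to an index item (for example, one named term:<word>) whose multi-valued attribute lists the matching item names is an assumed layout, not something SimpleDB prescribes. A multi-word search intersects the item-name sets, giving AND semantics.

```python
from __future__ import annotations

import re
from collections import defaultdict


def tokenize(text: str) -> set[str]:
    """Lower-case alphanumeric words; a deliberately naive tokenizer."""
    return set(re.findall(r"[a-z0-9]+", text.lower()))


def build_index(docs: dict[str, str]) -> dict[str, set[str]]:
    """Map each term to the set of item names whose text contains it.

    Assumed SimpleDB layout (illustrative only): each term becomes an index
    item, e.g. "term:<word>", whose multi-valued attribute lists these names.
    """
    index: dict[str, set[str]] = defaultdict(set)
    for item_name, text in docs.items():
        for term in tokenize(text):
            index[term].add(item_name)
    return index


def search(index: dict[str, set[str]], query: str) -> list[str]:
    """Item names containing every query term (AND semantics), sorted."""
    terms = tokenize(query)
    if not terms:
        return []
    # index.get() avoids inserting empty entries into the defaultdict.
    matches = set.intersection(*(index.get(term, set()) for term in terms))
    return sorted(matches)


if __name__ == "__main__":
    docs = {
        "item1": "Amazon SimpleDB is in limited beta",
        "item2": "Erlang message store at Sendmail",
        "item3": "SimpleDB is built on Erlang",
    }
    index = build_index(docs)
    print(search(index, "simpledb erlang"))  # ['item3']
    print(search(index, "Erlang"))  # ['item2', 'item3']
```

Because items have a cap on attribute pairs (a commenter below mentions 256), the list of item names for a common term may need to be split across several index items.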

Further Reading

I also wrote a very basic Python module for SimpleDB to handle the XML and REST plumbing (too bad it’s not JSON, at least for now), which I’ll release as soon as I figure out how much of the NDA has been lifted. There are a few other libraries floating around, so it shouldn’t be long before some of them appear publicly.

Updates:

  • Added a link to Nick Christenson’s paper on Sendmail’s Erlang message store – a great read for those of you building large-scale messaging systems or anything in Erlang.
  • Added a link to Werner Vogels’ article on eventual consistency – great background on SimpleDB’s consistency design choice.
  • Whether or not SimpleDB and Dynamo are the same underlying technology has never been confirmed by an authoritative source. That’s all I’m allowed to say.

47 Responses to “What You Need To Know About Amazon SimpleDB”

  1. Eric Florenzano Says:

    This is one of the cooler things that I’ve read about. I’m loving these web services–keep ‘em coming! Thanks for posting about it.

  2. Tenders Says:

    Hi there

    We’ve just started using EC2 and S3, and the one thing that has been holding us up is the MySQL/database side of things and our concern for our data. What you describe looks great, and I’d be very interested in seeing your Python script, as we’re Python guys too. Thanks for the post.

    John

  3. Al Says:

    This is great news; we at Folknology have been waiting for an AWS db cloud facility. It means we can fast-forward our migrations and new apps on AWS.

    One question: given that we are building in Erlang on EC2, is there an Erlang module/library we can use to access SimpleDB (pretty please)?

    We are looking at using such a module and maybe adding Mnesia caching to it.

    (al at folknology)

    regards Al

  4. Seanie Says:

    Google is going to be all over this thing like a fat kid on a Twinkie.

  5. Joe Aston Says:

    Urgh! Erlang is horrible. Any rumours of a PHP interface? : )

  6. Joe Aston Says:

    Oops!

    Didn’t see the developer’s guide. REST / SOAP interface? Awesome!

    Very excited about this : )

  7. Jacob Harris Says:

    I agree that inverted indices seem like a good way to do text search, but the limit of 256 key/value pairs means that it might turn out rather contrived (I’m guessing you could have one data element plus one or more term elements that point to the data element to do inverted indices). The only other slightly annoying thing about the current system is that Query does not return a total number of matches (only a token for the next batch); a total count is unfortunately de rigueur for web apps thanks to Google, but I’m expecting that Amazon might be able to correct this in future releases… otherwise, it’s pretty cool, and having played with it a bit so far, it’s a lot of fun to boot…

  8. cpinto Says:

    didn’t find any mention of sorting… are app developers supposed to grab any given thousand objects and sort them themselves? S3 also has this “small” issue. any ideas if this will be implemented in the future?

  9. Marcelo Calbucci Says:

    Here is a different perspective on SimpleDB: http://marcelo.sampasite.com/brave-tech-world/Amazon-SimpleDB-What-nobody-is-t.htm

  10. cbmeeks Says:

    I can’t wait. I’m a big fan of all of AWS services. Well, I haven’t used Turk yet…lol

    http://codershangout.com

  11. Joe Says:

    Running Erlang? Citation, please…

  12. Charles Ying Says:

    The fact that SimpleDB is built using Erlang comes direct from the A2Z team that built it. It’s pretty nifty!

  13. Ming Yeow Ng Says:

    What impact will the one-second lag have on the code we actually write? It will be tremendously inefficient from a development perspective if we have to maintain tonnes of try-catch blocks just to ensure that the data is up to date.

  14. Scott Deming Says:

    This is good news, I’m really looking forward to playing around with it once the beta opens up.

    More specifically, I needed a tool like this for the eventual world taking over of. I’m pretty thrilled to see that it finally has arrived.

  15. cbmeeks Says:

    I am hoping that it really was written in Erlang.

    What sucks is that there is NO support for Erlang from Amazon’s forums. Hell, most people never even heard of it.

    That’s a shame.

    Me, I’ve devoted an entire forum section to Erlang…lol

    http://codershangout.com

  16. Grant Robertson Says:

    This seems like a really cool way for yet another big company to make money off of every transaction I make and every breath I take. Another way for a big company to know everything about me and all of my customers. Another way for a big company to make small companies dependent on them for their survival.

    I don’t care how cool the technology is, If I can’t run it on my own server then I will not have anything to do with it.

  17. Al Says:

    Grant, you just made yourself something to do with it.

  18. Rich Says:

    Where did you see that (erlang)? This seems to be a public version of their Dynamo system described here:

    http://www.allthingsdistributed.com/2007/10/amazons_dynamo.html

    Where it says:

    “In Dynamo, each storage node has three main software components: request coordination, membership and failure detection, and a local persistence engine. All these components are implemented in Java.”

  19. m Says:

    simpledb will be altenative way of RDBMS, right?

  20. cbmeeks Says:

    @Grant

    A little paranoid are we? lol

    Besides, small companies always depend on big companies for survival. Name one business where this is not true.

    Hosting, electric, water, sewage, telecommunications, etc. Chances are that all small companies will use one of the above in some form. They are all big companies (or big governments).

    http://codershangout.com

  21. mark Says:

    I read that overview of Dynamo and it sure doesn’t sound like it is based on Erlang. Is it possible that SimpleDB has been confused with CouchDB, which is definitely written on top of Erlang?

    http://couchdb.org/

  22. DB Says:

    Someone asked, “simpledb will be altenative way of RDBMS, right?” Yes, it will be the alternative: the alternative for illiterate programmers.

    Some people just don’t get it. RDBMSs are based on set theory and, as such, support many operations that simpledb requires you to do manually. RDBMSs are to mathematics what simpledb is to basic math skills. Sure, you can do many things in the world only knowing addition and subtraction, but if you actually knew algebra, then you could do a lot more.

    Do you guys really want simpledb because it actually is a good fit for your application, or are you just scared of learning SQL?

  23. jeo Says:

    Personally, I’ve been working with SQL for ten years, building corporate OLTP databases. I definitely would not use something like SimpleDB for the stuff I do at work.

    But that stuff has a lot of complex, interrelated data, has intricate reporting requirements including complex ad hoc queries, and doesn’t need huge scalability.

    On the other hand, I’m working on some personal web projects. I’m hoping to need a lot of scalability, and I don’t have a lot of money to spend on it. My requirements are relatively simple, and I don’t have a fickle client imposing them on me. For these projects, I’m very interested in SimpleDB and the rest of AWS.

    Right tools for the jobs, that’s all.

  24. Erlang, or Utility-computing vs. Appliance-computing « X marks reality Says:

    [...] that has all the right properties and mechanisms in place to do what utility computing requires. Amazon SimpleDB is built upon Erlang. IMDB (owned by Amazon) is switching from Perl to Erlang. Google Gears is using Erlang-style [...]

  25. A First Look at Amazon SimpleDB | Prosumer News Says:

    [...] It’s written in Erlang [...]

  26. Weekly linkdump #106 - max - developers’ blog Says:

    [...] What You Need To Know About Amazon SimpleDB: a very brief overview of Amazon’s new service for developers [...]

  27. mmo Says:

    I keep hearing great things about SimpleDB. We have been utilizing SQL as well but Amazon keeps coming out with terrific solutions. Great work

  28. @Joe Aston Says:

    Have you ever used erlang?

    Didn’t think so.

  29. Anti-RDBMS: A list of distributed key-value stores | Richard Jones, Esq. Says:

    [...] Amazon’s SimpleDB Service, and some commentary [...]

  30. Comparison list of distributed key-value storage systems [translation] - Taixiang Shi’s Blog - Do One Thing And Do It Well Says:

    [...] Amazon’s SimpleDB Service, and some commentary [...]

  31. Translation of “Anti-RDBMS: A list of distributed key-value stores” « 13 попугаев Says:

    [...] Amazon’s SimpleDB Service, and some commentary [...]

  36. Anti-RDBMS: A list of distributed key-value stores | Weez.com Says:

    [...] Amazon’s SimpleDB Service, and some commentary [...]

  37. Amazon opens testing for in-cloud database | 云生活 Says:

    [...] some more technical details, the Inside Looking Out blog has some, and Amazon has a SimpleDB developer [...]

  41. Comparison of distributed key/value storage systems | haohtml's blog Says:

    [...] Amazon’s SimpleDB Service, and some commentary [...]
