RethinkDB (YC S09): MySQL Storage Engine Built From The Ground Up For SSD

edw519 · on July 29, 2009

Very interesting. I wish you great success.

Since your primary market will probably be developers, describing RethinkDB will be necessary, but not sufficient. Also, demonstrated performance will be necessary, but again, probably not sufficient. I, for one, want to understand what goes on under the hood (and be able to describe it to my customers).

The "Time to Insert 2 Million Records" graph was impressive. How does the "Time to Retrieve, Sort, & Present 2 Million Records" graph look? How about the "Time to Modify 14% of the 2 Million Records" graph?

Your append-only approach sounds great for adds. How is it for changes and deletes? How will garbage collection affect performance?

"No more locks" is a great claim, but how will it work in a real world enterprise-quality app? User A takes 30 seconds to change Zip Code, Phone Number, and increase Credit Limit while User B takes 10 seconds in the middle of that to change City and decrease Credit Limit for the same customer. Who wins? This is a difficult scenario in both optimistic and pessimistic environments. I can only imagine how it's handled in a "no lock" environment.

(I'm not looking for answers here, just spouting off what's on the top of my head, but I will be looking to better understand on your website. Your white papers oughta be interesting.)

You make ambitious claims. I look forward to seeing you fulfill them. Best wishes!

bkudria · on July 29, 2009

Keep in mind, for your application example, that's a need for locks or merging logic on the application level - this project is a bit "lower-level". MyISAM and InnoDB don't solve this issue either, locks or no locks.

Locks in this context refers to readers and writers at the DB engine level.

coffeemug · on July 29, 2009

Thank you, this is a great post! We are looking forward to answering all of these (and many other) concerns by publishing benchmarks, papers, blog posts, tech talk videos, etc. We hope to do all of this in the coming weeks, we just took it one step at a time.

mahmud · on July 29, 2009

Good luck slava! And don't forget to send the compatibility patches for clsql-mysql ;-)

netsp · on July 29, 2009

Busy couple of weeks. Good luck!

leif · on July 29, 2009

We'll have more performance numbers up soon; for the past few weeks we've been working on features to get wordpress working, but we'll be profiling and tuning, and we'll get you and everyone else your graphs in good time.

Hope you enjoy the papers.

cperciva · on July 29, 2009

Congratulations -- but keep in mind that cleaning (aka. garbage collection) is the hardest part of any log-structured storage system. I learned this lesson first-hand when writing tarsnap. :-)

leif · on July 29, 2009

Thanks; we're doing GC over the next few days (starting...now actually).

cperciva · on July 29, 2009

Full GC (removing everything which is not needed to read the current DB state), or partial GC (removing old metadata such as indexes, but leaving behind data which has been deleted/modified)?

The former is much harder -- it's a significantly complex task just to figure out which data records are still live.

coffeemug · on July 29, 2009

Full GC. In the framework of what we're doing, it's not very complicated (we think). This is because of our indexing scheme (we can simply walk the index tree while the database answers queries). We hope to release more info on this shortly.

cperciva · on July 29, 2009

Cool -- good to see that you're thinking about this stuff ahead of time rather than sitting down to write code and suddenly realizing that you painted yourself into a corner. :-)

dbz · on July 29, 2009

I did that once and regretted it.

mahmud · on July 29, 2009

you must have programmed once in your life then.

functional-tree · on July 29, 2009

Cool snippet from one of the [founders?] on TechCrunch:

> All of our interesting work is MySQL API-independent, so a Postgres port is not out of the question. We’ve also been entertaining the idea of porting to SQLite, as many embedded devices use that, and have SSDs already.

leif · on July 29, 2009

yep, that's me. hi!

zhyder · on July 29, 2009

Looks like it's not open source, though they're apparently considering it: http://news.ycombinator.com/item?id=729338 . For such an important part of infrastructure, I don't think closed source will do, especially not from a startup that's worked on it for only a few months. Open sourcing will:

- Allow others to vet your codebase for stability and security

- Give customers some recourse if your startup folds

- Make you comparable to MyISAM/InnoDB/PostgreSQL, unless you want to be compared to Oracle or Microsoft SQL

leif · on July 29, 2009

To your points:

- Nothing much I can say about stability and security, but then again, we're not saying it's stable and secure yet. Besides, people trust Oracle without seeing their sources, don't they?

- If our startup folds, I doubt we'll drag our source to the grave. That said, maybe it gives our users incentive to make sure we don't fold? :P

- There are closed-source MySQL storage engines out there with whom we'd rather be compared (TokuDB, Falcon (is Falcon open-source?)).

We just don't want to close any doors yet. If it makes good business sense, we'd be glad to open the source.

zhyder · on July 29, 2009

Oracle's been around a while and their stability and security have been battle-tested, or at least that's the perception that makes a prospective customer trust them.

Interesting that there are other closed-source MySQL storage engines. I wonder how big a piece of the pie (among paying or willing-to-pay users) they have compared to MyISAM/InnoDB.

antonovka · on July 29, 2009

Using Apple's Xcode and Dictionary icons in the banner of the RethinkDB website looks pretty skeevy. It's hard to do by accident, and beyond violating Apple's copyright, it doesn't place your company in a good light.

I wouldn't bother commenting on it, but surprisingly this is not the first time I've seen someone lifting Apple icons for their startup website, and it's something that needs to be addressed: artists are expensive, but you can't take other people's artwork, and worse yet, it's blazingly obvious when you use Apple's.

mglukhovsky · on July 29, 2009

All of the icons we use are under an open license[1,2]. If you do find any specific copyright violations, we’ll certainly remove them.

[1] http://www.iconfinder.net/icondetails/6166/128/

[2] http://www.iconfinder.net/icondetails/8722/128/

antonovka · on July 29, 2009

The icons are almost exact clones of Apple's -- the differences are so minor that I didn't notice them when actually looking at the applications in my dock. The Xcode icon, for instance, appears identical barring the reversal of the hammer.

[edit]

The Xcode icon you linked to comes from a user-uploaded KDE Icon Theme:

http://www.kde-look.org/content/show.php/Dark-Glass+reviewed...

If you download the actual theme set, you'll find that the copyright ownership is unknown: "99% of this set is GPL now and what's not is most likely creative commons (a tiny number of the mime types may be proprietary). PLEASE abide by the licence rules, if you use icons from this set please research and credit the appropriate people. I have been given permission to release other peoples art work under the GPL so respect the licence." (from the README)

The original icon theme may be found here: http://www.mentalrey.it/project.html

As noted by mikejs below, Apple actually uses the same icon you're using. Digging a bit, it appears it's the icon Apple used for Xcode in Mac OS X 10.4 Tiger (Xcode 2.5): http://www.command-tab.com/images/photoshop/tiger_icons/prev...

Xcode 3.0 actually introduced the right-facing hammer:

http://developer.apple.com/DOCUMENTATION/Cocoa/Conceptual/Ob...

mglukhovsky · on July 29, 2009

Thanks for bringing potential copyright infringement issues to our attention!

Rather than figuring out who has provenance, we decided to just change the icons (which are now all GPL). If you notice any other issues, please let us know (founders@rethinkdb.com).

mikejs · on July 29, 2009

Apple seems to use the exact variant of the icon that these guys are using on some parts of their site as well, see http://www.apple.com/ca/science/whymac/righttool.html

catch23 · on July 29, 2009

How do you know Apple isn't the one copying it from somewhere else?

antonovka · on July 29, 2009

Apple doesn't use stock icons for their product icons, and the Xcode icon has been in use in similar forms since 2003.

iconfinder · on July 30, 2009

Hey everybody.

I run Iconfinder and I'm trying hard not to add proprietary icons, but sometimes icons slip through in the large icon sets with 1,000s of icons. The icons you mention above are removed from the site.

Best regards, Martin Leblanc

neilc · on July 29, 2009

Database consistency problems require traditional databases to use complicated locks. Because RethinkDB data is always consistent, locking is unnecessary.

You may not use locking for concurrency control (plenty of "traditional DBs" don't, either), but you still need some sort of concurrency control scheme -- just using append-only/log-structured storage doesn't make CC free. I'd be curious to hear how you guys are doing this.

leif · on July 29, 2009

For the moment, we allow one writer at a time, but unbounded (I think) readers. The only locks taken are for arithmetic operations; no locks are held while performing I/O or even memcpy(2). You can read more about that bit in the paper about our caching architecture on the wiki.

Eventually, we will allow multiple writers, and use some form of STM for that; we just haven't gotten to it yet.

If you'd like to speak interactively on this, shoot me an email.

coffeemug · on July 29, 2009

Most industrial strength storage engines use multiversion concurrency control, but they still require a fair share of locking - row level locks that protect the data while a second copy is made, for example. No storage engine is entirely "lock-free", but we have only very few extremely granular locks. This is completely different from what was done before (as far as we know), and from a practical perspective is as good as lock-free.

I don't think I can explain the technology in a comment, but we'll definitely be releasing more information in the coming weeks. We had to get this release out for now. Stay tuned!

gaius · on July 29, 2009

Statements like that set my alarm bells ringing.

neilc · on July 30, 2009

Sorry, statements like what?

gaius · on July 30, 2009

Statements like "traditional X does Y but..."

The "traditional" (and I don't know how you could call the latest version of any of the major databases "traditional", this is a brutally competitive market) databases don't lock just for the fun of it, but to enable features that users want. Anyone can come up with a product that doesn't do Y if it can't do X either. So what're we missing here?

neilc · on July 30, 2009

I don't know how you could call the latest version of any of the major databases "traditional"

Oracle has done MVCC for many years, as has Postgres. The canonical paper on optimistic concurrency control for DBs is from 1981 (http://www.seas.upenn.edu/~zives/cis650/papers/opt-cc.pdf).

I didn't follow the rest of your comment, I'm afraid -- I was just saying that I didn't see how using append-only storage immediately makes concurrency control a non-issue. The comments from the RethinkDB guys upthread support that: not supporting concurrent writers makes your concurrency control much more straightforward.

jrockway · on July 29, 2009

I am confused by this graph. The linear default behavior makes sense -- it always takes the same amount of time to insert a row. The logarithmic behavior confuses me -- as you add more rows to the database, the time to insert a row decreases? If you add an infinite number of rows, each row can be inserted in zero time? That doesn't make sense to me.

I would like to read the benchmark script.

Also, I'm afraid to read their source code after reading the license agreement. I can't sell support for any product that can communicate with RethinkDB? That sounds unenforceable, but it is scary enough to prevent me from even looking at the code.

coffeemug · on July 29, 2009

The benchmark is incremental. It measures how fast you can insert N rows given M rows already in the database. So, the second data point means you can insert roughly 750,000 rows in 140 seconds, given 750,000 rows already in the database. The limit certainly doesn't approach zero as the number of elements approaches infinity :)

The engine isn't open source. We're considering open sourcing it in the future, but we want to understand all business implications of this decision before we proceed - it's a decision you can't easily retract. The license is a bit draconian, but this is because we've only released a developer pre-alpha. We don't want people to use the engine in production yet - it's not ready. AFAIK, the license says you can't sell RethinkDB support, not that you can't sell support for software that uses RethinkDB.

jrockway · on July 29, 2009

AFAIK, the license says you can't sell RethinkDB support, not that you can't sell support for software that uses RethinkDB.

From the license:

Prohibited activities include but are not limited to:

Selling support for products which incorporate RethinkDB.

This is the problem with rolling your own software licenses.

leif · on July 29, 2009

For the moment, we cannot endorse using RethinkDB in a production environment, and even then, this license is meant to be free for non-commercial. Once we get to that point, we'll be re-visiting the license anyway, before we start licensing to commercial users.

coffeemug · on July 29, 2009

Thanks for bringing this to our attention. I'm honestly not sure what the intention was since I didn't do the license, but I'll talk to the guys here about it.

by · on July 29, 2009

So is the X-axis mislabelled? Along the bottom it says "Number of elements being inserted" going from zero on the left to 2,000,000 on the right, but the title says "Time to insert 2,000,000 elements". Something's very wrong somewhere.

mrduncan · on July 29, 2009

Congrats guys, sounds very exciting!

I just had one of those "why didn't I think of that" moments as I read through your wiki thinking back to this paper I read a few months ago: http://publications.csail.mit.edu/lcs/specpub.php?id=773

neilc · on July 29, 2009

In fairness, log-structured storage is not a new idea. This paper is a great, classic read on the subject: http://www.eecs.berkeley.edu/~brewer/cs262/LFS.pdf

leif · on July 29, 2009

Very much not new. Nobody's done it for MySQL yet to my knowledge though, and I don't think many have looked at the implications of log-structured storage on SSDs. That kind of storage is notoriously difficult to do cleanly on a rotational drive, but our indexing scheme is quite simple. It's essentially some combination of shadow paging and side-effect-free style from the functional programming world, if that makes any sense to you.

vicaya · on July 29, 2009

I still think SSD is a mere diversion in storage. According to Jason Hoffman (CTO/Founder of Joyent, speaking on Structure'09), SSD under their typical workload can only last for a month before wearing out: http://gigaom.com/2009/06/25/structure-09-how-to-scale-up-wi...

Cheap commodity disks however have an annualized failure rate of 4% in Google's datacenter (according to their disk analysis paper.)

mattyb · on July 29, 2009

Congratulations! Glad to see some fellow Stony Brook folks on the map.

bkudria · on July 29, 2009

I know, right?

rjurney · on July 29, 2009

MySQL's pluggable storage engine model is so much win. You can have MyISAM, InnoDB, InfoBright and this in the same database engine for the same application. And there are many others.

kvs · on July 29, 2009

My question is, how much of the speed-up is due to RDB being append-only and how much of it is due to specialization to SSD.

henryl · on July 29, 2009

Congrats. These guys seriously know what they're doing from the talks I've had with them.

leif · on July 29, 2009

vaksel: who are you? you stole our thunder :( actually :)

vaksel · on July 29, 2009

nope, just your karma points.

Stealing thunder = releasing the same exact product, a month before you finish.

prakash · on July 29, 2009

Good luck, guys! This is an interesting market to be in.

It's also nice to see a lot of interesting database related companies coming out of Stony Brook -- I think one of the founders of tokutek is from Stony Brook as well.

leif · on July 29, 2009

Michael Bender (one of our professors) is a co-founder of Tokutek, and several SBU grad students work there.

They are doing some _really_ cool stuff!

chime · on July 29, 2009

I did not see any mention of full-text capabilities. MyISAM is the only option in MySQL for doing full-text searching and it's pretty limited due to the table-locking issue. Are there any plans to have a good full-text search feature in Rethink? Having a lock-free full-text table in MySQL would be awesome for many many sites out there.

pmorici · on July 29, 2009

Does RethinkDB support full text search?

defen · on July 29, 2009

Since they're building a MySQL storage engine, you should be able to just hook it into Sphinx and have everything work.

http://www.sphinxsearch.com/

ovi256 · on July 29, 2009

Sphinx is awesome. On a database where MySQL answered full-text search queries in 20s, Sphinx builds indici in 2s, and queries are instantaneous for all practical purposes.

leif · on July 29, 2009

Interesting. We'll have to take a look at that. Thanks!

coffeemug · on July 29, 2009

Not right now. We've only been at it for two months!

aberman · on July 29, 2009

You guys are gonna kill it. You deserve it. Great concept.

siong1987 · on July 29, 2009

Congratulations. Another YC 09 startup.

Confusion · on July 29, 2009

This sounds pretty similar to Drizzle?

billclerico · on July 29, 2009

congrats guys

mglukhovsky · on July 29, 2009

Thanks, Bill! We're all very excited here.

lzhou · on July 29, 2009

Grats guys!

mrandle · on July 29, 2009

Nice work. A great idea well executed.

shiftace15 · on July 29, 2009

You guys rock!