Hacker News

Bah! I was so disappointed when I heard about the Apple acquisition of FoundationDB. Will any of the technology behind it ever see the light of day?


Unfortunately I'm the last person to ask. While I did start at FoundationDB pretty early (second employee), I ceased to be involved at the point of the acquisition, and beyond that I've only heard a few rumors from former coworkers.

As a business it was always an ambitious effort, and I'm not sure what could or should have been done differently. But since then I've used a number of other systems and thought to myself "boy, I wish I had FDB right now."


Another former FoundationDB guy here (hi Ian!), and I actually think the business case for Apple open-sourcing it is very strong. I'm a fan of the layered architecture we chose, but building efficient and powerful layers on top of the core key-value store is a serious engineering effort in its own right. By encouraging an open-source layer ecosystem (and operational and deployment tools), Apple could leverage its investment in the core technology more effectively.
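To make the "layer" idea concrete, here's a toy sketch: a layer is just client code that maps a higher-level data model onto ordered key-value pairs. The in-memory `OrderedKV` below is a stand-in for the real store (this is not FoundationDB's actual API, and the key scheme is invented for illustration), but the pattern of maintaining a secondary index as extra keys under a shared prefix is the real technique.

```python
# Toy stand-in for an ordered key-value store. A real layer would
# target FoundationDB's transactional API; here a dict plus sorted
# iteration is enough to illustrate lexicographic range scans.
class OrderedKV:
    def __init__(self):
        self.data = {}

    def set(self, key, value):
        self.data[key] = value

    def get_range(self, prefix):
        # Range scan: all keys sharing a prefix, in lexicographic order.
        return [(k, v) for k, v in sorted(self.data.items())
                if k.startswith(prefix)]


# A minimal "index layer": each insert writes both the primary record
# and a secondary index entry, so lookups by name become a cheap
# prefix scan instead of a full scan.
class UserLayer:
    def __init__(self, kv):
        self.kv = kv

    def insert(self, user_id, name):
        self.kv.set(b"user/" + user_id, name)
        self.kv.set(b"idx/name/" + name + b"/" + user_id, b"")

    def ids_by_name(self, name):
        prefix = b"idx/name/" + name + b"/"
        return [k[len(prefix):] for k, _ in self.kv.get_range(prefix)]
```

In a real layer, both writes in `insert` would happen inside one transaction, which is exactly what the core store's transactional guarantee buys you.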

Whether Apple's leadership agrees with me is another question. :)


We specifically chose a monolithic architecture for FaunaDB, since performance improvements invariably come from breaking interface boundaries and sharing additional information. It's been working out well.


Yes, this is the argument that VoltDB made as well:

https://www.voltdb.com/blog/foundationdbs-lesson-fast-key-va...

My feelings on this topic are mixed. On the one hand, I think many of the specific examples chosen in that post are false (and have told John as much in person). On the other hand, the general point that you can squeeze out constant factor performance improvements by violating abstraction boundaries is obviously usually true.

Nevertheless, I still think this is a bad argument. While it's true that abstractions are rarely costless, they can often be made so cheap that the low-hanging performance fruit is elsewhere. And in particular, cheap enough that they're worth it when you consider all the other benefits that they bring.

When I built a query language and optimizer on top of FoundationDB, my inability to push type information down into the storage engine was about the last thing on my mind. Perhaps someday when I'd made everything else perfect it would've become a big pain (and perhaps someday we would've provided more mechanisms for piercing the various abstractions and providing workload hints), but in the meantime partitioning off all state in the system into a dedicated component that handled it extremely well made the combined distributed systems problem massively more tractable. The increased developer velocity and reduced bugginess in turn meant that I (as a user of the key-value store) could spend scarce engineering resources on other sorts of performance improvements that more than compensated for the theoretical overhead imposed by the abstraction.

I won't claim that a transactional ordered key-value store is the perfect database abstraction for every situation, but it's one that I've found myself missing a great deal since leaving Apple.

But I'm glad to hear that things are going well for you guys. Best of luck, this is a brutal business!


Hi Will. Thanks for the shout-out.

I still think many of the arguments in that blog post hold up for non-embedded KV stores. I think you can mitigate a lot by aggressively caching metadata, but eventually you end up moving the SQL engine closer and closer to the storage layer to get performance. And yeah, you end up more monolithic and testing gets harder. Sigh.

Some of this is workload dependent. If you're not touching many rows in your queries and transactions, then you can get away with a lot more. But if you give someone SQL, they're going to want to scan.

I wouldn't mind being proven wrong. Maybe Apple made FDB run SQL at legit speeds. I haven't seen much from public projects that work this way to change my mind yet.

> I won't claim that a transactional ordered key-value store is the perfect database abstraction for every situation, but it's one that I've found myself missing a great deal since leaving Apple.

How does Spanner not satisfy that itch? Not ordered matters?


> How does Spanner not satisfy that itch? Not ordered matters?

I was probably unclear in my previous comment. Spanner is great! (And Spanner is ordered). The particular aspect of FDB that I miss is what some of our old customers called "the bottom half of a database" or "a database construction kit". In fact FDB was an awesome modular building block for all kinds of distributed systems, not just databases. We hacked up prototypes for a whole bunch of these but sadly never got around to releasing them.

Spanner is a full-fledged enterprise grade database with opinions about your data model, query language, types, etc. For the vast majority of customers, that's much more useful than what FDB provided. But for me as somebody who enjoys kicking around silly new ideas for distributed systems, it's a bit less fun.
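As one example of the "database construction kit" idea, a FIFO queue falls out of an ordered key space almost for free: monotonically increasing keys keep items in insertion order, and popping the smallest key dequeues the oldest item. The sketch below uses a plain counter over an in-memory map; in FoundationDB the sequence number would instead come from a versionstamp assigned inside a transaction.

```python
import itertools

# Toy FIFO queue layer over an ordered key space. The counter and the
# b"queue/" prefix are illustrative; the real technique relies on the
# store keeping keys in lexicographic order.
class QueueLayer:
    def __init__(self):
        self.kv = {}
        self._seq = itertools.count()

    def push(self, item):
        # Fixed-width big-endian counters sort in insertion order
        # under lexicographic key ordering.
        key = b"queue/" + next(self._seq).to_bytes(8, "big")
        self.kv[key] = item

    def pop(self):
        # The smallest key in the prefix range is the oldest item.
        keys = sorted(k for k in self.kv if k.startswith(b"queue/"))
        if not keys:
            return None
        return self.kv.pop(keys[0])
```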


Monolithic databases (CockroachDB, Spanner, etc.) don't eliminate abstraction boundaries at the larger scale. The boundaries are simply aligned with the database boundary, pushing the burden of crossing them into the application logic. Software still pays the price of crossing that gap in code complexity and performance (the object-relational impedance mismatch comes to mind).

It feels like the building block approach lets you achieve better design and performance for your entire application in the long run. Especially, if you can treat building blocks as blueprints and modify them to fit the task at hand.


"When I built a query language and optimizer on top of FoundationDB, my inability to push type information down into the storage engine was about the last thing on my mind."

What was on your mind? What performance problems did you encounter?


Layer modeling techniques shared by FoundationDB (and still available via the Internet Archive) are still immensely helpful even without the database.

We are happily using them to implement and optimize our local storage on top of LMDB (another awesome database). However, these approaches can be applied to any other key-value database with transactions and lexicographically ordered keys.
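The core of those modeling techniques is encoding composite keys so that byte-wise lexicographic order matches the logical order of the tuples. Here's a deliberately simplified encoder in that spirit (this is NOT FoundationDB's actual tuple-layer wire format; it assumes string elements contain no NUL bytes and ints fit in 64 bits unsigned), but it preserves the order-matching property that makes prefix range scans over composite keys work:

```python
# Simplified tuple-to-key encoding in the spirit of FoundationDB's
# tuple layer. Type tags keep ints and strings in a consistent order;
# the trailing NUL terminates strings so shorter strings sort before
# their extensions.
def pack(parts):
    out = b""
    for p in parts:
        if isinstance(p, int):
            out += b"\x01" + p.to_bytes(8, "big")
        elif isinstance(p, str):
            out += b"\x02" + p.encode("utf-8") + b"\x00"
        else:
            raise TypeError(f"unsupported element: {p!r}")
    return out
```

With this property, `pack(("user", user_id))` style keys give you ordered per-entity ranges for free, which is exactly what the layer recipes exploit.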


What do you guys think about TiDB and CockroachDB, both of which are SQL layers on top of a distributed K/V store?


Full disclosure: I now work at Google on Cloud Spanner which competes with both products you mentioned. These are just my personal (and probably highly biased) opinions.

I have some concerns about CockroachDB on both the performance and the reliability fronts. But I hugely admire what they're trying to do and I've heard that they're rapidly improving in both areas. TiDB is an exciting project that I've heard great things about but have never tried myself. I think it's also relatively immature.

Honestly if I were starting a project right now and had neither FDB nor Spanner available to me, I'd probably try to push Postgres as far as I possibly could before considering anything else.


Agreed, Postgres does scale very well for non-Google-sized apps, though a lingering issue is handling failover.

But if one does need a bit more horizontal scalability, there don't seem to be a lot of options if you also want atomic, transactional updates (though not necessarily strict transaction isolation). I have an app that is conceptually a versioned document store, where each document is the sum of all its "patches"; when you submit a batch of patches, the rule is that these are applied atomically, and that the latest version of document thereafter reflects your patch (optimistic locking and retries take care of serialization and concurrent conflicts). I'm using PostgreSQL right now, which does this beautifully, but with limited scalability. I've looked for a better option, but not come up with anything.
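The atomic-patch-batch pattern described above boils down to a conditional version bump inside one transaction. Here's a minimal sketch using Python's `sqlite3` so it runs standalone (the original app uses PostgreSQL; the table and column names are invented for illustration):

```python
import sqlite3

# Sketch of optimistic locking for atomic patch batches. One UPDATE
# guarded by the expected version detects concurrent writers; the
# caller re-reads and retries on conflict.
conn = sqlite3.connect(":memory:")
conn.execute(
    "CREATE TABLE documents (id TEXT PRIMARY KEY, version INTEGER)")
conn.execute(
    "CREATE TABLE patches (doc_id TEXT, version INTEGER, patch TEXT)")
conn.execute("INSERT INTO documents VALUES ('doc1', 0)")

def apply_patches(doc_id, expected_version, patch_batch):
    """Apply a batch of patches atomically; return False if the
    document changed since expected_version (caller retries)."""
    with conn:  # one transaction: all patches land, or none do
        cur = conn.execute(
            "UPDATE documents SET version = version + 1 "
            "WHERE id = ? AND version = ?",
            (doc_id, expected_version))
        if cur.rowcount == 0:
            return False  # a concurrent writer won the race
        for p in patch_batch:
            conn.execute("INSERT INTO patches VALUES (?, ?, ?)",
                         (doc_id, expected_version + 1, p))
        return True
```

The same shape works in PostgreSQL, where `UPDATE ... WHERE version = $n` plus the surrounding transaction gives the all-or-nothing guarantee.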

Redis would handle this, but it would work purely thanks to being single-threaded, and I don't feel like Redis is safe as a primary data store for anything except caches and such. Cassandra might do it, using atomic batches, although its lack of isolation could be awkward to work around.


What do you think Postgres should do in this area? It seems there are a bunch of approaches being explored by different teams. I'd be very interested to hear a Spanner person's take.


Do you know what Apple is using it for, and at what scale?

