So let's suppose I have several billion integers sitting in a data store, and I want to sort, count, and sum them. Do I have to collect all this data to my local cache first? What if millions of people are using my application and want the same value?
Remember, the 'peer' doesn't have to be embedded in your front-edge application (even though that's one use case). You could have a single 'peer' which sits on its own beefy server expressly for this kind of calculation.
It's about writing software which is simpler and more flexible.
If you write a traditional shared-nothing web app client with a traditional bag-o-sprocs database server, you'll probably be fine as long as your workload doesn't change much. As long as your write volume never exceeds what a single (as beefy as necessary) server can handle (which seems to be working out so far for Hacker News!), you're OK.
However, products/services evolve and requirements change. Let's assume, for example, that you want to do some heavy-duty number crunching. This number crunching involves some critical business logic calculations. Some of those calculations are in sprocs, but some of them are in your application code's native language. How do you offload that work to another server? You may have to juggle logic back and forth across that sproc/app boundary. It's pretty rigid; change is hard.
You can think of Datomic as a way of eliminating your sproc language and moving the query engine, indexes, and data itself into your process. Basically, you get everything you need to write your own database server. Furthermore, you can write specialized database servers for specialized needs... as long as you agree to allow a single Transactor service to coordinate your writes.
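To make the "query engine and indexes in your process" idea concrete, here's a toy sketch in Python. This is not the real Datomic API, and the names (`LocalPeer`, `add_fact`, `values_at_least`) are entirely made up; it just illustrates a process that holds both the data and a sorted index in its own memory, so queries are local function calls rather than round-trips to a database server.

```python
# Hypothetical sketch, NOT Datomic's actual API: a "peer" process that
# keeps the facts and an index locally, so queries run in-process.
from bisect import bisect_left, insort

class LocalPeer:
    """Toy in-process store: data plus a sorted index, queried locally."""
    def __init__(self):
        self._facts = []        # append-only list of (entity, attr, value)
        self._value_index = []  # sorted values, enabling range queries

    def add_fact(self, entity, attr, value):
        self._facts.append((entity, attr, value))
        insort(self._value_index, value)  # keep the index sorted

    def values_at_least(self, threshold):
        # A range query served entirely from local memory: binary-search
        # the sorted index, then slice. No network, no server.
        i = bisect_left(self._value_index, threshold)
        return self._value_index[i:]

peer = LocalPeer()
for entity, value in enumerate([5, 12, 7, 30]):
    peer.add_fact(entity, "score", value)
print(peer.values_at_least(10))  # -> [12, 30]
```

The point is the shape of the thing: once the engine and indexes live in your process, "write your own specialized database server" is just writing ordinary functions over local data structures.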
Back to the big number crunching. You've got the my-awesome-app process chugging along & you don't want to slow it down with your number crunching, so you spin up a my-awesome-cruncher peer & the data gets loaded up over there. Now you have the full power of your database engine in (super fast) memory and you can take your database-client-side business logic with you!
Now let's say you're finding that you're spending a lot of CPU time doing HTML templating and other web-appy things. Well, you can trivially spin up additional my-awesome-app peers to share the workload.
You can do all this from a very simple start: One process on one machine. Everything in memory. Plain-old-java-objects treated the same as datums. No data modeling impedance mismatch. No network latency to think about. You can punt on a lot of very hard problems without sacrificing any escape hatches. You get audit trails and recovery mechanisms virtually for free.
Again, all this assumes the write-serialization trade-offs are acceptable. Considering the prevalence and success of single-master architectures in the wild, that's a pretty reasonable tradeoff for many workloads. Furthermore, the append-only model may enable even higher write speeds than something like Postgres's more traditional update-in-place approach.
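Here's a minimal sketch of what I mean by the append-only, single-writer model. The names (`Transactor`, `transact`) echo Datomic's vocabulary, but the internals are purely illustrative: one writer assigns monotonically increasing transaction ids and only ever appends, which is also where the "free" audit trail comes from.

```python
# Illustrative sketch of a single-writer, append-only log.
# Not Datomic's actual internals; names are borrowed for flavor only.
import itertools

class Transactor:
    """The single writer: assigns tx ids and appends, never updates in place."""
    def __init__(self):
        self.log = []                      # the immutable, append-only log
        self._tx_ids = itertools.count(1)  # monotonically increasing tx ids

    def transact(self, facts):
        tx = next(self._tx_ids)
        for fact in facts:
            self.log.append((tx, fact))    # sequential append, no rewrites
        return tx

t = Transactor()
t.transact([("alice", "balance", 100)])
t.transact([("alice", "balance", 80)])

# Because nothing is overwritten, the full history survives: that's the
# audit trail / recovery mechanism falling out of the model for free.
history = [entry for entry in t.log if entry[1][0] == "alice"]
print(history)  # -> [(1, ('alice', 'balance', 100)), (2, ('alice', 'balance', 80))]
```

Appends to the tail of a log are also the friendliest possible write pattern for disks, which is the intuition behind the write-speed claim above.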
Yeah, I had a similar concern, but it comes from assuming that every peer has to be equally powerful (i.e., all weak) and has to run on the end user's computer.
If you performed this type of calculation before with a traditional database, you had to have a computer powerful enough to perform the calculation. In this model, you would still have that computer; it's just now a "peer".
If millions of people want the same piece of data that requires a huge calculation, then you would set up one powerful machine of your own just to do that calculation and write the result to the database, so the many "thin" peers can simply read the result.
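That "compute once, serve many" pattern can be sketched in a few lines. Everything here is illustrative (the `shared_store` dict stands in for the database, and `cruncher_peer`/`thin_peer` are invented names): one beefy peer does the aggregation and persists the answer; the thin peers do a cheap lookup instead of redoing the work.

```python
# Hypothetical sketch: one powerful peer materializes an expensive
# result; many thin peers just read it. The dict stands in for storage.
shared_store = {"raw_values": list(range(1_000_000))}

def cruncher_peer(store):
    # The one beefy machine: does the heavy aggregation, writes it back.
    store["precomputed_sum"] = sum(store["raw_values"])

def thin_peer(store):
    # Millions of these: a single key lookup, no recomputation.
    return store["precomputed_sum"]

cruncher_peer(shared_store)
print(thin_peer(shared_store))  # -> 499999500000
```

It's the same materialized-view idea you'd use with any database; the difference here is only that the "view maintainer" is just another peer rather than logic trapped inside the database server.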
So I need some sort of storage area network of clients.
Okay, so I'm chopping up my database server somewhat, making it easier to scale horizontally. I can live with this, but I wish it were stated explicitly.
The SaaS model they're offering won't work for the sorts of things I'm interested in.