Thus Datomic would be a great fit for centrally operated systems, but less so for highly distributed systems where many peers are frequently partitioned (for example, because they have no Internet connectivity for a few days) and still need to operate within their limited universe.
So if such a highly distributed system were to use Datomic, it would be harder to guarantee that each peer can keep working for both reads and (local) writes while partitioned from the transactor. One would need to program the software to log new facts (writes) locally, and durably, before submitting (syncing) them to the transactor. One might also need to make the query/read cache durable, since there is no network to fetch it back after a reboot of the peer. So it seems there is a missing local middleman/proxy that needs to be implemented to support such scenarios. At least, thanks to Datalog, the local cache could still be queried together with this log, using db.with(log).
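To make the db.with(log) idea concrete: the point is that a query can run against a speculative database value made of the committed facts plus the pending local log. Here is a minimal, language-agnostic sketch in Python; the fact shape and the `with_log`/`query` helpers are made up for illustration and are not Datomic's actual API.

```python
# Toy model: a database value is an immutable set of
# (entity, attribute, value) facts.

def with_log(db_facts, local_log):
    """Speculative db value: committed facts plus pending local writes."""
    return frozenset(db_facts) | frozenset(local_log)

def query(db_facts, attribute):
    """Toy 'query': all (entity, value) pairs for a given attribute."""
    return {(e, v) for (e, a, v) in db_facts if a == attribute}

# Committed facts last synced from the transactor:
base = {(1, ":user/name", "alice")}
# Writes accepted locally while partitioned:
log = [(2, ":user/name", "bob")]

speculative = with_log(base, log)
print(query(base, ":user/name"))         # only alice's fact
print(query(speculative, ":user/name"))  # alice's and bob's facts
```

The same query code serves both the online case (base db) and the offline case (base db plus log), which is what makes the approach attractive.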
What do you think: is this use case reasonably implementable over/with Datomic, without asking it to do something out of its league?
Right. So the only way to make Peers resilient to network partitions is to install a middleman between them and the DB/Transactor: one whose responsibility is to ensure this Peer's app always has durable access to everything it will ever need to read for its queries, and always has durable access to some local write log that doesn't exist in the current implementation.
Thus my question is: would introducing such a middleman into the system denature Datomic?
I don't believe Datomic is designed to operate in a scenario where Peers don't have network connectivity. The local cache Peers keep is to cut down on network traffic and improve performance, not as a reliable "offline mode".
Notwithstanding what it was initially designed for, I think it could be quite good at supporting an "offline mode" as long as:
1. the app developer can confidently predict which queries the app will need through its lifespan, and
2. the app developer is willing to program and configure a layer that can persist and make durable a cache that spans all the data needed to run those queries (thus, persisting locally what amounts to a dynamic shard of the DB), and
3. the app developer is willing to program a layer that can persist and make durable all writes intended for the Transactor, and synchronize those to the Transactor when the app recovers from a network partition, and
3.1. the app developer is willing to plan for potential conflicts in advance, or resolve them when they eventually occur, and is thus willing to sacrifice global consistency in the event of a network partition in order to obtain availability, and
4. the app developer is willing to plug into the query engine in such a way that queries will include the local write log when there's a network partition.
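Requirement 3 above amounts to a durable local write log that survives reboots. A minimal sketch of such a layer, in Python: the class name and the JSON-lines format are assumptions for illustration; a real peer would persist Datomic tx-data in whatever durable store it already uses.

```python
import json, os

class DurableWriteLog:
    """Append-only, crash-safe log of writes awaiting the Transactor."""

    def __init__(self, path):
        self.path = path

    def append(self, tx):
        # Append one pending transaction as a JSON line and fsync so it
        # survives a crash or reboot before it reaches the transactor.
        with open(self.path, "a") as f:
            f.write(json.dumps(tx) + "\n")
            f.flush()
            os.fsync(f.fileno())

    def pending(self):
        # All writes logged locally and not yet acknowledged centrally.
        if not os.path.exists(self.path):
            return []
        with open(self.path) as f:
            return [json.loads(line) for line in f if line.strip()]

    def clear(self):
        # Called once the Transactor has acknowledged every pending write.
        if os.path.exists(self.path):
            os.remove(self.path)
```

The fsync on every append is the durability guarantee; whether that cost is acceptable per-write or should be batched is an app-level trade-off.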
Now, solution-wise:
1. depends on the requirements, but most small-to-medium apps can predict the queries they'll need;
2. seems quite easy for small-to-medium apps:
2.1 run all possible queries at regular intervals, and
2.2 use a durable key-value store to keep the db values;
3. (1) make sure you're subscribed to partition and recovery events; (2) coordinate writes over the same key-value store, probably using Clojure's STM and/or Avout; (3) on network recovery, replay those writes not yet present in the central DB;
3.1 given the immutable nature of the data and the total ordering of DB transactions, I expect no issues with eventual consistency when write logs are replayed centrally after a local Peer recovers from a network partition;
4. considering how Datalog works and is integrated into the Peer, this seems like a piece of cake.
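The replay step in 3 above can be sketched as a small loop: submit every pending write the central DB hasn't seen yet, in original order. This is a toy model, not Datomic code; `submit` stands in for a real transact call to the Transactor, and the `"id"` field is an assumed client-side transaction identifier used for deduplication.

```python
def replay(pending, already_committed_ids, submit):
    """Replay locally logged writes the central DB lacks, in order."""
    replayed = []
    for tx in pending:
        if tx["id"] in already_committed_ids:
            continue  # the Transactor already saw this write pre-partition
        submit(tx)
        replayed.append(tx["id"])
    return replayed

central = []                  # stands in for the central DB's intake
committed_ids = {"tx-1"}      # tx-1 got through before the partition
pending = [{"id": "tx-1"}, {"id": "tx-2"}, {"id": "tx-3"}]

done = replay(pending, committed_ids, central.append)
print(done)  # ['tx-2', 'tx-3']
```

The deduplication check is what makes replay idempotent, so a crash mid-replay can simply be retried from the top of the log.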
So isn't this quite feasible to support the highly distributed case for apps in which each local Peer represents its own logical, dynamic and relatively natural and autonomous shard of the database?
Seems to be true, but the interesting part is that peers can be run in parallel, and if one datacenter explodes you can fail over to another without losing information. The only "Single Point of Failure" is the transactor, and only for writes.