Wednesday, July 22, 2020

A First Replicating Type

Drew McCormack:

You may be wondering why the Entry type includes a UUID identifier. It already has a timestamp, which is an identity of a sort. Isn’t that timestamp unique enough?

Maybe, but you will sleep better at night if you assume it is not unique enough. A timestamp has something like millisecond accuracy. A computing device can do thousands, even millions of operations in that time. Two changes to the same value on the same device may very well fall on exactly the same tick of the timestamp clock.

What would happen if we used the timestamp in isolation? If two changes collided — had the same timestamp — the ‘winner’ would effectively be random. Your devices could easily pick different outcomes, and your type will have diverged — your app is no longer in sync. To avoid this, we need some way to pick the same winner on all devices, even if the timestamps are exactly the same. For that, we add the UUID. It ensures a deterministic result in cases where the timestamps collide.
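To make the tie-break concrete, here is a minimal sketch in Swift. It is not the article's actual code: the names `Entry`, `isOrdered(after:)`, and `merged` are assumptions chosen for illustration, and comparing the UUIDs by their string form is just one possible deterministic rule. The point is only that the timestamp decides first and the UUID decides when the timestamps are equal, so every device reaches the same answer.

```swift
import Foundation

// Hypothetical names throughout; a sketch under the assumptions stated above.
struct Entry<Value> {
    var value: Value
    var timestamp: TimeInterval
    var id: UUID

    // Compare timestamps first; the UUID string only matters on a timestamp tie.
    func isOrdered(after other: Entry) -> Bool {
        (timestamp, id.uuidString) > (other.timestamp, other.id.uuidString)
    }
}

// Picks the same winner no matter which argument comes first.
func merged<Value>(_ a: Entry<Value>, _ b: Entry<Value>) -> Entry<Value> {
    a.isOrdered(after: b) ? a : b
}

// Two changes that collide on exactly the same timestamp.
let tick: TimeInterval = 1_000
let phone = Entry(value: "from phone", timestamp: tick,
                  id: UUID(uuidString: "00000000-0000-0000-0000-000000000001")!)
let laptop = Entry(value: "from laptop", timestamp: tick,
                   id: UUID(uuidString: "00000000-0000-0000-0000-000000000002")!)

let winnerOnPhone = merged(phone, laptop)
let winnerOnLaptop = merged(laptop, phone)
```

Both `winnerOnPhone` and `winnerOnLaptop` hold the same entry, because the comparison does not depend on the order in which a device happens to see the two changes.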

4 Comments

A primary key can be improperly defined as too narrow - the case that’s described here with the timestamp as the sole primary key. It can also be defined as too wide. In that case two or more unique identifiers point to the same piece of data. For example, if you identify a car by license plate number and colour, the car with “Nevada 123 ABC” and blue colour points to a different record in the database from “Nevada 123 ABC” black colour. Same license, different car.

My hunch is that a UUID on its own is unique enough and adding a timestamp to it might lead to the same object with the same UUID and a different timestamp occurring in multiple records in the database. But this is just a hunch. :-)

@Arjan I don’t think the timestamp and UUID are being used as a primary key in this example.

@arjan the UUID is to ensure uniqueness of a value, and the timestamp is to provide ordering (though it needs to be combined with the UUID for the ordering to be absolute, in case two values are created at the exact same time). Of course, if two entries have the same UUID but a different timestamp or value, all bets are off, but that's a programming error, not a logical error in the algorithm.

Interesting to see CRDTs mentioned on a more “mainstream” blog! Maybe they’re finally having their moment. As the article points out, the basics of CRDTs are pretty intuitive, but things get really interesting when you start working with sets, dictionaries, and (especially) sequences. I went on a several-months-long R&D journey into the subject matter a few years ago and wrote at length about it; perhaps you would be interested: http://archagon.net/blog/2018/03/24/data-laced-with-history/
