Wednesday, May 21, 2014

Findings 1.0 and PARStore

Findings from Charles Parnot et al. looks like a neat app:

When doing science and running experiments, it is crucial to keep track of what one is doing, to be able to later reproduce the results, assemble and publish them. This is what lab notebooks are for. There is something great about paper, and the freedom and flexibility it affords. But in 2014, paper starts to show its limits in other areas where computers have taken over: storing results, analysing data, searching, replicating, sharing, preserving, and more.

Findings’ ambition is simple: make your computer a better tool than paper to run your experiments and keep your lab records.

Findings stores experiments, which can be organized into projects, and protocols, which cut across experiments:

Protocols are the primary building blocks of your experiments. You can drag them into the calendar view of an experiment and combine them in any way you need. Once integrated into an experiment, a copy of your protocol is made, so you can modify it just for that one use, and leave the original untouched.

It’s currently Mac-only, but they intend to add support for iOS and syncing. The storage layer, the open-source PARStore, is designed for cloud-agnostic syncing, e.g. Dropbox or iCloud document storage.

PARStore has an interesting design:

  1. It’s a key-value store.
  2. There’s one copy of the store for each device, and each device has all of the copies. The device opens its own copy read-write and the other copies read-only.
  3. Each copy is a log of timestamped changes, i.e. “Set key to value.” It only ever adds to the log, so there can be many entries for a given key.
  4. For each key, the entry with the latest timestamp is the truth. Presumably you could rewind to an earlier version of the document by searching for timestamps before a given date.
  5. Periodically, each device reads the other devices’ logs and incorporates any changes with more recent timestamps into its own log.
  6. Each log is implemented as a Core Data SQLite database.
  7. There can also be attached files (blobs), which are stored outside of the database.
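
To make the merge rule concrete, here is a minimal in-memory sketch in Python. It is not PARStore's code (which is Objective-C on top of Core Data); the names `Change`, `DeviceLog`, `latest`, and `sync` are made up for illustration, and it assumes timestamps from different devices are directly comparable.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List, Optional


@dataclass(frozen=True)
class Change:
    """One log entry: "set key to value" at a given time."""
    timestamp: float
    key: str
    value: Any


@dataclass
class DeviceLog:
    """Append-only log owned by a single device (hypothetical type)."""
    device_id: str
    changes: List[Change] = field(default_factory=list)

    def set_value(self, key: str, value: Any, timestamp: float) -> None:
        # Entries are only ever appended, never updated or deleted.
        self.changes.append(Change(timestamp, key, value))


def latest(logs: List[DeviceLog], key: str,
           before: Optional[float] = None) -> Any:
    """Return the value with the newest timestamp for `key` across all logs.

    Passing `before` ignores newer entries, which is the "rewind" idea.
    """
    candidates = [
        c for log in logs for c in log.changes
        if c.key == key and (before is None or c.timestamp <= before)
    ]
    if not candidates:
        return None
    return max(candidates, key=lambda c: c.timestamp).value


def sync(own: DeviceLog, others: List[DeviceLog]) -> None:
    """Append to `own` any foreign change newer than what it has for that key."""
    newest: Dict[str, float] = {}
    for c in own.changes:
        newest[c.key] = max(c.timestamp, newest.get(c.key, float("-inf")))
    for log in others:  # the other logs are only read, never written
        for c in log.changes:
            if c.timestamp > newest.get(c.key, float("-inf")):
                own.changes.append(c)
                newest[c.key] = c.timestamp
```

For example, if a Mac sets `title` at time 1.0 and a phone sets it at time 2.0, `latest` over both logs returns the phone's value, `latest(..., before=1.5)` returns the Mac's, and `sync(mac, [phone])` copies the phone's entry into the Mac's log without touching the phone's.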

This seems like an elegant solution for synchronizing modest amounts of data, provided that it’s suited to key-value rather than row-column storage. The crux of it is that, normally, multiple devices cannot simultaneously open the same database file because SQLite’s locks don’t work across Dropbox. Or, rather, you can do it, but you’ll probably corrupt the database. PARStore gets around this by allowing multiple readers per file but only one writer. I’m not convinced that this will work 100% of the time, though:

  1. There doesn’t seem to be a mechanism to prevent Core Data from opening a database that Dropbox is in the process of writing to (e.g. updating it for changes made on other devices). It’s only opening it as read-only, so this shouldn’t corrupt the file, but it’s probably undefined what happens when it tries to read from the file.
  2. Even if the database file is fully written, there’s no telling whether the adjacent -wal and -shm files match it to form a consistent whole.
  3. It’s not entirely clear to me how SQLite handles read-only databases in WAL mode. The documentation implies that write access is needed. If it’s writing to the -shm file even in read-only mode, that might cause problems for the device that’s opened the database read-write.
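
One way to poke at the third point is to open a WAL-mode database read-only and see which sidecar files are present while it is open. The sketch below uses Python's `sqlite3` module rather than Core Data, and the `log` table and function names are hypothetical. It only reports what it finds; the exact behavior depends on the SQLite version and on whether the directory is writable.

```python
import os
import sqlite3
from pathlib import Path


def make_wal_database(path: str) -> None:
    """Create a small database that uses write-ahead logging."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE log (ts REAL, key TEXT, value TEXT)")
    conn.execute("INSERT INTO log VALUES (1.0, 'title', 'Draft')")
    conn.commit()
    conn.close()


def inspect_read_only(path: str):
    """Open `path` read-only and report row count, journal mode, and sidecars."""
    # mode=ro asks SQLite not to modify the database file itself.
    uri = Path(path).resolve().as_uri() + "?mode=ro"
    conn = sqlite3.connect(uri, uri=True)
    try:
        rows = conn.execute("SELECT count(*) FROM log").fetchone()[0]
        mode = conn.execute("PRAGMA journal_mode").fetchone()[0]
        # Check for -wal/-shm while the read-only connection is still open.
        sidecars = [s for s in ("-wal", "-shm") if os.path.exists(path + s)]
    finally:
        conn.close()
    return rows, mode, sidecars
```

If `sidecars` is non-empty for a connection that only ever read, that is the kind of write activity that could collide with the device that owns the file.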

That said, with a good Internet connection and relatively small files, I doubt that there would be many problems in practice.

Update (2014-05-30): Charles Parnot:

Findings has only been out for 8 days, and I am really proud of the launch, impressed by the response and excited about all the work that’s ahead. But before marching into the future, I thought I should look back into the past. While the core functionality of the app has remained the same, it is quite amazing to see how much of the look and the design of the app has changed over the years… I am a big fan of ‘making of’ posts on apps. I wish there were more of these, so here is one for Findings!

1 Comment

Excellent point on the -wal mode. I missed that as I initially developed under 10.8, and the new default mode was introduced in 10.9. I need to change things and switch to the simpler journaling mode.

There will still be the issue of the 'shm' journal file.

The 'read only' discussion was also something I overlooked. Read-only from SQLite's perspective means the client won't modify the database, but in some circumstances, it may still need to **write** to disk:

- In WAL mode, a read-only client will need write access to the filesystem even for normal read operations
- In non-WAL mode, write access will only be needed for recovery from an inconsistent state, so the journal can be adjusted/deleted if not committed to the main db

I think I'll play with different journaling modes for the read-only part of PARStore, to make things as robust as possible. In the worst case, a database that needs recovering can be fully ignored, and the PARStore should gracefully handle that. In the best case, I may be able to fully work around the issue (maybe setting the journal mode to 'OFF' for the read-only mode?).

In any case, thanks for looking at PARStore in such detail, and pointing out the WAL issue!
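
The "worst case" fallback from the comment, where a foreign database that can't be read cleanly is simply ignored until the next pass, might look roughly like this. Again this is a Python sketch rather than PARStore's actual code, with a hypothetical `log` table and function names. It assumes the writer uses SQLite's rollback journal (`journal_mode=DELETE`), and it treats any leftover `-journal` file as a reason to skip, which is a deliberately conservative heuristic of mine.

```python
import os
import sqlite3
from pathlib import Path


def write_log(path: str, changes) -> None:
    """Writer side: append changes using the rollback journal, not WAL."""
    conn = sqlite3.connect(path)
    conn.execute("PRAGMA journal_mode=DELETE")
    conn.execute("CREATE TABLE IF NOT EXISTS log (ts REAL, key TEXT, value TEXT)")
    conn.executemany("INSERT INTO log VALUES (?, ?, ?)", changes)
    conn.commit()
    conn.close()


def read_foreign_log(path: str):
    """Reader side: return all rows, or None if the file should be skipped."""
    # A leftover journal may mean an unfinished transaction or a file that
    # is only partially synced; skip it rather than attempt recovery.
    if os.path.exists(path + "-journal"):
        return None
    try:
        uri = Path(path).resolve().as_uri() + "?mode=ro"
        conn = sqlite3.connect(uri, uri=True)
        try:
            return conn.execute(
                "SELECT ts, key, value FROM log ORDER BY ts"
            ).fetchall()
        finally:
            conn.close()
    except sqlite3.Error:
        # Missing, truncated, or otherwise unreadable: ignore for now.
        return None
```

Because each log is append-only, skipping a copy for one pass loses nothing; its entries get picked up the next time the file reads cleanly.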
