Archive for May 21, 2014

Wednesday, May 21, 2014

Findings 1.0 and PARStore

Findings from Charles Parnot et al. looks like a neat app:

When doing science and running experiments, it is crucial to keep track of what one is doing, to be able to later reproduce the results, assemble and publish them. This is what lab notebooks are for. There is something great about paper, and the freedom and flexibility it affords. But in 2014, paper starts to show its limits in other areas where computers have taken over: storing results, analysing data, searching, replicating, sharing, preserving, and more.

Findings’ ambition is simple: make your computer a better tool than paper to run your experiments and keep your lab records.

Findings stores experiments, which can be organized into projects, and protocols, which cut across experiments:

Protocols are the primary building blocks of your experiments. You can drag them into the calendar view of an experiment and combine them in any way you need. Once integrated into an experiment, a copy of your protocol is made, so you can modify it just for that one use, and leave the original untouched.

It’s currently Mac-only, but they intend to add support for iOS and syncing. The storage layer, the open-source PARStore, is designed for cloud-agnostic syncing, e.g. Dropbox or iCloud document storage.

PARStore has an interesting design:

  1. It’s a key-value store.
  2. There’s one copy of the store for each device, and each device has all of the copies. The device opens its own copy read-write and the other copies read-only.
  3. Each copy is a log of timestamped changes, i.e. “Set key to value.” It only ever adds to the log, so there can be many entries for a given key.
  4. For each key, the entry with the latest timestamp is the truth. Presumably you could rewind to an earlier version of the document by searching for timestamps before a given date.
  5. Periodically, each device reads the other devices’ logs and incorporates any changes with more recent timestamps into its own log.
  6. Each log is implemented as a Core Data SQLite database.
  7. There can also be attached files (blobs), which are stored outside of the database.
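
To make the design concrete, here’s a minimal Python sketch of the log-merging idea as I understand it from the list above. It is not PARStore’s code: the names (Change, DeviceLog, sync, latest) are my own inventions, the logs are plain in-memory lists rather than Core Data stores, and it ignores blobs and timestamp ties.

```python
from dataclasses import dataclass, field


@dataclass(frozen=True)
class Change:
    """One log entry: "set key to value" at a given time (hypothetical type)."""
    timestamp: float
    key: str
    value: object


@dataclass
class DeviceLog:
    """Append-only log owned by one device (hypothetical type)."""
    device: str
    changes: list = field(default_factory=list)

    def set_value(self, key, value, timestamp):
        # Entries are only ever added, never edited or removed.
        self.changes.append(Change(timestamp, key, value))


def latest(log, key, before=None):
    """Return the newest entry for key, optionally only those before a time."""
    candidates = [
        c for c in log.changes
        if c.key == key and (before is None or c.timestamp < before)
    ]
    return max(candidates, key=lambda c: c.timestamp, default=None)


def sync(own, others):
    """Read the other devices' logs and append any newer entries to our own."""
    for other in others:
        for key in sorted({c.key for c in other.changes}):
            theirs = latest(other, key)
            mine = latest(own, key)
            if mine is None or theirs.timestamp > mine.timestamp:
                own.changes.append(theirs)  # only our own log is written


def current_value(log, key):
    """The entry with the latest timestamp is the truth."""
    entry = latest(log, key)
    return entry.value if entry is not None else None
```

Each device would call sync() with the read-only copies of the other logs, then read values with current_value(). Passing before= to latest() is the “rewind” from item 4.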

This seems like an elegant solution for synchronizing modest amounts of data, provided that it’s suited to key-value rather than row-column storage. The crux of it is that, normally, multiple devices cannot simultaneously open the same database file because SQLite’s locks don’t work across Dropbox. Or, rather, you can do it, but you’ll probably corrupt the database. PARStore gets around this by allowing multiple readers per file but only one writer. I’m not convinced that this will work 100% of the time, though:

  1. There doesn’t seem to be a mechanism to prevent Core Data from opening a database that Dropbox is in the process of writing to (e.g. updating it for changes made on other devices). It’s only opening it as read-only, so this shouldn’t corrupt the file, but it’s probably undefined what happens when it tries to read from the file.
  2. Even if the database file is fully written, there’s no telling whether the adjacent -wal and -shm files match it to form a consistent whole.
  3. It’s not entirely clear to me how SQLite handles read-only databases in WAL mode. The documentation implies that write access is needed. If it’s writing to the -shm file even in read-only mode, that might cause problems for the device that’s opened the database read-write.

That said, with a good Internet connection and relatively small files, I doubt that there would be many problems in practice.

Update (2014-05-30): Charles Parnot:

Findings has only been out for 8 days, and I am really proud of the launch, impressed by the response and excited about all the work that’s ahead. But before marching into the future, I thought I should look back into the past. While the core functionality of the app has remained the same, it is quite amazing to see how much of the look and the design of the app has changed over the years… I am a big fan of ‘making of’ posts on apps. I wish there were more of these, so here is one for Findings!

Problems With Core Data Migration Manager and journal_mode WAL

Pablo Bendersky:

When you use a Migration Manager, Core Data will create a new database for you, and start copying the entities one by one from the old DB to the new one.

As we are using journal_mode = WAL, there’s an additional file besides DB.sqlite called DB.sqlite-wal.

From what I can tell, the problem seems to be that Core Data creates a temporary DB, inserts everything there, and when it renames it to the original name, the -wal file is kept as a leftover from the old version. The problem is that you end up with an inconsistent DB.

A different part of Core Data is aware of the multiple files, though:

To safely back up and restore a Core Data SQLite store, you can do the following:

  • Use the following method of NSPersistentStoreCoordinator class, rather than file system APIs, to back up and restore the Core Data store:

    - (NSPersistentStore *)migratePersistentStore:(NSPersistentStore *)store toURL:(NSURL *)URL options:(NSDictionary *)options withType:(NSString *)storeType error:(NSError **)error

    Note that this is the option we recommend.

  • Change to rollback journaling mode when adding the store to a persistent store coordinator if you have to copy the store file.
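
The second option can be illustrated at the plain SQLite level. This is a sketch using Python’s sqlite3 module, not Core Data. The file names and the demo() helper are made up for the example. It shows that the -wal file exists while the database is in WAL mode, and that switching to rollback journaling (journal_mode=DELETE) folds its contents back into the main file, so a copy of that single file is complete.

```python
import os
import shutil
import sqlite3
import tempfile


def pragma(conn, statement):
    """Run a PRAGMA and return its single result value."""
    return conn.execute(statement).fetchall()[0][0]


def demo(workdir):
    db_path = os.path.join(workdir, "DB.sqlite")
    wal_path = db_path + "-wal"

    # isolation_level=None means autocommit, so no implicit transaction
    # is open when the journal mode is changed.
    conn = sqlite3.connect(db_path, isolation_level=None)
    mode = pragma(conn, "PRAGMA journal_mode=WAL")
    conn.execute("CREATE TABLE entry (key TEXT, value TEXT)")
    conn.execute("INSERT INTO entry VALUES ('title', 'Draft')")
    wal_in_wal_mode = os.path.exists(wal_path)

    # Leaving WAL mode checkpoints the log into DB.sqlite and removes it.
    new_mode = pragma(conn, "PRAGMA journal_mode=DELETE")
    wal_after_switch = os.path.exists(wal_path)

    # Now the main file alone holds everything, so a file copy is enough.
    copy_path = os.path.join(workdir, "Backup.sqlite")
    shutil.copyfile(db_path, copy_path)
    conn.close()

    backup = sqlite3.connect(copy_path)
    rows = backup.execute("SELECT key, value FROM entry").fetchall()
    backup.close()
    return mode, wal_in_wal_mode, new_mode, wal_after_switch, rows


if __name__ == "__main__":
    with tempfile.TemporaryDirectory() as workdir:
        print(demo(workdir))
```

In Core Data, the journal mode is set through the store options rather than by running a PRAGMA directly, so treat this only as a picture of what happens to the files.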

Why objc_autoreleaseReturnValue Differs for x86_64 and ARM

duhanebel:

The implementation for x86_64 on NSObject.mm is quite straightforward. The code analyses the assembler located after the return address of objc_autoreleaseReturnValue for the presence of a call to objc_retainAutoreleasedReturnValue.

But for ARM:

It looks like the code is identifying the presence of objc_retainAutoreleasedReturnValue not by looking up the presence of a call to that specific function, but by looking instead for a special no-op operation mov r7, r7.

Bill Bumgarner:

ARM’s addressing modes don’t really allow for direct addressing across the full address space. The instructions used to do addressing -- loads, stores, etc… -- don’t support direct access to the full address space as they are limited in bit width.

Greg Parker:

A resolved dyld stub is simple on Intel: it’s just a branch to a branch. On ARM the instruction sequences for the branch to the stub and the branch from the stub can take many different forms depending on how long the branches are. Checking for each combination would be slow.

Why I Prefer Nisus Writer

Joe Kissell:

Things turned around in 2011 with the release of Nisus Writer Pro 2.0. This was the first version of Nisus Writer to include both change-tracking and comments, plus most of my favorite features from Nisus Writer Classic and a bunch of new capabilities. All of a sudden I had my old toolkit back, in a modern package. It was as though I’d been limited to a machete and an open fire for all my cooking needs, and then walked into a fully equipped restaurant kitchen. In the years since, it has grown even more capable and reliable.

Eventually Take Control Books switched its entire operation over to Nisus Writer Pro, and I’ve already used it to write half a dozen books, plus new editions of several older titles. As an author, I can’t overstate how much Nisus Writer Pro improves not only my productivity but also my attitude toward writing. It’s fun again, and I no longer feel as though I must constantly fight with my word processor.

I really like Nisus Writer, but these days almost all of my writing is in reStructuredText (to be processed for product documentation), direct HTML (for online), plain text notes (for syncing with my iPhone), or in Word or Google Docs (for collaboration). Nisus Writer just doesn’t seem to fit in, though I still find it invaluable for the occasional project where I need to process lots of styled text.

eBay Security Breach

USA Today:

Online marketplace eBay says it will urge users to change their passwords following a “cyberattack” impacting a database with encrypted passwords and non-financial data.

The database includes information such as customers’ names, encrypted passwords, email and physical addresses, phone numbers and dates of birth.

[…]

EBay also was using a more easily-cracked method for protecting the passwords it kept on file. There are two commonly used ways to secure passwords, encryption and hashing. EBay was using encryption, which is the more easily broken, said Coates.

“Encryption allows eBay, or anyone who accesses the decryption key, to decrypt and see your actual password. Password hashing allows eBay to check if the password you enter is correct or not, but doesn’t allow eBay (or hackers) to get the plaintext of your actual password,” he said.
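
As a rough illustration of the hashing approach described in the quote, here’s a sketch using Python’s standard library. It says nothing about what eBay actually runs. PBKDF2 with a random salt is just one common choice, and the function names and iteration count are assumptions for the example.

```python
import hashlib
import hmac
import os

ITERATIONS = 200_000  # assumed work factor, chosen only for this example


def hash_password(password, salt=None):
    """Return (salt, digest); only these are stored, never the password."""
    if salt is None:
        salt = os.urandom(16)
    digest = hashlib.pbkdf2_hmac(
        "sha256", password.encode("utf-8"), salt, ITERATIONS
    )
    return salt, digest


def verify_password(password, salt, expected_digest):
    """Re-hash the attempt with the stored salt and compare the digests."""
    _, digest = hash_password(password, salt)
    return hmac.compare_digest(digest, expected_digest)
```

The site can check a login by calling verify_password() with the stored salt and digest, but there is no key that turns the digest back into the password.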

The Verge:

In addition to passwords, the database contained basic login information like name, email, phone number, address and date of birth, but officials stressed that, aside from the passwords, no confidential or personal information was included in the breach.

That’s an odd way of putting it, since those pieces of data are exactly what show up on the “Personal Information” page of my eBay account.

Update (2014-05-25): eBay:

All eBay users are being asked to change their password. All eBay users will be notified. At the end of Q1, we had 145 million active buyers.

The Daily Beast:

The online auction site eBay has admitted that the name, address, date of birth, telephone number, email address and encrypted password of every eBay account holder worldwide – 233 million people – have been obtained by hackers, in one of the world’s largest ever online security breaches.

Update (2014-05-26): I finally received an e-mail from eBay recommending that I reset my password.

Making dispatch_once() Fast

I had assumed that dispatch_once() was implemented as a basic atomic compare-and-swap, but the source for dispatch_once_f contains an interesting comment:

Normally, a barrier on the read side is used to workaround the weakly ordered memory model. But barriers are expensive and we only need to synchronize once! After func(ctxt) completes, the predicate will be marked as “done” and the branch predictor will correctly skip the call to dispatch_once*().

A far faster alternative solution: Defeat the speculative read-ahead of peer CPUs.

Modern architectures will throw away speculative results once a branch mis-prediction occurs. Therefore, if we can ensure that the predicate is not marked as being complete until long after the last store by func(ctxt), then we have defeated the read-ahead of peer CPUs.

In other words, the last “store” by func(ctxt) must complete and then N cycles must elapse before ~0l is stored to *val. The value of N is whatever is sufficient to defeat the read-ahead mechanism of peer CPUs.

On some CPUs, the most fully synchronizing instruction might need to be issued.

N is determined by dispatch_atomic_maximally_synchronizing_barrier(), which has different assembly language implementations for different architectures.
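
For orientation, here’s the general shape of a run-once primitive as a Python sketch. It is not how libdispatch implements dispatch_once(), and Python gives no control over memory barriers, so it only shows the control flow: a cheap check of the predicate on the fast path, and a predicate that is marked “done” only after the function has completed. The Once class is a made-up name.

```python
import threading


class Once:
    """Run a function at most once across threads (hypothetical helper)."""

    def __init__(self):
        self._done = False
        self._lock = threading.Lock()

    def call(self, func):
        # Fast path: after the first run, every caller returns here.
        if self._done:
            return
        # Slow path: only contended around the very first call.
        with self._lock:
            if not self._done:
                func()
                # Mark the predicate only after func has completed.
                self._done = True
```

The trick in the quoted comment is about making the fast-path check safe on weakly ordered CPUs without paying for a read barrier on every call, which this sketch cannot express.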

Update (2014-05-28): Greg Parker explains a consequence of this optimization:

dispatch_once_t must not be an instance variable.

The implementation of dispatch_once() requires that the dispatch_once_t is zero, and has never been non-zero. The previously-not-zero case would need additional memory barriers to work correctly, but dispatch_once() omits those barriers for performance reasons.

Instance variables are initialized to zero, but their memory may have previously stored another value. This makes them unsafe for dispatch_once() use.

Update (2014-06-06): Mike Ash:

While the comment in the dispatch_once source code is fascinating and informative, it doesn’t quite delve into the detail that some would like to see. Since this is one of my favorite hacks, for today’s article I’m going to discuss exactly what’s going on there and how it all works.