Monday, May 12, 2014

How to Efficiently Read Thousands of Small Files With GCD

Kenny Carruthers:

I’ve considered manually throttling the requests with an NSOperationQueue and a low value for the maxConcurrentOperationCount but that just seems wrong as the newer MacPros can clearly handle a ton more I/O compared to an older, non-SSD, MacBook.

Bill Bumgarner:

The reality is that a modern, multi-tasking, multi-user system that runs across many configurations of hardware, automatically throttling an I/O bound task is nigh impossible for the system to do.

You’re going to have to do the throttling yourself. This could be done with NSOperationQueue, with a semaphore, or with any of a number of other mechanisms.

Normally, I’d suggest you try to separate the I/O from any computation so you can serialize I/O (which is going to be the most generally reasonable performance across all systems), but that is pretty much impossible when using high level APIs. In fact, it isn’t clear how the CG* I/O APIs might interact with the dispatch_io_* advisory APIs.

I like GCD, but you have to take its promise with a huge grain of salt:

In the past, introducing concurrency to an application required the creation of one or more additional threads. Unfortunately, writing threaded code is challenging. Threads are a low-level tool that must be managed manually. Given that the optimal number of threads for an application can change dynamically based on the current system load and the underlying hardware, implementing a correct threading solution becomes extremely difficult, if not impossible to achieve.

[…]

Rather than creating threads directly, applications need only define specific tasks and then let the system perform them. By letting the system manage the threads, applications gain a level of scalability not possible with raw threads.

You still have to manage things manually unless you’re not doing I/O. It’s just that you’re dealing with queues rather than threads. And we don’t yet have good tools for adapting code to systems with different I/O capabilities.

1 Comment RSS · Twitter

[…] Is this because there are more background processes that are accessing the disk? If so, is there anything that can be done about that? Within my apps, I try to limit concurrent operations on the same physical disk. You might think GCD would do something like this system-wide, but it doesn’t. […]

Leave a Comment