Archive for July 20, 2015

Monday, July 20, 2015

Java Strings No Longer Share Storage

Heinz M. Kabutz (via Hacker News):

From Java 1.0 up to 1.6, String tried to avoid creating new char[]’s. The substring() method would share the same underlying char[], with a different offset and length. For example, in StringChars we have two Strings, with "hello" a substring of "hello_world". However, they share the same char[].

This is no longer the case with Java 7:

“Why this change?”, you may ask. It turns out that too many programmers used substring() as a memory saving method. Let’s say that you have a 1 MB String, but you actually only need the first 5 KB. You could then create a substring, expecting the rest of that 1 MB String to be thrown away. Except it didn’t. Since the new String would share the same underlying char[], you would not save any memory at all. The correct code idiom was therefore to append the substring to an empty String, which would have the side effect of always producing a new unshared char[] in the case that the String length did not correspond to the char[] length.
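
A minimal sketch of the copy idiom described above. The class and method names (SubstringCopy, firstChars) are hypothetical and not from the article; it uses the `new String(String)` copy constructor rather than concatenation with an empty String. On Java 6 and earlier that constructor trimmed the backing array to the substring's length. On Java 7u6 and later, substring() already copies, so the wrapper is redundant but harmless.

```java
class SubstringCopy {
    // Returns the first n chars of big as a String that does not keep
    // big's backing array alive (hypothetical helper, for illustration).
    static String firstChars(String big, int n) {
        // Java 6 and earlier: substring() shared big's char[], and
        // new String(...) copied only the characters actually needed.
        return new String(big.substring(0, n));
    }

    public static void main(String[] args) {
        StringBuilder sb = new StringBuilder();
        for (int i = 0; i < 1000000; i++) {
            sb.append('x');
        }
        String big = sb.toString();          // roughly the "1 MB String" case
        String small = firstChars(big, 5000);
        System.out.println(small.length());  // prints 5000
    }
}
```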

Apple’s GCD actually employs the opposite optimization: sometimes a larger data object shares the storage of two smaller ones, unless you specifically ask for contiguous storage.


So, it’s a trade-off between the original and new behaviour; the original way more or less caps the memory usage at the size of the original string, but at the expense of not being able to GC it if even a single substring exists, while the new way increases memory usage for each substring generated but does not prevent any of the strings from being GC’d.

This is why the article’s code example yields such a huge difference in memory usage in Java 6 vs Java 7; it is effectively a sort of "anti-pattern" when used against the new substring() method. (i.e. iterating through a large string and generating lots of sub-strings).
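
The pattern the commenter describes might look roughly like the sketch below. This is not the article's benchmark; the class and method names are made up for illustration. It walks a string and takes every suffix. With the old shared-array substring() each call allocated only a small String header, whereas with the copying substring() each call copies its characters, so the total copied grows quadratically with the input length.

```java
class SuffixLoop {
    // Takes every suffix of s and returns the total number of characters
    // in those suffixes. With a copying substring() this is also the
    // number of characters copied by the loop.
    static long suffixChars(String s) {
        long total = 0;
        for (int i = 0; i < s.length(); i++) {
            total += s.substring(i).length();
        }
        return total;
    }

    public static void main(String[] args) {
        // 5 + 4 + 3 + 2 + 1
        System.out.println(suffixChars("hello")); // prints 15
    }
}
```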


The thing I mind is that there is now no way to get the old behavior. String is a final class, so you cannot override it and add a field, even. You can roll your own - if there is no code you do not control that takes a string. (And if you don’t mind having to write your own string class!)
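
One way to "roll your own", sketched under the assumption that the consuming code accepts a CharSequence rather than a String. The Slice class is hypothetical. It keeps a reference to the source String plus an offset and length, so taking a sub-slice copies no characters, but it also keeps the whole source reachable, just like the old substring().

```java
final class Slice implements CharSequence {
    private final String source; // shared, never copied
    private final int offset;
    private final int length;

    Slice(String source, int offset, int length) {
        if (offset < 0 || length < 0 || offset > source.length() - length) {
            throw new IndexOutOfBoundsException();
        }
        this.source = source;
        this.offset = offset;
        this.length = length;
    }

    @Override
    public int length() {
        return length;
    }

    @Override
    public char charAt(int index) {
        if (index < 0 || index >= length) {
            throw new IndexOutOfBoundsException();
        }
        return source.charAt(offset + index);
    }

    @Override
    public CharSequence subSequence(int start, int end) {
        if (start < 0 || end > length || start > end) {
            throw new IndexOutOfBoundsException();
        }
        // A new view over the same source; no characters are copied.
        return new Slice(source, offset + start, end - start);
    }

    @Override
    public String toString() {
        // Only here are characters materialised into a real String.
        return source.substring(offset, offset + length);
    }
}
```

As the commenter notes, this only helps when no code outside your control insists on a String; calling toString() to satisfy such code brings the copy back.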


I would not have objected to releasing this change in Java 1.7. But releasing it in a BUG FIX RELEASE causes me to lose confidence in the maintainers of the JVM. One of the reasons that large companies like mine build in Java is because of Sun’s long history of extremely careful attention to backward compatibility. Oracle is no Sun.


I agree - echoing the sentiments of another commentator here, I feel like one of the tenets of Java is backwards compatibility. While the change doesn’t affect functionality, it can turn code that previously had a space complexity of O(1) into one that is O(n). This is probably a Bad Thing.


I’m the author of the substring() change though in total disclosure the work and analysis on this began long before I took on the task. As has been suggested in the analysis here there were two motivations for the change;

  • reduce the size of String instances. Strings are typically 20-40% of common apps’ footprint. Any change which increases the size of String instances would dramatically increase memory pressure. This change to String came in at the same time as the alternative String hash code and we needed another field to cache the additional hash code. The offset/count removal afforded us the space we needed for the added hash code cache. This was the trigger.
  • avoid memory leakage caused by retained substrings holding the entire character array. This was a longstanding problem with many apps and was quite significant in many cases. Over the years many libraries and parsers have specifically avoided returning substring results to avoid creating leaked Strings.


The comments about the substring operation becoming O(n) assume that the substring result is allocated in the general heap. This is not commonly the case and allocation in the TLAB is very much like malloca()--allocation merely bumps a pointer.


We investigated the regressions to see if performance was still acceptable and correctness was maintained. The most significant performance drop turned out to be in an obsolete benchmark which did hundreds of random substrings on a 1MB string and put the substrings into a map. It then later compared the map contents to verify correctness. We concluded that this case was not representative of common usage. Most other applications saw positive footprint and performance improvements or no significant change at all. A few apps, generally older parsers, had minor footprint growth.


Our importer went from a few minutes to parse a couple gigabytes of data to literally centuries. In the context of theoretical computer science that means correctness is preserved. In the real world, however, this means that the program stops progressing until a frustrated user presses the cancel button and calls our hotline.


It could have been so easy. Introduce a new function called something like subcopy(). Make substring() deprecated. In the deprecation comment, explain the memory leak problem and announce that substring() is scheduled for removal in java 2.0. Port the jdk and glassfish and your other applications which might have a problem to use subcopy() everywhere when available. Check for performance regressions. Once java 2.0 is released, you can reclaim the memory for the offset and index variables.

And here is the crux of the problem: there is no java 2.0. The optimal time frame for making a set of major changes to the language has already passed, and nobody dares to propose it now. What you do instead is to release backwards incompatible changes anyway, as we see here, because you cannot fix all the old problems in any other way. This was already bad when upgrading between minor versions. Now we get the same in bugfix releases, and additionally, we need to look up some new bizarre numbering scheme to see which bugfix release is actually just fixing bugs and which isn’t.

FastMail Enables IMAP Push for iOS

FastMail (via Gabe Weatherhead):

While our own app has had push notifications for some time, with the built-in Mail app you would have to wait until it next decided to check the server for your new messages. But no more! From today, new mail will be pushed straight to your inbox.

robn_fastmail (via Milen Dzhumerov):

IMAP “push” usually means the IDLE extension, where a client holds a connection open and waits for the server to report that something has happened. This works ok, but isn’t great on mobile because holding a TCP connection open is usually difficult on flaky networks and consumes battery.

iOS Mail doesn’t implement it, instead doing a poll every 15 minutes. However it also implements a separate push system which allows true push if the server supports it. The details on how to support this aren’t public information.

We talked to Apple, and they were kind enough to give us access to this system. We implemented it on our side, and now when an email arrives at FastMail we can immediately signal that there’s new mail available.

Emphasis added.

Apple vs. the PC Industry

John Gruber:

To put Apple’s current industry position in perspective, the company probably sold somewhere between 60 and 65 million iOS devices last quarter. (I’m guessing ~50 million iPhones, ~10-12 million iPads, and a handful of million iPod Touches.) The average selling price of a PC has fallen to under $400. The average selling price of an iPhone has been estimated to be as high as $660. So while iOS devices, taken as a whole, might still fall a few million units short of the PC industry, they’re clearly generating more in revenue.
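
As a rough check on that comparison, the sketch below multiplies out the quoted figures. The inputs are Gruber's guesses and upper-end estimates (about 50 million iPhones, an iPhone ASP of up to $660, a PC ASP of about $400), not reported results, and the class and method names are hypothetical.

```java
class RevenueEstimate {
    // Revenue in whole dollars for a unit count and an average selling price.
    static long revenue(long units, long averagePrice) {
        return units * averagePrice;
    }

    public static void main(String[] args) {
        long iphoneRevenue = revenue(50000000L, 660L);
        // PCs that would have to be sold at $400 each to match that revenue.
        long pcUnitsToMatch = iphoneRevenue / 400L;
        System.out.println(iphoneRevenue);  // prints 33000000000
        System.out.println(pcUnitsToMatch); // prints 82500000
    }
}
```

On these assumed numbers, iPhones alone bring in about $33 billion, which PCs at $400 each would need roughly 82.5 million units to match.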

Photos for Mac 1.1

Jason Snell:

Yes, in Photos 1.1 you can add a location to an image or batch of images that weren’t geotagged, as well as edit the location data of already-geotagged images.

Nick Heer:

Based on what I’ve heard and played with so far, I don’t think that I’m going to fall in love with it yet, in the same way I did with Aperture. But it’s now the primary way I edit my photos, and I like it more each time I use it.


As he pointed out, it relies on the Apple Maps POI database, which can be a crapshoot as we’ve previously discussed. Furthermore, because it relies upon search, it’s incredibly difficult to bulk tag photos in slightly different places – that is, you must tag them all identically, or modify them one at a time, which is tedious.

I don’t see anything there about using a GPS track to assign locations to photos.

I’m in the process of learning Lightroom. It is quite a shift from Aperture, but I am liking a lot of what I see. And, unlike Photos, it supports both Google Maps and terrain view.