Wednesday, July 7, 2021

Backblaze Computer Backup 8.0

Yev Pusin:

Our latest version is pretty great: It cranks up the speed—letting you upload at whatever rate your local system can attain—all while reducing stress on key elements of your computer by an order of magnitude.


We’ve also re-architected the way we handle file copies. In our previous 7.0 version of Backblaze Computer Backup, the client app running on your laptop or desktop made a copy of your file on your hard drive before uploading it. In version 8.0, this step has been removed. Now the client reads the file, encrypts it in RAM, and uploads it to the Backblaze data center. This results in better overall system performance and a reduction in strain on HDDs and SSDs on your laptops and desktops.
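In outline, that copy-free pipeline is just a bounded-memory read/encrypt/send loop: however large the file, only one chunk lives in RAM at a time, and nothing is written back to disk. The sketch below is a guess at the shape, not Backblaze’s code; `encrypt` is a pass-through placeholder, since the real cipher isn’t part of this story:

```python
import os
import tempfile

CHUNK = 1 << 20  # 1 MiB read buffer; bounds RAM use regardless of file size


def encrypt(chunk: bytes) -> bytes:
    # Stand-in for the real cipher (hypothetical); Backblaze's actual
    # scheme isn't public, so this just passes the data through.
    return chunk


def stream_upload(path: str, send) -> None:
    """Read, 'encrypt', and hand off a file chunk by chunk.

    No temporary copy is ever written to disk: each chunk is read,
    transformed in RAM, and passed straight to `send` (the uploader).
    """
    with open(path, "rb") as f:
        while True:
            chunk = f.read(CHUNK)
            if not chunk:
                break
            send(encrypt(chunk))


# Usage sketch: a 3 MiB + 123 byte file streams through in 4 chunks.
received = []
with tempfile.NamedTemporaryFile(delete=False) as tf:
    tf.write(b"x" * (3 * CHUNK + 123))
    name = tf.name
stream_upload(name, received.append)
assert b"".join(received) == b"x" * (3 * CHUNK + 123)
os.unlink(name)
```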

What about large files that don’t fit in RAM or that would use more RAM than you want? This seems like the perfect time to use an APFS file clone to ensure a consistent snapshot of the data.
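On macOS an APFS clone is nearly free: `cp -c` asks `clonefile(2)` for a copy-on-write clone that shares data blocks with the original, so the uploader gets a stable snapshot to read from while the user keeps editing the live file. The helper below is a hypothetical sketch (not anything Backblaze ships), with a plain-copy fallback for non-APFS filesystems:

```python
import os
import shutil
import subprocess
import sys
import tempfile


def snapshot_for_upload(src: str, dst: str) -> None:
    """Create a stable point-in-time copy of `src` to upload from.

    On macOS, `cp -c` requests an APFS clonefile(2) copy-on-write
    clone: near-instant, and it shares data blocks with the original
    until one side is modified. Elsewhere, fall back to a full copy.
    """
    if sys.platform == "darwin":
        subprocess.run(["cp", "-c", src, dst], check=True)
    else:
        shutil.copy2(src, dst)


# Usage sketch: clone, then read the snapshot at leisure while the
# original remains free to change underneath the user.
with tempfile.TemporaryDirectory() as d:
    src = os.path.join(d, "big.dat")
    dst = os.path.join(d, "big.snapshot")
    with open(src, "wb") as f:
        f.write(b"payload" * 1000)
    snapshot_for_upload(src, dst)
    with open(dst, "rb") as f:
        data = f.read()
    assert data == b"payload" * 1000
```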

In version 8.0, you’ll get more information about what is getting uploaded and when. When we transfer large files, sometimes the app will appear to “hang” on uploading a part of that file, when in reality that file’s already been transmitted and we’re starting to work on the next batch of files. The UI will now reflect upload status more clearly.

The most important question for me is, if the UI reports that the upload is complete, does that actually mean that the file exists on Backblaze’s server and that it can be restored? Or, as with previous versions, does it require additional information to be uploaded by the client over the next 1–8 hours?

And, secondly, does this update address the longstanding issues with large bzfileids.dat files?

iOS: Closing of the Frontier

Francisco Tolmasky:

I think the @AppStore may represent a “Closing of the Frontier” moment (in the American history “Frontier Thesis” sense) that may in part explain the dramatic slowdown in UI and UX innovation in iOS (and even more so in iPadOS) following the iPhone’s initial dramatic launch.

It’s no secret that macOS has… borrowed many of its now familiar workflows from 3rd party devs: Spotlight (Watson and Quicksilver), Widgets (Konfabulator), and iCloud Drive (Dropbox), to name just a few. And to be clear, this is a good thing and has generally been well received.

The key thing here is that these utilities started on the “fringe”… the frontier.


And IMO a big reason for that is that there’s no “frontier” for enthusiasts to experiment in and possibly break into the mainstream. Innovation can only come from Apple, where changes are riskiest. The ecosystem has no way to derisk through organic growth in the market.

And jailbreaking doesn’t (and can’t) serve this role. It’s a big scary binary switch (that is constantly being mitigated by Apple). You can’t install “one well known cool system extension.” There’s either jailbreaking your phone, or not.

No one can invent the next Dropbox on iOS, and perhaps not even on Android. I guess the frontier is now the desktop platforms, but do they have enough mindshare for the next big thing to break through?

Tanner Bennett:

On iOS, the features they take come from jailbreak tweaks.

• Control Center was SBSettings
• BiteSMS had quick reply before iOS
• PredictiveKeyboard (obvious)
• Someone delivered multitasking before Apple did, but it’s a stretch to call that a Sherlock; same with dark mode

It’s actually crazy how many things the community beat Apple to, year after year. Most of them are obvious steps forward, but still a lot of them are definite Sherlocks.

Dan Grover:

Even before “Sherlocking” became a verb, like half of System 7.5 was random 3P hacks that Apple bought out, including the menubar clock! Sandboxed app stores were a Faustian bargain: less stuff to Sherlock, but it bridged the gap and made regular users behave more like power users.


GitHub Copilot and Copyright

Rian Hunter (via Hacker News):

I do not agree with GitHub’s unauthorized and unlicensed use of copyrighted source code as training data for their ML-powered GitHub Copilot product. This product injects source code derived from copyrighted sources into the software of their customers without informing them of the license of the original source code. This significantly eases unauthorized and unlicensed use of a copyright holder’s work.

Julia Reda (tweet):

Since Copilot also uses the numerous GitHub repositories under copyleft licences such as the GPL as training material, some commentators accuse GitHub of copyright infringement, because Copilot itself is not released under a copyleft licence, but is to be offered as a paid service after a test phase. The controversy touches on several thorny copyright issues at once. What is astonishing about the current debate is that the calls for the broadest possible interpretation of copyright are now coming from within the Free Software community.


In the US, scraping falls under fair use; this has been clear at least since the Google Books case.


The short code snippets that Copilot reproduces from training data are unlikely to reach the threshold of originality. Precisely because copyright only protects original excerpts, press publishers in the EU have successfully lobbied for their own ancillary copyright that does not require originality as a precondition for protection. Their aim is to prohibit the display of individual sentences from press articles by search engines.


On the other hand, the argument that the outputs of GitHub Copilot are derivative works of the training data is based on the assumption that a machine can produce works. This assumption is wrong and counterproductive. Copyright law has only ever applied to intellectual creations – where there is no creator, there is no work. This means that machine-generated code like that of GitHub Copilot is not a work under copyright law at all, so it is not a derivative work either.

Luis Villa:

“independent creation” is a doctrine in US law that protects you if you write the same thing without knowing about the first thing. May or may not apply here, but I mention it because it is non-intuitive and speaks directly to “but what if the code is the same”.

There is an observable trend in US law, based on fair use and older notions in US copyright law of the need for creativity, that judges give a looooot of leeway to “machines that read”. Copilot fits pretty squarely in that tradition.


Article 4 of the 2019 Directive seems to make Copilot’s training unambiguously legal in the EU, but authors can explicitly opt out.


Note that this is an interesting example of what I wrote about in the context of databases, where rights are not the same across countries, making it hard to write a generic global license.

James Grimmelmann:

Almost by accident, copyright law has concluded that it is for humans only: reading performed by computers doesn’t count as infringement. Conceptually, this makes sense: Copyright’s ideal of romantic readership involves humans writing for other humans. But in an age when more and more manipulation of copyrighted works is carried out by automated processes, this split between human reading (infringement) and robotic reading (exempt) has odd consequences: it pulls us toward a copyright system in which humans occupy a surprisingly peripheral place. This Article describes the shifts in fair use law that brought us here and reflects on the role of robots in copyright’s cosmology.


Infringement is for humans only; when computers do it, it’s fair use.


GitHub Support just straight up confirmed in an email that yes, they used all public GitHub code, for Codex/Copilot regardless of license.

Adam Jacob:

Those of us who remember when open source was the novel underdog, allowing us to learn, grow, and build things our proprietary peers could not - we tend to see the relationship to corp $ in OSS as a net benefit, pretty much always.

That’s because we remember when it wasn’t so, and it took a lot of work to make it legit. But if you started your career with that as the ground truth, you’re much more likely to see the problematic aspects of it; that your open code can be used by folks in ways you dislike.

The Free Software Foundation has received numerous inquiries about our position on these questions. We can see that Copilot’s use of freely licensed software has many implications for an incredibly large portion of the free software community. Developers want to know whether training a neural network on their software can really be considered fair use. Others who may be interested in using Copilot wonder if the code snippets and other elements copied from GitHub-hosted repositories could result in copyright infringement. And even if everything might be legally copacetic, activists wonder if there isn’t something fundamentally unfair about a proprietary software company building a service off their work.

With all these questions, many of them with legal implications that at first glance may have not been previously tested in a court of law, there aren’t many simple answers. To get the answers the community needs, and to identify the best opportunities for defending user freedom in this space, the FSF is announcing a funded call for white papers to address Copilot, copyright, machine learning, and free software.

GitHub Copilot and API Keys

Mohammed Abubakar:

For starters, it’s an assistant that can help you with better code suggestions, but it has recently come to light that the AI is leaking API keys that are valid and still functional.

The issue was first reported by a SendGrid engineer, who asked the AI for keys, and it showed them.

Linus Groh:

@GitHubCopilot gave me a link with a key that still works (and stops working when changing it), so...

Airbnb haven’t noticed they leaked that somewhere OR GitHub is feeding private code to Copilot OR somehow it’s intentionally public.


Software Vulnerabilities in the Boeing 787

Ruben Santamarta (PDF):

IOActive has documented our detailed attack paths and component vulnerabilities to describe the first plausible, detailed public attack paths to effectively reach the avionics network on a commercial airplane from either non-critical domains, such as Passenger Information and Entertainment Services, or even external networks.

Andy Greenberg (Hacker News):

IOActive’s attack claims—as well as Honeywell’s and Boeing’s denials—are based on the specific architecture of the 787’s internals. The Dreamliner’s digital systems are divided into three networks: an Open Data Network, where non-sensitive components like the in-flight entertainment system live; an Isolated Data Network, which includes somewhat more sensitive components like the CIS/MS that IOActive targeted; and finally the Common Data Network, the most sensitive of the three, which connects to the plane’s avionics and safety systems. Santamarta claims that the vulnerabilities he found in the CIS/MS, sandwiched between the ODN and CDN, provide a bridge from one to the other.

But Boeing counters that it has both “additional protection mechanisms” in the CIS/MS that would prevent its bugs from being exploited from the ODN, and another hardware device between the semi-sensitive IDN—where the CIS/MS is located—and the highly sensitive CDN. That second barrier, the company argues, allows only data to pass from one part of the network to the other, rather than the executable commands that would be necessary to affect the plane’s critical systems.


But even granting Boeing’s claims about its security barriers, the flaws Santamarta found are egregious enough that they shouldn’t be dismissed, says Stefan Savage, a computer science professor at the University of California at San Diego, who is currently working with other academic researchers on an avionics cybersecurity testing platform. “The claim that one shouldn’t worry about a vulnerability because other protections prevent it from being exploited has a very bad history in computer security,” Savage says. “Typically, where there’s smoke there’s fire.”

Via Bruce Schneier:

This being Black Hat and Las Vegas, I’ll say it this way: I would bet money that Boeing is wrong. I don’t have an opinion about whether or not it’s lying.