Monday, November 5, 2018

Backblaze bzfileids.dat Scaling and Little Snitch

Backblaze:

As part of our backup process, Backblaze will run a checksum against each file before uploading it. This requires the entire bzfileids.dat file to be loaded into RAM. After a long time, or if you have an extraordinarily large number of files, the bzfileids.dat file can grow large causing the Backblaze directory to appear bloated. The only way to resolve this would be to repush (or reupload all of your data).

Via Matt:

Known for >7 years. Bit me when a drive died and I restored. Could not handle deduping its index after restoration and it grew too big. The software has one job...

I don’t understand why this is still an issue. Couldn’t they use a database or other indexed structure instead of keeping everything in RAM all the time?
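To make the question concrete, here is a minimal sketch of the kind of indexed structure I have in mind, using SQLite from Python. This is not how Backblaze works, and I don't know the actual format of bzfileids.dat; the table name, columns, and function names are all hypothetical. The point is only that lookups and updates can go through an on-disk index rather than requiring the whole list in RAM.

```python
import sqlite3


def open_index(path=":memory:"):
    """Open (or create) an index mapping file paths to file IDs.

    Pass a real filename to keep the index on disk; ":memory:" is only
    the default so the sketch can be tried without leaving files behind.
    """
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS file_ids ("
        " path TEXT PRIMARY KEY,"
        " file_id TEXT NOT NULL)"
    )
    return conn


def record_file(conn, path, file_id):
    # Upsert: recording the same path again replaces the old row instead of
    # appending a duplicate, so re-scanning does not bloat the index.
    conn.execute(
        "INSERT OR REPLACE INTO file_ids (path, file_id) VALUES (?, ?)",
        (path, file_id),
    )
    conn.commit()


def lookup_file(conn, path):
    # The primary key gives an indexed lookup; SQLite reads only the pages
    # it needs rather than loading every row.
    row = conn.execute(
        "SELECT file_id FROM file_ids WHERE path = ?", (path,)
    ).fetchone()
    return row[0] if row else None


def count_files(conn):
    return conn.execute("SELECT COUNT(*) FROM file_ids").fetchone()[0]
```

Recording the same path twice leaves a single row, which is the deduplication behavior Matt found missing after his restore.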

Ryan Waggoner:

I use LittleSnitch and bztransmit has recently been prompting to be allowed to connect to Google.com, Wikipedia.org, and Reddit.com? Seems suspicious.

Brian Gerfort:

Uhm. @backblaze, I don’t feel so safe with you anymore if your developers think this is an acceptable thing to do? Your backup software should only be talking to you. This is not a good look.

brianwski:

The Backblaze client needed to solve a technical problem, which is to distinguish when the customer’s computer is entirely offline (no network connectivity) and when Backblaze’s datacenters and servers are offline (which is unusual). The way I implemented this is that ONLY IF the client cannot reach the Backblaze servers, then in that case the client attempts to fetch the homepages of these three websites without any cookies and ignoring the results that come back (other than verifying they are valid HTML)[…] to establish the difference between “the Backblaze service is down” and “you have no internet connectivity at all.”

The reasoning actually makes a lot of sense; the issue is that the user has no way of knowing what is happening or why.
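The logic described above can be sketched roughly as follows. This is my own illustration in Python, not Backblaze's code: the three fallback sites are the ones named in the thread, but the exact URLs, the primary server URL, the timeout, the User-Agent string, and the crude "looks like HTML" test are all assumptions.

```python
import http.client
import urllib.request

# Assumed endpoints, for illustration only.
PRIMARY_URL = "https://www.backblaze.com/"
FALLBACK_URLS = [
    "https://www.google.com/",
    "https://www.wikipedia.org/",
    "https://www.reddit.com/",
]


def http_probe(url, timeout=5.0):
    """Return True if the URL answers with something that looks like HTML.

    urlopen's default opener has no cookie handling, so no cookies are
    sent or stored. The body is discarded after the check.
    """
    request = urllib.request.Request(
        url, headers={"User-Agent": "connectivity-check"}
    )
    try:
        with urllib.request.urlopen(request, timeout=timeout) as response:
            return b"<html" in response.read(4096).lower()
    except (OSError, http.client.HTTPException):
        # Covers DNS failures, refused connections, timeouts, HTTP errors.
        return False


def diagnose(probe=http_probe):
    """Classify connectivity as "online", "service_down", or "offline"."""
    if probe(PRIMARY_URL):
        return "online"
    # The third-party sites are contacted ONLY IF the primary is unreachable.
    if any(probe(url) for url in FALLBACK_URLS):
        return "service_down"
    return "offline"
```

Calling `diagnose()` with no arguments performs real network requests; passing a stand-in `probe` function lets the decision logic be exercised without touching the network. Note that "service_down" really means "the service is unreachable from here," which is also what a network that blocks the service but allows the other sites would look like.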

xudo:

FYI in my case this happens when I am connected to a network that allows connections to Reddit, Wikipedia and Google but blocks backblaze. I quickly figured out that this was some kind of connectivity detection but is mildly annoying (but no big deal).

Yev Pusin:

when I worked at Wells Fargo (before Backblaze) almost all websites were allowed for social reasons (marketing teams need access to Facebook/Twitter/LinkedIn) but Backblaze was actually blocked as well. Why? Because it’s a backup service and Wells Fargo does NOT want you backing up your company machine, so all backup companies were blocked

Update (2019-01-04): Richard Gaywood, on another scaling issue:

When the only free restore method @backblaze offers can’t restore all your files...

How annoying. I can work around by doing it in chunks, but that’s not a great UX.

2 Comments

NordVPN now checks connectivity using these websites as well, and I don't recall any update note telling me it was going to do that. I was pretty nervous till I found a Reddit post about it. Not a good look.

The website check actually makes sense. I'm not a programmer at all (unless shell scripts, rsync and the like counts…yes, I know it does not 😀) but I've done the same thing for some of my own stuff. You check hugely popular sites to see what's down, be it network, Internet, subsection of the Internet, etc.

Disclosure makes sense of course for services reaching third party customers.