Archive for March 11, 2019

Monday, March 11, 2019

The State of Mac Hardware in Early 2019

Wojtek Pietrusiewicz:

Macs, like many other computers, have always had their share of problems. These past few years feel particularly bad though, so here’s my quick take on the current state of the Mac lineup.

Maybe things will turn around later this year, with the Mac Pro and rumored new pro notebooks, but right now we’re in quite a dark period for the Mac—both hardware and software.

Lewis Hilsenteger (via Damien Petrilli):

After many years using MacBook variants I’ve made the switch to Windows. I’ve used every version of MacBook Pro and MacBook Air that have been released. My current laptop of choice is the Lenovo Thinkpad X1 Carbon / Lenovo Thinkpad X1 Extreme. Turns out switching from Mac to Windows isn’t as painful as I expected.

See also: Accidental Tech Podcast.

Update (2019-03-12): Colin Cornaby:

I’ve seen from time to time people suggest that even with a Mac’s decline in quality, it’s not like people are going to switch to Windows. Yet I know people who’ve happily switched over. I worry that if Marzipan leads to another decline in quality there will be more switchers.

The Mac’s main competitor is still Windows. That’s what’s so frustrating about the pro app conversation around Marzipan. Unless the big packages port over from Windows, nothing is going to change. And those guys aren’t upset about AppKit, they’re upset over Metal and Nvidia.

Most pro app vendors already have their own UI libraries (which is already frustrating if you care about the Mac experience.) AppKit vs UIKit isn’t really going to move the needle much for these app vendors in porting over from Windows.

Uupis:

The next MBP refresh feels personally pivotal to me, be it this year or later; even aside from the state of macOS. I’m not buying another first-generation design after the experience with the 2016 model. And I won’t be able to convince myself to buy another MBP with a TouchBar.

Nick Fugitt:

I’ve said this repeatedly but Apple has spent 2015-2019 (half the decade) executing and then recovering from horrible decision-making around 2012-14. It’s truly a dark age for the Mac and there’s no guarantee we’re about to be done.

John Gordon:

The iMac dust problem hit me. Rare that gets a mention.

The Sad State of Logging Bugs for Apple

Corbin Dunn (tweet, Hacker News):

This is where things get screwy depending on the component your bug lands in, since bug management is group dependent. Many groups will have only one or two QA people to do the initial screening of those large drop areas for bugs. QA engineers are sometimes instructed to screen bugs with a priority and “fix period” before passing them off to the engineer responsible for the code. This is terrible because many engineers will not look at bugs with a low priority. It is much better for the engineer who “owns the code” to look at a bug and determine the priority. The QA engineers will frequently get a huge back log of bugs to screen, and it can take weeks, or even months, for some bugs to get screened. Sometimes this leads to a mass screening of bugs, marking them all with a low priority. Bug originators have to notice this, and complain about it for the priority to get increased. Worse yet, some groups mass close bugs older than a year or so, and ask the originator to re-open the bug if the issue still exists. A lot of people don’t pay attention to bugs that need verification, and they simply become lost.

[…]

Engineers also dislike screening bugs because sometimes they have to add them to their queue for the current release. This increases their required workload for that release, which is something people don’t like doing. So, instead, many bugs stay unscreened.

[…]

Sometimes QA screens bugs with a low priority and holds onto them. They never get moved to the appropriate code engineers, and effectively become lost in the system. Sadly, I had seen this happen way too often.

[…]

When a bug is sent back as fixed, the internal developer who originated the bug is supposed to verify the problem is resolved. They can send it back if the problem isn’t resolved. However, internal developers don’t really have an incentive to verify bugs. Management doesn’t keep track of bugs that need verification or really require developers to verify them. Most engineers do verify bugs; they like to make sure problems are resolved. But external developers are left in a more sad state. The bug becomes closed for them, and is dead.

[…]

Internal engineers need to take more responsibility in promptly screening bugs. Management needs to allow engineers to have more time to do this, which is at the expense of working on features or fixing already screened bugs. Engineers should always be expected to have a very low unscreened bug count.

This matches what it felt like must be going on when filing bugs, as well as the way the smaller bugs seem to hang around forever, with new ones added each year. Even Mojave, which was supposed to be a refinement release, seems to have, on balance, increased the number of bugs. As a user, it sucks that things don’t work as well as they used to. As a developer, I spend too much time working around OS bugs and breakage (in other words, preventing my apps from getting worse rather than actually making them better). I assume other developers are in the same boat, and this may be one reason there seems to be less excitement around apps these days. Everyone is wasting a lot of energy treading water.

It’s as if the OS is rotting away before our eyes. The good news is that this should be fixable. Apple has tons of smart engineers who care. But the process does not seem to be set up to produce quality. Management talks a good game but clearly has other priorities. There are undoubtedly many policies that could be changed to improve the organizational incentives, and a core problem seems to be that Apple remains understaffed for its ambitions. The headcount can’t and shouldn’t be massively increased in a short period of time, but there is something Apple could do today to help stem the tide: go off the annual schedule.

Peter Ammon:

The single easiest and most effective thing Apple could do to improve its SWE org is to invest in Radar.

Radar’s importance within Apple cannot be overstated. It subsumes what would be multiple tools in other orgs. As an Apple SWE you spend a massive amount of time in it. And yet Apple treats Radar as a cost center, developed by an outsourced offshore team. It’s slow to search, supports only plain text, is hard to script, and is missing obvious features, e.g. automatic duplicate finding.

Hire five good SWEs, give them a mandate to serve the needs of the org, and you will massively increase the effectiveness of every other engineer.

ThemalSpan:

Anecdotally, I didn’t find the situation internally to be much better. Many bugs internally go unanswered because there is not enough time allocated to fixing core systems and designing better replacements. The truth is, I know personally of several teams that aren’t able to get through the queue of internally filed and scheduled bugs.

To me, it feels like Apple hasn’t resourced core pieces of infrastructure and engineering teams in line with upper management’s plans for growth. While many teams are relatively sequestered, once you start talking to folks elsewhere in the company it becomes clear that many teams are struggling to stay above water. More still, everyone shrugs about it because it’s not clear exactly what is wrong. The best description I’ve heard is in many cases engineers are willing to offer hacks as a solution to meet management’s demands, and management is either willing to accept those hacks or doesn’t know better.

satisfice:

We originally designed Radar so that bugs would be verified as closed by the person with most interest in seeing this happen: the tester assigned to that part of that project. Then management swooped in with an edict that bugs must be verified as closed by whomever originally reported them. This is a stupid idea, because it creates the perverse incentive that no one should report a problem if they are outside the team (because then you are committing to verify the fix, which just means more work for you that has nothing to do with any of your main responsibilities).

When I pointed out that the system would now discourage people on different teams from helping each other, the sponsoring director said “that’s what pink slips are for.” Direct quote. Soon after that I resigned from the design team.

Without reasonably skilled and principled leadership, you just don’t get quality software. And “quality is everyone’s job” is just an empty and childish slogan. Excellence is not transmitted through slogans and wishful thinking. You have to assign responsibility, provide resources and time (which means lowering velocity of new development), and follow-up.

The fundamental reason why it doesn’t happen is the technology market is not efficient. Quality is, in fact, not as important as career testers wish it were. You can get away with doing terrible work and not lose your job. The fact that Apple pays no significant penalties for having buggy products insulates it from our slings and arrows.

Corbin Dunn:

Some obvious issue, like “this button should do X but it does Y” can be verified by almost anyone. But some issues need the attention of the original author to really verify the bug. Maybe what needs to be done is someone in QA needs to attempt to verify the bug, or “pre-verify”, and then it goes back to the originator for final verification, who can also verify it, or simply close it if they feel like QA did a good job.

drfindley:

What’s even sadder is we used to be better at this when I started at Apple in 2008. Bugs often got screened and triaged and sometimes fixed within a week. I blame the yearly release schedule, where shipping features became a higher priority than overall quality.

Corbin Dunn:

I feel the same way; people took more time on bugs back in those days. I also think the yearly schedule is to blame.

akecheck:

When a process is annoying and you do nothing, people eventually do give up and leave. When it reaches that point, they’re not coming back even if you finally wake up and fix what bugged them.

Apple software quality is in serious danger precisely because of this type of community and infrastructure rot. They are not encouraging developers to help them, and a not-surprising number of serious issues have shown up in released products in recent years.

jakobegger:

By now, a significant fraction of bugs are bugs in Apple’s frameworks. We try to report them to Apple, but they are ignored, or simply closed because they are related to deprecated APIs.

Of course, customers don’t complain that Apple frameworks are buggy -- they complain that our app crashes! So Apple has no incentive to fix it.

pkamb:

The entire value of WWDC is going to the Labs, giving an Apple engineer your Radar number, and having them read you or paraphrase the internal-only notes attached to the ticket. Half the time the question/bug will be clearly resolved internally or a workaround posted. But no updates are added to the public ticket, and it will remain open and unchanged for years.

Gus Mueller:

Corbin is a former AppKit engineer, and this is a must read for developers. It’ll make you angry, and it’s stuff you already figured was happening.

Tanner Bennett:

This confirms what we already knew. Almost no one at Apple takes bug reporting seriously. Reports will stack up indefinitely and eventually macOS will be a shell of its former self.

Corbin Dunn:

It is not just macOS, but iOS too.

Paul Haddad:

Interesting read. My view on bugs, work around them and move on. Even if it gets assigned it’s not getting fixed for at least a year.

Jeff Johnson:

IMO Radar screening issues are merely a symptom. The root problem is that Apple produces a completely unmanageable volume of bugs. Even if they screened all Radars quickly, then what? Bugs still get written much much faster than they get fixed. That’s unsustainable.

I suspect that Radar screening is allowed to be lax precisely because everybody knows that a huge volume of bugs will never get fixed anyway. It’s like bailing out the Titanic.

There’s also a tolerance for shipping bad bugs. If heads rolled at the company for shipping bad bugs, then Radars would get screened.

Adam Savage:

Just getting my music downloaded to my phone is a recurring nightmare I relive every time I upgrade. Having my music ON my device should be a simple choice, & you’ve made it Byzantine. How is it that I have to visit a support forum to learn how to download SONGS to my PHONE?!

The language of permissions is still fascinatingly and infuriatingly opaque, to the degree that when using iTunes, I’m regularly convinced it has an agenda antithetical to mine. Searching Support forums is also nightmarish as helpful buttons from one version disappear in others.

Update (2019-03-20): Michael Nachbaur:

It’s easy to blame Apple for poor bug handling practices, but I feel it’s a two-way street. It’s just as much our responsibility as theirs to ensure important bugs get fixed; we should do everything in our power to make their jobs easier in solving bugs. And if we can’t, then at the very least we can treat Apple’s engineers with respect.

Safety Experts Weigh in on the Boeing 737 MAX

Max Prosperi (via Yan Zhu):

The preliminary investigation following Lion Air Flight 610 revealed that prior to the crash, a system called Maneuvering Characteristics Augmentation System or MCAS had engaged, without the pilots’ knowledge. The MCAS lowers the nose automatically to prevent a stall, or the loss of lift, if it detects that the angle of the plane’s nose is too high relative to the ground. A malfunctioning sensor may have led the MCAS to engage repeatedly, countering the pilots’ maneuvers.

[…]

Diehl recalled that leading up to the implementation of the MCAS, an FAA official came to him and asked whether or not he thought the automation of aircraft was safe. Diehl’s advice: “Automation, if done right, is great, but it can also bite you.”

After the Lion Air crash, Boeing denied that it had not properly communicated to pilots the addition of the MCAS to the MAX-series 737s, a major difference with previous models of the airplane. (That position contradicts what some airlines have said.)

David Fickling:

A software update intended to fix the problem identified in the Lion Air crash still hasn’t been rolled out. The fact that the crew on Flight 610 are likely to have been aware of the known issues with the aircraft, too, raises the more worrying possibility that there’s an unknown complication.

Andrew:

Two hull loss incidents in under a year on a brand-new aircraft type with only 350 in service.

Compare that to the 787 which, despite serious development problems, has about 800 aircraft in service for many more years, and still zero hull loss incidents.

Or compare the MAX 8 with its predecessor, the 737-800, which has had 16 hull loss incidents in a fleet of 5000 aircraft across more than two decades of service.

The 737-MAX seems less safe to operate, whether the reason is an aircraft defect or difficulty of operation, or lack of adequate training.
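
As a rough check on the figures quoted above, the losses per aircraft in service can be computed directly. This is only a sketch: the fleet sizes and loss counts are the ones in the comment, the helper name is made up for illustration, and dividing by fleet size alone ignores how many years and flights each type has accumulated, so the result is a crude ratio and not a proper accident rate.

```python
# Crude hull-loss ratios using only the figures quoted in the comment above.
# Caveat: fleet size is the only denominator here; years in service and
# number of flights are ignored, so this is not a true accident rate.

def losses_per_thousand(hull_losses, fleet_size):
    """Hull losses per 1,000 aircraft in service (hypothetical helper)."""
    return 1000 * hull_losses / fleet_size

# (hull losses, approximate fleet size) as stated in the quote
fleets = {
    "737 MAX": (2, 350),
    "787": (0, 800),
    "737-800": (16, 5000),
}

for name, (losses, size) in fleets.items():
    print(f"{name}: {losses_per_thousand(losses, size):.1f} per 1,000 aircraft")
```

On these numbers the MAX ratio is higher than the 737-800’s even before its much shorter time in service is taken into account, although an estimate built on two events is very uncertain.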

vbscript2:

Statistics do funny things with such low sample sizes.

Take the 777, for example. It went nearly 20 years, much of that time as the most popular widebody flying, before its first accident resulting in a passenger death. Then it had 3 in a year. It hasn’t had another in the 5 years since that time. Was the 777 any less safe in 2013-2014 than in its other 25 years of service history? Obviously not. Similarly, the A320 family had a streak of fatal crashes in the last few years, yet there’s no reason to believe the A320 isn’t safe, let alone that it’s any less safe than it has been for the rest of its service history.

Update (2019-03-12): See also: Jon Ostrower (via Hacker News) and New York Times (via Hacker News).

McCloud:

I think required reading should be Normal Accidents: Living with High-Risk Technologies by Charles Perrow.

Mac McClellan (via Martin Steiger):

Though the pitch system in the MAX is somewhat new, the pilot actions after a failure are exactly the same as would be for a runaway trim in any 737 built since the 1960s. As pilots we really don’t need to know why the trim is running away, but we must know, and practice, how to disable it.

The problem for Boeing, and maybe eventually all airplane designers, is that FBW avoids these issues. FBW removes the pilot as a critical part of the system and relies on multiple computers to handle failures.

Boeing is now faced with the difficult task of explaining to the media why pilots must know how to intervene after a system failure. And also to explain that airplanes have been built and certified this way for many decades. Pilots have been the last line of defense when things go wrong.

What makes that such a tall order is that FBW airplanes – which include all the recent Airbus fleet, and the 777 and 787 from Boeing – don’t rely on the pilots to handle flight control system failures. FBW uses at least a triple redundant computer control system to interpret the inputs of the cockpit controls by pilots into movement of the airplane flight controls, including the trim. If part of the FBW system fails, the computer identifies the faulty elements and flies on without the human pilots needing to know how to disable the failed system.
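
To make the idea of redundant channels outvoting a faulty one concrete, here is a minimal sketch of mid-value selection across three inputs. It is a generic illustration, not the logic of any actual Boeing or Airbus system; the function name, the tolerance, and the readings are all assumptions.

```python
from statistics import median

def vote(readings, tolerance=2.0):
    """Pick the median of the channel readings and flag outliers.

    Hypothetical helper: returns (selected_value, indices_of_suspect_channels).
    The tolerance is an arbitrary illustrative value in the same units
    as the readings.
    """
    if len(readings) < 3:
        raise ValueError("need at least three channels to outvote a single fault")
    selected = median(readings)
    suspect = [i for i, r in enumerate(readings) if abs(r - selected) > tolerance]
    return selected, suspect

# Made-up readings: two channels agree, the third has failed high.
value, suspect = vote([5.1, 4.9, 40.0])
print(value, suspect)
```

With three channels, a single bad input cannot move the selected value far, which is why one failed element can be identified and ignored without the pilots stepping in.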

Update (2019-03-13): Dallas Morning News (Hacker News):

Pilots repeatedly voiced safety concerns about the Boeing 737 Max 8 to federal authorities, with one captain calling the flight manual “inadequate and almost criminally insufficient” several months before Sunday’s Ethiopian Air crash that killed 157 people, an investigation by The Dallas Morning News found.

Update (2019-03-15): Jon Ostrower (via John Gruber):

Every airplane development is a series of compromises, but to deliver the 737 Max with its promised fuel efficiency, Boeing had to fit 12 gallons into a 10 gallon jug. Its bigger engines made for creative solutions as it found a way to mount the larger CFM International turbines under the notoriously low-slung jetliner.

See also: Hacker News (3)

tuna-piano:

Assuming the author is correct, and the reaction to the MCAS issues is a simple reaction that every pilot should know by memory: Is it really acceptable that once every 3 months a 737-Max will attempt a nose dive and require a vigilant pilot who can identify and correct the issue before the plane crashes into the ground?

And this likely happened at least twice, while there were 300 MAXs in service. If there were 3,000 MAXs in service, MCAS misfires would presumably be happening 3x a month worldwide - each misfire requiring a proper pilot reaction. How can you defend Boeing in that case?
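
The scaling in that comment can be written out explicitly. This sketch assumes, as the comment does, that misfires occur at a constant rate per aircraft, so the fleet-wide count grows linearly with fleet size; the function name is hypothetical and the inputs are the commenter’s round numbers.

```python
def fleet_events_per_month(events_per_month_now, fleet_now, fleet_future):
    """Scale a fleet-wide event rate to a different fleet size.

    Assumes a constant per-aircraft rate (an assumption, not a measured fact).
    """
    per_aircraft = events_per_month_now / fleet_now
    return per_aircraft * fleet_future

# One misfire every 3 months with about 300 aircraft in service,
# projected to a fleet of 3,000.
rate = fleet_events_per_month(1 / 3, 300, 3000)
print(f"{rate:.1f} misfires per month")
```

Ten times the fleet gives ten times the rate, or a little over three misfires a month, in line with the “3x a month” estimate.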

amluto:

Here’s what I don’t get about this whole situation:

AIUI 737 MAX has an instability such that, in near stall conditions, some attempts to recover can make the stall worse. To mitigate this, Boeing added MCAS, and MCAS can malfunction with a single sensor failure. Imagine that this failure occurs and the pilot successfully turns off MCAS but ends up in a dive, too close to the ground, or otherwise in a bad situation. Now the pilot has to recover, but they are facing a faulty AoA indicator (if they have one at all) as well as a plane that, because MCAS is off, is unstable in near-stall conditions. And the pilot has never been trained in the handling of type 737 MAX under these conditions.

Am I wrong for some reason, or is this a potentially rather dangerous situation that could be caused by a single instrument failure?

Update (2019-03-21): Dominic Gates (via Nick Visser):

Current and former engineers directly involved with the evaluations or familiar with the document shared details of Boeing’s “System Safety Analysis” of MCAS, which The Seattle Times confirmed.

The safety analysis:

Understated the power of the new flight control system, which was designed to swivel the horizontal tail to push the nose of the plane down to avert a stall. When the planes later entered service, MCAS was capable of moving the tail more than four times farther than was stated in the initial safety analysis document.

Failed to account for how the system could reset itself each time a pilot responded, thereby missing the potential impact of the system repeatedly pushing the airplane’s nose downward.

Assessed a failure of the system as one level below “catastrophic.” But even that “hazardous” danger level should have precluded activation of the system based on input from a single sensor — and yet that’s how it was designed.

Trevor Sumner (Hacker News):

BEST analysis of what really is happening on the #Boeing737Max issue from my brother in law @davekammeyer, who’s a pilot, software engineer & deep thinker. Bottom line don’t blame software that’s the band aid for many other engineering and economic forces in effect.👇🎖🤔

See also: The Talk Show.

John Cassidy:

Early on, employees of the F.A.A. and Boeing decided how to divide up the certification work. But, partway through the process, a former F.A.A. safety engineer told the Seattle Times, “we were asked by management to re-evaluate what would be delegated. Management thought we had retained too much at the FAA.” The engineer said that “there was constant pressure to re-evaluate our initial decisions,” and “even after we had reassessed it … there was continued discussion by management about delegating even more items down to the Boeing Company.”

Even the work that was retained, such as reviewing technical documents provided by Boeing, was sometimes curtailed. “There wasn’t a complete and proper review of the documents,” the former engineer added. “Review was rushed to reach certain certification dates.”

Alan Levin and Harry Suhartono:

That extra pilot, who was seated in the cockpit jumpseat, correctly diagnosed the problem and told the crew how to disable a malfunctioning flight-control system and save the plane, according to two people familiar with Indonesia’s investigation.

The next day, under command of a different crew facing what investigators said was an identical malfunction, the jetliner crashed into the Java Sea killing all 189 aboard.

[…]

Airline mechanics tried four times to fix related issues on the plane starting Oct. 26, according to the Indonesia preliminary report. After pilots reported issues with incorrect display of speeds and altitude in the two prior flights, workers in Denpasar, Bali, replaced a key sensor that is used by the Boeing plane to drive down its nose if it senses an emergency.

Flight data shows the sensor, called the “angle of attack” vane, which measures whether air is flowing parallel to the length of the fuselage or at an angle, was providing inaccurate readings after that.

Steven Ashley:

At Boeing, safety really is an option: Optional cockpit ‘disagree lights’ wld have alerted pilots that the anti-stall, angle-of-attack (AOA) sensors were not in agreement. But after 2 crashes, they’re suddenly standard equipment...

Update (2019-04-05): Boeing CEO Dennis Muilenburg (via Hacker News):

The full details of what happened in the two accidents will be issued by the government authorities in the final reports, but, with the release of the preliminary report of the Ethiopian Airlines Flight 302 accident investigation, it's apparent that in both flights the Maneuvering Characteristics Augmentation System, known as MCAS, activated in response to erroneous angle of attack information.

Update (2019-04-09): Chris Woodyard:

Boeing “violated a basic principle of aircraft design by allowing a single point failure to trigger a sequence of events that could result in a loss of control,” said Brian Alexander, an attorney for a law firm specializing in aviation accidents, Kreindler & Kreindler in New York, that is contemplating lawsuits on behalf of victims’ families in the Ethiopian Airlines crash.

Update (2019-04-18): Philip Greenspun (via Hacker News):

Had the systems engineers and programmers checked Wikipedia, for example, (or maybe even their own web site) they would have learned that “The critical or stalling angle of attack is typically around 15° – 20° for many airfoils.” Beyond 25 degrees, therefore, it is either sensor error or the plane is stalling/spinning and something more than a slow trim is going to be required.

[…]

We fret about average humans being replaced by robots, but consider the Phoenix resident who sees that the outdoor thermometer is reading 452 degrees F on a June afternoon. Will the human say “Arizona does get hot in the summer so I’m not going to take my book outside for fear that it will burst into flames”? Or “I think I need to buy a new outdoor thermometer”?
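
The kind of sanity check described here is simple to express. The sketch below is purely illustrative and is not how MCAS or any certified system works: the 25° limit comes from the quote above, while the disagreement tolerance and the function name are assumptions chosen for the example.

```python
MAX_PLAUSIBLE_AOA_DEG = 25.0  # limit suggested in the quote above
MAX_DISAGREEMENT_DEG = 5.0    # hypothetical tolerance between the two vanes

def aoa_trustworthy(left_deg, right_deg):
    """Return True only if both readings are plausible and agree with each other."""
    plausible = all(abs(r) <= MAX_PLAUSIBLE_AOA_DEG for r in (left_deg, right_deg))
    agree = abs(left_deg - right_deg) <= MAX_DISAGREEMENT_DEG
    return plausible and agree

print(aoa_trustworthy(4.0, 5.5))    # both plausible and close together
print(aoa_trustworthy(60.0, 5.5))   # one reading is beyond the plausible range
print(aoa_trustworthy(20.0, 5.0))   # both plausible, but they disagree
```

A check like this cannot say which sensor is wrong, only that the input should not be trusted to drive an automatic trim command.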

Update (2019-05-01): Gregory Travis (via Hacker News):

I have been a pilot for 30 years, a software developer for more than 40. I have written extensively about both aviation and software engineering. Now it’s time for me to write about both together.

Skype for Web Drops Support for Safari

Chance Miller:

In a statement to VentureBeat, Microsoft explained that Skype for Web uses a “calling and real-time media” framework that functions differently across the various browsers. Thus, it decided to prioritize Skype for Web support in Microsoft Edge and Google Chrome[…]

Update (2019-03-11): My hospital’s Web site just stopped working in Safari. The login page just endlessly reloads itself. It works in Chrome and Firefox.

Update (2019-04-09): Jeff Johnson, quoting the IRS Web site:

We cannot recommend Safari, due to current compatibility and display issues.

Update (2019-04-16): Gus Mueller:

More and more sites aren’t working in Safari, or they are just stupid slow or rendered bad and I’m having to use Chrome more and it’s making me sad.

Johnathan Nightingale:

When I started at Mozilla in 2007 there was no Google Chrome and most folks we spoke with inside were Firefox fans. They were building an empire on the web, we were building the web itself.

When chrome launched things got complicated, but not in the way you might expect. They had a competing product now, but they didn’t cut ties, break our search deal - nothing like that. In fact, the story we kept hearing was, “We’re on the same side. We want the same things.”

I think our friends inside google genuinely believed that. At the individual level, their engineers cared about most of the same things we did. Their product and design folks made many decisions very similarly and we learned from watching each other.

But Google as a whole is very different than individual googlers. Google Chrome ads started appearing next to Firefox search terms. gmail & gdocs started to experience selective performance issues and bugs on Firefox. Demo sites would falsely block Firefox as “incompatible.”

All of this is stuff you’re allowed to do to compete, of course. But we were still a search partner, so we’d say “hey what gives?”

And every time, they’d say, “oops. That was accidental. We’ll fix it in the next push in 2 weeks.”

Over and over. Oops. Another accident. We’ll fix it soon. We want the same things. We’re on the same team.

There were dozens of oopses. Hundreds maybe?

I’m all for “don’t attribute to malice what can be explained by incompetence” but I don’t believe google is that incompetent.

I think they were running out the clock. We lost users during every oops. And we spent effort and frustration every clock tick on that instead of improving our product. We got outfoxed for a while and by the time we started calling it what it was, a lot of damage had been done.

Update (2019-05-30): I’m now having problems accessing the US Social Security site with Safari, and accessing the Internet Archive fails with a kCFErrorHTTPParseFailure error, despite working in other browsers.

Update (2019-05-31): Colin Cornaby:

I still use Safari as my only web browser. But it’s bumming me out that more and more I have to open Chrome for a one off use of a web page.

I don’t know if Chrome is breaking things or Safari is falling behind. But it feels like the early days of Safari again.

Flickr Protects Photos From Deletion

Flickr (via Hacker News):

When we recently announced updates to Flickr Free accounts, we stated that freely licensed public photos (Creative Commons, public domain, U.S. government works, etc.) as of November 1, 2018 in excess of the free account limit would not be deleted. We wanted to make sure we didn’t disrupt the hundreds of millions of stories across the global internet that link to freely licensed Flickr images. We know the cost of storing and serving these images is vastly outweighed by the value they represent to the world.

In this spirit, today we’re going further and now protecting all public, freely licensed images on Flickr, regardless of the date they were uploaded. We want to make sure we preserve these works and further the value of the licenses for our community and for anyone who might benefit from them.

[…]

In memoriam accounts will preserve all public content in a deceased member’s account, even if their Pro subscription lapses. The account’s username will be updated to reflect the “in memoriam” status and login for the account be locked, preventing anyone from signing in.
