Saturday, December 12, 2015

Yahoo’s Engineers Move to Coding Without a Net

Tekla Perry (Hacker News, Slashdot):

What happens when you take away the quality assurance team in a software development operation? Fewer, not more errors, along with a vastly quicker development cycle.

That, at least, has been the experience at Yahoo, according to Amotz Maimon, the company’s chief architect, and Jay Rossiter, senior vice president of science and technology. After some small changes in development processes in 2013, and a larger push from mid-2014 to the first quarter of 2015, software engineering at Yahoo underwent a sea change. The effort was part of a program Yahoo calls Warp Drive: a shift from batch releases of code to a system of continuous delivery. Software engineers at Yahoo are no longer permitted to hand off their completed code to another team for cross checking. Instead, the code goes live as-is; if it has problems, it will fail and shut down systems, directly affecting Yahoo’s customers.

The article doesn’t say what they mean by “errors” or how they are being counted.

chojeen:

Before the switch, our team (advertising pipeline on Hadoop) used the waterfall method with these gigantic, monolithic releases; we probably released a handful of times a year. Almost without exception, QA was done manually and was painfully slow. I started to automate a lot of the testing after I arrived, but believe you me when I say that it was a tall order.

Soon after I moved into development, QA engineers without coding chops were let go, while the others were integrated into the development teams. The team switched over to agile, and a lot of effort was made to automate testing wherever possible. Despite some initial setbacks, we got down to a bi-weekly release cycle with better quality control than before.

reid:

I’m a programmer at Yahoo -- deploying multiple times a day to production, with the confidence your code will work, feels great.

Manual (“batch-release”) deployments have been forbidden for over a year, which is a forcing function to change development process to allow deploying to production continuously multiple times a day. This requires robust test and deployment automation and for engineers to better understand what they build. It’s pretty nice overall!
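The workflow reid describes can be sketched as a minimal deploy gate, where automated checks replace the QA hand-off entirely: failing tests block the deploy, and a failing post-deploy health check triggers an automatic rollback. This is a hypothetical illustration of the general pattern, not Yahoo's actual pipeline; all names here are made up.

```python
def release(run_tests, deploy, health_check, rollback):
    """Minimal continuous-delivery gate: no human QA hand-off.

    The automated checks are the only gatekeepers. Each argument is a
    callable: run_tests/health_check return booleans, deploy/rollback
    perform side effects.
    """
    if not run_tests():
        return "blocked: tests failed"
    deploy()
    if not health_check():
        rollback()
        return "rolled back: health check failed"
    return "deployed"

# In a real pipeline these hooks would shell out to a test runner and
# deploy tooling; stubs keep the sketch self-contained.
log = []
status = release(
    run_tests=lambda: True,
    deploy=lambda: log.append("deployed v42"),
    health_check=lambda: False,  # simulate a bad release
    rollback=lambda: log.append("rolled back to v41"),
)
print(status)  # rolled back: health check failed
```

The point of the structure is that every path back to a good state is automated; "deploying multiple times a day" is only safe when the rollback is as scripted as the deploy.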

diivio:

Microsoft switched to this model a few months after Satya took over.

For the majority of Microsoft teams it worked really well and showed the kinds of results mentioned in this Yahoo article. Look at many of our iOS apps as an example.

But for some parts of the Windows OS team apparently it didn’t work well (according to anonymous reports leaked online to major news outlets by some Windows team folks) and they say it caused bugs.

See also: Why Good QA Matters to Businesses.

7 Comments

It seems pretty obvious. The error count is down because they fired all the people who were counting the errors. Problem solved!

What is interesting is how successfully they (Yahoo and just about every other tech company) have sold this blatant cost-cutting move to both developers and the general public. No one ever said automated testing and developer responsibility for bugs was a bad idea. But did anyone ever consider combining that with a real, independent, empowered QA team that actually did usability tests on the final product?

Since when was QA ever supposed to have been done manually in the first place? All they have done is push testing back onto the developers' plates. This used to be considered a bad idea because developers were too close to the code to catch edge cases or unexpected input. If they write the tests, and their performance is judged on the quality of the software as measured by those tests, guess whose tests will always succeed? QA was meant to be an independent verification of software quality. Now, with a name like "Warp Drive", the only metric is speed - of development. We have gone from measuring software in LOC (lines of code) to LOCPH (lines of code per hour).

"It seems pretty obvious. The error count is down because they fired all the people who were counting the errors. Problem solved!"

Well, here's another half-baked theory:

Maybe 'coding without a net' is just fine for rapidly iterating web development, but a far, far worse idea for operating systems.

I take the Microsoft quote as supporting my theory:

For the majority of Microsoft teams it worked really well and showed the kinds of results mentioned in this yahoo article ... But for some parts of the Windows OS team apparently it didn’t work well ... and they say it caused bugs.

@John: Amusing, and certainly not uncommon in pointy-haired circles. However, it is also possible that making devs do their own damn testing, instead of being lazy toads who toss it off as someone else's problem the moment it compiles, does in fact make them produce better work.

This is not to say having a fresh set of eyes is not also valuable - it absolutely is. E.g. having worked both sides of publishing, I can tell you a top editor is absolutely worth their weight in gold. However, their job is to apply a final spit and polish, not to wade through mountains of glaring problems that a responsible developer should have identified and fixed themselves - that is insanely inefficient and a total waste of a valuable resource. The only thing it guarantees is that bugs will ship as schedules are blown and QA burns out, raging at being treated as lazy and incompetent developers' personal slaves.

..

I suspect ditching Big Development Strategy also goes a long way to improving quality, because building any product - especially a complex one - is a constant learning exercise as much as a doing one. That means being able to explore and experiment, to work quickly and make mistakes, learn from those mistakes, and change often until you finally have a solution that does the job.

Heck, anyone who's developed software for others knows that users rarely know exactly what they need up front; it's only by a constant to-ing and fro-ing that both users and developers start to work out the shape of the problem, and from there figure out how best to solve it. Waterfall's just a ready-made disaster from day one, precisely because continuous learning and communication, or admitting one's mistakes and making changes, are utterly anathema to that culture and usually fatal for job security too.

@has: That's funny. Your description of Waterfall is almost exactly my description of modern "continuous development." Developers have always done their own testing and would never "toss it off as someone else's problem the moment it compiles". It is lazy companies who ship low-quality software and make it the customer's responsibility to do any beta and compatibility testing. But what is different now is that there is never a stable, well-known system. If you can identify the bugs, you can get productive use out of virtually any system. But the never-ending betas never become stable, and no one ever gets a chance to learn how the system works. Old bugs are constantly being re-implemented and new bugs introduced. It is more like a software addict always hoping that the next (bug) fix will do the trick. But all it is really designed to do is keep the junkie on the hook.

"Instead, the code goes live as-is; if it has problems, it will fail and shut down systems, directly affecting Yahoo’s customers."

Isn't it the data that goes live as-is at Yahoo? Because there are quite a few crap articles on their home page every day.

I think there's some validity to the idea that making bug detection somebody else's problem might cause developers to be less careful about what they ship, and what they won't ship. It's just an unintended side-effect of having a team whose sole responsibility it is to find problems: you're outsourcing worrying about quality to another team. Whether going back to not having QA is actually the solution to this problem seems doubtful to me. Maybe somebody should ask some actual psychologists about how to solve this conundrum, instead of just throwing out QA altogether.

None of this has anything to do with waterfall or agile or whatever. Also, as an aside, "waterfall" does not mean "all planning up-front without any opportunity to iterate, or re-evaluate earlier decisions". That's a dumb straw man put up by people who think that deciding between different development methodologies is some kind of holy war.

Lukas: "Maybe somebody should ask some actual psychologists about how to solve this conundrum"

IANAP, but from what I've seen there are two basic dysfunctions in corporate software development: managers whose sole interest is erecting and defending their own private personal empire, and programmers whose sole interest is writing code. Permitting/encouraging/requiring users and developers to work directly together in order to mutually learn and solve problems completely undermines the status and control of the former, and forces the latter to admit ignorance and work to learn stuff that doesn't interest them. (Obligatory.)

..

As to what "waterfall" (or, for that matter, "continuous development") means versus how it actually operates in practice, again look to the individual motivations of those involved. A lot of managers and programmers (hell, a lot of people, period) aren't big on taking personal responsibility or admitting their mistakes, and are often deeply hostile to any sort of change from what they already know and are comfortable with (e.g. just look at Kodak, who invented the digital camera, and lost everything because "we do film").

Sure, waterfall allows backtracking, but no PM who values their continued employment is going to invoke that option, because 1. it means admitting error, and thus is an open invite for her rivals to eat her alive; and 2. it "threatens schedules", even though the ultimate cost of proceeding to do the entire job wrong (instead of throwing it out and redoing it) blows them anyway; not to mention that schedules for large-scale projects are impossible to predict accurately and are thus insanely optimistic. (Because what upwardly ambitious executive is going to approve an Impressively Grand (and Bonus-Guaranteeing) Project with a painfully pessimistic - if realistic - schedule?)

Thus, waterfall becomes a haven for those who despise change and do everything to frustrate it and to bury the ultimate costs of their own inaction, while CD is easily hijacked by cowboys and children who want to play all day and never take responsibility for it. And both can be hijacked by glory seekers and empire builders for their own personal gain; though waterfall projects - being naturally given to Big Promises, Grand Plans, and Career-Making Attention - are almost certainly a much more tempting target.

..

Honestly, if you want to know what development process is right, you have to look at the user's needs, not the supplier's. Nowadays a lot of custom software-using businesses need the ability to turn on a dime in order to respond to the ever-evolving demands of their own users. A solution promised in six months (which as likely as not ships in twelve and is immediately found to be even slower and less reliable than just doing the damn job by hand) is simply not going to meet that need. That means lots of small, modest, frequent rollouts of individual changes and new features as soon as each one is ready, rather than stacking them all up for a grand spectacle release months down the line, with the features that are completed first haemorrhaging value while waiting on the final item on the promised feature list to work.

And again, what empire-building exec wants to be the one to tell their increasingly angry client that they have to bounce some features to the next release because they can't deliver everything they promised on time? Or, for that matter, what stakeholder wants to scale down and divvy up their hugely ambitious Niagara Waterfall Project into lots of tiny little garden sprinklers? The latter may be far more realistically delivered, but nobody ever made themselves famous by working as a humble gardener.
