Archive for July 19, 2023

Wednesday, July 19, 2023

Meta’s Microservice Architecture

Darby Huye et al. (PDF, via Hacker News):

We present a top-down analysis of Meta’s microservice architecture, starting from its service-level topology and descending into individual request workflows. (Request workflows describe the order and timing of services visited by requests when executing.) Our focus is on underreported characteristics of microservice architectures important for developing microservice tools and artificially modeling microservice topologies. Specifically, we describe growth and churn of the microservice topology (to inform tools that learn models of the topology), whether elements of the topology fit power-law distributions common to large graphs (to inform potential artificial topology generators), and the predictability of individual request workflows (to inform the vast number of tools that work by aggregating trace data).


A basic assumption of Meta’s architecture (which may or may not be true for other organizations’ architectures) is that business use case is a sufficient partitioning by which to define services, scale functionality, and observe behaviors.


Scale is measured in millions of instances: On 2022/12/21, the microservice topology contained 18,500 active services and over 12 million service instances.


We investigated the services that have the highest fan-in and fan-out degrees. The former is a vault server storing credentials for use by other services. The latter is a service for querying hosts for arbitrary statistics.
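To make the fan-in and fan-out terminology concrete, here is a minimal sketch that counts both degrees from a list of caller-to-callee edges. The service names and the edge list are made up for illustration; nothing here comes from Meta's data or tooling.

```python
from collections import Counter


def degree_counts(calls):
    """Return (fan_in, fan_out) Counters for (caller, callee) edges."""
    edges = set(calls)  # count each distinct caller -> callee pair once
    fan_out = Counter(caller for caller, _ in edges)  # distinct callees per caller
    fan_in = Counter(callee for _, callee in edges)   # distinct callers per callee
    return fan_in, fan_out


# Hypothetical topology: a credential store that many services call,
# and a stats service that calls many hosts.
calls = [
    ("web", "vault"), ("feed", "vault"), ("ads", "vault"),
    ("stats", "web"), ("stats", "feed"), ("stats", "ads"),
    ("web", "vault"),  # repeated call, ignored by the set above
]

fan_in, fan_out = degree_counts(calls)
top_fan_in = fan_in.most_common(1)[0]
top_fan_out = fan_out.most_common(1)[0]
print("highest fan-in:", top_fan_in)
print("highest fan-out:", top_fan_out)
```

In this toy graph the credential store has the highest fan-in and the stats service the highest fan-out, mirroring the shape the authors describe.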


Services can be written in many programming languages. There are currently 16 different programming languages in use at Meta, with the most popular being Hack (a version of PHP), measured by lines of code. Other popular languages include: C++, Python, and Java, with the rest forming a long tail.


Removing the Python GIL

Jonathan Corbet (2021, Serdar Yegulalp, Hacker News):

Concerns over the performance of programs written in Python are often overstated — for some use cases, at least. But there is no getting around the problem imposed by the infamous global interpreter lock (GIL), which severely limits the concurrency of multi-threaded Python code. Various efforts to remove the GIL have been made over the years, but none have come anywhere near the point where they would be considered for inclusion into the CPython interpreter. Now, though, Sam Gross has entered the arena with a proof-of-concept implementation that may solve the problem for real.

Łukasz Langa (2021, Hacker News):

Sam’s work demonstrates it’s viable to remove the GIL in such a way that the resulting Python interpreter is performant and scales with added CPU cores. For performance to be net positive, other seemingly unrelated interpreter work is required.

See also: Faster CPython (Hacker News).

Backblaze (2022):

Our team had some fun experimenting with Python 3.9-nogil, the results of which will be reported in an upcoming blog post. In the meantime, we saw an opportunity to dive deeper into the history of the global interpreter lock (GIL), including why it makes Python so easy to integrate with and the tradeoff between ease and performance.

PEP 703:

This PEP proposes adding a build configuration (--disable-gil) to CPython to let it run Python code without the global interpreter lock and with the necessary changes needed to make the interpreter thread-safe.
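As a rough sketch of how a program might tell which kind of build it is running on: as of CPython 3.13, the `Py_GIL_DISABLED` config variable and `sys._is_gil_enabled()` are available for this. Both postdate the PEP text quoted here, so the code guards for their absence. The helper name `gil_status` is made up.

```python
import sys
import sysconfig


def gil_status():
    """Return (free_threaded_build, gil_enabled) for the running interpreter."""
    # Py_GIL_DISABLED is 1 on free-threaded builds; it is 0 or missing otherwise.
    free_threaded_build = bool(sysconfig.get_config_var("Py_GIL_DISABLED"))
    # sys._is_gil_enabled() exists on CPython 3.13+. If it is missing,
    # assume an older interpreter where the GIL is always on.
    check = getattr(sys, "_is_gil_enabled", None)
    gil_enabled = bool(check()) if check is not None else True
    return free_threaded_build, gil_enabled


free_threaded_build, gil_enabled = gil_status()
print("free-threaded build:", free_threaded_build)
print("GIL currently enabled:", gil_enabled)
```

The two values are reported separately because a free-threaded build can still run with the GIL turned back on at runtime.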


The GIL is a CPython implementation detail that limits multithreaded parallelism, so it might seem unintuitive to think of it as a usability issue. However, library authors frequently care a great deal about performance and will design APIs that support working around the GIL. These workarounds frequently lead to APIs that are more difficult to use. Consequently, users of these APIs may experience the GIL as a usability issue and not just a performance issue.


Removing the GIL requires changes to CPython’s reference counting implementation to make it thread-safe. Furthermore, it needs to have low execution overhead and allow for efficient scaling with multiple threads. This PEP proposes a combination of three techniques to address these constraints. The first is a switch from plain non-atomic reference counting to biased reference counting, which is a thread-safe reference counting technique with lower execution overhead than plain atomic reference counting. The other two techniques are immortalization and a limited form of deferred reference counting; they address some of the multi-threaded scalability issues with reference counting by avoiding some reference count modifications.
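The idea behind biased reference counting can be illustrated with a toy model in Python. This is only a sketch of the concept, not CPython's implementation: the class and field names are hypothetical, a `threading.Lock` stands in for atomic instructions, and merging the two counts, deallocation, and immortalization are all left out.

```python
import threading


class BiasedRefCount:
    """Toy model: the owning thread gets a cheap, unsynchronized counter."""

    def __init__(self):
        self.owner = threading.get_ident()  # thread that created the object
        self.local = 1    # touched only by the owner, no synchronization
        self.shared = 0   # touched by other threads; may go negative
        self._shared_lock = threading.Lock()  # stand-in for atomic operations

    def incref(self):
        if threading.get_ident() == self.owner:
            self.local += 1            # fast path for the owner
        else:
            with self._shared_lock:    # slow path for every other thread
                self.shared += 1

    def decref(self):
        if threading.get_ident() == self.owner:
            self.local -= 1
        else:
            with self._shared_lock:
                self.shared -= 1

    def total(self):
        with self._shared_lock:
            return self.local + self.shared


def worker(rc):
    for _ in range(1000):
        rc.incref()
        rc.decref()
    rc.incref()  # each worker keeps one reference


rc = BiasedRefCount()
for _ in range(3):
    rc.incref()  # owner-side references, fast path only

threads = [threading.Thread(target=worker, args=(rc,)) for _ in range(4)]
for t in threads:
    t.start()
for t in threads:
    t.join()

print(rc.local, rc.shared, rc.total())
```

The point of the split is that most objects are only ever touched by the thread that created them, so most reference count updates take the unsynchronized path.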


Using mimalloc, with some modifications, also addresses two other issues related to removing the GIL. First, traversing the internal mimalloc structures allows the garbage collector to find all Python objects without maintaining a linked list. This is described in more detail in the garbage collection section. Second, mimalloc heaps and allocations based on size class enable collections like dict to generally avoid acquiring locks during read-only operations.


This PEP proposes using per-object locks to provide many of the same protections that the GIL provides. For example, every list, dictionary, and set will have an associated lightweight lock. All operations that modify the object must hold the object’s lock. Most operations that read from the object should acquire the object’s lock as well; the few read operations that can proceed without holding a lock are described below.
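By way of analogy, a per-object lock looks something like the following in pure Python. The PEP's locks live inside CPython's C implementation of the built-in containers, so this wrapper class (a made-up `LockedList`) only illustrates the locking discipline, not how CPython does it.

```python
import threading


class LockedList:
    """A list wrapper in which every instance carries its own lock."""

    def __init__(self, items=()):
        self._items = list(items)
        self._lock = threading.Lock()  # one lock per object, not one global lock

    def append(self, item):
        with self._lock:  # modifications must hold the object's lock
            self._items.append(item)

    def snapshot(self):
        with self._lock:  # this read also takes the lock
            return list(self._items)


def fill(target, start):
    for i in range(start, start + 1000):
        target.append(i)


shared = LockedList()
threads = [
    threading.Thread(target=fill, args=(shared, n * 1000)) for n in range(8)
]
for t in threads:
    t.start()
for t in threads:
    t.join()

items = shared.snapshot()
print(len(items))
```

Because each object has its own lock, threads working on different lists never contend with one another, which is what a single global lock cannot offer.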

There are some backwards compatibility issues with the C API.

Carl Meyer (via Hacker News):

We’ve had a chance to discuss this internally with the right people. Our team believes in the value that nogil will provide, and we are committed to working collaboratively to improve Python for everyone.

If PEP 703 is accepted, Meta can commit to support in the form of three engineer-years (from engineers experienced working in CPython internals) between the acceptance of PEP 703 and the end of 2025, to collaborate with the core dev team on landing the PEP 703 implementation smoothly in CPython and on ongoing improvements to the compatibility and performance of nogil CPython.


Update (2023-07-31): Thomas Wouters (via Hacker News):

It’s clear that the overall sentiment is positive, both for the general idea and for PEP 703 specifically. The Steering Council is also largely positive on both. We intend to accept PEP 703, although we’re still working on the acceptance details.

Update (2023-08-22): Jake Edge (Hacker News):

If the Python community finds that the switch is “just going to be too disruptive for too little gain”, the council wants to be able to change its mind anytime before declaring no-GIL as the default mode for the language. He outlined the steps that the council sees, starting with a short-term (perhaps for Python 3.13, which is due in October 2024) experimental no-GIL build of the interpreter that core developers and others can try out. In the medium term, no-GIL would be a supported option, but not the default; when that happens depends a lot on how quickly the community adopts and supports the no-GIL build. In the long term, no-GIL would be the default build and the GIL would be completely excised (“without unnecessarily breaking backward compatibility”).


It is quite a turning point in the history of the language, but the work is (obviously) not done yet. There is a huge amount of researching, coding, testing, experimenting, documenting, and so on between here and a no-GIL-only version of the language in, say, Python 3.17 in October 2028. One guesses that the work will not be done, then, either—there will be more optimizations to be found and applied if there is still funding available to do so.


Immich

Hau Tran (via Hacker News):

Self-hosted photo and video backup solution directly from your mobile phone.

The demo is impressive compared with other such services that I’ve used. And there’s an iOS app that can auto-upload new photos.


Update (2023-08-04): Christian Tietze:

Any #FOSS / #selfhosting enthusiasts for photo sharing in my timeline?

We want to share 700 pictures of our wedding weekend.

I tried Piwigo -- it’s cumbersome, but it allows guests to upload stuff from their mobile phones. (In theory)

I tried PhotoPrism -- it’s sleek and responsive, but I can’t create nested-albums and share the parent one (e.g. wedding day → party vs wedding day → ceremony)

@nextcloud Photos is just so buggy. They fixed one issue of privately shared links, but now visitors with read-only guest access can delete/rotate/… photos 🙄

Found Immich on @mjtsai’s blog yesterday, too. Sounds veeeeery intriguing, but with a focus on backup (sadly), not sharing with family.

Mid-1990s Sega Document Leak

Kevin Purdy (Hacker News):

Most of the changes on the Sega Retro wiki every day are tiny things, like single-line tweaks to game details or image swaps. Early Monday morning, the site got something else: A 47MB, 272-page PDF full of confidential emails, notes, and other documents from inside a company with a rich history, a strong new competitor, and deep questions about what to do next.

The document offers glimpses, windows, and sometimes pure numbers that explain how Sega went from a company that broke Nintendo’s near-monopoly in the early 1990s to giving up on consoles entirely after the Dreamcast. Enthusiasts and historians can see the costs, margins, and sales of every Sega system sold in America by 1997 in detailed business plan spreadsheets. Sega’s Wikipedia page will likely be overhauled with the information contained in inter-departmental emails, like the one where CEO Tom Kalinske assures staff (and perhaps himself) that “we are killing Sony” in Japan in March 1996.