{"id":47422,"date":"2025-04-17T13:25:14","date_gmt":"2025-04-17T17:25:14","guid":{"rendered":"https:\/\/mjtsai.com\/blog\/?p=47422"},"modified":"2025-04-17T13:25:14","modified_gmt":"2025-04-17T17:25:14","slug":"performance-of-the-python-3-14-tail-call-interpreter","status":"publish","type":"post","link":"https:\/\/mjtsai.com\/blog\/2025\/04\/17\/performance-of-the-python-3-14-tail-call-interpreter\/","title":{"rendered":"Performance of the Python 3.14 Tail-Call Interpreter"},"content":{"rendered":"<p><a href=\"https:\/\/blog.nelhage.com\/post\/cpython-tail-call\/\">Nelson Elhage<\/a> (via <a href=\"https:\/\/news.ycombinator.com\/item?id=43317592\">Hacker News<\/a>):<\/p>\n<blockquote cite=\"https:\/\/blog.nelhage.com\/post\/cpython-tail-call\/\">\n<p>Unfortunately, as I will document in this post, these impressive performance gains turned out to be <strong>primarily due to inadvertently working around a regression in LLVM 19.<\/strong> When benchmarked against a better baseline (such as GCC, clang-18, or LLVM 19 with certain tuning flags), the performance gain drops to 1-5% or so depending on the exact setup.<\/p>\n<p>[&#8230;]<\/p>\n<p>Historically, the optimization of replicating the bytecode dispatch into each opcode has been cited to speed up interpreters anywhere from <a href=\"https:\/\/github.com\/python\/cpython\/blob\/c718c6be0f82af5eb0e57615ce323242155ff014\/Misc\/HISTORY#L15252-L15255\">20%<\/a> to <a href=\"https:\/\/link.springer.com\/content\/pdf\/10.1007\/3-540-44681-8_59.pdf\">100%<\/a>.
However, on modern processors with improved branch predictors, <a href=\"https:\/\/inria.hal.science\/hal-01100647\/document\">more recent work<\/a> finds a much smaller speedup, on the order of 2-4%.<\/p>\n<p>[&#8230;]<\/p>\n<p>Still, <code>nix<\/code> was clearly enormously helpful here, and on net it definitely made this kind of multi-version exploration and debugging <strong>much<\/strong> saner than any other approach I can imagine.<\/p>\n<\/blockquote>","protected":false},"excerpt":{"rendered":"<p>Nelson Elhage (via Hacker News): Unfortunately, as I will document in this post, these impressive performance gains turned out to be primarily due to inadvertently working around a regression in LLVM 19. When benchmarked against a better baseline (such as GCC, clang-18, or LLVM 19 with certain tuning flags), the performance gain drops to 1-5% or [&hellip;]<\/p>\n","protected":false},"author":1,"featured_media":0,"comment_status":"open","ping_status":"closed","sticky":false,"template":"","format":"standard","meta":{"apple_news_api_created_at":"2025-04-17T17:25:16Z","apple_news_api_id":"64c93936-5ba7-443e-82bf-fd1aa136b14f","apple_news_api_modified_at":"2025-04-17T17:25:16Z","apple_news_api_revision":"AAAAAAAAAAD\/\/\/\/\/\/\/\/\/\/w==","apple_news_api_share_url":"https:\/\/apple.news\/AZMk5NlunRD6Cv_0aoTaxTw","apple_news_coverimage":0,"apple_news_coverimage_caption":"","apple_news_is_hidden":false,"apple_news_is_paid":false,"apple_news_is_preview":false,"apple_news_is_sponsored":false,"apple_news_maturity_rating":"","apple_news_metadata":"\"\"","apple_news_pullquote":"","apple_news_pullquote_position":"","apple_news_slug":"","apple_news_sections":"\"\"","apple_news_suppress_video_url":false,"apple_news_use_image_component":false,"footnotes":""},"categories":[4],"tags":[131,255,229,138,71,232],"class_list":["post-47422","post","type-post","status-publish","format-standard","hentry","category-programming-category","tag-bug","tag-compiler","tag-llvm","tag-optimization","tag-programming","tag-python"],"apple_news_notices":[],"_links":{"self":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/47422","targetHints":{"allow":["GET"]}}],"collection":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts"}],"about":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/types\/post"}],"author":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/users\/1"}],"replies":[{"embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/comments?post=47422"}],"version-history":[{"count":1,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/47422\/revisions"}],"predecessor-version":[{"id":47423,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/posts\/47422\/revisions\/47423"}],"wp:attachment":[{"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/media?parent=47422"}],"wp:term":[{"taxonomy":"category","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/categories?post=47422"},{"taxonomy":"post_tag","embeddable":true,"href":"https:\/\/mjtsai.com\/blog\/wp-json\/wp\/v2\/tags?post=47422"}],"curies":[{"name":"wp","href":"https:\/\/api.w.org\/{rel}","templated":true}]}}