Monday, September 18, 2023

Apple’s New Transformer-Powered Predictive Text Model

Jack Cook (via Hacker News):

The feature will occasionally suggest more than one word at a time, but this is generally limited to instances where the upcoming words are extremely obvious, similar to the autocomplete in Gmail.


I have to say that this vocabulary file strikes me as pretty unique, but it’s definitely not out of the question for a language model deployed in this setting. I’ve personally never seen emojis featured so prominently in a language model’s tokenizer, but existing research has shown that domain-specific models and tokenizers can drastically improve downstream model performance. So it makes sense that a model trained for use in things like text messages, in which emojis and contractions will be used a lot, would prioritize them.


GPT-2 has four main parts: token embeddings, positional encodings, a series of 12-48 decoder blocks, and an output layer. The network described by unilm_joint_cpu appears to be the same, except with only 6 decoder blocks. Most of the layers within each decoder block have names like gpt2_transformer_layer_3d, which would also seem to suggest it’s based on a GPT-2 architecture.

From my calculations based on the sizes of each layer, Apple’s predictive text model appears to have about 34 million parameters and a hidden size of 512 units. This makes it much smaller than even the smallest version of GPT-2.
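Those figures can be sanity-checked: the parameter count of a GPT-2-style decoder follows mechanically from the vocabulary size, context length, hidden size, and number of blocks. The sketch below assumes standard GPT-2 conventions (4× MLP expansion, tied input/output embeddings, learned positional encodings); the vocabulary size and context length plugged in are illustrative placeholders, not figures from the article.

```python
def gpt2_param_count(vocab_size: int, n_ctx: int, d: int, n_layers: int) -> int:
    """Estimate parameters of a GPT-2-style decoder with tied output embeddings."""
    token_emb = vocab_size * d      # token embedding table (shared with output layer)
    pos_emb = n_ctx * d             # learned positional encodings
    attn = 4 * d * d + 4 * d        # Q, K, V, and output projections, plus biases
    mlp = 8 * d * d + 5 * d         # two linear layers with a 4*d hidden dimension
    layer_norms = 4 * d             # two LayerNorms per block (scale + shift each)
    block = attn + mlp + layer_norms
    final_ln = 2 * d                # final LayerNorm before the output layer
    return token_emb + pos_emb + n_layers * block + final_ln

# Hidden size 512 and 6 blocks match the article; the vocabulary size
# and context length here are hypothetical.
print(gpt2_param_count(vocab_size=15_000, n_ctx=512, d=512, n_layers=6))
```

With a hidden size of 512, each decoder block works out to about 3.15 million parameters, so six blocks contribute roughly 19 million; in a ~34 million parameter model, most of the remaining budget would sit in the embedding table.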

The early reports about auto-correct in iOS 17 and macOS 14 seem to be positive. I’m cautiously optimistic that it will fix the biggest problems I had with the old system: it suggested misspelled words and even changed correctly typed words into mistakes.


1 Comment

Beatrix Willius

The spelling is fine. But I still don't like working with predictive text. I type too fast and the predictive text doesn't help me type faster.
