Friday, October 29, 2021

Tesla’s Configurable Floating Point Formats

Tesla (PDF, via Reddit):

Tesla extended the reduced precision support further, and introduced the Configurable Float8 (CFloat8), an 8-bit floating point format, to further reduce the enormous pressure on memory storage and bandwidth in storing the weights, activations, and gradient values necessary for training the increasingly larger [neural] networks. Unlike the IEEE 754R standard, the purpose of this standard is mostly to standardize the formats and not necessarily to provide for portability of code to guarantee identical numerical result across all platforms.

The IEEE Float16 and Bfloat16 formats described above have a fixed number of bits allocated to the mantissa and exponent fields and have a fixed exponent bias. However, eight bits can only accommodate a small number of mantissa and exponent bits, so some configurability is required to ensure high accuracy and convergence of the training models.

One key property enabling this configurability is the fact that different parameters, namely weights, gradients and activations, have different precision and dynamic range requirements to achieve high training accuracy and convergence.


Due to the limited number of representable exponent values, Infinity and NaN encodings are not supported.
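To make the configurability concrete, here is a minimal Python sketch of how such an 8-bit format could be decoded. It is an illustration, not Tesla's specification. The assumptions are a 1-bit sign, an exponent field of `exp_bits` bits, the remaining bits as mantissa, IEEE-style subnormals when the exponent field is zero, and a caller-supplied `bias`. The function names and the example `(exp_bits, bias)` pairs are hypothetical. Because no encodings are reserved for Infinity or NaN, the all-ones exponent is decoded as an ordinary finite value.

```python
def decode_cfloat8(byte, exp_bits, bias):
    """Decode one byte of a hypothetical configurable 8-bit float.

    Assumed layout: 1 sign bit, exp_bits exponent bits, (7 - exp_bits)
    mantissa bits. This layout is an assumption, not Tesla's spec.
    """
    if not 0 <= byte <= 0xFF:
        raise ValueError("byte must be in the range 0..255")
    man_bits = 7 - exp_bits
    sign = -1.0 if (byte >> 7) else 1.0
    exp_field = (byte >> man_bits) & ((1 << exp_bits) - 1)
    man_field = byte & ((1 << man_bits) - 1)
    if exp_field == 0:
        # Subnormal (assumed IEEE-style): no implicit leading 1.
        return sign * man_field * 2.0 ** (1 - bias - man_bits)
    # Every nonzero exponent, including all-ones, is a normal number
    # because no codes are set aside for Infinity or NaN.
    return sign * (1 + man_field / (1 << man_bits)) * 2.0 ** (exp_field - bias)


def cfloat8_range(exp_bits, bias):
    """Return (smallest positive, largest finite) magnitudes for the layout."""
    return decode_cfloat8(0x01, exp_bits, bias), decode_cfloat8(0x7F, exp_bits, bias)


# More exponent bits widen the range but leave fewer mantissa bits.
print(cfloat8_range(4, 7))    # 4 exponent bits, 3 mantissa bits
print(cfloat8_range(5, 15))   # 5 exponent bits, 2 mantissa bits
# Changing only the bias shifts the whole range without adding bits.
print(cfloat8_range(4, 10))
```

Under these assumptions, the split between exponent and mantissa bits trades range for precision, and the bias slides the representable window toward larger or smaller magnitudes. That is the kind of per-tensor tuning the quoted passage describes for weights, gradients, and activations.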

See also: James Douma.

Update (2021-11-12): Miguel de Icaza:

We do something like that, with great results[…]
