Friday, December 29, 2023

Apple’s Ferret MLLM

Mike Wheatley (Hacker News, Reddit):

Artificial intelligence researchers from Apple Inc. and Cornell University quietly unveiled an open-source and multimodal large language model last October known as Ferret, which is said to use parts of images as queries.

According to VentureBeat, the release of Ferret on GitHub in October went completely under the radar, with no announcement being made. However, it has since gotten a lot of attention from AI researchers. Bart De Witte, who operates a non-profit focused on open-source AI in medicine, posted on X that the release of Ferret “solidifies Apple’s place as a leader in the multimodal AI space.”

Malcolm Owen:

Ferret’s release to open-source is being performed under a non-commercial license, so it cannot be commercialized in its current state.

[…]

A tweet from October by Apple AI/ML research scientist Zhe Gan explains Ferret’s use as being a system that can “refer and ground anything anywhere at any granularity” in an image. It can also do so by using any shape of region within an image.

[…]

In one interesting element from the GitHub release, Reddit’s r/Apple spotted that Ferret is “trained on 8 A100 GPUs with 80GB memory.” Given Apple’s history with Nvidia GPU support, this was seen to be a rare acknowledgment of the GPU producer.
