The French AI startup Mistral has released its first model that can process images as well as text. Called Pixtral 12B, the 12-billion-parameter model is about 24GB in size. Parameters roughly correspond to a model’s problem-solving skills, and models with more parameters generally perform better than those with fewer parameters. Built on one of Mistral’s text models, Nemo 12B, the new model can answer questions about an arbitrary number of images of an arbitrary size given either URLs or images encoded using base64, the binary-to-text encoding scheme. Similar to other multimodal models such as Anthropic’s Claude family and OpenAI’s GPT-4o, Pixtral 12B should — at least in theory — be able to perform tasks like captioning images and counting the number of objects in a photo.
-
Notifications
You must be signed in to change notification settings - Fork 0
Inference of MistralAI Pixtral 12B model
License
shrimantasatpati/MistralAI_12B_Pixtral
Folders and files
Name | Name | Last commit message | Last commit date | |
---|---|---|---|---|
Repository files navigation
About
Inference of MistralAI Pixtral 12B model
Topics
Resources
License
Stars
Watchers
Forks
Releases
No releases published
Packages 0
No packages published