Marvin can use OpenAI's vision API to process images as inputs.
Please note that vision support in Marvin is still in beta, as OpenAI has not finalized the vision API yet. While it works as expected, it is subject to change.
What it does
The `caption` function generates text from images.
For example, generating a description of an image (hypothetically available at a local path) might return:
"This is a digital illustration featuring a stylized, cute character resembling a Funko Pop vinyl figure with large, shiny eyes and a square-shaped head, sitting on abstract wavy shapes that simulate a landscape. The whimsical figure is set against a dark background with sparkling, colorful bokeh effects, giving it a magical, dreamy atmosphere."
How it works
Marvin passes your images to the OpenAI vision API as part of a larger prompt.
You can pass parameters to the underlying API via the `model_kwargs` argument of `caption`. These parameters are passed directly to the API, so you can use any supported parameter.
If you are using Marvin in an async environment, you can use the async variant, `caption_async`, with the same arguments.