Captioning images¶
Marvin can use OpenAI's vision API to process images as inputs.
Beta
Please note that vision support in Marvin is still in beta, as OpenAI has not finalized the vision API yet. While it works as expected, it is subject to change.
What it does
The caption
function generates text from images.
Example
Generate a description of the following image, hypothetically available at /path/to/marvin.png
:
import marvin
from pathlib import Path
caption = marvin.beta.caption(image=Path('/path/to/marvin.png'))
Result
"This is a digital illustration featuring a stylized, cute character resembling a Funko Pop vinyl figure with large, shiny eyes and a square-shaped head, sitting on abstract wavy shapes that simulate a landscape. The whimsical figure is set against a dark background with sparkling, colorful bokeh effects, giving it a magical, dreamy atmosphere."
How it works
Marvin passes your images to the OpenAI vision API as part of a larger prompt.
Model parameters¶
You can pass parameters to the underlying API via the model_kwargs
argument of caption
. These parameters are passed directly to the API, so you can use any supported parameter.
Async support¶
If you are using Marvin in an async environment, you can use caption_async
: