Converting images to data¶
Marvin can use OpenAI's vision API to process images and convert them into structured data, transforming unstructured information into native types that are appropriate for a variety of programmatic use cases.
The `marvin.beta.cast` function is an enhanced version of `marvin.cast` that accepts images as well as text.
Beta
Please note that vision support in Marvin is still in beta, as OpenAI has not yet finalized the vision API. While it works as expected, it is subject to change.
What it does
The `cast` function can cast images to structured types.
How it works
This involves a two-step process: first, a caption is generated for the image that is aligned with the structuring goal. Next, the actual `cast` operation is performed with an LLM.
Example: locations
We will cast this image to a `Location` type:
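A minimal sketch of what this might look like. The image URL, the field names on `Location`, and the use of a plain dataclass are all illustrative assumptions (Marvin also accepts Pydantic models as targets), and running the call requires a configured OpenAI API key:

```python
from dataclasses import dataclass


@dataclass
class Location:
    city: str
    state: str


def identify_location(image_url: str) -> Location:
    """Cast a photo to a Location. Requires marvin and an OpenAI API key."""
    import marvin

    image = marvin.beta.Image(image_url)  # wrap the URL as an image input
    return marvin.beta.cast(image, target=Location)


# e.g. identify_location("https://example.com/skyline.jpg")
# might produce something like Location(city="Chicago", state="Illinois")
```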
Example: getting information about a book
We will cast this image to a `Book` to extract key information:
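The shape of that call, sketched with an assumed `Book` type (the fields shown and the helper name are illustrative, not part of Marvin's API):

```python
from dataclasses import dataclass


@dataclass
class Book:
    title: str
    author: str
    year: int


def extract_book(image_url: str) -> Book:
    """Read key details off a book cover. Requires an OpenAI API key."""
    import marvin

    cover = marvin.beta.Image(image_url)
    # Marvin captions the cover, then casts the caption to the Book type
    return marvin.beta.cast(cover, target=Book)
```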
Instructions¶
If the target type isn't self-documenting, or you want to provide additional guidance, you can provide natural language `instructions` when calling `cast` in order to steer the output.
Example: checking groceries
Let's use this image to see if we got everything on our shopping list:
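One way this could look, with the helper name, image URL, and instruction wording as illustrative assumptions:

```python
def check_groceries(image_url: str, shopping_list: list[str]) -> list[str]:
    """Return shopping-list items missing from the photo.

    Hypothetical helper: requires marvin and an OpenAI API key.
    """
    import marvin

    photo = marvin.beta.Image(image_url)
    return marvin.beta.cast(
        photo,
        target=list[str],
        # Natural-language instructions steer the cast toward the goal
        instructions=(
            "Compare the groceries in the photo to this shopping list and "
            f"return only the items that are missing: {shopping_list}"
        ),
    )
```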
Model parameters¶
You can pass parameters to the underlying API via the `model_kwargs` and `vision_model_kwargs` arguments of `cast`. These parameters are passed directly to the respective APIs, so you can use any supported parameter.
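For example, you might pin the casting model's temperature and cap the vision model's output, sketched here with standard OpenAI parameters (the specific values and helper are illustrative):

```python
def cast_with_params(image_url: str) -> list[str]:
    """Cast an image while tuning both underlying API calls.

    Illustrative sketch; requires marvin and an OpenAI API key.
    """
    import marvin

    image = marvin.beta.Image(image_url)
    return marvin.beta.cast(
        image,
        target=list[str],
        model_kwargs={"temperature": 0.0},        # passed to the casting LLM call
        vision_model_kwargs={"max_tokens": 500},  # passed to the vision (captioning) call
    )
```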
Async support¶
If you are using `marvin` in an async environment, you can use `cast_async`:
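A sketch of the async form, assuming the same image-wrapping pattern as above (the URL and instructions are placeholders, and running it needs an OpenAI API key):

```python
import asyncio


async def main() -> None:
    import marvin

    image = marvin.beta.Image("https://example.com/skyline.jpg")
    # Same semantics as cast, but awaitable
    result = await marvin.beta.cast_async(
        image,
        target=str,
        instructions="Name the city shown in the photo",
    )
    print(result)


# asyncio.run(main())  # uncomment to run with a configured API key
```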
Mapping¶
To transform a list of inputs at once, use `.map`:
```python
import marvin

inputs = [
    "I bought two donuts.",
    "I bought six hot dogs.",
]
result = marvin.beta.cast.map(inputs, int)
assert result == [2, 6]
```
(`marvin.beta.cast_async.map` is also available for async environments.)
Mapping automatically issues parallel requests to the API, making it a highly efficient way to work with multiple inputs at once. The result is a list of outputs in the same order as the inputs.