Marvin can use OpenAI's vision API to process images and classify them into categories.
The marvin.beta.classify function is an enhanced version of marvin.classify that accepts images as well as text.
Please note that vision support in Marvin is still in beta, as OpenAI has not finalized the vision API yet. While it works as expected, it is subject to change.
What it does
The classify function can classify an image as one of several labels.
How it works
This is a two-step process: first, a caption aligned with the classification goal is generated for the image. Next, an LLM performs the actual classify operation.
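The data flow of these two steps can be sketched with stand-in functions. This is only an illustration: caption_image and classify_text are hypothetical callables, the toy implementations below call no model, and none of this reflects Marvin's internal code.

```python
from typing import Callable, Sequence


def classify_image(
    image_ref: str,
    labels: Sequence[str],
    caption_image: Callable[[str, Sequence[str]], str],
    classify_text: Callable[[str, Sequence[str]], str],
) -> str:
    # Step 1: caption the image, passing the labels so the caption
    # can focus on what matters for the classification.
    caption = caption_image(image_ref, labels)
    # Step 2: classify the caption text into one of the labels.
    return classify_text(caption, labels)


# Toy stand-ins (hypothetical) so the sketch runs without API access.
def toy_caption(image_ref: str, labels: Sequence[str]) -> str:
    return "A golden retriever shaking off water; the dog is soaking wet."


def toy_classify(text: str, labels: Sequence[str]) -> str:
    # Return the first label mentioned in the caption.
    # A real implementation would ask an LLM instead.
    lowered = text.lower()
    for label in labels:
        if label in lowered:
            return label
    raise ValueError("no label matched the caption")


result = classify_image("dog.jpg", ["dry", "wet"], toy_caption, toy_classify)
```

Separating the caption step from the classification step means the second step only ever sees text, which is why the same classify operation works for both text and images.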
We will classify the animal in this image, as well as whether it is wet or dry:
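A minimal sketch of such a call, assuming the Marvin 2.x beta API, where marvin.beta.Image wraps an image URL. The URL is a placeholder and the animal label list is illustrative; neither is taken from the original example. Running it requires Marvin to be installed and an OpenAI API key to be configured.

```python
from typing import Tuple

# Placeholder URL (assumption): substitute the image you want to classify.
IMAGE_URL = "https://example.com/wet-dog.jpg"

# Illustrative label sets (assumption).
ANIMAL_LABELS = ["dog", "cat", "bird", "fish", "deer"]
MOISTURE_LABELS = ["dry", "wet"]


def classify_animal_and_moisture(image_url: str = IMAGE_URL) -> Tuple[str, str]:
    # Imported here so the module loads even where Marvin is not installed.
    import marvin

    # Assumption: marvin.beta.Image accepts a URL, as in the 2.x beta API.
    img = marvin.beta.Image(image_url)

    # First classification: which animal is shown?
    animal = marvin.beta.classify(img, labels=ANIMAL_LABELS)

    # Second classification: instructions steer the decision.
    moisture = marvin.beta.classify(
        img,
        labels=MOISTURE_LABELS,
        instructions="Is the animal wet?",
    )
    return animal, moisture
```

Each call returns one of the supplied labels, so the two results can be compared directly against the label lists.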
You can pass parameters to the underlying APIs via the model_kwargs and vision_model_kwargs arguments of classify. These parameters are passed directly to the respective APIs, so you can use any supported parameter.
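As a hedged illustration, the sketch below forwards max_tokens, a standard OpenAI chat completion parameter, to the vision step. The value, the helper function name, and the use of marvin.beta.Image are assumptions chosen for the example.

```python
from typing import Sequence

# Illustrative setting (assumption): cap the length of the generated caption.
VISION_KWARGS = {"max_tokens": 200}


def classify_with_vision_options(image_url: str, labels: Sequence[str]) -> str:
    # Imported here so the module loads even where Marvin is not installed.
    import marvin

    # Assumption: marvin.beta.Image accepts a URL, as in the 2.x beta API.
    img = marvin.beta.Image(image_url)
    return marvin.beta.classify(
        img,
        labels=list(labels),
        # Forwarded unchanged to the vision API call.
        vision_model_kwargs=VISION_KWARGS,
    )
```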
If you are using Marvin in an async environment, you can use