Classifying images

Marvin can use OpenAI's vision API to process images and classify them into categories.

The marvin.beta.classify function is an enhanced version of marvin.classify that accepts images as well as text.

Beta

Please note that vision support in Marvin is still in beta, as OpenAI has not yet finalized its vision API. It works as expected, but it is subject to change.

What it does

The classify function assigns an image one label from a list of labels that you provide.

How it works

Classification is a two-step process. First, a vision model generates a caption for the image that is tailored to the classification goal. Next, an LLM performs the actual classification on that caption.
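The two-step flow can be sketched in plain Python. This is a minimal illustration, not Marvin's implementation. `classify_image`, `fake_caption` and `keyword_classify` are hypothetical names. The two "model" calls are offline toy stand-ins, so the sketch runs without an API key.

```python
import re
from typing import Callable, Optional, Sequence

# Hypothetical signatures for the two model calls (NOT Marvin APIs):
CaptionFn = Callable[[str, str], str]             # (image_url, prompt) -> caption
ClassifyFn = Callable[[str, Sequence[str]], str]  # (text, labels) -> label


def classify_image(
    image_url: str,
    labels: Sequence[str],
    caption: CaptionFn,
    classify_text: ClassifyFn,
    instructions: Optional[str] = None,
) -> str:
    # Step 1: caption the image with a prompt aimed at the classification goal,
    # so the description mentions the details the labels depend on.
    prompt = "Describe this image in enough detail to choose one of: " + ", ".join(labels) + "."
    if instructions:
        prompt += " " + instructions
    description = caption(image_url, prompt)

    # Step 2: classify the caption text into exactly one of the labels.
    label = classify_text(description, labels)
    if label not in labels:
        raise ValueError(f"classifier returned unknown label {label!r}")
    return label


# Toy stand-ins so the sketch runs offline (hypothetical).
def fake_caption(image_url: str, prompt: str) -> str:
    return "A wet dog swimming in a lake."


def keyword_classify(text: str, labels: Sequence[str]) -> str:
    # Pick the first label that appears as a whole word in the text.
    words = set(re.findall(r"[a-z]+", text.lower()))
    for label in labels:
        if label in words:
            return label
    return labels[0]  # fall back to the first label


url = "https://upload.wikimedia.org/wikipedia/commons/d/d5/Retriever_in_water.jpg"
animal = classify_image(url, ["dog", "cat", "bird", "fish", "deer"], fake_caption, keyword_classify)
dry_or_wet = classify_image(
    url, ["dry", "wet"], fake_caption, keyword_classify, instructions="Is the animal wet?"
)
```

Because the captioning prompt includes the labels and instructions, the caption is steered toward the information the second step needs.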

Example

We will classify the animal in this image, as well as whether it is wet or dry:

import marvin

# Load the image from a URL
img = marvin.beta.Image('https://upload.wikimedia.org/wikipedia/commons/d/d5/Retriever_in_water.jpg')

# Classify which animal appears in the image
animal = marvin.beta.classify(
    img,
    labels=['dog', 'cat', 'bird', 'fish', 'deer']
)

# Use instructions to steer the classification toward a specific question
dry_or_wet = marvin.beta.classify(
    img,
    labels=['dry', 'wet'],
    instructions='Is the animal wet?'
)

Result

assert animal == 'dog'
assert dry_or_wet == 'wet'

Model parameters

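You can pass parameters to the underlying APIs with the model_kwargs and vision_model_kwargs arguments of classify. vision_model_kwargs applies to the vision model that captions the image, and model_kwargs applies to the LLM that performs the classification. These parameters are passed directly to the respective APIs, so you can use any parameter those APIs support.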
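The routing of the two keyword-argument dictionaries can be sketched with stubs. This is a hypothetical illustration, not Marvin's source. `classify_image`, `fake_vision_api` and `fake_llm_api` are made-up names. `temperature` and `max_tokens` are used only as examples of OpenAI request parameters.

```python
from typing import Any, Dict, List, Optional, Sequence, Tuple

# Records which stub API received which keyword arguments.
calls: List[Tuple[str, Dict[str, Any]]] = []


def fake_vision_api(image_url: str, prompt: str, **kwargs: Any) -> str:
    # Hypothetical stand-in for the vision (captioning) request.
    calls.append(("vision", kwargs))
    return "A wet dog swimming in a lake."


def fake_llm_api(text: str, labels: Sequence[str], **kwargs: Any) -> str:
    # Hypothetical stand-in for the LLM classification request.
    calls.append(("llm", kwargs))
    return labels[0]


def classify_image(
    image_url: str,
    labels: Sequence[str],
    model_kwargs: Optional[Dict[str, Any]] = None,
    vision_model_kwargs: Optional[Dict[str, Any]] = None,
) -> str:
    # vision_model_kwargs are forwarded unchanged to the captioning call.
    caption = fake_vision_api(image_url, "Describe the image.", **(vision_model_kwargs or {}))
    # model_kwargs are forwarded unchanged to the classification call.
    return fake_llm_api(caption, labels, **(model_kwargs or {}))


label = classify_image(
    "https://upload.wikimedia.org/wikipedia/commons/d/d5/Retriever_in_water.jpg",
    ["dog", "cat"],
    model_kwargs={"temperature": 0},
    vision_model_kwargs={"max_tokens": 300},
)
```

Keeping the two dictionaries separate lets you tune the captioning and classification requests independently.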

Async support

If you are using Marvin in an async environment, you can use classify_async:

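import asyncio
import marvin

async def main():
    # await is only valid inside a coroutine
    result = await marvin.beta.classify_async(
        "The app crashes when I try to upload a file.",
        labels=["bug", "feature request", "inquiry"]
    )
    assert result == "bug"

asyncio.run(main())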
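A common reason to use the async variant is to classify many inputs concurrently. This minimal sketch shows the pattern with asyncio.gather. `fake_classify_async` and `classify_many` are hypothetical names, and `fake_classify_async` is a toy keyword-based stand-in, not Marvin's classifier. In real code you would await marvin.beta.classify_async in its place.

```python
import asyncio
from typing import List, Sequence


async def fake_classify_async(text: str, labels: Sequence[str]) -> str:
    # Hypothetical stand-in for an async classifier; a real one would await a network call.
    await asyncio.sleep(0.01)  # simulate I/O latency
    lowered = text.lower()
    if "crash" in lowered and "bug" in labels:
        return "bug"
    if "add" in lowered and "feature request" in labels:
        return "feature request"
    return "inquiry" if "inquiry" in labels else labels[0]


async def classify_many(texts: Sequence[str], labels: Sequence[str]) -> List[str]:
    # asyncio.gather runs the coroutines concurrently and returns results in input order.
    return list(await asyncio.gather(*(fake_classify_async(t, labels) for t in texts)))


tickets = [
    "The app crashes when I try to upload a file.",
    "Please add a dark mode.",
    "How do I reset my password?",
]
results = asyncio.run(classify_many(tickets, ["bug", "feature request", "inquiry"]))
```

Because the requests overlap while each one waits on I/O, the total time is close to that of the slowest single call rather than the sum of all calls.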