marvin.beta.vision

Beta

Please note that vision support in Marvin is still in beta, as OpenAI has not finalized the vision API yet. While it works as expected, it is subject to change.

This module contains tools for working with the vision API, including vision-enhanced versions of cast, extract, and classify.

caption

Generates a caption for an image using a language model synchronously.

Parameters:

Name Type Description Default
image Union[str, Path, Image]

URL or local path of the image.

required
instructions str

Instructions for the caption generation.

None
model_kwargs dict

Additional arguments for the language model.

None

Returns:

Name Type Description
str str

Generated caption.
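
A minimal usage sketch, assuming Marvin is installed and an OpenAI API key is configured. The wrapper name `describe_image` and its `caption_fn` parameter are hypothetical and not part of Marvin; they exist only so the call can be exercised with a stand-in function. The keyword name `instructions` follows the parameter table above.

```python
from pathlib import Path
from typing import Callable, Optional, Union


def describe_image(
    image: Union[str, Path],
    instructions: Optional[str] = None,
    caption_fn: Optional[Callable[..., str]] = None,
) -> str:
    """Return a caption for `image` (hypothetical wrapper, not part of Marvin)."""
    if caption_fn is None:
        # Imported lazily so this module still loads where Marvin is absent.
        from marvin.beta.vision import caption as caption_fn
    # `image` may be a URL or a local path, per the parameter table.
    return caption_fn(image, instructions=instructions)
```

With Marvin available, `describe_image("photo.jpg", instructions="One short sentence.")` would forward to `caption`; the returned text depends on the model.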

caption_async async

Generates a caption for an image using a language model.

Parameters:

Name Type Description Default
image Union[str, Path, Image]

URL or local path of the image.

required
instructions str

Instructions for the caption generation.

None
model_kwargs dict

Additional arguments for the language model.

None

Returns:

Name Type Description
str str

Generated caption.
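
Because `caption_async` is a coroutine, several images can be captioned concurrently. The sketch below assumes that pattern is useful to you; `caption_many` and its `caption_fn` parameter are hypothetical names introduced for illustration, and only `caption_async` and `instructions` come from the reference above.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence


async def caption_many(
    images: Sequence[str],
    instructions: Optional[str] = None,
    caption_fn: Optional[Callable[..., Awaitable[str]]] = None,
) -> list[str]:
    """Caption several images concurrently (hypothetical helper)."""
    if caption_fn is None:
        # Lazy import: requires Marvin only when no stand-in is supplied.
        from marvin.beta.vision import caption_async as caption_fn
    # Start one coroutine per image; gather returns results in input order.
    tasks = [caption_fn(image, instructions=instructions) for image in images]
    return list(await asyncio.gather(*tasks))
```

From synchronous code this could be driven with `asyncio.run(caption_many([...]))`.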

cast

Converts the input data into the specified type using a vision model synchronously.

Parameters:

Name Type Description Default
data Union[str, Image]

The data to be converted.

required
target type[T]

The type to convert the data into.

required
instructions str

Specific instructions for the conversion.

None
images list[Image]

The images to be processed.

None
vision_model_kwargs dict

Additional keyword arguments for the vision model.

None
model_kwargs dict

Additional keyword arguments for the language model.

None

Returns:

Name Type Description
T T

The converted data of the specified type.

cast_async async

Converts the input data into the specified type using a vision model.

This function uses a vision model and a language model to convert the input data into a specified type. The conversion process can be guided by specific instructions. The function also supports additional arguments for both models.

Parameters:

Name Type Description Default
images list[Union[str, Path]]

The images to be processed.

None
data str

The data to be converted.

required
target type

The type to convert the data into.

required
instructions str

Specific instructions for the conversion.

None
vision_model_kwargs dict

Additional keyword arguments for the vision model.

None
model_kwargs dict

Additional keyword arguments for the language model.

None

Returns:

Name Type Description
T T

The converted data of the specified type.

classify

Classifies provided data and/or images into one of the specified labels synchronously.

Parameters:

Name Type Description Default
data Union[str, Image]

Data or an image for classification.

required
labels Union[Enum, list[T], type]

Labels to classify into.

required
images Union[Union[str, Path], list[Union[str, Path]]]

Additional images for classification.

None
instructions str

Instructions for the classification.

None
vision_model_kwargs dict

Arguments for the vision model.

None
model_kwargs dict

Arguments for the language model.

None

Returns:

Name Type Description
T T

Label that the data/images were classified into.
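
As an illustration of the `labels` and `images` parameters, the sketch below passes an `Enum` as the label set and a list of image URLs. The `ImageKind` enum, the `classify_image_kind` wrapper, and the `classify_fn` parameter are hypothetical; the keyword names match the table above, and Marvin plus an OpenAI API key are assumed for real calls.

```python
from enum import Enum
from typing import Any, Callable, Optional, Sequence


class ImageKind(Enum):
    """Hypothetical label set for this example."""
    PHOTO = "photo"
    DRAWING = "drawing"
    SCREENSHOT = "screenshot"


def classify_image_kind(
    description: str,
    image_urls: Sequence[str],
    classify_fn: Optional[Callable[..., Any]] = None,
) -> Any:
    """Classify images into an ImageKind label (hypothetical wrapper)."""
    if classify_fn is None:
        # Lazy import: requires Marvin only when no stand-in is supplied.
        from marvin.beta.vision import classify as classify_fn
    return classify_fn(
        data=description,          # text accompanying the images
        labels=ImageKind,          # an Enum is one of the accepted label forms
        images=list(image_urls),   # additional images, as URLs or paths
        instructions="Choose the label that best matches the image.",
    )
```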

classify_async async

Classifies provided data and/or images into one of the specified labels.

Parameters:

Name Type Description Default
data Union[str, Image]

Data or an image for classification.

required
labels Union[Enum, list[T], type]

Labels to classify into.

required
images Union[Union[str, Path], list[Union[str, Path]]]

Additional images for classification.

None
instructions str

Instructions for the classification.

None
vision_model_kwargs dict

Arguments for the vision model.

None
model_kwargs dict

Arguments for the language model.

None

Returns:

Name Type Description
T T

Label that the data/images were classified into.

extract

Extracts information from provided data and/or images using a vision model synchronously.

Parameters:

Name Type Description Default
data Union[str, Image]

Data or an image for information extraction.

required
target type[T]

The type to extract the data into.

required
instructions str

Instructions for extraction.

None
images list[Union[str, Path]]

Additional images for extraction.

None
vision_model_kwargs dict

Arguments for the vision model.

None
model_kwargs dict

Arguments for the language model.

None

Returns:

Name Type Description
T T

Extracted data of the specified type.
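
The following sketch shows how the `data`, `target`, `instructions`, and `images` parameters might be combined. The wrapper `extract_from_images` and its `extract_fn` parameter are hypothetical, added so the call shape can be checked without network access; a real call assumes Marvin is installed and an OpenAI API key is configured.

```python
from pathlib import Path
from typing import Any, Callable, Optional, Sequence, Union


def extract_from_images(
    text: str,
    target: type,
    image_paths: Sequence[Union[str, Path]],
    instructions: Optional[str] = None,
    extract_fn: Optional[Callable[..., Any]] = None,
) -> Any:
    """Extract `target`-typed values from text plus images (hypothetical wrapper)."""
    if extract_fn is None:
        # Lazy import: requires Marvin only when no stand-in is supplied.
        from marvin.beta.vision import extract as extract_fn
    return extract_fn(
        data=text,
        target=target,
        instructions=instructions,
        # Paths are normalized to strings; both forms are listed as accepted.
        images=[str(path) for path in image_paths],
    )
```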

extract_async async

Extracts information from provided data and/or images using a vision model.

Parameters:

Name Type Description Default
data Union[str, Image]

Data or an image for information extraction.

required
target type[T]

The type to extract the data into.

required
instructions str

Instructions for extraction.

None
images list[Union[str, Path]]

Additional images for extraction.

None
vision_model_kwargs dict

Arguments for the vision model.

None
model_kwargs dict

Arguments for the language model.

None

Returns:

Name Type Description
T T

Extracted data of the specified type.

generate_vision_response async

Generates a language model response based on a provided prompt template and images.

Parameters:

Name Type Description Default
images list[Image]

Images used in the prompt, either URLs or local paths.

required
prompt_template str

Template for the language model prompt.

required
prompt_kwargs dict

Keyword arguments for the prompt.

None
model_kwargs dict

Keyword arguments for the language model.

None

Returns:

Name Type Description
ChatResponse ChatResponse

Response from the language model.