marvin.beta.vision

Beta

Vision support in Marvin is still in beta because OpenAI has not yet finalized its vision API. It works as expected, but its behavior and interface may change.

This module contains tools for working with the vision API, including `caption` and vision-enhanced versions of `cast`, `extract`, and `classify`.

`caption`

Synchronously generates a caption for an image using a language model.

Parameters:

Name	Type	Description	Default
`data`	`Union[str, Path, Image]`	URL or local path of the image, or an `Image` object.	required
`instructions`	`str`	Instructions for the caption generation.	`None`
`model_kwargs`	`dict`	Additional arguments for the language model.	`None`

Returns:

Name	Type	Description
`str`	`str`	Generated caption.
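A minimal usage sketch, assuming Marvin is installed and an OpenAI API key is configured. The helper name `caption_image` and the example URL are hypothetical; the call itself uses only the `data` and `instructions` parameters documented above.

```python
from pathlib import Path
from typing import Optional, Union


def caption_image(source: Union[str, Path], instructions: Optional[str] = None) -> str:
    """Hypothetical helper: caption one image given its URL or local path."""
    # Imported inside the function so this module loads without Marvin installed.
    # Calling the helper sends a request to the model.
    from marvin.beta.vision import caption

    return caption(source, instructions=instructions)


# Example call (hypothetical URL; requires network access and an API key):
# caption_image("https://example.com/harbor.jpg", instructions="One short sentence.")
```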

`caption_async` `async`

Generates a caption for an image using a language model.

Parameters:

Name	Type	Description	Default
`data`	`Union[str, Path, Image]`	URL or local path of the image or images.	required
`instructions`	`str`	Instructions for the caption generation.	`None`
`model_kwargs`	`dict`	Additional arguments for the language model.	`None`

Returns:

Name	Type	Description
`str`	`str`	Generated caption.
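Because `caption_async` is a coroutine function, several images can be captioned concurrently with `asyncio.gather`. This is a sketch under stated assumptions: `caption_many` is a hypothetical helper, Marvin and an API key are assumed to be available, and the helper issues one request per image (one caption each) rather than passing all images in a single call.

```python
import asyncio
from pathlib import Path
from typing import Sequence, Union


async def caption_many(sources: Sequence[Union[str, Path]]) -> list[str]:
    """Hypothetical helper: caption each image concurrently, one request per image."""
    # Lazy import keeps this module importable without Marvin; awaiting the
    # coroutine requires Marvin and a configured OpenAI API key.
    from marvin.beta.vision import caption_async

    # gather() preserves input order, so captions line up with `sources`.
    captions = await asyncio.gather(*(caption_async(source) for source in sources))
    return list(captions)


# Example (hypothetical paths):
# asyncio.run(caption_many(["photos/cat.jpg", "photos/dog.jpg"]))
```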

`cast`

Synchronously converts the input data into the specified type using a vision model.

Parameters:

Name	Type	Description	Default
`data`	`Union[str, Image]`	The data to be converted.	required
`target`	`type[T]`	The type to convert the data into. If omitted and `instructions` are provided, defaults to `str`.	`None`
`instructions`	`str`	Specific instructions for the conversion.	`None`
`images`	`list[Image]`	The images to be processed.	`None`
`vision_model_kwargs`	`dict`	Additional keyword arguments for the vision model.	`None`
`model_kwargs`	`dict`	Additional keyword arguments for the language model.	`None`

Returns:

Name	Type	Description
`T`	`T`	The converted data of the specified type.
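A sketch of casting an image to a built-in type. It assumes Marvin is installed with an API key configured, and that `Image` can be imported from `marvin.beta` and constructed from a URL; `count_people` and its prompt are illustrative only.

```python
def count_people(image_url: str) -> int:
    """Hypothetical helper: ask the model how many people an image shows."""
    # Lazy imports; calling this requires Marvin and an OpenAI API key.
    # Assumption: Image is exported by marvin.beta.
    from marvin.beta import Image
    from marvin.beta.vision import cast

    # Assumption: wrapping the URL in Image marks it as an image rather than
    # as text data to convert.
    return cast(
        Image(image_url),
        target=int,
        instructions="Count the people visible in the image.",
    )
```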

`cast_async` `async`

Converts the input data into the specified type using a vision model.

This function uses a vision model and a language model to convert the input data into the specified type. Optional `instructions` guide the conversion, and `vision_model_kwargs` and `model_kwargs` pass additional arguments to the respective models.

Parameters:

Name	Type	Description	Default
`images`	`list[Image]`	The images to be processed.	`None`
`data`	`str`	The data to be converted.	required
`target`	`type`	The type to convert the data into. If not provided but instructions are provided, assumed to be str.	`None`
`instructions`	`str`	Specific instructions for the conversion. Defaults to None.	`None`
`vision_model_kwargs`	`dict`	Additional keyword arguments for the vision model. Defaults to None.	`None`
`model_kwargs`	`dict`	Additional keyword arguments for the language model. Defaults to None.	`None`

Returns:

Name	Type	Description
`T`	`T`	The converted data of the specified type.
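A sketch that combines text `data` with extra `images`, assuming Marvin is installed, an API key is configured, and `Image` is importable from `marvin.beta`. The helper `listing_title` is hypothetical, and the `{"temperature": 0}` value assumes `model_kwargs` is forwarded to the language model request.

```python
from typing import Sequence


async def listing_title(description: str, image_urls: Sequence[str]) -> str:
    """Hypothetical helper: combine a text description with product photos."""
    # Lazy imports; awaiting this requires Marvin and an OpenAI API key.
    # Assumption: Image is exported by marvin.beta.
    from marvin.beta import Image
    from marvin.beta.vision import cast_async

    return await cast_async(
        # Text data to convert.
        description,
        target=str,
        instructions="Write a one-line product title.",
        # Photos supply extra visual context for the conversion.
        images=[Image(url) for url in image_urls],
        # Assumption: forwarded to the language model request.
        model_kwargs={"temperature": 0},
    )
```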

`classify`

Synchronously classifies the provided data and/or images into one of the specified labels.

Parameters:

Name	Type	Description	Default
`data`	`Union[str, Image]`	Data or an image for classification.	required
`labels`	`Union[Enum, list[T], type]`	Labels to classify into.	required
`images`	`Union[Image, list[Image]]`	Additional images for classification.	`None`
`instructions`	`str`	Instructions for the classification.	`None`
`vision_model_kwargs`	`dict`	Arguments for the vision model.	`None`
`model_kwargs`	`dict`	Arguments for the language model.	`None`

Returns:

Name	Type	Description
`T`	`T`	Label that the data/images were classified into.
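A sketch using an `Enum` class as `labels`, one of the label forms listed above. It assumes Marvin is installed, an API key is configured, and `Image` is importable from `marvin.beta`; `Scene` and `classify_scene` are hypothetical names.

```python
from enum import Enum


class Scene(Enum):
    """Hypothetical label set for classification."""

    INDOOR = "indoor"
    OUTDOOR = "outdoor"


def classify_scene(image_url: str) -> Scene:
    """Hypothetical helper: classify an image as indoor or outdoor."""
    # Lazy imports; calling this requires Marvin and an OpenAI API key.
    # Assumption: Image is exported by marvin.beta.
    from marvin.beta import Image
    from marvin.beta.vision import classify

    # Passing the Enum class limits the answer to its members
    # (assumed to come back as a Scene member).
    return classify(Image(image_url), labels=Scene)
```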

`classify_async` `async`

Classifies the provided data and/or images into one of the specified labels.

Parameters:

Name	Type	Description	Default
`data`	`Union[str, Image]`	Data or an image for classification.	required
`labels`	`Union[Enum, list[T], type]`	Labels to classify into.	required
`images`	`Union[Union[str, Path], list[Union[str, Path]]]`	Additional images for classification.	`None`
`instructions`	`str`	Instructions for the classification.	`None`
`vision_model_kwargs`	`dict`	Arguments for the vision model.	`None`
`model_kwargs`	`dict`	Arguments for the language model.	`None`

Returns:

Name	Type	Description
`T`	`T`	Label that the data/images were classified into.
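A sketch using a plain list of string labels together with both keyword-argument dictionaries. It assumes Marvin is installed, an API key is configured, `Image` is importable from `marvin.beta`, and both dictionaries are forwarded to the underlying chat-completion requests (so standard keys such as `max_tokens` and `temperature` apply). `DOCUMENT_LABELS` and `classify_document` are hypothetical.

```python
DOCUMENT_LABELS = ["receipt", "invoice", "other"]


async def classify_document(image_url: str) -> str:
    """Hypothetical helper: classify a scanned document into one of DOCUMENT_LABELS."""
    # Lazy imports; awaiting this requires Marvin and an OpenAI API key.
    # Assumption: Image is exported by marvin.beta.
    from marvin.beta import Image
    from marvin.beta.vision import classify_async

    return await classify_async(
        Image(image_url),
        labels=DOCUMENT_LABELS,
        instructions="Classify the type of document shown.",
        # Assumption: forwarded to the vision and language model requests.
        vision_model_kwargs={"max_tokens": 300},
        model_kwargs={"temperature": 0},
    )
```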

`extract`

Synchronously extracts information from the provided data and/or images using a vision model.

Parameters:

Name	Type	Description	Default
`data`	`Union[str, Image]`	Data or an image for information extraction.	required
`target`	`type[T]`	The type to extract the data into.	required
`instructions`	`str`	Instructions for extraction.	`None`
`images`	`list[Image]`	Additional images for extraction.	`None`
`vision_model_kwargs`	`dict`	Arguments for the vision model.	`None`
`model_kwargs`	`dict`	Arguments for the language model.	`None`

Returns:

Name	Type	Description
`T`	`T`	Extracted data of the specified type.
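A sketch of extracting text items from an image, assuming Marvin is installed, an API key is configured, and `Image` is importable from `marvin.beta`. The helper `extract_ingredients` is hypothetical; its return type is left unannotated because the exact shape of the result is not specified beyond "data of the specified type".

```python
def extract_ingredients(image_url: str):
    """Hypothetical helper: pull ingredient names out of a photographed recipe card."""
    # Lazy imports; calling this requires Marvin and an OpenAI API key.
    # Assumption: Image is exported by marvin.beta.
    from marvin.beta import Image
    from marvin.beta.vision import extract

    # target=str requests the extracted information as strings.
    return extract(Image(image_url), target=str, instructions="Ingredient names only.")
```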

`extract_async` `async`

Extracts information from provided data and/or images using a vision model.

Parameters:

Name	Type	Description	Default
`data`	`Union[str, Image]`	Data or an image for information extraction.	required
`target`	`type[T]`	The type to extract the data into.	required
`instructions`	`str`	Instructions for extraction.	`None`
`images`	`list[Union[str, Path]]`	Additional images for extraction.	`None`
`vision_model_kwargs`	`dict`	Arguments for the vision model.	`None`
`model_kwargs`	`dict`	Arguments for the language model.	`None`

Returns:

Name	Type	Description
`T`	`T`	Extracted data of the specified type.
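A sketch that extracts from text `data` and attached `images` together, assuming Marvin is installed, an API key is configured, and `Image` is importable from `marvin.beta`. `extract_dates` is a hypothetical helper, and the date format in the instructions is only a prompt, not an enforced constraint.

```python
from typing import Sequence


async def extract_dates(note: str, screenshot_urls: Sequence[str]):
    """Hypothetical helper: find dates in a text note and its attached screenshots."""
    # Lazy imports; awaiting this requires Marvin and an OpenAI API key.
    # Assumption: Image is exported by marvin.beta.
    from marvin.beta import Image
    from marvin.beta.vision import extract_async

    return await extract_async(
        note,
        target=str,
        instructions="Dates only, formatted as YYYY-MM-DD.",
        images=[Image(url) for url in screenshot_urls],
    )
```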

`generate_vision_response` `async`

Generates a language model response based on a provided prompt template and images.

Parameters:

Name	Type	Description	Default
`images`	`list[Image]`	Images used in the prompt, either URLs or local paths.	required
`prompt_template`	`str`	Template for the language model prompt.	required
`prompt_kwargs`	`dict`	Keyword arguments for the prompt.	`None`
`model_kwargs`	`dict`	Keyword arguments for the language model.	`None`

Returns:

Name	Type	Description
`ChatResponse`	`ChatResponse`	Response from the language model.
