marvin.beta.vision
Beta
Please note that vision support in Marvin is still in beta, as OpenAI has not finalized the vision API yet. While it works as expected, it is subject to change.
This module contains tools for working with the vision API, including vision-enhanced versions of `cast`, `extract`, and `classify`.
caption
Generates a caption for an image using a language model synchronously.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Union[str, Path, Image] | URL or local path of the image. | required |
instructions | str | Instructions for the caption generation. | None |
model_kwargs | dict | Additional arguments for the language model. | None |
Returns:
Name | Type | Description |
---|---|---|
str | str | Generated caption. |
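A minimal usage sketch that relies only on the parameters listed above. The wrapper `describe_image` and its `caption_fn` parameter are hypothetical, added so the call can be exercised with a stand-in function. By default the wrapper imports `caption` from `marvin.beta.vision`, which assumes Marvin is installed and OpenAI credentials are configured.

```python
from pathlib import Path
from typing import Callable, Optional, Union


def describe_image(
    image: Union[str, Path],
    instructions: Optional[str] = None,
    caption_fn: Optional[Callable[..., str]] = None,
) -> str:
    """Return a caption for one image given as a URL or local path."""
    if caption_fn is None:
        # Lazy import: Marvin is only needed when no stand-in is supplied.
        from marvin.beta.vision import caption as caption_fn
    return caption_fn(image, instructions=instructions)
```

A call might look like `describe_image("https://example.com/cat.png", instructions="One short sentence.")`, where the URL is a placeholder.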
caption_async
async
Generates a caption for an image using a language model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Union[str, Path, Image] | URL or local path of the image or images. | required |
instructions | str | Instructions for the caption generation. | None |
model_kwargs | dict | Additional arguments for the language model. | None |
Returns:
Name | Type | Description |
---|---|---|
str | str | Generated caption. |
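Because `caption_async` is a coroutine, several captions can be requested concurrently. The sketch below shows one way to do that with `asyncio.gather`. The helper `caption_many` and its `caption_fn` parameter are hypothetical; only the `data` and `instructions` arguments come from the table above.

```python
import asyncio
from typing import Awaitable, Callable, Optional, Sequence


async def caption_many(
    images: Sequence[str],
    instructions: Optional[str] = None,
    caption_fn: Optional[Callable[..., Awaitable[str]]] = None,
) -> list:
    """Caption several images concurrently; results keep the input order."""
    if caption_fn is None:
        # Lazy import: Marvin is only needed when no stand-in is supplied.
        from marvin.beta.vision import caption_async as caption_fn
    return await asyncio.gather(
        *(caption_fn(image, instructions=instructions) for image in images)
    )
```

From synchronous code this could be driven with `asyncio.run(caption_many([...]))`.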
cast
Converts the input data into the specified type using a vision model synchronously.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Union[str, Image] | The data to be converted. | required |
target | type[T] | The type to convert the data into. If not provided but instructions are provided, assumed to be str. | None |
instructions | str | Specific instructions for the conversion. | None |
images | list[Image] | The images to be processed. | None |
vision_model_kwargs | dict | Additional keyword arguments for the vision model. | None |
model_kwargs | dict | Additional keyword arguments for the language model. | None |
Returns:
Name | Type | Description |
---|---|---|
T | T | The converted data of the specified type. |
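As an illustration of the `target` and `instructions` parameters, the following sketch asks for an integer count. The helper `count_in_image` and its `cast_fn` parameter are hypothetical, and `image` is assumed to be an `Image` object that the caller has already constructed.

```python
from typing import Any, Callable, Optional


def count_in_image(
    image: Any,
    what: str,
    cast_fn: Optional[Callable[..., Any]] = None,
) -> int:
    """Ask for an integer count of `what` in `image`."""
    if cast_fn is None:
        # Lazy import: Marvin is only needed when no stand-in is supplied.
        from marvin.beta.vision import cast as cast_fn
    return cast_fn(
        image,
        target=int,  # the result should be converted to an int
        instructions=f"Count the {what} visible in the image.",
    )
```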
cast_async
async
Converts the input data into the specified type using a vision model.
This function uses a vision model and a language model to convert the input data into a specified type. The conversion process can be guided by specific instructions. The function also supports additional arguments for both models.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | list[Image] | The images to be processed. | None |
data | str | The data to be converted. | required |
target | type | The type to convert the data into. If not provided but instructions are provided, assumed to be str. | None |
instructions | str | Specific instructions for the conversion. | None |
vision_model_kwargs | dict | Additional keyword arguments for the vision model. | None |
model_kwargs | dict | Additional keyword arguments for the language model. | None |
Returns:
Name | Type | Description |
---|---|---|
T | T | The converted data of the specified type. |
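The description above mentions both a vision model and a language model. One way to picture how they could cooperate, purely as an assumption and not a statement about Marvin's internals, is a two-stage pipeline: a vision stage turns the images into text, and a language stage converts that text into the target type. Every name below is a hypothetical stand-in.

```python
import asyncio
from typing import Any, Awaitable, Callable, Sequence


async def two_stage_cast(
    images: Sequence[Any],
    target: type,
    describe: Callable[[Sequence[Any]], Awaitable[str]],
    convert: Callable[[str, type], Awaitable[Any]],
) -> Any:
    """Run a vision stage, then a language stage, in sequence."""
    description = await describe(images)       # stage 1: images -> text
    return await convert(description, target)  # stage 2: text -> target type


async def fake_describe(images: Sequence[Any]) -> str:
    # Stand-in for a vision model: reports how many images it saw.
    return f"{len(images)} images"


async def fake_convert(text: str, target: type) -> Any:
    # Stand-in for a language model: converts the first word to `target`.
    return target(text.split()[0])


print(asyncio.run(two_stage_cast(["a", "b"], int, fake_describe, fake_convert)))  # prints 2
```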
classify
Classifies provided data and/or images into one of the specified labels synchronously.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Union[str, Image] | Data or an image for classification. | required |
labels | Union[Enum, list[T], type] | Labels to classify into. | required |
images | Union[Image, list[Image]] | Additional images for classification. | None |
instructions | str | Instructions for the classification. | None |
vision_model_kwargs | dict | Arguments for the vision model. | None |
model_kwargs | dict | Arguments for the language model. | None |
Returns:
Name | Type | Description |
---|---|---|
T | T | Label that the data/images were classified into. |
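Since `labels` accepts an `Enum`, a small enumeration can define the allowed outcomes. The sketch below assumes a hypothetical `Scene` enum and a hypothetical `classify_scene` wrapper. `image` is assumed to be an `Image` object built by the caller.

```python
from enum import Enum
from typing import Any, Callable, Optional


class Scene(Enum):
    """Hypothetical label set used only for this illustration."""
    INDOOR = "indoor"
    OUTDOOR = "outdoor"


def classify_scene(
    image: Any,
    classify_fn: Optional[Callable[..., Any]] = None,
) -> Any:
    """Classify an image into one of the Scene labels."""
    if classify_fn is None:
        # Lazy import: Marvin is only needed when no stand-in is supplied.
        from marvin.beta.vision import classify as classify_fn
    return classify_fn(
        image,
        labels=Scene,
        instructions="Judge by the setting, not by the main subject.",
    )
```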
classify_async
async
Classifies provided data and/or images into one of the specified labels.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Union[str, Image] | Data or an image for classification. | required |
labels | Union[Enum, list[T], type] | Labels to classify into. | required |
images | Union[Union[str, Path], list[Union[str, Path]]] | Additional images for classification. | None |
instructions | str | Instructions for the classification. | None |
vision_model_kwargs | dict | Arguments for the vision model. | None |
model_kwargs | dict | Arguments for the language model. | None |
Returns:
Name | Type | Description |
---|---|---|
T | T | Label that the data/images were classified into. |
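When classifying many images, it can help to cap the number of requests in flight. The sketch below does this with an `asyncio.Semaphore`. The helper `classify_batch`, its `max_concurrency` parameter, and its `classify_fn` parameter are hypothetical; only `data` and `labels` come from the parameter list above.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional, Sequence


async def classify_batch(
    images: Sequence[Any],
    labels: Sequence[str],
    max_concurrency: int = 4,
    classify_fn: Optional[Callable[..., Awaitable[Any]]] = None,
) -> list:
    """Classify each image, running at most `max_concurrency` calls at once."""
    if classify_fn is None:
        # Lazy import: Marvin is only needed when no stand-in is supplied.
        from marvin.beta.vision import classify_async as classify_fn
    limit = asyncio.Semaphore(max_concurrency)

    async def classify_one(image: Any) -> Any:
        async with limit:  # wait here while the cap is reached
            return await classify_fn(image, labels=labels)

    # gather preserves input order regardless of completion order
    return await asyncio.gather(*(classify_one(image) for image in images))
```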
extract
Extracts information from provided data and/or images using a vision model synchronously.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Union[str, Image] | Data or an image for information extraction. | required |
target | type[T] | The type to extract the data into. | required |
instructions | str | Instructions for extraction. | None |
images | list[Image] | Additional images for extraction. | None |
vision_model_kwargs | dict | Arguments for the vision model. | None |
model_kwargs | dict | Arguments for the language model. | None |
Returns:
Name | Type | Description |
---|---|---|
T | T | Extracted data of the specified type. |
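The following sketch shows how `target` and `instructions` might be combined to narrow what is extracted. The helper `extract_prices` and its `extract_fn` parameter are hypothetical, and `receipt` is assumed to be an `Image` object built by the caller. The wrapper returns whatever `extract` returns, without assuming its shape.

```python
from typing import Any, Callable, Optional


def extract_prices(
    receipt: Any,
    extract_fn: Optional[Callable[..., Any]] = None,
) -> Any:
    """Pull monetary amounts out of a receipt image as floats."""
    if extract_fn is None:
        # Lazy import: Marvin is only needed when no stand-in is supplied.
        from marvin.beta.vision import extract as extract_fn
    return extract_fn(
        receipt,
        target=float,
        instructions="Only line-item prices; ignore totals and tax.",
    )
```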
extract_async
async
Extracts information from provided data and/or images using a vision model.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
data | Union[str, Image] | Data or an image for information extraction. | required |
target | type[T] | The type to extract the data into. | required |
instructions | str | Instructions for extraction. | None |
images | list[Union[str, Path]] | Additional images for extraction. | None |
vision_model_kwargs | dict | Arguments for the vision model. | None |
model_kwargs | dict | Arguments for the language model. | None |
Returns:
Name | Type | Description |
---|---|---|
T | T | Extracted data of the specified type. |
generate_vision_response
async
Generates a language model response based on a provided prompt template and images.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
images | list[Image] | Images used in the prompt, either URLs or local paths. | required |
prompt_template | str | Template for the language model prompt. | required |
prompt_kwargs | dict | Keyword arguments for the prompt. | None |
model_kwargs | dict | Keyword arguments for the language model. | None |
Returns:
Name | Type | Description |
---|---|---|
ChatResponse | ChatResponse | Response from the language model. |
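Only an async version of this function is listed, so synchronous callers need to drive the coroutine themselves. The sketch below does so with `asyncio.run`. The wrapper `ask_about_images` and its `generate_fn` parameter are hypothetical, and the prompt is passed as a plain string with no placeholders so that no template syntax is assumed.

```python
import asyncio
from typing import Any, Awaitable, Callable, Optional, Sequence


def ask_about_images(
    images: Sequence[Any],
    prompt: str,
    generate_fn: Optional[Callable[..., Awaitable[Any]]] = None,
) -> Any:
    """Run generate_vision_response to completion from synchronous code."""
    if generate_fn is None:
        # Lazy import: Marvin is only needed when no stand-in is supplied.
        from marvin.beta.vision import generate_vision_response as generate_fn
    # asyncio.run starts a fresh event loop, so this must not be called
    # from code that is already running inside one.
    return asyncio.run(generate_fn(images=list(images), prompt_template=prompt))
```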