Converting text to data
At the heart of Marvin is the ability to convert natural language to native Python types and structured objects. This is one of its simplest but most powerful features, and forms the basis for almost every other tool.
The primary tool for creating structured data is the cast function, which takes a natural language string as its input, as well as a type to which the text should be converted.
What it does
The cast function transforms natural language text into a Python type or structured object.
How it works
Marvin creates a schema from the provided type and instructs the LLM to use the schema to format its JSON response.
In Python, the JSON representation is hydrated into a "full" instance of the type.
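The two steps above (schema out, JSON back in) can be sketched with the standard library alone. This is only an illustration of the idea, not Marvin's internal code: the helpers schema_for and hydrate, the JSON_TYPES table, and the hard-coded reply are all hypothetical stand-ins, and no LLM is called.

```python
import json
from typing import TypedDict, get_type_hints


class Location(TypedDict):
    city: str
    state: str


# Illustrative subset of Python-to-JSON-Schema type names (hypothetical helper).
JSON_TYPES = {str: "string", int: "integer", float: "number", bool: "boolean"}


def schema_for(typed_dict) -> dict:
    """Build a minimal JSON Schema describing the TypedDict's fields."""
    hints = get_type_hints(typed_dict)
    return {
        "type": "object",
        "properties": {name: {"type": JSON_TYPES[tp]} for name, tp in hints.items()},
        "required": sorted(hints),
    }


def hydrate(typed_dict, raw_json: str) -> dict:
    """Parse a JSON reply and check it against the TypedDict's fields."""
    data = json.loads(raw_json)
    hints = get_type_hints(typed_dict)
    if set(data) != set(hints):
        raise ValueError(f"expected keys {sorted(hints)}, got {sorted(data)}")
    for name, tp in hints.items():
        if not isinstance(data[name], tp):
            raise TypeError(f"{name!r} should be {tp.__name__}")
    return typed_dict(**data)


# Step 1: the schema that would accompany the prompt.
schema = schema_for(Location)

# Step 2: a hard-coded stand-in for the LLM's JSON reply, hydrated into the type.
fake_reply = '{"city": "New York", "state": "NY"}'
location = hydrate(Location, fake_reply)
```

In practice a library such as Pydantic does both the schema generation and the validation; the hand-rolled version here only makes the two steps visible.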
The cast function supports conversion to almost all builtin Python types, plus Pydantic models and Python's TypedDict. When called, the LLM will take all available information into account, performing deductive reasoning if necessary, to determine the best output. The result will be a Python object of the provided type.
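As a minimal sketch of a cast to a builtin type, the wrapper below asks for a list of integers. The function name parse_numbers is made up for illustration, and the target keyword is an assumption about the cast signature; running it for real requires Marvin to be installed and an LLM API key to be configured.

```python
def parse_numbers(text: str) -> list[int]:
    """Convert a natural language description of numbers into a list of ints."""
    # Imported here so the sketch can be loaded without Marvin installed;
    # calling the function needs Marvin and a configured LLM API key.
    import marvin

    # Assumption: cast takes the text first and the desired type as `target`.
    return marvin.cast(text, target=list[int])
```

A call such as parse_numbers("one, two, three") would be expected to return a list of integers, though the exact result depends on the LLM.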
Sometimes the cast operation is obvious, as in the "big apple" example above. Other times, it may be more nuanced. In these cases, the LLM may require guidance or examples to make the right decision. You can provide natural language instructions when calling cast() to steer the output.
In a simple case, instructions can be used independently of any type casting. Here, we want to keep the output a string, but get the 2-letter abbreviation of the state.
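The state-abbreviation case might look like the sketch below. The wrapper name state_abbreviation and the wording of the instruction are illustrative, and the target and instructions keywords are assumptions about the cast signature.

```python
def state_abbreviation(text: str) -> str:
    """Return the 2-letter abbreviation of the state described in `text`."""
    # Imported lazily: calling this needs Marvin and a configured LLM API key.
    import marvin

    return marvin.cast(
        text,
        target=str,  # keep the output a plain string
        instructions="Return the 2-letter abbreviation of the state",
    )
```

Because the target is still str, the instructions alone decide what the string contains.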
One way of classifying text is by casting it to a constrained type, such as a bool. This forces the LLM to choose one of the provided options.
Marvin provides a dedicated classify function for this purpose. As a convenience, cast will automatically switch to classify when given a constrained target type. However, you may prefer to use the classify function to make your intent clearer to other developers.
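The two spellings can be put side by side as a rough sketch. It assumes that classify takes its options through a labels keyword and accepts bool there, which the text implies but does not show; the function names are made up for illustration.

```python
def is_affirmative(text: str) -> bool:
    """Classify by casting to a constrained type (bool)."""
    # Imported lazily: calling this needs Marvin and a configured LLM API key.
    import marvin

    return marvin.cast(text, target=bool)


def is_affirmative_explicit(text: str) -> bool:
    """The same decision, spelled as an explicit classification."""
    import marvin

    # Assumption: `labels` is the keyword for the allowed options.
    return marvin.classify(text, labels=bool)
```

The second form states the intent directly, which is the reason the text gives for preferring classify.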
In addition to providing Pydantic models as cast targets, Marvin has a drop-in replacement for Pydantic's BaseModel that permits instantiating the model with natural language. These "AI Models" can be created in two different ways:
- Decorating a BaseModel with
- Subclassing the
Though these are roughly equivalent, we recommend the decorator as it will make the intent more clear to other developers (in particular, it will not hide that the model is a
Here is the class decorator:
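A minimal sketch of the decorator form, assuming the decorator is exposed as marvin.model (the text refers to it as @model) and that Pydantic is installed. The Location model and the factory function make_location_model are illustrative; the factory exists only so the imports stay local to the sketch.

```python
def make_location_model():
    """Build an AI model class by decorating a Pydantic model."""
    # Imported lazily so this sketch loads without Marvin or Pydantic installed.
    import marvin
    from pydantic import BaseModel

    @marvin.model  # assumption: this is the @model decorator the text describes
    class Location(BaseModel):
        city: str
        state: str

    return Location
```

With a configured API key, the returned class could then be instantiated from natural language, for example Location("the big apple"). In ordinary application code the imports and the decorated class would sit at module level.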
And here is the equivalent subclass:
You can pass parameters to the underlying API via the model_kwargs argument of @model. These parameters are passed directly to the API, so you can use any supported parameter.
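As a sketch of passing API parameters, the decorator below is given a temperature setting. It assumes the decorator is available as marvin.model and can be called with keyword arguments before decorating, and that the underlying API accepts a temperature parameter; the factory name and the Location model are illustrative.

```python
def make_deterministic_location_model():
    """Build an AI model class whose API calls use temperature 0."""
    # Imported lazily so this sketch loads without Marvin or Pydantic installed.
    import marvin
    from pydantic import BaseModel

    # Assumption: "temperature" is a parameter the underlying API supports.
    @marvin.model(model_kwargs={"temperature": 0})
    class Location(BaseModel):
        city: str
        state: str

    return Location
```

Any other key in model_kwargs would be forwarded in the same way, so the valid keys are whatever the underlying API accepts.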
If you are using
marvin in an async environment, you can use