How to Force a Local LLM to Return Clean JSON with Ollama Structured Outputs
Stop parsing messy model text by hand. Ollama's structured outputs constrain any local model to a JSON schema you define.

If you have ever tried to extract structured data from a local model, you know the pain: you ask for JSON, and the model hands back JSON wrapped in an apology, a code fence, and a closing paragraph.
Quick answer
Pass a JSON schema in the format parameter of an Ollama /api/chat or /api/generate request, and Ollama compiles it into a grammar that constrains decoding so the model can only emit tokens that keep the output valid. In Python, let Pydantic generate the schema with model_json_schema() and validate the reply with model_validate_json(); in JS, use Zod. Set temperature to 0, still ask for JSON in the prompt, and you get parseable JSON on every call. Requires Ollama v0.5.0 or newer.
Ollama's structured outputs feature fixes this at the source. Instead of nudging the model with prompts and praying, it constrains generation itself so the model can only produce text that matches the shape you asked for. The result is reliably parseable JSON from essentially any model in the library, with a few sharp edges covered later in this article.
Key takeaways
- Structured outputs work by constrained decoding, not prompting: Ollama converts your JSON schema into a grammar and zeroes out the probability of any token that would break it.
- You enable it by passing a JSON schema (or the string "json") in the format parameter of a /api/chat or /api/generate request.
- In Python, let Pydantic generate the schema with model_json_schema() and validate the reply with model_validate_json(); in JavaScript, use Zod.
- Set temperature to 0 and still ask for JSON in the prompt. The grammar controls shape; the prompt controls intent and field quality.
- Watch two sharp edges in 2026: deeply nested or recursive schemas can degrade output, and disabling thinking on some thinking-capable models can silently drop the format constraint.
What "structured outputs" actually does
A structured output is not a prompt trick. When you pass a JSON schema to Ollama, the runtime compiles it into a grammar and constrains token sampling so that every generated token keeps the output valid against that schema. Under the hood Ollama builds on llama.cpp's GBNF (a grammar format that defines exactly which tokens are legal next), and since the v0.5 release it generates that grammar automatically from whatever JSON schema you send. Invalid tokens are effectively given zero probability, so the model physically cannot wander off-format.
That distinction matters. A prompt that says "respond only with JSON" is a request the model may ignore; constrained decoding is a hard rule enforced during sampling. It also has a pleasant side effect: because the model stops spending tokens deliberating over formatting, structured generation can run noticeably faster than free-form text for the same task.
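To make the idea concrete, here is a toy sketch of grammar-constrained sampling. It is not Ollama's or llama.cpp's implementation: the tiny vocabulary, the ALLOWED_NEXT table, and the scores are all invented for illustration. It only shows the core move, which is to rule out every token the grammar disallows before picking the next one.

```python
import math

# Hypothetical toy "grammar": which tokens may follow each token.
# Real grammars (GBNF) track far richer state than the previous token.
ALLOWED_NEXT = {
    "<start>": {"{"},
    "{": {'"ok"'},
    '"ok"': {":"},
    ":": {"true", "false"},
    "true": {"}"},
    "false": {"}"},
}


def constrained_pick(prev_token: str, logits: dict[str, float]) -> str:
    """Pick the highest-scoring token among those the grammar allows."""
    allowed = ALLOWED_NEXT[prev_token]
    # Mask: disallowed tokens get -inf, i.e. zero probability after softmax.
    masked = {
        tok: (score if tok in allowed else -math.inf)
        for tok, score in logits.items()
    }
    return max(masked, key=masked.get)


# The made-up "model" strongly prefers to open with prose,
# but the mask leaves "{" as the only legal first token.
logits = {"Sure!": 9.0, "{": 1.5, '"ok"': 0.2, "true": 0.1}
first = constrained_pick("<start>", logits)
print(first)
```

Even though "Sure!" has by far the highest raw score, it can never be chosen, which is the difference between a hard rule and a polite request in the prompt.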
You enable it by sending a JSON schema in the format parameter of a chat or generate request. That is the whole mechanism: define the shape, pass it in, parse the response.
Note
Structured outputs require Ollama v0.5.0 or newer. Update the server first so you are on a build that compiles JSON schema to a grammar, then run ollama pull for your model before testing.
The fastest path: Python with Pydantic
The recommended approach in Python is to describe your data with a Pydantic model and let it generate the schema for you. The workflow is three steps: define the model, hand its schema to format, then validate the response back into a typed object.
```python
from ollama import chat
from pydantic import BaseModel


class Book(BaseModel):
    """Shape the model's reply must match."""
    title: str
    author: str
    year: int
    genres: list[str]


response = chat(
    model="qwen3",
    messages=[
        {"role": "user", "content": "Describe the novel Dune as JSON. Return as JSON."}
    ],
    # Pydantic generates the JSON schema that constrains decoding.
    format=Book.model_json_schema(),
    options={"temperature": 0},
)

# Parse and validate the reply into a typed Book; raises on a bad reply.
book = Book.model_validate_json(response.message.content)
print(book.title, book.year)
```
Two details carry the weight here. First, Book.model_json_schema() produces the exact JSON schema Ollama needs, so you never hand-write or maintain it. Second, model_validate_json() parses and validates the response into a real typed object, which means a wrong type or missing field fails loudly instead of slipping through into the rest of your program.
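As a rough illustration of both points, the sketch below inspects the schema Pydantic generates and then feeds the validator two hand-written replies instead of a live model response. The sample strings and the parse_book helper are hypothetical; only model_json_schema(), model_validate_json(), and ValidationError come from Pydantic (v2 is assumed).

```python
from typing import Optional

from pydantic import BaseModel, ValidationError


class Book(BaseModel):
    title: str
    author: str
    year: int
    genres: list[str]


def parse_book(raw: str) -> Optional[Book]:
    """Return a typed Book, or None if the reply fails validation."""
    try:
        return Book.model_validate_json(raw)
    except ValidationError as err:
        # err.errors() lists each failing field and the reason it failed.
        print("invalid reply:", [e["loc"] for e in err.errors()])
        return None


# The generated schema is a plain dict; "required" names the mandatory fields.
schema = Book.model_json_schema()
print(schema["required"])

# Hand-written stand-ins for model replies (not real model output).
good = '{"title": "Dune", "author": "Frank Herbert", "year": 1965, "genres": ["sci-fi"]}'
bad = '{"title": "Dune", "author": "Frank Herbert", "year": "unknown"}'

print(parse_book(good))
print(parse_book(bad))
```

Returning None is just one design choice for the sketch; in a pipeline you might prefer to let the exception propagate, or retry the request.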

Calling the raw HTTP API
If you are not in Python, the pattern is identical: the format field accepts any valid JSON schema object. Here is the same idea against the HTTP endpoint with curl, constraining the model to return an array of strings under a required key.
```bash
curl http://localhost:11434/api/chat -d '{
  "model": "qwen3",
  "messages": [{"role": "user", "content": "List two planets as JSON"}],
  "stream": false,
  "format": {
    "type": "object",
    "properties": {
      "planets": {
        "type": "array",
        "items": { "type": "string" }
      }
    },
    "required": ["planets"]
  }
}'
```
Because format takes a standard JSON schema, you can constrain nested objects, arrays, enums, and required fields exactly as you would for any API contract. In Node, the same flow works with Zod: define a schema, serialize it with a JSON-schema helper, and pass the result to format. If you are weighing which runtime to standardize on for this kind of work, our comparison of local inference engines covers where Ollama fits against vLLM and raw llama.cpp.
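For callers who want the raw endpoint from Python without the ollama package, the same request can be sketched with the standard library alone. The build_chat_request helper is hypothetical, and the host, port, and model name are copied from the curl example. Nothing is sent until urlopen is called, which requires a running Ollama server.

```python
import json
import urllib.request

# Same schema as the curl example: an object with a required list of strings.
PLANETS_SCHEMA = {
    "type": "object",
    "properties": {
        "planets": {"type": "array", "items": {"type": "string"}}
    },
    "required": ["planets"],
}


def build_chat_request(
    prompt: str,
    schema: dict,
    model: str = "qwen3",
    host: str = "http://localhost:11434",
) -> urllib.request.Request:
    """Build (but do not send) a POST to /api/chat carrying a format schema."""
    payload = {
        "model": model,
        "messages": [{"role": "user", "content": prompt}],
        "stream": False,
        "format": schema,
        "options": {"temperature": 0},
    }
    return urllib.request.Request(
        f"{host}/api/chat",
        data=json.dumps(payload).encode("utf-8"),
        headers={"Content-Type": "application/json"},
        method="POST",
    )


req = build_chat_request("List two planets as JSON", PLANETS_SCHEMA)

# To actually call the server (assumes Ollama is running locally):
# with urllib.request.urlopen(req) as resp:
#     reply = json.load(resp)
#     planets = json.loads(reply["message"]["content"])["planets"]
```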
Constraining the values, not just the shape
Constrained decoding guarantees the output is valid JSON. It does not guarantee the values are correct. Three habits close that gap.
Keep the temperature at zero
Set temperature to 0 for extraction and classification. Deterministic sampling reduces the chance the model invents a plausible-looking value purely to satisfy a required field. Ollama's own guidance leans on determinism for exactly this reason.
Still ask for JSON in the prompt
Even with the grammar enforced, a short instruction like "Return as JSON" helps the model understand the task and tends to produce better-populated fields rather than empty placeholders. The schema controls the structure; the prompt controls the intent.
Use enums for classification
For labeling tasks, define an enum in your schema so the model can only return one of your allowed values. This is far more robust than parsing a free-text label out of a sentence. One caveat: enum adherence has been imperfect in some builds, so always validate the returned label against your allowed set rather than assuming the grammar caught it.
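A minimal sketch of that pattern, using only the standard library: the schema restricts a hypothetical label field to three made-up ticket categories, and check_label re-verifies the reply against the same set. The field name, the categories, and the sample replies are all assumptions for illustration.

```python
import json

ALLOWED_LABELS = ["billing", "bug", "feature_request"]  # hypothetical categories

# Pass this dict as the format value of the request.
TICKET_SCHEMA = {
    "type": "object",
    "properties": {
        "label": {"type": "string", "enum": ALLOWED_LABELS},
    },
    "required": ["label"],
}


def check_label(raw_reply: str) -> str:
    """Parse the model's JSON reply and insist the label is in the allowed set."""
    data = json.loads(raw_reply)
    label = data.get("label")
    if label not in ALLOWED_LABELS:
        raise ValueError(f"unexpected label: {label!r}")
    return label


print(check_label('{"label": "bug"}'))

try:
    check_label('{"label": "complaint"}')  # simulates an enum slip
except ValueError as err:
    print(err)
```

Keeping the allowed values in one list that feeds both the schema and the check means the two cannot drift apart.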
Tip
Always run the response through a validator (model_validate_json in Pydantic, a Zod parse in JS). The grammar enforces structure; validation catches the semantic edge cases, such as an empty array where you expected items, before bad data reaches the rest of your app.
Schema versus prompt: who controls what
A frequent source of confusion is which lever does which job. The grammar and your prompt are not interchangeable; they handle different failure modes:
| Concern | Controlled by | What it guarantees |
|---|---|---|
| Output is valid JSON | The compiled grammar | Brackets, types, required keys are well-formed |
| Output matches your shape | The JSON schema you pass | Fields, nesting, arrays match your contract |
| Values are sensible | The prompt + temperature 0 | Better-populated, less-invented field values |
| Label is from an allowed set | Enum + your own validation | One of your values (verify, enums can slip) |
| Bad data never propagates | Pydantic / Zod validation | Loud failure on wrong type or empty array |
Known sharp edges in 2026
Structured outputs are reliable, but a few rough spots are worth knowing before they bite you in production.
- Deep nesting and recursion. Ollama's docs explicitly warn that deeply nested or recursive JSON structures may produce degraded or incomplete results. Flatten your schema where you can, and split very large extractions into smaller calls.
- Thinking models and think=false. On some thinking-capable models, disabling the reasoning step has been reported to silently drop the format constraint, so the model returns plain prose instead of JSON. If a model ignores your schema, check whether thinking is involved before blaming the schema itself.
- Enum adherence. Community reports show enum values occasionally not being honored. Treat enums as a strong hint plus a hard validation check, not an absolute guarantee.
- No direct GBNF passthrough. Ollama generates the grammar for you from JSON schema and does not expose raw GBNF, so anything you cannot express in JSON schema is currently out of reach.
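One way to catch the nesting problem early is to measure a schema before sending it. The helpers below are a hypothetical pre-flight check, not part of Ollama: schema_depth walks a JSON schema dict and reports how deeply objects and arrays nest, and uses_refs flags any $ref, which is how Pydantic-generated schemas point at nested (and possibly recursive) models. The depth limit of 3 is an arbitrary assumption; tune it against your own model.

```python
def schema_depth(node: dict) -> int:
    """Depth of nested objects/arrays below (and including) this schema node."""
    children = [
        child for child in node.get("properties", {}).values()
        if isinstance(child, dict)
    ]
    if isinstance(node.get("items"), dict):
        children.append(node["items"])
    if not children:
        return 1
    return 1 + max(schema_depth(child) for child in children)


def uses_refs(node) -> bool:
    """True if a '$ref' key appears anywhere in the schema."""
    if isinstance(node, dict):
        return "$ref" in node or any(uses_refs(v) for v in node.values())
    if isinstance(node, list):
        return any(uses_refs(v) for v in node)
    return False


MAX_DEPTH = 3  # arbitrary threshold, an assumption for this sketch

flat = {"type": "object", "properties": {"name": {"type": "string"}}}
nested = {
    "type": "object",
    "properties": {
        "order": {
            "type": "object",
            "properties": {
                "lines": {
                    "type": "array",
                    "items": {
                        "type": "object",
                        "properties": {"sku": {"type": "string"}},
                    },
                }
            },
        }
    },
}

for name, schema in [("flat", flat), ("nested", nested)]:
    depth = schema_depth(schema)
    verdict = "too deep" if depth > MAX_DEPTH else "ok"
    print(name, depth, verdict, "refs" if uses_refs(schema) else "no refs")
```

A schema that trips the check is a candidate for flattening or for splitting into several smaller extraction calls.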
When to reach for it
Structured outputs shine anywhere you need machine-readable data from a model: pulling fields out of receipts or emails, classifying support tickets, building a tool-calling layer, or generating config another program will consume. It pairs especially well with small language models, which are cheap to run locally and benefit most from having their formatting decisions taken off the table. It is also a natural fit in retrieval pipelines, where you can use it to produce clean, typed metadata alongside your RAG chunking strategies.
Because it all runs on your local Ollama install, you get this reliability with no API bill and no data leaving your machine, which is exactly why people run models locally in the first place. The shift is small but meaningful: instead of writing brittle regex to claw structure out of prose, you declare the shape once and let the runtime enforce it.
Frequently asked questions
Does the format parameter accept anything besides a full JSON schema?
Yes. You can pass the string "json" to force generic valid-JSON output without specifying a shape, or pass a complete JSON schema object to constrain the exact structure. The schema route is almost always what you want, since it gives you typed, predictable fields instead of arbitrary JSON.
Will structured outputs slow my requests down?
Generally the opposite. Because the model no longer spends tokens deciding how to format the answer, constrained generation is often faster than free-form text for the same prompt. The grammar-checking overhead is small compared to the tokens you save.
Do I have to use Pydantic or Zod?
No. They are conveniences that generate the schema and validate the response for you. You can hand-write a JSON schema and pass it directly to format, then parse the reply with any JSON parser. Pydantic and Zod simply give you type safety and loud failures for free.
Why is the model returning plain text even though I set a schema?
The usual culprits are running an Ollama build older than v0.5.0, or hitting the thinking-model interaction where disabling thinking drops the constraint. Confirm your version, keep thinking enabled on reasoning models, and simplify deeply nested schemas. For agent workflows that mix this with persistent state, see our notes on agent memory.
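The version check from that answer can be scripted. The sketch below compares a version string against 0.5.0; the parse_version and meets_minimum helpers are hypothetical, and the commented-out call assumes the server reports its version at GET /api/version on the default port, which is worth confirming against your install (running ollama --version gives the same information).

```python
import json
import re
import urllib.request

MINIMUM = (0, 5, 0)  # first release with structured outputs, per this article


def parse_version(text: str) -> tuple[int, ...]:
    """Turn '0.5.1' or 'v0.6.0-rc1' into a tuple of ints such as (0, 5, 1)."""
    match = re.search(r"(\d+)\.(\d+)\.(\d+)", text)
    if match is None:
        raise ValueError(f"no version number found in {text!r}")
    return tuple(int(part) for part in match.groups())


def meets_minimum(text: str, minimum: tuple[int, ...] = MINIMUM) -> bool:
    """True if the version is at least the minimum for structured outputs."""
    return parse_version(text) >= minimum


print(meets_minimum("0.4.7"))
print(meets_minimum("0.5.0"))

# Against a live server (assumed endpoint and default port):
# with urllib.request.urlopen("http://localhost:11434/api/version") as resp:
#     print(meets_minimum(json.load(resp)["version"]))
```

Comparing integer tuples rather than raw strings avoids the classic trap where "0.10.0" sorts before "0.5.0".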


