OpenAI · no caching

GPT-3.5-turbo-instruct

GPT-3.5-turbo-instruct on Zumik: live pricing, context, and caching, routable by id or alias through one OpenAI-compatible endpoint.

Input / 1M tokens	—
Output / 1M tokens	—
Cache read	—
Context window	4K tokens

Specifications

At a glance.

Provider	OpenAI
Family	gpt-3.5
Released	—
License	Proprietary
Context window	4K tokens
Max output	—
Modalities	text
Tool calling	Yes
Reasoning mode	No
Caching	none
Batch discount	50% off

Measured by Zumik

What reuse looks like here.

Not yet profiled

Pricing, context, and capabilities for GPT-3.5-turbo-instruct are live, but it is outside the flagship set Zumik benchmarks in depth, so measured reuse, capture, and warm TTFT are not shown yet. Run a workload estimate or route it by id to start collecting traces.

Call it

Same OpenAI client, this model.

python

from openai import OpenAI

client = OpenAI(base_url="https://api.zumik.ai/v1", api_key="zk_live_...")

r = client.responses.create(
    model="gpt-3-5-turbo-instruct",
    input="Draft a fix for the failing test.",
)
print(r.usage.input_tokens_details.cached_tokens)   # cached input tokens; expect 0 while caching is "none"
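
The usage check can also be wrapped in a small helper that tolerates missing fields. This is a minimal sketch, assuming the usage object follows the OpenAI Responses API shape (`input_tokens` plus `input_tokens_details.cached_tokens`); `cached_fraction` is a hypothetical helper name, not part of the OpenAI SDK or Zumik, and the sample numbers are invented for illustration.

```python
from types import SimpleNamespace


def cached_fraction(usage) -> float:
    """Return the share of input tokens served from cache, or 0.0 if unknown."""
    total = getattr(usage, "input_tokens", 0) or 0
    details = getattr(usage, "input_tokens_details", None)
    cached = getattr(details, "cached_tokens", 0) or 0
    if total <= 0:
        return 0.0
    return cached / total


# Stand-in for `r.usage` from the call above; values are made up (assumption).
example_usage = SimpleNamespace(
    input_tokens=1000,
    input_tokens_details=SimpleNamespace(cached_tokens=0),
)
print(cached_fraction(example_usage))
```

For a model listed with no caching, a fraction of zero is the expected result; a nonzero value would be worth investigating.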

Frequently asked

GPT-3.5-turbo-instruct, answered.

How much does GPT-3.5-turbo-instruct cost?

GPT-3.5-turbo-instruct is a proprietary OpenAI model. Zumik does not show a per-token list price for it yet, so input and output prices appear as "—" here until the model is profiled. The batch discount is listed as 50% off.
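
Once per-token prices are known, the 50% batch discount from the specification table can be folded into a quick estimate. The sketch below is illustrative only: `estimate_cost_usd` is a hypothetical helper, the prices passed in are placeholders rather than real list prices, and it assumes the discount applies equally to input and output tokens.

```python
def estimate_cost_usd(
    input_tokens: int,
    output_tokens: int,
    input_price_per_m: float,
    output_price_per_m: float,
    batch: bool = False,
) -> float:
    """Estimate request cost in USD from per-1M-token prices."""
    cost = (
        (input_tokens / 1_000_000) * input_price_per_m
        + (output_tokens / 1_000_000) * output_price_per_m
    )
    # Assumption: the 50% batch discount applies to the whole bill.
    return cost * 0.5 if batch else cost


# Placeholder prices (1.00 and 2.00 USD per 1M tokens), not published figures.
standard = estimate_cost_usd(2_000_000, 500_000, 1.00, 2.00)
batched = estimate_cost_usd(2_000_000, 500_000, 1.00, 2.00, batch=True)
print(standard, batched)
```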

What is GPT-3.5-turbo-instruct's context window?

GPT-3.5-turbo-instruct supports a 4K-token context window.
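
A small budget check can help keep requests inside that window. This is a sketch under two assumptions: that "4K" means 4,096 tokens, and that prompt and generated tokens share the same window. Both function names are hypothetical, and token counts are taken as inputs rather than computed here.

```python
CONTEXT_WINDOW = 4096  # assumption: "4K" is read as 4,096 tokens


def fits_context(prompt_tokens: int, max_output_tokens: int,
                 window: int = CONTEXT_WINDOW) -> bool:
    """True if the prompt plus the requested output fits in the window."""
    return prompt_tokens + max_output_tokens <= window


def max_output_budget(prompt_tokens: int, window: int = CONTEXT_WINDOW) -> int:
    """Tokens left for output after the prompt, never below zero."""
    return max(window - prompt_tokens, 0)


print(fits_context(3000, 1000))
print(max_output_budget(3000))
```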

Does GPT-3.5-turbo-instruct support prompt caching?

No. This catalog lists caching for GPT-3.5-turbo-instruct as "none", and no cache-read price is shown. The model is also not yet profiled by Zumik, so there is no measured cache capture to report.

Run GPT-3.5-turbo-instruct with reuse measured.

Point an OpenAI client at Zumik and see exactly how much of this model's input you are reusing.