reference-openai-compatible-provider
Generated from plugins/reference-openai-compatible-provider/README.md.
Generic OpenAI-compatible Chat Completions provider package for the AI Agent Platform.
This package is intended as a reusable baseline for OpenAI-like backends, including local servers such as Ollama and hosted providers that expose /v1/chat/completions-style APIs.
Supports:
- non-streaming and streaming chat completions
- OpenAI-style tool calling
- generic reasoning metadata extraction and preservation when providers return reasoning fields
- generic usage metadata extraction when providers return
usage - image attachments for OpenAI-compatible chat completions
- remote image URLs
- inline/data-URL images
- session-owned uploaded images stored locally by the application and inlined at send time
Reasoning, tools, and usage are provider- and model-dependent.
Quickstart (Ollama example)
Prerequisites:
- Ollama is running (
ollama serve) - a compatible model is available (for example
ollama pull qwen3:0.6b)
Create a terminal app config (example: config_ollama.json):
{
"plugin_cache_dir": "~/.crystal/cache/plugins",
"providers": {
"ollama": {
"provider": "reference_openai_compatible",
"base_url": "http://localhost:11434/v1",
"model": "qwen3:0.6b",
"timeout": 180,
"request_options": {
"think": true
}
}
},
"plugins": [
"path:/absolute/path/to/plugins/reference-openai-compatible-provider",
"path:/absolute/path/to/plugins/feature-request-options"
],
"agents": {
"default": {"provider": "ollama"}
}
}
Run the terminal app:
cd application/python
python -m agent_terminal_app --console --config /path/to/config_ollama.json
request_options.think is only an Ollama-compatible example. Other providers may use different request options, nested reasoning controls, or no reasoning toggle at all.
Configuration
Configuration is a single flat dict passed to the provider and all enabled plugins.
In application configs, provider settings live under providers.<provider_id> and may be overridden per-agent under agents.<agent_id>.
Provider config keys
These keys are read by reference_openai_compatible_provider.provider.OpenAICompatibleProvider:
provider(string): provider selector used by the application config- value:
reference_openai_compatible model(string, required): model id to send in the request payloadbase_url(string): OpenAI-compatible base URL; the provider appends/chat/completions- default:
https://api.openai.com/v1 endpoint_options(object, optional): labeled base URL choices that render an endpoint selector bound tobase_url- example:
{ "General": "https://api.z.ai/api/paas/v4", "Coding": "https://api.z.ai/api/coding/paas/v4" } api_key(string, optional): bearer token for hosted OpenAI-compatible APIstimeout(number, optional): request timeout in seconds- default:
60 debug_stream(boolean, optional): log decoded streaming chunks as they are received- default:
false min_request_interval_seconds(number, optional): minimum spacing between requests to the samebase_urlscope- default:
0.0 rate_limit_retry_delays_seconds(list or comma-separated string, optional): retry delays used for HTTP 429 responses- default:
[] respect_retry_after_header(boolean, optional): when retrying HTTP 429 responses, honor the responseRetry-Afterheader- default:
true retry_on_status(object, optional): maps HTTP status codes to retry delay lists for pre-stream retries- default:
{} retry_on_timeout(list or comma-separated string, optional): retry delays used for request timeout exceptions before streaming begins- default:
[] stream_recovery(object, optional): mid-stream recovery policy for SSE connection failures- default:
{ "mode": "disabled", "max_retries": 3 } - keys:
mode:"disabled","early_only", or"aggressive"max_retries: maximum number of stream recovery attempts when recovery is enabled
allow_image_attachment_base64(boolean, optional): allow inline/data-URL image attachments- default:
true - also controls client-uploaded image attachments, because stored session assets are sent inline
allow_image_attachment_url(boolean, optional): allow remote image URL attachments- default:
true
When endpoint_options is provided, the provider emits a config UI dropdown for base_url. Selecting an option updates base_url directly, so model discovery, request URLs, and rate-limit scoping automatically follow the selected endpoint.
Complete provider config example
This example shows all provider-owned config keys together. request_options is intentionally omitted here because it is provided by the separate feature-request-options plugin.
{
"providers": {
"zai": {
"provider": "reference_openai_compatible",
"base_url": "https://api.z.ai/api/paas/v4",
"endpoint_options": {
"General": "https://api.z.ai/api/paas/v4",
"Coding": "https://api.z.ai/api/coding/paas/v4"
},
"api_key": "${env:ZAI_API_KEY}",
"model": "glm-4.5-air",
"timeout": 180,
"debug_stream": false,
"min_request_interval_seconds": 0.5,
"rate_limit_retry_delays_seconds": [1, 3, 5],
"respect_retry_after_header": true,
"retry_on_status": {
"500": [2, 5, 10],
"502": [1, 3]
},
"retry_on_timeout": [1, 3, 5],
"stream_recovery": {
"mode": "early_only",
"max_retries": 3
}
}
}
}
Request options
To pass additional OpenAI-compatible request parameters such as temperature, max_tokens, stop, or provider-specific reasoning toggles, load the request_options feature plugin and set request_options in config.
Example:
{
"plugins": [
"path:/absolute/path/to/plugins/reference-openai-compatible-provider",
"path:/absolute/path/to/plugins/feature-request-options"
],
"providers": {
"demo": {
"provider": "reference_openai_compatible",
"base_url": "https://example.com/v1",
"model": "demo-model",
"request_options": {
"temperature": 0.2,
"max_tokens": 256
}
}
}
}
Tool calling
Tool support is enabled automatically when tool plugins are loaded:
- tool schemas are injected into
config["tools"] - requests include OpenAI-style
toolsandtool_choicewhen tools are present - assistant tool calls are surfaced as
metadata["tool_calls"]
To actually run tools, load one or more tool plugins. Example with built-in math tools:
{
"plugins": [
"path:/absolute/path/to/plugins/reference-openai-compatible-provider",
"plugins.math_tools.MathTools"
]
}
Reasoning metadata
The reasoning extension is now a generic parse/preserve layer. It does not own a package-specific think config key.
When provider-native assistant messages include these fields, they are surfaced into core message metadata:
reasoningreasoning_content
When these fields are attached to assistant metadata during a tool loop, the extension writes them back into provider-native assistant messages so they survive native-history reconstruction for follow-up requests.
Usage metadata
When the provider returns usage, the usage extension surfaces a minimal shared metadata shape:
metadata.usagemetadata.usage_prompt_tokensmetadata.usage_completion_tokensmetadata.usage_total_tokensmetadata.usage_reasoning_tokenswhen present- formatted token-count variants such as
metadata.usage_total_tokens_formatted
Provider-specific or extra usage fields stay inside raw metadata.usage.
Image attachments
When the package is loaded through its plugin path, it also exposes the
reference_openai_compatible_attachments provider extension. That extension:
- advertises image attachment UI to the frontend
- exposes
store_attachment/delete_attachment/download_attachmentactions backed by the application-owned session asset store - allows the frontend to upload an image to the local application first, then send it later as a session-owned attachment
- lets clients download a stored session-owned image again from message history
Stored uploaded images are not uploaded to a provider-side file API. Instead,
they are resolved from the local session asset store and converted to inline
data: URLs when the chat-completions request is built.
Retries and error recovery
The provider has three retry/recovery mechanisms. All pre-stream retries share a global attempt counter, so the total retry budget is bounded by the longest configured delay list across mechanisms.
Rate-limit retries (HTTP 429)
Retries on HTTP 429 responses with configurable backoff and Retry-After header support.
rate_limit_retry_delays_seconds(list or comma-separated string): delay in seconds before each retry attempt. List length = maximum retries.- default:
[](no retries) - example:
[1, 3, 5]— retry up to 3 times with 1s, 3s, 5s delays respect_retry_after_header(boolean): whentrue, waits at least the number of seconds specified by the response'sRetry-Afterheader (uses the larger of the header value and the configured delay).- default:
true min_request_interval_seconds(number): minimum time between consecutive requests to the samebase_urlscope, used for request pacing.- default:
0.0(no pacing)
{
"rate_limit_retry_delays_seconds": [1, 3, 5],
"respect_retry_after_header": true,
"min_request_interval_seconds": 0.5
}
Status code retries (pre-stream)
Retries on configurable HTTP status codes (for example 500, 502, 503) before the response body is consumed.
retry_on_status(object): maps HTTP status codes to delay lists. Keys are status code numbers (as strings in JSON). Values follow the same format asrate_limit_retry_delays_seconds. List length = maximum retries for that code.- default:
{}(no retries) - example:
{"500": [2, 5, 10], "502": [1, 3]}
{
"retry_on_status": {
"500": [2, 5, 10],
"502": [1, 3]
}
}
Timeout retries (pre-stream)
Retries when the HTTP request times out (requests.exceptions.Timeout — covers both connect and read timeouts).
retry_on_timeout(list or comma-separated string): delay in seconds before each retry attempt. List length = maximum retries.- default:
[](no retries — timeout exceptions propagate immediately) - example:
[1, 3, 5]
{
"retry_on_timeout": [1, 3, 5]
}
Stream recovery (mid-stream)
Recovers from connection errors that occur during an active SSE stream (for example connection drops, chunked encoding errors). The provider re-issues the full HTTP request and starts a new stream.
stream_recovery(object):mode(string): recovery strategy."disabled"— no recovery, connection errors propagate immediately. (default)"early_only"— recover only if no visible text content was emitted before the error. Safe for most cases since the consumer hasn't seen partial output yet."aggressive"— recover even after partial content was already streamed. The new request replays from the beginning, so the consumer may see duplicate content.
max_retries(integer): maximum number of recovery attempts.- default:
3(when mode is not"disabled")
- default:
{
"stream_recovery": {
"mode": "early_only",
"max_retries": 3
}
}
Retry and recovery example
{
"providers": {
"my_provider": {
"provider": "reference_openai_compatible",
"base_url": "https://api.example.com/v1",
"model": "gpt-4",
"rate_limit_retry_delays_seconds": [1, 3, 5],
"retry_on_status": {
"500": [2, 5, 10],
"502": [1, 3]
},
"retry_on_timeout": [1, 3, 5],
"stream_recovery": {
"mode": "early_only",
"max_retries": 3
}
}
}
}
Global attempt counter
Pre-stream retries (429, status codes, timeouts) share a single global attempt counter. For example, with retry_on_status: {"500": [2, 5, 8]} and retry_on_timeout: [1, 3, 5]:
| Attempt | Error | Delay source | Delay |
|---|---|---|---|
| 0 | HTTP 500 | retry_on_status[500][0] |
2s |
| 1 | Timeout | retry_on_timeout[1] |
3s |
| 2 | HTTP 500 | retry_on_status[500][2] |
8s |
| 3 | Timeout | no entry at index 3 | raises |
Stream recovery uses its own separate counter.
Troubleshooting
404or connection errors: confirmbase_urlincludes the/v1prefix, because the provider appends/chat/completions- missing tool calls: tool calling is model-dependent; use a model that supports tools and a prompt that strongly requests tool usage
- missing reasoning metadata: enable the appropriate provider/model request option through
request_optionsif needed, and confirm the model actually emits reasoning fields - missing usage metadata: some providers or local servers do not return
usagefor all endpoints or modes
Developer Notes
Install
From the repo root:
python -m pip install -e core/python
python -m pip install -e "plugins/reference-openai-compatible-provider[dev]"
Tests
Fast default package run:
pytest plugins/reference-openai-compatible-provider/tests -q
This package defaults to -m 'not integration'.
Local Ollama integration:
pytest plugins/reference-openai-compatible-provider/tests -m 'integration and ollama and not api and not slow_integration' -q
Hosted Fireworks integration:
pytest plugins/reference-openai-compatible-provider/tests -m 'integration and api and not slow_integration and fireworks' -q
Hosted Z.ai integration:
pytest plugins/reference-openai-compatible-provider/tests -m 'integration and api and not slow_integration and zai' -q
Environment overrides used by the Ollama integration tests:
OLLAMA_OPENAI_BASE_URL(defaulthttp://localhost:11434/v1)OLLAMA_OPENAI_MODEL(defaultqwen3:0.6b)
The reference-provider test suite bootstraps the repo root .env during local
use and does not overwrite already-exported environment variables. Hosted API
tests typically require FIREWORKS_API_KEY or ZAI_API_KEY.
Debugging streaming
Set debug_stream: true in provider config to log decoded streaming chunks. Avoid enabling this when sending secrets because responses may include sensitive data.
License
Copyright 2026 Dynamic Programming Solutions Kft.
Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at
http://www.apache.org/licenses/LICENSE-2.0
Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.