openai_responses

Generated from plugins/openai_responses/README.md.

OpenAI Responses API bundle for the AI Agent Platform.

Use it when you want an OpenAI-backed agent with:

  • plain chat over the Responses API
  • OpenAI API-key auth, ChatGPT auth, or automatic fallback between them
  • streaming and non-streaming replies
  • direct-send image attachments
  • upload-before-send image/PDF attachments
  • tool calling
  • reasoning controls and surfaced reasoning metadata
  • usage and cost metadata in message/UI output
  • verbosity controls for supported models
  • flex processing controls with configurable timeout and retries
  • prompt caching controls with a session-owned cache key
  • native conversation compaction for longer sessions

Most users can think of this as a single plugin bundle that adds OpenAI Responses support to the app. The package name for installation is openai_responses.

It can also be paired with the chatgpt-auth-app application plugin so Responses requests can use a saved ChatGPT login against the ChatGPT-hosted endpoint.

Install

From the repo root:

python -m pip install -e core/python
python -m pip install -e "plugins/openai_responses[dev]"

Environment

  • OPENAI_API_KEY is required unless you pass api_key directly in provider config (with auth_mode: "api"), or the provider authenticates through a saved ChatGPT login (auth_mode: "chatgpt", or auth_mode: "auto" with a valid saved login).
  • For local use in this repo, keeping OPENAI_API_KEY in the repo root .env is the simplest setup.
  • Do not commit real API keys into configs or docs.

The provider defaults to OPENAI_API_KEY automatically, so api_key can be omitted from config when the env var is present.
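The lookup order described above can be sketched as follows. This is a minimal illustration, not the provider's actual code: the function name resolve_api_key is hypothetical, and it assumes an explicit api_key in config takes precedence over the environment variable.

```python
import os
from typing import Mapping, Optional


def resolve_api_key(
    provider_config: Mapping[str, object],
    environ: Mapping[str, str] = os.environ,
) -> Optional[str]:
    """Return the explicit api_key if set, else OPENAI_API_KEY, else None.

    Hypothetical helper: the precedence shown here is an assumption.
    """
    explicit = provider_config.get("api_key")
    if isinstance(explicit, str) and explicit:
        return explicit
    # Fall back to the environment variable; treat an empty value as unset.
    return environ.get("OPENAI_API_KEY") or None
```

Passing the environment as a parameter keeps the helper easy to test without touching the real process environment.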

When auth_mode: "chatgpt" is selected, or when auth_mode: "auto" resolves to ChatGPT because a valid saved login exists, the provider reads a saved credential file instead of requiring OPENAI_API_KEY.

When credentials and network access are available, API-key mode fetches the real OpenAI model list through the OpenAI Python SDK. For ChatGPT auth, the provider merges three Codex-backed sources for model discovery:

  • checked-in manual fallback models
  • a checked-in Codex models snapshot
  • the latest live Codex models catalog from GitHub

If discovery is unavailable, the UI falls back to manual text entry for model.

Quickstart

This is the smallest useful terminal/app-layer config when you want basic Responses chat with the whole openai_responses bundle loaded from its plugin descriptor.

{
  "plugin_cache_dir": "~/.crystal/cache/plugins",
  "plugins": [
    "path:/absolute/path/to/plugins/openai_responses"
  ],
  "providers": {
    "openai": {
      "provider": "openai_responses",
      "model": "gpt-5-mini",
      "timeout": 60
    }
  },
  "agents": {
    "default": {
      "provider": "openai"
    }
  }
}

Run the terminal app:

cd application/python
python -m agent_terminal_app --console --config /path/to/config_openai_responses.json

In config-aware UIs, the provider will try to populate the model selector from the appropriate live catalog. If that lookup fails, you can still type a model id manually.

ChatGPT auth quickstart

This is the smallest useful config for OpenAI Responses with ChatGPT login support.

It assumes:

  • the chatgpt-auth-app application plugin is loaded
  • the system-message feature is loaded
  • top-level instructions should come from the system-message feature rather than a provider-only instructions config key

{
  "plugin_cache_dir": "${env:CONFIG_DIR}/.plugin_cache/plugins",
  "plugins": [
    "path:/absolute/path/to/plugins/openai_responses",
    "path:/absolute/path/to/plugins/chatgpt-auth-app",
    "path:/absolute/path/to/plugins/feature-system-message"
  ],
  "providers": {
    "openai": {
      "provider": "openai_responses",
      "model": "gpt-5",
      "auth_mode": "auto",
      "system_message_as_instructions": true,
      "strip_leading_system_or_developer_message": true,
      "system_message": "You are a precise assistant.",
      "timeout": 120
    }
  },
  "application": {
    "chatgpt_auth": {}
  },
  "agents": {
    "default": {
      "provider": "openai"
    }
  }
}

Then:

  1. Run the app or terminal client.
  2. Use the chatgpt_auth application actions:
      • Sign in with ChatGPT
      • Show ChatGPT login status
      • Cancel ChatGPT login
      • Log out of ChatGPT
  3. After login succeeds, auth_mode: "auto" will use ChatGPT-backed auth when available and fall back to API-key mode otherwise.

Full bundle example

Reasoning request shaping depends on the request-options feature, and tool usage depends on loading one or more tool plugins. This example shows the intended combination.

{
  "plugin_cache_dir": "~/.crystal/cache/plugins",
  "plugins": [
    "path:/absolute/path/to/plugins/openai_responses",
    "plugins.request_options_feature.RequestOptionsFeature",
    "plugins.math_tools.MathTools"
  ],
  "providers": {
    "openai": {
      "provider": "openai_responses",
      "model": "gpt-5-mini",
      "api_key": "${env:OPENAI_API_KEY}",
      "base_url": "https://api.openai.com/v1",
      "auth_mode": "api",
      "timeout": 120,
      "enable_flex_processing": false,
      "enable_fast_mode": false,
      "enable_fast_mode_for_api_requests": false,
      "flex_timeout_seconds": 300,
      "enable_flex_retries": true,
      "flex_max_retries": 2,
      "attachment_upload_mode": "upload_before_send",
      "enable_prompt_cache_key": true,
      "enable_prompt_cache_retention_24h": false,
      "reasoning_effort": "minimal",
      "reasoning_summary": "concise",
      "verbosity": "high"
    }
  },
  "agents": {
    "default": {
      "provider": "openai"
    }
  }
}

What this enables:

  • OpenAI Responses chat
  • reasoning controls and reasoning metadata on assistant messages
  • tool schema injection and tool-call handling
  • usage and cost metadata
  • GPT-5 verbosity controls
  • flex processing controls for supported OpenAI accounts
  • Fast mode controls for ChatGPT-backed requests, with optional API opt-in
  • prompt caching settings plus a regenerate-key session action
  • a native compaction session action for longer conversations

ChatGPT auth mode

The bundle supports three auth modes:

  • api
      • use api_key or OPENAI_API_KEY
      • use the normal OpenAI API base URL
  • chatgpt
      • use saved ChatGPT login credentials
      • use the ChatGPT-hosted Responses endpoint
  • auto
      • use ChatGPT when a valid saved login exists
      • otherwise fall back to API-key mode
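The three modes reduce to a small decision, sketched below. The function name and the has_valid_chatgpt_login flag are hypothetical; how the bundle actually validates a saved login is not shown here.

```python
VALID_AUTH_MODES = ("api", "chatgpt", "auto")


def resolve_effective_auth_mode(auth_mode: str, has_valid_chatgpt_login: bool) -> str:
    """Map the configured auth_mode to the mode actually used for a request.

    Hypothetical helper illustrating the documented rules only.
    """
    if auth_mode not in VALID_AUTH_MODES:
        raise ValueError(f"auth_mode must be one of {VALID_AUTH_MODES}, got {auth_mode!r}")
    if auth_mode == "auto":
        # auto prefers ChatGPT when a valid saved login exists.
        return "chatgpt" if has_valid_chatgpt_login else "api"
    return auth_mode
```

Several later behaviors (flex processing, prompt-cache retention, Fast mode) depend on this effective mode rather than on the configured value.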

Relevant keys:

  • auth_mode
  • chatgpt_credentials_path
  • chatgpt_base_url
  • src/openai_responses_plugins/data/openai_chatgpt_manual_models.json
      • checked-in ChatGPT-only fallback models and estimate-only pricing
      • pricing entries may define pricing, service_tiers, or both
      • used only when a model is missing from upstream sources
  • src/openai_responses_plugins/data/openai_codex_models.json
      • checked-in offline snapshot of the public Codex model catalog
      • refresh it with python scripts/refresh_openai_responses_models.py
      • used when live Codex fetching fails or when you want offline coverage from a recent repo update
  • chatgpt_account_id
  • chatgpt_client_version
  • strip_leading_system_or_developer_message

ChatGPT model-discovery precedence:

  • live Codex catalog overrides the checked-in Codex snapshot
  • checked-in Codex snapshot overrides manual fallback models
  • manual fallback models only fill gaps
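One way to picture this precedence is a layered merge, applied from lowest to highest priority. The sketch below is illustrative only: the function name is hypothetical, and it assumes each catalog entry is a dict with an "id" field, which may not match the real data files.

```python
from typing import Dict, Iterable, Mapping

ModelEntry = Mapping[str, object]


def merge_model_catalogs(
    manual: Iterable[ModelEntry],
    snapshot: Iterable[ModelEntry],
    live: Iterable[ModelEntry],
) -> Dict[str, ModelEntry]:
    """Merge three catalogs so that live > snapshot > manual.

    Assumes (hypothetically) that every entry carries an "id" key.
    """
    merged: Dict[str, ModelEntry] = {}
    # Apply lowest precedence first; later sources overwrite earlier ones.
    for source in (manual, snapshot, live):
        for entry in source:
            merged[str(entry["id"])] = entry
    return merged
```

With this ordering, a manual entry survives only when neither the snapshot nor the live catalog defines the same model id, which matches "manual fallback models only fill gaps".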

Defaults:

  • ChatGPT base URL defaults to https://chatgpt.com/backend-api/codex
  • saved credentials default to CONFIG_DIR/auth/chatgpt-auth.json when the auth feature can discover CONFIG_DIR from runtime env

Instructions path for ChatGPT mode

For the ChatGPT endpoint, the recommended shape is:

  • set system_message_as_instructions: true on the provider config
  • let the system-message feature compile the prompt into state["request"]["instructions"]
  • avoid a provider-only instructions config key

This mirrors how Codex-style requests send top-level instructions.

Migrating older sessions

If older session histories contain a leading provider-native system or developer message from before top-level instructions were used, enable:

  • strip_leading_system_or_developer_message: true

When enabled, the provider drops the first native message only if its role is system or developer before sending the request.
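The rule is narrow: only the first message is inspected, and only its role matters. A minimal sketch, assuming messages are plain dicts with a "role" key (the function name is hypothetical):

```python
from typing import Dict, List


def strip_leading_instruction_message(messages: List[Dict[str, object]]) -> List[Dict[str, object]]:
    """Drop the first message only if its role is system or developer.

    Hypothetical helper; returns a new list and never mutates the input.
    """
    if messages and messages[0].get("role") in ("system", "developer"):
        return messages[1:]
    return list(messages)
```

A system or developer message that appears later in the history is left untouched.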

Features

Tools

Load one or more tool plugins, and the bundle will pass their schemas to the Responses API and surface tool calls/results in the normal tool loop.

Image attachments

The bundle supports two attachment paths for images:

  • direct send through Responses input_image items
  • upload-before-send through the OpenAI Files API, followed by Responses file_id references

Direct-send image shapes:

  • multipartContent[].content.url with http:// or https:// image URLs
  • multipartContent[].content.url with data:image/... URLs
  • multipartContent[].content.data_base64 plus mime_type, which the provider converts into a data: URL before sending
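The three direct-send shapes all end up as a single URL string on an input_image item. The sketch below shows one plausible conversion; the helper names are hypothetical, and the real provider may validate or normalize more than this.

```python
from typing import Dict


def to_data_url(data_base64: str, mime_type: str) -> str:
    """Build a data: URL from already base64-encoded bytes."""
    return f"data:{mime_type};base64,{data_base64}"


def to_input_image(content: Dict[str, str]) -> Dict[str, str]:
    """Convert one multipartContent[].content entry into an input_image item.

    Hypothetical helper: accepts either a ready url (http, https, or data:)
    or data_base64 plus mime_type.
    """
    url = content.get("url")
    if url is None:
        url = to_data_url(content["data_base64"], content["mime_type"])
    return {"type": "input_image", "image_url": url}
```

For example, to_input_image({"data_base64": "aGk=", "mime_type": "image/png"}) yields an item whose image_url is "data:image/png;base64,aGk=".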

When attachment_upload_mode: "upload_before_send" is enabled, inline image attachments are uploaded first and then sent as input_image items using OpenAI file references.

PDF attachments

PDF attachments are supported only when attachment_upload_mode: "upload_before_send" is enabled.

Supported PDF input shapes:

  • multipartContent[].content.url with data:application/pdf;base64,...
  • multipartContent[].content.data_base64 plus mime_type: "application/pdf"

In this mode, the provider uploads the PDF through the OpenAI Files API and sends the Responses request using an input_file item with a file_id reference.
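Once a file has been uploaded, the request only needs a reference to it. The sketch below covers just that last step and leaves out the Files API upload itself; the function name is hypothetical and the file id in the usage note is a placeholder.

```python
from typing import Dict


def file_reference_item(file_id: str, mime_type: str) -> Dict[str, str]:
    """Build the Responses content item for an already-uploaded attachment.

    Hypothetical helper: PDFs become input_file items, images become
    input_image items, and anything else is rejected.
    """
    if mime_type == "application/pdf":
        return {"type": "input_file", "file_id": file_id}
    if mime_type.startswith("image/"):
        return {"type": "input_image", "file_id": file_id}
    raise ValueError(f"unsupported attachment type: {mime_type}")
```

For example, file_reference_item("file-123", "application/pdf") returns {"type": "input_file", "file_id": "file-123"}.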

Explicit upload actions are still follow-up work for Stage 3.

When the package is loaded through its plugin path, it also exposes the openai_responses_attachments provider extension. That extension:

  • advertises upload UI for supported attachment types
  • exposes store_attachment, delete_attachment, and download_attachment actions backed by the application-owned session asset store
  • lets clients upload an attachment once, send it later via asset_ref, and download the same stored asset again from message history

Reasoning

Reasoning controls are available through:

  • reasoning_effort
  • reasoning_summary

To use those options, also load plugins.request_options_feature.RequestOptionsFeature.

When the reasoning extension is active, it also requests include: ["reasoning.encrypted_content"] so OpenAI reasoning items can be preserved in raw metadata for future native replay.
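As a rough picture of the resulting request fields, the sketch below builds the reasoning-related keyword arguments for a Responses call. The function name is hypothetical, and it assumes unset options are simply omitted rather than sent as null.

```python
from typing import Dict, Optional


def build_reasoning_options(
    reasoning_effort: Optional[str] = None,
    reasoning_summary: Optional[str] = None,
) -> Dict[str, object]:
    """Return request fields an active reasoning extension might add.

    Hypothetical helper; omitting unset options is an assumption.
    """
    options: Dict[str, object] = {
        # Always request encrypted reasoning so it can be replayed later.
        "include": ["reasoning.encrypted_content"],
    }
    reasoning: Dict[str, str] = {}
    if reasoning_effort:
        reasoning["effort"] = reasoning_effort
    if reasoning_summary:
        reasoning["summary"] = reasoning_summary
    if reasoning:
        options["reasoning"] = reasoning
    return options
```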

Usage

Usage metadata is attached to assistant messages, including formatted prompt, completion, cached-token, and cost fields. This is useful in the terminal UI and other clients that render message footers or status bars.

Prompt caching

Prompt caching controls are available through:

  • enable_prompt_cache_key
  • prompt_cache_key
  • enable_prompt_cache_retention_24h

When prompt cache key support is enabled:

  • new sessions automatically receive a generated session-owned key
  • older sessions missing a key are repaired on first request
  • forked sessions keep the same key by default
  • users can manually edit the key through normal session settings
  • users can regenerate the key through the regenerate_prompt_cache_key session action

By default, prompt cache key support is enabled when this extension is loaded. Set enable_prompt_cache_key: false if you want to turn off automatic session-owned key generation and request injection.
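The key lifecycle above can be sketched with two small helpers. Both names are hypothetical, session settings are modeled as a plain dict, and the uuid-based key format is an assumption rather than the extension's real format.

```python
import uuid
from typing import Dict


def ensure_prompt_cache_key(session_settings: Dict[str, object], enabled: bool = True) -> Dict[str, object]:
    """Return settings with a prompt_cache_key, generating one if missing.

    Covers both new sessions and older sessions that lack a key.
    """
    updated = dict(session_settings)
    if enabled and not updated.get("prompt_cache_key"):
        updated["prompt_cache_key"] = uuid.uuid4().hex
    return updated


def regenerate_prompt_cache_key(session_settings: Dict[str, object]) -> Dict[str, object]:
    """Return settings with a freshly generated key, replacing any old one."""
    updated = dict(session_settings)
    updated["prompt_cache_key"] = uuid.uuid4().hex
    return updated
```

Because an existing key is never overwritten by ensure_prompt_cache_key, a forked session that copies its parent's settings keeps the same key, and a manually edited key is preserved.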

When the effective auth mode selects the ChatGPT endpoint:

  • prompt_cache_key behavior remains available
  • prompt_cache_retention: "24h" is suppressed automatically even if enable_prompt_cache_retention_24h is true

Flex processing

Flex processing controls are available through:

  • enable_flex_processing
  • flex_timeout_seconds
  • enable_flex_retries
  • flex_max_retries

When flex processing is enabled, the extension writes request overrides so the provider can apply service_tier: "flex", the configured timeout, and the configured retry policy on both streaming and non-streaming Responses calls.

When the effective auth mode selects the ChatGPT endpoint, flex processing is disabled automatically even if the config flags are set. The ChatGPT-hosted endpoint does not support this OpenAI API feature.
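A minimal sketch of how these flags might combine into per-request overrides. The function name and the "timeout" and "max_retries" override keys are hypothetical; only service_tier: "flex" and the config key names and defaults come from this document.

```python
from typing import Dict


def build_flex_overrides(config: Dict[str, object], effective_auth_mode: str) -> Dict[str, object]:
    """Return request overrides for flex processing, or {} when it does not apply.

    Hypothetical helper; override key names are assumptions.
    """
    # Flex is an OpenAI API feature, so it is skipped on the ChatGPT endpoint.
    if effective_auth_mode == "chatgpt" or not config.get("enable_flex_processing", False):
        return {}
    retries_enabled = config.get("enable_flex_retries", True)
    return {
        "service_tier": "flex",
        "timeout": config.get("flex_timeout_seconds", 300),
        "max_retries": config.get("flex_max_retries", 2) if retries_enabled else 0,
    }
```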

Verbosity

verbosity controls the Responses text.verbosity option for supported models. In the current implementation, that is enabled for models whose id contains gpt-5.
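The model gate is a substring check on the model id. A minimal sketch, with a hypothetical function name, of how the text.verbosity field could be added only for matching models:

```python
from typing import Dict, Optional


def build_verbosity_options(model: str, verbosity: Optional[str]) -> Dict[str, object]:
    """Return the text.verbosity request field for gpt-5 model ids, else {}.

    Hypothetical helper mirroring the documented "id contains gpt-5" rule.
    """
    if not verbosity or "gpt-5" not in model:
        return {}
    return {"text": {"verbosity": verbosity}}
```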

Native compaction

The bundle includes a session action that compacts provider-native history for future Responses turns. This is intended for context maintenance, not for producing a human-readable summary message.

Configuration reference

Configuration is a flat dict shared across the provider and enabled extensions.

Provider keys

  • provider: must be openai_responses in app-layer provider config
  • model: required model id
  • auth_mode: optional, one of api, chatgpt, auto; defaults to api
  • api_key: optional if OPENAI_API_KEY is set or ChatGPT auth is used
  • base_url: optional, defaults to https://api.openai.com/v1
  • api_base_url: optional explicit base URL for API-key mode
  • chatgpt_base_url: optional, defaults to https://chatgpt.com/backend-api/codex
  • chatgpt_credentials_path: optional path to saved ChatGPT credentials
  • chatgpt_account_id: optional explicit ChatGPT account id override
  • chatgpt_client_version: optional, used for ChatGPT /models discovery
  • strip_leading_system_or_developer_message: optional compatibility flag for migrated sessions
  • compaction_model: optional model id for native compaction; leave empty or omit it to use model
  • use_chatgpt_auth_for_compaction: optional, defaults to true; set to false to force native compaction onto the API endpoint
  • use_compaction_placeholder_transcript: optional, defaults to false; set to true to show compacted native ranges as a single visible placeholder message instead of rebuilding the compacted native output directly
  • timeout: optional request timeout in seconds, defaults to 60

Attachment keys

  • attachment_upload_mode: direct or upload_before_send; defaults to direct

Use upload_before_send when you want the provider to upload inline image or PDF attachment data through the OpenAI Files API before the Responses request. PDF support is only available in this mode.

Flex processing keys

  • enable_flex_processing: enable service_tier: "flex" for Responses calls
  • flex_timeout_seconds: per-request timeout to apply when flex is enabled; defaults to 300
  • enable_flex_retries: enable request-scoped retries for flex calls; defaults to true
  • flex_max_retries: max retry count when flex retries are enabled; defaults to 2

Note: flex processing is ignored automatically when the effective auth mode selects the ChatGPT endpoint.

Fast mode

Fast mode controls are available through:

  • enable_fast_mode: enable service_tier: "priority" for ChatGPT-backed Responses calls
  • enable_fast_mode_for_api_requests: config-only opt-in that lets enable_fast_mode also apply service_tier: "priority" to API-backed Responses calls

By default, Fast mode applies only to ChatGPT-backed requests. When enable_fast_mode_for_api_requests: true is set in config, the same Fast mode toggle also applies service_tier: "priority" to API-backed requests. The usage footer renders tier: priority when OpenAI reports that tier, and cost estimation uses priority-tier pricing when the checked-in or manual pricing data defines it.
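The interaction between the two flags and the effective auth mode can be sketched as follows. The function name is hypothetical; the config keys and the "priority" tier come from the description above.

```python
from typing import Dict, Optional


def fast_mode_service_tier(config: Dict[str, object], effective_auth_mode: str) -> Optional[str]:
    """Return "priority" when Fast mode applies to this request, else None.

    Hypothetical helper illustrating the documented rules only.
    """
    if not config.get("enable_fast_mode", False):
        return None
    if effective_auth_mode == "chatgpt":
        return "priority"
    # API-backed requests need the additional config-only opt-in.
    if config.get("enable_fast_mode_for_api_requests", False):
        return "priority"
    return None
```

Note that enable_fast_mode_for_api_requests has no effect on its own; it only widens where an enabled Fast mode toggle applies.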

Benchmark script

The repo includes a benchmark helper at scripts/run_openai_responses_tier_matrix.py.

Useful options:

  • --model or --models
  • --repeat
  • --execution-mode sequential|parallel
  • --run-scope api|chatgpt|both
  • --desired-input-tokens
  • --desired-output-tokens

The script saves a text artifact under artifacts/ containing:

  • the benchmark configuration
  • one section per model/tier/repeat run
  • actual usage and duration metrics
  • aggregate summaries per model/tier row

Reasoning keys

These require plugins.request_options_feature.RequestOptionsFeature to be loaded.

  • reasoning_effort: model-specific effort value. Current UI options are:
      • GPT-5 models: minimal, low, medium, high
      • GPT-5.1 models: none, low, medium, high
      • GPT-5.2 and GPT-5.4 models: none, low, medium, high, xhigh
      • GPT-5 pro: high
      • unknown models: fallback UI shows all documented values
  • reasoning_summary: one of auto, concise, detailed
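The per-model effort options can be pictured as a lookup keyed on the model id. The sketch below is an assumption-heavy illustration: the function name is hypothetical, the id prefixes (for example "gpt-5-pro") are guesses at how model families appear in ids, and the ordering of the fallback list is arbitrary.

```python
from typing import List

# Union of every effort value listed above, used for unknown models.
ALL_DOCUMENTED_EFFORTS = ["none", "minimal", "low", "medium", "high", "xhigh"]


def reasoning_effort_options(model: str) -> List[str]:
    """Return the effort values a UI might offer for a model id.

    Hypothetical helper; the prefix matching is an assumption.
    """
    model_id = model.lower()
    if model_id.startswith("gpt-5-pro"):
        return ["high"]
    if model_id.startswith(("gpt-5.2", "gpt-5.4")):
        return ["none", "low", "medium", "high", "xhigh"]
    if model_id.startswith("gpt-5.1"):
        return ["none", "low", "medium", "high"]
    if model_id == "gpt-5" or model_id.startswith("gpt-5-"):
        return ["minimal", "low", "medium", "high"]
    return list(ALL_DOCUMENTED_EFFORTS)
```

The more specific families are checked first so that an id such as gpt-5-pro is not caught by the general GPT-5 rule.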

Assistant messages always expose the frontend-friendly field metadata.reasoning. OpenAI-specific raw reasoning data is preserved under metadata.openai_responses_reasoning.

Verbosity keys

  • verbosity: one of low, medium, high
  • show_verbosity_ui: set to false to hide the verbosity selector in UI

Verbosity only activates when the selected model is tagged with supports_verbosity. In the current implementation, that tag is added for models whose id contains gpt-5.

Usage keys

  • pricing_snapshot_path: optional override for the checked-in pricing snapshot
  • show_usage: set to false to hide usage footer/status UI elements
  • show_estimated_cached_tokens: set to true to show ChatGPT estimated cached-token footers
  • show_openai_auth_mode: set to true to show the effective auth mode (API or ChatGPT) in usage footers

ChatGPT-backed Responses usage currently estimates cached tokens using turn-aware heuristics. That estimate is kept separate from the raw provider usage payload and is used for ChatGPT-mode pricing instead of any backend-reported cached-token field. By default it is not displayed; enable show_estimated_cached_tokens if you want the footer.

Prompt caching keys

  • enable_prompt_cache_key: defaults to true; set to false to disable automatic session-owned prompt cache keys
  • prompt_cache_key: optional manual cache key override shown in normal session settings UI
  • enable_prompt_cache_retention_24h: set to true to send prompt_cache_retention: "24h"

Note: enable_prompt_cache_retention_24h is ignored automatically when the effective auth mode selects the ChatGPT endpoint.

Tools

Tool schemas come from loaded tool plugins such as plugins.math_tools.MathTools.

No extra config key is required beyond loading tools in the application.

Native compaction

The bundle exposes a session action with id compact_native_history that uses the native compaction API, rewrites provider-native history, and then rebuilds the visible transcript directly from the compacted native output.

If you prefer the older placeholder-style transcript behavior, enable:

  • use_compaction_placeholder_transcript: true

When enabled, compacted native segments are shown as a single visible placeholder message and are expanded back into native compaction artifacts on future requests.

The bundle also exposes a session action that toggles this view mode:

  • Collapse compacted messages
  • Expand compacted messages

The action changes the visible projection only. Retained native history stays canonical and is rebuilt using the updated session override.

By default, native compaction follows the same effective auth mode as normal Responses requests. With use_chatgpt_auth_for_compaction enabled (the default), a session running in ChatGPT mode, or in auto mode that resolved to ChatGPT, also attempts native compaction against the ChatGPT-backed endpoint. If that backend does not support native compaction, the action fails with a clear endpoint-specific error. Set use_chatgpt_auth_for_compaction: false to force compaction onto the normal API endpoint instead.

By default, compaction uses the same model as normal requests. Set compaction_model to use a different model for the compaction API call, for example:

{
  "model": "gpt-5.5",
  "compaction_model": "gpt-5.4-mini"
}

The compacted native output shape is model-dependent. Some models return visible compacted assistant messages alongside opaque compaction artifacts; others return only the retained visible user messages plus opaque compaction state. Both shapes are valid as long as future requests continue from the compacted native history.

The action no longer accepts a freeform instructions field. When the backend requires top-level instructions, native compaction now uses the same config-driven system-message pipeline as normal requests:

  • enable system_message_as_instructions: true
  • set system_message as usual
  • enable strip_leading_system_or_developer_message: true if older sessions still contain a persisted leading system or developer message

When system_message_as_instructions is false, native compaction also follows the same request-time message-injection path as a normal request:

  • if state["request"]["messages"] is the effective outbound message set, compaction uses that rendered message list as its compact input
  • _metadata is stripped from outbound compact input
  • if the compacted output preserves the leading system or developer message, the stored native history restores the original retained version of that message, including template-form content and _metadata
  • when a retained compaction_summary item is present, the visible rebuilt transcript surfaces it as a clear assistant placeholder message instead of letting that native item drift into unrelated later message ownership

Prompt caching session action

The bundle exposes a session action with id regenerate_prompt_cache_key that replaces the current session-owned prompt cache key without changing the visible transcript.

Capabilities and caveats

  • API-key mode supports both non-streaming and streaming Responses requests.
  • ChatGPT mode is streaming-only in the current implementation. Non-streaming requests fail with a clear error.
  • The provider retains Responses-native history so future turns can replay native items rather than flattening everything to plain chat text.
  • Reasoning requests include reasoning.encrypted_content so raw reasoning can be preserved even when the model does not expose readable reasoning text.
  • Verbosity support is model-gated and currently follows the provider tag check for gpt-5 models.
  • ChatGPT-backed requests use top-level instructions and store: false.
  • Flex processing and prompt-cache retention are automatically disabled when the ChatGPT endpoint is selected.
  • ChatGPT /models discovery may still require client_version; if discovery is unavailable, pin model explicitly in config.
  • Native compaction is intentionally different from human-readable summary insertion. It preserves provider-native compacted artifacts for future Responses requests instead of replacing history with a plain assistant summary.

Native compaction behavior

Native compaction is meant for Responses-native context maintenance, not transcript summarization.

  • It can compact the full native history or a visible prefix.
  • It uses the OpenAI Python SDK rather than application-layer summary logic.
  • It uses compaction_model when configured, otherwise it uses the normal request model.
  • It rebuilds visible messages after compaction directly from the compacted native output, whose visible assistant/user shape may vary by model.
  • If use_compaction_placeholder_transcript: true is enabled, it instead shows the compacted region as a single placeholder message in the visible transcript while still preserving the underlying retained native compaction artifacts.
  • Future requests continue from the compacted native history.

Bundle layout

If you are extending or debugging the bundle, the responsibilities are split like this:

  • provider: transport, SDK calls, Responses-native conversion, and native history replay
  • attachments: attachment upload policy/UI plus PDF composer UI
  • tools: tool schema injection and tool-call reconstruction
  • reasoning: reasoning request shaping and reasoning metadata
  • usage: usage formatting and pricing metadata
  • verbosity: text.verbosity request shaping
  • prompt caching: prompt_cache_key / retention request shaping and session-key lifecycle
  • ChatGPT auth feature: saved-credential loading plus api / chatgpt / auto auth-mode selection
  • native compaction: session action for provider-native compaction

Implementation classes registered by the bundle:

  • openai_responses_plugins.openai_responses_provider.OpenAIResponsesProvider
  • openai_responses_plugins.openai_responses_attachments_extension.OpenAIResponsesAttachmentsExtension
  • openai_responses_plugins.openai_responses_chatgpt_auth_feature.OpenAIResponsesChatGPTAuthFeature
  • openai_responses_plugins.openai_responses_flex_processing_extension.OpenAIResponsesFlexProcessingExtension
  • openai_responses_plugins.openai_responses_fast_mode_extension.OpenAIResponsesFastModeExtension
  • openai_responses_plugins.openai_responses_native_compaction_extension.OpenAIResponsesNativeCompactionExtension
  • openai_responses_plugins.openai_responses_prompt_caching_extension.OpenAIResponsesPromptCachingExtension
  • openai_responses_plugins.openai_responses_tools_extension.OpenAIResponsesToolsExtension
  • openai_responses_plugins.openai_responses_reasoning_extension.OpenAIResponsesReasoningExtension
  • openai_responses_plugins.openai_responses_usage_extension.OpenAIResponsesUsageExtension
  • openai_responses_plugins.openai_responses_verbosity_extension.OpenAIResponsesVerbosityExtension

Troubleshooting

  • Authentication errors: check OPENAI_API_KEY in the repo root .env, set api_key in config, or verify the saved ChatGPT credential file exists when using chatgpt / auto.
  • ChatGPT mode says streaming is required: use the streaming request path, or switch auth_mode to api / auto.
  • ChatGPT requests behave oddly on older sessions: enable system_message_as_instructions: true and strip_leading_system_or_developer_message: true.
  • Reasoning controls do nothing: make sure plugins.request_options_feature.RequestOptionsFeature is loaded.
  • Verbosity control does not appear or has no effect: verify the model is tagged with supports_verbosity; today that means a gpt-5 model id.
  • Tool calls never appear: load at least one tool plugin and use a model/prompt that will actually call tools.
  • Native compaction action is missing: make sure the active provider is openai_responses and the bundle was loaded from path:/absolute/path/to/plugins/openai_responses or equivalent.

Tests

For contributors and local validation:

Run the fast default package suite:

pytest plugins/openai_responses/tests -q

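By default, this package runs pytest with -m 'not integration and not manual_integration', so integration and manual integration tests are deselected unless you pass your own -m expression.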

Run hosted API integration tests:

pytest plugins/openai_responses/tests -m 'integration and api and not slow_integration and openai' -q

Run manual ChatGPT-auth integration tests separately:

pytest plugins/openai_responses/tests -m 'manual_integration' -q

The OpenAI Responses test suite bootstraps the repo root .env during local use in this repo and does not overwrite already-exported environment variables. Hosted API tests typically require OPENAI_API_KEY. Manual integration tests also require saved ChatGPT auth state.

License

Copyright 2026 Dynamic Programming Solutions Kft.

Licensed under the Apache License, Version 2.0 (the "License"); you may not use this file except in compliance with the License. You may obtain a copy of the License at

http://www.apache.org/licenses/LICENSE-2.0

Unless required by applicable law or agreed to in writing, software distributed under the License is distributed on an "AS IS" BASIS, WITHOUT WARRANTIES OR CONDITIONS OF ANY KIND, either express or implied. See the License for the specific language governing permissions and limitations under the License.