Connect a coding agent
Premora can act as the private model endpoint for your coding agents. Point Claude Code, Codex, or any OpenAI-/Anthropic-compatible tool at Premora and it routes to your configured on-prem model (Premora Inference / vLLM) or an approved provider — with the upstream credential injected server-side, so the agent only ever holds a Premora API key.
:::tip Model endpoint vs. grounded knowledge This page is about using Premora as the model backend. To feed the agent your governed, ACL-aware company knowledge (search, lineage, the clarification queue), use the agent skill. The two compose: a private model and grounded context. :::
1. Mint an API key
Create a Premora API key from Settings → API Access (or via the API with a session JWT).
The key (pk_premora_…) needs the inference:chat scope, plus inference:embeddings if the
tool requests embeddings. Keep it out of version control.
The endpoint lives behind the single gateway front door at https://<your-premora-host>/v1.
2a. Claude Code (Anthropic-compatible)
Premora exposes POST /v1/messages. Point Claude Code at it:
export ANTHROPIC_BASE_URL="https://<your-premora-host>/v1"
export ANTHROPIC_AUTH_TOKEN="pk_premora_your_key"
export ANTHROPIC_MODEL="default" # the model bound to the agent pass-through
claude
If your configured upstream is OpenAI-compatible (e.g. Premora Inference / vLLM), Premora translates between the Anthropic Messages and OpenAI Chat Completions wire formats automatically, including token streaming. If the upstream itself speaks Anthropic, requests pass through unchanged.
2b. Codex / OpenCode / OpenAI-compatible tools
Premora exposes POST /v1/chat/completions, POST /v1/embeddings, and GET /v1/models:
export OPENAI_BASE_URL="https://<your-premora-host>/v1"
export OPENAI_API_KEY="pk_premora_your_key"
This works with Codex CLI, OpenCode (OpenAI provider), Continue, Cursor, LiteLLM, and other OpenAI-compatible clients.
How it works
coding agent ──(Premora API key)──▶ premora-api-gateway /v1 ──▶ premora-llm-router
│ resolve agentPassthrough
│ inject upstream secret
▼
Premora Inference / vLLM / approved provider
- The gateway forwards your API key to the router and streams the response back; it never exposes internal services directly.
- The router verifies the key and scope, resolves the admin-configured
agentPassthroughprovider + model, injects the upstream credential, and proxies. Your key and base URL are all you hold; upstream credentials never leave the server. - An administrator configures the pass-through provider and model under Admin → Inference / Providers (see Runtime configuration).
Endpoints
| Method | Path | Protocol | Notes |
|---|---|---|---|
| POST | /v1/messages | Anthropic Messages | Claude Code. Streaming supported. |
| POST | /v1/chat/completions | OpenAI Chat Completions | Codex / OpenCode / etc. Streaming supported. |
| POST | /v1/embeddings | OpenAI Embeddings | Requires inference:embeddings scope. |
| GET | /v1/models | OpenAI Models | Advertises the bound model. |
Notes & limits
- Streaming is supported on
/v1/messagesand/v1/chat/completions(stream: true). - Cross-protocol translation (Anthropic ⇄ OpenAI) covers text and tool calls for the non-streaming path and text for streaming. For full-fidelity tool streaming, configure an Anthropic-compatible upstream so requests pass through losslessly.
- All usage is attributable to the key owner and runs through the gateway's rate/abuse controls.