Connect a coding agent

Premora can act as the private model endpoint for your coding agents. Point Claude Code, Codex, or any OpenAI-/Anthropic-compatible tool at Premora and it routes to your configured on-prem model (Premora Inference / vLLM) or an approved provider — with the upstream credential injected server-side, so the agent only ever holds a Premora API key.

:::tip Model endpoint vs. grounded knowledge This page is about using Premora as the model backend. To feed the agent your governed, ACL-aware company knowledge (search, lineage, the clarification queue), use the agent skill. The two compose: a private model and grounded context. :::

1. Mint an API key

Create a Premora API key from Settings → API Access (or via the API with a session JWT). The key (pk_premora_…) needs the inference:chat scope, plus inference:embeddings if the tool requests embeddings. Keep it out of version control.

The endpoint lives behind the single gateway front door at https://<your-premora-host>/v1.

2a. Claude Code (Anthropic-compatible)

Premora exposes POST /v1/messages. Point Claude Code at it:

export ANTHROPIC_BASE_URL="https://<your-premora-host>/v1"
export ANTHROPIC_AUTH_TOKEN="pk_premora_your_key"
export ANTHROPIC_MODEL="default"   # the model bound to the agent pass-through
claude

If your configured upstream is OpenAI-compatible (e.g. Premora Inference / vLLM), Premora translates between the Anthropic Messages and OpenAI Chat Completions wire formats automatically, including token streaming. If the upstream itself speaks Anthropic, requests pass through unchanged.

2b. Codex / OpenCode / OpenAI-compatible tools

Premora exposes POST /v1/chat/completions, POST /v1/embeddings, and GET /v1/models:

export OPENAI_BASE_URL="https://<your-premora-host>/v1"
export OPENAI_API_KEY="pk_premora_your_key"

This works with Codex CLI, OpenCode (OpenAI provider), Continue, Cursor, LiteLLM, and other OpenAI-compatible clients.

How it works

coding agent ──(Premora API key)──▶ premora-api-gateway /v1 ──▶ premora-llm-router
                                                                  │ resolve agentPassthrough
                                                                  │ inject upstream secret
                                                                  ▼
                                                       Premora Inference / vLLM / approved provider

The gateway forwards your API key to the router and streams the response back; it never exposes internal services directly.
The router verifies the key and scope, resolves the admin-configured agentPassthrough provider + model, injects the upstream credential, and proxies. Your key and base URL are all you hold; upstream credentials never leave the server.
An administrator configures the pass-through provider and model under Admin → Inference / Providers (see Runtime configuration).

Endpoints

Method	Path	Protocol	Notes
POST	`/v1/messages`	Anthropic Messages	Claude Code. Streaming supported.
POST	`/v1/chat/completions`	OpenAI Chat Completions	Codex / OpenCode / etc. Streaming supported.
POST	`/v1/embeddings`	OpenAI Embeddings	Requires `inference:embeddings` scope.
GET	`/v1/models`	OpenAI Models	Advertises the bound model.

Notes & limits

Streaming is supported on /v1/messages and /v1/chat/completions (stream: true).
Cross-protocol translation (Anthropic ⇄ OpenAI) covers text and tool calls for the non-streaming path and text for streaming. For full-fidelity tool streaming, configure an Anthropic-compatible upstream so requests pass through losslessly.
All usage is attributable to the key owner and runs through the gateway's rate/abuse controls.

1. Mint an API key​

2a. Claude Code (Anthropic-compatible)​

2b. Codex / OpenCode / OpenAI-compatible tools​

How it works​

Endpoints​

Notes & limits​