Skip to main content

Connect a coding agent

Premora can act as the private model endpoint for your coding agents. Point Claude Code, Codex, or any OpenAI-/Anthropic-compatible tool at Premora and it routes to your configured on-prem model (Premora Inference / vLLM) or an approved provider — with the upstream credential injected server-side, so the agent only ever holds a Premora API key.

:::tip Model endpoint vs. grounded knowledge This page is about using Premora as the model backend. To feed the agent your governed, ACL-aware company knowledge (search, lineage, the clarification queue), use the agent skill. The two compose: a private model and grounded context. :::

1. Mint an API key

Create a Premora API key from Settings → API Access (or via the API with a session JWT). The key (pk_premora_…) needs the inference:chat scope, plus inference:embeddings if the tool requests embeddings. Keep it out of version control.

The endpoint lives behind the single gateway front door at https://<your-premora-host>/v1.

2a. Claude Code (Anthropic-compatible)

Premora exposes POST /v1/messages. Point Claude Code at it:

export ANTHROPIC_BASE_URL="https://<your-premora-host>/v1"
export ANTHROPIC_AUTH_TOKEN="pk_premora_your_key"
export ANTHROPIC_MODEL="default" # the model bound to the agent pass-through
claude

If your configured upstream is OpenAI-compatible (e.g. Premora Inference / vLLM), Premora translates between the Anthropic Messages and OpenAI Chat Completions wire formats automatically, including token streaming. If the upstream itself speaks Anthropic, requests pass through unchanged.

2b. Codex / OpenCode / OpenAI-compatible tools

Premora exposes POST /v1/chat/completions, POST /v1/embeddings, and GET /v1/models:

export OPENAI_BASE_URL="https://<your-premora-host>/v1"
export OPENAI_API_KEY="pk_premora_your_key"

This works with Codex CLI, OpenCode (OpenAI provider), Continue, Cursor, LiteLLM, and other OpenAI-compatible clients.

How it works

coding agent ──(Premora API key)──▶ premora-api-gateway /v1 ──▶ premora-llm-router
│ resolve agentPassthrough
│ inject upstream secret

Premora Inference / vLLM / approved provider
  • The gateway forwards your API key to the router and streams the response back; it never exposes internal services directly.
  • The router verifies the key and scope, resolves the admin-configured agentPassthrough provider + model, injects the upstream credential, and proxies. Your key and base URL are all you hold; upstream credentials never leave the server.
  • An administrator configures the pass-through provider and model under Admin → Inference / Providers (see Runtime configuration).

Endpoints

MethodPathProtocolNotes
POST/v1/messagesAnthropic MessagesClaude Code. Streaming supported.
POST/v1/chat/completionsOpenAI Chat CompletionsCodex / OpenCode / etc. Streaming supported.
POST/v1/embeddingsOpenAI EmbeddingsRequires inference:embeddings scope.
GET/v1/modelsOpenAI ModelsAdvertises the bound model.

Notes & limits

  • Streaming is supported on /v1/messages and /v1/chat/completions (stream: true).
  • Cross-protocol translation (Anthropic ⇄ OpenAI) covers text and tool calls for the non-streaming path and text for streaming. For full-fidelity tool streaming, configure an Anthropic-compatible upstream so requests pass through losslessly.
  • All usage is attributable to the key owner and runs through the gateway's rate/abuse controls.