Use the REST and WebSocket APIs to integrate LiteRTLM into your application.

```text
http://<host>:<port>
```

All REST endpoints are prefixed with `/api`. The WebSocket endpoint is at `/ws`.
Two credential types are accepted, depending on the endpoint:

| Type | Header | Endpoints |
|---|---|---|
| JWT | `Authorization: Bearer <accessToken>` | All `/api/conversations/*`, `/api/api-key/*`, WebSocket |
| API Key | `Authorization: Bearer lrtlm_<key>` | All `/api/conversations/*`, WebSocket |

API key management endpoints (`/api/api-key/*`) require JWT only.
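Both credential types travel in the same header, so a single client-side helper covers them. A minimal sketch (illustrative code, not part of any LiteRTLM SDK):

```javascript
// Both JWTs and API keys are sent as a Bearer token in the same header.
// API keys are recognisable by their "lrtlm_" prefix.
function authHeader(credential) {
  return { Authorization: `Bearer ${credential}` };
}

function isApiKey(credential) {
  return credential.startsWith('lrtlm_');
}

console.log(authHeader('lrtlm_abc123').Authorization); // 'Bearer lrtlm_abc123'
console.log(isApiKey('eyJhbGciOi...'));                // false
```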
Authenticate and receive a JWT pair. No auth header required.

Request body:

```json
{
  "username": "admin",
  "password": "your-password"
}
```

Response:

```json
{
  "ok": true,
  "accessToken": "eyJ...",
  "refreshToken": "eyJ..."
}
```
Exchange a refresh token for a new JWT pair. Refresh tokens rotate on every use.

Request body:

```json
{
  "refreshToken": "eyJ..."
}
```

Response:

```json
{
  "ok": true,
  "accessToken": "eyJ...",
  "refreshToken": "eyJ..."
}
```
Invalidate the current session. The access token expires naturally after 15 minutes.

Request body:

```json
{
  "refreshToken": "eyJ..."
}
```

Response:

```json
{ "ok": true }
```
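Because refresh tokens rotate on every use, the client must replace both tokens after each refresh. A minimal sketch of a client-side token store (all names here are illustrative, not part of the LiteRTLM API):

```javascript
// Holds the current JWT pair and applies the rotated pair returned by
// a refresh response. The old refresh token is invalid after rotation,
// so it must be discarded immediately.
class TokenStore {
  constructor({ accessToken, refreshToken }) {
    this.accessToken = accessToken;
    this.refreshToken = refreshToken;
  }

  applyRefresh(response) {
    if (!response.ok) throw new Error('refresh failed');
    this.accessToken = response.accessToken;
    this.refreshToken = response.refreshToken;
  }
}

const store = new TokenStore({ accessToken: 'eyJ.old', refreshToken: 'eyJ.r1' });
store.applyRefresh({ ok: true, accessToken: 'eyJ.new', refreshToken: 'eyJ.r2' });
console.log(store.accessToken); // 'eyJ.new'
```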
All conversation endpoints accept a JWT or an API key via `Authorization: Bearer`.

List all conversations.

Response:

```json
{
  "ok": true,
  "conversations": ["my-chat", "code-review"]
}
```
Create a new conversation. Choose a builtin preset or supply a custom system instruction. Optionally bind tools by name.

Request body (builtin preset):

```json
{
  "name": "my-chat",
  "config": "assistant" // "assistant" | "coder" | "concise" | "creative"
}
```

Request body (preset with tools):

```json
{
  "name": "my-chat",
  "config": "assistant",
  "tools": ["datetime", "calculator"] // optional — bind tools by name
}
```

Request body (custom system instruction):

```json
{
  "name": "my-chat",
  "systemInstruction": "You are a pirate. Respond only in pirate speak.",
  "tools": ["datetime"] // tools work with custom instructions too
}
```

Response:

```json
{
  "ok": true,
  "name": "my-chat",
  "config": "assistant"
}
```
Retrieve full message history for a conversation, ordered oldest-first.

Response:

```json
{
  "ok": true,
  "messages": [
    { "role": "user", "text": "Hello", "seq": 0, "createdAt": 1700000000000 },
    { "role": "model", "text": "Hi! How can I help?", "seq": 1, "createdAt": 1700000000000 }
  ]
}
```
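A client can turn this response into a readable transcript. An illustrative helper, assuming only the response shape shown above (the sort by `seq` just makes the documented oldest-first ordering explicit):

```javascript
// Render a history response as "role: text" lines, ordered by seq.
function renderHistory(response) {
  return [...response.messages]
    .sort((a, b) => a.seq - b.seq)
    .map((m) => `${m.role}: ${m.text}`)
    .join('\n');
}

const history = {
  ok: true,
  messages: [
    { role: 'user', text: 'Hello', seq: 0, createdAt: 1700000000000 },
    { role: 'model', text: 'Hi! How can I help?', seq: 1, createdAt: 1700000000000 },
  ],
};

console.log(renderHistory(history));
// user: Hello
// model: Hi! How can I help?
```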
Send a message and receive the full reply in a single blocking response. For streaming, use the WebSocket endpoint instead.

Request body:

```json
{
  "message": "Explain binary search trees"
}
```

Response:

```json
{
  "ok": true,
  "reply": "A binary search tree is a data structure..."
}
```
Permanently delete a conversation and all its message history.

Response:

```json
{ "ok": true }
```
Tools let the model call server-side functions during inference. When a tool is bound to a conversation, the model can invoke it automatically — the entire call/response loop happens inside the SDK before any token reaches the client. From the client's perspective, the reply arrives as normal streaming tokens.
List all tools currently registered on the server. No authentication required.

Response:

```json
{
  "ok": true,
  "tools": [
    {
      "name": "datetime",
      "description": "Get the current date and time...",
      "parameters": [
        { "name": "format", "type": "STRING", "description": "Date format pattern", "required": false },
        { "name": "timezone", "type": "STRING", "description": "IANA timezone ID", "required": false }
      ]
    },
    {
      "name": "calculator",
      "description": "Evaluate a mathematical expression...",
      "parameters": [
        { "name": "expression", "type": "STRING", "description": "A math expression", "required": true }
      ]
    }
  ]
}
```
Pass `"tools"` when creating a conversation to bind tools by name:

- Tools are resolved from the server registry — unknown names are silently skipped.
- Omit `"tools"` or pass `[]` for a tool-free conversation.

```json
{
  "name": "research-chat",
  "config": "assistant",
  "tools": ["datetime", "calculator"]
}
```
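The resolution rule above can be sketched as a simple filter. This is illustrative client-side logic mirroring the documented behaviour, not the server's actual implementation; the registry contents are taken from the builtin tools listed below:

```javascript
// Mirror of the documented rule: requested names are matched against the
// server registry, and unknown names are silently skipped (no error).
const registry = new Set(['datetime', 'calculator', 'rogo_list_docs', 'rogo_read_doc']);

function resolveTools(requested = []) {
  return requested.filter((name) => registry.has(name));
}

console.log(resolveTools(['datetime', 'no-such-tool'])); // ['datetime']
console.log(resolveTools());                             // []  (tool-free conversation)
```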
| Name | Description | Parameters |
|---|---|---|
| datetime | Returns the current date and time | `format` (optional) — Java date pattern, e.g. `yyyy-MM-dd HH:mm:ss`; `timezone` (optional) — IANA ID, e.g. `Asia/Tokyo` |
| calculator | Evaluates a math expression and returns the result | `expression` (required) — e.g. `(3 + 5) * 2`, `Math.sqrt(16)` |
| rogo_list_docs | Lists all enterprise documents available in the rogodocs directory. Returns name, size, and last-modified date for each file. | None |
| rogo_read_doc | Reads the full content of an enterprise document by filename. Call rogo_list_docs first to discover available files. | `filename` (required) — exact filename, e.g. `hr-policy.md` |
```text
// 1. Create a conversation with tools bound
POST /api/conversations
{ "name": "helper", "config": "assistant", "tools": ["datetime", "calculator"] }

// 2. Send a message — tool calls happen inside the model, transparently
WS /ws/conversations/helper?token=lrtlm_...
→ { "message": "What is 2 to the power of 10, and what time is it in Tokyo?" }

// Model calls: calculator("Math.pow(2,10)") → "1024"
//              datetime(timezone="Asia/Tokyo") → "2026-04-11 18:30:00"
// Then generates the final reply using both results.

← { "type": "token", "token": "2 to the power of 10 is 1024..." }
← { "type": "done" }

// 3. The client sees only the final answer — tool calls are invisible
```
If a tool fails (bad parameters, runtime error), it returns a descriptive error string
to the model — e.g. "Error: could not evaluate expression '1/0': Division by zero".
The model reads this as the tool result and responds accordingly.
Tool errors never interrupt streaming or cause a 503.
The gateway processes inference tasks one at a time on a single engine. When multiple conversations send messages simultaneously, tasks queue and are processed in submission order.
Returns the current queue snapshot — which conversations are processing or waiting, and their positions. No authentication required.
Response (idle):

```json
{
  "ok": true,
  "size": 0,
  "queue": []
}
```

Response (two tasks queued):

```json
{
  "ok": true,
  "size": 2,
  "queue": [
    { "name": "chat-1", "position": 1, "status": "processing" },
    { "name": "chat-5", "position": 2, "status": "waiting" }
  ]
}
```

`position` is 1-based; position 1 is the conversation currently being processed. The `status` field is either `"processing"` or `"waiting"`.
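A client polling this endpoint typically wants to know how many tasks sit ahead of its own conversation. An illustrative helper over the snapshot shape above (not part of the API):

```javascript
// Count the tasks ahead of a given conversation in a queue snapshot.
// Returns null if the conversation is not queued at all.
function tasksAhead(snapshot, name) {
  const entry = snapshot.queue.find((t) => t.name === name);
  if (!entry) return null;
  return entry.position - 1; // position is 1-based
}

const snapshot = {
  ok: true,
  size: 2,
  queue: [
    { name: 'chat-1', position: 1, status: 'processing' },
    { name: 'chat-5', position: 2, status: 'waiting' },
  ],
};

console.log(tasksAhead(snapshot, 'chat-5')); // 1
console.log(tasksAhead(snapshot, 'chat-9')); // null
```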
Stream model replies token-by-token over a persistent WebSocket connection.
Pass your JWT access token or API key as the token query parameter.
The connection stays open across multiple turns — send a new message after receiving "done".
Inference runs detached — if you disconnect mid-generation, the reply still completes and persists to DB.
Send a JSON frame with your message:

```json
{ "message": "What is Kotlin coroutines?" }
```
| Type | When | Fields |
|---|---|---|
| queued | Engine is busy with another conversation — your task is waiting | position — 1-based queue position |
| busy | Inference is now running for your message | — |
| token | One partial token streamed from the model | token — partial text string |
| done | Turn complete — authoritative full reply | reply — full reply text |
| error | Recoverable error — connection stays open | error — error message |
```text
// Client sends:
{ "message": "Explain binary search" }

// Server responds:
{ "type": "busy" }
{ "type": "token", "token": "Binary" }
{ "type": "token", "token": " search" }
{ "type": "token", "token": " is..." }
// ... one frame per token ...
{ "type": "done", "reply": "Binary search is a fast algorithm..." }
```
```text
// Server responds immediately with queue position:
{ "type": "queued", "position": 2 }

// Then when it's your turn:
{ "type": "busy" }
{ "type": "token", "token": "Binary" }
// ...
{ "type": "done", "reply": "Binary search is a fast algorithm..." }
```
```js
const ws = new WebSocket(
  'ws://localhost:8080/ws/conversations/my-chat?token=lrtlm_...'
);

let streamBuffer = '';

ws.onopen = () => {
  ws.send(JSON.stringify({ message: 'Hello!' }));
};

ws.onmessage = (evt) => {
  const frame = JSON.parse(evt.data);
  if (frame.type === 'queued') {
    console.log('Queued at position', frame.position);
  } else if (frame.type === 'busy') {
    console.log('Inference running...');
    streamBuffer = '';
  } else if (frame.type === 'token') {
    streamBuffer += frame.token;
    process.stdout.write(frame.token); // live streaming
  } else if (frame.type === 'done') {
    console.log('\n[done]', frame.reply); // authoritative full reply
  } else if (frame.type === 'error') {
    console.error('[error]', frame.error);
  }
};
```
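The frame handling in the client above can also be factored into a small pure reducer, which is easier to unit-test than socket callbacks. An illustrative sketch, assuming only the frame shapes documented in the table above:

```javascript
// Pure reducer over server frames: "busy" starts a fresh turn, "token"
// frames accumulate into a buffer, and "done" carries the authoritative
// full reply (which should be preferred over the accumulated buffer).
function reduceFrame(state, frame) {
  switch (frame.type) {
    case 'busy':  return { ...state, buffer: '', done: false };
    case 'token': return { ...state, buffer: state.buffer + frame.token };
    case 'done':  return { ...state, reply: frame.reply, done: true };
    case 'error': return { ...state, error: frame.error };
    default:      return state; // e.g. "queued" does not change the text state
  }
}

const frames = [
  { type: 'busy' },
  { type: 'token', token: 'Binary' },
  { type: 'token', token: ' search' },
  { type: 'done', reply: 'Binary search is...' },
];

const final = frames.reduce(reduceFrame, { buffer: '', reply: null, done: false });
console.log(final.reply);  // 'Binary search is...'
console.log(final.buffer); // 'Binary search'
```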
| Scenario | Behaviour |
|---|---|
| Disconnect mid-inference | Inference keeps running. Reply persists to DB on completion. Reload history via GET /messages. |
| Reconnect while inference running | Server sends busy, streams tokens from that point on, then done with full reply. |
| Reconnect after inference done | No frames sent. Load history via GET /api/conversations/{name}/messages. |
These endpoints require JWT only (`Authorization: Bearer <accessToken>`).
Generate a new API key. The raw key is returned once — store it immediately.

Request body:

```json
{ "name": "mobile-client" }
```

Response:

```json
{
  "ok": true,
  "key": "lrtlm_...", // raw key — shown once only
  "id": "uuid",
  "prefix": "lrtlm_Xx",
  "name": "mobile-client"
}
```
List all API keys (active and revoked). Raw keys are never returned.

Response:

```json
{
  "ok": true,
  "keys": [
    {
      "id": "uuid",
      "prefix": "lrtlm_Xx",
      "name": "mobile-client",
      "active": true,
      "createdAt": 1700000000000,
      "lastUsedAt": 1700000001000
    }
  ]
}
```
Look up metadata for a specific key by its raw value.
Soft-revoke an API key. The record is kept for audit purposes.

Request body:

```json
{ "key": "lrtlm_..." }
```

Response:

```json
{ "ok": true }
```
All errors follow the same structure:

```json
{ "ok": false, "error": "Human-readable message" }
```
| Status | Meaning |
|---|---|
| 400 | Bad request — missing or invalid field |
| 401 | Unauthorized — missing, invalid, or expired token |
| 404 | Resource not found |
| 409 | Conflict — resource already exists |
| 503 | Engine not ready — model is still loading |
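Because every error carries both an HTTP status and the uniform `{ "ok": false, "error": ... }` body, a client can fold both into one check. An illustrative helper (the `ApiError` and `unwrap` names are not part of the API):

```javascript
// Wraps the HTTP status and the error envelope's message in one exception.
class ApiError extends Error {
  constructor(status, message) {
    super(`${status}: ${message}`);
    this.status = status;
  }
}

// Returns the body on success; throws ApiError on the uniform error envelope.
function unwrap(status, body) {
  if (body.ok) return body;
  throw new ApiError(status, body.error);
}

// Usage:
try {
  unwrap(409, { ok: false, error: 'Conversation already exists' });
} catch (e) {
  console.log(e.status); // 409
}
```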