Replacing your existing chat/completions backend

If your app already calls a POST /chat/completions-style endpoint and renders the returned text, this page walks you through the migration. The shape changes in exactly one place: the reply is no longer in the HTTP response. Everything else (your UI, your end-user auth, your message history) stays yours.

This guide gives you a drop-in adapter that hides the change behind a sync-feeling function. It also covers the production patterns: retry safety, typing UX, choosing a transport (polling, SSE, or webhook), and a graceful fallback to your old provider during cutover.

The one mental shift

A classic completions call is synchronous: you send messages, the HTTP response is the reply.

POST /chat/completions  { messages: [...] }   ->   200 { choices: [{ message }] }
                        (reply is in this response)

LoreOS is asynchronous: POST /v1/sessions/{id}/messages returns immediately with a cursor and a run_ref — an acknowledgement, not the reply. The character’s reply arrives a few seconds later as a message.created event on the session’s event log, which you read by polling, SSE, or webhook.

POST /v1/sessions/{id}/messages  { text }
  -> 200 { accepted: true, cursor, run_ref }           (acknowledgement)
  ...a few seconds later, on the event log...
  -> { type: "message.created", role: "character", payload: { bubbles, text } }

Why it works this way. A LoreOS reply is not a single model call. The character runs a multi-step engine per turn — retrieval over its evolving world model, relational and emotional state, grounding and safety/accuracy gates, voice shaping, then multi-bubble emission and delivery. That work takes seconds and can be recovered if your process dies mid-turn (a durability ledger re-runs it). Returning a cursor immediately lets you ack the send instantly, show a typing indicator, and read a richer multi-bubble reply when it lands — instead of holding an HTTP request open through the whole engine.

Two consequences to internalize before you write code:

The reply is multi-bubble. payload.bubbles is an ordered list (a messenger-style burst); payload.text is the joined convenience form. Render bubbles when present.
Don’t read top-level response fields. Every response is wrapped as { schema_version, data, next_actions }. Read response.data.cursor, not response.cursor. Errors are { detail: { code, message, fix } }.
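You can capture both envelope rules in one small unwrap step. The block below is a minimal sketch that assumes the envelope shapes described above. `LoreosError` and `unwrapEnvelope` are hypothetical names, not part of any LoreOS SDK. The error keeps `status` and `code` as fields instead of burying them in a message string, so later code can branch on cases like 402 budget_exceeded.

```typescript
// Hypothetical helper names. Envelope shapes follow the rules above:
//   success -> { schema_version, data, next_actions }
//   error   -> { detail: { code, message, fix } }
type ErrorDetail = { code?: string; message?: string; fix?: string };

export class LoreosError extends Error {
  readonly status: number;
  readonly code: string;
  readonly fix?: string;

  constructor(status: number, code: string, fix?: string) {
    super(`${status} ${code}${fix ? `: ${fix}` : ""}`);
    this.name = "LoreosError";
    this.status = status;
    this.code = code;
    this.fix = fix;
  }
}

// Returns `data` for a 2xx body; otherwise throws a LoreosError built from `detail`.
// Prefers `fix` (actionable) over `message` for the human-readable part.
export function unwrapEnvelope<T>(status: number, body: any): T {
  if (status >= 200 && status < 300) return body?.data as T;
  const detail: ErrorDetail = body?.detail ?? {};
  throw new LoreosError(status, detail.code ?? "error", detail.fix ?? detail.message);
}
```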

A reference adapter (sync façade over the async events)

The cleanest migration keeps your call sites looking synchronous. Write one server-side function — getReply(sessionId, text) — that sends the message, waits for the message.created event, and returns the bubbles. Your existing code calls it the same way it called chat/completions; the async model is hidden inside.

This matches the examples/loreos-node-chat style: a small loreos() helper that unwraps data and throws on the { detail } error envelope, plus a cursor-poll loop.

// loreos-adapter.ts — keep this on the server; never ship LOREOS_KEY to the browser.
const BASE = process.env.LOREOS_BASE || "https://api.loreos.app";
const KEY = process.env.LOREOS_KEY!;

async function loreos(path: string, options: RequestInit = {}) {
  const response = await fetch(`${BASE}${path}`, {
    ...options,
    headers: {
      Authorization: `Bearer ${KEY}`,
      "Content-Type": "application/json",
      ...(options.headers || {}),
    },
  });
  const body = await response.json().catch(() => ({}));
  if (!response.ok) {
    const detail = body.detail || {};
    throw new Error(
      `${response.status} ${detail.code || "error"}: ${
        detail.fix || detail.message || JSON.stringify(body)
      }`,
    );
  }
  return body.data;
}

type Reply = { bubbles: string[]; text: string; cursor: number; runRef: string };

// Sync-feeling reply: send -> poll the event log until the character replies -> return.
export async function getReply(
  sessionId: string,
  text: string,
  opts: { idempotencyKey?: string; replyMode?: "fast" | "deep"; timeoutMs?: number } = {},
): Promise<Reply> {
  // 1. Send. Returns immediately with a cursor + run_ref (NOT the reply).
  const accepted = await loreos(`/v1/sessions/${encodeURIComponent(sessionId)}/messages`, {
    method: "POST",
    // Idempotency-Key makes a retried send safe (serverless/Vercel may retry).
    headers: opts.idempotencyKey ? { "Idempotency-Key": opts.idempotencyKey } : {},
    body: JSON.stringify({ text, ...(opts.replyMode ? { reply_mode: opts.replyMode } : {}) }),
  });
  const runRef: string = accepted.run_ref;
  let cursor: number = accepted.cursor;

  // 2. Poll the event log for THIS turn's reply. Poll from the user-message cursor.
  const deadline = Date.now() + (opts.timeoutMs ?? 45_000);
  while (Date.now() < deadline) {
    const page = await loreos(
      `/v1/sessions/${encodeURIComponent(sessionId)}/events?since=${cursor}`,
    );
    for (const ev of page.events || []) {
      cursor = ev.cursor; // advance so we never re-read an event
      const type = ev.type || ev.event_type;
      if (type === "message.created" && ev.role === "character") {
        const bubbles = ev.payload?.bubbles?.length
          ? ev.payload.bubbles
          : [ev.payload?.text ?? ""];
        return { bubbles, text: ev.payload?.text ?? bubbles.join("\n"), cursor, runRef };
      }
      if (type === "message.failed") {
        // role: "system"; payload { turn_index, reason, recoverable }
        throw new Error(`reply failed: ${ev.payload?.reason ?? "unknown"}`);
      }
    }
    await new Promise((r) => setTimeout(r, 1_000));
  }
  throw new Error("timed out waiting for the character reply");
}

Your call site barely changes:

// before:  const text = await chatCompletions(messages);
// after:
const reply = await getReply(sessionId, userText, { idempotencyKey: messageId });
render(reply.bubbles); // multi-bubble; getReply falls back to [text] when no bubbles are present

The session is created once per end-user (POST /v1/sessions with your external_user_ref) and reused for the conversation — it is the persistent thread, not a per-message construct. See Quickstart for character + session creation.
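The session is the persistent per-user thread, so it helps to resolve it through one cached lookup. That way, two concurrent first messages from the same user cannot create two sessions. The block below is a minimal sketch that assumes you supply `createSession`, a hypothetical function that wraps POST /v1/sessions with your external_user_ref and resolves to the new session id. Quickstart documents the actual request and response fields. The in-memory `Map` stands in for your own database.

```typescript
// Hypothetical: `createSession` wraps POST /v1/sessions (see Quickstart for the real
// request/response fields) and resolves to the new session's id.
type CreateSession = (externalUserRef: string) => Promise<string>;

export function makeSessionResolver(createSession: CreateSession) {
  // externalUserRef -> session id (or the in-flight creation promise).
  // In production, persist this mapping so it survives restarts and cold starts.
  const sessions = new Map<string, Promise<string>>();

  return function sessionFor(externalUserRef: string): Promise<string> {
    let pending = sessions.get(externalUserRef);
    if (!pending) {
      // Cache the promise itself so concurrent callers share a single creation call.
      pending = createSession(externalUserRef).catch((err) => {
        sessions.delete(externalUserRef); // don't cache failures; the next call retries
        throw err;
      });
      sessions.set(externalUserRef, pending);
    }
    return pending;
  };
}
```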

reply_mode: keep messenger-grade latency

POST /messages accepts an optional reply_mode:

fast (default) — use this for migrations. It is the same light path the managed Telegram channels run, tuned for messenger-grade latency. It skips only the advisory voice-quality critic (a final stylistic polish pass). Grounding, safety, knowledge, and the emission gates all stay on, so factual accuracy and safety are unchanged — you are not trading correctness for speed.
deep — opts into the full critic stack, which adds the quality critic and its one-shot voice rewrite for maximum voice polish. It is roughly 17s slower per turn.

If you omit reply_mode, you get fast. The deep world-model and relational update runs asynchronously after the reply either way — fast does not skip the character learning from the turn, it only skips the synchronous voice-polish pass. The response echoes the reply_mode it used, so you can confirm it.

// messenger app: default fast is correct — omit reply_mode.
await getReply(sessionId, text, { idempotencyKey: messageId });

// a "composed letter" surface where voice polish matters more than latency:
await getReply(sessionId, text, { idempotencyKey: messageId, replyMode: "deep" });

Retry safety with Idempotency-Key

Serverless platforms (Vercel, Lambda) and network proxies retry requests. Without protection, a retried send creates a duplicate turn and a duplicate reply. Send an Idempotency-Key header with a value that is stable per logical message (your own message id is ideal):

await loreos(`/v1/sessions/${sessionId}/messages`, {
  method: "POST",
  headers: { "Idempotency-Key": messageId },
  body: JSON.stringify({ text }),
});

A repeat call with the same key returns the original cursor and sent_turn_index plus idempotent_replay: true, instead of creating a second turn. Your adapter needs no special handling for this case. It polls from the same cursor and gets the same reply.
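Retries are only safe if every attempt reuses the same key. The block below is a minimal sketch of a retry wrapper. It assumes `send` is a function you provide, for example a call to the adapter's `loreos()` helper for POST /messages, and that it throws on network errors and 5xx responses. `sendWithRetry`, the attempt count, and the backoff values are hypothetical choices, not LoreOS defaults. The key is fixed outside the loop, so a retry after a lost response comes back as a replay of the original turn, not as a new one.

```typescript
// Hypothetical wrapper: `send` performs one POST /messages using the given Idempotency-Key.
type Send<T> = (idempotencyKey: string) => Promise<T>;

export async function sendWithRetry<T>(
  send: Send<T>,
  idempotencyKey: string, // stable per logical message, e.g. your own message id
  opts: { attempts?: number; baseDelayMs?: number; isRetryable?: (err: unknown) => boolean } = {},
): Promise<T> {
  const attempts = opts.attempts ?? 3;
  const baseDelayMs = opts.baseDelayMs ?? 250;
  const isRetryable = opts.isRetryable ?? (() => true);
  let lastError: unknown;
  for (let attempt = 0; attempt < attempts; attempt++) {
    try {
      // Same key on every attempt: a replay returns the original turn, not a new one.
      return await send(idempotencyKey);
    } catch (err) {
      lastError = err;
      if (!isRetryable(err) || attempt === attempts - 1) break;
      // Exponential backoff between attempts: base, 2x base, 4x base, ...
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
  throw lastError;
}
```

Pass an `isRetryable` predicate that returns false for errors such as 402 budget_exceeded. Those errors are signals to act on, not transient failures.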

Dedupe on the read side too: every event carries a monotonic cursor. Track the highest cursor you have processed and ignore anything at or below it, so a re-poll or an at-least-once webhook never double-renders a bubble.
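The read-side rule fits in a few lines. The block below is a minimal sketch that assumes every event carries a numeric `cursor`, as described above. `CursorTracker` is a hypothetical name. It applies the rule exactly as stated: anything at or below the high-water mark is skipped, including duplicates within one batch. Persist `highest` with the conversation if dedupe must survive restarts.

```typescript
// Hypothetical helper: tracks the highest processed cursor and drops replays.
type CursoredEvent = { cursor: number };

export class CursorTracker {
  highest: number;

  constructor(startCursor = 0) {
    this.highest = startCursor;
  }

  // Returns only events above the high-water mark, in cursor order, and advances
  // the mark as it goes. Re-polls and at-least-once webhook redeliveries become no-ops.
  accept<E extends CursoredEvent>(events: E[]): E[] {
    const fresh: E[] = [];
    for (const ev of [...events].sort((a, b) => a.cursor - b.cursor)) {
      if (ev.cursor <= this.highest) continue; // already processed, or a duplicate in this batch
      this.highest = ev.cursor;
      fresh.push(ev);
    }
    return fresh;
  }
}
```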

Typing UX off the run.status event

The instant LoreOS accepts your send, it emits a run.status event so your UI can show a typing indicator while the engine works:

{ "type": "run.status", "role": "character", "payload": { "status": "generating", "run_ref": "..." } }

Render “typing…” when you see run.status, and clear it on the next message.created (role character). This is an additive event type — if you only handle message.created, you simply won’t show typing; nothing breaks. The run_ref in the payload matches the run_ref returned by your send, so you can scope the indicator to the exact turn.

for (const ev of page.events || []) {
  const type = ev.type || ev.event_type;
  if (type === "run.status" && ev.payload?.status === "generating") setTyping(true);
  if (type === "message.created" && ev.role === "character") {
    setTyping(false);
    render(ev.payload.bubbles?.length ? ev.payload.bubbles : [ev.payload.text]);
  }
}
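To scope the indicator to your own turns, you can track typing state by run_ref. The block below is a minimal sketch that assumes the run.status and message.created shapes shown on this page. This page does not say whether message.created carries a run_ref. The sketch therefore assumes replies land in send order and clears the oldest generating turn on each character reply or message.failed. `TypingState` is a hypothetical name.

```typescript
// Hypothetical typing tracker. Event shapes follow this page:
//   run.status      -> payload { status: "generating", run_ref }
//   message.created -> role "character", payload { bubbles, text }
type LogEvent = { type?: string; event_type?: string; role?: string; payload?: any };

export class TypingState {
  sent = new Set<string>(); // run_refs returned by our own sends
  generating: string[] = []; // those currently generating, oldest first

  // Register the run_ref returned by each send.
  expect(runRef: string): void {
    this.sent.add(runRef);
  }

  get isTyping(): boolean {
    return this.generating.length > 0;
  }

  apply(ev: LogEvent): void {
    const type = ev.type || ev.event_type;
    const runRef: string | undefined = ev.payload?.run_ref;
    if (type === "run.status" && ev.payload?.status === "generating") {
      // Ignore runs we didn't send (or already finished), and don't double-count replays.
      if (runRef && this.sent.has(runRef) && !this.generating.includes(runRef)) {
        this.generating.push(runRef);
      }
    } else if ((type === "message.created" && ev.role === "character") || type === "message.failed") {
      // Assumes replies arrive in send order: finish the oldest generating turn.
      const done = this.generating.shift();
      if (done) this.sent.delete(done);
    }
  }
}
```

In the loop above, call `typing.apply(ev)` and then `setTyping(typing.isTyping)` for each event.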

Polling vs SSE vs webhook: which transport

All three transports project the same cursored event log, so you can mix them and resume across them with one cursor. Pick by surface:

| Transport | Call | Use when |
| --- | --- | --- |
| Polling | `GET /v1/sessions/{id}/events?since={cursor}` | The universal default. Simplest to operate, works on every runtime including short-lived serverless functions. Best for the sync-façade adapter above. |
| SSE stream | `GET /v1/sessions/{id}/events/stream?since={cursor}` | A long-lived client UI that wants push without managing webhooks. One connection is capped at 5 minutes — reconnect with the last cursor (polling + SSE share the cursor, so there are no gaps). |
| Webhook push | `POST /v1/sessions/{id}/channels` (register a URL + secret) | Server-to-server. You run a backend that can receive an inbound HTTPS callback and want events pushed instead of pulled. At-least-once — dedupe by `cursor`. |

Decision guide:

Migrating a request/response chat app? Start with polling inside the adapter. It is the least infrastructure and matches the synchronous call site you are replacing.
Building a live, always-open chat surface? Use SSE for lower latency, and reconnect-with-cursor on the 5-minute cap.
No user-facing client at all (a bot, a pipeline, a backend bridge)? Use a webhook so LoreOS pushes to your server; you don’t hold connections open.

When in doubt, polling is never wrong — the other two are optimizations over the same log.
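For the SSE transport, the reconnect-with-cursor rule could look like the block below. It is a minimal sketch under several assumptions:

- Each SSE `data:` field carries one JSON event with a numeric `cursor`, the same shape the polling endpoint returns.
- The stream is read with `fetch`, not the browser `EventSource` API, because `EventSource` cannot set an `Authorization` header.

Run it server-side so LOREOS_KEY stays off the browser. `parseSseFrames` and `followEvents` are hypothetical names.

```typescript
// Hypothetical SSE follower. Assumes each SSE `data:` field is one JSON event with a numeric `cursor`.
type LogEvent = { cursor: number; type?: string; event_type?: string; role?: string; payload?: any };

// Splits buffered text into complete SSE frames (blank-line separated). Returns the parsed
// events plus any trailing partial frame to prepend to the next chunk.
export function parseSseFrames(buffer: string): { events: LogEvent[]; rest: string } {
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = frames.pop() ?? ""; // an incomplete frame (or "") stays buffered
  const events: LogEvent[] = [];
  for (const frame of frames) {
    const data = frame
      .split("\n")
      .filter((line) => line.startsWith("data:")) // skips ":" comments and other fields
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    if (data) events.push(JSON.parse(data));
  }
  return { events, rest };
}

// Follows the stream and reconnects from the last cursor whenever the server closes it
// (for example at the 5-minute cap). Abort `signal` to stop; an in-flight read then
// rejects with an AbortError.
export async function followEvents(
  base: string,
  key: string,
  sessionId: string,
  since: number,
  onEvent: (ev: LogEvent) => void,
  signal: AbortSignal,
): Promise<void> {
  let cursor = since;
  while (!signal.aborted) {
    const url = `${base}/v1/sessions/${encodeURIComponent(sessionId)}/events/stream?since=${cursor}`;
    const res = await fetch(url, {
      headers: { Authorization: `Bearer ${key}`, Accept: "text/event-stream" },
      signal,
    });
    if (!res.ok || !res.body) throw new Error(`stream failed: ${res.status}`);
    const reader = res.body.getReader();
    const decoder = new TextDecoder();
    let buffer = "";
    for (;;) {
      const { value, done } = await reader.read();
      if (done) break; // server closed the connection: loop and reconnect from `cursor`
      const parsed = parseSseFrames(buffer + decoder.decode(value, { stream: true }));
      buffer = parsed.rest;
      for (const ev of parsed.events) {
        if (ev.cursor <= cursor) continue; // already seen (e.g. replayed after reconnect)
        cursor = ev.cursor;
        onEvent(ev);
      }
    }
  }
}
```

A production version would probably add a short backoff before reconnecting, so a stream that keeps closing immediately does not cause a tight reconnect loop.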

Graceful fallback during cutover

While you migrate, keep your old LLM provider wired as a fallback so a slow or unavailable LoreOS turn degrades instead of failing the user. Wrap the adapter with a timeout and fall back to your existing completions call:

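import { getReply } from "./loreos-adapter";
import { legacyChatCompletions } from "./legacy-provider"; // your existing provider (hypothetical path)

export async function replyWithFallback(sessionId: string, text: string, messageId: string) {
  try {
    const reply = await getReply(sessionId, text, {
      idempotencyKey: messageId,
      replyMode: "fast",
      timeoutMs: 12_000, // tight budget during cutover
    });
    return { source: "loreos", bubbles: reply.bubbles };
  } catch (err) {
    // 402 budget_exceeded is a budget signal, not an outage: surface it instead of falling back.
    // (loreos() puts the HTTP status at the start of the error message.)
    if (err instanceof Error && err.message.startsWith("402 ")) throw err;
    // Timeout, transient 5xx, or message.failed: serve the user with your old path.
    const fallbackText = await legacyChatCompletions(text);
    return { source: "fallback", bubbles: [fallbackText] };
  }
}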

Notes on doing this safely:

Always pass the Idempotency-Key so a fallback that races a slow-but-successful LoreOS turn does not create a duplicate when you retry later.
A 402 budget_exceeded is a budget signal, not an outage — it is returned before any model spend. Treat it distinctly (raise the cap or back off), not as a reason to fall back to your old provider.
Fallback replies don’t carry LoreOS memory or world-model continuity. Keep the fallback window short and prefer raising your timeout or using fast mode over leaning on it.

Vercel / Next.js: keep the key server-side

LOREOS_KEY can create state and trigger metered model work, so it must never reach the browser. On Next.js, put the adapter behind a Route Handler (app/api/.../route.ts) or a Server Action, set LOREOS_KEY as a server-only environment variable, and have the browser call your own endpoint — which proxies to LoreOS.

// app/api/chat/route.ts — server-only; LOREOS_KEY is never sent to the client.
import { getReply } from "../lib/loreos-adapter";

export async function POST(req: Request) {
  const { sessionId, text, messageId } = await req.json();
  const reply = await getReply(sessionId, text, { idempotencyKey: messageId });
  return Response.json({ bubbles: reply.bubbles, runRef: reply.runRef });
}

The browser does fetch("/api/chat", …); your route handler holds the key and talks to LoreOS. The same rule applies to the SSE and webhook transports — proxy them through your backend, or terminate them server-side.
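A proxy for the SSE transport can be as small as streaming the upstream body back to the browser. The block below is a minimal sketch for a Next.js Route Handler. It assumes a hypothetical path, app/api/events/route.ts, and hypothetical `sessionId` and `since` query parameters. It passes the event bytes through unchanged, so the browser sees the same cursored events. As with the chat route above, verify in your own auth layer that the session belongs to the signed-in user before proxying. The sketch omits that check.

```typescript
// app/api/events/route.ts (hypothetical path): server-only SSE proxy; LOREOS_KEY stays on the server.
const BASE = process.env.LOREOS_BASE || "https://api.loreos.app";

export async function GET(req: Request): Promise<Response> {
  const url = new URL(req.url);
  const sessionId = url.searchParams.get("sessionId");
  const since = url.searchParams.get("since") ?? "0"; // assumption: 0 means "from the start"
  if (!sessionId) return new Response("missing sessionId", { status: 400 });
  // Your auth goes here: confirm sessionId belongs to the signed-in user.

  const upstream = await fetch(
    `${BASE}/v1/sessions/${encodeURIComponent(sessionId)}/events/stream?since=${encodeURIComponent(since)}`,
    {
      headers: { Authorization: `Bearer ${process.env.LOREOS_KEY}`, Accept: "text/event-stream" },
      signal: req.signal, // a browser disconnect also cancels the upstream request
    },
  );
  if (!upstream.ok || !upstream.body) {
    return new Response(`upstream error ${upstream.status}`, { status: 502 });
  }
  // Stream the bytes straight through; the browser tracks its own last cursor.
  return new Response(upstream.body, {
    headers: { "Content-Type": "text/event-stream", "Cache-Control": "no-cache" },
  });
}
```

On the browser side, no key is needed, so a plain `EventSource` pointed at this route works. When a stream ends, open a new one with the latest cursor you processed.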

Migration checklist

Stop reading the reply from the POST /messages response; read data.cursor + data.run_ref, then read the reply from the event log.
Render payload.bubbles (multi-bubble), falling back to payload.text.
Create a session per end-user with your external_user_ref and reuse it.
Default to reply_mode: "fast"; reach for deep only where voice polish beats latency.
Send an Idempotency-Key (your message id) on every send.
Show typing off run.status; clear on message.created.
Dedupe reads/webhooks by the monotonic cursor.
Keep LOREOS_KEY server-side; proxy the browser through your backend.
(During cutover) wrap the adapter in a timeout + fallback to your old provider.

Once the adapter is working, see Staging keys and repeatable evaluation to set up a persistent dev key and reproducible eval runs before you cut production over.