Replacing your existing chat/completions backend

Move an app from a synchronous /chat/completions call to LoreOS’s async event model — without a UX downgrade.

If your app already calls a POST /chat/completions-style endpoint and renders the returned text, this page is your migration guide. The shape changes in exactly one place: the reply is no longer in the HTTP response. Everything else (your UI, your end-user auth, your message history) stays yours.

This guide gives you a drop-in adapter that hides the change behind a sync-feeling function, plus the production patterns (retry safety, typing UX, polling-vs-SSE, and a graceful fallback to your old provider during cutover).

The one mental shift

A classic completions call is synchronous: you send messages, the HTTP response is the reply.

POST /chat/completions { messages: [...] } -> 200 { choices: [{ message }] }
(reply is in this response)

LoreOS is asynchronous: POST /v1/sessions/{id}/messages returns immediately with a cursor and a run_ref — an acknowledgement, not the reply. The character’s reply arrives a few seconds later as a message.created event on the session’s event log, which you read by polling, SSE, or webhook.

POST /v1/sessions/{id}/messages { text }
-> 200 { accepted: true, cursor, run_ref } (acknowledgement)
...a few seconds later, on the event log...
-> { type: "message.created", role: "character", payload: { bubbles, text } }

Why it works this way. A LoreOS reply is not a single model call. The character runs a multi-step engine per turn — retrieval over its evolving world model, relational and emotional state, grounding and safety/accuracy gates, voice shaping, then multi-bubble emission and delivery. That work takes seconds and can be recovered if your process dies mid-turn (a durability ledger re-runs it). Returning a cursor immediately lets you ack the send instantly, show a typing indicator, and read a richer multi-bubble reply when it lands — instead of holding an HTTP request open through the whole engine.

Two consequences to internalize before you write code:

  • The reply is multi-bubble. payload.bubbles is an ordered list (a messenger-style burst); payload.text is the joined convenience form. Render bubbles when present.
  • Don’t read top-level response fields. Every response is wrapped as { schema_version, data, next_actions }. Read response.data.cursor, not response.cursor. Errors are { detail: { code, message, fix } }.
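Both points can be captured in a few types and two small pure helpers. This is a minimal sketch: the field names come from this page, but the type names (Envelope, ErrorEnvelope, MessagePayload) and helper names (unwrap, bubblesOf) are illustrative, and the types of schema_version and next_actions are left as unknown because this page does not specify them.

```typescript
// Illustrative types; only the field names are taken from this guide.
type Envelope<T> = { schema_version?: unknown; data: T; next_actions?: unknown };
type ErrorEnvelope = { detail?: { code?: string; message?: string; fix?: string } };
type MessagePayload = { bubbles?: string[]; text?: string };

// Return `data` from a success envelope, or throw using the { detail } error envelope.
function unwrap<T>(ok: boolean, status: number, body: unknown): T {
  if (!ok) {
    const detail = (body as ErrorEnvelope | null)?.detail ?? {};
    throw new Error(
      `${status} ${detail.code ?? "error"}: ${detail.fix ?? detail.message ?? "request failed"}`,
    );
  }
  return (body as Envelope<T>).data;
}

// Prefer the ordered bubbles; fall back to the joined text as a single bubble.
function bubblesOf(payload: MessagePayload | undefined): string[] {
  if (payload?.bubbles?.length) return payload.bubbles;
  return payload?.text ? [payload.text] : [];
}
```

Call unwrap with response.ok, response.status, and the parsed JSON body; call bubblesOf with the payload of a message.created event.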

A reference adapter (sync façade over the async events)

The cleanest migration keeps your call sites looking synchronous. Write one server-side function — getReply(sessionId, text) — that sends the message, waits for the message.created event, and returns the bubbles. Your existing code calls it the same way it called chat/completions; the async model is hidden inside.

This matches the examples/loreos-node-chat style: a small loreos() helper that unwraps data and throws on the { detail } error envelope, plus a cursor-poll loop.

// loreos-adapter.ts: keep this on the server; never ship LOREOS_KEY to the browser.
const BASE = process.env.LOREOS_BASE || "https://api.loreos.app";
const KEY = process.env.LOREOS_KEY!;

async function loreos(path: string, options: RequestInit = {}) {
  const response = await fetch(`${BASE}${path}`, {
    ...options,
    headers: {
      Authorization: `Bearer ${KEY}`,
      "Content-Type": "application/json",
      ...(options.headers || {}),
    },
  });
  const body = await response.json().catch(() => ({}));
  if (!response.ok) {
    const detail = body.detail || {};
    throw new Error(
      `${response.status} ${detail.code || "error"}: ${
        detail.fix || detail.message || JSON.stringify(body)
      }`,
    );
  }
  return body.data;
}

type Reply = { bubbles: string[]; text: string; cursor: number; runRef: string };

// Sync-feeling reply: send -> poll the event log until the character replies -> return.
export async function getReply(
  sessionId: string,
  text: string,
  opts: { idempotencyKey?: string; replyMode?: "fast" | "deep"; timeoutMs?: number } = {},
): Promise<Reply> {
  // 1. Send. Returns immediately with a cursor + run_ref (NOT the reply).
  const accepted = await loreos(`/v1/sessions/${encodeURIComponent(sessionId)}/messages`, {
    method: "POST",
    // Idempotency-Key makes a retried send safe (serverless/Vercel may retry).
    headers: opts.idempotencyKey ? { "Idempotency-Key": opts.idempotencyKey } : {},
    body: JSON.stringify({ text, ...(opts.replyMode ? { reply_mode: opts.replyMode } : {}) }),
  });
  const runRef: string = accepted.run_ref;
  let cursor: number = accepted.cursor;

  // 2. Poll the event log for THIS turn's reply. Poll from the user-message cursor.
  const deadline = Date.now() + (opts.timeoutMs ?? 45_000);
  while (Date.now() < deadline) {
    const page = await loreos(
      `/v1/sessions/${encodeURIComponent(sessionId)}/events?since=${cursor}`,
    );
    for (const ev of page.events || []) {
      cursor = ev.cursor; // advance so we never re-read an event
      const type = ev.event_type || ev.type;
      if (type === "message.created" && ev.role === "character") {
        const bubbles = ev.payload?.bubbles?.length
          ? ev.payload.bubbles
          : [ev.payload?.text ?? ""];
        return { bubbles, text: ev.payload?.text ?? bubbles.join("\n"), cursor, runRef };
      }
      if (type === "message.failed") {
        // role: "system"; payload { turn_index, reason, recoverable }
        throw new Error(`reply failed: ${ev.payload?.reason ?? "unknown"}`);
      }
    }
    await new Promise((r) => setTimeout(r, 1_000));
  }
  throw new Error("timed out waiting for the character reply");
}

Your call site barely changes:

// before: const text = await chatCompletions(messages);
// after:
const reply = await getReply(sessionId, userText, { idempotencyKey: messageId });
render(reply.bubbles); // multi-bubble; falls back to [text] when there is only one bubble

The session is created once per end-user (POST /v1/sessions with your external_user_ref) and reused for the conversation — it is the persistent thread, not a per-message construct. See Quickstart for character + session creation.
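One way to keep "one session per end-user" true under concurrent requests is to memoize session creation by external_user_ref. The sketch below assumes a hypothetical create callback that wraps POST /v1/sessions and resolves to the session id; the response shape of that endpoint is not shown on this page, so extracting the id is left to the caller. makeSessionStore and sessionFor are illustrative names, not part of any LoreOS SDK.

```typescript
// Hypothetical callback: wraps POST /v1/sessions and resolves to the session id.
type CreateSession = (externalUserRef: string) => Promise<string>;

// Returns a lookup function that creates each user's session at most once per process.
function makeSessionStore(create: CreateSession) {
  const byUser = new Map<string, Promise<string>>();

  return function sessionFor(externalUserRef: string): Promise<string> {
    let pending = byUser.get(externalUserRef);
    if (!pending) {
      pending = create(externalUserRef);
      // If creation fails, forget the entry so the next call can try again.
      pending.catch(() => byUser.delete(externalUserRef));
      byUser.set(externalUserRef, pending);
    }
    return pending;
  };
}
```

The map lives only as long as the process. On serverless runtimes you would more likely persist the user-to-session mapping in your own database and use a store like this as a per-instance cache.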

reply_mode: keep messenger-grade latency

POST /messages accepts an optional reply_mode:

  • fast (default) — use this for migrations. It is the same light path the managed Telegram channels run, tuned for messenger-grade latency. It skips only the advisory voice-quality critic (a final stylistic polish pass). Grounding, safety, knowledge, and the emission gates all stay on, so factual accuracy and safety are unchanged — you are not trading correctness for speed.
  • deep — opts into the full critic stack, which adds the quality critic and its one-shot voice rewrite for maximum voice polish. It is roughly 17s slower per turn.

If you omit reply_mode, you get fast. The deep world-model and relational update runs asynchronously after the reply either way: fast does not skip the character learning from the turn; it only skips the synchronous voice-polish pass. The response echoes the reply_mode it used, so you can confirm it.

// messenger app: default fast is correct, so omit reply_mode.
await getReply(sessionId, text, { idempotencyKey: messageId });

// a "composed letter" surface where voice polish matters more than latency:
await getReply(sessionId, text, { idempotencyKey: messageId, replyMode: "deep" });

Retry safety with Idempotency-Key

Serverless platforms (Vercel, Lambda) and network proxies retry requests. Without protection, a retried send creates a duplicate turn and a duplicate reply. Send an Idempotency-Key header with a value that is stable per logical message (your own message id is ideal):

await loreos(`/v1/sessions/${sessionId}/messages`, {
  method: "POST",
  headers: { "Idempotency-Key": messageId },
  body: JSON.stringify({ text }),
});

A same-key re-call returns the original cursor and sent_turn_index plus idempotent_replay: true, instead of creating a second turn. Your adapter can treat a replay exactly like a first send: it polls from the same cursor and gets the same reply.

Dedupe on the read side too: every event carries a monotonic cursor. Track the highest cursor you have processed and ignore anything at or below it, so a re-poll or an at-least-once webhook never double-renders a bubble.
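A read-side dedupe along these lines can be a small high-water-mark tracker. This is a sketch under the assumption that cursors are numbers that only increase within a session, as described above; CursorGate and admit are illustrative names.

```typescript
type LoggedEvent = { cursor: number; type?: string; event_type?: string };

// Tracks the highest cursor processed for one session.
class CursorGate {
  private highest: number;

  // Start from a known cursor, for example the one returned by your send.
  constructor(start: number) {
    this.highest = start;
  }

  // Return only unseen events, in cursor order, and advance the high-water mark.
  admit<E extends LoggedEvent>(events: E[]): E[] {
    const sorted = [...events].sort((a, b) => a.cursor - b.cursor);
    const fresh: E[] = [];
    for (const ev of sorted) {
      if (ev.cursor > this.highest) {
        fresh.push(ev);
        this.highest = ev.cursor; // also drops duplicates inside one batch
      }
    }
    return fresh;
  }

  // Cursor to pass as `since` on the next poll or reconnect.
  get lastCursor(): number {
    return this.highest;
  }
}
```

Run every poll page and every webhook delivery through the same gate, and render only what admit returns.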

Typing UX off the run.status event

The instant LoreOS accepts your send, it emits a run.status event so your UI can show a typing indicator while the engine works:

{ "type": "run.status", "role": "character", "payload": { "status": "generating", "run_ref": "..." } }

Render “typing…” when you see run.status, and clear it on the next message.created (role character). This is an additive event type — if you only handle message.created, you simply won’t show typing; nothing breaks. The run_ref in the payload matches the run_ref returned by your send, so you can scope the indicator to the exact turn.

for (const ev of page.events || []) {
  const type = ev.event_type || ev.type;
  if (type === "run.status" && ev.payload?.status === "generating") setTyping(true);
  if (type === "message.created" && ev.role === "character") {
    setTyping(false);
    render(ev.payload.bubbles?.length ? ev.payload.bubbles : [ev.payload.text]);
  }
}
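To scope the indicator to the exact turn, the same logic can be written as a pure function that compares the payload's run_ref with the one returned by your send. This is a sketch: nextTyping is an illustrative name, and clearing the indicator on message.failed is a design choice made here, not something this page prescribes.

```typescript
type SessionEvent = {
  type?: string;
  event_type?: string;
  role?: string;
  payload?: { status?: string; run_ref?: string };
};

// Given the current typing flag, one event, and the run_ref returned by the send,
// return the new typing flag.
function nextTyping(typing: boolean, ev: SessionEvent, runRef: string): boolean {
  const type = ev.event_type || ev.type;
  if (type === "run.status" && ev.payload?.status === "generating") {
    // Only start the indicator for the turn we are waiting on.
    return ev.payload?.run_ref === runRef ? true : typing;
  }
  // A character reply ends the wait; so does a failed turn (a choice made in this sketch).
  if (type === "message.created" && ev.role === "character") return false;
  if (type === "message.failed") return false;
  return typing;
}
```

In a UI this slots into a state update, for example setTyping((t) => nextTyping(t, ev, runRef)).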

Polling vs SSE vs webhook: which transport

All three transports project the same cursored event log, so you can mix them and resume across them with one cursor. Pick by surface:

| Transport | Call | Use when |
| --- | --- | --- |
| Polling | GET /v1/sessions/{id}/events?since={cursor} | The universal default. Simplest to operate, works on every runtime including short-lived serverless functions. Best for the sync-façade adapter above. |
| SSE stream | GET /v1/sessions/{id}/events/stream?since={cursor} | A long-lived client UI that wants push without managing webhooks. One connection is capped at 5 minutes; reconnect with the last cursor (polling and SSE share the cursor, so there are no gaps). |
| Webhook push | POST /v1/sessions/{id}/channels (register a URL + secret) | Server-to-server. You run a backend that can receive an inbound HTTPS callback and want events pushed instead of pulled. Delivery is at-least-once; dedupe by cursor. |

Decision guide:

  • Migrating a request/response chat app? Start with polling inside the adapter. It is the least infrastructure and matches the synchronous call site you are replacing.
  • Building a live, always-open chat surface? Use SSE for lower latency, and reconnect-with-cursor on the 5-minute cap.
  • No user-facing client at all (a bot, a pipeline, a backend bridge)? Use a webhook so LoreOS pushes to your server; you don’t hold connections open.

When in doubt, polling is never wrong — the other two are optimizations over the same log.
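For the SSE option, resuming after the 5-minute cap comes down to remembering the last cursor and reconnecting with it. The sketch below shows only the two pure pieces of that loop. It assumes the stream uses the standard text/event-stream framing and that each frame's data field is a JSON event carrying a cursor; this page does not specify the wire format, so treat both as assumptions. parseSseBuffer and streamPath are illustrative names.

```typescript
type StreamEvent = { cursor: number; type?: string; event_type?: string };

// Split buffered text into complete SSE frames (separated by a blank line).
// Returns the parsed JSON events plus any trailing partial frame to keep buffering.
function parseSseBuffer(buffer: string): { events: StreamEvent[]; rest: string } {
  const frames = buffer.replace(/\r\n/g, "\n").split("\n\n");
  const rest = frames.pop() ?? ""; // incomplete until its blank line arrives
  const events: StreamEvent[] = [];
  for (const frame of frames) {
    const data = frame
      .split("\n")
      .filter((line) => line.startsWith("data:"))
      .map((line) => line.slice(5).replace(/^ /, ""))
      .join("\n");
    if (!data) continue; // comment or heartbeat frame
    try {
      events.push(JSON.parse(data) as StreamEvent);
    } catch {
      // Non-JSON data is skipped in this sketch.
    }
  }
  return { events, rest };
}

// Path to reconnect on, resuming after the last cursor seen.
function streamPath(sessionId: string, lastCursor: number): string {
  return `/v1/sessions/${encodeURIComponent(sessionId)}/events/stream?since=${lastCursor}`;
}
```

A reader loop would append each network chunk to rest, call parseSseBuffer, track the highest cursor it sees, and, when the connection closes, reopen streamPath(sessionId, lastCursor) from your server-side proxy.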

Graceful fallback during cutover

While you migrate, keep your old LLM provider wired as a fallback so a slow or unavailable LoreOS turn degrades instead of failing the user. Wrap the adapter with a timeout and fall back to your existing completions call:

export async function replyWithFallback(sessionId: string, text: string, messageId: string) {
  try {
    const reply = await getReply(sessionId, text, {
      idempotencyKey: messageId,
      replyMode: "fast",
      timeoutMs: 12_000, // tight budget during cutover
    });
    return { source: "loreos", bubbles: reply.bubbles };
  } catch (err) {
    // A 402 is a budget signal, not an outage: surface it instead of falling back.
    // (The adapter's loreos() helper prefixes error messages with the HTTP status.)
    if (err instanceof Error && err.message.startsWith("402 ")) throw err;
    // Timeout, transient 5xx, or message.failed: serve the user with your old path.
    const legacyText = await legacyChatCompletions(text); // your existing provider
    return { source: "fallback", bubbles: [legacyText] };
  }
}

Notes on doing this safely:

  • Always pass the Idempotency-Key so a fallback that races a slow-but-successful LoreOS turn does not create a duplicate when you retry later.
  • A 402 budget_exceeded is a budget signal, not an outage — it is returned before any model spend. Treat it distinctly (raise the cap or back off), not as a reason to fall back to your old provider.
  • Fallback replies don’t carry LoreOS memory or world-model continuity. Keep the fallback window short and prefer raising your timeout or using fast mode over leaning on it.

Vercel / Next.js: keep the key server-side

LOREOS_KEY can create state and trigger metered model work, so it must never reach the browser. On Next.js, put the adapter behind a Route Handler (app/api/.../route.ts) or a Server Action, set LOREOS_KEY as a server-only environment variable, and have the browser call your own endpoint — which proxies to LoreOS.

// app/api/chat/route.ts: server-only; LOREOS_KEY is never sent to the client.
import { getReply } from "../lib/loreos-adapter";

export async function POST(req: Request) {
  const { sessionId, text, messageId } = await req.json();
  const reply = await getReply(sessionId, text, { idempotencyKey: messageId });
  return Response.json({ bubbles: reply.bubbles, runRef: reply.runRef });
}

The browser does fetch("/api/chat", …); your route handler holds the key and talks to LoreOS. The same rule applies to the SSE and webhook transports — proxy them through your backend, or terminate them server-side.

Migration checklist

  • Stop reading the reply from the POST /messages response; read data.cursor + data.run_ref, then read the reply from the event log.
  • Render payload.bubbles (multi-bubble), falling back to payload.text.
  • Create a session per end-user with your external_user_ref and reuse it.
  • Default to reply_mode: "fast"; reach for deep only where voice polish beats latency.
  • Send an Idempotency-Key (your message id) on every send.
  • Show typing off run.status; clear on message.created.
  • Dedupe reads/webhooks by the monotonic cursor.
  • Keep LOREOS_KEY server-side; proxy the browser through your backend.
  • (During cutover) wrap the adapter in a timeout + fallback to your old provider.

Once you are migrated, see Staging keys and repeatable evaluation to set up a persistent dev key and reproducible eval runs before you cut production over.