# client.rag

Parse, chunk, embed, store and retrieve documents on top of the agent's own Postgres — one pool, one migration set, one transaction boundary. The RAG schema lives in the same database as `client.database` and shares the same connection pool, so an ingest and a write to your own tables can happen inside one transaction.
## Mental model

`client.rag` is a thin façade over four canonical tables — `stackbone_rag_collections`, `stackbone_rag_documents`, `stackbone_rag_chunks` and `stackbone_rag_jobs` — that the CLI installs into your agent's Postgres alongside the rest of your schema. Three properties follow from this and shape the rest of the page:
- One database. The tables live in the agent's Postgres (`STACKBONE_POSTGRES_URL`). There is no separate "RAG store".
- One pool. `client.rag` reuses the pool that `client.database` builds, so a single agent that touches both surfaces opens exactly one pool and can wrap RAG operations and creator-owned writes in the same `client.database.transaction(...)` callback.
- One migration journal. The schema is governed by the same `stackbone db migrate ...` workflow you already use for your own tables — committed SQL files, advisory-lock-protected `migrate up`, no ad-hoc DDL on first call.
Embeddings are computed inside the agent process via the same OpenRouter client `client.ai` already uses. The default model is `openai/text-embedding-3-small` (1536 dims), frozen by ADR 2026-05-10-rag-consolidation-on-client-database.
## Setup

### 1. Install the canonical schema
Run the installer once per agent. It is idempotent and safe to re-run.
```bash
stackbone db migrate add-rag
```

This drops a numbered SQL file under `.stackbone/migrations/` (e.g. `0007_stackbone_rag_v1.sql`) with the marker `-- @stackbone:rag@v1` on its first line. Commit the file alongside your code — it is part of your agent's source.
If a RAG migration of the same major version already exists, the
command is a no-op. Newer SDK versions install additive vN+1 deltas
as separate migration files; breaking schema changes only land on a
major SDK bump.
### 2. Apply pending migrations

```bash
stackbone db migrate up
```

This applies the RAG migration alongside any of your own pending migrations under an advisory lock. Running it again is a no-op.
### 3. Configure embeddings (optional)

Add a `rag:` block to `agent.yaml` to override the defaults:
```yaml
rag:
  embeddingModel: openai/text-embedding-3-small # default
  autoMigrate: false # default
```

- `embeddingModel` — any model id OpenRouter resolves for the `/v1/embeddings` endpoint. Changing the model on an existing collection requires re-ingestion: the dimension is locked at schema-install time.
- `autoMigrate` — when `true`, `stackbone dev` installs and applies the RAG migration automatically the first time it detects `client.rag` in your bundle, without prompting. Default is `false`: `dev` prompts interactively, and `stackbone deploy` always refuses to mutate the repo on your behalf.
The block is optional. An `agent.yaml` without `rag:` ships with the defaults above.
## Ingestion

### Pure helpers — parse and chunk
`client.rag.parse(...)` and `client.rag.chunk(...)` are pure helpers. They never touch the database and never call the embedding provider, so you can use them offline (in a script, in a test) to iterate on chunking before committing to ingestion:
```ts
import { createClient } from '@stackbone/sdk';

const client = createClient();
const text = await client.rag.parse({ url: 'https://example.com/post' });
const chunks = client.rag.chunk(text, { maxTokens: 512, overlap: 64 });
```
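Because the helpers are pure, a quick offline script can compare chunking strategies before anything is embedded or persisted. A minimal sketch — it assumes `chunk` returns an array, which the examples on this page imply by passing its result straight to `ingest`:

```ts
import { createClient } from '@stackbone/sdk';

const client = createClient();
const text = await client.rag.parse({ url: 'https://example.com/post' });

// Compare chunk counts across a few window sizes before picking one.
for (const maxTokens of [256, 512, 1024]) {
  const chunks = client.rag.chunk(text, { maxTokens, overlap: 64 });
  console.log(`maxTokens=${maxTokens} -> ${chunks.length} chunks`);
}
```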
### Synchronous ingest — `client.rag.ingest`

For one-off ingestion (a CLI script, a small upload handler) call `ingest` directly. The promise resolves once every chunk has been embedded and persisted:
```ts
import { createClient } from '@stackbone/sdk';

const client = createClient();
const text = await client.rag.parse({ url: 'https://example.com/post' });
const result = await client.rag.ingest({
  id: 'post-123',
  collection: 'help-center',
  chunks: client.rag.chunk(text),
  model: 'openai/text-embedding-3-small',
  metadata: { source: 'help-center', url: 'https://example.com/post' },
});
if (result.error) {
  // SdkError — see "Errors" below.
  throw new Error(`${result.error.code}: ${result.error.message}`);
}
console.log(`Ingested ${result.data.chunks} chunks for ${result.data.id}.`);
```

Re-ingesting with the same `id` replaces all chunks for that document atomically — useful for periodic re-crawls, as the sketch below shows.
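That makes a periodic re-crawl a plain loop. A sketch, assuming you maintain the list of stable ids and URLs yourself (the `pages` array below is hypothetical):

```ts
// Hypothetical crawl list — stable ids are what turn re-ingestion into a replace.
const pages = [
  { id: 'post-123', url: 'https://example.com/post' },
  { id: 'post-124', url: 'https://example.com/other-post' },
];

for (const page of pages) {
  const text = await client.rag.parse({ url: page.url });
  const result = await client.rag.ingest({
    id: page.id, // same id as the last crawl => chunks replaced atomically
    collection: 'help-center',
    chunks: client.rag.chunk(text),
    model: 'openai/text-embedding-3-small',
    metadata: { source: 'help-center', url: page.url },
  });
  if (result.error) console.error(page.id, result.error.code);
}
```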
You can pass `onProgress` to observe per-chunk progress without paying for the SQL job writer:
```ts
await client.rag.ingest({
  id: 'post-123',
  collection: 'help-center',
  chunks,
  model: 'openai/text-embedding-3-small',
  onProgress: (event) => console.log(event.type, event),
});
```

### Asynchronous ingest — `client.rag.ingestAsync`
For long-running ingestion (a webhook handler that must return in seconds, a backfill job that takes minutes) call `ingestAsync`. It allocates a `stackbone_rag_jobs` row synchronously, returns the job id immediately, and exposes both a streaming channel of progress events and the final `Result`:
```ts
const handle = await client.rag.ingestAsync({
  id: 'post-123',
  collection: 'help-center',
  chunks,
  model: 'openai/text-embedding-3-small',
});
if (handle.error) throw new Error(handle.error.code);

// Return early so the webhook responds in time.
respondToWebhook({ jobId: handle.data.jobId });

// Drain progress events into your own observability — run-loop, logs, SSE.
for await (const event of handle.data.events) {
  switch (event.type) {
    case 'started':
      log.info('rag.ingest.started', event);
      break;
    case 'progress':
      log.info('rag.ingest.progress', event);
      break;
    case 'completed':
      log.info('rag.ingest.completed', event);
      break;
    case 'failed':
      log.error('rag.ingest.failed', event);
      break;
  }
}
```

`stackbone_rag_jobs` is the observability surface. Studio's RAG explorer reads it; the local emulator's `GET /api/rag/jobs` returns it; cancelling a job from Studio (or via `POST /api/rag/jobs/:jobId/cancel`) flips the row's status, which the worker observes between chunk batches and exits with `rag_ingest_cancelled`.
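The same REST surface is easy to script against outside Studio. A sketch that lists in-flight jobs and cancels them — the base URL is whatever `stackbone dev` prints, and the `status=running` filter value plus the response shape are assumptions, not documented contract:

```ts
const base = 'http://localhost:3000'; // placeholder — use the URL stackbone dev prints

// GET /api/rag/jobs?status=... — filter value and response shape assumed here.
const jobs: Array<{ jobId: string }> = await fetch(
  `${base}/api/rag/jobs?status=running`,
).then((r) => r.json());

// POST /api/rag/jobs/:jobId/cancel — the worker observes the flipped status
// between chunk batches and exits with rag_ingest_cancelled.
for (const job of jobs) {
  await fetch(`${base}/api/rag/jobs/${job.jobId}/cancel`, { method: 'POST' });
}
```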
## Retrieval

`client.rag.retrieve` is the only read API. Pass a text query to let the SDK embed it for you:
```ts
const result = await client.rag.retrieve({
  text: 'how do I reset my password?',
  model: 'openai/text-embedding-3-small',
  topK: 5,
});
if (result.error) throw new Error(result.error.code);
for (const hit of result.data) {
  console.log(hit.score.toFixed(3), hit.id, hit.content);
}
```

`topK` defaults to 5. Hits are returned ordered by descending
cosine similarity (score is in [0, 1], 1 = identical).
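A common consumption pattern is to drop distant hits and assemble the rest into prompt context. A sketch — the 0.75 cut-off is an arbitrary starting point to tune against your own corpus, not an SDK default:

```ts
const result = await client.rag.retrieve({
  text: 'how do I reset my password?',
  model: 'openai/text-embedding-3-small',
  topK: 8,
});
if (result.error) throw new Error(result.error.code);

// Keep only reasonably close hits, then join them into a context block.
const context = result.data
  .filter((hit) => hit.score >= 0.75)
  .map((hit) => `[${hit.id}] ${hit.content}`)
  .join('\n---\n');
```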
### Filtering on metadata

Pass `filter` to scope the search to documents whose metadata matches a JSON sub-object (matched server-side via `jsonb @> $1`):
```ts
const result = await client.rag.retrieve({
  text: 'pricing tiers',
  model: 'openai/text-embedding-3-small',
  filter: { source: 'help-center', locale: 'en' },
  topK: 10,
});
```

V1 supports exact-match equality on simple paths. Range queries and jsonpath operators are out of scope for now.
### Precomputed embeddings

If you already have an embedding (e.g. from your own embedder or a cached vector), pass it directly:
```ts
await client.rag.retrieve({
  embedding: cachedQueryVector, // number[]
  topK: 5,
});
```

The pipeline skips the auto-embed step entirely.
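A sketch of the caching pattern — `embedQuery` below stands in for your own embedder and is not part of the SDK; it just has to return a `number[]` with the dimension your schema was installed with (1536 by default):

```ts
const queryCache = new Map<string, number[]>();

async function cachedRetrieve(query: string) {
  let vector = queryCache.get(query);
  if (!vector) {
    vector = await embedQuery(query); // hypothetical helper — your own embedder
    queryCache.set(query, vector);
  }
  // Precomputed embedding => the SDK never calls the embedding provider.
  return client.rag.retrieve({ embedding: vector, topK: 5 });
}
```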
## Deletion

Two atomic deletion APIs:
```ts
// Delete one or many documents by id (re-ingest replaces; this removes).
await client.rag.delete(['post-123', 'post-124']);

// Delete every chunk whose metadata matches a sub-object.
await client.rag.deleteWhere({ source: 'help-center' });
```

Both run inside a single SQL statement and cascade through to chunks via the foreign key. Pass `{ collection }` to scope either call.
## Cross-table transactions

Because RAG and your own tables share the pool, you can ingest a document and update your own `documents` table in the same transaction. If anything throws, both roll back together:
```ts
import { createClient } from '@stackbone/sdk';
import { eq } from '@stackbone/sdk/db';
import { documents } from './schema';

const client = createClient();

await client.database.transaction(async (tx) => {
  // Your own row — uses the tx-scoped Drizzle handle.
  await tx
    .update(documents)
    .set({ ragIndexedAt: new Date() })
    .where(eq(documents.id, 'post-123'));

  // RAG ingest — runs against the same connection.
  const result = await client.rag.ingest({
    id: 'post-123',
    collection: 'help-center',
    chunks: client.rag.chunk(parsedText),
    model: 'openai/text-embedding-3-small',
  });
  if (result.error) throw new Error(result.error.code);
});
```

This is the property the consolidation buys you: before feature 30, the two surfaces opened separate pools and this transaction was impossible.
## Inspecting RAG data

Three places see the same rows:

- `stackbone db studio` — the local schema explorer lists the four `stackbone_rag_*` tables alongside your own. Read-only on `chunks` is recommended; a stale `embedding` is worse than a missing one.
- The local emulator — `stackbone dev` exposes `GET /api/rag/collections`, `POST /api/rag/collections/:name/query`, `GET /api/rag/jobs?status=...` and `POST /api/rag/jobs/:jobId/cancel`. Studio's RAG explorer is the primary consumer.
- Studio's RAG explorer — gated on `STACKBONE_CONTRACT_VERSION ≥ 9`. If the agent's SDK is older, Studio hides the explorer entirely instead of showing a half-broken view.
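The query endpoint makes the emulator convenient for smoke-testing retrieval without running agent code. A sketch — the request body shape is assumed to mirror `client.rag.retrieve`, which this page does not guarantee:

```ts
const base = 'http://localhost:3000'; // placeholder — use the URL stackbone dev prints

// POST /api/rag/collections/:name/query — body shape assumed, not documented.
const hits = await fetch(`${base}/api/rag/collections/help-center/query`, {
  method: 'POST',
  headers: { 'content-type': 'application/json' },
  body: JSON.stringify({ text: 'pricing tiers', topK: 5 }),
}).then((r) => r.json());
```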
## Custom pgvector columns

If you need a vector column on your own table — e.g. an `embedding` field on documents you maintain by hand — `vector` is re-exported from `@stackbone/sdk/db`:
```ts
import { pgTable, text, vector } from '@stackbone/sdk/db';

export const documents = pgTable('documents', {
  id: text('id').primaryKey(),
  content: text('content').notNull(),
  embedding: vector('embedding', { dimensions: 1536 }).notNull(),
});
```

The first migration that introduces a `vector(...)` column emits a `CREATE EXTENSION IF NOT EXISTS vector` automatically; the RAG installer also ensures the extension exists, so the order in which you add custom and platform vector columns does not matter.
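With the column in place you can query it through raw SQL fragments using pgvector's operators. A sketch, reusing `cachedQueryVector` from the retrieval example above — it assumes `sql` is re-exported from `@stackbone/sdk/db` alongside `eq` and `vector`, and that `client.database` exposes the same Drizzle query builder as the `tx` handle above; neither is confirmed on this page. `<=>` is pgvector's cosine-distance operator:

```ts
import { sql } from '@stackbone/sdk/db'; // assumed re-export
import { documents } from './schema';

// pgvector accepts a '[0.1,0.2,...]' literal, which JSON.stringify produces.
const literal = JSON.stringify(cachedQueryVector);

// Order by ascending cosine distance; 1 - distance gives a retrieve()-style score.
const rows = await client.database
  .select({
    id: documents.id,
    content: documents.content,
    score: sql<number>`1 - (${documents.embedding} <=> ${literal}::vector)`,
  })
  .from(documents)
  .orderBy(sql`${documents.embedding} <=> ${literal}::vector`)
  .limit(5);
```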
## Errors

Every `client.rag.*` call returns `Result<T>`. Inspect `error.code` to branch:

| Code | When | Hint |
|---|---|---|
| `rag_schema_missing` | RAG tables do not exist in this database (the installer was never run, or `migrate up` is pending). | Run `stackbone db migrate add-rag` and `stackbone db migrate up`, or set `rag.autoMigrate: true` in `agent.yaml`. |
| `rag_embedding_model_unsupported` | OpenRouter resolved the model but it does not expose the `/v1/embeddings` endpoint. | Pick an embedding-capable model (e.g. `openai/text-embedding-3-small`). |
| `rag_embedding_failed` | The embedding provider rejected the request (auth, rate limit, transient). | The message carries the provider's error; retry policy is up to the caller. |
| `rag_invalid_request` | Required field missing or malformed (e.g. empty `id`, no chunks, `topK ≤ 0`). | The message names the offending field. |
| `rag_ingest_cancelled` | The job's row was flipped to `cancelled` while the worker was running. | Expected when a caller cancels via Studio or the REST surface. |
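A sketch of branching on these codes — the single fixed-delay retry is illustrative only, since retry policy is explicitly left to the caller:

```ts
async function ingestOnce() {
  return client.rag.ingest({
    id: 'post-123',
    collection: 'help-center',
    chunks,
    model: 'openai/text-embedding-3-small',
  });
}

let result = await ingestOnce();
if (result.error?.code === 'rag_embedding_failed') {
  // Transient provider failure — retry once after a fixed delay (illustrative).
  await new Promise((resolve) => setTimeout(resolve, 1_000));
  result = await ingestOnce();
}
if (result.error) {
  switch (result.error.code) {
    case 'rag_schema_missing':
      throw new Error('Run stackbone db migrate add-rag, then stackbone db migrate up.');
    case 'rag_ingest_cancelled':
      break; // expected after a Studio / REST cancel
    default:
      throw new Error(`${result.error.code}: ${result.error.message}`);
  }
}
```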
The contract gate adds `contract_version_unsupported`, `capability_unavailable`, `contract_unreachable` and `contract_malformed`. Every gated `client.rag` method requires the `rag.basic` capability — see the overview. Pure helpers (`parse`, `chunk`) never hit the database or the embedding provider and are intentionally exempt from the gate.
Where to go next
client.database— the same pool RAG runs on, for your own tables and migrations.@stackbone/sdkoverview — the rest of the modules: storage, AI, approval, secrets, config.- Agent.yaml reference — the full
manifest schema, including the
rag:block.