REF · STACKBONE / WIKI · v0.9.4

client.rag

Parse, chunk, embed, store and retrieve documents on top of the agent's own Postgres — one pool, one migration set, one transaction boundary. The RAG schema lives in the same database as client.database and shares the same connection pool, so an ingest and a write to your own tables can happen inside one transaction.

Mental model

client.rag is a thin façade over four canonical tables — stackbone_rag_collections, stackbone_rag_documents, stackbone_rag_chunks and stackbone_rag_jobs — that the CLI installs into your agent's Postgres alongside the rest of your schema. Three properties follow from this and shape the rest of the page:

  • One database. The tables live in the agent's Postgres (STACKBONE_POSTGRES_URL). There is no separate "RAG store".
  • One pool. client.rag reuses the pool that client.database builds, so a single agent that touches both surfaces opens exactly one pool and can wrap RAG operations and creator-owned writes in the same client.database.transaction(...) callback.
  • One migration journal. The schema is governed by the same stackbone db migrate ... workflow you already use for your own tables — committed SQL files, advisory-lock-protected migrate up, no ad-hoc DDL on first call.

Embeddings are computed inside the agent process via the same OpenRouter client client.ai already uses. The default model is openai/text-embedding-3-small (1536 dims), frozen by ADR 2026-05-10-rag-consolidation-on-client-database.

Setup

1. Install the canonical schema

Run the installer once per agent. It is idempotent and safe to re-run.

stackbone db migrate add-rag

This drops a numbered SQL file under .stackbone/migrations/ (e.g. 0007_stackbone_rag_v1.sql) with the marker -- @stackbone:rag@v1 on its first line. Commit the file alongside your code — it is part of your agent's source.

If a RAG migration of the same major version already exists, the command is a no-op. Newer SDK versions install additive vN+1 deltas as separate migration files; breaking schema changes only land on a major SDK bump.

2. Apply pending migrations

stackbone db migrate up

This applies the RAG migration alongside any of your own pending migrations under an advisory lock. Running it again is a no-op.

3. Configure embeddings (optional)

Add a rag: block to agent.yaml to override the defaults:

rag:
  embeddingModel: openai/text-embedding-3-small # default
  autoMigrate: false # default

  • embeddingModel — any model id OpenRouter resolves for the /v1/embeddings endpoint. Changing the model on an existing collection requires re-ingestion (the dimension is locked at schema-install time).
  • autoMigrate — when true, stackbone dev installs and applies the RAG migration automatically the first time it detects client.rag in your bundle, without prompting. Default is false: dev prompts interactively, and stackbone deploy always refuses to mutate the repo on your behalf.

The block is optional. An agent.yaml without rag: ships with the defaults above.

Ingestion

Pure helpers — parse and chunk

client.rag.parse(...) and client.rag.chunk(...) are pure helpers. They never touch the database and never call the embedding provider, so you can use them offline (in a script, in a test) to iterate on chunking before committing to ingestion:

import { createClient } from '@stackbone/sdk';

const client = createClient();

const text = await client.rag.parse({ url: 'https://example.com/post' });
const chunks = client.rag.chunk(text, { maxTokens: 512, overlap: 64 });
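
A tight feedback loop for tuning maxTokens and overlap can be as small as the sketch below. It assumes chunk returns an array (the ingest examples below pass its result as one); check your SDK's types.

// Sketch: compare chunking parameters offline. No DB, no provider calls.
for (const overlap of [0, 32, 64, 128]) {
  const candidate = client.rag.chunk(text, { maxTokens: 512, overlap });
  console.log(`overlap=${overlap}: ${candidate.length} chunks`);
}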

Synchronous ingest — client.rag.ingest

For one-off ingestion (a CLI script, a small upload handler) call ingest directly. The promise resolves once every chunk has been embedded and persisted:

import { createClient } from '@stackbone/sdk';

const client = createClient();
const text = await client.rag.parse({ url: 'https://example.com/post' });

const result = await client.rag.ingest({
  id: 'post-123',
  collection: 'help-center',
  chunks: client.rag.chunk(text),
  model: 'openai/text-embedding-3-small',
  metadata: { source: 'help-center', url: 'https://example.com/post' },
});

if (result.error) {
  // SdkError — see "Errors" below.
  throw new Error(`${result.error.code}: ${result.error.message}`);
}

console.log(`Ingested ${result.data.chunks} chunks for ${result.data.id}.`);

Re-ingesting with the same id replaces all chunks for that document atomically — useful for periodic re-crawls.
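
That makes a periodic re-crawl a short loop. This sketch uses only the calls documented above:

// Sketch: periodic re-crawl. Re-using each page's id means every run
// atomically replaces that document's previous chunks.
async function recrawl(pages: { id: string; url: string }[]) {
  for (const page of pages) {
    const text = await client.rag.parse({ url: page.url });
    const result = await client.rag.ingest({
      id: page.id,
      collection: 'help-center',
      chunks: client.rag.chunk(text, { maxTokens: 512, overlap: 64 }),
      model: 'openai/text-embedding-3-small',
      metadata: { source: 'help-center', url: page.url },
    });
    if (result.error) throw new Error(`${result.error.code}: ${result.error.message}`);
  }
}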

You can pass onProgress to observe per-chunk progress without paying for the SQL job writer that ingestAsync uses:

await client.rag.ingest({
  id: 'post-123',
  collection: 'help-center',
  chunks,
  model: 'openai/text-embedding-3-small',
  onProgress: (event) => console.log(event.type, event),
});

Asynchronous ingest — client.rag.ingestAsync

For long-running ingestion (a webhook handler that must return in seconds, a backfill job that takes minutes) call ingestAsync. It allocates a stackbone_rag_jobs row synchronously, returns the job id immediately, and exposes both a streaming channel of progress events and the final Result:

const handle = await client.rag.ingestAsync({
  id: 'post-123',
  collection: 'help-center',
  chunks,
  model: 'openai/text-embedding-3-small',
});

if (handle.error) throw new Error(handle.error.code);

// Return early so the webhook responds in time.
respondToWebhook({ jobId: handle.data.jobId });

// Drain progress events into your own observability — run-loop, logs, SSE.
for await (const event of handle.data.events) {
  switch (event.type) {
    case 'started':
      log.info('rag.ingest.started', event);
      break;
    case 'progress':
      log.info('rag.ingest.progress', event);
      break;
    case 'completed':
      log.info('rag.ingest.completed', event);
      break;
    case 'failed':
      log.error('rag.ingest.failed', event);
      break;
  }
}

stackbone_rag_jobs is the observability surface. Studio's RAG explorer reads it; the local emulator's GET /api/rag/jobs returns it; cancelling a job from Studio (or via POST /api/rag/jobs/:jobId/cancel) flips the row's status. The worker observes the change between chunk batches and exits with rag_ingest_cancelled.
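
Cancelling from your own tooling is a single POST against that surface. In this sketch the base URL is an assumption; use whatever stackbone dev actually prints on boot:

// Sketch: cancel an in-flight ingest via the emulator's REST surface.
const BASE = 'http://localhost:4000'; // assumption, not a documented default

const res = await fetch(`${BASE}/api/rag/jobs/${handle.data.jobId}/cancel`, { method: 'POST' });
if (!res.ok) throw new Error(`cancel failed: ${res.status}`);
// The worker notices between chunk batches; the job ends with rag_ingest_cancelled.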

Retrieval

client.rag.retrieve is the only read API. Pass a text query to let the SDK embed it for you:

const result = await client.rag.retrieve({
  text: 'how do I reset my password?',
  model: 'openai/text-embedding-3-small',
  topK: 5,
});

if (result.error) throw new Error(result.error.code);

for (const hit of result.data) {
  console.log(hit.score.toFixed(3), hit.id, hit.content);
}

topK defaults to 5. Hits are returned ordered by descending cosine similarity (score is in [0, 1], 1 = identical).
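
Hits carry enough to assemble a prompt context directly. A minimal formatter over the fields shown above (score, id, content):

// Sketch: fold retrieved hits into a context block for a generation call.
const context = result.data
  .map((hit, i) => `[${i + 1}] ${hit.id} (score ${hit.score.toFixed(3)})\n${hit.content}`)
  .join('\n\n');

// Hand `context` to your generation step, e.g. via client.ai.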

Filtering on metadata

Pass filter to scope the search to documents whose metadata matches a JSON sub-object (matched server-side via jsonb @> $1):

const result = await client.rag.retrieve({
  text: 'pricing tiers',
  model: 'openai/text-embedding-3-small',
  filter: { source: 'help-center', locale: 'en' },
  topK: 10,
});

V1 supports exact-match equality on simple paths: { locale: 'en' } matches a document whose metadata contains exactly that key/value pair (say, { locale: 'en', source: 'blog' }) but not one with { locale: 'en-GB' }. Range queries and jsonpath operators are out of scope for now.

Precomputed embeddings

If you already have an embedding (e.g. from your own embedder or a cached vector), pass it directly:

await client.rag.retrieve({
  embedding: cachedQueryVector, // number[]
  topK: 5,
});

The pipeline skips the auto-embed step entirely.
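
Where cachedQueryVector comes from is up to you; this page exposes no standalone embed call. One option is to call the embeddings endpoint yourself. The sketch below assumes the usual OpenAI-compatible request/response shape and an OPENROUTER_API_KEY env var, neither of which this page guarantees:

// Sketch: compute a query vector directly. Assumes an OpenAI-compatible
// /v1/embeddings contract; verify against your provider before relying on it.
const res = await fetch('https://openrouter.ai/api/v1/embeddings', {
  method: 'POST',
  headers: {
    Authorization: `Bearer ${process.env.OPENROUTER_API_KEY}`, // key env var assumed
    'Content-Type': 'application/json',
  },
  body: JSON.stringify({
    model: 'openai/text-embedding-3-small',
    input: 'how do I reset my password?',
  }),
});
const { data } = await res.json();
const cachedQueryVector: number[] = data[0].embedding;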

Deletion

Two atomic deletion APIs:

// Delete one or many documents by id (re-ingest replaces; this removes).
await client.rag.delete(['post-123', 'post-124']);

// Delete every chunk whose metadata matches a sub-object.
await client.rag.deleteWhere({ source: 'help-center' });

Both run inside a single SQL statement and cascade through to chunks via the foreign key. Pass { collection } to scope either call.

Cross-table transactions

Because RAG and your own tables share the pool, you can ingest a document and update your own documents table in the same transaction. If anything throws, both roll back together:

import { createClient } from '@stackbone/sdk';
import { eq } from '@stackbone/sdk/db';
import { documents } from './schema';

const client = createClient();

await client.database.transaction(async (tx) => {
  // Your own row — uses the tx-scoped Drizzle handle.
  await tx.update(documents).set({ ragIndexedAt: new Date() }).where(eq(documents.id, 'post-123'));

  // RAG ingest — runs against the same connection.
  const result = await client.rag.ingest({
    id: 'post-123',
    collection: 'help-center',
    chunks: client.rag.chunk(parsedText),
    model: 'openai/text-embedding-3-small',
  });
  if (result.error) throw new Error(result.error.code);
});

This is the property the consolidation buys you: before feature 30 the two surfaces opened separate pools, and this transaction was impossible.

Inspecting RAG data

Three places see the same rows:

  • stackbone db studio — the local schema explorer lists the four stackbone_rag_* tables alongside your own. Read-only on chunks is recommended; a stale embedding is worse than a missing one.
  • The local emulator — stackbone dev exposes GET /api/rag/collections, POST /api/rag/collections/:name/query, GET /api/rag/jobs?status=... and POST /api/rag/jobs/:jobId/cancel. Studio's RAG explorer is the primary consumer.
  • Studio's RAG explorer — gated on STACKBONE_CONTRACT_VERSION ≥ 9. If the agent's SDK is older, Studio hides the explorer entirely instead of showing a half-broken view.

Custom pgvector columns

If you need a vector column on your own table — e.g. an embedding field on documents you maintain by hand — vector is re-exported from @stackbone/sdk/db:

import { pgTable, text, vector } from '@stackbone/sdk/db';

export const documents = pgTable('documents', {
  id: text('id').primaryKey(),
  content: text('content').notNull(),
  embedding: vector('embedding', { dimensions: 1536 }).notNull(),
});

The first migration that introduces a vector(...) column emits a CREATE EXTENSION IF NOT EXISTS vector automatically; the RAG installer also ensures the extension exists, so the order in which you add custom and platform vector columns does not matter.
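
Querying such a column is plain SQL. The sketch below runs inside a client.database.transaction callback and uses pgvector's cosine-distance operator <=>; it assumes the Drizzle sql tag is re-exported from @stackbone/sdk/db the way vector is (not confirmed by this page) and that queryVector is a number[] you already hold:

import { sql } from '@stackbone/sdk/db'; // assumption: re-exported like `vector`

// Sketch: nearest neighbours by cosine distance. `1 - distance` recovers
// a similarity score comparable to client.rag.retrieve's.
const hits = await tx.execute(sql`
  SELECT id, content, 1 - (embedding <=> ${JSON.stringify(queryVector)}::vector) AS score
  FROM documents
  ORDER BY embedding <=> ${JSON.stringify(queryVector)}::vector
  LIMIT 5
`);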

Errors

Every client.rag.* call returns Result<T>. Inspect error.code to branch:

  • rag_schema_missing — the RAG tables do not exist in this database (the installer was never run, or migrate up is pending). Run stackbone db migrate add-rag and stackbone db migrate up, or set rag.autoMigrate: true in agent.yaml.
  • rag_embedding_model_unsupported — OpenRouter resolved the model but it does not expose the /v1/embeddings endpoint. Pick an embedding-capable model (e.g. openai/text-embedding-3-small).
  • rag_embedding_failed — the embedding provider rejected the request (auth, rate limit, transient). The message carries the provider's error; retry policy is up to the caller.
  • rag_invalid_request — a required field is missing or malformed (e.g. empty id, no chunks, topK ≤ 0). The message names the offending field.
  • rag_ingest_cancelled — the job's row was flipped to cancelled while the worker was running. Expected when a caller cancels via Studio or the REST surface.
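
A typical caller branches on a few of these. In this sketch retryLater is a hypothetical helper; retry policy is explicitly yours:

const result = await client.rag.ingest({ id, collection, chunks, model });

if (result.error) {
  switch (result.error.code) {
    case 'rag_schema_missing':
      // A deployment problem: fail loudly rather than retry.
      throw new Error('run: stackbone db migrate add-rag && stackbone db migrate up');
    case 'rag_embedding_failed':
      await retryLater(); // hypothetical helper; back-off is the caller's job
      break;
    case 'rag_ingest_cancelled':
      break; // expected after a Studio or REST cancel
    default:
      throw new Error(`${result.error.code}: ${result.error.message}`);
  }
}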

The contract gate adds contract_version_unsupported, capability_unavailable, contract_unreachable and contract_malformed. Every gated client.rag method requires the rag.basic capability — see overview.

Pure helpers (parse, chunk) never hit the database or the embedding provider and are intentionally exempt from the gate.

Where to go next