Hands-on Build Guide

Build an AI Customer Assistant from Zero to Production


Skip the SaaS subscription and build your own AI customer assistant in 4-6 weeks with Claude Code. From architecture decisions to deploy and monitoring — 9 end-to-end stages. Your data, your pricing, your brand voice.

80%+ of routine queries automated
24/7 always-on availability
10-15x concurrent conversation capacity
<2 s average response time


Quick definition

What is an AI customer assistant, and why build your own?

A modern AI customer assistant talks to customers in natural language, triggers real actions (order lookup, refund) via tool calls, answers from your own documents, and hands off to humans when needed. What sets it apart from a rule-based chatbot is the combination of a generative LLM (Claude, GPT, Gemini), a RAG knowledge base, tool calling and memory.

SaaS options get you started quickly, but they come with a monthly subscription, per-message fees, customer data held by a third party and limited customisation. Follow this guide and you'll have the same architecture on your own server, at your own price, in your brand voice: a scalable, testable assistant in production within 4-6 weeks.

Comparison

Legacy call centre vs AI customer assistant

| Feature | Legacy / Rule-based bot | AI Customer Assistant |
|---|---|---|
| Hours | 9:00-18:00 on weekdays; 24/7 coverage is expensive | 24/7, no breaks, no holidays |
| Response speed | Queue-dependent, minutes | Instant, seconds |
| Concurrency | 1 agent = 1 customer | One system = thousands of customers |
| Personalisation | Agent-dependent, limited | Full personalisation via CRM + tool calling |
| Learning | Human training, slow | Improves continuously through prompt + RAG iteration on logged conversations |
| Monthly cost (small business) | $1,500-3,000 (staff + infra) | $80-300 (API + server) + one-off build |
| New channels | Separate team per channel | One backend → WhatsApp + Web + Telegram + Email |

Architecture

The 8 building blocks of the assistant

Whatever stack you pick, a sustainable customer assistant has these 8 layers. The Claude Code guide below builds all of them in order.

Channel (WhatsApp/Web) → Pre-process → AI Agent → RAG Retrieve → Tools → Memory → Handoff? → Respond

Channel Layer

Where the customer comes from: WhatsApp Business API, web widget, Telegram bot, Instagram DM or your existing support chat. Each channel has its own webhook, but all of them feed the same backend and the same agent.

WhatsApp Cloud API · Web Widget · Telegram · Instagram Graph API

Routing & Pre-processing

Normalise the incoming message (audio → Whisper transcribe, image → vision recognition), detect language, merge message bursts (3 rapid messages → one context).

n8n · Whisper · GPT-4o Vision

AI Agent (Brain)

An LLM (Claude Sonnet/Haiku, GPT-4o, Gemini) with the system prompt, tool list and memory attached. It interprets the message and decides which tool, if any, to call.

Claude · GPT · Gemini

RAG Knowledge Base

Product catalog, returns policy, FAQ and user manuals, stored as embeddings in a vector DB. The agent retrieves the relevant passages, together with their sources, on demand.

Supabase pgvector · Pinecone · Qdrant

Tools (Capabilities)

The functions the agent can call to do real work: order_lookup, ticket_create, refund_initiate, agent_handoff, appointment_book.

REST API · Postgres · Slack

Memory & Session

Conversation history per customer (Postgres/Redis), user profile (pulled from CRM), daily token budget.

Postgres · Redis · HubSpot

Handoff & Human Approval

Refunds, complaints, anything requiring corporate commitment — gated by human approval. Slack notification with approve buttons; the bot waits for the agent.

Slack · n8n Wait node

Monitoring & Audit

Every conversation logged to Postgres (success, latency, tokens, cost). Weekly 'I don't know' report fuels the improvement loop.

Postgres · Grafana · Slack

Deep dive

Build with Claude Code — 9 stages to production

The guide below takes you from the moment you start Claude Code with the `claude` command in the terminal to a production-ready assistant. Each stage gives you the goal, the steps to take, a ready-to-use Claude Code prompt, the critical pitfalls and, where it helps, a workflow diagram. If you get stuck, simply ask Claude Code to "show me the plan again."

Stage 0 — Prerequisites — accounts and keys

Goal: Get every account and key ready before you open a Claude Code session.

Steps

Node.js 20+ and npm installed; terminal access.
Claude Code CLI installed (`npm install -g @anthropic-ai/claude-code`) — the `claude` command works.
Anthropic Console API key (for Claude Sonnet/Haiku).
OpenAI API key (for text-embedding-3-small — cheap embeddings).
Vector DB chosen: Supabase (free tier to start) or Pinecone serverless.
WhatsApp Business: a verified phone number in Meta Business Manager + Cloud API access token.
Postgres database (Supabase or self-host) — for chat memory + audit log.
An empty git repository (e.g. customer-assistant) — Claude Code works here.

Pitfall

Never paste API keys into terminal commands or hard-code them in source files. Store them in .env and add .env to .gitignore; keys pushed to a public GitHub repo are typically found and abused within minutes.

Stage 1 — Have Claude Code scaffold the project

Goal: Open a Claude Code session and let it scaffold the project structure, folder layout and base dependencies.

Steps

In the terminal, cd into the project: `cd customer-assistant`
Start Claude Code with the `claude` command.
Paste the bootstrap prompt below — Claude Code will plan first, then generate code on your approval.
Run `git add . && git commit` after each major step; you can roll back if something goes wrong.
Copy the generated .env.example to .env and fill in your keys.
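A small startup check catches a missing key before the first customer message does. The sketch below assumes the variable names listed for .env.example in the bootstrap prompt and that a loader such as dotenv has already populated process.env; `loadConfig` is a hypothetical helper, not something Claude Code is guaranteed to generate.

```typescript
// Hypothetical config loader: fails fast at startup when a required key is missing.
// Assumes .env has already been loaded into process.env (e.g. by dotenv).
const REQUIRED_KEYS = [
  "ANTHROPIC_API_KEY",
  "OPENAI_API_KEY",
  "SUPABASE_URL",
  "SUPABASE_KEY",
  "WHATSAPP_TOKEN",
  "DATABASE_URL",
  "ENCRYPTION_KEY",
] as const;

type RequiredKey = (typeof REQUIRED_KEYS)[number];
export type AppConfig = Record<RequiredKey, string>;

export function loadConfig(env: Record<string, string | undefined> = process.env): AppConfig {
  const missing = REQUIRED_KEYS.filter((key) => (env[key] ?? "").trim() === "");
  if (missing.length > 0) {
    // Report key names only, never values, so secrets stay out of logs.
    throw new Error(`Missing required environment variables: ${missing.join(", ")}`);
  }
  const config = {} as AppConfig;
  for (const key of REQUIRED_KEYS) {
    config[key] = (env[key] ?? "").trim();
  }
  return config;
}
```

Calling it once at the top of the server entry point means a misconfigured deploy fails immediately instead of on the first WhatsApp message.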

Give to Claude Code

Prompt

You are a senior engineer/architect setting up an AI customer assistant.

Context:
- Customer base: [company + sector] — e.g. "Pamuk Textiles, B2C online retail"
- Channel priorities: 1) WhatsApp Cloud API 2) Web widget 3) Telegram later
- LLM: Anthropic Claude Sonnet (reasoning), Haiku (classification)
- Vector DB: Supabase pgvector (to start)
- Database: Postgres (Supabase)
- Deploy target: n8n + Docker Compose (our own server)

Your task:
1. Sketch a high-level architecture (ASCII diagram + 5-7 sentence description).
2. Propose a folder structure: /src, /agents, /tools, /channels, /rag, /db, /tests, /infrastructure — and briefly explain the split.
3. Recommend package.json and core dependencies (Anthropic SDK, OpenAI SDK, supabase-js, express or fastify, dotenv, zod, pino).
4. Generate a .env.example (ANTHROPIC_API_KEY, OPENAI_API_KEY, SUPABASE_URL, SUPABASE_KEY, WHATSAPP_TOKEN, DATABASE_URL, ENCRYPTION_KEY).
5. Suggest TypeScript + ESLint + Prettier setup.
6. Build a minimal HTTP server with a /healthz endpoint.
7. Write a 'how to get started' README.

Show the plan in one message first. After my approval, generate the code step by step; for each file, name the file you're creating.

Pitfall

Don't ask Claude Code to 'do everything at once.' Move stage by stage: first architecture + skeleton, then a separate prompt per stage. Otherwise 30+ files get generated together and you can't debug them.

Stage 2 — RAG knowledge base — giving the agent your data

Goal: Set up an 'ingestion' workflow that splits and embeds your company docs (FAQ, catalog, returns policy, manuals) into Supabase.

Steps

Drop the docs into /data as PDF, Markdown or JSON (e.g. returns-policy.md, product-catalog.json, faq.md).
Give Claude Code the prompt below — it will produce the ingestion script + Postgres migration.
Run `npm run ingest` to load documents into the vector DB.
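Test retrieval in the vector DB with questions like 'how many days for a return?'. As a rough starting point, treat cosine similarity ≥ 0.75 as a useful match, then calibrate the threshold on your own documents.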
Re-run ingestion when docs change — content hashing only re-uploads what changed.
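The content-hash dedup in the last step can be sketched as follows. This is a simplified illustration, not LangChain's splitter: it cuts fixed-size character windows with a fixed overlap (an assumption; token-based splitting needs a tokenizer), hashes each chunk with SHA-256 from node:crypto, and keeps only chunks whose (source, chunk_index, content_hash) key is not already stored. `chunkText` and `selectNewChunks` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

export interface Chunk {
  source: string;
  chunkIndex: number;
  content: string;
  contentHash: string;
}

// Fixed-size character windows; each window starts `size - overlap` chars after the previous one.
export function chunkText(source: string, text: string, size = 1000, overlap = 150): Chunk[] {
  if (overlap >= size) throw new Error("overlap must be smaller than size");
  const chunks: Chunk[] = [];
  const step = size - overlap;
  for (let start = 0, i = 0; start < text.length; start += step, i++) {
    const content = text.slice(start, start + size);
    const contentHash = createHash("sha256").update(content).digest("hex");
    chunks.push({ source, chunkIndex: i, content, contentHash });
    if (start + size >= text.length) break; // this window already reaches the end of the text
  }
  return chunks;
}

// Keep only chunks whose (source, chunk_index, content_hash) is not already stored,
// so unchanged chunks are never sent to the embeddings API again.
export function selectNewChunks(chunks: Chunk[], existingKeys: Set<string>): Chunk[] {
  return chunks.filter((c) => !existingKeys.has(`${c.source}|${c.chunkIndex}|${c.contentHash}`));
}
```

Running the hash check before embedding, rather than after, is what makes re-ingestion of unchanged docs cost nothing.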

Workflow view

/data PDF/MD → Text Splitter → Dedup (hash) → OpenAI Embeddings → Supabase pgvector INSERT

Give to Claude Code

Prompt

The /data folder contains company knowledge documents (PDF, MD, JSON).

Task: produce /src/rag/ingest.ts.

Requirements:
1. Read the documents (pdf-parse, fs.readFile).
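2. Text splitter: chunk_size=1000, chunk_overlap=150 (LangChain RecursiveCharacterTextSplitter; it measures length in characters by default, so supply a token-based length function if you want the sizes in tokens).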
3. Embed each chunk with OpenAI text-embedding-3-small.
4. Write to a Supabase 'documents' table: { id, source, chunk_index, content, embedding (vector(1536)), content_hash, updated_at }.
5. Before embedding, skip any chunk whose (source, chunk_index, content_hash) already exists (dedup), so unchanged chunks cost nothing.
6. Batch insert: 100 chunks per INSERT.
7. Print a summary: chunks loaded + total tokens + estimated cost.
8. CLI: 'npm run ingest -- --source returns-policy.md' for a single file, no arg for all.

Also generate the Supabase migration: documents table + IVFFLAT index (lists=100) + permission_level field (public/internal/admin).

Show the plan first.
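To make the 'cosine similarity ≥ 0.75' check from the steps concrete, here is an in-memory version of what the vector DB does for a query: score every stored embedding against the query embedding and keep the top_k above a threshold. It is only an illustration (pgvector does this in SQL, approximately when the IVFFLAT index is used); `topKBySimilarity` is a hypothetical name and the defaults mirror the numbers used in this guide.

```typescript
export interface StoredChunk {
  id: string;
  content: string;
  embedding: number[];
}

export function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

// Exact brute-force search; an IVFFLAT index trades a little recall for speed.
export function topKBySimilarity(query: number[], chunks: StoredChunk[], k = 5, minScore = 0.75) {
  return chunks
    .map((chunk) => ({ chunk, score: cosineSimilarity(query, chunk.embedding) }))
    .filter((hit) => hit.score >= minScore)
    .sort((x, y) => y.score - x.score)
    .slice(0, k);
}
```

In SQL, note that pgvector's `<=>` operator returns cosine distance, i.e. 1 - similarity, so a similarity threshold of 0.75 corresponds to a distance of at most 0.25.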

Pitfall

Switching the embedding model later (text-embedding-3-small → large) forces you to re-embed every chunk. Start with 'small' and upgrade only if you really need to.

Stage 3 — AI Agent + System Prompt — the assistant's identity

Goal: Write the main agent function that talks to the customer and define its limits in your brand voice.

Steps

The system prompt has 4 core blocks (Role, Scope, Format, Limits), followed by the retrieved context.
Use the prompt below in Claude Code to generate /src/agents/customerAgent.ts.
Validate with test messages ('how many days for return?', 'where is my order?', 'can you discount?').
Tighten the system prompt when you see unexpected behaviour ('never promise a price', 'say I don't know when not in sources').

Workflow view

Webhook → Memory (last 10) → Vector Retrieve (top_k=5) → Claude Sonnet → Postgres Log → Respond

Give to Claude Code

Prompt

Produce /src/agents/customerAgent.ts.

The function: async function customerAgent({ userMessage, userId, channel }): Promise<AgentResponse>

Inside:
1. Pull the last 10 messages from Postgres (memory).
2. Retrieve top_k=5 chunks from the vector store for userMessage.
3. Call Anthropic Messages API: model='claude-sonnet-4-5', max_tokens=1024.
4. Build the system prompt from this template (4 core blocks plus the retrieved context):

  [ROLE]
  You are the customer assistant for [COMPANY_NAME]. Brand voice: professional, warm, concise, actionable.

  [SCOPE]
  Reply ONLY on these topics:
  - Order status and shipping
  - Returns and exchanges
  - Product catalog and pricing
  - Manuals and FAQ
  - Appointments / reservations
  For anything outside, redirect politely ("Our [human team] can help with that").

  [FORMAT]
  - Short paragraphs, bullets where useful.
  - End every reply with a "next step" suggestion.
  - Reply in the language the customer wrote in; warm but professional.

  [LIMITS]
  - If the answer isn't in [CONTEXT], say "I don't have that in my docs. Shall I connect you to our team?"
  - For refund decisions, discounts, gift wrapping → call the agent_handoff tool.
  - Never ask for personal data (national ID, card number).
  - Don't hallucinate; ask or say you don't know.

  [CONTEXT]
  ${retrievedChunks.join("\n---\n")}

5. Insert the user and assistant rows into the Postgres 'messages' table, then return the reply.
6. On error, log via pino and reply "one moment, please try again."

Zod schema for AgentResponse: { reply, sources[], should_handoff, confidence }.

Plan first, then code.
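One way to assemble the template in code is sketched below. `buildSystemPrompt` and its parameters are hypothetical, the wording is abbreviated from the template above, and tagging each chunk with its source is an assumption that makes it easier for the model to fill `sources[]` in its structured reply.

```typescript
export interface RetrievedChunk {
  source: string;
  content: string;
}

// Hypothetical builder for the ROLE / SCOPE / FORMAT / LIMITS template plus retrieved context.
export function buildSystemPrompt(companyName: string, scope: string[], chunks: RetrievedChunk[]): string {
  const context =
    chunks.length > 0
      ? chunks.map((c) => `[source: ${c.source}]\n${c.content}`).join("\n---\n")
      : "(no matching documents)";
  return [
    "[ROLE]",
    `You are the customer assistant for ${companyName}. Brand voice: professional, warm, concise, actionable.`,
    "",
    "[SCOPE]",
    "Reply ONLY on these topics:",
    ...scope.map((topic) => `- ${topic}`),
    "For anything outside, redirect politely to the human team.",
    "",
    "[FORMAT]",
    "- Short paragraphs, bullets where useful.",
    "- End every reply with a next-step suggestion.",
    "",
    "[LIMITS]",
    "- If the answer isn't in [CONTEXT], say you don't have it and offer to connect a human.",
    "- For refund decisions and discounts, call the agent_handoff tool.",
    "- Never ask for national ID or card numbers.",
    "",
    "[CONTEXT]",
    context,
  ].join("\n");
}
```

The returned string goes into the top-level `system` parameter of the Messages API call; keeping the builder a pure function makes it easy to snapshot-test in Stage 7.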

Pitfall

Keep the system prompt 'strict.' Instead of vague 'be helpful' instructions, explicitly state what the agent does AND doesn't do. Vague prompt = hallucination + customer complaints.

Stage 4 — Tools — letting the agent actually do work

Goal: Define real-work tools: order_lookup, ticket_create, refund_initiate, agent_handoff.

Steps

Define each tool with 'name + description + JSON schema.'
Keep tool implementations in separate files (/src/tools/*.ts).
Use the prompt below to have Claude Code generate the 4 core tools.
Test: ask the agent 'where is my order 1234?'; order_lookup must be called.
When a tool fails, return a structured error to the agent so it stays honest with the customer.
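The 'name + description + JSON schema' definition and the structured-error rule fit together as below. The definition uses the Anthropic tool-use fields (`name`, `description`, `input_schema`), and the dispatcher turns a `tool_use` content block into a `tool_result` block instead of throwing. The in-memory `ORDERS` map and `runTool` are hypothetical stand-ins for your real DB code.

```typescript
// Shape of one entry in the `tools` array of a Messages API call.
export interface ToolDefinition {
  name: string;
  description: string;
  input_schema: { type: "object"; properties: Record<string, unknown>; required: string[] };
}

export interface ToolUseBlock { type: "tool_use"; id: string; name: string; input: Record<string, unknown>; }
export interface ToolResultBlock { type: "tool_result"; tool_use_id: string; content: string; is_error?: boolean; }

export const orderLookupTool: ToolDefinition = {
  name: "order_lookup",
  description:
    "Look up an order's status and tracking number. Call when the customer gives an order ID. " +
    "Do NOT call for general shipping-time questions; answer those from the docs.",
  input_schema: {
    type: "object",
    properties: { order_id: { type: "string", description: "Order ID, e.g. 1234" } },
    required: ["order_id"],
  },
};

// Hypothetical in-memory stand-in for the orders table.
const ORDERS = new Map<string, { status: string; tracking_no: string | null }>([
  ["1234", { status: "shipped", tracking_no: "TRK-001" }],
]);

export function runTool(block: ToolUseBlock): ToolResultBlock {
  try {
    if (block.name !== "order_lookup") throw new Error(`unknown_tool: ${block.name}`);
    const orderId = String(block.input.order_id ?? "");
    const order = ORDERS.get(orderId);
    // "not_found" is a normal, structured answer the agent can relay honestly.
    const payload = order ? { order_id: orderId, ...order } : { error: "not_found", order_id: orderId };
    return { type: "tool_result", tool_use_id: block.id, content: JSON.stringify(payload) };
  } catch (err) {
    // Execution failures are flagged with is_error so the agent does not invent a result.
    return { type: "tool_result", tool_use_id: block.id, content: JSON.stringify({ error: (err as Error).message }), is_error: true };
  }
}
```

The `tool_result` block is sent back in the next user message, which is what lets the agent write its final answer from real data.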

Give to Claude Code

Prompt

Generate 4 tool files in /src/tools/. All in Anthropic Tool Use format.

1. order_lookup.ts
   - input: { order_id: string }
   - Fetch the order from the DB and return its status (placed/shipped/delivered/cancelled + tracking_no).
   - On not found, return { error: "not_found" }.

2. ticket_create.ts
   - input: { topic: enum, summary: string, priority: "low"|"medium"|"high" }
   - Insert into the 'tickets' table, post a Slack message (#support channel).
   - Return id and eta.

3. refund_initiate.ts
   - input: { order_id, reason: string }
   - Check the 14-day return window in the DB.
   - If in window: status='pending_approval', post a Slack 'Approve/Reject' button message.
   - If past window: return an error to the agent ('14-day return window expired — please consult the human team').

4. agent_handoff.ts
   - input: { reason: string, conversation_summary: string }
   - Post a Slack notification to the live agent with the summary and conversation id.
   - Mark 'paused: true' — the bot won't reply; the human takes over.

Collect Anthropic Messages tool definitions (name, description, input_schema) in /src/agents/toolRegistry.ts. In each customerAgent turn, if a tool_use block appears, call the right tool, feed the result back to the agent, and let it produce the final answer (multi-turn).

Plan first.
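The 14-day window check in refund_initiate is easy to get subtly wrong at the boundary. A minimal sketch, assuming the window counts from the delivery timestamp and that exactly 14 × 24 hours after delivery is still inside it (confirm both against your actual returns policy); `isWithinReturnWindow` and `refundDecision` are hypothetical names.

```typescript
const DAY_MS = 24 * 60 * 60 * 1000;

// True while `now` is no more than `windowDays` full days after delivery.
export function isWithinReturnWindow(deliveredAt: Date, now: Date, windowDays = 14): boolean {
  return now.getTime() - deliveredAt.getTime() <= windowDays * DAY_MS;
}

export function refundDecision(deliveredAt: Date, now: Date): { status: string } | { error: string } {
  return isWithinReturnWindow(deliveredAt, now)
    ? { status: "pending_approval" } // a human still approves or rejects via Slack
    : { error: "14-day return window expired; please consult the human team" };
}
```

Comparing epoch milliseconds keeps the check independent of the server's time zone.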

Pitfall

Vague tool descriptions = the agent picks the wrong tool or calls it when unnecessary. For each tool include 'WHEN to call + WHEN NOT to call' examples.

Stage 5 — Memory + Session — conversation continuity

Goal: Remember a customer's past messages; track a per-user daily token + cost budget.

Steps

Postgres 'messages' table: id, session_id, user_id, role, content, tokens, created_at.
Session ID = WhatsApp phone number or hash of a web widget cookie.
Have Claude Code add a daily-token counter — return 'system busy' when the threshold is hit.
Summarise old messages with Haiku so the context window doesn't blow up.
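The daily counter reduces to "reset when the day rolls over, then check whether this turn still fits." An in-memory sketch under stated assumptions: days are counted in UTC, the limit is per session, and `SessionBudget` mirrors the `daily_token_used` / `daily_token_reset_at` columns from the migration prompt in this stage. Unlike the prompt's `checkDailyBudget(sessionId, limit)`, this hypothetical version takes the session row directly; in production the state lives in Postgres.

```typescript
export interface SessionBudget {
  dailyTokenUsed: number;
  dailyTokenResetAt: Date; // start of the UTC day the counter belongs to
}

function startOfUtcDay(d: Date): Date {
  return new Date(Date.UTC(d.getUTCFullYear(), d.getUTCMonth(), d.getUTCDate()));
}

// Resets the counter on a new UTC day, then checks whether `estimatedTokens` still fit.
export function checkDailyBudget(budget: SessionBudget, estimatedTokens: number, limit: number, now = new Date()): boolean {
  const today = startOfUtcDay(now);
  if (budget.dailyTokenResetAt.getTime() < today.getTime()) {
    budget.dailyTokenUsed = 0;
    budget.dailyTokenResetAt = today;
  }
  return budget.dailyTokenUsed + estimatedTokens <= limit;
}

export function recordUsage(budget: SessionBudget, inputTokens: number, outputTokens: number): void {
  // Count both directions: with RAG context, input tokens are usually the larger share.
  budget.dailyTokenUsed += inputTokens + outputTokens;
}
```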

Give to Claude Code

Prompt

Generate the Postgres migration for 'sessions' and 'messages' tables.

sessions: id (uuid), channel (text), external_id (text — WhatsApp phone or web cookie), user_profile (jsonb), daily_token_used (int), daily_token_reset_at (timestamp), created_at.

messages: id, session_id (fk), role ('user'|'assistant'|'tool'), content (text), tool_calls (jsonb), tokens (int), created_at.

/src/db/sessionStore.ts:
- getOrCreateSession(channel, externalId)
- appendMessage(sessionId, role, content, tokens)
- getRecentMessages(sessionId, limit=10)
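- checkDailyBudget(sessionId, limit=10000): boolean (treat 10000 as a placeholder: with top_k=5 chunks of ~1,000 tokens each, a single call already sends 5,000+ input tokens, so size the limit from measured usage per turn)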
- summarizeOldMessages(sessionId): when message count > 50, summarise with Claude Haiku and drop the originals.

Wire it into customerAgent: before each call, checkDailyBudget; on false reply "you've reached today's free quota, see you tomorrow."

Plan first.

Pitfall

Get the Session ID wrong and two customers' conversations merge. On WhatsApp use the phone number; on the web widget use a persistent cookie hash, not a cookie + IP combo.
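A sketch of deriving a stable external ID per channel, assuming the WhatsApp sender's phone number arrives as a string and the widget sets a random, long-lived cookie ID. Hashing the cookie with an HMAC and a server-side secret (a hypothetical SESSION_SECRET, not part of the .env.example above) avoids storing the raw cookie value; the IP address is deliberately left out. `sessionExternalId` is a hypothetical name.

```typescript
import { createHmac } from "node:crypto";

type Channel = "whatsapp" | "web" | "telegram";

// Hypothetical: derive the value stored in sessions.external_id.
export function sessionExternalId(channel: Channel, rawId: string, secret: string): string {
  if (channel === "whatsapp") {
    // Normalise "+90 555 123 45 67" and "905551234567" to the same key.
    return `whatsapp:${rawId.replace(/\D/g, "")}`;
  }
  if (channel === "telegram") {
    return `telegram:${rawId}`; // chat.id is already stable per chat
  }
  // Web widget: keyed hash of a persistent, random cookie ID; no IP address involved.
  const digest = createHmac("sha256", secret).update(rawId).digest("hex");
  return `web:${digest}`;
}
```

Prefixing with the channel name also prevents a Telegram chat ID from ever colliding with a phone number.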

Stage 6 — Multi-channel adapter — WhatsApp + Web Widget + Telegram

Goal: Trigger the same agent from 3 channels: a webhook per channel and a single agent function.

Steps

One file per channel under /src/channels/ (whatsapp.ts, webWidget.ts, telegram.ts).
Each adapter: normalise the incoming payload → call customerAgent → reply in the channel's native format.
Web widget: a small /public/widget.js you can embed into any site.
Don't forget HMAC signature verification on WhatsApp (Meta App Secret).
Telegram uses the bot token + chat.id.
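For the WhatsApp HMAC step: Meta signs the raw request body with your App Secret and sends `sha256=<hex digest>` in the X-Hub-Signature-256 header. A minimal check with node:crypto, assuming you keep the raw body bytes (parsing the JSON and re-serialising it will generally not reproduce the signed bytes). `verifyMetaSignature` is a hypothetical name.

```typescript
import { createHmac, timingSafeEqual } from "node:crypto";

// True only if the header equals HMAC-SHA256(appSecret, rawBody).
export function verifyMetaSignature(rawBody: Buffer, signatureHeader: string | undefined, appSecret: string): boolean {
  if (!signatureHeader || !signatureHeader.startsWith("sha256=")) return false;
  const received = Buffer.from(signatureHeader.slice("sha256=".length), "hex");
  const expected = createHmac("sha256", appSecret).update(rawBody).digest();
  // timingSafeEqual throws on length mismatch, so compare lengths first.
  return received.length === expected.length && timingSafeEqual(received, expected);
}
```

In Express this usually means mounting `express.raw({ type: "application/json" })` on the webhook route so the handler sees a Buffer; reject with 401 before anything reaches customerAgent.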

Workflow view

WhatsApp/Web/Telegram → Verify + Rate limit → Burst buffer (5s) → customerAgent → Channel-specific Response

Give to Claude Code

Prompt

Generate 3 adapter files in /src/channels/.

1. whatsapp.ts (Express/Fastify route POST /webhook/whatsapp)
   - Meta webhook verify token check (GET).
   - HMAC SHA-256 signature verification (X-Hub-Signature-256 header).
   - If audio → Whisper transcribe; if image → GPT-4o Vision.
   - Call customerAgent.
   - POST the response to WhatsApp Cloud API (messages endpoint).
   - 24-hour window: if the customer hasn't messaged in the last 24h, a template message is required.

2. webWidget.ts (POST /widget/chat)
   - CORS correctly set up; only allowed domains.
   - Rate limit: 20 req/min per IP (Redis counter).
   - Call customerAgent, return JSON.

3. telegram.ts (POST /webhook/telegram)
   - Verify the X-Telegram-Bot-Api-Secret-Token header against the secret_token set when registering the webhook.
   - chat.id → sessionStore.getOrCreateSession.
   - Call customerAgent, reply via sendMessage API.

All webhooks share a 'message_burst_buffer': merge messages from the same session arriving within 5 seconds, then hand the agent the combined input (Redis with a debounce timer).

/public/widget.js: ~200 lines of vanilla JS — the site owner adds <script src="https://yourdomain.com/widget.js" data-key="xxx"></script>, a chat window opens bottom-right, talks to /widget/chat.

Plan first.
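The 5-second burst buffer is a debounce: a message joins the current burst if it arrives within 5 s of the previous message from the same session. The pure grouping logic below sketches that rule; the prompt keeps the timer in Redis, whereas here the messages are assumed to be already collected. `mergeBursts` is a hypothetical name.

```typescript
export interface IncomingMessage {
  sessionId: string;
  text: string;
  receivedAt: number; // epoch milliseconds
}

// Groups each session's messages into bursts: a gap longer than `windowMs` starts a new burst.
export function mergeBursts(messages: IncomingMessage[], windowMs = 5000): { sessionId: string; text: string }[] {
  const sorted = [...messages].sort((a, b) => a.receivedAt - b.receivedAt);
  const open = new Map<string, { parts: string[]; last: number }>();
  const bursts: { sessionId: string; text: string }[] = [];

  for (const msg of sorted) {
    const current = open.get(msg.sessionId);
    if (current && msg.receivedAt - current.last <= windowMs) {
      current.parts.push(msg.text);
      current.last = msg.receivedAt;
    } else {
      // Close the previous burst for this session (if any) and start a new one.
      if (current) bursts.push({ sessionId: msg.sessionId, text: current.parts.join("\n") });
      open.set(msg.sessionId, { parts: [msg.text], last: msg.receivedAt });
    }
  }
  open.forEach((burst, sessionId) => bursts.push({ sessionId, text: burst.parts.join("\n") }));
  return bursts;
}
```

In the live path the same rule becomes "each new message resets a 5 s timer for its session; when the timer fires, send the merged text to customerAgent."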

Pitfall

Don't leave webhook verify tokens public. Verify the HMAC signature or bot secret token on every channel; otherwise forged requests can run up large Anthropic and OpenAI bills within minutes.
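Signature checks stop forged webhook calls; the per-IP limit from the webWidget prompt (20 req/min) caps what any single client can spend. A fixed-window counter is the simplest version. This in-memory sketch stands in for the Redis counter (with Redis you would typically INCR a per-window key and set an EXPIRE on it); `FixedWindowLimiter` is a hypothetical name.

```typescript
// Fixed-window limiter: at most `limit` requests per key in each `windowMs` window.
export class FixedWindowLimiter {
  private readonly counts = new Map<string, { windowStart: number; count: number }>();
  private readonly limit: number;
  private readonly windowMs: number;

  constructor(limit = 20, windowMs = 60_000) {
    this.limit = limit;
    this.windowMs = windowMs;
  }

  allow(key: string, now = Date.now()): boolean {
    const windowStart = now - (now % this.windowMs);
    const entry = this.counts.get(key);
    if (!entry || entry.windowStart !== windowStart) {
      // New window for this key. Old entries are never evicted here; Redis expiry handles that in production.
      this.counts.set(key, { windowStart, count: 1 });
      return true;
    }
    if (entry.count >= this.limit) return false;
    entry.count += 1;
    return true;
  }
}
```

A fixed window can let through up to 2× the limit across a window boundary; that is usually acceptable for abuse protection, and a sliding window is the next step if it isn't.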

Stage 7 — Test, staging and going live

Goal: Three-stage testing before production: unit → scenario → beta users.

Steps

Unit tests (/tests/): order_lookup, refund_initiate, sessionStore — with mock DBs.
Scenario tests: 20-30 different customer queries with expected behaviour (vitest + snapshot).
Staging: a separate Supabase project, a test phone number and separate, spend-limited API keys.
Beta: 5-10 friendly customers for a week — log the 'I don't know' questions, improve the system prompt + RAG docs (2-3 iterations).
Soft launch: 10% of customers, then 50%, then 100%.
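For the 10% → 50% → 100% soft launch, each customer should land in a stable bucket so the same person doesn't flip between the bot and the old flow from one message to the next. A sketch that hashes the session's external ID into buckets 0-99 with SHA-256 from node:crypto; `rolloutBucket` and `inRollout` are hypothetical names.

```typescript
import { createHash } from "node:crypto";

// Maps an ID to a stable bucket in [0, 100).
export function rolloutBucket(externalId: string): number {
  const digest = createHash("sha256").update(externalId).digest();
  return digest.readUInt32BE(0) % 100;
}

// True if this customer is served by the AI assistant at the given rollout percentage.
export function inRollout(externalId: string, percent: number): boolean {
  return rolloutBucket(externalId) < percent;
}
```

Because the bucket never changes, everyone in the 10% group stays in at 50% and 100%, so beta customers keep their conversation history.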

Give to Claude Code

Prompt

Generate 3 files under /tests/:

1. agent.test.ts — vitest tests for customerAgent:
   - Mock the Anthropic SDK + a fake vector store.
   - 5 scenarios: 'how to return?', 'where is order 1234?', 'discount me', 'check weapons' (off-topic), 'should I send you my national ID?' (PII test: the agent must decline).
   - For each: assert the right tool call or the right rejection reply.

2. tools.test.ts — order_lookup, refund_initiate:
   - DB mock with in-memory SQLite.
   - Past-window refund → error; within window → pending_approval.

3. integration.test.ts — webhook → agent → response:
   - Use supertest to call /webhook/whatsapp.
   - Test correct and incorrect HMAC scenarios.

Also scripts/eval.ts: a 30-question eval set (questions.jsonl); for each question, call the agent, check the expected tool or reply, and produce a report (success_rate, avg_latency, total_tokens).

Plan first.
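The report's three numbers are simple aggregates over per-question results. A sketch, assuming each line of questions.jsonl produces a result with the expected tool (or null when the agent should answer directly), the tool actually called, latency and token count; these field names are hypothetical, and checking reply text is left out for brevity.

```typescript
export interface EvalResult {
  question: string;
  expectedTool: string | null; // null = the agent should answer without calling a tool
  calledTool: string | null;
  latencyMs: number;
  tokens: number;
}

export interface EvalReport {
  success_rate: number;
  avg_latency: number;
  total_tokens: number;
  failures: string[];
}

export function summarizeEval(results: EvalResult[]): EvalReport {
  if (results.length === 0) return { success_rate: 0, avg_latency: 0, total_tokens: 0, failures: [] };
  const failures = results.filter((r) => r.calledTool !== r.expectedTool).map((r) => r.question);
  const totalLatency = results.reduce((sum, r) => sum + r.latencyMs, 0);
  return {
    success_rate: (results.length - failures.length) / results.length,
    avg_latency: totalLatency / results.length,
    total_tokens: results.reduce((sum, r) => sum + r.tokens, 0),
    failures,
  };
}
```

Saving each run's report next to the prompt version it tested makes a drop in success_rate after a prompt change easy to spot.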

Pitfall

Don't write the eval set once and forget. Run the eval after every big system-prompt change — make sure nothing regressed. 'I fixed one thing and broke another' is the default failure mode for AI projects.

Stage 8 — Production deploy — Docker Compose + Caddy + Postgres

Goal: Deploy the assistant, Postgres, Redis, n8n (for side automations) and a Caddy reverse proxy to a VPS with Docker Compose.

Steps

infrastructure/docker-compose.yml: app, postgres, redis, n8n, caddy.
Caddyfile: chat.yourdomain.com → app:3000 and n8n.yourdomain.com → n8n:5678 (auto HTTPS via Let's Encrypt).
GitHub Actions: on push to main → build → SSH deploy.
Health check: /healthz endpoint pinged every minute by Uptime Kuma.
Keep n8n ready for fallback automations (review replies, lead routing) — the projects from Chapter 14 of the n8n path can live here.

Workflow view

GitHub push → Actions build → SSH deploy → docker compose up -d → Caddy HTTPS → app + postgres + redis + n8n

Give to Claude Code

Prompt

Generate infrastructure/docker-compose.yml. Services:

- app: our Node.js service (custom Dockerfile, internal port 3000).
- postgres: postgres:16, volume mount /var/lib/postgresql/data, healthcheck.
- redis: redis:7-alpine (rate limit + session burst).
- n8n: docker.n8n.io/n8nio/n8n pinned to a specific version tag, depends_on postgres+redis, queue mode via env vars.
- caddy: caddy:2-alpine, mounted Caddyfile, publish ports 80 and 443, auto HTTPS.

infrastructure/Caddyfile:
chat.example.com { reverse_proxy app:3000 }
n8n.example.com { reverse_proxy n8n:5678 }

infrastructure/Dockerfile (for app): Node 20 Alpine, multistage build, prod deps only, non-root user, healthcheck CMD.

.github/workflows/deploy.yml: on push to main → docker build → ssh deploy.example.com 'cd /opt/assistant && git pull && docker compose pull && docker compose up -d --build app'.

scripts/backup.sh: daily pg_dump → upload to Backblaze B2, delete backups older than 30 days.

Plan first.
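The 30-day retention rule in backup.sh is the part most worth getting right, since deleting the wrong dump is unrecoverable. Sketched in TypeScript for consistency with the rest of the project (the actual script would list and delete files through the B2 CLI or API): a pure function that picks which dumps to delete, assuming each has an upload timestamp. Always keeping the newest dump, even if it is older than 30 days, is an added safety choice; the names are hypothetical.

```typescript
export interface BackupFile {
  name: string;
  uploadedAt: Date;
}

const DAY_MS = 24 * 60 * 60 * 1000;

// Returns the backups to delete: older than `retentionDays`, but never the most recent one.
export function backupsToDelete(files: BackupFile[], now: Date, retentionDays = 30): BackupFile[] {
  if (files.length === 0) return [];
  const newest = files.reduce((a, b) => (a.uploadedAt.getTime() >= b.uploadedAt.getTime() ? a : b));
  const cutoff = now.getTime() - retentionDays * DAY_MS;
  return files.filter((f) => f !== newest && f.uploadedAt.getTime() < cutoff);
}
```

Keeping the newest dump protects you if the daily backup job silently stops running for a month.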

Pitfall

DON'T use 'latest' as the Docker image tag. Pin versions (node:20.11.0-alpine, postgres:16.4); otherwise the next `docker compose pull` can silently upgrade a service overnight and leave it broken in the morning.

Stage 9 — Monitoring + improvement loop — sustainability

Goal: Don't 'set and forget.' A weekly rhythm keeps the assistant improving.

Steps

Dashboard: conversations per day, success rate, average latency, token spend, most-called tools.
Weekly 'I don't know' report: ranked list of questions the agent couldn't answer.
Monthly model A/B test: when a new Claude/GPT version ships, run both for a week and compare metrics.
Quarterly security review: rotate webhook secrets, PII spot-check, refresh GDPR posture.
Customer feedback: ask for a 1-5 star rating at the end of each chat; review the low-rated ones.

Give to Claude Code

Prompt

Generate /src/dashboard/metrics.ts.

In Postgres write functions for:
- conversationsByDay(days=30)
- successRate(): answered / total
- avgLatencyMs()
- tokenUsageByDay()
- topUnansweredQuestions(limit=20): filter conversations where sources_used = 0 or should_handoff = true, then group by question and rank by frequency
- topToolsCalled()

/src/dashboard/page.tsx (Next.js): basic-auth-protected dashboard with 6 metric cards + a 'last 20 failed conversations' table.

scripts/weekly-report.ts: a Monday 09:00 cron job that queries Postgres and posts a summary to Slack #ops.

Plan first.
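Before writing the SQL, it helps to pin down what "answered" means. An in-memory sketch, assuming each logged conversation carries the `sources_used` and `should_handoff` fields named in the prompt; here a conversation counts as unanswered when no source was used or a handoff was needed, which is one possible definition, not the only one. Function names follow the prompt but the implementations are hypothetical.

```typescript
export interface ConversationLog {
  question: string;
  sources_used: number;
  should_handoff: boolean;
}

const isUnanswered = (c: ConversationLog) => c.sources_used === 0 || c.should_handoff;

export function successRate(logs: ConversationLog[]): number {
  return logs.length === 0 ? 0 : logs.filter((c) => !isUnanswered(c)).length / logs.length;
}

// Filter the unanswered conversations, group identical questions, rank by frequency.
export function topUnansweredQuestions(logs: ConversationLog[], limit = 20): { question: string; count: number }[] {
  const counts = new Map<string, number>();
  for (const c of logs.filter(isUnanswered)) {
    const key = c.question.trim().toLowerCase();
    counts.set(key, (counts.get(key) ?? 0) + 1);
  }
  return Array.from(counts, ([question, count]) => ({ question, count }))
    .sort((a, b) => b.count - a.count)
    .slice(0, limit);
}
```

Exact-match grouping after lower-casing is crude; once volume grows, clustering questions by embedding similarity groups paraphrases better.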

Pitfall

Logging the customer's 'I don't know' moments is critical. Without it, you can't see what the system couldn't answer and you can't improve. Don't forget the 'unanswered' flag in the audit log.

Alternative paths

Other ways to build it (besides Claude Code)

Claude Code isn't right for every team. Four popular alternatives and when they make sense.

ChatGPT (web) + Custom GPT

No-code starter prototype

Tight budget, small business, web chat only, moderate production needs. Upload knowledge to a Custom GPT, write the instructions, and use the shareable ChatGPT link.

Claude.ai (web) + Projects

Consulting needing deep document analysis

Long documents (contracts, regulations), careful writing, citation-heavy work. Create a Project on claude.ai, upload docs to Knowledge, your team uses it via browser.

n8n + AI Agent node

For people who prefer visual flows

If you don't enjoy writing code or have no backend team. In n8n you wire Webhook → AI Agent → Vector Store → WhatsApp visually. Chapter 14 of our n8n learning path builds this project end to end.

Claude Code (our recommendation)

Production-grade with full control

If you want your data, your scale and your tests under your roof, take this route — the entire guide above walks you through it.

Security & GDPR

10-point pre-production security checklist

Don't ship until all are green. A single missing item is a real attack surface or a real GDPR violation.

1. Every webhook has HMAC or bearer-token verification?
2. API keys in .env, .env in .gitignore, and keys never committed to git (public or private repo)?
3. System prompt forbids the agent from asking for national ID, card number or password?
4. Raw PII not written to logs; emails/phones masked?
5. GDPR notice visible under the chat window or in the first message?
6. Endpoint for data subject rights (delete, export), e.g. /sessions/:id/delete?
7. Daily backup of the production DB to offsite storage (S3/Backblaze)?
8. ENCRYPTION_KEY backed up separately (password manager + offline)?
9. Rate limits: per-IP and per-session minute/day quotas active?
10. Kill switch: one env var stops all AI calls?
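For item 4, a masking pass can run on every string before it reaches the logs. A regex-based sketch: the patterns (email addresses, and runs of 10+ digits optionally separated by spaces or dashes, which covers most phone and card numbers) are assumptions and will miss some formats, so treat it as a first line of defence rather than a guarantee. `maskPII` is a hypothetical name.

```typescript
// Keeps the first character of the local part and the full domain.
const EMAIL = /([A-Za-z0-9._%+-])[A-Za-z0-9._%+-]*@([A-Za-z0-9.-]+\.[A-Za-z]{2,})/g;
// 10+ digits, optionally separated by single spaces or dashes (phones, card numbers, IDs).
const LONG_NUMBER = /\+?\d(?:[ -]?\d){9,}/g;

export function maskPII(text: string): string {
  return text
    .replace(EMAIL, (_match: string, first: string, domain: string) => `${first}***@${domain}`)
    .replace(LONG_NUMBER, (match: string) => `***${match.replace(/\D/g, "").slice(-2)}`);
}

// maskPII("Mail jane.doe@example.com or call +90 555 123 45 67, order 1234")
// → "Mail j***@example.com or call ***67, order 1234"
```

Short order IDs such as 1234 survive, so logs stay useful for debugging while contact details do not.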

Frequently asked questions

Why build my own when SaaS options exist?

Three reasons: (1) cost: SaaS charges per message, so at scale a self-built assistant can be 5-10x cheaper; (2) data ownership: conversations and customer records stay in your own database (only the text you send in API calls reaches the model providers), which makes GDPR compliance much easier; (3) flexibility: you tailor it to your sector, brand voice and tools. A SaaS may be fine for a quick prototype; in production, owning the stack is almost always more sustainable.

How long does it take to build?

With Claude Code and this guide, an initial MVP (RAG + agent + web widget) takes 1-2 weeks. Adding WhatsApp + tools takes another week. Testing + beta takes 2-3 weeks. Total: 4-6 weeks to a production-ready assistant, assuming one backend engineer. SaaS '5-minute setup' claims are misleading: launching is quick, but making an assistant genuinely useful takes work.

What's the monthly cost?

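Small business (200 conversations/day): VPS ~$10, Anthropic API ~$50-100, OpenAI embeddings ~$2, Supabase ~$0-10, WhatsApp Business ~$0 for replies to customer-initiated chats (template messages outside the 24-hour window are billed per Meta's rate card). These items add up to ~$60-120/month; budget ~$80-150 to leave headroom. Mid-size (2,000 conversations/day): ~$300-600/month. A human team would cost $1,500-3,000 per agent per month, plus infra.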
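The small-business figures can be checked with a few lines of arithmetic. The sketch sums the ranges quoted in this answer (treating WhatsApp as $0) and divides by 200 conversations/day × 30 days; the inputs are this FAQ's estimates, not measured prices, and the helper names are hypothetical.

```typescript
type Range = [min: number, max: number];

// Monthly line items for the small-business example (USD), as quoted in the FAQ.
const smallBusiness: Record<string, Range> = {
  vps: [10, 10],
  anthropicApi: [50, 100],
  openaiEmbeddings: [2, 2],
  supabase: [0, 10],
  whatsapp: [0, 0],
};

export function monthlyTotal(items: Record<string, Range>): Range {
  return Object.values(items).reduce<Range>(([lo, hi], [a, b]) => [lo + a, hi + b], [0, 0]);
}

export function costPerConversation(total: Range, conversationsPerDay: number, days = 30): Range {
  const n = conversationsPerDay * days;
  return [total[0] / n, total[1] / n];
}

const total = monthlyTotal(smallBusiness); // [62, 122]
const perConversation = costPerConversation(total, 200); // roughly $0.010 to $0.020 per conversation
```

At about one to two cents per conversation, the API bill only rivals a single human agent's salary at very high volumes.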

How do I limit hallucinations?

Three-layer defence: (1) the system prompt mandates 'only answer from sources; if the answer isn't there, say I don't know'; (2) structured output with { reply, sources[], confidence }, where your code rejects any reply whose sources[] is empty; (3) confidence < 0.7 → automatic handoff. With all three in place, practical hallucination rates can drop below 1%; confirm this on your own eval set rather than assuming it.
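Layer 2 only works if your code, not the model, enforces it. A sketch of a guard that parses the model's JSON reply, checks the { reply, sources[], confidence } shape, and replaces any malformed or uncited answer with the "I don't know" fallback. Plain type checks are used instead of zod to keep it dependency-free; `guardReply` and the fallback wording are assumptions.

```typescript
export interface AgentResponse {
  reply: string;
  sources: string[];
  confidence: number;
  should_handoff: boolean;
}

const FALLBACK = "I don't have that in my docs. Shall I connect you to our team?";
const fallback = (): AgentResponse => ({ reply: FALLBACK, sources: [], confidence: 0, should_handoff: true });

export function guardReply(raw: string, minConfidence = 0.7): AgentResponse {
  let data: unknown;
  try {
    data = JSON.parse(raw);
  } catch {
    return fallback(); // not JSON at all
  }
  if (typeof data !== "object" || data === null) return fallback();
  const { reply, sources, confidence, should_handoff } = data as Partial<AgentResponse>;
  if (
    typeof reply !== "string" ||
    typeof confidence !== "number" ||
    !Array.isArray(sources) ||
    !sources.every((s) => typeof s === "string") ||
    sources.length === 0 // uncited: never shown to the customer
  ) {
    return fallback();
  }
  return { reply, sources, confidence, should_handoff: should_handoff === true || confidence < minConfidence };
}
```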

What about GDPR / CCPA?

Minimum: (1) a privacy notice visible in the chat window; (2) a first message disclosing 'You're chatting with an AI assistant; conversations may be logged'; (3) data minimisation: never ask for national ID or card numbers; (4) right to erasure: an endpoint that handles 'delete my data'; (5) retention: conversations 90 days, logs 30 days; (6) cookie consent + disclosure of third-party processors (OpenAI/Anthropic). For full compliance, work with legal counsel and see our AI Security page.

How does human handoff work?

Three triggers: (1) agent confidence < 0.7; (2) the customer asks for a human ('agent please' or similar); (3) topic categories (refund, complaint, price negotiation) trigger an automatic handoff. Process: the agent_handoff tool sends a Slack notification with a conversation summary, the customer is told their queue position, and a human agent clicks 'I'll take it' to silence the bot and continues the conversation in their CRM widget.
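The three triggers combine into one decision function. A sketch under assumptions: the keyword pattern and topic labels are hypothetical examples (the topic could come from a cheap Haiku classification call), and 0.7 is the threshold from this FAQ.

```typescript
type Topic = "refund" | "complaint" | "price_negotiation" | "order_status" | "faq" | "other";

const HANDOFF_TOPICS: ReadonlySet<Topic> = new Set<Topic>(["refund", "complaint", "price_negotiation"]);
// Hypothetical phrases; extend with your customers' real wording and languages.
const HUMAN_REQUEST = /\b(agent|human|real person|representative)\b/i;

// Returns why the conversation should go to a human, or null if the bot keeps it.
export function handoffReason(confidence: number, message: string, topic: Topic): string | null {
  if (HUMAN_REQUEST.test(message)) return "customer_requested_human";
  if (HANDOFF_TOPICS.has(topic)) return `topic:${topic}`;
  if (confidence < 0.7) return "low_confidence";
  return null;
}
```

Returning a reason rather than a boolean gives the Slack notification and the weekly report something to group by.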

Can it speak multiple languages?

Yes, in three ways: (1) a system-prompt rule such as 'reply in the same language the customer wrote in', since modern LLMs handle dozens of languages well; (2) keep RAG docs in your main languages; (3) version WhatsApp templates per language. One assistant can serve customers across all your markets.

Start today, ship in 6 weeks

Pair this with the n8n learning path, the Claude Code hub and our AI Security guide — you'll have everything you need to take a sustainable assistant to production.
