Architecture Overview
AudienceGPT is a Next.js 16 App Router application that classifies advertising audience segments through a 7-layer intent taxonomy engine. It combines AI-powered classification (Anthropic Claude) with a deterministic local fallback, stores results in Neon PostgreSQL with pgvector embeddings, and provides multi-tenant access through Clerk organizations.
This page covers the tech stack, system architecture, classification pipeline, storage layer, authentication model, and engine versioning system.
Tech Stack
| Layer | Technology | Details |
|---|---|---|
| Runtime | Bun | JavaScript/TypeScript runtime and package manager |
| Framework | Next.js 16.1.6 | App Router with React Server Components |
| UI | React 19 + Tailwind CSS 4 | Component library with utility-first styling |
| Language | TypeScript (strict mode) | Full type coverage, no any escapes |
| Database | Neon PostgreSQL | Serverless Postgres with pgvector extension |
| Auth | Clerk | Multi-tenant orgs, session management, RBAC |
| AI | Anthropic Claude | Chat (Haiku 4.5), Classification (Sonnet 4.6) |
| Build | Turbopack | Bundled with Next.js 16 |
| Testing | Bun test + happy-dom | Built-in test runner, no Jest dependency |
| Linting | ESLint 9 + Knip | Code quality and dead code detection |
System Architecture
Classification Pipeline
The classification pipeline uses a dual-path architecture: a primary AI path via the Anthropic API, and a deterministic local fallback for when the API is unavailable.
AI Path Details
The primary classification path calls the Anthropic API with:
- Model: Sonnet 4.6 (claude-sonnet-4-6) by default, configurable via the CLASSIFICATION_MODEL env var
- Structured outputs: Uses output_config.format to guarantee valid JSON matching the classification schema
- Web search: The model has access to a web_search tool (max 3 uses per request) to verify unfamiliar brands, products, or companies before classifying
- Max tokens: 4,096 for classification responses
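The settings listed above could be assembled into a request body roughly as follows. This is a sketch rather than the project's actual code: buildClassificationRequest, the placeholder schema, and the prompt text are hypothetical, and the field shapes for output_config.format and the web search tool should be verified against the current Anthropic API reference.

```typescript
// Hypothetical sketch: assembles a request body from the settings listed above.
// Field shapes are assumptions to be checked against the Anthropic API reference.
interface ClassificationRequest {
  model: string;
  max_tokens: number;
  system: string;
  messages: { role: "user"; content: string }[];
  tools: { type: string; name: string; max_uses: number }[];
  output_config: {
    format: { type: "json_schema"; schema: Record<string, unknown> };
  };
}

const DEFAULT_CLASSIFICATION_MODEL = "claude-sonnet-4-6";

// Placeholder schema; the real classification schema is not shown on this page.
const exampleSchema: Record<string, unknown> = {
  type: "object",
  properties: { intentType: { type: "string" } },
  required: ["intentType"],
  additionalProperties: false,
};

function buildClassificationRequest(
  topic: string,
  modelOverride?: string, // e.g. the value of the CLASSIFICATION_MODEL env var
): ClassificationRequest {
  return {
    model: modelOverride || DEFAULT_CLASSIFICATION_MODEL,
    max_tokens: 4096,
    system: "Classify the audience segment.", // placeholder prompt
    messages: [{ role: "user", content: topic }],
    tools: [{ type: "web_search_20250305", name: "web_search", max_uses: 3 }],
    output_config: { format: { type: "json_schema", schema: exampleSchema } },
  };
}
```

Passing the model as a parameter keeps the builder free of environment access, so it can be exercised in tests without any configuration.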
The chat conversational layer uses Haiku 4.5 (claude-haiku-4-5-20251001) for faster, cheaper responses.
Local Fallback
When the API is unavailable, buildLocalClassification() in local-fallback.ts runs deterministic regex-based classification. It produces an identical output shape but without web search verification. The fallback scores topic text against keyword patterns for each of the 7 layers.
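The dual-path behavior can be illustrated with a minimal sketch. The Classification shape, the intent labels, and classifyWithFallback are hypothetical; localClassify is only a stand-in for buildLocalClassification(), which scores all 7 layers.

```typescript
// Hypothetical result shape; the real output has one entry per layer.
interface Classification {
  topic: string;
  intentType: string;
  source: "ai" | "local";
}

type Classifier = (topic: string) => Promise<Classification>;

// Stand-in for buildLocalClassification(): deterministic, no network calls.
function localClassify(topic: string): Classification {
  const intentType = /\b(buy|price|deal|discount)\b/i.test(topic)
    ? "transactional"
    : "informational";
  return { topic, intentType, source: "local" };
}

// Try the AI path first; fall back to the local path on any failure.
async function classifyWithFallback(
  topic: string,
  aiClassify: Classifier,
): Promise<Classification> {
  try {
    return await aiClassify(topic);
  } catch {
    return localClassify(topic);
  }
}
```

Because both paths return the same shape, callers do not need to branch on which path produced the result.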
7-Layer Classification Summary
Each classified topic passes through all seven layers to produce a rich, multi-dimensional profile:
| Layer | Name | Output | Method |
|---|---|---|---|
| 1 | Intent Type | Primary + secondary intents | Regex scoring against 8 intent patterns |
| 2 | Intensity | Level (dormant-critical) + score | Keyword matching with weighted scoring |
| 3 | Awareness | Schwartz stage (unaware-retention) | Mapped from intent type + intensity |
| 4 | Segment | B2B/B2C/B2B2C/B2E/B2G | Taxonomy lookup, then keyword fallback |
| 5 | Sensitivity | Standard or Sensitive | Regulated category detection (cannabis, gambling, etc.) |
| 6 | Buyer Journey | Stage + funnel position | Composite of intent + intensity + awareness |
| 7 | Composite | 0-100 score + interpretation | Weighted combination of all layers |
The engine code lives in src/lib/classification/engine.ts. Each layer is a pure function with no side effects.
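As an illustration of what a pure layer function might look like, here is a sketch of Layer 1 style regex scoring. The pattern set and intent names are hypothetical (the real engine scores against 8 intent patterns), and scoreIntent is not the engine's actual function name.

```typescript
// Hypothetical patterns; the real engine defines 8 intent patterns.
const INTENT_PATTERNS: Record<string, RegExp[]> = {
  transactional: [/\bbuy\b/i, /\bpric(e|ing)\b/i, /\bdiscount\b/i],
  informational: [/\bhow to\b/i, /\bwhat is\b/i, /\bguide\b/i],
  commercial: [/\bbest\b/i, /\bvs\b/i, /\breviews?\b/i],
};

interface IntentResult {
  primary: string | null;
  secondary: string | null;
  scores: Record<string, number>;
}

// Pure function: the same input always yields the same output, with no I/O.
function scoreIntent(topic: string): IntentResult {
  const scores: Record<string, number> = {};
  for (const [intent, patterns] of Object.entries(INTENT_PATTERNS)) {
    // Score = number of patterns for this intent that match the topic text.
    scores[intent] = patterns.filter((p) => p.test(topic)).length;
  }
  const ranked = Object.entries(scores)
    .filter(([, score]) => score > 0)
    .sort((a, b) => b[1] - a[1]);
  return {
    primary: ranked[0]?.[0] ?? null,
    secondary: ranked[1]?.[0] ?? null,
    scores,
  };
}
```

The patterns deliberately omit the global flag, so RegExp.test carries no state between calls and the function stays deterministic.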
Storage Layer
ITaxonomyStore Interface
All data access goes through the ITaxonomyStore interface (src/lib/storage/interface.ts). This async interface defines the contract for:
- CRUD: getTopic, addTopic, updateTopic, deleteTopics
- Queries: getAllTopics, listTopics (paginated + filtered + sorted)
- Similarity: findSimilar, checkDuplicate, batchSimilarityCheck
- Batch: addTopicsBatch, linkTopicsBatch, updateOrgTopicMetadataBatch
- Import: createImportBatch, updateImportBatch, getImportBatch, listImportBatches
- Stats: getStats, getTaxonomyPerformance
- Global matching: resolveGlobalMatches (for import/sync dedup against global catalog)
Two implementations exist:
| Implementation | Module | Use Case |
|---|---|---|
| TaxonomyNeonStore | src/lib/storage/neon-store.ts | Production -- Neon PostgreSQL |
| TaxonomyMemoryStore | src/lib/storage/memory-store.ts | Tests -- in-memory, no DB needed |
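The value of coding against an interface can be shown with a trimmed-down sketch. MiniTaxonomyStore, MiniMemoryStore, and the Topic shape are hypothetical; the real ITaxonomyStore has many more methods, and its signatures (including any org-scoping parameter, omitted here for brevity) may differ.

```typescript
// Hypothetical, trimmed-down shapes for illustration only.
interface Topic {
  id: string;
  name: string;
}

interface MiniTaxonomyStore {
  getTopic(id: string): Promise<Topic | null>;
  addTopic(topic: Topic): Promise<Topic>;
  deleteTopics(ids: string[]): Promise<number>;
}

// In-memory implementation: async like the real contract, but backed by a Map.
class MiniMemoryStore implements MiniTaxonomyStore {
  private topics = new Map<string, Topic>();

  async getTopic(id: string): Promise<Topic | null> {
    return this.topics.get(id) ?? null;
  }

  async addTopic(topic: Topic): Promise<Topic> {
    this.topics.set(topic.id, topic);
    return topic;
  }

  // Returns how many of the given ids actually existed and were removed.
  async deleteTopics(ids: string[]): Promise<number> {
    return ids.filter((id) => this.topics.delete(id)).length;
  }
}
```

Code that depends only on the interface type can be handed either implementation, which is what lets tests run without a database.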
Duplicate Detection
Topics are embedded as 256-dimensional hash vectors. Duplicate detection uses cosine similarity on these embeddings via pgvector's HNSW index:
- 95% similarity: Blocks the insert (considered a duplicate)
- 75% similarity: Warns the user (potential near-duplicate)
- Brand alias dictionary: Deterministic matching for known brand variants (e.g., "CrowdStrike" vs "Crowdstrike")
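The threshold logic above can be sketched in application code. In production the similarity search runs inside pgvector; this sketch only illustrates the arithmetic, and it assumes both thresholds are inclusive lower bounds, which the source does not state. The function and constant names are hypothetical.

```typescript
// Assumed to be inclusive lower bounds; names are illustrative.
const BLOCK_THRESHOLD = 0.95;
const WARN_THRESHOLD = 0.75;

type DuplicateVerdict = "block" | "warn" | "ok";

// Cosine similarity: dot(a, b) / (|a| * |b|). Returns 0 for zero vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) {
    throw new Error("Vectors must have the same length");
  }
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    const x = a[i] ?? 0;
    const y = b[i] ?? 0;
    dot += x * y;
    normA += x * x;
    normB += y * y;
  }
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

// Map a similarity score onto the block / warn / ok outcomes.
function duplicateVerdict(similarity: number): DuplicateVerdict {
  if (similarity >= BLOCK_THRESHOLD) return "block";
  if (similarity >= WARN_THRESHOLD) return "warn";
  return "ok";
}
```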
Two-Table Architecture
Topics are stored in a global-plus-org-link model:
- topics: Global catalog of all classified segments (shared across organizations)
- org_topics: Per-organization links to global topics, with optional field overrides and performance tracking
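One way to picture the global-plus-org-link model is as a merge of a shared row with per-organization overrides. The shapes and field names below are hypothetical and do not reflect the real schema.

```typescript
// Hypothetical shapes; field names are illustrative, not the real schema.
interface GlobalTopic {
  id: string;
  name: string;
  segment: string;
}

interface OrgTopicLink {
  orgId: string;
  topicId: string;
  overrides: Partial<Omit<GlobalTopic, "id">>;
}

// Merge a global row with one org's overrides; the global row is not mutated.
function resolveOrgView(topic: GlobalTopic, link: OrgTopicLink): GlobalTopic {
  if (link.topicId !== topic.id) {
    throw new Error("Link does not reference this topic");
  }
  return { ...topic, ...link.overrides };
}
```

Under this reading, an override changes what a single organization sees while every other organization keeps the shared global values.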
The DB columns taxonomy_type and parent_category are swapped relative to their TypeScript names. See the Field Mapping page for the full explanation -- this is critical knowledge for any developer writing SQL.
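A minimal sketch of what such a swap boundary might look like follows. The camelCase TypeScript field names and both helper functions are assumptions; the Field Mapping page remains the authoritative description.

```typescript
// Hypothetical row and domain shapes; only the two swapped columns are shown.
interface TopicRow {
  taxonomy_type: string;
  parent_category: string;
}

interface TopicFields {
  taxonomyType: string;
  parentCategory: string;
}

// DB -> TS: each column feeds the *other* TypeScript field.
function rowToFields(row: TopicRow): TopicFields {
  return {
    taxonomyType: row.parent_category,
    parentCategory: row.taxonomy_type,
  };
}

// TS -> DB: the inverse mapping, so a round trip is lossless.
function fieldsToRow(fields: TopicFields): TopicRow {
  return {
    taxonomy_type: fields.parentCategory,
    parent_category: fields.taxonomyType,
  };
}
```

Confining the swap to one pair of mapping functions means the rest of the TypeScript code never has to reason about it; only hand-written SQL does.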
Authentication Architecture
AudienceGPT supports three authentication methods, each suited to different integration patterns:
1. Clerk Session (Browser)
The primary auth method for the web dashboard. Clerk manages user sessions, organization membership, and roles. The Next.js middleware (proxy.ts) validates sessions on every request.
2. API Key (txadv_ prefix)
Self-managed API keys for server-to-server integration. Each organization can create up to 25 keys with granular scope permissions:
| Scope | Access |
|---|---|
| classify | Classify topics via /api/classify |
| topics:read | Read topics, stats, duplicate checks |
| topics:write | Create/delete topics, run imports |
| export | Export topics as CSV or JSON |
| sync | Manage sync sources and run syncs |
| activations | Manage segment activations and push to DSPs |
| mappings:read | Read platform ID mappings |
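A scope check against the table above might look like the following sketch. The scope names come from the table; ApiKeyRecord, hasScope, and isApiKeyScope are hypothetical names, not the project's API.

```typescript
// Scope names taken from the table above.
const API_KEY_SCOPES = [
  "classify",
  "topics:read",
  "topics:write",
  "export",
  "sync",
  "activations",
  "mappings:read",
] as const;

type ApiKeyScope = (typeof API_KEY_SCOPES)[number];

// Hypothetical record shape for a stored key.
interface ApiKeyRecord {
  id: string;
  orgId: string;
  scopes: ApiKeyScope[];
}

// True when the key was granted the scope a route requires.
function hasScope(key: ApiKeyRecord, required: ApiKeyScope): boolean {
  return key.scopes.includes(required);
}

// Type guard for validating untrusted input, e.g. a key-creation request.
function isApiKeyScope(value: string): value is ApiKeyScope {
  return (API_KEY_SCOPES as readonly string[]).includes(value);
}
```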
3. SDK Key (pk_live_ / pk_test_ prefix)
Publishable keys for client-side SDK integration (e.g., embedding the classify widget). These are safe to expose in frontend code and have a fixed scope set.
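Because the two key types carry distinct prefixes, a request handler could route a credential by inspecting it, as in this sketch. detectCredentialKind and the CredentialKind labels are hypothetical; Clerk sessions are not covered because they are not identified by a key prefix.

```typescript
type CredentialKind = "api_key" | "sdk_live" | "sdk_test" | "unknown";

// Classify a credential by its documented prefix.
function detectCredentialKind(token: string): CredentialKind {
  if (token.startsWith("txadv_")) return "api_key";
  if (token.startsWith("pk_live_")) return "sdk_live";
  if (token.startsWith("pk_test_")) return "sdk_test";
  return "unknown";
}
```

A prefix check only selects which key store to consult; the key itself would still need to be looked up and validated there.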
Multi-Tenant Scoping
Every database query is scoped by OrgContext, which contains the authenticated orgId and userId. There is no cross-organization data leakage -- the storage layer enforces tenant isolation at the query level.
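Query-level tenant isolation can be sketched as a builder that cannot produce a query without an org filter. The OrgContext fields come from the text above; the SQL column names (org_id, topic_id), the ScopedQuery shape, and listOrgTopicsQuery are assumptions for illustration.

```typescript
interface OrgContext {
  orgId: string;
  userId: string;
}

interface ScopedQuery {
  text: string;
  params: unknown[];
}

// Builds a parameterized query that always filters by the caller's org.
// Column names here are assumed, not taken from the real schema.
function listOrgTopicsQuery(ctx: OrgContext, limit: number): ScopedQuery {
  return {
    text:
      "SELECT t.* FROM org_topics ot " +
      "JOIN topics t ON t.id = ot.topic_id " +
      "WHERE ot.org_id = $1 LIMIT $2",
    params: [ctx.orgId, limit],
  };
}
```

Requiring the context as a function argument makes an unscoped query a type error rather than something a reviewer has to catch.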
Engine Versioning
Every classified topic is stamped with the current ENGINE_VERSION (currently "2.5") from src/lib/constants/engine-version.ts.
When to bump: Any change to classification logic that would produce different output for the same input -- keyword patterns, layer functions, taxonomy definitions, system prompts, or local fallback logic.
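Detecting outdated topics then reduces to a version comparison, sketched below. The VersionedTopic shape and both helpers are hypothetical, and the sketch assumes any stamp other than the current version (including a missing one) counts as outdated.

```typescript
// Value quoted in the text above; the real constant lives in engine-version.ts.
const ENGINE_VERSION = "2.5";

// Hypothetical shape: only the fields needed for the comparison.
interface VersionedTopic {
  id: string;
  engineVersion: string | null;
}

// Assumption: anything not stamped with the current version is outdated.
function isOutdated(topic: VersionedTopic): boolean {
  return topic.engineVersion !== ENGINE_VERSION;
}

// Collect the ids that would need reclassification.
function findOutdatedIds(topics: VersionedTopic[]): string[] {
  return topics.filter(isOutdated).map((t) => t.id);
}
```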
Reclassification options:
- Single topic: POST /api/topics/:id/reclassify with optional { llm: true } for AI-powered mode
- Bulk: POST /api/topics/reclassify with { ids: [...], llm?: boolean } (max 500 local, 100 LLM)
- Global script: bun run reclassify-global for all outdated topics in the global catalog
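A client that needs to reclassify more topics than the bulk endpoint accepts could split the ids into request bodies that respect the documented limits. This is a sketch; buildReclassifyBodies is a hypothetical helper, not part of the project.

```typescript
// Limits quoted from the bulk endpoint description above.
const MAX_LOCAL_BATCH = 500;
const MAX_LLM_BATCH = 100;

interface ReclassifyBody {
  ids: string[];
  llm?: boolean;
}

// Split ids into bodies sized for the chosen mode (local or LLM).
function buildReclassifyBodies(ids: string[], llm = false): ReclassifyBody[] {
  const size = llm ? MAX_LLM_BATCH : MAX_LOCAL_BATCH;
  const bodies: ReclassifyBody[] = [];
  for (let i = 0; i < ids.length; i += size) {
    const chunk = ids.slice(i, i + size);
    bodies.push(llm ? { ids: chunk, llm: true } : { ids: chunk });
  }
  return bodies;
}
```

Each resulting body would be sent as its own POST /api/topics/reclassify request.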
The library UI shows an Engine Version filter (Current/Outdated) and outdated segments display a reclassify banner.
Key Module Map
| Module Path | Responsibility |
|---|---|
| src/lib/classification/engine.ts | 7-layer classification pure functions |
| src/lib/classification/local-fallback.ts | Deterministic regex fallback |
| src/lib/classification/reclassify.ts | Local reclassification helper |
| src/lib/classification/reclassify-llm.ts | LLM-powered reclassification |
| src/config/models.ts | Model IDs, max tokens, pricing |
| src/lib/constants/engine-version.ts | ENGINE_VERSION constant |
| src/lib/constants/taxonomy-types.ts | 41 parent categories, 13 taxonomy types |
| src/lib/naming/dsp-names.ts | Trade Desk, LiveRamp, Internal names |
| src/lib/naming/template-engine.ts | Configurable output templates |
| src/lib/signals/ucp.ts | User Context Protocol generation |
| src/lib/storage/interface.ts | ITaxonomyStore contract |
| src/lib/storage/neon-store.ts | Neon PostgreSQL implementation |
| src/lib/storage/neon-store-helpers.ts | Row mapping (DB-to-TS swap boundary) |
| src/lib/storage/memory-store.ts | In-memory store for tests |
| src/hooks/use-classification.ts | UI state machine (ConvoStep) |
| src/lib/api-client.ts | Client-side API fetch wrapper |
| src/app/api/classify/route.ts | Server-side classification endpoint |
| src/lib/auth/api-key-store.ts | API key CRUD (txadv_ prefix) |
| src/lib/auth/sdk-key-store.ts | SDK key CRUD (pk_live_/pk_test_) |
| src/proxy.ts | Clerk auth middleware |
CI/CD
GitHub Actions workflows in .github/workflows/:
- ci.yml -- Runs on push to main and all PRs: lint, typecheck, knip, test, build
- migrate.yml -- Runs on push to main only: applies pending DB migrations via the DATABASE_URL secret
The CI pipeline mirrors the pre-commit hook checks, ensuring the same quality gates in both local and remote environments.
Next Steps
- Field Mapping -- Understand the DB-to-TS column swap before writing any queries
- Taxonomy Structure -- Explore the full hierarchy of 13 types, 41 categories, and the subcategory tree
- API Reference: Authentication -- Detailed auth integration guide