Architecture Overview

AudienceGPT is a Next.js 16 App Router application that classifies advertising audience segments through a 7-layer intent taxonomy engine. It combines AI-powered classification (Anthropic Claude) with a deterministic local fallback, stores results in Neon PostgreSQL with pgvector embeddings, and provides multi-tenant access through Clerk organizations.

This page covers the tech stack, system architecture, classification pipeline, storage layer, authentication model, and engine versioning system.

Tech Stack

Layer	Technology	Details
Runtime	Bun	JavaScript/TypeScript runtime and package manager
Framework	Next.js 16.1.6	App Router with React Server Components
UI	React 19 + Tailwind CSS 4	Component library with utility-first styling
Language	TypeScript (strict mode)	Full type coverage, no `any` escapes
Database	Neon PostgreSQL	Serverless Postgres with pgvector extension
Auth	Clerk	Multi-tenant orgs, session management, RBAC
AI	Anthropic Claude	Chat (Haiku 4.5), Classification (Sonnet 4.6)
Build	Turbopack	Bundled with Next.js 16
Testing	Bun test + happy-dom	Built-in test runner, no Jest dependency
Linting	ESLint 9 + Knip	Code quality and dead code detection

System Architecture

Classification Pipeline

The classification pipeline uses a dual-path architecture: a primary AI path via the Anthropic API and a deterministic local fallback that runs when the API is unavailable.
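The dual-path flow can be sketched as a small wrapper. This is a minimal sketch, not the project's actual code: `classifyWithFallback`, the `Classification` shape, and the `source` field are hypothetical, and the two injected functions stand in for the real API call and the local fallback.

```typescript
// Illustrative result shape; the real classification schema has more fields.
interface Classification {
  topic: string;
  compositeScore: number;
  source: "ai" | "local";
}

type ClassificationBody = Omit<Classification, "source">;
type AiClassifier = (topic: string) => Promise<ClassificationBody>;
type LocalClassifier = (topic: string) => ClassificationBody;

// Try the AI path first; on failure, use the deterministic fallback.
async function classifyWithFallback(
  topic: string,
  classifyWithAI: AiClassifier,
  classifyLocally: LocalClassifier,
): Promise<Classification> {
  try {
    const result = await classifyWithAI(topic);
    return { ...result, source: "ai" };
  } catch {
    // Assumption: any API error (network, rate limit, timeout) triggers the fallback.
    return { ...classifyLocally(topic), source: "local" };
  }
}
```

Injecting both classifiers keeps the wrapper testable without network access, which matches the goal of having a fallback that needs no external service.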

AI Path Details

The primary classification path calls the Anthropic API with:

Model: Sonnet 4.6 (claude-sonnet-4-6) by default, configurable via CLASSIFICATION_MODEL env var
Structured outputs: Uses output_config.format to guarantee valid JSON matching the classification schema
Web search: The model has access to a web_search tool (max 3 uses per request) to verify unfamiliar brands, products, or companies before classifying
Max tokens: 4,096 for classification responses

The chat conversational layer uses Haiku 4.5 (claude-haiku-4-5-20251001) for faster, cheaper responses.
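The classification settings listed above could be assembled into a request body roughly as follows. This is a hedged sketch: `buildClassificationRequest` is a hypothetical helper, the JSON schema fields and the prompt text are placeholders, and the exact shapes of `output_config.format` and the web search tool entry should be checked against the Anthropic API reference before use.

```typescript
// Hypothetical request body type covering only the fields discussed above.
interface ClassificationRequest {
  model: string;
  max_tokens: number;
  tools: Array<{ type: string; name: string; max_uses: number }>;
  output_config: { format: { type: "json_schema"; schema: Record<string, unknown> } };
  messages: Array<{ role: "user"; content: string }>;
}

const DEFAULT_CLASSIFICATION_MODEL = "claude-sonnet-4-6";

function buildClassificationRequest(
  topic: string,
  env: Record<string, string | undefined>,
): ClassificationRequest {
  return {
    // CLASSIFICATION_MODEL overrides the default model when set and non-empty.
    model: env["CLASSIFICATION_MODEL"] || DEFAULT_CLASSIFICATION_MODEL,
    max_tokens: 4096,
    // Assumed tool entry shape; max_uses caps web searches per request.
    tools: [{ type: "web_search_20250305", name: "web_search", max_uses: 3 }],
    output_config: {
      format: {
        type: "json_schema",
        // Placeholder schema; the real classification schema is not shown in this page.
        schema: {
          type: "object",
          properties: {
            intentType: { type: "string" },
            compositeScore: { type: "number" },
          },
          required: ["intentType", "compositeScore"],
          additionalProperties: false,
        },
      },
    },
    messages: [{ role: "user", content: `Classify this audience segment: ${topic}` }],
  };
}
```

Passing the environment as a parameter, rather than reading it globally, makes the default and override behavior easy to verify in isolation.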

Local Fallback

When the API is unavailable, buildLocalClassification() in local-fallback.ts runs deterministic regex-based classification. It scores the topic text against keyword patterns for each of the 7 layers and produces the same output shape as the AI path, but without web search verification.
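To illustrate the idea of regex scoring, the sketch below scores a topic against keyword patterns for a single layer. The intent names and patterns are placeholders invented for the example (the real engine defines 8 intent patterns that are not listed here), and `scoreIntents` is a hypothetical function name.

```typescript
// Placeholder patterns; not the engine's real pattern set.
const INTENT_PATTERNS: Record<string, RegExp[]> = {
  purchase: [/\bbuy(ing|ers?)?\b/i, /\bpric(e|ing)\b/i, /\bin[- ]market\b/i],
  research: [/\bcompar(e|ison)\b/i, /\breviews?\b/i, /\bbest\b/i],
};

interface IntentScore {
  intent: string;
  score: number;
}

// Count matching patterns per intent and sort highest first.
// Ties are broken by name so the result is deterministic.
function scoreIntents(topic: string): IntentScore[] {
  return Object.keys(INTENT_PATTERNS)
    .map((intent) => ({
      intent,
      score: INTENT_PATTERNS[intent].filter((p) => p.test(topic)).length,
    }))
    .sort((a, b) => b.score - a.score || a.intent.localeCompare(b.intent));
}
```

The first entry would play the role of the primary intent and the next the secondary intent. The patterns deliberately omit the `g` flag, because `RegExp.prototype.test` on a global regex keeps state between calls and would break determinism.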

7-Layer Classification Summary

Each classified topic passes through all seven layers to produce a rich, multi-dimensional profile:

Layer	Name	Output	Method
1	Intent Type	Primary + secondary intents	Regex scoring against 8 intent patterns
2	Intensity	Level (dormant to critical) + score	Keyword matching with weighted scoring
3	Awareness	Schwartz stage (unaware to retention)	Mapped from intent type + intensity
4	Segment	B2B/B2C/B2B2C/B2E/B2G	Taxonomy lookup, then keyword fallback
5	Sensitivity	Standard or Sensitive	Regulated category detection (cannabis, gambling, etc.)
6	Buyer Journey	Stage + funnel position	Composite of intent + intensity + awareness
7	Composite	0-100 score + interpretation	Weighted combination of all layers

The engine code lives in src/lib/classification/engine.ts. Each layer is a pure function with no side effects.
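As an illustration of a pure layer function, the sketch below computes a weighted 0-100 composite. The weights, the choice of four inputs, and the interpretation bands are assumptions made for the example; the engine's actual weighting is not documented on this page.

```typescript
// Each input is assumed to be normalized to the range 0..1.
interface LayerScores {
  intent: number;
  intensity: number;
  awareness: number;
  journey: number;
}

// Hypothetical weights that sum to 1.
const WEIGHTS: LayerScores = { intent: 0.3, intensity: 0.3, awareness: 0.2, journey: 0.2 };

// Pure function: same input always gives the same output, no side effects.
function compositeScore(scores: LayerScores): number {
  const raw =
    scores.intent * WEIGHTS.intent +
    scores.intensity * WEIGHTS.intensity +
    scores.awareness * WEIGHTS.awareness +
    scores.journey * WEIGHTS.journey;
  // Clamp before scaling so out-of-range inputs cannot escape 0..100.
  return Math.round(Math.min(1, Math.max(0, raw)) * 100);
}

// Hypothetical interpretation bands.
function interpret(score: number): "high" | "medium" | "low" {
  if (score >= 70) return "high";
  if (score >= 40) return "medium";
  return "low";
}
```

Keeping each layer pure means both the AI path's post-processing and the local fallback can reuse it, and unit tests need no mocks.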

Storage Layer

ITaxonomyStore Interface

All data access goes through the ITaxonomyStore interface (src/lib/storage/interface.ts). This async interface defines the contract for:

CRUD: getTopic, addTopic, updateTopic, deleteTopics
Queries: getAllTopics, listTopics (paginated + filtered + sorted)
Similarity: findSimilar, checkDuplicate, batchSimilarityCheck
Batch: addTopicsBatch, linkTopicsBatch, updateOrgTopicMetadataBatch
Import: createImportBatch, updateImportBatch, getImportBatch, listImportBatches
Stats: getStats, getTaxonomyPerformance
Global matching: resolveGlobalMatches (for import/sync dedup against global catalog)

Two implementations exist:

Implementation	Module	Use Case
`TaxonomyNeonStore`	`src/lib/storage/neon-store.ts`	Production -- Neon PostgreSQL
`TaxonomyMemoryStore`	`src/lib/storage/memory-store.ts`	Tests -- in-memory, no DB needed
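The interface-plus-two-implementations pattern can be sketched with a reduced contract. Everything here is hypothetical: `TopicStoreSketch` covers only three of the methods named above, the signatures are guesses, and the real interface also takes organization context that is omitted for brevity.

```typescript
interface Topic {
  id: string;
  name: string;
}

// Reduced, assumed slice of the storage contract.
interface TopicStoreSketch {
  getTopic(id: string): Promise<Topic | undefined>;
  addTopic(topic: Topic): Promise<void>;
  deleteTopics(ids: string[]): Promise<number>;
}

// In-memory implementation, in the spirit of a test-only store.
class MemoryStoreSketch implements TopicStoreSketch {
  private readonly topics = new Map<string, Topic>();

  async getTopic(id: string): Promise<Topic | undefined> {
    return this.topics.get(id);
  }

  async addTopic(topic: Topic): Promise<void> {
    this.topics.set(topic.id, topic);
  }

  // Returns how many of the requested ids actually existed.
  async deleteTopics(ids: string[]): Promise<number> {
    let removed = 0;
    for (const id of ids) {
      if (this.topics.delete(id)) removed += 1;
    }
    return removed;
  }
}
```

Because callers depend only on the interface, tests can swap in the in-memory store without touching a database, and the methods stay async so both implementations share one calling convention.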

Duplicate Detection

Topics are embedded as 256-dimensional hash vectors. Duplicate detection uses cosine similarity on these embeddings via pgvector's HNSW index:

95% similarity or higher: Blocks the insert (considered a duplicate)
75% similarity or higher, but below 95%: Warns the user (potential near-duplicate)
Brand alias dictionary: Deterministic matching for known brand variants (e.g., "CrowdStrike" vs "Crowdstrike")
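The threshold logic can be illustrated in plain TypeScript. This is a sketch under stated assumptions: `hashEmbed` is an invented trigram-hashing embedder (the project's real embedding function is not described here), the thresholds are treated as inclusive lower bounds, and in production the similarity would come from pgvector rather than application code.

```typescript
const DIMENSIONS = 256;
const BLOCK_THRESHOLD = 0.95;
const WARN_THRESHOLD = 0.75;

// Hypothetical embedder: hash each character trigram (FNV-1a) into one of 256 buckets.
function hashEmbed(text: string): number[] {
  const vec = new Array<number>(DIMENSIONS).fill(0);
  const normalized = text.toLowerCase().trim();
  for (let i = 0; i + 3 <= normalized.length; i++) {
    let h = 2166136261; // FNV-1a 32-bit offset basis
    for (let j = i; j < i + 3; j++) {
      h ^= normalized.charCodeAt(j);
      h = Math.imul(h, 16777619) >>> 0; // multiply by FNV prime, keep unsigned
    }
    vec[h % DIMENSIONS] += 1;
  }
  return vec;
}

function cosineSimilarity(a: number[], b: number[]): number {
  if (a.length !== b.length) throw new Error("dimension mismatch");
  let dot = 0;
  let normA = 0;
  let normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  // A zero vector has no direction, so treat it as dissimilar to everything.
  if (normA === 0 || normB === 0) return 0;
  return dot / Math.sqrt(normA * normB);
}

type DuplicateVerdict = "block" | "warn" | "ok";

function duplicateVerdict(similarity: number): DuplicateVerdict {
  if (similarity >= BLOCK_THRESHOLD) return "block";
  if (similarity >= WARN_THRESHOLD) return "warn";
  return "ok";
}
```

Note that pgvector's `<=>` operator returns cosine distance, so a SQL-side check would compare `1 - distance` against these thresholds.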

Two-Table Architecture

Topics are stored in a global-plus-org-link model:

topics: Global catalog of all classified segments (shared across organizations)
org_topics: Per-organization links to global topics, with optional field overrides and performance tracking
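One way to picture the global-plus-link model is a merge that applies an organization's overrides on top of the shared record. The field names (`name`, `segment`, `overrides`) and the function `resolveOrgTopic` are hypothetical; the real tables carry more columns, including performance tracking.

```typescript
// Row from the shared catalog (illustrative fields only).
interface GlobalTopic {
  id: string;
  name: string;
  segment: string;
}

type TopicOverrides = Partial<Omit<GlobalTopic, "id">>;

// Per-organization link with optional field overrides.
interface OrgTopicLink {
  orgId: string;
  topicId: string;
  overrides: TopicOverrides;
}

// Produce the topic as one organization sees it, without mutating the global row.
function resolveOrgTopic(global: GlobalTopic, link: OrgTopicLink): GlobalTopic {
  if (link.topicId !== global.id) {
    throw new Error("link does not reference this topic");
  }
  const merged: GlobalTopic = { ...global };
  for (const key of Object.keys(link.overrides) as Array<keyof TopicOverrides>) {
    const value = link.overrides[key];
    // Skip explicit undefined so an unset override never erases a global value.
    if (value !== undefined) merged[key] = value;
  }
  return merged;
}
```

Under this model, a correction to the global row reaches every organization that has not overridden that field.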

Field names differ between DB and TypeScript

The DB columns taxonomy_type and parent_category are swapped relative to their TypeScript names. See the Field Mapping page for the full explanation -- this is critical knowledge for any developer writing SQL.
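A row mapper is the usual place to contain a swap like this. The sketch below assumes the swap is a straight exchange and that the TypeScript fields are named `taxonomyType` and `parentCategory`; both are assumptions, so confirm the direction against the Field Mapping page before relying on it.

```typescript
// Row as stored in the database (snake_case columns).
interface TopicRow {
  id: string;
  taxonomy_type: string;
  parent_category: string;
}

// Hypothetical application-side shape.
interface TopicFields {
  id: string;
  taxonomyType: string;
  parentCategory: string;
}

// DB -> TS: each column feeds the TypeScript field with the *other* name.
function rowToTopic(row: TopicRow): TopicFields {
  return {
    id: row.id,
    taxonomyType: row.parent_category,
    parentCategory: row.taxonomy_type,
  };
}

// TS -> DB: the inverse mapping, so a round trip is lossless.
function topicToRow(topic: TopicFields): TopicRow {
  return {
    id: topic.id,
    taxonomy_type: topic.parentCategory,
    parent_category: topic.taxonomyType,
  };
}
```

Keeping both directions in one module means hand-written SQL is the only place where the swap can be forgotten.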

Authentication Architecture

AudienceGPT supports three authentication methods, each suited to different integration patterns:

1. Clerk Session (Browser)

The primary auth method for the web dashboard. Clerk manages user sessions, organization membership, and roles. The Next.js middleware (proxy.ts) validates sessions on every request.

2. API Key (`txadv_` prefix)

Self-managed API keys for server-to-server integration. Each organization can create up to 25 keys with granular scope permissions:

Scope	Access
`classify`	Classify topics via `/api/classify`
`topics:read`	Read topics, stats, duplicate checks
`topics:write`	Create/delete topics, run imports
`export`	Export topics as CSV or JSON
`sync`	Manage sync sources and run syncs
`activations`	Manage segment activations and push to DSPs
`mappings:read`	Read platform ID mappings
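A scope check against the table above might look like the following. The scope names come from the table; the `ApiKeyRecord` shape and the helper names are hypothetical, and how the real key store persists scopes is not described here.

```typescript
type ApiKeyScope =
  | "classify"
  | "topics:read"
  | "topics:write"
  | "export"
  | "sync"
  | "activations"
  | "mappings:read";

// Simplified, assumed record shape for a stored key.
interface ApiKeyRecord {
  orgId: string;
  scopes: ReadonlySet<ApiKeyScope>;
}

const MAX_KEYS_PER_ORG = 25;

// Returns the scopes a request needs but the key lacks (empty means allowed).
function missingScopes(key: ApiKeyRecord, required: ApiKeyScope[]): ApiKeyScope[] {
  return required.filter((scope) => !key.scopes.has(scope));
}

// An organization may create another key only while under the limit.
function canCreateKey(existingKeyCount: number): boolean {
  return existingKeyCount < MAX_KEYS_PER_ORG;
}
```

Returning the missing scopes, rather than a bare boolean, lets an endpoint report exactly which permission a caller needs to request.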

3. SDK Key (`pk_live_` / `pk_test_` prefix)

Publishable keys for client-side SDK integration (e.g., embedding the classify widget). These are safe to expose in frontend code and have a fixed scope set.
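Because each credential type carries a distinct prefix, a request handler can route on the prefix alone. This sketch is illustrative: `detectCredentialKind` and the `CredentialKind` labels are hypothetical, and it assumes anything without a known prefix is left to Clerk session handling.

```typescript
type CredentialKind = "api_key" | "sdk_key_live" | "sdk_key_test" | "unknown";

// Classify a presented token by its documented prefix.
function detectCredentialKind(token: string): CredentialKind {
  if (token.startsWith("txadv_")) return "api_key";
  if (token.startsWith("pk_live_")) return "sdk_key_live";
  if (token.startsWith("pk_test_")) return "sdk_key_test";
  // Assumption: unknown tokens are not rejected here; session auth decides.
  return "unknown";
}
```

Prefix routing only selects which verifier to run. The token itself still has to be validated against the corresponding key store.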

Multi-Tenant Scoping

Every database query is scoped by OrgContext, which contains the authenticated orgId and userId. The storage layer enforces tenant isolation at the query level, so a query made under one organization's context cannot return another organization's rows.
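Query-level scoping typically means the organization id is a bound parameter in every statement. The sketch below builds such a query; the column names `org_id` and `topic_id`, the `SqlQuery` shape, and `listOrgTopicsQuery` are assumptions for illustration rather than the project's actual SQL.

```typescript
interface OrgContext {
  orgId: string;
  userId: string;
}

// Parameterized query: SQL text plus positional values.
interface SqlQuery {
  text: string;
  values: unknown[];
}

// Build a tenant-scoped listing query; the org id is always the first bound value.
function listOrgTopicsQuery(ctx: OrgContext, limit: number): SqlQuery {
  if (!ctx.orgId) {
    // Refuse to build an unscoped query rather than silently listing everything.
    throw new Error("OrgContext.orgId is required");
  }
  return {
    text:
      "SELECT t.* FROM topics t " +
      "JOIN org_topics ot ON ot.topic_id = t.id " +
      "WHERE ot.org_id = $1 ORDER BY t.id LIMIT $2",
    values: [ctx.orgId, limit],
  };
}
```

Taking the whole `OrgContext` as a required argument, instead of an optional filter, makes it hard to write a store method that forgets the tenant clause.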

Engine Versioning

Every classified topic is stamped with the current ENGINE_VERSION (currently "2.5") from src/lib/constants/engine-version.ts.

When to bump: Any change to classification logic that would produce different output for the same input -- keyword patterns, layer functions, taxonomy definitions, system prompts, or local fallback logic.

Reclassification options:

Single topic: POST /api/topics/:id/reclassify with optional { llm: true } for AI-powered mode
Bulk: POST /api/topics/reclassify with { ids: [...], llm?: boolean } (max 500 local, 100 LLM)
Global script: bun run reclassify-global for all outdated topics in the global catalog

The library UI shows an Engine Version filter (Current/Outdated) and outdated segments display a reclassify banner.
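The version check and the bulk limits can be sketched as two small helpers. The constant value and the limits come from the text above; the function names, the error strings, and the rule that any version other than the current one counts as outdated are assumptions.

```typescript
const ENGINE_VERSION = "2.5";
const MAX_BULK_LOCAL = 500;
const MAX_BULK_LLM = 100;

interface BulkReclassifyBody {
  ids: string[];
  llm?: boolean;
}

// Assumption: a topic is outdated when its stamp differs from the current version,
// including topics that were never stamped.
function isOutdated(topicEngineVersion: string | undefined): boolean {
  return topicEngineVersion !== ENGINE_VERSION;
}

// Returns an error message, or null when the request is within limits.
function validateBulkReclassify(body: BulkReclassifyBody): string | null {
  const limit = body.llm ? MAX_BULK_LLM : MAX_BULK_LOCAL;
  if (body.ids.length === 0) return "ids must not be empty";
  if (body.ids.length > limit) {
    return `too many ids: ${body.ids.length} exceeds limit of ${limit}`;
  }
  return null;
}
```

The lower LLM limit reflects that each AI-powered reclassification is an API call, while local reclassification is pure computation.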

Key Module Map

Module Path	Responsibility
`src/lib/classification/engine.ts`	7-layer classification pure functions
`src/lib/classification/local-fallback.ts`	Deterministic regex fallback
`src/lib/classification/reclassify.ts`	Local reclassification helper
`src/lib/classification/reclassify-llm.ts`	LLM-powered reclassification
`src/config/models.ts`	Model IDs, max tokens, pricing
`src/lib/constants/engine-version.ts`	`ENGINE_VERSION` constant
`src/lib/constants/taxonomy-types.ts`	41 parent categories, 13 taxonomy types
`src/lib/naming/dsp-names.ts`	Trade Desk, LiveRamp, Internal names
`src/lib/naming/template-engine.ts`	Configurable output templates
`src/lib/signals/ucp.ts`	User Context Protocol generation
`src/lib/storage/interface.ts`	`ITaxonomyStore` contract
`src/lib/storage/neon-store.ts`	Neon PostgreSQL implementation
`src/lib/storage/neon-store-helpers.ts`	Row mapping (DB-to-TS swap boundary)
`src/lib/storage/memory-store.ts`	In-memory store for tests
`src/hooks/use-classification.ts`	UI state machine (ConvoStep)
`src/lib/api-client.ts`	Client-side API fetch wrapper
`src/app/api/classify/route.ts`	Server-side classification endpoint
`src/lib/auth/api-key-store.ts`	API key CRUD (`txadv_` prefix)
`src/lib/auth/sdk-key-store.ts`	SDK key CRUD (`pk_live_`/`pk_test_`)
`src/proxy.ts`	Clerk auth middleware

CI/CD

GitHub Actions workflows in .github/workflows/:

ci.yml -- Runs on push to main and all PRs: lint, typecheck, knip, test, build
migrate.yml -- Runs on push to main only: applies pending DB migrations via the DATABASE_URL secret

The CI pipeline mirrors the pre-commit hook checks, ensuring the same quality gates in both local and remote environments.

Next Steps

Field Mapping -- Understand the DB-to-TS column swap before writing any queries
Taxonomy Structure -- Explore the full hierarchy of 13 types, 41 categories, and the subcategory tree
API Reference: Authentication -- Detailed auth integration guide
