
Architecture Overview

AudienceGPT is a Next.js 16 App Router application that classifies advertising audience segments through a 7-layer intent taxonomy engine. It combines AI-powered classification (Anthropic Claude) with a deterministic local fallback, stores results in Neon PostgreSQL with pgvector embeddings, and provides multi-tenant access through Clerk organizations.

This page covers the tech stack, system architecture, classification pipeline, storage layer, authentication model, and engine versioning system.

Tech Stack

| Layer | Technology | Details |
|---|---|---|
| Runtime | Bun | JavaScript/TypeScript runtime and package manager |
| Framework | Next.js 16.1.6 | App Router with React Server Components |
| UI | React 19 + Tailwind CSS 4 | Component library with utility-first styling |
| Language | TypeScript (strict mode) | Full type coverage, no any escapes |
| Database | Neon PostgreSQL | Serverless Postgres with pgvector extension |
| Auth | Clerk | Multi-tenant orgs, session management, RBAC |
| AI | Anthropic Claude | Chat (Haiku 4.5), Classification (Sonnet 4.6) |
| Build | Turbopack | Bundled with Next.js 16 |
| Testing | Bun test + happy-dom | Built-in test runner, no Jest dependency |
| Linting | ESLint 9 + Knip | Code quality and dead code detection |

System Architecture

Classification Pipeline

The classification pipeline uses a dual-path architecture: a primary AI path via the Anthropic API, and a deterministic local fallback for when the API is unavailable.

AI Path Details

The primary classification path calls the Anthropic API with:

  • Model: Sonnet 4.6 (claude-sonnet-4-6) by default, configurable via CLASSIFICATION_MODEL env var
  • Structured outputs: Uses output_config.format to guarantee valid JSON matching the classification schema
  • Web search: The model has access to a web_search tool (max 3 uses per request) to verify unfamiliar brands, products, or companies before classifying
  • Max tokens: 4,096 for classification responses
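The sketch below shows how those parameters might be assembled into a Messages API request body. The helper name buildClassificationRequest, the prompt text, and the exact nesting under output_config.format are assumptions; web_search_20250305 is the type string Anthropic documents for its server-side web search tool, but it should be checked against the SDK version the project pins.

```typescript
// Sketch only: the helper name, prompt text, and the nested shape of
// output_config.format are assumptions. Model, max_tokens, and the
// web search cap come from the parameters listed above.

interface WebSearchTool {
  type: string;
  name: string;
  max_uses: number;
}

interface ClassificationRequest {
  model: string;
  max_tokens: number;
  tools: WebSearchTool[];
  output_config: { format: { type: "json_schema"; schema: Record<string, unknown> } };
  messages: { role: "user"; content: string }[];
}

const DEFAULT_CLASSIFICATION_MODEL = "claude-sonnet-4-6";
const CLASSIFICATION_MAX_TOKENS = 4096;
const WEB_SEARCH_MAX_USES = 3;

// Pass process.env in production; tests can pass a plain object.
function buildClassificationRequest(
  topic: string,
  schema: Record<string, unknown>,
  env: Record<string, string | undefined>,
): ClassificationRequest {
  return {
    // An unset or empty CLASSIFICATION_MODEL falls back to Sonnet 4.6
    model: env.CLASSIFICATION_MODEL || DEFAULT_CLASSIFICATION_MODEL,
    max_tokens: CLASSIFICATION_MAX_TOKENS,
    // Lets the model verify unfamiliar brands, capped at 3 searches per request
    tools: [{ type: "web_search_20250305", name: "web_search", max_uses: WEB_SEARCH_MAX_USES }],
    // Structured outputs: the response must match the classification JSON schema
    output_config: { format: { type: "json_schema", schema } },
    messages: [{ role: "user", content: `Classify this audience segment: ${topic}` }],
  };
}
```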

The chat conversational layer uses Haiku 4.5 (claude-haiku-4-5-20251001) for faster, cheaper responses.

Local Fallback

When the API is unavailable, buildLocalClassification() in local-fallback.ts runs deterministic regex-based classification. It produces an identical output shape but without web search verification. The fallback scores topic text against keyword patterns for each of the 7 layers.
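The dual-path control flow can be sketched as a try/catch around the AI call. The Classification shape and the classifier signatures are hypothetical stand-ins. Treating every thrown error as "API unavailable" is a simplification; the real pipeline may distinguish retryable failures from other errors.

```typescript
// Hypothetical output shape; the real classification has all 7 layers.
interface Classification {
  topic: string;
  primaryIntent: string;
  source: "ai" | "local";
}

type Classifier = (topic: string) => Promise<Classification>;

async function classifyWithFallback(
  topic: string,
  classifyWithAI: Classifier,
  buildLocalClassification: (topic: string) => Classification,
): Promise<Classification> {
  try {
    // Primary path: Anthropic API with web search verification
    return await classifyWithAI(topic);
  } catch {
    // Any failure (network error, 5xx, rate limit) drops to the deterministic
    // path: same output shape, but no web search verification.
    return buildLocalClassification(topic);
  }
}
```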

7-Layer Classification Summary

Each classified topic passes through all seven layers to produce a rich, multi-dimensional profile:

| Layer | Name | Output | Method |
|---|---|---|---|
| 1 | Intent Type | Primary + secondary intents | Regex scoring against 8 intent patterns |
| 2 | Intensity | Level (dormant to critical) + score | Keyword matching with weighted scoring |
| 3 | Awareness | Schwartz stage (unaware to retention) | Mapped from intent type + intensity |
| 4 | Segment | B2B/B2C/B2B2C/B2E/B2G | Taxonomy lookup, then keyword fallback |
| 5 | Sensitivity | Standard or Sensitive | Regulated category detection (cannabis, gambling, etc.) |
| 6 | Buyer Journey | Stage + funnel position | Composite of intent + intensity + awareness |
| 7 | Composite | 0-100 score + interpretation | Weighted combination of all layers |

The engine code lives in src/lib/classification/engine.ts. Each layer is a pure function with no side effects.
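To illustrate the pure-function-per-layer design, here is a reduced three-step version (intent, intensity, composite). The patterns, weights, and level thresholds are invented for this example and are not the values in engine.ts, which uses 8 intent patterns and all seven layers.

```typescript
// Illustrative only: patterns, weights, and thresholds are hypothetical.
type Intent = "purchase" | "comparison" | "support" | "research";
type Intensity = "dormant" | "low" | "moderate" | "high" | "critical";

interface Profile {
  intent: Intent;
  intensity: Intensity;
  intensityScore: number; // 0-100
  composite: number; // 0-100
}

// Checked in insertion order; the first match wins
const INTENT_PATTERNS: Record<Intent, RegExp> = {
  purchase: /\b(buy|pricing|purchase|quote)\b/i,
  comparison: /\b(vs|versus|compare|alternatives?)\b/i,
  support: /\b(troubleshoot|fix|help|support)\b/i,
  research: /\b(what is|guide|learn|how to)\b/i,
};

const URGENCY = /\b(now|today|urgent|asap|immediately|breach)\b/gi;
const INTENT_WEIGHT: Record<Intent, number> = { purchase: 100, comparison: 75, support: 50, research: 25 };

// Layer 1: intent type
function classifyIntent(text: string): Intent {
  for (const [intent, pattern] of Object.entries(INTENT_PATTERNS) as [Intent, RegExp][]) {
    if (pattern.test(text)) return intent;
  }
  return "research";
}

// Layer 2: intensity, 25 points per urgency keyword, capped at 100
function scoreIntensity(text: string): { level: Intensity; score: number } {
  const hits = text.match(URGENCY)?.length ?? 0;
  const score = Math.min(100, hits * 25);
  const level: Intensity =
    score >= 100 ? "critical" : score >= 75 ? "high" : score >= 50 ? "moderate" : score >= 25 ? "low" : "dormant";
  return { level, score };
}

// Final layer: weighted combination of the earlier outputs
function compositeScore(intent: Intent, intensityScore: number): number {
  return Math.round(0.5 * INTENT_WEIGHT[intent] + 0.5 * intensityScore);
}

function classify(text: string): Profile {
  const intent = classifyIntent(text);
  const { level, score } = scoreIntensity(text);
  return { intent, intensity: level, intensityScore: score, composite: compositeScore(intent, score) };
}
```

Because each step depends only on its input, every layer can be unit-tested in isolation, and the same text always yields the same profile.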

Storage Layer

ITaxonomyStore Interface

All data access goes through the ITaxonomyStore interface (src/lib/storage/interface.ts). This async interface defines the contract for:

  • CRUD: getTopic, addTopic, updateTopic, deleteTopics
  • Queries: getAllTopics, listTopics (paginated + filtered + sorted)
  • Similarity: findSimilar, checkDuplicate, batchSimilarityCheck
  • Batch: addTopicsBatch, linkTopicsBatch, updateOrgTopicMetadataBatch
  • Import: createImportBatch, updateImportBatch, getImportBatch, listImportBatches
  • Stats: getStats, getTaxonomyPerformance
  • Global matching: resolveGlobalMatches (for import/sync dedup against global catalog)

Two implementations exist:

| Implementation | Module | Use Case |
|---|---|---|
| TaxonomyNeonStore | src/lib/storage/neon-store.ts | Production -- Neon PostgreSQL |
| TaxonomyMemoryStore | src/lib/storage/memory-store.ts | Tests -- in-memory, no DB needed |
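A trimmed-down sketch of the contract and a test-style in-memory implementation follows. The method names come from the list above, but the signatures (an OrgContext argument on every call), the Topic shape, and the MiniMemoryStore class are hypothetical, and the sketch collapses the two-table model into one map per organization.

```typescript
interface OrgContext {
  orgId: string;
  userId: string;
}

// Hypothetical minimal topic shape
interface Topic {
  id: string;
  name: string;
  engineVersion: string;
}

// Three of the ITaxonomyStore methods, with assumed signatures
interface ITaxonomyStoreSubset {
  getTopic(ctx: OrgContext, id: string): Promise<Topic | null>;
  addTopic(ctx: OrgContext, topic: Topic): Promise<Topic>;
  deleteTopics(ctx: OrgContext, ids: string[]): Promise<number>;
}

class MiniMemoryStore implements ITaxonomyStoreSubset {
  // Topics keyed by orgId, then by topic id: each org only sees its own entries
  private readonly byOrg = new Map<string, Map<string, Topic>>();

  private org(ctx: OrgContext): Map<string, Topic> {
    let topics = this.byOrg.get(ctx.orgId);
    if (!topics) {
      topics = new Map<string, Topic>();
      this.byOrg.set(ctx.orgId, topics);
    }
    return topics;
  }

  async getTopic(ctx: OrgContext, id: string): Promise<Topic | null> {
    return this.org(ctx).get(id) ?? null;
  }

  async addTopic(ctx: OrgContext, topic: Topic): Promise<Topic> {
    this.org(ctx).set(topic.id, topic);
    return topic;
  }

  // Returns how many topics were actually removed for this org
  async deleteTopics(ctx: OrgContext, ids: string[]): Promise<number> {
    const topics = this.org(ctx);
    let deleted = 0;
    for (const id of ids) if (topics.delete(id)) deleted++;
    return deleted;
  }
}
```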

Duplicate Detection

Topics are embedded as 256-dimensional hash vectors. Duplicate detection uses cosine similarity on these embeddings via pgvector's HNSW index:

  • 95% similarity: Blocks the insert (considered a duplicate)
  • 75% similarity: Warns the user (potential near-duplicate)
  • Brand alias dictionary: Deterministic matching for known brand variants (e.g., "CrowdStrike" vs "Crowdstrike")
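The sketch below shows one way to build 256-dimensional hash vectors (feature hashing of lowercased tokens with FNV-1a), compare them by cosine similarity, and apply the 95% and 75% thresholds after a deterministic alias check. The tokenizer, hash function, and alias entries are assumptions, not the project's embedding code. In production the comparison runs inside Postgres: pgvector's <=> operator returns cosine distance, so similarity is 1 minus that value.

```typescript
const DIMS = 256;

// Hypothetical alias entries: variant (normalized) -> canonical brand
const BRAND_ALIASES = new Map<string, string>([["msft", "microsoft"]]);

// FNV-1a 32-bit hash
function fnv1a(s: string): number {
  let h = 0x811c9dc5;
  for (let i = 0; i < s.length; i++) {
    h ^= s.charCodeAt(i);
    h = Math.imul(h, 0x01000193);
  }
  return h >>> 0;
}

// Feature hashing: each token increments one of 256 buckets
function hashEmbed(text: string): Float64Array {
  const v = new Float64Array(DIMS);
  for (const token of text.toLowerCase().split(/[^a-z0-9]+/).filter(Boolean)) {
    v[fnv1a(token) % DIMS] += 1;
  }
  return v;
}

function cosine(a: Float64Array, b: Float64Array): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return na === 0 || nb === 0 ? 0 : dot / Math.sqrt(na * nb);
}

// Case- and punctuation-insensitive key, then alias lookup
function canonicalBrand(name: string): string {
  const key = name.toLowerCase().replace(/[^a-z0-9]/g, "");
  return BRAND_ALIASES.get(key) ?? key;
}

type Verdict = "block" | "warn" | "ok";

function checkDuplicate(candidate: string, existing: string): Verdict {
  if (canonicalBrand(candidate) === canonicalBrand(existing)) return "block";
  const sim = cosine(hashEmbed(candidate), hashEmbed(existing));
  if (sim >= 0.95) return "block"; // duplicate
  if (sim >= 0.75) return "warn"; // potential near-duplicate
  return "ok";
}
```

Because a bag-of-words hash vector ignores token order, "buyers of CRM software" and "CRM software buyers of" score 1.0 and are blocked.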

Two-Table Architecture

Topics are stored in a global-plus-org-link model:

  • topics: Global catalog of all classified segments (shared across organizations)
  • org_topics: Per-organization links to global topics, with optional field overrides and performance tracking
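Resolving an organization's view of a topic then amounts to overlaying its overrides on the global row. This is a hypothetical sketch: which fields are overridable, the field names, and the rule that a null or missing override means "inherit the global value" are all assumptions.

```typescript
// Hypothetical global topic shape
interface GlobalTopic {
  id: string;
  name: string;
  taxonomyType: string;
  parentCategory: string;
}

type OverridableField = "name" | "taxonomyType" | "parentCategory";
type OrgOverrides = Partial<Record<OverridableField, string | null>>;

const OVERRIDABLE_FIELDS: readonly OverridableField[] = ["name", "taxonomyType", "parentCategory"];

function resolveTopic(global: GlobalTopic, overrides: OrgOverrides = {}): GlobalTopic {
  // Copy so the shared global row is never mutated
  const resolved: GlobalTopic = { ...global };
  for (const field of OVERRIDABLE_FIELDS) {
    const value = overrides[field];
    // null or missing: no override, keep the global value
    if (value != null) resolved[field] = value;
  }
  return resolved;
}
```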

Field names differ between DB and TypeScript

The DB columns taxonomy_type and parent_category are swapped relative to their TypeScript names. See the Field Mapping page for the full explanation -- this is critical knowledge for any developer writing SQL.
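Assuming the TypeScript fields are called taxonomyType and parentCategory (hypothetical camelCase names) and that "swapped" means each column stores the other field's value, the boundary in neon-store-helpers.ts reduces to a pair of mappers like these. The Field Mapping page remains authoritative; the sketch only shows why every conversion should go through one place.

```typescript
// DB row as returned by the driver (only the relevant columns shown)
interface TopicRow {
  id: string;
  name: string;
  taxonomy_type: string;
  parent_category: string;
}

// TypeScript-side shape (hypothetical field names)
interface Topic {
  id: string;
  name: string;
  taxonomyType: string;
  parentCategory: string;
}

function rowToTopic(row: TopicRow): Topic {
  return {
    id: row.id,
    name: row.name,
    // Swap boundary: each column holds the other field's value
    taxonomyType: row.parent_category,
    parentCategory: row.taxonomy_type,
  };
}

function topicToRow(topic: Topic): TopicRow {
  return {
    id: topic.id,
    name: topic.name,
    // Inverse of rowToTopic, so a round trip is lossless
    taxonomy_type: topic.parentCategory,
    parent_category: topic.taxonomyType,
  };
}
```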

Authentication Architecture

AudienceGPT supports three authentication methods, each suited to different integration patterns:

1. Clerk Session (Browser)

The primary auth method for the web dashboard. Clerk manages user sessions, organization membership, and roles. The Next.js proxy (src/proxy.ts, the Next.js 16 replacement for middleware.ts) validates sessions on every request.

2. API Key (txadv_ prefix)

Self-managed API keys for server-to-server integration. Each organization can create up to 25 keys with granular scope permissions:

| Scope | Access |
|---|---|
| classify | Classify topics via /api/classify |
| topics:read | Read topics, stats, duplicate checks |
| topics:write | Create/delete topics, run imports |
| export | Export topics as CSV or JSON |
| sync | Manage sync sources and run syncs |
| activations | Manage segment activations and push to DSPs |
| mappings:read | Read platform ID mappings |
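A sketch of scope validation and the 25-key limit. The scope strings are the ones in the table; the function names are hypothetical, and the sketch assumes scopes are independent grants (topics:write does not imply topics:read), which the table does not state either way.

```typescript
const API_KEY_SCOPES = [
  "classify",
  "topics:read",
  "topics:write",
  "export",
  "sync",
  "activations",
  "mappings:read",
] as const;

type ApiKeyScope = (typeof API_KEY_SCOPES)[number];

const MAX_KEYS_PER_ORG = 25;

function canCreateKey(existingKeyCount: number): boolean {
  return existingKeyCount < MAX_KEYS_PER_ORG;
}

// Rejects unknown scopes and removes duplicates
function parseScopes(requested: string[]): ApiKeyScope[] {
  const known = new Set<string>(API_KEY_SCOPES);
  const unknown = requested.filter((s) => !known.has(s));
  if (unknown.length > 0) throw new Error(`Unknown scope(s): ${unknown.join(", ")}`);
  return Array.from(new Set(requested)) as ApiKeyScope[];
}

// Scopes are treated as independent grants (an assumption)
function hasScope(granted: readonly ApiKeyScope[], required: ApiKeyScope): boolean {
  return granted.includes(required);
}
```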

3. SDK Key (pk_live_ / pk_test_ prefix)

Publishable keys for client-side SDK integration (e.g., embedding the classify widget). These are safe to expose in frontend code and have a fixed scope set.
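Because the three credential types carry distinct prefixes, a request can be routed to the right verifier by inspecting the token. The Authorization: Bearer transport and the Credential type are assumptions; this only classifies the token, while actual verification would happen in api-key-store.ts or sdk-key-store.ts.

```typescript
type Credential =
  | { kind: "api_key"; token: string }
  | { kind: "sdk_key"; token: string; mode: "live" | "test" }
  | { kind: "session" }
  | { kind: "unrecognized" };

function classifyCredential(authorization: string | null): Credential {
  // No bearer token: defer to the Clerk browser session
  if (authorization === null || !authorization.startsWith("Bearer ")) return { kind: "session" };
  const token = authorization.slice("Bearer ".length).trim();
  if (token.startsWith("txadv_")) return { kind: "api_key", token };
  if (token.startsWith("pk_live_")) return { kind: "sdk_key", token, mode: "live" };
  if (token.startsWith("pk_test_")) return { kind: "sdk_key", token, mode: "test" };
  // Unknown prefix: left to the caller to reject
  return { kind: "unrecognized" };
}
```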

Multi-Tenant Scoping

Every database query is scoped by OrgContext, which contains the authenticated orgId and userId. There is no cross-organization data leakage -- the storage layer enforces tenant isolation at the query level.
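One way to make tenant isolation hold at the query level is to have every query builder require an OrgContext and bind orgId as a parameter rather than interpolating it into the SQL. The query, the column names org_id and topic_id, and the SqlQuery shape are hypothetical.

```typescript
interface OrgContext {
  orgId: string;
  userId: string;
}

// Parameterized query: text with $n placeholders plus bound values
interface SqlQuery {
  text: string;
  values: unknown[];
}

function listOrgTopicsQuery(ctx: OrgContext, limit: number, offset: number): SqlQuery {
  // Fail closed: an unscoped query is a bug, not an empty filter
  if (!ctx.orgId) throw new Error("OrgContext.orgId is required");
  return {
    // orgId is always bound as $1, never interpolated into the SQL text
    text:
      "SELECT t.id, t.name FROM org_topics ot " +
      "JOIN topics t ON t.id = ot.topic_id " +
      "WHERE ot.org_id = $1 ORDER BY t.name LIMIT $2 OFFSET $3",
    values: [ctx.orgId, limit, offset],
  };
}
```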

Engine Versioning

Every classified topic is stamped with the current ENGINE_VERSION (currently "2.5") from src/lib/constants/engine-version.ts.

When to bump: Any change to classification logic that would produce different output for the same input -- keyword patterns, layer functions, taxonomy definitions, system prompts, or local fallback logic.
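Stamping and the Current/Outdated check can be as simple as an equality test against the constant. The helper names are hypothetical; the sketch treats any non-matching value, including a missing stamp, as outdated rather than trying to order version strings.

```typescript
// Mirrors the constant in src/lib/constants/engine-version.ts
const ENGINE_VERSION = "2.5";

interface StampedTopic {
  id: string;
  engineVersion: string | null;
}

// Attach the current engine version to a fresh classification
function stamp<T extends object>(classification: T): T & { engineVersion: string } {
  return { ...classification, engineVersion: ENGINE_VERSION };
}

// Anything not stamped with the current version (including legacy nulls) is outdated
function isOutdated(topic: StampedTopic): boolean {
  return topic.engineVersion !== ENGINE_VERSION;
}
```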

Reclassification options:

  • Single topic: POST /api/topics/:id/reclassify with optional { llm: true } for AI-powered mode
  • Bulk: POST /api/topics/reclassify with { ids: [...], llm?: boolean } (max 500 topics per request in local mode, 100 in LLM mode)
  • Global script: bun run reclassify-global for all outdated topics in the global catalog
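A sketch of request validation for the bulk endpoint, enforcing the per-mode limits above. The function name and error messages are hypothetical; only the body shape and the 500/100 limits come from this page.

```typescript
interface BulkReclassifyBody {
  ids: string[];
  llm: boolean;
}

const MAX_BULK_LOCAL = 500;
const MAX_BULK_LLM = 100;

function validateBulkReclassify(body: unknown): BulkReclassifyBody {
  if (typeof body !== "object" || body === null) throw new Error("Body must be a JSON object");
  const { ids, llm } = body as { ids?: unknown; llm?: unknown };
  if (!Array.isArray(ids) || ids.length === 0 || !ids.every((id) => typeof id === "string")) {
    throw new Error("ids must be a non-empty array of strings");
  }
  if (llm !== undefined && typeof llm !== "boolean") throw new Error("llm must be a boolean");
  const useLlm = llm === true;
  // LLM mode is slower and costlier, so it gets the tighter cap
  const max = useLlm ? MAX_BULK_LLM : MAX_BULK_LOCAL;
  if (ids.length > max) {
    throw new Error(`At most ${max} ids per request in ${useLlm ? "LLM" : "local"} mode`);
  }
  return { ids: ids as string[], llm: useLlm };
}
```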

The library UI shows an Engine Version filter (Current/Outdated) and outdated segments display a reclassify banner.

Key Module Map

| Module Path | Responsibility |
|---|---|
| src/lib/classification/engine.ts | 7-layer classification pure functions |
| src/lib/classification/local-fallback.ts | Deterministic regex fallback |
| src/lib/classification/reclassify.ts | Local reclassification helper |
| src/lib/classification/reclassify-llm.ts | LLM-powered reclassification |
| src/config/models.ts | Model IDs, max tokens, pricing |
| src/lib/constants/engine-version.ts | ENGINE_VERSION constant |
| src/lib/constants/taxonomy-types.ts | 41 parent categories, 13 taxonomy types |
| src/lib/naming/dsp-names.ts | Trade Desk, LiveRamp, Internal names |
| src/lib/naming/template-engine.ts | Configurable output templates |
| src/lib/signals/ucp.ts | User Context Protocol generation |
| src/lib/storage/interface.ts | ITaxonomyStore contract |
| src/lib/storage/neon-store.ts | Neon PostgreSQL implementation |
| src/lib/storage/neon-store-helpers.ts | Row mapping (DB-to-TS swap boundary) |
| src/lib/storage/memory-store.ts | In-memory store for tests |
| src/hooks/use-classification.ts | UI state machine (ConvoStep) |
| src/lib/api-client.ts | Client-side API fetch wrapper |
| src/app/api/classify/route.ts | Server-side classification endpoint |
| src/lib/auth/api-key-store.ts | API key CRUD (txadv_ prefix) |
| src/lib/auth/sdk-key-store.ts | SDK key CRUD (pk_live_/pk_test_) |
| src/proxy.ts | Clerk auth middleware |

CI/CD

GitHub Actions workflows in .github/workflows/:

  • ci.yml -- Runs on push to main and all PRs: lint, typecheck, knip, test, build
  • migrate.yml -- Runs on push to main only: applies pending DB migrations via the DATABASE_URL secret

The CI pipeline mirrors the pre-commit hook checks, ensuring the same quality gates in both local and remote environments.
