
Topic Classification

Topic classification is the core of AudienceGPT. When you describe an audience segment in natural language, the platform runs it through a 7-layer classification engine that produces a structured taxonomy record. This record includes intent type, intensity, awareness stage, segment type, sensitivity flags, buyer journey position, and a composite score -- along with platform-ready segment names for major DSPs.
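The record described above can be pictured as a single typed object. The following is a minimal Python sketch, not AudienceGPT's actual schema. The class name `TaxonomyRecord` and every field name are hypothetical, chosen only to mirror the layers this guide lists.

```python
from dataclasses import dataclass, field
from typing import Optional


@dataclass
class TaxonomyRecord:
    """Illustrative shape of one classification result (hypothetical field names)."""

    topic_name: str
    primary_intent: str              # Layer 1, e.g. "brand"
    secondary_intent: Optional[str]  # Layer 1, optional runner-up intent
    intensity: int                   # Layer 2, one of 0/15/30/50/70/85/95
    awareness_stage: str             # Layer 3, e.g. "Consideration"
    segment_type: str                # Layer 4, e.g. "B2C"
    sensitive: bool                  # Layer 5, True for regulated categories
    buyer_journey: str               # Layer 6, e.g. "Active Evaluation"
    composite_score: int             # Layer 7, 0-100
    dsp_names: dict[str, str] = field(default_factory=dict)  # platform -> segment name

    def __post_init__(self) -> None:
        # Enforce the one range the guide states explicitly for the composite score.
        if not 0 <= self.composite_score <= 100:
            raise ValueError("composite_score must be between 0 and 100")
```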

This guide covers both classification modes, the full conversation flow, each of the 7 layers in detail, DSP segment naming, reclassification, and duplicate detection.

Classification Modes

AudienceGPT offers two classification modes. You can choose between them depending on your needs for accuracy, speed, and cost.

AI-Powered Classification

The primary classification mode uses Claude Sonnet 4.6 with structured outputs to guarantee valid JSON responses. This mode provides:

  • Web search verification -- The AI can perform up to 3 web searches per classification to verify brand identity, product details, and company information. This prevents misclassification of ambiguous topics (e.g., "Allbirds" correctly identified as a sustainable footwear brand rather than a bird-watching service).
  • Contextual understanding -- The AI interprets nuanced descriptions, industry jargon, and brand names that rule-based systems would miss.
  • Higher accuracy -- Especially for novel topics, niche brands, and ambiguous terms.

info

Web search is optional and can be disabled to reduce classification cost by approximately 50%. When disabled, the AI relies solely on its training data.

Rule-Based Classification

The local fallback mode uses deterministic regex-based classification that runs entirely on the client. This mode:

  • Produces results instantly with no API call
  • Costs nothing (no token usage)
  • Works offline or when the API is unavailable
  • Uses the same 7-layer output structure as AI classification

Rule-based classification is best for well-known categories where the topic name clearly indicates its taxonomy (e.g., "Toyota Camry" is unambiguously automotive). For ambiguous or novel topics, AI-powered mode is recommended.

tip

When importing large batches via CSV, you can choose rule-based mode to classify thousands of topics at no cost, then selectively reclassify ambiguous ones with AI afterward.

Conversation Flow

Classification follows a structured conversation flow with distinct phases. The chat interface guides you through each step.

Flow Phases

input → gathering → confirm → classifying → review → result

| Phase | What Happens |
|---|---|
| Input | You enter a topic name or description in the chat |
| Gathering | The AI asks follow-up questions to collect context (keywords, category hints, segment type) |
| Confirm | The AI presents a summary of the topic details for your approval |
| Classifying | The 7-layer engine processes the topic (a few seconds for AI mode, instant for rule-based) |
| Review | Classification results are displayed for your review |
| Result | You can add the topic to your library, adjust details, or start a new classification |

Gathering Phase Details

During the gathering phase, the AI may ask about:

  • Additional keywords -- Synonyms, related terms, or specific product/service names that strengthen the classification signal
  • Category context -- Industry vertical, whether the topic is B2B or B2C, the intended audience
  • Disambiguation -- If the topic name is ambiguous (e.g., "Mercury" could be automotive, technology, or financial services), the AI will ask for clarification

You can skip gathering by providing detailed context upfront. For example, instead of just typing "Salesforce", you could type "Salesforce CRM platform for enterprise sales teams" and the AI may proceed directly to confirmation.
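The phase sequence behaves like a small state machine. In this hypothetical Python sketch, `Phase`, `TRANSITIONS`, and `advance` are illustrative names, not part of AudienceGPT. Two of the edges are inferred from the prose rather than documented: input straight to confirm (skipping gathering when enough context is given upfront) and result back to input (starting a new classification).

```python
from enum import Enum


class Phase(Enum):
    INPUT = "input"
    GATHERING = "gathering"
    CONFIRM = "confirm"
    CLASSIFYING = "classifying"
    REVIEW = "review"
    RESULT = "result"


# Allowed transitions (assumed). GATHERING -> GATHERING covers several
# follow-up questions; INPUT -> CONFIRM models skipping gathering;
# RESULT -> INPUT models starting a new classification.
TRANSITIONS: dict[Phase, set[Phase]] = {
    Phase.INPUT: {Phase.GATHERING, Phase.CONFIRM},
    Phase.GATHERING: {Phase.GATHERING, Phase.CONFIRM},
    Phase.CONFIRM: {Phase.CLASSIFYING},
    Phase.CLASSIFYING: {Phase.REVIEW},
    Phase.REVIEW: {Phase.RESULT},
    Phase.RESULT: {Phase.INPUT},
}


def advance(current: Phase, nxt: Phase) -> Phase:
    """Move to the next phase, rejecting transitions the flow does not allow."""
    if nxt not in TRANSITIONS[current]:
        raise ValueError(f"cannot go from {current.value} to {nxt.value}")
    return nxt
```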

Web Search Verification

When AI-powered mode is active and web search is enabled, the AI can search the web to verify facts about your topic before classifying. This is particularly valuable for:

  • Brand identification -- Confirming what a company actually sells
  • Product categorization -- Distinguishing between similarly named products in different industries
  • Recency -- Catching recent pivots, acquisitions, or product launches

Web search results are used internally by the AI and do not appear directly in the chat. The classification result reflects the verified information. Citation markup from web search results is stripped before the response is returned.

The 7 Classification Layers

Each classified topic receives scores and labels across all 7 layers. Here is a detailed breakdown of each.

Layer 1: Intent Type

Identifies the nature of the audience's interest in the topic. The engine scores the topic name, category, and keywords against weighted regex patterns and returns a ranked list of intent types.

| Intent Type | Description | Example |
|---|---|---|
| Brand | Researching a specific brand or product line | "Nike", "Salesforce" |
| Product | Interest in tangible products or hardware/software | "iPhone 16", "Ring Doorbell" |
| Service | Professional services, agencies, or providers | "Tax Preparation", "HVAC Repair" |
| Solution | Business problems solved by an offering | "CRM Platform", "Supply Chain Optimization" |
| Function | Technical concepts, frameworks, or capabilities | "Machine Learning", "API Integration" |
| Symptom | Problem recognition and pain points | "Slow Website Performance", "High Employee Turnover" |
| Side Effect | Secondary consequences and risks | "Data Breach Impact", "Medication Side Effects" |
| Event | Conferences, summits, or flagship events | "AWS re:Invent", "CES 2026" |

Each topic receives a primary intent type (the strongest signal) and optionally a secondary intent type. Both are scored numerically.
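Weighted regex scoring of this kind can be sketched in a few lines. This is a minimal Python illustration, not the engine's real pattern table: the patterns, weights, and function names below are hypothetical and cover only three of the eight intent types.

```python
import re
from typing import Optional

# Hypothetical patterns and weights; the engine's real tables are not published.
INTENT_PATTERNS: dict[str, list[tuple[re.Pattern[str], float]]] = {
    "service": [(re.compile(r"\b(repair|preparation|agency|consulting)\b", re.I), 2.0)],
    "solution": [(re.compile(r"\b(platform|optimization|software)\b", re.I), 1.5)],
    "symptom": [(re.compile(r"\b(slow|turnover|outage|problem)\b", re.I), 1.5)],
}


def score_intents(name: str, category: str, keywords: list[str]) -> list[tuple[str, float]]:
    """Score every intent type against the combined text, strongest first."""
    text = " ".join([name, category, *keywords])
    scores: dict[str, float] = {}
    for intent, patterns in INTENT_PATTERNS.items():
        # Each regex hit contributes its weight, so repeated hits add up.
        scores[intent] = sum(weight * len(p.findall(text)) for p, weight in patterns)
    return sorted(scores.items(), key=lambda item: item[1], reverse=True)


def primary_and_secondary(
    ranked: list[tuple[str, float]],
) -> tuple[tuple[str, float], Optional[tuple[str, float]]]:
    """Strongest intent, plus the runner-up only if it scored above zero."""
    primary = ranked[0]
    secondary = ranked[1] if len(ranked) > 1 and ranked[1][1] > 0 else None
    return primary, secondary
```

Returning the full ranked list, rather than only the winner, keeps the secondary intent available without a second pass.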

Layer 2: Intensity

Measures how strong the behavioral signal is for this audience interest. Intensity is determined by keyword-based scoring against patterns associated with each level.

| Level | Score | Description |
|---|---|---|
| Dormant | 0 | No active signals detected |
| Passive | 15 | Background-level interest, minimal engagement |
| Curious | 30 | Light research, casual browsing |
| Active | 50 | Regular engagement with topic-related content |
| Engaged | 70 | Sustained, repeated interaction over time |
| Urgent | 85 | Time-sensitive need or strong buying signals |
| Critical | 95 | Immediate action required, highest priority |
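One way to read "keyword-based scoring against patterns associated with each level" is to check levels from strongest to weakest and stop at the first match. The scores below come from the table; the regexes and names are hypothetical placeholders, so treat this as a sketch rather than the engine's logic.

```python
import re
from typing import Optional

# Scores follow the intensity table; the regexes are hypothetical placeholders.
# Ordered strongest-first so the first matching level wins.
INTENSITY_LEVELS: list[tuple[str, int, Optional[re.Pattern[str]]]] = [
    ("Critical", 95, re.compile(r"\b(emergency|immediately)\b", re.I)),
    ("Urgent", 85, re.compile(r"\b(urgent|buy now|get a quote)\b", re.I)),
    ("Engaged", 70, re.compile(r"\b(subscription|membership|renewal)\b", re.I)),
    ("Active", 50, re.compile(r"\b(compare|reviews?|best)\b", re.I)),
    ("Curious", 30, re.compile(r"\b(what is|ideas|beginner)\b", re.I)),
    ("Passive", 15, re.compile(r"\b(news|trends)\b", re.I)),
    ("Dormant", 0, None),  # fallback: no active signal detected
]


def intensity(text: str) -> tuple[str, int]:
    """Return the label and score of the strongest level whose pattern matches."""
    for label, score, pattern in INTENSITY_LEVELS:
        if pattern is None or pattern.search(text):
            return label, score
    return "Dormant", 0  # unreachable while the table ends with a None pattern
```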

Layer 3: Awareness Stage

Maps the audience to one of 5 stages adapted from Eugene Schwartz's levels of awareness and aligned with the digital advertising funnel. This tells you where the audience is in their journey, from complete unawareness to post-purchase loyalty.

| Stage | Funnel Position | Description |
|---|---|---|
| Unaware | Pre-Funnel | The audience does not know they have a need |
| Awareness | Top of Funnel (TOFU) | They recognize the problem or category exists |
| Consideration | Middle of Funnel (MOFU) | Actively comparing options and solutions |
| Decision | Bottom of Funnel (BOFU) | Ready to choose, evaluating specific providers |
| Retention | Post-Purchase | Existing customers, loyalty and upsell audiences |

The awareness stage is derived from the intent type classification. For example, "symptom" intents typically map to the Awareness stage, while "brand" intents with purchase keywords map to Decision.
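A rough sketch of deriving the stage from the intent type follows. Only two rules come from this guide ("symptom" maps to Awareness; "brand" with purchase keywords maps to Decision). The purchase-word regex, the remaining defaults, and the function name are hypothetical.

```python
import re

# Hypothetical purchase-keyword pattern.
PURCHASE_WORDS = re.compile(r"\b(buy|price|pricing|deals?|coupons?)\b", re.I)

# Only "symptom" -> Awareness is documented; the other defaults are assumptions.
DEFAULT_STAGE: dict[str, str] = {
    "symptom": "Awareness",
    "side_effect": "Awareness",
    "function": "Awareness",
    "solution": "Consideration",
    "service": "Consideration",
    "product": "Consideration",
    "brand": "Consideration",
    "event": "Consideration",
}


def awareness_stage(intent: str, text: str) -> str:
    """Derive the awareness stage from the primary intent and the topic text."""
    # Documented rule: brand intent plus purchase keywords means Decision.
    if intent == "brand" and PURCHASE_WORDS.search(text):
        return "Decision"
    # Unknown intents fall back to the pre-funnel stage (an assumption).
    return DEFAULT_STAGE.get(intent, "Unaware")
```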

Layer 4: Segment Type

Determines the business model context of the audience. This affects how the topic is categorized and which DSP configurations are appropriate.

| Segment | Description |
|---|---|
| B2C | Business-to-Consumer -- targeting individual consumers |
| B2B | Business-to-Business -- targeting companies or professional buyers |
| B2B2C | Business-to-Business-to-Consumer -- intermediary model |
| B2E | Business-to-Employee -- targeting workforce audiences |
| B2G | Business-to-Government -- targeting government entities |

Segment type is determined by taxonomy lookup first (each of the 41 parent categories has a default segment type), then refined by keyword analysis if needed.

warning

Segment type determination is strict by design. If the system cannot confidently assign a segment type, it will flag the topic for administrator review rather than guess incorrectly.
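The lookup-then-refine order, together with the "flag instead of guess" rule, can be sketched as below. The category defaults, keyword refinements, and function name are hypothetical; the real defaults live in the platform's taxonomy data.

```python
import re
from typing import Optional

# Hypothetical defaults for three of the 41 parent categories.
CATEGORY_DEFAULT_SEGMENT: dict[str, str] = {
    "Auto": "B2C",
    "Business Technology": "B2B",
    "Government & Public Sector": "B2G",
}

# Hypothetical keyword refinements that can override the default.
REFINEMENTS: list[tuple[re.Pattern[str], str]] = [
    (re.compile(r"\b(enterprise|saas|wholesale)\b", re.I), "B2B"),
    (re.compile(r"\b(employee benefits|workforce)\b", re.I), "B2E"),
]


def segment_type(parent_category: str, text: str) -> tuple[Optional[str], bool]:
    """Return (segment type, needs_admin_review)."""
    segment = CATEGORY_DEFAULT_SEGMENT.get(parent_category)
    hits = {seg for pattern, seg in REFINEMENTS if pattern.search(text)}
    if len(hits) == 1:
        segment = hits.pop()  # a single keyword signal refines the default
    elif len(hits) > 1:
        return None, True  # conflicting signals: flag rather than guess
    # No default and no keyword signal: flag for review.
    return segment, segment is None
```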

Layer 5: Sensitivity

Flags whether the topic falls under regulated or sensitive categories that require special handling in advertising platforms. Sensitive topics are subject to additional compliance rules on most DSPs.

| Classification | Description | Examples |
|---|---|---|
| Standard | No special restrictions | "Toyota RAV4", "Kitchen Remodeling" |
| Sensitive | Regulated category, may have platform restrictions | Cannabis, Gambling, Alcohol, Pharmaceutical products |

Sensitivity is detected based on the parent category assignment. The following parent categories are automatically flagged as sensitive:

  • Cannabis -- Dispensaries, CBD, cultivation
  • Gambling & Casino -- Online betting, sportsbooks, casinos
  • Alcohol & Spirits -- Beer, wine, spirits (age-gated)
  • Health & Wellness -- Pharmaceutical products, medical devices (context-dependent)
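Category-based flagging reduces to a set lookup, with an extra check for the context-dependent Health & Wellness case. In this sketch the trigger regex and function name are hypothetical; the guide does not say which context makes a Health & Wellness topic sensitive.

```python
import re

ALWAYS_SENSITIVE = frozenset({"Cannabis", "Gambling & Casino", "Alcohol & Spirits"})

# Hypothetical trigger for the context-dependent Health & Wellness case.
HEALTH_TRIGGERS = re.compile(
    r"\b(pharmaceutical|prescription|medication|medical device)s?\b", re.I
)


def sensitivity(parent_category: str, text: str) -> str:
    """Return "Sensitive" or "Standard" based on parent category and context."""
    if parent_category in ALWAYS_SENSITIVE:
        return "Sensitive"
    if parent_category == "Health & Wellness" and HEALTH_TRIGGERS.search(text):
        return "Sensitive"
    return "Standard"
```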

Layer 6: Buyer Journey

Evaluates the purchase readiness of the audience, providing a more granular view than the awareness stage alone.

| Stage | Description | Score Range |
|---|---|---|
| Purchase Ready | Showing clear buying signals, ready to convert | 70--100 |
| Active Evaluation | Comparing specific products/vendors, requesting demos | 40--69 |
| Research Discovery | Early-stage research, gathering information | 0--39 |

Each buyer journey stage includes a funnel position label and a descriptive action statement (e.g., "Prioritize for retargeting" for Purchase Ready).
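The score bands translate directly into a lookup. This sketch uses the ranges from the table. Only the Purchase Ready action string is quoted from this guide; the other two action strings and the function name are hypothetical placeholders.

```python
# Bands from the buyer journey table, highest lower bound first.
# Only the Purchase Ready action is documented; the others are placeholders.
JOURNEY_STAGES: list[tuple[int, str, str]] = [
    (70, "Purchase Ready", "Prioritize for retargeting"),
    (40, "Active Evaluation", "Serve comparison content and demo offers"),
    (0, "Research Discovery", "Nurture with educational content"),
]


def buyer_journey(score: int) -> tuple[str, str]:
    """Map a 0-100 readiness score to (stage, action)."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    for lower_bound, stage, action in JOURNEY_STAGES:
        if score >= lower_bound:
            return stage, action
    raise AssertionError("unreachable: the last band starts at 0")
```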

Layer 7: Composite Score

A single 0--100 score that synthesizes signals from all other layers into one actionable number. The composite score drives the interpretation label:

| Score | Label | Recommended Action |
|---|---|---|
| 80--100 | Hot Lead | Prioritize for direct response and retargeting campaigns |
| 60--79 | Warm Prospect | Nurture with consideration-stage content and offers |
| 40--59 | Active Researcher | Engage with educational content and comparisons |
| 20--39 | Early Explorer | Build awareness with top-of-funnel content |
| 0--19 | Cold Audience | Long-term brand awareness, broad reach campaigns |
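The label bands above can be looked up with a binary search over their lower bounds. This is a small Python sketch of the interpretation step only; how the composite score itself is computed from the other layers is not documented here, and the names `BANDS` and `interpret` are illustrative.

```python
from bisect import bisect_right

# Lower bound of each band (ascending) with its label, from the table above.
BANDS: list[tuple[int, str]] = [
    (0, "Cold Audience"),
    (20, "Early Explorer"),
    (40, "Active Researcher"),
    (60, "Warm Prospect"),
    (80, "Hot Lead"),
]
_LOWER_BOUNDS = [lower for lower, _ in BANDS]


def interpret(score: int) -> str:
    """Return the interpretation label for a 0-100 composite score."""
    if not 0 <= score <= 100:
        raise ValueError("score must be between 0 and 100")
    # bisect_right finds the last band whose lower bound is <= score.
    return BANDS[bisect_right(_LOWER_BOUNDS, score) - 1][1]
```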

DSP Segment Names

For each classified topic, AudienceGPT generates platform-ready segment names formatted for major DSPs and data platforms. These names follow the hierarchical naming conventions each platform requires.

Platform Formats

AudienceGPT generates names for three built-in platform formats, plus any custom output templates configured by your administrator:

| Platform | Format Pattern | Character Limit |
|---|---|---|
| Trade Desk (Koa) | Taxonomy Type > Parent Category > Subcategory > Topic Name | Description: 256 chars |
| LiveRamp | Taxonomy Type > Parent Category > Subcategory > Topic Name | Description: 256 chars |
| Internal | Taxonomy Type > Parent Category > Subcategory > Topic Name | No limit |

Each platform name is generated from the same classification data but formatted according to that platform's conventions. The names include the full taxonomy path and may include additional context like segment type, intensity, or user behavior labels.
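Building the shared hierarchical path and respecting a per-platform description limit can be sketched as follows. The platform keys, function names, and the choice to truncate with an ellipsis are all assumptions; the guide does not say how text over the 256-character limit is handled.

```python
from typing import Optional

# Description limits from the platform table; keys are hypothetical identifiers.
DESCRIPTION_LIMITS: dict[str, Optional[int]] = {
    "trade_desk": 256,
    "liveramp": 256,
    "internal": None,  # no limit
}


def segment_path(*levels: str) -> str:
    """Join taxonomy levels with " > ", skipping empty levels."""
    return " > ".join(level.strip() for level in levels if level.strip())


def fit_description(text: str, platform: str) -> str:
    """Truncate a description to the platform limit (ellipsis is an assumption)."""
    limit = DESCRIPTION_LIMITS[platform]
    if limit is None or len(text) <= limit:
        return text
    return text[: limit - 3] + "..."
```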

DSP Names Tab

In the Library's topic detail panel, the DSP Names tab shows all generated platform names for a topic. These are the exact strings that will be used when activating segments through platform connections.

tip

If your organization uses custom output templates (configured by an administrator), additional platform name formats will appear alongside the built-in ones. Output templates support {{field}} placeholders for dynamic content.
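Placeholder substitution of the `{{field}}` kind can be sketched with a single regex. In this Python example the function name and the field names used in the test are hypothetical, and raising on an unknown field is an assumption rather than documented behavior.

```python
import re

# Matches {{field}} and tolerates inner spaces such as {{ field }}.
PLACEHOLDER = re.compile(r"\{\{\s*(\w+)\s*\}\}")


def render_template(template: str, record: dict[str, object]) -> str:
    """Replace each {{field}} with the matching value from the record."""

    def substitute(match: re.Match[str]) -> str:
        key = match.group(1)
        if key not in record:
            # Assumption: unknown fields are an error rather than left blank.
            raise KeyError(f"template references unknown field: {key}")
        return str(record[key])

    return PLACEHOLDER.sub(substitute, template)
```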

Taxonomy Hierarchy

Every classified topic is placed within a 4-level taxonomy hierarchy:

Taxonomy Type (13 groups)
└── Parent Category (41 types)
    └── Subcategory (tree nodes with L0/L1/L2 levels)
        └── Topic

The 13 Taxonomy Types

| Taxonomy Type | Parent Categories |
|---|---|
| Automotive & Vehicles | Auto, Recreational Vehicles |
| Home & Property | Real Estate, Home & Garden / Home Improvement, Home Services |
| Financial & Legal | Financial Services, Insurance, Legal Services |
| Technology & Telecom | Business Technology, Technographics, Telecommunications, Consumer Electronics, Consumer Technology |
| Consumer Goods & Retail | Consumer Goods, Food & Beverage, Apparel & Accessories, Beauty & Personal Care, Babies & Children, Pets & Animals |
| Health | Health & Wellness |
| Education | Education |
| Travel & Hospitality | Travel & Leisure |
| Entertainment & Media | Entertainment, Sports & Fitness, Video Gaming, Gambling & Casino, News & Media |
| Lifestyle & Special Interest | Alcohol & Spirits, Cannabis, Gifting & Occasions, Sustainability & Green Living, Luxury & Premium |
| Civic & Cause | Charities & Nonprofits, Politics, Spiritual & Religion |
| B2B & Industrial | Business & Professional Services, Agriculture & Farming, Transportation & Logistics, Energy & Utilities, Government & Public Sector |
| Cross-Cutting | Life-Stage (Inferred) |

Each of the 41 parent categories has associated metadata including an IAB content taxonomy code, audience type, domain signals, and example topics that aid classification.

Reclassifying Topics

Topics can be reclassified when the classification engine is updated or when you want to re-evaluate a topic with a different mode.

When to Reclassify

  • Engine version update -- When AudienceGPT releases an engine update that changes classification logic, existing topics are marked as "outdated." You can reclassify them to get results from the current engine.
  • Mode switch -- A topic originally classified with rule-based mode can be reclassified with AI-powered mode for potentially better accuracy.
  • Context changes -- If a brand pivots its business model or a product category evolves, reclassification captures the updated reality.

How to Reclassify

Single topic: Open the topic in the Library detail panel. If the engine version is outdated, a reclassify banner appears. Click "Reclassify" and choose your mode:

  • AI-Powered -- Uses Claude Sonnet 4.6 with optional web search. Higher accuracy, consumes API credits.
  • Rule-Based -- Instant local classification. No cost, but may be less accurate for ambiguous topics.

Bulk reclassify: Select multiple topics in the Library table using checkboxes, then click "Reclassify Selected" in the bulk action bar. Choose your mode in the modal. Limits:

  • Rule-based: up to 500 topics per batch
  • AI-powered: up to 100 topics per batch

info

AI-powered reclassification is quota-checked and usage-tracked. The reclassify modal shows an estimated cost before you confirm.
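If you prepare a large selection outside the Library (for example, a list of topic IDs exported from a filter), splitting it to the per-batch limits is a simple slicing loop. This is a hypothetical helper; the mode keys and function name are illustrative, and the guide does not describe a programmatic reclassify API.

```python
from typing import Iterator

# Per-batch limits from the bulk reclassify section; keys are hypothetical.
BATCH_LIMITS: dict[str, int] = {"rule_based": 500, "ai": 100}


def batches(topic_ids: list[str], mode: str) -> Iterator[list[str]]:
    """Yield consecutive chunks no larger than the limit for the chosen mode."""
    size = BATCH_LIMITS[mode]
    for start in range(0, len(topic_ids), size):
        yield topic_ids[start:start + size]
```

For example, 250 selected topics fit in one rule-based batch but need three AI-powered batches (100, 100, 50).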

Engine Versioning

Every topic is stamped with the engine version at classification time. When classification logic changes (keyword patterns, layer functions, taxonomy definitions, or the AI prompt), the engine version is incremented. Topics classified with older versions are flagged as "outdated" in the Library.

You can filter the Library by engine version status (Current or Outdated) to quickly find topics that need reclassification.
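The Current/Outdated status boils down to comparing each topic's stamped version with the engine's current version. This sketch assumes integer version numbers and uses a hypothetical current value and hypothetical names (`Topic`, `version_status`, `filter_by_status`).

```python
from dataclasses import dataclass

# Hypothetical: engine versions modeled as increasing integers.
CURRENT_ENGINE_VERSION = 7


@dataclass
class Topic:
    name: str
    engine_version: int  # version stamped at classification time


def version_status(topic: Topic, current: int = CURRENT_ENGINE_VERSION) -> str:
    """Return "Current" or "Outdated" for a single topic."""
    return "Current" if topic.engine_version >= current else "Outdated"


def filter_by_status(
    topics: list[Topic], status: str, current: int = CURRENT_ENGINE_VERSION
) -> list[Topic]:
    """Mimic the Library filter: keep topics whose status matches."""
    return [t for t in topics if version_status(t, current) == status]
```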

Duplicate Detection

AudienceGPT uses a dual-layer duplicate detection system to prevent redundant topics in your library:

  1. Semantic similarity -- Each topic is converted to a 256-dimensional embedding vector. Topics with a cosine similarity above 95% are blocked as duplicates. Topics between 75% and 95% similarity trigger a warning with the option to proceed.

  2. Brand alias matching -- A dictionary of known brand aliases catches deterministic duplicates that embeddings might miss (e.g., "Chevy" and "Chevrolet").

When a duplicate is detected during classification, you will see a warning with the matching topic's name and similarity score. Matches above the 95% threshold are blocked; for matches in the 75% to 95% warning range, you can choose to proceed (creating a near-duplicate) or cancel and use the existing topic instead.
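The two detection layers and the two thresholds can be combined in one check. This Python sketch works on vectors of any length (the product uses 256 dimensions) and compares a candidate against a single existing topic. The alias table is hypothetical, and treating an alias match as blocking, like counting exactly 75% and 95% as "warn", is an assumption.

```python
import math

BLOCK_ABOVE = 0.95  # cosine similarity above this is blocked
WARN_FROM = 0.75    # 0.75 up to 0.95 triggers a warning

# Hypothetical alias table; the product maintains its own dictionary.
BRAND_ALIASES: dict[str, str] = {"chevy": "chevrolet"}


def canonical(name: str) -> str:
    """Lower-case a name and resolve it through the alias table."""
    key = name.strip().lower()
    return BRAND_ALIASES.get(key, key)


def cosine_similarity(a: list[float], b: list[float]) -> float:
    """Cosine of the angle between two vectors (0.0 if either is all zeros)."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    if norm_a == 0 or norm_b == 0:
        return 0.0
    return dot / (norm_a * norm_b)


def check_duplicate(
    name: str, vector: list[float], existing_name: str, existing_vector: list[float]
) -> str:
    """Return "block", "warn", or "ok" for a candidate against one existing topic."""
    # Alias hits are deterministic, so this sketch treats them as blocking.
    if canonical(name) == canonical(existing_name):
        return "block"
    similarity = cosine_similarity(vector, existing_vector)
    if similarity > BLOCK_ABOVE:
        return "block"
    if similarity >= WARN_FROM:
        return "warn"
    return "ok"
```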

tip

Duplicate detection also runs during CSV imports and sync operations, automatically enriching existing topics with new metadata rather than creating redundant entries.

Next Steps