Data Hygiene

The Data Hygiene dashboard at /hygiene gives you a centralized view of your organization's data quality across three dimensions: outdated topics that need reclassification, activated segments that have gone stale, and topics missing external source identifiers. Keeping your data clean ensures that the segments you push to DSPs reflect the latest classification logic, naming conventions, and taxonomy structure.

Dashboard Overview

The hygiene dashboard displays three summary cards at the top of the page, each representing a category of data quality issue:

Card	What It Tracks	Why It Matters
Outdated Topics	Topics classified with an older engine version	Classification accuracy improves with each engine update; outdated topics may have suboptimal taxonomy placement
Stale Activations	Active segments where the pushed data no longer matches current topic data	DSPs are receiving outdated segment names, descriptions, or classifications
Missing Source IDs	Topics without an `external_id` value	Cannot trace the topic back to its original data source; complicates deduplication and sync tracking

Each card shows a count and a visual indicator of severity. Clicking a card expands a detailed table of the affected items with available remediation actions.

Outdated Topics

Every topic in AudienceGPT is stamped with an `engine_version` value at the time of classification. This version corresponds to the classification engine that produced the topic's taxonomy placement, intent scoring, and segment naming. When the classification engine is updated (for example, to improve keyword matching, add new taxonomy types, or refine scoring logic), the `ENGINE_VERSION` constant is incremented, and every topic still carrying an earlier version becomes "outdated."

How Outdated Topics Arise

Engine updates: The most common cause. When the AudienceGPT team releases a classification engine update, the internal version number changes. Topics classified before the update carry the old version.
Taxonomy structure changes: If new parent categories, subcategories, or taxonomy types are added or reorganized, existing topics may benefit from reclassification under the new structure.
Scoring improvements: Refinements to intent type scoring, intensity levels, or buyer journey detection may produce more accurate results for existing topics.

Identifying Outdated Topics

On the hygiene dashboard, the Outdated Topics card shows the total count of topics in your library that were classified with an engine version older than the current version. Expanding the card reveals a table with:

Topic Name -- the audience segment name
Current Engine Version -- the version stamped on the topic
Latest Engine Version -- the current system version
Parent Category -- the topic's taxonomy placement
Classified At -- when the topic was last classified

Info: The engine version comparison is an exact string match. A topic is outdated if its `engine_version` does not equal the current `ENGINE_VERSION` value. There is no concept of partial compatibility between versions.
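
The exact-match rule can be illustrated with a minimal sketch. The `Topic` class, the helper name, and the version strings are hypothetical and chosen only for illustration; only the `engine_version` field and the `ENGINE_VERSION` constant come from the description above.

```python
from dataclasses import dataclass

# Hypothetical value for illustration; the real constant is set by the engine.
ENGINE_VERSION = "2.4.0"


@dataclass
class Topic:
    name: str
    engine_version: str


def is_outdated(topic: Topic, current_version: str = ENGINE_VERSION) -> bool:
    """Outdated means the stamped version is not string-equal to the current one."""
    return topic.engine_version != current_version


topics = [
    Topic("Luxury SUV Shoppers", "2.4.0"),
    Topic("Home Fitness Buyers", "2.3.9"),
    # "2.4" is a different string from "2.4.0", so this topic counts as outdated.
    Topic("Budget Travelers", "2.4"),
]

outdated = [t.name for t in topics if is_outdated(t)]
print(outdated)
```

Because the comparison is on strings rather than parsed versions, a value that looks equivalent but is written differently is still treated as outdated.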

Remediation: Reclassifying Outdated Topics

You have two options for bringing outdated topics up to date:

Individual Reclassification

1. Click on an outdated topic to open its detail panel
2. Click the Reclassify button in the banner that appears for outdated topics
3. Choose your classification mode:
   - AI-Powered -- uses the Anthropic API with web search for maximum accuracy (costs apply)
   - Rule-Based -- uses deterministic local classification at no additional cost
4. The topic is reclassified and stamped with the current engine version

Bulk Reclassification

1. From your library page, use the Engine Version filter to show only "Outdated" topics
2. Select the topics you want to reclassify using the checkboxes
3. Click Reclassify Selected in the bulk action bar
4. Choose AI-Powered or Rule-Based mode in the modal
5. The system processes topics sequentially, up to 500 topics per batch for rule-based mode and 100 per batch for AI-powered mode
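
To plan a large reclassification, it can help to see how a selection maps onto the per-batch limits. The sketch below is an assumption-laden illustration: the mode keys and the `plan_batches` helper are hypothetical, and it assumes a selection larger than the limit is simply split into consecutive batches, which the product may handle differently.

```python
from typing import Iterator, Sequence

# Per-batch limits from the step above; the mode keys are illustrative names.
BATCH_LIMITS = {"rule_based": 500, "ai_powered": 100}


def plan_batches(topic_ids: Sequence[str], mode: str) -> Iterator[Sequence[str]]:
    """Yield consecutive slices of topic_ids, each no larger than the mode's limit."""
    limit = BATCH_LIMITS[mode]
    for start in range(0, len(topic_ids), limit):
        yield topic_ids[start:start + limit]


selected = [f"topic-{i}" for i in range(1200)]
rule_sizes = [len(batch) for batch in plan_batches(selected, "rule_based")]
ai_sizes = [len(batch) for batch in plan_batches(selected, "ai_powered")]
print(rule_sizes, len(ai_sizes))
```

Under these assumptions, 1,200 selected topics would need three rule-based batches or twelve AI-powered batches.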

Tip: Rule-based reclassification is free and fast, making it suitable for large batches. Use AI-powered reclassification for high-value topics where accuracy is critical, such as topics that drive active DSP campaigns.

Stale Activations

A stale activation is an active segment on an external DSP platform where the data pushed during activation no longer matches the current state of the topic in AudienceGPT. Staleness is detected by comparing two attributes:

Pushed Name vs. Current Name -- the segment name that was sent to the DSP at activation time versus the name the topic currently produces (based on the connection's output template)
Engine Version at Push vs. Current Engine Version -- the engine version when the segment was pushed versus the topic's current engine version

If either value has changed, the activation is flagged as stale.
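
The two-attribute check can be sketched as follows. The class and field names are hypothetical, as are the segment names and version strings; the sketch only encodes the rule stated above, that a difference in either the name or the engine version marks the activation as stale.

```python
from dataclasses import dataclass
from typing import List


@dataclass
class Activation:
    pushed_name: str            # segment name sent to the DSP at activation time
    pushed_engine_version: str  # engine version recorded at push time


@dataclass
class TopicState:
    current_name: str    # name the connection's output template produces now
    engine_version: str  # engine version currently stamped on the topic


def stale_reasons(activation: Activation, topic: TopicState) -> List[str]:
    """Return which of the two compared attributes differ (empty list means fresh)."""
    reasons = []
    if activation.pushed_name != topic.current_name:
        reasons.append("name")
    if activation.pushed_engine_version != topic.engine_version:
        reasons.append("engine_version")
    return reasons


def is_stale(activation: Activation, topic: TopicState) -> bool:
    return bool(stale_reasons(activation, topic))


topic = TopicState("Auto > SUV > Luxury SUV Shoppers", "2.4.0")
fresh = Activation("Auto > SUV > Luxury SUV Shoppers", "2.4.0")
renamed = Activation("Auto > Luxury SUV Shoppers", "2.4.0")
old_engine = Activation("Auto > SUV > Luxury SUV Shoppers", "2.3.9")
print(is_stale(fresh, topic), stale_reasons(renamed, topic), stale_reasons(old_engine, topic))
```

Returning the list of reasons rather than a bare boolean mirrors the dashboard table, which shows the name difference and the engine version mismatch as separate columns.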

Viewing Stale Activations

The Stale Activations card on the hygiene dashboard shows the total count across all your outbound connections. Expanding reveals:

Topic Name -- the audience segment
Platform -- the DSP where the segment is active (LiveRamp, Trade Desk, Index Exchange, Custom API)
Connection -- the specific outbound connection
Pushed Name -- what the DSP currently has
Current Name -- what the topic would generate now
Engine Version Mismatch -- whether the engine version has changed since the push

Refreshing Individual Stale Activations

To refresh a single stale activation:

1. Click on the stale activation row in the hygiene dashboard or navigate to the activation in the Destinations dashboard
2. Click Refresh on the activation detail
3. AudienceGPT pushes the updated segment name and description to the DSP platform
4. The activation record updates with the new pushed name and current engine version

Batch Refresh: Refresh All Stale

For organizations with many stale activations, the hygiene dashboard provides a Refresh All Stale button that triggers a background job to refresh every stale activation across all your outbound connections.

1. Click Refresh All Stale on the Stale Activations card
2. A background job is created that processes stale activations in batches of 25
3. Progress is displayed in real time -- you can see how many have been refreshed and any errors
4. You can cancel the job if needed; already-refreshed activations retain their updates
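
The job behavior described above (batches of 25, running progress, cancellation that keeps completed work) can be sketched roughly as below. This is not the product's implementation: `refresh_all_stale`, `RefreshProgress`, and the `refresh_one` callback are hypothetical names, and checking for cancellation between batches is an assumption.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, List, Sequence

BATCH_SIZE = 25  # batch size stated in the steps above


@dataclass
class RefreshProgress:
    refreshed: int = 0
    errors: List[str] = field(default_factory=list)
    cancelled: bool = False


def refresh_all_stale(
    activation_ids: Sequence[str],
    refresh_one: Callable[[str], None],
    cancel_event: threading.Event,
) -> RefreshProgress:
    """Refresh activations batch by batch, recording errors and honoring cancellation."""
    progress = RefreshProgress()
    for start in range(0, len(activation_ids), BATCH_SIZE):
        # Assumption: cancellation is checked before each batch starts.
        if cancel_event.is_set():
            progress.cancelled = True
            break
        for activation_id in activation_ids[start:start + BATCH_SIZE]:
            try:
                refresh_one(activation_id)
                progress.refreshed += 1
            except Exception as exc:  # one failure should not stop the job
                progress.errors.append(f"{activation_id}: {exc}")
    return progress


cancel = threading.Event()


def fake_refresh(activation_id: str) -> None:
    """Stand-in for the DSP update call: one failure, then a cancel request."""
    if activation_id == "act-7":
        raise RuntimeError("platform rejected update")
    if activation_id == "act-49":
        cancel.set()  # simulate the user cancelling near the end of batch two


ids = [f"act-{i}" for i in range(60)]
result = refresh_all_stale(ids, fake_refresh, cancel)
print(result.refreshed, result.errors, result.cancelled)
```

In this run the first two batches complete, the third never starts, and the activations refreshed before cancellation keep their updates.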

Warning: Refreshing activations makes live changes to your segments on external DSP platforms. Each refresh calls the platform's update API, so the DSP receives the new segment name and description. Verify that your output templates produce the desired naming before running a batch refresh.

Missing Source IDs

Topics that lack an `external_id` cannot be traced back to their original data source. This typically affects topics that were:

Classified individually through the chatbot (no external source)
Imported from CSV files that did not include a source ID column
Created via matrix generation (combinatorial topics have no external origin)

The Missing Source IDs card shows the count of topics in your library without an external identifier. While this is informational rather than critical, having source IDs improves:

Deduplication accuracy -- topics with matching external IDs are recognized as the same item during sync, even if names differ slightly
Sync reconciliation -- incremental syncs use the source ID to determine which topics are new vs. already imported
Audit trail -- tracing a topic back to the original record in the source system
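
The deduplication and reconciliation benefits can be illustrated with a small sketch of ID-based matching. The record shape, the `reconcile` helper, and the sample IDs are hypothetical; the sketch matches on `external_id` only and ignores any name-based matching the real sync may also perform.

```python
from typing import Dict, List, Optional, Sequence, Tuple

Record = Dict[str, Optional[str]]


def reconcile(
    existing: Sequence[Record], incoming: Sequence[Record]
) -> Tuple[List[Record], List[Record]]:
    """Split incoming records into (already imported, new) by external_id."""
    # Library topics without an external_id cannot take part in ID matching.
    known_ids = {t["external_id"] for t in existing if t.get("external_id")}
    matched: List[Record] = []
    new: List[Record] = []
    for record in incoming:
        (matched if record["external_id"] in known_ids else new).append(record)
    return matched, new


library = [
    {"name": "Luxury SUV Shoppers", "external_id": "src-101"},
    {"name": "Budget Travelers", "external_id": None},  # missing source ID
]
incoming = [
    {"name": "Luxury SUV Shopper", "external_id": "src-101"},  # name differs slightly
    {"name": "Home Fitness Buyers", "external_id": "src-202"},
    {"name": "Budget Travelers", "external_id": "src-303"},
]

matched, new = reconcile(library, incoming)
print([r["name"] for r in matched], [r["name"] for r in new])
```

Here the slightly renamed record is still recognized through its ID, while the library topic with no source ID cannot be matched this way and its incoming counterpart is treated as new.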

Adding Source IDs

Source IDs are typically populated during import or sync operations that include an ID field mapping. If you need to add source IDs to existing topics:

1. Export your library topics to CSV
2. Match topics to their source records and add the external ID
3. Re-import with the external ID column mapped

Alternatively, setting up an inbound sync connection with a `source_id_field` mapping will populate external IDs for matched topics during subsequent syncs.

Health Indicators

The hygiene dashboard uses color-coded health indicators to communicate the overall state of your data at a glance:

Indicator	Meaning
Green (Healthy)	No issues detected -- all topics are current, all activations are fresh, and source IDs are populated
Yellow (Warning)	Minor issues present -- some outdated topics or a small number of stale activations
Red (Critical)	Significant issues requiring attention -- many stale activations, large numbers of outdated topics, or errors in refresh operations
Gray (Idle)	No data to evaluate -- the category is empty (e.g., no activations exist to check for staleness)

These indicators also appear on the Destinations dashboard for per-connection health monitoring.

Next Steps

Platform Connections -- Set up inbound and outbound connections to keep your data flowing
Sync -- Pull topics from external APIs with automatic classification and deduplication
Segment Activations -- Push segments to DSPs and monitor delivery health
