
Data Hygiene

The Data Hygiene dashboard at /hygiene gives you a centralized view of your organization's data quality across three dimensions: outdated topics that need reclassification, activated segments that have gone stale, and topics missing external source identifiers. Keeping your data clean ensures that the segments you push to DSPs reflect the latest classification logic, naming conventions, and taxonomy structure.

Dashboard Overview

The hygiene dashboard displays three summary cards at the top of the page, each representing a category of data quality issue:

Card | What It Tracks | Why It Matters
Outdated Topics | Topics classified with an older engine version | Classification accuracy improves with each engine update; outdated topics may have suboptimal taxonomy placement
Stale Activations | Active segments where the pushed data no longer matches current topic data | DSPs are receiving outdated segment names, descriptions, or classifications
Missing Source IDs | Topics without an external_id value | The topic cannot be traced back to its original data source, which complicates deduplication and sync tracking

Each card shows a count and a visual indicator of severity. Clicking a card expands a detailed table of the affected items with available remediation actions.

Outdated Topics

Every topic in AudienceGPT is stamped with an engine_version value at the time of classification. This version corresponds to the classification engine that produced the topic's taxonomy placement, intent scoring, and segment naming. When the classification engine is updated (for example, to improve keyword matching, add new taxonomy types, or refine scoring logic), the ENGINE_VERSION constant is incremented, and all topics classified under the previous version become "outdated."

How Outdated Topics Arise

  • Engine updates: The most common cause. When the AudienceGPT team releases a classification engine update, the internal version number changes. Topics classified before the update carry the old version.
  • Taxonomy structure changes: If new parent categories, subcategories, or taxonomy types are added or reorganized, existing topics may benefit from reclassification under the new structure.
  • Scoring improvements: Refinements to intent type scoring, intensity levels, or buyer journey detection may produce more accurate results for existing topics.

Identifying Outdated Topics

On the hygiene dashboard, the Outdated Topics card shows the total count of topics in your library whose engine version does not match the current version. Expanding the card reveals a table with:

  • Topic Name -- the audience segment name
  • Current Engine Version -- the version stamped on the topic when it was last classified
  • Latest Engine Version -- the version the classification engine currently runs
  • Parent Category -- the topic's taxonomy placement
  • Classified At -- when the topic was last classified

Note: The engine version comparison uses exact string matching. A topic is outdated if its engine_version does not equal the current ENGINE_VERSION value; there is no concept of partial compatibility between versions.
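
As an illustration of that rule (a sketch, not AudienceGPT's actual code), the Python below assumes a hypothetical `Topic` record and a placeholder `ENGINE_VERSION` string; the real version value is internal to the product. Because the check is plain inequality, a stamp such as "2.4" counts as outdated against "2.4.0" even though the two look equivalent.

```python
from dataclasses import dataclass

# Placeholder value; the real ENGINE_VERSION is internal to AudienceGPT.
ENGINE_VERSION = "2.4.0"


@dataclass
class Topic:
    name: str
    engine_version: str  # stamped when the topic was last classified


def is_outdated(topic: Topic, current: str = ENGINE_VERSION) -> bool:
    """Exact string comparison: no version ordering, no partial compatibility."""
    return topic.engine_version != current


def outdated_topics(topics: list[Topic], current: str = ENGINE_VERSION) -> list[Topic]:
    """Topics that would appear in the Outdated Topics table."""
    return [t for t in topics if is_outdated(t, current)]
```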

Remediation: Reclassifying Outdated Topics

You have two options for bringing outdated topics up to date:

Individual Reclassification

  1. Click on an outdated topic to open its detail panel
  2. Click the Reclassify button in the banner that appears for outdated topics
  3. Choose your classification mode:
    • AI-Powered -- uses the Anthropic API with web search for maximum accuracy (API usage costs apply)
    • Rule-Based -- uses deterministic local classification at no additional cost
  4. The topic is reclassified and stamped with the current engine version

Bulk Reclassification

  1. From your library page, use the Engine Version filter to show only "Outdated" topics
  2. Select the topics you want to reclassify using the checkboxes
  3. Click Reclassify Selected in the bulk action bar
  4. Choose AI-Powered or Rule-Based mode in the modal
  5. The system processes the selected topics sequentially, in batches of up to 500 topics in rule-based mode or 100 topics in AI-powered mode
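
The per-batch limits in step 5 can be pictured as slicing a selection into groups. This is a minimal sketch under stated assumptions: the mode keys, the `reclassify` callback, and the idea that a large selection is split into consecutive batches are hypothetical illustrations, not the product's API.

```python
from typing import Callable, Iterator, Sequence

# Per-batch limits taken from the bulk reclassification steps.
BATCH_LIMITS = {"rule_based": 500, "ai_powered": 100}


def batches(topic_ids: Sequence[str], mode: str) -> Iterator[Sequence[str]]:
    """Yield consecutive slices no larger than the mode's batch limit."""
    size = BATCH_LIMITS[mode]
    for start in range(0, len(topic_ids), size):
        yield topic_ids[start:start + size]


def reclassify_selected(
    topic_ids: Sequence[str],
    mode: str,
    reclassify: Callable[[str, str], None],  # hypothetical per-topic call
) -> int:
    """Process every selected topic one at a time, batch by batch."""
    done = 0
    for batch in batches(topic_ids, mode):
        for topic_id in batch:
            reclassify(topic_id, mode)
            done += 1
    return done
```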

Tip: Rule-based reclassification is free and fast, making it suitable for large batches. Use AI-powered reclassification for high-value topics where accuracy is critical, such as topics that drive active DSP campaigns.

Stale Activations

A stale activation is an active segment on an external DSP platform where the data pushed during activation no longer matches the current state of the topic in AudienceGPT. Staleness is detected by comparing two attributes:

  1. Pushed Name vs. Current Name -- the segment name that was sent to the DSP at activation time versus the name the topic currently produces (based on the connection's output template)
  2. Engine Version at Push vs. Current Engine Version -- the engine version when the segment was pushed versus the topic's current engine version

If either value has changed, the activation is flagged as stale.
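
A minimal sketch of the two-attribute check, assuming a hypothetical `Activation` record. The `render_name` helper stands in for the connection's output template and simply uses Python `str.format` placeholders; the real template syntax may differ.

```python
from dataclasses import dataclass


@dataclass
class Activation:
    pushed_name: str             # segment name sent to the DSP at activation time
    engine_version_at_push: str  # topic's engine version when the segment was pushed


def render_name(output_template: str, **topic_fields: str) -> str:
    """Hypothetical template rendering using str.format placeholders."""
    return output_template.format(**topic_fields)


def is_stale(activation: Activation, current_name: str, topic_engine_version: str) -> bool:
    """Flag the activation if either compared attribute changed since the push."""
    name_changed = activation.pushed_name != current_name
    version_changed = activation.engine_version_at_push != topic_engine_version
    return name_changed or version_changed
```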

Viewing Stale Activations

The Stale Activations card on the hygiene dashboard shows the total count across all your outbound connections. Expanding reveals:

  • Topic Name -- the audience segment
  • Platform -- the DSP where the segment is active (LiveRamp, Trade Desk, Index Exchange, Custom API)
  • Connection -- the specific outbound connection
  • Pushed Name -- the segment name the DSP currently has
  • Current Name -- the segment name the topic would generate now
  • Engine Version Mismatch -- whether the engine version has changed since the push

Refreshing Individual Stale Activations

To refresh a single stale activation:

  1. Click on the stale activation row in the hygiene dashboard or navigate to the activation in the Destinations dashboard
  2. Click Refresh on the activation detail
  3. AudienceGPT pushes the updated segment name and description to the DSP platform
  4. The activation record updates with the new pushed name and current engine version
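
Steps 3 and 4 might look like the following sketch. `update_segment` is a hypothetical stand-in for the platform's update API, and the ordering (update the record only after the push returns) is an assumption chosen so that a failed push leaves the activation flagged as stale.

```python
from dataclasses import dataclass
from typing import Callable


@dataclass
class Activation:
    segment_id: str
    pushed_name: str
    engine_version_at_push: str


def refresh_activation(
    activation: Activation,
    current_name: str,
    description: str,
    topic_engine_version: str,
    update_segment: Callable[[str, str, str], None],  # hypothetical DSP update call
) -> Activation:
    # Push first; if the DSP call raises, the record below is never touched.
    update_segment(activation.segment_id, current_name, description)
    # Record what the DSP now holds so the activation is no longer stale.
    activation.pushed_name = current_name
    activation.engine_version_at_push = topic_engine_version
    return activation
```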

Batch Refresh: Refresh All Stale

For organizations with many stale activations, the hygiene dashboard provides a Refresh All Stale button that triggers a background job to refresh every stale activation across all your outbound connections.

  1. Click Refresh All Stale on the Stale Activations card
  2. A background job is created that processes stale activations in batches of 25
  3. Progress is displayed in real time -- you can see how many have been refreshed and any errors
  4. You can cancel the job if needed; already-refreshed activations retain their updates
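
A rough sketch of how such a job could work, using the batch size of 25 from step 2. The `refresh_one` and `on_progress` callbacks are hypothetical, checking for cancellation between batches is an assumption, and so is continuing past individual failures while recording them for display.

```python
import threading
from dataclasses import dataclass, field
from typing import Callable, Sequence

BATCH_SIZE = 25  # batch size stated for Refresh All Stale


@dataclass
class JobProgress:
    total: int
    refreshed: int = 0
    errors: list[tuple[str, str]] = field(default_factory=list)
    cancelled: bool = False


def refresh_all_stale(
    activation_ids: Sequence[str],
    refresh_one: Callable[[str], None],       # hypothetical single-activation refresh
    cancel: threading.Event,
    on_progress: Callable[[JobProgress], None] = lambda p: None,
) -> JobProgress:
    progress = JobProgress(total=len(activation_ids))
    for start in range(0, len(activation_ids), BATCH_SIZE):
        # Checked between batches; activations already refreshed keep their updates.
        if cancel.is_set():
            progress.cancelled = True
            break
        for activation_id in activation_ids[start:start + BATCH_SIZE]:
            try:
                refresh_one(activation_id)
                progress.refreshed += 1
            except Exception as exc:  # record the failure and keep going
                progress.errors.append((activation_id, str(exc)))
        on_progress(progress)  # e.g. push counts to the dashboard
    return progress
```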

Warning: Refreshing activations makes live changes to your segments on external DSP platforms. Each refresh calls the platform's update API, which means the DSP will receive the new segment name and description. Verify that your output templates produce the desired naming before running a batch refresh.

Missing Source IDs

Topics that lack an external_id cannot be traced back to their original data source. This typically affects topics that were:

  • Classified individually through the chatbot (no external source)
  • Imported from CSV files that did not include a source ID column
  • Created via matrix generation (combinatorial topics have no external origin)

The Missing Source IDs card shows the count of topics in your library without an external identifier. While this is informational rather than critical, having source IDs improves:

  • Deduplication accuracy -- topics with matching external IDs are recognized as the same item during sync, even if names differ slightly
  • Sync reconciliation -- incremental syncs use the source ID to determine which topics are new vs. already imported
  • Audit trail -- tracing a topic back to the original record in the source system
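
Sync reconciliation by source ID can be sketched as a lookup from external ID to existing topic. The record shape, the `source_id_field` default, and the `existing_by_external_id` map are hypothetical. Records with no source ID simply fall through to "new" here, since this sketch does no name-based matching; a matched record is recognized regardless of how its name is spelled.

```python
from typing import Iterable, Mapping


def reconcile(
    incoming: Iterable[Mapping[str, str]],
    existing_by_external_id: Mapping[str, str],  # external_id -> existing topic id
    source_id_field: str = "id",                  # hypothetical source ID field name
) -> tuple[list[Mapping[str, str]], list[tuple[str, Mapping[str, str]]]]:
    """Split incoming records into new ones and ones matching an existing topic."""
    new, matched = [], []
    for record in incoming:
        external_id = record.get(source_id_field)
        topic_id = existing_by_external_id.get(external_id) if external_id else None
        if topic_id is None:
            new.append(record)
        else:
            matched.append((topic_id, record))
    return new, matched
```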

Adding Source IDs

Source IDs are typically populated during import or sync operations that include an ID field mapping. If you need to add source IDs to existing topics:

  1. Export your library topics to CSV
  2. Match topics to their source records and add the external ID
  3. Re-import with the external ID column mapped
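
For step 2, a small script can fill blank `external_id` cells before re-import. This sketch assumes the export has `name` and `external_id` columns and that you already have a map from case-folded topic name to source ID; the column names and the matching rule are assumptions, so adjust them to your export and source system.

```python
import csv
import io
from typing import Mapping


def fill_external_ids(export_csv: str, source_ids_by_name: Mapping[str, str]) -> str:
    """Return CSV text with blank external_id cells filled where a name matches.

    Keys in source_ids_by_name are expected to be stripped and case-folded.
    """
    reader = csv.DictReader(io.StringIO(export_csv))
    fieldnames = list(reader.fieldnames or [])
    if "external_id" not in fieldnames:
        fieldnames.append("external_id")  # add the column if the export lacks it
    out = io.StringIO()
    writer = csv.DictWriter(out, fieldnames=fieldnames)
    writer.writeheader()
    for row in reader:
        if not (row.get("external_id") or "").strip():
            key = row["name"].strip().casefold()
            row["external_id"] = source_ids_by_name.get(key, "")
        writer.writerow(row)  # existing IDs are left untouched
    return out.getvalue()
```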

Alternatively, setting up an inbound sync connection with a source_id_field mapping will populate external IDs for matched topics during subsequent syncs.

Health Indicators

The hygiene dashboard uses color-coded health indicators to communicate the overall state of your data at a glance:

Indicator | Meaning
Green (Healthy) | No issues detected -- all topics are current, all activations are fresh, and source IDs are populated
Yellow (Warning) | Minor issues present -- some outdated topics or a small number of stale activations
Red (Critical) | Significant issues requiring attention -- many stale activations, large numbers of outdated topics, or errors in refresh operations
Gray (Idle) | No data to evaluate -- the category is empty (e.g., no activations exist to check for staleness)
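
The thresholds behind these colors are not documented here, so the sketch below uses hypothetical cutoffs (`WARNING_AT`, `CRITICAL_AT`) purely to illustrate how per-category counts could map to an indicator.

```python
from enum import Enum


class Health(Enum):
    GREEN = "Healthy"
    YELLOW = "Warning"
    RED = "Critical"
    GRAY = "Idle"


# Hypothetical cutoffs; the real thresholds are not published.
WARNING_AT = 1
CRITICAL_AT = 50


def category_health(items_checked: int, issues: int, refresh_errors: int = 0) -> Health:
    if items_checked == 0:
        return Health.GRAY  # nothing to evaluate in this category
    if refresh_errors > 0 or issues >= CRITICAL_AT:
        return Health.RED
    if issues >= WARNING_AT:
        return Health.YELLOW
    return Health.GREEN
```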

These indicators also appear on the Destinations dashboard for per-connection health monitoring.

Next Steps

  • Platform Connections -- Set up inbound and outbound connections to keep your data flowing
  • Sync -- Pull topics from external APIs with automatic classification and deduplication
  • Segment Activations -- Push segments to DSPs and monitor delivery health