Sync (Pull-based Import)

Sync is the process of pulling audience topic data from an external API into your AudienceGPT library. Unlike CSV import (which uploads a static file), sync connects to a live API endpoint, fetches records page by page, classifies each one through the AudienceGPT engine, deduplicates against your existing topics, and inserts the results into your library. Sync operations are client-orchestrated, giving you real-time progress visibility and the ability to cancel at any point.

Sync Overview

The sync workflow follows this sequence:

  1. Configure an inbound connection with the API endpoint, authentication, field mappings, and pagination settings
  2. Trigger a sync run from the connection detail page
  3. Fetch records from the external API page by page
  4. Classify each record through the AudienceGPT classification engine (AI-powered or rule-based)
  5. Deduplicate against existing global and library topics
  6. Insert new topics and enrich existing duplicates with metadata updates
  7. Monitor progress and review the results

Inbound Connection → Sync Now → Create Run + Fetch Count
→ Loop: Fetch Page → Classify → Dedup → Insert
→ Poll Status → Complete / Cancel

Setting Up an Inbound Connection

Before you can run a sync, you need an inbound platform connection configured with the details of your external API. See Platform Connections for the full connection creation guide. The key configuration fields for sync are described below.

API Configuration

| Field | Description | Example |
|---|---|---|
| Base URL | Root URL of the external API | `https://api.example.com/v2/segments` |
| Response Path | JSON path to the array of records in the API response | `data.segments` or `results` |
| Count Endpoint | Separate endpoint (or same endpoint with a count param) that returns the total record count | `https://api.example.com/v2/segments/count` |
| Count Response Path | JSON path to the total count in the count response | `data.total` or `count` |
| Extra Headers | Additional HTTP headers to include in every request | `{"X-Workspace-Id": "abc123"}` |
| Filters | Key-value pairs appended as query parameters | `{"status": "active", "type": "audience"}` |
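To make the Response Path and Count Response Path behavior concrete, here is a minimal sketch of how a dotted JSON path such as `data.segments` resolves against a parsed API response. The `resolve_path` helper and the sample payload are illustrative, not part of AudienceGPT:

```python
from functools import reduce

def resolve_path(payload: dict, path: str):
    """Walk a dotted JSON path (e.g. 'data.segments') into a parsed response body."""
    return reduce(lambda node, key: node[key], path.split("."), payload)

# A hypothetical response from the external API:
response = {"data": {"segments": [{"id": "s1", "name": "Auto Intenders"}], "total": 1}}

records = resolve_path(response, "data.segments")  # the record array
total = resolve_path(response, "data.total")       # the total count
```

If the path does not match the response structure, this lookup fails, which is the situation behind the "Response path returned no records" error described under Troubleshooting.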

Field Mappings

Field mappings tell AudienceGPT how to translate fields from the external API response into topic attributes. Each mapping pairs a source field name (from the API) with a target field name (in AudienceGPT).

| Source Field (API) | Target Field (AudienceGPT) | Description |
|---|---|---|
| `name` | `topic_name` | The audience segment name (required) |
| `id` | `external_id` | The source record's unique identifier |
| `category` | `parent_category` | Pre-mapped taxonomy category |
| `description` | `description` | Free-text description of the segment |
| `segment_type` | `segment_type` | B2B or B2C designation |

At minimum, you must map a source field to topic_name. The external_id mapping is strongly recommended, as it enables deduplication by source ID and supports incremental sync.

tip

Map the source system's unique record ID to external_id. This allows AudienceGPT to recognize previously synced records during subsequent sync runs, preventing duplicates and enabling metadata enrichment on existing topics.
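Conceptually, applying field mappings is a source-to-target key translation. The sketch below illustrates the idea, including the required `topic_name` check; the function name and error message are illustrative, not AudienceGPT internals:

```python
def apply_mappings(record: dict, mappings: dict) -> dict:
    """Translate an API record into topic attributes using source -> target mappings."""
    topic = {target: record[source] for source, target in mappings.items() if source in record}
    if not topic.get("topic_name"):
        raise ValueError("field mapping error: topic_name is required")
    return topic

# Usage with a hypothetical record and mapping set:
mappings = {"name": "topic_name", "id": "external_id", "category": "parent_category"}
topic = apply_mappings({"id": "s1", "name": "Auto Intenders", "category": "Automotive"}, mappings)
```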

Pagination Configuration

AudienceGPT supports two pagination strategies for fetching records from external APIs:

| Strategy | How It Works | When to Use |
|---|---|---|
| Offset | Uses `limit` and `offset` query parameters to page through results | Most REST APIs with standard pagination |
| Cursor | Uses a cursor token from the previous response to fetch the next page | APIs that use cursor-based pagination (e.g., some GraphQL endpoints) |

Configure the pagination parameters:

| Parameter | Description | Default |
|---|---|---|
| Limit Param | Query parameter name for page size | `limit` |
| Offset Param | Query parameter name for offset (offset mode) | `offset` |
| Cursor Param | Query parameter name for cursor token (cursor mode) | `cursor` |
| Page Size | Number of records per page | `100` |
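As a sketch of how the offset strategy uses these parameters, the generator below pages through a stubbed API until the reported total is covered. The function and the stub are illustrative only; the parameter names and the default page size of 100 mirror the table above:

```python
def fetch_all(fetch_page, total, page_size=100, limit_param="limit", offset_param="offset"):
    """Yield pages of records using offset pagination until `total` records are covered."""
    offset = 0
    while offset < total:
        page = fetch_page({limit_param: page_size, offset_param: offset})
        if not page:
            break  # stop early if the API returns fewer records than the reported count
        yield page
        offset += page_size

# Usage against a stubbed API of 250 records:
data = [{"id": i} for i in range(250)]

def fake_fetch(params):
    return data[params["offset"]:params["offset"] + params["limit"]]

pages = list(fetch_all(fake_fetch, total=len(data)))  # three pages: 100, 100, 50 records
```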

Sync Mode

| Mode | Behavior | Use Case |
|---|---|---|
| Full | Fetches all records from the API on every sync run | First-time sync or when you want a complete refresh |
| Incremental | Only fetches records created or updated since the last sync | Ongoing syncs to pick up new records without re-processing the entire dataset |

For incremental sync, specify the Incremental Field -- the API field that contains a timestamp or version number indicating when the record was last modified. AudienceGPT stores the latest value from each sync run and uses it as a filter on the next run.
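The watermark bookkeeping can be sketched as follows. In practice the stored value is sent to the API as a filter rather than applied client-side; this illustrative helper only shows how the latest incremental-field value is selected and carried forward between runs:

```python
def incremental_filter(records, field, last_value):
    """Keep records newer than the stored watermark; return the new watermark."""
    fresh = [r for r in records if last_value is None or r[field] > last_value]
    # ISO-8601 timestamps compare correctly as strings
    new_watermark = max((r[field] for r in records), default=last_value)
    return fresh, new_watermark

records = [
    {"id": "a", "updated_at": "2024-01-01T00:00:00Z"},
    {"id": "b", "updated_at": "2024-02-01T00:00:00Z"},
]
fresh, watermark = incremental_filter(records, "updated_at", "2024-01-15T00:00:00Z")
```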

Source ID Field

The Source ID Field tells AudienceGPT which field in the API response contains the unique record identifier. This value is stored as external_id on the topic and used for deduplication during subsequent syncs.

Running a Sync

Triggering a Sync

Navigate to the Sync page (/sync) or open your inbound connection's detail view. Click Sync Now to start a new sync run.

Before fetching records, you will be prompted to choose a classification mode:

| Mode | Description | Cost | Speed |
|---|---|---|---|
| AI-Powered | Uses the Anthropic API with web search to classify new topics | Per-token pricing applies | Slower (API calls per topic, capped per page) |
| Rule-Based | Uses the deterministic local classification engine | Free | Fast |
info

The classification mode only applies to topics that are new to the global catalog. Topics that already exist globally (matched by name or embedding similarity) are adopted into your library with their existing classification, regardless of the mode you select.

Page-by-Page Processing

After triggering the sync, AudienceGPT creates a sync run record and fetches the total record count from the count endpoint. Then it processes records page by page:

  1. Fetch page: Requests a page of records from the external API using the configured pagination
  2. Map fields: Applies your field mappings to transform each record into AudienceGPT's topic format
  3. Classify: Runs each new topic through the classification engine (AI-powered or rule-based, based on your selection)
  4. Deduplicate: Checks each topic against existing global topics using:
    • External ID match: If the topic's external ID already exists, it is recognized as a duplicate
    • Name match: Exact topic name comparison
    • Embedding similarity: 256-dimensional hash embeddings with cosine similarity (95% threshold blocks, 75% threshold warns)
  5. Insert or enrich: New topics are inserted into the global catalog and linked to your library. Duplicate topics enrich the existing record's metadata via COALESCE (e.g., updating external_id or source if the existing record was missing those values)
  6. Advance: Moves to the next page until all pages are processed or the sync is cancelled

Each page is processed as a self-contained unit. If a page fails, the sync can retry that page without re-processing successful pages.
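The per-page pipeline above can be sketched as a single function. The stage functions (`classify`, `find_duplicate`, `insert`, `enrich`) are illustrative stand-ins for the engine's internals, but the control flow mirrors the documented steps, including the errors counter that lets the rest of the page continue:

```python
def process_page(records, mappings, classify, find_duplicate, insert, enrich):
    """One self-contained page: map fields, dedup, then insert new or enrich existing."""
    stats = {"new": 0, "duplicates": 0, "errors": 0}
    for record in records:
        try:
            topic = {t: record[s] for s, t in mappings.items() if s in record}
            if not topic.get("topic_name"):
                raise ValueError("topic_name is required")
            existing = find_duplicate(topic)  # external-ID / name / embedding checks
            if existing is not None:
                enrich(existing, topic)       # COALESCE-style metadata fill-in
                stats["duplicates"] += 1
            else:
                topic["classification"] = classify(topic)
                insert(topic)
                stats["new"] += 1
        except Exception:
            stats["errors"] += 1              # logged; the rest of the page continues
    return stats

# Usage with stubbed stages: one valid record, one with an empty name.
library = []
stats = process_page(
    [{"id": "s1", "name": "Auto Intenders"}, {"id": "s2", "name": ""}],
    {"name": "topic_name", "id": "external_id"},
    classify=lambda t: "rule-based",
    find_duplicate=lambda t: None,
    insert=library.append,
    enrich=lambda existing, incoming: None,
)
```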

Monitoring Progress

While a sync is running, the sync progress panel displays real-time statistics:

| Metric | Description |
|---|---|
| Status | Current run status: Processing, Completed, Failed, or Cancelled |
| Pages | Pages completed out of total pages (e.g., "5 / 12") |
| Processed | Total records processed across all completed pages |
| New | Records that were classified and inserted as new topics |
| Duplicates | Records that matched existing topics (enriched metadata) |
| Adopted | Records that matched global topics and were linked to your library |
| Errors | Records that failed classification or insertion |

Cancelling a Sync

Click Cancel on the sync progress panel to stop the run. Cancellation is graceful:

  • The currently processing page completes (it is not interrupted mid-page)
  • No further pages are fetched
  • All topics already inserted remain in your library
  • The sync run is marked as "Cancelled" in history
warning

Cancelling a sync does not roll back topics that were already inserted. If you need to remove synced topics, you must do so manually from your library.

Duplicate Handling

AudienceGPT uses a multi-layered deduplication strategy during sync to avoid creating redundant topics while still enriching existing records with new metadata.

Deduplication Layers

  1. External ID match: The fastest check. If the incoming record's external ID matches an existing topic's external_id, they are the same record. The existing topic is enriched with any new metadata from the incoming record.

  2. Name match: Exact string comparison of topic names. If a topic with the same name already exists globally, the incoming record adopts the existing classification rather than reclassifying.

  3. Embedding similarity: For records not caught by the external ID or name checks, AudienceGPT computes a 256-dimensional hash embedding and compares it against existing topics using cosine similarity:

    • 95% or higher: Blocked as a duplicate -- the existing topic is used
    • 75% to 94%: Warning-level similarity -- still treated as a new topic but flagged
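The threshold logic can be sketched as follows. The 0.95 and 0.75 cutoffs come from the thresholds documented above; the cosine helper and function names are illustrative:

```python
import math

BLOCK_THRESHOLD = 0.95  # at or above: duplicate, existing topic is reused
WARN_THRESHOLD = 0.75   # at or above: new topic, but flagged for review

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def dedup_decision(similarity):
    if similarity >= BLOCK_THRESHOLD:
        return "duplicate"  # blocked: adopt the existing topic
    if similarity >= WARN_THRESHOLD:
        return "warn"       # inserted as new, flagged as similar
    return "new"
```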

Metadata Enrichment via COALESCE

When a duplicate is detected, AudienceGPT does not silently skip the record. Instead, it applies a COALESCE enrichment pattern: for each metadata field, if the existing topic's value is NULL and the incoming record has a value, the existing topic is updated. This means syncing the same source multiple times progressively fills in metadata without overwriting existing data.

Fields eligible for COALESCE enrichment include:

  • external_id (source record ID)
  • source (origin system name)
  • description
  • iab_code
  • audience_type
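The COALESCE pattern amounts to "fill only where the existing value is missing." A minimal sketch over the fields listed above (the helper name is illustrative; the real enrichment happens in SQL):

```python
ENRICHABLE_FIELDS = ("external_id", "source", "description", "iab_code", "audience_type")

def coalesce_enrich(existing, incoming):
    """Fill in missing (None/absent) metadata on the existing topic; never overwrite."""
    for field in ENRICHABLE_FIELDS:
        if existing.get(field) is None and incoming.get(field) is not None:
            existing[field] = incoming[field]
    return existing

topic = {"topic_name": "Auto Intenders", "external_id": None, "source": "manual"}
coalesce_enrich(topic, {"external_id": "s1", "source": "example_api", "description": "In-market auto"})
```

Note that `source` stays `"manual"` because it already had a value, while `external_id` and `description` are filled in.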

Sync History

Each sync run is recorded and accessible from the connection detail page. The sync history table shows:

| Column | Description |
|---|---|
| Run ID | Unique identifier for the sync run |
| Started | Timestamp when the sync was triggered |
| Completed | Timestamp when the sync finished (or was cancelled) |
| Status | Final status: Completed, Failed, or Cancelled |
| Total Items | Total records discovered in the external API |
| Processed | Records successfully processed |
| New | New topics created |
| Duplicates | Duplicate records (metadata enriched) |
| Adopted | Global topics linked to your library |
| Errors | Records that failed processing |
| Classification Mode | AI-Powered or Rule-Based |

Click on any history row to view detailed error logs and deduplication summaries for that run.

Troubleshooting

Common Sync Errors

| Error | Cause | Resolution |
|---|---|---|
| Authentication failed | Credentials expired or were revoked | Edit the connection and re-enter credentials; run Test Connection to verify |
| Count endpoint returned 0 | The count endpoint is misconfigured or the API has no matching records | Verify the count endpoint URL and count response path; check API filters |
| Field mapping error: `topic_name` is required | The mapped source field does not exist in the API response | Verify the field mapping -- the source field name must exactly match the API response field |
| Response path returned no records | The configured response path does not match the API response structure | Test the API response manually and verify the JSON path (e.g., `data.segments` vs. `results`) |
| Page fetch timeout | The external API did not respond within the timeout window | Check the API's health; retry the sync -- the page that timed out will be re-attempted |
| Classification failed for item | A specific record could not be classified (e.g., empty topic name) | Review the error log for the specific record; verify the source data quality |

Retry Behavior

The sync client automatically retries failed pages up to 3 times with exponential backoff. If a page fails all retries, the error is logged and the sync continues to the next page. The overall sync is marked as "Completed" (with errors) rather than "Failed" unless all pages fail.
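The retry loop can be sketched as follows. The 3 attempts and exponential backoff match the behavior described above; the function name, delay values, and the flaky stub are illustrative:

```python
import time

def fetch_with_retry(fetch_page, params, max_attempts=3, base_delay=1.0):
    """Retry a failed page fetch with exponential backoff (1s, 2s, ... by default)."""
    for attempt in range(max_attempts):
        try:
            return fetch_page(params)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: the error is logged and sync moves on
            time.sleep(base_delay * 2 ** attempt)

# Usage with a stub that fails twice, then succeeds:
calls = {"n": 0}

def flaky(params):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("page fetch timeout")
    return [{"id": "s1"}]

result = fetch_with_retry(flaky, {"offset": 0}, base_delay=0)
```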

tip

If a sync consistently fails on certain pages, examine those specific records in the source API. Common causes include malformed data, extremely long field values, or special characters that break JSON parsing.

Verifying Sync Results

After a sync completes:

  1. Check the sync history for error counts and review any errors
  2. Navigate to your library and filter by the source name to see newly synced topics
  3. Spot-check a few topics to verify that field mappings produced correct values
  4. If using incremental sync, verify that the incremental field value was stored for the next run

Next Steps

  • Segment Activations -- Push your synced topics to DSPs as activated segments
  • Data Hygiene -- Monitor synced topics for outdated classifications or missing data
  • Topic Catalog -- Browse the global catalog to see how your synced topics fit into the broader taxonomy