Sync (Pull-based Import)

Sync is the process of pulling audience topic data from an external API into your AudienceGPT library. Unlike CSV import (which uploads a static file), sync connects to a live API endpoint, fetches records page by page, classifies each one through the AudienceGPT engine, deduplicates against your existing topics, and inserts the results into your library. Sync operations are client-orchestrated, giving you real-time progress visibility and the ability to cancel at any point.

Sync Overview

The sync workflow follows this sequence:

  1. Configure an inbound connection with the API endpoint, authentication, field mappings, and pagination settings
  2. Trigger a sync run from the connection detail page
  3. Fetch records from the external API page by page
  4. Classify each record through the AudienceGPT classification engine (AI-powered or rule-based)
  5. Deduplicate against existing global and library topics
  6. Insert new topics and enrich existing duplicates with metadata updates
  7. Monitor progress and review the results

Inbound Connection → Sync Now → Create Run + Fetch Count
→ Loop: Fetch Page → Classify → Dedup → Insert
→ Poll Status → Complete / Cancel

Setting Up an Inbound Connection

Before you can run a sync, you need an inbound platform connection configured with the details of your external API. See Platform Connections for the full connection creation guide. The key configuration fields for sync are described below.

API Configuration

| Field | Description | Example |
|---|---|---|
| Base URL | Root URL of the external API | `https://api.example.com/v2/segments` |
| Response Path | JSON path to the array of records in the API response | `data.segments` or `results` |
| Count Endpoint | Separate endpoint (or same endpoint with a count param) that returns the total record count | `https://api.example.com/v2/segments/count` |
| Count Response Path | JSON path to the total count in the count response | `data.total` or `count` |
| Extra Headers | Additional HTTP headers to include in every request | `{"X-Workspace-Id": "abc123"}` |
| Filters | Key-value pairs appended as query parameters | `{"status": "active", "type": "audience"}` |
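To make the Response Path and Count Response Path behavior concrete, here is a minimal sketch of how a dotted JSON path such as `data.segments` resolves against a parsed API response. The `resolve_path` helper and the sample payload are illustrative, not part of AudienceGPT:

```python
from functools import reduce

def resolve_path(payload: dict, path: str):
    """Walk a dotted JSON path (e.g. 'data.segments') into a parsed response body."""
    return reduce(lambda node, key: node[key], path.split("."), payload)

# A hypothetical response from the external API:
response = {"data": {"segments": [{"id": "s1", "name": "Auto Intenders"}], "total": 1}}

records = resolve_path(response, "data.segments")  # the record array
total = resolve_path(response, "data.total")       # the total count
```

If the path does not match the response structure, this lookup fails, which is the situation behind the "Response path returned no records" error described under Troubleshooting.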

Field Mappings

Field mappings tell AudienceGPT how to translate fields from the external API response into topic attributes. Each mapping pairs a source field name (from the API) with a target field name (in AudienceGPT).

| Source Field (API) | Target Field (AudienceGPT) | Description |
|---|---|---|
| `name` | `topic_name` | The audience segment name (required) |
| `id` | `external_id` | The source record's unique identifier |
| `category` | `parent_category` | Pre-mapped taxonomy category |
| `description` | `description` | Free-text description of the segment |
| `segment_type` | `segment_type` | B2B or B2C designation |

At minimum, you must map a source field to topic_name. The external_id mapping is strongly recommended, as it enables deduplication by source ID and supports incremental sync.

tip

Map the source system's unique record ID to external_id. This allows AudienceGPT to recognize previously synced records during subsequent sync runs, preventing duplicates and enabling metadata enrichment on existing topics.
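Conceptually, applying field mappings is a source-to-target key translation. The sketch below illustrates the idea, including the required `topic_name` check; the function name and error message are illustrative, not AudienceGPT internals:

```python
def apply_mappings(record: dict, mappings: dict) -> dict:
    """Translate an API record into topic attributes using source -> target mappings."""
    topic = {target: record[source] for source, target in mappings.items() if source in record}
    if not topic.get("topic_name"):
        raise ValueError("field mapping error: topic_name is required")
    return topic

# Usage with a hypothetical record and mapping set:
mappings = {"name": "topic_name", "id": "external_id", "category": "parent_category"}
topic = apply_mappings({"id": "s1", "name": "Auto Intenders", "category": "Automotive"}, mappings)
```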

Pagination Configuration

AudienceGPT supports two pagination strategies for fetching records from external APIs:

| Strategy | How It Works | When to Use |
|---|---|---|
| Offset | Uses `limit` and `offset` query parameters to page through results | Most REST APIs with standard pagination |
| Cursor | Uses a cursor token from the previous response to fetch the next page | APIs that use cursor-based pagination (e.g., some GraphQL endpoints) |

Configure the pagination parameters:

| Parameter | Description | Default |
|---|---|---|
| Limit Param | Query parameter name for page size | `limit` |
| Offset Param | Query parameter name for offset (offset mode) | `offset` |
| Cursor Param | Query parameter name for cursor token (cursor mode) | `cursor` |
| Page Size | Number of records per page | `100` |
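As a sketch of how the offset strategy uses these parameters, the generator below pages through a stubbed API until the reported total is covered. The function and the stub are illustrative only; the parameter names and the default page size of 100 mirror the table above:

```python
def fetch_all(fetch_page, total, page_size=100, limit_param="limit", offset_param="offset"):
    """Yield pages of records using offset pagination until `total` records are covered."""
    offset = 0
    while offset < total:
        page = fetch_page({limit_param: page_size, offset_param: offset})
        if not page:
            break  # stop early if the API returns fewer records than the reported count
        yield page
        offset += page_size

# Usage against a stubbed API of 250 records:
data = [{"id": i} for i in range(250)]

def fake_fetch(params):
    return data[params["offset"]:params["offset"] + params["limit"]]

pages = list(fetch_all(fake_fetch, total=len(data)))  # three pages: 100, 100, 50 records
```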

Sync Mode

| Mode | Behavior | Use Case |
|---|---|---|
| Full | Fetches all records from the API on every sync run | First-time sync or when you want a complete refresh |
| Incremental | Only fetches records created or updated since the last sync | Ongoing syncs to pick up new records without re-processing the entire dataset |

For incremental sync, specify the Incremental Field -- the API field that contains a timestamp or version number indicating when the record was last modified. AudienceGPT stores the latest value from each sync run and uses it as a filter on the next run.
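The watermark bookkeeping can be sketched as follows. In practice the stored value is sent to the API as a filter rather than applied client-side; this illustrative helper only shows how the latest incremental-field value is selected and carried forward between runs:

```python
def incremental_filter(records, field, last_value):
    """Keep records newer than the stored watermark; return the new watermark."""
    fresh = [r for r in records if last_value is None or r[field] > last_value]
    # ISO-8601 timestamps compare correctly as strings
    new_watermark = max((r[field] for r in records), default=last_value)
    return fresh, new_watermark

records = [
    {"id": "a", "updated_at": "2024-01-01T00:00:00Z"},
    {"id": "b", "updated_at": "2024-02-01T00:00:00Z"},
]
fresh, watermark = incremental_filter(records, "updated_at", "2024-01-15T00:00:00Z")
```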

Source ID Field

The Source ID Field tells AudienceGPT which field in the API response contains the unique record identifier. This value is stored as external_id on the topic and used for deduplication during subsequent syncs.

Running a Sync

Triggering a Sync

Navigate to the Sync page (/sync) or open your inbound connection's detail view. Click Sync Now to start a new sync run.

Before fetching records, you will be prompted to choose a classification mode:

| Mode | Description | Cost | Speed |
|---|---|---|---|
| AI-Powered | Uses the Anthropic API with web search to classify new topics | Per-token pricing applies | Slower (API calls per topic, capped per page) |
| Rule-Based | Uses the deterministic local classification engine | Free | Fast |
info

The classification mode only applies to topics that are new to the global catalog. Topics that already exist globally (matched by name or embedding similarity) are adopted into your library with their existing classification, regardless of the mode you select.

Page-by-Page Processing

After triggering the sync, AudienceGPT creates a sync run record and fetches the total record count from the count endpoint. Then it processes records page by page:

  1. Fetch page: Requests a page of records from the external API using the configured pagination
  2. Map fields: Applies your field mappings to transform each record into AudienceGPT's topic format
  3. Classify: Runs each new topic through the classification engine (AI-powered or rule-based, based on your selection)
  4. Deduplicate: Checks each topic against existing global topics using:
    • External ID match: If the topic's external ID already exists, it is recognized as a duplicate
    • Name match: Exact topic name comparison
    • Embedding similarity: 256-dimensional hash embeddings with cosine similarity (95% threshold blocks, 75% threshold warns)
  5. Insert or enrich: New topics are inserted into the global catalog and linked to your library. Duplicate topics enrich the existing record's metadata via COALESCE (e.g., updating external_id or source if the existing record was missing those values)
  6. Advance: Moves to the next page until all pages are processed or the sync is cancelled

Each page is processed as a self-contained unit. If a page fails, the sync can retry that page without re-processing successful pages.
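The per-page pipeline above can be sketched as a single function. The stage functions (`classify`, `find_duplicate`, `insert`, `enrich`) are illustrative stand-ins for the engine's internals, but the control flow mirrors the documented steps, including the errors counter that lets the rest of the page continue:

```python
def process_page(records, mappings, classify, find_duplicate, insert, enrich):
    """One self-contained page: map fields, dedup, then insert new or enrich existing."""
    stats = {"new": 0, "duplicates": 0, "errors": 0}
    for record in records:
        try:
            topic = {t: record[s] for s, t in mappings.items() if s in record}
            if not topic.get("topic_name"):
                raise ValueError("topic_name is required")
            existing = find_duplicate(topic)  # external-ID / name / embedding checks
            if existing is not None:
                enrich(existing, topic)       # COALESCE-style metadata fill-in
                stats["duplicates"] += 1
            else:
                topic["classification"] = classify(topic)
                insert(topic)
                stats["new"] += 1
        except Exception:
            stats["errors"] += 1              # logged; the rest of the page continues
    return stats

# Usage with stubbed stages: one valid record, one with an empty name.
library = []
stats = process_page(
    [{"id": "s1", "name": "Auto Intenders"}, {"id": "s2", "name": ""}],
    {"name": "topic_name", "id": "external_id"},
    classify=lambda t: "rule-based",
    find_duplicate=lambda t: None,
    insert=library.append,
    enrich=lambda existing, incoming: None,
)
```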

Monitoring Progress

While a sync is running, the sync progress panel displays real-time statistics:

| Metric | Description |
|---|---|
| Status | Current run status: Processing, Completed, Failed, or Cancelled |
| Pages | Pages completed out of total pages (e.g., "5 / 12") |
| Processed | Total records processed across all completed pages |
| New | Records that were classified and inserted as new topics |
| Duplicates | Records that matched existing topics (enriched metadata) |
| Adopted | Records that matched global topics and were linked to your library |
| Errors | Records that failed classification or insertion |

Cancelling a Sync

Click Cancel on the sync progress panel to stop the run. Cancellation is graceful:

  • The currently processing page completes (it is not interrupted mid-page)
  • No further pages are fetched
  • All topics already inserted remain in your library
  • The sync run is marked as "Cancelled" in history
warning

Cancelling a sync does not roll back topics that were already inserted. If you need to remove synced topics, you must do so manually from your library.

Duplicate Handling

AudienceGPT uses a multi-layered deduplication strategy during sync to avoid creating redundant topics while still enriching existing records with new metadata.

Deduplication Layers

  1. External ID match: The fastest check. If the incoming record's external ID matches an existing topic's external_id, they are the same record. The existing topic is enriched with any new metadata from the incoming record.

  2. Name match: Exact string comparison of topic names. If a topic with the same name already exists globally, the incoming record adopts the existing classification rather than reclassifying.

  3. Embedding similarity: For records not caught by the external ID or name checks, AudienceGPT computes a 256-dimensional hash embedding and compares it against existing topics using cosine similarity:

    • 95% or higher: Blocked as a duplicate -- the existing topic is used
    • 75% to 94%: Warning-level similarity -- still treated as a new topic but flagged
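The threshold logic can be sketched as follows. The 0.95 and 0.75 cutoffs come from the thresholds documented above; the cosine helper and function names are illustrative:

```python
import math

BLOCK_THRESHOLD = 0.95  # at or above: duplicate, existing topic is reused
WARN_THRESHOLD = 0.75   # at or above: new topic, but flagged for review

def cosine_similarity(a, b):
    """Cosine similarity between two equal-length embedding vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)

def dedup_decision(similarity):
    if similarity >= BLOCK_THRESHOLD:
        return "duplicate"  # blocked: adopt the existing topic
    if similarity >= WARN_THRESHOLD:
        return "warn"       # inserted as new, flagged as similar
    return "new"
```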

Metadata Enrichment via COALESCE

When a duplicate is detected, AudienceGPT does not silently skip the record. Instead, it applies a COALESCE enrichment pattern: for each metadata field, if the existing topic's value is NULL and the incoming record has a value, the existing topic is updated. This means syncing the same source multiple times progressively fills in metadata without overwriting existing data.

Fields eligible for COALESCE enrichment include:

  • external_id (source record ID)
  • source (origin system name)
  • description
  • iab_code
  • audience_type
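The COALESCE pattern amounts to "fill only where the existing value is missing." A minimal sketch over the fields listed above (the helper name is illustrative; the real enrichment happens in SQL):

```python
ENRICHABLE_FIELDS = ("external_id", "source", "description", "iab_code", "audience_type")

def coalesce_enrich(existing, incoming):
    """Fill in missing (None/absent) metadata on the existing topic; never overwrite."""
    for field in ENRICHABLE_FIELDS:
        if existing.get(field) is None and incoming.get(field) is not None:
            existing[field] = incoming[field]
    return existing

topic = {"topic_name": "Auto Intenders", "external_id": None, "source": "manual"}
coalesce_enrich(topic, {"external_id": "s1", "source": "example_api", "description": "In-market auto"})
```

Note that `source` stays `"manual"` because it already had a value, while `external_id` and `description` are filled in.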

Sync History

Each sync run is recorded and accessible from the connection detail page. The sync history table shows:

| Column | Description |
|---|---|
| Run ID | Unique identifier for the sync run |
| Started | Timestamp when the sync was triggered |
| Completed | Timestamp when the sync finished (or was cancelled) |
| Status | Final status: Completed, Failed, or Cancelled |
| Total Items | Total records discovered in the external API |
| Processed | Records successfully processed |
| New | New topics created |
| Duplicates | Duplicate records (metadata enriched) |
| Adopted | Global topics linked to your library |
| Errors | Records that failed processing |
| Classification Mode | AI-Powered or Rule-Based |

Click on any history row to view detailed error logs and deduplication summaries for that run.

Troubleshooting

Common Sync Errors

| Error | Cause | Resolution |
|---|---|---|
| Authentication failed | Credentials expired or were revoked | Edit the connection and re-enter credentials; run Test Connection to verify |
| Count endpoint returned 0 | The count endpoint is misconfigured or the API has no matching records | Verify the count endpoint URL and count response path; check API filters |
| Field mapping error: `topic_name` is required | The mapped source field does not exist in the API response | Verify the field mapping -- the source field name must exactly match the API response field |
| Response path returned no records | The configured response path does not match the API response structure | Test the API response manually and verify the JSON path (e.g., `data.segments` vs. `results`) |
| Page fetch timeout | The external API did not respond within the timeout window | Check the API's health; retry the sync -- the page that timed out will be re-attempted |
| Classification failed for item | A specific record could not be classified (e.g., empty topic name) | Review the error log for the specific record; verify the source data quality |

Retry Behavior

The sync client automatically retries failed pages up to 3 times with exponential backoff. If a page fails all retries, the error is logged and the sync continues to the next page. The overall sync is marked as "Completed" (with errors) rather than "Failed" unless all pages fail.
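The retry loop can be sketched as follows. The 3 attempts and exponential backoff match the behavior described above; the function name, delay values, and the flaky stub are illustrative:

```python
import time

def fetch_with_retry(fetch_page, params, max_attempts=3, base_delay=1.0):
    """Retry a failed page fetch with exponential backoff (1s, 2s, ... by default)."""
    for attempt in range(max_attempts):
        try:
            return fetch_page(params)
        except Exception:
            if attempt == max_attempts - 1:
                raise  # retries exhausted: the error is logged and sync moves on
            time.sleep(base_delay * 2 ** attempt)

# Usage with a stub that fails twice, then succeeds:
calls = {"n": 0}

def flaky(params):
    calls["n"] += 1
    if calls["n"] < 3:
        raise TimeoutError("page fetch timeout")
    return [{"id": "s1"}]

result = fetch_with_retry(flaky, {"offset": 0}, base_delay=0)
```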

tip

If a sync consistently fails on certain pages, examine those specific records in the source API. Common causes include malformed data, extremely long field values, or special characters that break JSON parsing.

Verifying Sync Results

After a sync completes:

  1. Check the sync history for error counts and review any errors
  2. Navigate to your library and filter by the source name to see newly synced topics
  3. Spot-check a few topics to verify that field mappings produced correct values
  4. If using incremental sync, verify that the incremental field value was stored for the next run

Next Steps

  • Segment Activations -- Push your synced topics to DSPs as activated segments
  • Data Hygiene -- Monitor synced topics for outdated classifications or missing data
  • Topic Catalog -- Browse the global catalog to see how your synced topics fit into the broader taxonomy