Sync (Pull-based Import)
Sync is the process of pulling audience topic data from an external API into your AudienceGPT library. Unlike CSV import (which uploads a static file), sync connects to a live API endpoint, fetches records page by page, classifies each one through the AudienceGPT engine, deduplicates against your existing topics, and inserts the results into your library. Sync operations are client-orchestrated, giving you real-time progress visibility and the ability to cancel at any point.
Sync Overview
The sync workflow follows this sequence:
- Configure an inbound connection with the API endpoint, authentication, field mappings, and pagination settings
- Trigger a sync run from the connection detail page
- Fetch records from the external API page by page
- Classify each record through the AudienceGPT classification engine (AI-powered or rule-based)
- Deduplicate against existing global and library topics
- Insert new topics and enrich existing duplicates with metadata updates
- Monitor progress and review the results
Inbound Connection → Sync Now → Create Run + Fetch Count
→ Loop: Fetch Page → Classify → Dedup → Insert
→ Poll Status → Complete / Cancel
Setting Up an Inbound Connection
Before you can run a sync, you need an inbound platform connection configured with the details of your external API. See Platform Connections for the full connection creation guide. The key configuration fields for sync are described below.
API Configuration
| Field | Description | Example |
|---|---|---|
| Base URL | Root URL of the external API | https://api.example.com/v2/segments |
| Response Path | JSON path to the array of records in the API response | data.segments or results |
| Count Endpoint | Separate endpoint (or same endpoint with count param) that returns total record count | https://api.example.com/v2/segments/count |
| Count Response Path | JSON path to the total count in the count response | data.total or count |
| Extra Headers | Additional HTTP headers to include in every request | {"X-Workspace-Id": "abc123"} |
| Filters | Key-value pairs appended as query parameters | {"status": "active", "type": "audience"} |
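The Response Path and Count Response Path are dotted JSON paths resolved against the API response. A minimal sketch of how such a path might be resolved -- the helper name and response shape are illustrative, not the actual implementation:

```python
def resolve_path(payload: dict, path: str):
    """Walk a dotted JSON path (e.g. "data.segments") into a response payload."""
    node = payload
    for key in path.split("."):
        node = node[key]
    return node

# Hypothetical response shaped like the examples in the table above:
response = {"data": {"segments": [{"id": "s1", "name": "Auto Intenders"}], "total": 1}}
records = resolve_path(response, "data.segments")   # the array of records
total = resolve_path(response, "data.total")        # the total count
```

If the path does not match the response structure, no records are found -- see "Response path returned no records" under Troubleshooting.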
Field Mappings
Field mappings tell AudienceGPT how to translate fields from the external API response into topic attributes. Each mapping pairs a source field name (from the API) with a target field name (in AudienceGPT).
| Source Field (API) | Target Field (AudienceGPT) | Description |
|---|---|---|
| name | topic_name | The audience segment name (required) |
| id | external_id | The source record's unique identifier |
| category | parent_category | Pre-mapped taxonomy category |
| description | description | Free-text description of the segment |
| segment_type | segment_type | B2B or B2C designation |
At minimum, you must map a source field to topic_name. The external_id mapping is strongly recommended, as it enables deduplication by source ID and supports incremental sync.
Map the source system's unique record ID to external_id. This allows AudienceGPT to recognize previously synced records during subsequent sync runs, preventing duplicates and enabling metadata enrichment on existing topics.
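A minimal sketch of how a field mapping could be applied to one API record; the mapping pairs and helper name are hypothetical, chosen to match the table above:

```python
# Source-field -> target-field pairs (illustrative subset of the table above).
FIELD_MAPPINGS = {
    "name": "topic_name",
    "id": "external_id",
    "category": "parent_category",
}

def map_record(record: dict, mappings: dict) -> dict:
    """Translate an external API record into AudienceGPT topic attributes."""
    topic = {target: record[source] for source, target in mappings.items() if source in record}
    if not topic.get("topic_name"):
        raise ValueError("topic_name is required")  # the one mandatory mapping
    return topic

topic = map_record({"name": "Pet Owners", "id": "seg-42"}, FIELD_MAPPINGS)
# topic -> {"topic_name": "Pet Owners", "external_id": "seg-42"}
```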
Pagination Configuration
AudienceGPT supports two pagination strategies for fetching records from external APIs:
| Strategy | How It Works | When to Use |
|---|---|---|
| Offset | Uses limit and offset query parameters to page through results | Most REST APIs with standard pagination |
| Cursor | Uses a cursor token from the previous response to fetch the next page | APIs that use cursor-based pagination (e.g., some GraphQL endpoints) |
Configure the pagination parameters:
| Parameter | Description | Default |
|---|---|---|
| Limit Param | Query parameter name for page size | limit |
| Offset Param | Query parameter name for offset (offset mode) | offset |
| Cursor Param | Query parameter name for cursor token (cursor mode) | cursor |
| Page Size | Number of records per page | 100 |
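Offset-mode paging can be sketched as a loop that advances the offset by the page size until a short or empty page is returned; the function below is a simplified illustration, not the actual sync client:

```python
def fetch_all(fetch_page, page_size=100):
    """Offset-mode paging: fetch_page(limit, offset) returns one page of records."""
    records, offset = [], 0
    while True:
        page = fetch_page(limit=page_size, offset=offset)
        records.extend(page)
        if len(page) < page_size:   # a short or empty page means we're done
            return records
        offset += page_size
```

Cursor mode follows the same loop shape, except each response carries an opaque token that is passed back as the cursor parameter on the next request instead of an offset.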
Sync Mode
| Mode | Behavior | Use Case |
|---|---|---|
| Full | Fetches all records from the API on every sync run | First-time sync or when you want a complete refresh |
| Incremental | Only fetches records created or updated since the last sync | Ongoing syncs to pick up new records without re-processing the entire dataset |
For incremental sync, specify the Incremental Field -- the API field that contains a timestamp or version number indicating when the record was last modified. AudienceGPT stores the latest value from each sync run and uses it as a filter on the next run.
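The watermark bookkeeping can be sketched as follows, assuming the incremental field holds a sortable value such as an ISO-8601 timestamp (the helper name is hypothetical):

```python
def advance_watermark(records, field, last_value=None):
    """Keep records newer than the stored watermark and compute the next watermark."""
    fresh = [r for r in records if last_value is None or r[field] > last_value]
    if records:
        candidates = [r[field] for r in records]
        if last_value is not None:
            candidates.append(last_value)
        last_value = max(candidates)   # stored for the next sync run
    return fresh, last_value
```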
Source ID Field
The Source ID Field tells AudienceGPT which field in the API response contains the unique record identifier. This value is stored as external_id on the topic and used for deduplication during subsequent syncs.
Running a Sync
Triggering a Sync
Navigate to the Sync page (/sync) or open your inbound connection's detail view. Click Sync Now to start a new sync run.
Before fetching records, you will be prompted to choose a classification mode:
| Mode | Description | Cost | Speed |
|---|---|---|---|
| AI-Powered | Uses the Anthropic API with web search to classify new topics | Per-token pricing applies | Slower (API calls per topic, capped per page) |
| Rule-Based | Uses the deterministic local classification engine | Free | Fast |
The classification mode only applies to topics that are new to the global catalog. Topics that already exist globally (matched by name or embedding similarity) are adopted into your library with their existing classification, regardless of the mode you select.
Page-by-Page Processing
After triggering the sync, AudienceGPT creates a sync run record and fetches the total record count from the count endpoint. Then it processes records page by page:
- Fetch page: Requests a page of records from the external API using the configured pagination
- Map fields: Applies your field mappings to transform each record into AudienceGPT's topic format
- Classify: Runs each new topic through the classification engine (AI-powered or rule-based, based on your selection)
- Deduplicate: Checks each topic against existing global topics using:
- External ID match: If the topic's external ID already exists, it is recognized as a duplicate
- Name match: Exact topic name comparison
- Embedding similarity: 256-dimensional hash embeddings with cosine similarity (95% threshold blocks, 75% threshold warns)
- Insert or enrich: New topics are inserted into the global catalog and linked to your library. Duplicate topics enrich the existing record's metadata via COALESCE (e.g., updating external_id or source if the existing record was missing those values)
- Advance: Moves to the next page until all pages are processed or the sync is cancelled
Each page is processed as a self-contained unit. If a page fails, the sync can retry that page without re-processing successful pages.
Monitoring Progress
While a sync is running, the sync progress panel displays real-time statistics:
| Metric | Description |
|---|---|
| Status | Current run status: Processing, Completed, Failed, or Cancelled |
| Pages | Pages completed out of total pages (e.g., "5 / 12") |
| Processed | Total records processed across all completed pages |
| New | Records that were classified and inserted as new topics |
| Duplicates | Records that matched existing topics (enriched metadata) |
| Adopted | Records that matched global topics and were linked to your library |
| Errors | Records that failed classification or insertion |
Cancelling a Sync
Click Cancel on the sync progress panel to stop the run. Cancellation is graceful:
- The currently processing page completes (it is not interrupted mid-page)
- No further pages are fetched
- All topics already inserted remain in your library
- The sync run is marked as "Cancelled" in history
Cancelling a sync does not roll back topics that were already inserted. If you need to remove synced topics, you must do so manually from your library.
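The graceful-cancel semantics amount to checking the cancel flag only between pages, never mid-page. A simplified sketch (class and function names are illustrative):

```python
class SyncRun:
    def __init__(self):
        self.cancelled = False
        self.pages_done = 0
        self.status = "Processing"

def process_pages(run, pages, process_page):
    """Check the cancel flag between pages, so the current page always completes."""
    for page in pages:
        if run.cancelled:
            run.status = "Cancelled"   # topics already inserted are kept
            return run
        process_page(page)
        run.pages_done += 1
    run.status = "Completed"
    return run
```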
Duplicate Handling
AudienceGPT uses a multi-layered deduplication strategy during sync to avoid creating redundant topics while still enriching existing records with new metadata.
Deduplication Layers
- External ID match: The fastest check. If the incoming record's external ID matches an existing topic's external_id, they are the same record. The existing topic is enriched with any new metadata from the incoming record.
- Name match: Exact string comparison of topic names. If a topic with the same name already exists globally, the incoming record adopts the existing classification rather than reclassifying.
- Embedding similarity: For records that pass name matching, AudienceGPT computes a 256-dimensional hash embedding and compares it against existing topics using cosine similarity:
  - 95% or higher: Blocked as a duplicate -- the existing topic is used
  - 75% to 94%: Warning-level similarity -- still treated as a new topic but flagged
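The similarity thresholds above can be expressed as a small decision function; the cosine computation is standard, and the function names are illustrative:

```python
import math

BLOCK_THRESHOLD = 0.95   # at or above: duplicate, adopt the existing topic
WARN_THRESHOLD = 0.75    # at or above: flagged, but still inserted as new

def cosine_similarity(a, b):
    dot = sum(x * y for x, y in zip(a, b))
    norm = math.sqrt(sum(x * x for x in a)) * math.sqrt(sum(y * y for y in b))
    return dot / norm

def dedup_decision(similarity):
    if similarity >= BLOCK_THRESHOLD:
        return "duplicate"
    if similarity >= WARN_THRESHOLD:
        return "warn"
    return "new"
```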
Metadata Enrichment via COALESCE
When a duplicate is detected, AudienceGPT does not silently skip the record. Instead, it applies a COALESCE enrichment pattern: for each metadata field, if the existing topic's value is NULL and the incoming record has a value, the existing topic is updated. This means syncing the same source multiple times progressively fills in metadata without overwriting existing data.
Fields eligible for COALESCE enrichment include:
- external_id (source record ID)
- source (origin system name)
- description
- iab_code
- audience_type
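In application code, the COALESCE pattern amounts to filling only fields that are currently NULL. A minimal sketch (the helper name is hypothetical):

```python
ENRICHABLE_FIELDS = ("external_id", "source", "description", "iab_code", "audience_type")

def coalesce_enrich(existing: dict, incoming: dict) -> dict:
    """Fill only NULL fields on the existing topic; never overwrite a set value."""
    enriched = dict(existing)
    for field in ENRICHABLE_FIELDS:
        if enriched.get(field) is None and incoming.get(field) is not None:
            enriched[field] = incoming[field]
    return enriched

existing = {"topic_name": "Pet Owners", "external_id": None, "source": "manual"}
incoming = {"external_id": "seg-42", "source": "example_api"}
# coalesce_enrich fills external_id but leaves the already-set source untouched
```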
Sync History
Each sync run is recorded and accessible from the connection detail page. The sync history table shows:
| Column | Description |
|---|---|
| Run ID | Unique identifier for the sync run |
| Started | Timestamp when the sync was triggered |
| Completed | Timestamp when the sync finished (or was cancelled) |
| Status | Final status: Completed, Failed, or Cancelled |
| Total Items | Total records discovered in the external API |
| Processed | Records successfully processed |
| New | New topics created |
| Duplicates | Duplicate records (metadata enriched) |
| Adopted | Global topics linked to your library |
| Errors | Records that failed processing |
| Classification Mode | AI-Powered or Rule-Based |
Click on any history row to view detailed error logs and deduplication summaries for that run.
Troubleshooting
Common Sync Errors
| Error | Cause | Resolution |
|---|---|---|
| Authentication failed | Credentials expired or were revoked | Edit the connection and re-enter credentials; run Test Connection to verify |
| Count endpoint returned 0 | The count endpoint is misconfigured or the API has no matching records | Verify the count endpoint URL and count response path; check API filters |
| Field mapping error: topic_name is required | The mapped source field does not exist in the API response | Verify the field mapping -- the source field name must exactly match the API response field |
| Response path returned no records | The configured response path does not match the API response structure | Test the API response manually and verify the JSON path (e.g., data.segments vs. results) |
| Page fetch timeout | The external API did not respond within the timeout window | Check the API's health; retry the sync -- the page that timed out will be re-attempted |
| Classification failed for item | A specific record could not be classified (e.g., empty topic name) | Review the error log for the specific record; verify the source data quality |
Retry Behavior
The sync client automatically retries failed pages up to 3 times with exponential backoff. If a page fails all retries, the error is logged and the sync continues to the next page. The overall sync is marked as "Completed" (with errors) rather than "Failed" unless all pages fail.
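The per-page retry behavior can be sketched as follows -- a simplified illustration of retry-with-exponential-backoff, not the actual client code:

```python
import time

def fetch_page_with_retry(fetch, attempts=3, base_delay=1.0):
    """Retry a failed page fetch up to `attempts` times, doubling the delay each time."""
    for attempt in range(attempts):
        try:
            return fetch()
        except Exception:
            if attempt == attempts - 1:
                raise   # page failed all retries: the caller logs it and moves on
            time.sleep(base_delay * (2 ** attempt))
```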
If a sync consistently fails on certain pages, examine those specific records in the source API. Common causes include malformed data, extremely long field values, or special characters that break JSON parsing.
Verifying Sync Results
After a sync completes:
- Check the sync history for error counts and review any errors
- Navigate to your library and filter by the source name to see newly synced topics
- Spot-check a few topics to verify that field mappings produced correct values
- If using incremental sync, verify that the incremental field value was stored for the next run
Next Steps
- Segment Activations -- Push your synced topics to DSPs as activated segments
- Data Hygiene -- Monitor synced topics for outdated classifications or missing data
- Topic Catalog -- Browse the global catalog to see how your synced topics fit into the broader taxonomy