
Import API

The Import API implements a chunked upload pipeline for bulk-importing topics from CSV files into your organization's library. It supports up to 50,000 rows per import, with automatic classification, embedding-based deduplication, and optional AI-powered classification for new topics. Imports are client-driven: the server provides batch metadata and chunk processing, while the client orchestrates the upload loop.

Import Flow Overview

1. POST  /api/import                    -- Create batch (metadata only)
   Returns: batchId, chunksTotal, chunkSize

2. POST  /api/import/:batchId/chunk     -- Upload each chunk sequentially
   Send: chunkIndex, rows (max 500), mappings
   Returns: successCount, duplicateCount, errors

3. GET   /api/import/:batchId/status    -- Poll progress (optional)

4. PATCH /api/import/:batchId/status    -- Cancel import (optional)
   Send: { status: "cancelled" }
Tip: The client parses the CSV locally and sends rows in chunks of 500. Each chunk is classified, deduplicated, and inserted server-side within a 60-second timeout. Use the status endpoint to poll progress or resume after a network interruption.
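
If an upload is interrupted, the status endpoint reports how many chunks completed, so the client can resume where it left off. A minimal resume sketch, assuming an API_KEY constant and that the parsed rows and original mappings are still available client-side:

// Sketch: resume an interrupted import from the next unprocessed chunk.
// Assumes API_KEY, the parsed `rows`, and the original `mappings` are in scope.
async function resumeImport(
  batchId: string,
  rows: Record<string, string>[],
  mappings: { csvColumn: string; targetField: string }[]
) {
  const headers = {
    Authorization: `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  };

  // Chunks are uploaded sequentially, so chunks_completed is also the
  // index of the next chunk to send.
  const statusRes = await fetch(
    `https://app.audiencegpt.com/api/import/${batchId}/status`,
    { headers }
  );
  const { chunks_completed, chunks_total } = await statusRes.json();

  for (let i = chunks_completed; i < chunks_total; i++) {
    const chunk = rows.slice(i * 500, (i + 1) * 500);
    await fetch(`https://app.audiencegpt.com/api/import/${batchId}/chunk`, {
      method: "POST",
      headers,
      body: JSON.stringify({ chunkIndex: i, rows: chunk, mappings }),
    });
  }
}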


Create Import Batch

Initialize a new import batch with metadata. No rows are sent in this request.

POST /api/import

Auth: API key with topics:write scope, or Clerk session with import permission

Request Body

{
  "filename": "audience-segments-2026.csv",
  "totalRows": 2500,
  "mappings": [
    { "csvColumn": "Segment Name", "targetField": "topic_name" },
    { "csvColumn": "Category", "targetField": "parent_category" },
    { "csvColumn": "Type", "targetField": "segment_type" }
  ],
  "useLLM": false
}
Field      Type     Required  Description
filename   string   Yes       Original CSV filename
totalRows  number   Yes       Total row count (max 50,000)
mappings   array    Yes       Column mapping from CSV headers to target fields
useLLM     boolean  No        Enable AI-powered classification for new topics (default: false)

Column mapping format:

Each mapping object maps a CSV column header to a target field:

Target Field     Description
topic_name       Required -- The topic/segment name
parent_category  Parent category (41 types)
taxonomy_type    Taxonomy type group (13 groups)
subcategory      Subcategory
segment_type     B2B, B2C, B2B2C, B2E, B2G
external_id      External system identifier
keywords         Comma-separated keywords

Response (200)

{
  "batchId": "batch_1740000000000_x7k2m9",
  "chunksTotal": 5,
  "chunkSize": 500
}
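
chunksTotal corresponds to ceil(totalRows / chunkSize); in this example, ceil(2,500 / 500) = 5.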

Error Responses

Status  Body                                              Description
400     {"error": "filename is required"}                 Missing required field
400     {"error": "Maximum 50,000 rows allowed"}          Row count exceeds limit
400     {"error": "mappings must be a non-empty array"}   Invalid mappings

List Import Batches

List recent import batches for your organization.

GET /api/import

Auth: API key with topics:read scope, or Clerk session

Query Parameters

Parameter  Type    Default  Description
limit      number  20       Max batches to return (max 100)
offset     number  0        Pagination offset

Response (200)

[
  {
    "id": "batch_1740000000000_x7k2m9",
    "filename": "audience-segments-2026.csv",
    "status": "completed",
    "total_rows": 2500,
    "processed_rows": 2500,
    "success_count": 2100,
    "error_count": 50,
    "duplicate_count": 300,
    "adopted_count": 50,
    "chunks_total": 5,
    "chunks_completed": 5,
    "use_llm": false,
    "created_at": "2026-02-25T10:00:00.000Z",
    "completed_at": "2026-02-25T10:02:30.000Z"
  }
]
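
A minimal listing sketch using the documented limit and offset query parameters (API_KEY is assumed to be defined, as in the TypeScript example at the end of this page):

// Sketch: list the 50 most recent import batches for the organization.
const res = await fetch(
  "https://app.audiencegpt.com/api/import?limit=50&offset=0",
  { headers: { Authorization: `Bearer ${API_KEY}` } }
);
const batches = await res.json();
// Each entry includes status, processed_rows, and chunk progress counters.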

Process a Chunk

Upload and process a single chunk of CSV rows. Each chunk is classified, deduplicated against existing topics, and inserted into the database.

POST /api/import/:batchId/chunk

Auth: API key with topics:write scope, or Clerk session with import permission

Max duration: 60 seconds

Path Parameters

Parameter  Type    Description
batchId    string  The batch ID from the create step

Request Body

{
  "chunkIndex": 0,
  "rows": [
    { "Segment Name": "Tesla Model 3 Buyers", "Category": "Auto", "Type": "B2C" },
    { "Segment Name": "BMW X5 Shoppers", "Category": "Auto", "Type": "B2C" }
  ],
  "mappings": [
    { "csvColumn": "Segment Name", "targetField": "topic_name" },
    { "csvColumn": "Category", "targetField": "parent_category" },
    { "csvColumn": "Type", "targetField": "segment_type" }
  ]
}
Field       Type      Required  Description
chunkIndex  number    Yes       Zero-based chunk index
rows        object[]  Yes       Array of row objects (CSV column name to value)
mappings    array     Yes       Same column mappings from the create step

Response (200)

{
  "chunkIndex": 0,
  "successCount": 450,
  "errorCount": 5,
  "duplicateCount": 40,
  "adoptedCount": 5,
  "updatedCount": 3,
  "newTopicIds": ["ot_01...", "ot_02..."],
  "errors": [
    { "row": 12, "message": "topic_name is empty" },
    { "row": 47, "message": "Classification failed" }
  ]
}
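
Row-level failures do not abort the chunk; they are reported per row in the errors array. A small sketch of collecting these across chunk responses for later review (the ChunkResult shape below includes only the fields this sketch needs):

// Sketch: gather per-row errors from every chunk response for display.
interface ChunkError { row: number; message: string; }
interface ChunkResult { chunkIndex: number; errors: ChunkError[]; }

function collectErrors(results: ChunkResult[]): string[] {
  return results.flatMap((r) =>
    r.errors.map((e) => `Chunk ${r.chunkIndex}, row ${e.row}: ${e.message}`)
  );
}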

Deduplication Pipeline

Each chunk goes through a multi-stage deduplication pipeline (a conceptual sketch follows the list):

  1. Classification -- Each row is classified using the 7-layer engine
  2. Embedding generation -- 256-dim hash embeddings for similarity comparison
  3. Exact name dedup -- Checks against existing org topics by normalized name
  4. Global embedding search -- Finds matching global topics (0.95 cosine similarity threshold)
  5. Intra-chunk dedup -- Prevents duplicate rows within the same chunk
  6. Batch insert -- New topics are inserted; existing global matches are adopted (linked to your org)
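
The sketch below illustrates the name and embedding stages conceptually; it is not the server implementation, and the normalizeName, embed, and similarity helpers are illustrative assumptions (the intra-chunk check is shown in a simplified order):

// Conceptual sketch of the dedup stages, not the actual server code.
// `embed` stands in for the 256-dim hash embedder described above.
type Topic = { name: string; embedding: number[] };

const normalizeName = (s: string) => s.trim().toLowerCase().replace(/\s+/g, " ");

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function classifyRow(
  name: string,
  embed: (s: string) => number[],
  orgTopics: Topic[],
  globalTopics: Topic[],
  seenInChunk: Set<string>
): "duplicate" | "adopt" | "new" {
  const normalized = normalizeName(name);

  // Exact-name dedup against existing org topics.
  if (orgTopics.some((t) => normalizeName(t.name) === normalized)) return "duplicate";

  // Intra-chunk dedup: the same normalized name appeared earlier in this chunk.
  if (seenInChunk.has(normalized)) return "duplicate";
  seenInChunk.add(normalized);

  // Global embedding search at the 0.95 cosine similarity threshold.
  const vector = embed(normalized);
  const match = globalTopics.find((t) => cosineSimilarity(vector, t.embedding) >= 0.95);
  return match ? "adopt" : "new";
}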

Idempotency

The server tracks chunk completion via chunks_completed. If a chunk is re-sent with a chunkIndex that has already been processed, the server returns a skipped response:

{
  "chunkIndex": 0,
  "successCount": 0,
  "errorCount": 0,
  "duplicateCount": 0,
  "adoptedCount": 0,
  "errors": [],
  "skipped": true
}
Tip: Use the skipped flag to safely retry failed chunk uploads without creating duplicate data. The client should implement exponential backoff with up to 3 retries per chunk, as in the sketch below.
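
A minimal retry sketch, assuming API_KEY is defined and treating any non-2xx response or network error as retryable:

// Sketch: upload one chunk with exponential backoff (up to 3 retries).
// Re-sending is safe: an already-processed chunkIndex returns { skipped: true }.
async function uploadChunkWithRetry(
  batchId: string,
  chunkIndex: number,
  rows: Record<string, string>[],
  mappings: { csvColumn: string; targetField: string }[]
) {
  for (let attempt = 0; attempt <= 3; attempt++) {
    try {
      const res = await fetch(
        `https://app.audiencegpt.com/api/import/${batchId}/chunk`,
        {
          method: "POST",
          headers: {
            Authorization: `Bearer ${API_KEY}`,
            "Content-Type": "application/json",
          },
          body: JSON.stringify({ chunkIndex, rows, mappings }),
        }
      );
      if (res.ok) return res.json();
    } catch {
      // Network error: fall through to backoff and retry.
    }
    // Exponential backoff: 1s, 2s, 4s between attempts.
    await new Promise((r) => setTimeout(r, 1000 * 2 ** attempt));
  }
  throw new Error(`Chunk ${chunkIndex} failed after 3 retries`);
}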

LLM Classification

When the batch is created with useLLM: true, topics that are new to the global library are reclassified through Claude (capped at 10 LLM calls per chunk to avoid timeouts). Topics that match existing global entries, along with duplicates, skip LLM classification entirely.


Check Import Status

Poll the current status and progress of an import batch.

GET /api/import/:batchId/status

Auth: API key with topics:read scope, or Clerk session

Response (200)

{
  "id": "batch_1740000000000_x7k2m9",
  "filename": "audience-segments-2026.csv",
  "status": "processing",
  "total_rows": 2500,
  "processed_rows": 1500,
  "success_count": 1350,
  "error_count": 30,
  "duplicate_count": 120,
  "adopted_count": 25,
  "updated_count": 10,
  "chunks_total": 5,
  "chunks_completed": 3,
  "use_llm": false,
  "created_at": "2026-02-25T10:00:00.000Z",
  "completed_at": null
}

Status values: processing, completed, cancelled, failed
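
A minimal polling sketch (the 2-second interval is an arbitrary choice, and API_KEY is assumed to be defined):

// Sketch: poll the status endpoint until the batch reaches a terminal state.
async function waitForImport(batchId: string) {
  const terminal = new Set(["completed", "cancelled", "failed"]);
  while (true) {
    const res = await fetch(
      `https://app.audiencegpt.com/api/import/${batchId}/status`,
      { headers: { Authorization: `Bearer ${API_KEY}` } }
    );
    const batch = await res.json();
    console.log(`${batch.processed_rows}/${batch.total_rows} rows processed`);
    if (terminal.has(batch.status)) return batch;
    await new Promise((r) => setTimeout(r, 2000)); // poll every 2s (arbitrary)
  }
}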


Cancel an Import

Cancel a running import. Only batches with processing status can be cancelled.

PATCH /api/import/:batchId/status

Auth: API key with topics:write scope, or Clerk session

Request Body

{
  "status": "cancelled"
}

Response (200)

{
  "batchId": "batch_1740000000000_x7k2m9",
  "status": "cancelled"
}

Error Responses

Status  Body                                                                       Description
404     {"error": "Batch not found"}                                               Invalid batch ID
409     {"error": "Batch is completed, can only cancel/fail a processing batch"}   Batch already finished

Taxonomy Path Auto-Parsing

The import pipeline automatically detects taxonomy paths in topic names. Paths like "Provider > Category > Subcategory > Topic Name" are parsed to extract the following (see the sketch after this list):

  • Provider prefix -- Stripped (e.g., "Data Alliance >")
  • Structural segments -- Stripped (e.g., "Audiences >", "Segments >")
  • Taxonomy fields -- Extracted into parent_category, subcategory
  • Leaf topic -- Used as the topic_name
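
An illustrative parsing sketch; the structural-segment list here is an assumption for demonstration, not the server's actual list:

// Illustrative sketch of "Provider > Category > Subcategory > Topic" parsing.
// STRUCTURAL_SEGMENTS is an assumed list for demonstration only.
const STRUCTURAL_SEGMENTS = new Set(["audiences", "segments"]);

function parseTaxonomyPath(raw: string) {
  const parts = raw.split(">").map((p) => p.trim()).filter(Boolean);
  if (parts.length === 1) return { topic_name: parts[0] };

  // Strip the provider prefix (first segment) and structural segments.
  const middle = parts
    .slice(1, -1)
    .filter((p) => !STRUCTURAL_SEGMENTS.has(p.toLowerCase()));

  return {
    topic_name: parts[parts.length - 1], // leaf segment
    parent_category: middle[0],          // first remaining segment
    subcategory: middle[1],              // second remaining segment, if any
  };
}

// Example: "Data Alliance > Audiences > Auto > SUVs > Tesla Model 3 Buyers"
// -> { topic_name: "Tesla Model 3 Buyers", parent_category: "Auto", subcategory: "SUVs" }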

Post-Import Review

After an import completes, the chatbot enters the import_review step where you can review each imported topic with quick actions:

  • Keep -- Accept the topic as-is
  • Skip -- Remove the topic from your library
  • Rename -- Edit the topic name
  • Fix All Names -- Batch action to clean all topic names using AI

TypeScript Example

async function importCSV(
  file: File,
  mappings: { csvColumn: string; targetField: string }[]
) {
  const rows = parseCSV(file); // Your CSV parser

  // Step 1: Create batch
  const batchRes = await fetch("https://app.audiencegpt.com/api/import", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      filename: file.name,
      totalRows: rows.length,
      mappings,
      useLLM: false,
    }),
  });
  // Use the server-provided chunkSize rather than hardcoding 500.
  const { batchId, chunksTotal, chunkSize } = await batchRes.json();

  // Step 2: Upload chunks sequentially
  for (let i = 0; i < chunksTotal; i++) {
    const chunk = rows.slice(i * chunkSize, (i + 1) * chunkSize);

    const chunkRes = await fetch(
      `https://app.audiencegpt.com/api/import/${batchId}/chunk`,
      {
        method: "POST",
        headers: {
          "Authorization": `Bearer ${API_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ chunkIndex: i, rows: chunk, mappings }),
      }
    );

    const result = await chunkRes.json();
    console.log(`Chunk ${i}: ${result.successCount} created, ${result.duplicateCount} dupes`);
  }

  // Step 3: Check final status
  const statusRes = await fetch(
    `https://app.audiencegpt.com/api/import/${batchId}/status`,
    { headers: { "Authorization": `Bearer ${API_KEY}` } }
  );
  return statusRes.json();
}

Next Steps