Import API
The Import API implements a chunked upload pipeline for bulk-importing topics from CSV files into your organization's library. It supports up to 50,000 rows per import with automatic classification, embedding-based deduplication, and optional LLM reclassification of new topics. Imports are client-driven: the server provides batch metadata and chunk processing, while the client orchestrates the upload loop.
Import Flow Overview
1. POST /api/import -- Create batch (metadata only)
Returns: batchId, chunksTotal, chunkSize
2. POST /api/import/:batchId/chunk -- Upload each chunk sequentially
Send: chunkIndex, rows (max 500), mappings
Returns: successCount, duplicateCount, errors
3. GET /api/import/:batchId/status -- Poll progress (optional)
4. PATCH /api/import/:batchId/status -- Cancel import (optional)
Send: { status: "cancelled" }
The client parses the CSV locally and sends rows in chunks of 500. Each chunk is classified, deduplicated, and inserted server-side within a 60-second timeout. Use the status endpoint to poll progress or resume after a network interruption.
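The chunk count returned by the create step is simply the row count divided into 500-row slices. A minimal client-side sketch of that chunking (names are illustrative):

```typescript
// Split parsed CSV rows into fixed-size chunks for sequential upload.
// chunkSize comes from the create-batch response (500 in this API).
function toChunks<T>(rows: T[], chunkSize: number): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < rows.length; i += chunkSize) {
    chunks.push(rows.slice(i, i + chunkSize));
  }
  return chunks;
}

// 2,500 rows at 500 per chunk produces 5 chunks, matching chunksTotal.
const rows = Array.from({ length: 2500 }, (_, i) => ({ row: i }));
const chunks = toChunks(rows, 500);
```

Each chunk's index in this array is the `chunkIndex` sent to the chunk endpoint.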
Create Import Batch
Initialize a new import batch with metadata. No rows are sent in this request.
POST /api/import
Auth: API key with topics:write scope, or Clerk session with import permission
Request Body
{
"filename": "audience-segments-2026.csv",
"totalRows": 2500,
"mappings": [
{ "csvColumn": "Segment Name", "targetField": "topic_name" },
{ "csvColumn": "Category", "targetField": "parent_category" },
{ "csvColumn": "Type", "targetField": "segment_type" }
],
"useLLM": false
}
| Field | Type | Required | Description |
|---|---|---|---|
| filename | string | Yes | Original CSV filename |
| totalRows | number | Yes | Total row count (max 50,000) |
| mappings | array | Yes | Column mappings from CSV headers to target fields |
| useLLM | boolean | No | Enable AI-powered classification for new topics (default: false) |
Column mapping format:
Each mapping object maps a CSV column header to a target field:
| Target Field | Description |
|---|---|
| topic_name | Required -- The topic/segment name |
| parent_category | Parent category (41 types) |
| taxonomy_type | Taxonomy type group (13 groups) |
| subcategory | Subcategory |
| segment_type | B2B, B2C, B2B2C, B2E, or B2G |
| external_id | External system identifier |
| keywords | Comma-separated keywords |
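The chunk endpoint receives raw rows plus the mappings, and the server applies the mapping during processing. To make the transformation concrete, here is an illustrative sketch of what a mapping does to one row (`applyMappings` is not part of the API):

```typescript
interface Mapping { csvColumn: string; targetField: string }

// Convert one raw CSV row (header -> value) into a target-field object.
function applyMappings(
  row: Record<string, string>,
  mappings: Mapping[]
): Record<string, string> {
  const out: Record<string, string> = {};
  for (const { csvColumn, targetField } of mappings) {
    if (row[csvColumn] !== undefined) out[targetField] = row[csvColumn];
  }
  return out;
}

const mapped = applyMappings(
  { "Segment Name": "Tesla Model 3 Buyers", "Category": "Auto", "Type": "B2C" },
  [
    { csvColumn: "Segment Name", targetField: "topic_name" },
    { csvColumn: "Category", targetField: "parent_category" },
    { csvColumn: "Type", targetField: "segment_type" },
  ]
);
```

Do not pre-map rows before upload; send them with their original CSV headers so the server-side mapping lines up.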
Response (200)
{
"batchId": "batch_1740000000000_x7k2m9",
"chunksTotal": 5,
"chunkSize": 500
}
Error Responses
| Status | Body | Description |
|---|---|---|
| 400 | {"error": "filename is required"} | Missing required field |
| 400 | {"error": "Maximum 50,000 rows allowed"} | Row count exceeds limit |
| 400 | {"error": "mappings must be a non-empty array"} | Invalid mappings |
List Import Batches
List recent import batches for your organization.
GET /api/import
Auth: API key with topics:read scope, or Clerk session
Query Parameters
| Parameter | Type | Default | Description |
|---|---|---|---|
| limit | number | 20 | Maximum batches to return (max 100) |
| offset | number | 0 | Pagination offset |
Response (200)
[
{
"id": "batch_1740000000000_x7k2m9",
"filename": "audience-segments-2026.csv",
"status": "completed",
"total_rows": 2500,
"processed_rows": 2500,
"success_count": 2100,
"error_count": 50,
"duplicate_count": 300,
"adopted_count": 50,
"chunks_total": 5,
"chunks_completed": 5,
"use_llm": false,
"created_at": "2026-02-25T10:00:00.000Z",
"completed_at": "2026-02-25T10:02:30.000Z"
}
]
Process a Chunk
Upload and process a single chunk of CSV rows. Each chunk is classified, deduplicated against existing topics, and inserted into the database.
POST /api/import/:batchId/chunk
Auth: API key with topics:write scope, or Clerk session with import permission
Max duration: 60 seconds
Path Parameters
| Parameter | Type | Description |
|---|---|---|
| batchId | string | The batch ID from the create step |
Request Body
{
"chunkIndex": 0,
"rows": [
{ "Segment Name": "Tesla Model 3 Buyers", "Category": "Auto", "Type": "B2C" },
{ "Segment Name": "BMW X5 Shoppers", "Category": "Auto", "Type": "B2C" }
],
"mappings": [
{ "csvColumn": "Segment Name", "targetField": "topic_name" },
{ "csvColumn": "Category", "targetField": "parent_category" },
{ "csvColumn": "Type", "targetField": "segment_type" }
]
}
| Field | Type | Required | Description |
|---|---|---|---|
| chunkIndex | number | Yes | Zero-based chunk index |
| rows | object[] | Yes | Array of row objects (CSV column name to value) |
| mappings | array | Yes | The same column mappings from the create step |
Response (200)
{
"chunkIndex": 0,
"successCount": 450,
"errorCount": 5,
"duplicateCount": 40,
"adoptedCount": 5,
"updatedCount": 3,
"newTopicIds": ["ot_01...", "ot_02..."],
"errors": [
{ "row": 12, "message": "topic_name is empty" },
{ "row": 47, "message": "Classification failed" }
]
}
Deduplication Pipeline
Each chunk goes through a multi-stage deduplication pipeline:
- Classification -- Each row is classified using the 7-layer engine
- Embedding generation -- 256-dim hash embeddings for similarity comparison
- Exact name dedup -- Checks against existing org topics by normalized name
- Global embedding search -- Finds matching global topics (0.95 cosine similarity threshold)
- Intra-chunk dedup -- Prevents duplicate rows within the same chunk
- Batch insert -- New topics are inserted; existing global matches are adopted (linked to your org)
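The name-dedup and embedding-search stages can be illustrated with a small sketch. The server's actual normalization rules and 256-dim hash embeddings are not exposed, so `normalizeName` and the vectors here are purely illustrative; only the 0.95 cosine threshold comes from the pipeline description above:

```typescript
// Illustrative name normalization: case-fold and collapse whitespace.
function normalizeName(name: string): string {
  return name.trim().toLowerCase().replace(/\s+/g, " ");
}

// Cosine similarity between two equal-length embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// A candidate counts as a global match at similarity >= 0.95.
const isGlobalMatch = (sim: number) => sim >= 0.95;
```

Rows that clear the name check but fall below the threshold are inserted as new topics; rows at or above it are adopted from the global catalog instead.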
Idempotency
The server tracks chunk completion via the chunks_completed counter. If a chunk is re-sent with a chunkIndex that was already processed, the server returns a skipped response:
{
"chunkIndex": 0,
"successCount": 0,
"errorCount": 0,
"duplicateCount": 0,
"adoptedCount": 0,
"errors": [],
"skipped": true
}
Use the skipped flag to safely retry failed chunk uploads without creating duplicate data. The client should implement exponential backoff with up to 3 retries per chunk.
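A sketch of that retry loop, with the request itself injected as a function so the backoff logic stands alone (the `send` wrapper and base delay are illustrative):

```typescript
// Upload one chunk with up to 3 retries and exponential backoff.
// `send` performs the actual request (e.g. a fetch wrapper) and should
// throw on network errors or non-2xx responses.
async function uploadChunkWithRetry<T>(
  send: () => Promise<T>,
  maxRetries = 3,
  baseDelayMs = 1000
): Promise<T> {
  for (let attempt = 0; ; attempt++) {
    try {
      return await send(); // response may include { skipped: true }
    } catch (err) {
      if (attempt >= maxRetries) throw err;
      // back off 1s, 2s, 4s between attempts
      await new Promise((r) => setTimeout(r, baseDelayMs * 2 ** attempt));
    }
  }
}
```

Because re-sent chunks return `skipped: true` instead of reprocessing, this retry is safe even when the original request succeeded but its response was lost.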
LLM Classification
When a batch is created with useLLM: true, new-to-global topics are reclassified through Claude (capped at 10 LLM calls per chunk to avoid timeouts). Topics that match existing global entries, as well as duplicate rows, skip LLM classification entirely.
Check Import Status
Poll the current status and progress of an import batch.
GET /api/import/:batchId/status
Auth: API key with topics:read scope, or Clerk session
Response (200)
{
"id": "batch_1740000000000_x7k2m9",
"filename": "audience-segments-2026.csv",
"status": "processing",
"total_rows": 2500,
"processed_rows": 1500,
"success_count": 1350,
"error_count": 30,
"duplicate_count": 120,
"adopted_count": 25,
"updated_count": 10,
"chunks_total": 5,
"chunks_completed": 3,
"use_llm": false,
"created_at": "2026-02-25T10:00:00.000Z",
"completed_at": null
}
Status values: processing, completed, cancelled, failed
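A polling helper can wrap this endpoint and wait until the batch leaves the processing state. A sketch with the status request injected as a function (the interval and `getStatus` wrapper are illustrative):

```typescript
interface BatchStatus {
  status: "processing" | "completed" | "cancelled" | "failed";
  processed_rows: number;
  total_rows: number;
}

// Poll until the batch is no longer processing.
// `getStatus` should wrap GET /api/import/:batchId/status.
async function waitForCompletion(
  getStatus: () => Promise<BatchStatus>,
  intervalMs = 2000
): Promise<BatchStatus> {
  for (;;) {
    const s = await getStatus();
    if (s.status !== "processing") return s;
    await new Promise((r) => setTimeout(r, intervalMs));
  }
}
```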
Cancel an Import
Cancel a running import. Only batches with processing status can be cancelled.
PATCH /api/import/:batchId/status
Auth: API key with topics:write scope, or Clerk session
Request Body
{
"status": "cancelled"
}
Response (200)
{
"batchId": "batch_1740000000000_x7k2m9",
"status": "cancelled"
}
Error Responses
| Status | Body | Description |
|---|---|---|
| 404 | {"error": "Batch not found"} | Invalid batch ID |
| 409 | {"error": "Batch is completed, can only cancel/fail a processing batch"} | Batch already finished |
Taxonomy Path Auto-Parsing
The import pipeline automatically detects taxonomy paths in topic names. Paths like "Provider > Category > Subcategory > Topic Name" are parsed to extract:
- Provider prefix -- Stripped (e.g., "Data Alliance >")
- Structural segments -- Stripped (e.g., "Audiences >", "Segments >")
- Taxonomy fields -- Extracted into parent_category and subcategory
- Leaf topic -- Used as the topic_name
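A rough sketch of that parse, assuming `>`-delimited paths. The provider and structural-segment lists used by the server are not documented, so the heuristics below (dropping the first segment of deep paths, the `STRUCTURAL` set) are illustrative only:

```typescript
// Structural segments stripped from paths -- illustrative list.
const STRUCTURAL = new Set(["audiences", "segments"]);

// Parse "Provider > Category > Subcategory > Topic Name" into fields.
function parseTaxonomyPath(path: string) {
  let parts = path.split(">").map((p) => p.trim()).filter(Boolean);
  // Drop a provider prefix (first segment) when the path is deep enough.
  if (parts.length > 3) parts = parts.slice(1);
  // Drop structural segments like "Audiences" or "Segments".
  parts = parts.filter((p) => !STRUCTURAL.has(p.toLowerCase()));
  const topic_name = parts[parts.length - 1] ?? "";
  return {
    parent_category: parts.length > 1 ? parts[0] : undefined,
    subcategory: parts.length > 2 ? parts[1] : undefined,
    topic_name,
  };
}
```

For example, "Data Alliance > Audiences > Auto > Luxury > Tesla Model 3 Buyers" would yield parent_category "Auto", subcategory "Luxury", and topic_name "Tesla Model 3 Buyers".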
Post-Import Review
After an import completes, the chatbot enters the import_review step where you can review each imported topic with quick actions:
- Keep -- Accept the topic as-is
- Skip -- Remove the topic from your library
- Rename -- Edit the topic name
- Fix All Names -- Batch action to clean all topic names using AI
TypeScript Example
async function importCSV(
file: File,
mappings: { csvColumn: string; targetField: string }[]
) {
const rows = parseCSV(file); // Your CSV parser
const CHUNK_SIZE = 500;
// Step 1: Create batch
const batchRes = await fetch("https://app.audiencegpt.com/api/import", {
method: "POST",
headers: {
"Authorization": `Bearer ${API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({
filename: file.name,
totalRows: rows.length,
mappings,
useLLM: false,
}),
});
if (!batchRes.ok) throw new Error(`Batch creation failed: ${batchRes.status}`);
const { batchId, chunksTotal } = await batchRes.json();
// Step 2: Upload chunks sequentially
for (let i = 0; i < chunksTotal; i++) {
const chunk = rows.slice(i * CHUNK_SIZE, (i + 1) * CHUNK_SIZE);
const chunkRes = await fetch(
`https://app.audiencegpt.com/api/import/${batchId}/chunk`,
{
method: "POST",
headers: {
"Authorization": `Bearer ${API_KEY}`,
"Content-Type": "application/json",
},
body: JSON.stringify({ chunkIndex: i, rows: chunk, mappings }),
}
);
const result = await chunkRes.json();
if (result.skipped) continue; // already processed -- safe to skip on retry
console.log(`Chunk ${i}: ${result.successCount} created, ${result.duplicateCount} dupes`);
}
// Step 3: Check final status
const statusRes = await fetch(
`https://app.audiencegpt.com/api/import/${batchId}/status`,
{ headers: { "Authorization": `Bearer ${API_KEY}` } }
);
return statusRes.json();
}
Next Steps
- Topics API -- Manage imported topics
- Classify API -- Understand the classification pipeline
- Matrix Generation -- Generate combinatorial topics
- Catalog API -- Add pre-classified topics from the global catalog