
Import API

The Import API implements a chunked upload pipeline for bulk-importing topics from CSV files into your organization's library. It supports up to 50,000 rows per import, with automatic classification, embedding-based deduplication, and optional AI-powered classification for new topics. Imports are client-driven: the server provides batch metadata and chunk processing, while the client orchestrates the upload loop.

Import Flow Overview

1. POST  /api/import                    -- Create batch (metadata only)
   Returns: batchId, chunksTotal, chunkSize

2. POST  /api/import/:batchId/chunk     -- Upload each chunk sequentially
   Send: chunkIndex, rows (max 500), mappings
   Returns: successCount, duplicateCount, errors

3. GET   /api/import/:batchId/status    -- Poll progress (optional)

4. PATCH /api/import/:batchId/status    -- Cancel import (optional)
   Send: { status: "cancelled" }
Tip: The client parses the CSV locally and sends rows in chunks of 500. Each chunk is classified, deduplicated, and inserted server-side within a 60-second timeout. Use the status endpoint to poll progress or resume after a network interruption.
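
If an upload is interrupted, the status endpoint reports how many chunks completed, so the client can resume where it left off. A minimal resume sketch, assuming an API_KEY constant and that the parsed rows and original mappings are still available client-side:

// Sketch: resume an interrupted import from the next unprocessed chunk.
// Assumes API_KEY, the parsed `rows`, and the original `mappings` are in scope.
async function resumeImport(
  batchId: string,
  rows: Record<string, string>[],
  mappings: { csvColumn: string; targetField: string }[]
) {
  const headers = {
    Authorization: `Bearer ${API_KEY}`,
    "Content-Type": "application/json",
  };

  // Chunks are uploaded sequentially, so chunks_completed is also the
  // index of the next chunk to send.
  const statusRes = await fetch(
    `https://app.audiencegpt.com/api/import/${batchId}/status`,
    { headers }
  );
  const { chunks_completed, chunks_total } = await statusRes.json();

  for (let i = chunks_completed; i < chunks_total; i++) {
    const chunk = rows.slice(i * 500, (i + 1) * 500);
    await fetch(`https://app.audiencegpt.com/api/import/${batchId}/chunk`, {
      method: "POST",
      headers,
      body: JSON.stringify({ chunkIndex: i, rows: chunk, mappings }),
    });
  }
}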


Create Import Batch

Initialize a new import batch with metadata. No rows are sent in this request.

POST /api/import

Auth: API key with topics:write scope, or Clerk session with import permission

Request Body

{
  "filename": "audience-segments-2026.csv",
  "totalRows": 2500,
  "mappings": [
    { "csvColumn": "Segment Name", "targetField": "topic_name" },
    { "csvColumn": "Category", "targetField": "parent_category" },
    { "csvColumn": "Type", "targetField": "segment_type" }
  ],
  "useLLM": false
}
Field      Type     Required  Description
filename   string   Yes       Original CSV filename
totalRows  number   Yes       Total row count (max 50,000)
mappings   array    Yes       Column mapping from CSV headers to target fields
useLLM     boolean  No        Enable AI-powered classification for new topics (default: false)

Column mapping format:

Each mapping object maps a CSV column header to a target field:

Target Field     Description
topic_name       Required -- The topic/segment name
parent_category  Parent category (41 types)
taxonomy_type    Taxonomy type group (13 groups)
subcategory      Subcategory
segment_type     B2B, B2C, B2B2C, B2E, B2G
external_id      External system identifier
keywords         Comma-separated keywords

Response (200)

{
  "batchId": "batch_1740000000000_x7k2m9",
  "chunksTotal": 5,
  "chunkSize": 500
}
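
chunksTotal corresponds to ceil(totalRows / chunkSize); in this example, ceil(2,500 / 500) = 5.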

Error Responses

Status  Body                                              Description
400     {"error": "filename is required"}                 Missing required field
400     {"error": "Maximum 50,000 rows allowed"}          Row count exceeds limit
400     {"error": "mappings must be a non-empty array"}   Invalid mappings

List Import Batches

List recent import batches for your organization.

GET /api/import

Auth: API key with topics:read scope, or Clerk session

Query Parameters

Parameter  Type    Default  Description
limit      number  20       Max batches to return (max 100)
offset     number  0        Pagination offset

Response (200)

[
  {
    "id": "batch_1740000000000_x7k2m9",
    "filename": "audience-segments-2026.csv",
    "status": "completed",
    "total_rows": 2500,
    "processed_rows": 2500,
    "success_count": 2100,
    "error_count": 50,
    "duplicate_count": 300,
    "adopted_count": 50,
    "chunks_total": 5,
    "chunks_completed": 5,
    "use_llm": false,
    "created_at": "2026-02-25T10:00:00.000Z",
    "completed_at": "2026-02-25T10:02:30.000Z"
  }
]
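
A minimal listing sketch using the documented limit and offset query parameters (API_KEY is assumed to be defined, as in the TypeScript example at the end of this page):

// Sketch: list the 50 most recent import batches for the organization.
const res = await fetch(
  "https://app.audiencegpt.com/api/import?limit=50&offset=0",
  { headers: { Authorization: `Bearer ${API_KEY}` } }
);
const batches = await res.json();
// Each entry includes status, processed_rows, and chunk progress counters.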

Process a Chunk

Upload and process a single chunk of CSV rows. Each chunk is classified, deduplicated against existing topics, and inserted into the database.

POST /api/import/:batchId/chunk

Auth: API key with topics:write scope, or Clerk session with import permission

Max duration: 60 seconds

Path Parameters

Parameter  Type    Description
batchId    string  The batch ID from the create step

Request Body

{
  "chunkIndex": 0,
  "rows": [
    { "Segment Name": "Tesla Model 3 Buyers", "Category": "Auto", "Type": "B2C" },
    { "Segment Name": "BMW X5 Shoppers", "Category": "Auto", "Type": "B2C" }
  ],
  "mappings": [
    { "csvColumn": "Segment Name", "targetField": "topic_name" },
    { "csvColumn": "Category", "targetField": "parent_category" },
    { "csvColumn": "Type", "targetField": "segment_type" }
  ]
}
Field       Type      Required  Description
chunkIndex  number    Yes       Zero-based chunk index
rows        object[]  Yes       Array of row objects (CSV column name to value)
mappings    array     Yes       Same column mappings from the create step

Response (200)

{
  "chunkIndex": 0,
  "successCount": 450,
  "errorCount": 5,
  "duplicateCount": 40,
  "adoptedCount": 5,
  "updatedCount": 3,
  "newTopicIds": ["ot_01...", "ot_02..."],
  "errors": [
    { "row": 12, "message": "topic_name is empty" },
    { "row": 47, "message": "Classification failed" }
  ]
}
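
Row-level failures do not abort the chunk; they are reported per row in the errors array. A small sketch of collecting these across chunk responses for later review (the ChunkResult shape below includes only the fields this sketch needs):

// Sketch: gather per-row errors from every chunk response for display.
interface ChunkError { row: number; message: string; }
interface ChunkResult { chunkIndex: number; errors: ChunkError[]; }

function collectErrors(results: ChunkResult[]): string[] {
  return results.flatMap((r) =>
    r.errors.map((e) => `Chunk ${r.chunkIndex}, row ${e.row}: ${e.message}`)
  );
}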

Deduplication Pipeline

Each chunk goes through a multi-stage deduplication pipeline (a conceptual sketch follows the list):

  1. Classification -- Each row is classified using the 7-layer engine
  2. Embedding generation -- 256-dim hash embeddings for similarity comparison
  3. Exact name dedup -- Checks against existing org topics by normalized name
  4. Global embedding search -- Finds matching global topics (0.95 cosine similarity threshold)
  5. Intra-chunk dedup -- Prevents duplicate rows within the same chunk
  6. Batch insert -- New topics are inserted; existing global matches are adopted (linked to your org)
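
The sketch below illustrates the name and embedding stages conceptually; it is not the server implementation, and the normalizeName, embed, and similarity helpers are illustrative assumptions (the intra-chunk check is shown in a simplified order):

// Conceptual sketch of the dedup stages, not the actual server code.
// `embed` stands in for the 256-dim hash embedder described above.
type Topic = { name: string; embedding: number[] };

const normalizeName = (s: string) => s.trim().toLowerCase().replace(/\s+/g, " ");

function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

function classifyRow(
  name: string,
  embed: (s: string) => number[],
  orgTopics: Topic[],
  globalTopics: Topic[],
  seenInChunk: Set<string>
): "duplicate" | "adopt" | "new" {
  const normalized = normalizeName(name);

  // Exact-name dedup against existing org topics.
  if (orgTopics.some((t) => normalizeName(t.name) === normalized)) return "duplicate";

  // Intra-chunk dedup: the same normalized name appeared earlier in this chunk.
  if (seenInChunk.has(normalized)) return "duplicate";
  seenInChunk.add(normalized);

  // Global embedding search at the 0.95 cosine similarity threshold.
  const vector = embed(normalized);
  const match = globalTopics.find((t) => cosineSimilarity(vector, t.embedding) >= 0.95);
  return match ? "adopt" : "new";
}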

Idempotency

The server tracks chunk completion via chunks_completed. If a chunk is re-sent with a chunkIndex that has already been processed, the server returns a skipped response:

{
  "chunkIndex": 0,
  "successCount": 0,
  "errorCount": 0,
  "duplicateCount": 0,
  "adoptedCount": 0,
  "errors": [],
  "skipped": true
}
Tip: Use the skipped flag to safely retry failed chunk uploads without creating duplicate data. The client should implement exponential backoff with up to 3 retries per chunk, as in the sketch below.
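
A minimal retry sketch, assuming API_KEY is defined and treating any non-2xx response or network error as retryable:

// Sketch: upload one chunk with exponential backoff (up to 3 retries).
// Re-sending is safe: an already-processed chunkIndex returns { skipped: true }.
async function uploadChunkWithRetry(
  batchId: string,
  chunkIndex: number,
  rows: Record<string, string>[],
  mappings: { csvColumn: string; targetField: string }[]
) {
  for (let attempt = 0; attempt <= 3; attempt++) {
    try {
      const res = await fetch(
        `https://app.audiencegpt.com/api/import/${batchId}/chunk`,
        {
          method: "POST",
          headers: {
            Authorization: `Bearer ${API_KEY}`,
            "Content-Type": "application/json",
          },
          body: JSON.stringify({ chunkIndex, rows, mappings }),
        }
      );
      if (res.ok) return res.json();
    } catch {
      // Network error: fall through to backoff and retry.
    }
    // Exponential backoff: 1s, 2s, 4s between attempts.
    await new Promise((r) => setTimeout(r, 1000 * 2 ** attempt));
  }
  throw new Error(`Chunk ${chunkIndex} failed after 3 retries`);
}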

LLM Classification

When the batch is created with useLLM: true, topics that are new to the global library are reclassified through Claude (capped at 10 LLM calls per chunk to avoid timeouts). Topics that match existing global entries, along with duplicates, skip LLM classification entirely.


Check Import Status

Poll the current status and progress of an import batch.

GET /api/import/:batchId/status

Auth: API key with topics:read scope, or Clerk session

Response (200)

{
  "id": "batch_1740000000000_x7k2m9",
  "filename": "audience-segments-2026.csv",
  "status": "processing",
  "total_rows": 2500,
  "processed_rows": 1500,
  "success_count": 1350,
  "error_count": 30,
  "duplicate_count": 120,
  "adopted_count": 25,
  "updated_count": 10,
  "chunks_total": 5,
  "chunks_completed": 3,
  "use_llm": false,
  "created_at": "2026-02-25T10:00:00.000Z",
  "completed_at": null
}

Status values: processing, completed, cancelled, failed
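
A minimal polling sketch (the 2-second interval is an arbitrary choice, and API_KEY is assumed to be defined):

// Sketch: poll the status endpoint until the batch reaches a terminal state.
async function waitForImport(batchId: string) {
  const terminal = new Set(["completed", "cancelled", "failed"]);
  while (true) {
    const res = await fetch(
      `https://app.audiencegpt.com/api/import/${batchId}/status`,
      { headers: { Authorization: `Bearer ${API_KEY}` } }
    );
    const batch = await res.json();
    console.log(`${batch.processed_rows}/${batch.total_rows} rows processed`);
    if (terminal.has(batch.status)) return batch;
    await new Promise((r) => setTimeout(r, 2000)); // poll every 2s (arbitrary)
  }
}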


Cancel an Import

Cancel a running import. Only batches with processing status can be cancelled.

PATCH /api/import/:batchId/status

Auth: API key with topics:write scope, or Clerk session

Request Body

{
  "status": "cancelled"
}

Response (200)

{
  "batchId": "batch_1740000000000_x7k2m9",
  "status": "cancelled"
}

Error Responses

Status  Body                                                                       Description
404     {"error": "Batch not found"}                                               Invalid batch ID
409     {"error": "Batch is completed, can only cancel/fail a processing batch"}   Batch already finished

Taxonomy Path Auto-Parsing

The import pipeline automatically detects taxonomy paths in topic names. Paths like "Provider > Category > Subcategory > Topic Name" are parsed to extract the following (see the sketch after this list):

  • Provider prefix -- Stripped (e.g., "Data Alliance >")
  • Structural segments -- Stripped (e.g., "Audiences >", "Segments >")
  • Taxonomy fields -- Extracted into parent_category, subcategory
  • Leaf topic -- Used as the topic_name
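
An illustrative parsing sketch; the structural-segment list here is an assumption for demonstration, not the server's actual list:

// Illustrative sketch of "Provider > Category > Subcategory > Topic" parsing.
// STRUCTURAL_SEGMENTS is an assumed list for demonstration only.
const STRUCTURAL_SEGMENTS = new Set(["audiences", "segments"]);

function parseTaxonomyPath(raw: string) {
  const parts = raw.split(">").map((p) => p.trim()).filter(Boolean);
  if (parts.length === 1) return { topic_name: parts[0] };

  // Strip the provider prefix (first segment) and structural segments.
  const middle = parts
    .slice(1, -1)
    .filter((p) => !STRUCTURAL_SEGMENTS.has(p.toLowerCase()));

  return {
    topic_name: parts[parts.length - 1], // leaf segment
    parent_category: middle[0],          // first remaining segment
    subcategory: middle[1],              // second remaining segment, if any
  };
}

// Example: "Data Alliance > Audiences > Auto > SUVs > Tesla Model 3 Buyers"
// -> { topic_name: "Tesla Model 3 Buyers", parent_category: "Auto", subcategory: "SUVs" }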

Post-Import Review

After an import completes, the chatbot enters the import_review step where you can review each imported topic with quick actions:

  • Keep -- Accept the topic as-is
  • Skip -- Remove the topic from your library
  • Rename -- Edit the topic name
  • Fix All Names -- Batch action to clean all topic names using AI

TypeScript Example

async function importCSV(
  file: File,
  mappings: { csvColumn: string; targetField: string }[]
) {
  const rows = parseCSV(file); // Your CSV parser

  // Step 1: Create batch
  const batchRes = await fetch("https://app.audiencegpt.com/api/import", {
    method: "POST",
    headers: {
      "Authorization": `Bearer ${API_KEY}`,
      "Content-Type": "application/json",
    },
    body: JSON.stringify({
      filename: file.name,
      totalRows: rows.length,
      mappings,
      useLLM: false,
    }),
  });
  // Use the server-provided chunkSize rather than hardcoding 500.
  const { batchId, chunksTotal, chunkSize } = await batchRes.json();

  // Step 2: Upload chunks sequentially
  for (let i = 0; i < chunksTotal; i++) {
    const chunk = rows.slice(i * chunkSize, (i + 1) * chunkSize);

    const chunkRes = await fetch(
      `https://app.audiencegpt.com/api/import/${batchId}/chunk`,
      {
        method: "POST",
        headers: {
          "Authorization": `Bearer ${API_KEY}`,
          "Content-Type": "application/json",
        },
        body: JSON.stringify({ chunkIndex: i, rows: chunk, mappings }),
      }
    );

    const result = await chunkRes.json();
    console.log(`Chunk ${i}: ${result.successCount} created, ${result.duplicateCount} dupes`);
  }

  // Step 3: Check final status
  const statusRes = await fetch(
    `https://app.audiencegpt.com/api/import/${batchId}/status`,
    { headers: { "Authorization": `Bearer ${API_KEY}` } }
  );
  return statusRes.json();
}

Next Steps