CSV Import

The CSV Import feature (/import) lets you bulk-import audience topics from spreadsheet files. The 4-step wizard guides you through uploading your file, mapping columns, choosing a classification mode, and executing the import with real-time progress tracking. The system processes up to 50,000 rows in chunked batches of 500, with automatic duplicate detection, taxonomy path parsing, and post-import review.

Preparing Your CSV File

Before importing, ensure your CSV file is properly formatted. The system is flexible with column names and can auto-detect many common formats.

Required Columns

| Column | Required | Description |
|---|---|---|
| Topic name | Yes | The primary audience topic name. This is the only strictly required column. |

Optional Columns

| Column | Description | Example |
|---|---|---|
| Keywords | Comma-separated keyword signals | "hybrid SUV, compact crossover, AWD" |
| Category | Parent category or taxonomy type hint | "Auto", "Business Technology" |
| Segment type | B2B, B2C, B2B2C, B2E, or B2G | "B2C" |
| External ID | Your source system's identifier for this topic | "EXT-12345" |
| Source | Where this topic originated | "Data Alliance", "Experian" |
| Taxonomy path | Full taxonomy path using the > separator | "Automotive > Auto > Electric Vehicles > Tesla" |
tip

Column names are matched flexibly during the column mapping step. The system recognizes common variations like "name", "topic", "topic_name", "segment_name" for the topic name column.

Taxonomy Path Format

If your CSV includes a taxonomy path column, the system automatically parses it into the hierarchy levels. The > character is used as the separator.

Provider > Taxonomy Type > Parent Category > Subcategory > Topic Name

Example paths:

  • "Data Alliance > Automotive & Vehicles > Auto > Electric Vehicles > Tesla Model Y"
  • "Experian > Technology & Telecom > Business Technology > CRM > Salesforce"
  • "IAB > Consumer Goods & Retail > Food & Beverage > QSR > Chipotle"

The parser:

  1. Splits on the > separator
  2. Strips leading/trailing whitespace from each segment
  3. Detects and removes provider prefixes (e.g., "Data Alliance", "Experian") that are not part of the taxonomy
  4. Removes structural segments that match taxonomy type or parent category labels
  5. Extracts the leaf (last segment) as the topic name
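The parsing steps above can be sketched as follows. This is an illustrative implementation, not the system's actual code: the `KNOWN_PROVIDERS` list is a made-up subset, and step 4 (removing structural segments) is omitted because it depends on the system's taxonomy-type and category label lists.

```typescript
// Hypothetical provider list; the real system maintains its own.
const KNOWN_PROVIDERS = new Set(["data alliance", "experian", "iab"]);

interface ParsedPath {
  provider?: string;
  levels: string[];   // intermediate hierarchy levels
  topicName: string;  // the leaf segment
}

function parseTaxonomyPath(path: string): ParsedPath {
  // Steps 1-2: split on ">" and trim whitespace from each segment
  let segments = path.split(">").map((s) => s.trim()).filter(Boolean);

  // Step 3: strip a leading provider prefix if recognized
  let provider: string | undefined;
  if (segments.length > 1 && KNOWN_PROVIDERS.has(segments[0].toLowerCase())) {
    provider = segments[0];
    segments = segments.slice(1);
  }

  // Step 5: the last segment is the topic name; the rest is hierarchy
  const topicName = segments[segments.length - 1] ?? "";
  return { provider, levels: segments.slice(0, -1), topicName };
}
```

For example, `parseTaxonomyPath("Data Alliance > Auto > Electric Vehicles > Tesla Model Y")` yields provider `"Data Alliance"`, levels `["Auto", "Electric Vehicles"]`, and topic name `"Tesla Model Y"`.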
info

Provider prefix stripping is automatic. If the first segment of the path matches a known data provider name, it is stripped. The system maintains a list of recognized providers.

File Requirements

| Parameter | Limit |
|---|---|
| File format | CSV (comma-separated) |
| Maximum rows | 50,000 |
| Maximum file size | No hard limit (constrained by row count) |
| Encoding | UTF-8 recommended |
| Header row | Required (first row must be column headers) |

The 4-Step Import Wizard

Step 1: Upload

Navigate to the Import page (/import) and click Upload CSV or drag and drop your file onto the upload area.

The system parses the file immediately and shows:

  • Total row count
  • Detected column headers
  • A preview of the first few rows

If your file exceeds 50,000 rows, only the first 50,000 will be imported. A warning message indicates how many rows were truncated.

Step 2: Column Mapping

Map each column in your CSV to an AudienceGPT field. The system attempts to auto-detect mappings based on column header names, but you can adjust any mapping manually.

| Your CSV Column | Maps To |
|---|---|
| "name", "topic", "topic_name", "segment_name" | Topic Name |
| "keywords", "tags", "keyword" | Keywords |
| "category", "parent_category", "taxonomy" | Category |
| "segment", "segment_type", "type" | Segment Type |
| "external_id", "id", "source_id" | External ID |
| "source", "provider", "data_source" | Source |
| "path", "taxonomy_path", "full_path" | Taxonomy Path |

For each column, select the target field from the dropdown, or choose "Skip" to ignore a column.

warning

The Topic Name mapping is required. The import cannot proceed without at least one column mapped to Topic Name.
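Auto-detection amounts to normalizing each header and looking it up against per-field alias lists. The alias lists below mirror the mapping table above, but the matching code itself is a hypothetical sketch, not the system's implementation.

```typescript
// Field alias lists, taken from the mapping table on this page.
const FIELD_ALIASES: Record<string, string[]> = {
  topicName: ["name", "topic", "topic_name", "segment_name"],
  keywords: ["keywords", "tags", "keyword"],
  category: ["category", "parent_category", "taxonomy"],
  segmentType: ["segment", "segment_type", "type"],
  externalId: ["external_id", "id", "source_id"],
  source: ["source", "provider", "data_source"],
  taxonomyPath: ["path", "taxonomy_path", "full_path"],
};

function autoMapColumns(headers: string[]): Record<string, string> {
  const mapping: Record<string, string> = {};
  for (const header of headers) {
    // Normalize: lowercase, spaces and dashes become underscores
    const key = header.trim().toLowerCase().replace(/[\s-]+/g, "_");
    const match = Object.entries(FIELD_ALIASES).find(([, aliases]) =>
      aliases.includes(key)
    );
    mapping[header] = match ? match[0] : "skip"; // unmatched columns default to Skip
  }
  return mapping;
}
```

Under this scheme a header like "Topic Name" normalizes to `topic_name` and maps to Topic Name, while an unrecognized header defaults to Skip until you map it manually.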

Step 3: Classification Mode

Choose how imported topics should be classified:

| Mode | Description | Speed | Cost | Best For |
|---|---|---|---|---|
| Rule-Based | Deterministic local classification | Very fast | Free | Large imports, well-known categories |
| AI-Powered | Claude Sonnet 4.6 with optional web search | Slower | Per-topic API cost | Ambiguous topics, brand verification |

When AI-powered mode is selected:

  • New-to-global topics (not already in the global catalog) are classified via the AI
  • Topics that match existing global catalog entries are adopted without AI classification
  • Duplicate topics skip AI classification entirely
  • A maximum of 10 topics per chunk are classified via AI to prevent route timeouts (the IMPORT_MAX_LLM_PER_CHUNK limit)
tip

For large imports, consider using rule-based mode first, then selectively reclassifying ambiguous topics with AI afterward. This minimizes cost while ensuring accuracy where it matters most.
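The AI-mode rules above amount to a per-chunk triage. The sketch below is illustrative only: real matching uses embeddings rather than exact name equality, and routing over-cap topics to rule-based classification is an assumption about how the cap is enforced.

```typescript
// Mirrors the IMPORT_MAX_LLM_PER_CHUNK limit described on this page.
const IMPORT_MAX_LLM_PER_CHUNK = 10;

interface Topic { name: string }

function triageChunk(
  topics: Topic[],
  globalCatalog: Set<string>,   // simplified: real matching is embedding-based
  existingLibrary: Set<string>
) {
  const adopted: Topic[] = [];
  const duplicates: Topic[] = [];
  const toClassifyAI: Topic[] = [];
  const ruleFallback: Topic[] = []; // assumption: over-cap topics fall back to rules

  for (const t of topics) {
    if (existingLibrary.has(t.name)) duplicates.push(t);          // skip AI entirely
    else if (globalCatalog.has(t.name)) adopted.push(t);          // adopt without AI
    else if (toClassifyAI.length < IMPORT_MAX_LLM_PER_CHUNK) toClassifyAI.push(t);
    else ruleFallback.push(t);
  }
  return { adopted, duplicates, toClassifyAI, ruleFallback };
}
```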

Step 4: Execute

Click Start Import to begin. The system creates an import batch and begins processing.

Chunked Processing Architecture

Imports are processed in chunks of 500 rows to ensure reliability and enable progress tracking. Here is how it works:

Parse CSV → Create batch (POST /api/import)
→ Sequential chunks of 500 rows (POST /api/import/{batchId}/chunk)
→ Each chunk: classify → embed → deduplicate → batch INSERT
→ Import complete
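The flow above implies a client-side driver along these lines. The endpoint paths come from this page; the request and response shapes (`totalRows`, `batchId`, `chunkIndex`) are assumptions for illustration.

```typescript
const CHUNK_SIZE = 500;

// Split parsed rows into sequential chunks of 500.
function chunkRows<T>(rows: T[], size = CHUNK_SIZE): T[][] {
  const chunks: T[][] = [];
  for (let i = 0; i < rows.length; i += size) {
    chunks.push(rows.slice(i, i + size));
  }
  return chunks;
}

async function runImport(rows: object[]): Promise<void> {
  // Create the batch
  const res = await fetch("/api/import", {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ totalRows: rows.length }),
  });
  const { batchId } = await res.json();

  // Send chunks one at a time, in order
  const chunks = chunkRows(rows);
  for (let chunkIndex = 0; chunkIndex < chunks.length; chunkIndex++) {
    await fetch(`/api/import/${batchId}/chunk`, {
      method: "POST",
      headers: { "Content-Type": "application/json" },
      body: JSON.stringify({ chunkIndex, rows: chunks[chunkIndex] }),
    });
  }
}
```

A 1,200-row file, for example, becomes three chunks of 500, 500, and 200 rows.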

Processing Pipeline Per Chunk

For each 500-row chunk, the system:

  1. Classifies each topic through the 7-layer engine (rule-based or AI depending on your mode selection)
  2. Generates embeddings -- 256-dimensional vectors for duplicate detection
  3. Checks for duplicates -- Compares against existing topics using cosine similarity (95% threshold blocks, 75% warns) plus brand alias matching
  4. Enriches duplicates -- If a topic already exists, the system enriches the existing record with new metadata (external ID, source) using a COALESCE pattern rather than creating a redundant entry
  5. Batch inserts new topics into the database

Idempotency

Each chunk tracks its completion status. If a chunk is accidentally re-sent (e.g., due to a network retry), the system detects it and returns { skipped: true } without creating duplicate records.
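A minimal sketch of that idempotency check, assuming completion is tracked per batch by chunk index (the real system presumably persists this in the database rather than in memory):

```typescript
// batchId -> set of completed chunk indexes (illustrative in-memory store)
const completedChunks = new Map<string, Set<number>>();

function handleChunk(batchId: string, chunkIndex: number, process: () => void) {
  const done = completedChunks.get(batchId) ?? new Set<number>();
  if (done.has(chunkIndex)) {
    return { skipped: true }; // duplicate delivery: acknowledged, nothing inserted
  }
  process(); // classify → embed → deduplicate → batch INSERT
  done.add(chunkIndex);
  completedChunks.set(batchId, done);
  return { skipped: false };
}
```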

Progress Tracking

During import, a progress bar shows real-time status:

  • Chunks completed out of total (e.g., "12 / 20 chunks")
  • Topics processed -- Running count of classified topics
  • New topics -- Topics added to your Library
  • Duplicates -- Topics that matched existing records (metadata enriched)
  • Errors -- Any topics that failed classification
  • Estimated time remaining

You can cancel an in-progress import at any time. Cancellation is graceful -- topics from already-completed chunks remain in your Library, but no further chunks are processed.

info

The import status can be polled at any time via GET /api/import/{batchId}/status. If you navigate away from the page during import, you can return later to check the result.
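Polling that endpoint might look like the sketch below. The response shape (`status`, `chunksCompleted`, `totalChunks`) and the set of terminal states are assumptions, not a documented contract.

```typescript
// Assumed terminal states, based on the statuses listed on this page.
const TERMINAL_STATES = ["completed", "cancelled", "failed"];
const isTerminal = (status: string): boolean => TERMINAL_STATES.includes(status);

async function pollImportStatus(batchId: string, intervalMs = 2000) {
  for (;;) {
    const res = await fetch(`/api/import/${batchId}/status`);
    const body = await res.json();
    if (isTerminal(body.status)) return body;
    console.log(`${body.chunksCompleted} / ${body.totalChunks} chunks`);
    await new Promise((resolve) => setTimeout(resolve, intervalMs));
  }
}
```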

Retry Behavior

If a chunk fails (network error, server timeout, etc.), the system automatically retries up to 3 times with exponential backoff:

| Attempt | Wait Before Retry |
|---|---|
| 1st retry | ~1 second |
| 2nd retry | ~2 seconds |
| 3rd retry | ~4 seconds |

After 3 failed attempts, the chunk is marked as failed and the import continues with the next chunk. Failed chunks are reported in the final import summary.
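The schedule above doubles a ~1-second base delay on each retry. A sketch of that behavior, with the send function and failure handling left as illustrative placeholders:

```typescript
// 1000, 2000, 4000 ms for the default 3-retry schedule
function backoffDelays(retries = 3, baseMs = 1000): number[] {
  return Array.from({ length: retries }, (_, i) => baseMs * 2 ** i);
}

// Returns the response on success, or null once all retries are exhausted
// (at which point the chunk is marked failed and the import moves on).
async function sendWithRetry<T>(send: () => Promise<T>): Promise<T | null> {
  const delays = backoffDelays();
  for (let attempt = 0; attempt <= delays.length; attempt++) {
    try {
      return await send();
    } catch {
      if (attempt === delays.length) return null;
      await new Promise((resolve) => setTimeout(resolve, delays[attempt]));
    }
  }
  return null;
}
```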

Duplicate Detection During Import

The import pipeline uses the same dual-layer duplicate detection as single-topic classification:

  1. Semantic similarity (embeddings) -- Each imported topic is compared against all existing topics in your Library. Topics with 95%+ cosine similarity are treated as duplicates and are not re-inserted.
  2. Brand alias matching -- Known brand aliases (e.g., "Chevy" / "Chevrolet") are caught deterministically.

When a duplicate is found, instead of skipping the row entirely, the system enriches the existing topic's metadata:

  • The external_id from the import is applied if the existing topic does not have one
  • The source field is updated
  • Other metadata fields are merged using a COALESCE pattern (existing values are preserved, empty fields are filled)

This means importing the same file twice will not create duplicates but will ensure all metadata is as complete as possible.
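The similarity check and the COALESCE-style merge can be sketched as below. The thresholds and field names come from this page; the metadata shape is simplified for illustration, and in practice the merge happens in SQL rather than application code.

```typescript
const BLOCK_THRESHOLD = 0.95; // treated as a duplicate, not re-inserted
const WARN_THRESHOLD = 0.75;  // flagged for review

// Standard cosine similarity between two embedding vectors.
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, na = 0, nb = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    na += a[i] * a[i];
    nb += b[i] * b[i];
  }
  return dot / (Math.sqrt(na) * Math.sqrt(nb));
}

// COALESCE pattern: existing values win; only empty fields are filled.
interface TopicMeta { external_id: string | null; source: string | null }

function enrich(existing: TopicMeta, incoming: TopicMeta): TopicMeta {
  return {
    external_id: existing.external_id ?? incoming.external_id,
    source: existing.source ?? incoming.source,
  };
}
```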

Post-Import Review

After the import completes, you enter the import review step in the chatbot. This lets you review each imported topic one at a time and take quick actions:

Quick Actions

| Action | Description |
|---|---|
| Keep | Accept the topic as classified -- no changes needed |
| Skip | Remove the topic from your Library |
| Rename | Edit the topic name (the classification is preserved) |
| Field Edit | Modify specific classification fields (category, segment type, keywords) |

Fix All Names (Batch Action)

If many imported topics have naming issues (e.g., they still contain provider prefixes, structural segments, or formatting artifacts), use the Fix All Names batch action. This sends all remaining unreviewed topics to the AI for name cleanup in a single operation.

The AI applies the following fixes:

  • Strips provider prefixes (e.g., "Data Alliance - " prefix removed)
  • Removes structural taxonomy segments that were incorrectly included in the name
  • Fixes CSV quoting artifacts (e.g., extra quotes or escaped characters)
  • Normalizes capitalization and spacing
tip

Fix All Names is the fastest way to clean up large imports. It processes all remaining topics at once rather than requiring you to review each one individually.

Review Scoring

Each imported topic receives a review score based on name quality heuristics. Topics with low scores (indicating potential naming issues) are surfaced first in the review queue, so you address the most problematic imports first.
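A heuristic in that spirit might penalize the naming issues this page calls out (provider prefixes, path residue, quoting artifacts). The checks and weights below are assumptions, not the system's actual scoring:

```typescript
// Hypothetical name-quality score in [0, 1]; lower means more suspect.
function reviewScore(name: string): number {
  let score = 1.0;
  if (/^(data alliance|experian|iab)\b/i.test(name)) score -= 0.4; // provider prefix
  if (name.includes(">")) score -= 0.3;                            // unparsed path residue
  if (/["\\]/.test(name)) score -= 0.2;                            // CSV quoting artifacts
  if (/\s{2,}/.test(name)) score -= 0.1;                           // irregular spacing
  return Math.max(0, score);
}

// Lowest-scoring (most problematic) topics are surfaced first.
function sortForReview(names: string[]): string[] {
  return [...names].sort((a, b) => reviewScore(a) - reviewScore(b));
}
```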

Import History

All past imports are tracked in the import history. You can view:

  • Batch ID -- Unique identifier for each import
  • Date -- When the import was executed
  • Row count -- Total rows in the original file
  • Results -- New topics, duplicates, errors, and skipped
  • Classification mode -- Whether AI-powered or rule-based was used
  • Status -- Completed, cancelled, or partially failed

Access import history from the Import page to review past operations or re-import files with different settings.

Troubleshooting

Common Issues

| Problem | Cause | Solution |
|---|---|---|
| "Maximum 50,000 rows allowed" | File exceeds the row limit | Split your file into multiple CSVs of 50,000 rows or fewer |
| Column mapping not auto-detected | Unusual column header names | Manually map columns in Step 2 of the wizard |
| Many duplicates detected | Re-importing previously imported topics | This is expected behavior; existing topics have their metadata enriched |
| Chunk processing timeout | AI-powered mode on complex topics | Switch to rule-based mode; AI mode is limited to 10 LLM calls per chunk to prevent timeouts |
| Topics have provider prefix in name | Taxonomy path not parsed correctly | Check that your taxonomy path uses > as the separator; provider prefix stripping requires the correct path format |
| Import shows "cancelled" | You or another user cancelled the import | Completed chunks are preserved; restart the import with remaining data if needed |
| CSV parsing errors | File encoding issues | Ensure your file is saved as UTF-8 and avoid special characters in column headers |

Re-importing After Errors

If an import partially fails:

  1. Review the import summary to see which chunks succeeded and which failed.
  2. The topics from completed chunks are already in your Library.
  3. You can re-import the same file -- duplicate detection prevents double-counting, and only the previously failed topics will be newly processed.
info

The idempotent chunk processing means re-importing is always safe. You will never create duplicate topics by running the same import twice.

Next Steps