Background Jobs

AudienceGPT uses a generic background job system for long-running admin operations. All job types -- reclassification, dedup sweeps, merges, activation refreshes -- share the same admin_jobs table, the same create/process/poll API pattern, and the same client-driven chunk processing model. This means the admin UI drives job progress through repeated API calls rather than relying on server-side workers.

This guide covers the job system architecture, all job types, lifecycle management, monitoring, and CLI scripts for triggering jobs outside the UI.

Architecture

The admin_jobs Table

Created by migration 0020_admin_jobs.sql:

| Column | Type | Description |
| --- | --- | --- |
| id | TEXT PK | Job ID (format: job_{timestamp}_{random}) |
| job_type | TEXT | One of: reclassify, dedup_sweep, merge, refresh_activations, sync_trade_desk |
| org_id | TEXT | Organization scope (NULL for system-wide jobs) |
| created_by | TEXT | User ID who created the job |
| status | TEXT | Current state: pending, processing, completed, failed, cancelled |
| config | JSONB | Job-specific configuration (mode, filters, clusters, etc.) |
| total_items | INTEGER | Total items to process |
| processed_items | INTEGER | Items processed so far |
| success_count | INTEGER | Successfully processed items |
| error_count | INTEGER | Items that failed |
| skip_count | INTEGER | Items skipped (already up-to-date) |
| errors | JSONB | Array of error objects: [{id?, name?, message}] |
| result | JSONB | Job-specific result data (clusters, batch IDs, offsets, etc.) |
| created_at | TIMESTAMPTZ | Job creation time |
| started_at | TIMESTAMPTZ | When processing began |
| completed_at | TIMESTAMPTZ | When the job finished |

Indexes:

  • idx_admin_jobs_type_status on (job_type, status)
  • idx_admin_jobs_created_at on (created_at DESC)
  • idx_admin_jobs_org_id on (org_id, created_at DESC)

Client-Driven Processing

Unlike traditional worker-based job systems, AudienceGPT uses client-driven chunk processing:

```text
Admin UI → POST /api/admin/jobs                   (create job)
         → POST /api/admin/jobs/:jobId/process    (process chunk, repeated)
         → GET  /api/admin/jobs/:jobId            (poll status)
```

Each /process call processes one batch of items and returns progress information. The client (admin UI hook) keeps calling /process in a loop until done: true is returned.
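The loop described above can be sketched as follows. This is an illustrative client, not the actual admin UI hook; the response shape follows the fields documented in this guide, and the processChunk parameter stands in for whatever transport calls POST /api/admin/jobs/:jobId/process.

```typescript
// Illustrative client-driven chunk loop: keep calling /process until done: true.
interface ProcessResponse {
  done: boolean;
  processedItems: number;
  totalItems: number;
  successCount: number;
  errorCount: number;
  skipCount: number;
}

async function runJob(
  jobId: string,
  processChunk: (jobId: string) => Promise<ProcessResponse>,
): Promise<ProcessResponse> {
  let res: ProcessResponse;
  do {
    // Each call processes one batch server-side and returns cumulative progress.
    res = await processChunk(jobId);
    console.log(`progress: ${res.processedItems}/${res.totalItems}`);
  } while (!res.done);
  return res;
}
```

Because each iteration is a plain HTTP request, cancellation (see below) naturally takes effect between iterations.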

This pattern has several advantages:

  • No background workers or CRON infrastructure needed
  • Natural rate limiting -- one chunk per request
  • Progress is visible immediately in the UI
  • Jobs can be cancelled between chunks
  • No risk of zombie workers or orphaned processes
> **Info:** For LLM reclassification, the client still drives the loop, but Phase 1 (batch submission) and Phase 2 (polling Anthropic) may span multiple CRON ticks, with the client polling periodically.

Job Lifecycle

State Transitions

```text
pending → processing → completed
                     → failed
                     → cancelled
```

| State | Description |
| --- | --- |
| pending | Job created, no processing started |
| processing | At least one chunk has been processed |
| completed | All items processed (individual items may still have recorded errors) |
| failed | Unrecoverable error during processing |
| cancelled | Admin cancelled the job mid-processing |

Creating a Job

API: POST /api/admin/jobs

```json
{
  "jobType": "reclassify",
  "config": {
    "mode": "llm",
    "force": false,
    "enableWebSearch": true,
    "parentCategoryFilter": "Business Technology",
    "limit": 500
  }
}
```

The totalItems value is calculated server-side from the job type and config. The endpoint returns:

```json
{
  "jobId": "job_m3abc123_xyz789",
  "totalItems": 2847
}
```
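Creating a job from a script might look like the following sketch. The endpoint and payload come from this guide; the fetch wrapper and base URL are illustrative, and authentication headers are omitted.

```typescript
// Sketch: create an admin job via the documented endpoint (auth omitted).
async function createJob(
  baseUrl: string,
  jobType: string,
  config: Record<string, unknown>,
): Promise<{ jobId: string; totalItems: number }> {
  const res = await fetch(`${baseUrl}/api/admin/jobs`, {
    method: "POST",
    headers: { "Content-Type": "application/json" },
    body: JSON.stringify({ jobType, config }),
  });
  if (!res.ok) throw new Error(`job creation failed: ${res.status}`);
  return res.json() as Promise<{ jobId: string; totalItems: number }>;
}
```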

Processing Chunks

API: POST /api/admin/jobs/:jobId/process

The process route (maxDuration: 60s) dispatches to the appropriate handler based on job_type:

| Job Type | Handler | Batch Size |
| --- | --- | --- |
| reclassify | handleReclassify() | 50 (local), 100 sub-batches (LLM) |
| reclassify (rollback) | handleRollback() | 50 |
| dedup_sweep | handleDedupSweep() | 50 |
| merge | handleMerge() | 10 |
| sync_trade_desk | handleTradeDeskSync() | Varies |

Each call returns:

```json
{
  "done": false,
  "processedItems": 150,
  "totalItems": 2847,
  "successCount": 148,
  "errorCount": 2,
  "skipCount": 0
}
```

When done: true, the job has completed.

Cancelling a Job

API: PATCH /api/admin/jobs/:jobId with { "status": "cancelled" }

Cancellation takes effect between chunks -- the current chunk completes, but no further chunks are processed.

Listing Jobs

API: GET /api/admin/jobs

Query parameters:

| Parameter | Description | Default |
| --- | --- | --- |
| jobType | Filter by job type | All types |
| status | Filter by status | All statuses |
| limit | Maximum results (1-100) | 20 |

Returns jobs ordered by created_at DESC.
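A listing URL with the query parameters above can be built like this. The helper name is illustrative; only the path and parameter names come from the API.

```typescript
// Illustrative: build a filtered job-listing URL from the documented query parameters.
function listJobsUrl(filters: { jobType?: string; status?: string; limit?: number }): string {
  const params = new URLSearchParams();
  if (filters.jobType) params.set("jobType", filters.jobType);
  if (filters.status) params.set("status", filters.status);
  if (filters.limit !== undefined) params.set("limit", String(filters.limit));
  const qs = params.toString();
  return qs ? `/api/admin/jobs?${qs}` : "/api/admin/jobs";
}
```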

Job Types

Reclassify

Re-runs the classification pipeline on global topics. Two modes:

Local Mode

Processes topics sequentially using the deterministic regex-based classification engine. No API calls. Fast (~50 topics/second) and free.

Config options:

| Field | Type | Description |
| --- | --- | --- |
| mode | "local" | Required for local mode |
| force | boolean | Reclassify all topics, not just outdated ones |
| limit | number | Maximum topics to process |
| parentCategoryFilter | string | Only reclassify this parent category |

Batch size: 50 topics per chunk (RECLASSIFY_BATCH_SIZE)

LLM Mode

Uses the Anthropic Message Batches API for AI-powered reclassification. Three phases:

  1. Phase 1 -- Submit: Collects eligible topics, splits into sub-batches of 100, submits to Anthropic
  2. Phase 2a -- Poll & Stage: Polls batch status each CRON tick. Streams completed results into batch_results_staging table. Time-bounded to 45 seconds per tick.
  3. Phase 2b -- Post-process: Processes staged results in chunks of 500. Validates, updates topics, records history.

Config options:

| Field | Type | Description |
| --- | --- | --- |
| mode | "llm" | Required for LLM mode |
| force | boolean | Reclassify all topics, not just outdated ones |
| limit | number | Maximum topics to process |
| parentCategoryFilter | string | Only reclassify this parent category |
| enableWebSearch | boolean | Enable Claude web search tool (default: true; disabling saves ~50% cost) |
| model | string | Override classification model (default: CLASSIFICATION_MODEL) |

Sub-batch size: 100 topics per Anthropic batch (SUB_BATCH_SIZE)

Column Mode

Reclassifies a single field across all topics using a focused LLM prompt. Uses the same three-phase batch pattern as full LLM mode.

Additional config:

| Field | Type | Description |
| --- | --- | --- |
| targetColumn | string | Column to reclassify: segment_type, taxonomy_type, parent_category, subcategory, audience_type |
> **Tip:** Column-level reclassification is significantly cheaper than full reclassification because the prompt only asks for one field. Use this when you know only a specific field needs correction across the catalog.
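A column-mode config combines the LLM-mode fields with targetColumn. The values below are examples, not defaults from the codebase:

```typescript
// Example column-mode job config (field names follow the tables above; values are illustrative).
const columnModeConfig = {
  mode: "llm" as const,
  targetColumn: "segment_type",
  force: false,
  enableWebSearch: false, // single-field fixes rarely need web verification
};
```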

Dedup Sweep

Scans topics modified by a reclassify job to find near-duplicates using embedding similarity.

Config:

| Field | Type | Description |
| --- | --- | --- |
| sourceBatchId | string | The job ID of the source reclassify job |

Batch size: 50 topics per chunk (DEDUP_BATCH_SIZE)

Similarity threshold: 0.95 (cosine similarity)

The sweep accumulates clusters in the job's result.clusters array across chunks. When complete, the clusters are available for review and merge.
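The 0.95 threshold is standard cosine similarity over embedding vectors. As a minimal sketch (function names are illustrative, not from the codebase):

```typescript
// Cosine similarity between two embedding vectors: dot(a, b) / (|a| * |b|).
function cosineSimilarity(a: number[], b: number[]): number {
  let dot = 0, normA = 0, normB = 0;
  for (let i = 0; i < a.length; i++) {
    dot += a[i] * b[i];
    normA += a[i] * a[i];
    normB += b[i] * b[i];
  }
  return dot / (Math.sqrt(normA) * Math.sqrt(normB));
}

const SIMILARITY_THRESHOLD = 0.95;

// Two topics are near-duplicate candidates when their embeddings clear the threshold.
function isNearDuplicate(a: number[], b: number[]): boolean {
  return cosineSimilarity(a, b) >= SIMILARITY_THRESHOLD;
}
```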

See Dedup & Merge for full details.

Merge

Merges duplicate topic clusters identified by a dedup sweep.

Config:

| Field | Type | Description |
| --- | --- | --- |
| clusters | array | Array of DedupCluster objects with winner/loser information |

Batch size: 10 clusters per chunk (MERGE_BATCH_SIZE)

Each cluster merge involves:

  1. Transferring or consolidating org topic links
  2. Transferring activations and history
  3. Soft-deleting the loser topic (merged_into = winnerId)
  4. Recording merge history

See Dedup & Merge for the full merge cascade description.

Refresh Activations

Refreshes stale segment activations by re-pushing updated names/descriptions to platform APIs.

Config: Org-scoped (uses org_id from the job record)

This job type is used by the hygiene dashboard's "Refresh All Stale" button. It finds all activations where the pushed segment name or version differs from the current topic's values and re-pushes them.

Rollback

Any completed reclassify job can be rolled back:

API: POST /api/admin/jobs/:jobId/rollback

This creates a new job of type reclassify with config.rollbackOf set to the original job's ID. The rollback handler:

  1. Queries topic_history for entries with the source batchId
  2. Restores each topic's previousValues from the history snapshot
  3. Regenerates embeddings from the restored values
  4. Records rolled_back history entries

Batch size: 50 topics per chunk (same as reclassify)

> **Warning:** Rollback only restores global topic data. Org topic transfers from merge operations are not automatically reversed. See Dedup & Merge for rollback limitations.

Retry Errors

Failed topics from a reclassify job can be retried:

API: POST /api/admin/jobs/:jobId/retry-errors

This creates a new reclassify job that:

  • Targets only the topic IDs that appear in the source job's errors array
  • Inherits the mode (local/LLM) from the source job
  • Uses force: true to ensure reprocessing
  • Stores a retryOf reference for traceability
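The steps above amount to deriving a new config from the source job's errors array. A sketch, assuming the documented error-object shape (the helper itself is hypothetical):

```typescript
// Error objects as documented for the errors JSONB array.
interface JobError { id?: string; name?: string; message: string }

// Build a retry config from a source job: target only failed topic IDs,
// keep the original mode, and force reprocessing.
function buildRetryConfig(sourceJobId: string, errors: JobError[], mode: "local" | "llm") {
  const topicIds = errors
    .map((e) => e.id)
    .filter((id): id is string => typeof id === "string");
  return { mode, force: true, topicIds, retryOf: sourceJobId };
}
```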

Error Handling

Job-Level Errors

Errors are accumulated in the errors JSONB array on the job record. Each error object:

```json
{
  "id": "topic_abc123",
  "name": "Salesforce CRM",
  "message": "Invalid segment_type: null returned by LLM"
}
```

Errors are appended via appendJobErrors() and never cleared -- they represent the full error history of the job.

Job Failure

If an unrecoverable error occurs during processing (e.g., database connection failure), the job is marked as failed via failAdminJob(). The error message is stored in the job record.

Topic-Level Flagging

When a topic fails with a validation error (e.g., "Invalid segment_type"), the reclassify handler flags the topic for human review:

```sql
UPDATE topics
SET review_status = 'needs_review', review_reason = $message
WHERE id = $topicId
```

This feeds into the Review Queue.

Monitoring Jobs

Admin UI

Navigate to Admin > Background Jobs to see:

  • List of recent jobs with status, progress, and timing
  • Per-job detail with config, progress bars, and error counts
  • Action buttons for rollback, retry errors, and cancel

Background Jobs Dashboard

API Polling

```text
# List recent jobs
GET /api/admin/jobs?limit=10

# Get specific job status
GET /api/admin/jobs/:jobId

# List only processing jobs
GET /api/admin/jobs?status=processing
```

Batch Constants Reference

| Constant | Value | Used By |
| --- | --- | --- |
| RECLASSIFY_BATCH_SIZE | 50 | Local reclassify, rollback |
| DEDUP_BATCH_SIZE | 50 | Dedup sweep |
| MERGE_BATCH_SIZE | 10 | Merge handler |
| SUB_BATCH_SIZE | 100 | LLM sub-batch submission |
| BATCH_PROCESS_CHUNK_SIZE | 500 | Post-processing staged LLM results |
| STAGING_CHUNK_SIZE | 2000 | Max results to stage per CRON tick |
| STAGING_TIME_BUDGET_MS | 45000 | Time budget for staging (45 seconds) |

CLI Scripts

Several admin operations can be triggered from the command line:

Global Reclassify

```bash
# Rule-based reclassify (local engine)
bun run reclassify-global

# AI-powered reclassify (Anthropic API with local fallback)
bun run reclassify-global --llm

# Preview without making changes
bun run reclassify-global --dry-run

# Reclassify with limit and concurrency
bun run reclassify-global --llm --limit 500 --concurrency 5

# Force reclassify all topics (not just outdated)
bun run reclassify-global --force

# Verbose output
bun run reclassify-global --verbose
```

Taxonomy Tree Realignment

```bash
# Reclassify subcategories to tree node labels
bun run scripts/realign-subcategories.ts

# With filters
bun run scripts/realign-subcategories.ts --taxonomy-filter "Auto" --limit 10

# Target specific topic IDs
bun run scripts/realign-subcategories.ts --ids-file /tmp/ids.json

# Fix subcategory display values
bun run scripts/fix-subcategory-level0.ts

# Backfill taxonomy_node_id via LLM
bun run scripts/backfill-taxonomy-nodes-llm.ts
```

Activation Refresh

```bash
# Refresh all stale LiveRamp activations
bun run refresh-activations

# Preview without making changes
bun run refresh-activations --dry-run

# Limit the number of activations to refresh
bun run refresh-activations --limit 50
```

Credential Encryption

```bash
# Encrypt existing plaintext sync credentials
bun run encrypt-credentials

# Preview without making changes
bun run encrypt-credentials --dry-run
```

Taxonomy Tree Seeding

```bash
# Seed all taxonomy tree nodes
bun run scripts/seed-all-trees.sh
```

Job Store API Reference

The admin-job-store.ts provides these functions used by handlers:

| Function | Description |
| --- | --- |
| createAdminJob(params) | Create a new job record |
| getAdminJob(jobId) | Fetch a job by ID |
| listAdminJobs(filters) | List jobs with optional type/status/limit filters |
| updateAdminJobProgress(jobId, updates) | Update progress counters and result data |
| completeAdminJob(jobId, result, counters?) | Mark job as completed with final result |
| failAdminJob(jobId, errorMessage) | Mark job as failed |
| appendJobErrors(jobId, errors) | Add errors to the errors array |

Best Practices

  1. Run dedup sweeps after reclassify jobs -- changed embeddings may create new near-duplicates
  2. Monitor the errors array -- patterns in errors (e.g., all topics from one parent category failing) indicate systematic issues
  3. Use column-level reclassify for targeted fixes -- it costs much less than a full reclassify
  4. Cancel stuck jobs promptly -- if a job shows no progress after several minutes, cancel it and investigate
  5. Review the staging table -- for LLM jobs, the batch_results_staging table shows intermediate results. It is automatically cleaned up when processing completes.
  6. Disable web search for cost savings -- if you do not need web verification, disabling web search cuts LLM reclassify costs by approximately 50%

Next Steps