
Taxonomy Tree

The taxonomy tree provides a hierarchical classification structure underneath each of the 41 parent categories. It uses an adjacency list pattern (each node points to its parent via parent_id) combined with materialized paths (each node stores its full path as a human-readable string like "Electric Vehicles > Battery Technology"). Topics link to tree nodes via the taxonomy_node_id foreign key, enabling precise subcategory-level classification.

This guide covers the tree data model, how to browse and edit the tree, seeding strategies, and how to realign topics to tree nodes.

Data Model

The taxonomy_tree Table

Created by migration 0031_taxonomy_tree.sql:

| Column | Type | Description |
| --- | --- | --- |
| id | TEXT PK | Deterministic ID generated from taxonomy_type + path |
| taxonomy_type | TEXT | The parent category this node belongs to (one of 41 types, stored as DB taxonomy_type) |
| parent_id | TEXT FK | References taxonomy_tree.id for the parent node. NULL for root (L0) nodes |
| label | TEXT | Human-readable node label (e.g., "Electric Vehicles") |
| level | SMALLINT | Depth in the tree. L0 = root subcategory, L1 = first child, L2 = second child, etc. |
| path | TEXT | Materialized path using the > separator (e.g., "Electric Vehicles > Battery Technology") |
| sort_order | INTEGER | Order among siblings at the same level |
| source | TEXT | Origin of the node: seed, admin, or auto-seed |
| is_active | BOOLEAN | Soft-delete flag. Inactive nodes are hidden from the UI |
| created_at | TIMESTAMPTZ | Creation timestamp |
| updated_at | TIMESTAMPTZ | Last modification timestamp |

Unique constraint: (taxonomy_type, path) -- no two nodes can have the same full path within a parent category.

Indexes:

  • idx_tree_taxonomy_type on taxonomy_type
  • idx_tree_parent_id on parent_id
  • idx_tree_level on (taxonomy_type, level)

Topic-to-Tree Linking

Topics link to tree nodes through two columns on the topics table:

| Column | Type | Description |
| --- | --- | --- |
| taxonomy_node_id | TEXT FK | References taxonomy_tree.id. SET NULL on node deletion |
| taxonomy_path | TEXT | Denormalized copy of the node's path for fast display |

An index on taxonomy_node_id supports efficient lookups of all topics assigned to a given node.

Node ID Generation

Node IDs are deterministic, generated by slugifying the taxonomy_type and path:

auto::electric-vehicles                          -- L0
auto::electric-vehicles-battery-technology -- L1

This ensures idempotent node creation -- re-running seed scripts does not create duplicates.
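
As a minimal sketch of how such an ID could be derived (the slugify rules and the makeNodeId signature below are assumptions chosen to reproduce the two examples above, not the project's actual implementation):

```typescript
// Hypothetical slugifier: lowercases the input and collapses every run of
// non-alphanumeric characters (spaces, ">", punctuation) into one hyphen.
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, ""); // trim leading and trailing hyphens
}

// Assumed signature: the real makeNodeId() may take different arguments.
function makeNodeId(taxonomyType: string, path: string): string {
  return `${slugify(taxonomyType)}::${slugify(path)}`;
}
```

Because the output depends only on taxonomy_type and path, calling the function twice with the same inputs yields the same ID, which is what makes ON CONFLICT DO NOTHING inserts safe to repeat.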

Hierarchy Structure

The tree reads top-down as:

taxonomy_type (13 groups: "Automotive & Vehicles", "Technology & Telecom", ...)
|-- parent_category (41 types: "Auto", "Business Technology", ...) -- This IS the taxonomy_type column in the tree
    |-- L0 node (root subcategory: "Electric Vehicles")
        |-- L1 node ("Battery Technology")
            |-- L2 node ("Lithium-Ion Batteries")
                |-- L3 node (deeper if needed)

Relationship to Parent Categories

Each tree is scoped to one of the 41 parent categories (stored in the taxonomy_type column of the taxonomy_tree table). The 13 taxonomy type groups are a higher-level grouping that exists in the constants file but is not directly represented in the tree table. The tree starts at the parent category level and provides subcategory structure underneath.

Browsing the Tree

Navigate to Admin > Topics > Tree Mapping tab.


Tree Summary View

When no parent category is selected, the API returns a summary of all parent categories that have trees:

API: GET /api/admin/taxonomy-tree

Returns for each parent category:

  • taxonomy_type -- The parent category name
  • node_count -- Total number of tree nodes
  • category_count -- Number of L0 (root) nodes
  • max_depth -- Deepest level in the tree
  • assigned_topics -- Topics linked to tree nodes
  • unassigned_topics -- Topics with this parent category but no taxonomy_node_id

Detailed Tree View

Select a parent category to see its full nested tree:

API: GET /api/admin/taxonomy-tree?parent_category=Auto

Returns:

  • parent_category -- The selected parent category
  • node_count -- Total nodes in this tree
  • unassigned_topics -- Count of topics not yet mapped
  • tree -- Nested array of nodes, each with:
    • id, label, level, path, sort_order, is_active
    • topic_count -- Number of topics linked to this node
    • children -- Nested child nodes (recursively)

The tree is built from flat database rows using an in-memory adjacency list algorithm that wires up parent-child relationships and sorts children by sort_order.
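
The nesting step can be sketched as follows. This is an illustrative version under assumptions: the FlatNode and TreeNode types and the buildTree name are hypothetical, and treating nodes with a missing parent row as roots is a choice made here, not documented behavior.

```typescript
interface FlatNode {
  id: string;
  parent_id: string | null;
  label: string;
  level: number;
  path: string;
  sort_order: number;
  is_active: boolean;
}

interface TreeNode extends FlatNode {
  children: TreeNode[];
}

function buildTree(rows: FlatNode[]): TreeNode[] {
  // First pass: index every row by id so parents can be found in O(1).
  const byId = new Map<string, TreeNode>();
  for (const row of rows) {
    byId.set(row.id, { ...row, children: [] });
  }

  // Second pass: attach each node to its parent, or collect it as a root.
  const roots: TreeNode[] = [];
  for (const row of rows) {
    const node = byId.get(row.id)!;
    const parent = row.parent_id === null ? undefined : byId.get(row.parent_id);
    if (parent) parent.children.push(node);
    else roots.push(node); // assumption: orphans are surfaced as roots
  }

  // Sort siblings by sort_order at every depth.
  const sortRecursively = (nodes: TreeNode[]): void => {
    nodes.sort((a, b) => a.sort_order - b.sort_order);
    for (const n of nodes) sortRecursively(n.children);
  };
  sortRecursively(roots);
  return roots;
}
```

Two passes keep the algorithm linear in the number of rows (plus the sort) and make it independent of the order in which the database returns them.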

Adding Tree Nodes

Via the Admin UI

  1. Select a parent category in the tree browser
  2. Click Add Node at the desired level
  3. Enter the node label (2--100 characters)
  4. If adding a child node, select the parent node
  5. Confirm

API: POST /api/admin/taxonomy-tree

{
"parent_category": "Auto",
"label": "Electric Vehicles",
"parent_id": null
}

For child nodes:

{
"parent_category": "Auto",
"label": "Battery Technology",
"parent_id": "auto::electric-vehicles"
}

The system automatically:

  • Calculates the level from the parent (parent's level + 1, or 0 for root)
  • Builds the path by appending the label to the parent's path
  • Assigns the next sort_order among siblings
  • Generates a deterministic id from the taxonomy type and path

Tip: Node labels should be concise and descriptive. They become the subcategory value when topics are mapped to them, and they appear in platform output names via the template engine's {{subcategory}} field.
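
The level, path, and sort_order values that the system derives for a new node can be sketched like this. The function name, the input shapes, and the "highest sibling value plus one, starting at 0" rule for sort_order are assumptions for illustration; the API may number siblings differently.

```typescript
interface ParentInfo {
  level: number;
  path: string;
}

interface DerivedFields {
  level: number;
  path: string;
  sortOrder: number;
}

function deriveNodeFields(
  label: string,
  parent: ParentInfo | null,
  siblingSortOrders: number[],
): DerivedFields {
  const cleanLabel = label.trim();
  // Root nodes sit at level 0; children are one level below their parent.
  const level = parent ? parent.level + 1 : 0;
  // The materialized path appends the label to the parent's path.
  const path = parent ? `${parent.path} > ${cleanLabel}` : cleanLabel;
  // Assumed rule: place the new node after the last existing sibling.
  const sortOrder =
    siblingSortOrders.length === 0 ? 0 : Math.max(...siblingSortOrders) + 1;
  return { level, path, sortOrder };
}
```

For example, adding "Battery Technology" under a parent at level 0 with path "Electric Vehicles" gives level 1 and the path "Electric Vehicles > Battery Technology".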

Seeding Trees

Auto-Seed from Existing Topics

The fastest way to create an initial tree is to auto-seed from existing topic subcategory values:

  1. Navigate to Admin > Topics > Tree Mapping
  2. Select a parent category
  3. Click Auto-Seed

API: POST /api/admin/taxonomy-tree/auto-seed

{
"parent_category": "Auto"
}

The auto-seed process:

  1. Queries all unmapped topics (taxonomy_node_id IS NULL) with this parent category that have non-empty subcategory values
  2. Groups by distinct subcategory value with counts, ordered by frequency descending
  3. For each unique subcategory:
    • Creates an L0 tree node (using ON CONFLICT DO NOTHING for idempotency)
    • Maps matching topics by updating taxonomy_node_id and taxonomy_path (case-insensitive match on trimmed subcategory)
  4. Returns the number of nodes created and topics mapped, plus per-node details

Warning: Auto-seed only creates L0 (root-level) nodes. It does not build a multi-level hierarchy. Use this as a starting point, then manually organize nodes into a deeper tree structure if needed.
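
The grouping in steps 1 and 2 of the auto-seed process happens in SQL, but the logic can be illustrated in memory. The sketch below is hypothetical: the type and function names are invented, and grouping case-insensitively on the trimmed value is an assumption that mirrors the matching rule in step 3.

```typescript
interface TopicRow {
  id: string;
  subcategory: string | null;
  taxonomy_node_id: string | null;
}

interface SubcategoryGroup {
  label: string; // first-seen spelling, used as the L0 node label
  topicIds: string[];
}

function groupUnmappedSubcategories(topics: TopicRow[]): SubcategoryGroup[] {
  const groups = new Map<string, SubcategoryGroup>();
  for (const topic of topics) {
    if (topic.taxonomy_node_id !== null) continue; // already mapped
    const label = (topic.subcategory ?? "").trim();
    if (label === "") continue; // no usable subcategory value
    const key = label.toLowerCase();
    const existing = groups.get(key);
    if (existing) existing.topicIds.push(topic.id);
    else groups.set(key, { label, topicIds: [topic.id] });
  }
  // Most frequent subcategories first.
  return Array.from(groups.values()).sort(
    (a, b) => b.topicIds.length - a.topicIds.length,
  );
}
```

Each resulting group corresponds to one candidate L0 node together with the topics that would be mapped to it.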

Seed Scripts (CLI)

For bulk tree creation from predefined structures, use the CLI seed scripts:

# Seed all taxonomy trees from predefined structures
bun run scripts/seed-all-trees.sh

# Individual seed scripts exist per parent category
bun run scripts/seed-auto-trees.ts
bun run scripts/seed-tech-trees.ts
# etc.

Each seed script:

  • Defines a hierarchical node structure in code
  • Generates deterministic IDs using the same makeNodeId() function as the API
  • Uses ON CONFLICT DO NOTHING for safe re-runs
  • Sets source = 'seed' on created nodes

Mapping Topics to Tree Nodes

Unmapped Topics Browser

The unmapped topics view shows topics that have a parent category but no taxonomy_node_id:

API: GET /api/admin/topics/unmapped

Query parameters:

| Parameter | Description | Default |
| --- | --- | --- |
| parent_category | Filter by parent category | All categories |
| page | Page number | 1 |
| pageSize | Items per page (max 200) | 50 |
| search | Search by topic name (ILIKE) | None |

The response includes a breakdown array showing the count of unmapped topics per parent category, useful for prioritizing which trees to work on first.
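
A client might assemble the request URL from these parameters as shown below. The helper name and option names are hypothetical, and clamping pageSize on the client is an assumption; the server may instead reject or clamp out-of-range values itself.

```typescript
interface UnmappedQuery {
  parentCategory?: string;
  page?: number;
  pageSize?: number;
  search?: string;
}

function buildUnmappedUrl(query: UnmappedQuery = {}): string {
  // Apply the documented defaults and the documented maximum page size.
  const page = Math.max(1, query.page ?? 1);
  const pageSize = Math.min(200, Math.max(1, query.pageSize ?? 50));

  const params: string[] = [`page=${page}`, `pageSize=${pageSize}`];
  if (query.parentCategory) {
    params.push(`parent_category=${encodeURIComponent(query.parentCategory)}`);
  }
  if (query.search) {
    params.push(`search=${encodeURIComponent(query.search)}`);
  }
  return `/api/admin/topics/unmapped?${params.join("&")}`;
}
```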

Bulk Mapping

Select unmapped topics and assign them to a tree node:

  1. In the unmapped topics browser, select topics using checkboxes
  2. Choose a target tree node from the dropdown
  3. Click Map Selected

API: POST /api/admin/topics/unmapped

{
"topicIds": ["topic_1", "topic_2", "topic_3"],
"nodeId": "auto::electric-vehicles"
}

This updates each topic's:

  • taxonomy_node_id -- Set to the selected node's ID
  • subcategory -- Set to the node's label
  • taxonomy_path -- Set to the node's path
  • updated_at -- Current timestamp

Maximum 500 topics per request.
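
When more than 500 topics need to go to the same node, the selection has to be split across several requests. A minimal sketch, assuming the request body shape shown above (the helper name and constant are hypothetical):

```typescript
const MAX_TOPICS_PER_REQUEST = 500;

interface BulkMapRequest {
  topicIds: string[];
  nodeId: string;
}

// Splits a long list of topic IDs into request bodies of at most 500 IDs each.
function buildBulkMapRequests(topicIds: string[], nodeId: string): BulkMapRequest[] {
  const requests: BulkMapRequest[] = [];
  for (let i = 0; i < topicIds.length; i += MAX_TOPICS_PER_REQUEST) {
    requests.push({
      topicIds: topicIds.slice(i, i + MAX_TOPICS_PER_REQUEST),
      nodeId,
    });
  }
  return requests;
}
```

Each element can then be sent as the JSON body of a separate POST /api/admin/topics/unmapped call.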

Tip: When mapping topics to tree nodes, the topic's subcategory text value is overwritten with the node's label. This ensures consistency between the tree structure and the displayed subcategory value. The template engine uses this value for {{subcategory}} in platform output names.

Tree Realignment (LLM Batch)

For large-scale realignment of topics to tree nodes, use the CLI realignment script. This uses Claude batches to reclassify topic subcategories to match tree node labels.

Running Realignment

# Preview without making changes
bun run scripts/realign-subcategories.ts --dry-run

# Realign all topics (processes in LLM batches)
bun run scripts/realign-subcategories.ts

# Filter by parent category and limit
bun run scripts/realign-subcategories.ts --taxonomy-filter "Auto" --limit 10

# Target specific topic IDs from a JSON file
bun run scripts/realign-subcategories.ts --ids-file /tmp/ids.json

# Force reclassify even if taxonomy_node_id is already set
bun run scripts/realign-subcategories.ts --force

How Realignment Works

  1. Fetch tree paths: Loads all tree node labels grouped by parent category using fetchTreePathMap()
  2. Collect eligible topics: Finds topics where taxonomy_node_id IS NULL (or all topics with --force)
  3. Build LLM prompts: For each topic, provides the topic name, current classification context, and the available tree node labels for its parent category
  4. Submit as batch: Sends the prompts to the Anthropic Message Batches API, which is billed at a 50% discount compared with standard requests
  5. Post-process results: Maps each LLM response to a tree node, sets taxonomy_node_id and updates subcategory to match
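
The post-processing in step 5 is not specified in detail here. One plausible approach, shown purely as an assumption, is to match the label returned by the model against the candidate nodes for the topic's parent category, ignoring case and surrounding whitespace, and to leave the topic unmapped when nothing matches. All names below are hypothetical.

```typescript
interface LabelledNode {
  id: string;
  label: string;
  path: string;
}

// Returns the node whose label matches the model's answer, or null if none does.
function resolveNodeByLabel(
  answer: string,
  candidates: LabelledNode[],
): LabelledNode | null {
  const normalize = (s: string): string => s.trim().toLowerCase();
  const wanted = normalize(answer);
  if (wanted === "") return null;
  return candidates.find((node) => normalize(node.label) === wanted) ?? null;
}
```

Returning null rather than guessing keeps unmatched topics in the unmapped pool, where they can be reviewed manually.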

Display Fix Script

After realignment, subcategory text values may not match their L0 tree node labels. The fix script corrects this:

# Preview changes
bun run scripts/fix-subcategory-level0.ts --dry-run

# Apply fixes
bun run scripts/fix-subcategory-level0.ts

This updates the subcategory text column on all topics that have a taxonomy_node_id to match the tree node's L0 ancestor label. This ensures the displayed subcategory aligns with the tree hierarchy.
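
Finding a node's L0 ancestor means following parent_id links until a root is reached. The sketch below illustrates that walk over an in-memory map; the function name and types are hypothetical, and the actual script may instead resolve the ancestor in SQL or from the first segment of the path.

```typescript
interface NodeRef {
  id: string;
  parent_id: string | null;
  label: string;
}

// Walks up the adjacency list and returns the label of the L0 ancestor,
// or null if the chain is broken (missing parent) or contains a cycle.
function findL0Label(nodeId: string, nodesById: Map<string, NodeRef>): string | null {
  const seen = new Set<string>();
  let current = nodesById.get(nodeId);
  while (current && current.parent_id !== null && !seen.has(current.id)) {
    seen.add(current.id);
    current = nodesById.get(current.parent_id);
  }
  return current && current.parent_id === null ? current.label : null;
}
```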

Backfilling taxonomy_node_id via LLM

For topics that have a subcategory text value but no taxonomy_node_id, use the backfill script:

# Preview without writing
bun run scripts/backfill-taxonomy-nodes-llm.ts --dry-run

# Run backfill
bun run scripts/backfill-taxonomy-nodes-llm.ts

This script uses LLM batches to match existing topic subcategory values to the closest tree node label, then sets the taxonomy_node_id accordingly.

Tree Node Reference

Level Conventions

| Level | Name | Example | Purpose |
| --- | --- | --- | --- |
| L0 | Root subcategory | "Electric Vehicles" | Primary subcategory classification |
| L1 | First child | "Battery Technology" | Narrower specialization |
| L2 | Second child | "Lithium-Ion Batteries" | Specific sub-specialization |
| L3+ | Deeper levels | As needed | Fine-grained classification |

Source Values

| Source | Origin |
| --- | --- |
| seed | CLI seed scripts |
| admin | Created manually via admin UI |
| auto-seed | Created by the auto-seed API |

Path Format Examples

"Electric Vehicles"                                     -- L0
"Electric Vehicles > Battery Technology" -- L1
"Electric Vehicles > Battery Technology > Lithium-Ion" -- L2

Paths use " > " (a greater-than sign with one space on each side) as the separator. The path is the concatenation of all ancestor labels plus the current node's label, ordered from the L0 ancestor down to the node itself.

Template Engine Integration

Tree levels are available as template context variables for platform output names:

| Variable | Source |
| --- | --- |
| {{subcategory}} | L0 node label (the topic's direct subcategory) |
| {{subcategory_l1}} | L1 node label from the taxonomy path |
| {{subcategory_l2}} | L2 node label from the taxonomy path |
| {{subcategory_l3}} | L3 node label from the taxonomy path |
| {{subcategory_l4}} | L4 node label from the taxonomy path |

These are extracted from the taxonomy_path field by splitting on the > separator and indexing into the resulting array, so index 0 holds the L0 label, index 1 the L1 label, and so on.
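
The extraction can be sketched as follows. The function name is hypothetical, and returning an empty string for levels the path does not reach is an assumption; the template engine may treat missing levels differently.

```typescript
const SUBCATEGORY_VARIABLES = [
  "subcategory",
  "subcategory_l1",
  "subcategory_l2",
  "subcategory_l3",
  "subcategory_l4",
];

// Splits a materialized path into per-level template variables.
function subcategoryVariables(taxonomyPath: string | null): Record<string, string> {
  const segments = (taxonomyPath ?? "")
    .split(">")
    .map((segment) => segment.trim())
    .filter((segment) => segment !== "");

  const variables: Record<string, string> = {};
  SUBCATEGORY_VARIABLES.forEach((name, index) => {
    variables[name] = segments[index] ?? ""; // assumed fallback for missing levels
  });
  return variables;
}
```

Splitting on the bare > character and trimming each segment tolerates paths whose spacing around the separator is inconsistent.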

Next Steps