Taxonomy Tree
The taxonomy tree provides a hierarchical classification structure underneath each of the 41 parent categories. It uses an adjacency list pattern (each node points to its parent via parent_id) combined with materialized paths (each node stores its full path as a human-readable string like "Electric Vehicles > Battery Technology"). Topics link to tree nodes via the taxonomy_node_id foreign key, enabling precise subcategory-level classification.
This guide covers the tree data model, how to browse and edit the tree, seeding strategies, and how to realign topics to tree nodes.
Data Model
The taxonomy_tree Table
Created by migration 0031_taxonomy_tree.sql:
| Column | Type | Description |
|---|---|---|
| id | TEXT PK | Deterministic ID generated from taxonomy_type + path |
| taxonomy_type | TEXT | The parent category this node belongs to (one of 41 types, stored as DB taxonomy_type) |
| parent_id | TEXT FK | References taxonomy_tree.id for the parent node. NULL for root (L0) nodes |
| label | TEXT | Human-readable node label (e.g., "Electric Vehicles") |
| level | SMALLINT | Depth in the tree. L0 = root subcategory, L1 = first child, L2 = second, etc. |
| path | TEXT | Materialized path using > separator (e.g., "Electric Vehicles > Battery Technology") |
| sort_order | INTEGER | Order among siblings at the same level |
| source | TEXT | Origin of the node: seed, admin, auto-seed |
| is_active | BOOLEAN | Soft-delete flag. Inactive nodes are hidden from the UI |
| created_at | TIMESTAMPTZ | Creation timestamp |
| updated_at | TIMESTAMPTZ | Last modification timestamp |
Unique constraint: (taxonomy_type, path) -- no two nodes can have the same full path within a parent category.
Indexes:
- idx_tree_taxonomy_type on taxonomy_type
- idx_tree_parent_id on parent_id
- idx_tree_level on (taxonomy_type, level)
Topic-to-Tree Linking
Topics link to tree nodes through two columns on the topics table:
| Column | Type | Description |
|---|---|---|
| taxonomy_node_id | TEXT FK | References taxonomy_tree.id. SET NULL on node deletion |
| taxonomy_path | TEXT | Denormalized copy of the node's path for fast display |
An index on taxonomy_node_id supports efficient lookups of all topics assigned to a given node.
Node ID Generation
Node IDs are deterministic, generated by slugifying the taxonomy_type and path:
```
auto::electric-vehicles                      -- L0
auto::electric-vehicles-battery-technology   -- L1
```
This ensures idempotent node creation -- re-running seed scripts does not create duplicates.
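A minimal TypeScript sketch of how such IDs can be derived, assuming a slug rule (lowercase, every run of non-alphanumeric characters collapsed into one hyphen) that reproduces the two examples above. The real makeNodeId() may normalize differently; slugify here is a hypothetical helper.

```typescript
// Hypothetical slug rule: lowercase, collapse spaces, ">" and punctuation
// into single hyphens, then trim hyphens from both ends.
function slugify(text: string): string {
  return text
    .toLowerCase()
    .replace(/[^a-z0-9]+/g, "-")
    .replace(/^-+|-+$/g, "");
}

// Deterministic ID: same taxonomy_type + path always yields the same ID.
function makeNodeId(taxonomyType: string, path: string): string {
  return `${slugify(taxonomyType)}::${slugify(path)}`;
}

console.log(makeNodeId("Auto", "Electric Vehicles"));
// auto::electric-vehicles
console.log(makeNodeId("Auto", "Electric Vehicles > Battery Technology"));
// auto::electric-vehicles-battery-technology
```

With this particular rule, non-ASCII characters are dropped, and an L0 node labeled "Electric Vehicles Battery Technology" would get the same ID as the L1 path "Electric Vehicles > Battery Technology", so ON CONFLICT DO NOTHING would silently skip one of them.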
Hierarchy Structure
The tree reads top-down as:
```
taxonomy_type (13 groups: "Automotive & Vehicles", "Technology & Telecom", ...)
|-- parent_category (41 types: "Auto", "Business Technology", ...) -- This IS the taxonomy_type column in the tree
    |-- L0 node (root subcategory: "Electric Vehicles")
        |-- L1 node ("Battery Technology")
            |-- L2 node ("Lithium-Ion Batteries")
                |-- L3 node (deeper if needed)
```
Each tree is scoped to one of the 41 parent categories (stored in the taxonomy_type column of the taxonomy_tree table). The 13 taxonomy type groups are a higher-level grouping that exists in the constants file but is not directly represented in the tree table. The tree starts at the parent category level and provides subcategory structure underneath.
Browsing the Tree
Navigate to Admin > Topics > Tree Mapping tab.

Tree Summary View
When no parent category is selected, the API returns a summary of all parent categories that have trees:
API: GET /api/admin/taxonomy-tree
Returns for each parent category:
- taxonomy_type -- The parent category name
- node_count -- Total number of tree nodes
- category_count -- Number of L0 (root) nodes
- max_depth -- Deepest level in the tree
- assigned_topics -- Topics linked to tree nodes
- unassigned_topics -- Topics with this parent category but no taxonomy_node_id
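The three structural counts in the summary can be derived from the flat taxonomy_tree rows alone. The sketch below shows one in-memory way to do it, assuming max_depth is the largest level value present. TreeRow, TreeSummary, and summarizeTrees are hypothetical names, and the real endpoint most likely aggregates in SQL; the topic counts need a join against the topics table and are left out.

```typescript
// Hypothetical row shape: only the taxonomy_tree columns the summary needs.
interface TreeRow {
  taxonomy_type: string;
  level: number;
}

interface TreeSummary {
  taxonomy_type: string;
  node_count: number;
  category_count: number;
  max_depth: number;
}

// In-memory equivalent of grouping rows by taxonomy_type.
function summarizeTrees(rows: TreeRow[]): TreeSummary[] {
  const byType = new Map<string, TreeSummary>();
  for (const row of rows) {
    let s = byType.get(row.taxonomy_type);
    if (!s) {
      s = { taxonomy_type: row.taxonomy_type, node_count: 0, category_count: 0, max_depth: 0 };
      byType.set(row.taxonomy_type, s);
    }
    s.node_count += 1;                          // every node counts
    if (row.level === 0) s.category_count += 1; // only L0 roots count as categories
    s.max_depth = Math.max(s.max_depth, row.level);
  }
  return Array.from(byType.values());
}
```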
Detailed Tree View
Select a parent category to see its full nested tree:
API: GET /api/admin/taxonomy-tree?parent_category=Auto
Returns:
- parent_category -- The selected parent category
- node_count -- Total nodes in this tree
- unassigned_topics -- Count of topics not yet mapped
- tree -- Nested array of nodes, each with:
  - id, label, level, path, sort_order, is_active
  - topic_count -- Number of topics linked to this node
  - children -- Nested child nodes (recursively)
The tree is built from flat database rows using an in-memory adjacency list algorithm that wires up parent-child relationships and sorts children by sort_order.
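The flat-rows-to-nested-tree step can be sketched as a two-pass adjacency list build: index rows by id, then attach each row to its parent. This is a hypothetical buildTree, not the endpoint's code; it omits topic_count, and it treats a row whose parent is missing as a root, which is a choice made here for illustration.

```typescript
// Hypothetical flat row: a subset of the taxonomy_tree columns.
interface FlatNode {
  id: string;
  parent_id: string | null;
  label: string;
  level: number;
  path: string;
  sort_order: number;
}

interface TreeNode extends FlatNode {
  children: TreeNode[];
}

function buildTree(rows: FlatNode[]): TreeNode[] {
  // Pass 1: index every row by id so parents can be found in O(1).
  const byId = new Map<string, TreeNode>();
  for (const row of rows) byId.set(row.id, { ...row, children: [] });

  // Pass 2: attach each node to its parent, or collect it as a root.
  const roots: TreeNode[] = [];
  byId.forEach((node) => {
    const parent = node.parent_id ? byId.get(node.parent_id) : undefined;
    if (parent) parent.children.push(node);
    else roots.push(node); // L0 nodes, plus orphans whose parent row is absent
  });

  // Sort siblings by sort_order at every level, recursively.
  const sortLevel = (nodes: TreeNode[]): void => {
    nodes.sort((a, b) => a.sort_order - b.sort_order);
    nodes.forEach((n) => sortLevel(n.children));
  };
  sortLevel(roots);
  return roots;
}
```

Because both passes are linear and the sort touches each sibling list once, the build stays cheap even for trees with thousands of nodes, and row order from the database does not matter.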
Adding Tree Nodes
Via the Admin UI
- Select a parent category in the tree browser
- Click Add Node at the desired level
- Enter the node label (2 to 100 characters)
- If adding a child node, select the parent node
- Confirm
API: POST /api/admin/taxonomy-tree
```json
{
  "parent_category": "Auto",
  "label": "Electric Vehicles",
  "parent_id": null
}
```
For child nodes:
```json
{
  "parent_category": "Auto",
  "label": "Battery Technology",
  "parent_id": "auto::electric-vehicles"
}
```
The system automatically:
- Calculates the level from the parent (parent's level + 1, or 0 for root)
- Builds the path by appending the label to the parent's path
- Assigns the next sort_order among siblings
- Generates a deterministic id from the taxonomy type and path
Node labels should be concise and descriptive. They become the subcategory value when topics are mapped to them, and they appear in platform output names via the template engine's {{subcategory}} field.
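The derived fields listed above can be sketched as one pure function. This assumes "next sort_order" means the largest existing sibling value plus one (0 for the first sibling), applies an assumed slug rule (lowercase, non-alphanumeric runs collapsed to hyphens) that matches the examples in Node ID Generation, and enforces the 2 to 100 character label limit from the UI. deriveNodeFields and ParentInfo are hypothetical names.

```typescript
// Hypothetical parent info needed to derive a child's fields.
interface ParentInfo {
  level: number;
  path: string;
}

interface NewNodeFields {
  id: string;
  level: number;
  path: string;
  sort_order: number;
}

// Assumed slug rule, consistent with the documented ID examples.
function slugify(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

function deriveNodeFields(
  taxonomyType: string,
  label: string,
  parent: ParentInfo | null,   // null when adding an L0 node
  siblingSortOrders: number[], // sort_order values of existing siblings
): NewNodeFields {
  const trimmed = label.trim();
  if (trimmed.length < 2 || trimmed.length > 100) {
    throw new Error("label must be 2 to 100 characters");
  }
  const level = parent ? parent.level + 1 : 0;
  const path = parent ? `${parent.path} > ${trimmed}` : trimmed;
  // Assumption: next slot after the current maximum, starting at 0.
  const sort_order = siblingSortOrders.length > 0 ? Math.max(...siblingSortOrders) + 1 : 0;
  const id = `${slugify(taxonomyType)}::${slugify(path)}`;
  return { id, level, path, sort_order };
}
```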
Seeding Trees
Auto-Seed from Existing Topics
The fastest way to create an initial tree is to auto-seed from existing topic subcategory values:
- Navigate to Admin > Topics > Tree Mapping
- Select a parent category
- Click Auto-Seed
API: POST /api/admin/taxonomy-tree/auto-seed
```json
{
  "parent_category": "Auto"
}
```
The auto-seed process:
- Queries all unmapped topics (taxonomy_node_id IS NULL) with this parent category that have non-empty subcategory values
- Groups by distinct subcategory value with counts, ordered by frequency descending
- For each unique subcategory:
  - Creates an L0 tree node (using ON CONFLICT DO NOTHING for idempotency)
  - Maps matching topics by updating taxonomy_node_id and taxonomy_path (case-insensitive match on trimmed subcategory)
- Returns the number of nodes created and topics mapped, plus per-node details
Auto-seed only creates L0 (root-level) nodes. It does not build a multi-level hierarchy. Use this as a starting point, then manually organize nodes into a deeper tree structure if needed.
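The grouping step of auto-seed can be sketched in memory as below. It is a simplification, not the API's query: it folds case variants of a trimmed subcategory into one group (approximating the case-insensitive match used when mapping), keeps the first-seen spelling as the node label, and orders groups by topic count. Topic, SeedGroup, and groupForAutoSeed are hypothetical names.

```typescript
// Hypothetical topic shape: only the fields auto-seed reads.
interface Topic {
  id: string;
  subcategory: string | null;
  taxonomy_node_id: string | null;
}

interface SeedGroup {
  label: string;      // first-seen trimmed spelling, used as the L0 label
  topicIds: string[]; // topics that would be mapped to that node
}

function groupForAutoSeed(topics: Topic[]): SeedGroup[] {
  const groups = new Map<string, SeedGroup>();
  for (const t of topics) {
    if (t.taxonomy_node_id !== null) continue; // only unmapped topics
    const label = (t.subcategory ?? "").trim();
    if (!label) continue;                      // skip empty subcategories
    const key = label.toLowerCase();           // case-insensitive grouping
    let g = groups.get(key);
    if (!g) {
      g = { label, topicIds: [] };
      groups.set(key, g);
    }
    g.topicIds.push(t.id);
  }
  // Most frequent subcategories first.
  return Array.from(groups.values()).sort((a, b) => b.topicIds.length - a.topicIds.length);
}
```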
Seed Scripts (CLI)
For bulk tree creation from predefined structures, use the CLI seed scripts:
```bash
# Seed all taxonomy trees from predefined structures
bun run scripts/seed-all-trees.sh
# Individual seed scripts exist per parent category
bun run scripts/seed-auto-trees.ts
bun run scripts/seed-tech-trees.ts
# etc.
```
Each seed script:
- Defines a hierarchical node structure in code
- Generates deterministic IDs using the same makeNodeId() function as the API
- Uses ON CONFLICT DO NOTHING for safe re-runs
- Sets source = 'seed' on created nodes
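A hypothetical example of a seed structure defined in code and flattened into taxonomy_tree rows. The SeedSpec shape, flattenSeed, and the slug rule are assumptions for illustration; the real seed scripts may be organized differently. The depth-first walk emits every parent before its children, so the rows can be inserted in order without violating the parent_id foreign key.

```typescript
// Hypothetical nested seed definition: a label plus optional children.
interface SeedSpec {
  label: string;
  children?: SeedSpec[];
}

interface SeedRow {
  id: string;
  taxonomy_type: string;
  parent_id: string | null;
  label: string;
  level: number;
  path: string;
  sort_order: number;
  source: "seed";
}

// Assumed slug rule, consistent with the documented ID examples.
function slugify(text: string): string {
  return text.toLowerCase().replace(/[^a-z0-9]+/g, "-").replace(/^-+|-+$/g, "");
}

// Depth-first walk: parents are pushed before their children.
function flattenSeed(taxonomyType: string, specs: SeedSpec[]): SeedRow[] {
  const rows: SeedRow[] = [];
  const walk = (nodes: SeedSpec[], parent: SeedRow | null): void => {
    nodes.forEach((spec, index) => {
      const path = parent ? `${parent.path} > ${spec.label}` : spec.label;
      const row: SeedRow = {
        id: `${slugify(taxonomyType)}::${slugify(path)}`,
        taxonomy_type: taxonomyType,
        parent_id: parent ? parent.id : null,
        label: spec.label,
        level: parent ? parent.level + 1 : 0,
        path,
        sort_order: index, // position in the spec defines sibling order
        source: "seed",
      };
      rows.push(row);
      walk(spec.children ?? [], row);
    });
  };
  walk(specs, null);
  return rows;
}
```

Each row would then be inserted with ON CONFLICT DO NOTHING, so re-running the script leaves existing nodes untouched.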
Mapping Topics to Tree Nodes
Unmapped Topics Browser
The unmapped topics view shows topics that have a parent category but no taxonomy_node_id:
API: GET /api/admin/topics/unmapped
Query parameters:
| Parameter | Description | Default |
|---|---|---|
| parent_category | Filter by parent category | All categories |
| page | Page number | 1 |
| pageSize | Items per page (max 200) | 50 |
| search | Search by topic name (ILIKE) | None |
The response includes a breakdown array showing the count of unmapped topics per parent category, useful for prioritizing which trees to work on first.
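A small hypothetical helper for building the query string of GET /api/admin/topics/unmapped from the parameters in the table above. It clamps pageSize to the documented maximum of 200 on the client side; how the server treats out-of-range values is not stated in this guide.

```typescript
// Hypothetical query shape mirroring the documented parameters.
interface UnmappedQuery {
  parent_category?: string;
  page?: number;
  pageSize?: number;
  search?: string;
}

function buildUnmappedUrl(q: UnmappedQuery): string {
  const params: string[] = [];
  const add = (key: string, value: string): void => {
    params.push(`${key}=${encodeURIComponent(value)}`);
  };
  if (q.parent_category) add("parent_category", q.parent_category);
  if (q.page !== undefined) add("page", String(Math.max(1, Math.floor(q.page))));
  if (q.pageSize !== undefined) {
    // Clamp to 1..200, the documented maximum page size.
    add("pageSize", String(Math.min(200, Math.max(1, Math.floor(q.pageSize)))));
  }
  if (q.search) add("search", q.search);
  const base = "/api/admin/topics/unmapped";
  return params.length > 0 ? `${base}?${params.join("&")}` : base;
}
```

The result is a relative URL, so a browser client could pass it straight to fetch(); authentication is not covered by this sketch.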
Bulk Mapping
Select unmapped topics and assign them to a tree node:
- In the unmapped topics browser, select topics using checkboxes
- Choose a target tree node from the dropdown
- Click Map Selected
API: POST /api/admin/topics/unmapped
```json
{
  "topicIds": ["topic_1", "topic_2", "topic_3"],
  "nodeId": "auto::electric-vehicles"
}
```
This updates each topic's:
- taxonomy_node_id -- Set to the selected node's ID
- subcategory -- Set to the node's label
- taxonomy_path -- Set to the node's path
- updated_at -- Current timestamp
Maximum 500 topics per request.
When mapping topics to tree nodes, the topic's subcategory text value is overwritten with the node's label. This ensures consistency between the tree structure and the displayed subcategory value. The template engine uses this value for {{subcategory}} in platform output names.
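Because each request accepts at most 500 topics, a larger selection has to be split across several POST /api/admin/topics/unmapped calls. The sketch below builds the request bodies; buildMapRequests and MAX_TOPICS_PER_REQUEST are hypothetical names, and the body shape follows the example above.

```typescript
// Hypothetical request body, matching the documented JSON shape.
interface MapRequest {
  topicIds: string[];
  nodeId: string;
}

const MAX_TOPICS_PER_REQUEST = 500; // documented per-request limit

// Split a selection into consecutive chunks of at most 500 topic IDs.
function buildMapRequests(topicIds: string[], nodeId: string): MapRequest[] {
  const requests: MapRequest[] = [];
  for (let i = 0; i < topicIds.length; i += MAX_TOPICS_PER_REQUEST) {
    requests.push({ topicIds: topicIds.slice(i, i + MAX_TOPICS_PER_REQUEST), nodeId });
  }
  return requests;
}
```

Each body would then be sent as a JSON POST, one after another; a failure partway through leaves earlier chunks mapped and later ones still unmapped, so retrying only the remaining chunks is safe.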
Tree Realignment (LLM Batch)
For large-scale realignment of topics to tree nodes, use the CLI realignment script. This uses Claude batches to reclassify topic subcategories to match tree node labels.
Running Realignment
```bash
# Preview without making changes
bun run scripts/realign-subcategories.ts --dry-run
# Realign all topics (processes in LLM batches)
bun run scripts/realign-subcategories.ts
# Filter by parent category and limit
bun run scripts/realign-subcategories.ts --taxonomy-filter "Auto" --limit 10
# Target specific topic IDs from a JSON file
bun run scripts/realign-subcategories.ts --ids-file /tmp/ids.json
# Force reclassify even if taxonomy_node_id is already set
bun run scripts/realign-subcategories.ts --force
```
How Realignment Works
- Fetch tree paths: Loads all tree node labels grouped by parent category using fetchTreePathMap()
- Collect eligible topics: Finds topics where taxonomy_node_id IS NULL (or all topics with --force)
- Build LLM prompts: For each topic, provides the topic name, current classification context, and the available tree node labels for its parent category
- Submit as batch: Sends to the Anthropic Batch API (50% cost discount)
- Post-process results: Maps each LLM response to a tree node, sets taxonomy_node_id, and updates subcategory to match
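The post-processing step has to turn free-text LLM output into a concrete node. One hedged way to do that is a normalized exact match against the labels and paths of the topic's parent category, shown below; resolveLlmLabel, NodeRef, and the normalization rule are assumptions, not the script's actual logic.

```typescript
// Hypothetical node reference loaded for one parent category.
interface NodeRef {
  id: string;
  label: string;
  path: string;
}

// Trim, collapse internal whitespace, and ignore case.
function normalize(s: string): string {
  return s.trim().replace(/\s+/g, " ").toLowerCase();
}

// Accept either a bare label ("Battery Technology") or a full path.
function resolveLlmLabel(answer: string, nodes: NodeRef[]): NodeRef | null {
  const wanted = normalize(answer);
  const hit = nodes.find((n) => normalize(n.label) === wanted || normalize(n.path) === wanted);
  return hit ?? null; // null: leave the topic unmapped for manual review
}
```

Matching on the full path is unambiguous, while a bare label can appear under more than one L0 node; this sketch returns the first match in that case, so asking the model for full paths is the safer design.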
Display Fix Script
After realignment, subcategory text values may not match their L0 tree node labels. The fix script corrects this:
```bash
# Preview changes
bun run scripts/fix-subcategory-level0.ts --dry-run
# Apply fixes
bun run scripts/fix-subcategory-level0.ts
```
This updates the subcategory text column on all topics that have a taxonomy_node_id to match the tree node's L0 ancestor label. This ensures the displayed subcategory aligns with the tree hierarchy.
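Since the first segment of a materialized path is always the L0 label, the target value for the fix can be read straight from taxonomy_path. The sketch below assumes the denormalized taxonomy_path is in sync with the node, whereas the actual script resolves the L0 ancestor through taxonomy_node_id. l0Label and planSubcategoryFixes are hypothetical names, and the function only plans changes, much like --dry-run.

```typescript
const PATH_SEPARATOR = " > ";

// The L0 ancestor label is the first segment of the materialized path.
function l0Label(taxonomyPath: string): string {
  return taxonomyPath.split(PATH_SEPARATOR)[0].trim();
}

// Hypothetical topic row: only the fields the fix reads.
interface TopicRow {
  id: string;
  subcategory: string | null;
  taxonomy_path: string | null;
}

interface PlannedFix {
  id: string;
  from: string | null;
  to: string;
}

// Returns only the topics whose subcategory would change.
function planSubcategoryFixes(topics: TopicRow[]): PlannedFix[] {
  const fixes: PlannedFix[] = [];
  for (const t of topics) {
    if (!t.taxonomy_path) continue; // unmapped topics are left alone
    const to = l0Label(t.taxonomy_path);
    if (t.subcategory !== to) fixes.push({ id: t.id, from: t.subcategory, to });
  }
  return fixes;
}
```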
Backfilling taxonomy_node_id via LLM
For topics that have a subcategory text value but no taxonomy_node_id, use the backfill script:
```bash
# Preview without writing
bun run scripts/backfill-taxonomy-nodes-llm.ts --dry-run
# Run backfill
bun run scripts/backfill-taxonomy-nodes-llm.ts
```
This script uses LLM batches to match existing topic subcategory values to the closest tree node label, then sets the taxonomy_node_id accordingly.
Tree Node Reference
Level Conventions
| Level | Name | Example | Purpose |
|---|---|---|---|
| L0 | Root subcategory | "Electric Vehicles" | Primary subcategory classification |
| L1 | First child | "Battery Technology" | Narrower specialization |
| L2 | Second child | "Lithium-Ion Batteries" | Specific sub-specialization |
| L3+ | Deeper levels | As needed | Fine-grained classification |
Source Values
| Source | Origin |
|---|---|
| seed | CLI seed scripts |
| admin | Created manually via admin UI |
| auto-seed | Created by the auto-seed API |
Path Format Examples
```
"Electric Vehicles"                                        -- L0
"Electric Vehicles > Battery Technology"                   -- L1
"Electric Vehicles > Battery Technology > Lithium-Ion"     -- L2
```
Paths use " > " (a space, a greater-than sign, and a space) as the separator. A path is the concatenation of all ancestor labels, from the L0 root downward, followed by the current node's label.
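A path can be recomputed from the parent_id links, which is a handy consistency check for the materialized column after a node is renamed or moved. This computePath sketch is hypothetical: it walks from the node up to its root, prepending each label, and guards against cycles.

```typescript
// Hypothetical node shape: only the adjacency-list fields.
interface PathNode {
  id: string;
  parent_id: string | null;
  label: string;
}

function computePath(nodeId: string, byId: Map<string, PathNode>): string {
  const labels: string[] = [];
  const seen = new Set<string>();
  let current = byId.get(nodeId);
  while (current) {
    if (seen.has(current.id)) throw new Error(`cycle at ${current.id}`);
    seen.add(current.id);
    labels.unshift(current.label); // ancestors end up in front
    current = current.parent_id ? byId.get(current.parent_id) : undefined;
  }
  return labels.join(" > ");
}
```

A label that itself contains " > " would make the stored path ambiguous when it is split later, so rejecting that sequence in labels is worth considering.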
Template Engine Integration
Tree levels are available as template context variables for platform output names:
| Variable | Source |
|---|---|
| {{subcategory}} | L0 node label (the topic's direct subcategory) |
| {{subcategory_l1}} | L1 node label from the taxonomy path |
| {{subcategory_l2}} | L2 node label from the taxonomy path |
| {{subcategory_l3}} | L3 node label from the taxonomy path |
| {{subcategory_l4}} | L4 node label from the taxonomy path |
These are extracted from the taxonomy_path field by splitting on the " > " separator and indexing into the resulting array: index 0 gives the L0 label, index 1 the L1 label, and so on.
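The split-and-index extraction can be sketched as follows. subcategoryVars and renderTemplate are hypothetical stand-ins for the template engine, and rendering missing levels as empty strings is an assumption.

```typescript
type SubcategoryVars = Record<string, string>;

const LEVEL_VARS = ["subcategory", "subcategory_l1", "subcategory_l2", "subcategory_l3", "subcategory_l4"];

// Split the materialized path and map index i to the i-th variable name.
function subcategoryVars(taxonomyPath: string | null): SubcategoryVars {
  const parts = (taxonomyPath ?? "")
    .split(" > ")
    .map((p) => p.trim())
    .filter((p) => p.length > 0);
  const vars: SubcategoryVars = {};
  LEVEL_VARS.forEach((name, i) => {
    vars[name] = parts[i] ?? ""; // assumption: missing levels render empty
  });
  return vars;
}

// Replace {{name}} placeholders; unknown names are left untouched.
function renderTemplate(template: string, vars: SubcategoryVars): string {
  return template.replace(/\{\{(\w+)\}\}/g, (match, name: string) => (name in vars ? vars[name] : match));
}
```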
Next Steps
- Global Topics -- Manage topics that reference tree nodes
- Output Templates -- Use {{subcategory}} and tree levels in platform naming
- Background Jobs -- Monitor realignment jobs