Agentic Spatial Pathologist Public Product Roadmap

This repo is the public-facing product layer for the agentic spatial pathologist workflow.

Phase 1: Wrapper Product

Current strategy:

  • keep histoseg as the execution engine

  • keep this repo lightweight and public-facing

  • expose a stable package name: spatho

  • provide a simple CLI and README-first onboarding

What is already wrapped:

  • OpenAI-driven cluster annotation

  • structure discovery

  • H&E overlay generation

  • pathology review report generation

  • formal organ-pack metadata

  • workflow config schema export

  • artifact manifest generation
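Artifact manifest generation is easiest to reason about with a concrete shape in mind. The sketch below assumes a manifest is a JSON document listing every output file of a run with its relative path, size, and SHA-256 digest. `build_manifest`, `write_manifest`, the `schema_version` value, and the field names are hypothetical illustrations, not the actual spatho manifest format.

```python
import hashlib
import json
from pathlib import Path

# Hypothetical schema version; the real spatho manifest version may differ.
MANIFEST_SCHEMA_VERSION = "0.1"


def sha256_of(path: Path, chunk_size: int = 1 << 20) -> str:
    """Hash a file in chunks so large H&E overlays never need to fit in memory."""
    digest = hashlib.sha256()
    with path.open("rb") as handle:
        for chunk in iter(lambda: handle.read(chunk_size), b""):
            digest.update(chunk)
    return digest.hexdigest()


def build_manifest(run_dir: Path) -> dict:
    """Describe every file under run_dir with a relative path, size, and hash."""
    artifacts = []
    for path in sorted(p for p in run_dir.rglob("*") if p.is_file()):
        if path.name == "manifest.json":
            continue  # never hash the manifest into itself
        artifacts.append(
            {
                "path": path.relative_to(run_dir).as_posix(),
                "bytes": path.stat().st_size,
                "sha256": sha256_of(path),
            }
        )
    return {"schema_version": MANIFEST_SCHEMA_VERSION, "artifacts": artifacts}


def write_manifest(run_dir: Path) -> Path:
    """Write manifest.json at the root of the run directory and return its path."""
    manifest_path = run_dir / "manifest.json"
    manifest_path.write_text(json.dumps(build_manifest(run_dir), indent=2))
    return manifest_path
```

Hashing content rather than recording timestamps means two runs that produce identical artifacts yield identical manifests, which is what makes workflow bundles reproducible and diffable.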

Why this repo exists

histoseg began as a segmentation and contour-generation library. The public product experience now needs a clearer identity:

  • disease-focused workflows

  • case-level reporting

  • reproducible workflow bundles

  • user-facing documentation

This repo becomes that layer.

Phase 2: Product Stabilization

Near-term work:

  1. Expand packaged organ packs beyond lung and breast

  2. Add regression tests that run on tiny fixtures

  3. Add GitHub Actions for pytest, package build, and CLI smoke tests

  4. Stabilize config schema versioning and compatibility rules

  5. Stabilize artifact manifest and report schema versions
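One way to make "compatibility rules" for config, manifest, and report schemas concrete is a semantic-versioning check: a reader accepts any document with the same major version and a minor version no newer than its own. This is a sketch under that assumption; the roadmap does not yet fix these rules, and `parse_version` and `is_compatible` are hypothetical helpers.

```python
from typing import Tuple


def parse_version(text: str) -> Tuple[int, int]:
    """Parse a 'MAJOR.MINOR' schema version string such as '1.2'."""
    major, minor = text.split(".")
    return int(major), int(minor)


def is_compatible(document_version: str, reader_version: str) -> bool:
    """Assumed rule: same major version, and document minor <= reader minor.

    Minor bumps only add optional fields, so a newer reader can load an older
    document. A newer minor may carry fields the reader does not understand,
    and a major bump signals a breaking change, so both are rejected.
    """
    doc_major, doc_minor = parse_version(document_version)
    reader_major, reader_minor = parse_version(reader_version)
    return doc_major == reader_major and doc_minor <= reader_minor
```

For example, a 1.2 reader would load 1.0 and 1.2 bundles but refuse 1.3 or 2.0 ones, turning a silent misread into an explicit error.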

Phase 3: Dependency Inversion

The long-term goal is to reduce direct runtime coupling to sibling repos.

Planned moves:

  1. move public-safe workflow code from histoseg into spatho

  2. keep only geometry/segmentation primitives in histoseg

  3. define organ packs under spatho.organ_packs

  4. support multiple model providers: OpenAI, Anthropic, and local models
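Supporting multiple providers is simpler if the annotation workflow depends on a small interface instead of a vendor SDK. The sketch below assumes a hypothetical `AnnotationProvider` protocol with a single `annotate_cluster` method and a name-keyed registry. It calls no real OpenAI or Anthropic API; `EchoProvider` is an offline stand-in that only shows how a local or test backend would plug in.

```python
from typing import Callable, Dict, List, Protocol


class AnnotationProvider(Protocol):
    """Hypothetical interface that every model backend would implement."""

    def annotate_cluster(self, marker_genes: List[str], organ: str) -> str:
        ...


_PROVIDERS: Dict[str, Callable[[], AnnotationProvider]] = {}


def register_provider(name: str, factory: Callable[[], AnnotationProvider]) -> None:
    """Register a zero-argument factory under a provider name."""
    _PROVIDERS[name] = factory


def get_provider(name: str) -> AnnotationProvider:
    """Build a provider by name, failing with the list of known names."""
    try:
        factory = _PROVIDERS[name]
    except KeyError:
        known = ", ".join(sorted(_PROVIDERS)) or "none"
        raise ValueError(f"unknown provider {name!r}; registered: {known}") from None
    return factory()


class EchoProvider:
    """Offline stand-in: labels a cluster by its top marker, no network calls."""

    def annotate_cluster(self, marker_genes: List[str], organ: str) -> str:
        top = marker_genes[0] if marker_genes else "unknown"
        return f"{organ}: {top}-high cluster"


register_provider("local-echo", EchoProvider)
```

With this shape, `get_provider("local-echo").annotate_cluster(["SFTPC", "NAPSA"], "lung")` returns `"lung: SFTPC-high cluster"`, and the workflow code never changes when a new backend is registered.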

Phase 4: Community Release

Before broad public release:

  1. rewrite README around spatho, not just legacy wrappers

  2. publish example datasets and example reports

  3. document license boundaries clearly

  4. add issue templates and a contribution guide

Phase 5: stGPT Foundation Model -> Evidence Workbench

The next AI upgrade should add an optional spatial transcriptomics foundation-model layer rather than replace the existing workflow. The narrative is:

stGPT learns reusable contour/region morpho-molecular representations; spatho plans, validates, and turns them into auditable spatial pathology evidence.

The product should be described as a closed loop:

Model -> Evidence -> Agent -> Human Review -> Better Model

Planned moves:

  1. define stGPT Foundation: training, model architecture, checkpoint loading, embedding, and model packaging

  2. define stGPT Evidence Suite: QC, deterministic splits, benchmark tables, ablations, domain-shift checks, and failure analysis

  3. define stGPT Runtime / Tool API: ship embed_cells, evaluate_checkpoint, package_model, and export_spatho_artifacts first; add retrieval, imputation, niche scoring, region comparison, and structure explanation only once their outputs are covered by tests

  4. define spatho Agentic Workbench: guardrailed workflow orchestration that checks QC before biological conclusions

  5. define spatho Reports: report sections that separate measured expression, model-derived evidence, warnings, and human review
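The staged Tool API in item 3 can be expressed as a catalogue that always exposes the first-wave tools and adds a later capability only once it is marked as having tested outputs. This is a sketch: the four first-wave names come from the roadmap, while `exposed_tools`, the later-wave identifiers, and the idea of passing in a set of tested tools are assumptions for illustration.

```python
from typing import List, Set

# First-wave tool names taken from the roadmap.
FIRST_WAVE = ["embed_cells", "evaluate_checkpoint", "package_model", "export_spatho_artifacts"]

# Hypothetical identifiers for the later capabilities named in the roadmap.
LATER_WAVE = ["retrieval", "imputation", "niche_scoring", "region_comparison", "structure_explanation"]


def exposed_tools(tested_later_tools: Set[str]) -> List[str]:
    """Return the tools the agent may call, in a stable order.

    First-wave tools are always exposed; a later-wave tool is added only when
    it appears in tested_later_tools, i.e. its outputs are covered by tests.
    """
    unknown = tested_later_tools - set(LATER_WAVE)
    if unknown:
        raise ValueError(f"not a planned later-wave tool: {sorted(unknown)}")
    return FIRST_WAVE + [name for name in LATER_WAVE if name in tested_later_tools]
```

Keeping the order fixed (rather than following set iteration order) makes the agent's tool list deterministic, which matters for reproducible runs.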

Implementation guardrails:

  • precomputed stGPT artifacts must work without importing stgpt

  • local_stgpt is optional and should fail with a clear error when the stgpt package or the model paths are missing

  • a fatal stGPT QC result blocks the run when stgpt_require_qc_pass=true

  • warning-only stGPT QC becomes cautionary report language, not a hard failure

  • imputation, reconstruction, or embeddings must never be described as measured expression
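The optional-import and QC guardrails above could be enforced by two small checks: a lazy import that only runs in local_stgpt mode, and a gate that turns QC results into either a hard stop or cautionary report text. `local_stgpt`, `stgpt`, and `stgpt_require_qc_pass` come from the roadmap; `load_local_stgpt`, `QcResult`, `apply_qc_gate`, the error classes, and the report wording are hypothetical. What happens to fatal QC when the flag is off is an assumption, chosen to match the v0.1 rule that fatal QC errors block biological conclusions.

```python
import importlib
from dataclasses import dataclass, field
from pathlib import Path
from typing import List


class StgptUnavailableError(RuntimeError):
    """local_stgpt was requested but the package or model path is missing."""


class QcBlockedError(RuntimeError):
    """Fatal stGPT QC while stgpt_require_qc_pass is true."""


def load_local_stgpt(model_path: str):
    """Import stgpt lazily, so precomputed-artifact runs never touch it."""
    if not Path(model_path).exists():
        raise StgptUnavailableError(f"local_stgpt model path not found: {model_path}")
    try:
        return importlib.import_module("stgpt")
    except ImportError as exc:
        raise StgptUnavailableError(
            "local_stgpt needs the optional 'stgpt' package; "
            "install it or switch to precomputed stGPT artifacts"
        ) from exc


@dataclass
class QcResult:
    fatal: List[str] = field(default_factory=list)
    warnings: List[str] = field(default_factory=list)


def apply_qc_gate(qc: QcResult, require_qc_pass: bool) -> List[str]:
    """Raise on blocking QC; otherwise return cautionary report sentences."""
    if qc.fatal and require_qc_pass:
        raise QcBlockedError("stGPT QC failed: " + "; ".join(qc.fatal))
    notes = [f"Caution (stGPT QC warning): {w}" for w in qc.warnings]
    # Assumption: with the flag off, the run continues, but fatal-QC evidence
    # is withheld from biological conclusions instead of being used silently.
    notes.extend(f"Excluded from conclusions (stGPT QC fatal): {f}" for f in qc.fatal)
    return notes
```

Checking the model path before attempting the import gives the same clear failure whether or not stgpt happens to be installed.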

See stGPT Upgrade Plan for the detailed implementation route.

Repository Roles

  • spatho: public product and user experience layer

  • histoseg: geometry and segmentation engine

  • example web apps: optional deployment surfaces, not the core product

Agentic Spatial Pathologist v0.1

The v0.1 platform target is a Xenium-native auditable evidence loop:

stGPT learns Xenium-native morpho-molecular representations; spatho turns them into auditable spatial pathology evidence.

The fixed demo question is: “Which H&E-defined structures in this Xenium case show reproducible morpho-molecular programs, and do those findings pass QC?”

The workbench contract is intentionally conservative:

  • stGPT evidence is model-derived support, not measured expression or diagnosis.

  • QC fatal errors block biological conclusions.

  • QC warning-only evidence is labelled cautionary.

  • Every report claim should carry an evidence ID, artifact path or ID, model/checkpoint provenance, QC status, and human-review state.

  • pyXenium LazySlide/PLIP/mTM summaries are optional evidence sources consumed by spatho; the LR benchmark scaffold remains a separate backlog item.
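The per-claim fields in the workbench contract map naturally onto a small record type. This sketch uses hypothetical names (`EvidenceRecord`, `EvidenceKind`, `QcStatus`, `ReviewState`, `claim_label`) and assumed report wording. It shows how a record can refuse model-derived evidence that lacks provenance, and how report text can keep model output from ever reading as measured expression.

```python
from dataclasses import dataclass
from enum import Enum


class EvidenceKind(Enum):
    MEASURED_EXPRESSION = "measured_expression"
    MODEL_DERIVED = "model_derived"  # embeddings, imputation, reconstruction


class QcStatus(Enum):
    PASS = "pass"
    WARNING = "warning"
    FATAL = "fatal"


class ReviewState(Enum):
    PENDING = "pending"
    ACCEPTED = "accepted"
    REJECTED = "rejected"


@dataclass(frozen=True)
class EvidenceRecord:
    evidence_id: str
    artifact: str        # artifact path or ID
    kind: EvidenceKind
    provenance: str      # model/checkpoint provenance; assumed empty for measured data
    qc_status: QcStatus
    review_state: ReviewState = ReviewState.PENDING

    def __post_init__(self) -> None:
        if self.kind is EvidenceKind.MODEL_DERIVED and not self.provenance:
            raise ValueError(f"{self.evidence_id}: model-derived evidence needs provenance")


def claim_label(record: EvidenceRecord) -> str:
    """Report wording that never presents model output as measured expression."""
    if record.qc_status is QcStatus.FATAL:
        return "blocked by QC; no biological conclusion"
    if record.kind is EvidenceKind.MEASURED_EXPRESSION:
        label = "measured"
    else:
        label = "model-derived support"
    if record.qc_status is QcStatus.WARNING:
        label = "cautionary " + label
    return f"{label} [{record.evidence_id}, review: {record.review_state.value}]"
```

Because the label is derived from the record rather than written by hand, a report cannot describe an embedding or imputed value as measured without changing the record's `kind`.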