Foundation Evidence Layer
AI-Driven Spatial Pathologist now includes an optional foundation evidence layer. It adds auditable stGPT/scGPT/SpatialFusion-inspired evidence to the existing workflow without replacing the OpenAI backend, the local pathology-ai backend, heuristic annotation, or the PLIP H&E contour workflow.
The v1 design is intentionally conservative: it consumes frozen or precomputed features, summarizes them at cluster and structure level, and lets the local or OpenAI reviewer adjudicate conflicts. It does not train a new SpatialFusion-style joint embedding model.
The guiding product sentence is:
stGPT learns reusable contour/region morpho-molecular representations; spatho plans, validates, and turns them into auditable spatial pathology evidence.
What It Borrows
From scGPT-style workflows:
reference-mapping evidence from pretrained RNA representations;
cluster- and structure-level label distributions;
confidence and unknown/ambiguous rates that can temper marker-only annotation.
From SpatialFusion-like multimodal thinking:
explicit RNA, H&E, pathway, and spatial evidence channels;
structure/niche summaries rather than isolated cluster labels;
cross-modal concordance, conflict, and complementarity statements.
From pathology foundation-model practice:
H&E contour evidence is summarized as feature signals, not used as a silent replacement for molecular labels;
visual uncertainty, artifact, tumor, inflammation, and stroma signals remain visible in the report.
Workflow Fields
All new fields default to disabled. The example below shows them enabled:
{
  "rna_foundation_enabled": true,
  "rna_foundation_backend": "precomputed_scgpt",
  "rna_foundation_cell_mapping_path": "/path/to/scgpt_cell_mapping.csv",
  "rna_foundation_cluster_summary_path": null,
  "pathway_activity_enabled": true,
  "pathway_activity_csv": null,
  "niche_fusion_enabled": true,
  "niche_fusion_backend": "lightweight"
}
The stGPT workbench fields also default to disabled. The example below shows an enabled configuration:
{
  "stgpt_enabled": true,
  "stgpt_backend": "precomputed_artifacts",
  "stgpt_artifact_dir": "/path/to/stgpt/spatho_export",
  "stgpt_min_cell_coverage": 0.95,
  "stgpt_require_qc_pass": true
}
stgpt_backend="precomputed_artifacts" consumes exported files without importing stgpt. stgpt_backend="local_stgpt" can call a local stGPT runtime when stgpt_model_path and stgpt_config_path are configured.
rna_foundation_backend="precomputed_scgpt" means the workflow reads a prepared cell-reference mapping or cluster summary. The package does not require scGPT, CUDA, Scanpy, or SpatialFusion for normal installation.
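To make the cluster-level summary concrete, here is a minimal sketch of how a prepared cell-reference mapping could be reduced to a per-cluster label distribution, mean confidence, and unknown rate. The column names (`cluster`, `predicted_label`, `confidence`), the `min_confidence` cutoff, and the function name are assumptions for illustration; the actual mapping schema used by the workflow is not specified here.

```python
from collections import Counter, defaultdict


def summarize_cell_mapping(rows, min_confidence=0.5):
    """Summarize per-cell reference labels at cluster level.

    `rows` is an iterable of dicts with the hypothetical keys
    "cluster", "predicted_label", and "confidence".
    Cells below `min_confidence` are counted as "unknown".
    """
    by_cluster = defaultdict(list)
    for row in rows:
        by_cluster[row["cluster"]].append(row)

    summary = {}
    for cluster, cells in by_cluster.items():
        labels = Counter()
        for cell in cells:
            confident = float(cell["confidence"]) >= min_confidence
            labels[cell["predicted_label"] if confident else "unknown"] += 1
        n_cells = len(cells)
        top_label, _ = labels.most_common(1)[0]
        summary[cluster] = {
            "n_cells": n_cells,
            "label_distribution": {k: v / n_cells for k, v in labels.items()},
            "top_label": top_label,
            "mean_confidence": sum(float(c["confidence"]) for c in cells) / n_cells,
            "unknown_rate": labels["unknown"] / n_cells,
        }
    return summary
```

Rows read with `csv.DictReader` from a mapping CSV can be passed in directly, since confidence values are converted with `float()`.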
If pathway_activity_csv is not provided, the workflow computes a lightweight pathway score from the configured differential-expression CSV using built-in breast-relevant gene sets.
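One simple way to derive such a lightweight score is to average the log fold change of each gene set's members within a cluster. The sketch below assumes differential-expression rows with `cluster`, `gene`, and `log2fc` columns; the gene sets shown are illustrative placeholders, not the package's built-in sets, and the scoring rule itself is an assumption.

```python
from collections import defaultdict

# Placeholder gene sets for illustration only; NOT the built-in sets.
EXAMPLE_GENE_SETS = {
    "estrogen_response": {"ESR1", "PGR", "GATA3", "TFF1"},
    "proliferation": {"MKI67", "TOP2A", "CCNB1"},
}


def score_pathways(de_rows, gene_sets=EXAMPLE_GENE_SETS):
    """Mean log2 fold change of each gene set's members, per cluster.

    `de_rows` is an iterable of dicts with the hypothetical keys
    "cluster", "gene", and "log2fc".
    """
    lfc_by_cluster = defaultdict(dict)
    for row in de_rows:
        lfc_by_cluster[row["cluster"]][row["gene"]] = float(row["log2fc"])

    scores = {}
    for cluster, genes in lfc_by_cluster.items():
        scores[cluster] = {}
        for name, members in gene_sets.items():
            hits = [genes[g] for g in members if g in genes]
            scores[cluster][name] = {
                # None signals that no member gene was present for this cluster.
                "score": sum(hits) / len(hits) if hits else None,
                "n_genes_used": len(hits),
            }
    return scores
```

Reporting `n_genes_used` alongside the score keeps the evidence auditable: a score backed by a single gene can be read more cautiously than one backed by the full set.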
Outputs
The layer writes a foundation/ directory under the workflow output root:
rna_foundation_cluster_summary.csv/json
rna_foundation_structure_summary.csv/json
pathway_activity_cluster_summary.csv/json
pathway_activity_structure_summary.csv/json
he_morphology_feature_summary.csv/json
niche_fusion_summary.csv/json
foundation_evidence_metadata.json
stgpt_evidence_summary.csv/json when stGPT evidence is enabled
These files are also included in artifact_manifest.json when the layer is enabled.
Evidence Bundle Contract
The long-term fusion target is an evidence graph built from small, typed evidence bundles rather than a report assembled directly from raw CSV rows. Each stGPT, RNA foundation, pathway, H&E, or niche-fusion statement should be representable as:
{
  "evidence_id": "stgpt.structure.3",
  "unit": "structure",
  "unit_id": "3",
  "source": "stgpt",
  "evidence_type": "morpho_molecular_embedding",
  "measured": false,
  "model_derived": true,
  "qc_status": "warning",
  "summary": "Region embedding is available but image coverage is below the configured threshold.",
  "supporting_artifacts": [
    "region_embeddings.parquet",
    "region_qc_report.json"
  ]
}
Required fields are evidence_id, unit, unit_id, source, evidence_type, measured, model_derived, qc_status, summary, and supporting_artifacts. Embeddings, imputation, reconstruction, retrieval, or model scores must set model_derived=true; directly measured Xenium expression and H&E metadata may set measured=true only when the value is not inferred.
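The contract above could be checked with a small validator along these lines. This is a sketch, not the package's implementation: the function name is hypothetical, and matching model-derived evidence by keywords in `evidence_type` is an assumed heuristic for the rule stated above.

```python
REQUIRED_FIELDS = (
    "evidence_id", "unit", "unit_id", "source", "evidence_type",
    "measured", "model_derived", "qc_status", "summary",
    "supporting_artifacts",
)

# Assumed heuristic: evidence types containing these words are model outputs.
MODEL_DERIVED_KEYWORDS = ("embedding", "imputation", "reconstruction", "retrieval", "score")


def validate_bundle(bundle):
    """Return a list of contract violations; an empty list means the bundle passes."""
    problems = [f"missing field: {name}" for name in REQUIRED_FIELDS if name not in bundle]
    if problems:
        return problems

    for flag in ("measured", "model_derived"):
        if not isinstance(bundle[flag], bool):
            problems.append(f"{flag} must be a boolean")
    if not isinstance(bundle["supporting_artifacts"], list):
        problems.append("supporting_artifacts must be a list")

    evidence_type = str(bundle["evidence_type"]).lower()
    looks_model_derived = any(word in evidence_type for word in MODEL_DERIVED_KEYWORDS)
    if looks_model_derived and bundle["model_derived"] is not True:
        problems.append("model-derived evidence types must set model_derived=true")
    return problems
```

Returning a list of problems rather than raising on the first one lets a caller surface every violation for a bundle in a single pass.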
The preferred stGPT artifact set is region-first:
region_embeddings.parquet
region_cell_membership.parquet
region_molecular_summary.parquet
region_image_manifest.json
region_qc_report.json
evidence_manifest.json
Compatibility files such as cell_embeddings.parquet, structure_embedding_summary.csv, and qc_report.json should remain readable by spatho.
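As a rough illustration of the two artifact layouts, the sketch below inspects an export directory and reports whether it holds the full region-first set, only compatibility files, or neither. The function name, the returned keys, and the rule that a directory counts as region-first only when all six preferred files are present are assumptions; how spatho actually resolves artifacts is not described here.

```python
from pathlib import Path

PREFERRED_ARTIFACTS = (
    "region_embeddings.parquet",
    "region_cell_membership.parquet",
    "region_molecular_summary.parquet",
    "region_image_manifest.json",
    "region_qc_report.json",
    "evidence_manifest.json",
)
COMPATIBILITY_ARTIFACTS = (
    "cell_embeddings.parquet",
    "structure_embedding_summary.csv",
    "qc_report.json",
)


def inspect_artifact_dir(artifact_dir):
    """Classify an export directory by which known artifact files it contains."""
    root = Path(artifact_dir)
    present = {p.name for p in root.iterdir() if p.is_file()} if root.is_dir() else set()
    missing_preferred = [name for name in PREFERRED_ARTIFACTS if name not in present]
    compatibility_found = [name for name in COMPATIBILITY_ARTIFACTS if name in present]

    if not missing_preferred:
        layout = "region_first"
    elif compatibility_found:
        layout = "compatibility"
    else:
        layout = "missing"
    return {
        "layout": layout,
        "missing_preferred": missing_preferred,
        "compatibility_found": compatibility_found,
    }
```

The check only looks at file names; it does not open or validate the Parquet or JSON contents.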
Report Semantics
The HTML report gains a Foundation Evidence section. Each structure-level review can carry:
marker heuristic evidence;
scGPT-like RNA reference evidence;
pathway activity evidence;
PLIP H&E morphology evidence;
lightweight niche-fusion consistency notes;
stGPT morpho-molecular embedding summaries;
final LLM adjudication.
Guardrails are explicit: missing stGPT artifacts cause spatho doctor to report not ready; fatal stGPT QC blocks biological claims, and blocks the run itself when stgpt_require_qc_pass=true; warning-only QC is shown as cautionary evidence and enters the report as cautionary language; imputed or reconstructed signals must be labeled as model-derived and are never reported as measured expression.
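The QC guardrails can be read as a small decision table. The sketch below is one interpretation of them, assuming the QC status takes the values "pass", "warning", or "fatal"; the function name and the returned keys are hypothetical.

```python
def apply_stgpt_qc_gate(qc_status, require_qc_pass=True):
    """Map an stGPT QC status to run and reporting decisions.

    Assumes qc_status is one of "pass", "warning", or "fatal".
    """
    if qc_status == "pass":
        return {"run_allowed": True, "biological_claims_allowed": True, "cautionary": False}
    if qc_status == "warning":
        # Warning-only QC proceeds, but the report uses cautionary language.
        return {"run_allowed": True, "biological_claims_allowed": True, "cautionary": True}
    if qc_status == "fatal":
        # Fatal QC always blocks biological claims; it blocks the run
        # only when the workflow requires QC to pass.
        return {
            "run_allowed": not require_qc_pass,
            "biological_claims_allowed": False,
            "cautionary": False,
        }
    raise ValueError(f"unknown qc_status: {qc_status!r}")
```

Here `require_qc_pass` stands in for the `stgpt_require_qc_pass` workflow field.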
The review text is expected to distinguish agreement from complementarity. For example, a tumor RNA reference plus tumor-like H&E morphology is concordant; a tumor RNA reference plus macrophage-rich H&E evidence may be complementary inflammation; high artifact signal is treated as a quality caveat.
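A first-pass triage of these cases might look like the sketch below, which leaves the final call to the reviewer. The label strings, the `artifact_threshold` default, and the returned category names are all hypothetical; the workflow's actual consistency logic may differ.

```python
def relate_rna_and_he(rna_label, he_label, artifact_signal, artifact_threshold=0.5):
    """Coarsely relate RNA reference evidence to H&E morphology evidence.

    Returns "quality_caveat", "concordant", or "complementary_or_conflicting".
    The last category is intentionally undecided: the local or OpenAI
    reviewer is expected to adjudicate it.
    """
    if artifact_signal >= artifact_threshold:
        # High artifact signal is a quality caveat, not biological evidence.
        return "quality_caveat"
    if rna_label == he_label:
        return "concordant"
    return "complementary_or_conflicting"
```

For example, `relate_rna_and_he("tumor", "macrophage_rich", 0.1)` returns `"complementary_or_conflicting"`, which flags the structure for adjudication rather than asserting a conflict.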
Limitations
This v1 layer is not a replacement for trained multimodal representation learning. It standardizes the evidence interface and reporting surface first. Later versions can plug in true SpatialFusion-style joint embeddings, UNI/CONCH image embeddings, or scGPT-spatial zero-shot embeddings behind the same structure-level evidence files.
For the current Atera tutorial, the generated precomputed_scgpt smoke mapping validates the interface only. It is not a real stGPT or scGPT-spatial result and should not be interpreted biologically. The next real integration step is to produce stGPT exports with export_spatho_artifacts, then let spatho consume those artifacts through stgpt_backend="precomputed_artifacts" or stgpt_backend="local_stgpt".