
Multi-Backend Workflows with tinydenseR
Pedro Milanez-Almeida
2026-05-12
Source:vignettes/multi-backend-workflows.Rmd
multi-backend-workflows.RmdIntroduction
tinydenseR supports multiple single-cell data formats
through a unified S3 dispatch system. Regardless of whether your data
lives in a Seurat object, a SingleCellExperiment, an HDF5-backed AnnData
file, or a bare sparse matrix, the same analysis functions work
identically.
The supported backends are:
| Backend | Input class | Result container |
|---|---|---|
| TDRObj (direct) | TDRObj |
TDRObj |
| Seurat | Seurat |
Seurat (TDRObj in Misc
slot) |
| SingleCellExperiment | SingleCellExperiment |
SingleCellExperiment (TDRObj in
metadata) |
| H5AD file path |
character (path to
.h5ad) |
TDRObj (bare) |
| HDF5AnnData |
HDF5AnnData (via anndataR,
modern files only) |
TDRObj (bare) |
| Sparse matrix | dgCMatrix |
TDRObj (bare) |
| On-disk matrix |
IterableMatrix (BPCells) |
TDRObj (bare) |
| Delayed matrix | DelayedMatrix |
TDRObj (bare) |
| Cytometry |
flowSet (flowCore) or cytoset
(flowWorkspace) |
TDRObj (bare) |
This means you can adopt tinydenseR without converting
your data into a specific format first.
Entry Points: RunTDR()
RunTDR() is the main entry point. It accepts different
input types and runs the full pipeline (landmark selection, graph
construction, optional cell typing, mapping, and embedding). The result
is returned in a form that matches the input container when
possible.
Seurat
library(tinydenseR)
library(SeuratObject)
seurat_result <- RunTDR(seurat_obj,
.sample.var = "sample_id",
.assay = "RNA",
.layer = "counts",
.assay.type = "RNA")The returned object is the same Seurat object with the
TDRObj stored in
Misc(seurat_result, slot = "tdr.obj").
SingleCellExperiment
library(SingleCellExperiment)
sce_result <- RunTDR(sce_obj,
.sample.var = "sample_id",
.assay = "counts",
.assay.type = "RNA")The returned object is the same SCE with the TDRObj
stored in S4Vectors::metadata(sce_result)$tdr.obj.
H5AD File Path (recommended)
The simplest way to work with .h5ad files is to pass the
file path directly. This supports all H5AD format
versions, including pre-0.8.0 files, using rhdf5 for
metadata and BPCells for the expression matrix.
tdr <- RunTDR("data.h5ad",
.sample.var = "sample_id",
.assay.type = "RNA")HDF5AnnData (modern files only)
Alternatively, if you already have an HDF5AnnData object
from anndataR, you can pass it directly. Note that
anndataR only supports H5AD files created with anndata
>= 0.8.0.
library(anndataR)
adata <- read_h5ad("data.h5ad", as = "HDF5AnnData")
h5ad_result <- RunTDR(adata,
.sample.var = "sample_id",
.assay.type = "RNA")Both H5AD entry points convert the expression matrix to a BPCells
on-disk format for efficient lazy access. The result is a bare
TDRObj.
Sparse matrix (dgCMatrix)
tdr <- RunTDR(dgc_matrix,
.cell.meta = cell_metadata,
.sample.var = "sample_id",
.assay.type = "RNA")On-disk matrix (BPCells IterableMatrix)
tdr <- RunTDR(BPCells_matrix,
.cell.meta = cell_metadata,
.sample.var = "sample_id",
.assay.type = "RNA")For both sparse and on-disk matrix inputs, .cell.meta
must be a data.frame with one row per cell and rownames
matching cell IDs. The result is a bare TDRObj.
Downstream Analysis: Identical API
Once RunTDR() has been called, the same pipeline code
works regardless of the input class. S3 dispatch routes each function
call through the appropriate wrapper automatically.
# These work identically whether result is Seurat or SCE:
result <- result |>
get.lm(.design = design)
# Plotting works the same way
plotPCA(result)
plotBeeswarm(result, .coefs = "ConditionB")This uniformity extends to all analysis and plotting functions in the
package: get.pbDE(), get.plsD(),
plotUMAP(), plotHeatmap(), and so on.
How It Works: GetTDR / SetTDR
The dispatch mechanism follows a simple three-step pattern. When you call an analysis function on a container object (e.g., a Seurat object), the S3 method:
-
Extracts the
TDRObjwithGetTDR() -
Runs the computation on the
TDRObj -
Stores the updated
TDRObjback withSetTDR()
User calls: get.lm(seurat_obj, .design = design)
|
v
S3 dispatch --> get.lm.Seurat(x, ...)
|
+-- tdr <- GetTDR(x) # Extract TDRObj from Misc slot
+-- tdr <- get.lm.TDRObj(tdr, ...) # Run computation
+-- SetTDR(x, tdr) # Store updated TDRObj back
|
v
Returns: updated seurat_obj (with results inside)
Where the TDRObj is stored depends on the container:
| Container |
GetTDR() reads from |
SetTDR() writes to |
|---|---|---|
| Seurat | Misc(x, slot = "tdr.obj") |
Misc(x, slot = "tdr.obj") |
| SingleCellExperiment | S4Vectors::metadata(x)$tdr.obj |
S4Vectors::metadata(x)$tdr.obj |
| TDRObj | returns x directly |
returns tdr directly |
Note: H5AD and matrix-class inputs always produce a bare
TDRObj, so GetTDR/SetTDR are not
needed for those backends.
Extracting the TDRObj
You can always extract the TDRObj directly to inspect
results or work with them programmatically:
# Extract from any container
tdr <- GetTDR(seurat_obj)
# Access linear model results
tdr$results$lm$default$fit$coefficients
# Access landmark PCA coordinates
head(tdr$landmark.embed$pca$coord)
# Access clustering assignments
tdr$landmark.annot$clustering$idsThe $ accessor on a TDRObj maps to the
underlying S4 slots, so tdr$results is equivalent to
tdr@results.
Matrix-Class Backends
The dgCMatrix, DelayedMatrix,
IterableMatrix, flowSet, and
cytoset backends produce a bare TDRObj
directly — they do not wrap the result back into a container object.
After calling RunTDR(), you work with the
TDRObj directly:
# From a sparse matrix
tdr <- RunTDR(dgc_matrix,
.cell.meta = cell_metadata,
.sample.var = "sample_id",
.assay.type = "RNA")
# All downstream functions work directly on the TDRObj
tdr <- tdr |>
get.lm(.design = design)
plotPCA(tdr)
plotBeeswarm(tdr, .coefs = "ConditionB")For DelayedMatrix inputs (e.g., HDF5-backed
SingleCellExperiment assays), the data is automatically
converted to a BPCells on-disk format for efficient lazy access before
running the pipeline. You can control the conversion directory with the
.bpcells.dir argument.
The flowSet and cytoset backends (from
flowCore and flowWorkspace, respectively) are
designed for flow, mass, and spectral cytometry data and require
.assay.type = "cyto" along with a .markers
argument specifying the channels to use. Both formats use the same
internal backend ("cyto"), so downstream behaviour is
identical.