
tinydenseR Package Architecture
Pedro Milanez-Almeida
2026-05-12
Source:vignettes/package-architecture.Rmd
package-architecture.RmdOverview
tinydenseR is a landmark-based framework for
differential abundance and differential expression analysis of
single-cell data. It is designed for scRNA-seq, flow, mass, and spectral
cytometry data.
The core idea is to select a representative set of landmark cells, build a graph over them, map all cells to their nearest landmarks, and then model per-landmark density or expression across samples as biological replicates. This preserves statistical rigor while maintaining the richness of single-cell resolution.
TDRObj: The Core Data Structure
All analysis results are stored in a single S4 class,
TDRObj. It contains 12 slots that organize the data and
results at each stage of the pipeline:
| Slot | Contents |
|---|---|
cells |
Named list of per-sample cell indices (or file paths for the files backend) |
metadata |
data.frame of sample-level metadata |
config |
Run parameters: key, sampling, assay.type, markers, n.threads, backend |
integration |
Trained projection models and batch variables (harmony.var, harmony.obj, symphony.obj, umap.model) |
assay |
Landmark expression layers (L × features matrices): raw, expr, scaled |
landmark.embed |
Landmark-space coordinate matrices (pca, le, umap),
each with a $coord entry |
landmark.annot |
Per-landmark categorical annotations: clustering and
celltyping, each with an $ids factor |
graphs |
Landmark-landmark connectivity: adj.matrix, snn, fgraph |
density |
Fuzzy density analytics (L × N matrices): raw, norm, log.norm, size.factors, composition |
sample.embed |
Sample-level embeddings (N × k matrices): pca, traj, pepc |
cellmap |
Per-cell, per-sample data: clustering and celltyping,
each with an $ids factor, nearest.lm, fuzzy.graphs. Each
entry is in-memory or an on-disk path string |
results |
All statistical outputs: lm, pb, marker, spec, nmf, pls, clustering, celltyping, features |
The $ accessor is overloaded so you can use
tdr$results instead of tdr@results:
tdr$metadata # sample-level metadata
tdr$landmark.embed$pca # PCA coordinates of landmarks
tdr$results$lm # linear model results
tdr$config$assay.type # "RNA" or "cyto"Analysis Pipeline Stages
The standard analysis pipeline proceeds through a series of stages,
each populating specific slots in the TDRObj:
setup.tdr.obj() / RunTDR()
|
v
get.landmarks() --> landmark.embed (pca, le)
| assay (raw, expr, scaled)
v
get.graph() --> graphs (adj.matrix, snn)
| landmark.embed (umap)
| landmark.annot (clustering)
v
get.map() --> density (raw, norm, log.norm, composition)
| cellmap (nearest.lm, fuzzy.graphs)
v
get.lm() --> results$lm (fit, trad)
| sample.embed (pca)
v
get.pbDE() --> results$pb
get.pbDE(.mode="marker") --> results$marker
get.plsD() --> results$pls
A typical workflow looks like:
library(tinydenseR)
# Option 1: step-by-step
tdr <- setup.tdr.obj(...)
tdr <- get.landmarks(tdr)
tdr <- get.graph(tdr)
tdr <- get.map(tdr)
tdr <- get.lm(tdr, .design = design)
# Option 2: RunTDR() runs landmarks + graph + map + embedding in one call
result <- RunTDR(seurat_obj, .sample.var = "sample_id", .assay.type = "RNA")
result <- get.lm(result, .design = design)S3 Dispatch System
tinydenseR uses S3 dispatch to support multiple
container types (Seurat, SingleCellExperiment) through a single API.
H5AD inputs (either as a file path or via anndataR) are
converted to a bare TDRObj during RunTDR() and
do not carry container dispatch methods. The dispatch wrappers in
dispatch.R follow one of three patterns:
Tier 1: Compute-Only
Functions that only need the TDRObj for computation. The
wrapper extracts the TDRObj, runs the .TDRObj
method, and stores it back.
# Example: get.lm.Seurat
get.lm.Seurat <- function(x, ...) {
tdr <- GetTDR(x)
tdr <- get.lm.TDRObj(tdr, ...)
SetTDR(x, tdr)
}Functions in this tier: get.graph(),
get.lm(), get.embedding(),
get.plsD(), get.features(),
lm.cluster(), celltyping().
Tier 2: Needs Source Data
Functions that need access to the original data container (e.g., to
read expression matrices via .get_sample_matrix()). The
source object is passed via the .source argument.
# Example: get.landmarks.Seurat
get.landmarks.Seurat <- function(x, ...) {
tdr <- GetTDR(x)
tdr <- get.landmarks.TDRObj(tdr, .source = x, ...)
SetTDR(x, tdr)
}Functions in this tier: get.landmarks(),
get.map(), get.pbDE().
Note: get.markerDE() is soft-deprecated. Use
get.pbDE(.mode = "marker") instead. The
get.pbDE() function supports two modes:
-
.mode = "design"(default) — pseudobulk differential expression -
.mode = "marker"— marker gene/protein identification between groups
Special Case: goi.summary()
goi.summary() is a read-only Tier 2 function: it needs
.source for expression access, but it returns a summary
data frame rather than updating the container.
# Example: goi.summary.Seurat
goi.summary.Seurat <- function(x, ...) {
tdr <- GetTDR(x)
goi.summary.TDRObj(tdr, .source = x, ...)
}Tier 3: Read-Only Plots
Plot functions only need to read the TDRObj; they return
a plot object rather than updating the container.
# Example: plotPCA.Seurat
plotPCA.Seurat <- function(x, ...) plotPCA.TDRObj(GetTDR(x), ...)All plot*() functions follow this pattern:
plotPCA(), plotUMAP(),
plotBeeswarm(), plot2Markers(),
plotSamplePCA(), plotSampleEmbedding(),
plotTradStats(), plotTradPerc(),
plotDensity(), plotPbDE(),
plotDEA(), plotMarkerDE(),
plotHeatmap(), plotPlsD(),
plotPlsDHeatmap().
Backend Data Flow
When tinydenseR needs to access per-sample expression
matrices during the pipeline, it calls the internal function
.get_sample_matrix(). This function dispatches on
config$backend to retrieve data from the appropriate
source:
| Backend value | Source | How data is retrieved |
|---|---|---|
"files" |
RDS files on disk | readRDS(tdr@cells[[sample_idx]]) |
"seurat" |
Live Seurat object | SeuratObject::LayerData(.source, assay, layer)[, col_idx] |
"sce" |
Live SingleCellExperiment | SummarizedExperiment::assay(.source, assay)[, col_idx] |
"matrix" |
In-memory or on-disk matrix | Direct column indexing on the stored matrix reference |
"cyto" |
flowCore flowSet or flowWorkspace cytoset | flowCore::exprs(cs[[sample_name]]) |
The "matrix" backend is used by dgCMatrix,
DelayedMatrix, IterableMatrix, and H5AD inputs
(both RunTDR.character and
RunTDR.HDF5AnnData). In these cases, a reference to the
matrix is stored in a locked environment within
config$source.env so that .get_sample_matrix()
can slice into it without copying the entire matrix.
For the "seurat" and "sce" backends, the
live container object is passed as .source to Tier 2
functions, which then forward it to
.get_sample_matrix().
Caching System
tinydenseR includes an on-disk caching system for
per-cell mapping results (nearest landmark assignments, fuzzy graph
memberships). This avoids recomputing expensive cell-to-landmark
mappings when re-running downstream analyses.
The cache is managed through three user-facing functions:
# Show cache location and size
tdr_cache_info(tdr)
# Validate cache integrity
tdr_cache_validate(tdr)
# Remove orphaned cache files
tdr_cache_cleanup(tdr)Internal helpers (.tdr_cache_read(),
.tdr_cache_write(),
.tdr_cache_sweep_orphans()) handle the low-level read/write
operations and are not intended to be called directly by users.