
tinydenseR Package Architecture
Pedro Milanez-Almeida
2026-05-27
Source:vignettes/package-architecture.Rmd
package-architecture.RmdOverview
tinydenseR is a landmark-based framework for
differential abundance and differential expression analysis of
single-cell data. It is designed for scRNA-seq, flow, mass, and spectral
cytometry data.
The core idea is to select a representative set of landmark cells, build a graph over them, map all cells to their nearest landmarks, and then model per-landmark density or expression across samples as biological replicates. This preserves statistical rigor while maintaining the richness of single-cell resolution.
TDRObj: The Core Data Structure
All analysis results are stored in a single S4 class,
TDRObj. It contains 12 slots that organize the data and
results at each stage of the pipeline:
| Slot | Contents |
|---|---|
cells |
Named list of per-sample cell indices (or file paths for the files backend) |
metadata |
data.frame of sample-level metadata |
config |
Run parameters: key, sampling, assay.type, markers, n.threads, backend |
integration |
Trained projection models and batch variables (harmony.var, harmony.obj, symphony.obj, umap.model) |
assay |
Landmark expression layers (L × features matrices): raw, expr, scaled |
landmark.embed |
Landmark-space coordinate matrices (pca, le, umap),
each with a $coord entry |
landmark.annot |
Per-landmark categorical annotations: clustering and
celltyping, each with an $ids factor |
graphs |
Landmark-landmark connectivity: adj.matrix, snn, fgraph |
density |
Fuzzy density analytics (L × N matrices): raw, norm, log.norm, size.factors, composition |
sample.embed |
Sample-level embeddings (N × k matrices): pca, traj, pepc |
cellmap |
Per-cell, per-sample data: clustering and celltyping,
each with an $ids factor, nearest.lm, fuzzy.graphs. Each
entry is in-memory or an on-disk path string |
results |
All statistical outputs: lm, pb, marker, spec, nmf, pls, clustering, celltyping, features |
The $ accessor is overloaded so you can use
tdr$results instead of tdr@results:
tdr$metadata # sample-level metadata
tdr$landmark.embed$pca # PCA coordinates of landmarks
tdr$results$lm # linear model results
tdr$config$assay.type # "RNA" or "cyto"Analysis Pipeline Stages
The standard analysis pipeline proceeds through a series of stages,
each populating specific slots in the TDRObj:
setup.tdr.obj() / RunTDR()
|
v
get.landmarks() --> landmark.embed (pca, le)
| assay (raw, expr, scaled)
v
get.graph() --> graphs (adj.matrix, snn)
| landmark.embed (umap)
| landmark.annot (clustering)
v
get.map() --> density (raw, norm, log.norm, composition)
| cellmap (nearest.lm, fuzzy.graphs)
v
get.lm() --> results$lm (fit, trad)
| sample.embed (pca)
v
get.pbDE() --> results$pb
get.pbDE(.mode="marker") --> results$marker
get.plsD() --> results$pls
A typical workflow looks like:
library(tinydenseR)
# Option 1: step-by-step
tdr <- setup.tdr.obj(...)
tdr <- get.landmarks(tdr)
tdr <- get.graph(tdr)
tdr <- get.map(tdr)
tdr <- get.lm(tdr, .design = design)
# Option 2: RunTDR() runs landmarks + graph + map + embedding in one call
result <- RunTDR(seurat_obj, .sample.var = "sample_id", .assay.type = "RNA")
result <- get.lm(result, .design = design)S3 Dispatch System
tinydenseR uses S3 dispatch to support multiple
container types (Seurat, SingleCellExperiment) through a single API.
H5AD inputs (either as a file path or via anndataR) are
converted to a bare TDRObj during RunTDR() and
do not carry container dispatch methods. The dispatch wrappers in
dispatch.R follow one of three patterns:
Tier 1: Compute-Only
Functions that only need the TDRObj for computation. The
wrapper extracts the TDRObj, runs the .TDRObj
method, and stores it back.
# Example: get.lm.Seurat
get.lm.Seurat <- function(x, ...) {
tdr <- GetTDR(x)
tdr <- get.lm.TDRObj(tdr, ...)
SetTDR(x, tdr)
}Functions in this tier: get.graph(),
get.lm(), get.embedding(),
get.plsD(), get.features(),
lm.cluster(), celltyping().
Tier 2: Needs Source Data
Functions that need access to the original data container (e.g., to
read expression matrices via .get_sample_matrix()). The
source object is passed via the .source argument.
# Example: get.landmarks.Seurat
get.landmarks.Seurat <- function(x, ...) {
tdr <- GetTDR(x)
tdr <- get.landmarks.TDRObj(tdr, .source = x, ...)
SetTDR(x, tdr)
}Functions in this tier: get.landmarks(),
get.map(), get.pbDE().
Note: get.markerDE() is soft-deprecated. Use
get.pbDE(.mode = "marker") instead. The
get.pbDE() function supports two modes:
-
.mode = "design"(default) — pseudobulk differential expression -
.mode = "marker"— marker gene/protein identification between groups
Special Case: goi.summary()
goi.summary() is a read-only Tier 2 function: it needs
.source for expression access, but it returns a summary
data frame rather than updating the container.
# Example: goi.summary.Seurat
goi.summary.Seurat <- function(x, ...) {
tdr <- GetTDR(x)
goi.summary.TDRObj(tdr, .source = x, ...)
}Tier 3: Read-Only Plots
Plot functions only need to read the TDRObj; they return
a plot object rather than updating the container.
# Example: plotPCA.Seurat
plotPCA.Seurat <- function(x, ...) plotPCA.TDRObj(GetTDR(x), ...)All plot*() functions follow this pattern:
plotPCA(), plotUMAP(),
plotBeeswarm(), plot2Markers(),
plotSamplePCA(), plotSampleEmbedding(),
plotTradStats(), plotTradPerc(),
plotDensity(), plotPbDE(),
plotDEA(), plotMarkerDE(),
plotHeatmap(), plotPlsD(),
plotPlsDHeatmap().
Backend Data Flow
When tinydenseR needs to access per-sample expression
matrices during the pipeline, it calls the internal function
.get_sample_matrix(). This function dispatches on
config$backend to retrieve data from the appropriate
source:
| Backend value | Source | How data is retrieved |
|---|---|---|
"files" |
RDS files on disk | readRDS(tdr@cells[[sample_idx]]) |
"seurat" |
Live Seurat object | SeuratObject::LayerData(.source, assay, layer)[, col_idx] |
"sce" |
Live SingleCellExperiment | SummarizedExperiment::assay(.source, assay)[, col_idx] |
"matrix" |
In-memory or on-disk matrix | Direct column indexing on the stored matrix reference |
"cyto" |
flowCore flowSet or flowWorkspace cytoset | flowCore::exprs(cs[[sample_name]]) |
The "matrix" backend is used by dgCMatrix,
DelayedMatrix, IterableMatrix, and H5AD inputs
(both RunTDR.character and
RunTDR.HDF5AnnData). In these cases, a reference to the
matrix is stored in a locked environment within
config$source.env so that .get_sample_matrix()
can slice into it without copying the entire matrix.
For the "seurat" and "sce" backends, the
live container object is passed as .source to Tier 2
functions, which then forward it to
.get_sample_matrix().
Caching System
tinydenseR includes an on-disk caching system for
per-cell mapping results (nearest landmark assignments, fuzzy graph
memberships). This avoids recomputing expensive cell-to-landmark
mappings when re-running downstream analyses.
The cache is managed through three user-facing functions:
# Show cache location and size
tdr_cache_info(tdr)
# Validate cache integrity
tdr_cache_validate(tdr)
# Remove orphaned cache files
tdr_cache_cleanup(tdr)Internal helpers (.tdr_cache_read(),
.tdr_cache_write(),
.tdr_cache_sweep_orphans()) handle the low-level read/write
operations and are not intended to be called directly by users.
Memory Management on HPC
When running tinydenseR on Linux HPC systems, the glibc
memory allocator may retain freed memory in the process address space
rather than returning it to the OS. This can cause the reported RSS
(Resident Set Size) to remain high even after R has freed its objects,
potentially triggering OOM kills by the job scheduler.
get.map() already minimises peak memory by excluding
large per-sample objects from its internal accumulator when on-disk
caching is enabled (the default). For additional defence-in-depth, add
the following to your ~/.Renviron or SLURM/PBS job script
before launching R:
These variables instruct glibc to use mmap() for
allocations larger than 4 KB. Unlike brk()-based
allocations, mmap() pages are returned to the OS
immediately on free(), keeping the process RSS close to
actual R memory usage.
No additional R code or packages are required — this is a user-side configuration that applies to any R session on Linux.