Create a hierarchical subset TDRObj

Given a parent TDRObj that has been processed through get.map, creates a new child TDRObj containing only the cells matching the specified selection criteria. The child references the parent's expression data without copying: for file-backed data, the same RDS paths are reused with an index filter; for in-memory backends (Seurat, SCE, matrix, cytoset), the index vectors are intersected.

Usage

get.subset(x, ...)

# S3 method for class 'TDRObj'
get.subset(
  x,
  .source = NULL,
  .id = NULL,
  .id.idx = NULL,
  .id.from = "clustering",
  .label.confidence = NULL,
  .prop.landmarks = 0.1,
  .tot.landmarks = 5000,
  .min.cells.per.sample = 10,
  .verbose = TRUE,
  ...
)

Arguments

x: A TDRObj, Seurat, or SingleCellExperiment object that has been processed through get.map.
...: Additional arguments (currently unused).
.source: The raw data object for non-file backends (NULL for the files backend). When calling on a bare TDRObj, pass the original Seurat or SCE object here for seurat/sce backends. When calling via a dispatch wrapper (get.subset.Seurat), this is filled automatically.
.id: Optional character vector of cluster or celltype labels to keep. Exactly one of .id or .id.idx must be provided.
.id.idx: Optional integer vector of landmark indices. Cells are selected via fuzzy confidence-thresholded voting using the stored UMAP fuzzy graphs.
.id.from: Character: "clustering" (default) or "celltyping". Source of labels when .id is used.
.label.confidence: Numeric scalar in [0,1]. Minimum confidence for fuzzy label assignment. Defaults to the parent's stored threshold (@config\$label.confidence), or 0.5 if not set.
.prop.landmarks: Numeric between 0 and 1 specifying proportion of cells to use as landmarks when the child pipeline runs get.landmarks(). Default 0.1 (10 percent).
.tot.landmarks: Integer. Maximum total number of landmarks. Default 5000.
.min.cells.per.sample: Integer. Samples with fewer qualifying cells than this threshold are excluded from the child (default 10).
.verbose: Logical: print progress? Default TRUE.

Value

A new TDRObj with:

@cells: Modified per-sample cell references (no expression data copied).
@metadata: Filtered to retained samples, with updated n.cells.
@config: Inherited parameters (backend, assay.type, markers, harmony.var) plus provenance in @config\$.subset.provenance.
All other slots: Empty – ready for get.landmarks().

Details

The child object is returned with empty analysis slots (assay, landmarks, graphs, density, cellmap, results). The user must re-run the full pipeline (get.landmarks $\to$ get.graph $\to$ get.map $\to$ get.lm $\to$ ...) on the child to obtain subset-specific results.

Cell selection uses the same machinery as get.pbDE:

.id: select cells whose per-cell label (from @cellmap\$clustering\$ids or @cellmap\$celltyping\$ids) matches the specified identifiers.
.id.idx: select cells via fuzzy confidence-thresholded voting from the specified landmark indices (using the UMAP fuzzy simplicial set stored in @cellmap\$fuzzy.graphs).

Nested subsetting is supported: calling get.subset() on a child object creates a grandchild that still references the original expression data without additional copies.

Session-scoped

Subset objects are session-scoped. Serialization via saveRDS is not supported for child TDRObjs that reference in-memory backends (seurat, sce, matrix, cyto). For the files backend, the child can be saved as long as the parent's RDS files remain on disk.

Examples