Performs community detection on a similarity graph using the Leiden algorithm, followed by reassignment of small "straggler" clusters to better-connected neighbors. This improves cluster quality by preventing tiny, isolated groups.
Arguments
- .tdr.obj
Optional tinydenseR object from
get.landmarks()andget.graph(). If provided, uses Laplacian Eigenmap embedding for initialization (if computed successfully), otherwise falls back to PCA embedding. If NULL, computes 3D embedding from.sim.matrixvia truncated SVD.- .sim.matrix
Square similarity matrix (typically Jaccard similarity of k-NN graphs). Higher values indicate stronger connections between nodes. Self-loops (diagonal) are automatically removed.
- .resolution.parameter
Numeric controlling cluster granularity in Leiden CPM algorithm. Higher values (e.g., 0.0005-0.005) yield more/smaller clusters. Lower values yield fewer/larger clusters. Optimal value depends on data scale.
- .small.size
Integer threshold for straggler clusters. Clusters with fewer cells are absorbed into neighbors. Default: 0.5% of total nodes (floor(n/200)).
- .verbose
Logical, print progress and cluster counts. Default TRUE.
- .seed
Integer for reproducibility (affects k-means and Leiden). Default 123.
Value
Factor vector of cluster assignments with levels ordered by appearance. Labels are formatted as "cluster.01", "cluster.02", etc.
Details
Algorithm steps:
Initialize with k-means (up to 25 centers) on embedding to avoid singletons
Run Leiden community detection with CPM objective function
Identify "straggler" clusters smaller than
.small.sizeFor each straggler, compute mean connectivity to all keeper clusters
Reassign straggler cells to the most connected keeper cluster
Relabel clusters sequentially
Initialization strategy:
Uses Laplacian Eigenmap embedding (if computed and available in .tdr.obj$graph$LE$embed),
otherwise PCA embedding (up to first 3 components), or computes 3D embedding from
similarity matrix via truncated SVD (irlba) when .tdr.obj is NULL.
K-means initialization helps Leiden start from reasonable communities rather than
singleton nodes, improving stability.
Straggler absorption: Small clusters (< 0.5% of total by default) often represent noise or transitional states. Rather than keeping them as separate clusters, they're merged with the most similar neighboring cluster based on mean edge weights. For sparse similarity matrices, the mean is computed by normalizing the sum of stored values by the full submatrix size to properly account for implicit zeros.
Examples
if (FALSE) { # \dontrun{
# After building similarity matrix
clusters <- leiden.cluster(
.tdr.obj = lm.obj,
.sim.matrix = jaccard_sim,
.resolution.parameter = 0.001,
.small.size = 10 # Merge clusters < 10 cells
)
table(clusters)
} # }
