Simulating Trajectory Gene Regulatory Networks via `dyngen`

Author

Affiliation

Jack R. Leary

University of Florida

Published

April 16, 2024

1 Introduction

As 2024 begins, a concept I’ve been very interested in is gene regulatory networks (GRNs), specifically how they function during developmental biological processes such as hematopoesis, neurogenesis, etc. As such, I’ve been prototyping some methods for trajectory GRN estimation & downstream analysis. This has necessarily led to the need for a ground-truth network against which I can compare GRN estimation tools. Most published GRN methods use ChIP-seq or ATAC-seq datasets as a silver-standard source of truth, since simulating GRNs is complex and difficult. To my knowledge, the only tool that exists that implements GRN-based scRNA-seq trajectory simulation is dyngen. However, it isn’t the easiest tool to use, hence why I’ve decided to put together a guide for 1) simulating a GRN and 2) checking several estimation methods against the ground truth network. Skip to Section 4 for the simulation, and Section 5 for the method comparisons.

2 Libraries

First we load the packages we’ll need for our analysis.

Code

library(epoch)      # dynamic GRN estimation
library(dplyr)      # data manipulation
library(Seurat)     # scRNA-seq tools
library(dyngen)     # simulations
library(scLANE)     # trajectory DE
library(ggplot2)    # pretty plots
library(foreach)    # parallel for-loops
library(patchwork)  # plot combination
library(slingshot)  # pseudotime estimation
rename <- dplyr::rename

3 Helper functions

We define a utility function to make our plot legends easier to read.

Code

guide_umap <- function(key.size = 4) {
  ggplot2::guides(color = ggplot2::guide_legend(override.aes = list(size = key.size,
                                                                    alpha = 1, 
                                                                    stroke = 0.25)))
}

And consistent color palettes will make our plots easier to understand.

Code

palette_cluster <- paletteer::paletteer_d("ggsci::nrc_npg")
palette_heatmap <- paletteer::paletteer_d("MetBrewer::Hiroshige", direction = -1)

Lastly, we load in a function I wrote previously that estimates a trajectory GRN using XGBoost. Expand the code block below for details.

Code

estimateTGRN <- function(dyn.mat = NULL,
                         expr.mat = NULL, 
                         dyn.genes = NULL, 
                         tx.factors = NULL, 
                         cor.method = "spearman",
                         n.cores = 4L, 
                         verbose = TRUE, 
                         random.seed = 312) {
  # check inputs 
  if (is.null(expr.mat) || is.null(dyn.mat) || is.null(dyn.genes) || is.null(tx.factors)) { stop("Arguments to estimateTGRN() are missing.") }
  # identify dynamic TFs
  tx.factors <- tx.factors[tx.factors %in% dyn.genes]
  # set up progress bar if desired
  if (verbose) {
    withr::with_output_sink(tempfile(), {
      pb <- utils::txtProgressBar(0, length(dyn.genes), style = 3)
    })
    progress_fun <- function(n) utils::setTxtProgressBar(pb, n)
    snow_opts <- list(progress = progress_fun)
  } else {
    snow_opts <- list()
  }
  # set up parallel processing
  if (n.cores > 1L) {
    cl <- parallel::makeCluster(n.cores)
    doSNOW::registerDoSNOW(cl)
  } else {
    cl <- foreach::registerDoSEQ()
  }
  # set up LightGBM model settings
  lgbm_params <- list(objective = "gamma", 
                      tree_learner = "serial", 
                      metric = "l2", 
                      boosting_type = "dart",
                      device = "cpu",
                      num_threads = 1L, 
                      seed = random.seed)
  # estimate trajectory GRN
  grn_res <- foreach::foreach(g = seq(dyn.genes), 
                              .combine = "list",
                              .multicombine = ifelse(length(dyn.genes) > 1, TRUE, FALSE),
                              .maxcombine = ifelse(length(dyn.genes) > 1, length(dyn.genes), 2),
                              .packages = c("lightgbm", "dplyr"),
                              .errorhandling = "pass",
                              .inorder = TRUE,
                              .verbose = FALSE,
                              .options.snow = snow_opts) %dopar% {
                                feature_mat <- dyn.mat[, tx.factors]
                                feature_mat <- feature_mat[, colnames(feature_mat) != dyn.genes[g]]
                                resp_var <- dyn.mat[, dyn.genes[g]]
                                lgbm_data <- lightgbm::lgb.Dataset(data = feature_mat, label = resp_var)
                                lgbm_cv <- lightgbm::lgb.cv(params = lgbm_params, 
                                                            data = lgbm_data, 
                                                            nrounds = 100L, 
                                                            nfold = 5L, 
                                                            stratified = FALSE)
                                lgbm_model <- lightgbm::lgb.train(params = lgbm_params, 
                                                                  data = lgbm_data, 
                                                                  nrounds = lgbm_cv$best_iter)
                                imp_table <- as.data.frame(lightgbm::lgb.importance(lgbm_model)) %>% 
                                             dplyr::mutate(Target_Gene = dyn.genes[g],
                                                           Target_Gene_Type = dplyr::if_else(Target_Gene %in% tx.factors, "TF", "Non-TF"), 
                                                           .before = 1)
                                imp_table
                              }
  names(grn_res) <- dyn.genes
  if (n.cores > 1L) {
    parallel::stopCluster(cl)
  }
  # format GRN table
  grn_table <- purrr::imap(grn_res, \(x, y) {
    if (!inherits(x, "data.frame")) {
      empty_res <- data.frame(Target_Gene = y, 
                              Target_Gene_Type = ifelse(y %in% tx.factors, "TF", "Non-TF"), 
                              Feature = NA_character_, 
                              Gain = NA_real_, 
                              Cover = NA_real_, 
                              Frequency = NA_real_)
      return(empty_res)
    } else {
      return(x)
    }
  })
  # add mean rank of gain & frequency
  grn_table <- purrr::reduce(grn_table, rbind) %>% 
               dplyr::select(-Cover) %>% 
               dplyr::rename(Tx_Factor = Feature) %>% 
               dplyr::filter(!is.na(Target_Gene), 
                             !is.na(Tx_Factor)) %>% 
               dplyr::arrange(Tx_Factor, dplyr::desc(Frequency)) %>% 
               dplyr::with_groups(Tx_Factor, 
                                  dplyr::mutate, 
                                  Frequency_Rank = dplyr::row_number()) %>% 
               dplyr::arrange(Tx_Factor, dplyr::desc(Gain)) %>% 
               dplyr::with_groups(Tx_Factor, 
                                  dplyr::mutate, 
                                  Gain_Rank = dplyr::row_number()) %>% 
               dplyr::rowwise() %>% 
               dplyr::mutate(Mean_Rank = mean(dplyr::c_across(c(Frequency_Rank, Gain_Rank)))) %>% 
               dplyr::ungroup() %>% 
               dplyr::arrange(Tx_Factor, Mean_Rank) %>% 
               dplyr::mutate(Mean_Rank_Weight = 1 / Mean_Rank)
  # add correlation & mutual information b/t TF and each target gene (dynamics and normalized expression)
  dyn_cormat <- stats::cor(dyn.mat[, dyn.genes], method = cor.method)
  expr_cormat <- stats::cor(expr.mat[, dyn.genes], method = cor.method)
  dyn_mimat <- minet::build.mim(dyn.mat[, dyn.genes], 
                                estimator = "mi.empirical", 
                                disc = "equalwidth")
  expr_mimat <- minet::build.mim(expr.mat[, dyn.genes],  
                                 estimator = "mi.empirical", 
                                 disc = "equalwidth")
  grn_table <- dplyr::mutate(grn_table, 
                             Dyn_Cor = NA_real_, 
                             Exp_Cor = NA_real_, 
                             Dyn_MI = NA_real_, 
                             Exp_MI = NA_real_)
  for (i in seq(nrow(grn_table))) {
    grn_table$Dyn_Cor[i] <- dyn_cormat[grn_table$Target_Gene[i], grn_table$Tx_Factor[i]]
    grn_table$Dyn_MI[i] <- dyn_mimat[grn_table$Target_Gene[i], grn_table$Tx_Factor[i]]
    grn_table$Exp_Cor[i] <- expr_cormat[grn_table$Target_Gene[i], grn_table$Tx_Factor[i]]
    grn_table$Exp_MI[i] <- expr_mimat[grn_table$Target_Gene[i], grn_table$Tx_Factor[i]]
  }
  grn_table <- dplyr::mutate(grn_table, State = dplyr::if_else(Dyn_Cor < 0, "Repression", "Activation"))
  return(grn_table)
}

4 Simulating scRNA-seq data

The dyngen workflow starts with what they call a backbone, which specifies the shape of the cellular manifold. In this case we opt for a simple bifurcating trajectory composed of 2500 cells. The number of transcription factors (TFs) is a random variable, but we specify 500 target genes and 5000 housekeeping genes.

Code

sim_backbone <- backbone_bifurcating()
sim_config <- initialise_model(sim_backbone,
                               num_cells = 4000, 
                               num_tfs = nrow(sim_backbone$module_info),
                               num_targets = 500,
                               num_hks = 4000,
                               distance_metric = "pearson", 
                               num_cores = 4L, 
                               verbose = FALSE)

We visualize the gene module structure of our trajectory; note that we simulated nrow(sim_backbone$module_info) modules, each with a controlling TF.

Code

p1 <- plot_backbone_modulenet(sim_config) + 
      labs(x = "Dim 1", 
           y = "Dim 2", 
           edge_width = "Strength", 
           color = "Gene Module") + 
      theme_scLANE(umap = TRUE) + 
      theme(legend.box = "horizontal")
p1

Figure 1: The cascading gene module structure of our trajectory.

Using the gene modules, we estimate our TF network.

Code

tf_model <- generate_tf_network(sim_config)

We visualize the relationships between our TFs.

Code

p2 <- plot_feature_network(tf_model, show_targets = FALSE) + 
      labs(x = "Dim 1", 
           y = "Dim 2", 
           size = "TF Status", 
           color = "Gene Module") + 
      theme_scLANE(umap = TRUE) + 
      theme(legend.box = "horizontal") + 
      guides(color = guide_legend(override.aes = list(size = 4, alpha = 1)))
p2

Figure 2: Directional relationships between the TFs that control each gene module.

Next, we generate our target and housekeeping genes, along with the splicing kinetics for all genes.

Code

tf_model <- generate_feature_network(tf_model) %>% 
            generate_kinetics()

This extraordinarily messy (thanks to the large number of genes) plot shows our TFs, their targets, and the non-regulatory housekeeping genes. The dyngen documentation specifies that TFs are regulated solely by TFs, target genes may be regulated by TFs or other target genes, and housekeeping genes are not part of regulatory structure of the trajectory.

Code

p3 <- plot_feature_network(tf_model, show_hks = TRUE) + 
      labs(x = "Dim 1", 
           y = "Dim 2", 
           size = "TF Status", 
           color = "Gene Module") + 
      theme_scLANE(umap = TRUE) + 
      theme(legend.box = "horizontal") + 
      guides(color = guide_legend(override.aes = list(size = 4, alpha = 1), order = 1))
p3

Figure 3: The directed structure of the relationships between all genes - TFs, target genes, and housekeeping genes.

Continuing on, we use the following sequence of functions to simulate cells along our bifurcating trajectory.

Warning

If you’re using a local machine (like I am), this step will take 30 or so minutes to run.

Code

tf_model <- generate_gold_standard(tf_model) %>% 
            generate_cells() %>% 
            generate_experiment()

Finally, we convert our simulated dataset to a Seurat object for easier downstream analysis. In addition, we add a normalized ground-truth pseudotime that exists on $[0, 1]$ for aesthetic purposes.

Code

seu <- as_seurat(tf_model)
seu@meta.data <- mutate(seu@meta.data, 
                        sim_time_norm = (sim_time - min(sim_time)) / (max(sim_time) - min(sim_time)), 
                        .before = 7)

The ground-truth pseudotime values are stored in sim_time, and sequencing depth and feature counts have already been computed for total RNA, unspliced & spliced RNA, and protein.

Code

slice_sample(seu@meta.data, n = 10) %>% 
  kableExtra::kable(digits = 2, booktabs = TRUE) %>% 
  kableExtra::kable_classic("hover", full_width = FALSE)

	orig.ident	nCount_RNA	nFeature_RNA	step_ix	simulation_i	sim_time	sim_time_norm	nCount_spliced	nFeature_spliced	nCount_unspliced	nFeature_unspliced	nCount_protein	nFeature_protein
cell2381	SeuratProject	2910	1683	147	1	400.01	0.52	2063	1328	847	722	173650007	3933
cell1922	SeuratProject	3436	1823	161	1	456.01	0.59	2471	1444	965	796	175090008	3933
cell3169	SeuratProject	2252	1412	2954	13	204.01	0.27	1642	1103	610	536	172936840	3876
cell3888	SeuratProject	2331	1444	7469	32	176.00	0.23	1679	1144	652	570	172914222	3903
cell807	SeuratProject	1439	1048	4580	20	44.01	0.06	1056	806	383	355	171761100	3857
cell1707	SeuratProject	2577	1529	4872	21	260.01	0.34	1892	1196	685	608	172955001	3893
cell3909	SeuratProject	4764	2134	1830	8	468.01	0.61	3477	1792	1287	996	177649564	3988
cell2405	SeuratProject	3333	1776	7047	30	392.01	0.51	2401	1437	932	764	175397683	3951
cell2122	SeuratProject	2312	1437	1801	8	352.01	0.46	1694	1128	618	550	172296747	3911
cell755	SeuratProject	3070	1698	5361	23	312.01	0.41	2286	1392	784	664	175106616	3929

Table 1: The cell-level metadata of our Seurat object.

5 Analysis

5.1 Preprocessing

We run the cells through a standard processing pipeline consisting of normalization, highly-variable gene (HVG) identification, linear dimension reduction with PCA followed by nonlinear dimension reduction with UMAP, and a graph-based clustering via the Leiden algorithm.

Code

seu <- NormalizeData(seu, verbose = FALSE) %>% 
       FindVariableFeatures(nfeatures = 3000, verbose = FALSE) %>% 
       ScaleData(verbose = FALSE) %>% 
       RunPCA(npcs = 50, 
              approx = TRUE, 
              seed.use = 312, 
              verbose = FALSE) %>% 
       RunUMAP(reduction = "pca", 
               dims = 1:30, 
               n.components = 2, 
               metric = "cosine", 
               seed.use = 312, 
               verbose = FALSE) %>% 
       FindNeighbors(reduction = "pca", 
                     k.param = 20,
                     nn.method = "annoy", 
                     annoy.metric = "cosine", 
                     verbose = FALSE) %>% 
       FindClusters(algorithm = 4, 
                    method = "igraph", 
                    resolution = 0.3, 
                    random.seed = 312, 
                    verbose = FALSE)

Our UMAP embedding has retained the bifurcating structure of our simulated trajectory well, and our clustering appears correct too.

Code

p4 <- as.data.frame(Embeddings(seu, "umap")) %>% 
      mutate(leiden = seu$seurat_clusters) %>% 
      ggplot(aes(x = umap_1, y = umap_2, color = leiden)) + 
      geom_point(size = 1.75, 
                 stroke = 0, 
                 alpha = 0.75) + 
      scale_color_manual(values = palette_cluster) + 
      labs(x = "UMAP 1", 
           y = "UMAP 2", 
           color = "Leiden") +
      theme_scLANE(umap = TRUE) + 
      guide_umap()
p5 <- as.data.frame(Embeddings(seu, "umap")) %>% 
      mutate(PT = seu$sim_time_norm) %>% 
      ggplot(aes(x = umap_1, y = umap_2, color = PT)) + 
      geom_point(size = 1.75, 
                 stroke = 0, 
                 alpha = 0.75) + 
      scale_color_gradientn(colors = palette_heatmap) + 
      labs(x = "UMAP 1", 
           y = "UMAP 2", 
           color = "Pseudotime") +
      scLANE::theme_scLANE(umap = TRUE)
p6 <- (p4 | p5) + 
      plot_layout(guides = "collect", axes = "collect")
p6

Figure 4: UMAP embedding colored by Leiden cluster ID (left) and ground-truth pseudotime (right).

Visualizing the spread of pseudotime values per-cluster reveals that cluster 1 should the starting (or root) node, and clusters 3 & 5 the terminal (or leaf) nodes.

Code

p7 <- data.frame(leiden = seu$seurat_clusters, 
                 PT = seu$sim_time_norm) %>% 
      ggplot(aes(x = leiden, y = PT, color = leiden)) + 
      ggbeeswarm::geom_quasirandom(size = 1.5, 
                                   stroke = 0,
                                   alpha = 0.75) + 
      stat_summary(color = "black", 
                   geom = "point", 
                   fun = "mean", 
                   size = 3) + 
      scale_color_manual(values = palette_cluster) + 
      labs(y = "Pseudotime", color = "Leiden") + 
      theme_scLANE() + 
      theme(axis.title.x = element_blank()) + 
      guide_umap()
p7

Figure 5: Beeswarm plot displaying the distribution of ground-truth pseudotime per Leiden cluster.

5.2 Pseudotime estimation

Since dyngen doesn’t explicitly assign each cell to a lineage (at least as far as I can tell), we use slingshot to estimate a lineage-specific pseudotime. As per Figure 5, we specify cluster 1 as the root of the trajectory. Afterwards, we create an overall pseudotime by mean-aggregating the per-lineage pseudotime values assigned to each cell.

Code

sling_res <- slingshot(Embeddings(seu, "umap"), 
                       clusterLabels = seu$seurat_clusters, 
                       start.clus = c("1"))
sling_curves <- slingCurves(sling_res, as.df = TRUE)
sling_mst <- slingMST(sling_res, as.df = TRUE)
sling_pt <- as.data.frame(slingPseudotime(sling_res)) %>% 
            rowwise() %>% 
            mutate(PT_Overall = mean(c_across(starts_with("Lineage")), na.rm = TRUE)) %>% 
            ungroup() %>% 
            mutate(across(c(starts_with("Lineage"), PT_Overall), 
                          \(x) (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))))
seu <- AddMetaData(seu, 
                   metadata = sling_pt$PT_Overall, 
                   col.name = "PT_Overall")

The Spearman correlation between the true & estimated pseudotimes for each cell is quite high, indicating that slingshot did a solid job recapitulating the ground truth. The fit isn’t perfect, but it looks good enough to justify using the lineage-specific estimated pseudotime. This will allow us to build a GRN per-lineage, which is more useful than a single overall network.

Code

p8 <- data.frame(PT_true = seu$sim_time_norm, 
                 PT_est = seu$PT_Overall, 
                 leiden = seu$seurat_clusters) %>% 
      ggplot(aes(x = PT_true, y = PT_est)) + 
      geom_point(aes(color = leiden), 
                 size = 1.5, 
                 stroke = 0, 
                 alpha = 0.75) + 
      geom_smooth(method = "lm", color = "black") + 
      ggpubr::stat_cor(method = "spearman", 
                       cor.coef.name = "rho", 
                       geom = "label", 
                       label.x = 0.5, 
                       label.y = 0.25) + 
      scale_x_continuous(limits = c(0, 1)) + 
      scale_y_continuous(limits = c(0, 1)) + 
      scale_color_manual(values = palette_cluster) + 
      labs(x = "True Pseudotime", 
           y = "Estimated Pseudotime", 
           color = "Leiden") + 
      theme_scLANE() + 
      guide_umap()
p8

Figure 6: The Spearman correlation between true & estimated pseudotime. A fit from a simple linear model is overlaid in black.

The minimum spanning tree (MST) and principal curves from slingshot appear to match our ground-truth trajectory structure quite well. In addition, as shown above in Figure 6 the estimated pseudotime closely matches the ground truth.

Code

p9 <- data.frame(Embeddings(seu, "umap")) %>% 
      mutate(leiden = seu$seurat_clusters) %>% 
      ggplot(aes(x = umap_1, y = umap_2, color = leiden)) + 
      geom_point(size = 1.5, 
                 stroke = 0, 
                 alpha = 0.75) + 
      geom_path(data = sling_mst, mapping = aes(x = umap_1, y = umap_2, group = Lineage), 
                linewidth = 1.25, 
                color = "black") + 
      geom_point(data = sling_mst, mapping = aes(x = umap_1, y = umap_2, fill = Cluster), 
                color = "black", 
                shape = 21, 
                size = 4.5, 
                stroke = 1.25, 
                show.legend = FALSE) + 
      scale_color_manual(values = palette_cluster) + 
      scale_fill_manual(values = palette_cluster) + 
      labs(x = "UMAP 1", 
           y = "UMAP 2", 
           color = "Leiden") + 
      theme_scLANE(umap = TRUE) +  
      guide_umap()
p10 <- data.frame(Embeddings(seu, "umap")) %>% 
       mutate(leiden = seu$seurat_clusters) %>% 
       ggplot(aes(x = umap_1, y = umap_2, color = leiden)) + 
       geom_point(size = 1.5, 
                  stroke = 0, 
                  alpha = 0.75) + 
       geom_path(data = sling_curves,
                 mapping = aes(x = umap_1, y = umap_2, group = Lineage), 
                 color = "black", 
                 linewidth = 1.5, 
                 alpha = 0.75, 
                 lineend = "round") + 
       scale_color_manual(values = palette_cluster) + 
       labs(x = "UMAP 1", 
            y = "UMAP 2", 
            color =  "Leiden") + 
       theme_scLANE(umap = TRUE) + 
       guide_umap()
p11 <- select(sling_pt, starts_with("Lineage")) %>% 
       mutate(cell = colnames(seu), 
              umap_1 = seu@reductions$umap@cell.embeddings[, 1], 
              umap_2 = seu@reductions$umap@cell.embeddings[, 2], 
              .before = 1) %>% 
       tidyr::pivot_longer(cols = starts_with("Lineage"), 
                           names_to = "lineage",
                           values_to = "pseudotime") %>% 
       ggplot(aes(x = umap_1, y = umap_2, color = pseudotime)) + 
       facet_wrap(~lineage) + 
       geom_point(size = 1.5, 
                  stroke = 0, 
                  alpha = 0.75) + 
       scale_color_gradientn(colors = palette_heatmap) + 
       labs(x = "UMAP 1", 
            y = "UMAP 2", 
            color = "Pseudotime") + 
       theme_scLANE(umap = TRUE)
p12 <- (((p9 | p10) + plot_layout(axes = "collect")) / p11) + 
       plot_layout(guides = "collect")
p12

Figure 7: UMAP embedding with MST from Slinghot overlaid.

5.3 Estimating trajectory dynamics

Using scLANE (GitHub) we perform trajectory differential expression (DE) testing on the set of HVGs.

Code

pt_df <- select(sling_pt, -PT_Overall)
cell_offset <- createCellOffset(seu)
scLANE_models <- testDynamic(seu, 
                             pt = pt_df, 
                             genes = VariableFeatures(seu), 
                             size.factor.offset = cell_offset,
                             n.cores = 6L, 
                             verbose = FALSE)

scLANE testing completed for 3000 genes across 2 lineages in 21.853 mins

Code

scLANE_de_res <- getResultsDE(scLANE_models)

The top ten genes from scLANE are shown in Table 2.

Code

slice_head(scLANE_de_res, n = 10) %>% 
  select(Gene, Lineage, Test_Stat, P_Val_Adj, Gene_Dynamic_Lineage, Gene_Dynamic_Overall) %>% 
  mutate(Gene_Dynamic_Lineage = if_else(Gene_Dynamic_Lineage == 1, "Dynamic", "Static"), 
         Gene_Dynamic_Overall = if_else(Gene_Dynamic_Overall == 1, "Dynamic", "Static")) %>% 
  kableExtra::kable(digits = 2, 
                    booktabs = TRUE, 
                    col.names = c("Gene", "Lineage", "LRT Stat.", "Adj. P-value", "Lineage Status", "Overall Status")) %>% 
  kableExtra::kable_classic("hover", full_width = FALSE)

Gene	Lineage	LRT Stat.	Lineage Status	Overall Status
B5-TF1	A	1855.07	Dynamic	Dynamic
Target180	B	1799.82	Dynamic	Dynamic
Target158	A	1725.68	Dynamic	Dynamic
Target254	A	1683.16	Dynamic	Dynamic
Target414	B	1637.27	Dynamic	Dynamic
Target174	B	1613.97	Dynamic	Dynamic
Target81	A	1584.33	Dynamic	Dynamic
Target257	A	1567.35	Dynamic	Dynamic
Target92	A	1519.12	Dynamic	Dynamic
Target7	A	1424.00	Dynamic	Dynamic

Table 2: The top-10 trajectory DE genes from scLANE.

We identify a set of 202 dynamic genes.

Code

dyn_genes <- filter(scLANE_de_res, Gene_Dynamic_Overall == 1) %>% 
             distinct(Gene) %>% 
             pull(Gene)

Next, we pull a matrix of gene dynamics for each of the dynamic genes using the smoothedCountsMatrix() function.

Code

gene_dynamics <- smoothedCountsMatrix(scLANE_models, 
                                      size.factor.offset = cell_offset, 
                                      pt = pt_df,
                                      genes = dyn_genes, 
                                      log1p.norm = TRUE)

5.4 Building a GRN from scratch

We’ll start by building a GRN on our own with scTIRN (single cell Trajectory Inference via Regulatory Networks) without using external packages. The main logic comes from the function estimateTGRN() defined in Section 3.

Code

tx_factors <- filter(seu@assays$RNA@meta.features, is_tf == TRUE) %>% 
              rownames()
GRN_truth <- select(tf_model$feature_network, 
                    from, 
                    to, 
                    effect, 
                    strength) %>% 
             rename(Tx_Factor = from, 
                    Target_Gene = to) %>% 
             arrange(Tx_Factor, desc(strength)) %>% 
             mutate(Tx_Factor = gsub("_", "-", Tx_Factor), 
                    Target_Gene = gsub("_", "-", Target_Gene)) %>% 
             filter(!grepl("HK", Tx_Factor), 
                    !grepl("Target", Tx_Factor))
GRN_lineage1 <- estimateTGRN(dyn.mat = gene_dynamics$Lineage_A, 
                             expr.mat = as.matrix(t(seu@assays$RNA@data[colnames(gene_dynamics$Lineage_A), !is.na(pt_df$Lineage1)])), 
                             dyn.genes = colnames(gene_dynamics$Lineage_A), 
                             tx.factors = tx_factors,
                             n.cores = 6L, 
                             verbose = FALSE) %>% 
                mutate(Lineage = "Lineage1", .before = 1)
GRN_lineage2 <- estimateTGRN(gene_dynamics$Lineage_B, 
                             expr.mat = as.matrix(t(seu@assays$RNA@data[colnames(gene_dynamics$Lineage_B), !is.na(pt_df$Lineage2)])), 
                             dyn.genes = colnames(gene_dynamics$Lineage_B), 
                             tx.factors = tx_factors,
                             n.cores = 6L, 
                             verbose = FALSE) %>% 
                mutate(Lineage = "Lineage2", .before = 1)
GRN_all <- bind_rows(GRN_lineage1, GRN_lineage2) %>% 
           arrange(Lineage, Target_Gene, desc(Mean_Rank)) %>% 
           with_groups(c(Lineage, Target_Gene), 
                       slice_head, 
                       n = 10)

5.4.1 Estimating accuracy

Since our simulation only defines the network over the entire dataset, instead of in a lineage-specific manner, we’ll define scTIRN as being correct if a given TF-target gene link is recovered in either of the two lineage GRNs. In addition, we’ll define the directionality as being correct if it is correct in either lineage.

Code

scTIRN_detected_vec <- scTIRN_dir_vec_lin1 <- scTIRN_dir_vec_lin2 <- vector("numeric", nrow(GRN_truth))
for (i in seq(nrow(GRN_truth))) {
  sub_df <- filter(GRN_all, 
                   Tx_Factor == GRN_truth$Tx_Factor[i], 
                   Target_Gene == GRN_truth$Target_Gene[i])
 scTIRN_detected_vec[i] <- ifelse(nrow(sub_df > 0), 1, 0)
 scTIRN_dir_lin1 <- filter(GRN_lineage1, 
                           Tx_Factor == GRN_truth$Tx_Factor[i], 
                           Target_Gene == GRN_truth$Target_Gene[i]) %>% 
                    pull(Dyn_Cor)
 scTIRN_dir_vec_lin1[i] <- ifelse(length(scTIRN_dir_lin1) != 0, scTIRN_dir_lin1, NA_real_)
 scTIRN_dir_lin2 <- filter(GRN_lineage2, 
                           Tx_Factor == GRN_truth$Tx_Factor[i], 
                           Target_Gene == GRN_truth$Target_Gene[i]) %>% 
                    pull(Dyn_Cor)
 scTIRN_dir_vec_lin2[i] <- ifelse(length(scTIRN_dir_lin1) != 0, scTIRN_dir_lin2, NA_real_)
}
GRN_truth <- mutate(GRN_truth, 
                    scTIRN_detected = scTIRN_detected_vec, 
                    scTIRN_direction_lineage1 = scTIRN_dir_vec_lin1,
                    scTIRN_direction_lineage2 = scTIRN_dir_vec_lin2) %>% 
             mutate(scTIRN_dir_correct = case_when(effect == 1 & (scTIRN_direction_lineage1 > 0 | scTIRN_direction_lineage2 > 0) ~ 1, 
                                                   effect == -1 & (scTIRN_direction_lineage1 < 0 | scTIRN_direction_lineage2 < 0) ~ 1, 
                                                   TRUE ~ 0), 
                    scTIRN_fully_correct = if_else(scTIRN_detected == 1 & scTIRN_dir_correct == 1, 1, 0))

In order to correctly estimate the accuracy of scTIRN, we first filter the ground-truth GRN to just the genes identified by scLANE as being dynamic over the trajectory, then estimate the proportion of the true TF to target gene links identified by our method. The accuracy looks pretty good!

Code

filter(GRN_truth, Tx_Factor %in% dyn_genes) %>% 
  summarise(mu = mean(scTIRN_detected) * 100) %>% 
  kableExtra::kable(digits = 2, 
                    col.names = "Accuracy",
                    booktabs = TRUE, 
                    align = "c") %>% 
  kableExtra::kable_classic(full_width = FALSE, "hover")

Accuracy
63.22

Table 3: The accuracy of scTIRN when filtering to just the genes identified by scLANE as trajectory DE.

5.5 Building a GRN with `epoch`

Next, we use the epoch R package (paper, GitHub) to estimate a GRN for each lineage using the Pearson correlation of expression between dynamic genes. We start by identifying a set of genes dynamic across lineage 1 (as with scLANE, we test solely the HVG set used to build the trajectory), which we do by fitting GAMs using their findDynGenes() function. A quirk of the epoch method is that it’s necessary to provide the cluster structure of the trajectory - in order, which we do based on Figure 7. Next, we construct the GRN & perform crossweighting - the goal of which is to reduce the occurrence of false positives (see their Methods for details).

Note

The authors show in their paper (see e.g., Figure 2) that the mutual information (MI) method with crossweighting is the best. However, their MI implementation is very slow, so I’m opting to use the Pearson correlation implementation instead.

Code

epoch_expr_mat_lin1 <- as.matrix(seu@assays$RNA$data[VariableFeatures(seu), !is.na(sling_pt$Lineage1)])
epoch_pt_df <- select(sling_pt, Lineage1) %>% 
               mutate(leiden = seu$seurat_clusters) %>% 
               filter(!is.na(Lineage1)) %>% 
               as.data.frame() %>% 
               magrittr::set_rownames(colnames(epoch_expr_mat_lin1))
epoch_models_lin1 <- findDynGenes(epoch_expr_mat_lin1, 
                                  sampTab = epoch_pt_df, 
                                  path = c("1", "6", "7", "4", "5"), 
                                  group_column = "leiden", 
                                  pseudotime_column = "Lineage1",
                                  method = "gam")
epoch_dyn_genes_lin1 <- names(epoch_models_lin1$genes)[epoch_models_lin1$genes < 0.01]
epoch_dyn_genes_lin1 <- na.omit(epoch_dyn_genes_lin1)
epoch_GRN_lin1 <- reconstructGRN(epoch_expr_mat_lin1, 
                                 tfs = tx_factors, 
                                 dgenes = epoch_dyn_genes_lin1, 
                                 method = "pearson")
epoch_GRN_lin1 <- crossweight(epoch_GRN_lin1, 
                              expDat = epoch_expr_mat_lin1, 
                              xdyn = epoch_models_lin1) %>% 
                  mutate(Lineage = "Lineage1", .before = 1)

The results are shown below:

Code

slice_sample(epoch_GRN_lin1, n = 10) %>% 
  kableExtra::kable(digits = 2, 
                    row.names = FALSE, 
                    booktabs = TRUE, 
                    col.names = c("Lineage", "Target Gene", "TF", "Z-score", "Pearson Corr.", "Offset", "Weighted Score")) %>% 
  kableExtra::kable_classic(full_width = FALSE, c("hover"))

Lineage	Target Gene	TF	Z-score	Pearson Corr.	Offset	Weighted Score
Lineage1	A3-TF1	B7-TF1	2.30	0.15	34.60	2.30
Lineage1	HK2715	D2-TF1	4.35	0.15	25.13	4.35
Lineage1	Burn4-TF1	B13-TF1	2.47	0.10	4.70	2.47
Lineage1	HK3940	B13-TF1	4.94	0.09	30.80	4.94
Lineage1	HK3158	B3-TF1	4.37	0.16	9.72	4.37
Lineage1	HK1505	B12-TF1	2.04	0.11	77.36	0.57
Lineage1	HK277	B12-TF1	2.24	0.12	43.89	1.32
Lineage1	HK151	B5-TF1	2.52	-0.12	-91.41	2.52
Lineage1	HK1502	B3-TF1	3.57	0.13	-65.03	3.57
Lineage1	HK2587	B3-TF1	3.64	0.11	-83.22	3.64

Table 4: A sample of the GRN output for lineage 1 from epoch.

We repeat the process for lineage 2, again using the cluster structure shown in Figure 7 when identifying dynamic genes.

Code

epoch_expr_mat_lin2 <- as.matrix(seu@assays$RNA$data[VariableFeatures(seu), !is.na(sling_pt$Lineage2)])
epoch_pt_df <- select(sling_pt, Lineage2) %>% 
               mutate(leiden = seu$seurat_clusters) %>% 
               filter(!is.na(Lineage2)) %>% 
               as.data.frame() %>% 
               magrittr::set_rownames(colnames(epoch_expr_mat_lin2))
epoch_models_lin2 <- findDynGenes(epoch_expr_mat_lin2, 
                                  sampTab = epoch_pt_df, 
                                  path = c("1", "6", "2", "3"), 
                                  group_column = "leiden", 
                                  pseudotime_column = "Lineage2",
                                  method = "gam")
epoch_dyn_genes_lin2 <- names(epoch_models_lin2$genes)[epoch_models_lin2$genes < 0.01]
epoch_dyn_genes_lin2 <- na.omit(epoch_dyn_genes_lin2)
epoch_GRN_lin2 <- reconstructGRN(epoch_expr_mat_lin2, 
                                 tfs = tx_factors, 
                                 dgenes = epoch_dyn_genes_lin2, 
                                 method = "pearson")
epoch_GRN_lin2 <- crossweight(epoch_GRN_lin2, 
                              expDat = epoch_expr_mat_lin2, 
                              xdyn = epoch_models_lin2) %>% 
                  mutate(Lineage = "Lineage2", .before = 1)
epoch_GRN_all <- bind_rows(epoch_GRN_lin1, epoch_GRN_lin2)

5.5.1 Estimating accuracy

We compute the accuracy of epoch using the same method as we did for scTIRN i.e., by filtering the ground-truth GRN to just the genes identified as dynamic (across either lineage) by epoch.

Code

epoch_detected_vec <- epoch_dir_vec_lin1 <- epoch_dir_vec_lin2 <- vector("numeric", nrow(GRN_truth))
for (i in seq(nrow(GRN_truth))) {
  sub_df <- filter(epoch_GRN_all, 
                   TF == GRN_truth$Tx_Factor[i], 
                   TG == GRN_truth$Target_Gene[i])
 epoch_detected_vec[i] <- ifelse(nrow(sub_df > 0), 1, 0)
 epoch_dir_vec_lin1 <- filter(epoch_GRN_lin1, 
                              TF == GRN_truth$Tx_Factor[i], 
                              TG == GRN_truth$Target_Gene[i]) %>% 
                        pull(corr)
 epoch_dir_vec_lin1[i] <- ifelse(length(epoch_dir_vec_lin1) != 0, epoch_dir_vec_lin1, NA_real_)
 epoch_dir_vec_lin2 <- filter(epoch_GRN_lin2, 
                               TF == GRN_truth$Tx_Factor[i], 
                               TG == GRN_truth$Target_Gene[i]) %>% 
                        pull(corr)
 epoch_dir_vec_lin2[i] <- ifelse(length(epoch_dir_vec_lin2) != 0, epoch_dir_vec_lin2, NA_real_)
}
GRN_truth <- mutate(GRN_truth, 
                    epoch_detected = epoch_detected_vec, 
                    epoch_direction_lineage1 = epoch_dir_vec_lin1,
                    epoch_direction_lineage2 = epoch_dir_vec_lin2) %>% 
             mutate(epoch_dir_correct = case_when(effect == 1 & (epoch_direction_lineage1 > 0 | epoch_direction_lineage2 > 0) ~ 1, 
                                                  effect == -1 & (epoch_direction_lineage1 < 0 | epoch_direction_lineage2 < 0) ~ 1, 
                                                  TRUE ~ 0), 
                    epoch_fully_correct = if_else(epoch_detected == 1 & epoch_dir_correct == 1, 1, 0))

epoch performs fairly well; compare its accuracy with that of scTIRN as shown in Table 3.

Code

filter(GRN_truth, Tx_Factor %in% union(epoch_dyn_genes_lin1, epoch_dyn_genes_lin2)) %>% 
  summarise(mu = mean(epoch_detected) * 100) %>% 
  kableExtra::kable(digits = 2, 
                    col.names = "Accuracy",
                    booktabs = TRUE, 
                    align = "c") %>% 
  kableExtra::kable_classic(full_width = FALSE, "hover")

Accuracy
68.64

Table 5: The accuracy of epoch when filtering to just the genes identified by epoch as trajectory DE.

6 Conclusions

Overall both methods perform well, each having an accuracy in the mid-80s. scTIRN performed slightly better, likely due to its usage of less-noisy gene dynamics instead of normalized expression to build each lineage’s network. In addition, given that we utilized lineage-specific pseudotime from slingshot instead of the overall ground-truth pseudotime I imagine the performance of scTIRN could be improved if we were to use the ground-truth data instead - though this would lose the lineage specificity. Lastly, the dyngen package, while somewhat difficult to use, proved to be a useful tool in that it provides a granular, ground-truth GRN against which we can evaluate different GRN estimation methods.

7 Session info

Code

sessioninfo::session_info()

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       macOS Sonoma 14.3
 system   x86_64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2024-02-27
 pandoc   3.1.9 @ /usr/local/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package              * version    date (UTC) lib source
 abind                  1.4-5      2016-07-21 [1] CRAN (R 4.3.0)
 assertthat             0.2.1      2019-03-21 [1] CRAN (R 4.3.0)
 backports              1.4.1      2021-12-13 [1] CRAN (R 4.3.0)
 beeswarm               0.4.0      2021-06-01 [1] CRAN (R 4.3.0)
 bigassertr             0.1.6      2023-01-10 [1] CRAN (R 4.3.0)
 bigparallelr           0.3.2      2021-10-02 [1] CRAN (R 4.3.0)
 bigstatsr              1.5.12     2022-10-14 [1] CRAN (R 4.3.0)
 Biobase              * 2.62.0     2023-10-24 [1] Bioconductor
 BiocGenerics         * 0.48.1     2023-11-01 [1] Bioconductor
 bit                    4.0.5      2022-11-15 [1] CRAN (R 4.3.0)
 bit64                  4.0.5      2020-08-30 [1] CRAN (R 4.3.0)
 bitops                 1.0-7      2021-04-24 [1] CRAN (R 4.3.0)
 boot                   1.3-28.1   2022-11-22 [1] CRAN (R 4.3.2)
 broom                  1.0.5      2023-06-09 [1] CRAN (R 4.3.0)
 broom.mixed            0.2.9.4    2022-04-17 [1] CRAN (R 4.3.0)
 car                    3.1-2      2023-03-30 [1] CRAN (R 4.3.0)
 carData                3.0-5      2022-01-06 [1] CRAN (R 4.3.0)
 cellranger             1.1.0      2016-07-27 [1] CRAN (R 4.3.0)
 class                  7.3-22     2023-05-03 [1] CRAN (R 4.3.2)
 cli                    3.6.2      2023-12-11 [1] CRAN (R 4.3.0)
 cluster                2.1.6      2023-12-01 [1] CRAN (R 4.3.0)
 coda                   0.19-4     2020-09-30 [1] CRAN (R 4.3.0)
 codetools              0.2-19     2023-02-01 [1] CRAN (R 4.3.2)
 colorspace             2.1-0      2023-01-23 [1] CRAN (R 4.3.0)
 cowplot                1.1.2      2023-12-15 [1] CRAN (R 4.3.0)
 crayon                 1.5.2      2022-09-29 [1] CRAN (R 4.3.0)
 data.table             1.14.10    2023-12-08 [1] CRAN (R 4.3.0)
 DelayedArray           0.28.0     2023-10-24 [1] Bioconductor
 DelayedMatrixStats     1.24.0     2023-10-24 [1] Bioconductor
 deldir                 2.0-2      2023-11-23 [1] CRAN (R 4.3.0)
 desc                   1.4.3      2023-12-10 [1] CRAN (R 4.3.0)
 DescTools            * 0.99.52    2023-12-01 [1] CRAN (R 4.3.0)
 digest                 0.6.33     2023-07-07 [1] CRAN (R 4.3.0)
 doParallel             1.0.17     2022-02-07 [1] CRAN (R 4.3.0)
 doSNOW                 1.0.20     2022-02-04 [1] CRAN (R 4.3.0)
 dotCall64              1.1-1      2023-11-28 [1] CRAN (R 4.3.0)
 dplyr                * 1.1.4      2023-11-17 [1] CRAN (R 4.3.0)
 dyngen               * 1.0.5      2022-10-12 [1] CRAN (R 4.3.0)
 dynutils               1.0.11     2022-10-11 [1] CRAN (R 4.3.0)
 e1071                  1.7-14     2023-12-06 [1] CRAN (R 4.3.0)
 ellipsis               0.3.2      2021-04-29 [1] CRAN (R 4.3.0)
 emmeans                1.10.0     2024-01-23 [1] CRAN (R 4.3.2)
 epoch                * 0.0.0.9000 2023-12-22 [1] Github (pcahan1/epoch@bf44dff)
 estimability           1.4.1      2022-08-05 [1] CRAN (R 4.3.0)
 evaluate               0.23       2023-11-01 [1] CRAN (R 4.3.0)
 Exact                  3.2        2022-09-25 [1] CRAN (R 4.3.0)
 expm                   0.999-8    2023-11-29 [1] CRAN (R 4.3.0)
 fansi                  1.0.6      2023-12-08 [1] CRAN (R 4.3.0)
 farver                 2.1.1      2022-07-06 [1] CRAN (R 4.3.0)
 fastDummies            1.7.3      2023-07-06 [1] CRAN (R 4.3.0)
 fastmap                1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
 ff                     4.0.9      2023-01-25 [1] CRAN (R 4.3.0)
 fitdistrplus           1.1-11     2023-04-25 [1] CRAN (R 4.3.0)
 flock                  0.7        2016-11-12 [1] CRAN (R 4.3.0)
 forcats                1.0.0      2023-01-29 [1] CRAN (R 4.3.0)
 foreach              * 1.5.2      2022-02-02 [1] CRAN (R 4.3.0)
 furrr                  0.3.1      2022-08-15 [1] CRAN (R 4.3.0)
 future                 1.33.1     2023-12-22 [1] CRAN (R 4.3.0)
 future.apply           1.11.1     2023-12-21 [1] CRAN (R 4.3.0)
 gam                  * 1.22-3     2023-11-29 [1] CRAN (R 4.3.0)
 gamlss                 5.4-20     2023-10-04 [1] CRAN (R 4.3.0)
 gamlss.data            6.0-2      2021-11-07 [1] CRAN (R 4.3.0)
 gamlss.dist            6.1-1      2023-08-23 [1] CRAN (R 4.3.0)
 geeM                   0.10.1     2018-06-18 [1] CRAN (R 4.3.0)
 generics               0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
 GENIE3               * 1.24.0     2023-10-24 [1] Bioconductor
 GenomeInfoDb         * 1.38.5     2023-12-28 [1] Bioconductor 3.18 (R 4.3.2)
 GenomeInfoDbData       1.2.11     2023-12-22 [1] Bioconductor
 GenomicRanges        * 1.54.1     2023-10-29 [1] Bioconductor
 ggbeeswarm             0.7.2      2023-04-29 [1] CRAN (R 4.3.0)
 ggforce                0.4.1      2022-10-04 [1] CRAN (R 4.3.0)
 ggnetwork            * 0.5.12     2024-02-12 [1] Github (briatte/ggnetwork@f3b8b84)
 ggplot2              * 3.4.4      2023-10-12 [1] CRAN (R 4.3.0)
 ggpubr                 0.6.0      2023-02-10 [1] CRAN (R 4.3.0)
 ggraph                 2.1.0      2022-10-09 [1] CRAN (R 4.3.0)
 ggrepel                0.9.5      2024-01-10 [1] CRAN (R 4.3.0)
 ggridges               0.5.5      2023-12-15 [1] CRAN (R 4.3.0)
 ggsignif               0.6.4      2022-10-13 [1] CRAN (R 4.3.0)
 GillespieSSA2          0.3.0      2023-01-23 [1] CRAN (R 4.3.0)
 gld                    2.6.6      2022-10-23 [1] CRAN (R 4.3.0)
 glm2                 * 1.2.1      2018-08-11 [1] CRAN (R 4.3.0)
 glmmTMB                1.1.8      2023-10-07 [1] CRAN (R 4.3.0)
 globals                0.16.2     2022-11-21 [1] CRAN (R 4.3.0)
 glue                   1.6.2      2022-02-24 [1] CRAN (R 4.3.0)
 goftest                1.2-3      2021-10-07 [1] CRAN (R 4.3.0)
 graphlayouts           1.1.0      2024-01-19 [1] CRAN (R 4.3.0)
 gridExtra            * 2.3        2017-09-09 [1] CRAN (R 4.3.0)
 gtable                 0.3.4      2023-08-21 [1] CRAN (R 4.3.0)
 hdf5r                * 1.3.8      2023-01-21 [1] CRAN (R 4.3.0)
 highr                  0.10       2022-12-22 [1] CRAN (R 4.3.0)
 hms                    1.1.3      2023-03-21 [1] CRAN (R 4.3.0)
 htmltools              0.5.7      2023-11-03 [1] CRAN (R 4.3.0)
 htmlwidgets            1.6.4      2023-12-06 [1] CRAN (R 4.3.0)
 httpuv                 1.6.13     2023-12-06 [1] CRAN (R 4.3.0)
 httr                   1.4.7      2023-08-15 [1] CRAN (R 4.3.0)
 ica                    1.0-3      2022-07-08 [1] CRAN (R 4.3.0)
 igraph               * 2.0.1.1    2024-01-30 [1] CRAN (R 4.3.2)
 infotheo               1.2.0.1    2022-04-08 [1] CRAN (R 4.3.0)
 IRanges              * 2.36.0     2023-10-24 [1] Bioconductor
 irlba                  2.3.5.1    2022-10-03 [1] CRAN (R 4.3.0)
 iterators            * 1.0.14     2022-02-05 [1] CRAN (R 4.3.0)
 itertools            * 0.1-3      2014-03-12 [1] CRAN (R 4.3.0)
 jsonlite               1.8.8      2023-12-04 [1] CRAN (R 4.3.0)
 kableExtra             1.3.4      2021-02-20 [1] CRAN (R 4.3.0)
 KernSmooth             2.23-22    2023-07-10 [1] CRAN (R 4.3.2)
 knitr                  1.45       2023-10-30 [1] CRAN (R 4.3.0)
 labeling               0.4.3      2023-08-29 [1] CRAN (R 4.3.0)
 later                  1.3.2      2023-12-06 [1] CRAN (R 4.3.0)
 lattice                0.22-5     2023-10-24 [1] CRAN (R 4.3.0)
 lazyeval               0.2.2      2019-03-15 [1] CRAN (R 4.3.0)
 leiden                 0.4.3.1    2023-11-17 [1] CRAN (R 4.3.0)
 lifecycle              1.0.4      2023-11-07 [1] CRAN (R 4.3.0)
 listenv                0.9.0      2022-12-16 [1] CRAN (R 4.3.0)
 lmds                   0.1.0      2019-09-27 [1] CRAN (R 4.3.0)
 lme4                   1.1-35.1   2023-11-05 [1] CRAN (R 4.3.0)
 lmom                   3.0        2023-08-29 [1] CRAN (R 4.3.0)
 lmtest                 0.9-40     2022-03-21 [1] CRAN (R 4.3.0)
 loomR                * 0.2.0      2023-12-22 [1] Github (mojaveazure/loomR@df0144b)
 magrittr             * 2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
 MASS                   7.3-60     2023-05-04 [1] CRAN (R 4.3.0)
 Matrix                 1.6-5      2024-01-11 [1] CRAN (R 4.3.0)
 MatrixGenerics       * 1.14.0     2023-10-24 [1] Bioconductor
 matrixStats          * 1.2.0      2023-12-11 [1] CRAN (R 4.3.0)
 mgcv                   1.9-1      2023-12-21 [1] CRAN (R 4.3.0)
 mime                   0.12       2021-09-28 [1] CRAN (R 4.3.0)
 minet                * 3.60.0     2023-10-24 [1] Bioconductor
 miniUI                 0.1.1.1    2018-05-18 [1] CRAN (R 4.3.0)
 minqa                  1.2.6      2023-09-11 [1] CRAN (R 4.3.0)
 munsell                0.5.0      2018-06-12 [1] CRAN (R 4.3.0)
 mvtnorm                1.2-4      2023-11-27 [1] CRAN (R 4.3.0)
 nlme                   3.1-164    2023-11-27 [1] CRAN (R 4.3.0)
 nloptr                 2.0.3      2022-05-26 [1] CRAN (R 4.3.0)
 numDeriv               2016.8-1.1 2019-06-06 [1] CRAN (R 4.3.0)
 paletteer              1.6.0      2024-01-21 [1] CRAN (R 4.3.0)
 parallelly             1.36.0     2023-05-26 [1] CRAN (R 4.3.0)
 patchwork            * 1.2.0      2024-01-08 [1] CRAN (R 4.3.0)
 pbapply                1.7-2      2023-06-27 [1] CRAN (R 4.3.0)
 pheatmap             * 1.0.12     2019-01-04 [1] CRAN (R 4.3.0)
 pillar                 1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
 pkgconfig              2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
 plotly                 4.10.4     2024-01-13 [1] CRAN (R 4.3.0)
 plyr                   1.8.9      2023-10-02 [1] CRAN (R 4.3.0)
 png                    0.1-8      2022-11-29 [1] CRAN (R 4.3.0)
 polyclip               1.10-6     2023-09-27 [1] CRAN (R 4.3.0)
 princurve            * 2.1.6      2021-01-18 [1] CRAN (R 4.3.0)
 prismatic              1.1.1      2022-08-15 [1] CRAN (R 4.3.0)
 progressr              0.14.0     2023-08-10 [1] CRAN (R 4.3.0)
 promises               1.2.1      2023-08-10 [1] CRAN (R 4.3.0)
 proxy                  0.4-27     2022-06-09 [1] CRAN (R 4.3.0)
 proxyC                 0.3.4      2023-10-25 [1] CRAN (R 4.3.0)
 ps                     1.7.5      2023-04-18 [1] CRAN (R 4.3.0)
 purrr                  1.0.2      2023-08-10 [1] CRAN (R 4.3.0)
 R6                   * 2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
 RANN                   2.6.1      2019-01-08 [1] CRAN (R 4.3.0)
 RColorBrewer         * 1.1-3      2022-04-03 [1] CRAN (R 4.3.0)
 Rcpp                   1.0.11     2023-07-06 [1] CRAN (R 4.3.0)
 RcppAnnoy              0.0.21     2023-07-02 [1] CRAN (R 4.3.0)
 RcppEigen              0.3.3.9.4  2023-11-02 [1] CRAN (R 4.3.0)
 RcppHNSW               0.5.0      2023-09-19 [1] CRAN (R 4.3.0)
 RcppParallel           5.1.7      2023-02-27 [1] CRAN (R 4.3.0)
 RcppXPtrUtils          0.1.2      2022-05-24 [1] CRAN (R 4.3.0)
 RCurl                  1.98-1.13  2023-11-02 [1] CRAN (R 4.3.0)
 readr                  2.1.4      2023-02-10 [1] CRAN (R 4.3.0)
 readxl                 1.4.3      2023-07-06 [1] CRAN (R 4.3.0)
 rematch2               2.1.2      2020-05-01 [1] CRAN (R 4.3.0)
 remotes                2.4.2.1    2023-07-18 [1] CRAN (R 4.3.0)
 reshape2             * 1.4.4      2020-04-09 [1] CRAN (R 4.3.0)
 reticulate             1.34.0     2023-10-12 [1] CRAN (R 4.3.0)
 rlang                  1.1.2      2023-11-04 [1] CRAN (R 4.3.0)
 rmarkdown              2.25       2023-09-18 [1] CRAN (R 4.3.0)
 rmio                   0.4.0      2022-02-17 [1] CRAN (R 4.3.0)
 ROCR                   1.0-11     2020-05-02 [1] CRAN (R 4.3.0)
 rootSolve              1.8.2.4    2023-09-21 [1] CRAN (R 4.3.0)
 RSpectra               0.16-1     2022-04-24 [1] CRAN (R 4.3.0)
 rstatix                0.7.2      2023-02-01 [1] CRAN (R 4.3.0)
 rstudioapi             0.15.0     2023-07-07 [1] CRAN (R 4.3.0)
 Rtsne                  0.17       2023-12-07 [1] CRAN (R 4.3.0)
 rvest                  1.0.3      2022-08-19 [1] CRAN (R 4.3.0)
 S4Arrays               1.2.0      2023-10-24 [1] Bioconductor
 S4Vectors            * 0.40.2     2023-11-23 [1] Bioconductor
 scales                 1.3.0      2023-11-28 [1] CRAN (R 4.3.0)
 scattermore            1.2        2023-06-12 [1] CRAN (R 4.3.0)
 scLANE               * 0.7.9      2024-02-18 [1] Bioconductor
 sctransform            0.4.1      2023-10-19 [1] CRAN (R 4.3.0)
 sessioninfo            1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
 Seurat               * 5.0.1      2023-11-17 [1] CRAN (R 4.3.0)
 SeuratObject         * 5.0.1      2023-11-17 [1] CRAN (R 4.3.0)
 shiny                  1.8.0      2023-11-17 [1] CRAN (R 4.3.0)
 SingleCellExperiment * 1.24.0     2023-10-24 [1] Bioconductor
 slingshot            * 2.10.0     2023-10-24 [1] Bioconductor
 snow                   0.4-4      2021-10-27 [1] CRAN (R 4.3.0)
 sp                   * 2.1-2      2023-11-26 [1] CRAN (R 4.3.0)
 spam                   2.10-0     2023-10-23 [1] CRAN (R 4.3.0)
 SparseArray            1.2.3      2023-12-25 [1] Bioconductor 3.18 (R 4.3.2)
 sparseMatrixStats      1.14.0     2023-10-24 [1] Bioconductor
 spatstat.data          3.0-3      2023-10-24 [1] CRAN (R 4.3.0)
 spatstat.explore       3.2-5      2023-10-22 [1] CRAN (R 4.3.0)
 spatstat.geom          3.2-7      2023-10-20 [1] CRAN (R 4.3.0)
 spatstat.random        3.2-2      2023-11-29 [1] CRAN (R 4.3.0)
 spatstat.sparse        3.0-3      2023-10-24 [1] CRAN (R 4.3.0)
 spatstat.utils         3.0-4      2023-10-24 [1] CRAN (R 4.3.0)
 stringi                1.8.3      2023-12-11 [1] CRAN (R 4.3.0)
 stringr                1.5.1      2023-11-14 [1] CRAN (R 4.3.0)
 SummarizedExperiment * 1.32.0     2023-10-24 [1] Bioconductor
 survival               3.5-7      2023-08-14 [1] CRAN (R 4.3.2)
 svglite                2.1.3      2023-12-08 [1] CRAN (R 4.3.0)
 systemfonts            1.0.5      2023-10-09 [1] CRAN (R 4.3.0)
 tensor                 1.5        2012-05-05 [1] CRAN (R 4.3.0)
 tibble                 3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
 tidygraph              1.3.0      2023-12-18 [1] CRAN (R 4.3.0)
 tidyr                  1.3.1      2024-01-24 [1] CRAN (R 4.3.2)
 tidyselect             1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
 TMB                    1.9.10     2023-12-12 [1] CRAN (R 4.3.0)
 TrajectoryUtils      * 1.10.0     2023-10-24 [1] Bioconductor
 tweenr                 2.0.2      2022-09-06 [1] CRAN (R 4.3.0)
 tzdb                   0.4.0      2023-05-12 [1] CRAN (R 4.3.0)
 utf8                   1.2.4      2023-10-22 [1] CRAN (R 4.3.0)
 uwot                   0.1.16     2023-06-29 [1] CRAN (R 4.3.0)
 vctrs                  0.6.5      2023-12-01 [1] CRAN (R 4.3.0)
 vipor                  0.4.7      2023-12-18 [1] CRAN (R 4.3.0)
 viridis              * 0.6.4      2023-07-22 [1] CRAN (R 4.3.0)
 viridisLite          * 0.4.2      2023-05-02 [1] CRAN (R 4.3.0)
 webshot                0.5.5      2023-06-26 [1] CRAN (R 4.3.0)
 withr                  2.5.2      2023-10-30 [1] CRAN (R 4.3.0)
 xfun                   0.41       2023-11-01 [1] CRAN (R 4.3.0)
 xml2                   1.3.6      2023-12-04 [1] CRAN (R 4.3.0)
 xtable                 1.8-4      2019-04-21 [1] CRAN (R 4.3.0)
 XVector                0.42.0     2023-10-24 [1] Bioconductor
 yaml                   2.3.8      2023-12-11 [1] CRAN (R 4.3.0)
 zlibbioc               1.48.0     2023-10-24 [1] Bioconductor
 zoo                    1.8-12     2023-04-13 [1] CRAN (R 4.3.0)

 [1] /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library

─ Python configuration ───────────────────────────────────────────────────────
 python:         /Users/jack/Desktop/PhD/Research/Python_Envs/personal_site/bin/python
 libpython:      /usr/local/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/config-3.11-darwin/libpython3.11.dylib
 pythonhome:     /Users/jack/Desktop/PhD/Research/Python_Envs/personal_site:/Users/jack/Desktop/PhD/Research/Python_Envs/personal_site
 version:        3.11.6 (main, Nov  2 2023, 04:52:24) [Clang 14.0.3 (clang-1403.0.22.14.1)]
 numpy:          /Users/jack/Desktop/PhD/Research/Python_Envs/personal_site/lib/python3.11/site-packages/numpy
 numpy_version:  1.26.3
 leidenalg:      /Users/jack/Desktop/PhD/Research/Python_Envs/personal_site/lib/python3.11/site-packages/leidenalg
 
 NOTE: Python version was forced by use_python() function

──────────────────────────────────────────────────────────────────────────────

1 Introduction

2 Libraries

3 Helper functions

4 Simulating scRNA-seq data

5 Analysis

5.1 Preprocessing

5.2 Pseudotime estimation

5.3 Estimating trajectory dynamics

5.4 Building a GRN from scratch

5.4.1 Estimating accuracy

5.5 Building a GRN with epoch

5.5.1 Estimating accuracy

6 Conclusions

7 Session info

5.5 Building a GRN with `epoch`