Building a Trajectory Gene Regulatory Network with scRNA-seq Data

Author
Affiliation

Jack R. Leary

University of Florida

Published

April 16, 2024

1 Introduction

In this analysis we’ll be looking at cells undergoing neurogenesis in the developing human frontal cortex. The data are taken from Nowakowski et al (2017). Our goal is to build a simple method for fitting single cell gene regulatory networks (GRNs) and compare the results across pseudotime lineages, keeping in mind the trajectory structure of the data. If you want to skip straight to the analysis, head to Section 6.

2 Libraries

We start by loading in a few necessary packages & resolving some function conflicts.

Code
library(dplyr)                 # data manipulation
library(Seurat)                # scRNA-seq tools
library(scLANE)                # trajectory DE
library(igraph)                # graph utilities
library(ggplot2)               # pretty plots
library(biomaRt)               # gene annotation
library(foreach)               # parallel loops
library(ggnetwork)             # plotting graphs
library(slingshot)             # pseudotime estimation
library(patchwork)             # plot alignment
library(SingleCellExperiment)  # scRNA-seq data structures
rename <- dplyr::rename
select <- dplyr::select

3 Helper functions

We define a helper function to make our legends look prettier.

Code
guide_umap <- function(key.size = 4) {
  ggplot2::guides(color = ggplot2::guide_legend(override.aes = list(size = key.size, 
                                                                    alpha = 1, 
                                                                    stroke = 0.25)))
}

Next, we assign a few consistent color palettes that we’ll use throughout.

Code
palette_heatmap <- paletteer::paletteer_d("MetBrewer::Hiroshige", direction = -1)
palette_celltype <- paletteer::paletteer_d("ggsci::category10_d3")
palette_cluster <- paletteer::paletteer_d("ggsci::default_locuszoom")
palette_timepoint <- paletteer::paletteer_d("ggsci::default_igv")

4 Data

Unfortunately the data that are made available as part of the scRNAseq package come in a transcripts-per-million (TPM) normalized matrix - not raw counts. A rough conversion back to the raw integer counts is to multiply each cell’s normalized expression vector by the TPM size factor. This gets us most of the way there but the results are still estimates, so we round them to the nearest integer.

Note

This conversion ignores gene length, which is why the results are estimates & require rounding. For a more detailed discussion on how to un-normalize TPM counts see this StackOverflow post. While the original authors used a plate-based protocol (Fluidigm C1) they do not mention whether they used unique molecular identifiers (UMIs); if UMIs were used then gene length normalization isn’t necessary. Since simply reversing the TPM transformation doesn’t result in integer values I would guess that UMIs were not used, though their Supplementary Methods don’t mention gene length normalization.

Code
sce_cortex <- scRNAseq::NowakowskiCortexData()
counts_matrix <- Matrix::t(sce_cortex@assays@data$tpm)
scale_factors <- Matrix::rowSums(counts_matrix) / 1e6
counts_matrix <- round(Matrix::t(counts_matrix) * scale_factors)
seu_cortex <- CreateSeuratObject(counts_matrix, 
                                 assay = "RNA", 
                                 meta.data = as.data.frame(colData(sce_cortex)), 
                                 project = "cortex", 
                                 min.cells = 5,
                                 min.features = 0)
excitatory_neuron_celltypes <- c("nEN-late", "nEN-early1", "nEN-early2",
                                 "EN-V1-1", "EN-V1-2", "EN-V1-3", 
                                 "EN-PFC1", "EN-PFC2", "EN-PFC3")
seu_cortex <- seu_cortex[, seu_cortex$WGCNAcluster %in% excitatory_neuron_celltypes]
seu_cortex@meta.data <- mutate(seu_cortex@meta.data, 
                               celltype_original = case_when(WGCNAcluster == "nEN-early1" ~ "Early Exc. Neuron 1", 
                                                             WGCNAcluster == "nEN-early2" ~ "Early Exc. Neuron 2", 
                                                             WGCNAcluster == "nEN-late" ~ "Late Exc. Neuron", 
                                                             WGCNAcluster == "EN-V1-1" ~ "Exc. Neuron 1", 
                                                             WGCNAcluster == "EN-V1-2" ~ "Exc. Neuron 2", 
                                                             WGCNAcluster == "EN-V1-3" ~ "Exc. Neuron 3", 
                                                             WGCNAcluster == "EN-PFC1" ~ "PFC Exc. Neuron 1", 
                                                             WGCNAcluster == "EN-PFC2" ~ "PFC Exc. Neuron 2", 
                                                             WGCNAcluster == "EN-PFC3" ~ "PFC Exc. Neuron 3", 
                                                             TRUE ~ NA_character_))

5 Preprocessing

We run the cells through a basic preprocessing pipeline: QC, normalization, highly variable gene (HVG) identification, PCA, UMAP, and graph-based Leiden clustering.

Code
seu_cortex <- seu_cortex %>% 
              PercentageFeatureSet(pattern = "^MT-", col.name = "percent_mito") %>% 
              PercentageFeatureSet(pattern = "^RP[SL]", col.name = "percent_ribo") %>% 
              NormalizeData(verbose = FALSE) %>% 
              FindVariableFeatures(nfeatures = 3000, verbose = FALSE) %>% 
              ScaleData(verbose = FALSE) %>% 
              RunPCA(npcs = 50, 
                     approx = TRUE, 
                     seed.use = 312, 
                     verbose = FALSE) %>% 
              CellCycleScoring(s.features = cc.genes.updated.2019$s.genes, 
                               g2m.features = cc.genes.updated.2019$g2m.genes, 
                               set.ident = FALSE) %>% 
              RunUMAP(reduction = "pca", 
                      dims = 1:30,
                      n.neighbors = 20, 
                      n.components = 2, 
                      metric = "cosine", 
                      n.epochs = 750,
                      seed.use = 312, 
                      verbose = FALSE) %>% 
              FindNeighbors(reduction = "pca", 
                            k.param = 20,
                            nn.method = "annoy", 
                            annoy.metric = "cosine", 
                            verbose = FALSE) %>% 
              FindClusters(algorithm = 4, 
                           method = "igraph", 
                           resolution = 0.5, 
                           random.seed = 312, 
                           verbose = FALSE)

Here we examine some of our dataset’s attributes - the original celltype annotations, our new clusters, age in post-conception weeks (PCW), and the proportion of each cell’s reads aligning to ribosomal genes.

Code
p1 <- DimPlot(seu_cortex, 
              group.by = "celltype_original",
              pt.size = 1.25,  
              alpha = 0.75) + 
      scale_color_manual(values = palette_celltype) + 
      theme_scLANE(umap = TRUE) + 
      labs(color = "Celltype") + 
      theme(axis.title = element_blank(), 
            plot.title = element_blank()) + 
      guide_umap()
p2 <- FeaturePlot(seu_cortex, 
                  features = "Age_in_Weeks", 
                  pt.size = 1.25,  
                  alpha = 0.75) + 
      scale_color_gradientn(colors = palette_heatmap) + 
      labs(color = "Age (weeks)") + 
      theme_scLANE(umap = TRUE) + 
      theme(axis.title = element_blank(), 
            plot.title = element_blank())
p3 <- DimPlot(seu_cortex, 
              group.by = "seurat_clusters", 
              pt.size = 1.25,  
              alpha = 0.75) + 
      scale_color_manual(values = palette_cluster) + 
      labs(color = "Leiden Cluster") + 
      theme_scLANE(umap = TRUE) + 
      theme(axis.title = element_blank(), 
            plot.title = element_blank()) + 
      guide_umap()
p4 <- FeaturePlot(seu_cortex, 
                  features = "percent_ribo", 
                  pt.size = 1.25,  
                  alpha = 0.75) + 
      scale_color_gradientn(colors = palette_heatmap, labels = scales::label_percent(accuracy = 1, scale = 1)) + 
      labs(color = "Ribosomal\nReads") + 
      theme_scLANE(umap = TRUE) + 
      theme(axis.title = element_blank(), 
            plot.title = element_blank())
p5 <- (p1 / p2) | (p3 / p4)
p5 <- ggpubr::annotate_figure(ggpubr::ggarrange(p5), 
                              bottom = "UMAP 1",
                              left = "UMAP 2")
p5
Figure 1: Characteristics of the cortical development dataset.

6 Analysis

6.1 Celltype annotation

We begin by running a basic differential expression (DE) testing routine to identify some potential marker genes for each cluster. This will help us assign a celltype identity to each cluster.

Tip

As of Seurat v5, if you install the presto package using remotes::install_github("immunogenomics/presto"), the Wilcox tests performed here will be much, much faster.

Code
cluster_markers <- FindAllMarkers(seu_cortex, 
                                  assay = "RNA", 
                                  logfc.threshold = 0.25, 
                                  test.use = "wilcox", 
                                  slot = "data", 
                                  min.pct = 0.1, 
                                  only.pos = TRUE, 
                                  verbose = FALSE, 
                                  random.seed = 312)
top5_cluster_markers <- arrange(cluster_markers, p_val_adj) %>% 
                        with_groups(cluster, 
                                    slice_head, 
                                    n = 5)

We visualize the markers for each cluster with a dotplot, which shows clear differences between clusters. For example, the transcription factor (TF) aristaless related homeobox (ARX) is specifically expressed in cluster 7; this TF’s expression is known to be necessary for healthy neuronal development and helps to regulate differentiation.

Code
p6 <- DotPlot(seu_cortex, 
              features = unique(top5_cluster_markers$gene),
              assay = "RNA", 
              group.by = "seurat_clusters", 
              dot.scale = 5,
              cols = paletteer::paletteer_d("wesanderson::Zissou1")[c(1, 5)], 
              scale.by = "radius") + 
      coord_flip() +
      labs(y = "Leiden Cluster") +
      theme_scLANE() + 
      theme(axis.title.y = element_blank(), 
            axis.text.y = element_text(face = "italic"))
p6
Figure 2: Putative marker genes for each cluster.

Now we identify markers for each celltype:

Code
Idents(seu_cortex) <- "celltype_original"
celltype_markers <- FindAllMarkers(seu_cortex, 
                                   assay = "RNA", 
                                   logfc.threshold = 0.25, 
                                   test.use = "wilcox", 
                                   slot = "data", 
                                   min.pct = 0.1, 
                                   only.pos = TRUE, 
                                   verbose = FALSE, 
                                   random.seed = 312)
top3_celltype_markers <- arrange(celltype_markers, p_val_adj) %>% 
                         with_groups(cluster, 
                                     slice_head, 
                                     n = 3)

The markers seem consistent with what we’d expect for excitatory neurons; for more information see e.g., Figure 4 and Suppl. Figures 4 & 11 in the original publication.

Code
select(top3_celltype_markers, 
       celltype = cluster, 
       gene, 
       avg_log2FC, 
       p_val_adj) %>% 
  kableExtra::kbl(digits = 3, 
                  booktabs = TRUE, 
                  col.names = c("Celltype", "Gene", "Mean log2FC", "Adj. P-value")) %>% 
  kableExtra::kable_classic(full_width = FALSE, "hover")
Celltype Gene Mean log2FC Adj. P-value
Early Exc. Neuron 2 NRP1 1.124 0
Early Exc. Neuron 2 DPY19L1 1.096 0
Early Exc. Neuron 2 SOX11 0.624 0
Late Exc. Neuron ZFHX4 2.034 0
Late Exc. Neuron RP11.436D23.1 1.751 0
Late Exc. Neuron LINC00478 2.164 0
Exc. Neuron 2 DOK5 2.873 0
Exc. Neuron 2 SATB2 1.677 0
Exc. Neuron 2 STMN2 0.857 0
Exc. Neuron 3 ACTN2 4.731 0
Exc. Neuron 3 LPL 3.325 0
Exc. Neuron 3 UNC5C 6.760 0
PFC Exc. Neuron 3 RP11.98J23.2 3.008 0
PFC Exc. Neuron 3 RP11.589F5.4 3.111 0
PFC Exc. Neuron 3 RP11.1415C14.3 3.198 0
Exc. Neuron 1 CRYM 3.523 0
Exc. Neuron 1 GRIK3 2.968 0
Exc. Neuron 1 FEZF2 3.872 0
PFC Exc. Neuron 1 CDH18 3.441 0
PFC Exc. Neuron 1 SORCS1 4.858 0
PFC Exc. Neuron 1 COL19A1 9.146 0
Early Exc. Neuron 1 NDST4 3.965 0
Early Exc. Neuron 1 STK17B 3.639 0
Early Exc. Neuron 1 CNTNAP2 2.501 0
PFC Exc. Neuron 2 CPNE8 5.170 0
PFC Exc. Neuron 2 CBLN2 4.410 0
PFC Exc. Neuron 2 SLN 4.456 0
Table 1: The top-3 markers genes for each identified celltype.

6.2 Pseudotime estimation

After estimating principal curves across our UMAP embedding with slingshot, we generate per-lineage pseudotime values for each cell. We use our clustering as a source of structure in the data, and assign start and end clusters based on age in PCW. In addition, we mean-aggregate the pseudotime values for each cell in order to get a total pseudotime which represents overall differentiation regardless of lineage.

Code
sling_res <- slingshot(Embeddings(seu_cortex, "umap"), 
                       clusterLabels = seu_cortex$seurat_clusters, 
                       start.clus = c("3"), 
                       end.clus = c("5", "6", "7"), 
                       approx_points = 1e3)
sling_curves <- slingCurves(sling_res, as.df = TRUE)
sling_mst <- slingMST(sling_res, as.df = TRUE)
sling_pt <- as.data.frame(slingPseudotime(sling_res)) %>% 
            rowwise() %>% 
            mutate(PT_Overall = mean(c_across(starts_with("Lineage")), na.rm = TRUE)) %>% 
            ungroup() %>% 
            mutate(across(c(starts_with("Lineage"), PT_Overall), 
                          \(x) (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))))
seu_cortex <- AddMetaData(seu_cortex, 
                          metadata = sling_pt$PT_Overall, 
                          col.name = "PT_Overall")

The graph structure estimate by slingshot is called a minimum spanning tree (MST), and describes the relationships between clusters in an undirected manner.

Code
p7 <- data.frame(Embeddings(seu_cortex, "umap")) %>% 
      mutate(leiden = seu_cortex$seurat_clusters) %>% 
      ggplot(aes(x = umap_1, y = umap_2, color = leiden)) + 
      geom_point(size = 1.25, alpha = 0.75) + 
      geom_path(data = sling_mst, mapping = aes(x = umap_1, y = umap_2, group = Lineage), 
                linewidth = 1.25, 
                color = "black") + 
      geom_point(data = sling_mst, mapping = aes(x = umap_1, y = umap_2, fill = Cluster), 
                color = "black", 
                shape = 21, 
                size = 4.5, 
                stroke = 1.25, 
                show.legend = FALSE) + 
      scale_color_manual(values = palette_cluster) + 
      scale_fill_manual(values = palette_cluster) + 
      labs(x = "UMAP 1", y = "UMAP 2") + 
      theme_scLANE(umap = TRUE) +  
      theme(legend.title = element_blank()) + 
      guide_umap()
p7
Figure 3: UMAP embedding with MST from Slinghot overlaid.

Since our data have real-life experimental timepoints, it makes sense to normalize the pseudotime within each lineage to \([0, 1]\).

Code
p8 <- data.frame(Embeddings(seu_cortex, "umap")) %>% 
      bind_cols(sling_pt) %>% 
      tidyr::pivot_longer(starts_with("Lineage"), 
                          names_to = "lineage", 
                          values_to = "pseudotime") %>% 
      ggplot(aes(x = umap_1, y = umap_2, color = pseudotime)) + 
      facet_wrap(~lineage, nrow = 3) + 
      geom_point(size = 1.25, alpha = 0.75) + 
      labs(x = "UMAP 1", 
           y = "UMAP 2", 
           color = "Pseudotime") + 
      scale_color_gradientn(colors = palette_heatmap, labels = scales::label_number(accuracy = .01)) + 
      theme_scLANE(umap = TRUE)
p8
Figure 4: UMAP embedding colored by per-lineage pseudotime.

The principal curves show a trifurcating lineage structure.

Code
p9 <- data.frame(Embeddings(seu_cortex, "umap")) %>% 
      mutate(leiden = seu_cortex$seurat_clusters) %>% 
      ggplot(aes(x = umap_1, y = umap_2, color = leiden)) + 
      geom_point(size = 1.25, alpha = 0.75) + 
      geom_path(data = sling_curves,
                mapping = aes(x = umap_1, y = umap_2, group = Lineage), 
                color = "black", 
                linewidth = 1.5, 
                alpha = 0.75, 
                lineend = "round") + 
      scale_color_manual(values = palette_cluster) + 
      labs(x = "UMAP 1", y = "UMAP 2") + 
      theme_scLANE(umap = TRUE) + 
      theme(legend.title = element_blank()) + 
      guides(color = guide_legend(override.aes = list(size = 4, alpha = 1, stroke = 0.25)))
p9
Figure 5: UMAP embedding with principal curves from Slinghot overlaid in black.

Lastly, we plot the distribution of mean-aggregated pseudotime per-timepoint and see that later timepoints have, in general, larger values of overall pseudotime. This indicates that our estimated pseudotime is linked to real experimental time in a meaningful (though fairly noisy) way.

Code
p10 <- select(seu_cortex@meta.data, 
              Age_in_Weeks, 
              PT_Overall) %>% 
       mutate(Age_in_Weeks = as.factor(Age_in_Weeks)) %>% 
       ggplot(aes(x = Age_in_Weeks, y = PT_Overall, color = Age_in_Weeks)) + 
       ggbeeswarm::geom_quasirandom(size = 1.25, 
                                    alpha = 0.75, 
                                    show.legend = FALSE) + 
       stat_summary(geom = "point", 
                    fun = "mean", 
                    color = "black",
                    size = 3) + 
       scale_color_manual(values = palette_timepoint) + 
       labs(x = "Age (PCW)", y = "Mean Psueodtime") + 
       theme_scLANE()
p10
Figure 6: Beeswarm plot displaying the distribution of mean-aggregated pseudotime for each timepoint.

6.3 Trajectory DE testing

Using scLANE (see the GitHub repository or the preprint for details) we perform trajectory DE significance testing for each HVG across each lineage.

Code
top3k_hvg <- HVFInfo(seu_cortex) %>% 
             arrange(desc(variance.standardized)) %>% 
             slice_head(n = 3000) %>% 
             rownames(.)
cell_offset <- createCellOffset(seu_cortex)
pt_df <- select(sling_pt, -PT_Overall)
scLANE_models <- testDynamic(seu_cortex, 
                             pt = pt_df, 
                             genes = top3k_hvg,
                             size.factor.offset = cell_offset, 
                             n.cores = 6L, 
                             verbose = FALSE)
scLANE testing completed for 3000 genes across 3 lineages in 14.449 mins
Code
scLANE_de_res <- getResultsDE(scLANE_models)

We now have lineage-specific trajectory DE statistics for each gene:

Code
select(scLANE_de_res, 
       Gene, 
       Lineage, 
       Test_Stat, 
       P_Val, 
       P_Val_Adj,
       Gene_Dynamic_Lineage, 
       Gene_Dynamic_Overall) %>% 
  mutate(Gene_Dynamic_Lineage = if_else(Gene_Dynamic_Lineage == 1, "Dynamic", "Static"), 
         Gene_Dynamic_Overall = if_else(Gene_Dynamic_Overall == 1, "Dynamic", "Static")) %>% 
  slice_head(n = 10) %>% 
  kableExtra::kbl(digits = 4, 
                  booktabs = TRUE, 
                  col.names = c("Gene", "Lineage", "LRT stat.", "P-value", "Adj. p-value", "Gene status (lineage)", "Gene status (overall)")) %>% 
  kableExtra::kable_classic(full_width = FALSE, "hover")
Gene Lineage LRT stat. P-value Adj. p-value Gene status (lineage) Gene status (overall)
LMO3 B 4066855 0 0 Dynamic Dynamic
LMO3 A 4040903 0 0 Dynamic Dynamic
MEF2C B 3693517 0 0 Dynamic Dynamic
LMO3 C 3514314 0 0 Dynamic Dynamic
LIMCH1 A 3230531 0 0 Dynamic Dynamic
LIMCH1 B 3156160 0 0 Dynamic Dynamic
LINC01102 B 2912758 0 0 Dynamic Dynamic
LINC01102 A 2887690 0 0 Dynamic Dynamic
SHISA2 B 2121615 0 0 Dynamic Dynamic
LINC01102 C 2077441 0 0 Dynamic Dynamic
Table 2: The top-10 most DE genes from scLANE.

We identify a set of dynamic genes - dynamic meaning DE over any of the three lineages - to be used for downstream analysis.

Code
dyn_genes <- filter(scLANE_de_res, Gene_Dynamic_Overall == 1) %>% 
             distinct(Gene) %>% 
             pull(Gene)

Lastly, we pull a table of gene dynamics for each lineage.

Code
smoothed_counts <- smoothedCountsMatrix(scLANE_models, 
                                        size.factor.offset = cell_offset, 
                                        pt = pt_df, 
                                        genes = dyn_genes, 
                                        log1p.norm = TRUE, 
                                        n.cores = 4L)

6.4 Constructing a GRN from scratch

Building a GRN requires two main inputs - a set of transcription factors, and a set of potential target genes. We load a database of TFs from Lambert et al (2018).

Code
hs_ensembl <- useMart("ensembl",
                      dataset = "hsapiens_gene_ensembl", 
                      host = "https://useast.ensembl.org")
hs_tf_raw <- readr::read_csv("http://humantfs.ccbr.utoronto.ca/download/v_1.01/DatabaseExtract_v_1.01.csv",
                             col_select = -1,
                             num_threads = 2L,
                             show_col_types = FALSE) %>%
             janitor::clean_names() %>%
             filter(is_tf == "Yes") %>%
             select(ensembl_id)
hs_tfs <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol", "entrezgene_id", "description", "gene_biotype"),
                filters = "ensembl_gene_id",
                values = hs_tf_raw$ensembl_id,
                mart = hs_ensembl,
                uniqueRows = TRUE) %>%
          rename(ensembl_id = ensembl_gene_id,
                 entrez_id = entrezgene_id) %>%
          arrange(ensembl_id) %>%
          mutate(hgnc_symbol = if_else(hgnc_symbol == "", NA_character_, hgnc_symbol),
                 description = gsub("\\[Source.*", "", description))

Along with the HGNC symbol, we used BiomaRt to retrieve the Ensembl & Entrez IDs and a description for every TF.

Code
slice_sample(hs_tfs, n = 10) %>% 
  kableExtra::kbl(booktabs = TRUE, 
                  col.names = c("Ensembl ID", "HGNC symbol", "Entrez ID", "Description", "Biotype")) %>% 
  kableExtra::kable_classic(full_width = FALSE, "hover")
Ensembl ID HGNC symbol Entrez ID Description Biotype
ENSG00000173041 ZNF680 340252 zinc finger protein 680 protein_coding
ENSG00000149922 TBX6 6911 T-box transcription factor 6 protein_coding
ENSG00000085274 MYNN 55892 myoneurin protein_coding
ENSG00000169740 ZNF32 7580 zinc finger protein 32 protein_coding
ENSG00000214534 ZNF705EP NA zinc finger protein 705E, pseudogene transcribed_unprocessed_pseudogene
ENSG00000129071 MBD4 8930 methyl-CpG binding domain 4, DNA glycosylase protein_coding
ENSG00000114853 ZBTB47 92999 zinc finger and BTB domain containing 47 protein_coding
ENSG00000111206 FOXM1 2305 forkhead box M1 protein_coding
ENSG00000184436 THAP7 80764 THAP domain containing 7 protein_coding
ENSG00000196172 ZNF681 148213 zinc finger protein 681 protein_coding
Table 3: Human transcription factors.

We identify a subset of all the TFs that are dynamic across our trajectory. Since TFs can regulate the expression of other TFs as they do non-TF target genes, we will include TFs in the set of “target genes” that we model.

Code
dynamic_tfs <- hs_tfs$hgnc_symbol[hs_tfs$hgnc_symbol %in% dyn_genes]

The GRN has the following structure - for every target gene, a penalized model is fit with the dynamics of every possible TF as features (see below). We use the Gamma distribution with log link as our regression objective as the dynamics are continuous and strictly non-negative. The model’s internal regularization identifies a subset of TFs whose dynamics are predictive of the target gene’s dynamics. See e.g., the original XGBoost paper for details.

\[ \text{Dynamics}_{\text{Target}} = \begin{bmatrix} \text{Dynamics}_{\text{TF}_1} & \text{Dynamics}_{\text{TF}_2} & \dots & \text{Dynamics}_{\text{TF}_N} \end{bmatrix} \tag{1}\]

Our GRNs will be lineage-specific, since the matrix of gene dynamics differs by lineage. First we fit the GRN for lineage A; I’ve opted to use the LightGBM package to fit the models as it’s lightweight and in general much faster than other options such as XGBoost. In addition, I prefer this method to alternative penalized methods such as LASSO - boosted trees better capture nonlinearities in the data, though LASSO does offer advantages such as significance testing for coefficients and linear interpretability. Both methods have built-in cross-validation (CV) routines (lightgbm::lgb.cv() and glmnet::cv.glmnet(), respectively); here we use 5-fold CV to select the best set of hyperparameters before training the final model. Lastly, instead of the default booster we utilize a DART booster, which purports to improve on the overspecialization problem suffered by the default booster - see the DART preprint for details.

Note

Since we’re using the foreach and doSNOW packages to loop over the set of target genes in parallel, we need to ensure that our LightGBM model only uses a single thread when fitting the model. This is done by setting the parameters tree_learner = "serial" and num_threads = 1L in the model parameters list.

Code
cl <- parallel::makeCluster(6L)
doSNOW::registerDoSNOW(cl)
lgbm_params <- list(objective = "gamma", 
                    tree_learner = "serial", 
                    metric = "l2", 
                    boosting = "dart", 
                    num_threads = 1L, 
                    device_type = "cpu",
                    seed = 312)
grn_res_linA <- foreach(g = seq(dyn_genes), 
                        .combine = "list",
                        .multicombine = ifelse(length(dyn_genes) > 1, TRUE, FALSE),
                        .maxcombine = ifelse(length(dyn_genes) > 1, length(dyn_genes), 2),
                        .packages = c("lightgbm", "dplyr"),
                        .errorhandling = "pass",
                        .inorder = TRUE,
                        .verbose = FALSE) %dopar% {
                          feature_mat <- smoothed_counts$Lineage_A[, dynamic_tfs]
                          feature_mat <- feature_mat[, colnames(feature_mat) != dyn_genes[g]]  # remove TF as a predictor of itself
                          resp_var <- smoothed_counts$Lineage_A[, dyn_genes[g]]
                          lgbm_data <- lightgbm::lgb.Dataset(data = feature_mat, label = resp_var)
                          lgbm_cv <- lightgbm::lgb.cv(params = lgbm_params, 
                                                      data = lgbm_data, 
                                                      nrounds = 100L, 
                                                      nfold = 5L, 
                                                      stratified = FALSE)
                          lgbm_model <- lightgbm::lgb.train(params = lgbm_params, 
                                                            data = lgbm_data, 
                                                            nrounds = lgbm_cv$best_iter)
                          imp_table <- as.data.frame(lightgbm::lgb.importance(lgbm_model)) %>% 
                                       dplyr::mutate(Lineage = "A", 
                                                     Target_Gene = dyn_genes[g], 
                                                     Target_Gene_Type = if_else(Target_Gene %in% dynamic_tfs, "TF", "Target Gene"), 
                                                     .before = 1)
                          imp_table
                        }
names(grn_res_linA) <- dyn_genes
grn_table_linA <- purrr::imap(grn_res_linA, \(x, y) {
  if (!inherits(x, "data.frame")) {
    empty_res <- data.frame(Lineage = "A", 
                            Target_Gene = y, 
                            Target_Gene_Type = NA_character_, 
                            Feature = NA_character_, 
                            Gain = NA_real_, 
                            Cover = NA_real_, 
                            Frequency = NA_real_)
    return(empty_res)
  } else {
    return(x)
  }
})

We coerce the model output into a tidy table. The two importance metrics we’re interested in are gain (the improvement in L2 loss of a certain TF) and frequency (the proportion of times a TF was included in the model). For more information see this StackExchange post or the XGBoost docs. As such, we calculate the rank of each target gene for gain and frequency, then mean-aggregate the two ranks in order to produce an average rank of how relatively strong the relationship between each TF & target gene is.

Code
grn_table_linA <- purrr::reduce(grn_table_linA, rbind) %>% 
                  select(-Cover) %>% 
                  rename(Tx_Factor = Feature) %>% 
                  filter(!is.na(Target_Gene), 
                         !is.na(Tx_Factor)) %>% 
                  arrange(Tx_Factor, desc(Frequency)) %>% 
                  with_groups(Tx_Factor, 
                              mutate, 
                              Frequency_Rank = row_number()) %>% 
                  arrange(Tx_Factor, desc(Gain)) %>% 
                  with_groups(Tx_Factor, 
                              mutate, 
                              Gain_Rank = row_number()) %>% 
                  rowwise() %>% 
                  mutate(Mean_Rank = mean(c_across(c(Frequency_Rank, Gain_Rank)))) %>% 
                  ungroup() %>% 
                  arrange(Tx_Factor, Mean_Rank) %>% 
                  mutate(Mean_Rank_Weight = 1 / Mean_Rank)

Next we add the Spearman correlation and mutual information (MI) between TF and target gene dynamics, as well as the normalized gene expression correlation and MI. We denote the relationship between TF and target gene as repression if the correlation between their dynamics is negative, and activation if the correlation is positive.

Warning

Usually the correlations between TF and target dynamics and TF and target expression are similar, or at least have the same directionality (positive or negative). However, when expression patterns are noisy, the correlation directions might differ (see Table 4 for examples). In this case, we’re opting to trust the dynamics more, as they are a smoothed version of expression - though an argument could be made for the opposite stance as well.

Code
dyn_cormat <- cor(smoothed_counts$Lineage_A, method = "spearman")
dyn_mimat <- minet::build.mim(smoothed_counts$Lineage_A, 
                              estimator = "mi.empirical", 
                              disc = "equalwidth")
exp_cormat <- cor(as.matrix(Matrix::t(seu_cortex@assays$RNA$data[dyn_genes, which(!is.na(sling_pt$Lineage1))])), method = "spearman")
exp_mimat <- minet::build.mim(as.matrix(Matrix::t(seu_cortex@assays$RNA$data[dyn_genes, which(!is.na(sling_pt$Lineage1))])),  
                              estimator = "mi.empirical", 
                              disc = "equalwidth")
grn_table_linA <- mutate(grn_table_linA, 
                         Dyn_Cor = NA_real_, 
                         Exp_Cor = NA_real_, 
                         Dyn_MI = NA_real_, 
                         Exp_MI = NA_real_)
for (i in seq(nrow(grn_table_linA))) {
  grn_table_linA$Dyn_Cor[i] <- dyn_cormat[grn_table_linA$Target_Gene[i], grn_table_linA$Tx_Factor[i]]
  grn_table_linA$Dyn_MI[i] <- dyn_mimat[grn_table_linA$Target_Gene[i], grn_table_linA$Tx_Factor[i]]
  grn_table_linA$Exp_Cor[i] <- exp_cormat[grn_table_linA$Target_Gene[i], grn_table_linA$Tx_Factor[i]]
  grn_table_linA$Exp_MI[i] <- exp_mimat[grn_table_linA$Target_Gene[i], grn_table_linA$Tx_Factor[i]]
}
grn_table_linA <- mutate(grn_table_linA, State = if_else(Dyn_Cor < 0, "Repression", "Activation"))

The output is shown below. Interestingly, the mutual information seems to be generally higher for the dynamics than for expression - this indicates that our usage of the dynamics as a de-noising technique for the GRN might be working well.

Code
select(grn_table_linA, 
       Lineage, 
       Tx_Factor, 
       Target_Gene, 
       Gain, 
       Frequency, 
       Mean_Rank, 
       Dyn_Cor, 
       Exp_Cor, 
       Dyn_MI, 
       Exp_MI,
       State) %>% 
  slice_sample(n = 10) %>% 
  kableExtra::kbl(digits = 3, 
                  booktabs = TRUE, 
                  col.names = c("Lineage", "TF", "Target", "Gain", "Frequency", "Mean rank", 
                                "Dynamics cor.", "Expression cor.", "Dynamics MI", "Expression MI", "State")) %>% 
  kableExtra::kable_classic(full_width = FALSE, "hover")
Lineage TF Target Gain Frequency Mean rank Dynamics cor. Expression cor. Dynamics MI Expression MI State
A SIX4 RP11.9E17.1 0.011 0.161 106.5 -0.277 -0.011 0.649 0.000 Repression
A CREB5 CACNG3 0.000 0.028 116.0 -0.425 -0.064 0.515 0.004 Repression
A DLX5 CDYL2 0.200 0.192 13.5 -0.962 -0.015 0.413 0.000 Repression
A ZNF484 PRSS12 0.001 0.066 58.0 0.556 0.018 1.293 0.078 Activation
A CBX2 MMS22L 0.011 0.021 72.0 0.816 0.013 2.088 0.033 Activation
A EGR1 KIAA1524 0.000 0.018 192.5 -0.098 -0.049 1.409 0.060 Repression
A SATB1 ABLIM1 0.000 0.003 144.0 -0.148 0.038 2.047 0.110 Repression
A ZNF165 HNRNPA1L2 0.000 0.002 114.0 -0.011 -0.047 0.551 0.002 Repression
A EGR1 CNTNAP4 0.005 0.013 156.0 -0.357 -0.082 0.681 0.021 Repression
A ARID5B SLC1A2 0.395 0.025 49.5 0.654 0.069 1.725 0.073 Activation
Table 4: GRN for dynamic genes across Lineage A.

We repeat the fitting process for the other two lineages (expand this code block to see the details). Skip to Section 7 if you want a version of this logic that’s been wrapped into a function!

Code
grn_res_linB <- foreach(g = seq(dyn_genes), 
                        .combine = "list",
                        .multicombine = ifelse(length(dyn_genes) > 1, TRUE, FALSE),
                        .maxcombine = ifelse(length(dyn_genes) > 1, length(dyn_genes), 2),
                        .packages = c("lightgbm", "dplyr"),
                        .errorhandling = "pass",
                        .inorder = TRUE,
                        .verbose = FALSE) %dopar% {
                          feature_mat <- smoothed_counts$Lineage_B[, dynamic_tfs]
                          feature_mat <- feature_mat[, colnames(feature_mat) != dyn_genes[g]]  # remove TF as a predictor of itself
                          resp_var <- smoothed_counts$Lineage_B[, dyn_genes[g]]
                          lgbm_data <- lightgbm::lgb.Dataset(data = feature_mat, label = resp_var)
                          lgbm_cv <- lightgbm::lgb.cv(params = lgbm_params, 
                                                      data = lgbm_data, 
                                                      nrounds = 100L, 
                                                      nfold = 5L, 
                                                      stratified = FALSE)
                          lgbm_model <- lightgbm::lgb.train(params = lgbm_params, 
                                                            data = lgbm_data, 
                                                            nrounds = lgbm_cv$best_iter)
                          imp_table <- as.data.frame(lightgbm::lgb.importance(lgbm_model)) %>% 
                                       dplyr::mutate(Lineage = "B", 
                                                     Target_Gene = dyn_genes[g], 
                                                     Target_Gene_Type = if_else(Target_Gene %in% dynamic_tfs, "TF", "Target Gene"),
                                                     .before = 1)
                          imp_table
                        }
names(grn_res_linB) <- dyn_genes
grn_table_linB <- purrr::imap(grn_res_linB, \(x, y) {
  if (!inherits(x, "data.frame")) {
    empty_res <- data.frame(Lineage = "B", 
                            Target_Gene = y, 
                            Target_Gene_Type = NA_character_, 
                            Feature = NA_character_, 
                            Gain = NA_real_, 
                            Cover = NA_real_, 
                            Frequency = NA_real_)
    return(empty_res)
  } else {
    return(x)
  }
})
grn_table_linB <- purrr::reduce(grn_table_linB, rbind) %>% 
                  select(-Cover) %>% 
                  rename(Tx_Factor = Feature) %>% 
                  filter(!is.na(Target_Gene), 
                         !is.na(Tx_Factor)) %>% 
                  arrange(Tx_Factor, desc(Frequency)) %>% 
                  with_groups(Tx_Factor, 
                              mutate, 
                              Frequency_Rank = row_number()) %>% 
                  arrange(Tx_Factor, desc(Gain)) %>% 
                  with_groups(Tx_Factor, 
                              mutate, 
                              Gain_Rank = row_number()) %>% 
                  rowwise() %>% 
                  mutate(Mean_Rank = mean(c_across(c(Frequency_Rank, Gain_Rank)))) %>% 
                  ungroup() %>% 
                  arrange(Tx_Factor, Mean_Rank) %>% 
                  mutate(Mean_Rank_Weight = 1 / Mean_Rank)
dyn_cormat <- cor(smoothed_counts$Lineage_B, method = "spearman")
dyn_mimat <- minet::build.mim(smoothed_counts$Lineage_B, 
                              estimator = "mi.empirical", 
                              disc = "equalwidth")
exp_cormat <- cor(as.matrix(Matrix::t(seu_cortex@assays$RNA$data[dyn_genes, which(!is.na(sling_pt$Lineage2))])), method = "spearman")
exp_mimat <- minet::build.mim(as.matrix(Matrix::t(seu_cortex@assays$RNA$data[dyn_genes, which(!is.na(sling_pt$Lineage2))])),  
                              estimator = "mi.empirical", 
                              disc = "equalwidth")
grn_table_linB <- mutate(grn_table_linB, 
                         Dyn_Cor = NA_real_, 
                         Exp_Cor = NA_real_, 
                         Dyn_MI = NA_real_, 
                         Exp_MI = NA_real_)
for (i in seq(nrow(grn_table_linB))) {
  grn_table_linB$Dyn_Cor[i] <- dyn_cormat[grn_table_linB$Target_Gene[i], grn_table_linB$Tx_Factor[i]]
  grn_table_linB$Dyn_MI[i] <- dyn_mimat[grn_table_linB$Target_Gene[i], grn_table_linB$Tx_Factor[i]]
  grn_table_linB$Exp_Cor[i] <- exp_cormat[grn_table_linB$Target_Gene[i], grn_table_linB$Tx_Factor[i]]
  grn_table_linB$Exp_MI[i] <- exp_mimat[grn_table_linB$Target_Gene[i], grn_table_linB$Tx_Factor[i]]
}
grn_table_linB <- mutate(grn_table_linB, State = if_else(Dyn_Cor < 0, "Repression", "Activation"))
grn_res_linC <- foreach(g = seq(dyn_genes), 
                        .combine = "list",
                        .multicombine = ifelse(length(dyn_genes) > 1, TRUE, FALSE),
                        .maxcombine = ifelse(length(dyn_genes) > 1, length(dyn_genes), 2),
                        .packages = c("lightgbm", "dplyr"),
                        .errorhandling = "pass",
                        .inorder = TRUE,
                        .verbose = FALSE) %dopar% {
                          feature_mat <- smoothed_counts$Lineage_C[, dynamic_tfs]
                          feature_mat <- feature_mat[, colnames(feature_mat) != dyn_genes[g]]  # remove TF as a predictor of itself
                          resp_var <- smoothed_counts$Lineage_C[, dyn_genes[g]]
                          lgbm_data <- lightgbm::lgb.Dataset(data = feature_mat, label = resp_var)
                          lgbm_cv <- lightgbm::lgb.cv(params = lgbm_params, 
                                                      data = lgbm_data, 
                                                      nrounds = 100L, 
                                                      nfold = 5L, 
                                                      stratified = FALSE)
                          lgbm_model <- lightgbm::lgb.train(params = lgbm_params, 
                                                            data = lgbm_data, 
                                                            nrounds = lgbm_cv$best_iter)
                          imp_table <- as.data.frame(lightgbm::lgb.importance(lgbm_model)) %>% 
                                       dplyr::mutate(Lineage = "C", 
                                                     Target_Gene = dyn_genes[g], 
                                                     Target_Gene_Type = if_else(Target_Gene %in% dynamic_tfs, "TF", "Target Gene"),
                                                     .before = 1)
                          imp_table
                        }
names(grn_res_linC) <- dyn_genes
grn_table_linC <- purrr::imap(grn_res_linC, \(x, y) {
  if (!inherits(x, "data.frame")) {
    empty_res <- data.frame(Lineage = "C", 
                            Target_Gene = y, 
                            Target_Gene_Type = NA_character_, 
                            Feature = NA_character_, 
                            Gain = NA_real_, 
                            Cover = NA_real_, 
                            Frequency = NA_real_)
    return(empty_res)
  } else {
    return(x)
  }
})
grn_table_linC <- purrr::reduce(grn_table_linC, rbind) %>% 
                  select(-Cover) %>% 
                  rename(Tx_Factor = Feature) %>% 
                  filter(!is.na(Target_Gene), 
                         !is.na(Tx_Factor)) %>% 
                  arrange(Tx_Factor, desc(Frequency)) %>% 
                  with_groups(Tx_Factor, 
                              mutate, 
                              Frequency_Rank = row_number()) %>% 
                  arrange(Tx_Factor, desc(Gain)) %>% 
                  with_groups(Tx_Factor, 
                              mutate, 
                              Gain_Rank = row_number()) %>% 
                  rowwise() %>% 
                  mutate(Mean_Rank = mean(c_across(c(Frequency_Rank, Gain_Rank)))) %>% 
                  ungroup() %>% 
                  arrange(Tx_Factor, Mean_Rank) %>% 
                  mutate(Mean_Rank_Weight = 1 / Mean_Rank)
dyn_cormat <- cor(smoothed_counts$Lineage_C, method = "spearman")
dyn_mimat <- minet::build.mim(smoothed_counts$Lineage_C, 
                              estimator = "mi.empirical", 
                              disc = "equalwidth")
exp_cormat <- cor(as.matrix(Matrix::t(seu_cortex@assays$RNA$data[dyn_genes, which(!is.na(sling_pt$Lineage3))])), method = "spearman")
exp_mimat <- minet::build.mim(as.matrix(Matrix::t(seu_cortex@assays$RNA$data[dyn_genes, which(!is.na(sling_pt$Lineage3))])),  
                              estimator = "mi.empirical", 
                              disc = "equalwidth")
grn_table_linC <- mutate(grn_table_linC, 
                         Dyn_Cor = NA_real_, 
                         Exp_Cor = NA_real_, 
                         Dyn_MI = NA_real_, 
                         Exp_MI = NA_real_)
for (i in seq(nrow(grn_table_linC))) {
  grn_table_linC$Dyn_Cor[i] <- dyn_cormat[grn_table_linC$Target_Gene[i], grn_table_linC$Tx_Factor[i]]
  grn_table_linC$Dyn_MI[i] <- dyn_mimat[grn_table_linC$Target_Gene[i], grn_table_linC$Tx_Factor[i]]
  grn_table_linC$Exp_Cor[i] <- exp_cormat[grn_table_linC$Target_Gene[i], grn_table_linC$Tx_Factor[i]]
  grn_table_linC$Exp_MI[i] <- exp_mimat[grn_table_linC$Target_Gene[i], grn_table_linC$Tx_Factor[i]]
}
grn_table_linC <- mutate(grn_table_linC, State = if_else(Dyn_Cor < 0, "Repression", "Activation"))
parallel::stopCluster(cl)

We collate all three GRNs into a single overall table:

Code
grn_table_all <- bind_rows(grn_table_linA, 
                           grn_table_linB, 
                           grn_table_linC) %>% 
                 arrange(Lineage, 
                         Tx_Factor, 
                         Mean_Rank)

For each lineage we create a directed, weighted graph of the relationships between TFs and target genes using the igraph package. In the absence of a better, more sophisticated heuristic, we prune the graphs by selecting just the top 5 TFs per target gene. We use the inverse of the mean feature importance rank as edge weights.

Code
graph_df_linA <- select(grn_table_linA, 
                        Tx_Factor, 
                        Target_Gene,
                        Target_Gene_Type, 
                        Mean_Rank, 
                        Mean_Rank_Weight, 
                        Frequency, 
                        Gain, 
                        Dyn_MI, 
                        State) %>% 
                 arrange(Target_Gene, Mean_Rank) %>% 
                 with_groups(Target_Gene, 
                             slice_head, 
                             n = 5)
graph_obj_linA <- graph_from_data_frame(graph_df_linA, directed = TRUE)
E(graph_obj_linA)$weight <- graph_df_linA$Mean_Rank_Weight
graph_df_linB <- select(grn_table_linB, 
                        Tx_Factor, 
                        Target_Gene,
                        Target_Gene_Type, 
                        Mean_Rank, 
                        Mean_Rank_Weight,
                        Frequency, 
                        Gain, 
                        Dyn_MI, 
                        State) %>% 
                 arrange(Target_Gene, Mean_Rank) %>% 
                 with_groups(Target_Gene, 
                             slice_head, 
                             n = 5)
graph_obj_linB <- graph_from_data_frame(graph_df_linB, directed = TRUE)
E(graph_obj_linB)$weight <- graph_df_linB$Mean_Rank_Weight
graph_df_linC <- select(grn_table_linC, 
                        Tx_Factor, 
                        Target_Gene,
                        Target_Gene_Type, 
                        Mean_Rank, 
                        Mean_Rank_Weight,
                        Frequency, 
                        Gain, 
                        Dyn_MI, 
                        State) %>% 
                 arrange(Target_Gene, Mean_Rank) %>% 
                 with_groups(Target_Gene, 
                             slice_head, 
                             n = 5)
graph_obj_linC <- graph_from_data_frame(graph_df_linC, directed = TRUE)
E(graph_obj_linC)$weight <- graph_df_linC$Mean_Rank_Weight

One target gene of particular interest is LIM domain only 3 (LMO3). This gene is predominantly expressed in the brain, is involved in neurogenesis (source), and potentially acts as an oncogene during neuroblastoma (source) - so a comparison of its behavior across lineages should be pretty neat. We use a Fruchterman-Reingold layout to embed the first-order neighborhood graphs in 2D space via the layout_with_fr() function from igraph. We see that not only do the top target genes differ widely by lineage, so too do the directions of the relationships (activation vs. repression), and the strengths of those relationships. For example, we can see that in lineages A & C that LMO3 is repressed by EOMES and activated by MEF2C - though the MEF2C activation is stronger in lineage C. Conversely, in lineage B LMO3 is activated by both NFIA and NFIB, though neither TF appears to act on LMO3 in either of the other two lineages.

Code
set.seed(12)
ranks <- purrr::map(list(graph_df_linA, graph_df_linB, graph_df_linC), \(x) {
  filter(x, Target_Gene == "LMO3") %>% 
  pull(Mean_Rank_Weight)
})
rank_min <- min(purrr::reduce(ranks, c))
rank_max <- max(purrr::reduce(ranks, c))
LMO3_node <- which(names(V(graph_obj_linA)) == "LMO3")
p11 <- fortify(make_ego_graph(graph_obj_linA, nodes = LMO3_node, order = 1)[[1]], 
               layout = layout_with_fr(graph = make_ego_graph(graph_obj_linA, nodes = LMO3_node, order = 1)[[1]], 
                                       grid = "nogrid", 
                                       weights = E(make_ego_graph(graph_obj_linA, nodes = LMO3_node, order = 1)[[1]])$weights),
               arrow.gap = .05) %>% 
       mutate(Gene_Type = if_else(name %in% hs_tfs$hgnc_symbol, "TF", "Target Gene")) %>% 
       ggplot(aes(x = x, y = y, xend = xend, yend = yend)) + 
       facet_wrap(~"Lineage A") + 
       geom_edges(aes(linewidth = Mean_Rank_Weight, alpha = Mean_Rank_Weight, color = State), 
                  arrow = arrow(length = unit(8, "pt"), type = "closed")) + 
       geom_nodelabel(aes(label = name, fill = Gene_Type), 
                      size = 5) + 
       scale_color_manual(values = c("forestgreen", "firebrick")) + 
       scale_fill_manual(values = c("darkorange2", "dodgerblue3")) + 
       scale_linewidth_continuous(range = c(0.5, 2), limits = c(rank_min, rank_max)) + 
       scale_alpha_continuous(range = c(0.75, 1), limits = c(rank_min, rank_max)) + 
       labs(x = "FR 1", 
            y = "FR 2", 
            color = "TF State", 
            fill = "Gene Type", 
            linewidth = "Rank Weight", 
            alpha = "Rank Weight") + 
       theme_scLANE(umap = TRUE) + 
       guides(fill = guide_legend(override.aes = list(label = ""), order = 1), 
              color = guide_legend(order = 2))
LMO3_node <- which(names(V(graph_obj_linB)) == "LMO3")
p12 <- fortify(make_ego_graph(graph_obj_linB, nodes = LMO3_node, order = 1)[[1]], 
               layout = layout_with_fr(graph = make_ego_graph(graph_obj_linB, nodes = LMO3_node, order = 1)[[1]], 
                                       grid = "nogrid", 
                                       weights = E(make_ego_graph(graph_obj_linB, nodes = LMO3_node, order = 1)[[1]])$weights),
               arrow.gap = .05) %>% 
       mutate(Gene_Type = if_else(name %in% hs_tfs$hgnc_symbol, "TF", "Target Gene")) %>% 
       ggplot(aes(x = x, y = y, xend = xend, yend = yend)) + 
       facet_wrap(~"Lineage B") + 
       geom_edges(aes(linewidth = Mean_Rank_Weight, alpha = Mean_Rank_Weight, color = State), 
                  arrow = arrow(length = unit(8, "pt"), type = "closed")) + 
       geom_nodelabel(aes(label = name, fill = Gene_Type), 
                      size = 5) + 
       scale_color_manual(values = c("forestgreen", "firebrick")) + 
       scale_fill_manual(values = c("darkorange2", "dodgerblue3")) + 
       scale_linewidth_continuous(range = c(0.5, 2), limits = c(rank_min, rank_max)) + 
       scale_alpha_continuous(range = c(0.75, 1), limits = c(rank_min, rank_max)) + 
       labs(x = "FR 1", 
            y = "FR 2", 
            color = "TF State", 
            fill = "Gene Type", 
            linewidth = "Rank Weight", 
            alpha = "Rank Weight") + 
       theme_scLANE(umap = TRUE) + 
       guides(fill = guide_legend(override.aes = list(label = ""), order = 1), 
              color = guide_legend(order = 2))
LMO3_node <- which(names(V(graph_obj_linC)) == "LMO3")
p13 <- fortify(make_ego_graph(graph_obj_linC, nodes = LMO3_node, order = 1)[[1]], 
               layout = layout_with_fr(graph = make_ego_graph(graph_obj_linC, nodes = LMO3_node, order = 1)[[1]], 
                                       grid = "nogrid", 
                                       weights = E(make_ego_graph(graph_obj_linC, nodes = LMO3_node, order = 1)[[1]])$weights),
               arrow.gap = .05) %>% 
       mutate(Gene_Type = if_else(name %in% hs_tfs$hgnc_symbol, "TF", "Target Gene")) %>% 
       ggplot(aes(x = x, y = y, xend = xend, yend = yend)) + 
       facet_wrap(~"Lineage C") + 
       geom_edges(aes(linewidth = Mean_Rank_Weight, alpha = Mean_Rank_Weight, color = State), 
                  arrow = arrow(length = unit(8, "pt"), type = "closed")) + 
       geom_nodelabel(aes(label = name, fill = Gene_Type), 
                      size = 5) + 
       scale_color_manual(values = c("forestgreen", "firebrick")) + 
       scale_fill_manual(values = c("darkorange2", "dodgerblue3")) + 
       scale_linewidth_continuous(range = c(0.5, 2), limits = c(rank_min, rank_max)) + 
       scale_alpha_continuous(range = c(0.75, 1), limits = c(rank_min, rank_max)) + 
       labs(x = "FR 1", 
            y = "FR 2", 
            color = "TF State", 
            fill = "Gene Type", 
            linewidth = "Rank Weight", 
            alpha = "Rank Weight") + 
       theme_scLANE(umap = TRUE) + 
       guides(fill = guide_legend(override.aes = list(label = ""), order = 1), 
              color = guide_legend(order = 2))
p14 <- (p11 / p12 / p13) + 
       plot_layout(guides = "collect", axes = "collect")
p14
Figure 7: Fruchterman-Reingold embeddings of the first-order relationships between LMO3 and lineage-specific TFs. Lines are weighted & shaded by the weighted ranking score between TF & target gene dynamics that we computed earlier.

The make_ego_graph() function allows us to pull out the first-order interactions for a given transcription factor; here we display those (admittedly complex) relationships for eomesodermin (EOMES). This highlights the multi-faceted roles of TFs during development, and the complexity of GRNs in general.

Code
EOMES_node <- which(names(V(graph_obj_linA)) == "EOMES")
set.seed(5)
p15 <- fortify(make_ego_graph(graph_obj_linA, nodes = EOMES_node, order = 1)[[1]], 
               layout = layout_with_fr(graph = make_ego_graph(graph_obj_linA, nodes = EOMES_node, order = 1)[[1]], 
                                       grid = "nogrid", 
                                       weights = E(make_ego_graph(graph_obj_linA, nodes = EOMES_node, order = 1)[[1]])$weights^3),
               arrow.gap = .05) %>% 
       mutate(Gene_Type = if_else(name %in% hs_tfs$hgnc_symbol, "TF", "Target Gene")) %>% 
       ggplot(aes(x = x, y = y, xend = xend, yend = yend)) + 
       facet_wrap(~"Lineage A") + 
       geom_edges(aes(linewidth = Mean_Rank_Weight, alpha = Mean_Rank_Weight, color = State), 
                 arrow = arrow(length = unit(8, "pt"), type = "closed")) + 
       geom_nodelabel(aes(label = name, fill = Gene_Type), 
                     size = 5) + 
       scale_color_manual(values = c("forestgreen", "firebrick")) + 
       scale_fill_manual(values = c("darkorange2", "dodgerblue3")) + 
       scale_linewidth_continuous(range = c(0.5, 2)) + 
       scale_alpha_continuous(range = c(0.75, 1)) + 
       labs(x = "FR 1", 
           y = "FR 2", 
           color = "TF State", 
           fill = "Gene Type", 
           linewidth = "Rank Weight", 
           alpha = "Rank Weight") + 
       theme_scLANE(umap = TRUE) + 
       guides(fill = guide_legend(override.aes = list(label = ""), order = 1), 
             color = guide_legend(order = 2))
p15
Figure 8: Fruchterman-Reingold embedding of the first-order relationships for EOMES. Lines are weighted and shaded as in Figure 7.

7 Conclusions

Hopefully this analysis has demonstrated the utility of trajectory GRNs for comparing gene dynamics across lineages. Beyond just visual comparison of how gene expression changes over pseudotime, trajectory GRNs allow us to identify which transcription factors are activating or suppressing target genes, how top TF targets differ between lineages, and the ways in which TF-target relationships change based on the cell fate towards which cells are progressing. To conclude, I’ve written up the methodology for creating a trajectory GRN into a function, which is shown in the code block below. It includes functionality for a progress bar if desired, and takes as input matrices of gene dynamics (dyn.mat) and normalized expression (expr.mat), a set of dynamic genes (dyn.genes), and a set of transcription factors (tx.factors). Eventually I plan to expand this functionality and turn it into an R package, but for now it’ll stay a simple function.

Code
estimateTGRN <- function(dyn.mat = NULL,
                         expr.mat = NULL, 
                         dyn.genes = NULL, 
                         tx.factors = NULL, 
                         cor.method = "spearman",
                         n.cores = 4L, 
                         verbose = TRUE, 
                         random.seed = 312) {
  # check inputs 
  if (is.null(expr.mat) || is.null(dyn.mat) || is.null(dyn.genes) || is.null(tx.factors)) { stop("Arguments to estimateTGRN() are missing.") }
  # ID transcription factors 
  tx.factors <- tx.factors[tx.factors %in% dyn.genes]
  # set up progress bar if desired
  if (verbose) {
    withr::with_output_sink(tempfile(), {
      pb <- utils::txtProgressBar(0, length(target_genes), style = 3)
    })
    progress_fun <- function(n) utils::setTxtProgressBar(pb, n)
    snow_opts <- list(progress = progress_fun)
  } else {
    snow_opts <- list()
  }
  # set up parallel processing
  if (n.cores > 1L) {
    cl <- parallel::makeCluster(n.cores)
    doSNOW::registerDoSNOW(cl)
  } else {
    cl <- foreach::registerDoSEQ()
  }
  # set up LightGBM model settings
  lgbm_params <- list(objective = "gamma", 
                      tree_learner = "serial", 
                      metric = "l2", 
                      boosting_type = "dart",
                      device = "cpu",
                      num_threads = 1L, 
                      seed = random.seed)
  # estimate trajectory GRN
  grn_res <- foreach::foreach(g = seq(dyn.genes), 
                              .combine = "list",
                              .multicombine = ifelse(length(dyn.genes) > 1, TRUE, FALSE),
                              .maxcombine = ifelse(length(dyn.genes) > 1, length(dyn.genes), 2),
                              .packages = c("lightgbm", "dplyr"),
                              .errorhandling = "pass",
                              .inorder = TRUE,
                              .verbose = FALSE,
                              .options.snow = snow_opts) %dopar% {
                                feature_mat <- dyn.mat[, tx.factors]
                                feature_mat <- feature_mat[, colnames(feature_mat) != dyn.genes[g]] 
                                resp_var <- dyn.mat[, dyn.genes[g]]
                                lgbm_data <- lightgbm::lgb.Dataset(data = feature_mat, label = resp_var)
                                lgbm_cv <- lightgbm::lgb.cv(params = lgbm_params, 
                                                            data = lgbm_data, 
                                                            nrounds = 100L, 
                                                            nfold = 5L, 
                                                            stratified = FALSE)
                                lgbm_model <- lightgbm::lgb.train(params = lgbm_params, 
                                                                  data = lgbm_data, 
                                                                  nrounds = lgbm_cv$best_iter)
                                imp_table <- as.data.frame(lightgbm::lgb.importance(lgbm_model)) %>% 
                                             dplyr::mutate(Target_Gene = target_genes[g], .before = 1)
                                imp_table
                              }
  names(grn_res) <- target_genes
  if (n.cores > 1L) {
    parallel::stopCluster(cl)
  }
  # format GRN table
  grn_table <- purrr::imap(grn_res, \(x, y) {
    if (!inherits(x, "data.frame")) {
      empty_res <- data.frame(Target_Gene = y, 
                              Feature = NA_character_, 
                              Gain = NA_real_, 
                              Cover = NA_real_, 
                              Frequency = NA_real_)
      return(empty_res)
    } else {
      return(x)
    }
  })
  # add mean rank of gain & frequency
  grn_table <- purrr::reduce(grn_table, rbind) %>% 
               dplyr::select(-Cover) %>% 
               dplyr::rename(Tx_Factor = Feature) %>% 
               dplyr::filter(!is.na(Target_Gene), 
                             !is.na(Tx_Factor)) %>% 
               dplyr::arrange(Tx_Factor, desc(Frequency)) %>% 
               dplyr::with_groups(Tx_Factor, 
                                  dplyr::mutate, 
                                  Frequency_Rank = row_number()) %>% 
               dplyr::arrange(Tx_Factor, desc(Gain)) %>% 
               dplyr::with_groups(Tx_Factor, 
                                  dplyr::mutate, 
                                  Gain_Rank = row_number()) %>% 
               dplyr::rowwise() %>% 
               dplyr::mutate(Mean_Rank = mean(dplyr::c_across(c(Frequency_Rank, Gain_Rank)))) %>% 
               dplyr::ungroup() %>% 
               dplyr::arrange(Tx_Factor, Mean_Rank) %>% 
               dplyr::mutate(Mean_Rank_Weight = 1 / Mean_Rank)
  # add correlation & mutual information b/t TF and each target gene (dynamics and normalized expression)
  dyn_cormat <- stats::cor(dyn.mat[, dyn.genes], method = cor.method)
  expr_cormat <- stats::cor(expr.mat[, dyn.genes], method = cor.method)
  dyn_mimat <- minet::build.mim(dyn.mat[, dyn.genes], 
                                estimator = "mi.empirical", 
                                disc = "equalwidth")
  expr_mimat <- minet::build.mim(expr.mat[, dyn.genes],  
                                 estimator = "mi.empirical", 
                                 disc = "equalwidth")
  grn_table <- dplyr::mutate(grn_table, 
                             Dyn_Cor = NA_real_, 
                             Exp_Cor = NA_real_, 
                             Dyn_MI = NA_real_, 
                             Exp_MI = NA_real_)
  for (i in seq(nrow(grn_table))) {
    grn_table$Dyn_Cor[i] <- dyn_cormat[grn_table$Target_Gene[i], grn_table$Tx_Factor[i]]
    grn_table$Dyn_MI[i] <- dyn_mimat[grn_table$Target_Gene[i], grn_table$Tx_Factor[i]]
    grn_table$Exp_Cor[i] <- expr_cormat[grn_table$Target_Gene[i], grn_table$Tx_Factor[i]]
    grn_table$Exp_MI[i] <- expr_mimat[grn_table$Target_Gene[i], grn_table$Tx_Factor[i]]
  }
  grn_table <- dplyr::mutate(grn_table, State = dplyr::if_else(Dyn_Cor < 0, "Repression", "Activation"))
  return(grn_table)
}

8 Session info

Code
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       macOS Sonoma 14.3
 system   x86_64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2024-02-27
 pandoc   3.1.9 @ /usr/local/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package                * version    date (UTC) lib source
 abind                    1.4-5      2016-07-21 [1] CRAN (R 4.3.0)
 AnnotationDbi            1.64.1     2023-11-03 [1] Bioconductor
 AnnotationFilter         1.26.0     2023-10-24 [1] Bioconductor
 AnnotationHub            3.10.0     2023-10-24 [1] Bioconductor
 backports                1.4.1      2021-12-13 [1] CRAN (R 4.3.0)
 beeswarm                 0.4.0      2021-06-01 [1] CRAN (R 4.3.0)
 bigassertr               0.1.6      2023-01-10 [1] CRAN (R 4.3.0)
 bigparallelr             0.3.2      2021-10-02 [1] CRAN (R 4.3.0)
 bigstatsr                1.5.12     2022-10-14 [1] CRAN (R 4.3.0)
 Biobase                * 2.62.0     2023-10-24 [1] Bioconductor
 BiocFileCache            2.10.1     2023-10-26 [1] Bioconductor
 BiocGenerics           * 0.48.1     2023-11-01 [1] Bioconductor
 BiocIO                   1.12.0     2023-10-24 [1] Bioconductor
 BiocManager              1.30.22    2023-08-08 [1] CRAN (R 4.3.0)
 BiocParallel             1.36.0     2023-10-24 [1] Bioconductor
 BiocVersion              3.18.1     2023-11-15 [1] Bioconductor
 biomaRt                * 2.58.0     2023-10-24 [1] Bioconductor
 Biostrings               2.70.1     2023-10-25 [1] Bioconductor
 bit                      4.0.5      2022-11-15 [1] CRAN (R 4.3.0)
 bit64                    4.0.5      2020-08-30 [1] CRAN (R 4.3.0)
 bitops                   1.0-7      2021-04-24 [1] CRAN (R 4.3.0)
 blob                     1.2.4      2023-03-17 [1] CRAN (R 4.3.0)
 boot                     1.3-28.1   2022-11-22 [1] CRAN (R 4.3.2)
 broom                    1.0.5      2023-06-09 [1] CRAN (R 4.3.0)
 broom.mixed              0.2.9.4    2022-04-17 [1] CRAN (R 4.3.0)
 cachem                   1.0.8      2023-05-01 [1] CRAN (R 4.3.0)
 car                      3.1-2      2023-03-30 [1] CRAN (R 4.3.0)
 carData                  3.0-5      2022-01-06 [1] CRAN (R 4.3.0)
 cli                      3.6.2      2023-12-11 [1] CRAN (R 4.3.0)
 cluster                  2.1.6      2023-12-01 [1] CRAN (R 4.3.0)
 coda                     0.19-4     2020-09-30 [1] CRAN (R 4.3.0)
 codetools                0.2-19     2023-02-01 [1] CRAN (R 4.3.2)
 colorspace               2.1-0      2023-01-23 [1] CRAN (R 4.3.0)
 cowplot                  1.1.2      2023-12-15 [1] CRAN (R 4.3.0)
 crayon                   1.5.2      2022-09-29 [1] CRAN (R 4.3.0)
 curl                     5.2.0      2023-12-08 [1] CRAN (R 4.3.0)
 data.table               1.14.10    2023-12-08 [1] CRAN (R 4.3.0)
 DBI                      1.2.0      2023-12-21 [1] CRAN (R 4.3.0)
 dbplyr                   2.4.0      2023-10-26 [1] CRAN (R 4.3.0)
 DelayedArray             0.28.0     2023-10-24 [1] Bioconductor
 DelayedMatrixStats       1.24.0     2023-10-24 [1] Bioconductor
 deldir                   2.0-2      2023-11-23 [1] CRAN (R 4.3.0)
 digest                   0.6.33     2023-07-07 [1] CRAN (R 4.3.0)
 doParallel               1.0.17     2022-02-07 [1] CRAN (R 4.3.0)
 doSNOW                   1.0.20     2022-02-04 [1] CRAN (R 4.3.0)
 dotCall64                1.1-1      2023-11-28 [1] CRAN (R 4.3.0)
 dplyr                  * 1.1.4      2023-11-17 [1] CRAN (R 4.3.0)
 ellipsis                 0.3.2      2021-04-29 [1] CRAN (R 4.3.0)
 emmeans                  1.10.0     2024-01-23 [1] CRAN (R 4.3.2)
 ensembldb                2.26.0     2023-10-24 [1] Bioconductor
 estimability             1.4.1      2022-08-05 [1] CRAN (R 4.3.0)
 evaluate                 0.23       2023-11-01 [1] CRAN (R 4.3.0)
 ExperimentHub            2.10.0     2023-10-24 [1] Bioconductor
 fansi                    1.0.6      2023-12-08 [1] CRAN (R 4.3.0)
 farver                   2.1.1      2022-07-06 [1] CRAN (R 4.3.0)
 fastDummies              1.7.3      2023-07-06 [1] CRAN (R 4.3.0)
 fastmap                  1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
 ff                       4.0.9      2023-01-25 [1] CRAN (R 4.3.0)
 filelock                 1.0.3      2023-12-11 [1] CRAN (R 4.3.0)
 fitdistrplus             1.1-11     2023-04-25 [1] CRAN (R 4.3.0)
 flock                    0.7        2016-11-12 [1] CRAN (R 4.3.0)
 forcats                  1.0.0      2023-01-29 [1] CRAN (R 4.3.0)
 foreach                * 1.5.2      2022-02-02 [1] CRAN (R 4.3.0)
 furrr                    0.3.1      2022-08-15 [1] CRAN (R 4.3.0)
 future                   1.33.1     2023-12-22 [1] CRAN (R 4.3.0)
 future.apply             1.11.1     2023-12-21 [1] CRAN (R 4.3.0)
 gamlss                   5.4-20     2023-10-04 [1] CRAN (R 4.3.0)
 gamlss.data              6.0-2      2021-11-07 [1] CRAN (R 4.3.0)
 gamlss.dist              6.1-1      2023-08-23 [1] CRAN (R 4.3.0)
 geeM                     0.10.1     2018-06-18 [1] CRAN (R 4.3.0)
 generics                 0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
 GenomeInfoDb           * 1.38.5     2023-12-28 [1] Bioconductor 3.18 (R 4.3.2)
 GenomeInfoDbData         1.2.11     2023-12-22 [1] Bioconductor
 GenomicAlignments        1.38.0     2023-10-24 [1] Bioconductor
 GenomicFeatures          1.54.1     2023-10-29 [1] Bioconductor
 GenomicRanges          * 1.54.1     2023-10-29 [1] Bioconductor
 ggbeeswarm               0.7.2      2023-04-29 [1] CRAN (R 4.3.0)
 ggnetwork              * 0.5.12     2024-02-12 [1] Github (briatte/ggnetwork@f3b8b84)
 ggplot2                * 3.4.4      2023-10-12 [1] CRAN (R 4.3.0)
 ggpubr                   0.6.0      2023-02-10 [1] CRAN (R 4.3.0)
 ggrepel                  0.9.5      2024-01-10 [1] CRAN (R 4.3.0)
 ggridges                 0.5.5      2023-12-15 [1] CRAN (R 4.3.0)
 ggsignif                 0.6.4      2022-10-13 [1] CRAN (R 4.3.0)
 glm2                   * 1.2.1      2018-08-11 [1] CRAN (R 4.3.0)
 glmmTMB                  1.1.8      2023-10-07 [1] CRAN (R 4.3.0)
 globals                  0.16.2     2022-11-21 [1] CRAN (R 4.3.0)
 glue                     1.6.2      2022-02-24 [1] CRAN (R 4.3.0)
 goftest                  1.2-3      2021-10-07 [1] CRAN (R 4.3.0)
 gridExtra                2.3        2017-09-09 [1] CRAN (R 4.3.0)
 gtable                   0.3.4      2023-08-21 [1] CRAN (R 4.3.0)
 highr                    0.10       2022-12-22 [1] CRAN (R 4.3.0)
 hms                      1.1.3      2023-03-21 [1] CRAN (R 4.3.0)
 htmltools                0.5.7      2023-11-03 [1] CRAN (R 4.3.0)
 htmlwidgets              1.6.4      2023-12-06 [1] CRAN (R 4.3.0)
 httpuv                   1.6.13     2023-12-06 [1] CRAN (R 4.3.0)
 httr                     1.4.7      2023-08-15 [1] CRAN (R 4.3.0)
 ica                      1.0-3      2022-07-08 [1] CRAN (R 4.3.0)
 igraph                 * 2.0.1.1    2024-01-30 [1] CRAN (R 4.3.2)
 infotheo                 1.2.0.1    2022-04-08 [1] CRAN (R 4.3.0)
 interactiveDisplayBase   1.40.0     2023-10-24 [1] Bioconductor
 IRanges                * 2.36.0     2023-10-24 [1] Bioconductor
 irlba                    2.3.5.1    2022-10-03 [1] CRAN (R 4.3.0)
 iterators                1.0.14     2022-02-05 [1] CRAN (R 4.3.0)
 janitor                  2.2.0      2023-02-02 [1] CRAN (R 4.3.0)
 jsonlite                 1.8.8      2023-12-04 [1] CRAN (R 4.3.0)
 kableExtra               1.3.4      2021-02-20 [1] CRAN (R 4.3.0)
 KEGGREST                 1.42.0     2023-10-24 [1] Bioconductor
 KernSmooth               2.23-22    2023-07-10 [1] CRAN (R 4.3.2)
 knitr                    1.45       2023-10-30 [1] CRAN (R 4.3.0)
 labeling                 0.4.3      2023-08-29 [1] CRAN (R 4.3.0)
 later                    1.3.2      2023-12-06 [1] CRAN (R 4.3.0)
 lattice                  0.22-5     2023-10-24 [1] CRAN (R 4.3.0)
 lazyeval                 0.2.2      2019-03-15 [1] CRAN (R 4.3.0)
 leiden                   0.4.3.1    2023-11-17 [1] CRAN (R 4.3.0)
 lifecycle                1.0.4      2023-11-07 [1] CRAN (R 4.3.0)
 limma                    3.58.1     2023-10-31 [1] Bioconductor
 listenv                  0.9.0      2022-12-16 [1] CRAN (R 4.3.0)
 lme4                     1.1-35.1   2023-11-05 [1] CRAN (R 4.3.0)
 lmtest                   0.9-40     2022-03-21 [1] CRAN (R 4.3.0)
 lubridate                1.9.3      2023-09-27 [1] CRAN (R 4.3.0)
 magrittr               * 2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
 MASS                     7.3-60     2023-05-04 [1] CRAN (R 4.3.0)
 Matrix                   1.6-5      2024-01-11 [1] CRAN (R 4.3.0)
 MatrixGenerics         * 1.14.0     2023-10-24 [1] Bioconductor
 matrixStats            * 1.2.0      2023-12-11 [1] CRAN (R 4.3.0)
 memoise                  2.0.1      2021-11-26 [1] CRAN (R 4.3.0)
 mgcv                     1.9-1      2023-12-21 [1] CRAN (R 4.3.0)
 mime                     0.12       2021-09-28 [1] CRAN (R 4.3.0)
 minet                    3.60.0     2023-10-24 [1] Bioconductor
 miniUI                   0.1.1.1    2018-05-18 [1] CRAN (R 4.3.0)
 minqa                    1.2.6      2023-09-11 [1] CRAN (R 4.3.0)
 munsell                  0.5.0      2018-06-12 [1] CRAN (R 4.3.0)
 mvtnorm                  1.2-4      2023-11-27 [1] CRAN (R 4.3.0)
 nlme                     3.1-164    2023-11-27 [1] CRAN (R 4.3.0)
 nloptr                   2.0.3      2022-05-26 [1] CRAN (R 4.3.0)
 numDeriv                 2016.8-1.1 2019-06-06 [1] CRAN (R 4.3.0)
 paletteer                1.6.0      2024-01-21 [1] CRAN (R 4.3.0)
 parallelly               1.36.0     2023-05-26 [1] CRAN (R 4.3.0)
 patchwork              * 1.2.0      2024-01-08 [1] CRAN (R 4.3.0)
 pbapply                  1.7-2      2023-06-27 [1] CRAN (R 4.3.0)
 pillar                   1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
 pkgconfig                2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
 plotly                   4.10.4     2024-01-13 [1] CRAN (R 4.3.0)
 plyr                     1.8.9      2023-10-02 [1] CRAN (R 4.3.0)
 png                      0.1-8      2022-11-29 [1] CRAN (R 4.3.0)
 polyclip                 1.10-6     2023-09-27 [1] CRAN (R 4.3.0)
 presto                   1.0.0      2023-12-27 [1] Github (immunogenomics/presto@31dc97f)
 prettyunits              1.2.0      2023-09-24 [1] CRAN (R 4.3.0)
 princurve              * 2.1.6      2021-01-18 [1] CRAN (R 4.3.0)
 prismatic                1.1.1      2022-08-15 [1] CRAN (R 4.3.0)
 progress                 1.2.3      2023-12-06 [1] CRAN (R 4.3.0)
 progressr                0.14.0     2023-08-10 [1] CRAN (R 4.3.0)
 promises                 1.2.1      2023-08-10 [1] CRAN (R 4.3.0)
 ProtGenerics             1.34.0     2023-10-24 [1] Bioconductor
 ps                       1.7.5      2023-04-18 [1] CRAN (R 4.3.0)
 purrr                    1.0.2      2023-08-10 [1] CRAN (R 4.3.0)
 R6                       2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
 RANN                     2.6.1      2019-01-08 [1] CRAN (R 4.3.0)
 rappdirs                 0.3.3      2021-01-31 [1] CRAN (R 4.3.0)
 RColorBrewer             1.1-3      2022-04-03 [1] CRAN (R 4.3.0)
 Rcpp                     1.0.11     2023-07-06 [1] CRAN (R 4.3.0)
 RcppAnnoy                0.0.21     2023-07-02 [1] CRAN (R 4.3.0)
 RcppEigen                0.3.3.9.4  2023-11-02 [1] CRAN (R 4.3.0)
 RcppHNSW                 0.5.0      2023-09-19 [1] CRAN (R 4.3.0)
 RCurl                    1.98-1.13  2023-11-02 [1] CRAN (R 4.3.0)
 readr                    2.1.4      2023-02-10 [1] CRAN (R 4.3.0)
 rematch2                 2.1.2      2020-05-01 [1] CRAN (R 4.3.0)
 reshape2                 1.4.4      2020-04-09 [1] CRAN (R 4.3.0)
 restfulr                 0.0.15     2022-06-16 [1] CRAN (R 4.3.0)
 reticulate               1.34.0     2023-10-12 [1] CRAN (R 4.3.0)
 rjson                    0.2.21     2022-01-09 [1] CRAN (R 4.3.0)
 rlang                    1.1.2      2023-11-04 [1] CRAN (R 4.3.0)
 rmarkdown                2.25       2023-09-18 [1] CRAN (R 4.3.0)
 rmio                     0.4.0      2022-02-17 [1] CRAN (R 4.3.0)
 ROCR                     1.0-11     2020-05-02 [1] CRAN (R 4.3.0)
 Rsamtools                2.18.0     2023-10-24 [1] Bioconductor
 RSpectra                 0.16-1     2022-04-24 [1] CRAN (R 4.3.0)
 RSQLite                  2.3.4      2023-12-08 [1] CRAN (R 4.3.0)
 rstatix                  0.7.2      2023-02-01 [1] CRAN (R 4.3.0)
 rstudioapi               0.15.0     2023-07-07 [1] CRAN (R 4.3.0)
 rtracklayer              1.62.0     2023-10-24 [1] Bioconductor
 Rtsne                    0.17       2023-12-07 [1] CRAN (R 4.3.0)
 rvest                    1.0.3      2022-08-19 [1] CRAN (R 4.3.0)
 S4Arrays                 1.2.0      2023-10-24 [1] Bioconductor
 S4Vectors              * 0.40.2     2023-11-23 [1] Bioconductor
 scales                   1.3.0      2023-11-28 [1] CRAN (R 4.3.0)
 scattermore              1.2        2023-06-12 [1] CRAN (R 4.3.0)
 scLANE                 * 0.7.9      2024-02-18 [1] Bioconductor
 scRNAseq               * 2.16.0     2023-10-26 [1] Bioconductor
 sctransform              0.4.1      2023-10-19 [1] CRAN (R 4.3.0)
 sessioninfo              1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
 Seurat                 * 5.0.1      2023-11-17 [1] CRAN (R 4.3.0)
 SeuratObject           * 5.0.1      2023-11-17 [1] CRAN (R 4.3.0)
 shiny                    1.8.0      2023-11-17 [1] CRAN (R 4.3.0)
 SingleCellExperiment   * 1.24.0     2023-10-24 [1] Bioconductor
 slingshot              * 2.10.0     2023-10-24 [1] Bioconductor
 snakecase                0.11.1     2023-08-27 [1] CRAN (R 4.3.0)
 snow                     0.4-4      2021-10-27 [1] CRAN (R 4.3.0)
 sp                     * 2.1-2      2023-11-26 [1] CRAN (R 4.3.0)
 spam                     2.10-0     2023-10-23 [1] CRAN (R 4.3.0)
 SparseArray              1.2.3      2023-12-25 [1] Bioconductor 3.18 (R 4.3.2)
 sparseMatrixStats        1.14.0     2023-10-24 [1] Bioconductor
 spatstat.data            3.0-3      2023-10-24 [1] CRAN (R 4.3.0)
 spatstat.explore         3.2-5      2023-10-22 [1] CRAN (R 4.3.0)
 spatstat.geom            3.2-7      2023-10-20 [1] CRAN (R 4.3.0)
 spatstat.random          3.2-2      2023-11-29 [1] CRAN (R 4.3.0)
 spatstat.sparse          3.0-3      2023-10-24 [1] CRAN (R 4.3.0)
 spatstat.utils           3.0-4      2023-10-24 [1] CRAN (R 4.3.0)
 statmod                  1.5.0      2023-01-06 [1] CRAN (R 4.3.0)
 stringi                  1.8.3      2023-12-11 [1] CRAN (R 4.3.0)
 stringr                  1.5.1      2023-11-14 [1] CRAN (R 4.3.0)
 SummarizedExperiment   * 1.32.0     2023-10-24 [1] Bioconductor
 survival                 3.5-7      2023-08-14 [1] CRAN (R 4.3.2)
 svglite                  2.1.3      2023-12-08 [1] CRAN (R 4.3.0)
 systemfonts              1.0.5      2023-10-09 [1] CRAN (R 4.3.0)
 tensor                   1.5        2012-05-05 [1] CRAN (R 4.3.0)
 tibble                   3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
 tidyr                    1.3.1      2024-01-24 [1] CRAN (R 4.3.2)
 tidyselect               1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
 timechange               0.2.0      2023-01-11 [1] CRAN (R 4.3.0)
 TMB                      1.9.10     2023-12-12 [1] CRAN (R 4.3.0)
 TrajectoryUtils        * 1.10.0     2023-10-24 [1] Bioconductor
 tzdb                     0.4.0      2023-05-12 [1] CRAN (R 4.3.0)
 utf8                     1.2.4      2023-10-22 [1] CRAN (R 4.3.0)
 uwot                     0.1.16     2023-06-29 [1] CRAN (R 4.3.0)
 vctrs                    0.6.5      2023-12-01 [1] CRAN (R 4.3.0)
 vipor                    0.4.7      2023-12-18 [1] CRAN (R 4.3.0)
 viridisLite              0.4.2      2023-05-02 [1] CRAN (R 4.3.0)
 vroom                    1.6.5      2023-12-05 [1] CRAN (R 4.3.0)
 webshot                  0.5.5      2023-06-26 [1] CRAN (R 4.3.0)
 withr                    2.5.2      2023-10-30 [1] CRAN (R 4.3.0)
 xfun                     0.41       2023-11-01 [1] CRAN (R 4.3.0)
 XML                      3.99-0.16  2023-11-29 [1] CRAN (R 4.3.0)
 xml2                     1.3.6      2023-12-04 [1] CRAN (R 4.3.0)
 xtable                   1.8-4      2019-04-21 [1] CRAN (R 4.3.0)
 XVector                  0.42.0     2023-10-24 [1] Bioconductor
 yaml                     2.3.8      2023-12-11 [1] CRAN (R 4.3.0)
 zlibbioc                 1.48.0     2023-10-24 [1] Bioconductor
 zoo                      1.8-12     2023-04-13 [1] CRAN (R 4.3.0)

 [1] /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library

─ Python configuration ───────────────────────────────────────────────────────
 python:         /Users/jack/Desktop/PhD/Research/Python_Envs/personal_site/bin/python
 libpython:      /usr/local/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/config-3.11-darwin/libpython3.11.dylib
 pythonhome:     /Users/jack/Desktop/PhD/Research/Python_Envs/personal_site:/Users/jack/Desktop/PhD/Research/Python_Envs/personal_site
 version:        3.11.6 (main, Nov  2 2023, 04:52:24) [Clang 14.0.3 (clang-1403.0.22.14.1)]
 numpy:          /Users/jack/Desktop/PhD/Research/Python_Envs/personal_site/lib/python3.11/site-packages/numpy
 numpy_version:  1.26.3
 leidenalg:      /Users/jack/Desktop/PhD/Research/Python_Envs/personal_site/lib/python3.11/site-packages/leidenalg
 
 NOTE: Python version was forced by use_python() function

──────────────────────────────────────────────────────────────────────────────