Building a Trajectory Gene Regulatory Network with scRNA-seq Data

Author

Affiliation

Jack R. Leary

University of Florida

Published

April 16, 2024

1 Introduction

In this analysis we’ll be looking at cells undergoing neurogenesis in the developing human frontal cortex. The data are taken from Nowakowski et al (2017). Our goal is to build a simple method for fitting single cell gene regulatory networks (GRNs) and compare the results across pseudotime lineages, keeping in mind the trajectory structure of the data. If you want to skip straight to the analysis, head to Section 6.

2 Libraries

We start by loading in a few necessary packages & resolving some function conflicts.

Code

library(dplyr)                 # data manipulation
library(Seurat)                # scRNA-seq tools
library(scLANE)                # trajectory DE
library(igraph)                # graph utilities
library(ggplot2)               # pretty plots
library(biomaRt)               # gene annotation
library(foreach)               # parallel loops
library(ggnetwork)             # plotting graphs
library(slingshot)             # pseudotime estimation
library(patchwork)             # plot alignment
library(SingleCellExperiment)  # scRNA-seq data structures
rename <- dplyr::rename
select <- dplyr::select

3 Helper functions

We define a helper function to make our legends look prettier.

Code

guide_umap <- function(key.size = 4) {
  ggplot2::guides(color = ggplot2::guide_legend(override.aes = list(size = key.size, 
                                                                    alpha = 1, 
                                                                    stroke = 0.25)))
}

Next, we assign a few consistent color palettes that we’ll use throughout.

Code

palette_heatmap <- paletteer::paletteer_d("MetBrewer::Hiroshige", direction = -1)
palette_celltype <- paletteer::paletteer_d("ggsci::category10_d3")
palette_cluster <- paletteer::paletteer_d("ggsci::default_locuszoom")
palette_timepoint <- paletteer::paletteer_d("ggsci::default_igv")

4 Data

Unfortunately the data that are made available as part of the scRNAseq package come in a transcripts-per-million (TPM) normalized matrix - not raw counts. A rough conversion back to the raw integer counts is to multiply each cell’s normalized expression vector by the TPM size factor. This gets us most of the way there but the results are still estimates, so we round them to the nearest integer.

Note

This conversion ignores gene length, which is why the results are estimates & require rounding. For a more detailed discussion on how to un-normalize TPM counts see this StackOverflow post. While the original authors used a plate-based protocol (Fluidigm C1) they do not mention whether they used unique molecular identifiers (UMIs); if UMIs were used then gene length normalization isn’t necessary. Since simply reversing the TPM transformation doesn’t result in integer values I would guess that UMIs were not used, though their Supplementary Methods don’t mention gene length normalization.

Code

sce_cortex <- scRNAseq::NowakowskiCortexData()
counts_matrix <- Matrix::t(sce_cortex@assays@data$tpm)
scale_factors <- Matrix::rowSums(counts_matrix) / 1e6
counts_matrix <- round(Matrix::t(counts_matrix) * scale_factors)
seu_cortex <- CreateSeuratObject(counts_matrix, 
                                 assay = "RNA", 
                                 meta.data = as.data.frame(colData(sce_cortex)), 
                                 project = "cortex", 
                                 min.cells = 5,
                                 min.features = 0)
excitatory_neuron_celltypes <- c("nEN-late", "nEN-early1", "nEN-early2",
                                 "EN-V1-1", "EN-V1-2", "EN-V1-3", 
                                 "EN-PFC1", "EN-PFC2", "EN-PFC3")
seu_cortex <- seu_cortex[, seu_cortex$WGCNAcluster %in% excitatory_neuron_celltypes]
seu_cortex@meta.data <- mutate(seu_cortex@meta.data, 
                               celltype_original = case_when(WGCNAcluster == "nEN-early1" ~ "Early Exc. Neuron 1", 
                                                             WGCNAcluster == "nEN-early2" ~ "Early Exc. Neuron 2", 
                                                             WGCNAcluster == "nEN-late" ~ "Late Exc. Neuron", 
                                                             WGCNAcluster == "EN-V1-1" ~ "Exc. Neuron 1", 
                                                             WGCNAcluster == "EN-V1-2" ~ "Exc. Neuron 2", 
                                                             WGCNAcluster == "EN-V1-3" ~ "Exc. Neuron 3", 
                                                             WGCNAcluster == "EN-PFC1" ~ "PFC Exc. Neuron 1", 
                                                             WGCNAcluster == "EN-PFC2" ~ "PFC Exc. Neuron 2", 
                                                             WGCNAcluster == "EN-PFC3" ~ "PFC Exc. Neuron 3", 
                                                             TRUE ~ NA_character_))

5 Preprocessing

We run the cells through a basic preprocessing pipeline: QC, normalization, highly variable gene (HVG) identification, PCA, UMAP, and graph-based Leiden clustering.

Code

seu_cortex <- seu_cortex %>% 
              PercentageFeatureSet(pattern = "^MT-", col.name = "percent_mito") %>% 
              PercentageFeatureSet(pattern = "^RP[SL]", col.name = "percent_ribo") %>% 
              NormalizeData(verbose = FALSE) %>% 
              FindVariableFeatures(nfeatures = 3000, verbose = FALSE) %>% 
              ScaleData(verbose = FALSE) %>% 
              RunPCA(npcs = 50, 
                     approx = TRUE, 
                     seed.use = 312, 
                     verbose = FALSE) %>% 
              CellCycleScoring(s.features = cc.genes.updated.2019$s.genes, 
                               g2m.features = cc.genes.updated.2019$g2m.genes, 
                               set.ident = FALSE) %>% 
              RunUMAP(reduction = "pca", 
                      dims = 1:30,
                      n.neighbors = 20, 
                      n.components = 2, 
                      metric = "cosine", 
                      n.epochs = 750,
                      seed.use = 312, 
                      verbose = FALSE) %>% 
              FindNeighbors(reduction = "pca", 
                            k.param = 20,
                            nn.method = "annoy", 
                            annoy.metric = "cosine", 
                            verbose = FALSE) %>% 
              FindClusters(algorithm = 4, 
                           method = "igraph", 
                           resolution = 0.5, 
                           random.seed = 312, 
                           verbose = FALSE)

Here we examine some of our dataset’s attributes - the original celltype annotations, our new clusters, age in post-conception weeks (PCW), and the proportion of each cell’s reads aligning to ribosomal genes.

Code

p1 <- DimPlot(seu_cortex, 
              group.by = "celltype_original",
              pt.size = 1.25,  
              alpha = 0.75) + 
      scale_color_manual(values = palette_celltype) + 
      theme_scLANE(umap = TRUE) + 
      labs(color = "Celltype") + 
      theme(axis.title = element_blank(), 
            plot.title = element_blank()) + 
      guide_umap()
p2 <- FeaturePlot(seu_cortex, 
                  features = "Age_in_Weeks", 
                  pt.size = 1.25,  
                  alpha = 0.75) + 
      scale_color_gradientn(colors = palette_heatmap) + 
      labs(color = "Age (weeks)") + 
      theme_scLANE(umap = TRUE) + 
      theme(axis.title = element_blank(), 
            plot.title = element_blank())
p3 <- DimPlot(seu_cortex, 
              group.by = "seurat_clusters", 
              pt.size = 1.25,  
              alpha = 0.75) + 
      scale_color_manual(values = palette_cluster) + 
      labs(color = "Leiden Cluster") + 
      theme_scLANE(umap = TRUE) + 
      theme(axis.title = element_blank(), 
            plot.title = element_blank()) + 
      guide_umap()
p4 <- FeaturePlot(seu_cortex, 
                  features = "percent_ribo", 
                  pt.size = 1.25,  
                  alpha = 0.75) + 
      scale_color_gradientn(colors = palette_heatmap, labels = scales::label_percent(accuracy = 1, scale = 1)) + 
      labs(color = "Ribosomal\nReads") + 
      theme_scLANE(umap = TRUE) + 
      theme(axis.title = element_blank(), 
            plot.title = element_blank())
p5 <- (p1 / p2) | (p3 / p4)
p5 <- ggpubr::annotate_figure(ggpubr::ggarrange(p5), 
                              bottom = "UMAP 1",
                              left = "UMAP 2")
p5

Figure 1: Characteristics of the cortical development dataset.

6 Analysis

6.1 Celltype annotation

We begin by running a basic differential expression (DE) testing routine to identify some potential marker genes for each cluster. This will help us assign a celltype identity to each cluster.

Tip

As of Seurat v5, if you install the presto package using remotes::install_github("immunogenomics/presto"), the Wilcox tests performed here will be much, much faster.

Code

cluster_markers <- FindAllMarkers(seu_cortex, 
                                  assay = "RNA", 
                                  logfc.threshold = 0.25, 
                                  test.use = "wilcox", 
                                  slot = "data", 
                                  min.pct = 0.1, 
                                  only.pos = TRUE, 
                                  verbose = FALSE, 
                                  random.seed = 312)
top5_cluster_markers <- arrange(cluster_markers, p_val_adj) %>% 
                        with_groups(cluster, 
                                    slice_head, 
                                    n = 5)

We visualize the markers for each cluster with a dotplot, which shows clear differences between clusters. For example, the transcription factor (TF) aristaless related homeobox (ARX) is specifically expressed in cluster 7; this TF’s expression is known to be necessary for healthy neuronal development and helps to regulate differentiation.

Code

p6 <- DotPlot(seu_cortex, 
              features = unique(top5_cluster_markers$gene),
              assay = "RNA", 
              group.by = "seurat_clusters", 
              dot.scale = 5,
              cols = paletteer::paletteer_d("wesanderson::Zissou1")[c(1, 5)], 
              scale.by = "radius") + 
      coord_flip() +
      labs(y = "Leiden Cluster") +
      theme_scLANE() + 
      theme(axis.title.y = element_blank(), 
            axis.text.y = element_text(face = "italic"))
p6

Figure 2: Putative marker genes for each cluster.

Now we identify markers for each celltype:

Code

Idents(seu_cortex) <- "celltype_original"
celltype_markers <- FindAllMarkers(seu_cortex, 
                                   assay = "RNA", 
                                   logfc.threshold = 0.25, 
                                   test.use = "wilcox", 
                                   slot = "data", 
                                   min.pct = 0.1, 
                                   only.pos = TRUE, 
                                   verbose = FALSE, 
                                   random.seed = 312)
top3_celltype_markers <- arrange(celltype_markers, p_val_adj) %>% 
                         with_groups(cluster, 
                                     slice_head, 
                                     n = 3)

The markers seem consistent with what we’d expect for excitatory neurons; for more information see e.g., Figure 4 and Suppl. Figures 4 & 11 in the original publication.

Code

select(top3_celltype_markers, 
       celltype = cluster, 
       gene, 
       avg_log2FC, 
       p_val_adj) %>% 
  kableExtra::kbl(digits = 3, 
                  booktabs = TRUE, 
                  col.names = c("Celltype", "Gene", "Mean log2FC", "Adj. P-value")) %>% 
  kableExtra::kable_classic(full_width = FALSE, "hover")

Celltype	Gene	Mean log2FC
Early Exc. Neuron 2	NRP1	1.124
Early Exc. Neuron 2	DPY19L1	1.096
Early Exc. Neuron 2	SOX11	0.624
Late Exc. Neuron	ZFHX4	2.034
Late Exc. Neuron	RP11.436D23.1	1.751
Late Exc. Neuron	LINC00478	2.164
Exc. Neuron 2	DOK5	2.873
Exc. Neuron 2	SATB2	1.677
Exc. Neuron 2	STMN2	0.857
Exc. Neuron 3	ACTN2	4.731
Exc. Neuron 3	LPL	3.325
Exc. Neuron 3	UNC5C	6.760
PFC Exc. Neuron 3	RP11.98J23.2	3.008
PFC Exc. Neuron 3	RP11.589F5.4	3.111
PFC Exc. Neuron 3	RP11.1415C14.3	3.198
Exc. Neuron 1	CRYM	3.523
Exc. Neuron 1	GRIK3	2.968
Exc. Neuron 1	FEZF2	3.872
PFC Exc. Neuron 1	CDH18	3.441
PFC Exc. Neuron 1	SORCS1	4.858
PFC Exc. Neuron 1	COL19A1	9.146
Early Exc. Neuron 1	NDST4	3.965
Early Exc. Neuron 1	STK17B	3.639
Early Exc. Neuron 1	CNTNAP2	2.501
PFC Exc. Neuron 2	CPNE8	5.170
PFC Exc. Neuron 2	CBLN2	4.410
PFC Exc. Neuron 2	SLN	4.456

Table 1: The top-3 markers genes for each identified celltype.

6.2 Pseudotime estimation

After estimating principal curves across our UMAP embedding with slingshot, we generate per-lineage pseudotime values for each cell. We use our clustering as a source of structure in the data, and assign start and end clusters based on age in PCW. In addition, we mean-aggregate the pseudotime values for each cell in order to get a total pseudotime which represents overall differentiation regardless of lineage.

Code

sling_res <- slingshot(Embeddings(seu_cortex, "umap"), 
                       clusterLabels = seu_cortex$seurat_clusters, 
                       start.clus = c("3"), 
                       end.clus = c("5", "6", "7"), 
                       approx_points = 1e3)
sling_curves <- slingCurves(sling_res, as.df = TRUE)
sling_mst <- slingMST(sling_res, as.df = TRUE)
sling_pt <- as.data.frame(slingPseudotime(sling_res)) %>% 
            rowwise() %>% 
            mutate(PT_Overall = mean(c_across(starts_with("Lineage")), na.rm = TRUE)) %>% 
            ungroup() %>% 
            mutate(across(c(starts_with("Lineage"), PT_Overall), 
                          \(x) (x - min(x, na.rm = TRUE)) / (max(x, na.rm = TRUE) - min(x, na.rm = TRUE))))
seu_cortex <- AddMetaData(seu_cortex, 
                          metadata = sling_pt$PT_Overall, 
                          col.name = "PT_Overall")

The graph structure estimate by slingshot is called a minimum spanning tree (MST), and describes the relationships between clusters in an undirected manner.

Code

p7 <- data.frame(Embeddings(seu_cortex, "umap")) %>% 
      mutate(leiden = seu_cortex$seurat_clusters) %>% 
      ggplot(aes(x = umap_1, y = umap_2, color = leiden)) + 
      geom_point(size = 1.25, alpha = 0.75) + 
      geom_path(data = sling_mst, mapping = aes(x = umap_1, y = umap_2, group = Lineage), 
                linewidth = 1.25, 
                color = "black") + 
      geom_point(data = sling_mst, mapping = aes(x = umap_1, y = umap_2, fill = Cluster), 
                color = "black", 
                shape = 21, 
                size = 4.5, 
                stroke = 1.25, 
                show.legend = FALSE) + 
      scale_color_manual(values = palette_cluster) + 
      scale_fill_manual(values = palette_cluster) + 
      labs(x = "UMAP 1", y = "UMAP 2") + 
      theme_scLANE(umap = TRUE) +  
      theme(legend.title = element_blank()) + 
      guide_umap()
p7

Figure 3: UMAP embedding with MST from Slinghot overlaid.

Since our data have real-life experimental timepoints, it makes sense to normalize the pseudotime within each lineage to \([0, 1]\).

Code

p8 <- data.frame(Embeddings(seu_cortex, "umap")) %>% 
      bind_cols(sling_pt) %>% 
      tidyr::pivot_longer(starts_with("Lineage"), 
                          names_to = "lineage", 
                          values_to = "pseudotime") %>% 
      ggplot(aes(x = umap_1, y = umap_2, color = pseudotime)) + 
      facet_wrap(~lineage, nrow = 3) + 
      geom_point(size = 1.25, alpha = 0.75) + 
      labs(x = "UMAP 1", 
           y = "UMAP 2", 
           color = "Pseudotime") + 
      scale_color_gradientn(colors = palette_heatmap, labels = scales::label_number(accuracy = .01)) + 
      theme_scLANE(umap = TRUE)
p8

Figure 4: UMAP embedding colored by per-lineage pseudotime.

The principal curves show a trifurcating lineage structure.

Code

p9 <- data.frame(Embeddings(seu_cortex, "umap")) %>% 
      mutate(leiden = seu_cortex$seurat_clusters) %>% 
      ggplot(aes(x = umap_1, y = umap_2, color = leiden)) + 
      geom_point(size = 1.25, alpha = 0.75) + 
      geom_path(data = sling_curves,
                mapping = aes(x = umap_1, y = umap_2, group = Lineage), 
                color = "black", 
                linewidth = 1.5, 
                alpha = 0.75, 
                lineend = "round") + 
      scale_color_manual(values = palette_cluster) + 
      labs(x = "UMAP 1", y = "UMAP 2") + 
      theme_scLANE(umap = TRUE) + 
      theme(legend.title = element_blank()) + 
      guides(color = guide_legend(override.aes = list(size = 4, alpha = 1, stroke = 0.25)))
p9

Figure 5: UMAP embedding with principal curves from Slinghot overlaid in black.

Lastly, we plot the distribution of mean-aggregated pseudotime per-timepoint and see that later timepoints have, in general, larger values of overall pseudotime. This indicates that our estimated pseudotime is linked to real experimental time in a meaningful (though fairly noisy) way.

Code

p10 <- select(seu_cortex@meta.data, 
              Age_in_Weeks, 
              PT_Overall) %>% 
       mutate(Age_in_Weeks = as.factor(Age_in_Weeks)) %>% 
       ggplot(aes(x = Age_in_Weeks, y = PT_Overall, color = Age_in_Weeks)) + 
       ggbeeswarm::geom_quasirandom(size = 1.25, 
                                    alpha = 0.75, 
                                    show.legend = FALSE) + 
       stat_summary(geom = "point", 
                    fun = "mean", 
                    color = "black",
                    size = 3) + 
       scale_color_manual(values = palette_timepoint) + 
       labs(x = "Age (PCW)", y = "Mean Psueodtime") + 
       theme_scLANE()
p10

Figure 6: Beeswarm plot displaying the distribution of mean-aggregated pseudotime for each timepoint.

6.3 Trajectory DE testing

Using scLANE (see the GitHub repository or the preprint for details) we perform trajectory DE significance testing for each HVG across each lineage.

Code

top3k_hvg <- HVFInfo(seu_cortex) %>% 
             arrange(desc(variance.standardized)) %>% 
             slice_head(n = 3000) %>% 
             rownames(.)
cell_offset <- createCellOffset(seu_cortex)
pt_df <- select(sling_pt, -PT_Overall)
scLANE_models <- testDynamic(seu_cortex, 
                             pt = pt_df, 
                             genes = top3k_hvg,
                             size.factor.offset = cell_offset, 
                             n.cores = 6L, 
                             verbose = FALSE)

scLANE testing completed for 3000 genes across 3 lineages in 14.449 mins

Code

scLANE_de_res <- getResultsDE(scLANE_models)

We now have lineage-specific trajectory DE statistics for each gene:

Code

select(scLANE_de_res, 
       Gene, 
       Lineage, 
       Test_Stat, 
       P_Val, 
       P_Val_Adj,
       Gene_Dynamic_Lineage, 
       Gene_Dynamic_Overall) %>% 
  mutate(Gene_Dynamic_Lineage = if_else(Gene_Dynamic_Lineage == 1, "Dynamic", "Static"), 
         Gene_Dynamic_Overall = if_else(Gene_Dynamic_Overall == 1, "Dynamic", "Static")) %>% 
  slice_head(n = 10) %>% 
  kableExtra::kbl(digits = 4, 
                  booktabs = TRUE, 
                  col.names = c("Gene", "Lineage", "LRT stat.", "P-value", "Adj. p-value", "Gene status (lineage)", "Gene status (overall)")) %>% 
  kableExtra::kable_classic(full_width = FALSE, "hover")

Gene	Lineage	LRT stat.	Gene status (lineage)	Gene status (overall)
LMO3	B	4066855	Dynamic	Dynamic
LMO3	A	4040903	Dynamic	Dynamic
MEF2C	B	3693517	Dynamic	Dynamic
LMO3	C	3514314	Dynamic	Dynamic
LIMCH1	A	3230531	Dynamic	Dynamic
LIMCH1	B	3156160	Dynamic	Dynamic
LINC01102	B	2912758	Dynamic	Dynamic
LINC01102	A	2887690	Dynamic	Dynamic
SHISA2	B	2121615	Dynamic	Dynamic
LINC01102	C	2077441	Dynamic	Dynamic

Table 2: The top-10 most DE genes from scLANE.

We identify a set of dynamic genes - dynamic meaning DE over any of the three lineages - to be used for downstream analysis.

Code

dyn_genes <- filter(scLANE_de_res, Gene_Dynamic_Overall == 1) %>% 
             distinct(Gene) %>% 
             pull(Gene)

Lastly, we pull a table of gene dynamics for each lineage.

Code

smoothed_counts <- smoothedCountsMatrix(scLANE_models, 
                                        size.factor.offset = cell_offset, 
                                        pt = pt_df, 
                                        genes = dyn_genes, 
                                        log1p.norm = TRUE, 
                                        n.cores = 4L)

6.4 Constructing a GRN from scratch

Building a GRN requires two main inputs - a set of transcription factors, and a set of potential target genes. We load a database of TFs from Lambert et al (2018).

Code

hs_ensembl <- useMart("ensembl",
                      dataset = "hsapiens_gene_ensembl", 
                      host = "https://useast.ensembl.org")
hs_tf_raw <- readr::read_csv("http://humantfs.ccbr.utoronto.ca/download/v_1.01/DatabaseExtract_v_1.01.csv",
                             col_select = -1,
                             num_threads = 2L,
                             show_col_types = FALSE) %>%
             janitor::clean_names() %>%
             filter(is_tf == "Yes") %>%
             select(ensembl_id)
hs_tfs <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol", "entrezgene_id", "description", "gene_biotype"),
                filters = "ensembl_gene_id",
                values = hs_tf_raw$ensembl_id,
                mart = hs_ensembl,
                uniqueRows = TRUE) %>%
          rename(ensembl_id = ensembl_gene_id,
                 entrez_id = entrezgene_id) %>%
          arrange(ensembl_id) %>%
          mutate(hgnc_symbol = if_else(hgnc_symbol == "", NA_character_, hgnc_symbol),
                 description = gsub("\\[Source.*", "", description))

Along with the HGNC symbol, we used BiomaRt to retrieve the Ensembl & Entrez IDs and a description for every TF.

Code

slice_sample(hs_tfs, n = 10) %>% 
  kableExtra::kbl(booktabs = TRUE, 
                  col.names = c("Ensembl ID", "HGNC symbol", "Entrez ID", "Description", "Biotype")) %>% 
  kableExtra::kable_classic(full_width = FALSE, "hover")

Ensembl ID	HGNC symbol	Entrez ID	Description	Biotype
ENSG00000173041	ZNF680	340252	zinc finger protein 680	protein_coding
ENSG00000149922	TBX6	6911	T-box transcription factor 6	protein_coding
ENSG00000085274	MYNN	55892	myoneurin	protein_coding
ENSG00000169740	ZNF32	7580	zinc finger protein 32	protein_coding
ENSG00000214534	ZNF705EP	NA	zinc finger protein 705E, pseudogene	transcribed_unprocessed_pseudogene
ENSG00000129071	MBD4	8930	methyl-CpG binding domain 4, DNA glycosylase	protein_coding
ENSG00000114853	ZBTB47	92999	zinc finger and BTB domain containing 47	protein_coding
ENSG00000111206	FOXM1	2305	forkhead box M1	protein_coding
ENSG00000184436	THAP7	80764	THAP domain containing 7	protein_coding
ENSG00000196172	ZNF681	148213	zinc finger protein 681	protein_coding

Table 3: Human transcription factors.

We identify a subset of all the TFs that are dynamic across our trajectory. Since TFs can regulate the expression of other TFs as they do non-TF target genes, we will include TFs in the set of “target genes” that we model.

Code

dynamic_tfs <- hs_tfs$hgnc_symbol[hs_tfs$hgnc_symbol %in% dyn_genes]

The GRN has the following structure - for every target gene, a penalized model is fit with the dynamics of every possible TF as features (see below). We use the Gamma distribution with log link as our regression objective as the dynamics are continuous and strictly non-negative. The model’s internal regularization identifies a subset of TFs whose dynamics are predictive of the target gene’s dynamics. See e.g., the original XGBoost paper for details.

\[ \text{Dynamics}_{\text{Target}} = \begin{bmatrix} \text{Dynamics}_{\text{TF}_1} & \text{Dynamics}_{\text{TF}_2} & \dots & \text{Dynamics}_{\text{TF}_N} \end{bmatrix} \tag{1}\]

Our GRNs will be lineage-specific, since the matrix of gene dynamics differs by lineage. First we fit the GRN for lineage A; I’ve opted to use the LightGBM package to fit the models as it’s lightweight and in general much faster than other options such as XGBoost. In addition, I prefer this method to alternative penalized methods such as LASSO - boosted trees better capture nonlinearities in the data, though LASSO does offer advantages such as significance testing for coefficients and linear interpretability. Both methods have built-in cross-validation (CV) routines (lightgbm::lgb.cv() and glmnet::cv.glmnet(), respectively); here we use 5-fold CV to select the best set of hyperparameters before training the final model. Lastly, instead of the default booster we utilize a DART booster, which purports to improve on the overspecialization problem suffered by the default booster - see the DART preprint for details.

Note

Since we’re using the foreach and doSNOW packages to loop over the set of target genes in parallel, we need to ensure that our LightGBM model only uses a single thread when fitting the model. This is done by setting the parameters tree_learner = "serial" and num_threads = 1L in the model parameters list.

Code

cl <- parallel::makeCluster(6L)
doSNOW::registerDoSNOW(cl)
lgbm_params <- list(objective = "gamma", 
                    tree_learner = "serial", 
                    metric = "l2", 
                    boosting = "dart", 
                    num_threads = 1L, 
                    device_type = "cpu",
                    seed = 312)
grn_res_linA <- foreach(g = seq(dyn_genes), 
                        .combine = "list",
                        .multicombine = ifelse(length(dyn_genes) > 1, TRUE, FALSE),
                        .maxcombine = ifelse(length(dyn_genes) > 1, length(dyn_genes), 2),
                        .packages = c("lightgbm", "dplyr"),
                        .errorhandling = "pass",
                        .inorder = TRUE,
                        .verbose = FALSE) %dopar% {
                          feature_mat <- smoothed_counts$Lineage_A[, dynamic_tfs]
                          feature_mat <- feature_mat[, colnames(feature_mat) != dyn_genes[g]]  # remove TF as a predictor of itself
                          resp_var <- smoothed_counts$Lineage_A[, dyn_genes[g]]
                          lgbm_data <- lightgbm::lgb.Dataset(data = feature_mat, label = resp_var)
                          lgbm_cv <- lightgbm::lgb.cv(params = lgbm_params, 
                                                      data = lgbm_data, 
                                                      nrounds = 100L, 
                                                      nfold = 5L, 
                                                      stratified = FALSE)
                          lgbm_model <- lightgbm::lgb.train(params = lgbm_params, 
                                                            data = lgbm_data, 
                                                            nrounds = lgbm_cv$best_iter)
                          imp_table <- as.data.frame(lightgbm::lgb.importance(lgbm_model)) %>% 
                                       dplyr::mutate(Lineage = "A", 
                                                     Target_Gene = dyn_genes[g], 
                                                     Target_Gene_Type = if_else(Target_Gene %in% dynamic_tfs, "TF", "Target Gene"), 
                                                     .before = 1)
                          imp_table
                        }
names(grn_res_linA) <- dyn_genes
grn_table_linA <- purrr::imap(grn_res_linA, \(x, y) {
  if (!inherits(x, "data.frame")) {
    empty_res <- data.frame(Lineage = "A", 
                            Target_Gene = y, 
                            Target_Gene_Type = NA_character_, 
                            Feature = NA_character_, 
                            Gain = NA_real_, 
                            Cover = NA_real_, 
                            Frequency = NA_real_)
    return(empty_res)
  } else {
    return(x)
  }
})

We coerce the model output into a tidy table. The two importance metrics we’re interested in are gain (the improvement in L2 loss of a certain TF) and frequency (the proportion of times a TF was included in the model). For more information see this StackExchange post or the XGBoost docs. As such, we calculate the rank of each target gene for gain and frequency, then mean-aggregate the two ranks in order to produce an average rank of how relatively strong the relationship between each TF & target gene is.

Code

grn_table_linA <- purrr::reduce(grn_table_linA, rbind) %>% 
                  select(-Cover) %>% 
                  rename(Tx_Factor = Feature) %>% 
                  filter(!is.na(Target_Gene), 
                         !is.na(Tx_Factor)) %>% 
                  arrange(Tx_Factor, desc(Frequency)) %>% 
                  with_groups(Tx_Factor, 
                              mutate, 
                              Frequency_Rank = row_number()) %>% 
                  arrange(Tx_Factor, desc(Gain)) %>% 
                  with_groups(Tx_Factor, 
                              mutate, 
                              Gain_Rank = row_number()) %>% 
                  rowwise() %>% 
                  mutate(Mean_Rank = mean(c_across(c(Frequency_Rank, Gain_Rank)))) %>% 
                  ungroup() %>% 
                  arrange(Tx_Factor, Mean_Rank) %>% 
                  mutate(Mean_Rank_Weight = 1 / Mean_Rank)

Next we add the Spearman correlation and mutual information (MI) between TF and target gene dynamics, as well as the normalized gene expression correlation and MI. We denote the relationship between TF and target gene as repression if the correlation between their dynamics is negative, and activation if the correlation is positive.

Warning

Usually the correlations between TF and target dynamics and TF and target expression are similar, or at least have the same directionality (positive or negative). However, when expression patterns are noisy, the correlation directions might differ (see Table 4 for examples). In this case, we’re opting to trust the dynamics more, as they are a smoothed version of expression - though an argument could be made for the opposite stance as well.

Code

dyn_cormat <- cor(smoothed_counts$Lineage_A, method = "spearman")
dyn_mimat <- minet::build.mim(smoothed_counts$Lineage_A, 
                              estimator = "mi.empirical", 
                              disc = "equalwidth")
exp_cormat <- cor(as.matrix(Matrix::t(seu_cortex@assays$RNA$data[dyn_genes, which(!is.na(sling_pt$Lineage1))])), method = "spearman")
exp_mimat <- minet::build.mim(as.matrix(Matrix::t(seu_cortex@assays$RNA$data[dyn_genes, which(!is.na(sling_pt$Lineage1))])),  
                              estimator = "mi.empirical", 
                              disc = "equalwidth")
grn_table_linA <- mutate(grn_table_linA, 
                         Dyn_Cor = NA_real_, 
                         Exp_Cor = NA_real_, 
                         Dyn_MI = NA_real_, 
                         Exp_MI = NA_real_)
for (i in seq(nrow(grn_table_linA))) {
  grn_table_linA$Dyn_Cor[i] <- dyn_cormat[grn_table_linA$Target_Gene[i], grn_table_linA$Tx_Factor[i]]
  grn_table_linA$Dyn_MI[i] <- dyn_mimat[grn_table_linA$Target_Gene[i], grn_table_linA$Tx_Factor[i]]
  grn_table_linA$Exp_Cor[i] <- exp_cormat[grn_table_linA$Target_Gene[i], grn_table_linA$Tx_Factor[i]]
  grn_table_linA$Exp_MI[i] <- exp_mimat[grn_table_linA$Target_Gene[i], grn_table_linA$Tx_Factor[i]]
}
grn_table_linA <- mutate(grn_table_linA, State = if_else(Dyn_Cor < 0, "Repression", "Activation"))

The output is shown below. Interestingly, the mutual information seems to be generally higher for the dynamics than for expression - this indicates that our usage of the dynamics as a de-noising technique for the GRN might be working well.

Code

select(grn_table_linA, 
       Lineage, 
       Tx_Factor, 
       Target_Gene, 
       Gain, 
       Frequency, 
       Mean_Rank, 
       Dyn_Cor, 
       Exp_Cor, 
       Dyn_MI, 
       Exp_MI,
       State) %>% 
  slice_sample(n = 10) %>% 
  kableExtra::kbl(digits = 3, 
                  booktabs = TRUE, 
                  col.names = c("Lineage", "TF", "Target", "Gain", "Frequency", "Mean rank", 
                                "Dynamics cor.", "Expression cor.", "Dynamics MI", "Expression MI", "State")) %>% 
  kableExtra::kable_classic(full_width = FALSE, "hover")

Lineage	TF	Target	Gain	Frequency	Mean rank	Dynamics cor.	Expression cor.	Dynamics MI	Expression MI	State
A	SIX4	RP11.9E17.1	0.011	0.161	106.5	-0.277	-0.011	0.649	0.000	Repression
A	CREB5	CACNG3	0.000	0.028	116.0	-0.425	-0.064	0.515	0.004	Repression
A	DLX5	CDYL2	0.200	0.192	13.5	-0.962	-0.015	0.413	0.000	Repression
A	ZNF484	PRSS12	0.001	0.066	58.0	0.556	0.018	1.293	0.078	Activation
A	CBX2	MMS22L	0.011	0.021	72.0	0.816	0.013	2.088	0.033	Activation
A	EGR1	KIAA1524	0.000	0.018	192.5	-0.098	-0.049	1.409	0.060	Repression
A	SATB1	ABLIM1	0.000	0.003	144.0	-0.148	0.038	2.047	0.110	Repression
A	ZNF165	HNRNPA1L2	0.000	0.002	114.0	-0.011	-0.047	0.551	0.002	Repression
A	EGR1	CNTNAP4	0.005	0.013	156.0	-0.357	-0.082	0.681	0.021	Repression
A	ARID5B	SLC1A2	0.395	0.025	49.5	0.654	0.069	1.725	0.073	Activation

Table 4: GRN for dynamic genes across Lineage A.

We repeat the fitting process for the other two lineages (expand this code block to see the details). Skip to Section 7 if you want a version of this logic that’s been wrapped into a function!

Code

grn_res_linB <- foreach(g = seq(dyn_genes), 
                        .combine = "list",
                        .multicombine = ifelse(length(dyn_genes) > 1, TRUE, FALSE),
                        .maxcombine = ifelse(length(dyn_genes) > 1, length(dyn_genes), 2),
                        .packages = c("lightgbm", "dplyr"),
                        .errorhandling = "pass",
                        .inorder = TRUE,
                        .verbose = FALSE) %dopar% {
                          feature_mat <- smoothed_counts$Lineage_B[, dynamic_tfs]
                          feature_mat <- feature_mat[, colnames(feature_mat) != dyn_genes[g]]  # remove TF as a predictor of itself
                          resp_var <- smoothed_counts$Lineage_B[, dyn_genes[g]]
                          lgbm_data <- lightgbm::lgb.Dataset(data = feature_mat, label = resp_var)
                          lgbm_cv <- lightgbm::lgb.cv(params = lgbm_params, 
                                                      data = lgbm_data, 
                                                      nrounds = 100L, 
                                                      nfold = 5L, 
                                                      stratified = FALSE)
                          lgbm_model <- lightgbm::lgb.train(params = lgbm_params, 
                                                            data = lgbm_data, 
                                                            nrounds = lgbm_cv$best_iter)
                          imp_table <- as.data.frame(lightgbm::lgb.importance(lgbm_model)) %>% 
                                       dplyr::mutate(Lineage = "B", 
                                                     Target_Gene = dyn_genes[g], 
                                                     Target_Gene_Type = if_else(Target_Gene %in% dynamic_tfs, "TF", "Target Gene"),
                                                     .before = 1)
                          imp_table
                        }
names(grn_res_linB) <- dyn_genes
grn_table_linB <- purrr::imap(grn_res_linB, \(x, y) {
  if (!inherits(x, "data.frame")) {
    empty_res <- data.frame(Lineage = "B", 
                            Target_Gene = y, 
                            Target_Gene_Type = NA_character_, 
                            Feature = NA_character_, 
                            Gain = NA_real_, 
                            Cover = NA_real_, 
                            Frequency = NA_real_)
    return(empty_res)
  } else {
    return(x)
  }
})
grn_table_linB <- purrr::reduce(grn_table_linB, rbind) %>% 
                  select(-Cover) %>% 
                  rename(Tx_Factor = Feature) %>% 
                  filter(!is.na(Target_Gene), 
                         !is.na(Tx_Factor)) %>% 
                  arrange(Tx_Factor, desc(Frequency)) %>% 
                  with_groups(Tx_Factor, 
                              mutate, 
                              Frequency_Rank = row_number()) %>% 
                  arrange(Tx_Factor, desc(Gain)) %>% 
                  with_groups(Tx_Factor, 
                              mutate, 
                              Gain_Rank = row_number()) %>% 
                  rowwise() %>% 
                  mutate(Mean_Rank = mean(c_across(c(Frequency_Rank, Gain_Rank)))) %>% 
                  ungroup() %>% 
                  arrange(Tx_Factor, Mean_Rank) %>% 
                  mutate(Mean_Rank_Weight = 1 / Mean_Rank)
dyn_cormat <- cor(smoothed_counts$Lineage_B, method = "spearman")
dyn_mimat <- minet::build.mim(smoothed_counts$Lineage_B, 
                              estimator = "mi.empirical", 
                              disc = "equalwidth")
exp_cormat <- cor(as.matrix(Matrix::t(seu_cortex@assays$RNA$data[dyn_genes, which(!is.na(sling_pt$Lineage2))])), method = "spearman")
exp_mimat <- minet::build.mim(as.matrix(Matrix::t(seu_cortex@assays$RNA$data[dyn_genes, which(!is.na(sling_pt$Lineage2))])),  
                              estimator = "mi.empirical", 
                              disc = "equalwidth")
grn_table_linB <- mutate(grn_table_linB, 
                         Dyn_Cor = NA_real_, 
                         Exp_Cor = NA_real_, 
                         Dyn_MI = NA_real_, 
                         Exp_MI = NA_real_)
for (i in seq(nrow(grn_table_linB))) {
  grn_table_linB$Dyn_Cor[i] <- dyn_cormat[grn_table_linB$Target_Gene[i], grn_table_linB$Tx_Factor[i]]
  grn_table_linB$Dyn_MI[i] <- dyn_mimat[grn_table_linB$Target_Gene[i], grn_table_linB$Tx_Factor[i]]
  grn_table_linB$Exp_Cor[i] <- exp_cormat[grn_table_linB$Target_Gene[i], grn_table_linB$Tx_Factor[i]]
  grn_table_linB$Exp_MI[i] <- exp_mimat[grn_table_linB$Target_Gene[i], grn_table_linB$Tx_Factor[i]]
}
grn_table_linB <- mutate(grn_table_linB, State = if_else(Dyn_Cor < 0, "Repression", "Activation"))
grn_res_linC <- foreach(g = seq(dyn_genes), 
                        .combine = "list",
                        .multicombine = ifelse(length(dyn_genes) > 1, TRUE, FALSE),
                        .maxcombine = ifelse(length(dyn_genes) > 1, length(dyn_genes), 2),
                        .packages = c("lightgbm", "dplyr"),
                        .errorhandling = "pass",
                        .inorder = TRUE,
                        .verbose = FALSE) %dopar% {
                          feature_mat <- smoothed_counts$Lineage_C[, dynamic_tfs]
                          feature_mat <- feature_mat[, colnames(feature_mat) != dyn_genes[g]]  # remove TF as a predictor of itself
                          resp_var <- smoothed_counts$Lineage_C[, dyn_genes[g]]
                          lgbm_data <- lightgbm::lgb.Dataset(data = feature_mat, label = resp_var)
                          lgbm_cv <- lightgbm::lgb.cv(params = lgbm_params, 
                                                      data = lgbm_data, 
                                                      nrounds = 100L, 
                                                      nfold = 5L, 
                                                      stratified = FALSE)
                          lgbm_model <- lightgbm::lgb.train(params = lgbm_params, 
                                                            data = lgbm_data, 
                                                            nrounds = lgbm_cv$best_iter)
                          imp_table <- as.data.frame(lightgbm::lgb.importance(lgbm_model)) %>% 
                                       dplyr::mutate(Lineage = "C", 
                                                     Target_Gene = dyn_genes[g], 
                                                     Target_Gene_Type = if_else(Target_Gene %in% dynamic_tfs, "TF", "Target Gene"),
                                                     .before = 1)
                          imp_table
                        }
names(grn_res_linC) <- dyn_genes
grn_table_linC <- purrr::imap(grn_res_linC, \(x, y) {
  if (!inherits(x, "data.frame")) {
    empty_res <- data.frame(Lineage = "C", 
                            Target_Gene = y, 
                            Target_Gene_Type = NA_character_, 
                            Feature = NA_character_, 
                            Gain = NA_real_, 
                            Cover = NA_real_, 
                            Frequency = NA_real_)
    return(empty_res)
  } else {
    return(x)
  }
})
grn_table_linC <- purrr::reduce(grn_table_linC, rbind) %>% 
                  select(-Cover) %>% 
                  rename(Tx_Factor = Feature) %>% 
                  filter(!is.na(Target_Gene), 
                         !is.na(Tx_Factor)) %>% 
                  arrange(Tx_Factor, desc(Frequency)) %>% 
                  with_groups(Tx_Factor, 
                              mutate, 
                              Frequency_Rank = row_number()) %>% 
                  arrange(Tx_Factor, desc(Gain)) %>% 
                  with_groups(Tx_Factor, 
                              mutate, 
                              Gain_Rank = row_number()) %>% 
                  rowwise() %>% 
                  mutate(Mean_Rank = mean(c_across(c(Frequency_Rank, Gain_Rank)))) %>% 
                  ungroup() %>% 
                  arrange(Tx_Factor, Mean_Rank) %>% 
                  mutate(Mean_Rank_Weight = 1 / Mean_Rank)
dyn_cormat <- cor(smoothed_counts$Lineage_C, method = "spearman")
dyn_mimat <- minet::build.mim(smoothed_counts$Lineage_C, 
                              estimator = "mi.empirical", 
                              disc = "equalwidth")
exp_cormat <- cor(as.matrix(Matrix::t(seu_cortex@assays$RNA$data[dyn_genes, which(!is.na(sling_pt$Lineage3))])), method = "spearman")
exp_mimat <- minet::build.mim(as.matrix(Matrix::t(seu_cortex@assays$RNA$data[dyn_genes, which(!is.na(sling_pt$Lineage3))])),  
                              estimator = "mi.empirical", 
                              disc = "equalwidth")
grn_table_linC <- mutate(grn_table_linC, 
                         Dyn_Cor = NA_real_, 
                         Exp_Cor = NA_real_, 
                         Dyn_MI = NA_real_, 
                         Exp_MI = NA_real_)
for (i in seq(nrow(grn_table_linC))) {
  grn_table_linC$Dyn_Cor[i] <- dyn_cormat[grn_table_linC$Target_Gene[i], grn_table_linC$Tx_Factor[i]]
  grn_table_linC$Dyn_MI[i] <- dyn_mimat[grn_table_linC$Target_Gene[i], grn_table_linC$Tx_Factor[i]]
  grn_table_linC$Exp_Cor[i] <- exp_cormat[grn_table_linC$Target_Gene[i], grn_table_linC$Tx_Factor[i]]
  grn_table_linC$Exp_MI[i] <- exp_mimat[grn_table_linC$Target_Gene[i], grn_table_linC$Tx_Factor[i]]
}
grn_table_linC <- mutate(grn_table_linC, State = if_else(Dyn_Cor < 0, "Repression", "Activation"))
parallel::stopCluster(cl)

We collate all three GRNs into a single overall table:

Code

grn_table_all <- bind_rows(grn_table_linA, 
                           grn_table_linB, 
                           grn_table_linC) %>% 
                 arrange(Lineage, 
                         Tx_Factor, 
                         Mean_Rank)

For each lineage we create a directed, weighted graph of the relationships between TFs and target genes using the igraph package. In the absence of a better, more sophisticated heuristic, we prune the graphs by selecting just the top 5 TFs per target gene. We use the inverse of the mean feature importance rank as edge weights.

Code

graph_df_linA <- select(grn_table_linA, 
                        Tx_Factor, 
                        Target_Gene,
                        Target_Gene_Type, 
                        Mean_Rank, 
                        Mean_Rank_Weight, 
                        Frequency, 
                        Gain, 
                        Dyn_MI, 
                        State) %>% 
                 arrange(Target_Gene, Mean_Rank) %>% 
                 with_groups(Target_Gene, 
                             slice_head, 
                             n = 5)
graph_obj_linA <- graph_from_data_frame(graph_df_linA, directed = TRUE)
E(graph_obj_linA)$weight <- graph_df_linA$Mean_Rank_Weight
graph_df_linB <- select(grn_table_linB, 
                        Tx_Factor, 
                        Target_Gene,
                        Target_Gene_Type, 
                        Mean_Rank, 
                        Mean_Rank_Weight,
                        Frequency, 
                        Gain, 
                        Dyn_MI, 
                        State) %>% 
                 arrange(Target_Gene, Mean_Rank) %>% 
                 with_groups(Target_Gene, 
                             slice_head, 
                             n = 5)
graph_obj_linB <- graph_from_data_frame(graph_df_linB, directed = TRUE)
E(graph_obj_linB)$weight <- graph_df_linB$Mean_Rank_Weight
graph_df_linC <- select(grn_table_linC, 
                        Tx_Factor, 
                        Target_Gene,
                        Target_Gene_Type, 
                        Mean_Rank, 
                        Mean_Rank_Weight,
                        Frequency, 
                        Gain, 
                        Dyn_MI, 
                        State) %>% 
                 arrange(Target_Gene, Mean_Rank) %>% 
                 with_groups(Target_Gene, 
                             slice_head, 
                             n = 5)
graph_obj_linC <- graph_from_data_frame(graph_df_linC, directed = TRUE)
E(graph_obj_linC)$weight <- graph_df_linC$Mean_Rank_Weight

One target gene of particular interest is LIM domain only 3 (LMO3). This gene is predominantly expressed in the brain, is involved in neurogenesis (source), and potentially acts as an oncogene during neuroblastoma (source) - so a comparison of its behavior across lineages should be pretty neat. We use a Fruchterman-Reingold layout to embed the first-order neighborhood graphs in 2D space via the layout_with_fr() function from igraph. We see that not only do the top target genes differ widely by lineage, so too do the directions of the relationships (activation vs. repression), and the strengths of those relationships. For example, we can see that in lineages A & C that LMO3 is repressed by EOMES and activated by MEF2C - though the MEF2C activation is stronger in lineage C. Conversely, in lineage B LMO3 is activated by both NFIA and NFIB, though neither TF appears to act on LMO3 in either of the other two lineages.

Code

set.seed(12)
ranks <- purrr::map(list(graph_df_linA, graph_df_linB, graph_df_linC), \(x) {
  filter(x, Target_Gene == "LMO3") %>% 
  pull(Mean_Rank_Weight)
})
rank_min <- min(purrr::reduce(ranks, c))
rank_max <- max(purrr::reduce(ranks, c))
LMO3_node <- which(names(V(graph_obj_linA)) == "LMO3")
p11 <- fortify(make_ego_graph(graph_obj_linA, nodes = LMO3_node, order = 1)[[1]], 
               layout = layout_with_fr(graph = make_ego_graph(graph_obj_linA, nodes = LMO3_node, order = 1)[[1]], 
                                       grid = "nogrid", 
                                       weights = E(make_ego_graph(graph_obj_linA, nodes = LMO3_node, order = 1)[[1]])$weights),
               arrow.gap = .05) %>% 
       mutate(Gene_Type = if_else(name %in% hs_tfs$hgnc_symbol, "TF", "Target Gene")) %>% 
       ggplot(aes(x = x, y = y, xend = xend, yend = yend)) + 
       facet_wrap(~"Lineage A") + 
       geom_edges(aes(linewidth = Mean_Rank_Weight, alpha = Mean_Rank_Weight, color = State), 
                  arrow = arrow(length = unit(8, "pt"), type = "closed")) + 
       geom_nodelabel(aes(label = name, fill = Gene_Type), 
                      size = 5) + 
       scale_color_manual(values = c("forestgreen", "firebrick")) + 
       scale_fill_manual(values = c("darkorange2", "dodgerblue3")) + 
       scale_linewidth_continuous(range = c(0.5, 2), limits = c(rank_min, rank_max)) + 
       scale_alpha_continuous(range = c(0.75, 1), limits = c(rank_min, rank_max)) + 
       labs(x = "FR 1", 
            y = "FR 2", 
            color = "TF State", 
            fill = "Gene Type", 
            linewidth = "Rank Weight", 
            alpha = "Rank Weight") + 
       theme_scLANE(umap = TRUE) + 
       guides(fill = guide_legend(override.aes = list(label = ""), order = 1), 
              color = guide_legend(order = 2))
LMO3_node <- which(names(V(graph_obj_linB)) == "LMO3")
p12 <- fortify(make_ego_graph(graph_obj_linB, nodes = LMO3_node, order = 1)[[1]], 
               layout = layout_with_fr(graph = make_ego_graph(graph_obj_linB, nodes = LMO3_node, order = 1)[[1]], 
                                       grid = "nogrid", 
                                       weights = E(make_ego_graph(graph_obj_linB, nodes = LMO3_node, order = 1)[[1]])$weights),
               arrow.gap = .05) %>% 
       mutate(Gene_Type = if_else(name %in% hs_tfs$hgnc_symbol, "TF", "Target Gene")) %>% 
       ggplot(aes(x = x, y = y, xend = xend, yend = yend)) + 
       facet_wrap(~"Lineage B") + 
       geom_edges(aes(linewidth = Mean_Rank_Weight, alpha = Mean_Rank_Weight, color = State), 
                  arrow = arrow(length = unit(8, "pt"), type = "closed")) + 
       geom_nodelabel(aes(label = name, fill = Gene_Type), 
                      size = 5) + 
       scale_color_manual(values = c("forestgreen", "firebrick")) + 
       scale_fill_manual(values = c("darkorange2", "dodgerblue3")) + 
       scale_linewidth_continuous(range = c(0.5, 2), limits = c(rank_min, rank_max)) + 
       scale_alpha_continuous(range = c(0.75, 1), limits = c(rank_min, rank_max)) + 
       labs(x = "FR 1", 
            y = "FR 2", 
            color = "TF State", 
            fill = "Gene Type", 
            linewidth = "Rank Weight", 
            alpha = "Rank Weight") + 
       theme_scLANE(umap = TRUE) + 
       guides(fill = guide_legend(override.aes = list(label = ""), order = 1), 
              color = guide_legend(order = 2))
LMO3_node <- which(names(V(graph_obj_linC)) == "LMO3")
p13 <- fortify(make_ego_graph(graph_obj_linC, nodes = LMO3_node, order = 1)[[1]], 
               layout = layout_with_fr(graph = make_ego_graph(graph_obj_linC, nodes = LMO3_node, order = 1)[[1]], 
                                       grid = "nogrid", 
                                       weights = E(make_ego_graph(graph_obj_linC, nodes = LMO3_node, order = 1)[[1]])$weights),
               arrow.gap = .05) %>% 
       mutate(Gene_Type = if_else(name %in% hs_tfs$hgnc_symbol, "TF", "Target Gene")) %>% 
       ggplot(aes(x = x, y = y, xend = xend, yend = yend)) + 
       facet_wrap(~"Lineage C") + 
       geom_edges(aes(linewidth = Mean_Rank_Weight, alpha = Mean_Rank_Weight, color = State), 
                  arrow = arrow(length = unit(8, "pt"), type = "closed")) + 
       geom_nodelabel(aes(label = name, fill = Gene_Type), 
                      size = 5) + 
       scale_color_manual(values = c("forestgreen", "firebrick")) + 
       scale_fill_manual(values = c("darkorange2", "dodgerblue3")) + 
       scale_linewidth_continuous(range = c(0.5, 2), limits = c(rank_min, rank_max)) + 
       scale_alpha_continuous(range = c(0.75, 1), limits = c(rank_min, rank_max)) + 
       labs(x = "FR 1", 
            y = "FR 2", 
            color = "TF State", 
            fill = "Gene Type", 
            linewidth = "Rank Weight", 
            alpha = "Rank Weight") + 
       theme_scLANE(umap = TRUE) + 
       guides(fill = guide_legend(override.aes = list(label = ""), order = 1), 
              color = guide_legend(order = 2))
p14 <- (p11 / p12 / p13) + 
       plot_layout(guides = "collect", axes = "collect")
p14

Figure 7: Fruchterman-Reingold embeddings of the first-order relationships between *LMO3* and lineage-specific TFs. Lines are weighted & shaded by the weighted ranking score between TF & target gene dynamics that we computed earlier.

The make_ego_graph() function allows us to pull out the first-order interactions for a given transcription factor; here we display those (admittedly complex) relationships for eomesodermin (EOMES). This highlights the multi-faceted roles of TFs during development, and the complexity of GRNs in general.

Code

EOMES_node <- which(names(V(graph_obj_linA)) == "EOMES")
set.seed(5)
p15 <- fortify(make_ego_graph(graph_obj_linA, nodes = EOMES_node, order = 1)[[1]], 
               layout = layout_with_fr(graph = make_ego_graph(graph_obj_linA, nodes = EOMES_node, order = 1)[[1]], 
                                       grid = "nogrid", 
                                       weights = E(make_ego_graph(graph_obj_linA, nodes = EOMES_node, order = 1)[[1]])$weights^3),
               arrow.gap = .05) %>% 
       mutate(Gene_Type = if_else(name %in% hs_tfs$hgnc_symbol, "TF", "Target Gene")) %>% 
       ggplot(aes(x = x, y = y, xend = xend, yend = yend)) + 
       facet_wrap(~"Lineage A") + 
       geom_edges(aes(linewidth = Mean_Rank_Weight, alpha = Mean_Rank_Weight, color = State), 
                 arrow = arrow(length = unit(8, "pt"), type = "closed")) + 
       geom_nodelabel(aes(label = name, fill = Gene_Type), 
                     size = 5) + 
       scale_color_manual(values = c("forestgreen", "firebrick")) + 
       scale_fill_manual(values = c("darkorange2", "dodgerblue3")) + 
       scale_linewidth_continuous(range = c(0.5, 2)) + 
       scale_alpha_continuous(range = c(0.75, 1)) + 
       labs(x = "FR 1", 
           y = "FR 2", 
           color = "TF State", 
           fill = "Gene Type", 
           linewidth = "Rank Weight", 
           alpha = "Rank Weight") + 
       theme_scLANE(umap = TRUE) + 
       guides(fill = guide_legend(override.aes = list(label = ""), order = 1), 
             color = guide_legend(order = 2))
p15

Figure 8: Fruchterman-Reingold embedding of the first-order relationships for *EOMES*. Lines are weighted and shaded as in Figure 7.

7 Conclusions

Hopefully this analysis has demonstrated the utility of trajectory GRNs for comparing gene dynamics across lineages. Beyond just visual comparison of how gene expression changes over pseudotime, trajectory GRNs allow us to identify which transcription factors are activating or suppressing target genes, how top TF targets differ between lineages, and the ways in which TF-target relationships change based on the cell fate towards which cells are progressing. To conclude, I’ve written up the methodology for creating a trajectory GRN into a function, which is shown in the code block below. It includes functionality for a progress bar if desired, and takes as input matrices of gene dynamics (dyn.mat) and normalized expression (expr.mat), a set of dynamic genes (dyn.genes), and a set of transcription factors (tx.factors). Eventually I plan to expand this functionality and turn it into an R package, but for now it’ll stay a simple function.

Code

estimateTGRN <- function(dyn.mat = NULL,
                         expr.mat = NULL, 
                         dyn.genes = NULL, 
                         tx.factors = NULL, 
                         cor.method = "spearman",
                         n.cores = 4L, 
                         verbose = TRUE, 
                         random.seed = 312) {
  # check inputs 
  if (is.null(expr.mat) || is.null(dyn.mat) || is.null(dyn.genes) || is.null(tx.factors)) { stop("Arguments to estimateTGRN() are missing.") }
  # ID transcription factors 
  tx.factors <- tx.factors[tx.factors %in% dyn.genes]
  # set up progress bar if desired
  if (verbose) {
    withr::with_output_sink(tempfile(), {
      pb <- utils::txtProgressBar(0, length(target_genes), style = 3)
    })
    progress_fun <- function(n) utils::setTxtProgressBar(pb, n)
    snow_opts <- list(progress = progress_fun)
  } else {
    snow_opts <- list()
  }
  # set up parallel processing
  if (n.cores > 1L) {
    cl <- parallel::makeCluster(n.cores)
    doSNOW::registerDoSNOW(cl)
  } else {
    cl <- foreach::registerDoSEQ()
  }
  # set up LightGBM model settings
  lgbm_params <- list(objective = "gamma", 
                      tree_learner = "serial", 
                      metric = "l2", 
                      boosting_type = "dart",
                      device = "cpu",
                      num_threads = 1L, 
                      seed = random.seed)
  # estimate trajectory GRN
  grn_res <- foreach::foreach(g = seq(dyn.genes), 
                              .combine = "list",
                              .multicombine = ifelse(length(dyn.genes) > 1, TRUE, FALSE),
                              .maxcombine = ifelse(length(dyn.genes) > 1, length(dyn.genes), 2),
                              .packages = c("lightgbm", "dplyr"),
                              .errorhandling = "pass",
                              .inorder = TRUE,
                              .verbose = FALSE,
                              .options.snow = snow_opts) %dopar% {
                                feature_mat <- dyn.mat[, tx.factors]
                                feature_mat <- feature_mat[, colnames(feature_mat) != dyn.genes[g]] 
                                resp_var <- dyn.mat[, dyn.genes[g]]
                                lgbm_data <- lightgbm::lgb.Dataset(data = feature_mat, label = resp_var)
                                lgbm_cv <- lightgbm::lgb.cv(params = lgbm_params, 
                                                            data = lgbm_data, 
                                                            nrounds = 100L, 
                                                            nfold = 5L, 
                                                            stratified = FALSE)
                                lgbm_model <- lightgbm::lgb.train(params = lgbm_params, 
                                                                  data = lgbm_data, 
                                                                  nrounds = lgbm_cv$best_iter)
                                imp_table <- as.data.frame(lightgbm::lgb.importance(lgbm_model)) %>% 
                                             dplyr::mutate(Target_Gene = target_genes[g], .before = 1)
                                imp_table
                              }
  names(grn_res) <- target_genes
  if (n.cores > 1L) {
    parallel::stopCluster(cl)
  }
  # format GRN table
  grn_table <- purrr::imap(grn_res, \(x, y) {
    if (!inherits(x, "data.frame")) {
      empty_res <- data.frame(Target_Gene = y, 
                              Feature = NA_character_, 
                              Gain = NA_real_, 
                              Cover = NA_real_, 
                              Frequency = NA_real_)
      return(empty_res)
    } else {
      return(x)
    }
  })
  # add mean rank of gain & frequency
  grn_table <- purrr::reduce(grn_table, rbind) %>% 
               dplyr::select(-Cover) %>% 
               dplyr::rename(Tx_Factor = Feature) %>% 
               dplyr::filter(!is.na(Target_Gene), 
                             !is.na(Tx_Factor)) %>% 
               dplyr::arrange(Tx_Factor, desc(Frequency)) %>% 
               dplyr::with_groups(Tx_Factor, 
                                  dplyr::mutate, 
                                  Frequency_Rank = row_number()) %>% 
               dplyr::arrange(Tx_Factor, desc(Gain)) %>% 
               dplyr::with_groups(Tx_Factor, 
                                  dplyr::mutate, 
                                  Gain_Rank = row_number()) %>% 
               dplyr::rowwise() %>% 
               dplyr::mutate(Mean_Rank = mean(dplyr::c_across(c(Frequency_Rank, Gain_Rank)))) %>% 
               dplyr::ungroup() %>% 
               dplyr::arrange(Tx_Factor, Mean_Rank) %>% 
               dplyr::mutate(Mean_Rank_Weight = 1 / Mean_Rank)
  # add correlation & mutual information b/t TF and each target gene (dynamics and normalized expression)
  dyn_cormat <- stats::cor(dyn.mat[, dyn.genes], method = cor.method)
  expr_cormat <- stats::cor(expr.mat[, dyn.genes], method = cor.method)
  dyn_mimat <- minet::build.mim(dyn.mat[, dyn.genes], 
                                estimator = "mi.empirical", 
                                disc = "equalwidth")
  expr_mimat <- minet::build.mim(expr.mat[, dyn.genes],  
                                 estimator = "mi.empirical", 
                                 disc = "equalwidth")
  grn_table <- dplyr::mutate(grn_table, 
                             Dyn_Cor = NA_real_, 
                             Exp_Cor = NA_real_, 
                             Dyn_MI = NA_real_, 
                             Exp_MI = NA_real_)
  for (i in seq(nrow(grn_table))) {
    grn_table$Dyn_Cor[i] <- dyn_cormat[grn_table$Target_Gene[i], grn_table$Tx_Factor[i]]
    grn_table$Dyn_MI[i] <- dyn_mimat[grn_table$Target_Gene[i], grn_table$Tx_Factor[i]]
    grn_table$Exp_Cor[i] <- expr_cormat[grn_table$Target_Gene[i], grn_table$Tx_Factor[i]]
    grn_table$Exp_MI[i] <- expr_mimat[grn_table$Target_Gene[i], grn_table$Tx_Factor[i]]
  }
  grn_table <- dplyr::mutate(grn_table, State = dplyr::if_else(Dyn_Cor < 0, "Repression", "Activation"))
  return(grn_table)
}

8 Session info

Code

sessioninfo::session_info()

─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       macOS Sonoma 14.3
 system   x86_64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2024-02-27
 pandoc   3.1.9 @ /usr/local/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package                * version    date (UTC) lib source
 abind                    1.4-5      2016-07-21 [1] CRAN (R 4.3.0)
 AnnotationDbi            1.64.1     2023-11-03 [1] Bioconductor
 AnnotationFilter         1.26.0     2023-10-24 [1] Bioconductor
 AnnotationHub            3.10.0     2023-10-24 [1] Bioconductor
 backports                1.4.1      2021-12-13 [1] CRAN (R 4.3.0)
 beeswarm                 0.4.0      2021-06-01 [1] CRAN (R 4.3.0)
 bigassertr               0.1.6      2023-01-10 [1] CRAN (R 4.3.0)
 bigparallelr             0.3.2      2021-10-02 [1] CRAN (R 4.3.0)
 bigstatsr                1.5.12     2022-10-14 [1] CRAN (R 4.3.0)
 Biobase                * 2.62.0     2023-10-24 [1] Bioconductor
 BiocFileCache            2.10.1     2023-10-26 [1] Bioconductor
 BiocGenerics           * 0.48.1     2023-11-01 [1] Bioconductor
 BiocIO                   1.12.0     2023-10-24 [1] Bioconductor
 BiocManager              1.30.22    2023-08-08 [1] CRAN (R 4.3.0)
 BiocParallel             1.36.0     2023-10-24 [1] Bioconductor
 BiocVersion              3.18.1     2023-11-15 [1] Bioconductor
 biomaRt                * 2.58.0     2023-10-24 [1] Bioconductor
 Biostrings               2.70.1     2023-10-25 [1] Bioconductor
 bit                      4.0.5      2022-11-15 [1] CRAN (R 4.3.0)
 bit64                    4.0.5      2020-08-30 [1] CRAN (R 4.3.0)
 bitops                   1.0-7      2021-04-24 [1] CRAN (R 4.3.0)
 blob                     1.2.4      2023-03-17 [1] CRAN (R 4.3.0)
 boot                     1.3-28.1   2022-11-22 [1] CRAN (R 4.3.2)
 broom                    1.0.5      2023-06-09 [1] CRAN (R 4.3.0)
 broom.mixed              0.2.9.4    2022-04-17 [1] CRAN (R 4.3.0)
 cachem                   1.0.8      2023-05-01 [1] CRAN (R 4.3.0)
 car                      3.1-2      2023-03-30 [1] CRAN (R 4.3.0)
 carData                  3.0-5      2022-01-06 [1] CRAN (R 4.3.0)
 cli                      3.6.2      2023-12-11 [1] CRAN (R 4.3.0)
 cluster                  2.1.6      2023-12-01 [1] CRAN (R 4.3.0)
 coda                     0.19-4     2020-09-30 [1] CRAN (R 4.3.0)
 codetools                0.2-19     2023-02-01 [1] CRAN (R 4.3.2)
 colorspace               2.1-0      2023-01-23 [1] CRAN (R 4.3.0)
 cowplot                  1.1.2      2023-12-15 [1] CRAN (R 4.3.0)
 crayon                   1.5.2      2022-09-29 [1] CRAN (R 4.3.0)
 curl                     5.2.0      2023-12-08 [1] CRAN (R 4.3.0)
 data.table               1.14.10    2023-12-08 [1] CRAN (R 4.3.0)
 DBI                      1.2.0      2023-12-21 [1] CRAN (R 4.3.0)
 dbplyr                   2.4.0      2023-10-26 [1] CRAN (R 4.3.0)
 DelayedArray             0.28.0     2023-10-24 [1] Bioconductor
 DelayedMatrixStats       1.24.0     2023-10-24 [1] Bioconductor
 deldir                   2.0-2      2023-11-23 [1] CRAN (R 4.3.0)
 digest                   0.6.33     2023-07-07 [1] CRAN (R 4.3.0)
 doParallel               1.0.17     2022-02-07 [1] CRAN (R 4.3.0)
 doSNOW                   1.0.20     2022-02-04 [1] CRAN (R 4.3.0)
 dotCall64                1.1-1      2023-11-28 [1] CRAN (R 4.3.0)
 dplyr                  * 1.1.4      2023-11-17 [1] CRAN (R 4.3.0)
 ellipsis                 0.3.2      2021-04-29 [1] CRAN (R 4.3.0)
 emmeans                  1.10.0     2024-01-23 [1] CRAN (R 4.3.2)
 ensembldb                2.26.0     2023-10-24 [1] Bioconductor
 estimability             1.4.1      2022-08-05 [1] CRAN (R 4.3.0)
 evaluate                 0.23       2023-11-01 [1] CRAN (R 4.3.0)
 ExperimentHub            2.10.0     2023-10-24 [1] Bioconductor
 fansi                    1.0.6      2023-12-08 [1] CRAN (R 4.3.0)
 farver                   2.1.1      2022-07-06 [1] CRAN (R 4.3.0)
 fastDummies              1.7.3      2023-07-06 [1] CRAN (R 4.3.0)
 fastmap                  1.1.1      2023-02-24 [1] CRAN (R 4.3.0)
 ff                       4.0.9      2023-01-25 [1] CRAN (R 4.3.0)
 filelock                 1.0.3      2023-12-11 [1] CRAN (R 4.3.0)
 fitdistrplus             1.1-11     2023-04-25 [1] CRAN (R 4.3.0)
 flock                    0.7        2016-11-12 [1] CRAN (R 4.3.0)
 forcats                  1.0.0      2023-01-29 [1] CRAN (R 4.3.0)
 foreach                * 1.5.2      2022-02-02 [1] CRAN (R 4.3.0)
 furrr                    0.3.1      2022-08-15 [1] CRAN (R 4.3.0)
 future                   1.33.1     2023-12-22 [1] CRAN (R 4.3.0)
 future.apply             1.11.1     2023-12-21 [1] CRAN (R 4.3.0)
 gamlss                   5.4-20     2023-10-04 [1] CRAN (R 4.3.0)
 gamlss.data              6.0-2      2021-11-07 [1] CRAN (R 4.3.0)
 gamlss.dist              6.1-1      2023-08-23 [1] CRAN (R 4.3.0)
 geeM                     0.10.1     2018-06-18 [1] CRAN (R 4.3.0)
 generics                 0.1.3      2022-07-05 [1] CRAN (R 4.3.0)
 GenomeInfoDb           * 1.38.5     2023-12-28 [1] Bioconductor 3.18 (R 4.3.2)
 GenomeInfoDbData         1.2.11     2023-12-22 [1] Bioconductor
 GenomicAlignments        1.38.0     2023-10-24 [1] Bioconductor
 GenomicFeatures          1.54.1     2023-10-29 [1] Bioconductor
 GenomicRanges          * 1.54.1     2023-10-29 [1] Bioconductor
 ggbeeswarm               0.7.2      2023-04-29 [1] CRAN (R 4.3.0)
 ggnetwork              * 0.5.12     2024-02-12 [1] Github (briatte/ggnetwork@f3b8b84)
 ggplot2                * 3.4.4      2023-10-12 [1] CRAN (R 4.3.0)
 ggpubr                   0.6.0      2023-02-10 [1] CRAN (R 4.3.0)
 ggrepel                  0.9.5      2024-01-10 [1] CRAN (R 4.3.0)
 ggridges                 0.5.5      2023-12-15 [1] CRAN (R 4.3.0)
 ggsignif                 0.6.4      2022-10-13 [1] CRAN (R 4.3.0)
 glm2                   * 1.2.1      2018-08-11 [1] CRAN (R 4.3.0)
 glmmTMB                  1.1.8      2023-10-07 [1] CRAN (R 4.3.0)
 globals                  0.16.2     2022-11-21 [1] CRAN (R 4.3.0)
 glue                     1.6.2      2022-02-24 [1] CRAN (R 4.3.0)
 goftest                  1.2-3      2021-10-07 [1] CRAN (R 4.3.0)
 gridExtra                2.3        2017-09-09 [1] CRAN (R 4.3.0)
 gtable                   0.3.4      2023-08-21 [1] CRAN (R 4.3.0)
 highr                    0.10       2022-12-22 [1] CRAN (R 4.3.0)
 hms                      1.1.3      2023-03-21 [1] CRAN (R 4.3.0)
 htmltools                0.5.7      2023-11-03 [1] CRAN (R 4.3.0)
 htmlwidgets              1.6.4      2023-12-06 [1] CRAN (R 4.3.0)
 httpuv                   1.6.13     2023-12-06 [1] CRAN (R 4.3.0)
 httr                     1.4.7      2023-08-15 [1] CRAN (R 4.3.0)
 ica                      1.0-3      2022-07-08 [1] CRAN (R 4.3.0)
 igraph                 * 2.0.1.1    2024-01-30 [1] CRAN (R 4.3.2)
 infotheo                 1.2.0.1    2022-04-08 [1] CRAN (R 4.3.0)
 interactiveDisplayBase   1.40.0     2023-10-24 [1] Bioconductor
 IRanges                * 2.36.0     2023-10-24 [1] Bioconductor
 irlba                    2.3.5.1    2022-10-03 [1] CRAN (R 4.3.0)
 iterators                1.0.14     2022-02-05 [1] CRAN (R 4.3.0)
 janitor                  2.2.0      2023-02-02 [1] CRAN (R 4.3.0)
 jsonlite                 1.8.8      2023-12-04 [1] CRAN (R 4.3.0)
 kableExtra               1.3.4      2021-02-20 [1] CRAN (R 4.3.0)
 KEGGREST                 1.42.0     2023-10-24 [1] Bioconductor
 KernSmooth               2.23-22    2023-07-10 [1] CRAN (R 4.3.2)
 knitr                    1.45       2023-10-30 [1] CRAN (R 4.3.0)
 labeling                 0.4.3      2023-08-29 [1] CRAN (R 4.3.0)
 later                    1.3.2      2023-12-06 [1] CRAN (R 4.3.0)
 lattice                  0.22-5     2023-10-24 [1] CRAN (R 4.3.0)
 lazyeval                 0.2.2      2019-03-15 [1] CRAN (R 4.3.0)
 leiden                   0.4.3.1    2023-11-17 [1] CRAN (R 4.3.0)
 lifecycle                1.0.4      2023-11-07 [1] CRAN (R 4.3.0)
 limma                    3.58.1     2023-10-31 [1] Bioconductor
 listenv                  0.9.0      2022-12-16 [1] CRAN (R 4.3.0)
 lme4                     1.1-35.1   2023-11-05 [1] CRAN (R 4.3.0)
 lmtest                   0.9-40     2022-03-21 [1] CRAN (R 4.3.0)
 lubridate                1.9.3      2023-09-27 [1] CRAN (R 4.3.0)
 magrittr               * 2.0.3      2022-03-30 [1] CRAN (R 4.3.0)
 MASS                     7.3-60     2023-05-04 [1] CRAN (R 4.3.0)
 Matrix                   1.6-5      2024-01-11 [1] CRAN (R 4.3.0)
 MatrixGenerics         * 1.14.0     2023-10-24 [1] Bioconductor
 matrixStats            * 1.2.0      2023-12-11 [1] CRAN (R 4.3.0)
 memoise                  2.0.1      2021-11-26 [1] CRAN (R 4.3.0)
 mgcv                     1.9-1      2023-12-21 [1] CRAN (R 4.3.0)
 mime                     0.12       2021-09-28 [1] CRAN (R 4.3.0)
 minet                    3.60.0     2023-10-24 [1] Bioconductor
 miniUI                   0.1.1.1    2018-05-18 [1] CRAN (R 4.3.0)
 minqa                    1.2.6      2023-09-11 [1] CRAN (R 4.3.0)
 munsell                  0.5.0      2018-06-12 [1] CRAN (R 4.3.0)
 mvtnorm                  1.2-4      2023-11-27 [1] CRAN (R 4.3.0)
 nlme                     3.1-164    2023-11-27 [1] CRAN (R 4.3.0)
 nloptr                   2.0.3      2022-05-26 [1] CRAN (R 4.3.0)
 numDeriv                 2016.8-1.1 2019-06-06 [1] CRAN (R 4.3.0)
 paletteer                1.6.0      2024-01-21 [1] CRAN (R 4.3.0)
 parallelly               1.36.0     2023-05-26 [1] CRAN (R 4.3.0)
 patchwork              * 1.2.0      2024-01-08 [1] CRAN (R 4.3.0)
 pbapply                  1.7-2      2023-06-27 [1] CRAN (R 4.3.0)
 pillar                   1.9.0      2023-03-22 [1] CRAN (R 4.3.0)
 pkgconfig                2.0.3      2019-09-22 [1] CRAN (R 4.3.0)
 plotly                   4.10.4     2024-01-13 [1] CRAN (R 4.3.0)
 plyr                     1.8.9      2023-10-02 [1] CRAN (R 4.3.0)
 png                      0.1-8      2022-11-29 [1] CRAN (R 4.3.0)
 polyclip                 1.10-6     2023-09-27 [1] CRAN (R 4.3.0)
 presto                   1.0.0      2023-12-27 [1] Github (immunogenomics/presto@31dc97f)
 prettyunits              1.2.0      2023-09-24 [1] CRAN (R 4.3.0)
 princurve              * 2.1.6      2021-01-18 [1] CRAN (R 4.3.0)
 prismatic                1.1.1      2022-08-15 [1] CRAN (R 4.3.0)
 progress                 1.2.3      2023-12-06 [1] CRAN (R 4.3.0)
 progressr                0.14.0     2023-08-10 [1] CRAN (R 4.3.0)
 promises                 1.2.1      2023-08-10 [1] CRAN (R 4.3.0)
 ProtGenerics             1.34.0     2023-10-24 [1] Bioconductor
 ps                       1.7.5      2023-04-18 [1] CRAN (R 4.3.0)
 purrr                    1.0.2      2023-08-10 [1] CRAN (R 4.3.0)
 R6                       2.5.1      2021-08-19 [1] CRAN (R 4.3.0)
 RANN                     2.6.1      2019-01-08 [1] CRAN (R 4.3.0)
 rappdirs                 0.3.3      2021-01-31 [1] CRAN (R 4.3.0)
 RColorBrewer             1.1-3      2022-04-03 [1] CRAN (R 4.3.0)
 Rcpp                     1.0.11     2023-07-06 [1] CRAN (R 4.3.0)
 RcppAnnoy                0.0.21     2023-07-02 [1] CRAN (R 4.3.0)
 RcppEigen                0.3.3.9.4  2023-11-02 [1] CRAN (R 4.3.0)
 RcppHNSW                 0.5.0      2023-09-19 [1] CRAN (R 4.3.0)
 RCurl                    1.98-1.13  2023-11-02 [1] CRAN (R 4.3.0)
 readr                    2.1.4      2023-02-10 [1] CRAN (R 4.3.0)
 rematch2                 2.1.2      2020-05-01 [1] CRAN (R 4.3.0)
 reshape2                 1.4.4      2020-04-09 [1] CRAN (R 4.3.0)
 restfulr                 0.0.15     2022-06-16 [1] CRAN (R 4.3.0)
 reticulate               1.34.0     2023-10-12 [1] CRAN (R 4.3.0)
 rjson                    0.2.21     2022-01-09 [1] CRAN (R 4.3.0)
 rlang                    1.1.2      2023-11-04 [1] CRAN (R 4.3.0)
 rmarkdown                2.25       2023-09-18 [1] CRAN (R 4.3.0)
 rmio                     0.4.0      2022-02-17 [1] CRAN (R 4.3.0)
 ROCR                     1.0-11     2020-05-02 [1] CRAN (R 4.3.0)
 Rsamtools                2.18.0     2023-10-24 [1] Bioconductor
 RSpectra                 0.16-1     2022-04-24 [1] CRAN (R 4.3.0)
 RSQLite                  2.3.4      2023-12-08 [1] CRAN (R 4.3.0)
 rstatix                  0.7.2      2023-02-01 [1] CRAN (R 4.3.0)
 rstudioapi               0.15.0     2023-07-07 [1] CRAN (R 4.3.0)
 rtracklayer              1.62.0     2023-10-24 [1] Bioconductor
 Rtsne                    0.17       2023-12-07 [1] CRAN (R 4.3.0)
 rvest                    1.0.3      2022-08-19 [1] CRAN (R 4.3.0)
 S4Arrays                 1.2.0      2023-10-24 [1] Bioconductor
 S4Vectors              * 0.40.2     2023-11-23 [1] Bioconductor
 scales                   1.3.0      2023-11-28 [1] CRAN (R 4.3.0)
 scattermore              1.2        2023-06-12 [1] CRAN (R 4.3.0)
 scLANE                 * 0.7.9      2024-02-18 [1] Bioconductor
 scRNAseq               * 2.16.0     2023-10-26 [1] Bioconductor
 sctransform              0.4.1      2023-10-19 [1] CRAN (R 4.3.0)
 sessioninfo              1.2.2      2021-12-06 [1] CRAN (R 4.3.0)
 Seurat                 * 5.0.1      2023-11-17 [1] CRAN (R 4.3.0)
 SeuratObject           * 5.0.1      2023-11-17 [1] CRAN (R 4.3.0)
 shiny                    1.8.0      2023-11-17 [1] CRAN (R 4.3.0)
 SingleCellExperiment   * 1.24.0     2023-10-24 [1] Bioconductor
 slingshot              * 2.10.0     2023-10-24 [1] Bioconductor
 snakecase                0.11.1     2023-08-27 [1] CRAN (R 4.3.0)
 snow                     0.4-4      2021-10-27 [1] CRAN (R 4.3.0)
 sp                     * 2.1-2      2023-11-26 [1] CRAN (R 4.3.0)
 spam                     2.10-0     2023-10-23 [1] CRAN (R 4.3.0)
 SparseArray              1.2.3      2023-12-25 [1] Bioconductor 3.18 (R 4.3.2)
 sparseMatrixStats        1.14.0     2023-10-24 [1] Bioconductor
 spatstat.data            3.0-3      2023-10-24 [1] CRAN (R 4.3.0)
 spatstat.explore         3.2-5      2023-10-22 [1] CRAN (R 4.3.0)
 spatstat.geom            3.2-7      2023-10-20 [1] CRAN (R 4.3.0)
 spatstat.random          3.2-2      2023-11-29 [1] CRAN (R 4.3.0)
 spatstat.sparse          3.0-3      2023-10-24 [1] CRAN (R 4.3.0)
 spatstat.utils           3.0-4      2023-10-24 [1] CRAN (R 4.3.0)
 statmod                  1.5.0      2023-01-06 [1] CRAN (R 4.3.0)
 stringi                  1.8.3      2023-12-11 [1] CRAN (R 4.3.0)
 stringr                  1.5.1      2023-11-14 [1] CRAN (R 4.3.0)
 SummarizedExperiment   * 1.32.0     2023-10-24 [1] Bioconductor
 survival                 3.5-7      2023-08-14 [1] CRAN (R 4.3.2)
 svglite                  2.1.3      2023-12-08 [1] CRAN (R 4.3.0)
 systemfonts              1.0.5      2023-10-09 [1] CRAN (R 4.3.0)
 tensor                   1.5        2012-05-05 [1] CRAN (R 4.3.0)
 tibble                   3.2.1      2023-03-20 [1] CRAN (R 4.3.0)
 tidyr                    1.3.1      2024-01-24 [1] CRAN (R 4.3.2)
 tidyselect               1.2.0      2022-10-10 [1] CRAN (R 4.3.0)
 timechange               0.2.0      2023-01-11 [1] CRAN (R 4.3.0)
 TMB                      1.9.10     2023-12-12 [1] CRAN (R 4.3.0)
 TrajectoryUtils        * 1.10.0     2023-10-24 [1] Bioconductor
 tzdb                     0.4.0      2023-05-12 [1] CRAN (R 4.3.0)
 utf8                     1.2.4      2023-10-22 [1] CRAN (R 4.3.0)
 uwot                     0.1.16     2023-06-29 [1] CRAN (R 4.3.0)
 vctrs                    0.6.5      2023-12-01 [1] CRAN (R 4.3.0)
 vipor                    0.4.7      2023-12-18 [1] CRAN (R 4.3.0)
 viridisLite              0.4.2      2023-05-02 [1] CRAN (R 4.3.0)
 vroom                    1.6.5      2023-12-05 [1] CRAN (R 4.3.0)
 webshot                  0.5.5      2023-06-26 [1] CRAN (R 4.3.0)
 withr                    2.5.2      2023-10-30 [1] CRAN (R 4.3.0)
 xfun                     0.41       2023-11-01 [1] CRAN (R 4.3.0)
 XML                      3.99-0.16  2023-11-29 [1] CRAN (R 4.3.0)
 xml2                     1.3.6      2023-12-04 [1] CRAN (R 4.3.0)
 xtable                   1.8-4      2019-04-21 [1] CRAN (R 4.3.0)
 XVector                  0.42.0     2023-10-24 [1] Bioconductor
 yaml                     2.3.8      2023-12-11 [1] CRAN (R 4.3.0)
 zlibbioc                 1.48.0     2023-10-24 [1] Bioconductor
 zoo                      1.8-12     2023-04-13 [1] CRAN (R 4.3.0)

 [1] /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library

─ Python configuration ───────────────────────────────────────────────────────
 python:         /Users/jack/Desktop/PhD/Research/Python_Envs/personal_site/bin/python
 libpython:      /usr/local/opt/python@3.11/Frameworks/Python.framework/Versions/3.11/lib/python3.11/config-3.11-darwin/libpython3.11.dylib
 pythonhome:     /Users/jack/Desktop/PhD/Research/Python_Envs/personal_site:/Users/jack/Desktop/PhD/Research/Python_Envs/personal_site
 version:        3.11.6 (main, Nov  2 2023, 04:52:24) [Clang 14.0.3 (clang-1403.0.22.14.1)]
 numpy:          /Users/jack/Desktop/PhD/Research/Python_Envs/personal_site/lib/python3.11/site-packages/numpy
 numpy_version:  1.26.3
 leidenalg:      /Users/jack/Desktop/PhD/Research/Python_Envs/personal_site/lib/python3.11/site-packages/leidenalg
 
 NOTE: Python version was forced by use_python() function

──────────────────────────────────────────────────────────────────────────────