Annotating and Exploring Human Transcription Factors

Author: Jack R. Leary
Affiliation: University of Florida
Published: April 16, 2024

1 Introduction

This vignette is going to be a bit different from what I usually do. Instead of focusing on an scRNA-seq analysis, we’ll use web scraping to pull functional annotations and summaries for a set of human transcription factors (TFs), then use natural language processing (NLP) tools to explore the data.

2 Libraries

Code
library(tm)       # text mining
library(dplyr)    # data manipulation
library(rvest)    # HTML processing tools
library(polite)   # web scraping tools
library(plotly)   # interactive plots
library(biomaRt)  # gene annotation
select <- dplyr::select  # biomaRt masks dplyr::select(), so restore it explicitly

3 Color palettes

Code
palette_cluster <- as.character(paletteer::paletteer_d("ggsci::default_locuszoom"))

4 Data

4.1 Identifying human TFs

First we need to connect to the H. sapiens Ensembl database.

Code
hs_ensembl <- useMart("ensembl",
                      dataset = "hsapiens_gene_ensembl", 
                      host = "https://useast.ensembl.org")

We’ll start the data-gathering process by reading in a complete set of all known and likely human TFs from Lambert et al. (2018). We clean up the column names using the janitor package, keep only the entries annotated as TFs (is_tf == "Yes"), then select the unique Ensembl IDs.

Code
hs_tf_raw <- readr::read_csv("http://humantfs.ccbr.utoronto.ca/download/v_1.01/DatabaseExtract_v_1.01.csv",
                             col_select = -1,
                             num_threads = 2,
                             show_col_types = FALSE) %>%
             janitor::clean_names() %>%
             filter(is_tf == "Yes") %>%
             select(ensembl_id) %>% 
             distinct()

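Next, using biomaRt we pull the HGNC symbol, HGNC ID, Entrez ID, gene description, and gene biotype for each TF from the Ensembl database. We perform some light data cleaning (converting empty symbols to NA, stripping the “HGNC:” prefix from the HGNC IDs, and removing the bracketed source annotation from the descriptions), then create an empty character column called summary in which we’ll store the gene summaries we scrape.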

Code
# query Ensembl for identifiers, descriptions, and biotypes of each TF
hs_tfs <- getBM(attributes = c("ensembl_gene_id", "hgnc_symbol", "hgnc_id", "entrezgene_id", "description", "gene_biotype"),
                filters = "ensembl_gene_id",
                values = hs_tf_raw$ensembl_id,
                mart = hs_ensembl,
                uniqueRows = TRUE) %>%
          rename(ensembl_id = ensembl_gene_id,
                 entrez_id = entrezgene_id) %>%
          arrange(ensembl_id) %>%
          mutate(hgnc_symbol = if_else(hgnc_symbol == "", NA_character_, hgnc_symbol),  # empty symbols become NA
                 hgnc_id = gsub("HGNC:", "", hgnc_id),                                   # drop the "HGNC:" prefix
                 description = trimws(gsub("\\[Source.*", "", description)),            # drop the source annotation
                 summary = NA_character_)                                                # filled in by scraping later
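The cleaning steps can be illustrated on a couple of hand-written values. This is a minimal base R sketch: the raw strings are assumptions modeled on the format Ensembl typically returns (a description followed by a bracketed “[Source:” annotation), and base R’s ifelse() stands in for dplyr::if_else() to keep it dependency-free.

```r
# Hypothetical raw values in the format returned by biomaRt/Ensembl
raw_symbol <- c("ARID5B", "")
raw_hgnc_id <- c("HGNC:17362", "HGNC:9347")
raw_description <- c("AT-rich interaction domain 5B [Source:HGNC Symbol;Acc:HGNC:17362]",
                     "PR/SET domain 2 [Source:HGNC Symbol;Acc:HGNC:9347]")

# Empty symbols become proper missing values
clean_symbol <- ifelse(raw_symbol == "", NA_character_, raw_symbol)
# Strip the "HGNC:" prefix, leaving only the numeric part of the ID
clean_hgnc_id <- gsub("HGNC:", "", raw_hgnc_id)
# Drop the bracketed source annotation; trimws() removes the leftover space
clean_description <- trimws(gsub("\\[Source.*", "", raw_description))

print(clean_symbol)
print(clean_hgnc_id)
print(clean_description)
```

Without the trimws() call, each cleaned description would keep a trailing space left behind by the removed annotation.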

Here’s what the dataset looks like so far:

Code
slice_sample(hs_tfs, n = 7) %>% 
  kableExtra::kbl(booktabs = TRUE, 
                  col.names = c("Ensembl ID", "HGNC Symbol", "HGNC ID", "Entrez ID", "Description", "Biotype", "Summary")) %>% 
  kableExtra::kable_classic(full_width = FALSE, "hover")
Ensembl ID       HGNC Symbol  HGNC ID  Entrez ID  Description                                 Biotype         Summary
ENSG00000150347  ARID5B       17362    84159      AT-rich interaction domain 5B               protein_coding  NA
ENSG00000116731  PRDM2        9347     7799       PR/SET domain 2                             protein_coding  NA
ENSG00000169953  HSFY2        23950    159119     heat shock transcription factor Y-linked 2  protein_coding  NA
ENSG00000130675  MNX1         4979     3110       motor neuron and pancreas homeobox 1        protein_coding  NA
ENSG00000167377  ZNF23        13023    7571       zinc finger protein 23                      protein_coding  NA
ENSG00000141905  NFIC         7786     4782       nuclear factor I C                          protein_coding  NA
ENSG00000196417  ZNF765       25092    91661      zinc finger protein 765                     protein_coding  NA
Table 1: A random sample of the human transcription factors.

4.2 Scraping gene summaries

The gene descriptions shown in Table 1 are useful, but they don’t tell us much about how each TF actually functions. To retrieve a functional summary for each TF we’ll use web scraping to pull its NCBI Gene summary, with the help of the rvest and polite packages. The rvest package contains a variety of tools for processing HTML data, while polite ensures that you respect the scraping rules each site specifies in a file called robots.txt (more information here) and limits how often requests are sent. This makes scraping a little slower, but also ensures that you (almost certainly) won’t get banned from the site.

We’ll start by pulling the summary for one gene as an example. NCBI Gene pages are indexed by each gene’s numeric Entrez ID; for example, the Entrez ID of the TF forkhead box A3 (FOXA3) is 3171. We first use the bow() function, which reads the site’s scraping rules and creates a web session. Then we scrape the actual content of the page using the aptly-named scrape() function.

Code
entrez_ID_FOXA3 <- filter(hs_tfs, hgnc_symbol == "FOXA3") %>% 
                   pull(entrez_id)
ncbi_url <- paste0("https://www.ncbi.nlm.nih.gov/gene?Cmd=DetailsSearch&Term=", entrez_ID_FOXA3)
web_page <- bow(ncbi_url)
web_page_scraped <- scrape(web_page)

This is where it gets a little tricky. To identify which part of the HTML content to extract, you’ll need something like your browser’s web inspector (guide for Safari, guide for Chrome) to copy the CSS selector for the summary element. This selector is passed to the html_node() function, after which we extract the raw text using html_text(). Finally, after a little text cleanup we have the summary for FOXA3 in plain English!

Code
summary_text <- html_node(web_page_scraped, '#summaryDl > dd:nth-child(20)') %>% 
                html_text()
summary_text <- trimws(gsub("\\[provided by.*", "", summary_text))
summary_text
[1] "This gene encodes a member of the forkhead class of DNA-binding proteins. These hepatocyte nuclear factors are transcriptional activators for liver-specific transcripts such as albumin and transthyretin, and they also interact with chromatin. Similar family members in mice have roles in the regulation of metabolism and in the differentiation of the pancreas and liver. The crystal structure of a similar protein in rat has been resolved."
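Since the live NCBI page can’t be reproduced here, the sketch below runs the same selector-plus-cleanup logic on a small hand-written HTML string. The markup is a simplified assumption about the page structure (a #summaryDl definition list of <dt> labels, each followed by a <dd> value), so the summary sits at child position 4 rather than position 20 as on the real FOXA3 page.

```r
library(rvest)

# Hypothetical, heavily simplified stand-in for an NCBI Gene page:
# a definition list of <dt> labels, each followed by its <dd> value
toy_html <- '<html><body><dl id="summaryDl">
  <dt>Official Symbol</dt><dd>FOXA3</dd>
  <dt>Summary</dt><dd>This gene encodes a forkhead protein. [provided by RefSeq]</dd>
</dl></body></html>'
toy_page <- read_html(toy_html)

# nth-child() counts element children only: dt(1), dd(2), dt(3), dd(4)
summary_text <- html_text(html_node(toy_page, "#summaryDl > dd:nth-child(4)"))
# Remove the "[provided by ...]" attribution and the trailing whitespace
summary_text <- trimws(gsub("\\[provided by.*", "", summary_text))
print(summary_text)
```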

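We wrap the above operations into a (slightly more intelligent) function, which we’ll then apply to every TF in the dataset. The hard-coded selector above isn’t guaranteed to work for other genes, since the position of the summary within the #summaryDl list varies from gene to gene. The function therefore locates the <dt>Summary</dt> element, extracts the <dd> element immediately following it, and returns NA when no summary exists.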

Code
pullGeneSummary <- function(entrez.id = NULL) {
  # check inputs
  if (is.null(entrez.id)) { stop("You must provide a valid Entrez ID.") }
  # genes lacking an Entrez ID can't be looked up on NCBI
  if (is.na(entrez.id)) { return(NA_character_) }
  # scrape web page (polite checks robots.txt and rate-limits requests)
  ncbi_url <- paste0("https://www.ncbi.nlm.nih.gov/gene?Cmd=DetailsSearch&Term=", entrez.id)
  web_page <- polite::bow(ncbi_url)
  web_page_scraped <- polite::scrape(web_page)
  # list the <dt>/<dd> children of the summary definition list
  summary_node <- rvest::html_node(web_page_scraped, '#summaryDl') %>% 
                  rvest::html_children()
  # the summary text lives in the <dd> immediately after <dt>Summary</dt>
  summary_node_loc <- which(as.character(summary_node) == "<dt>Summary</dt>") + 1
  if (length(summary_node_loc) == 0L) {
    # no summary available for this gene
    summary_text <- NA_character_
  } else {
    summary_text <- rvest::html_node(web_page_scraped, paste0("#summaryDl > dd:nth-child(", summary_node_loc, ")")) %>% 
                    rvest::html_text()
    # strip the "[provided by ...]" attribution and surrounding whitespace
    summary_text <- gsub("\\[provided by.*", "", summary_text)
    summary_text <- trimws(summary_text)
  }
  return(summary_text)
}
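As an alternative to computing the child index (a swapped-in technique, not what pullGeneSummary() does), an XPath query can select the <dd> that immediately follows the Summary <dt> directly. The helper extract_dd_after_dt() and the toy markup below are hypothetical, written against the same assumed #summaryDl structure; here the Summary entry deliberately sits at a different position than on the FOXA3 page.

```r
library(xml2)

# Hypothetical simplified markup mimicking the #summaryDl definition list
toy_html <- '<html><body><dl id="summaryDl">
  <dt>Official Symbol</dt><dd>PIN1</dd>
  <dt>Gene type</dt><dd>protein coding</dd>
  <dt>Summary</dt><dd>This gene encodes a peptidyl-prolyl isomerase. [provided by RefSeq]</dd>
</dl></body></html>'

# Hypothetical helper: return the <dd> text that follows the <dt> labelled `label`
extract_dd_after_dt <- function(page, label) {
  xpath <- sprintf("//dl[@id='summaryDl']/dt[normalize-space(text())='%s']/following-sibling::dd[1]",
                   label)
  node <- xml_find_first(page, xpath)
  # xml_find_first() returns an xml_missing object when nothing matches
  if (inherits(node, "xml_missing")) return(NA_character_)
  trimws(gsub("\\[provided by.*", "", xml_text(node)))
}

toy_page <- read_html(toy_html)
print(extract_dd_after_dt(toy_page, "Summary"))
print(extract_dd_after_dt(toy_page, "Expression"))
```

Inside a scraping function, one could call something like extract_dd_after_dt(web_page_scraped, "Summary") in place of the index arithmetic, assuming the live page keeps this structure.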

With our function in hand, we iterate over the set of TFs and pull the textual summary for each. This takes a while, and there isn’t really a way around it: running the loop in parallel, for example, would risk hitting a rate limit due to the number of scraping requests submitted. After scraping, we filter out genes for which no summary was available (mostly non-coding RNAs or uncharacterized loci).

Code
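# seq_along() yields one index per row, even if the table has a single row
for (e in seq_along(hs_tfs$entrez_id)) {
  hs_tfs$summary[e] <- pullGeneSummary(hs_tfs$entrez_id[e])
}
# drop TFs without an NCBI summary
hs_tfs <- filter(hs_tfs, !is.na(summary))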
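Two details of this loop are worth illustrating with a toy example. First, iterating with seq_along() is safer than seq(): for a vector of length one, seq() counts from 1 up to the value itself. With the full TF table the two happen to agree, but the difference matters if the table is ever filtered down to a single gene. Second, wrapping each request in tryCatch() records a single failed page as NA instead of aborting a long scraping run. The function fake_pull_summary() below is a hypothetical stand-in for pullGeneSummary() that needs no network access.

```r
# seq() on a length-one vector counts up to its value, whereas seq_along()
# always produces one index per element
ids <- c(5300)
length(seq(ids))        # 5300 indices for a single ID
length(seq_along(ids))  # 1 index

# Hypothetical stand-in for pullGeneSummary() that fails for one ID,
# mimicking a timeout or HTTP error during scraping
fake_pull_summary <- function(entrez_id) {
  if (entrez_id == 4222) stop("simulated HTTP error")
  paste("Summary for gene", entrez_id)
}

entrez_ids <- c(4800, 4222, 3207)
summaries <- rep(NA_character_, length(entrez_ids))
for (i in seq_along(entrez_ids)) {
  # tryCatch() turns an error into NA instead of stopping the loop
  summaries[i] <- tryCatch(fake_pull_summary(entrez_ids[i]),
                           error = function(e) NA_character_)
}
print(summaries)
```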

For a downloadable version of the final TF table, see Table 3.

5 Analysis

5.1 Text processing

To begin our analysis we must convert our vector of textual gene summaries into a numeric matrix on which we can perform downstream analysis tasks. We’ll leverage the tm package, which implements a variety of tools for text processing. We treat each summary as its own “document”, and the total set of all documents is referred to as a corpus. Each document is split into tokens (here, individual words), a step called tokenization, and the token counts are later tabulated into a matrix.

Code
gene_summary_vec <- pull(hs_tfs, summary)
gene_summary_corpus <- Corpus(VectorSource(gene_summary_vec))
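To make tokenization concrete, here is a minimal base R sketch on two made-up sentences. It lowercases the text and splits on any run of non-letter characters; this is a simplification for illustration, not tm’s exact tokenizer.

```r
# Two made-up mini "documents" standing in for gene summaries
docs <- c("This gene encodes a zinc finger protein.",
          "This gene encodes a forkhead protein.")

# Tokenize: lowercase, split on runs of non-letters, drop empty strings
tokens <- lapply(tolower(docs), function(d) {
  words <- strsplit(d, "[^a-z]+")[[1]]
  words[words != ""]
})
print(tokens)

# The vocabulary is the set of distinct tokens across the whole corpus
vocab <- sort(unique(unlist(tokens)))
print(vocab)
```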

Next we perform several preprocessing steps: converting all text to lowercase, then removing punctuation, numbers, uninformative common words called stopwords (more information here), and extra whitespace.

Code
gene_summary_corpus <- tm_map(gene_summary_corpus, content_transformer(tolower)) %>% 
                       tm_map(removePunctuation) %>% 
                       tm_map(removeNumbers) %>% 
                       tm_map(removeWords, stopwords("english")) %>% 
                       tm_map(stripWhitespace)
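The sketch below mimics these steps in base R on a sentence adapted from the PIN1 summary, and shows a side effect worth knowing about: deleting punctuation fuses hyphenated and slashed words, so peptidyl-prolyl becomes peptidylprolyl. The regular expressions approximate what the tm transformations do, and the five-word stopword list is a stand-in for the much longer stopwords("english").

```r
sentence <- "Peptidyl-prolyl cis/trans isomerases (PPIases) catalyze the isomerization of 2 bonds."

# A tiny illustrative stopword list; tm::stopwords("english") is much longer
toy_stopwords <- c("the", "of", "a", "and", "in")

x <- tolower(sentence)
# Deleting punctuation fuses "peptidyl-prolyl" into "peptidylprolyl"
x <- gsub("[[:punct:]]+", "", x)
# Remove digits
x <- gsub("[[:digit:]]+", "", x)
# Remove whole-word stopwords (word boundaries keep "a" inside "catalyze")
x <- gsub(paste0("\\b(", paste(toy_stopwords, collapse = "|"), ")\\b"), "", x, perl = TRUE)
# Collapse runs of whitespace into single spaces
x <- trimws(gsub("[[:space:]]+", " ", x))
print(x)
```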

Next, we create a document-term matrix (DTM), which records how many times each term occurs in each document. We then apply term frequency-inverse document frequency (TF-IDF) weighting to assign an “importance” to each term in each document. A term receives a high weight when it is frequent within a document but rare across the corpus, so the score essentially tells us how specific a given term is to a given document.

Code
gene_summary_DTM <- DocumentTermMatrix(gene_summary_corpus)
gene_summary_TFIDF <- weightTfIdf(gene_summary_DTM)
gene_summary_TFIDF_mat <- as.matrix(gene_summary_TFIDF)
rownames(gene_summary_TFIDF_mat) <- pull(hs_tfs, entrez_id)
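As a worked example of the weighting, the sketch below computes TF-IDF by hand on a made-up count matrix. It assumes tm’s default weightTfIdf() behavior as described in the tm documentation: the term frequency is normalized by document length and the inverse document frequency uses a base-2 logarithm, i.e. \(w_{t,d} = \frac{n_{t,d}}{\sum_{t'} n_{t',d}} \log_2 \frac{N}{\mathrm{df}_t}\), where \(N\) is the number of documents and \(\mathrm{df}_t\) the number of documents containing term \(t\).

```r
# Hypothetical term counts: 3 documents (rows) x 4 terms (columns)
counts <- matrix(c(2, 1, 1, 0,
                   0, 1, 1, 2,
                   0, 1, 0, 3),
                 nrow = 3, byrow = TRUE,
                 dimnames = list(c("doc1", "doc2", "doc3"),
                                 c("peptidylprolyl", "gene", "encodes", "zinc")))

# Term frequency, normalized by document length (total terms per document)
tf <- counts / rowSums(counts)
# Document frequency: how many documents contain each term
df <- colSums(counts > 0)
# Inverse document frequency with a base-2 logarithm
idf <- log2(nrow(counts) / df)
# Multiply each column of tf by its term's idf
tfidf <- sweep(tf, 2, idf, `*`)
print(round(tfidf, 4))
```

A term present in every document (gene) gets zero weight, while a term found in only one document (peptidylprolyl) gets the largest idf, which is why a single non-zero entry in a column flags a term unique to one TF.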

Here’s a glance at the TF-IDF matrix:

Code
as.data.frame(gene_summary_TFIDF_mat[1:5, 1:5]) %>% 
  kableExtra::kbl(booktabs = TRUE) %>% 
  kableExtra::kable_classic(full_width = FALSE, "hover")
        activation  addition   affinity   alternative  associates
4800    0.0811234   0.0901228  0.1179282  0.0485776    0.120892
170302  0.0000000   0.0000000  0.0000000  0.0000000    0.000000
3207    0.0000000   0.0000000  0.0000000  0.0000000    0.000000
4222    0.0000000   0.0000000  0.0000000  0.0000000    0.000000
30812   0.0000000   0.0000000  0.0000000  0.0000000    0.000000
Table 2: The first 5 rows (Entrez IDs) and columns (document terms) in our TF-IDF matrix.

5.2 Graph-based clustering

After building a shared nearest-neighbor (SNN) graph with \(k = 20\) neighbors, we use the Leiden algorithm to partition the graph into communities, or clusters. To find the nearest neighbors we use cosine distance instead of the default Euclidean distance. This is common practice in NLP: Euclidean distances become less informative in high dimensions, especially with sparse data (see e.g., this old CrossValidated post), and they are sensitive to the overall magnitude of each TF-IDF vector, whereas cosine distance depends only on the relative term profile.

Code
SNN_graph <- bluster::makeSNNGraph(gene_summary_TFIDF_mat, 
                                   k = 20, 
                                   BNPARAM = BiocNeighbors::AnnoyParam(distance = "Cosine"))
gene_summary_clusters <- igraph::cluster_leiden(SNN_graph,
                                                objective_function = "modularity", 
                                                resolution_parameter = 1)
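A small numeric sketch shows why cosine distance suits TF-IDF vectors. The vectors are made up: doc_b has the same relative term profile as doc_a but every weight doubled, so cosine distance treats them as identical while Euclidean distance does not. The last lines check the identity that, for unit-length vectors, squared Euclidean distance equals twice the cosine distance; as I understand the BiocNeighbors documentation, its Cosine option relies on this by L2-normalizing each row and then computing Euclidean distances.

```r
# Hypothetical TF-IDF vectors over four terms
doc_a <- c(0.1, 0.0, 0.3, 0.0)
doc_b <- 2 * doc_a               # same profile, larger magnitude
doc_c <- c(0.0, 0.3, 0.0, 0.1)   # no terms in common with doc_a

cosine_distance <- function(x, y) 1 - sum(x * y) / (sqrt(sum(x^2)) * sqrt(sum(y^2)))
euclidean_distance <- function(x, y) sqrt(sum((x - y)^2))

cosine_distance(doc_a, doc_b)     # approximately 0: same direction
euclidean_distance(doc_a, doc_b)  # greater than 0: magnitudes differ
cosine_distance(doc_a, doc_c)     # 1: orthogonal vectors

# After L2-normalizing, squared Euclidean distance equals 2 * cosine distance
unit <- function(x) x / sqrt(sum(x^2))
euclidean_distance(unit(doc_a), unit(doc_c))^2
2 * cosine_distance(doc_a, doc_c)
```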

5.3 Embeddings

We first generate a linear embedding of the TF-IDF matrix in 30 dimensions with PCA.

Code
pca_embedding <- irlba::prcomp_irlba(gene_summary_TFIDF_mat, 
                                     n = 30,
                                     center = TRUE, 
                                     scale. = TRUE)

Next, we generate a nonlinear two-dimensional embedding of the matrix via UMAP. Based on my prior experience with the algorithm, we tweak the defaults a bit: we use the cosine metric (matching the SNN graph), 20 neighbors, and 750 training epochs.

Code
umap_embedding <- uwot::umap(gene_summary_TFIDF_mat,
                             n_neighbors = 20, 
                             n_components = 2, 
                             metric = "cosine",
                             n_epochs = 750,
                             nn_method = "annoy", 
                             ret_model = TRUE, 
                             ret_nn = TRUE, 
                             ret_extra = c("fgraph"),
                             n_threads = 2,
                             seed = 312)

We use t-SNE to generate a final 2D embedding.

Code
tsne_embedding <- Rtsne::Rtsne(gene_summary_TFIDF_mat, 
                               dims = 2, 
                               perplexity = 30, 
                               check_duplicates = FALSE, 
                               pca = FALSE)

Finally, we create a table of our embeddings plus our clustering, which we’ll use for visualization.

Code
embed_df <- data.frame(entrez_id = pull(hs_tfs, entrez_id), 
                       hgnc_symbol = pull(hs_tfs, hgnc_symbol), 
                       ensembl_id = pull(hs_tfs, ensembl_id), 
                       description = pull(hs_tfs, description), 
                       pc1 = pca_embedding$x[ ,1], 
                       pc2 = pca_embedding$x[ ,2], 
                       umap1 = umap_embedding$embedding[, 1], 
                       umap2 = umap_embedding$embedding[, 2], 
                       tsne1 = tsne_embedding$Y[, 1], 
                       tsne2 = tsne_embedding$Y[, 2], 
                       leiden = factor(gene_summary_clusters$membership))

Using the plotly library we can produce interactive visualizations: hover over each observation to see its gene IDs, cluster, and plot coordinates! Examining the PCA embedding, we see some separation by cluster along the first PC, while the second PC appears to isolate a single outlier: PIN1.

Code
fig <- plot_ly(embed_df, 
               x = ~pc1,
               y = ~pc2,
               color = ~leiden, 
               text = ~paste("<i>", hgnc_symbol, "</i>", "<br>", 
                             ensembl_id, "<br>",
                             "Cluster:", leiden), 
               type = "scatter", 
               mode = "markers", 
               colors = palette_cluster) %>% 
      layout(legend = list(title = list(text = "Leiden")),
             xaxis = list(title = "PC 1", tickvals = NULL, showticklabels = FALSE, zeroline = FALSE, showline = TRUE, linewidth = 2), 
             yaxis = list(title = "PC 2", tickvals = NULL, showticklabels = FALSE, zeroline = FALSE, showline = TRUE, linewidth = 2))
fig
Figure 1: PCA embedding of the gene summary TF-IDF matrix colored by Leiden cluster.

Let’s pull the summary for PIN1:

Code
filter(hs_tfs, hgnc_symbol == "PIN1") %>% 
  pull(summary)
[1] "Peptidyl-prolyl cis/trans isomerases (PPIases) catalyze the cis/trans isomerization of peptidyl-prolyl peptide bonds. This gene encodes one of the PPIases, which specifically binds to phosphorylated ser/thr-pro motifs to catalytically regulate the post-phosphorylation conformation of its substrates. The conformational regulation catalyzed by this PPIase has a profound impact on key proteins involved in the regulation of cell growth, genotoxic and other stress responses, the immune response, induction and maintenance of pluripotency, germ cell development, neuronal differentiation, and survival. This enzyme also plays a key role in the pathogenesis of Alzheimer's disease and many cancers. Multiple alternatively spliced transcript variants have been found for this gene."

To determine what makes PIN1 unique we can examine the TF-IDF matrix. Using the Entrez ID for PIN1 (5300), we pull its top-scoring terms. Terms such as peptidyl-prolyl, cis/trans, and PPIases (which lose their punctuation during preprocessing, becoming peptidylprolyl and cistrans), along with several terms relating to catalysis, help define PIN1’s function.

Code
gene_summary_TFIDF_mat["5300", ] %>% 
  sort(decreasing = TRUE) %>% 
  head(n = 10)
      cistrans peptidylprolyl        ppiases            key          bonds 
     0.3183682      0.3183682      0.3183682      0.1852509      0.1591841 
 catalytically       catalyze      catalyzed conformational      genotoxic 
     0.1591841      0.1591841      0.1591841      0.1591841      0.1591841 

Indeed, if we count the TFs with a non-zero score for peptidylprolyl, we find that PIN1 is the only one with that term in its summary.

Code
sum(gene_summary_TFIDF_mat[, "peptidylprolyl"] > 0)
[1] 1

Next, the UMAP embedding appears to preserve the cluster structure of the data much better than PCA. This is expected, since both UMAP and our SNN graph are built from cosine nearest neighbors. Interestingly, if you hover over cluster 5 you’ll see that it’s almost entirely composed of TFs belonging to the zinc finger protein (abbreviated ZNF or ZFP) family. This suggests that our clustering and embedding routine did pull out some meaningful structure from the data.

Code
fig <- plot_ly(embed_df, 
               x = ~umap1,
               y = ~umap2,
               color = ~leiden, 
               text = ~paste("<i>", hgnc_symbol, "</i>", "<br>", 
                             ensembl_id, "<br>", 
                             "Cluster:", leiden), 
               type = "scatter", 
               mode = "markers", 
               colors = palette_cluster) %>% 
       layout(legend = list(title = list(text = "Leiden")),
              xaxis = list(title = "UMAP 1", tickvals = NULL, showticklabels = FALSE, zeroline = FALSE, showline = TRUE, linewidth = 2), 
              yaxis = list(title = "UMAP 2", tickvals = NULL, showticklabels = FALSE, zeroline = FALSE, showline = TRUE, linewidth = 2))
fig
Figure 2: UMAP embedding of the gene summary TF-IDF matrix colored by Leiden cluster.
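The observation about cluster 5 can be quantified rather than eyeballed. The sketch below computes the share of zinc finger TFs per cluster, flagging them by the ZNF/ZFP symbol prefix; the gene-to-cluster assignments in toy_embed are hypothetical. Note that a prefix match misses zinc finger genes with other naming schemes (e.g., ZSCAN or ZBTB genes).

```r
# Hypothetical stand-in for embed_df: gene symbols and their Leiden clusters
toy_embed <- data.frame(hgnc_symbol = c("ZNF23", "ZNF765", "ZFP57", "NFIC", "MNX1", "ZNF8"),
                        leiden = factor(c(5, 5, 5, 2, 2, 5)))

# Flag members of the zinc finger family by their symbol prefix
toy_embed$is_znf <- grepl("^(ZNF|ZFP)", toy_embed$hgnc_symbol)

# Proportion of zinc finger TFs within each cluster
znf_share <- tapply(toy_embed$is_znf, toy_embed$leiden, mean)
print(znf_share)
```

On the real data, the equivalent call would be tapply(grepl("^(ZNF|ZFP)", embed_df$hgnc_symbol), embed_df$leiden, mean); grepl() returns FALSE for missing symbols, so TFs without an HGNC symbol count as non-ZNF.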

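And lastly, the t-SNE embedding, which does not seem to preserve the cluster structure of the data as well. This isn’t too much of a surprise: by default Rtsne computes Euclidean distances on the TF-IDF matrix, whereas both the clustering and the UMAP embedding are based on cosine distance, and UMAP generally tends to handle sparse data better than t-SNE.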

Code
fig <- plot_ly(embed_df, 
               x = ~tsne1,
               y = ~tsne2,
               color = ~leiden, 
               text = ~paste("<i>", hgnc_symbol, "</i>", 
                             "<br>", ensembl_id, "<br>", 
                             "Cluster:", leiden), 
               type = "scatter", 
               mode = "markers", 
               colors = palette_cluster) %>% 
       layout(legend = list(title = list(text = "Leiden")),
              xaxis = list(title = "t-SNE 1", tickvals = NULL, showticklabels = FALSE, zeroline = FALSE, showline = TRUE, linewidth = 2), 
              yaxis = list(title = "t-SNE 2", tickvals = NULL, showticklabels = FALSE, zeroline = FALSE, showline = TRUE, linewidth = 2))
fig
Figure 3: t-SNE embedding of the gene summary TF-IDF matrix colored by Leiden cluster.

6 Conclusions

In summary, we began by identifying a peer-reviewed set of human TFs and pulling the relevant gene metadata from Ensembl. We next used web scraping to pull a functional summary of each TF. Lastly, using NLP techniques we generated a TF-IDF matrix of the per-gene summaries, clustered the TFs, and estimated several low-dimensional embeddings of the latent space. The results were mixed: PCA showed us some interesting information about PIN1 but didn’t retain much of the global structure, UMAP performed well, and t-SNE did not. Overall, more could probably be done to analyze this dataset, but even just having a functional summary of each TF that can be searched and used programmatically is likely useful.

The final version of the TF table is shown below.

Code
DT::datatable(hs_tfs, 
              colnames = c("Ensembl ID", "HGNC Symbol", "HGNC ID", "Entrez ID", "Description", "Biotype", "Summary"), 
              rownames = FALSE, 
              extensions = "Buttons", 
              options = list(paging = TRUE, 
                             searching = TRUE, 
                             ordering = TRUE, 
                             dom = "Bfrtip", 
                             buttons = c("csv", "excel", "pdf"),
                             pageLength = 5))
Table 3: A searchable & downloadable representation of the TF table.

7 Session info

Code
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.3.2 (2023-10-31)
 os       macOS Sonoma 14.3
 system   x86_64, darwin20
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2024-01-27
 pandoc   3.1.9 @ /usr/local/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package          * version   date (UTC) lib source
 AnnotationDbi      1.64.1    2023-11-03 [1] Bioconductor
 assertthat         0.2.1     2019-03-21 [1] CRAN (R 4.3.0)
 Biobase            2.62.0    2023-10-24 [1] Bioconductor
 BiocFileCache      2.10.1    2023-10-26 [1] Bioconductor
 BiocGenerics       0.48.1    2023-11-01 [1] Bioconductor
 BiocNeighbors      1.20.1    2023-12-18 [1] Bioconductor 3.18 (R 4.3.2)
 BiocParallel       1.36.0    2023-10-24 [1] Bioconductor
 biomaRt          * 2.58.0    2023-10-24 [1] Bioconductor
 Biostrings         2.70.1    2023-10-25 [1] Bioconductor
 bit                4.0.5     2022-11-15 [1] CRAN (R 4.3.0)
 bit64              4.0.5     2020-08-30 [1] CRAN (R 4.3.0)
 bitops             1.0-7     2021-04-24 [1] CRAN (R 4.3.0)
 blob               1.2.4     2023-03-17 [1] CRAN (R 4.3.0)
 bluster            1.12.0    2023-10-24 [1] Bioconductor
 bslib              0.6.1     2023-11-28 [1] CRAN (R 4.3.0)
 cachem             1.0.8     2023-05-01 [1] CRAN (R 4.3.0)
 cli                3.6.2     2023-12-11 [1] CRAN (R 4.3.0)
 cluster            2.1.6     2023-12-01 [1] CRAN (R 4.3.0)
 codetools          0.2-19    2023-02-01 [1] CRAN (R 4.3.2)
 colorspace         2.1-0     2023-01-23 [1] CRAN (R 4.3.0)
 crayon             1.5.2     2022-09-29 [1] CRAN (R 4.3.0)
 crosstalk          1.2.1     2023-11-23 [1] CRAN (R 4.3.0)
 curl               5.2.0     2023-12-08 [1] CRAN (R 4.3.0)
 data.table         1.14.10   2023-12-08 [1] CRAN (R 4.3.0)
 DBI                1.2.0     2023-12-21 [1] CRAN (R 4.3.0)
 dbplyr             2.4.0     2023-10-26 [1] CRAN (R 4.3.0)
 digest             0.6.33    2023-07-07 [1] CRAN (R 4.3.0)
 dplyr            * 1.1.4     2023-11-17 [1] CRAN (R 4.3.0)
 DT                 0.31      2023-12-09 [1] CRAN (R 4.3.0)
 ellipsis           0.3.2     2021-04-29 [1] CRAN (R 4.3.0)
 evaluate           0.23      2023-11-01 [1] CRAN (R 4.3.0)
 fansi              1.0.6     2023-12-08 [1] CRAN (R 4.3.0)
 farver             2.1.1     2022-07-06 [1] CRAN (R 4.3.0)
 fastmap            1.1.1     2023-02-24 [1] CRAN (R 4.3.0)
 filelock           1.0.3     2023-12-11 [1] CRAN (R 4.3.0)
 fs                 1.6.3     2023-07-20 [1] CRAN (R 4.3.0)
 generics           0.1.3     2022-07-05 [1] CRAN (R 4.3.0)
 GenomeInfoDb       1.38.5    2023-12-28 [1] Bioconductor 3.18 (R 4.3.2)
 GenomeInfoDbData   1.2.11    2023-12-22 [1] Bioconductor
 ggplot2          * 3.4.4     2023-10-12 [1] CRAN (R 4.3.0)
 glue               1.6.2     2022-02-24 [1] CRAN (R 4.3.0)
 gtable             0.3.4     2023-08-21 [1] CRAN (R 4.3.0)
 highr              0.10      2022-12-22 [1] CRAN (R 4.3.0)
 hms                1.1.3     2023-03-21 [1] CRAN (R 4.3.0)
 htmltools          0.5.7     2023-11-03 [1] CRAN (R 4.3.0)
 htmlwidgets        1.6.4     2023-12-06 [1] CRAN (R 4.3.0)
 httr               1.4.7     2023-08-15 [1] CRAN (R 4.3.0)
 igraph             1.6.0     2023-12-11 [1] CRAN (R 4.3.0)
 IRanges            2.36.0    2023-10-24 [1] Bioconductor
 irlba              2.3.5.1   2022-10-03 [1] CRAN (R 4.3.0)
 janitor            2.2.0     2023-02-02 [1] CRAN (R 4.3.0)
 jquerylib          0.1.4     2021-04-26 [1] CRAN (R 4.3.0)
 jsonlite           1.8.8     2023-12-04 [1] CRAN (R 4.3.0)
 kableExtra         1.3.4     2021-02-20 [1] CRAN (R 4.3.0)
 KEGGREST           1.42.0    2023-10-24 [1] Bioconductor
 knitr              1.45      2023-10-30 [1] CRAN (R 4.3.0)
 lattice            0.22-5    2023-10-24 [1] CRAN (R 4.3.0)
 lazyeval           0.2.2     2019-03-15 [1] CRAN (R 4.3.0)
 lifecycle          1.0.4     2023-11-07 [1] CRAN (R 4.3.0)
 lubridate          1.9.3     2023-09-27 [1] CRAN (R 4.3.0)
 magrittr           2.0.3     2022-03-30 [1] CRAN (R 4.3.0)
 Matrix             1.6-4     2023-11-30 [1] CRAN (R 4.3.0)
 memoise            2.0.1     2021-11-26 [1] CRAN (R 4.3.0)
 mime               0.12      2021-09-28 [1] CRAN (R 4.3.0)
 munsell            0.5.0     2018-06-12 [1] CRAN (R 4.3.0)
 NLP              * 0.2-1     2020-10-14 [1] CRAN (R 4.3.0)
 paletteer          1.6.0     2024-01-21 [1] CRAN (R 4.3.0)
 pillar             1.9.0     2023-03-22 [1] CRAN (R 4.3.0)
 pkgconfig          2.0.3     2019-09-22 [1] CRAN (R 4.3.0)
 plotly           * 4.10.4    2024-01-13 [1] CRAN (R 4.3.0)
 png                0.1-8     2022-11-29 [1] CRAN (R 4.3.0)
 polite           * 0.1.3     2023-06-30 [1] CRAN (R 4.3.0)
 prettyunits        1.2.0     2023-09-24 [1] CRAN (R 4.3.0)
 prismatic          1.1.1     2022-08-15 [1] CRAN (R 4.3.0)
 progress           1.2.3     2023-12-06 [1] CRAN (R 4.3.0)
 purrr              1.0.2     2023-08-10 [1] CRAN (R 4.3.0)
 R6                 2.5.1     2021-08-19 [1] CRAN (R 4.3.0)
 rappdirs           0.3.3     2021-01-31 [1] CRAN (R 4.3.0)
 ratelimitr         0.4.1     2018-10-07 [1] CRAN (R 4.3.0)
 Rcpp               1.0.11    2023-07-06 [1] CRAN (R 4.3.0)
 RcppAnnoy          0.0.21    2023-07-02 [1] CRAN (R 4.3.0)
 RCurl              1.98-1.13 2023-11-02 [1] CRAN (R 4.3.0)
 readr              2.1.4     2023-02-10 [1] CRAN (R 4.3.0)
 rematch2           2.1.2     2020-05-01 [1] CRAN (R 4.3.0)
 rlang              1.1.2     2023-11-04 [1] CRAN (R 4.3.0)
 rmarkdown          2.25      2023-09-18 [1] CRAN (R 4.3.0)
 robotstxt          0.7.13    2020-09-03 [1] CRAN (R 4.3.0)
 RSQLite            2.3.4     2023-12-08 [1] CRAN (R 4.3.0)
 rstudioapi         0.15.0    2023-07-07 [1] CRAN (R 4.3.0)
 Rtsne              0.17      2023-12-07 [1] CRAN (R 4.3.0)
 rvest            * 1.0.3     2022-08-19 [1] CRAN (R 4.3.0)
 S4Vectors          0.40.2    2023-11-23 [1] Bioconductor
 sass               0.4.8     2023-12-06 [1] CRAN (R 4.3.0)
 scales             1.3.0     2023-11-28 [1] CRAN (R 4.3.0)
 selectr            0.4-2     2019-11-20 [1] CRAN (R 4.3.0)
 sessioninfo        1.2.2     2021-12-06 [1] CRAN (R 4.3.0)
 slam               0.1-50    2022-01-08 [1] CRAN (R 4.3.0)
 snakecase          0.11.1    2023-08-27 [1] CRAN (R 4.3.0)
 spiderbar          0.2.5     2023-02-11 [1] CRAN (R 4.3.0)
 stringi            1.8.3     2023-12-11 [1] CRAN (R 4.3.0)
 stringr            1.5.1     2023-11-14 [1] CRAN (R 4.3.0)
 svglite            2.1.3     2023-12-08 [1] CRAN (R 4.3.0)
 systemfonts        1.0.5     2023-10-09 [1] CRAN (R 4.3.0)
 tibble             3.2.1     2023-03-20 [1] CRAN (R 4.3.0)
 tidyr              1.3.1     2024-01-24 [1] CRAN (R 4.3.2)
 tidyselect         1.2.0     2022-10-10 [1] CRAN (R 4.3.0)
 timechange         0.2.0     2023-01-11 [1] CRAN (R 4.3.0)
 tm               * 0.7-11    2023-02-05 [1] CRAN (R 4.3.0)
 tzdb               0.4.0     2023-05-12 [1] CRAN (R 4.3.0)
 usethis            2.2.2     2023-07-06 [1] CRAN (R 4.3.0)
 utf8               1.2.4     2023-10-22 [1] CRAN (R 4.3.0)
 uwot               0.1.16    2023-06-29 [1] CRAN (R 4.3.0)
 vctrs              0.6.5     2023-12-01 [1] CRAN (R 4.3.0)
 viridisLite        0.4.2     2023-05-02 [1] CRAN (R 4.3.0)
 vroom              1.6.5     2023-12-05 [1] CRAN (R 4.3.0)
 webshot            0.5.5     2023-06-26 [1] CRAN (R 4.3.0)
 withr              2.5.2     2023-10-30 [1] CRAN (R 4.3.0)
 xfun               0.41      2023-11-01 [1] CRAN (R 4.3.0)
 XML                3.99-0.16 2023-11-29 [1] CRAN (R 4.3.0)
 xml2               1.3.6     2023-12-04 [1] CRAN (R 4.3.0)
 XVector            0.42.0    2023-10-24 [1] Bioconductor
 yaml               2.3.8     2023-12-11 [1] CRAN (R 4.3.0)
 zlibbioc           1.48.0    2023-10-24 [1] Bioconductor

 [1] /Library/Frameworks/R.framework/Versions/4.3-x86_64/Resources/library

──────────────────────────────────────────────────────────────────────────────