Benchmarking Negative Binomial GEE Model Backends

Author: Jack Leary
Affiliation: University of Florida
Published: April 16, 2024

Introduction

In the course of my recent work on trajectory differential expression, I’ve spent a fair amount of time fitting generalized estimating equation (GEE) models. These models are like classical GLMs in that they can handle non-normally distributed response variables via a specified link function & iterative estimation of coefficients, but they differ in that they account for the variation inherent to longitudinal or otherwise clustered datasets, in which observations are measured repeatedly from multiple subjects & are thus not independent. Since scRNA-seq experiments are almost always composed of samples from multiple subjects nowadays, GEEs can be of great use when building models of gene expression, as they allow us to be more confident that our standard errors are accurate. Note that unlike (generalized) linear mixed models, the GEE is a marginal model and thus provides only a population-level fit; GLMMs are conditional models, as they provide subject-specific fits.
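
To make the marginal vs. conditional distinction concrete, here’s a minimal sketch (not run anywhere else in this post, and assuming the geeM & lme4 packages are available; a Poisson response keeps the toy example simple): the GEE returns population-averaged coefficients with a working correlation structure, while the GLMM returns subject-specific coefficients via a random intercept per subject.

Code
# toy illustration only: marginal (GEE) vs. conditional (GLMM) fits on clustered counts
set.seed(312)
toy_df <- data.frame(ID = factor(rep(seq(10), each = 20)), 
                     X = rnorm(200))
toy_df$Y <- rpois(200, lambda = exp(0.5 + 0.25 * toy_df$X))
# marginal model: population-averaged coefficients + sandwich standard errors
gee_fit <- geeM::geem(Y ~ X, 
                      id = toy_df$ID, 
                      data = toy_df, 
                      family = poisson(link = "log"), 
                      corstr = "exchangeable")
# conditional model: subject-specific coefficients via a random intercept per subject
glmm_fit <- lme4::glmer(Y ~ X + (1 | ID), 
                        data = toy_df, 
                        family = poisson(link = "log"))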

Since single cell mRNA abundance follows the negative binomial distribution, it’s necessary that whatever GEE fitting algorithm we use supports that distribution. Since the negative binomial is a two-parameter distribution, this complicates things; the most commonly-used R packages for GEEs do not support NB models. We do have a couple of options in R though; the Python statsmodels ecosystem supports NB GEEs as well, and there’s a Julia package too. Our goal here will be to benchmark the available options on runtime, and on model accuracy as measured by the fitted values & the estimated parameters. We’ll keep our simulated data simple, with a negative binomial response \(Y\) and a single continuously-valued covariate \(X\); thus we’ll estimate the coefficient vector \(\boldsymbol{\beta} = [\beta_0, \beta_1]^T\) using each method for each set of simulations.
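
For reference, every backend we’ll benchmark solves (a variant of) the same set of estimating equations for \(\boldsymbol{\beta}\), with notation varying slightly between packages; the clustering enters through the working covariance \(\mathbf{V}_i\), which combines the marginal NB variances with a working correlation matrix \(\mathbf{R}(\alpha)\) whose parameter \(\alpha\) we’ll also compare across implementations below:

\[ \sum_{i=1}^n \mathbf{D}_i^T \mathbf{V}_i^{-1} \left( \mathbf{Y}_i - \boldsymbol{\mu}_i \right) = \mathbf{0}, \quad \text{where} \quad \mathbf{D}_i = \frac{\partial \boldsymbol{\mu}_i}{\partial \boldsymbol{\beta}} \quad \text{and} \quad \mathbf{V}_i = \phi \, \mathbf{A}_i^{1/2} \mathbf{R}(\alpha) \mathbf{A}_i^{1/2} \]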

Libraries

R

Code
library(dplyr)       # data manipulation
library(ggplot2)     # plots 
library(JuliaCall)   # call Julia from R
library(reticulate)  # call Python from R

Python

We’ll call the Python code from R using reticulate, but we’ll need to make sure to use the virtual environment I set up previously that has the statsmodels and pandas libraries (and their various dependencies) installed.

Code
use_virtualenv("~/Desktop/Python/science/venv/", required = TRUE)
sm <- import("statsmodels.api")
pd <- import("pandas")

Julia

We’ll also need to activate a Julia environment, into which I’ve installed the GEE.jl package and its necessary dependencies.

Code
julia_setup(verbose = FALSE)
julia_command('using Pkg; Pkg.activate("/Users/jack/Desktop/Julia/science/");')
julia_command("using Distributions, GLM, GEE, DataFrames;")

Data

Let’s first simulate an example dataset to show what each subsequent simulation will look like. Our parameters are generated from the following distributions (note that we use the mean & dispersion parameterization of the negative binomial):

\[ \begin{aligned} \boldsymbol{\beta} &\sim \mathcal{N}(\mu = 0,\: \sigma^2 = 0.25) \\ X &\sim \mathcal{N}(\mu = 0,\: \sigma^2 = 1) \\ Y &\sim \text{NB}(\mu = e^{\beta_0 + X\beta_1},\: \theta = 3) \end{aligned} \]
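
For reference, under this mean & dispersion parameterization (the same one used by rnbinom() via its mu & size arguments), the variance of \(Y\) is quadratic in the mean, which is what makes the NB a sensible model for overdispersed counts:

\[ \text{Var}(Y) = \mu + \frac{\mu^2}{\theta} \]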

We’ll generate \(n = 25\) samples to start, though our actual sample size in the simulations will be larger than this. We also add uniformly-sampled dropout at a rate of 10% to our response \(Y\) to mimic the technical variation inherent to scRNA-seq experiments. Lastly, I’ll note that we’re making this example slightly less complex by not including multiple observations per subject, which we will do in our actual experiment.

Code
set.seed(312)
sim_beta <- rnorm(2, mean = 0, sd = 0.5)
sim_df <- data.frame(X = rnorm(25, mean = 0, sd = 1)) %>% 
          mutate(Beta_X = sim_beta[1] + sim_beta[2] * X, 
                 Exp_Beta_X = exp(Beta_X)) %>% 
          rowwise() %>% 
          mutate(Y = rnbinom(1, mu = Exp_Beta_X, size = 3)) %>% 
          ungroup()
dropout_idx <- sample(seq(nrow(sim_df)), size = round(0.1 * nrow(sim_df)))
sim_df$Y[dropout_idx] <- 0

Next, since we’re dealing with independent observations, we fit an NB GLM via the MASS package, which we’ll use to generate fitted values and standard errors.

Code
nb_glm <- MASS::glm.nb(Y ~ X, 
                       data = sim_df, 
                       link = "log")

Let’s visualize the results. We see that the confidence interval around the fitted mean widens as \(X\) and \(Y\) increase. This data looks similar enough to what I see in real scRNA-seq studies, so I’m satisfied with using the above simulation scheme for a more rigorous experiment.

Code
sim_df %>% 
  mutate(glm_pred = predict(nb_glm), 
         glm_pred_se = predict(nb_glm, se.fit = TRUE)$se.fit, 
         glm_fit = exp(glm_pred), 
         glm_fit_ci_lo = exp(glm_pred - 1.96 * glm_pred_se), 
         glm_fit_ci_hi = exp(glm_pred + 1.96 * glm_pred_se)) %>% 
  ggplot(aes(x = X, y = Y)) + 
  geom_point(size = 3) + 
  geom_line(aes(x = X, y = glm_fit), 
            color = "forestgreen", 
            size = 1) + 
  geom_ribbon(aes(x = X, ymin = glm_fit_ci_lo, ymax = glm_fit_ci_hi, color = NULL), 
              fill = "forestgreen", 
              alpha = 0.4) + 
  labs(x = "X", 
       y = "Y", 
       title = "Simulated Negative Binomial mRNA Counts", 
       subtitle = "Fitted values from NB GLM with Wald 95% C.I.") + 
  theme_classic(base_size = 14) + 
  theme(plot.subtitle = element_text(face = "italic", size = 12))

Simulations

Helper Functions

Using the same general simulation procedure from the example above, we’ll write a function to simulate negative binomial data suitable for fitting GEEs. We’ll generate multiple observations per subject, though the resulting data will be in the same format, albeit with an added subject ID column. We’ll keep things simple by allocating our total sample size evenly across subjects & keeping \(\boldsymbol{\beta}\) constant across all subjects.

Code
sim_gee_data <- function(n.subjects = NULL, 
                         n.per.subject = NULL, 
                         dropout.rate = 0.1, 
                         theta.y = 3) {
  # simulate true coefficients
  sim_beta <- rnorm(2, mean = 0, sd = 0.5)
  # simulate X & Y per subject
  subject_names <- paste0("S", seq(n.subjects))
  subject_sims <- purrr::map(subject_names, function(s) {
    sim_df <- data.frame(ID = s, 
                         X = rnorm(n.per.subject, mean = 0, sd = 1)) %>% 
              mutate(Beta_X = sim_beta[1] + sim_beta[2] * X, 
                     Exp_Beta_X = exp(Beta_X)) %>% 
              rowwise() %>% 
              mutate(Y = rnbinom(1, mu = Exp_Beta_X, size = theta.y)) %>% 
              ungroup()
    return(sim_df)
  })
  subject_sim_df <- purrr::reduce(subject_sims, rbind) %>% 
                    mutate(ID = as.factor(ID))
  # add stochastic dropout
  dropout_idx <- sample(seq(nrow(subject_sim_df)), size = round(dropout.rate * nrow(subject_sim_df)))
  subject_sim_df$Y[dropout_idx] <- 0
  res_list <- list(beta = sim_beta, 
                   sim_df = subject_sim_df)
  return(res_list)
}
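
As a quick sanity check (not part of the benchmark itself), we can confirm that the simulator returns the true coefficient vector along with a long-format data frame containing one row per observation; the column names below follow the function as written above.

Code
# hypothetical quick check of the simulator's output structure
set.seed(312)
test_sim <- sim_gee_data(n.subjects = 3, 
                         n.per.subject = 5, 
                         dropout.rate = 0.1, 
                         theta.y = 3)
test_sim$beta          # length-2 vector of true coefficients
head(test_sim$sim_df)  # columns: ID, X, Beta_X, Exp_Beta_X, Y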

Next, we’ll write a function to run each of our different model types. For the R implementations of the GEE framework, we need to provide an estimate of \(\theta\), since the estimation procedure depends on \(\theta\) being “known”. To approximate this, we provide the value of \(\theta\) estimated under the intercept-only model, which in my experience has been a good enough approximation for simpler datasets like this one.
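
To make the intercept-only approximation concrete, here’s a small sketch (using the example data simulated earlier) comparing the moment-based estimate of \(\theta\) given only the overall mean of \(Y\) with the ML estimate from an intercept-only NB GLM; the two won’t match exactly, but they should land in the same ballpark for data like these.

Code
# moment-based theta given only the marginal mean of Y
theta_mm <- MASS::theta.mm(y = sim_df$Y, 
                           mu = mean(sim_df$Y), 
                           dfr = nrow(sim_df) - 1)
# ML theta from an intercept-only NB GLM
theta_ml <- MASS::glm.nb(Y ~ 1, data = sim_df)$theta
c(moment_based = theta_mm, intercept_only_ml = theta_ml)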

Code
run_model <- function(sim.data = NULL, model.type = NULL) {
  # run correct model framework
  if (model.type == "NB GLM") {
    start_time <- Sys.time()
    model_fit <- MASS::glm.nb(Y ~ X, 
                              data = sim.data, 
                              link = "log")
    diff_time <- Sys.time() - start_time
    model_preds <- predict(model_fit, type = "link")
    model_coef <- coef(model_fit)
    model_est_alpha <- NA_real_
  } else if (model.type == "geeM") {
    start_time <- Sys.time()
    theta_hat <- MASS::theta.mm(y = sim.data$Y,
                                mu = mean(sim.data$Y),
                                dfr = nrow(sim.data) - 1)
    model_fit <- geeM::geem(Y ~ X, 
                            id = sim.data$ID, 
                            data = sim.data, 
                            family = MASS::negative.binomial(theta = theta_hat, link = "log"), 
                            corstr = "exchangeable", 
                            sandwich = TRUE)
    diff_time <- Sys.time() - start_time
    model_preds <- predict(model_fit, type = "link")
    model_coef <- coef(model_fit)
    model_est_alpha <- model_fit$alpha
  } else if (model.type == "mmmgee") {
    start_time <- Sys.time()
    theta_hat <- MASS::theta.mm(y = sim.data$Y,
                                mu = mean(sim.data$Y),
                                dfr = nrow(sim.data) - 1)
    model_fit <- mmmgee::geem2(Y ~ X, 
                               id = sim.data$ID, 
                               data = sim.data, 
                               family = MASS::negative.binomial(theta = theta_hat, link = "log"), 
                               corstr = "exchangeable", 
                               sandwich = TRUE)
    diff_time <- Sys.time() - start_time
    model_preds <- predict(model_fit, type = "link")
    model_coef <- coef(model_fit)
    model_est_alpha <- model_fit$alpha
  } else if (model.type == "statsmodels") {
    start_time <- Sys.time()
    nb_family <- sm$families$NegativeBinomial(link = sm$genmod$families$links$log)
    cor_structure <- sm$cov_struct$Exchangeable()
    py_gee <- sm$GEE$from_formula("Y ~ X", 
                                  "ID", 
                                  data = sim.data, 
                                  cov_struct = cor_structure, 
                                  family = nb_family)
    model_fit <- py_gee$fit()
    diff_time <- Sys.time() - start_time
    model_preds <- model_fit$get_prediction()$linpred$predicted_mean
    model_coef <- model_fit$params
    model_est_alpha <- model_fit$cov_struct$dep_params
  } else if (model.type == "GEE.jl") {
    start_time <- Sys.time()
    JuliaCall::julia_assign("sim_data", sim.data)
    JuliaCall::julia_command('model_fit = gee(@formula(Y ~ X), sim_data, sim_data.ID, NegativeBinomial(), ExchangeableCor(), LogLink(), cov_type="robust");')
    diff_time <- Sys.time() - start_time
    model_est_alpha <- JuliaCall::julia_eval("GEE.corparams(model_fit)")
    model_coef <- JuliaCall::julia_eval("GEE.coef(model_fit)")
    model_preds <- log(JuliaCall::julia_eval("GEE.predict(model_fit)"))  # GEE.predict() gives fitted values 
  }
  # format results & compute model error
  names(model_coef) <- c("B0", "B1")
  model_rmse <- yardstick::rmse_vec(truth = sim.data$Y, 
                                    estimate = exp(model_preds))
  model_huber_loss <- yardstick::huber_loss_vec(truth = sim.data$Y, 
                                                estimate = exp(model_preds), 
                                                delta = 2)
  res_list <- list(model_type = model.type, 
                   model_runtime = as.numeric(diff_time), 
                   model_runtime_units = attributes(diff_time)$units, 
                   model_rmse = model_rmse, 
                   model_huber_loss = model_huber_loss, 
                   model_beta = model_coef, 
                   model_alpha = model_est_alpha)
  return(res_list)
}
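
To illustrate the structure of the output, here’s a quick usage sketch running a single fitter on one simulated dataset (the element names follow the res_list defined above); the full benchmark loop comes next.

Code
# hypothetical single run, just to show what run_model() returns
set.seed(312)
one_sim <- sim_gee_data(n.subjects = 3, 
                        n.per.subject = 100, 
                        dropout.rate = 0.1, 
                        theta.y = 3)
one_res <- run_model(sim.data = one_sim$sim_df, model.type = "geeM")
one_res$model_beta     # named coefficient vector with elements B0 & B1
one_res$model_rmse     # RMSE of the fitted values vs. the observed counts
one_res$model_runtime  # elapsed fitting time (units in model_runtime_units)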

Running the Experiment

We fit each model over 100 iterations, and save the results in a list (purrr is great for this kind of thing).

Code
sim_list <- purrr::map(seq(100), function(i) {
  set.seed(i)
  sim_data <- sim_gee_data(n.subjects = 3, 
                           n.per.subject = 400, 
                           dropout.rate = 0.1, 
                           theta.y = 5)
  glm_res <- run_model(sim.data = sim_data$sim_df, model.type = "NB GLM")
  geeM_res <- run_model(sim.data = sim_data$sim_df, model.type = "geeM")
  mmmgee_res <- run_model(sim.data = sim_data$sim_df, model.type = "mmmgee")
  statsmodels_res <- run_model(sim.data = sim_data$sim_df, model.type = "statsmodels")
  julia_GEE_res <- run_model(sim.data = sim_data$sim_df, model.type = "GEE.jl")
  all_models <- list(glm_res, 
                     geeM_res, 
                     mmmgee_res, 
                     statsmodels_res, 
                     julia_GEE_res)
  overall_res_df <- data.frame(Iter = i, 
                               Beta_0 = sim_data$beta[1], 
                               Beta_1 = sim_data$beta[2], 
                               Method = purrr::map_chr(all_models, \(x) x$model_type), 
                               Est_Beta_0 = purrr::map_dbl(all_models, \(x) x$model_beta[1]), 
                               Est_Beta_1 = purrr::map_dbl(all_models, \(x) x$model_beta[2]), 
                               Est_Alpha = purrr::map_dbl(all_models, \(x) x$model_alpha), 
                               RMSE = purrr::map_dbl(all_models, \(x) x$model_rmse), 
                               Huber_Loss = purrr::map_dbl(all_models, \(x) x$model_huber_loss), 
                               Runtime = purrr::map_dbl(all_models, \(x) x$model_runtime), 
                               Runtime_Units = purrr::map_chr(all_models, \(x) x$model_runtime_units))
  return(overall_res_df)
})

Let’s coerce the results to a dataframe & add a couple more features.

Code
sim_res_df <- purrr::reduce(sim_list, rbind) %>% 
              mutate(Abs_Error_Beta0 = abs(Beta_0 - Est_Beta_0), 
                     Abs_Error_Beta1 = abs(Beta_1 - Est_Beta_1), 
                     Runtime_Seconds = if_else(Runtime_Units == "secs", Runtime, Runtime * 60))

Analysis

Now that we have everything simulated & recorded, we can analyze the data. Let’s start by just looking at the distribution of runtime per method. While we should have expected the GLM to be quicker than the GEEs, I’m pretty surprised by just how much faster the statsmodels GEEs were than either R implementation. The GEE.jl implementation is very fast too, but we can see a known issue with Julia: the first execution takes far longer than anything else, since the code is compiled on the first call.
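
To put a rough number on that warm-up cost before plotting, we can compare GEE.jl’s first-iteration runtime against the remaining iterations using the columns already in sim_res_df (a quick ad hoc summary, nothing more):

Code
# rough estimate of the one-time compilation overhead for GEE.jl
sim_res_df %>% 
  filter(Method == "GEE.jl") %>% 
  summarise(first_run_secs = Runtime_Seconds[Iter == 1], 
            median_other_secs = median(Runtime_Seconds[Iter > 1]))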

Code
ggplot(sim_res_df, aes(x = Method, y = Runtime_Seconds, fill = Method)) + 
  geom_violin(scale = "width", 
              draw_quantiles = 0.5, 
              size = 1) + 
  scale_fill_manual(values = paletteer::paletteer_d("ggsci::category10_d3")) + 
  labs(y = "Runtime (seconds)", fill = "Method") + 
  theme_classic(base_size = 14) + 
  theme(axis.title.x = element_blank()) + 
  guides(fill = guide_legend(override.aes = list(color = NULL)))
Warning in regularize.values(x, y, ties, missing(ties), na.rm = na.rm):
collapsing to unique 'x' values

Moving on to model accuracy, let’s look at the RMSE & Huber loss. All methods seem to have essentially the same distribution.

Code
sim_res_df %>% 
  select(Iter, Method, RMSE, Huber_Loss) %>% 
  tidyr::pivot_longer(cols = c(RMSE, Huber_Loss), names_to = "Metric", values_to = "Metric_Value") %>% 
  mutate(Metric = if_else(Metric == "RMSE", "RMSE", "Huber")) %>% 
  ggplot(aes(x = Method, y = Metric_Value, fill = Method)) + 
  facet_wrap(~Metric, 
             nrow = 2, 
             scales = "free_y") + 
  geom_violin(scale = "width", 
              draw_quantiles = 0.5, 
              size = 1) + 
  scale_fill_manual(values = paletteer::paletteer_d("ggsci::category10_d3")) + 
  labs(y = "Loss", fill = "Method") + 
  theme_classic(base_size = 14) + 
  theme(axis.title.x = element_blank()) + 
  guides(fill = guide_legend(override.aes = list(color = NULL)))

How about the estimated intercepts? It again seems that all methods have essentially the same error distribution.

Code
ggplot(sim_res_df, aes(x = Method, y = Abs_Error_Beta0, fill = Method)) + 
  geom_violin(scale = "width", 
              draw_quantiles = 0.5, 
              size = 1) + 
  scale_fill_manual(values = paletteer::paletteer_d("ggsci::category10_d3")) + 
  labs(y = latex2exp::TeX(r"(Absolute Error for $\beta_0$)"), fill = "Method") + 
  theme_classic(base_size = 14) + 
  theme(axis.title.x = element_blank()) + 
  guides(fill = guide_legend(override.aes = list(color = NULL)))

As with \(\beta_0\), there is no clear difference between methods in how well they estimate \(\beta_1\).

Code
ggplot(sim_res_df, aes(x = Method, y = Abs_Error_Beta1, fill = Method)) + 
  geom_violin(scale = "width", 
              draw_quantiles = 0.5, 
              size = 1) + 
  scale_fill_manual(values = paletteer::paletteer_d("ggsci::category10_d3")) + 
  labs(y = latex2exp::TeX(r"(Absolute Error for $\beta_1$)"), fill = "Method") + 
  theme_classic(base_size = 14) + 
  theme(axis.title.x = element_blank()) + 
  guides(fill = guide_legend(override.aes = list(color = NULL)))

Lastly, let’s check to see if any of the GEE methods significantly differs from the others in terms of the estimated within-group correlation parameter \(\alpha\).

Code
sim_res_df %>% 
  filter(!is.na(Est_Alpha)) %>% 
  ggplot(aes(x = Est_Alpha, fill = Method, color = Method)) + 
  facet_wrap(~Method, scales = "free_y") + 
  geom_density(alpha = 0.6, size = 1) + 
  scale_color_manual(values = paletteer::paletteer_d("ggsci::category10_d3")[-4]) + 
  scale_fill_manual(values = paletteer::paletteer_d("ggsci::category10_d3")[-4]) + 
  labs(x = latex2exp::TeX(r"(Estimated Correlation Parameter $\alpha$)"), 
       y = "Density", 
       fill = "Method") + 
  theme_classic(base_size = 14) + 
  guides(fill = guide_legend(override.aes = list(alpha = 1, color = NULL)))

Conclusions

Since the estimated coefficients and correlation parameters are pretty much the same across methods, and the RMSE / Huber loss don’t differ much either, it makes sense to choose the modeling framework with the fastest runtime as the best option. For GEEs this is statsmodels or GEE.jl, both of which outperformed the R-based geeM and mmmgee quite handily (mean and median values are shown below for each method). The Julia implementation is good too, though somewhat slower than statsmodels, and the time it takes to perform the first fit is somewhat annoying. In addition, the statsmodels implementation varies less in runtime than all the other options. This would seem to be an easy decision, except that it could be difficult to release & support R packages that rely on a Python library rather than a native R backend, since not all R users know Python, have it installed, or are able to debug its output. GEE.jl obviously carries the same drawbacks, with the additional caveat that Julia is a much less widely-used language than either R or Python. This is a tricky decision & I’m not sure of the right course to take yet, but I am impressed with how well statsmodels performs and will absolutely be considering using it more in the future.

Code
sim_res_df %>% 
  with_groups(Method, 
              summarise, 
              mu = mean(Runtime_Seconds), 
              med = median(Runtime_Seconds), 
              sd = sd(Runtime_Seconds)) %>% 
  arrange(mu) %>% 
  kableExtra::kbl(digits = 3, 
                  booktabs = TRUE, 
                  caption = "Runtime measured in seconds", 
                  col.names = c("Method", "Mean Runtime", "Median Runtime", "S.D. Runtime")) %>% 
  kableExtra::kable_classic(full_width = FALSE)
Runtime measured in seconds
Method         Mean Runtime   Median Runtime   S.D. Runtime
statsmodels           0.020            0.019          0.002
NB GLM                0.034            0.034          0.011
GEE.jl                0.271            0.098          1.717
mmmgee                2.652            2.495          0.492
geeM                  2.750            2.633          0.490

Session Info

Code
sessioninfo::session_info()
─ Session info ───────────────────────────────────────────────────────────────
 setting  value
 version  R version 4.2.1 (2022-06-23)
 os       macOS Big Sur ... 10.16
 system   x86_64, darwin17.0
 ui       X11
 language (EN)
 collate  en_US.UTF-8
 ctype    en_US.UTF-8
 tz       America/New_York
 date     2023-03-28
 pandoc   2.19.2 @ /usr/local/bin/ (via rmarkdown)

─ Packages ───────────────────────────────────────────────────────────────────
 package     * version  date (UTC) lib source
 assertthat    0.2.1    2019-03-21 [1] CRAN (R 4.2.0)
 cli           3.3.0    2022-04-25 [1] CRAN (R 4.2.0)
 codetools     0.2-18   2020-11-04 [1] CRAN (R 4.2.1)
 colorspace    2.0-3    2022-02-21 [1] CRAN (R 4.2.0)
 crayon        1.5.1    2022-03-26 [1] CRAN (R 4.2.0)
 DBI           1.1.3    2022-06-18 [1] CRAN (R 4.2.0)
 digest        0.6.29   2021-12-01 [1] CRAN (R 4.2.0)
 dplyr       * 1.0.9    2022-04-28 [1] CRAN (R 4.2.0)
 ellipsis      0.3.2    2021-04-29 [1] CRAN (R 4.2.0)
 evaluate      0.16     2022-08-09 [1] CRAN (R 4.2.0)
 fansi         1.0.3    2022-03-24 [1] CRAN (R 4.2.0)
 farver        2.1.1    2022-07-06 [1] CRAN (R 4.2.0)
 fastmap       1.1.0    2021-01-25 [1] CRAN (R 4.2.0)
 geeM          0.10.1   2018-06-18 [1] CRAN (R 4.2.0)
 generics      0.1.3    2022-07-05 [1] CRAN (R 4.2.0)
 ggplot2     * 3.3.6    2022-05-03 [1] CRAN (R 4.2.0)
 glue          1.6.2    2022-02-24 [1] CRAN (R 4.2.0)
 gtable        0.3.0    2019-03-25 [1] CRAN (R 4.2.0)
 here          1.0.1    2020-12-13 [1] CRAN (R 4.2.0)
 highr         0.9      2021-04-16 [1] CRAN (R 4.2.0)
 htmltools     0.5.3    2022-07-18 [1] CRAN (R 4.2.0)
 htmlwidgets   1.5.4    2021-09-08 [1] CRAN (R 4.2.0)
 httr          1.4.4    2022-08-17 [1] CRAN (R 4.2.0)
 jsonlite      1.8.0    2022-02-22 [1] CRAN (R 4.2.0)
 JuliaCall   * 0.17.5   2022-09-08 [1] CRAN (R 4.2.0)
 kableExtra    1.3.4    2021-02-20 [1] CRAN (R 4.2.0)
 knitr         1.40     2022-08-24 [1] CRAN (R 4.2.0)
 labeling      0.4.2    2020-10-20 [1] CRAN (R 4.2.0)
 latex2exp     0.9.4    2022-03-02 [1] CRAN (R 4.2.0)
 lattice       0.20-45  2021-09-22 [1] CRAN (R 4.2.1)
 lifecycle     1.0.1    2021-09-24 [1] CRAN (R 4.2.0)
 magrittr      2.0.3    2022-03-30 [1] CRAN (R 4.2.0)
 MASS          7.3-58.1 2022-08-03 [1] CRAN (R 4.2.0)
 Matrix        1.4-1    2022-03-23 [1] CRAN (R 4.2.1)
 mmmgee        1.20     2019-06-21 [1] CRAN (R 4.2.0)
 munsell       0.5.0    2018-06-12 [1] CRAN (R 4.2.0)
 mvtnorm       1.1-3    2021-10-08 [1] CRAN (R 4.2.0)
 paletteer     1.5.0    2022-10-19 [1] CRAN (R 4.2.0)
 pillar        1.8.1    2022-08-19 [1] CRAN (R 4.2.0)
 pkgconfig     2.0.3    2019-09-22 [1] CRAN (R 4.2.0)
 png           0.1-7    2013-12-03 [1] CRAN (R 4.2.0)
 prismatic     1.1.1    2022-08-15 [1] CRAN (R 4.2.0)
 purrr         0.3.4    2020-04-17 [1] CRAN (R 4.2.0)
 R6            2.5.1    2021-08-19 [1] CRAN (R 4.2.0)
 Rcpp          1.0.9    2022-07-08 [1] CRAN (R 4.2.0)
 rematch2      2.1.2    2020-05-01 [1] CRAN (R 4.2.0)
 reticulate  * 1.25     2022-05-11 [1] CRAN (R 4.2.0)
 rlang         1.0.4    2022-07-12 [1] CRAN (R 4.2.0)
 rmarkdown     2.16     2022-08-24 [1] CRAN (R 4.2.0)
 rprojroot     2.0.3    2022-04-02 [1] CRAN (R 4.2.0)
 rstudioapi    0.14     2022-08-22 [1] CRAN (R 4.2.0)
 rvest         1.0.3    2022-08-19 [1] CRAN (R 4.2.0)
 scales        1.2.1    2022-08-20 [1] CRAN (R 4.2.0)
 sessioninfo   1.2.2    2021-12-06 [1] CRAN (R 4.2.0)
 stringi       1.7.8    2022-07-11 [1] CRAN (R 4.2.0)
 stringr       1.4.1    2022-08-20 [1] CRAN (R 4.2.0)
 svglite       2.1.0    2022-02-03 [1] CRAN (R 4.2.0)
 systemfonts   1.0.4    2022-02-11 [1] CRAN (R 4.2.0)
 tibble        3.1.8    2022-07-22 [1] CRAN (R 4.2.0)
 tidyr         1.2.0    2022-02-01 [1] CRAN (R 4.2.0)
 tidyselect    1.1.2    2022-02-21 [1] CRAN (R 4.2.0)
 utf8          1.2.2    2021-07-24 [1] CRAN (R 4.2.0)
 vctrs         0.4.1    2022-04-13 [1] CRAN (R 4.2.0)
 viridisLite   0.4.1    2022-08-22 [1] CRAN (R 4.2.0)
 webshot       0.5.3    2022-04-14 [1] CRAN (R 4.2.0)
 withr         2.5.0    2022-03-03 [1] CRAN (R 4.2.0)
 xfun          0.32     2022-08-10 [1] CRAN (R 4.2.0)
 xml2          1.3.3    2021-11-30 [1] CRAN (R 4.2.0)
 yaml          2.3.5    2022-02-21 [1] CRAN (R 4.2.0)
 yardstick     1.0.0    2022-06-06 [1] CRAN (R 4.2.0)

 [1] /Library/Frameworks/R.framework/Versions/4.2/Resources/library

─ Python configuration ───────────────────────────────────────────────────────
 python:         /Users/jack/Desktop/Python/science/venv/bin/python
 libpython:      /usr/local/opt/python@3.8/Frameworks/Python.framework/Versions/3.8/lib/python3.8/config-3.8-darwin/libpython3.8.dylib
 pythonhome:     /Users/jack/Desktop/Python/science/venv:/Users/jack/Desktop/Python/science/venv
 virtualenv:     /Users/jack/Desktop/Python/science/venv/bin/activate_this.py
 version:        3.8.16 (default, Dec  7 2022, 01:36:11)  [Clang 14.0.0 (clang-1400.0.29.202)]
 numpy:          /Users/jack/Desktop/Python/science/venv/lib/python3.8/site-packages/numpy
 numpy_version:  1.22.4
 statsmodels:    /Users/jack/Desktop/Python/science/venv/lib/python3.8/site-packages/statsmodels
 
 NOTE: Python version was forced by use_python function

──────────────────────────────────────────────────────────────────────────────