R | Mikkel Meyer Andersen

Dummy variables in R

Dummy variables are important but also cause much frustration in intro-stat courses. Below I will demonstrate the concept via a linear regression model. The basic idea is that a factor $f$ with $k$ levels can be replaced by $k-1$ dummy variables that act as switches to select different levels. When all switches are turned off, the reference level is chosen. Mathematically, let $f$ be the factor with levels $l_0, l_1, \ldots, l_{k-1}$, i.
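As a minimal sketch of the switch idea in base R, assume a hypothetical factor f with three made-up levels a, b and c (these are illustrative, not taken from the post). With R's default treatment contrasts, model.matrix() builds an intercept column plus $k-1$ dummy columns, and rows at the reference level have every dummy switched off:

```r
# Hypothetical factor with k = 3 levels; "a" is the reference level
f <- factor(c("a", "b", "c", "a", "c"))

# Design matrix under the default treatment contrasts:
# one intercept column plus k - 1 dummy columns (fb and fc)
X <- model.matrix(~ f)
print(X)

# Rows where f is the reference level have both dummies equal to 0
print(X[f == "a", c("fb", "fc")])
```

The column names fb and fc are generated by R from the factor name and its levels; a different factor name or different level labels would give different column names.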

caracas: Computer Algebra in R via SymPy

It is with great pleasure that we can announce the release of caracas version 1.0.1 to CRAN (https://cran.r-project.org/package=caracas). The package enables users to do computer algebra from R using the Python library SymPy. You can now install the caracas package as follows: install.packages("caracas") And then load it by: library(caracas) The source code and the development version are available at https://github.com/r-cas/caracas/. Online documentation (of the development version) can be found at https://r-cas.

Shiny apps with math exercises

It is often very useful to practise mathematics by automatically generated exercises. One approach is multiple choice quizzes (MCQ), but it turns out to be fairly difficult to generate authentic wrong answers. Instead, we want the user to input the answer and be able to parse the answer and check whether this is the correct answer. There are many fun challenges in this, e.g. to verify that 2 is equal to 1 + 1 (as text strings the two are different, but mathematically they are equal, at least to a convenient approximation in this case).
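To illustrate the comparison problem, here is a minimal sketch in base R. The helper name answers_equal and its tolerance argument are hypothetical, and this is not necessarily how the Shiny apps do it. It parses both answers, evaluates them, and compares the numeric results rather than the text:

```r
# Hypothetical helper: compare two answers given as text by their values
answers_equal <- function(user_answer, correct_answer, tol = 1e-8) {
  # Evaluate in a fresh environment whose parent is the base environment
  env <- new.env(parent = baseenv())
  user_val <- eval(parse(text = user_answer), envir = env)
  correct_val <- eval(parse(text = correct_answer), envir = env)
  # all.equal() allows for small floating-point differences
  isTRUE(all.equal(user_val, correct_val, tolerance = tol))
}

identical("2", "1 + 1")     # compares the text strings
answers_equal("2", "1 + 1") # compares the evaluated values
```

Note that evaluating arbitrary user input with eval() is not safe in a public app; a real implementation would need to restrict what can be parsed, or hand the comparison to a computer algebra system.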

Ryacas version 1.1.0 published in Journal of Open Source Software and released to CRAN

It is with great pleasure that I can announce that Ryacas version 1.1.0 has now been accepted into the Journal of Open Source Software and that the same version has been released to CRAN. (The source code is available at https://github.com/mikldk/ryacas/.) I have written about Ryacas many times before. I will refer you to the “Getting started” and “The high-level (symbol) interface” vignettes or one of the others available at the CRAN page or the package’s website.

Ryacas version 1.0.0 released!

It is with great pleasure that I can announce that Ryacas version 1.0.0 is now released to CRAN (https://cran.r-project.org/package=Ryacas). I wish to thank all co-authors: Rob Goedman, Gabor Grothendieck, Søren Højsgaard, Grzegorz Mazur, Ayal Pinkus. It means that you can install the package by (possibly after binaries have been built): install.packages("Ryacas") Followed by: library(Ryacas) (The source code is available at https://github.com/mikldk/ryacas/.) Now you have the yacas computer algebra system fully available!

How much pizza and how much frozen yogurt? ...with Gröbner bases

In a recent blog post I tried to get yacas to solve a system of polynomial equations. Unfortunately it could not do that, so I solved it numerically instead. Now it is possible, along with many other systems of polynomial equations, because a small error in yacas has been fixed. The fix is also in Ryacas (development version), so hurry up and update Ryacas to the latest version 0.

Prediction intervals for Generalized Additive Models (GAMs)

Update on Aug 9, 2022: In the code chunk below, sd = summary(fit_gam)$scale) was changed to sd = sqrt(summary(fit_gam)$scale)): y_sim <- matrix(rnorm(n = prod(dim(exp_val_sim)), mean = exp_val_sim, sd = sqrt(summary(fit_gam)$scale)), nrow = nrow(exp_val_sim), ncol = ncol(exp_val_sim)) Thanks to David Kaplan (IRD, France). Finding prediction intervals (for future observations) is different from finding confidence intervals (for unknown population parameters). Here, I demonstrate one approach to doing so.
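The reason for the square root in the update is that rnorm() expects a standard deviation, whereas the scale parameter of a Gaussian model is a variance. A minimal sketch with made-up numbers: sigma2 and the all-zero exp_val_sim matrix below are hypothetical stand-ins, and no GAM is fitted here.

```r
set.seed(1)

# Hypothetical stand-ins for the fitted quantities
sigma2 <- 4                                      # a variance, playing the role of the scale
exp_val_sim <- matrix(0, nrow = 1000, ncol = 50) # simulated expected values

# Add observation noise; sd must be a standard deviation, hence sqrt()
y_sim <- matrix(rnorm(n = prod(dim(exp_val_sim)),
                      mean = exp_val_sim,
                      sd = sqrt(sigma2)),
                nrow = nrow(exp_val_sim),
                ncol = ncol(exp_val_sim))

# The empirical variance of the noise should be close to sigma2
var(as.vector(y_sim))
```

Passing the variance directly as sd would give noise that is too wide whenever the variance exceeds 1 and too narrow whenever it is below 1.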

The cost of evaluating an expression instead of using the language directly

In a recent blog post I used something like this (for use in a call to optim()):

    obj_fun_expr <- expression((x^4 + 4 * x^2 * y^2 - 12 * x^2 * a + 144 * x^2 - 48 * x * y * a + 144 * x * y - 4320 * x + 36 * y^2 - 2160 * y + 180 * a^2 + 32400)/16)
    f_expr <- function(par) {
      x <- par[1]
      y <- par[2]
      a <- par[3]
      val <- eval(obj_fun_expr, list(x = x, y = y, a = a))
      return(val)
    }

I have been wondering what the cost of eval(expr, .
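One way to look at the question is to write the same objective directly as R code and time both versions. The sketch below is only an illustration: f_direct, the test point par and the number of repetitions are hypothetical choices, and timings will differ between machines.

```r
obj_fun_expr <- expression((x^4 + 4 * x^2 * y^2 - 12 * x^2 * a + 144 * x^2 -
  48 * x * y * a + 144 * x * y - 4320 * x + 36 * y^2 - 2160 * y +
  180 * a^2 + 32400)/16)

# Version that evaluates the stored expression on every call
f_expr <- function(par) {
  eval(obj_fun_expr, list(x = par[1], y = par[2], a = par[3]))
}

# Hypothetical direct version: the same formula written as ordinary code
f_direct <- function(par) {
  x <- par[1]
  y <- par[2]
  a <- par[3]
  (x^4 + 4 * x^2 * y^2 - 12 * x^2 * a + 144 * x^2 -
    48 * x * y * a + 144 * x * y - 4320 * x + 36 * y^2 - 2160 * y +
    180 * a^2 + 32400)/16
}

par <- c(1, 2, 3) # arbitrary test point

# Both versions should return the same value
all.equal(f_expr(par), f_direct(par))

# Rough timing of repeated calls
t_expr <- system.time(for (i in 1:10000) f_expr(par))
t_direct <- system.time(for (i in 1:10000) f_direct(par))
rbind(t_expr, t_direct)
```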

Correlation is not transitive, in general at least: A simulation approach

Let $\rho_{XY}$ be the correlation between the stochastic variables $X$ and $Y$ and similarly for $\rho_{XZ}$ and $\rho_{YZ}$. If we know two of these, can we say anything about the third? In a recent blog post I dealt with the problem mathematically and I used the concept of a partial correlation coefficient. Here I will take a simulation approach. First z is simulated. Then x and y are simulated based on z in a regression context with a slope between $-1$ and $1$.
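A minimal sketch of one such simulation round in base R. The sample size, the standard normal distributions for z and for the noise, and the names b_x and b_y are assumptions made for illustration; the post may use different choices.

```r
set.seed(1)
n <- 1000 # assumed sample size

# First z is simulated
z <- rnorm(n)

# Slopes drawn between -1 and 1
b_x <- runif(1, min = -1, max = 1)
b_y <- runif(1, min = -1, max = 1)

# Then x and y are simulated from z in a regression context
x <- b_x * z + rnorm(n)
y <- b_y * z + rnorm(n)

# The three pairwise sample correlations
rho <- c(XY = cor(x, y), XZ = cor(x, z), YZ = cor(y, z))
rho
```

Repeating this for many slope pairs and plotting the three correlations against each other shows which combinations of values occur under this setup.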

Automatic 'testthat' test skeletons with new R package 'roxytest' extending 'roxygen2'

It is important to test software. One approach is unit testing, and for R packages this can be done using, e.g., testthat. It is also important to document software. For R packages roxygen2 is really helpful: it enables you to write documentation in the code file in the R/ folder where the function is implemented, and roxygen2 then takes care of handling the Rd files in the man/ folder. I have made a new R package that combines these approaches: roxytest.