R

Dummy variables in R

Dummy variables are important but also cause much frustration in intro-stat courses. Below I will demonstrate the concept via a linear regression model. The basic idea is that a factor \(f\) with \(k\) levels can be replaced by \(k-1\) dummy variables that act as switches to select different levels. When all switches are turned off, the reference level is chosen. Mathematically, let \(f\) be the factor with levels \(l_0, l_1, \ldots, l_{k-1}\), i.
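
To make the switch idea concrete, here is a minimal sketch with a toy factor (my own example, not the post's data), using model.matrix() to show the dummy coding:

    # A factor with k = 3 levels is encoded by k - 1 = 2 dummy variables;
    # the reference level "a" corresponds to both dummies being 0.
    f <- factor(c("a", "b", "c", "b"))
    model.matrix(~ f)
    # Columns: (Intercept), fb, fc; rows with fb = fc = 0 are the reference level "a"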

caracas: Computer Algebra in R via SymPy

It is with great pleasure that we can announce the release of caracas version 1.0.1 to CRAN (https://cran.r-project.org/package=caracas). The package enables users to do computer algebra from R using the Python library SymPy. You can now install the caracas package with install.packages("caracas") and then load it with library(caracas). The source code and the development version are available at https://github.com/r-cas/caracas/. Online documentation (of the development version) can be found at https://r-cas.
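
As a small taste of what the package offers, here is a sketch based on my reading of the package documentation (the function names as_sym() and der() are assumptions on my part; check the docs if they differ):

    library(caracas)
    x <- as_sym("x")          # create a symbolic variable
    f <- x^2 + 3 * x          # build a symbolic expression
    der(f, x)                 # differentiate symbolically: 2*x + 3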

Shiny apps with math exercises

It is often very useful to practise mathematics with automatically generated exercises. One approach is multiple-choice quizzes (MCQ), but it turns out to be fairly difficult to generate authentic wrong answers. Instead, we want the user to type in an answer, which we then parse and check for correctness. There are many fun challenges in this, e.g. verifying that 2 is equal to 1 + 1 (as text strings the two are different, but mathematically they are equal, at least to a convenient approximation in this case).
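
A minimal sketch of the checking idea (a hypothetical helper of my own, not the app's actual code): parse the user's answer as an R expression and compare numerically with a tolerance, so that "1 + 1" is accepted as the answer 2 even though the strings differ.

    check_answer <- function(answer_text, correct_value, tol = 1e-8) {
      # Note: eval(parse(...)) on raw user input should be sandboxed in a real app
      value <- tryCatch(eval(parse(text = answer_text)), error = function(e) NA_real_)
      isTRUE(abs(value - correct_value) < tol)
    }
    check_answer("1 + 1", 2)   # TRUE
    check_answer("3", 2)       # FALSE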

Ryacas version 1.1.0 published in the Journal of Open Source Software and released to CRAN

It is with great pleasure that I can announce that Ryacas version 1.1.0 has now been accepted into the Journal of Open Source Software and that the same version has been released to CRAN. (The source code is available at https://github.com/mikldk/ryacas/.) I have already written about Ryacas many times before, so I will refer you to the “Getting started” and “The high-level (symbol) interface” vignettes, or one of the others available on the CRAN page or the package’s website.

Variance in reproductive success (VRS) in forensic genetics lineage markers

Back in 2017, David Balding and I published the paper “How convincing is a matching Y-chromosome profile?”. One of the key parameters of the simulation model was the variance in reproductive success (VRS). Here I will discuss and demonstrate this parameter. First, note for intuition that in a Wright–Fisher model all individuals have the same probability of becoming a father, i.e. of having reproductive success. So the VRS here is 0.
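
To illustrate that point in the simplest possible terms (my own toy illustration; the paper's exact parametrisation may differ): with equal father probabilities there is no variance in reproductive success, whereas unequal probabilities give a positive VRS.

    N <- 10
    p_wf <- rep(1 / N, N)                        # Wright-Fisher: equal probabilities
    var(p_wf)                                    # 0: no variance in reproductive success
    p_skew <- c(0.5, rep(0.5 / (N - 1), N - 1))  # one very successful lineage
    var(p_skew)                                  # > 0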

Wolfe conditions for deciding step length in inexact line search: An example with the Rosenbrock function

In inexact line search (a numerical optimisation technique) the step length (or learning rate) must be decided. In that connection the Wolfe conditions are central. Here I will give an example showing why they are useful. More on this topic can be read elsewhere, e.g. in the book by Nocedal and Wright (2006), “Numerical Optimization”, Springer. The Wolfe conditions consist of the sufficient decrease condition (SDC) and the curvature condition (CC):
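
In the standard notation of Nocedal and Wright (a reconstruction from that reference, not copied from the post), for a step length \(\alpha\) along a descent direction \(p_k\) from the current iterate \(x_k\), with constants \(0 < c_1 < c_2 < 1\),

\[ f(x_k + \alpha p_k) \le f(x_k) + c_1 \alpha \nabla f(x_k)^T p_k \qquad \text{(SDC)} \]

\[ \nabla f(x_k + \alpha p_k)^T p_k \ge c_2 \nabla f(x_k)^T p_k \qquad \text{(CC)} \]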

Ryacas version 1.0.0 released!

It is with great pleasure that I can announce that Ryacas version 1.0.0 is now released to CRAN (https://cran.r-project.org/package=Ryacas). I wish to thank all co-authors: Rob Goedman, Gabor Grothendieck, Søren Højsgaard, Grzegorz Mazur, Ayal Pinkus. It means that you can install the package (possibly after binaries have been built) with install.packages("Ryacas"), followed by library(Ryacas). (The source code is available at https://github.com/mikldk/ryacas/.) Now you have the yacas computer algebra system fully available!
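
As a one-line taste of the system (a sketch using the yac_str() string interface from Ryacas 1.x as I understand it; output formatting may differ slightly):

    library(Ryacas)
    yac_str("Expand((x + 1)^3)")   # something like "x^3+3*x^2+3*x+1"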

Approximating small probabilities using importance sampling

Update Oct 14, 2019: Michael Höhle caught a mistake and notified me on Twitter. Thanks! The problem is that I used \(\text{Unif}(-10, 10)\) as the importance distribution; this does not have infinite support as the target does. This is required; see e.g. Art B. Owen (2013), “Monte Carlo theory, methods and examples”. I have now updated the post to use a normal distribution instead. Box plots are often used. They are not always the best visualisation (e.
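
A minimal sketch of the general idea (my own toy setup, not the post's exact example): estimate the small tail probability \(P(Z > 6)\) for \(Z \sim N(0, 1)\) using a \(N(6, 1)\) importance distribution, which, like the target, has support on the whole real line.

    set.seed(1)
    n <- 1e5
    x <- rnorm(n, mean = 6, sd = 1)        # draws from the importance distribution
    w <- dnorm(x, 0, 1) / dnorm(x, 6, 1)   # importance weights: target / proposal
    est <- mean((x > 6) * w)               # importance sampling estimate
    c(estimate = est, exact = pnorm(6, lower.tail = FALSE))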

How much pizza and how much frozen yogurt? ...with Gröbner bases

In a recent blog post I tried to get yacas to solve a system of polynomial equations. Unfortunately it could not do that, so I solved the system numerically instead. Now it is possible (together with many other systems of polynomial equations) thanks to a small error in yacas that has now been fixed. The fix is also in Ryacas (development version), so hurry up and update Ryacas to the latest version 0.

Prediction intervals for Generalized Additive Models (GAMs)

Update on Aug 9, 2022: In the code chunk below, sd = summary(fit_gam)$scale was changed to sd = sqrt(summary(fit_gam)$scale):

    y_sim <- matrix(rnorm(n = prod(dim(exp_val_sim)),
                          mean = exp_val_sim,
                          sd = sqrt(summary(fit_gam)$scale)),
                    nrow = nrow(exp_val_sim), ncol = ncol(exp_val_sim))

Thanks to David Kaplan (IRD, France). Finding prediction intervals (for future observations) is different from finding confidence intervals (for unknown population parameters). Here, I demonstrate one approach to doing so.
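
A minimal, self-contained sketch of the simulation idea (toy data of my own; the name fit_gam and the sd = sqrt(summary(fit_gam)$scale) part follow the post, but here I condition on the fitted mean rather than simulating the expected values as the post's exp_val_sim does):

    library(mgcv)
    set.seed(1)
    d <- data.frame(x = seq(0, 1, length.out = 200))
    d$y <- sin(2 * pi * d$x) + rnorm(nrow(d), sd = 0.3)
    fit_gam <- gam(y ~ s(x), data = d)

    newd <- data.frame(x = seq(0, 1, length.out = 50))
    exp_val <- predict(fit_gam, newdata = newd)       # fitted means on a grid

    n_sim <- 1000                                     # simulate future observations
    y_sim <- matrix(rnorm(n_sim * length(exp_val),
                          mean = exp_val,
                          sd = sqrt(summary(fit_gam)$scale)),
                    nrow = length(exp_val), ncol = n_sim)

    # Pointwise 95% prediction intervals
    pred_int <- t(apply(y_sim, 1, quantile, probs = c(0.025, 0.975)))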