October 22, 2013

Like many, I often divide my computational work between Python and R. For a while, I’ve primarily done analysis in R. And with the power of data frames and packages that operate on them like reshape, my data manipulation and aggregation have moved more and more into the R world as well. Perhaps my favorite tool of all has been plyr, which allows you to easily split up a data set into subsets based on some criteria, apply a function or set of functions to those pieces, and combine those results back together (a.k.a. “split-apply-combine”). For example, I often use this to split up a data set by treatment, calculate some summary stats for each treatment, and put these statistics back together for comparison. With R and these excellent packages, these steps are about as painless as it gets (I actually enjoy them, but that’s probably not normal). Because of this, R has long been my tool of choice for this kind of work.
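To make the split-apply-combine idea concrete, here is a minimal sketch in plain Python using only the standard library. It is not how plyr does it; the data, the treatment names, and the `split_apply_combine` helper are all made up for illustration.

```python
from collections import defaultdict
from statistics import mean, stdev

# Hypothetical observations: (treatment, measurement) pairs.
observations = [
    ("control", 4.0),
    ("control", 6.0),
    ("drug_a", 7.0),
    ("drug_a", 9.0),
    ("drug_b", 1.0),
    ("drug_b", 3.0),
]


def split_apply_combine(rows):
    """Group measurements by treatment and summarize each group."""
    # Split: collect the measurements belonging to each treatment.
    groups = defaultdict(list)
    for treatment, value in rows:
        groups[treatment].append(value)

    # Apply + combine: compute summary stats per group and gather
    # them into a single dictionary keyed by treatment.
    return {
        treatment: {
            "n": len(values),
            "mean": mean(values),
            "sd": stdev(values),  # sample standard deviation; needs n >= 2
        }
        for treatment, values in groups.items()
    }


summary = split_apply_combine(observations)
for treatment, stats in sorted(summary.items()):
    print(treatment, stats)
```

The result is one row of statistics per treatment, ready to compare side by side, which is the same shape of output the plyr workflow described above produces.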
