Combining rows of unequal length into a matrix or data.frame

In preparation for Utrecht University’s Mplus summer school, I am developing some R functions for running and plotting mixture models. Doing so requires a bit of data wrangling, because Mplus output is essentially plain text. Moreover, the formatting can differ between sections. One section consisted of rows of different lengths, because section headers were printed only once for a block of rows, instead of being repeated for each row. I came up with a fast solution to rbind rows of different lengths, and to repeat the section headers across lines.
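To make the idea concrete, here is a minimal sketch in Python (not the R functions from the post) of the two steps: repeating a section header across the rows that lack one, and padding ragged rows to a common width. The row layout, the example values, and the helper names `repeat_headers` and `pad_rows` are assumptions made for illustration; real Mplus output will look different.

```python
def pad_rows(rows, fill=None):
    """Right-pad every row with `fill` up to the length of the longest row."""
    width = max(len(row) for row in rows)
    return [list(row) + [fill] * (width - len(row)) for row in rows]


def repeat_headers(rows, width):
    """Prepend the most recent section header to rows that lack one.

    Assumption: a row with `width` fields starts with its section header,
    and any shorter row belongs to the section of the last full-width row.
    """
    filled, header = [], None
    for row in rows:
        if len(row) == width:
            header = row[0]  # remember the header for the rows that follow
            filled.append(list(row))
        else:
            filled.append([header] + list(row))
    return filled


# Hypothetical parsed output: the header appears only on the first row of a block.
raw = [
    ["Means", "x1", "0.50"],
    ["x2", "1.20"],
    ["Variances", "x1", "0.90"],
    ["x2", "1.10"],
]

table = repeat_headers(raw, width=3)
for row in table:
    print(row)
```

Once every row has the same length, the list of rows can be bound into a rectangular structure without further checks.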

Negative explained variance in random forests

The random forests algorithm is known for being relatively robust to overfitting. The reason for this is that, in random forests, many (often thousands of) tree-like models are grown on bootstrapped samples of the data. Tree-like models split the data repeatedly into groups, by the predictor variable and value that lead to the most homogeneous post-split groups. Random forests further de-correlate the tree-type models by allowing each tree to choose only from a small sub-selection of predictors at each split.
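The splitting step can be illustrated with a small Python sketch. It searches one numeric predictor for the cut point that gives the most homogeneous groups, using the summed within-group squared error as the homogeneity measure. That criterion is an assumption chosen for illustration (it is a common choice for regression trees), the data are made up, and the names `sse` and `best_split` are hypothetical. A full random forest would repeat this search over a random subset of predictors at each split, on a bootstrap sample for each tree.

```python
def sse(values):
    """Sum of squared deviations from the group mean (0 for an empty group)."""
    if not values:
        return 0.0
    mean = sum(values) / len(values)
    return sum((v - mean) ** 2 for v in values)


def best_split(x, y):
    """Return (threshold, total_sse) for the split `x <= threshold` that
    minimises the summed within-group squared error of `y`."""
    best_threshold, best_total = None, float("inf")
    # The largest value is skipped: splitting there leaves the right group empty.
    for threshold in sorted(set(x))[:-1]:
        left = [yi for xi, yi in zip(x, y) if xi <= threshold]
        right = [yi for xi, yi in zip(x, y) if xi > threshold]
        total = sse(left) + sse(right)
        if total < best_total:
            best_threshold, best_total = threshold, total
    return best_threshold, best_total


# Made-up data with an obvious jump in y between x = 3 and x = 10.
x = [1, 2, 3, 10, 11, 12]
y = [1.0, 1.2, 0.8, 5.0, 5.2, 4.8]

threshold, total = best_split(x, y)
print(threshold, round(total, 2))
```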

Meta-analysis using random forests, made easy

Meta-analyses have a reputation for being the highest standard of evidence for a scientific finding. However, especially in the social sciences, studies combined in meta-analyses often differ substantially from one another. This is problematic, because classic approaches to meta-analysis assume either 1) that all studies share a common true effect, or 2) that the true effect follows an underlying (normal) distribution, or 3) that any moderators which influence the underlying true effect are known and controlled for.
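These three assumptions correspond to the fixed-effect model, the random-effects model, and meta-regression. A sketch in conventional notation, which is an assumption and not taken from the post: y_i is the observed effect size of study i, v_i its sampling variance, and x_{ij} the value of moderator j in study i.

```latex
\begin{align*}
% 1) Fixed-effect model: all studies share one common true effect \theta
y_i &= \theta + \varepsilon_i,
  & \varepsilon_i &\sim N(0, v_i) \\
% 2) Random-effects model: true effects are normally distributed around \mu
y_i &= \theta_i + \varepsilon_i,
  & \theta_i &\sim N(\mu, \tau^2) \\
% 3) Meta-regression: known moderators explain differences in the true effect
y_i &= \beta_0 + \sum_{j=1}^{p} \beta_j x_{ij} + u_i + \varepsilon_i,
  & u_i &\sim N(0, \tau^2)
\end{align*}
```

In the second and third models, \tau^2 is the between-study variance that remains after accounting for sampling error (and, in the third, for the moderators).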

Sandcastles: Model fit in SEM

Recently, I had a debate with a collaborator, who argued that good model fit in confirmatory factor analysis provides support for the theory underlying the questionnaire which is represented by the measurement model of the factor analysis. I disagreed with this point of view, for statistical and epistemological reasons. To explain the concept, I came up with the following metaphor. Imagine you’re taking a leisurely stroll on the beach. Your feet sink into the sand with every step, the waves are lapping at the beach, and the reeds in the dunes are rustling in a gentle breeze.

Power analysis for logistic regression with interactions

I recently consulted on a study which aimed to examine whether people are better able to solve a logic puzzle, called the “Wason task” (see Cosmides, Barrett, & Tooby, 2010), when it is framed in terms of a moral dilemma (experimental condition) than when it is framed in a neutral way (control condition). My collaborator wanted to explore whether individual differences in morality moderated performance on this Wason task. Perhaps people who score higher in morality would be better able to solve the logic puzzle when it was framed in terms of a moral dilemma?