16 Data Science Repositories

Each one is a repository in its own right, and they cover topics such as time series, regression, outliers, clustering, correlation, Hadoop, deep learning, Python, IoT, data sets, cheat sheets, infographics, and more (AI coming soon).

12 Interesting Reads for Math Geeks

Many data scientists have a passion for mathematics, and many modern math problems can be explored using data science. Below is a selection of interesting articles, many about challenging, deep mathematical problems, by a data scientist who developed math-free algorithms. Some of these articles cover statistical theory and thus belong to data science; others are about mathematics and number theory for their own sake. Most of them can be understood by the layman. Some include R code to produce visualizations, and some involve processing vast amounts of data (trillions of data points), so they provide an excellent sandbox for testing distributed architecture implementations and high-performance computing.

The Ultimate Guide for Choosing Algorithms for Predictive Modeling

There are three ways to look at data. The first is analytics. This is when you look at data from the (potentially very recent) past. It allows you to explore the questions "What happened?" and "Why did it happen?" The second is monitoring. This is looking at things as they happen. In many cases, monitoring is used to find abnormalities. Finally, there is predictive analytics. This is looking at data in a way that helps make predictions about what might happen in the future.

10 Tools For Working With Big Data For Successful Analytics

Traditional computer systems and software applications don’t have what it takes to support big data. If you want to collect, store, refine, or analyze big data, you have to have the right tools. Check out the following ten tools that are specifically designed with big data in mind.

101+ Resources to Learn Data Science

Many people are seeking to learn data science these days. It’s become a trendy topic associated with high salaries and some of the most interesting problems in the world. This demand has created many different resources in the data science space. People have curated their selection of favorite resources to learn data science, but I was seeking out something more comprehensive — so I built this list. Here’s my attempt at getting you my favorite resources in the data science space so you can understand what’s going on in the field — and how you can get your hands dirty and start learning right away.

Mixed models for ANOVA designs with one observation per unit of observation and cell of the design

Together with David Kellen I am currently working on an introductory chapter to mixed models for a book edited by Dan Spieler and Eric Schumacher (the current version can be found here). The goal is to provide a theoretical and practical introduction that is targeted mainly at experimental psychologists, neuroscientists, and others working with experimental designs and human data. The practical part obviously focuses on R, specifically on lme4 and afex. One part of the chapter was supposed to deal with designs that cannot be estimated with the maximal random effects structure justified by the design (Barr, Levy, Scheepers and Tily, 2013) because there is only one observation per participant and cell of the design. Such designs are the classical repeated-measures ANOVA designs, since ANOVA cannot deal with replicates at the cell level (i.e., replicates are usually aggregated to yield one observation per cell and unit of observation).

Instrumental Variables in R exercises (Part-3)

In this exercise set we will use the Generalized Method of Moments (GMM) estimation technique with the examples from part-1 and part-2. Recall that GMM estimation relies on the relevant moment conditions. For OLS we assume that the predictors are uncorrelated with the error term. Similarly, IV estimation assumes that the instruments are uncorrelated with the error term.
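
The moment conditions mentioned above can be written out as a short sketch. The notation is an assumption made for illustration and is not taken from the exercise set: y_i is the outcome, x_i the vector of predictors, z_i the vector of instruments, beta the coefficient vector, and W a positive definite weighting matrix. The linear model y_i = x_i' beta + e_i is also assumed.

```latex
\begin{align}
\text{OLS:}\quad & \mathbb{E}\left[x_i\,(y_i - x_i^{\top}\beta)\right] = 0 \\
\text{IV:}\quad  & \mathbb{E}\left[z_i\,(y_i - x_i^{\top}\beta)\right] = 0 \\
\text{GMM:}\quad & \hat{\beta} = \arg\min_{\beta}\; \bar{g}_n(\beta)^{\top} W\, \bar{g}_n(\beta),
\qquad \bar{g}_n(\beta) = \frac{1}{n}\sum_{i=1}^{n} z_i\,(y_i - x_i^{\top}\beta)
\end{align}
```

Under these assumptions, GMM replaces the population moments with their sample averages and picks the coefficients that bring those averages as close to zero as possible. Setting z_i = x_i gives the OLS moment condition as a special case.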