Why R is Bad for You

Someone had to say it. In my opinion, R is not the best way to learn data science, nor the best way to practice it. More and more large employers agree.

Dashboards in R

This tutorial shows how to build dashboards in R using the new “flexdashboard” package.

Modern Machine Learning Algorithms: Strengths and Weaknesses

In this guide, we’ll take a practical, concise tour through modern machine learning algorithms. Other such lists exist, but they don’t really explain the practical tradeoffs of each algorithm, which is what we hope to do here. We’ll discuss the advantages and disadvantages of each algorithm based on our experience. Categorizing machine learning algorithms is tricky, and there are several reasonable approaches: they can be grouped into generative/discriminative, parametric/non-parametric, supervised/unsupervised, and so on.

The ‘Displayr’ Data Science Platform is Impressive

Scientific data can be hard to convey or explain, and the ‘Displayr’ data science platform aims to help researchers, students, and educators present it simply. ‘Displayr’ turns data into visualizations that can be used in dashboards and reports to share findings with team members. Because the app is cloud-based, users can access their work from anywhere and keep working without being tied to their own machine. The platform can interpret and display many kinds of data, producing anything from simple tables to machine learning output that would otherwise be complex to convey.

Performing Hyperparameter Optimization with Amazon Machine Learning

This sample code builds a hyperparameter optimization pipeline for Amazon Machine Learning using the latest AWS SDK for Python (Boto 3). Users can optionally specify hyperparameters up front for manual tuning or use SigOpt’s API for Bayesian optimization. The example is based directly on Amazon’s K-Fold Cross Validation example.
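The core idea behind K-fold cross validation with manual hyperparameter tuning can be sketched in plain Python. This is a minimal illustration under stated assumptions, not Amazon’s example, the Boto 3 API, or SigOpt’s API: `kfold_indices`, `cross_validate`, `fit_slope`, `neg_mse`, and `grid_search` are hypothetical helpers, and the ridge-style penalty `lam` stands in for whatever hyperparameter you would actually tune on Amazon Machine Learning.

```python
def kfold_indices(n, k):
    """Split indices 0..n-1 into k contiguous folds; return (train, test) index pairs."""
    if not 2 <= k <= n:
        raise ValueError("k must satisfy 2 <= k <= n")
    # Spread the remainder so fold sizes differ by at most one.
    sizes = [n // k + (1 if i < n % k else 0) for i in range(k)]
    splits, start = [], 0
    for size in sizes:
        test = list(range(start, start + size))
        train = list(range(0, start)) + list(range(start + size, n))
        splits.append((train, test))
        start += size
    return splits


def cross_validate(data, k, fit, score):
    """Mean held-out score over k folds; fit(rows) -> model, score(model, rows) -> float."""
    scores = []
    for train_idx, test_idx in kfold_indices(len(data), k):
        model = fit([data[i] for i in train_idx])
        scores.append(score(model, [data[i] for i in test_idx]))
    return sum(scores) / len(scores)


def fit_slope(rows, lam):
    # Hypothetical stand-in model: y = w * x with an L2 penalty lam on w.
    sxy = sum(x * y for x, y in rows)
    sxx = sum(x * x for x, _ in rows)
    return sxy / (sxx + lam)


def neg_mse(w, rows):
    # Higher is better, so the best hyperparameter maximizes this score.
    return -sum((y - w * x) ** 2 for x, y in rows) / len(rows)


def grid_search(data, k, grid):
    """Manual tuning: cross-validate each candidate lam and keep the best."""
    results = {
        lam: cross_validate(data, k, lambda rows, lam=lam: fit_slope(rows, lam), neg_mse)
        for lam in grid
    }
    best = max(results, key=results.get)
    return best, results


data = [(float(x), 2.0 * x) for x in range(1, 11)]
best, results = grid_search(data, k=5, grid=[0.0, 1.0, 10.0])
print(best, results)
```

The folds here are contiguous and unshuffled to keep the sketch short; with real data you would usually shuffle first. A Bayesian optimizer such as the one the sample uses would replace the fixed `grid` with candidates proposed one at a time from earlier scores.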

Teaching the data science process

Curricula for teaching machine learning have existed for decades, and even more recent technical subjects (such as deep learning or big data architectures) have nearly standard course outlines and linear storylines. Teaching support for the data science process, on the other hand, has been elusive, even though the outlines of the process have been around since the 1990s. Understanding the process requires not only a broad technical background in machine learning but also basic notions of business administration. In a previous essay I elaborated on the organizational difficulties of data science transformation that stem from these complexities; here I will share my experience of teaching the data science process.

xts Cheat Sheet: Time Series in R

Even though the data.frame is one of R’s core objects for holding data, you’ll find that it’s not very efficient when you’re working with time series data. You’ll want a more flexible time series class that offers a variety of methods for manipulating your data.

Easily add images to a correspondence analysis plot in R

You can take your correspondence analysis plots to the next level by including images. Better still, you don’t need to paste the images in after the analysis is complete; you can include them right from the start. The plot above shows the results of a correspondence analysis of data from a study of how people perceive different carbonated soft drinks. Brand logos, loaded from JPEG files, are shown instead of brand names, with lines and dots marking the precise location of each brand. This post describes how to create this plot using R.
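The coordinates such a plot positions its labels (or logos) at come from the correspondence analysis itself, which reduces to a singular value decomposition of the table’s standardized residuals. The sketch below is in Python with NumPy rather than the R the post uses, and `correspondence_analysis` is a hypothetical helper; the brand-by-attribute counts are made-up numbers for illustration only, not the soft drink study’s data. Placing logo images at the resulting points is the plotting step the post covers and is not shown here.

```python
import numpy as np


def correspondence_analysis(counts):
    """Return (row_coords, col_coords, singular_values) in principal coordinates."""
    N = np.asarray(counts, dtype=float)
    P = N / N.sum()          # correspondence matrix
    r = P.sum(axis=1)        # row masses
    c = P.sum(axis=0)        # column masses
    expected = np.outer(r, c)
    # Standardized residuals: D_r^{-1/2} (P - r c^T) D_c^{-1/2}
    S = (P - expected) / np.sqrt(expected)
    U, s, Vt = np.linalg.svd(S, full_matrices=False)
    # Principal coordinates: scale singular vectors by s and by inverse sqrt masses.
    rows = (U * s) / np.sqrt(r)[:, None]
    cols = (Vt.T * s) / np.sqrt(c)[:, None]
    return rows, cols, s


# Made-up counts: 4 hypothetical brands (rows) x 3 hypothetical attributes (columns).
counts = [[20, 5, 10], [8, 25, 7], [12, 9, 30], [15, 15, 5]]
rows, cols, s = correspondence_analysis(counts)
# The first two columns give the (x, y) position of each brand and attribute.
print(rows[:, :2])
print(cols[:, :2])
```

The squared singular values are the principal inertias; their sum equals the table’s chi-square statistic divided by the grand total, which is a quick sanity check on any CA implementation.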

Spinning Globes With R

It has been a long-held dream of mine to create a spinning globe using nothing but R (I wish I were joking, but I’m not). Thanks to the brilliant mapmate package created by Matt Leonawicz and shedloads of computing power, today that dream became a reality. The globe below took 19 hours and 30 processors to produce from relatively low-resolution NASA Black Marble data, so I accept that R is not the best software for this job, but it’s amazing that you can do it in R at all!

Machine Learning. Stock Market Data, Part 3: Quadratic Discriminant Analysis and KNN.

This post series began as a personal way of practicing R programming and machine learning. Feedback from the community then encouraged me to keep doing these exercises and sharing them. The bibliography and its authors are cited throughout, and this series is a way of honoring them and giving them the credit they deserve for their work.