Visualizing Different Data Visualizations

Copenhagen-based data visualization firm Ferdio has launched a website called the Data Viz Project consisting of over 150 different types of data visualizations to serve as a reference tool to help data scientists and designers figure out the best way to present different kinds of data. The visualizations are searchable by type of dataset used, function, and shape, and each visualization has a description of how it works as well as several examples.

Deep Learning From Scratch I: Computational Graphs

This is part 1 of a series of tutorials in which we develop the mathematical and algorithmic underpinnings of deep neural networks from scratch and implement our own neural network library in Python, mimicking the TensorFlow API.
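
To give a feel for what a computational graph is, here is a minimal sketch in Python. It is not the tutorial's library: the class names `Node`, `Constant`, `Add` and `Multiply` and the function `evaluate` are hypothetical, chosen only for illustration. Each node stores the nodes that feed into it, and evaluation recurses from the output back to the inputs.

```python
class Node:
    """A graph node that remembers which nodes feed into it (hypothetical design)."""

    def __init__(self, *inputs):
        self.inputs = inputs


class Constant(Node):
    """A leaf node holding a fixed value."""

    def __init__(self, value):
        super().__init__()
        self.value = value

    def compute(self):
        return self.value


class Add(Node):
    """An operation node that adds its two inputs."""

    def compute(self, a, b):
        return a + b


class Multiply(Node):
    """An operation node that multiplies its two inputs."""

    def compute(self, a, b):
        return a * b


def evaluate(node):
    """Evaluate a node by first evaluating everything it depends on."""
    args = [evaluate(parent) for parent in node.inputs]
    return node.compute(*args)


# Build the graph for z = x * y + 2 with x = 3 and y = 4, then evaluate it.
x = Constant(3)
y = Constant(4)
z = Add(Multiply(x, y), Constant(2))
result = evaluate(z)
```

Building the graph and evaluating it are separate steps here, which is the same separation a TensorFlow-style API makes between defining operations and running them.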

Machine Learning for Intraday Stock Price Prediction 1: Linear Models

This is the first in a series of posts on applying machine learning to intraday stock price/return prediction. Price prediction is crucial to most trading firms, and people have used various prediction techniques for many years. We will explore those techniques as well as recently popular algorithms like neural networks. In this post, we focus on applying linear models to features derived from market data.

Robot Localization

In this series of articles, we have introduced the Bayes Filter as a means to maintain a belief about the state of a system over time and periodically update it according to how the state evolves and which observations are made. We came across the problem that, for a continuous state space, the belief could generally not be represented in a computationally tractable way. We saw three solutions to this problem, all of which have their advantages and disadvantages. The first solution, the Histogram Filter, solves the problem by slicing the state space into a finite number of bins and representing the belief as a discrete probability distribution over these bins. This allows us to approximately represent arbitrary probability distributions. The second solution, the Kalman Filter, assumes the transition and sensor models to be linear Gaussians and the initial belief to be Gaussian, which makes it inapplicable to non-linear dynamic systems, at least in its original form. As we showed, this assumption means that the belief distribution is always a Gaussian and can thus be represented by a mean and a variance only, which is very memory efficient. The last solution, the Particle Filter, solves the problem by representing the belief as a finite set of guesses at the state, which are approximately distributed according to the actual belief distribution and are therefore a good representation of it. Like the Histogram Filter, it is able to represent arbitrary belief distributions, with the difference that the state space is not binned and therefore the approximation is more accurate.
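
The Histogram Filter described above can be sketched in a few lines of Python. This is a minimal illustration under stated assumptions, not the series' own code: the five-bin cyclic corridor, the door/wall map, the 0.8 sensor accuracy, the motion kernel, and the function names `predict`, `update` and `sensor_likelihood` are all hypothetical.

```python
def predict(belief, motion_kernel):
    """Prediction step: spread each bin's probability according to the motion model.

    motion_kernel maps a displacement in bins to its probability.
    The world is assumed to be cyclic, so positions wrap around.
    """
    n = len(belief)
    predicted = [0.0] * n
    for i, p in enumerate(belief):
        for offset, q in motion_kernel.items():
            predicted[(i + offset) % n] += p * q
    return predicted


def update(belief, likelihood):
    """Measurement step: multiply by the observation likelihood and renormalize."""
    unnormalized = [b * l for b, l in zip(belief, likelihood)]
    total = sum(unnormalized)
    return [u / total for u in unnormalized]


def sensor_likelihood(world, reading, p_correct=0.8):
    """Probability of the reading in each bin, assuming a sensor that is right 80% of the time."""
    return [p_correct if cell == reading else 1.0 - p_correct for cell in world]


# Assumed map: what the robot would see in each bin of a cyclic corridor.
world = ["door", "door", "wall", "wall", "wall"]
# Assumed motion model: commanded one bin to the right, with some under- and overshoot.
motion_kernel = {0: 0.1, 1: 0.8, 2: 0.1}

belief = [1.0 / len(world)] * len(world)                    # start with no knowledge
belief = update(belief, sensor_likelihood(world, "door"))   # robot sees a door
belief = predict(belief, motion_kernel)                     # robot moves one bin right
belief = update(belief, sensor_likelihood(world, "wall"))   # robot sees a wall
most_likely_bin = belief.index(max(belief))
```

After these three steps the belief peaks at bin index 2, the first wall bin to the right of the doors, which is the only position consistent with seeing a door, moving one bin, and then seeing a wall.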

Advanced Analytics in Three Steps – How to Put Descriptive, Predictive and Prescriptive Analytics to Work for You

In this special guest feature, Will Fellers, Product Manager at Quantum Spatial Inc., explores different types of analytics – descriptive, predictive and prescriptive – and discusses how companies can leverage these tools to drive improvements to their processes and bottom line. To highlight the benefits of these three types of analytics, he will use examples from the utility industry, which relies on these methods for vegetation management. Since 2006, Will has spearheaded the technical development of a comprehensive set of innovative products utilized across technical platforms at Quantum Spatial. He and his team are currently focused on state-of-the-art solutions for remote sensing applications using machine learning/artificial intelligence systems, advanced data analytics, high performance cluster computing, immersive 3-D environments and cloud-based data distribution models.

Using Machine Learning to Predict and Explain Employee Attrition

Employee attrition (churn) is a major cost to an organization. We recently used two new techniques to predict and explain employee turnover: automated ML with H2O and variable importance analysis with LIME.

Parametric Inference: Likelihood Ratio Test Problem 2

More on the Likelihood Ratio Test: the following problem is originally from Casella and Berger (2001).

Parametric Inference: Likelihood Ratio Test Problem 1

Another post on mathematical statistics: the problem below is originally from Casella and Berger (2001).

Writing academic articles using R Sweave and LaTeX

One of my favourite activities in R is using Markdown to create business reports. Most of my work I export to MS Word to communicate analytical results with my colleagues. For my academic work and eBooks, I prefer LaTeX to produce great typography. This article explains how to write academic articles and essays combining R Sweave and LaTeX. The article is formatted in accordance with the APA (American Psychological Association) requirements. To illustrate the principles of using R Sweave and LaTeX, I recycled an essay about problems with body image that I wrote for a psychology course many years ago. You can find the completed paper and all necessary files on my GitHub repository.

Introducing the Deep Learning Virtual Machine on Azure

A new member has just joined the family of Data Science Virtual Machines on Azure: the Deep Learning Virtual Machine. Like other DSVMs in the family, the Deep Learning VM is a pre-configured environment with all the tools you need for data science and AI development pre-installed. The Deep Learning VM is designed specifically for GPU-enabled instances, and comes with a complete suite of deep learning frameworks including TensorFlow, PyTorch, MXNet, Caffe2 and CNTK.

WordR – A New R Package for Rendering Documents in MS Word Format

One day earlier this year, I was faced with the challenge of creating a report for management. It had to be an MS Word document (corporate requirement, you know). It was supposed to be polished and use many of the standard MS Word features like headers, footers, table of contents, and styles. I am not a Word guy, and besides, I wanted to make a reproducible document that would make it easy for me to include R code and plots in the text. Having no idea what tools were available, I revved up Google and found several packages including R2wd, officer and ReporteRs. They are all very good packages, but for different reasons, none of them covered a hundred percent of my needs. Also, I thought that they made some things unnecessarily complicated (again, for my needs). For example, I wanted to be able to create a document template that included a significant amount of text and styles. I also wanted to be able to mark a place to add a table or plot and, mainly, I wanted to have something like the inline R code in R Markdown. I found out that with a combination of the officer and ReporteRs packages and some effort, I could achieve what I needed. (It could have been done solely with officer, but the table and plot creation/insertion capability in ReporteRs serves me better.)