Best practices of orchestrating Python and R code in ML projects

Today, data scientists are generally divided between two languages: some prefer R, some prefer Python. I will not try to explain in this article which one is better. Instead, I will try to answer a question: “What is the best way to integrate both languages in one data science project? What are the best practices?” Besides git and shell scripting, additional tools have been developed to facilitate the development of predictive models in multi-language environments. For fast data exchange between R and Python, let’s use the binary data file format Feather. Another language-agnostic tool, DVC, can make the research reproducible: let’s use DVC to orchestrate R and Python code instead of regular shell scripts.

Oneway ANOVA Explanation and Example in R; Part 1

This tutorial was inspired by this post published at DataScience+ by Bidyut Ghosh. Special thanks also to Dani Navarro, The University of New South Wales (Sydney), for the book Learning Statistics with R (hereafter simply LSR) and the lsr package available through CRAN. I highly recommend it.

Traveling Salesman

In the second part of the course on graphs and networks, we will focus on economic applications and flows. The first series of slides is on the traveling salesman problem. Slides are available online.

10 Things Everyone Should Know About Machine Learning

1. Machine learning means learning from data; AI is a buzzword.
2. Machine learning is about data and algorithms, but mostly data.
3. Unless you have a lot of data, you should stick to simple models.
4. Machine learning can only be as good as the data you use to train it.
5. Machine learning only works if your training data is representative.
6. Most of the hard work for machine learning is data transformation.
7. Deep learning is a revolutionary advance, but it isn’t a magic bullet.
8. Machine learning systems are highly vulnerable to operator error.
9. Machine learning can inadvertently create a self-fulfilling prophecy.
10. AI is not going to become self-aware, rise up, and destroy humanity.

The Search for the Fastest Keras Deep Learning Backend

This is an overview of the performance comparison for the popular Deep Learning frameworks supported by Keras – TensorFlow, CNTK, MXNet and Theano.

Churn Prediction with Automatic ML

In this example we will help a telecom company predict which customers are likely to renew a contract and which are not. We will base the prediction on data from the past. This dataset is publicly available and can be downloaded, for example, here: https://…/data.

Meet the new Microsoft R Server: Microsoft ML Server 9.2

Microsoft R Server has received a new name and a major update: Microsoft ML Server 9.2 is now available. ML Server provides a scalable production platform for R — and now Python — programs. The basic idea is that a local client can push R or Python code and have it operationalized on the remote server. ML Server is also included with the Data Science Virtual Machine and HDInsight Spark clusters on Azure.

Microsoft Cognitive Services Vision API in R

A little while ago I did a brief tutorial on the Google Vision API using RoogleVision, created by Mark Edmondson. I couldn’t find anything similar in R for the Microsoft Cognitive Services API, so I thought I would give it a shot. I whipped this example together quickly as a proof of concept, but I could certainly see myself building an R package to support this (unless someone can point to one, and please do if one exists)!

Regular Expression Searching within Shiny Selectize Objects

regexSelect is a small package that uses Shiny modules to solve a problem in Shiny selectize objects: regular expression (regex) searching. You can quickly filter the values in the selectize object, while being able to add that new regex query to the selectize list. This is great for long lists, since you can return multiple items simultaneously without needing to endlessly click items in a list!
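The underlying idea of filtering a list of choices by a regex can be sketched in a few lines. This is only an illustration in Python, not regexSelect's R/Shiny implementation; the function name and the example choices are hypothetical.

```python
import re


def regex_filter(choices, pattern):
    """Return the choices matching the regex, in their original order.

    An invalid pattern yields an empty list rather than an error, which is
    one plausible behaviour while a user is still typing the query.
    """
    try:
        compiled = re.compile(pattern, flags=re.IGNORECASE)
    except re.error:
        return []
    return [choice for choice in choices if compiled.search(choice)]


# Hypothetical list of selectable values.
choices = ["mtcars", "iris", "airquality", "airmiles", "ToothGrowth"]

# One query returns several items at once instead of clicking each one.
matches = regex_filter(choices, r"^air")
```

Here `matches` holds every choice starting with "air", so a single query selects both items in one step.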