Today, data scientists are generally divided between two languages: some prefer R, some prefer Python. I will not try to explain in this article which one is better. Instead, I will try to answer the question: “What is the best way to integrate both languages in one data science project, and what are the best practices?” Besides git and shell scripting, additional tools have been developed to facilitate the development of predictive models in multi-language environments. For fast data exchange between R and Python, let’s use the binary data file format Feather. Another language-agnostic tool, DVC, can make the research reproducible; let’s use DVC to orchestrate R and Python code instead of regular shell scripts.
This tutorial was inspired by this post published at DataScience+ by Bidyut Ghosh. Special thanks also to Dani Navarro of the University of New South Wales (Sydney) for the book Learning Statistics with R (hereafter simply LSR) and the lsr package, available through CRAN. I highly recommend it.
In the second part of the course on graphs and networks, we will focus on economic applications and flows. The first set of slides is on the traveling salesman problem. The slides are available online.
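To make the problem concrete: given a set of cities, the traveling salesman problem asks for the shortest round trip that visits every city exactly once. The Python sketch below is not taken from the slides; it solves tiny instances by brute force, fixing city 0 as the start and checking all (n-1)! orderings of the remaining cities. The function names are hypothetical.

```python
from itertools import permutations
from math import dist


def tour_length(points, order):
    """Total length of the closed tour visiting points in the given order."""
    n = len(order)
    return sum(dist(points[order[i]], points[order[(i + 1) % n]]) for i in range(n))


def brute_force_tsp(points):
    """Return (best_order, best_length) by checking every tour starting at city 0."""
    if not points:
        return (), 0.0
    best_order, best_len = None, float("inf")
    # Fixing city 0 as the start avoids re-checking rotations of the same tour
    for perm in permutations(range(1, len(points))):
        order = (0,) + perm
        length = tour_length(points, order)
        if length < best_len:
            best_order, best_len = order, length
    return best_order, best_len


if __name__ == "__main__":
    square = [(0, 0), (1, 0), (1, 1), (0, 1)]
    order, length = brute_force_tsp(square)
    print(order, length)  # (0, 1, 2, 3) 4.0
```

The factorial number of tours is what makes exhaustive search impractical beyond a handful of cities.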
1. Machine learning means learning from data; AI is a buzzword.
2. Machine learning is about data and algorithms, but mostly data.
3. Unless you have a lot of data, you should stick to simple models.
4. Machine learning can only be as good as the data you use to train it.
5. Machine learning only works if your training data is representative.
6. Most of the hard work for machine learning is data transformation.
7. Deep learning is a revolutionary advance, but it isn’t a magic bullet.
8. Machine learning systems are highly vulnerable to operator error.
9. Machine learning can inadvertently create a self-fulfilling prophecy.
10. AI is not going to become self-aware, rise up, and destroy humanity.
This is an overview of a performance comparison of the popular deep learning frameworks supported by Keras: TensorFlow, CNTK, MXNet, and Theano.
In this example, we will help a telecom company predict which customers are likely to renew a contract and which are not, based on historical data. The dataset is publicly available and can be downloaded, for example, here: https://…/data.
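The post does not say here which model it uses, so purely as an illustration: one common baseline for this kind of yes/no prediction is logistic regression. The dependency-free Python sketch below fits one with batch gradient descent. The features (contract tenure in years, support calls in the last year), the function names, and the six training rows are hypothetical toy values, not the telecom dataset.

```python
import math


def sigmoid(z: float) -> float:
    """Numerically stable logistic function."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit logistic-regression weights and bias with batch gradient descent."""
    m, n = len(X), len(X[0])
    w, b = [0.0] * n, 0.0
    for _ in range(epochs):
        grad_w, grad_b = [0.0] * n, 0.0
        for xi, yi in zip(X, y):
            # Prediction error for one customer: p(renew) - actual label
            err = sigmoid(sum(wj * xj for wj, xj in zip(w, xi)) + b) - yi
            for j in range(n):
                grad_w[j] += err * xi[j]
            grad_b += err
        # Step against the average gradient of the log-loss
        w = [wj - lr * gj / m for wj, gj in zip(w, grad_w)]
        b -= lr * grad_b / m
    return w, b


def predict_renewal(w, b, x, threshold=0.5) -> bool:
    """True if the predicted renewal probability reaches the threshold."""
    return sigmoid(sum(wj * xj for wj, xj in zip(w, x)) + b) >= threshold


if __name__ == "__main__":
    # Hypothetical features: [tenure in years, support calls last year]
    X = [[5, 0], [4, 1], [6, 0], [0.5, 4], [1, 3], [0.2, 5]]
    y = [1, 1, 1, 0, 0, 0]  # 1 = renewed, 0 = did not renew (made-up labels)
    w, b = fit_logistic(X, y)
    print(predict_renewal(w, b, [5.5, 0.5]))  # long tenure, few calls: True
    print(predict_renewal(w, b, [0.3, 4.5]))  # short tenure, many calls: False
```

On real data you would hold out a test set and evaluate accuracy there rather than on the training rows.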
Microsoft R Server has received a new name and a major update: Microsoft ML Server 9.2 is now available. ML Server provides a scalable production platform for R programs and, now, for Python programs too. The basic idea is that a local client can push R or Python code and have it operationalized on the remote server. ML Server is also included with the Data Science Virtual Machine and HDInsight Spark clusters on Azure.
A little while ago, I wrote a brief tutorial on the Google Vision API using RoogleVision, created by Mark Edmonson. I couldn’t find anything similar in R for the Microsoft Cognitive Services API, so I thought I would give it a shot. I whipped this example together quickly as a proof of concept, but I could certainly see myself building an R package to support this (unless someone can point to one; please do if one exists)!
regexSelect is a small package that uses Shiny modules to solve a problem in Shiny selectize objects: regular expression (regex) searching. You can quickly filter the values in the selectize object, while also being able to add the new regex query to the selectize list. This is great for long lists, since you can return multiple items simultaneously without needing to endlessly click items in a list!
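regexSelect itself is an R/Shiny package; the Python sketch below only illustrates the underlying idea of filtering a long list of choices with a regular expression and is not the package's API. The function name `regex_filter`, the case-insensitive default, and the fallback of treating an invalid (for example, half-typed) pattern as literal text are assumptions of this sketch.

```python
import re


def regex_filter(choices, pattern, ignore_case=True):
    """Return every choice that the regex pattern matches anywhere in the string."""
    flags = re.IGNORECASE if ignore_case else 0
    try:
        rx = re.compile(pattern, flags)
    except re.error:
        # Hypothetical fallback: an incomplete pattern such as "a[" is
        # matched as literal text instead of raising an error.
        rx = re.compile(re.escape(pattern), flags)
    return [c for c in choices if rx.search(c)]


if __name__ == "__main__":
    states = ["Alabama", "Alaska", "Arizona", "Arkansas", "California"]
    # One query selects several items at once instead of clicking each one
    print(regex_filter(states, "^A.*a$"))  # ['Alabama', 'Alaska', 'Arizona']
```

Using `search` rather than `match` lets a pattern hit anywhere in an item, so anchors such as `^` and `$` stay under the user's control.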