Will Data Scientists be Replaced by Machines?

Data Science automation has been a hot topic recently, with several articles about it[1]. Most of them discuss the so-called “automation” tools[2]. Too often, vendors claim that their tools can automate the Data Science process. This gives the impression that combining these tools with a Big Data architecture can solve any business problem. The misconception comes from confusing the whole Data Science process[3] with the sub-tasks of data preparation (feature extraction, etc.) and modeling (algorithm selection, hyperparameter tuning, etc.), which I call Machine Learning. This issue is amplified by the recent success of platforms such as Kaggle (www.kaggle.com) and DrivenData (www.drivendata.org). Competitors are provided with a clear problem to solve and clean data, so choosing and tuning a machine learning algorithm is the main task. Participants are evaluated using metrics such as test set accuracy. In industry, data scientists are evaluated on the value added to the business rather than on algorithm accuracy. A project with 99% classification accuracy that isn’t deployed in production brings no value to the company.


How to build your first Machine Learning model on iPhone (Intro to Apple’s CoreML)?

This article on how to build your first Machine Learning model on iPhone X was posted by Mohd Sanad Zaki Razvi. Sanad is currently pursuing a B.Tech in Computer Science at the National Institute of Engineering, Mysore. A data science rookie, he is passionate about Machine Learning, Data Visualization and the impact AI can have on the world.


Recurrent Neural Networks for Email List Churn Prediction

Not long after I finished writing up my lessons learned from building a Hello World Neural Network, I thought that I could move on from a simple MLP to a more sophisticated neural net. It was probably Karpathy’s blog post about the unreasonable effectiveness of Recurrent Neural Networks that made me choose to continue with an RNN. To be honest, this wasn’t my only motive. Those who read my posts will already know that one of the problems I have studied extensively during the past few months is mailing list churn prediction using data from MailChimp.


Neural Networks for Advertisers

Recently I came across a problem that called for some sort of machine learning: counting the total time during which a specific company was advertised at the various places around a football match. Although I am not totally sure whether this is a real problem, I found it interesting enough to try to solve it with some sort of convolutional neural network architecture and one of the available frameworks like Caffe or Keras and TensorFlow. The idea is to get the video of the football match, pass it to our deployed neural network, process it and get the total time for which the specified brand was visible to the viewer. I have reviewed the pros and cons of different approaches to this problem and have chosen the one that looks best to me in terms of flexibility, speed and complexity.
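
The last step of that idea, turning per-frame results into a total visible time, can be sketched roughly as follows. This is only an illustration, not the author's implementation: it assumes the network has already produced one True/False flag per frame, that the video has a constant frame rate, and the names visible_seconds, visible_intervals and frame_flags are hypothetical.

```python
def visible_seconds(frame_flags, fps):
    """Total time in seconds for which the brand is flagged as visible.

    frame_flags is an assumed input: one boolean per video frame,
    produced by some detector. fps is the (constant) frame rate.
    """
    if fps <= 0:
        raise ValueError("fps must be positive")
    return sum(1 for flag in frame_flags if flag) / fps


def visible_intervals(frame_flags, fps):
    """Return (start_s, end_s) for each run of consecutive visible frames."""
    if fps <= 0:
        raise ValueError("fps must be positive")
    intervals = []
    run_start = None  # index of the first frame of the current visible run
    for index, flag in enumerate(frame_flags):
        if flag and run_start is None:
            run_start = index
        elif not flag and run_start is not None:
            intervals.append((run_start / fps, index / fps))
            run_start = None
    if run_start is not None:
        # The video ended while the brand was still visible.
        intervals.append((run_start / fps, len(frame_flags) / fps))
    return intervals
```

Keeping the intervals as well as the total makes it easier to spot-check the detector against the footage, since each interval can be looked up in the video.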


Practical Machine Learning with R and Python – Part 2

In this 2nd part of the series ‘Practical Machine Learning with R and Python – Part 2’, I continue where I left off in my first post Practical Machine Learning with R and Python – Part 1. In this post I cover some classification algorithms and cross validation. Specifically, I touch on
• Logistic Regression
• K Nearest Neighbors (KNN) classification
• Leave-One-Out Cross Validation (LOOCV)
• K Fold Cross Validation
in both R and Python.
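
As a rough illustration of how K Fold Cross Validation and LOOCV relate, here is a minimal pure-Python sketch that uses a small KNN classifier. It is not the code from the post, which presumably relies on standard R and Python libraries; the function names and the toy data are hypothetical, and logistic regression is left out for brevity.

```python
from collections import Counter
import math


def k_fold_indices(n_samples, k):
    """Yield (train_idx, test_idx) pairs for k contiguous folds."""
    # Spread the remainder over the first folds so sizes differ by at most 1.
    fold_sizes = [n_samples // k + (1 if i < n_samples % k else 0) for i in range(k)]
    start = 0
    for size in fold_sizes:
        test_idx = list(range(start, start + size))
        train_idx = [i for i in range(n_samples) if i < start or i >= start + size]
        yield train_idx, test_idx
        start += size


def knn_predict(train_X, train_y, x, n_neighbors=3):
    """Predict the majority label among the n_neighbors closest training points."""
    by_distance = sorted(zip(train_X, train_y), key=lambda pair: math.dist(pair[0], x))
    nearest_labels = [label for _, label in by_distance[:n_neighbors]]
    return Counter(nearest_labels).most_common(1)[0][0]


def cross_val_accuracy(X, y, k, n_neighbors=3):
    """Fraction of held-out samples classified correctly over k folds.

    Setting k == len(X) holds out one sample at a time, which is LOOCV.
    """
    correct = 0
    for train_idx, test_idx in k_fold_indices(len(X), k):
        train_X = [X[i] for i in train_idx]
        train_y = [y[i] for i in train_idx]
        for i in test_idx:
            correct += knn_predict(train_X, train_y, X[i], n_neighbors) == y[i]
    return correct / len(X)


# Toy data (made up): two well-separated clusters, interleaved so that
# every contiguous fold contains both classes.
X = [(0.0, 0.0), (5.0, 5.0), (0.1, 0.2), (5.1, 4.9),
     (0.2, 0.1), (4.9, 5.2), (0.0, 0.3), (5.2, 5.0)]
y = ["a", "b", "a", "b", "a", "b", "a", "b"]

four_fold_accuracy = cross_val_accuracy(X, y, k=4)
loocv_accuracy = cross_val_accuracy(X, y, k=len(X))
```

The folds here are contiguous and unshuffled, so real data would normally be shuffled first. LOOCV is simply the extreme case with as many folds as samples, which costs one model fit per sample.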


An Ultimate Beginner’s Guide to BlockChain [Infographic]

Blockchains are set to get more popular as more and more uses for them are developed. Using them as an undeceivable online passbook is surely among the most promising applications of the technology. Stacy Miller, October 13, 2017