Here’s why so many data scientists are leaving their jobs

Yes, I am a data scientist and yes, you did read the title correctly, but someone had to say it. We read so many stories about data science being the sexiest job of the 21st century and the attractive sums of money that you can make as a data scientist that it can seem like the absolute dream job. Factor in that the field contains an abundance of highly skilled people geeking out to solve complex problems (yes, it’s a positive thing to “geek out”), and there is everything to love about the job. But the truth is that data scientists typically “spend 1-2 hours a week looking for a new job”, as stated in this article by the Financial Times. Furthermore, the article also states that “Machine learning specialists topped its list of developers who said they were looking for a new job, at 14.3 per cent. Data scientists were a close second, at 13.2 per cent.” These figures come from a Stack Overflow survey of 64,000 developers.


Essentials of Deep Learning: Getting to know CapsuleNets (with Python codes)

Neural networks have been around since the last century, but in the last decade they have reshaped how we see the world. From classifying images of animals to extracting parts of speech, researchers are building deep neural networks in diverse fields to push and break boundaries. But as advancements in deep learning reach new heights, a new concept has recently been introduced that puts a twist on the traditional neural network architecture – Capsule Networks. They improve on the effectiveness of traditional methods and can still recognize an object even when it is presented from a different angle.


Top 7 Data Science & Machine Learning GitHub Repositories in March 2018

I love GitHub! Not only can you follow the work happening in different domains, but you can also collaborate on multiple open source projects. All tech companies, from Google to Facebook, publish their open source project code on GitHub so the wider coding / ML community can benefit from it. But if you are too busy, or find following GitHub difficult, we bring you a summary of the top repositories month on month. You can keep yourself updated with the latest breakthroughs and even replicate the code on your own machine! This month’s list includes some awesome libraries. From Google Brain’s AstroNet to an artificial neural network visualizer, we have curated a list of unique repositories that will expand your machine learning horizons.


Automated Deep Learning – So Simple Anyone Can Do It

There are several things holding back our use of deep learning methods, and chief among them is that they are complicated and hard. Now there are three platforms that offer Automated Deep Learning (ADL) so simple that almost anyone can do it.


Clojure Integration with R


Regular Expressions Every R programmer Should Know

Regular expressions. How they can be cruel! Well, we’re here to make them a tad easier. To do so, we’re going to make use of the stringr package.
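As a quick, hedged illustration of the kind of patterns the article walks through (the file names and regexes below are our own toy example, not taken from the article), a minimal stringr sketch looks like this:

library(stringr)

files <- c("report_2018-03.csv", "notes.txt", "report_2018-04.csv")

# Detect file names that match "report_YYYY-MM.csv"
str_detect(files, "^report_\\d{4}-\\d{2}\\.csv$")
#> [1]  TRUE FALSE  TRUE

# Extract just the year-month part (NA where there is no match)
str_extract(files, "\\d{4}-\\d{2}")
#> [1] "2018-03" NA        "2018-04"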


Neglected R Super Functions

R has a lot of under-appreciated, super powerful functions. We list a few of our favorites below.


Meetup slides: Introducing Deep Learning with Keras

On April 4th, 2018, I gave a talk about Deep Learning with Keras at the Ruhr.Py Meetup in Essen, Germany. The talk was not specific to Python, though – so if you’re interested, the slides can be found here: https://…ucing-deep-learning-with-keras-and-python


Introducing TensorFlow Probability

At the 2018 TensorFlow Developer Summit, we announced TensorFlow Probability: a probabilistic programming toolbox for machine learning researchers and practitioners to quickly and reliably build sophisticated models that leverage state-of-the-art hardware.


Introduction to Fama French

Today, we move beyond CAPM’s simple linear regression and explore the Fama French (FF) multi-factor model of equity risk/return. For more background, have a look at the original article published in The Journal of Financial Economics, Common risk factors in the returns on stocks and bonds. The FF model extends CAPM by regressing portfolio returns on several variables, in addition to market returns. From a general data science point of view, FF extends CAPM’s simple linear regression, where we had one independent variable, to a multiple linear regression, where we have numerous independent variables.
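To make the jump from simple to multiple regression concrete, here is a minimal R sketch; the data frame, column names, and simulated returns are hypothetical stand-ins, not the article’s actual data:

# Hypothetical monthly data: portfolio excess returns plus the three FF factors
set.seed(123)
ff_data <- data.frame(
  port_excess = rnorm(60, 0.010, 0.04),  # portfolio return minus risk-free rate
  mkt_excess  = rnorm(60, 0.008, 0.04),  # market return minus risk-free rate
  SMB         = rnorm(60, 0.002, 0.02),  # small-minus-big size factor
  HML         = rnorm(60, 0.003, 0.02)   # high-minus-low value factor
)

# CAPM: simple linear regression with one independent variable
capm_fit <- lm(port_excess ~ mkt_excess, data = ff_data)

# Fama French three-factor model: multiple linear regression
ff_fit <- lm(port_excess ~ mkt_excess + SMB + HML, data = ff_data)

summary(ff_fit)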


Shiny Server (Pro) 1.5.7

Highlights for this release are a major-version Node upgrade, support for HTTP gzip/deflate compression and (optionally) secure cookies, and numerous bug fixes. We’ve also dropped support for some Linux distro versions that have reached their end of life.


Assessing the predictive causality of individual based models using Bayesian inference intervention analysis: an application in epidemiology

Understanding dynamics in time, and the predominant underlying factors that shape them, is a central question in the biological and medical sciences. Data are more ubiquitous and richer than ever before, and population biology in the big data era needs to integrate novel methods. Calibrated Individual Based Models (IBMs) are powerful tools for process-based predictive modelling. Intervention analysis assesses the potential impact on a time series of an event such as an extreme event or an experimentally designed intervention, for example vaccinating a population. A method for big data analytics (causal impact) that implements a Bayesian intervention approach to estimating the causal effect of a designed intervention on a time series is used to quantify the deviance between data and IBM outputs. Having quantified this deviance, IBM scenarios are used to predict the counterfactual: how the IBM response metric would have evolved after the intervention if the intervention had never occurred. The method is exemplified by quantifying the deviance between a calibrated IBM’s outputs and epidemiological data on bovine tuberculosis, with a change in cattle TB testing frequency as the intervention covariate. The advantages of validating IBM outputs and assessing uncertainty as time series are also discussed.
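For readers who want to experiment with this kind of Bayesian intervention analysis, the CausalImpact R package is one widely used implementation of the approach; the simulated series below is a toy stand-in, not the paper’s TB data:

library(CausalImpact)

set.seed(1)
# Toy series: a control covariate x and a response y with a simulated
# intervention effect starting at time point 71
x <- 100 + arima.sim(model = list(ar = 0.8), n = 100)
y <- 1.2 * x + rnorm(100)
y[71:100] <- y[71:100] + 10

data <- cbind(y, x)
pre.period  <- c(1, 70)
post.period <- c(71, 100)

impact <- CausalImpact(data, pre.period, post.period)
summary(impact)
plot(impact)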


The importance of transparency and user control in machine learning

In this episode of the Data Show, I spoke with Guillaume Chaslot, an ex-YouTube engineer and founder of AlgoTransparency, an organization dedicated to helping the public understand the profound impact algorithms have on our lives. We live in an age when many of our interactions with companies and services are governed by algorithms. At a time when their impact continues to grow, there are many settings where these algorithms are far from transparent. There is growing awareness of the vast amounts of data companies are collecting on their users and customers, and people are starting to demand control over their data. A similar conversation is starting to happen about algorithms: users want more control over what these models optimize for and a better understanding of how they work.


Forecasting with ARIMA – Part I

Some of the methods for doing forecasting in Business and Economics are:
1. Exponential Smoothing Techniques
2. Single-Equation Regression Techniques
3. Simultaneous-Equation Regression Methods
4. Autoregressive Integrated Moving Average (ARIMA) Models
5. Vector Autoregression (VAR) Methods
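As a quick taste of method (4) before Part I dives deeper, here is a minimal ARIMA sketch in R using the forecast package; the built-in AirPassengers series is just a stand-in example, not data from the article:

library(forecast)

# Fit an ARIMA model automatically and forecast 12 months ahead
fit <- auto.arima(AirPassengers)
fc  <- forecast(fit, h = 12)

summary(fit)
plot(fc)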


12 Useful Things to Know about Machine Learning

1. Learning = Representation + Evaluation + Optimization
2. It’s Generalization that Counts
3. Data Alone is Not Enough
4. Overfitting Has Many Faces
5. Intuition Fails in High Dimensions
6. Theoretical Guarantees are Not What They Seem
7. Feature Engineering is the Key
8. More Data Beats a Cleverer Algorithm
9. Learn Many Models, Not Just One
10. Simplicity Does Not Imply Accuracy
11. Representable Does Not Imply Learnable
12. Correlation Does Not Imply Causation
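As a hedged, toy illustration of points 2 and 4 (our own example, not from the article), the R sketch below fits a modest and a very flexible polynomial to the same training data; the flexible model always fits the training set at least as well, but it typically does worse on held-out data:

set.seed(42)
x <- runif(60)
y <- sin(2 * pi * x) + rnorm(60, sd = 0.3)
train <- 1:40
test  <- 41:60

# Root mean squared error of a fitted model on a subset of the data
rmse <- function(fit, idx) {
  sqrt(mean((y[idx] - predict(fit, newdata = data.frame(x = x[idx])))^2))
}

simple   <- lm(y ~ poly(x, 3),  data = data.frame(x = x[train], y = y[train]))
flexible <- lm(y ~ poly(x, 15), data = data.frame(x = x[train], y = y[train]))

c(train_simple   = rmse(simple, train),   test_simple   = rmse(simple, test),
  train_flexible = rmse(flexible, train), test_flexible = rmse(flexible, test))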