In the last few months I have worked hard to put together an introductory course in data coding for those who are new to Data Science. I've chosen bash (a.k.a. the command line) as the first data language to show you, because I find it easy to interpret – even for first-timers. In my articles I've started "the story" from the very beginning, so if you have never touched coding or programming before, don't worry: you will understand everything. My main focus was to keep everything easy to follow, but also practical and hands-on.
Awesome Deep Learning: Most Cited Deep Learning Papers; Data Science for the Layman; Best Data Science Courses from Udemy; Negative Results on Negative Images: Major Flaw in Deep Learning?
This is a highlight from a talk by Diogo Almeida, 'Deep learning: Modular in theory, inflexible in practice.' Visit Safari to view the full session from the 2016 Artificial Intelligence Conference in New York. Given the recent success of deep learning, it is tempting to believe that it can solve any problem placed in its path: just build more neural networks, throw more data at them, and connect them in a modular fashion to do anything. In truth, this approach would never work without thoughtful systems engineering. In this talk excerpt, Diogo Almeida discusses challenges such as data inefficiency and explainability, and proposes how we should think about tackling them.
The knitr package by Yihui Xie is a wonderful tool for reproducible data science. I especially like using it with R Markdown documents, where a little simple markup in an easy-to-read source file lets me combine R code and narrative text to generate an attractive document with words, tables and pictures in HTML, PDF or Word format.
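As a minimal illustration of that workflow (the title and chunk contents here are just placeholders, not from the original post), an R Markdown source file interleaves plain markup with R code chunks; knitting it runs the chunks and embeds their output in the rendered document:

````markdown
---
title: "A minimal reproducible report"
output: html_document
---

Some narrative text explaining the analysis goes here.

```{r cars-summary}
# This chunk is executed when the document is knit,
# and its printed output appears in the final report.
summary(cars)
```
````

Rendering is then a single call such as `rmarkdown::render("report.Rmd")` from R.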
Uncertainty is the biggest enemy of a profitable business. That is especially true of small businesses, which don't have enough resources to survive an unexpected drop in revenue or to capitalize on a sudden increase in demand. In this context, it is especially important to be able to predict market changes accurately in order to make better decisions and stay competitive. This series of posts will teach you how to use data to make sound predictions. In the last set of exercises, we saw how to make predictions on a random walk by isolating the white-noise component via differencing of the time series. But this approach is valid only if the random component of the time series follows a normal distribution with constant mean and variance, and if those components are added together at each step to create the new observations.
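The exercises themselves are in R; as a language-neutral sketch of the underlying idea (all seeds and sizes here are illustrative), simulating a random walk and taking first differences recovers the white-noise increments, whose mean and variance stay close to the assumed constants:

```python
import random
import statistics

random.seed(42)

# White-noise increments: i.i.d. normal with mean 0 and variance 1.
noise = [random.gauss(0, 1) for _ in range(1000)]

# A random walk accumulates the increments: each observation is
# the previous one plus a new noise term.
walk = []
total = 0.0
for e in noise:
    total += e
    walk.append(total)

# First differences of the walk recover the noise terms exactly.
diffs = [walk[i] - walk[i - 1] for i in range(1, len(walk))]
assert all(abs(d - e) < 1e-9 for d, e in zip(diffs, noise[1:]))

# The differenced series looks like stationary white noise:
# sample mean near 0, sample standard deviation near 1.
print(round(statistics.mean(diffs), 2), round(statistics.stdev(diffs), 2))
```

If the increments were not identically distributed with constant mean and variance, the differenced series would not be stationary and this trick would break down, which is exactly the caveat above.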
We will develop a well-known k-NN exercise originally published in 'Machine Learning in R' by Brett Lantz (Packt Publishing, 2015). k-nearest neighbors is a classification algorithm, and perhaps one of the simplest machine learning algorithms. The exercise we will develop is 'diagnosing breast cancer with the k-NN algorithm', using the Wisconsin Breast Cancer Diagnostic dataset from the UCI Machine Learning Repository at http://…/ml.
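The book's exercise itself is in R and uses the real Wisconsin data; purely to illustrate the algorithm, here is a tiny self-contained Python sketch with made-up points standing in for two tumor features (the clusters and labels are invented, not from the dataset):

```python
import math
from collections import Counter

def knn_predict(train, labels, point, k=3):
    """Classify `point` by majority vote among its k nearest training points."""
    # Sort all training points by Euclidean distance to the query point.
    dists = sorted((math.dist(x, point), y) for x, y in zip(train, labels))
    # Vote among the k closest neighbors.
    votes = Counter(y for _, y in dists[:k])
    return votes.most_common(1)[0][0]

# Hypothetical 2-D feature values (e.g. radius, texture) for six tumors.
train = [(1.0, 1.2), (0.9, 1.0), (1.1, 0.8),   # benign-like cluster
         (3.0, 3.1), (3.2, 2.9), (2.8, 3.3)]   # malignant-like cluster
labels = ["B", "B", "B", "M", "M", "M"]

print(knn_predict(train, labels, (1.0, 1.0)))  # near the benign cluster → B
print(knn_predict(train, labels, (3.0, 3.0)))  # near the malignant cluster → M
```

On the real dataset the same idea applies, but the features must first be rescaled (e.g. min-max normalized) so that no single measurement dominates the distance.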