Deep Learning vs. Machine Learning – the essential differences you need to know

Machine learning and deep learning are all the rage! All of a sudden everyone is talking about them, whether or not they understand the differences. Whether or not you have been actively following data science, you will have heard these terms.

Introduction to Principal Component Analysis

The sheer size of data in the modern age is not only a challenge for computer hardware but also the main bottleneck for the performance of many machine learning algorithms. The main goal of PCA is to identify patterns in data by detecting correlations between variables. Reducing the dimensionality only makes sense if strong correlations between variables exist. PCA is a statistical method used to reduce the number of variables in a data set. It does so by lumping highly correlated variables together. Naturally, this comes at the expense of some accuracy. However, if you have 50 variables and realize that 40 of them are highly correlated, you will gladly trade a little accuracy for simplicity.
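As a rough illustration of the idea above, the sketch below finds the first principal component of three highly correlated variables and reports the share of total variance it captures. It is a minimal standard-library Python example, not taken from the linked article. The data is made up, the function names are hypothetical, and power iteration is used here only as one simple way to get the leading eigenvector of the covariance matrix.

```python
import random

def covariance_matrix(rows):
    """Return the sample covariance matrix and the mean-centered rows."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    return cov, centered

def first_component(cov, iters=200):
    """Leading eigenvector and eigenvalue of cov via power iteration."""
    d = len(cov)
    v = [1.0] * d
    for _ in range(iters):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = sum(x * x for x in w) ** 0.5
        v = [x / norm for x in w]
    eigenvalue = sum(v[i] * sum(cov[i][j] * v[j] for j in range(d))
                     for i in range(d))
    return v, eigenvalue

random.seed(0)
# Made-up data: three variables that are noisy copies of one hidden factor.
rows = []
for _ in range(200):
    factor = random.gauss(0, 1)
    rows.append([factor + random.gauss(0, 0.05) for _ in range(3)])

cov, centered = covariance_matrix(rows)
direction, eigenvalue = first_component(cov)

# Share of total variance kept when three variables are reduced to one.
explained = eigenvalue / sum(cov[i][i] for i in range(3))

# One-dimensional projection of every observation onto the component.
scores = [sum(c[j] * direction[j] for j in range(3)) for c in centered]
```

Because the three variables are nearly copies of each other, `explained` should come out close to 1: almost nothing is lost by replacing them with a single score.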

Build a Recurrent Neural Net in 5 Min

In this video, I explain the basics of recurrent neural networks. Then we code our own RNN in 80 lines of Python (plus whitespace) that predicts the sum of two binary numbers after training.

Web Scraping and Applied Clustering Global Happiness and Social Progress Index

An increasing amount of data is available on the web. Web scraping is a technique developed to extract data from web pages automatically and transform it into a format suitable for further analysis and insight. Applied clustering is an unsupervised learning technique that refers to a family of pattern discovery and data mining tools with applications in machine learning, bioinformatics, image analysis, and segmentation of consumer types, among others. R offers several packages and tools for web scraping, data manipulation, statistical analysis and machine learning. The motivation for this post is to illustrate the applications of web scraping, dimension reduction and applied clustering tools in R. There are two separate data sets for web scraping in this post. The first data set is from the recently released World Happiness Report 2017 by the United Nations Sustainable Development Solutions Network. The 2017 report, launched on March 20, the day of world happiness, contained global rankings for happiness and social well-being. The second data set for web scraping is the 2015 Social Progress Index of countries in the world. The Social Progress Index has been described as measuring the extent to which countries provide for the social and environmental needs of their citizens. In this exercise, the two data sets were joined by the country column and pre-processed prior to principal component analysis (PCA) and clustering. The goal of the clustering approach in this post was to segment the rows for the more than 150 countries in the data into separate groups (clusters). The expectation is that countries within a cluster are as similar as possible to each other in happiness and social progress, and as dissimilar as possible to countries assigned to other clusters.
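The clustering goal described above (similar rows together, dissimilar rows apart) can be sketched with k-means, one common clustering method. The post itself works in R and this digest does not say which algorithm it uses, so the Python below is only an illustrative assumption. The six points are made-up, already standardized two-variable scores, not values from either report.

```python
def nearest(point, centers):
    """Index of the center closest to point (squared Euclidean distance)."""
    dists = [sum((a - b) ** 2 for a, b in zip(point, c)) for c in centers]
    return dists.index(min(dists))

def kmeans(points, centers, iters=20):
    """Lloyd's algorithm: assign points to the nearest center, then re-average."""
    for _ in range(iters):
        groups = [[] for _ in centers]
        for p in points:
            groups[nearest(p, centers)].append(p)
        # Recompute each center as the mean of its group; keep it if the group is empty.
        centers = [
            tuple(sum(col) / len(g) for col in zip(*g)) if g else c
            for g, c in zip(groups, centers)
        ]
    labels = [nearest(p, centers) for p in points]
    return centers, labels

# Hypothetical standardized (happiness, social progress) scores for six countries.
points = [(1.0, 1.1), (0.9, 1.0), (1.1, 0.9),
          (-1.0, -0.9), (-1.1, -1.0), (-0.9, -1.1)]

# Start from two of the points as initial centers (an arbitrary choice).
centers, labels = kmeans(points, centers=[points[0], points[3]])
```

With real data the variables would normally be standardized first, as in this toy input, so that no single variable dominates the distance.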

The Difference Between Curve Fitting and Regression

One of the hardest things about entering a new field is learning the terminology. It’s even harder when you have already learned similar terms for years. Coming from a physics background to the world of machine learning and statistics, the terminology took a little getting used to. So I thought I’d clear it up a little bit for anyone who might be facing the same problems I faced.

Become a Practicing Data Scientist

Move from beginner to practitioner with the help of our innovative teaching method, and work on 15 real-world case studies during class alongside our top faculty.

Correlation and Correlogram Exercises

Correlation analysis is one of the most popular techniques for data exploration. This set of exercises is intended to help you extend, speed up, and validate your correlation analysis. It lets you practice:
• calculating linear and nonlinear correlation coefficients,
• testing those coefficients for statistical significance,
• creating correlation matrices to study interdependence between variables in dataframes,
• drawing graphical representations of those matrices (correlograms),
• calculating coefficients for partial correlation between two variables (controlling for their correlation with other variables).
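To make the list above concrete, here is a small sketch of three of the quantities it names: a linear (Pearson) coefficient, a rank-based (Spearman) coefficient, and a first-order partial correlation. The exercises themselves target R, so this standard-library Python version and its function names are only illustrative assumptions, and the numbers are made up.

```python
def pearson(x, y):
    """Pearson linear correlation coefficient."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((a - mx) * (b - my) for a, b in zip(x, y))
    sxx = sum((a - mx) ** 2 for a in x)
    syy = sum((b - my) ** 2 for b in y)
    return sxy / (sxx * syy) ** 0.5

def ranks(values):
    """1-based ranks; tied values share their average rank."""
    order = sorted(range(len(values)), key=lambda i: values[i])
    result = [0.0] * len(values)
    i = 0
    while i < len(order):
        j = i
        while j + 1 < len(order) and values[order[j + 1]] == values[order[i]]:
            j += 1
        average = (i + j) / 2 + 1
        for k in range(i, j + 1):
            result[order[k]] = average
        i = j + 1
    return result

def spearman(x, y):
    """Spearman rank correlation: Pearson correlation of the ranks."""
    return pearson(ranks(x), ranks(y))

def partial_corr(x, y, z):
    """Correlation between x and y, controlling for a single variable z."""
    rxy, rxz, ryz = pearson(x, y), pearson(x, z), pearson(y, z)
    return (rxy - rxz * ryz) / ((1 - rxz ** 2) * (1 - ryz ** 2)) ** 0.5

x = [1, 2, 3, 4, 5]
y = [1, 8, 27, 64, 125]   # monotone in x, but not linear
z = [2, 1, 4, 3, 5]       # a third, made-up control variable

r_linear = pearson(x, y)
r_rank = spearman(x, y)
r_partial = partial_corr(x, y, z)
```

Since `y` increases whenever `x` does, the rank-based coefficient is 1, while the linear coefficient stays below 1 because the relationship is curved.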

ReinforcementLearning: A package for replicating human behavior in R

Reinforcement learning has recently gained a great deal of traction in studies that call for human-like learning. In settings where an explicit teacher is not available, this method teaches an agent via interaction with its environment without any supervision other than its own decision-making policy. In many cases, this approach appears quite natural by mimicking the fundamental way humans learn. However, implementing reinforcement learning is programmatically challenging, since it relies on continuous interactions between an agent and its environment. In fact, there is currently no package available that performs model-free reinforcement learning in R. As a remedy, we introduce the ReinforcementLearning R package, which allows an agent to learn optimal behavior based on sample experience consisting of states, actions and rewards. The result of the learning process is a highly interpretable reinforcement learning policy that defines the best possible action in each state.
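The idea of learning a policy from sample experience made up of states, actions and rewards can be sketched with tabular Q-learning. The Python below does not reproduce the ReinforcementLearning package or its R interface. The function name, the parameters `alpha`, `gamma` and `sweeps`, and the three-state corridor are all assumptions made for this illustration.

```python
from collections import defaultdict

def q_learning(experience, alpha=0.1, gamma=0.9, sweeps=200):
    """Tabular Q-learning over a fixed batch of (state, action, reward, next_state)."""
    actions = sorted({a for _, a, _, _ in experience})
    q = defaultdict(float)  # unseen (state, action) pairs default to 0
    for _ in range(sweeps):
        for state, action, reward, next_state in experience:
            best_next = max(q[(next_state, b)] for b in actions)
            target = reward + gamma * best_next
            q[(state, action)] += alpha * (target - q[(state, action)])
    # The policy picks the highest-valued action in every observed state.
    states = sorted({s for s, _, _, _ in experience})
    policy = {s: max(actions, key=lambda b: q[(s, b)]) for s in states}
    return q, policy

# Hypothetical corridor: moving "right" from s2 reaches the goal (reward 1).
experience = [
    ("s0", "right", 0.0, "s1"),
    ("s0", "left", 0.0, "s0"),
    ("s1", "right", 0.0, "s2"),
    ("s1", "left", 0.0, "s0"),
    ("s2", "right", 1.0, "goal"),
    ("s2", "left", 0.0, "s1"),
]

q, policy = q_learning(experience)
```

The resulting `policy` is a plain state-to-action table, which is what makes this kind of learned behavior easy to inspect.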