What is Granger Causality?

In this video you will learn what Granger Causality is and what role it plays in time series forecasting. Granger Causality is used to test whether another time series has a causal effect on the future prices of the given time series.
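As a rough illustration of the idea (not taken from the video), the sketch below runs a lag-1 Granger-style F-test in plain Python: it compares a model that predicts y from its own past with a model that also uses the past of x. The helper names `ols_ssr` and `granger_f_lag1`, the single lag, and the synthetic series are all assumptions made for this example.

```python
import random

def ols_ssr(rows, targets):
    """Residual sum of squares of a least-squares fit (normal equations)."""
    k = len(rows[0])
    xtx = [[sum(r[i] * r[j] for r in rows) for j in range(k)] for i in range(k)]
    xty = [sum(r[i] * t for r, t in zip(rows, targets)) for i in range(k)]
    aug = [xtx[i] + [xty[i]] for i in range(k)]
    # Gaussian elimination with partial pivoting
    for col in range(k):
        pivot = max(range(col, k), key=lambda r: abs(aug[r][col]))
        aug[col], aug[pivot] = aug[pivot], aug[col]
        for r in range(col + 1, k):
            factor = aug[r][col] / aug[col][col]
            for c in range(col, k + 1):
                aug[r][c] -= factor * aug[col][c]
    # Back substitution
    beta = [0.0] * k
    for i in reversed(range(k)):
        rest = sum(aug[i][j] * beta[j] for j in range(i + 1, k))
        beta[i] = (aug[i][k] - rest) / aug[i][i]
    return sum((t - sum(b * v for b, v in zip(beta, r))) ** 2
               for r, t in zip(rows, targets))

def granger_f_lag1(y, x):
    """F-statistic for 'x helps predict y' using one lag of each series."""
    targets = y[1:]
    restricted = [[1.0, y[t - 1]] for t in range(1, len(y))]
    unrestricted = [[1.0, y[t - 1], x[t - 1]] for t in range(1, len(y))]
    ssr_r = ols_ssr(restricted, targets)
    ssr_u = ols_ssr(unrestricted, targets)
    df = len(targets) - 3  # observations minus unrestricted parameters
    return (ssr_r - ssr_u) / (ssr_u / df)

# Synthetic data (assumption): y depends on the previous value of x.
random.seed(0)
x = [random.gauss(0, 1) for _ in range(200)]
y = [0.0]
for t in range(1, 200):
    y.append(0.5 * y[t - 1] + 0.8 * x[t - 1] + random.gauss(0, 0.1))

f_x_to_y = granger_f_lag1(y, x)
print("F statistic, x -> y:", f_x_to_y)
```

A large F-statistic, compared against the F(1, df) distribution, suggests that past values of x improve the forecast of y beyond what y's own history provides. A real analysis would normally consider several lags and check stationarity first.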


10 Best Big Data Analytics Courses Online

1. Data Analysis and Presentation Skills: the PwC Approach Speciali…
2. Data Science Specialization
3. Big Data Specialization
4. Statistics with R
5. Microsoft Professional Program in Data Science
6. Marketing Analytics
7. Big Data Fundamentals
8. Advanced Data Structures
9. Python
10. Java Tutorial for Complete Beginners


On the accuracy of linear regression routines in some data mining packages

While articles assessing the accuracy of traditional statistical packages are fairly commonplace, data mining software has escaped this important scrutiny. We apply the National Institute of Standards and Technology Statistical Reference Datasets tests for the numerical accuracy of statistical packages to 7 data mining packages: IBM Modeler, KNIME, Orange, Python, RapidMiner, Weka, and XLMiner. We find that one package has an unstable algorithm for the calculation of the sample variance and only two have reliable linear regression routines. Of these two packages that offer analysis of variance, one has a bad algorithm. The accuracy of statistical calculations in data mining packages cannot be taken for granted.
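The abstract does not say which algorithm the affected package uses, so the following is only an illustration of how a sample-variance calculation can become unstable. It contrasts the one-pass "sum of squares" textbook formula with Welford's update on data that has a large mean and a small spread. The function names and the data are assumptions for this sketch.

```python
def naive_variance(values):
    """One-pass textbook formula; prone to catastrophic cancellation."""
    n = len(values)
    total = 0.0
    total_sq = 0.0
    for v in values:
        total += v
        total_sq += v * v
    return (total_sq - total * total / n) / (n - 1)

def welford_variance(values):
    """Welford's running update; numerically much more stable."""
    mean = 0.0
    m2 = 0.0
    for count, v in enumerate(values, start=1):
        delta = v - mean
        mean += delta / count
        m2 += delta * (v - mean)
    return m2 / (len(values) - 1)

# Large offset, small spread; the exact sample variance is 30.
data = [1e9 + 4, 1e9 + 7, 1e9 + 13, 1e9 + 16]

naive_result = naive_variance(data)
stable_result = welford_variance(data)
print("naive:  ", naive_result)
print("welford:", stable_result)
```

In double precision the naive formula subtracts two numbers near 4e18 whose difference is only 90, so rounding error dominates the result, while the Welford update works with small deviations from the running mean throughout.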


Why Every Data Scientist Needs A Data Engineer

Thousands of companies across a myriad of industries are hiring data scientists, likened to the ‘quants’ of Wall Street in the 1980s and 1990s for their exclusive abilities to understand and interpret data, a kind of secret weapon for doing better business, as depicted in The Big Short. But with a supply of just over 11,000 data scientists and a rapidly growing demand, the competition among employers to secure this role is steep. The U.S. Bureau of Labor Statistics projects that demand will be 50-60% higher than supply by 2018. And McKinsey predicts that by 2018, the United States alone will face a shortage of 1.5 million analysts and managers who know how to use big data to make decisions. Companies that don’t hire a data scientist now might not be able to find one at all.


Encoding fixed length high cardinality non-numeric columns for a ML algorithm

ML algorithms work only with numerical values. So there is a need to model a problem and its data completely in numbers. For example, to run a clustering algorithm on a road network, representing the network / graph as an adjacency matrix is one way to model it.
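A minimal sketch of the adjacency-matrix idea, assuming an undirected, unweighted road network. The junction names and the `adjacency_matrix` helper are hypothetical and only serve the illustration.

```python
def adjacency_matrix(nodes, edges):
    """Encode an undirected graph as a 0/1 matrix, one row per node."""
    index = {name: i for i, name in enumerate(nodes)}
    matrix = [[0] * len(nodes) for _ in nodes]
    for a, b in edges:
        matrix[index[a]][index[b]] = 1
        matrix[index[b]][index[a]] = 1  # roads are two-way in this sketch
    return matrix

# Hypothetical road network: four junctions connected in a chain.
junctions = ["A", "B", "C", "D"]
roads = [("A", "B"), ("B", "C"), ("C", "D")]

matrix = adjacency_matrix(junctions, roads)
for name, row in zip(junctions, matrix):
    print(name, row)
```

Each row is now a fixed-length numeric vector, which is the form most clustering implementations expect as input.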


Understanding the Covariance Matrix

This article gives a geometric and intuitive explanation of the covariance matrix and the way it describes the shape of a data set. We will describe the geometric relationship of the covariance matrix with the use of linear transformations and eigendecomposition.
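To make the link between the covariance matrix and the shape of the data concrete, here is a small sketch for the two-dimensional case. It computes the sample covariance matrix and then its eigenvalues and principal direction in closed form. The function names and the toy data are assumptions, not taken from the article.

```python
import math

def covariance_matrix(points):
    """Sample covariance matrix of 2-D points."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - mean_y) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points) / (n - 1)
    return [[sxx, sxy], [sxy, syy]]

def eigen_symmetric_2x2(m):
    """Eigenvalues and leading eigenvector of a symmetric 2x2 matrix."""
    a, b, d = m[0][0], m[0][1], m[1][1]
    half_trace = (a + d) / 2
    radius = math.hypot((a - d) / 2, b)
    angle = 0.5 * math.atan2(2 * b, a - d)  # direction of largest variance
    axis = (math.cos(angle), math.sin(angle))
    return (half_trace + radius, half_trace - radius), axis

# Toy data lying exactly on the line y = 2x.
points = [(0.0, 0.0), (1.0, 2.0), (2.0, 4.0)]

cov = covariance_matrix(points)
eigenvalues, axis = eigen_symmetric_2x2(cov)
print("covariance:", cov)
print("eigenvalues:", eigenvalues)
print("principal axis:", axis)
```

Because the points lie on a line, all of the variance falls along one eigenvector (the direction of the line) and the second eigenvalue is zero. For a general cloud of points, the eigenvectors give the axes of the data's spread and the eigenvalues give the variance along each axis.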


Write less terrible code with Jupyter Notebook

Jupyter Notebook (or Lab) is great for prototyping but not really suited for writing good code. I love Notebooks for trying out new things, plotting, documenting my research, and as an educational tool. However, they don’t help you the way an IDE does with, for instance, code linting and refactoring. Notebooks written by data scientists are notorious for being unreadable, unreproducible and full of bugs. One solution for writing less terrible code in Notebooks is to only use an IDE and write no code in Notebooks. But wouldn’t it be great if you could have both the assistance of an IDE and the interactivity of a Notebook?


K-Means in Real Life: Clustering Workout Sessions

K-means clustering is a very popular unsupervised learning algorithm. In this article I want to provide a bit of background about it, and show how we could use it in an anecdotal real-life situation. … K-Means Clustering is an algorithm that, given a dataset, will identify which data points belong to each one of the k clusters. It takes your data and learns how it can be grouped. Through a series of iterations, the algorithm creates groups of data points – referred to as clusters – that have similar variance and that minimize a specific cost function: the within-cluster sum of squares.
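The iterative procedure described above can be sketched in a few lines of plain Python. This is a minimal version of the standard assign-then-update loop, with the initial centroids passed in explicitly. The function names and the toy points are assumptions and do not come from the article's workout data.

```python
def squared_distance(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))

def kmeans(points, centroids, iterations=10):
    """Alternate between assigning points and recomputing centroids."""
    labels = []
    for _ in range(iterations):
        # Assignment step: each point joins its nearest centroid.
        labels = [
            min(range(len(centroids)),
                key=lambda i: squared_distance(p, centroids[i]))
            for p in points
        ]
        # Update step: each centroid moves to the mean of its members.
        updated = []
        for i, old in enumerate(centroids):
            members = [p for p, lab in zip(points, labels) if lab == i]
            if members:
                updated.append(tuple(sum(c) / len(members)
                                     for c in zip(*members)))
            else:
                updated.append(old)  # keep an empty cluster's centroid
        centroids = updated
    # Cost function: within-cluster sum of squares.
    wcss = sum(squared_distance(p, centroids[lab])
               for p, lab in zip(points, labels))
    return centroids, labels, wcss

# Toy data: two well-separated groups of points.
points = [(1.0, 1.0), (1.5, 2.0), (1.0, 0.0),
          (8.0, 8.0), (9.0, 9.0), (8.0, 10.0)]
centroids, labels, wcss = kmeans(points, [points[0], points[3]])
print("labels:", labels)
print("within-cluster sum of squares:", wcss)
```

The result depends on the starting centroids, which is why library implementations typically run several initializations and keep the one with the lowest within-cluster sum of squares.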


5 things you should be monitoring

So you’ve deployed your brand new web application (now with blockchain!), and your users love it. Traffic is increasing, and you’re getting great press. Then one morning you wake up to find that the site has been slow all night, and your users are complaining. How do you find the problem? Whether you’re a developer building websites or internal applications, or an administrator building the infrastructure to back them, your job doesn’t stop once they’re up and running. Machine failure, releases containing bugs, and growth in usage can all lead to problems that need to be dealt with. To detect them, you need monitoring. But monitoring can do more than just send you alerts about the things that are going wrong. It can also help you debug those problems and prevent them in the future. So what things should you be monitoring?


Collecting Expressions in R

Not a full R article, but a quick note demonstrating by example the advantage of being able to collect many expressions and pack them into a single extend_se() node.


Estimating treatment effects and ICCs from (G)LMMs on the observed scale using Bayes, Part 1: lognormal models

When a multilevel model includes either a non-linear transformation (such as the log-transformation) of the response variable, or of the expectations via a GLM link-function, then the interpretation of the results will be different compared to a standard Gaussian multilevel model; specifically, the estimates will be on a transformed scale and not in the original units, and the effects will no longer refer to the average effect in the population, instead they are conditional/cluster-specific. In this post, I will deal with linear mixed-effects models (LMM) that use a log-transformed outcome variable. This will be the first part of a three-part tutorial on some of the finer details of (G)LMMs, and how Bayes can make your (frequentist) life easier.
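A small sketch of why the transformed scale matters, using a single lognormal distribution rather than the multilevel Bayesian models the post covers. Exponentiating the mean of the log-outcome gives the median on the original scale, while the mean on the original scale is exp(mu + sigma^2 / 2). The parameter values, function names, and the simulation check are assumptions made for this example.

```python
import math
import random

def lognormal_median(mu, sigma):
    """Back-transforming the log-scale mean yields the median."""
    return math.exp(mu)

def lognormal_mean(mu, sigma):
    """Mean on the observed scale includes a variance correction."""
    return math.exp(mu + sigma ** 2 / 2)

mu, sigma = 1.0, 0.5  # hypothetical log-scale mean and standard deviation

# Check the formula against simulated draws.
random.seed(1)
draws = [random.lognormvariate(mu, sigma) for _ in range(200_000)]
simulated_mean = sum(draws) / len(draws)

print("exp(mu), the median:       ", lognormal_median(mu, sigma))
print("exp(mu + sigma^2/2), mean: ", lognormal_mean(mu, sigma))
print("simulated mean:            ", simulated_mean)
```

The gap between the two back-transformed values grows with sigma, so reporting exp(mu) as "the average on the original scale" understates the mean whenever the log-scale variance is not negligible.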


Longitudinal Heat Plots

During our research on the effect of prednisone consumption during pregnancy on health outcomes of the baby (Palmsten K, Rolland M, Hebert MF, et al., Patterns of prednisone use during pregnancy in women with rheumatoid arthritis: Daily and cumulative dose. Pharmacoepidemiol Drug Saf. 2018 Apr;27(4):430-438. https://…/29488292 ) we developed a custom plot to visualize, for each patient, their daily and cumulative consumption of prednisone during pregnancy. Since the publication, these plots have raised some interest, so here is the code used to produce them.