Fast causal inference with non-random missingness by test-wise deletion

Many real datasets contain values missing not at random (MNAR). In this scenario, investigators often perform list-wise deletion, or delete samples with any missing values, before applying causal discovery algorithms. List-wise deletion is a sound and general strategy when paired with algorithms such as FCI and RFCI, but the deletion procedure also eliminates otherwise good samples that contain only a few missing values. In this report, we show that we can more efficiently utilize the observed values with test-wise deletion while still maintaining algorithmic soundness. Here, test-wise deletion refers to the process of list-wise deleting samples only among the variables required for each conditional independence (CI) test used in constraint-based searches. Test-wise deletion therefore often saves more samples than list-wise deletion for each CI test, especially when we have a sparse underlying graph. Our theoretical results show that test-wise deletion is sound under the justifiable assumption that none of the missingness mechanisms causally affect each other in the underlying causal graph. We also find that FCI and RFCI with test-wise deletion outperform their list-wise deletion and imputation counterparts on average when MNAR holds in both synthetic and real data.


Reading and Writing Files in Python Tutorial

Learn how to open, read, and write data in flat files, such as JSON and text files, as well as binary files, in Python using the io and os modules.
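As a taste of what the tutorial covers, here is a minimal round trip through a text file and a JSON file using the standard library. The file names and contents are made up for the example.

```python
import json
import os
import tempfile

workdir = tempfile.mkdtemp()  # scratch directory so the example is self-contained

# Write and read back a plain text file.
text_path = os.path.join(workdir, "notes.txt")
with open(text_path, "w") as f:
    f.write("first line\nsecond line\n")
with open(text_path) as f:
    lines = f.read().splitlines()
print(lines)  # ['first line', 'second line']

# Write and read back a JSON file.
json_path = os.path.join(workdir, "config.json")
with open(json_path, "w") as f:
    json.dump({"name": "demo", "count": 3}, f)
with open(json_path) as f:
    config = json.load(f)
print(config["count"])  # 3
```

Using `with` blocks ensures the files are closed even if an error occurs mid-write.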


Getting Started with the Tidyverse: Tutorial

Start analyzing the Titanic data with R and the tidyverse: learn how to filter, arrange, summarise, mutate and visualize your data with dplyr and ggplot2!


Ten Machine Learning Algorithms You Should Know to Become a Data Scientist

Machine Learning practitioners have different personalities. Some say “I am an expert in X, and X can train on any type of data”, where X is some algorithm, while others are “right tool for the right job” people. Many also follow a “jack of all trades, master of one” strategy, combining one area of deep expertise with a passing knowledge of the other fields of Machine Learning. That said, no one can deny that, as practicing Data Scientists, we have to know the basics of the common machine learning algorithms, which helps us engage with any new-domain problem we come across. This is a whirlwind tour of common machine learning algorithms, with quick resources to help you get started on each.


Getting Started with PyTorch Part 1: Understanding how Automatic Differentiation works

When I started to code neural networks, I ended up using what everyone else around me was using: TensorFlow. But recently, PyTorch has emerged as a major contender in the race to be the king of deep learning frameworks. What makes it really alluring is its dynamic computation graph paradigm. Don’t worry if that last sentence doesn’t make sense to you now; by the end of this post, it will. But take my word that it makes debugging neural networks way easier.
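The core idea behind a dynamic computation graph can be shown in pure Python: the graph is built on the fly as the arithmetic executes, then gradients flow backwards through it by the chain rule. The tiny `Value` class below is a toy sketch of reverse-mode autodiff, not PyTorch's actual implementation.

```python
# Minimal reverse-mode automatic differentiation, built as the expression runs.
# Illustrative toy only; PyTorch's autograd is far more general and efficient.

class Value:
    def __init__(self, data, parents=(), local_grads=()):
        self.data = data
        self.grad = 0.0
        self._parents = parents          # nodes this value was computed from
        self._local_grads = local_grads  # d(self)/d(parent) for each parent

    def __add__(self, other):
        return Value(self.data + other.data, (self, other), (1.0, 1.0))

    def __mul__(self, other):
        return Value(self.data * other.data, (self, other),
                     (other.data, self.data))

    def backward(self, upstream=1.0):
        # Accumulate the chain-rule product into this node, then recurse.
        self.grad += upstream
        for parent, local in zip(self._parents, self._local_grads):
            parent.backward(upstream * local)

x = Value(2.0)
y = Value(3.0)
z = x * y + x   # the graph is defined simply by running the expression
z.backward()
print(x.grad)   # dz/dx = y + 1 = 4.0
print(y.grad)   # dz/dy = x = 2.0
```

Because the graph is rebuilt on every forward pass, you can step through it with an ordinary debugger and use native control flow (loops, conditionals) inside the model, which is exactly what makes debugging easier.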


Top 8 Free Must-Read Books on Deep Learning

1. Deep Learning
2. Deep Learning Tutorial
3. Deep Learning: Methods and Applications
4. First Contact with TensorFlow, get started with Deep Learning Programming
5. Neural Networks and Deep Learning
6. A Brief Introduction to Neural Networks
7. Neural Network Design (2nd edition)
8. Neural Networks and Learning Machines (3rd edition)


Genetic Algorithm Key Terms, Explained

1. Genetic Algorithm
2. Evolutionary Algorithm
3. Genetic Programming
4. Population
5. Chromosome
6. Gene
7. Generation
8. Breeding
9. Selection
10. Crossover
11. Mutation
12. Fitness


The Cold Start Problem with Artificial Intelligence

If you have become a Data Scientist in the last three or four years, and you didn’t spend the 1990s, the 2000s, or even a large part of the 2010s in the workforce, it is sometimes hard to imagine how much things have changed. Nowadays we use GPU-powered databases to query billions of rows, whereas we used to be lucky if we could generate daily aggregated reports. But as we have become accustomed to having data and business intelligence/analytics, a new problem is stopping eager Data Scientists from taking the algorithms they honed on toy problems and applying them to actual real-life business problems, otherwise known as the Cold Start Problem with Artificial Intelligence. In this post, I discuss why companies struggle with implementing AI and how they can overcome it.


Data engineers vs. data scientists

It’s important to understand the differences between a data engineer and a data scientist. Misunderstanding or not knowing these differences is making teams fail or underperform with big data. A key misunderstanding concerns the strengths and weaknesses of each position. I think some of these misconceptions come from the diagrams used to describe data scientists and data engineers.


CRANalerts – Get email alerts when a CRAN package gets updated

Have you ever found yourself asking “how can I make sure I don’t miss the next version release of package X”? That’s the exact problem I set out to solve with CRANalerts. Simply provide your email address and an R package name, and every time the package gets updated on CRAN in the future, you’ll get notified.