Long to Wide Data in R

Learn why you would transform your data from a long to a wide format (and vice versa), and explore how to do this in R with melt() and dcast()!
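To make the round trip concrete, here is a minimal sketch using reshape2's melt() and dcast(); the reshape2 package is assumed, and the example data frame is invented for illustration.

library(reshape2)

# Wide format: one row per subject, one column per measurement (toy data)
wide <- data.frame(
  subject = c("a", "b", "c"),
  height  = c(170, 165, 180),
  weight  = c(65, 60, 80)
)

# Wide -> long: melt() stacks the measurement columns into
# variable/value pairs, keyed by the id column
long <- melt(wide, id.vars = "subject",
             variable.name = "measure", value.name = "value")

# Long -> wide: dcast() spreads them back out; the formula reads
# "row identifiers ~ column to spread"
wide_again <- dcast(long, subject ~ measure, value.var = "value")

Long data keeps one observation per row, which suits plotting and grouped summaries; wide data keeps one case per row, which suits many modeling functions and human-readable tables.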


Math Can’t Solve Everything: Questions We Need To Be Asking Before Deciding an Algorithm is the Answer

Across the globe, algorithms are quietly but increasingly being relied upon to make important decisions that impact our lives. This includes determining the number of hours of in-home medical care patients will receive, whether a child is so at risk that child protective services should investigate, whether a teacher adds value to a classroom or should be fired, and whether or not someone should continue receiving welfare benefits. The use of algorithmic decision-making is typically well-intentioned, but it can result in serious unintended consequences. Amid the hype of trying to figure out if and how they can use an algorithm, organizations often skip over one of the most important questions: will the introduction of the algorithm reduce or reinforce inequity in the system? Many factors bear on that analysis. Here are a few that every organization needs to consider to determine whether implementing a system based on algorithmic decision-making is an appropriate and ethical solution to its problem:
1. Will this algorithm influence – or serve as the basis of – decisions with the potential to negatively impact people’s lives?
2. Can the available data actually lead to a good outcome?
3. Is the algorithm fair?
4. How will the results (really) be used by humans?
5. Will people affected by these decisions have any influence over the system?


Introducing Google AI

For the past several years, we’ve pursued research that reflects our commitment to making AI available to everyone. From computer vision to healthcare research to AutoML, we have placed increasing emphasis on implementing machine learning techniques in nearly everything we do at Google. Our research has been core to the development and integration of these systems into Google products and platforms. To better reflect this commitment, we’re unifying our efforts under “Google AI”, which encompasses all the state-of-the-art research happening across Google.


Semantic Image Segmentation with DeepLab in TensorFlow

Semantic image segmentation, the task of assigning a semantic label, such as “road”, “sky”, “person”, or “dog”, to every pixel in an image, enables numerous new applications, such as the synthetic shallow depth-of-field effect shipped in the portrait mode of the Pixel 2 and Pixel 2 XL smartphones, and mobile real-time video segmentation. Assigning these semantic labels requires pinpointing the outline of objects, and thus imposes much stricter localization accuracy requirements than other visual entity recognition tasks, such as image-level classification or bounding-box-level detection.


5 Reasons “Logistic Regression” should be the first thing you learn when becoming a Data Scientist

I started out in the Data Science world a few years back. I was a Software Engineer at the time, and I began learning online (before starting my Master’s degree). I remember that as I searched for online resources, I saw only names of learning algorithms: Linear Regression, Support Vector Machines, Decision Trees, Random Forests, Neural Networks, and so on. It was very hard to understand where I should start. Today I know that the most important thing to learn to become a Data Scientist is the pipeline, i.e., the process of getting and processing data, understanding the data, building the model, evaluating the results (of both the model and the data-processing phase), and deployment. So, as a TL;DR for this post: learn Logistic Regression first to become familiar with the pipeline without being overwhelmed by fancy algorithms.
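That pipeline fits in a few lines of R. Here is a minimal sketch on the built-in mtcars data; the variable choices and the 70/30 split are my own illustration, not from the post.

# Get and understand the data: predict transmission type (am) from
# weight (wt) and horsepower (hp); these predictors are illustrative
data(mtcars)
summary(mtcars[, c("am", "wt", "hp")])

# Split into training and test sets
set.seed(42)
train_idx <- sample(nrow(mtcars), size = floor(0.7 * nrow(mtcars)))
train <- mtcars[train_idx, ]
test  <- mtcars[-train_idx, ]

# Build the model: logistic regression via glm() with a binomial family
model <- glm(am ~ wt + hp, data = train, family = binomial)

# Evaluate: predicted probabilities -> class labels -> accuracy
probs <- predict(model, newdata = test, type = "response")
preds <- ifelse(probs > 0.5, 1, 0)
mean(preds == test$am)

Every step here (splitting, fitting, evaluating) generalizes unchanged to fancier algorithms, which is exactly the author's point.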


An overview of R with a curated learning path

I recently wrote an 80-page guide on how to get a programming job without a degree, drawn from my experience helping students do just that at Springboard, a leading data science bootcamp. This excerpt is the part where I focus on an overview of the R programming language.


An R Tutorial: Visual Representation of Complex Multivariate Relationships Using the R qgraph Package, Part Two Repost

In two previous blog posts I discussed some techniques for visualizing relationships involving two or three variables and a large number of cases. In this tutorial I will extend that discussion to show some techniques that can be used on datasets with complex multivariate relationships involving three or more variables, using a dataset called ‘Detroit.’
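As a taste of the technique, qgraph can render a correlation matrix as a network in a single call. The sketch below substitutes R's built-in mtcars data for the post's ‘Detroit’ dataset so the example is self-contained.

# install.packages("qgraph")  # if not already installed
library(qgraph)

# Correlation matrix of the numeric variables in mtcars (an illustrative
# stand-in for the 'Detroit' data used in the post)
cor_mat <- cor(mtcars)

# Nodes are variables; edges are correlations, with thicker edges for
# stronger relationships and color distinguishing positive from negative
qgraph(cor_mat,
       layout = "spring",          # force-directed node placement
       labels = colnames(mtcars))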