Data Lineage: The History of your Data

A common scenario that data analysts in general encounter is what I like to describe as ‘data denialism’. Often, and especially while consulting, an analyst will find that the data tells a different story than what the customer holds to be true. It is also often the case that, when presenting this finding, the customer will outright deny the evidence, asserting that either the data or the analysis must be wrong. For example, it may be that a retailer focused on the low-end market is getting most of its sales from high-end customers, and such a fact upends months -maybe even years- of marketing planning and strategy. (This may, or may not, be based on one of my previous consulting experiences) It is of course part of the analyst’s job to present and discuss such controversial findings carefully and in a way that they can be understood an accepted, or tell a story that is compelling enough to be believable. Of course, too, some discussion about findings is definitely healthy and desirable. But even if the customer is convinced that the analyst did their job right, there’s still the matter of the data itself, for how can the customer be assured that the data is correct? After the myriad transformations, schema modifications, unifications and predictive tasks, how can even the analyst be sure that everything went right?

Causation: The Why Beneath The What

A lot of marketing research is aimed at uncovering why consumers do what they do and not just predicting what they’ll do next. Marketing scientist Kevin Gray asks Harvard Professor Tyler VanderWeele about causal analysis, arguably the next frontier in analytics.

RStudio Server Pro is ready for BigQuery on the Google Cloud Platform

RStudio is excited to announce the availability of RStudio Server Pro on the Google Cloud Platform.

Finding chairs the data scientist way! (Hint: using Deep Learning) – Part I

I have been going through the deep learning literature for quite some time now. I have also participated in a few challenges to get my hands dirty. But what I enjoy the most is to apply deep learning in a real life problem. A real life problem which encompasses my daily life. This is partly why I picked up this problem of chair count recognition, to finally solve a problem which was unsolved till now! In this article, I will cover how I defined the problem. I will also mention what were the steps I took to solve the problem. Consider it as a raw uncut version of my experience as I tried to solve the problem ??

Clustering applied to showers in the OPERA

in this post I discuss clustering: techniques that form this method and some peculiarities of using clustering in practice. This post continues previous one about the OPERA.

Dots vs. polygons: How I choose the right visualization

When I start designing a map I consider: How do I want the viewer to read the information on my map? Do I want them to see how a measurement varies across a geographic area at a glance? Do I want to show the level of variability within a specific region? Or do I want to indicate busy pockets of activity or the relative volume/density within an area?

Probability Functions Beginner

On this set of exercises, we are going to explore some of the probability functions in R with practical applications. Basic probability knowledge is required. Note: We are going to use random number functions and random process functions in R such as runif, a problem with these functions is that every time you run them you will obtain a different value. To make your results reproducible you can specify the value of the seed using set.seed(‘any number’) before calling a random function. (If you are not familiar with seeds, think of them as the tracking number of your random numbers). For this set of exercises we will use set.seed(1), don’t forget to specify it before every random exercise.