Is Game Theory important for Data Scientists?

Imagine you are driving down a lane in heavy traffic and notice that your lane is moving slowly. You switch to another lane where the traffic seems to move faster. After a while, however, you observe that the traffic in your original lane is now moving at a faster pace. This is when you have to make a strategic decision: should you stay where you are, or should you switch back to the previous lane? Game theory deals with understanding strategic situations, where how well a person performs depends on what others do, and vice versa. The basic principle of game theory is to find an optimal course of action for a given situation. It is not just games like poker, football, and chess that fit into game theory; many other important decisions do too, such as investing, customer engagement, and deciding which job to take. Game theory applications can be found in various strategic decision-making domains such as sports, economics, politics, and the geosciences.
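To make the strategic flavor concrete, here is a minimal sketch (in Python, with purely illustrative payoff numbers of my own invention, not anything from the article) that treats the lane-switching scenario as a two-player game and checks which choice pairs are stable, i.e., where neither driver gains by deviating alone:

```python
# A minimal sketch of the lane-switching scenario as a 2-player game.
# Payoff numbers are illustrative assumptions: each entry is
# (driver A's payoff, driver B's payoff) for a (stay, switch) choice pair.
import itertools

actions = ["stay", "switch"]
payoffs = {
    ("stay",   "stay"):   (2, 2),   # both stay: moderate speed for both
    ("stay",   "switch"): (3, 1),   # B switches, and B's old lane then speeds up for A
    ("switch", "stay"):   (1, 3),
    ("switch", "switch"): (0, 0),   # both switch: the "fast" lane jams
}

def is_nash(a, b):
    """A profile is a (pure) Nash equilibrium if neither player gains by deviating alone."""
    pa, pb = payoffs[(a, b)]
    best_a = all(pa >= payoffs[(a2, b)][0] for a2 in actions)
    best_b = all(pb >= payoffs[(a, b2)][1] for b2 in actions)
    return best_a and best_b

for a, b in itertools.product(actions, actions):
    if is_nash(a, b):
        print(f"Nash equilibrium: A {a}s, B {b}s with payoffs {payoffs[(a, b)]}")
```

With these made-up payoffs, the only stable outcome is both drivers staying put, which matches the intuition that everyone rushing into the "fast" lane just jams it.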

Examining the KNIME open source data analytics platform

KNIME offers open source data analytics, reporting and integration tools, as well as commercial software that can help build more efficient workflows.

Data Science and EU Privacy Regulations: A Storm on the Horizon

The European Union is a few short months away from finalizing a sweeping regulation that will dramatically change the way in which data can be handled and in which data science can be utilized. This new regulation will affect all corporations using data from EU citizens, not just those with offices in the EU. Those collecting data from more than 5,000 EU citizens per year will be considered accountable, regardless of company location. The EU parliament is so serious about compliance with these new privacy and data protection laws that it has proposed a fine for violations of up to 5% of global annual turnover (1 million Euros for smaller companies). Needless to say, this massive fine has attracted serious attention to the regulation, and companies have already started preparing to comply.

Thumbs up for Anaconda

I would say definitely give Anaconda a try. Anaconda is responsible for installing the entire ecosystem (including the copy of R it wants to use), so the Anaconda developers directly experience “integration debt” (and presumably act in their own interest and continuously work to reduce it).

Dangers of Using RMSE: Netflix Case Study


Navigating into the World of Analytics

Note 1: Not all analytics is machine learning!
Note 2: Not everything about analytics is enjoyable and rosy!
Note 3: Analytics is about getting into the murky details!
Note 4: Analytics is about challenging your own and other people’s biases!
Note 5: Analytics is about action.

Visualizing Machine Learning Thresholds to Make Better Business Decisions

As data scientists, when we build a machine learning model our ultimate goal is to create value: we want to leverage our model’s predictions to do something better than we were doing before, when we didn’t have a model or when our model was more primitive. Focusing on outcomes means that our final measure of a model’s performance is how useful it was, measured as the amount of value it created in the application for which it was used. In this post, we’ll use data visualization as a powerful tool for choosing and understanding the modeling decisions that maximize business value. For classification algorithms, one of the most common usage patterns is thresholding: all cases with a model score above the threshold get some sort of special treatment.
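A minimal sketch of that idea, using made-up scores, labels, and payoff numbers (VALUE_TP and COST_FP are assumptions of mine, not figures from the post), sweeps the threshold and plots the total value created at each setting:

```python
# Sketch: choose a classification threshold by the business value it creates.
# Scores, labels, and the value/cost numbers are illustrative assumptions.
import numpy as np
import matplotlib.pyplot as plt

rng = np.random.default_rng(0)
n = 10_000
y_true = rng.binomial(1, 0.2, size=n)                           # ~20% positives (assumed)
scores = np.clip(rng.normal(0.35 + 0.3 * y_true, 0.15), 0, 1)   # model scores (assumed)

VALUE_TP, COST_FP = 50.0, 5.0   # assumed payoff of treating a true/false positive

thresholds = np.linspace(0, 1, 101)
values = []
for t in thresholds:
    treated = scores >= t                     # cases above the threshold get the special treatment
    tp = np.sum(treated & (y_true == 1))
    fp = np.sum(treated & (y_true == 0))
    values.append(VALUE_TP * tp - COST_FP * fp)

best = thresholds[int(np.argmax(values))]
plt.plot(thresholds, values)
plt.axvline(best, linestyle="--")
plt.xlabel("threshold")
plt.ylabel("total value")
plt.title(f"Value-maximizing threshold ≈ {best:.2f}")
plt.show()
```

Plotting value against threshold, rather than reporting a single accuracy-style metric, makes it visible how sensitive the business outcome is to where the cutoff is placed.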

Big-data analytics for lenders and creditors

Credit today is granted by various organizations such as banks, building societies, retailers, mail order companies, utilities, and various others. Because of growing demand, stronger competition, and advances in computer technology, traditional methods of making credit decisions that rely mostly on human judgment have, over the last 30 years, been replaced by methods that rely mostly on statistical models. Such statistical models are today used not only for deciding whether or not to accept an applicant (application scoring), but also to predict the likely default of customers who have already been accepted (behavioral scoring) and to predict the likely amount of debt that the lender can expect to recover (collection scoring).

The term credit scoring can be defined on several conceptual levels. Most fundamentally, credit scoring means applying a statistical model to assign a risk score to a credit application or to an existing credit account. On a higher level, credit scoring also means the process of developing such a statistical model from historical data. On yet a higher level, the term also refers to monitoring the accuracy of one or many such statistical models and monitoring the effect that score-based decisions have on key business performance indicators.
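As a rough illustration of that most fundamental sense, the sketch below fits a logistic regression to synthetic applicant data and assigns a risk score to a new application; the features, outcome mechanism, and score scaling are all made-up assumptions rather than anything from the article:

```python
# Sketch of application scoring: fit a model on historical applicants and
# score a new application. Data, features, and scaling are synthetic assumptions.
import numpy as np
from sklearn.linear_model import LogisticRegression

rng = np.random.default_rng(1)
n = 5_000
income_k = rng.normal(40, 12, n)          # annual income in thousands (synthetic)
debt_ratio = rng.uniform(0, 1, n)         # debt-to-income ratio (synthetic)
prior_defaults = rng.poisson(0.3, n)      # count of past defaults (synthetic)

# Synthetic "historical" outcome: default is more likely with high debt,
# prior defaults, and low income.
logit = -2.0 + 2.5 * debt_ratio + 0.8 * prior_defaults - 0.02 * income_k
defaulted = rng.binomial(1, 1 / (1 + np.exp(-logit)))

X = np.column_stack([income_k, debt_ratio, prior_defaults])
model = LogisticRegression(max_iter=1000).fit(X, defaulted)

# Application scoring: turn the predicted default probability for a new
# applicant into a points-style score (higher = lower risk); the 600/100
# scaling is an arbitrary illustrative choice.
new_app = np.array([[35.0, 0.6, 1]])
p_default = model.predict_proba(new_app)[0, 1]
score = int(round(600 - 100 * np.log(p_default / (1 - p_default))))
print(f"Predicted default probability: {p_default:.2f}, risk score: {score}")
```

Behavioral and collection scoring follow the same pattern, just with different target variables (future default of existing accounts, or amount recovered) and different historical data.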

Delta Method Confidence Bands for Gaussian Mixture Density (Can Behave Badly)

This post follows from a previous post (2798), in which the delta method was used to create an approximate pointwise 95% confidence band for a Gaussian density estimate. Note that the quality of that estimate was not assessed (e.g., whether the band has the correct pointwise coverage). Here we extend that approach to the Gaussian mixture density, which is much more flexible and, given sufficient mixture components, can be used to model essentially any density. We then show how the delta method can behave badly…
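For reference, the standard delta-method construction being extended here (written in generic notation, which may differ from the post’s) expands the fitted mixture density around the estimated parameters to get an approximate pointwise variance and band:

```latex
% Delta-method pointwise band for a Gaussian mixture density (standard form).
% \theta collects the mixture weights, means, and variances;
% \hat\Sigma is the estimated covariance of \hat\theta (e.g., inverse observed information).
f(x;\theta) = \sum_{k=1}^{K} \pi_k \,\phi(x;\mu_k,\sigma_k^2),
\qquad
\widehat{\operatorname{Var}}\!\big[f(x;\hat\theta)\big]
  \approx \nabla_\theta f(x;\hat\theta)^{\top}\, \widehat{\Sigma}\, \nabla_\theta f(x;\hat\theta)

% Approximate pointwise 95% confidence band:
f(x;\hat\theta) \;\pm\; 1.96\,\sqrt{\widehat{\operatorname{Var}}\!\big[f(x;\hat\theta)\big]}
```

The band is computed point by point in x, so it is pointwise rather than simultaneous, and nothing in the construction guarantees it stays non-negative or achieves its nominal coverage.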

Flip a fair coin 4x. Probability of H following H is 40%?

A recent working paper has come out arguing for the existence of the hot hand (in basketball), a concept psychologists had dismissed decades ago. A hot hand is when a player is thought to have a higher likelihood of scoring the next basket if the last three baskets were made successfully. (In NBA Jam, that is when your hands catch on fire.) The first paragraph of the paper reads, ‘Jack takes a coin from his pocket and decides that he will flip it 4 times in a row, writing down the outcome of each flip on a scrap of paper. After he is done flipping, he will look at the flips that immediately followed an outcome of heads, and compute the relative frequency of heads on those flips. Because the coin is fair, Jack of course expects this empirical probability of heads to be equal to the true probability of flipping a heads: 0.5. Shockingly, Jack is wrong. If he were to sample one million fair coins and flip each coin 4 times, observing the conditional relative frequency for each coin, on average the relative frequency would be approximately 0.4.’
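The paper’s claim is easy to check by simulation; the sketch below (the random seed and trial count are arbitrary choices of mine) computes, for each 4-flip sequence, the relative frequency of heads among the flips that immediately follow a head, and then averages that quantity across sequences:

```python
# Sketch: reproduce the "approximately 0.4" figure by simulation.
import numpy as np

rng = np.random.default_rng(42)
n_trials = 1_000_000

flips = rng.integers(0, 2, size=(n_trials, 4))     # 1 = heads, 0 = tails
after_head = flips[:, :3] == 1                     # flips 2-4 that immediately follow a head
n_followers = after_head.sum(axis=1)               # how many such flips in each sequence
n_heads_following = (flips[:, 1:] * after_head).sum(axis=1)

valid = n_followers > 0                            # sequences with no head in the first 3 flips are dropped
per_sequence_freq = n_heads_following[valid] / n_followers[valid]
print(f"Average relative frequency of H after H: {per_sequence_freq.mean():.3f}")  # ≈ 0.40
```

Dropping the sequences with no head in the first three flips, and averaging a per-sequence proportion rather than pooling all qualifying flips across sequences, is what pulls the average below 0.5; pooling everything would give 0.5, as intuition suggests.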