3 Ways to Move Your Data Science Into Production

In this live webinar, on May 24th at 11AM Central, learn how Anaconda empowers data scientists to encapsulate and deploy their data science projects as live applications with a single click.


Must-Know: Key issues and problems with A/B testing

A look at 2 topics in A/B testing: Ensuring that bucket assignment is truly random, and conducting an A/B test on an opt-in feature. KNIME Analytics Platform solves your complex data puzzles KNIME Analytics Platform


The Marcos Lopez de Prado Hierarchical Risk Parity Algorithm

This post will be about replicating the Marcos Lopez de Prado algorithm from his paper building diversified portfolios that outperform out of sample. This algorithm is one that attempts to make a tradeoff between the classic mean-variance optimization algorithm that takes into account a covariance structure, but is unstable, and an inverse volatility algorithm that ignores covariance, but is more stable. This is a paper that I struggled with until I ran the code in Python (I have anaconda installed but have trouble installing some packages such as keras because I’m on windows…would love to have someone walk me through setting up a Linux dual-boot), as I assumed that the clustering algorithm actually was able to concretely group every asset into a particular cluster (I.E. ETF 1 would be in cluster 1, ETF 2 in cluster 3, etc.). Turns out, that isn’t at all the case. Here’s how the algorithm actually works.


Instrumental Variables in R exercises (Part-2)

This is the second part of the series on Instrumental Variables. For other parts of the series follow the tag instrumental variables. In this exercise set we will build on the example from part-1. We will now consider an over-identified case i.e. we have multiple IVs for an endogenous variable. We will also look at tests for endogeneity and over-identifying restrictions.


How to analyze max-diff data in R

This post discusses a number of options that are available in R for analyzing data from max-diff experiments, using the package flipMaxDiff. For a more detailed explanation of how to analyze max-diff, and what the outputs mean, you should read the post How max-diff analysis works. The post will cover the processes of installing packages, importing your data and experimental design, before discussing counting analysis and the more powerful, and valid, latent class analysis.


Principal Components Analysis

Principal components analysis (PCA) is a statistical technique that allows to identify underlying linear patterns in a data set so it can be expressed in terms of other data set of significatively lower dimension without much loss of information. The final data set should be able to explain most of the variance of the original data set by making a variable reduction. The final variables will be named as principal components.
Advertisements