Parsing Text for Emotion Terms: Analysis & Visualization Using R

Recently, I read a post by Michael Toth on a sentiment analysis of Mr Warren Buffett’s annual shareholder letters from the past 40 years. In that post, only five of the letters showed negative net sentiment scores, whereas the majority (88%) showed a positive net sentiment score. Toth noted that the years with negative net sentiment scores (1987, 1990, 2001, 2002 and 2008) coincided with lower annual returns on investments and global market decline. This observation caught my attention and made me curious about emotion words in those same shareholder letters, and whether emotion words were differentially expressed among the 40 letters.

5 Common Myths Around Virtualizing Big Data (Number 3 is SANdalous!)

Big data burst onto the scene a little over a decade ago. Today it is no longer an obscure term confined to a handful of bleeding-edge companies. It is a mainstream trend that every enterprise undergoing a digital transformation journey has adopted. The technology landscape around big data has broadened dramatically: in the early days it meant Apache Hadoop, whereas today it includes Apache Spark and NoSQL databases like MongoDB and Apache Cassandra, among many other new technologies.

Two Sigma Financial Modeling Code Competition, 5th Place Winners’ Interview: Team Best Fitting | Bestfitting, Zero, & CircleCircle

Kaggle’s inaugural code competition, the Two Sigma Financial Modeling Challenge, ran from December 2016 to March 2017. Over 2,000 players competed to search for signal in unpredictable financial markets data. Because this was the very first code competition, competitors experimented with the data, trained models, and made submissions directly via Kernels, Kaggle’s in-browser code execution platform. In this winners’ interview, team Bestfitting describes how they managed to remain a top-5 team even after a wicked leaderboard shake-up by focusing on building stable models and working effectively as a team. Read on to learn how they accounted for volatile periods of the market and experimented with reinforcement learning approaches.

Data Version Control: iterative machine learning

ML modeling is an iterative process, and it is extremely important to keep track of all the steps and the dependencies between code and data. A new open-source tool helps you do that.

How to go about interpreting regression coefficients

Following my post about logistic regressions, Ryan got in touch about one bit of building logistic regression models that I didn’t cover in much detail: interpreting regression coefficients. This post will hopefully help Ryan (and others) out.
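As a rough illustration of what interpreting a logistic regression coefficient involves, the sketch below uses the standard fact that a coefficient is a change in log-odds, so exponentiating it gives an odds ratio. It is not taken from the linked post: the intercept, the coefficient, and the helper names `odds_ratio` and `predicted_probability` are all hypothetical values chosen for the example.

```python
import math


def odds_ratio(coefficient):
    """Multiplicative change in the odds for a one-unit increase in the predictor."""
    return math.exp(coefficient)


def predicted_probability(intercept, coefficient, x):
    """Probability from a one-predictor logistic model: sigmoid(intercept + coefficient * x)."""
    log_odds = intercept + coefficient * x
    return 1.0 / (1.0 + math.exp(-log_odds))


# Hypothetical fitted values, chosen only for illustration.
intercept = -1.0
coefficient = 0.5

ratio = odds_ratio(coefficient)

# Compare predictions one unit apart on the predictor.
p_before = predicted_probability(intercept, coefficient, 2.0)
p_after = predicted_probability(intercept, coefficient, 3.0)

# Convert the probabilities back to odds to check the odds-ratio reading.
odds_before = p_before / (1.0 - p_before)
odds_after = p_after / (1.0 - p_after)

print(f"odds ratio per unit increase: {ratio:.3f}")
print(f"probability at x=2: {p_before:.3f}, at x=3: {p_after:.3f}")
print(f"ratio of odds (x=3 vs x=2): {odds_after / odds_before:.3f}")
```

The odds ratio is the same wherever on the predictor scale the one-unit step is taken, whereas the change in probability depends on the starting point. This is why coefficients are usually reported on the odds scale.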

Introducing Dask-SearchCV

We introduce a new library for doing distributed hyperparameter optimization with Scikit-Learn estimators. We compare it to the existing Scikit-Learn implementations, and discuss when it may be useful compared to other approaches.

The Two Phases of Gradient Descent in Deep Learning

Thanks to great experimental work by several research groups studying the behavior of Stochastic Gradient Descent (SGD), we are collectively gaining a much clearer understanding of what happens in the neighborhood of training convergence. The story begins with the best paper award winner for ICLR 2017, “Rethinking Generalization”. I first discussed this paper several months ago in the blog post “Rethinking Generalization in Deep Learning”. One interesting observation in that paper is the role of SGD.