RStudio Connect

We’re excited to announce the release of a new version of RStudio Connect. This release includes the ability to manage different versions of your work on RStudio Connect.

Measuring Audience Sentiments about Movies using Twitter and Text Analytics

The practice of using analytics to measure a movie’s success is not a new phenomenon. Most predictive models are built on structured data, with input variables such as cost of production, genre, actors, director, production house, marketing expenditure, number of distribution platforms, etc. However, with the advent of social media, a younger demographic, digital media, and the increasing use of platforms like Twitter and Facebook to express views and opinions, social media has become a potent tool for measuring audience sentiment about a movie. This report is an attempt to understand one such platform: Twitter. The movie chosen is Rangoon, a 2017 Bollywood movie directed by Vishal Bhardwaj and produced by Sajid Nadiadwala and Viacom 18 Motion Pictures. The lead actors are Saif Ali Khan, Shahid Kapoor and Kangana Ranaut. The film was released in India on the weekend of 24 February 2017. I will be using R, an open-source statistical programming language, to carry out the analysis. Note: I will focus on the approach and the findings; the R code used for the analysis can be found at the end of the article.
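Sentiment in tweets is often scored with a simple lexicon-based approach: count the positive and negative words in each tweet and label the tweet by the sign of the difference. The sketch below is in Python rather than the article's R and is not the author's method; the two word lists and the example tweets are made up for illustration, and a real analysis would use a published sentiment lexicon and tweets collected from Twitter.

```python
"""Lexicon-based sentiment scoring sketch (hypothetical mini-lexicon, made-up tweets)."""
import re
from collections import Counter

# Hypothetical word lists for illustration only; not a published lexicon.
POSITIVE = {"brilliant", "great", "loved", "stunning"}
NEGATIVE = {"boring", "disappointing", "slow", "worst"}


def tokenize(text: str) -> list:
    # Lowercase the text and keep runs of letters/apostrophes as tokens.
    return re.findall(r"[a-z']+", text.lower())


def score_tweet(text: str) -> int:
    # Score = number of positive words minus number of negative words.
    tokens = tokenize(text)
    return sum(t in POSITIVE for t in tokens) - sum(t in NEGATIVE for t in tokens)


def summarize(tweets: list) -> Counter:
    # Tally tweets as positive / negative / neutral by the sign of their score.
    counts = Counter()
    for tweet in tweets:
        s = score_tweet(tweet)
        counts["positive" if s > 0 else "negative" if s < 0 else "neutral"] += 1
    return counts


if __name__ == "__main__":
    sample = [  # made-up example tweets
        "Loved the performances, stunning visuals",
        "Second half was slow and boring",
        "Watching it tonight",
    ]
    print(summarize(sample))
```

Plain word counting ignores negation and sarcasm ("not great" still scores as positive here), which is worth keeping in mind when interpreting lexicon-based results.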

TensorFlow 101: Understanding Tensors and Graphs to get you started in Deep Learning

TensorFlow is one of the most popular libraries for deep learning. When I started with TensorFlow, it felt like an alien language, but after attending a couple of sessions on it, I got the hang of it. I found the topic so interesting that I delved further into it.
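The two ideas in the title can be illustrated without TensorFlow itself: a tensor is an n-dimensional array described by its shape (its rank is the number of dimensions), and a graph describes a computation that is built first and evaluated later, with placeholders filled in at run time. The toy below is plain Python, not TensorFlow's API; every name in it (`shape`, `Node`, `constant`, `placeholder`, `add`, `mul`, `run`) is hypothetical and chosen only to mirror the "build, then run" style of TensorFlow 1.x graphs.

```python
"""Toy illustration of tensors (shape/rank) and a deferred-execution dataflow graph.

Not TensorFlow's API: every name here is hypothetical and chosen for the example.
"""
from typing import Callable, Dict, List, Optional


def shape(t) -> tuple:
    """Shape of a rectangular nested-list 'tensor'; its rank is len(shape(t))."""
    dims = []
    while isinstance(t, list):
        dims.append(len(t))
        t = t[0] if t else None
    return tuple(dims)


class Node:
    """A graph node: an operation plus the nodes whose outputs it consumes."""

    def __init__(self, name: str, op: Callable[..., float],
                 inputs: Optional[List["Node"]] = None):
        self.name = name
        self.op = op
        self.inputs = inputs or []


def constant(name: str, value: float) -> Node:
    # A constant node always produces the same value.
    return Node(name, lambda: value)


def placeholder(name: str) -> Node:
    # A placeholder has no value until one is fed in at run time.
    def missing() -> float:
        raise KeyError(f"no value fed for placeholder {name!r}")
    return Node(name, missing)


def add(name: str, a: Node, b: Node) -> Node:
    return Node(name, lambda x, y: x + y, [a, b])


def mul(name: str, a: Node, b: Node) -> Node:
    return Node(name, lambda x, y: x * y, [a, b])


def run(node: Node, feed: Optional[Dict[str, float]] = None,
        cache: Optional[Dict[str, float]] = None) -> float:
    """Evaluate `node`: fed values win; otherwise evaluate inputs first (memoized by name)."""
    feed = feed or {}
    cache = {} if cache is None else cache
    if node.name in feed:
        return feed[node.name]
    if node.name not in cache:
        args = [run(i, feed, cache) for i in node.inputs]
        cache[node.name] = node.op(*args)
    return cache[node.name]


# Building the graph computes nothing yet...
x = placeholder("x")
y = add("y", mul("wx", constant("w", 3.0), x), constant("b", 1.0))
# ...values flow only when the graph is run with a feed.
print(run(y, {"x": 2.0}))              # 7.0
print(shape([[1, 2, 3], [4, 5, 6]]))   # (2, 3)
```

Separating graph construction from execution is what lets a framework inspect, optimize, or distribute the whole computation before any numbers flow through it; this toy only shows the separation itself.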

Understanding Linear SVM with R

A linear Support Vector Machine, or linear SVM as it is often abbreviated, is a supervised classifier generally used for binary classification problems, that is, problem settings with two classes. Of course, it can be extended to multi-class problems. In this work, we will build a mathematical understanding of linear SVM, along with R code, to understand its critical components. Taking the liberty of assuming that some readers are new to machine learning, let us first find out what a supervised classifier is supposed to do. The term supervised comes from the concept of supervised learning. In supervised learning, you provide some examples along with their labels to your classifier, and the classifier then tries to learn important features from those examples. For example, consider the following data set. You will provide part of this data to your linear SVM and tune its parameters so that the SVM acts as a discriminant function separating the ham messages from the spam messages. So, let us add the following R code to our task.
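To make "tune the parameters" concrete, the sketch below trains a linear SVM by stochastic subgradient descent on the L2-regularized hinge loss, then classifies with the sign of w·x + b. It is written in Python rather than the post's R and is not the author's code; the optimizer choice, the hyperparameters `lam`, `lr` and `epochs`, and the two-dimensional toy points (standing in for features extracted from ham and spam messages) are all assumptions for illustration.

```python
"""Minimal linear SVM: stochastic subgradient descent on the regularized hinge loss."""


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, lr=0.1, epochs=200):
    """Minimize lam/2 * ||w||^2 + hinge loss max(0, 1 - y*(w.x + b)), one sample at a time.

    X: list of feature vectors; y: labels in {-1, +1}. The bias b is not regularized.
    """
    w = [0.0] * len(X[0])
    b = 0.0
    for _ in range(epochs):
        for xi, yi in zip(X, y):
            if yi * (dot(w, xi) + b) < 1:
                # Inside the margin or misclassified: the hinge term is active.
                w = [wj - lr * (lam * wj - yi * xj) for wj, xj in zip(w, xi)]
                b += lr * yi
            else:
                # Correct side with margin >= 1: only the regularizer shrinks w.
                w = [wj - lr * lam * wj for wj in w]
    return w, b


def predict(w, b, x):
    # The discriminant function: which side of the hyperplane w.x + b = 0 is x on?
    return 1 if dot(w, x) + b >= 0 else -1


if __name__ == "__main__":
    # Made-up, linearly separable toy points: +1 plays "spam", -1 plays "ham".
    X = [[2, 2], [3, 3], [2, 3], [-2, -2], [-3, -3], [-2, -3]]
    y = [1, 1, 1, -1, -1, -1]
    w, b = train_linear_svm(X, y)
    print("w =", w, "b =", b)
    print([predict(w, b, x) for x in X])
```

Only points that violate the margin push w and b, which is the subgradient counterpart of the fact that the SVM solution depends only on its support vectors.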

Predicting House Prices Playground Competition: Winning Kernels

The House Prices playground competition originally ran on Kaggle from August 2016 to February 2017. During this time, over 2,000 competitors experimented with advanced regression techniques like XGBoost to accurately predict a home’s sale price based on 79 features. In this blog post, we feature authors of kernels recognized for their excellence in data exploration, feature engineering, and more.

A Beginner’s Guide to Tweet Analytics with Pandas

Unlike many other tutorials, which pull data from the real-time Twitter API, we will use the downloadable Twitter Analytics data, and most of what we do will be done in Pandas.
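A typical first step with the exported data is to load the CSV, derive an engagement rate, and aggregate by time of day. The sketch below assumes column names (`Tweet id`, `Tweet text`, `time`, `impressions`, `engagements`) modeled on a Twitter Analytics export; check them against your own file, since they are an assumption here. The three rows are made up so the example runs without a download.

```python
"""Sketch: load a Twitter Analytics-style CSV export and summarize engagement with pandas."""
import io

import pandas as pd

# Made-up rows; the column names are an assumption modeled on a Twitter Analytics export.
CSV_TEXT = """Tweet id,Tweet text,time,impressions,engagements
1,Hello world,2017-03-01 09:15,1200,30
2,New blog post,2017-03-01 17:40,3400,170
3,Thread on pandas,2017-03-02 09:05,2500,50
"""


def summarize(df: pd.DataFrame):
    df = df.copy()
    df["time"] = pd.to_datetime(df["time"])
    # Engagement rate: engagements per impression.
    df["engagement_rate"] = df["engagements"] / df["impressions"]
    df["hour"] = df["time"].dt.hour
    # Tweets ranked from most to least engaging.
    top = df.sort_values("engagement_rate", ascending=False)
    # Average impressions for tweets posted in each hour of the day.
    by_hour = df.groupby("hour")["impressions"].mean()
    return top, by_hour


if __name__ == "__main__":
    tweets = pd.read_csv(io.StringIO(CSV_TEXT))
    top, by_hour = summarize(tweets)
    print(top[["Tweet id", "engagement_rate"]])
    print(by_hour)
```

With a real export, replace `io.StringIO(CSV_TEXT)` with the path to the downloaded file.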

Deep Learning, Generative Adversarial Networks & Boxing – Toward a Fundamental Understanding

In this post, we will see why GANs have so much potential and frame them as a boxing match between two opponents.
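The two opponents in the boxing analogy correspond to the generator G and the discriminator D. In the original GAN formulation they play the two-player minimax game below, where D(x) is the discriminator's estimated probability that x is real data and G(z) maps random noise z to a synthetic sample; the post's own notation may differ.

```latex
\min_{G} \max_{D} \; V(D, G) =
  \mathbb{E}_{x \sim p_{\mathrm{data}}}\!\left[\log D(x)\right]
  + \mathbb{E}_{z \sim p_{z}}\!\left[\log\left(1 - D(G(z))\right)\right]
```

D tries to push V up by scoring real samples near 1 and generated samples near 0, while G tries to push V down by producing samples that D scores as real.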

Trying gradient descent for linear regression

The best way to learn an algorithm is to code it. So here it is: my take on the gradient descent algorithm for simple linear regression. First, we fit a simple linear model with lm so that we can compare its coefficients with the gradient descent estimates.
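The same comparison can be sketched in Python, as an illustration rather than the post's R code: batch gradient descent on the mean squared error, checked against the closed-form ordinary least squares solution, which is what `lm(y ~ x)` estimates in R. The toy data, the step size `lr` and the iteration count are assumptions chosen so the iterations converge.

```python
"""Batch gradient descent for simple linear regression, checked against closed-form OLS."""


def ols(x, y):
    """Closed-form least-squares intercept and slope (the fit lm(y ~ x) computes in R)."""
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    sxy = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    sxx = sum((xi - mx) ** 2 for xi in x)
    slope = sxy / sxx
    return my - slope * mx, slope


def gradient_descent(x, y, lr=0.01, iters=10_000):
    """Minimize the mean squared error (1/n) * sum((b0 + b1*x - y)**2)."""
    n = len(x)
    b0 = b1 = 0.0
    for _ in range(iters):
        residuals = [b0 + b1 * xi - yi for xi, yi in zip(x, y)]
        # Partial derivatives of the MSE with respect to b0 and b1.
        grad_b0 = 2.0 / n * sum(residuals)
        grad_b1 = 2.0 / n * sum(r * xi for r, xi in zip(residuals, x))
        b0 -= lr * grad_b0
        b1 -= lr * grad_b1
    return b0, b1


if __name__ == "__main__":
    x = [1, 2, 3, 4, 5]              # toy data, made up for illustration
    y = [2.1, 3.9, 6.2, 7.8, 10.1]
    print("OLS:             ", ols(x, y))
    print("gradient descent:", gradient_descent(x, y))
```

If `lr` is too large the updates overshoot and diverge; if it is too small, many more iterations are needed before the two sets of coefficients agree.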

Reproducible Research – when your results can’t be reproduced?

Even the most sophisticated machine learning methods, the most beautiful visualisations and perfect datasets can be useless if you do your research carelessly and ignore the context in which your code executes. In this article, you will read about a few situations that can destroy your reproducibility and learn how to resolve and avoid them in the future.
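Two common safeguards are fixing the random seed and recording the execution context next to the results. The Python sketch below shows both, as a generic illustration rather than the article's own recommendations; the function names and the choice of fields to record are assumptions, and a real project would also pin package versions (for example with a lock file).

```python
"""Sketch: pin the random seed and record the execution context alongside results."""
import json
import platform
import random
import sys


def run_experiment(seed: int) -> float:
    # A dedicated generator: other code touching the global `random` state cannot change these draws.
    rng = random.Random(seed)
    sample = [rng.gauss(0.0, 1.0) for _ in range(1000)]
    return sum(sample) / len(sample)


def environment_record(seed: int) -> dict:
    # Minimal context needed to rerun the experiment the same way.
    return {
        "seed": seed,
        "python": platform.python_version(),
        "platform": platform.platform(),
        "executable": sys.executable,
    }


if __name__ == "__main__":
    seed = 2017
    result = run_experiment(seed)
    print(json.dumps({"result": result, **environment_record(seed)}, indent=2))
```

Using a local `random.Random` instance instead of the module-level functions keeps the result tied to the seed even if imported code also draws random numbers.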

shinyHeatmaply – a shiny app for creating interactive cluster heatmaps

My friend Jonathan Sidi and I (Tal Galili) are pleased to announce the release of shinyHeatmaply (0.1.0): a new Shiny application (and Shiny gadget) for creating interactive cluster heatmaps. shinyHeatmaply is based on the heatmaply R package, which strives to make it as easy as possible to create interactive cluster heatmaps.