Text Classification & Word Representations using FastText (An NLP library by Facebook)

If you post a status update on Facebook about buying a car, don’t be surprised if Facebook then serves you a car ad. This is not black magic! This is Facebook leveraging text data to serve you better ads. The picture below takes a jibe at one of the challenges of dealing with text data.

[Image: Facebook ad serving using NLP]

Well, it clearly failed to deliver the right ad in the attempt above. This is why it is so important to capture the context in which a word has been used. It is a common problem in Natural Language Processing (NLP) tasks: a single word with the same spelling and pronunciation (a homonym) can be used in multiple contexts. A potential solution to this problem is computing word representations.

Now, imagine the challenge for Facebook. Facebook deals with an enormous amount of text data every day in the form of status updates, comments, etc., and it is all the more important for Facebook to utilise this text data to serve its users better. Computing word representations from the text generated by billions of users was a very time-consuming task until Facebook developed its own library, FastText, for word representations and text classification. In this article, we will see how to compute word representations and perform text classification in a matter of seconds, compared with existing methods that took days to achieve the same performance.
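The excerpt doesn’t go into how FastText builds its word representations. One idea behind them, described in the FastText papers, is to treat each word as a bag of character n-grams wrapped in `<` and `>` boundary markers, so that a word’s vector can be assembled from the vectors of its n-grams (which also gives rare or unseen words a representation). The minimal sketch below only extracts those n-grams using the Python standard library; it is an illustration, not the fasttext library’s own code, and the function name `fasttext_style_ngrams` is hypothetical. The defaults `minn=3` and `maxn=6` mirror the 3-to-6 range FastText uses by default when learning word vectors.

```python
from typing import List


def fasttext_style_ngrams(word: str, minn: int = 3, maxn: int = 6) -> List[str]:
    """Hypothetical helper: character n-grams of `word` in the FastText style.

    The word is wrapped in '<' and '>' boundary markers, every n-gram with
    minn <= n <= maxn is collected, and the full wrapped word is appended
    once as its own special sequence.
    """
    wrapped = f"<{word}>"
    grams = []
    for n in range(minn, maxn + 1):
        for start in range(len(wrapped) - n + 1):
            gram = wrapped[start:start + n]
            # Skip the whole wrapped word here; it is added once at the end.
            if gram != wrapped:
                grams.append(gram)
    grams.append(wrapped)
    return grams
```

For example, `fasttext_style_ngrams("where", 3, 3)` gives `<wh`, `whe`, `her`, `ere`, `re>` plus `<where>`. The boundary markers are what keep the sequence `<her>` for the word "her" distinct from the tri-gram `her` taken from "where".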


The Beginner’s Guide to Kaggle

Kaggle, a popular platform for data science competitions, can be intimidating for beginners. After all, some of the listed competitions have prize pools of over $1,000,000 and hundreds of competitors. Top teams boast decades of combined experience, tackling ambitious problems such as improving airport security or analyzing satellite data.


How to build a data science pipeline

There is no debate about how a well-functioning predictive workflow works once it is finally put into production. Data sources are transformed into a set of features or indicators X describing each instance (client, piece of equipment, asset) on which the prediction will act. A predictor then turns X into an actionable piece of information y_pred (Will the client churn? Will the equipment fail? Will the asset price go up?). In certain fluid markets (e.g., ad targeting), the prediction is then monetized through a fully automated process; in other cases, it is used as decision support with a human in the loop.
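The workflow described above (raw data, then features X, then a predictor, then y_pred) can be sketched in a few lines of Python. Everything here is hypothetical and only illustrates the shape of the pipeline: the `Client` fields, the `featurize`, `churn_probability` and `predict_churn` functions, and the hand-set weights that stand in for a trained model.

```python
import math
from dataclasses import dataclass
from typing import List


@dataclass
class Client:
    """One instance the prediction acts on (hypothetical raw fields)."""
    months_active: int
    support_tickets: int
    monthly_spend: float


def featurize(client: Client) -> List[float]:
    """Turn raw source data into the feature vector X for one instance."""
    return [float(client.months_active), float(client.support_tickets), client.monthly_spend]


# Hypothetical hand-set weights standing in for a trained model.
WEIGHTS = [-0.1, 0.8, -0.02]
BIAS = 0.5


def churn_probability(x: List[float]) -> float:
    """Predictor: a logistic score in (0, 1) computed from X."""
    score = BIAS + sum(w * xi for w, xi in zip(WEIGHTS, x))
    return 1.0 / (1.0 + math.exp(-score))


def predict_churn(x: List[float], threshold: float = 0.5) -> bool:
    """Actionable y_pred: will the client churn?"""
    return churn_probability(x) >= threshold
```

In a fully automated setting, the boolean from `predict_churn` could trigger an action directly; with a human in the loop, surfacing `churn_probability` itself as decision support is often more useful.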


Lagrangian Polynomial Interpolation with R

Polynomial interpolation is the method of determining a polynomial that passes through a given set of points. There are several approaches to polynomial interpolation, one of the best known of which is the Lagrangian method. This post introduces the Lagrangian method of polynomial interpolation and shows how to perform the procedure in R.
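The post works in R; to keep a single language for the examples on this page, here is a minimal Python sketch of the same Lagrange form, P(x) = Σ_j y_j ∏_{m≠j} (x − x_m)/(x_j − x_m), where each basis polynomial equals 1 at its own node x_j and 0 at every other node. The function name `lagrange_interpolate` is hypothetical, and this is not the code from the post.

```python
from typing import Sequence


def lagrange_interpolate(xs: Sequence[float], ys: Sequence[float], x: float) -> float:
    """Evaluate at x the unique polynomial of degree <= len(xs) - 1 that
    passes through the points (xs[j], ys[j]), using the Lagrange form."""
    if len(xs) != len(ys):
        raise ValueError("xs and ys must have the same length")
    if len(set(xs)) != len(xs):
        raise ValueError("xs must be distinct")
    total = 0.0
    for j, (xj, yj) in enumerate(zip(xs, ys)):
        # Basis polynomial L_j(x): 1 at xs[j], 0 at every other node.
        basis = 1.0
        for m, xm in enumerate(xs):
            if m != j:
                basis *= (x - xm) / (xj - xm)
        total += yj * basis
    return total
```

For instance, the polynomial through (0, 1), (1, 3) and (2, 7) is x² + x + 1, so `lagrange_interpolate([0, 1, 2], [1, 3, 7], 3)` returns 13.0.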


Surface Renewal Analysis for Energy Flux Exchanges in an Ecosystem: 1: Calculating Ramp Characteristics using R

Surface Renewal (SR) analysis has gained prominence as a less costly yet reliable alternative to measuring energy and gaseous fluxes with eddy covariance. This post is the first of two. In this first post, R is used to calculate the ramp characteristics, namely ramp amplitude (A) and ramp duration (t). A sample dataset is provided to demonstrate the functionality of the code.
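The excerpt doesn’t say which method the post uses. A common approach in the SR literature is Van Atta’s structure-function method: compute S^n(r), the mean of (T(t) − T(t − r))^n for n = 2, 3 and 5 at time lag r; solve the cubic A³ + pA + q = 0, with p = 10S²(r) − S⁵(r)/S³(r) and q = 10S³(r), for the ramp amplitude A; and take the ramp duration (written τ here to avoid clashing with time t) as τ = −A³r/S³(r). The Python sketch below assumes that method; the function and parameter names are hypothetical, it is not the post’s R code, and it only handles the case where the cubic has a single real root.

```python
import math
from typing import Sequence, Tuple


def structure_function(signal: Sequence[float], lag: int, order: int) -> float:
    """S^n(r): mean of (T[i] - T[i - lag]) ** order, with the lag in samples."""
    if not 1 <= lag < len(signal):
        raise ValueError("lag must satisfy 1 <= lag < len(signal)")
    diffs = [signal[i] - signal[i - lag] for i in range(lag, len(signal))]
    return sum(d ** order for d in diffs) / len(diffs)


def real_cbrt(v: float) -> float:
    """Real cube root that also works for negative inputs."""
    return math.copysign(abs(v) ** (1.0 / 3.0), v)


def ramp_characteristics(signal: Sequence[float], lag: int, sampling_hz: float) -> Tuple[float, float]:
    """Return (amplitude A, duration tau in seconds), assuming Van Atta's method."""
    r = lag / sampling_hz  # time lag in seconds
    s2 = structure_function(signal, lag, 2)
    s3 = structure_function(signal, lag, 3)
    s5 = structure_function(signal, lag, 5)
    if s3 == 0:
        raise ValueError("S^3(r) is zero: no ramp asymmetry to analyze")
    p = 10.0 * s2 - s5 / s3
    q = 10.0 * s3
    # Cardano's formula for the depressed cubic A^3 + p*A + q = 0.
    disc = (q / 2.0) ** 2 + (p / 3.0) ** 3
    if disc < 0:
        raise ValueError("cubic has three real roots; not handled in this sketch")
    root = math.sqrt(disc)
    amplitude = real_cbrt(-q / 2.0 + root) + real_cbrt(-q / 2.0 - root)
    tau = -(amplitude ** 3) * r / s3
    return amplitude, tau
```

The lag and sampling frequency are separate parameters so that r comes out in seconds; in practice SR studies usually repeat the calculation for several lags.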


Parallel Computing Exercises: Foreach and DoParallel (Part-2)

In general, foreach is a statement for iterating over items in a collection without using any explicit counter. In R, it is also a way to run code in parallel, which may be more convenient and readable than the sfLapply function (considered in the previous set of exercises in this series) or other apply-like functions. Apart from being able to run code in parallel, R’s foreach differs from the standard for loop in several other ways. Specifically, the foreach statement:
• allows iterating over several variables simultaneously,
• returns a value (a list, a vector, a matrix, or another object),
• can skip some iterations based on a condition (the last two properties make it similar to list comprehensions in Python and some other languages),
• has a special syntax that includes the operators %do% (see an example in Exercise 1), %dopar%, and %:%.
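Since the excerpt compares foreach to Python’s list comprehensions, here is a rough Python analog of the properties listed above. It is not R and not foreach’s API; the R snippets in the comments only indicate which foreach idiom each function loosely mirrors, and the function names are hypothetical. Threads keep the parallel part portable; for CPU-bound work, a process pool would be the closer analog to %dopar%.

```python
from concurrent.futures import ThreadPoolExecutor
from typing import List


def weighted_products(xs: List[float], ws: List[float]) -> List[float]:
    # Like foreach(x = xs, w = ws) %do% (x * w): iterate over two variables
    # at once and return the collected results.
    return [x * w for x, w in zip(xs, ws)]


def positive_squares(xs: List[float]) -> List[float]:
    # Like foreach(x = xs) %:% when(x > 0) %do% x^2: skip iterations that
    # fail a condition.
    return [x * x for x in xs if x > 0]


def square(x: float) -> float:
    return x * x


def parallel_squares(xs: List[float], workers: int = 4) -> List[float]:
    # Rough analog of %dopar%: run the loop body in a pool of workers;
    # Executor.map returns the results in input order.
    with ThreadPoolExecutor(max_workers=workers) as pool:
        return list(pool.map(square, xs))
```

As with foreach iterating over several arguments, `zip` stops at the shortest input.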


PCA course using FactoMineR

Here is a course with videos that present Principal Component Analysis (PCA) in a French way. Three videos present a course on PCA, highlighting how to interpret the data. Then you will find videos showing how to implement PCA in FactoMineR, how to deal with missing values in PCA using the package missMDA, and lastly how to draw interactive graphs with Factoshiny. Finally, you will see that the new package FactoInvestigate automatically produces an interpretation of your PCA results.