**Entity disambiguation with decomposable neural networks**

Entity disambiguation is a fundamental task in natural language processing and computational linguistics. Given a query consisting of a mention (a name string) and a background document, entity disambiguation aims to link the mention to an entity in a reference knowledge base such as Wikipedia. A main challenge of this task is how to effectively represent the meaning of the mention and the entity so that the semantic relatedness between them can be conveniently measured. Towards this goal, we introduce computational models that represent the mention and the entity in a vector space. We decompose the problem into subproblems and develop various neural network architectures, all of which are purely data-driven and capable of learning continuous representations of the mention and the entity from data. To train the neural network models effectively, we explore a simple yet effective way to collect millions of training examples from Wikipedia without any manual annotation. Empirical results on two benchmark datasets show that our approaches based on convolutional neural networks and long short-term memory consistently outperform top-performing systems on both datasets.
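As a toy illustration of measuring mention–entity relatedness in a vector space (the paper's actual CNN/LSTM encoders are not shown here), the sketch below averages hypothetical word vectors for a mention's context and for each candidate entity, then compares them with cosine similarity; all words and vectors are invented for illustration:

```python
import math

def cosine(u, v):
    # Cosine similarity between two dense vectors.
    dot = sum(a * b for a, b in zip(u, v))
    nu = math.sqrt(sum(a * a for a in u))
    nv = math.sqrt(sum(b * b for b in v))
    return dot / (nu * nv)

def average_vector(words, embeddings):
    # Represent a text span as the mean of its word vectors
    # (a crude stand-in for a learned neural encoder).
    vecs = [embeddings[w] for w in words if w in embeddings]
    dim = len(next(iter(embeddings.values())))
    return [sum(v[i] for v in vecs) / len(vecs) for i in range(dim)]

# Toy 2-d "embeddings" (hypothetical, for illustration only).
emb = {
    "jordan":     [0.9, 0.1],
    "basketball": [0.8, 0.2],
    "nba":        [0.85, 0.15],
    "river":      [0.1, 0.9],
    "country":    [0.2, 0.8],
}

mention_ctx = ["jordan", "basketball"]           # mention + document words
entity_player = ["jordan", "nba"]                # candidate: the player
entity_country = ["jordan", "river", "country"]  # candidate: the country

m = average_vector(mention_ctx, emb)
s1 = cosine(m, average_vector(entity_player, emb))
s2 = cosine(m, average_vector(entity_country, emb))
print(s1 > s2)  # the basketball context is closer to the player entity
```

A real system would learn these representations from the Wikipedia-derived training examples rather than hand-crafting them.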

**Understanding the Changing Position Roles in Data Science**

Is everyone a ‘data scientist’? What about ‘data engineers’, and the junior versus senior, or skill-level, distinctions? We do seem to need some agreement about titling. ‘Data scientist’ is still the prestige title, but some folks are lobbying to take that title away.

**Text Clustering: Get Quick Insights from Unstructured Data (Part 2)**

This post is the second part of a two-part series on getting insights from unstructured data using text clustering. The first part covered the motivation; this part focuses on implementation. We will build the pipeline in a modular way so that it can be applied to any dataset. Moreover, we will expose its functionality as an API so that it can serve as a plug-and-play model without any disruption to existing systems.
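The series' actual implementation is not reproduced here, but a minimal, stdlib-only sketch of the idea — represent each document as a bag-of-words vector and cluster with plain k-means — might look like this (a production pipeline would typically use TF-IDF weighting and a proper library; the toy documents are invented):

```python
import math
from collections import Counter

def vectorize(doc, vocab):
    # Bag-of-words count vector over a fixed vocabulary.
    counts = Counter(doc.lower().split())
    return [counts[w] for w in vocab]

def dist(u, v):
    # Euclidean distance between two vectors.
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(u, v)))

def kmeans(vectors, k, iters=10):
    # Plain k-means; deterministic evenly spaced initial centroids
    # (real code would use k-means++ or random restarts).
    step = len(vectors) // k
    centroids = [list(vectors[i * step]) for i in range(k)]
    for _ in range(iters):
        labels = [min(range(k), key=lambda c: dist(v, centroids[c]))
                  for v in vectors]
        for c in range(k):
            members = [v for v, l in zip(vectors, labels) if l == c]
            if members:
                centroids[c] = [sum(col) / len(members)
                                for col in zip(*members)]
    return labels

docs = [
    "stock market trading prices",
    "market prices fall on trading floor",
    "football match score goal",
    "goal scored in the football match",
]
vocab = sorted(set(w for d in docs for w in d.lower().split()))
labels = kmeans([vectorize(d, vocab) for d in docs], k=2)
print(labels)  # the two finance docs share one label, the two sports docs the other
```

Wrapping `kmeans` behind a small web endpoint would give the plug-and-play API the post describes.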

**What is Free Energy, Part I: Hinton, Helmholtz, and Legendre**

Hinton introduced Free Energies in his 1994 paper, “Autoencoders, minimum description length, and Helmholtz free energy.” This paper, along with his wake-sleep algorithm, set the foundations for modern variational learning. Free Energies appear in his RBMs and, more recently, in Variational AutoEncoders (VAEs). Of course, Free Energies come from chemical physics, and this is not surprising, since Hinton’s graduate advisor was a famous theoretical chemist. They are so important that Karl Friston has proposed “The Free Energy Principle: A Unified Brain Theory?”
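For reference, the physics definition and the variational bound that the Helmholtz machine (and later VAEs) minimize can be written as:

```latex
% Helmholtz free energy from statistical physics
% (U: internal energy, T: temperature, S: entropy, Z: partition function):
F = U - TS = -kT \log Z, \qquad Z = \sum_{s} e^{-E(s)/kT}

% Variational free energy: for any distribution q(z) over latents,
\log p(x) \;\ge\; \mathbb{E}_{q(z)}\!\big[\log p(x, z)\big] + H\big[q(z)\big] \;=\; -F_q
```

Minimizing the variational free energy $F_q$ therefore tightens a lower bound on the data log-likelihood, which is exactly the objective behind the wake-sleep algorithm and the VAE's evidence lower bound.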

**Ordinary Least Squares (OLS) Linear Regression in R**

Ordinary Least Squares (OLS) linear regression is a statistical technique for analysing and modelling linear relationships between a response variable and one or more predictor variables. If the relationship between two variables appears to be linear, a straight line can be fitted to the data to model the relationship. The linear equation (the equation of a straight line) for a bivariate regression takes the following form: y = mx + c, where y is the response (dependent) variable, m is the gradient (slope), x is the predictor (independent) variable, and c is the intercept. Given the slope and intercept coefficients of the line of best fit, OLS linear regression lets one predict the value of the response variable for any value of the predictor variable.
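The post works in R, but the closed-form bivariate OLS estimates are easy to sketch in a few lines of Python (the data here are invented, roughly following y = 2x + 1):

```python
def ols_fit(x, y):
    # Closed-form OLS estimates for y = m*x + c:
    #   m = cov(x, y) / var(x),  c = mean(y) - m * mean(x)
    n = len(x)
    mx, my = sum(x) / n, sum(y) / n
    cov = sum((xi - mx) * (yi - my) for xi, yi in zip(x, y))
    var = sum((xi - mx) ** 2 for xi in x)
    m = cov / var
    c = my - m * mx
    return m, c

x = [1, 2, 3, 4, 5]
y = [3.1, 4.9, 7.2, 9.0, 10.8]   # roughly y = 2x + 1, with noise
m, c = ols_fit(x, y)
print(round(m, 2), round(c, 2))  # 1.95 1.15
```

In R itself, the equivalent one-liner is `lm(y ~ x)`.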

**Machine Learning Explained: Regularization**

Welcome to this new post of Machine Learning Explained. After dealing with overfitting, today we will study a way to correct it: regularization. Regularization adds a penalty on the model's parameters to reduce its freedom. Hence, the model will be less likely to fit the noise of the training data, which improves its ability to generalize. In this post, we will study and compare:

• The L1 regularization (also called Lasso)

• The L2 regularization (also called Ridge)

• The L1/L2 regularization (also called Elastic net)

You can find the R code for regularization at the end of the post.
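As a quick illustration of how the L2 penalty works: in the one-feature, no-intercept case the ridge solution has a closed form, m = Σxy / (Σx² + λ), so the coefficient visibly shrinks as the penalty λ grows (toy data invented; Lasso behaves similarly but can drive coefficients exactly to zero):

```python
def ridge_slope(x, y, lam):
    # One-feature ridge regression without intercept:
    # minimizing sum((y - m*x)^2) + lam * m^2 over m gives
    #   m = sum(x*y) / (sum(x^2) + lam)
    sxy = sum(xi * yi for xi, yi in zip(x, y))
    sxx = sum(xi * xi for xi in x)
    return sxy / (sxx + lam)

x = [1.0, 2.0, 3.0]
y = [2.0, 4.0, 6.0]          # exact slope 2 with no penalty
slopes = [ridge_slope(x, y, lam) for lam in (0.0, 1.0, 10.0)]
print(slopes)  # coefficient shrinks toward 0 as the penalty grows
```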

**Machine Learning Explained: Overfitting**

A good model is able to learn the pattern from your training data and then generalize it to new data (from a similar distribution). Overfitting occurs when a model fits your training data almost perfectly but performs poorly on new data. A model overfits when it learns the very specific patterns and noise of the training data; such a model cannot extract the “big picture” or the general pattern from your data. Hence, on new and different data, the performance of the overfitted model will be poor.
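A minimal way to see overfitting in code: a 1-nearest-neighbour regressor memorizes its training set, so its training error is exactly zero while its error on fresh data from the same noisy process is not (data invented for illustration):

```python
import random

def one_nn_predict(x, train):
    # 1-nearest-neighbour regression: return the label of the
    # closest training point -- it memorizes the training set.
    return min(train, key=lambda p: abs(p[0] - x))[1]

def mse(data, train):
    # Mean squared error of the 1-NN predictor on a dataset.
    return sum((one_nn_predict(x, train) - y) ** 2 for x, y in data) / len(data)

random.seed(0)
# Noisy samples from the same underlying line y = x.
train = [(x, x + random.gauss(0, 1)) for x in range(20)]
test = [(x + 0.5, x + 0.5 + random.gauss(0, 1)) for x in range(20)]

print(mse(train, train))      # 0.0: a perfect fit on the training data
print(mse(test, train) > 0)   # but nonzero error on new data
```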

**Machine Learning Explained: Bagging**

Bagging is a powerful method to improve the performance of simple models and reduce the overfitting of more complex models. The principle is easy to understand: instead of fitting one model on one sample of the population, several models are fitted on different samples (drawn with replacement) from the population. These models are then aggregated using their average, a weighted average, or a voting system (mainly for classification). Though bagging reduces the explanatory power of your model, it makes it much more robust and better able to get the ‘big picture’ from your data.
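A stdlib-only sketch of the recipe above: draw bootstrap samples, fit a high-variance learner on each (here a 1-nearest-neighbour regressor, chosen for simplicity), and average their predictions (data invented):

```python
import random

def fit_one_nn(sample):
    # A "model" here is just its stored training sample; prediction is
    # the label of the nearest stored point (a high-variance learner).
    return lambda x: min(sample, key=lambda p: abs(p[0] - x))[1]

def bagged_predict(models, x):
    # Aggregate the ensemble by averaging individual predictions.
    return sum(m(x) for m in models) / len(models)

random.seed(42)
data = [(x, 2 * x + random.gauss(0, 1)) for x in range(10)]

models = []
for _ in range(25):
    # Bootstrap: resample the data with replacement.
    boot = [random.choice(data) for _ in data]
    models.append(fit_one_nn(boot))

print(bagged_predict(models, 4.2))
```

Averaging smooths out the individual memorizers, which is exactly the variance reduction bagging buys.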

**Machine Learning Explained: Classification Trees**

Decision trees are often used to easily visualize the different choices to be made, the uncertainty under which they are made, and their outcomes. They are easy to visualize and understand, even for a non-technical audience. Let’s see this with a very simple example: you are running a car-selling business, and you want your employees to bring each customer to the car they are most likely to buy.
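A classification tree makes such choices by greedily picking the split that best separates the classes; the sketch below finds the best single threshold on a toy, invented “customer budget” feature using Gini impurity (a depth-1 tree, i.e. a decision stump):

```python
from collections import Counter

def gini(labels):
    # Gini impurity: 1 - sum of squared class proportions.
    n = len(labels)
    return 1.0 - sum((c / n) ** 2 for c in Counter(labels).values())

def best_split(rows):
    # rows: (feature_value, label) pairs. Try each threshold and keep
    # the one with the lowest weighted Gini impurity of the two sides.
    best = None
    for t in sorted({x for x, _ in rows}):
        left = [y for x, y in rows if x <= t]
        right = [y for x, y in rows if x > t]
        if not left or not right:
            continue
        score = (len(left) * gini(left) + len(right) * gini(right)) / len(rows)
        if best is None or score < best[1]:
            best = (t, score)
    return best[0]

# Hypothetical showroom data: (customer budget in $1000s, car bought).
rows = [(15, "compact"), (18, "compact"), (22, "compact"),
        (45, "suv"), (55, "suv"), (60, "suv")]
threshold = best_split(rows)
print(threshold)  # 22: budget <= 22 -> compact, else suv
```

A full tree simply applies this split search recursively to each side until the leaves are pure enough.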
