Natural Language Processing for Beginners: Using TextBlob

Natural Language Processing (NLP) is an area of growing attention due to the increasing number of applications like chatbots and machine translation. In some ways, the entire revolution of intelligent machines is based on the ability to understand and interact with humans. I have been exploring NLP for some time now. My journey started with the NLTK library in Python, which was the recommended library to get started with at that time. While NLTK is a perfect library for education and research, it becomes very heavy and tedious for completing even simple tasks. Later, I was introduced to TextBlob, which is built on the shoulders of NLTK and Pattern. Its big advantage is that it is easy to learn and offers a lot of features like sentiment analysis, POS tagging, and noun phrase extraction. It has now become my go-to library for performing NLP tasks. On a side note, there is spaCy, which is widely recognized as one of the most powerful and advanced libraries for implementing NLP tasks. But having used both spaCy and TextBlob, I would still suggest TextBlob to a beginner due to its simple interface. If this is your first step in NLP, TextBlob is the perfect library to get hands-on with. The best way to go through this article is to follow along with the code and perform the tasks yourself. So let’s get started!
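As a taste of that simple interface, here is a minimal sketch of the three features mentioned above (it assumes you have run `pip install textblob` and fetched the corpora with `python -m textblob.download_corpora`; the sample sentence is my own):

```python
from textblob import TextBlob

blob = TextBlob("TextBlob makes natural language processing in Python simple.")

# Sentiment analysis: polarity in [-1, 1], subjectivity in [0, 1]
print(blob.sentiment)     # Sentiment(polarity=..., subjectivity=...)

# Part-of-speech tagging: a list of (word, POS tag) pairs
print(blob.tags)          # [('TextBlob', 'NNP'), ('makes', 'VBZ'), ...]

# Noun phrase extraction
print(blob.noun_phrases)  # a WordList of detected noun phrases
```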


Cognitive computing: Moving From Hype to Deployment

Although cognitive computing, which is often referred to as AI or Artificial Intelligence, is not a new concept, the hype surrounding it and the level of interest in it are definitely new. The combination of hype about robot overlords, vendor marketing, and concerns about job losses has fueled that hype to where we stand now. But behind the cloud of hype currently surrounding the technology lies the potential for increased productivity, the ability to solve problems deemed too complex for the average human brain, and better knowledge-based transactions and interactions with consumers. I recently got a chance to catch up with Dmitri Tcherevik, CTO of Progress, about this disruption, and our discussion led to the following insights. Cognitive computing is considered marketing jargon by many, but in layman’s terms it describes the ability of computers to replicate or simulate human thought processes. The processes behind cognitive computing may make use of the same principles as AI, including neural networks, machine learning, contextual awareness, sentiment analysis, and natural language processing. However, there is a subtle difference between the two.


3 Simple Data Transformation Tricks in R that are often not used

We tend to expect a more complex solution when the problem presented seems harder, but that doesn’t always have to be the case. These are three such cases in data transformation in R.


Which Machine Learning Algo will continue to be in use in the year 2118?

The Lindy effect is the idea that the future life expectancy of non-perishable things, such as a technology or an idea, is proportional to their current age, so that every additional period of survival implies a longer remaining life expectancy.
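As a quick formal statement (my own sketch, not from the article), the effect says the expected remaining lifetime grows linearly with the current age:

```latex
% Lindy effect: for some constant c > 0,
\mathbb{E}\left[\, T - t \mid T > t \,\right] = c\, t .
% Example: a Pareto-distributed lifetime with survival function
% S(t) = (t_{\min}/t)^{\alpha}, \; t \ge t_{\min},
% satisfies this with c = 1/(\alpha - 1) for \alpha > 1.
```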


R Interfaces to Python Keras Package

Keras is a popular Python package for prototyping deep neural networks with multiple backends, including TensorFlow, CNTK, and Theano. Currently, there are two R interfaces that allow us to use Keras from R through the reticulate package. While the keras R package provides a flexible and feature-rich API, the kerasR package is more convenient and computationally efficient. For instance, in the example below, mimicking the Python code shown in https://…t-regularization-in-deep-neural-networks, the kerasR package is at least 10% faster than the keras package in terms of computing time.
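For context, here is a minimal sketch of the underlying Python Keras API that both R packages wrap (the toy data, layer sizes, and training settings are hypothetical, not taken from the benchmarked example):

```python
import numpy as np
from keras.models import Sequential
from keras.layers import Dense

# Hypothetical toy data: 100 samples, 10 features, binary labels
X = np.random.rand(100, 10)
y = np.random.randint(0, 2, size=(100, 1))

# A small fully connected network
model = Sequential([
    Dense(32, activation="relu", input_shape=(10,)),
    Dense(1, activation="sigmoid"),
])
model.compile(optimizer="adam", loss="binary_crossentropy", metrics=["accuracy"])
model.fit(X, y, epochs=5, batch_size=16, verbose=0)
```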


Parametric Portfolio Policies

There are several ways to do portfolio optimization, each with its advantages and disadvantages. We already discussed some techniques here. Today I am going to show another method to perform portfolio optimization that works very well on large datasets because it produces very robust weights, which results in good out-of-sample performance. This technique is called Parametric Portfolio Policies (PPP) and it was proposed by Brandt, Santa-Clara and Valkanov in 2009 (click here to read the full article).
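In outline, the 2009 paper parameterizes each asset’s weight as a benchmark weight plus a linear function of the asset’s standardized characteristics (a sketch of the key equations as I understand them; see the paper for the exact formulation):

```latex
% Parametric Portfolio Policy weight for asset i at time t
w_{i,t} = \bar{w}_{i,t} + \frac{1}{N_t}\,\theta^{\top}\hat{x}_{i,t},
% where \bar{w}_{i,t} is a benchmark weight (e.g., value-weighted),
% N_t is the number of assets, \hat{x}_{i,t} are standardized
% characteristics, and \theta is chosen to maximize average realized utility:
\max_{\theta}\;\frac{1}{T}\sum_{t=1}^{T} u\!\Big(\sum_{i=1}^{N_t} w_{i,t}\,r_{i,t+1}\Big).
```

Because only the vector \(\theta\) is estimated, rather than a full covariance matrix, the weights tend to be robust even when the number of assets is large.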


ggplot2 features for visualizing the NHANES data

The National Health and Nutrition Examination Survey (NHANES) is a research program conducted by the National Center for Health Statistics to evaluate the health and nutritional status of people in the United States and to track changes over time. The data are a combination of interviews, physical examinations, and laboratory tests.


WTTE-RNN: Weibull Time To Event Recurrent Neural Network

In this thesis we propose a new model for predicting time to events: the Weibull Time To Event RNN. This is a simple framework for time-series prediction of the time to the next event, applicable when we have any or all of the problems of continuous or discrete time, right censoring, recurrent events, temporal patterns, time-varying covariates, or time series of varying lengths. All these problems are frequently encountered in customer churn, remaining useful life, failure, spike-train, and event prediction. The proposed model estimates the distribution of time to the next event as a discrete or continuous Weibull distribution whose parameters are the output of a recurrent neural network. The model is trained using a special objective function (log-likelihood loss for censored data) commonly used in survival analysis. The Weibull distribution is simple enough to avoid sparsity and can easily be regularized to avoid overfitting, but is still expressive enough to encode concepts like increasing, stationary, or decreasing risk, and can converge to a point estimate if allowed. The predicted Weibull parameters can be used to predict the expected value and quantiles of the time to the next event. They also lead to a natural 2d-embedding of future risk which can be used for monitoring and exploratory analysis. We describe the WTTE-RNN using a general framework for censored data which can easily be extended with other distributions and adapted for multivariate prediction. We show that the common Proportional Hazards model and the Weibull Accelerated Failure Time model are special cases of the WTTE-RNN. The proposed model is evaluated on simulated data with varying degrees of censoring and temporal resolution. We compare it to binary fixed-window forecast models and naive ways of handling censored data. The model outperforms the naive methods and is found to have many advantages and comparable performance to binary fixed-window RNNs, without the need to specify a window size and with the ability to train on more data. Application to the CMAPSS dataset for PHM run-to-failure of simulated jet engines gives promising results.
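To make the objective concrete, here is a minimal NumPy sketch of the continuous-time censored Weibull log-likelihood described above (variable names are my own; the thesis and its reference implementation may differ in detail):

```python
import numpy as np

def weibull_censored_loglik(t, u, alpha, beta):
    """Log-likelihood of observed time t under Weibull(scale=alpha, shape=beta)
    with right censoring.

    t     : time to event, or time to censoring
    u     : 1.0 if the event was observed, 0.0 if right-censored
    alpha : Weibull scale parameter (network output, > 0)
    beta  : Weibull shape parameter (network output, > 0)
    """
    # Cumulative hazard Lambda(t) = (t / alpha) ** beta
    cum_hazard = (t / alpha) ** beta
    # The log-hazard term contributes only for observed events (u = 1);
    # censored observations contribute only the survival term -Lambda(t).
    log_hazard = np.log(beta) - np.log(alpha) + (beta - 1.0) * (np.log(t) - np.log(alpha))
    return u * log_hazard - cum_hazard

# Training maximizes the mean log-likelihood (i.e., minimizes its negative)
# over the (alpha, beta) pairs emitted by the RNN at each time step.
```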


How Data Scientists Are Wasting Their Time

Today’s definition of what most companies want in a data scientist seems to be something akin to a superhero. Companies are looking for a regular Mister Fantastic (the Marvel Comics superhero who was “one of the bravest adventurers and most brilliant scientific minds of his generation”).


Slidecast: Striim – Streaming Integration with Intelligence

Coming to you from our popular “Rich Report Podcast” channel, we offer a new slidecast with Steve Wilkes, Co-founder and CTO at Striim, presenting: Striim – Streaming Integration with Intelligence. Striim recently announced the launch of version 3.8 of the Striim platform, with 47 new and enhanced capabilities.


4 Things You Probably Didn’t Know Machine Learning and AI Were Used For

1. Deep Learning Has Created a Biological Aging Clock
2. Deep Learning Puts a Dancing John Travolta into your Living Room
3. Machine Learning Can Turn Summer into Winter
4. Deep Learning Predicts Earthquakes