Streaming Event Modeling

Data modeling has been a fixture of enterprise architecture since the 1970s, when ANSI defined the conceptual, logical, and physical layers of a data schema. As data models matured, so did the availability of templates for business use: retail banks use similar data models, as do firms in other industries. A shared approach to data modeling advanced the discussion and planning of solutions. Growth in unstructured data has led to tools that search data to identify context or bring structure to analysis; Elasticsearch, for example, is used to identify contextual data in an unstructured data store. Just as structured data models brought greater standardization, the ability to template solutions that combine structured and unstructured data is growing, enabling more efficient solution delivery. The growth of streaming data (real-time events) raises the need for a shared ontology for streaming event modeling. Streaming event processing, commonly referred to as streaming analytics, focuses on discrete events that are processed and combined in real time to drive real-time customer engagement. A shared model will benefit business users and data science professionals alike through increased collaboration and reusable templates for solution delivery.

Customer Churn – Logistic Regression with R

In the customer management lifecycle, customer churn refers to a customer's decision to end the business relationship; it is also referred to as loss of clients or customers. Customer loyalty and customer churn always add up to 100%: if a firm has a 60% loyalty rate, its churn rate is 40%. Per the 80/20 customer-profitability rule, 20% of customers generate 80% of revenue, so it is very important to predict which customers are likely to churn and which factors drive their decisions. In this blog post, we are going to show how a logistic regression model built in R can be used to identify customer churn in a telecom dataset.
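The post itself builds the model in R; as a language-neutral sketch of the underlying idea, here is logistic regression fit by plain batch gradient descent in Python, on a tiny hypothetical telecom-style dataset (the feature names and values are invented for illustration):

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def fit_logistic(X, y, lr=0.1, epochs=2000):
    """Fit weights (plus intercept) by batch gradient descent on log loss."""
    n_features = len(X[0])
    w = [0.0] * (n_features + 1)  # w[0] is the intercept
    for _ in range(epochs):
        grad = [0.0] * (n_features + 1)
        for xi, yi in zip(X, y):
            z = w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi))
            err = sigmoid(z) - yi          # prediction error drives the update
            grad[0] += err
            for j, xj in enumerate(xi):
                grad[j + 1] += err * xj
        for j in range(len(w)):
            w[j] -= lr * grad[j] / len(X)  # average gradient step
    return w

def predict_proba(w, xi):
    """Predicted churn probability for one customer."""
    return sigmoid(w[0] + sum(wj * xj for wj, xj in zip(w[1:], xi)))

# Hypothetical customers: [monthly_charges (scaled 0-1), support_calls]
X = [[0.2, 0], [0.3, 1], [0.9, 4], [0.8, 3], [0.4, 0], [0.7, 5]]
y = [0, 0, 1, 1, 0, 1]  # 1 = churned
w = fit_logistic(X, y)
```

In practice one would use `glm(..., family = binomial)` in R, as the post does, rather than hand-rolled gradient descent; the sketch only shows what the fitting procedure optimizes.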

Visualizing (censored) lifetime distributions

There are now more than 10,000 R packages available from CRAN, and many more if you include those available only on GitHub. So, to be honest, it has become difficult to know all of them. But sometimes you discover a nice function in one of them, and that is really awesome. Consider for instance some (standard) censored lifetime data, …
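The post's visualizations are built in R, but the estimator underneath such censored-lifetime plots is the Kaplan–Meier product-limit estimate, which can be sketched in a few lines of plain Python (the lifetimes below are made up for illustration):

```python
def kaplan_meier(times, events):
    """Kaplan-Meier survival estimate.
    times:  observed times
    events: 1 = event (death) observed, 0 = censored
    Returns a list of (event_time, survival_probability) step points."""
    data = sorted(zip(times, events))
    n_at_risk = len(data)
    surv = 1.0
    curve = []
    i = 0
    while i < len(data):
        t = data[i][0]
        deaths = sum(1 for tt, e in data[i:] if tt == t and e == 1)
        ties = sum(1 for tt, e in data[i:] if tt == t)
        if deaths > 0:
            surv *= 1.0 - deaths / n_at_risk  # product-limit update
            curve.append((t, surv))
        n_at_risk -= ties  # everyone observed at t leaves the risk set
        i += ties
    return curve

# Hypothetical lifetimes; subjects 2 and 5 are censored (event = 0)
curve = kaplan_meier([1, 2, 3, 4, 5], [1, 0, 1, 1, 0])
```

Censored observations reduce the risk set without producing a drop in the curve, which is exactly what the step plots in the post display.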

Predictive Maintenance: A Primer

While leading the data science team at DataRPM, we have had the opportunity to interact with major players in discrete and process manufacturing around the world. What we have found is that most companies (at least those that matter) are moving towards an era of resource consciousness, since the cost of resources is going up. Companies in the past could focus on increasing top-line growth and comfortably lead the market. That is no longer true. Every CEO today has to worry about every line under the costs header and keep it under control.

How do you find the parts of speech in a sentence using Python?

Learn how to use spaCy to parse a sentence to return the parts of speech (noun, verb, etc.) and dependencies.

Forecasting: Linear Trend and ARIMA Models Exercises (Part-2)

There are two main approaches to time series forecasting. One is to find persistent patterns in the time series itself and extrapolate those patterns. The other is to discover how the series depends on other variables, which serve as predictors.
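The first approach, in its simplest form, is a linear trend fitted by ordinary least squares and extrapolated past the sample. The exercises use R; here is a minimal plain-Python sketch of that idea on an invented, exactly linear series:

```python
def linear_trend(y):
    """Fit y_t = a + b*t by ordinary least squares (t = 0, 1, 2, ...)."""
    n = len(y)
    t_mean = (n - 1) / 2
    y_mean = sum(y) / n
    b = sum((t - t_mean) * (yt - y_mean) for t, yt in enumerate(y)) / \
        sum((t - t_mean) ** 2 for t in range(n))
    a = y_mean - b * t_mean
    return a, b

def forecast(a, b, n, horizon):
    """Extrapolate the fitted trend `horizon` steps past the sample."""
    return [a + b * (n + h) for h in range(horizon)]

# Hypothetical series that happens to follow y_t = 1 + 2t exactly
series = [1, 3, 5, 7, 9]
a, b = linear_trend(series)
fcst = forecast(a, b, len(series), 2)
```

ARIMA models, the subject of the rest of the exercises, generalize this by modeling the autocorrelation in the residuals rather than assuming an independent linear trend.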

Gender Roles with Text Mining and N-grams

I saw this paper by Matthew Jockers and Gabi Kirilloff a number of months ago and the ideas in it have been knocking around in my head ever since. The authors of that paper used text mining to examine a corpus of 19th century novels and explore how gendered pronouns (he/she/him/her) are associated with different verbs. These authors used the Stanford CoreNLP library to parse dependencies in sentences and find which verbs are connected to which pronouns; I have been thinking about how to apply a different approach to this question using tidy data principles and n-grams. Let’s see what we can do!
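The post works through this with R and tidytext; the core n-gram idea (count which words immediately follow "he" versus "she", as a crude proxy for the dependency parsing used in the original paper) can be sketched in plain Python on a made-up sentence:

```python
import re
from collections import Counter

def pronoun_bigrams(text, pronouns=("he", "she")):
    """Count words that immediately follow each pronoun.
    A bigram approximation: unlike a dependency parse, it will miss
    verbs that are not directly adjacent to their pronoun subject."""
    words = re.findall(r"[a-z']+", text.lower())
    counts = {p: Counter() for p in pronouns}
    for w1, w2 in zip(words, words[1:]):
        if w1 in counts:
            counts[w1][w2] += 1
    return counts

# Hypothetical sample text
sample = "She wept quietly while he laughed. He rode away and she waited."
counts = pronoun_bigrams(sample)
```

Summing these counters over a whole corpus and comparing the relative frequency of each following word between the two pronouns gives the gendered-verb comparison the post explores.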

Machine Learning Tools You Should Know About

Artificial intelligence, big data, and hybrid cloud computing have made a significant impact on the business world, and machine learning is joining that list. This advancement in business technology provides machines with the ability to learn from data without being explicitly programmed to do so. Machine learning tends to focus on developing computer programs that can make the necessary adjustments when new data presents itself. Every new development in the technology industry is followed by new tools, which help businesses and users stay up to date with these advancements. Some specific tools that will help businesses as they incorporate and work with machine learning are Amazon Machine Learning, TensorFlow, Azure Machine Learning Studio, H2O, Caffe, MLlib, and Torch.

How do I compare document similarity using Python?

Learn how to use the gensim Python library to determine the similarity between two or more documents.
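gensim builds richer representations (TF-IDF, LSI) on top of this, but the similarity query itself reduces to cosine similarity between document vectors. As a minimal plain-Python illustration of that underlying computation, using raw term counts rather than gensim's weighted models:

```python
import math
from collections import Counter

def cosine_similarity(doc_a, doc_b):
    """Cosine similarity between two documents' raw term-count vectors."""
    a = Counter(doc_a.lower().split())
    b = Counter(doc_b.lower().split())
    dot = sum(a[t] * b[t] for t in a.keys() & b.keys())
    norm = math.sqrt(sum(v * v for v in a.values())) * \
           math.sqrt(sum(v * v for v in b.values()))
    return dot / norm if norm else 0.0

# Hypothetical documents: the first pair shares most terms, the second none
s_close = cosine_similarity("the cat sat on the mat",
                            "the cat sat on the hat")
s_far = cosine_similarity("the cat sat on the mat",
                          "stock markets fell sharply")
```

Overlapping vocabularies score near 1 and disjoint ones score 0; gensim's contribution is the weighting, dimensionality reduction, and efficient indexing around this same measure.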

Composing reproducible manuscripts using R Markdown

The technology is available for researchers to create a reproducible manuscript in which calculations and graphs are generated computationally, saving researcher time and avoiding human error. eLife is exploring ways to support life and biomedical scientists who wish to communicate their research in this way. In this post, Chris Hartgerink, a metascience researcher at Tilburg University, the Netherlands, describes how he composes a reproducible manuscript using R Markdown. Any researchers interested in submitting a manuscript to eLife in R Markdown, or a similar format, please let us know by email to
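For readers unfamiliar with the format, a reproducible manuscript of the kind described interleaves prose with executable R chunks, so that numbers and figures are regenerated from the data on every render. A minimal skeleton (the file name, chunk contents, and metadata below are hypothetical):

````markdown
---
title: "A Reproducible Manuscript"
author: "Jane Doe"
output: pdf_document
---

```{r setup, include=FALSE}
knitr::opts_chunk$set(echo = FALSE)
data <- read.csv("results.csv")  # raw data, versioned with the manuscript
```

The mean response was `r round(mean(data$response), 2)` units.

```{r fig-response, fig.cap="Distribution of responses."}
hist(data$response)
```
````

Because the inline `` `r ...` `` expression and the figure chunk are evaluated at render time, updating `results.csv` and re-knitting updates every statistic and plot in the manuscript consistently.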