Online Learning Guide with Text Classification using Vowpal Wabbit (VW)

A large number of e-commerce and tech companies rely on real-time training and prediction for their products. Google predicts click-through rates for ads in real time; alongside the advertiser's bid, these predictions feed the auction mechanism that decides which ads to show to the user. Stack Overflow uses real-time predictions to automatically tag a question with the correct programming language, so that it reaches the people best placed to answer it. An election management team might want to predict real-time sentiment on Twitter to assess the impact of its campaign. Such datasets are also characterized by their large size. We can always train the model on a sub-sample of the data, but that is a rather unreliable approach: there's a good chance we would miss out on a significant amount of information. Is there a solution that tackles both these problems? As it turns out, there is. In the first section of this article, we compare batch learning and online learning. In the second, we walk through an example of text classification using an online learning framework called Vowpal Wabbit (VW).
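The article's worked example uses VW itself; as a hedged stand-in, the Python sketch below (my own illustration, not from the article) shows the online-learning idea with scikit-learn's HashingVectorizer and SGDClassifier, whose partial_fit updates the model one mini-batch at a time so the full data set never needs to fit in memory. The tiny corpus is invented for illustration.

from sklearn.feature_extraction.text import HashingVectorizer
from sklearn.linear_model import SGDClassifier

vectorizer = HashingVectorizer(n_features=2**18)  # stateless, so no fitting pass required
clf = SGDClassifier(loss="log_loss")              # logistic regression trained by SGD

# Invented mini-batches; a real stream would yield these from disk or the network.
batches = [
    (["great product, loved it", "terrible, want a refund"], [1, 0]),
    (["fast shipping and works well", "broke after one day"], [1, 0]),
]
for texts, labels in batches:
    X = vectorizer.transform(texts)
    clf.partial_fit(X, labels, classes=[0, 1])    # classes must be declared on the first call

print(clf.predict(vectorizer.transform(["really great, works well"])))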


10 Great Articles about Stochastic Processes and Related Topics

This resource is part of a series on specific topics related to data science: regression, clustering, neural networks, deep learning, Hadoop, decision trees, ensembles, correlation, outliers, Python, R, TensorFlow, SVM, data reduction, feature selection, experimental design, time series, cross-validation, model fitting, dataviz, AI and many more.


Creating Your First Machine Learning Classifier with Sklearn

We examine how the popular framework sklearn can be used with the iris dataset to classify species of flowers. We go through all the steps required to build a machine learning model from start to finish.
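As a quick illustration of that workflow (a minimal sketch, not the article's exact code), the steps in sklearn look roughly like this:

from sklearn.datasets import load_iris
from sklearn.model_selection import train_test_split
from sklearn.neighbors import KNeighborsClassifier

# Load the iris measurements and species labels.
X, y = load_iris(return_X_y=True)

# Hold out a quarter of the data to estimate generalization.
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.25, random_state=42)

# Fit a simple classifier and report accuracy on the held-out flowers.
clf = KNeighborsClassifier(n_neighbors=3).fit(X_train, y_train)
print(f"Test accuracy: {clf.score(X_test, y_test):.2f}")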


Machine Learning Explained: Understanding Supervised, Unsupervised, and Reinforcement Learning

Once we start delving into the concepts behind Artificial Intelligence (AI) and Machine Learning (ML), we come across copious amounts of jargon. Understanding that jargon goes a long way toward comprehending the work researchers and data scientists have done to bring AI to its current state. In this article, I provide a comprehensive definition of supervised, unsupervised, and reinforcement learning within the broader field of Machine Learning. You have probably encountered these terms in articles about the progress of AI and the role ML plays in driving it, and a firm grasp of them is essential. At bottom, supervised, unsupervised, and reinforcement learning describe the different ways you can let an algorithm loose on a data set and expect it to learn something useful from the process. Together, they point the way toward machines that will, over time, increasingly assist humans with everyday tasks.
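To make the distinction concrete, here is a minimal sketch in Python with scikit-learn on toy data (my own illustration, not from the article; reinforcement learning, which learns from interaction and rewards, is omitted for brevity): a supervised learner consumes labels, while an unsupervised one sees only the raw features.

from sklearn.datasets import make_blobs
from sklearn.linear_model import LogisticRegression
from sklearn.cluster import KMeans

X, y = make_blobs(n_samples=200, centers=3, random_state=0)  # toy data with known labels

# Supervised: fit uses both the features X and the labels y.
clf = LogisticRegression(max_iter=1000).fit(X, y)
print("Supervised accuracy:", clf.score(X, y))

# Unsupervised: fit sees only X and must discover structure on its own.
km = KMeans(n_clusters=3, n_init=10, random_state=0).fit(X)
print("First ten cluster assignments:", km.labels_[:10])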


Image Processing and Neural Networks Intuition: Part 1

In this series, I will talk about training a simple neural network on image data. To give a brief overview, a neural network is a form of supervised learning: the model trains on historical data to learn the relationship between the input variables and the target variable, and once trained, it can predict the target variable for new input data. In previous posts, we have written about linear, lasso, and ridge regression, all of which are also supervised learning methods. What is special about neural networks is that they work remarkably well on image, audio, video, and language datasets. A multilayer neural network and its variations are commonly called deep learning. In this post, I will focus on handling and processing image data; in the next, I will show how to train the model. I will use Python for the implementation, as Python has many useful functions for image processing. If you are new to Python, I recommend quickly working through a NumPy tutorial (up to array manipulation) and a matplotlib tutorial.
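As a small taste of what the series covers, here is a minimal sketch of loading an image into a NumPy array with matplotlib (the file name cat.png is a hypothetical placeholder, not from the post):

import matplotlib.pyplot as plt

img = plt.imread("cat.png")        # ndarray of shape (height, width, channels)
print(img.shape, img.dtype)

gray = img[..., :3].mean(axis=-1)  # naive grayscale: average the R, G, B channels
plt.imshow(gray, cmap="gray")
plt.show()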


Hadoop 3.0 Perspectives by Hortonworks' Hadoop YARN & MapReduce Development Lead, Vinod Kumar Vavilapalli

The Apache Software Foundation (ASF), the all-volunteer developers, stewards, and incubators of more than 350 Open Source projects and initiatives, recently announced Apache® Hadoop® v3.0.0, the latest version of the Open Source software framework for reliable, scalable, distributed computing. Over the past decade, Apache Hadoop has become ubiquitous within the greater Big Data ecosystem by enabling firms to run and manage data applications on large hardware clusters in a distributed computing environment. In the brief Q&A below, Hortonworks' Hadoop YARN & MapReduce Development Lead, Vinod Kumar Vavilapalli, offers his perspectives on the new release.


Learning Curves for Machine Learning

When building machine learning models, we want to keep error as low as possible. Two major sources of error are bias and variance; if we can reduce these, we can build more accurate models. But how do we diagnose bias and variance in the first place? And what actions should we take once we've detected something? In this post, we'll learn how to answer both questions using learning curves. We'll work with a real-world data set and try to predict the electrical energy output of a power plant.
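As a hedged sketch of the technique (the post's power-plant data set is not reproduced here, so synthetic regression data stands in), scikit-learn's learning_curve makes the diagnosis straightforward:

import numpy as np
import matplotlib.pyplot as plt
from sklearn.datasets import make_regression
from sklearn.linear_model import LinearRegression
from sklearn.model_selection import learning_curve

# Synthetic stand-in for the power-plant data.
X, y = make_regression(n_samples=500, n_features=5, noise=10.0, random_state=0)

train_sizes, train_scores, val_scores = learning_curve(
    LinearRegression(), X, y, cv=5,
    train_sizes=np.linspace(0.1, 1.0, 10),
    scoring="neg_mean_squared_error",
)

# High, converging errors suggest bias; a wide train/validation gap suggests variance.
plt.plot(train_sizes, -train_scores.mean(axis=1), label="training error")
plt.plot(train_sizes, -val_scores.mean(axis=1), label="validation error")
plt.xlabel("Training set size")
plt.ylabel("Mean squared error")
plt.legend()
plt.show()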


Machine learning tools for fairness, at scale

It’s time to think about how the systems we build interact with our world and build systems that make our world a better place.


Put machine learning to work in the real world

Over the past few years, companies have invested in data gathering and data management technologies, and many have begun unlocking value from their vast repositories. From the inception of Strata, we've featured case studies from companies across a wide variety of industries. This marks the second year we will offer a series of executive briefings over two days: 40-minute overviews of important topics in big data and data science, aimed at executives and managers tasked with understanding how to incorporate these technologies and techniques into their business operations. Topics will include privacy and security, AI and machine learning, data infrastructure, and culture (including hiring, managing, and nurturing data teams). We also have tutorials and many case studies tailored for managers and executives, including a day-long focus on media and ad tech.


How to make your machine learning model available as an API with the plumber package

The plumber package for R makes it easy to expose existing R code as a web service via an API (Trestle Technology, LLC 2017). You take an existing R script and make it accessible with plumber simply by adding a few lines of comments. If you have worked with Roxygen before, e.g. when building a package, you will already be familiar with the core concepts. If not, here are the most important things to know:
• you define the output or endpoint
• you can add additional annotation to customize your input, output and other functionalities of your API
• you can define every input parameter that will go into your function
• every such annotation will begin with either #' or #*
With this setup, we can take a trained machine learning model and make it available as an API that other programs can call to obtain predictions.
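As a minimal sketch of what such a script can look like (the file name plumber.R, the saved model model.rds, and the parameter x are all hypothetical, not from the original post):

# plumber.R -- illustrative only; model.rds and the input x are hypothetical
model <- readRDS("model.rds")  # a model trained and saved earlier

#* Return a prediction for a single numeric input
#* @param x a numeric feature value
#* @get /predict
function(x) {
  newdata <- data.frame(x = as.numeric(x))
  list(prediction = predict(model, newdata))
}

It can then be served with library(plumber); plumb("plumber.R")$run(port = 8000) and queried at http://localhost:8000/predict?x=1.5.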