The “Big Data-ization” of Artificial Intelligence

It seems like it was only a few years ago that the term “big data” went from a promising area of research and interest to something so ubiquitous that it lost all meaning, descending ultimately into the butt of jokes. As everyone piled onto the big data bandwagon, it became impossible to separate truth from fiction. Every executive and entrepreneur that I ran into was doing a “big data” thing. I recall meeting someone whose company shot videos for enterprise customers and was pitching it as a ‘Big Data play’ – because video files, you know, are huge – they take up lots of space. Thankfully, the noise associated with “big data” is abating as sophistication and common sense take hold. In fact, in many circles the term actually exposes the user as someone who doesn’t really understand the space.


F-Test: Compare Two Variances in R

The F-test is used to assess whether the variances of two populations (A and B) are equal.
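As a rough illustration of the idea (in Python rather than R, and not taken from the linked tutorial), the test statistic is simply the ratio of the two sample variances. The sample values and the helper name `f_statistic` below are made up for this sketch. Turning the statistic into a p-value would additionally need the F distribution, for instance from a statistics library, which is left out here.

```python
from statistics import variance


def f_statistic(sample_a, sample_b):
    """Return (F, df_a, df_b), where F is the ratio of the sample variances."""
    var_a = variance(sample_a)  # unbiased sample variance (n - 1 denominator)
    var_b = variance(sample_b)
    return var_a / var_b, len(sample_a) - 1, len(sample_b) - 1


# Hypothetical measurements, invented purely for illustration.
group_a = [2.0, 4.0, 6.0, 8.0, 10.0]
group_b = [4.0, 5.0, 6.0, 7.0, 8.0]

f_value, df_a, df_b = f_statistic(group_a, group_b)
print(f"F = {f_value:.2f} with ({df_a}, {df_b}) degrees of freedom")
```

An F value close to 1 is consistent with equal variances; how far from 1 counts as significant depends on the two degrees of freedom.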


Using Machine Learning to Predict Value of Homes On Airbnb

Data products have always been an instrumental part of Airbnb’s service. However, we have long recognized that it’s costly to make data products. For example, personalized search ranking enables guests to more easily discover homes, and smart pricing allows hosts to set more competitive prices according to supply and demand. However, these projects each required a lot of dedicated data science and engineering time and effort. Recently, advances in Airbnb’s machine learning infrastructure have significantly lowered the cost of deploying new machine learning models to production. For example, our ML Infra team built a general feature repository that allows users to leverage high-quality, vetted, reusable features in their models. Data scientists have started to incorporate several AutoML tools into their workflows to speed up model selection and performance benchmarking. Additionally, ML Infra created a new framework that will automatically translate Jupyter notebooks into Airflow pipelines. In this post, I will describe how these tools worked together to expedite the modeling process and hence lower the overall development costs for a specific use case of LTV modeling: predicting the value of homes on Airbnb.


Machine Learning Showdown: Python vs R

Let’s say you have an amazing idea for a machine learning app. It’s going to be brilliant. It’s going to revolutionize the world of finance, mobile advertising, or… some other world, but it’s definitely going to revolutionize something. And gosh darn it, it’s going to be the smartest, most learned app the world has ever seen. The only thing standing between you and glory is the small matter of actually coding your brilliant idea; and the first question you would want to ask yourself in this regard is which programming language you want to use for your app, with the two immediate candidates likely being R and Python. Each of these languages has its pros, cons, and diehard fanbase. This article is meant to help developers choose between these two bitter rivals, in the context of machine learning (for a more general, feature-by-feature comparison you might want to check out this great infographic by DataCamp). Let’s get down to it then!


The Machine Learning Abstracts (Part 2): Decision Trees

Decision tree learning is a classic machine learning algorithm used for both classification and regression. Regression is the process of predicting a continuous value, as opposed to predicting a discrete class label in classification. The basic intuition behind a decision tree is to map out all possible decision paths in the form of a tree.
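To make the "decision paths" intuition concrete, here is a minimal sketch of a hand-built classification tree. The feature names, thresholds and labels are hypothetical, and the tree is written out by hand rather than learned from data, so this shows only the structure a learning algorithm would produce, not the learning itself.

```python
def predict(node, sample):
    """Follow a single decision path from the root down to a leaf."""
    while "label" not in node:
        go_left = sample[node["feature"]] <= node["threshold"]
        node = node["left"] if go_left else node["right"]
    return node["label"]


def decision_paths(node, conditions=()):
    """Yield every root-to-leaf path as (list of conditions, leaf label)."""
    if "label" in node:
        yield list(conditions), node["label"]
        return
    feature, threshold = node["feature"], node["threshold"]
    yield from decision_paths(node["left"], conditions + (f"{feature} <= {threshold}",))
    yield from decision_paths(node["right"], conditions + (f"{feature} > {threshold}",))


# Hypothetical tree: internal nodes test one feature, leaves hold a class label.
tree = {
    "feature": "temperature_c", "threshold": 20.0,
    "left": {"label": "stay in"},
    "right": {
        "feature": "wind_kmh", "threshold": 30.0,
        "left": {"label": "go cycling"},
        "right": {"label": "stay in"},
    },
}

prediction = predict(tree, {"temperature_c": 25.0, "wind_kmh": 10.0})
all_paths = list(decision_paths(tree))
for conditions, label in all_paths:
    print(" and ".join(conditions), "->", label)
```

For regression, the same structure applies, with each leaf holding a number instead of a class label.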


Train your deep model faster and sharper: two novel techniques

Deep neural networks have many, many learnable parameters that are used to make inferences. Often, this poses two problems: the model sometimes does not make very accurate predictions, and it takes a long time to train. This post talks about increasing accuracy while also reducing training time using two novel techniques. The papers can be found here (Snapshot Ensembles) and here (FreezeOut).


DeepSense: a unified deep learning framework for time-series mobile sensing data processing

Yao et al., WWW’17. DeepSense is a deep learning framework that runs on mobile devices, and can be used for regression and classification tasks based on data coming from mobile sensors (e.g., motion sensors). An example of a classification task is heterogeneous human activity recognition (HHAR): detecting which activity someone might be engaged in (walking, biking, standing, and so on) based on motion sensor measurements. Another example is biometric motion analysis, where a user must be identified from their gait. An example of a regression task is tracking the location of a car using acceleration measurements to infer position. Compared to the state of the art, DeepSense provides an estimator with far smaller tracking error on the car tracking problem, and outperforms state-of-the-art algorithms on the HHAR and biometric user identification tasks by a large margin.


Why AI and machine learning researchers are beginning to embrace PyTorch

PyTorch is a Python package that provides two high-level features:
• Tensor computation (like NumPy) with strong GPU acceleration
• Deep neural networks built on a tape-based autograd system
You can reuse your favorite Python packages such as NumPy, SciPy and Cython to extend PyTorch when needed.
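For readers unfamiliar with the phrase "tape-based autograd", the toy sketch below may help. It is not PyTorch code and not how PyTorch is implemented; the `Var` class and `backward` function are hypothetical names invented here. The idea it illustrates is that every operation is recorded on a tape as it runs, and gradients are then obtained by replaying the tape in reverse.

```python
class Var:
    """A scalar value that records each operation applied to it on a shared tape."""

    def __init__(self, value, tape):
        self.value = value
        self.grad = 0.0
        self.tape = tape

    def __add__(self, other):
        out = Var(self.value + other.value, self.tape)
        # d(out)/d(self) = 1 and d(out)/d(other) = 1
        self.tape.append((out, [(self, 1.0), (other, 1.0)]))
        return out

    def __mul__(self, other):
        out = Var(self.value * other.value, self.tape)
        # d(out)/d(self) = other.value and d(out)/d(other) = self.value
        self.tape.append((out, [(self, other.value), (other, self.value)]))
        return out


def backward(output):
    """Replay the tape in reverse, accumulating gradients via the chain rule."""
    output.grad = 1.0
    for out, parents in reversed(output.tape):
        for parent, local_grad in parents:
            parent.grad += local_grad * out.grad


tape = []
x = Var(3.0, tape)
y = Var(4.0, tape)
z = x * y + x  # dz/dx = y + 1, dz/dy = x
backward(z)
print(x.grad, y.grad)
```

Because the tape is rebuilt on every forward pass, the computation can change from one run to the next, which is a large part of the appeal of this style of automatic differentiation.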


Reinforcement learning for complex goals, using TensorFlow

Reinforcement learning (RL) is about training agents to complete tasks. We typically think of this as being able to accomplish some goal. Take, for example, a robot we might want to train to open a door. Reinforcement learning can be used as a framework for teaching the robot to open the door by allowing it to learn from trial and error. But what if we are interested in having our agent solve not just one goal, but a set that might vary over time?
In this article, and the accompanying notebook available on GitHub, I am going to introduce and walk through both the traditional reinforcement learning paradigm in machine learning and a new, emerging paradigm for extending reinforcement learning to allow for complex goals that vary over time.
I will start by demonstrating how to build a simple Q-learning agent that is guided by a single reward signal to navigate an environment and make deliveries. I will then demonstrate how this simple formulation becomes problematic for more complex behavior we might envision. To allow for greater flexibility, I will then describe how to build a class of reinforcement learning agents called “direct feature prediction” (DFP), which can optimize for various goals. All the code is written in TensorFlow and available in the accompanying Jupyter notebook.
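As a rough picture of what a single-reward Q-learning agent looks like, here is a minimal tabular sketch in plain Python. It is not the article's TensorFlow notebook: the five-cell corridor environment, the reward of 1.0 at the last cell, and all hyperparameter values are assumptions chosen only to keep the example small.

```python
import random

N_STATES = 5        # hypothetical corridor, cells 0..4; the delivery point is cell 4
ACTIONS = (-1, +1)  # step left or step right
ALPHA, GAMMA, EPSILON = 0.5, 0.9, 0.1  # learning rate, discount, exploration rate


def step(state, action):
    """Move within the corridor; reward 1.0 only when the delivery cell is reached."""
    next_state = min(max(state + action, 0), N_STATES - 1)
    done = next_state == N_STATES - 1
    return next_state, (1.0 if done else 0.0), done


def train(episodes=500, seed=0):
    rng = random.Random(seed)
    q = [[0.0, 0.0] for _ in range(N_STATES)]  # q[state][action index]
    for _ in range(episodes):
        state, done = 0, False
        while not done:
            if rng.random() < EPSILON:
                a = rng.randrange(len(ACTIONS))  # explore
            else:
                best = max(q[state])             # exploit, breaking ties at random
                a = rng.choice([i for i, v in enumerate(q[state]) if v == best])
            next_state, reward, done = step(state, ACTIONS[a])
            # Q-learning update: move the estimate toward reward + discounted best next value.
            target = reward + (0.0 if done else GAMMA * max(q[next_state]))
            q[state][a] += ALPHA * (target - q[state][a])
            state = next_state
    return q


q_table = train()
# Greedy action in each non-terminal cell after training.
greedy_policy = [ACTIONS[row.index(max(row))] for row in q_table[:-1]]
print(greedy_policy)
```

The whole behavior here is driven by one scalar reward, which is exactly the limitation that motivates agents able to pursue goals that change over time.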