Applied Machine Learning at Facebook: A Datacenter Infrastructure Perspective

Machine learning sits at the core of many essential products and services at Facebook. This paper describes the hardware and software infrastructure that supports machine learning at global scale. Facebook’s machine learning workloads are extremely diverse: services require many different types of models in practice. This diversity has implications at all layers in the system stack. In addition, a sizable fraction of all data stored at Facebook flows through machine learning pipelines, presenting significant challenges in delivering data to high-performance distributed training flows. Computational requirements are also intense, leveraging both GPU and CPU platforms for training and abundant CPU capacity for real-time inference. Addressing these and other emerging challenges continues to require diverse efforts that span machine learning algorithms, software, and hardware design.


Amazing New AI Innovations Unveiled at CES 2018 in Las Vegas

• The Future of Healthcare
• L’Oreal’s Thumbnail-sized Sensor
• Cocoon Cam Clarity
• Rinspeed Snap
• Toyota’s e-Palette Concept Car
• Google Assistant is taking on Amazon’s Alexa, in a BIG way
• YouTube’s Recommendations Keep Getting Better


A Simple Introduction to ANOVA (with applications in Excel)

Buying a new product or testing a new technique but not sure how it stacks up against the alternatives? It’s an all too familiar situation for most of us. Most of the options sound similar to each other, so picking the best of the lot is a challenge. Consider a scenario where we have three medical treatments to apply to patients with similar diseases. Once we have the test results, one approach is to assume that the treatment which took the least time to cure the patients is the best among them. But what if some of those patients had already been partially cured, or another medication was already working on them? To make a confident and reliable decision, we need evidence to support our approach. This is where ANOVA comes into play. In this article, I’ll introduce you to the different ANOVA techniques used for making these decisions. We’ll work through a few cases to understand how each technique produces its results, and we’ll use Excel along the way. You should know the basics of statistics to follow this topic; knowledge of t-tests and hypothesis testing is an added benefit.
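
As a quick orientation (the article itself works in Excel, and the data below are invented), a one-way ANOVA for a three-treatment scenario like the one above boils down to asking whether the variation between treatments is large relative to the variation within treatments:

```r
# One-way ANOVA on made-up recovery times for three treatments
set.seed(42)
recovery <- data.frame(
  treatment = rep(c("A", "B", "C"), each = 10),
  days      = c(rnorm(10, mean = 12, sd = 2),   # treatment A
                rnorm(10, mean = 10, sd = 2),   # treatment B
                rnorm(10, mean = 14, sd = 2))   # treatment C
)

fit <- aov(days ~ treatment, data = recovery)
summary(fit)    # F statistic and p-value for the overall treatment effect
TukeyHSD(fit)   # pairwise comparisons if the overall test is significant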


Putting AI-enhanced analytics at the heart of retail customer experience

Last Sunday, my husband and I went to visit our daughter. As we drove, my cell informed me that we were 30 minutes from our destination. How did it know? I hadn’t told it where we were going, and there wasn’t an appointment on my calendar. The phone had worked out that this was a trip we regularly take on a Sunday and provided useful information based on that knowledge. This is just an everyday example of how quickly Artificial Intelligence (AI) is becoming a normal part of our lives, and it’s beginning to shape retail customer experience. In this blog, I want to look at how AI and analytics together can deliver the highly targeted and personalized experience that customers demand. The holiday season has just passed and, if you’re like me, you’ll be giving thanks to Amazon (other online shopping services are available!). Going online is quick and convenient. Personally, I like shopping in the mall, but our busy lives often make that practically impossible. What’s more, the personalization and recommendation engines of services such as Amazon are now so sophisticated that it really does feel as though I’m receiving an individual service that understands my wants and preferences. This level of personal service is something every retailer must aspire to.


A survey of incremental high-utility itemset mining

Traditional association rule mining has been widely studied, but it is unsuitable for real-world applications where factors such as the unit profits of items and purchase quantities must be considered. High-utility itemset mining (HUIM) is designed to find highly profitable patterns by considering both the purchase quantities and the unit profits of items. However, most HUIM algorithms are designed for static databases, whereas in real-world applications such as market basket analysis and business decision-making, databases are often dynamically updated by inserting new data such as customer transactions. Several researchers have proposed algorithms to discover high-utility itemsets (HUIs) in dynamically updated databases. Unlike batch algorithms, which always process a database from scratch, incremental high-utility itemset mining (iHUIM) algorithms incrementally update and output HUIs, thus reducing the cost of discovering them. This paper provides an up-to-date survey of state-of-the-art iHUIM algorithms, including Apriori-based, tree-based, and utility-list-based approaches. To the best of our knowledge, this is the first survey of incremental high-utility itemset mining. The paper also identifies several important open issues and research challenges for iHUIM.
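
To make "utility" concrete: the utility of an itemset in a transaction is the sum of purchase quantity times unit profit over its items, its utility in the database is summed over the transactions containing it, and it is a HUI when that total meets a user-chosen minimum-utility threshold. A toy sketch (invented items and profits, not any specific algorithm from the survey):

```r
# Utility of an itemset: sum over transactions containing it of quantity * unit profit
# (toy data; an itemset is "high utility" if this total meets a chosen minutil threshold)
profits <- c(bread = 1, milk = 2, wine = 9)          # unit profits per item
transactions <- list(
  t1 = c(bread = 3, milk = 1),                        # purchase quantities
  t2 = c(bread = 1, wine = 2),
  t3 = c(milk = 4, wine = 1)
)

itemset_utility <- function(items, txns, prof) {
  sum(sapply(txns, function(q) {
    if (all(items %in% names(q))) sum(q[items] * prof[items]) else 0
  }))
}

itemset_utility(c("bread", "milk"), transactions, profits)  # 3*1 + 1*2 = 5
itemset_utility("wine", transactions, profits)              # 2*9 + 1*9 = 27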


Design Patterns for Deep Learning Architectures?

Deep learning architecture can be described as a new method or style of building machine learning systems. Deep learning is more than likely to lead to more advanced forms of artificial intelligence; the evidence for this is the sheer number of breakthroughs that have occurred since the beginning of this decade. There is a newfound optimism in the air, and we are once again in an AI spring. Unfortunately, the current state of deep learning appears in too many ways to be akin to alchemy: everybody seems to have their own black-magic methods of designing architectures. The field needs to move forward and strive towards chemistry, or perhaps even a periodic table for deep learning. Although deep learning is still in its infancy, this book strives towards some kind of unification of its ideas. It leverages a method of description called pattern languages. Pattern languages are derived from entities called patterns that, when combined, form solutions to complex problems. Each pattern describes a problem and offers alternative solutions. Pattern languages are a way of expressing complex solutions derived from experience. The benefit of an improved language of expression is that practitioners gain a much better understanding of the subject as well as a better way of expressing solutions to problems.


The Bayesian Approach to Sample Size Calculations

During a clinical trial, we want to make inferences about the value of some endpoint of interest, which in this article we will call θ. For these inferences to be meaningful, we need to make sure that we study enough subjects so that the estimate of the effect size is sufficiently precise. On the other hand, we do not want too many subjects, because it would be unethical to expose them to the possibly harmful effects of the treatment, or to the risk of not receiving the standard of care.
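
One common Bayesian way to frame this trade-off (sketched below with made-up numbers, not taken from the article) is to choose the smallest n for which the expected width of the posterior credible interval for θ falls below a target, averaging over a prior for the true value:

```r
# Pick the smallest n whose average 95% credible-interval width for a response
# rate falls below a target, simulating outcomes from a Beta prior (toy sketch)
set.seed(1)
prior_a <- 2; prior_b <- 2          # Beta prior on the true response rate theta
target_width <- 0.20

avg_ci_width <- function(n, sims = 2000) {
  theta <- rbeta(sims, prior_a, prior_b)             # plausible true rates under the prior
  x <- rbinom(sims, size = n, prob = theta)          # simulated trial outcomes
  width <- qbeta(0.975, prior_a + x, prior_b + n - x) -
           qbeta(0.025, prior_a + x, prior_b + n - x)
  mean(width)
}

for (n in seq(20, 200, by = 20)) {
  w <- avg_ci_width(n)
  cat(sprintf("n = %3d  mean CI width = %.3f\n", n, w))
  if (w < target_width) break
}
```
Here the endpoint is a simple response rate with a Beta prior; the same simulate-and-check logic carries over to other endpoints and other Bayesian criteria.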


Is Learning Rate Useful in Artificial Neural Networks?

This article will help you understand why we need the learning rate and whether or not it is useful for training an artificial neural network. Using very simple Python code for a single-layer perceptron, we will change the learning rate value to get a feel for what it does. The learning rate is a common obstacle for newcomers to artificial neural networks, and I have been asked many times about its effect on the training of artificial neural networks (ANNs). Why do we use a learning rate? What is the best value for it? In this article, I will try to make things simpler by providing an example that shows how the learning rate helps in training an ANN. I will start by explaining the example with Python code before working with the learning rate.
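
The article itself walks through Python; purely for orientation, here is a minimal single-layer perceptron on toy AND-gate data (in R, with invented values) where the learning rate lr simply scales how far each misclassified example moves the weights:

```r
# Single-layer perceptron on toy AND-gate data; lr scales each weight update
X <- matrix(c(0,0, 0,1, 1,0, 1,1), ncol = 2, byrow = TRUE)
y <- c(0, 0, 0, 1)
lr <- 0.1                                 # try other values and compare
w <- c(0, 0); b <- 0

for (epoch in 1:50) {
  for (i in 1:nrow(X)) {
    pred <- as.numeric(sum(w * X[i, ]) + b > 0)   # step activation
    err  <- y[i] - pred
    w    <- w + lr * err * X[i, ]                 # larger lr = bigger corrections
    b    <- b + lr * err
  }
}
as.numeric(X %*% w + b > 0)    # should classify the AND gate correctly: 0 0 0 1
```
Changing lr changes the size of each correction, which is exactly the knob the article examines.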


Generalized additive models with principal component analysis: an application to time series of respiratory disease and air pollution data

Environmental epidemiological studies of the health effects of air pollution frequently utilize the generalized additive model (GAM) as the standard statistical methodology, considering the ambient air pollutants as explanatory covariates. Although exposure to air pollutants is multi-dimensional, the majority of these studies consider only a single pollutant as a covariate in the GAM. This restriction may be because the pollutant variables exhibit not only serial dependence but also interdependence among themselves. In an attempt to provide a more realistic model, we propose the hybrid generalized additive model-principal component analysis-vector auto-regressive (GAM-PCA-VAR) model, which combines PCA and GAMs with a VAR process. The PCA is used to eliminate the multicollinearity between the pollutants, whereas the VAR model is used to handle the serial correlation of the data and to produce white noise processes as covariates in the GAM. Some theoretical and simulation results of the proposed methodology are discussed, with special attention to the effect of time correlation of the covariates on the PCA and, consequently, on the estimates of the parameters in the GAM and on the relative risk, a commonly used statistical quantity for measuring the effect of the covariates, especially the pollutants, on population health. As the main motivation for the methodology, a real data set is analysed with the aim of quantifying the association between respiratory disease and air pollution concentrations, especially particulate matter PM10, sulphur dioxide, nitrogen dioxide, carbon monoxide and ozone. The empirical results show that the GAM-PCA-VAR model can remove the auto-correlation from the principal components. In addition, the method produces estimates of the relative risk, for each pollutant, which are not affected by the serial correlation in the data. This generally leads to more pronounced values of the estimated risk than the standard GAM, indicating, for this study, an increase of almost 5.4% in the risk associated with PM10, one of the pollutants most often linked to adverse effects on human health.
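
Purely to illustrate the shape of such a pipeline (simulated data, with mgcv and vars as stand-ins, not the authors' implementation): PCA on the pollutant matrix, a VAR on the leading components to extract approximately white-noise residuals, and a Poisson GAM of daily counts on those residuals.

```r
# Rough shape of a GAM-PCA-VAR pipeline on simulated data (not the authors' code)
library(vars)   # VAR() for the time-series filtering step
library(mgcv)   # gam() with a Poisson family for the health-outcome model

set.seed(7)
n <- 365
pollutants <- matrix(rnorm(n * 5), ncol = 5,
                     dimnames = list(NULL, c("PM10", "SO2", "NO2", "CO", "O3")))
counts <- rpois(n, lambda = exp(2 + 0.1 * pollutants[, "PM10"]))  # daily disease counts

# Step 1: PCA to remove multicollinearity among the pollutants
pcs <- prcomp(pollutants, scale. = TRUE)$x[, 1:3]

# Step 2: VAR(1) on the principal components; its residuals are approximately white noise
var_fit <- VAR(as.data.frame(pcs), p = 1)
wn <- residuals(var_fit)                  # one fewer row than the input (lag 1)

# Step 3: Poisson GAM of the counts on the whitened components
dat <- data.frame(y = counts[-1], wn1 = wn[, 1], wn2 = wn[, 2], wn3 = wn[, 3])
gam_fit <- gam(y ~ s(wn1) + s(wn2) + s(wn3), family = poisson(), data = dat)
summary(gam_fit)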


Multiclass vector auto-regressive models for multistore sales data

Retailers use the vector auto-regressive (VAR) model as a standard tool to estimate the effects of prices, promotions and sales in one product category on the sales of another product category. Furthermore, these price, promotion and sales data are available not just for one store but for a whole chain of stores. We propose to study cross-category effects using a multiclass VAR model: we jointly estimate cross-category effects for several distinct but related VAR models, one for each store. Our methodology encourages effects to be similar across stores, while still allowing for small differences between stores to account for store heterogeneity. Moreover, our estimator is sparse: unimportant effects are estimated as exactly zero, which facilitates the interpretation of the results. A simulation study shows that the proposed multiclass estimator improves estimation accuracy by borrowing strength across classes. Finally, we provide three visual tools: clusterings of stores with similar cross-category effects, networks of product categories, and similarity matrices of shared cross-category effects across stores.
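
For context only (the paper's joint, sparse multiclass estimator is not reproduced here), a naive per-store VAR on simulated category sales shows where cross-category effects live: in the coefficients on other categories' lags within each sales equation.

```r
# Naive baseline: a separate VAR(1) per store on simulated weekly category sales
# (the paper's joint, sparse multiclass estimator is not reproduced here)
library(vars)
set.seed(3)
simulate_store <- function(weeks = 104) {
  data.frame(bakery = rnorm(weeks), dairy = rnorm(weeks), drinks = rnorm(weeks))
}
stores <- list(store1 = simulate_store(), store2 = simulate_store())

fits <- lapply(stores, function(d) VAR(d, p = 1))

# Cross-category effects on bakery sales: coefficients of lagged dairy and drinks
# (close to zero here, since the toy data are independent noise)
coef(fits$store1$varresult$bakery)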


P-values from random effects linear regression models

lme4::lmer is a useful frequentist approach to hierarchical/multilevel linear regression modelling. For good reason, the model output only includes t-values and doesn’t include p-values (partly due to the difficulty in estimating the degrees of freedom, as discussed here). Yes, p-values are evil and we should continue to try and expunge them from our analyses. But I keep getting asked about this. So here is a simple bootstrap method to generate two-sided parametric p-values on the fixed effects coefficients. Interpret with caution.
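
One standard recipe of this kind, shown here on lme4's built-in sleepstudy data (not necessarily the exact code from the post), parametrically resamples the fixed effects with bootMer() and doubles the tail proportion falling on the far side of zero:

```r
# Parametric bootstrap p-values for lmer fixed effects (one common recipe)
library(lme4)
fit <- lmer(Reaction ~ Days + (Days | Subject), data = sleepstudy)

set.seed(123)
boot <- bootMer(fit, FUN = fixef, nsim = 500)   # refits the model to simulated responses

# Two-sided p-value: twice the smaller tail of the bootstrap distribution around zero
p_values <- apply(boot$t, 2, function(b)
  min(1, 2 * min(mean(b <= 0, na.rm = TRUE), mean(b >= 0, na.rm = TRUE))))
cbind(estimate = fixef(fit), p_boot = p_values)
```
With nsim = 500 the resolution is limited (a reported zero really means p < 1/nsim), which is one more reason to interpret with caution, as the post says.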


Setting up RStudio Server quickly on Amazon EC2

I have recently been working on projects using Amazon EC2 (Elastic Compute Cloud) and RStudio Server, and I thought I would share some of my working notes. Amazon EC2 supplies near-instant access to on-demand, disposable computing in a variety of sizes (billed by the hour). RStudio Server supplies an interactive user interface to your remote R environment that is nearly indistinguishable from a local RStudio console. The idea: for a few dollars you can work interactively on R tasks requiring hundreds of GB of memory and tens of CPUs and GPUs. If you are already an Amazon EC2 user with some Unix experience, it is very easy to stand up a powerful R environment quickly, which is what I will demonstrate in this note.


Fitting a TensorFlow Linear Classifier with tfestimators

In a recent post, I mentioned three avenues for working with TensorFlow from R:
• The keras package, which uses the Keras API for building scalable deep learning models
• The tfestimators package, which wraps Google’s Estimators API for fitting models with pre-built estimators
• The tensorflow package, which provides an interface to Google’s low-level TensorFlow API
In this post, Edgar and I use the linear_classifier() function, one of six pre-built models currently in the tfestimators package, to train a linear classifier using data from the titanic package.
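
As a rough sketch of what that looks like (written from the tfestimators documentation rather than the post itself, so argument names and details may differ from the post's code):

```r
# Minimal sketch of a tfestimators linear classifier on the titanic data
# (based on the package documentation; the post's actual code may differ)
library(tfestimators)
library(titanic)

df <- na.omit(titanic_train[, c("Survived", "Age", "Sex")])

cols <- feature_columns(
  column_numeric("Age"),
  column_categorical_with_vocabulary_list("Sex", vocabulary_list = c("male", "female"))
)

model <- linear_classifier(feature_columns = cols)
train(model, input_fn(df, features = c("Age", "Sex"), response = "Survived"))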