If you did not already know

Residual Gated Graph ConvNet google
Graph-structured data such as functional brain networks, social networks, gene regulatory networks, and communication networks have spurred interest in generalizing neural networks to graph domains. In this paper, we are interested in designing efficient neural network architectures for graphs of variable length. Several existing works, such as Scarselli et al. (2009) and Li et al. (2016), have focused on recurrent neural networks (RNNs) to solve this task. A different approach was recently proposed in Sukhbaatar et al. (2016), where a vanilla graph convolutional neural network (ConvNet) was introduced. We believe the latter approach to be a better paradigm for graph learning problems because ConvNets are better suited to deep architectures than RNNs. For this reason, we propose the most generic class of residual multi-layer graph ConvNets that makes use of an edge gating mechanism, as proposed in Marcheggiani & Titov (2017). Gated edges appear to be a natural property in the context of graph learning tasks, as the system can learn which edges are or are not important for the task at hand. We apply several graph neural models to two basic network science tasks: subgraph matching and semi-supervised clustering for graphs of variable length. Numerical results show the performance of the new model. …
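The edge-gating idea can be sketched in a few lines. Below is a minimal numpy illustration of one common form of gated neighbourhood aggregation, h_i' = ReLU(U h_i + sum_j eta_ij * (V h_j)) with gates eta_ij = sigmoid(A h_i + B h_j); the matrix names, shapes, and toy graph are illustrative, not the paper's exact configuration:

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_graph_conv(H, adj, U, V, A, B):
    """One gated graph-conv layer: h_i' = ReLU(U h_i + sum_j eta_ij * (V h_j)),
    with per-feature edge gates eta_ij = sigmoid(A h_i + B h_j)."""
    n, d = H.shape
    out = H @ U.T                                    # self term U h_i
    for i in range(n):
        for j in range(n):
            if adj[i, j]:
                eta = sigmoid(A @ H[i] + B @ H[j])   # learned edge gate
                out[i] += eta * (V @ H[j])           # gated neighbour message
    return np.maximum(out, 0.0)                      # ReLU

# toy graph: 3 nodes in a chain 0-1-2
rng = np.random.default_rng(0)
H = rng.normal(size=(3, 4))
adj = np.array([[0, 1, 0], [1, 0, 1], [0, 1, 0]])
U = np.eye(4); V = np.eye(4)
A = rng.normal(size=(4, 4)) * 0.1
B = rng.normal(size=(4, 4)) * 0.1
Hn = gated_graph_conv(H, adj, U, V, A, B)
print(Hn.shape)  # (3, 4)
```

Because the gate is a function of both endpoint features, the layer can suppress messages along edges that are irrelevant to the task, which is the property the abstract highlights.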

Gated Linear Network google
This paper describes a family of probabilistic architectures designed for online learning under the logarithmic loss. Rather than relying on non-linear transfer functions, our method gains representational power through data conditioning. We state, under general conditions, a learnable capacity theorem showing that this approach can in principle learn any bounded Borel-measurable function on a compact subset of Euclidean space; the result is stronger than many universality results for connectionist architectures because we provide both the model and the learning procedure for which convergence is guaranteed. …

Spatially Compact Semantic Scan (SCSS) google
Many methods have been proposed for detecting emerging events in text streams using topic modeling. However, these methods have shortcomings that make them unsuitable for rapid detection of locally emerging events on massive text streams. We describe Spatially Compact Semantic Scan (SCSS), which has been developed specifically to overcome the shortcomings of current methods in detecting new spatially compact events in text streams. SCSS employs alternating optimization between using semantic scan to estimate contrastive foreground topics in documents and discovering spatial neighborhoods with a high occurrence of documents containing the foreground topics. We evaluate our method on the Emergency Department chief complaints (ED) dataset to verify its effectiveness in detecting real-world disease outbreaks from free-text ED chief complaint data. …


R Packages worth a look

Pena-Yohai Initial Estimator for Robust S-Regression (pyinit)
Deterministic Pena-Yohai initial estimator for robust S estimators of regression. The procedure is described in detail in Pena, D., & Yohai, V. (1999) <doi:10.2307/2670164>.

Measuring Disparity (dispRity)
A modular package for measuring disparity from multidimensional matrices. Disparity can be calculated from any matrix defining a multidimensional space. The package provides a set of implemented metrics to measure properties of the space and allows users to provide and test their own metrics. The package also provides functions for looking at disparity serially (e.g. disparity through time) or per group, as well as for visualising the results. Finally, this package provides several basic statistical tests for disparity analysis.

Functional Concurrent Regression for Sparse Data (fcr)
Dynamic prediction in functional concurrent regression with an application to child growth. Extends the pffr() function from the ‘refund’ package to handle the scenario where the functional response and concurrently measured functional predictor are irregularly measured. Leroux et al. (2017), Statistics in Medicine, <doi:10.1002/sim.7582>.

Age-Structured Population Dynamics Model (albopictus)
Implements discrete time deterministic and stochastic age-structured population dynamics models described in Erguler and others (2016) <doi:10.1371/journal.pone.0149282> and Erguler and others (2017) <doi:10.1371/journal.pone.0174293>.

Compare Big Datasets to the Uniform Distribution (ggQQunif)
A quantile-quantile plot can be used to compare a sample of p-values to the uniform distribution. But when the dataset is big (e.g. > 1e4 p-values), drawing the quantile-quantile plot can be slow. geom_QQ uses all the data to calculate the quantiles, but thins the points out before plotting in a way that keeps the detail near zero, which speeds up plotting and decreases file size when vector graphics are stored.
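The thinning idea (compute quantiles from all the data, but subsample the plotted points on a scale that stays dense near the smallest p-values) can be sketched outside R as well. The Python fragment below illustrates the principle only; it is not the package's actual algorithm, and the log-spaced rank selection is an assumption:

```python
import numpy as np

def thin_qq_uniform(pvals, keep=1000):
    """Compute QQ-vs-uniform points from ALL p-values, then keep roughly
    `keep` of them, log-spaced in rank so the small-p region stays dense."""
    p = np.sort(np.asarray(pvals, float))
    n = len(p)
    expected = (np.arange(n) + 0.5) / n               # uniform quantiles
    # log-spaced ranks: dense near rank 0 (smallest p-values), sparse elsewhere
    idx = np.unique(np.geomspace(1, n, num=keep).astype(int) - 1)
    return expected[idx], p[idx]

rng = np.random.default_rng(1)
x, y = thin_qq_uniform(rng.uniform(size=100_000), keep=500)
print(len(x))  # far fewer than 100000 points to draw
```

Plotting the thinned (x, y) pairs on -log10 axes gives a visually identical QQ plot with orders of magnitude fewer vector-graphics elements.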

If you did not already know

Riemann-Theta Boltzmann Machine google
A general Boltzmann machine with continuous visible and discrete integer valued hidden states is introduced. Under mild assumptions about the connection matrices, the probability density function of the visible units can be solved for analytically, yielding a novel parametric density function involving a ratio of Riemann-Theta functions. The conditional expectation of a hidden state for given visible states can also be calculated analytically, yielding a derivative of the logarithmic Riemann-Theta function. The conditional expectation can be used as activation function in a feedforward neural network, thereby increasing the modelling capacity of the network. Both the Boltzmann machine and the derived feedforward neural network can be successfully trained via standard gradient- and non-gradient-based optimization techniques. …

Hierarchical Compositional Network (HCN) google
We introduce the hierarchical compositional network (HCN), a directed generative model able to discover and disentangle, without supervision, the building blocks of a set of binary images. The building blocks are binary features defined hierarchically as a composition of some of the features in the layer immediately below, arranged in a particular manner. At a high level, HCN is similar to a sigmoid belief network with pooling. Inference and learning in HCN are very challenging and existing variational approximations do not work satisfactorily. A main contribution of this work is to show that both can be addressed using max-product message passing (MPMP) with a particular schedule (no EM required). Also, using MPMP as an inference engine for HCN makes new tasks simple: adding supervision information, classifying images, or performing inpainting all correspond to clamping some variables of the model to their known values and running MPMP on the rest. When used for classification, fast inference with HCN has exactly the same functional form as a convolutional neural network (CNN) with linear activations and binary weights. However, HCN’s features are qualitatively very different. …

Covariance Matrix Adaptation Evolution Strategy (CMA-ES) google
CMA-ES stands for Covariance Matrix Adaptation Evolution Strategy. Evolution strategies (ES) are stochastic, derivative-free methods for numerical optimization of non-linear or non-convex continuous optimization problems. They belong to the class of evolutionary algorithms and evolutionary computation. An evolutionary algorithm is broadly based on the principle of biological evolution, namely the repeated interplay of variation (via recombination and mutation) and selection: in each generation (iteration) new individuals (candidate solutions, denoted as x) are generated by variation, usually in a stochastic way, of the current parental individuals. Then, some individuals are selected to become the parents in the next generation based on their fitness or objective function value f(x). In this way, over the generation sequence, individuals with better and better f-values are generated. In an evolution strategy, new candidate solutions are sampled according to a multivariate normal distribution in R^n. Recombination amounts to selecting a new mean value for the distribution. Mutation amounts to adding a random vector, a perturbation with zero mean. Pairwise dependencies between the variables in the distribution are represented by a covariance matrix. The covariance matrix adaptation (CMA) is a method to update the covariance matrix of this distribution. This is particularly useful if the function f is ill-conditioned. Adaptation of the covariance matrix amounts to learning a second-order model of the underlying objective function, similar to the approximation of the inverse Hessian matrix in quasi-Newton methods in classical optimization. In contrast to most classical methods, fewer assumptions on the nature of the underlying objective function are made. Only the ranking between candidate solutions is exploited for learning the sample distribution, and neither derivatives nor even the function values themselves are required by the method. …
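The sample / rank / recombine / adapt loop described above can be condensed into a simplified sketch. The version below keeps the multivariate-normal sampling, weighted recombination of the mean, and a rank-mu covariance update, but omits the evolution paths and step-size control of full CMA-ES; the learning rate `c_mu` and population settings are illustrative choices, not the canonical defaults:

```python
import numpy as np

def simple_cma_es(f, x0, sigma=0.5, lam=16, iters=60, seed=0):
    """Simplified CMA-ES sketch: sample lam candidates from N(m, sigma^2 C),
    rank them by f (only rankings are used), recombine the best mu into a
    new mean, and adapt C with a rank-mu update."""
    rng = np.random.default_rng(seed)
    n = len(x0)
    m = np.array(x0, float)
    C = np.eye(n)
    mu = lam // 2
    w = np.log(mu + 0.5) - np.log(np.arange(1, mu + 1))
    w /= w.sum()                                   # recombination weights
    c_mu = 0.3                                     # learning rate (illustrative)
    for _ in range(iters):
        A = np.linalg.cholesky(C)
        z = rng.standard_normal((lam, n))
        X = m + sigma * z @ A.T                    # candidate solutions
        order = np.argsort([f(x) for x in X])      # rank by objective value
        sel = X[order[:mu]]                        # best mu candidates
        y = (sel - m) / sigma                      # selected steps
        m = m + sigma * (w @ y)                    # move mean toward winners
        C = (1 - c_mu) * C + c_mu * (y.T * w) @ y  # rank-mu covariance update
    return m

# minimize a shifted quadratic; note f is only ever used through rankings
m = simple_cma_es(lambda x: np.sum((x - 3.0) ** 2), x0=[0.0, 0.0])
print(np.round(m, 2))
```

On this toy problem the mean converges to the optimum near (3, 3); the covariance first elongates along the direction of progress and then contracts, which is the second-order adaptation the description refers to.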

What’s new on arXiv

Learning non-Gaussian Time Series using the Box-Cox Gaussian Process

Gaussian processes (GPs) are Bayesian nonparametric generative models that provide interpretability of hyperparameters, admit closed-form expressions for training and inference, and are able to accurately represent uncertainty. To model general non-Gaussian data with complex correlation structure, GPs can be paired with an expressive covariance kernel and then fed into a nonlinear transformation (or warping). However, overparametrising the kernel and the warping is known to, respectively, hinder gradient-based training and make the predictions computationally expensive. We remedy these issues by (i) training the model using derivative-free global-optimisation techniques so as to find meaningful maxima of the model likelihood, and (ii) proposing a warping function based on the celebrated Box-Cox transformation that requires minimal numerical approximations, unlike existing warped GP models. We validate the proposed approach by first showing that predictions can be computed analytically, and then on learning, reconstruction and forecasting experiments using real-world datasets.
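For reference, here is the classical Box-Cox transformation and its closed-form inverse; the paper's warping is based on this transform (its exact parametrisation may differ), and the closed-form inverse is one reason a Box-Cox warping needs essentially no numerical approximation:

```python
import numpy as np

def box_cox(y, lam):
    """Box-Cox transform: (y^lam - 1)/lam for lam != 0, log(y) for lam == 0.
    In a warped GP, the latent Gaussian process models box_cox(y, lam)."""
    y = np.asarray(y, float)
    return np.log(y) if lam == 0 else (y ** lam - 1.0) / lam

def box_cox_inv(z, lam):
    """Closed-form inverse: no iterative solve is needed to map GP
    predictions back to the observation space."""
    z = np.asarray(z, float)
    return np.exp(z) if lam == 0 else (lam * z + 1.0) ** (1.0 / lam)

y = np.array([0.5, 1.0, 2.0, 8.0])
z = box_cox(y, lam=0.5)
print(np.allclose(box_cox_inv(z, 0.5), y))  # True
```

The parameter lam controls how strongly the transform compresses large values, which is what lets a Gaussian latent process fit skewed, positive-valued data.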

The Three Pillars of Machine-Based Programming

In this position paper, we describe our vision of the future of machine-based programming through a categorical examination of three pillars of research. Those pillars are: (i) intention, (ii) invention, and (iii) adaptation. Intention emphasizes advancements in the human-to-computer and computer-to-machine-learning interfaces. Invention emphasizes the creation or refinement of algorithms or core hardware and software building blocks through machine learning (ML). Adaptation emphasizes advances in the use of ML-based constructs to autonomously evolve software.

Enslaving the Algorithm: From a ‘Right to an Explanation’ to a ‘Right to Better Decisions’?

As concerns about unfairness and discrimination in ‘black box’ machine learning systems rise, a legal ‘right to an explanation’ has emerged as a compellingly attractive approach for challenge and redress. We outline recent debates on the limited provisions in European data protection law, and introduce and analyze newer explanation rights in French administrative law and the draft modernized Council of Europe Convention 108. While individual rights can be useful, in privacy law they have historically placed an unreasonable burden on the average data subject. ‘Meaningful information’ about algorithmic logics is more technically possible than commonly thought, but this exacerbates a new ‘transparency fallacy’: an illusion of remedy rather than anything substantively helpful. While rights-based approaches deserve a firm place in the toolbox, other forms of governance, such as impact assessments, ‘soft law,’ judicial review, and model repositories deserve more attention, alongside catalyzing agencies acting for users to control algorithmic system design.

Local Binary Pattern Networks

Memory- and computation-efficient deep learning architectures are crucial to the continued proliferation of machine learning capabilities to new platforms and systems. Binarization of operations in convolutional neural networks has shown promising results in reducing model size and improving computing efficiency. In this paper, we tackle the problem using a strategy different from the existing literature by proposing local binary pattern networks, or LBPNet, which can learn and perform binary operations in an end-to-end fashion. LBPNet uses local binary comparisons and random projection in place of conventional convolution (or approximation of convolution) operations. These operations can be implemented efficiently on different platforms, including direct hardware implementation. We applied LBPNet and its variants on standard benchmarks. The results are promising across benchmarks while providing an important means to improve memory and speed efficiency, which is particularly suited for small-footprint devices and hardware accelerators.
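For context, the classical local binary pattern operation that LBPNet builds on replaces multiply-accumulate with pure comparisons. The sketch below computes standard 3x3 LBP codes; LBPNet learns its comparison patterns and projections end-to-end, so this shows only the hand-crafted ancestor of the operation:

```python
import numpy as np

def lbp_codes(img):
    """Classical 3x3 local binary pattern: threshold each pixel's 8 neighbours
    against the centre and pack the comparison bits into an 8-bit code.
    Only comparisons and bit operations are used, hence the hardware appeal."""
    h, w = img.shape
    out = np.zeros((h - 2, w - 2), dtype=np.uint8)
    offsets = [(-1, -1), (-1, 0), (-1, 1), (0, 1),
               (1, 1), (1, 0), (1, -1), (0, -1)]
    for bit, (dy, dx) in enumerate(offsets):
        neigh = img[1 + dy:h - 1 + dy, 1 + dx:w - 1 + dx]
        out |= ((neigh >= img[1:h - 1, 1:w - 1]).astype(np.uint8) << bit)
    return out

img = np.arange(16, dtype=np.int32).reshape(4, 4)
codes = lbp_codes(img)
print(codes.shape)  # (2, 2)
```

Each output value is an 8-bit descriptor of local structure computed without a single multiplication, which is the property the paper exploits for small-footprint devices.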

DYAN: A Dynamical Atoms Network for Video Prediction

The ability to anticipate the future is essential when making real-time critical decisions, provides valuable information for understanding dynamic natural scenes, and can help unsupervised video representation learning. State-of-the-art video prediction is based on LSTM recursive networks and/or generative adversarial network learning. These are complex architectures that need to learn large numbers of parameters, are potentially hard to train, slow to run, and may produce blurry predictions. In this paper, we introduce DYAN, a novel network with very few parameters that is easy to train and produces accurate, high-quality frame predictions significantly faster than previous approaches. DYAN owes its good qualities to its encoder and decoder, which are designed following concepts from systems identification theory and exploit the dynamics-based invariants of the data. Extensive experiments on several standard video datasets show that DYAN is superior at generating frames and that it generalizes well across domains.

Closing the AI Knowledge Gap

AI researchers employ not only the scientific method, but also methodology from mathematics and engineering. However, the use of the scientific method – specifically hypothesis testing – in AI is typically conducted in service of engineering objectives. Growing interest in topics such as fairness and algorithmic bias shows that engineering-focused questions only comprise a subset of the important questions about AI systems. This results in the AI Knowledge Gap: the number of unique AI systems grows faster than the number of studies that characterize these systems’ behavior. To close this gap, we argue that the study of AI could benefit from the greater inclusion of researchers who are well positioned to formulate and test hypotheses about the behavior of AI systems. We examine the barriers preventing social and behavioral scientists from conducting such studies. Our diagnosis suggests that accelerating the scientific study of AI systems requires new incentives for academia and industry, mediated by new tools and institutions. To address these needs, we propose a two-sided marketplace called TuringBox. On one side, AI contributors upload existing and novel algorithms to be studied scientifically by others. On the other side, AI examiners develop and post machine intelligence tasks designed to evaluate and characterize algorithmic behavior. We discuss this market’s potential to democratize the scientific study of AI behavior, and thus narrow the AI Knowledge Gap.

GaAN: Gated Attention Networks for Learning on Large and Spatiotemporal Graphs

We propose a new network architecture, Gated Attention Networks (GaAN), for learning on graphs. Unlike the traditional multi-head attention mechanism, which weights all attention heads equally, GaAN uses a convolutional sub-network to control each attention head’s importance. We demonstrate the effectiveness of GaAN on the inductive node classification problem. Moreover, with GaAN as a building block, we construct the Graph Gated Recurrent Unit (GGRU) to address the traffic speed forecasting problem. Extensive experiments on three real-world datasets show that our GaAN framework achieves state-of-the-art results on both tasks.
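The head-gating idea can be illustrated in isolation: run K attention heads over a node's neighbours, then scale each head's output by a scalar gate before combining. In GaAN the gates come from a small convolutional sub-network; in this sketch they are simply passed in, and all weight shapes are illustrative:

```python
import numpy as np

def softmax(x, axis=-1):
    e = np.exp(x - x.max(axis=axis, keepdims=True))
    return e / e.sum(axis=axis, keepdims=True)

def gated_multihead_aggregate(h_center, h_neigh, Wq, Wk, Wv, gates):
    """Aggregate neighbour features with K attention heads, scaling each
    head's output by a gate in [0, 1] so unhelpful heads can be suppressed."""
    K = len(Wq)
    out = []
    for k in range(K):
        q = Wq[k] @ h_center                 # query for this head
        keys = h_neigh @ Wk[k].T             # (n_neigh, d_k) keys
        attn = softmax(keys @ q)             # attention over neighbours
        out.append(gates[k] * (attn @ (h_neigh @ Wv[k].T)))
    return np.concatenate(out)

rng = np.random.default_rng(4)
d, dk, K, n = 6, 3, 2, 5
h_c = rng.normal(size=d)
h_n = rng.normal(size=(n, d))
Wq = rng.normal(size=(K, dk, d))
Wk = rng.normal(size=(K, dk, d))
Wv = rng.normal(size=(K, dk, d))
out = gated_multihead_aggregate(h_c, h_n, Wq, Wk, Wv, gates=[1.0, 0.2])
print(out.shape)  # (6,)
```

Setting a gate near zero effectively removes that head for the node in question, which is the extra degree of freedom GaAN adds over plain multi-head attention.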

Natural Gradient Deep Q-learning

This paper presents findings for training a Q-learning reinforcement learning agent using natural gradient techniques. We compare the original deep Q-network (DQN) algorithm to its natural gradient counterpart (NGDQN), measuring NGDQN and DQN performance on classic control environments without target networks. We find that NGDQN performs favorably relative to DQN, converging to significantly better policies faster and more frequently. These results indicate that natural gradient methods could be used for value function optimization in reinforcement learning to accelerate and stabilize training.

Data Distillery: Effective Dimension Estimation via Penalized Probabilistic PCA

The paper tackles the unsupervised estimation of the effective dimension of a sample of dependent random vectors. The proposed method uses the principal components (PC) decomposition of sample covariance to establish a low-rank approximation that helps uncover the hidden structure. The number of PCs to be included in the decomposition is determined via a Probabilistic Principal Components Analysis (PPCA) embedded in a penalized profile likelihood criterion. The choice of penalty parameter is guided by a data-driven procedure that is justified via analytical derivations and extensive finite sample simulations. Application of the proposed penalized PPCA is illustrated with three gene expression datasets in which the number of cancer subtypes is estimated from all expression measurements. The analyses point towards hidden structures in the data, e.g. additional subgroups, that could be of scientific interest.
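The rank-selection step can be sketched from the sample-covariance eigenvalues: for each candidate rank k, evaluate the PPCA profile log-likelihood (signal eigenvalues kept, the rest averaged into an isotropic noise variance) and subtract a complexity penalty. The paper chooses its penalty by a data-driven procedure; the BIC-style default below is only an illustrative stand-in:

```python
import numpy as np

def ppca_rank(eigvals, n, penalty=None):
    """Choose the number of PCs by maximising the PPCA profile log-likelihood
    of the sample-covariance eigenvalues minus a complexity penalty."""
    ev = np.sort(np.asarray(eigvals, float))[::-1]   # descending eigenvalues
    d = len(ev)
    if penalty is None:
        penalty = np.log(n)                          # BIC-style (assumption)
    best_k, best_score = 1, -np.inf
    for k in range(1, d):
        sigma2 = ev[k:].mean()                       # isotropic noise estimate
        ll = -0.5 * n * (np.sum(np.log(ev[:k])) + (d - k) * np.log(sigma2))
        n_par = d * k - k * (k - 1) / 2 + 1          # PPCA free parameters
        score = ll - 0.5 * penalty * n_par
        if score > best_score:
            best_k, best_score = k, score
    return best_k

# two strong directions on top of near-unit noise eigenvalues
ev = [40.0, 25.0, 1.05, 1.02, 1.0, 1.0, 0.98, 0.95]
print(ppca_rank(ev, n=500))  # 2
```

Adding a third component barely improves the likelihood here but costs extra parameters, so the penalized criterion stops at the true effective dimension.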

Meta Reinforcement Learning with Latent Variable Gaussian Processes

Data efficiency, i.e., learning from small data sets, is critical in many practical applications where data collection is time consuming or expensive, e.g., robotics, animal experiments or drug design. Meta learning is one way to increase the data efficiency of learning algorithms by generalizing learned concepts from a set of training tasks to unseen, but related, tasks. Often, this relationship between tasks is hard coded or relies in some other way on human expertise. In this paper, we propose to automatically learn the relationship between tasks using a latent variable model. Our approach finds a variational posterior over tasks and averages over all plausible (according to this posterior) tasks when making predictions. We apply this framework within a model-based reinforcement learning setting for learning dynamics models and controllers of many related tasks, and show that our model effectively generalizes to novel tasks and reduces the average interaction time needed to solve tasks by up to 60% compared to strong baselines.

The Leave-one-out Approach for Matrix Completion: Primal and Dual Analysis

In this paper, we introduce a powerful technique, Leave-One-Out, to the analysis of low-rank matrix completion problems. Using this technique, we develop a general approach for obtaining fine-grained, entry-wise bounds on iterative stochastic procedures. We demonstrate the power of this approach in analyzing two of the most important algorithms for matrix completion: the non-convex approach based on Singular Value Projection (SVP), and the convex relaxation approach based on nuclear norm minimization (NNM). In particular, we prove for the first time that the original form of SVP, without re-sampling or sample splitting, converges linearly in the infinity norm. We further apply our leave-one-out approach to an iterative procedure that arises in the analysis of the dual solutions of NNM. Our results show that NNM recovers the true $d \times d$ rank-$r$ matrix with $\mathcal{O}(\mu^2 r^3 d \log d)$ observed entries, which has optimal dependence on the dimension and is independent of the condition number of the matrix. To the best of our knowledge, this is the first sample complexity result for a tractable matrix completion algorithm that satisfies these two properties simultaneously.
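The SVP iteration analyzed above is short enough to write out: take a gradient step on the observed entries, then project back onto the rank-r matrices by truncated SVD. The sketch below uses the standard 1/p rescaling of the sparse gradient; the step size and iteration count are illustrative choices, and the example is a small well-conditioned instance, not a reproduction of the paper's setting:

```python
import numpy as np

def svp_complete(M_obs, mask, rank, iters=300):
    """Singular Value Projection for matrix completion: gradient step on the
    observed entries, then projection onto rank-r matrices via truncated SVD."""
    X = np.zeros_like(M_obs)
    scale = mask.size / mask.sum()       # 1/p rescaling of the sparse gradient
    for _ in range(iters):
        G = mask * (M_obs - X)           # residual on observed entries only
        U, s, Vt = np.linalg.svd(X + scale * G, full_matrices=False)
        X = (U[:, :rank] * s[:rank]) @ Vt[:rank]   # best rank-r approximation
    return X

# rank-2 ground truth, roughly 60% of entries observed
rng = np.random.default_rng(3)
M = rng.normal(size=(30, 2)) @ rng.normal(size=(2, 30))
mask = rng.uniform(size=M.shape) < 0.6
X = svp_complete(mask * M, mask, rank=2)
print(round(np.linalg.norm(X - M) / np.linalg.norm(M), 6))
```

On this instance the relative error decays geometrically, consistent with the linear-convergence behaviour the paper establishes (there, in the stronger infinity norm).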

$\tilde{O}(n^{1/3})$-Space Algorithm for the Grid Graph Reachability Problem
VGAN-Based Image Representation Learning for Privacy-Preserving Facial Expression Recognition
Divisors on matroids and their volumes
Computational performance of a projection and rescaling algorithm
Zero-Shot Detection
Slipknotting in Random Diagrams
Continuous Time Multi-stage Stochastic Reserve and Unit Commitment
Learning to Generate Wikipedia Summaries for Underserved Languages from Wikidata
Fundamentals of Wireless Information and Power Transfer: From RF Energy Harvester Models to Signal and System Designs
Impulsive Control for G-AIMD Dynamics with Relaxed and Hard Constraints
Automated Curriculum Learning by Rewarding Temporally Rare Events
Dynamic Natural Language Processing with Recurrence Quantification Analysis
English-Catalan Neural Machine Translation in the Biomedical Domain through the cascade approach
Visual Psychophysics for Making Face Recognition Algorithms More Explainable
Communication reduction in distributed optimization via estimation of the proximal operator
Supercongruences for polynomial analogs of the Apéry numbers
Exploring the predictability of range-based volatility estimators using RNNs
Lines in metric spaces: universal lines counted with multiplicity
Adversarial Generalized Method of Moments
Blaming humans in autonomous vehicle accidents: Shared responsibility across levels of automation
Beyond Homophily: Incorporating Actor Variables in Actor-oriented Network Models
Solving Quadratic Programs to High Precision using Scaled Iterative Refinement
Attention-based Temporal Weighted Convolutional Neural Network for Action Recognition
Probabilistic Occupancy Function and Sets Using Forward Stochastic Reachability for Rigid-Body Dynamic Obstacles
Partially ordering the class of invertible trees
Adaptive Smoothing V-Spline for Trajectory Reconstruction
Unveiling the invisible – mathematical methods for restoring and interpreting illuminated manuscripts
A Minimalist Approach to Type-Agnostic Detection of Quadrics in Point Clouds
Diagnostic Classification Of Lung Nodules Using 3D Neural Networks
Adaptive Polar Active Contour for Segmentation and Tracking in Ultrasound Videos
Eleven Simple Algorithms to Compute Fibonacci Numbers
Training Recurrent Neural Networks as a Constraint Satisfaction Problem
Why not be Versatile? Applications of the SGNMT Decoder for Machine Translation
Real-time Burst Photo Selection Using a Light-Head Adversarial Network
A Temporally-Aware Interpolation Network for Video Frame Inpainting
Monte Carlo Information Geometry: The dually flat case
Learning the Hierarchical Parts of Objects by Deep Non-Smooth Nonnegative Matrix Factorization
Hierarchical Metric Learning and Matching for 2D and 3D Geometric Correspondences
SlideNet: Fast and Accurate Slide Quality Assessment Based on Deep Neural Networks
Energy-Efficient Joint Offloading and Wireless Resource Allocation Strategy in Multi-MEC Server Systems
Variance Reduction for Policy Gradient with Action-Dependent Factorized Baselines
Sparse Reduced Rank Regression With Nonconvex Regularization
Split graphs: combinatorial species and asymptotics
3D Point Cloud Denoising using Graph Laplacian Regularization of a Low Dimensional Manifold Model
Transferring Rich Deep Features for Facial Beauty Prediction
Learning Dynamic Memory Networks for Object Tracking
Optimal Control and Stabilization Problem for Discrete-time Markov Jump Systems with Indefinite Weight Costs
eSCAPE: a Large-scale Synthetic Corpus for Automatic Post-Editing
Fair Deep Learning Prediction for Healthcare Applications with Confounder Filtering
Text Detection and Recognition in images: A survey
Offset Hypersurfaces and Persistent Homology of Algebraic Varieties
Face Recognition Techniques: A Survey
Flex-Convolution (Deep Learning Beyond Grid-Worlds)
A New State-Space Representation of Lyapunov Stability for Coupled PDEs and Scalable Stability Analysis in the SOS Framework
Unsupervised Cross-dataset Person Re-identification by Transfer Learning of Spatial-Temporal Patterns
Expressivity in TTS from Semantics and Pragmatics
Polarization and Index Modulations: a Theoretical and Practical Perspective
Risk and parameter convergence of logistic regression
Segmentation of histological images and fibrosis identification with a convolutional neural network
Cluster-based Wireless Energy Transfer for Low Complex Energy Receivers
Capacity Analysis of Index Modulations over Spatial, Polarization and Frequency Dimensions
Information content of coevolutionary game landscapes
The CTTC 5G end-to-end experimental platform: Integrating heterogeneous wireless/optical networks, distributed cloud, and IoT devices
Dual Polarized Modulation and Reception for Next Generation Mobile Satellite Communications
Rapid Prototyping of Standard-Compliant Visible Light Communications System
Link Adaptation Algorithms for Dual Polarization Mobile Satellite Systems
Advanced Signal Processing Techniques for Fixed and Mobile Satellite Communications
NOMA Assisted Joint Broadcast and Multicast Transmission in 5G Networks
Pushing for higher rates and efficiency in Satcom: the different perspectives within SatNExIV
End-to-end 5G services via an SDN/NFV-based multi-tenant network and cloud testbed
Zero-sum stochastic differential games of generalized McKean-Vlasov type
Dual Polarized Modulation and Receivers for Mobile Communications in Urban Areas
Statistical evaluation of the azimuth and elevation angles seen at the output of the receiving antenna
Forward Link Interference Mitigation in Mobile Interactive Satellite Systems
An SDR Implementation of a Visible Light Communication System Based on the IEEE 802.15.7 Standard
Prototyping with SDR: a quick way to play with next-gen communications systems
Efficient Robust Model Predictive Control using Chordality
Optimizing Sponsored Search Ranking Strategy by Deep Reinforcement Learning
Frank-Wolfe with Subsampling Oracle
Progressive Structure from Motion
Discrete Potts Model for Generating Superpixels on Noisy Images
Self-Controlled Jamming Resilient Design Using Physical Layer Secret Keys
Adaptive Co-weighting Deep Convolutional Features For Object Retrieval
Optimal Symbolic Controllers Determinization for BDD storage
Fastest Rates for Stochastic Mirror Descent Methods
Sub-exponential Upper Bound for #XSAT of some CNF Classes
Effective filtering analysis for non-Gaussian dynamic systems
On Low-Resolution ADCs in Practical 5G Millimeter-Wave Massive MIMO Systems
Are you eligible? Predicting adulthood from face images via class specific mean autoencoder
Residual Codean Autoencoder for Facial Attribute Analysis
Max-Min Fairness User Scheduling and Power Allocation in Full-Duplex OFDMA Systems
Asynchronous opinion dynamics on the $k$-nearest-neighbors graph
Decomposability of graphs into subgraphs fulfilling the 1-2-3 Conjecture
Fractal analysis of the large-scale stellar mass distribution in the Sloan Digital Sky Survey
Patch-Based Image Inpainting with Generative Adversarial Networks
A Distance Oriented Kalman Filter Particle Swarm Optimizer Applied to Multi-Modality Image Registration
Ocean Eddy Identification and Tracking using Neural Networks
Ontology-Based Reasoning about the Trustworthiness of Cyber-Physical Systems
Reflected Advanced Backward Stochastic Differential Equations with Default
MLtuner: System Support for Automatic Machine Learning Tuning
Total Equitable List Coloring
Semi-Blind Spatially-Variant Deconvolution in Optical Microscopy with Local Point Spread Function Estimation By Use Of Convolutional Neural Networks
On the Alon-Tarsi Number and Chromatic-choosability of Cartesian Products of Graphs
Divisibility problems for function fields
Speech-Driven Facial Reenactment Using Conditional Generative Adversarial Networks
VQA-E: Explaining, Elaborating, and Enhancing Your Answers for Visual Questions
Equiangular tight frames from group divisible designs
MAGSAC: marginalizing sample consensus
An Improved Evaluation Framework for Generative Adversarial Networks
AC/DC: In-Database Learning Thunderstruck
Collective Schedules: Scheduling Meets Computational Social Choice
Actor and Action Video Segmentation from a Sentence
FastDeRain: A Novel Video Rain Streak Removal Method Using Directional Gradient Priors
Linearizing Visual Processes with Convolutional Variational Autoencoders
Mobile Social Services with Network Externality: From Separate Pricing to Bundled Pricing
Discrete Cubical and Path Homologies of Graphs
On a problem of Bermond and Bollobás
Non-Asymptotic Classical Data Compression with Quantum Side Information
Fusion of stereo and still monocular depth estimates in a self-supervised learning context
The Crossing Number of Seq-Shellable Drawings of Complete Graphs
Explanation Methods in Deep Learning: Users, Values, Concerns and Challenges
DeepGauge: Comprehensive and Multi-Granularity Testing Criteria for Gauging the Robustness of Deep Learning Systems
Studies on Generalized Yule Models
Broadcasting on Bounded Degree DAGs
Stacked Neural Networks for end-to-end ciliary motion analysis
An interaction index for multichoice games
C3PO: Database and Benchmark for Early-stage Malicious Activity Detection in 3D Printing
Learning Category-Specific Mesh Reconstruction from Image Collections

Book Memo: “Introduction to HPC with MPI for Data Science”

This gentle introduction to High Performance Computing (HPC) for Data Science using the Message Passing Interface (MPI) standard has been designed as a first course for undergraduates on parallel programming on distributed memory models, and requires only basic programming notions. The book is divided into two parts: the first covers high performance computing using C++ with the MPI standard, and the second provides high-performance data analytics on computer clusters. In the first part, the fundamental notions of blocking versus non-blocking point-to-point communications, global communications (like broadcast or scatter) and collaborative computations (reduce), together with the Amdahl and Gustafson speed-up laws, are described before addressing parallel sorting and parallel linear algebra on computer clusters. The common ring, torus and hypercube topologies of clusters are then explained, and global communication procedures on these topologies are studied. The first part closes with the MapReduce (MR) model of computation, well-suited to processing big data using the MPI framework. In the second part, the book focuses on high-performance data analytics. Flat and hierarchical clustering algorithms are introduced for data exploration, along with how to program these algorithms on computer clusters, followed by machine learning classification and an introduction to graph analytics. This part closes with a concise introduction to data core-sets that make big data problems amenable to processing as tiny data problems.

Book Memo: “Probability and Statistics for Computer Science”

This textbook is aimed at computer science undergraduates late in sophomore or early in junior year, supplying a comprehensive background in qualitative and quantitative data analysis, probability, random variables, and statistical methods, including machine learning.
With careful treatment of topics that fill the curricular needs for the course, Probability and Statistics for Computer Science features:
• A treatment of random variables and expectations dealing primarily with the discrete case.
• A practical treatment of simulation, showing how many interesting probabilities and expectations can be extracted, with particular emphasis on Markov chains.
• A clear and crisp account of simple point inference strategies (maximum likelihood; Bayesian inference) in simple contexts. This is extended to cover some confidence intervals, samples and populations for random sampling with replacement, and the simplest hypothesis testing.
• A chapter dealing with classification, explaining why it’s useful; how to train SVM classifiers with stochastic gradient descent; and how to use implementations of more advanced methods such as random forests and nearest neighbors.
• A chapter dealing with regression, explaining how to set up, use and understand linear regression and nearest neighbors regression in practical problems.
• A chapter dealing with principal components analysis, developing intuition carefully, and including numerous practical examples. There is a brief description of multivariate scaling via principal coordinate analysis.
• A chapter dealing with clustering via agglomerative methods and k-means, showing how to build vector quantized features for complex signals.
Illustrated throughout, each main chapter includes many worked examples and other pedagogical elements such as
boxed Procedures, Definitions, Useful Facts, and Remember This (short tips). Problems and Programming Exercises are at the end of each chapter, with a summary of what the reader should know.
Instructor resources include a full set of model solutions for all problems, and an Instructor’s Manual with accompanying presentation slides.

Distilled News

The Machine Learning Reproducibility Crisis

I was recently chatting to a friend whose startup’s machine learning models were so disorganized it was causing serious problems as his team tried to build on each other’s work and share it with clients. Even the original author sometimes couldn’t train the same model and get similar results! He was hoping that I had a solution I could recommend, but I had to admit that I struggle with the same problems in my own work. It’s hard to explain to people who haven’t worked with machine learning, but we’re still back in the dark ages when it comes to tracking changes and rebuilding models from scratch. It’s so bad it sometimes feels like stepping back in time to when we coded without source control.

Getting Value from Machine Learning Isn’t About Fancier Algorithms – It’s About Making It Easier to Use

Machine learning can drive tangible business value for a wide range of industries — but only if it is actually put to use. Despite the many machine learning discoveries being made by academics, new research papers showing what is possible, and an increasing amount of data available, companies are struggling to deploy machine learning to solve real business problems. In short, the gap for most companies isn’t that machine learning doesn’t work, but that they struggle to actually use it. How can companies close this execution gap? In a recent project we illustrated the principles of how to do it. We used machine learning to augment the power of seasoned professionals — in this case, project managers — by allowing them to make data-driven business decisions well in advance. And in doing so, we demonstrated that getting value from machine learning is less about cutting-edge models, and more about making deployment easier.

Introduction to k-Nearest-Neighbors

The k-Nearest-Neighbors (kNN) method of classification is one of the simplest methods in machine learning, and is a great way to introduce yourself to machine learning and classification in general. At its most basic level, it is essentially classification by finding the most similar data points in the training data, and making an educated guess based on their classifications. Although very simple to understand and implement, this method has seen wide application in many domains, such as in recommendation systems, semantic searching, and anomaly detection.
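The idea described above — find the most similar training points and take a majority vote — can be sketched in a few lines. This is a minimal pure-Python illustration; the toy two-cluster dataset and the choice k=3 are made up for the example.

```python
# Minimal k-nearest-neighbors classifier: classify a query point by a
# majority vote among its k closest training points (Euclidean distance).
from collections import Counter
from math import dist

def knn_predict(train, query, k=3):
    """train: list of (features, label) pairs; query: a feature tuple."""
    # Sort training points by distance to the query and keep the k closest.
    neighbors = sorted(train, key=lambda pair: dist(pair[0], query))[:k]
    # Vote: the most common label among the neighbors wins.
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]

train = [
    ((1.0, 1.0), "a"), ((1.2, 0.8), "a"), ((0.9, 1.1), "a"),
    ((5.0, 5.0), "b"), ((5.2, 4.9), "b"), ((4.8, 5.1), "b"),
]
print(knn_predict(train, (1.1, 1.0)))  # query near the "a" cluster
```

The same voting logic, with an indexing structure for fast neighbor lookup and a tuned k, underlies the recommendation, search, and anomaly-detection applications mentioned above.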

Default Priors for the Intercept Parameter in Logistic Regressions

In logistic regression, separation refers to the situation in which a linear combination of predictors perfectly discriminates the binary outcome. Because finite-valued maximum likelihood parameter estimates do not exist under separation, Bayesian regressions with informative shrinkage of the regression coefficients offer a suitable alternative. Little focus has been given to whether and how to shrink the intercept parameter. Based upon classical studies of separation, we argue that efficiency in estimating regression coefficients may vary with the intercept prior. We adapt alternative prior distributions for the intercept that downweight implausibly extreme regions of the parameter space, making the fit less sensitive to separation. Through simulation and the analysis of exemplar datasets, we quantify differences across priors stratified by established statistics measuring the degree of separation. Relative to diffuse priors, our recommendations generally result in more efficient estimation of the regression coefficients themselves when the data are nearly separated. They are equally efficient in non-separated datasets, making them suitable for default use. Modest differences were observed with respect to out-of-sample discrimination. Our work also highlights the interplay between priors for the intercept and the regression coefficients: numerical results are more sensitive to the choice of intercept prior when using a weakly informative prior on the regression coefficients than an informative shrinkage prior.
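The non-existence of finite maximum likelihood estimates under separation can be seen numerically. The sketch below (a pure-Python illustration, not the paper's method) fits an unpenalized one-parameter logistic regression by gradient ascent on perfectly separated toy data: because the likelihood keeps increasing as the slope grows, the estimate never settles and simply climbs with more iterations.

```python
# Demonstration of separation in logistic regression: with 1-D data where
# x < 0 always yields y = 0 and x > 0 always yields y = 1, the unpenalized
# log-likelihood has no finite maximizer, so gradient ascent diverges.
from math import exp

xs = [-2.0, -1.0, -0.5, 0.5, 1.0, 2.0]
ys = [0, 0, 0, 1, 1, 1]          # perfectly separated at x = 0

def fit_slope(steps, lr=0.5):
    """Gradient ascent on the log-likelihood of the model logit P(y=1) = b*x."""
    b = 0.0
    for _ in range(steps):
        # Score function (gradient of the log-likelihood) for logistic regression.
        grad = sum((y - 1 / (1 + exp(-b * x))) * x for x, y in zip(xs, ys))
        b += lr * grad
    return b

# The fitted slope keeps growing with the number of iterations instead of
# converging; a shrinkage prior (penalty) would give a finite estimate.
print(fit_slope(100), fit_slope(10000))
```

An informative prior on the slope acts as a penalty that caps this divergence, which is the motivation for the Bayesian approach discussed in the abstract.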

R and Docker

If you regularly have to deal with specific versions of R, or different package combinations, or getting R set up to work with other databases or applications then, well, it can be a pain. You could dedicate a special machine for each configuration you need, I guess, but that’s expensive and impractical. You could set up virtual machines in the cloud, which works well for one-off situations, but it gets tedious having to re-configure a new VM each time. Or, you could use Docker containers, which were expressly designed to make it quick and easy to configure and launch an independent and secure collection of software and services. If you’re new to the concept of Docker containers, here’s a docker tutorial for data scientists. But the concepts are pretty simple. At Docker hub, you can search ‘images’ – basically, bundles of software with pre-configured settings – contributed by the community and by vendors. (You’ll be referring to the images by name, for example: rocker/r-base.) You can then create a ‘container’ (a running instance of that image) on your machine with the docker application, or in the cloud using the tools offered by your provider of choice.

Regression Analysis Essentials For Machine Learning

Regression analysis consists of a set of machine learning methods that allow us to predict a continuous outcome variable (y) based on the value of one or multiple predictor variables (x). Briefly, the goal of a regression model is to build a mathematical equation that defines y as a function of the x variables. Next, this equation can be used to predict the outcome (y) on the basis of new values of the predictor variables (x).
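The simplest instance of such an equation is a straight line y = b0 + b1·x fitted by least squares. Below is a pure-Python sketch with made-up data; a real analysis would use a statistics library, but the closed-form formulas are short enough to show directly.

```python
# Simple linear regression via ordinary least squares:
#   b1 = cov(x, y) / var(x),  b0 = mean(y) - b1 * mean(x).
def fit_line(xs, ys):
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    b1 = (sum((x - mx) * (y - my) for x, y in zip(xs, ys))
          / sum((x - mx) ** 2 for x in xs))
    b0 = my - b1 * mx
    return b0, b1

def predict(b0, b1, x):
    """Use the fitted equation to predict y for a new x."""
    return b0 + b1 * x

xs = [1, 2, 3, 4, 5]
ys = [2.1, 3.9, 6.0, 8.1, 9.9]   # roughly y = 2x with noise
b0, b1 = fit_line(xs, ys)
print(round(b1, 2), round(predict(b0, b1, 6), 2))  # slope near 2
```

Multiple regression generalizes the same idea to several predictors, replacing the two closed-form coefficients with a matrix least-squares solve.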

What Comes After Deep Learning

We’re stuck. Or at least we’re plateaued. Can anyone remember the last time a year went by without a major notable advance in algorithms, chips, or data handling? It was so unusual to go to the Strata San Jose conference a few weeks ago and see no new eye-catching developments. As I reported earlier, it seems we’ve hit maturity and now our major efforts are aimed at either making sure all our powerful new techniques work well together (converged platforms) or making a buck from those massive VC investments in same. I’m not the only one who noticed. Several attendees and exhibitors said very similar things to me. And just the other day I had a note from a team of well-regarded researchers who had been evaluating the relative merits of different advanced analytic platforms, and concluded there weren’t any differences worth reporting.

Automated front-end development using deep learning

SketchCode: Go from idea to HTML in 5 seconds

Engineering Data Science at Automattic

Most data scientists have to write code to analyze data or build products. While coding, data scientists act as software engineers. Adopting best practices from software engineering is key to ensuring the correctness, reproducibility, and maintainability of data science projects. This post describes some of our efforts in the area.

Multi-Class Text Classification with Scikit-Learn

There are lots of applications of text classification in the commercial world. For example, news stories are typically organized by topics; content or products are often tagged by categories; users can be classified into cohorts based on how they talk about a product or brand online.
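A typical scikit-learn setup for these tasks chains a text vectorizer and a linear classifier into one pipeline. This is a minimal sketch assuming scikit-learn is installed; the three categories and toy documents are made up for illustration, and a real problem would train on thousands of labeled texts.

```python
# Multi-class text classification with scikit-learn:
# TF-IDF bag-of-words features feeding a logistic regression classifier.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

docs = [
    "the team won the championship game",      # sports
    "the striker scored in the final match",   # sports
    "parliament passed the new budget bill",   # politics
    "the senator proposed an election reform", # politics
    "the new phone ships with a faster chip",  # tech
    "the laptop update improves battery life", # tech
]
labels = ["sports", "sports", "politics", "politics", "tech", "tech"]

# The pipeline learns the vocabulary, weights terms by TF-IDF, and fits
# one-vs-rest logistic regression over the three categories.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(docs, labels)
print(model.predict(["parliament debated the budget"]))
```

The same pipeline shape scales to the news-topic, product-category, and user-cohort applications mentioned above; only the training corpus and label set change.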

Introducing udpipe for easy Natural Language Processing in R

Natural Language Processing (NLP) has been seen as one of the black boxes of Data Analytics. The aim of this post is to introduce the simple-to-use but effective R package udpipe for NLP and Text Analytics. The UDPipe R package provides language-agnostic tokenization, tagging, lemmatization and dependency parsing of raw text, which is an essential part of natural language processing.

Learning Distributed Word Representations with Neural Network: an implementation from scratch in Octave

In this article, the problem of learning word representations with neural network from scratch is going to be described. This problem appeared as an assignment in the Coursera course Neural Networks for Machine Learning, taught by Prof. Geoffrey Hinton from the University of Toronto in 2012.

Blockchain Potential to Transform Artificial Intelligence

The research on improving Artificial Intelligence (A.I.) has been ongoing for decades. However, it wasn’t until recently that developers were finally able to create smart systems that closely resemble the A.I. capabilities of humans. The main reason for this breakthrough in technology is advancements in Big Data. Recent developments in Big Data have given us the ability to organize a very large amount of information into structured components that can be very quickly processed by computers. Another technology that has the potential for rapidly advancing and transforming Artificial Intelligence is the Blockchain. While some of the applications that have been developed on Blockchain are nothing more than ledger records of transactions, others are so incredibly smart that they almost appear like AI. Here, we will look more closely at the opportunities for A.I. advancement through the Blockchain protocol.

Document worth reading: “An introduction to Graph Data Management”

A graph database is a database where the data structures for the schema and/or instances are modeled as a (labeled)(directed) graph or generalizations of it, and where querying is expressed by graph-oriented operations and type constructors. In this article we present the basic notions of graph databases, give a historical overview of their main developments, and study the main current systems that implement them. An introduction to Graph Data Management