If you did not already know

Fuzzy Cognitive Map
A fuzzy cognitive map is a cognitive map within which the relations between the elements (e.g. concepts, events, project resources) of a “mental landscape” can be used to compute the “strength of impact” of these elements. The theory behind that computation is fuzzy logic. …
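As a rough illustration of how such a “strength of impact” can be computed, the sketch below iterates a small map until the concept activations settle. It assumes one commonly used update rule (a sigmoid applied to the weighted sum of incoming influences); the concept names, weights, and function names are hypothetical and chosen only for the example.

```python
import math

def sigmoid(x, steepness=1.0):
    """Squash an aggregated influence into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-steepness * x))

def fcm_step(state, weights):
    """One synchronous update: each concept sums the weighted influence of all concepts."""
    n = len(state)
    return [
        sigmoid(sum(weights[j][i] * state[j] for j in range(n)))
        for i in range(n)
    ]

def run_fcm(state, weights, max_steps=100, tol=1e-6):
    """Iterate until the activations stop changing (or max_steps is reached)."""
    for _ in range(max_steps):
        new_state = fcm_step(state, weights)
        if max(abs(a - b) for a, b in zip(new_state, state)) < tol:
            return new_state
        state = new_state
    return state

# Hypothetical three-concept map: weights[j][i] is the influence of concept j on concept i.
concepts = ["workload", "stress", "productivity"]
weights = [
    [0.0, 0.8, 0.3],   # workload raises stress and mildly raises productivity
    [0.0, 0.0, -0.7],  # stress lowers productivity
    [0.0, 0.0, 0.0],   # productivity influences nothing in this toy map
]
final_state = run_fcm([1.0, 0.0, 0.0], weights)
```

Other variants add a memory term for each concept's previous value or use a different squashing function; the choice here is only one of several in use.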
Discrete Dantzig Selector
We propose a new high-dimensional linear regression estimator: the Discrete Dantzig Selector, which minimizes the number of nonzero regression coefficients, subject to a budget on the maximal absolute correlation between the features and the residuals. We show that the estimator can be expressed as a solution to a Mixed Integer Linear Optimization (MILO) problem—a computationally tractable framework that enables the computation of provably optimal global solutions. Our approach has the appealing characteristic that even if we terminate the optimization problem at an early stage, it exits with a certificate of sub-optimality on the quality of the solution. We develop new discrete first order methods, motivated by recent algorithmic developments in first order continuous convex optimization, to obtain high quality feasible solutions for the Discrete Dantzig Selector problem. Our proposal leads to advantages over the off-the-shelf state-of-the-art integer programming algorithms, which include superior upper bounds obtained for a given computational budget. When a solution obtained from the discrete first order methods is passed as a warm-start to a MILO solver, the performance of the latter improves significantly. Exploiting problem specific information, we propose enhanced MILO formulations that further improve the algorithmic performance of the MILO solvers. We demonstrate, both theoretically and empirically, that, in a wide range of regimes, the statistical properties of the Discrete Dantzig Selector are superior to those of popular $\ell_{1}$-based approaches. For problem instances with $p \approx 2500$ features and $n \approx 900$ observations, our computational framework delivers optimal solutions in a few minutes and certifies optimality within an hour. …
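Under the usual notation, which the excerpt does not spell out and is assumed here (response $y \in \mathbb{R}^n$, feature matrix $X \in \mathbb{R}^{n \times p}$, correlation budget $\delta \ge 0$), the estimator described above can be sketched as a cardinality-minimization problem. The second display is one standard big-M route to a mixed integer linear program, with $M$ an assumed bound on $|\beta_i|$; it is not necessarily the formulation the authors use.

```latex
\min_{\beta \in \mathbb{R}^p} \; \|\beta\|_0
\quad \text{s.t.} \quad \|X^\top (y - X\beta)\|_\infty \le \delta

\min_{\beta,\, z} \; \sum_{i=1}^{p} z_i
\quad \text{s.t.} \quad
-\delta \le x_j^\top (y - X\beta) \le \delta \;\; (j = 1, \dots, p), \quad
-M z_i \le \beta_i \le M z_i, \quad
z_i \in \{0, 1\} \;\; (i = 1, \dots, p)
```

Here $x_j$ is the $j$-th column of $X$, and $z_i = 0$ forces $\beta_i = 0$, so the objective counts the nonzero coefficients.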
Robust Principal Component Analysis (ROBPCA)
We introduce a new method for robust principal component analysis (PCA). Classical PCA is based on the empirical covariance matrix of the data and hence is highly sensitive to outlying observations. Two robust approaches have been developed to date. The first approach is based on the eigenvectors of a robust scatter matrix such as the minimum covariance determinant or an S-estimator and is limited to relatively low-dimensional data. The second approach is based on projection pursuit and can handle high-dimensional data. Here we propose the ROBPCA approach, which combines projection pursuit ideas with robust scatter matrix estimation. ROBPCA yields more accurate estimates on noncontaminated datasets and more robust estimates on contaminated data. ROBPCA can be computed rapidly, and is able to detect exact-fit situations. As a by-product, ROBPCA produces a diagnostic plot that displays and classifies the outliers. We apply the algorithm to several datasets from chemometrics and engineering. …


What’s new on arXiv

Espresso: Efficient Forward Propagation for BCNNs

There are many application scenarios for which the computational performance and memory footprint of the prediction phase of Deep Neural Networks (DNNs) need to be optimized. Binary Deep Neural Networks (BDNNs) have been shown to be an effective way of achieving this objective. In this paper, we show how Convolutional Neural Networks (CNNs) can be implemented using binary representations. Espresso is a compact, yet powerful library written in C/CUDA that features all the functionalities required for the forward propagation of CNNs, in a binary file less than 400KB, without any external dependencies. Although it is mainly designed to take advantage of massive GPU parallelism, Espresso also provides an equivalent CPU implementation for CNNs. Espresso provides special convolutional and dense layers for BCNNs, leveraging bit-packing and bit-wise computations for efficient execution. These techniques provide a speed-up of matrix-multiplication routines, and at the same time, reduce memory usage when storing parameters and activations. We experimentally show that Espresso is significantly faster than existing implementations of optimized binary neural networks ($\approx 2$ orders of magnitude). Espresso is released under the Apache 2.0 license and is available at http://…/espresso.
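The bit-packing idea mentioned above can be sketched in a few lines. This is not Espresso's API: it is a generic illustration, with hypothetical function names, of how a dot product of two +1/-1 vectors reduces to an XOR followed by a population count once the values are packed into machine words.

```python
def pack_bits(values):
    """Pack a sequence of +1/-1 values into one integer (bit 1 encodes +1, bit 0 encodes -1)."""
    word = 0
    for position, value in enumerate(values):
        if value == 1:
            word |= 1 << position
    return word

def binary_dot(packed_a, packed_b, length):
    """Dot product of two packed +1/-1 vectors via XOR and a population count."""
    # Each differing bit contributes -1, each matching bit contributes +1.
    mismatches = bin(packed_a ^ packed_b).count("1")
    return length - 2 * mismatches

a = [1, -1, -1, 1, 1, -1, 1, 1]
b = [1, 1, -1, -1, 1, -1, -1, 1]
reference = sum(x * y for x, y in zip(a, b))              # ordinary multiply-accumulate
fast = binary_dot(pack_bits(a), pack_bits(b), len(a))     # bit-wise equivalent
```

Both routes give the same value; the packed form stores each weight or activation in a single bit, which is where the memory saving described in the abstract comes from.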

Accelerated Distributed Nesterov Gradient Descent

This paper considers the distributed optimization problem over a network, where the objective is to optimize a global function formed by a sum of local functions, using only local computation and communication. We develop an Accelerated Distributed Nesterov Gradient Descent (Acc-DNGD) method. When the objective function is convex and $L$-smooth, we show that it achieves an $O(\frac{1}{t^{1.4-\epsilon}})$ convergence rate for all $\epsilon \in (0, 1.4)$. We also show the convergence rate can be improved to $O(\frac{1}{t^2})$ if the objective function is a composition of a linear map and a strongly-convex and smooth function. When the objective function is $\mu$-strongly convex and $L$-smooth, we show that it achieves a linear convergence rate of $O([1 - O((\frac{\mu}{L})^{5/7})]^t)$, where $\frac{L}{\mu}$ is the condition number of the objective.

Accelerated Inference for Latent Variable Models

Inference of latent feature models in the Bayesian nonparametric setting is generally difficult, especially in high dimensional settings, because it usually requires proposing features from some prior distribution. In special cases, where the integration is tractable, we could sample feature assignments according to a predictive likelihood. However, this still may not be efficient in high dimensions. We present a novel method to accelerate the mixing of latent variable model inference by proposing feature locations from the data, as opposed to the prior. This sampling method is efficient for proper mixing of the Markov chain Monte Carlo sampler, computationally attractive because this method can be performed in parallel, and is theoretically guaranteed to converge to the posterior distribution as its limiting distribution.

AIDE: An algorithm for measuring the accuracy of probabilistic inference algorithms

Approximate probabilistic inference algorithms are central to many fields. Examples include sequential Monte Carlo inference in robotics, variational inference in machine learning, and Markov chain Monte Carlo inference in statistics. A key problem faced by practitioners is measuring the accuracy of an approximate inference algorithm on a specific dataset. This paper introduces the auxiliary inference divergence estimator (AIDE), an algorithm for measuring the accuracy of approximate inference algorithms. AIDE is based on the observation that inference algorithms can be treated as probabilistic models and the random variables used within the inference algorithm can be viewed as auxiliary variables. This view leads to a new estimator for the symmetric KL divergence between the output distributions of two inference algorithms. The paper illustrates application of AIDE to algorithms for inference in regression, hidden Markov, and Dirichlet process mixture models. The experiments show that AIDE captures the qualitative behavior of a broad class of inference algorithms and can detect failure modes of inference algorithms that are missed by standard heuristics.

RankPL: A Qualitative Probabilistic Programming Language

In this paper we introduce RankPL, a modeling language that can be thought of as a qualitative variant of a probabilistic programming language with a semantics based on Spohn’s ranking theory. Broadly speaking, RankPL can be used to represent and reason about processes that exhibit uncertainty expressible by distinguishing ‘normal’ from ‘surprising’ events. RankPL allows (iterated) revision of rankings over alternative program states and supports various types of reasoning, including abduction and causal inference. We present the language, its denotational semantics, and a number of practical examples. We also discuss an implementation of RankPL that is available for download.

Accelerated Hierarchical Density Clustering

We present an accelerated algorithm for hierarchical density-based clustering. Our new algorithm improves upon HDBSCAN*, which itself provided a significant qualitative improvement over the popular DBSCAN algorithm. The accelerated HDBSCAN* algorithm provides comparable performance to DBSCAN, while supporting variable density clusters, and eliminating the need for the difficult-to-tune distance scale parameter. This makes accelerated HDBSCAN* the default choice for density-based clustering. Library available at: https://…/hdbscan

Fast Change Point Detection on Dynamic Social Networks

A number of real-world problems in many domains (e.g. sociology, biology, political science and communication networks) can be modeled as dynamic networks with nodes representing entities of interest and edges representing interactions among the entities at different points in time. A common representation for such models is the snapshot model – where a network is defined at logical time-stamps. An important problem under this model is change point detection. In this work we devise an effective and efficient three-step approach for detecting change points in dynamic networks under the snapshot model. Our algorithm achieves up to 9X speedup over the state-of-the-art while improving quality on both synthetic and real-world networks.

Forward Thinking: Building Deep Random Forests

The success of deep neural networks has inspired many to wonder whether other learners could benefit from deep, layered architectures. We present a general framework called forward thinking for deep learning that generalizes the architectural flexibility and sophistication of deep neural networks while also allowing for (i) different types of learning functions in the network, other than neurons, and (ii) the ability to adaptively deepen the network as needed to improve results. This is done by training one layer at a time, and once a layer is trained, the input data are mapped forward through the layer to create a new learning problem. The process is then repeated, transforming the data through multiple layers, one at a time, rendering a new dataset, which is expected to be better behaved, and on which a final output layer can achieve good performance. In the case where the neurons of deep neural nets are replaced with decision trees, we call the result a Forward Thinking Deep Random Forest (FTDRF). We demonstrate a proof of concept by applying FTDRF on the MNIST dataset. We also provide a general mathematical formulation that allows for other types of deep learning problems to be considered.

Recurrent Additive Networks

We introduce recurrent additive networks (RANs), a new gated RNN which is distinguished by the use of purely additive latent state updates. At every time step, the new state is computed as a gated component-wise sum of the input and the previous state, without any of the non-linearities commonly used in RNN transition dynamics. We formally show that RAN states are weighted sums of the input vectors, and that the gates only contribute to computing the weights of these sums. Despite this relatively simple functional form, experiments demonstrate that RANs outperform both LSTMs and GRUs on benchmark language modeling problems. This result shows that many of the non-linear computations in LSTMs and related networks are not essential, at least for the problems we consider, and suggests that the gates are doing more of the computational work than previously understood.
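The claim that RAN states are weighted sums of the inputs can be checked numerically. The sketch below uses scalars to stand in for one component of the state, takes the gate values as given rather than computing them from the network (an assumption made to keep the example short), and compares the recurrence with the unrolled weighted sum. All names are hypothetical.

```python
def ran_states(candidates, input_gates, forget_gates):
    """Run the additive recurrence c_t = i_t * x_t + f_t * c_{t-1} for scalar states."""
    state = 0.0
    states = []
    for x, i, f in zip(candidates, input_gates, forget_gates):
        state = i * x + f * state
        states.append(state)
    return states

def weighted_sum_form(candidates, input_gates, forget_gates):
    """The same states written explicitly as weighted sums of the inputs."""
    states = []
    for t in range(len(candidates)):
        total = 0.0
        for j in range(t + 1):
            # Weight of input j at time t: its input gate times all later forget gates.
            weight = input_gates[j]
            for k in range(j + 1, t + 1):
                weight *= forget_gates[k]
            total += weight * candidates[j]
        states.append(total)
    return states

# Hypothetical gate values in (0, 1); in an actual network they would be produced by learned gates.
xs = [1.0, -2.0, 0.5, 3.0]
i_gates = [0.9, 0.4, 0.7, 0.2]
f_gates = [0.5, 0.6, 0.3, 0.8]
recurrent = ran_states(xs, i_gates, f_gates)
explicit = weighted_sum_form(xs, i_gates, f_gates)
```

The two lists agree up to floating-point rounding, which illustrates that the gates only determine the weights of the sum.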

MITHRIL: Mining Sporadic Associations for Cache Prefetching

The growing pressure on cloud application scalability has accentuated storage performance as a critical bottleneck. Although cache replacement algorithms have been extensively studied, cache prefetching – reducing latency by retrieving items before they are actually requested – remains an underexplored area. Existing approaches to history-based prefetching, in particular, provide too few benefits for real systems for the resources they cost. We propose MITHRIL, a prefetching layer that efficiently exploits historical patterns in cache request associations. MITHRIL is inspired by sporadic association rule mining and only relies on the timestamps of requests. Through evaluation of 135 block-storage traces, we show that MITHRIL is effective, giving an average 55% hit ratio increase over LRU and PROBABILITY GRAPH, and a 36% hit ratio gain over AMP, at reasonable cost. We further show that MITHRIL can supplement any cache replacement algorithm and be readily integrated into existing systems. Furthermore, we demonstrate the improvement comes from MITHRIL being able to capture mid-frequency blocks.

CrossNets: A New Approach to Complex Learning

We propose a novel neural network structure called CrossNets, which considers architectures on directed acyclic graphs. This structure builds on previous generalizations of feed forward models, such as ResNets, by allowing for all forward cross connections between layers (both adjacent and non-adjacent). The addition of cross connections across the network increases information flow across the whole network, leading to better training and testing performance. The superior performance of the network is tested against four benchmark datasets: MNIST, CIFAR-10, CIFAR-100, and SVHN. We conclude with a proof of convergence for CrossNets to a local minimum for error, where weights for connections are chosen through backpropagation with momentum.

Shallow Updates for Deep Reinforcement Learning

Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN) have achieved state-of-the-art results in a variety of challenging, high-dimensional domains. This success is mainly attributed to the power of deep neural networks to learn rich domain representations for approximating the value function or policy. Batch reinforcement learning methods with linear representations, on the other hand, are more stable and require less hyperparameter tuning. Yet, substantial feature engineering is necessary to achieve good results. In this work we propose a hybrid approach, the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method. We do this by periodically re-training the last hidden layer of a DRL network with a batch least squares update. Key to our approach is a Bayesian regularization term for the least squares update, which prevents over-fitting to the more recent data. We tested LS-DQN on five Atari games and demonstrate significant improvement over vanilla DQN and Double-DQN. We also investigated the reasons for the superior performance of our method. Interestingly, we found that the performance improvement can be attributed to the large batch size used by the LS method when optimizing the last layer.

Statistical inference using SGD

We present a novel method for frequentist statistical inference in M-estimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the Ornstein-Uhlenbeck process suggests that such averages are asymptotically normal. From a practical perspective, our SGD-based inference procedure is a first order method, and is well-suited for large scale problems. To show its merits, we apply it to both synthetic and real datasets, and demonstrate that its accuracy is comparable to classical statistical methods, while requiring potentially far less computation.

Shake-Shake regularization

The method introduced in this paper aims at helping deep learning practitioners faced with an overfit problem. The idea is to replace, in a multi-branch network, the standard summation of parallel branches with a stochastic affine combination. Applied to 3-branch residual networks, shake-shake regularization improves on the best single shot published results on CIFAR-10 and CIFAR-100 by reaching test errors of 2.86% and 15.85%. Experiments on architectures without skip connections or Batch Normalization show encouraging results and open the door to a large set of applications. Code is available at https://…/shake-shake.
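The core idea, replacing the plain sum of parallel branches with a stochastic affine combination, can be sketched as follows. The sketch covers the forward pass only and assumes the three branches are an identity path plus two residual branches mixed with a coefficient drawn uniformly from [0, 1) during training and fixed at 0.5 at evaluation time. The branch functions and names are hypothetical stand-ins, not the authors' code.

```python
import random

def shake_shake_block(x, branch_a, branch_b, training, rng=random):
    """Residual block whose two branches are mixed by a random affine combination."""
    # Training: draw a fresh mixing coefficient; evaluation: use the midpoint 0.5.
    alpha = rng.random() if training else 0.5
    return [
        xi + alpha * a + (1.0 - alpha) * b
        for xi, a, b in zip(x, branch_a(x), branch_b(x))
    ]

# Hypothetical stand-ins for the two residual branches.
def double(v):
    return [2.0 * e for e in v]

def negate(v):
    return [-e for e in v]

x = [1.0, 2.0]
eval_out = shake_shake_block(x, double, negate, training=False)
train_out = shake_shake_block(x, double, negate, training=True, rng=random.Random(0))
```

With these toy branches the evaluation output is the input plus the average of the two branches, while each training call lands somewhere between the two extremes of using only one branch.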

Annealed Generative Adversarial Networks

We introduce a novel framework for adversarial training where the target distribution is annealed between the uniform distribution and the data distribution. We posited a conjecture that learning under continuous annealing in the nonparametric regime is stable irrespective of the divergence measures in the objective function and proposed an algorithm, dubbed $\beta$-GAN, in corollary. In this framework, the fact that the initial support of the generative network is the whole ambient space combined with annealing are key to balancing the minimax game. In our experiments on synthetic data, MNIST, and CelebA, $\beta$-GAN with a fixed annealing schedule was stable and did not suffer from mode collapse.

Infrastructure for Usable Machine Learning: The Stanford DAWN Project

Despite incredible recent advances in machine learning, building machine learning applications remains prohibitively time-consuming and expensive for all but the best-trained, best-funded engineering organizations. This expense comes not from a need for new and improved statistical models but instead from a lack of systems and tools for supporting end-to-end machine learning application development, from data preparation and labeling to productionization and monitoring. In this document, we outline opportunities for infrastructure supporting usable, end-to-end machine learning applications in the context of the nascent DAWN (Data Analytics for What’s Next) project at Stanford.

Learning from Complementary Labels

Collecting labeled data is costly and thus is a critical bottleneck in real-world classification tasks. To mitigate the problem, we consider a complementary label, which specifies a class that a pattern does not belong to. Collecting complementary labels would be less laborious than ordinary labels since users do not have to carefully choose the correct class from many candidate classes. However, complementary labels are less informative than ordinary labels and thus a suitable approach is needed to better learn from complementary labels. In this paper, we show that an unbiased estimator of the classification risk can be obtained only from complementary labels, if a loss function satisfies a particular symmetric condition. We theoretically prove the estimation error bounds for the proposed method, and experimentally demonstrate the usefulness of the proposed algorithms.

Improved Clustering with Augmented k-means

Identifying a set of homogeneous clusters in a heterogeneous dataset is one of the most important classes of problems in statistical modeling. In the realm of unsupervised partitional clustering, k-means is a very important algorithm for this. In this technical report, we develop a new k-means variant called Augmented k-means, which is a hybrid of k-means and logistic regression. During each iteration, logistic regression is used to predict the current cluster labels, and the cluster belonging probabilities are used to control the subsequent re-estimation of cluster means. Observations which can’t be firmly identified into clusters are excluded from the re-estimation step. This can be valuable when the data exhibit many characteristics of real datasets such as heterogeneity, non-sphericity, substantial overlap, and high scatter. Augmented k-means frequently outperforms k-means by more accurately classifying observations into known clusters and / or converging in fewer iterations. We demonstrate this on both simulated and real datasets. Our algorithm is implemented in Python and will be available with this report.

AIXIjs: A Software Demo for General Reinforcement Learning

Reinforcement learning is a general and powerful framework with which to study and implement artificial intelligence. Recent advances in deep learning have enabled RL algorithms to achieve impressive performance in restricted domains such as playing Atari video games (Mnih et al., 2015) and, recently, the board game Go (Silver et al., 2016). However, we are still far from constructing a generally intelligent agent. Many of the obstacles and open questions are conceptual: What does it mean to be intelligent? How does one explore and learn optimally in general, unknown environments? What, in fact, does it mean to be optimal in the general sense? The universal Bayesian agent AIXI (Hutter, 2005) is a model of a maximally intelligent agent, and plays a central role in the sub-field of general reinforcement learning (GRL). Recently, AIXI has been shown to be flawed in important ways; it doesn’t explore enough to be asymptotically optimal (Orseau, 2010), and it can perform poorly with certain priors (Leike and Hutter, 2015). Several variants of AIXI have been proposed to attempt to address these shortfalls: among them are entropy-seeking agents (Orseau, 2011), knowledge-seeking agents (Orseau et al., 2013), Bayes with bursts of exploration (Lattimore, 2013), MDL agents (Leike, 2016a), Thompson sampling (Leike et al., 2016), and optimism (Sunehag and Hutter, 2015). We present AIXIjs, a JavaScript implementation of these GRL agents. This implementation is accompanied by a framework for running experiments against various environments, similar to OpenAI Gym (Brockman et al., 2016), and a suite of interactive demos that explore different properties of the agents, similar to REINFORCEjs (Karpathy, 2015). We use AIXIjs to present numerous experiments illustrating fundamental properties of, and differences between, these agents.

A novel algorithmic approach to Bayesian Logic Regression

Logic regression was developed more than a decade ago as a tool to construct predictors from Boolean combinations of binary covariates. It has been mainly used to model epistatic effects in genetic association studies, which is very appealing due to the intuitive interpretation of logic expressions to describe the interaction between genetic variations. Nevertheless logic regression has remained less well known than other approaches to epistatic association mapping. Here we will adopt an advanced evolutionary algorithm called GMJMCMC (Genetically modified Mode Jumping Markov Chain Monte Carlo) to perform Bayesian model selection in the space of logic regression models. After describing the algorithmic details of GMJMCMC we perform a comprehensive simulation study that illustrates its performance given logic regression terms of various complexity. Specifically GMJMCMC is shown to be able to identify three-way and even four-way interactions with relatively large power, a level of complexity which has not been achieved by previous implementations of logic regression. We apply GMJMCMC to reanalyze QTL mapping data for Recombinant Inbred Lines in Arabidopsis thaliana and from a backcross population in Drosophila where we identify several interesting epistatic effects.

W2VLDA: Almost Unsupervised System for Aspect Based Sentiment Analysis

With the increase of online customer opinions in specialised websites and social networks, the necessity of automatic systems to help to organise and classify customer reviews by domain-specific aspect/categories and sentiment polarity is more important than ever. Supervised approaches to Aspect Based Sentiment Analysis obtain good results for the domain/language they are trained on, but having manually labelled data for training supervised systems for all domains and languages tends to be very costly and time consuming. In this work we describe W2VLDA, an unsupervised system based on topic modelling, that combined with some other unsupervised methods and a minimal configuration, performs aspect/category classification, aspect-terms/opinion-words separation and sentiment polarity classification for any given domain and language. We also evaluate the performance of the aspect and sentiment classification in the multilingual SemEval 2016 task 5 (ABSA) dataset. We show competitive results for several languages (English, Spanish, French and Dutch) and domains (hotels, restaurants, electronic-devices).

Minimax Statistical Learning and Domain Adaptation with Wasserstein Distances

As opposed to standard empirical risk minimization (ERM), distributionally robust optimization aims to minimize the worst-case risk over a larger ambiguity set containing the original empirical distribution of the training data. In this work, we describe a minimax framework for statistical learning with ambiguity sets given by balls in Wasserstein space. In particular, we prove a generalization bound that involves the covering number properties of the original ERM problem. As an illustrative example, we provide generalization guarantees for domain adaptation problems where the Wasserstein distance between the source and target domain distributions can be reliably estimated from unlabeled samples.

Nonparametric Online Regression while Learning the Metric

We study algorithms for online nonparametric regression that learn the directions along which the regression function is smoother. Our algorithm learns the Mahalanobis metric based on the gradient outer product matrix $\boldsymbol{G}$ of the regression function (automatically adapting to the effective rank of this matrix), while simultaneously bounding the regret (on the same data sequence) in terms of the spectrum of $\boldsymbol{G}$. As a preliminary step in our analysis, we generalize a nonparametric online learning algorithm by Hazan and Megiddo by enabling it to compete against functions whose Lipschitzness is measured with respect to an arbitrary Mahalanobis metric.

Streaming Sparse Gaussian Process Approximations

Softmax Q-Distribution Estimation for Structured Prediction: A Theoretical Interpretation for RAML

Deep De-Aliasing for Fast Compressive Sensing MRI

Simultaneous Multiple Surface Segmentation Using Deep Learning

A New 3D Segmentation Methodology for Lumbar Vertebral Bodies for the Measurement of BMD and Geometry

Sparse Coding on Stereo Video for Object Detection

A New 3D Method to Segment the Lumbar Vertebral Bodies and to Determine Bone Mineral Density and Geometry

Local Information with Feedback Perturbation Suffices for Dictionary Learning in Neural Circuits

Data-driven Optimal Transport Cost Selection for Distributionally Robust Optimization

Clustering under Local Stability: Bridging the Gap between Worst-Case and Beyond Worst-Case Analysis

Improved Very-short-term Wind Forecasting using Atmospheric Classification

A Lightweight Approach for On-the-Fly Reflectance Estimation

Relaxed Wasserstein with Applications to GANs

Competing Bose-Glass physics with disorder-induced Bose-Einstein condensation in the doped $S=1$ antiferromagnet Ni(Cl$_{1-x}$Br$_x$)$_2$-4SC(NH$_2$)$_2$ at high magnetic fields

Doubly Robust Data-Driven Distributionally Robust Optimization

Hall-Littlewood RSK field

Nestrov’s Acceleration For Second Order Method

Model-Based Planning in Discrete Action Spaces

Large System Analysis of Power Normalization Techniques in Massive MIMO

On the coupling time of the heat-bath process for the Fortuin-Kasteleyn random-cluster model

Proximal Methods for Sparse Optimal Scoring and Discriminant Analysis

Hypothesis Testing via Euclidean Separation

The High-Dimensional Geometry of Binary Neural Networks

Smoothed and Average-case Approximation Ratios of Mechanisms: Beyond the Worst-case Analysis

Multi-Stage Variational Auto-Encoders for Coarse-to-Fine Image Generation

Ensemble Adversarial Training: Attacks and Defenses

Machine learning modeling for time series problem: Predicting flight ticket prices

Towards Real World Human Parsing: Multiple-Human Parsing in the Wild

PixColor: Pixel Recursive Colorization

Two-temperature logistic regression based on the Tsallis divergence

Quantum versus classical simultaneity in communication complexity

Space Complexity of Fault Tolerant Register Emulations

Securing Deep Neural Nets against Adversarial Attacks with Moving Target Defense

How to Train Your DRAGAN

Families of vectors without antipodal pairs

GAR: An efficient and scalable Graph-based Activity Regularization for semi-supervised learning

Active Sampling for Graph-Cognizant Classification via Expected Model Change

Quadruplet Network with One-Shot Learning for Visual Tracking

Iteration-complexity of a Jacobi-type non-Euclidean ADMM for multi-block linearly constrained nonconvex programs

Uncertainty in Economic Growth and Inequality

Coexistence of RF-powered IoT and a Primary Wireless Network with Secrecy Guard Zones

Recurrent Scene Parsing with Perspective Understanding in the Loop

Diffusion-based neuromodulation can eliminate catastrophic forgetting in simple neural networks

End-to-end Planning of Fixed Millimeter-Wave Networks

Speedup from a different parametrization within the Neural Net algorithm

Modeling spatial social complex networks for dynamical processes

SVM via Saddle Point Optimization: New Bounds and Distributed Algorithms

BRPL: Backpressure RPL for High-throughput and Mobile IoTs

Learning Feature Nonlinearities with Non-Convex Regularized Binned Regression

PrivMin: Differentially Private MinHash for Jaccard Similarity Computation

Oracle Complexity of Second-Order Methods for Smooth Convex Optimization

Stochastic Recursive Gradient Algorithm for Nonconvex Optimization

Batch Reinforcement Learning on the Industrial Benchmark: First Experiences

Adversarial Examples Are Not Easily Detected: Bypassing Ten Detection Methods

High-Dimensional Bayesian Geostatistics

Search Engine Guided Non-Parametric Neural Machine Translation

Learning to Factor Policies and Action-Value Functions: Factored Action Space Representations for Deep Reinforcement learning

Conflict-free vertex-connections of graphs

Non-Linear Phase-Shifting of Haar Wavelets for Run-Time All-Frequency Lighting

Polar Coding for Parallel Gaussian Channel

Bayesian Belief Updating of Spatiotemporal Seizure Dynamics

Fast and simple algorithms for computing both $LCS_{k}$ and $LCS_{k+}$

Structured Bayesian Pruning via Log-Normal Multiplicative Noise

Gaze Distribution Analysis and Saliency Prediction Across Age Groups

Optimality of orders one to three and beyond: characterization and evaluation complexity in constrained nonconvex optimization

Performance Evaluation of Optimal Radio Access Technology Selection Algorithms for LTE-WiFi Network

Deep Sparse Coding Using Optimized Linear Expansion of Thresholds

Contract Design for Energy Demand Response

Toric weak Fano varieties associated to building sets

Event-Triggered Algorithms for Leader-Follower Consensus of Networked Euler-Lagrange Agents

Personalized Ranking for Context-Aware Venue Suggestion

Lower Bound On the Computational Complexity of Discounted Markov Decision Problems

Non-existence of antipodal cages of even girth

Formalized Lambek Calculus in Higher Order Logic (HOL4)

Applications of multiplicative number theory to uniform distribution and ergodic Ramsey theory

Dynamic Analysis of the Arrow Distributed Directory Protocol in General Networks

Forecasting Hand and Object Locations in Future Frames

Critical Contours: An Invariant Linking Image Flow with Salient Surface Organization

Full-Duplex Bidirectional Secure Communications under Perfect and Distributionally Ambiguous Eavesdropper’s CSI

Combining tabu search and graph reduction to solve the maximum balanced biclique problem

Phase-Shifting Separable Haar Wavelets and Applications

Why You Should Charge Your Friends for Borrowing Your Stuff

Ensemble Sampling

Calibrating Black Box Classification Models through the Thresholding Method

Stability of cross-validation and minmax-optimal number of folds

A Dynkin game on assets with incomplete information on the return

On Capacity of Noncoherent MIMO with Asymmetric Link Strengths

Structural Compression of Convolutional Neural Networks Based on Greedy Filter Pruning

Honey Bee Dance Modeling in Real-time using Machine Learning

Stabilizing Adversarial Nets With Prediction Methods

Mixed Membership Word Embeddings for Computational Social Science

Broadcasting in Noisy Radio Networks

Spelling Correction as a Foreign Language

Instrument-Armed Bandits

Generalizing the Role of Determinization in Probabilistic Planning

Gradient Flows: Applications to Classification, Image Denoising, and Riemannian MCMC

Incorporating Depth into both CNN and CRF for Indoor Semantic Segmentation

Balanced Policy Evaluation and Learning

DeepMasterPrint: Generating Fingerprints for Presentation Attacks

Ergodicity of stochastic differential equations with jumps and singular coefficients

On asymptotically minimax nonparametric detection of signal in Gaussian white noise

Equating $k$ Maximum Degrees in Graphs without Short Cycles

Algebraic Aspects of Conditional Independence and Graphical Models

Answers to Holm’s questions for high order free arrangements

More results on the distance (signless) Laplacian eigenvalues of graphs

Structured Image Classification from Conditional Random Field with Deep Class Embedding

Generative Partition Networks for Multi-Person Pose Estimation

Learning Semantic Relatedness From Human Feedback Using Metric Learning

The Do’s and Don’ts for CNN-based Face Verification

Direct Search Methods on Reductive Homogeneous Spaces

Sketched Answer Set Programming

On uniquely k-list colorable planar graphs, graphs on surfaces, and regular graphs

Powerful sets: a generalisation of binary matroids

Exponential Capacity in an Autoencoder Neural Network with a Hidden Layer

Parallel Streaming Wasserstein Barycenters

Additive Combinatorics: A Menu of Research Problems

Learning to Mix n-Step Returns: Generalizing lambda-Returns for Deep Reinforcement Learning

Griffiths Singularities in the Random Quantum Ising Antiferromagnet: A Tree Tensor Network Renormalization Group Study

A spatial epidemic model with site contamination

Image Segmentation by Iterative Inference from Conditional Score Estimation

Monotonicity of average return probabilities for random walks in random environments

Analytical Methods and Field Theory for Disordered Systems

Experience enrichment based task independent reward model

Spatially Controlled Relay Beamforming: $2$-Stage Optimal Policies

Improved Algorithms for Matrix Recovery from Rank-One Projections

Nonautonomous Young differential equations revisited

Nice latent variable models have log-rank

On Stackelberg Mixed Strategies

Report of the HPC Correctness Summit, Jan 25–26, 2017, Washington, DC

Parallel and in-process compilation of individuals for genetic programming on GPU

On some polynomials and series of Bloch-Polya Type

A truncation model for estimating Species Richness

Sandpile Groups of Random Bipartite Graphs

Classification and Retrieval of Digital Pathology Scans: A New Dataset

Testing Degree Corrections in Stochastic Block Models

Testing hypotheses on a tree: new error rates and controlling strategies

Classification of toric manifolds over an $n$-cube with one vertex cut

Corrupted Sensing with Sub-Gaussian Measurements

A Note on the Information-Theoretic-(in)Security of Fading Generated Secret Keys

Random walks among time increasing conductances: heat kernel estimates

Integrable structure of products of finite complex Ginibre random matrices

On the Phase Transition of Corrupted Sensing

An Overview of Massive MIMO Research at the University of Bristol

Categorical relations between Langlands dual quantum affine algebras: Doubly laced types

Building Emotional Machines: Recognizing Image Emotions through Deep Neural Networks

A note on the number of edges in a Hamiltonian graph with no repeated cycle length

Parameterized Complexity of the List Coloring Reconfiguration Problem with Graph Parameters

Subgradients of Minimal Time Functions without Calmness

Boosting the accuracy of multi-spectral image pan-sharpening by learning a deep residual network

Note on Evolution and Forecasting of Requirements: Communications Example

Detection Estimation and Grid matching of Multiple Targets with Single Snapshot Measurements

Batch Size Matters: A Diffusion Approximation Framework on Nonconvex Stochastic Gradient Descent

Learning to Rank Using Localized Geometric Mean Metrics

Learning to Prune Deep Neural Networks via Layer-wise Optimal Brain Surgeon

Hypergroups derived from random walks on some infinite graphs

Many-Body-Localization : Strong Disorder perturbative approach for the Local Integrals of Motion

Self-Fulfilling Signal of an Endogenous State in Network Congestion Games

Global Guarantees for Enforcing Deep Generative Priors by Empirical Risk

Semiparametric Efficient Empirical Higher Order Influence Function Estimators

Union of Intersections (UoI) for Interpretable Data Driven Discovery and Prediction

Learning Robust Object Recognition Using Composed Scenes from Generative Models

Rao-Blackwellized Particle Smoothing as Message Passing

Classification Using Proximity Catch Digraphs (Technical Report)

Multi-output Polynomial Networks and Factorization Machines

External powers of tensor products as representations of general linear groups

A copula approach for dependence modeling in multivariate nonparametric time series

Deep Reinforcement Learning with Relative Entropy Stochastic Search

View-Invariant Recognition of Action Style Self-Dissimilarity

Nonconvex homogenization for one-dimensional controlled random walks in random potential

Construction of strongly regular Cayley graphs based on three-valued Gauss periods

Computer vision-based food calorie estimation: dataset, method, and experiment

Resonant Near-Field Effects in Photonic Glasses

On the Efficient Simulation of the Left-Tail of the Sum of Correlated Log-normal Variates

Dynamics Based 3D Skeletal Hand Tracking

Speed and fluctuations for some driven dimer models

From optimal transport to generative modeling: the VEGAN cookbook

Near-feasible stable matchings with budget constraints

An approximate empirical Bayesian method for large-scale linear-Gaussian inverse problems

The Widom-Rowlinson Model on the Delaunay Graph

ReFACTor: Practical Low-Rank Matrix Estimation Under Column-Sparsity

Streaming Binary Sketching based on Subspace Tracking and Diagonal Uniformization

LOGAN: Evaluating Privacy Leakage of Generative Models Using Generative Adversarial Networks

CayleyNets: Graph Convolutional Neural Networks with Complex Rational Spectral Filters

An affine scaling method using a class of differential barrier functions

A decoding algorithm for Twisted Gabidulin codes

Quantitative stochastic homogenization and regularity theory of parabolic equations

A Linear-Time Kernel Goodness-of-Fit Test

Individualized Risk Prognosis for Critical Care Patients: A Multi-task Gaussian Process Model

Clique-Width for Graph Classes Closed under Complementation

Controllability of evolution equations with memory

An Inexact Newton-like conditional gradient method for constrained nonlinear systems

Semantic Softmax Loss for Zero-Shot Learning

Compressed Sensing with Prior Information via Maximizing Correlation

A Regularized Framework for Sparse and Structured Neural Attention

An Out-of-the-box Full-network Embedding for Convolutional Neural Networks

On the Convergence of the Accelerated Riccati Iteration Method

Quo Vadis, Action Recognition? A New Model and the Kinetics Dataset

An Asynchronous Distributed Framework for Large-scale Learning Based on Parameter Exchanges

VEEGAN: Reducing Mode Collapse in GANs using Implicit Variational Learning

Exact Recovery with Symmetries for the Doubly-Stochastic Relaxation

Learning to Associate Words and Images Using a Large-scale Graph

Convolutional Networks with MuxOut Layers as Multi-rate Systems for Image Upscaling

Follow the Signs for Robust Stochastic Optimization

Robust Localized Multi-view Subspace Clustering

An improvement of the asymptotic Elias bound for non-binary codes

Backprop without Learning Rates Through Coin Betting

A unified view of entropy-regularized Markov decision processes

Right-sided multifractal spectra indicate small-worldness in networks

Use Privacy in Data-Driven Systems: Theory and Experiments with Machine Learnt Programs

Information-theoretic analysis of generalization capability of learning algorithms

Sparse hierarchical interaction learning with epigraphical projection

TricorNet: A Hybrid Temporal Convolutional and Recurrent Network for Video Action Segmentation

Regularizing deep networks using efficient layerwise adversarial training

On deep holes of generalized projective Reed-Solomon codes

Comparing the Finite-Time Performance of Simulation-Optimization Algorithms

Ask the Right Questions: Active Question Reformulation with Reinforcement Learning

Stabilizing GAN Training with Multiple Random Projections

Concrete Dropout

Counting De Bruijn sequences as perturbations of linear recursions

Size Matters: Cardinality-Constrained Clustering and Outlier Detection via Conic Optimization

DepthCut: Improved Depth Edge Estimation Using Multiple Unreliable Channels

Machine-learning-assisted correction of correlated qubit errors in a topological code

Real Time Image Saliency for Black Box Classifiers

On-the-fly Operation Batching in Dynamic Computation Graphs

Symmetry Breaking in the Congest Model: Time- and Message-Efficient Algorithms for Ruling Sets

Finite Blocklength Rates over a Fading Channel with CSIT and CSIR

SmartPaste: Learning to Adapt Source Code

Facial Expression Recognition Using Enhanced Deep 3D Convolutional Neural Networks

A Framework for Sharing Confidential Research Data, Applied to Investigating Differential Pay by Race in the U. S. Government

A unified approach to interpreting model predictions

The effects of noise and time delay on the synchronization of the Kuramoto model in small-world networks

Newton polytopes and symmetric Grothendieck polynomials

Block building programming for symbolic regression

TernGrad: Ternary Gradients to Reduce Communication in Distributed Deep Learning

Reducing Reparameterization Gradient Variance

Dynamic Partition of Complex Networks

Facial Affect Estimation in the Wild Using Deep Residual and Convolutional Networks

If you did not already know

Exponential Moving Average
An exponential moving average (EMA), also known as an exponentially weighted moving average (EWMA), is a type of infinite impulse response filter that applies weighting factors which decrease exponentially. The weighting for each older datum decreases exponentially, never reaching zero. …
Exponential Moving Average (EMA) google
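
As a minimal sketch of the definition above: the recursion below assumes a smoothing factor `alpha` in (0, 1] and seeds the average with the first observation. Both choices, and the function names, are illustrative assumptions rather than part of the quoted definition.

```python
def ema(values, alpha):
    """Exponential moving average of a sequence.

    alpha is the smoothing factor (an assumed parameter name); the
    average is seeded with the first observation.
    """
    if not 0 < alpha <= 1:
        raise ValueError("alpha must be in (0, 1]")
    out = []
    prev = None
    for x in values:
        # New average = alpha * newest value + (1 - alpha) * old average.
        prev = x if prev is None else alpha * x + (1 - alpha) * prev
        out.append(prev)
    return out


def ema_weights(alpha, n):
    """Weights given to the n most recent observations, newest first.

    They shrink geometrically and never reach zero for alpha < 1.
    """
    return [alpha * (1 - alpha) ** k for k in range(n)]


smoothed = ema([1.0, 2.0, 3.0], alpha=0.5)
weights = ema_weights(0.5, 3)
```

With `alpha = 0.5` each older datum carries half the weight of the one after it, which is the exponential decay the definition describes.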
Pruned Exact Linear Time
This approach is based on the algorithm of Jackson et al. (2005 (‘An algorithm for optimal partitioning of data on an interval’)), but involves a pruning step within the dynamic program. This pruning reduces the computational cost of the method, but does not affect the exactness of the resulting segmentation. It can be applied to find changepoints under a range of statistical criteria such as penalised likelihood, quasi-likelihood (Braun et al., 2000 (‘Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation’)) and cumulative sum of squares (Inclan and Tiao, 1994 (‘Use of cumulative sums of squares for retrospective detection of changes of variance.’); Picard et al., 2011 (‘Joint segmentation, calling and normalization of multiple cgh profiles’)). In simulations we compare PELT with both Binary Segmentation and Optimal Partitioning. We show that PELT can be calculated orders of magnitude faster than Optimal Partitioning, particularly for long data sets. Whilst asymptotically PELT can be quicker, we find that in practice Binary Segmentation is quicker on the examples we consider, and we believe this would be the case in almost all applications. However, we show that PELT leads to a substantially more accurate segmentation than Binary Segmentation. …
Pruned Exact Linear Time (PELT) google
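
The pruning step described above can be sketched as follows. This is an illustration under stated assumptions, not the authors' implementation: the segment cost is taken to be the sum of squared deviations from the segment mean (a change-in-mean criterion), `beta` is an assumed name for the per-changepoint penalty, and the pruning constant is taken as zero, which is valid for this cost.

```python
import numpy as np


def pelt(y, beta):
    """Changepoints of y under a squared-error cost with penalty beta.

    A returned index t means a new segment starts at y[t].
    """
    y = np.asarray(y, dtype=float)
    n = len(y)
    # Prefix sums let us evaluate any segment cost in O(1).
    s1 = np.concatenate(([0.0], np.cumsum(y)))
    s2 = np.concatenate(([0.0], np.cumsum(y ** 2)))

    def cost(s, t):
        # Sum of squared deviations of y[s:t] from its own mean.
        return s2[t] - s2[s] - (s1[t] - s1[s]) ** 2 / (t - s)

    best = np.empty(n + 1)          # best[t]: optimal penalised cost of y[:t]
    best[0] = -beta
    last = np.zeros(n + 1, dtype=int)
    candidates = [0]                # last-changepoint candidates that survive
    for t in range(1, n + 1):
        vals = [best[s] + cost(s, t) + beta for s in candidates]
        i = int(np.argmin(vals))
        best[t] = vals[i]
        last[t] = candidates[i]
        # Pruning: keep s only if best[s] + cost(s, t) <= best[t];
        # any other s can never be the optimal last changepoint later.
        candidates = [s for s, v in zip(candidates, vals) if v - beta <= best[t]]
        candidates.append(t)

    # Backtrack through the stored last changepoints.
    changepoints = []
    t = n
    while t > 0:
        t = int(last[t])
        if t > 0:
            changepoints.append(t)
    return sorted(changepoints)


signal = [0.0] * 10 + [5.0] * 10
found = pelt(signal, beta=1.0)
```

Dropping the pruning line turns this into plain Optimal Partitioning with the same output, which is the sense in which pruning cuts cost without affecting exactness.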
Negative Binomial Regression
Negative binomial regression is for modeling count variables, usually for over-dispersed count outcome variables. …
Negative Binomial Regression (NBR) google

Document worth reading: “Unsupervised learning of phase transitions: from principal component analysis to variational autoencoders”

We employ unsupervised machine learning techniques to learn latent parameters which best describe states of the two-dimensional Ising model and the three-dimensional XY model. These methods range from principal component analysis to artificial neural network based variational autoencoders. The states are sampled using a Monte-Carlo simulation above and below the critical temperature. We find that the predicted latent parameters correspond to the known order parameters. The latent representations of the states of the models in question are clustered, which makes it possible to identify phases without prior knowledge of their existence or the underlying Hamiltonian. Furthermore, we find that the reconstruction loss function can be used as a universal identifier for phase transitions.

Distilled News

3 Ways to Move Your Data Science Into Production

In this live webinar, on May 24th at 11AM Central, learn how Anaconda empowers data scientists to encapsulate and deploy their data science projects as live applications with a single click.

Must-Know: Key issues and problems with A/B testing

A look at 2 topics in A/B testing: Ensuring that bucket assignment is truly random, and conducting an A/B test on an opt-in feature.

The Marcos Lopez de Prado Hierarchical Risk Parity Algorithm

This post will be about replicating the Marcos Lopez de Prado algorithm from his paper, Building Diversified Portfolios that Outperform Out-of-Sample. This algorithm attempts to make a tradeoff between the classic mean-variance optimization algorithm, which takes a covariance structure into account but is unstable, and an inverse volatility algorithm, which ignores covariance but is more stable. This is a paper that I struggled with until I ran the code in Python (I have Anaconda installed but have trouble installing some packages such as Keras because I’m on Windows…would love to have someone walk me through setting up a Linux dual-boot), as I assumed that the clustering algorithm was able to concretely group every asset into a particular cluster (i.e. ETF 1 would be in cluster 1, ETF 2 in cluster 3, etc.). Turns out, that isn’t at all the case. Here’s how the algorithm actually works.
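
For orientation, the inverse volatility baseline mentioned above can be sketched in a few lines. This is not the Hierarchical Risk Parity algorithm itself, only the simpler scheme it is contrasted with; the function name and the use of the sample standard deviation of returns as the volatility estimate are assumptions made for illustration.

```python
import numpy as np


def inverse_volatility_weights(returns):
    """Portfolio weights proportional to 1 / volatility.

    returns: 2-D array, one row per period and one column per asset.
    Covariances between assets are ignored entirely.
    """
    returns = np.asarray(returns, dtype=float)
    vol = returns.std(axis=0, ddof=1)   # per-asset sample volatility
    inv = 1.0 / vol
    return inv / inv.sum()              # normalise so weights sum to 1


# Hypothetical returns: the second asset is twice as volatile as the first.
example = np.array([[0.01, 0.02],
                    [-0.01, -0.02],
                    [0.01, 0.02],
                    [-0.01, -0.02]])
w = inverse_volatility_weights(example)
```

Because only the diagonal of the covariance matrix is used, no matrix inversion is needed, which is one reason this scheme is regarded as more stable than mean-variance optimization.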

Instrumental Variables in R exercises (Part-2)

This is the second part of the series on Instrumental Variables. For other parts of the series follow the tag instrumental variables. In this exercise set we will build on the example from part-1. We will now consider an over-identified case i.e. we have multiple IVs for an endogenous variable. We will also look at tests for endogeneity and over-identifying restrictions.

How to analyze max-diff data in R

This post discusses a number of options that are available in R for analyzing data from max-diff experiments, using the package flipMaxDiff. For a more detailed explanation of how to analyze max-diff, and what the outputs mean, you should read the post How max-diff analysis works. The post will cover the processes of installing packages, importing your data and experimental design, before discussing counting analysis and the more powerful, and valid, latent class analysis.

Principal Components Analysis

Principal components analysis (PCA) is a statistical technique that identifies underlying linear patterns in a data set so that it can be expressed in terms of another data set of significantly lower dimension without much loss of information. The final data set should explain most of the variance of the original data set using fewer variables. The final variables are called principal components.
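
A minimal sketch of the idea, assuming PCA is computed from the singular value decomposition of the centered data (one common route, not necessarily the one the linked post uses). The function name and return values are illustrative.

```python
import numpy as np


def pca(X, n_components):
    """Project X (rows = observations) onto its first principal components.

    Returns the projected data, the component directions, and the share
    of total variance explained by each kept component. Assumes the
    data are not all identical (total variance > 0).
    """
    X = np.asarray(X, dtype=float)
    Xc = X - X.mean(axis=0)                      # center each variable
    _, S, Vt = np.linalg.svd(Xc, full_matrices=False)
    components = Vt[:n_components]               # directions of max variance
    scores = Xc @ components.T                   # lower-dimensional data set
    explained = S ** 2 / np.sum(S ** 2)
    return scores, components, explained[:n_components]


# Two perfectly correlated variables collapse onto one component.
data = [[0, 0], [1, 2], [2, 4], [3, 6]]
scores, components, ratio = pca(data, n_components=1)
```

Here the single retained component explains essentially all of the variance, so the two original variables are replaced by one without loss of information.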

Book Memo: “Encyclopedia of Machine Learning and Data Mining”

This authoritative, expanded and updated second edition of Encyclopedia of Machine Learning and Data Mining provides easy access to core information for those seeking entry into any aspect within the broad field of Machine Learning and Data Mining. A paramount work, its 800 entries – about 150 of them newly updated or added – are filled with valuable literature references, providing the reader with a portal to more detailed information on any given topic. Topics for the Encyclopedia of Machine Learning and Data Mining include Learning and Logic, Data Mining, Applications, Text Mining, Statistical Learning, Reinforcement Learning, Pattern Mining, Graph Mining, Relational Mining, Evolutionary Computation, Information Theory, Behavior Cloning, and many others. Topics were selected by a distinguished international advisory board. Each peer-reviewed, highly-structured entry includes a definition, key words, an illustration, applications, a bibliography, and links to related literature. The entries are expository and tutorial, making this reference a practical resource for students, academics, or professionals who employ machine learning and data mining methods in their projects. Machine learning and data mining techniques have countless applications, including data science applications, and this reference is essential for anyone seeking quick access to vital information on the topic.

R Packages worth a look

Stubbing and Setting Expectations on ‘HTTP’ Requests (webmockr)
Stubbing and setting expectations on ‘HTTP’ requests. Includes tools for stubbing ‘HTTP’ requests, including expected request conditions and response conditions. Match on ‘HTTP’ method, query parameters, request body, headers and more.

‘alabama’ Plugin for the ‘R’ Optimization Infrastructure (ROI.plugin.alabama)
Enhances the R Optimization Infrastructure (‘ROI’) package with the ‘alabama’ solver for solving nonlinear optimization problems.

Automated Linear Regression Diagnostic (lindia)
Provides a set of streamlined functions that allow easy generation of linear regression diagnostic plots necessary for checking linear model assumptions. This package is meant for easy scheming of linear regression diagnostics, while preserving merits of ‘The Grammar of Graphics’ as implemented in ‘ggplot2’. See the ‘ggplot2’ website for more information regarding the specific capability of graphics.

R Wrapper to the spaCy NLP Library (spacyr)
An R wrapper to the ‘Python’ ‘spaCy’ ‘NLP’ library.

Power Analysis Tool for Joint Testing Hazards with Competing Risks Data (powerCompRisk)
A power analysis tool for jointly testing the cause-1 cause-specific hazard and the any-cause hazard with competing risks data.

If you did not already know

Kernel Fisher Discriminant Analysis
In statistics, kernel Fisher discriminant analysis (KFD), also known as generalized discriminant analysis and kernel discriminant analysis, is a kernelized version of linear discriminant analysis. It is named after Ronald Fisher. Using the kernel trick, LDA is implicitly performed in a new feature space, which allows non-linear mappings to be learned.
“Linear Discriminant Analysis”
Kernel Fisher Discriminant Analysis (KFD,KFDA) google
Multi-Advisor Reinforcement Learning
This article deals with a novel branch of Separation of Concerns, called Multi-Advisor Reinforcement Learning (MAd-RL), where a single-agent RL problem is distributed to $n$ learners, called advisors. Each advisor tries to solve the problem with a different focus. Their advice is then communicated to an aggregator, which is in control of the system. For the local training, three off-policy bootstrapping methods are proposed and analysed: local-max bootstraps with the local greedy action, rand-policy bootstraps with respect to the random policy, and agg-policy bootstraps with respect to the aggregator’s greedy policy. MAd-RL is positioned as a generalisation of Reinforcement Learning with Ensemble methods. An experiment is held on a simplified version of the Ms. Pac-Man Atari game. The results confirm the theoretical relative strengths and weaknesses of each method. … Multi-Advisor Reinforcement Learning google
Neural Networks / Artificial Neural Networks
In computer science and related fields, artificial neural networks (ANNs) are computational models inspired by animals’ central nervous systems (in particular the brain) that are capable of machine learning as well as pattern recognition. Artificial neural networks are generally presented as systems of interconnected “neurons” which can compute values from inputs. … Neural Networks / Artificial Neural Networks (ANN) google

If you did not already know

Partitional Clustering
Partitional clustering decomposes a data set into a set of disjoint clusters. Given a data set of N points, a partitioning method constructs K (N ≥ K) partitions of the data, with each partition representing a cluster. That is, it classifies the data into K groups by satisfying the following requirements:
(1) each group contains at least one point, and
(2) each point belongs to exactly one group. Notice that for fuzzy partitioning, a point can belong to more than one group.
Many partitional clustering algorithms try to minimize an objective function. …
Partitional Clustering google
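
As one concrete instance of a partitional method that minimizes an objective function, the sketch below uses k-means (Lloyd's algorithm) with the within-cluster sum of squared distances as the objective. The choice of k-means, the function name, and the parameters are assumptions for illustration; the sketch also does not re-seed a cluster that becomes empty, so requirement (1) is not strictly enforced.

```python
import numpy as np


def kmeans(X, k, n_iter=100, seed=0):
    """Partition the rows of X into k groups with Lloyd's algorithm.

    Returns one label per point (each point is in exactly one group),
    the cluster centers, and the within-cluster sum of squares.
    """
    X = np.asarray(X, dtype=float)
    rng = np.random.default_rng(seed)
    # Start from k distinct data points chosen at random.
    centers = X[rng.choice(len(X), size=k, replace=False)]

    def assign(c):
        # Squared distance from every point to every center.
        d = ((X[:, None, :] - c[None, :, :]) ** 2).sum(axis=2)
        return d.argmin(axis=1)

    for _ in range(n_iter):
        labels = assign(centers)
        new_centers = centers.copy()
        for j in range(k):
            members = X[labels == j]
            if len(members) > 0:         # keep the old center if a group empties
                new_centers[j] = members.mean(axis=0)
        if np.allclose(new_centers, centers):
            break
        centers = new_centers

    labels = assign(centers)
    objective = ((X - centers[labels]) ** 2).sum()
    return labels, centers, objective


points = [[0, 0], [0, 1], [10, 10], [10, 11]]
labels, centers, objective = kmeans(points, k=2)
```

Each point receives exactly one label, which is the disjointness requirement; a fuzzy method would instead return a membership weight for every point and group.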
Recurrent Collective Classification
We propose a new method for training iterative collective classifiers for labeling nodes in network data. The iterative classification algorithm (ICA) is a canonical method for incorporating relational information into classification. Yet, existing methods for training ICA models rely on the assumption that relational features reflect the true labels of the nodes. This unrealistic assumption introduces a bias that is inconsistent with the actual prediction algorithm. In this paper, we introduce recurrent collective classification (RCC), a variant of ICA analogous to recurrent neural network prediction. RCC accommodates any differentiable local classifier and relational feature functions. We provide gradient-based strategies for optimizing over model parameters to more directly minimize the loss function. In our experiments, this direct loss minimization translates to improved accuracy and robustness on real network data. We demonstrate the robustness of RCC in settings where local classification is very noisy, settings that are particularly challenging for ICA. … Recurrent Collective Classification (RCC) google
Data Acceleration
Data technologies are evolving rapidly, but organizations have adopted most of these in piecemeal fashion. As a result, enterprise data – whether related to customer interactions, business performance, computer notifications, or external events in the business environment – is vastly underutilized. Moreover, companies’ data ecosystems have become complex and littered with data silos. This makes the data more difficult to access, which in turn limits the value that organizations can get out of it. Indeed, according to a recent Gartner, Inc. report, 85 percent of Fortune 500 organizations will be unable to exploit Big Data for competitive advantage through 2015. Furthermore, a recent Accenture study found that half of all companies have concerns about the accuracy of their data, and the majority of executives are unclear about the business outcomes they are getting from their data analytics programs. To unlock the value hidden in their data, companies must start treating data as a supply chain, enabling it to flow easily and usefully through the entire organization – and eventually throughout each company’s ecosystem of partners, including suppliers and customers. The time is right for this approach. For one thing, new external data sources are becoming available, providing fresh opportunities for data insights. In addition, the tools and technology required to build a better data platform are available and in use. These provide a foundation on which companies can construct an integrated, end-to-end data supply chain. … Data Acceleration google