What’s new on arXiv

Deep Learning for Physical Processes: Incorporating Prior Scientific Knowledge

We consider the use of Deep Learning methods for modeling complex phenomena like those occurring in natural physical processes. With the large amount of data gathered on these phenomena, the data-intensive paradigm could begin to challenge more traditional approaches elaborated over the years in fields like mathematics or physics. However, despite considerable successes in a variety of application domains, the machine learning field is not yet ready to handle the level of complexity required by such problems. Using an example application, namely Sea Surface Temperature Prediction, we show how general background knowledge gained from physics could be used as a guideline for designing efficient Deep Learning models. In order to motivate the approach and to assess its generality, we demonstrate a formal link between the solution of a class of differential equations underlying a large family of physical phenomena and the proposed model. Experiments and comparisons with a series of baselines, including a state-of-the-art numerical approach, are then provided.

Self-Similarity Based Time Warping

In this work, we explore the problem of aligning two time-ordered point clouds which are spatially transformed and re-parameterized versions of each other. This has a diverse array of applications such as cross modal time series synchronization (e.g. MOCAP to video) and alignment of discretized curves in images. Most other works that address this problem attempt to jointly uncover a spatial alignment and correspondences between the two point clouds, or to derive local invariants to spatial transformations such as curvature before computing correspondences. By contrast, we sidestep spatial alignment completely by using self-similarity matrices (SSMs) as a proxy to the time-ordered point clouds, since self-similarity matrices are blind to isometries and respect global geometry. Our algorithm, dubbed ‘Isometry Blind Dynamic Time Warping’ (IBDTW), is simple and general, and we show that its associated dissimilarity measure lower bounds the L1 Gromov-Hausdorff distance between the two point sets when restricted to warping paths. We also present a local, partial alignment extension of IBDTW based on the Smith-Waterman algorithm. This eliminates the need for tedious manual cropping of time series, which is ordinarily necessary for global alignment algorithms to function properly.
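
The key property the abstract relies on, that an SSM is unchanged by isometries, is easy to verify numerically. Below is a minimal NumPy sketch (an illustration, not the authors’ IBDTW code) that builds an SSM from a time-ordered point cloud and checks that it survives a rigid rotation:

```python
import numpy as np

def ssm(X):
    """Self-similarity matrix: pairwise Euclidean distances between
    the time-ordered points of X (shape T x d)."""
    diff = X[:, None, :] - X[None, :, :]
    return np.linalg.norm(diff, axis=-1)

# A rigid rotation is an isometry, so it leaves the SSM unchanged,
# even though it scrambles the raw coordinates.
T = 50
t = np.linspace(0, 2 * np.pi, T)
X = np.stack([np.cos(t), np.sin(t)], axis=1)     # a time-ordered curve in the plane
theta = 0.7
R = np.array([[np.cos(theta), -np.sin(theta)],
              [np.sin(theta),  np.cos(theta)]])  # rotation matrix
print(np.allclose(ssm(X), ssm(X @ R.T)))         # True
```

Because the SSM depends only on pairwise distances, any dynamic-time-warping style alignment computed from the two SSMs never needs to recover the spatial transformation at all.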

Residual Gated Graph ConvNets

Graph-structured data such as functional brain networks, social networks, gene regulatory networks, and communications networks have sparked interest in generalizing neural networks to graph domains. In this paper, we are interested in designing efficient neural network architectures for graphs with variable length. Several existing works such as Scarselli et al. (2009) and Li et al. (2016) have focused on recurrent neural networks (RNNs) to solve this task. A different approach was recently proposed in Sukhbaatar et al. (2016), where a vanilla graph convolutional neural network (ConvNet) was introduced. We believe the latter approach to be a better paradigm for graph learning problems because ConvNets are better suited to deep architectures than RNNs. For this reason, we propose the most generic class of residual multi-layer graph ConvNets that make use of an edge gating mechanism, as proposed in Marcheggiani & Titov (2017). Gated edges appear to be a natural property in the context of graph learning tasks, as the system can learn which edges are important for the task at hand. We apply several graph neural models to two basic network science tasks: subgraph matching and semi-supervised clustering for graphs with variable length. Numerical results show the performance of the new model.
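
As a rough sketch of the edge gating mechanism described above (plain NumPy with hypothetical weight matrices U, V, A, B; not the paper’s exact residual layer), each message along an edge is modulated by a sigmoid gate computed from the two endpoint features:

```python
import numpy as np

rng = np.random.default_rng(0)

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def gated_graph_conv(H, adj, U, V, A, B):
    """One edge-gated graph convolution step (toy NumPy version).
    H: node features (n, d); adj: binary adjacency matrix (n, n)."""
    n, d = H.shape
    H_new = np.empty_like(H)
    for i in range(n):
        agg = np.zeros(d)
        for j in np.nonzero(adj[i])[0]:
            eta = sigmoid(A @ H[i] + B @ H[j])      # per-dimension edge gate in (0, 1)
            agg += eta * (V @ H[j])                 # gated message from neighbour j
        H_new[i] = np.maximum(0.0, U @ H[i] + agg)  # ReLU
    return H_new

n, d = 5, 8
H = rng.standard_normal((n, d))
adj = (rng.random((n, n)) < 0.4).astype(float)
np.fill_diagonal(adj, 0)
U, V, A, B = (rng.standard_normal((d, d)) for _ in range(4))
out = gated_graph_conv(H, adj, U, V, A, B)
```

A gate near zero effectively deletes an edge for the task at hand, which is the "learn which edges are important" behavior the abstract describes.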

Spec-QP: Speculative Query Planning for Joins over Knowledge Graphs

Organisations store huge amounts of data from multiple heterogeneous sources in the form of Knowledge Graphs (KGs). One of the ways to query these KGs is to use SPARQL queries over a database engine. Since SPARQL follows exact match semantics, the queries may return too few or no results. Recent works have proposed query relaxation where the query engine judiciously replaces a query predicate with similar predicates using weighted relaxation rules mined from the KG. The space of possible relaxations is potentially too large to fully explore and users are typically interested in only top-k results, so such query engines use top-k algorithms for query processing. However, they may still process all the relaxations, many of whose answers do not contribute towards top-k answers. This leads to computation overheads and delayed response times. We propose Spec-QP, a query planning framework that speculatively determines which relaxations will have their results in the top-k answers. Only these relaxations are processed using the top-k operators. We, therefore, reduce the computation overheads and achieve faster response times without adversely affecting the quality of results. We tested Spec-QP over two datasets, XKG and Twitter, to demonstrate the efficiency of our planning framework at reducing runtimes with reasonable accuracy for query engines supporting relaxations.

Sparse-Input Neural Networks for High-dimensional Nonparametric Regression and Classification

Neural networks are usually not the tool of choice for nonparametric high-dimensional problems where the number of input features is much larger than the number of observations. Though neural networks can approximate complex multivariate functions, they generally require a large number of training observations to obtain reasonable fits, unless one can learn the appropriate network structure. In this manuscript, we show that neural networks can be applied successfully to high-dimensional settings if the true function falls in a low dimensional subspace, and proper regularization is used. We propose fitting a neural network with a sparse group lasso penalty on the first-layer input weights, which results in a neural net that only uses a small subset of the original features. In addition, we characterize the statistical convergence of the penalized empirical risk minimizer to the optimal neural network: we show that the excess risk of this penalized estimator only grows with the logarithm of the number of input features; and we show that the weights of irrelevant features converge to zero. Via simulation studies and data analyses, we show that these sparse-input neural networks outperform existing nonparametric high-dimensional estimation methods when the data has complex higher-order interactions.
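
The penalty itself is short to write down. In the sketch below (an illustration with hypothetical hyperparameter names `alpha` and `lam`, not the authors’ code), each row of the first-layer weight matrix, i.e. all outgoing weights of one input feature, forms a group:

```python
import numpy as np

def sparse_group_lasso(W1, alpha=0.5, lam=0.1):
    """Sparse group lasso penalty on first-layer weights W1 (shape p x h).
    Each input feature's outgoing weights (one row of W1) form a group,
    so driving a whole row to zero drops that feature from the network."""
    lasso = np.abs(W1).sum()                   # elementwise L1 term
    group = np.linalg.norm(W1, axis=1).sum()   # one L2 norm per input feature
    return lam * (alpha * lasso + (1 - alpha) * group)

W1 = np.array([[0.0, 0.0, 0.0],    # feature 1 pruned entirely (group-sparse)
               [1.0, -2.0, 0.5]])  # feature 2 still active
penalty = sparse_group_lasso(W1)
```

Adding this term to the empirical risk and optimizing (typically with a proximal method) encourages whole rows, and hence whole input features, to be zeroed out.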

Event Representations with Tensor-based Compositions

Robust and flexible event representations are important to many core areas in language understanding. Scripts were proposed early on as a way of representing sequences of events for such understanding, and have recently attracted renewed attention. However, obtaining effective representations for modeling script-like event sequences is challenging: it requires representations that can capture both event-level and scenario-level semantics. We propose a new tensor-based composition method for creating event representations. The method captures more subtle semantic interactions between an event and its entities and yields representations that are effective at multiple event-related tasks. With the continuous representations, we also devise a simple schema generation method which produces better schemas compared to a prior discrete-representation-based method. Our analysis shows that the tensors capture distinct usages of a predicate even when there are only subtle differences in their surface realizations.

Proximal Alternating Direction Network: A Globally Converged Deep Unrolling Framework

Deep learning models have achieved great success in many real-world applications. However, most existing networks are designed in heuristic manners and thus lack rigorous mathematical principles and derivations. Several recent studies build deep structures by unrolling a particular optimization model that incorporates task information. Unfortunately, due to the dynamic nature of the network parameters, the resulting deep propagation networks do not possess the nice convergence property of the original optimization scheme. This paper provides a novel proximal unrolling framework to establish deep models by integrating experimentally verified network architectures and rich cues from the tasks. More importantly, we prove in theory that 1) the propagation generated by our unrolled deep model globally converges to a critical point of a given variational energy, and 2) the proposed framework can still learn priors from training data to generate a convergent propagation even when task information is only partially available. Indeed, these theoretical results are the best we can ask for unless stronger assumptions are enforced. Extensive experiments on various real-world applications verify the theoretical convergence and demonstrate the effectiveness of the designed deep models.

Genetic Algorithms for Evolving Deep Neural Networks

In recent years, deep learning methods applying unsupervised learning to train deep layers of neural networks have achieved remarkable results in numerous fields. In the past, many genetic algorithms based methods have been successfully applied to training neural networks. In this paper, we extend previous work and propose a GA-assisted method for deep learning. Our experimental results indicate that this GA-assisted approach improves the performance of a deep autoencoder, producing a sparser neural network.

Repulsion Loss: Detecting Pedestrians in a Crowd

Detecting individual pedestrians in a crowd remains a challenging problem since the pedestrians often gather together and occlude each other in real-world scenarios. In this paper, we first explore how a state-of-the-art pedestrian detector is harmed by crowd occlusion via experimentation, providing insights into the crowd occlusion problem. Then, we propose a novel bounding box regression loss specifically designed for crowd scenes, termed repulsion loss. This loss is driven by two motivations: the attraction by target, and the repulsion by other surrounding objects. The repulsion term prevents the proposal from shifting to surrounding objects thus leading to more crowd-robust localization. Our detector trained by repulsion loss outperforms all the state-of-the-art methods with a significant improvement in occlusion cases.
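
A toy version of the attraction-plus-repulsion idea can be written in a few lines (a simplified sketch; the paper’s actual loss uses smoothed distance and overlap terms rather than the raw L1 and IoU used here):

```python
import numpy as np

def iou(a, b):
    """Intersection over union of two boxes [x1, y1, x2, y2]."""
    x1, y1 = max(a[0], b[0]), max(a[1], b[1])
    x2, y2 = min(a[2], b[2]), min(a[3], b[3])
    inter = max(0.0, x2 - x1) * max(0.0, y2 - y1)
    area = lambda r: (r[2] - r[0]) * (r[3] - r[1])
    return inter / (area(a) + area(b) - inter)

def repulsion_style_loss(pred, target, others, beta=0.5):
    """Toy attraction + repulsion objective: pull the predicted box toward
    its own target (L1 term) while penalizing overlap with other boxes."""
    attract = float(np.abs(np.asarray(pred) - np.asarray(target)).sum())
    repel = sum(iou(pred, o) for o in others)   # overlap with non-targets
    return attract + beta * repel

loss = repulsion_style_loss([0, 0, 2, 2], [0, 0, 2, 2], others=[[1, 1, 3, 3]])
```

Even with a perfect attraction term (zero L1 error), the repulsion term still pushes the proposal off any neighbouring box it overlaps, which is what makes the localization crowd-robust.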

Understanding Deep Learning Generalization by Maximum Entropy

Deep learning achieves remarkable generalization capability with an overwhelming number of model parameters. Theoretical understanding of deep learning generalization has received recent attention yet remains not fully explored. This paper attempts to provide an alternative understanding from the perspective of maximum entropy. We first derive two feature conditions under which softmax regression strictly applies the maximum entropy principle. A DNN is then regarded as approximating these feature conditions with multilayer feature learning, and is proved to be a recursive solution towards the maximum entropy principle. The connection between DNNs and maximum entropy explains why typical designs such as shortcut connections and regularization improve model generalization, and provides guidance for future model development.

Detecting independence of random vectors II. Distance multivariance and Gaussian multivariance

We introduce two new measures for the dependence of n ≥ 2 random variables: ‘distance multivariance’ and ‘total distance multivariance’. Both measures are based on the weighted L^2-distance of quantities related to the characteristic functions of the underlying random variables. They extend distance covariance (introduced by Székely, Rizzo and Bakirov) and generalized distance covariance (introduced in part I) from pairs of random variables to n-tuples of random variables. We show that total distance multivariance can be used to detect the independence of n random variables and has a simple finite-sample representation in terms of distance matrices of the sample points, where distance is measured by a continuous negative definite function. Based on our theoretical results, we present a test for independence of multiple random vectors which is consistent against all alternatives.
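
For the pairwise building block, the finite-sample recipe via distance matrices is short. Below is a NumPy sketch of the classical squared sample distance covariance of Székely, Rizzo and Bakirov, which the paper generalizes to n-tuples:

```python
import numpy as np

def dist_matrix(X):
    """Pairwise Euclidean distance matrix of the sample points (rows of X)."""
    return np.linalg.norm(X[:, None, :] - X[None, :, :], axis=-1)

def double_center(D):
    """Subtract row means and column means, add back the grand mean."""
    return D - D.mean(0, keepdims=True) - D.mean(1, keepdims=True) + D.mean()

def dcov2(X, Y):
    """Squared sample distance covariance: the average elementwise product
    of the double-centered distance matrices of the two samples."""
    A = double_center(dist_matrix(X))
    B = double_center(dist_matrix(Y))
    return float((A * B).mean())

rng = np.random.default_rng(1)
X = rng.standard_normal((200, 2))
Y = X + 0.1 * rng.standard_normal((200, 2))  # strongly dependent on X
Z = rng.standard_normal((200, 2))            # independent of X
```

The statistic is zero in the population exactly when the two vectors are independent, which is what makes it usable as a test statistic.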

Detecting independence of random vectors I. Generalized distance covariance and Gaussian covariance

Distance covariance is a quantity to measure the dependence of two random vectors. We show that the original concept introduced and developed by Székely, Rizzo and Bakirov can be embedded into a more general framework based on symmetric Lévy measures and the corresponding real-valued continuous negative definite functions. The Lévy measures replace the weight functions used in the original definition of distance covariance. All essential properties of distance covariance are preserved in this new framework and some proofs are streamlined. From a practical point of view this allows less restrictive moment conditions on the underlying random variables and one can use distance functions other than the Euclidean distance, e.g. the Minkowski distance. Most importantly, it serves as the basic building block for distance multivariance, a quantity to measure and estimate the dependence of multiple random vectors, which is introduced in the companion paper [Detecting independence of random vectors II: Distance multivariance and Gaussian multivariance] to the present article.

Hidden Tree Markov Networks: Deep and Wide Learning for Structured Data

The paper introduces the Hidden Tree Markov Network (HTN), a neuro-probabilistic hybrid fusing the representation power of generative models for trees with the incremental and discriminative learning capabilities of neural networks. We put forward a modular architecture in which multiple generative models of limited complexity are trained to learn structural feature detectors whose outputs are then combined and integrated by neural layers at a later stage. In this respect, the model is both deep, thanks to the unfolding of the generative models on the input structures, and wide, given the potentially large number of generative modules that can be trained in parallel. Experimental results show that the proposed approach can outperform state-of-the-art syntactic kernels as well as generative kernels built on the same probabilistic model as the HTN.

Visual and Textual Sentiment Analysis Using Deep Fusion Convolutional Neural Networks

Sentiment analysis is attracting more and more attention and has become a very hot research topic due to its potential applications in personalized recommendation, opinion mining, etc. Most existing methods are based on either textual or visual data alone and cannot achieve satisfactory results, as it is very hard to extract sufficient information from a single modality. Inspired by the observation that there exists a strong semantic correlation between visual and textual data in social media, we propose an end-to-end deep fusion convolutional neural network to jointly learn textual and visual sentiment representations from training examples. The information from the two modalities is fused together in a pooling layer and fed into fully-connected layers to predict the sentiment polarity. We evaluate the proposed approach on two widely used data sets. Results show that our method achieves promising results compared with the state-of-the-art methods, which clearly demonstrates its competency.

Universal Denoising Networks: A Novel CNN-based Network Architecture for Image Denoising

We design a novel network architecture for learning discriminative image models that are employed to efficiently tackle the problem of grayscale and color image denoising. Based on the proposed architecture, we introduce two different variants. The first network involves convolutional layers as a core component, while the second one relies instead on non-local filtering layers and thus is able to exploit the inherent non-local self-similarity property of natural images. As opposed to most existing neural networks, which require the training of a specific model for each considered noise level, the proposed networks are able to handle a wide range of different noise levels, while they are very robust when the noise degrading the latent image does not match the statistics of the one used during training. The latter argument is supported by results that we report on publicly available images corrupted by unknown noise and which we compare against solutions obtained by alternative state-of-the-art methods. At the same time the introduced networks achieve excellent results under additive white Gaussian noise (AWGN), which are comparable to the current state-of-the-art network, while they rely on a shallower architecture with the number of trained parameters being one order of magnitude smaller. These properties make the proposed networks ideal candidates to serve as sub-solvers in restoration methods that deal with general inverse imaging problems such as deblurring, demosaicking, super-resolution, etc.

Adversarial Network Embedding

Learning low-dimensional representations of networks has proved effective in a variety of tasks such as node classification, link prediction and network visualization. Existing methods can effectively encode different structural properties into the representations, such as neighborhood connectivity patterns, global structural role similarities and other high-order proximities. However, beyond objectives that capture network structural properties, most of them lack additional constraints for enhancing the robustness of representations. In this paper, we aim to exploit the strengths of generative adversarial networks in capturing latent features, and investigate their contribution to learning stable and robust graph representations. Specifically, we propose an Adversarial Network Embedding (ANE) framework, which leverages the adversarial learning principle to regularize the representation learning. It consists of two components, i.e., a structure preserving component and an adversarial learning component. The former component aims to capture network structural properties, while the latter contributes to learning robust representations by matching the posterior distribution of the latent representations to given priors. As shown by the empirical results, our method is competitive with or superior to state-of-the-art approaches on benchmark network embedding tasks.

Functional Map of the World

We present a new dataset, Functional Map of the World (fMoW), which aims to inspire the development of machine learning models capable of predicting the functional purpose of buildings and land use from temporal sequences of satellite images and a rich set of metadata features. The metadata provided with each image enables reasoning about location, time, sun angles, physical sizes, and other features when making predictions about objects in the image. Our dataset consists of over 1 million images from over 200 countries. For each image, we provide at least one bounding box annotation containing one of 63 categories, including a ‘false detection’ category. We present an analysis of the dataset along with baseline approaches that reason about metadata and temporal views. Our data, code, and pretrained models have been made publicly available.

Autoencoder Node Saliency: Selecting Relevant Latent Representations

The autoencoder is an artificial neural network model that learns hidden representations of unlabeled data. With a linear transfer function it is similar to principal component analysis (PCA). While both methods use weight vectors for linear transformations, the autoencoder does not come with an indicator of importance analogous to the eigenvalues that PCA pairs with its eigenvectors. We propose a novel supervised node saliency (SNS) method that ranks the hidden nodes by comparing class distributions of latent representations against a fixed reference distribution. The latent representations of a hidden node can be described using a one-dimensional histogram. We apply the normalized entropy difference (NED) to measure the ‘interestingness’ of the histograms, and derive a property of NED values that identifies a good classifying node. By applying our methods to real data sets, we demonstrate the ability of SNS to explain what the trained autoencoders have learned.
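
As a rough illustration of entropy-based ‘interestingness’ scoring (the paper’s exact NED definition may differ from this sketch), a histogram’s Shannon entropy can be normalized by its maximum, so flat histograms score near 1 and concentrated ones score lower:

```python
import numpy as np

def normalized_entropy(counts):
    """Shannon entropy of a histogram, normalized to [0, 1] by log(#bins)."""
    p = np.asarray(counts, dtype=float)
    p = p / p.sum()
    p = p[p > 0]                       # 0 * log(0) is treated as 0
    return float(-(p * np.log(p)).sum() / np.log(len(counts)))

flat = normalized_entropy([10, 10, 10, 10])   # uninformative activations: near 1
peaked = normalized_entropy([37, 1, 1, 1])    # concentrated activations: lower
```

Comparing such normalized entropies between the per-class histograms of a hidden node and a reference distribution gives a scalar saliency score by which nodes can be ranked.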

Domain Generalization by Marginal Transfer Learning

Domain generalization is the problem of assigning class labels to an unlabeled test data set, given several labeled training data sets drawn from similar distributions. This problem arises in several applications where data distributions fluctuate because of biological, technical, or other sources of variation. We develop a distribution-free, kernel-based approach that predicts a classifier from the marginal distribution of features, by leveraging the trends present in related classification tasks. This approach involves identifying an appropriate reproducing kernel Hilbert space and optimizing a regularized empirical risk over the space. We present generalization error analysis, describe universal kernels, and establish universal consistency of the proposed methodology. Experimental results on synthetic data and three real data applications demonstrate the superiority of the method with respect to a pooling strategy.

Non-local Neural Networks

Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our non-local models can compete with or outperform current competition winners on both the Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code will be made available.
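
The non-local operation is compact enough to sketch directly. The NumPy toy below implements the weighted-sum-over-all-positions idea with softmax weights from pairwise dot products (one common instantiation; the paper also considers other pairwise functions):

```python
import numpy as np

def softmax(z, axis=-1):
    z = z - z.max(axis=axis, keepdims=True)   # numerically stable softmax
    e = np.exp(z)
    return e / e.sum(axis=axis, keepdims=True)

def nonlocal_op(X, W_theta, W_phi, W_g):
    """Non-local operation: the response at each position is a weighted sum
    of the (transformed) features at ALL positions, not just a neighborhood."""
    theta, phi, g = X @ W_theta, X @ W_phi, X @ W_g
    attn = softmax(theta @ phi.T, axis=1)     # (N, N): weights over all positions
    return attn @ g

rng = np.random.default_rng(0)
N, d = 16, 8            # N positions (e.g. a flattened feature map), d channels
X = rng.standard_normal((N, d))
W_theta, W_phi, W_g = (rng.standard_normal((d, d)) for _ in range(3))
Y = nonlocal_op(X, W_theta, W_phi, W_g)
```

Unlike a convolution, whose receptive field grows only with depth, a single such block already connects every position to every other one.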

The Combinatorics of Higher Derivatives of Implicit Functions
Robust Environmental Mapping by Mobile Sensor Networks
Optimistic Robust Optimization With Applications To Machine Learning
Non-Gaussian Autoregressive Processes with Tukey g-and-h Transformations
Treatment Effect Quantification for Time-to-event Endpoints — Estimands, Analysis Strategies, and beyond
Dropping Activation Outputs with Localized First-layer Deep Network for Enhancing User Privacy and Data Security
On k-neighbor separated permutations
Subgroup Identification and Interpretation with Bayesian Nonparametric Models in Health Care Claims Data
Path properties of the solution to the stochastic heat equation with Lévy noise
Transition density estimates for diagonal systems of SDEs driven by cylindrical $α$-stable processes
Convergence of Finite Element Methods for Singular Stochastic Control
On estimating the alphabet size of a discrete random source
Collective behavior of oscillating electric dipoles
Review on Parameter Estimation in HMRF
Unbiased Simulation for Optimizing Stochastic Function Compositions
Neural 3D Mesh Renderer
Edge Estimation with Independent Set Oracles
On Nearest Neighbors in Non Local Means Denoising
Finding Differentially Covarying Needles in a Temporally Evolving Haystack: A Scan Statistics Perspective
Chaos expansion of 2D parabolic Anderson model
CVXR: An R Package for Disciplined Convex Optimization
Pure state ‘really’ informationally complete with rank-1 POVM
Domination structure for number three
An Enhanced Middleware for Collaborative Privacy in IPTV Recommender Services
On the Distortion of Voting with Multiple Representative Candidates
Pixie: A System for Recommending 3+ Billion Items to 200+ Million Users in Real-Time
Non-spanning lattice 3-polytopes
Knowledge Concentration: Learning 100K Object Classifiers in a Single CNN
Are You Talking to Me? Reasoned Visual Dialog Generation through Adversarial Learning
Asking the Difficult Questions: Goal-Oriented Visual Question Generation via Intermediate Rewards
Dynamic Distributed Storage for Scaling Blockchains
$S^4$Net: Single Stage Salient-Instance Segmentation
Groupwise Maximin Fair Allocation of Indivisible Goods
A deep learning-based method for relative location prediction in CT scan images
On statistical approaches to generate Level 3 products from remote sensing retrievals
Bounds on Fractional Repetition Codes using Hypergraphs
Generating Thematic Chinese Poetry with Conditional Variational Autoencoder
High-Dimensional Multivariate Posterior Consistency Under Global-Local Shrinkage Priors
Towards a More Reliable Privacy-preserving Recommender System
HybridTune: Spatio-temporal Data and Model Driven Performance Diagnosis for Big Data Systems
Multi-Image Semantic Matching by Mining Consistent Features
Evaluating Machine Translation Performance on Chinese Idioms with a Blacklist Method
Cross Temporal Recurrent Networks for Ranking Question Answer Pairs
Fullie and Wiselie: A Dual-Stream Recurrent Convolutional Attention Model for Activity Recognition
Asymptotic independence of regenerative processes with dependent cycles
The Pontryagin Maximum Principle in the Wasserstein Space
Massive MIMO for Drone Communications: Applications, Case Studies and Future Directions
Mondrian Processes for Flow Cytometry Analysis
Transferring Agent Behaviors from Videos via Motion GANs
Controllability under positivity constraints of semilinear heat equations
Induced subgraphs of graphs with large chromatic number. XI. Orientations
Asymptotically Minimax Robust Hypothesis Testing
JamBot: Music Theory Aware Chord Based Generation of Polyphonic Music with LSTMs
A two-dimensional decomposition approach for matrix completion through gossip
Regret Analysis for Continuous Dueling Bandit
Fully Convolutional Neural Networks for Page Segmentation of Historical Document Images
Construction of asymptotically good locally repairable codes via automorphism groups of function fields
Multiplicity One property of The Length Spectra of Simple Regular Periodic Graphs
Wick order, spreadability and exchangeability for monotone commutation relations
Approximating Geometric Knapsack via L-packings
Invariant measures of discrete interacting particle systems: Algebraic aspects
Residual Parameter Transfer for Deep Domain Adaptation
Partially Observed Functional Data: The Case of Systematically Missing Parts
Total Variation-Based Dense Depth from Multi-Camera Array
The Application of Preconditioned Alternating Direction Method of Multipliers in Depth from Focal Stack
On the Turán number of ordered forests
Using stochastic computation graphs formalism for optimization of sequence-to-sequence model
A New Approach for Solving the Market Clearing Problem With Uniform Purchase Price and Curtailable Block Orders
Variational Probability Flow for Biologically Plausible Training of Deep Neural Networks
Non-uniform Replication
The Turán number of 2P_7
On $P_5$-free Chordal bipartite graphs
The Hidden Binary Search Tree: A Balanced Rotation-Free Search Tree in the AVL RAM Model
Model-based Clustering with Sparse Covariance Matrices
Continuity equation in LlogL for the 2D Euler equations under the enstrophy measure
Beyond Accuracy Optimization: On the Value of Item Embeddings for Student Job Recommendations
Data Assimilation for a Geological Process Model Using the Ensemble Kalman Filter
Receptive Field Block Net for Accurate and Fast Object Detection
Localisation in a growth model with interaction
Ultra-Reliable Low Latency Communication (URLLC) using Interface Diversity
A Geometric Approach to Spectral Analysis
Energy-Efficient Transmission Strategies for CoMP Downlink – Overview, Extension, and Numerical Comparison
Presentations of the saturated cluster modular groups of finite mutation type $X_6$ and $X_7$
Hierarchical internal representation of spectral features in deep convolutional networks trained for EEG decoding
Efficient Multi-Person Pose Estimation with Provable Guarantees
Uniqueness of Dirichlet forms related to infinite systems of interacting Brownian motions
Why ‘Redefining Statistical Significance’ Will Not Improve Reproducibility and Could Make the Replication Crisis Worse
Approaching Miscorrection-free Performance of Product and Generalized Product Codes
Jaccard analysis and LASSO-based feature selection for location fingerprinting with limited computational complexity
On the EM-Tau algorithm: a new EM-style algorithm with partial E-steps
Evaluation of bioinspired algorithms for the solution of the job scheduling problem
Efficient Implementation of a Recognition System Using the Cortex Ventral Stream Model
A smartphone application to measure the quality of pest control spraying machines via image analysis
Discussion among Different Methods of Updating Model Filter in Object Tracking
Robust Object Tracking Based on Self-adaptive Search Area
UnFlow: Unsupervised Learning of Optical Flow with a Bidirectional Census Loss
Application of generative autoencoder in de novo molecular design
Approximation Algorithms for Rectangle Packing Problems (PhD Thesis)
Accurate Semidefinite Programming Models for Optimal Power Flow in Distribution Systems
Mobility and Popularity-Aware Coded Small-Cell Caching
Integrable Combinatorics
Dynamic topologies of activity-driven temporal networks with memory
Universal minimal flows of generalized Ważewski dendrites
Revisiting Connected Vertex Cover: FPT Algorithms and Lossy Kernels
Constructive Preference Elicitation over Hybrid Combinatorial Spaces
Training large margin host-pathogen protein-protein interaction predictors
SilNet : Single- and Multi-View Reconstruction by Learning from Silhouettes
A New Algorithm to Fit Exponential Decays
Effective Strategies in Zero-Shot Neural Machine Translation
Quantifying Performance of Bipedal Standing with Multi-channel EMG
Two-Archive Evolutionary Algorithm for Constrained Multi-Objective Optimization
Effective Use of Bidirectional Language Modeling for Medical Named Entity Recognition
Optimal Sleeping Mechanism for Multiple Servers with MMPP-Based Bursty Traffic Arrival
Asymptotic Close To Optimal Joint Resource Allocation and Power Control in the Uplink of Two-cell Networks
10Sent: A Stable Sentiment Analysis Method Based on the Combination of Off-The-Shelf Approaches
Dimension Drop for Transient Random Walks on Galton-Watson Trees in Random Environments
Kullback-Leibler Principal Component for Tensors is not NP-hard
Aperture Supervision for Monocular Depth Estimation
Randomization Bias in Field Trials to Evaluate Targeting Methods
Mastering the Dungeon: Grounded Language Learning by Mechanical Turker Descent
Cellular Automata Simulation on FPGA for Training Neural Networks with Virtual World Imagery
Time-Limited Toeplitz Operators on Abelian Groups: Applications in Information Theory and Subspace Approximation
Fine-Grained I/O Complexity via Reductions: New lower bounds, faster algorithms, and a time hierarchy
Local cohomology and the multi-graded regularity of FI$^m$-modules
Deep Learning for Real-time Gravitational Wave Detection and Parameter Estimation with Advanced LIGO Data
A Compositional Treatment of Iterated Open Games
WAYLA – Generating Images from Eye Movements
Low-Complexity Integer-Forcing Methods for Block Fading MIMO Multiple-Access Channels


Distilled News

Using an R ‘template’ package to enhance reproducible research or the ‘R package syndrome’

Have you ever had the feeling that the creation of your data analysis report(s) resulted in looking up, copy-pasting and reusing code from previous analyses? This approach is time-consuming and prone to errors. If you frequently analyze similar data(-types), e.g. from a standardized analysis workflow or different experiments on the same platform, automating your report creation via an R ‘template’ package might be a very useful and time-saving step. It also allows you to focus on the important part of the analysis (i.e. the experiment- or analysis-specific part). If you need to analyze tens or hundreds of runs of data in the same format, making use of an R ‘template’ package can save you hours, days or even weeks. As you go, reports can be adjusted, errors corrected and extensions added without much effort.

CNN for Short-Term Stocks Prediction using Tensorflow

In machine learning, a convolutional neural network (CNN, or ConvNet) is a class of neural networks that has been successfully applied to image recognition and analysis. In this project I’ve tried to apply this class of models to stock market prediction, combining stock prices with sentiment analysis. The network was implemented in TensorFlow, starting from the online tutorial. In this article, I will describe the following steps: dataset creation, CNN training and evaluation of the model.
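The article’s TensorFlow code isn’t reproduced here, but the core convolutional operation applied to a price series can be sketched in a few lines of numpy (the series and the filter below are invented for illustration):

```python
import numpy as np

# A toy price series and a hand-picked "trend" filter (both illustrative).
prices = np.array([10.0, 10.5, 10.2, 10.8, 11.0, 10.9, 11.4, 11.2])
kernel = np.array([-1.0, 0.0, 1.0])

# The convolutional layer's core operation: correlate the filter with each
# sliding window of the series, then apply a ReLU non-linearity.
feature_map = np.correlate(prices, kernel, mode="valid")
activations = np.maximum(feature_map, 0.0)   # ReLU
```

A CNN learns such filters from data rather than using hand-picked ones; stacking many of them, plus pooling and dense layers, gives the full model the article describes.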

Using Data Analytics to Prevent, Not Just Report

I recently had another client conversation about optimizing their data warehouse and Business Intelligence (BI) environment. The client had lots of pride in their existing data warehouse and business intelligence accomplishments, and rightfully so. The heart of the conversation was about taking costs out of their reporting environments by consolidating runaway data marts and “spreadmarts,” and improving business analyst BI self-sufficiency. These types of conversations are good – saving money and improving effectiveness is always a good thing – but organizations need to be careful that they are not just “paving the cow path.” That is, are they just optimizing existing (old school) processes when new methodologies exist that can possibly eliminate those processes? Or as I challenged the customer: “Do you want to report, or do you want to prevent?”

State-of-the-art result for all Machine Learning Problems

This repository provides state-of-the-art (SoTA) results for all machine learning problems.

Estimating an Optimal Learning Rate For a Deep Neural Network

The learning rate is one of the most important hyper-parameters to tune when training deep neural networks. In this post, I describe a simple and powerful way to find a reasonable learning rate, which I learned from the Deep Learning course. I’m taking the new version of the course in person at the University of San Francisco. It isn’t available to the general public yet, but will be at the end of the year (the course site currently has last year’s version).
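The post itself carries the details; the underlying idea can be sketched as a learning rate range test: run a trial loop in which the learning rate grows exponentially each step, record the loss, and pick a rate from the region where the loss falls fastest. The toy quadratic loss and all names below are illustrative, not the course’s implementation:

```python
import numpy as np

def lr_range_test(grad_fn, loss_fn, w0, lr_min=1e-5, lr_max=3.0, steps=50):
    """Exponentially increase the learning rate each step and record the loss.
    A reasonable learning rate lies where the recorded loss drops fastest,
    before it blows up."""
    w = np.array(w0, dtype=float)
    lrs = np.geomspace(lr_min, lr_max, steps)
    losses = []
    for lr in lrs:
        losses.append(loss_fn(w))
        w = w - lr * grad_fn(w)   # one SGD step at this trial rate
    return lrs, np.array(losses)

# Toy example: quadratic loss 0.5*||w||^2 with gradient w.
lrs, losses = lr_range_test(lambda w: w, lambda w: 0.5 * np.dot(w, w), [3.0, -2.0])
best_lr = lrs[np.argmin(np.gradient(losses))]  # steepest-descent region
```

With a real network, one mini-batch per trial rate is used instead of the toy gradient step, but the shape of the recorded loss curve is read the same way.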

Timing in R

As time goes on, your R scripts are probably getting longer and more complicated, right? Timing parts of your script could save you precious time when re-running code over and over again. Today I’m going to go through the 4 main functions for doing so.

Store Data About Rows

Introduction to the keyholder package: tools for keeping track of information about rows.
• It might be a good idea to extract some package functionality into a separate package, as this can yield one more useful tool.
• The keyholder package offers functionality for keeping track of arbitrary data about rows after applying some user-defined function. This is done by creating a special attribute, “keys”, which is updated after every change in rows (subsetting, ordering, etc.).

Book Memo: “The Algorithm Design Manual”

Most professional programmers that I’ve encountered are not well prepared to tackle algorithm design problems. This is a pity, because the techniques of algorithm design form one of the core practical technologies of computer science. Designing correct, efficient, and implementable algorithms for real-world problems requires access to two distinct bodies of knowledge:
• Techniques – Good algorithm designers understand several fundamental algorithm design techniques, including data structures, dynamic programming, depth first search, backtracking, and heuristics. Perhaps the single most important design technique is modeling, the art of abstracting a messy real-world application into a clean problem suitable for algorithmic attack.
• Resources – Good algorithm designers stand on the shoulders of giants. Rather than laboring from scratch to produce a new algorithm for every task, they can figure out what is known about a particular problem. Rather than re-implementing popular algorithms from scratch, they seek existing implementations to serve as a starting point. They are familiar with many classic algorithmic problems, which provide sufficient source material to model most any application.
This book is intended as a manual on algorithm design, providing access to combinatorial algorithm technology for both students and computer professionals.

R Packages worth a look

Clustering with Overlaps (COveR)
Provides functions for overlapping clustering, fuzzy clustering and interval-valued data manipulation. The package implements the following algorithms: OKM (Overlapping Kmeans) from Cleuziou, G. (2007) <doi:10.1109/icpr.2008.4761079> ; NEOKM (Non-exhaustive overlapping Kmeans) from Whang, J. J., Dhillon, I. S., and Gleich, D. F. (2015) <doi:10.1137/1.9781611974010.105> ; Fuzzy Cmeans from Bezdek, J. C. (1981) <doi:10.1007/978-1-4757-0450-1> ; Fuzzy I-Cmeans from de A.T. De Carvalho, F. (2005) <doi:10.1016/j.patrec.2006.08.014>.

Multiphase Optimization Strategy (MOST)
Provides functions similar to the ‘SAS’ macros previously provided to accompany Collins, Dziak, and Li (2009) <DOI:10.1037/a0015826> and Dziak, Nahum-Shani, and Collins (2012) <DOI:10.1037/a0026972>, papers which outline practical benefits and challenges of factorial and fractional factorial experiments for scientists interested in developing biological and/or behavioral interventions, especially in the context of the multiphase optimization strategy (see Collins, Kugler & Gwadz 2016) <DOI:10.1007/s10461-015-1145-4>. The package currently contains three functions. First, RelativeCosts1() draws a graph of the relative cost of complete and reduced factorial designs versus other alternatives. Second, RandomAssignmentGenerator() returns a dataframe which contains a list of random numbers that can be used to conveniently assign participants to conditions in an experiment with many conditions. Third, FactorialPowerPlan() estimates the power, detectable effect size, or required sample size of a factorial or fractional factorial experiment, for main effects or interactions, given several possible choices of effect size metric, and allowing pretests and clustering.

Geometrically Designed Spline Regression (GeDS)
Geometrically Designed Spline (‘GeDS’) Regression is a non-parametric geometrically motivated method for fitting variable knots spline predictor models in one or two independent variables, in the context of generalized (non-)linear models. ‘GeDS’ estimates the number and position of the knots and the order of the spline, assuming the response variable has a distribution from the exponential family. A description of the method can be found in Kaishev et al. (2016) <doi:10.1007/s00180-015-0621-7> and Dimitrova et al. (2017) <https://…/18460>.

Interface to ‘gretlcli’ (Rgretl)
An interface to ‘GNU gretl’: running ‘gretl’ scripts from R, estimating econometric models with backward passing of model results, and opening ‘gretl’ data files (.gdt). ‘gretl’ can be downloaded from <>. This package could make life in introductory/intermediate econometrics courses much easier: a full battery of the required regression diagnostics, including White’s heteroskedasticity test, restricted OLS estimation, an advanced weak-instrument test after IV estimation, very convenient handling of lagged variables in models, standard case treatment in unit root tests, vector autoregressions, and vector error correction models. Datasets for 8 popular econometrics textbooks can be installed into ‘gretl’ from its server. All datasets can be easily imported using this package.

Combined Graphs for Logistic Regression (logihist)
Provides histograms, boxplots and dotplots as alternatives to scatterplots of data when plotting fitted logistic regressions.

Ordinal Data Clustering, Co-Clustering and Classification (ordinalClust)
Ordinal data classification, clustering and co-clustering using model-based approach with the Bos distribution for ordinal data (Christophe Biernacki and Julien Jacques (2016) <doi:10.1007/s11222-015-9585-2>).

If you did not already know

Random Variable google
In probability and statistics, a random variable, aleatory variable or stochastic variable is a variable whose value is subject to variations due to chance (i.e. randomness, in a mathematical sense). A random variable can take on a set of possible different values (similarly to other mathematical variables), each with an associated probability, in contrast to other mathematical variables. A random variable’s possible values might represent the possible outcomes of a yet-to-be-performed experiment, or the possible outcomes of a past experiment whose already-existing value is uncertain (for example, due to imprecise measurements or quantum uncertainty). They may also conceptually represent either the results of an ‘objectively’ random process (such as rolling a die) or the ‘subjective’ randomness that results from incomplete knowledge of a quantity.

The meaning of the probabilities assigned to the potential values of a random variable is not part of probability theory itself but is instead related to philosophical arguments over the interpretation of probability. The mathematics works the same regardless of the particular interpretation in use.

The mathematical function describing the possible values of a random variable and their associated probabilities is known as a probability distribution. Random variables can be discrete, that is, taking any of a specified finite or countable list of values, endowed with a probability mass function, characteristic of a probability distribution; or continuous, taking any numerical value in an interval or collection of intervals, via a probability density function that is characteristic of a probability distribution; or a mixture of both types. The realizations of a random variable, that is, the results of randomly choosing values according to the variable’s probability distribution function, are called random variates. The formal mathematical treatment of random variables is a topic in probability theory.
In that context, a random variable is understood as a function defined on a sample space whose outputs are numerical values. …
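As a concrete illustration of the definitions above (not part of the original entry), a discrete random variable is fully specified by its possible values and their probability mass function, from which expectations and random variates follow:

```python
import random

# A discrete random variable: possible values and their probabilities (pmf).
values = [1, 2, 3, 4, 5, 6]
pmf    = [1/6] * 6           # a fair six-sided die

# Expectation: sum of each value times its probability.
expected = sum(v * p for v, p in zip(values, pmf))   # 3.5 for a fair die

# Realizations ("random variates"): draw values according to the pmf.
rng = random.Random(0)
variates = rng.choices(values, weights=pmf, k=1000)
```

For a continuous random variable the sum becomes an integral against a probability density function, but the picture is the same.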

Cognitive Analytics google
Cognitive Analytics: A hybrid of several disparate disciplines, methods, and practical technologies. …

Backwards Analysis google
The idea of backwards analysis (or backward analysis) is a technique for analyzing randomized algorithms by imagining them running backwards in time, from output to input. Most of the more interesting applications of backward analysis are in computational geometry, but there are nevertheless some other interesting applications, and we survey some of them here. …

Document worth reading: “Advances in Variational Inference”

Many modern unsupervised or semi-supervised machine learning algorithms rely on Bayesian probabilistic models. These models are usually intractable and thus require approximate inference. Variational inference (VI) lets us approximate a high-dimensional Bayesian posterior with a simpler variational distribution by solving an optimization problem. This approach has been successfully used in various models and large-scale applications. In this review, we give an overview of recent trends in variational inference. We first introduce standard mean field variational inference, then review recent advances focusing on the following aspects: (a) scalable VI, which includes stochastic approximations, (b) generic VI, which extends the applicability of VI to a large class of otherwise intractable models, such as non-conjugate models, (c) accurate VI, which includes variational models beyond the mean field approximation or with atypical divergences, and (d) amortized VI, which implements the inference over local latent variables with inference networks. Finally, we provide a summary of promising future research directions. Advances in Variational Inference
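For reference, the single objective behind all the variants the review surveys is the evidence lower bound (ELBO); this is the textbook identity, not a formula taken from the paper:

```latex
\log p(x) \;=\; \mathrm{ELBO}(q) \;+\; \mathrm{KL}\!\left(q(z)\,\|\,p(z\mid x)\right),
\qquad
\mathrm{ELBO}(q) \;=\; \mathbb{E}_{q(z)}\!\left[\log p(x,z)\right] \;-\; \mathbb{E}_{q(z)}\!\left[\log q(z)\right].
```

Mean field variational inference restricts q to a factorized family q(z) = \prod_i q_i(z_i) and maximizes the ELBO; since the left-hand side does not depend on q, this is equivalent to minimizing the KL divergence from q to the intractable posterior.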

What’s new on arXiv

Deep learning for inferring cause of data anomalies

Daily operation of a large-scale experiment is a resource-consuming task, particularly from the perspective of routine data quality monitoring. Typically, data come from different sub-detectors, and the global quality of the data depends on the combined performance of each of them. In this paper, the problem of identifying channels in which anomalies occurred is considered. We introduce a generic deep learning model and prove that, under reasonable assumptions, the model learns to identify ‘channels’ which are affected by an anomaly. Such a model could be used for cross-checking and assisting the data quality manager and for identifying good channels in anomalous data samples. The main novelty of the method is that the model does not require ground truth labels for each channel; only a global flag is used. This effectively distinguishes the model from classical classification methods. Applied to CMS data collected in 2010, this approach proves its ability to decompose anomalies by channel.

Deep Reinforcement Learning for Multi-Resource Multi-Machine Job Scheduling

Minimizing job scheduling time is a fundamental issue in data center networks that has been extensively studied in recent years. Incoming jobs require different CPU and memory units and span different numbers of time slots. The traditional solution is to design efficient heuristic algorithms with performance guarantees under certain assumptions. In this paper, we improve a recently proposed job scheduling algorithm using deep reinforcement learning and extend it to multiple server clusters. Our study reveals that the deep reinforcement learning method has the potential to outperform traditional resource allocation algorithms in a variety of complicated environments.

Adversarial Attacks Beyond the Image Space

Generating adversarial examples is an intriguing problem and an important way of understanding the working mechanism of deep neural networks. Recently, it has attracted a lot of attention in the computer vision community. Most existing approaches generate perturbations in image space, i.e., each pixel can be modified independently. However, it remains unclear whether these adversarial examples are authentic, in the sense that they correspond to actual changes in physical properties. This paper aims at exploring this topic in the contexts of object classification and visual question answering. The baselines are set to be several state-of-the-art deep neural networks which receive 2D input images. We augment these networks with a differentiable 3D rendering layer in front, so that a 3D scene (in physical space) is rendered into a 2D image (in image space), and then mapped to a prediction (in output space). There are two (direct or indirect) ways of attacking the physical parameters. The former back-propagates the gradients of error signals from output space to physical space directly, while the latter first constructs an adversary in image space, and then attempts to find the best solution in physical space that is rendered into this image. An important finding is that attacking physical space is much more difficult, as the direct method, compared with that used in image space, produces a much lower success rate and requires heavier perturbations to be added. On the other hand, the indirect method does not work out, suggesting that adversaries generated in image space are inauthentic. By interpreting them in physical space, most of these adversaries can be filtered out, showing promise in defending adversaries.

Verifying Neural Networks with Mixed Integer Programming

Neural networks have demonstrated considerable success in a wide variety of real-world problems. However, the presence of adversarial examples – slightly perturbed inputs that are misclassified with high confidence – limits our ability to guarantee performance for these networks in safety-critical applications. We demonstrate that, for networks that are piecewise affine (for example, deep networks with ReLU and maxpool units), proving no adversarial example exists – or finding the closest example if one does exist – can be naturally formulated as solving a mixed integer program. Solves for a fully-connected MNIST classifier with three hidden layers can be completed an order of magnitude faster than those of the best existing approach. To address the concern that adversarial examples are irrelevant because pixel-wise attacks are unlikely to happen in natural images, we search for adversaries over a natural class of perturbations written as convolutions with an adversarial blurring kernel. When searching over blurred images, we find that as opposed to pixelwise attacks, some misclassifications are impossible. Even more interestingly, a small fraction of input images are provably robust to blurs: every blurred version of the input is classified with the same, correct label.
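The key modeling step is that a ReLU unit y = max(0, x), with a finite bound |x| ≤ M on its pre-activation, is exactly representable by linear constraints plus one binary variable. This is the standard big-M encoding (textbook notation, not necessarily the paper’s exact formulation):

```latex
y \ge x, \qquad y \ge 0, \qquad y \le x + M(1 - a), \qquad y \le M a, \qquad a \in \{0, 1\}.
```

When a = 1 the constraints force y = x (the unit is active); when a = 0 they force y = 0. Composing one such gadget per ReLU (and analogous ones for maxpool) turns “does an adversarial example exist within this perturbation budget?” into a mixed integer program a standard solver can decide.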

The Promise and Peril of Human Evaluation for Model Interpretability

Transparency, user trust, and human comprehension are popular ethical motivations for interpretable machine learning. In support of these goals, researchers evaluate model explanation performance using humans and real world applications. This alone presents a challenge in many areas of artificial intelligence. In this position paper, we propose a distinction between descriptive and persuasive explanations. We discuss reasoning suggesting that functional interpretability may be correlated with cognitive function and user preferences. If this is indeed the case, evaluation and optimization using functional metrics could perpetuate implicit cognitive bias in explanations that threaten transparency. Finally, we propose two potential research directions to disambiguate cognitive function and explanation models, retaining control over the tradeoff between accuracy and interpretability.

Variable selection with genetic algorithms using repeated cross-validation of PLS regression models as fitness measure

Genetic algorithms are a widely used method in chemometrics for extracting variable subsets with high prediction power. Most fitness measures used by these genetic algorithms are based on the ordinary least-squares fit of the resulting model to the entire data or a subset thereof. Due to multicollinearity, partial least squares regression is often more appropriate, but rarely considered in genetic algorithms due to the additional cost for estimating the optimal number of components. We introduce two novel fitness measures for genetic algorithms, explicitly designed to estimate the internal prediction performance of partial least squares regression models built from the variable subsets. Both measures estimate the optimal number of components using cross-validation and subsequently estimate the prediction performance by predicting the response of observations not included in model-fitting. This is repeated multiple times to estimate the measures’ variations due to different random splits. Moreover, one measure was optimized for speed and more accurate estimation of the prediction performance for observations not included during variable selection. This leads to variable subsets with high internal and external prediction power. Results on high-dimensional chemical-analytical data show that the variable subsets acquired by this approach have competitive internal prediction power and superior external prediction power compared to variable subsets extracted with other fitness measures.

Learning to Organize Knowledge with N-Gram Machines

Deep neural networks (DNNs) have had great success on NLP tasks such as language modeling, machine translation and certain question answering (QA) tasks. However, the success is limited at more knowledge intensive tasks such as QA from a big corpus. Existing end-to-end deep QA models (Miller et al., 2016; Weston et al., 2014) need to read the entire text after observing the question, and therefore their complexity in responding to a question is linear in the text size. This is prohibitive for practical tasks such as QA from Wikipedia, a novel, or the Web. We propose to solve this scalability issue by using symbolic meaning representations, which can be indexed and retrieved efficiently with complexity that is independent of the text size. More specifically, we use sequence-to-sequence models to encode knowledge symbolically and generate programs to answer questions from the encoded knowledge. We apply our approach, called the N-Gram Machine (NGM), to the bAbI tasks (Weston et al., 2015) and a special version of them (‘life-long bAbI’) which has stories of up to 10 million sentences. Our experiments show that NGM can successfully solve both of these tasks accurately and efficiently. Unlike fully differentiable memory models, NGM’s time complexity and answering quality are not affected by the story length. The whole system of NGM is trained end-to-end with REINFORCE (Williams, 1992). To avoid high variance in gradient estimation, which is typical in discrete latent variable models, we use beam search instead of sampling. To tackle the exponentially large search space, we use a stabilized auto-encoding objective and a structure tweak procedure to iteratively reduce and refine the search space.

Leave no Trace: Learning to Reset for Safe and Autonomous Reinforcement Learning

Deep reinforcement learning algorithms can learn complex behavioral skills, but real-world application of these methods requires a large amount of experience to be collected by the agent. In practical settings, such as robotics, this involves repeatedly attempting a task, resetting the environment between each attempt. However, not all tasks are easily or automatically reversible. In practice, this learning process requires extensive human intervention. In this work, we propose an autonomous method for safe and efficient reinforcement learning that simultaneously learns a forward and reset policy, with the reset policy resetting the environment for a subsequent attempt. By learning a value function for the reset policy, we can automatically determine when the forward policy is about to enter a non-reversible state, providing for uncertainty-aware safety aborts. Our experiments illustrate that proper use of the reset policy can greatly reduce the number of manual resets required to learn a task, can reduce the number of unsafe actions that lead to non-reversible states, and can automatically induce a curriculum.

Tree-Structured Boosting: Connections Between Gradient Boosted Stumps and Full Decision Trees

Additive models, such as produced by gradient boosting, and full interaction models, such as classification and regression trees (CART), are widely used algorithms that have been investigated largely in isolation. We show that these models exist along a spectrum, revealing never-before-known connections between these two approaches. This paper introduces a novel technique called tree-structured boosting for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although tree-structured boosting is designed primarily to provide both the model interpretability and predictive performance needed for high-stakes applications like medicine, it also can produce decision trees represented by hybrid models between CART and boosted stumps that can outperform either of these approaches.

FluidNets: Fast & Simple Resource-Constrained Structure Learning of Deep Networks

We present FluidNets, an approach to automate the design of neural network structures. FluidNets iteratively shrinks and expands a network, shrinking via a resource-weighted sparsifying regularizer on activations and expanding via a uniform multiplicative factor on all layers. In contrast to previous approaches, our method is scalable to large networks, adaptable to specific resource constraints (e.g. the number of floating-point operations per inference), and capable of increasing the network’s performance. When applied to standard network architectures on a wide variety of datasets, our approach discovers novel structures in each domain, obtaining higher performance while respecting the resource constraint.

Deep Gaussian Mixture Models

Deep learning is a hierarchical inference method formed by multiple successive layers of learning, able to describe complex relationships more efficiently. In this work, Deep Gaussian Mixture Models are introduced and discussed. A Deep Gaussian Mixture Model (DGMM) is a network of multiple layers of latent variables, where, at each layer, the variables follow a mixture of Gaussian distributions. Thus, the deep mixture model consists of a set of nested mixtures of linear models, which globally provide a nonlinear model able to describe the data in a very flexible way. In order to avoid overparameterized solutions, dimension reduction by factor models can be applied at each layer of the architecture, resulting in deep mixtures of factor analysers.
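To make the nested-mixture structure concrete, here is a tiny two-layer generative sampler in which every layer picks a mixture component and applies an affine map plus Gaussian noise. This sketch is not the paper’s model-fitting procedure, and all parameters are invented:

```python
import numpy as np

rng = np.random.default_rng(0)

# Two layers; each layer is a mixture of affine maps (parameters made up).
layers = [
    {"w": [0.5, 0.5],
     "A": [np.eye(2), np.array([[0.5, 0.2], [0.0, 1.0]])],
     "b": [np.zeros(2), np.array([2.0, -1.0])]},
    {"w": [0.3, 0.7],
     "A": [np.diag([2.0, 0.5]), np.eye(2)],
     "b": [np.array([-3.0, 0.0]), np.array([3.0, 3.0])]},
]
sigma = 0.1  # within-component noise scale

def sample_dgmm(n):
    """Top latent h ~ N(0, I); at every layer each sample picks a mixture
    component k and is pushed through the affine map A_k h + b_k plus noise."""
    h = rng.standard_normal((n, 2))
    for layer in layers:
        k = rng.choice(len(layer["w"]), size=n, p=layer["w"])
        h = np.stack([layer["A"][ki] @ hi + layer["b"][ki]
                      for ki, hi in zip(k, h)])
        h += sigma * rng.standard_normal((n, 2))
    return h

x = sample_dgmm(500)   # 500 draws from the two-layer deep mixture
```

Marginally, x is a Gaussian mixture with up to 2 × 2 = 4 components, one per path through the layers, which is exactly the “nested mixtures of linear models” structure described above.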

Interleaver Design for Deep Neural Networks

We propose a class of interleavers for a novel deep neural network (DNN) architecture that uses algorithmically pre-determined, structured sparsity to significantly lower memory and computational requirements, and speed up training. The interleavers guarantee clash-free memory accesses to eliminate idle operational cycles, optimize spread and dispersion to improve network performance, and are designed to ease the complexity of memory address computations in hardware. We present a design algorithm with mathematical proofs for these properties. We also explore interleaver variations and analyze the behavior of neural networks as a function of interleaver metrics.

Decentralized High-Dimensional Bayesian Optimization with Factor Graphs

This paper presents a novel decentralized high-dimensional Bayesian optimization (DEC-HBO) algorithm that, in contrast to existing HBO algorithms, can exploit the interdependent effects of various input components on the output of the unknown objective function f for boosting the BO performance and still preserve scalability in the number of input dimensions without requiring prior knowledge or the existence of a low (effective) dimension of the input space. To realize this, we propose a sparse yet rich factor graph representation of f to be exploited for designing an acquisition function that can be similarly represented by a sparse factor graph and hence be efficiently optimized in a decentralized manner using distributed message passing. Despite richly characterizing the interdependent effects of the input components on the output of f with a factor graph, DEC-HBO can still guarantee no-regret performance asymptotically. Empirical evaluation on synthetic and real-world experiments (e.g., sparse Gaussian process model with 1811 hyperparameters) shows that DEC-HBO outperforms the state-of-the-art HBO algorithms.

Prior-aware Dual Decomposition: Document-specific Topic Inference for Spectral Topic Models

Spectral topic modeling algorithms operate on matrices/tensors of word co-occurrence statistics to learn topic-specific word distributions. This approach removes the dependence on the original documents and produces substantial gains in efficiency and provable topic inference, but at a cost: the model can no longer provide information about the topic composition of individual documents. Recently, Thresholded Linear Inverse (TLI) was proposed to map the observed words of each document back to its topic composition. However, its linear characteristics limit inference quality because important prior information over topics is not taken into account. In this paper, we evaluate the Simple Probabilistic Inverse (SPI) method and a novel Prior-aware Dual Decomposition (PADD) that is capable of learning document-specific topic compositions in parallel. Experiments show that PADD successfully leverages topic correlations as a prior, notably outperforming TLI and learning topic compositions of quality comparable to Gibbs sampling on various data.

Structured Stein Variational Inference for Continuous Graphical Models

We propose a novel distributed inference algorithm for continuous graphical models by extending Stein variational gradient descent (SVGD) to leverage the Markov dependency structure of the distribution of interest. The idea is to use a set of local kernel functions over the Markov blanket of each node, which alleviates the problem of the curse of high dimensionality and simultaneously yields a distributed algorithm for decentralized inference tasks. We justify our method with theoretical analysis and show that the use of local kernels can be viewed as a new type of localized approximation that matches the target distribution on the conditional distributions of each node over its Markov blanket. Our empirical results demonstrate that our method outperforms a variety of baselines including standard MCMC and particle message passing methods.
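The paper’s structured, Markov-blanket variant isn’t reproduced here; for context, a minimal vanilla SVGD sketch for a one-dimensional standard normal target. The textbook update is used, and the kernel bandwidth, step size, and particle setup are all illustrative:

```python
import numpy as np

def svgd_step(x, score, h=0.5, eps=0.1):
    """One vanilla SVGD update: each particle moves along a kernel-weighted
    average of score terms (attraction toward high density) plus kernel
    gradients (repulsion that keeps particles spread out)."""
    diff = x[:, None] - x[None, :]               # pairwise x_i - x_j
    k = np.exp(-diff**2 / (2 * h))               # RBF kernel matrix
    grad_k = -diff / h * k                       # d k(x_i, x_j) / d x_i
    phi = (k @ score(x) + grad_k.sum(axis=0)) / len(x)
    return x + eps * phi

rng = np.random.default_rng(0)
x = rng.normal(loc=5.0, scale=1.0, size=50)      # particles start far off
for _ in range(1000):
    x = svgd_step(x, lambda t: -t)               # target N(0, 1): score = -t
```

After the loop the particle cloud approximates N(0, 1). The paper’s contribution is to replace the single global kernel above with local kernels over each node’s Markov blanket, which is what makes the update decentralized and better behaved in high dimensions.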

Classification with Costly Features using Deep Reinforcement Learning

We study a classification problem where each feature can be acquired for a cost and the goal is to optimize the trade-off between classification precision and the total feature cost. We frame the problem as a sequential decision-making problem, where we classify one sample in each episode. At each step, an agent can use values of acquired features to decide whether to purchase another one or whether to classify the sample. We use vanilla Double Deep Q-learning, a standard reinforcement learning technique, to find a classification policy. We show that this generic approach outperforms Adapt-Gbrt, currently the best-performing algorithm developed specifically for classification with costly features.

Deep Approximately Orthogonal Nonnegative Matrix Factorization for Clustering

Nonnegative Matrix Factorization (NMF) is a widely used technique for data representation. Inspired by the expressive power of deep learning, several NMF variants equipped with deep architectures have been proposed. However, these methods mostly use only nonnegativity while ignoring task-specific features of the data. In this paper, we propose a novel deep approximately orthogonal nonnegative matrix factorization method where both nonnegativity and orthogonality are imposed with the aim of performing hierarchical clustering using different levels of abstraction of the data. Experiments on two face image datasets show that the proposed method achieves better clustering performance than other deep matrix factorization methods and state-of-the-art single-layer NMF variants.

Bidirectional Conditional Generative Adversarial Networks

Conditional variants of Generative Adversarial Networks (GANs), known as cGANs, are generative models that can produce data samples (x) conditioned on both latent variables (z) and known auxiliary information (c). Another GAN variant, Bidirectional GAN (BiGAN) is a recently developed framework for learning the inverse mapping from x to z through an encoder trained simultaneously with the generator and the discriminator of an unconditional GAN. We propose the Bidirectional Conditional GAN (BCGAN), which combines cGANs and BiGANs into a single framework with an encoder that learns inverse mappings from x to both z and c, trained simultaneously with the conditional generator and discriminator in an end-to-end setting. We present crucial techniques for training BCGANs, which incorporate an extrinsic factor loss along with an associated dynamically-tuned importance weight. As compared to other encoder-based GANs, BCGANs not only encode c more accurately but also utilize z and c more effectively and in a more disentangled way to generate data samples.

Better Agnostic Clustering Via Relaxed Tensor Norms

We develop a new family of convex relaxations for k-means clustering based on sum-of-squares norms, a relaxation of the injective tensor norm that is efficiently computable using the Sum-of-Squares algorithm. We give an algorithm based on this relaxation that recovers a faithful approximation to the true means in the given data whenever the low-degree moments of the points in each cluster have bounded sum-of-squares norms. We then prove a sharp upper bound on the sum-of-squares norms for moment tensors of any distribution that satisfies the Poincare inequality. The Poincare inequality is a central inequality in probability theory, and a large class of distributions satisfy it, including Gaussians, product distributions, strongly log-concave distributions, and any sum or uniformly continuous transformation of such distributions. As an immediate corollary, for any γ > 0, we obtain an efficient algorithm for learning the means of a mixture of k arbitrary Poincare distributions in R^d in time d^{O(1/γ)} so long as the means have separation Ω(k^γ). This in particular yields an algorithm for learning Gaussian mixtures with separation Ω(k^γ), thus partially resolving an open problem of Regev and Vijayaraghavan (2017). Our algorithm works even in the outlier-robust setting where an ε fraction of arbitrary outliers is added to the data, as long as the fraction of outliers is smaller than the smallest cluster. We therefore obtain results in the strong agnostic setting where, in addition to not knowing the distribution family, the data itself may be arbitrarily corrupted.

Recovering Lexicographic Triangulations
Fusing Bird View LIDAR Point Cloud and Front View Camera Image for Deep Object Detection
Learning Discriminative Affine Regions via Discriminability
Maximum-norm a posteriori error estimates for an optimal control problem
Manifold learning with bi-stochastic kernels
Integrating Disparate Sources of Experts for Robust Image Denoising
Techniques for proving Asynchronous Convergence results for Markov Chain Monte Carlo methods
Quarnet inference rules for level-1 networks
3D object classification and retrieval with Spherical CNNs
Phonological (un)certainty weights lexical activation
Information Gathering with Peers: Submodular Optimization with Peer-Prediction Constraints
Principal Manifolds of Middles: A Framework and Estimation Procedure Using Mixture Densities
Wing Loss for Robust Facial Landmark Localisation with Convolutional Neural Networks
Deep supervised learning using local errors
Improving particle filter performance with a generalized random field model of observation errors
Backward induction in presence of cycles
Generation and Consolidation of Recollections for Efficient Deep Lifelong Learning
Addressing Expensive Multi-objective Games with Postponed Preference Articulation via Memetic Co-evolution
Image Registration of Very Large Images via Genetic Programming
A Two-Phase Genetic Algorithm for Image Registration
Genetic Algorithm-Based Solver for Very Large Multiple Jigsaw Puzzles of Unknown Dimensions and Piece Orientation
An Automatic Solver for Very Large Jigsaw Puzzles Using Genetic Algorithms
A Generalized Genetic Algorithm-Based Solver for Very Large Jigsaw Puzzles of Complex Types
A Genetic Algorithm-Based Solver for Very Large Jigsaw Puzzles
Approximate Gradient Coding via Sparse Random Graphs
Separable discrete functions: recognition and sufficient conditions
Game Theoretic Analysis of Auction Mechanisms Modeled by Constrained Optimization Problems
Excitation Backprop for RNNs
Machine Learning Approaches for Traffic Volume Forecasting: A Case Study of the Moroccan Highway Network
Exact alignment recovery for correlated Erdos Renyi graphs
A primal-dual algorithm with optimal stepsizes and its application in decentralized consensus optimization
Measuring Territorial Control in Civil Wars Using Hidden Markov Models: A Data Informatics-Based Approach
Learning Aggregated Transmission Propagation Networks for Haze Removal and Beyond
MinimalRNN: Toward More Interpretable and Trainable Recurrent Neural Networks
Enumeration of Some Closed Knight Paths
Co-attending Free-form Regions and Detections with Multi-modal Multiplicative Feature Embedding for Visual Question Answering
Prediction Scores as a Window into Classifier Behavior
Short proofs for generalizations of the Lovász Local Lemma: Shearer’s condition and cluster expansion
Scalable Relaxations of Sparse Packing Constraints: Optimal Biocontrol in Predator-Prey Network
Reduction of total-cost and average-cost MDPs with weakly continuous transition probabilities to discounted MDPs
Fast Monte Carlo Markov chains for Bayesian shrinkage models with random effects
A Color Quantization Optimization Approach for Image Representation Learning
Household poverty classification in data-scarce environments: a machine learning approach
Convex Set of Doubly Substochastic Matrices
Acquiring Common Sense Spatial Knowledge through Implicit Spatial Templates
A novel Topological Model for Nonlinear Analysis and Prediction for Observations with Recurring Patterns
Low-dimensional Embeddings for Interpretable Anchor-based Topic Inference
Continuous-state branching processes with competition: Duality and Reflection at Infinity
Transferable Semi-supervised Semantic Segmentation
Random Access in Massive MIMO by Exploiting Timing Offsets and Excess Antennas
Proximal Gradient Method with Extrapolation and Line Search for a Class of Nonconvex and Nonsmooth Problems
Neural Network Reinforcement Learning for Audio-Visual Gaze Control in Human-Robot Interaction
Genetic Algorithms for Mentor-Assisted Evaluation Function Optimization
Simulating Human Grandmasters: Evolution and Coevolution of Evaluation Functions
Expert-Driven Genetic Algorithms for Simulating Evaluation Functions
Evaluating Roles of Central Users in Online Communication Networks: A Case Study of #PanamaLeaks
Local Clustering Coefficient of Spatial Preferential Attachment Model
DLTK: State of the Art Reference Implementations for Deep Learning on Medical Images
Style Transfer in Text: Exploration and Evaluation
From Common to Special: When Multi-Attribute Learning Meets Personalized Opinions
Bio-Inspired Local Information-Based Control for Probabilistic Swarm Distribution Guidance
Anonymous Hedonic Game for Task Allocation in a Large-Scale Multiple Agent System
Automatically Extracting Action Graphs from Materials Science Synthesis Procedures
Learning Dynamics and the Co-Evolution of Competing Sexual Species
Fission-fusion dynamics and group-size dependent composition in heterogeneous populations
Fully Dynamic Almost-Maximal Matching: Breaking the Polynomial Barrier for Worst-Case Time Bounds
Learning to select computations
Is China Entering WTO or shijie maoyi zuzhi–a Corpus Study of English Acronyms in Chinese Newspapers
Inversion of Tchebychev-Tchernov inequality
Single-Shot Refinement Neural Network for Object Detection
The Cultural Evolution of National Constitutions
On the second largest Laplacian eigenvalue of graph
Collective gradient sensing in fish schools
Optimal Stopping for Interval Estimation in Bernoulli Trials
Joint User Scheduling and Beam Selection Optimization for Beam-Based Massive MIMO Downlinks
Gazing into the Abyss: Real-time Gaze Estimation
Shifted tableaux crystals
Superlinear Lower Bounds for Distributed Subgraph Detection
Run, skeleton, run: skeletal model in a physics-based simulation
The Bayes Lepski’s Method and Credible Bands through Volume of Tubular Neighborhoods
Computational Results for Extensive-Form Adversarial Team Games
Average-case Approximation Ratio of Scheduling without Payments
Macdonald-positive specializations of the algebra of symmetric functions: Proof of the Kerov conjecture
Robust Synthetic Control
Node Profiles of Symmetric Digital Search Trees
An extension to the theory of controlled Lagrangians using the Helmholtz conditions
A novel total variation model based on kernel functions and its application
Approximating geodesics via random points
A systematic framework to discover pattern for web spam classification
BPGrad: Towards Global Optimality in Deep Learning via Branch and Pruning
The Strength of Multi-row Aggregation Cuts for Sign-pattern Integer Programs
Cyclone: High Availability for Persistent Key Value Stores
Intelligent Word Embeddings of Free-Text Radiology Reports
Unsupervised Domain Adaptation for Semantic Segmentation with GANs
How much is my car worth? A methodology for predicting used cars prices using Random Forest
MIT Autonomous Vehicle Technology Study: Large-Scale Deep Learning Based Analysis of Driver Behavior and Interaction with Automation
Enhanced Group Sparse Beamforming for Green Cloud-RAN: A Random Matrix Approach
Sequential Randomized Matrix Factorization for Gaussian Processes: Efficient Predictions and Hyper-parameter Optimization
Kill Two Birds with One Stone: Weakly-Supervised Neural Network for Image Annotation and Tag Refinement
A note on quadratic approximations of logistic log-likelihoods
Convergence Analysis of the Dynamics of a Special Kind of Two-Layered Neural Networks with $\ell_1$ and $\ell_2$ Regularization
Probabilistic approach to quantum separation effect for Feynman-Kac semigroup
Coherence-based Time Series Clustering for Brain Connectivity Visualization
A Discourse-Level Named Entity Recognition and Relation Extraction Dataset for Chinese Literature Text
MicroExpNet: An Extremely Small and Fast Model For Expression Recognition From Frontal Face Images
A note on Hadamard fractional differential equations with varying coefficients and their applications in probability
Incorporating Syntactic Uncertainty in Neural Machine Translation using a Forest-to-Sequence Model
Zero Dynamics for Port-Hamiltonian Systems
Extremal graphs with respect to the total-eccentricity index
Image-Image Domain Adaptation with Preserved Self-Similarity and Domain-Dissimilarity for Person Re-identification
Mixed-integer linear representability, disjunctions, and Chvatal functions — modeling implications
Universal Cycles of Restricted Words
Normal Representations of Hyperplane Arrangements Over a Field with $1-ad$ Structure and Convex Positive Bijections
Two-level schemes for the advection equation
A Coordinate-wise Optimization Algorithm for Sparse Inverse Covariance Selection
An Improved Oscillating-Error Classifier with Branching
A Classifying Variational Autoencoder with Application to Polyphonic Music Generation
An Approximating Control Design for Optimal Mixing by Stokes Flows
A New Form of Williamson’s Product Theorem
Morphisms of open games
DeblurGAN: Blind Motion Deblurring Using Conditional Adversarial Networks
Diverse and Accurate Image Description Using a Variational Auto-Encoder with an Additive Gaussian Encoding Space
The destiny of constant structure discrete time closed semantic systems
Node Balanced Steady States: Unifying and Generalizing Complex and Detailed Balanced Steady States
On convergence rate for an infinite-channel queuing system with Poisson input flow
Does mitigating ML’s disparate impact require disparate treatment?
Estimation Considerations in Contextual Bandits
Equiangular tight frames that contain regular simplices
Second-Order Variational Analysis of Parametric Constraint and Variational Systems
Superexponential estimates and weighted lower bounds for the square function
Compression-Based Regularization with an Application to Multi-Task Learning
Probabilistic and Combinatorial Interpretations of the Bernoulli Symbol
Eigenvectors distribution and quantum unique ergodicity for deformed Wigner matrices
A Double Parametric Bootstrap Test for Topic Models
A note on quasi-convex functions
The invariant measure and the flow associated to the $Φ^4_3$-quantum field model
Modeling Epistemological Principles for Bias Mitigation in AI Systems: An Illustration in Hiring Decisions
Deletion-Robust Submodular Maximization at Scale
On the Stability of a N-class Aloha Network
Hello Edge: Keyword Spotting on Microcontrollers
CleanNet: Transfer Learning for Scalable Image Classifier Training with Label Noise
Critique of Barbosa’s ‘P != NP Proof’
Robust Non-line-of-sight Imaging with Single Photon Detectors
Schlegel Diagram and Optimizable Immediate Snapshot Protocol
Nonparametric Double Robustness
Optimal binary linear locally repairable codes with disjoint repair groups
On the Global Fluctuations of Block Gaussian Matrices
Spectral-Spatial Feature Extraction and Classification by ANN Supervised with Center Loss in Hyperspectral Imagery
On $e$-positivity and $e$-unimodality of chromatic quasisymmetric functions
Interactive, Intelligent Tutoring for Auxiliary Constructions in Geometry Proofs
Let Features Decide for Themselves: Feature Mask Network for Person Re-identification
Dynamic Neural Program Embedding for Program Repair
Parameter Reference Loss for Unsupervised Domain Adaptation
On the Feasibility of Interference Alignment in Compounded MIMO Broadcast Channels with Antenna Correlation and Mixed User Classes
Polyhedral parametrizations of canonical bases & cluster duality
Non-reversible, tuning- and rejection-free Markov chain Monte Carlo via iterated random functions
Is prioritized sweeping the better episodic control?
On a stochastic Hardy-Littlewood-Sobolev inequality with application to Strichartz estimates for the white noise dispersion
Block-Cyclic Stochastic Coordinate Descent for Deep Neural Networks
Softening and Yielding of Soft Glassy Materials
Method to Design UF-OFDM Filter and its Analysis
A new class of tests for multinormality with i.i.d. and Garch data based on the empirical moment generating function
End-to-end Trained CNN Encode-Decoder Networks for Image Steganography
List-Decodable Robust Mean Estimation and Learning Mixtures of Spherical Gaussians
Maximizing Non-monotone/Non-submodular Functions by Multi-objective Evolutionary Algorithms
Lefschetz and Lower Bound theorems for Minkowski sums
Model Extraction Warning in MLaaS Paradigm
Generalized Dual Dynamic Programming for Infinite Horizon Problems in Continuous State and Action Spaces
Linear-Complexity Relaxed Word Mover’s Distance with GPU Acceleration
Finite Time Analysis of Optimal Adaptive Policies for Linear-Quadratic Systems
Stochastic metamorphosis with template uncertainties
Statistics of the Voronoi cell perimeter in large bi-pointed maps
Tracking in Aerial Hyperspectral Videos using Deep Kernelized Correlation Filters
MegDet: A Large Mini-Batch Object Detector
Optical Character Recognition (OCR) for Telugu: Database, Algorithm and Application
Face Attention Network: An effective Face Detector for the Occluded Faces
Finite Horizon Robustness Analysis of LTV Systems Using Integral Quadratic Constraints
On the optimality of the uniform random strategy
Light-Head R-CNN: In Defense of Two-Stage Object Detector
Fast BTG-Forest-Based Hierarchical Sub-sentential Alignment
Evaluating the Performance of eMTC and NB-IoT for Smart City Applications
A Separation Between Run-Length SLPs and LZ77
Positive semi-definite embedding for dimensionality reduction and out-of-sample extensions
Facets, Tiers and Gems: Ontology Patterns for Hypernormalisation
Speech recognition for medical conversations
Backscatter Communications for the Internet of Things: A Stochastic Geometry Approach
Vision-and-Language Navigation: Interpreting visually-grounded navigation instructions in real environments
Quantum Query Algorithms are Completely Bounded Forms
Non-exchangeable random partition models for microclustering
When Fourth Moments Are Enough
Learning Steerable Filters for Rotation Equivariant CNNs
Bitmap Filter: Speeding up Exact Set Similarity Joins with Bitwise Operations
Optimization-Based Autonomous Racing of 1:43 Scale RC Cars
Zero-shot Learning via Shared-Reconstruction-Graph Pursuit
Solution of network localization problem with noisy distances and its convergence
Performance of In-band Transmission of System Information in Massive MIMO Systems
Cooperative Games With Bounded Dependency Degree
Detection of Tooth caries in Bitewing Radiographs using Deep Learning
A Note on Helffer-Sjöstrand Representation for A Ginzburg-Landau Process
Cascaded Pyramid Network for Multi-Person Pose Estimation
Proof Complexity Meets Algebra
On DNA Codes using the Ring Z4 + wZ4
Bayesian Active Edge Evaluation on Expensive Graphs
Robust Decentralized Secondary Frequency Control in Power Systems: Merits and Trade-Offs
Convergent Block Coordinate Descent for Training Tikhonov Regularized Deep Neural Networks
Community detection with spiking neural networks for neuromorphic hardware
Pixel-wise object tracking
Wasserstein and Kolmogorov error bounds for variance-gamma approximation via Stein’s method I
Spectral distribution of the free Jacobi process, revisited
Adaptive M-QAM for Indoor Wireless Environments : Rate & Power Adaptation
How morphological development can guide evolution
V2V-PoseNet: Voxel-to-Voxel Prediction Network for Accurate 3D Hand and Human Pose Estimation from a Single Depth Map
Non-Contextual Modeling of Sarcasm using a Neural Network Benchmark
Disentangling Factors of Variation by Mixing Them
Robust Seed Mask Generation for Interactive Image Segmentation
Outliers in the spectrum for products of independent random matrices
Informed proposals for local MCMC in discrete spaces
Modular Continual Learning in a Unified Visual Environment
Joint Object Category and 3D Pose Estimation from 2D Images
Action Recognition with Coarse-to-Fine Deep Feature Integration and Asynchronous Fusion
A local graph rewiring algorithm for sampling spanning trees
Relaxed Oracles for Semi-Supervised Clustering
On Convergence of Epanechnikov Mean Shift
On tight cycles in hypergraphs
A generalised framework for detailed classification of swimming paths inside the Morris Water Maze
Subcritical multitype branching process in random environment
Mixture Models, Robustness, and Sum of Squares Proofs
Families of nested graphs with compatible symmetric-group actions
Matrix Factorization for Nonparametric Multi-source Localization Exploiting Unimodal Properties
SquishedNets: Squishing SqueezeNet further for edge device scenarios via deep evolutionary synthesis
Glitch Classification and Clustering for LIGO with Deep Transfer Learning

Book Memo: “R Graphs Cookbook”

This hands-on guide cuts short the preamble and gets straight to the point – actually creating graphs, instead of just theoretical learning. Each recipe is specifically tailored to satisfy your appetite for visually representing your data in the best way possible. This book is for readers already familiar with the basics of R who want to learn the best techniques and code to create graphics in R. It will also serve as an invaluable reference book for expert R users.