How intelligent are convolutional neural networks?

Motivated by the Gestalt pattern theory, and the Winograd Challenge for language understanding, we design synthetic experiments to investigate a deep learning algorithm’s ability to infer simple (at least for humans) visual concepts, such as symmetry, from examples. A visual concept is represented by randomly generated example images, both positive and negative. We then test the ability and speed of algorithms (and humans) to learn the concept from these images. The training and testing are performed progressively in multiple rounds, with each subsequent round deliberately designed to be more complex and confusing than the previous round(s), especially if the concept was not grasped by the learner. However, if the concept was understood, all the deliberate tests would become trivially easy. Our experiments show that humans can often infer a semantic concept quickly after looking at only a very small number of examples (this is often referred to as an ‘aha moment’: a moment of sudden realization), and perform perfectly during all testing rounds (except for careless mistakes). In contrast, deep convolutional neural networks (DCNN) could approximate some concepts statistically, but only after seeing many more examples (on the order of 10^4 times as many). Even then, they still make obvious mistakes, especially during deliberate testing rounds or on samples outside the training distribution. This signals a lack of true ‘understanding’, or a failure to reach the right ‘formula’ for the semantics. We did find that some concepts are easier for DCNN than others. For example, simple ‘counting’ is more learnable than ‘symmetry’, while ‘uniformity’ or ‘conformance’ are much more difficult for DCNN to learn. To conclude, we propose an ‘Aha Challenge’ for visual perception, calling for focused and quantitative research on Gestalt-style machine intelligence using limited training examples.
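
The idea of representing a concept by randomly generated positive and negative images can be sketched for the symmetry concept as follows. This is only an illustration under stated assumptions: the abstract does not describe how the images were actually generated, so the binary pixel format, the left-right mirroring, and the names `make_example` and `is_symmetric` are all hypothetical.

```python
import random
from typing import List

Image = List[List[int]]  # rows of 0/1 pixels (assumed format, not from the paper)


def make_example(size: int, symmetric: bool, rng: random.Random) -> Image:
    """Return a random size x size binary image that is (or is not) left-right symmetric."""
    if size < 2 or size % 2 != 0:
        raise ValueError("size must be an even number >= 2")
    half = size // 2
    # Build each row as a random left half followed by its mirror image.
    image = [
        left + left[::-1]
        for left in ([rng.randint(0, 1) for _ in range(half)] for _ in range(size))
    ]
    if not symmetric:
        # Flip one pixel in the left half only, which breaks the mirror symmetry of that row.
        row, col = rng.randrange(size), rng.randrange(half)
        image[row][col] ^= 1
    return image


def is_symmetric(image: Image) -> bool:
    """Ground-truth check for the concept: every row equals its own reverse."""
    return all(row == row[::-1] for row in image)


rng = random.Random(0)
positive = make_example(8, symmetric=True, rng=rng)
negative = make_example(8, symmetric=False, rng=rng)
```

A negative example built this way differs from a symmetric image in a single pixel, which is one simple way to make a later round "more confusing" for a learner that only approximates the concept statistically.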

PrivyNet: A Flexible Framework for Privacy-Preserving Deep Neural Network Training with A Fine-Grained Privacy Control

Massive amounts of data reside on users' local platforms, which usually cannot support deep neural network (DNN) training because of computation and storage resource constraints. Cloud-based training schemes can provide beneficial services, but rely on excessive user data collection, which can lead to potential privacy risks and violations. In this paper, we propose PrivyNet, a flexible framework to enable DNN training on the cloud while simultaneously protecting data privacy. We propose to split the DNNs into two parts and deploy them separately onto the local platforms and the cloud. The local neural network (NN) is used for feature extraction. To avoid local training, we rely on the idea of transfer learning and derive the local NNs by extracting the initial layers from pre-trained NNs. We identify and compare three factors that determine the topology of the local NN: the number of layers, the depth of output channels, and the subset of selected channels. We also propose a hierarchical strategy to determine the local NN topology, which offers the flexibility to optimize the accuracy of the target learning task under constraints on privacy loss, local computation, and storage. To validate PrivyNet, we use the convolutional NN (CNN) based image classification task as an example and characterize the dependency of privacy loss and accuracy on the local NN topology in detail. We also demonstrate that PrivyNet is efficient and can help explore and optimize the trade-off between privacy loss and accuracy.

DropoutDAgger: A Bayesian Approach to Safe Imitation Learning

While imitation learning is becoming common practice in robotics, this approach often suffers from data mismatch and compounding errors. DAgger is an iterative algorithm that addresses these issues by continually aggregating training data from both the expert and novice policies, but does not consider the impact of safety. We present a probabilistic extension to DAgger, which uses the distribution over actions provided by the novice policy for a given observation. Our method, which we call DropoutDAgger, uses dropout to train the novice as a Bayesian neural network that provides insight into its confidence. Using the distribution over the novice’s actions, we estimate a probabilistic measure of safety with respect to the expert action, tuned to balance exploration and exploitation. The utility of this approach is evaluated on the MuJoCo HalfCheetah and in a simple driving experiment, demonstrating improved performance and safety compared to other DAgger variants and classic imitation learning.
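
One plausible way to turn a distribution over novice actions into a safety decision is sketched below. It is an assumption-laden illustration, not the method as specified by the authors: the abstract does not give the decision rule, so the distance threshold `tau`, the probability threshold `p`, and the function `choose_action` are hypothetical. The novice samples are assumed to come from repeated stochastic forward passes with dropout enabled, and are passed in here as plain lists.

```python
import math
from typing import List, Sequence, Tuple


def choose_action(
    novice_samples: Sequence[Sequence[float]],
    expert_action: Sequence[float],
    tau: float,
    p: float,
) -> Tuple[List[float], str]:
    """Pick the novice's mean action only if enough sampled actions lie near the expert's.

    tau: maximum Euclidean distance to the expert action that counts as "safe" (assumed).
    p:   minimum fraction of samples that must be safe for the novice to act (assumed).
    """
    if not novice_samples:
        raise ValueError("need at least one novice action sample")
    n = len(novice_samples)
    # Empirical estimate of P(novice action is within tau of the expert action).
    safe = sum(1 for a in novice_samples if math.dist(a, expert_action) <= tau)
    if safe / n >= p:
        dims = len(expert_action)
        mean = [sum(a[i] for a in novice_samples) / n for i in range(dims)]
        return mean, "novice"
    return list(expert_action), "expert"


samples = [[0.9, 0.1], [1.0, 0.0], [1.1, -0.1]]  # stand-in for dropout forward passes
_, source_near = choose_action(samples, [1.0, 0.0], tau=0.5, p=0.9)  # "novice"
_, source_far = choose_action(samples, [3.0, 0.0], tau=0.5, p=0.9)   # "expert"
```

In this sketch, raising `p` or lowering `tau` hands control to the expert more often (safer, less exploration), which is one way to read the abstract's "tuned to balance exploration and exploitation".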

Learning Low-Dimensional Metrics

This paper investigates the theoretical foundations of metric learning, focused on four key questions that are not fully addressed in prior work: 1) we consider learning general low-dimensional (low-rank) metrics as well as sparse metrics; 2) we develop upper and lower (minimax) bounds on the generalization error; 3) we quantify the sample complexity of metric learning in terms of the dimension of the feature space and the dimension/rank of the underlying metric; 4) we also bound the accuracy of the learned metric relative to the underlying true generative metric. All the results involve novel mathematical approaches to the metric learning problem, and also shed new light on the special case of ordinal embedding (aka non-metric multidimensional scaling).

Robustness of Neural Networks against Storage Media Errors

We study the trade-offs between storage/bandwidth and prediction accuracy of neural networks that are stored in noisy media. Conventionally, it is assumed that all parameters (e.g., weights and biases) of a trained neural network are stored as binary arrays and are error-free. This assumption is based upon the implementation of error correction codes (ECCs) that correct potential bit flips in storage media. However, ECCs add storage overhead and cause bandwidth reduction when loading the trained parameters during inference. For different neural network models and datasets, we study the robustness of deep neural networks when bit errors exist but ECCs are turned off. It is observed that more sophisticated models and datasets are more vulnerable to errors in their trained parameters. We propose a simple detection approach that can universally improve the robustness, which in some cases can be improved by orders of magnitude. We also propose an alternative binary representation of the parameters such that the distortion brought by bit flips is reduced and even theoretically vanishing when the number of bits to represent a parameter increases.
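
To see why a single uncorrected bit flip can matter, the sketch below flips individual bits of one stored parameter. It assumes the parameter is stored as an IEEE 754 single-precision float; the abstract only says "binary arrays", so that encoding, and the helper name `flip_bit`, are assumptions made for illustration.

```python
import struct


def flip_bit(value: float, bit: int) -> float:
    """Flip one bit (0 = least significant) of the float32 encoding of value."""
    if not 0 <= bit < 32:
        raise ValueError("bit must be in the range 0..31")
    # Reinterpret the float32 bytes as an unsigned 32-bit integer, XOR the bit, convert back.
    (as_int,) = struct.unpack("<I", struct.pack("<f", value))
    (flipped,) = struct.unpack("<f", struct.pack("<I", as_int ^ (1 << bit)))
    return flipped


weight = 0.5
low = flip_bit(weight, 0)    # lowest mantissa bit: the value barely changes
high = flip_bit(weight, 30)  # highest exponent bit: the value becomes astronomically large
print(weight, low, high)
```

Under this encoding the damage depends heavily on which bit is hit, which is consistent with the abstract's interest in alternative binary representations that reduce the distortion caused by bit flips.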

A Survey of Machine Learning for Big Code and Naturalness

Research at the intersection of machine learning, programming languages, and software engineering has recently taken important steps in proposing learnable probabilistic models of source code that exploit code’s abundance of patterns. In this article, we survey this work. We contrast programming languages against natural languages and discuss how these similarities and differences drive the design of probabilistic models. We present a taxonomy based on the underlying design principles of each model and use it to navigate the literature. Then, we review how researchers have adapted these models to application areas and discuss cross-cutting and application-specific challenges and opportunities.

Aspect-Based Relational Sentiment Analysis Using a Stacked Neural Network Architecture

Sentiment analysis can be regarded as a relation extraction problem in which the sentiment of some opinion holder towards a certain aspect of a product, theme or event needs to be extracted. We present a novel neural architecture that treats sentiment analysis as a relation extraction problem and divides it into three subtasks: i) identification of aspect and opinion terms, ii) labeling of opinion terms with a sentiment, and iii) extraction of relations between opinion terms and aspect terms. For each subtask, we propose a neural network based component and combine all of them into a complete system for relational sentiment analysis. The component for aspect and opinion term extraction is a hybrid architecture consisting of a recurrent neural network stacked on top of a convolutional neural network. This approach outperforms a standard convolutional deep neural architecture as well as a recurrent network architecture and performs competitively compared to other methods on two datasets of annotated customer reviews. To extract sentiments for individual opinion terms, we propose a recurrent architecture in combination with word distance features and achieve promising results, outperforming a majority baseline by 18% accuracy and providing the first results for the USAGE dataset. Our relation extraction component outperforms the current state-of-the-art in aspect-opinion relation extraction by 15% F-Measure.

Aspect-Based Sentiment Analysis Using a Two-Step Neural Network Architecture

The World Wide Web holds a wealth of information in the form of unstructured texts such as customer reviews for products, events and more. By extracting and analyzing the expressed opinions in customer reviews in a fine-grained way, valuable opportunities and insights for customers and businesses can be gained. We propose a neural network based system to address the task of Aspect-Based Sentiment Analysis to compete in Task 2 of the ESWC-2016 Challenge on Semantic Sentiment Analysis. Our proposed architecture divides the task into two subtasks: aspect term extraction and aspect-specific sentiment extraction. This approach is flexible in that it allows each subtask to be addressed independently. As a first step, a recurrent neural network is used to extract aspects from a text by framing the problem as a sequence labeling task. In a second step, a recurrent network processes each extracted aspect with respect to its context and predicts a sentiment label. The system uses pretrained semantic word embedding features which we experimentally enhance with semantic knowledge extracted from WordNet. Further features extracted from SenticNet prove to be beneficial for the extraction of sentiment labels. As the best performing system in its category, our proposed system proves to be an effective approach for Aspect-Based Sentiment Analysis.

Nonnegative matrix factorization with side information for time series recovery and prediction

Motivated by the reconstruction and the prediction of electricity consumption, we extend Nonnegative Matrix Factorization (NMF) to take into account side information (column or row features). We consider general linear measurement settings, and propose a framework which models non-linear relationships between features and the response variables. We extend previous theoretical results to obtain a sufficient condition on the identifiability of the NMF in this setting. Based on the classical Hierarchical Alternating Least Squares (HALS) algorithm, we propose a new algorithm (HALSX, or Hierarchical Alternating Least Squares with eXogeneous variables) which estimates the factorization model. The algorithm is validated on both simulated and real electricity consumption datasets as well as a recommendation dataset, to show its performance in matrix recovery and prediction for new rows and columns.

MetaLDA: a Topic Model that Efficiently Incorporates Meta information

Besides the text content, documents and their associated words usually come with rich sets of meta information, such as categories of documents and semantic/syntactic features of words, like those encoded in word embeddings. Incorporating such meta information directly into the generative process of topic models can improve modelling accuracy and topic quality, especially in the case where the word-occurrence information in the training data is insufficient. In this paper, we present a topic model, called MetaLDA, which is able to leverage either document or word meta information, or both of them jointly. With two data augmentation techniques, we can derive an efficient Gibbs sampling algorithm, which benefits from the fully local conjugacy of the model. Moreover, the algorithm is favoured by the sparsity of the meta information. Extensive experiments on several real world datasets demonstrate that our model achieves comparable or improved performance in terms of both perplexity and topic quality, particularly in handling sparse texts. In addition, compared with other models using meta information, our model runs significantly faster.

Inference in Graphical Models via Semidefinite Programming Hierarchies

Maximum A posteriori Probability (MAP) inference in graphical models amounts to solving a graph-structured combinatorial optimization problem. Popular inference algorithms such as belief propagation (BP) and generalized belief propagation (GBP) are intimately related to linear programming (LP) relaxation within the Sherali-Adams hierarchy. Despite the popularity of these algorithms, it is well understood that the Sum-of-Squares (SOS) hierarchy based on semidefinite programming (SDP) can provide superior guarantees. Unfortunately, SOS relaxations for a graph with n vertices require solving an SDP with n^{\Theta(d)} variables where d is the degree in the hierarchy. In practice, for d\ge 4, this approach does not scale beyond a few tens of variables. In this paper, we propose binary SDP relaxations for MAP inference using the SOS hierarchy with two innovations focused on computational efficiency. Firstly, in analogy to BP and its variants, we only introduce decision variables corresponding to contiguous regions in the graphical model. Secondly, we solve the resulting SDP using a non-convex Burer-Monteiro style method, and develop a sequential rounding procedure. We demonstrate that the resulting algorithm can solve problems with tens of thousands of variables within minutes, and outperforms BP and GBP on practical problems such as image denoising and Ising spin glasses. Finally, for specific graph types, we establish a sufficient condition for the tightness of the proposed partial SOS relaxation.

BIOS ORAM: Improved Privacy-Preserving Data Access for Parameterized Outsourced Storage

Algorithms for oblivious random access machine (ORAM) simulation allow a client, Alice, to obfuscate a pattern of data accesses with a server, Bob, who is maintaining Alice’s outsourced data while trying to learn information about her data. We present a novel ORAM scheme that improves the asymptotic I/O overhead of previous schemes for a wide range of size parameters for client-side private memory and message blocks, from logarithmic to polynomial. Our method achieves statistical security for hiding Alice’s access pattern and, with high probability, achieves an I/O overhead that ranges from O(1) to O(\log^2 n/(\log\log n)^2), depending on these size parameters, where n is the size of Alice’s outsourced memory. Our scheme, which we call BIOS ORAM, combines multiple uses of B-trees with a reduction of ORAM simulation to isogrammic access sequences.

Triangle Generative Adversarial Networks

A Triangle Generative Adversarial Network (\Delta-GAN) is developed for semi-supervised cross-domain joint distribution matching, where the training data consists of samples from each domain, and supervision of domain correspondence is provided by only a few paired samples. \Delta-GAN consists of four neural networks, two generators and two discriminators. The generators are designed to learn the two-way conditional distributions between the two domains, while the discriminators implicitly define a ternary discriminative function, which is trained to distinguish real data pairs and two kinds of fake data pairs. The generators and discriminators are trained together using adversarial learning. Under mild assumptions, in theory the joint distributions characterized by the two generators concentrate to the data distribution. In experiments, three different kinds of domain pairs are considered, image-label, image-image and image-attribute pairs. Experiments on semi-supervised image classification, image-to-image translation and attribute-based image generation demonstrate the superiority of the proposed approach.

A Note on Tight Lower Bound for MNL-Bandit Assortment Selection Models
Bayesian detection of piecewise linear trends in replicated time-series with application to growth data modelling
Crossing Patterns in Nonplanar Road Networks
Geometric Semantic Genetic Programming Algorithm and Slump Prediction
An approximate fractional Gaussian noise model with ${\mathcal O}(n)$ computational cost
Fiber-Flux Diffusion Density for White Matter Tracts Analysis: Application to Mild Anomalies Localization in Contact Sports Players
A Probabilistic Framework for Nonlinearities in Stochastic Neural Networks
Diluting the Scalability Boundaries: Exploring the Use of Disaggregated Architectures for High-Level Network Data Analysis
When is a Convolutional Filter Easy To Learn?
Gallai-Ramsey numbers of $C_9$ with multiple colors
Reserve Requirements in Ancillary Markets Using Consensus-Based Cooperative Model Considering Renewable Resources
Discrete Dynamic Causal Modeling and Its Relationship with Directed Information
Iterative Policy Learning in End-to-End Trainable Task-Oriented Neural Dialog Models
Geographically Coordinated Frequency Control technical report
Model-Powered Conditional Independence Test
Viscosity Solutions of Stochastic Hamilton-Jacobi-Bellman Equations
White Matter Fiber Segmentation Using Functional Varifolds
Character tables and the problem of existence of finite projective planes
Multi-modal analysis of genetically-related subjects using SIFT descriptors in brain MRI
The bimodal Ising spin glass in dimension two: the anomalous dimension $η$
A dissipativity theorem for p-dominant systems
Distributed Estimation Under Sensor Attacks
Matterport3D: Learning from RGB-D Data in Indoor Environments
On Dynamic Precision Scaling
Paraphrasing verbal metonymy through computational methods
Many Triangles with Few Edges
Improving spliced alignment for identification of ortholog groups and multiple CDS alignment
On the Complexity of Robust Stable Marriage
Blocking Versus Non-Blocking Halo Exchange
Zooming in on NYC taxi data with Portal
A Fast Algorithm Based on a Sylvester-like Equation for LS Regression with GMRF Prior
On the Opportunities and Pitfalls of Nesting Monte Carlo Estimators
Bias Correction with Jackknife, Bootstrap, and Taylor Series
Enumeration on Trees under Relabelings
Compressed Representations of Conjunctive Query Results
On Bergeron’s positivity problem for $q$-binomial coefficients
Connecting Width and Structure in Knowledge Compilation
Synchronization Patterns in Networks of Kuramoto Oscillators: A Geometric Approach for Analysis and Control
POMCPOW: An online algorithm for POMDPs with continuous state, action, and observation spaces
Human Understandable Explanation Extraction for Black-box Classification Models Based on Matrix Factorization
A Comparative Quantitative Analysis of Contemporary Big Data Clustering Algorithms for Market Segmentation in Hospitality Industry
Protest Activity Detection and Perceived Violence Estimation from Social Media Images
Algorithm and Hardware Design of Discrete-Time Spiking Neural Networks Based on Back Propagation with Binary Activations
Chromatic polynomials of random graphs
Bridging observational studies and randomized experiments by embedding the former in the latter
Estimating Mutual Information for Discrete-Continuous Mixtures
Deterministic rendezvous with detection using beeps
A note on quasi-equilibrium problems
Time-Dependent Generalized Nash Equilibrium Problem
Deterministic meeting of sniffing agents in the plane
Fast Discrete Linear Canonical Transform Based on CM-CC-CM Decomposition and FFT
Integrable stochastic dualities and the deformed Knizhnik-Zamolodchikov equation
CISRDCNN: Super-resolution of compressed images using deep convolutional neural networks
Truncated Cramér-von Mises test of normality
Discretized conformal prediction for efficient distribution-free inference
Inter-Operator Base Station Coordination in Spectrum-Shared Millimeter Wave Cellular Networks
Training Better CNNs Requires to Rethink ReLU
Look Wider to Match Image Patches with Convolutional Neural Networks
BeSS: An R Package for Best Subset Selection in Linear, Logistic and CoxPH Models
Finite Sample Guarantees for PCA in Non-Isotropic and Data-Dependent Noise
Uniform Consistency of the Highly Adaptive Lasso Estimator of Infinite Dimensional Parameters
Deep-Learnt Classification of Light Curves
Compressing Low Precision Deep Neural Networks Using Sparsity-Induced Regularization in Ternary Networks
Maximum of an Airy process plus Brownian motion and memory in KPZ growth
Dynamic Oracle for Neural Machine Translation in Decoding Phase
A Novel Quasigroup Substitution Scheme for Chaos Based Image Encryption
Incorrigibility in the CIRL Framework
Random Caching in Backhaul-Limited Multi-Antenna Networks: Analysis and Area Spectrum Efficiency Optimization
A Proof Technique for Skewness of Graphs
Coded Caching in Partially Cooperative D2D Communication Networks
Conjugate generalized linear mixed models for clustered data
The Critical Radius in Sampling-Based Motion Planning
Multifractal characteristics of external anal sphincter based on sEMG signals
Sparse Markov Decision Processes with Causal Sparse Tsallis Entropy Regularization for Reinforcement Learning
Physical Layer Security in Heterogeneous Networks with Jammer Selection and Full-Duplex Users
Steepest descent algorithm on orthogonal Stiefel manifolds
MuseGAN: Symbolic-domain Music Generation and Accompaniment with Multi-track Sequential Generative Adversarial Networks
Tilt Assembly: Algorithms for Micro-Factories That Build Objects with Uniform External Forces
Colour Terms: a Categorisation Model Inspired by Visual Cortex Neurons
F-index of graphs based on new operations related to the join of graphs
Scalable Estimation of Dirichlet Process Mixture Models on Distributed Data
A Fast and Accurate Vietnamese Word Segmenter
Exploring Human-like Attention Supervision in Visual Question Answering
Hybrid, Frame and Event based Visual Inertial Odometry for Robust, Autonomous Navigation of Quadrotors
Asymptotics for relative frequency when population is driven by arbitrary evolution
Analytic model of thermalization: Quantum emulation of classical cellular automata
Predicting Video Saliency with Object-to-Motion CNN and Two-layer Convolutional LSTM
Improving Opinion-Target Extraction with Character-Level Word Embeddings
Fitting Generalized Essential Matrices from Generic 6×6 Matrices and its Applications
An Algebra Associated with a Flag in a Subspace Lattice over a Finite Field and the Quantum Affine Algebra $U_q(\widehat{\mathfrak{sl}}_2)$
On the exit time from open sets of some semi-Markov processes
3D Reconstruction in Canonical Co-ordinate Space from Arbitrarily Oriented 2D Images
The bail-out optimal dividend problem under the absolutely continuous condition
Double-distance frameworks and mixed sparsity graphs
Minimax lower bounds for function estimation on graphs
An Improved Primal-Dual Interior Point Solver for Model Predictive Control
An Adaptive Algorithm for Precise Pupil Boundary Detection using Entropy of Contour Gradients
Selfish Jobs with Favorite Machines: Price of Anarchy vs Strong Price of Anarchy
Central limit theorem associated to Gaussian operators of type B
A General Framework for the Recognition of Online Handwritten Graphics
Analogical-based Bayesian Optimization
Human Action Forecasting by Learning Task Grammars
Equilibration in the Nosé-Hoover isokinetic ensemble: Effect of inter-particle interactions
Interactive Music Generation with Positional Constraints using Anticipation-RNNs
A propos de l’algèbre de Hopf des mots tassés WMat
The Brownian Motion on Aff(R) and Quasi-Local Theorems
Rate of convergence to equilibrium for discrete-time stochastic dynamics with memory
Neural Networks for Text Correction and Completion in Keyboard Decoding
Language Modeling with Highway LSTM
Automatic Leaf Extraction from Outdoor Images
A Recorded Debating Dataset
Scalable Support Vector Clustering Using Budget
Human Activity Recognition Using Robust Adaptive Privileged Probabilistic Learning
3D Reconstruction with Low Resolution, Small Baseline and High Radial Distortion Stereo Images
On Björner and Lovász’s conjecture
Uncoded Placement Optimization for Coded Delivery
Comparison of the Kim-Milman and Brenier maps
Evaluation of the Rate of Convergence in the PIA
On dihedral flows in embedded graphs
Restricted-Boltzmann-Machine Learning for Solving Strongly Correlated Quantum Systems
Image operator learning coupled with CNN classification and its application to staff line removal
A Simple and Efficient Algorithm for Nonlinear Model Predictive Control
Accurate Genomic Prediction Of Human Height
Learning to update Auto-associative Memory in Recurrent Neural Networks for Improving Sequence Memorization
Convolutional Long Short-Term Memory Networks for Recognizing First Person Interactions
SalNet360: Saliency Maps for omni-directional images with CNN
On the monotone and primal-dual active set schemes for $\ell^p$-type problems, $p \in (0,1]$
Geometric inequalities, stability results and Kendall’s problem in spherical space
Photoacoustic Imaging using Combination of Eigenspace-Based Minimum Variance and Delay-Multiply-and-Sum Beamformers: Simulation Study
Learning to Detect Violent Videos using Convolutional Long Short-Term Memory
When 3D-Aided 2D Face Recognition Meets Deep Learning: An extended UR2D for Pose-Invariant Face Recognition
Summable Reparameterizations of Wasserstein Critics in the One-Dimensional Setting
Unimodal Category and the Monotonicity Conjecture