Geometry of Optimization and Implicit Regularization in Deep Learning

We argue that optimization plays a crucial role in the generalization of deep learning models through implicit regularization. We do this by demonstrating that generalization ability is not controlled by network size but rather by some other implicit control. We then demonstrate how changing the empirical optimization procedure can improve generalization, even if actual optimization quality is not affected. We do so by studying the geometry of the parameter space of deep networks and devising an optimization algorithm attuned to this geometry.
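The abstract does not say which geometry it has in mind. One well-known property that makes plain Euclidean geometry a poor fit for ReLU networks is rescaling invariance. If you multiply a hidden unit's incoming weights by c > 0 and divide its outgoing weight by c, the network function does not change, but the L2 norm of the parameters does. The sketch below checks this on a tiny two-layer ReLU network. It compares the L2 norm with the path-norm, which is the sum over input-to-output paths of the products of squared weights and is invariant to such rescaling. The network shape, the function names, and the use of the path-norm as the example measure are illustrative assumptions. This is not a reproduction of the paper's algorithm.

```python
import math


def relu(z):
    return max(0.0, z)


def forward(W1, w2, x):
    """Two-layer ReLU network: f(x) = sum_j w2[j] * relu(W1[j] . x)."""
    hidden = [relu(sum(w * xi for w, xi in zip(row, x))) for row in W1]
    return sum(v * h for v, h in zip(w2, hidden))


def l2_norm_sq(W1, w2):
    """Squared Euclidean norm of all parameters."""
    return sum(w * w for row in W1 for w in row) + sum(v * v for v in w2)


def path_norm_sq(W1, w2):
    """Squared path-norm: for every path input i -> hidden j -> output,
    multiply the squared weights along the path, then sum over all paths."""
    return sum(v * v * sum(w * w for w in row) for row, v in zip(W1, w2))


def rescale_unit(W1, w2, j, c):
    """Scale hidden unit j's incoming weights by c > 0 and its outgoing weight by 1/c."""
    W1_new = [list(row) for row in W1]
    W1_new[j] = [c * w for w in W1[j]]
    w2_new = list(w2)
    w2_new[j] = w2[j] / c
    return W1_new, w2_new


# Hypothetical toy weights and input, chosen only for illustration.
W1 = [[1.0, -2.0], [0.5, 1.5]]
w2 = [2.0, -1.0]
x = [0.3, 0.7]
W1s, w2s = rescale_unit(W1, w2, j=0, c=10.0)
print(forward(W1, w2, x), forward(W1s, w2s, x))      # same function value
print(l2_norm_sq(W1, w2), l2_norm_sq(W1s, w2s))      # L2 norm changes a lot
print(path_norm_sq(W1, w2), path_norm_sq(W1s, w2s))  # path-norm agrees up to rounding
```

An optimizer whose steps depend on the Euclidean norm will treat these two functionally identical networks differently. A rescaling-invariant measure does not have this problem.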

Convolutional Sequence to Sequence Learning

The prevalent approach to sequence-to-sequence learning maps an input sequence to a variable-length output sequence via recurrent neural networks. We introduce an architecture based entirely on convolutional neural networks. Compared to recurrent models, computations over all elements can be fully parallelized during training, and optimization is easier because the number of non-linearities is fixed and independent of the input length. Our use of gated linear units eases gradient propagation, and we equip each decoder layer with a separate attention module. We outperform the accuracy of the deep LSTM setup of Wu et al. (2016) on both WMT’14 English-German and WMT’14 English-French translation while running an order of magnitude faster, on both GPU and CPU.
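A gated linear unit (GLU) splits its input into two halves, a and b, and returns a ⊗ σ(b), where σ is the logistic sigmoid. The sketch below applies a GLU after a width-k convolution over a sequence of vectors. It makes several simplifying assumptions. It uses plain Python lists and hypothetical function names. It uses centred zero-padding, as an encoder might; a decoder would need causal padding so that no position sees future tokens. It also omits residual connections, attention, and the rest of the full model.

```python
import math


def sigmoid(z):
    """Numerically stable logistic sigmoid."""
    if z >= 0:
        return 1.0 / (1.0 + math.exp(-z))
    e = math.exp(z)
    return e / (1.0 + e)


def glu(v):
    """Gated linear unit: split v into halves a and b, return a * sigmoid(b) elementwise."""
    if len(v) % 2:
        raise ValueError("GLU input must have even length")
    half = len(v) // 2
    a, b = v[:half], v[half:]
    return [ai * sigmoid(bi) for ai, bi in zip(a, b)]


def conv_glu_layer(xs, weights, bias, k):
    """One convolution + GLU block over a sequence of d_in-dimensional vectors.

    xs:      list of n input vectors (each a list of length d_in)
    weights: 2*d_out rows, each of length k*d_in (one row per pre-GLU channel)
    bias:    2*d_out values
    k:       odd kernel width; windows are centred and zero-padded
    """
    if k % 2 == 0:
        raise ValueError("this sketch assumes an odd kernel width")
    d_in = len(xs[0])
    pad = k // 2
    zero = [0.0] * d_in
    padded = [zero] * pad + list(xs) + [zero] * pad
    out = []
    for i in range(len(xs)):
        # Flatten the k vectors of the window centred on position i.
        window = [val for vec in padded[i:i + k] for val in vec]
        proj = [sum(w * x for w, x in zip(row, window)) + b
                for row, b in zip(weights, bias)]
        out.append(glu(proj))  # 2*d_out channels -> d_out channels
    return out
```

Each output position depends only on its own input window, so the loop iterations are independent of one another. This independence is what lets all positions be computed in parallel during training, unlike a recurrent model, where step i must wait for step i - 1.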

Using word2vec for Bilateral Translation

Word and phrase tables are key inputs to machine translation, but they are costly to produce. New unsupervised learning methods represent words and phrases in a high-dimensional vector space, and these monolingual embeddings have been shown to encode syntactic and semantic relationships between language elements. The information captured by these embeddings can be exploited for bilingual translation by learning a transformation matrix that matches relative positions across two monolingual vector spaces. This method aims to identify high-quality candidates for word and phrase translation more cost-effectively from unlabeled data. This paper expands the scope of previous attempts at bilingual translation to four languages (English, German, Spanish, and French). It shows how to process the source data, how to train a neural network to learn the high-dimensional embeddings for individual languages, and how to extend the framework for testing their quality beyond English. Furthermore, it shows how to learn bilingual transformation matrices, obtain candidates for word and phrase translation, and assess their quality.
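Below is a minimal sketch of the transformation-matrix step. It assumes a seed dictionary of n word pairs, with the source embeddings stacked row-wise in X and the target embeddings in Z. It also assumes a plain least-squares objective, min over W of sum_i ||W x_i - z_i||^2. Candidate translations are then the target words whose vectors are closest to W x by cosine similarity. The function names and the synthetic data are hypothetical, and the paper's own training details may differ.

```python
import numpy as np


def fit_translation_matrix(X, Z):
    """Least-squares W minimizing sum_i ||W @ x_i - z_i||^2.

    X: (n, d_src) source embeddings of seed translation pairs (one pair per row)
    Z: (n, d_tgt) target embeddings of the same pairs
    Returns W with shape (d_tgt, d_src).
    """
    B, *_ = np.linalg.lstsq(X, Z, rcond=None)  # solves X @ B ≈ Z; B is (d_src, d_tgt)
    return B.T


def translation_candidates(W, x, target_vectors, target_words, top_k=5):
    """Map source vector x into the target space and rank target words by cosine similarity."""
    z = W @ x
    norms = np.linalg.norm(target_vectors, axis=1) * np.linalg.norm(z)
    sims = (target_vectors @ z) / np.maximum(norms, 1e-12)
    order = np.argsort(-sims)[:top_k]
    return [target_words[i] for i in order]


# Synthetic example: target embeddings are an exact linear map of source embeddings.
rng = np.random.default_rng(0)
W_true = rng.standard_normal((3, 4))  # hypothetical 4-d source -> 3-d target mapping
X = rng.standard_normal((20, 4))      # synthetic source embeddings
Z = X @ W_true.T                      # synthetic target embeddings
W = fit_translation_matrix(X, Z)
words = [f"w{i}" for i in range(20)]
print(translation_candidates(W, X[0], Z, words, top_k=3))
```

Returning a ranked list rather than a single word matches the abstract's framing: the method produces translation *candidates*, whose quality is then assessed.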

A Survey of Location Prediction on Twitter

Locations, e.g., countries, states, cities, and points of interest, are central to news, emergency events, and people’s daily lives. Automatic identification of locations associated with or mentioned in documents has been explored for decades. As one of the most popular online social network platforms, Twitter has attracted a large number of users who send millions of tweets on a daily basis. Due to the worldwide coverage of its users and the real-time freshness of tweets, location prediction on Twitter has gained significant attention in recent years. Research efforts have focused on the new challenges and opportunities brought by the noisy, short, and context-rich nature of tweets. In this survey, we aim to offer an overall picture of location prediction on Twitter. Specifically, we concentrate on the prediction of user home locations, tweet locations, and mentioned locations. We first define the three tasks and review the evaluation metrics. By summarizing Twitter network, tweet content, and tweet context as potential inputs, we then systematically highlight how the problems depend on these inputs. Each dependency is illustrated by a comprehensive review of the corresponding strategies adopted in state-of-the-art approaches. In addition, we briefly review two related problems, namely semantic location prediction and point-of-interest recommendation. Finally, we list future research directions.
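The abstract does not list the evaluation metrics. For predictions given as coordinates, a common choice in this literature is the great-circle error distance between predicted and true locations. This distance is usually summarized by its mean and median, together with the fraction of predictions that fall within a fixed radius; 161 km, roughly 100 miles, is a frequently used threshold. The sketch below computes these metrics. It assumes a spherical Earth with radius 6371 km and locations given as (latitude, longitude) in degrees. The code is illustrative and is not taken from the survey.

```python
import math
from statistics import mean, median

EARTH_RADIUS_KM = 6371.0  # spherical-Earth assumption


def haversine_km(p, q):
    """Great-circle distance in km between (lat, lon) points given in degrees."""
    lat1, lon1 = map(math.radians, p)
    lat2, lon2 = map(math.radians, q)
    dlat, dlon = lat2 - lat1, lon2 - lon1
    a = math.sin(dlat / 2) ** 2 + math.cos(lat1) * math.cos(lat2) * math.sin(dlon / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def evaluate(predicted, actual, threshold_km=161.0):
    """Mean/median error distance and accuracy within threshold_km."""
    if len(predicted) != len(actual) or not predicted:
        raise ValueError("need equally long, non-empty prediction and ground-truth lists")
    errors = [haversine_km(p, a) for p, a in zip(predicted, actual)]
    return {
        "mean_km": mean(errors),
        "median_km": median(errors),
        "acc_at_threshold": sum(e <= threshold_km for e in errors) / len(errors),
    }
```

This sketch applies to home and tweet locations predicted as points. Mentioned-location prediction is usually scored as an extraction task with precision and recall instead.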

Model Complexity-Accuracy Trade-off for a Convolutional Neural Network

Convolutional Neural Networks (CNNs) have had great success in the recent past because of the advent of faster GPUs and memory access. CNNs are powerful because they learn features from data in layers, such that the learned features exhibit a structure similar to that of the V1 features of the human visual cortex. A major bottleneck is that CNNs are very large and have a high memory footprint, so they cannot be deployed on devices with limited storage, such as mobile phones and IoT devices. In this work, we study the model complexity versus accuracy trade-off on the MNIST dataset and give a concrete framework for handling this problem, given the worst-case accuracy that a system can tolerate. We reduce the model complexity by 236 times and the memory footprint by 19.5 times compared to the base model, while still meeting the worst-case accuracy threshold.
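The abstract does not describe the framework in detail. One simple reading of "given the worst-case accuracy that a system can tolerate" has three steps. First, count the parameters of each candidate architecture. Second, measure its accuracy. Third, keep the smallest candidate that still meets the threshold. The sketch below implements that selection. Its assumptions are 32-bit parameters for the memory estimate and hypothetical function names. In real use, the candidate tuples would come from actual training runs, not from this code. As a worked example of the counting, a 3×3 convolution from 1 input channel to 32 output channels with bias has 32 × (1·3·3 + 1) = 320 parameters.

```python
def conv2d_params(in_ch, out_ch, k, bias=True):
    """Parameters of a k x k 2-D convolution layer."""
    return out_ch * (in_ch * k * k + (1 if bias else 0))


def dense_params(n_in, n_out, bias=True):
    """Parameters of a fully connected layer."""
    return n_out * (n_in + (1 if bias else 0))


def memory_bytes(n_params, bytes_per_param=4):
    """Memory for the weights alone, assuming 32-bit floats by default."""
    return n_params * bytes_per_param


def smallest_acceptable(candidates, min_accuracy):
    """candidates: list of (name, n_params, accuracy) tuples.

    Return the candidate with the fewest parameters whose accuracy meets the
    worst-case threshold, or None if no candidate qualifies.
    """
    ok = [c for c in candidates if c[2] >= min_accuracy]
    return min(ok, key=lambda c: c[1]) if ok else None
```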

Stable Architectures for Deep Neural Networks

Deep neural networks have become invaluable tools for supervised machine learning, e.g., in the classification of text or images. While offering superior flexibility to find and express complicated patterns in data, deep architectures are known to be challenging to design and train so that they generalize well to new data. An important issue is numerical instability in derivative-based learning algorithms, commonly called exploding or vanishing gradients. In this paper, we propose new forward propagation techniques inspired by systems of Ordinary Differential Equations (ODEs) that overcome this challenge and lead to well-posed learning problems for arbitrarily deep networks. The backbone of our approach is interpreting deep learning as a parameter estimation problem of a nonlinear dynamical system. Given this formulation, we analyze the stability and well-posedness of deep learning and, motivated by our findings, develop new architectures. We relate the exploding and vanishing gradient phenomenon to the stability of the discrete ODE and present several strategies for stabilizing the training of very deep networks. While our new architectures restrict the solution space, several numerical experiments show that they are competitive with state-of-the-art networks.
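If a residual network is read as a discretized ODE, each layer is a forward Euler step Y_{j+1} = Y_j + h σ(K_j Y_j + b_j) with step size h. The sketch below shows one stabilizing choice of the kind the abstract alludes to. Each K_j is built from an unconstrained matrix W_j as K_j = W_j - W_j^T - γI. The eigenvalues of K_j then have real part exactly -γ, which keeps the linearized continuous dynamics from blowing up. The step size h must still be small enough for forward Euler to track those dynamics. The antisymmetric parameterization, the damping γ, the use of tanh as σ, and the function names are assumptions for illustration. The paper's exact architectures may differ.

```python
import numpy as np


def stable_kernel(W, gamma=0.0):
    """K = W - W^T - gamma*I. W - W^T is antisymmetric, so its eigenvalues are purely
    imaginary; subtracting gamma*I shifts every real part to exactly -gamma."""
    return W - W.T - gamma * np.eye(W.shape[0])


def forward_euler_net(Y0, Ws, bs, h, gamma=0.0, act=np.tanh):
    """Propagate features through len(Ws) layers of
    Y_{j+1} = Y_j + h * act(K_j @ Y_j + b_j), with K_j = stable_kernel(W_j, gamma).

    Y0: (d, n) array holding n examples as columns; Ws: list of (d, d) arrays;
    bs: list of length-d bias vectors; h: step size.
    """
    Y = Y0
    for W, b in zip(Ws, bs):
        K = stable_kernel(W, gamma)
        Y = Y + h * act(K @ Y + b[:, None])  # one forward Euler step per layer
    return Y
```

By contrast, if W is used directly as K, its eigenvalues can have large positive real parts. This is one way the forward dynamics, and with them the gradients, can explode as depth grows.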

Artificial Noise Injection for Securing Single-Antenna Systems

Polar codes for secret sharing

On Algebraic condition for null controllability of some coupled degenerate systems

Development of an Accelerated Test Methodology to Predict the Service Life of Polymeric Materials Subject to Outdoor Weathering

A Model for Information Networks: Efficiency, Stability and Dynamics

Convergence rates of the empirical spectral measure of unitary Brownian motion

Playability and arbitrarily large rat games

Optimal User Scheduling and Power Allocation for Millimeter Wave NOMA Systems

OncoScore: an R package to measure the oncogenic potential of genes

A Multi-Class Dispatching and Charging Scheme for Autonomous Electric Mobility On-Demand

An Anthropic Argument against the Future Existence of Superintelligent Artificial Intelligence

Trimming the Hill estimator: robustness, optimality and adaptivity

Agatha: disentangling periodic signals from correlated noise in a periodogram framework

Partial Domination in Graphs

On the pointwise iteration-complexity of a dynamic regularized ADMM with over-relaxation stepsize

A simple yet effective baseline for 3d human pose estimation

Lower Bound on the Localization Error in Infinite Networks with Random Sensor Locations

Scalable System Scheduling for HPC and Big Data

Sharp phase transition for the random-cluster and Potts models via decision trees

Distributed Control for Spatial Self-Organization of Multi-Agent Swarms

CAD Priors for Accurate and Flexible Instance Reconstruction

Synergistic parallel multi-objective integer programming

A 3D Ginibre point field

Affinity Scheduling and the Applications on Data Center Scheduling with Data Locality

Analysis of Approximate Message Passing with a Class of Non-Separable Denoisers

Asymmetric Clustering for High-Dimensional Data via Mixtures of Joint Generalized Hyperbolic Models

Penalized Mixture of Latent Trait Models for Variable Selection in Clustered Binary Data

CHAM: action recognition using convolutional hierarchical attention model

Deep Spatio-temporal Manifold Network for Action Recognition

Large Order Binary de Bruijn Sequences via Zech’s Logarithms

Phonetic Temporal Neural Model for Language Identification

Phone-aware Neural Language Identification

Contour Detection from Deep Patch-level Boundary Prediction

Accelerating solutions of PDEs with GPU-based swept time-space decomposition

Emotional Metaheuristics For in-situ Foraging Using Sensor Constrained Robot Swarms

Solving a Path Planning Problem in a Partially Known Environment using a Swarm Algorithm

C-homotopy classes of maps and covers

Asymptotic Normality of Extensible Grid Sampling

Multi-file Private Information Retrieval from MDS Coded Databases with Colluding Servers

Linear Projections of the Vandermonde Polynomial

On the switching behavior of sparse optimal controls for the one-dimensional heat equation

Low-Complexity Decoding for Symmetric, Neighboring and Consecutive Side-information Index Coding Problems

Validity of Borodin & Kostochka Conjecture for a Class of Graphs

Fast and Accurate Computation of the Distribution of Sums of Dependent Log-Normals

Does William Shakespeare REALLY Write Hamlet? Knowledge Representation Learning with Confidence

On the diameter of an ideal

Efficient Structure from Motion for Oblique UAV Images Based on Maximal Spanning Tree Expansions

Predicting Rising Follower Counts on Twitter Using Profile Information

An efficient model-free setting for longitudinal and lateral vehicle control. Validation through the interconnected pro-SiVIC/RTMaps prototyping platform

Receive Spatial Modulation for Massive MIMO Systems

Boundary behaviour of RW’s on planar graphs and convergence of LERW to chordal SLE$_2$

Convolutional Dictionary Learning via Local Processing

Low Complexity Two-Stage Soft/Hard Decoders

On Placement of Synthetic Inertia with Explicit Time-Domain Constraints

Semi-Federated Scheduling of Parallel Real-Time Tasks on Multiprocessors

A Systematic Review of Hindi Prosody

Variational Analysis for the Bilateral Minimal Time Function

Criteria for Solar Car Optimized Route Estimation

Diving Performance Assessment by means of Video Processing

Optimality conditions and local regularity of the value function for the optimal exit time problem

Evidence for the size principle in semantic and perceptual domains

Drug-drug Interaction Extraction via Recurrent Neural Network with Multiple Attention Layers

A Note on the Power of Non-Deterministic Circuits with Gate Restrictions

WikiM: Metapaths based Wikification of Scientific Abstracts

Computing the Lambert W function in arbitrary-precision complex interval arithmetic

Finite Convergence Analysis and Weak Sharp Solutions for Variational Inequalities

Evolving phylogenies of trait-dependent branching with mutation and competition. Part I: Existence

Fast Approximate Construction of Best Complex Antipodal Spherical Codes

Large-scale, Fast and Accurate Shot Boundary Detection through Spatio-temporal Convolutional Neural Networks

An exponential lower bound for Individualization-Refinement algorithms for Graph Isomorphism

Towards a complexity theory for the congested clique

Sparsity-promoting and edge-preserving maximum a posteriori estimators in non-parametric Bayesian inverse problems

Improving drug sensitivity predictions in precision medicine through active expert knowledge elicitation

Universality and Fourth Moment Theorem for homogeneous sums. Orthogonal polynomials and apolarity

Banach space actions and $L^2$-spectral gap

Semiparametric spectral modeling of the Drosophila connectome

Making up for the deficit in a marathon run

READ-BAD: A New Dataset and Evaluation Scheme for Baseline Detection in Archival Documents

Non-orthogonal Multiple Access in Large-Scale Heterogeneous Networks

Low-Density Code-Domain NOMA: Better Be Regular

Deep Person Re-Identification with Improved Embedding

Dimensions of sets which uniformly avoid arithmetic progressions

Gilbert’s disc model with geostatistical marking

Adaptive Regularization of Some Inverse Problems in Image Analysis

Skin lesion detection based on an ensemble of deep convolutional neural network

Bayesian Joint Topic Modelling for Weakly Supervised Object Localisation

A note on the uniqueness of models in social abstract argumentation

Optimal Computation of Overabundant Words

Cell Tracking via Proposal Generation and Selection

Generative Adversarial Trainer: Defense to Adversarial Perturbations with GAN

Logical Parsing from Natural Language Based on a Neural Translation Model

Nonconvex generalizations of ADMM for nonlinear equality constrained problems

3D Placement of an Unmanned Aerial Vehicle Base Station (UAV-BS) for Energy-Efficient Maximal Coverage

A notion of minor-based matroid connectivity

Learning Deep Networks from Noisy Labels with Dropout Regularization

Compressive Estimation of a Stochastic Process with Unknown Autocorrelation Function

Adjustments to Computer Models via Projected Kernel Calibration

Rapid Mixing of Local Graph Dynamics

Deep Projective 3D Semantic Segmentation

Incentive Mechanism Design for Cache-Assisted D2D Communications: A Mobility-Aware Approach

Analysis of Channel-Based User Authentication by Key-Less and Key-Based Approaches

Two-component domain decomposition scheme with overlapping subdomains for parabolic equations

A conjectural Peterson isomorphism in K-theory

Optimal properties of the canonical tight probabilistic frame

Frequentist Consistency of Variational Bayes

The Interactive Sum Choice Number of Trees

Local asymptotic equivalence of pure quantum states ensembles and quantum Gaussian white noise

Proceedings of the Workshop on Data Mining for Oil and Gas