Does Neural Machine Translation Benefit from Larger Context?

We propose a neural machine translation architecture that models the surrounding text in addition to the source sentence. These models lead to better performance, both in terms of general translation quality and pronoun prediction, when trained on small corpora, although this improvement largely disappears when trained with a larger corpus. We also discover that attention-based neural machine translation is well suited for pronoun prediction and compares favorably with other approaches that were specifically designed for this task.

Stein Variational Autoencoder

A new method for learning variational autoencoders is developed, based on an application of Stein’s operator. The framework represents the encoder as a deep nonlinear function through which samples from a simple distribution are fed. One need not make parametric assumptions about the form of the encoder distribution, and performance is further enhanced by integrating the proposed encoder with importance sampling. Results are presented across multiple unsupervised and semi-supervised problems, including semi-supervised analysis of the ImageNet data, demonstrating the scalability of the model to large datasets.

Stein Variational Adaptive Importance Sampling

We propose a novel adaptive importance sampling algorithm that combines the Stein variational gradient descent (SVGD) algorithm with importance sampling (IS). Our algorithm leverages the nonparametric transforms in SVGD to iteratively decrease the KL divergence between our importance proposal and the target distribution. The advantages of this algorithm are twofold: first, it turns SVGD into a standard IS algorithm, allowing us to use the standard diagnostic and analytic tools of IS to evaluate and interpret the results; second, it does not restrict the choice of importance proposal to predefined distribution families, as traditional (adaptive) IS methods do. Empirical experiments demonstrate that our algorithm performs well on evaluating the partition functions of restricted Boltzmann machines and the test likelihood of variational auto-encoders.
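As a rough illustration of the importance sampling side only, the sketch below estimates a normalizing constant with a fixed Gaussian proposal and reports the effective sample size, one of the standard IS diagnostics. It is not the adaptive algorithm of the abstract: there is no SVGD transform, and the target density, the proposal scale, and all function names are assumptions chosen for the example.

```python
import math
import random

def unnormalized_target(x):
    """Toy unnormalized density exp(-x^2 / 2); its true normalizer is sqrt(2*pi)."""
    return math.exp(-0.5 * x * x)

def proposal_pdf(x, scale):
    """Density of a zero-mean Gaussian proposal with standard deviation `scale`."""
    return math.exp(-0.5 * (x / scale) ** 2) / (scale * math.sqrt(2.0 * math.pi))

def importance_estimate(num_samples, scale, rng):
    """Return (estimate of the normalizer, effective sample size)."""
    weights = []
    for _ in range(num_samples):
        x = rng.gauss(0.0, scale)  # draw from the proposal
        weights.append(unnormalized_target(x) / proposal_pdf(x, scale))
    total = sum(weights)
    z_hat = total / num_samples  # average importance weight
    ess = total * total / sum(w * w for w in weights)  # (sum w)^2 / sum w^2
    return z_hat, ess

rng = random.Random(0)  # fixed seed so the run is repeatable
z_hat, ess = importance_estimate(20000, scale=2.0, rng=rng)
print(z_hat, math.sqrt(2.0 * math.pi), ess)
```

An effective sample size well below the number of draws signals a poor match between proposal and target, which is the kind of mismatch an adaptive proposal aims to reduce.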

On the k-Means/Median Cost Function

In this work, we study the k-means cost function. The (Euclidean) k-means problem can be described as follows: given a dataset X \subseteq \mathbb{R}^d and a positive integer k, find a set of k centers C \subseteq \mathbb{R}^d such that \Phi(C, X) \stackrel{def}{=} \sum_{x \in X} \min_{c \in C} ||x - c||^2 is minimized. Let \Delta_k(X) \stackrel{def}{=} \min_{C \subseteq \mathbb{R}^d, |C| = k} \Phi(C, X) denote the cost of the optimal k-means solution. It is simple to observe that for any dataset X, \Delta_k(X) decreases as k increases. We try to understand this behaviour more precisely. For any dataset X \subseteq \mathbb{R}^d, integer k \geq 1, and a small precision parameter \varepsilon > 0, let \mathcal{L}_{X}^{k, \varepsilon} denote the smallest integer such that \Delta_{\mathcal{L}_{X}^{k, \varepsilon}}(X) \leq \varepsilon \cdot \Delta_{k}(X). We show upper and lower bounds on this quantity. Our techniques generalize to the metric k-median problem in arbitrary metrics, and we give bounds in terms of the doubling dimension of the metric. Finally, we observe that for any dataset X, we can compute a set S of size O \left(\mathcal{L}_{X}^{k, \frac{\varepsilon}{c}} \right), where c is some fixed constant, such that \Phi(S, X) \leq \varepsilon \cdot \Delta_k(X), using the D^2-sampling algorithm, which is also known as the k-means++ seeding procedure. We also discuss some applications of our bounds.
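The cost function \Phi(C, X) and the D^2-sampling procedure mentioned above can be sketched in a few lines. This is a minimal illustration on a made-up four-point dataset, not the paper's analysis; the function names and the data are assumptions introduced here.

```python
import random

def squared_distance(x, c):
    """Squared Euclidean distance between two points given as tuples."""
    return sum((xi - ci) ** 2 for xi, ci in zip(x, c))

def kmeans_cost(centers, points):
    """Phi(C, X): each point pays the squared distance to its nearest center."""
    return sum(min(squared_distance(x, c) for c in centers) for x in points)

def d2_sampling(points, num_centers, rng):
    """k-means++ seeding: the first center is uniform, each later center is
    drawn with probability proportional to squared distance to the nearest
    center chosen so far."""
    centers = [rng.choice(points)]
    while len(centers) < num_centers:
        weights = [min(squared_distance(x, c) for c in centers) for x in points]
        if sum(weights) == 0:  # every point already coincides with a center
            break
        centers.append(rng.choices(points, weights=weights, k=1)[0])
    return centers

# Hypothetical toy dataset: two tight pairs of points far apart.
points = [(0.0, 0.0), (0.0, 1.0), (10.0, 0.0), (10.0, 1.0)]
reference_cost = kmeans_cost([(0.0, 0.5), (10.0, 0.5)], points)  # one center per pair
seeds = d2_sampling(points, 4, random.Random(0))
# Cost of the first m sampled centers, for m = 1, 2, ...
prefix_costs = [kmeans_cost(seeds[:m], points) for m in range(1, len(seeds) + 1)]
print(reference_cost, prefix_costs)
```

Because adding a center can only shrink each point's nearest-center distance, the printed prefix costs never increase, which mirrors the observation that \Delta_k(X) decreases as k grows.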

Grammar-Based Graph Compression

We present a new graph compressor that works by recursively detecting repeated substructures and representing them through grammar rules. We show that for a large number of graphs the compressor obtains smaller representations than other approaches. Specific queries, such as reachability between two nodes or regular path queries, can be evaluated over the grammar in linear or quadratic time, respectively, thus allowing speed-ups proportional to the compression ratio.

Semantic Similarity from Natural Language and Ontology Analysis

Artificial Intelligence federates numerous scientific fields with the aim of developing machines able to assist human operators performing complex tasks, most of which demand high cognitive skills (e.g. learning or decision processes). Central to this quest is giving machines the ability to estimate the likeness or similarity between things in the way human beings estimate the similarity between stimuli. In this context, this book focuses on semantic measures: approaches designed for comparing semantic entities such as units of language (e.g. words or sentences) or concepts and instances defined in knowledge bases. The aim of these measures is to assess the similarity or relatedness of such semantic entities by taking into account their semantics, i.e. their meaning. Intuitively, the words tea and coffee, which both refer to stimulating beverages, will be estimated to be more semantically similar than the words toffee (a confection) and coffee, even though the latter pair has a higher syntactic similarity. The two state-of-the-art approaches for estimating and quantifying the semantic similarity or relatedness of semantic entities are presented in detail: the first relies on corpus analysis and is based on Natural Language Processing techniques and semantic models, while the second is based on more or less formal, computer-readable and workable forms of knowledge such as semantic networks, thesauri or ontologies. (…) Beyond a simple inventory and categorization of existing measures, the aim of this monograph is to guide novices as well as researchers in these domains towards a better understanding of semantic similarity estimation and, more generally, semantic measures.
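The contrast between surface form and meaning in the tea/coffee/toffee example can be made concrete with a character-level edit distance. The sketch below uses Levenshtein distance as one possible stand-in for "syntactic similarity"; that choice and the function name are assumptions made here, not a measure taken from the book.

```python
def levenshtein(a, b):
    """Minimum number of single-character insertions, deletions, and
    substitutions needed to turn string a into string b."""
    previous = list(range(len(b) + 1))  # distances from the empty prefix of a
    for i, ch_a in enumerate(a, start=1):
        current = [i]  # turning a[:i] into the empty string takes i deletions
        for j, ch_b in enumerate(b, start=1):
            substitution = previous[j - 1] + (ch_a != ch_b)
            current.append(min(previous[j] + 1, current[j - 1] + 1, substitution))
        previous = current
    return previous[-1]

print(levenshtein("toffee", "coffee"))  # one substitution apart
print(levenshtein("tea", "coffee"))
```

The first pair is one edit apart and the second is five edits apart, so a purely string-based measure ranks toffee closer to coffee than tea is, which is the opposite of the semantic ranking the abstract describes.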

Unwinding the Amplituhedron in Binary

Big Data Analysis Using Shrinkage Strategies

Conceptual Frameworks for Building Online Citizen Science Projects

On the number of points in general position in the plane

FEUP at SemEval-2017 Task 5: Predicting Sentiment Polarity and Intensity with Financial Word Embeddings

Statistical inference for high dimensional regression via Constrained Lasso

Making data center computations fast, but not so furious

Perfect Elimination Orderings for Symmetric Matrices

Super- and Anti-Principal Modes in Multi-Mode Waveguides

Exploring Sparsity in Recurrent Neural Networks

Does robustness imply tractability? A lower bound for planted clique in the semi-random model

A Gabor Filter Texture Analysis Approach for Histopathological Brain Tumor Subtype Discrimination

Performance Impact of Base Station Antenna Heights in Dense Cellular Networks

Comment on ‘Many-body localization in Ising models with random long-range interactions’

A hybrid CPU-GPU parallelization scheme of variable neighborhood search for inventory optimization problems

Learning Affine Feature Space Transformations in Symbolic Regression

The Causality/Repair Connection in Databases: Causality-Programs

The Emergence of Canalization and Evolvability in an Open-Ended, Interactive Evolutionary System

O$^2$TD: (Near)-Optimal Off-Policy TD Learning

Performance Analysis of Slotted Secondary Transmission with Adaptive Modulation under Interweave Cognitive Radio Implementation

Linear recurrences for cylindrical networks

Automatic Disambiguation of French Discourse Connectives

Video Object Segmentation using Supervoxel-Based Gerrymandering

Proceedings 8th Workshop on Developments in Implicit Computational Complexity and 5th Workshop on Foundational and Practical Aspects of Resource Analysis

LibOPT: An Open-Source Platform for Fast Prototyping Soft Optimization Techniques

Quiver Hall-Littlewood functions and Kostka-Shoji polynomials

SearchQA: A New Q&A Dataset Augmented with Context from a Search Engine

‘Short-Dot’: Computing Large Linear Transforms Distributedly Using Coded Short Dot Products

Satellite Based Positioning Signal Acquisition at Higher Order Cycle Frequency

Capacity of Cellular Wireless Network

Deep Self-Taught Learning for Weakly Supervised Object Localization

Competitive Resource Allocation in HetNets: the Impact of Small-cell Spectrum Constraints and Investment Costs

Overpartitions with bounded part differences

Accelerated Distributed Dual Averaging over Evolving Networks of Growing Connectivity

Learning Piece-wise Linear Models from Large Scale Data for Ad Click Prediction

Mutual Information, Relative Entropy and Estimation Error in Semi-martingale Channels

ECG Signal Compression and Optimization in Remote Monitoring Networks

HPSLPred: An Ensemble Multi-label Classifier for Human Protein Subcellular Location Prediction with Imbalanced Source

Distances between Random Orthogonal Matrices and Independent Normals

Algorithms for Pattern Containment in 0-1 Matrices

Results on Pattern Avoidance Games

Existence of solution to scalar BSDEs with weakly $L^{1+}$-integrable terminal values

Coalescence of Geodesics in Exactly Solvable Models of Last Passage Percolation

Secret Key Generation from Correlated Sources and Secure Link

Know Your Master: Driver Profiling-based Anti-theft Method

Sentiment analysis based on rhetorical structure theory: Learning deep neural networks from discourse trees

Fast 2-D Complex Gabor Filter with Kernel Decomposition

A Faster Implementation of Online Run-Length Burrows-Wheeler Transform

Robust Optical Flow Estimation in Rainy Scenes

Image Fusion With Cosparse Analysis Operator

When Anderson localization makes quantum particles move backward

Hot or not? Forecasting cellular network hot spots using sector performance indicators

Criticality as It Could Be: organizational invariance as self-organized criticality in embodied agents

Bounds on some monotonic topological indices of bipartite graphs with a given number of cut edges

On PGZ decoding of alternant codes

A Comment on ‘Analysis of Video Image Sequences Using Point and Line Correspondences’

Convergence of extreme value statistics in a two-layer quasi-geostrophic atmospheric model

Peer Truth Serum: Incentives for Crowdsourcing Measurements and Opinions

Large-Scale Online Semantic Indexing of Biomedical Articles via an Ensemble of Multi-Label Classification Models

Scalable Global Grid catalogue for LHC Run3 and beyond

Best reply structure and equilibrium convergence in generic games

Mechanical Failure in Amorphous Solids: Scale Free Spinodal Criticality

Positive-instance driven dynamic programming for treewidth

Coverage and Rate of Downlink Sequence Transmissions with Reliability Guarantees

Infinite random planar maps related to Cauchy processes

The Robot Routing Problem for Collecting Aggregate Stochastic Rewards

Unsupervised Learning by Predicting Noise

Benchmarking OpenCL, OpenACC, OpenMP, and CUDA: programming productivity, performance, and energy consumption

On the choice of the low-dimensional domain for global optimization via random embeddings

A characterization of the Logarithmic Least Squares Method

Anomaly detection and motif discovery in symbolic representations of time series

On Low Complexity Detection for QAM Isomorphic Constellations

MuLoG, or How to apply Gaussian denoisers to multi-channel SAR speckle reduction?

Baselines and test data for cross-lingual inference

Joint Domain Based Massive Access for Small Packets Traffic of Uplink Wireless Channel

Improving the Performance of OTDOA based Positioning in NB-IoT System

Understanding Negations in Information Processing: Learning from Replicating Human Behavior

Representing Sentences as Low-Rank Subspaces

An Adaptive Observer Design for Takagi-Sugeno type Nonlinear System

Interactive Outlining of Pancreatic Cancer Liver Metastases in Ultrasound Images

Online Weighted Matching: Beating the $\frac{1}{2}$ Barrier

An adaptive observer design approach for discrete-time nonlinear systems

Mining Worse and Better Opinions. Unsupervised and Agnostic Aggregation of Online Reviews

Hitting times of interacting drifted Brownian motions and the vertex reinforced jump process

A Study of Deep Learning Robustness Against Computation Failures

How to exploit prior information in low-complexity models

Waveform Design for Wireless Power Transfer with Limited Feedback

The phase diagram of the complex branching Brownian motion energy model

Power Efficient Hybrid Beamforming for Massive MIMO Public Channel

Wave-like Decoding of Tail-biting Spatially Coupled LDPC Codes Through Iterative Demapping

Ranking to Learn: Feature Ranking and Selection via Eigenvector Centrality

An Empirical Analysis of NMT-Derived Interlingual Embeddings and their Use in Parallel Sentence Identification

Light Field Blind Motion Deblurring

Diagonal RNNs in Symbolic Music Modeling

A Broad-Coverage Challenge Corpus for Sentence Understanding through Inference

Distributed Event-Triggered Control for Global Consensus of Multi-Agent Systems with Input Saturation

Online Degree-Bounded Steiner Network Design

Reverse Engineering of Communications Networks: Evolution and Challenges

Distributed Dynamic Event-Triggered Control for Multi-Agent Systems

Approximations from Anywhere and General Rough Sets