Neural System Combination for Machine Translation

Neural machine translation (NMT) has become a new approach to machine translation and generates much more fluent results than statistical machine translation (SMT). However, SMT is usually better than NMT in translation adequacy. It is therefore a promising direction to combine the advantages of both NMT and SMT. In this paper, we propose a neural system combination framework leveraging multi-source NMT, which takes as input the outputs of NMT and SMT systems and produces the final translation. Extensive experiments on the Chinese-to-English translation task show that our model achieves a significant improvement of 5.3 BLEU points over the best single system output and 3.4 BLEU points over the state-of-the-art traditional system combination methods.

A Novel Distance Metric: Generalized Relative Entropy

Information entropy and its extensions, which are important generalizations of entropy, have been applied in many research domains. In this paper, a novel generalized relative entropy is constructed to avoid some defects of traditional relative entropy. We present the structure of the generalized relative entropy after discussing the defects of relative entropy. Moreover, some properties of the proposed generalized relative entropy are presented and proved. In particular, the proposed generalized relative entropy is proved to have a finite range and to be a finite distance metric.
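
The abstract does not say which defects of traditional relative entropy it targets. As a hedged illustration only, the sketch below shows two well-known properties of standard relative entropy (Kullback-Leibler divergence) that keep it from being a distance metric: it is asymmetric, and it is unbounded. The function name `relative_entropy` and the example distributions are assumptions made for this sketch; the paper's generalized construction is not reproduced here.

```python
import math

def relative_entropy(p, q):
    """Standard relative entropy D(p || q) in nats for discrete distributions (lists)."""
    total = 0.0
    for p_i, q_i in zip(p, q):
        if p_i == 0.0:
            continue  # the term 0 * log(0 / q_i) is taken as 0 by convention
        if q_i == 0.0:
            return math.inf  # p puts mass where q has none: divergence is infinite
        total += p_i * math.log(p_i / q_i)
    return total

# Hypothetical example distributions over three outcomes.
p = [0.5, 0.5, 0.0]
q = [0.9, 0.1, 0.0]
r = [1.0, 0.0, 0.0]

d_pq = relative_entropy(p, q)  # D(p || q)
d_qp = relative_entropy(q, p)  # D(q || p), differs from D(p || q)
d_pr = relative_entropy(p, r)  # infinite, because r assigns zero mass where p does not

print("D(p||q) =", d_pq)
print("D(q||p) =", d_qp)
print("D(p||r) =", d_pr)
```

A bounded, symmetric quantity that satisfies the triangle inequality would avoid both behaviors, which is consistent with the finite range and metric property the abstract claims for its generalized relative entropy.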

Equivalence Between Policy Gradients and Soft Q-Learning

Two of the leading approaches for model-free reinforcement learning are policy gradient methods and Q-learning methods. Q-learning methods can be effective and sample-efficient when they work; however, it is not well understood why they work, since empirically the Q-values they estimate are very inaccurate. A partial explanation may be that Q-learning methods are secretly implementing policy gradient updates: we show that there is a precise equivalence between Q-learning and policy gradient methods in the setting of entropy-regularized reinforcement learning, namely that ‘soft’ (entropy-regularized) Q-learning is exactly equivalent to a policy gradient method. We also point out a connection between Q-learning methods and natural policy gradient methods. Experimentally, we explore the entropy-regularized versions of Q-learning and policy gradients, and we find them to perform as well as (or slightly better than) the standard variants on the Atari benchmark. We also show that the equivalence holds in practical settings by constructing a Q-learning method that closely matches the learning dynamics of A3C without using a target network or $\epsilon$-greedy exploration schedule.
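
To make the entropy-regularized setting concrete, here is a minimal sketch of the usual link between soft Q-values and a stochastic policy. It assumes the standard definitions with a temperature `tau` (not given in the abstract): the soft value is V = tau * log sum_a exp(Q(a) / tau), and the Boltzmann policy is pi(a) = exp((Q(a) - V) / tau). Under these assumptions the Q-values can be recovered from the policy and the value, Q(a) = V + tau * log pi(a), which is the kind of correspondence that lets a Q-function be read as a parameterized policy. The function names and numbers are hypothetical, and this is not the paper's derivation.

```python
import math

def soft_value(q_values, tau):
    """Soft value V = tau * log sum_a exp(Q(a) / tau), computed stably via the max trick."""
    m = max(q_values)
    return m + tau * math.log(sum(math.exp((q - m) / tau) for q in q_values))

def boltzmann_policy(q_values, tau):
    """Policy pi(a) = exp((Q(a) - V) / tau); sums to 1 by the definition of V."""
    v = soft_value(q_values, tau)
    return [math.exp((q - v) / tau) for q in q_values]

# Hypothetical Q-values for three actions in a single state.
q_values = [1.0, 2.0, 0.5]
tau = 0.5

v = soft_value(q_values, tau)
pi = boltzmann_policy(q_values, tau)

# Rebuild the Q-values from the policy and the soft value.
q_rebuilt = [v + tau * math.log(p) for p in pi]

print("policy:", pi)
print("rebuilt Q:", q_rebuilt)
```

As `tau` shrinks, the policy concentrates on the greedy action and the soft value approaches the maximum Q-value, recovering the standard (non-regularized) setting.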

Bandit Structured Prediction for Neural Sequence-to-Sequence Learning

Bandit structured prediction describes a stochastic optimization framework where learning is performed from partial feedback. This feedback is received in the form of a task loss evaluation to a predicted output structure, without having access to gold standard structures. We advance this framework by lifting linear bandit learning to neural sequence-to-sequence learning problems using attention-based recurrent neural networks. Furthermore, we show how to incorporate control variates into our learning algorithms for variance reduction and improved generalization. We present an evaluation on a neural machine translation task that shows improvements of up to 5.89 BLEU points for domain adaptation from simulated bandit feedback.
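
The role of a control variate can be sketched on a toy problem. The example below assumes a softmax distribution over three candidate outputs and a score-function gradient estimate of the expected loss, (loss(y) - baseline) * grad log p(y) with y sampled from p. It uses a constant baseline equal to the expected loss as one simple control variate; the abstract does not specify which control variates the paper uses, and the names `estimator_stats`, `theta`, and `losses` are hypothetical. The statistics are computed exactly by enumeration rather than by sampling.

```python
import math

def softmax(theta):
    """Numerically stable softmax over a list of logits."""
    m = max(theta)
    exps = [math.exp(t - m) for t in theta]
    z = sum(exps)
    return [e / z for e in exps]

def grad_log_prob(probs, y):
    """Gradient of log p(y) with respect to the softmax logits: one_hot(y) - probs."""
    return [(1.0 if k == y else 0.0) - p_k for k, p_k in enumerate(probs)]

def estimator_stats(theta, losses, baseline):
    """Exact mean and total variance of (loss(y) - baseline) * grad log p(y), y ~ p."""
    probs = softmax(theta)
    mean = [0.0] * len(theta)
    second_moment = 0.0
    for y, p_y in enumerate(probs):
        g = [(losses[y] - baseline) * d for d in grad_log_prob(probs, y)]
        mean = [m_k + p_y * g_k for m_k, g_k in zip(mean, g)]
        second_moment += p_y * sum(g_k * g_k for g_k in g)
    variance = second_moment - sum(m_k * m_k for m_k in mean)
    return mean, variance

# Hypothetical logits and per-output task losses (only loss(y) for the sampled y
# would be observed in a real bandit setting).
theta = [0.0, 0.0, 0.0]
losses = [0.2, 0.5, 0.9]
expected_loss = sum(p * l for p, l in zip(softmax(theta), losses))

mean_plain, var_plain = estimator_stats(theta, losses, baseline=0.0)
mean_cv, var_cv = estimator_stats(theta, losses, baseline=expected_loss)

print("mean without baseline:", mean_plain)
print("mean with baseline:   ", mean_cv)
print("variance without / with baseline:", var_plain, var_cv)
```

Subtracting a constant baseline leaves the expected gradient unchanged because the expectation of grad log p(y) is zero, while in this toy setting it lowers the variance of the estimate.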

Firing Cell: An Artificial Neuron with a Simulation of Long-Term-Potentiation-Related Memory

We propose a computational model of a neuron, called the firing cell (FC), whose properties cover such phenomena as attenuation of receptors for external stimuli, delay and decay of postsynaptic potentials, modification of internal weights due to propagation of postsynaptic potentials through the dendrite, modification of properties of the analog memory for each input due to a pattern of short-time synaptic potentiation or long-time synaptic potentiation (LTP), output-spike generation when the sum of all inputs exceeds a threshold, and refraction. The cell may take one of three forms: excitatory, inhibitory, and receptory. The computer simulations showed that, depending on the phase of input signals, the artificial neuron’s output frequency may demonstrate various chaotic behaviors.

Mutual Information, Neural Networks and the Renormalization Group

A Time Hierarchy Theorem for the LOCAL Model

Statistical Analysis of Time-Variant Channels in Diffusive Mobile Molecular Communications

Complexity of the Fourier transform on the Johnson graph

A Reinforcement Learning Approach to Weaning of Mechanical Ventilation in Intensive Care Units

Quality of Service of an Asynchronous Crash-Recovery Leader Election Algorithm

k-Majority Digraphs and the Hardness of Voting with a Constant Number of Voters

Efficient Gender Classification Using a Deep LDA-Pruned Net

Finding Euclidean Distance to a Convex Cone Generated by a Large Number of Discrete Points

Settling the query complexity of non-adaptive junta testing

Good Features to Correlate for Visual Tracking

Deep Clustering via Joint Convolutional Autoencoder Embedding and Relative Entropy Minimization

Stochastic comparisons of series and parallel systems with heterogeneous components

Rate-Splitting to Mitigate Residual Transceiver Hardware Impairments in Massive MIMO Systems

Identifying First-person Camera Wearers in Third-person Videos

Graph Invariants with Connections to the Feynman Period in $φ^4$ Theory

Model order reduction for stochastic dynamical systems with continuous symmetries

Stability and Fluctuations in a Simple Model of Phonetic Category Change

SwellShark: A Generative Model for Biomedical Named Entity Recognition without Labeled Data

Shared processor scheduling

Hard Mixtures of Experts for Large Scale Weakly Supervised Vision

Facially Dual Complete (Nice) cones and lexicographic tangents

NormFace: $L_2$ Hypersphere Embedding for Face Verification

Robust Multi-view Pedestrian Tracking Using Neural Networks

Recalibration: A post-processing method for approximate Bayesian computation

Quantum Codes from Linear Codes over Finite Chain Rings

A data set for evaluating the performance of multi-class multi-object video tracking

Improving Context Aware Language Models

Hierarchical 3D fully convolutional networks for multi-organ segmentation

Multiple Reflection Symmetry Detection via Linear-Directional Kernel Density Estimation

Tail sums of Wishart and GUE eigenvalues beyond the bulk edge

Distribution of k-Hop Paths in the Random Connection Model

Subject-Specific Abnormal Region Detection in Traumatic Brain Injury Using Sparse Model Selection on High Dimensional Diffusion Data

Solar Power Plant Detection on Multi-Spectral Satellite Imagery using Convolutional Neural Networks with Feedback Model and m-PCNN Fusion

Track Everything: Limiting Prior Knowledge in Online Multi-Object Recognition

Short-Packet Communications in Non-Orthogonal Multiple Access Systems

Massive MIMO Downlink 1-Bit Precoding with Linear Programming for PSK Signaling

Asymptotic theory of multiple-set linear canonical analysis

Faster Rates for Policy Learning

Stationary analysis of the shortest queue problem

Robust and Fast Decoding of High-Capacity Color QR Codes for Mobile Applications

A numerical study of heat source reconstruction for the advection-diffusion operator: A conjugate gradient method stabilized with SVD

A Domain Based Approach to Social Relation Recognition

A Theory of Nonlinear Signal-Noise Interactions in Wavelength Division Multiplexed Coherent Systems

Gap structure of 1D cut and project Hamiltonians

Energy of commuting graph of finite groups whose centralizers are Abelian

Visibility graphs and symbolic dynamics

The technosphere in Earth system analysis: a coevolutionary perspective

Attend to You: Personalized Image Captioning with Context Sequence Memory Networks

Combined Fractional Variational Problems of Variable Order and Some Computational Aspects

Reported design characteristics influence heterogeneity among randomized trials

A spatio-temporal process-convolution model for quantifying health inequalities in respiratory prescription rates in Scotland

The Ising Partition Function: Zeros and Deterministic Approximation

Time Series Prediction for Graphs in Kernel and Dissimilarity Spaces

Remote Channel Inference for Beamforming in Ultra-Dense Hyper-Cellular Network

Exploring the bounds on the positive semidefinite rank

Functional Erdős-Rényi laws for Lévy processes

Nonlinear Precoders for Massive MIMO Systems with General Constraints

Fairness in Resource Allocation and Slowed-down Dependent Rounding

Asymptotic Performance Analysis of Spatially Reconfigurable Antenna Arrays

Incorporation of geometallurgical modelling into long-term production planning

Improper Colourings inspired by Hadwiger’s Conjecture

Estimation of the discontinuous leverage effect: Evidence from the NASDAQ order book

A 3D fully convolutional neural network and a random walker to segment the esophagus in CT

On mean-variance hedging under partial observations and terminal wealth constraints

PQTable: Non-exhaustive Fast Search for Product-quantized Codes using Hash Tables

Attention Strategies for Multi-Source Sequence-to-Sequence Learning

A level-1 Limit Order book with time dependent arrival rates

Existence of solutions to a general geometric elliptic variational problem

Infinite end-devouring sets of rays with prescribed start vertices

Hydrodynamic limit and viscosity solutions for a 2D growth process in the anisotropic KPZ class

On the 1-dimensional complex Ornstein-Uhlenbeck operator

Panorama to panorama matching for location recognition

A Semantic QA-Based Approach for Text Summarization Evaluation

Context-based Object Viewpoint Estimation: A 2D Relational Approach

Making Neural Programming Architectures Generalize via Recursion

Composite Quasi-Maximum Likelihood Estimation of Dynamic Panels with Group-Specific Heterogeneity and Spatially Dependent Errors

Accurately and Efficiently Interpreting Human-Robot Instructions of Varying Granularities

Scientific Article Summarization Using Citation-Context and Article’s Discourse Structure

A hybrid spatial data mining approach based on fuzzy topological relations and MOSES evolutionary algorithm

Path-contractions, edge deletions and connectivity preservation

Symmetry in Software Synthesis

Learned D-AMP: A Principled CNN-based Compressive Image Recovery Algorithm

Consistency and Asymptotic Normality of Latent Blocks Model Estimators

Partition-Theoretic Formulas for Arithmetic Densities

Full-Duplex Relaying with Improper Gaussian Signaling over Nakagami-m Fading Channels

Total Variation Approximation of Random Orthogonal Matrices by Gaussian Matrices

Feed-forward approximations to dynamic recurrent network architectures