Learning to Reason: End-to-End Module Networks for Visual Question Answering

Natural language questions are inherently compositional, and many are most easily answered by reasoning about their decomposition into modular sub-problems. For example, to answer ‘is there an equal number of balls and boxes?’ we can look for balls, look for boxes, count them, and compare the results. The recently proposed Neural Module Network (NMN) architecture implements this approach to question answering by parsing questions into linguistic substructures and assembling question-specific deep networks from smaller modules that each solve one subtask. However, existing NMN implementations rely on brittle off-the-shelf parsers, and are restricted to the module configurations proposed by these parsers rather than learning them from data. In this paper, we propose End-to-End Module Networks (N2NMNs), which learn to reason by directly predicting instance-specific network layouts without the aid of a parser. Our model learns to generate network structures (by imitating expert demonstrations) while simultaneously learning network parameters (using the downstream task loss). Experimental results on the new CLEVR dataset targeted at compositional question answering show that N2NMNs achieve an error reduction of nearly 50% relative to state-of-the-art attentional approaches, while discovering interpretable network architectures specialized for each question.
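
The decomposition in the balls-and-boxes example can be illustrated with a purely symbolic toy. This is only a sketch under stated assumptions: the module names (`find`, `count`, `equal`), the nested-tuple layout format, and the list-of-labels scene are all hypothetical and are not taken from the paper, whose modules are neural networks operating on image features and whose layouts are predicted by a learned policy.

```python
# Toy, non-neural illustration of assembling a question-specific program
# from small reusable modules. All names here are hypothetical.
MODULES = {
    # find: keep the objects in the scene that match a label
    "find": lambda scene, label: [obj for obj in scene if obj == label],
    # count: number of objects returned by an upstream module
    "count": lambda scene, objs: len(objs),
    # equal: compare two upstream results
    "equal": lambda scene, a, b: a == b,
}


def run_layout(layout, scene):
    """Recursively evaluate a layout given as nested tuples (module, *args)."""
    if not isinstance(layout, tuple):
        return layout  # leaf argument such as the label "ball"
    name, *args = layout
    evaluated = [run_layout(arg, scene) for arg in args]
    return MODULES[name](scene, *evaluated)


# Hand-written layout for "is there an equal number of balls and boxes?"
EQUAL_BALLS_BOXES = (
    "equal",
    ("count", ("find", "ball")),
    ("count", ("find", "box")),
)
```

Calling `run_layout(EQUAL_BALLS_BOXES, ["ball", "box", "ball", "box"])` evaluates the two `find`/`count` branches and compares them. In the paper's setting the layout would be predicted per question rather than written by hand.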


Stochastic Gradient Twin Support Vector Machine for Large Scale Problems

For classification problems, the twin support vector machine (TSVM) with nonparallel hyperplanes has been shown to be more powerful than the support vector machine (SVM). However, it is time consuming and memory intensive on large scale problems because it requires computing matrix inverses. In this paper, we propose an efficient stochastic gradient twin support vector machine (SGTSVM) based on the stochastic gradient descent algorithm (SGD). To our knowledge, this is the first time that SGD has been applied to TSVM, though there have been some variants where SGD was applied to SVM (SGSVM). Compared with SGSVM, our SGTSVM is more stable, and its convergence is also proved. In addition, a simple nonlinear version is presented. Experimental results on several benchmark and large scale datasets show that the performance of our SGTSVM is comparable to that of current classifiers, with a very fast learning speed.


Average nearest neighbor degrees in scale-free networks

The average nearest neighbor degree (ANND) of a node of degree k, as a function of k, is often used to characterize dependencies between degrees of a node and its neighbors in a network. We study the limiting behavior of the ANND in undirected random graphs with general i.i.d. degree sequences and arbitrary joint degree distribution of neighbor nodes, when the graph size tends to infinity. When the degree distribution has finite variance, the ANND converges to a deterministic function and we prove that for the configuration model, where nodes are connected at random, this, naturally, is a constant. For degree distributions with infinite variance, the ANND in the configuration model scales with the size of the graph and we prove a central limit theorem that characterizes this behavior. As a result, the ANND is uninformative for graphs with infinite variance degree distributions. We propose an alternative measure, the average nearest neighbor rank (ANNR), and prove its convergence to a deterministic function whenever the degree distribution has finite mean. In addition to our theoretical results, we provide numerical experiments to show the convergence of both functions in the configuration model and the erased configuration model, where self-loops and multiple edges are removed. These experiments also shed new light on the well-known ‘structural negative correlations’, or ‘finite-size effects’, that arise in simple graphs, because large nodes can only have a limited number of large neighbors. In particular, we show that the majority of such effects for regularly varying distributions are due to a sampling bias.
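
To make the quantity concrete, the sketch below computes one common empirical version of the ANND from an edge list: for each node, take the mean degree of its neighbors, then average that value over all nodes with the same degree k. This is an assumption about the estimator; the paper's exact definition may differ (for example in how it handles multi-edges), and the function name is hypothetical.

```python
from collections import defaultdict


def average_nearest_neighbor_degree(edges):
    """Return {k: ANND(k)} for an undirected simple graph given as an edge list.

    Self-loops are skipped and repeated edges are collapsed, which mirrors
    the 'erased' treatment of a multigraph.
    """
    neighbors = defaultdict(set)
    for u, v in edges:
        if u == v:
            continue  # ignore self-loops
        neighbors[u].add(v)
        neighbors[v].add(u)

    degree = {node: len(nbrs) for node, nbrs in neighbors.items()}

    totals = defaultdict(float)  # sum of per-node neighbor-degree means, by k
    counts = defaultdict(int)    # number of nodes with degree k
    for node, nbrs in neighbors.items():
        k = degree[node]
        totals[k] += sum(degree[n] for n in nbrs) / k
        counts[k] += 1

    return {k: totals[k] / counts[k] for k in totals}
```

On a star with one hub and three leaves, the hub (k = 3) sees only degree-1 neighbors and each leaf (k = 1) sees only the degree-3 hub, so the function is strongly decreasing in k. This is the kind of negative degree dependence the ANND is meant to expose.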


Adversarial Multi-task Learning for Text Classification

Neural network models have shown promise for multi-task learning, which focuses on learning shared layers to extract common, task-invariant features. However, in most existing approaches, the extracted shared features are prone to contamination by task-specific features or by noise from other tasks. In this paper, we propose an adversarial multi-task learning framework that keeps the shared and private latent feature spaces from interfering with each other. We conduct extensive experiments on 16 different text classification tasks, which demonstrate the benefits of our approach. In addition, we show that the shared knowledge learned by our proposed model can be regarded as off-the-shelf knowledge and easily transferred to new tasks. The datasets of all 16 tasks are publicly available at http://…/


Redefining Context Windows for Word Embedding Models: An Experimental Study

Distributional semantic models learn vector representations of words through the contexts they occur in. Although the choice of context (which often takes the form of a sliding window) has a direct influence on the resulting embeddings, the exact role of this model component is still not fully understood. This paper presents a systematic analysis of context windows based on a set of four distinct hyper-parameters. We train continuous Skip-Gram models on two English-language corpora for various combinations of these hyper-parameters, and evaluate them on both lexical similarity and analogy tasks. Notable experimental results are the positive impact of cross-sentential contexts and the surprisingly good performance of right-context windows.
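
As a rough illustration of what varying the context window means, the sketch below generates (target, context) training pairs of the kind a Skip-Gram model consumes. The parameter names `window`, `side`, and `cross_sentence` are hypothetical stand-ins chosen to mirror the right-context and cross-sentential settings mentioned in the abstract; they are not the paper's four hyper-parameters, which the abstract does not enumerate.

```python
def context_pairs(sentences, window=2, side="both", cross_sentence=False):
    """Yield Skip-Gram style (target, context) pairs from tokenized sentences.

    window:         maximum distance between target and context token
    side:           "left", "right", or "both" (which side contexts come from)
    cross_sentence: if True, windows may span sentence boundaries
    """
    if side not in ("left", "right", "both"):
        raise ValueError("side must be 'left', 'right', or 'both'")

    if cross_sentence:
        # Treat the whole corpus as one token stream.
        streams = [[tok for sent in sentences for tok in sent]]
    else:
        streams = sentences

    pairs = []
    for tokens in streams:
        for i, target in enumerate(tokens):
            lo = i - window if side in ("left", "both") else i
            hi = i + window if side in ("right", "both") else i
            for j in range(max(lo, 0), min(hi, len(tokens) - 1) + 1):
                if j != i:
                    pairs.append((target, tokens[j]))
    return pairs
```

With `side="right"` only tokens following the target are used as contexts, and with `cross_sentence=True` the last token of one sentence can be paired with the first token of the next.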


A Fresh Approach to Forecasting in Astroparticle Physics and Dark Matter Searches

A geometric approach to non-linear correlations with intrinsic scatter

Generalized Ideals and Co-Granular Rough Sets

Feedback-Capacity of Degraded Gaussian Vector BC using Directed Information and Concave Envelopes

Strongly Polynomial 2-Approximations of Discrete Wasserstein Barycenters

The pinnacle set of a permutation

Investigating Recurrence and Eligibility Traces in Deep Q-Networks

Quantifying instabilities in Financial Markets

Full statistical mode reconstruction of a light field via a photon-number resolved measurement

Optimal Posted Prices for Online Cloud Resource Allocation

25 Tweets to Know You: A New Model to Predict Personality with Social Media

Computer Vision for Autonomous Vehicles: Problems, Datasets and State-of-the-Art

Building Regular Registers with Rational Malicious Servers and Anonymous Clients — Extended Version

Performance of Optimal Data Shaping Codes

Smooth polytopes with negative Ehrhart coefficients

Optimal Jittered Sampling for two Points in the Unit Square

Signaling on the Continuous Spectrum of Nonlinear Optical fiber

Beating Atari with Natural Language Guided Reinforcement Learning

Combining parameter values or $p$-values

Lower bound on the 2-adic complexity of Ding-Helleseth generalized cyclotomic sequences of period $p^n$

Annotating Object Instances with a Polygon-RNN

Extractive Summarization: Limits, Compression, Generalized Model and Heuristics

Discovering Evolutionary Stepping Stones through Behavior Domination

A version of the random directed forest and its convergence to the Brownian web

The Combinatorics of Directed Planar Trees

The Impact of Antenna Height Difference on the Performance of Downlink Cellular Networks

Post-Capture Lighting Manipulation using Flash Photography

Introduction to Ultra Reliable and Low Latency Communications in 5G

Simultaneous Policy Learning and Latent State Inference for Imitating Driver Behavior

On the growth of a superlinear preferential attachment scheme

Using Contexts and Constraints for Improved Geotagging of Human Trafficking Webpages

Periodicity and integrability for the cube recurrence

Predicting Role Relevance with Minimal Domain Expertise in a Financial Domain

Answering Complex Questions Using Open Information Extraction

Proposal of Vital Data Analysis Platform using Wearable Sensor

1D Modeling of Sensor Selection Problem for Weak Barrier Coverage and Gap Mending in Wireless Sensor Networks

BMO estimates for stochastic singular integral operators and its application to PDEs with Lévy noise

A Large Self-Annotated Corpus for Sarcasm

Morrey-Campanato estimates for the moments of stochastic integral operators and its application to SPDEs

Schauder estimates for stochastic transport-diffusion equations with Lévy processes

Computability in the Lattice of Equivalence Relations

Learning to Fly by Crashing

Proactive Eavesdropping in Relaying Systems

OCRAPOSE II: An OCR-based indoor positioning system using mobile phone images

Testing Docker Performance for HPC Applications

Dependency resolution and semantic mining using Tree Adjoining Grammars for Tamil Language

A Novel Receiver Design with Joint Coherent and Non-Coherent Processing

Deduplication in a massive clinical note dataset

Proof of Chapoton’s conjecture on Newton polytopes of $q$-Ehrhart polynomials

Maximum Likelihood Detection for Collaborative Molecular Communication

FSITM: A Feature Similarity Index For Tone-Mapped Images

Perfect Half Space Games

Continuous Inference for Aggregated Point Process Data

Reduction for stochastic biochemical reaction networks with multiscale conservations

ConvNet-Based Localization of Anatomical Structures in 3D Medical Images

Classical and bayesian componentwise predictors for non-compact correlated ARH(1) processes

A Technical Report on PLS-Completeness of Single-Swap for Unweighted Metric Facility Location and $K$-Means

Skeleton Boxes: Solving skeleton based action detection with a single deep convolutional neural network

Stability of Piecewise Deterministic Markovian Load Processes on Networks

Skeleton based action recognition using translation-scale invariant image mapping and multi-scale deep cnn

Effects of the optimisation of the margin distribution on generalisation in deep architectures

Nash Equilibrium Approximation under Communication and Computation Constraints in Large-Scale Non-cooperative Games

Characterizations and algorithms for generalized Cops and Robbers games

Generalised least squares estimation of regularly varying space-time processes based on flexible observation schemes

Common adversaries form alliances: modelling complex networks via anti-transitivity

Alphabet-dependent Parallel Algorithm for Suffix Tree Construction for Pattern Searching

CNN based music emotion classification

Automorphism group of the subspace inclusion graph of a vector space

Unsupervised object segmentation in video by efficient selection of highly probable positive features

Design of low-cost, compact and weather-proof whole sky imagers for high-dynamic-range captures

m-Bonsai: a Practical Compact Dynamic Trie

A Fractional Gauss-Jacobi quadrature rule for approximating fractional integrals and derivatives

A multi-method simulation of a high-frequency bus line using AnyLogic

Unsupervised Creation of Parameterized Avatars

NEURAL: quantitative features for newborn EEG using Matlab

Fractional Herglotz Variational Principles with Generalized Caputo Derivatives

Automatic Segmentation of the Left Ventricle in Cardiac CT Angiography Using Convolutional Neural Network

Quantum Sphere-Packing Bounds with Polynomial Prefactors

Existence and approximation of fixed points of vicinal mappings in geodesic spaces

Survivor-complier effects in the presence of selection on treatment, with application to a study of prompt ICU admission

A Deep Learning Framework using Passive WiFi Sensing for Respiration Monitoring

$β$-expansion: A Theoretical Framework for Fast and Recursive Construction of Polar Codes

C-RAN with Hybrid RF/FSO Fronthaul Links: Joint Optimization of RF Time Allocation and Fronthaul Compression

Universal Adversarial Perturbations Against Semantic Image Segmentation

D-optimal designs for complex Ornstein-Uhlenbeck processes

The proximal point algorithm in geodesic spaces with curvature bounded above

The True Destination of EGO is Multi-local Optimization

Evolution of high-order connected components in random hypergraphs

Learning Video Object Segmentation with Visual Memory

How Long It Takes for an Ordinary Node with an Ordinary ID to Output?

Study of Anomaly Detection Based on Randomized Subspace Methods in IP Networks

Conditional measure on random sets such as Brownian path and convergence of random measures

Vehicular Communications: A Physical Layer Perspective

Understanding Task Design Trade-offs in Crowdsourced Paraphrase Collection

A location-aware embedding technique for accurate landmark recognition

Rate-Distortion Theory of Finite Point Processes

Spatio-temporal analysis of regional unemployment rates: A comparison of model based approaches

Polar factorization of conformal and projective maps of the sphere in the sense of optimal mass transport

Derivation of the Asymptotic Eigenvalue Distribution for Causal 2D-AR Models under Upscaling

Deep Occlusion Reasoning for Multi-Camera Multi-Target Detection

Accurate Single Stage Detector Using Recurrent Rolling Convolution

A Catalan Subset of Descending Plane Partitions

Positive Semidefiniteness and Positive Definiteness of a Linear Parametric Interval Matrix

Importance Sampled Stochastic Optimization for Variational Inference

d-Complete posets: local structural axioms, properties, and equivalent definitions

Sorting sums of binary decision summands

Network Dissection: Quantifying Interpretability of Deep Visual Representations

Tikhonov regularization of control-constrained optimal control problems

A complete dichotomy for complex-valued Holant^c

The free boundary Schur process and applications

Online Weighted Degree-Bounded Steiner Networks via Novel Online Mixed Packing/Covering

Analytical study of the ‘master-worker’ framework scalability on multiprocessors with distributed memory

Learn to Model Motion from Blurry Footages

Noise-Tolerant Interactive Learning from Pairwise Comparisons with Near-Minimal Label Complexity

Deterministic Quantum Annealing Expectation-Maximization Algorithm

Interlacement of double curves of immersed spheres

Kesten’s bound for sub-exponential densities on the real line and its multi-dimensional analogues

Learning to Generate Long-term Future via Hierarchical Prediction

SkiMap: An Efficient Mapping Framework for Robot Navigation

Beating $1-\frac{1}{e}$ for Ordered Prophets

Generative Face Completion
