Pruning variable selection ensembles

In the context of variable selection, ensemble learning has gained increasing interest due to its great potential to improve selection accuracy and to reduce the false discovery rate. This paper designs a novel ordering-based selective ensemble learning strategy to obtain smaller but more accurate ensembles. In particular, a greedy sorting strategy is proposed to rearrange the order in which members are included in the integration process. By stopping the fusion process early, a smaller subensemble with higher selection accuracy can be obtained. More importantly, the sequential inclusion criterion reveals the fundamental strength-diversity trade-off among ensemble members. Taking stability selection (abbreviated as StabSel) as an example, experiments on both simulated and real-world data examine the performance of the novel algorithm. Experimental results demonstrate that pruned StabSel generally achieves higher selection accuracy and lower false discovery rates than StabSel and several other benchmark methods.
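
The greedy ordering and early-stopping idea can be sketched as follows. This is a minimal illustration under stated assumptions, not the paper's algorithm: each ensemble member is reduced to a 0/1 vector of selected variables, and the inclusion criterion `loss` is a hypothetical stand-in (here, squared distance of the averaged selection frequencies to a known true support, which is only available in simulation), because the abstract does not state the actual criterion. `greedy_order`, `pruned_selection`, and the 0.6 threshold are illustrative choices.

```python
def greedy_order(members, loss):
    """Greedily order ensemble members and choose an early-stopping point.

    members: equal-length 0/1 selection vectors, one per base selector
    loss:    HYPOTHETICAL criterion mapping an averaged selection-frequency
             vector to a float (lower is better)
    Returns (order, best_k): the inclusion order and the size of the
    shortest prefix that attains the smallest loss.
    """
    remaining = list(range(len(members)))
    running = [0.0] * len(members[0])  # sum of the included members' vectors
    order, prefix_losses = [], []
    while remaining:
        k = len(order) + 1
        # loss of the subensemble if member j were included next
        scores = {
            j: loss([(s + v) / k for s, v in zip(running, members[j])])
            for j in remaining
        }
        best = min(remaining, key=scores.__getitem__)
        remaining.remove(best)
        order.append(best)
        running = [s + v for s, v in zip(running, members[best])]
        prefix_losses.append(scores[best])
    # early stopping: min() returns the first (i.e. shortest) best prefix
    best_k = min(range(len(order)), key=prefix_losses.__getitem__) + 1
    return order, best_k


def pruned_selection(members, order, best_k, threshold=0.6):
    # StabSel-style final step on the pruned subensemble: keep variables whose
    # selection frequency reaches the threshold (0.6 is an arbitrary choice)
    kept = [members[j] for j in order[:best_k]]
    freq = [sum(col) / best_k for col in zip(*kept)]
    return [i for i, f in enumerate(freq) if f >= threshold]


# Toy simulation: 5 members, 4 candidate variables, true support {0, 1}
members = [[1, 0, 1, 0], [1, 1, 0, 0], [0, 1, 1, 0], [0, 0, 1, 1], [1, 1, 0, 1]]
truth = [1, 1, 0, 0]


def sq_loss(freq):
    return sum((f - t) ** 2 for f, t in zip(freq, truth))


order, best_k = greedy_order(members, sq_loss)
print(order[0], best_k, pruned_selection(members, order, best_k))  # 1 1 [0, 1]
```

Here member 1 alone matches the true support, so the shortest best prefix has size 1; with real data, where the support is unknown, a data-driven proxy would have to replace the oracle loss.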

Limits of End-to-End Learning

End-to-end learning refers to training a possibly complex learning system by applying gradient-based learning to the system as a whole. An end-to-end learning system is specifically designed so that all modules are differentiable. In effect, not only a central learning machine but also all ‘peripheral’ modules, such as representation learning and memory formation, are covered by a holistic learning process. The power of end-to-end learning has been demonstrated on many tasks, such as playing a whole array of Atari video games with a single architecture. As solutions are pushed toward more challenging tasks, network architectures keep growing more complex. In this paper we ask whether, and to what extent, end-to-end learning is a future-proof technique in the sense of scaling to complex and diverse data processing architectures. We point out potential inefficiencies, and in particular we argue that end-to-end learning does not make optimal use of the modular design of present neural networks. Our surprisingly simple experiments demonstrate these inefficiencies, up to the complete breakdown of learning.

Semantic Autoencoder for Zero-Shot Learning

Existing zero-shot learning (ZSL) models typically learn a projection function from a feature space to a semantic embedding space (e.g. an attribute space). However, such a projection function is only concerned with predicting the semantic representation of the seen training classes (e.g. attribute prediction) or with classification. When applied to test data, which in ZSL contain different (unseen) classes without training data, a ZSL model typically suffers from the projection domain shift problem. In this work, we present a novel solution to ZSL based on learning a Semantic AutoEncoder (SAE). Following the encoder-decoder paradigm, an encoder projects a visual feature vector into the semantic space, as in existing ZSL models. In addition, the decoder imposes a further constraint: the projection (code) must be able to reconstruct the original visual feature. We show that with this additional reconstruction constraint, the projection function learned from the seen classes generalises better to new unseen classes. Importantly, the encoder and decoder are linear and symmetric, which enables us to develop an extremely efficient learning algorithm. Extensive experiments on six benchmark datasets demonstrate that the proposed SAE significantly outperforms existing ZSL models, with the additional benefit of lower computational cost. Furthermore, when applied to the supervised clustering problem, SAE also beats the state of the art.
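
To make the linear, symmetric encoder-decoder concrete, here is a minimal sketch that assumes the objective commonly associated with SAE, min over W of ||X - W^T S||_F^2 + λ||W X - S||_F^2, where X (d × N) holds visual features and S (k × N) semantic vectors; this objective is not stated in the abstract, so treat it as an assumption. Setting the gradient with respect to W to zero gives the Sylvester equation S S^T W + λ W X X^T = (1 + λ) S X^T, which the sketch solves by vectorization with NumPy. `train_sae`, `predict_unseen`, and the toy data are hypothetical.

```python
import numpy as np


def train_sae(X, S, lam):
    """Solve S S^T W + lam * W X X^T = (1 + lam) * S X^T for the encoder W (k x d).

    X: d x N visual features of seen-class samples
    S: k x N semantic vectors (e.g. attributes) of the same samples
    Under the ASSUMED objective, the decoder is W.T (tied, symmetric weights).
    """
    k, d = S.shape[0], X.shape[0]
    A = S @ S.T                # k x k
    B = lam * (X @ X.T)        # d x d
    Q = (1 + lam) * (S @ X.T)  # k x d
    # Column-major vectorization: vec(A W + W B) = (I_d kron A + B^T kron I_k) vec(W)
    M = np.kron(np.eye(d), A) + np.kron(B.T, np.eye(k))
    w = np.linalg.solve(M, Q.reshape(-1, order="F"))
    return w.reshape((k, d), order="F")


def predict_unseen(W, X_test, prototypes):
    """Encode test features and return the index of the nearest class prototype (cosine)."""
    Z = W @ X_test  # k x N_test
    Z = Z / np.linalg.norm(Z, axis=0, keepdims=True)
    P = prototypes / np.linalg.norm(prototypes, axis=0, keepdims=True)  # k x C
    return np.argmax(P.T @ Z, axis=0)


# Random toy data, for illustration only
rng = np.random.default_rng(0)
X = rng.standard_normal((5, 20))  # d = 5, N = 20
S = rng.standard_normal((3, 20))  # k = 3
W = train_sae(X, S, lam=0.5)
prototypes = rng.standard_normal((3, 4))  # 4 hypothetical unseen classes
labels = predict_unseen(W, rng.standard_normal((5, 6)), prototypes)
```

The Kronecker system has d·k unknowns, so it only suits toy sizes; for realistic feature dimensions, `scipy.linalg.solve_sylvester(a, b, q)`, which solves AX + XB = Q, is the more practical choice.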

A New Type of Neurons for Machine Learning

In machine learning, the use of an artificial neural network is the mainstream approach. Such a network consists of layers of neurons. These neurons are all of the same type, characterized by two features: (1) an inner product of an input vector and a matching weighting vector of trainable parameters and (2) a nonlinear excitation function. Here we investigate the possibility of replacing the inner product with a quadratic function of the input vector, thereby upgrading the 1st-order neuron to a 2nd-order neuron, empowering individual neurons, and facilitating the optimization of neural networks. Numerical examples are also provided to illustrate the feasibility and merits of 2nd-order neurons. Finally, further topics are discussed.
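
The difference between the two neuron types can be shown in a few lines. This sketch assumes a general quadratic form x^T A x + w·x + b in place of the inner product; the paper's exact parameterization of the 2nd-order neuron may differ. As an illustration of the extra power, a single quadratic neuron with hand-picked (hypothetical) weights computes XOR, which no single 1st-order neuron can, since XOR is not linearly separable.

```python
import math


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def first_order_neuron(x, w, b):
    # conventional neuron: nonlinearity applied to an inner product plus bias
    return sigmoid(sum(wi * xi for wi, xi in zip(w, x)) + b)


def second_order_neuron(x, A, w, b):
    # ASSUMED quadratic neuron: nonlinearity applied to x^T A x + w . x + b
    n = len(x)
    quad = sum(x[i] * A[i][j] * x[j] for i in range(n) for j in range(n))
    return sigmoid(quad + sum(wi * xi for wi, xi in zip(w, x)) + b)


# Hypothetical weights: x^T A x = -2*x1*x2, so the pre-activation is
# x1 + x2 - 2*x1*x2 - 0.5, i.e. +0.5 on (0,1),(1,0) and -0.5 on (0,0),(1,1)
A_XOR = [[0.0, -1.0], [-1.0, 0.0]]
W_XOR = [1.0, 1.0]
B_XOR = -0.5

XOR_TABLE = [((0, 0), 0), ((0, 1), 1), ((1, 0), 1), ((1, 1), 0)]
for x, y in XOR_TABLE:
    out = second_order_neuron(x, A_XOR, W_XOR, B_XOR)
    print(x, y, out > 0.5)  # the thresholded output matches y
```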

Question Answering on Knowledge Bases and Text using Universal Schema and Memory Networks

Existing question answering methods infer answers either from a knowledge base or from raw text. While knowledge base (KB) methods are good at answering compositional questions, their performance is often affected by the incompleteness of the KB. In contrast, web text contains millions of facts that are absent from the KB, albeit in unstructured form. Universal schema can support reasoning on the union of both structured KBs and unstructured text by aligning them in a common embedded space. In this paper we extend universal schema to natural language question answering, employing memory networks to attend to the large body of facts in the combination of text and KB. Our models can be trained end-to-end on question-answer pairs. Evaluation results on the SPADES fill-in-the-blank question answering dataset show that exploiting universal schema for question answering is better than using either a KB or text alone. This model also outperforms the current state of the art by 8.5 F1 points. Code and data are available at https://…/TextKBQA.
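
The memory-network step, attending over a memory that mixes KB facts and text facts embedded in one universal-schema space, can be sketched as below. This is a single-hop illustration under assumptions: each memory cell is taken to have a key (the fact with its answer slot masked) and a value (the entity filling the slot), and all embeddings are hypothetical hand-written vectors rather than learned ones; the paper's exact encoders and number of hops are not given in the abstract.

```python
import math


def dot(u, v):
    return sum(a * b for a, b in zip(u, v))


def softmax(scores):
    top = max(scores)  # subtract the max for numerical stability
    exps = [math.exp(s - top) for s in scores]
    z = sum(exps)
    return [e / z for e in exps]


def memory_read(query, keys, values):
    # attention over memory cells: p_i = softmax(q . k_i); o = sum_i p_i * v_i
    weights = softmax([dot(query, k) for k in keys])
    dim = len(values[0])
    out = [sum(w * v[d] for w, v in zip(weights, values)) for d in range(dim)]
    return weights, out


def score_answers(query, output, candidates):
    # score each candidate entity embedding against the updated query q + o
    updated = [q + o for q, o in zip(query, output)]
    return {name: dot(updated, emb) for name, emb in candidates.items()}


# HYPOTHETICAL embeddings: cell 0 stands for a KB triple, cell 1 for a text pattern
query = [1.0, 0.0, 0.0]
keys = [[2.0, 0.0, 0.0], [0.0, 2.0, 0.0]]
values = [[0.0, 0.0, 1.0], [0.0, 1.0, 0.0]]
candidates = {"entity_a": [0.0, 0.0, 1.0], "entity_b": [0.0, 1.0, 0.0]}

weights, out = memory_read(query, keys, values)
scores = score_answers(query, out, candidates)
print(max(scores, key=scores.get))  # entity_a
```

Because KB and text facts share one embedding space, both kinds of cells compete in the same softmax, which is what lets the model draw on either source for a given question.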

Multimodal Word Distributions

Word embeddings provide point representations of words containing useful semantic information. We introduce multimodal word distributions formed from Gaussian mixtures to represent multiple word meanings, entailment, and rich uncertainty information. To learn these distributions, we propose an energy-based max-margin objective. We show that the resulting approach captures uniquely expressive semantic information and outperforms alternatives, such as word2vec skip-grams and Gaussian embeddings, on word similarity and entailment benchmark datasets.
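
An energy-based max-margin objective for Gaussian-mixture words can be sketched as follows. The abstract does not define the energy, so this assumes spherical covariances and the expected likelihood kernel (the inner product of the two densities), which for Gaussians has the closed form N(μ; ν, (σ² + τ²)I); the hinge max(0, m - log E(w, c) + log E(w, c')) over a positive context c and a negative context c' is likewise an assumed form. The mixtures below are hypothetical toy values.

```python
import math


def log_gaussian_overlap(mu, nu, var_sum):
    """log N(mu; nu, var_sum * I): log inner product of two spherical Gaussians."""
    dim = len(mu)
    sq_dist = sum((a - b) ** 2 for a, b in zip(mu, nu))
    return -0.5 * dim * math.log(2 * math.pi * var_sum) - sq_dist / (2 * var_sum)


def log_energy(f, g):
    """log of the expected likelihood kernel between two Gaussian mixtures.

    Each mixture is a list of (weight, mean, variance) components.
    """
    terms = [
        math.log(p) + math.log(q) + log_gaussian_overlap(mu, nu, var_f + var_g)
        for p, mu, var_f in f
        for q, nu, var_g in g
    ]
    top = max(terms)  # log-sum-exp for numerical stability
    return top + math.log(sum(math.exp(t - top) for t in terms))


def max_margin_loss(word, pos_ctx, neg_ctx, margin=1.0):
    """Hinge loss pushing log E(word, pos) above log E(word, neg) by `margin`."""
    return max(0.0, margin - log_energy(word, pos_ctx) + log_energy(word, neg_ctx))


# HYPOTHETICAL 2-D mixtures: a polysemous word with two senses, and two contexts
bank = [(0.5, [1.0, 0.0], 0.1), (0.5, [-1.0, 0.0], 0.1)]
money = [(1.0, [1.0, 0.1], 0.1)]
cat = [(1.0, [0.0, 5.0], 0.1)]

print(max_margin_loss(bank, money, cat))  # 0.0: the positive context already wins by the margin
```

Only the "bank" component near "money" contributes noticeably to the energy, which is how a mixture lets one sense match a context without dragging the other sense along.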

A Survey of Neural Network Techniques for Feature Extraction from Text

This paper aims to catalyze discussion of text feature extraction techniques using neural network architectures. The research questions discussed in the paper focus on state-of-the-art neural network techniques that have proven to be useful tools for language processing, language generation, text classification, and other computational linguistics tasks.

Multiscale Analysis for Higher-order Tensors

The widespread use of multisensor technology and the emergence of big data sets have created the need for more versatile tools to represent large and multimodal data such as higher-order tensors. Tensor decomposition-based methods have been shown to be flexible in the choice of constraints and to extract more general latent components from such data than matrix-based methods. For these reasons, tensor decompositions have found applications in many different signal processing problems, including dimensionality reduction, signal separation, linear regression, feature extraction, and classification. However, most existing tensor decomposition methods are founded on the principle of finding a low-rank approximation in a linear subspace structure, where the definition of the rank may change depending on the particular decomposition. Since most data are not necessarily low-rank in a linear subspace, this often results in high approximation errors or low compression rates. In this paper, we introduce a new adaptive, multi-scale tensor decomposition method for higher-order data inspired by hybrid linear modeling and subspace clustering techniques. In particular, we develop a multi-scale higher-order singular value decomposition (MS-HoSVD) approach in which a given tensor is first permuted and then partitioned into several sub-tensors, each of which can be represented as a low-rank tensor, increasing the efficiency of the representation. The proposed approach is evaluated on two signal processing applications: dimensionality reduction and classification.
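
The partition-then-compress idea behind MS-HoSVD can be sketched with NumPy: truncated HoSVD (leading left singular vectors of each mode-n unfolding, followed by projection) applied separately to sub-tensors along the first mode. This is a simplified sketch, not the paper's method: the partition is passed in by hand through a hypothetical `groups` argument, and the permutation and clustering step that MS-HoSVD uses to find it is not reproduced. Function names are illustrative.

```python
import numpy as np


def unfold(T, n):
    # mode-n unfolding: mode n becomes the rows, all other modes are flattened
    return np.moveaxis(T, n, 0).reshape(T.shape[n], -1)


def mode_n_product(T, M, n):
    # multiply T along mode n by matrix M of shape (new_size, T.shape[n])
    return np.moveaxis(np.tensordot(M, T, axes=(1, n)), 0, n)


def truncated_hosvd(T, ranks):
    # factor matrices: leading left singular vectors of each unfolding
    factors = [np.linalg.svd(unfold(T, n), full_matrices=False)[0][:, :r]
               for n, r in enumerate(ranks)]
    core = T
    for n, U in enumerate(factors):
        core = mode_n_product(core, U.T, n)
    return core, factors


def reconstruct(core, factors):
    X = core
    for n, U in enumerate(factors):
        X = mode_n_product(X, U, n)
    return X


def multiscale_approx(T, groups, ranks):
    # HYPOTHETICAL partition: `groups` lists the mode-0 indices of each sub-tensor
    approx = np.zeros_like(T)
    for idx in groups:
        approx[idx] = reconstruct(*truncated_hosvd(T[idx], ranks))
    return approx


# Toy tensor: two stacked rank-1 blocks with different factors along modes 1 and 2
block1 = np.einsum("i,j,k->ijk", np.array([1.0, 2.0, 3.0]),
                   np.array([1.0, 0.0, 0.0, 0.0]), np.array([1.0, 0.0, 0.0]))
block2 = np.einsum("i,j,k->ijk", np.ones(3),
                   np.array([0.0, 1.0, 0.0, 0.0]), np.array([0.0, 1.0, 0.0]))
T = np.concatenate([block1, block2], axis=0)  # shape (6, 4, 3)

global_err = np.linalg.norm(T - reconstruct(*truncated_hosvd(T, (1, 1, 1)))) / np.linalg.norm(T)
ms_err = np.linalg.norm(T - multiscale_approx(T, [[0, 1, 2], [3, 4, 5]], (1, 1, 1))) / np.linalg.norm(T)
```

Here a single rank-(1, 1, 1) HoSVD keeps the dominant block and discards the other (relative error about 0.42), while per-block HoSVD reconstructs the tensor up to floating-point error, mirroring the abstract's point that data which are not low-rank globally can still be low-rank piecewise.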

Analytic Approach to Activity-dependent Adaptive Boolean Networks

We propose new activity-dependent adaptive Boolean networks inspired by the cis-regulatory mechanism in gene regulatory networks. Employing the annealed approximation of Boolean network dynamics, we show analytically that our model can be solved for the stationary in-degree distribution for a wide class of update rules, and that evolved Boolean networks have a preassigned average sensitivity that can be set independently of the update rules. In particular, when it is set to 1, our theory predicts that the proposed network rewiring algorithm drives Boolean networks towards criticality. We verify that these analytic results agree well with numerical simulations for four representative update rules. We also discuss the relationship between the sensitivity of update rules and stationary in-degree distributions, and compare it with that in real-world gene regulatory networks.
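
Average sensitivity, the quantity the abstract says can be preassigned, is the expected number of single-input flips that change a rule's output for a uniformly random input. The sketch below computes it for a rule given as a Python function and applies the classic annealed-approximation reading (below 1 ordered, above 1 chaotic, exactly 1 critical) to a network-wide mean. The rewiring algorithm itself is not reproduced; `average_sensitivity` and `regime` are illustrative names.

```python
from itertools import product


def average_sensitivity(f, k):
    """Mean, over all 2^k inputs, of the number of single-bit flips that change f."""
    total = 0
    for x in product((0, 1), repeat=k):
        fx = f(x)
        for i in range(k):
            flipped = list(x)
            flipped[i] ^= 1  # flip input i only
            if f(tuple(flipped)) != fx:
                total += 1
    return total / 2 ** k


def regime(mean_sensitivity):
    # annealed-approximation classification of the network-average sensitivity
    if mean_sensitivity < 1:
        return "ordered"
    if mean_sensitivity > 1:
        return "chaotic"
    return "critical"


def and_rule(x):
    return int(all(x))


def xor_rule(x):
    return sum(x) % 2


print(average_sensitivity(and_rule, 2), regime(average_sensitivity(and_rule, 2)))  # 1.0 critical
print(average_sensitivity(xor_rule, 3), regime(average_sensitivity(xor_rule, 3)))  # 3.0 chaotic
```

For a k-input AND the sensitivity is k / 2^(k-1), so it falls below 1 as in-degree grows, whereas XOR's sensitivity equals k; this is why sensitivity and in-degree distribution are intertwined for fixed update rules.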

Exploring the Performance Benefit of Hybrid Memory System on HPC Environments

Likelihood Ratio as Weight of Forensic Evidence: A Closer Look

Preferential Attachment Random Graphs with Edge-Step Functions

Deep Cross-Modal Audio-Visual Generation

Distributed Finite Time Termination of Ratio Consensus for Averaging in the presence of Delays

Diversity driven Attention Model for Query-based Abstractive Summarization

Multidimensional Rational Covariance Extension with Approximate Covariance Matching

Face Identification and Clustering

A word property for twisted involutions in Coxeter groups

Point-shifts of Point Processes on Topological Groups

On the multifractal local behavior of parabolic stochastic PDEs

A Distributed Shared Memory Model and C++ Templated Meta-Programming Interface for the Epiphany RISC Array Processor

Bloch oscillations in two-dimensional crystals: Inverse problem

Hypothesis Testing under Mutual Information Privacy Constraints in the High Privacy Regime

SOFAR: large-scale association network learning

The MacGyver Test – A Framework for Evaluating Machine Resourcefulness and Creative Problem Solving

From Characters to Words to in Between: Do We Capture Morphology?

Learning Exact Topology of a Loopy Power Grid from Ambient Dynamics

An Improved Bound for Minimizing the Total Weighted Completion Time of Coflows in Datacenters

Propagating elastic vibrations dominate thermal conduction in amorphous silicon

Identifying Similarities in Epileptic Patients for Drug Resistance Prediction

Low-complexity Distributed Tomographic Backprojection for large datasets

Neural AMR: Sequence-to-Sequence Models for Parsing and Generation

(Quasi)Periodicity Quantification in Video Data, Using Topology

Large-scale Feature Selection of Risk Genetic Factors for Alzheimer’s Disease via Distributed Group Lasso Regression

Learning Structured Natural Language Representations for Semantic Parsing

Puns upon a midnight dreary, Lexical Semantics for the weak and weary

Tweeting AI: Perceptions of AI-Tweeters (AIT) vs Expert AI-Tweeters (EAIT)

Duluth at SemEval-2017 Task 6: Language Models in Humor Detection

A strong ergodic theorem for extreme and intermediate order statistics

A wearable general-purpose solution for Human-Swarm Interaction

Stein’s method for steady-state diffusion approximations

Elliptic hypergeometric functions associated with root systems

Efficient Projection Partitioning for parallel multi-objective integer optimisation

Basic Properties of Singular Fractional Order System with order (1,2)

Fractional Generalized KYP Lemma for Fractional Order System within Finite Frequency Range

Fractional Multidimensional System

A GRU-Gated Attention Model for Neural Machine Translation

DeepCCI: End-to-end Deep Learning for Chemical-Chemical Interaction Prediction

On Bootstrap Averaging Empirical Bayes Estimators

DNA Steganalysis Using Deep Recurrent Neural Networks

A note on Quantile curves based bivariate reliability concepts

Improved Oracles for Time-Dependent Road Networks

Regression Type Models for Extremal Dependence

Kernels by properly colored paths in arc-colored digraphs

Quantitative analytical theory for disordered nodal points

Locality Preserving Projections for Grassmann manifold

Communication complexity of approximate maximum matching in the message-passing model

Consensus of rankings

Linear-Size Hopsets with Small Hopbound, and Distributed Routing with Low Memory

A New Class of Nonlinear Precoders for Hardware Efficient Massive MIMO Systems

An Experimental Comparison of Uncertainty Sets for Robust Shortest Path Problems

Equating two maximum degrees

Asymptotics of Transmit Antenna Selection: Impact of Multiple Receive Antennas

The Graovac-Pisanski Index of Armchair Nanotubes

Sticky matroids and Kantor’s Conjecture

The utility of a Bayesian analysis of complex models and the study of archeological ${}^{14}$C data

Combinatorial Cost Sharing

Pseudo Unique Sink Orientations

On the implausibility of classical client blind quantum computing

No, This is not a Circle

Convex and isometric domination of (weak) dominating pair graphs

Optimal client recommendation for market makers in illiquid financial products

Extending Message Passing Interface Windows to Storage

The Abelian distribution

Multi-Metrics Learning for Speech Enhancement

No More Discrimination: Cross City Adaptation of Road Scene Segmenters

On Schur multiple zeta functions: A combinatoric generalization of multiple zeta functions

A polynomial-time randomized reduction from tournament isomorphism to tournament asymmetry

EEG-Based User Reaction Time Estimation Using Riemannian Geometry Features

The Parameterized Complexity of Positional Games

ICNet for Real-Time Semantic Segmentation on High-Resolution Images

Equivalent martingale measures for Lévy-driven moving averages and related processes

Practical and Effective Re-Pair Compression

Estimating thresholding levels for random fields via Euler characteristics

Frequency-domain Compressive Channel Estimation for Frequency-Selective Hybrid mmWave MIMO Systems

A continuous-time framework for ARMA processes

Faster Betweenness Centrality Updates in Evolving Networks

Scale-free behavior of networks with the copresence of preferential and uniform attachment rules

Multifractal Analysis of Pulsar Timing Residuals: Assessment of Gravitational Waves Detection

Representations of weakly multiplicative arithmetic matroids are unique

Limit theorems for multidimensional long-range dependent processes

BAM! The Behance Artistic Media Dataset for Recognition Beyond Photography

Saliency Benchmarking: Separating Models, Maps and Metrics

End-to-End Multimodal Emotion Recognition using Deep Neural Networks

Improved approximation algorithm for the Dense-3-Subhypergraph Problem

Asymptotic control theory for a closed string

Epidemic Extinction Paths in Complex Networks

Locality via Partially Lifted Codes

Full-Page Text Recognition: Learning Where to Start and When to Stop

Sparse Hierarchical Extrapolated Parametric Methods for Cortical Data Analysis

Minimizers of Gerstewitz functionals

Pattern Avoidance in Double Lists

Factorization formulas of $K$-$k$-Schur functions I

Paracontrolled distributions on Bravais lattices and weak universality of the 2d parabolic Anderson model

Factorization formulas of $K$-$k$-Schur functions II

Expected Number of Distinct Subsequences in Randomly Generated Binary Strings

Evolution of moments and correlations in non-renewal escape-time processes

Construction of the Lindström valuation of an algebraic extension

Local Marchenko-Pastur Law for Random Bipartite Graphs

A quantitative assessment of the effect of different algorithmic schemes to the task of learning the structure of Bayesian Networks

Non-Uniform Attacks Against Pseudoentropy

Age-Minimal Transmission in Energy Harvesting Two-hop Networks

Combinatorial 6/5-Approximation of Steiner Tree

Optimal Sample Complexity for Matrix Completion and Related Problems via $\ell_2$-Regularization

Deep Functional Maps: Structured Prediction for Dense Shape Correspondence

SIT: A Lightweight Encryption Algorithm for Secure Internet of Things