SILVar: Single Index Latent Variable Models

A semi-parametric, non-linear regression model in the presence of latent variables is introduced. These latent variables can correspond to unmodeled phenomena or unmeasured agents in a complex networked system. This new formulation allows joint estimation of certain non-linearities in the system, the direct interactions between measured variables, and the effects of unmodeled elements on the observed system. The particular form of the model is justified, and learning is posed as a regularized maximum likelihood estimation. This leads to classes of structured convex optimization problems with a ‘sparse plus low-rank’ flavor. Relations between the proposed model and several common model paradigms, such as those of Robust Principal Component Analysis (PCA) and Vector Autoregression (VAR), are established. Particularly in the VAR setting, the low-rank contributions can come from broad trends exhibited in the time series. Details of the algorithm for learning the model are presented. Experiments demonstrate the performance of the model and the estimation algorithm on simulated and real data.

Relevance-based Word Embedding

Learning a high-dimensional dense representation for vocabulary terms, also known as a word embedding, has recently attracted much attention in natural language processing and information retrieval tasks. The embedding vectors are typically learned based on term proximity in a large corpus. This means that the objective in well-known word embedding algorithms, e.g., word2vec, is to accurately predict adjacent word(s) for a given word or context. However, this objective is not necessarily equivalent to the goal of many information retrieval (IR) tasks. The primary objective in various IR tasks is to capture relevance instead of term proximity, syntactic, or even semantic similarity. This is the motivation for developing unsupervised relevance-based word embedding models that learn word representations based on query-document relevance information. In this paper, we propose two learning models with different objective functions; one learns a relevance distribution over the vocabulary set for each query, and the other classifies each term as belonging to the relevant or non-relevant class for each query. To train our models, we used over six million unique queries and the top ranked documents retrieved in response to each query, which are assumed to be relevant to the query. We extrinsically evaluate our learned word representation models using two IR tasks: query expansion and query classification. Both query expansion experiments on four TREC collections and query classification experiments on the KDD Cup 2005 dataset suggest that the relevance-based word embedding models significantly outperform state-of-the-art proximity-based embedding models, such as word2vec and GloVe.

Spatial Random Sampling: A Structure-Preserving Data Sketching Tool

Random column sampling is not guaranteed to yield data sketches that preserve the underlying structures of the data and may not sample sufficiently from less-populated data clusters. Also, adaptive sampling can often provide accurate low rank approximations, yet may fall short of producing descriptive data sketches, especially when the cluster centers are linearly dependent. Motivated by that, this paper introduces a novel randomized column sampling tool dubbed Spatial Random Sampling (SRS), in which data points are sampled based on their proximity to randomly sampled points on the unit sphere. The most compelling feature of SRS is that the corresponding probability of sampling from a given data cluster is proportional to the surface area the cluster occupies on the unit sphere, independently from the size of the cluster population. Although it is fully randomized, SRS is shown to provide descriptive and balanced data representations. The proposed idea addresses a pressing need in data science and holds potential to inspire many novel approaches for analysis of big data.
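The sampling idea in the abstract above can be illustrated with a minimal sketch. It assumes that "proximity" means cosine similarity after projecting each data point onto the unit sphere, and that each random direction selects its single nearest data point. The function name `spatial_random_sample` and these details are illustrative assumptions, not the authors' implementation, and the paper's exact procedure may differ.

```python
import math
import random


def _normalize(v):
    """Scale a vector to unit Euclidean length (zero vectors are returned unchanged)."""
    norm = math.sqrt(sum(x * x for x in v))
    return [x / norm for x in v] if norm > 0 else list(v)


def spatial_random_sample(points, n_samples, seed=None):
    """Return indices of data points chosen by random directions on the unit sphere.

    Assumption: each random direction picks the data point with the largest
    cosine similarity to it. Indices may repeat.
    """
    rng = random.Random(seed)
    dim = len(points[0])
    unit_points = [_normalize(p) for p in points]
    chosen = []
    for _ in range(n_samples):
        # A normalized Gaussian vector is uniformly distributed on the sphere.
        direction = _normalize([rng.gauss(0.0, 1.0) for _ in range(dim)])
        best = max(
            range(len(unit_points)),
            key=lambda i: sum(a * b for a, b in zip(unit_points[i], direction)),
        )
        chosen.append(best)
    return chosen
```

In this sketch, a cluster of 200 points and a cluster of 5 points that cover equally sized arcs of the circle should each receive roughly half of the samples, which matches the abstract's claim that the sampling probability depends on the area a cluster occupies rather than on its population.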

Lectures on the mean values of functionals — An elementary introduction to infinite-dimensional probability

This is an elementary introduction to infinite-dimensional probability. In the lectures, we compute the exact mean values of some functionals on C[0,1] and L[0,1] by considering these functionals as infinite-dimensional random variables. The results show that a complete concentration of measure phenomenon holds for these mean values, since the variances are all zero.

An initialization method for the k-means using the concept of useful nearest centers

The aim of k-means is to minimize the sum of squared Euclidean distances from the mean (SSEDM) of each cluster. k-means can optimize this function effectively, but it is highly sensitive to the initial centers (seeds). This paper proposes an initialization method for k-means that uses the concept of a useful nearest center for each data point.
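To make the objective and its sensitivity to seeds concrete, here is a small sketch of the SSEDM computation and one standard Lloyd update. It does not implement the paper's initialization method, which the abstract does not describe; the function names and the toy data are assumptions chosen for illustration.

```python
def squared_distance(p, q):
    """Squared Euclidean distance between two points of equal dimension."""
    return sum((a - b) ** 2 for a, b in zip(p, q))


def ssedm(points, centers):
    """Sum over all points of the squared distance to the nearest center."""
    return sum(min(squared_distance(p, c) for c in centers) for p in points)


def lloyd_step(points, centers):
    """One k-means iteration: assign points to nearest centers, then recompute means."""
    clusters = [[] for _ in centers]
    for p in points:
        nearest = min(range(len(centers)), key=lambda j: squared_distance(p, centers[j]))
        clusters[nearest].append(p)
    new_centers = []
    for members, old in zip(clusters, centers):
        if members:
            dim = len(old)
            mean = tuple(sum(m[d] for m in members) / len(members) for d in range(dim))
            new_centers.append(mean)
        else:
            # Keep an empty cluster's center where it was.
            new_centers.append(tuple(float(x) for x in old))
    return new_centers


# Toy data (illustrative): two well-separated pairs of points.
points = [(0, 0), (0, 1), (10, 0), (10, 1)]
good = lloyd_step(points, [(0, 0), (10, 0)])  # one seed per true cluster
bad = lloyd_step(points, [(5, 0), (5, 1)])    # both seeds between the clusters
print(ssedm(points, good))  # 1.0
print(ssedm(points, bad))   # 100.0
```

With the second pair of seeds, the update leaves the centers unchanged, so the iteration is already stuck at a poor fixed point. This is the kind of seed sensitivity that motivates initialization methods.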

A Survey of Deep Learning Methods for Relation Extraction

Relation Extraction is an important sub-task of Information Extraction that can now employ deep learning (DL) models, thanks to the creation of large datasets using distant supervision. In this review, we compare the contributions and pitfalls of the various DL models that have been used for the task, to help guide the path ahead.

Deep Speaker Feature Learning for Text-independent Speaker Verification

Recently deep neural networks (DNNs) have been used to learn speaker features. However, the quality of the learned features is not sufficiently good, so a complex back-end model, either neural or probabilistic, has to be used to address the residual uncertainty when applied to speaker verification, just as with raw features. This paper presents a convolutional time-delay deep neural network structure (CT-DNN) for speaker feature learning. Our experimental results on the Fisher database demonstrated that this CT-DNN can produce high-quality speaker features: even with a single feature (0.3 seconds including the context), the EER can be as low as 7.68%. This effectively confirmed that the speaker trait is largely a deterministic short-time property rather than a long-time distributional pattern, and therefore can be extracted from just dozens of frames.

Towards a Calculus for Wireless Networks

This paper presents a set of new results directly exploring the special characteristics of the wireless channel capacity process. An appealing finding is that, for typical fading channels, their instantaneous capacity and cumulative capacity are both light-tailed. A direct implication of this finding is that the cumulative capacity and subsequently the delay and backlog performance can be upper-bounded by some exponential distributions, which is often assumed but not justified in the wireless network performance analysis literature. In addition, various bounds are derived for distributions of the cumulative capacity and the delay-constrained capacity, considering three representative dependence structures in the capacity process, namely comonotonicity, independence, and Markovian dependence. To help gain insight into the performance of a wireless channel whose capacity process may be too complex, or for which detailed information is lacking, stochastic orders are introduced for the capacity process; based on these, results for comparing the delay and delay-constrained capacity performance are obtained. Moreover, the impact of self-interference in communication, which is an open problem in stochastic network calculus (SNC), is investigated and original results are derived. The obtained results in this paper complement the SNC literature, easing its application to wireless networks and its extension towards a calculus for wireless networks.

A Survey of Distant Supervision Methods using PGMs

Relation Extraction refers to the task of populating a database with tuples of the form r(e_1, e_2), where r is a relation and e_1, e_2 are entities. Distant supervision is one technique for this task; it tries to automatically generate training examples based on an existing knowledge base (KB) such as Freebase. This paper is a survey of some of the techniques in distant supervision which primarily rely on Probabilistic Graphical Models (PGMs).
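The way distant supervision generates training examples can be sketched as follows. This assumes the common heuristic that any sentence mentioning both entities of a KB tuple r(e_1, e_2) is labelled with r, and it uses plain substring matching in place of real entity linking. The function name `distant_label` and the toy KB fact are illustrative assumptions, not taken from the survey.

```python
def distant_label(sentences, kb):
    """Label sentences with KB relations.

    kb is a list of (relation, e1, e2) tuples. A sentence that contains both
    entity strings is emitted as a (sentence, e1, e2, relation) training example.
    """
    examples = []
    for sentence in sentences:
        for relation, e1, e2 in kb:
            if e1 in sentence and e2 in sentence:
                examples.append((sentence, e1, e2, relation))
    return examples


# Toy knowledge base and corpus (illustrative only).
kb = [("born_in", "Barack Obama", "Hawaii")]
sentences = [
    "Barack Obama was born in Hawaii.",
    "Barack Obama visited Hawaii in 2016.",  # mentions both entities, wrong relation
    "Hawaii is an island state.",
]
for example in distant_label(sentences, kb):
    print(example)
```

The second sentence is labelled `born_in` even though it does not express that relation. Label noise of this kind is inherent to the heuristic.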

Analysing Data-To-Text Generation Benchmarks

Recently, several data-sets associating data to text have been created to train data-to-text surface realisers. It is unclear, however, to what extent the surface realisation task exercised by these data-sets is linguistically challenging. Do these data-sets provide enough variety to encourage the development of generic, high-quality data-to-text surface realisers? In this paper, we argue that these data-sets have important drawbacks. We back up our claim using statistics, metrics and manual evaluation. We conclude by eliciting a set of criteria for the creation of a data-to-text benchmark which could help better support the development, evaluation and comparison of linguistically sophisticated data-to-text surface realisers.

A nested expectation-maximization algorithm for latent class regression models

Latent class regression models characterize the joint distribution of a multivariate categorical random variable under an assumption of conditional independence given a predictor-dependent latent class variable. Although these models are popular choices in several fields, current computational procedures based on the expectation-maximization (EM) algorithm require gradient methods to facilitate the derivations for the maximization step. However, these procedures do not provide monotone loglikelihood sequences, thereby leading to algorithms which may not guarantee reliable convergence. To address this issue, we propose a nested EM algorithm, which relies on a sequence of conditional expectation-maximizations for the regression coefficients associated with each predictor-dependent latent class. Leveraging the recent Pólya-gamma data augmentation for logistic regression, the conditional expectation-maximizations reduce to a set of generalized least squares minimization problems. This method is a direct consequence of an exact EM algorithm which we develop for the special case of two latent classes. We show that the proposed computational methods provide a monotone loglikelihood sequence, and discuss the improved performance in two real data applications.

Survey of Visual Question Answering: Datasets and Techniques

Visual question answering (or VQA) is a new and exciting problem that combines natural language processing and computer vision techniques. We present a survey of the various datasets and models that have been used to tackle this task. The first part of the survey details the various datasets for VQA and compares them along some common factors. The second part of this survey details the different approaches for VQA, classified into four types: non-deep learning models, deep learning models without attention, deep learning models with attention, and other models which do not fit into the first three. Finally, we compare the performances of these approaches and provide some directions for future work.

Signal reconstruction via operator guiding

The Sequential Normal Scores Transformation

Socially Trusted Collaborative Edge Computing in Ultra Dense Networks

The middle-scale asymptotics of Wishart matrices

Reach of Repulsion for Determinantal Point Processes in High Dimensions

Integral Policy Iterations for Reinforcement Learning Problems in Continuous Time and Space

Superadditivity of quantum relative entropy for general states

Multi-Scale Spatially Weighted Local Histograms in O(1)

Low noise sensitivity analysis of Lq-minimization in oversampled systems

Geostatistical estimation of forest biomass in interior Alaska combining Landsat-derived tree cover, sampled airborne lidar and field observations

Shape Formation by Programmable Particles

A Probabilistic Framework for Quantifying Biological Complexity

Ergodicity on Sublinear Expectation Spaces

CORe50: a New Dataset and Benchmark for Continuous Object Recognition

TriviaQA: A Large Scale Distantly Supervised Challenge Dataset for Reading Comprehension

Asymptotics for Turán numbers of cycles in 3-uniform linear hypergraphs

Deep Episodic Value Iteration for Model-based Meta-Reinforcement Learning

Data Unfolding with Wiener-SVD Method

A Comment on ‘A New Degree of Freedom For Energy Efficiency of Digital Communication Systems’

Discovery Radiomics via Evolutionary Deep Radiomic Sequencer Discovery for Pathologically-Proven Lung Cancer Detection

Schnyder woods, SLE(16), and Liouville quantum gravity

Occupation measure of random walks and wired spanning forests in balls of Cayley graphs

On the Ergodic Rate Lower Bounds with Applications to Massive MIMO

A bump hunter’s guide to model uncertainty

Inapproximability of Maximum Biclique Problems, Minimum $k$-Cut and Densest At-Least-$k$-Subgraph from the Small Set Expansion Hypothesis

A Duality Between Depth-Three Formulas and Approximation by Depth-Two

Sofic and percolative entropy over infinite regular trees

Automatic Response Category Combination in Multinomial Logistic Regression

Collaborative Descriptors: Convolutional Maps for Preprocessing

Solving Multi-Objective MDP with Lexicographic Preference: An application to stochastic planning with multiple quantile objective

Performance Evaluation and Modeling of HPC I/O on Non-Volatile Memory

A system of nonlinear equations with application to large deviations for Markov chains with finite lifetime

Computation of K-Core Decomposition on Giraph

Nonuniformity of P-values Can Occur Early in Diverging Dimensions

Learning RGB-D Salient Object Detection using background enclosure, depth contrast, and top-down features

Analysis of Optimization Algorithms via Integral Quadratic Constraints: Nonstrongly Convex Problems

Tverberg-type theorems for matroids: A counterexample and a proof

A performance spectrum for parallel computational frameworks that solve PDEs

Reaction-Diffusion models: From Particle Systems to SDE’s

Inferring and Executing Programs for Visual Reasoning

4D ISIP: 4D Implicit Surface Interest Point Detection

Near Optimal Parallel Algorithms for Dynamic DFS in Undirected Graphs

Shuffles of trees

Time-delayed feedback control of coherence resonance chimeras

Discussion on ‘Sparse graphs using exchangeable random measures’ by F. Caron and E. B. Fox

Linear Quadratic Optimal Control Problems with Fixed Terminal States and Integral Quadratic Constraints

Discussion on ‘Random-projection ensemble classification’ by T. Cannings and R. Samworth

Hybrid PDE solver for data-driven problems and modern branching

Mind the Gap: A Well Log Data Analysis

The Complexity of Routing with Few Collisions

On sufficient conditions for rainbow cycles in edge-colored graphs

Context-aware stacked convolutional neural networks for classification of breast carcinomas in whole-slide histopathology images

Fourier-Correlation Imaging

Benchmark Graphs for Practical Graph Isomorphism

The Perimeter of Proper Polycubes

Log-Lindley generated family of distributions

Global-Local View: Scalable Consistency for Concurrent Data Types

Loose Hamiltonian cycles forced by large $(k-2)$-degree – sharp version

Irreducibility of Random Polynomials

Modelling and Traffic Signal Control of Heterogeneous Traffic Systems

Energy-Efficient Joint Unicast and Multicast Beamforming with Multi-Antenna User Terminals

Optimal stopping and a non-zero-sum Dynkin game in discrete time with risk measures induced by BSDEs

Some discussions on the Read Paper ‘Beyond subjective and objective in statistics’ by A. Gelman and C. Hennig

Optimal Residential Demand Response Considering the Operational Constraints of Unbalanced Distribution Networks

Efficient and Scalable View Generation from a Single Image using Fully Convolutional Networks

Exactly Solvable Random Graph Ensemble with Extensively Many Short Cycles

Uplink Analysis of Large MU-MIMO Systems With Space-Constrained Arrays in Ricean Fading

Comments on the proof of adaptive submodular function minimization

Flexible and Creative Chinese Poetry Generation Using Neural Memory

Introductory Lectures on Stochastic Population Systems

Phase diagram of a generalized off-diagonal Aubry-André model with p-wave pairing

The free energy in the Derrida–Retaux hierarchical renormalisation model

A note on panchromatic colorings

Model-based Estimation of Computed Tomography Images

Hybrid Isolation Forest – Application to Intrusion Detection

Conditionally Poissonian random digraphs

Convergence of inertial dynamics and proximal algorithms governed by maximal monotone operators

Enumerating the symplectic Dellac configurations

Smart Routing of Electric Vehicles for Load Balancing in Smart Grids

Experimental Analysis of a Novel Stratified Sampling Algorithm for Hypercubes

Stable and robust $\ell_p$-constrained compressive sensing recovery via robust width property

A Local Prime Factor Decomposition Algorithm for Strong Product Graphs

Automatic Brain Tumor Detection and Segmentation Using U-Net Based Fully Convolutional Networks

Context Attentive Bandits: Contextual Bandit with Restricted Context

Context-Aware Hierarchical Online Learning for Performance Maximization in Mobile Crowdsourcing

Local algorithms for the prime factorization of strong product graphs

Recovering sampling distributions of statistics of finite populations via resampling: a predictive approach

Nonparametric inference for continuous-time event counting and link-based dynamic network models

Quasi-Reliable Estimates of Effective Sample Size

Tight Bounds for Asynchronous Collaborative Grid Exploration

Asymptotic bounds for the sizes of constant dimension codes and an improved lower bound

Lindeberg’s method for moderate deviations and random summation

On the linear independence of shifted powers

Coded Caching with Partial Adaptive Matching

An Exponential Family for Bayesian Process Tomography

Predicting the Driver’s Focus of Attention: the DR(eye)VE Project

Explicit polynomial sequences with maximal spaces of partial derivatives and a question of K. Mulmuley

Coded convolution for parallel and distributed computing within a deadline

Constant Space and Non-Constant Time in Distributed Computing