A note on rank constrained solutions to linear matrix equations

This preliminary note presents a heuristic for determining rank constrained solutions to linear matrix equations (LME). The method proposed here is based on minimizing a non-convex quadratic functional, which will hence-forth be termed as the \textit{Low-Rank-Functional} (LRF). Although this method lacks a formal proof/comprehensive analysis, for example in terms of a probabilistic guarantee for converging to a solution, the proposed idea is intuitive and has been seen to perform well in simulations. To that end, many numerical examples are provided to corroborate the idea.

Deep neural network marketplace recommenders in online experiments

Recommendations are broadly used in marketplaces to match users with items relevant to their interests and needs. To understand user intent and tailor recommendations to their needs, we use deep learning to explore various heterogeneous data available in marketplaces. This paper focuses on the challenge of measuring recommender performance and summarizes the online experiment results with several promising types of deep neural network recommenders – hybrid item representation models combining features from user engagement and content, sequence-based models, and multi-armed bandit models that optimize user engagement by re-ranking proposals from multiple submodels. The recommenders are currently running in production at the leading Norwegian marketplace FINN.no and serves over one million visitors everyday.

Five lessons from building a deep neural network recommender

Recommendation algorithms are widely adopted in marketplaces to help users find the items they are looking for. The sparsity of the items by user matrix and the cold-start issue in marketplaces pose challenges for the off-the-shelf matrix factorization based recommender systems. To understand user intent and tailor recommendations to their needs, we use deep learning to explore various heterogeneous data available in marketplaces. This paper summarizes five lessons we learned from experimenting with state-of-the-art deep learning recommenders at the leading Norwegian marketplace \textit{FINN.no}. We design a hybrid recommender system that takes the user-generated contents of a marketplace (including text, images and meta attributes) and combines them with user behavior data such as page views and messages to provide recommendations for marketplace items. Among various tactics we experimented with, the following five show the best impact: staged training instead of end-to-end training, leveraging rich user behaviors beyond page views, using user behaviors as noisy labels to train embeddings, using transfer learning to solve the unbalanced data problem, and using attention mechanisms in the hybrid model. This system is currently running with around 20\% click-through-rate in production at \textit{FINN.no} and serves over one million visitors everyday.

Variational Bayesian Inference for Robust Streaming Tensor Factorization and Completion

Streaming tensor factorization is a powerful tool for processing high-volume and multi-way temporal data in Internet networks, recommender systems and image/video data analysis. Existing streaming tensor factorization algorithms rely on least-squares data fitting and they do not possess a mechanism for tensor rank determination. This leaves them susceptible to outliers and vulnerable to over-fitting. This paper presents a Bayesian robust streaming tensor factorization model to identify sparse outliers, automatically determine the underlying tensor rank and accurately fit low-rank structure. We implement our model in Matlab and compare it with existing algorithms on tensor datasets generated from dynamic MRI and Internet traffic.

Large Scale Learning with Krein Kernels

We extend the Nystr\’om method for low-rank approximation of positive definite Mercer kernels to approximation of indefinite kernel matrices. Our result is the first derivation of the approach that does not require the positive definiteness of the kernel function. Building on this result, we then devise highly scalable methods for learning in reproducing kernel Kre\u{\i}n spaces. The main motivation for our work comes from problems with structured representations (e.g., graphs, strings, time-series), where it is relatively easy to devise a pairwise (dis)similarity function based on intuition/knowledge of a domain expert. Such pairwise functions are typically not positive definite and it is often well beyond the expertise of practitioners to verify this condition. The proposed large scale approaches for learning in reproducing kernel Kre\u{\i}n spaces provide principled and theoretically well-founded means to tackle this class of problems. The effectiveness of the approaches is evaluated empirically using kernels defined on structured and vectorial data representations.

Differentially Private Bayesian Inference for Exponential Families

The study of private inference has been sparked by growing concern regarding the analysis of data when it stems from sensitive sources. We present the first method for private Bayesian inference in exponential families that properly accounts for noise introduced by the privacy mechanism. It is efficient because it works only with sufficient statistics and not individual data. Unlike other methods, it gives properly calibrated posterior beliefs in the non-asymptotic data regime.

Challenges of Context and Time in Reinforcement Learning: Introducing Space Fortress as a Benchmark

Research in deep reinforcement learning (RL) has coalesced around improving performance on benchmarks like the Arcade Learning Environment. However, these benchmarks conspicuously miss important characteristics like abrupt context-dependent shifts in strategy and temporal sensitivity that are often present in real-world domains. As a result, RL research has not focused on these challenges, resulting in algorithms which do not understand critical changes in context, and have little notion of real world time. To tackle this issue, this paper introduces the game of Space Fortress as a RL benchmark which incorporates these characteristics. We show that existing state-of-the-art RL algorithms are unable to learn to play the Space Fortress game. We then confirm that this poor performance is due to the RL algorithms’ context insensitivity and reward sparsity. We also identify independent axes along which to vary context and temporal sensitivity, allowing Space Fortress to be used as a testbed for understanding both characteristics in combination and also in isolation. We release Space Fortress as an open-source Gym environment.

ProdSumNet: reducing model parameters in deep neural networks via product-of-sums matrix decompositions

We consider a general framework for reducing the number of trainable model parameters in deep learning networks by decomposing linear operators as a product of sums of simpler linear operators. Recently proposed deep learning architectures such as CNN, KFC, Dilated CNN, etc. are all subsumed in this framework and we illustrate other types of neural network architectures within this framework. We show that good accuracy on MNIST and Fashion MNIST can be obtained using a relatively small number of trainable parameters. In addition, since implementation of the convolutional layer is resource-heavy, we consider an approach in the transform domain that obviates the need for convolutional layers. One of the advantages of this general framework over prior approaches is that the number of trainable parameters is not fixed and can be varied arbitrarily. In particular, we illustrate the tradeoff of varying the number of trainable variables and the corresponding error rate. As an example, by using this decomposition on a reference CNN architecture for MNIST with over 3×10^6 trainable parameters, we are able to obtain an accuracy of 98.44% using only 3554 trainable parameters.

Dynamic Hierarchical Empirical Bayes: A Predictive Model Applied to Online Advertising

Predicting keywords performance, such as number of impressions, click-through rate (CTR), conversion rate (CVR), revenue per click (RPC), and cost per click (CPC), is critical for sponsored search in the online advertising industry. An interesting phenomenon is that, despite the size of the overall data, the data are very sparse at the individual unit level. To overcome the sparsity and leverage hierarchical information across the data structure, we propose a Dynamic Hierarchical Empirical Bayesian (DHEB) model that dynamically determines the hierarchy through a data-driven process and provides shrinkage-based estimations. Our method is also equipped with an efficient empirical approach to derive inferences through the hierarchy. We evaluate the proposed method in both simulated and real-world datasets and compare to several competitive models. The results favor the proposed method among all comparisons in terms of both accuracy and efficiency. In the end, we design a two-phase system to serve prediction in real time.

2PFPCE: Two-Phase Filter Pruning Based on Conditional Entropy

Deep Convolutional Neural Networks~(CNNs) offer remarkable performance of classifications and regressions in many high-dimensional problems and have been widely utilized in real-word cognitive applications. However, high computational cost of CNNs greatly hinder their deployment in resource-constrained applications, real-time systems and edge computing platforms. To overcome this challenge, we propose a novel filter-pruning framework, two-phase filter pruning based on conditional entropy, namely \textit{2PFPCE}, to compress the CNN models and reduce the inference time with marginal performance degradation. In our proposed method, we formulate filter pruning process as an optimization problem and propose a novel filter selection criteria measured by conditional entropy. Based on the assumption that the representation of neurons shall be evenly distributed, we also develop a maximum-entropy filter freeze technique that can reduce over fitting. Two filter pruning strategies — global and layer-wise strategies, are compared. Our experiment result shows that combining these two strategies can achieve a higher neural network compression ratio than applying only one of them under the same accuracy drop threshold. Two-phase pruning, that is, combining both global and layer-wise strategies, achieves 10 X FLOPs reduction and 46% inference time reduction on VGG-16, with 2% accuracy drop.

Character-Aware Decoder for Neural Machine Translation

Standard neural machine translation (NMT) systems operate primarily on words, ignoring lower-level patterns of morphology. We present a character-aware decoder for NMT that can simultaneously work with both word-level and subword-level sequences which is designed to capture such patterns. We achieve character-awareness by augmenting both the softmax and embedding layers of an attention-based encoder-decoder network with convolutional neural networks that operate on spelling of a word (or subword). While character-aware embeddings have been successfully used in the source-side, we find that mixing character-aware embeddings with standard embeddings is crucial in the target-side. Furthermore, we show that a simple approximate softmax layer can be used for large target-side vocabularies which would otherwise require prohibitively large memory. We experiment on the TED multi-target dataset, translating English into 14 typologically diverse languages. We find that in this low-resource setting, the character-aware decoder provides consistent improvements over word-level and subword-level counterparts with BLEU score gains of up to +3.37.

Sequential Model Selection Method for Nonparametric Autoregression

In this paper for the first time the nonparametric autoregression estimation problem for the quadratic risks is considered. To this end we develop a new adaptive sequential model selection method based on the efficient sequential kernel estimators proposed by Arkoun and Pergamenshchikov (2016). Moreover, we develop a new analytical tool for general regression models to obtain the non asymptotic sharp oracle inequalities for both usual quadratic and robust quadratic risks. Then, we show that the constructed sequential model selection procedure is optimal in the sense of oracle inequalities.

Learning Optimal Fair Policies

We consider the problem of learning optimal policies from observational data in a way that satisfies certain fairness criteria. The issue of fairness arises where some covariates used in decision making are sensitive features, or are correlated with sensitive features. (Nabi and Shpitser 2018) formalized fairness in the context of regression problems as constraining the causal effects of sensitive features along certain disallowed causal pathways. The existence of these causal effects may be called retrospective unfairness in the sense of already being present in the data before analysis begins, and may be due to discriminatory practices or the biased way in which variables are defined or recorded. In the context of learning policies, what we call prospective bias, i.e., the inappropriate dependence of learned policies on sensitive features, is also possible. In this paper, we use methods from causal and semiparametric inference to learn optimal policies in a way that addresses both retrospective bias in the data, and prospective bias due to the policy. In addition, our methods appropriately address statistical bias due to model misspecification and confounding bias, which are important in the estimation of path-specific causal effects from observational data. We apply our methods to both synthetic data and real criminal justice data.

Multi-Source Domain Adaptation with Mixture of Experts

We propose a mixture-of-experts approach for unsupervised domain adaptation from multiple sources. The key idea is to explicitly capture the relationship between a target example and different source domains. This relationship, expressed by a point-to-set metric, determines how to combine predictors trained on various domains. The metric is learned in an unsupervised fashion using meta-training. Experimental results on sentiment analysis and part-of-speech tagging demonstrate that our approach consistently outperforms multiple baselines and can robustly handle negative transfer.

BubGAN: Bubble Generative Adversarial Networks for Synthesizing Realistic Bubbly Flow Images

Bubble segmentation and size detection algorithms have been developed in recent years for their high efficiency and accuracy in measuring bubbly two-phase flows. In this work, we proposed an architecture called bubble generative adversarial networks (BubGAN) for the generation of realistic synthetic images which could be further used as training or benchmarking data for the development of advanced image processing algorithms. The BubGAN is trained initially on a labeled bubble dataset consisting of ten thousand images. By learning the distribution of these bubbles, the BubGAN can generate more realistic bubbles compared to the conventional models used in the literature. The trained BubGAN is conditioned on bubble feature parameters and has full control of bubble properties in terms of aspect ratio, rotation angle, circularity and edge ratio. A million bubble dataset is pre-generated using the trained BubGAN. One can then assemble realistic bubbly flow images using this dataset and associated image processing tool. These images contain detailed bubble information, therefore do not require additional manual labeling. This is more useful compared with the conventional GAN which generates images without labeling information. The tool could be used to provide benchmarking and training data for existing image processing algorithms and to guide the future development of bubble detecting algorithms.

Approximation algorithms for stochastic clustering

We consider stochastic settings for clustering, and develop provably-good (approximation) algorithms for a number of these notions. These algorithms allow one to obtain better approximation ratios compared to the usual deterministic clustering setting. Additionally, they offer a number of advantages including providing fairer clustering and clustering which has better long-term behavior for each user. In particular, they ensure that *every user* is guaranteed to get good service (on average). We also complement some of these with impossibility results.

Change-Point Testing and Estimation for Risk Measures in Time Series

We investigate methods of change-point testing and confidence interval construction for nonparametric estimators of expected shortfall and related risk measures in weakly dependent time series. A key aspect of our work is the ability to detect general multiple structural changes in the tails of time series marginal distributions. Unlike extant approaches for detecting tail structural changes using quantities such as tail index, our approach does not require parametric modeling of the tail and detects more general changes in the tail. Additionally, our methods are based on the recently introduced self-normalization technique for time series, allowing for statistical analysis without the issues of consistent standard error estimation. The theoretical foundation for our methods are functional central limit theorems, which we develop under weak assumptions. An empirical study of S&P 500 returns and US 30-Year Treasury bonds illustrates the practical use of our methods in detecting and quantifying market instability via the tails of financial time series during times of financial crisis.

Information-Theoretic Active Learning for Content-Based Image Retrieval

We propose Information-Theoretic Active Learning (ITAL), a novel batch-mode active learning method for binary classification, and apply it for acquiring meaningful user feedback in the context of content-based image retrieval. Instead of combining different heuristics such as uncertainty, diversity, or density, our method is based on maximizing the mutual information between the predicted relevance of the images and the expected user feedback regarding the selected batch. We propose suitable approximations to this computationally demanding problem and also integrate an explicit model of user behavior that accounts for possible incorrect labels and unnameable instances. Furthermore, our approach does not only take the structure of the data but also the expected model output change caused by the user feedback into account. In contrast to other methods, ITAL turns out to be highly flexible and provides state-of-the-art performance across various datasets, such as MIRFLICKR and ImageNet.

Hierarchical Characteristic Set Merging for Optimizing SPARQL Queries in Heterogeneous RDF

Characteristic sets (CS) organize RDF triples based on the set of properties characterizing their subject nodes. This concept is recently used in indexing techniques, as it can capture the implicit schema of RDF data. While most CS-based approaches yield significant improvements in space and query performance, they fail to perform well in the presence of schema heterogeneity, i.e., when the number of CSs becomes very large, resulting in a highly partitioned data organization. In this paper, we address this problem by introducing a novel technique, for merging CSs based on their hierarchical structure. Our technique employs a lattice to capture the hierarchical relationships between CSs, identifies dense CSs and merges dense CSs with their ancestors, thus reducing the size of the CSs as well as the links between them. We implemented our algorithm on top of a relational backbone, where each merged CS is stored in a relational table, and we performed an extensive experimental study to evaluate the performance and impact of merging to the storage and querying of RDF datasets, indicating significant improvements.

FRESH: Fréchet Similarity with Hashing

Massive datasets of curves, such as time series and trajectories, are continuously generated by mobile and sensing devices. A relevant operation on curves is similarity search: given a dataset S of curves, construct a data structure that, for any query curve q, finds the curves in S similar to q. Similarity search is a computational demanding task, in particular when a robust distance function is used, such as the continuous Fr\’echet distance. In this paper, we propose FRESH, a novel approximate solution to find similar curves under the continuous Fr\’echet distance. FRESH leverages on a locality sensitive hashing scheme for detecting candidate near neighbors of the query curve, and a subsequent pruning step based on a pipeline of curve simplifications. By relaxing the requirement of exact and deterministic solutions, FRESH reaches high performance and outperforms the state-of-the-art approaches. The experiments indeed show that, with a recall larger than 80% and precision 100%, we have at least a factor 10 improvement in performance over a baseline given by the best solutions developed for the ACM SIGSPATIAL 2017 challenge on the Fr\’echet distance. Furthermore, the improvement peaks up to two orders of magnitude, and even more, by relaxing the precision.

Multi-Target Prediction: A Unifying View on Problems and Methods

Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse type. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties, which distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research.

A proof that artificial neural networks overcome the curse of dimensionality in the numerical approximation of Black-Scholes partial differential equations

Artificial neural networks (ANNs) have very successfully been used in numerical simulations for a series of computational problems ranging from image classification/image recognition, speech recognition, time series analysis, game intelligence, and computational advertising to numerical approximations of partial differential equations (PDEs). Such numerical simulations suggest that ANNs have the capacity to very efficiently approximate high-dimensional functions and, especially, such numerical simulations indicate that ANNs seem to admit the fundamental power to overcome the curse of dimensionality when approximating the high-dimensional functions appearing in the above named computational problems. There are also a series of rigorous mathematical approximation results for ANNs in the scientific literature. Some of these mathematical results prove convergence without convergence rates and some of these mathematical results even rigorously establish convergence rates but there are only a few special cases where mathematical results can rigorously explain the empirical success of ANNs when approximating high-dimensional functions. The key contribution of this article is to disclose that ANNs can efficiently approximate high-dimensional functions in the case of numerical approximations of Black-Scholes PDEs. More precisely, this work reveals that the number of required parameters of an ANN to approximate the solution of the Black-Scholes PDE grows at most polynomially in both the reciprocal of the prescribed approximation accuracy \varepsilon > 0 and the PDE dimension d \in \mathbb{N} and we thereby prove, for the first time, that ANNs do indeed overcome the curse of dimensionality in the numerical approximation of Black-Scholes PDEs.

On2Vec: Embedding-based Relation Prediction for Ontology Population

Populating ontology graphs represents a long-standing problem for the Semantic Web community. Recent advances in translation-based graph embedding methods for populating instance-level knowledge graphs lead to promising new approaching for the ontology population problem. However, unlike instance-level graphs, the majority of relation facts in ontology graphs come with comprehensive semantic relations, which often include the properties of transitivity and symmetry, as well as hierarchical relations. These comprehensive relations are often too complex for existing graph embedding methods, and direct application of such methods is not feasible. Hence, we propose On2Vec, a novel translation-based graph embedding method for ontology population. On2Vec integrates two model components that effectively characterize comprehensive relation facts in ontology graphs. The first is the Component-specific Model that encodes concepts and relations into low-dimensional embedding spaces without a loss of relational properties; the second is the Hierarchy Model that performs focused learning of hierarchical relation facts. Experiments on several well-known ontology graphs demonstrate the promising capabilities of On2Vec in predicting and verifying new relation facts. These promising results also make possible significant improvements in related methods.

Deep Feature Learning of Multi-Network Topology for Node Classification

Networks are ubiquitous structure that describes complex relationships between different entities in the real world. As a critical component of prediction task over nodes in networks, learning the feature representation of nodes has become one of the most active areas recently. Network Embedding, aiming to learn non-linear and low-dimensional feature representation based on network topology, has been proved to be helpful on tasks of network analysis, especially node classification. For many real-world systems, multiple types of relations are naturally represented by multiple networks. However, existing network embedding methods mainly focus on single network embedding and neglect the information shared among different networks. In this paper, we propose a novel multiple network embedding method based on semisupervised autoencoder, named DeepMNE, which captures complex topological structures of multi-networks and takes the correlation among multi-networks into account. We evaluate DeepMNE on the task of node classification with two real-world datasets. The experimental results demonstrate the superior performance of our method over four state-of-the-art algorithms.

Deep Recurrent Survival Analysis

Survival analysis is a hotspot in statistical research for modeling time-to-event information with data censorship handling, which has been widely used in many applications such as clinical research, information system and other fields with survivorship bias. Many works have been proposed for survival analysis ranging from traditional statistic methods to machine learning models. However, the existing methodologies either utilize counting-based statistics on the segmented data, or have a pre-assumption on the event probability distribution w.r.t. time. Moreover, few works consider sequential patterns within the feature space. In this paper, we propose a Deep Recurrent Survival Analysis model which combines deep learning for conditional probability prediction at fine-grained level of the data, and survival analysis for tackling the censorship. By capturing the time dependency through modeling the conditional probability of the event for each sample, our method predicts the likelihood of the true event occurrence and estimates the survival rate over time, i.e., the probability of the non-occurrence of the event, for the censored data. Meanwhile, without assuming any specific form of the event probability distribution, our model shows great advantages over the previous works on fitting various sophisticated data distributions. In the experiments on the three real-world tasks from different fields, our model significantly outperforms the state-of-the-art solutions under various metrics.

A Primer on Causality in Data Science

Many questions in Data Science are fundamentally causal in that our objective is to learn the effect of some exposure (randomized or not) on an outcome interest. Even studies that are seemingly non-causal (e.g. prediction or prevalence estimation) have causal elements, such as differential censoring or measurement. As a result, we, as Data Scientists, need to consider the underlying causal mechanisms that gave rise to the data, rather than simply the pattern or association observed in the data. In this work, we review the ‘Causal Roadmap’, a formal framework to augment our traditional statistical analyses in an effort to answer the causal questions driving our research. Specific steps of the Roadmap include clearly stating the scientific question, defining of the causal model, translating the scientific question into a causal parameter, assessing the assumptions needed to translate the causal parameter into a statistical estimand, implementation of statistical estimators including parametric and semi-parametric methods, and interpretation of our findings. Throughout we focus on the effect of an exposure occurring at a single time point and provide extensions to more advanced settings.

Term-Mouse-Fixations as an Additional Indicator for Topical User Interests in Domain-Specific Search

Models in Interactive Information Retrieval (IIR) are grounded very much on the user’s task in order to give system support based on different task types and topics. However, the automatic recognition of user interests from log data in search systems is not trivial. Search queries entered by users a surely one such source. However, queries may be short, or users are only browsing. In this paper, we propose a method of term-mouse-fixations which takes the fixations on terms users are hovering over with the mouse into consideration to estimate topical user interests. We analyzed 22,259 search sessions of a domain-specific digital library over a period of about four months. We compared these mouse fixations to user-entered search terms and to titles and keywords from documents the user showed an interest in. These terms were found in 87.12% of all analyzed sessions; in this subset of sessions, per session on average 11.46 term-mouse-fixations from queries and viewed documents were found. These terms were fixated significantly longer with about 7 seconds than other terms with about 4.4 seconds. This means, term-mouse-fixations provide indicators for topical user interests and it is possible to extract them based on fixation time.

Multitask and Multilingual Modelling for Lexical Analysis

In Natural Language Processing (NLP), one traditionally considers a single task (e.g. part-of-speech tagging) for a single language (e.g. English) at a time. However, recent work has shown that it can be beneficial to take advantage of relatedness between tasks, as well as between languages. In this work I examine the concept of relatedness and explore how it can be utilised to build NLP models that require less manually annotated data. A large selection of NLP tasks is investigated for a substantial language sample comprising 60 languages. The results show potential for joint multitask and multilingual modelling, and hints at linguistic insights which can be gained from such models.

Convolutional Neural Network: Text Classification Model for Open Domain Question Answering System

Recently machine learning is being applied to almost every data domain one of which is Question Answering Systems (QAS). A typical Question Answering System is fairly an information retrieval system, which matches documents or text and retrieve the most accurate one. The idea of open domain question answering system put forth, involves convolutional neural network text classifiers. The Classification model presented in this paper is multi-class text classifier. The neural network classifier can be trained on large dataset. We report series of experiments conducted on Convolution Neural Network (CNN) by training it on two different datasets. Neural network model is trained on top of word embedding. Softmax layer is applied to calculate loss and mapping of semantically related words. Gathered results can help justify the fact that proposed hypothetical QAS is feasible. We further propose a method to integrate Convolutional Neural Network Classifier to an open domain question answering system. The idea of Open domain will be further explained, but the generality of it indicates to the system of domain specific trainable models, thus making it an open domain.

BiasedWalk: Biased Sampling for Representation Learning on Graphs

Network embedding algorithms are able to learn latent feature representations of nodes, transforming networks into lower dimensional vector representations. Typical key applications, which have effectively been addressed using network embeddings, include link prediction, multilabel classification and community detection. In this paper, we propose BiasedWalk, a scalable, unsupervised feature learning algorithm that is based on biased random walks to sample context information about each node in the network. Our random-walk based sampling can behave as Breath-First-Search (BFS) and Depth-First-Search (DFS) samplings with the goal to capture homophily and role equivalence between the nodes in the network. We have performed a detailed experimental evaluation comparing the performance of the proposed algorithm against various baseline methods, on several datasets and learning tasks. The experiment results show that the proposed method outperforms the baseline ones in most of the tasks and datasets.

Sparse Kernel PCA for Outlier Detection

In this paper, we propose a new method to perform Sparse Kernel Principal Component Analysis (SKPCA) and also mathematically analyze the validity of SKPCA. We formulate SKPCA as a constrained optimization problem with elastic net regularization (Hastie et al.) in kernel feature space and solve it. We consider outlier detection (where KPCA is employed) as an application for SKPCA, using the RBF kernel. We test it on 5 real-world datasets and show that by using just 4% (or even less) of the principal components (PCs), where each PC has on average less than 12% non-zero elements in the worst case among all 5 datasets, we are able to nearly match and in 3 datasets even outperform KPCA. We also compare the performance of our method with a recently proposed method for SKPCA by Wang et al. and show that our method performs better in terms of both accuracy and sparsity. We also provide a novel probabilistic proof to justify the existence of sparse solutions for KPCA using the RBF kernel. To the best of our knowledge, this is the first attempt at theoretically analyzing the validity of SKPCA.

Fairness Through Causal Awareness: Learning Latent-Variable Models for Biased Data

How do we learn from biased data? Historical datasets often reflect historical prejudices; sensitive or protected attributes may affect the observed treatments and outcomes. Classification algorithms tasked with predicting outcomes accurately from these datasets tend to replicate these biases. We advocate a causal modeling approach to learning from biased data and reframe fair classification as an intervention problem. We propose a causal model in which the sensitive attribute confounds both the treatment and the outcome. Building on prior work in deep learning and generative modeling, we describe how to learn the parameters of this causal model from observational data alone, even in the presence of unobserved confounders. We show experimentally that fairness-aware causal modeling provides better estimates of the causal effects between the sensitive attribute, the treatment, and the outcome. We further present evidence that estimating these causal effects can help us to learn policies which are both more accurate and fair, when presented with a historically biased dataset.

Differentially Private Continual Release of Graph Statistics

Motivated by understanding the dynamics of sensitive social networks over time, we consider the problem of continual release of statistics in a network that arrives online, while preserving privacy of its participants. For our privacy notion, we use differential privacy — the gold standard in privacy for statistical data analysis. The main challenge in this problem is maintaining a good privacy-utility tradeoff; naive solutions that compose across time, as well as solutions suited to tabular data either lead to poor utility or do not directly apply. In this work, we show that if there is a publicly known upper bound on the maximum degree of any node in the entire network sequence, then we can release many common graph statistics such as degree distributions and subgraph counts continually with a better privacy-accuracy tradeoff. Code available at https://…/graphprivacycode

HyperGCN: Hypergraph Convolutional Networks for Semi-Supervised Classification

Graph-based semi-supervised learning (SSL) is an important learning problem where the goal is to assign labels to initially unlabeled nodes in a graph. Graph Convolutional Networks (GCNs) have recently been shown to be effective for graph-based SSL problems. GCNs inherently assume existence of pairwise relationships in the graph-structured data. However, in many real-world problems, relationships go beyond pairwise connections and hence are more complex. Hypergraphs provide a natural modeling tool to capture such complex relationships. In this work, we explore the use of GCNs for hypergraph-based SSL. In particular, we propose HyperGCN, an SSL method which uses a layer-wise propagation rule for convolutional neural networks operating directly on hypergraphs. To the best of our knowledge, this is the first principled adaptation of GCNs to hypergraphs. HyperGCN is able to encode both the hypergraph structure and hypernode features in an effective manner. Through detailed experimentation, we demonstrate HyperGCN’s effectiveness at hypergraph-based SSL.

Accelerating Deep Neural Networks with Spatial Bottleneck Modules

This paper presents an efficient module named spatial bottleneck for accelerating the convolutional layers in deep neural networks. The core idea is to decompose convolution into two stages, which first reduce the spatial resolution of the feature map, and then restore it to the desired size. This operation decreases the sampling density in the spatial domain, which is independent yet complementary to network acceleration approaches in the channel domain. Using different sampling rates, we can tradeoff between recognition accuracy and model complexity. As a basic building block, spatial bottleneck can be used to replace any single convolutional layer, or the combination of two convolutional layers. We empirically verify the effectiveness of spatial bottleneck by applying it to the deep residual networks. Spatial bottleneck achieves 2x and 1.4x speedup on the regular and channel-bottlenecked residual blocks, respectively, with the accuracies retained in recognizing low-resolution images, and even improved in recognizing high-resolution images.

A Practical Design Optimization Method for Multicopter Propulsion Systems
Multi-Adversarial Domain Adaptation
Hierarchical Selective Recruitment in Linear-Threshold Brain Networks – Part II: Inter-Layer Dynamics and Top-Down Recruitment
Langevin equations in the small-mass limit: Higher order approximations
A note on concentration inequality for vector-valued martingales with weak exponential-type tails
On the Importance of Visual Context for Data Augmentation in Scene Understanding
CoverBLIP: scalable iterative matched filtering for MR Fingerprint recovery
Efficient Loop Detection in Forwarding Networks and Representing Atoms in a Field of Sets
Stochastically Controlled Stochastic Gradient for the Convex and Non-convex Composition problem
Reflected backward stochastic differentialequation with jumps and viscosity solution of second order integro-differential equation without monotonicity condition: case with the measure of Levy infinite
Balanced multi-shot EPI for accelerated Cartesian MRF: An alternative to spiral MRF
Suboptimal Control of Dividends under Exponential Utility
A self-consistent Hartree-Fock approach to Many-Body Localization
GANs beyond divergence minimization
Upcycle Your OCR: Reusing OCRs for Post-OCR Text Correction in Romanised Sanskrit
From FATS to feets: Further improvements to an astronomical feature extraction tool based on machine learning
Object Hallucination in Image Captioning
Gaussian statistics as an emergent symmetry far from equilibrium
Escaping Saddle Points in Constrained Optimization
Deep Learning for Generic Object Detection: A Survey
Turning a Blind Eye: Explicit Removal of Biases and Variation from Deep Neural Network Embeddings
Logical Rule Induction and Theory Learning Using Neural Theorem Proving
Bayesian Nonparametric Spectral Estimation
A Queueing Model for Sleep as a Vacation
Two short pieces around the Wigner problem
Assessing Gender Bias in Machine Translation — A Case Study with Google Translate
True Gaussian shaping for high count rate measurements of pulse amplitudes
An improved sum-product bound for quaternions
Well-posedness of distribution dependent SDEs with singular drifts
Time-dependent balls and bins model with positive feedback
Content-based Propagation of User Markings for Interactive Segmentation of Patterned Images
Adaptive Strategic Cyber Defense for Advanced Persistent Threats in Critical Infrastructure Networks
Obstacle Detection Quality as a Problem-Oriented Approach to Stereo Vision Algorithms Estimation in Road Situation Analysis
Deep Neural Net with Attention for Multi-channel Multi-touch Attribution
Distributed and Optimal Resilient Planning of Large-Scale Interdependent Critical Infrastructures
Automated Game Design via Conceptual Expansion
Applying Deep Learning to Derivatives Valuation
A Bandit Approach to Multiple Testing with False Discovery Control
82 Treebanks, 34 Models: Universal Dependency Parsing with Multi-Treebank Models
Hypergames and Cyber-Physical Security for Control Systems
Fractional Calculus of Variations: a novel way to look at it
Adversarial Feature-Mapping for Speech Enhancement
Cycle-Consistent Speech Enhancement
Quantum algorithms and approximating polynomials for composed functions with shared inputs
Adversarial Domain Adaptation for Duplicate Question Detection
Representing Images in 200 Bytes: Compression via Triangulation
The Force of Proof by Which Any Argument Prevails
Logistic Regression Augmented Community Detection for Network Data with Application in Identifying Autism-Related Gene Pathways
Expanding tidy data principles to facilitate missing data exploration, visualization and assessment of imputations
Cloud-based Quadratic Optimization with Partially Homomorphic Encryption
Computation of Total Kidney Volume from CT images in Autosomal Dominant Polycystic Kidney Disease using Multi-Task 3D Convolutional Neural Networks
edge2vec: Learning Node Representation Using Edge Semantics
Learning Embeddings of Directed Networks with Text-Associated Nodes—with Applications in Software Package Dependency Networks
Optimal Relaying Beamforming in Multiple Access Broadcast Channel (MABC) Bidirectional Cognitive Radio Networks in Presence of Interferers
Local Music Event Recommendation with Long Tail Artists
Cell-aware Stacked LSTMs for Modeling Sentences
Nash Equilibrium in Smoothed Polynomial Time for Network Coordination Games
Evolutionary Centrality and Maximal Cliques in Mobile Social Networks
Dynamic Compositionality in Recursive Neural Networks with Structure-aware Tag Representations
Tensor Ring Decomposition with Rank Minimization on Latent Space: An Efficient Approach for Tensor Completion
The entropy function of an invariant measure
A Block Coordinate Ascent Algorithm for Mean-Variance Optimization
Lyapunov exponent and variance in the CLT for products of random matrices related to random Fibonacci sequences
Neurons Merging Layer: Towards Progressive Redundancy Reduction for Deep Supervised Hashing
Data Augmentation for Spoken Language Understanding via Joint Variational Generation
Unsupervised Cross-lingual Word Embedding by Multilingual Neural Language Models
Splittings and symbolic powers of square-free monomial Ideals
A weakly convergent fully inexact Douglas-Rachford method with relative error tolerance
Fast greedy algorithms for dictionary selection with generalized sparsity constraints
QoS aware Automatic Web Service Composition with Multiple objectives
Scaling Video Analytics Systems to Large Camera Deployments
Collision Free Navigation of a Multi-Robot Team for Intruder Interception
ADM for grid CRF loss in CNN segmentation
Non-parametric uncertainties in the dark matter velocity distribution
Glassy properties of Anderson localization: pinning, avalanches and chaos
Dynamical coupling between Ising and FK percolation
Predicting Lung Nodule Malignancies by Combining Deep Convolutional Neural Network and Handcrafted Features
An Anderson-Chebyshev Mixing Method for Nonlinear Optimization
Performance Degradation Assessment for Electrical Machines Based on SOM and Hybrid DHMM
Exploiting local and global performance of candidate systems for aggregation of summarization techniques
Degradation Detection Method for Railway Point Machines
New methods for calculating the degree distance and the Gutman index
Asymptotic efficiency for covariance estimation under noise and asynchronicity
Self-optimal Piecewise Linearization Based Network Power Flow Constraints in Electrical Distribution System Optimization
Infinite Curriculum Learning for Efficiently Detecting Gastric Ulcers in WCE Images
Metric dimension reduction: A snapshot of the Ribe program
Monte Carlo Tree Search with Scalable Simulation Periods for Continuously Running Tasks
A simple probabilistic deep generative model for learning generalizable disentangled representations from grouped data
Mixtures of Skewed Matrix Variate Bilinear Factor Analyzers
Improving On-policy Learning with Statistical Reward Accumulation
Relaxation schemes for mathematical programs with switching constraints
Hook, line and sinker: a bijective proof of the skew shifted hook-length formula
Latin Cubes with Forbidden Entries
Improving Neural Question Generation using Answer Separation
Detecting Potential Local Adversarial Examples for Human-Interpretable Defense
The joint spectrum
Performance Analysis of MRC under Spatially Correlated Interference using Mixture-based Method
Where Do All These Search Terms Come From? – Two Experiments in Domain-Specific Search
Data Requirements for Evaluation of Personalization of Information Retrieval – A Position Paper
Challenges for Measuring Usefulness of Interactive IR Systems with Log-based Approaches
Proceedings Ninth International Symposium on Games, Automata, Logics, and Formal Verification
Transient Chaos Generates Small Chimeras
Design Automation for Adiabatic Circuits
Studying the inertias of LCM matrices and revisiting the Bourque-Ligh conjecture
Heat kernel estimates of fractional Schrödinger operators with negative hardy potential
Arithmetic Progressions with Restricted Digits
Joint species distribution modeling with additive multivariate Gaussian process priors and heteregenous data
Dealing with the Dimensionality Curse in Dynamic Pricing Competition: Using Frequent Repricing to Compensate Imperfect Market Anticipations
Top-k Overlapping Densest Subgraphs: Approximation and Complexity
On Underlay-Aware Self-Stabilizing Overlay Networks
A Largest Empty Hypersphere Metaheuristic for Robust Optimisation with Implementation Uncertainty
Online optimization in dynamic environments: a regret analysis for sparse problems
Optimizing deep video representation to match brain activity
HC-Net: Memory-based Incremental Dual-Network System for Continual learning
Alternating chimeras in networks of ephaptically coupled bursting neurons
Posterior consistency for $n$ in the binomial $(n,p)$ problem with both parameters unknown – with applications to quantitative nanoscopy
Metamorphic Relation Based Adversarial Attacks on Differentiable Neural Computer
Posterior Consistency in the Binomial $(n,p)$ Model with Unknown $n$ and $p$: A Numerical Study
Dirichlet process mixtures under affine transformations of the data
Euclidean matchings and minimality of hyperplane arrangements
Almost sure convergence on chaoses
Meteorologists and Students: A resource for language grounding of geographical descriptors
MixUp as Locally Linear Out-Of-Manifold Regularization
Multi-level hypothesis testing for populations of heterogeneous networks
Local Coloring and its Complexity
Random intersection graphs with communities
Streaming dictionary matching with mismatches
Scalable Monte Carlo inference for state-space models
Using Sparse Semantic Embeddings Learned from Multimodal Text and Image Data to Model Human Conceptual Knowledge
Growth on Two Limiting Essential Resources in a Self-Cycling Fermentor
A Deeper Look at 3D Shape Classifiers
Generalized weak rigidity: Theory, and local and global convergence of formations
Skin lesion classification with ensemble of squeeze-and-excitation networks and semi-supervised learning
The largest cognitive systems will be optoelectronic
Tackling the Qubit Mapping Problem for NISQ-Era Quantum Devices
The edge-statistics conjecture for $\ell\ll k^{6/5}$
Pebbling on Directed Graphs with Fixed Diameter
Self-Supervised Generation of Spatial Audio for 360 Video
Learning Invariances for Policy Generalization
Logographic Subword Model for Neural Machine Translation
VOS: a Method for Variational Oversampling of Imbalanced Data
Fast, High-Fidelity, Quantum Non-demolition Readout of a Superconducting Qubit Using a Transverse Coupling
Mobility-Aware Resource Allocation in VLC Networks Using T-Step Look-Ahead Policy
Simple coarse graining and sampling strategies for image recognition
Many body localization mediated by the presence of a central qudit
Liouville metric of star-scale invariant fields: tails and Weyl scaling