This preliminary note presents a heuristic for determining rank constrained solutions to linear matrix equations (LME). The method proposed here is based on minimizing a non-convex quadratic functional, which will hence-forth be termed as the \textit{Low-Rank-Functional} (LRF). Although this method lacks a formal proof/comprehensive analysis, for example in terms of a probabilistic guarantee for converging to a solution, the proposed idea is intuitive and has been seen to perform well in simulations. To that end, many numerical examples are provided to corroborate the idea.
Recommendations are broadly used in marketplaces to match users with items relevant to their interests and needs. To understand user intent and tailor recommendations to their needs, we use deep learning to explore various heterogeneous data available in marketplaces. This paper focuses on the challenge of measuring recommender performance and summarizes the online experiment results with several promising types of deep neural network recommenders – hybrid item representation models combining features from user engagement and content, sequence-based models, and multi-armed bandit models that optimize user engagement by re-ranking proposals from multiple submodels. The recommenders are currently running in production at the leading Norwegian marketplace FINN.no and serves over one million visitors everyday.
Recommendation algorithms are widely adopted in marketplaces to help users find the items they are looking for. The sparsity of the items by user matrix and the cold-start issue in marketplaces pose challenges for the off-the-shelf matrix factorization based recommender systems. To understand user intent and tailor recommendations to their needs, we use deep learning to explore various heterogeneous data available in marketplaces. This paper summarizes five lessons we learned from experimenting with state-of-the-art deep learning recommenders at the leading Norwegian marketplace \textit{FINN.no}. We design a hybrid recommender system that takes the user-generated contents of a marketplace (including text, images and meta attributes) and combines them with user behavior data such as page views and messages to provide recommendations for marketplace items. Among various tactics we experimented with, the following five show the best impact: staged training instead of end-to-end training, leveraging rich user behaviors beyond page views, using user behaviors as noisy labels to train embeddings, using transfer learning to solve the unbalanced data problem, and using attention mechanisms in the hybrid model. This system is currently running with around 20\% click-through-rate in production at \textit{FINN.no} and serves over one million visitors everyday.
Streaming tensor factorization is a powerful tool for processing high-volume and multi-way temporal data in Internet networks, recommender systems and image/video data analysis. Existing streaming tensor factorization algorithms rely on least-squares data fitting and they do not possess a mechanism for tensor rank determination. This leaves them susceptible to outliers and vulnerable to over-fitting. This paper presents a Bayesian robust streaming tensor factorization model to identify sparse outliers, automatically determine the underlying tensor rank and accurately fit low-rank structure. We implement our model in Matlab and compare it with existing algorithms on tensor datasets generated from dynamic MRI and Internet traffic.
We extend the Nystr\’om method for low-rank approximation of positive definite Mercer kernels to approximation of indefinite kernel matrices. Our result is the first derivation of the approach that does not require the positive definiteness of the kernel function. Building on this result, we then devise highly scalable methods for learning in reproducing kernel Kre\u{\i}n spaces. The main motivation for our work comes from problems with structured representations (e.g., graphs, strings, time-series), where it is relatively easy to devise a pairwise (dis)similarity function based on intuition/knowledge of a domain expert. Such pairwise functions are typically not positive definite and it is often well beyond the expertise of practitioners to verify this condition. The proposed large scale approaches for learning in reproducing kernel Kre\u{\i}n spaces provide principled and theoretically well-founded means to tackle this class of problems. The effectiveness of the approaches is evaluated empirically using kernels defined on structured and vectorial data representations.
The study of private inference has been sparked by growing concern regarding the analysis of data when it stems from sensitive sources. We present the first method for private Bayesian inference in exponential families that properly accounts for noise introduced by the privacy mechanism. It is efficient because it works only with sufficient statistics and not individual data. Unlike other methods, it gives properly calibrated posterior beliefs in the non-asymptotic data regime.
Research in deep reinforcement learning (RL) has coalesced around improving performance on benchmarks like the Arcade Learning Environment. However, these benchmarks conspicuously miss important characteristics like abrupt context-dependent shifts in strategy and temporal sensitivity that are often present in real-world domains. As a result, RL research has not focused on these challenges, resulting in algorithms which do not understand critical changes in context, and have little notion of real world time. To tackle this issue, this paper introduces the game of Space Fortress as a RL benchmark which incorporates these characteristics. We show that existing state-of-the-art RL algorithms are unable to learn to play the Space Fortress game. We then confirm that this poor performance is due to the RL algorithms’ context insensitivity and reward sparsity. We also identify independent axes along which to vary context and temporal sensitivity, allowing Space Fortress to be used as a testbed for understanding both characteristics in combination and also in isolation. We release Space Fortress as an open-source Gym environment.
We consider a general framework for reducing the number of trainable model parameters in deep learning networks by decomposing linear operators as a product of sums of simpler linear operators. Recently proposed deep learning architectures such as CNN, KFC, Dilated CNN, etc. are all subsumed in this framework and we illustrate other types of neural network architectures within this framework. We show that good accuracy on MNIST and Fashion MNIST can be obtained using a relatively small number of trainable parameters. In addition, since implementation of the convolutional layer is resource-heavy, we consider an approach in the transform domain that obviates the need for convolutional layers. One of the advantages of this general framework over prior approaches is that the number of trainable parameters is not fixed and can be varied arbitrarily. In particular, we illustrate the tradeoff of varying the number of trainable variables and the corresponding error rate. As an example, by using this decomposition on a reference CNN architecture for MNIST with over 3×10^6 trainable parameters, we are able to obtain an accuracy of 98.44% using only 3554 trainable parameters.
Predicting keywords performance, such as number of impressions, click-through rate (CTR), conversion rate (CVR), revenue per click (RPC), and cost per click (CPC), is critical for sponsored search in the online advertising industry. An interesting phenomenon is that, despite the size of the overall data, the data are very sparse at the individual unit level. To overcome the sparsity and leverage hierarchical information across the data structure, we propose a Dynamic Hierarchical Empirical Bayesian (DHEB) model that dynamically determines the hierarchy through a data-driven process and provides shrinkage-based estimations. Our method is also equipped with an efficient empirical approach to derive inferences through the hierarchy. We evaluate the proposed method in both simulated and real-world datasets and compare to several competitive models. The results favor the proposed method among all comparisons in terms of both accuracy and efficiency. In the end, we design a two-phase system to serve prediction in real time.
Deep Convolutional Neural Networks~(CNNs) offer remarkable performance of classifications and regressions in many high-dimensional problems and have been widely utilized in real-word cognitive applications. However, high computational cost of CNNs greatly hinder their deployment in resource-constrained applications, real-time systems and edge computing platforms. To overcome this challenge, we propose a novel filter-pruning framework, two-phase filter pruning based on conditional entropy, namely \textit{2PFPCE}, to compress the CNN models and reduce the inference time with marginal performance degradation. In our proposed method, we formulate filter pruning process as an optimization problem and propose a novel filter selection criteria measured by conditional entropy. Based on the assumption that the representation of neurons shall be evenly distributed, we also develop a maximum-entropy filter freeze technique that can reduce over fitting. Two filter pruning strategies — global and layer-wise strategies, are compared. Our experiment result shows that combining these two strategies can achieve a higher neural network compression ratio than applying only one of them under the same accuracy drop threshold. Two-phase pruning, that is, combining both global and layer-wise strategies, achieves 10 X FLOPs reduction and 46% inference time reduction on VGG-16, with 2% accuracy drop.
Standard neural machine translation (NMT) systems operate primarily on words, ignoring lower-level patterns of morphology. We present a character-aware decoder for NMT that can simultaneously work with both word-level and subword-level sequences which is designed to capture such patterns. We achieve character-awareness by augmenting both the softmax and embedding layers of an attention-based encoder-decoder network with convolutional neural networks that operate on spelling of a word (or subword). While character-aware embeddings have been successfully used in the source-side, we find that mixing character-aware embeddings with standard embeddings is crucial in the target-side. Furthermore, we show that a simple approximate softmax layer can be used for large target-side vocabularies which would otherwise require prohibitively large memory. We experiment on the TED multi-target dataset, translating English into 14 typologically diverse languages. We find that in this low-resource setting, the character-aware decoder provides consistent improvements over word-level and subword-level counterparts with BLEU score gains of up to +3.37.
In this paper for the first time the nonparametric autoregression estimation problem for the quadratic risks is considered. To this end we develop a new adaptive sequential model selection method based on the efficient sequential kernel estimators proposed by Arkoun and Pergamenshchikov (2016). Moreover, we develop a new analytical tool for general regression models to obtain the non asymptotic sharp oracle inequalities for both usual quadratic and robust quadratic risks. Then, we show that the constructed sequential model selection procedure is optimal in the sense of oracle inequalities.
We consider the problem of learning optimal policies from observational data in a way that satisfies certain fairness criteria. The issue of fairness arises where some covariates used in decision making are sensitive features, or are correlated with sensitive features. (Nabi and Shpitser 2018) formalized fairness in the context of regression problems as constraining the causal effects of sensitive features along certain disallowed causal pathways. The existence of these causal effects may be called retrospective unfairness in the sense of already being present in the data before analysis begins, and may be due to discriminatory practices or the biased way in which variables are defined or recorded. In the context of learning policies, what we call prospective bias, i.e., the inappropriate dependence of learned policies on sensitive features, is also possible. In this paper, we use methods from causal and semiparametric inference to learn optimal policies in a way that addresses both retrospective bias in the data, and prospective bias due to the policy. In addition, our methods appropriately address statistical bias due to model misspecification and confounding bias, which are important in the estimation of path-specific causal effects from observational data. We apply our methods to both synthetic data and real criminal justice data.
We propose a mixture-of-experts approach for unsupervised domain adaptation from multiple sources. The key idea is to explicitly capture the relationship between a target example and different source domains. This relationship, expressed by a point-to-set metric, determines how to combine predictors trained on various domains. The metric is learned in an unsupervised fashion using meta-training. Experimental results on sentiment analysis and part-of-speech tagging demonstrate that our approach consistently outperforms multiple baselines and can robustly handle negative transfer.
Bubble segmentation and size detection algorithms have been developed in recent years for their high efficiency and accuracy in measuring bubbly two-phase flows. In this work, we proposed an architecture called bubble generative adversarial networks (BubGAN) for the generation of realistic synthetic images which could be further used as training or benchmarking data for the development of advanced image processing algorithms. The BubGAN is trained initially on a labeled bubble dataset consisting of ten thousand images. By learning the distribution of these bubbles, the BubGAN can generate more realistic bubbles compared to the conventional models used in the literature. The trained BubGAN is conditioned on bubble feature parameters and has full control of bubble properties in terms of aspect ratio, rotation angle, circularity and edge ratio. A million bubble dataset is pre-generated using the trained BubGAN. One can then assemble realistic bubbly flow images using this dataset and associated image processing tool. These images contain detailed bubble information, therefore do not require additional manual labeling. This is more useful compared with the conventional GAN which generates images without labeling information. The tool could be used to provide benchmarking and training data for existing image processing algorithms and to guide the future development of bubble detecting algorithms.
We consider stochastic settings for clustering, and develop provably-good (approximation) algorithms for a number of these notions. These algorithms allow one to obtain better approximation ratios compared to the usual deterministic clustering setting. Additionally, they offer a number of advantages including providing fairer clustering and clustering which has better long-term behavior for each user. In particular, they ensure that *every user* is guaranteed to get good service (on average). We also complement some of these with impossibility results.
We investigate methods of change-point testing and confidence interval construction for nonparametric estimators of expected shortfall and related risk measures in weakly dependent time series. A key aspect of our work is the ability to detect general multiple structural changes in the tails of time series marginal distributions. Unlike extant approaches for detecting tail structural changes using quantities such as tail index, our approach does not require parametric modeling of the tail and detects more general changes in the tail. Additionally, our methods are based on the recently introduced self-normalization technique for time series, allowing for statistical analysis without the issues of consistent standard error estimation. The theoretical foundation for our methods are functional central limit theorems, which we develop under weak assumptions. An empirical study of S&P 500 returns and US 30-Year Treasury bonds illustrates the practical use of our methods in detecting and quantifying market instability via the tails of financial time series during times of financial crisis.
We propose Information-Theoretic Active Learning (ITAL), a novel batch-mode active learning method for binary classification, and apply it for acquiring meaningful user feedback in the context of content-based image retrieval. Instead of combining different heuristics such as uncertainty, diversity, or density, our method is based on maximizing the mutual information between the predicted relevance of the images and the expected user feedback regarding the selected batch. We propose suitable approximations to this computationally demanding problem and also integrate an explicit model of user behavior that accounts for possible incorrect labels and unnameable instances. Furthermore, our approach does not only take the structure of the data but also the expected model output change caused by the user feedback into account. In contrast to other methods, ITAL turns out to be highly flexible and provides state-of-the-art performance across various datasets, such as MIRFLICKR and ImageNet.
Characteristic sets (CS) organize RDF triples based on the set of properties characterizing their subject nodes. This concept is recently used in indexing techniques, as it can capture the implicit schema of RDF data. While most CS-based approaches yield significant improvements in space and query performance, they fail to perform well in the presence of schema heterogeneity, i.e., when the number of CSs becomes very large, resulting in a highly partitioned data organization. In this paper, we address this problem by introducing a novel technique, for merging CSs based on their hierarchical structure. Our technique employs a lattice to capture the hierarchical relationships between CSs, identifies dense CSs and merges dense CSs with their ancestors, thus reducing the size of the CSs as well as the links between them. We implemented our algorithm on top of a relational backbone, where each merged CS is stored in a relational table, and we performed an extensive experimental study to evaluate the performance and impact of merging to the storage and querying of RDF datasets, indicating significant improvements.
Massive datasets of curves, such as time series and trajectories, are continuously generated by mobile and sensing devices. A relevant operation on curves is similarity search: given a dataset $S$ of curves, construct a data structure that, for any query curve $q$, finds the curves in $S$ similar to $q$. Similarity search is a computational demanding task, in particular when a robust distance function is used, such as the continuous Fr\’echet distance. In this paper, we propose FRESH, a novel approximate solution to find similar curves under the continuous Fr\’echet distance. FRESH leverages on a locality sensitive hashing scheme for detecting candidate near neighbors of the query curve, and a subsequent pruning step based on a pipeline of curve simplifications. By relaxing the requirement of exact and deterministic solutions, FRESH reaches high performance and outperforms the state-of-the-art approaches. The experiments indeed show that, with a recall larger than 80% and precision 100%, we have at least a factor 10 improvement in performance over a baseline given by the best solutions developed for the ACM SIGSPATIAL 2017 challenge on the Fr\’echet distance. Furthermore, the improvement peaks up to two orders of magnitude, and even more, by relaxing the precision.
Multi-target prediction (MTP) is concerned with the simultaneous prediction of multiple target variables of diverse type. Due to its enormous application potential, it has developed into an active and rapidly expanding research field that combines several subfields of machine learning, including multivariate regression, multi-label classification, multi-task learning, dyadic prediction, zero-shot learning, network inference, and matrix completion. In this paper, we present a unifying view on MTP problems and methods. First, we formally discuss commonalities and differences between existing MTP problems. To this end, we introduce a general framework that covers the above subfields as special cases. As a second contribution, we provide a structured overview of MTP methods. This is accomplished by identifying a number of key properties, which distinguish such methods and determine their suitability for different types of problems. Finally, we also discuss a few challenges for future research.
Artificial neural networks (ANNs) have very successfully been used in numerical simulations for a series of computational problems ranging from image classification/image recognition, speech recognition, time series analysis, game intelligence, and computational advertising to numerical approximations of partial differential equations (PDEs). Such numerical simulations suggest that ANNs have the capacity to very efficiently approximate high-dimensional functions and, especially, such numerical simulations indicate that ANNs seem to admit the fundamental power to overcome the curse of dimensionality when approximating the high-dimensional functions appearing in the above named computational problems. There are also a series of rigorous mathematical approximation results for ANNs in the scientific literature. Some of these mathematical results prove convergence without convergence rates and some of these mathematical results even rigorously establish convergence rates but there are only a few special cases where mathematical results can rigorously explain the empirical success of ANNs when approximating high-dimensional functions. The key contribution of this article is to disclose that ANNs can efficiently approximate high-dimensional functions in the case of numerical approximations of Black-Scholes PDEs. More precisely, this work reveals that the number of required parameters of an ANN to approximate the solution of the Black-Scholes PDE grows at most polynomially in both the reciprocal of the prescribed approximation accuracy $\varepsilon > 0$ and the PDE dimension $d \in \mathbb{N}$ and we thereby prove, for the first time, that ANNs do indeed overcome the curse of dimensionality in the numerical approximation of Black-Scholes PDEs.
Populating ontology graphs represents a long-standing problem for the Semantic Web community. Recent advances in translation-based graph embedding methods for populating instance-level knowledge graphs lead to promising new approaching for the ontology population problem. However, unlike instance-level graphs, the majority of relation facts in ontology graphs come with comprehensive semantic relations, which often include the properties of transitivity and symmetry, as well as hierarchical relations. These comprehensive relations are often too complex for existing graph embedding methods, and direct application of such methods is not feasible. Hence, we propose On2Vec, a novel translation-based graph embedding method for ontology population. On2Vec integrates two model components that effectively characterize comprehensive relation facts in ontology graphs. The first is the Component-specific Model that encodes concepts and relations into low-dimensional embedding spaces without a loss of relational properties; the second is the Hierarchy Model that performs focused learning of hierarchical relation facts. Experiments on several well-known ontology graphs demonstrate the promising capabilities of On2Vec in predicting and verifying new relation facts. These promising results also make possible significant improvements in related methods.
Networks are ubiquitous structure that describes complex relationships between different entities in the real world. As a critical component of prediction task over nodes in networks, learning the feature representation of nodes has become one of the most active areas recently. Network Embedding, aiming to learn non-linear and low-dimensional feature representation based on network topology, has been proved to be helpful on tasks of network analysis, especially node classification. For many real-world systems, multiple types of relations are naturally represented by multiple networks. However, existing network embedding methods mainly focus on single network embedding and neglect the information shared among different networks. In this paper, we propose a novel multiple network embedding method based on semisupervised autoencoder, named DeepMNE, which captures complex topological structures of multi-networks and takes the correlation among multi-networks into account. We evaluate DeepMNE on the task of node classification with two real-world datasets. The experimental results demonstrate the superior performance of our method over four state-of-the-art algorithms.
Survival analysis is a hotspot in statistical research for modeling time-to-event information with data censorship handling, which has been widely used in many applications such as clinical research, information system and other fields with survivorship bias. Many works have been proposed for survival analysis ranging from traditional statistic methods to machine learning models. However, the existing methodologies either utilize counting-based statistics on the segmented data, or have a pre-assumption on the event probability distribution w.r.t. time. Moreover, few works consider sequential patterns within the feature space. In this paper, we propose a Deep Recurrent Survival Analysis model which combines deep learning for conditional probability prediction at fine-grained level of the data, and survival analysis for tackling the censorship. By capturing the time dependency through modeling the conditional probability of the event for each sample, our method predicts the likelihood of the true event occurrence and estimates the survival rate over time, i.e., the probability of the non-occurrence of the event, for the censored data. Meanwhile, without assuming any specific form of the event probability distribution, our model shows great advantages over the previous works on fitting various sophisticated data distributions. In the experiments on the three real-world tasks from different fields, our model significantly outperforms the state-of-the-art solutions under various metrics.
Many questions in Data Science are fundamentally causal in that our objective is to learn the effect of some exposure (randomized or not) on an outcome interest. Even studies that are seemingly non-causal (e.g. prediction or prevalence estimation) have causal elements, such as differential censoring or measurement. As a result, we, as Data Scientists, need to consider the underlying causal mechanisms that gave rise to the data, rather than simply the pattern or association observed in the data. In this work, we review the ‘Causal Roadmap’, a formal framework to augment our traditional statistical analyses in an effort to answer the causal questions driving our research. Specific steps of the Roadmap include clearly stating the scientific question, defining of the causal model, translating the scientific question into a causal parameter, assessing the assumptions needed to translate the causal parameter into a statistical estimand, implementation of statistical estimators including parametric and semi-parametric methods, and interpretation of our findings. Throughout we focus on the effect of an exposure occurring at a single time point and provide extensions to more advanced settings.
Models in Interactive Information Retrieval (IIR) are grounded very much on the user’s task in order to give system support based on different task types and topics. However, the automatic recognition of user interests from log data in search systems is not trivial. Search queries entered by users a surely one such source. However, queries may be short, or users are only browsing. In this paper, we propose a method of term-mouse-fixations which takes the fixations on terms users are hovering over with the mouse into consideration to estimate topical user interests. We analyzed 22,259 search sessions of a domain-specific digital library over a period of about four months. We compared these mouse fixations to user-entered search terms and to titles and keywords from documents the user showed an interest in. These terms were found in 87.12% of all analyzed sessions; in this subset of sessions, per session on average 11.46 term-mouse-fixations from queries and viewed documents were found. These terms were fixated significantly longer with about 7 seconds than other terms with about 4.4 seconds. This means, term-mouse-fixations provide indicators for topical user interests and it is possible to extract them based on fixation time.
In Natural Language Processing (NLP), one traditionally considers a single task (e.g. part-of-speech tagging) for a single language (e.g. English) at a time. However, recent work has shown that it can be beneficial to take advantage of relatedness between tasks, as well as between languages. In this work I examine the concept of relatedness and explore how it can be utilised to build NLP models that require less manually annotated data. A large selection of NLP tasks is investigated for a substantial language sample comprising 60 languages. The results show potential for joint multitask and multilingual modelling, and hints at linguistic insights which can be gained from such models.
Recently machine learning is being applied to almost every data domain one of which is Question Answering Systems (QAS). A typical Question Answering System is fairly an information retrieval system, which matches documents or text and retrieve the most accurate one. The idea of open domain question answering system put forth, involves convolutional neural network text classifiers. The Classification model presented in this paper is multi-class text classifier. The neural network classifier can be trained on large dataset. We report series of experiments conducted on Convolution Neural Network (CNN) by training it on two different datasets. Neural network model is trained on top of word embedding. Softmax layer is applied to calculate loss and mapping of semantically related words. Gathered results can help justify the fact that proposed hypothetical QAS is feasible. We further propose a method to integrate Convolutional Neural Network Classifier to an open domain question answering system. The idea of Open domain will be further explained, but the generality of it indicates to the system of domain specific trainable models, thus making it an open domain.
Network embedding algorithms are able to learn latent feature representations of nodes, transforming networks into lower dimensional vector representations. Typical key applications, which have effectively been addressed using network embeddings, include link prediction, multilabel classification and community detection. In this paper, we propose BiasedWalk, a scalable, unsupervised feature learning algorithm that is based on biased random walks to sample context information about each node in the network. Our random-walk based sampling can behave as Breath-First-Search (BFS) and Depth-First-Search (DFS) samplings with the goal to capture homophily and role equivalence between the nodes in the network. We have performed a detailed experimental evaluation comparing the performance of the proposed algorithm against various baseline methods, on several datasets and learning tasks. The experiment results show that the proposed method outperforms the baseline ones in most of the tasks and datasets.
In this paper, we propose a new method to perform Sparse Kernel Principal Component Analysis (SKPCA) and also mathematically analyze the validity of SKPCA. We formulate SKPCA as a constrained optimization problem with elastic net regularization (Hastie et al.) in kernel feature space and solve it. We consider outlier detection (where KPCA is employed) as an application for SKPCA, using the RBF kernel. We test it on 5 real-world datasets and show that by using just 4% (or even less) of the principal components (PCs), where each PC has on average less than 12% non-zero elements in the worst case among all 5 datasets, we are able to nearly match and in 3 datasets even outperform KPCA. We also compare the performance of our method with a recently proposed method for SKPCA by Wang et al. and show that our method performs better in terms of both accuracy and sparsity. We also provide a novel probabilistic proof to justify the existence of sparse solutions for KPCA using the RBF kernel. To the best of our knowledge, this is the first attempt at theoretically analyzing the validity of SKPCA.
How do we learn from biased data? Historical datasets often reflect historical prejudices; sensitive or protected attributes may affect the observed treatments and outcomes. Classification algorithms tasked with predicting outcomes accurately from these datasets tend to replicate these biases. We advocate a causal modeling approach to learning from biased data and reframe fair classification as an intervention problem. We propose a causal model in which the sensitive attribute confounds both the treatment and the outcome. Building on prior work in deep learning and generative modeling, we describe how to learn the parameters of this causal model from observational data alone, even in the presence of unobserved confounders. We show experimentally that fairness-aware causal modeling provides better estimates of the causal effects between the sensitive attribute, the treatment, and the outcome. We further present evidence that estimating these causal effects can help us to learn policies which are both more accurate and fair, when presented with a historically biased dataset.
Motivated by understanding the dynamics of sensitive social networks over time, we consider the problem of continual release of statistics in a network that arrives online, while preserving privacy of its participants. For our privacy notion, we use differential privacy — the gold standard in privacy for statistical data analysis. The main challenge in this problem is maintaining a good privacy-utility tradeoff; naive solutions that compose across time, as well as solutions suited to tabular data either lead to poor utility or do not directly apply. In this work, we show that if there is a publicly known upper bound on the maximum degree of any node in the entire network sequence, then we can release many common graph statistics such as degree distributions and subgraph counts continually with a better privacy-accuracy tradeoff. Code available at https://…/graphprivacycode
Graph-based semi-supervised learning (SSL) is an important learning problem where the goal is to assign labels to initially unlabeled nodes in a graph. Graph Convolutional Networks (GCNs) have recently been shown to be effective for graph-based SSL problems. GCNs inherently assume existence of pairwise relationships in the graph-structured data. However, in many real-world problems, relationships go beyond pairwise connections and hence are more complex. Hypergraphs provide a natural modeling tool to capture such complex relationships. In this work, we explore the use of GCNs for hypergraph-based SSL. In particular, we propose HyperGCN, an SSL method which uses a layer-wise propagation rule for convolutional neural networks operating directly on hypergraphs. To the best of our knowledge, this is the first principled adaptation of GCNs to hypergraphs. HyperGCN is able to encode both the hypergraph structure and hypernode features in an effective manner. Through detailed experimentation, we demonstrate HyperGCN’s effectiveness at hypergraph-based SSL.
This paper presents an efficient module named spatial bottleneck for accelerating the convolutional layers in deep neural networks. The core idea is to decompose convolution into two stages, which first reduce the spatial resolution of the feature map, and then restore it to the desired size. This operation decreases the sampling density in the spatial domain, which is independent yet complementary to network acceleration approaches in the channel domain. Using different sampling rates, we can tradeoff between recognition accuracy and model complexity. As a basic building block, spatial bottleneck can be used to replace any single convolutional layer, or the combination of two convolutional layers. We empirically verify the effectiveness of spatial bottleneck by applying it to the deep residual networks. Spatial bottleneck achieves 2x and 1.4x speedup on the regular and channel-bottlenecked residual blocks, respectively, with the accuracies retained in recognizing low-resolution images, and even improved in recognizing high-resolution images.