Graph Edit Distance (GED) computation is a core operation of many widely-used graph applications, such as graph classification, graph matching, and graph similarity search. However, computing the exact GED between two graphs is NP-complete. Most current approximate algorithms are based on solving a combinatorial optimization problem, which involves complicated design and high time complexity. In this paper, we propose a novel end-to-end neural network based approach to GED approximation, aiming to alleviate the computational burden while preserving good performance. The proposed approach, named GSimCNN, turns GED computation into a learning problem. Each graph is considered as a set of nodes, represented by learnable embedding vectors. The GED computation is then considered as a two-set matching problem, where a higher matching score leads to a lower GED. A Convolutional Neural Network (CNN) based approach is proposed to tackle the set matching problem. We test our algorithm on three real graph datasets, and our model achieves significant performance enhancement against state-of-the-art approximate GED computation algorithms.
Simulation workflow is a top-level model for the design and control of simulation process. It connects multiple simulation components with time and interaction restrictions to form a complete simulation system. Before the construction and evaluation of the component models, the validation of upper-layer simulation workflow is of the most importance in a simulation system. However, the methods especially for validating simulation workflow is very limit. Many of the existing validation techniques are domain-dependent with cumbersome questionnaire design and expert scoring. Therefore, this paper present an empirical learning-based validation procedure to implement a semi-automated evaluation for simulation workflow. First, representative features of general simulation workflow and their relations with validation indices are proposed. The calculation process of workflow credibility based on Analytic Hierarchy Process (AHP) is then introduced. In order to make full use of the historical data and implement more efficient validation, four learning algorithms, including back propagation neural network (BPNN), extreme learning machine (ELM), evolving new-neuron (eNFN) and fast incremental gaussian mixture model (FIGMN), are introduced for constructing the empirical relation between the workflow credibility and its features. A case study on a landing-process simulation workflow is established to test the feasibility of the proposed procedure. The experimental results also provide some useful overview of the state-of-the-art learning algorithms on the credibility evaluation of simulation models.
Robust PCA, the problem of PCA in the presence of outliers has been extensively investigated in the last few years. Here we focus on Robust PCA in the outlier model where each column of the data matrix is either an inlier or an outlier. Most of the existing methods for this model assumes either the knowledge of the dimension of the lower dimensional subspace or the fraction of outliers in the system. However in many applications knowledge of these parameters is not available. Motivated by this we propose a parameter free outlier identification method for robust PCA which a) does not require the knowledge of outlier fraction, b) does not require the knowledge of the dimension of the underlying subspace, c) is computationally simple and fast d) can handle structured and unstructured outliers. Further, analytical guarantees are derived for outlier identification and the performance of the algorithm is compared with the existing state of the art methods in both real and synthetic data for various outlier structures.
Recommender system is an important component of many web services to help users locate items that match their interests. Several studies showed that recommender systems are vulnerable to poisoning attacks, in which an attacker injects fake data to a given system such that the system makes recommendations as the attacker desires. However, these poisoning attacks are either agnostic to recommendation algorithms or optimized to recommender systems that are not graph-based. Like association-rule-based and matrix-factorization-based recommender systems, graph-based recommender system is also deployed in practice, e.g., eBay, Huawei App Store. However, how to design optimized poisoning attacks for graph-based recommender systems is still an open problem. In this work, we perform a systematic study on poisoning attacks to graph-based recommender systems. Due to limited resources and to avoid detection, we assume the number of fake users that can be injected into the system is bounded. The key challenge is how to assign rating scores to the fake users such that the target item is recommended to as many normal users as possible. To address the challenge, we formulate the poisoning attacks as an optimization problem, solving which determines the rating scores for the fake users. We also propose techniques to solve the optimization problem. We evaluate our attacks and compare them with existing attacks under white-box (recommendation algorithm and its parameters are known), gray-box (recommendation algorithm is known but its parameters are unknown), and black-box (recommendation algorithm is unknown) settings using two real-world datasets. Our results show that our attack is effective and outperforms existing attacks for graph-based recommender systems. For instance, when 1% fake users are injected, our attack can make a target item recommended to 580 times more normal users in certain scenarios.
Time series observations are ubiquitous in astronomy, and are generated to distinguish between different types of supernovae, to detect and characterize extrasolar planets and to classify variable stars. These time series are usually modeled using a parametric and/or physical model that assumes independent and homoscedastic errors, but in many cases these assumptions are not accurate and there remains a temporal dependency structure on the errors. This can occur, for example, when the proposed model cannot explain all the variability of the data or when the parameters of the model are not properly estimated. In this work we define an autoregressive model for irregular discrete-time series, based on the discrete time representation of the continuous autoregressive model of order 1. We show that the model is ergodic and stationary. We further propose a maximum likelihood estimation procedure and assess the finite sample performance by Monte Carlo simulations. We implement the model on real and simulated data from Gaussian as well as other distributions, showing that the model can flexibly adapt to different data distributions. We apply the irregular autoregressive model to the residuals of a transit of an extrasolar planet to illustrate errors that remain with temporal structure. We also apply this model to residuals of an harmonic fit of light-curves from variable stars to illustrate how the model can be used to detect incorrect parameter estimation.
Image optimization problems encompass many applications such as spectral fusion, deblurring, deconvolution, dehazing, matting, reflection removal and image interpolation, among others. With current image sizes in the order of megabytes, it is extremely expensive to run conventional algorithms such as gradient descent, making them unfavorable especially when closed-form solutions can be derived and computed efficiently. This paper explains in detail the framework for solving convex image optimization and deconvolution in the Fourier domain. We begin by explaining the mathematical background and motivating why the presented setups can be transformed and solved very efficiently in the Fourier domain. We also show how to practically use these solutions, by providing the corresponding implementations. The explanations are aimed at a broad audience with minimal knowledge of convolution and image optimization. The eager reader can jump to Section 3 for a footprint of how to solve and implement a sample optimization function, and Section 5 for the more complex cases.
With the development of cloud computing and big data, the reliability of data storage systems becomes increasingly important. Previous researchers have shown that machine learning algorithms based on SMART attributes are effective methods to predict hard drive failures. In this paper, we use SMART attributes to predict hard drive health degrees which are helpful for taking different fault tolerant actions in advance. Given the highly imbalanced SMART datasets, it is a nontrivial work to predict the health degree precisely. The proposed model would encounter overfitting and biased fitting problems if it is trained by the traditional methods. In order to resolve this problem, we propose two strategies to better utilize imbalanced data and improve performance. Firstly, we design a layerwise perturbation-based adversarial training method which can add perturbations to any layers of a neural network to improve the generalization of the network. Secondly, we extend the training method to the semi-supervised settings. Then, it is possible to utilize unlabeled data that have a potential of failure to further improve the performance of the model. Our extensive experiments on two real-world hard drive datasets demonstrate the superiority of the proposed schemes for both supervised and semi-supervised classification. The model trained by the proposed method can correctly predict the hard drive health status 5 and 15 days in advance. Finally, we verify the generality of the proposed training method in other similar anomaly detection tasks where the dataset is imbalanced. The results argue that the proposed methods are applicable to other domains.
To realize the promise of ubiquitous embedded deep network inference, it is essential to seek limits of energy and area efficiency. To this end, low-precision networks offer tremendous promise because both energy and area scale down quadratically with the reduction in precision. Here, for the first time, we demonstrate ResNet-18, ResNet-34, ResNet-50, ResNet-152, Inception-v3, densenet-161, and VGG-16bn networks on the ImageNet classification benchmark that, at 8-bit precision exceed the accuracy of the full-precision baseline networks after one epoch of finetuning, thereby leveraging the availability of pretrained models. We also demonstrate for the first time ResNet-18, ResNet-34, and ResNet-50 4-bit models that match the accuracy of the full-precision baseline networks. Surprisingly, the weights of the low-precision networks are very close (in cosine similarity) to the weights of the corresponding baseline networks, making training from scratch unnecessary. The number of iterations required by stochastic gradient descent to achieve a given training error is related to the square of (a) the distance of the initial solution from the final plus (b) the maximum variance of the gradient estimates. By drawing inspiration from this observation, we (a) reduce solution distance by starting with pretrained fp32 precision baseline networks and fine-tuning, and (b) combat noise introduced by quantizing weights and activations during training, by using larger batches along with matched learning rate annealing. Together, these two techniques offer a promising heuristic to discover low-precision networks, if they exist, close to fp32 precision baseline networks.
This paper addresses the problem of change-point detection on sequences of high-dimensional and heterogeneous observations, which also possess a periodic temporal structure. Due to the dimensionality problem, when the time between change-points is on the order of the dimension of the model parameters, drifts in the underlying distribution can be misidentified as changes. To overcome this limitation we assume that the observations lie in a lower dimensional manifold that admits a latent variable representation. In particular, we propose a hierarchical model that is computationally feasible, widely applicable to heterogeneous data and robust to missing instances. Additionally, to deal with the observations’ periodic dependencies, we employ a circadian model where the data periodicity is captured by non-stationary covariance functions. We validate the proposed technique on synthetic examples and we demonstrate its utility in the detection of changes for human behavior characterization.
With the advancement of machine learning and deep learning, vector search becomes instrumental to many information retrieval systems, to search and find best matches to user queries based on their semantic similarities.These online services require the search architecture to be both effective with high accuracy and efficient with low latency and memory footprint, which existing work fails to offer. We develop, Zoom, a new vector search solution that collaboratively optimizes accuracy, latency and memory based on a multiview approach. (1) A ‘preview’ step generates a small set of good candidates, leveraging compressed vectors in memory for reduced footprint and fast lookup. (2) A ‘fullview’ step on SSDs reranks those candidates with their full-length vector, striking high accuracy. Our evaluation shows that, Zoom achieves an order of magnitude improvements on efficiency while attaining equal or higher accuracy, comparing with the state-of-the-art.
Forecasting multivariate time series data, such as prediction of electricity consumption, solar power production, and polyphonic piano pieces, has numerous valuable applications. However, complex and non-linear interdependencies between time steps and series complicate the task. To obtain accurate prediction, it is crucial to model long-term dependency in time series data, which can be achieved to some good extent by recurrent neural network (RNN) with attention mechanism. Typical attention mechanism reviews the information at each previous time step and selects the relevant information to help generate the outputs, but it fails to capture the temporal patterns across multiple time steps. In this paper, we propose to use a set of filters to extract time-invariant temporal patterns, which is similar to transforming time series data into its ‘frequency domain’. Then we proposed a novel attention mechanism to select relevant time series, and use its ‘frequency domain’ information for forecasting. We applied the proposed model on several real-world tasks and achieved the state-of-the-art performance in all of them with only one exception. We also show that to some degree the learned filters play the role of bases in discrete Fourier transform.
Stochastic gradient methods are the workhorse (algorithms) of large-scale optimization problems in machine learning, signal processing, and other computational sciences and engineering. This paper studies Markov chain gradient descent, a variant of stochastic gradient descent where the random samples are taken on the trajectory of a Markov chain. Existing results of this method assume convex objectives and a reversible Markov chain and thus have their limitations. We establish new non-ergodic convergence under wider step sizes, for nonconvex problems, and for non-reversible finite-state Markov chains. Nonconvexity makes our method applicable to broader problem classes. Non-reversible finite-state Markov chains, on the other hand, can mix substatially faster. To obtain these results, we introduce a new technique that varies the mixing levels of the Markov chains. The reported numerical results validate our contributions.
In graphs with rich text information, constructing expressive graph representations requires incorporating textual information with structural information. Graph embedding models are becoming more and more popular in representing graphs, yet they are faced with two issues: sampling efficiency and text utilization. Through analyzing existing models, we find their training objectives are composed of pairwise proximities, and there are large amounts of redundant node pairs in Random Walk-based methods. Besides, inferring graph structures directly from texts (also known as zero-shot scenario) is a problem that requires higher text utilization. To solve these problems, we propose a novel Text-driven Graph Embedding with Pairs Sampling (TGE-PS) framework. TGE-PS uses Pairs Sampling (PS) to generate training samples which reduces ~99% training samples and is competitive compared to Random Walk. TGE-PS uses Text-driven Graph Embedding (TGE) which adopts word- and character-level embeddings to generate node embeddings. We evaluate TGE-PS on several real-world datasets, and experimental results demonstrate that TGE-PS produces state-of-the-art results in traditional and zero-shot link prediction tasks.
In this work, an ontology-based model for AI-assisted medicine side-effect (SE) prediction is developed, where three main components, including the drug model, the treatment model, and the AI-assisted prediction model, of proposed model are presented. To validate the proposed model, an ANN structure is established and trained by two hundred and forty-two TCM prescriptions that are gathered and classified from the most famous ancient TCM book and more than one thousand SE reports, in which two ontology-based attributions, hot and cold, are simply introduced to evaluate whether the prediction will cause a SE or not. The results preliminarily reveal that it is a relationship between the ontology-based attributions and the corresponding indicator that can be learnt by AI for predicting the SE, which suggests the proposed model has a potential in AI-assisted SE prediction. However, it should be noted that, the proposed model highly depends on the sufficient clinic data, and hereby, much deeper exploration is important for enhancing the accuracy of the prediction.
Machine reading comprehension (MRC) requires reasoning about both the knowledge involved in a document and knowledge about the world. However, existing datasets are typically dominated by questions that can be well solved by context matching, which fail to test this capability. To encourage the progress on knowledge-based reasoning in MRC, we present knowledge-based MRC in this paper, and build a new dataset consisting of 40,047 question-answer pairs. The annotation of this dataset is designed so that successfully answering the questions requires understanding and the knowledge involved in a document. We implement a framework consisting of both a question answering model and a question generation model, both of which take the knowledge extracted from the document as well as relevant facts from an external knowledge base such as Freebase/ProBase/Reverb/NELL. Results show that incorporating side information from external KB improves the accuracy of the baseline question answer system. We compare it with a standard MRC model BiDAF, and also provide the difficulty of the dataset and lay out remaining challenges.
Ensembles of deep neural networks with diverse architectures significantly improve generalization accuracy. However, training such ensembles requires a large amount of computational resources and time as every network in the ensemble has to be separately trained. In practice, this restricts the number of different deep neural network architectures that can be included within an ensemble. We propose a new approach to address this problem. Our approach captures the structural similarity between members of a neural network ensemble and train it only once. Subsequently, this knowledge is transferred to all members of the ensemble using function-preserving transformations. Then, these ensemble networks converge significantly faster as compared to training from scratch. We show through experiments on CIFAR-10, CIFAR-100, and SVHN data sets that our approach can train large and diverse ensembles of deep neural networks achieving comparable accuracy to existing approaches in a fraction of their training time. In particular, our approach trains an ensemble of $100$ variants of deep neural networks with diverse architectures up to $6 \times$ faster as compared to existing approaches. This improvement in training cost grows linearly with the size of the ensemble.
Conversational semantic parsing over tables requires knowledge acquiring and reasoning abilities, which have not been well explored by current state-of-the-art approaches. Motivated by this fact, we propose a knowledge-aware semantic parser to improve parsing performance by integrating various types of knowledge. In this paper, we consider three types of knowledge, including grammar knowledge, expert knowledge, and external resource knowledge. First, grammar knowledge empowers the model to effectively replicate previously generated logical form, which effectively handles the co-reference and ellipsis phenomena in conversation Second, based on expert knowledge, we propose a decomposable model, which is more controllable compared with traditional end-to-end models that put all the burdens of learning on trial-and-error in an end-to-end way. Third, external resource knowledge, i.e., provided by a pre-trained language model or an entity typing model, is used to improve the representation of question and table for a better semantic understanding. We conduct experiments on the SequentialQA dataset. Results show that our knowledge-aware model outperforms the state-of-the-art approaches. Incremental experimental results also prove the usefulness of various knowledge. Further analysis shows that our approach has the ability to derive the meaning representation of a context-dependent utterance by leveraging previously generated outcomes.
In a linear regression model with random design, we consider a family of candidate models from which we want to select a `good’ model for prediction out-of-sample. We fit the models using block shrinkage estimators, and we focus on the challenging situation where the number of explanatory variables can be of the same order as sample size and where the number of candidate models can be much larger than sample size. We develop an estimator for the out-of-sample predictive performance, and we show that the empirically best model is asymptotically as good as the truly best model. Using the estimator corresponding to the empirically best model, we construct a prediction interval that is approximately valid and short with high probability, i.e., we show that the actual coverage probability is close to the nominal one and that the length of this prediction interval is close to the length of the shortest but infeasible prediction interval. All results hold uniformly over a large class of data-generating processes. These findings extend results of Leeb (2009), where the models are fit using least-squares estimators, and of Huber (2013), where the models are fit using shrinkage estimators without block structure.
Dialogue systems are usually built on either generation-based or retrieval-based approaches, yet they do not benefit from the advantages of different models. In this paper, we propose a Retrieval-Enhanced Adversarial Training (REAT) method for neural response generation. Distinct from existing ap- proaches, the REAT method leverages an encoder-decoder framework in terms of an adversarial training paradigm, while taking advantage of N-best response candidates from a retrieval-based system to construct the discriminator. An empirical study on a large scale public available benchmark dataset shows that the REAT method significantly outper- forms the vanilla Seq2Seq model as well as the conventional adversarial training approach.
Time Series Classification (TSC) is an important and challenging problem in data mining. With the increase of time series data availability, hundreds of TSC algorithms have been proposed. Among these methods, only a few have considered Deep Neural Networks (DNNs) to perform this task. This is surprising as deep learning has seen very successful applications in the last years. DNNs have indeed revolutionized the field of computer vision especially with the advent of novel deeper architectures such as Residual and Convolutional Neural Networks. Apart from images, sequential data such as text and audio can also be processed with DNNs to reach state of the art performance for document classification and speech recognition. In this article, we study the current state of the art performance of deep learning algorithms for TSC by presenting an empirical study of the most recent DNN architectures for TSC. We give an overview of the most successful deep learning applications in various time series domains under a unified taxonomy of DNNs for TSC. We also provide an open source deep learning framework to the TSC community where we implemented each of the compared approaches and evaluated them on a univariate TSC benchmark (the UCR archive) and 12 multivariate time series datasets. By training 8,730 deep learning models on 97 time series datasets, we propose the most exhaustive study of DNNs for TSC to date.
A novel procedure is presented in this paper, for training a deep convolutional and recurrent neural network, taking into account both the available training data set and some information extracted from similar networks trained with other relevant data sets. This information is included in an extended loss function used for the network training, so that the network can have an improved performance when applied to the other data sets, without forgetting the learned knowledge from the original data set. Facial expression and emotion recognition in-the-wild is the test bed application that is used to demonstrate the improved performance achieved using the proposed approach. In this framework, we provide an experimental study on categorical emotion recognition using datasets from a very recent related emotion recognition challenge.
Time irreversibility is a common signature of nonlinear processes, and a fundamental property of non-equilibrium systems driven by non-conservative forces. A time series is said to be reversible if its statistical properties are invariant regardless of the direction of time. Here we propose the Time Reversibility from Ordinal Patterns method (TiROP) to assess time-reversibility from an observed finite time series. TiROP captures the information of scalar observations in time forward, as well as its time-reversed counterpart by means of ordinal patterns. The method compares both underlying information contents by quantifying its (dis)-similarity via Jensen-Shannon divergence. The statistic is contrasted with a population of divergences coming from a set of surrogates to unveil the temporal nature and its involved time scales. We tested TiROP in different synthetic and real, linear and non linear time series, juxtaposed with results from the classical Ramsey’s time reversibility test. Our results depict a novel, fast-computation, and fully data-driven methodology to assess time-reversibility at different time scales with no further assumptions over data. This approach adds new insights about the current non-linear analysis techniques, and also could shed light on determining new physiological biomarkers of high reliability and computational efficiency.
We propose a data-efficient Gaussian process-based Bayesian approach to the semi-supervised learning problem on graphs. The proposed model shows extremely competitive performance when compared to the state-of-the-art graph neural networks on semi-supervised learning benchmark experiments, and outperforms the neural networks in active learning experiments where labels are scarce. Furthermore, the model does not require a validation data set for early stopping to control over-fitting. Our model can be viewed as an instance of empirical distribution regression weighted locally by network connectivity. We further motivate the intuitive construction of the model with a Bayesian linear model interpretation where the node features are filtered by an operator related to the graph Laplacian. The method can be easily implemented by adapting off-the-shelf scalable variational inference algorithms for Gaussian processes.
We define and study a general framework for approval-based budgeting methods and compare certain methods within this framework by their axiomatic and computational properties. Furthermore, we visualize their behavior on certain Euclidean distributions and analyze them experimentally.
Public sector organisations are increasingly interested in using data science and artificial intelligence capabilities to deliver policy and generate efficiencies in high uncertainty environments. The long-term success of data science and AI in the public sector relies on effectively embedding it into delivery solutions for policy implementation. However, governments cannot do this integration of AI into public service delivery on their own. The UK Government Industrial Strategy is clear that delivering on the AI grand challenge requires collaboration between universities and public and private sectors. This cross-sectoral collaborative approach is the norm in applied AI centres of excellence around the world. Despite their popularity, cross-sector collaborations entail serious management challenges that hinder their success. In this article we discuss the opportunities and challenges from AI for public sector. Finally, we propose a series of strategies to successfully manage these cross-sectoral collaborations.
Meta-analyses of clinical trials targeting rare events face particular challenges when the data lack adequate numbers of events for all treatment arms. Especially when the number of studies is low, standard meta-analysis methods can lead to serious distortions because of such data sparsity. To overcome this, we suggest the use of weakly informative priors (WIP) for the treatment effect parameter of a Bayesian meta-analysis model, which may also be seen as a form of penalization. As a data model, we use a binomial-normal hierarchical model (BNHM) which does not require continuity corrections in case of zero counts in one or both arms. We suggest a normal prior for the log odds ratio with mean 0 and standard deviation 2.82, which is motivated (1) as a symmetric prior centred around unity and constraining the odds ratio to within a range from 1/250 to 250 with 95 % probability, and (2) as consistent with empirically observed effect estimates from a set of $\mbox{$37\,773$}$ meta-analyses from the Cochrane Database of Systematic Reviews. In a simulation study with rare events and few studies, our BNHM with a WIP outperformed a Bayesian method without a WIP and a maximum likelihood estimator in terms of smaller bias and shorter interval estimates with similar coverage. Furthermore, the methods are illustrated by a systematic review in immunosuppression of rare safety events following paediatric transplantation. A publicly available $\textbf{R}$ package, $\texttt{MetaStan}$, is developed to automate the $\textbf{Stan}$ implementation of meta-analysis models using WIPs.
We prove that, under low noise assumptions, the support vector machine with $N\ll m$ random features (RFSVM) can achieve the learning rate faster than $O(1/\sqrt{m})$ on a training set with $m$ samples when an optimized feature map is used. Our work extends the previous fast rate analysis of random features method from least square loss to 0-1 loss. We also show that the reweighted feature selection method, which approximates the optimized feature map, helps improve the performance of RFSVM in experiments on a synthetic data set.
Gradient boosted decision trees (GBDTs) have seen widespread adoption in academia, industry and competitive data science due to their state-of-the-art performance in a wide variety of machine learning tasks. In this paper, we present an extensive empirical comparison of XGBoost, LightGBM and CatBoost, three popular GBDT algorithms, to aid the data science practitioner in the choice from the multitude of available implementations. Specifically, we evaluate their behavior on four large-scale datasets with varying shapes, sparsities and learning tasks, in order to evaluate the algorithms’ generalization performance, training times (on both CPU and GPU) and their sensitivity to hyper-parameter tuning. In our analysis, we first make use of a distributed grid-search to benchmark the algorithms on fixed configurations, and then employ a state-of-the-art algorithm for Bayesian hyper-parameter optimization to fine-tune the models.
Algorithmic predictions are increasingly used to aid, or in some cases supplant, human decision-making, and this development has placed new demands on the outputs of machine learning procedures. To facilitate human interaction, we desire that they output prediction functions that are in some fashion simple or interpretable. And because they influence consequential decisions, we also desire equitable prediction functions, ones whose allocations benefit (or at the least do not harm) disadvantaged groups. We develop a formal model to explore the relationship between simplicity and equity. Although the two concepts appear to be motivated by qualitatively distinct goals, our main result shows a fundamental inconsistency between them. Specifically, we formalize a general framework for producing simple prediction functions, and in this framework we show that every simple prediction function is strictly improvable: there exists a more complex prediction function that is both strictly more efficient and also strictly more equitable. Put another way, using a simple prediction function both reduces utility for disadvantaged groups and reduces overall welfare. Our result is not only about algorithms but about any process that produces simple models, and as such connects to the psychology of stereotypes and to an earlier economics literature on statistical discrimination.