Image splicing is a very common image manipulation technique that is sometimes used for malicious purposes. A splicing detection and localization algorithm usually takes an input image and produces a binary decision indicating whether the input image has been manipulated, and also a segmentation mask that corresponds to the spliced region. Most existing splicing detection and localization pipelines suffer from two main shortcomings: 1) they use handcrafted features that are not robust against subsequent processing (e.g., compression), and 2) each stage of the pipeline is usually optimized independently. In this paper we extend the formulation of the underlying splicing problem to consider two input images, a query image and a potential donor image. Here the task is to estimate the probability that the donor image has been used to splice the query image, and obtain the splicing masks for both the query and donor images. We introduce a novel deep convolutional neural network architecture, called Deep Matching and Validation Network (DMVN), which simultaneously localizes and detects image splicing. The proposed approach does not depend on handcrafted features and uses raw input images to create deep learned representations. Furthermore, the DMVN is end-to-end op- timized to produce the probability estimates and the segmentation masks. Our extensive experiments demonstrate that this approach outperforms state-of-the-art splicing detection methods by a large margin in terms of both AUC score and speed.
We present a radial basis function solver for convolutional neural networks that can be directly applied to both image classification and distance metric learning problems. Our method treats all training features from a deep neural network as radial basis function centres and computes loss by summing the influence of a feature’s nearby centres in the embedding space. Having a radial basis function centred on each training feature is made scalable by treating it as an approximate nearest neighbour search problem. End-to-end learning of the network and solver is carried out, mapping high dimensional features into clusters of the same class. This results in a well formed embedding space, where semantically related instances are likely to be located near one another, regardless of whether or not the network was trained on those classes. The same loss function is used for both the metric learning and classification problems. We show that our radial basis function solver sets state-of-the-art embedding results on the Stanford Cars196 and CUB-200-2011 datasets. Additionally, we show that when used as a classifier, our method outperforms a conventional softmax classifier on the Caltech-256 object recognition dataset and the fine-grained recognition dataset CUB-200-2011.
Graph structured data on the web is now massive as well as diverse, ranging from social networks, web graphs to knowledge-bases. Effectively querying this graph structured data is non-trivial and has led to research in a variety of directions — structured queries, keyword and natural language queries, automatic translation of these queries to structured queries, etc. We are concerned with a class of queries called relationship queries, which are usually expressed as a set of keywords (each keyword denoting a named entity). The results returned are a set of ranked trees, each of which denotes relationships among the various keywords. The result list could consist of hundreds of answers. The problem of keyword search on graphs has been explored for over a decade now, but an important aspect that is not as extensively studied is that of user experience. We propose KlusTree, which presents clustered results to the users instead of a list of all the results. In our approach, the result trees are represented using language models and are clustered using JS divergence as a distance measure. We compare KlusTree with the well-known approaches based on isomorphism and tree-edit distance based clustering. The user evaluations show that KlusTree outperforms the other two in providing better clustering, thereby enriching user experience, revealing interesting patterns and improving result interpretation by the user.
Lifelong learning is the problem of learning multiple consecutive tasks in an online manner and is essential towards the development of intelligent machines that can adapt to their surroundings. In this work we focus on learning a lifelong approach to generative modeling whereby we continuously incorporate newly observed distributions into our model representation. We utilize two models, aptly named the student and the teacher, in order to aggregate information about all past distributions without the preservation of any of the past data or previous models. The teacher is utilized as a form of compressed memory in order to allow for the student model to learn over the past as well as present data. We demonstrate why a naive approach to lifelong generative modeling fails and introduce a regularizer with which we demonstrate learning across a long range of distributions.
Often in machine learning, data are collected as a combination of multiple conditions, e.g., the voice recordings of multiple persons, each labeled with an ID. How could we build a model that captures the latent information related to these conditions and generalize to a new one with few data? We present a new model called Latent Variable Multiple Output Gaussian Processes (LVMOGP) and that allows to jointly model multiple conditions for regression and generalize to a new condition with a few data points at test time. LVMOGP infers the posteriors of Gaussian processes together with a latent space representing the information about different conditions. We derive an efficient variational inference method for LVMOGP, of which the computational complexity is as low as sparse Gaussian processes. We show that LVMOGP significantly outperforms related Gaussian process methods on various tasks with both synthetic and real data.
The novel unseen classes can be formulated as the extreme values of known classes. This inspired the recent works on open-set recognition \cite{Scheirer_2013_TPAMI,Scheirer_2014_TPAMIb,EVM}, which however can have no way of naming the novel unseen classes. To solve this problem, we propose the Extreme Value Learning (EVL) formulation to learn the mapping from visual feature to semantic space. To model the margin and coverage distributions of each class, the Vocabulary-informed Learning (ViL) is adopted by using vast open vocabulary in the semantic space. Essentially, by incorporating the EVL and ViL, we for the first time propose a novel semantic embedding paradigm — Vocabulary-informed Extreme Value Learning (ViEVL), which embeds the visual features into semantic space in a probabilistic way. The learned embedding can be directly used to solve supervised learning, zero-shot and open set recognition simultaneously. Experiments on two benchmark datasets demonstrate the effectiveness of proposed frameworks.
One of the long-term goals of artificial intelligence is to build an agent that can communicate intelligently with human in natural language. Most existing work on natural language learning relies heavily on training over a pre-collected dataset with annotated labels, leading to an agent that essentially captures the statistics of the fixed external training data. As the training data is essentially a static snapshot representation of the knowledge from the annotator, the agent trained this way is limited in adaptiveness and generalization of its behavior. Moreover, this is very different from the language learning process of humans, where language is acquired during communication by taking speaking action and learning from the consequences of speaking action in an interactive manner. This paper presents an interactive setting for grounded natural language learning, where an agent learns natural language by interacting with a teacher and learning from feedback, thus learning and improving language skills while taking part in the conversation. To achieve this goal, we propose a model which incorporates both imitation and reinforcement by leveraging jointly sentence and reward feedbacks from the teacher. Experiments are conducted to validate the effectiveness of the proposed approach.
Cities have been a thriving place for citizens over the centuries due to their complex infrastructure. The emergence of the Cyber-Physical-Social Systems (CPSS) and context-aware technologies boost a growing interest in analysing, extracting and eventually understanding city events which subsequently can be utilised to leverage the citizen observations of their cities. In this paper, we investigate the feasibility of using Twitter textual streams for extracting city events. We propose a hierarchical multi-view deep learning approach to contextualise citizen observations of various city systems and services. Our goal has been to build a flexible architecture that can learn representations useful for tasks, thus avoiding excessive task-specific feature engineering. We apply our approach on a real-world dataset consisting of event reports and tweets of over four months from San Francisco Bay Area dataset and additional datasets collected from London. The results of our evaluations show that our proposed solution outperforms the existing models and can be used for extracting city related events with an averaged accuracy of 81% over all classes. To further evaluate the impact of our Twitter event extraction model, we have used two sources of authorised reports through collecting road traffic disruptions data from Transport for London API, and parsing the Time Out London website for sociocultural events. The analysis showed that 49.5% of the Twitter traffic comments are reported approximately five hours prior to the authorities official records. Moreover, we discovered that amongst the scheduled sociocultural event topics; tweets reporting transportation, cultural and social events are 31.75% more likely to influence the distribution of the Twitter comments than sport, weather and crime topics.
Recently, two scalable adaptations of the bootstrap have been proposed: the bag of little bootstraps (BLB; Kleiner et al., 2014) and the subsampled double bootstrap (SDB; Sengupta et al., 2016). In this paper, we introduce Bayesian bootstrap analogues to the BLB and SDB that have similar theoretical and computational properties, a strategy to perform lossless inference for a class of functionals of the Bayesian bootstrap, and briefly discuss extensions for Dirichlet Processes.
The expected improvement (EI) algorithm is a popular strategy for information collection in optimization under uncertainty. The algorithm is widely known to be too greedy, but nevertheless enjoys wide use due to its simplicity and ability to handle uncertainty and noise in a coherent decision theoretic framework. To provide rigorous insight into EI, we study its properties in a simple setting of Bayesian optimization where the domain consists of a finite grid of points. This is the so-called best-arm identification problem, where the goal is to allocate measurement effort wisely to confidently identify the best arm using a small number of measurements. In this framework, one can show formally that EI is far from optimal. To overcome this shortcoming, we introduce a simple modification of the expected improvement algorithm. Surprisingly, this simple change results in an algorithm that is asymptotically optimal for Gaussian best-arm identification problems, and provably outperforms standard EI by an order of magnitude.
We propose a hybrid approach to temporal anomaly detection in user-database access data — or more generally, any kind of subject-object co-occurrence data. Our methodology allows identifying anomalies based on a single stationary model, instead of requiring a full temporal one, which would be prohibitive in our setting. We learn our low-rank stationary model from the high-dimensional training data, and then fit a regression model for predicting the expected likelihood score of normal access patterns in the future. The disparity between the predicted and the observed likelihood scores is used to assess the ‘surprise’. This approach enables calibration of the anomaly score so that time-varying normal behavior patterns are not considered anomalous. We provide a detailed description of the algorithm, including a convergence analysis, and report encouraging empirical results. One of the datasets we tested is new for the public domain. It consists of two months’ worth of database access records from a live system. This dataset will be made publicly available, and is provided in the supplementary material.
Products reviews are one of the major resources to determine the public sentiment. The existing literature on reviews sentiment analysis mainly utilizes supervised paradigm, which needs labeled data to be trained on and suffers from domain-dependency. This article addresses these issues by describes a completely automatic approach for sentiment analysis based on unsupervised ensemble learning. The method consists of two phases. The first phase is contextual analysis, which has five processes, namely (1) data preparation; (2) spelling correction; (3) intensifier handling; (4) negation handling and (5) contrast handling. The second phase comprises the unsupervised learning approach, which is an ensemble of clustering classifiers using a majority voting mechanism with different weight schemes. The base classifier of the ensemble method is a modified k-means algorithm. The base classifier is modified by extracting initial centroids from the feature set via using SentWordNet (SWN). We also introduce new sentiment analysis problems of Australian airlines and home builders which offer potential benchmark problems in the sentiment analysis field. Our experiments on datasets from different domains show that contextual analysis and the ensemble phases improve the clustering performance in term of accuracy, stability and generalization ability.
Our work addresses two important issues with recurrent neural networks: (1) they are over-parameterized, and (2) the recurrence matrix is ill-conditioned. The former increases the sample complexity of learning and the training time. The latter causes the vanishing and exploding gradient problem. We present a flexible recurrent neural network model called Kronecker Recurrent Units (KRU). KRU achieves parameter efficiency in RNNs through a Kronecker factored recurrent matrix. It overcomes the ill-conditioning of the recurrent matrix by enforcing soft unitary constraints on the factors. Thanks to the small dimensionality of the factors, maintaining these constraints is computationally efficient. Our experimental results on five standard data-sets reveal that KRU can reduce the number of parameters by three orders of magnitude in the recurrent weight matrix compared to the existing recurrent models, without trading the statistical performance. These results in particular show that while there are advantages in having a high dimensional recurrent space, the capacity of the recurrent part of the model can be dramatically reduced.
We develop a new theoretical framework to analyze the generalization error of deep learning, and derive a new fast learning rate for two representative algorithms: empirical risk minimization and Bayesian deep learning. The series of theoretical analyses of deep learning has revealed its high expressive power and universal approximation capability. Although these analyses are highly nonparametric, existing generalization error analyses have been developed mainly in a fixed dimensional parametric model. To compensate this gap, we develop an infinite dimensional model that is based on an integral form as performed in the analysis of the universal approximation capability. This allows us to define a reproducing kernel Hilbert space corresponding to each layer. Our point of view is to deal with the ordinary finite dimensional deep neural network as a finite approximation of the infinite dimensional one. The approximation error is evaluated by the degree of freedom of the reproducing kernel Hilbert space in each layer. To estimate a good finite dimensional model, we consider both of empirical risk minimization and Bayesian deep learning. We derive its generalization error bound and it is shown that there appears bias-variance trade-off in terms of the number of parameters of the finite dimensional approximation. We show that the optimal width of the internal layers can be determined through the degree of freedom and the convergence rate can be faster than $O(1/\sqrt{n})$ rate which has been shown in the existing studies.
We consider sampling and enumeration problems for Markov equivalence classes. We create and analyze a Markov chain for uniform random sampling on the DAGs inside a Markov equivalence class. Though the worst case is exponentially slow mixing, we find a condition on the Markov equivalence class for polynomial time mixing. We also investigate the ratio of Markov equivalence classes to DAGs and a Markov chain of He, Jia, and Yu for random sampling of sparse Markov equivalence classes.
The simmer package brings discrete-event simulation to R. It is designed as a generic yet powerful process-oriented framework. The architecture encloses a robust and fast simulation core written in C++ with automatic monitoring capabilities. It provides a rich and flexible R API that revolves around the concept of trajectory, a common path in the simulation model for entities of the same type.
We show that the skip-gram formulation of word2vec trained with negative sampling is equivalent to a weighted logistic PCA. This connection allows us to better understand the objective, compare it to other word embedding methods, and extend it to higher dimensional models.
At present, the vast majority of building blocks, techniques, and architectures for deep learning are based on real-valued operations and representations. However, recent work on recurrent neural networks and older fundamental theoretical analysis suggests that complex numbers could have a richer representational capacity and could also facilitate noise-robust memory retrieval mechanisms. Despite their attractive properties and potential for opening up entirely new neural architectures, complex-valued deep neural networks have been marginalized due to the absence of the building blocks required to design such models. In this work, we provide the key atomic components for complex-valued deep neural networks and apply them to convolutional feed-forward networks. More precisely, we rely on complex convolutions and present algorithms for complex batch-normalization, complex weight initialization strategies for complex-valued neural nets and we use them in experiments with end-to-end training schemes. We demonstrate that such complex-valued models are able to achieve comparable or better performance than their real-valued counterparts. We test deep complex models on several computer vision tasks and on music transcription using the MusicNet dataset where we achieve state of the art performance.
Experimenting with a new dataset of 1.6M user comments from a Greek news portal and existing datasets of English Wikipedia comments, we show that an RNN outperforms the previous state of the art in moderation. A deep, classification-specific attention mechanism improves further the overall performance of the RNN. We also compare against a CNN and a word-list baseline, considering both fully automatic and semi-automatic moderation.
Micro-blogging service Twitter is a lucrative source for data mining applications on global sentiment. But due to the omnifariousness of the subjects mentioned in each data item; it is inefficient to run a data mining algorithm on the raw data. This paper discusses an algorithm to accurately classify the entire stream in to a given number of mutually exclusive collectively exhaustive streams upon each of which the data mining algorithm can be run separately yielding more relevant results with a high efficiency.
We address one of the important problems in Big Data, namely how to combine estimators from different subsamples by robust fusion procedures, when we are unable to deal with the whole sample.
There are two common approaches for optimizing the performance of a machine: genetic algorithms and machine learning. A genetic algorithm is applied over many generations whereas machine learning works by applying feedback until the system meets a performance threshold. Though these are methods that typically operate separately, we combine evolutionary adaptation and machine learning into one approach. Our focus is on machines that can learn during their lifetime, but instead of equipping them with a machine learning algorithm we aim to let them evolve their ability to learn by themselves. We use evolvable networks of probabilistic and deterministic logic gates, known as Markov Brains, as our computational model organism. The ability of Markov Brains to learn is augmented by a novel adaptive component that can change its computational behavior based on feedback. We show that Markov Brains can indeed evolve to incorporate these feedback gates to improve their adaptability to variable environments. By combining these two methods, we now also implemented a computational model that can be used to study the evolution of learning.