Abnormal event detection is one of the important objectives in research and practical applications of video surveillance. However, there are still three challenging problems for most anomaly detection systems in practical setting: limited labeled data, ambiguous definition of ‘abnormal’ and expensive feature engineering steps. This paper introduces a unified detection framework to handle these challenges using energy-based models, which are powerful tools for unsupervised representation learning. Our proposed models are firstly trained on unlabeled raw pixels of image frames from an input video rather than hand-crafted visual features; and then identify the locations of abnormal objects based on the errors between the input video and its reconstruction produced by the models. To handle video stream, we develop an online version of our framework, wherein the model parameters are updated incrementally with the image frames arriving on the fly. Our experiments show that our detectors, using Restricted Boltzmann Machines (RBMs) and Deep Boltzmann Machines (DBMs) as core modules, achieve superior anomaly detection performance to unsupervised baselines and obtain accuracy comparable with the state-of-the-art approaches when evaluating at the pixel-level. More importantly, we discover that our system trained with DBMs is able to simultaneously perform scene clustering and scene reconstruction. This capacity not only distinguishes our method from other existing detectors but also offers a unique tool to investigate and understand how the model works.
The algorithm selection problem is to choose the most suitable algorithm for solving a given problem instance and thus, it leverages the complementarity between different approaches that is present in many areas of AI. We report on the state of the art in algorithm selection, as defined by the Algorithm Selection Competition series 2015 to 2017. The results of these competitions show how the state of the art improved over the years. Although performance in some cases is very promising, there is still room for improvement in other cases. Finally, we provide insights into why some scenarios are hard, and pose challenges to the community on how to advance the current state of the art.
Mapping complex input data into suitable lower dimensional manifolds is a common procedure in machine learning. This step is beneficial mainly for two reasons: (1) it reduces the data dimensionality and (2) it provides a new data representation possibly characterised by convenient geometric properties. Euclidean spaces are by far the most widely used embedding spaces, thanks to their well-understood structure and large availability of consolidated inference methods. However, recent research demonstrated that many types of complex data (e.g., those represented as graphs) are actually better described by non-Euclidean geometries. Here, we investigate how embedding graphs on constant-curvature manifolds (hyper-spherical and hyperbolic manifolds) impacts on the ability to detect changes in sequences of attributed graphs. The proposed methodology consists in embedding graphs into a geometric space and perform change detection there by means of conventional methods for numerical streams. The curvature of the space is a parameter that we learn to reproduce the geometry of the original application-dependent graph space. Preliminary experimental results show the potential capability of representing graphs by means of curved manifold, in particular for change and anomaly detection problems.
While working in collaborative team elsewhere sometimes the federated (huge) data are from heterogeneous cloud vendors. It is not only about the data privacy concern but also about how can those federated data can be querying from cloud directly in fast and securely way. Previous solution offered hybrid cloud between public and trusted private cloud. Another previous solution used encryption on MapReduce framework. But the challenge is we are working on heterogeneous clouds. In this paper, we present a novel technique for querying with privacy concern. Since we take execution time into account, our basic idea is to use the data mining model by partitioning the federated databases in order to reduce the search and query time. By using model of the database it means we use only the summary or the very characteristic patterns of the database. Modeling is the Preserving Privacy Stage I, since by modeling the data is being symbolized. We implement encryption on the database as preserving privacy Stage II. Our system, called ‘cSELENE’ (stands for ‘cloud SELENE’), is designed to handle federated data on heterogeneous clouds: AWS, Microsoft Azure, and Google Cloud Platform with MapReduce technique. In this paper we discuss preserving-privacy system and threat model, the format of federated data, the parallel programming (GPU programming and shared/memory systems), the parallel and secure algorithm for data mining model in distributed cloud, the cloud infrastructure/architecture, and the UIX design of the cSELENE system. Other issues such as incremental method and the secure design of cloud architecture system (Virtual Machines across platform design) are still open to discuss. Our experiments should demonstrate the validity and practicality of the proposed high performance computing scheme.
Modern applications significantly enhance user experience by adapting to each user’s individual condition and/or preferences. While this adaptation can greatly improve utility or be essential for the application to work (e.g., for ride-sharing applications), the exposure of user data to the application presents a significant privacy threat to the users, even when the traces are anonymized, since the statistical matching of an anonymized trace to prior user behavior can identify a user and their habits. Because of the current and growing algorithmic and computational capabilities of adversaries, provable privacy guarantees as a function of the degree of anonymization and obfuscation of the traces are necessary. Our previous work has established the requirements on anonymization and obfuscation in the case that data traces are independent between users. However, the data traces of different users will be dependent in many applications, and an adversary can potentially exploit such. In this paper, we consider the impact of correlation between user traces on their privacy. First, we demonstrate that the adversary can readily identify the association graph, revealing which user data traces are correlated. Next, we demonstrate that the adversary can use this association graph to break user privacy with significantly shorter traces than in the case when traces are independent between users, and that independent obfuscation of the data traces is often insufficient to remedy such. Finally, we discuss how the users can employ dependency in their obfuscation to improve their privacy.
modAL is a modular active learning framework for Python, aimed to make active learning research and practice simpler. Its distinguishing features are (i) clear and modular object oriented design (ii) full compatibility with scikit-learn models and workflows. These features make fast prototyping and easy extensibility possible, aiding the development of real-life active learning pipelines and novel algorithms as well. modAL is fully open source, hosted on GitHub at https://…/modAL. To assure code quality, extensive unit tests are provided and continuous integration is applied. In addition, a detailed documentation with several tutorials are also available for ease of use. The framework is available in PyPI and distributed under the MIT license.
The development of Artificial General Intelligence (AGI) promises to be a major event. Along with its many potential benefits, it also raises serious safety concerns (Bostrom, 2014). The intention of this paper is to provide an easily accessible and up-to-date collection of references for the emerging field of AGI safety. A significant number of safety problems for AGI have been identified. We list these, and survey recent research on solving them. We also cover works on how best to think of AGI from the limited knowledge we have today, predictions for when AGI will first be created, and what will happen after its creation. Finally, we review the current public policy on AGI.
We introduce the SaaS Algorithm for semi-supervised learning, which uses learning speed during stochastic gradient descent in a deep neural network to measure the quality of an iterative estimate of the posterior probability of unknown labels. Training speed in supervised learning correlates strongly with the percentage of correct labels, so we use it as an inference criterion for the unknown labels, without attempting to infer the model parameters at first. Despite its simplicity, SaaS achieves state-of-the-art results in semi-supervised learning benchmarks.
In recent years, many variance reduced algorithms for empirical risk minimization have been introduced. In contrast to vanilla SGD, these methods converge linearly on strong convex problems. To obtain the variance reduction, current methods either require frequent passes over the full data to recompute gradients—without making any progress during this time (like in SVRG), or they require memory of the same size as the input problem (like SAGA). In this work, we propose k-SVRG, an algorithm that interpolates between those two extremes: it makes best use of the available memory and in turn does avoid full passes over the data without making progress. We prove linear convergence of k-SVRG on strongly convex problems and convergence to stationary points on non-convex problems. Numerical experiments show the effectiveness of our method.
Machine learning has revolutionized fields such as image, text, and speech recognition. There’s also growing interest in applying machine and deep learning ideas in engineering, robotics, biotechnology, and finance. Despite their immense success in practice, there is limited mathematical understanding of neural networks. We mathematically study neural networks in the asymptotic regime of simultaneously (A) large network sizes and (B) large numbers of stochastic gradient descent training iterations. We rigorously prove that the empirical distribution of the neural network parameters converges to the solution of a nonlinear partial differential equation. This result can be considered a law of large numbers for neural networks. In addition, a consequence of our analysis is that the trained parameters of the neural network asymptotically become independent, a property which is commonly called ‘propagation of chaos’.
Reduced numerical precision is a common technique to reduce computational cost in many Deep Neural Networks (DNNs). While it has been observed that DNNs are resilient to small errors and noise, no general result exists that is capable of predicting a given DNN system architecture’s sensitivity to reduced precision. In this project, we emulate arbitrary bit-width using a specified floating-point representation with a truncation method, which is applied to the neural network after each batch. We explore the impact of several model parameters on the network’s training accuracy and show results on the MNIST dataset. We then present a preliminary theoretical investigation of the error scaling in both forward and backward propagations. We end with a discussion of the implications of these results as well as the potential for generalization to other network architectures.
Deep Factor Alpha provides a framework for extracting nonlinear factors information to explain the time-series cross-section properties of asset returns. Sorting securities based on firm characteristics is viewed as a nonlinear activation function which can be implemented within a deep learning architecture. Multi-layer deep learners are constructed to augment traditional long-short factor models. Searching firm characteristic space over deep architectures of nonlinear transformations is compatible with the economic goal of eliminating mispricing Alphas. Joint estimation of factors and betas is achieved with stochastic gradient descent. To illustrate our methodology, we design long-short latent factors in a train-validation-testing framework of US stock market asset returns from 1975 to 2017. We perform an out-of-sample study to analyze Fama-French factors, in both the cross-section and time-series, versus their deep learning counterparts. Finally, we conclude with directions for future research.
Network structure optimization is a fundamental task in complex network analysis. However, almost all the research on Bayesian optimization is aimed at optimizing the objective functions with vectorial inputs. In this work, we first present a flexible framework, denoted graph Bayesian optimization, to handle arbitrary graphs in the Bayesian optimization community. By combining the proposed framework with graph kernels, it can take full advantage of implicit graph structural features to supplement explicit features guessed according to the experience, such as tags of nodes and any attributes of graphs. The proposed framework can identify which features are more important during the optimization process. We apply the framework to solve four problems including two evaluations and two applications to demonstrate its efficacy and potential applications.
Label embedding plays an important role in zero-shot learning. Side information such as attributes, semantic text representations, and label hierarchy are commonly used as the label embedding in zero-shot classification tasks. However, the label embedding used in former works considers either only one single context of the label, or multiple contexts without dependency. Therefore, different contexts of the label may not be well aligned in the embedding space to preserve the relatedness between labels, which will result in poor interpretability of the label embedding. In this paper, we propose a Multi-Context Label Embedding (MCLE) approach to incorporate multiple label contexts, e.g., label hierarchy and attributes, within a unified matrix factorization framework. To be specific, we model each single context by a matrix factorization formula and introduce a shared variable to capture the dependency among different contexts. Furthermore, we enforce sparsity constraint on our multi-context framework to strengthen the interpretability of the learned label embedding. Extensive experiments on two real-world datasets demonstrate the superiority of our MCLE in label description and zero-shot image classification.
Mesh partitioning is an indispensable tool for efficient parallel numerical simulations. Its goal is to minimize communication between the processes of a simulation while achieving load balance. Established graph-based partitioning tools yield a high solution quality; however, their scalability is limited. Geometric approaches usually scale better, but their solution quality may be unsatisfactory for `non-trivial’ mesh topologies. In this paper, we present a scalable version of $k$-means that is adapted to yield balanced clusters. Balanced $k$-means constitutes the core of our new partitioning algorithm Geographer. Bootstrapping of initial centers is performed with space-filling curves, leading to fast convergence of the subsequent balanced k-means algorithm. Our experiments with up to 16384 MPI processes on numerous benchmark meshes show the following: (i) Geographer produces partitions with a lower communication volume than state-of-the-art geometric partitioners from the Zoltan package; (ii) Geographer scales well on large inputs; (iii) a Delaunay mesh with a few billion vertices and edges can be partitioned in a few seconds.
This paper addresses the problem of learning the concept of ‘propagation’ in the pretopology theoretical formalism. Our proposal is first to define the pseudo-closure operator (modeling the propagation concept) as a logical combination of neighborhoods. We show that learning such an operator lapses into the Multiple Instance (MI) framework, where the learning process is performed on bags of instances instead of individual instances. Though this framework is well suited for this task, its use for learning a pretopological space leads to a set of bags exponential in size. To overcome this issue we thus propose a learning method based on a low estimation of the bags covered by a concept under construction. As an experiment, percolation processes (forest fires typically) are simulated and the corresponding propagation models are learned based on a subset of observations. It reveals that the proposed MI approach is significantly more efficient on the task of propagation model recognition than existing methods.
The provision of multilingual event-centric temporal knowledge graphs such as EventKG enables structured access to representations of a large number of historical and contemporary events in a variety of language contexts. Timelines provide an intuitive way to facilitate an overview of events related to a \textit{query entity} – i.e. an entity or an event of user interest – over a certain period of time. In this paper, we present \eventTL{} – a novel system that generates cross-lingual event timelines using EventKG and facilitates an overview of the language-specific event relevance and popularity along with the cross-lingual differences.
One way of characterizing the topological and structural properties of vertices and edges in a graph is by using structural similarity measures. Measures like Cosine, Jaccard and Dice compute the similarities restricted to the immediate neighborhood of the vertices, bypassing important structural properties beyond the locality. Others measures, such as the generalized edge clustering coefficient, go beyond the locality but with high computational complexity, making them impractical in large-scale scenarios. In this paper we propose a novel similarity measure that determines the structural similarity by dynamically diffusing and capturing information beyond the locality. This new similarity is modeled as an iterated function that can be solved by fixed point iteration in super-linear time and memory complexity, so it is able to analyze large-scale graphs. In order to show the advantages of the proposed similarity in the community detection task, we replace the local structural similarity used in the SCAN algorithm with the proposed similarity measure, improving the quality of the detected community structure and also reducing the sensitivity to the parameter $\epsilon$ of the SCAN algorithm.