We present a suite of reinforcement learning environments illustrating various safety properties of intelligent agents. These problems include safe interruptibility, avoiding side effects, absent supervisor, reward gaming, safe exploration, as well as robustness to self-modification, distributional shift, and adversaries. To measure compliance with the intended safe behavior, we equip each environment with a performance function that is hidden from the agent. This allows us to categorize AI safety problems into robustness and specification problems, depending on whether the performance function corresponds to the observed reward function. We evaluate A2C and Rainbow, two recent deep reinforcement learning agents, on our environments and show that they are not able to solve them satisfactorily.
We address the problem of prescribing an optimal decision in a framework where its cost depends on uncertain problem parameters $Y$ that need to be learned from data. Earlier work by Bertsimas and Kallus (2014) transforms classical machine learning methods that merely predict $Y$ from supervised training data $[(x_1, y_1), \dots, (x_n, y_n)]$ into prescriptive methods taking optimal decisions specific to a particular covariate context $X=\bar x$. Their prescriptive methods factor in additional observed contextual information on a potentially large number of covariates $X=\bar x$ to take context specific actions $z(\bar x)$ which are superior to any static decision $z$. Any naive use of limited training data may, however, lead to gullible decisions over-calibrated to one particular data set. In this paper, we borrow ideas from distributionally robust optimization and the statistical bootstrap of Efron (1982) to propose two novel prescriptive methods based on (nw) Nadaraya-Watson and (nn) nearest-neighbors learning which safeguard against overfitting and lead to improved out-of-sample performance. Both resulting robust prescriptive methods reduce to tractable convex optimization problems and enjoy a limited disappointment on bootstrap data. We illustrate the data-driven decision-making framework and our novel robustness notion on a small news vendor problem as well as a small portfolio allocation problem.
The next-generation wireless networks are evolving into very complex systems because of the very diversified service requirements, heterogeneity in applications, devices, and networks. The mobile network operators (MNOs) need to make the best use of the available resources, for example, power, spectrum, as well as infrastructures. Traditional networking approaches, i.e., reactive, centrally-managed, one-size-fits-all approaches and conventional data analysis tools that have limited capability (space and time) are not competent anymore and cannot satisfy and serve that future complex networks in terms of operation and optimization in a cost-effective way. A novel paradigm of proactive, self-aware, self- adaptive and predictive networking is much needed. The MNOs have access to large amounts of data, especially from the network and the subscribers. Systematic exploitation of the big data greatly helps in making the network smart, intelligent and facilitates cost-effective operation and optimization. In view of this, we consider a data-driven next-generation wireless network model, where the MNOs employ advanced data analytics for their networks. We discuss the data sources and strong drivers for the adoption of the data analytics and the role of machine learning, artificial intelligence in making the network intelligent in terms of being self-aware, self-adaptive, proactive and prescriptive. A set of network design and optimization schemes are presented with respect to data analytics. The paper is concluded with a discussion of challenges and benefits of adopting big data analytics and artificial intelligence in the next-generation communication system.
Tensor completion is a problem of filling the missing or unobserved entries of partially observed tensors. Due to the multidimensional character of tensors in describing complex datasets, tensor completion algorithms and their applications have received wide attention and achievement in data mining, computer vision, signal processing, and neuroscience, etc. In this survey, we provide a modern overview of recent advances in tensor completion algorithms from the perspective of big data analytics characterized by diverse variety, large volume, and high velocity. Towards a better comprehension and comparison of vast existing advances, we summarize and categorize them into four groups including general tensor completion algorithms, tensor completion with auxiliary information (variety), scalable tensor completion algorithms (volume) and dynamic tensor completion algorithms (velocity). Besides, we introduce their applications on real-world data-driven problems and present an open-source package covering several widely used tensor decomposition and completion algorithms. Our goal is to summarize these popular methods and introduce them to researchers for promoting the research process in this field and give an available repository for practitioners. In the end, we also discuss some challenges and promising research directions in this community for future explorations.
This paper presents a new adversarial learning method for generative conversational agents (GCA) besides a new model of GCA. Similar to previous works on adversarial learning for dialogue generation, our method assumes the GCA as a generator that aims at fooling a discriminator that labels dialogues as human-generated or machine-generated; however, in our approach, the discriminator performs token-level classification, i.e. it indicates whether the current token was generated by humans or machines. To do so, the discriminator also receives the context utterances (the dialogue history) and the incomplete answer up to the current token as input. This new approach makes possible the end-to-end training by backpropagation. A self-conversation process enables to produce a set of generated data with more diversity for the adversarial training. This approach improves the performance on questions not related to the training data. Experimental results with human and adversarial evaluations show that the adversarial method yields significant performance gains over the usual teacher forcing training.
Quantitative CBA is a postprocessing algorithm for association rule classification algorithm CBA (Liu et al, 1998). QCBA uses original, undiscretized numerical attributes to optimize the discovered association rules, refining the boundaries of literals in the antecedent of the rules produced by CBA. Some rules as well as literals from the rules can consequently be removed, which makes the resulting classifier smaller. One-rule classification and crisp rules make CBA classification models possibly most comprehensible among all association rule classification algorithms. These viable properties are retained by QCBA. The postprocessing is conceptually fast, because it is performed on a relatively small number of rules that passed data coverage pruning in CBA. Benchmark of our QCBA approach on 22 UCI datasets shows average 53% decrease in the total size of the model as measured by the total number of conditions in all rules. Model accuracy remains on the same level as for CBA.
In this paper we present an alternative strategy for fine-tuning the parameters of a network. We named the technique Gradual Tuning. Once trained on a first task, the network is fine-tuned on a second task by modifying a progressively larger set of the network’s parameters. We test Gradual Tuning on different transfer learning tasks, using networks of different sizes trained with different regularization techniques. The result shows that compared to the usual fine tuning, our approach significantly reduces catastrophic forgetting of the initial task, while still retaining comparable if not better performance on the new task.
Object ranking or ‘learning to rank’ is an important problem in the realm of preference learning. On the basis of training data in the form of a set of rankings of objects represented as feature vectors, the goal is to learn a ranking function that predicts a linear order of any new set of objects. In this paper, we propose a new approach to object ranking based on principles of analogical reasoning. More specifically, our inference pattern is formalized in terms of so-called analogical proportions and can be summarized as follows: Given objects $A,B,C,D$, if object $A$ is known to be preferred to $B$, and $C$ relates to $D$ as $A$ relates to $B$, then $C$ is (supposedly) preferred to $D$. Our method applies this pattern as a main building block and combines it with ideas and techniques from instance-based learning and rank aggregation. Based on first experimental results for data sets from various domains (sports, education, tourism, etc.), we conclude that our approach is highly competitive. It appears to be specifically interesting in situations in which the objects are coming from different subdomains, and which hence require a kind of knowledge transfer.
In face-related applications with a public available dataset, synthesizing non-linear facial variations (e.g., facial expression, head-pose, illumination, etc.) through a generative model is helpful in addressing the lack of training data. In reality, however, there is insufficient data to even train the generative model for face synthesis. In this paper, we propose Differential Generative Adversarial Networks (D-GAN) that can perform photo-realistic face synthesis even when training data is small. Two adversarial networks are devised to ensure the generator to approximate a face manifold, which can express face changes as it wants. Experimental results demonstrate that the proposed method is robust to the amount of training data and synthesized images are useful to improve the performance of a face expression classifier.
In this paper, we extend the methodology developed for Support Vector Machines (SVM) using $\ell_2$-norm ($\ell_2$-SVM) to the more general case of $\ell_p$-norms with $p\ge 1$ ($\ell_p$-SVM). The resulting primal and dual problems are formulated as mathematical programming problems; namely, in the primal case, as a second order cone optimization problem and in the dual case, as a polynomial optimization problem involving homogeneous polynomials. Scalability of the primal problem is obtained via general transformations based on the expansion of functionals in Schauder spaces. The concept of Kernel function, widely applied in $\ell_2$-SVM, is extended to the more general case by defining a new operator called multidimensional Kernel. This object gives rise to reformulations of dual problems, in a transformed space of the original data, which are solved by a moment-sdp based approach. The results of some computational experiments on real-world datasets are presented showing rather good behavior in terms of standard indicators such a \textit{accuracy index} and its ability to classify new data.
Generative adversarial networks (GAN) are a powerful subclass of generative models. Despite a very rich research activity leading to numerous interesting GAN algorithms, it is still very hard to assess which algorithm(s) perform better than others. We conduct a neutral, multi-faceted large-scale empirical study on state-of-the art models and evaluation measures. We find that most models can reach similar scores with enough hyperparameter optimization and random restarts. This suggests that improvements can arise from a higher computational budget and tuning more than fundamental algorithmic changes. To overcome some limitations of the current metrics, we also propose several data sets on which precision and recall can be computed. Our experimental results suggest that future GAN research should be based on more systematic and objective evaluation procedures. Finally, we did not find evidence that any of the tested algorithms consistently outperforms the original one.
This paper introduces a statistical model for the arrival times of connection events in a computer network. Edges between nodes in a network can be interpreted and modelled as point processes where events in the process indicate information being sent along that edge. A model of normal behaviour can be constructed for each edge in the network by identifying key network user features such as seasonality and self-exciting behaviour, where events typically arise in bursts at particular times of day. When monitoring the network in real time, unusual patterns of activity could indicate the presence of a malicious actor. Four different models for self-exciting behaviour are introduced and compared using data collected from the Imperial College and Los Alamos National Laboratory computer networks.
We consider the problem of identifying groups of mutually associated variables in moderate or high dimensional data. In many cases, ordinary Pearson correlation provides useful information concerning the linear relationship between variables. However, for binary data, ordinary correlation may lose power and may lack interpretability. In this paper, we develop and investigate a new method called Latent Association Mining in Binary Data (LAMB). The LAMB method is built on the assumption that the binary observations represent a random thresholding of a latent continuous variable that may have a complex correlation structure. We consider a new measure of association, latent correlation, that is designed to assess association in the underlying continuous variable, without bias due to the mediating effects of the thresholding procedure. The full LAMB procedure makes use of iterative hypothesis testing to identify groups of latently correlated variables. LAMB is shown to improve power over existing methods in simulated settings, to be computationally efficient for large datasets, and to uncover new meaningful results from common real data types.
A supervised learning algorithm searches over a set of functions $A \to B$ parametrised by a space $P$ to find the best approximation to some ideal function $f\colon A \to B$. It does this by taking examples $(a,f(a)) \in A\times B$, and updating the parameter according to some rule. We define a category where these update rules may be composed, and show that gradient descent—with respect to a fixed step size and an error function satisfying a certain property—defines a monoidal functor from a category of parametrised functions to this category of update rules. This provides a structural perspective on backpropagation, as well as a broad generalisation of neural networks.