Applications of machine learning tools to problems of physical interest are often criticized for producing sensitivity at the expense of transparency. To address this concern, we explore a data planing procedure for identifying combinations of variables — aided by physical intuition — that can discriminate signal from background. Weights are introduced to smooth away the features in a given variable(s). New networks are then trained on this modified data. Observed decreases in sensitivity diagnose the variable’s discriminating power. Planing also allows the investigation of the linear versus non-linear nature of the boundaries between signal and background. We demonstrate the efficacy of this approach using a toy example, followed by an application to an idealized heavy resonance scenario at the Large Hadron Collider. By unpacking the information being utilized by these algorithms, this method puts in context what it means for a machine to learn.
We present the first model and algorithm for L1-norm kernel PCA. While L2-norm kernel PCA has been widely studied, there has been no work on L1-norm kernel PCA. For this non-convex and non-smooth problem, we offer geometric understandings through reformulations and present an efficient algorithm where the kernel trick is applicable. To attest the efficiency of the algorithm, we provide a convergence analysis including linear rate of convergence. Moreover, we prove that the output of our algorithm is a local optimal solution to the L1-norm kernel PCA problem. We also numerically show its robustness when extracting principal components in the presence of influential outliers, as well as its runtime comparability to L2-norm kernel PCA. Lastly, we introduce its application to outlier detection and show that the L1-norm kernel PCA based model outperforms especially for high dimensional data.
A distributed discrete-time algorithm is proposed for multi-agent networks to achieve a common least squares solution of a group of linear equations, in which each agent only knows some of the equations and is only able to receive information from its nearby neighbors. For fixed, connected, and undirected networks, the proposed discrete-time algorithm results in each agents solution estimate to converging exponentially fast to the same least squares solution. Moreover, the convergence does not require careful choices of time-varying small step sizes.
We performed an empirical comparison of ICA and PCA algorithms by applying them on two simulated noisy time series with varying distribution parameters and level of noise. In general, ICA shows better results than PCA because it takes into account higher moments of data distribution. On the other hand, PCA remains quite sensitive to the level of correlations among signals.
Although artificial intelligence is currently one of the most interesting areas in scientific research, the potential threats posed by emerging AI systems remain a source of persistent controversy. To address the issue of AI threat, this study proposes a standard intelligence model that unifies AI and human characteristics in terms of four aspects of knowledge, i.e., input, output, mastery, and creation. Using this model, we observe three challenges, namely, expanding of the von Neumann architecture; testing and ranking the intelligence quotient of naturally and artificially intelligent systems, including humans, Google, Bing, Baidu, and Siri; and finally, the dividing of artificially intelligent systems into seven grades from robots to Google Brain. Based on this, we conclude that AlphaGo belongs to the third grade.
As AI is increasingly being adopted into application solutions, the challenge of supporting interaction with humans is becoming more apparent. Partly this is to support integrated working styles, in which humans and intelligent systems cooperate in problem-solving, but also it is a necessary step in the process of building trust as humans migrate greater responsibility to such systems. The challenge is to find effective ways to communicate the foundations of AI-driven behaviour, when the algorithms that drive it are far from transparent to humans. In this paper we consider the opportunities that arise in AI planning, exploiting the model-based representations that form a familiar and common basis for communication with users, while acknowledging the gap between planning algorithms and human problem-solving.
In the design of deep neural architectures, recent studies have demonstrated the benefits of grouping subnetworks into a larger network. For examples, the Inception architecture integrates multi-scale subnetworks and the residual network can be regarded that a residual unit combines a residual subnetwork with an identity shortcut. In this work, we embrace this observation and propose the Competitive Pathway Network (CoPaNet). The CoPaNet comprises a stack of competitive pathway units and each unit contains multiple parallel residual-type subnetworks followed by a max operation for feature competition. This mechanism enhances the model capability by learning a variety of features in subnetworks. The proposed strategy explicitly shows that the features propagate through pathways in various routing patterns, which is referred to as pathway encoding of category information. Moreover, the cross-block shortcut can be added to the CoPaNet to encourage feature reuse. We evaluated the proposed CoPaNet on four object recognition benchmarks: CIFAR-10, CIFAR-100, SVHN, and ImageNet. CoPaNet obtained the state-of-the-art or comparable results using similar amounts of parameters. The code of CoPaNet is available at: https://…/CoPaNet.
A recent theoretical analysis shows the equivalence between non-negative matrix factorization (NMF) and spectral clustering based approach to subspace clustering. As NMF and many of its variants are essentially linear, we introduce a nonlinear NMF with explicit orthogonality and derive general kernel-based orthogonal multiplicative update rules to solve the subspace clustering problem. In nonlinear orthogonal NMF framework, we propose two subspace clustering algorithms, named kernel-based non-negative subspace clustering KNSC-Ncut and KNSC-Rcut and establish their connection with spectral normalized cut and ratio cut clustering. We further extend the nonlinear orthogonal NMF framework and introduce a graph regularization to obtain a factorization that respects a local geometric structure of the data after the nonlinear mapping. The proposed NMF-based approach to subspace clustering takes into account the nonlinear nature of the manifold, as well as its intrinsic local geometry, which considerably improves the clustering performance when compared to the several recently proposed state-of-the-art methods.
Media content in large repositories usually exhibits multiple groups of strongly varying sizes. Media of potential interest often form notably smaller groups. Such media groups differ so much from the remaining data that it may be worthy to look at them in more detail. In contrast, media with popular content appear in larger groups. Identifying groups of varying sizes is addressed by clustering of imbalanced data. Clustering highly imbalanced media groups is additionally challenged by the high dimensionality of the underlying features. In this paper, we present the Imbalanced Clustering (IClust) algorithm designed to reveal group structures in high-dimensional media data. IClust employs an existing clustering method in order to find an initial set of a large number of potentially highly pure clusters which are then successively merged. The main advantage of IClust is that the number of clusters does not have to be pre-specified and that no specific assumptions about the cluster or data characteristics need to be made. Experiments on real-world media data demonstrate that in comparison to existing methods, IClust is able to better identify media groups, especially groups of small sizes.
We present an optimised multi-modal dialogue agent for interactive learning of visually grounded word meanings from a human tutor, trained on real human-human tutoring data. Within a life-long interactive learning period, the agent, trained using Reinforcement Learning (RL), must be able to handle natural conversations with human users and achieve good learning performance (accuracy) while minimising human effort in the learning process. We train and evaluate this system in interaction with a simulated human tutor, which is built on the BURCHAK corpus — a Human-Human Dialogue dataset for the visual learning task. The results show that: 1) The learned policy can coherently interact with the simulated user to achieve the goal of the task (i.e. learning visual attributes of objects, e.g. colour and shape); and 2) it finds a better trade-off between classifier accuracy and tutoring costs than hand-crafted rule-based policies, including ones with dynamic policies.
Four key subprocesses in data integration are: data preparation (i.e., transforming and cleaning data), schema integration (i.e., lining up like attributes), entity resolution (i.e., finding clusters of records that represent the same entity) and entity consolidation (i.e., merging each cluster into a ‘golden record’ which contains the canonical values for each attribute). In real scenarios, the output of entity resolution typically contains multiple data formats and different abbreviations for cell values, in addition to the omnipresent problem of missing data. These issues make entity consolidation challenging. In this paper, we study the entity consolidation problem. Truth discovery systems can be used to solve this problem. They usually employ simplistic heuristics such as majority consensus (MC) or source authority to determine the golden record. However, these techniques are not capable of recognizing simple data variation, such as Jeff to Jeffery, and may give biased results. To address this issue, we propose to first reduce attribute variation by merging duplicate values before applying the truth discovery system to create the golden records. Comparing to the existing data transformation solutions, which typically try to transform an entire column from one format to another, our approach is more robust to data variety as we leverage the hidden matchings within the clusters. We tried our methods on three real-world datasets. In the best case, our methods reduced the variation in clusters by 75% with high precision (>98%) by having a human confirm only 100 generated matching groups. When we invoked our algorithm prior to running MC, we were able to improve the precision of golden record creation by 40%.
In the load balancing problem, introduced by Graham in the 1960s (SIAM J. of Appl. Math. 1966, 1969), jobs arriving online have to be assigned to machines so to minimize an objective defined on machine loads. A long line of work has addressed this problem for both the makespan norm and arbitrary $\ell_q$-norms of machine loads. Recent literature (e.g., Azar et al., STOC 2013; Im et al., FOCS 2015) has further expanded the scope of this problem to vector loads, to capture jobs with multi-dimensional resource requirements in applications such as data centers. In this paper, we completely resolve the job scheduling problem for both scalar and vector jobs on related machines, i.e., where each machine has a given speed and the time taken to process a job is inversely proportional to the speed of the machine it is assigned on. We show the following results. For scalar scheduling, we give a constant competitive algorithm for optimizing any $\ell_q$-norm for related machines. The only previously known result was for the makespan norm. For vector scheduling, there are two natural variants for vector scheduling, depending on whether the speed of a machine is dimension-dependent or not. We show a sharp contrast between these two variants, proving that they are respectively equivalent to unrelated machines and identical machines for the makespan norm. We also extend these results to arbitrary $\ell_q$-norms of the machine loads. No previous results were known for vector scheduling on related machines.