What is the Machine Learning?

Applications of machine learning tools to problems of physical interest are often criticized for producing sensitivity at the expense of transparency. To address this concern, we explore a data planing procedure for identifying combinations of variables, aided by physical intuition, that can discriminate signal from background. Weights are introduced to smooth away the features in a given variable or set of variables. New networks are then trained on this modified data. Observed decreases in sensitivity diagnose the variable’s discriminating power. Planing also allows the investigation of the linear versus non-linear nature of the boundaries between signal and background. We demonstrate the efficacy of this approach using a toy example, followed by an application to an idealized heavy resonance scenario at the Large Hadron Collider. By unpacking the information being utilized by these algorithms, this method puts in context what it means for a machine to learn.
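The weighting step can be illustrated with a small sketch. It assumes that "smoothing away" a variable means reweighting each sample by the inverse occupancy of its histogram bin, so that the weighted distribution of that variable is flat. The function name `planing_weights`, the binning, and the toy data are hypothetical and are not taken from the paper.

```python
from collections import Counter

def planing_weights(values, lo, hi, n_bins):
    """Return (weights, bin indices) so the weighted histogram of `values` is flat.

    Assumes every value lies in [lo, hi]; this is an illustrative choice only.
    """
    width = (hi - lo) / n_bins
    # Bin index of each sample; the right edge is clamped into the last bin.
    bins = [min(int((v - lo) / width), n_bins - 1) for v in values]
    counts = Counter(bins)
    # Inverse bin occupancy: every populated bin ends up with total weight 1.
    weights = [1.0 / counts[b] for b in bins]
    return weights, bins

# Hypothetical toy sample that is peaked at low values of the variable.
signal = [0.1, 0.15, 0.2, 0.25, 0.6, 0.9]
weights, bins = planing_weights(signal, 0.0, 1.0, 4)

# Sum the weights per bin to confirm the peak has been flattened.
totals = {}
for w, b in zip(weights, bins):
    totals[b] = totals.get(b, 0.0) + w
print(totals)
```

In a planing study one would presumably compute such weights separately for signal and background and pass them to the training loss, then compare the sensitivity of the retrained network with the original.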

L1-norm Kernel PCA

We present the first model and algorithm for L1-norm kernel PCA. While L2-norm kernel PCA has been widely studied, there has been no work on L1-norm kernel PCA. For this non-convex and non-smooth problem, we offer geometric understandings through reformulations and present an efficient algorithm where the kernel trick is applicable. To attest to the efficiency of the algorithm, we provide a convergence analysis including a linear rate of convergence. Moreover, we prove that the output of our algorithm is a locally optimal solution to the L1-norm kernel PCA problem. We also numerically show its robustness when extracting principal components in the presence of influential outliers, as well as its runtime comparability to L2-norm kernel PCA. Lastly, we introduce its application to outlier detection and show that the L1-norm kernel PCA based model outperforms the alternatives considered, especially for high-dimensional data.

A Distributed Algorithm for Least Square Solutions of Linear Equations

A distributed discrete-time algorithm is proposed for multi-agent networks to achieve a common least squares solution of a group of linear equations, in which each agent only knows some of the equations and is only able to receive information from its nearby neighbors. For fixed, connected, and undirected networks, the proposed discrete-time algorithm results in each agent's solution estimate converging exponentially fast to the same least squares solution. Moreover, the convergence does not require careful choices of time-varying small step sizes.

Comparison of PCA with ICA from data distribution perspective

We performed an empirical comparison of ICA and PCA algorithms by applying them to two simulated noisy time series with varying distribution parameters and noise levels. In general, ICA shows better results than PCA because it takes into account higher moments of the data distribution. On the other hand, PCA remains quite sensitive to the level of correlation among signals.

Intelligence Quotient and Intelligence Grade of Artificial Intelligence

Although artificial intelligence is currently one of the most interesting areas in scientific research, the potential threats posed by emerging AI systems remain a source of persistent controversy. To address the issue of AI threat, this study proposes a standard intelligence model that unifies AI and human characteristics in terms of four aspects of knowledge, i.e., input, output, mastery, and creation. Using this model, we observe three challenges, namely, expanding the von Neumann architecture; testing and ranking the intelligence quotient of naturally and artificially intelligent systems, including humans, Google, Bing, Baidu, and Siri; and finally, dividing artificially intelligent systems into seven grades from robots to Google Brain. Based on this, we conclude that AlphaGo belongs to the third grade.

Explainable Planning

As AI is increasingly being adopted into application solutions, the challenge of supporting interaction with humans is becoming more apparent. Partly this is to support integrated working styles, in which humans and intelligent systems cooperate in problem-solving, but also it is a necessary step in the process of building trust as humans migrate greater responsibility to such systems. The challenge is to find effective ways to communicate the foundations of AI-driven behaviour, when the algorithms that drive it are far from transparent to humans. In this paper we consider the opportunities that arise in AI planning, exploiting the model-based representations that form a familiar and common basis for communication with users, while acknowledging the gap between planning algorithms and human problem-solving.

Deep Competitive Pathway Networks

In the design of deep neural architectures, recent studies have demonstrated the benefits of grouping subnetworks into a larger network. For example, the Inception architecture integrates multi-scale subnetworks, and a residual unit in the residual network can be regarded as combining a residual subnetwork with an identity shortcut. In this work, we embrace this observation and propose the Competitive Pathway Network (CoPaNet). The CoPaNet comprises a stack of competitive pathway units, and each unit contains multiple parallel residual-type subnetworks followed by a max operation for feature competition. This mechanism enhances the model capability by learning a variety of features in subnetworks. The proposed strategy explicitly shows that the features propagate through pathways in various routing patterns, which is referred to as pathway encoding of category information. Moreover, a cross-block shortcut can be added to the CoPaNet to encourage feature reuse. We evaluated the proposed CoPaNet on four object recognition benchmarks: CIFAR-10, CIFAR-100, SVHN, and ImageNet. CoPaNet obtained state-of-the-art or comparable results with a similar number of parameters. The code of CoPaNet is available at: https://…/CoPaNet.
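To make the competitive pathway unit concrete, here is a minimal pure-Python sketch. It assumes each pathway is a residual-type block (identity shortcut plus a ReLU of a linear map) and that the unit takes an elementwise max across pathways. The helper names and toy weights are hypothetical, and the linear map stands in for the convolutional layers of the real CoPaNet.

```python
def relu(v):
    return [max(0.0, a) for a in v]

def linear(weights, v):
    # Plain matrix-vector product; a stand-in for the convolutional layers.
    return [sum(w * a for w, a in zip(row, v)) for row in weights]

def residual_path(weights, x):
    # Residual-type subnetwork: identity shortcut plus a transformed branch.
    return [a + f for a, f in zip(x, relu(linear(weights, x)))]

def competitive_unit(paths, x):
    """Elementwise max over parallel residual paths, plus the winning path per feature."""
    outs = [residual_path(w, x) for w in paths]
    y = [max(vals) for vals in zip(*outs)]
    winners = [vals.index(max(vals)) for vals in zip(*outs)]
    return y, winners

# Hypothetical toy weights for two pathways acting on a 2-dimensional input.
w1 = [[1.0, 0.0], [0.0, 1.0]]
w2 = [[0.0, 0.0], [-1.0, -1.0]]
y, winners = competitive_unit([w1, w2], [1.0, -2.0])
print(y, winners)  # [2.0, -1.0] [0, 1]
```

The `winners` list records which pathway supplied each output feature. This is one simple way to read the "routing pattern" idea in the abstract, not necessarily the authors' exact definition of pathway encoding.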

A Nonlinear Orthogonal Non-Negative Matrix Factorization Approach to Subspace Clustering

A recent theoretical analysis shows the equivalence between non-negative matrix factorization (NMF) and spectral-clustering-based approaches to subspace clustering. As NMF and many of its variants are essentially linear, we introduce a nonlinear NMF with explicit orthogonality and derive general kernel-based orthogonal multiplicative update rules to solve the subspace clustering problem. Within the nonlinear orthogonal NMF framework, we propose two subspace clustering algorithms, named kernel-based non-negative subspace clustering KNSC-Ncut and KNSC-Rcut, and establish their connection with spectral normalized cut and ratio cut clustering. We further extend the nonlinear orthogonal NMF framework and introduce a graph regularization to obtain a factorization that respects the local geometric structure of the data after the nonlinear mapping. The proposed NMF-based approach to subspace clustering takes into account the nonlinear nature of the manifold, as well as its intrinsic local geometry, which considerably improves the clustering performance compared to several recently proposed state-of-the-art methods.

Clustering of imbalanced high-dimensional media data

Media content in large repositories usually exhibits multiple groups of strongly varying sizes. Media of potential interest often form notably smaller groups. Such media groups differ so much from the remaining data that it may be worthwhile to look at them in more detail. In contrast, media with popular content appear in larger groups. Identifying groups of varying sizes is addressed by clustering of imbalanced data. Clustering highly imbalanced media groups is additionally challenged by the high dimensionality of the underlying features. In this paper, we present the Imbalanced Clustering (IClust) algorithm designed to reveal group structures in high-dimensional media data. IClust employs an existing clustering method to find an initial set of a large number of potentially highly pure clusters, which are then successively merged. The main advantage of IClust is that the number of clusters does not have to be pre-specified and that no specific assumptions about the cluster or data characteristics need to be made. Experiments on real-world media data demonstrate that, in comparison to existing methods, IClust is able to better identify media groups, especially groups of small sizes.

Learning how to learn: an adaptive dialogue agent for incrementally learning visually grounded word meanings

We present an optimised multi-modal dialogue agent for interactive learning of visually grounded word meanings from a human tutor, trained on real human-human tutoring data. Within a life-long interactive learning period, the agent, trained using Reinforcement Learning (RL), must be able to handle natural conversations with human users and achieve good learning performance (accuracy) while minimising human effort in the learning process. We train and evaluate this system in interaction with a simulated human tutor, which is built on the BURCHAK corpus — a Human-Human Dialogue dataset for the visual learning task. The results show that: 1) The learned policy can coherently interact with the simulated user to achieve the goal of the task (i.e. learning visual attributes of objects, e.g. colour and shape); and 2) it finds a better trade-off between classifier accuracy and tutoring costs than hand-crafted rule-based policies, including ones with dynamic policies.

Entity Consolidation: The Golden Record Problem

Four key subprocesses in data integration are: data preparation (i.e., transforming and cleaning data), schema integration (i.e., lining up like attributes), entity resolution (i.e., finding clusters of records that represent the same entity) and entity consolidation (i.e., merging each cluster into a ‘golden record’ which contains the canonical values for each attribute). In real scenarios, the output of entity resolution typically contains multiple data formats and different abbreviations for cell values, in addition to the omnipresent problem of missing data. These issues make entity consolidation challenging. In this paper, we study the entity consolidation problem. Truth discovery systems can be used to solve this problem. They usually employ simplistic heuristics such as majority consensus (MC) or source authority to determine the golden record. However, these techniques are not capable of recognizing simple data variation, such as Jeff to Jeffery, and may give biased results. To address this issue, we propose to first reduce attribute variation by merging duplicate values before applying the truth discovery system to create the golden records. Compared to existing data transformation solutions, which typically try to transform an entire column from one format to another, our approach is more robust to data variety as we leverage the hidden matchings within the clusters. We tried our methods on three real-world datasets. In the best case, our methods reduced the variation in clusters by 75% with high precision (>98%) by having a human confirm only 100 generated matching groups. When we invoked our algorithm prior to running MC, we were able to improve the precision of golden record creation by 40%.
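A small sketch may help show why merging value variants before majority consensus matters. It assumes a cluster is a list of raw values for one attribute and that a human has confirmed a mapping from variants to a canonical form. The function names, the mapping, and the toy address data are all hypothetical, and the paper's method for generating candidate matching groups is not reproduced here.

```python
from collections import Counter

def majority_consensus(values):
    """Pick the most frequent non-missing value in a cluster (plain MC)."""
    present = [v for v in values if v is not None]
    if not present:
        return None
    return Counter(present).most_common(1)[0][0]

def consolidate(values, confirmed_matches):
    """Rewrite confirmed variants to their canonical form, then run MC."""
    canonical = [confirmed_matches.get(v, v) for v in values]
    return majority_consensus(canonical)

# Hypothetical cluster: three spellings of one address and two copies of another.
cluster = ["5th Ave", "Fifth Avenue", "5th Avenue", "9 Main St", "9 Main St", None]

# Hypothetical human-confirmed matching group.
matches = {"5th Ave": "Fifth Avenue", "5th Avenue": "Fifth Avenue"}

print(majority_consensus(cluster))    # 9 Main St
print(consolidate(cluster, matches))  # Fifth Avenue
```

Plain MC splits the vote across the three spellings and picks the minority address. After the variants are merged, the same heuristic returns the value supported by most records.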

Online Load Balancing for Related Machines

In the load balancing problem, introduced by Graham in the 1960s (SIAM J. of Appl. Math. 1966, 1969), jobs arriving online have to be assigned to machines so as to minimize an objective defined on machine loads. A long line of work has addressed this problem for both the makespan norm and arbitrary $\ell_q$-norms of machine loads. Recent literature (e.g., Azar et al., STOC 2013; Im et al., FOCS 2015) has further expanded the scope of this problem to vector loads, to capture jobs with multi-dimensional resource requirements in applications such as data centers. In this paper, we completely resolve the job scheduling problem for both scalar and vector jobs on related machines, i.e., where each machine has a given speed and the time taken to process a job is inversely proportional to the speed of the machine it is assigned to. We show the following results. For scalar scheduling, we give a constant competitive algorithm for optimizing any $\ell_q$-norm for related machines. The only previously known result was for the makespan norm. For vector scheduling, there are two natural variants, depending on whether the speed of a machine is dimension-dependent or not. We show a sharp contrast between these two variants, proving that they are respectively equivalent to unrelated machines and identical machines for the makespan norm. We also extend these results to arbitrary $\ell_q$-norms of the machine loads. No previous results were known for vector scheduling on related machines.
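The related-machines setting can be illustrated with a simple greedy baseline. This is not the paper's algorithm. It only shows the model, in which a job of size s placed on a machine of speed v adds s / v to that machine's load, together with the two objectives mentioned (makespan and the q-norm of the loads). The function names and toy instance are hypothetical.

```python
def greedy_assign(jobs, speeds):
    """Assign each arriving job to the machine whose load after assignment is smallest."""
    loads = [0.0] * len(speeds)
    assignment = []
    for size in jobs:
        # Processing time on machine i is size / speeds[i] (related machines).
        best = min(range(len(speeds)), key=lambda i: loads[i] + size / speeds[i])
        loads[best] += size / speeds[best]
        assignment.append(best)
    return assignment, loads

def lq_norm(loads, q):
    """The q-norm of the load vector; large q approaches the makespan."""
    return sum(load ** q for load in loads) ** (1.0 / q)

# Hypothetical instance: three jobs arriving online, one fast and one slow machine.
jobs = [4.0, 2.0, 2.0]
speeds = [2.0, 1.0]
assignment, loads = greedy_assign(jobs, speeds)

print(assignment, loads)  # [0, 1, 0] [3.0, 2.0]
print(max(loads))         # makespan: 3.0
print(lq_norm(loads, 2))  # 2-norm of the loads
```

The greedy rule targets the makespan only. A competitive guarantee for general q-norms, as claimed in the abstract, would need the paper's own construction.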

Distributed Join-the-Idle-Queue for Low Latency Cloud Services
Mechanism Design for Demand Response Programs with financial and non-monetary (social) Incentives
Optimal Online Learning with Randomized Feedback Graphs with Application in PUE Attacks in CRN
A mixed-integer branching approach for very small formulations of disjunctive constraints
Performance Evaluation of Container-based Virtualization for High Performance Computing Environments
Resilient Learning-Based Control for Synchronization of Passive Multi-Agent Systems under Attack
Energy Constrained Depth First Search
Finite-Time Distributed Linear Equation Solver for Minimum $l_1$ Norm Solutions
A Web of Hate: Tackling Hateful Speech in Online Social Spaces
Deep TAMER: Interactive Agent Shaping in High-Dimensional State Spaces
Emergent failures and cascades in power grids: a statistical physics perspective
Diagonal stability of a class of discrete-time positive switched systems with delay
Recognition of feature curves on 3D shapes using an algebraic approach to Hough transforms
Distance ideals of graphs
Possibilistic Fuzzy Local Information C-Means for Sonar Image Segmentation
On some geometrical methods leading to martingales useful in measure theory
Unified Deep Supervised Domain Adaptation and Generalization
Jointly Trained Sequential Labeling and Classification by Sparse Attention Neural Networks
Fast Barcode Retrieval for Consensus Contouring
Balanced complexes and effective divisors on $\overline{M}_{0,n}$
On the Approximation of Toeplitz Operators for Nonparametric $\mathcal{H}_\infty$-norm Estimation
A Neural Comprehensive Ranker (NCR) for Open-Domain Question Answering
Neural and Synaptic Array Transceiver: A Brain-Inspired Computing Framework for Embedded Learning
Ground-Truth Adversarial Examples
Reservoir Computing using Stochastic p-Bits
The First Evaluation of Chinese Human-Computer Dialogue Technology
Information Geometry Connecting Wasserstein Distance and Kullback-Leibler Divergence via the Entropy-Relaxed Transportation Problem
Generalized Polyhedral Convex Optimization Problems
Light Cascaded Convolutional Neural Networks for Accurate Player Detection
Duality between cooperation and defection in the presence of tit-for-tat in replicator dynamics
Achievable Rate of Relay Assisted Cooperative O-NOMA under Rician Fading Channels
In reply to Faes et al. and Barnett et al. regarding ‘A study of problems encountered in Granger causality analysis from a neuroscience perspective’
DAGGER: A sequential algorithm for FDR control on DAGs
Ranked Enumeration of Minimal Triangulations
Recognizing Matroids
Robust Estimation in High Dimensional Generalized Linear Models
A Matsumoto-Yor characterization for Kummer and Wishart random matrices
Fast online low-rank tensor subspace tracking by CP decomposition using recursive least squares from incomplete observations
Shapley Facility Location Games
Non-parametric Message Important Measure: Storage Code Design and Transmission Planning for Big Data
Beyond the law of large numbers: Introducing progressive sampling, weaving, the geometric triangle, and corresponding distributions
Barrier Coverage with Non-uniform Lengths to Minimize Aggregate Movements
The Quality of Equilibria for Set Packing Games
Two-dimensional anisotropic random walks: fixed versus random column configurations for transport phenomena
Spin glasses: experimental signatures and salient outcomes
Classification of the Bounds on the Probability of Ruin for Lévy Processes with Light-tailed Jumps
Privacy Preserving Identification Using Sparse Approximation with Ambiguization
Structure estimation of binary graphical models on stratified data: application to the description of injury tables for victims of road accidents
Fast Computation of Graph Edit Distance
Non-approximability and Polylogarithmic Approximations of the Single-Sink Unsplittable and Confluent Dynamic Flow Problems
Non-Ergodic Delocalization in the Rosenzweig-Porter Model
Fast generation of isotropic Gaussian random fields on the sphere
A design criterion for symmetric model discrimination based on nominal confidence sets
Redefine the correlation coefficient by experiment methods
A Variational Approach to Shape-from-shading Under Natural Illumination
Acyclic cluster algebras, reflection groups, and curves on a punctured disc
The Layered Structure of Tensor Estimation and its Mutual Information
Multi-Kernel Polar Codes: Proof of Polarization and Error Exponents
An Empirical Evaluation of Recurrent Neural Network Rule Extraction
Towards Universal Semantic Tagging
Obstacle problems for nonlocal operators
Runtime Distributions and Criteria for Restarts
Impacts and Benefits of UPFC to Wind Power Integration in Unit Commitment
Generalization of nonlinear control for nonlinear discrete systems
Central limit theorem for quasi-local statistics of spin models on Cayley graphs
Training an adaptive dialogue policy for interactive learning of visually grounded word meanings
The BURCHAK corpus: a Challenge Data Set for Interactive Learning of Visually Grounded Word Meanings
Convergence Analysis of Distributed Stochastic Gradient Descent with Shuffling
On the Capacity of Face Representation
Optimisation of photometric stereo methods by non-convex variational minimisation
Upper and lower bounds for rich lines in grids
A representer theorem for deep kernel learning
Adaptive Generation-Based Evolution Control for Gaussian Process Surrogate Models
Random surface growth and Karlin-McGregor polynomials
Synonym Discovery with Etymology-based Word Embeddings
The 2CNF Boolean Formula Satisfiability Problem and the Linear Space Hypothesis
Improving image generative models with human interactions
Cohen-Macaulay Property of pinched Veronese Rings
Regular Intersecting Families
Extrema-weighted feature extraction for functional data
On-the-Fly Array Initialization in Less Space
What Automated Planning can do for Business Process Management
Designing Real-Time Prices to Reduce Load Variability with HVAC
Symbol, Conversational, and Societal Grounding with a Toy Robot
Self-supervised Deep Reinforcement Learning with Generalized Computation Graphs for Robot Navigation
Human motion primitive discovery and recognition
A generalization of the Jensen divergence: The chord gap divergence
Discriminating between two models based on Bregman divergence in small samples
Building your path to escape from home
Vision-based deep execution monitoring
Self-avoiding walk on nonunimodular transitive graphs