Applying machine learning in the health care domain has shown promising results in recent years. Interpretable outputs from learning algorithms are desirable for decision making by health care personnel. In this work, we explore the possibility of utilizing causal relationships to refine diagnostic prediction. We focus on the task of diagnostic prediction using discomfort drawings, and explore two ways to employ causal identification to improve the diagnostic results. Firstly, we use causal identification to infer the causal relationships among diagnostic labels which, by itself, provides interpretable results to aid the decision making and training of health care personnel. Secondly, we suggest a post-processing approach where the inferred causal relationships are used to refine the prediction accuracy of a multi-view probabilistic model. Experimental results show firstly that causal identification is capable of detecting the causal relationships among diagnostic labels correctly, and secondly that there is potential for improving pain diagnostics prediction accuracy using the causal relationships.
Many application settings involve the analysis of timestamped relations or events between a set of entities, e.g. messages between users of an on-line social network. Static and discrete-time network models are typically used as analysis tools in these settings; however, they discard a significant amount of information by aggregating events over time to form network snapshots. In this paper, we introduce a block point process model (BPPM) for dynamic networks evolving in continuous time in the form of events at irregular time intervals. The BPPM is inspired by the well-known stochastic block model (SBM) for static networks and is a simpler version of the recently-proposed Hawkes infinite relational model (IRM). We show that networks generated by the BPPM follow an SBM in the limit of a growing number of nodes and leverage this property to develop an efficient inference procedure for the BPPM. We fit the BPPM to several real network data sets, including a Facebook network with over 3, 500 nodes and 130, 000 events, several orders of magnitude larger than the Hawkes IRM and other existing point process network models.
Incremental class learning involves sequentially learning classes in bursts of examples from the same class. This violates the assumptions that underlie methods for training standard deep neural networks, and will cause them to suffer from catastrophic forgetting. Arguably, the best method for incremental class learning is iCaRL, but it requires storing training examples for each class, making it challenging to scale. Here, we propose FearNet for incremental class learning. FearNet is a generative model that does not store previous examples, making it memory efficient. FearNet uses a brain-inspired dual-memory system in which new memories are consolidated from a network for recent memories inspired by the mammalian hippocampal complex to a network for long-term storage inspired by medial prefrontal cortex. Memory consolidation is inspired by mechanisms that occur during sleep. FearNet also uses a module inspired by the basolateral amygdala for determining which memory system to use for recall. FearNet achieves state-of-the-art performance at incremental class learning on image (CIFAR-100, CUB-200) and audio classification (AudioSet) benchmarks.
This paper presents a proof-of concept study for demonstrating the viability of building collaboration among multiple agents through standard Q learning algorithm embedded in particle swarm optimisation. Collaboration is formulated to be achieved among the agents via some sort competition, where the agents are expected to balance their action in such a way that none of them drifts away of the team and none intervene any fellow neighbours territory. Particles are devised with Q learning algorithm for self training to learn how to act as members of a swarm and how to produce collaborative/collective behaviours. The produced results are supportive to the algorithmic structures suggesting that a substantive collaboration can be build via proposed learning algorithm.
Recommender systems take inputs from user history, use an internal ranking algorithm to generate results and possibly optimize this ranking based on feedback. However, often the recommender system is unaware of the actual intent of the user and simply provides recommendations dynamically without properly understanding the thought process of the user. An intelligent recommender system is not only useful for the user but also for businesses which want to learn the tendencies of their users. Finding out tendencies or intents of a user is a difficult problem to solve. Keeping this in mind, we sought out to create an intelligent system which will keep track of the user’s activity on a web-application as well as determine the intent of the user in each session. We devised a way to encode the user’s activity through the sessions. Then, we have represented the information seen by the user in a high dimensional format which is reduced to lower dimensions using tensor factorization techniques. The aspect of intent awareness (or scoring) is dealt with at this stage. Finally, combining the user activity data with the contextual information gives the recommendation score. The final recommendations are then ranked using filtering and collaborative recommendation techniques to show the top-k recommendations to the user. A provision for feedback is also envisioned in the current system which informs the model to update the various weights in the recommender system. Our overall model aims to combine both frequency-based and context-based recommendation systems and quantify the intent of a user to provide better recommendations. We ran experiments on real-world timestamped user activity data, in the setting of recommending reports to the users of a business analytics tool and the results are better than the baselines. We also tuned certain aspects of our model to arrive at optimized results.
Outlier detection plays an essential role in many data-driven applications to identify isolated instances that are different from the majority. While many statistical learning and data mining techniques have been used for developing more effective outlier detection algorithms, the interpretation of detected outliers does not receive much attention. Interpretation is becoming increasingly important to help people trust and evaluate the developed models through providing intrinsic reasons why the certain outliers are chosen. It is difficult, if not impossible, to simply apply feature selection for explaining outliers due to the distinct characteristics of various detection models, complicated structures of data in certain applications, and imbalanced distribution of outliers and normal instances. In addition, the role of contrastive contexts where outliers locate, as well as the relation between outliers and contexts, are usually overlooked in interpretation. To tackle the issues above, in this paper, we propose a novel Contextual Outlier INterpretation (COIN) method to explain the abnormality of existing outliers spotted by detectors. The interpretability for an outlier is achieved from three aspects: outlierness score, attributes that contribute to the abnormality, and contextual description of its neighborhoods. Experimental results on various types of datasets demonstrate the flexibility and effectiveness of the proposed framework compared with existing interpretation approaches.
The TensorFlow Distributions library implements a vision of probability theory adapted to the modern deep-learning paradigm of end-to-end differentiable computation. Building on two basic abstractions, it offers flexible building blocks for probabilistic computation. Distributions provide fast, numerically stable methods for generating samples and computing statistics, e.g., log density. Bijectors provide composable volume-tracking transformations with automatic caching. Together these enable modular construction of high dimensional distributions and transformations not possible with previous libraries (e.g., pixelCNNs, autoregressive flows, and reversible residual networks). They are the workhorse behind deep probabilistic programming systems like Edward and empower fast black-box inference in probabilistic models built on deep-network components. TensorFlow Distributions has proven an important part of the TensorFlow toolkit within Google and in the broader deep learning community.
Centrality rankings such as degree, closeness, betweenness, Katz, PageRank, etc. are commonly used to identify critical nodes in a graph. These methods are based on two assumptions that restrict their wider applicability. First, they assume the exact topology of the network is available. Secondly, they do not take into account the activity over the network and only rely on its topology. However, in many applications, the network is autonomous, vast, and distributed, and it is hard to collect the exact topology. At the same time, the underlying pairwise activity between node pairs is not uniform and node criticality strongly depends on the activity on the underlying network. In this paper, we propose active betweenness cardinality, as a new measure, where the node criticalities are based on not the static structure, but the activity of the network. We show how this metric can be computed efficiently by using only local information for a given node and how we can find the most critical nodes starting from only a few nodes. We also show how this metric can be used to monitor a network and identify failed nodes.We present experimental results to show effectiveness by demonstrating how the failed nodes can be identified by measuring active betweenness cardinality of a few nodes in the system.
Ordinary least square (OLS) estimation of a linear regression model is well-known to be highly sensitive to outliers. It is common practice to first identify and remove outliers by looking at the data then to fit OLS and form confidence intervals and p-values on the remaining data as if this were the original data collected. We show in this paper that this ‘detect-and-forget’ approach can lead to invalid inference, and we propose a framework that properly accounts for outlier detection and removal to provide valid confidence intervals and hypothesis tests. Our inferential procedures apply to any outlier removal procedure that can be characterized by a set of quadratic constraints on the response vector, and we show that several of the most commonly used outlier detection procedures are of this form. Our methodology is built upon recent advances in selective inference (Taylor & Tibshirani 2015), which are focused on inference corrected for variable selection. We conduct simulations to corroborate the theoretical results, and we apply our method to two classic data sets considered in the outlier detection literature to illustrate how our inferential results can differ from the traditional detect-and-forget strategy. A companion R package, outference, implements these new procedures with an interface that matches the functions commonly used for inference with lm in R.
The Rapid and Accurate Image Super Resolution (RAISR) method of Romano, Isidoro, and Milanfar is a computationally efficient image upscaling method using a trained set of filters. We describe a generalization of RAISR, which we name Best Linear Adaptive Enhancement (BLADE). This approach is a trainable edge-adaptive filtering framework that is general, simple, computationally efficient, and useful for a wide range of image processing problems. We show applications to denoising, compression artifact removal, demosaicing, and approximation of anisotropic diffusion equations.
Network embedding aims to learn the low-dimensional representations of vertexes in a network, while structure and inherent properties of the network is preserved. Existing network embedding works primarily focus on preserving the microscopic structure, such as the first- and second-order proximity of vertexes, while the macroscopic scale-free property is largely ignored. Scale-free property depicts the fact that vertex degrees follow a heavy-tailed distribution (i.e., only a few vertexes have high degrees) and is a critical property of real-world networks, such as social networks. In this paper, we study the problem of learning representations for scale-free networks. We first theoretically analyze the difficulty of embedding and reconstructing a scale-free network in the Euclidean space, by converting our problem to the sphere packing problem. Then, we propose the ‘degree penalty’ principle for designing scale-free property preserving network embedding algorithm: punishing the proximity between high-degree vertexes. We introduce two implementations of our principle by utilizing the spectral techniques and a skip-gram model respectively. Extensive experiments on six datasets show that our algorithms are able to not only reconstruct heavy-tailed distributed degree distribution, but also outperform state-of-the-art embedding models in various network mining tasks, such as vertex classification and link prediction.
Previous work has shown that it is possible to train deep neural networks with low precision weights and activations. In the extreme case it is even possible to constrain the network to binary values. The costly floating point multiplications are then reduced to fast logical operations. High end smart phones such as Google’s Pixel 2 and Apple’s iPhone X are already equipped with specialised hardware for image processing and it is very likely that other future consumer hardware will also have dedicated accelerators for deep neural networks. Binary neural networks are attractive in this case because the logical operations are very fast and efficient when implemented in hardware. We propose a transfer learning based architecture where we first train a binary network on Imagenet and then retrain part of the network for different tasks while keeping most of the network fixed. The fixed binary part could be implemented in a hardware accelerator while the last layers of the network are evaluated in software. We show that a single binary neural network trained on the Imagenet dataset can indeed be used as a feature extractor for other datasets.
Tensors are multidimensional arrays of numerical values and therefore generalize matrices to multiple dimensions. While tensors first emerged in the psychometrics community in the $20^{\text{th}}$ century, they have since then spread to numerous other disciplines, including machine learning. Tensors and their decompositions are especially beneficial in unsupervised learning settings, but are gaining popularity in other sub-disciplines like temporal and multi-relational data analysis, too. The scope of this paper is to give a broad overview of tensors, their decompositions, and how they are used in machine learning. As part of this, we are going to introduce basic tensor concepts, discuss why tensors can be considered more rigid than matrices with respect to the uniqueness of their decomposition, explain the most important factorization algorithms and their properties, provide concrete examples of tensor decomposition applications in machine learning, conduct a case study on tensor-based estimation of mixture models, talk about the current state of research, and provide references to available software libraries.
Many machine learning systems utilize latent factors as internal representations for making predictions. However, since these latent factors are largely uninterpreted, predictions made using them are opaque. Collaborative filtering via matrix factorization is a prime example of such an algorithm that uses uninterpreted latent features, and yet has seen widespread adoption for many recommendation tasks. We present Latent Factor Interpretation (LFI), a method for interpreting models by leveraging interpretations of latent factors in terms of human-understandable features. The interpretation of latent factors can then replace the uninterpreted latent factors, resulting in a new model that expresses predictions in terms of interpretable features. This new model can then be interpreted using recently developed model explanation techniques. In this paper, we develop LFI for collaborative filtering based recommender systems, which are particularly challenging from an interpretation perspective. We illustrate the use of LFI interpretations on the MovieLens dataset demonstrating that latent factors can be predicted with enough accuracy for accurately replicating the predictions of the true model. Further, we demonstrate the accuracy of interpretations by applying the methodology to a collaborative recommender system using DB tropes and IMDB data and synthetic user preferences.