We present a simple LSTM-based transition-based dependency parser. Our model is composed of a single LSTM hidden layer replacing the hidden layer in the usual feed-forward network architecture. We propose a new initialization method that uses the pre-trained weights from a feed-forward neural network to initialize our LSTM-based model, and we show that using dropout on the input layer has a positive effect on performance. Our final parser achieves a 93.06% unlabeled and 91.01% labeled attachment score on the Penn Treebank. We additionally replace LSTMs with GRUs and Elman units in our model and explore the effectiveness of our initialization method on the individual gates constituting all three types of RNN units.
The matricized-tensor times Khatri-Rao product (MTTKRP) is the computational bottleneck for algorithms computing CP decompositions of tensors. In this paper, we develop shared-memory parallel algorithms for MTTKRP involving dense tensors. The algorithms cast nearly all of the computation as matrix operations in order to use optimized BLAS subroutines, and they avoid reordering tensor entries in memory. We benchmark sequential and parallel performance of our implementations, demonstrating high sequential performance and efficient parallel scaling. We use our parallel implementation to compute a CP decomposition of a neuroimaging data set and achieve a speedup of up to $7.4\times$ over existing parallel software.
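For readers unfamiliar with the operation, the mode-1 MTTKRP of a 3-way tensor can be written as a plain loop nest. The sketch below is only a naive reference definition in pure Python, assuming a dense tensor stored as nested lists; it is not the BLAS-based shared-memory algorithm the abstract describes, and the function name `mttkrp_mode1` is chosen here for illustration.

```python
def mttkrp_mode1(X, B, C):
    """Naive mode-1 MTTKRP for a dense 3-way tensor stored as nested lists.

    X has shape I x J x K, B is J x R, C is K x R. The result M is I x R with
    M[i][r] = sum over j, k of X[i][j][k] * B[j][r] * C[k][r].
    """
    I, J, K = len(X), len(X[0]), len(X[0][0])
    R = len(B[0])
    M = [[0.0] * R for _ in range(I)]
    for i in range(I):
        for j in range(J):
            for k in range(K):
                x = X[i][j][k]
                for r in range(R):
                    M[i][r] += x * B[j][r] * C[k][r]
    return M


# Toy rank-1 tensor: X[i][j][k] = a[i] * b[j] * c[k] (illustrative data only).
a, b, c = [1.0, 2.0], [1.0, 0.0, 2.0], [3.0, 1.0]
X = [[[ai * bj * ck for ck in c] for bj in b] for ai in a]
B = [[bj] for bj in b]  # J x 1 factor matrix
C = [[ck] for ck in c]  # K x 1 factor matrix
M = mttkrp_mode1(X, B, C)
```

For this rank-1 input, each entry of `M` equals `a[i]` times the squared norms of `b` and `c`, which makes the result easy to check by hand. The loop nest touches every tensor entry R times, which is why casting the computation as matrix operations matters in practice.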
In this paper we present a method for the unsupervised clustering of high-dimensional binary data, with a special focus on electronic healthcare records. We present a robust and efficient heuristic that addresses this problem using tensor decomposition. We explain why this approach is preferable to more commonly used distance-based methods for tasks such as clustering patient records. We run the algorithm on two datasets of healthcare records, obtaining clinically meaningful results.
In this paper, we present hierarchical relation-based latent Dirichlet allocation (hrLDA), a data-driven hierarchical topic model for extracting terminological ontologies from a large number of heterogeneous documents. In contrast to traditional topic models, hrLDA relies on noun phrases instead of unigrams, considers syntax and document structures, and enriches topic hierarchies with topic relations. Through a series of experiments, we demonstrate the superiority of hrLDA over existing topic models, especially for building hierarchies. Furthermore, we illustrate the robustness of hrLDA on noisy data sets, which are likely to occur in many practical scenarios. Our ontology evaluation results show that ontologies extracted by hrLDA are very competitive with the ontologies created by domain experts.
Graph modeling allows numerous security problems to be tackled in a general way; however, little work has been done to understand the ability of graph models to withstand adversarial attacks. We design and evaluate two novel graph attacks against a state-of-the-art network-level, graph-based detection system. Our work highlights areas in adversarial machine learning that have not yet been addressed, specifically graph-based clustering techniques and a global feature space in which defenders must account for realistic attackers without perfect knowledge in order to be practical. Even though less informed attackers can evade graph clustering at low cost, we show that some practical defenses are possible.
Incorporating additional knowledge in the learning process can be beneficial for several computer vision and machine learning tasks. Whether privileged information originates from a source domain that is adapted to a target domain or takes the form of additional features available at training time only, using such privileged (i.e., auxiliary) information is of high importance, as it improves recognition performance and generalization. However, primary and privileged information are rarely derived from the same distribution, which poses an additional challenge to the recognition task. To address these challenges, we present a novel learning paradigm that leverages privileged information in a domain adaptation setup to perform visual recognition tasks. The proposed framework, named Adaptive SVM+, combines the advantages of both the learning using privileged information (LUPI) paradigm and the domain adaptation framework, which are naturally embedded in the objective function of a regular SVM. We demonstrate the effectiveness of our approach on the publicly available Animals with Attributes and INTERACT datasets and report state-of-the-art results on both.
Between matrix factorization and Random Walk with Restart (RWR), which method works better for recommender systems? Which method handles explicit or implicit feedback data better? Does additional side information help recommendation? Recommender systems play an important role in many e-commerce services, such as Amazon and Netflix, by recommending new items to a user. Among various recommendation strategies, collaborative filtering has shown good performance by using the rating patterns of users. Matrix factorization and random walk with restart are the most representative collaborative filtering methods. However, despite their extensive use, it is still unclear which method provides better recommendation performance. In this paper, we provide a comparative study of matrix factorization and RWR in recommender systems. We precisely formulate the correspondence between the two methods for various recommendation tasks. In particular, we devise a new RWR method using a global bias term, which corresponds to a matrix factorization method using biases. We describe the two methods in detail with respect to various aspects of recommendation quality, such as how they handle the cold-start problem that typically occurs in collaborative filtering. We perform extensive experiments on real-world datasets to evaluate the performance of each method in terms of various measures. We observe that matrix factorization performs better with explicit feedback ratings, while RWR is better with implicit ones. We also observe that exploiting the global popularity of items improves performance, and that side information has a positive synergy with explicit feedback but a negative effect with implicit feedback.
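As a concrete reference point for the RWR side of this comparison, the basic random walk with restart can be sketched as a power iteration on a user-item graph. This is a minimal sketch of plain RWR only, not the bias-augmented variant the abstract proposes; the restart probability of 0.15, the toy graph, and the function name `rwr_scores` are assumptions made for illustration.

```python
def rwr_scores(adj, seed, restart=0.15, iters=100):
    """Random walk with restart, computed by power iteration.

    adj maps each node to a list of its neighbors (every node must have at
    least one neighbor). At each step the walker returns to `seed` with
    probability `restart`, otherwise moves to a uniformly chosen neighbor.
    """
    nodes = sorted(adj)
    scores = {v: (1.0 if v == seed else 0.0) for v in nodes}
    for _ in range(iters):
        nxt = {v: (restart if v == seed else 0.0) for v in nodes}
        for u in nodes:
            share = (1.0 - restart) * scores[u] / len(adj[u])
            for v in adj[u]:
                nxt[v] += share
        scores = nxt
    return scores


# Toy undirected user-item graph: u1 rated i1 and i2, u2 rated i2 and i3.
adj = {
    "u1": ["i1", "i2"],
    "u2": ["i2", "i3"],
    "i1": ["u1"],
    "i2": ["u1", "u2"],
    "i3": ["u2"],
}
scores = rwr_scores(adj, seed="u1")
```

In this toy graph, item `i3` is the only item user `u1` has not interacted with, and it receives a nonzero score through the shared item `i2`, which is how RWR surfaces candidates for recommendation.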
The explanation of heterogeneous multivariate time series data is a central problem in many applications. The problem requires two major data mining challenges to be addressed simultaneously: learning models that are human-interpretable and mining heterogeneous multivariate time series data. The intersection of these two areas is not adequately explored in the existing literature. To address this gap, we propose grammar-based decision trees and an algorithm for learning them. Grammar-based decision trees extend decision trees with a grammar framework. Logical expressions, derived from a context-free grammar, are used for branching in place of simple thresholds on attributes. The added expressivity enables support for a wide range of data types while retaining the interpretability of decision trees. By choosing a grammar based on temporal logic, we show that grammar-based decision trees can be used for the interpretable classification of high-dimensional and heterogeneous time series data. In addition to classification, we show how grammar-based decision trees can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply grammar-based decision trees to analyze the classic Australian Sign Language dataset, as well as to categorize and explain near mid-air collisions to support the development of a prototype aircraft collision avoidance system.
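The idea of branching on a logical expression rather than a threshold can be illustrated with a very small evaluator. The sketch below assumes a hypothetical toy grammar with `always` and `eventually` operators over simple comparisons; the grammar, the tuple encoding, and the names `holds` and `split` are illustrative assumptions and are not taken from the paper.

```python
def holds(expr, record):
    """Evaluate a tiny temporal-logic-style expression on one record.

    record maps a variable name to its time series (a list of numbers).
    expr is a nested tuple from this hypothetical grammar:
        expr := ("and", expr, expr) | ("or", expr, expr) | ("not", expr)
              | ("always", pred) | ("eventually", pred)
        pred := ("lt", var, const) | ("gt", var, const)
    """
    op = expr[0]
    if op == "and":
        return holds(expr[1], record) and holds(expr[2], record)
    if op == "or":
        return holds(expr[1], record) or holds(expr[2], record)
    if op == "not":
        return not holds(expr[1], record)
    if op in ("always", "eventually"):
        kind, var, const = expr[1]
        if kind == "lt":
            truth = [x < const for x in record[var]]
        elif kind == "gt":
            truth = [x > const for x in record[var]]
        else:
            raise ValueError(f"unknown predicate: {kind}")
        return all(truth) if op == "always" else any(truth)
    raise ValueError(f"unknown operator: {op}")


def split(expr, records):
    """Branch records at a tree node by whether the expression holds."""
    yes = [r for r in records if holds(expr, r)]
    no = [r for r in records if not holds(expr, r)]
    return yes, no


# Illustrative records with two heterogeneous series each.
r1 = {"alt": [120, 150, 130], "speed": [60, 45, 70]}
r2 = {"alt": [120, 90, 130], "speed": [60, 45, 70]}
rule = ("and", ("always", ("gt", "alt", 100)), ("eventually", ("lt", "speed", 50)))
yes, no = split(rule, [r1, r2])
```

The rule reads directly as "altitude always above 100 and speed eventually below 50", which is the kind of interpretability the branching expressions are meant to preserve. Learning which expression to place at each node is the part this sketch leaves out.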
This article investigates emergence and complexity in complex systems that can share information on a network. To this end, we use a theoretical approach from information theory, computability theory, and complex networks. One key question we study is how much emergent complexity arises when a population of computable systems is networked compared with when this population is isolated. First, we define a general model for networked theoretical machines, which we call algorithmic networks. Then, we narrow our scope to investigate algorithmic networks that optimize the average fitness of nodes, in which each node imitates the fittest neighbor and the randomly generated population is networked by a time-varying graph. We show that there are graph-topological conditions that make these algorithmic networks have the property of expected emergent open-endedness for large enough populations. In other words, the expected emergent algorithmic complexity of a node tends to infinity as the population size tends to infinity. Given a dynamic network, we show that these conditions imply the existence of a central time to trigger expected emergent open-endedness. Moreover, we show that networks with small diameter meet these conditions. We also discuss future research based on how our results are related to some problems in network science, information theory, computability theory, distributed computing, game theory, evolutionary biology, and synergy in complex systems.
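The "imitate the fittest neighbor" dynamic on a time-varying graph can be simulated in a few lines. The sketch below is a toy under strong assumptions: fitness is a plain number standing in for the complexity-based fitness of the paper, updates are synchronous, and the function name `imitate_fittest` and the example graph are invented for illustration.

```python
def imitate_fittest(fitness, snapshots):
    """Simulate nodes copying their fittest neighbor over a time-varying graph.

    fitness maps node -> initial fitness (a number, as a stand-in).
    snapshots is a list of undirected edge lists, one per time step.
    Returns the fitness assignment before the first step and after each step.
    """
    current = dict(fitness)
    history = [dict(current)]
    for edges in snapshots:
        neighbors = {v: set() for v in current}
        for u, v in edges:
            neighbors[u].add(v)
            neighbors[v].add(u)
        # Synchronous update: every node looks at the previous round's values.
        nxt = {}
        for v in current:
            best = max((current[u] for u in neighbors[v]), default=current[v])
            nxt[v] = max(current[v], best)
        current = nxt
        history.append(dict(current))
    return history


# Four nodes; the edge set changes at every time step.
initial = {0: 5, 1: 1, 2: 2, 3: 3}
snapshots = [[(0, 1)], [(1, 2)], [(2, 3)]]
history = imitate_fittest(initial, snapshots)
averages = [sum(h.values()) / len(h) for h in history]
```

In this example the highest fitness spreads along the sequence of edges until every node holds it, so the average fitness never decreases. How quickly that happens depends on the temporal structure of the graph, which is where conditions such as small diameter enter.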
Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. A particular emphasis is on the tensor train (TT) and Hierarchical Tucker (HT) decompositions, and their physically meaningful interpretations, which reflect the scalability of the tensor network approach. Through a graphical approach, we also elucidate how, by virtue of the underlying low-rank tensor approximations and sophisticated contractions of core tensors, tensor networks have the ability to perform distributed computations on otherwise prohibitively large volumes of data/parameters, thereby alleviating or even eliminating the curse of dimensionality. The usefulness of this concept is illustrated over a number of applied areas, including generalized regression and classification (support tensor machines, canonical correlation analysis, higher order partial least squares), generalized eigenvalue decomposition, Riemannian optimization, and the optimization of deep neural networks. Part 1 and Part 2 of this work can be used either as stand-alone separate texts or as a conjoint comprehensive review of the exciting field of low-rank tensor networks and tensor decompositions.
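The compression that the TT format offers can be made concrete by counting parameters and by reading a single entry from the cores. The sketch below assumes the standard TT convention of cores of size r_{k-1} x n_k x r_k with boundary ranks equal to 1; the function names and the example sizes are illustrative choices, not taken from the monograph.

```python
def tt_num_params(dims, ranks):
    """Number of parameters in a tensor train.

    dims holds the mode sizes n_1..n_N; ranks holds r_0..r_N with
    r_0 = r_N = 1. Core k stores ranks[k] * dims[k] * ranks[k + 1] numbers.
    """
    assert len(ranks) == len(dims) + 1 and ranks[0] == ranks[-1] == 1
    return sum(ranks[k] * dims[k] * ranks[k + 1] for k in range(len(dims)))


def full_num_entries(dims):
    """Number of entries in the corresponding full tensor."""
    total = 1
    for n in dims:
        total *= n
    return total


def tt_entry(cores, index):
    """Read one tensor entry as a product of core slices.

    cores[k][i] is the r_k x r_{k+1} matrix (list of rows) selected by
    value i of mode k; the entry is the product of these matrices.
    """
    vec = [1.0]  # 1 x r_0 row vector with r_0 = 1
    for core, i in zip(cores, index):
        mat = core[i]
        vec = [sum(vec[a] * mat[a][b] for a in range(len(vec)))
               for b in range(len(mat[0]))]
    return vec[0]


# Storage comparison: order 10, mode size 10, all internal TT ranks 5.
dims = [10] * 10
ranks = [1] + [5] * 9 + [1]
tt_size = tt_num_params(dims, ranks)
full_size = full_num_entries(dims)

# A 2 x 2 tensor of TT rank 2: X[i][j] = u1[i]*v1[j] + u2[i]*v2[j].
u1, u2, v1, v2 = [1.0, 2.0], [0.0, 1.0], [3.0, 4.0], [5.0, 6.0]
cores = [
    [[[u1[i], u2[i]]] for i in range(2)],    # slices of shape 1 x 2
    [[[v1[j]], [v2[j]]] for j in range(2)],  # slices of shape 2 x 1
]
entry = tt_entry(cores, (1, 1))
```

With these sizes the TT representation needs a few thousand numbers where the full tensor has ten billion entries, and the cost of the TT format grows linearly in the order of the tensor rather than exponentially.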
Leveraging recent developments in convolutional neural networks (CNNs), matching dense correspondence from a stereo pair has been cast as a learning problem, with performance exceeding traditional approaches. However, it remains challenging to generate high-quality disparities for the inherently ill-posed regions. To tackle this problem, we propose a novel cascade CNN architecture composed of two stages. The first stage advances the recently proposed DispNet by equipping it with extra up-convolution modules, leading to disparity images with more details. The second stage explicitly rectifies the disparity initialized by the first stage; it couples with the first stage and generates residual signals across multiple scales. The summation of the outputs from the two stages gives the final disparity. As opposed to directly learning the disparity at the second stage, we show that residual learning provides more effective refinement. Moreover, it also benefits the training of the overall cascade network. Experiments show that our cascade residual learning scheme provides state-of-the-art performance for matching stereo correspondence. At the time of submission of this paper, our method ranks first on the KITTI 2015 stereo benchmark, surpassing prior works by a noteworthy margin.
The paper proposes the ScatterNet Hybrid Deep Learning (SHDL) network, which extracts invariant and discriminative image representations for object recognition. The SHDL framework is constructed with a multi-layer ScatterNet front-end, an unsupervised learning middle, and a supervised learning back-end module. Each layer of the SHDL network is automatically designed as an explicit optimization problem, leading to an optimal deep learning architecture with improved computational performance compared to the more usual deep network architectures. The SHDL network achieves state-of-the-art classification performance compared with unsupervised and semi-supervised learning methods (GANs) on two image datasets. Advantages of the SHDL network over supervised methods (NIN, VGG) are also demonstrated with experiments performed on training datasets of reduced size.
Named Entity Recognition and Disambiguation (NERD) systems have recently been widely researched to deal with the significant growth of the Web. NERD systems are crucial for several Natural Language Processing (NLP) tasks such as summarization, understanding, and machine translation. However, there is no standard interface specification, i.e., these systems may vary significantly in how they export their outputs or process their inputs. Thus, when a company wishes to implement more than one NERD system, the process is laborious and prone to failure. In addition, industrial solutions impose critical requirements, e.g., large-scale processing, completeness, versatility, and licenses. Commonly, these requirements impose a limitation that causes companies to ignore good NERD models. This paper presents TANKER, a distributed architecture that aims to overcome the scalability, reliability, and failure tolerance limitations related to industrial needs by combining NERD systems. To this end, TANKER relies on a micro-services oriented architecture, which enables agile development and delivery of complex enterprise applications. In addition, TANKER provides a standardized API that makes it possible to combine several NERD systems at once.