A Simple LSTM model for Transition-based Dependency Parsing

We present a simple LSTM-based transition-based dependency parser. Our model is composed of a single LSTM hidden layer replacing the hidden layer in the usual feed-forward network architecture. We also propose a new initialization method that uses the pre-trained weights from a feed-forward neural network to initialize our LSTM-based model, and we show that using dropout on the input layer has a positive effect on performance. Our final parser achieves a 93.06% unlabeled and 91.01% labeled attachment score on the Penn Treebank. We additionally replace LSTMs with GRUs and Elman units in our model and explore the effectiveness of our initialization method on the individual gates constituting all three types of RNN units.

Shared Memory Parallelization of MTTKRP for Dense Tensors

The matricized-tensor times Khatri-Rao product (MTTKRP) is the computational bottleneck for algorithms computing CP decompositions of tensors. In this paper, we develop shared-memory parallel algorithms for MTTKRP involving dense tensors. The algorithms cast nearly all of the computation as matrix operations in order to use optimized BLAS subroutines, and they avoid reordering tensor entries in memory. We benchmark sequential and parallel performance of our implementations, demonstrating high sequential performance and efficient parallel scaling. We use our parallel implementation to compute a CP decomposition of a neuroimaging data set and achieve a speedup of up to 7.4$\times$ over existing parallel software.
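
For orientation, the operation itself can be written down as a naive reference sketch for a dense 3-way tensor. This is only the textbook definition of a mode-1 MTTKRP, not the BLAS-based parallel algorithms of the paper; the function name `mttkrp_mode1` and the nested-list layout are assumptions made for illustration.

```python
def mttkrp_mode1(X, B, C):
    """Naive mode-1 MTTKRP for a dense 3-way tensor X of shape I x J x K.

    B is J x R and C is K x R (factor matrices as nested lists).
    Returns M of shape I x R with
        M[i][r] = sum over j, k of X[i][j][k] * B[j][r] * C[k][r].
    """
    I, J, K = len(X), len(X[0]), len(X[0][0])
    R = len(B[0])
    M = [[0.0] * R for _ in range(I)]
    for i in range(I):
        for j in range(J):
            for k in range(K):
                x = X[i][j][k]
                for r in range(R):
                    M[i][r] += x * B[j][r] * C[k][r]
    return M


# Hypothetical example: a rank-1 tensor X[i][j][k] = a[i] * b[j] * c[k].
a, b, c = [1, 2], [1, 3], [2, 1]
X = [[[ai * bj * ck for ck in c] for bj in b] for ai in a]
B = [[bj] for bj in b]  # J x 1 factor matrix
C = [[ck] for ck in c]  # K x 1 factor matrix
M = mttkrp_mode1(X, B, C)
```

The quadruple loop makes the cost visible: every tensor entry is touched once per rank-one component, which is why this kernel tends to dominate the running time of CP algorithms.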

Clustering Patients with Tensor Decomposition

In this paper we present a method for the unsupervised clustering of high-dimensional binary data, with a special focus on electronic healthcare records. We present a robust and efficient heuristic to face this problem using tensor decomposition. We explain why this approach is preferable to more commonly used distance-based methods for tasks such as clustering patient records. We run the algorithm on two datasets of healthcare records, obtaining clinically meaningful results.

Unsupervised Terminological Ontology Learning based on Hierarchical Topic Modeling

In this paper, we present hierarchical relation-based latent Dirichlet allocation (hrLDA), a data-driven hierarchical topic model for extracting terminological ontologies from a large number of heterogeneous documents. In contrast to traditional topic models, hrLDA relies on noun phrases instead of unigrams, considers syntax and document structures, and enriches topic hierarchies with topic relations. Through a series of experiments, we demonstrate the superiority of hrLDA over existing topic models, especially for building hierarchies. Furthermore, we illustrate the robustness of hrLDA in the settings of noisy data sets, which are likely to occur in many practical scenarios. Our ontology evaluation results show that ontologies extracted from hrLDA are very competitive with the ontologies created by domain experts.

Practical Attacks Against Graph-based Clustering

Graph modeling allows numerous security problems to be tackled in a general way; however, little work has been done to understand the ability of graph-based systems to withstand adversarial attacks. We design and evaluate two novel graph attacks against a state-of-the-art network-level, graph-based detection system. Our work highlights areas in adversarial machine learning that have not yet been addressed, specifically: graph-based clustering techniques, and a global feature space where realistic attackers without perfect knowledge must be accounted for (by the defenders) in order to be practical. Even though less informed attackers can evade graph clustering with low cost, we show that some practical defenses are possible.

Adaptive SVM+: Learning with Privileged Information for Domain Adaptation

Incorporating additional knowledge in the learning process can be beneficial for several computer vision and machine learning tasks. Whether privileged information originates from a source domain that is adapted to a target domain, or as additional features available at training time only, using such privileged (i.e., auxiliary) information is of high importance as it improves the recognition performance and generalization. However, both primary and privileged information are rarely derived from the same distribution, which poses an additional challenge to the recognition task. To address these challenges, we present a novel learning paradigm that leverages privileged information in a domain adaptation setup to perform visual recognition tasks. The proposed framework, named Adaptive SVM+, combines the advantages of both the learning using privileged information (LUPI) paradigm and the domain adaptation framework, which are naturally embedded in the objective function of a regular SVM. We demonstrate the effectiveness of our approach on the publicly available Animals with Attributes and INTERACT datasets and report state-of-the-art results on both.

A Comparative Study of Matrix Factorization and Random Walk with Restart in Recommender Systems

Between matrix factorization and Random Walk with Restart (RWR), which method works better for recommender systems? Which method handles explicit or implicit feedback data better? Does additional side information help recommendation? Recommender systems play an important role in many e-commerce services such as Amazon and Netflix to recommend new items to a user. Among various recommendation strategies, collaborative filtering has shown good performance by using rating patterns of users. Matrix factorization and random walk with restart are the most representative collaborative filtering methods. However, it is still unclear which method provides better recommendation performance despite their extensive utility. In this paper, we provide a comparative study of matrix factorization and RWR in recommender systems. We exactly formulate each correspondence of the two methods according to various tasks in recommendation. In particular, we devise a new RWR method using a global bias term, which corresponds to a matrix factorization method using biases. We describe details of the two methods in various aspects of recommendation quality, such as how those methods handle the cold-start problem which typically happens in collaborative filtering. We extensively perform experiments over real-world datasets to evaluate the performance of each method in terms of various measures. We observe that matrix factorization performs better with explicit feedback ratings while RWR is better with implicit ones. We also observe that exploiting global popularities of items improves performance and that side information produces positive synergy with explicit feedback but has a negative effect with implicit feedback.
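
As a rough illustration of the RWR side of this comparison, the sketch below runs a plain random walk with restart by power iteration on a small user-item graph. It is the standard basic formulation only, without the global bias term or side information studied in the paper; the function name `rwr`, the restart probability of 0.15, and the toy graph are all assumptions, and every node is assumed to have at least one edge.

```python
def rwr(adj, seed, restart=0.15, iters=100):
    """Random walk with restart by power iteration.

    adj is a symmetric n x n adjacency matrix (nested lists); seed is the
    index of the query node. Iterates r <- (1 - c) * W * r + c * q, where W
    is the column-normalized adjacency matrix and q is the indicator vector
    of the seed. Assumes no node is isolated.
    """
    n = len(adj)
    col_sums = [sum(adj[i][j] for i in range(n)) for j in range(n)]
    r = [0.0] * n
    r[seed] = 1.0
    for _ in range(iters):
        nxt = [0.0] * n
        for j in range(n):
            for i in range(n):
                if adj[i][j]:
                    nxt[i] += (1 - restart) * adj[i][j] / col_sums[j] * r[j]
        nxt[seed] += restart  # restart mass returns to the seed node
        r = nxt
    return r


# Hypothetical toy graph: nodes 0-1 are users, nodes 2-4 are items.
# User 0 interacted with items 2 and 3; user 1 with items 3 and 4.
edges = [(0, 2), (0, 3), (1, 3), (1, 4)]
adj = [[0] * 5 for _ in range(5)]
for u, v in edges:
    adj[u][v] = adj[v][u] = 1

scores = rwr(adj, seed=0)
```

For user 0, the item scores `scores[2:]` would then be ranked, typically after filtering out items the user has already interacted with; here item 4 is the only unseen candidate.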

Interpretable Categorization of Heterogeneous Time Series Data

The explanation of heterogeneous multivariate time series data is a central problem in many applications. The problem requires two major data mining challenges to be addressed simultaneously: learning models that are human-interpretable and mining heterogeneous multivariate time series data. The intersection of these two areas is not adequately explored in the existing literature. To address this gap, we propose grammar-based decision trees and an algorithm for learning them. Grammar-based decision trees extend decision trees with a grammar framework. Logical expressions, derived from a context-free grammar, are used for branching in place of simple thresholds on attributes. The added expressivity enables support for a wide range of data types while retaining the interpretability of decision trees. By choosing a grammar based on temporal logic, we show that grammar-based decision trees can be used for the interpretable classification of high-dimensional and heterogeneous time series data. In addition to classification, we show how grammar-based decision trees can also be used for categorization, which is a combination of clustering and generating interpretable explanations for each cluster. We apply grammar-based decision trees to analyze the classic Australian Sign Language dataset as well as categorize and explain near mid-air collisions to support the development of a prototype aircraft collision avoidance system.

Algorithmic Networks: central time to trigger expected emergent open-endedness

This article investigates emergence and complexity in complex systems that can share information on a network. To this end, we use a theoretical approach from information theory, computability theory, and complex networks. One key studied question is how much emergent complexity arises when a population of computable systems is networked compared with when this population is isolated. First, we define a general model for networked theoretical machines, which we call algorithmic networks. Then, we narrow our scope to investigate algorithmic networks that optimize the average fitnesses of nodes in which each node imitates the fittest neighbor and the randomly generated population is networked by a time-varying graph. We show that there are graph-topological conditions that make these algorithmic networks have the property of expected emergent open-endedness for large enough populations. In other words, the expected emergent algorithmic complexity of a node tends to infinity as the population size tends to infinity. Given a dynamic network, we show that these conditions imply the existence of a central time to trigger expected emergent open-endedness. Moreover, we show that networks with small diameter meet these conditions. We also discuss future research based on how our results are related to some problems in network science, information theory, computability theory, distributed computing, game theory, evolutionary biology, and synergy in complex systems.

Tensor Networks for Dimensionality Reduction and Large-Scale Optimizations. Part 2 Applications and Future Perspectives

Part 2 of this monograph builds on the introduction to tensor networks and their operations presented in Part 1. It focuses on tensor network models for super-compressed higher-order representation of data/parameters and related cost functions, while providing an outline of their applications in machine learning and data analytics. A particular emphasis is on the tensor train (TT) and Hierarchical Tucker (HT) decompositions, and their physically meaningful interpretations which reflect the scalability of the tensor network approach. Through a graphical approach, we also elucidate how, by virtue of the underlying low-rank tensor approximations and sophisticated contractions of core tensors, tensor networks have the ability to perform distributed computations on otherwise prohibitively large volumes of data/parameters, thereby alleviating or even eliminating the curse of dimensionality. The usefulness of this concept is illustrated over a number of applied areas, including generalized regression and classification (support tensor machines, canonical correlation analysis, higher order partial least squares), generalized eigenvalue decomposition, Riemannian optimization, and in the optimization of deep neural networks. Part 1 and Part 2 of this work can be used either as stand-alone separate texts, or indeed as a conjoint comprehensive review of the exciting field of low-rank tensor networks and tensor decompositions.

Cascade Residual Learning: A Two-stage Convolutional Neural Network for Stereo Matching

Leveraging the recent developments in convolutional neural networks (CNNs), matching dense correspondence from a stereo pair has been cast as a learning problem, with performance exceeding traditional approaches. However, it remains challenging to generate high-quality disparities for the inherently ill-posed regions. To tackle this problem, we propose a novel cascade CNN architecture composed of two stages. The first stage advances the recently proposed DispNet by equipping it with extra up-convolution modules, leading to disparity images with more details. The second stage explicitly rectifies the disparity initialized by the first stage; it couples with the first stage and generates residual signals across multiple scales. The summation of the outputs from the two stages gives the final disparity. As opposed to directly learning the disparity at the second stage, we show that residual learning provides more effective refinement. Moreover, it also benefits the training of the overall cascade network. Experimentation shows that our cascade residual learning scheme provides state-of-the-art performance for matching stereo correspondence. By the time of the submission of this paper, our method ranks first in the KITTI 2015 stereo benchmark, surpassing the prior works by a noteworthy margin.

ScatterNet Hybrid Deep Learning (SHDL) Network For Object Classification

The paper proposes the ScatterNet Hybrid Deep Learning (SHDL) network that extracts invariant and discriminative image representations for object recognition. The SHDL framework is constructed with a multi-layer ScatterNet front-end, an unsupervised learning middle, and a supervised learning back-end module. Each layer of the SHDL network is automatically designed as an explicit optimization problem leading to an optimal deep learning architecture with improved computational performance as compared to the more usual deep network architectures. The SHDL network produces state-of-the-art classification performance against unsupervised and semi-supervised learning (GANs) on two image datasets. Advantages of the SHDL network over supervised methods (NIN, VGG) are also demonstrated with experiments performed on training datasets of reduced size.

TANKER: Distributed Architecture for Named Entity Recognition and Disambiguation

Named Entity Recognition and Disambiguation (NERD) systems have recently been widely researched to deal with the significant growth of the Web. NERD systems are crucial for several Natural Language Processing (NLP) tasks such as summarization, understanding, and machine translation. However, there is no standard interface specification, i.e., these systems may vary significantly either for exporting their outputs or for processing the inputs. Thus, when a given company desires to implement more than one NERD system, the process is quite exhausting and prone to failure. In addition, industrial solutions demand critical requirements, e.g., large-scale processing, completeness, versatility, and licenses. Commonly, these requirements impose a limitation, causing good NERD models to be ignored by companies. This paper presents TANKER, a distributed architecture which aims to overcome scalability, reliability and failure tolerance limitations related to industrial needs by combining NERD systems. To this end, TANKER relies on a micro-services oriented architecture, which enables agile development and delivery of complex enterprise applications. In addition, TANKER provides a standardized API which makes it possible to combine several NERD systems at once.

Counting local integrals of motion in disordered spinless-fermion and Hubbard chains
Iterative Compression-Decimation Scheme for Tensor Network Optimization
On the zeroth-order general Randić index, variable sum exdeg index and trees having vertices with prescribed degree
On Rainbow Hamilton Cycles in Random Hypergraphs
Preliminary testing derivatives of a linear unified estimator in the logistic regression model
Pix2face: Direct 3D Face Model Estimation
Turing instability in a model with two interacting Ising lines: linear stability and non-equilibrium fluctuations
Learning to Price with Reference Effects
A Connectedness Constraint for Learning Sparse Graphs
Tail approximations for sums of dependent regularly varying random variables under Archimedean copula models
Uniformly Efficient Simulation for Extremes of Gaussian Random Fields
Plausibility and probability in deductive reasoning
Convolutional Sparse Coding with Overlapping Group Norms
Modelling Protagonist Goals and Desires in First-Person Narrative
Generalizations of Maximal Inequalities to Arbitrary Selection Rules
An O(log log m)-competitive Algorithm for Online Machine Minimization
Answering Spatial Multiple-Set Intersection Queries Using 2-3 Cuckoo Hash-Filters
Block-Simultaneous Direction Method of Multipliers: A proximal primal-dual splitting algorithm for nonconvex problems with multiple constraints
Ergodic behaviour of a Douglas-Rachford operator away from the origin
Complete graphs: the space of simplicial cones, and their path tree representation
Continual One-Shot Learning of Hidden Spike-Patterns with Neural Network Simulation Expansion and STDP Convergence Predictions
Parking cars of different sizes
Dynamic Graph Coloring
PersonaBank: A Corpus of Personal Narratives and Their Story Intention Graphs
Argument Strength is in the Eye of the Beholder: Audience Effects in Persuasion
A Deep Learning Approach for Population Estimation from Satellite Imagery
Automating Direct Speech Variations in Stories and Games
An efficient duality-based approach for PDE-constrained sparse optimization
Identity Testing from High Powers of Polynomials of Large Degree over Finite Fields
Finite State Markov Decision Processes with Transfer Entropy Costs
Optimizing scoring function of dynamic programming of pairwise profile alignment using derivative free neural network
Erdős-Ginzburg-Ziv constants by avoiding three-term arithmetic progressions
Simultaneously Color-Depth Super-Resolution with Conditional Generative Adversarial Network
Planar L-Drawings of Directed Graphs
Hook length property of $d$-complete posets via $q$-integrals
Slope Stability Analysis with Geometric Semantic Genetic Programming
Cache-Aided Interference Management in Partially Connected Wireless Networks
Technical Report for ‘User-Centric Participatory Sensing: A Game Theoretic Analysis’
Photorealistic Facial Expression Synthesis by the Conditional Difference Adversarial Autoencoder
Graph theory general position problem
Network Slicing for Ultra-Reliable Low Latency Communication in Industry 4.0 Scenarios
Summability of Sequence of Random Variables
A New Super-Twisting Algorithm-Based Sliding Mode Observer Design for Fault Estimation in a Class of Nonlinear Fractional Order Systems
Randomized Load-balanced Routing for Fat-tree Networks
The Complexity of Computing a Cardinality Repair for Functional Dependencies
Constructive Characterization for Cycle Packing and Cycle Covering
Paradigm Completion for Derivational Morphology
Spatial Resource Allocation for Spectrum Reuse in Unlicensed LTE Systems
Cross-lingual, Character-Level Neural Morphological Tagging
An Empirical Study of Discriminative Sequence Labeling Models for Vietnamese Text Processing
Colored Point-set Embeddings of Acyclic Graphs
Enforcing Privacy in Cloud Databases
Calibrating chemical multisensory devices for real world applications: An in-depth comparison of quantitative Machine Learning approaches
Optimal pebbling and rubbling of graphs with given diameter
Propriétés de maximalité concernant une représentation définie par Lusztig
A Greedy Part Assignment Algorithm for Real-time Multi-person 2D Pose Estimation
Dilation volumes of sets of finite perimeter
Deformation and flow of amorphous solids: a review of mesoscale elastoplastic models
On Smooth Orthogonal and Octilinear Drawings: Relations, Complexity and Kandinsky Drawings
Joint Maximum Purity Forest with Application to Image Super-Resolution
On the consistency of the spacings test for multivariate uniformity
Look-ahead Attention for Generation in Neural Machine Translation
Experimental Evaluation of Book Drawing Algorithms
On Vertex- and Empty-Ply Proximity Drawings
Fighting with the Sparsity of Synonymy Dictionaries
Planar Drawings of Fixed-Mobile Bigraphs
On the secrecy gain of $\ell$-modular lattices
Tilings in randomly perturbed dense graphs
Counting equilibria of the Kuramoto model using birationally invariant intersection index
An Interactive Tool to Explore and Improve the Ply Number of Drawings
Efficient Convolutional Network Learning using Parametric Log based Dual-Tree Wavelet ScatterNet
Some Topological Invariants of Generalized Möbius Ladder
Two-stream Flow-guided Convolutional Attention Networks for Action Recognition
Inhomogeneous perturbation and error bounds for the stationary performance of random walks in the quarter plane
NodeTrix Planarity Testing with Small Clusters
Distributed Holistic Clustering on Linked Data
Texture and Structure Incorporated ScatterNet Hybrid Deep Learning Network (TS-SHDL) For Brain Matter Segmentation
A Pseudo Knockoff Filter for Correlated Features
Rotation Symmetric Bent Boolean Functions for n = 2p
Disguised Face Identification (DFI) with Facial KeyPoints using Spatial Fusion Convolutional Network
Adversarial nets with perceptual losses for text-to-image synthesis
Approximating Weighted Duo-Preservation in Comparative Genomics
Balanced scheduling of school bus trips using a perfect matching heuristic
Optimal and Learning Control for Autonomous Robots
Secondary frequency control with on-off load side participation in power networks
Quantum-enhanced reinforcement learning for finite-episode games with discrete state spaces
Quantum simulation from the bottom up: the case of rebits
Non-explosivity of stochastically modeled reaction networks that are complex balanced
Euler characteristics of Brill-Noether varieties