Tensor network methods are taking a central role in modern quantum physics and beyond. They can provide an efficient approximation to certain classes of quantum states, and the associated graphical language makes it easy to describe and pictorially reason about quantum circuits, channels, protocols, open systems and more. Our goal is to explain tensor networks and some associated methods as quickly and as painlessly as possible. Beginning with the key definitions, the graphical tensor network language is presented through examples. We then provide an introduction to matrix product states. We conclude the tutorial with tensor contractions evaluating combinatorial counting problems. The first one counts the number of solutions for Boolean formulae, whereas the second is Penrose’s tensor contraction algorithm, returning the number of $3$-edge-colorings of $3$-regular planar graphs.
Active learning has long been a topic of study in machine learning. However, as increasingly complex and opaque models have become standard practice, the process of active learning, too, has become more opaque. There has been little investigation into interpreting what specific trends and patterns an active learning strategy may be exploring. This work expands on the Local Interpretable Model-agnostic Explanations framework (LIME) to provide explanations for active learning recommendations. We demonstrate how LIME can be used to generate locally faithful explanations for an active learning strategy, and how these explanations can be used to understand how different models and datasets explore a problem space over time. In order to quantify the per-subgroup differences in how an active learning strategy queries spatial regions, we introduce a notion of uncertainty bias (based on disparate impact) to measure the discrepancy in the confidence for a model’s predictions between one subgroup and another. Using the uncertainty bias measure, we show that our query explanations accurately reflect the subgroup focus of the active learning queries, allowing for an interpretable explanation of what is being learned as points with similar sources of uncertainty have their uncertainty bias resolved. We demonstrate that this technique can be applied to track uncertainty bias over user-defined clusters or automatically generated clusters based on the source of uncertainty.
Existing sequence prediction methods are mostly concerned with time-independent sequences, in which the actual time span between events is irrelevant and the distance between events is simply the difference between their order positions in the sequence. While this time-independent view of sequences is applicable for data such as natural languages, e.g., dealing with words in a sentence, it is inappropriate and inefficient for many real world events that are observed and collected at unequally spaced points of time as they naturally arise, e.g., when a person goes to a grocery store or makes a phone call. The time span between events can carry important information about the sequence dependence of human behaviors. To leverage continuous time in sequence prediction, we propose two methods for integrating time into event representation, based on the intuition on how time is tokenized in everyday life and previous work on embedding contextualization. We particularly focus on using these methods in recurrent neural networks, which have gained popularity in many sequence prediction tasks. We evaluated these methods as well as baseline models on two learning tasks: mobile app usage prediction and music recommendation. The experiments revealed that the proposed methods for time-dependent representation offer consistent gain on accuracy compared to baseline models that either directly use continuous time value in a recurrent neural network or do not use time.
Recurrent neural networks show state-of-the-art results in many text analysis tasks but often require a lot of memory to store their weights. Recently proposed Sparse Variational Dropout eliminates the majority of the weights in a feed-forward neural network without significant loss of quality. We apply this technique to sparsify recurrent neural networks. To account for recurrent specifics we also rely on Binary Variational Dropout for RNN. We report 99.5% sparsity level on sentiment analysis task without a quality drop and up to 87% sparsity level on language modeling task with slight loss of accuracy.
We introduce a model that learns active learning algorithms via metalearning. For a distribution of related tasks, our model jointly learns: a data representation, an item selection heuristic, and a method for constructing prediction functions from labeled training sets. Our model uses the item selection heuristic to gather labeled training sets from which to construct prediction functions. Using the Omniglot and MovieLens datasets, we test our model in synthetic and practical settings.
We propose a new shared task for tactical data-to-text generation in the domain of source code libraries. Specifically, we focus on text generation of function descriptions from example software projects. Data is drawn from existing resources used for studying the related problem of semantic parser induction (Richardson and Kuhn, 2017b; Richardson and Kuhn, 2017a), and spans a wide variety of both natural languages and programming languages. In this paper, we describe these existing resources, which will serve as training and development data for the task, and discuss plans for building new independent test sets.
One question central to Reinforcement Learning is how to learn a feature representation that supports algorithm scaling and re-use of learned information from different tasks. Successor Features approach this problem by learning a feature representation that satisfies a temporal constraint. We present an implementation of an approach that decouples the feature representation from the reward function, making it suitable for transferring knowledge between domains. We then assess the advantages and limitations of using Successor Features for transfer.
The combination of argumentation and probability paves the way to new accounts of qualitative and quantitative uncertainty, thereby offering new theoretical and applicative opportunities. Due to a variety of interests, probabilistic argumentation is approached in the literature with different frameworks, pertaining to structured and abstract argumentation, and with respect to diverse types of uncertainty, in particular the uncertainty on the credibility of the premises, the uncertainty about which arguments to consider, and the uncertainty on the acceptance status of arguments or statements. Towards a general framework for probabilistic argumentation, we investigate a labelling-oriented framework encompassing a basic setting for rule-based argumentation and its (semi-) abstract account, along with diverse types of uncertainty. Our framework provides a systematic treatment of various kinds of uncertainty and of their relationships and allows us to retrieve (by derivation) multiple statements (sometimes assumed) or results from the literature.
Session length is a very important aspect in determining a user’s satisfaction with a media streaming service. Being able to predict how long a session will last can be of great use for various downstream tasks, such as recommendations and ad scheduling. Most of the related literature on user interaction duration has focused on dwell time for websites, usually in the context of approximating post-click satisfaction either in search results, or display ads. In this work we present the first analysis of session length in a mobile-focused online service, using a real world data-set from a major music streaming service. We use survival analysis techniques to show that the characteristics of the length distributions can differ significantly between users, and use gradient boosted trees with appropriate objectives to predict the length of a session using only information available at its beginning. Our evaluation on real world data illustrates that our proposed technique outperforms the considered baseline.
Tensor train (TT) decomposition provides a space-efficient representation for higher-order tensors. Despite its advantage, we face two crucial limitations when we apply the TT decomposition to machine learning problems: the lack of statistical theory and of scalable algorithms. In this paper, we address the limitations. First, we introduce a convex relaxation of the TT decomposition problem and derive its error bound for the tensor completion task. Next, we develop an alternating optimization method with a randomization technique, in which the time complexity is as efficient as the space complexity is. In experiments, we numerically confirm the derived bounds and empirically demonstrate the performance of our method with a real higher-order tensor.
Traditional Recurrent Neural Networks assume vectorized data as inputs. However many data from modern science and technology come in certain structures such as tensorial time series data. To apply the recurrent neural networks for this type of data, a vectorisation process is necessary, while such a vectorisation leads to the loss of the precise information of the spatial or longitudinal dimensions. In addition, such a vectorized data is not an optimum solution for learning the representation of the longitudinal data. In this paper, we propose a new variant of tensorial neural networks which directly take tensorial time series data as inputs. We call this new variant as Tensorial Recurrent Neural Network (TRNN). The proposed TRNN is based on tensor Tucker decomposition.
As more and more collections of data are becoming available on the web to everyone, non expert users demand easy ways to retrieve data from these collections. One solution is the so called Visual Query Systems (VQS) where queries are represented visually and users do not have to understand query languages such as SQL or XQuery. In 1996, a paper by Catarci reviewed the Visual Query Systems available until that year. In this paper, we review VQSs from 1997 until now and try to determine whether they have been the solution for non expert users. The short answer is no because very few systems have in fact been used in real environments or as commercial tools. We have also gathered basic features of VQSs such as the visual representation adopted to present the reality of interest or the visual representation adopted to express queries.
We show that small and shallow feed-forward neural networks can achieve near state-of-the-art results on a range of unstructured and structured language processing tasks while being considerably cheaper in memory and computational requirements than deep recurrent models. Motivated by resource-constrained environments like mobile phones, we showcase simple techniques for obtaining such small neural network models, and investigate different tradeoffs when deciding how to allocate a small memory budget.
With the ever increasing size of web, relevant information extraction on the Internet with a query formed by a few keywords has become a big challenge. To overcome this, query expansion (QE) plays a crucial role in improving the Internet searches, where the user’s initial query is reformulated to a new query by adding new meaningful terms with similar significance. QE — as part of information retrieval (IR) — has long attracted researchers’ attention. It has also become very influential in the field of personalized social document, Question Answering over Linked Data (QALD), and, Text Retrieval Conference (TREC) and REAL sets. This paper surveys QE techniques in IR from 1960 to 2017 with respect to core techniques, data sources used, weighting and ranking methodologies, user participation and applications (of QE techniques) — bringing out similarities and differences.
We propose Deep Asymmetric Multitask Feature Learning (Deep-AMTFL) which can learn deep representations shared across multiple tasks while effectively preventing negative transfer that may happen in the feature sharing process. Specifically, we introduce an asymmetric autoencoder term that allows predictors for the confident tasks to have high contribution to the feature learning while suppressing the influences of less confident task predictors. This allows learning less noisy representations, and allows weak predictors to exploit knowledge from the strong predictors via the shared latent features. Such asymmetric knowledge transfer through shared features is also more scalable and efficient than inter-task asymmetric transfer. We validate our Deep-AMTFL model on multiple benchmark datasets for multitask learning and image classification, on which it significantly outperforms existing symmetric and asymmetric multitask learning models, by effectively preventing negative transfer in deep feature learning.
We present a new topic model that generates documents by sampling a topic for one whole sentence at a time, and generating the words in the sentence using an RNN decoder that is conditioned on the topic of the sentence. We argue that this novel formalism will help us not only visualize and model the topical discourse structure in a document better, but also potentially lead to more interpretable topics since we can now illustrate topics by sampling representative sentences instead of bag of words or phrases. We present a variational auto-encoder approach for learning in which we use a factorized variational encoder that independently models the posterior over topical mixture vectors of documents using a feed-forward network, and the posterior over topic assignments to sentences using an RNN. Our preliminary experiments on two different datasets indicate early promise, but also expose many challenges that remain to be addressed.
Generative Adversarial Networks (GANs) have recently achieved significant improvement on paired/unpaired image-to-image translation, such as photo$\rightarrow$ sketch and artist painting style transfer. However, existing models can only be capable of transferring the low-level information (e.g. color or texture changes), but fail to edit high-level semantic meanings (e.g., geometric structure or content) of objects. On the other hand, while some researches can synthesize compelling real-world images given a class label or caption, they cannot condition on arbitrary shapes or structures, which largely limits their application scenarios and interpretive capability of model results. In this work, we focus on a more challenging semantic manipulation task, which aims to modify the semantic meaning of an object while preserving its own characteristics (e.g. viewpoints and shapes), such as cow$\rightarrow$sheep, motor$\rightarrow$ bicycle, cat$\rightarrow$dog. To tackle such large semantic changes, we introduce a contrasting GAN (contrast-GAN) with a novel adversarial contrasting objective. Instead of directly making the synthesized samples close to target data as previous GANs did, our adversarial contrasting objective optimizes over the distance comparisons between samples, that is, enforcing the manipulated data be semantically closer to the real data with target category than the input data. Equipped with the new contrasting objective, a novel mask-conditional contrast-GAN architecture is proposed to enable disentangle image background with object semantic changes. Experiments on several semantic manipulation tasks on ImageNet and MSCOCO dataset show considerable performance gain by our contrast-GAN over other conditional GANs. Quantitative results further demonstrate the superiority of our model on generating manipulated results with high visual fidelity and reasonable object semantics.
This paper proposes a clustering procedure for samples of multivariate functions in $(L^2(I))^{J}$, with $J\geq1$. This method is based on a k-means algorithm in which the distance between the curves is measured with a metrics that generalizes the Mahalanobis distance in Hilbert spaces, considering the correlation and the variability along all the components of the functional data. The proposed procedure has been studied in simulation and compared with the k-means based on other distances typically adopted for clustering multivariate functional data. In these simulations, it is shown that the k-means algorithm with the generalized Mahalanobis distance provides the best clustering performances, both in terms of mean and standard deviation of the number of misclassified curves. Finally, the proposed method has been applied to two real cases studies, concerning ECG signals and growth curves, where the results obtained in simulation are confirmed and strengthened.
Factor analysis aims to describe high dimensional random vectors by means of a small number of unknown common factors. In mathematical terms, it is required to decompose the covariance matrix $\Sigma$ of the random vector as the sum of a diagonal matrix $D$ | accounting for the idiosyncratic noise in the data | and a low rank matrix $R$ | accounting for the variance of the common factors | in such a way that the rank of $R$ is as small as possible so that the number of common factors is minimal. In practice, however, the matrix $\Sigma$ is unknown and must be replaced by its estimate, i.e. the sample covariance, which comes from a finite amount of data. This paper provides a strategy to account for the uncertainty in the estimation of $\Sigma$ in the factor analysis problem.
Nowadays, with the remarkable expansion of the information through the internet, users prefer to receive the exact information that they need through some suggestions from their friends or profiles to save their time and money. Recommend systems based on different algorithms as one of the basic ways to reach this goal through the internet have been proposed but each of them has their own advantages and disadvantages. In this study, we have selected and implemented two approaches which are Collaborative Filtering (CF) and Social Network Recommendations System (SNRS). Based on some limitations to finding a dataset which covers friendship, rating and item categories we generated it for 10 categories, 10 items, and 100 users and compared two approaches. We used Mean Absolute Error (MAE) and accuracy to compare the result of two mentioned approaches and found that the SNRS method as it is claimed to be improved version of CF works more efficiency.