Tableau Public Tableau Public is a free data storytelling application. Create and share interactive charts and graphs, stunning maps, live dashboards and fun applications in minutes, then publish anywhere on the web. Anyone can do it, it’s that easy – and it’s free.
Tag Management System
A Tag Management System (TMS) replaces hard-coded tags that are used for marketing, analytics, and testing on a website, with dynamic tags that are easier to implement and update. Every tag management system uses a container tag – a small snippet of code that allows you to dynamically insert tags into your website. You can think of container tags as buckets that hold other types of tags. You control which tags are added to the buckets using a simple web interface. In 2012, Google released a TMS called Google Tag Manager, which has quickly become one of the most widely used Tag Management Systems in the market. The benefits of tag management (and specifically Google Tag Manager) are enormous to any business, large or small. You can add and update Google AdWords tags, Google Analytics tags, DoubleClick Floodlight tags and many non-Google third-party tags directly from Google Tag Manager, instead of editing site code. This reduces errors, frees you from having to involve a webmaster, and allows you to quickly deploy tags on your site. To effectively use tag management, it’s important to understand basic concepts like the data layer, triggers, and variables.
Tagger We present a framework for efficient perceptual inference that explicitly reasons about the segmentation of its inputs and features. Rather than being trained for any specific segmentation, our framework learns the grouping process in an unsupervised manner or alongside any supervised task. By enriching the representations of a neural network, we enable it to group the representations of different objects in an iterative manner. By allowing the system to amortize the iterative inference of the groupings, we achieve very fast convergence. In contrast to many other recently proposed methods for addressing multi-object scenes, our system does not assume the inputs to be images and can therefore directly handle other modalities. For multi-digit classification of very cluttered images that require texture segmentation, our method offers improved classification performance over convolutional networks despite being fully connected. Furthermore, we observe that our system greatly improves on the semi-supervised result of a baseline Ladder network on our dataset, indicating that segmentation can also improve sample efficiency.
Tagging Systems Tagging systems have become increasingly popular. These systems enable users to add keywords (i.e., ‘tags’) to Internet resources (e.g., web pages, images, videos) without relying on a controlled vocabulary. Tagging systems have the potential to improve search, spam detection, reputation systems, and personal organization while introducing new modalities of social communication and opportunities for data mining. This potential is largely due to the social structure that underlies many of the current systems. Despite the rapid expansion of applications that support tagging of resources, tagging systems are still not well studied or understood. In this paper, we provide a short description of the academic related work to date. We offer a model of tagging systems, specifically in the context of web-based systems, to help us illustrate the possible benefits of these tools. Since many such systems already exist, we provide a taxonomy of tagging systems to help inform their analysis and design, and thus enable researchers to frame and compare evidence for the sustainability of such systems. We also provide a simple taxonomy of incentives and contribution models to inform potential evaluative frameworks. While this work does not present comprehensive empirical results, we present a preliminary study of the photosharing and tagging system Flickr to demonstrate our model and explore some of the issues in one sample system. This analysis helps us outline and motivate possible future directions of research in tagging systems.
Takeya Semantic Structure Analysis
TANKER Named Entity Recognition and Disambiguation (NERD) systems have recently been widely researched to deal with the significant growth of the Web. NERD systems are crucial for several Natural Language Processing (NLP) tasks such as summarization, understanding, and machine translation. However, there is no standard interface specification, i.e. these systems may vary significantly either for exporting their outputs or for processing the inputs. Thus, when a given company desires to implement more than one NERD system, the process is quite exhaustive and prone to failure. In addition, industrial solutions demand critical requirements, e.g., large-scale processing, completeness, versatility, and licenses. Commonly, these requirements impose a limitation, making good NERD models to be ignored by companies. This paper presents TANKER, a distributed architecture which aims to overcome scalability, reliability and failure tolerance limitations related to industrial needs by combining NERD systems. To this end, TANKER relies on a micro-services oriented architecture, which enables agile development and delivery of complex enterprise applications. In addition, TANKER provides a standardized API which makes possible to combine several NERD systems at once.
Target Diagram tdr
Targeted Learning The statistics profession is at a unique point in history. The need for valid statistical tools is greater than ever; data sets are massive, often measuring hundreds of thousands of measurements for a single subject. The field is ready to move towards clear objective benchmarks under which tools can be evaluated. Targeted learning allows
1) the full generalization and utilization of cross-validation as an estimator selection tool so that the subjective choices made by humans are now made by the machine, and
2) targeting the fitting of the probability distribution of the data toward the target parameter representing the scientific question of interest.
Targeted learning methods build machine-learning-based estimators of parameters defined as features of the probability distribution of the data, while also providing influence-curve or bootstrap-based confidence internals. The theory offers a general template for creating targeted maximum likelihood estimators for a data structure, nonparametric or semiparametric statistical model, and parameter mapping. These estimators of causal inference parameters are double robust and have a variety of other desirable statistical properties.
Targeted maximum likelihood estimation built on the loss-based ‘super learning’ system such that lower-dimensional parameters could be targeted (e.g., a marginal causal effect); the remaining bias for the (low-dimensional) target feature of the probability distribution was removed. Targeted learning for effect estimation and causal inference allows for the complete integration of machine learning advances in prediction while providing statistical inference for the target parameter(s) of interest.
Targeted Maximum Likelihood Estimation
Maximum likelihood estimation fits a model to data, minimizing a global measure, such as mean squared error (MSE). When we are interested in one particular parameter of the data distribution and consider the remaining parameters to be nuisance parameters, we would prefer an estimate that has smaller bias and variance for the targeted parameter, at the expense of increased bias and/or variance in the estimation of nuisance parameters. Targeted maximum likelihood estimation targets the MLE estimate of the parameter of interest in a way that reduces bias. This bias reduction is sometimes accompanied by an increase in the variance of the estimate, but the procedure often reduces variance as well in finite samples. Asymptotically, TMLE is maximally efficient when the model and nuisance parameters are correctly specified.
The framework of targeted maximum likelihood estimation (TMLE), introduced in van der Laan & Rubin (2006), is a principled approach for constructing asymptotically linear and efficient substitution estimators in rich infinite-dimensional models. The mechanics of TMLE hinge upon first-order approximations of the parameter of interest as a mapping on the space of probability distributions. For such approximations to hold, a second-order remainder term must tend to zero sufficiently fast. In practice, this means an initial estimator of the underlying data-generating distribution with a sufficiently large rate of convergence must be available — in many cases, this requirement is prohibitively difficult to satisfy.
Targeted Minimum Loss Based Estimation
Targeted minimum loss based estimation (TMLE) provides a template for the construction of semiparametric locally efficient double robust substitution estimators of the target parameter of the data generating distribution in a semiparametric censored data or causal inference model based on a sample of independent and identically distributed copies from this data generating distribution.
A New Approach to Hierarchical Data Analysis: Targeted Maximum Likelihood Estimation of Cluster-Based Effects Under Interference
Tarjan’s Strongly Connected Components Algorithm Tarjan’s Algorithm (named for its discoverer, Robert Tarjan) is a graph theory algorithm for finding the strongly connected components of a graph. Although it precedes it chronologically, it can be seen as an improved version of Kosaraju’s algorithm, and is comparable in efficiency to the path-based strong component algorithm.
TauCharts Javascript charts with a focus on data, design and flexibility. Free open source D3.js-based library. TauCharts is the data-focused charting library. Our goal – help people to build interactive complex visualizations easily.
Achieve Charting Zen With TauCharts
tau-False Positive Learning
Learning a classifier with control on the false-positive rate plays a critical role in many machine learning applications. Existing approaches either introduce prior knowledge dependent label cost or tune parameters based on traditional classifiers, which lack consistency in methodology because they do not strictly adhere to the false-positive rate constraint. In this paper, we propose a novel scoring-thresholding approach, tau-False Positive Learning (tau-FPL) to address this problem. We show the scoring problem which takes the false-positive rate tolerance into accounts can be efficiently solved in linear time, also an out-of-bootstrap thresholding method can transform the learned ranking function into a low false-positive classifier. Both theoretical analysis and experimental results show superior performance of the proposed tau-FPL over existing approaches.
TBATS Models
The identifier BATS is an acronym for key features of the model: Box-Cox transform, ARMA errors, Trend, and Seasonal components. The initial T in TBATS is connoting ‘trigonometric’.
t-Distributed Stochastic Neighbor Embedding
t-distributed stochastic neighbor embedding (t-SNE) is a machine learning algorithm for dimensionality reduction developed by Laurens van der Maaten and Geoffrey Hinton. It is a nonlinear dimensionality reduction technique that is particularly well suited for embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot. Specifically, it models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points. The t-SNE algorithms comprises two main stages. First, t-SNE constructs a probability distribution over pairs of high-dimensional objects in such a way that similar objects have a high probability of being picked, whilst dissimilar points have an infinitesimal probability of being picked. Second, t-SNE defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the Kullback-Leibler divergence between the two distributions with respect to the locations of the points in the map.
Visualizing Data using t-SNE
TEA Functions
• Transformations are functions that take existing input data and apply a function to it such that it changes form. A simple example could be combining first name, middle name, and last name fields in source data and creating a full name field that is the combination of the three sub fields.
• Enrichments are functions that take existing input data, combined with additional data sources, and create new information that could not be gleaned from either source independently. For example, one could take two different lists of individuals and use pattern matching to create relationships that are not apparent from either list itself.
• Augmentations are functions that add data of use in combination with the input data. The result is a more complete set of information that combines data from multiple sources. For example, a set of business entities gleaned from a conference attendee list, combined with Dun and Bradstreet profiles for those entities, creates a more complete set of information for each business entity.
Teacher-Student Curriculum Learning
We propose Teacher-Student Curriculum Learning (TSCL), a framework for automatic curriculum learning, where the Student tries to learn a complex task and the Teacher automatically chooses subtasks from a given set for the Student to train on. We describe a family of Teacher algorithms that rely on the intuition that the Student should practice more those tasks on which it makes the fastest progress, i.e. where the slope of the learning curve is highest. In addition, the Teacher algorithms address the problem of forgetting by also choosing tasks where the Student’s performance is getting worse. We demonstrate that TSCL matches or surpasses the results of carefully hand-crafted curricula in two tasks: addition of decimal numbers with LSTM and navigation in Minecraft. Using our automatically generated curriculum enabled to solve a Minecraft maze that could not be solved at all when training directly on solving the maze, and the learning was an order of magnitude faster than uniform sampling of subtasks.
Teaching-Learning-Based Optimization
A new efficient optimization method, called ‘Teaching-Learning-Based Optimization (TLBO)’, is proposed in this paper for the optimization of mechanical design problems. This method works on the effect of influence of a teacher on learners. Like other nature-inspired algorithms, TLBO is also a population-based method and uses a population of solutions to proceed to the global solution. The population is considered as a group of learners or a class of learners. The process of TLBO is divided into two parts: the first part consists of the ‘Teacher Phase’ and the second part consists of the ‘Learner Phase’. ‘Teacher Phase’ means learning from the teacher and ‘Learner Phase’ means learning by the interaction between learners. The basic philosophy of the TLBO method is explained in detail. To check the effectiveness of the method it is tested on five different constrained benchmark test functions with different characteristics, four different benchmark mechanical design problems and six mechanical design optimization problems which have real world applications. The effectiveness of the TLBO method is compared with the other population-based optimization algorithms based on the best solution, average solution, convergence rate and computational effort. Results show that TLBO is more effective and efficient than the other optimization methods for the mechanical design optimization problems considered. This novel optimization method can be easily extended to other engineering design optimization problems.
Teaching Learning Based Optimization Algorithm
TeKnowbase In this paper, we describe the construction of TeKnowbase, a knowledge-base of technical concepts in computer science. Our main information sources are technical websites such as Webopedia and Techtarget as well as Wikipedia and online textbooks. We divide the knowledge-base construction problem into two parts — the acquisition of entities and the extraction of relationships among these entities. Our knowledge-base consists of approximately 100,000 triples. We conducted an evaluation on a sample of triples and report an accuracy of a little over 90\%. We additionally conducted classification experiments on StackOverflow data with features from TeKnowbase and achieved improved classification accuracy.
Template Model Builder glmmTMB
Temporal Aggregation We call temporal aggregation the situation in which a variable that evolves through time can not be observed at all dates. This phenomenon arises frequently in economics, where it is very expensive to collect data on certain variables, and there is no reason to believe that economic time series are collected at the frequency required to fully capture the movements of the economy. For example, we only have quarterly observations on GNP, but it is reasonable to believe that the behavior of GNP within a quarter carries relevant information about the structure of the economy.
Temporal Automatic Relation Discovery in Sequences
Recent empirical results on long-term dependency tasks have shown that neural networks augmented with an external memory can learn the long-term dependency tasks more easily and achieve better generalization than vanilla recurrent neural networks (RNN). We suggest that memory augmented neural networks can reduce the effects of vanishing gradients by creating shortcut (or wormhole) connections. Based on this observation, we propose a novel memory augmented neural network model called TARDIS (Temporal Automatic Relation Discovery in Sequences). The controller of TARDIS can store a selective set of embeddings of its own previous hidden states into an external memory and revisit them as and when needed. For TARDIS, memory acts as a storage for wormhole connections to the past to propagate the gradients more effectively and it helps to learn the temporal dependencies. The memory structure of TARDIS has similarities to both Neural Turing Machines (NTM) and Dynamic Neural Turing Machines (D-NTM), but both read and write operations of TARDIS are simpler and more efficient. We use discrete addressing for read/write operations which helps to substantially to reduce the vanishing gradient problem with very long sequences. Read and write operations in TARDIS are tied with a heuristic once the memory becomes full, and this makes the learning problem simpler when compared to NTM or D-NTM type of architectures. We provide a detailed analysis on the gradient propagation in general for MANNs. We evaluate our models on different long-term dependency tasks and report competitive results in all of them.
Temporal Database A temporal database is a database with built-in support for handling data involving time, being related to Slowly changing dimension concept, for example a temporal data model and a temporal version of Structured Query Language (SQL). More specifically the temporal aspects usually include valid time and transaction time. These attributes can be combined to form bitemporal data. Valid time is the time period during which a fact is true with respect to the real world. Transaction time is the time period during which a fact stored in the database is considered to be true. Bitemporal data combines both Valid and Transaction Time. It is possible to have timelines other than Valid Time and Transaction Time, such as Decision Time, in the database. In that case the database is called a multitemporal database as opposed to a bitemporal database. However, this approach introduces additional complexities such as dealing with the validity of (foreign) keys. Temporal databases are in contrast to current databases, which store only facts which are believed to be true at the current time.
Temporal Difference Learning
Temporal difference (TD) learning is a prediction method. It has been mostly used for solving the reinforcement learning problem. ‘TD learning is a combination of Monte Carlo ideas and dynamic programming (DP) ideas.’ TD resembles a Monte Carlo method because it learns by sampling the environment according to some policy. TD is related to dynamic programming techniques because it approximates its current estimate based on previously learned estimates (a process known as bootstrapping). The TD learning algorithm is related to the temporal difference model of animal learning.
Adaptive Lambda Least-Squares Temporal Difference Learning
Temporal Event Graph
Temporal networks are increasingly being used to model the interactions of complex systems. Most studies require the temporal aggregation of edges (or events) into discrete time steps to perform analysis. In this article we describe a static, lossless, and unique representation of a temporal network, the temporal event graph (TEG). The TEG describes the temporal network in terms of both the inter-event time and two-event temporal motif distributions. By considering these distributions in unison we provide a new method to characterise the behaviour of individuals and collectives in temporal networks as well as providing a natural decomposition of the network. We illustrate the utility of the TEG by providing examples on both synthetic and real temporal networks.
Temporal Exponential Random Graph Model
Temporal Hierarchical Clustering We study hierarchical clusterings of metric spaces that change over time. This is a natural geometric primitive for the analysis of dynamic data sets. Specifically, we introduce and study the problem of finding a temporally coherent sequence of hierarchical clusterings from a sequence of unlabeled point sets. We encode the clustering objective by embedding each point set into an ultrametric space, which naturally induces a hierarchical clustering of the set of points. We enforce temporal coherence among the embeddings by finding correspondences between successive pairs of ultrametric spaces which exhibit small distortion in the Gromov-Hausdorff sense. We present both upper and lower bounds on the approximability of the resulting optimization problems.
Temporal Multinomial Mixture
Evolutionary clustering aims at capturing the temporal evolution of clusters. This issue is particularly important in the context of social media data that are naturally temporally driven. In this paper, we propose a new probabilistic model-based evolutionary clustering technique. The Temporal Multinomial Mixture (TMM) is an extension of classical mixture model that optimizes feature co-occurrences in the trade-off with temporal smoothness. Our model is evaluated for two recent case studies on opinion aggregation over time. We compare four different probabilistic clustering models and we show the superiority of our proposal in the task of instance-oriented clustering.
Temporal Network Autocorrelation Models
Temporal Network Centrality
Temporal Overdrive Recurrent Neural Network In this work we present a novel recurrent neural network architecture designed to model systems characterized by multiple characteristic timescales in their dynamics. The proposed network is composed by several recurrent groups of neurons that are trained to separately adapt to each timescale, in order to improve the system identification process. We test our framework on time series prediction tasks and we show some promising, preliminary results achieved on synthetic data. To evaluate the capabilities of our network, we compare the performance with several state-of-the-art recurrent architectures.
Temporal Regularized Matrix Factorization
Matrix factorization approaches have been applied to a variety of applications, from recommendation systems to multi-label learning. Standard low rank matrix factorization methods fail in cases when the data can be modeled as a time series, since they do not take into account the dependencies among factors, while EM algorithms designed for time series data are inapplicable to large multiple time series data. To overcome this, matrix factorization approaches are augmented with dynamic linear model based regularization frameworks. A major drawback in such approaches is that the exact dependencies between the latent factors are assumed to be known. In this paper, we introduce a Temporal Regularized Matrix Factorization (TRMF) method, an efficient alternating minimization scheme that not only learns the latent time series factors, but also the dependencies among the latent factors. TRMF is highly general, and subsumes several existing matrix factorization approaches for time series data. We make interesting connections to graph based matrix factorization methods in the context of learning the dependencies. Experiments on both real and synthetic data show that TRMF is highly scalable, and outperforms several existing approaches used for common large scale time series tasks.
TensiStrength Computer systems need to be able to react to stress in order to perform optimally on some tasks. This article describes TensiStrength, a system to detect the strength of stress and relaxation expressed in social media text messages. TensiStrength uses a lexical approach and a set of rules to detect direct and indirect expressions of stress or relaxation, particularly in the context of transportation. It is slightly more effective than a comparable sentiment analysis program, although their similar performances occur despite differences on almost half of the tweets gathered. The effectiveness of TensiStrength depends on the nature of the tweets classified, with tweets that are rich in stress-related terms being particularly problematic. Although generic machine learning methods can give better performance than TensiStrength overall, they exploit topic-related terms in a way that may be undesirable in practical applications and that may not work as well in more focused contexts. In conclusion, TensiStrength and generic machine learning approaches work well enough to be practical choices for intelligent applications that need to take advantage of stress information, and the decision about which to use depends on the nature of the texts analysed and the purpose of the task.
Tensor Graphical Lasso
The Bigraphical Lasso estimator was proposed to parsimoniously model the precision matrices of matrix-normal data based on the Cartesian product of graphs. By enforcing extreme sparsity (the number of parameters) and explicit structures on the precision matrix, this model has excellent potential for improving scalability of the computation and interpretability of complex data analysis. As a result, this model significantly reduces the size of the sample in order to learn the precision matrices, and hence the conditional probability models along different coordinates such as space, time and replicates. In this work, we extend the Bigraphical Lasso (BiGLasso) estimator to the TEnsor gRAphical Lasso (TeraLasso) estimator and propose an analogous method for modeling the precision matrix of tensor-valued data. We establish consistency for both the BiGLasso and TeraLasso estimators and obtain the rates of convergence in the operator and Frobenius norm for estimating the precision matrix. We design a scalable gradient descent method for solving the objective function and analyze the computational convergence rate, showing that the composite gradient descent algorithm is guaranteed to converge at a geometric rate to the global minimizer. Finally, we provide simulation evidence and analysis of a meteorological dataset, showing that we can recover graphical structures and estimate the precision matrices, as predicted by theory.
Tensor Methods Tensors are generalizations of matrices that let you look beyond pairwise relationships to higher-dimensional models (a matrix is a second-order tensor). For instance, one can examine patterns between any three (or more) dimensions in data sets. In a text mining application, this leads to models that incorporate the co-occurrence of three or more words, and in social networks, you can use tensors to encode arbitrary degrees of influence (e.g., ‘friend of friend of friend’ of a user).
Tensors, as generalizations of vectors and matrices, have become increasingly popular in different areas of machine learning and data mining, where they are employed to approach a diverse number of difficult learning and analysis tasks. Prominent examples include learning on multi-relational data and large-scale knowledge bases, recommendation systems, computer vision, mining boolean data, neuroimaging or the analysis of time-varying networks. The success of tensors methods is strongly related to their ability to efficiently model, analyse and predict data with multiple modalities. To address specific challenges and problems, a variety of methods has been developed in different fields of application.
Tensor Network Language Model We propose a new statistical model suitable for machine learning tasks of systems with long distance correlations such as human languages. The model is based on directed acyclic graph decorated by multi-linear tensor maps in the vertices and vector spaces in the edges, called tensor network. Such tensor networks have been previously employed for effective numerical computation of the renormalization group flow on the space of effective quantum field theories and lattice models of statistical mechanics. We provide explicit algebro-geometric analysis of the parameter moduli space for tree graphs, discuss model properties and applications such as statistical translation.
Tensor Product Generation Network
We present a new tensor product generation network (TPGN) that generates natural language descriptions for images. The model has a novel architecture that instantiates a general framework for encoding and processing symbolic structure through neural network computation. This framework is built on Tensor Product Representations (TPRs). We evaluated the proposed TPGN on the MS COCO image captioning task. The experimental results show that the TPGN outperforms the LSTM based state-of-the-art baseline with a significant margin. Further, we show that our caption generation model can be interpreted as generating sequences of grammatical categories and retrieving words by their categories from a plan encoded as a distributed representation.
Tensor Rank Decomposition In multilinear algebra, the tensor rank decomposition or canonical polyadic decomposition (CPD) may be regarded as a generalization of the matrix singular value decomposition (SVD) to tensors, which has found application in statistics, signal processing, psychometrics, linguistics and chemometrics. It was introduced by Hitchcock in 1927 and later rediscovered several times, notably in psychometrics. For this reason, the tensor rank decomposition is sometimes historically referred to as PARAFAC or CANDECOMP.
Tensor Regression Networks
To date, most convolutional neural network architectures output predictions by flattening 3rd-order activation tensors, and applying fully-connected output layers. This approach has two drawbacks: (i) we lose rich, multi-modal structure during the flattening process and (ii) fully-connected layers require many parameters. We present the first attempt to circumvent these issues by expressing the output of a neural network directly as the the result of a multi-linear mapping from an activation tensor to the output. By imposing low-rank constraints on the regression tensor, we can efficiently solve problems for which existing solutions are badly parametrized. Our proposed tensor regression layer replaces flattening operations and fully-connected layers by leveraging multi-modal structure in the data and expressing the regression weights via a low rank tensor decomposition. Additionally, we combine tensor regression with tensor contraction to further increase efficiency. Augmenting the VGG and ResNet architectures, we demonstrate large reductions in the number of parameters with negligible impact on performance on the ImageNet dataset.
Tensor Robust Principal Component Analysis
Tensor robust principal component analysis (TRPCA) has received a substantial amount of attention in various fields. Most existing methods, normally relying on tensor nuclear norm minimization, need to pay an expensive computational cost due to multiple singular value decompositions (SVDs) at each iteration. To overcome the drawback, we propose a scalable and efficient method, named Parallel Active Subspace Decomposition (PASD), which divides the unfolding along each mode of the tensor into a columnwise orthonormal matrix (active subspace) and another small-size matrix in parallel. Such a transformation leads to a nonconvex optimization problem in which the scale of nulcear norm minimization is generally much smaller than that in the original problem. Furthermore, we introduce an alternating direction method of multipliers (ADMM) method to solve the reformulated problem and provide rigorous analyses for its convergence and suboptimality. Experimental results on synthetic and real-world data show that our algorithm is more accurate than the state-of-the-art approaches, and is orders of magnitude faster.
Tensor Switching Networks
We present a novel neural network algorithm, the Tensor Switching (TS) network, which generalizes the Rectified Linear Unit (ReLU) nonlinearity to tensor-valued hidden units. The TS network copies its entire input vector to different locations in an expanded representation, with the location determined by its hidden unit activity. In this way, even a simple linear readout from the TS representation can implement a highly expressive deep-network-like function. The TS network hence avoids the vanishing gradient problem by construction, at the cost of larger representation size. We develop several methods to train the TS network, including equivalent kernels for infinitely wide and deep TS networks, a one-pass linear learning algorithm, and two backpropagation-inspired representation learning algorithms. Our experimental results demonstrate that the TS network is indeed more expressive and consistently learns faster than standard ReLU networks.
Tensor Train Rank Minimization Tensor train (TT) decomposition provides a space-efficient representation for higher-order tensors. Despite its advantage, we face two crucial limitations when we apply the TT decomposition to machine learning problems: the lack of statistical theory and of scalable algorithms. In this paper, we address the limitations. First, we introduce a convex relaxation of the TT decomposition problem and derive its error bound for the tensor completion task. Next, we develop an alternating optimization method with a randomization technique, in which the time complexity is as efficient as the space complexity is. In experiments, we numerically confirm the derived bounds and empirically demonstrate the performance of our method with a real higher-order tensor.
Tensor2Tensor Deep Learning (DL) has enabled the rapid advancement of many useful technologies, such as machine translation, speech recognition and object detection. In the research community, one can find code open-sourced by the authors to help in replicating their results and further advancing deep learning. However, most of these DL systems use unique setups that require significant engineering effort and may only work for a specific problem or architecture, making it hard to run new experiments and compare the results.
Today, we are happy to release Tensor2Tensor (T2T), an open-source system for training deep learning models in TensorFlow. T2T facilitates the creation of state-of-the art models for a wide variety of ML applications, such as translation, parsing, image captioning and more, enabling the exploration of various ideas much faster than previously possible. This release also includes a library of datasets and models, including the best models from a few recent papers (Attention Is All You Need, Depthwise Separable Convolutions for Neural Machine Translation and One Model to Learn Them All) to help kick-start your own DL research.
TensorFlow TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at
TensorFlow Agents We introduce TensorFlow Agents, an efficient infrastructure paradigm for building parallel reinforcement learning algorithms in TensorFlow. We simulate multiple environments in parallel, and group them to perform the neural network computation on a batch rather than individual observations. This allows the TensorFlow execution engine to parallelize computation, without the need for manual synchronization. Environments are stepped in separate Python processes to progress them in parallel without interference of the global interpreter lock. As part of this project, we introduce BatchPPO, an efficient implementation of the proximal policy optimization algorithm. By open sourcing TensorFlow Agents, we hope to provide a flexible starting point for future projects that accelerates future research in the field.
Tensorflow Boosted Trees
TF Boosted Trees (TFBT) is a new open-sourced frame-work for the distributed training of gradient boosted trees. It is based on TensorFlow, and its distinguishing features include a novel architecture, automatic loss differentiation, layer-by-layer boosting that results in smaller ensembles and faster prediction, principled multi-class handling, and a number of regularization techniques to prevent overfitting.
Tensorial Mixture Models We introduce a generative model, we call Tensorial Mixture Models (TMMs) based on mixtures of basic component distributions over local structures (e.g. patches in an image) where the dependencies between the local-structures are represented by a ‘priors tensor’ holding the prior probabilities of assigning a component distribution to each local-structure. In their general form, TMMs are intractable as the prior tensor is typically of exponential size. However, when the priors tensor is decomposed it gives rise to an arithmetic circuit which in turn transforms the TMM into a Convolutional Arithmetic Circuit (ConvAC). A ConvAC corresponds to a shallow (single hidden layer) network when the priors tensor is decomposed by a CP (sum of rank-1) approach and corresponds to a deep network when the decomposition follows the Hierarchical Tucker (HT) model. The ConvAC representation of a TMM possesses several attractive properties. First, the inference is tractable and is implemented by a forward pass through a deep network. Second, the architectural design of the model follows the deep networks community design, i.e., the structure of TMMs is determined by just two easily understood factors: size of pooling windows and number of channels. Finally, we demonstrate the effectiveness of our model when tackling the problem of classification with missing data, leveraging TMMs unique ability of tractable marginalization which leads to optimal classifiers regardless of the missingness distribution.
Tensorized LSTM Long Short-Term Memory (LSTM) is a popular approach to boosting the ability of Recurrent Neural Networks to store longer term temporal information. The capacity of an LSTM network can be increased by widening and adding layers. However, usually the former introduces additional parameters, while the latter increases the runtime. As an alternative we propose the Tensorized LSTM in which the hidden states are represented by tensors and updated via a cross-layer convolution. By increasing the tensor size, the network can be widened efficiently without additional parameters since the parameters are shared across different locations in the tensor; by delaying the output, the network can be deepened implicitly with little additional runtime since deep computations for each timestep are merged into temporal computations of the sequence. Experiments conducted on five challenging sequence learning tasks show the potential of the proposed model.
TensorLayer Deep learning has enabled major advances in the fields of computer vision, natural language processing, and multimedia among many others. Developing a deep learning system is arduous and complex, as it involves constructing neural network architectures, managing training/trained models, tuning optimization process, preprocessing and organizing data, etc. TensorLayer is a versatile Python library that aims at helping researchers and engineers efficiently develop deep learning systems. It offers rich abstractions for neural networks, model and data management, and parallel workflow mechanism. While boosting efficiency, TensorLayer maintains both performance and scalability. TensorLayer was released in September 2016 on GitHub, and has helped people from academia and industry develop real-world applications of deep learning.
TensorLog We present an implementation of a probabilistic first-order logic called TensorLog, in which classes of logical queries are compiled into differentiable functions in a neural-network infrastructure such as Tensorflow or Theano. This leads to a close integration of probabilistic logical reasoning with deep-learning infrastructure: in particular, it enables high-performance deep learning frameworks to be used for tuning the parameters of a probabilistic logic. Experimental results show that TensorLog scales to problems involving hundreds of thousands of knowledge-base triples and tens of thousands of examples.
TensorLy Tensor methods are gaining increasing traction in machine learning. However, there are scant to no resources available to perform tensor learning and decomposition in Python. To answer this need we developed TensorLy. TensorLy is a state of the art general purpose library for tensor learning. Written in Python, it aims at following the same standards adopted by the main projects of the Python scientific community and fully integrating with these. It allows for fast and straightforward tensor decomposition and learning and comes with exhaustive tests, thorough documentation and minimal dependencies. It can be easily extended and its BSD licence makes it suitable for both academic and commercial applications. TensorLy is available at https://…/tensorly.
Tensor-Train RNN
We present Tensor-Train RNN (TT-RNN), a novel family of neural sequence architectures for multivariate forecasting in environments with nonlinear dynamics. Long-term forecasting in such systems is highly challenging, since there exist long-term temporal dependencies, higher-order correlations and sensitivity to error propagation. Our proposed tensor recurrent architecture addresses these issues by learning the nonlinear dynamics directly using higher order moments and high-order state transition functions. Furthermore, we decompose the higher-order structure using the tensor-train (TT) decomposition to reduce the number of parameters while preserving the model performance. We theoretically establish the approximation properties of Tensor-Train RNNs for general sequence inputs, and such guarantees are not available for usual RNNs. We also demonstrate significant long-term prediction improvements over general RNN and LSTM architectures on a range of simulated environments with nonlinear dynamics, as well on real-world climate and traffic data.
Term Document Matrix A document-term matrix or term-document matrix is a mathematical matrix that describes the frequency of terms that occur in a collection of documents. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms. There are various schemes for determining the value that each entry in the matrix should take. One such scheme is tf-idf. They are useful in the field of natural language processing.
Term Frequency – Inverse Document Frequency
tf-idf, short for term frequency-inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining. The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to control for the fact that some words are generally more common than others. Variations of the tf-idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document’s relevance given a user query. tf-idf can be successfully used for stop-words filtering in various subject fields including text summarization and classification. One of the simplest ranking functions is computed by summing the tf-idf for each query term; many more sophisticated ranking functions are variants of this simple model.
Termbase Exchange TermBase eXchange (TBX) ist eine XML-basierte Auszeichnungssprache für den Austausch von Terminologiedaten, die meist in Terminologiedatenbanken verwaltet werden. Anwendungen, die dieses Format unterstützen, können Terminologiebestände untereinander austauschen und pflegen. Ursprünglich ein Standard der Localization Industry Standards Association (LISA), nahm sich die ISO des Standards an und überarbeitete und spezifizierte ihn in ISO 30042, welcher sich auf ISO 12620, ISO 12200 und ISO 16642 stützt.
Terminology Extraction Terminology mining, term extraction, term recognition, or glossary extraction, is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus. In the semantic web era, a growing number of communities and networked enterprises started to access and interoperate through the internet. Modeling these communities and their information needs is important for several web applications, like topic-driven web crawlers, web services, recommender systems, etc. The development of terminology extraction is essential to the language industry. One of the first steps to model the knowledge domain of a virtual community is to collect a vocabulary of domain-relevant terms, constituting the linguistic surface manifestation of domain concepts. Several methods to automatically extract technical terms from domain-specific document warehouses have been described in the literature. Typically, approaches to automatic term extraction make use of linguistic processors (part of speech tagging, phrase chunking) to extract terminological candidates, i.e. syntactically plausible terminological noun phrases, NPs (e.g. compounds ‘credit card’, adjective-NPs ‘local tourist information office’, and prepositional-NPs ‘board of directors’ – in English, the first two constructs are the most frequent). Terminological entries are then filtered from the candidate list using statistical and machine learning methods. Once filtered, because of their low ambiguity and high specificity, these terms are particularly useful for conceptualizing a knowledge domain or for supporting the creation of a domain ontology. Furthermore, terminology extraction is a very useful starting point for semantic similarity, knowledge management, human translation and machine translation, etc.
Ternary Neural Networks
The computation and storage requirements for Deep Neural Networks (DNNs) are usually high. This issue limit their deployability on ubiquitous computing devices such as smart phones or wearables. In this paper, we propose ternary neural networks (TNNs) in order to make deep learning more resource-efficient. We train these TNNs using a teacher-student approach. Using only ternary weights and ternary neurons, with a step activation function of two-thresholds, the student ternary network learns to mimic the behaviour of its teacher network. We propose a novel, layer-wise greedy methodology for training TNNs. During training, a ternary neural network inherently prunes the smaller weights by setting them to zero. This makes them even more compact thus more resource-friendly. We devise a purpose-built hardware design for TNNs and implement it on FPGA. The benchmark results with our purpose-built hardware running TNNs reveal that, with only 1.24 microjoules per image, we can achieve 97.76% accuracy with 5.37 microsecond latency and with a rate of 255K images per second on MNIST.
Ternary Plot / Ternary Diagram A ternary plot, ternary graph, triangle plot, simplex plot, or de Finetti diagram is a barycentric plot on three variables which sum to a constant. It graphically depicts the ratios of the three variables as positions in an equilateral triangle.
Ternary Residual Networks Sub-8-bit representation of DNNs incur some noticeable loss of accuracy despite rigorous (re)training at low-precision. Such loss of accuracy essentially makes them equivalent to a much shallower counterpart, diminishing the power of being deep networks. To address this problem of accuracy drop we introduce the notion of \textit{residual networks} where we add more low-precision edges to sensitive branches of the sub-8-bit network to compensate for the lost accuracy. Further, we present a perturbation theory to identify such sensitive edges. Aided by such an elegant trade-off between accuracy and model size, the 8-2 architecture (8-bit activations, ternary weights), enhanced by residual ternary edges, turns out to be sophisticated enough to achieve similar accuracy as 8-8 representation ($\sim 1\%$ drop from our FP-32 baseline), despite $\sim 1.6\times$ reduction in model size, $\sim 26\times$ reduction in number of multiplications , and potentially $\sim 2\times$ inference speed up comparing to 8-8 representation, on the state-of-the-art deep network ResNet-101 pre-trained on ImageNet dataset. Moreover, depending on the varying accuracy requirements in a dynamic environment, the deployed low-precision model can be upgraded/downgraded on-the-fly by partially enabling/disabling residual connections. For example, disabling the least important residual connections in the above enhanced network, the accuracy drop is $\sim 2\%$ (from our FP-32 baseline), despite $\sim 1.9\times$ reduction in model size, $\sim 32\times$ reduction in number of multiplications, and potentially $\sim 2.3\times$ inference speed up comparing to 8-8 representation. Finally, all the ternary connections are sparse in nature, and the residual ternary conversion can be done in a resource-constraint setting without any low-precision (re)training and without accessing the data.
Ternary Weight Neural Networks
We introduce ternary weight networks (TWNs) – neural networks with weights constrained to +1, 0 and -1. The Euclidian distance between full (float or double) precision weights and the ternary weights along with a scaling factor is minimized. Besides, a threshold-based ternary function is optimized to get an approximated solution which can be fast and easily computed. TWNs have stronger expressive abilities than the recently proposed binary precision counterparts and are thus more effective than the latter. Meanwhile, TWNs achieve up to 16× or 32× model compression rate and need fewer multiplications compared with the full precision counterparts. Benchmarks on MNIST, CIFAR-10, and large scale ImageNet datasets show that the performance of TWNs is only slightly worse than the full precision counterparts but outperforms the analogous binary precision counterparts a lot.
“Ternary Neural Networks”
TerpreT We study machine learning formulations of inductive program synthesis; that is, given input-output examples, synthesize source code that maps inputs to corresponding outputs. Our key contribution is TerpreT, a domain-specific language for expressing program synthesis problems. A TerpreT model is composed of a specification of a program representation and an interpreter that describes how programs map inputs to outputs. The inference task is to observe a set of input-output examples and infer the underlying program. From a TerpreT model we automatically perform inference using four different back-ends: gradient descent (thus each TerpreT model can be seen as defining a differentiable interpreter), linear program (LP) relaxations for graphical models, discrete satisfiability solving, and the Sketch program synthesis system. TerpreT has two main benefits. First, it enables rapid exploration of a range of domains, program representations, and interpreter models. Second, it separates the model specification from the inference algorithm, allowing proper comparisons between different approaches to inference. We illustrate the value of TerpreT by developing several interpreter models and performing an extensive empirical comparison between alternative inference algorithms on a variety of program models. To our knowledge, this is the first work to compare gradient-based search over program space to traditional search-based alternatives. Our key empirical finding is that constraint solvers dominate the gradient descent and LP-based formulations. This is a workshop summary of a longer report at arXiv:1608.04428
Test for Excess Significance
In any series of typically-powered experiments, we expect some to fail to be non-significant due to sampling error, even if a true effect exists. If we see a series of five experiments, and they are all significant, one thinks that either they are either very high powered, the authors got lucky, or there are some nonsignificant studies missing. For many sets of studies, the first seems implausible because the effect sizes are small; the last is important, because if it is true then the picture we get of the results is misleading.
Test Set A test set is a set of data used in various areas of information science to assess the strength and utility of a predictive relationship. Test sets are used in artificial intelligence, machine learning, genetic programming and statistics. In all these fields, a test set has much the same role.
Test-Based Bayes Factor
Text Data Processing
Text Mining Text mining, also referred to as text data mining, roughly equivalent to text analytics, refers to the process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning. Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output. ‘High quality’ in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).
TextBoxes This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-process except for a standard non-maximum suppression. TextBoxes outperforms competing methods in terms of text localization accuracy and is much faster, taking only 0.09s per image in a fast implementation. Furthermore, combined with a text recognizer, TextBoxes significantly outperforms state-of-the-art approaches on word spotting and end-to-end text recognition tasks.
Textology A Textology is a graph of word clusters connected by co-occurrence relations.
Speech Synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely ‘synthetic’ voice output. The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood clearly. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written works on a home computer. Many computer operating systems have included speech synthesizers since the early 1990s. Overview of a typical TTS system Automatic announcement Menu 0:00 A synthetic voice announcing an arriving train in Sweden. Problems playing this file? See media help. Problems playing this file? See media help. A text-to-speech system (or ‘engine’) is composed of two parts: a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end – often referred to as the synthesizer – then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations), which is then imposed on the output speech.
Textures.js SVG patterns for Data Visualization
The House Of inteRactions
We introduce The House Of inteRactions (THOR), a framework for visual AI research, available at AI2-THOR consists of near photo-realistic 3D indoor scenes, where AI agents can navigate in the scenes and interact with objects to perform tasks. AI2-THOR enables research in many different domains including but not limited to deep reinforcement learning, imitation learning, learning by interaction, planning, visual question answering, unsupervised representation learning, object detection and segmentation, and learning models of cognition. The goal of AI2-THOR is to facilitate building visually intelligent models and push the research forward in this domain.
Theano Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Theano features:
• tight integration with NumPy – Use numpy.ndarray in Theano-compiled functions.
• transparent use of a GPU – Perform data-intensive calculations up to 140x faster than with CPU.(float32 only)
• efficient symbolic differentiation – Theano does your derivatives for function with one or many inputs.
• speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny.
• dynamic C code generation – Evaluate expressions faster.
• extensive unit-testing and self-verification – Detect and diagnose many types of mistake.
Theano has been powering large-scale computationally intensive scientific investigations since 2007. But it is also approachable enough to be used in the classroom (IFT6266 at the University of Montreal).
Thematic Map Thematic maps are geographical maps in which statistical data are visualized. A thematic map is a type of map especially designed to show a particular theme connected with a specific geographic area. These maps ‘can portray physical, social, political, cultural, economic, sociological, agricultural, or any other aspects of a city, state, region, nation, or continent’.
Theta Method Accurate and robust forecasting methods for univariate time series are very important when the objective is to produce estimates for a large number of time series. In this context, the Theta method called researchers attention due its performance in the largest up-to-date forecasting competition, the M3-Competition. Theta method proposes the decomposition of the deseasonalised data into two ‘theta lines’. The first theta line removes completely the curvatures of the data, thus being a good estimator of the long-term trend component. The second theta line doubles the curvatures of the series, as to better approximate the short-term behaviour.
Thick Data Thick Data: ethnographic approaches that uncover the meaning behind Big Data visualization and analysis. Thick Data analysis primarily relies on human brain power to process a small “N” while big data analysis requires computational power (of course with humans writing the algorithms) to process a large “N”. Big Data reveals insights with a particular range of data points, while Thick Data reveals the social context of and connections between data points. Big Data delivers numbers; thick data delivers stories. Big data relies on machine learning; thick data relies on human learning.
Thingscoop Thingscoop is a command-line utility for analyzing videos semantically – that means searching, filtering, and describing videos based on objects, places, and other things that appear in them. When you first run thingscoop on a video file, it uses a convolutional neural network to create an ‘index’ of what’s contained in the every second of the input by repeatedly performing image classification on a frame-by-frame basis. Once an index for a video file has been created, you can search (i.e. get the start and end times of the regions in the video matching the query) and filter (i.e. create a supercut of the matching regions) the input using arbitrary queries. Thingscoop uses a very basic query language that lets you to compose queries that test for the presence or absence of labels with the logical operators ! (not), || (or) and && (and). For example, to search a video the presence of the sky and the absence of the ocean: thingscoop search ‘sky && !ocean’ <file>. Right now two models are supported by thingscoop: vgg_imagenet uses the architecture described in ‘Very Deep Convolutional Networks for Large-Scale Image Recognition’ to recognize objects from the ImageNet database, and googlenet_places uses the architecture described in ‘Going Deeper with Convolutions’ to recognize settings and places from the MIT Places database. You can specify which model you’d like to use by running thingscoop models use <model>, where <model> is either vgg_imagenet or googlenet_places. More models will be added soon. Thingscoop is based on Caffe, an open-source deep learning framework.
Thompson Sampling In artificial intelligence, Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists in choosing the action that maximizes the expected reward with respect to a randomly drawn belief.
Thouless-Anderson-Palmer Gibbs Free Energy
(TAP Gibbs Free Energy)
The adaptive TAP Gibbs free energy for a general densely connected probabilistic model with quadratic interactions and arbritary single site constraints is derived. We show how a specific sequential minimization of the free energy leads to a generalization ofMinka’s expectation propagation. Lastly, we derive a sparse representation version of the sequential algorithm. The usefulness of the approach is demonstrated on classification and density estimation with Gaussian processes and on an independent component analysis problem.
Threading Building Blocks
Threading Building Blocks (TBB) is a C++ template library developed by Intel for writing software programs that take advantage of multi-core processors. The library consists of data structures and algorithms that allow a programmer to avoid some complications arising from the use of native threading packages such as POSIX threads, Windows threads, or the portable Boost Threads in which individual threads of execution are created, synchronized, and terminated manually. Instead the library abstracts access to the multiple processors by allowing the operations to be treated as “tasks”, which are allocated to individual cores dynamically by the library’s run-time engine, and by automating efficient use of the CPU cache. A TBB program creates, synchronizes and destroys graphs of dependent tasks according to algorithms, i.e. high-level parallel programming paradigms (a.k.a. Algorithmic Skeletons). Tasks are then executed respecting graph dependencies. This approach groups TBB in a family of solutions for parallel programming aiming to decouple the programming from the particulars of the underlying machine.
Three-Mode Principal Components Analysis In multivariate analysis the data have usually two way and/or two modes. This book treats prinicipal component analysis of data which can be characterised by three-ways and/or modes, like subjects by variables by conditions or occasions. The book extends the work on three-mode factor analysis by Tucker and the work on individual differences scaling by Carroll and colleagues. The many examples give a true feeling of the working of the techniques.
THUMT This paper introduces THUMT, an open-source toolkit for neural machine translation (NMT) developed by the Natural Language Processing Group at Tsinghua University. THUMT implements the standard attention-based encoder-decoder framework on top of Theano and supports three training criteria: maximum likelihood estimation, minimum risk training, and semi-supervised training. It features a visualization tool for displaying the relevance between hidden states in neural networks and contextual words, which helps to analyze the internal workings of NMT. Experiments on Chinese-English datasets show that THUMT using minimum risk training significantly outperforms GroundHog, a state-of-the-art toolkit for NMT.
Tibble Tibbles are a modern take on data frames. They keep the features that have stood the test of time, and drop the features that used to be convenient but are now frustrating (i.e. converting character vectors to factors).
tick tick is a statistical learning library for Python~3, with a particular emphasis on time-dependent models, such as point processes, and tools for generalized linear models and survival analysis. The core of the library is an optimization module providing model computational classes, solvers and proximal operators for regularization. tick relies on a C++ implementation and state-of-the-art optimization algorithms to provide very fast computations in a single node multi-core setting. Source code and documentation can be downloaded from https://…/tick
Tidy Data Tidy datasets are easy to manipulate, model and visualise, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.
Tiered Sampling We introduce Tiered Sampling, a novel technique for approximate counting sparse motifs in massive graphs whose edges are observed in a stream. Our technique requires only a single pass on the data and uses a memory of fixed size $M$, which can be magnitudes smaller than the number of edges. Our methods addresses the challenging task of counting sparse motifs – sub-graph patterns that have low probability to appear in a sample of $M$ edges in the graph, which is the maximum amount of data available to the algorithms in each step. To obtain an unbiased and low variance estimate of the count we partition the available memory to tiers (layers) of reservoir samples. While the base layer is a standard reservoir sample of edges, other layers are reservoir samples of sub-structures of the desired motif. By storing more frequent sub-structures of the motif, we increase the probability of detecting an occurrence of the sparse motif we are counting, thus decreasing the variance and error of the estimate. We demonstrate the advantage of our method in the specific applications of counting sparse 4 and 5-cliques in massive graphs. We present a complete analytical analysis and extensive experimental results using both synthetic and real-world data. Our results demonstrate the advantage of our method in obtaining high-quality approximations for the number of 4 and 5-cliques for large graphs using a very limited amount of memory, significantly outperforming the single edge sample approach for counting sparse motifs in large scale graphs.
Tight Semi-Nonnegative Matrix Factorization The nonnegative matrix factorization is a widely used, flexible matrix decomposition, finding applications in biology, image and signal processing and information retrieval, among other areas. Here we present a related matrix factorization. A multi-objective optimization problem finds conical combinations of templates that approximate a given data matrix. The templates are chosen so that as far as possible only the initial data set can be represented this way. However, the templates are not required to be nonnegative nor convex combinations of the original data.
Tikhonov Regularization Tikhonov regularization, named for Andrey Tikhonov, is the most commonly used method of regularization of ill-posed problems. In statistics, the method is known as ridge regression, and with multiple independent discoveries, it is also variously known as the Tikhonov-Miller method, the Phillips-Twomey method, the constrained linear inversion method, and the method of linear regularization. It is related to the Levenberg-Marquardt algorithm for non-linear least-squares problems.
Time Oriented Language
TOL is the Time Oriented Language. It is a programming language dedicated to the world of statistics and focused on time series analysis and stochastic processes. It is a declarative language based on two key features: simple syntactical rules and powerful set of extensible data types and functions. TOL is callable by a small text console, but there is also a graphical interface to easily handle all language’s tools and functions, providing powerful graphical capacities. TOL is distributed under the GNU GPL license.
Time Series Analysis / Time Series A time series is a sequence of data points, measured typically at successive points in time spaced at uniform time intervals. Examples of time series are the daily closing value of the Dow Jones Industrial Average and the annual flow volume of the Nile River at Aswan. Time series are very frequently plotted via line charts. Time series are used in statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, and communications engineering.
Time Series Cointegrated System
Time Series Database
A time series database (TSDB) is a software system that is optimized for handling time series data, arrays of numbers indexed by time (a datetime or a datetime range). In some fields these time series are called profiles, curves, or traces. A time series of stock prices might be called a price curve. A time series of energy consumption might be called a load profile. A log of temperature values over time might be called a temperature trace. Despite the disparate names, many of the same mathematical operations, queries, or database transactions are useful for analysing all of them. The implementation of a database that can correctly, reliably, and efficiently implement these operations must be specialized for time-series data. TSDBs are databases that are optimized for time series data. Software with complex logic or business rules and high transaction volume for time series data may not be practical with traditional relational database management systems. Flat file databases are not a viable option either, if the data and transaction volume reaches a maximum threshold determined by the capacity of individual servers (processing power and storage capacity). Queries for historical data, replete with time ranges and roll ups and arbitrary time zone conversions are difficult in a relational database. Compositions of those rules are even more difficult. This is a problem compounded by the free nature of relational systems themselves. Many relational systems are often not modelled correctly with respect to time series data. TSDBs on the other hand impose a model and this allows them to provide more features for doing so. Ideally, these repositories are often natively implemented using specialized database algorithms. However, it is possible to store time series as binary large objects (BLOBs) in a relational database or by using a VLDB approach coupled with a pure star schema. Efficiency is often improved if time is treated as a discrete quantity rather than as a continuous mathematical dimension. Database joins across multiple time series data sets is only practical when the time tag associated with each data entry spans the same set of discrete times for all data sets across which the join is performed.
Time-Lapse Mining We introduce an approach for synthesizing time-lapse videos of popular landmarks from large community photo collections. The approach is completely automated and leverages the vast quantity of photos available online. First, we cluster 86 million photos into landmarks and popular viewpoints. Then, we sort the photos by date and warp each photo onto a common viewpoint. Finally, we stabilize the appearance of the sequence to compensate for lighting effects and minimize flicker. Our resulting time-lapses show diverse changes in the world’s most popular sites, like glaciers shrinking, skyscrapers being constructed, and waterfalls changing course.
Time-to-Event Data Time-to-event data, also often referred to as survival data, arise when interest is focused on the time elapsing before an event is experienced. By events we mean occurrences that are of interest in scientific studies from various disciplines such as medicine, epidemiology, demography, biology, sociology, economics, engineering, et cetera. Examples of such events are: death, onset of infection, divorce, unemployment, and failure of a mechanical device. All of these may be subject to scientific interest where one tries to understand their cause or establish risk factors.
Time-Weighted Dynamic Time Warping
Dynamic time warping (DTW), which finds the minimum path by providing non-linear alignments between two time series, has been widely used as a distance measure for time series classification and clustering. However, DTW does not account for the relative importance regarding the phase difference between a reference point and a testing point. This may lead to misclassification especially in applications where the shape similarity between two sequences is a major consideration for an accurate recognition. Therefore, we propose a novel distance measure, called a weighted DTW (WDTW), which is a penalty-based DTW. Our approach penalizes points with higher phase difference between a reference point and a testing point in order to prevent minimum distance distortion caused by outliers. The rationale underlying the proposed distance measure is demonstrated with some illustrative examples. A new weight function, called the modified logistic weight function (MLWF), is also proposed to systematically assign weights as a function of the phase difference between a reference point and a testing point. By applying different weights to adjacent points, the proposed algorithm can enhance the detection of similarity between two time series. We show that some popular distance measures such as DTW and Euclidean distance are special cases of our proposed WDTW measure. We extend the proposed idea to other variants of DTW such as derivative dynamic time warping (DDTW) and propose the weighted version of DDTW. We have compared the performances of our proposed procedures with other popular approaches using public data sets available through the UCR Time Series Data Mining Archive for both time series classification and clustering problems. The experimental results indicate that the proposed approaches can achieve improved accuracy for time series classification and clustering problems.
“Dynamic Time Warping”
Topic Compositional Neural Language Model
We propose a Topic Compositional Neural Language Model (TCNLM), a novel method designed to simultaneously capture both the global semantic meaning and the local word ordering structure in a document. The TCNLM learns the global semantic coherence of a document via a neural topic model, and the probability of each learned latent topic is further used to build a Mixture-of-Experts (MoE) language model, where each expert (corresponding to one topic) is a recurrent neural network (RNN) that accounts for learning the local structure of a word sequence. In order to train the MoE model efficiently, a matrix factorization method is applied, by extending each weight matrix of the RNN to be an ensemble of topic-dependent weight matrices. The degree to which each member of the ensemble is used is tied to the document-dependent probability of the corresponding topics. Experimental results on several corpora show that the proposed approach outperforms both a pure RNN-based model and other topic-guided language models. Further, our model yields sensible topics, and also has the capacity to generate meaningful sentences conditioned on given topics.
Topic Detection and Tracking
Topic Detection and Tracking (TDT) is a Body of Research and an Evaluation Paradigm that Addresses Event-Based Organization of Broadcast News. The TDT Evaluation Tasks of Tracking, Cluster Detection, and First Story Detection are Each Information Filtering Technology in the Sense That They Require That ‘yes or no’ Decisions be Made on a Stream of News Stories Before Additional Stories Have Arrived.
Topic Model In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract “topics” that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: “dog” and “bone” will appear more often in documents about dogs, “cat” and “meow” will appear in documents about cats, and “the” and “is” will appear equally in both. A document typically concerns multiple topics in different proportions; thus, in a document that is 10% about cats and 90% about dogs, there would probably be about 9 times more dog words than cat words. A topic model captures this intuition in a mathematical framework, which allows examining a set of documents and discovering, based on the statistics of the words in each, what the topics might be and what each document’s balance of topics is.
Topic Tagging
TopicRNN In this paper, we propose TopicRNN, a recurrent neural network (RNN)-based language model designed to directly capture the global semantic meaning relating words in a document via latent topics. Because of their sequential nature, RNNs are good at capturing the local structure of a word sequence – both semantic and syntactic – but might face difficulty remembering long-range dependencies. Intuitively, these long-range dependencies are of semantic nature. In contrast, latent topic models are able to capture the global underlying semantic structure of a document but do not account for word ordering. The proposed TopicRNN model integrates the merits of RNNs and latent topic models: it captures local (syntactic) dependencies using an RNN and global (semantic) dependencies using latent topics. Unlike previous work on contextual RNN language modeling, our model is learned end-to-end. Empirical results on word prediction show that TopicRNN outperforms existing contextual RNN baselines. In addition, TopicRNN can be used as an unsupervised feature extractor for documents. We do this for sentiment analysis and report a new state-of-the-art error rate on the IMDB movie review dataset that amounts to a $13.3\%$ improvement over the previous best result. Finally TopicRNN also yields sensible topics, making it a useful alternative to document models such as latent Dirichlet allocation.
Topological Anomaly Detection
The technique is essentially a density based outlier detection algorithm that, instead of calculating local densities, constructs a graph of the data using nearest-neighbors. The algorithm is different from other kNN outlier detection algorithms in that instead of setting ‘k’ as a parameter, you instead set a maximal inter-observation distance (called the graph “resolution” by Gartley and Basener). If the distance between two points is less than the graph resolution, add an edge between those two observations to the graph. Once the full graph is constructed, determine which connected components comprise the “background” of the data by setting some threshold percentage of observations ‘p’: any components with fewer than ‘p’ observations is considered an anomalous component, and all the observations (nodes) in this component are outliers.
Topological Data Analysis
Topological data analysis (TDA) is a new area of study aimed at having applications in areas such as data mining and computer vision. The main problems are:
1. how one infers high-dimensional structure from low-dimensional representations; and
2. how one assembles discrete points into global structure.
The human brain can easily extract global structure from representations in a strictly lower dimension, i.e. we infer a 3D environment from a 2D image from each eye. The inference of global structure also occurs when converting discrete data into continuous images, e.g. dot-matrix printers and televisions communicate images via arrays of discrete points.
The main method used by topological data analysis is:
1. Replace a set of data points with a family of simplicial complexes, indexed by a proximity parameter.
2. Analyse these topological complexes via algebraic topology – specifically, via the theory of persistent homology.
3. Encode the persistent homology of a data set in the form of a parameterized version of a Betti number which is called a persistence diagram or barcode.
Topological Analysis of Data
Topological Sorting In computer science, a topological sort (sometimes abbreviated topsort or toposort) or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time.
Topology-Based Pathway Enrichment Analysis
Torch Torch is a scientific computing framework with wide support for machine learning algorithms. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C/CUDA implementation. A summary of core features:
• a powerful N-dimensional array
• lots of routines for indexing, slicing, transposing, …
• amazing interface to C, via LuaJIT
• linear algebra routines
• neural network, and energy-based models
• numeric optimization routines
• Fast and efficient GPU support
• Embeddable, with ports to iOS, Android and FPGA backends
Total Distance Multivariance We introduce two new measures for the dependence of $n \ge 2$ random variables: `distance multivariance’ and `total distance multivariance’. Both measures are based on the weighted $L^2$-distance of quantities related to the characteristic functions of the underlying random variables. They extend distance covariance (introduced by Szekely, Rizzo and Bakirov) and generalized distance covariance (introduced in part I) from pairs of random variables to $n$-tuplets of random variables. We show that total distance multivariance can be used to detect the independence of $n$ random variables and has a simple finite-sample representation in terms of distance matrices of the sample points, where distance is measured by a continuous negative definite function. Based on our theoretical results, we present a test for independence of multiple random vectors which is consistent against all alternatives.
Total Operating Characteristic
The relative operating characteristic (ROC) is a popular statistical method to measure the association between observed and diagnosed presence of a characteristic. The diagnosis of presence or absence depends on whether the value of an index variable is above a threshold. ROC considers multiple possible thresholds. Each threshold generates a two-by-two contingency table, which contains four central entries: hits, misses, false alarms, and correct rejections. ROC reveals for each threshold only two ratios, hits/(hits + misses) and false alarms/(false alarms + correct rejections). This article introduces the total operating characteristic (TOC), which shows the total information in the contingency table for each threshold. TOC maintains desirable properties of ROC, while TOC reveals strictly more information than ROC in a manner that makes TOC more useful than ROC.
Total Unduplicated Reach and Frequency
TURF Analysis, an acronym for “Total Unduplicated Reach and Frequency”, is a type of statistical analysis used for providing estimates of media or market potential and devising optimal communication and placement strategies given limited resources. TURF analysis identifies the number of users reached by a communication, and how often they are reached. Although originally used by media schedulers to maximize reach and frequency of media spending across different items (print, broadcast, etc.), TURF is also now used to provide estimates of market potential. For example, if a company plans to market a new yogurt, they may consider launching ten possible flavors, but in reality, only three might be purchased in large quantities. The TURF algorithm identifies the optimal product line to maximize the total number of consumers who will purchase at least one SKU. Typically, when T.U.R.F. is undertaken for optimizing a product range, the analysis only looks at the reach of the product range (ignoring the Frequency component of TURF).
Toyplot Toyplot, the kid-sized plotting toolkit for Python with grownup-sized goals:
• Develop beautiful interactive, animated plots that embrace the unique capabilities of electronic publishing and support repoducibility.
• Create the best possible data graphics ‘out-of-the-box’, maximizing data ink and minimizing chartjunk.
• Provide a clean, minimalist interface that scientists and engineers will love.
The Toyplot Tutorial
Training Set A training set is a set of data used in various areas of information science to discover potentially predictive relationships. Training sets are used in artificial intelligence, machine learning, genetic programming, intelligent systems, and statistics. In all these fields, a training set has much the same role and is often used in conjunction with a test set.
Training, Validation, Test Divide the data set into three parts:
• Training, Validation, Test (e.g. 50, 25, 25)
• Fit model on the TRAINING set
• Select model using VALIDATION set
• Assess prediction error using TEST set
Trajectory Analysis traj
Transduction In logic, statistical inference, and supervised learning, transduction or transductive inference is reasoning from observed, specific (training) cases to specific (test) cases. In contrast, induction is reasoning from observed training cases to general rules, which are then applied to the test cases. The distinction is most interesting in cases where the predictions of the transductive model are not achievable by any inductive model. Note that this is caused by transductive inference on different test sets producing mutually inconsistent predictions.
Transfer Function Model Transfer function models describe the relationship between the inputs and outputs of a system using a ratio of polynomials. The model order is equal to the order of the denominator polynomial. The roots of the denominator polynomial are referred to as the model poles. The roots of the numerator polynomial are referred to as the model zeros. The parameters of a transfer function model are its poles, zeros and transport delays.
Transfer Learning Machine learning and data mining techniques have been used in numerous real-world applications. An assumption of traditional machine learning methodologies is the training data and testing data are taken from the same domain, such that the input feature space and data distribution characteristics are the same. However, in some real-world machine learning scenarios, this assumption does not hold. There are cases where training data is expensive or difficult to collect. Therefore, there is a need to create high-performance learners trained with more easily obtained data from different domains. This methodology is referred to as transfer learning. This survey paper formally defines transfer learning, presents information on current solutions, and reviews applications applied to transfer learning. Lastly, there is information listed on software downloads for various transfer learning solutions and a discussion of possible future research work. The transfer learning solutions surveyed are independent of data size and can be applied to big data environments.
Recycling Deep Learning Models with Transfer Learning
Transfinite Mean We define a generalization of the arithmetic mean to bounded well-ordered sequences of real numbers. We show that every probability space admits a well-ordered sequences of points such that the measure of each measurable subset is equal to the frequency with which the sequence is in this subset. We include an argument suggested by Woodin that the club filter on $\omega_1$ does not admit such a sequence of order type $\omega_1$.
Transformation Forests Regression models for supervised learning problems with a continuous target are commonly understood as models for the conditional mean of the target given predictors. This notion is simple and therefore appealing for interpretation and visualisation. Information about the whole underlying conditional distribution is, however, not available from these models. A more general understanding of regression models as models for conditional distributions allows much broader inference from such models, for example the computation of prediction intervals. Several random forest-type algorithms aim at estimating conditional distributions, most prominently quantile regression forests (Meinshausen, 2006, JMLR). We propose a novel approach based on a parametric family of distributions characterised by their transformation function. A dedicated novel ‘transformation tree’ algorithm able to detect distributional changes is developed. Based on these transformation trees, we introduce ‘transformation forests’ as an adaptive local likelihood estimator of conditional distribution functions. The resulting models are fully parametric yet very general and allow broad inference procedures, such as the model-based bootstrap, to be applied in a straightforward way.
Transformed Generalized Autoregressive Moving Average
Transformed Generalized Autoregressive Moving Average (TGARMA) models were recently proposed to deal with non-additivity, non-normality and heteroscedasticity in real time series data. In this paper, a Bayesian approach is proposed for TGARMA models, thus extending the original model. We conducted a simulation study to investigate the performance of Bayesian estimation and Bayesian model selection criteria. In addition, a real dataset was analysed using the proposed approach.
Transformer The dominant sequence transduction models are based on complex recurrent or convolutional neural networks in an encoder-decoder configuration. The best performing models also connect the encoder and decoder through an attention mechanism. We propose a new simple network architecture, the Transformer, based solely on attention mechanisms, dispensing with recurrence and convolutions entirely. Experiments on two machine translation tasks show these models to be superior in quality while being more parallelizable and requiring significantly less time to train. Our model achieves 28.4 BLEU on the WMT 2014 English-to-German translation task, improving over the existing best results, including ensembles by over 2 BLEU. On the WMT 2014 English-to-French translation task, our model establishes a new single-model state-of-the-art BLEU score of 41.0 after training for 3.5 days on eight GPUs, a small fraction of the training costs of the best models from the literature. We show that the Transformer generalizes well to other tasks by applying it successfully to English constituency parsing both with large and limited training data.
Transitory Queueing Network
Queueing networks are notoriously difficult to analyze sans both Markovian and stationarity assumptions. Much of the theoretical contribution towards performance analysis of time-inhomogeneous single class queueing networks has focused on Markovian networks, with the recent exception of work in Liu and Whitt (2011) and Mandelbaum and Ramanan (2010). In this paper, we introduce transitory queueing networks as a model of inhomogeneous queueing networks, where a large, but finite, number of jobs arrive at queues in the network over a fixed time horizon. The queues offer FIFO service, and we assume that the service rate can be time-varying. The non-Markovian dynamics of this model complicate the analysis of network performance metrics, necessitating approximations. In this paper we develop fluid and diffusion approximations to the number-in-system performance metric by scaling up the number of external arrivals to each queue, following Honnappa et al. (2014). We also discuss the implications for bottleneck detection in tandem queueing networks.
translate2R Many companies realizied the advantages of the open source programming language R. translate2R allows a fast and inexpensive migration to R. The manual migration of complex SPSS® scripts has always been tedious and error-prone, but with translate2R the task of translating by hand becomes a thing of the past. The automatic and comprehensible process of translating SPSS® code to R code with translate2R offers users an enormous number of new analytical opportunities. Besides the usual migration process translate2R allows programmers an easy start into programming with R. Make use of translate2R for the translation of scripts to R. We will be pleased to help you in terms of migration projects, or starting off with R.
Translational Recommender Networks Representing relationships as translations in vector space lives at the heart of many neural embedding models such as word embeddings and knowledge graph embeddings. In this work, we study the connections of this translational principle with collaborative filtering algorithms. We propose Translational Recommender Networks (\textsc{TransRec}), a new attentive neural architecture that utilizes the translational principle to model the relationships between user and item pairs. Our model employs a neural attention mechanism over a \emph{Latent Relational Attentive Memory} (LRAM) module to learn the latent relations between user-item pairs that best explains the interaction. By exploiting adaptive user-item specific translations in vector space, our model also alleviates the geometric inflexibility problem of other metric learning algorithms while enabling greater modeling capability and fine-grained fitting of users and items in vector space. The proposed architecture not only demonstrates the state-of-the-art performance across multiple recommendation benchmarks but also boasts of improved interpretability. Qualitative studies over the LRAM module shows evidence that our proposed model is able to infer and encode explicit sentiment, temporal and attribute information despite being only trained on implicit feedback. As such, this ascertains the ability of \textsc{TransRec} to uncover hidden relational structure within implicit datasets.
TransNets Recently, deep learning methods have been shown to improve the performance of recommender systems over traditional methods, especially when review text is available. For example, a recent model, DeepCoNN, uses neural nets to learn one latent representation for the text of all reviews written by a target user, and a second latent representation for the text of all reviews for a target item, and then combines these latent representations to obtain state-of-the-art performance on recommendation tasks. We show that (unsurprisingly) much of the predictive value of review text comes from reviews of the target user for the target item. We then introduce a way in which this information can be used in recommendation, even when the target user’s review for the target item is not available. Our model, called TransNets, extends the DeepCoNN model by introducing an additional latent layer representing the target user-target item pair. We then regularize this layer, at training time, to be similar to another latent representation of the target user’s review of the target item. We show that TransNets and extensions of it improve substantially over the previous state-of-the-art.
Transportation Theory In mathematics and economics, transportation theory is a name given to the study of optimal transportation and allocation of resources. The problem was formalized by the French mathematician Gaspard Monge in 1781. In the 1920s A.N. Tolstoi was one of the first to study the transportation problem mathematically. In 1930, in the collection Transportation Planning Volume I for the National Commissariat of Transportation of the Soviet Union, he published a paper ‘Methods of Finding the Minimal Kilometrage in Cargo-transportation in space’. Major advances were made in the field during World War II by the Soviet/Russian mathematician and economist Leonid Kantorovich. Consequently, the problem as it is stated is sometimes known as the Monge-Kantorovich transportation problem. The linear programming formulation of the transportation problem is also known as the Hitchcock-Koopmans transportation problem.
Tree of Predictors
We present a new approach to ensemble learning. Our approach constructs a tree of subsets of the feature space and associates a predictor (predictive model) – determined by training one of a given family of base learners on an endogenously determined training set – to each node of the tree; we call the resulting object a tree of predictors. The (locally) optimal tree of predictors is derived recursively; each step involves jointly optimizing the split of the terminal nodes of the previous tree and the choice of learner and training set (hence predictor) for each set in the split. The feature vector of a new instance determines a unique path through the optimal tree of predictors; the final prediction aggregates the predictions of the predictors along this path. We derive loss bounds for the final predictor in terms of the Rademacher complexity of the base learners. We report the results of a number of experiments on a variety of datasets, showing that our approach provides statistically significant improvements over state-of-the-art machine learning algorithms, including various ensemble learning methods. Our approach works because it allows us to endogenously create more complex learners – when needed – and endogenously match both the learner and the training set to the characteristics of the dataset while still avoiding over-fitting.
Tree Recurrent Neural Network
In this paper we develop a recurrent neural network (TreeRNN), which is designed to predict a tree rather than a linear sequence as is the case in conventional recurrent neural networks. Our model defines the probability of a sentence by estimating the generation probability of its dependency tree. We construct the tree incrementally by generating the left and right dependents of a node whose probability is computed using recurrent neural networks with shared hidden layers. Application of our model to two language modeling tasks shows that it outperforms or performs on par with related models.
Tree Structured Vector Quantization
1. First we apply k-means to get 2 centroids or prototypes within the entire data set. This provides us with a boundary between the two clusters, which would be a straight line based on the nearest neighbor rule.
2. Next, the data are assigned to the 2 centroids.
3. Then, for the data assigned to each centroid (call them a group), apply 2 centroid k-means to each group separately. The initialization can be done by splitting the centroid into two. Note that data points channeled to different centroids are treated separately.
4. Repeat the above step.
Treelogy We propose a novel tree classification system called Treelogy, that fuses deep representations with hand-crafted features obtained from leaf images to perform leaf-based plant classification. Key to this system are segmentation of the leaf from an untextured background, using convolutional neural networks (CNNs) for learning deep representations, extracting hand-crafted features with a number of image processing techniques, training a linear SVM with feature vectors, merging SVM and CNN results, and identifying the species from a dataset of 57 trees. Our classification results show that fusion of deep representations with hand-crafted features leads to the highest accuracy. The proposed algorithm is embedded in a smart-phone application, which is publicly available. Furthermore, our novel dataset comprised of 5408 leaf images is also made public for use of other researchers.
Treemapping In information visualization and computing, treemapping is a method for displaying hierarchical data by using nested rectangles.
Tree-Structured Boosting Additive models, such as produced by gradient boosting, and full interaction models, such as classification and regression trees (CART), are widely used algorithms that have been investigated largely in isolation. We show that these models exist along a spectrum, revealing never-before-known connections between these two approaches. This paper introduces a novel technique called tree-structured boosting for creating a single decision tree, and shows that this method can produce models equivalent to CART or gradient boosted stumps at the extremes by varying a single parameter. Although tree-structured boosting is designed primarily to provide both the model interpretability and predictive performance needed for high-stake applications like medicine, it also can produce decision trees represented by hybrid models between CART and boosted stumps that can outperform either of these approaches.
Tree-Structured Long Short-Term Memory
For years, recursive neural networks (RvNNs) have shown to be suitable for representing text into fixed-length vectors and achieved good performance on several natural language processing tasks. However, the main drawback of RvNN is that it requires explicit tree structure (e.g. parse tree), which makes data preparation and model implementation hard. In this paper, we propose a novel tree-structured long short-term memory (Tree-LSTM) architecture that efficiently learns how to compose task-specific tree structures only from plain text data. To achieve this property, our model uses Straight-Through (ST) Gumbel-Softmax estimator to decide the parent node among candidates and to calculate gradients of the discrete decision. We evaluate the proposed model on natural language interface and sentiment analysis and show that our model outperforms or at least comparable to previous Tree-LSTM-based works. We also find that our model converges significantly faster and needs less memory than other models of complex structures.
Trellis Graphics Extremely useful approach for graphical exploratory data analysis (EDA). Allows to examine for complicated, multiple variable relationships. Types of plots:
• xyplot: scatterplot
• bwplot: boxplots
• stripplot: display univariate data against a numerical variable
• dotplot: similar to stripplot
• histogram
• densityplot: kernel density estimates
• barchart
• piechart: (Not available in R)
• splom: scatterplot matrices
• contourplot: contour plot of a surface on a regular grid
• levelplot: pseudo-colour plot of a surface on a rectangular grid
• wireframe: perspective plot of a surface evaluated on a regular grid
• cloud: perspective plot of a cloud of points (3D scatterplot)
Trend Analysis Trend Analysis is the practice of collecting information and attempting to spot a pattern, or trend, in the information. In some fields of study, the term ‘trend analysis’ has more formally defined meanings. Although trend analysis is often used to predict future events, it could be used to estimate uncertain events in the past, such as how many ancient kings probably ruled between two dates, based on data such as the average years which other known kings reigned.
Triangle Generative Adversarial Network
A Triangle Generative Adversarial Network ($\Delta$-GAN) is developed for semi-supervised cross-domain joint distribution matching, where the training data consists of samples from each domain, and supervision of domain correspondence is provided by only a few paired samples. $\Delta$-GAN consists of four neural networks, two generators and two discriminators. The generators are designed to learn the two-way conditional distributions between the two domains, while the discriminators implicitly define a ternary discriminative function, which is trained to distinguish real data pairs and two kinds of fake data pairs. The generators and discriminators are trained together using adversarial learning. Under mild assumptions, in theory the joint distributions characterized by the two generators concentrate to the data distribution. In experiments, three different kinds of domain pairs are considered, image-label, image-image and image-attribute pairs. Experiments on semi-supervised image classification, image-to-image translation and attribute-based image generation demonstrate the superiority of the proposed approach.
Triangular Norm
In mathematics, a t-norm (also T-norm or, unabbreviated, triangular norm) is a kind of binary operation used in the framework of probabilistic metric spaces and in multi-valued logic, specifically in fuzzy logic. A t-norm generalizes intersection in a lattice and conjunction in logic. The name triangular norm refers to the fact that in the framework of probabilistic metric spaces t-norms are used to generalize triangle inequality of ordinary metric spaces.
Trimmed Clustering tclust,trimcluster
Triple Exponential Smoothing What happens if the data show trend and seasonality? We now introduce a third equation to take care of seasonality (sometimes called periodicity). The resulting set of equations is called the ‘Holt-Winters’ (HW) method after the names of the inventors.
Triplestore A triplestore is a purpose-built database for the storage and retrieval of triples through semantic queries. A triple is a data entity composed of subject-predicate-object, like “Bob is 35” or “Bob knows Fred”. Much like a relational database, one stores information in a triplestore and retrieves it via a query language. Unlike a relational database, a triplestore is optimized for the storage and retrieval of triples. In addition to queries, triples can usually be imported/exported using Resource Description Framework (RDF) and other formats.
Tropical Linear Programming On Tropical Linear and Integer Programs
True Asymptotic Natural Gradient Optimization
We introduce a simple algorithm, True Asymptotic Natural Gradient Optimization (TANGO), that converges to a true natural gradient descent in the limit of small learning rates, without explicit Fisher matrix estimation. For quadratic models the algorithm is also an instance of averaged stochastic gradient, where the parameter is a moving average of a ‘fast’, constant-rate gradient descent. TANGO appears as a particular de-linearization of averaged SGD, and is sometimes quite different on non-quadratic models. This further connects averaged SGD and natural gradient, both of which are arguably optimal asymptotically. In large dimension, small learning rates will be required to approximate the natural gradient well. Still, this shows it is possible to get arbitrarily close to exact natural gradient descent with a lightweight algorithm.
TrueSkill Ranking System TrueSkill is a Bayesian ranking algorithm developed by Microsoft Research and used in the Xbox matchmaking system built to address some perceived flaws in the Elo rating system. It is an extension of the Glicko rating system to multiplayer games. The purpose of a ranking system is to both identify and track the skills of gamers in a game (mode) in order to be able to match them into competitive matches. The TrueSkill ranking system only uses the final standings of all teams in a game in order to update the skill estimates (ranks) of all gamers playing in this game. Ranking systems have been proposed for many sports but possibly the most prominent ranking system in use today is ELO.
TrueSkill Sort
In this paper we present TSSort, a probabilistic, noise resistant, quickly converging comparison sort algorithm based on Microsoft TrueSkill. The algorithm combines TrueSkill’s updating rules with a newly developed next item pair selection strategy, enabling it to beat standard sorting algorithms w.r.t. convergence speed and noise resistance, as shown in simulations. TSSort is useful if comparisons of items are expensive or noisy, or if intermediate results shall be approximately ordered.
Truncated Variance Reduction
We present a new algorithm, truncated variance reduction (TruVaR), that treats Bayesian optimization (BO) and level-set estimation (LSE) with Gaussian processes in a unified fashion. The algorithm greedily shrinks a sum of truncated variances within a set of potential maximizers (BO) or unclassified points (LSE), which is updated based on confidence bounds. TruVaR is effective in several important settings that are typically non-trivial to incorporate into myopic algorithms, including pointwise costs and heteroscedastic noise. We provide a general theoretical guarantee for TruVaR covering these aspects, and use it to recover and strengthen existing results on BO and LSE. Moreover, we provide a new result for a setting where one can select from a number of noise levels having associated costs. We demonstrate the effectiveness of the algorithm on both synthetic and real-world data sets.
Truncation In statistics, truncation results in values that are limited above or below, resulting in a truncated sample.[1] Truncation is similar to but distinct from the concept of statistical censoring. A truncated sample can be thought of as being equivalent to an underlying sample with all values outside the bounds entirely omitted, with not even a count of those omitted being kept. With statistical censoring, a note would be recorded documenting which bound (upper or lower) had been exceeded and the value of that bound. With truncated sampling, no note is recorded.
Trust Region based Derivative Free Optimization
In this work, we utilize a Trust Region based Derivative Free Optimization (DFO-TR) method to directly maximize the Area Under Receiver Operating Characteristic Curve (AUC), which is a nonsmooth, noisy function. We show that AUC is a smooth function, in expectation, if the distributions of the positive and negative data points obey a jointly normal distribution. The practical performance of this algorithm is compared to three prominent Bayesian optimization methods and random search. The presented numerical results show that DFO-TR surpasses Bayesian optimization and random search on various black-box optimization problem, such as maximizing AUC and hyperparameter tuning.
Tsallis Entropy In physics, the Tsallis entropy is a generalization of the standard Boltzmann-Gibbs entropy. It was introduced in 1988 by Constantino Tsallis[1] as a basis for generalizing the standard statistical mechanics. In the scientific literature, the physical relevance of the Tsallis entropy was occasionally debated. However, from the years 2000 on, an increasingly wide spectrum of natural, artificial and social complex systems have been identified which confirm the predictions and consequences that are derived from this nonadditive entropy, such as nonextensive statistical mechanics,[2] which generalizes the Boltzmann-Gibbs theory.
Tsallis Entropy Information Metric
The construction of efficient and effective decision trees remains a key topic in machine learning because of their simplicity and flexibility. A lot of heuristic algorithms have been proposed to construct near-optimal decision trees. Most of them, however, are greedy algorithms which have the drawback of obtaining only local optimums. Besides, common split criteria, e.g. Shannon entropy, Gain Ratio and Gini index, are also not flexible due to lack of adjustable parameters on data sets. To address the above issues, we propose a series of novel methods using Tsallis entropy in this paper. Firstly, a Tsallis Entropy Criterion (TEC) algorithm is proposed to unify Shannon entropy, Gain Ratio and Gini index, which generalizes the split criteria of decision trees. Secondly, we propose a Tsallis Entropy Information Metric (TEIM) algorithm for efficient construction of decision trees. The TEIM algorithm takes advantages of the adaptability of Tsallis conditional entropy and the reducing greediness ability of two-stage approach. Experimental results on UCI data sets indicate that the TEC algorithm achieves statistically significant improvement over the classical algorithms, and that the TEIM algorithm yields significantly better decision trees in both classification accuracy and tree complexity.
Tshinghua-alpha-Algorithm Tshinghua-alpha algorithm which uses timestamps in the log files to construct a Petri net. It is related to the a algorithm, but uses a different approach. Details can be found in. It is interesting to note that this mining plug-in was the first plug-in developed by researchers outside of our research group. Researchers from Tshinghua University in China (Jianmin Wang and Wen Lijie) were able to develop and integrate this plug-in without any help or changes to the framework.
Tuatara GS1 The Tuatara GS1 algorithm relies on the more advanced Tuatara GS2 algorithm which generates relationships between objects based on principles in congnition related to Computational Theory of the Mind (CTM) (Pinker, S. 1997) and auto-association (Xijin Ge , Shuichi Iwata, 2002) and reinforced learning (Wenhuan, X., Nandi, A. K., Zhang, J., Evans, K. G. 2005) with exponential decays that follow the Golden Ratio F (Dunlap, Richard A. 1997).
Tube Convolutional Neural Network
Deep learning has been demonstrated to achieve excellent results for image classification and object detection. However, the impact of deep learning on video analysis (e.g. action detection and recognition) has been limited due to complexity of video data and lack of annotations. Previous convolutional neural networks (CNN) based video action detection approaches usually consist of two major steps: frame-level action proposal detection and association of proposals across frames. Also, these methods employ two-stream CNN framework to handle spatial and temporal feature separately. In this paper, we propose an end-to-end deep network called Tube Convolutional Neural Network (T-CNN) for action detection in videos. The proposed architecture is a unified network that is able to recognize and localize action based on 3D convolution features. A video is first divided into equal length clips and for each clip a set of tube proposals are generated next based on 3D Convolutional Network (ConvNet) features. Finally, the tube proposals of different clips are linked together employing network flow and spatio-temporal action detection is performed using these linked video proposals. Extensive experiments on several video datasets demonstrate the superior performance of T-CNN for classifying and localizing actions in both trimmed and untrimmed videos compared to state-of-the-arts.
Tukey Mean-Difference Plot The Tukey mean-difference plot is a scatter graph produced not for (x,y) values themselves, but for modified coordinates (X,Y) : X = (x+y)/2, Y = y-x.
Such a plot is useful, for example, to analyze data with strong correlation between x and y – when the (x,y) dots on the plot are close to the diagonal x=y. In this case, the value of the transformed variable X is about the same as x and y; and the variable Y shows the difference between x and y. The Tukey mean-difference plot is meaningful for two similar variables – that is, when both x and y are of the same physical dimension and expressed in the same units – e.g mass in pounds (or kilograms, …), length in foots (or meters, …). Otherwise, it makes no sense to sum up or subtract values of the variables x and y.
Turek-Fletcher Model Model-averaging is commonly used as a means of allowing for model uncertainty in parameter estimation. In the frequentist framework, a model-averaged estimate of a parameter is the weighted mean of the estimates from each of the candidate models, the weights typically being chosen using an information criterion. Current methods for calculating a model-averaged confidence interval assume approximate normality of the model-averaged estimate, i.e., they are Wald intervals. As in the single-model setting, we might improve the coverage performance of this interval by a one-to-one transformation of the parameter, obtaining a Wald interval, and then back-transforming the endpoints. However, a transformation that works in the single-model setting may not when model-averaging, due to the weighting and the need to estimate the weights. In the single-model setting, a natural alternative is to use a profile likelihood interval, which generally provides better coverage than a Wald interval. We propose a method for model-averaging a set of single-model profile likelihood intervals, making use of the link between profile likelihood intervals and Bayesian credible intervals. We illustrate its use in an example involving negative binomial regression, and perform two simulation studies to compare its coverage properties with the existing Wald intervals.
Turfjs Turf.js is a JavaScript library for spatial analysis. It helps you analyze, aggregate, and transform data in order to visualize it in new ways and answer advanced questions about it.
TVClust In this paper, we propose a model-based clustering method (TVClust) that robustly incorporates noisy side information as soft-constraints and aims to seek a consensus between side information and the observed data. Our method is based on a nonparametric Bayesian hierarchical model that combines the probabilistic model for the data instance and the one for the side-information. An efficient Gibbs sampling algorithm is proposed for posterior inference. Using the small-variance asymptotics of our probabilistic model, we then derive a new deterministic clustering algorithm (RDP-means). It can be viewed as an extension of K-means that allows for the inclusion of side information and has the additional property that the number of clusters does not need to be specified a priori. Empirical studies have been carried out to compare our work with many constrained clustering algorithms from the literature on both a variety of data sets and under a variety of conditions such as using noisy side information and erroneous k values. The results of our experiments show strong results for our probabilistic and deterministic approaches under these conditions when compared to other algorithms in the literature.
Tweedie Distribution In probability and statistics, the Tweedie distributions are a family of probability distributions which include the purely continuous normal and gamma distributions, the purely discrete scaled Poisson distribution, and the class of mixed compound Poisson-gamma distributions which have positive mass at zero, but are otherwise continuous. For any random variable Y that obeys a Tweedie distribution, the variance var(Y) relates to the mean E(Y) by the power law, where a and p are positive constants. The Tweedie distributions were named by Bent Joergensen after Maurice Tweedie, a statistician and medical physicist at the University of Liverpool, UK, who presented the first thorough study of these distributions in 1984.
Tweedie Model TDboost
Tweet2Vec We present Tweet2Vec, a novel method for generating general-purpose vector representation of tweets. The model learns tweet embeddings using character-level CNN-LSTM encoder-decoder. We trained our model on 3 million, randomly selected English-language tweets. The model was evaluated using two methods: tweet semantic similarity and tweet sentiment categorization, outperforming the previous state-of-the-art in both tasks. The evaluations demonstrate the power of the tweet embeddings generated by our model for various tweet categorization tasks. The vector representations generated by our model are generic, and hence can be applied to a variety of tasks. Though the model presented in this paper is trained on English-language tweets, the method presented can be used to learn tweet embeddings for different languages.
TwiInsight Social media platforms contain a great wealth of information which provides opportunities for us to explore hidden patterns or unknown correlations, and understand people’s satisfaction with what they are discussing. As one showcase, in this paper, we present a system, TwiInsight which explores the insight of Twitter data. Different from other Twitter analysis systems, TwiInsight automatically extracts the popular topics under different categories (e.g., healthcare, food, technology, sports and transport) discussed in Twitter via topic modeling and also identifies the correlated topics across different categories. Additionally, it also discovers the people’s opinions on the tweets and topics via the sentiment analysis. The system also employs an intuitive and informative visualization to show the uncovered insight. Furthermore, we also develop and compare six most popular algorithms – three for sentiment analysis and three for topic modeling.
Twin Sort Technique The objective behind the Twin Sort technique is to sort the list of unordered data elements efficiently and to allow efficient and simple arrangement of data elements within the data structure with optimization of comparisons and iterations in the sorting method. This sorting technique effectively terminates the iterations when there is no need of comparison if the elements are all sorted in between the iterations. Unlike Quick sort, Merge sorting technique, this new sorting technique is based on the iterative method of sorting elements within the data structure. So it will be advantageous for optimization of iterations when there is no need for sorting elements. Finally, the Twin Sort technique is more efficient and simple method of arranging elements within a data structure and it is easy to implement when comparing to the other sorting technique. By the introduction of optimization of comparison and iterations, it will never allow the arranging task on the ordered elements.
Twin Support Vector Machine
Twin Support Vector Machine (TWSVM) is an emerging machine learning method suitable for both classification and regression problems. It utilizes the concept of Generalized Eigen-values Proximal Support Vector Machine (GEPSVM) and finds two non-parallel planes for each class by solving a pair of Quadratic Programming Problems. It enhances the computational speed as compared to the traditional Support Vector Machine (SVM). TWSVM was initially constructed to solve binary classification problems; later researchers successfully extended it for multi-class problem domain. TWSVM always gives promising empirical results, due to which it has many attractive features which enhance its applicability. This paper presents the research development of TWSVM in recent years. This study is divided into two main broad categories – variant based and multi-class based TWSVM methods. The paper primarily discusses the basic concept of TWSVM and highlights its applications in recent years. A comparative analysis of various research contributions based on TWSVM is also presented. This is helpful for researchers to effectively utilize the TWSVM as an emergent research methodology and encourage them to work further in the performance enhancement of TWSVM.
Two Alternatives Forced Choice Score
“Generalized Discrimination Score”
Two one-Sided Tests
Two one-sided tests (TOST) procedure to test equivalence for t-tests, correlations, and meta-analyses, including power analysis for t-tests and correlations. Allows you to specify equivalence bounds in raw scale units or in terms of effect sizes.
Two Stage Least Squares
Two-Stage least squares (2SLS) regression analysis is a statistical technique that is used in the analysis of structural equations. This technique is the extension of the OLS method. It is used when the dependent variable’s error terms are correlated with the independent variables. Additionally, it is useful when there are feedback loops in the model. In structural equations modeling, we use the maximum likelihood method to estimate the path coefficient. This technique is an alternative in SEM modeling to estimate the path coefficient. This technique can also be applied in quasi-experimental studies.
Typicality and Eccentricity Data Analysis
The typicality and eccentricity data analysis (TEDA) framework was put forward by Angelov (2013) <DOI:10.14313/JAMRIS_2-2014/16>. It has been further developed into multiple different techniques since, and provides a non-parametric way of determining how similar an observation, from a process that is not purely random, is to other observations generated by the process.