|Tableau Public||Tableau Public is a free data storytelling application. Create and share interactive charts and graphs, stunning maps, live dashboards and fun applications in minutes, then publish anywhere on the web. Anyone can do it, it’s that easy – and it’s free.|
|Tag Management System
|A Tag Management System (TMS) replaces hard-coded tags that are used for marketing, analytics, and testing on a website, with dynamic tags that are easier to implement and update. Every tag management system uses a container tag – a small snippet of code that allows you to dynamically insert tags into your website. You can think of container tags as buckets that hold other types of tags. You control which tags are added to the buckets using a simple web interface. In 2012, Google released a TMS called Google Tag Manager, which has quickly become one of the most widely used Tag Management Systems in the market. The benefits of tag management (and specifically Google Tag Manager) are enormous to any business, large or small. You can add and update Google AdWords tags, Google Analytics tags, DoubleClick Floodlight tags and many non-Google third-party tags directly from Google Tag Manager, instead of editing site code. This reduces errors, frees you from having to involve a webmaster, and allows you to quickly deploy tags on your site. To effectively use tag management, it’s important to understand basic concepts like the data layer, triggers, and variables.|
|Tagger||We present a framework for efficient perceptual inference that explicitly reasons about the segmentation of its inputs and features. Rather than being trained for any specific segmentation, our framework learns the grouping process in an unsupervised manner or alongside any supervised task. By enriching the representations of a neural network, we enable it to group the representations of different objects in an iterative manner. By allowing the system to amortize the iterative inference of the groupings, we achieve very fast convergence. In contrast to many other recently proposed methods for addressing multi-object scenes, our system does not assume the inputs to be images and can therefore directly handle other modalities. For multi-digit classification of very cluttered images that require texture segmentation, our method offers improved classification performance over convolutional networks despite being fully connected. Furthermore, we observe that our system greatly improves on the semi-supervised result of a baseline Ladder network on our dataset, indicating that segmentation can also improve sample efficiency.|
|Tagging Systems||Tagging systems have become increasingly popular. These systems enable users to add keywords (i.e., ‘tags’) to Internet resources (e.g., web pages, images, videos) without relying on a controlled vocabulary. Tagging systems have the potential to improve search, spam detection, reputation systems, and personal organization while introducing new modalities of social communication and opportunities for data mining. This potential is largely due to the social structure that underlies many of the current systems. Despite the rapid expansion of applications that support tagging of resources, tagging systems are still not well studied or understood. In this paper, we provide a short description of the academic related work to date. We offer a model of tagging systems, specifically in the context of web-based systems, to help us illustrate the possible benefits of these tools. Since many such systems already exist, we provide a taxonomy of tagging systems to help inform their analysis and design, and thus enable researchers to frame and compare evidence for the sustainability of such systems. We also provide a simple taxonomy of incentives and contribution models to inform potential evaluative frameworks. While this work does not present comprehensive empirical results, we present a preliminary study of the photosharing and tagging system Flickr to demonstrate our model and explore some of the issues in one sample system. This analysis helps us outline and motivate possible future directions of research in tagging systems.|
|Takeya Semantic Structure Analysis
|Targeted Learning||The statistics profession is at a unique point in history. The need for valid statistical tools is greater than ever; data sets are massive, often measuring hundreds of thousands of measurements for a single subject. The field is ready to move towards clear objective benchmarks under which tools can be evaluated. Targeted learning allows
1) the full generalization and utilization of cross-validation as an estimator selection tool so that the subjective choices made by humans are now made by the machine, and
2) targeting the fitting of the probability distribution of the data toward the target parameter representing the scientific question of interest.
Targeted learning methods build machine-learning-based estimators of parameters defined as features of the probability distribution of the data, while also providing influence-curve or bootstrap-based confidence internals. The theory offers a general template for creating targeted maximum likelihood estimators for a data structure, nonparametric or semiparametric statistical model, and parameter mapping. These estimators of causal inference parameters are double robust and have a variety of other desirable statistical properties.
Targeted maximum likelihood estimation built on the loss-based ‘super learning’ system such that lower-dimensional parameters could be targeted (e.g., a marginal causal effect); the remaining bias for the (low-dimensional) target feature of the probability distribution was removed. Targeted learning for effect estimation and causal inference allows for the complete integration of machine learning advances in prediction while providing statistical inference for the target parameter(s) of interest.
|Targeted Maximum Likelihood Estimation
|Maximum likelihood estimation fits a model to data, minimizing a global measure, such as mean squared error (MSE). When we are interested in one particular parameter of the data distribution and consider the remaining parameters to be nuisance parameters, we would prefer an estimate that has smaller bias and variance for the targeted parameter, at the expense of increased bias and/or variance in the estimation of nuisance parameters. Targeted maximum likelihood estimation targets the MLE estimate of the parameter of interest in a way that reduces bias. This bias reduction is sometimes accompanied by an increase in the variance of the estimate, but the procedure often reduces variance as well in finite samples. Asymptotically, TMLE is maximally efficient when the model and nuisance parameters are correctly specified.
The framework of targeted maximum likelihood estimation (TMLE), introduced in van der Laan & Rubin (2006), is a principled approach for constructing asymptotically linear and efficient substitution estimators in rich infinite-dimensional models. The mechanics of TMLE hinge upon first-order approximations of the parameter of interest as a mapping on the space of probability distributions. For such approximations to hold, a second-order remainder term must tend to zero sufficiently fast. In practice, this means an initial estimator of the underlying data-generating distribution with a sufficiently large rate of convergence must be available — in many cases, this requirement is prohibitively difficult to satisfy.
|Targeted Minimum Loss Based Estimation
|Targeted minimum loss based estimation (TMLE) provides a template for the construction of semiparametric locally efficient double robust substitution estimators of the target parameter of the data generating distribution in a semiparametric censored data or causal inference model based on a sample of independent and identically distributed copies from this data generating distribution.|
|Tarjan’s Strongly Connected Components Algorithm||Tarjan’s Algorithm (named for its discoverer, Robert Tarjan) is a graph theory algorithm for finding the strongly connected components of a graph. Although it precedes it chronologically, it can be seen as an improved version of Kosaraju’s algorithm, and is comparable in efficiency to the path-based strong component algorithm.|
Achieve Charting Zen With TauCharts
|The identifier BATS is an acronym for key features of the model: Box-Cox transform, ARMA errors, Trend, and Seasonal components. The initial T in TBATS is connoting ‘trigonometric’.|
|t-Distributed Stochastic Neighbor Embedding
|t-distributed stochastic neighbor embedding (t-SNE) is a machine learning algorithm for dimensionality reduction developed by Laurens van der Maaten and Geoffrey Hinton. It is a nonlinear dimensionality reduction technique that is particularly well suited for embedding high-dimensional data into a space of two or three dimensions, which can then be visualized in a scatter plot. Specifically, it models each high-dimensional object by a two- or three-dimensional point in such a way that similar objects are modeled by nearby points and dissimilar objects are modeled by distant points. The t-SNE algorithms comprises two main stages. First, t-SNE constructs a probability distribution over pairs of high-dimensional objects in such a way that similar objects have a high probability of being picked, whilst dissimilar points have an infinitesimal probability of being picked. Second, t-SNE defines a similar probability distribution over the points in the low-dimensional map, and it minimizes the Kullback-Leibler divergence between the two distributions with respect to the locations of the points in the map.
Visualizing Data using t-SNE
|• Transformations are functions that take existing input data and apply a function to it such that it changes form. A simple example could be combining first name, middle name, and last name fields in source data and creating a full name field that is the combination of the three sub fields.
• Enrichments are functions that take existing input data, combined with additional data sources, and create new information that could not be gleaned from either source independently. For example, one could take two different lists of individuals and use pattern matching to create relationships that are not apparent from either list itself.
• Augmentations are functions that add data of use in combination with the input data. The result is a more complete set of information that combines data from multiple sources. For example, a set of business entities gleaned from a conference attendee list, combined with Dun and Bradstreet profiles for those entities, creates a more complete set of information for each business entity.
|A new efficient optimization method, called ‘Teaching-Learning-Based Optimization (TLBO)’, is proposed in this paper for the optimization of mechanical design problems. This method works on the effect of influence of a teacher on learners. Like other nature-inspired algorithms, TLBO is also a population-based method and uses a population of solutions to proceed to the global solution. The population is considered as a group of learners or a class of learners. The process of TLBO is divided into two parts: the first part consists of the ‘Teacher Phase’ and the second part consists of the ‘Learner Phase’. ‘Teacher Phase’ means learning from the teacher and ‘Learner Phase’ means learning by the interaction between learners. The basic philosophy of the TLBO method is explained in detail. To check the effectiveness of the method it is tested on five different constrained benchmark test functions with different characteristics, four different benchmark mechanical design problems and six mechanical design optimization problems which have real world applications. The effectiveness of the TLBO method is compared with the other population-based optimization algorithms based on the best solution, average solution, convergence rate and computational effort. Results show that TLBO is more effective and efficient than the other optimization methods for the mechanical design optimization problems considered. This novel optimization method can be easily extended to other engineering design optimization problems.
Teaching Learning Based Optimization Algorithm
|TeKnowbase||In this paper, we describe the construction of TeKnowbase, a knowledge-base of technical concepts in computer science. Our main information sources are technical websites such as Webopedia and Techtarget as well as Wikipedia and online textbooks. We divide the knowledge-base construction problem into two parts — the acquisition of entities and the extraction of relationships among these entities. Our knowledge-base consists of approximately 100,000 triples. We conducted an evaluation on a sample of triples and report an accuracy of a little over 90\%. We additionally conducted classification experiments on StackOverflow data with features from TeKnowbase and achieved improved classification accuracy.|
|Template Model Builder||glmmTMB|
|Temporal Aggregation||We call temporal aggregation the situation in which a variable that evolves through time can not be observed at all dates. This phenomenon arises frequently in economics, where it is very expensive to collect data on certain variables, and there is no reason to believe that economic time series are collected at the frequency required to fully capture the movements of the economy. For example, we only have quarterly observations on GNP, but it is reasonable to believe that the behavior of GNP within a quarter carries relevant information about the structure of the economy.|
|Temporal Automatic Relation Discovery in Sequences
|Recent empirical results on long-term dependency tasks have shown that neural networks augmented with an external memory can learn the long-term dependency tasks more easily and achieve better generalization than vanilla recurrent neural networks (RNN). We suggest that memory augmented neural networks can reduce the effects of vanishing gradients by creating shortcut (or wormhole) connections. Based on this observation, we propose a novel memory augmented neural network model called TARDIS (Temporal Automatic Relation Discovery in Sequences). The controller of TARDIS can store a selective set of embeddings of its own previous hidden states into an external memory and revisit them as and when needed. For TARDIS, memory acts as a storage for wormhole connections to the past to propagate the gradients more effectively and it helps to learn the temporal dependencies. The memory structure of TARDIS has similarities to both Neural Turing Machines (NTM) and Dynamic Neural Turing Machines (D-NTM), but both read and write operations of TARDIS are simpler and more efficient. We use discrete addressing for read/write operations which helps to substantially to reduce the vanishing gradient problem with very long sequences. Read and write operations in TARDIS are tied with a heuristic once the memory becomes full, and this makes the learning problem simpler when compared to NTM or D-NTM type of architectures. We provide a detailed analysis on the gradient propagation in general for MANNs. We evaluate our models on different long-term dependency tasks and report competitive results in all of them.|
|Temporal Database||A temporal database is a database with built-in support for handling data involving time, being related to Slowly changing dimension concept, for example a temporal data model and a temporal version of Structured Query Language (SQL). More specifically the temporal aspects usually include valid time and transaction time. These attributes can be combined to form bitemporal data. Valid time is the time period during which a fact is true with respect to the real world. Transaction time is the time period during which a fact stored in the database is considered to be true. Bitemporal data combines both Valid and Transaction Time. It is possible to have timelines other than Valid Time and Transaction Time, such as Decision Time, in the database. In that case the database is called a multitemporal database as opposed to a bitemporal database. However, this approach introduces additional complexities such as dealing with the validity of (foreign) keys. Temporal databases are in contrast to current databases, which store only facts which are believed to be true at the current time.|
|Temporal Difference Learning
|Temporal difference (TD) learning is a prediction method. It has been mostly used for solving the reinforcement learning problem. ‘TD learning is a combination of Monte Carlo ideas and dynamic programming (DP) ideas.’ TD resembles a Monte Carlo method because it learns by sampling the environment according to some policy. TD is related to dynamic programming techniques because it approximates its current estimate based on previously learned estimates (a process known as bootstrapping). The TD learning algorithm is related to the temporal difference model of animal learning.
Adaptive Lambda Least-Squares Temporal Difference Learning
|Temporal Exponential Random Graph Model
|Temporal Multinomial Mixture
|Evolutionary clustering aims at capturing the temporal evolution of clusters. This issue is particularly important in the context of social media data that are naturally temporally driven. In this paper, we propose a new probabilistic model-based evolutionary clustering technique. The Temporal Multinomial Mixture (TMM) is an extension of classical mixture model that optimizes feature co-occurrences in the trade-off with temporal smoothness. Our model is evaluated for two recent case studies on opinion aggregation over time. We compare four different probabilistic clustering models and we show the superiority of our proposal in the task of instance-oriented clustering.|
|Temporal Network Autocorrelation Models
|Temporal Overdrive Recurrent Neural Network||In this work we present a novel recurrent neural network architecture designed to model systems characterized by multiple characteristic timescales in their dynamics. The proposed network is composed by several recurrent groups of neurons that are trained to separately adapt to each timescale, in order to improve the system identification process. We test our framework on time series prediction tasks and we show some promising, preliminary results achieved on synthetic data. To evaluate the capabilities of our network, we compare the performance with several state-of-the-art recurrent architectures.|
|Temporal Regularized Matrix Factorization
|Matrix factorization approaches have been applied to a variety of applications, from recommendation systems to multi-label learning. Standard low rank matrix factorization methods fail in cases when the data can be modeled as a time series, since they do not take into account the dependencies among factors, while EM algorithms designed for time series data are inapplicable to large multiple time series data. To overcome this, matrix factorization approaches are augmented with dynamic linear model based regularization frameworks. A major drawback in such approaches is that the exact dependencies between the latent factors are assumed to be known. In this paper, we introduce a Temporal Regularized Matrix Factorization (TRMF) method, an efficient alternating minimization scheme that not only learns the latent time series factors, but also the dependencies among the latent factors. TRMF is highly general, and subsumes several existing matrix factorization approaches for time series data. We make interesting connections to graph based matrix factorization methods in the context of learning the dependencies. Experiments on both real and synthetic data show that TRMF is highly scalable, and outperforms several existing approaches used for common large scale time series tasks.|
|TensiStrength||Computer systems need to be able to react to stress in order to perform optimally on some tasks. This article describes TensiStrength, a system to detect the strength of stress and relaxation expressed in social media text messages. TensiStrength uses a lexical approach and a set of rules to detect direct and indirect expressions of stress or relaxation, particularly in the context of transportation. It is slightly more effective than a comparable sentiment analysis program, although their similar performances occur despite differences on almost half of the tweets gathered. The effectiveness of TensiStrength depends on the nature of the tweets classified, with tweets that are rich in stress-related terms being particularly problematic. Although generic machine learning methods can give better performance than TensiStrength overall, they exploit topic-related terms in a way that may be undesirable in practical applications and that may not work as well in more focused contexts. In conclusion, TensiStrength and generic machine learning approaches work well enough to be practical choices for intelligent applications that need to take advantage of stress information, and the decision about which to use depends on the nature of the texts analysed and the purpose of the task.|
|Tensor Methods||Tensors are generalizations of matrices that let you look beyond pairwise relationships to higher-dimensional models (a matrix is a second-order tensor). For instance, one can examine patterns between any three (or more) dimensions in data sets. In a text mining application, this leads to models that incorporate the co-occurrence of three or more words, and in social networks, you can use tensors to encode arbitrary degrees of influence (e.g., ‘friend of friend of friend’ of a user).
Tensors, as generalizations of vectors and matrices, have become increasingly popular in different areas of machine learning and data mining, where they are employed to approach a diverse number of difficult learning and analysis tasks. Prominent examples include learning on multi-relational data and large-scale knowledge bases, recommendation systems, computer vision, mining boolean data, neuroimaging or the analysis of time-varying networks. The success of tensors methods is strongly related to their ability to efficiently model, analyse and predict data with multiple modalities. To address specific challenges and problems, a variety of methods has been developed in different fields of application.
|Tensor Switching Networks
|We present a novel neural network algorithm, the Tensor Switching (TS) network, which generalizes the Rectified Linear Unit (ReLU) nonlinearity to tensor-valued hidden units. The TS network copies its entire input vector to different locations in an expanded representation, with the location determined by its hidden unit activity. In this way, even a simple linear readout from the TS representation can implement a highly expressive deep-network-like function. The TS network hence avoids the vanishing gradient problem by construction, at the cost of larger representation size. We develop several methods to train the TS network, including equivalent kernels for infinitely wide and deep TS networks, a one-pass linear learning algorithm, and two backpropagation-inspired representation learning algorithms. Our experimental results demonstrate that the TS network is indeed more expressive and consistently learns faster than standard ReLU networks.|
|TensorFlow||TensorFlow is an interface for expressing machine learning algorithms, and an implementation for executing such algorithms. A computation expressed using TensorFlow can be executed with little or no change on a wide variety of heterogeneous systems, ranging from mobile devices such as phones and tablets up to large-scale distributed systems of hundreds of machines and thousands of computational devices such as GPU cards. The system is flexible and can be used to express a wide variety of algorithms, including training and inference algorithms for deep neural network models, and it has been used for conducting research and for deploying machine learning systems into production across more than a dozen areas of computer science and other fields, including speech recognition, computer vision, robotics, information retrieval, natural language processing, geographic information extraction, and computational drug discovery. This paper describes the TensorFlow interface and an implementation of that interface that we have built at Google. The TensorFlow API and a reference implementation were released as an open-source package under the Apache 2.0 license in November, 2015 and are available at http://www.tensorflow.org.|
|Tensorial Mixture Models||We introduce a generative model, we call Tensorial Mixture Models (TMMs) based on mixtures of basic component distributions over local structures (e.g. patches in an image) where the dependencies between the local-structures are represented by a ‘priors tensor’ holding the prior probabilities of assigning a component distribution to each local-structure. In their general form, TMMs are intractable as the prior tensor is typically of exponential size. However, when the priors tensor is decomposed it gives rise to an arithmetic circuit which in turn transforms the TMM into a Convolutional Arithmetic Circuit (ConvAC). A ConvAC corresponds to a shallow (single hidden layer) network when the priors tensor is decomposed by a CP (sum of rank-1) approach and corresponds to a deep network when the decomposition follows the Hierarchical Tucker (HT) model. The ConvAC representation of a TMM possesses several attractive properties. First, the inference is tractable and is implemented by a forward pass through a deep network. Second, the architectural design of the model follows the deep networks community design, i.e., the structure of TMMs is determined by just two easily understood factors: size of pooling windows and number of channels. Finally, we demonstrate the effectiveness of our model when tackling the problem of classification with missing data, leveraging TMMs unique ability of tractable marginalization which leads to optimal classifiers regardless of the missingness distribution.|
|TensorLog||Large knowledge bases (KBs) are useful in many tasks, but it is unclear how to integrate this sort of knowledge into ‘deep’ gradient-based learning systems. To address this problem, we describe a probabilistic deductive database, called TensorLog, in which reasoning uses a differentiable process. In TensorLog, each clause in a logical theory is first converted into certain type of factor graph. Then, for each type of query to the factor graph, the message-passing steps required to perform belief propagation (BP) are ‘unrolled’ into a function, which is differentiable. We show that these functions can be composed recursively to perform inference in non-trivial logical theories containing multiple interrelated clauses and predicates. Both compilation and inference in TensorLog are efficient: compilation is linear in theory size and proof depth, and inference is linear in database size and the number of message-passing steps used in BP. We also present experimental results with TensorLog and discuss its relationship to other first-order probabilistic logics.|
|TensorLy||Tensor methods are gaining increasing traction in machine learning. However, there are scant to no resources available to perform tensor learning and decomposition in Python. To answer this need we developed TensorLy. TensorLy is a state of the art general purpose library for tensor learning. Written in Python, it aims at following the same standards adopted by the main projects of the Python scientific community and fully integrating with these. It allows for fast and straightforward tensor decomposition and learning and comes with exhaustive tests, thorough documentation and minimal dependencies. It can be easily extended and its BSD licence makes it suitable for both academic and commercial applications. TensorLy is available at https://…/tensorly.|
|Term Document Matrix||A document-term matrix or term-document matrix is a mathematical matrix that describes the frequency of terms that occur in a collection of documents. In a document-term matrix, rows correspond to documents in the collection and columns correspond to terms. There are various schemes for determining the value that each entry in the matrix should take. One such scheme is tf-idf. They are useful in the field of natural language processing.|
|Term Frequency – Inverse Document Frequency
|tf-idf, short for term frequency-inverse document frequency, is a numerical statistic that is intended to reflect how important a word is to a document in a collection or corpus. It is often used as a weighting factor in information retrieval and text mining. The tf-idf value increases proportionally to the number of times a word appears in the document, but is offset by the frequency of the word in the corpus, which helps to control for the fact that some words are generally more common than others. Variations of the tf-idf weighting scheme are often used by search engines as a central tool in scoring and ranking a document’s relevance given a user query. tf-idf can be successfully used for stop-words filtering in various subject fields including text summarization and classification. One of the simplest ranking functions is computed by summing the tf-idf for each query term; many more sophisticated ranking functions are variants of this simple model.|
|Termbase Exchange||TermBase eXchange (TBX) ist eine XML-basierte Auszeichnungssprache für den Austausch von Terminologiedaten, die meist in Terminologiedatenbanken verwaltet werden. Anwendungen, die dieses Format unterstützen, können Terminologiebestände untereinander austauschen und pflegen. Ursprünglich ein Standard der Localization Industry Standards Association (LISA), nahm sich die ISO des Standards an und überarbeitete und spezifizierte ihn in ISO 30042, welcher sich auf ISO 12620, ISO 12200 und ISO 16642 stützt.|
|Terminology Extraction||Terminology mining, term extraction, term recognition, or glossary extraction, is a subtask of information extraction. The goal of terminology extraction is to automatically extract relevant terms from a given corpus. In the semantic web era, a growing number of communities and networked enterprises started to access and interoperate through the internet. Modeling these communities and their information needs is important for several web applications, like topic-driven web crawlers, web services, recommender systems, etc. The development of terminology extraction is essential to the language industry. One of the first steps to model the knowledge domain of a virtual community is to collect a vocabulary of domain-relevant terms, constituting the linguistic surface manifestation of domain concepts. Several methods to automatically extract technical terms from domain-specific document warehouses have been described in the literature. Typically, approaches to automatic term extraction make use of linguistic processors (part of speech tagging, phrase chunking) to extract terminological candidates, i.e. syntactically plausible terminological noun phrases, NPs (e.g. compounds ‘credit card’, adjective-NPs ‘local tourist information office’, and prepositional-NPs ‘board of directors’ – in English, the first two constructs are the most frequent). Terminological entries are then filtered from the candidate list using statistical and machine learning methods. Once filtered, because of their low ambiguity and high specificity, these terms are particularly useful for conceptualizing a knowledge domain or for supporting the creation of a domain ontology. Furthermore, terminology extraction is a very useful starting point for semantic similarity, knowledge management, human translation and machine translation, etc.|
|Ternary Neural Networks
|The computation and storage requirements for Deep Neural Networks (DNNs) are usually high. This issue limit their deployability on ubiquitous computing devices such as smart phones or wearables. In this paper, we propose ternary neural networks (TNNs) in order to make deep learning more resource-efficient. We train these TNNs using a teacher-student approach. Using only ternary weights and ternary neurons, with a step activation function of two-thresholds, the student ternary network learns to mimic the behaviour of its teacher network. We propose a novel, layer-wise greedy methodology for training TNNs. During training, a ternary neural network inherently prunes the smaller weights by setting them to zero. This makes them even more compact thus more resource-friendly. We devise a purpose-built hardware design for TNNs and implement it on FPGA. The benchmark results with our purpose-built hardware running TNNs reveal that, with only 1.24 microjoules per image, we can achieve 97.76% accuracy with 5.37 microsecond latency and with a rate of 255K images per second on MNIST.|
|Ternary Plot / Ternary Diagram||A ternary plot, ternary graph, triangle plot, simplex plot, or de Finetti diagram is a barycentric plot on three variables which sum to a constant. It graphically depicts the ratios of the three variables as positions in an equilateral triangle.|
|Ternary Weight Neural Networks
|We introduce ternary weight networks (TWNs) – neural networks with weights constrained to +1, 0 and -1. The Euclidian distance between full (float or double) precision weights and the ternary weights along with a scaling factor is minimized. Besides, a threshold-based ternary function is optimized to get an approximated solution which can be fast and easily computed. TWNs have stronger expressive abilities than the recently proposed binary precision counterparts and are thus more effective than the latter. Meanwhile, TWNs achieve up to 16× or 32× model compression rate and need fewer multiplications compared with the full precision counterparts. Benchmarks on MNIST, CIFAR-10, and large scale ImageNet datasets show that the performance of TWNs is only slightly worse than the full precision counterparts but outperforms the analogous binary precision counterparts a lot.
➘ “Ternary Neural Networks”
|TerpreT||We study machine learning formulations of inductive program synthesis; that is, given input-output examples, synthesize source code that maps inputs to corresponding outputs. Our key contribution is TerpreT, a domain-specific language for expressing program synthesis problems. A TerpreT model is composed of a specification of a program representation and an interpreter that describes how programs map inputs to outputs. The inference task is to observe a set of input-output examples and infer the underlying program. From a TerpreT model we automatically perform inference using four different back-ends: gradient descent (thus each TerpreT model can be seen as defining a differentiable interpreter), linear program (LP) relaxations for graphical models, discrete satisfiability solving, and the Sketch program synthesis system. TerpreT has two main benefits. First, it enables rapid exploration of a range of domains, program representations, and interpreter models. Second, it separates the model specification from the inference algorithm, allowing proper comparisons between different approaches to inference. We illustrate the value of TerpreT by developing several interpreter models and performing an extensive empirical comparison between alternative inference algorithms on a variety of program models. To our knowledge, this is the first work to compare gradient-based search over program space to traditional search-based alternatives. Our key empirical finding is that constraint solvers dominate the gradient descent and LP-based formulations. This is a workshop summary of a longer report at arXiv:1608.04428|
|Test for Excess Significance
|In any series of typically-powered experiments, we expect some to fail to be non-significant due to sampling error, even if a true effect exists. If we see a series of five experiments, and they are all significant, one thinks that either they are either very high powered, the authors got lucky, or there are some nonsignificant studies missing. For many sets of studies, the first seems implausible because the effect sizes are small; the last is important, because if it is true then the picture we get of the results is misleading.
|Test Set||A test set is a set of data used in various areas of information science to assess the strength and utility of a predictive relationship. Test sets are used in artificial intelligence, machine learning, genetic programming and statistics. In all these fields, a test set has much the same role.|
|Text Data Processing
|Text Mining||Text mining, also referred to as text data mining, roughly equivalent to text analytics, refers to the process of deriving high-quality information from text. High-quality information is typically derived through the devising of patterns and trends through means such as statistical pattern learning. Text mining usually involves the process of structuring the input text (usually parsing, along with the addition of some derived linguistic features and the removal of others, and subsequent insertion into a database), deriving patterns within the structured data, and finally evaluation and interpretation of the output. ‘High quality’ in text mining usually refers to some combination of relevance, novelty, and interestingness. Typical text mining tasks include text categorization, text clustering, concept/entity extraction, production of granular taxonomies, sentiment analysis, document summarization, and entity relation modeling (i.e., learning relations between named entities).|
|TextBoxes||This paper presents an end-to-end trainable fast scene text detector, named TextBoxes, which detects scene text with both high accuracy and efficiency in a single network forward pass, involving no post-process except for a standard non-maximum suppression. TextBoxes outperforms competing methods in terms of text localization accuracy and is much faster, taking only 0.09s per image in a fast implementation. Furthermore, combined with a text recognizer, TextBoxes significantly outperforms state-of-the-art approaches on word spotting and end-to-end text recognition tasks.|
|Textology||A Textology is a graph of word clusters connected by co-occurrence relations.|
|Speech Synthesis is the artificial production of human speech. A computer system used for this purpose is called a speech computer or speech synthesizer, and can be implemented in software or hardware products. A text-to-speech (TTS) system converts normal language text into speech; other systems render symbolic linguistic representations like phonetic transcriptions into speech. Synthesized speech can be created by concatenating pieces of recorded speech that are stored in a database. Systems differ in the size of the stored speech units; a system that stores phones or diphones provides the largest output range, but may lack clarity. For specific usage domains, the storage of entire words or sentences allows for high-quality output. Alternatively, a synthesizer can incorporate a model of the vocal tract and other human voice characteristics to create a completely ‘synthetic’ voice output. The quality of a speech synthesizer is judged by its similarity to the human voice and by its ability to be understood clearly. An intelligible text-to-speech program allows people with visual impairments or reading disabilities to listen to written works on a home computer. Many computer operating systems have included speech synthesizers since the early 1990s. Overview of a typical TTS system Automatic announcement Menu 0:00 A synthetic voice announcing an arriving train in Sweden. Problems playing this file? See media help. Problems playing this file? See media help. A text-to-speech system (or ‘engine’) is composed of two parts: a front-end and a back-end. The front-end has two major tasks. First, it converts raw text containing symbols like numbers and abbreviations into the equivalent of written-out words. This process is often called text normalization, pre-processing, or tokenization. The front-end then assigns phonetic transcriptions to each word, and divides and marks the text into prosodic units, like phrases, clauses, and sentences. The process of assigning phonetic transcriptions to words is called text-to-phoneme or grapheme-to-phoneme conversion. Phonetic transcriptions and prosody information together make up the symbolic linguistic representation that is output by the front-end. The back-end – often referred to as the synthesizer – then converts the symbolic linguistic representation into sound. In certain systems, this part includes the computation of the target prosody (pitch contour, phoneme durations), which is then imposed on the output speech.|
|Textures.js||SVG patterns for Data Visualization|
|Theano||Theano is a Python library that allows you to define, optimize, and evaluate mathematical expressions involving multi-dimensional arrays efficiently. Theano features:
• tight integration with NumPy – Use numpy.ndarray in Theano-compiled functions.
• transparent use of a GPU – Perform data-intensive calculations up to 140x faster than with CPU.(float32 only)
• efficient symbolic differentiation – Theano does your derivatives for function with one or many inputs.
• speed and stability optimizations – Get the right answer for log(1+x) even when x is really tiny.
• dynamic C code generation – Evaluate expressions faster.
• extensive unit-testing and self-verification – Detect and diagnose many types of mistake.
Theano has been powering large-scale computationally intensive scientific investigations since 2007. But it is also approachable enough to be used in the classroom (IFT6266 at the University of Montreal).
|Thematic Map||Thematic maps are geographical maps in which statistical data are visualized. A thematic map is a type of map especially designed to show a particular theme connected with a specific geographic area. These maps ‘can portray physical, social, political, cultural, economic, sociological, agricultural, or any other aspects of a city, state, region, nation, or continent’.
|Theta Method||Accurate and robust forecasting methods for univariate time series are very important when the objective is to produce estimates for a large number of time series. In this context, the Theta method called researchers attention due its performance in the largest up-to-date forecasting competition, the M3-Competition. Theta method proposes the decomposition of the deseasonalised data into two ‘theta lines’. The first theta line removes completely the curvatures of the data, thus being a good estimator of the long-term trend component. The second theta line doubles the curvatures of the series, as to better approximate the short-term behaviour.
|Thick Data||Thick Data: ethnographic approaches that uncover the meaning behind Big Data visualization and analysis. Thick Data analysis primarily relies on human brain power to process a small “N” while big data analysis requires computational power (of course with humans writing the algorithms) to process a large “N”. Big Data reveals insights with a particular range of data points, while Thick Data reveals the social context of and connections between data points. Big Data delivers numbers; thick data delivers stories. Big data relies on machine learning; thick data relies on human learning.|
|Thingscoop||Thingscoop is a command-line utility for analyzing videos semantically – that means searching, filtering, and describing videos based on objects, places, and other things that appear in them. When you first run thingscoop on a video file, it uses a convolutional neural network to create an ‘index’ of what’s contained in the every second of the input by repeatedly performing image classification on a frame-by-frame basis. Once an index for a video file has been created, you can search (i.e. get the start and end times of the regions in the video matching the query) and filter (i.e. create a supercut of the matching regions) the input using arbitrary queries. Thingscoop uses a very basic query language that lets you to compose queries that test for the presence or absence of labels with the logical operators ! (not), || (or) and && (and). For example, to search a video the presence of the sky and the absence of the ocean: thingscoop search ‘sky && !ocean’ <file>. Right now two models are supported by thingscoop: vgg_imagenet uses the architecture described in ‘Very Deep Convolutional Networks for Large-Scale Image Recognition’ to recognize objects from the ImageNet database, and googlenet_places uses the architecture described in ‘Going Deeper with Convolutions’ to recognize settings and places from the MIT Places database. You can specify which model you’d like to use by running thingscoop models use <model>, where <model> is either vgg_imagenet or googlenet_places. More models will be added soon. Thingscoop is based on Caffe, an open-source deep learning framework.
|Thompson Sampling||In artificial intelligence, Thompson sampling, named after William R. Thompson, is a heuristic for choosing actions that addresses the exploration-exploitation dilemma in the multi-armed bandit problem. It consists in choosing the action that maximizes the expected reward with respect to a randomly drawn belief.|
|Thouless-Anderson-Palmer Gibbs Free Energy
(TAP Gibbs Free Energy)
|The adaptive TAP Gibbs free energy for a general densely connected probabilistic model with quadratic interactions and arbritary single site constraints is derived. We show how a specific sequential minimization of the free energy leads to a generalization ofMinka’s expectation propagation. Lastly, we derive a sparse representation version of the sequential algorithm. The usefulness of the approach is demonstrated on classification and density estimation with Gaussian processes and on an independent component analysis problem.|
|Threading Building Blocks
|Threading Building Blocks (TBB) is a C++ template library developed by Intel for writing software programs that take advantage of multi-core processors. The library consists of data structures and algorithms that allow a programmer to avoid some complications arising from the use of native threading packages such as POSIX threads, Windows threads, or the portable Boost Threads in which individual threads of execution are created, synchronized, and terminated manually. Instead the library abstracts access to the multiple processors by allowing the operations to be treated as “tasks”, which are allocated to individual cores dynamically by the library’s run-time engine, and by automating efficient use of the CPU cache. A TBB program creates, synchronizes and destroys graphs of dependent tasks according to algorithms, i.e. high-level parallel programming paradigms (a.k.a. Algorithmic Skeletons). Tasks are then executed respecting graph dependencies. This approach groups TBB in a family of solutions for parallel programming aiming to decouple the programming from the particulars of the underlying machine.|
|Three-Mode Principal Components Analysis||In multivariate analysis the data have usually two way and/or two modes. This book treats prinicipal component analysis of data which can be characterised by three-ways and/or modes, like subjects by variables by conditions or occasions. The book extends the work on three-mode factor analysis by Tucker and the work on individual differences scaling by Carroll and colleagues. The many examples give a true feeling of the working of the techniques.
|Tidy Data||Tidy datasets are easy to manipulate, model and visualise, and have a specific structure: each variable is a column, each observation is a row, and each type of observational unit is a table.|
|Tikhonov Regularization||Tikhonov regularization, named for Andrey Tikhonov, is the most commonly used method of regularization of ill-posed problems. In statistics, the method is known as ridge regression, and with multiple independent discoveries, it is also variously known as the Tikhonov-Miller method, the Phillips-Twomey method, the constrained linear inversion method, and the method of linear regularization. It is related to the Levenberg-Marquardt algorithm for non-linear least-squares problems.|
|Time Oriented Language
|TOL is the Time Oriented Language. It is a programming language dedicated to the world of statistics and focused on time series analysis and stochastic processes. It is a declarative language based on two key features: simple syntactical rules and powerful set of extensible data types and functions. TOL is callable by a small text console, but there is also a graphical interface to easily handle all language’s tools and functions, providing powerful graphical capacities. TOL is distributed under the GNU GPL license.
|Time Series Analysis / Time Series||A time series is a sequence of data points, measured typically at successive points in time spaced at uniform time intervals. Examples of time series are the daily closing value of the Dow Jones Industrial Average and the annual flow volume of the Nile River at Aswan. Time series are very frequently plotted via line charts. Time series are used in statistics, signal processing, pattern recognition, econometrics, mathematical finance, weather forecasting, earthquake prediction, electroencephalography, control engineering, astronomy, and communications engineering.|
|Time Series Database
|A time series database (TSDB) is a software system that is optimized for handling time series data, arrays of numbers indexed by time (a datetime or a datetime range). In some fields these time series are called profiles, curves, or traces. A time series of stock prices might be called a price curve. A time series of energy consumption might be called a load profile. A log of temperature values over time might be called a temperature trace. Despite the disparate names, many of the same mathematical operations, queries, or database transactions are useful for analysing all of them. The implementation of a database that can correctly, reliably, and efficiently implement these operations must be specialized for time-series data. TSDBs are databases that are optimized for time series data. Software with complex logic or business rules and high transaction volume for time series data may not be practical with traditional relational database management systems. Flat file databases are not a viable option either, if the data and transaction volume reaches a maximum threshold determined by the capacity of individual servers (processing power and storage capacity). Queries for historical data, replete with time ranges and roll ups and arbitrary time zone conversions are difficult in a relational database. Compositions of those rules are even more difficult. This is a problem compounded by the free nature of relational systems themselves. Many relational systems are often not modelled correctly with respect to time series data. TSDBs on the other hand impose a model and this allows them to provide more features for doing so. Ideally, these repositories are often natively implemented using specialized database algorithms. However, it is possible to store time series as binary large objects (BLOBs) in a relational database or by using a VLDB approach coupled with a pure star schema. Efficiency is often improved if time is treated as a discrete quantity rather than as a continuous mathematical dimension. Database joins across multiple time series data sets is only practical when the time tag associated with each data entry spans the same set of discrete times for all data sets across which the join is performed.|
|Time-Lapse Mining||We introduce an approach for synthesizing time-lapse videos of popular landmarks from large community photo collections. The approach is completely automated and leverages the vast quantity of photos available online. First, we cluster 86 million photos into landmarks and popular viewpoints. Then, we sort the photos by date and warp each photo onto a common viewpoint. Finally, we stabilize the appearance of the sequence to compensate for lighting effects and minimize flicker. Our resulting time-lapses show diverse changes in the world’s most popular sites, like glaciers shrinking, skyscrapers being constructed, and waterfalls changing course.|
|Time-Weighted Dynamic Time Warping
|Dynamic time warping (DTW), which finds the minimum path by providing non-linear alignments between two time series, has been widely used as a distance measure for time series classification and clustering. However, DTW does not account for the relative importance regarding the phase difference between a reference point and a testing point. This may lead to misclassification especially in applications where the shape similarity between two sequences is a major consideration for an accurate recognition. Therefore, we propose a novel distance measure, called a weighted DTW (WDTW), which is a penalty-based DTW. Our approach penalizes points with higher phase difference between a reference point and a testing point in order to prevent minimum distance distortion caused by outliers. The rationale underlying the proposed distance measure is demonstrated with some illustrative examples. A new weight function, called the modified logistic weight function (MLWF), is also proposed to systematically assign weights as a function of the phase difference between a reference point and a testing point. By applying different weights to adjacent points, the proposed algorithm can enhance the detection of similarity between two time series. We show that some popular distance measures such as DTW and Euclidean distance are special cases of our proposed WDTW measure. We extend the proposed idea to other variants of DTW such as derivative dynamic time warping (DDTW) and propose the weighted version of DDTW. We have compared the performances of our proposed procedures with other popular approaches using public data sets available through the UCR Time Series Data Mining Archive for both time series classification and clustering problems. The experimental results indicate that the proposed approaches can achieve improved accuracy for time series classification and clustering problems.
➚ “Dynamic Time Warping”
|Topic Detection and Tracking
|Topic Detection and Tracking (TDT) is a Body of Research and an Evaluation Paradigm that Addresses Event-Based Organization of Broadcast News. The TDT Evaluation Tasks of Tracking, Cluster Detection, and First Story Detection are Each Information Filtering Technology in the Sense That They Require That ‘yes or no’ Decisions be Made on a Stream of News Stories Before Additional Stories Have Arrived.
|Topic Model||In machine learning and natural language processing, a topic model is a type of statistical model for discovering the abstract “topics” that occur in a collection of documents. Intuitively, given that a document is about a particular topic, one would expect particular words to appear in the document more or less frequently: “dog” and “bone” will appear more often in documents about dogs, “cat” and “meow” will appear in documents about cats, and “the” and “is” will appear equally in both. A document typically concerns multiple topics in different proportions; thus, in a document that is 10% about cats and 90% about dogs, there would probably be about 9 times more dog words than cat words. A topic model captures this intuition in a mathematical framework, which allows examining a set of documents and discovering, based on the statistics of the words in each, what the topics might be and what each document’s balance of topics is.|
|TopicRNN||In this paper, we propose TopicRNN, a recurrent neural network (RNN)-based language model designed to directly capture the global semantic meaning relating words in a document via latent topics. Because of their sequential nature, RNNs are good at capturing the local structure of a word sequence – both semantic and syntactic – but might face difficulty remembering long-range dependencies. Intuitively, these long-range dependencies are of semantic nature. In contrast, latent topic models are able to capture the global underlying semantic structure of a document but do not account for word ordering. The proposed TopicRNN model integrates the merits of RNNs and latent topic models: it captures local (syntactic) dependencies using an RNN and global (semantic) dependencies using latent topics. Unlike previous work on contextual RNN language modeling, our model is learned end-to-end. Empirical results on word prediction show that TopicRNN outperforms existing contextual RNN baselines. In addition, TopicRNN can be used as an unsupervised feature extractor for documents. We do this for sentiment analysis and report a new state-of-the-art error rate on the IMDB movie review dataset that amounts to a $13.3\%$ improvement over the previous best result. Finally TopicRNN also yields sensible topics, making it a useful alternative to document models such as latent Dirichlet allocation.|
|Topological Anomaly Detection
|The technique is essentially a density based outlier detection algorithm that, instead of calculating local densities, constructs a graph of the data using nearest-neighbors. The algorithm is different from other kNN outlier detection algorithms in that instead of setting ‘k’ as a parameter, you instead set a maximal inter-observation distance (called the graph “resolution” by Gartley and Basener). If the distance between two points is less than the graph resolution, add an edge between those two observations to the graph. Once the full graph is constructed, determine which connected components comprise the “background” of the data by setting some threshold percentage of observations ‘p’: any components with fewer than ‘p’ observations is considered an anomalous component, and all the observations (nodes) in this component are outliers.|
|Topological Data Analysis
|Topological data analysis (TDA) is a new area of study aimed at having applications in areas such as data mining and computer vision. The main problems are:
1. how one infers high-dimensional structure from low-dimensional representations; and
2. how one assembles discrete points into global structure.
The human brain can easily extract global structure from representations in a strictly lower dimension, i.e. we infer a 3D environment from a 2D image from each eye. The inference of global structure also occurs when converting discrete data into continuous images, e.g. dot-matrix printers and televisions communicate images via arrays of discrete points.
The main method used by topological data analysis is:
1. Replace a set of data points with a family of simplicial complexes, indexed by a proximity parameter.
2. Analyse these topological complexes via algebraic topology – specifically, via the theory of persistent homology.
3. Encode the persistent homology of a data set in the form of a parameterized version of a Betti number which is called a persistence diagram or barcode.
|Topological Sorting||In computer science, a topological sort (sometimes abbreviated topsort or toposort) or topological ordering of a directed graph is a linear ordering of its vertices such that for every directed edge uv from vertex u to vertex v, u comes before v in the ordering. For instance, the vertices of the graph may represent tasks to be performed, and the edges may represent constraints that one task must be performed before another; in this application, a topological ordering is just a valid sequence for the tasks. A topological ordering is possible if and only if the graph has no directed cycles, that is, if it is a directed acyclic graph (DAG). Any DAG has at least one topological ordering, and algorithms are known for constructing a topological ordering of any DAG in linear time.|
|Topology-Based Pathway Enrichment Analysis
|Torch||Torch is a scientific computing framework with wide support for machine learning algorithms. It is easy to use and efficient, thanks to an easy and fast scripting language, LuaJIT, and an underlying C/CUDA implementation. A summary of core features:
• a powerful N-dimensional array
• lots of routines for indexing, slicing, transposing, …
• amazing interface to C, via LuaJIT
• linear algebra routines
• neural network, and energy-based models
• numeric optimization routines
• Fast and efficient GPU support
• Embeddable, with ports to iOS, Android and FPGA backends
|Total Operating Characteristic
|The relative operating characteristic (ROC) is a popular statistical method to measure the association between observed and diagnosed presence of a characteristic. The diagnosis of presence or absence depends on whether the value of an index variable is above a threshold. ROC considers multiple possible thresholds. Each threshold generates a two-by-two contingency table, which contains four central entries: hits, misses, false alarms, and correct rejections. ROC reveals for each threshold only two ratios, hits/(hits + misses) and false alarms/(false alarms + correct rejections). This article introduces the total operating characteristic (TOC), which shows the total information in the contingency table for each threshold. TOC maintains desirable properties of ROC, while TOC reveals strictly more information than ROC in a manner that makes TOC more useful than ROC.
|Total Unduplicated Reach and Frequency
|TURF Analysis, an acronym for “Total Unduplicated Reach and Frequency”, is a type of statistical analysis used for providing estimates of media or market potential and devising optimal communication and placement strategies given limited resources. TURF analysis identifies the number of users reached by a communication, and how often they are reached. Although originally used by media schedulers to maximize reach and frequency of media spending across different items (print, broadcast, etc.), TURF is also now used to provide estimates of market potential. For example, if a company plans to market a new yogurt, they may consider launching ten possible flavors, but in reality, only three might be purchased in large quantities. The TURF algorithm identifies the optimal product line to maximize the total number of consumers who will purchase at least one SKU. Typically, when T.U.R.F. is undertaken for optimizing a product range, the analysis only looks at the reach of the product range (ignoring the Frequency component of TURF).
|Toyplot||Toyplot, the kid-sized plotting toolkit for Python with grownup-sized goals:
• Develop beautiful interactive, animated plots that embrace the unique capabilities of electronic publishing and support repoducibility.
• Create the best possible data graphics ‘out-of-the-box’, maximizing data ink and minimizing chartjunk.
• Provide a clean, minimalist interface that scientists and engineers will love.
The Toyplot Tutorial
|Training Set||A training set is a set of data used in various areas of information science to discover potentially predictive relationships. Training sets are used in artificial intelligence, machine learning, genetic programming, intelligent systems, and statistics. In all these fields, a training set has much the same role and is often used in conjunction with a test set.|
|Training, Validation, Test||Divide the data set into three parts:
• Training, Validation, Test (e.g. 50, 25, 25)
• Fit model on the TRAINING set
• Select model using VALIDATION set
• Assess prediction error using TEST set
|Transduction||In logic, statistical inference, and supervised learning, transduction or transductive inference is reasoning from observed, specific (training) cases to specific (test) cases. In contrast, induction is reasoning from observed training cases to general rules, which are then applied to the test cases. The distinction is most interesting in cases where the predictions of the transductive model are not achievable by any inductive model. Note that this is caused by transductive inference on different test sets producing mutually inconsistent predictions.|
|Transfer Function Model||Transfer function models describe the relationship between the inputs and outputs of a system using a ratio of polynomials. The model order is equal to the order of the denominator polynomial. The roots of the denominator polynomial are referred to as the model poles. The roots of the numerator polynomial are referred to as the model zeros. The parameters of a transfer function model are its poles, zeros and transport delays.|
|Transfer Learning||Machine learning and data mining techniques have been used in numerous real-world applications. An assumption of traditional machine learning methodologies is the training data and testing data are taken from the same domain, such that the input feature space and data distribution characteristics are the same. However, in some real-world machine learning scenarios, this assumption does not hold. There are cases where training data is expensive or difficult to collect. Therefore, there is a need to create high-performance learners trained with more easily obtained data from different domains. This methodology is referred to as transfer learning. This survey paper formally defines transfer learning, presents information on current solutions, and reviews applications applied to transfer learning. Lastly, there is information listed on software downloads for various transfer learning solutions and a discussion of possible future research work. The transfer learning solutions surveyed are independent of data size and can be applied to big data environments.
Recycling Deep Learning Models with Transfer Learning
|Transformation Forests||Regression models for supervised learning problems with a continuous target are commonly understood as models for the conditional mean of the target given predictors. This notion is simple and therefore appealing for interpretation and visualisation. Information about the whole underlying conditional distribution is, however, not available from these models. A more general understanding of regression models as models for conditional distributions allows much broader inference from such models, for example the computation of prediction intervals. Several random forest-type algorithms aim at estimating conditional distributions, most prominently quantile regression forests (Meinshausen, 2006, JMLR). We propose a novel approach based on a parametric family of distributions characterised by their transformation function. A dedicated novel ‘transformation tree’ algorithm able to detect distributional changes is developed. Based on these transformation trees, we introduce ‘transformation forests’ as an adaptive local likelihood estimator of conditional distribution functions. The resulting models are fully parametric yet very general and allow broad inference procedures, such as the model-based bootstrap, to be applied in a straightforward way.|
|Transformed Generalized Autoregressive Moving Average
|Transformed Generalized Autoregressive Moving Average (TGARMA) models were recently proposed to deal with non-additivity, non-normality and heteroscedasticity in real time series data. In this paper, a Bayesian approach is proposed for TGARMA models, thus extending the original model. We conducted a simulation study to investigate the performance of Bayesian estimation and Bayesian model selection criteria. In addition, a real dataset was analysed using the proposed approach.|
|translate2R||Many companies realizied the advantages of the open source programming language R. translate2R allows a fast and inexpensive migration to R. The manual migration of complex SPSS® scripts has always been tedious and error-prone, but with translate2R the task of translating by hand becomes a thing of the past. The automatic and comprehensible process of translating SPSS® code to R code with translate2R offers users an enormous number of new analytical opportunities. Besides the usual migration process translate2R allows programmers an easy start into programming with R. Make use of translate2R for the translation of scripts to R. We will be pleased to help you in terms of migration projects, or starting off with R.
|Transportation Theory||In mathematics and economics, transportation theory is a name given to the study of optimal transportation and allocation of resources. The problem was formalized by the French mathematician Gaspard Monge in 1781. In the 1920s A.N. Tolstoi was one of the first to study the transportation problem mathematically. In 1930, in the collection Transportation Planning Volume I for the National Commissariat of Transportation of the Soviet Union, he published a paper ‘Methods of Finding the Minimal Kilometrage in Cargo-transportation in space’. Major advances were made in the field during World War II by the Soviet/Russian mathematician and economist Leonid Kantorovich. Consequently, the problem as it is stated is sometimes known as the Monge-Kantorovich transportation problem. The linear programming formulation of the transportation problem is also known as the Hitchcock-Koopmans transportation problem.|
|Tree Recurrent Neural Network
|In this paper we develop a recurrent neural network (TreeRNN), which is designed to predict a tree rather than a linear sequence as is the case in conventional recurrent neural networks. Our model defines the probability of a sentence by estimating the generation probability of its dependency tree. We construct the tree incrementally by generating the left and right dependents of a node whose probability is computed using recurrent neural networks with shared hidden layers. Application of our model to two language modeling tasks shows that it outperforms or performs on par with related models.
|Tree Structured Vector Quantization
|1. First we apply k-means to get 2 centroids or prototypes within the entire data set. This provides us with a boundary between the two clusters, which would be a straight line based on the nearest neighbor rule.
2. Next, the data are assigned to the 2 centroids.
3. Then, for the data assigned to each centroid (call them a group), apply 2 centroid k-means to each group separately. The initialization can be done by splitting the centroid into two. Note that data points channeled to different centroids are treated separately.
4. Repeat the above step.
|Treelogy||We propose a novel tree classification system called Treelogy, that fuses deep representations with hand-crafted features obtained from leaf images to perform leaf-based plant classification. Key to this system are segmentation of the leaf from an untextured background, using convolutional neural networks (CNNs) for learning deep representations, extracting hand-crafted features with a number of image processing techniques, training a linear SVM with feature vectors, merging SVM and CNN results, and identifying the species from a dataset of 57 trees. Our classification results show that fusion of deep representations with hand-crafted features leads to the highest accuracy. The proposed algorithm is embedded in a smart-phone application, which is publicly available. Furthermore, our novel dataset comprised of 5408 leaf images is also made public for use of other researchers.|
|Treemapping||In information visualization and computing, treemapping is a method for displaying hierarchical data by using nested rectangles.|
|Trellis Graphics||Extremely useful approach for graphical exploratory data analysis (EDA). Allows to examine for complicated, multiple variable relationships. Types of plots:
• xyplot: scatterplot
• bwplot: boxplots
• stripplot: display univariate data against a numerical variable
• dotplot: similar to stripplot
• densityplot: kernel density estimates
• piechart: (Not available in R)
• splom: scatterplot matrices
• contourplot: contour plot of a surface on a regular grid
• levelplot: pseudo-colour plot of a surface on a rectangular grid
• wireframe: perspective plot of a surface evaluated on a regular grid
• cloud: perspective plot of a cloud of points (3D scatterplot)
|Trend Analysis||Trend Analysis is the practice of collecting information and attempting to spot a pattern, or trend, in the information. In some fields of study, the term ‘trend analysis’ has more formally defined meanings. Although trend analysis is often used to predict future events, it could be used to estimate uncertain events in the past, such as how many ancient kings probably ruled between two dates, based on data such as the average years which other known kings reigned.|
|In mathematics, a t-norm (also T-norm or, unabbreviated, triangular norm) is a kind of binary operation used in the framework of probabilistic metric spaces and in multi-valued logic, specifically in fuzzy logic. A t-norm generalizes intersection in a lattice and conjunction in logic. The name triangular norm refers to the fact that in the framework of probabilistic metric spaces t-norms are used to generalize triangle inequality of ordinary metric spaces.|
|Triple Exponential Smoothing||What happens if the data show trend and seasonality? We now introduce a third equation to take care of seasonality (sometimes called periodicity). The resulting set of equations is called the ‘Holt-Winters’ (HW) method after the names of the inventors.|
|Triplestore||A triplestore is a purpose-built database for the storage and retrieval of triples through semantic queries. A triple is a data entity composed of subject-predicate-object, like “Bob is 35” or “Bob knows Fred”. Much like a relational database, one stores information in a triplestore and retrieves it via a query language. Unlike a relational database, a triplestore is optimized for the storage and retrieval of triples. In addition to queries, triples can usually be imported/exported using Resource Description Framework (RDF) and other formats.|
|TrueSkill Ranking System||TrueSkill is a Bayesian ranking algorithm developed by Microsoft Research and used in the Xbox matchmaking system built to address some perceived flaws in the Elo rating system. It is an extension of the Glicko rating system to multiplayer games. The purpose of a ranking system is to both identify and track the skills of gamers in a game (mode) in order to be able to match them into competitive matches. The TrueSkill ranking system only uses the final standings of all teams in a game in order to update the skill estimates (ranks) of all gamers playing in this game. Ranking systems have been proposed for many sports but possibly the most prominent ranking system in use today is ELO.|
|In this paper we present TSSort, a probabilistic, noise resistant, quickly converging comparison sort algorithm based on Microsoft TrueSkill. The algorithm combines TrueSkill’s updating rules with a newly developed next item pair selection strategy, enabling it to beat standard sorting algorithms w.r.t. convergence speed and noise resistance, as shown in simulations. TSSort is useful if comparisons of items are expensive or noisy, or if intermediate results shall be approximately ordered.|
|Truncated Variance Reduction
|We present a new algorithm, truncated variance reduction (TruVaR), that treats Bayesian optimization (BO) and level-set estimation (LSE) with Gaussian processes in a unified fashion. The algorithm greedily shrinks a sum of truncated variances within a set of potential maximizers (BO) or unclassified points (LSE), which is updated based on confidence bounds. TruVaR is effective in several important settings that are typically non-trivial to incorporate into myopic algorithms, including pointwise costs and heteroscedastic noise. We provide a general theoretical guarantee for TruVaR covering these aspects, and use it to recover and strengthen existing results on BO and LSE. Moreover, we provide a new result for a setting where one can select from a number of noise levels having associated costs. We demonstrate the effectiveness of the algorithm on both synthetic and real-world data sets.|
|Truncation||In statistics, truncation results in values that are limited above or below, resulting in a truncated sample. Truncation is similar to but distinct from the concept of statistical censoring. A truncated sample can be thought of as being equivalent to an underlying sample with all values outside the bounds entirely omitted, with not even a count of those omitted being kept. With statistical censoring, a note would be recorded documenting which bound (upper or lower) had been exceeded and the value of that bound. With truncated sampling, no note is recorded.|
|Trust Region based Derivative Free Optimization
|In this work, we utilize a Trust Region based Derivative Free Optimization (DFO-TR) method to directly maximize the Area Under Receiver Operating Characteristic Curve (AUC), which is a nonsmooth, noisy function. We show that AUC is a smooth function, in expectation, if the distributions of the positive and negative data points obey a jointly normal distribution. The practical performance of this algorithm is compared to three prominent Bayesian optimization methods and random search. The presented numerical results show that DFO-TR surpasses Bayesian optimization and random search on various black-box optimization problem, such as maximizing AUC and hyperparameter tuning.|
|Tsallis Entropy||In physics, the Tsallis entropy is a generalization of the standard Boltzmann-Gibbs entropy. It was introduced in 1988 by Constantino Tsallis as a basis for generalizing the standard statistical mechanics. In the scientific literature, the physical relevance of the Tsallis entropy was occasionally debated. However, from the years 2000 on, an increasingly wide spectrum of natural, artificial and social complex systems have been identified which confirm the predictions and consequences that are derived from this nonadditive entropy, such as nonextensive statistical mechanics, which generalizes the Boltzmann-Gibbs theory.|
|Tsallis Entropy Information Metric
|The construction of efficient and effective decision trees remains a key topic in machine learning because of their simplicity and flexibility. A lot of heuristic algorithms have been proposed to construct near-optimal decision trees. Most of them, however, are greedy algorithms which have the drawback of obtaining only local optimums. Besides, common split criteria, e.g. Shannon entropy, Gain Ratio and Gini index, are also not flexible due to lack of adjustable parameters on data sets. To address the above issues, we propose a series of novel methods using Tsallis entropy in this paper. Firstly, a Tsallis Entropy Criterion (TEC) algorithm is proposed to unify Shannon entropy, Gain Ratio and Gini index, which generalizes the split criteria of decision trees. Secondly, we propose a Tsallis Entropy Information Metric (TEIM) algorithm for efficient construction of decision trees. The TEIM algorithm takes advantages of the adaptability of Tsallis conditional entropy and the reducing greediness ability of two-stage approach. Experimental results on UCI data sets indicate that the TEC algorithm achieves statistically significant improvement over the classical algorithms, and that the TEIM algorithm yields significantly better decision trees in both classification accuracy and tree complexity.|
|Tshinghua-alpha-Algorithm||Tshinghua-alpha algorithm which uses timestamps in the log files to construct a Petri net. It is related to the a algorithm, but uses a different approach. Details can be found in. It is interesting to note that this mining plug-in was the first plug-in developed by researchers outside of our research group. Researchers from Tshinghua University in China (Jianmin Wang and Wen Lijie) were able to develop and integrate this plug-in without any help or changes to the framework.|
|Tuatara GS1||The Tuatara GS1 algorithm relies on the more advanced Tuatara GS2 algorithm which generates relationships between objects based on principles in congnition related to Computational Theory of the Mind (CTM) (Pinker, S. 1997) and auto-association (Xijin Ge , Shuichi Iwata, 2002) and reinforced learning (Wenhuan, X., Nandi, A. K., Zhang, J., Evans, K. G. 2005) with exponential decays that follow the Golden Ratio F (Dunlap, Richard A. 1997).|
|Tube Convolutional Neural Network
|Deep learning has been demonstrated to achieve excellent results for image classification and object detection. However, the impact of deep learning on video analysis (e.g. action detection and recognition) has been limited due to complexity of video data and lack of annotations. Previous convolutional neural networks (CNN) based video action detection approaches usually consist of two major steps: frame-level action proposal detection and association of proposals across frames. Also, these methods employ two-stream CNN framework to handle spatial and temporal feature separately. In this paper, we propose an end-to-end deep network called Tube Convolutional Neural Network (T-CNN) for action detection in videos. The proposed architecture is a unified network that is able to recognize and localize action based on 3D convolution features. A video is first divided into equal length clips and for each clip a set of tube proposals are generated next based on 3D Convolutional Network (ConvNet) features. Finally, the tube proposals of different clips are linked together employing network flow and spatio-temporal action detection is performed using these linked video proposals. Extensive experiments on several video datasets demonstrate the superior performance of T-CNN for classifying and localizing actions in both trimmed and untrimmed videos compared to state-of-the-arts.|
|Tukey Mean-Difference Plot||The Tukey mean-difference plot is a scatter graph produced not for (x,y) values themselves, but for modified coordinates (X,Y) : X = (x+y)/2, Y = y-x.
Such a plot is useful, for example, to analyze data with strong correlation between x and y – when the (x,y) dots on the plot are close to the diagonal x=y. In this case, the value of the transformed variable X is about the same as x and y; and the variable Y shows the difference between x and y. The Tukey mean-difference plot is meaningful for two similar variables – that is, when both x and y are of the same physical dimension and expressed in the same units – e.g mass in pounds (or kilograms, …), length in foots (or meters, …). Otherwise, it makes no sense to sum up or subtract values of the variables x and y.
|Turek-Fletcher Model||Model-averaging is commonly used as a means of allowing for model uncertainty in parameter estimation. In the frequentist framework, a model-averaged estimate of a parameter is the weighted mean of the estimates from each of the candidate models, the weights typically being chosen using an information criterion. Current methods for calculating a model-averaged confidence interval assume approximate normality of the model-averaged estimate, i.e., they are Wald intervals. As in the single-model setting, we might improve the coverage performance of this interval by a one-to-one transformation of the parameter, obtaining a Wald interval, and then back-transforming the endpoints. However, a transformation that works in the single-model setting may not when model-averaging, due to the weighting and the need to estimate the weights. In the single-model setting, a natural alternative is to use a profile likelihood interval, which generally provides better coverage than a Wald interval. We propose a method for model-averaging a set of single-model profile likelihood intervals, making use of the link between profile likelihood intervals and Bayesian credible intervals. We illustrate its use in an example involving negative binomial regression, and perform two simulation studies to compare its coverage properties with the existing Wald intervals.|
|TVClust||In this paper, we propose a model-based clustering method (TVClust) that robustly incorporates noisy side information as soft-constraints and aims to seek a consensus between side information and the observed data. Our method is based on a nonparametric Bayesian hierarchical model that combines the probabilistic model for the data instance and the one for the side-information. An efficient Gibbs sampling algorithm is proposed for posterior inference. Using the small-variance asymptotics of our probabilistic model, we then derive a new deterministic clustering algorithm (RDP-means). It can be viewed as an extension of K-means that allows for the inclusion of side information and has the additional property that the number of clusters does not need to be specified a priori. Empirical studies have been carried out to compare our work with many constrained clustering algorithms from the literature on both a variety of data sets and under a variety of conditions such as using noisy side information and erroneous k values. The results of our experiments show strong results for our probabilistic and deterministic approaches under these conditions when compared to other algorithms in the literature.|
|Tweedie Distribution||In probability and statistics, the Tweedie distributions are a family of probability distributions which include the purely continuous normal and gamma distributions, the purely discrete scaled Poisson distribution, and the class of mixed compound Poisson-gamma distributions which have positive mass at zero, but are otherwise continuous. For any random variable Y that obeys a Tweedie distribution, the variance var(Y) relates to the mean E(Y) by the power law, where a and p are positive constants. The Tweedie distributions were named by Bent Joergensen after Maurice Tweedie, a statistician and medical physicist at the University of Liverpool, UK, who presented the first thorough study of these distributions in 1984.|
|Tweet2Vec||We present Tweet2Vec, a novel method for generating general-purpose vector representation of tweets. The model learns tweet embeddings using character-level CNN-LSTM encoder-decoder. We trained our model on 3 million, randomly selected English-language tweets. The model was evaluated using two methods: tweet semantic similarity and tweet sentiment categorization, outperforming the previous state-of-the-art in both tasks. The evaluations demonstrate the power of the tweet embeddings generated by our model for various tweet categorization tasks. The vector representations generated by our model are generic, and hence can be applied to a variety of tasks. Though the model presented in this paper is trained on English-language tweets, the method presented can be used to learn tweet embeddings for different languages.|
|Two one-Sided Tests
|Two one-sided tests (TOST) procedure to test equivalence for t-tests, correlations, and meta-analyses, including power analysis for t-tests and correlations. Allows you to specify equivalence bounds in raw scale units or in terms of effect sizes.
|Two Stage Least Squares
|Two-Stage least squares (2SLS) regression analysis is a statistical technique that is used in the analysis of structural equations. This technique is the extension of the OLS method. It is used when the dependent variable’s error terms are correlated with the independent variables. Additionally, it is useful when there are feedback loops in the model. In structural equations modeling, we use the maximum likelihood method to estimate the path coefficient. This technique is an alternative in SEM modeling to estimate the path coefficient. This technique can also be applied in quasi-experimental studies.
|Typicality and Eccentricity Data Analysis
|The typicality and eccentricity data analysis (TEDA) framework was put forward by Angelov (2013) <DOI:10.14313/JAMRIS_2-2014/16>. It has been further developed into multiple different techniques since, and provides a non-parametric way of determining how similar an observation, from a process that is not purely random, is to other observations generated by the process.