|U*F Clustering||In this paper, we propose a new clustering method consisting of automated “flood-fill segmentation” of the U*-matrix of a Self-Organizing Map (SOM) after training. Using several artificial datasets as a benchmark, we find that the clustering results of our U*F method are good over a wide range of critical dataset types. Furthermore, a comparison with standard clustering algorithms (K-means, single-linkage and Ward) applied directly to the same datasets shows that each of the latter performs very badly on at least one kind of dataset, contrary to our U*F clustering method: while not always the best, U*F clustering has the great advantage of exhibiting consistently good results. Another advantage of U*F is that the computational cost of the SOM segmentation phase is negligible, contrary to other SOM-based clustering approaches, which apply O(n² log n) standard clustering algorithms to the SOM prototypes. Finally, it should be emphasized that U*F clustering does not require a priori knowledge of the number of clusters, making it a real “cluster-mining” algorithm.|
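The segmentation step can be illustrated with a generic flood fill over a grid of U*-matrix heights. This is a minimal sketch, not the authors' U*F procedure: it assumes a single hypothetical threshold `level` that separates low "valley" cells (cluster interiors) from high "ridge" cells (cluster borders), and it labels each 4-connected valley region as one cluster. The function name and the small example matrix are made up for illustration.

```python
from collections import deque


def flood_fill_clusters(heights, level):
    """Label 4-connected regions of grid cells whose height is below `level`.

    `heights` is a 2-D list (e.g. U*-matrix values on the SOM grid).
    Returns a same-shaped list of cluster ids; ridge cells get -1.
    """
    rows, cols = len(heights), len(heights[0])
    labels = [[-1] * cols for _ in range(rows)]
    next_id = 0
    for r0 in range(rows):
        for c0 in range(cols):
            if labels[r0][c0] != -1 or heights[r0][c0] >= level:
                continue
            # Unlabelled valley cell: start a new cluster and flood it breadth-first.
            labels[r0][c0] = next_id
            queue = deque([(r0, c0)])
            while queue:
                r, c = queue.popleft()
                for dr, dc in ((1, 0), (-1, 0), (0, 1), (0, -1)):
                    nr, nc = r + dr, c + dc
                    if (0 <= nr < rows and 0 <= nc < cols
                            and labels[nr][nc] == -1
                            and heights[nr][nc] < level):
                        labels[nr][nc] = next_id
                        queue.append((nr, nc))
            next_id += 1
    return labels


umatrix = [
    [0.1, 0.2, 0.9, 0.1],
    [0.2, 0.1, 0.9, 0.2],
    [0.1, 0.3, 0.9, 0.1],
]
print(flood_fill_clusters(umatrix, level=0.5))
# [[0, 0, -1, 1], [0, 0, -1, 1], [0, 0, -1, 1]]
```

The number of clusters is not passed in; it falls out of how many separate valleys the flood fill finds. The fill itself touches each grid cell a constant number of times, so its cost is linear in the size of the map.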
|Ukkonen’s Algorithm||In computer science, Ukkonen’s algorithm is a linear-time, online algorithm for constructing suffix trees, proposed by Esko Ukkonen in 1995. The algorithm begins with an implicit suffix tree containing the first character of the string. Then it steps through the string, adding successive characters until the tree is complete. This in-order addition of characters gives Ukkonen’s algorithm its “on-line” property. The original algorithm presented by P. Weiner proceeded backward from the last character to the first one, from the shortest to the longest suffix. A simpler algorithm was found by Edward M. McCreight, going from the longest to the shortest suffix. The naive implementation for generating a suffix tree going forward requires O(n²) or even O(n³) time, where n is the length of the string. By exploiting a number of algorithmic techniques, Ukkonen reduced this to O(n) (linear) time for constant-size alphabets, and O(n log n) in general, matching the runtime performance of the earlier two algorithms.|
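The on-line construction can be sketched in Python as below. This is a compact sketch that follows the common "active point / remainder" formulation of Ukkonen's algorithm rather than the notation of the 1995 paper. It assumes the input ends with a unique terminator character (here `$`) so that every suffix ends at a leaf, and it stores each edge as a pair of indices into the string so that an edge label takes O(1) space. The names `build_suffix_tree` and `suffixes` are illustrative.

```python
def build_suffix_tree(s):
    """Build a suffix tree of s with Ukkonen's algorithm.

    Assumes s ends with a unique terminator (e.g. '$'). Nodes are integers;
    node 0 is the root. The edge entering node v is labelled s[start[v]:end[v]].
    """
    n = len(s)
    start, end, link, children = [0], [0], [0], [{}]

    def new_node(st, en):
        start.append(st)
        end.append(en)
        link.append(0)          # default suffix link: the root
        children.append({})
        return len(start) - 1

    active_node, active_edge, active_len, remainder = 0, 0, 0, 0
    for pos, c in enumerate(s):
        remainder += 1          # one more suffix is waiting to be inserted
        pending = None          # node still waiting for its suffix link
        while remainder > 0:
            if active_len == 0:
                active_edge = pos
            key = s[active_edge]
            nxt = children[active_node].get(key)
            if nxt is None:
                # No edge starts with `key`: hang a new leaf off the active node.
                children[active_node][key] = new_node(pos, n)
                if pending is not None:
                    link[pending] = active_node
                pending = active_node
            else:
                # Leaves use end = n; min() gives their current length ("global end").
                edge_len = min(end[nxt], pos + 1) - start[nxt]
                if active_len >= edge_len:
                    # Skip/count trick: jump over the whole edge without comparing.
                    active_edge += edge_len
                    active_len -= edge_len
                    active_node = nxt
                    continue
                if s[start[nxt] + active_len] == c:
                    # c is already on this edge: extend implicitly and end the phase.
                    active_len += 1
                    if pending is not None:
                        link[pending] = active_node
                    pending = active_node
                    break
                # Mismatch inside the edge: split it and add a leaf for c.
                split = new_node(start[nxt], start[nxt] + active_len)
                children[active_node][key] = split
                children[split][c] = new_node(pos, n)
                start[nxt] += active_len
                children[split][s[start[nxt]]] = nxt
                if pending is not None:
                    link[pending] = split
                pending = split
            remainder -= 1
            if active_node == 0 and active_len > 0:
                active_len -= 1
                active_edge = pos - remainder + 1
            elif active_node != 0:
                active_node = link[active_node]   # follow the suffix link
    return start, end, children


def suffixes(tree, s):
    """Read every root-to-leaf path label (each one is a suffix of s)."""
    start, end, children = tree
    out, stack = [], [(0, "")]
    while stack:
        node, prefix = stack.pop()
        if node != 0:
            prefix += s[start[node]:end[node]]
        if not children[node]:
            out.append(prefix)
        stack.extend((child, prefix) for child in children[node].values())
    return out


text = "banana$"
print(sorted(suffixes(build_suffix_tree(text), text)))
# ['$', 'a$', 'ana$', 'anana$', 'banana$', 'na$', 'nana$']
```

The suffix links and the skip/count trick are what remove the repeated re-scanning that makes the naive forward construction quadratic or worse. Child lookup uses a Python dict, which gives expected constant-time access for a bounded alphabet.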
|Uncertainty in Artificial Intelligence
|The Association for Uncertainty in Artificial Intelligence is a non-profit organization focused on organizing the annual Conference on Uncertainty in Artificial Intelligence (UAI) and, more generally, on promoting research in pursuit of advances in knowledge representation, learning and reasoning under uncertainty.
➚ “Association for Uncertainty in Artificial Intelligence”
|Uncertainty Quantification||Uncertainty quantification (UQ) is the science of quantitative characterization and reduction of uncertainties in both computational and real-world applications. It tries to determine how likely certain outcomes are if some aspects of the system are not exactly known. An example would be predicting the acceleration of a human body in a head-on crash with another car: even if the speed were known exactly, small differences in the manufacturing of individual cars, how tightly every bolt has been tightened, etc., will lead to different results that can only be predicted in a statistical sense. Many problems in the natural sciences and engineering are also rife with sources of uncertainty. Computer experiments on computer simulations are the most common approach to studying problems in uncertainty quantification.|
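A common computer-experiment approach is Monte Carlo propagation: sample the uncertain inputs, run the model on each sample, and summarize the spread of outputs. The sketch below assumes a deliberately crude, hypothetical crash model (constant deceleration a = v² / (2d) over a crush distance d) in which the speed is known exactly and only d varies between cars; the distribution and all numbers are illustrative, not crash data.

```python
import random
import statistics


def crash_deceleration(speed, crush_distance):
    """Toy model: constant deceleration a = v^2 / (2 d) over crush distance d."""
    return speed ** 2 / (2.0 * crush_distance)


def monte_carlo_uq(n_samples=50_000, seed=0):
    rng = random.Random(seed)
    speed = 15.0  # m/s, assumed known exactly
    samples = []
    for _ in range(n_samples):
        # Manufacturing variability enters only via the crush distance
        # (hypothetical normal spread, in metres).
        d = rng.gauss(0.6, 0.05)
        samples.append(crash_deceleration(speed, d))
    samples.sort()
    return {
        "mean": statistics.fmean(samples),
        "stdev": statistics.stdev(samples),
        "p05": samples[int(0.05 * n_samples)],
        "p95": samples[int(0.95 * n_samples)],
    }


print(monte_carlo_uq())
```

The result is a distribution of decelerations (mean, spread, and a 5 to 95 percent range) rather than a single number, which is the "statistical sense" of prediction described above.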
|Unconditional Maximum Likelihood Estimation
|Unconstrained Optimization||Unconstrained optimization methods work, in general, by searching: starting at some initial values, they take steps that decrease (or, when maximizing, as with Mathematica’s FindMaximum, increase) an objective or merit function.|
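Methods differ mainly in how they choose each step (Newton, quasi-Newton, conjugate gradient, and so on). The simplest choice is steepest descent with a fixed step size, sketched below on a hypothetical quadratic objective. This is an illustration of the search idea, not the method used by FindMaximum or any particular library; `gradient_descent` and `grad_f` are made-up names.

```python
def gradient_descent(grad, x0, step=0.1, tol=1e-10, max_iter=10_000):
    """Minimise a smooth function by fixed-step steepest descent.

    `grad` returns the gradient as a list; iteration stops when every
    component of the step is smaller than `tol`.
    """
    x = list(x0)
    for _ in range(max_iter):
        g = grad(x)
        new_x = [xi - step * gi for xi, gi in zip(x, g)]
        if all(abs(a - b) < tol for a, b in zip(new_x, x)):
            return new_x
        x = new_x
    return x


# Hypothetical objective f(x, y) = (x - 3)^2 + 2 (y + 1)^2, minimised at (3, -1).
def grad_f(p):
    x, y = p
    return [2 * (x - 3), 4 * (y + 1)]


print(gradient_descent(grad_f, [0.0, 0.0]))
```

To maximize a function g with the same routine, minimize -g, that is, step along +∇g instead of -∇g.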
|Underfitting||Underfitting occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data. Intuitively, underfitting occurs when the model or the algorithm does not fit the data well enough. Specifically, underfitting occurs if the model or algorithm shows low variance but high bias. Underfitting is often a result of an excessively simple model. Both overfitting and underfitting lead to poor predictions on new data sets.
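Because underfitting is a bias problem, it shows up even on the training data. A minimal sketch, assuming hypothetical noise-free data from a quadratic trend: a straight line cannot capture the curvature and keeps a large training error, while a degree-2 polynomial fits it essentially exactly.

```python
import numpy as np

x = np.linspace(-3, 3, 61)
y = x ** 2  # hypothetical data: the true trend is quadratic (noise-free for clarity)


def train_mse(degree):
    """Mean squared training error of a least-squares polynomial fit."""
    coeffs = np.polyfit(x, y, degree)
    return float(np.mean((np.polyval(coeffs, x) - y) ** 2))


print(train_mse(1))  # straight line: too simple, large error even on training data
print(train_mse(2))  # matches the trend: error is essentially zero
```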
|Unigram Model||A unigram model used in information retrieval can be treated as the combination of several one-state finite automata. It splits the probabilities of different terms in a context. In this model, the probability of hitting each word depends only on that word itself, so the units are one-state finite automata. Each automaton has only one way to hit its single state, and that way is assigned one probability. Viewed as a whole model, the sum of all the one-state-hitting probabilities should be 1.|
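A minimal sketch of a maximum-likelihood unigram model for a single document, assuming whitespace tokenization and a made-up example sentence. Exact fractions make it easy to see that the term probabilities sum to 1 and that, because terms are independent, a query's probability is the product of its terms' probabilities.

```python
from collections import Counter
from fractions import Fraction


def unigram_model(tokens):
    """Maximum-likelihood unigram model: P(w) = count(w) / total tokens."""
    counts = Counter(tokens)
    total = len(tokens)
    return {w: Fraction(c, total) for w, c in counts.items()}


def query_probability(model, query_tokens):
    """Terms are independent one-state automata, so their probabilities multiply."""
    p = Fraction(1)
    for w in query_tokens:
        p *= model.get(w, Fraction(0))  # unseen terms get 0 without smoothing
    return p


doc = "the cat sat on the mat".split()
model = unigram_model(doc)
print(model["the"], sum(model.values()))         # 1/3 1
print(query_probability(model, ["the", "cat"]))  # 1/18
```

Retrieval systems built on this idea usually smooth the document model with collection statistics so that a single unseen query term does not force the probability to zero.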
|Unique Trait Combinations
|Unit of Analysis||One of the most important ideas in a research project is the unit of analysis. The unit of analysis is the major entity that you are analyzing in your study. For instance, any of the following could be a unit of analysis in a study:
• artifacts (books, photos, newspapers)
• geographical units (town, census tract, state)
• social interactions (dyadic relations, divorces, arrests)
Why is it called the ‘unit of analysis’ and not something else (like the unit of sampling)? Because it is the analysis you do in your study that determines what the unit is. For instance, if you are comparing the children in two classrooms on achievement test scores, the unit is the individual child because you have a score for each child. On the other hand, if you are comparing the two classes on classroom climate, your unit of analysis is the group, in this case the classroom, because you only have a classroom climate score for the class as a whole and not for each individual student. For different analyses in the same study you may have different units of analysis. If you decide to base an analysis on student scores, the individual is the unit. But you might decide to compare average classroom performance. In this case, since the data that goes into the analysis is the average itself (and not the individual students’ scores), the unit of analysis is actually the group. Even though you had data at the student level, you use aggregates in the analysis. In many areas of social research, these hierarchies of analysis units have become particularly important and have spawned a whole area of statistical analysis sometimes referred to as hierarchical modeling. This is true in education, for instance, where we often compare classroom performance but collect achievement data at the individual student level.
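The shift from individual to group as the unit happens at the aggregation step. A minimal sketch with hypothetical student scores: analysed directly, there are six observations (one per child); after averaging by classroom, there are only two observations (one per class).

```python
from collections import defaultdict
from statistics import mean

# Hypothetical student-level records: (classroom, achievement score).
scores = [("A", 78), ("A", 85), ("A", 77), ("B", 64), ("B", 71), ("B", 75)]

# Unit = individual child: one observation per student.
student_level = [s for _, s in scores]

# Unit = classroom: aggregate first, then analyse one value per class.
by_class = defaultdict(list)
for room, s in scores:
    by_class[room].append(s)
class_level = {room: mean(vals) for room, vals in by_class.items()}

print(len(student_level), class_level)  # 6 {'A': 80, 'B': 70}
```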
|Unit Root Processes||A unit root is a feature of some stochastic processes that evolve through time, and it can cause problems in statistical inference involving time series models. A linear stochastic process has a unit root if 1 is a root of the process’s characteristic equation. Such a process is non-stationary. If the other roots of the characteristic equation lie inside the unit circle (that is, have a modulus, or absolute value, less than one), then the first difference of the process will be stationary.
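As a worked example with hypothetical coefficients, take the AR(2) process y_t = 1.5 y_{t-1} - 0.5 y_{t-2} + e_t. Its characteristic equation is m² - 1.5m + 0.5 = (m - 1)(m - 0.5) = 0, so it has one unit root and one root, 0.5, inside the unit circle. Subtracting y_{t-1} from both sides gives Δy_t = 0.5 Δy_{t-1} + e_t, a stationary AR(1) in the first differences. The sketch below checks the roots numerically with `numpy.roots`; the helper names are illustrative.

```python
import numpy as np


def characteristic_roots(phi):
    """Roots of m^p - phi_1 m^(p-1) - ... - phi_p = 0 for the AR(p) model
    y_t = phi_1 y_{t-1} + ... + phi_p y_{t-p} + e_t."""
    return np.roots([1.0] + [-c for c in phi])


def has_unit_root(phi, tol=1e-8):
    """True if some characteristic root has modulus 1 (within `tol`)."""
    return bool(np.any(np.isclose(np.abs(characteristic_roots(phi)), 1.0, atol=tol)))


phi = [1.5, -0.5]  # y_t = 1.5 y_{t-1} - 0.5 y_{t-2} + e_t
print(np.sort(characteristic_roots(phi).real), has_unit_root(phi))
```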
|Universal Approximation Theorem||In the mathematical theory of artificial neural networks, the universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons (i.e., a multilayer perceptron) can approximate continuous functions on compact subsets of ℝⁿ, under mild assumptions on the activation function. The theorem thus states that simple neural networks can represent a wide variety of interesting functions when given appropriate parameters; it does not address the algorithmic learnability of those parameters. One of the first versions of the theorem was proved by George Cybenko in 1989 for sigmoid activation functions. Kurt Hornik showed in 1991 that it is not the specific choice of the activation function, but rather the multilayer feedforward architecture itself, that gives neural networks the potential of being universal approximators. In these results the output units are always assumed to be linear, and it suffices to consider a single output, since the multi-output case follows easily from the single-output case.|
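In the single-output case, the statement is commonly written as follows, in the style of Cybenko's formulation with an activation function σ: for every continuous function f on a compact set K ⊂ ℝⁿ and every ε > 0, there exist a number of hidden units N and parameters vᵢ, bᵢ ∈ ℝ and wᵢ ∈ ℝⁿ such that

```latex
F(x) = \sum_{i=1}^{N} v_i \, \sigma\!\left(w_i^{\top} x + b_i\right),
\qquad
\sup_{x \in K} \left| F(x) - f(x) \right| < \varepsilon .
```

The hidden layer computes σ(wᵢᵀx + bᵢ), and the linear output unit is the weighted sum with weights vᵢ. The theorem guarantees that such parameters exist; it says nothing about how to find them.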
|Universal Numeric Fingerprint
|Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within the file, and then computes a cryptographic hash of that normalized (or canonicalized) representation. The signature is thus independent of the storage format: for example, the same data object stored in SPSS and in Stata will have the same UNF.
A universal numeric fingerprint is used to guarantee that two digital objects (or parts thereof) in different formats represent the same intellectual object (or work). UNFs are formed by generating an approximation of the intellectual content of the object, putting this in a normalized form, and applying a cryptographic hash to produce a unique key. (Altman, et al. 2003)
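The approximate, normalize, then hash pipeline can be illustrated in a few lines. This is a simplified sketch, not the UNF algorithm: the actual UNF specification defines its own, more detailed normalization rules, whereas the rule here (round to 7 significant digits, write every number in one exponential notation, join with newlines, hash with SHA-256) is made up for illustration. The function name and data are hypothetical.

```python
import hashlib


def fingerprint(values, digits=7):
    """Illustrative content fingerprint, NOT the official UNF algorithm.

    Each number is rounded to `digits` significant digits and written in one
    canonical exponential notation, so 1, 1.0 and 1.0000000001 all normalise
    to the same text regardless of how a file format stored them.
    """
    canonical = "\n".join(format(float(v), f".{digits - 1}e") for v in values)
    return hashlib.sha256(canonical.encode("utf-8")).hexdigest()


from_spss = [1, 2.5, 3.14159265]    # hypothetical column read from one format
from_stata = [1.0, 2.5, 3.1415927]  # same data, slightly different storage
print(fingerprint(from_spss) == fingerprint(from_stata))  # True
```

Because the hash is taken over the normalized text rather than the file bytes, two files with different binary layouts but the same (approximated) content yield the same signature.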
|Unobserved Component Models
|A UCM decomposes the response series into components such as trend, seasons, cycles, and the regression effects due to predictor series.
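One common way to write such a decomposition additively, using this sketch's own notation (μ_t trend, γ_t seasonal component, ψ_t cycle, x_{jt} the j-th predictor series with regression coefficient β_j, and ε_t an irregular noise term):

```latex
y_t = \mu_t + \gamma_t + \psi_t + \sum_{j=1}^{m} \beta_j \, x_{jt} + \varepsilon_t
```

In unobserved component models the trend, seasonal and cycle terms are typically themselves modeled as stochastic processes (for example, a random-walk trend) rather than as fixed functions of time.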
|Unstructured Information Management Architecture
|UIMA stands for Unstructured Information Management Architecture. An OASIS standard as of March 2009, UIMA is to date the only industry standard for content analytics. Other general frameworks used for natural language processing include the General Architecture for Text Engineering (GATE) and the Natural Language Toolkit (NLTK).|
|Unum Number Format
|The unum (universal number) format is a floating point format proposed by John Gustafson as an alternative to the now ubiquitous IEEE 754 format. The proposal and justification are explained in his book The End of Error.
The two defining features of the original unum format (unum 2.0 differs) are:
• a variable-width storage format for both the significand and exponent, and
• a u-bit, which determines whether the unum corresponds to an exact number (u=0) or to an interval between consecutive exact unums (u=1). In this way, the unums cover the entire extended real number line.
For performing computation with the format, Gustafson proposes using interval arithmetic with a pair of unums, which he calls a ubound, providing the guarantee that the resulting interval contains the exact solution.
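The containment guarantee behind ubounds can be illustrated with ordinary interval arithmetic. The sketch below uses plain IEEE 754 doubles, not unums: every result's lower bound is rounded down and its upper bound rounded up by one unit in the last place (`math.nextafter`, Python 3.9+), so the exact real result always stays inside. The `Interval` class is hypothetical and ignores overflow and NaN handling.

```python
import math
from fractions import Fraction


class Interval:
    """Closed float interval [lo, hi] with outward rounding (not a unum)."""

    def __init__(self, lo, hi):
        self.lo, self.hi = lo, hi

    @classmethod
    def around(cls, x):
        # Any real number that rounds to the double x lies in this interval.
        return cls(math.nextafter(x, -math.inf), math.nextafter(x, math.inf))

    def __add__(self, other):
        # Round the lower bound down and the upper bound up by one ulp.
        return Interval(math.nextafter(self.lo + other.lo, -math.inf),
                        math.nextafter(self.hi + other.hi, math.inf))

    def __mul__(self, other):
        products = [a * b for a in (self.lo, self.hi) for b in (other.lo, other.hi)]
        return Interval(math.nextafter(min(products), -math.inf),
                        math.nextafter(max(products), math.inf))

    def contains(self, q):
        """Exact containment test for a rational number q."""
        return Fraction(self.lo) <= q <= Fraction(self.hi)

    def __repr__(self):
        return f"[{self.lo!r}, {self.hi!r}]"


s = Interval.around(0.1) + Interval.around(0.2)
print(s, s.contains(Fraction(3, 10)))  # the exact 3/10 is enclosed
```

The price of the guarantee is that intervals widen with every operation; unums aim to control that width with variable precision, which this double-based sketch does not attempt.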
Unum implementations have been explored in Julia, including unum 2.0 (or at least a modified version of Gustafson’s newer proposal). Recently, unums have also been explored in MATLAB.
The Unum Number Format: Mathematical Foundations, Implementation and Comparison to IEEE 754 Floating-Point Numbers
|Uplift Modeling||Uplift modelling, also known as incremental modelling, true lift modelling, or net modelling, is a predictive modelling technique that directly models the incremental impact of a treatment (such as a direct marketing action) on an individual’s behaviour. Uplift modelling has applications in customer relationship management for up-sell, cross-sell and retention modelling. It has also been applied to political elections and personalised medicine. Unlike the related Differential Prediction concept in psychology, Uplift Modelling assumes an active agent.|
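The incremental impact can be estimated as the difference between the response rate with treatment and the response rate without it. A minimal sketch of this "two-model" idea, assuming a randomised campaign and using a per-segment lookup table in place of each model; the segments, counts and function name are hypothetical. In practice each arm is usually modelled with a classifier over many customer features.

```python
from collections import defaultdict


def uplift_by_segment(records):
    """Estimate uplift per segment as P(response | treated) - P(response | control).

    `records` are (segment, treated, responded) tuples from a randomised campaign.
    """
    # counts[segment][treated] = [responses, total]
    counts = defaultdict(lambda: {True: [0, 0], False: [0, 0]})
    for segment, treated, responded in records:
        cell = counts[segment][bool(treated)]
        cell[0] += int(responded)
        cell[1] += 1
    uplift = {}
    for segment, groups in counts.items():
        (rt, nt), (rc, nc) = groups[True], groups[False]
        if nt and nc:  # both arms must be observed to estimate a difference
            uplift[segment] = rt / nt - rc / nc
    return uplift


records = (
    [("young", True, True)] * 30 + [("young", True, False)] * 70
    + [("young", False, True)] * 10 + [("young", False, False)] * 90
    + [("old", True, True)] * 20 + [("old", True, False)] * 80
    + [("old", False, True)] * 25 + [("old", False, False)] * 75
)
print(uplift_by_segment(records))
```

In this made-up data the "young" segment responds more when treated (uplift about +0.20), while the "old" segment responds less (about -0.05), so a campaign targeted by uplift would contact only the first group, even though both groups have sizeable raw response rates.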
|User Behavior Analytics
|User Behavior Analytics (UBA) is rocking this year’s security conferences. The discussion has shifted substantially away from trying to build an ever-stronger perimeter: security professionals are investing more resources than ever before into collecting and analyzing vast amounts of user-specific event and access logs, which hold the promise of major security benefits, including the opportunity to:
• Quickly identify anomalous user behaviors.
• Investigate a prioritized list of potential threats.
• Leverage machine learning techniques to isolate evolving threats.
• Minimize reliance on pre-defined rules or heuristics.
• Detect and respond to insider threats much faster.
The future of UBA is promising. However, with significant interest and hype surrounding the benefits of UBA for both enterprises and large organizations, how can someone begin to incorporate UBA into their existing security infrastructure?
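Identifying anomalous user behavior usually starts by comparing each user's current activity with that user's own historical baseline. A minimal sketch, assuming hypothetical daily event counts per user and a simple z-score rule with an arbitrary threshold of 3; real UBA tools use richer features and models, and the names here are illustrative.

```python
from statistics import mean, stdev


def flag_anomalies(history, today, threshold=3.0):
    """Flag users whose activity today deviates strongly from their own baseline.

    history: {user: [daily event counts]}; today: {user: count}.
    Returns users whose z-score against their own history exceeds `threshold`.
    """
    flagged = []
    for user, counts in history.items():
        if user not in today or len(counts) < 2:
            continue
        mu, sigma = mean(counts), stdev(counts)
        if sigma == 0:
            sigma = 1.0  # arbitrary floor so perfectly regular users do not divide by zero
        z = (today[user] - mu) / sigma
        if abs(z) > threshold:
            flagged.append(user)
    return flagged


history = {
    "alice": [12, 15, 11, 14, 13, 12, 15],
    "bob": [40, 38, 45, 41, 39, 44, 42],
}
today = {"alice": 95, "bob": 43}
print(flag_anomalies(history, today))  # ['alice']
```

A per-user baseline lets the same rule fit both a light user and a heavy user, which is what distinguishes this approach from a single global, pre-defined threshold.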
➚ “Behavioral Analytics”
|User Generated Content
|User-generated content (UGC) refers to a variety of media available in a range of modern communications technologies. UGC is often produced through open collaboration: it is created by goal-oriented yet loosely coordinated participants, who interact to create a product or service of economic value, which they make available to contributors and non-contributors alike. The term UGC is also used collectively for the data originating from Facebook, LinkedIn, Twitter, Instagram, YouTube, and many other networking sites: the social media content shared by users and the associated metadata.|