U*F Clustering In this paper, we propose a new clustering method consisting in automated “flood- fill segmentation” of the U*-matrix of a Self-Organizing Map after training. Using several artificial datasets as a benchmark, we find that the clustering results of our U*F method are good over a wide range of critical dataset types. Furthermore, comparison to standard clustering algorithms (K-means, single-linkage and Ward) directly applied on the same datasets show that each of the latter performs very bad on at least one kind of dataset, contrary to our U*F clustering method: while not always the best, U*F clustering has the great advantage of exhibiting consistently good results. Another advantage of U*F is that the computation cost of the SOM segmentation phase is negligible, contrary to other SOM-based clustering approaches which apply O(n2logn) standard clustering algorithms to the SOM prototypes. Finally, it should be emphasized that U*F clustering does not require a priori knowledge on the number of clusters, making it a real “cluster-mining” algorithm.
Ukkonen’s Algorithm In computer science, Ukkonen’s algorithm is a linear-time, online algorithm for constructing suffix trees, proposed by Esko Ukkonen in 1995. The algorithm begins with an implicit suffix tree containing the first character of the string. Then it steps through the string adding successive characters until the tree is complete. This order addition of characters gives Ukkonen’s algorithm its “on-line” property. The original algorithm presented by P. Weiner proceeded backward from the last character to the first one from the shortest to the longest suffix. A simpler algorithm was found by Edward M. McCreight, going from the longest to the shortest suffix. The naive implementation for generating a suffix tree going forward requires O(n2) or even O(n3) time complexity in big O notation, where n is the length of the string. By exploiting a number of algorithmic techniques, Ukkonen reduced this to O(n) (linear) time, for constant-size alphabets, and O(n log n) in general, matching the runtime performance of the earlier two algorithms.
Uncertainty in Artificial Intelligence
The Association for Uncertainty in Artificial Intelligence is a non-profit organization focused on organizing the annual Conference on Uncertainty in Artificial Intelligence (UAI) and, more generally, on promoting research in pursuit of advances in knowledge representation, learning and reasoning under uncertainty.
“Association for Uncertainty in Artificial Intelligence”
Uncertainty Quantification
Uncertainty quantification (UQ) is the science of quantitative characterization and reduction of uncertainties in both computational and real world applications. It tries to determine how likely certain outcomes are if some aspects of the system are not exactly known. An example would be to predict the acceleration of a human body in a head-on crash with another car: even if we exactly knew the speed, small differences in the manufacturing of individual cars, how tightly every bolt has been tightened, etc., will lead to different results that can only be predicted in a statistical sense. Many problems in the natural sciences and engineering are also rife with sources of uncertainty. Computer experiments on computer simulations are the most common approach to study problems in uncertainty quantification.
Book: Handbook of Uncertainty Quantification
Unconditional Maximum Likelihood Estimation
Unconstrained Optimization Unconstrained Optimization works, in general, by doing a search, starting at some initial values and taking steps that decrease (or for FindMaximum, increase) an objective or merit function.
Underfitting Underfitting occurs when a statistical model or machine learning algorithm cannot capture the underlying trend of the data. Intuitively, underfitting occurs when the model or the algorithm does not fit the data well enough. Specifically, underfitting occurs if the model or algorithm shows low variance but high bias. Underfitting is often a result of an excessively simple model. Both overfitting and underfitting lead to poor predictions on new data sets.
Unigram Model A unigram model used in information retrieval can be treated as the combination of several one-state finite automata. It splits the probabilities of different terms in a context. In this model, the probability to hit each word all depends on its own, so we only have one-state finite automata as units. For each automaton, we only have one way to hit its only state, assigned with one probability. Viewing from the whole model, the sum of all the one-state-hitting probabilities should be 1.
Unique Trait Combinations
Unit of Analysis One of the most important ideas in a research project is the unit of analysis. The unit of analysis is the major entity that you are analyzing in your study. For instance, any of the following could be a unit of analysis in a study:
• individuals
• groups
• artifacts (books, photos, newspapers)
• geographical units (town, census tract, state)
• social interactions (dyadic relations, divorces, arrests)
Why is it called the ‘unit of analysis’ and not something else (like, the unit of sampling)? Because it is the analysis you do in your study that determines what the unit is. For instance, if you are comparing the children in two classrooms on achievement test scores, the unit is the individual child because you have a score for each child. On the other hand, if you are comparing the two classes on classroom climate, your unit of analysis is the group, in this case the classroom, because you only have a classroom climate score for the class as a whole and not for each individual student. For different analyses in the same study you may have different units of analysis. If you decide to base an analysis on student scores, the individual is the unit. But you might decide to compare average classroom performance. In this case, since the data that goes into the analysis is the average itself (and not the individuals’ scores) the unit of analysis is actually the group. Even though you had data at the student level, you use aggregates in the analysis. In many areas of social research these hierarchies of analysis units have become particularly important and have spawned a whole area of statistical analysis sometimes referred to as hierarchical modeling. This is true in education, for instance, where we often compare classroom performance but collected achievement data at the individual student level.
Unit Root Processes A unit root is a feature of processes that evolves through time that can cause problems in statistical inference involving time series models. A linear stochastic process has a unit root if 1 is a root of the process’s characteristic equation. Such a process is non-stationary. If the other roots of the characteristic equation lie inside the unit circle—that is, have a modulus (absolute value) less than one—then the first difference of the process will be stationary.
Unitary Recurrent Neural Network
Universal Approximation Theorem In the mathematical theory of artificial neural networks, the universal approximation theorem states that a feed-forward network with a single hidden layer containing a finite number of neurons (i.e., a multilayer perceptron), can approximate continuous functions on compact subsets of Rn, under mild assumptions on the activation function. The theorem thus states that simple neural networks can represent a wide variety of interesting functions when given appropriate parameters; it does not touch upon the algorithmic learnability of those parameters. One of the first versions of the theorem was proved by George Cybenko in 1989 for sigmoid activation functions. Kurt Hornik showed in 1991 that it is not the specific choice of the activation function, but rather the multilayer feedforward architecture itself which gives neural networks the potential of being universal approximators. The output units are always assumed to be linear. For notational convenience, only the single output case will be shown. The general case can easily be deduced from the single output case.
Universal Denoising Network We design a novel network architecture for learning discriminative image models that are employed to efficiently tackle the problem of grayscale and color image denoising. Based on the proposed architecture, we introduce two different variants. The first network involves convolutional layers as a core component, while the second one relies instead on non-local filtering layers and thus it is able to exploit the inherent non-local self-similarity property of natural images. As opposed to most of the existing neural networks, which require the training of a specific model for each considered noise level, the proposed networks are able to handle a wide range of different noise levels, while they are very robust when the noise degrading the latent image does not match the statistics of the one used during training. The latter argument is supported by results that we report on publicly available images corrupted by unknown noise and which we compare against solutions obtained by alternative state-of-the-art methods. At the same time the introduced networks achieve excellent results under additive white Gaussian noise (AWGN), which are comparable to the current state-of-the-art network, while they depend on a more shallow architecture with the number of trained parameters being one order of magnitude smaller. These properties make the proposed networks ideal candidates to serve as sub-solvers on restoration methods that deal with general inverse imaging problems such as deblurring, demosaicking, superresolution, etc.
Universal Numeric Fingerprint
Universal Numerical Fingerprint (UNF) is a unique signature of the semantic content of a digital object. It is not simply a checksum of a binary data file. Instead, the UNF algorithm approximates and normalizes the data stored within. A cryptographic hash of that normalized (or canonicalized) representation is then computed. The signature is thus independent of the storage format. E.g., the same data object stored in, say, SPSS and Stata, will have the same UNF.
A universal numeric fingerprint is used to guarantee that a two digital objects (or parts thereof) in different formats represent the same intellectual object (or work). UNFs are formed by generating an approximation of the intellectual content of the object, putting this in a normalized form, and applying a cryptographic hash to produce a unique key. (Altman, et al. 2003)
Unobserved Component Models
A UCM decomposes the response series into components such as trend, seasons, cycles, and the regression effects due to predictor series.
Unstructured Information Management Architecture
UIMA stands for Unstructured Information Management Architecture. An OASIS standard as of March 2009, UIMA is to date the only industry standard for content analytics. Other general frameworks used for natural language processing include the General Architecture for Text Engineering (GATE) and the Natural Language Toolkit (NLTK).
Unsupervised Generative Adversarial Cross-Modal Hashing
Cross-modal hashing aims to map heterogeneous multimedia data into a common Hamming space, which can realize fast and flexible retrieval across different modalities. Unsupervised cross-modal hashing is more flexible and applicable than supervised methods, since no intensive labeling work is involved. However, existing unsupervised methods learn hashing functions by preserving inter and intra correlations, while ignoring the underlying manifold structure across different modalities, which is extremely helpful to capture meaningful nearest neighbors of different modalities for cross-modal retrieval. To address the above problem, in this paper we propose an Unsupervised Generative Adversarial Cross-modal Hashing approach (UGACH), which makes full use of GAN’s ability for unsupervised representation learning to exploit the underlying manifold structure of cross-modal data. The main contributions can be summarized as follows: (1) We propose a generative adversarial network to model cross-modal hashing in an unsupervised fashion. In the proposed UGACH, given a data of one modality, the generative model tries to fit the distribution over the manifold structure, and select informative data of another modality to challenge the discriminative model. The discriminative model learns to distinguish the generated data and the true positive data sampled from correlation graph to achieve better retrieval accuracy. These two models are trained in an adversarial way to improve each other and promote hashing function learning. (2) We propose a correlation graph based approach to capture the underlying manifold structure across different modalities, so that data of different modalities but within the same manifold can have smaller Hamming distance and promote retrieval accuracy. Extensive experiments compared with 6 state-of-the-art methods verify the effectiveness of our proposed approach.
Unum Number Format
The unum (universal number) format is a floating point format proposed by John Gustafson as an alternative to the now ubiquitous IEEE 754 format. The proposal and justification are explained in his book The End of Error.
The two defining features of the unum format (while unum 2.0 is different) are:
• a variable-width storage format for both the significand and exponent, and
• an u-bit, which determines whether the unum corresponds to an exact number (u=0), or an interval between consecutive exact unums (u=1). In this way, the unums cover the entire extended real number line .
For performing computation with the format, Gustafson proposes using interval arithmetic with a pair of unums, what he calls an ubound, providing the guarantee that the resulting interval contains the exact solution.
Unum implementations have been explored in Julia. including unum 2.0 (or at least a modified version of his new proposal). Recently, unum has been explored in MATLAB.
The Unum Number Format: Mathematical Foundations, Implementation and Comparison to IEEE 754 Floating-Point Numbers
Uplift Modeling Uplift modelling, also known as incremental modelling, true lift modelling, or net modelling is a predictive modelling technique that directly models the incremental impact of a treatment (such as a direct marketing action) on an individual’s behaviour. Uplift modelling has applications in customer relationship management for up-sell, cross-sell and retention modelling. It has also been applied to political election and personalised medicine. Unlike the related Differential Prediction concept in psychology, Uplift Modelling assumes an active agent.
uSer and Agent Model IntegrAtion
Task-oriented dialogue systems can efficiently serve a large number of customers and relieve people from tedious works. However, existing task-oriented dialogue systems depend on handcrafted actions and states or extra semantic labels, which sometimes degrades user experience despite the intensive human intervention. Moreover, current user simulators have limited expressive ability so that deep reinforcement Seq2Seq models have to rely on selfplay and only work in some special cases. To address those problems, we propose a uSer and Agent Model IntegrAtion (SAMIA) framework inspired by an observation that the roles of the user and agent models are asymmetric. Firstly, this SAMIA framework model the user model as a Seq2Seq learning problem instead of ranking or designing rules. Then the built user model is used as a leverage to train the agent model by deep reinforcement learning. In the test phase, the output of the agent model is filtered by the user model to enhance the stability and robustness. Experiments on a real-world coffee ordering dataset verify the effectiveness of the proposed SAMIA framework.
User Behavior Analytics
User Behavior Analytics (UBA) is rocking this year’s security conferences. Rather than trying to build an ever stronger perimeter, the discussion has changed substantially. Security professionals are investing more resources than ever before into collecting and analyzing vast amounts of user-specific event and access logs which holds the promise of major security benefits including the opportunity to:
• Quickly identify anomalous user behaviors.
• Investigate a prioritized list of potential threats.
• Leverage machine learning techniques to isolate evolving threats.
• Minimize reliance on pre-defined rules or heuristics.
• Detect and respond to Insider Threats much faster.
The future of UBA is promising, however, with significant interest and hype surrounding the benefits of UBA for both enterprises and large organizations, how can someone begin to incorporate UBA into their existing security infrastructure?
“Behavioral Analytics”
User Generated Content
User-generated content (UGC) refers to a variety of media available in a range of modern communications technologies. UGC is often produced through open collaboration: it is created by goal-oriented yet loosely coordinated participants, who interact to create a product or service of economic value, which they make available to contributors and non-contributors alike. User generated content (UGC) is collectively known as data originating from Facebook, LinkedIn, Twitter, Instagram, YouTube, and many other networking sites, the social media shared by users and the associated metadata.
User-Centric Composable Services Machine Learning (ML) techniques, such as Neural Network, are widely used in today’s applications. However, there is still a big gap between the current ML systems and users’ requirements. ML systems focus on the improve the performance of models in training, while individual users cares more about response time and expressiveness of the tool. Many existing research and product begin to move computation towards edge devices. Based on the numerical computing system Owl, we propose to build the Zoo system to support construction, compose, and deployment of ML models on edge and local devices.