|H2O||The Open Source In-Memory Prediction Engine for Big Data Science. H2O is an awesome machine learning framework. It is really great for data scientists and business analysts ‘who need scalable and fast machine learning’. H2O is completely open source and what makes it important is that works right of the box. There seems to be no easier way to start with scalable machine learning. It hast support for R, Python, Scala, Java and also has a REST API and a own WebUI. So you can use it perfectly for research but also in production environments. H2O is based on Apache Hadoop and Apache Spark which gives it enormous power with in-memory parallel processing.
Predict Social Network Influence with R and H2O Ensemble Learning
|Half-Life of Data||Radioactive substances have a half life. The half life is the amount of time it takes for the substance to lose half of its radioactivity. Half life is used more generally in physics as a way to estimate the rate of decay. We can apply exactly the same principle – the rate of decay – to business information. Like natural materials, data is subject to deterioration over time. In science, the half life of a given substance could be milliseconds. It could be many thousands of years. The half life of data has been measured, and it may be shorter than you were expecting.
|Hamiltonian Monte Carlo
|The random-walk behavior of many Markov Chain Monte Carlo (MCMC) algorithms makes Markov chain convergence to a target stationary distribution p(x) inefficient, resulting in slow mixing. Hamiltonian/Hybrid Monte Carlo (HMC), is a MCMC method that adopts physical system dynamics rather than a probability distribution to propose future states in the Markov chain. This allows the Markov chain to explore the target distribution much more efficiently, resulting in faster convergence. Here we introduce basic analytic and numerical concepts for simulation of Hamiltonian dynamics. We then show how Hamiltonian dynamics can be used as the Markov chain proposal function for an MCMC sampling algorithm (HMC).
➘ “Hybrid Monte Carlo”
MCMC using Hamiltonian Dynamics
|Hamming Distance||In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. In another way, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other.|
|HANA Data Scientist Tool||The Application Function Modeler 2.0 (AFM 2) is a graphical editor for complex data analysis pipelines in the HANA Studio. This tool is based on the HANA Data Scientist prototype developed at the HANA Platform Innovation Center in Potsdam, Germany. It is planned to be the next generation of the existing HANA Studio Application Function Modeler which was developed at the TIP CE&SP Algorithm Labs in Shanghai, China. The AFM 2 team consists of original and new developers from both locations.|
|HANA Graph Engine||The HANA Graph Engine implements graph data processing capabilities directly inside the Column Store Engine of the SAP HANA Database.|
|HANA Sizing||Check the HANA sizing overview to find the appropriate sizing method.|
|HANA Social Media Integration
|HANA-SMI is a reusable component on HANA XS that enables XS application developers the integration of social media providers (with an initial focus on SAP Jam) into their business application.|
|Well, really more of a 4-letter acronym, but a powerful advantage of DaaS is the ability to source hard-to-find data that has been aggregated from hundreds of Big Data sources. These data sets are highly targeted and go well beyond third party lists.|
|Harmony Search Algorithm
|In computer science and operations research, harmony search (HS) is a phenomenon-mimicking algorithm (also known as metaheuristic algorithm, soft computing algorithm or evolutionary algorithm) inspired by the improvisation process of musicians proposed by Zong Woo Geem in 2001. In the HS algorithm, each musician (= decision variable) plays (= generates) a note (= a value) for finding a best harmony (= global optimum) all together. Proponents claim the following merits:
• HS does not require differential gradients, thus it can consider discontinuous functions as well as continuous functions.
• HS can handle discrete variables as well as continuous variables.
• HS does not require initial value setting for the variables.
• HS is free from divergence.
• HS may escape local optima.
• HS may overcome the drawback of GA’s building block theory which works well only if the relationship among variables in a chromosome is carefully considered. If neighbor variables in a chromosome have weaker relationship than remote variables, building block theory may not work well because of crossover operation. However, HS explicitly considers the relationship using ensemble operation.
• HS has a novel stochastic derivative applied to discrete variables, which uses musician’s experiences as a searching direction.
• Certain HS variants do not require algorithm parameters such as HMCR and PAR, thus novice users can easily use the algorithm.
Harmony Search Algorithm
Harmony Search Algorithm
|Harvest Classification Algorithm||A tree model will often provide good prediction relative to other methods. It is also relatively interpretable, which is key, since it is of interest to identify diverse chemical classes amongst the active compounds, to serve as leads for drug optimization. Interpretability of a tree is often reduced, however, by the sheer size and number of variables involved. We develop a ‘tree harvesting’ algorithm to reduce the complexity of the tree.
|Hash2Vec||In this paper we propose the application of feature hashing to create word embeddings for natural language processing. Feature hashing has been used successfully to create document vectors in related tasks like document classification. In this work we show that feature hashing can be applied to obtain word embeddings in linear time with the size of the data. The results show that this algorithm, that does not need training, is able to capture the semantic meaning of words. We compare the results against GloVe showing that they are similar. As far as we know this is the first application of feature hashing to the word embeddings problem and the results indicate this is a scalable technique with practical results for NLP applications.|
|HashNet||Learning to hash has been widely applied to approximate nearest neighbor search for large-scale multimedia retrieval, due to its computation efficiency and retrieval quality. Deep learning to hash, which improves retrieval quality by end-to-end representation learning and hash encoding, has received increasing attention recently. Subject to the vanishing gradient difficulty in the optimization with binary activations, existing deep learning to hash methods need to first learn continuous representations and then generate binary hash codes in a separated binarization step, which suffer from substantial loss of retrieval quality. This paper presents HashNet, a novel deep architecture for deep learning to hash by continuation method, which learns exactly binary hash codes from imbalanced similarity data where the number of similar pairs is much smaller than the number of dissimilar pairs. The key idea is to attack the vanishing gradient problem in optimizing deep networks with non-smooth binary activations by continuation method, in which we begin from learning an easier network with smoothed activation function and let it evolve during the training, until it eventually goes back to being the original, difficult to optimize, deep network with the sign activation function. Comprehensive empirical evidence shows that HashNet can generate exactly binary hash codes and yield state-of-the-art multimedia retrieval performance on standard benchmarks.|
|Hawkes Graph||This paper introduces the Hawkes skeleton and the Hawkes graph. These notions summarize the branching structure of a multivariate Hawkes point process in a compact and fertile way. In particular, we explain how the graph view is useful for the specification and estimation of Hawkes models from large, multitype event streams. Based on earlier work, we give a nonparametric statistical procedure to estimate the Hawkes skeleton and the Hawkes graph from data. We show how the graph estimation may then be used for choosing and fitting parametric Hawkes models. Our method avoids the a priori assumptions on the model from a straighforward MLE-approach and it is numerically more flexible than the latter. A simulation study confirms that the presented procedure works as desired. We give special attention to computational issues in the implementation. This makes our results applicable to high-dimensional event-stream data, such as dozens of event streams and thousands of events per component.|
|Hazard Function||The hazard function (also known as the failure rate, hazard rate, or force of mortality) h(x) is the ratio of the probability density function P(x) to the survival function S(x), given by h(x) = P(x)/S(x) = P(x)/(1 – D(x)), where D(x) is the distribution function.|
|Hazard Ratio||In survival analysis, the hazard ratio (HR) is the ratio of the hazard rates corresponding to the conditions described by two levels of an explanatory variable. For example, in a drug study, the treated population may die at twice the rate per unit time as the control population. The hazard ratio would be 2, indicating higher hazard of death from the treatment. Or in another study, men receiving the same treatment may suffer a certain complication ten times more frequently per unit time than women, giving a hazard ratio of 10. Hazard ratios differ from relative risks in that the latter are cumulative over an entire study, using a defined endpoint, while the former represent instantaneous risk over the study time period, or some subset thereof. Hazard ratios suffer somewhat less from selection bias with respect to the endpoints chosen and can indicate risks that happen before the endpoint.|
|Hazelcast||Hazelcast, a leading open source in-memory data grid (IMDG) with hundreds of thousands of installed clusters and over 17 million server starts per month, launched Hazelcast Jet – a distributed processing engine for big data streams. With Hazelcast’s IMDG providing storage functionality, Hazelcast Jet is a new Apache 2 licensed open source project that performs parallel execution to enable data-intensive applications to operate in near real-time. Using directed acyclic graphs (DAG) to model relationships between individual steps in the data processing pipeline, Hazelcast Jet is simple to deploy and can execute both batch and stream-based data processing applications. Hazelcast Jet is appropriate for applications that require a near real-time experience such as sensor updates in IoT architectures (house thermostats, lighting systems), in-store e-commerce systems and social media platforms.|
|HDIdx||Fast Nearest Neighbor (NN) search is a fundamental challenge in large-scale data processing and analytics, particularly for analyzing multimedia contents which are often of high dimensionality. Instead of using exact NN search, extensive research efforts have been focusing on approximate NN search algorithms. In this work, we present ‘HDIdx’, an efficient high-dimensional indexing library for fast approximate NN search, which is open-source and written in Python. It offers a family of state-of-the-art algorithms that convert input high-dimensional vectors into compact binary codes, making them very efficient and scalable for NN search with very low space complexity.|
|Heckman Correction||The Heckman correction (the two-stage method, Heckman’s lambda or the Heckit method, Heckman Model) is any of a number of related statistical methods developed by James Heckman at the University of Chicago in 1976 to 1979 which allow the researcher to correct for selection bias. Selection bias problems are endemic to applied econometric problems, which make Heckman’s original technique, and subsequent refinements by both himself and others, indispensable to applied econometricians. Heckman received the Economics Nobel Prize in 2000 for this achievement.
|Hedonic Regression||In economics, hedonic regression or hedonic demand theory is a revealed preference method of estimating demand or value. It decomposes the item being researched into its constituent characteristics, and obtains estimates of the contributory value of each characteristic. This requires that the composite good being valued can be reduced to its constituent parts and that the market values those constituent parts. Hedonic models are most commonly estimated using regression analysis, although more generalized models, such as sales adjustment grids, are special cases of hedonic models. An attribute vector, which may be a dummy or panel variable, is assigned to each characteristic or group of characteristics. Hedonic models can accommodate non-linearity, variable interaction, or other complex valuation situations. Hedonic models are commonly used in real estate appraisal, real estate economics, and Consumer Price Index (CPI) calculations. In CPI calculations hedonic regression is used to control the effect of changes in product quality. Price changes that are due to substitution effects are subject to hedonic quality adjustments.|
|Hellinger Distance||In probability and statistics, the Hellinger distance (also called Bhattacharyya distance as this was originally introduced by Anil Kumar Bhattacharya) is used to quantify the similarity between two probability distributions. It is a type of f-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909.|
|Hessian Approximated Multiple Subsets Iteration
|We propose HAMSI, a provably convergent incremental algorithm for solving large-scale partially separable optimization problems that frequently emerge in machine learning and inferential statistics. The algorithm is based on a local quadratic approximation and hence allows incorporating a second order curvature information to speed-up the convergence. Furthermore, HAMSI needs almost no tuning, and it is scalable as well as easily parallelizable. In large-scale simulation studies with the MovieLens datasets, we illustrate that the method is superior to a state-of-the-art distributed stochastic gradient descent method in terms of convergence behavior. This performance gain comes at the expense of using memory that scales only linearly with the total size of the optimization variables. We conclude that HAMSI may be considered as a viable alternative in many scenarios, where first order methods based on variants of stochastic gradient descent are applicable.|
|Heterogeneous Incremental Nearest Class Mean Random Forest
|In recent years, dynamically growing data and incrementally growing number of classes pose new challenges to large-scale data classification research. Most traditional methods struggle to balance the precision and computational burden when data and its number of classes increased. However, some methods are with weak precision, and the others are time-consuming. In this paper, we propose an incremental learning method, namely, heterogeneous incremental Nearest Class Mean Random Forest (hi-RF), to handle this issue. It is a heterogeneous method that either replaces trees or updates trees leaves in the random forest adaptively, to reduce the computational time in comparable performance, when data of new classes arrive. Specifically, to keep the accuracy, one proportion of trees are replaced by new NCM decision trees; to reduce the computational load, the rest trees are updated their leaves probabilities only. Most of all, out-of-bag estimation and out-of-bag boosting are proposed to balance the accuracy and the computational efficiency. Fair experiments were conducted and demonstrated its comparable precision with much less computational time.|
|Heteroscedasticity||In statistics, a collection of random variables is heteroscedastic if there are sub-populations that have different variabilities from others. Here “variability” could be quantified by the variance or any other measure of statistical dispersion. Thus heteroscedasticity is the absence of homoscedasticity.|
|Hidden Factor Graph Models
|Hidden Factor graph models generalise Hidden Markov Models to tree structured data. The distinctive feature of ‘treeHFM’ is that it learns a transition matrix for first order (sequential) and for second order (splitting) events. It can be applied to all discrete and continuous data that is structured as a binary tree. In the case of continuous observations, ‘treeHFM’ has Gaussian distributions as emissions.
|Hidden Markov Model
|Hidden Markov Models (HMMs) are powerful, flexible methods for representing and classifying data with trends over time. A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. A HMM can be considered the simplest dynamic Bayesian network. The mathematics behind the HMM was developed by L. E. Baum and coworkers. It is closely related to an earlier work on optimal nonlinear filtering problem (stochastic processes) by Ruslan L. Stratonovich, who was the first to describe the forward-backward procedure.
In simpler Markov models (like a Markov chain), the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but output, dependent on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states. Note that the adjective ‘hidden’ refers to the state sequence through which the model passes, not to the parameters of the model; the model is still referred to as a ‘hidden’ Markov model even if these parameters are known exactly.
Hidden Markov models are especially known for their application in temporal pattern recognition such as speech, handwriting, gesture recognition, part-of-speech tagging, musical score following, partial discharges and bioinformatics.
|Hierarchical Clustering||In data mining, hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types:
1. Agglomerative: This is a “bottom up” approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
2. Divisive: This is a “top down” approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering are usually presented in a dendrogram.
|Hierarchical Clustering and Topic Modeling based on Fast Rank-2 NMF
|The importance of unsupervised clustering and topic modeling is well recognized with ever-increasing volumes of text data. In this paper, we propose a fast method for hierarchical clustering and topic modeling called HierNMF2. Our method is based on fast Rank-2 nonnegative matrix factorization (NMF) that performs binary clustering and an efficient node splitting rule. Further utilizing the final leaf nodes generated in HierNMF2 and the idea of nonnegative least squares fitting, we propose a new clustering/topic modeling method called FlatNMF2 that recovers a flat clustering/topic modeling result in a very simple yet significantly more effective way than any other existing methods. We describe highly optimized open source software in C++ for both HierNMF2 and FlatNMF2 for hierarchical and partitional clustering/topic modeling of document data sets. Substantial experimental tests are presented that illustrate significant improvements both in computational time as well as quality of solutions. We compare our methods to other clustering methods including K-means, standard NMF, and CLUTO, and also topic modeling methods including latent Dirichlet allocation (LDA) and recently proposed algorithms for NMF with separability constraints. Overall, we present efficient tools for analyzing large-scale data sets, and techniques that can be generalized to many other data analytics problem domains.|
|Hierarchical Compositional Network
|We introduce the hierarchical compositional network (HCN), a directed generative model able to discover and disentangle, without supervision, the building blocks of a set of binary images. The building blocks are binary features defined hierarchically as a composition of some of the features in the layer immediately below, arranged in a particular manner. At a high level, HCN is similar to a sigmoid belief network with pooling. Inference and learning in HCN are very challenging and existing variational approximations do not work satisfactorily. A main contribution of this work is to show that both can be addressed using max-product message passing (MPMP) with a particular schedule (no EM required). Also, using MPMP as an inference engine for HCN makes new tasks simple: adding supervision information, classifying images, or performing inpainting all correspond to clamping some variables of the model to their known values and running MPMP on the rest. When used for classification, fast inference with HCN has exactly the same functional form as a convolutional neural network (CNN) with linear activations and binary weights. However, HCN’s features are qualitatively very different.|
|Hierarchical Configuration Model||We introduce a class of random graphs with a hierarchical community structure, which we call the hierarchical configuration model. On the inter-community level, the graph is a configuration model, and on the intra-community level, every vertex in the configuration model is replaced by a community: a small graph. These communities may have any shape, as long as they are connected. For these hierarchical graphs, we find the size of the largest component, the degree distribution and the clustering coefficient. Furthermore, we determine the conditions under which a giant percolation cluster exists, and find its size.|
|Hierarchical Data Format
|Hierarchical Data Format (HDF, HDF4, or HDF5) is a set of file formats and libraries designed to store and organize large amounts of numerical data. Originally developed at the National Center for Supercomputing Applications, it is supported by the non-profit HDF Group, whose mission is to ensure continued development of HDF5 technologies, and the continued accessibility of data stored in HDF.|
|Hierarchical Inference Testing
|Hierarchical Kernel Learning
|Hierarchical Latent Dirichlet Allocation
|An extension to LDA is the hierarchical LDA (hLDA), where topics are joined together in a hierarchy by using the nested Chinese restaurant process.
|Hierarchical Latent Space Network Model
|Hierarchical Latent Tree Analysis
|In the LDA approach to topic detection, a topic is determined by identifying the words that are used with high frequency when writing about the topic. However, high frequency words in one topic may be also used with high frequency in other topics. Thus they may not be the best words to characterize the topic. In this paper, we propose a new method for topic detection, where a topic is determined by identifying words that appear with high frequency in the topic and low frequency in other topics. We model patterns of word cooccurrence and co-occurrences of those patterns using a hierarchy of discrete latent variables. The states of the latent variables represent clusters of documents and they are interpreted as topics. The words that best distinguish a cluster from other clusters are selected to characterize the topic. Empirical results show that the new method yields topics with clearer thematic characterizations than the alternative approaches. In this work, we introduce semantically higher level latent variables to model co-occurrence of those patterns, resulting in hierarchical latent tree models (HLTMs). The latent variables at higher levels of the hierarchy correspond to more general topics, while the latent variables at lower levels correspond to more specific topics. The proposed method for topic detection is therefore called hierarchical latent tree analysis (HLTA).|
|Hierarchical Latent Tree Model
|Hierarchical Mode Association Clustering / Mode Association Clustering
|Mode association clustering (MAC) can be conducted either hierarchically or at one level. MAC is similar to mixture model based clustering in the sense of characterizing clusters by smooth densities. However, MAC requires no model fitting and uses a nonparametric kernel density estimation. The density of a cluster is not restricted to be parametric, for instance, Gaussian, but ensures uni-modality. The algorithm seems to combine the complementary merits of bottom-up clustering such as linkage and topdown clustering such as mixture modeling and k-means. It also tends to be robust against non-Gaussian shaped clusters.|
|Hierarchical Model||There isn’t a single authorative definition of a hierarchical model. Click for an overview.|
|Hierarchical Multinomial Marginal Models
|In the log-linear parametrization all the interactions are contrasts of logarithms of joint probabilities and this is the main reason why this parametrization is not convenient to express hypotheses on marginal distributions or to model ordered categorical data. On the contrary Hierarchical Multinomial Marginal models (HMM) (Bartolucci et al. 2007) are based on parameters, called generalized marginal interactions, which are contrasts of logarithms of sums of probabilities. HMM models allow great flexibility in choosing the marginal distributions, within which the interactions are defined, and they are a useful tool for modeling marginal distributions and for taking into proper account the presence of ordinal categorical variables.
|Hierarchical Nearest Neighbor Descent
|Hierarchical Network||A hierarchical network is the type of network topology in which a central “root” node (the top level of the hierarchy) is connected to one or more other nodes that are one level lower in the hierarchy (i.e., the second level) with a point-to-point link between each of the second level nodes and the top level central “root” node, while each of the second level nodes that are connected to the top level central “root” node will also have one or more other nodes that are one level lower in the hierarchy (i.e., the third level) connected to it, also with a point-to-point link, the top level central “root” node being the only node that has no other node above it in the hierarchy.|
|Hierarchical Network Model
|Hierarchical network models are iterative algorithms for creating networks which are able to reproduce the unique properties of the scale-free topology and the high clustering of the nodes at the same time. These characteristics are widely observed in nature, from biology to language to some social networks.|
|Hierarchical Spectral Merger
|We present a new method for time series clustering which we call the Hierarchical Spectral Merger (HSM) method. This procedure is based on the spectral theory of time series and identifies series that share similar oscillations or waveforms. The extent of similarity between a pair of time series is measured using the total variation distance between their estimated spectral densities. At each step of the algorithm, every time two clusters merge, a new spectral density is estimated using the whole information present in both clusters, which is representative of all the series in the new cluster. The method is implemented in an R package HSMClust. We present two applications of the HSM method, one to data coming from wave-height measurements in oceanography and the other to electroencefalogram (EEG) data.|
|Hierarchical Time Series / Grouped Time Series
|Time series can often be naturally disaggregated in a hierarchical structure using attributes such as geographical location, product type, etc. For example, the total number of bicycles sold by a cycling warehouse can be disaggregated into a hierarchy of bicycle types. Such a warehouse will sell road bikes, mountain bikes, children bikes or hybrids. Each of these can be disaggregated into finer categories. Children’s bikes can be divided into balance bikes for children under 4 years old, single speed bikes for children between 4 and 6 and bikes for children over the age of 6. Hybrid bikes can be divided into city, commuting, comfort, and trekking bikes; and so on. Such disaggregation imposes a hierarchical structure. We refer to these as hierarchical time series.
|Hierarchical Topic Models|
|Hierarchically Supervised Latent Dirichlet Allocation
|We introduce hierarchically supervised latent Dirichlet allocation (HSLDA), a model for hierarchically and multiply labeled bag-of-word data. Examples of such data include web pages and their placement in directories, product descriptions and associated categories from product hierarchies, and free-text clinical records and their assigned diagnosis codes. Out-of-sample label prediction is the primary goal of this work, but improved lower-dimensional representations of the bag-of-word data are also of interest. We demonstrate HSLDA on large-scale data from clinical document labeling and retail product categorization tasks. We show that leveraging the structure from hierarchical labels improves out-of-sample label prediction substantially when compared to models that do not.|
|High Dimensional Data Clustering
|Clustering in high-dimensional spaces is a recurrent problem in many domains, for example in object recognition. High-dimensional data usually live in different lowdimensional subspaces hidden in the original space. HDDC is a clustering approach which estimates the specific subspace and the intrinsic dimension of each class. The approach adapts the Gaussian mixture model framework to high-dimensional data and estimates the parameters which best fit the data. This results in a robust clustering method called High- Dimensional Data Clustering (HDDC). HDDC is applied to locate objects in natural images in a probabilistic framework. Experiments on a recently proposed database demonstrate the effectiveness of our clustering method for category localization.|
|High Frequency Trading
|High-frequency trading (HFT) is a primary form of algorithmic trading in finance. Specifically, it is the use of sophisticated technological tools and computer algorithms to rapidly trade securities. HFT uses proprietary trading strategies carried out by computers to move in and out of positions in seconds or fractions of a second. It is estimated that as of 2009, HFT accounted for 60-73% of all US equity trading volume, with that number falling to approximately 50% in 2012. High-frequency traders move in and out of short-term positions at high volumes aiming to capture sometimes a fraction of a cent in profit on every trade. HFT firms do not consume significant amounts of capital, accumulate positions or hold their portfolios overnight. As a result, HFT has a potential Sharpe ratio (a measure of risk and reward) tens of times higher than traditional buy-and-hold strategies. High-frequency traders typically compete against other HFTs, rather than long-term investors. HFT firms make up the low margins with incredible high volumes of tradings, frequently numbering in the millions. It has been argued that a core incentive in much of the technological development behind high-frequency trading is essentially front running, in which the varying delays in the propagation of orders is taken advantage of by those who have earlier access to information. A substantial body of research argues that HFT and electronic trading pose new types of challenges to the financial system. Algorithmic and high-frequency traders were both found to have contributed to volatility in the May 6, 2010 Flash Crash, when high-frequency liquidity providers rapidly withdrew from the market. Several European countries have proposed curtailing or banning HFT due to concerns about volatility. Other complaints against HFT include the argument that some HFT firms scrape profits from investors when index funds rebalance their portfolios. Other financial analysts point to evidence of benefits that HFT has brought to the modern markets. Researchers have stated that HFT and automated markets improve market liquidity, reduce trading costs, and make stock prices more efficient.|
|High Performance Analytics Toolkit
|Big data analytics requires high programmer productivity and high performance simultaneously on large-scale clusters. However, current big data analytics frameworks (e.g. Apache Spark) have high runtime overheads since they are library-based. Given the characteristics of the data analytics domain, we introduce the High Performance Analytics Toolkit (HPAT), which is a big data analytics framework that performs static compilation of high-level scripting programs into high performance parallel code using novel domainspecific compilation techniques. HPAT provides scripting abstractions in the Julia language for analytics tasks, automatically parallelizes them, generates efficient MPI/C++ code, and provides resiliency. Since HPAT is compilerbased, it avoids overheads of library-based systems such as dynamic task scheduling and master-executor coordination. In addition, it provides automatic optimizations for scripting programs, such as fusion of array operations. Therefore, HPAT is 14x to 400x faster than Spark on the Cori supercomputer at LBL/NERSC. Furthermore, HPAT is much more flexible in distributed data structures, which enables the use of existing libraries such as HDF5, ScaLAPACK, and Intel R DAAL.|
|High Performance Computing
|High Performance Computing most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business. A supercomputer is a computer with a very high-level computational capacity. As of 2015, there are supercomputers which could perform up-to quadrillions of floating point operations per second.
|Highest Density Regions
|Highest Posterior Density
|Highest Posterior Density – The x% highest posterior density interval is the shortest interval in parameter space that contains x% of the posterior probability.|
|Hill Climbing||In computer science, hill climbing is a mathematical optimization technique which belongs to the family of local search. It is an iterative algorithm that starts with an arbitrary solution to a problem, then attempts to find a better solution by incrementally changing a single element of the solution. If the change produces a better solution, an incremental change is made to the new solution, repeating until no further improvements can be found. For example, hill climbing can be applied to the travelling salesman problem. It is easy to find an initial solution that visits all the cities but will be very poor compared to the optimal solution. The algorithm starts with such a solution and makes small improvements to it, such as switching the order in which two cities are visited. Eventually, a much shorter route is likely to be obtained. Hill climbing is good for finding a local optimum (a solution that cannot be improved by considering a neighbouring configuration) but it is not necessarily guaranteed to find the best possible solution (the global optimum) out of all possible solutions (the search space). In convex problems, hill-climbing is optimal. Examples of algorithms that solve convex problems by hill-climbing include the simplex algorithm for linear programming and binary search. The characteristic that only local optima are guaranteed can be cured by using restarts (repeated local search), or more complex schemes based on iterations, like iterated local search, on memory, like reactive search optimization and tabu search, or memory-less stochastic modifications, like simulated annealing. The relative simplicity of the algorithm makes it a popular first choice amongst optimizing algorithms. It is used widely in artificial intelligence, for reaching a goal state from a starting node. Choice of next node and starting node can be varied to give a list of related algorithms. Although more advanced algorithms such as simulated annealing or tabu search may give better results, in some situations hill climbing works just as well. Hill climbing can often produce a better result than other algorithms when the amount of time available to perform a search is limited, such as with real-time systems. It is an anytime algorithm: it can return a valid solution even if it’s interrupted at any time before it ends.|
|Hindcasting||In oceanography and meteorology, backtesting is also known as hindcasting: a hindcast is a way of testing a mathematical model; known or closely estimated inputs for past events are entered into the model to see how well the output matches the known results. Hindcasting usually refers to a numerical model integration of a historical period where no observations have been assimilated. This distinguishes a hindcast run from a reanalysis. Oceanographic observations of salinity and temperature as well as observations of surface wave parameters such as the significant wave height are much scarcer than meteorological observations, making hindcasting more common in oceanography than in meteorology. Also, since surface waves represent a forced system where the wind is the only generating force, wave hindcasting is often considered adequate for generating a reasonable representation of the wave climate with little need for a full reanalysis. Hindcasting is also used in hydrology for model stream flows.|
|Histogram of Oriented Gradients
|Histogram of Oriented Gradients (HOG) are feature descriptors used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image. This method is similar to that of edge orientation histograms, scale-invariant feature transform descriptors, and shape contexts, but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy. Navneet Dalal and Bill Triggs, researchers for the French National Institute for Research in Computer Science and Control (INRIA), first described Histogram of Oriented Gradient descriptors in their June 2005 CVPR paper. In this work they focused their algorithm on the problem of pedestrian detection in static images, although since then they expanded their tests to include human detection in film and video, as well as to a variety of common animals and vehicles in static imagery.|
|Hitting Time||In the study of stochastic processes in mathematics, a hitting time (or first hit time) is the first time at which a given process “hits” a given subset of the state space. Exit times and return times are also examples of hitting times.|
|Hive Plot||The hive plot is a rational visualization method for drawing networks. Nodes are mapped to and positioned on radially distributed linear axes – this mapping is based on network structural properties. Edges are drawn as curved links. Simple and interpretable. The purpose of the hive plot is to establish a new baseline for visualization of large networks – a method that is both general and tunable and useful as a starting point in visually exploring network structure.|
|The Hodrick-Prescott filter (also known as Hodrick-Prescott decomposition) is a mathematical tool used in macroeconomics, especially in real business cycle theory, to remove the cyclical component of a time series from raw data. It is used to obtain a smoothed-curve representation of a time series, one that is more sensitive to long-term than to short-term fluctuations. The adjustment of the sensitivity of the trend to short-term fluctuations is achieved by modifying a multiplier \lambda. The filter was popularized in the field of economics in the 1990s by economists Robert J. Hodrick and Nobel Memorial Prize winner Edward C. Prescott. However, it was first proposed much earlier by E. T. Whittaker in 1923.
The H-P Filter and Unit Roots
|A Hoeffding tree (VFDT) is an incremental, anytime decision tree induction algorithm that is capable of learning from massive data streams, assuming that the distribution generating examples does not change over time. Hoeffding trees exploit the fact that a small sample can often be enough to choose an optimal splitting attribute. This idea is supported mathematically by the Hoeffding bound, which quantifies the number of observations (in our case, examples) needed to estimate some statistics within a prescribed precision (in our case, the goodness of an attribute). A theoretically appealing feature of Hoeffding Trees not shared by otherincremental decision tree learners is that it has sound guarantees of performance. Using the Hoeffding bound one can show that its output is asymptotically nearly identical to that of a non-incremental learner using infinitely many examples. For more information see: Geoff Hulten, Laurie Spencer, Pedro Domingos: Mining time-changing data streams. In: ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 97-106, 2001.|
|Hogwild!||Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently pro- posed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show using novel theoretical analysis, algorithms, and im- plementation that SGD can be implemented without any locking. We present an update scheme called Hogwild! which allows processors access to shared memory with the possibility of over- writing each other’s work.|
|Hollow Heap||We introduce the hollow heap, a very simple data structure with the same amortized efficiency as the classical Fibonacci heap. All heap operations except delete and delete-min take $O(1)$ time, worst case as well as amortized; delete and delete-min take $O(\log n)$ amortized time on a heap of $n$ items. Hollow heaps are by far the simplest structure to achieve this. Hollow heaps combine two novel ideas: the use of lazy deletion and re-insertion to do decrease-key operations, and the use of a dag (directed acyclic graph) instead of a tree or set of trees to represent a heap. Lazy deletion produces hollow nodes (nodes without items), giving the data structure its name.|
|Holonomic Gradient Method
|The holonomic gradient method introduced by Nakayama et al. (2011) presents a new methodology for evaluating normalizing constants of probability distributions and for obtaining the maximum likelihood estimate of a statistical model. The method utilizes partial differential equations satisfied by the normalizing constant and is based on the Grobner basis theory for the ring of differential operators. In this talk we give an introduction to this new methodology. The method has already proved to be useful for problems in directional statistics and in classical multivariate distribution theory involving hypergeometric functions of matrix arguments.
|Holt-Winters double exponential smoothing||This method is used when the data shows a trend. Exponential smoothing with a trend works much like simple smoothing except that two components must be updated each period – level and trend. The level is a smoothed estimate of the value of the data at the end of each period. The trend is a smoothed estimate of average growth at the end of each period.
|Holt (1957) and Winters (1960) extended Holt’s method to capture seasonality. The Holt-Winters seasonal method comprises the forecast equation and three smoothing equations – one for the level ℓ t , one for trend b t , and one for the seasonal component denoted by s t, with smoothing parameters α , β ∗ and γ. We use m to denote the period of the seasonality, i.e., the number of seasons in a year. For example, for quarterly data m=4 , and for monthly data m=12. There are two variations to this method that differ in the nature of the seasonal component. The additive method is preferred when the seasonal variations are roughly constant through the series, while the multiplicative method is preferred when the seasonal variations are changing proportional to the level of the series. With the additive method, the seasonal component is expressed in absolute terms in the scale of the observed series, and in the level equation the series is seasonally adjusted by subtracting the seasonal component. Within each year the seasonal component will add up to approximately zero. With the multiplicative method, the seasonal component is expressed in relative terms (percentages) and the series is seasonally adjusted by dividing through by the seasonal component. Within each year, the seasonal component will sum up to approximately m.|
|Homoscedasticity||In statistics, a sequence or a vector of random variables is homoscedastic if all random variables in the sequence or vector have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity. The spellings homoskedasticity and heteroskedasticity are also frequently used.|
|Hopfield Network||A Hopfield network is a form of recurrent artificial neural network invented by John Hopfield in 1982. Hopfield nets serve as content-addressable memory systems with binary threshold nodes. They are guaranteed to converge to a local minimum, but convergence to a false pattern (wrong local minimum) rather than the stored pattern (expected local minimum) can occur. Hopfield networks also provide a model for understanding human memory.|
|HopsFS||Recent improvements in both the performance and scalability of shared-nothing, transactional, in-memory NewSQL databases have reopened the research question of whether distributed metadata for hierarchical file systems can be managed using commodity databases. In this paper, we introduce HopsFS, a next generation distribution of the Hadoop Distributed File System (HDFS) that replaces HDFS single node in-memory metadata service, with a distributed metadata service built on a NewSQL database. By removing the metadata bottleneck, HopsFS improves capacity and throughput compared to HDFS. HopsFS can store 24 times more metadata than HDFS. We also provide public, fully reproducible experiments based on a workload trace from Spotify that show HopsFS has 2.6 times the throughput of Apache HDFS, lower latency for greater than 400 concurrent clients, and no downtime during failover. Finally, and most significantly, HopsFS allows metadata to be exported to external systems, analyzed or searched online, and easily extended.|
|Horn||I introduce a new distributed system for effective training and regularizing of Large-Scale Neural Networks on distributed computing architectures. The experiments demonstrate the effectiveness of flexible model partitioning and parallelization strategies based on neuron-centric computation model, with an implementation of the collective and parallel dropout neural networks training. Experiments are performed on MNIST handwritten digits classification including results.|
|HorseRule||The HorseRule model is a flexible tree based Bayesian regression method for linear and nonlinear regression and classification described in Nalenz & Villani (2017) <arXiv:1702.05008>.
|Horseshoe Estimator||This paper proposes a new approach to sparse-signal detection called the horseshoe estimator. We show that the horseshoe is a close cousin of the lasso in that it arises from the same class of multivariate scale mixtures of normals, but that it is almost universally superior to the double-exponential prior at handling sparsity. A theoretical framework is proposed for understanding why the horseshoe is a better default “sparsity” estimator than those that arise from powered-exponential priors. Comprehensive numerical evidence is presented to show that the difference in performance can often be large. Most importantly, we show that the horseshoe estimator corresponds quite closely to the answers one would get if one pursued a full Bayesian model-averaging approach using a “two-groups” model: a point mass at zero for noise, and a continuous density for signals. Surprisingly, this correspondence holds both for the estimator itself and for the classification rule induced by a simple threshold applied to the estimator. We show how the resulting thresholded horseshoe can also be viewed as a novel Bayes multiple-testing procedure.
|Horseshoe Regularization||Feature subset selection arises in many high-dimensional applications in machine learning and statistics, such as compressed sensing and genomics. The $\ell_0$ penalty is ideal for this task, the caveat being it requires the NP-hard combinatorial evaluation of all models. A recent area of considerable interest is to develop efficient algorithms to fit models with a non-convex $\ell_\gamma$ penalty for $\gamma\in (0,1)$, which results in sparser models than the convex $\ell_1$ or lasso penalty, but is harder to fit. We propose an alternative, termed the horseshoe regularization penalty for feature subset selection, and demonstrate its theoretical and computational advantages. The distinguishing feature from existing non-convex optimization approaches is a full probabilistic representation of the penalty as the negative of the logarithm of a suitable prior, which in turn enables an efficient expectation-maximization algorithm for optimization and MCMC for uncertainty quantification. In synthetic and real data, the resulting algorithm provides better statistical performance, and the computation requires a fraction of time of state of the art non-convex solvers.|
|Hot Deck Imputation||This method sorts respondents and non-respondents into a number of imputation subsets according to a user-specified set of covariates. An imputation subset comprises cases with the same values as those of the user-specified covariates. Missing values are then replaced with values taken from matching respondents (i.e. respondents that are similar with respect to the covariates). If there is more than one matching respondent for any particular non-respondent, the user has two choices:
1. The first respondent’s value as counted from the missing entry downwards within the imputation subset is used to impute. The reason for this is that the first respondent’s value may be closer in time to the case that has the missing value. For example, if cases are entered according to the order in which they occur, there may possibly be some type of time effect in some studies.
2. A respondent’s value is randomly selected from within the imputation subset. If a matching respondent does not exist in the initial imputation class, the subset will be collapsed by one level starting with the last variable that was selected as a sort variable, or until a match can be found. Note that if no matching respondent is found, even after all of the sort variables have been collapsed, three options are available:
1. Re-specify new sort variables: The user can specify up to five sort variables.
2. Perform random overall imputation: Where the missing value will be replaced with a value randomly selected from the observed values in that variable.
3. Do not impute the missing value: SOLAS will not impute any missing values for which no matching respondent is found.
|Hot Spot Analysis||Also known as Getis-Ord Gi* – The resultant z-scores and p-values tell you where features with either high or low values cluster spatially. This tool works by looking at each feature within the context of neighboring features. A feature with a high value is interesting by may not be a statistically significant hot spot. To be a statistically significant hotspot, a feature will have a high value and be surrounded by other features with high values as well. The local sum for a feature and its neighbors is compared proportionally to the sum of all features; when the local sum is very different from the expected local sum, and that difference is too large to be the result of random choice, a statistically significant z-score results. The Gi* statistic returned for each feature in the dataset is a z-score. For statistically significant positive z-scores, the larger the z-score is, the more intense clustering of high values (hot spot). For statistically significant negative z-scores, the smaller the z-score is, the more intense the clustering of low values (cold spot). When to use: Results aren’t reliable with less than 30 features. Applications can be found in crime analysis, epidemiology, voting pattern analysis, economic geography, retail analysis, traffic incident analysis, and demographics. Examples: Where is the disease outbreak concentrated? – Where are kitchen fires a larger than expected proportion of all residential fires? – Where should the evacuation sites be located? – Where/When do peak intensities occur?
How Hot Spot Analysis works
|Houdini||Generating adversarial examples is a critical step for evaluating and improving the robustness of learning machines. So far, most existing methods only work for classification and are not designed to alter the true performance measure of the problem at hand. We introduce a novel flexible approach named Houdini for generating adversarial examples specifically tailored for the final performance measure of the task considered, be it combinatorial and non-decomposable. We successfully apply Houdini to a range of applications such as speech recognition, pose estimation and semantic segmentation. In all cases, the attacks based on Houdini achieve higher success rate than those based on the traditional surrogates used to train the models while using a less perceptible adversarial perturbation.|
|Huber Loss||In statistics, the Huber loss is a loss function used in robust regression, that is less sensitive to outliers in data than the squared error loss. A variant for classification is also sometimes used.
|Hubs and Authorities||➘ “Hyperlink-Induced Topic Search”|
|Human Group Optimizer
|A large number of optimization algorithms have been developed by researchers to solve a variety of complex problems in operations management area. We present a novel optimization algorithm belonging to the class of swarm intelligence optimization methods. The algorithm mimics the decision making process of human groups and exploits the dynamics of this process as an optimization tool for combinatorial problems. In order to achieve this aim, a continuous-time Markov process is proposed to describe the behavior of a population of socially interacting agents, modelling how humans in a group modify their opinions driven by self-interest and consensus seeking. As in the case of a collection of spins, the dynamics of such a system is characterized by a phase transition from low to high values of the overall consenus (magnetization). We recognize this phase transition as being associated with the emergence of a collective superior intelligence of the population. While this state being active, a cooling schedule is applied to make agents closer and closer to the optimal solution, while performing their random walk on the fitness landscape. A comparison with simulated annealing as well as with a multi-agent version of the simulated annealing is presented in terms of efficacy in finding good solution on a NK – Kauffman landscape. In all cases our method outperforms the others, particularly in presence of limited knowledge of the agent.|
|Hy||Hy is a Lisp dialect that converts its structure into Python’s abstract syntax tree. It is to Python what LFE is to Erlang.This provides developers from many backgrounds with the following:
• A lisp that feels very Pythonic
• A great way to use Lisp’s crazy powers but in the wide world of Python’s libraries
• A great way to start exploring Lisp, from the comfort of python
• A pleasant language that has a lot of neat ideas 🙂
|Hybrid||We study the problem of personalized, interactive tag recommendation for Flickr: While a user enters/selects new tags for a particular picture, the system suggests related tags to her, based on the tags that she or other people have used in the past along with (some of) the tags already entered. The suggested tags are dynamically updated with every additional tag entered/selected. We describe a new algorithm, called Hybrid, which can be applied to this problem, and show that it outperforms previous algorithms. It has only a single tunable parameter, which we found to be very robust.|
|Hybrid Ant Colony Optimization Algorithm
|In this paper, we propose a Hybrid Ant Colony Optimization algorithm (HACO) for Next Release Problem (NRP). NRP, a NP-hard problem in requirement engineering, is to balance customer requests, resource constraints, and requirement dependencies by requirement selection. Inspired by the successes of Ant Colony Optimization algorithms (ACO) for solving NP-hard problems, we design our HACO to approximately solve NRP. Similar to traditional ACO algorithms, multiple artificial ants are employed to construct new solutions. During the solution construction phase, both pheromone trails and neighborhood information will be taken to determine the choices of every ant. In addition, a local search (first found hill climbing) is incorporated into HACO to improve the solution quality. Extensively wide experiments on typical NRP test instances show that HACO outperforms the existing algorithms (GRASP and simulated annealing) in terms of both solution uality and running time.|
|Hybrid Intelligent System||Hybrid intelligent system denotes a software system which employs, in parallel, a combination of methods and techniques from artificial intelligence subfields as:
• Neuro-fuzzy systems
• hybrid connectionist-symbolic models
• Fuzzy expert systems
• Connectionist expert systems
• Evolutionary neural networks
• Genetic fuzzy systems
• Rough fuzzy hybridization
• Reinforcement learning with fuzzy, neural, or evolutionary methods as well as symbolic reasoning methods.
From the cognitive science perspective, every natural intelligent system is hybrid because it performs mental operations on both the symbolic and subsymbolic levels. For the past few years there has been an increasing discussion of the importance of A.I. Systems Integration. Based on notions that there have already been created simple and specific AI systems (such as systems for computer vision, speech synthesis, etc., or software that employs some of the models mentioned above) and now is the time for integration to create broad AI systems. Proponents of this approach are researchers such as Marvin Minsky, Ron Sun, Aaron Sloman, and Michael A. Arbib. An example hybrid is a hierarchical control system in which the lowest, reactive layers are sub-symbolic. The higher layers, having relaxed time constraints, are capable of reasoning from an abstract world model and performing planning. Intelligent systems usually rely on hybrid reasoning systems, which include induction, deduction, abduction and reasoning by analogy.
|Hybrid Monte Carlo||In mathematics and physics, the hybrid Monte Carlo algorithm, also known as Hamiltonian Monte Carlo, is a Markov chain Monte Carlo method for obtaining a sequence of random samples from a probability distribution for which direct sampling is difficult. This sequence can be used to approximate the distribution (i.e., to generate a histogram), or to compute an integral (such as an expected value). It differs from the Metropolis-Hastings algorithm by reducing the correlation between successive sampled states by using a Hamiltonian evolution between states and additionally by targeting states with a higher acceptance criteria than the observed probability distribution. This causes it to converge more quickly to the absolute probability distribution. It was devised by Simon Duane, A.D. Kennedy, Brian Pendleton and Duncan Roweth in 1987.
➚ “Hamiltonian Monte Carlo”
|Hybrid Transactional / Analytical Processing
|Hybrid Transactional/Analytical Processing (HTAP) is a term used to describe the capability of a single database that can perform both online transaction processing (OLTP) and online analytical processing (OLAP) for the purpose of real-time operational intelligence processing. The term was created by Gartner, Inc., a technology research firm.|
|Hyperdata||Hyperdata indicates data objects linked to other data objects in other places, as hypertext indicates text linked to other text in other places. Hyperdata enables formation of a web of data, evolving from the “data on the Web” that is not inter-related (or at least, not linked).
In the same way that hypertext usually refers to the World Wide Web but is a broader term, hyperdata usually refers to the Semantic Web, but may also be applied more broadly to other data-linking technologies such as Microformats – including XHTML Friends Network.
|Hyper-Heuristics||A hyper-heuristic is a heuristic search method that seeks to automate, often by the incorporation of machine learning techniques, the process of selecting, combining, generating or adapting several simpler heuristics (or components of such heuristics) to efficiently solve computational search problems. One of the motivations for studying hyper-heuristics is to build systems which can handle classes of problems rather than solving just one problem. There might be multiple heuristics from which one can choose for solving a problem, and each heuristic has its own strength and weakness. The idea is to automatically devise algorithms by combining the strength and compensating for the weakness of known heuristics. In a typical hyper-heuristic framework there is a high-level methodology and a set of low-level heuristics (either constructive or perturbative heuristics). Given a problem instance, the high-level method selects which low-level heuristic should be applied at any given time, depending upon the current problem state, or search stage.|
|Hyperlink-Induced Topic Search
|Hyperlink-Induced Topic Search (HITS; also known as hubs and authorities) is a link analysis algorithm that rates Web pages, developed by Jon Kleinberg. The idea behind Hubs and Authorities stemmed from a particular insight into the creation of web pages when the Internet was originally forming; that is, certain web pages, known as hubs, served as large directories that were not actually authoritative in the information that it held, but were used as compilations of a broad catalog of information that led users directly to other authoritative pages. In other words, a good hub represented a page that pointed to many other pages, and a good authority represented a page that was linked by many different hubs. The scheme therefore assigns two scores for each page: its authority, which estimates the value of the content of the page, and its hub value, which estimates the value of its links to other pages.
Network Analysis for Wikipedia
HITS Algorithm – Hubs and Authorities on the Internet
|HyperLogLog||HyperLogLog is an algorithm for the count-distinct problem, approximating the number of distinct elements in a multiset (the cardinality). Calculating the exact cardinality of a multiset requires an amount of memory proportional to the cardinality, which is impractical for very large data sets. Probabilistic cardinality estimators, such as the HyperLogLog algorithm, use significantly less memory than this, at the cost of obtaining only an approximation of the cardinality. The HyperLogLog algorithm is able to estimate cardinalities of with a typical accuracy of 2%, using 1.5kB of memory. HyperLogLog is an extension of the earlier LogLog algorithm.|
|Hyperparameter||In Bayesian statistics, a hyperparameter is a parameter of a prior distribution; the term is used to distinguish them from parameters of the model for the underlying system under analysis. For example, if one is using a beta distribution to model the distribution of the parameter p of a Bernoulli distribution, then:
• p is a parameter of the underlying system (Bernoulli distribution), and
• alpha and beta are parameters of the prior distribution (beta distribution), hence hyperparameters
One may take a single value for a given hyperparameter, or one can iterate and take a probability distribution on the hyperparameter itself, called a hyperprior.
State of Hyperparameter Selection
|Hyperparameter Optimization||In the context of machine learning, hyperparameter optimization or model selection is the problem of choosing a set of hyperparameters for a learning algorithm, usually with the goal of obtaining good generalization. Hyperparameter optimization contrasts with actual learning problems, which are also often cast as optimization problems, but optimize a loss function on the training set alone. In effect, learning algorithms learn parameters that model/reconstruct their inputs well, while hyperparameter optimization is to ensure the model does not overfit its data by tuning, e.g., regularization.|
|Hypervariate Data||Hypervariate data is Data with four or more dimensions in the dataset.|