H2O The Open Source In-Memory Prediction Engine for Big Data Science. H2O is a machine learning framework aimed at data scientists and business analysts ‘who need scalable and fast machine learning’. H2O is completely open source, and what makes it important is that it works right out of the box. There seems to be no easier way to start with scalable machine learning. It has support for R, Python, Scala and Java, and also has a REST API and its own web UI, so you can use it for research as well as in production environments. H2O is based on Apache Hadoop and Apache Spark, which gives it enormous power with in-memory parallel processing. Predict Social Network Influence with R and H2O Ensemble Learning Half-Life of Data Radioactive substances have a half life. The half life is the amount of time it takes for the substance to lose half of its radioactivity. Half life is used more generally in physics as a way to estimate the rate of decay. We can apply exactly the same principle – the rate of decay – to business information. Like natural materials, data is subject to deterioration over time. In science, the half life of a given substance could be milliseconds. It could be many thousands of years. The half life of data has been measured, and it may be shorter than you were expecting. http://…/infographics-the-half-life-of-data Hamiltonian Flow Monte Carlo(HFMC) Hamiltonian Monte Carlo(HMC) The random-walk behavior of many Markov Chain Monte Carlo (MCMC) algorithms makes Markov chain convergence to a target stationary distribution p(x) inefficient, resulting in slow mixing. Hamiltonian/Hybrid Monte Carlo (HMC) is an MCMC method that adopts physical system dynamics rather than a probability distribution to propose future states in the Markov chain. This allows the Markov chain to explore the target distribution much more efficiently, resulting in faster convergence.
Here we introduce basic analytic and numerical concepts for simulation of Hamiltonian dynamics. We then show how Hamiltonian dynamics can be used as the Markov chain proposal function for an MCMC sampling algorithm (HMC). ➘ “Hybrid Monte Carlo” MCMC using Hamiltonian Dynamics Hamiltonian Variational Auto-Encoder(HVAE) Variational Auto-Encoders (VAEs) have become very popular techniques to perform inference and learning in latent variable models as they allow us to leverage the rich representational power of neural networks to obtain flexible approximations of the posterior of latent variables as well as tight evidence lower bounds (ELBOs). Combined with stochastic variational inference, this provides a methodology scaling to large datasets. However, for this methodology to be practically efficient, it is necessary to obtain low-variance unbiased estimators of the ELBO and its gradients with respect to the parameters of interest. While the use of Markov chain Monte Carlo (MCMC) techniques such as Hamiltonian Monte Carlo (HMC) has been previously suggested to achieve this [23, 26], the proposed methods require specifying reverse kernels which have a large impact on performance. Additionally, the resulting unbiased estimator of the ELBO for most MCMC kernels is typically not amenable to the reparameterization trick. We show here how to optimally select reverse kernels in this setting and, by building upon Hamiltonian Importance Sampling (HIS) [17], we obtain a scheme that provides low-variance unbiased estimators of the ELBO and its gradients using the reparameterization trick. This allows us to develop a Hamiltonian Variational Auto-Encoder (HVAE). This method can be reinterpreted as a target-informed normalizing flow [20] which, within our context, only requires a few evaluations of the gradient of the sampled likelihood and trivial Jacobian calculations at each iteration. 
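To make the Hamiltonian Monte Carlo entry above more concrete, here is a minimal sketch in Python of the two ingredients it names: numerical simulation of Hamiltonian dynamics (done here with the leapfrog integrator) and the use of that simulation as a Markov chain proposal. The one-dimensional standard normal target, the unit-mass Gaussian momentum, the default step size and trajectory length, and all function names are assumptions chosen for illustration; they are not taken from the referenced tutorials.

```python
import math
import random


def neg_log_density(q):
    """Potential energy U(q) for an assumed standard normal target."""
    return 0.5 * q * q


def grad_neg_log_density(q):
    """Gradient dU/dq of the assumed potential."""
    return q


def leapfrog(q, p, grad_u, step_size, n_steps):
    """Simulate Hamiltonian dynamics for H(q, p) = U(q) + p^2 / 2."""
    p = p - 0.5 * step_size * grad_u(q)        # initial half step for momentum
    for i in range(n_steps):
        q = q + step_size * p                  # full step for position
        if i != n_steps - 1:
            p = p - step_size * grad_u(q)      # full step for momentum
    p = p - 0.5 * step_size * grad_u(q)        # final half step for momentum
    return q, p


def hmc_sample(u, grad_u, q0, n_samples, step_size=0.2, n_steps=10, seed=0):
    """Draw samples using leapfrog trajectories as Metropolis proposals."""
    rng = random.Random(seed)
    samples, q = [], q0
    for _ in range(n_samples):
        p = rng.gauss(0.0, 1.0)                # resample the momentum
        q_new, p_new = leapfrog(q, p, grad_u, step_size, n_steps)
        h_old = u(q) + 0.5 * p * p
        h_new = u(q_new) + 0.5 * p_new * p_new
        # Accept with probability min(1, exp(H_old - H_new)).
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            q = q_new
        samples.append(q)
    return samples


if __name__ == "__main__":
    draws = hmc_sample(neg_log_density, grad_neg_log_density, 0.0, 2000)
    mean = sum(draws) / len(draws)
    print("sample mean:", mean)
```

Because the leapfrog integrator nearly conserves the Hamiltonian, most proposals are accepted even though they land far from the current state, which is the source of the faster mixing mentioned above.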
Hamilton-Jacobi Reachability Analysis(HJRA) Hamilton-Jacobi (HJ) reachability analysis is an important formal verification method for guaranteeing performance and safety properties of dynamical systems; it has been applied to many small-scale systems in the past decade. Its advantages include compatibility with general nonlinear system dynamics, formal treatment of bounded disturbances, and the availability of well-developed numerical tools. The main challenge is addressing its exponential computational complexity with respect to the number of state variables. In this tutorial, we present an overview of basic HJ reachability theory and provide instructions for using the most recent numerical tools, including an efficient GPU-parallelized implementation of a Level Set Toolbox for computing reachable sets. In addition, we review some of the current work in high-dimensional HJ reachability to show how the dimensionality challenge can be alleviated via various general theoretical and application-specific insights. Hamming Distance In information theory, the Hamming distance between two strings of equal length is the number of positions at which the corresponding symbols are different. Put another way, it measures the minimum number of substitutions required to change one string into the other, or the minimum number of errors that could have transformed one string into the other. HANA HANA Data Scientist Tool The Application Function Modeler 2.0 (AFM 2) is a graphical editor for complex data analysis pipelines in the HANA Studio. This tool is based on the HANA Data Scientist prototype developed at the HANA Platform Innovation Center in Potsdam, Germany. It is planned to be the next generation of the existing HANA Studio Application Function Modeler which was developed at the TIP CE&SP Algorithm Labs in Shanghai, China. The AFM 2 team consists of original and new developers from both locations.
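Returning to the Hamming Distance entry above, the definition translates almost directly into code. The following Python sketch counts the positions at which two equal-length sequences differ; the function name `hamming_distance` and the example strings are illustrative choices, not part of the original entry.

```python
def hamming_distance(a, b):
    """Number of positions at which two equal-length sequences differ."""
    if len(a) != len(b):
        raise ValueError("Hamming distance requires sequences of equal length")
    return sum(x != y for x, y in zip(a, b))


if __name__ == "__main__":
    print(hamming_distance("karolin", "kathrin"))   # 3
    print(hamming_distance("1011101", "1001001"))   # 2
```

The length check matters: the distance is only defined for strings of equal length, so the sketch raises an error rather than silently comparing a prefix.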
HANA Graph Engine The HANA Graph Engine implements graph data processing capabilities directly inside the Column Store Engine of the SAP HANA Database. HANA Sizing Check the HANA sizing overview to find the appropriate sizing method. HANA Social Media Integration(HANA-SMI) HANA-SMI is a reusable component on HANA XS that enables XS application developers to integrate social media providers (with an initial focus on SAP Jam) into their business applications. Handsontable Handsontable is a data grid component with an Excel-like appearance. Built in JavaScript, it integrates with any data source with peak efficiency. It comes with powerful features like data validation, sorting, grouping, data binding, formula support and column ordering. Built and actively supported by the Handsoncode team and the GitHub community, distributed free under the MIT license. rhandsontable Hard-to-Find-Data(HTFD) Well, really more of a 4-letter acronym, but a powerful advantage of DaaS is the ability to source hard-to-find data that has been aggregated from hundreds of Big Data sources. These data sets are highly targeted and go well beyond third party lists. Harmony Search Algorithm(HSA) In computer science and operations research, harmony search (HS) is a phenomenon-mimicking algorithm (also known as metaheuristic algorithm, soft computing algorithm or evolutionary algorithm) inspired by the improvisation process of musicians, proposed by Zong Woo Geem in 2001. In the HS algorithm, each musician (= decision variable) plays (= generates) a note (= a value) for finding a best harmony (= global optimum) all together. Proponents claim the following merits:
· HS does not require differential gradients, thus it can consider discontinuous functions as well as continuous functions.
· HS can handle discrete variables as well as continuous variables.
· HS does not require initial value setting for the variables.
· HS is free from divergence.
· HS may escape local optima.
· HS may overcome the drawback of GA’s building block theory, which works well only if the relationship among variables in a chromosome is carefully considered. If neighbor variables in a chromosome have a weaker relationship than remote variables, building block theory may not work well because of the crossover operation. However, HS explicitly considers the relationship using the ensemble operation.
· HS has a novel stochastic derivative applied to discrete variables, which uses musicians’ experiences as a searching direction.
· Certain HS variants do not require algorithm parameters such as HMCR and PAR, thus novice users can easily use the algorithm.
Harmony Search Algorithm HARVEST Algorithm Feature selection with high-dimensional data and a very small proportion of relevant features poses a severe challenge to standard statistical methods. We have developed a new approach (HARVEST) that is straightforward to apply, albeit somewhat computer-intensive. This algorithm can be used to pre-screen a large number of features to identify those that are potentially useful. The basic idea is to evaluate each feature in the context of many random subsets of other features. HARVEST is predicated on the assumption that an irrelevant feature can add no real predictive value, regardless of which other features are included in the subset. Motivated by this idea, we have derived a simple statistical test for feature relevance. Empirical analyses and simulations produced so far indicate that the HARVEST algorithm is highly effective in predictive analytics, both in science and business. Harvest Classification Algorithm A tree model will often provide good prediction relative to other methods. It is also relatively interpretable, which is key, since it is of interest to identify diverse chemical classes amongst the active compounds, to serve as leads for drug optimization.
Interpretability of a tree is often reduced, however, by the sheer size and number of variables involved. We develop a ‘tree harvesting’ algorithm to reduce the complexity of the tree. Harvest.Tree Hash2Vec In this paper we propose the application of feature hashing to create word embeddings for natural language processing. Feature hashing has been used successfully to create document vectors in related tasks like document classification. In this work we show that feature hashing can be applied to obtain word embeddings in linear time with the size of the data. The results show that this algorithm, that does not need training, is able to capture the semantic meaning of words. We compare the results against GloVe showing that they are similar. As far as we know this is the first application of feature hashing to the word embeddings problem and the results indicate this is a scalable technique with practical results for NLP applications. HashNet Learning to hash has been widely applied to approximate nearest neighbor search for large-scale multimedia retrieval, due to its computation efficiency and retrieval quality. Deep learning to hash, which improves retrieval quality by end-to-end representation learning and hash encoding, has received increasing attention recently. Subject to the vanishing gradient difficulty in the optimization with binary activations, existing deep learning to hash methods need to first learn continuous representations and then generate binary hash codes in a separated binarization step, which suffer from substantial loss of retrieval quality. This paper presents HashNet, a novel deep architecture for deep learning to hash by continuation method, which learns exactly binary hash codes from imbalanced similarity data where the number of similar pairs is much smaller than the number of dissimilar pairs. 
The key idea is to attack the vanishing gradient problem in optimizing deep networks with non-smooth binary activations by continuation method, in which we begin from learning an easier network with smoothed activation function and let it evolve during the training, until it eventually goes back to being the original, difficult to optimize, deep network with the sign activation function. Comprehensive empirical evidence shows that HashNet can generate exactly binary hash codes and yield state-of-the-art multimedia retrieval performance on standard benchmarks. Haversine Distance The haversine formula determines the great-circle distance between two points on a sphere given their longitudes and latitudes. Important in navigation, it is a special case of a more general formula in spherical trigonometry, the law of haversines, that relates the sides and angles of spherical triangles. The first table of haversines in English was published by James Andrew in 1805, but Florian Cajori credits an earlier use by José de Mendoza y Ríos in 1801. The term haversine was coined in 1835 by James Inman. These names follow from the fact that they are customarily written in terms of the haversine function, given by haversin(θ) = sin^2(θ/2). The formulas could equally be written in terms of any multiple of the haversine, such as the older versine function (twice the haversine). Prior to the advent of computers, the elimination of division and multiplication by factors of two proved convenient enough that tables of haversine values and logarithms were included in 19th and early 20th century navigation and trigonometric texts. These days, the haversine form is also convenient in that it has no coefficient in front of the sin^2 function. Hawkes Graph This paper introduces the Hawkes skeleton and the Hawkes graph. These notions summarize the branching structure of a multivariate Hawkes point process in a compact and fertile way.
In particular, we explain how the graph view is useful for the specification and estimation of Hawkes models from large, multitype event streams. Based on earlier work, we give a nonparametric statistical procedure to estimate the Hawkes skeleton and the Hawkes graph from data. We show how the graph estimation may then be used for choosing and fitting parametric Hawkes models. Our method avoids the a priori assumptions on the model required by a straightforward MLE approach, and it is numerically more flexible than the latter. A simulation study confirms that the presented procedure works as desired. We give special attention to computational issues in the implementation. This makes our results applicable to high-dimensional event-stream data, such as dozens of event streams and thousands of events per component. Hazard Function The hazard function (also known as the failure rate, hazard rate, or force of mortality) h(x) is the ratio of the probability density function P(x) to the survival function S(x), given by h(x) = P(x)/S(x) = P(x)/(1 – D(x)), where D(x) is the distribution function. Hazard Ratio In survival analysis, the hazard ratio (HR) is the ratio of the hazard rates corresponding to the conditions described by two levels of an explanatory variable. For example, in a drug study, the treated population may die at twice the rate per unit time as the control population. The hazard ratio would be 2, indicating higher hazard of death from the treatment. Or in another study, men receiving the same treatment may suffer a certain complication ten times more frequently per unit time than women, giving a hazard ratio of 10. Hazard ratios differ from relative risks in that the latter are cumulative over an entire study, using a defined endpoint, while the former represent instantaneous risk over the study time period, or some subset thereof.
Hazard ratios suffer somewhat less from selection bias with respect to the endpoints chosen and can indicate risks that happen before the endpoint. Hazelcast Hazelcast, a leading open source in-memory data grid (IMDG) with hundreds of thousands of installed clusters and over 17 million server starts per month, launched Hazelcast Jet – a distributed processing engine for big data streams. With Hazelcast’s IMDG providing storage functionality, Hazelcast Jet is a new Apache 2 licensed open source project that performs parallel execution to enable data-intensive applications to operate in near real-time. Using directed acyclic graphs (DAG) to model relationships between individual steps in the data processing pipeline, Hazelcast Jet is simple to deploy and can execute both batch and stream-based data processing applications. Hazelcast Jet is appropriate for applications that require a near real-time experience such as sensor updates in IoT architectures (house thermostats, lighting systems), in-store e-commerce systems and social media platforms. HDIdx Fast Nearest Neighbor (NN) search is a fundamental challenge in large-scale data processing and analytics, particularly for analyzing multimedia contents which are often of high dimensionality. Instead of using exact NN search, extensive research efforts have been focusing on approximate NN search algorithms. In this work, we present ‘HDIdx’, an efficient high-dimensional indexing library for fast approximate NN search, which is open-source and written in Python. It offers a family of state-of-the-art algorithms that convert input high-dimensional vectors into compact binary codes, making them very efficient and scalable for NN search with very low space complexity. 
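The general idea behind the HDIdx entry above, encoding high-dimensional vectors as compact binary codes and searching by comparing codes, can be sketched in a few lines of Python. This is not the HDIdx API and not one of its algorithms: random hyperplane hashing is swapped in here as a simple stand-in, and the function names, the 16-bit code length, and the toy vectors are all assumptions made for illustration.

```python
import random


def make_hyperplanes(dim, n_bits, seed=0):
    """Draw one random Gaussian hyperplane normal per output bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(n_bits)]


def encode(vector, hyperplanes):
    """Pack the signs of the projections into an integer binary code."""
    code = 0
    for plane in hyperplanes:
        dot = sum(v * w for v, w in zip(vector, plane))
        code = (code << 1) | (1 if dot >= 0 else 0)
    return code


def hamming_bits(a, b):
    """Number of differing bits between two integer codes."""
    return bin(a ^ b).count("1")


def search(query, database_codes, hyperplanes, k=1):
    """Return indices of the k database codes closest to the query code."""
    q = encode(query, hyperplanes)
    order = sorted(range(len(database_codes)),
                   key=lambda i: hamming_bits(q, database_codes[i]))
    return order[:k]


if __name__ == "__main__":
    planes = make_hyperplanes(dim=4, n_bits=16, seed=1)
    database = [[1.0, 0.0, 0.0, 0.0], [0.0, 1.0, 0.0, 0.0],
                [0.9, 0.1, 0.0, 0.0], [-1.0, 0.0, 0.0, 0.0]]
    codes = [encode(v, planes) for v in database]
    print(search([1.0, 0.05, 0.0, 0.0], codes, planes, k=2))
```

Each vector is stored as a single 16-bit integer, and a query costs one XOR and one bit count per database item, which is where the low space complexity and fast search of binary-code methods come from.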
Heaped Data Heckman Correction The Heckman correction (the two-stage method, Heckman’s lambda or the Heckit method, Heckman Model) is any of a number of related statistical methods developed by James Heckman at the University of Chicago in 1976 to 1979 which allow the researcher to correct for selection bias. Selection bias problems are endemic to applied econometric problems, which make Heckman’s original technique, and subsequent refinements by both himself and others, indispensable to applied econometricians. Heckman received the Economics Nobel Prize in 2000 for this achievement. http://…/HeckmanSelectionModel.html Hedonic Regression In economics, hedonic regression or hedonic demand theory is a revealed preference method of estimating demand or value. It decomposes the item being researched into its constituent characteristics, and obtains estimates of the contributory value of each characteristic. This requires that the composite good being valued can be reduced to its constituent parts and that the market values those constituent parts. Hedonic models are most commonly estimated using regression analysis, although more generalized models, such as sales adjustment grids, are special cases of hedonic models. An attribute vector, which may be a dummy or panel variable, is assigned to each characteristic or group of characteristics. Hedonic models can accommodate non-linearity, variable interaction, or other complex valuation situations. Hedonic models are commonly used in real estate appraisal, real estate economics, and Consumer Price Index (CPI) calculations. In CPI calculations hedonic regression is used to control the effect of changes in product quality. Price changes that are due to substitution effects are subject to hedonic quality adjustments. 
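As an illustration of the Hedonic Regression entry above, the sketch below fits a linear hedonic model by ordinary least squares, so that each coefficient can be read as the estimated contribution of one characteristic to the price. The housing characteristics (floor area, bedrooms, a garage dummy), the noise-free synthetic prices, and the function name `fit_hedonic` are all made up for the example; real hedonic models are usually fitted to noisy market data and often use log prices or interaction terms.

```python
import numpy as np


def fit_hedonic(characteristics, prices):
    """OLS fit of price = intercept + sum(coef_j * characteristic_j)."""
    x = np.asarray(characteristics, dtype=float)
    y = np.asarray(prices, dtype=float)
    design = np.column_stack([np.ones(len(y)), x])   # prepend intercept column
    coef, *_ = np.linalg.lstsq(design, y, rcond=None)
    return coef


# Synthetic data (an assumption for illustration only):
# columns are floor area in m^2, number of bedrooms, garage dummy (0/1).
HOUSES = [
    (80, 2, 0),
    (100, 3, 1),
    (120, 3, 0),
    (60, 1, 1),
    (150, 4, 1),
    (90, 2, 1),
]
# Prices generated as 50000 + 1500*area + 10000*bedrooms + 8000*garage.
PRICES = [190000, 238000, 260000, 158000, 323000, 213000]

if __name__ == "__main__":
    intercept, per_m2, per_bedroom, garage = fit_hedonic(HOUSES, PRICES)
    print("implicit price per m^2:", per_m2)
    print("implicit price per bedroom:", per_bedroom)
    print("implicit price of a garage:", garage)
```

The garage column is an example of the dummy variable mentioned in the entry: its coefficient estimates the price difference between otherwise identical houses with and without a garage.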
Hellinger Distance In probability and statistics, the Hellinger distance (closely related to, although different from, the Bhattacharyya distance) is used to quantify the similarity between two probability distributions. It is a type of f-divergence. The Hellinger distance is defined in terms of the Hellinger integral, which was introduced by Ernst Hellinger in 1909. Henge We present Henge, a system to support intent-based multi-tenancy in modern stream processing applications. Henge supports multi-tenancy as a first-class citizen: everyone inside an organization can now submit their stream processing jobs to a single, shared, consolidated cluster. Additionally, Henge allows each tenant (job) to specify its own intents (i.e., requirements) as a Service Level Objective (SLO) that captures latency and/or throughput. In a multi-tenant cluster, the Henge scheduler adapts continually to meet jobs’ SLOs in spite of limited cluster resources, and under dynamic input workloads. SLOs are soft and are based on utility functions. Henge continually tracks SLO satisfaction, and when jobs miss their SLOs, it wisely navigates the state space to perform resource allocations in real time, maximizing total system utility achieved by all jobs in the system. Henge is integrated in Apache Storm and we present experimental results using both production topologies and real datasets. Herfindahl-Hirschman Index Based on the aggregated shares retained by individual firms or actors within a market or space, the Herfindahl-Hirschman Index (HHI) measures the level of concentration in the market or space. It is often used as a measure of competition: with shares expressed in percent, a value near 0 indicates perfect competition amongst firms or actors and 10,000 equals perfect monopoly.
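The Herfindahl-Hirschman Index described above is the sum of the squared market shares, which is what gives it a maximum of 10,000 when shares are in percent. A minimal Python sketch follows; the function name `hhi` and the example markets are illustrative assumptions, and the function accepts raw sizes (for example revenues) and converts them to percentage shares itself.

```python
def hhi(sizes):
    """Herfindahl-Hirschman Index from firm sizes (e.g. revenues or shares).

    Sizes are normalised to percentage shares, so the result lies between
    just above 0 (many tiny firms) and 10,000 (a single firm).
    """
    total = sum(sizes)
    if total <= 0 or any(s < 0 for s in sizes):
        raise ValueError("sizes must be non-negative with a positive total")
    return sum((100.0 * s / total) ** 2 for s in sizes)


if __name__ == "__main__":
    print(hhi([50, 30, 20]))     # 3800.0
    print(hhi([1, 1, 1, 1]))     # 2500.0
    print(hhi([7]))              # 10000.0
```

Squaring the shares weights large firms more heavily, so a market split 50/30/20 scores higher than one split into four equal parts.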
hhi Hessian Approximated Multiple Subsets Iteration(HAMSI) We propose HAMSI, a provably convergent incremental algorithm for solving large-scale partially separable optimization problems that frequently emerge in machine learning and inferential statistics. The algorithm is based on a local quadratic approximation and hence allows incorporating second-order curvature information to speed up convergence. Furthermore, HAMSI needs almost no tuning, and it is scalable as well as easily parallelizable. In large-scale simulation studies with the MovieLens datasets, we illustrate that the method is superior to a state-of-the-art distributed stochastic gradient descent method in terms of convergence behavior. This performance gain comes at the expense of using memory that scales only linearly with the total size of the optimization variables. We conclude that HAMSI may be considered as a viable alternative in many scenarios, where first order methods based on variants of stochastic gradient descent are applicable. Heterogeneous Incremental Nearest Class Mean Random Forest(hi-RF) In recent years, dynamically growing data and an incrementally growing number of classes have posed new challenges to large-scale data classification research. Most traditional methods struggle to balance precision and computational burden as the data and the number of classes grow: some have weak precision, and others are time-consuming. In this paper, we propose an incremental learning method, namely, heterogeneous incremental Nearest Class Mean Random Forest (hi-RF), to handle this issue. It is a heterogeneous method that adaptively either replaces trees or updates tree leaves in the random forest, to reduce the computational time at comparable performance when data of new classes arrive.
Specifically, to maintain accuracy, a proportion of the trees is replaced by new NCM decision trees; to reduce the computational load, the remaining trees have only their leaf probabilities updated. Above all, out-of-bag estimation and out-of-bag boosting are proposed to balance the accuracy and the computational efficiency. Fair experiments were conducted and demonstrated comparable precision with much less computational time. Heterogeneous Simultaneous Multiscale Change Point Estimator(H-SMUCE) We propose a heterogeneous simultaneous multiscale change point estimator called ‘H-SMUCE’ for the detection of multiple change points of the signal in a heterogeneous Gaussian regression model. A piecewise constant function is estimated by minimizing the number of change points over the acceptance region of a multiscale test which locally adapts to changes in the variance. The multiscale test is a combination of local likelihood ratio tests which are properly calibrated by scale-dependent critical values to keep a global nominal level α, even for finite samples. We show that H-SMUCE controls the error of overestimation and underestimation of the number of change points. For this, new deviation bounds for F-type statistics are derived. Moreover, we obtain confidence sets for the whole signal. All results are non-asymptotic and uniform over a large class of heterogeneous change point models. H-SMUCE is fast to compute, achieves the optimal detection rate and estimates the number of change points at almost optimal accuracy for vanishing signals, while still being robust. We compare H-SMUCE with several state of the art methods in simulations and analyse current recordings of a transmembrane protein in the bacterial outer membrane with pronounced heterogeneity for its states. An R package is available online. Heteroscedasticity In statistics, a collection of random variables is heteroscedastic if there are sub-populations that have different variabilities from others.
Here “variability” could be quantified by the variance or any other measure of statistical dispersion. Thus heteroscedasticity is the absence of homoscedasticity. Hidden Factor Graph Models(HFM) Hidden Factor graph models generalise Hidden Markov Models to tree structured data. The distinctive feature of ‘treeHFM’ is that it learns a transition matrix for first order (sequential) and for second order (splitting) events. It can be applied to all discrete and continuous data that is structured as a binary tree. In the case of continuous observations, ‘treeHFM’ has Gaussian distributions as emissions. treeHFM Hidden Markov Model(HMM) Hidden Markov Models (HMMs) are powerful, flexible methods for representing and classifying data with trends over time. A hidden Markov model (HMM) is a statistical Markov model in which the system being modeled is assumed to be a Markov process with unobserved (hidden) states. A HMM can be considered the simplest dynamic Bayesian network. The mathematics behind the HMM was developed by L. E. Baum and coworkers. It is closely related to an earlier work on optimal nonlinear filtering problem (stochastic processes) by Ruslan L. Stratonovich, who was the first to describe the forward-backward procedure. In simpler Markov models (like a Markov chain), the state is directly visible to the observer, and therefore the state transition probabilities are the only parameters. In a hidden Markov model, the state is not directly visible, but output, dependent on the state, is visible. Each state has a probability distribution over the possible output tokens. Therefore the sequence of tokens generated by an HMM gives some information about the sequence of states. Note that the adjective ‘hidden’ refers to the state sequence through which the model passes, not to the parameters of the model; the model is still referred to as a ‘hidden’ Markov model even if these parameters are known exactly. 
Hidden Markov models are especially known for their application in temporal pattern recognition such as speech, handwriting, gesture recognition, part-of-speech tagging, musical score following, partial discharges and bioinformatics. Hidden Tree Markov Network(HTN) The paper introduces the Hidden Tree Markov Network (HTN), a neuro-probabilistic hybrid fusing the representation power of generative models for trees with the incremental and discriminative learning capabilities of neural networks. We put forward a modular architecture in which multiple generative models of limited complexity are trained to learn structural feature detectors whose outputs are then combined and integrated by neural layers at a later stage. In this respect, the model is both deep, thanks to the unfolding of the generative models on the input structures, as well as wide, given the potentially large number of generative modules that can be trained in parallel. Experimental results show that the proposed approach can outperform state-of-the-art syntactic kernels as well as generative kernels built on the same probabilistic model as the HTN. Hidden-Layer LSTM(H-LSTM) Long short-term memory (LSTM) has been widely used for sequential data modeling. Researchers have increased LSTM depth by stacking LSTM cells to improve performance. This incurs model redundancy, increases run-time delay, and makes the LSTMs more prone to overfitting. To address these problems, we propose a hidden-layer LSTM (H-LSTM) that adds hidden layers to LSTM’s original one level non-linear control gates. H-LSTM increases accuracy while employing fewer external stacked layers, thus reducing the number of parameters and run-time latency significantly. We employ grow-and-prune (GP) training to iteratively adjust the hidden layers through gradient-based growth and magnitude-based pruning of connections. This learns both the weights and the compact architecture of H-LSTM control gates. 
We have GP-trained H-LSTMs for image captioning and speech recognition applications. For the NeuralTalk architecture on the MSCOCO dataset, our three models reduce the number of parameters by 38.7x [floating-point operations (FLOPs) by 45.5x], run-time latency by 4.5x, and improve the CIDEr score by 2.6. For the DeepSpeech2 architecture on the AN4 dataset, our two models reduce the number of parameters by 19.4x (FLOPs by 23.5x), run-time latency by 15.7%, and the word error rate from 12.9% to 8.7%. Thus, GP-trained H-LSTMs can be seen to be compact, fast, and accurate. Hierarchical Attention-Based Recurrent Highway Network(HRHN) Time series prediction has been studied in a variety of domains. However, it is still challenging to predict future series given historical observations and past exogenous data. Existing methods either fail to consider the interactions among different components of exogenous variables which may affect the prediction accuracy, or cannot model the correlations between exogenous data and target data. Besides, the inherent temporal dynamics of exogenous data are also related to the target series prediction, and thus should be considered as well. To address these issues, we propose an end-to-end deep learning model, i.e., Hierarchical attention-based Recurrent Highway Network (HRHN), which incorporates spatio-temporal feature extraction of exogenous variables and temporal dynamics modeling of target variables into a single framework. Moreover, by introducing the hierarchical attention mechanism, HRHN can adaptively select the relevant exogenous features in different semantic levels. We carry out comprehensive empirical evaluations with various methods over several datasets, and show that HRHN outperforms the state of the art in time series prediction, especially in capturing sudden changes and sudden oscillations of time series.
Hierarchical Clustering In data mining, hierarchical clustering is a method of cluster analysis which seeks to build a hierarchy of clusters. Strategies for hierarchical clustering generally fall into two types:
1. Agglomerative: This is a “bottom up” approach: each observation starts in its own cluster, and pairs of clusters are merged as one moves up the hierarchy.
2. Divisive: This is a “top down” approach: all observations start in one cluster, and splits are performed recursively as one moves down the hierarchy.
In general, the merges and splits are determined in a greedy manner. The results of hierarchical clustering are usually presented in a dendrogram. Hierarchical Clustering and Topic Modeling based on Fast Rank-2 NMF(HierNMF2) The importance of unsupervised clustering and topic modeling is well recognized with ever-increasing volumes of text data. In this paper, we propose a fast method for hierarchical clustering and topic modeling called HierNMF2. Our method is based on fast Rank-2 nonnegative matrix factorization (NMF) that performs binary clustering and an efficient node splitting rule. Further utilizing the final leaf nodes generated in HierNMF2 and the idea of nonnegative least squares fitting, we propose a new clustering/topic modeling method called FlatNMF2 that recovers a flat clustering/topic modeling result in a very simple yet significantly more effective way than any other existing methods. We describe highly optimized open source software in C++ for both HierNMF2 and FlatNMF2 for hierarchical and partitional clustering/topic modeling of document data sets. Substantial experimental tests are presented that illustrate significant improvements both in computational time as well as quality of solutions. We compare our methods to other clustering methods including K-means, standard NMF, and CLUTO, and also topic modeling methods including latent Dirichlet allocation (LDA) and recently proposed algorithms for NMF with separability constraints.
Overall, we present efficient tools for analyzing large-scale data sets, and techniques that can be generalized to many other data analytics problem domains. Hierarchical Compartmental Model A variety of triangle-based stochastic reserving techniques have been proposed for estimating future general insurance claims payments, ranging from generalized linear models (England and Verrall, 2002) to nonlinear hierarchical models (Guszcza, 2008). Methods incorporating both paid and incurred information have been explored (Martínez-Miranda, Nielsen and Verrall, 2012; Quarg and Mack, 2004), which provide richer inference and improved interpretability. Furthermore, Bayesian methods (Zhang, Dukic and Guszcza, 2012; Meyers, 2007; England and Verrall, 2005; Verrall, 2004) have become increasingly ubiquitous; providing flexibility and the ability to robustly incorporate judgment into uncertainty projections. This paper explores a new triangle-based (and optionally-Bayesian) stochastic reserving framework which considers the relationship between exposure, case reserves and paid claims. By doing so, it enables practitioners to build communicable models that are consistent with their understanding of the insurance claims process. Furthermore, it supports the identification and quantification of claims process characteristics to provide tangible business insights. Hierarchical compartmental reserving models Hierarchical Compositional Network(HCN) We introduce the hierarchical compositional network (HCN), a directed generative model able to discover and disentangle, without supervision, the building blocks of a set of binary images. The building blocks are binary features defined hierarchically as a composition of some of the features in the layer immediately below, arranged in a particular manner. At a high level, HCN is similar to a sigmoid belief network with pooling. 
Inference and learning in HCN are very challenging and existing variational approximations do not work satisfactorily. A main contribution of this work is to show that both can be addressed using max-product message passing (MPMP) with a particular schedule (no EM required). Also, using MPMP as an inference engine for HCN makes new tasks simple: adding supervision information, classifying images, or performing inpainting all correspond to clamping some variables of the model to their known values and running MPMP on the rest. When used for classification, fast inference with HCN has exactly the same functional form as a convolutional neural network (CNN) with linear activations and binary weights. However, HCN’s features are qualitatively very different. Hierarchical Configuration Model We introduce a class of random graphs with a hierarchical community structure, which we call the hierarchical configuration model. On the inter-community level, the graph is a configuration model, and on the intra-community level, every vertex in the configuration model is replaced by a community: a small graph. These communities may have any shape, as long as they are connected. For these hierarchical graphs, we find the size of the largest component, the degree distribution and the clustering coefficient. Furthermore, we determine the conditions under which a giant percolation cluster exists, and find its size. Hierarchical Data Format(HDF) Hierarchical Data Format (HDF, HDF4, or HDF5) is a set of file formats and libraries designed to store and organize large amounts of numerical data. Originally developed at the National Center for Supercomputing Applications, it is supported by the non-profit HDF Group, whose mission is to ensure continued development of HDF5 technologies, and the continued accessibility of data stored in HDF. 
Hierarchical Deep Learning for Text Classification(HDLTex) The continually increasing number of documents produced each year necessitates ever improving information processing methods for searching, retrieving, and organizing text. Central to these information processing methods is document classification, which has become an important application for supervised learning. Recently the performance of these traditional classifiers has degraded as the number of documents has increased. This is because along with this growth in the number of documents has come an increase in the number of categories. This paper approaches this problem differently from current document classification methods that view the problem as multi-class classification. Instead we perform hierarchical classification using an approach we call Hierarchical Deep Learning for Text classification (HDLTex). HDLTex employs stacks of deep learning architectures to provide specialized understanding at each level of the document hierarchy. Hierarchical Incremental GRAdient Descent(HiGrad) Hierarchical Incremental GRAdient Descent (HiGrad) is a first-order algorithm for finding the minimizer of a function in online learning, just like stochastic gradient descent (SGD). See Su and Zhu (2018) for details. higrad Hierarchical Inference Testing(HIT) hit Hierarchical Kernel Learning(HKL) http://…/jawanpuria15a.pdf Hierarchical Latent Dirichlet Allocation(H-LDA, HLDA) An extension to LDA is the hierarchical LDA (hLDA), where topics are joined together in a hierarchy by using the nested Chinese restaurant process. http://…/automatic-topic-modelling-with-lda Hierarchical Latent Space Network Model(HLSM) HLSM Hierarchical Latent Tree Analysis(HLTA) In the LDA approach to topic detection, a topic is determined by identifying the words that are used with high frequency when writing about the topic. However, high frequency words in one topic may be also used with high frequency in other topics. 
Thus they may not be the best words to characterize the topic. In this paper, we propose a new method for topic detection, where a topic is determined by identifying words that appear with high frequency in the topic and low frequency in other topics. We model patterns of word co-occurrence and co-occurrences of those patterns using a hierarchy of discrete latent variables. The states of the latent variables represent clusters of documents and they are interpreted as topics. The words that best distinguish a cluster from other clusters are selected to characterize the topic. Empirical results show that the new method yields topics with clearer thematic characterizations than the alternative approaches. In this work, we introduce semantically higher level latent variables to model co-occurrence of those patterns, resulting in hierarchical latent tree models (HLTMs). The latent variables at higher levels of the hierarchy correspond to more general topics, while the latent variables at lower levels correspond to more specific topics. The proposed method for topic detection is therefore called hierarchical latent tree analysis (HLTA). Hierarchical Latent Tree Model(HLTM) Hierarchical Mode Association Clustering / Mode Association Clustering(HMAC, MAC) Mode association clustering (MAC) can be conducted either hierarchically or at one level. MAC is similar to mixture model based clustering in the sense of characterizing clusters by smooth densities. However, MAC requires no model fitting and uses a nonparametric kernel density estimation. The density of a cluster is not restricted to be parametric, for instance, Gaussian, but ensures uni-modality. The algorithm seems to combine the complementary merits of bottom-up clustering such as linkage and top-down clustering such as mixture modeling and k-means. It also tends to be robust against non-Gaussian shaped clusters. Hierarchical Model There isn’t a single authoritative definition of a hierarchical model. 
Hierarchical Multinomial Marginal Models(HMM) In the log-linear parametrization all the interactions are contrasts of logarithms of joint probabilities and this is the main reason why this parametrization is not convenient to express hypotheses on marginal distributions or to model ordered categorical data. In contrast, Hierarchical Multinomial Marginal models (HMM) (Bartolucci et al. 2007) are based on parameters, called generalized marginal interactions, which are contrasts of logarithms of sums of probabilities. HMM models allow great flexibility in choosing the marginal distributions, within which the interactions are defined, and they are a useful tool for modeling marginal distributions and for taking into proper account the presence of ordinal categorical variables. hmmm Hierarchical Nearest Neighbor Descent(H-NND) Hierarchical Network A hierarchical network is the type of network topology in which a central “root” node (the top level of the hierarchy) is connected to one or more other nodes that are one level lower in the hierarchy (i.e., the second level) with a point-to-point link between each of the second level nodes and the top level central “root” node. Each of the second level nodes that are connected to the top level central “root” node will also have one or more other nodes that are one level lower in the hierarchy (i.e., the third level) connected to it, also with a point-to-point link. The top level central “root” node is the only node that has no other node above it in the hierarchy. Hierarchical Network Model(HNM) Hierarchical network models are iterative algorithms for creating networks which are able to reproduce the unique properties of the scale-free topology and the high clustering of the nodes at the same time. These characteristics are widely observed in nature, from biology to language to some social networks. 
Hierarchical Reinforcement Learning(HRL) Hierarchical Spectral Merger(HSM) We present a new method for time series clustering which we call the Hierarchical Spectral Merger (HSM) method. This procedure is based on the spectral theory of time series and identifies series that share similar oscillations or waveforms. The extent of similarity between a pair of time series is measured using the total variation distance between their estimated spectral densities. At each step of the algorithm, every time two clusters merge, a new spectral density is estimated using the whole information present in both clusters, which is representative of all the series in the new cluster. The method is implemented in an R package HSMClust. We present two applications of the HSM method, one to data coming from wave-height measurements in oceanography and the other to electroencephalogram (EEG) data. Hierarchical Stochastic Clustering(HSC) Hierarchical clustering is one of the most powerful solutions to the problem of clustering, on the grounds that it performs a multi-scale organization of the data. In recent years, research on hierarchical clustering methods has attracted considerable interest due to the demanding modern application domains. We present a novel divisive hierarchical clustering framework called Hierarchical Stochastic Clustering (HSC), that acts in two stages. In the first stage, it finds a primary hierarchy of clustering partitions in a dataset. In the second stage, it feeds a clustering algorithm with each one of the clusters of the very detailed partition, in order to settle the final result. The output is a hierarchy of clusters. Our method is based on the previous research of Meyer and Weissel on Stochastic Data Clustering and the theory of Simon and Ando on Variable Aggregation. 
Our experiments show that our framework builds a meaningful hierarchy of clusters and consistently benefits the clustering algorithm that acts in the second stage, not only computationally but also in terms of cluster quality. This result suggests that the HSC framework is ideal for obtaining hierarchical solutions of large volumes of data. Hierarchical Temporal Memory(HTM) Hierarchical temporal memory (HTM) is a biologically constrained theory of machine intelligence originally described in the 2004 book On Intelligence by Jeff Hawkins with Sandra Blakeslee. HTM is based on neuroscience and the physiology and interaction of pyramidal neurons in the neocortex of the human brain. The technology has been tested and implemented in software through example applications from Numenta and commercial applications from Numenta’s partners. At the core of HTM are learning algorithms that can store, learn, infer and recall high-order sequences. Unlike most other machine learning methods, HTM learns time-based patterns in unlabeled data on a continuous basis. HTM is robust to noise and has high capacity, meaning that it can learn multiple patterns simultaneously. When applied to computers, HTM is well suited for prediction, anomaly detection, classification and ultimately sensorimotor applications. Hierarchical Time Series / Grouped Time Series(HTS) Time series can often be naturally disaggregated in a hierarchical structure using attributes such as geographical location, product type, etc. For example, the total number of bicycles sold by a cycling warehouse can be disaggregated into a hierarchy of bicycle types. Such a warehouse will sell road bikes, mountain bikes, children’s bikes or hybrids. Each of these can be disaggregated into finer categories. Children’s bikes can be divided into balance bikes for children under 4 years old, single speed bikes for children between 4 and 6 and bikes for children over the age of 6. 
Hybrid bikes can be divided into city, commuting, comfort, and trekking bikes; and so on. Such disaggregation imposes a hierarchical structure. We refer to these as hierarchical time series. hts, gtop Hierarchical Topic Models Hierarchically Supervised Latent Dirichlet Allocation(HSLDA) We introduce hierarchically supervised latent Dirichlet allocation (HSLDA), a model for hierarchically and multiply labeled bag-of-word data. Examples of such data include web pages and their placement in directories, product descriptions and associated categories from product hierarchies, and free-text clinical records and their assigned diagnosis codes. Out-of-sample label prediction is the primary goal of this work, but improved lower-dimensional representations of the bag-of-word data are also of interest. We demonstrate HSLDA on large-scale data from clinical document labeling and retail product categorization tasks. We show that leveraging the structure from hierarchical labels improves out-of-sample label prediction substantially when compared to models that do not. High Dimensional Data Clustering(HDDC) Clustering in high-dimensional spaces is a recurrent problem in many domains, for example in object recognition. High-dimensional data usually live in different low-dimensional subspaces hidden in the original space. HDDC is a clustering approach which estimates the specific subspace and the intrinsic dimension of each class. The approach adapts the Gaussian mixture model framework to high-dimensional data and estimates the parameters which best fit the data. This results in a robust clustering method called High-Dimensional Data Clustering (HDDC). HDDC is applied to locate objects in natural images in a probabilistic framework. Experiments on a recently proposed database demonstrate the effectiveness of our clustering method for category localization. High Frequency Trading(HFT) High-frequency trading (HFT) is a primary form of algorithmic trading in finance. 
Specifically, it is the use of sophisticated technological tools and computer algorithms to rapidly trade securities. HFT uses proprietary trading strategies carried out by computers to move in and out of positions in seconds or fractions of a second. It is estimated that as of 2009, HFT accounted for 60-73% of all US equity trading volume, with that number falling to approximately 50% in 2012. High-frequency traders move in and out of short-term positions at high volumes aiming to capture sometimes a fraction of a cent in profit on every trade. HFT firms do not consume significant amounts of capital, accumulate positions or hold their portfolios overnight. As a result, HFT has a potential Sharpe ratio (a measure of risk and reward) tens of times higher than traditional buy-and-hold strategies. High-frequency traders typically compete against other HFTs, rather than long-term investors. HFT firms make up for the low margins with incredibly high volumes of trades, frequently numbering in the millions. It has been argued that a core incentive in much of the technological development behind high-frequency trading is essentially front running, in which the varying delays in the propagation of orders are taken advantage of by those who have earlier access to information. A substantial body of research argues that HFT and electronic trading pose new types of challenges to the financial system. Algorithmic and high-frequency traders were both found to have contributed to volatility in the May 6, 2010 Flash Crash, when high-frequency liquidity providers rapidly withdrew from the market. Several European countries have proposed curtailing or banning HFT due to concerns about volatility. Other complaints against HFT include the argument that some HFT firms scrape profits from investors when index funds rebalance their portfolios. Other financial analysts point to evidence of benefits that HFT has brought to the modern markets. 
Researchers have stated that HFT and automated markets improve market liquidity, reduce trading costs, and make stock prices more efficient. High Performance Analytics Toolkit(HPAT) Big data analytics requires high programmer productivity and high performance simultaneously on large-scale clusters. However, current big data analytics frameworks (e.g. Apache Spark) have high runtime overheads since they are library-based. Given the characteristics of the data analytics domain, we introduce the High Performance Analytics Toolkit (HPAT), which is a big data analytics framework that performs static compilation of high-level scripting programs into high performance parallel code using novel domain-specific compilation techniques. HPAT provides scripting abstractions in the Julia language for analytics tasks, automatically parallelizes them, generates efficient MPI/C++ code, and provides resiliency. Since HPAT is compiler-based, it avoids overheads of library-based systems such as dynamic task scheduling and master-executor coordination. In addition, it provides automatic optimizations for scripting programs, such as fusion of array operations. Therefore, HPAT is 14x to 400x faster than Spark on the Cori supercomputer at LBL/NERSC. Furthermore, HPAT is much more flexible in distributed data structures, which enables the use of existing libraries such as HDF5, ScaLAPACK, and Intel® DAAL. High Performance Computing(HPC) High Performance Computing most generally refers to the practice of aggregating computing power in a way that delivers much higher performance than one could get out of a typical desktop computer or workstation in order to solve large problems in science, engineering, or business. A supercomputer is a computer with a very high-level computational capacity. As of 2015, there are supercomputers which could perform up to quadrillions of floating point operations per second. 
http://…/Supercomputer High Quality Bidirectional Generative Adversarial Network Generative adversarial networks (GANs) have achieved outstanding success in generating high quality data. Focusing on the generation process, existing GANs investigate unidirectional mapping from the latent vector to the data. Later, various studies point out that the latent space of GANs is semantically meaningful and can be utilized in advanced data analysis and manipulation. In order to analyze the real data in the latent space of GANs, it is necessary to investigate the inverse generation mapping from the data to the latent vector. To tackle this problem, the bidirectional generative models introduce an encoder to enable the inverse path of generation process. Unfortunately, this effort leads to the degradation of generation quality because the imperfect generator interferes with the encoder training and vice versa. In this paper, we propose a new inference model that estimates the latent vector from the feature of GAN discriminator. While existing bidirectional models learn the image to latent translation, our algorithm formulates this inference mapping by the feature to latent translation. It is important to note that training of our model is independent of the GAN training. Owing to the attractive nature of this independence, the proposed algorithm can generate high quality samples identical to those of unidirectional GANs and also reconstruct the original data faithfully. Moreover, our algorithm can be applied to any unidirectional GAN, even pre-trained GANs. Highcharts Highcharts is a charting library written in pure JavaScript, offering an easy way of adding interactive charts to your web site or web application. Highcharts currently supports line, spline, area, areaspline, column, bar, pie, scatter, angular gauges, arearange, areasplinerange, columnrange, bubble, box plot, error bars, funnel, waterfall and polar chart types. 
Higher Order Propagation Framework(HOPF) Given a graph wherein every node has certain attributes associated with it and some nodes have labels associated with them, Collective Classification (CC) is the task of assigning labels to every unlabeled node using information from the node as well as its neighbors. It is often the case that a node is not only influenced by its immediate neighbors but also by its higher order neighbors, multiple hops away. Recent state-of-the-art models for CC use differentiable variations of Weisfeiler-Lehman kernels to aggregate multi-hop neighborhood information. However, in this work, we show that these models suffer from the problem of Node Information Morphing wherein the information of the node is morphed or overwhelmed by the information of its neighbors when considering multiple hops. Further, existing models are not scalable as the memory and computation needs grow exponentially with the number of hops considered. To circumvent these problems, we propose a generic Higher Order Propagation Framework (HOPF) which includes (i) a differentiable Node Information Preserving (NIP) kernel and (ii) a scalable iterative learning and inferencing mechanism to aggregate information over larger hops. We do an extensive evaluation using 11 datasets from different domains and show that unlike existing CC models, our NIP model with iterative inference is robust across all the datasets and can handle much larger neighborhoods in a scalable manner. Higher-Order Generalized Singular Value Decomposition hogsvdR Highest Density Regions(HDR) Highest Posterior Density(HPD) Highest Posterior Density – The x% highest posterior density interval is the shortest interval in parameter space that contains x% of the posterior probability. Highly Efficient Network(HENet) In order to enhance the real-time performance of convolutional neural networks(CNNs), more and more researchers are focusing on improving the efficiency of CNN. 
Based on the analysis of some CNN architectures, such as ResNet, DenseNet, ShuffleNet and so on, we combined their advantages and proposed a very efficient model called Highly Efficient Networks (HENet). The new architecture uses an unusual way to combine group convolution and channel shuffle which was mentioned in ShuffleNet. Inspired by ResNet and DenseNet, we also proposed a new way to use element-wise addition and concatenation connection with each block. In order to make greater use of feature maps, pooling operations are removed from HENet. The experiments show that our model’s efficiency is more than 1 times higher than ShuffleNet on many open source datasets, such as CIFAR-10/100 and SVHN. High-Resolution Deep Convolutional Generative Adversarial Network(HR-DCGAN) Convergence of Generative Adversarial Networks (GANs) in a high-resolution setting under the computational constraint of GPU memory capacity (from 12 GB to 24 GB) has been beset with difficulty due to the known lack of convergence rate stability. In order to boost network convergence of DCGAN (Deep Convolutional Generative Adversarial Networks) and achieve good-looking high-resolution results we propose a new layered network structure, HR-DCGAN, that incorporates current state-of-the-art techniques for this effect. Hilbert-Schmidt Independence Criterion(HSIC) ‘Dependency Bottleneck’ in Auto-encoding Architectures: an Empirical Study Hill Climbing In computer science, hill climbing is a mathematical optimization technique which belongs to the family of local search. It is an iterative algorithm that starts with an arbitrary solution to a problem, then attempts to find a better solution by incrementally changing a single element of the solution. If the change produces a better solution, an incremental change is made to the new solution, repeating until no further improvements can be found. For example, hill climbing can be applied to the travelling salesman problem. 
It is easy to find an initial solution that visits all the cities but will be very poor compared to the optimal solution. The algorithm starts with such a solution and makes small improvements to it, such as switching the order in which two cities are visited. Eventually, a much shorter route is likely to be obtained. Hill climbing is good for finding a local optimum (a solution that cannot be improved by considering a neighbouring configuration) but it is not necessarily guaranteed to find the best possible solution (the global optimum) out of all possible solutions (the search space). In convex problems, hill-climbing is optimal. Examples of algorithms that solve convex problems by hill-climbing include the simplex algorithm for linear programming and binary search. The characteristic that only local optima are guaranteed can be cured by using restarts (repeated local search), or more complex schemes based on iterations, like iterated local search, on memory, like reactive search optimization and tabu search, or memory-less stochastic modifications, like simulated annealing. The relative simplicity of the algorithm makes it a popular first choice amongst optimizing algorithms. It is used widely in artificial intelligence, for reaching a goal state from a starting node. Choice of next node and starting node can be varied to give a list of related algorithms. Although more advanced algorithms such as simulated annealing or tabu search may give better results, in some situations hill climbing works just as well. Hill climbing can often produce a better result than other algorithms when the amount of time available to perform a search is limited, such as with real-time systems. It is an anytime algorithm: it can return a valid solution even if it’s interrupted at any time before it ends. 
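The travelling salesman example in the Hill Climbing entry can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function names `route_length` and `hill_climb_tsp`, the fixed iteration budget `max_iters`, and the random choice of which two cities to swap are all assumptions made for the sketch. It starts from an arbitrary tour, swaps the order of two cities, and keeps the swap only when the tour gets shorter.

```python
import math
import random


def route_length(route, cities):
    """Total length of the closed tour that visits cities in the given order."""
    return sum(
        math.dist(cities[route[i]], cities[route[(i + 1) % len(route)]])
        for i in range(len(route))
    )


def hill_climb_tsp(cities, max_iters=10_000, seed=0):
    """Hill climbing for TSP: accept a two-city swap only if it shortens the tour."""
    rng = random.Random(seed)
    route = list(range(len(cities)))   # arbitrary initial solution
    best = route_length(route, cities)
    for _ in range(max_iters):
        i, j = rng.sample(range(len(cities)), 2)
        route[i], route[j] = route[j], route[i]       # small incremental change
        candidate = route_length(route, cities)
        if candidate < best:
            best = candidate                          # keep the improvement
        else:
            route[i], route[j] = route[j], route[i]   # undo a non-improving swap
    return route, best
```

Because the current route is always a valid tour, the loop can be stopped at any iteration and still return a usable answer, which is the anytime property described above. The result is only a local optimum; restarting from several shuffled initial routes is one simple way to look for a better one.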
Hindcasting In oceanography and meteorology, backtesting is also known as hindcasting: a hindcast is a way of testing a mathematical model; known or closely estimated inputs for past events are entered into the model to see how well the output matches the known results. Hindcasting usually refers to a numerical model integration of a historical period where no observations have been assimilated. This distinguishes a hindcast run from a reanalysis. Oceanographic observations of salinity and temperature as well as observations of surface wave parameters such as the significant wave height are much scarcer than meteorological observations, making hindcasting more common in oceanography than in meteorology. Also, since surface waves represent a forced system where the wind is the only generating force, wave hindcasting is often considered adequate for generating a reasonable representation of the wave climate with little need for a full reanalysis. Hindcasting is also used in hydrology for modelling stream flows. HIRO Hierarchical reinforcement learning (HRL) is a promising approach to extend traditional reinforcement learning (RL) methods to solve more complex tasks. Yet, the majority of current HRL methods require careful task-specific design and on-policy training, making them difficult to apply in real-world scenarios. In this paper, we study how we can develop HRL algorithms that are general, in that they do not make onerous additional assumptions beyond standard RL algorithms, and efficient, in the sense that they can be used with modest numbers of interaction samples, making them suitable for real-world problems such as robotic control. For generality, we develop a scheme where lower-level controllers are supervised with goals that are learned and proposed automatically by the higher-level controllers. To address efficiency, we propose to use off-policy experience for both higher and lower-level training. 
This poses a considerable challenge, since changes to the lower-level behaviors change the action space for the higher-level policy, and we introduce an off-policy correction to remedy this challenge. This allows us to take advantage of recent advances in off-policy model-free RL to learn both higher- and lower-level policies using substantially fewer environment interactions than on-policy algorithms. We term the resulting HRL agent HIRO and find that it is generally applicable and highly sample-efficient. Our experiments show that HIRO can be used to learn highly complex behaviors for simulated robots, such as pushing objects and utilizing them to reach target locations, learning from only a few million samples, equivalent to a few days of real-time interaction. In comparisons with a number of prior HRL methods, we find that our approach substantially outperforms previous state-of-the-art techniques. Histogram of Oriented Gradients(HOG) Histogram of Oriented Gradients (HOG) are feature descriptors used in computer vision and image processing for the purpose of object detection. The technique counts occurrences of gradient orientation in localized portions of an image. This method is similar to that of edge orientation histograms, scale-invariant feature transform descriptors, and shape contexts, but differs in that it is computed on a dense grid of uniformly spaced cells and uses overlapping local contrast normalization for improved accuracy. Navneet Dalal and Bill Triggs, researchers for the French National Institute for Research in Computer Science and Control (INRIA), first described Histogram of Oriented Gradient descriptors in their June 2005 CVPR paper. In this work they focused their algorithm on the problem of pedestrian detection in static images, although since then they expanded their tests to include human detection in film and video, as well as to a variety of common animals and vehicles in static imagery. 
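The core counting step of the HOG entry, a histogram of gradient orientations over one local cell, can be sketched as follows. This is a simplified illustration under stated assumptions rather than the Dalal and Triggs pipeline: it uses central differences for the gradient, 9 hard bins over unsigned orientations (0 to 180 degrees), magnitude-weighted votes, and it omits the overlapping block contrast normalization. The function name `cell_orientation_histogram` is hypothetical.

```python
import math


def cell_orientation_histogram(cell, n_bins=9):
    """Magnitude-weighted histogram of unsigned gradient orientations for one cell.

    `cell` is a 2D list of grayscale intensities. Border pixels are skipped
    because central differences need a neighbour on both sides.
    """
    height, width = len(cell), len(cell[0])
    hist = [0.0] * n_bins
    bin_width = 180.0 / n_bins
    for y in range(1, height - 1):
        for x in range(1, width - 1):
            gx = cell[y][x + 1] - cell[y][x - 1]   # horizontal gradient
            gy = cell[y + 1][x] - cell[y - 1][x]   # vertical gradient
            magnitude = math.hypot(gx, gy)
            if magnitude == 0:
                continue                           # flat region casts no vote
            angle = math.degrees(math.atan2(gy, gx)) % 180.0
            hist[int(angle // bin_width) % n_bins] += magnitude
    return hist


# A vertical edge (dark left, bright right) has a purely horizontal gradient,
# so all votes fall into the first orientation bin.
vertical_edge = [[0, 0, 10, 10] for _ in range(4)]
hist = cell_orientation_histogram(vertical_edge)
```

A full descriptor would compute one such histogram per cell on a dense grid, normalize the histograms over overlapping blocks of cells, and concatenate the results into a feature vector.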
History PCA In this paper we propose a new algorithm for streaming principal component analysis. With limited memory, small devices cannot store all the samples in the high-dimensional regime. Streaming principal component analysis aims to find the $k$-dimensional subspace which can explain the most variation of the $d$-dimensional data points that come into memory sequentially. In order to deal with large $d$ and large $N$ (number of samples), most streaming PCA algorithms update the current model using only the incoming sample and then dump the information right away to save memory. However the information contained in previously streamed data could be useful. Motivated by this idea, we develop a new streaming PCA algorithm called History PCA that achieves this goal. By using $O(Bd)$ memory with $B\approx 10$ being the block size, our algorithm converges much faster than existing streaming PCA algorithms. By changing the number of inner iterations, the memory usage can be further reduced to $O(d)$ while maintaining a comparable convergence speed. We provide theoretical guarantees for the convergence of our algorithm along with the rate of convergence. We also demonstrate on synthetic and real world data sets that our algorithm compares favorably with other state-of-the-art streaming PCA methods in terms of the convergence speed and performance. HiTM-VAE This work focuses on combining nonparametric topic models with Auto-Encoding Variational Bayes (AEVB). Specifically, we first propose iTM-VAE, where the topics are treated as trainable parameters and the document-specific topic proportions are obtained by a stick-breaking construction. The inference of iTM-VAE is modeled by neural networks such that it can be computed in a simple feed-forward manner. We also describe how to introduce a hyper-prior into iTM-VAE so as to model the uncertainty of the prior parameter. 
Actually, the hyper-prior technique is quite general and we show that it can be applied to other AEVB based models to alleviate the collapse-to-prior problem elegantly. Moreover, we also propose HiTM-VAE, where the document-specific topic distributions are generated in a hierarchical manner. HiTM-VAE is even more flexible and can generate topic distributions with better variability. Experimental results on 20News and Reuters RCV1-V2 datasets show that the proposed models outperform the state-of-the-art baselines significantly. The advantages of the hyper-prior technique and the hierarchical model construction are also confirmed by experiments. HitNet Neural networks designed for the task of classification have become a commodity in recent years. Many works target the development of better networks, which results in a complexification of their architectures with more layers, multiple sub-networks, or even the combination of multiple classifiers. In this paper, we show how to redesign a simple network to reach excellent performance, better than the results reproduced with CapsNet on several datasets, by replacing a layer with a Hit-or-Miss layer. This layer contains activated vectors, called capsules, that we train to hit or miss a central capsule by tailoring a specific centripetal loss function. We also show how our network, named HitNet, is capable of synthesizing a representative sample of the images of a given class by including a reconstruction network. This capability makes it possible to develop a data augmentation step combining information from the data space and the feature space, resulting in a hybrid data augmentation process. In addition, we introduce the possibility for HitNet to adopt an alternative to the true target when needed by using the new concept of ghost capsules, which is used here to detect potentially mislabeled images in the training data. 
Hitting Time In the study of stochastic processes in mathematics, a hitting time (or first hit time) is the first time at which a given process “hits” a given subset of the state space. Exit times and return times are also examples of hitting times. Hive Plot The hive plot is a rational visualization method for drawing networks. Nodes are mapped to and positioned on radially distributed linear axes – this mapping is based on network structural properties. Edges are drawn as curved links. Simple and interpretable. The purpose of the hive plot is to establish a new baseline for visualization of large networks – a method that is both general and tunable and useful as a starting point in visually exploring network structure. Hodrick-Prescott Filter(H-P Filter) The Hodrick-Prescott filter (also known as Hodrick-Prescott decomposition) is a mathematical tool used in macroeconomics, especially in real business cycle theory, to remove the cyclical component of a time series from raw data. It is used to obtain a smoothed-curve representation of a time series, one that is more sensitive to long-term than to short-term fluctuations. The adjustment of the sensitivity of the trend to short-term fluctuations is achieved by modifying a multiplier $\lambda$. The filter was popularized in the field of economics in the 1990s by economists Robert J. Hodrick and Nobel Memorial Prize winner Edward C. Prescott. However, it was first proposed much earlier by E. T. Whittaker in 1923. The H-P Filter and Unit Roots Hoeffding Anytime Tree We introduce a novel incremental decision tree learning algorithm, Hoeffding Anytime Tree, that is statistically more efficient than the current state-of-the-art, Hoeffding Tree. We demonstrate that an implementation of Hoeffding Anytime Tree (‘Extremely Fast Decision Tree’, a minor modification to the MOA implementation of Hoeffding Tree) obtains significantly superior prequential accuracy on most of the largest classification datasets from the UCI repository. 
Hoeffding Anytime Tree produces the asymptotic batch tree in the limit, is naturally resilient to concept drift, and can be used as a higher accuracy replacement for Hoeffding Tree in most scenarios, at a small additional computational cost. Hoeffding Tree(VFDT) A Hoeffding tree (VFDT) is an incremental, anytime decision tree induction algorithm that is capable of learning from massive data streams, assuming that the distribution generating examples does not change over time. Hoeffding trees exploit the fact that a small sample can often be enough to choose an optimal splitting attribute. This idea is supported mathematically by the Hoeffding bound, which quantifies the number of observations (in our case, examples) needed to estimate some statistics within a prescribed precision (in our case, the goodness of an attribute). A theoretically appealing feature of Hoeffding Trees not shared by other incremental decision tree learners is that it has sound guarantees of performance. Using the Hoeffding bound one can show that its output is asymptotically nearly identical to that of a non-incremental learner using infinitely many examples. For more information see: Geoff Hulten, Laurie Spencer, Pedro Domingos: Mining time-changing data streams. In: ACM SIGKDD Intl. Conf. on Knowledge Discovery and Data Mining, 97-106, 2001. Hogwild! Stochastic Gradient Descent (SGD) is a popular algorithm that can achieve state-of-the-art performance on a variety of machine learning tasks. Several researchers have recently proposed schemes to parallelize SGD, but all require performance-destroying memory locking and synchronization. This work aims to show using novel theoretical analysis, algorithms, and implementation that SGD can be implemented without any locking. We present an update scheme called Hogwild! which allows processors access to shared memory with the possibility of overwriting each other’s work. 
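The lock-free update scheme described in the Hogwild! entry can be illustrated with a toy example. This is only a sketch under stated assumptions, not the authors' implementation: the function name `hogwild_sgd`, the squared-error objective, the sparse dictionary encoding of examples and all default parameters are hypothetical choices, and Python threads are used purely to show the unsynchronized access pattern, not to gain speed.

```python
import random
import threading

def hogwild_sgd(examples, dim, n_threads=4, epochs=20, lr=0.05, seed=0):
    """Toy lock-free SGD for least squares on sparse examples.

    `examples` is a list of (x, y) pairs where x is a dict {index: value}.
    All threads read and write the shared list `w` without any lock, so
    updates may occasionally overwrite each other, as in the entry above.
    """
    w = [0.0] * dim  # shared parameter vector, deliberately unprotected

    def worker(thread_id):
        rng = random.Random(seed + thread_id)
        for _ in range(epochs * len(examples) // n_threads):
            x, y = examples[rng.randrange(len(examples))]
            error = sum(w[i] * v for i, v in x.items()) - y
            for i, v in x.items():
                # Unsynchronized read-modify-write touching only the
                # coordinates that are non-zero in this example.
                w[i] -= lr * error * v

    threads = [threading.Thread(target=worker, args=(t,)) for t in range(n_threads)]
    for t in threads:
        t.start()
    for t in threads:
        t.join()
    return w

# Usage: each example touches a single coordinate, so collisions are rare.
target = [1.0, -2.0, 0.5, 3.0, 0.0]
data = [({i % 5: 1.0}, target[i % 5]) for i in range(50)]
weights = hogwild_sgd(data, dim=5)
```

The sparsity of each update is what keeps the occasional overwrite harmless in this sketch: two threads rarely touch the same coordinate at the same moment.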
Hollow Heap We introduce the hollow heap, a very simple data structure with the same amortized efficiency as the classical Fibonacci heap. All heap operations except delete and delete-min take $O(1)$ time, worst case as well as amortized; delete and delete-min take $O(\log n)$ amortized time on a heap of $n$ items. Hollow heaps are by far the simplest structure to achieve this. Hollow heaps combine two novel ideas: the use of lazy deletion and re-insertion to do decrease-key operations, and the use of a dag (directed acyclic graph) instead of a tree or set of trees to represent a heap. Lazy deletion produces hollow nodes (nodes without items), giving the data structure its name. Holographic Neural Architecture(HNA) Representation learning is at the heart of what makes deep learning effective. In this work, we introduce a new framework for representation learning that we call ‘Holographic Neural Architectures’ (HNAs). In the same way that an observer can experience the 3D structure of a holographed object by looking at its hologram from several angles, HNAs derive Holographic Representations from the training set. These representations can then be explored by moving along a continuous bounded single dimension. We show that HNAs can be used to make generative networks, state-of-the-art regression models and that they are inherently highly resistant to noise. Finally, we argue that because of their denoising abilities and their capacity to generalize well from very few examples, models based upon HNAs are particularly well suited for biological applications where training examples are rare or noisy. Holonomic Gradient Method(HGM) The holonomic gradient method introduced by Nakayama et al. (2011) presents a new methodology for evaluating normalizing constants of probability distributions and for obtaining the maximum likelihood estimate of a statistical model. 
The method utilizes partial differential equations satisfied by the normalizing constant and is based on the Gröbner basis theory for the ring of differential operators. In this talk we give an introduction to this new methodology. The method has already proved to be useful for problems in directional statistics and in classical multivariate distribution theory involving hypergeometric functions of matrix arguments. hgm Holt-Winters double exponential smoothing This method is used when the data shows a trend. Exponential smoothing with a trend works much like simple smoothing except that two components must be updated each period – level and trend. The level is a smoothed estimate of the value of the data at the end of each period. The trend is a smoothed estimate of average growth at the end of each period. http://…–the-holt-winters-forecasting-method.pdf Holt-Winters Method(HW) Holt (1957) and Winters (1960) extended Holt’s method to capture seasonality. The Holt-Winters seasonal method comprises the forecast equation and three smoothing equations – one for the level $\ell_t$, one for trend $b_t$, and one for the seasonal component denoted by $s_t$, with smoothing parameters $\alpha$, $\beta^*$ and $\gamma$. We use $m$ to denote the period of the seasonality, i.e., the number of seasons in a year. For example, for quarterly data $m=4$, and for monthly data $m=12$. There are two variations to this method that differ in the nature of the seasonal component. The additive method is preferred when the seasonal variations are roughly constant through the series, while the multiplicative method is preferred when the seasonal variations are changing proportional to the level of the series. With the additive method, the seasonal component is expressed in absolute terms in the scale of the observed series, and in the level equation the series is seasonally adjusted by subtracting the seasonal component. Within each year the seasonal component will add up to approximately zero. 
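The additive variant just described can be sketched as follows, using the usual level, trend and seasonal recursions with smoothing parameters alpha, beta and gamma and season length m. The function name and the initialization (level from the mean of the first season, trend from the difference between the first two seasonal means, seasonal terms from first-season deviations) are assumptions made for this illustration; other initializations are common.

```python
def holt_winters_additive(y, m, alpha, beta, gamma, horizon):
    """Additive Holt-Winters forecasts for `horizon` steps past the end of `y`.

    Recursions used (t indexes observations, m is the season length):
      level_t  = alpha * (y_t - s_{t-m}) + (1 - alpha) * (level_{t-1} + trend_{t-1})
      trend_t  = beta * (level_t - level_{t-1}) + (1 - beta) * trend_{t-1}
      s_t      = gamma * (y_t - level_{t-1} - trend_{t-1}) + (1 - gamma) * s_{t-m}
    The initialization below is a simple heuristic, assumed for this sketch.
    """
    if len(y) < 2 * m:
        raise ValueError("need at least two full seasons of data")
    first_mean = sum(y[:m]) / m
    second_mean = sum(y[m:2 * m]) / m
    level = first_mean
    trend = (second_mean - first_mean) / m
    season = [y[i] - first_mean for i in range(m)]  # season[t % m] holds s_{t-m}

    for t in range(m, len(y)):
        s_prev = season[t % m]
        prev_level, prev_trend = level, trend
        # The series is seasonally adjusted by subtracting the seasonal component.
        level = alpha * (y[t] - s_prev) + (1 - alpha) * (prev_level + prev_trend)
        trend = beta * (level - prev_level) + (1 - beta) * prev_trend
        season[t % m] = gamma * (y[t] - prev_level - prev_trend) + (1 - gamma) * s_prev

    n = len(y)
    return [level + h * trend + season[(n + h - 1) % m] for h in range(1, horizon + 1)]

# Usage: three years of quarterly data (m = 4) with a fixed seasonal pattern.
quarterly = [10.0, 20.0, 30.0, 40.0] * 3
forecast = holt_winters_additive(quarterly, m=4, alpha=0.3, beta=0.1, gamma=0.2, horizon=4)
```

For a series with no trend and a perfectly repeating pattern, the seasonal terms stay at their initial deviations, which sum to zero over one year, matching the remark above.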
With the multiplicative method, the seasonal component is expressed in relative terms (percentages) and the series is seasonally adjusted by dividing through by the seasonal component. Within each year, the seasonal component will sum up to approximately m. Homebrew Homebrew has made extensive use of GitHub to expand the support of several packages through user contributions. In 2010, Homebrew was the third-most-forked repository on GitHub. In 2012, Homebrew had the largest number of new contributors on GitHub. In 2013, Homebrew had both the largest number of contributors and issues closed of any project on GitHub. Homebrew has spawned several sub-projects such as Linuxbrew, which is a Linux port, Homebrew Cask, which builds upon Homebrew and focuses on the installation of GUI applications, and ‘taps’ dedicated to specific areas or programming languages like PHP. How to Install and Use Homebrew Homographic Adaptation This paper presents a self-supervised framework for training interest point detectors and descriptors suitable for a large number of multiple-view geometry problems in computer vision. As opposed to patch-based neural networks, our fully-convolutional model operates on full-sized images and jointly computes pixel-level interest point locations and associated descriptors in one forward pass. We introduce Homographic Adaptation, a multi-scale, multi-homography approach for boosting interest point detection accuracy and performing cross-domain adaptation (e.g., synthetic-to-real). Our model, when trained on the MS-COCO generic image dataset using Homographic Adaptation, is able to repeatedly detect a much richer set of interest points than the initial pre-adapted deep model and any other traditional corner detector. The final system gives rise to strong interest point repeatability on the HPatches dataset and outperforms traditional descriptors such as ORB and SIFT on point matching accuracy and on the task of homography estimation. 
Homoscedasticity In statistics, a sequence or a vector of random variables is homoscedastic if all random variables in the sequence or vector have the same finite variance. This is also known as homogeneity of variance. The complementary notion is called heteroscedasticity. The spellings homoskedasticity and heteroskedasticity are also frequently used. Hopfield Network A Hopfield network is a form of recurrent artificial neural network invented by John Hopfield in 1982. Hopfield nets serve as content-addressable memory systems with binary threshold nodes. They are guaranteed to converge to a local minimum, but convergence to a false pattern (wrong local minimum) rather than the stored pattern (expected local minimum) can occur. Hopfield networks also provide a model for understanding human memory. HopsFS Recent improvements in both the performance and scalability of shared-nothing, transactional, in-memory NewSQL databases have reopened the research question of whether distributed metadata for hierarchical file systems can be managed using commodity databases. In this paper, we introduce HopsFS, a next generation distribution of the Hadoop Distributed File System (HDFS) that replaces HDFS single node in-memory metadata service, with a distributed metadata service built on a NewSQL database. By removing the metadata bottleneck, HopsFS improves capacity and throughput compared to HDFS. HopsFS can store 24 times more metadata than HDFS. We also provide public, fully reproducible experiments based on a workload trace from Spotify that show HopsFS has 2.6 times the throughput of Apache HDFS, lower latency for greater than 400 concurrent clients, and no downtime during failover. Finally, and most significantly, HopsFS allows metadata to be exported to external systems, analyzed or searched online, and easily extended. Horn I introduce a new distributed system for effective training and regularizing of Large-Scale Neural Networks on distributed computing architectures. 
The experiments demonstrate the effectiveness of flexible model partitioning and parallelization strategies based on neuron-centric computation model, with an implementation of the collective and parallel dropout neural networks training. Experiments are performed on MNIST handwritten digits classification including results. Horn Implication Counterexamples(Horn-ICE) Horn-ICE Learning for Synthesizing Invariants and Contracts HornConcerto Graph representations of large knowledge bases may comprise billions of edges. Usually built upon human-generated ontologies, several knowledge bases do not feature declared ontological rules and are far from being complete. Current rule mining approaches rely on schemata or store the graph in-memory, which can be unfeasible for large graphs. In this paper, we introduce HornConcerto, an algorithm to discover Horn clauses in large graphs without the need of a schema. Using a standard fact-based confidence score, we can mine close Horn rules having an arbitrary body size. We show that our method can outperform existing approaches in terms of runtime and memory consumption and mine high-quality rules for the link prediction task, achieving state-of-the-art results on a widely-used benchmark. Moreover, we find that rules alone can perform inference significantly faster than embedding-based methods and achieve accuracies on link prediction comparable to resource-demanding approaches such as Markov Logic Networks. Horovod Training modern deep learning models requires large amounts of computation, often provided by GPUs. Scaling computation from one GPU to many can enable much faster training and research progress but entails two complications. First, the training library must support inter-GPU communication. Depending on the particular methods employed, this communication may entail anywhere from negligible to significant overhead. Second, the user must modify his or her training code to take advantage of inter-GPU communication. 
Depending on the training library’s API, the modification required may be either significant or minimal. Existing methods for enabling multi-GPU training under the TensorFlow library entail non-negligible communication overhead and require users to heavily modify their model-building code, leading many researchers to avoid the whole mess and stick with slower single-GPU training. In this paper we introduce Horovod, an open source library that improves on both obstructions to scaling: it employs efficient inter-GPU communication via ring reduction and requires only a few lines of modification to user code, enabling faster, easier distributed training in TensorFlow. Horovod is available under the Apache 2.0 license at https://…/horovod. HorseRule The HorseRule model is a flexible tree based Bayesian regression method for linear and nonlinear regression and classification described in Nalenz & Villani (2017) . horserule Horseshoe Estimator This paper proposes a new approach to sparse-signal detection called the horseshoe estimator. We show that the horseshoe is a close cousin of the lasso in that it arises from the same class of multivariate scale mixtures of normals, but that it is almost universally superior to the double-exponential prior at handling sparsity. A theoretical framework is proposed for understanding why the horseshoe is a better default ‘sparsity’ estimator than those that arise from powered-exponential priors. Comprehensive numerical evidence is presented to show that the difference in performance can often be large. Most importantly, we show that the horseshoe estimator corresponds quite closely to the answers one would get if one pursued a full Bayesian model-averaging approach using a ‘two-groups’ model: a point mass at zero for noise, and a continuous density for signals. Surprisingly, this correspondence holds both for the estimator itself and for the classification rule induced by a simple threshold applied to the estimator. 
We show how the resulting thresholded horseshoe can also be viewed as a novel Bayes multiple-testing procedure. horseshoe Horseshoe Regularization Feature subset selection arises in many high-dimensional applications in machine learning and statistics, such as compressed sensing and genomics. The $\ell_0$ penalty is ideal for this task, the caveat being it requires the NP-hard combinatorial evaluation of all models. A recent area of considerable interest is to develop efficient algorithms to fit models with a non-convex $\ell_\gamma$ penalty for $\gamma\in (0,1)$, which results in sparser models than the convex $\ell_1$ or lasso penalty, but is harder to fit. We propose an alternative, termed the horseshoe regularization penalty for feature subset selection, and demonstrate its theoretical and computational advantages. The distinguishing feature from existing non-convex optimization approaches is a full probabilistic representation of the penalty as the negative of the logarithm of a suitable prior, which in turn enables an efficient expectation-maximization algorithm for optimization and MCMC for uncertainty quantification. In synthetic and real data, the resulting algorithm provides better statistical performance, and the computation requires a fraction of time of state of the art non-convex solvers. Hospital Residents Problem ➘ “Stable Marriage Problem” Hot Deck Imputation This method sorts respondents and non-respondents into a number of imputation subsets according to a user-specified set of covariates. An imputation subset comprises cases with the same values as those of the user-specified covariates. Missing values are then replaced with values taken from matching respondents (i.e. respondents that are similar with respect to the covariates). If there is more than one matching respondent for any particular non-respondent, the user has two choices: 1. 
The first respondent’s value as counted from the missing entry downwards within the imputation subset is used to impute. The reason for this is that the first respondent’s value may be closer in time to the case that has the missing value. For example, if cases are entered according to the order in which they occur, there may possibly be some type of time effect in some studies. 2. A respondent’s value is randomly selected from within the imputation subset. If a matching respondent does not exist in the initial imputation class, the subset will be collapsed by one level starting with the last variable that was selected as a sort variable, or until a match can be found. Note that if no matching respondent is found, even after all of the sort variables have been collapsed, three options are available: 1. Re-specify new sort variables: The user can specify up to five sort variables. 2. Perform random overall imputation: Where the missing value will be replaced with a value randomly selected from the observed values in that variable. 3. Do not impute the missing value: SOLAS will not impute any missing values for which no matching respondent is found. HotDeckImputation,hot.deck Hot Spot Analysis Also known as Getis-Ord Gi* – The resultant z-scores and p-values tell you where features with either high or low values cluster spatially. This tool works by looking at each feature within the context of neighboring features. A feature with a high value is interesting but may not be a statistically significant hot spot. To be a statistically significant hot spot, a feature will have a high value and be surrounded by other features with high values as well. The local sum for a feature and its neighbors is compared proportionally to the sum of all features; when the local sum is very different from the expected local sum, and that difference is too large to be the result of random chance, a statistically significant z-score results. 
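The comparison of a local sum with its expected value can be made concrete with the commonly used form of the Gi* statistic. The sketch below is an assumption-laden illustration rather than the code of any particular GIS tool: the function name, the dense list-of-lists weight matrix and the convention that each feature counts as its own neighbour are choices made for the example.

```python
import math

def getis_ord_gi_star(values, weights):
    """Gi* z-score for every feature.

    `values[j]` is the attribute of feature j and `weights[i][j]` is the
    spatial weight between features i and j (here including i itself).
    The statistic is undefined if all values are equal or if a feature's
    neighbourhood covers every feature with equal weight.
    """
    n = len(values)
    mean = sum(values) / n
    std = math.sqrt(sum(v * v for v in values) / n - mean * mean)
    scores = []
    for i in range(n):
        w = weights[i]
        w_sum = sum(w)
        w_sq_sum = sum(wij * wij for wij in w)
        local_sum = sum(wij * x for wij, x in zip(w, values))
        expected = mean * w_sum  # local sum expected under no clustering
        spread = std * math.sqrt((n * w_sq_sum - w_sum * w_sum) / (n - 1))
        scores.append((local_sum - expected) / spread)
    return scores

# Usage: eight features on a line; each is a neighbour of itself and the adjacent ones.
vals = [1, 1, 1, 1, 1, 9, 9, 9]
wts = [[1.0 if abs(i - j) <= 1 else 0.0 for j in range(8)] for i in range(8)]
z = getis_ord_gi_star(vals, wts)
```

In this toy data the features near the run of 9s receive positive scores and those inside the run of 1s receive negative ones.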
The Gi* statistic returned for each feature in the dataset is a z-score. For statistically significant positive z-scores, the larger the z-score is, the more intense clustering of high values (hot spot). For statistically significant negative z-scores, the smaller the z-score is, the more intense the clustering of low values (cold spot). When to use: Results aren’t reliable with less than 30 features. Applications can be found in crime analysis, epidemiology, voting pattern analysis, economic geography, retail analysis, traffic incident analysis, and demographics. Examples: Where is the disease outbreak concentrated? – Where are kitchen fires a larger than expected proportion of all residential fires? – Where should the evacuation sites be located? – Where/When do peak intensities occur? How Hot Spot Analysis works Houdini Generating adversarial examples is a critical step for evaluating and improving the robustness of learning machines. So far, most existing methods only work for classification and are not designed to alter the true performance measure of the problem at hand. We introduce a novel flexible approach named Houdini for generating adversarial examples specifically tailored for the final performance measure of the task considered, be it combinatorial and non-decomposable. We successfully apply Houdini to a range of applications such as speech recognition, pose estimation and semantic segmentation. In all cases, the attacks based on Houdini achieve higher success rate than those based on the traditional surrogates used to train the models while using a less perceptible adversarial perturbation. Huber Loss In statistics, the Huber loss is a loss function used in robust regression, that is less sensitive to outliers in data than the squared error loss. A variant for classification is also sometimes used. 
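A short sketch of the Huber loss in its standard form: quadratic for residuals up to a threshold and linear beyond it, which is what makes it less sensitive to outliers than the squared error. The threshold name `delta`, its default of 1.0 and the helper names are assumptions for this illustration and are not taken from the entry or from any specific package.

```python
def huber_loss(residual, delta=1.0):
    """Huber loss of a single residual with threshold `delta`."""
    a = abs(residual)
    if a <= delta:
        return 0.5 * a * a          # quadratic region, like squared error
    return delta * (a - 0.5 * delta)  # linear region, grows slowly for outliers

def mean_huber_loss(y_true, y_pred, delta=1.0):
    """Average Huber loss over paired observations and predictions."""
    losses = [huber_loss(t - p, delta) for t, p in zip(y_true, y_pred)]
    return sum(losses) / len(losses)

# Usage: the third prediction is far off, but it contributes only linearly.
average = mean_huber_loss([1.0, 2.0, 3.0], [1.0, 2.0, 13.0], delta=1.0)
```

With a residual of 10 and `delta = 1`, the squared-error term would be 50 while the Huber term is 9.5, so a single outlier moves the average far less.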
hqreg Hubs and Authorities ➘ “Hyperlink-Induced Topic Search” Hu-Fu Recently, Deep Learning (DL), especially Convolutional Neural Network (CNN), develops rapidly and is applied to many tasks, such as image classification, face recognition, image segmentation, and human detection. Due to its superior performance, DL-based models have a wide range of application in many areas, some of which are extremely safety-critical, e.g. intelligent surveillance and autonomous driving. Due to the latency and privacy problem of cloud computing, embedded accelerators are popular in these safety-critical areas. However, the robustness of the embedded DL system might be harmed by inserting hardware/software Trojans into the accelerator and the neural network model, since the accelerator and deploy tool (or neural network model) are usually provided by third-party companies. Fortunately, inserting hardware Trojans can only achieve inflexible attack, which means that hardware Trojans can easily break down the whole system or exchange two outputs, but can’t make CNN recognize unknown pictures as targets. Though inserting software Trojans has more freedom of attack, it often requires tampering input images, which is not easy for attackers. So, in this paper, we propose a hardware-software collaborative attack framework to inject hidden neural network Trojans, which works as a back-door without requiring manipulating input images and is flexible for different scenarios. We test our attack framework for image classification and face recognition tasks, and get attack success rate of 92.6% and 100% on CIFAR10 and YouTube Faces, respectively, while keeping almost the same accuracy as the unattacked model in the normal mode. In addition, we show a specific attack scenario in which a face recognition system is attacked and gives a specific wrong answer. Human And Machine co-LEarning Technique(HAMLET) Efficient label acquisition processes are key to obtaining robust classifiers. 
However, data labeling is often challenging and subject to high levels of label noise. This can arise even when classification targets are well defined, if instances to be labeled are more difficult than the prototypes used to define the class, leading to disagreements among the expert community. Here, we enable efficient training of deep neural networks. From low-confidence labels, we iteratively improve their quality by simultaneous learning of machines and experts. We call it Human And Machine co-LEarning Technique (HAMLET). Throughout the process, experts become more consistent, while the algorithm provides them with explainable feedback for confirmation. HAMLET uses a neural embedding function and a memory module filled with diverse reference embeddings from different classes. Its output includes classification labels and highly relevant reference embeddings as explanation. We took the study of brain monitoring at intensive care unit (ICU) as an application of HAMLET on continuous electroencephalography (cEEG) data. Although cEEG monitoring yields large volumes of data, labeling costs and difficulty make it hard to build a classifier. Additionally, while experts agree on the labels of clear-cut examples of cEEG patterns, labeling many real-world cEEG data can be extremely challenging. Thus, a large minority of sequences might be mislabeled. HAMLET has shown significant performance gain against deep learning and other baselines, increasing accuracy from 7.03% to 68.75% on challenging inputs. Besides improved performance, clinical experts confirmed the interpretability of those reference embeddings in helping explaining the classification results by HAMLET. Human Group Optimizer(HGO) A large number of optimization algorithms have been developed by researchers to solve a variety of complex problems in operations management area. We present a novel optimization algorithm belonging to the class of swarm intelligence optimization methods. 
The algorithm mimics the decision making process of human groups and exploits the dynamics of this process as an optimization tool for combinatorial problems. In order to achieve this aim, a continuous-time Markov process is proposed to describe the behavior of a population of socially interacting agents, modelling how humans in a group modify their opinions driven by self-interest and consensus seeking. As in the case of a collection of spins, the dynamics of such a system is characterized by a phase transition from low to high values of the overall consensus (magnetization). We recognize this phase transition as being associated with the emergence of a collective superior intelligence of the population. While this state is active, a cooling schedule is applied to bring agents closer and closer to the optimal solution while they perform their random walk on the fitness landscape. A comparison with simulated annealing as well as with a multi-agent version of the simulated annealing is presented in terms of efficacy in finding good solutions on an NK – Kauffman landscape. In all cases our method outperforms the others, particularly in the presence of limited knowledge of the agent. Human-in-the-loop Artificial Intelligence(HIT-AI) Little by little, newspapers are revealing the bright future that Artificial Intelligence (AI) is building. Intelligent machines will help everywhere. However, this bright future has a dark side: a dramatic job market contraction before its unpredictable transformation. Hence, in the near future, large numbers of job seekers will need financial support while catching up with these novel unpredictable jobs. This possible job market crisis has an antidote inside. In fact, the rise of AI is sustained by the biggest knowledge theft of the recent years. Learning AI machines are extracting knowledge from unaware skilled or unskilled workers by analyzing their interactions. By passionately doing their jobs, these workers are digging their own graves. 
In this paper, we propose Human-in-the-loop Artificial Intelligence (HIT-AI) as a fairer paradigm for Artificial Intelligence systems. HIT-AI will reward aware and unaware knowledge producers with a different scheme: decisions of AI systems generating revenues will repay the legitimate owners of the knowledge used for taking those decisions. As modern Robin Hoods, HIT-AI researchers should fight for a fairer Artificial Intelligence that gives back what it steals. Human-Machine Inference Network(HuMaIN) The emerging paradigm of Human-Machine Inference Networks (HuMaINs) combines complementary cognitive strengths of humans and machines in an intelligent manner to tackle various inference tasks and achieves higher performance than either humans or machines by themselves. While inference performance optimization techniques for human-only or sensor-only networks are quite mature, HuMaINs require novel signal processing and machine learning solutions. In this paper, we present an overview of the HuMaINs architecture with a focus on three main issues that include architecture design, inference algorithms including security/privacy challenges, and application areas/use cases. Hurst Coefficient ➘ “Hurst Exponent” Hurst Exponent The Hurst exponent is used as a measure of long-term memory of time series. It relates to the autocorrelations of the time series, and the rate at which these decrease as the lag between pairs of values increases. Studies involving the Hurst exponent were originally developed in hydrology for the practical matter of determining optimum dam sizing for the Nile river’s volatile rain and drought conditions that had been observed over a long period of time. The name ‘Hurst exponent’, or ‘Hurst coefficient’, derives from Harold Edwin Hurst (1880-1978), who was the lead researcher in these studies; the use of the standard notation H for the coefficient relates to his name also. 
In fractal geometry, the generalized Hurst exponent has been denoted by H or Hq in honor of both Harold Edwin Hurst and Ludwig Otto Hölder (1859-1937) by Benoît Mandelbrot (1924-2010). H is directly related to fractal dimension, D, and is a measure of a data series’ ‘mild’ or ‘wild’ randomness. The Hurst exponent is referred to as the ‘index of dependence’ or ‘index of long-range dependence’. It quantifies the relative tendency of a time series either to regress strongly to the mean or to cluster in a direction. A value H in the range 0.5-1 indicates a time series with long-term positive autocorrelation, meaning both that a high value in the series will probably be followed by another high value and that the values a long time into the future will also tend to be high. A value in the range 0 – 0.5 indicates a time series with long-term switching between high and low values in adjacent pairs, meaning that a single high value will probably be followed by a low value and that the value after that will tend to be high, with this tendency to switch between high and low values lasting a long time into the future. A value of H=0.5 can indicate a completely uncorrelated series, but in fact it is the value applicable to series for which the autocorrelations at small time lags can be positive or negative but where the absolute values of the autocorrelations decay exponentially quickly to zero. This is in contrast to the typical power-law decay for the 0.5 < H < 1 and 0 < H < 0.5 cases. HVARX The Vector AutoRegressive (VAR) model is fundamental to the study of multivariate time series. Although VAR models are intensively investigated by many researchers, practitioners often show more interest in analyzing VARX models that incorporate the impact of unmodeled exogenous variables (X) into the VAR. However, since the parameter space grows quadratically with the number of time series, estimation quickly becomes challenging. 
While several proposals have been made to sparsely estimate large VAR models, the estimation of large VARX models is under-explored. Moreover, typically these sparse proposals involve a lasso-type penalty and do not incorporate lag selection into the estimation procedure. As a consequence, the resulting models may be difficult to interpret. In this paper, we propose a lag-based hierarchically sparse estimator, called ‘HVARX’, for large VARX models. We illustrate the usefulness of HVARX on a cross-category management marketing application. Our results show how it provides a highly interpretable model, and improves out-of-sample forecast accuracy compared to a lasso-type approach. Hy Hy is a Lisp dialect that converts its structure into Python’s abstract syntax tree. It is to Python what LFE is to Erlang. This provides developers from many backgrounds with the following: · A Lisp that feels very Pythonic · A great way to use Lisp’s crazy powers but in the wide world of Python’s libraries · A great way to start exploring Lisp, from the comfort of Python · A pleasant language that has a lot of neat ideas 🙂 Hybrid We study the problem of personalized, interactive tag recommendation for Flickr: While a user enters/selects new tags for a particular picture, the system suggests related tags to her, based on the tags that she or other people have used in the past along with (some of) the tags already entered. The suggested tags are dynamically updated with every additional tag entered/selected. We describe a new algorithm, called Hybrid, which can be applied to this problem, and show that it outperforms previous algorithms. It has only a single tunable parameter, which we found to be very robust. Hybrid Ant Colony Optimization Algorithm(HACO) In this paper, we propose a Hybrid Ant Colony Optimization algorithm (HACO) for the Next Release Problem (NRP). 
NRP, an NP-hard problem in requirements engineering, is to balance customer requests, resource constraints, and requirement dependencies by requirement selection. Inspired by the successes of Ant Colony Optimization algorithms (ACO) for solving NP-hard problems, we design our HACO to approximately solve NRP. Similar to traditional ACO algorithms, multiple artificial ants are employed to construct new solutions. During the solution construction phase, both pheromone trails and neighborhood information are used to determine the choices of every ant. In addition, a local search (first found hill climbing) is incorporated into HACO to improve the solution quality. Extensive experiments on typical NRP test instances show that HACO outperforms the existing algorithms (GRASP and simulated annealing) in terms of both solution quality and running time. Hybrid Artificial Intelligence ➘ “Hybrid Intelligent System” Hybrid Consensus Alternating Direction Method of Multipliers(H-CADMM) The present work introduces the hybrid consensus alternating direction method of multipliers (H-CADMM), a novel framework for optimization over networks which unifies existing distributed optimization approaches, including the centralized and the decentralized consensus ADMM. H-CADMM provides a flexible tool that leverages the underlying graph topology in order to achieve a desirable sweet-spot between node-to-node communication overhead and rate of convergence, thereby alleviating known limitations of both C-CADMM and D-CADMM. A rigorous analysis of the novel method establishes linear convergence rate, and also guides the choice of parameters to optimize this rate. The novel hybrid update rules of H-CADMM lend themselves to ‘in-network acceleration’ that is shown to effect a considerable (and essentially ‘free-of-charge’) performance boost over the fully decentralized ADMM. 
Comprehensive numerical tests validate the analysis and showcase the potential of the method in efficiently tackling widely useful learning tasks. Hybrid Contextualized Sentiment Classifier(HCSC) The use of user/product information in sentiment analysis is important, especially for cold-start users/products, whose number of reviews is very limited. However, current models do not deal with the cold-start problem which is typical in review websites. In this paper, we present Hybrid Contextualized Sentiment Classifier (HCSC), which contains two modules: (1) a fast word encoder that returns word vectors embedded with short and long range dependency features; and (2) Cold-Start Aware Attention (CSAA), an attention mechanism that considers the existence of cold-start problem when attentively pooling the encoded word vectors. HCSC introduces shared vectors that are constructed from similar users/products, and are used when the original distinct vectors do not have sufficient information (i.e. cold-start). This is decided by a frequency-guided selective gate vector. Our experiments show that in terms of RMSE, HCSC performs significantly better than previous models on well-known datasets, despite having less complexity, and thus can be trained much faster. More importantly, our model performs significantly better than previous models when the training data is sparse and has cold-start problems. Hybrid Filter-Wrapper Feature Selection Method HybridFS Hybrid Intelligent System Hybrid intelligent system denotes a software system which employs, in parallel, a combination of methods and techniques from artificial intelligence subfields such as: · Neuro-fuzzy systems · hybrid connectionist-symbolic models · Fuzzy expert systems · Connectionist expert systems · Evolutionary neural networks · Genetic fuzzy systems · Rough fuzzy hybridization · Reinforcement learning with fuzzy, neural, or evolutionary methods as well as symbolic reasoning methods. 
From the cognitive science perspective, every natural intelligent system is hybrid because it performs mental operations on both the symbolic and subsymbolic levels. For the past few years there has been an increasing discussion of the importance of A.I. Systems Integration. This is based on the notion that simple and specific AI systems have already been created (such as systems for computer vision, speech synthesis, etc., or software that employs some of the models mentioned above) and that now is the time for integration to create broad AI systems. Proponents of this approach are researchers such as Marvin Minsky, Ron Sun, Aaron Sloman, and Michael A. Arbib. An example hybrid is a hierarchical control system in which the lowest, reactive layers are sub-symbolic. The higher layers, having relaxed time constraints, are capable of reasoning from an abstract world model and performing planning. Intelligent systems usually rely on hybrid reasoning systems, which include induction, deduction, abduction and reasoning by analogy. Hybrid Monte Carlo In mathematics and physics, the hybrid Monte Carlo algorithm, also known as Hamiltonian Monte Carlo, is a Markov chain Monte Carlo method for obtaining a sequence of random samples from a probability distribution for which direct sampling is difficult. This sequence can be used to approximate the distribution (i.e., to generate a histogram), or to compute an integral (such as an expected value). It differs from the Metropolis-Hastings algorithm by reducing the correlation between successive sampled states by using a Hamiltonian evolution between states and additionally by targeting states with a higher acceptance criterion than the observed probability distribution. This causes it to converge more quickly to the absolute probability distribution. It was devised by Simon Duane, A.D. Kennedy, Brian Pendleton and Duncan Roweth in 1987. 
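The Hamiltonian evolution between states described in the Hybrid Monte Carlo entry can be sketched for a one-dimensional target. This is a minimal illustration rather than a reference implementation: the function name `hmc_sample`, the step size, the number of leapfrog steps and the standard normal target are all assumptions chosen for the example.

```python
import math
import random


def hmc_sample(potential, grad_potential, x0, n_samples,
               step=0.2, n_leapfrog=20, seed=0):
    """Draw 1-D samples with hybrid (Hamiltonian) Monte Carlo.

    potential(x) is U(x) = -log p(x) up to an additive constant and
    grad_potential(x) is its derivative. All names are illustrative.
    """
    rng = random.Random(seed)
    samples = []
    x = x0
    for _ in range(n_samples):
        # Resample an auxiliary momentum from a standard normal.
        p = rng.gauss(0.0, 1.0)
        x_new, p_new = x, p

        # Leapfrog integration of the Hamiltonian dynamics.
        p_new -= 0.5 * step * grad_potential(x_new)
        for i in range(n_leapfrog):
            x_new += step * p_new
            if i < n_leapfrog - 1:
                p_new -= step * grad_potential(x_new)
        p_new -= 0.5 * step * grad_potential(x_new)

        # Metropolis accept/reject step on the total energy H = U + K.
        h_old = potential(x) + 0.5 * p * p
        h_new = potential(x_new) + 0.5 * p_new * p_new
        if rng.random() < math.exp(min(0.0, h_old - h_new)):
            x = x_new
        samples.append(x)
    return samples


def normal_potential(x):
    """U(x) for a standard normal target (assumed example target)."""
    return 0.5 * x * x


def normal_grad(x):
    """dU/dx for the standard normal target."""
    return x


if __name__ == "__main__":
    draws = hmc_sample(normal_potential, normal_grad, x0=0.0, n_samples=5000)
    mean = sum(draws) / len(draws)
    var = sum((d - mean) ** 2 for d in draws) / len(draws)
    print("sample mean:", mean, "sample variance:", var)
```

Because each proposal follows a long simulated trajectory instead of a small random step, successive samples are much less correlated than in a random-walk Metropolis chain, at the cost of several gradient evaluations per sample.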
➚ “Hamiltonian Monte Carlo” Hybrid Transactional / Analytical Processing(HTAP) Hybrid Transactional/Analytical Processing (HTAP) is a term used to describe the capability of a single database that can perform both online transaction processing (OLTP) and online analytical processing (OLAP) for the purpose of real-time operational intelligence processing. The term was created by Gartner, Inc., a technology research firm. HybridNet Robotic systems continuously interact with complex dynamical systems in the physical world. Reliable predictions of spatiotemporal evolution of these dynamical systems, with limited knowledge of system dynamics, are crucial for autonomous operation. In this paper, we present HybridNet, a framework that integrates data-driven deep learning and model-driven computation to reliably predict spatiotemporal evolution of dynamical systems even with inexact knowledge of their parameters. A data-driven deep neural network (DNN) with Convolutional LSTM (ConvLSTM) as the backbone is employed to predict the time-varying evolution of the external forces/perturbations. On the other hand, the model-driven computation is performed using Cellular Neural Network (CeNN), a neuro-inspired algorithm to model dynamical systems defined by coupled partial differential equations (PDEs). CeNN converts the intricate numerical computation into a series of convolution operations, enabling a trainable PDE solver. With a feedback control loop, HybridNet can learn the physical parameters governing the system’s dynamics in real-time, and accordingly adapt the computation models to enhance prediction accuracy for time-evolving dynamical systems. The experimental results on two dynamical systems, namely, heat convection-diffusion system, and fluid dynamical system, demonstrate that the HybridNet produces higher accuracy than the state-of-the-art deep learning based approach. 
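The statement that a PDE solver can be expressed as a series of convolution operations can be illustrated with plain heat diffusion. The sketch below is not the CeNN or HybridNet code: it is an ordinary explicit finite-difference step, written as a convolution with the standard 5-point Laplacian stencil on a periodic grid. The function names and the coefficient `alpha` (diffusivity times time step, with unit grid spacing) are assumptions made for the example.

```python
# 5-point finite-difference Laplacian stencil (unit grid spacing).
LAPLACIAN = [
    [0.0, 1.0, 0.0],
    [1.0, -4.0, 1.0],
    [0.0, 1.0, 0.0],
]


def convolve2d_periodic(grid, kernel):
    """Convolve a 2-D grid with a small square kernel, wrapping at the edges."""
    rows, cols = len(grid), len(grid[0])
    k = len(kernel) // 2
    out = [[0.0] * cols for _ in range(rows)]
    for r in range(rows):
        for c in range(cols):
            acc = 0.0
            for dr in range(-k, k + 1):
                for dc in range(-k, k + 1):
                    acc += kernel[dr + k][dc + k] * grid[(r + dr) % rows][(c + dc) % cols]
            out[r][c] = acc
    return out


def diffusion_step(u, alpha=0.1):
    """One explicit Euler step of du/dt = D * laplacian(u).

    alpha = D * dt (hypothetical parameter); alpha <= 0.25 keeps the
    explicit scheme stable on a unit grid.
    """
    lap = convolve2d_periodic(u, LAPLACIAN)
    return [
        [u[r][c] + alpha * lap[r][c] for c in range(len(u[0]))]
        for r in range(len(u))
    ]


if __name__ == "__main__":
    # A single hot cell in the middle of a 5x5 grid.
    u = [[0.0] * 5 for _ in range(5)]
    u[2][2] = 1.0
    for _ in range(10):
        u = diffusion_step(u)
    for row in u:
        print(" ".join(f"{v:.3f}" for v in row))
```

In a trainable setting the stencil weights and `alpha` would be learnable parameters; here they are fixed to keep the example small.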
HybridSVD We propose a hybrid algorithm for the top-$n$ recommendation task that makes it possible to incorporate both user and item side information within the standard collaborative filtering approach. The algorithm extends PureSVD, one of the state-of-the-art latent factor models, by exploiting a generalized formulation of the singular value decomposition. This makes it possible to inherit key advantages of the classical algorithm such as a highly efficient Lanczos-based optimization procedure, minimal parameter tuning during a model selection phase and a quick folding-in computation to generate recommendations instantly even in a highly dynamic online environment. Within the generalized formulation itself we provide an efficient scheme for side information fusion which avoids undesirable computational overhead and addresses the scalability question. Evaluation of the model is performed in both standard and cold-start scenarios using the datasets with different sparsity levels. We demonstrate in which cases our approach outperforms conventional methods and also provide some intuition on when it may give no significant improvement. Hyperbolic Attention Network We introduce hyperbolic attention networks to endow neural networks with enough capacity to match the complexity of data with hierarchical and power-law structure. A few recent approaches have successfully demonstrated the benefits of imposing hyperbolic geometry on the parameters of shallow networks. We extend this line of work by imposing hyperbolic geometry on the activations of neural networks. This allows us to exploit hyperbolic geometry to reason about embeddings produced by deep networks. We achieve this by re-expressing the ubiquitous mechanism of soft attention in terms of operations defined for hyperboloid and Klein models. Our method shows improvements in terms of generalization on neural machine translation, learning on graphs and visual question answering tasks while keeping the neural representations compact. 
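The hyperboloid and Klein models named in the Hyperbolic Attention Network entry can be made concrete with a few lines of arithmetic. The sketch below only shows the standard geometry (lifting a Euclidean vector onto the hyperboloid, the geodesic distance, and the projection to the Klein disk). It is not the attention mechanism from the paper, and the function names are assumptions introduced for the example.

```python
import math


def lift_to_hyperboloid(v):
    """Map a Euclidean vector v to the point (sqrt(1 + |v|^2), v) on the hyperboloid."""
    x0 = math.sqrt(1.0 + sum(c * c for c in v))
    return [x0] + list(v)


def lorentz_inner(x, y):
    """Lorentzian inner product <x, y>_L = -x0*y0 + sum_i xi*yi."""
    return -x[0] * y[0] + sum(a * b for a, b in zip(x[1:], y[1:]))


def hyperboloid_distance(x, y):
    """Geodesic distance d(x, y) = arccosh(-<x, y>_L) on the hyperboloid."""
    # Clamp to 1.0 to guard against rounding slightly below the acosh domain.
    return math.acosh(max(1.0, -lorentz_inner(x, y)))


def to_klein(x):
    """Project a hyperboloid point to the Klein model: k_i = x_i / x_0."""
    return [c / x[0] for c in x[1:]]


if __name__ == "__main__":
    origin = lift_to_hyperboloid([0.0, 0.0])
    point = lift_to_hyperboloid([math.sinh(1.5), 0.0])
    print("distance from origin:", hyperboloid_distance(origin, point))
    print("Klein coordinates:", to_klein(point))
```

A distance-based attention score could then be some decreasing function of `hyperboloid_distance`, with aggregation carried out in Klein coordinates. That step is left out because its exact form is specific to the paper.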
Hyperbolic Neural Network Hyperbolic spaces have recently gained momentum in the context of machine learning due to their high capacity and tree-likeliness properties. However, the representational power of hyperbolic geometry is not yet on par with Euclidean geometry, mostly because of the absence of corresponding hyperbolic neural network layers. This makes it hard to use hyperbolic embeddings in downstream tasks. Here, we bridge this gap in a principled manner by combining the formalism of Möbius gyrovector spaces with the Riemannian geometry of the Poincaré model of hyperbolic spaces. As a result, we derive hyperbolic versions of important deep learning tools: multinomial logistic regression, feed-forward and recurrent neural networks such as gated recurrent units. This makes it possible to embed sequential data and perform classification in the hyperbolic space. Empirically, we show that, even if hyperbolic optimization tools are limited, hyperbolic sentence embeddings either outperform or are on par with their Euclidean variants on textual entailment and noisy-prefix recognition tasks. Hyperdata Hyperdata indicates data objects linked to other data objects in other places, as hypertext indicates text linked to other text in other places. Hyperdata enables formation of a web of data, evolving from the “data on the Web” that is not inter-related (or at least, not linked). In the same way that hypertext usually refers to the World Wide Web but is a broader term, hyperdata usually refers to the Semantic Web, but may also be applied more broadly to other data-linking technologies such as Microformats – including XHTML Friends Network. HyperDenseNet Recently, dense connections have attracted substantial attention in computer vision because they facilitate gradient flow and implicit deep supervision during training. 
Particularly, DenseNet, which connects each layer to every other layer in a feed-forward fashion, has shown impressive performances in natural image classification tasks. We propose HyperDenseNet, a 3D fully convolutional neural network that extends the definition of dense connectivity to multi-modal segmentation problems. Each imaging modality has a path, and dense connections occur not only between the pairs of layers within the same path, but also between those across different paths. This contrasts with the existing multi-modal CNN approaches, in which modeling several modalities relies entirely on a single joint layer (or level of abstraction) for fusion, typically either at the input or at the output of the network. Therefore, the proposed network has total freedom to learn more complex combinations between the modalities, within and in-between all the levels of abstraction, which significantly increases the learning representation. We report extensive evaluations over two different and highly competitive multi-modal brain tissue segmentation challenges, iSEG 2017 and MRBrainS 2013, with the former focusing on 6-month infant data and the latter on adult images. HyperDenseNet yielded significant improvements over many state-of-the-art segmentation networks, ranking at the top on both benchmarks. We further provide a comprehensive experimental analysis of feature re-use, which confirms the importance of hyper-dense connections in multi-modal representation learning. Our code is publicly available at https://…/HyperDenseNet. HyperFusion-Net Salient object detection (SOD), which aims to find the most important region of interest and segment the relevant object/item in that area, is an important yet challenging vision task. This problem is inspired by the fact that humans seem to perceive main scene elements with high priorities. Thus, accurate detection of salient objects in complex scenes is critical for human-computer interaction. 
In this paper, we present a novel feature learning framework for SOD, in which we cast the SOD as a pixel-wise classification problem. The proposed framework utilizes a densely hierarchical feature fusion network, named HyperFusion-Net, which automatically predicts the most important area and segments the associated objects in an end-to-end manner. Specifically, inspired by the human perception system and image reflection separation, we first decompose input images into reflective image pairs by content-preserving transforms. Then, the complementary information of reflective image pairs is jointly extracted by an interweaved convolutional neural network (ICNN) and hierarchically combined with a hyper-dense fusion mechanism. Based on the fused multi-scale features, our method finally achieves a promising way of predicting SOD. As shown in our extensive experiments, the proposed method consistently outperforms other state-of-the-art methods on seven public datasets by a large margin. Hyper-Heuristics A hyper-heuristic is a heuristic search method that seeks to automate, often by the incorporation of machine learning techniques, the process of selecting, combining, generating or adapting several simpler heuristics (or components of such heuristics) to efficiently solve computational search problems. One of the motivations for studying hyper-heuristics is to build systems which can handle classes of problems rather than solving just one problem. There might be multiple heuristics from which one can choose for solving a problem, and each heuristic has its own strength and weakness. The idea is to automatically devise algorithms by combining the strength and compensating for the weakness of known heuristics. In a typical hyper-heuristic framework there is a high-level methodology and a set of low-level heuristics (either constructive or perturbative heuristics). 
Given a problem instance, the high-level method selects which low-level heuristic should be applied at any given time, depending upon the current problem state, or search stage. Hyperlink-Induced Topic Search(HITS) Hyperlink-Induced Topic Search (HITS; also known as hubs and authorities) is a link analysis algorithm that rates Web pages, developed by Jon Kleinberg. The idea behind Hubs and Authorities stemmed from a particular insight into the creation of web pages when the Internet was originally forming; that is, certain web pages, known as hubs, served as large directories that were not actually authoritative in the information that they held, but were used as compilations of a broad catalog of information that led users directly to other authoritative pages. In other words, a good hub represented a page that pointed to many other pages, and a good authority represented a page that was linked by many different hubs. The scheme therefore assigns two scores for each page: its authority, which estimates the value of the content of the page, and its hub value, which estimates the value of its links to other pages. Network Analysis for Wikipedia HITS Algorithm – Hubs and Authorities on the Internet HyperLogLog HyperLogLog is an algorithm for the count-distinct problem, approximating the number of distinct elements in a multiset (the cardinality). Calculating the exact cardinality of a multiset requires an amount of memory proportional to the cardinality, which is impractical for very large data sets. Probabilistic cardinality estimators, such as the HyperLogLog algorithm, use significantly less memory than this, at the cost of obtaining only an approximation of the cardinality. The HyperLogLog algorithm is able to estimate cardinalities greater than 10^9 with a typical accuracy of 2%, using 1.5 kB of memory. HyperLogLog is an extension of the earlier LogLog algorithm. 
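The trade of exactness for memory described in the HyperLogLog entry can be sketched as follows. This is a simplified illustration under stated assumptions, not a production implementation: the class name, the use of SHA-1 truncated to 64 bits as the hash, the default precision `p=12` (4096 registers) and the small-range linear-counting correction are choices made for the example, and the large-range correction of the original algorithm is omitted.

```python
import hashlib
import math


class HyperLogLog:
    """Simplified HyperLogLog cardinality estimator (illustrative sketch)."""

    def __init__(self, p=12):
        self.p = p                    # number of index bits (assumed default)
        self.m = 1 << p               # number of registers
        self.registers = [0] * self.m

    def add(self, item):
        # 64-bit hash taken from the first 8 bytes of SHA-1 (assumed hash choice).
        digest = hashlib.sha1(str(item).encode("utf-8")).digest()
        h = int.from_bytes(digest[:8], "big")
        width = 64 - self.p
        idx = h >> width                      # first p bits select a register
        rest = h & ((1 << width) - 1)         # remaining bits
        # Rank = position of the leftmost 1-bit in the remaining bits.
        rank = width - rest.bit_length() + 1
        if rank > self.registers[idx]:
            self.registers[idx] = rank

    def estimate(self):
        m = self.m
        alpha = 0.7213 / (1.0 + 1.079 / m)    # bias constant, valid for m >= 128
        raw = alpha * m * m / sum(2.0 ** -r for r in self.registers)
        zeros = self.registers.count(0)
        if raw <= 2.5 * m and zeros > 0:
            # Small-range correction: fall back to linear counting.
            return m * math.log(m / zeros)
        return raw


if __name__ == "__main__":
    hll = HyperLogLog()
    for i in range(10000):
        hll.add(f"user-{i}")
    print("estimated distinct items:", round(hll.estimate()))
```

Only the register array is stored, so memory use depends on the number of registers and not on the number of items added. Adding the same item twice never changes the estimate.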
Hyperparameter In Bayesian statistics, a hyperparameter is a parameter of a prior distribution; the term is used to distinguish them from parameters of the model for the underlying system under analysis. For example, if one is using a beta distribution to model the distribution of the parameter p of a Bernoulli distribution, then: · p is a parameter of the underlying system (Bernoulli distribution), and · alpha and beta are parameters of the prior distribution (beta distribution), hence hyperparameters One may take a single value for a given hyperparameter, or one can iterate and take a probability distribution on the hyperparameter itself, called a hyperprior. State of Hyperparameter Selection Hyperparameter Optimization In the context of machine learning, hyperparameter optimization or model selection is the problem of choosing a set of hyperparameters for a learning algorithm, usually with the goal of obtaining good generalization. Hyperparameter optimization contrasts with actual learning problems, which are also often cast as optimization problems, but optimize a loss function on the training set alone. In effect, learning algorithms learn parameters that model/reconstruct their inputs well, while hyperparameter optimization is to ensure the model does not overfit its data by tuning, e.g., regularization. Hyperspherical Convolution(SphereConv) Convolution as inner product has been the founding basis of convolutional neural networks (CNNs) and the key to end-to-end visual representation learning. Benefiting from deeper architectures, recent CNNs have demonstrated increasingly strong representation abilities. Despite such improvement, the increased depth and larger parameter space have also led to challenges in properly training a network. In light of such challenges, we propose hyperspherical convolution (SphereConv), a novel learning framework that gives angular representations on hyperspheres. 
We introduce SphereNet, deep hyperspherical convolution networks that are distinct from conventional inner product based convolutional networks. In particular, SphereNet adopts SphereConv as its basic convolution operator and is supervised by generalized angular softmax loss – a natural loss formulation under SphereConv. We show that SphereNet can effectively encode discriminative representation and alleviate training difficulty, leading to easier optimization, faster convergence and comparable (even better) classification accuracy over convolutional counterparts. We also provide some theoretical insights for the advantages of learning on hyperspheres. In addition, we introduce the learnable SphereConv, i.e., a natural improvement over prefixed SphereConv, and SphereNorm, i.e., hyperspherical learning as a normalization method. Experiments have verified our conclusions. Hyperspherical Variational Auto-Encoder The Variational Auto-Encoder (VAE) is one of the most used unsupervised machine learning models. But although the default choice of a Gaussian distribution for both the prior and posterior represents a mathematically convenient distribution often leading to competitive results, we show that this parameterization fails to model data with a latent hyperspherical structure. To address this issue we propose using a von Mises-Fisher (vMF) distribution instead, leading to a hyperspherical latent space. Through a series of experiments we show how such a hyperspherical VAE, or $\mathcal{S}$-VAE, is more suitable for capturing data with a hyperspherical latent structure, while outperforming a normal, $\mathcal{N}$-VAE, in low dimensions on other data types. HyperTools A Python toolbox for gaining geometric insights into high-dimensional data. Hypervariate Data Hypervariate data is data with four or more dimensions in the dataset. 
Dartmouth College researchers have published a free Python software package called HyperTools that allows users to turn complex data into 3D shapes or animations. The tool allows users to visualize patterns in their data and compare the characteristics of different datasets, which in turn could inform researchers on how to train their machine learning algorithms by illuminating differences between groups of data. Additionally, the Dartmouth researchers have published tutorials for HyperTools and a gallery of examples, such as how to plot the text of State of the Union addresses, to help users create visualizations. Hypervolume Under Manifold(HUM) Paper: Jialiang Li (2008). Jialiang Li (2014). mcca Hypothesis-testing-based Adaptive Spline Filtering(HASF) Trend Analysis of Fragmented Time Series for mHealth Apps: Hypothesis Testing Based Adaptive Spline Filtering Method with Importance Weighting