Gabor Convolutional Networks (GCNs, Gabor CNNs)
Steerable properties dominate the design of traditional filters, e.g., Gabor filters, and endow features with the capability of dealing with spatial transformations. However, such excellent properties have not been well explored in the popular deep convolutional neural networks (DCNNs). In this paper, we propose a new deep model, termed Gabor Convolutional Networks (GCNs or Gabor CNNs), which incorporates Gabor filters into DCNNs to enhance the resistance of deep learned features to orientation and scale changes. By only manipulating the basic element of DCNNs based on Gabor filters, i.e., the convolution operator, GCNs can be easily implemented and are compatible with any popular deep learning architecture. Experimental results demonstrate the superior capability of our algorithm in recognizing objects where scale and rotation changes occur frequently. The proposed GCNs have far fewer learnable network parameters and are thus easier to train with an end-to-end pipeline. To encourage further developments, the source code is released on GitHub.
Gale-Shapley Algorithm  The Gale-Shapley algorithm is a solution to the Stable Marriage Problem (SMP). In 1962, David Gale and Lloyd Shapley proved that, for any equal number of men and women, it is always possible to solve the SMP and make all marriages stable. They presented an algorithm to do so. The Gale-Shapley algorithm involves a number of ‘rounds’ (or ‘iterations’). In the first round, a) each unengaged man proposes to the woman he prefers most, and then b) each woman replies ‘maybe’ to the suitor she most prefers and ‘no’ to all other suitors. She is then provisionally ‘engaged’ to the suitor she most prefers so far, and that suitor is likewise provisionally engaged to her. In each subsequent round, a) each unengaged man proposes to the most-preferred woman to whom he has not yet proposed (regardless of whether the woman is already engaged), and then b) each woman replies ‘maybe’ to the suitor she most prefers (whether her existing provisional partner or someone else) and rejects the rest (again, perhaps including her current provisional partner). The provisional nature of engagements preserves the right of an already-engaged woman to ‘trade up’ (and, in the process, to ‘jilt’ her until-then partner). The runtime complexity of this algorithm is O(n^2), where n is the number of men or women. This algorithm guarantees that:
• Everyone gets married: At the end, there cannot be a man and a woman both unengaged, as he must have proposed to her at some point (since a man will eventually propose to everyone, if necessary) and, being proposed to, she would necessarily be engaged (to someone) thereafter.
• The marriages are stable: Let Alice be a woman and Bob be a man who are both engaged, but not to each other. Upon completion of the algorithm, it is not possible for both Alice and Bob to prefer each other over their current partners. If Bob prefers Alice to his current partner, he must have proposed to Alice before he proposed to his current partner. If Alice accepted his proposal, yet is not married to him at the end, she must have dumped him for someone she likes more, and therefore doesn’t like Bob more than her current partner. If Alice rejected his proposal, she was already with someone she liked more than Bob.
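The proposal procedure described above can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `gale_shapley` and the dictionary-based preference format are assumptions made here, and the sketch processes one free man at a time instead of in synchronized rounds, which is a common sequential variant of the same procedure.

```python
def gale_shapley(men_prefs, women_prefs):
    """Return a stable matching as a dict mapping woman -> man.

    men_prefs and women_prefs (hypothetical input format) map each person
    to a list of everyone on the other side, most preferred first.
    Both sides are assumed to have the same size.
    """
    # rank[w][m] is the position of m in w's list; lower means more preferred.
    rank = {w: {m: i for i, m in enumerate(prefs)}
            for w, prefs in women_prefs.items()}
    next_choice = {m: 0 for m in men_prefs}  # index of the next woman to try
    engaged_to = {}                          # woman -> provisional partner
    free_men = list(men_prefs)

    while free_men:
        m = free_men.pop()
        w = men_prefs[m][next_choice[m]]     # best woman he has not yet asked
        next_choice[m] += 1
        current = engaged_to.get(w)
        if current is None:
            engaged_to[w] = m                # she says 'maybe'
        elif rank[w][m] < rank[w][current]:
            engaged_to[w] = m                # she trades up
            free_men.append(current)         # her former partner is free again
        else:
            free_men.append(m)               # she says 'no'
    return engaged_to


if __name__ == "__main__":
    men = {"A": ["X", "Y"], "B": ["X", "Y"]}
    women = {"X": ["B", "A"], "Y": ["A", "B"]}
    print(gale_shapley(men, women))
```

Each man proposes to each woman at most once, so the loop body runs at most n^2 times, matching the O(n^2) bound stated above.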
Game Theory  Game theory is the study of strategic decision making. Specifically, it is ‘the study of mathematical models of conflict and cooperation between intelligent rational decision-makers.’ An alternative term suggested ‘as a more descriptive name for the discipline’ is interactive decision theory. Game theory is mainly used in economics, political science, and psychology, as well as logic, computer science, and biology. The subject first addressed zero-sum games, such that one person’s gains exactly equal net losses of the other participant or participants. Today, however, game theory applies to a wide range of behavioral relations, and has developed into an umbrella term for the logical side of decision science, including both humans and non-humans (e.g. computers, animals). Modern game theory began with the idea regarding the existence of mixed-strategy equilibria in two-person zero-sum games and its proof by John von Neumann. Von Neumann’s original proof used the Brouwer fixed-point theorem on continuous mappings into compact convex sets, which became a standard method in game theory and mathematical economics. His paper was followed by the 1944 book Theory of Games and Economic Behavior, co-written with Oskar Morgenstern, which considered cooperative games of several players. The second edition of this book provided an axiomatic theory of expected utility, which allowed mathematical statisticians and economists to treat decision-making under uncertainty. This theory was developed extensively in the 1950s by many scholars. Game theory was later explicitly applied to biology in the 1970s, although similar developments go back at least as far as the 1930s. Game theory has been widely recognized as an important tool in many fields. With the Nobel Memorial Prize in Economic Sciences going to game theorist Jean Tirole in 2014, eleven game theorists have now won the economics Nobel Prize.
John Maynard Smith was awarded the Crafoord Prize for his application of game theory to biology. 
Gamification  Gamification is the use of game thinking and game mechanics in non-game contexts to engage users in solving problems. Gamification has been studied and applied in several domains, for example to improve user engagement, physical exercise, return on investment, data quality, timeliness, and learning. A review of research on gamification shows that most studies find positive effects.
Gamma Divergence  The gamma-divergence is a generalization of the Kullback-Leibler divergence with the power index gamma. It employs the power transformation of density functions, instead of the logarithmic transformation employed by the Kullback-Leibler divergence. rsggm
Gamma-Poisson Shrinker (GPS)
➘ “Multi-Item Gamma Poisson Shrinker” openEBGM
GAN Q-learning  Distributional reinforcement learning (distributional RL) has seen empirical success in complex Markov Decision Processes (MDPs) in the setting of nonlinear function approximation. However, there are many different ways in which one can leverage the distributional approach to reinforcement learning. In this paper, we propose GAN Q-learning, a novel distributional RL method based on generative adversarial networks (GANs) and analyze its performance in simple tabular environments, as well as OpenAI Gym. We empirically show that our algorithm leverages the flexibility and black-box approach of deep learning models while providing a viable alternative to other state-of-the-art methods.
Gang of GANs  Traditional generative adversarial networks (GANs) and many of their variants are trained by minimizing the KL or JS-divergence loss that measures how close the generated data distribution is to the true data distribution. A recent advance called the WGAN, based on the Wasserstein distance, can improve on the KL and JS-divergence based GANs, and alleviate the gradient vanishing, instability, and mode collapse issues that are common in GAN training. In this work, we aim at improving on the WGAN by first generalizing its discriminator loss to a margin-based one, which leads to a better discriminator, and in turn a better generator, and then carrying out a progressive training paradigm involving multiple GANs to contribute to the maximum margin ranking loss so that the GAN at later stages will improve upon early stages. We call this method Gang of GANs (GoGAN). We have shown theoretically that the proposed GoGAN can reduce the gap between the true data distribution and the generated data distribution by at least half in an optimally trained WGAN. We have also proposed a new way of measuring GAN quality which is based on image completion tasks. We have evaluated our method on four visual datasets: CelebA, LSUN Bedroom, CIFAR-10, and 50K-SSFF, and have seen both visual and quantitative improvement over baseline WGAN.
Gapped k-mer Support Vector Machine  Oligomers of length k, or k-mers, are convenient and widely used features for modeling the properties and functions of DNA and protein sequences. However, k-mers suffer from the inherent limitation that if the parameter k is increased to resolve longer features, the probability of observing any specific k-mer becomes very small, and k-mer counts approach a binary variable, with most k-mers absent and a few present once. Thus, any statistical learning approach using k-mers as features becomes susceptible to noisy training set k-mer frequencies once k becomes large. To address this problem, we introduce alternative feature sets using gapped k-mers, a new classifier, gkm-SVM, and a general method for robust estimation of k-mer frequencies. To make the method applicable to large-scale genome-wide applications, we develop an efficient tree data structure for computing the kernel matrix. We show that compared to our original kmer-SVM and alternative approaches, our gkm-SVM predicts functional genomic regulatory elements and tissue-specific enhancers with significantly improved accuracy, increasing the precision by up to a factor of two. We then show that gkm-SVM consistently outperforms kmer-SVM on human ENCODE ChIP-seq datasets, and further demonstrate the general utility of our method using a Naive Bayes classifier. Although developed for regulatory sequence analysis, these methods can be applied to any sequence classification problem. gkmSVM
Gas Station Problem  In the gas station problem we want to find the cheapest path between two vertices of an $n$-vertex graph. Our car has a specific fuel capacity and at each vertex we can fill our car with gas, with the fuel cost depending on the vertex.
Gated Attention Network (GaAN) 
We propose a new network architecture, Gated Attention Networks (GaAN), for learning on graphs. Unlike the traditional multi-head attention mechanism, which equally consumes all attention heads, GaAN uses a convolutional sub-network to control each attention head’s importance. We demonstrate the effectiveness of GaAN on the inductive node classification problem. Moreover, with GaAN as a building block, we construct the Graph Gated Recurrent Unit (GGRU) to address the traffic speed forecasting problem. Extensive experiments on three real-world datasets show that our GaAN framework achieves state-of-the-art results on both tasks.
Gated Linear Network  This paper describes a family of probabilistic architectures designed for online learning under the logarithmic loss. Rather than relying on non-linear transfer functions, our method gains representational power by the use of data conditioning. We state under general conditions a learnable capacity theorem that shows this approach can in principle learn any bounded Borel-measurable function on a compact subset of Euclidean space; the result is stronger than many universality results for connectionist architectures because we provide both the model and the learning procedure for which convergence is guaranteed.
Gated Recurrent Neural Tensor Network  Recurrent Neural Networks (RNNs), which are a powerful scheme for modeling temporal and sequential data, need to capture long-term dependencies in datasets and represent them in hidden layers with a powerful model to capture more information from inputs. For modeling long-term dependencies in a dataset, the gating mechanism concept can help RNNs remember and forget previous information. Representing the hidden layers of an RNN with more expressive operations (i.e., tensor products) helps it learn a more complex relationship between the current input and the previous hidden layer information. These ideas can generally improve RNN performance. In this paper, we propose a novel RNN architecture that combines the concepts of the gating mechanism and the tensor product into a single model. By combining these two concepts into a single RNN, our proposed models learn long-term dependencies by modeling with gating units and obtain more expressive and direct interaction between input and hidden layers using a tensor product on 3-dimensional array (tensor) weight parameters. We use Long Short Term Memory (LSTM) RNN and Gated Recurrent Unit (GRU) RNN and combine them with a tensor product inside their formulations. Our proposed RNNs, which are called a Long-Short Term Memory Recurrent Neural Tensor Network (LSTMRNTN) and Gated Recurrent Unit Recurrent Neural Tensor Network (GRURNTN), are made by combining the LSTM and GRU RNN models with the tensor product. We conducted experiments with our proposed models on word-level and character-level language modeling tasks and found that our proposed models significantly improved performance compared to our baseline models.
Gaussian Graphical Model (GGM) 
A Gaussian graphical model is a graph in which all random variables are continuous and jointly Gaussian. This model corresponds to the multivariate normal distribution for N variables. Conditional independence in a Gaussian graphical model is simply reflected in the zero entries of the precision matrix. MGL 
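The link between the precision matrix and conditional independence can be written compactly. The notation below is an assumption introduced here for illustration: $X = (X_1, \dots, X_N)$ is the random vector, $\Sigma$ its (positive definite) covariance matrix, and $\Theta$ the precision matrix.

```latex
X \sim \mathcal{N}(\mu, \Sigma), \qquad \Theta = \Sigma^{-1}, \qquad
X_i \perp X_j \mid X_{\setminus \{i,j\}} \iff \Theta_{ij} = 0 \quad (i \neq j)
```

In other words, the graph has an edge between nodes $i$ and $j$ exactly when $\Theta_{ij} \neq 0$.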
Gaussian image entropy and piecewise stationary time series analysis (SPEV) 
Vision-based methods for visibility estimation can play a critical role in reducing traffic accidents caused by fog and haze. To overcome the disadvantages of current visibility estimation methods, we present a novel data-driven approach based on Gaussian image entropy and piecewise stationary time series analysis (SPEV). This is the first time that Gaussian image entropy is used for estimating atmospheric visibility. To lessen the impact of landscape and sunshine illuminance on visibility estimation, we used region of interest (ROI) analysis and took into account relative ratios of image entropy to improve estimation accuracy. We assume fog and haze cause blurred images and that fog and haze can be considered as a piecewise stationary signal. We used piecewise stationary time series analysis to construct the piecewise causal relationship between image entropy and visibility. To obtain a real-world visibility measure during fog and haze, a subjective assessment was established through a study with 36 subjects who performed visibility observations. Finally, a total of two million videos were used to train the SPEV model and validate its effectiveness. The videos were collected from the constantly foggy and hazy Tongqi expressway in Jiangsu, China. The contrast model of visibility estimation was used for algorithm performance comparison, and the validation results of the SPEV model were encouraging, as 99.14% of the relative errors were less than 10%.
Gaussian Markov Random Field (GMRF) 
http://…/GMRFbook 
Gaussian Means (G-Means)
The G-means algorithm starts with a small number of k-means centers, and grows the number of centers. Each iteration of the algorithm splits into two those centers whose data appear not to come from a Gaussian distribution. Between each round of splitting, we run k-means on the entire dataset and all the centers to refine the current solution. We can initialize with just k = 1, or we can choose some larger value of k if we have some prior knowledge about the range of k.
Gaussian Mixture Model (GMM) 
A Gaussian Mixture Model (GMM) is a parametric probability density function represented as a weighted sum of Gaussian component densities. GMMs are commonly used as a parametric model of the probability distribution of continuous measurements or features in a biometric system, such as vocal-tract related spectral features in a speaker recognition system. GMM parameters are estimated from training data using the iterative Expectation-Maximization (EM) algorithm or Maximum A Posteriori (MAP) estimation from a well-trained prior model. AdaptGauss
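The weighted sum of component densities can be written out as follows. The symbols are assumptions chosen here for illustration: $M$ components, mixture weights $w_i$, and $D$-variate Gaussian densities $g$ with means $\mu_i$ and covariance matrices $\Sigma_i$.

```latex
p(x \mid \lambda) = \sum_{i=1}^{M} w_i \, g(x \mid \mu_i, \Sigma_i),
\qquad w_i \ge 0, \qquad \sum_{i=1}^{M} w_i = 1,

g(x \mid \mu_i, \Sigma_i) =
\frac{1}{(2\pi)^{D/2} \, |\Sigma_i|^{1/2}}
\exp\!\left( -\tfrac{1}{2} (x - \mu_i)^{\top} \Sigma_i^{-1} (x - \mu_i) \right)
```

Here $\lambda = \{ w_i, \mu_i, \Sigma_i \}_{i=1}^{M}$ collects the parameters that EM or MAP estimation would fit.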
Gaussian Multivariance  ➘ “Total Distance Multivariance” 
Gaussian Naive Bayes  When dealing with continuous data, a typical assumption is that the continuous values associated with each class are distributed according to a Gaussian distribution. For example, suppose the training data contain a continuous attribute, x. We first segment the data by the class, and then compute the mean and variance of x in each class. 
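The segment-then-summarize step described above can be sketched as follows for a single continuous attribute. The function names and the use of the sample variance (dividing by n - 1) are assumptions made for this illustration. The `predict` helper goes one step beyond the text: it applies the usual rule of choosing the class with the largest prior times Gaussian density.

```python
import math
from collections import defaultdict


def fit_gaussian_nb(xs, labels):
    """Return {class: (prior, mean, variance)} for one continuous attribute x."""
    groups = defaultdict(list)
    for x, c in zip(xs, labels):
        groups[c].append(x)          # segment the data by class
    params = {}
    for c, vals in groups.items():
        mean = sum(vals) / len(vals)
        # Sample variance with Bessel's correction; needs >= 2 points per class.
        var = sum((v - mean) ** 2 for v in vals) / (len(vals) - 1)
        params[c] = (len(vals) / len(xs), mean, var)
    return params


def gaussian_pdf(x, mean, var):
    """Density of a normal distribution with the given mean and variance."""
    return math.exp(-((x - mean) ** 2) / (2 * var)) / math.sqrt(2 * math.pi * var)


def predict(params, x):
    """Pick the class maximizing prior * Gaussian likelihood of x."""
    return max(params, key=lambda c: params[c][0] * gaussian_pdf(x, *params[c][1:]))


if __name__ == "__main__":
    xs = [1.0, 2.0, 3.0, 7.0, 8.0, 9.0]
    labels = ["a", "a", "a", "b", "b", "b"]
    model = fit_gaussian_nb(xs, labels)
    print(model)
    print(predict(model, 3.5), predict(model, 6.0))
```

With several attributes, the same per-class mean and variance would be computed for each attribute and the densities multiplied, which is the ‘naive’ independence assumption.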
Gaussian Process (GP) 
In probability theory and statistics, a Gaussian process is a stochastic process whose realizations consist of random values associated with every point in a range of times (or of space) such that each such random variable has a normal distribution. Moreover, every finite collection of those random variables has a multivariate normal distribution. The concept of Gaussian processes is named after Carl Friedrich Gauss because it is based on the notion of the normal distribution which is often called the Gaussian distribution. In fact, one way of thinking of a Gaussian process is as an infinite-dimensional generalization of the multivariate normal distribution.
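The finite-collection property can be stated in symbols. The notation is an assumption introduced here: $m$ is a mean function, $k$ a covariance (kernel) function, and $x_1, \dots, x_n$ any finite set of index points.

```latex
f \sim \mathcal{GP}(m, k)
\quad \Longleftrightarrow \quad
\bigl( f(x_1), \dots, f(x_n) \bigr)^{\top} \sim \mathcal{N}(\mathbf{m}, K)
\ \text{ for every finite } \{x_1, \dots, x_n\},

\mathbf{m}_i = m(x_i), \qquad K_{ij} = k(x_i, x_j)
```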
Gaussian Process Autoregressive Regression Model (GPAR) 
Multi-output regression models must exploit dependencies between outputs to maximise predictive performance. The application of Gaussian processes (GPs) to this setting typically yields models that are computationally demanding and have limited representational power. We present the Gaussian Process Autoregressive Regression (GPAR) model, a scalable multi-output GP model that is able to capture nonlinear, possibly input-varying, dependencies between outputs in a simple and tractable way: the product rule is used to decompose the joint distribution over the outputs into a set of conditionals, each of which is modelled by a standard GP. GPAR’s efficacy is demonstrated on a variety of synthetic and real-world problems, outperforming existing GP models and achieving state-of-the-art performance on the tasks with existing benchmarks.
Gaussian Process Latent Variable Alignment Learning  We present a model that can automatically learn alignments between high-dimensional data in an unsupervised manner. Learning alignments is an ill-constrained problem as there are many different ways of defining a good alignment. Our proposed method casts alignment learning in a framework where both alignment and data are modelled simultaneously. We derive a probabilistic model built on non-parametric priors that allows for flexible warps while at the same time providing means to specify interpretable constraints. We show results on several datasets, including different motion capture sequences, and show that the suggested model outperforms the classical algorithmic approaches to the alignment task.
Gaussian Process Regression (GPR) 
Gaussian process regression (GPR) is a finer approach than fitting a fixed parametric model. Rather than claiming f(x) relates to some specific model (e.g. f(x)=mx+c), a Gaussian process can represent f(x) obliquely, but rigorously, by letting the data ‘speak’ more clearly for themselves. GPR is still a form of supervised learning, but the training data are harnessed in a subtler way. As such, GPR is a less ‘parametric’ tool. However, it’s not completely free-form, and if we’re unwilling to make even basic assumptions about f(x), then more general techniques should be considered, including those underpinned by the principle of maximum entropy; Chapter 6 of Sivia and Skilling (2006) offers an introduction. nsgp
Gauss-Markov Theorem  In statistics, the Gauss-Markov theorem, named after Carl Friedrich Gauss and Andrey Markov, states that in a linear regression model in which the errors have expectation zero, are uncorrelated, and have equal variances, the best linear unbiased estimator (BLUE) of the coefficients is given by the ordinary least squares (OLS) estimator. Here ‘best’ means giving the lowest variance of the estimate, as compared to other unbiased, linear estimators. The errors don’t need to be normal, nor do they need to be independent and identically distributed (only uncorrelated and homoscedastic). The hypothesis that the estimator be unbiased cannot be dropped, since otherwise estimators better than OLS exist. See, for example, the James-Stein estimator (which also drops linearity) or ridge regression.
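The statement can be summarized in matrix form. The notation is an assumption introduced here: $y$ is the response vector, $X$ a fixed design matrix with full column rank, $\beta$ the coefficient vector, and $\varepsilon$ the error vector.

```latex
y = X\beta + \varepsilon, \qquad
\mathbb{E}[\varepsilon] = 0, \qquad
\operatorname{Var}(\varepsilon) = \sigma^2 I,

\hat{\beta}_{\mathrm{OLS}} = (X^{\top} X)^{-1} X^{\top} y, \qquad
\operatorname{Var}(\tilde{\beta}) - \operatorname{Var}(\hat{\beta}_{\mathrm{OLS}}) \succeq 0
\ \text{ for every linear unbiased estimator } \tilde{\beta}
```

The condition $\operatorname{Var}(\varepsilon) = \sigma^2 I$ encodes both requirements on the errors: zero off-diagonal entries (uncorrelated) and a constant diagonal (equal variances).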
Gauss–Newton Algorithm (GNA) 
The Gauss-Newton algorithm is a method used to solve non-linear least squares problems. It is a modification of Newton’s method for finding a minimum of a function. Unlike Newton’s method, the Gauss-Newton algorithm can only be used to minimize a sum of squared function values, but it has the advantage that second derivatives, which can be challenging to compute, are not required. Non-linear least squares problems arise for instance in non-linear regression, where parameters in a model are sought such that the model is in good agreement with available observations.
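As a concrete illustration, the sketch below fits the two-parameter model y = a * exp(b * x) to data by Gauss-Newton iteration. The model, the function name `gauss_newton_exp`, the starting point, and the stopping tolerance are all assumptions chosen for this example. Each step solves the 2x2 normal equations (J^T J) delta = J^T r by hand, using only first derivatives of the model.

```python
import math


def gauss_newton_exp(xs, ys, a, b, iterations=50):
    """Fit y = a * exp(b * x) by Gauss-Newton, starting from the guess (a, b)."""
    for _ in range(iterations):
        s11 = s12 = s22 = g1 = g2 = 0.0
        for x, y in zip(xs, ys):
            e = math.exp(b * x)
            r = y - a * e        # residual of this observation
            j1 = e               # derivative of the model with respect to a
            j2 = a * x * e       # derivative of the model with respect to b
            s11 += j1 * j1       # accumulate J^T J ...
            s12 += j1 * j2
            s22 += j2 * j2
            g1 += j1 * r         # ... and J^T r
            g2 += j2 * r
        det = s11 * s22 - s12 * s12
        if det == 0.0:
            break                # singular normal equations; give up
        # Solve the 2x2 system (J^T J) [da, db]^T = J^T r by Cramer's rule.
        da = (s22 * g1 - s12 * g2) / det
        db = (s11 * g2 - s12 * g1) / det
        a += da
        b += db
        if abs(da) + abs(db) < 1e-12:
            break                # step is negligible; treat as converged
    return a, b


if __name__ == "__main__":
    xs = [0.0, 1.0, 2.0, 3.0]
    ys = [2.0 * math.exp(0.5 * x) for x in xs]   # noise-free synthetic data
    print(gauss_newton_exp(xs, ys, a=1.5, b=0.4))
```

Plain Gauss-Newton has no step-size control, so a poor starting point can make it diverge; damped variants such as Levenberg-Marquardt are commonly used when that is a concern.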
gcForest  In this paper, we propose gcForest, a decision tree ensemble approach with performance highly competitive with deep neural networks. In contrast to deep neural networks, which require great effort in hyper-parameter tuning, gcForest is much easier to train. In fact, even when gcForest is applied to different data from different domains, excellent performance can be achieved with almost the same hyper-parameter settings. The training process of gcForest is efficient and scalable. In our experiments its training time running on a PC is comparable to that of deep neural networks running with GPU facilities, and the efficiency advantage may be more apparent because gcForest is naturally apt to parallel implementation. Furthermore, in contrast to deep neural networks, which require large-scale training data, gcForest can work well even when there are only small-scale training data. Moreover, as a tree-based approach, gcForest should be easier for theoretical analysis than deep neural networks.
Gelly  Gelly is a Java Graph API for Flink. It contains a set of methods and utilities which aim to simplify the development of graph analysis applications in Flink. In Gelly, graphs can be transformed and modified using high-level functions similar to the ones provided by the batch processing API. Gelly provides methods to create, transform and modify graphs, as well as a library of graph algorithms. ➚ “Apache Flink” Research and Development Roadmap for Flink Gelly
GenAttack  Deep neural networks (DNNs) are vulnerable to adversarial examples, even in the black-box case, where the attacker is limited to solely query access. Existing black-box approaches to generating adversarial examples typically require a significant number of queries, either for training a substitute network or estimating gradients from the output scores. We introduce GenAttack, a gradient-free optimization technique which uses genetic algorithms for synthesizing adversarial examples in the black-box setting. Our experiments on the MNIST, CIFAR-10, and ImageNet datasets show that GenAttack can successfully generate visually imperceptible adversarial examples against state-of-the-art image recognition models with orders of magnitude fewer queries than existing approaches. For example, in our CIFAR-10 experiments, GenAttack required roughly 2,568 times fewer queries than the current state-of-the-art black-box attack. Furthermore, we show that GenAttack can successfully attack both the state-of-the-art ImageNet defense, ensemble adversarial training, and non-differentiable, randomized input transformation defenses. GenAttack’s success against ensemble adversarial training demonstrates that its query efficiency enables it to exploit the defense’s weakness to direct black-box attacks. GenAttack’s success against non-differentiable input transformations indicates that its gradient-free nature enables it to be applicable against defenses which perform gradient masking/obfuscation to confuse the attacker. Our results suggest that population-based optimization opens up a promising area of research into effective gradient-free black-box attacks.
General Algorithmic Search (GAS) 
In this paper we present a metaheuristic for global optimization called General Algorithmic Search (GAS). Specifically, GAS is a stochastic, single-objective method that evolves a swarm of agents in search of a global extremum. Numerical simulations with a sample of 31 test functions show that GAS outperforms Basin Hopping, Cuckoo Search, and Differential Evolution, especially in concurrent optimization, i.e., when several runs with different initial settings are executed and the first best wins. Python code for all algorithms and complementary information are available online.
General Architecture for Text Engineering (GATE) 
General Architecture for Text Engineering or GATE is a Java suite of tools originally developed at the University of Sheffield beginning in 1995 and now used worldwide by a wide community of scientists, companies, teachers and students for all sorts of natural language processing tasks, including information extraction in many languages. GATE has been compared to NLTK, R and RapidMiner. As well as being widely used in its own right, it forms the basis of the KIM semantic platform. The GATE community and research have been involved in several European research projects including TAO, SEKT, NeOn, MediaCampaign, Musing, ServiceFinder, LIRICS and KnowledgeWeb, as well as many other projects. As of May 28, 2011, 881 people are on the gate-users mailing list at SourceForge.net, and 111,932 downloads from SourceForge are recorded since the project moved to SourceForge in 2005. The paper “GATE: A Framework and Graphical Development Environment for Robust NLP Tools and Applications” has received over 800 citations in the seven years since publication (according to Google Scholar). Books covering the use of GATE, in addition to the GATE User Guide, include “Building Search Applications: Lucene, LingPipe, and Gate”, by Manu Konchady, and “Introduction to Linguistic Annotation and Text Analytics”, by Graham Wilcock.
General Graph Representation Learning Framework (DeepGL) 
This paper presents a general graph representation learning framework called DeepGL for learning deep node and edge representations from large (attributed) graphs. In particular, DeepGL begins by deriving a set of base features (e.g., graphlet features) and automatically learns a multi-layered hierarchical graph representation where each successive layer leverages the output from the previous layer to learn features of a higher order. Contrary to previous work, DeepGL learns relational functions (each representing a feature) that generalize across networks and are therefore useful for graph-based transfer learning tasks. Moreover, DeepGL naturally supports attributed graphs, learns interpretable features, and is space-efficient (by learning sparse feature vectors). In addition, DeepGL is expressive, flexible with many interchangeable components, efficient with a time complexity of $\mathcal{O}(E)$, and scalable for large networks via an efficient parallel implementation. Compared with the state-of-the-art method, DeepGL is (1) effective for across-network transfer learning tasks and attributed graph representation learning, (2) space-efficient, requiring up to 6x less memory, (3) fast, with up to 182x speedup in runtime performance, and (4) accurate, with an average improvement of 20% or more on many learning tasks.
General Language Understanding Evaluation Benchmark (GLUE) 
For natural language understanding (NLU) technology to be maximally useful, both practically and as a scientific object of study, it must be general: it must be able to process language in a way that is not exclusively tailored to any one specific task or dataset. In pursuit of this objective, we introduce the General Language Understanding Evaluation benchmark (GLUE), a tool for evaluating and analyzing the performance of models across a diverse range of existing NLU tasks. GLUE is model-agnostic, but it incentivizes sharing knowledge across tasks because certain tasks have very limited training data. We further provide a hand-crafted diagnostic test suite that enables detailed linguistic analysis of NLU models. We evaluate baselines based on current methods for multi-task and transfer learning and find that they do not immediately give substantial improvements over the aggregate performance of training a separate model per task, indicating room for improvement in developing general and robust NLU systems.
General Likelihood Uncertainty Estimation (GLUE) 
The GLUE methodology (Beven and Binley 1992) rejects the idea of one single optimal solution and adopts the concept of equifinality of models, parameters and variables (Beven and Binley 1992; Beven 1993). Equifinality originates from the imperfect knowledge of the system under consideration, and many sets of models, parameters and variables may therefore be considered equal or almost equal simulators of the system. Using the GLUE analysis, the prior set of models, parameters and variables is divided into a set of non-acceptable solutions and a set of acceptable solutions. The GLUE methodology deals with the variable degree of membership of the sets. The degree of membership is determined by assessing the extent to which solutions fit the model, which in turn is determined by subjective likelihood functions. RGLUEANN
Generalised Method of Codifferential Descent (GMCD) 
This paper is devoted to a detailed convergence analysis of the method of codifferential descent (MCD) developed by Professor V.F. Demyanov for solving a large class of nonsmooth nonconvex optimization problems. We propose a generalization of the MCD that is more suitable for applications than the original method, and that utilizes only a part of a codifferential on every iteration, which allows one to reduce the overall complexity of the method. With the use of some general results on uniformly codifferentiable functions obtained in this paper, we prove the global convergence of the generalized MCD in the infinite dimensional case. Also, we propose and analyse a quadratic regularization of the MCD, which is the first general method for minimizing a codifferentiable function over a convex set. Apart from convergence analysis, we also discuss the robustness of the MCD with respect to computational errors, possible step size rules, and a choice of parameters of the algorithm. At the end of the paper we estimate a rate of convergence of the MCD for a class of nonsmooth nonconvex functions that arises, in particular, in cluster analysis. We prove that under some general assumptions the method converges with linear rate, and that it converges quadratically, provided a certain first order sufficient optimality condition holds true.
Generalization Error  The generalization error of a machine learning model is a function that measures how well a learning machine generalizes to unseen data. It is measured as the distance between the error on the training set and the test set and is averaged over the entire set of possible training data that can be generated after each iteration of the learning process. It has this name because this function indicates the capacity of a machine that learns with the specified algorithm to infer a rule (or generalize) that is used by the teacher machine to generate data based only on a few examples. 
Generalization Error Analysis  Domain generalization is the problem of assigning class labels to an unlabeled test data set, given several labeled training data sets drawn from similar distributions. This problem arises in several applications where data distributions fluctuate because of biological, technical, or other sources of variation. We develop a distribution-free, kernel-based approach that predicts a classifier from the marginal distribution of features, by leveraging the trends present in related classification tasks. This approach involves identifying an appropriate reproducing kernel Hilbert space and optimizing a regularized empirical risk over the space. We present generalization error analysis, describe universal kernels, and establish universal consistency of the proposed methodology. Experimental results on synthetic data and three real data applications demonstrate the superiority of the method with respect to a pooling strategy.
Generalization Tower Network (GTN) 
Deep learning (DL) advances state-of-the-art reinforcement learning (RL) by incorporating deep neural networks to learn representations from the input to RL. However, the conventional deep neural network architecture is limited in learning representations for multi-task RL (MTRL), as multiple tasks can call for different kinds of representations. In this paper, we thus propose a novel deep neural network architecture, namely the generalization tower network (GTN), which can achieve MTRL within a single learned model. Specifically, the architecture of GTN is composed of both horizontal and vertical streams. In our GTN architecture, horizontal streams are used to learn representations shared across similar tasks. In contrast, the vertical streams are introduced to handle diverse tasks, and they encode hierarchical shared knowledge of these tasks. The effectiveness of the introduced vertical streams is validated by experimental results. Experiments on 51 Atari games further verify that our GTN architecture advances the state of the art in MTRL. 
Generalized Additive Mixed Model (GAMM) 
gammSlice 
Generalized Additive Models (GAM) 
In statistics, a generalized additive model (GAM) is a generalized linear model in which the linear predictor depends linearly on unknown smooth functions of some predictor variables, and interest focuses on inference about these smooth functions. GAMs were originally developed by Trevor Hastie and Robert Tibshirani to blend properties of generalized linear models with additive models. https://…additivepartibackgroundandrationale GAM: The Predictive Modeling Silver Bullet gamsel 
Generalized Autoregressive Conditional Heteroscedasticity (GARCH) 
If an autoregressive moving average model (ARMA model) is assumed for the error variance, the model is a generalized autoregressive conditional heteroskedasticity (GARCH, Bollerslev (1986)) model. mfGARCH 
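As an illustration, the simplest and most widely used case, GARCH(1,1), can be simulated in a few lines. This is a minimal sketch assuming Gaussian innovations and the standard recursion sigma2[t] = omega + alpha * eps[t-1]**2 + beta * sigma2[t-1]; the parameter names and default values are illustrative and are not taken from the mfGARCH package.

```python
import math
import random


def simulate_garch11(n, omega=0.1, alpha=0.1, beta=0.8, seed=0):
    """Simulate n steps of a GARCH(1,1) process with Gaussian innovations.

    Returns the simulated errors and their conditional variances.
    """
    if not (omega > 0 and alpha >= 0 and beta >= 0 and alpha + beta < 1):
        raise ValueError("need omega > 0, alpha, beta >= 0 and alpha + beta < 1")
    rng = random.Random(seed)
    variance = omega / (1 - alpha - beta)  # start at the unconditional variance
    errors, variances = [], []
    for _ in range(n):
        eps = math.sqrt(variance) * rng.gauss(0.0, 1.0)
        errors.append(eps)
        variances.append(variance)
        # Next conditional variance depends on the last squared error
        # and the last conditional variance.
        variance = omega + alpha * eps ** 2 + beta * variance
    return errors, variances
```

The condition alpha + beta < 1 keeps the unconditional variance finite, which is why the simulation can start from omega / (1 - alpha - beta).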
Generalized Autoregressive Moving Average Models (GARMA) 
A class of generalized autoregressive moving average (GARMA) models is developed that extends the univariate Gaussian ARMA time series model to a flexible observation-driven model for non-Gaussian time series data. The dependent variable is assumed to have a conditional exponential family distribution given the past history of the process. The model estimation is carried out using an iteratively reweighted least squares algorithm. Properties of the model, including stationarity and marginal moments, are either derived explicitly or investigated using Monte Carlo simulation. The relationship of the GARMA model to other models is shown, including the autoregressive models of Zeger and Qaqish, the moving average models of Li, and the reparameterized generalized autoregressive conditional heteroscedastic GARCH model (providing the formula for its fourth marginal moment not previously derived). The model is demonstrated by the application of the GARMA model with a negative binomial conditional distribution to a well-known time series dataset of poliomyelitis counts. VGAM 
Generalized Boosted Regression Models  This R package (gbm) implements extensions to Freund and Schapire’s AdaBoost algorithm and J. Friedman’s gradient boosting machine. Includes regression methods for least squares, absolute loss, logistic, Poisson, Cox proportional hazards partial likelihood, multinomial, t-distribution, AdaBoost exponential loss, Learning to Rank, and Huberized hinge loss. gbm 
Generalized Discrimination Score  The Generalized Discrimination Score is a generic forecast verification framework which can be applied to any of the following verification contexts: dichotomous, polychotomous (ordinal and nominal), continuous, probabilistic, and ensemble. A comprehensive description of the Generalized Discrimination Score, including all equations used in this package, is provided by Mason and Weigel (2009) <doi:10.1175/MWRD1005069.1> afc 
Generalized Dissimilarity Modeling (GDM) 
Generalized dissimilarity modelling (GDM) is a statistical technique for analysing and predicting spatial patterns of turnover in community composition (beta diversity) across large regions. gdm 
Generalized Dynamic Principal Components (GDPC) 
Brillinger defined dynamic principal components (DPC) for time series based on a reconstruction criterion. He gave a very elegant theoretical solution and proposed an estimator which is consistent under stationarity. Here, we propose a new entirely empirical approach to DPC. The main differences from existing methods, mainly Brillinger's procedure, are (1) the DPC we propose need not be a linear combination of the observations and (2) it can be based on a variety of loss functions including robust ones. Unlike Brillinger, we do not establish any consistency results; however, contrary to Brillinger’s concept, which has a very strong stationarity flavor, ours aims at a better adaptation to possible nonstationary features of the series. We also present a robust version of our procedure that allows the DPC to be estimated when the series are contaminated by outliers. We give iterative algorithms to compute the proposed procedures that can be used with a large number of variables. Our nonrobust and robust procedures are illustrated with real datasets. Supplementary materials for this article are available online. Consistency of Generalized Dynamic Principal Components in Dynamic Factor Models gdpc 
Generalized Entropy Agglomeration (GEA) 
Entropy Agglomeration (EA) is a hierarchical clustering algorithm introduced in 2013. Here, we generalize it to define Generalized Entropy Agglomeration (GEA) that can work with multiset blocks and blocks with rational occurrence numbers. We also introduce a numerical categorization procedure to apply GEA to numerical datasets. The software REBUS 2.0 is published with these capabilities: http://…/rebus2 
Generalized Estimating Equation (GEE) 
In statistics, a generalized estimating equation (GEE) is used to estimate the parameters of a generalized linear model with a possible unknown correlation between outcomes. Parameter estimates from the GEE are consistent even when the covariance structure is misspecified, under mild regularity conditions. The focus of the GEE is on estimating the average response over the population (‘population-averaged’ effects) rather than the regression parameters that would enable prediction of the effect of changing one or more covariates on a given individual. GEEs are usually used in conjunction with Huber-White standard error estimates, also known as ‘robust standard error’ or ‘sandwich variance’ estimates. In the case of a linear model with a working independence variance structure, these are known as ‘heteroscedasticity consistent standard error’ estimators. Indeed, the GEE unified several independent formulations of these standard error estimators in a general framework. GEEs belong to a class of semiparametric regression techniques because they rely on specification of only the first two moments. Under correct model specification and mild regularity conditions, parameter estimates from GEEs are consistent. They are a popular alternative to the likelihood-based generalized linear mixed model, which is more sensitive to variance structure specification. They are commonly used in large epidemiological studies, especially multi-site cohort studies, because they can handle many types of unmeasured dependence between outcomes. mmmgee 
Generalized Gaussian Kernel Adaptive Filtering  The present paper proposes generalized Gaussian kernel adaptive filtering, where the kernel parameters are adaptive and data-driven. The Gaussian kernel is parametrized by a center vector and a symmetric positive definite (SPD) precision matrix, which is regarded as a generalization of the scalar width parameter. These parameters are adaptively updated on the basis of a proposed least-squares-type rule to minimize the estimation error. The main contribution of this paper is to establish update rules for precision matrices on the SPD manifold in order to keep their symmetric positive-definiteness. Unlike conventional kernel adaptive filters, the proposed regressor is a superposition of Gaussian kernels that all have different parameters, which makes the regressor more flexible. The kernel adaptive filtering algorithm is established together with an l1-regularized least squares to avoid overfitting and growth in the dimensionality of the dictionary. Experimental results confirm the validity of the proposed method. 
Generalized Graded Unfolding Model (GGUM) 
The generalized graded unfolding model (GGUM) is developed. This model allows for either binary or graded responses and generalizes previous item response models for unfolding in two useful ways. First, it implements a discrimination parameter that varies across items, which allows items to discriminate among respondents in different ways. Second, the GGUM permits response category threshold parameters to vary across items. A marginal maximum likelihood algorithm is implemented to estimate GGUM item parameters, whereas person parameters are derived from an expected a posteriori technique. The applicability of the GGUM to common attitude testing situations is illustrated with real data on student attitudes toward abortion. http://…/gbm2.pdf ScoreGGUM 
Generalized Hyperbolic Distributions (GH) 
The generalised hyperbolic distribution (GH) is a continuous probability distribution defined as the normal variance-mean mixture where the mixing distribution is the generalized inverse Gaussian distribution. Its probability density function is given in terms of the modified Bessel function of the second kind. As the name suggests it is of a very general form, being the superclass of, among others, the Student’s t-distribution, the Laplace distribution, the hyperbolic distribution, the normal-inverse Gaussian distribution and the variance-gamma distribution. It is mainly applied to areas that require sufficient probability of far-field behaviour, which it can model due to its semi-heavy tails, a property the normal distribution does not possess. The generalised hyperbolic distribution is often used in economics, with particular application in the fields of modelling financial markets and risk management, due to its semi-heavy tails. This class is closed under linear operations. 
Generalized Integration Model  Integrates individuallevel data and summary statistics under a generalized linear model framework. gim 
Generalized Kalman Smoothing  State-space smoothing has found many applications in science and engineering. Under linear and Gaussian assumptions, smoothed estimates can be obtained using efficient recursions, for example Rauch-Tung-Striebel and Mayne-Fraser algorithms. Such schemes are equivalent to linear algebraic techniques that minimize a convex quadratic objective function with structure induced by the dynamic model. These classical formulations fall short in many important circumstances. For instance, smoothers obtained using quadratic penalties can fail when outliers are present in the data, and cannot track impulsive inputs and abrupt state changes. Motivated by these shortcomings, generalized Kalman smoothing formulations have been proposed in the last few years, replacing quadratic models with more suitable, often nonsmooth, convex functions. In contrast to classical models, these general estimators require use of iterated algorithms, and these have received increased attention from control, signal processing, machine learning, and optimization communities. In this survey we show that the optimization viewpoint provides the control and signal processing community great freedom in the development of novel modeling and inference frameworks for dynamical systems. We discuss general statistical models for dynamic systems, making full use of nonsmooth convex penalties and constraints, and providing links to important models in signal processing and machine learning. We also survey optimization techniques for these formulations, paying close attention to dynamic problem structure. Modeling concepts and algorithms are illustrated with numerical examples. 
Generalized Lambda Distribution (GLD) 
The generalized lambda distribution is a generic distribution that can be used for various curve fittings or in general mathematical analysis. It is interesting because of the wide variety of distributional shapes it can take on. Methods exist for using this distribution to approximate various other distributions, or for fitting experimental data to it. GLDEX 
Generalized Least Squares Screening (GLSS) 
Variable selection is a widely studied problem in high dimensional statistics, primarily since estimating the precise relationship between the covariates and the response is of great importance in many scientific disciplines. However, most of the theory and methods developed towards this goal for the linear model invoke the assumption of iid sub-Gaussian covariates and errors. This paper analyzes the theoretical properties of Sure Independence Screening (SIS) (Fan and Lv ) for high dimensional linear models with dependent and/or heavy tailed covariates and errors. We also introduce a generalized least squares screening (GLSS) procedure which utilizes the serial correlation present in the data. By utilizing this serial correlation when estimating our marginal effects, GLSS is shown to outperform SIS in many cases. For both procedures we prove sure screening properties, which depend on the moment conditions, and the strength of dependence in the error and covariate processes, amongst other factors. Additionally, combining these screening procedures with the adaptive Lasso is analyzed. Dependence is quantified by functional dependence measures (Wu ), and the results rely on the use of Nagaev-type and exponential inequalities for dependent random variables. We also conduct simulations to demonstrate the finite sample performance of these procedures, and include a real data application of forecasting the US inflation rate. 
Generalized Likelihood Ratio Test (GLRT) 

Generalized Linear Mixed Model (GLMM) 
In statistics, a generalized linear mixed model (GLMM) is a particular type of mixed model. It is an extension to the generalized linear model in which the linear predictor contains random effects in addition to the usual fixed effects. These random effects are usually assumed to have a normal distribution. Fitting such models by maximum likelihood involves integrating over these random effects. In general, these integrals cannot be expressed in analytical form. Various approximate methods have been developed, but none has good properties for all possible models and data sets (ungrouped binary data being particularly problematic). For this reason, methods involving numerical quadrature or Markov chain Monte Carlo have increased in use as increasing computing power and advances in methods have made them more practical. 
Generalized Linear Models (GLM) 
In statistics, the generalized linear model (GLM) is a flexible generalization of ordinary linear regression that allows for response variables that have error distribution models other than a normal distribution. The GLM generalizes linear regression by allowing the linear model to be related to the response variable via a link function and by allowing the magnitude of the variance of each measurement to be a function of its predicted value. mdscore,mglmn 
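The two ingredients named above, a link function and a variance that depends on the predicted value, can be seen in a small fitting routine. The sketch below assumes a Poisson response with a log link and a single predictor, and uses iteratively reweighted least squares, a standard fitting method that the entry itself does not name; the function name is illustrative.

```python
import math


def fit_poisson_glm(xs, ys, iterations=25):
    """Fit log(E[y]) = b0 + b1 * x by iteratively reweighted least squares."""
    b0 = math.log(sum(ys) / len(ys))  # start from the intercept-only fit
    b1 = 0.0
    for _ in range(iterations):
        sw = swx = swxx = swz = swxz = 0.0
        for x, y in zip(xs, ys):
            eta = b0 + b1 * x        # linear predictor
            mu = math.exp(eta)       # inverse of the log link gives the mean
            w = mu                   # Poisson variance equals the mean
            z = eta + (y - mu) / mu  # working response on the link scale
            sw += w
            swx += w * x
            swxx += w * x * x
            swz += w * z
            swxz += w * x * z
        # Solve the 2x2 weighted least squares normal equations.
        det = sw * swxx - swx ** 2
        b1 = (sw * swxz - swx * swz) / det
        b0 = (swxx * swz - swx * swxz) / det
    return b0, b1
```

Swapping the link (for example logit) changes only the lines that compute `mu`, `w` and `z`, which is the sense in which the GLM generalizes linear regression.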
Generalized Logistic Distribution  The term generalized logistic distribution is used as the name for several different families of probability distributions. For example, Johnson et al. list four forms. One of these families has also been called the skew-logistic distribution. For other families of distributions that have also been called generalized logistic distributions, see the shifted log-logistic distribution, which is a generalization of the log-logistic distribution. 
Generalized Lp-Norm Two-Dimensional Linear Discriminant Analysis (G2DLDA) 
Recent advances show that two-dimensional linear discriminant analysis (2DLDA) is a successful matrix-based dimensionality reduction method. However, 2DLDA may encounter the singularity issue in theory and is sensitive to outliers. In this paper, a generalized Lp-norm 2DLDA framework with regularization for an arbitrary $p>0$ is proposed, named G2DLDA. G2DLDA makes two main contributions. First, the G2DLDA model uses an arbitrary Lp-norm to measure the between-class and within-class scatter, and hence a proper $p$ can be selected to achieve robustness. Second, by introducing an extra regularization term, G2DLDA achieves better generalization performance and solves the singularity problem. In addition, G2DLDA can be solved through a series of convex problems with equality constraints, and each single problem has a closed-form solution. Its convergence can be guaranteed theoretically when $1\leq p\leq2$. Preliminary experimental results on three contaminated human face databases show the effectiveness of the proposed G2DLDA. 
Generalized Mallows Model Latent Dirichlet Allocation (GMM-LDA) 
Modeling document structure is of great importance for discourse analysis and related applications. The goal of this research is to capture the document intent structure by modeling documents as a mixture of topic words and rhetorical words. While the topics are relatively unchanged through one document, the rhetorical functions of sentences usually change following certain orders in discourse. We propose GMM-LDA, a topic modeling based Bayesian unsupervised model, to analyze the document intent structure while incorporating order information. Our model is flexible and can also incorporate annotations for supervised learning. Additionally, entropic regularization can be introduced to model the significant divergence between topics and intents. We perform experiments in both unsupervised and supervised settings; the results show the superiority of our model over several state-of-the-art baselines. 
Generalized Matrix Chain Algorithm  In this paper, we present a generalized version of the matrix chain algorithm to generate efficient code for linear algebra problems, a task for which human experts often invest days or even weeks of work. The standard matrix chain problem consists in finding the parenthesization of a matrix product $M := A_1 A_2 \cdots A_n$ that minimizes the number of scalar operations. In practical applications, however, one frequently encounters more complicated expressions, involving transposition, inversion, and matrix properties. Indeed, the computation of such expressions relies on a set of computational kernels that offer functionality well beyond the simple matrix product. The challenge then shifts from finding an optimal parenthesization to finding an optimal mapping of the input expression to the available kernels. Furthermore, it is often the case that a solution based on the minimization of scalar operations does not result in the optimal solution in terms of execution time. In our experiments, the generated code outperforms other libraries and languages on average by a factor of about 9. The motivation for this work comes from the fact that, despite great advances in the development of compilers, the task of mapping linear algebra problems to optimized kernels still has to be done manually. In order to relieve the user of this complex task, new techniques for the compilation of linear algebra expressions have to be developed. 
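For reference, the standard matrix chain problem mentioned above (not the generalized algorithm of the paper) has a classic dynamic programming solution. The sketch below assumes that multiplying a p x q matrix by a q x r matrix costs p * q * r scalar multiplications; the function name and output format are illustrative.

```python
def matrix_chain_order(dims):
    """Return (minimum cost, parenthesization) for the product A1 A2 ... An.

    Matrix Ai has shape dims[i - 1] x dims[i], so len(dims) == n + 1.
    """
    if len(dims) < 2:
        raise ValueError("need at least one matrix")
    n = len(dims) - 1
    cost = [[0] * n for _ in range(n)]   # cost[i][j]: best cost for Ai+1 .. Aj+1
    split = [[0] * n for _ in range(n)]  # split[i][j]: where the best split occurs
    for length in range(2, n + 1):       # length of the sub-chain
        for i in range(n - length + 1):
            j = i + length - 1
            best = None
            for k in range(i, j):
                candidate = (cost[i][k] + cost[k + 1][j]
                             + dims[i] * dims[k + 1] * dims[j + 1])
                if best is None or candidate < best:
                    best = candidate
                    split[i][j] = k
            cost[i][j] = best

    def render(i, j):
        if i == j:
            return f"A{i + 1}"
        k = split[i][j]
        return f"({render(i, k)} {render(k + 1, j)})"

    return cost[0][n - 1], render(0, n - 1)
```

For shapes 10 x 30, 30 x 5 and 5 x 60, multiplying the first two matrices first costs 1500 + 3000 = 4500 scalar multiplications, against 27000 for the other order.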
Generalized Maximum Entropy Estimation  We consider the problem of estimating a probability distribution that maximizes the entropy while satisfying a finite number of moment constraints, possibly corrupted by noise. Based on duality of convex programming, we present a novel approximation scheme using a smoothed fast gradient method that is equipped with explicit bounds on the approximation error. We further demonstrate how the presented scheme can be used for approximating the chemical master equation through the zero-information moment closure method. 
Generalized Method of Wavelet Moments (GMWM) 
gmwm 
Generalized Min-Max (GMM) 
We develop some theoretical results for a robust similarity measure named ‘generalized min-max’ (GMM). This similarity has direct applications in machine learning as a positive definite kernel and can be efficiently computed via probabilistic hashing. Owing to the discrete nature, the hashed values can also be used for efficient near neighbor search. We prove the theoretical limit of GMM and the consistency result, assuming that the data follow an elliptical distribution, which is a very general family of distributions and includes the multivariate $t$-distribution as a special case. The consistency result holds as long as the data have bounded first moment (an assumption which essentially holds for datasets commonly encountered in practice). Furthermore, we establish the asymptotic normality of GMM. Compared to the ‘cosine’ similarity which is routinely adopted in current practice in statistics and machine learning, the consistency of GMM requires much weaker conditions. Interestingly, when the data follow the $t$-distribution with $\nu$ degrees of freedom, GMM typically provides a better measure of similarity than ‘cosine’ roughly when $\nu<8$ (which is already very close to normal). These theoretical results will help explain the recent success of GMM in learning tasks. 
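The abstract does not spell out the similarity itself. As commonly defined in the GMM literature, each coordinate is first split into its positive and negative parts, and the similarity is the sum of coordinate-wise minima divided by the sum of coordinate-wise maxima. The sketch below follows that definition; the function names and the convention for two all-zero vectors are assumptions made here.

```python
def split_signs(x):
    """Map a vector to twice its length: (positive part, negative part) per coordinate."""
    out = []
    for value in x:
        out.append(value if value > 0 else 0.0)
        out.append(-value if value < 0 else 0.0)
    return out


def gmm_similarity(u, v):
    """Generalized min-max similarity between two real vectors of equal length."""
    if len(u) != len(v):
        raise ValueError("vectors must have the same length")
    a, b = split_signs(u), split_signs(v)
    denominator = sum(max(p, q) for p, q in zip(a, b))
    if denominator == 0:
        return 1.0  # convention chosen here: two all-zero vectors are identical
    return sum(min(p, q) for p, q in zip(a, b)) / denominator
```

The result lies between 0 and 1, equals 1 for identical vectors, and equals 0 when the vectors share no coordinate with the same sign.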
Generalized Multistate Simulation Model  GUIgems,gems 
Generalized Network Dismantling  Finding the set of nodes whose removal or (de)activation can stop the spread of (dis)information, contain an epidemic or disrupt the functioning of a corrupt/criminal organization is still one of the key challenges in network science. In this paper, we introduce the generalized network dismantling problem, which aims to find the set of nodes that, when removed from a network, results in a network fragmentation into subcritical network components at minimum cost. For unit costs, our formulation becomes equivalent to the standard network dismantling problem. Our non-unit cost generalization allows for the inclusion of topological cost functions related to node centrality and non-topological features such as the price, protection level or even social value of a node. In order to solve this optimization problem, we propose a method based on the spectral properties of a novel node-weighted Laplacian operator. The proposed method is applicable to large-scale networks with millions of nodes. It outperforms current state-of-the-art methods and opens new directions in understanding the vulnerability and robustness of complex systems. 
Generalized Probability Smoothing  In this work we consider a generalized version of Probability Smoothing, the core elementary model for sequential prediction in the state of the art PAQ family of data compression algorithms. Our main contribution is a code length analysis that considers the redundancy of Probability Smoothing with respect to a Piecewise Stationary Source. The analysis holds for a finite alphabet and expresses redundancy in terms of the total variation in probability mass of the stationary distributions of a Piecewise Stationary Source. By choosing parameters appropriately Probability Smoothing has redundancy $O(S\cdot\sqrt{T\log T})$ for sequences of length $T$ with respect to a Piecewise Stationary Source with $S$ segments. 
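To make the setting concrete, here is a minimal sketch of a basic (non-generalized) probability smoothing predictor and the code length it assigns to a sequence. It assumes the commonly used update in which the current estimate is scaled by a fixed smoothing rate `alpha` and the remaining mass goes to the symbol just observed; the fixed rate, the uniform starting distribution and the function name are assumptions, not details taken from the paper.

```python
import math


def probability_smoothing_code_length(sequence, alphabet, alpha=0.9):
    """Return (code length in bits, final distribution) for a symbol sequence."""
    if not 0 < alpha < 1:
        raise ValueError("alpha must lie strictly between 0 and 1")
    probs = {symbol: 1.0 / len(alphabet) for symbol in alphabet}
    bits = 0.0
    for observed in sequence:
        # Ideal code length of the observed symbol under the current estimate.
        bits += -math.log2(probs[observed])
        # Shrink every probability, then give the freed mass to the observed symbol.
        for symbol in alphabet:
            hit = 1.0 if symbol == observed else 0.0
            probs[symbol] = alpha * probs[symbol] + (1 - alpha) * hit
    return bits, probs
```

Redundancy, in the sense used above, is the difference between this code length and the code length of a competitor that knows the piecewise stationary source.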
Generalized Procrustes Analysis (GPA) 
Generalized Procrustes analysis (GPA) is a method of statistical analysis that can be used to compare the shapes of objects, or the results of surveys, interviews, or panels. It was developed for analysing the results of free-choice profiling, a survey technique which allows respondents (such as sensory panelists) to describe a range of products in their own words or language. GPA is one way to make sense of free-choice profiling data; other ways can be multiple factor analysis (MFA), or the STATIS method. The method was first published by J. C. Gower in 1975. 
Generalized Resistant Hyperplane Mechanisms  This paper is part of an emerging line of work at the intersection of machine learning and mechanism design, which aims to avoid noise in training data by correctly aligning the incentives of data sources. Specifically, we focus on the ubiquitous problem of linear regression, where strategyproof mechanisms have previously been identified in two dimensions. In our setting, agents have single-peaked preferences and can manipulate only their response variables. Our main contribution is the discovery of a family of group strategyproof linear regression mechanisms in any number of dimensions, which we call generalized resistant hyperplane mechanisms. The game-theoretic properties of these mechanisms (and, in fact, their very existence) are established through a connection to a discrete version of the Ham Sandwich Theorem. 
Generalized Structured Component Analysis (GSCA) 
gesca 
Generalized Value Iteration Network (GVIN) 
In this paper, we introduce a generalized value iteration network (GVIN), which is an end-to-end neural network planning module. GVIN emulates the value iteration algorithm by using a novel graph convolution operator, which enables GVIN to learn and plan on irregular spatial graphs. We propose three novel differentiable kernels as graph convolution operators and show that the embedding-based kernel achieves the best performance. We further propose episodic Q-learning, an improvement upon traditional n-step Q-learning that stabilizes training for networks that contain a planning module. Lastly, we evaluate GVIN on planning problems in 2D mazes, irregular graphs, and real-world street networks, showing that GVIN generalizes well for both arbitrary graphs and unseen graphs of larger scale and outperforms a naive generalization of VIN (discretizing a spatial graph into a 2D image). 
Generalized Vector Space Model (GVSM) 
The generalized vector space model is a generalization of the vector space model used in information retrieval. Many classifiers, especially those which are related to document or text classification, use the TF-IDF basis of VSM. However, this is where the similarity between the models ends: the generalized model uses the results of the TF-IDF dictionary to generate similarity metrics based on distance or angle difference, rather than centroid based classification. Wong et al. presented an analysis of the problems that the pairwise orthogonality assumption of the vector space model (VSM) creates. From here they extended the VSM to the generalized vector space model (GVSM). 
General-to-Specific Model (GETS) 
This paper discusses the econometric methodology of general-to-specific modeling, in which the modeler simplifies an initially general model that adequately characterizes the empirical evidence within his or her theoretical framework. Central aspects of this approach include the theory of reduction, dynamic specification, model selection procedures, model selection criteria, model comparison, encompassing, computer automation, and empirical implementation. This paper thus reviews the theory of reduction, summarizes the approach of general-to-specific modeling, and discusses the econometrics of model selection, noting that general-to-specific modeling is the practical embodiment of reduction. gets 
Generative Adversarial Autoencoder Network (GAAN) 
We introduce an effective model to overcome the problem of mode collapse when training Generative Adversarial Networks (GANs). Firstly, we propose a new generator objective that is better suited to tackling mode collapse. We also apply an independent autoencoder (AE) to constrain the generator and treat its reconstructed samples as ‘real’ samples, which slows down the convergence of the discriminator, reduces the gradient vanishing problem, and stabilizes the model. Secondly, from the mappings between latent and data spaces provided by the AE, we further regularize the AE by the relative distance between the latent and data samples to explicitly prevent the generator from falling into mode collapse. This idea arose when we found a new way to visualize mode collapse on the MNIST dataset. To the best of our knowledge, our method is the first to propose and successfully apply the relative distance of latent and data samples for stabilizing GANs. Thirdly, our proposed model, namely Generative Adversarial Autoencoder Networks (GAAN), is stable and suffers from neither gradient vanishing nor mode collapse, as empirically demonstrated on synthetic, MNIST, MNIST-1K, CelebA and CIFAR-10 datasets. Experimental results show that our method can approximate multimodal distributions well and achieves better results than state-of-the-art methods on these benchmark datasets. Our model implementation is published here: https://…/gaan 
Generative Adversarial Capsule Network (CapsuleGAN) 
We present Generative Adversarial Capsule Network (CapsuleGAN), a framework that uses capsule networks (CapsNets) instead of the standard convolutional neural networks (CNNs) as discriminators within the generative adversarial network (GAN) setting, while modeling image data. We provide guidelines for designing CapsNet discriminators and the updated GAN objective function, which incorporates the CapsNet margin loss, for training CapsuleGAN models. We show that CapsuleGAN outperforms convolutional GAN at modeling image data distribution on the MNIST dataset of handwritten digits, evaluated on the generative adversarial metric and at semi-supervised image classification. 
Generative Adversarial Imitation Learning (GAIL) 

Generative Adversarial Mapping Networks (GAMN) 
Generative Adversarial Networks (GANs) have shown impressive performance in generating photo-realistic images. They fit generative models by minimizing a certain distance measure between the real image distribution and the generated data distribution. Several distance measures have been used, such as Jensen-Shannon divergence, $f$-divergence, and Wasserstein distance, and choosing an appropriate distance measure is very important for training the generative network. In this paper, we choose to use the maximum mean discrepancy (MMD) as the distance metric, which has several nice theoretical guarantees. In fact, generative moment matching network (GMMN) (Li, Swersky, and Zemel 2015) is such a generative model which contains only one generator network $G$ trained by directly minimizing MMD between the real and generated distributions. However, it fails to generate meaningful samples on challenging benchmark datasets, such as CIFAR-10 and LSUN. To improve on GMMN, we propose to add an extra network $F$, called mapper. $F$ maps both real data distribution and generated data distribution from the original data space to a feature representation space $\mathcal{R}$, and it is trained to maximize MMD between the two mapped distributions in $\mathcal{R}$, while the generator $G$ tries to minimize the MMD. We call the new model generative adversarial mapping networks (GAMNs). We demonstrate that the adversarial mapper $F$ can help $G$ to better capture the underlying data distribution. We also show that GAMN significantly outperforms GMMN, and is also superior to or comparable with other state-of-the-art GAN based methods on MNIST, CIFAR-10 and LSUN-Bedrooms datasets. 
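The MMD that both GMMN and GAMN optimize can be estimated directly from two samples. The sketch below is the standard biased estimate of the squared MMD; the Gaussian kernel and its fixed bandwidth are assumptions made for illustration, since the abstract does not say which kernel is used.

```python
import math


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two points given as equal-length tuples."""
    squared_distance = sum((p - q) ** 2 for p, q in zip(a, b))
    return math.exp(-squared_distance / (2 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD between samples xs and ys."""
    def mean_kernel(first, second):
        total = sum(gaussian_kernel(a, b, bandwidth) for a in first for b in second)
        return total / (len(first) * len(second))

    # Within-sample similarity of each sample minus twice the cross-sample similarity.
    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2 * mean_kernel(xs, ys)
```

In the GAMN setting described above, `xs` and `ys` would be real and generated samples after passing through the mapper $F$, with $G$ trying to lower this value and $F$ trying to raise it.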
Generative Adversarial Network (GAN) 
We propose a new framework for estimating generative models via an adversarial process, in which we simultaneously train two models: a generative model G that captures the data distribution, and a discriminative model D that estimates the probability that a sample came from the training data rather than G. The training procedure for G is to maximize the probability of D making a mistake. This framework corresponds to a minimax two-player game. In the space of arbitrary functions G and D, a unique solution exists, with G recovering the training data distribution and D equal to 1/2 everywhere. In the case where G and D are defined by multilayer perceptrons, the entire system can be trained with backpropagation. There is no need for any Markov chains or unrolled approximate inference networks during either training or generation of samples. Experiments demonstrate the potential of the framework through qualitative and quantitative evaluation of the generated samples. GitXiv 
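The claim that D equals 1/2 everywhere at the solution can be checked numerically on a toy problem. The sketch below assumes discrete distributions with strictly positive probabilities and uses the value function V(D, G) = E_data[log D(x)] + E_G[log(1 - D(x))] from the GAN paper; the function names are illustrative.

```python
import math


def value_function(p_data, p_gen, discriminator):
    """V(D, G) for discrete distributions given as lists of probabilities."""
    return sum(pd * math.log(d) + pg * math.log(1 - d)
               for pd, pg, d in zip(p_data, p_gen, discriminator))


def optimal_discriminator(p_data, p_gen):
    """Pointwise maximizer of V for a fixed generator: p_data / (p_data + p_gen)."""
    return [pd / (pd + pg) for pd, pg in zip(p_data, p_gen)]


# When the generator matches the data, the best discriminator is 1/2 everywhere.
p_data = [0.2, 0.3, 0.5]
best = optimal_discriminator(p_data, p_data)
value_at_optimum = value_function(p_data, p_data, best)
```

At this point the value equals -log 4, and any generator distribution that differs from the data distribution gives a strictly larger value against its own optimal discriminator.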
Generative Adversarial Network Embedding (GANE) 
Network embedding has recently become a hot research topic, since it can provide low-dimensional feature representations for many machine learning applications. Current work either (1) designs the embedding as an unsupervised learning task by explicitly preserving the structural connectivity in the network, or (2) obtains the embedding as a by-product of the supervised learning of a specific discriminative task in a deep neural network. In this paper, we focus on bridging the gap between these two lines of research. We propose to adapt the Generative Adversarial model to perform network embedding, in which the generator tries to generate vertex pairs, while the discriminator tries to distinguish the generated vertex pairs from real connections (edges) in the network. The Wasserstein-1 distance is adopted to train the generator to gain better stability. We develop three variations of models, including GANE which applies cosine similarity, GANE-O1 which preserves the first-order proximity, and GANE-O2 which tries to preserve the second-order proximity of the network in the low-dimensional embedded vector space. We later prove that GANE-O2 has the same objective function as GANE-O1 when negative sampling is applied to simplify the training process in GANE-O2. Experiments with real-world network datasets demonstrate that our models consistently outperform state-of-the-art solutions with significant improvements on precision in link prediction, as well as on visualizations and accuracy in clustering tasks. 
Generative Adversarial Network Game (GANG) 
Generative Adversarial Networks (GANs) have become one of the most successful frameworks for unsupervised generative modeling. As GANs are difficult to train, much research has focused on this. However, very little of this research has directly exploited game-theoretic techniques. We introduce Generative Adversarial Network Games (GANGs), which explicitly model a finite zero-sum game between a generator ($G$) and classifier ($C$) that use mixed strategies. The size of these games precludes exact solution methods, therefore we define resource-bounded best responses (RBBRs), and a resource-bounded Nash Equilibrium (RBNE) as a pair of mixed strategies such that neither $G$ nor $C$ can find a better RBBR. The RBNE solution concept is richer than the notion of ‘local Nash equilibria’ in that it captures not only failures of escaping local optima of gradient descent, but applies to any approximate best response computations, including methods with random restarts. To validate our approach, we solve GANGs with the Parallel Nash Memory algorithm, which provably monotonically converges to an RBNE. We compare our results to standard GAN setups, and demonstrate that our method deals well with typical GAN problems such as mode collapse, partial mode coverage and forgetting. 
Generative Adversarial Privacy  Preserving the utility of published datasets, while providing provable privacy guarantees, is a well-known challenge. On the one hand, context-free privacy solutions, such as differential privacy, provide strong privacy guarantees, but often lead to a significant reduction in utility. On the other hand, context-aware privacy solutions, such as information theoretic privacy, achieve an improved privacy-utility tradeoff, but assume that the data holder has access to the dataset’s statistics. To circumvent this problem, we present a novel context-aware privacy framework called generative adversarial privacy (GAP). GAP leverages recent advancements in generative adversarial networks (GANs) to allow the data holder to learn ‘optimal’ privatization schemes from the dataset itself. Under GAP, learning the privacy mechanism is formulated as a constrained minimax game between two players: a privatizer that sanitizes the dataset in a way that limits the risk of inference attacks on the individuals’ private variables, and an adversary that tries to infer the private variables from the sanitized dataset. To evaluate GAP’s performance, we investigate two simple (yet canonical) statistical dataset models: (a) the binary data model, and (b) the binary Gaussian mixture model. For both models, we derive game-theoretically optimal minimax privacy mechanisms, and show that the privacy mechanisms learned from data (in an iterative generative adversarial fashion) match the theoretically optimal ones. This demonstrates that our framework can be easily applied in practice, even in the absence of dataset statistics. 
Generative Autotransporter (GAT) 
In this paper, we aim to introduce the classic Optimal Transport theory to enhance deep generative probabilistic modeling. For this purpose, we design a Generative Autotransporter (GAT) model with explicit distribution optimal transport. In particular, the GAT model has a deep distribution transporter that transfers the target distribution to a specific prior probability distribution, which enables a regular decoder to generate target samples from input data that follows the transported prior distribution. With such a design, the GAT model can be stably trained to generate novel data by merely using a very simple $l_1$ reconstruction loss function with a generalized manifold-based Adam training algorithm. The experiments on two standard benchmarks demonstrate its strong generation ability. 
Generative Information Lower BOund (GILBO) 
We propose a simple, tractable lower bound on the mutual information contained in the joint generative density of any latent variable generative model: the GILBO (Generative Information Lower BOund). It offers a data-independent measure of the complexity of the learned latent variable description, giving the log of the effective description length. It is well-defined for both VAEs and GANs. We compute the GILBO for 800 GANs and VAEs trained on MNIST and discuss the results. 
Generative Learning Algorithms  Algorithms that try to learn p(y|x) directly (such as logistic regression), or algorithms that try to learn mappings directly from the space of inputs X to the labels {0, 1} (such as the perceptron algorithm), are called discriminative learning algorithms. Here, we’ll talk about algorithms that instead try to model p(x|y) (and p(y)). These algorithms are called generative learning algorithms. For instance, if y indicates whether an example is a dog (0) or an elephant (1), then p(x|y = 0) models the distribution of dogs’ features, and p(x|y = 1) models the distribution of elephants’ features. Naive Bayes Generative Learning Algorithms 
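The idea can be illustrated with a small sketch: fit p(y) and a class-conditional density for each label, then combine them with Bayes' rule to obtain the posterior over labels. The one-dimensional Gaussian class model, the function names and the toy "weight" data are all assumptions made for this illustration, not part of the entry.

```python
import math


def gaussian_pdf(x, mean, std):
    """Density of a 1-D normal distribution at x."""
    return math.exp(-0.5 * ((x - mean) / std) ** 2) / (std * math.sqrt(2 * math.pi))


def fit(xs, ys):
    """Estimate the prior p(y) and a Gaussian p(x|y) for every label."""
    params = {}
    for label in set(ys):
        pts = [x for x, y in zip(xs, ys) if y == label]
        mean = sum(pts) / len(pts)
        var = sum((p - mean) ** 2 for p in pts) / len(pts)
        params[label] = (len(pts) / len(xs), mean, math.sqrt(var))
    return params


def posterior(x, params):
    """Bayes' rule: p(y|x) is proportional to p(x|y) * p(y)."""
    joint = {
        label: prior * gaussian_pdf(x, mean, std)
        for label, (prior, mean, std) in params.items()
    }
    total = sum(joint.values())
    return {label: value / total for label, value in joint.items()}


# Hypothetical toy data: body weight in kg, label 0 = dog, 1 = elephant.
weights = [20.0, 25.0, 30.0, 4000.0, 5000.0, 6000.0]
labels = [0, 0, 0, 1, 1, 1]
model = fit(weights, labels)
post = posterior(28.0, model)  # posterior over labels for a 28 kg animal
```

A discriminative method would model the label given the input directly; here the label is recovered only at prediction time, by comparing how well each class model explains the input.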
Generative Markov Network (GMN) 
The assumption that data samples are independently and identically distributed is the backbone of many learning algorithms. Nevertheless, datasets often exhibit rich structures in practice, and we argue that there exist some unknown orders within the data instances. Aiming to find such orders, we introduce a novel Generative Markov Network (GMN) which we use to extract the order of data instances automatically. Specifically, we assume that the instances are sampled from a Markov chain. Our goal is to learn the transitional operator of the chain as well as the generation order by maximizing the generation probability under all possible data permutations. One of our key ideas is to use neural networks as a soft lookup table for approximating the possibly huge, but discrete transition matrix. This strategy allows us to amortize the space complexity with a single model and make the transitional operator generalizable to unseen instances. To ensure the learned Markov chain is ergodic, we propose a greedy batch-wise permutation scheme that allows fast training. Empirically, we evaluate the learned Markov chain by showing that GMNs are able to discover orders among data instances and also perform comparably well to state-of-the-art methods on the one-shot recognition benchmark task. 
Generative Mixture of Networks  A generative model based on training deep architectures is proposed. The model consists of K networks that are trained together to learn the underlying distribution of a given data set. The process starts with dividing the input data into K clusters and feeding each of them into a separate network. After a few iterations of training the networks separately, we use an EM-like algorithm to train the networks together and update the clusters of the data. We call this model Mixture of Networks. The provided model is a platform that can be used for any deep structure and be trained by any conventional objective function for distribution modeling. As the components of the model are neural networks, it has high capability in characterizing complicated data distributions as well as clustering data. We apply the algorithm on MNIST handwritten digits and Yale face datasets. We also demonstrate the clustering ability of the model using some real-world and toy examples. 
Generative Model  In probability and statistics, a generative model is a model for randomly generating observable data, typically given some hidden parameters. It specifies a joint probability distribution over observation and label sequences. Generative models are used in machine learning for either modeling data directly (i.e., modeling observations drawn from a probability density function), or as an intermediate step to forming a conditional probability density function. A conditional distribution can be formed from a generative model through Bayes’ rule. Shannon (1948) gives an example in which a table of frequencies of English word pairs is used to generate a sentence beginning with “representing and speedily is an good”; which is not proper English but which will increasingly approximate it as the table is moved from word pairs to word triplets etc. 
Generative Moment Matching Network (GMMN) 
Generative moment matching network (GMMN) is a deep generative model that differs from Generative Adversarial Network (GAN) by replacing the discriminator in GAN with a two-sample test based on kernel maximum mean discrepancy (MMD). 
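To make the kernel MMD statistic concrete, the sketch below computes the biased (V-statistic) estimate of squared MMD between two one-dimensional samples with a Gaussian kernel. The function names, the fixed bandwidth and the toy samples are assumptions for illustration; a GMMN would minimize such a quantity between generated and real batches rather than merely evaluate it.

```python
import math
import random


def gaussian_kernel(a, b, bandwidth=1.0):
    """Gaussian (RBF) kernel between two scalars."""
    return math.exp(-((a - b) ** 2) / (2.0 * bandwidth ** 2))


def mmd_squared(xs, ys, bandwidth=1.0):
    """Biased estimate of squared MMD: E[k(x,x')] + E[k(y,y')] - 2 E[k(x,y)]."""
    def mean_kernel(us, vs):
        total = sum(gaussian_kernel(u, v, bandwidth) for u in us for v in vs)
        return total / (len(us) * len(vs))

    return mean_kernel(xs, xs) + mean_kernel(ys, ys) - 2.0 * mean_kernel(xs, ys)


rng = random.Random(0)
real = [rng.gauss(0.0, 1.0) for _ in range(200)]
close = [rng.gauss(0.0, 1.0) for _ in range(200)]  # same distribution as `real`
far = [rng.gauss(3.0, 1.0) for _ in range(200)]    # shifted distribution

mmd_close = mmd_squared(real, close)
mmd_far = mmd_squared(real, far)
```

The estimate is small when both samples come from the same distribution and grows as the distributions move apart, which is what lets it stand in for a discriminator.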
Generative Moment Matching Network – Generative Adversarial Network (MMD-GAN) 
Generative moment matching network (GMMN) is a deep generative model that differs from Generative Adversarial Network (GAN) by replacing the discriminator in GAN with a two-sample test based on kernel maximum mean discrepancy (MMD). Although some theoretical guarantees of MMD have been studied, the empirical performance of GMMN is still not as competitive as that of GAN on challenging and large benchmark datasets. The computational efficiency of GMMN is also less desirable in comparison with GAN, partially due to its requirement for a rather large batch size during training. In this paper, we propose to improve both the model expressiveness of GMMN and its computational efficiency by introducing adversarial kernel learning techniques, as the replacement of a fixed Gaussian kernel in the original GMMN. The new approach combines the key ideas in both GMMN and GAN, hence we name it MMD-GAN. The new distance measure in MMD-GAN is a meaningful loss that enjoys the advantage of weak topology and can be optimized via gradient descent with relatively small batch sizes. In our evaluation on multiple benchmark datasets, including MNIST, CIFAR-10, CelebA and LSUN, MMD-GAN significantly outperforms GMMN, and is competitive with other representative GAN works. 
Generative Topic Embedding  Word embedding maps words into a low-dimensional continuous embedding space by exploiting the local word collocation patterns in a small context window. On the other hand, topic modeling maps documents onto a low-dimensional topic space, by utilizing the global word collocation patterns in the same document. These two types of patterns are complementary. In this paper, we propose a generative topic embedding model to combine the two types of patterns. In our model, topics are represented by embedding vectors, and are shared across documents. The probability of each word is influenced by both its local context and its topic. A variational inference method yields the topic embeddings as well as the topic mixing proportions for each document. Jointly they represent the document in a low-dimensional continuous space. In two d 
Generative Topographic Map (GTM) 
Generative topographic map (GTM) is a machine learning method that is a probabilistic counterpart of the self-organizing map (SOM), is provably convergent and does not require a shrinking neighborhood or a decreasing step size. It is a generative model: the data is assumed to arise by first probabilistically picking a point in a low-dimensional space, mapping the point to the observed high-dimensional input space (via a smooth function), then adding noise in that space. The parameters of the low-dimensional probability distribution, the smooth map and the noise are all learned from the training data using the expectation-maximization (EM) algorithm. GTM was introduced in 1996 in a paper by Christopher M. Bishop, Markus Svensen, and Christopher K. I. Williams. 
Genetic Algorithm (GA) 
In the computer science field of artificial intelligence, a genetic algorithm (GA) is a search heuristic that mimics the process of natural selection. This heuristic (also sometimes called a metaheuristic) is routinely used to generate useful solutions to optimization and search problems. Genetic algorithms belong to the larger class of evolutionary algorithms (EA), which generate solutions to optimization problems using techniques inspired by natural evolution, such as inheritance, mutation, selection, and crossover. Genetic algorithms find application in bioinformatics, phylogenetics, computational science, engineering, economics, chemistry, manufacturing, mathematics, physics, pharmacometrics and other fields. 
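A minimal sketch of the selection, crossover and mutation loop is given below. It is a toy, not a reference implementation: the OneMax fitness function (count the 1 bits), tournament selection, single-point crossover, elitism and every parameter value are assumptions chosen to keep the example short.

```python
import random


def one_max(bits):
    """Toy fitness: the number of 1 bits in the candidate solution."""
    return sum(bits)


def tournament(pop, rng, k=3):
    """Selection: pick k random individuals and keep the fittest."""
    return max(rng.sample(pop, k), key=one_max)


def crossover(a, b, rng):
    """Single-point crossover: the child inherits a prefix of a and a suffix of b."""
    point = rng.randrange(1, len(a))
    return a[:point] + b[point:]


def mutate(bits, rng, rate):
    """Flip each bit independently with probability `rate`."""
    return [1 - b if rng.random() < rate else b for b in bits]


def run_ga(n_bits=20, pop_size=30, generations=40, rate=0.02, seed=0):
    rng = random.Random(seed)
    pop = [[rng.randint(0, 1) for _ in range(n_bits)] for _ in range(pop_size)]
    for _ in range(generations):
        children = [max(pop, key=one_max)]  # elitism: carry over the current best
        while len(children) < pop_size:
            child = crossover(tournament(pop, rng), tournament(pop, rng), rng)
            children.append(mutate(child, rng, rate))
        pop = children
    return max(pop, key=one_max)


best = run_ga()
```

Because the best individual is copied unchanged into each new generation, the best fitness in the population never decreases from one generation to the next.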
Genetic Evolution Network (GEN) 
In this paper, we introduce an alternative approach to deep learning models, namely the GEN (Genetic Evolution Network) model. Instead of building one single deep model, GEN adopts a genetic-evolutionary learning strategy to build a group of unit models generation by generation. In contrast to the well-known representation learning models with extremely deep structures, the unit models covered in GEN have a much shallower architecture. In the training process, from each generation, a subset of unit models will be selected based on their performance to evolve and generate the child models in the next generation. GEN has significant advantages compared with existing deep representation learning models in terms of learning effectiveness, efficiency, and interpretability of both the learning process and the learned results. Extensive experiments have been done on diverse benchmark datasets, and the experimental results have demonstrated the outstanding performance of GEN compared with the state-of-the-art baseline methods in both effectiveness and efficiency. 
Genetic Programming for Reinforcement Learning (GPRL) 
The search for interpretable reinforcement learning policies is of high academic and industrial interest. Especially for industrial systems, domain experts are more likely to deploy autonomously learned controllers if they are understandable and convenient to evaluate. Basic algebraic equations are supposed to meet these requirements, as long as they are restricted to an adequate complexity. Here we introduce the genetic programming for reinforcement learning (GPRL) approach based on model-based batch reinforcement learning and genetic programming, which autonomously learns policy equations from pre-existing default state-action trajectory samples. GPRL is compared to a straightforward method which utilizes genetic programming for symbolic regression, yielding policies imitating an existing well-performing, but non-interpretable policy. Experiments on three reinforcement learning benchmarks, i.e., mountain car, cart-pole balancing, and industrial benchmark, demonstrate the superiority of our GPRL approach compared to the symbolic regression method. GPRL is capable of producing well-performing interpretable reinforcement learning policies from pre-existing default trajectory data. 
Genetic-Evolutionary Adam (GADAM) 
Deep neural network learning can be formulated as a non-convex optimization problem. Existing optimization algorithms, e.g., Adam, can learn the models fast, but may get stuck in local optima easily. In this paper, we introduce a novel optimization algorithm, namely GADAM (Genetic-Evolutionary Adam). GADAM learns deep neural network models based on a number of unit models, generation by generation: it trains the unit models with Adam, and evolves them into new generations with a genetic algorithm. We show that GADAM can effectively jump out of local optima in the learning process to obtain better solutions, and prove that GADAM can also achieve very fast convergence. Extensive experiments have been done on various benchmark datasets, and the learning results demonstrate the effectiveness and efficiency of the GADAM algorithm. 
Geographic Information Systems (GIS) 
A geographic information system (GIS) is a computer system designed to capture, store, manipulate, analyze, manage, and present all types of geographical data. The acronym GIS is sometimes used for geographical information science or geospatial information studies to refer to the academic discipline or career of working with geographic information systems and is a large domain within the broader academic discipline of Geoinformatics. 
Geographic Resources Analysis Support System (GRASS) 
GRASS GIS, commonly referred to as GRASS (Geographic Resources Analysis Support System), is a free and open source Geographic Information System (GIS) software suite used for geospatial data management and analysis, image processing, graphics and maps production, spatial modeling, and visualization. GRASS GIS is currently used in academic and commercial settings around the world, as well as by many governmental agencies and environmental consulting companies. It is a founding member of the Open Source Geospatial Foundation (OSGeo). rgrass7 
GeoJSON  GeoJSON is a format for encoding a variety of geographic data structures. A GeoJSON object may represent a geometry, a feature, or a collection of features. GeoJSON supports the following geometry types: Point, LineString, Polygon, MultiPoint, MultiLineString, MultiPolygon, and GeometryCollection. Features in GeoJSON contain a geometry object and additional properties, and a feature collection represents a list of features. A complete GeoJSON data structure is always an object (in JSON terms). In GeoJSON, an object consists of a collection of name/value pairs — also called members. For each member, the name is always a string. Member values are either a string, number, object, array or one of the literals: true, false, and null. An array consists of elements where each element is a value as described above. 
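Since a GeoJSON document is ordinary JSON, it can be built and parsed with a standard JSON library. The sketch below uses Python's `json` module to construct a feature collection holding a single Point feature; the coordinate values and the `name` property are made up for illustration. Positions are written in longitude, latitude order.

```python
import json

# One feature: a geometry object plus free-form properties.
feature = {
    "type": "Feature",
    "geometry": {"type": "Point", "coordinates": [102.0, 0.5]},  # [lon, lat]
    "properties": {"name": "example point"},  # hypothetical property
}

# A feature collection wraps a list of features.
collection = {"type": "FeatureCollection", "features": [feature]}

text = json.dumps(collection)  # serialize to a GeoJSON string
decoded = json.loads(text)     # parse it back into plain dicts and lists
```

Every level of the structure is a JSON object with a string `type` member, which is how a consumer tells a geometry from a feature or a feature collection.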
Geometric Dirichlet Mean  We propose a geometric algorithm for topic learning and inference that is built on the convex geometry of topics arising from the Latent Dirichlet Allocation (LDA) model and its nonparametric extensions. To this end we study the optimization of a geometric loss function, which is a surrogate to the LDA’s likelihood. Our method involves a fast optimization based weighted clustering procedure augmented with geometric corrections, which overcomes the computational and statistical inefficiencies encountered by other techniques based on Gibbs sampling and variational inference, while achieving the accuracy comparable to that of a Gibbs sampler. The topic estimates produced by our method are shown to be statistically consistent under some conditions. The algorithm is evaluated with extensive experiments on simulated and real data. 
Geometric Enclosing Network (GEN) 
Training models to generate data has increasingly attracted research attention and become important in modern world applications. We propose in this paper a new geometry-based optimization approach to address this problem. Orthogonal to current state-of-the-art density-based approaches, most notably VAE and GAN, we present a fresh new idea that borrows the principle of the minimal enclosing ball to train a generator $G(\mathbf{z})$ in such a way that both training and generated data, after being mapped to the feature space, are enclosed in the same sphere. We develop theory to guarantee that the mapping is bijective so that its inverse from feature space to data space results in expressive nonlinear contours to describe the data manifold, hence ensuring that generated data also lie on the data manifold learned from training data. Our model enjoys a nice geometric interpretation, hence termed Geometric Enclosing Networks (GEN), and possesses some key advantages over its rivals, namely a simple and easy-to-control optimization formulation, avoidance of mode collapse, and efficient learning of the data manifold representation in a completely unsupervised manner. We conducted extensive experiments on synthetic and real-world datasets to illustrate the behaviors, strengths and weaknesses of our proposed GEN, in particular its ability to handle multimodal data and the quality of generated data. 
Geometric Generative Adversarial Nets (Geometric GAN) 
Generative Adversarial Nets (GANs) represent an important milestone for effective generative models, which has inspired numerous variants seemingly different from each other. One of the main contributions of this paper is to reveal a unified geometric structure in GAN and its variants. Specifically, we show that the adversarial generative model training can be decomposed into three geometric steps: separating hyperplane search, discriminator parameter update away from the separating hyperplane, and the generator update along the normal vector direction of the separating hyperplane. This geometric intuition reveals the limitations of the existing approaches and leads us to propose a new formulation called geometric GAN using an SVM separating hyperplane that maximizes the margin. Our theoretical analysis shows that the geometric GAN converges to a Nash equilibrium between the discriminator and generator. In addition, extensive numerical results show the superior performance of geometric GAN. 
Geometric Mean Metric Learning  We revisit the task of learning a Euclidean metric from data. We approach this problem from first principles and formulate it as a surprisingly simple optimization problem. Indeed, our formulation even admits a closed-form solution. This solution possesses several very attractive properties: (i) an innate geometric appeal through the Riemannian geometry of positive definite matrices; (ii) ease of interpretability; and (iii) computational speed several orders of magnitude faster than the widely used LMNN and ITML methods. Furthermore, on standard benchmark datasets, our closed-form solution consistently attains higher classification accuracy. 
Geometric Program (GP) 
A geometric program (GP) is a type of mathematical optimization problem characterized by objective and constraint functions that have a special form. Recently developed solution methods can solve even large-scale GPs extremely efficiently and reliably; at the same time a number of practical problems, particularly in circuit design, have been found to be equivalent to (or well approximated by) GPs. Putting these two together, we get effective solutions for the practical problems. The basic approach in GP modeling is to attempt to express a practical problem, such as an engineering analysis or design problem, in GP format. In the best case, this formulation is exact; when this is not possible, we settle for an approximate formulation. 
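For orientation, the "special form" is usually stated as follows; this is the standard textbook formulation, and the symbols $f_i$, $g_j$, $c$ and $a_k$ are notation assumed here rather than taken from the entry. The variables $x_1, \dots, x_n$ are positive, each $g_j$ is a monomial, and each $f_i$ is a posynomial, i.e. a sum of monomials.

```latex
\begin{aligned}
\text{minimize}\quad   & f_0(x) \\
\text{subject to}\quad & f_i(x) \le 1, \quad i = 1, \dots, m, \\
                       & g_j(x) = 1, \quad j = 1, \dots, p,
\end{aligned}
\qquad \text{where a monomial is } \;
g(x) = c\, x_1^{a_1} x_2^{a_2} \cdots x_n^{a_n}, \quad c > 0,\ a_k \in \mathbb{R}.
```

Expressing a practical problem "in GP format" then means rewriting its objective and constraints until they match this pattern.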
Geometric Semantic Genetic Programming (GSGP) 
In iterative supervised learning algorithms it is common to reach a point in the search where no further induction seems to be possible with the available data. If the search is continued beyond this point, the risk of overfitting increases significantly. Following the recent developments in inductive semantic stochastic methods, this paper studies the feasibility of using information gathered from the semantic neighborhood to decide when to stop the search. Two semantic stopping criteria are proposed and experimentally assessed in Geometric Semantic Genetic Programming (GSGP) and in the Semantic Learning Machine (SLM) algorithm (the equivalent algorithm for neural networks). The experiments are performed on real-world high-dimensional regression datasets. The results show that the proposed semantic stopping criteria are able to detect stopping points that result in a competitive generalization for both GSGP and SLM. This approach also yields computationally efficient algorithms as it allows the evolution of neural networks in less than 3 seconds on average, and of GP trees in at most 10 seconds. The usage of the proposed semantic stopping criteria in conjunction with the computation of optimal mutation/learning steps also results in small trees and neural networks. 
Geometrically Designed Spline Regression  Geometrically Designed Spline (‘GeDS’) Regression is a nonparametric geometrically motivated method for fitting variable knots spline predictor models in one or two independent variables, in the context of generalized (non)linear models. ‘GeDS’ estimates the number and position of the knots and the order of the spline, assuming the response variable has a distribution from the exponential family. A description of the method can be found in Kaishev et al. (2016) <doi:10.1007/s0018001506217> and Dimitrova et al. (2017) <https://…/18460>. GeDS 
Geometry Score  One of the biggest challenges in the research of generative adversarial networks (GANs) is assessing the quality of generated samples and detecting various levels of mode collapse. In this work, we construct a novel measure of performance of a GAN by comparing geometrical properties of the underlying data manifold and the generated one, which provides both qualitative and quantitative means for evaluation. Our algorithm can be applied to datasets of an arbitrary nature and is not limited to visual data. We test the obtained metric on various real-life models and datasets and demonstrate that our method provides new insights into properties of GANs. 
Geometry-Aware Generative Adversarial Network (GAGAN) 
Deep generative models learned through adversarial training have become increasingly popular for their ability to generate naturalistic image textures. However, apart from the visual texture, the visual appearance of objects is significantly affected by their shape geometry, information which is not taken into account by existing generative models. This paper introduces the Geometry-Aware Generative Adversarial Network (GAGAN) for incorporating geometric information into the image generation process. Specifically, in GAGAN the generator samples latent variables from the probability space of a statistical shape model. By mapping the output of the generator to a canonical coordinate frame through a differentiable geometric transformation, we enforce the geometry of the objects and add an implicit connection from the prior to the generated object. Experimental results on face generation indicate that the GAGAN can generate realistic images of faces with arbitrary facial attributes such as facial expression, pose, and morphology, that are of better quality compared to current GAN-based methods. Finally, our method can be easily incorporated into and improve the quality of the images generated by any existing GAN architecture. 
Gephi  Gephi is an interactive visualization and exploration platform for all kinds of networks and complex systems, dynamic and hierarchical graphs. It runs on Windows, Linux and Mac OS X. Gephi is open-source and free. 
GEP-PG  In continuous action domains, standard deep reinforcement learning algorithms like DDPG suffer from inefficient exploration when facing sparse or deceptive reward problems. Conversely, evolutionary and developmental methods focusing on exploration like novelty search, quality-diversity or goal exploration processes are less sample efficient during exploitation. In this paper, we present the GEP-PG approach, taking the best of both worlds by sequentially combining two variants of a goal exploration process and two variants of DDPG. We study the learning performance of these components and their combination on a low dimensional deceptive reward problem and on the larger Half-Cheetah benchmark. Among other things, we show that DDPG fails on the former and that GEP-PG obtains performance above the state-of-the-art on the latter. 
GGQ-ID3  Usually, decision tree induction algorithms are limited to working with non-relational data. Given a record, they do not take into account the attributes of other objects, even though these can provide valuable information for the learning task. In this paper we present GGQ-ID3, a multi-relational decision tree learning algorithm that uses Generalized Graph Queries (GGQ) as predicates in the decision nodes. GGQs can express complex patterns (including cycles) and can be refined step by step. They can also evaluate structures (not only single records) and perform Regular Pattern Matching. GGQs are built dynamically (pattern mining) during the GGQ-ID3 tree construction process. We show how to use GGQ-ID3 to perform multi-relational machine learning while keeping complexity under control. Finally, some real examples of automatically obtained classification trees and semantic patterns are shown. 
Gibbs Sampling  In statistics and in statistical physics, Gibbs sampling or a Gibbs sampler is a Markov chain Monte Carlo (MCMC) algorithm for obtaining a sequence of observations which are approximately drawn from a specified multivariate probability distribution (i.e. from the joint probability distribution of two or more random variables), when direct sampling is difficult. This sequence can be used to approximate the joint distribution (e.g., to generate a histogram of the distribution); to approximate the marginal distribution of one of the variables, or some subset of the variables (for example, the unknown parameters or latent variables); or to compute an integral (such as the expected value of one of the variables). Typically, some of the variables correspond to observations whose values are known, and hence do not need to be sampled. 
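As an illustration, the sketch below runs a Gibbs sampler on a standard bivariate normal with correlation rho, a case chosen because both full conditionals are known in closed form: x given y is normal with mean rho*y and variance 1 - rho^2, and symmetrically for y given x. The target distribution, the burn-in length and all names are assumptions made for this example.

```python
import math
import random


def gibbs_bivariate_normal(rho, n_samples, burn_in=500, seed=0):
    """Alternately resample x | y and y | x for a standard bivariate normal."""
    rng = random.Random(seed)
    cond_std = math.sqrt(1.0 - rho ** 2)  # std of each full conditional
    x, y = 0.0, 0.0
    samples = []
    for step in range(burn_in + n_samples):
        x = rng.gauss(rho * y, cond_std)  # draw x from p(x | y)
        y = rng.gauss(rho * x, cond_std)  # draw y from p(y | x)
        if step >= burn_in:               # discard the initial transient
            samples.append((x, y))
    return samples


samples = gibbs_bivariate_normal(rho=0.8, n_samples=20000)

# Monte Carlo estimates of E[x] (true value 0) and E[xy] (true value rho).
mean_x = sum(x for x, _ in samples) / len(samples)
mean_xy = sum(x * y for x, y in samples) / len(samples)
```

Successive draws are correlated, so the estimates converge more slowly than they would with the same number of independent samples.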
GibbsNet  Directed latent variable models that formulate the joint distribution as $p(x,z) = p(z) p(x \mid z)$ have the advantage of fast and exact sampling. However, these models have the weakness of needing to specify $p(z)$, often with a simple fixed prior that limits the expressiveness of the model. Undirected latent variable models discard the requirement that $p(z)$ be specified with a prior, yet sampling from them generally requires an iterative procedure such as blocked Gibbs sampling that may require many steps to draw samples from the joint distribution $p(x, z)$. We propose a novel approach to learning the joint distribution between the data and a latent code which uses an adversarially learned iterative procedure to gradually refine the joint distribution, $p(x, z)$, to better match with the data distribution on each step. GibbsNet is the best of both worlds both in theory and in practice. Achieving the speed and simplicity of a directed latent variable model, it is guaranteed (assuming the adversarial game reaches the virtual training criteria global minimum) to produce samples from $p(x, z)$ with only a few sampling iterations. Achieving the expressiveness and flexibility of an undirected latent variable model, GibbsNet does away with the need for an explicit $p(z)$ and has the ability to do attribute prediction, class-conditional generation, and joint image-attribute modeling in a single model which is not trained for any of these specific tasks. We show empirically that GibbsNet is able to learn a more complex $p(z)$ and show that this leads to improved inpainting and iterative refinement of $p(x, z)$ for dozens of steps and stable generation without collapse for thousands of steps, despite being trained on only a few steps. 
Gini Impurity  Used by the CART (classification and regression tree) algorithm, Gini impurity is a measure of how often a randomly chosen element from the set would be incorrectly labeled if it were randomly labeled according to the distribution of labels in the subset. Gini impurity can be computed by summing the probability of each item being chosen times the probability of a mistake in categorizing that item. It reaches its minimum (zero) when all cases in the node fall into a single target category. 
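The computation described above can be sketched in a few lines. The function name `gini_impurity` and the list-of-labels input are assumptions for illustration; the sum follows the wording of the definition, adding up, for each class, the probability of picking it times the probability of mislabeling it.

```python
from collections import Counter

def gini_impurity(labels):
    """Gini impurity of a collection of class labels."""
    n = len(labels)
    if n == 0:
        return 0.0
    proportions = [count / n for count in Counter(labels).values()]
    # p * (1 - p): chance of drawing the class times chance of mislabeling it.
    return sum(p * (1.0 - p) for p in proportions)

print(gini_impurity(["a", "a", "a"]))       # pure node
print(gini_impurity(["a", "b", "a", "b"]))  # evenly mixed two-class node
```

A pure node gives 0, and an evenly split two-class node gives 0.5, which is the maximum for two classes.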
Girvan-Newman Algorithm  The Girvan-Newman algorithm detects communities by progressively removing edges from the original network. The connected components of the remaining network are the communities. Instead of trying to construct a measure that tells us which edges are the most central to communities, the Girvan-Newman algorithm focuses on edges that are most likely “between” communities. 
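The following is a minimal pure-Python sketch of one splitting step, assuming an unweighted, undirected graph without self-loops that is given as a list of edges. Edge betweenness (the number of shortest paths that run through an edge) is used as the "between" score, and the helper names are made up for this example.

```python
from collections import deque

def edge_betweenness(adj):
    """Shortest-path edge betweenness (Brandes-style accumulation)."""
    score = {frozenset((u, v)): 0.0 for u in adj for v in adj[u]}
    for s in adj:
        dist, order = {s: 0}, []
        sigma = {v: 0.0 for v in adj}   # number of shortest paths from s
        sigma[s] = 1.0
        preds = {v: [] for v in adj}
        queue = deque([s])
        while queue:                    # breadth-first search from s
            v = queue.popleft()
            order.append(v)
            for w in adj[v]:
                if w not in dist:
                    dist[w] = dist[v] + 1
                    queue.append(w)
                if dist[w] == dist[v] + 1:
                    sigma[w] += sigma[v]
                    preds[w].append(v)
        delta = {v: 0.0 for v in adj}
        for w in reversed(order):       # push path counts back toward s
            for v in preds[w]:
                c = sigma[v] / sigma[w] * (1.0 + delta[w])
                score[frozenset((v, w))] += c
                delta[v] += c
    return score                        # each pair is counted from both ends

def components(adj):
    """Connected components as a list of vertex sets."""
    seen, comps = set(), []
    for s in adj:
        if s in seen:
            continue
        comp, stack = set(), [s]
        while stack:
            v = stack.pop()
            if v not in comp:
                comp.add(v)
                stack.extend(adj[v] - comp)
        seen |= comp
        comps.append(comp)
    return comps

def girvan_newman_split(edges):
    """Remove highest-betweenness edges until the graph gains a component."""
    adj = {}
    for u, v in edges:
        adj.setdefault(u, set()).add(v)
        adj.setdefault(v, set()).add(u)
    start = len(components(adj))
    while len(components(adj)) == start and any(adj.values()):
        score = edge_betweenness(adj)
        u, v = max(score, key=score.get)
        adj[u].discard(v)
        adj[v].discard(u)
    return components(adj)

# Two triangles joined by a single bridge edge (3, 4).
edges = [(1, 2), (2, 3), (1, 3), (3, 4), (4, 5), (5, 6), (4, 6)]
print(girvan_newman_split(edges))
```

Repeating the split on each resulting component would yield the full hierarchy of communities; the sketch stops after the first split to stay short.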
GitHub Gist  Tom Preston-Werner presented the new Gist feature at a punk rock Ruby conference in 2008. Gist builds upon the simple pastebin concept by adding version control for code snippets, easy forking, and SSL encryption for private pastes. Because each “gist” is its own Git repository, multiple code snippets can be contained in a single paste and they can be pushed and pulled using Git. Further, forked code can be pushed back to the original author in the form of a patch, so pastes can become more like mini-projects. The main benefit of forking is that it allows you to freely experiment with changes without affecting the original project. Gist is a simple way to share snippets and pastes with others. All gists are Git repositories, so they are automatically versioned, forkable and usable from Git. gistr 
GitHub Hosted R Repository (ghrr) 
This ghrr (for ‘GitHub Hosted R Repository’) uses drat for both insertion of packages, and usage from R. http://…/#introducing_ghrr drat 
GitXiv  arXiv + Github + Links + Discussion: GitXiv is a space to share links to open computer science projects. Countless Github and arXiv links are floating around the web. It’s hard to keep track of these gems. GitXiv attempts to solve this problem by offering a collaboratively curated feed of projects. Each project is conveniently presented as arXiv + Github + Links + Discussion. Members can submit their findings and let the community rank and discuss them. A regular newsletter makes it easy to stay up-to-date on recent advancements. It’s free and open. 
G-Learning  Model-free reinforcement learning algorithms such as Q-learning perform poorly in the early stages of learning in noisy environments, because much effort is spent on unlearning biased estimates of the state-action function. The bias comes from selecting, among several noisy estimates, the apparent optimum, which may actually be suboptimal. We propose G-learning, a new off-policy learning algorithm that regularizes the noise in the space of optimal actions by penalizing deterministic policies at the beginning of the learning. Moreover, it enables naturally incorporating prior distributions over optimal actions when available. The stochastic nature of G-learning also makes it more cost-effective than Q-learning in noiseless but exploration-risky domains. We illustrate these ideas in several examples where G-learning results in significant improvements of the learning rate and the learning cost. 
Global Interpreter Lock (GIL) 
In CPython, the global interpreter lock, or GIL, is a mutex that prevents multiple native threads from executing Python bytecodes at once. This lock is necessary mainly because CPython’s memory management is not thread-safe. (However, since the GIL exists, other features have grown to depend on the guarantees that it enforces.) CPython extensions must be GIL-aware in order to avoid defeating threads. The GIL is controversial because it prevents multithreaded CPython programs from taking full advantage of multiprocessor systems in certain situations. Note that potentially blocking or long-running operations, such as I/O, image processing, and NumPy number crunching, happen outside the GIL. Therefore it is only in multithreaded programs that spend a lot of time inside the GIL, interpreting CPython bytecode, that the GIL becomes a bottleneck. However, the GIL degrades performance even when it is not a bottleneck: the system call overhead is significant, especially on multicore hardware; two threads calling a function may take twice as much time as a single thread calling the function twice; the GIL can cause I/O-bound threads to be scheduled ahead of CPU-bound threads; and it prevents signals from being delivered. 
Global Sensitivity Analysis (GSA) 
This presentation aims to introduce global sensitivity analysis (SA), targeting an audience unfamiliar with the topic, and to give practical hints about the associated advantages and the effort needed. To this effect, we shall review some techniques for sensitivity analysis, including those that are not global, by applying them to a simple example. This will give the audience a chance to contrast each method’s result against the audience’s own expectation of what the sensitivity pattern for the simple model should be. We shall also try to relate the discourse on the relative importance of model input factors to specific questions, such as ‘Which of the uncertain input factor(s) is so non-influential that we can safely fix it/them?’ or ‘If we could eliminate the uncertainty in one of the input factors, which factor should we choose to reduce the most the variance of the output?’ In this way, the selection of the method for sensitivity analysis will be put in relation to the framing of the analysis and to the interpretation and presentation of the results. The choice of the output of interest will be discussed in relation to the purpose of the model-based analysis. The main methods that we present in this lecture are all related to one another, and are the method of Morris for factor screening and the variance-based measures. All are model-free, in the sense that their application does not rely on special assumptions on the behaviour of the model (such as linearity, monotonicity and additivity of the relationship between input factor and model output). Monte Carlo filtering will also be discussed to demonstrate the usefulness of global sensitivity analysis in relation to estimation. Global sensitivity analysis: An introduction Global sensitivity analysis for statistical model parameters 
Global Style Token (GST) 
In this work, we propose ‘global style tokens’ (GSTs), a bank of embeddings that are jointly trained within Tacotron, a state-of-the-art end-to-end speech synthesis system. The embeddings are trained with no explicit labels, yet learn to model a large range of acoustic expressiveness. GSTs lead to a rich set of significant results. The soft interpretable ‘labels’ they generate can be used to control synthesis in novel ways, such as varying speed and speaking style, independently of the text content. They can also be used for style transfer, replicating the speaking style of a single audio clip across an entire long-form text corpus. When trained on noisy, unlabeled found data, GSTs learn to factorize noise and speaker identity, providing a path towards highly scalable but robust speech synthesis. 
Global Vectors for Word Representation (GloVe) 
Recent methods for learning vector space representations of words have succeeded in capturing fine-grained semantic and syntactic regularities using vector arithmetic, but the origin of these regularities has remained opaque. We analyze and make explicit the model properties needed for such regularities to emerge in word vectors. The result is a new global log-bilinear regression model that combines the advantages of the two major model families in the literature: global matrix factorization and local context window methods. Our model efficiently leverages statistical information by training only on the nonzero elements in a word-word co-occurrence matrix, rather than on the entire sparse matrix or on individual context windows in a large corpus. The model produces a vector space with meaningful substructure, as evidenced by its performance of 75% on a recent word analogy task. It also outperforms related models on similarity tasks and named entity recognition. 
Globally Improved ANT (GIANT) 
For distributed computing environments, we consider the canonical machine learning problem of empirical risk minimization (ERM) with quadratic regularization, and we propose a distributed and communication-efficient Newton-type optimization method. At every iteration, each worker locally finds an Approximate NewTon (ANT) direction, and then it sends this direction to the main driver. The driver then averages all the ANT directions received from workers to form a Globally Improved ANT (GIANT) direction. GIANT naturally exploits the trade-offs between local computations and global communications in that more local computations result in fewer overall rounds of communications. GIANT is highly communication efficient in that, for $d$-dimensional data uniformly distributed across $m$ workers, it has $4$ or $6$ rounds of communication and $O (d \log m)$ communication complexity per iteration. Theoretically, we show that GIANT’s convergence rate is faster than first-order methods and existing distributed Newton-type methods. From a practical point of view, a highly beneficial feature of GIANT is that it has only one tuning parameter: the iterations of the local solver for computing an ANT direction. This is indeed in sharp contrast with many existing distributed Newton-type methods, as well as popular first-order methods, which have several tuning parameters, and whose performance can be greatly affected by the specific choices of such parameters. In this light, we empirically demonstrate the superior performance of GIANT compared with other competing methods. 
Glue Code  The term glue code is sometimes used to describe implementations of the adapter pattern. It does not serve any use in calculation or computation. Rather it serves as a proxy between otherwise incompatible parts of software, to make them compatible. The standard practice is to keep logic out of the glue code and leave that to the code blocks it connects to. 
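A small adapter makes the idea concrete. In this sketch all class and function names are hypothetical: `LegacyFileSink` stands in for an existing component, `run_job` for new code that expects a different interface, and `LoggerAdapter` is the glue that only translates one call shape into the other.

```python
class LegacyFileSink:
    """Stand-in for an existing component that only accepts raw text lines."""

    def __init__(self):
        self.lines = []

    def write_line(self, text):
        self.lines.append(text)


class LoggerAdapter:
    """Glue code: exposes log(level, message) on top of write_line(text)."""

    def __init__(self, sink):
        self._sink = sink

    def log(self, level, message):
        # Translation only; no business logic lives in the adapter.
        self._sink.write_line(f"[{level}] {message}")


def run_job(logger):
    """New code that expects an object with a log(level, message) method."""
    logger.log("INFO", "job started")
    logger.log("INFO", "job finished")


sink = LegacyFileSink()
run_job(LoggerAdapter(sink))
print(sink.lines)
```

Neither `LegacyFileSink` nor `run_job` has to change; the adapter is the only piece that knows about both interfaces.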
Gnowee  This paper introduces Gnowee, a modular, Python-based, open-source hybrid metaheuristic optimization algorithm (Available from https://…/Gnowee ). Gnowee is designed for rapid convergence to nearly globally optimum solutions for complex, constrained nuclear engineering problems with mixed-integer and combinatorial design vectors and high-cost, noisy, discontinuous, black box objective function evaluations. Gnowee’s hybrid metaheuristic framework is a new combination of a set of diverse, robust heuristics that appropriately balance diversification and intensification strategies across a wide range of optimization problems. This novel algorithm was specifically developed to optimize complex nuclear design problems; the motivating research problem was the design of material stack-ups to modify neutron energy spectra to specific targeted spectra for applications in nuclear medicine, technical nuclear forensics, nuclear physics, etc. However, there is a wider range of potential applications for this algorithm both within the nuclear community and beyond. To demonstrate Gnowee’s behavior for a variety of problem types, comparisons between Gnowee and several well-established metaheuristic algorithms are made for a set of eighteen continuous, mixed-integer, and combinatorial benchmarks. These results demonstrate Gnowee to have superior flexibility and convergence characteristics over a wide range of design spaces. We anticipate this wide range of applicability will make this algorithm desirable for many complex engineering applications. 
Gnu Regression Econometrics and Time-Series Library (gretl) 
gretl is a cross-platform software package for econometric analysis, written in the C programming language. It is free, open-source software. You may redistribute it and/or modify it under the terms of the GNU General Public License (GPL) as published by the Free Software Foundation. 
GNU Scientific Library (GSL) 
The GNU Scientific Library (GSL) is a numerical library for C and C++ programmers. It is free software under the GNU General Public License. The library provides a wide range of mathematical routines such as random number generators, special functions and least-squares fitting. There are over 1000 functions in total with an extensive test suite. RcppGSL 
Goal Oriented Optimal Design of Experiments (GOODE) 
We develop a framework for goal oriented optimal design of experiments (GOODE) for large-scale Bayesian linear inverse problems governed by PDEs. This framework differs from classical Bayesian optimal design of experiments (ODE) in the following sense: we seek experimental designs that minimize the posterior uncertainty in a predicted quantity of interest (QoI) rather than the estimated parameter itself. This is suitable for scenarios in which the solution of an inverse problem is an intermediate step and the estimated parameter is then used to compute a prediction QoI. In such problems, a GOODE approach has two benefits: the designs can avoid wastage of experimental resources by a targeted collection of data, and the resulting design criteria are computationally easier to evaluate due to the often low dimensionality of prediction QoIs. We present two modified design criteria, A-GOODE and D-GOODE, which are natural analogues of classical Bayesian A- and D-optimal criteria. We analyze the connections to other ODE criteria, and provide interpretations for the GOODE criteria by using tools from information theory. Then, we develop an efficient gradient-based optimization framework for solving the GOODE optimization problems. Additionally, we present comprehensive numerical experiments testing the various aspects of the presented approach. The driving application is the optimal placement of sensors to identify the source of contaminants in a diffusion and transport problem. We enforce sparsity of the sensor placements using an $\ell_1$-norm penalty approach, and propose a practical strategy for specifying the associated penalty parameter. 
Google AI  At Google AI, we’re conducting research that advances the state-of-the-art in the field, applying AI to products and to new domains, and developing tools to ensure that everyone can access AI. Google’s mission is to organize the world’s information and make it universally accessible and useful. AI is helping us do that in exciting new ways, solving problems for our users, our customers, and the world. AI is making it easier for people to do things every day, whether it’s searching for photos of loved ones, breaking down language barriers in Google Translate, typing emails on the go, or getting things done with the Google Assistant. AI also provides new ways of looking at existing problems, from rethinking healthcare to advancing scientific discovery. 
Google Brain Project  Google Brain is an unofficial name for a deep learning research project at Google. 
Google Cloud Dataflow  Simplified stream and batch data processing, with equal reliability and expressiveness. Cloud Dataflow is a fully-managed service for transforming and enriching data in stream (real time) and batch (historical) modes with equal reliability and expressiveness, with no more complex workarounds or compromises needed. And with its serverless approach to resource provisioning and management, you have access to virtually limitless capacity to solve your biggest data processing challenges, while paying only for what you use. Cloud Dataflow unlocks transformational use cases across industries, including: • Clickstream, Point-of-Sale, and segmentation analysis in retail • Fraud detection in financial services • Personalized user experience in gaming • IoT analytics in manufacturing, healthcare, and logistics 
Google Prediction API  Google’s cloud-based machine learning tools. The API exposes Google’s machine learning algorithms for analyzing data and predicting future outcomes through a familiar RESTful interface. 
GossipGraD  In this paper, we present GossipGraD, a gossip communication protocol based Stochastic Gradient Descent (SGD) algorithm for scaling Deep Learning (DL) algorithms on large-scale systems. The salient features of GossipGraD are: 1) reduction in overall communication complexity from $\Theta(\log(p))$ for $p$ compute nodes in well-studied SGD to $O(1)$, 2) model diffusion such that compute nodes exchange their updates (gradients) indirectly after every $\log(p)$ steps, 3) rotation of communication partners for facilitating direct diffusion of gradients, 4) asynchronous distributed shuffle of samples during the feedforward phase in SGD to prevent overfitting, 5) asynchronous communication of gradients for further reducing the communication cost of SGD and GossipGraD. We implement GossipGraD for GPU and CPU clusters and use NVIDIA GPUs (Pascal P100) connected with InfiniBand, and Intel Knights Landing (KNL) connected with Aries network. We evaluate GossipGraD using the well-studied dataset ImageNet-1K (~250GB), and widely studied neural network topologies such as GoogLeNet and ResNet50 (current winner of the ImageNet Large Scale Visual Recognition Challenge (ILSVRC)). Our performance evaluation using both KNL and Pascal GPUs indicates that GossipGraD can achieve perfect efficiency for these datasets and their associated neural network topologies. Specifically, for ResNet50, GossipGraD is able to achieve ~100% compute efficiency using 128 NVIDIA Pascal P100 GPUs, while matching the top-1 classification accuracy published in literature. 
Goulden-Jackson Cluster Method  A method for finding the generating function for the number of words avoiding, as factors, the members of a prescribed set of ‘dirty words’. 
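To make the counting problem concrete, the sketch below enumerates the numbers that such a generating function encodes. It is a brute-force counter, not the cluster method itself, and the function name and example alphabet are assumptions for illustration.

```python
from itertools import product

def count_clean_words(alphabet, dirty_words, n):
    """Count length-n words over `alphabet` containing no dirty word as a factor."""
    count = 0
    for letters in product(alphabet, repeat=n):
        word = "".join(letters)
        # A factor is a contiguous substring, so `in` is the right test.
        if not any(dirty in word for dirty in dirty_words):
            count += 1
    return count

# Binary words that avoid the factor "bb", for lengths 0 through 5.
print([count_clean_words("ab", ["bb"], n) for n in range(6)])
# prints [1, 2, 3, 5, 8, 13]
```

These counts are the coefficients of the generating function for this example; the cluster method obtains that function directly instead of enumerating every word.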
Gower’s Distance  Idea: use a distance measure between 0 and 1 for each variable and aggregate them into a single distance. gower 
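A minimal sketch of that idea, assuming the common choices of range-scaled absolute difference for numeric variables, simple mismatch for categorical ones, and an unweighted average as the aggregate. The function name and the `ranges` argument are made up for this example.

```python
def gower_distance(a, b, ranges):
    """Average per-variable distance between records a and b.

    ranges[i] is the observed range (max - min) of numeric variable i,
    or None when variable i is categorical.
    """
    total = 0.0
    for x, y, r in zip(a, b, ranges):
        if r is None:
            total += 0.0 if x == y else 1.0   # categorical: match or mismatch
        elif r == 0:
            total += 0.0                      # constant variable adds nothing
        else:
            total += abs(x - y) / r           # numeric: scaled into [0, 1]
    return total / len(ranges)

# Variables: age (range 40), colour (categorical), height in m (range 0.5).
person_a = (30, "red", 1.70)
person_b = (50, "blue", 1.70)
print(gower_distance(person_a, person_b, ranges=(40, None, 0.5)))
```

Here the per-variable distances are 0.5, 1 and 0, so the aggregate is 0.5.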
GPflowOpt  A novel Python framework for Bayesian optimization known as GPflowOpt is introduced. The package is based on the popular GPflow library for Gaussian processes, leveraging the benefits of TensorFlow including automatic differentiation, parallelization and GPU computations for Bayesian optimization. Design goals focus on a framework that is easy to extend with custom acquisition functions and models. The framework is thoroughly tested and well documented, and provides scalability. The current released version of GPflowOpt includes some standard single-objective acquisition functions, the state-of-the-art max-value entropy search, as well as a Bayesian multi-objective approach. Finally, it permits easy use of custom modeling strategies implemented in GPflow. 
GPU Open Analytics Initiative (GOAI) 
Recently, Continuum Analytics, H2O.ai, and MapD announced the formation of the GPU Open Analytics Initiative (GOAI). GOAI—also joined by BlazingDB, Graphistry and the Gunrock project from the University of California, Davis—aims to create open frameworks that allow developers and data scientists to build applications using standard data formats and APIs on GPUs. Bringing standard analytics data formats to GPUs will allow data analytics to be even more efficient, and to take advantage of the high throughput of GPUs. NVIDIA believes this initiative is a key contributor to the continued growth of GPU computing in accelerated analytics. 
Grabit Model (Grabit) 
We introduce a novel model which is obtained by applying gradient tree boosting to the Tobit model. The so-called Grabit model allows for modeling data that consist of a mixture of a continuous part and discrete point masses at the borders. Examples of this include censored data, fractional response data, corner solution response data, rainfall data, and binary classification data where additional information, that is related to the underlying classification mechanism, is available. In contrast to the Tobit model, the Grabit model can account for general forms of nonlinearities and interactions, it is robust against outliers in covariates and scale invariant to monotonic transformations for the covariates, and its predictive performance is not impaired by multicollinearity. We apply the Grabit model for predicting defaults on loans made to Swiss small and medium-sized enterprises (SMEs), and we obtain a large improvement in predictive performance compared to other state-of-the-art approaches. 
Gradient Boosted Regression Trees (GBRT) 

Gradient Boosting (GBDT,MART,TreeNet,BTE) 
Gradient boosting is a machine learning technique for regression problems, which produces a prediction model in the form of an ensemble of weak prediction models, typically decision trees. It builds the model in a stagewise fashion like other boosting methods do, and it generalizes them by allowing optimization of an arbitrary differentiable loss function. The gradient boosting method can also be used for classification problems by reducing them to regression with a suitable loss function. 
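The stagewise procedure can be sketched in pure Python for one-dimensional regression with squared-error loss, where the negative gradient of the loss is simply the current residual. The weak learners here are single-split stumps; the function names, learning rate and round count are illustrative assumptions, not a reference implementation.

```python
def fit_stump(xs, targets):
    """Least-squares regression stump: one threshold, one value per side."""
    best = None
    for t in sorted(set(xs))[:-1]:      # candidate thresholds
        left = [r for x, r in zip(xs, targets) if x <= t]
        right = [r for x, r in zip(xs, targets) if x > t]
        lm, rm = sum(left) / len(left), sum(right) / len(right)
        sse = sum((r - lm) ** 2 for r in left) + sum((r - rm) ** 2 for r in right)
        if best is None or sse < best[0]:
            best = (sse, t, lm, rm)
    if best is None:                    # all xs identical: predict the mean
        mean = sum(targets) / len(targets)
        return lambda x: mean
    _, t, lm, rm = best
    return lambda x: lm if x <= t else rm

def gradient_boost(xs, ys, n_rounds=200, learning_rate=0.1):
    """Fit an additive ensemble of stumps, one stage at a time."""
    base = sum(ys) / len(ys)            # stage 0: constant prediction
    preds = [base] * len(xs)
    stumps = []
    for _ in range(n_rounds):
        # Negative gradient of 0.5 * (y - p)^2 with respect to p.
        residuals = [y - p for y, p in zip(ys, preds)]
        stump = fit_stump(xs, residuals)
        stumps.append(stump)
        preds = [p + learning_rate * stump(x) for x, p in zip(xs, preds)]

    def predict(x):
        return base + learning_rate * sum(s(x) for s in stumps)

    return predict

xs = list(range(10))
ys = [x * x for x in xs]
model = gradient_boost(xs, ys)
print([round(model(x), 1) for x in xs])
```

Swapping the residual line for the negative gradient of another differentiable loss is what lets the same loop handle other objectives, including classification losses.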
Gradient Boosting Machine  Gradient boosting machines are a family of powerful machine-learning techniques that have shown considerable success in a wide range of practical applications. They are highly customizable to the particular needs of the application, for example by being learned with respect to different loss functions. This article gives a tutorial introduction to the methodology of gradient boosting methods with a strong focus on machine learning aspects of modeling. The theoretical background is complemented with descriptive examples and illustrations which cover all the stages of the gradient boosting model design. Considerations on handling the model complexity are discussed. Three practical examples of gradient boosting applications are presented and comprehensively analyzed. gbm 
Gradient Normalization  Deep multitask networks, in which one neural network produces multiple predictive outputs, are more scalable and often better regularized than their single-task counterparts. Such advantages can potentially lead to gains in both speed and performance, but multitask networks are also difficult to train without finding the right balance between tasks. We present a novel gradient normalization (GradNorm) technique which automatically balances the multitask loss function by directly tuning the gradients to equalize task training rates. We show that for various network architectures, for both regression and classification tasks, and on both synthetic and real datasets, GradNorm improves accuracy and reduces overfitting over single networks, static baselines, and other adaptive multitask loss balancing techniques. GradNorm also matches or surpasses the performance of exhaustive grid search methods, despite only involving a single asymmetry hyperparameter $\alpha$. Thus, what was once a tedious search process which incurred exponentially more compute for each task added can now be accomplished within a few training runs, irrespective of the number of tasks. Ultimately, we hope to demonstrate that direct gradient manipulation affords us great control over the training dynamics of multitask networks and may be one of the keys to unlocking the potential of multitask learning. 
Gradient Projection Classical Sketch (GPCS) 

Gradient Projection Iterative Sketch (GPIS) 
We propose a randomized first-order optimization algorithm, Gradient Projection Iterative Sketch (GPIS), and an accelerated variant for efficiently solving large-scale constrained Least Squares (LS). We provide theoretical convergence analysis for both proposed algorithms and demonstrate our methods’ computational efficiency compared to the classical accelerated gradient method and the state-of-the-art variance-reduced stochastic gradient methods through numerical experiments on various large synthetic/real data sets. 
Gradual Tuning  In this paper we present an alternative strategy for fine-tuning the parameters of a network. We named the technique Gradual Tuning. Once trained on a first task, the network is fine-tuned on a second task by modifying a progressively larger set of the network’s parameters. We test Gradual Tuning on different transfer learning tasks, using networks of different sizes trained with different regularization techniques. The results show that, compared to the usual fine-tuning, our approach significantly reduces catastrophic forgetting of the initial task, while still retaining comparable if not better performance on the new task. 
Graduated Symbol Map  A map with symbols that change in size according to the value of the attribute they represent. For example, denser populations might be represented by larger dots, or larger rivers by thicker lines. 
Granger Causality  The Granger causality test is a statistical hypothesis test for determining whether one time series is useful in forecasting another, first proposed in 1969. Ordinarily, regressions reflect ‘mere’ correlations, but Clive Granger argued that causality in economics could be reflected by measuring the ability of predicting the future values of a time series using past values of another time series. Since the question of ‘true causality’ is deeply philosophical, econometricians assert that the Granger test finds only ‘predictive causality’. A time series X is said to Granger-cause Y if it can be shown, usually through a series of t-tests and F-tests on lagged values of X (and with lagged values of Y also included), that those X values provide statistically significant information about future values of Y. Granger also stressed that some studies using ‘Granger causality’ testing in areas outside economics reached ‘ridiculous’ conclusions. ‘Of course, many ridiculous papers appeared’, he said in his Nobel Lecture, December 8, 2003. However, it remains a popular method for causality analysis in time series due to its computational simplicity. The original definition of Granger causality does not account for latent confounding effects and does not capture instantaneous and nonlinear causal relationships, though several extensions have been proposed to address these issues. https://…/grangercausalitytest Cointegration & Granger Causality 
Granger Causality Network  We present a new framework for learning Granger causality networks for multivariate categorical time series, based on the mixture transition distribution (MTD) model. Traditionally, MTD is plagued by a nonconvex objective, non-identifiability, and the presence of many local optima. To circumvent these problems, we recast inference in the MTD as a convex problem. The new formulation facilitates the application of MTD to high-dimensional multivariate time series. As a baseline, we also formulate a multi-output logistic autoregressive model (mLTD), which, while a straightforward extension of autoregressive Bernoulli generalized linear models, has not been previously applied to the analysis of multivariate categorical time series. We develop novel identifiability conditions of the MTD model and compare them to those for mLTD. We further devise a novel and efficient optimization algorithm for the MTD based on the new convex formulation, and compare the MTD and mLTD in both simulated and real data experiments. Our approach simultaneously provides a comparison of methods for network inference in categorical time series and opens the door to modern, regularized inference with the MTD model. 
GRap  Finding the best neural network configuration for a given goal can be challenging, especially when it is not possible to assess the output quality of a network automatically. We present GRap, an interactive interface based on Visual Analytics principles for comparing outputs of multiple RNNs for the same training data. GRap enables an iterative result generation process that allows a user to evaluate the outputs with contextual statistics. 
Graph Attention Network (GAT) 
We present graph attention networks (GATs), novel neural network architectures that operate on graph-structured data, leveraging masked self-attentional layers to address the shortcomings of prior methods based on graph convolutions or their approximations. By stacking layers in which nodes are able to attend over their neighborhoods’ features, we enable (implicitly) specifying different weights to different nodes in a neighborhood, without requiring any kind of costly matrix operation (such as inversion) or depending on knowing the graph structure upfront. In this way, we address several key challenges of spectral-based graph neural networks simultaneously, and make our model readily applicable to inductive as well as transductive problems. Our GAT models have achieved state-of-the-art results across three established transductive and inductive graph benchmarks: the Cora and Citeseer citation network datasets, as well as a protein-protein interaction dataset (wherein test graphs are entirely unseen during training). 
Graph Based Semi-Supervised Learning (GSSL) 

Graph Bayesian Optimization  Network structure optimization is a fundamental task in complex network analysis. However, almost all the research on Bayesian optimization is aimed at optimizing the objective functions with vectorial inputs. In this work, we first present a flexible framework, denoted graph Bayesian optimization, to handle arbitrary graphs in the Bayesian optimization community. By combining the proposed framework with graph kernels, it can take full advantage of implicit graph structural features to supplement explicit features guessed according to the experience, such as tags of nodes and any attributes of graphs. The proposed framework can identify which features are more important during the optimization process. We apply the framework to solve four problems including two evaluations and two applications to demonstrate its efficacy and potential applications. 
Graph Branch Distance (GBD) 
Graph similarity search is a common and fundamental operation in graph databases. One of the most popular graph similarity measures is the Graph Edit Distance (GED), mainly because of its broad applicability and high interpretability. Despite its prevalence, exact GED computation has been proved to be NP-hard, which can result in unsatisfactory computational efficiency on large graphs. However, exactly accurate search results are usually unnecessary for real-world applications, especially when responsiveness is far more important than accuracy. Thus, in this paper, we propose a novel probabilistic approach to efficiently estimate GED, which is further leveraged for the graph similarity search. Specifically, we first take branches as elementary structures in graphs, and introduce a novel graph similarity measure by comparing branches between graphs, i.e., Graph Branch Distance (GBD), which can be efficiently calculated in polynomial time. Then, we formulate the relationship between GED and GBD by considering branch variations as the result ascribed to graph edit operations, and model this process by probabilistic approaches. By applying our model, the GED between any two graphs can be efficiently estimated by their GBD, and these estimations are finally utilized in the graph similarity search. Extensive experiments show that our approach has better accuracy, efficiency and scalability than other comparable methods in the graph similarity search over real and synthetic data sets. 
Graph Capsule Network (GCAPSCNN) 
Graph Convolutional Neural Networks (GCNNs) are the most recent exciting advancement in deep learning field and their applications are quickly spreading in multicrossdomains including bioinformatics, chemoinformatics, social networks, natural language processing and computer vision. In this paper, we expose and tackle some of the basic weaknesses of a GCNN model with a capsule idea presented in~\cite{hinton2011transforming} and propose our Graph Capsule Network (GCAPSCNN) model. In addition, we design our GCAPSCNN model to solve especially graph classification problem which current GCNN models find challenging. Through extensive experiments, we show that our proposed Graph Capsule Network can significantly outperforms both the existing stateofart deep learning methods and graph kernels on graph classification benchmark datasets. 
Graph Convolutional Network  This paper explores the Graph Convolutional Network architecture recently proposed in (Kipf & Welling, 2016). The key points of their work are summarized and their results are reproduced. Graph regularization and alternative graph convolution approaches are explored. I find that explicit graph regularization was correctly rejected by (Kipf & Welling, 2016). I attempt to improve the performance of GCN by approximating a k-step transition matrix in place of the normalized graph Laplacian, but I fail to find positive results. Nonetheless, the performance of several configurations of this GCN variation is shown for the Cora, Citeseer, and Pubmed datasets. 
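As a rough illustration of the kind of propagation step this entry discusses, the sketch below applies one graph convolution layer using the symmetrically normalized adjacency matrix with self-loops, the form commonly associated with (Kipf & Welling, 2016). The function names, the dense list-of-lists representation, and the ReLU nonlinearity are assumptions made for illustration; this is not the code used in the paper.

```python
import math


def normalized_adjacency(adj):
    """Return D^{-1/2} (A + I) D^{-1/2} for a dense adjacency matrix.

    Assumes `adj` is a square list of lists without self-loops.
    """
    n = len(adj)
    # Add self-loops so that each node keeps part of its own signal.
    a_hat = [[adj[i][j] + (1.0 if i == j else 0.0) for j in range(n)]
             for i in range(n)]
    deg = [sum(row) for row in a_hat]
    return [[a_hat[i][j] / math.sqrt(deg[i] * deg[j]) for j in range(n)]
            for i in range(n)]


def matmul(a, b):
    """Plain dense matrix product of two lists of lists."""
    return [[sum(a[i][k] * b[k][j] for k in range(len(b)))
             for j in range(len(b[0]))]
            for i in range(len(a))]


def gcn_layer(adj, features, weights):
    """One propagation step: ReLU(S @ features @ weights)."""
    s = normalized_adjacency(adj)
    z = matmul(matmul(s, features), weights)
    return [[max(0.0, v) for v in row] for row in z]


# Toy example: two connected nodes with one feature each.
adj = [[0, 1], [1, 0]]
features = [[1.0], [3.0]]
weights = [[1.0]]
hidden = gcn_layer(adj, features, weights)
```

In this toy graph both nodes end up with the average of the two input features, which shows the smoothing effect of the normalized operator.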
Graph Convolutional Neural Network (Graph CNN) 
Graph Convolutional Neural Networks (Graph CNNs) are generalizations of classical CNNs to handle graph data such as molecular data, point clouds and social networks. Current filters in graph CNNs are built for a fixed and shared graph structure. However, for most real data, the graph structures vary in both size and connectivity. The paper proposes a generalized and flexible graph CNN taking data of arbitrary graph structure as input. In that way, a task-driven adaptive graph is learned for each graph sample during training. To efficiently learn the graph, a distance metric learning approach is proposed. Extensive experiments on nine graph-structured datasets have demonstrated superior performance in both convergence speed and predictive accuracy. 
Graph Cube  In a paper from the University of Illinois at Urbana-Champaign, this time in collaboration with Microsoft and Google, a novel data warehousing model called Graph Cube is introduced. Based on a restricted graph model (e.g., no attributes on edges) introduced as a multidimensional network (with the dimensions being the vertex attributes), they define the notion of an aggregate network (called a cuboid). A graph cube then constitutes the set of all possible aggregations of the original network. 
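The aggregation idea can be sketched as follows, under simplifying assumptions: vertices carry a dictionary of attributes, edges are undirected and unattributed, and an aggregate network simply counts vertices per attribute combination and edges between those groups. The names `cuboid` and `graph_cube` are hypothetical and the sketch is not taken from the paper.

```python
from collections import Counter
from itertools import combinations


def cuboid(vertices, edges, dims):
    """Aggregate a vertex-attributed network along the chosen dimensions.

    vertices: dict mapping vertex id -> dict of attribute -> value
    edges:    iterable of (u, v) pairs, treated as undirected
    dims:     tuple of attribute names that define the grouping
    Returns (node_counts, edge_counts) of the aggregate network.
    """
    def key(v):
        return tuple(vertices[v][d] for d in dims)

    node_counts = Counter(key(v) for v in vertices)
    edge_counts = Counter()
    for u, v in edges:
        # Sort the two group keys so (a, b) and (b, a) are counted together.
        edge_counts[tuple(sorted((key(u), key(v))))] += 1
    return node_counts, edge_counts


def graph_cube(vertices, edges, all_dims):
    """Compute one cuboid for every subset of the dimensions."""
    return {dims: cuboid(vertices, edges, dims)
            for r in range(len(all_dims) + 1)
            for dims in combinations(all_dims, r)}


vertices = {
    1: {"gender": "F", "city": "NY"},
    2: {"gender": "M", "city": "NY"},
    3: {"gender": "F", "city": "LA"},
}
edges = [(1, 2), (1, 3), (2, 3)]
cube = graph_cube(vertices, edges, ("gender", "city"))
```

With two dimensions the cube holds four cuboids, from the fully aggregated network (empty tuple of dimensions) down to the grouping by both attributes.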
Graph Database  In computing, a graph database is a database that uses graph structures with nodes, edges, and properties to represent and store data. A graph database is any storage system that provides index-free adjacency. This means that every element contains a direct pointer to its adjacent elements and no index lookups are necessary. General graph databases that can store any graph are distinct from specialized graph databases such as triplestores and network databases. 
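Index-free adjacency can be illustrated with a small in-memory sketch in which each node keeps direct references to its outgoing edges, so a traversal follows object references rather than consulting a global index. The `Node`, `Edge`, and `neighbors` names are hypothetical and do not correspond to any particular graph database API.

```python
class Node:
    """A node with properties and direct references to its outgoing edges."""

    def __init__(self, name, **properties):
        self.name = name
        self.properties = properties
        self.out_edges = []  # direct pointers; no global lookup structure


class Edge:
    """A labeled, directed edge that registers itself on its source node."""

    def __init__(self, source, target, label, **properties):
        self.source = source
        self.target = target
        self.label = label
        self.properties = properties
        source.out_edges.append(self)


def neighbors(node, label=None):
    """Follow the node's own edge references, optionally filtered by label."""
    return [e.target for e in node.out_edges
            if label is None or e.label == label]


alice, bob, carol = Node("alice"), Node("bob"), Node("carol")
Edge(alice, bob, "KNOWS")
Edge(bob, carol, "KNOWS")
Edge(alice, carol, "WORKS_WITH")

# Friends of friends: two hops, each resolved from the node itself.
friends_of_friends = [m.name
                      for n in neighbors(alice, "KNOWS")
                      for m in neighbors(n, "KNOWS")]
```

The cost of each hop depends only on the number of edges attached to the current node, not on the total size of the graph.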
Graph Function Library  A graph abstraction layer with an object-oriented programming interface has been introduced, which enables the implementation of custom graph algorithms, for example within a stored procedure. A set of parameterizable implementations of frequently used algorithms will be provided in the form of a Graph Function Library for application developers to choose from. 
Graph Information Criterion (GIC) 
statGraph 
Graph Information Ratio  We introduce the notion of information ratio $\text{Ir}(H/G)$ between two (simple, undirected) graphs $G$ and $H$, defined as the supremum of ratios $k/n$ such that there exists a mapping from the strong product $G^k$ to $H^n$ that preserves non-adjacency. Operationally speaking, the information ratio is the maximal number of source symbols per channel use that can be reliably sent over a channel with a confusion graph $H$, where reliability is measured w.r.t. a source confusion graph $G$. Various results are provided, including in particular lower and upper bounds on $\text{Ir}(H/G)$ in terms of different graph properties, inequalities and identities for behavior under strong product and disjoint union, relations to graph cores, and notions of graph criticality. Informally speaking, $\text{Ir}(H/G)$ can be interpreted as a measure of similarity between $G$ and $H$. We make this notion precise by introducing the concept of information equivalence between graphs, a more quantitative version of homomorphic equivalence. We then describe a natural partial ordering over the space of information equivalence classes, and endow it with a suitable metric structure that is contractive under the strong product. Various examples and open problems are discussed. 
Graph Learning  The construction of a meaningful graph topology plays a crucial role in the effective representation, processing, analysis and visualization of structured data. When a natural choice of the graph is not readily available from the datasets, it is thus desirable to infer or learn a graph topology from the data. In this tutorial overview, we survey solutions to the problem of graph learning, including classical viewpoints from statistics and physics, and more recent approaches that adopt a graph signal processing (GSP) perspective. We further emphasize the conceptual similarities and differences between classical and GSP graph inference methods and highlight the potential advantage of the latter in a number of theoretical and practical scenarios. We conclude with several open issues and challenges that are keys to the design of future signal processing and machine learning algorithms for learning graphs from data. 
Graph Neural Network (GNN) 
Many underlying relationships among data in several areas of science and engineering, e.g., computer vision, molecular chemistry, molecular biology, pattern recognition, and data mining, can be represented in terms of graphs. In this paper, we propose a new neural network model, called graph neural network (GNN) model, that extends existing neural network methods for processing the data represented in graph domains. This GNN model, which can directly process most of the practically useful types of graphs, e.g., acyclic, cyclic, directed, and undirected, implements a function $\tau(G,n) \in \mathbb{R}^m$ that maps a graph $G$ and one of its nodes $n$ into an $m$-dimensional Euclidean space. A supervised learning algorithm is derived to estimate the parameters of the proposed GNN model. The computational cost of the proposed algorithm is also considered. Some experimental results are shown to validate the proposed learning algorithm, and to demonstrate its generalization capabilities. 
Graph Partition Neural Network  We present graph partition neural networks (GPNN), an extension of graph neural networks (GNNs) able to handle extremely large graphs. GPNNs alternate between locally propagating information between nodes in small subgraphs and globally propagating information between the subgraphs. To efficiently partition graphs, we experiment with several partitioning algorithms and also propose a novel variant for fast processing of large-scale graphs. We extensively test our model on a variety of semi-supervised node classification tasks. Experimental results indicate that GPNNs are either superior or comparable to state-of-the-art methods on a wide variety of datasets for graph-based semi-supervised classification. We also show that GPNNs can achieve performance similar to standard GNNs with fewer propagation steps. 
Graph Processing Framework for Large Dynamic Graphs (BLADYG) 
Recently, distributed processing of large dynamic graphs has become very popular, especially in certain domains such as social network analysis, Web graph analysis and spatial network analysis. In this context, many distributed/parallel graph processing systems have been proposed, such as Pregel, GraphLab, and Trinity. These systems can be divided into two categories: (1) vertex-centric and (2) block-centric approaches. In vertex-centric approaches, each vertex corresponds to a process, and messages are exchanged among vertices. In block-centric approaches, the unit of computation is a block, a connected subgraph of the graph, and message exchanges occur among blocks. In this paper, we consider the issues of scale and dynamism in the case of block-centric approaches. We present BLADYG, a block-centric framework that addresses the issue of dynamism in large-scale graphs. We present an implementation of BLADYG on top of the Akka framework. We experimentally evaluate the performance of the proposed framework. 
Graph sketching-based Massive Data Clustering (DBMSTClu) 
In this paper, we address the problem of recovering arbitrary-shaped data clusters from massive datasets. We present DBMSTClu, a new density-based non-parametric method working on a limited number of linear measurements, i.e., a sketched version of the similarity graph $G$ between the $N$ objects to cluster. Unlike $k$-means, $k$-medians or $k$-medoids algorithms, it does not fail at distinguishing clusters with particular structures. Unlike DBSCAN or the Spectral Clustering method, it needs no input parameter. DBMSTClu, as a graph-based technique, relies on the similarity graph $G$, which theoretically costs $O(N^2)$ in memory. However, our algorithm follows the dynamic semi-streaming model by handling $G$ as a stream of edge weight updates and sketches it in one pass over the data into a compact structure requiring $O(\operatorname{poly} \operatorname{log} (N))$ space. Thanks to the property of the Minimum Spanning Tree (MST) for expressing the underlying structure of a graph, our algorithm successfully detects the right number of non-convex clusters by recovering an approximate MST from the graph sketch of $G$. We provide theoretical guarantees on the quality of the clustering partition and also demonstrate its advantage over the existing state-of-the-art on several datasets. 
Graph Structured Recurrent Neural Network (GSRNN) 
We present a generic framework for spatiotemporal (ST) data modeling, analysis, and forecasting, with a special focus on data that is sparse in both space and time. Our multi-scaled framework is a seamless coupling of two major components: a self-exciting point process that models the macroscale statistical behaviors of the ST data and a graph structured recurrent neural network (GSRNN) to discover the microscale patterns of the ST data on the inferred graph. This novel deep neural network (DNN) incorporates the real-time interactions of the graph nodes to enable more accurate real-time forecasting. The effectiveness of our method is demonstrated on both crime and traffic forecasting. 
Graph2Seq  The celebrated Sequence to Sequence learning (Seq2Seq) technique and its fruitful variants are powerful models that achieve excellent performance on tasks that map sequences to sequences. However, there are many machine learning tasks with inputs naturally represented in the form of graphs, which imposes significant challenges on existing Seq2Seq models for lossless conversion from the graph form to a sequence. In this work, we present a general end-to-end approach to map the input graph to a sequence of vectors, and then use another attention-based LSTM to decode the target sequence from these vectors. Specifically, to address the inevitable information loss in data conversion, we introduce a novel graph-to-sequence neural network model that follows the encoder-decoder architecture. Our method first uses an improved graph-based neural network to generate the node and graph embeddings by a novel aggregation strategy that incorporates the edge direction information into the node embeddings. We also propose an attention-based mechanism that aligns node embeddings and the decoding sequence to better cope with large graphs. Experimental results on the bAbI task, Shortest Path Task, and Natural Language Generation Task demonstrate that our model achieves state-of-the-art performance and significantly outperforms other baselines. We also show that with the proposed aggregation strategy, our proposed model is able to quickly converge to good performance. 
graph2vec  Recent works on representation learning for graph structured data predominantly focus on learning distributed representations of graph substructures such as nodes and subgraphs. However, many graph analytics tasks such as graph classification and clustering require representing entire graphs as fixed-length feature vectors. While the aforementioned approaches are naturally unequipped to learn such representations, graph kernels remain the most effective way of obtaining them. However, these graph kernels use handcrafted features (e.g., shortest paths, graphlets, etc.) and hence are hampered by problems such as poor generalization. To address this limitation, in this work, we propose a neural embedding framework named graph2vec to learn data-driven distributed representations of arbitrary sized graphs. graph2vec’s embeddings are learnt in an unsupervised manner and are task agnostic. Hence, they could be used for any downstream task such as graph classification, clustering and even seeding supervised representation learning approaches. Our experiments on several benchmark and large real-world datasets show that graph2vec achieves significant improvements in classification and clustering accuracies over substructure representation learning approaches and is competitive with state-of-the-art graph kernels. 
Graph-based Activity Regularization (GAR) 
In this paper, we propose a novel graph-based approach for semi-supervised learning problems, which considers an adaptive adjacency of the examples throughout the unsupervised portion of the training. Adjacency of the examples is inferred using the predictions of a neural network model which is first initialized by a supervised pretraining. These predictions are then updated according to a novel unsupervised objective which regularizes another adjacency, now linking the output nodes. Regularizing the adjacency of the output nodes, inferred from the predictions of the network, creates an easier optimization problem and ultimately ensures that the predictions of the network turn into the optimal embedding. Ultimately, the proposed framework provides an effective and scalable graph-based solution which is natural to the operational mechanism of deep neural networks. Our results show state-of-the-art performance within semi-supervised learning with the highest accuracies reported to date in the literature for SVHN and NORB datasets. 
Graph-Based Collaborative Filtering (GCF) 
By introducing consumed items as users’ implicit feedback in the matrix factorization (MF) method, SVD++ is one of the most effective collaborative filtering methods for personalized recommender systems. Though powerful, SVD++ has two limitations: (i) only user-side implicit feedback is utilized, whereas item-side implicit feedback, which can also enrich item representations, is not leveraged; (ii) in SVD++, the interacted items are equally weighted when combining the implicit feedback, which cannot reflect users’ true preferences accurately. To tackle the above limitations, in this paper we propose the Graph-based collaborative filtering (GCF) model, the Weighted Graph-based collaborative filtering (W-GCF) model and the Attentive Graph-based collaborative filtering (A-GCF) model, which (i) generalize the implicit feedback to the item side based on the user-item bipartite graph; (ii) flexibly learn the weights of individuals in the implicit feedback and hence improve the model’s capacity. Comprehensive experiments show that our proposed models outperform state-of-the-art models. For sparse implicit feedback scenarios, additional improvement is further achieved by leveraging the step-two implicit feedback information. 
GraphBLAS  An effort to define standard building blocks for Graph Algorithms in the language of Linear Algebra. 
GraphConnect  Deep neural networks have proved very successful in domains where large training sets are available, but when the number of training samples is small, their performance suffers from overfitting. Prior methods of reducing overfitting such as weight decay, Dropout and DropConnect are data-independent. This paper proposes a new method, GraphConnect, that is data-dependent, and is motivated by the observation that data of interest lie close to a manifold. The new method encourages the relationships between the learned decisions to resemble a graph representing the manifold structure. Essentially GraphConnect is designed to learn attributes that are present in data samples, in contrast to weight decay, Dropout and DropConnect, which are simply designed to make it more difficult to fit to random error or noise. Empirical Rademacher complexity is used to connect the generalization error of the neural network to spectral properties of the graph learned from the input data. This framework is used to show that GraphConnect is superior to weight decay. Experimental results on several benchmark datasets validate the theoretical analysis, and show that when the number of training samples is small, GraphConnect is able to significantly improve performance over weight decay. 
GraphGAN  The goal of graph representation learning is to embed each vertex in a graph into a low-dimensional vector space. Existing graph representation learning methods can be classified into two categories: generative models that learn the underlying connectivity distribution in the graph, and discriminative models that predict the probability of edge existence between a pair of vertices. In this paper, we propose GraphGAN, an innovative graph representation learning framework unifying the above two classes of methods, in which the generative model and discriminative model play a game-theoretical minimax game. Specifically, for a given vertex, the generative model tries to fit its underlying true connectivity distribution over all other vertices and produces ‘fake’ samples to fool the discriminative model, while the discriminative model tries to detect whether the sampled vertex is from ground truth or generated by the generative model. With the competition between these two models, both of them can alternately and iteratively boost their performance. Moreover, when considering the implementation of the generative model, we propose a novel graph softmax to overcome the limitations of the traditional softmax function, which can be proven to satisfy the desirable properties of normalization, graph structure awareness, and computational efficiency. Through extensive experiments on real-world datasets, we demonstrate that GraphGAN achieves substantial gains in a variety of applications, including link prediction, node classification, and recommendation, over state-of-the-art baselines. 
Graph-Guided Fused LASSO (GFLASSO) 
Let X be a matrix of size n × p, with n observations and p predictors, and Y a matrix of size n × k, with the same n observations and k responses, say, 1390 distinct electronics purchase records in 73 countries, to predict the ratings of 50 Netflix productions over all 73 countries. Models well poised for modeling pairs of high-dimensional datasets include orthogonal two-way Partial Least Squares (O2PLS), Canonical Correlation Analysis (CCA) and Co-Inertia Analysis (CIA), all of which involve matrix decomposition. Additionally, since these models are based on latent variables (that is, projections based on the original predictors), the computational efficiency comes at a cost of interpretability. However, this trade-off does not always pay off, and can be reverted with the direct prediction of k individual responses from selected features in X, in a unified regression framework that takes into account the relationships among the responses. Mathematically, the GFLASSO borrows the regularization of the LASSO and builds the model on the graph dependency structure underlying Y, as quantified by the k × k correlation matrix (that is, the ‘strength of association’ between responses). As a result, similar (or dissimilar) responses will be explained by a similar (or dissimilar) subset of selected predictors. 
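To make the regularization concrete, the sketch below evaluates a GFLASSO-style penalty for a given coefficient matrix. It assumes the commonly cited form of the penalty: an L1 term on all coefficients plus a fusion term that, for every pair of correlated responses, penalizes the difference between their coefficients, weighted by the absolute correlation and aligned by its sign. The function name `gflasso_penalty`, the use of every nonzero correlation as a graph edge, and the weights `lam` and `gamma` are illustrative assumptions, not details stated in this entry.

```python
def gflasso_penalty(B, R, lam, gamma):
    """Evaluate an assumed GFLASSO-style penalty.

    B:     p x k coefficient matrix as a list of rows (predictor x response)
    R:     k x k correlation matrix of the responses
    lam:   weight of the LASSO (sparsity) term
    gamma: weight of the graph-guided fusion term
    """
    p, k = len(B), len(B[0])
    # LASSO part: sum of absolute values of all coefficients.
    lasso = sum(abs(B[j][m]) for j in range(p) for m in range(k))
    # Fusion part: one term per pair of responses with nonzero correlation.
    fusion = 0.0
    for m in range(k):
        for l in range(m + 1, k):
            r = R[m][l]
            if r == 0:
                continue
            sign = 1.0 if r > 0 else -1.0
            fusion += abs(r) * sum(abs(B[j][m] - sign * B[j][l])
                                   for j in range(p))
    return lam * lasso + gamma * fusion


B = [[1.0, 1.0],
     [0.0, 2.0]]
R_pos = [[1.0, 0.5], [0.5, 1.0]]
penalty = gflasso_penalty(B, R_pos, lam=1.0, gamma=2.0)
```

With positively correlated responses the fusion term shrinks their coefficients toward each other; with negatively correlated responses it shrinks them toward opposite signs, which is how similar responses come to share selected predictors.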
GraphH  It is common for real-world applications to analyze big graphs using distributed graph processing systems. Popular in-memory systems require an enormous amount of resources to handle big graphs. While several out-of-core systems have been proposed recently for processing big graphs using secondary storage, the high disk I/O overhead could significantly reduce performance. In this paper, we propose GraphH to enable high-performance big graph analytics in small clusters. Specifically, we design a two-stage graph partition scheme to evenly divide the input graph into partitions, and propose a GAB (Gather-Apply-Broadcast) computation model to make each worker process a partition in memory at a time. We use an edge cache mechanism to reduce the disk I/O overhead, and design a hybrid strategy to improve the communication performance. GraphH can efficiently process big graphs in small clusters or even a single commodity server. Extensive evaluations have shown that GraphH could be up to 7.8x faster compared to popular in-memory systems, such as Pregel+ and PowerGraph, when processing generic graphs, and more than 100x faster than recently proposed out-of-core systems, such as GraphD and Chaos, when processing big graphs. 
Graphical Causal Models  A species of the broader genus of graphical models, especially intended to help with problems of causal inference. 
Graphical Generative Adversarial Network (GraphicalGAN) 
We propose Graphical Generative Adversarial Networks (GraphicalGAN) to model structured data. GraphicalGAN conjoins the power of Bayesian networks on compactly representing the dependency structures among random variables and that of generative adversarial networks on learning expressive dependency functions. We introduce a structured recognition model to infer the posterior distribution of latent variables given observations. We propose two alternative divergence minimization approaches to learn the generative model and recognition model jointly. The first one treats all variables as a whole, while the second one utilizes the structural information by checking the individual local factors defined by the generative model and works better in practice. Finally, we present two important instances of GraphicalGAN, i.e. Gaussian Mixture GAN (GMGAN) and State Space GAN (SSGAN), which can successfully learn the discrete and temporal structures on visual datasets, respectively. 
Graphical Markov Models (GMM) 
A central aspect of statistical science is the assessment of dependence among stochastic variables. The familiar concepts of correlation, regression, and prediction are special cases, and identification of causal relationships ultimately rests on representations of multivariate dependence. Graphical Markov models (GMM) use graphs, either undirected, directed, or mixed, to represent multivariate dependences in a visual and computationally efficient manner. A GMM is usually constructed by specifying local dependences for each variable, equivalently, node of the graph, in terms of its immediate neighbors and/or parents by means of undirected and/or directed edges. This simple local specification can represent a highly varied and complex system of multivariate dependences by means of the global structure of the graph, thereby obtaining efficiency in modeling, inference, and probabilistic calculations. For a fixed graph, equivalently model, the classical methods of statistical inference may be utilized. In many applied domains, however, such as expert systems for medical diagnosis or weather forecasting, or the analysis of gene-expression data, the graph is unknown and is itself the first goal of the analysis. This poses numerous challenges, including the following: • The numbers of possible graphs and models grow super-exponentially in the number of variables. • Distinct graphs G may be Markov equivalent, i.e., statistically indistinguishable. • Conversely, the same graph may possess different Markov interpretations. ggm 
Graphical Model  A graphical model is a probabilistic model for which a graph denotes the conditional dependence structure between random variables. They are commonly used in probability theory, statistics (particularly Bayesian statistics) and machine learning. Generally, probabilistic graphical models use a graph-based representation as the foundation for encoding a complete distribution over a multidimensional space and a graph that is a compact or factorized representation of a set of independences that hold in the specific distribution. Two branches of graphical representations of distributions are commonly used, namely, Bayesian networks and Markov networks. Both families encompass the properties of factorization and independences, but they differ in the set of independences they can encode and the factorization of the distribution that they induce. 
Graphite  Graphs are a fundamental abstraction for modeling relational data. However, graphs are discrete and combinatorial in nature, and learning representations suitable for machine learning tasks poses statistical and computational challenges. In this work, we propose Graphite, an algorithmic framework for unsupervised learning of representations over nodes in a graph using deep latent variable generative models. Our model is based on variational autoencoders (VAE), and differs from existing VAE frameworks for data modalities such as images, speech, and text in the use of graph neural networks for parameterizing both the generative model (i.e., decoder) and inference model (i.e., encoder). The use of graph neural networks directly incorporates inductive biases due to the spatial, local structure of graphs in the generative model. Moreover, we draw novel connections between graph neural networks and approximate inference via kernel embeddings of distributions. We demonstrate empirically that Graphite outperforms state-of-the-art approaches for the tasks of density estimation, link prediction, and node classification on synthetic and benchmark datasets. 
GraphRNN  Modeling and generating graphs is fundamental for studying networks in biology, engineering, and social sciences. However, modeling complex distributions over graphs and then efficiently sampling from these distributions is challenging due to the non-unique, high-dimensional nature of graphs and the complex, non-local dependencies that exist between edges in a given graph. Here we propose GraphRNN, a deep autoregressive model that addresses the above challenges and approximates any distribution of graphs with minimal assumptions about their structure. GraphRNN learns to generate graphs by training on a representative set of graphs and decomposes the graph generation process into a sequence of node and edge formations, conditioned on the graph structure generated so far. In order to quantitatively evaluate the performance of GraphRNN, we introduce a benchmark suite of datasets, baselines and novel evaluation metrics based on Maximum Mean Discrepancy, which measure distances between sets of graphs. Our experiments show that GraphRNN significantly outperforms all baselines, learning to generate diverse graphs that match the structural characteristics of a target set, while also scaling to graphs 50 times larger than previous deep models. 
GraphSAGE  Low-dimensional embeddings of nodes in large graphs have proved extremely useful in a variety of prediction tasks, from content recommendation to identifying protein functions. However, most existing approaches require that all nodes in the graph are present during training of the embeddings; these previous approaches are inherently transductive and do not naturally generalize to unseen nodes. Here we present GraphSAGE, a general, inductive framework that leverages node feature information (e.g., text attributes) to efficiently generate node embeddings for previously unseen data. Instead of training individual embeddings for each node, we learn a function that generates embeddings by sampling and aggregating features from a node’s local neighborhood. Our algorithm outperforms strong baselines on three inductive node-classification benchmarks: we classify the category of unseen nodes in evolving information graphs based on citation and Reddit post data, and we show that our algorithm generalizes to completely unseen graphs using a multi-graph dataset of protein-protein interactions. 
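The sample-and-aggregate idea can be sketched in a few lines, under strong simplifications: a single layer, a mean aggregator, and no trainable weights or nonlinearity (which the actual method learns). The function names and the dictionary-based graph are hypothetical and only meant to show why the embedding is a function of local features rather than a per-node lookup.

```python
import random


def sample_neighbors(graph, node, num_samples, rng):
    """Return at most `num_samples` neighbors, sampled without replacement."""
    nbrs = graph[node]
    if len(nbrs) <= num_samples:
        return list(nbrs)
    return rng.sample(nbrs, num_samples)


def mean_vector(vectors, dim):
    """Element-wise mean; a zero vector if there are no neighbors."""
    if not vectors:
        return [0.0] * dim
    return [sum(v[i] for v in vectors) / len(vectors) for i in range(dim)]


def sage_embed(graph, features, node, num_samples=2, rng=None):
    """Concatenate a node's own features with the mean of sampled neighbors.

    A real layer would multiply this concatenation by a learned weight
    matrix and apply a nonlinearity; both are omitted in this sketch.
    """
    rng = rng or random.Random(0)
    dim = len(features[node])
    sampled = sample_neighbors(graph, node, num_samples, rng)
    aggregated = mean_vector([features[n] for n in sampled], dim)
    return features[node] + aggregated  # list concatenation


graph = {"a": ["b", "c"], "b": ["a"], "c": ["a"]}
features = {"a": [1.0, 0.0], "b": [0.0, 2.0], "c": [0.0, 4.0]}
emb_a = sage_embed(graph, features, "a")

# A node that was not present before can be embedded with the same function.
graph["d"] = ["a"]
features["d"] = [5.0, 5.0]
emb_d = sage_embed(graph, features, "d")
```

Because nothing in `sage_embed` is tied to a node identity, the same function applies to node `d`, which is the inductive property the entry describes.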
Graph-Sparse Logistic Regression  We introduce Graph-Sparse Logistic Regression, a new algorithm for classification for the case in which the support should be sparse but connected on a graph. We validate this algorithm against synthetic data and benchmark it against L1-regularized Logistic Regression. We then explore our technique in the bioinformatics context of proteomics data on the interactome graph. We make all our experimental code public and provide GSLR as an open source package. 
Graphviz  Graphviz (short for Graph Visualization Software) is a package of open-source tools initiated by AT&T Labs Research for drawing graphs specified in DOT language scripts. It also provides libraries for software applications to use the tools. Graphviz is free software licensed under the Eclipse Public License. https://…/viz.js 
GraphX  GraphX is Apache Spark’s API for graphs and graph-parallel computation. 
Gravitational Clustering  The downfall of many supervised learning algorithms, such as neural networks, is the inherent need for a large amount of training data. Although there is a lot of buzz about big data, there is still the problem of doing classification from a small dataset. Other methods such as support vector machines, although capable of dealing with few samples, are inherently binary classifiers, and need learning strategies such as One vs All in the case of multi-class classification. In the presence of a large number of classes this can become problematic. In this paper we present a novel approach to supervised learning through the method of clustering. Unlike traditional methods such as K-Means, Gravitational Clustering does not require the initial number of clusters and builds the clusters automatically. Individual samples can be arbitrarily weighted, and the method requires only a few samples while staying resilient to overfitting. 
Greedy Algorithm  A greedy algorithm is an algorithm that follows the problem-solving heuristic of making the locally optimal choice at each stage with the hope of finding a global optimum. In many problems, a greedy strategy does not in general produce an optimal solution, but nonetheless a greedy heuristic may yield locally optimal solutions that approximate a global optimal solution in a reasonable time. For example, a greedy strategy for the traveling salesman problem (which is of a high computational complexity) is the following heuristic: ‘At each stage visit an unvisited city nearest to the current city’. This heuristic need not find a best solution, but terminates in a reasonable number of steps; finding an optimal solution typically requires unreasonably many steps. In mathematical optimization, greedy algorithms solve combinatorial problems having the properties of matroids. 
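The nearest-city heuristic quoted above can be sketched as follows. The function names and the small set of points are illustrative assumptions; ties are broken by the lower city index, which is a choice made here rather than part of the heuristic.

```python
import math


def nearest_neighbor_tour(points, start=0):
    """Greedy tour: always move to the closest city not yet visited."""
    unvisited = set(range(len(points))) - {start}
    tour = [start]
    while unvisited:
        current = points[tour[-1]]
        # Locally optimal choice; ties are broken by the lower index.
        nxt = min(unvisited, key=lambda i: (math.dist(current, points[i]), i))
        tour.append(nxt)
        unvisited.remove(nxt)
    return tour


def tour_length(points, tour):
    """Length of the closed tour, including the return to the start."""
    return sum(math.dist(points[tour[i]], points[tour[(i + 1) % len(tour)]])
               for i in range(len(tour)))


# Cities on a line at positions 0, 1, -2 and 5.
points = [(0, 0), (1, 0), (-2, 0), (5, 0)]
greedy = nearest_neighbor_tour(points)
greedy_len = tour_length(points, greedy)          # greedy zigzags
better_len = tour_length(points, [0, 1, 3, 2])    # a shorter closed tour
```

On this instance the greedy tour is longer than the alternative, which illustrates that the heuristic terminates quickly but need not find a best solution.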
Greedy Neural Architecture Search (GNAS) 
A key problem in deep multi-attribute learning is to effectively discover the inter-attribute correlation structures. Typically, the conventional deep multi-attribute learning approaches follow the pipeline of manually designing the network architectures based on task-specific expert knowledge and careful network tuning, leading to inflexibility in various complicated scenarios in practice. Motivated by this problem, we propose an efficient greedy neural architecture search approach (GNAS) to automatically discover the optimal tree-like deep architecture for multi-attribute learning. In a greedy manner, GNAS divides the optimization of the global architecture into the optimizations of individual connections step by step. By iteratively updating the local architectures, the global tree-like architecture converges to one in which the bottom layers are shared across relevant attributes and the branches in the top layers encode more attribute-specific features. Experiments on three benchmark multi-attribute datasets show the effectiveness and compactness of neural architectures derived by GNAS, and also demonstrate the efficiency of GNAS in searching neural architectures. 
Greedy Randomized Adaptive Search Procedures (GRASP) 
The greedy randomized adaptive search procedure (also known as GRASP) is a metaheuristic algorithm commonly applied to combinatorial optimization problems. GRASP typically consists of iterations, each made up of the construction of a greedy randomized solution followed by iterative improvement of that solution through a local search. The greedy randomized solutions are generated by adding elements to the problem’s solution set from a list of elements ranked by a greedy function according to the quality of the solution they will achieve. To obtain variability in the candidate set of greedy solutions, well-ranked candidate elements are often placed in a restricted candidate list (also known as RCL), and chosen at random when building up the solution. This kind of greedy randomized construction method is also known as a semi-greedy heuristic, first described in Hart and Shogan (1987). GRASP was first introduced in Feo and Resende (1989). Survey papers on GRASP include Feo and Resende (1995), Pitsoulis and Resende (2002), and Resende and Ribeiro (2003). An annotated bibliography of GRASP can be found in Festa and Resende (2002). 
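The two phases described above (greedy randomized construction from a restricted candidate list, then local search) can be sketched on a toy problem. The 0/1 knapsack setting, the value-to-weight ranking function, the fixed-size RCL, and the one-for-one swap neighbourhood are all assumptions chosen to keep the example short; they are not taken from the cited papers.

```python
import random

def grasp_knapsack(values, weights, capacity, iterations=50, rcl_size=2, seed=0):
    """GRASP sketch for a 0/1 knapsack instance (weights must be positive)."""
    rng = random.Random(seed)
    n = len(values)

    def ratio(i):
        # Greedy ranking function: value per unit of weight.
        return values[i] / weights[i]

    best, best_value = [], 0
    for _ in range(iterations):
        # Construction phase: repeatedly pick at random from the RCL,
        # i.e. the rcl_size best-ranked items that still fit.
        chosen, room = [], capacity
        candidates = [i for i in range(n) if weights[i] <= room]
        while candidates:
            candidates.sort(key=ratio, reverse=True)
            pick = rng.choice(candidates[:rcl_size])
            chosen.append(pick)
            room -= weights[pick]
            candidates = [i for i in candidates if i != pick and weights[i] <= room]

        # Local search phase: swap a chosen item for a more valuable unchosen
        # one whenever the swap still fits; stop when no such swap exists.
        improved = True
        while improved:
            improved = False
            for out in chosen:
                swaps = [i for i in range(n)
                         if i not in chosen
                         and values[i] > values[out]
                         and weights[i] - weights[out] <= room]
                if swaps:
                    inn = swaps[0]
                    chosen[chosen.index(out)] = inn
                    room -= weights[inn] - weights[out]
                    improved = True
                    break

        value = sum(values[i] for i in chosen)
        if value > best_value:
            best, best_value = sorted(chosen), value
    return best, best_value

# Hypothetical instance: three items, capacity 7.
items, total = grasp_knapsack([10, 7, 6], [5, 4, 3], capacity=7)
```

With `rcl_size=1` the construction degenerates into a purely greedy one; larger values trade greediness for diversity across iterations.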
Greenhouse  Greenhouse is a zero-positive machine learning system for time-series anomaly detection. 
Greenplum Database (GPDB) 
The Greenplum Database (GPDB) is an advanced, fully featured, open source data warehouse. It provides powerful and rapid analytics on petabyte-scale data volumes. Uniquely geared toward big data analytics, Greenplum Database is powered by the world’s most advanced cost-based query optimizer, delivering high analytical query performance on large data volumes. The Greenplum project is released under the Apache 2 license. We want to thank all our current community contributors and are really interested in all new potential contributions. For the Greenplum Database community no contribution is too small; we encourage all types of contributions. 
Greenwald and Khanna Algorithm  Copulas for Streaming Data 
greta  greta lets us build statistical models interactively in R, and then sample from them by MCMC. We build greta models with greta array objects, which behave much like R’s array, matrix and vector objects for numeric data. Like those numeric data objects, greta arrays can be manipulated with functions and mathematical operators to create new greta arrays. The key difference between greta arrays and numeric data objects is that when you do something to a greta array, greta doesn’t calculate the values of the new greta array. Instead, it just remembers what operation to do, and works out the size and shape of the result. 
Grey Box Model  In mathematics, statistics, and computational modelling, a grey box model combines a partial theoretical structure with data to complete the model. The theoretical structure may vary from information on the smoothness of results, to models that need only parameter values from data or existing literature. Thus, almost all models are grey box models as opposed to black box where no model form is assumed or white box models that are purely theoretical. Some models assume a special form such as a linear regression or neural network. These have special analysis methods. In particular linear regression techniques are much more efficient than most nonlinear techniques. The model can be deterministic or stochastic (i.e. containing random components) depending on its planned use. 
Grey Machine Learning  A brief introduction to the Grey Machine Learning 
Grey Machine Learning Model based Variable Separable (VSGML) 
The Grey Machine Learning Model based Variable Separable (VSGML) is presented in this paper. The VSGML’s function set is composed of variable separable functions. A Divide-and-Conquer architecture based Radial Basis Function (DCRBF) Network is constructed to implement VSGML. The DCRBF is composed of several sub-RBF networks, each of which takes one subspace as its input. The output of the DCRBF is the sum of the sub-RBF networks’ outputs. The DCRBF algorithm is given and its approximation ability is also discussed in this paper. The experimental results show that the DCRBF outperforms the conventional RBF. A Grey Machine Learning Model with application to time series. Available from: https://…ing_Model_with_application_to_time_series [accessed May 07 2018]. 
Grid Search  The de facto standard way of performing hyperparameter optimization is grid search, which is simply an exhaustive search through a manually specified subset of the hyperparameter space of a learning algorithm. A grid search algorithm must be guided by some performance metric, typically measured by cross-validation on the training set or evaluation on a held-out validation set. Since the parameter space of a machine learner may include real-valued or unbounded value spaces for certain parameters, manually set bounds and discretization may be necessary before applying grid search. 
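As a rough illustration of the exhaustive search described above, the sketch below enumerates every combination of a manually specified grid and keeps the best-scoring one. The names `grid_search` and `toy_score` are hypothetical; `toy_score` merely stands in for a cross-validated or held-out metric, and higher scores are assumed to be better.

```python
from itertools import product

def grid_search(score, grid):
    """Evaluate score(**params) on every combination in grid; return the best."""
    names = list(grid)
    best_params, best_score = None, float("-inf")
    for combo in product(*(grid[name] for name in names)):
        params = dict(zip(names, combo))
        s = score(**params)
        if s > best_score:
            best_params, best_score = params, s
    return best_params, best_score

def toy_score(learning_rate, depth):
    # Hypothetical stand-in for a validation metric, peaked at (0.1, 3).
    return -(learning_rate - 0.1) ** 2 - (depth - 3) ** 2

# Real-valued parameters must be discretized by hand before the search.
grid = {"learning_rate": [0.01, 0.1, 1.0], "depth": [1, 3, 5]}
best_params, best_score = grid_search(toy_score, grid)
```

The number of evaluations is the product of the list lengths (nine here), which is why the grid has to stay coarse as the number of hyperparameters grows.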
GridNet  This paper presents GridNet, a new Convolutional Neural Network (CNN) architecture for semantic image segmentation (full scene labelling). Classical neural networks are implemented as one stream from the input to the output, with subsampling operators applied in the stream in order to reduce the feature map size and to increase the receptive field for the final prediction. However, for semantic image segmentation, where the task consists of assigning a semantic class to each pixel of an image, feature map reduction is harmful because it leads to a resolution loss in the output prediction. To tackle this problem, our GridNet follows a grid pattern allowing multiple interconnected streams to work at different resolutions. We show that our network generalizes many well-known networks such as conv-deconv, residual or U-Net networks. GridNet is trained from scratch and achieves competitive results on the Cityscapes dataset. 
Gridster.js  Gridster is a jQuery plugin that allows building intuitive draggable layouts from elements spanning multiple columns. You can even dynamically add and remove elements from the grid. It is on par with sliced bread, or possibly better. MIT licensed. Drag and Drop Visuals in your Interactive Dashboard 
Grounded Recurrent Neural Network (GRNN) 
In this work, we present the Grounded Recurrent Neural Network (GRNN), a recurrent neural network architecture for multi-label prediction which explicitly ties labels to specific dimensions of the recurrent hidden state (we call this process ‘grounding’). The approach is particularly well-suited for extracting large numbers of concepts from text. We apply the new model to address an important problem in healthcare: understanding what medical concepts are discussed in clinical text. Using a publicly available dataset derived from Intensive Care Units, we learn to label a patient’s diagnoses and procedures from their discharge summary. Our evaluation shows a clear advantage to using our proposed architecture over a variety of strong baselines. 
Group Fused Multinomial Regression  gfmR 
Group Method of Data Handling (GMDH) 
Group method of data handling (GMDH) is a family of inductive algorithms for computer-based mathematical modeling of multi-parametric datasets that features fully automatic structural and parametric optimization of models. GMDH is used in such fields as data mining, knowledge discovery, prediction, complex systems modeling, optimization and pattern recognition. GMDH algorithms are characterized by an inductive procedure that performs sorting-out of gradually complicated polynomial models and selects the best solution by means of the so-called external criterion. GMDH, GMDH2 
Group Normalization (GN) 
Batch Normalization (BN) is a milestone technique in the development of deep learning, enabling various networks to train. However, normalizing along the batch dimension introduces problems: BN’s error increases rapidly when the batch size becomes smaller, caused by inaccurate batch statistics estimation. This limits BN’s usage for training larger models and transferring features to computer vision tasks including detection, segmentation, and video, which require small batches constrained by memory consumption. In this paper, we present Group Normalization (GN) as a simple alternative to BN. GN divides the channels into groups and computes within each group the mean and variance for normalization. GN’s computation is independent of batch sizes, and its accuracy is stable in a wide range of batch sizes. On ResNet-50 trained on ImageNet, GN has 10.6% lower error than its BN counterpart when using a batch size of 2; when using typical batch sizes, GN is comparably good with BN and outperforms other normalization variants. Moreover, GN can be naturally transferred from pre-training to fine-tuning. GN can outperform or compete with its BN-based counterparts for object detection and segmentation in COCO, and for video classification in Kinetics, showing that GN can effectively replace the powerful BN in a variety of tasks. GN can be easily implemented by a few lines of code in modern libraries. 
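The grouping step described above can be sketched in plain Python for a single sample. This is a simplified illustration, not the paper's code: the function name, the list-of-channels input format, and the `eps` default are assumptions, and the learnable per-channel scale and shift used in practice are omitted.

```python
from statistics import fmean, pvariance

def group_norm(channels, num_groups, eps=1e-5):
    """Normalize one sample given as a list of channels (each a flat list of floats).

    Mean and variance are computed over all values of the channels in a group,
    so the result does not depend on any other sample in the batch.
    """
    if len(channels) % num_groups != 0:
        raise ValueError("number of channels must be divisible by num_groups")
    per_group = len(channels) // num_groups
    out = []
    for g in range(num_groups):
        group = channels[g * per_group:(g + 1) * per_group]
        values = [v for ch in group for v in ch]
        mean = fmean(values)
        var = pvariance(values, mean)
        scale = (var + eps) ** -0.5
        out.extend([[(v - mean) * scale for v in ch] for ch in group])
    return out

# Hypothetical sample: 4 channels of 2 values each, split into 2 groups.
x = [[1.0, 2.0], [3.0, 4.0], [10.0, 20.0], [30.0, 40.0]]
y = group_norm(x, num_groups=2)
```

Because the statistics are taken per sample and per group, the same code gives the same output whether the sample arrives alone or inside a large batch.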
Grouped Merging Net (GMNet) 
Deep Convolutional Neural Networks (CNNs) are capable of learning unprecedentedly effective features from images. Some researchers have worked to improve parameter efficiency using grouped convolution. However, the relation between the optimal number of convolutional groups and the recognition performance remains an open problem. In this paper, we propose a series of Basic Units (BUs) and a two-level merging strategy to construct deep CNNs, referred to as a joint Grouped Merging Net (GMNet), which can produce joint grouped and reused deep features while maintaining the feature discriminability for classification tasks. Our GMNet architectures with the proposed BU_A (dense connection) and BU_B (straight mapping) lead to a significant reduction in the number of network parameters and obtain performance improvement in image classification tasks. Extensive experiments are conducted to validate the superior performance of GMNet over the state of the art on benchmark datasets, e.g., MNIST, CIFAR-10, CIFAR-100 and SVHN. 
Group-Fused Graphical Lasso (GFGL) 
We consider the consistency properties of a regularised estimator for the simultaneous identification of both changepoints and graphical dependency structure in multivariate time series. Traditionally, estimation of Gaussian Graphical Models (GGM) is performed in an i.i.d. setting. More recently, such models have been extended to allow for changes in the distribution, but only where changepoints are known a priori. In this work, we study the Group-Fused Graphical Lasso (GFGL), which penalises partial correlations with an L1 penalty while simultaneously inducing block-wise smoothness over time to detect multiple changepoints. We present a proof of consistency for the estimator, both in terms of changepoints and the structure of the graphical models in each segment. 
GroupRemMap Penalty  Expression quantitative trait loci (eQTLs) are genomic loci that regulate expression levels of mRNAs or proteins. Understanding these regulatory relationships provides important clues to biological pathways that underlie diseases. In this paper, we propose a new statistical method, GroupRemMap, for identifying eQTLs. We model the relationship between gene expression and single nucleotide variants (SNVs) through multivariate linear regression models, in which gene expression levels are responses and SNV genotypes are predictors. To handle the high dimensionality as well as to incorporate the intrinsic group structure of SNVs, we introduce a new regularization scheme to (1) control the overall sparsity of the model; (2) encourage the group selection of SNVs from the same gene; and (3) facilitate the detection of trans-hub-eQTLs. We apply the proposed method to the colorectal and breast cancer data sets from The Cancer Genome Atlas (TCGA), and identify several biologically interesting eQTLs. These findings may provide insight into biological processes associated with cancers and generate hypotheses for future studies. groupRemMap 
Growth Curve Analysis (GCA) 
Growth curve analysis (GCA) is a multilevel regression technique designed for analysis of time course or longitudinal data. A major advantage of this approach is that it can be used to simultaneously analyze both group-level effects (e.g., experimental manipulations) and individual-level effects (i.e., individual differences). 
Growth Hacking  Growth hacking is a marketing technique developed by technology startups which uses creativity, analytical thinking, and social metrics to sell products and gain exposure. It can be seen as part of the online marketing ecosystem, as in many cases growth hackers are simply good at using techniques such as search engine optimization, website analytics, content marketing and A/B testing which are already mainstream. Growth hackers focus on low-cost and innovative alternatives to traditional marketing, e.g. utilizing social media and viral marketing instead of buying advertising through more traditional media such as radio, newspaper, and television. Growth hacking is particularly important for startups, as it allows for a ‘lean’ launch that focuses on ‘growth first, budgets second.’ Facebook, Twitter, LinkedIn, AirBnB and Dropbox are all companies that use growth hacking techniques. 
Grubbs Test  Grubbs’ test (named after Frank E. Grubbs, who published the test in 1950), also known as the maximum normed residual test or extreme studentized deviate test, is a statistical test used to detect outliers in a univariate data set assumed to come from a normally distributed population. outliers 
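As the alternative name ‘maximum normed residual test’ suggests, the two-sided test statistic is the largest absolute deviation from the sample mean divided by the sample standard deviation. A minimal sketch of that statistic follows; the function name is hypothetical, and the comparison against a critical value (derived from the t-distribution) that completes the test is left out here.

```python
from statistics import mean, stdev

def grubbs_statistic(data):
    """Return (G, index), where G = max_i |x_i - mean| / s and s is the
    sample standard deviation; index points at the most extreme value."""
    m, s = mean(data), stdev(data)
    idx = max(range(len(data)), key=lambda i: abs(data[i] - m))
    return abs(data[idx] - m) / s, idx

# Hypothetical sample with one suspicious value at the end.
sample = [2.0, 2.0, 2.0, 2.0, 12.0]
G, suspect = grubbs_statistic(sample)
```

The statistic only flags the single most extreme point; it says nothing by itself about whether that point is an outlier until it is compared with the critical value for the chosen significance level and sample size.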
Guided Labeling  Over the last couple of years, deep learning and especially convolutional neural networks have become one of the workhorses of computer vision. One limiting factor for the applicability of supervised deep learning to more areas is the need for large, manually labeled datasets. In this paper we propose an easy-to-implement method we call guided labeling, which automatically determines which samples from an unlabeled dataset should be labeled. We show that using this procedure, the number of samples that need to be labeled is reduced considerably in comparison to labeling images arbitrarily. 
Guided Local Search (GLS) 
Guided Local Search is a metaheuristic search method. A metaheuristic method is a method that sits on top of a local search algorithm to change its behavior. Guided Local Search builds up penalties during a search. It uses penalties to help local search algorithms escape from local minima and plateaus. When the given local search algorithm settles in a local optimum, GLS modifies the objective function using a specific scheme. Then the local search will operate using an augmented objective function, which is designed to bring the search out of the local optimum. The key is in the way that the objective function is modified. 
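The entry above does not spell out the modification scheme. One commonly described formulation, assumed here rather than taken from the text, adds lambda times the penalties of the features present in a solution to its cost, and, at each local optimum, increments the penalty of the present features with the highest utility cost / (1 + penalty). The toy landscape, the one-feature-per-position encoding, and all names below are hypothetical.

```python
def guided_local_search(cost, start, lam=1.0, rounds=10):
    """GLS sketch on a 1-D landscape: positions 0..n-1, neighbours are x-1 and x+1.

    Each position i carries one indicator feature ('the search is at i')
    whose cost is cost[i]; costs are assumed to be positive.
    """
    n = len(cost)
    penalties = [0] * n

    def augmented(x):
        # Augmented objective: original cost plus weighted penalties.
        return cost[x] + lam * penalties[x]

    x = best = start
    for _ in range(rounds):
        # Plain local search, but driven by the augmented objective.
        while True:
            neighbours = [y for y in (x - 1, x + 1) if 0 <= y < n]
            y = min(neighbours, key=augmented)
            if augmented(y) >= augmented(x):
                break
            x = y
            if cost[x] < cost[best]:  # track the best by ORIGINAL cost
                best = x
        # Local optimum reached: penalise the present features of maximal
        # utility. Only one feature is present here, so it is always chosen.
        present = [x]
        utility = {i: cost[i] / (1 + penalties[i]) for i in present}
        top = max(utility.values())
        for i in present:
            if utility[i] == top:
                penalties[i] += 1
    return best

# Hypothetical landscape: a local minimum at index 1, the global one at index 3.
landscape = [4, 2, 3, 1]
found = guided_local_search(landscape, start=0)
```

In this sketch, a single round amounts to ordinary hill descent and stops at the first local minimum; the accumulated penalties in later rounds gradually raise the augmented cost there until a neighbouring position looks better.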