I Don’t Know – Prediction Cascades Framework |
Advances in deep learning have led to substantial increases in prediction accuracy as well as the cost of rendering predictions. We conjecture that for a majority of real-world inputs, the recent advances in deep learning have created models that effectively ‘over-think’ on simple inputs. In this paper we revisit the classic idea of prediction cascades to reduce prediction costs. We introduce the ‘I Don’t Know’ (IDK) prediction cascades framework, a general framework for constructing prediction cascades for arbitrary multi-class prediction tasks. We propose two baseline methods for constructing cascades as well as a new objective within this framework and evaluate these techniques on a range of benchmark and real-world datasets to demonstrate that prediction cascades can achieve 1.7-10.5x speedups in image classification tasks while maintaining comparable accuracy to state-of-the-art models. When combined with human experts, prediction cascades can achieve nearly perfect accuracy (within 5%) while requiring human intervention on less than 30% of the queries. |
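The cascade idea can be illustrated with a minimal sketch. It assumes a simple confidence-threshold rule for emitting 'I don't know', which is only one plausible way to build such a cascade and not necessarily the objective proposed in the paper. The names `idk_cascade`, `fast_model`, and `slow_model` are hypothetical.

```python
from typing import Callable, List, Sequence, Tuple

# A model maps an input to a list of class probabilities.
Model = Callable[[Sequence[float]], List[float]]


def idk_cascade(x: Sequence[float], models: List[Model],
                thresholds: List[float]) -> Tuple[int, int]:
    """Return (predicted_class, index_of_model_that_answered).

    Every model except the last may answer 'I don't know' (IDK): if its top
    probability is below its threshold, the input is passed to the next,
    more expensive model. The last model always answers.
    """
    for i, (model, tau) in enumerate(zip(models[:-1], thresholds)):
        probs = model(x)
        best = max(range(len(probs)), key=probs.__getitem__)
        if probs[best] >= tau:
            return best, i  # confident enough: stop early
    probs = models[-1](x)
    return max(range(len(probs)), key=probs.__getitem__), len(models) - 1


# Toy stand-ins for a cheap and an expensive classifier (assumptions).
def fast_model(x: Sequence[float]) -> List[float]:
    p = 0.95 if x[0] > 1.0 else 0.5  # only confident on "easy" inputs
    return [p, 1.0 - p]


def slow_model(x: Sequence[float]) -> List[float]:
    return [0.1, 0.9]


easy = idk_cascade([2.0], [fast_model, slow_model], [0.8])
hard = idk_cascade([0.0], [fast_model, slow_model], [0.8])
```

Here the easy input is answered by the cheap model alone, while the hard input falls through to the expensive one. The speedup of a real cascade depends on how often the early models are confident.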

Ibis |
Ibis is a new Python data analysis framework with the goal of enabling data scientists and data engineers to be as productive working with big data as they are working with small and medium data today. In doing so, we will enable Python to become a true first-class language for Apache Hadoop, without compromises in functionality, usability, or performance. Having spent much of the last decade improving the usability of the single-node Python experience (with pandas and other projects), we are looking to achieve: • 100% Python end-to-end user workflows • Native hardware speeds for a broad set of use cases • Full-fidelity data analysis without extractions or sampling • Scalability for big data • Integration with the existing Python data ecosystem (pandas, scikit-learn, NumPy, and so on) |

IDEBench |
Existing benchmarks for analytical database systems such as TPC-DS and TPC-H are designed for static reporting scenarios. The main metric of these benchmarks is the performance of running individual SQL queries over a synthetic database. In this paper, we argue that such benchmarks are not suitable for evaluating database workloads originating from interactive data exploration (IDE) systems where most queries are ad-hoc, not based on predefined reports, and built incrementally. As a main contribution, we present a novel benchmark called IDEBench that can be used to evaluate the performance of database systems for IDE workloads. As opposed to traditional benchmarks for analytical database systems, our goal is to provide more meaningful workloads and datasets that can be used to benchmark IDE query engines, with a particular focus on metrics that capture the trade-off between query performance and quality of the result. As a second contribution, this paper evaluates and discusses the performance results of selected IDE query engines using our benchmark. The study includes two commercial systems, as well as two research prototypes (IDEA, approXimateDB/XDB), and one traditional analytical database system (MonetDB). |

iFair |
People are rated and ranked, towards algorithmic decision making in an increasing number of applications, typically based on machine learning. Research on how to incorporate fairness into such tasks has prevalently pursued the paradigm of group fairness: ensuring that each ethnic or social group receives its fair share in the outcome of classifiers and rankings. In contrast, the alternative paradigm of individual fairness has received relatively little attention. This paper introduces a method for probabilistically clustering user records into a low-rank representation that captures individual fairness yet also achieves high accuracy in classification and regression models. Our notion of individual fairness requires that users who are similar in all task-relevant attributes such as job qualification, and disregarding all potentially discriminating attributes such as gender, should have similar outcomes. Since the case for fairness is ubiquitous across many tasks, we aim to learn general representations that can be applied to arbitrary downstream use-cases. We demonstrate the versatility of our method by applying it to classification and learning-to-rank tasks on two real-world datasets. Our experiments show substantial improvements over the best prior work for this setting. |

IllinoisSL |
IllinoisSL is a Java library for learning structured prediction models. It supports structured Support Vector Machines and structured Perceptron. The library consists of a core learning module and several applications, which can be executed from the command line. Documentation is provided to guide users. In comparison to other structured learning libraries, IllinoisSL is efficient, general, and easy to use. |

Image Processing Language for Performance Portability on Heterogeneous Systems( ImageCL) |
Modern computer systems typically combine multicore CPUs with accelerators like GPUs for improved performance and energy efficiency. However, these systems suffer from poor performance portability: code tuned for one device must be retuned to achieve high performance on another. Image processing is increasing in importance, with applications ranging from seismology and medicine to Photoshop. Based on our experience with medical image processing, we propose ImageCL, a high-level domain-specific language and source-to-source compiler, targeting heterogeneous hardware. ImageCL resembles OpenCL, but abstracts away performance optimization details, allowing the programmer to focus on algorithm development, rather than performance tuning. The latter is left to our source-to-source compiler and auto-tuner. From high-level ImageCL kernels, our source-to-source compiler can generate multiple OpenCL implementations with different optimizations applied. We rely on auto-tuning rather than machine models or expert programmer knowledge to determine which optimizations to apply, making our tuning procedure highly robust. Furthermore, we can generate high performing implementations for different devices from a single source code, thereby improving performance portability. We evaluate our approach on three image processing benchmarks, on different GPU and CPU devices, and are able to outperform other state of the art solutions in several cases, achieving speedups of up to 4.57x. |

Image-Text-Image( I2T2I) |
Translating information between text and image is a fundamental problem in artificial intelligence that connects natural language processing and computer vision. In the past few years, performance in image caption generation has seen significant improvement through the adoption of recurrent neural networks (RNN). Meanwhile, text-to-image generation has begun to generate plausible images using datasets of specific categories like birds and flowers. We’ve even seen image generation from multi-category datasets such as the Microsoft Common Objects in Context (MSCOCO) through the use of generative adversarial networks (GANs). Synthesizing objects with a complex shape, however, is still challenging. For example, animals and humans have many degrees of freedom, which means that they can take on many complex shapes. We propose a new training method called Image-Text-Image (I2T2I) which integrates text-to-image and image-to-text (image captioning) synthesis to improve the performance of text-to-image synthesis. We demonstrate that I2T2I can generate better multi-category images using MSCOCO than the state-of-the-art. We also demonstrate that I2T2I can achieve transfer learning by using a pre-trained image captioning module to generate human images on the MPII Human Pose

Imagination-Augmented Agents( I2A) |
We introduce Imagination-Augmented Agents (I2As), a novel architecture for deep reinforcement learning combining model-free and model-based aspects. In contrast to most existing model-based reinforcement learning and planning methods, which prescribe how a model should be used to arrive at a policy, I2As learn to interpret predictions from a learned environment model to construct implicit plans in arbitrary ways, by using the predictions as additional context in deep policy networks. I2As show improved data efficiency, performance, and robustness to model misspecification compared to several baselines. |

Imitation Learning |
‘Learning from Demonstration’: Imitation learning, a.k.a. behavioral cloning, is learning from demonstration. In other words, in imitation learning, a machine learns how to behave by looking at what a teacher (or expert) does and then mimicking that behavior. For example, we can collect driving data from a human driver and then use that data to train a self-driving car. Imitation Learning in Tensorflow |

Imitation Network |
In this paper, we propose imitation networks, a simple but effective method for training neural networks with a limited amount of training data. Our approach inherits the idea of knowledge distillation that transfers knowledge from a deep or wide reference model to a shallow or narrow target model. The proposed method employs this idea to mimic predictions of reference estimators that are much more robust against overfitting than the network we want to train. Different from almost all the previous work for knowledge distillation that requires a large amount of labeled training data, the proposed method requires only a small amount of training data. Instead, we introduce pseudo training examples that are optimized as a part of model parameters. Experimental results for several benchmark datasets demonstrate that the proposed method outperformed all the other baselines, such as naive training of the target model and standard knowledge distillation. |

Imperialist Competitive Algorithm( ICA) |
In computer science, Imperialist Competitive Algorithm (ICA) is a computational method that is used to solve optimization problems of different types. Like most of the methods in the area of evolutionary computation, ICA does not need the gradient of the function in its optimization process. From a specific point of view, ICA can be thought of as the social counterpart of genetic algorithms (GAs). ICA is the mathematical model and the computer simulation of human social evolution, while GAs are based on the biological evolution of species. ICAFF,ICAOD |

Implicit Association Test( IAT) |
The implicit-association test (IAT) is a measure within social psychology designed to detect the strength of a person’s automatic association between mental representations of objects (concepts) in memory. The IAT was introduced in the scientific literature in 1998 by Anthony Greenwald, Debbie McGhee, and Jordan Schwartz. The IAT is now widely used in social psychology research and is used to some extent in clinical, cognitive, and developmental psychology research. Although some controversy still exists regarding the IAT and what it measures, much research into its validity and psychometric properties has been conducted since its introduction into the literature. IATscores |

Implicit Regression |
In 2011, Wooten introduced Non-Response Analysis, the founding theory of Implicit Regression. Implicit Regression treats the variables implicitly as codependent variables, and not as an explicit function with dependent or independent variables as in standard regression. The motivation of this paper is to introduce methods of implicit regression to determine the constant nature of a variable or the interactive term, and to address inverse relationships among measured variables with random error present in both directions. |

Import Vector Machines |
Import Vector Machines (Zhu and Hastie 2005) are sparse, discriminative and probabilistic classifiers. The algorithm is based on the Kernel Logistic Regression model, but uses only a few data points to define the decision hyperplane in the feature space. These data points are called import vectors. The Import Vector Machine shows similar results to the widely used Support Vector Machine, but has a probabilistic output. |

Importance Sampling |
In statistics, importance sampling is a general technique for estimating properties of a particular distribution, while only having samples generated from a different distribution than the distribution of interest. It is related to umbrella sampling in computational physics. Depending on the application, the term may refer to the process of sampling from this alternative distribution, the process of inference, or both. |
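As a minimal sketch of the technique, the following estimates an expectation under a target distribution using only samples drawn from a different proposal distribution, reweighting each sample by the density ratio. The choice of a standard normal target, a wider normal proposal, and the function name `importance_sampling_mean` are illustrative assumptions.

```python
import random
from statistics import NormalDist


def importance_sampling_mean(f, target, proposal, n, seed=0):
    """Estimate E_target[f(X)] from n samples drawn from `proposal`.

    Each sample x is weighted by target.pdf(x) / proposal.pdf(x), which
    corrects for having sampled from the wrong distribution.
    """
    rng = random.Random(seed)
    total = 0.0
    for _ in range(n):
        x = rng.gauss(proposal.mean, proposal.stdev)
        weight = target.pdf(x) / proposal.pdf(x)
        total += f(x) * weight
    return total / n


target = NormalDist(0.0, 1.0)    # distribution of interest
proposal = NormalDist(0.0, 2.0)  # distribution we can actually sample from

# E[X^2] under the standard normal is exactly 1.
estimate = importance_sampling_mean(lambda x: x * x, target, proposal, n=50_000)
```

The proposal should put mass wherever the target does. A proposal with lighter tails than the target can make the weights, and therefore the estimator's variance, very large.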

Importance Weighted Autoencoder( IWAE) |
The variational autoencoder (VAE; Kingma, Welling (2014)) is a recently proposed generative model pairing a top-down generative network with a bottom-up recognition network which approximates posterior inference. It makes two strong assumptions about posterior inference: that the posterior distribution is approximately factorial, and that its parameters can be approximated with nonlinear regression from the observations. As we show empirically, the VAE objective can lead to overly simplified representations which fail to use the network’s entire modeling capacity. We present the importance weighted autoencoder (IWAE), a generative model with the same architecture as the VAE, but which uses a strictly tighter log-likelihood lower bound derived from importance weighting. In the IWAE, the recognition network uses multiple samples to approximate the posterior, giving it increased flexibility to model complex posteriors which do not fit the VAE modeling assumptions. We show empirically that IWAEs learn richer latent space representations than VAEs, leading to improved test log-likelihood on density estimation benchmarks. GitXiv |
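For reference, the importance-weighted bound can be written as follows. This is the form recalled from the IWAE paper rather than something stated in the entry above; $h_1, \dots, h_k$ denote latent samples drawn independently from the recognition network $q(h \mid x)$.

```latex
\mathcal{L}_k(x)
  = \mathbb{E}_{h_1,\dots,h_k \sim q(h \mid x)}
    \left[ \log \frac{1}{k} \sum_{i=1}^{k} \frac{p(x, h_i)}{q(h_i \mid x)} \right],
\qquad
\log p(x) \;\ge\; \mathcal{L}_{k+1}(x) \;\ge\; \mathcal{L}_k(x).
```

With $k = 1$ this reduces to the usual VAE lower bound, and the bound tightens as the number of samples $k$ grows.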

Importance-Weighted Actor Learner Architecture( IMPALA) |
In this work we aim to solve a large collection of tasks using a single reinforcement learning agent with a single set of parameters. A key challenge is to handle the increased amount of data and extended training time, which is already a problem in single task learning. We have developed a new distributed agent IMPALA (Importance-Weighted Actor Learner Architecture) that can scale to thousands of machines and achieve a throughput rate of 250,000 frames per second. We achieve stable learning at high throughput by combining decoupled acting and learning with a novel off-policy correction method called V-trace, which was critical for achieving learning stability. We demonstrate the effectiveness of IMPALA for multi-task reinforcement learning on DMLab-30 (a set of 30 tasks from the DeepMind Lab environment (Beattie et al., 2016)) and Atari-57 (all available Atari games in Arcade Learning Environment (Bellemare et al., 2013a)). Our results show that IMPALA is able to achieve better performance than previous agents, uses less data, and crucially exhibits positive transfer between tasks as a result of its multi-task approach. |

Imputation |
In statistics, imputation is the process of replacing missing data with substituted values. When substituting for a data point, it is known as “unit imputation”; when substituting for a component of a data point, it is known as “item imputation”. Because missing data can create problems for analyzing data, imputation is seen as a way to avoid pitfalls involved with listwise deletion of cases that have missing values. That is to say, when one or more values are missing for a case, most statistical packages default to discarding any case that has a missing value, which may introduce bias or affect the representativeness of the results. Imputation preserves all cases by replacing missing data with a probable value based on other available information. Once all missing values have been imputed, the data set can then be analysed using standard techniques for complete data. |
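A minimal sketch of item imputation, assuming the simplest possible rule: each missing value (represented here as `None`) is replaced by the mean of the observed values in the same column. Real analyses often use richer models. The function name `mean_impute` is hypothetical.

```python
from typing import List, Optional


def mean_impute(rows: List[List[Optional[float]]]) -> List[List[float]]:
    """Replace each None with the mean of the observed values in its column.

    Assumes every column has at least one observed value; otherwise the
    division below raises ZeroDivisionError.
    """
    columns = list(zip(*rows))
    col_means = []
    for col in columns:
        observed = [v for v in col if v is not None]
        col_means.append(sum(observed) / len(observed))
    return [
        [col_means[j] if v is None else v for j, v in enumerate(row)]
        for row in rows
    ]


data = [
    [1.0, None],
    [3.0, 4.0],
    [None, 8.0],
]
completed = mean_impute(data)
```

All three cases are kept, whereas listwise deletion would have retained only the second row.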

Imputation Regularized Optimization( IRO) |
Missing data are frequently encountered in high-dimensional data analysis, but they are usually difficult to deal with using standard algorithms, such as the EM algorithm and its variants. See Liang, F., Jia, B., Xue, J., Li, Q. and Luo, Y. (2018) at <https://…/ica10.pdf> for details. The publication ‘An Imputation Regularized Optimization Algorithm for High-Dimensional Missing Data Problems and Beyond’ will appear in the Journal of the Royal Statistical Society Series B soon. IROmiss |

Incident Analytics |
http://…through-big-data-predictive-analytics.pdf |

Incremental Classifier and Representation Learning( iCaRL) |
A major open problem on the road to artificial intelligence is the development of incrementally learning systems that learn about more and more concepts over time from a stream of data. In this work, we introduce a new training strategy, iCaRL, that allows learning in such a class-incremental way: only the training data for a small number of classes has to be present at the same time and new classes can be added progressively. iCaRL learns strong classifiers and a data representation simultaneously. This distinguishes it from earlier works that were fundamentally limited to fixed data representations and therefore incompatible with deep learning architectures. We show by experiments on the CIFAR-100 and ImageNet ILSVRC 2012 datasets that iCaRL can learn many classes incrementally over a long period of time where other strategies quickly fail. |

Incremental Decision Tree |
An incremental decision tree algorithm is an online machine learning algorithm that outputs a decision tree. Many decision tree methods, such as C4.5, construct a tree using a complete dataset. Incremental decision tree methods allow an existing tree to be updated using only new individual data instances, without having to re-process past instances. This may be useful in situations where the entire dataset is not available when the tree is updated (i.e. the data was not stored), the original data set is too large to process or the characteristics of the data change over time. |

Incremental IRL( I2RL) |
Inverse reinforcement learning (IRL) is the problem of learning the preferences of an agent from the observations of its behavior on a task. While this problem has been well investigated, the related problem of online IRL, where the observations are incrementally accrued yet the demands of the application often prohibit a full rerun of an IRL method, has received relatively less attention. We introduce the first formal framework for online IRL, called incremental IRL (I2RL), and a new method that advances maximum entropy IRL with hidden variables to this setting. Our formal analysis shows that the new method has a monotonically improving performance with more demonstration data, as well as probabilistically bounded error, both under full and partial observability. Experiments in a simulated robotic application of penetrating a continuous patrol under occlusion show the relatively improved performance and speedup of the new method and validate the utility of online IRL. |

Incremental Kernel PCA |
Incremental versions of batch algorithms are often desired, for increased time efficiency in the streaming data setting, or increased memory efficiency in general. In this paper we present a novel algorithm for incremental kernel PCA, based on rank one updates to the eigendecomposition of the kernel matrix, which is more computationally efficient than comparable existing algorithms. We extend our algorithm to incremental calculation of the Nyström approximation to the kernel matrix, the first such algorithm proposed. Incremental calculation of the Nyström approximation leads to further gains in memory efficiency, and allows for empirical evaluation of when a subset of sufficient size has been obtained. |

Incremental Sequence Learning |
Deep learning research over the past years has shown that by increasing the scope or difficulty of the learning problem over time, increasingly complex learning problems can be addressed. We study incremental learning in the context of sequence learning, using generative RNNs in the form of multi-layer recurrent Mixture Density Networks. We introduce Incremental Sequence Learning, a simple incremental approach to sequence learning. Incremental Sequence Learning starts out by using only the first few steps of each sequence as training data. Each time a performance criterion has been reached, the length of the parts of the sequences used for training is increased. To evaluate Incremental Sequence Learning and comparison methods, we introduce and make available a novel sequence learning task and data set: predicting and classifying MNIST pen stroke sequences, where the familiar handwritten digit images have been transformed to pen stroke sequences representing the skeletons of the digits. We find that Incremental Sequence Learning greatly speeds up sequence learning and reaches the best test performance level of regular sequence learning 20 times faster, reduces the test error by 74%, and in general performs more robustly; it displays lower variance and achieves sustained progress after all three comparison methods have stopped improving. A trained sequence prediction model is also used in transfer learning to the task of sequence classification, where it is found that transfer learning realizes improved classification performance compared to methods that learn to classify from scratch. |
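The length schedule described above can be sketched as follows. This is a toy illustration, not the authors' code: the performance criterion is assumed to be a loss threshold, the per-epoch losses are supplied as a plain list instead of coming from a trained RNN, and the names `prefix_schedule` and `truncate` are hypothetical.

```python
from typing import List, Sequence


def truncate(sequences: List[Sequence], length: int) -> List[Sequence]:
    """Keep only the first `length` steps of every training sequence."""
    return [seq[:length] for seq in sequences]


def prefix_schedule(losses: List[float], start_len: int, step: int,
                    max_len: int, loss_threshold: float) -> List[int]:
    """Return the prefix length used in each epoch.

    `losses[i]` stands in for the loss measured after epoch i. Whenever it
    reaches the threshold (the assumed performance criterion), the prefix
    length grows by `step`, capped at `max_len`.
    """
    length = start_len
    schedule = []
    for loss in losses:
        schedule.append(length)
        if loss <= loss_threshold and length < max_len:
            length = min(max_len, length + step)
    return schedule


schedule = prefix_schedule([0.9, 0.4, 0.8, 0.3, 0.2],
                           start_len=2, step=2, max_len=5, loss_threshold=0.5)
first_batch = truncate([[1, 2, 3], [4, 5, 6, 7]], schedule[0])
```

In an actual training loop, `truncate` would be applied to the training set at the start of each epoch using the current prefix length.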

In-Database Entity Linking( IDEL) |
We present a novel architecture, In-Database Entity Linking (IDEL), in which we integrate the analytics-optimized RDBMS MonetDB with neural text mining abilities. Our system design abstracts core tasks of most neural entity linking systems for MonetDB. To the best of our knowledge, this is the first de facto implemented system integrating entity-linking in a database. We leverage the ability of MonetDB to support in-database-analytics with user defined functions (UDFs) implemented in Python. These functions call machine learning libraries for neural text mining, such as TensorFlow. The system achieves zero cost for data shipping and transformation by utilizing MonetDB’s ability to embed Python processes in the database kernel and exchange data in NumPy arrays. IDEL represents text and relational data in a joint vector space with neural embeddings and can compensate for errors with ambiguous entity representations. For detecting matching entities, we propose a novel similarity function based on joint neural embeddings which are learned via minimizing pairwise contrastive ranking loss. This function utilizes high-dimensional index structures for fast retrieval of matching entities. Our first implementation and experiments using the WebNLG corpus show the effectiveness and the potential of IDEL. |

Independent and identically distributed( iid, i.i.d.) |
In probability theory and statistics, a sequence or other collection of random variables is independent and identically distributed (i.i.d.) if each random variable has the same probability distribution as the others and all are mutually independent. The abbreviation i.i.d. is particularly common in statistics (often as iid, sometimes written IID), where observations in a sample are often assumed to be effectively i.i.d. for the purposes of statistical inference. The assumption (or requirement) that observations be i.i.d. tends to simplify the underlying mathematics of many statistical methods. However, in practical applications of statistical modeling the assumption may or may not be realistic. To test how realistic the assumption is on a given data set, the autocorrelation can be computed, lag plots drawn, or a turning point test performed. The generalization of exchangeable random variables is often sufficient and more easily met. |
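As a small sketch of the autocorrelation check mentioned above, the following computes the sample autocorrelation at a given lag. Values near zero are consistent with independence, while values far from zero suggest the i.i.d. assumption is doubtful. The function name and the two example series are illustrative assumptions.

```python
import random
from typing import Sequence


def autocorrelation(xs: Sequence[float], lag: int = 1) -> float:
    """Sample autocorrelation of `xs` at the given lag (0 < lag < len(xs))."""
    n = len(xs)
    mean = sum(xs) / n
    denom = sum((x - mean) ** 2 for x in xs)
    num = sum((xs[t] - mean) * (xs[t + lag] - mean) for t in range(n - lag))
    return num / denom


rng = random.Random(0)
iid_series = [rng.gauss(0.0, 1.0) for _ in range(2000)]  # independent draws
trend_series = [float(t) for t in range(100)]            # strongly dependent

r_iid = autocorrelation(iid_series, lag=1)
r_trend = autocorrelation(trend_series, lag=1)
```

A near-zero autocorrelation does not prove independence, since it only detects linear dependence between observations at that lag.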

Independent Component Analysis( ICA) |
In signal processing, independent component analysis (ICA) is a computational method for separating a multivariate signal into additive subcomponents. This is done by assuming that the subcomponents are non-Gaussian signals and that they are statistically independent from each other. ICA is a special case of blind source separation. A common example application is the ‘cocktail party problem’ of listening in on one person’s speech in a noisy room. |

Independently Interpretable Lasso( IILasso) |
Sparse regularization such as $\ell_1$ regularization is a quite powerful and widely used strategy for high dimensional learning problems. The effectiveness of sparse regularization has been supported practically and theoretically by several studies. However, one of the biggest issues in sparse regularization is that its performance is quite sensitive to correlations between features. Ordinary $\ell_1$ regularization often selects variables correlated with each other, which results in deterioration of not only its generalization error but also interpretability. In this paper, we propose a new regularization method, ‘Independently Interpretable Lasso’ (IILasso for short). Our proposed regularizer suppresses selecting correlated variables, and thus each active variable independently affects the objective variable in the model. Hence, we can interpret regression coefficients intuitively and also improve the performance by avoiding overfitting. We analyze the theoretical properties of IILasso and show that the proposed method is advantageous for its sign recovery and achieves an almost minimax optimal convergence rate. Synthetic and real data analyses also indicate the effectiveness of IILasso. |

Independently Recurrent Neural Network( IndRNN) |
Recurrent neural networks (RNNs) have been widely used for processing sequential data. However, RNNs are commonly difficult to train due to the well-known gradient vanishing and exploding problems, and they struggle to learn long-term patterns. Long short-term memory (LSTM) and gated recurrent unit (GRU) were developed to address these problems, but the use of hyperbolic tangent and sigmoid activation functions results in gradient decay over layers. Consequently, construction of an efficiently trainable deep network is challenging. In addition, all the neurons in an RNN layer are entangled together and their behaviour is hard to interpret. To address these problems, a new type of RNN, referred to as independently recurrent neural network (IndRNN), is proposed in this paper, where neurons in the same layer are independent of each other and they are connected across layers. We have shown that an IndRNN can be easily regulated to prevent the gradient exploding and vanishing problems while allowing the network to learn long-term dependencies. Moreover, an IndRNN can work with non-saturated activation functions such as relu (rectified linear unit) and still be trained robustly. Multiple IndRNNs can be stacked to construct a network that is deeper than the existing RNNs. Experimental results have shown that the proposed IndRNN is able to process very long sequences (over 5000 time steps), can be used to construct very deep networks (21 layers used in the experiment) and still be trained robustly. Better performances have been achieved on various tasks by using IndRNNs compared with the traditional RNN and LSTM. |
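The independence of neurons within a layer can be sketched with a single recurrent step. The update used here, h_t = relu(W x_t + u * h_{t-1} + b) with an elementwise recurrent weight vector u, is the form recalled from the IndRNN paper and is not spelled out in the entry above. The function name `indrnn_step` and the toy weights are assumptions.

```python
from typing import List


def indrnn_step(x: List[float], h_prev: List[float], W: List[List[float]],
                u: List[float], b: List[float]) -> List[float]:
    """One IndRNN-style step with a ReLU activation.

    Neuron n sees the full input x through row W[n], but only its OWN
    previous state h_prev[n], scaled by the scalar recurrent weight u[n].
    A conventional RNN would instead mix all of h_prev through a matrix.
    """
    h_next = []
    for n in range(len(h_prev)):
        pre = sum(W[n][j] * x[j] for j in range(len(x)))
        pre += u[n] * h_prev[n] + b[n]
        h_next.append(max(0.0, pre))  # ReLU
    return h_next


W = [[1.0, 0.0], [0.0, 1.0]]
u = [0.5, 2.0]
b = [0.0, -1.0]

h1 = indrnn_step([1.0, 1.0], [2.0, 1.0], W, u, b)
# Changing neuron 1's previous state leaves neuron 0's output unchanged.
h1_alt = indrnn_step([1.0, 1.0], [2.0, 7.0], W, u, b)
```

Information is exchanged between neurons only when layers are stacked, since the next layer's input weights see every neuron of the layer below.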

Index of Sensitivity to Nonignorability( ISNI) |
Standard methods of analysis can give misleading results when some observations are nonignorably missing. Analysts currently assess nonignorability by performing sensitivity analyses using models with and without a nonignorable component. Because this approach can involve complicated modeling and arduous computation, and can yield results that are highly sensitive to untestable model assumptions, there is a need for a simple screening tool that measures the potential impact of nonignorability on an analysis. We propose a measure based on a Taylor-series approximation to the nonignorable likelihood, evaluated at the parameter estimates under the assumption of ignorability. From this approximate likelihood, we derive an index of sensitivity to nonignorability, or ISNI. One can compute ISNI without estimating a nonignorable model or positing specific values of a nonignorability parameter. We interpret ISNI in terms of an intuitive parameter that captures the extent of sensitivity. We derive a general expression for ISNI in the generalized linear model with fully observed predictors and potentially missing outcomes. isni |

Indexation |
Indexation is a technique to adjust income payments by means of a price index, in order to maintain the purchasing power of the public after inflation, while deindexation refers to the unwinding of indexation. From a macroeconomics standpoint there are four main categories of indexation: wage indexation, financial instruments rate indexation, tax rate indexation, and exchange rate indexation. The first three are indexed to inflation. The last one is typically indexed to a foreign currency, mainly the US dollar. Any of these different types of indexation can be reversed (deindexation). |

Indirect Inference |
Indirect inference is a simulation-based method for estimating the parameters of economic models. Its hallmark is the use of an auxiliary model to capture aspects of the data upon which to base the estimation. The parameters of the auxiliary model can be estimated using either the observed data or data simulated from the economic model. Indirect inference chooses the parameters of the economic model so that these two estimates of the parameters of the auxiliary model are as close as possible. The auxiliary model need not be correctly specified; when it is, indirect inference is equivalent to maximum likelihood. |
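The procedure can be sketched with a deliberately simple toy problem. Everything specific here is an assumption for illustration: the "economic model" is an exponential distribution with unknown rate theta, the auxiliary model has a single parameter estimated by the sample mean, and the search is a plain grid search with common random numbers.

```python
import random
from typing import List


def simulate(theta: float, n: int, seed: int) -> List[float]:
    """Toy structural model: n draws from an exponential with rate theta."""
    rng = random.Random(seed)
    return [rng.expovariate(theta) for _ in range(n)]


def auxiliary_estimate(data: List[float]) -> float:
    """Auxiliary model parameter: here simply the sample mean."""
    return sum(data) / len(data)


def indirect_inference(observed: List[float], candidates: List[float],
                       n_sim: int = 20_000, seed: int = 1) -> float:
    """Pick the theta whose simulated auxiliary estimate is closest to the
    auxiliary estimate computed from the observed data."""
    target = auxiliary_estimate(observed)

    def distance(theta: float) -> float:
        # The same seed for every theta (common random numbers) keeps the
        # objective smooth across candidates.
        return (auxiliary_estimate(simulate(theta, n_sim, seed)) - target) ** 2

    return min(candidates, key=distance)


observed = simulate(2.0, 20_000, seed=0)          # pretend this is real data
grid = [0.5 + 0.1 * i for i in range(40)]         # candidate values 0.5 .. 4.4
theta_hat = indirect_inference(observed, grid)
```

In practice the auxiliary model is usually richer (for example a regression or an autoregression), and the grid search is replaced by a numerical optimizer.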

Inductive Logic Programming( ILP) |
Inductive logic programming (ILP) is a subfield of machine learning which uses logic programming as a uniform representation for examples, background knowledge and hypotheses. Given an encoding of the known background knowledge and a set of examples represented as a logical database of facts, an ILP system will derive a hypothesised logic program which entails all the positive and none of the negative examples. Schema: positive examples + negative examples + background knowledge => hypothesis. Inductive logic programming is particularly useful in bioinformatics and natural language processing. Ehud Shapiro laid the theoretical foundation for inductive logic programming and built its first implementation (Model Inference System) in 1981: a Prolog program that inductively inferred logic programs from positive and negative examples. The term Inductive Logic Programming was first introduced in a paper by Stephen Muggleton in 1991. The term ‘inductive’ here refers to philosophical (i.e. suggesting a theory to explain observed facts) rather than mathematical (i.e. proving a property for all members of a well-ordered set) induction. |

Industry 4.0 |
Industry 4.0 is a project in the high-tech strategy of the German government, which promotes the computerization of the manufacturing industry. The goal is the intelligent factory (Smart Factory), which is characterized by adaptability, resource efficiency and ergonomics as well as the integration of customers and business partners in business and value processes. Its technological basis is cyber-physical systems and the Internet of Things. Experts believe that Industry 4.0 or the fourth industrial revolution could be a reality in about 10 to 20 years. |

Inertial Regularization and Selection( IRS) |
In this paper, we develop a new sequential regression modeling approach for data streams. Data streams are commonly found around us, e.g., in a retail enterprise, sales data are continuously collected every day. A demand forecasting model is an important outcome from the data that needs to be continuously updated with the new incoming data. The main challenge in such modeling arises when there is a) high dimensionality and sparsity, b) a need for an adaptive use of prior knowledge, and/or c) structural changes in the system. The proposed approach addresses these challenges by incorporating an adaptive L1-penalty and inertia terms in the loss function, and is thus called Inertial Regularization and Selection (IRS). The former term performs model selection to handle the first challenge while the latter is shown to address the last two challenges. A recursive estimation algorithm is developed, and shown to outperform the commonly used state-space models, such as Kalman Filters, in experimental studies and real data. |

Inferactive Data Analysis |
We describe inferactive data analysis, so-named to denote an interactive approach to data analysis with an emphasis on inference after data analysis. Our approach is a compromise between Tukey’s exploratory (roughly speaking ‘model free’) and confirmatory data analysis (roughly speaking classical and ‘model based’), also allowing for Bayesian data analysis. We view this approach as close in spirit to current practice of applied statisticians and data scientists while allowing frequentist guarantees for results to be reported in the scientific literature, or Bayesian results where the data scientist may choose the statistical model (and hence the prior) after some initial exploratory analysis. While this approach to data analysis does not cover every scenario, and every possible algorithm data scientists may use, we see this as a useful step in providing concrete tools (with frequentist statistical guarantees) for current data scientists. The basis of inference we use is selective inference [Lee et al., 2016, Fithian et al., 2014], in particular its randomized form [Tian and Taylor, 2015a]. The randomized framework, besides providing additional power and shorter confidence intervals, also provides explicit forms for relevant reference distributions (up to normalization) through the selective sampler of Tian et al. [2016]. The reference distributions are constructed from a particular conditional distribution formed from what we call a DAG-DAG, a Data Analysis Generative DAG. As sampling conditional distributions in DAGs is generally complex, the selective sampler is crucial to any practical implementation of inferactive data analysis. Our principal goal is in reviewing the recent developments in selective inference as well as describing the general philosophy of selective inference. |

Inferential Model( IM) |
Probability is a useful tool for describing uncertainty, so it is natural to strive for a system of statistical inference based on probabilities for or against various hypotheses. But existing probabilistic inference methods struggle to provide a meaningful interpretation of the probabilities across experiments in sufficient generality. In this paper we further develop a promising new approach based on what are called inferential models (IMs). The fundamental idea behind IMs is that there is an unobservable auxiliary variable that itself describes the inherent uncertainty about the parameter of interest, and that posterior probabilistic inference can be accomplished by predicting this unobserved quantity. We describe a simple and intuitive three-step construction of a random set of candidate parameter values, each being consistent with the model, the observed data, and an auxiliary variable prediction. Then prior-free posterior summaries of the available statistical evidence for and against a hypothesis of interest are obtained by calculating the probability that this random set falls completely in and completely out of the hypothesis, respectively. We prove that these IM-based measures of evidence are calibrated in a frequentist sense, showing that IMs give easily-interpretable results both within and across experiments. |

Inferential Statistics |
In statistics, statistical inference is the process of drawing conclusions from data that are subject to random variation, for example, observational errors or sampling variation. More substantially, the terms statistical inference, statistical induction and inferential statistics are used to describe systems of procedures that can be used to draw conclusions from datasets arising from systems affected by random variation, such as observational errors, random sampling, or random experimentation. Initial requirements of such a system of procedures for inference and induction are that the system should produce reasonable answers when applied to well-defined situations and that it should be general enough to be applied across a range of situations. Inferential statistics are used to test hypotheses and make estimations using sample data. |

Infinite Feature Selection( IFS) |
Supervised Infinite Feature Selection |

Infinite Latent Feature Selection |
Feature selection is playing an increasingly significant role with respect to many computer vision applications spanning from object recognition to visual object tracking. However, most of the recent solutions in feature selection are not robust across different and heterogeneous sets of data. In this paper, we address this issue by proposing a robust probabilistic latent graph-based feature selection algorithm that performs the ranking step while considering all the possible subsets of features, as paths on a graph, bypassing the combinatorial problem analytically. An appealing characteristic of the approach is that it aims to discover an abstraction behind low-level sensory data, that is, relevancy. Relevancy is modelled as a latent variable in a PLSA-inspired generative process that allows the investigation of the importance of a feature when injected into an arbitrary set of cues. The proposed method has been tested on ten diverse benchmarks, and compared against eleven state-of-the-art feature selection methods. Results show that the proposed approach attains the highest performance levels across many different scenarios and difficulties, thereby confirming its strong robustness while setting a new state of the art in the feature selection domain. |

Infinite Layer Networks( ILN) |
Infinite Layer Networks (ILN) have recently been proposed as an architecture that mimics neural networks while enjoying some of the advantages of kernel methods. ILN are networks that integrate over infinitely many nodes within a single hidden layer. It has been demonstrated by several authors that the problem of learning ILN can be reduced to the kernel trick, implying that whenever a certain integral can be computed analytically they are efficiently learnable. In this work we give an online algorithm for ILN, which avoids the kernel trick assumption. More generally and of independent interest, we show that kernel methods in general can be exploited even when the kernel cannot be efficiently computed but can only be estimated via sampling. We provide a regret analysis for our algorithm, showing that it matches the sample complexity of methods which have access to kernel values. Thus, our method is the first to demonstrate that the kernel trick is not necessary as such, and random features suffice to obtain comparable performance. |

Infinite Variational Autoencoder( VAE) |
This paper presents an infinite variational autoencoder (VAE) whose capacity adapts to suit the input data. This is achieved using a mixture model where the mixing coefficients are modeled by a Dirichlet process, allowing us to integrate over the coefficients when performing inference. Critically, this then allows us to automatically vary the number of autoencoders in the mixture based on the data. Experiments show the flexibility of our method, particularly for semi-supervised learning, where only a small number of training samples are available. |

InfiniteBoost |
In machine learning, ensemble methods have demonstrated high accuracy for a variety of problems in different areas. The best-known algorithms used intensively in practice are random forests and gradient boosting. In this paper we present InfiniteBoost – a novel algorithm, which combines the best properties of these two approaches. The algorithm constructs an ensemble of trees for which two properties hold: trees of the ensemble incorporate the mistakes made by others; at the same time the ensemble could contain an infinite number of trees without the over-fitting effect. The proposed algorithm is evaluated on regression, classification, and ranking tasks using large-scale, publicly available datasets. |

InfiniteInsight Function Library( IFL) |
InfiniteInsight function library (“IFL”) for SAP HANA to allow in-memory execution of InfiniteInsight-classic workflows. |

Infinitely Differentiable Monte-Carlo Estimator( DiCE) |
The score function estimator is widely used for estimating gradients of stochastic objectives in Stochastic Computation Graphs (SCG), e.g. in reinforcement learning and meta-learning. While deriving the first-order gradient estimators by differentiating a surrogate loss (SL) objective is computationally and conceptually simple, using the same approach for higher-order gradients is more challenging. Firstly, analytically deriving and implementing such estimators is laborious and not compliant with automatic differentiation. Secondly, repeatedly applying SL to construct new objectives for each order gradient involves increasingly cumbersome graph manipulations. Lastly, to match the first-order gradient under differentiation, SL treats part of the cost as a fixed sample, which we show leads to missing and wrong terms for higher-order gradient estimators. To address all these shortcomings in a unified way, we introduce DiCE, which provides a single objective that can be differentiated repeatedly, generating correct gradient estimators of any order in SCGs. Unlike SL, DiCE relies on automatic differentiation for performing the requisite graph manipulations. We verify the correctness of DiCE both through a proof and through numerical evaluation of the DiCE gradient estimates. We also use DiCE to propose and evaluate a novel approach for multi-agent learning. Our code is available at https://goo.gl/xkkGxN. |

Influence Diagram( ID) |
An influence diagram (ID) (also called a relevance diagram, decision diagram or a decision network) is a compact graphical and mathematical representation of a decision situation. It is a generalization of a Bayesian network, in which not only probabilistic inference problems but also decision making problems (following maximum expected utility criterion) can be modeled and solved. |
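The maximum expected utility criterion mentioned above can be illustrated with a minimal sketch for the smallest possible diagram: one chance node, one decision node and one utility node. The function name `best_decision`, the weather example and all numbers are hypothetical, chosen only for illustration.

```python
def best_decision(state_probs, utility):
    """Return the action with maximum expected utility.

    state_probs: {state: probability} for a single chance node.
    utility:     {action: {state: payoff}} for the utility node.
    Both tables are illustrative assumptions, not taken from the text.
    """
    expected = {
        action: sum(p * payoffs[state] for state, p in state_probs.items())
        for action, payoffs in utility.items()
    }
    # Pick the action whose expected utility is largest.
    best = max(expected, key=expected.get)
    return best, expected


# Hypothetical umbrella problem: the chance node is the weather.
weather = {"rain": 0.3, "sun": 0.7}
utility = {
    "take umbrella": {"rain": 70, "sun": 80},
    "leave umbrella": {"rain": 0, "sun": 100},
}
action, expected = best_decision(weather, utility)
print(action, expected)
```

Real influence diagrams chain several chance and decision nodes, so solving them requires more than this single table lookup; the sketch only shows the criterion itself.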

Infobesity |
Information overload (also known as infobesity or infoxication) refers to the difficulty a person can have understanding an issue and making decisions that can be caused by the presence of too much information. The term was popularized by Alvin Toffler in his bestselling 1970 book Future Shock, but was mentioned earlier in a 1964 book by Bertram Gross, The Managing of Organizations. Speier et al. (1999) stated: “Information overload occurs when the amount of input to a system exceeds its processing capacity. Decision makers have fairly limited cognitive processing capacity. Consequently, when information overload occurs, it is likely that a reduction in decision quality will occur.” In recent years, the term ‘information overload’ has evolved into phrases such as ‘information glut’ and ‘data smog’ (Shenk, 1997). What was once a term grounded in cognitive psychology has evolved into a rich metaphor used outside the world of academia. In many ways, the advent of information technology has increased the focus on information overload: information technology may be a primary reason for information overload due to its ability to produce more information more quickly and to disseminate this information to a wider audience than ever before (Evaristo, Adams, & Curley, 1995; Hiltz & Turoff, 1985). |

Information Coefficient( IC) |
The information coefficient (IC) is a measure of the merit of a predicted value. In finance, the information coefficient is used as a performance metric for the predictive skill of a financial analyst. The information coefficient is similar to correlation in that it can be seen to measure the linear relationship between two random variables, e.g. predicted stock returns and the actualized returns. Like a correlation, the information coefficient ranges from -1 to 1, with 0 denoting no linear relationship between predictions and actual values (poor forecasting skills), 1 denoting a perfect positive linear relationship (good forecasting skills) and -1 denoting predictions that are perfectly inversely related to the actual values. |
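As a minimal sketch, assuming the IC is computed as the Pearson correlation between predicted and realized returns (one common choice; rank correlation is another), the calculation looks as follows. The function name and the sample returns are hypothetical.

```python
import math


def information_coefficient(predicted, realized):
    """Pearson correlation between predicted and realized values."""
    n = len(predicted)
    if n != len(realized) or n < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    mean_p = sum(predicted) / n
    mean_r = sum(realized) / n
    # Sums of centered cross-products and squares; the 1/n factors cancel.
    cov = sum((p - mean_p) * (r - mean_r) for p, r in zip(predicted, realized))
    var_p = sum((p - mean_p) ** 2 for p in predicted)
    var_r = sum((r - mean_r) ** 2 for r in realized)
    if var_p == 0 or var_r == 0:
        raise ValueError("correlation is undefined for a constant series")
    return cov / math.sqrt(var_p * var_r)


# Hypothetical forecasts; the realized returns are exactly twice the forecasts,
# so the relationship is perfectly linear.
forecast = [0.02, -0.01, 0.03, 0.00]
actual = [0.04, -0.02, 0.06, 0.00]
ic = information_coefficient(forecast, actual)
print(ic)
```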

Information Extraction( IE) |
Information extraction (IE) is the task of automatically extracting structured information from unstructured and/or semi-structured machine-readable documents. In most of the cases this activity concerns processing human language texts by means of natural language processing (NLP). Recent activities in multimedia document processing like automatic annotation and content extraction out of images/audio/video could be seen as information extraction. |

Information Fusion |
Information integration (II) (also called deduplication and referential integrity) is the merging of information from heterogeneous sources with differing conceptual, contextual and typographical representations. It is used in data mining and consolidation of data from unstructured or semi-structured resources. Typically, information integration refers to textual representations of knowledge but is sometimes applied to rich-media content. Information fusion, a related term, involves the combination of information into a new set of information towards reducing uncertainty. |

Information Fuzzy Networks( IFN) |
Info Fuzzy Networks (IFN) is a greedy machine learning algorithm for supervised learning. The data structure produced by the learning algorithm is also called Info Fuzzy Network. IFN construction is quite similar to decision trees’ construction. However, IFN constructs a directed graph and not a tree. IFN also uses the conditional mutual information metric in order to choose features during the construction stage while decision trees usually use other metrics like entropy or gini. |

Information Gain |
In information theory and machine learning, information gain is a synonym for Kullback-Leibler divergence. However, in the context of decision trees, the term is sometimes used synonymously with mutual information, which is the expectation value of the Kullback-Leibler divergence of a conditional probability distribution. |
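For the decision-tree usage, a minimal sketch of information gain is the entropy of the labels minus the weighted entropy of the labels within each group induced by a categorical feature. The function names and the toy data are hypothetical.

```python
import math
from collections import Counter, defaultdict


def entropy(labels):
    """Shannon entropy of a label sequence, in bits."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(feature_values, labels):
    """Entropy of the labels minus their expected entropy after the split."""
    groups = defaultdict(list)
    for value, label in zip(feature_values, labels):
        groups[value].append(label)
    n = len(labels)
    # Weighted average entropy over the groups created by the feature.
    conditional = sum(len(g) / n * entropy(g) for g in groups.values())
    return entropy(labels) - conditional


# Toy data: the first feature separates the classes perfectly, the second not at all.
play = ["no", "no", "yes", "yes"]
outlook = ["sunny", "sunny", "rain", "rain"]
coin = ["heads", "tails", "heads", "tails"]
print(information_gain(outlook, play), information_gain(coin, play))
```

A tree learner would evaluate this quantity for every candidate feature and split on the one with the largest gain.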

Information Harvesting |
Information Harvesting (IH) was an early data mining product from the 1990s. It was invented by Ralphe Wiggins and produced by the Ryan Corp, later Information Harvesting Inc., of Cambridge, Massachusetts. IH sought to infer rules from sets of data. It did this first by classifying various input variables into one of a number of bins, thereby putting some structure on the continuous variables in the input. IH then proceeded to generate rules, trading off generalization against memorization, that would infer the value of the prediction variable, possibly creating many levels of rules in the process. It included strategies for checking if overfitting took place and, if so, correcting for it. Because of its strategies for correcting for overfitting by considering more data, and refining the rules based on that data, IH might also be considered to be a form of machine learning. |

Information Integration |
Information integration (II) (also called deduplication and referential integrity) is the merging of information from heterogeneous sources with differing conceptual, contextual and typographical representations. It is used in data mining and consolidation of data from unstructured or semi-structured resources. Typically, information integration refers to textual representations of knowledge but is sometimes applied to rich-media content. Information fusion, a related term, involves the combination of information into a new set of information towards reducing uncertainty. |

Information Maximization( Infomax) |
Infomax is an optimization principle for artificial neural networks and other information processing systems. It prescribes that a function that maps a set of input values I to a set of output values O should be chosen or learned so as to maximize the average Shannon mutual information between I and O, subject to a set of specified constraints and/or noise processes. Infomax algorithms are learning algorithms that perform this optimization process. The principle was described by Linsker in 1987. Infomax, in its zero-noise limit, is related to the principle of redundancy reduction proposed for biological sensory processing by Horace Barlow in 1961, and applied quantitatively to retinal processing by Atick and Redlich. One of the applications of infomax has been to an independent component analysis algorithm that finds independent signals by maximising entropy. Infomax-based ICA was described by Bell and Sejnowski in 1995. |

Information Potential Auto-Encoders |
In this paper, we suggest a framework to make use of mutual information as a regularization criterion to train Auto-Encoders (AEs). In the proposed framework, AEs are regularized by minimization of the mutual information between input and encoding variables of AEs during the training phase. In order to estimate the entropy of the encoding variables and the mutual information, we propose a non-parametric method. We also give an information theoretic view of Variational AEs (VAEs), which suggests that VAEs can be considered as parametric methods that estimate entropy. Experimental results show that the proposed non-parametric models have more degrees of freedom in terms of representation learning of features drawn from complex distributions such as Mixture of Gaussians, compared to methods which estimate entropy using parametric approaches, such as Variational AEs. |

Information Retrieval( IR) |
Information retrieval is the activity of obtaining information resources relevant to an information need from a collection of information resources. Searches can be based on metadata or on full-text (or other content-based) indexing. Automated information retrieval systems are used to reduce what has been called “information overload”. Many universities and public libraries use IR systems to provide access to books, journals and other documents. Web search engines are the most visible IR applications. An information retrieval process begins when a user enters a query into the system. Queries are formal statements of information needs, for example search strings in web search engines. In information retrieval a query does not uniquely identify a single object in the collection. Instead, several objects may match the query, perhaps with different degrees of relevancy. |

Information Value( IV) |
In statistical data mining, we sometimes need to determine which variables out of a set are best at capturing a desired behavior. For example, let’s say you have a pool of customers for your credit card company, and you want to determine which of them are about to default (i.e. refuse to pay up after possibly making a huge expense). You then need to identify which of the attributes you have on the customer can potentially identify and alert you to such behavior. One of the popular ways in which analysts do this is by looking at something called ‘Information Value’. In the context of data mining, it is also sometimes referred to by the short form InfoVal. |
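The entry does not state a formula, so the sketch below assumes the definition commonly used in credit scoring: for a binned attribute, IV is the sum over bins of (share of goods minus share of bads) times the weight of evidence ln(share of goods / share of bads). The function name and the counts are hypothetical.

```python
import math


def information_value(bins):
    """Information Value of one binned attribute.

    bins: list of (good_count, bad_count) tuples, one per bin.
    Assumes the common credit-scoring definition based on weight of evidence.
    """
    total_good = sum(good for good, _ in bins)
    total_bad = sum(bad for _, bad in bins)
    iv = 0.0
    for good, bad in bins:
        if good == 0 or bad == 0:
            # The logarithm is undefined for an empty cell.
            raise ValueError("empty cell: merge bins or smooth the counts first")
        pct_good = good / total_good
        pct_bad = bad / total_bad
        weight_of_evidence = math.log(pct_good / pct_bad)
        iv += (pct_good - pct_bad) * weight_of_evidence
    return iv


# Hypothetical attribute with two bins: non-defaulters ("good") vs defaulters ("bad").
separating = information_value([(80, 20), (20, 80)])
uninformative = information_value([(50, 5), (50, 5)])
print(separating, uninformative)
```

An attribute whose bins hold goods and bads in the same proportions scores zero; the more the two distributions differ across bins, the higher the value.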

Information Visualization |
Information visualization or information visualisation is the study of (interactive) visual representations of abstract data to reinforce human cognition. The abstract data include both numerical and non-numerical data, such as text and geographic information. However, information visualization differs from scientific visualization: “it’s infovis (information visualization) when the spatial representation is chosen, and it’s scivis (scientific visualization) when the spatial representation is given”. |

Information-Anchored Sensitivity Analysis |
Analysis of longitudinal randomised controlled trials is frequently complicated because patients deviate from the protocol. Where such deviations are relevant for the estimand, we are typically required to make an untestable assumption about post-deviation behaviour in order to perform our primary analysis and estimate the treatment effect. In such settings, it is now widely recognised that we should follow this with sensitivity analyses to explore the robustness of our inferences to alternative assumptions about post-deviation behaviour. Although there has been a lot of work on how to conduct such sensitivity analyses, little attention has been given to the appropriate loss of information due to missing data within sensitivity analysis. We argue more attention needs to be given to this issue, showing it is quite possible for sensitivity analysis to decrease and increase the information about the treatment effect. To address this critical issue, we introduce the concept of information-anchored sensitivity analysis. By this we mean sensitivity analysis in which the proportion of information about the treatment estimate lost due to missing data is the same as the proportion of information about the treatment estimate lost due to missing data in the primary analysis. We argue this forms a transparent, practical starting point for interpretation of sensitivity analysis. We then derive results showing that, for longitudinal continuous data, a broad class of controlled and reference-based sensitivity analyses performed by multiple imputation are information-anchored. We illustrate the theory with simulations and an analysis of a peer review trial, then discuss our work in the context of other recent work in this area. Our results give a theoretical basis for the use of controlled multiple imputation procedures for sensitivity analysis. |

Information-Based Optimal Subdata Selection( IBOSS) |
Extraordinary amounts of data are being produced in many branches of science. Proven statistical methods are no longer applicable with extraordinarily large data sets due to computational limitations. A critical step in big data analysis is data reduction. Existing investigations in the context of linear regression focus on subsampling-based methods. However, not only is this approach prone to sampling errors, it also leads to a covariance matrix of the estimators that is typically bounded from below by a term that is of the order of the inverse of the subdata size. We propose a novel approach, termed information-based optimal subdata selection (IBOSS). Compared to leading existing subdata methods, the IBOSS approach has the following advantages: (i) it is significantly faster; (ii) it is suitable for distributed parallel computing; (iii) the variances of the slope parameter estimators converge to 0 as the full data size increases even if the subdata size is fixed, i.e., the convergence rate depends on the full data size; (iv) data analysis for IBOSS subdata is straightforward and the sampling distribution of an IBOSS estimator is easy to assess. Theoretical results and extensive simulations demonstrate that the IBOSS approach is superior to subsampling-based methods, sometimes by orders of magnitude. The advantages of the new approach are also illustrated through analysis of real data. |

Inhomogeneous Self-Exciting Process( IHSEP) |
IHSEP |

Initial Data Analysis( IDA) |
The most important distinction between the initial data analysis phase and the main analysis phase is that during initial data analysis one refrains from any analysis that is aimed at answering the original research question. The initial data analysis phase is guided by the following four questions: • Quality of data • Quality of measurements • Initial transformations • Did the implementation of the study fulfill the intentions of the research design? |

Innovation Management |
Innovation management is the management of innovation processes. It refers both to product and organizational innovation. Innovation management includes a set of tools that allow managers and engineers to cooperate with a common understanding of processes and goals. Innovation management allows the organization to respond to external or internal opportunities, and use its creativity to introduce new ideas, processes or products. It is not relegated to R&D; it involves workers at every level in contributing creatively to a company’s product development, manufacturing and marketing. |

Innovation Pursuit( iPursuit) |
In subspace clustering, a group of data points belonging to a union of subspaces are assigned membership to their respective subspaces. This paper presents a new approach dubbed Innovation Pursuit (iPursuit) to the problem of subspace clustering using a new geometrical idea whereby each subspace is identified based on its novelty with respect to the other subspaces. The proposed approach finds the subspaces consecutively by solving a series of simple linear optimization problems, each searching for some direction in the span of the data that is potentially orthogonal to all subspaces except for the one to be identified in one step of the algorithm. A detailed mathematical analysis is provided establishing sufficient conditions for the proposed approach to correctly cluster the data points. Remarkably, the proposed approach can provably yield exact clustering even when the subspaces have significant intersections under mild conditions on the distribution of the data points in the subspaces. Moreover, it is shown that the complexity of iPursuit is almost independent of the dimension of the data. The numerical simulations demonstrate that iPursuit can often outperform the state-of-the-art subspace clustering algorithms, more so for subspaces with significant intersections. |

Input Fast-Forwarding |
This paper introduces a new architectural framework, known as input fast-forwarding, that can enhance the performance of deep networks. The main idea is to incorporate a parallel path that sends representations of input values forward to deeper network layers. This scheme is substantially different from ‘deep supervision’ in which the loss layer is re-introduced to earlier layers. The parallel path provided by fast-forwarding enhances the training process in two ways. First, it enables the individual layers to combine higher-level information (from the standard processing path) with lower-level information (from the fast-forward path). Second, this new architecture reduces the problem of vanishing gradients substantially because the fast-forwarding path provides a shorter route for gradient backpropagation. In order to evaluate the utility of the proposed technique, a Fast-Forward Network (FFNet), with 20 convolutional layers along with parallel fast-forward paths, has been created and tested. The paper presents empirical results that demonstrate improved learning capacity of FFNet due to fast-forwarding, as compared to GoogLeNet (with deep supervision) and CaffeNet, which are 4x and 18x larger in size, respectively. All of the source code and deep learning models described in this paper will be made available to the entire research community. |

Instance Segmentation |
Instance segmentation is the problem of detecting and delineating each object of interest appearing in an image. Current instance segmentation approaches consist of ensembles of modules that are trained independently of each other, thus missing learning opportunities. |

Instance Selection( IS) |
In supervised learning, a training set providing previously known information is used to classify new instances. Commonly, many instances are stored in the training set, but some of them are not useful for classification. It is therefore possible to obtain acceptable classification rates while ignoring the non-useful cases; this process is known as instance selection. Instance selection reduces the training set, which in turn reduces runtimes in the classification and/or training stages of classifiers. |

Instance-Based Learning( IBL) |
In machine learning, instance-based learning or memory-based learning is a family of learning algorithms that, instead of performing explicit generalization, compares new problem instances with instances seen in training, which have been stored in memory. Instance-based learning is a kind of lazy learning. |
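A k-nearest-neighbour classifier is a standard example of this family. The minimal sketch below stores the training instances as they are and defers all work to query time, which is what makes the approach lazy; the function name and the toy data are hypothetical.

```python
import math
from collections import Counter


def knn_predict(train, query, k=3):
    """Classify `query` by majority vote among its k nearest stored instances.

    train: list of (point, label) pairs, where each point is a tuple of floats.
    No model is fitted in advance; the stored instances are compared at query time.
    """
    neighbors = sorted(train, key=lambda item: math.dist(item[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Hypothetical training set with two well-separated clusters.
train = [
    ((0.0, 0.0), "a"), ((0.0, 1.0), "a"), ((1.0, 0.0), "a"),
    ((5.0, 5.0), "b"), ((5.0, 6.0), "b"), ((6.0, 5.0), "b"),
]
print(knn_predict(train, (0.5, 0.5)), knn_predict(train, (5.5, 5.5)))
```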

Instancewise Feature Selection |
We introduce instancewise feature selection as a methodology for model interpretation. Our method is based on learning a function to extract a subset of features that are most informative for each given example. This feature selector is trained to maximize the mutual information between selected features and the response variable, where the conditional distribution of the response variable given the input is the model to be explained. We develop an efficient variational approximation to the mutual information, and show that the resulting method compares favorably to other model explanation methods on a variety of synthetic and real data sets using both quantitative metrics and human evaluation. |

Instantaneous Rates( IRATE) |
The Instantaneous Rates (IRATE) model is used to analyze tagging data. It is based on the Hoenig et al. (1998) alternate formulation of the Brownie et al. (1985) band recovery models that allow fishing and natural mortality to be derived from the exploitation rate and survival rate estimates of a Type II (continuous) fishery. IRATE allows both age-independent and age-dependent instantaneous rates models (Hoenig et al., 1998; Jiang et al., 2007) to be fitted to multi-year fish tag return data. IRATE allows model development with either age-dependent harvest-only or harvest and catch-release tag returns or similar age independent models. The software, developed by Dr. Gary Nelson of the Massachusetts Division of Marine Fisheries, also allows estimation of harvest reporting rates, catch and release reporting rates, and tag retention of harvested and/or released fish. However, not all parameters in the model can be estimated simultaneously with tag data alone. Some parameters must be fixed and assumed known (usually reporting rate and tag loss) to obtain good estimates of remaining parameters. Additionally, the model can account for non-mixing of the tagged fish in the first release year and adjust for harvest and M selectivity in the age-based models. The negative log likelihood is used as the objective function to obtain maximum likelihood estimates of parameters. Several model fit statistics are provided that can be used to select the best model formulation; these include the Akaike Information Criterion (AIC), c-hat (a measure of overdispersion) and standard residuals. The calculation engine is written in AD Model Builder. IRATER |

Instrumental Panel Data Models |
ivpanel |

Instrumental Variable( IV) |
In statistics, econometrics, epidemiology and related disciplines, the method of instrumental variables (IV) is used to estimate causal relationships when controlled experiments are not feasible or when a treatment is not successfully delivered to every unit in a randomized experiment. Instrumental variable methods allow consistent estimation when the explanatory variables (covariates) are correlated with the error terms of a regression relationship. Such correlation may occur when the dependent variable causes at least one of the covariates (‘reverse’ causation), when there are relevant explanatory variables which are omitted from the model, or when the covariates are subject to measurement error. In this situation, ordinary linear regression generally produces biased and inconsistent estimates. However, if an instrument is available, consistent estimates may still be obtained. An instrument is a variable that does not itself belong in the explanatory equation and is correlated with the endogenous explanatory variables, conditional on the other covariates. In linear models, there are two main requirements for using an IV: • The instrument must be correlated with the endogenous explanatory variables, conditional on the other covariates. • The instrument cannot be correlated with the error term in the explanatory equation (conditional on the other covariates), that is, the instrument cannot suffer from the same problem as the original predicting variable. ivmodel |
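The two requirements can be seen at work in a minimal sketch with one instrument and one endogenous regressor, where the IV estimate reduces to cov(z, y) / cov(z, x). The data are constructed by hand (an assumption for illustration) so that the instrument z is correlated with x but has zero sample covariance with the error u, while x and u are correlated. The true slope is 2.

```python
def covariance(a, b):
    """Population (divide-by-n) sample covariance of two equal-length sequences."""
    n = len(a)
    mean_a, mean_b = sum(a) / n, sum(b) / n
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / n


def iv_estimate(z, x, y):
    """Simple IV slope estimate for one instrument and one regressor."""
    return covariance(z, y) / covariance(z, x)


def ols_estimate(x, y):
    """Ordinary least squares slope of y on x."""
    return covariance(x, y) / covariance(x, x)


# Hypothetical data: u plays the role of the error term.
z = [1, -1, 1, -1]                              # instrument, uncorrelated with u
u = [1, 1, -1, -1]                              # error term, correlated with x
x = [zi + ui for zi, ui in zip(z, u)]           # endogenous regressor
y = [2 * xi + 3 * ui for xi, ui in zip(x, u)]   # outcome, true slope is 2

print(ols_estimate(x, y), iv_estimate(z, x, y))
```

Ordinary least squares absorbs part of the effect of u into the slope, while the instrument isolates the variation in x that is unrelated to u. With several instruments or covariates, two-stage least squares generalizes this ratio.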

Integer Echo State Network( intESN) |
We propose an integer approximation of Echo State Networks (ESN) based on the mathematics of hyperdimensional computing. The reservoir of the proposed Integer Echo State Network (intESN) contains only n-bit integers and replaces the recurrent matrix multiply with an efficient cyclic shift operation. Such an architecture results in dramatic improvements in memory footprint and computational efficiency, with minimal performance loss. Our architecture naturally supports the usage of the trained reservoir in symbolic processing tasks of analogy making and logical inference. |

Integer Linear Programming( ILP) |
An integer programming problem is a mathematical optimization or feasibility program in which some or all of the variables are restricted to be integers. In many settings the term refers to integer linear programming (ILP), in which the objective function and the constraints (other than the integer constraints) are linear. Integer programming is NP-hard. A special case, 0-1 integer linear programming, in which unknowns are binary, and only the restrictions must be satisfied, is one of Karp’s 21 NP-complete problems. Book: Compact Extended Linear Programming Models |
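To make the 0-1 case concrete, the sketch below solves a tiny binary ILP by enumerating every assignment. The problem data and the function name are invented for illustration. Enumeration takes time exponential in the number of variables, so it is only practical for toy instances; real solvers use methods such as branch and bound.

```python
from itertools import product


def solve_binary_ilp(c, A, b):
    """Maximize c.x subject to A x <= b with every x_j in {0, 1}."""
    best_value, best_x = None, None
    for x in product((0, 1), repeat=len(c)):
        # Check every linear constraint row for this candidate assignment.
        feasible = all(
            sum(a_ij * x_j for a_ij, x_j in zip(row, x)) <= b_i
            for row, b_i in zip(A, b)
        )
        if not feasible:
            continue
        value = sum(c_j * x_j for c_j, x_j in zip(c, x))
        if best_value is None or value > best_value:
            best_value, best_x = value, x
    return best_value, best_x


# Illustrative instance with three binary variables and two constraints.
c = [5, 4, 3]
A = [[2, 3, 1], [4, 1, 2]]
b = [5, 11]
result = solve_binary_ilp(c, A, b)
print(result)
```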

Integrated Discrimination Improvement( IDI) |
Integrated Discrimination Improvement (IDI) described in the paper: Jialiang Li (2013) <doi:10.1093/biostatistics/kxs047>. mcca |

Integrated Nested Laplace Approximation( INLA) |
A fully automatic approach for approximate inference in latent Gaussian models. INLA, meta4diag |

Integrative Connectionist Learning Systems( ICOS) |
The connectionist systems (artificial neural networks) developed and widely used so far are mainly based on a single brain-like connectionist principle of information processing, where learning and information exchange occur in the connections. This paper extends this paradigm of connectionist systems to a new trend: integrative connectionist learning systems (ICOS), which integrate in their structure and learning algorithms principles from different hierarchical levels of information processing in the brain, including the neuronal, genetic, and quantum levels. Spiking neural networks (SNN) are used as a basic connectionist learning model which is further extended with other information learning principles to create different ICOS. For example, evolving SNN for multitask learning are presented and illustrated on a case study of person authentication based on multimodal auditory and visual information. Integrative gene-SNN are presented, where gene interactions are included in the functioning of a spiking neuron. They are applied to a case study of computational neurogenetic modeling. Integrative quantum-SNN are introduced with quantum Hebbian learning, where input features as well as information spikes are represented by quantum bits that result in exponentially faster feature selection and model learning. ICOS can be used to solve challenging biological and engineering problems more efficiently when fast adaptive learning systems are needed to learn incrementally in a high-dimensional space. They can also help to better understand complex information processes in the brain, especially how information processes at different information levels interact. Open questions, challenges and directions for further research are presented. |

Intel Machine Learning Scalability Library( MLSL) |
The exponential growth in use of large deep neural networks has accelerated the need for training these deep neural networks in hours or even minutes. This can only be achieved through scalable and efficient distributed training, since a single node/card cannot satisfy the compute, memory, and I/O requirements of today’s state-of-the-art deep neural networks. However, scaling synchronous Stochastic Gradient Descent (SGD) is still a challenging problem and requires continued research/development. This entails innovations spanning algorithms, frameworks, communication libraries, and system design. In this paper, we describe the philosophy, design, and implementation of Intel Machine Learning Scalability Library (MLSL) and present proof-points demonstrating scaling DL training on 100s to 1000s of nodes across Cloud and HPC systems. |

Intel nGraph |
The Deep Learning (DL) community sees many novel topologies published each year. Achieving high performance on each new topology remains challenging, as each requires some level of manual effort. This issue is compounded by the proliferation of frameworks and hardware platforms. The current approach, which we call ‘direct optimization’, requires deep changes within each framework to improve the training performance for each hardware backend (CPUs, GPUs, FPGAs, ASICs) and requires $\mathcal{O}(fp)$ effort; where $f$ is the number of frameworks and $p$ is the number of platforms. While optimized kernels for deep-learning primitives are provided via libraries like Intel Math Kernel Library for Deep Neural Networks (MKL-DNN), there are several compiler-inspired ways in which performance can be further optimized. Building on our experience creating neon (a fast deep learning library on GPUs), we developed Intel nGraph, a soon to be open-sourced C++ library to simplify the realization of optimized deep learning performance across frameworks and hardware platforms. Initially-supported frameworks include TensorFlow, MXNet, and Intel neon framework. Initial backends are Intel Architecture CPUs (CPU), the Intel(R) Nervana Neural Network Processor(R) (NNP), and NVIDIA GPUs. Currently supported compiler optimizations include efficient memory management and data layout abstraction. In this paper, we describe our overall architecture and its core components. In the future, we envision extending nGraph API support to a wider range of frameworks, hardware (including FPGAs and ASICs), and compiler optimizations (training versus inference optimizations, multi-node and multi-device scaling via efficient sub-graph partitioning, and HW-specific compounding of operations). |

Intelligence Amplification |
Intelligence amplification (IA) (also referred to as cognitive augmentation and machine augmented intelligence) refers to the effective use of information technology in augmenting human intelligence. The idea was first proposed in the 1950s and 1960s by cybernetics and early computer pioneers. IA is sometimes contrasted with AI (Artificial Intelligence), that is, the project of building a human-like intelligence in the form of an autonomous technological system such as a computer or robot. AI has encountered many fundamental obstacles, practical as well as theoretical, which for IA seem moot, as it needs technology merely as an extra support for an autonomous intelligence that has already proven to function. Moreover, IA has a long history of success, since all forms of information technology, from the abacus to writing to the Internet, have been developed basically to extend the information processing capabilities of the human mind (see extended mind and distributed cognition). |

Intelligence Graph |
In fact, there exist three genres of intelligence architectures: logics (e.g. Random Forest, A* search), neurons (e.g. CNN, LSTM) and probabilities (e.g. Naive Bayes, HMM), all of which are incompatible with each other. However, to construct powerful intelligence systems with various methods, we propose the intelligence graph (iGraph for short), which is composed of both neural and probabilistic graphs, under the framework of forward-backward propagation. Following the iGraph paradigm, we design a recommendation model with a semantic principle. First, the probability distributions of categories are generated from the embedding representations of users/items, in the manner of neurons. Second, the probabilistic graph infers the distributions of features, in the manner of probabilities. Last, for recommendation diversity, we perform an expectation computation and then conduct a logic judgment, in the manner of logics. Experimentally, we beat the state-of-the-art baselines and verify our conclusions. |

Intelligent Data Analytics( IDA) |
The art of Conquering Data with Intelligent Systems includes all areas of Research and Development in Intelligent Data Analytics, the area including Data Analytics and Intelligent Systems, that focus on computational, mathematical, statistical, cognitive, and algorithmic techniques for modeling high dimensional data with the ultimate goal of extracting meaning from (raw) data. This requires methods ranging from learning, inference, prediction, knowledge discovery and visualisation that are applicable on both small and large volumes of mostly dynamic data sets collected and integrated from multiple sources, across multiple modalities. These methods and techniques trigger the need for assessment and evaluation: automated and by humans. Intelligent Data Analytics enables automated hypothesis generation, event correlation, and anomaly detection and helps in explaining phenomena and inferring results that would otherwise remain hidden. Intelligent Data Analytics is a cornerstone in modern Big Data, amplifying perhaps its most important aspect: Value. |

Intelligent K-Means( ik-Means) |
Intelligent K-Means (iK-Means) is a K-Means initialization algorithm. It is a simple algorithm based on the concept of anomalous patterns, it is easy to implement, and it may even help you to find how many clusters there are in a dataset (remember, you need to know this in order to run K-Means!). Intelligent Choice of the Number of Clusters in K-Means Clustering: An Experimental Study with Different Cluster Spreads |
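One way to read the anomalous-pattern idea is sketched below. This is an illustrative sketch, not a reference implementation: the function names, the `min_size` threshold for discarding tiny clusters, and the sample data are all assumptions. Each round takes the point farthest from the grand mean as a tentative centroid, collects the points that are closer to that centroid than to the grand mean, and removes them. The surviving centroids give both the number of clusters and their starting positions for K-Means.

```python
def mean_point(points):
    """Coordinate-wise mean of a list of equal-length tuples."""
    n = len(points)
    return tuple(sum(coord) / n for coord in zip(*points))


def sq_dist(p, q):
    return sum((a - b) ** 2 for a, b in zip(p, q))


def anomalous_pattern_centroids(points, min_size=2, max_iter=100):
    """Return initial centroids found by repeatedly extracting anomalous clusters."""
    reference = mean_point(points)  # grand mean, kept fixed throughout
    remaining = list(points)
    found = []
    while remaining:
        # Tentative centroid: the remaining point farthest from the grand mean.
        centroid = max(remaining, key=lambda p: sq_dist(p, reference))
        cluster = []
        for _ in range(max_iter):
            new_cluster = [p for p in remaining
                           if sq_dist(p, centroid) < sq_dist(p, reference)]
            if not new_cluster or new_cluster == cluster:
                break
            cluster = new_cluster
            centroid = mean_point(cluster)
        if not cluster:
            # Every remaining point coincides with the grand mean.
            cluster = list(remaining)
            centroid = mean_point(cluster)
        found.append((centroid, len(cluster)))
        remaining = [p for p in remaining if p not in cluster]
    # Discard clusters that are too small to be meaningful.
    return [c for c, size in found if size >= min_size]


data = [(0.0,), (1.0,), (2.0,), (10.0,), (11.0,), (12.0,), (13.0,)]
centroids = anomalous_pattern_centroids(data)
print(len(centroids), centroids)
```

The length of the returned list is the suggested K, and the centroids themselves can be passed to any K-Means routine as its initial centers.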

Intelligent Personal Agent( IPA) |
An Intelligent Personal Agent (IPA) is an agent whose purpose is to help the user gain information from reliable resources using knowledge navigation techniques, saving the time spent searching for the best content. The agent is also responsible for responding to chat-based queries with the help of a Conversation Corpus. |

Intelligent Software |
Christopher Bishop: “Software that can adapt, learn and reason” |

Intent-Aware Multi-Agent Reinforcement Learning( IAMARL) |
This paper proposes an intent-aware multi-agent planning framework as well as a learning algorithm. Under this framework, an agent plans in the goal space to maximize the expected utility. The planning process takes the belief of other agents’ intents into consideration. Instead of formulating the learning problem as a partially observable Markov decision process (POMDP), we propose a simple but effective linear function approximation of the utility function. It is based on the observation that for humans, other people’s intents influence our utility for a goal. The proposed framework has several major advantages: i) it is computationally feasible and guaranteed to converge; ii) it can easily integrate existing intent prediction and low-level planning algorithms; iii) it does not suffer from sparse feedback in the action space. We test our algorithm on a real-world problem that is non-episodic and in which the number of agents and goals can vary over time. Our algorithm is trained in a scene in which aerial robots and humans interact, and tested in a novel scene with a different environment. Experimental results show that our algorithm achieves the best performance and that human-like behaviors emerge during the dynamic process. |

Intention Analysis |
Intention Analysis is the identification of intentions from text, be it the intention to purchase or the intention to sell or to complain, accuse, inquire, opine, advocate or to quit, in incoming customer messages or in call center transcripts. Intention analysis using topic models |

Inter Rater Reliability( IRR) |
In statistics, inter-rater reliability, inter-rater agreement, or concordance is the degree of agreement among raters. It gives a score of how much homogeneity, or consensus, there is in the ratings given by judges. It is useful in refining the tools given to human judges, for example by determining if a particular scale is appropriate for measuring a particular variable. If various raters do not agree, either the scale is defective or the raters need to be re-trained. There are a number of statistics which can be used to determine inter-rater reliability. Different statistics are appropriate for different types of measurement. Some options are: joint-probability of agreement, Cohen’s kappa and the related Fleiss’ kappa, inter-rater correlation, concordance correlation coefficient and intra-class correlation. rhoR |
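As an illustration of one of the statistics listed, the sketch below computes Cohen’s kappa for two raters: the observed agreement corrected for the agreement expected by chance from each rater’s label frequencies. The function name and the ratings are made up, and the function assumes the two raters are not both constant on the same label (which would make the denominator zero).

```python
from collections import Counter


def cohens_kappa(ratings_a, ratings_b):
    """Cohen's kappa for two raters labelling the same items."""
    n = len(ratings_a)
    # Proportion of items on which the raters agree.
    observed = sum(a == b for a, b in zip(ratings_a, ratings_b)) / n
    # Agreement expected by chance from each rater's marginal frequencies.
    counts_a = Counter(ratings_a)
    counts_b = Counter(ratings_b)
    expected = sum(counts_a[label] * counts_b[label] for label in counts_a) / (n * n)
    return (observed - expected) / (1 - expected)


rater_1 = ["yes", "yes", "no", "no"]
rater_2 = ["yes", "no", "no", "no"]
print(cohens_kappa(rater_1, rater_2))
```

Here the raters agree on 3 of 4 items (0.75), chance agreement is 0.5, and kappa is therefore 0.5.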

Interactive Growing Hierarchical SOM( interactive GHSOM) |
A Self Organizing Map (SOM) is trained using unsupervised learning to produce a two-dimensional discretized representation of the input space of the training cases. Growing Hierarchical SOM (GHSOM) is an architecture which grows both in a hierarchical way, representing the structure of the data distribution, and in a horizontal way, representing the size of each individual map. This paper proposes a method for controlling the degree of growth of GHSOM by pruning redundant branches of the hierarchy. Moreover, an interface tool for the proposed method, called interactive GHSOM, is developed. We discuss the computation results on the Iris data obtained with the developed tool. |

Interactive Report |
An “Interactive Report” provides a new paradigm to fill the gap between Static Report and BI Tool. It has the following characteristics … 1. Like a static report, an “Interactive Report” is still based on “static data”, which is a fixed set of data generated in a periodic batch fashion. 2. Unlike a static report, this pre-generated “static data” is much larger and wider, covering a broader scope of questions that the execs may ask. 3. Because the “static data” is large and wide, it is impossible to visualize all aspects in the report. Therefore, only one perspective of the static data (based on the exec’s pre-specified requirement) is shown in the report. 4. However, if the exec wants to ask a different question, he/she can switch to a different perspective of the same “static data”. |

Inter-Annotator Agreement Network |
This work develops a simple information theoretic framework that captures the dynamics of the inter-annotator agreement process and unifies a wide range of approaches in unsupervised learning. Our model consists of a pair of annotators whose goal is to maximize the mutual information between their annotations. Training the model with standard stochastic gradient descent is challenging, but we find an ablation of the model that admits variational approximation to be empirically effective. We illustrate the strength of our framework by achieving new state-of-the-art accuracy on unsupervised part-of-speech tagging, in particular 78.7% on the 45-tag Penn WSJ dataset. We also show clear performance improvement in unsupervised entity typing. |

Interior Point( IP) |
Interior point methods (also referred to as barrier methods) are a certain class of algorithms to solve linear and nonlinear convex optimization problems. John von Neumann suggested an interior point method of linear programming which was neither a polynomial time method nor an efficient method in practice. In fact, it turned out to be slower in practice than the simplex method, which is not a polynomial time method. In 1984, Narendra Karmarkar developed a method for linear programming called Karmarkar’s algorithm which runs in provably polynomial time and is also very efficient in practice. It enabled solutions of linear programming problems which were beyond the capabilities of the simplex method. Contrary to the simplex method, it reaches a best solution by traversing the interior of the feasible region. The method can be generalized to convex programming based on a self-concordant barrier function used to encode the convex set. Any convex optimization problem can be transformed into minimizing (or maximizing) a linear function over a convex set by converting to the epigraph form. The idea of encoding the feasible set using a barrier and designing barrier methods was studied by Anthony V. Fiacco, Garth P. McCormick, and others in the early 1960s. These ideas were mainly developed for general nonlinear programming, but they were later abandoned due to the presence of more competitive methods for this class of problems (e.g. sequential quadratic programming). Yurii Nesterov and Arkadi Nemirovski came up with a special class of such barriers that can be used to encode any convex set. They guarantee that the number of iterations of the algorithm is bounded by a polynomial in the dimension and accuracy of the solution.
Karmarkar’s breakthrough revitalized the study of interior point methods and barrier problems, showing that it was possible to create an algorithm for linear programming characterized by polynomial complexity and, moreover, that was competitive with the simplex method. Already Khachiyan’s ellipsoid method was a polynomial time algorithm; however, it was too slow to be of practical interest. The class of primal-dual path-following interior point methods is considered the most successful. Mehrotra’s predictor-corrector algorithm provides the basis for most implementations of this class of methods. |
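The barrier idea can be illustrated on a deliberately tiny problem: minimize x subject to x >= 1. The sketch below is a toy, not a production interior point solver; the function name, the parameter defaults, and the damped Newton inner loop are choices made for this example. The constraint is replaced by a logarithmic barrier, giving the smooth objective t*x - log(x - 1), whose minimizer x = 1 + 1/t lies strictly inside the feasible region and approaches the true solution x = 1 as t grows.

```python
def barrier_minimize(t_start=1.0, mu=10.0, t_max=1e6, newton_steps=50):
    """Minimize x subject to x >= 1 using a logarithmic barrier."""
    x = 2.0  # strictly feasible starting point
    t = t_start
    while t <= t_max:
        # Inner loop: Newton's method on f(x) = t*x - log(x - 1).
        for _ in range(newton_steps):
            gap = x - 1.0
            gradient = t - 1.0 / gap
            if abs(gradient) <= 1e-6 * t:
                break  # close enough to the minimizer for this t
            hessian = 1.0 / gap ** 2
            step = -gradient / hessian
            # Halve the step until the new point stays strictly feasible.
            scale = 1.0
            while x + scale * step <= 1.0:
                scale *= 0.5
            x += scale * step
        t *= mu  # tighten the barrier and continue from the current x
    return x


x_opt = barrier_minimize()
print(x_opt)
```

Every iterate stays in the interior of the feasible set, and the final value differs from the optimum by roughly 1/t for the last value of t used.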

Interior Point Optimizer( Ipopt) |
Ipopt (Interior Point OPTimizer, pronounced eye-pea-Opt) is a software package for large-scale nonlinear optimization. It is designed to find (local) solutions of mathematical optimization problems of the form: min f(x) for x in R^n, such that gL <= g(x) <= gU; xL <= x <= xU. Ipopt is written in C++ and is released as open source code under the Eclipse Public License (EPL). It is available from the COIN-OR initiative. The code has been written by Andreas Wächter and Carl Laird. The COIN-OR project managers for Ipopt are Andreas Wächter and Stefan Vigerske. |

Internal Node Bagging |
We introduce a novel view for understanding how dropout works as an implicit ensemble learning method, one which does not specify how many and which nodes should learn a certain feature. We propose a new training method named internal node bagging. This method explicitly forces a group of nodes to learn a certain feature at training time, and combines those nodes into one node at inference time. This means we can use many more parameters to improve the model’s fitting ability at training time while keeping the model small at inference time. We test our method on several benchmark datasets and find it significantly more efficient than dropout on small models. |

International Conference on Data Mining( ICDM) |
The IEEE International Conference on Data Mining series (ICDM) has established itself as the world’s premier research conference in data mining. It provides an international forum for presentation of original research results, as well as exchange and dissemination of innovative, practical development experiences. The conference covers all aspects of data mining, including algorithms, software and systems, and applications. ICDM draws researchers and application developers from a wide range of data mining related areas such as statistics, machine learning, pattern recognition, databases and data warehousing, data visualization, knowledge-based systems, and high performance computing. By promoting novel, high quality research findings, and innovative solutions to challenging data mining problems, the conference seeks to continuously advance the state-of-the-art in data mining. Besides the technical program, the conference features workshops, tutorials, panels and, since 2007, the ICDM data mining contest. |

International Institute for Analytics( IIA) |
Founded in 2010 by CEO Jack Phillips and Research Director Thomas H. Davenport, the International Institute for Analytics is an independent research firm that works with organizations to build strong and competitive analytics programs. IIA offers unbiased advice in an industry dominated by hardware and software vendors, consultants and system integrators. With a vast network of analytics experts, academics and leaders at successful companies, we guide our clients as they build and grow successful analytics programs. |

International Mathematics and Statistics Library( IMSL) |
IMSL (International Mathematics and Statistics Library) is a commercial collection of software libraries of numerical analysis functionality that are implemented in the computer programming languages of C, Java, C#.NET, and Fortran. A Python interface is also available. The IMSL Libraries are provided by Rogue Wave Software. |

International Phonetic Alphabet( IPA) |
The International Phonetic Alphabet (unofficially, though commonly, abbreviated IPA) is an alphabetic system of phonetic notation based primarily on the Latin alphabet. It was devised by the International Phonetic Association as a standardized representation of the sounds of oral language. The IPA is used by lexicographers, foreign language students and teachers, linguists, speech-language pathologists, singers, actors, constructed language creators, and translators. The IPA is designed to represent only those qualities of speech that are part of oral language: phones, phonemes, intonation, and the separation of words and syllables. To represent additional qualities of speech, such as tooth gnashing, lisping, and sounds made with a cleft palate, an extended set of symbols called the Extensions to the IPA may be used. IPA symbols are composed of one or more elements of two basic types, letters and diacritics. For example, the sound of the English letter ⟨t⟩ may be transcribed in IPA with a single letter, [t], or with a letter plus diacritics, [t̺ʰ], depending on how precise one wishes to be. Often, slashes are used to signal broad or phonemic transcription; thus, /t/ is less specific than, and could refer to, either [t̺ʰ] or [t], depending on the context and language. Occasionally letters or diacritics are added, removed, or modified by the International Phonetic Association. As of the most recent change in 2005, there are 107 letters, 52 diacritics, and four prosodic marks in the IPA. These are shown in the current IPA chart, posted below in this article and at the website of the IPA. International Phonetic Association |

Internet of Everything( IoE) |
The Internet of Everything describes the networked connections between devices, people, processes and data. The Digitally Connected World. |

Internet of Things( IoT) |
The Internet of Things (IoT) is the interconnection of uniquely identifiable embedded computing devices within the existing Internet infrastructure. Typically, IoT is expected to offer advanced connectivity of devices, systems, and services that goes beyond machine-to-machine communications (M2M) and covers a variety of protocols, domains, and applications. The interconnection of these embedded devices (including smart objects), is expected to usher in automation in nearly all fields, while also enabling advanced applications like a Smart Grid. Things, in the IoT, can refer to a wide variety of devices such as heart monitoring implants, biochip transponders on farm animals, automobiles with built-in sensors, or field operation devices that assist fire-fighters in search and rescue. Current market examples include smart thermostat systems and washer/dryers that utilize wifi for remote monitoring. |

Internet of Us( IoU) |
Call it the internet of bodies, call it emotionally intelligent wearable tech. Designers, engineers and artists want to wake the mainstream tech giants up to the realities of asking people to wear technology. https://…/111811056605813020209 |

Internet Shopping Problem |
Introduced by Blazewicz et al. (2010), the Internet Shopping Problem describes a customer who wants to buy a list of products at the lowest possible total cost from shops which offer discounts when purchases exceed a certain threshold. The problem is NP-hard. |

InterpNET |
Humans are able to explain their reasoning. On the contrary, deep neural networks are not. This paper attempts to bridge this gap by introducing a new way to design interpretable neural networks for classification, inspired by physiological evidence of the human visual system’s inner-workings. This paper proposes a neural network design paradigm, termed InterpNET, which can be combined with any existing classification architecture to generate natural language explanations of the classifications. The success of the module relies on the assumption that the network’s computation and reasoning is represented in its internal layer activations. While in principle InterpNET could be applied to any existing classification architecture, it is evaluated via an image classification and explanation task. Experiments on a CUB bird classification and explanation dataset show qualitatively and quantitatively that the model is able to generate high-quality explanations. While the current state-of-the-art METEOR score on this dataset is 29.2, InterpNET achieves a much higher METEOR score of 37.9. |

Interpretable Reasoning Network |
Multi-relation Question Answering is a challenging task, due to the requirement of elaborated analysis on questions and reasoning over multiple fact triples in knowledge base. In this paper, we present a novel model called Interpretable Reasoning Network that employs an interpretable, hop-by-hop reasoning process for question answering. The model dynamically decides which part of an input question should be analyzed at each hop; predicts a relation that corresponds to the current parsed results; utilizes the predicted relation to update the question representation and the state of the reasoning process; and then drives the next-hop reasoning. Experiments show that our model yields state-of-the-art results on two datasets. More interestingly, the model can offer traceable and observable intermediate predictions for reasoning analysis and failure diagnosis. |

Interpretive Structural Modelling( ISM) |
ISM was developed by Warfield in 1974. ISM is the process of assembling distinct or related elements into a simplified and organized format. Hence, ISM is a methodology that seeks the interrelationships among the various elements considered and endows them with a hierarchical and multilevel structure. ISM |

Inter-rater Reliability (Concordance) |
In statistics, inter-rater reliability, inter-rater agreement, or concordance is the degree of agreement among raters. It gives a score of how much homogeneity, or consensus, there is in the ratings given by judges. It is useful in refining the tools given to human judges, for example by determining if a particular scale is appropriate for measuring a particular variable. If various raters do not agree, either the scale is defective or the raters need to be re-trained. |

Interval-based Prediction Uncertainty Bounding( IPUB) |
The problem of machine learning with missing values is common in many areas. A simple approach is to first construct a dataset without missing values simply by discarding instances with missing entries or by imputing a fixed value for each missing entry, and then train a prediction model with the new dataset. A drawback of this naive approach is that the uncertainty in the missing entries is not properly incorporated in the prediction. In order to evaluate prediction uncertainty, the multiple imputation (MI) approach has been studied, but the performance of MI is sensitive to the choice of the probabilistic model of the true values in the missing entries, and the computational cost of MI is high because multiple models must be trained. In this paper, we propose an alternative approach called the Interval-based Prediction Uncertainty Bounding (IPUB) method. The IPUB method represents the uncertainties due to missing entries as intervals, and efficiently computes the lower and upper bounds of the prediction results when all possible training sets constructed by imputing arbitrary values in the intervals are considered. The IPUB method can be applied to a wide class of convex learning algorithms including penalized least-squares regression, support vector machine (SVM), and logistic regression. We demonstrate the advantages of the IPUB method by comparing it with an existing method in numerical experiments with benchmark datasets. |

Intervention Analysis( IA) |
Intervention analysis is the application of modeling procedures for incorporating the effects of exogenous forces or interventions in time series analysis. These interventions, like policy changes, strikes, floods, and price changes, cause unusual changes in time series, resulting in unexpected, extraordinary observations known as outliers. Specifically, four types of outliers resulting from interventions, additive outliers (AO), innovational outliers (IO), temporary changes (TC), and level shifts (LS), have generated a lot of interest in literature. They pose nonstationarity challenges, which cannot be represented by the usual Box and Jenkins (1976) autoregressive integrated moving average (ARIMA) models alone. The most popular modeling procedures are those where “intervention” detection and estimation is paramount. Box and Tiao (1975) pioneered this type of analysis in their quest to solve the Los Angeles pollution problem. Important extensions and contributions have been made by Chan … |

Intervention in Prediction Measure( IPM) |
Random forests are a popular method in many fields since they can be successfully applied to complex data, with a small sample size, complex interactions and correlations, mixed type predictors, etc. Furthermore, they provide variable importance measures that aid qualitative interpretation and also the selection of relevant predictors. However, most of these measures rely on the choice of a performance measure. But measures of prediction performance are not unique or there is not even a clear definition, as in the case of multivariate response random forests. A new alternative importance measure, called Intervention in Prediction Measure, is investigated. It depends on the structure of the trees, without depending on performance measures. It is compared with other well-known variable importance measures in different contexts, such as a classification problem with variables of different types, another classification problem with correlated predictor variables, and problems with multivariate responses and predictors of different types. IPMRF |

Intervention Time Series Analysis( ITSA) |
Intervention time series analysis (ITSA) is an important method for analysing the effect of sudden events on time series data. ITSA methods are quasi-experimental in nature and the validity of modelling with these methods depends upon assumptions about the timing of the intervention and the response of the process to it. |

Intrablocks Correspondence Analysis( IBCA) |
We propose a new method to describe contingency tables with double partition structures in columns and rows. Furthermore, we propose new superimposed representations, based on the introduction of variable dilations for the partial clouds associated with the partitions of the columns and the rows. pamctdp |

Intra-Class Correlation( ICC) |
In statistics, the intraclass correlation (or the intraclass correlation coefficient, abbreviated ICC) is a descriptive statistic that can be used when quantitative measurements are made on units that are organized into groups. It describes how strongly units in the same group resemble each other. While it is viewed as a type of correlation, unlike most other correlation measures it operates on data structured as groups, rather than data structured as paired observations. The intraclass correlation is commonly used to quantify the degree to which individuals with a fixed degree of relatedness (e.g. full siblings) resemble each other in terms of a quantitative trait. Another prominent application is the assessment of consistency or reproducibility of quantitative measurements made by different observers measuring the same quantity. ICC.Sample.Size |
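Several ICC variants exist. As one hedged illustration, the sketch below computes the one-way ANOVA estimator for groups of equal size, ICC = (MSB - MSW) / (MSB + (k - 1) * MSW), where MSB and MSW are the between-group and within-group mean squares and k is the group size. The function name and the data are made up for the example.

```python
def icc_oneway(groups):
    """One-way ICC for a list of equally sized groups of measurements."""
    n = len(groups)     # number of groups
    k = len(groups[0])  # measurements per group
    grand_mean = sum(sum(g) for g in groups) / (n * k)
    group_means = [sum(g) / k for g in groups]
    # Between-group mean square: spread of the group means.
    ms_between = k * sum((m - grand_mean) ** 2 for m in group_means) / (n - 1)
    # Within-group mean square: spread of values around their group mean.
    ms_within = sum(
        (v - m) ** 2 for g, m in zip(groups, group_means) for v in g
    ) / (n * (k - 1))
    return (ms_between - ms_within) / (ms_between + (k - 1) * ms_within)


# Three groups (e.g. sibling pairs) with two measurements each.
pairs = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0]]
print(icc_oneway(pairs))
```

For this data MSB is 8 and MSW is 0.5, so the ICC is 7.5 / 8.5, about 0.88: members of the same group resemble each other far more than members of different groups.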

Intrinsic Credible Regions |
This paper defines intrinsic credible regions, a method to produce objective Bayesian credible regions which only depends on the assumed model and the available data. Lowest posterior loss (LPL) regions are defined as Bayesian credible regions which contain values of minimum posterior expected loss: they depend both on the loss function and on the prior specification. An invariant, information-theory based loss function, the intrinsic discrepancy is argued to be appropriate for scientific communication. Intrinsic credible regions are the lowest posterior loss regions with respect to the intrinsic discrepancy loss and the appropriate reference prior. The proposed procedure is completely general, and it is invariant under both reparametrization and marginalization. The exact derivation of intrinsic credible regions often requires numerical integration, but good analytical approximations are provided. Special attention is given to one-dimensional intrinsic credible intervals; their coverage properties show that they are always approximate (and sometimes exact) frequentist confidence intervals. |

Intrinsic Dimension( ID) |
In signal processing of multidimensional signals, for example in computer vision, the intrinsic dimension of the signal describes how many variables are needed to represent the signal. For a signal of N variables, its intrinsic dimension M satisfies 0 ≤ M ≤ N. Usually the intrinsic dimension of a signal relates to variables defined in a Cartesian coordinate system. In general, however, it is also possible to describe the concept for non-Cartesian coordinates, for example, using polar coordinates. IDmining |

Invariant Causal Prediction( ICP) |
InvariantCausalPrediction |

Invariant Coordinate Selection( ICS) |
A general method for exploring multivariate data by comparing different estimates of multivariate scatter is presented. The method is based upon the eigenvalue-eigenvector decomposition of one scatter matrix relative to another. In particular, it is shown that the eigenvectors can be used to generate an affine invariant coordinate system for the multivariate data. Consequently, we view this method as a method for invariant coordinate selection (ICS). By plotting the data with respect to this new invariant coordinate system, various data structures can be revealed. For example, under certain independent components models, it is shown that the invariant coordinates correspond to the independent components. Another example pertains to mixtures of elliptical distributions. In this case, it is shown that a subset of the invariant coordinates corresponds to Fisher’s linear discriminant subspace, even though the class identifications of the data points are unknown. Invariant Co-Ordinate Selection Multivariate Outlier Detection With ICS ICS |

Invariant Encoding Generative Adversarial Network( IVE-GAN) |
Generative adversarial networks (GANs) are a powerful framework for generative tasks. However, they are difficult to train and tend to miss modes of the true data generation process. Although GANs can learn a rich representation of the covered modes of the data in their latent space, the framework misses an inverse mapping from data to this latent space. We propose Invariant Encoding Generative Adversarial Networks (IVE-GANs), a novel GAN framework that introduces such a mapping for individual samples from the data by utilizing features in the data which are invariant to certain transformations. Since the model maps individual samples to the latent space, it naturally encourages the generator to cover all modes. We demonstrate the effectiveness of our approach in terms of generative performance and learning rich representations on several datasets including common benchmark image generation tasks. |

Invariant Transformer Net |
Convolutional Neural Networks (CNNs) define an exceptionally powerful class of models for image classification, but the theoretical background and the understanding of how invariances to certain transformations are learned is limited. In a large scale screening with images modified by different affine and nonaffine transformations of varying magnitude, we analyzed the behavior of the CNN architectures AlexNet and ResNet. If the magnitude of different transformations does not exceed a class- and transformation dependent threshold, both architectures show invariant behavior. In this work we furthermore introduce a new learnable module, the Invariant Transformer Net, which enables us to learn differentiable parameters for a set of affine transformations. This allows us to extract the space of transformations to which the CNN is invariant and its class prediction robust. |

Inverse Autoregressive Flows( IAF) |
➘ “Neural Autoregressive Flows” |

Inverse Classification |
Inverse classification is the process of perturbing an instance in a meaningful way such that it is more likely to conform to a specific class. Historical methods that address such a problem are often framed to leverage only a single classifier, or specific set of classifiers. These works are often accompanied by naive assumptions. In this work we propose generalized inverse classification (GIC), which avoids restricting the classification model that can be used. We incorporate this formulation into a refined framework in which GIC takes place. Under this framework, GIC operates on features that are immediately actionable. Each change incurs an individual cost, either linear or non-linear. Such changes are subjected to occur within a specified level of cumulative change (budget). Furthermore, our framework incorporates the estimation of features that change as a consequence of direct actions taken (indirectly changeable features). To solve such a problem, we propose three real-valued heuristic-based methods and two sensitivity analysis-based comparison methods, each of which is evaluated on two freely available real-world datasets. Our results demonstrate the validity and benefits of our formulation, framework, and methods. |

Inverse Distance Weighting( IDW) |
Inverse Distance Weighting (IDW) is a type of deterministic method for multivariate interpolation with a known scattered set of points. The values assigned to unknown points are calculated with a weighted average of the values available at the known points. The name given to this type of method was motivated by the weighted average applied, since it resorts to the inverse of the distance to each known point (‘amount of proximity’) when assigning weights. geosptdb |
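The weighted average described above can be sketched in a few lines. This is a minimal illustration, not the geosptdb implementation: the function name `idw`, the `power` parameter (set to 2 here) and the input layout are assumptions made for the example.

```python
import math


def idw(known, query, power=2.0):
    """Interpolate a value at `query` from (point, value) pairs in `known`."""
    weighted_sum = 0.0
    weight_total = 0.0
    for point, value in known:
        # Euclidean distance between the known point and the query point.
        d = math.sqrt(sum((a - b) ** 2 for a, b in zip(point, query)))
        if d == 0.0:
            return value  # the query coincides with a known point
        w = 1.0 / d ** power  # closer points receive larger weights
        weighted_sum += w * value
        weight_total += w
    return weighted_sum / weight_total
```

For instance, with known values 0 at (0, 0) and 10 at (2, 0), the midpoint (1, 0) receives equal weights from both points and is assigned their plain average.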

Inverse Reinforcement Learning( IRL) |
Inverse Reinforcement Learning (IRL) in Markov decision processes is the problem of extracting a reward function given observed, optimal behavior. |

Inverse Reward Design( IRD) |
Autonomous agents optimize the reward function we give them. What they don’t know is how hard it is for us to design a reward function that actually captures what we want. When designing the reward, we might think of some specific training scenarios, and make sure that the reward will lead to the right behavior in those scenarios. Inevitably, agents encounter new scenarios (e.g., new types of terrain) where optimizing that same reward may lead to undesired behavior. Our insight is that reward functions are merely observations about what the designer actually wants, and that they should be interpreted in the context in which they were designed. We introduce inverse reward design (IRD) as the problem of inferring the true objective based on the designed reward and the training MDP. We introduce approximate methods for solving IRD problems, and use their solution to plan risk-averse behavior in test MDPs. Empirical results suggest that this approach can help alleviate negative side effects of misspecified reward functions and mitigate reward hacking. |

Inverse Visual Question Answering( iVQA) |
In recent years, visual question answering (VQA) has become topical as a long-term goal to drive computer vision and multi-disciplinary AI research. The premise of VQA’s significance is that both the image and textual question need to be well understood and mutually grounded in order to infer the correct answer. However, current VQA models perhaps ‘understand’ less than initially hoped, and instead master the easier task of exploiting cues given away in the question and biases in the answer distribution. In this paper we propose the inverse problem of VQA (iVQA), and explore its suitability as a benchmark for visuo-linguistic understanding. The iVQA task is to generate a question that corresponds to a given image and answer pair. Since the answers are less informative than the questions, and the questions have less learnable bias, an iVQA model needs to better understand the image to be successful. We pose question generation as a multi-modal dynamic inference process and propose an iVQA model that can gradually adjust its focus of attention guided by both a partially generated question and the answer. For evaluation, apart from existing linguistic metrics, we propose a new ranking metric. This metric compares the ground truth question’s rank among a list of distractors, which allows the drawbacks of different algorithms and sources of error to be studied. Experimental results show that our model can generate diverse, grammatically correct and content correlated questions that match the given answer. |

I-Optimality |
The generalized linear model plays an important role in statistical analysis and the related design issues are undoubtedly challenging. The state-of-the-art works mostly apply to design criteria on the estimates of regression coefficients. It is of importance to study optimal designs for generalized linear models, especially on the prediction aspects. In this work, we propose a prediction-oriented design criterion, I-optimality, and develop an efficient sequential algorithm of constructing I-optimal designs for generalized linear models. Through establishing the General Equivalence Theorem of the I-optimality for generalized linear models, we obtain an insightful understanding for the proposed algorithm on how to sequentially choose the support points and update the weights of support points of the design. The proposed algorithm is computationally efficient with guaranteed convergence property. Numerical examples are conducted to evaluate the feasibility and computational efficiency of the proposed algorithm. |

IPMAN |
We present a new methodology, called IPMAN, that combines interior point methods and generative adversarial networks to solve constrained optimization problems with feasible sets that are non-convex or not explicitly defined. Our methodology produces $\epsilon$-optimal solutions and demonstrates that, when there are multiple global optima, it learns a distribution over the optimal set. We apply our approach to synthetic examples to demonstrate its effectiveness and to a problem in radiation therapy treatment optimization with a non-convex feasible set. |

Iris |
Today’s conversational agents are restricted to simple standalone commands. In this paper, we present Iris, an agent that draws on human conversational strategies to combine commands, allowing it to perform more complex tasks that it has not been explicitly designed to support: for example, composing one command to ‘plot a histogram’ with another to first ‘log-transform the data’. To enable this complexity, we introduce a domain specific language that transforms commands into automata that Iris can compose, sequence, and execute dynamically by interacting with a user through natural language, as well as a conversational type system that manages what kinds of commands can be combined. We have designed Iris to help users with data science tasks, a domain that requires support for command combination. In evaluation, we find that data scientists complete a predictive modeling task significantly faster (2.6 times speedup) with Iris than a modern non-conversational programming environment. Iris supports the same kinds of commands as today’s agents, but empowers users to weave together these commands to accomplish complex goals. |

Irregular Convolutional Neural Network( ICNN) |
Convolutional kernels are basic and vital components of deep Convolutional Neural Networks (CNN). In this paper, we equip convolutional kernels with shape attributes to generate the deep Irregular Convolutional Neural Networks (ICNN). Compared to traditional CNN applying regular convolutional kernels like ${3\times3}$, our approach trains irregular kernel shapes to better fit the geometric variations of input features. In other words, shapes are learnable parameters in addition to weights. The kernel shapes and weights are learned simultaneously during end-to-end training with the standard back-propagation algorithm. Experiments for semantic segmentation are implemented to validate the effectiveness of our proposed ICNN. |

Irrelevant Variability |
We say that data variability is correlated with a specific task “if the removal of this variability from the data deteriorates (on average) the results of clustering or retrieval”. Variability is irrelevant if it is “maintained in the data” but “not correlated with the specific task” |

Isomeric Condition |
Prevalent matrix completion theories rely on an assumption that the locations of missing data are distributed independently and randomly (i.e., uniform sampling). Nevertheless, the reason for an observation being missing often depends on the unseen observations themselves, and thus the locations of the missing data in practice usually occur in a correlated fashion (i.e., nonuniform sampling) rather than independently. To break through the limits of uniform sampling, we introduce in this work a new hypothesis called isomeric condition, which is provably weaker than the assumption of uniform sampling. Equipped with this new tool, we prove a collection of theorems for missing data recovery as well as matrix completion. In particular, we prove that the exact solutions that identify the target matrix are included as critical points by the commonly used bilinear programs. Even more, when an extra condition called relative well-conditionedness is obeyed as well, we prove that the local optimality of the exact solutions is guaranteed in a deterministic fashion. Among other things, we study in detail a Schatten quasi-norm induced method termed isomeric dictionary pursuit (IsoDP), and we show that IsoDP exhibits some distinct behaviors absent in the traditional bilinear programs. |

Isometry Blind Dynamic Time Warping( IBDTW) |
In this work, we explore the problem of aligning two time-ordered point clouds which are spatially transformed and re-parameterized versions of each other. This has a diverse array of applications such as cross modal time series synchronization (e.g. MOCAP to video) and alignment of discretized curves in images. Most other works that address this problem attempt to jointly uncover a spatial alignment and correspondences between the two point clouds, or to derive local invariants to spatial transformations such as curvature before computing correspondences. By contrast, we sidestep spatial alignment completely by using self-similarity matrices (SSMs) as a proxy to the time-ordered point clouds, since self-similarity matrices are blind to isometries and respect global geometry. Our algorithm, dubbed ‘Isometry Blind Dynamic Time Warping’ (IBDTW), is simple and general, and we show that its associated dissimilarity measure lower bounds the L1 Gromov-Hausdorff distance between the two point sets when restricted to warping paths. We also present a local, partial alignment extension of IBDTW based on the Smith Waterman algorithm. This eliminates the need for tedious manual cropping of time series, which is ordinarily necessary for global alignment algorithms to function properly. |

Isotonic Proportional Hazards Model |
isoph |

Isotonic Regression( IR) |
General isotonic regression approximates a given series of values with values that satisfy a given partial ordering. In the simplest case, the idea is to fit a piecewise-constant non-decreasing function to the data. http://…/Isotonic_regression |
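For the common special case of a total order with equal weights, a piecewise-constant non-decreasing least-squares fit can be obtained with the pool-adjacent-violators algorithm. The sketch below is one possible implementation under those assumptions; the function name `isotonic_regression` is chosen for illustration and general partial orderings are not handled.

```python
def isotonic_regression(y):
    """Least-squares non-decreasing fit to `y` via pool-adjacent-violators."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge the last two blocks while their means violate the ordering.
        while (len(blocks) > 1
               and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]):
            total, count = blocks.pop()
            blocks[-1][0] += total
            blocks[-1][1] += count

    fitted = []
    for total, count in blocks:
        fitted.extend([total / count] * count)  # constant within each block
    return fitted
```

Each merged block is replaced by its mean, which is what produces the piecewise-constant shape of the fit.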

ISOTYPE |
Isotype (International System of TYpographic Picture Education) is a method of showing social, technological, biological and historical connections in pictorial form. It was first known as the Vienna Method of Pictorial Statistics (Wiener Methode der Bildstatistik), due to its having been developed at the Gesellschafts- und Wirtschaftsmuseum in Wien (Social and economic museum of Vienna) between 1925 and 1934. The founding director of this museum, Otto Neurath, was the initiator and chief theorist of the Vienna Method. The term Isotype was applied to the method around 1935, after its key practitioners were forced to leave Vienna by the rise of Austrian fascism. http://…/Haroz_CHI_2015.pdf |

IT Operations Analytics( ITOA) |
In the fields of information technology and systems management, IT Operations Analytics (ITOA) is an approach or method applied to application software designed to retrieve, analyze and report data for IT operations. ITOA has been described as applying big data analytics to the IT realm. In its Hype Cycle Report, Gartner rated the business impact of ITOA as being ‘high’, meaning that its use will see businesses enjoy significantly increased revenue or cost saving opportunities. IT Operations Analytics (ITOA) (also known as Advanced Operational Analytics, or IT Data Analytics) technologies are primarily used to discover complex patterns in high volumes of often ‘noisy’ IT system availability and performance data. Forrester Research defines IT analytics as ‘The use of mathematical algorithms and other innovations to extract meaningful information from the sea of raw data collected by management and monitoring technologies.’ Taking a Horizontal Approach to Big Data for Better IT and Business Outcomes |

Item Explorer |
Item explorer is an approach to provide insights into a ubiquitous class of business questions like: • what kind of products do customers typically buy together? • what kind of web pages (on a web site) do users visit? • what combination of symptoms do patients have? • … For this class of business questions, the exponential number of combinations poses a severe practical challenge. Due to the explorative nature, visualization is well-suited for such business questions. More specifically, a visualization can provide a unique representation for both revealing insights and for intuitive user interaction based on business knowledge or own hypotheses. |

Item Factor Analysis |
➘ “Item Response Theory” ifaTools |

Item Response Theory( IRT) |
Item response theory (IRT) models are a class of statistical models used to describe the response behaviors of individuals to a set of items having a certain number of options. They are adopted by researchers in social science, particularly in the analysis of performance or attitudinal data, in psychology, education, medicine, marketing and other fields where the aim is to measure latent constructs. Most IRT analyses use parametric models that rely on assumptions that often are not satisfied. In such cases, a nonparametric approach might be preferable; nevertheless, few software implementations support it. MLCIRTwithin |

Iterated Filtering |
Iterated filtering algorithms are a tool for maximum likelihood inference on partially observed dynamical systems. Stochastic perturbations to the unknown parameters are used to explore the parameter space. Applying sequential Monte Carlo (the particle filter) to this extended model results in the selection of the parameter values that are more consistent with the data. Appropriately constructed procedures, iterating with successively diminished perturbations, converge to the maximum likelihood estimate. Iterated filtering methods have so far been used most extensively to study infectious disease transmission dynamics. Case studies include cholera, Ebola virus, influenza, malaria, HIV, pertussis, poliovirus and measles. Other areas which have been proposed to be suitable for these methods include ecological dynamics and finance. The perturbations to the parameter space play several different roles. Firstly, they smooth out the likelihood surface, enabling the algorithm to overcome small-scale features of the likelihood during early stages of the global search. Secondly, Monte Carlo variation allows the search to escape from local minima. Thirdly, the iterated filtering update uses the perturbed parameter values to construct an approximation to the derivative of the log likelihood even though this quantity is not typically available in closed form. Fourthly, the parameter perturbations help to overcome numerical difficulties that can arise during sequential Monte Carlo. Accelerate iterated filtering |

Iterative Classification Algorithm( ICA) |
see also ➘ “Recurrent Collective Classification” |

Iterative Compressed-Thresholding and K-Means( IcTKM) |
In this paper we show that the computational complexity of the Iterative Thresholding and K-Residual-Means (ITKrM) algorithm for dictionary learning can be significantly reduced by using dimensionality reduction techniques based on the Johnson-Lindenstrauss Lemma. We introduce the Iterative Compressed-Thresholding and K-Means (IcTKM) algorithm for fast dictionary learning and study its convergence properties. We show that IcTKM can locally recover a generating dictionary with low computational complexity up to a target error $\tilde{\varepsilon}$ by compressing $d$-dimensional training data into $m < d$ dimensions, where $m$ is proportional to $\log d$ and inversely proportional to the distortion level $\delta$ incurred by compressing the data. Increasing the distortion level $\delta$ reduces the computational complexity of IcTKM at the cost of an increased recovery error and reduced admissible sparsity level for the training data. For generating dictionaries comprised of $K$ atoms, we show that IcTKM can stably recover the dictionary with distortion levels up to the order $\delta \leq O(1/\sqrt{\log K})$. The compression effectively shatters the data dimension bottleneck in the computational cost of the ITKrM algorithm. For training data with sparsity levels $S \leq O(K^{2/3})$, ITKrM can locally recover the dictionary with a computational cost that scales as $O(d K \log(\tilde{\varepsilon}^{-1}))$ per training signal. We show that for these same sparsity levels the computational cost can be brought down to $O(\log^5 (d) K \log(\tilde{\varepsilon}^{-1}))$ with IcTKM, a significant reduction when high-dimensional data is considered. Our theoretical results are complemented with numerical simulations which demonstrate that IcTKM is a powerful, low-cost algorithm for learning dictionaries from high-dimensional data sets. |

Iterative Dichotomiser 3( ID3) |
In decision tree learning, ID3 (Iterative Dichotomiser 3) is an algorithm invented by Ross Quinlan used to generate a decision tree from a dataset. ID3 is the precursor to the C4.5 algorithm, and is typically used in the machine learning and natural language processing domains. |
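ID3 is usually described as choosing, at each node, the attribute with the highest information gain (the reduction in label entropy after splitting on that attribute). The sketch below shows only that selection step, not the full recursive tree construction; the function names and the dictionary-based row layout are assumptions made for illustration.

```python
import math
from collections import Counter


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())


def information_gain(rows, labels, attr):
    """Entropy reduction obtained by splitting on attribute `attr`."""
    n = len(labels)
    partitions = {}
    for row, label in zip(rows, labels):
        partitions.setdefault(row[attr], []).append(label)
    # Weighted average entropy of the subsets produced by the split.
    remainder = sum(len(p) / n * entropy(p) for p in partitions.values())
    return entropy(labels) - remainder


def best_attribute(rows, labels, attrs):
    """Attribute that ID3 would split on at this node."""
    return max(attrs, key=lambda a: information_gain(rows, labels, a))
```

A full ID3 implementation would apply `best_attribute` recursively to each subset until the labels in a subset are pure or no attributes remain.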

Iterative Method |
In computational mathematics, an iterative method is a mathematical procedure that generates a sequence of improving approximate solutions for a class of problems. A specific implementation of an iterative method, including the termination criteria, is an algorithm of the iterative method. An iterative method is called convergent if the corresponding sequence converges for given initial approximations. A mathematically rigorous convergence analysis of an iterative method is usually performed; however, heuristic-based iterative methods are also common. In the problems of finding the root of an equation (or a solution of a system of equations), an iterative method uses an initial guess to generate successive approximations to a solution. In contrast, direct methods attempt to solve the problem by a finite sequence of operations. In the absence of rounding errors, direct methods would deliver an exact solution (like solving a linear system of equations Ax=b by Gaussian elimination). Iterative methods are often the only choice for nonlinear equations. However, iterative methods are often useful even for linear problems involving a large number of variables (sometimes of the order of millions), where direct methods would be prohibitively expensive (and in some cases impossible) even with the best available computing power. |
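Newton's method for root finding is one familiar instance of this pattern: an initial guess, a rule that generates successive approximations, and a termination criterion. The sketch below is illustrative only; the function name `newton` and the default tolerance and iteration cap are assumptions.

```python
def newton(f, df, x0, tol=1e-12, max_iter=50):
    """Find a root of `f` starting from `x0`, given its derivative `df`."""
    x = x0
    for _ in range(max_iter):
        step = f(x) / df(x)   # Newton update: x_new = x - f(x) / f'(x)
        x -= step
        if abs(step) < tol:   # termination criterion: the update is tiny
            return x
    raise RuntimeError("no convergence within max_iter iterations")


# Usage: approximate the square root of 2 as the positive root of x^2 - 2.
root = newton(lambda x: x * x - 2.0, lambda x: 2.0 * x, x0=1.0)
```

The termination criterion and the iteration cap are part of what turns the abstract method into a concrete algorithm, as the definition above notes.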

Iterative Proportional Fitting Procedure( IPFP) |
The iterative proportional fitting procedure (IPFP, also known as biproportional fitting in statistics, RAS algorithm in economics and matrix raking or matrix scaling in computer science) is an iterative algorithm for estimating cell values of a contingency table such that the marginal totals remain fixed and the estimated table decomposes into an outer product. mipfp |
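A minimal two-dimensional sketch of the procedure alternately rescales rows and columns of a seed table until both sets of marginal totals are matched. It is not the mipfp implementation; the function name `ipfp`, the tolerance and the iteration cap are assumptions, and the row and column targets are assumed to have the same grand total.

```python
def ipfp(seed, row_targets, col_targets, tol=1e-10, max_iter=1000):
    """Scale `seed` so that its row and column sums match the targets."""
    table = [row[:] for row in seed]  # work on a copy
    for _ in range(max_iter):
        # Step 1: rescale every row to its target total.
        for i, target in enumerate(row_targets):
            s = sum(table[i])
            if s > 0:
                table[i] = [v * target / s for v in table[i]]
        # Step 2: rescale every column to its target total.
        for j, target in enumerate(col_targets):
            s = sum(row[j] for row in table)
            if s > 0:
                for row in table:
                    row[j] *= target / s
        # Columns are now exact, so convergence is checked on the rows.
        err = max(abs(sum(row) - t) for row, t in zip(table, row_targets))
        if err < tol:
            return table
    raise RuntimeError("no convergence within max_iter iterations")
```

With a uniform seed the fitted table is the outer product of the row and column totals divided by the grand total, which is the independence table for those margins.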

Iterative Self-Organizing Data Analysis Technique( ISODATA) |
This is a more sophisticated algorithm which allows the number of clusters to be automatically adjusted during the iteration by merging similar clusters and splitting clusters with large standard deviations. |

Iterative Sequential Regression( ISR) |
Imputation of missing values is one of the major tasks for data pre-processing in many areas. Whenever imputation of data from official statistics comes to mind, several (additional) challenges almost always arise, like large data sets, data sets consisting of a mixture of different variable types, or data outliers. The aim is to propose an automatic algorithm called IRMI for iterative model-based imputation using robust methods, accounting for the mentioned challenges, and to provide a software tool in R. This algorithm is compared to the algorithm IVEWARE, which is the ‘recommended software’ for imputations in international and national statistical institutions. Using artificial data and real data sets from official statistics and other fields, the advantages of IRMI over IVEWARE, especially with respect to robustness, are demonstrated. ISR3 |

Iterative Supervised Principal Components( ISPC) |
In high-dimensional prediction problems, where the number of features may greatly exceed the number of training instances, fully Bayesian approach with a sparsifying prior is known to produce good results but is computationally challenging. To alleviate this computational burden, we propose to use a preprocessing step where we first apply a dimension reduction to the original data to reduce the number of features to something that is computationally conveniently handled by Bayesian methods. To do this, we propose a new dimension reduction technique, called iterative supervised principal components (ISPC), which combines variable screening and dimension reduction and can be considered as an extension to the existing technique of supervised principal components (SPCs). Our empirical evaluations confirm that, although not foolproof, the proposed approach provides very good results on several microarray benchmark datasets with very affordable computation time, and can also be very useful for visualizing high-dimensional data. |

Iterative Thresholding and K-Residual Means( ITKrM) |
Dictionary learning – from local towards global and adaptive Compressed Dictionary Learning |

Iterative Weighted Least Squares( IWLS) |
The Iterative Weighted Least Squares (IWLS) method is one of the estimation procedures in logistic regression modeling. |
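To make the idea concrete, the sketch below fits a logistic regression with an intercept and a single predictor by repeatedly solving a weighted least-squares problem on a working response. It is a simplified illustration: the function name `logistic_iwls` is hypothetical, the 2x2 normal equations are solved by hand, and the data are assumed not to be perfectly separable.

```python
import math


def logistic_iwls(x, y, tol=1e-10, max_iter=50):
    """Fit P(y=1) = 1 / (1 + exp(-(b0 + b1*x))) by IWLS (illustrative)."""
    b0 = b1 = 0.0
    for _ in range(max_iter):
        s_w = s_wx = s_wxx = s_wz = s_wxz = 0.0
        for xi, yi in zip(x, y):
            eta = b0 + b1 * xi                 # linear predictor
            mu = 1.0 / (1.0 + math.exp(-eta))  # fitted probability
            w = mu * (1.0 - mu)                # IWLS weight
            z = eta + (yi - mu) / w            # working response
            s_w += w
            s_wx += w * xi
            s_wxx += w * xi * xi
            s_wz += w * z
            s_wxz += w * xi * z
        # Solve the 2x2 weighted normal equations for (b0, b1).
        det = s_w * s_wxx - s_wx * s_wx
        new_b0 = (s_wxx * s_wz - s_wx * s_wxz) / det
        new_b1 = (s_w * s_wxz - s_wx * s_wz) / det
        change = max(abs(new_b0 - b0), abs(new_b1 - b1))
        b0, b1 = new_b0, new_b1
        if change < tol:
            return b0, b1
    raise RuntimeError("no convergence within max_iter iterations")
```

At convergence the residuals `y - mu` are orthogonal to both the constant and the predictor, which is the maximum likelihood condition for this model.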

Iteratively Reweighted Least Squares( IRLS) |
IRLS is used to find the maximum likelihood estimates of a generalized linear model, and in robust regression to find an M-estimator, as a way of mitigating the influence of outliers in an otherwise normally-distributed data set, for example by minimizing the least absolute error rather than the least square error. Although not a linear regression problem, Weiszfeld’s algorithm for approximating the geometric median can also be viewed as a special case of iteratively reweighted least squares, in which the objective function is the sum of distances of the estimator from the samples. One of the advantages of IRLS over linear programming and convex programming is that it can be used with Gauss-Newton and Levenberg-Marquardt numerical algorithms. |
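The Weiszfeld iteration mentioned above makes the reweighting idea easy to see: each step is a weighted mean of the samples, with weights equal to the inverse distance from the current estimate. The sketch below is illustrative; the function name `geometric_median`, the starting point (the centroid) and the small constant `eps` that guards against division by zero are assumptions.

```python
import math


def _distance(p, q):
    """Euclidean distance between two points of equal dimension."""
    return math.sqrt(sum((a - b) ** 2 for a, b in zip(p, q)))


def geometric_median(points, tol=1e-9, max_iter=1000, eps=1e-12):
    """Approximate the geometric median of `points` by Weiszfeld iteration."""
    dim = len(points[0])
    # Start from the centroid (the ordinary least-squares solution).
    est = [sum(p[d] for p in points) / len(points) for d in range(dim)]
    for _ in range(max_iter):
        # Reweight: samples far from the current estimate count for less.
        weights = [1.0 / max(_distance(p, est), eps) for p in points]
        total = sum(weights)
        new = [sum(w * p[d] for w, p in zip(weights, points)) / total
               for d in range(dim)]
        if _distance(new, est) < tol:
            return new
        est = new
    return est
```

Compared with the centroid, the result is far less sensitive to a single distant sample, which is the robustness property the entry describes.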
