Naive Bayes Classifier A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes’ theorem with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be “independent feature model”. An overview of statistical classifiers is given in the article on pattern recognition.
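As a small, hedged illustration of the independence assumption in practice, the sketch below uses scikit-learn's MultinomialNB on a made-up corpus; the documents, labels, and pipeline choices are purely illustrative.

```
# A minimal sketch of naive Bayes text classification with scikit-learn;
# the tiny corpus and labels below are made up for demonstration.
from sklearn.feature_extraction.text import CountVectorizer
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline

docs = ["cheap meds online", "meeting at noon", "win a free prize", "lunch tomorrow?"]
labels = ["spam", "ham", "spam", "ham"]

# Each word count is treated as conditionally independent of the others given
# the class -- the "naive" part of the model.
model = make_pipeline(CountVectorizer(), MultinomialNB())
model.fit(docs, labels)
print(model.predict(["free meds tomorrow"]))
```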
Named Entity Extraction
Named Entity Recognition
Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify elements in text into predefined categories such as the names of persons, organizations and locations, expressions of time, quantities, monetary values, percentages, etc. Most research on NER systems has been structured as taking an unannotated block of text, such as ‘Jim bought 300 shares of Acme Corp. in 2006.’, and producing an annotated block of text that highlights the names of entities: ‘[Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time.’ In this example, a person name consisting of one token, a two-token company name and a temporal expression have been detected and classified. State-of-the-art NER systems for English produce near-human performance. For example, the best system entering MUC-7 achieved an F-measure of 93.39%, while human annotators scored 97.60% and 96.95%.
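As a hedged illustration (not part of the original definition), the same example sentence can be tagged with an off-the-shelf NER model; the snippet below assumes spaCy and its small English model are installed.

```
# Tagging the example sentence with a pretrained NER model.
# Assumes: pip install spacy && python -m spacy download en_core_web_sm
import spacy

nlp = spacy.load("en_core_web_sm")
doc = nlp("Jim bought 300 shares of Acme Corp. in 2006.")
for ent in doc.ents:
    print(ent.text, ent.label_)   # e.g. Jim/PERSON, Acme Corp./ORG, 2006/DATE
```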
Named Entity Recognition and Classification
The term ‘Named Entity’, now widely used in Natural Language Processing, was coined for the Sixth Message Understanding Conference (MUC-6) (R. Grishman & Sundheim 1996). At that time, MUC was focusing on Information Extraction (IE) tasks where structured information about company activities and defense-related activities is extracted from unstructured text, such as newspaper articles. In defining the task, people noticed that it is essential to recognize information units like names, including person, organization and location names, and numeric expressions including time, date, money and percent expressions. Identifying references to these entities in text was recognized as one of the important sub-tasks of IE and was called ‘Named Entity Recognition and Classification (NERC)’.
Named Entity Recognizer
Stanford NER is a Java implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names. It comes with well-engineered feature extractors for Named Entity Recognition, and many options for defining feature extractors. Included with the download are good named entity recognizers for English, particularly for the 3 classes (PERSON, ORGANIZATION, LOCATION), and we also make available on this page various other models for different languages and circumstances, including models trained on just the CoNLL 2003 English training data. The distributional similarity features in some models improve performance but the models require considerably more memory. Stanford NER is also known as CRFClassifier. The software provides a general implementation of (arbitrary order) linear chain Conditional Random Field (CRF) sequence models. That is, by training your own models, you can actually use this code to build sequence models for any task. (CRF models were pioneered by Lafferty, McCallum, and Pereira (2001); see Sutton and McCallum (2006) or Sutton and McCallum (2010) for more comprehensible introductions.)
Named Entity Sequence Classification
Named Entity Recognition (NER) aims at locating and classifying named entities in text. In some use cases of NER, including cases where detected named entities are used in creating content recommendations, it is crucial to have a reliable confidence level for the detected named entities. In this work we study the problem of finding confidence levels for detected named entities. We refer to this problem as Named Entity Sequence Classification (NESC). We frame NESC as a binary classification problem and we use NER as well as recurrent neural networks to find the probability that a candidate named entity is a real named entity. We apply this approach to Tweet texts and we show how we can find named entities with high confidence levels in Tweets.
Natural Language Aggregate Query
Natural language question-answering over RDF data has received widespread attention. Although there have been several studies that have dealt with a small number of aggregate queries, they have many restrictions (i.e., interactive information, controlled question or query template). Thus far, there has been no natural language querying mechanism that can process general aggregate queries over RDF data. Therefore, we propose a framework called NLAQ (Natural Language Aggregate Query). First, we propose a novel algorithm to automatically understand a user’s query intention, which mainly contains semantic relations and aggregations. Second, to build a better bridge between the query intention and RDF data, we propose an extended paraphrase dictionary ED to obtain more candidate mappings for semantic relations, and we introduce a predicate-type adjacent set PT to filter out inappropriate candidate mapping combinations in semantic relations and basic graph patterns. Third, we design a suitable translation plan for each aggregate category and effectively distinguish whether an aggregate item is numeric or not, which greatly affects the aggregate result. Finally, we conduct extensive experiments over real datasets (QALD benchmark and DBpedia), and the experimental results demonstrate that our solution is effective.
Natural Language Generation Natural Language Generation (NLG) is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form. Psycholinguists prefer the term language production when such formal representations are interpreted as models for mental representations.
It could be said an NLG system is like a translator that converts a computer based representation into a natural language representation. However, the methods to produce the final language are different from those of a compiler due to the inherent expressivity of natural languages.
NLG may be viewed as the opposite of natural language understanding: whereas in natural language understanding the system needs to disambiguate the input sentence to produce the machine representation language, in NLG the system needs to make decisions about how to put a concept into words.
Simple examples are systems that generate form letters. These do not typically involve grammar rules, but may generate a letter to a consumer, e.g. stating that a credit card spending limit was reached. More complex NLG systems dynamically create texts to meet a communicative goal. As in other areas of natural language processing, this can be done using either explicit models of language (e.g., grammars) and the domain, or using statistical models derived by analysing human-written texts.
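A minimal sketch of the form-letter case described above, using a plain Python template, is shown next; the customer fields are hypothetical, and real NLG systems layer content selection, aggregation and surface realisation on top of such templates.

```
# Template-based generation of a credit-limit letter (no grammar rules involved).
TEMPLATE = (
    "Dear {name},\n\n"
    "Your card ending in {last4} has reached {pct:.0f}% of its "
    "{limit:,.2f} USD spending limit. Further purchases may be declined "
    "until a payment is received.\n"
)

def render_letter(name, last4, spent, limit):
    return TEMPLATE.format(name=name, last4=last4,
                           pct=100 * spent / limit, limit=limit)

print(render_letter("A. Smith", "1234", spent=4980.0, limit=5000.0))
```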
Natural Language Inference
Inference has been a central topic in artificial intelligence from the start, but while automatic methods for formal deduction have advanced tremendously, comparatively little progress has been made on the problem of natural language inference (NLI), that is, determining whether a natural language hypothesis h can justifiably be inferred from a natural language premise p. The challenges of NLI are quite different from those encountered in formal deduction: the emphasis is on informal reasoning, lexical semantic knowledge, and variability of linguistic expression.
Natural Language Processing
Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human-computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, and others involve natural language generation.
Natural Language Query A natural language query consists only of normal terms in the user’s language, without any special syntax or format.
Natural Language Toolkit
The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python programming language. NLTK includes graphical demonstrations and sample data. It is accompanied by a book that explains the underlying concepts behind the language processing tasks supported by the toolkit, plus a cookbook. NLTK is intended to support research and teaching in NLP or closely related areas, including empirical linguistics, cognitive science, artificial intelligence, information retrieval, and machine learning. NLTK has been used successfully as a teaching tool, as an individual study tool, and as a platform for prototyping and building research systems.
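A short illustrative NLTK session follows; the sentence is made up, and the referenced corpora and models must first be fetched with nltk.download.

```
# Typical NLTK workflow: tokenisation, POS tagging, named-entity chunking.
# Assumes the relevant resources have been downloaded, e.g.:
#   nltk.download("punkt"); nltk.download("averaged_perceptron_tagger")
#   nltk.download("maxent_ne_chunker"); nltk.download("words")
import nltk

sentence = "NLTK was created at the University of Pennsylvania."
tokens = nltk.word_tokenize(sentence)
tagged = nltk.pos_tag(tokens)
tree = nltk.ne_chunk(tagged)
print(tagged)
print(tree)
```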
Natural Language Understanding
Natural language understanding (NLU) is a subtopic of natural language processing in artificial intelligence that deals with machine reading comprehension. NLU is considered an AI-hard problem. The process of disassembling and parsing input is more complex than the reverse process of assembling output in natural language generation because of the occurrence of unknown and unexpected features in the input and the need to determine the appropriate syntactic and semantic schemes to apply to it, factors which are pre-determined when outputting language. There is considerable commercial interest in the field because of its application to news-gathering, text categorization, voice-activation, archiving, and large-scale content-analysis.
Natural Parameter Networks
Neural networks (NN) have achieved state-of-the-art performance in various applications. Unfortunately in applications where training data is insufficient, they are often prone to overfitting. One effective way to alleviate this problem is to exploit the Bayesian approach by using Bayesian neural networks (BNN). Another shortcoming of NN is the lack of flexibility to customize different distributions for the weights and neurons according to the data, as is often done in probabilistic graphical models. To address these problems, we propose a class of probabilistic neural networks, dubbed natural-parameter networks (NPN), as a novel and lightweight Bayesian treatment of NN. NPN allows the usage of arbitrary exponential-family distributions to model the weights and neurons. Different from traditional NN and BNN, NPN takes distributions as input and goes through layers of transformation before producing distributions to match the target output distributions. As a Bayesian treatment, efficient backpropagation (BP) is performed to learn the natural parameters for the distributions over both the weights and neurons. The output distributions of each layer, as byproducts, may be used as second-order representations for the associated tasks such as link prediction. Experiments on real-world datasets show that NPN can achieve state-of-the-art performance.
ND4J ND4J is a scientific computing library for the JVM. It is meant to be used in production environments rather than as a research tool, which means routines are designed to run fast with minimum RAM requirements.
NearBucket Locality Sensitive Hashing
We present NearBucket-LSH, an effective algorithm for similarity search in large-scale distributed online social networks organized as peer-to-peer overlays. As communication is a dominant consideration in distributed systems, we focus on minimizing the network cost while guaranteeing good search quality. Our algorithm is based on Locality Sensitive Hashing (LSH), which limits the search to collections of objects, called buckets, that have a high probability to be similar to the query. More specifically, NearBucket-LSH employs an LSH extension that searches in near buckets, and improves search quality but also significantly increases the network cost. We decrease the network cost by considering the internals of both LSH and the P2P overlay, and harnessing their properties to our needs. We show that our NearBucket-LSH increases search quality for a given network cost compared to previous art. In many cases, the search quality increases by more than 50%.
Nearest Descent
Nearest Neighbor Descent
Near-Far Matching Near-far matching is a study design technique for preprocessing observational data to mimic a pair-randomized trial. Individuals are matched to be near on measured confounders and far on levels of an instrumental variable.
Necessary Condition Analysis
Theoretical ‘necessary but not sufficient’ statements are common in the organizational sciences. Traditional data analyses approaches (e.g., correlation or multiple regression) are not appropriate for testing or inducing such statements. This paper proposes Necessary Condition Analysis (NCA) as a general and straightforward methodology for identifying necessary conditions in datasets. The paper presents the logic and methodology of necessary but not sufficient contributions of organizational determinants (e.g., events, characteristics, resources, efforts) to a desired outcome (e.g., good performance). A necessary determinant must be present for achieving an outcome, but its presence is not sufficient to obtain that outcome. Without the necessary condition, there is guaranteed failure, which cannot be compensated by other determinants of the outcome. This logic and its related methodology are fundamentally different from the traditional sufficiency-based logic and methodology. Practical recommendations and free software are offered to support researchers to apply NCA.
Negative Binomial Regression
Negative binomial regression is for modeling count variables, usually for over-dispersed count outcome variables.
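A hedged sketch of fitting such a model with statsmodels is given below; the data are simulated over-dispersed counts, and the dispersion value passed to the family is an illustrative choice matching the simulation.

```
# Negative binomial regression on simulated over-dispersed count data.
import numpy as np
import statsmodels.api as sm

rng = np.random.default_rng(0)
n = 500
x = rng.normal(size=n)
mu = np.exp(0.5 + 0.8 * x)                      # mean under a log link
y = rng.negative_binomial(n=2, p=2 / (2 + mu))  # over-dispersed counts

X = sm.add_constant(x)
model = sm.GLM(y, X, family=sm.families.NegativeBinomial(alpha=0.5))
print(model.fit().summary())
```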
Nelder-Mead Method The Nelder-Mead method or downhill simplex method or amoeba method is a commonly applied numerical method used to find the minimum or maximum of an objective function in a many-dimensional space. It is applied to nonlinear optimization problems for which derivatives may not be known. However, the Nelder-Mead technique is a heuristic search method that can converge to non-stationary points on problems that can be solved by alternative methods. The Nelder-Mead technique was proposed by John Nelder & Roger Mead (1965).
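For illustration, SciPy exposes the method through scipy.optimize.minimize; the sketch below minimises the Rosenbrock function using only function evaluations, no derivatives.

```
# Derivative-free minimisation of the Rosenbrock function with Nelder-Mead.
from scipy.optimize import minimize

def rosenbrock(v):
    x, y = v
    return (1 - x) ** 2 + 100 * (y - x ** 2) ** 2

result = minimize(rosenbrock, x0=[-1.2, 1.0], method="Nelder-Mead")
print(result.x, result.fun)   # x should be close to [1, 1]
```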
Neo4j Neo4j is an open-source graph database, implemented in Java. The developers describe Neo4j as an ‘embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables’. Neo4j is the most popular graph database. Neo4j version 1.0 was released in February 2010. The community edition of the database is licensed under the free GNU General Public License (GPL) v3. The additional modules, such as online backup and high availability, are licensed under the free Affero General Public License (AGPL) v3. The database, with the additional modules, is also available under a commercial license, in a dual license model. Neo4j version 2.0 was released in December 2013. Neo4j was developed by Neo Technology, Inc., based in the San Francisco Bay Area, US and Malmö, Sweden.
Nested Association Mapping
Nested association mapping (NAM) is a technique designed by the labs of Edward Buckler, James Holland, and Michael McMullen for identifying and dissecting the genetic architecture of complex traits in corn (Zea mays). It is important to note that nested association mapping (unlike Association mapping) is a specific technique that cannot be performed outside of a specifically designed population such as the Maize NAM population.
Nested Chinese Restaurant Process
The nested Chinese restaurant process (nCRP) is a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees.
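A small simulation sketch may make this concrete: each draw follows a path down the tree, running an ordinary Chinese restaurant process at every level. The truncation depth and concentration parameter below are illustrative choices, not part of the formal definition.

```
# Toy nested CRP simulation: sampling paths through a (depth-truncated) tree.
import random
from collections import defaultdict

def crp_choice(counts, gamma):
    """Pick an existing branch with probability proportional to its count,
    or open a new branch with probability proportional to gamma."""
    total = sum(counts.values()) + gamma
    r = random.uniform(0, total)
    for branch, c in counts.items():
        r -= c
        if r <= 0:
            return branch
    return max(counts, default=-1) + 1          # new branch

def sample_path(node_counts, depth=3, gamma=1.0):
    path, node = [], ()
    for _ in range(depth):
        branch = crp_choice(node_counts[node], gamma)
        node_counts[node][branch] += 1
        node = node + (branch,)
        path.append(node)
    return path

counts = defaultdict(lambda: defaultdict(int))
for _ in range(10):
    print(sample_path(counts))
```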
Nested Dirichlet Process Mixture of Products of Multinomial Distributions
We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups. Such data arise frequently in social science settings, for example, people living in households. The model assumes that (i) each group is a member of a group-level latent class, and (ii) each unit is a member of a unit-level latent class nested within its group-level latent class. This structure allows the model to capture dependence among units in the same group. It also facilitates simultaneous modeling of variables at both group and unit levels. We develop a version of the model that assigns zero probability to groups and units with physically impossible combinations of variables. We apply the model to estimate multivariate relationships in a subset of the American Community Survey. Using the estimated model, we generate synthetic household data that could be disseminated as redacted public use files. Supplementary materials for this article are available online.
Nested Error Regression Model This paper proposes a nested error regression model with uncertain random effects, which means that the random effects in each area are expressed as a mixture of a normal distribution and a positive mass at 0. For estimation of model parameters and prediction of random effects, we consider Bayesian yet objective inference by setting improper prior distributions on the model parameters. We show, under a mild sufficient condition, that the posterior distribution is proper and the posterior variances are finite, confirming the validity of posterior inference. To generate samples from the posterior distribution, we provide a Gibbs sampling method. The full conditional distributions of the posterior are all of familiar forms, so the proposed methodology is easy to implement. This paper also addresses the problem of prediction of finite population means, and we provide a sampling-based method to tackle this issue. We compare the proposed model with the conventional nested error regression model through simulation and empirical studies.
Nested Sampling Algorithm The nested sampling algorithm is a computational approach to the problem of comparing models in Bayesian statistics, developed in 2004 by physicist John Skilling.
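The toy sketch below runs the basic nested sampling loop on a one-dimensional Gaussian likelihood with a uniform prior on [-10, 10]; the naive rejection step used to draw a replacement point is only for illustration, as real implementations use far more efficient constrained samplers.

```
# Toy nested sampling for a 1-D Gaussian likelihood, uniform prior on [-10, 10].
import math, random

def likelihood(x):
    return math.exp(-0.5 * x * x) / math.sqrt(2 * math.pi)

def sample_prior():
    return random.uniform(-10.0, 10.0)

n_live, n_iter = 100, 500
live = [sample_prior() for _ in range(n_live)]
evidence, x_prev = 0.0, 1.0                      # remaining prior mass starts at 1

for i in range(1, n_iter + 1):
    worst = min(live, key=likelihood)            # lowest-likelihood live point
    x_i = math.exp(-i / n_live)                  # expected shrinkage of prior mass
    evidence += likelihood(worst) * (x_prev - x_i)
    x_prev = x_i
    while True:                                  # naive constrained replacement
        candidate = sample_prior()
        if likelihood(candidate) > likelihood(worst):
            break
    live[live.index(worst)] = candidate

evidence += x_prev * sum(likelihood(p) for p in live) / n_live
print(evidence)   # should be near 1/20 = 0.05 for this prior and likelihood
```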
Nesterov’s Accelerated Gradient
Nesterov’s Accelerated Gradient Descent performs a simple step of gradient descent to go from x_s to y_{s+1}, and then it ‘slides’ a little bit further than y_{s+1} in the direction given by the previous point y_s. The intuition behind the algorithm is quite difficult to grasp, and unfortunately the analysis will not be very enlightening either. Nonetheless, Nesterov’s Accelerated Gradient is an optimal method (in terms of oracle complexity) for smooth convex optimization.
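A sketch of the two-step update on a smooth convex quadratic is given below; the momentum schedule is the standard lambda recursion, and the test function is an arbitrary illustrative choice.

```
# Nesterov's accelerated gradient on f(x) = 0.5 * x^T A x:
# a gradient step x_s -> y_{s+1}, then a slide past y_{s+1} along y_{s+1} - y_s.
import numpy as np

A = np.array([[3.0, 0.5], [0.5, 1.0]])
beta = np.linalg.eigvalsh(A).max()               # smoothness constant

x = np.array([5.0, -3.0])
y_prev = x.copy()
lam = 1.0
for _ in range(100):
    lam_next = (1.0 + np.sqrt(1.0 + 4.0 * lam ** 2)) / 2.0
    momentum = (lam - 1.0) / lam_next
    y = x - (A @ x) / beta                       # plain gradient step
    x = y + momentum * (y - y_prev)              # extrapolation ("slide")
    y_prev, lam = y, lam_next

print(x)   # converges towards the minimiser [0, 0]
```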
Net Reclassification Improvement
Net Reclassification Improvement (NRI), as described in the paper by Jialiang Li (2013) <doi:10.1093/biostatistics/kxs047>.
Net# Neural networks are one of the most popular machine learning algorithms today. One of the challenges when using neural networks is how to define a network topology given the variety of possible layer types, connections among them, and activation functions. Net# solves this problem by providing a succinct way to define almost any neural network architecture in a descriptive, easy-to-read format. This post provides a short tutorial for building a neural network using the Net# language to classify images of handwritten numeric digits in Microsoft Azure Machine Learning.
Net2Vec In an effort to understand the meaning of the intermediate representations captured by deep networks, recent papers have tried to associate specific semantic concepts to individual neural network filter responses, where interesting correlations are often found, largely by focusing on extremal filter responses. In this paper, we show that this approach can favor easy-to-interpret cases that are not necessarily representative of the average behavior of a representation. A more realistic but harder-to-study hypothesis is that semantic representations are distributed, and thus filters must be studied in conjunction. In order to investigate this idea while enabling systematic visualization and quantification of multiple filter responses, we introduce the Net2Vec framework, in which semantic concepts are mapped to vectorial embeddings based on corresponding filter responses. By studying such embeddings, we are able to show that (1) in most cases, multiple filters are required to code for a concept, (2) filters are often not concept specific and help encode multiple concepts, and (3) compared to single filter activations, filter embeddings are able to better characterize the meaning of a representation and its relationship to other concepts.
netinf Given a set of events that spread between a set of nodes the algorithm infers the most likely stable diffusion network that is underlying the diffusion process.
Network Analysis Network analysis is a quantitative methodology for studying properties related to connectivity and distances in graphs, with diverse applications like citation indexing and information retrieval on the Web.
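As a small, illustrative example of such connectivity and distance computations, the snippet below uses NetworkX on a made-up graph.

```
# Basic connectivity and distance measures on a toy graph with NetworkX.
import networkx as nx

G = nx.Graph()
G.add_edges_from([("A", "B"), ("B", "C"), ("C", "D"), ("A", "C"), ("D", "E")])

print(nx.shortest_path(G, "A", "E"))      # distance-oriented query
print(nx.degree_centrality(G))            # connectivity-oriented measure
print(nx.density(G))
print(list(nx.connected_components(G)))
```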
“Network Theory”
“Social Network Analysis”
A Short Course on Network Analysis
Network Analysis for Wikipedia
Network Based Diffusion Analysis
Social learning has been documented in a wide diversity of animals. In free-living animals, however, it has been difficult to discern whether animals learn socially by observing other group members or asocially by acquiring a new behaviour independently. We addressed this challenge by developing network-based diffusion analysis (NBDA), which analyses the spread of traits through animal groups and takes into account that social network structure directs social learning opportunities. NBDA fits agent-based models of social and asocial learning to the observed data using maximum-likelihood estimation. The underlying learning mechanism can then be identified using model selection based on the Akaike information criterion.
Network for Adversary Generation
Adversarial perturbations can pose a serious threat for deploying machine learning systems. Recent works have shown the existence of image-agnostic perturbations that can fool classifiers over most natural images. Existing methods present optimization approaches that solve for a fooling objective with an imperceptibility constraint to craft the perturbations. However, for a given classifier, they generate one perturbation at a time, which is a single instance from the manifold of adversarial perturbations. Also, in order to build robust models, it is essential to explore the manifold of adversarial perturbations. In this paper, we propose, for the first time, a generative approach to model the distribution of adversarial perturbations. The architecture of the proposed model is inspired from that of GANs and is trained using fooling and diversity objectives. Our trained generator network attempts to capture the distribution of adversarial perturbations for a given classifier and readily generates a wide variety of such perturbations. Our experimental evaluation demonstrates that perturbations crafted by our model (i) achieve state-of-the-art fooling rates, (ii) exhibit wide variety and (iii) deliver excellent cross model generalizability. Our work can be deemed as an important step in the process of inferring about the complex manifolds of adversarial perturbations.
Network In Network
We propose a novel deep network structure called ‘Network In Network’ (NIN) to enhance model discriminability for local patches within the receptive field. The conventional convolutional layer uses linear filters followed by a nonlinear activation function to scan the input. Instead, we build micro neural networks with more complex structures to abstract the data within the receptive field. We instantiate the micro neural network with a multilayer perceptron, which is a potent function approximator. The feature maps are obtained by sliding the micro networks over the input in a similar manner as CNN; they are then fed into the next layer. Deep NIN can be implemented by stacking multiple of the above described structures. With enhanced local modeling via the micro network, we are able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers. We demonstrated the state-of-the-art classification performances with NIN on CIFAR-10 and CIFAR-100, and reasonable performances on SVHN and MNIST datasets.
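A condensed PyTorch sketch of the idea follows: 1x1 convolutions play the role of the per-patch micro MLP, and global average pooling replaces fully connected classification layers. The layer sizes are illustrative and are not the exact configuration reported in the paper.

```
# Sketch of an NIN-style network: mlpconv blocks + global average pooling.
import torch
import torch.nn as nn

class MLPConv(nn.Module):
    def __init__(self, in_ch, out_ch, k):
        super().__init__()
        self.block = nn.Sequential(
            nn.Conv2d(in_ch, out_ch, k, padding=k // 2), nn.ReLU(inplace=True),
            nn.Conv2d(out_ch, out_ch, 1), nn.ReLU(inplace=True),   # micro MLP
            nn.Conv2d(out_ch, out_ch, 1), nn.ReLU(inplace=True),
        )

    def forward(self, x):
        return self.block(x)

class TinyNIN(nn.Module):
    def __init__(self, num_classes=10):
        super().__init__()
        self.features = nn.Sequential(
            MLPConv(3, 64, 5), nn.MaxPool2d(2),
            MLPConv(64, 128, 5), nn.MaxPool2d(2),
            MLPConv(128, num_classes, 3),          # one feature map per class
        )
        self.gap = nn.AdaptiveAvgPool2d(1)         # global average pooling

    def forward(self, x):
        return self.gap(self.features(x)).flatten(1)

print(TinyNIN()(torch.randn(2, 3, 32, 32)).shape)  # torch.Size([2, 10])
```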
Network Lasso
A recently proposed learning algorithm for massive network-structured data sets (big data over networks) is the network Lasso (nLasso), which extends the well-known Lasso estimator from sparse models to network-structured datasets. Efficient implementations of the nLasso have been presented using modern convex optimization methods.
Network Mapping Network mapping is the study of the physical connectivity of networks. Internet mapping is the study of the physical connectivity of the Internet. Network mapping discovers the devices on the network and their connectivity. It is not to be confused with network discovery or network enumerating which discovers devices on the network and their characteristics such as (operating system, open ports, listening network services, etc.). The field of automated network mapping has taken on greater importance as networks become more dynamic and complex in nature.
Network Maximal Correlation
We introduce Network Maximal Correlation (NMC) as a multivariate measure of nonlinear association among random variables. NMC is defined via an optimization that infers (non-trivial) transformations of variables by maximizing aggregate inner products between transformed variables. We characterize a solution of the NMC optimization using geometric properties of Hilbert spaces for finite discrete and jointly Gaussian random variables. For finite discrete variables, we propose an algorithm based on alternating conditional expectation to determine NMC. We also show that empirically computed NMC converges to NMC exponentially fast in sample size. For jointly Gaussian variables, we show that under some conditions the NMC optimization is an instance of the Max-Cut problem. We then illustrate an application of NMC and multiple MC in inference of graphical model for bijective, possibly non-monotone, functions of jointly Gaussian variables generalizing the copula setup developed by Liu et al. Finally, we illustrate NMC’s utility in a real data application of learning nonlinear dependencies among genes in a cancer dataset.
Network Meta-Analysis I present methods for assessing the relative effectiveness of two treatments when they have not been compared directly in a randomized trial but have each been compared to other treatments. These network meta-analysis techniques allow estimation of both heterogeneity in the effect of any given treatment and inconsistency (‘incoherence’) in the evidence from different pairs of treatments.
Network Scale Up Method
The network scale-up method was developed by a team of researchers under grants from the U. S. National Science Foundation to H. Russell Bernard and Christopher McCarty at the University of Florida. The method can be applied now to estimating the size of hard-to-count (or impossible-to-count) populations but the method is a work in progress. Each new application provides data for improving the validity and accuracy of the estimates. As with the development of the model, these improvements require the efforts of survey researchers, mathematicians, and ethnographers. The network scale-up method was developed in conjunction with our team’s research on the rules governing who people know and how they know them. The particular list of people who people come to know in a lifetime may appear random, but the rules governing who we come to know are surely not random. One basic component of social structure is the number of people whom people know.
Network Science Network science is an interdisciplinary academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as ‘the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena.’
Network Sketching Convolutional neural networks (CNNs) with deep architectures have substantially advanced the state-of-the-art in computer vision tasks. However, deep networks are typically resource-intensive and thus difficult to deploy on mobile devices. Recently, CNNs with binary weights have shown compelling efficiency to the community, whereas the accuracy of such models is usually unsatisfactory in practice. In this paper, we introduce network sketching as a novel technique of pursuing binary-weight CNNs, targeting at more faithful inference and better trade-off for practical applications. Our basic idea is to exploit binary structure directly in pre-trained filter banks and produce binary-weight models via tensor expansion. The whole process can be treated as a coarse-to-fine model approximation, akin to the pencil drawing steps of outlining and shading. To further speed up the generated models, namely the sketches, we also propose an associative implementation of binary tensor convolutions. Experimental results demonstrate that a proper sketch of AlexNet (or ResNet) outperforms the existing binary-weight models by large margins on the ImageNet large scale classification task, while requiring only slightly more memory for the network parameters.
Network Theory In computer and network science, network theory is the study of graphs as a representation of either symmetric relations or, more generally, of asymmetric relations between discrete objects. Network theory is a part of graph theory. It has applications in many disciplines including statistical physics, particle physics, computer science, electrical engineering, biology, economics, operations research, and sociology. Applications of network theory include logistical networks, the World Wide Web, Internet, gene regulatory networks, metabolic networks, social networks, epistemological networks, etc; see List of network theory topics for more examples. Euler’s solution of the Seven Bridges of Königsberg problem is considered to be the first true proof in the theory of networks.
Network Vector We propose a neural embedding algorithm called Network Vector, which learns distributed representations of nodes and the entire networks simultaneously. By embedding networks in a low-dimensional space, the algorithm allows us to compare networks in terms of structural similarity and to solve outstanding predictive problems. Unlike alternative approaches that focus on node level features, we learn a continuous global vector that captures each node’s global context by maximizing the predictive likelihood of random walk paths in the network. Our algorithm is scalable to real world graphs with many nodes. We evaluate our algorithm on datasets from diverse domains, and compare it with state-of-the-art techniques in node classification, role discovery and concept analogy tasks. The empirical results show the effectiveness and the efficiency of our algorithm.
Neumann Optimizer Progress in deep learning is slowed by the days or weeks it takes to train large models. The natural solution of using more hardware is limited by diminishing returns, and leads to inefficient use of additional resources. In this paper, we present a large batch, stochastic optimization algorithm that is both faster than widely used algorithms for fixed amounts of computation, and also scales up substantially better as more computational resources become available. Our algorithm implicitly computes the inverse Hessian of each mini-batch to produce descent directions; we do so without either an explicit approximation to the Hessian or Hessian-vector products. We demonstrate the effectiveness of our algorithm by successfully training large ImageNet models (Inception-V3, Resnet-50, Resnet-101 and Inception-Resnet-V2) with mini-batch sizes of up to 32000 with no loss in validation error relative to current baselines, and no increase in the total number of steps. At smaller mini-batch sizes, our optimizer improves the validation error in these models by 0.8-0.9%. Alternatively, we can trade off this accuracy to reduce the number of training steps needed by roughly 10-30%. Our work is practical and easily usable by others — only one hyperparameter (learning rate) needs tuning, and furthermore, the algorithm is as computationally cheap as the commonly used Adam optimizer.
Neural Architecture Search
Neural Architecture Search (NAS) is a Reinforcement Learning approach that has been proposed to automate architecture design. NAS has been successfully applied to generate Neural Networks that rival the best human-designed architectures. However, NAS requires sampling, constructing, and training hundreds to thousands of models to achieve well-performing architectures. This procedure needs to be executed from scratch for each new task. The application of NAS to a wide set of tasks currently lacks a way to transfer generalizable knowledge across tasks.
Neural Block Sampling Efficient Monte Carlo inference often requires manual construction of model-specific proposals. We propose an approach to automated proposal construction by training neural networks to provide fast approximations to block Gibbs conditionals. The learned proposals generalize to occurrences of common structural motifs both within a given model and across models, allowing for the construction of a library of learned inference primitives that can accelerate inference on unseen models with no model-specific training required. We explore several applications including open-universe Gaussian mixture models, in which our learned proposals outperform a hand-tuned sampler, and a real-world named entity recognition task, in which our sampler’s ability to escape local modes yields higher final F1 scores than single-site Gibbs.
Neural Collaborative Filtering
In recent years, deep neural networks have yielded immense success on speech recognition, computer vision and natural language processing. However, the exploration of deep neural networks on recommender systems has received relatively less scrutiny. In this work, we strive to develop techniques based on neural networks to tackle the key problem in recommendation — collaborative filtering — on the basis of implicit feedback. Although some recent work has employed deep learning for recommendation, they primarily used it to model auxiliary information, such as textual descriptions of items and acoustic features of music. When it comes to modeling the key factor in collaborative filtering — the interaction between user and item features, they still resorted to matrix factorization and applied an inner product on the latent features of users and items. By replacing the inner product with a neural architecture that can learn an arbitrary function from data, we present a general framework named NCF, short for Neural network-based Collaborative Filtering. NCF is generic and can express and generalize matrix factorization under its framework. To supercharge NCF modelling with non-linearities, we propose to leverage a multi-layer perceptron to learn the user-item interaction function. Extensive experiments on two real-world datasets show significant improvements of our proposed NCF framework over the state-of-the-art methods. Empirical evidence shows that using deeper layers of neural networks offers better recommendation performance.
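The core modelling idea, replacing the inner product of user and item embeddings with a learned MLP, can be sketched in PyTorch as below; the embedding and layer sizes are illustrative, and this omits the GMF branch of the full framework.

```
# Minimal NCF-style model: MLP over concatenated user/item embeddings.
import torch
import torch.nn as nn

class SimpleNCF(nn.Module):
    def __init__(self, n_users, n_items, dim=32):
        super().__init__()
        self.user_emb = nn.Embedding(n_users, dim)
        self.item_emb = nn.Embedding(n_items, dim)
        self.mlp = nn.Sequential(
            nn.Linear(2 * dim, 64), nn.ReLU(),
            nn.Linear(64, 32), nn.ReLU(),
            nn.Linear(32, 1),
        )

    def forward(self, users, items):
        x = torch.cat([self.user_emb(users), self.item_emb(items)], dim=-1)
        return torch.sigmoid(self.mlp(x)).squeeze(-1)   # interaction probability

model = SimpleNCF(n_users=1000, n_items=500)
print(model(torch.tensor([0, 1]), torch.tensor([10, 42])))
```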
Neural Component Analysis
Principal component analysis (PCA) is largely adopted for chemical process monitoring and numerous PCA-based systems have been developed to solve various fault detection and diagnosis problems. Since PCA-based methods assume that the monitored process is linear, nonlinear PCA models, such as autoencoder models and kernel principal component analysis (KPCA), have been proposed and applied to nonlinear process monitoring. However, KPCA-based methods need to perform eigen-decomposition (ED) on the kernel Gram matrix whose dimensions depend on the number of training data. Moreover, prefixed kernel parameters cannot be most effective for different faults which may need different parameters to maximize their respective detection performances. Autoencoder models lack the consideration of orthogonal constraints, which are crucial for PCA-based algorithms. To address these problems, this paper proposes a novel nonlinear method, called neural component analysis (NCA), which intends to train a feedforward neural network with orthogonal constraints such as those used in PCA. NCA can adaptively learn its parameters through backpropagation and the dimensionality of the nonlinear features has no relationship with the number of training samples. Extensive experimental results on the Tennessee Eastman (TE) benchmark process show the superiority of NCA in terms of missed detection rate (MDR) and false alarm rate (FAR). The source code of NCA can be found in https://…/Neural-Component-Analysis.git.
Neural Decision Trees In this paper we propose a synergistic melting of neural networks and decision trees into a deep hashing neural network (HNN) having a modeling capability exponential with respect to its number of neurons. We first derive a soft decision tree named neural decision tree allowing the optimization of arbitrary decision function at each split node. We then rewrite this soft space partitioning as a new kind of neural network layer, namely the hashing layer (HL), which can be seen as a generalization of the known soft-max layer. This HL can easily replace the standard last layer of ANN in any known network topology and thus can be used after a convolutional or recurrent neural network for example. We present the modeling capacity of this deep hashing function on small datasets where one can reach at least equally good results as standard neural networks by diminishing the number of output neurons. Finally, we show that for the case where the number of output neurons is large, the neural network can mitigate the absence of linear decision boundaries by learning for each difficult class a collection of not necessarily connected sub-regions of the space leading to more flexible decision surfaces. Finally, the HNN can be seen as a deep locality sensitive hashing function which can be trained in a supervised or unsupervised setting as we will demonstrate for classification and regression problems.
Neural Decomposition
We present a neural network technique for the analysis and extrapolation of time-series data called Neural Decomposition (ND). Units with a sinusoidal activation function are used to perform a Fourier-like decomposition of training samples into a sum of sinusoids, augmented by units with nonperiodic activation functions to capture linear trends and other nonperiodic components. We show how careful weight initialization can be combined with regularization to form a simple model that generalizes well. Our method generalizes effectively on the Mackey-Glass series, a dataset of unemployment rates as reported by the U.S. Department of Labor Statistics, a time-series of monthly international airline passengers, the monthly ozone concentration in downtown Los Angeles, and an unevenly sampled time-series of oxygen isotope measurements from a cave in north India. We find that ND outperforms popular time-series forecasting techniques including LSTM, echo state networks, ARIMA, SARIMA, SVR with a radial basis function, and Gashler and Ashmore’s model.
Neural Hawkes Process Many events occur in the world. Some event types are stochastically excited or inhibited—in the sense of having their probabilities elevated or decreased—by patterns in the sequence of previous events. Discovering such patterns can help us predict which type of event will happen next and when. Learning such structure should benefit various applications, including medical prognosis, consumer behavior, and social media activity prediction. We propose to model streams of discrete events in continuous time, by constructing a neurally self-modulating multivariate point process. This generative model allows past events to influence the future in complex ways, by conditioning future event intensities on the hidden state of a recurrent neural network that has consumed the stream of past events. We evaluate our model on multiple datasets and show that it significantly outperforms other strong baselines.
Neural Machine Translation
Neural machine translation (NMT) is the approach to machine translation in which a large neural network is trained to maximize translation performance. It is a radical departure from the phrase-based statistical translation approaches, in which a translation system consists of subcomponents that are separately optimized. The artificial neural network (ANN) is a model inspired by the functional aspects and structure of the brain’s biological neural networks. With use of ANN, it is possible to execute a number of tasks, such as classification, clustering, and prediction, using machine learning techniques like supervised or reinforced learning to learn or adjust net connections. A bidirectional recurrent neural network (RNN), known as an encoder, is used by the neural network to encode a source sentence for a second RNN, known as a decoder, that is used to predict words in the target language. NMT models are inspired by deep representation learning. They require only a fraction of the memory needed by traditional statistical machine translation (SMT) models. Furthermore, unlike conventional translation systems, each and every component of the neural translation model is trained jointly to maximize the translation performance. When a new neural network is created, it is trained for certain domains or applications. Once an automatic learning mechanism is established, the network practices. With time it starts operating according to its own judgment, turning into an ‘expert’.
Neural Network Synthesis Tool
Neural networks (NNs) have begun to have a pervasive impact on various applications of machine learning. However, the problem of finding an optimal NN architecture for large applications has remained open for several decades. Conventional approaches search for the optimal NN architecture through extensive trial-and-error. Such a procedure is quite inefficient. In addition, the generated NN architectures incur substantial redundancy. To address these problems, we propose an NN synthesis tool (NeST) that automatically generates very compact architectures for a given dataset. NeST starts with a seed NN architecture. It iteratively tunes the architecture with gradient-based growth and magnitude-based pruning of neurons and connections. Our experimental results show that NeST yields accurate yet very compact NNs with a wide range of seed architecture selection. For example, for the LeNet-300-100 (LeNet-5) NN architecture derived from the MNIST dataset, we reduce network parameters by 34.1x (74.3x) and floating-point operations (FLOPs) by 35.8x (43.7x). For the AlexNet NN architecture derived from the ImageNet dataset, we reduce network parameters by 15.7x and FLOPs by 4.6x. All these results are the current state-of-the-art for these architectures.
Neural Networks / Artificial Neural Networks
In computer science and related fields, artificial neural networks (ANNs) are computational models inspired by an animal’s central nervous systems (in particular the brain) which is capable of machine learning as well as pattern recognition. Artificial neural networks are generally presented as systems of interconnected “neurons” which can compute values from inputs.
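To make the idea of interconnected neurons computing values from inputs concrete, here is a minimal two-layer network trained on XOR with plain NumPy; the architecture, learning rate, and iteration count are arbitrary illustrative choices.

```
# Tiny feedforward network with one hidden layer, trained on XOR by backprop.
import numpy as np

rng = np.random.default_rng(0)
X = np.array([[0, 0], [0, 1], [1, 0], [1, 1]], dtype=float)
y = np.array([[0], [1], [1], [0]], dtype=float)

W1, b1 = rng.normal(size=(2, 8)), np.zeros(8)
W2, b2 = rng.normal(size=(8, 1)), np.zeros(1)
sigmoid = lambda z: 1.0 / (1.0 + np.exp(-z))

for _ in range(10000):
    h = sigmoid(X @ W1 + b1)                 # hidden neurons
    out = sigmoid(h @ W2 + b2)               # output neuron
    d_out = (out - y) * out * (1 - out)      # backpropagate squared error
    d_h = (d_out @ W2.T) * h * (1 - h)
    W2 -= 0.5 * h.T @ d_out;  b2 -= 0.5 * d_out.sum(axis=0)
    W1 -= 0.5 * X.T @ d_h;    b1 -= 0.5 * d_h.sum(axis=0)

print(out.round(2).ravel())   # should approach [0, 1, 1, 0]
```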
Neural Programmer Deep neural networks have achieved impressive supervised classification performance in many tasks including image recognition, speech recognition, and sequence to sequence learning. However, this success has not been translated to applications like question answering that may involve complex arithmetic and logic reasoning. A major limitation of these models is in their inability to learn even simple arithmetic and logic operations. For example, it has been shown that neural networks fail to learn to add two binary numbers reliably. In this work, we propose Neural Programmer, an end-to-end differentiable neural network augmented with a small set of basic arithmetic and logic operations. Neural Programmer can call these augmented operations over several steps, thereby inducing compositional programs that are more complex than the built-in operations. The model learns from a weak supervision signal which is the result of execution of the correct program, hence it does not require expensive annotation of the correct program itself. The decisions of what operations to call, and what data segments to apply to are inferred by Neural Programmer. Such decisions, during training, are done in a differentiable fashion so that the entire network can be trained jointly by gradient descent. We find that training the model is difficult, but it can be greatly improved by adding random noise to the gradient. On a fairly complex synthetic table-comprehension dataset, traditional recurrent networks and attentional models perform poorly while Neural Programmer typically obtains nearly perfect accuracy.
Neural Reasoner We propose Neural Reasoner, a framework for neural network-based reasoning over natural language sentences. Given a question, Neural Reasoner can infer over multiple supporting facts and find an answer to the question in specific forms. Neural Reasoner has 1) a specific interaction-pooling mechanism, allowing it to examine multiple facts, and 2) a deep architecture, allowing it to model the complicated logical relations in reasoning tasks. Assuming no particular structure exists in the question and facts, Neural Reasoner is able to accommodate different types of reasoning and different forms of language expressions. Despite the model complexity, Neural Reasoner can still be trained effectively in an end-to-end manner. Our empirical studies show that Neural Reasoner can outperform existing neural reasoning systems with remarkable margins on two difficult artificial tasks (Positional Reasoning and Path Finding) proposed in. For example, it improves the accuracy on Path Finding(10K) from 33.4% to over 98%.
Neural Semantic Encoders
We present a memory augmented neural network for natural language understanding: Neural Semantic Encoders (NSE). NSE has a variable sized encoding memory that evolves over time and maintains the understanding of input sequences through read, compose and write operations. NSE can access multiple and shared memories depending on the complexity of a task. We demonstrated the effectiveness and the flexibility of NSE on five different natural language tasks: natural language inference, question answering, sentence classification, document sentiment analysis and machine translation, where NSE achieved state-of-the-art performance when evaluated on publicly available benchmarks. For example, our shared-memory model showed an encouraging result on neural machine translation, improving an attention-based baseline by approximately 1.0 BLEU.
Neural SLAM We present an approach for agents to learn representations of a global map from sensor data, to aid their exploration in new environments. To achieve this, we embed procedures mimicking that of traditional Simultaneous Localization and Mapping (SLAM) into the soft attention based addressing of external memory architectures, in which the external memory acts as an internal representation of the environment. This structure encourages the evolution of SLAM-like behaviors inside a completely differentiable deep neural network. We show that this approach can help reinforcement learning agents to successfully explore new environments where long-term memory is essential. We validate our approach in both challenging grid-world environments and preliminary Gazebo experiments.
Neural SPARQL Machine In the last years, the Linked Data Cloud has achieved a size of more than 100 billion facts pertaining to a multitude of domains. However, accessing this information has been significantly challenging for lay users. Approaches to problems such as Question Answering on Linked Data and Link Discovery have notably played a role in increasing information access. These approaches are often based on handcrafted and/or statistical models derived from data observation. Recently, Deep Learning architectures based on Neural Networks called seq2seq have shown to achieve state-of-the-art results at translating sequences into sequences. In this direction, we propose Neural SPARQL Machines, end-to-end deep architectures to translate any natural language expression into sentences encoding SPARQL queries. Our preliminary results, restricted on selected DBpedia classes, show that Neural SPARQL Machines are a promising approach for Question Answering on Linked Data, as they can deal with known problems such as vocabulary mismatch and perform graph pattern composition.
Neural Style Transfer The recent work of Gatys et al. demonstrated the power of Convolutional Neural Networks (CNN) in creating fantastic artistic imagery by separating and recombining the image content and style. This process of using CNNs to migrate the semantic content of one image to different styles is referred to as Neural Style Transfer.
Neural Task Programming
In this work, we propose a novel robot learning framework called Neural Task Programming (NTP), which bridges the idea of few-shot learning from demonstration and neural program induction. NTP takes as input a task specification (e.g., video demonstration of a task) and recursively decomposes it into finer sub-task specifications. These specifications are fed to a hierarchical neural program, where bottom-level programs are callable subroutines that interact with the environment. We validate our method in three robot manipulation tasks. NTP achieves strong generalization across sequential tasks that exhibit hierarchical and compositional structures. The experimental results show that NTP learns to generalize well towards unseen tasks with increasing lengths, variable topologies, and changing objectives.
Neural Tensor Network
The Neural Tensor Network (NTN) replaces a standard linear neural network layer with a bilinear tensor layer that directly relates two entity vectors across multiple dimensions. The model computes a score of how likely it is that two entities are in a certain relationship.
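A NumPy sketch of the scoring function is shown below: score = u^T tanh(e1^T W[1:k] e2 + V [e1; e2] + b), where W is the bilinear tensor. The dimensions and random parameters are illustrative only.

```
# NTN-style relation score for a pair of entity vectors.
import numpy as np

d, k = 4, 3                                   # entity dimension, tensor slices
rng = np.random.default_rng(0)
W = rng.normal(size=(k, d, d))                # bilinear tensor
V = rng.normal(size=(k, 2 * d))               # standard linear layer
b = rng.normal(size=k)
u = rng.normal(size=k)

def ntn_score(e1, e2):
    bilinear = np.einsum("i,kij,j->k", e1, W, e2)      # e1^T W_slice e2, per slice
    linear = V @ np.concatenate([e1, e2]) + b
    return u @ np.tanh(bilinear + linear)

e1, e2 = rng.normal(size=d), rng.normal(size=d)
print(ntn_score(e1, e2))                      # higher score = more plausible relation
```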
Neural Turing Machines
We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples. Neural Turing Machines are fully differentiable computers that use backpropagation to learn their own programming.
Neural Vector Space Model
We propose the Neural Vector Space Model (NVSM), a method that learns representations of documents in an unsupervised manner for news article retrieval. In the NVSM paradigm, we learn low-dimensional representations of words and documents from scratch using gradient descent and rank documents according to their similarity with query representations that are composed from word representations. We show that NVSM performs better at document ranking than existing latent semantic vector space methods. The addition of NVSM to a mixture of lexical language models and a state-of-the-art baseline vector space model yields a statistically significant increase in retrieval effectiveness. Consequently, NVSM adds a complementary relevance signal. Next to semantic matching, we find that NVSM performs well in cases where lexical matching is needed. NVSM learns a notion of term specificity directly from the document collection without feature engineering. We also show that NVSM learns regularities related to Luhn significance. Finally, we give advice on how to deploy NVSM in situations where model selection (e.g., cross-validation) is infeasible. We find that an unsupervised ensemble of multiple models trained with different hyperparameter values performs better than a single cross-validated model. Therefore, NVSM can safely be used for ranking documents without supervised relevance judgments.
Neuroevolution The quest to evolve neural networks through evolutionary algorithms.
Neuro-Fuzzy In the field of artificial intelligence, neuro-fuzzy refers to combinations of artificial neural networks and fuzzy logic. Neuro-fuzzy was proposed by J. S. R. Jang. Neuro-fuzzy hybridization results in a hybrid intelligent system that synergizes these two techniques by combining the human-like reasoning style of fuzzy systems with the learning and connectionist structure of neural networks. Neuro-fuzzy hybridization is widely termed as Fuzzy Neural Network (FNN) or Neuro-Fuzzy System (NFS) in the literature. Neuro-fuzzy system (the more popular term is used henceforth) incorporates the human-like reasoning style of fuzzy systems through the use of fuzzy sets and a linguistic model consisting of a set of IF-THEN fuzzy rules. The main strength of neuro-fuzzy systems is that they are universal approximators with the ability to solicit interpretable IF-THEN rules. The strength of neuro-fuzzy systems involves two contradictory requirements in fuzzy modeling: interpretability versus accuracy. In practice, one of the two properties prevails. The neuro-fuzzy in fuzzy modeling research field is divided into two areas: linguistic fuzzy modeling that is focused on interpretability, mainly the Mamdani model; and precise fuzzy modeling that is focused on accuracy, mainly the Takagi-Sugeno-Kang (TSK) model. Although generally assumed to be the realization of a fuzzy system through connectionist networks, this term is also used to describe some other configurations including:
• Deriving fuzzy rules from trained RBF networks.
• Fuzzy logic based tuning of neural network training parameters.
• Fuzzy logic criteria for increasing a network size.
• Realising fuzzy membership function through clustering algorithms in unsupervised learning in SOMs and neural networks.
• Representing fuzzification, fuzzy inference and defuzzification through multi-layers feed-forward connectionist networks.
It must be pointed out that interpretability of the Mamdani-type neuro-fuzzy systems can be lost. To improve the interpretability of neuro-fuzzy systems, certain measures must be taken, wherein important aspects of interpretability of neuro-fuzzy systems are also discussed. A recent research line addresses the data stream mining case, where neuro-fuzzy systems are sequentially updated with new incoming samples on demand and on-the-fly. Thereby, system updates do not only include a recursive adaptation of model parameters, but also a dynamic evolution and pruning of model components (neurons, rules), in order to handle concept drift and dynamically changing system behavior adequately and to keep the systems/models ‘up-to-date’ anytime. Comprehensive surveys of various evolving neuro-fuzzy systems approaches can be found in and.
Neuro-Fuzzy System Modern neuro-fuzzy systems are usually represented as special multilayer feedforward neural networks (see for example models like ANFIS, FuNe, Fuzzy RuleNet, GARIC, or NEFCLASS and NEFCON). However, fuzzifications of other neural network architectures are also considered, for example self-organizing feature maps. In those neuro-fuzzy networks, connection weights and propagation and activation functions differ from common neural networks. Although there are a lot of different approaches, we usually use the term neuro-fuzzy system for approaches which display the following properties:
• A neuro-fuzzy system is based on a fuzzy system which is trained by a learning algorithm derived from neural network theory. The (heuristical) learning procedure operates on local information, and causes only local modifications in the underlying fuzzy system.
• A neuro-fuzzy system can be viewed as a 3-layer feedforward neural network. The first layer represents input variables, the middle (hidden) layer represents fuzzy rules and the third layer represents output variables. Fuzzy sets are encoded as (fuzzy) connection weights. It is not necessary to represent a fuzzy system like this to apply a learning algorithm to it. However, it can be convenient, because it represents the data flow of input processing and learning within the model. Remark: Sometimes a 5-layer architecture is used, where the fuzzy sets are represented in the units of the second and fourth layer.
• A neuro-fuzzy system can always (i.e., before, during and after learning) be interpreted as a system of fuzzy rules. It is possible to create the system from scratch out of training data, just as it is possible to initialize it with prior knowledge in the form of fuzzy rules. Remark: Not all neuro-fuzzy models specify learning procedures for fuzzy rule creation.
• The learning procedure of a neuro-fuzzy system takes the semantic properties of the underlying fuzzy system into account. This results in constraints on the possible modifications applicable to the system parameters. Remark: Not all neuro-fuzzy approaches have this property.
• A neuro-fuzzy system approximates an $n$-dimensional (unknown) function that is partially defined by the training data. The fuzzy rules encoded within the system represent vague samples, and can be viewed as prototypes of the training data. A neuro-fuzzy system should not be seen as a kind of (fuzzy) expert system, and it has nothing to do with fuzzy logic in the narrow sense.
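As a concrete, if simplified, illustration of the layered view described above, the following sketch implements the forward pass of a zero-order Takagi-Sugeno fuzzy system with Gaussian membership functions. The rule centres, widths and consequents are exactly the kind of parameters a neuro-fuzzy learning procedure would tune; the values below are hypothetical and chosen only for the example.

```python
import numpy as np

def gaussian_mf(x, centres, sigmas):
    # Membership degree of input x in each Gaussian fuzzy set (one set per rule).
    return np.exp(-0.5 * ((x - centres) / sigmas) ** 2)

def tsk_forward(x, centres, sigmas, consequents):
    # Zero-order Takagi-Sugeno inference: rule firing strengths are normalised
    # and used to weight the crisp rule consequents ("IF x is A_r THEN y = b_r").
    w = gaussian_mf(x, centres, sigmas)        # layer 2: rule activation
    w_norm = w / (w.sum() + 1e-12)             # layer 3: normalisation
    return float(np.dot(w_norm, consequents))  # layer 4/5: weighted output

# Hypothetical three-rule system covering "low", "medium" and "high" inputs.
centres = np.array([0.0, 0.5, 1.0])
sigmas = np.array([0.2, 0.2, 0.2])
consequents = np.array([0.0, 1.0, 0.0])
print(tsk_forward(0.45, centres, sigmas, consequents))
```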
Neuro-Index The article describes a new data structure called a neuro-index, an alternative to well-known file indexes. The neuro-index is fundamentally different in that it stores weight coefficients in a neural network rather than references of the form ‘keyword-position in a file’.
Neuroinformatics Neuroinformatics is a research field concerned with the organization of neuroscience data by the application of computational models and analytical tools. These areas of research are important for the integration and analysis of increasingly large-volume, high-dimensional, and fine-grain experimental data. Neuroinformaticians provide computational tools and mathematical models, and create interoperable databases for clinicians and research scientists. Neuroscience is a heterogeneous field, consisting of many and various sub-disciplines (e.g., Cognitive Psychology, Behavioral Neuroscience, and Behavioral Genetics). In order for our understanding of the brain to continue to deepen, it is necessary that these sub-disciplines are able to share data and findings in a meaningful way; neuroinformaticians facilitate this. Neuroinformatics stands at the intersection of neuroscience and information science. Other fields, like genomics, have demonstrated the effectiveness of freely distributed databases and the application of theoretical and computational models for solving complex problems. In neuroinformatics, such facilities allow researchers to more easily confirm their working theories quantitatively through computational modeling. Additionally, neuroinformatics fosters collaborative research, an important factor behind the field’s interest in studying the multi-level complexity of the brain. There are three main directions in which neuroinformatics is applied:
1. the development of tools and databases for management and sharing of neuroscience data at all levels of analysis,
2. the development of tools for analyzing and modeling neuroscience data,
3. the development of computational models of the nervous system and neural processes.
Newick Format In mathematics, Newick tree format (or Newick notation or New Hampshire tree format) is a way of representing graph-theoretical trees with edge lengths using parentheses and commas. It was adopted by James Archie, William H. E. Day, Joseph Felsenstein, Wayne Maddison, Christopher Meacham, F. James Rohlf, and David Swofford, at two meetings in 1986, the second of which was at Newick’s restaurant in Dover, New Hampshire, US. The adopted format is a generalization of the format developed by Meacham in 1984 for the first tree-drawing programs in Felsenstein’s PHYLIP package.
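For illustration, the classic example string "(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);" encodes a rooted tree with four leaves and one internal node, with branch lengths given after the colons. The sketch below reads it with Biopython's Phylo module (an assumed dependency; any Newick-aware library would do):

```python
from io import StringIO
from Bio import Phylo  # Biopython, assumed to be installed

newick = "(A:0.1,B:0.2,(C:0.3,D:0.4):0.5);"  # parentheses nest subtrees; ';' ends the tree
tree = Phylo.read(StringIO(newick), "newick")

Phylo.draw_ascii(tree)                     # quick text rendering of the tree
for leaf in tree.get_terminals():
    print(leaf.name, tree.distance(leaf))  # root-to-leaf path length from the edge lengths
```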
NewSQL NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance of NoSQL systems for online transaction processing (OLTP) read-write workloads while still maintaining the ACID guarantees of a traditional database system.
Neyman-Pearson Classification
N-Gram In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus. An n-gram of size 1 is referred to as a “unigram”; size 2 is a “bigram” (or, less commonly, a “digram”); size 3 is a “trigram”. Larger sizes are sometimes referred to by the value of n, e.g., “four-gram”, “five-gram”, and so on.
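A minimal sketch of word-level n-gram extraction; the same sliding-window idea applies to characters, phonemes or base pairs:

```python
def ngrams(tokens, n):
    # Contiguous n-grams as tuples, obtained by sliding a window of size n.
    return [tuple(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]

tokens = "to be or not to be".split()
print(ngrams(tokens, 1))  # unigrams
print(ngrams(tokens, 2))  # bigrams, e.g. ('to', 'be'), ('be', 'or'), ...
print(ngrams(tokens, 3))  # trigrams
```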
N-Gram Machine
Deep neural networks (DNNs) have had great success on NLP tasks such as language modeling, machine translation and certain question answering (QA) tasks. However, this success is limited for more knowledge-intensive tasks such as QA from a big corpus. Existing end-to-end deep QA models (Miller et al., 2016; Weston et al., 2014) need to read the entire text after observing the question, and therefore their complexity in responding to a question is linear in the text size. This is prohibitive for practical tasks such as QA from Wikipedia, a novel, or the Web. We propose to solve this scalability issue by using symbolic meaning representations, which can be indexed and retrieved efficiently with complexity that is independent of the text size. More specifically, we use sequence-to-sequence models to encode knowledge symbolically and generate programs to answer questions from the encoded knowledge. We apply our approach, called the N-Gram Machine (NGM), to the bAbI tasks (Weston et al., 2015) and a special version of them (‘life-long bAbI’) which has stories of up to 10 million sentences. Our experiments show that NGM can successfully solve both of these tasks accurately and efficiently. Unlike fully differentiable memory models, NGM’s time complexity and answering quality are not affected by the story length. The whole system of NGM is trained end-to-end with REINFORCE (Williams, 1992). To avoid high variance in gradient estimation, which is typical in discrete latent variable models, we use beam search instead of sampling. To tackle the exponentially large search space, we use a stabilized auto-encoding objective and a structure tweak procedure to iteratively reduce and refine the search space.
Niching Simply put, niching is a class of methods that try to converge to more than one solution during a single run. Niching is the idea of segmenting the population of the GA into disjoint sets, intended so that you have at least one member in each ‘interesting’ region of the fitness function; generally by this we mean that you cover more than one local optimum.
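One classic way to realise this is fitness sharing, sketched below for a one-dimensional genotype: an individual's raw fitness is divided by a niche count, so crowded peaks are penalised and the population spreads across several optima. The niche radius sigma_share and the multimodal test fitness are hypothetical choices for the example.

```python
import numpy as np

def shared_fitness(pop, raw_fitness, sigma_share=0.1, alpha=1.0):
    # Fitness sharing: divide each individual's fitness by its niche count,
    # i.e. the summed similarity to all individuals within the niche radius.
    pop = np.asarray(pop, dtype=float)
    d = np.abs(pop[:, None] - pop[None, :])                      # pairwise distances
    sh = np.where(d < sigma_share, 1.0 - (d / sigma_share) ** alpha, 0.0)
    return raw_fitness / sh.sum(axis=1)                          # diagonal contributes 1 (self)

rng = np.random.default_rng(0)
pop = rng.random(20)                     # 1-D genotypes in [0, 1]
raw = np.sin(3 * np.pi * pop) ** 2       # multimodal test fitness with several peaks
print(shared_fitness(pop, raw))          # selection would now use these shared values
```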
Algorithm of the Week: Niching in Genetic Algorithms
No Free Lunch Theorem
In mathematical folklore, the ‘no free lunch’ theorem (sometimes pluralized) of David Wolpert and William Macready appears in the 1997 ‘No Free Lunch Theorems for Optimization’. Wolpert had previously derived no free lunch theorems for machine learning (statistical inference). In 2005, Wolpert and Macready themselves indicated that the first theorem in their paper ‘state that any two optimization algorithms are equivalent when their performance is averaged across all possible problems’. The 1997 theorems of Wolpert and Macready are mathematically technical and some find them unintuitive. The folkloric ‘no free lunch’ (NFL) theorem is an easily stated and easily understood consequence of theorems Wolpert and Macready actually prove. It is weaker than the proven theorems, and thus does not encapsulate them. Various investigators have extended the work of Wolpert and Macready substantively.
Node Link Diagram Graphs are frequently drawn as node-link diagrams in which the vertices are represented as disks, boxes, or textual labels and the edges are represented as line segments, polylines, or curves in the Euclidean plane. Node-link diagrams can be traced back to the 13th century work of Ramon Llull, who drew diagrams of this type for complete graphs in order to analyze all pairwise combinations among sets of metaphysical concepts.
Noisy Expectation Maximization
We present a noise-injected version of the Expectation-Maximization (EM) algorithm: the Noisy Expectation Maximization (NEM) algorithm. The NEM algorithm uses noise to speed up the convergence of the EM algorithm. The NEM theorem shows that injected noise speeds up the average convergence of the EM algorithm to a local maximum of the likelihood surface if a positivity condition holds. The generalized form of the noisy expectation-maximization (NEM) algorithm allows for arbitrary modes of noise injection, including adding and multiplying noise to the data. We demonstrate these noise benefits on EM algorithms for the Gaussian mixture model (GMM) with both additive and multiplicative NEM noise injection. A separate theorem (not presented here) shows that the noise benefit for independent identically distributed additive noise decreases with sample size in mixture models. This theorem implies that the noise benefit is most pronounced if the data is sparse. Injecting blind noise only slowed convergence.
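A highly simplified sketch of the idea for a one-dimensional two-component GMM: additive Gaussian noise with a decaying scale is injected into the data used at each EM iteration. This is illustration only; the actual NEM algorithm restricts the injected noise to samples satisfying the positivity condition, which is omitted here, and all settings below are arbitrary example values.

```python
import numpy as np

def noisy_em_gmm(x, n_iter=50, noise_scale=0.3, decay=0.9, rng=None):
    # Simplified 1-D, two-component GMM EM with additive noise injected into the
    # data each iteration; the noise scale decays so the iterates settle down.
    rng = rng or np.random.default_rng(0)
    mu = np.array([x.min(), x.max()])          # crude initialisation
    sigma = np.array([x.std(), x.std()])
    pi = np.array([0.5, 0.5])
    for t in range(n_iter):
        y = x + noise_scale * (decay ** t) * rng.standard_normal(x.shape)  # noise injection
        # E-step: responsibilities under the current parameters.
        dens = pi * np.exp(-0.5 * ((y[:, None] - mu) / sigma) ** 2) / (sigma * np.sqrt(2 * np.pi))
        r = dens / dens.sum(axis=1, keepdims=True)
        # M-step: update weights, means and standard deviations from the noisy data.
        nk = r.sum(axis=0)
        pi = nk / len(y)
        mu = (r * y[:, None]).sum(axis=0) / nk
        sigma = np.sqrt((r * (y[:, None] - mu) ** 2).sum(axis=0) / nk)
    return pi, mu, sigma

rng = np.random.default_rng(1)
x = np.concatenate([rng.normal(-2, 1, 200), rng.normal(3, 1, 200)])
print(noisy_em_gmm(x, rng=rng))
```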
NoisyNet We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent’s policy can be used to aid efficient exploration. The parameters of the noise are learned with gradient descent along with the remaining network weights. NoisyNet is straightforward to implement and adds little computational overhead. We find that replacing the conventional exploration heuristics for A3C, DQN and dueling agents (entropy reward and $\epsilon$-greedy respectively) with NoisyNet yields substantially higher scores for a wide range of Atari games, in some cases advancing the agent from sub to super-human performance.
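A sketch, in PyTorch (an assumed dependency), of the kind of noisy linear layer NoisyNet builds on: the effective weights are a learnable mean plus a learnable scale times factorised Gaussian noise, so every forward pass is stochastic and exploration is driven by the learned noise parameters. This is an illustration, not the authors' reference implementation.

```python
import math
import torch
import torch.nn as nn
import torch.nn.functional as F

class NoisyLinear(nn.Module):
    # Linear layer with learnable parametric (factorised Gaussian) noise on weights and biases.
    def __init__(self, in_features, out_features, sigma0=0.5):
        super().__init__()
        self.in_features, self.out_features = in_features, out_features
        self.mu_w = nn.Parameter(torch.empty(out_features, in_features))
        self.sigma_w = nn.Parameter(torch.empty(out_features, in_features))
        self.mu_b = nn.Parameter(torch.empty(out_features))
        self.sigma_b = nn.Parameter(torch.empty(out_features))
        bound = 1.0 / math.sqrt(in_features)
        nn.init.uniform_(self.mu_w, -bound, bound)
        nn.init.uniform_(self.mu_b, -bound, bound)
        nn.init.constant_(self.sigma_w, sigma0 / math.sqrt(in_features))
        nn.init.constant_(self.sigma_b, sigma0 / math.sqrt(in_features))

    @staticmethod
    def _f(eps):
        return eps.sign() * eps.abs().sqrt()   # scaling used with factorised noise

    def forward(self, x):
        eps_in = self._f(torch.randn(self.in_features, device=x.device))
        eps_out = self._f(torch.randn(self.out_features, device=x.device))
        weight = self.mu_w + self.sigma_w * torch.outer(eps_out, eps_in)
        bias = self.mu_b + self.sigma_b * eps_out
        return F.linear(x, weight, bias)       # stochastic output on every call

layer = NoisyLinear(4, 2)
print(layer(torch.randn(3, 4)))   # gradients flow into mu_* and sigma_* as usual
```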
Nomogram A nomogram, also called a nomograph, alignment chart or abaque, is a graphical calculating device, a two-dimensional diagram designed to allow the approximate graphical computation of a function. The field of nomography was invented in 1884 by the French engineer Philbert Maurice d’Ocagne (1862-1938) and used extensively for many years to provide engineers with fast graphical calculations of complicated formulas to a practical precision. Nomograms use a parallel coordinate system invented by d’Ocagne rather than standard Cartesian coordinates. A nomogram consists of a set of n scales, one for each variable in an equation. Knowing the values of n-1 variables, the value of the unknown variable can be found, or by fixing the values of some variables, the relationship between the unfixed ones can be studied. The result is obtained by laying a straightedge across the known values on the scales and reading the unknown value from where it crosses the scale for that variable. The virtual or drawn line created by the straightedge is called an index line or isopleth.
Non-convex Conditional Gradient Sliding
We investigate a projection-free method, namely conditional gradient sliding (CGS), on batched, stochastic and finite-sum non-convex problems. CGS is a smart combination of Nesterov’s accelerated gradient method and the Frank-Wolfe (FW) method, and outperforms FW in the convex setting by saving gradient computations. However, the study of CGS in the non-convex setting is limited. In this paper, we propose non-convex conditional gradient sliding (NCGS), which surpasses the non-convex Frank-Wolfe method in the batched, stochastic and finite-sum settings.
Non-Homogeneous Markov Switching Autoregressive Models
In this paper, non-homogeneous Markov-Switching Autoregressive (MS-AR) models are proposed to describe wind time series. In these models, several autoregressive models are used to describe the time evolution of the wind speed and the switching between these different models is controlled by a hidden Markov chain which represents the weather types. We first block the data by month in order to remove seasonal components and propose a MS-AR model with non-homogeneous autoregressive models to describe daily components. Then we discuss extensions where the hidden Markov chain is also non-stationary to handle seasonal and inter-annual fluctuations.
Nonlinear Dimensionality Reduction
High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space. A classic illustration is the ‘Swiss roll’: a 3D dataset of 1000 points in a spiraling band with a rectangular hole in the middle, generated from a 2D manifold that algorithms such as LLE and Hessian LLE (as implemented, for example, in the Modular Data Processing toolkit) can recover in two dimensions. Many of the important algorithms from the history of manifold learning and nonlinear dimensionality reduction (NLDR) are related to linear methods such as PCA and classical MDS. Non-linear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that just give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Typically those that just give a visualisation are based on proximity data – that is, distance measurements.
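A minimal sketch of the Swiss-roll example using scikit-learn (an assumed dependency): generate the 3D dataset and recover a 2D embedding with LLE.

```python
from sklearn.datasets import make_swiss_roll
from sklearn.manifold import LocallyLinearEmbedding

# 3D Swiss roll whose underlying manifold is 2D.
X, t = make_swiss_roll(n_samples=1000, noise=0.05, random_state=0)

lle = LocallyLinearEmbedding(n_neighbors=12, n_components=2, method="standard")
X2 = lle.fit_transform(X)   # "unrolled" 2D coordinates
print(X2.shape)             # (1000, 2)
# method="hessian" (with enough neighbors) gives the Hessian LLE variant.
```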
Nonlinear expectation In probability theory, a nonlinear expectation is a nonlinear generalization of the expectation. Nonlinear expectations are useful in utility theory as they more closely match human behavior than traditional expectations.
Non-linear Iterative Partial Least Squares
In statistics, non-linear iterative partial least squares (NIPALS) is an algorithm for computing the first few components in a principal component or partial least squares analysis. For very-high-dimensional datasets, such as those generated in the ‘omics sciences (e.g., genomics, metabolomics), it is usually only necessary to compute the first few principal components. The NIPALS algorithm calculates the scores t1 and loadings p1’ from X. The outer product t1p1’ can then be subtracted from X, leaving the residual matrix E1, which can be used to calculate subsequent principal components. This results in a dramatic reduction in computational time since calculation of the covariance matrix is avoided.
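A minimal NumPy sketch of the NIPALS iteration for the first principal component; the variable names follow the text (t1, p1, E1) and the iteration and tolerance settings are arbitrary choices for the example.

```python
import numpy as np

def nipals_first_pc(X, n_iter=100, tol=1e-8):
    # NIPALS for the first PC of a column-centred X: alternate between estimating
    # loadings p and scores t, never forming the covariance matrix of X.
    X = X - X.mean(axis=0)
    t = X[:, 0].copy()                 # initial score vector: any column of X
    for _ in range(n_iter):
        p = X.T @ t / (t @ t)          # loadings
        p /= np.linalg.norm(p)         # normalise
        t_new = X @ p                  # scores
        if np.linalg.norm(t_new - t) < tol:
            t = t_new
            break
        t = t_new
    return t, p

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 10))
t1, p1 = nipals_first_pc(X)
E1 = (X - X.mean(axis=0)) - np.outer(t1, p1)  # residual matrix for the next component
print(p1)
```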
Non-local Neural Network Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present non-local operations as a generic family of building blocks for capturing long-range dependencies. Inspired by the classical non-local means method in computer vision, our non-local operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our non-local models can compete or outperform current competition winners on both Kinetics and Charades datasets. In static image recognition, our non-local models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code will be made available.
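A sketch, in PyTorch (assumed), of an embedded-Gaussian non-local block for 2D feature maps: the response at each spatial position is a softmax-weighted sum of transformed features at all positions, wrapped in a residual connection so it can be dropped into an existing architecture. Channel sizes and the 1x1 projections are illustrative choices, not the paper's exact configuration.

```python
import torch
import torch.nn as nn

class NonLocalBlock(nn.Module):
    # Embedded-Gaussian non-local block: each position attends to all positions.
    def __init__(self, channels, reduced=None):
        super().__init__()
        reduced = reduced or channels // 2
        self.theta = nn.Conv2d(channels, reduced, 1)
        self.phi = nn.Conv2d(channels, reduced, 1)
        self.g = nn.Conv2d(channels, reduced, 1)
        self.out = nn.Conv2d(reduced, channels, 1)

    def forward(self, x):
        b, c, h, w = x.shape
        q = self.theta(x).flatten(2).transpose(1, 2)      # (b, hw, c')
        k = self.phi(x).flatten(2)                        # (b, c', hw)
        v = self.g(x).flatten(2).transpose(1, 2)          # (b, hw, c')
        attn = torch.softmax(q @ k, dim=-1)               # affinities over all positions
        y = (attn @ v).transpose(1, 2).reshape(b, -1, h, w)
        return x + self.out(y)                            # residual connection

block = NonLocalBlock(channels=64)
print(block(torch.randn(2, 64, 16, 16)).shape)            # torch.Size([2, 64, 16, 16])
```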
Nonmetric Multi-Dimensional Scaling
Nonmetric multidimensional scaling (MDS, also NMDS and NMS) is an ordination technique that differs in several ways from nearly all other ordination methods. In most ordination methods, many axes are calculated, but only a few are viewed, owing to graphical limitations. In MDS, a small number of axes are explicitly chosen prior to the analysis and the data are fitted to those dimensions; there are no hidden axes of variation. Second, most other ordination methods are analytical and therefore result in a single unique solution to a set of data. In contrast, MDS is a numerical technique that iteratively seeks a solution and stops computation when an acceptable solution has been found, or it stops after some pre-specified number of attempts. As a result, an MDS ordination is not a unique solution and a subsequent MDS analysis on the same set of data and following the same methodology will likely result in a somewhat different ordination. Third, MDS is not an eigenvalue-eigenvector technique like principal components analysis or correspondence analysis that ordinates the data such that axis 1 explains the greatest amount of variance, axis 2 explains the next greatest amount of variance, and so on. As a result, an MDS ordination can be rotated, inverted, or centered to any desired configuration.
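A short illustration with scikit-learn (assumed): nonmetric MDS on a precomputed dissimilarity matrix, where only the rank order of the dissimilarities is fitted. Re-running with a different random_state can give a rotated, reflected, or slightly different configuration, in line with the non-uniqueness described above.

```python
import numpy as np
from sklearn.manifold import MDS

rng = np.random.default_rng(0)
X = rng.normal(size=(8, 5))                               # 8 items described by 5 variables
D = np.linalg.norm(X[:, None] - X[None, :], axis=-1)      # pairwise dissimilarities

nmds = MDS(n_components=2, metric=False, dissimilarity="precomputed",
           n_init=10, random_state=0)
coords = nmds.fit_transform(D)   # 2-axis ordination, with the dimensionality chosen up front
print(coords)
print(nmds.stress_)              # badness-of-fit of this (non-unique) solution
```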
Non-negative Matrix Factorization
Non-negative matrix factorization (NMF), also non-negative matrix approximation is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements. This non-negativity makes the resulting matrices easier to inspect. Since the problem is not exactly solvable in general, it is commonly approximated numerically. NMF finds applications in such fields as computer vision, document clustering, chemometrics and recommender systems.
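A small scikit-learn (assumed) example factorising a non-negative matrix V into W and H, framed here as a toy document-clustering setting with a term-count matrix; the matrix sizes and number of components are illustrative.

```python
import numpy as np
from sklearn.decomposition import NMF

rng = np.random.default_rng(0)
V = rng.poisson(2.0, size=(20, 10)).astype(float)   # non-negative data, e.g. term counts

model = NMF(n_components=3, init="nndsvda", max_iter=500, random_state=0)
W = model.fit_transform(V)       # (20, 3)  non-negative "document x topic" weights
H = model.components_            # (3, 10)  non-negative "topic x term" basis
print(np.abs(V - W @ H).mean())  # reconstruction error of the approximation V ~ W H
```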
Non-negative Matrix Factorization Expectation-Maximization
Mixture models are among the most popular tools for model-based clustering. However, when the dimension and the number of clusters are large, the estimation as well as the interpretation of the clusters become challenging. We propose a reduced-dimension mixture model, where the K components’ parameters are combinations of words from a small dictionary – say H words with H ≪ K. Including a Nonnegative Matrix Factorization (NMF) in the EM algorithm makes it possible to simultaneously estimate the dictionary and the parameters of the mixture. We propose the acronym NMF-EM for this algorithm. This original approach is motivated by passenger clustering from ticketing data: we apply NMF-EM to ticketing data from two Transdev public transport networks. In this case, the words are easily interpreted as typical slots in a timetable.
Non-parametric Behavior Clustering Inverse Reinforcement Learning Inverse Reinforcement Learning (IRL) is the task of learning a single reward function for a Markov Decision Process (MDP) whose reward function is not defined, given a set of demonstrations generated by humans/experts. However, in practice, it may be unreasonable to assume that human behaviors can be explained by one reward function since they may be inherently inconsistent. Also, demonstrations may be collected from various users and aggregated to infer and predict users’ behaviors. In this paper, we introduce the Non-parametric Behavior Clustering IRL algorithm to simultaneously cluster demonstrations and learn multiple reward functions from demonstrations that may be generated from more than one behavior. Our method is iterative: it alternates between clustering demonstrations into different behavior clusters and inverse-learning the reward functions until convergence. It is built upon the Expectation-Maximization formulation and non-parametric clustering in the IRL setting. Further, to improve computational efficiency, we remove the need to completely solve multiple IRL problems for multiple clusters during the iteration steps and introduce a resampling technique to avoid generating too many unlikely clusters. We demonstrate the convergence and efficiency of the proposed method through learning multiple driver behaviors from demonstrations generated from a grid-world environment and continuous trajectories collected from autonomous robot cars using the Gazebo robot simulator.
Nonparametric Canonical Correlation Analysis
Canonical correlation analysis (CCA) is a fundamental technique in multi-view data analysis and representation learning. Several nonlinear extensions of the classical linear CCA method have been proposed, including kernel and deep neural network methods. These approaches restrict attention to certain families of nonlinear projections, which the user must specify (by choosing a kernel or a neural network architecture), and are computationally demanding. Interestingly, the theory of nonlinear CCA without any functional restrictions has been studied in the population setting by Lancaster already in the 1950s. However, these results have not inspired practical algorithms. In this paper, we revisit Lancaster’s theory, and use it to devise a practical algorithm for nonparametric CCA (NCCA). Specifically, we show that the most correlated nonlinear projections of two random vectors can be expressed in terms of the singular value decomposition of a certain operator associated with their joint density. Thus, by estimating the population density from data, NCCA reduces to solving an eigenvalue system, superficially like kernel CCA but, importantly, without having to compute the inverse of any kernel matrix. We also derive a partially linear CCA (PLCCA) variant in which one of the views undergoes a linear projection while the other is nonparametric. PLCCA turns out to have a similar form to the classical linear CCA, but with a nonparametric regression term replacing the linear regression in CCA. Using a kernel density estimate based on a small number of nearest neighbors, our NCCA and PLCCA algorithms are memory-efficient, often run much faster, and achieve better performance than kernel CCA and comparable performance to deep CCA.
Non-Parametric Generalized Linear Model
In this paper, we try to solve the problem of temporal link prediction in information networks. This implies predicting the time it takes for a link to appear in the future, given features extracted at the current network snapshot. To this end, we introduce a probabilistic non-parametric approach, called ‘Non-Parametric Generalized Linear Model’ (NP-GLM), which infers the hidden underlying probability distribution of the link advent time given its features. We then present a learning algorithm for NP-GLM and an inference method to answer time-related queries. Extensive experiments conducted on both synthetic data and the real-world Sina Weibo social network demonstrate the effectiveness of NP-GLM in solving the temporal link prediction problem vis-a-vis competitive baselines.
Nonparametric Neural Networks Automatically determining the optimal size of a neural network for a given task without prior information currently requires an expensive global search and training many networks from scratch. In this paper, we address the problem of automatically finding a good network size during a single training cycle. We introduce *nonparametric neural networks*, a non-probabilistic framework for conducting optimization over all possible network sizes and prove its soundness when network growth is limited via an L_p penalty. We train networks under this framework by continuously adding new units while eliminating redundant units via an L_2 penalty. We employ a novel optimization algorithm, which we term *adaptive radial-angular gradient descent* or *AdaRad*, and obtain promising results.
Non-Parametric Transformation Network
ConvNets have been very effective in many applications where it is required to learn invariances to within-class nuisance transformations. However, through their architecture, ConvNets only enforce invariance to translation. In this paper, we introduce a new class of convolutional architectures called Non-Parametric Transformation Networks (NPTNs) which can learn general invariances and symmetries directly from data. NPTNs are a direct and natural generalization of ConvNets and can be optimized directly using gradient descent. They make no assumption regarding the structure of the invariances present in the data and in that respect are very flexible and powerful. We also model ConvNets and NPTNs under a unified framework called Transformation Networks which establishes the natural connection between the two. We demonstrate the efficacy of NPTNs on natural data such as MNIST and CIFAR-10, where they outperform ConvNet baselines with the same number of parameters. We show they are effective in learning invariances unknown a priori directly from data from scratch. Finally, we apply NPTNs to Capsule Networks and show that they enable them to perform even better.
Non-Response Bias Non-response bias occurs in statistical surveys if the answers of respondents differ from the potential answers of those who did not answer.
Non-stationary Stochastic Processes A stochastic process (a collection of random variables ordered in time, e.g. GDP(t)) is said to be (weakly) stationary if its mean and variance are constant over time, i.e. time-invariant (along with its autocovariance). Such a time series will tend to return to its mean (mean reversion) and fluctuations around this mean will have a broadly constant amplitude. Put differently, a stationary process will not drift too far away from its mean value because of the finite variance. By contrast, a non-stationary time series will have a time-varying mean or a time-varying variance or both.
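A quick NumPy illustration contrasting a stationary AR(1) process (mean-reverting, roughly constant spread) with a non-stationary random walk whose variance grows with time; the coefficients are arbitrary example values.

```python
import numpy as np

rng = np.random.default_rng(0)
T = 1000
eps = rng.standard_normal(T)

ar1 = np.zeros(T)    # stationary AR(1): y_t = 0.5 * y_{t-1} + eps_t
walk = np.zeros(T)   # non-stationary random walk: y_t = y_{t-1} + eps_t
for t in range(1, T):
    ar1[t] = 0.5 * ar1[t - 1] + eps[t]
    walk[t] = walk[t - 1] + eps[t]

# The AR(1) variance stays roughly constant; the random walk's variance keeps growing.
print(ar1[:500].var(), ar1[500:].var())
print(walk[:500].var(), walk[500:].var())
```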
Non-Uniform Fast Fourier Transform
Fourier analysis plays a natural role in a wide variety of applications, from medical imaging to radio astronomy, data analysis and the numerical solution of partial differential equations. When the sampling is uniform and the Fourier transform is desired at equispaced frequencies, the classical fast Fourier transform (FFT) has played a fundamental role in computation. The FFT requires O(N log N) work to compute N Fourier modes from N data points rather than O(N^2) work. When the data is irregular in either the ‘physical’ or ‘frequency’ domain, unfortunately, the FFT does not apply. Over the last twenty years, a number of algorithms have been developed to overcome this limitation – generally referred to as non-uniform FFTs (NUFFT), non-equispaced FFTs (NFFT) or unequally-spaced FFTs (USFFT). They achieve the same O(N log N) computational complexity, but with a larger, precision-dependent, and dimension-dependent constant.
Norm In linear algebra, functional analysis and related areas of mathematics, a norm is a function that assigns a strictly positive length or size to each vector in a vector space – save possibly for the zero vector, which is assigned a length of zero. A seminorm, on the other hand, is allowed to assign zero length to some non-zero vectors (in addition to the zero vector). A norm must also satisfy certain properties pertaining to scalability and additivity which are given in the formal definition below. A simple example is the 2-dimensional Euclidean space R2 equipped with the Euclidean norm. Elements in this vector space (e.g., (3, 7)) are usually drawn as arrows in a 2-dimensional cartesian coordinate system starting at the origin (0, 0). The Euclidean norm assigns to each vector the length of its arrow. Because of this, the Euclidean norm is often known as the magnitude. A vector space on which a norm is defined is called a normed vector space. Similarly, a vector space with a seminorm is called a seminormed vector space. It is often possible to supply a norm for a given vector space in more than one way.
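As a small illustration (using NumPy), the Euclidean norm of the vector (3, 7) from the example above, alongside two other common norms and a check of the scalability property:

```python
import numpy as np

v = np.array([3.0, 7.0])
print(np.linalg.norm(v))              # Euclidean (L2) norm: sqrt(3**2 + 7**2)
print(np.linalg.norm(v, ord=1))       # L1 norm: |3| + |7| = 10
print(np.linalg.norm(v, ord=np.inf))  # max norm: max(|3|, |7|) = 7
# Homogeneity (scalability): ||2v|| == 2 * ||v||
print(np.isclose(np.linalg.norm(2 * v), 2 * np.linalg.norm(v)))
```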
Normalization In statistics and applications of statistics, normalization can have a range of meanings. In the simplest cases, normalization of ratings means adjusting values measured on different scales to a notionally common scale, often prior to averaging. In more complicated cases, normalization may refer to more sophisticated adjustments where the intention is to bring the entire probability distributions of adjusted values into alignment. In the case of normalization of scores in educational assessment, there may be an intention to align distributions to a normal distribution. A different approach to normalization of probability distributions is quantile normalization, where the quantiles of the different measures are brought into alignment.
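A brief sketch of two of the simplest senses mentioned above, min-max rescaling to a notionally common [0, 1] scale and z-score standardisation, applied to hypothetical ratings:

```python
import numpy as np

ratings = np.array([3.0, 8.0, 5.0, 9.0, 1.0])

min_max = (ratings - ratings.min()) / (ratings.max() - ratings.min())  # common [0, 1] scale
z_score = (ratings - ratings.mean()) / ratings.std()                   # zero mean, unit variance
print(min_max)
print(z_score)
```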
Normalized Mutual Information
Normalized Nonnegative Models
We introduce normalized nonnegative models (NNM) for explorative data analysis. NNMs are partial convexifications of models from probability theory. We demonstrate their value on the example of item recommendation. We show that NNM-based recommender systems satisfy three criteria that all recommender systems should ideally satisfy: high predictive power, computational tractability, and expressive representations of users and items. Expressive user and item representations are important in practice to succinctly summarize the pool of customers and the pool of items. In NNMs, user representations are expressive because each user’s preference can be regarded as a normalized mixture of preferences of stereotypical users. The interpretability of item and user representations allows us to arrange properties of items (e.g., genres of movies or topics of documents) or users (e.g., personality traits) hierarchically.
Not only SQL
A NoSQL or Not Only SQL database provides a mechanism for storage and retrieval of data that is modeled by means other than the tabular relations used in relational databases. Motivations for this approach include simplicity of design, horizontal scaling and finer control over availability. The data structures (e.g. key-value, graph, or document) differ from those of the RDBMS, and therefore some operations are faster in NoSQL and some in an RDBMS. There are differences, though, and the particular suitability of a given NoSQL DB depends on the problem to be solved (e.g. does the solution use graph algorithms?). The appearance of mature NoSQL databases has reduced the rationale for Java content repository (JCR) implementations.
NoSQL databases are finding significant and growing industry use in big data and real-time web applications. NoSQL systems are also referred to as “Not only SQL” to emphasize that they may in fact allow SQL-like query languages to be used. Many NoSQL stores compromise consistency (in the sense of the CAP theorem) in favor of availability and partition tolerance. Barriers to the greater adoption of NoSQL stores include the use of low-level query languages, the lack of standardized interfaces, and the huge investments already made in SQL by enterprises. Most NoSQL stores lack true ACID transactions, although a few recent systems, such as FairCom c-treeACE, Google Spanner and FoundationDB, have made them central to their designs.
Novel Data Streams
We define NDS as those data streams whose content is initiated directly by the user (patient) themselves. This would exclude data sources such as electronic health records, disease registries, vital statistics, electronic lab reporting, emergency department visits, ambulance call data, school absenteeism, prescription pharmacy sales, serology, amongst others. Although ready access to aggregated information from these excluded sources is novel in many health settings, our focus here is on those streams which are both directly initiated by the user and also not already maintained by public health departments or other health professionals. Despite this more narrow definition, our suggestions for improving NDS surveillance may also be applicable to more established surveillance systems, participatory systems (e.g., Flu Near You, influenzaNet), and new data streams aggregated from established systems, such as Biosense and the ISDS DiSTRIBuTE network. While much of the recent focus on using NDS for disease surveillance has centered on Internet search queries and Twitter posts, there are many NDS outside of these two sources. Our aim therefore is to provide a general framework for enhancing and developing NDS surveillance systems, which applies to more than just search data and Tweets. At a minimum, our definition of NDS would include Internet search data and social media, such as Google searches, Google Plus, Facebook, and Twitter posts, as well as Wikipedia access logs, restaurant reservation and review logs, non-prescription pharmacy sales, news source scraping, and prediction markets.
Novel Integration of the Sample and Thresholded covariance estimators
We propose a ‘NOVEL Integration of the Sample and Thresholded covariance estimators’ (NOVELIST) to estimate the large covariance (correlation) and precision matrix. NOVELIST performs shrinkage of the sample covariance (correlation) towards its thresholded version. The sample covariance (correlation) component is non-sparse and can be low-rank in high dimensions. The thresholded sample covariance (correlation) component is sparse, and its addition ensures the stable invertibility of NOVELIST. The benefits of the NOVELIST estimator include simplicity, ease of implementation, computational efficiency and the fact that its application avoids eigenanalysis. We obtain an explicit convergence rate in the operator norm over a large class of covariance (correlation) matrices when the dimension p and the sample size n satisfy log(p)/n → 0. In empirical comparisons with several popular estimators, the NOVELIST estimator in which the amount of shrinkage and thresholding is chosen by cross-validation performs well in estimating covariance and precision matrices over a wide range of models and sparsity classes.
Novelty Detection Novelty detection is the identification of new or unknown data that a machine learning system has not been trained with and was not previously aware of, with the help of either statistical or machine learning based approaches. Novelty detection is one of the fundamental requirements of a good classification system. A machine learning system can never be trained with all the possible object classes and hence the performance of the network will be poor for those classes that are under-represented in the training set. A good classification system must have the ability to differentiate between known and unknown objects during testing. For this purpose, different models for novelty detection have been proposed. Novelty detection is a hard problem in machine learning since it depends on the statistics of the already known information. A generally applicable, parameter-free method for outlier detection in a high-dimensional space is not yet known. Novelty detection finds a variety of applications especially in signal processing, computer vision, pattern recognition, data mining and robotics. Another important application is the detection of a disease or potential fault whose class may be under-represented in the training set. The statistical approaches to novelty detection may be classified into parametric and non-parametric approaches. Parametric approaches assume a specific statistical distribution (such as a Gaussian distribution) of data and statistical modeling based on data mean and covariance, whereas non-parametric approaches do not make any assumption on the statistical properties of data.
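As a concrete sketch (scikit-learn assumed), a one-class SVM is trained only on "known" data and then used at test time to separate known from novel observations; the data and hyperparameters are illustrative.

```python
import numpy as np
from sklearn.svm import OneClassSVM

rng = np.random.default_rng(0)
X_train = rng.normal(0.0, 1.0, size=(500, 2))   # only the "known" class is seen in training

clf = OneClassSVM(kernel="rbf", nu=0.05, gamma="scale").fit(X_train)

X_test = np.vstack([rng.normal(0.0, 1.0, size=(5, 2)),    # more known-like samples
                    rng.normal(6.0, 0.5, size=(5, 2))])   # samples from an unseen region
print(clf.predict(X_test))   # +1 = consistent with training data, -1 = flagged as novel
```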
Null Hypothesis Significance Testing
Null Hypothesis Significance Testing (NHST) is a statistical method for testing whether a factor of interest has an effect on what we observe. For example, a t test or an ANOVA test for comparing means is a good example of NHST. It is probably the most common statistical testing procedure used in HCI.
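A minimal illustration with SciPy (assumed): a two-sample t test of the null hypothesis that two conditions have equal means, on made-up measurements.

```python
import numpy as np
from scipy import stats

rng = np.random.default_rng(0)
condition_a = rng.normal(loc=10.0, scale=2.0, size=30)   # e.g. task completion times, UI A
condition_b = rng.normal(loc=11.5, scale=2.0, size=30)   # e.g. task completion times, UI B

t_stat, p_value = stats.ttest_ind(condition_a, condition_b)
print(t_stat, p_value)
# Under NHST, a p_value below a preset alpha (commonly 0.05) leads to rejecting the
# null hypothesis of equal means; otherwise we fail to reject it.
```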
NullHop Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphical Processing Units (GPUs) are most often used in training and deploying CNNs, their power consumption becomes a problem for real-time mobile applications. We propose a flexible and efficient CNN accelerator architecture which can support the implementation of SOA CNNs in low-power and low-latency application scenarios. This architecture exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across a wide range of convolutional network kernel sizes and numbers of input and output feature maps. We implemented the proposed architecture on an FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. We show how in RTL simulations in a 28 nm process with a clock frequency of 500 MHz, the NullHop core is able to reach over 450 GOp/s and an efficiency of 368%, maintaining over 98% utilization of the MAC units and achieving a power efficiency of over 3 TOp/s/W in a core area of 5.8 mm².
Numenta Anomaly Benchmark
Much of the world’s data is streaming, time-series data, where anomalies give significant information in critical situations; examples abound in domains such as finance, IT, security, medical, and energy. Yet detecting anomalies in streaming data is a difficult task, requiring detectors to process data in real-time, not batches, and learn while simultaneously making predictions. There are no benchmarks to adequately test and score the efficacy of real-time anomaly detectors. Here we propose the Numenta Anomaly Benchmark (NAB), which attempts to provide a controlled and repeatable environment of open-source tools to test and measure anomaly detection algorithms on streaming data. The perfect detector would detect all anomalies as soon as possible, trigger no false alarms, work with real-world time-series data across a variety of domains, and automatically adapt to changing statistics. Rewarding these characteristics is formalized in NAB, using a scoring algorithm designed for streaming data. NAB evaluates detectors on a benchmark dataset with labeled, real-world time-series data. We present these components, and give results and analyses for several open source, commercially-used algorithms. The goal for NAB is to provide a standard, open source framework with which the research community can compare and evaluate different algorithms for detecting anomalies in streaming data.
Numerical Formal Concept Analysis
Numerical Formal Concept Analysis (nFCA) technique: Formal Concept Analysis (FCA) is a powerful method in computer science (CS) for identifying overall inherent structures within and between the row and column variables (called objects and attributes in CS) of a binary data set. It is a bit like lifting up the overall hierarchical structure of a forest from a superposition based on simple local information, i.e., pairwise relationships between variables of the data. The objective of nFCA is to combine FCA and statistics to translate what FCA offers for binary data to numerical data. The end product of nFCA is a pair of nFCA graphs, where the H-graph is a clustered lattice graph indicating inherent hierarchical and clustered relations and the I-graph is a complementary tree plot indicating the strength and direction of each relation as well as additional network relationships. nFCA performs better than conventional hierarchical clustering methods in terms of the cophenetic correlation coefficient and the relational structure.
Numerical Template Toolbox
The Numerical Template Toolbox (NT2) is an Open Source C++ library aimed at simplifying the development, debugging and optimization of high-performance computing applications by providing a Matlab-like syntax that eases the transition between prototype and actual application.
nuts-flow/ml Data preprocessing is a fundamental part of any machine learning application and frequently the most time-consuming aspect when developing a machine learning solution. Preprocessing for deep learning is characterized by pipelines that lazily load data and perform data transformation, augmentation, batching and logging. Many of these functions are common across applications but require different arrangements for training, testing or inference. Here we introduce a novel software framework named nuts-flow/ml that encapsulates common preprocessing operations as components, which can be flexibly arranged to rapidly construct efficient preprocessing pipelines for deep learning.
NVIDIA Deep Learning GPU Training System
The NVIDIA Deep Learning GPU Training System (DIGITS) puts the power of deep learning in the hands of data scientists and researchers. Quickly design the best deep neural network (DNN) for your data using real-time network behavior visualization. Best of all, DIGITS is a complete system so you don’t have to write any code. Get started with DIGITS in under an hour.
Nyquist-Shannon Sampling Theorem In the field of digital signal processing, the sampling theorem is a fundamental bridge between continuous-time signals (often called ‘analog signals’) and discrete-time signals (often called ‘digital signals’). It establishes a sufficient condition between a signal’s bandwidth and the sample rate that permits a discrete sequence of samples to capture all the information from the continuous-time signal. Strictly speaking, the theorem only applies to a class of mathematical functions having a Fourier transform that is zero outside of a finite region of frequencies. Intuitively we expect that when one reduces a continuous function to a discrete sequence and interpolates back to a continuous function, the fidelity of the result depends on the density (or sample rate) of the original samples. The sampling theorem introduces the concept of a sample rate that is sufficient for perfect fidelity for the class of functions that are bandlimited to a given bandwidth, such that no actual information is lost in the sampling process. It expresses the sufficient sample rate in terms of the bandwidth for the class of functions. The theorem also leads to a formula for perfectly reconstructing the original continuous-time function from the samples. Perfect reconstruction may still be possible when the sample-rate criterion is not satisfied, provided other constraints on the signal are known. (See § Sampling of non-baseband signals below, and Compressed sensing.) The name Nyquist-Shannon sampling theorem honors Harry Nyquist and Claude Shannon. The theorem was also discovered independently by E. T. Whittaker, by Vladimir Kotelnikov, and by others. So it is also known by the names Nyquist-Shannon-Kotelnikov, Whittaker-Shannon-Kotelnikov, Whittaker-Nyquist-Kotelnikov-Shannon, and cardinal theorem of interpolation.
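A small NumPy illustration of the sufficient-rate condition: a 5 Hz sine sampled above the Nyquist rate (more than 10 samples per second) retains its frequency content, whereas sampling at 6 Hz aliases it to a lower apparent frequency. The specific frequencies and durations are arbitrary example values.

```python
import numpy as np

f_signal = 5.0   # bandlimited signal: a single 5 Hz sine

def dominant_frequency(fs, duration=4.0):
    # Sample the sine at rate fs and return the strongest frequency in its spectrum.
    t = np.arange(0, duration, 1.0 / fs)
    x = np.sin(2 * np.pi * f_signal * t)
    spectrum = np.abs(np.fft.rfft(x))
    freqs = np.fft.rfftfreq(len(x), d=1.0 / fs)
    return freqs[np.argmax(spectrum)]

print(dominant_frequency(fs=50.0))   # well above the Nyquist rate of 10 Hz -> ~5 Hz
print(dominant_frequency(fs=12.0))   # just above the Nyquist rate -> ~5 Hz
print(dominant_frequency(fs=6.0))    # below the Nyquist rate -> aliased to ~1 Hz
```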