N2Net  We present N2Net, a system that implements binary neural networks using commodity switching chips deployed in network switches and routers. Our system shows that these devices can run simple neural network models, whose input is encoded in the network packets’ header, at packet processing speeds (billions of packets per second). Furthermore, our experience highlights that switching chips could support even more complex models, provided that some minor and cheap modifications to the chip’s design are applied. We believe N2Net provides an interesting building block for future end-to-end networked systems. 
Naive Bayes Classifier  A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes’ theorem with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be “independent feature model”. An overview of statistical classifiers is given in the article on pattern recognition. 
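The independence assumption can be illustrated with a minimal sketch. It assumes categorical features and add-one (Laplace) smoothing; the function names `train_naive_bayes` and `predict` and the toy weather data are hypothetical and not taken from any library.

```python
import math
from collections import Counter, defaultdict


def train_naive_bayes(rows, labels):
    """Count class frequencies and per-class feature-value frequencies."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> count
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature
    for row, label in zip(rows, labels):
        for i, value in enumerate(row):
            feature_counts[label][i][value] += 1
            values[i].add(value)
    return class_counts, feature_counts, values


def predict(model, row):
    """Return the class maximising log P(class) + sum_i log P(x_i | class)."""
    class_counts, feature_counts, values = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, count in class_counts.items():
        score = math.log(count / total)  # log prior
        for i, value in enumerate(row):
            # Each feature contributes independently (the "naive" assumption);
            # add-one smoothing avoids zero probabilities.
            numerator = feature_counts[label][i][value] + 1
            denominator = count + len(values[i])
            score += math.log(numerator / denominator)
        if score > best_score:
            best_label, best_score = label, score
    return best_label


# Hypothetical toy data: (outlook, temperature) -> play?
rows = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["no", "no", "yes", "yes"]
model = train_naive_bayes(rows, labels)
```

Calling `predict(model, ("sunny", "hot"))` scores each class by multiplying (in log space) the prior with one likelihood term per feature, which is exactly where the independence assumption enters.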
Named Entity Extraction  
Named Entity Recognition (NER) 
Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. Most research on NER systems has been structured as taking an unannotated block of text, such as this one: Jim bought 300 shares of Acme Corp. in 2006. And producing an annotated block of text that highlights the names of entities: [Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time. In this example, a person name consisting of one token, a two-token company name and a temporal expression have been detected and classified. State-of-the-art NER systems for English produce near-human performance. For example, the best system entering MUC-7 scored 93.39% of F-measure while human annotators scored 97.60% and 96.95%. http://…/aijwikiner.pdf 
Named Entity Recognition and Classification (NERC) 
The term ‘Named Entity’, now widely used in Natural Language Processing, was coined for the Sixth Message Understanding Conference (MUC-6) (R. Grishman & Sundheim 1996). At that time, MUC was focusing on Information Extraction (IE) tasks where structured information of company activities and defense-related activities is extracted from unstructured text, such as newspaper articles. In defining the task, people noticed that it is essential to recognize information units like names, including person, organization and location names, and numeric expressions including time, date, money and percent expressions. Identifying references to these entities in text was recognized as one of the important subtasks of IE and was called ‘Named Entity Recognition and Classification (NERC)’. 
Named Entity Recognizer (NER) 
Stanford NER is a Java implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names. It comes with well-engineered feature extractors for Named Entity Recognition, and many options for defining feature extractors. Included with the download are good named entity recognizers for English, particularly for the 3 classes (PERSON, ORGANIZATION, LOCATION), and we also make available on this page various other models for different languages and circumstances, including models trained on just the CoNLL 2003 English training data. The distributional similarity features in some models improve performance but the models require considerably more memory. Stanford NER is also known as CRFClassifier. The software provides a general implementation of (arbitrary order) linear chain Conditional Random Field (CRF) sequence models. That is, by training your own models, you can actually use this code to build sequence models for any task. (CRF models were pioneered by Lafferty, McCallum, and Pereira (2001); see Sutton and McCallum (2006) or Sutton and McCallum (2010) for more comprehensible introductions.) 
Named Entity Sequence Classification (NESC) 
Named Entity Recognition (NER) aims at locating and classifying named entities in text. In some use cases of NER, including cases where detected named entities are used in creating content recommendations, it is crucial to have a reliable confidence level for the detected named entities. In this work we study the problem of finding confidence levels for detected named entities. We refer to this problem as Named Entity Sequence Classification (NESC). We frame NESC as a binary classification problem and we use NER as well as recurrent neural networks to find the probability that a candidate named entity is a real named entity. We apply this approach to Tweet texts and we show how we could find named entities with high confidence levels from Tweets. 
Nash Averaging  Progress in machine learning is measured by careful evaluation on problems of outstanding common interest. However, the proliferation of benchmark suites and environments, adversarial attacks, and other complications has diluted the basic evaluation model by overwhelming researchers with choices. Deliberate or accidental cherry picking is increasingly likely, and designing well-balanced evaluation suites requires increasing effort. In this paper we take a step back and propose Nash averaging. The approach builds on a detailed analysis of the algebraic structure of evaluation in two basic scenarios: agent-vs-agent and agent-vs-task. The key strength of Nash averaging is that it automatically adapts to redundancies in evaluation data, so that results are not biased by the incorporation of easy tasks or weak agents. Nash averaging thus encourages maximally inclusive evaluation, since there is no harm (computational cost aside) from including all available tasks and agents. 
Natural Language Aggregate Query (NLAQ) 
Natural language question answering over RDF data has received widespread attention. Although there have been several studies that have dealt with a small number of aggregate queries, they have many restrictions (i.e., interactive information, controlled question or query template). Thus far, there has been no natural language querying mechanism that can process general aggregate queries over RDF data. Therefore, we propose a framework called NLAQ (Natural Language Aggregate Query). First, we propose a novel algorithm to automatically understand a user’s query intention, which mainly contains semantic relations and aggregations. Second, to build a better bridge between the query intention and RDF data, we propose an extended paraphrase dictionary ED to obtain more candidate mappings for semantic relations, and we introduce a predicate-type adjacent set PT to filter out inappropriate candidate mapping combinations in semantic relations and basic graph patterns. Third, we design a suitable translation plan for each aggregate category and effectively distinguish whether an aggregate item is numeric or not, which will greatly affect the aggregate result. Finally, we conduct extensive experiments over real datasets (QALD benchmark and DBpedia), and the experimental results demonstrate that our solution is effective. 
Natural Language Generation  Natural Language Generation (NLG) is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form. Psycholinguists prefer the term language production when such formal representations are interpreted as models for mental representations. It could be said an NLG system is like a translator that converts a computer-based representation into a natural language representation. However, the methods to produce the final language are different from those of a compiler due to the inherent expressivity of natural languages. NLG may be viewed as the opposite of natural language understanding: whereas in natural language understanding the system needs to disambiguate the input sentence to produce the machine representation language, in NLG the system needs to make decisions about how to put a concept into words. Simple examples are systems that generate form letters. These do not typically involve grammar rules, but may generate a letter to a consumer, e.g. stating that a credit card spending limit was reached. More complex NLG systems dynamically create texts to meet a communicative goal. As in other areas of natural language processing, this can be done using either explicit models of language (e.g., grammars) and the domain, or using statistical models derived by analysing human-written texts. 
Natural Language Inference (NLI) 
Inference has been a central topic in artificial intelligence from the start, but while automatic methods for formal deduction have advanced tremendously, comparatively little progress has been made on the problem of natural language inference (NLI), that is, determining whether a natural language hypothesis h can justifiably be inferred from a natural language premise p. The challenges of NLI are quite different from those encountered in formal deduction: the emphasis is on informal reasoning, lexical semantic knowledge, and variability of linguistic expression. 
Natural Language Interfaces for Databases (NLIDBs) 
The ability to extract insights from new data sets is critical for decision making. Visual interactive tools play an important role in data exploration since they provide non-technical users with an effective way to visually compose queries and comprehend the results. Natural language has recently gained traction as an alternative query interface to databases with the potential to enable non-expert users to formulate complex questions and information needs efficiently and effectively. However, understanding natural language questions and translating them accurately to SQL is a challenging task, and thus Natural Language Interfaces for Databases (NLIDBs) have not yet made their way into practical tools and commercial products. In this paper, we present DBPal, a novel data exploration tool with a natural language interface. DBPal leverages recent advances in deep models to make query understanding more robust in the following ways: First, DBPal uses a deep model to translate natural language statements to SQL, making the translation process more robust to paraphrasing and other linguistic variations. Second, to support the users in phrasing questions without knowing the database schema and the query features, DBPal provides a learned auto-completion model that suggests partial query extensions to users during query formulation and thus helps to write complex queries. 
Natural Language Processing (NLP) 
Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human-computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, and others involve natural language generation. NLP, openNLP 
Natural Language Query  A natural language query consists only of normal terms in the user’s language, without any special syntax or format. 
Natural Language Toolkit (NLTK) 
The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python programming language. NLTK includes graphical demonstrations and sample data. It is accompanied by a book that explains the underlying concepts behind the language processing tasks supported by the toolkit, plus a cookbook. NLTK is intended to support research and teaching in NLP or closely related areas, including empirical linguistics, cognitive science, artificial intelligence, information retrieval, and machine learning. NLTK has been used successfully as a teaching tool, as an individual study tool, and as a platform for prototyping and building research systems. http://www.nltk.org 
Natural Language Understanding (NLU) 
Natural language understanding (NLU) is a subtopic of natural language processing in artificial intelligence that deals with machine reading comprehension. NLU is considered an AI-hard problem. The process of disassembling and parsing input is more complex than the reverse process of assembling output in natural language generation because of the occurrence of unknown and unexpected features in the input and the need to determine the appropriate syntactic and semantic schemes to apply to it, factors which are predetermined when outputting language. There is considerable commercial interest in the field because of its application to news-gathering, text categorization, voice-activation, archiving, and large-scale content analysis. 
Natural Parameter Networks (NPN) 
Neural networks (NN) have achieved state-of-the-art performance in various applications. Unfortunately in applications where training data is insufficient, they are often prone to overfitting. One effective way to alleviate this problem is to exploit the Bayesian approach by using Bayesian neural networks (BNN). Another shortcoming of NN is the lack of flexibility to customize different distributions for the weights and neurons according to the data, as is often done in probabilistic graphical models. To address these problems, we propose a class of probabilistic neural networks, dubbed natural-parameter networks (NPN), as a novel and lightweight Bayesian treatment of NN. NPN allows the usage of arbitrary exponential-family distributions to model the weights and neurons. Different from traditional NN and BNN, NPN takes distributions as input and goes through layers of transformation before producing distributions to match the target output distributions. As a Bayesian treatment, efficient backpropagation (BP) is performed to learn the natural parameters for the distributions over both the weights and neurons. The output distributions of each layer, as byproducts, may be used as second-order representations for the associated tasks such as link prediction. Experiments on real-world datasets show that NPN can achieve state-of-the-art performance. 
N-BaIoT  The proliferation of IoT devices, which can be more easily compromised than desktop computers, has led to an increase in the occurrence of IoT-based botnet attacks. In order to mitigate this new threat there is a need to develop new methods for detecting attacks launched from compromised IoT devices and to differentiate between hour-long and millisecond-long IoT-based attacks. In this paper we propose and empirically evaluate a novel network-based anomaly detection method which extracts behavior snapshots of the network and uses deep autoencoders to detect anomalous network traffic emanating from compromised IoT devices. To evaluate our method, we infected nine commercial IoT devices in our lab with two of the most widely known IoT-based botnets, Mirai and BASHLITE. Our evaluation results demonstrated our proposed method’s ability to accurately and instantly detect the attacks as they were being launched from the compromised IoT devices which were part of a botnet. 
N-Body Network  We describe N-body networks, a neural network architecture for learning the behavior and properties of complex many-body physical systems. Our specific application is to learn atomic potential energy surfaces for use in molecular dynamics simulations. Our architecture is novel in that (a) it is based on a hierarchical decomposition of the many-body system into subsystems, (b) the activations of the network correspond to the internal state of each subsystem, (c) the ‘neurons’ in the network are constructed explicitly so as to guarantee that each of the activations is covariant to rotations, (d) the neurons operate entirely in Fourier space, and the nonlinearities are realized by tensor products followed by Clebsch-Gordan decompositions. As part of the description of our network, we give a characterization of the ways in which the weights of the network may interact with the activations so as to ensure that the covariance property is maintained. 
NCRF++  This paper describes NCRF++, a toolkit for neural sequence labeling. NCRF++ is designed for quick implementation of different neural sequence labeling models with a CRF inference layer. It provides users with an interface for building a custom model structure through a configuration file, with flexible neural feature design and utilization. Built on PyTorch, the core operations are calculated in batch, making the toolkit efficient with GPU acceleration. It also includes implementations of most state-of-the-art neural sequence labeling models such as LSTM-CRF, facilitating the reproduction and refinement of those methods. 
ND4J  ND4J is a scientific computing library for the JVM. It is meant to be used in production environments rather than as a research tool, which means routines are designed to run fast with minimum RAM requirements. 
NearBucket Locality Sensitive Hashing (NearBucket-LSH) 
We present NearBucket-LSH, an effective algorithm for similarity search in large-scale distributed online social networks organized as peer-to-peer overlays. As communication is a dominant consideration in distributed systems, we focus on minimizing the network cost while guaranteeing good search quality. Our algorithm is based on Locality Sensitive Hashing (LSH), which limits the search to collections of objects, called buckets, that have a high probability to be similar to the query. More specifically, NearBucket-LSH employs an LSH extension that searches in near buckets, and improves search quality but also significantly increases the network cost. We decrease the network cost by considering the internals of both LSH and the P2P overlay, and harnessing their properties to our needs. We show that our NearBucket-LSH increases search quality for a given network cost compared to previous art. In many cases, the search quality increases by more than 50%. 
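The idea of searching near buckets can be sketched as follows. This is a single-machine illustration only, assuming random-hyperplane (sign) hashing and treating a "near bucket" as one whose bit signature differs from the query's in exactly one position; both choices, and all function names, are assumptions made for this sketch rather than details confirmed by the abstract. The P2P overlay and network-cost optimizations are not modeled.

```python
import random


def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes; each contributes one signature bit."""
    rng = random.Random(seed)
    return [[rng.gauss(0, 1) for _ in range(dim)] for _ in range(num_bits)]


def signature(vec, planes):
    """Bucket id: one bit per hyperplane, set by the side the vector falls on."""
    return tuple(
        1 if sum(p * v for p, v in zip(plane, vec)) >= 0 else 0
        for plane in planes
    )


def near_buckets(sig):
    """All signatures at Hamming distance 1 from sig (assumed 'near' buckets)."""
    return [sig[:i] + (1 - sig[i],) + sig[i + 1:] for i in range(len(sig))]


def build_index(items, planes):
    """Map each bucket signature to the names of the items hashed into it."""
    index = {}
    for name, vec in items.items():
        index.setdefault(signature(vec, planes), []).append(name)
    return index


def query(vec, index, planes, use_near=True):
    """Collect candidates from the exact bucket and, optionally, near buckets."""
    sig = signature(vec, planes)
    buckets = [sig] + (near_buckets(sig) if use_near else [])
    candidates = []
    for bucket in buckets:
        candidates.extend(index.get(bucket, []))
    return candidates


# Hypothetical toy data
planes = make_hyperplanes(num_bits=4, dim=3)
items = {"a": [1.0, 0.2, -0.5], "b": [0.9, 0.1, -0.4], "c": [-1.0, 0.3, 0.8]}
index = build_index(items, planes)
```

Probing near buckets enlarges the candidate set (better recall) at the price of more lookups, which in a distributed setting translates into the extra network cost the abstract refers to.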
Nearest Descent (ND) 

Nearest Neighbor Descent (NND) 

Near-Far Matching  Near-far matching is a study design technique for preprocessing observational data to mimic a pair-randomized trial. Individuals are matched to be near on measured confounders and far on levels of an instrumental variable. nearfar 
Necessary Condition Analysis (NCA) 
Theoretical ‘necessary but not sufficient’ statements are common in the organizational sciences. Traditional data analysis approaches (e.g., correlation or multiple regression) are not appropriate for testing or inducing such statements. This paper proposes Necessary Condition Analysis (NCA) as a general and straightforward methodology for identifying necessary conditions in datasets. The paper presents the logic and methodology of necessary but not sufficient contributions of organizational determinants (e.g., events, characteristics, resources, efforts) to a desired outcome (e.g., good performance). A necessary determinant must be present for achieving an outcome, but its presence is not sufficient to obtain that outcome. Without the necessary condition, there is guaranteed failure, which cannot be compensated by other determinants of the outcome. This logic and its related methodology are fundamentally different from the traditional sufficiency-based logic and methodology. Practical recommendations and free software are offered to support researchers to apply NCA. NCA 
Negative Binomial Regression (NBR) 
Negative binomial regression is for modeling count variables, usually for overdispersed count outcome variables. NegBinBetaBinreg 
Nelder-Mead Method  The Nelder-Mead method or downhill simplex method or amoeba method is a commonly applied numerical method used to find the minimum or maximum of an objective function in a many-dimensional space. It is applied to nonlinear optimization problems for which derivatives may not be known. However, the Nelder-Mead technique is a heuristic search method that can converge to non-stationary points on problems that can be solved by alternative methods. The Nelder-Mead technique was proposed by John Nelder & Roger Mead (1965). 
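A derivative-free simplex search of this kind can be sketched in a few lines. The version below is a simplified textbook variant (reflection, expansion, inside contraction, shrink) with commonly used coefficients; the function name `nelder_mead`, the initial simplex construction, and the fixed iteration count are choices made for this sketch, not part of the original description.

```python
def nelder_mead(f, start, step=1.0, iterations=200,
                alpha=1.0, gamma=2.0, rho=0.5, sigma=0.5):
    """Minimise f without derivatives using a simplified Nelder-Mead search."""
    n = len(start)
    # Initial simplex: the start point plus one point offset along each axis.
    simplex = [list(start)]
    for i in range(n):
        vertex = list(start)
        vertex[i] += step
        simplex.append(vertex)

    for _ in range(iterations):
        simplex.sort(key=f)  # best vertex first, worst last
        best, worst = simplex[0], simplex[-1]
        # Centroid of every vertex except the worst one.
        centroid = [sum(v[i] for v in simplex[:-1]) / n for i in range(n)]
        reflected = [c + alpha * (c - w) for c, w in zip(centroid, worst)]

        if f(best) <= f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected  # accept the reflection
        elif f(reflected) < f(best):
            # Reflection is the new best: try moving even further.
            expanded = [c + gamma * (r - c) for c, r in zip(centroid, reflected)]
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        else:
            # Reflection did not help: contract towards the centroid.
            contracted = [c + rho * (w - c) for c, w in zip(centroid, worst)]
            if f(contracted) < f(worst):
                simplex[-1] = contracted
            else:
                # Last resort: shrink every vertex towards the best one.
                simplex = [best] + [
                    [b + sigma * (v - b) for b, v in zip(best, vertex)]
                    for vertex in simplex[1:]
                ]
    return min(simplex, key=f)


# Hypothetical smooth test function with its minimum at (1, -2)
result = nelder_mead(lambda p: (p[0] - 1) ** 2 + (p[1] + 2) ** 2, [0.0, 0.0])
```

Only function values are compared, never gradients, which is why the method applies when derivatives are unavailable; the same property is what allows it to stall at non-stationary points on harder problems.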
Neo4j  Neo4j is an open-source graph database, implemented in Java. The developers describe Neo4j as ‘embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables’. Neo4j is the most popular graph database. Neo4j version 1.0 was released in February 2010. The community edition of the database is licensed under the free GNU General Public License (GPL) v3. The additional modules, such as online backup and high availability, are licensed under the free Affero General Public License (AGPL) v3. The database, with the additional modules, is also available under a commercial license, in a dual license model. Neo4j version 2.0 was released in December 2013. Neo4j was developed by Neo Technology, Inc., based in the San Francisco Bay Area, US and Malmö, Sweden. RNeo4j 
Nested Association Mapping (NAM) 
Nested association mapping (NAM) is a technique designed by the labs of Edward Buckler, James Holland, and Michael McMullen for identifying and dissecting the genetic architecture of complex traits in corn (Zea mays). It is important to note that nested association mapping (unlike Association mapping) is a specific technique that cannot be performed outside of a specifically designed population such as the Maize NAM population. NAM 
Nested Chinese Restaurant Process (NCRP) 
The nested Chinese restaurant process (nCRP) is a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees. 
Nested Dirichlet Process Mixture of Products of Multinomial Distributions (NDPMPM) 
We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups. Such data arise frequently in social science settings, for example, people living in households. The model assumes that (i) each group is a member of a group-level latent class, and (ii) each unit is a member of a unit-level latent class nested within its group-level latent class. This structure allows the model to capture dependence among units in the same group. It also facilitates simultaneous modeling of variables at both group and unit levels. We develop a version of the model that assigns zero probability to groups and units with physically impossible combinations of variables. We apply the model to estimate multivariate relationships in a subset of the American Community Survey. Using the estimated model, we generate synthetic household data that could be disseminated as redacted public use files. Supplementary materials for this article are available online. NestedCategBayesImpute 
Nested Error Regression Model  This paper suggests the nested error regression model with uncertain random effects, meaning that the random effects in each area are expressed as a mixture of a normal distribution and a positive mass at 0. For estimation of model parameters and prediction of random effects, we consider Bayesian yet objective inference by setting improper prior distributions on the model parameters. We show that, under a mild sufficient condition, the posterior distribution is proper and the posterior variances are finite, which confirms the validity of posterior inference. To generate samples from the posterior distribution, we provide a Gibbs sampling method. The full conditional distributions of the posterior distribution all have familiar forms, so the proposed methodology is easy to implement. This paper also addresses the problem of prediction of finite population means and we provide a sampling-based method to tackle this issue. We compare the proposed model with the conventional nested error regression model through simulation and empirical studies. 
Nested LSTM (NLSTM) 
We propose Nested LSTMs (NLSTM), a novel RNN architecture with multiple levels of memory. Nested LSTMs add depth to LSTMs via nesting as opposed to stacking. The value of a memory cell in an NLSTM is computed by an LSTM cell, which has its own inner memory cell. Specifically, instead of computing the value of the (outer) memory cell as $c^{outer}_t = f_t \odot c_{t-1} + i_t \odot g_t$, NLSTM memory cells use the concatenation $(f_t \odot c_{t-1}, i_t \odot g_t)$ as input to an inner LSTM (or NLSTM) memory cell, and set $c^{outer}_t = h^{inner}_t$. Nested LSTMs outperform both stacked and single-layer LSTMs with similar numbers of parameters in our experiments on various character-level language modeling tasks, and the inner memories of an LSTM learn longer term dependencies compared with the higher-level units of a stacked LSTM. 
Nested Sampling Algorithm  The nested sampling algorithm is a computational approach to the problem of comparing models in Bayesian statistics, developed in 2004 by physicist John Skilling. 
Nesterov’s Accelerated Gradient (NAG) 
Nesterov’s Accelerated Gradient Descent performs a simple step of gradient descent to go from x_s to y_{s+1}, and then it ‘slides’ a little bit further than y_{s+1} in the direction given by the previous point y_s. The intuition behind the algorithm is quite difficult to grasp, and unfortunately the analysis will not be very enlightening either. Nonetheless Nesterov’s Accelerated Gradient is an optimal method (in terms of oracle complexity) for smooth convex optimization. 
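The two steps described above can be sketched as follows. This assumes one standard parameterization for a β-smooth convex function: λ_0 = 0, λ_s = (1 + sqrt(1 + 4 λ_{s-1}^2)) / 2, γ_s = (1 - λ_s) / λ_{s+1}, with y_{s+1} = x_s - (1/β) ∇f(x_s) and x_{s+1} = (1 - γ_s) y_{s+1} + γ_s y_s. The function name `nesterov_agd` and the quadratic test problem are hypothetical.

```python
import math


def nesterov_agd(grad, x0, beta, steps):
    """Nesterov's accelerated gradient for a beta-smooth convex function.

    grad:  function returning the gradient (as a list) at a point
    x0:    starting point (list of floats)
    beta:  smoothness constant (Lipschitz constant of the gradient)
    """
    x = list(x0)
    y = list(x0)
    lam = 1.0  # lambda_1, obtained from lambda_0 = 0
    for _ in range(steps):
        lam_next = (1.0 + math.sqrt(1.0 + 4.0 * lam * lam)) / 2.0
        gamma = (1.0 - lam) / lam_next  # gamma_s <= 0, so the step extrapolates
        g = grad(x)
        # Plain gradient step from x_s to y_{s+1}.
        y_next = [xi - gi / beta for xi, gi in zip(x, g)]
        # "Slide" past y_{s+1}, away from the previous point y_s.
        x = [(1.0 - gamma) * yn + gamma * yo for yn, yo in zip(y_next, y)]
        y = y_next
        lam = lam_next
    return y


# Hypothetical test problem: f(x) = 0.5 * (x_0^2 + 10 * x_1^2), so beta = 10
def f(p):
    return 0.5 * (p[0] ** 2 + 10.0 * p[1] ** 2)


def grad_f(p):
    return [p[0], 10.0 * p[1]]


y_final = nesterov_agd(grad_f, [1.0, 1.0], beta=10.0, steps=200)
```

Because γ_s is non-positive, the weight (1 - γ_s) on y_{s+1} is at least 1, which is the "sliding further" behaviour; under this parameterization the known guarantee is an O(1/t^2) decrease of the objective gap, compared with O(1/t) for plain gradient descent.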
Net Reclassification Improvement (NRI) 
Net Reclassification Improvement (NRI) as described in the paper by Jialiang Li (2013) <doi:10.1093/biostatistics/kxs047>. mcca 
Net#  Neural networks are one of the most popular machine learning algorithms today. One of the challenges when using neural networks is how to define a network topology given the variety of possible layer types, connections among them, and activation functions. Net# solves this problem by providing a succinct way to define almost any neural network architecture in a descriptive, easy-to-read format. This post provides a short tutorial for building a neural network using the Net# language to classify images of handwritten numeric digits in Microsoft Azure Machine Learning. 
Net2Vec  In an effort to understand the meaning of the intermediate representations captured by deep networks, recent papers have tried to associate specific semantic concepts to individual neural network filter responses, where interesting correlations are often found, largely by focusing on extremal filter responses. In this paper, we show that this approach can favor easy-to-interpret cases that are not necessarily representative of the average behavior of a representation. A more realistic but harder-to-study hypothesis is that semantic representations are distributed, and thus filters must be studied in conjunction. In order to investigate this idea while enabling systematic visualization and quantification of multiple filter responses, we introduce the Net2Vec framework, in which semantic concepts are mapped to vectorial embeddings based on corresponding filter responses. By studying such embeddings, we are able to show that (1) in most cases, multiple filters are required to code for a concept, (2) filters are often not concept-specific and help encode multiple concepts, and (3) compared to single filter activations, filter embeddings are able to better characterize the meaning of a representation and its relationship to other concepts. 
netinf  Given a set of events that spread between a set of nodes the algorithm infers the most likely stable diffusion network that is underlying the diffusion process. NetworkInference 
NetSim  Networks are everywhere and their many types, including social networks, the Internet, food webs etc., have been studied for the last few decades. However, among real-world networks it is hard to find examples that are easily comparable, i.e. that have the same density or even the same number of nodes and edges. We propose a flexible and extensible NetSim framework to understand how properties in different types of networks change with varying number of edges and vertices. Our approach makes it possible to simulate three classical network models (random, small-world and scale-free) with easily adjustable model parameters and network size. To be able to compare different networks, for a single experimental setup we kept the number of edges and vertices fixed across the models. To understand how they change depending on the number of nodes and edges we ran over 30,000 simulations and analysed different network characteristics that cannot be derived analytically. Two of the main findings from the analysis are that the average shortest path does not change with the density of the scale-free network but changes for small-world and random networks, and that there is an apparent difference in mean betweenness centrality between the scale-free network and the random and small-world networks. 
Net-Trim  We develop a fast, tractable technique called Net-Trim for simplifying a trained neural network. The method is a convex post-processing module, which prunes (sparsifies) a trained network layer by layer, while preserving the internal responses. We present a comprehensive analysis of Net-Trim from both the algorithmic and sample complexity standpoints, centered on a fast, scalable convex optimization program. Our analysis includes consistency results between the initial and retrained models before and after Net-Trim application and guarantees on the number of training samples needed to discover a network that can be expressed using a certain number of nonzero terms. Specifically, if there is a set of weights that uses at most $s$ terms that can recreate the layer outputs from the layer inputs, we can find these weights from $\mathcal{O}(s \log (N/s))$ samples, where $N$ is the input size. These theoretical results are similar to those for sparse regression using the Lasso, and our analysis uses some of the same recently developed tools (namely recent results on the concentration of measure and convex analysis). Finally, we propose an algorithmic framework based on the alternating direction method of multipliers (ADMM), which allows a fast and simple implementation of Net-Trim for network pruning and compression. 
Network Analysis  Network analysis is a quantitative methodology for studying properties related to connectivity and distances in graphs, with diverse applications like citation indexing and information retrieval on the Web. ➘ “Network Theory” ➘ “Social Network Analysis” A Short Course on Network Analysis Network Analysis for Wikipedia 
Network Based Diffusion Analysis (NBDA) 
Social learning has been documented in a wide diversity of animals. In free-living animals, however, it has been difficult to discern whether animals learn socially by observing other group members or asocially by acquiring a new behaviour independently. We addressed this challenge by developing network-based diffusion analysis (NBDA), which analyses the spread of traits through animal groups and takes into account that social network structure directs social learning opportunities. NBDA fits agent-based models of social and asocial learning to the observed data using maximum-likelihood estimation. The underlying learning mechanism can then be identified using model selection based on the Akaike information criterion. spatialnbda 
Network for Adversary Generation (NAG) 
Adversarial perturbations can pose a serious threat for deploying machine learning systems. Recent works have shown the existence of image-agnostic perturbations that can fool classifiers over most natural images. Existing methods present optimization approaches that solve for a fooling objective with an imperceptibility constraint to craft the perturbations. However, for a given classifier, they generate one perturbation at a time, which is a single instance from the manifold of adversarial perturbations. Also, in order to build robust models, it is essential to explore the manifold of adversarial perturbations. In this paper, we propose, for the first time, a generative approach to model the distribution of adversarial perturbations. The architecture of the proposed model is inspired by that of GANs and is trained using fooling and diversity objectives. Our trained generator network attempts to capture the distribution of adversarial perturbations for a given classifier and readily generates a wide variety of such perturbations. Our experimental evaluation demonstrates that perturbations crafted by our model (i) achieve state-of-the-art fooling rates, (ii) exhibit wide variety and (iii) deliver excellent cross-model generalizability. Our work can be deemed an important step in the process of inferring the complex manifolds of adversarial perturbations. 
Network In Network (NIN) 
We propose a novel deep network structure called ‘Network In Network’ (NIN) to enhance model discriminability for local patches within the receptive field. The conventional convolutional layer uses linear filters followed by a nonlinear activation function to scan the input. Instead, we build micro neural networks with more complex structures to abstract the data within the receptive field. We instantiate the micro neural network with a multilayer perceptron, which is a potent function approximator. The feature maps are obtained by sliding the micro networks over the input in a similar manner as CNN; they are then fed into the next layer. Deep NIN can be implemented by stacking multiple instances of the structure described above. With enhanced local modeling via the micro network, we are able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers. We demonstrated state-of-the-art classification performance with NIN on CIFAR-10 and CIFAR-100, and reasonable performance on the SVHN and MNIST datasets. GitXiv 
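The two ideas in this entry, a per-location multilayer perceptron and global average pooling, can be sketched in NumPy. This is a minimal illustration under stated assumptions: the input `x` stands in for the output of an ordinary spatial convolution, the shapes and weights are arbitrary, and the function names are hypothetical rather than taken from any NIN implementation.

```python
import numpy as np

def mlpconv(x, weights, biases):
    """Apply the same small MLP at every spatial location.

    x: array of shape (H, W, C_in). Each (C_in, C_out) matrix acts like a
    1x1 convolution followed by a ReLU.
    """
    h = x
    for w, b in zip(weights, biases):
        h = np.maximum(h @ w + b, 0.0)
    return h

def global_average_pooling(feature_maps):
    """Average each feature map over its spatial extent: (H, W, C) -> (C,)."""
    return feature_maps.mean(axis=(0, 1))

rng = np.random.default_rng(0)
x = rng.normal(size=(8, 8, 16))                       # assumed conv output
weights = [rng.normal(size=(16, 32)) * 0.1,
           rng.normal(size=(32, 10)) * 0.1]           # last layer: one map per class
biases = [np.zeros(32), np.zeros(10)]

maps = mlpconv(x, weights, biases)                    # shape (8, 8, 10)
class_scores = global_average_pooling(maps)           # shape (10,)
```

Because the last layer produces one feature map per class, the pooled vector can be fed directly to a softmax, with no fully connected layer in between.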
Network Laplacian Spectral Descriptor (NetLSD) 
Comparison among graphs is ubiquitous in graph analytics. However, it is a hard task in terms of the expressiveness of the employed similarity measure and the efficiency of its computation. Ideally, graph comparison should be invariant to the order of nodes and the sizes of compared graphs, adaptive to the scale of graph patterns, and scalable. Unfortunately, these properties have not been addressed together. Graph comparisons still rely on direct approaches, graph kernels, or representation-based methods, which are all inefficient and impractical for large graph collections. In this paper, we propose NetLSD (Network Laplacian Spectral Descriptor), a permutation- and size-invariant, scale-adaptive, and scalably computable graph representation method that allows for straightforward comparisons. NetLSD hears the shape of a graph by extracting a compact signature that inherits the formal properties of the Laplacian spectrum, specifically its heat or wave kernel. To our knowledge, NetLSD is the first expressive graph representation that allows for efficient comparisons of large graphs. Our evaluation on a variety of real-world graphs demonstrates that it outperforms previous works in both expressiveness and efficiency. 
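A heat-kernel signature of the kind described above can be sketched as the heat trace h(t) = sum_j exp(-t * lambda_j) over the eigenvalues of the normalized Laplacian. The code below is a small dense-matrix illustration, assuming an undirected graph with no isolated nodes; it is not the scalable approximation a production implementation would need, and the function names are hypothetical.

```python
import numpy as np

def normalized_laplacian(adj):
    """L = I - D^{-1/2} A D^{-1/2}; assumes every node has degree >= 1."""
    deg = adj.sum(axis=1)
    inv_sqrt = 1.0 / np.sqrt(deg)
    return np.eye(len(adj)) - inv_sqrt[:, None] * adj * inv_sqrt[None, :]

def heat_trace_signature(adj, timescales):
    """Heat trace of the graph evaluated at each timescale t."""
    eigvals = np.linalg.eigvalsh(normalized_laplacian(adj))
    return np.array([np.exp(-t * eigvals).sum() for t in timescales])

# A 3-node path graph and a relabelled copy of it.
A = np.array([[0, 1, 0],
              [1, 0, 1],
              [0, 1, 0]], dtype=float)
perm = [2, 0, 1]
A_perm = A[np.ix_(perm, perm)]

ts = np.array([0.0, 0.5, 2.0])
sig = heat_trace_signature(A, ts)
sig_perm = heat_trace_signature(A_perm, ts)
```

The two signatures coincide because the spectrum does not depend on node order, which is the permutation invariance the entry refers to; two graphs can then be compared by a distance between their signature vectors.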
Network Lasso (nLasso) 
A recently proposed learning algorithm for massive networkstructured data sets (big data over networks) is the network Lasso (nLasso), which extends the well known Lasso estimator from sparse models to networkstructured datasets. Efficient implementations of the nLasso have been presented using modern convex optimization methods. 
Network Mapping  Network mapping is the study of the physical connectivity of networks. Internet mapping is the study of the physical connectivity of the Internet. Network mapping discovers the devices on the network and their connectivity. It is not to be confused with network discovery or network enumerating which discovers devices on the network and their characteristics such as (operating system, open ports, listening network services, etc.). The field of automated network mapping has taken on greater importance as networks become more dynamic and complex in nature. 
Network Maximal Correlation (NMC) 
We introduce Network Maximal Correlation (NMC) as a multivariate measure of nonlinear association among random variables. NMC is defined via an optimization that infers (nontrivial) transformations of variables by maximizing aggregate inner products between transformed variables. We characterize a solution of the NMC optimization using geometric properties of Hilbert spaces for finite discrete and jointly Gaussian random variables. For finite discrete variables, we propose an algorithm based on alternating conditional expectation to determine NMC. We also show that empirically computed NMC converges to NMC exponentially fast in sample size. For jointly Gaussian variables, we show that under some conditions the NMC optimization is an instance of the MaxCut problem. We then illustrate an application of NMC and multiple MC in inference of graphical model for bijective, possibly nonmonotone, functions of jointly Gaussian variables generalizing the copula setup developed by Liu et al. Finally, we illustrate NMC’s utility in a real data application of learning nonlinear dependencies among genes in a cancer dataset. 
Network Meta-Analysis  I present methods for assessing the relative effectiveness of two treatments when they have not been compared directly in a randomized trial but have each been compared to other treatments. These network meta-analysis techniques allow estimation of both heterogeneity in the effect of any given treatment and inconsistency (‘incoherence’) in the evidence from different pairs of treatments. 
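The simplest case, two treatments A and B that have each been compared only with a common comparator C, can be sketched with an adjusted indirect comparison: the indirect effect is the difference of the two direct effects and the variances add. This is a minimal illustration with made-up effect sizes, not the full set of methods the entry refers to.

```python
import math

def indirect_comparison(d_ac, se_ac, d_bc, se_bc):
    """Indirect estimate of A vs B from A vs C and B vs C.

    Effects are on an additive scale (for example log odds ratios);
    the two trials are assumed independent, so variances add.
    """
    d_ab = d_ac - d_bc
    se_ab = math.sqrt(se_ac ** 2 + se_bc ** 2)
    return d_ab, se_ab

def inconsistency(d_direct, d_indirect):
    """Difference between direct and indirect evidence for the same contrast."""
    return d_direct - d_indirect

# Hypothetical inputs: log odds ratios and their standard errors.
d_ab, se_ab = indirect_comparison(d_ac=-0.5, se_ac=0.3, d_bc=-0.2, se_bc=0.4)
ci_95 = (d_ab - 1.96 * se_ab, d_ab + 1.96 * se_ab)
```

When a direct A vs B trial also exists, `inconsistency` gives the gap between the two sources of evidence, which is the quantity the entry calls incoherence.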
Network Scale Up Method (NSUM) 
The network scale-up method was developed by a team of researchers under grants from the U.S. National Science Foundation to H. Russell Bernard and Christopher McCarty at the University of Florida. The method can be applied now to estimating the size of hard-to-count (or impossible-to-count) populations but the method is a work in progress. Each new application provides data for improving the validity and accuracy of the estimates. As with the development of the model, these improvements require the efforts of survey researchers, mathematicians, and ethnographers. The network scale-up method was developed in conjunction with our team’s research on the rules governing who people know and how they know them. The particular list of people who people come to know in a lifetime may appear random, but the rules governing who we come to know are surely not random. One basic component of social structure is the number of people whom people know. NSUM 
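The basic scale-up estimator can be sketched as follows: if respondents report how many members of the hidden population they know, and their personal network sizes are known or estimated, the share of hidden-population ties is scaled up to the total population. The survey numbers below are invented for illustration, and the function name is hypothetical.

```python
def scale_up_estimate(known_in_hidden, network_sizes, total_population):
    """Basic network scale-up estimate of a hidden population's size.

    known_in_hidden[i]: number of hidden-population members respondent i knows.
    network_sizes[i]:   estimated number of people respondent i knows overall.
    """
    if len(known_in_hidden) != len(network_sizes):
        raise ValueError("one network size is needed per respondent")
    share = sum(known_in_hidden) / sum(network_sizes)
    return share * total_population

# Four hypothetical respondents in a population of one million.
estimate = scale_up_estimate(
    known_in_hidden=[2, 0, 1, 1],
    network_sizes=[300, 250, 200, 250],
    total_population=1_000_000,
)
```

The estimate depends directly on the personal network sizes, which is why the number of people whom people know is singled out above as a basic quantity.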
Network Science  Network science is an interdisciplinary academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as ‘the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena.’ 
Network Sketching  Convolutional neural networks (CNNs) with deep architectures have substantially advanced the stateoftheart in computer vision tasks. However, deep networks are typically resourceintensive and thus difficult to be deployed on mobile devices. Recently, CNNs with binary weights have shown compelling efficiency to the community, whereas the accuracy of such models is usually unsatisfactory in practice. In this paper, we introduce network sketching as a novel technique of pursuing binaryweight CNNs, targeting at more faithful inference and better tradeoff for practical applications. Our basic idea is to exploit binary structure directly in pretrained filter banks and produce binaryweight models via tensor expansion. The whole process can be treated as a coarsetofine model approximation, akin to the pencil drawing steps of outlining and shading. To further speedup the generated models, namely the sketches, we also propose an associative implementation of binary tensor convolutions. Experimental results demonstrate that a proper sketch of AlexNet (or ResNet) outperforms the existing binaryweight models by large margins on the ImageNet large scale classification task, while the committed memory for network parameters only exceeds a little. 
Network Theory  In computer and network science, network theory is the study of graphs as a representation of either symmetric relations or, more generally, of asymmetric relations between discrete objects. Network theory is a part of graph theory. It has applications in many disciplines including statistical physics, particle physics, computer science, electrical engineering, biology, economics, operations research, and sociology. Applications of network theory include logistical networks, the World Wide Web, Internet, gene regulatory networks, metabolic networks, social networks, epistemological networks, etc; see List of network theory topics for more examples. Euler’s solution of the Seven Bridges of Königsberg problem is considered to be the first true proof in the theory of networks. 
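Euler's argument can be checked in a few lines. The sketch below encodes the seven bridges as a multigraph over four land masses (the labels A to D are arbitrary) and applies the degree condition: a connected multigraph has a walk using every edge exactly once only if it has zero or two vertices of odd degree.

```python
from collections import Counter

# Seven bridges between four land masses; A is the central island.
bridges = [("A", "B"), ("A", "B"), ("A", "C"), ("A", "C"),
           ("A", "D"), ("B", "D"), ("C", "D")]

def odd_degree_nodes(edges):
    """Return the nodes touched by an odd number of edges."""
    degree = Counter()
    for u, v in edges:
        degree[u] += 1
        degree[v] += 1
    return sorted(node for node, d in degree.items() if d % 2 == 1)

def has_euler_walk(edges):
    """Degree test for an Euler walk; assumes the graph is connected."""
    return len(odd_degree_nodes(edges)) in (0, 2)

print(odd_degree_nodes(bridges), has_euler_walk(bridges))
```

All four land masses have odd degree (5, 3, 3, 3), so no such walk exists.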
Network Tikhonov (NETT)  Recovering a function or high-dimensional parameter vector from indirect measurements is a central task in various scientific areas. Several methods for solving such inverse problems are well developed and well understood. Recently, novel algorithms using deep learning and neural networks for inverse problems appeared. While still in their infancy, these techniques show astonishing performance for applications like low-dose CT or various sparse data problems. However, theoretical results for deep learning in inverse problems are missing so far. In this paper, we establish such a convergence analysis for the proposed NETT (Network Tikhonov) approach to inverse problems. NETT considers regularized solutions having a small value of a regularizer defined by a trained neural network. As opposed to existing deep learning approaches, our regularization scheme enforces data consistency also for the actual unknown to be recovered. This is beneficial in case the unknown to be recovered is not sufficiently similar to available training data. We present a complete convergence analysis for NETT, where we derive well-posedness results and quantitative error estimates, and propose a possible strategy for training the regularizer. Numerical results are presented for a tomographic sparse data problem using the $\ell^q$-norm of an autoencoder as the trained regularizer, which demonstrate good performance of NETT even for unknowns of a different type from the training data. 
Network Transplanting  This paper focuses on a novel problem, i.e., transplanting a categoryandtaskspecific neural network to a generic, distributed network without strong supervision. Like playing LEGO blocks, incrementally constructing a generic network by asynchronously merging specific neural networks is a crucial bottleneck for deep learning. Suppose that the pretrained specific network contains a module $f$ to extract features of the target category, and the generic network has a module $g$ for a target task, which is trained using other categories except for the target category. Instead of using numerous training samples to teach the generic network a new category, we aim to learn a small adapter module to connect $f$ and $g$ to accomplish the task on a target category in a weaklysupervised manner. The core challenge is to efficiently learn feature projections between the two connected modules. We propose a new distillation algorithm, which exhibited superior performance. Our method without training samples even significantly outperformed the baseline with 100 training samples. 
Network Vector  We propose a neural embedding algorithm called Network Vector, which learns distributed representations of nodes and the entire networks simultaneously. By embedding networks in a lowdimensional space, the algorithm allows us to compare networks in terms of structural similarity and to solve outstanding predictive problems. Unlike alternative approaches that focus on node level features, we learn a continuous global vector that captures each node’s global context by maximizing the predictive likelihood of random walk paths in the network. Our algorithm is scalable to real world graphs with many nodes. We evaluate our algorithm on datasets from diverse domains, and compare it with stateoftheart techniques in node classification, role discovery and concept analogy tasks. The empirical results show the effectiveness and the efficiency of our algorithm. 
Network-Clustered Multi-Modal Bug Localization (NetML) 
Developers often spend much effort and resources to debug a program. To help the developers debug, numerous information retrieval (IR)-based and spectrum-based bug localization techniques have been devised. IR-based techniques process textual information in bug reports, while spectrum-based techniques process program spectra (i.e., a record of which program elements are executed for each test case). While both techniques ultimately generate a ranked list of program elements that likely contain a bug, they only consider one source of information, either bug reports or program spectra, which is not optimal. In light of this deficiency, this paper presents a new approach dubbed Network-clustered Multi-modal Bug Localization (NetML), which utilizes multi-modal information from both bug reports and program spectra to localize bugs. NetML facilitates effective bug localization by carrying out a joint optimization of bug localization error and clustering of both bug reports and program elements (i.e., methods). The clustering is achieved through the incorporation of network Lasso regularization, which incentivizes the model parameters of similar bug reports and similar program elements to be close together. To estimate the model parameters of both bug reports and methods, NetML employs an adaptive learning procedure based on Newton's method that updates the parameters on a per-feature basis. Extensive experiments on 355 real bugs from seven software systems have been conducted to benchmark NetML against various state-of-the-art localization methods. The results show that NetML surpasses the best-performing baseline by 31.82%, 22.35%, 19.72%, and 19.24%, in terms of the number of bugs successfully localized when a developer inspects the top 1, 5, and 10 methods and Mean Average Precision (MAP), respectively. 
Neumann Optimizer  Progress in deep learning is slowed by the days or weeks it takes to train large models. The natural solution of using more hardware is limited by diminishing returns, and leads to inefficient use of additional resources. In this paper, we present a large batch, stochastic optimization algorithm that is both faster than widely used algorithms for fixed amounts of computation, and also scales up substantially better as more computational resources become available. Our algorithm implicitly computes the inverse Hessian of each mini-batch to produce descent directions; we do so without either an explicit approximation to the Hessian or Hessian-vector products. We demonstrate the effectiveness of our algorithm by successfully training large ImageNet models (Inception-V3, Resnet-50, Resnet-101 and Inception-Resnet-V2) with mini-batch sizes of up to 32000 with no loss in validation error relative to current baselines, and no increase in the total number of steps. At smaller mini-batch sizes, our optimizer improves the validation error in these models by 0.8-0.9%. Alternatively, we can trade off this accuracy to reduce the number of training steps needed by roughly 10-30%. Our work is practical and easily usable by others: only one hyperparameter (learning rate) needs tuning, and furthermore, the algorithm is as computationally cheap as the commonly used Adam optimizer. 
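The optimizer's name points to the Neumann series, which expresses a matrix inverse as a power series: H^{-1} = eta * sum_k (I - eta*H)^k whenever the spectral radius of (I - eta*H) is below 1. The sketch below applies that identity to a small explicit matrix purely to show the idea. It is not the algorithm from the entry, which by its own description never forms the Hessian or Hessian-vector products; `H`, `g`, and `eta` are illustrative values.

```python
import numpy as np

def neumann_solve(H, g, eta, num_terms):
    """Approximate H^{-1} g with a truncated Neumann series.

    Requires the spectral radius of (I - eta * H) to be below 1.
    """
    term = eta * g                        # k = 0 term of the series
    total = term.copy()
    for _ in range(num_terms - 1):
        term = term - eta * (H @ term)    # multiply the previous term by (I - eta * H)
        total += term
    return total

H = np.array([[2.0, 0.5],
              [0.5, 1.0]])               # small symmetric positive-definite matrix
g = np.array([1.0, 1.0])

approx = neumann_solve(H, g, eta=0.4, num_terms=200)
exact = np.linalg.solve(H, g)
```

Each additional term costs one matrix-vector product, so truncating the series trades accuracy of the descent direction for computation.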
Neural Architecture Search (NAS) 
Neural Architecture Search (NAS) is a laborious process. Prior work on automated NAS mainly targets improving accuracy, but lacks consideration of computational resource use. We propose the Resource-Efficient Neural Architect (RENA), an efficient resource-constrained NAS using reinforcement learning with network embedding. RENA uses a policy network to process the network embeddings to generate new configurations. We demonstrate RENA on image recognition and keyword spotting (KWS) problems. RENA can find novel architectures that achieve high performance even with tight resource constraints. For CIFAR-10, it achieves 2.95% test error when compute intensity is greater than 100 FLOPs/byte, and 3.87% test error when model size is less than 3M parameters. For Google Speech Commands Dataset, RENA achieves the state-of-the-art accuracy without resource constraints, and it outperforms the optimized architectures with tight resource constraints. 
Neural Autoregressive Flows  Normalizing flows and autoregressive models have been successfully combined to produce stateoftheart results in density estimation, via Masked Autoregressive Flows (MAF), and to accelerate stateoftheart WaveNetbased speech synthesis to 20x faster than realtime, via Inverse Autoregressive Flows (IAF). We unify and generalize these approaches, replacing the (conditionally) affine univariate transformations of MAF/IAF with a more general class of invertible univariate transformations expressed as monotonic neural networks. We demonstrate that the proposed neural autoregressive flows (NAF) are universal approximators for continuous probability distributions, and their greater expressivity allows them to better capture multimodal target distributions. Experimentally, NAF yields stateoftheart performance on a suite of density estimation tasks and outperforms IAF in variational autoencoders trained on binarized MNIST. 
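A univariate transformation built as a monotonic neural network can be sketched as a convex combination of sigmoids with positive slopes, mapped back to the real line with a logit. Positivity of the slopes and of the mixing weights is what guarantees strict monotonicity, and hence invertibility. The parameter values below are arbitrary illustrations; in an autoregressive flow they would be produced by a conditioner network, which is omitted here.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def monotonic_transform(x, raw_slopes, shifts, raw_weights):
    """Strictly increasing map R -> R built from sigmoid units.

    raw_slopes are passed through exp to make the slopes positive;
    raw_weights are passed through a softmax to make them positive and sum to 1.
    """
    slopes = np.exp(raw_slopes)
    weights = np.exp(raw_weights) / np.exp(raw_weights).sum()
    mix = (weights * sigmoid(slopes * x[:, None] + shifts)).sum(axis=1)  # in (0, 1)
    return np.log(mix) - np.log1p(-mix)                                  # logit

x = np.linspace(-3.0, 3.0, 50)
y = monotonic_transform(
    x,
    raw_slopes=np.log(np.array([0.5, 1.0, 2.0])),
    shifts=np.array([-1.0, 0.0, 1.0]),
    raw_weights=np.zeros(3),
)
```

With several sigmoid units the map can bend in ways a single affine transformation cannot, which is the extra expressivity the entry attributes to this construction.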
Neural Block Sampling  Efficient Monte Carlo inference often requires manual construction of modelspecific proposals. We propose an approach to automated proposal construction by training neural networks to provide fast approximations to block Gibbs conditionals. The learned proposals generalize to occurrences of common structural motifs both within a given model and across models, allowing for the construction of a library of learned inference primitives that can accelerate inference on unseen models with no modelspecific training required. We explore several applications including openuniverse Gaussian mixture models, in which our learned proposals outperform a handtuned sampler, and a realworld named entity recognition task, in which our sampler’s ability to escape local modes yields higher final F1 scores than singlesite Gibbs. 
Neural Collaborative Filtering (NCF) 
In recent years, deep neural networks have yielded immense success on speech recognition, computer vision and natural language processing. However, the exploration of deep neural networks on recommender systems has received relatively less scrutiny. In this work, we strive to develop techniques based on neural networks to tackle the key problem in recommendation, collaborative filtering, on the basis of implicit feedback. Although some recent work has employed deep learning for recommendation, it primarily used deep learning to model auxiliary information, such as textual descriptions of items and acoustic features of music. When it comes to modeling the key factor in collaborative filtering, the interaction between user and item features, these works still resorted to matrix factorization and applied an inner product on the latent features of users and items. By replacing the inner product with a neural architecture that can learn an arbitrary function from data, we present a general framework named NCF, short for Neural network-based Collaborative Filtering. NCF is generic and can express and generalize matrix factorization under its framework. To supercharge NCF modelling with nonlinearities, we propose to leverage a multi-layer perceptron to learn the user-item interaction function. Extensive experiments on two real-world datasets show significant improvements of our proposed NCF framework over the state-of-the-art methods. Empirical evidence shows that using deeper layers of neural networks offers better recommendation performance. 
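The relationship between the inner product and a learned interaction function can be sketched in NumPy. In this illustration, a weighted element-wise product of the embeddings reduces to plain matrix factorization when the weights are all ones, while an MLP over the concatenated embeddings gives a nonlinear alternative. All embeddings and weights are random placeholders and the function names are hypothetical.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def mf_score(p_u, q_i):
    """Matrix factorization: inner product of user and item embeddings."""
    return float(p_u @ q_i)

def weighted_product_score(p_u, q_i, h):
    """Element-wise product followed by a learned weight vector h."""
    return float(h @ (p_u * q_i))

def mlp_score(p_u, q_i, layers, h):
    """MLP over concatenated embeddings, squashed to (0, 1) for implicit feedback."""
    z = np.concatenate([p_u, q_i])
    for W, b in layers:
        z = np.maximum(W @ z + b, 0.0)   # ReLU hidden layers
    return float(sigmoid(h @ z))

rng = np.random.default_rng(0)
p_u, q_i = rng.normal(size=8), rng.normal(size=8)     # placeholder embeddings
layers = [(rng.normal(size=(16, 16)) * 0.1, np.zeros(16)),
          (rng.normal(size=(8, 16)) * 0.1, np.zeros(8))]

mf = mf_score(p_u, q_i)
special_case = weighted_product_score(p_u, q_i, np.ones(8))
nonlinear = mlp_score(p_u, q_i, layers, rng.normal(size=8))
```

Setting the weight vector to ones recovers the inner product exactly, which is one concrete sense in which such a framework can express matrix factorization.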
Neural Component Analysis (NCA) 
Principal component analysis (PCA) is widely adopted for chemical process monitoring and numerous PCA-based systems have been developed to solve various fault detection and diagnosis problems. Since PCA-based methods assume that the monitored process is linear, nonlinear PCA models, such as autoencoder models and kernel principal component analysis (KPCA), have been proposed and applied to nonlinear process monitoring. However, KPCA-based methods need to perform eigen-decomposition (ED) on the kernel Gram matrix whose dimensions depend on the number of training data. Moreover, prefixed kernel parameters cannot be most effective for different faults which may need different parameters to maximize their respective detection performances. Autoencoder models lack the consideration of orthogonal constraints, which are crucial for PCA-based algorithms. To address these problems, this paper proposes a novel nonlinear method, called neural component analysis (NCA), which intends to train a feedforward neural network with orthogonal constraints such as those used in PCA. NCA can adaptively learn its parameters through backpropagation and the dimensionality of the nonlinear features has no relationship with the number of training samples. Extensive experimental results on the Tennessee Eastman (TE) benchmark process show the superiority of NCA in terms of missed detection rate (MDR) and false alarm rate (FAR). The source code of NCA can be found in https://…/NeuralComponentAnalysis.git. 
Neural Decision Trees  In this paper we propose a synergistic melting of neural networks and decision trees into a deep hashing neural network (HNN) having a modeling capability exponential with respect to its number of neurons. We first derive a soft decision tree named neural decision tree allowing the optimization of arbitrary decision function at each split node. We then rewrite this soft space partitioning as a new kind of neural network layer, namely the hashing layer (HL), which can be seen as a generalization of the known softmax layer. This HL can easily replace the standard last layer of ANN in any known network topology and thus can be used after a convolutional or recurrent neural network for example. We present the modeling capacity of this deep hashing function on small datasets where one can reach at least equally good results as standard neural networks by diminishing the number of output neurons. Finally, we show that for the case where the number of output neurons is large, the neural network can mitigate the absence of linear decision boundaries by learning for each difficult class a collection of not necessarily connected subregions of the space leading to more flexible decision surfaces. Finally, the HNN can be seen as a deep locality sensitive hashing function which can be trained in a supervised or unsupervised setting as we will demonstrate for classification and regression problems. 
Neural Decomposition (ND) 
We present a neural network technique for the analysis and extrapolation of time-series data called Neural Decomposition (ND). Units with a sinusoidal activation function are used to perform a Fourier-like decomposition of training samples into a sum of sinusoids, augmented by units with nonperiodic activation functions to capture linear trends and other nonperiodic components. We show how careful weight initialization can be combined with regularization to form a simple model that generalizes well. Our method generalizes effectively on the Mackey-Glass series, a dataset of unemployment rates as reported by the U.S. Department of Labor Statistics, a time-series of monthly international airline passengers, the monthly ozone concentration in downtown Los Angeles, and an unevenly sampled time-series of oxygen isotope measurements from a cave in north India. We find that ND outperforms popular time-series forecasting techniques including LSTM, echo state networks, ARIMA, SARIMA, SVR with a radial basis function, and Gashler and Ashmore’s model. 
Neural Hawkes Process  Many events occur in the world. Some event types are stochastically excited or inhibited—in the sense of having their probabilities elevated or decreased—by patterns in the sequence of previous events. Discovering such patterns can help us predict which type of event will happen next and when. Learning such structure should benefit various applications, including medical prognosis, consumer behavior, and social media activity prediction. We propose to model streams of discrete events in continuous time, by constructing a neurally selfmodulating multivariate point process. This generative model allows past events to influence the future in complex ways, by conditioning future event intensities on the hidden state of a recurrent neural network that has consumed the stream of past events. We evaluate our model on multiple datasets and show that it significantly outperforms other strong baselines. 
Neural Inference Network (NIN) 
Neural networks have been learning complex multihop reasoning in various domains. One such formal setting for reasoning, logic, provides a challenging case for neural networks. In this article, we propose a Neural Inference Network (NIN) for learning logical inference over classes of logic programs. Trained in an endtoend fashion NIN learns representations of normal logic programs, by processing them at a character level, and the reasoning algorithm for checking whether a logic program entails a given query. We define 12 classes of logic programs that exemplify increased level of complexity of the inference process (multihop and default reasoning) and show that our NIN passes 10 out of the 12 tasks. We also analyse the learnt representations of logic programs that NIN uses to perform the logical inference. 
Neural Lattice Decoder  Lattice decoders constructed with neural networks are presented. Firstly, we show how the fundamental parallelotope is used as a compact set for the approximation by a neural lattice decoder. Secondly, we introduce the notion of Voronoireduced lattice basis. As a consequence, a first optimal neural lattice decoder is built from Boolean equations and the facets of the Voronoi region. This decoder needs no learning. Finally, we present two neural decoders with learning. It is shown that L1 regularization and a priori information about the lattice structure lead to a simplification of the model. 
Neural Lattice Language Model  In this work, we propose a new language modeling paradigm that has the ability to perform both prediction and moderation of information flow at multiple granularities: neural lattice language models. These models construct a lattice of possible paths through a sentence and marginalize across this lattice to calculate sequence probabilities or optimize parameters. This approach allows us to seamlessly incorporate linguistic intuitions – including polysemy and existence of multiword lexical items – into our language model. Experiments on multiple language modeling tasks show that English neural lattice language models that utilize polysemous embeddings are able to improve perplexity by 9.95% relative to a wordlevel baseline, and that a Chinese model that handles multicharacter tokens is able to improve perplexity by 20.94% relative to a characterlevel baseline. 
Neural Machine Translation (NMT) 
Neural machine translation (NMT) is the approach to machine translation in which a large neural network is trained to maximize translation performance. It is a radical departure from the phrasebased statistical translation approaches, in which a translation system consists of subcomponents that are separately optimized. The artificial neural network (ANN) is a model inspired by the functional aspects and structure of the brain’s biological neural networks. With use of ANN, it is possible to execute a number of tasks, such as classification, clustering, and prediction, using machine learning techniques like supervised or reinforced learning to learn or adjust net connections. A bidirectional recurrent neural network (RNN), known as an encoder, is used by the neural network to encode a source sentence for a second RNN, known as a decoder, that is used to predict words in the target language. NMT models are inspired by deep representation learning. They require only a fraction of the memory needed by traditional statistical machine translation (SMT) models. Furthermore, unlike conventional translation systems, each and every component of the neural translation model is trained jointly to maximize the translation performance. When a new neural network is created, it is trained for certain domains or applications. Once an automatic learning mechanism is established, the network practices. With time it starts operating according to its own judgment, turning into an ‘expert’. 
Neural Network Quine  Selfreplication is a key aspect of biological life that has been largely overlooked in Artificial Intelligence systems. Here we describe how to build and train selfreplicating neural networks. The network replicates itself by learning to output its own weights. The network is designed using a loss function that can be optimized with either gradientbased or nongradientbased methods. We also describe a method we call regeneration to train the network without explicit optimization, by injecting the network with predictions of its own parameters. The best solution for a selfreplicating network was found by alternating between regeneration and optimization steps. Finally, we describe a design for a selfreplicating neural network that can solve an auxiliary task such as MNIST image classification. We observe that there is a tradeoff between the network’s ability to classify images and its ability to replicate, but training is biased towards increasing its specialization at image classification at the expense of replication. This is analogous to the tradeoff between reproduction and other tasks observed in nature. We suggest that a selfreplication mechanism for artificial intelligence is useful because it introduces the possibility of continual improvement through natural selection. 
Neural Network Synthesis Tool (NeST) 
Neural networks (NNs) have begun to have a pervasive impact on various applications of machine learning. However, the problem of finding an optimal NN architecture for large applications has remained open for several decades. Conventional approaches search for the optimal NN architecture through extensive trial-and-error. Such a procedure is quite inefficient. In addition, the generated NN architectures incur substantial redundancy. To address these problems, we propose an NN synthesis tool (NeST) that automatically generates very compact architectures for a given dataset. NeST starts with a seed NN architecture. It iteratively tunes the architecture with gradient-based growth and magnitude-based pruning of neurons and connections. Our experimental results show that NeST yields accurate yet very compact NNs with a wide range of seed architecture selection. For example, for the LeNet-300-100 (LeNet-5) NN architecture derived from the MNIST dataset, we reduce network parameters by 34.1x (74.3x) and floating-point operations (FLOPs) by 35.8x (43.7x). For the AlexNet NN architecture derived from the ImageNet dataset, we reduce network parameters by 15.7x and FLOPs by 4.6x. All these results are the current state-of-the-art for these architectures. 
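The pruning half of the grow-and-prune loop can be sketched as follows: connections whose weights have the smallest magnitudes are removed by zeroing them and recording a mask. This is a generic magnitude-based pruning illustration, assuming a single dense weight matrix and a target sparsity; the growth phase and NeST's actual thresholds are not reproduced here.

```python
import numpy as np

def magnitude_prune(weights, sparsity):
    """Zero out roughly the `sparsity` fraction of weights with smallest |w|.

    Returns the pruned weights and a boolean mask of surviving connections.
    Ties at the threshold are pruned as well.
    """
    k = int(sparsity * weights.size)          # number of connections to remove
    if k == 0:
        return weights.copy(), np.ones(weights.shape, dtype=bool)
    threshold = np.partition(np.abs(weights).ravel(), k - 1)[k - 1]
    mask = np.abs(weights) > threshold
    return weights * mask, mask

W = np.array([[0.1, -2.0],
              [0.5, -0.05]])
pruned, mask = magnitude_prune(W, sparsity=0.5)
compression = W.size / mask.sum()             # parameter reduction factor
```

In an iterative scheme the mask would be kept fixed during retraining so that pruned connections stay at zero until a growth step re-adds them.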
Neural Networks / Artificial Neural Networks (ANN) 
In computer science and related fields, artificial neural networks (ANNs) are computational models, inspired by animals’ central nervous systems (in particular the brain), that are capable of machine learning as well as pattern recognition. Artificial neural networks are generally presented as systems of interconnected “neurons” which can compute values from inputs. neural, neuralnet 
Neural Predictive Coding (NPC) 
Learning speaker-specific features is vital in many applications like speaker recognition, diarization and speech recognition. This paper provides a novel approach, which we term Neural Predictive Coding (NPC), to learn speaker-specific characteristics in a completely unsupervised manner from large amounts of unlabeled training data that even contain multi-speaker audio streams. The NPC framework exploits the proposed short-term active-speaker stationarity hypothesis, which assumes that two temporally close short speech segments belong to the same speaker, and thus a common representation that can encode the commonalities of both segments should capture the vocal characteristics of that speaker. We train a convolutional deep siamese network to produce ‘speaker embeddings’ by optimizing a loss function that increases between-speaker variability and decreases within-speaker variability. The trained NPC model can produce these embeddings by projecting any test audio stream into a high dimensional manifold where speech frames of the same speaker come closer than they do in the raw feature space. Results in the frame-level speaker classification experiment along with the visualization of the embeddings manifest the distinctive ability of the NPC model to learn short-term speaker-specific features as compared to raw MFCC features and i-vectors. The utterance-level speaker classification experiments show that concatenating simple statistics of the short-term NPC embeddings over the whole utterance with the utterance-level i-vectors can give useful complementary information to the i-vectors and boost the classification accuracy. The results also show the efficacy of this technique in learning those characteristics from a large unlabeled training set which has no prior information about the environment of the test set. 
Neural Process (NP) 
A neural network (NN) is a parameterised function that can be tuned via gradient descent to approximate a labelled collection of data with high precision. A Gaussian process (GP), on the other hand, is a probabilistic model that defines a distribution over possible functions, and is updated in light of data via the rules of probabilistic inference. GPs are probabilistic, data-efficient and flexible; however, they are also computationally intensive and thus limited in their applicability. We introduce a class of neural latent variable models which we call Neural Processes (NPs), combining the best of both worlds. Like GPs, NPs define distributions over functions, are capable of rapid adaptation to new observations, and can estimate the uncertainty in their predictions. Like NNs, NPs are computationally efficient during training and evaluation but also learn to adapt their priors to data. We demonstrate the performance of NPs on a range of learning tasks, including regression and optimisation, and compare and contrast with related models in the literature. 
Neural Programmer  Deep neural networks have achieved impressive supervised classification performance in many tasks including image recognition, speech recognition, and sequence-to-sequence learning. However, this success has not been translated to applications like question answering that may involve complex arithmetic and logic reasoning. A major limitation of these models is their inability to learn even simple arithmetic and logic operations. For example, it has been shown that neural networks fail to learn to add two binary numbers reliably. In this work, we propose Neural Programmer, an end-to-end differentiable neural network augmented with a small set of basic arithmetic and logic operations. Neural Programmer can call these augmented operations over several steps, thereby inducing compositional programs that are more complex than the built-in operations. The model learns from a weak supervision signal which is the result of execution of the correct program, hence it does not require expensive annotation of the correct program itself. The decisions of what operations to call, and what data segments to apply them to, are inferred by Neural Programmer. Such decisions, during training, are made in a differentiable fashion so that the entire network can be trained jointly by gradient descent. We find that training the model is difficult, but it can be greatly improved by adding random noise to the gradient. On a fairly complex synthetic table-comprehension dataset, traditional recurrent networks and attentional models perform poorly while Neural Programmer typically obtains nearly perfect accuracy. 
Neural Reasoner  We propose Neural Reasoner, a framework for neural network-based reasoning over natural language sentences. Given a question, Neural Reasoner can infer over multiple supporting facts and find an answer to the question in specific forms. Neural Reasoner has 1) a specific interaction-pooling mechanism, allowing it to examine multiple facts, and 2) a deep architecture, allowing it to model the complicated logical relations in reasoning tasks. Assuming no particular structure exists in the question and facts, Neural Reasoner is able to accommodate different types of reasoning and different forms of language expressions. Despite the model complexity, Neural Reasoner can still be trained effectively in an end-to-end manner. Our empirical studies show that Neural Reasoner can outperform existing neural reasoning systems by remarkable margins on two difficult artificial tasks (Positional Reasoning and Path Finding) proposed in earlier work. For example, it improves the accuracy on Path Finding(10K) from 33.4% to over 98%. 
Neural Relational Inference (NRI) 
Interacting systems are prevalent in nature, from dynamical systems in physics to complex societal dynamics. The interplay of components can give rise to complex behavior, which can often be explained using a simple model of the system’s constituent parts. In this work, we introduce the neural relational inference (NRI) model: an unsupervised model that learns to infer interactions while simultaneously learning the dynamics purely from observational data. Our model takes the form of a variational auto-encoder, in which the latent code represents the underlying interaction graph and the reconstruction is based on graph neural networks. In experiments on simulated physical systems, we show that our NRI model can accurately recover ground-truth interactions in an unsupervised manner. We further demonstrate that we can find an interpretable structure and predict complex dynamics in real motion capture and sports tracking data. 
Neural Semantic Encoders (NSE) 
We present a memory-augmented neural network for natural language understanding: Neural Semantic Encoders (NSE). NSE has a variable-sized encoding memory that evolves over time and maintains the understanding of input sequences through read, compose and write operations. NSE can access multiple and shared memories depending on the complexity of a task. We demonstrated the effectiveness and the flexibility of NSE on five different natural language tasks: natural language inference, question answering, sentence classification, document sentiment analysis and machine translation, where NSE achieved state-of-the-art performance when evaluated on publicly available benchmarks. For example, our shared-memory model showed an encouraging result on neural machine translation, improving an attention-based baseline by approximately 1.0 BLEU. 
Neural SLAM  We present an approach for agents to learn representations of a global map from sensor data, to aid their exploration in new environments. To achieve this, we embed procedures mimicking those of traditional Simultaneous Localization and Mapping (SLAM) into the soft-attention-based addressing of external memory architectures, in which the external memory acts as an internal representation of the environment. This structure encourages the evolution of SLAM-like behaviors inside a completely differentiable deep neural network. We show that this approach can help reinforcement learning agents to successfully explore new environments where long-term memory is essential. We validate our approach in both challenging grid-world environments and preliminary Gazebo experiments. 
Neural Sobolev Descent  We introduce Regularized Kernel and Neural Sobolev Descent for transporting a source distribution to a target distribution along smooth paths of minimum kinetic energy (defined by the Sobolev discrepancy), related to dynamic optimal transport. In the kernel version, we give a simple algorithm to perform the descent along gradients of the Sobolev critic, and show that it converges asymptotically to the target distribution in the MMD sense. In the neural version, we parametrize the Sobolev critic with a neural network with input gradient norm constrained in expectation. We show in theory and experiments that regularization has an important role in favoring smooth transitions between distributions, avoiding large discrete jumps. Our analysis could provide a new perspective on the impact of critic updates (early stopping) on the paths to equilibrium in the GAN setting. 
Neural SPARQL Machine  In recent years, the Linked Data Cloud has grown to more than 100 billion facts pertaining to a multitude of domains. However, accessing this information has been significantly challenging for lay users. Approaches to problems such as Question Answering on Linked Data and Link Discovery have notably played a role in increasing information access. These approaches are often based on handcrafted and/or statistical models derived from data observation. Recently, Deep Learning architectures based on Neural Networks called seq2seq have been shown to achieve state-of-the-art results at translating sequences into sequences. In this direction, we propose Neural SPARQL Machines, end-to-end deep architectures to translate any natural language expression into sentences encoding SPARQL queries. Our preliminary results, restricted to selected DBpedia classes, show that Neural SPARQL Machines are a promising approach for Question Answering on Linked Data, as they can deal with known problems such as vocabulary mismatch and perform graph pattern composition. 
Neural Style Transfer  The recent work of Gatys et al. demonstrated the power of Convolutional Neural Networks (CNNs) in creating fantastic artistic imagery by separating and recombining image content and style. This process of using CNNs to render the semantic content of one image in different styles is referred to as Neural Style Transfer. 
Neural Task Programming (NTP) 
In this work, we propose a novel robot learning framework called Neural Task Programming (NTP), which bridges the idea of few-shot learning from demonstration and neural program induction. NTP takes as input a task specification (e.g., video demonstration of a task) and recursively decomposes it into finer sub-task specifications. These specifications are fed to a hierarchical neural program, where bottom-level programs are callable subroutines that interact with the environment. We validate our method in three robot manipulation tasks. NTP achieves strong generalization across sequential tasks that exhibit hierarchical and compositional structures. The experimental results show that NTP learns to generalize well towards unseen tasks with increasing lengths, variable topologies, and changing objectives. 
Neural Tensor Factorization (NTF) 
Neural collaborative filtering (NCF) and recurrent recommender systems (RRN) have been successful in modeling user-item relational data. However, they are limited by their assumption of static or sequential modeling of relational data: they account neither for users’ evolving preferences over time nor for changes in the underlying factors that drive the user-item relationship over time. We address these limitations by proposing a Neural Tensor Factorization (NTF) model for predictive tasks on dynamic relational data. The NTF model generalizes conventional tensor factorization from two perspectives: First, it leverages the long short-term memory architecture to characterize the multi-dimensional temporal interactions on relational data. Second, it incorporates the multi-layer perceptron structure for learning the non-linearities between different latent factors. Our extensive experiments demonstrate the significant improvement in rating prediction and link prediction on dynamic relational data by our NTF model over both neural network based factorization models and other traditional methods. 
Neural Tensor Network (NTN) 
The Neural Tensor Network (NTN) replaces a standard linear neural network layer with a bilinear tensor layer that directly relates two entity vectors across multiple dimensions. The model computes a score of how likely it is that two entities are in a certain relationship. 
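The scoring step can be illustrated with a minimal sketch. The entry only states that a bilinear tensor layer relates two entity vectors and yields a score; the additive linear term, the bias, the tanh nonlinearity and the output weights `u` below follow the commonly cited NTN formulation and should be read as assumptions, as should all parameter names and shapes.

```python
import math

def ntn_score(e1, e2, W, V, b, u):
    """Score how plausibly entities e1 and e2 stand in one relation.

    Assumed shapes (illustrative only): e1, e2 have length d; W holds k
    slices of size d x d; V holds k rows of length 2d; b and u have length k.
    """
    pair = list(e1) + list(e2)  # concatenation [e1; e2] for the linear term
    score = 0.0
    for W_k, V_k, b_k, u_k in zip(W, V, b, u):
        # Bilinear term e1^T W_k e2: one interaction per tensor slice.
        bilinear = sum(
            e1[i] * W_k[i][j] * e2[j]
            for i in range(len(e1))
            for j in range(len(e2))
        )
        # Standard linear layer applied to the concatenated pair.
        linear = sum(v * x for v, x in zip(V_k, pair))
        score += u_k * math.tanh(bilinear + linear + b_k)
    return score
```

With a single identity slice and zero linear weights, the score reduces to `tanh(e1 · e2)`, which makes the role of the bilinear term easy to inspect.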
Neural Turing Machines (NTM) 
We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples. Neural Turing Machines are fully differentiable computers that use backpropagation to learn their own programming. 
Neural Vector Space Model (NVSM) 
We propose the Neural Vector Space Model (NVSM), a method that learns representations of documents in an unsupervised manner for news article retrieval. In the NVSM paradigm, we learn low-dimensional representations of words and documents from scratch using gradient descent and rank documents according to their similarity with query representations that are composed from word representations. We show that NVSM performs better at document ranking than existing latent semantic vector space methods. The addition of NVSM to a mixture of lexical language models and a state-of-the-art baseline vector space model yields a statistically significant increase in retrieval effectiveness. Consequently, NVSM adds a complementary relevance signal. In addition to semantic matching, we find that NVSM performs well in cases where lexical matching is needed. NVSM learns a notion of term specificity directly from the document collection without feature engineering. We also show that NVSM learns regularities related to Luhn significance. Finally, we give advice on how to deploy NVSM in situations where model selection (e.g., cross-validation) is infeasible. We find that an unsupervised ensemble of multiple models trained with different hyperparameter values performs better than a single cross-validated model. Therefore, NVSM can safely be used for ranking documents without supervised relevance judgments. 
Neurally Directed Program Search (NDPS) 
We study the problem of generating interpretable and verifiable policies through reinforcement learning. Unlike the popular Deep Reinforcement Learning (DRL) paradigm, in which the policy is represented by a neural network, the aim in Programmatically Interpretable Reinforcement Learning is to find a policy that can be represented in a high-level programming language. Such programmatic policies have the benefits of being more easily interpreted than neural networks, and being amenable to verification by symbolic methods. We propose a new method, called Neurally Directed Program Search (NDPS), for solving the challenging nonsmooth optimization problem of finding a programmatic policy with maximal reward. NDPS works by first learning a neural policy network using DRL, and then performing a local search over programmatic policies that seeks to minimize a distance from this neural ‘oracle’. We evaluate NDPS on the task of learning to drive a simulated car in the TORCS car-racing environment. We demonstrate that NDPS is able to discover human-readable policies that pass some significant performance bars. We also find that a well-designed policy language can serve as a regularizer, and result in the discovery of policies that lead to smoother trajectories and are more easily transferred to environments not encountered during training. 
Neuroevolution  The quest to evolve neural networks through evolutionary algorithms. 
Neuro-Fuzzy  In the field of artificial intelligence, neuro-fuzzy refers to combinations of artificial neural networks and fuzzy logic. Neuro-fuzzy was proposed by J. S. R. Jang. Neuro-fuzzy hybridization results in a hybrid intelligent system that synergizes these two techniques by combining the human-like reasoning style of fuzzy systems with the learning and connectionist structure of neural networks. Neuro-fuzzy hybridization is widely termed Fuzzy Neural Network (FNN) or Neuro-Fuzzy System (NFS) in the literature. A neuro-fuzzy system (the more popular term is used henceforth) incorporates the human-like reasoning style of fuzzy systems through the use of fuzzy sets and a linguistic model consisting of a set of IF-THEN fuzzy rules. The main strength of neuro-fuzzy systems is that they are universal approximators with the ability to solicit interpretable IF-THEN rules. The strength of neuro-fuzzy systems involves two contradictory requirements in fuzzy modeling: interpretability versus accuracy. In practice, one of the two properties prevails. Neuro-fuzzy research in the fuzzy modeling field is divided into two areas: linguistic fuzzy modeling that is focused on interpretability, mainly the Mamdani model; and precise fuzzy modeling that is focused on accuracy, mainly the Takagi-Sugeno-Kang (TSK) model. Although generally assumed to be the realization of a fuzzy system through connectionist networks, this term is also used to describe some other configurations including: · Deriving fuzzy rules from trained RBF networks. · Fuzzy logic based tuning of neural network training parameters. · Fuzzy logic criteria for increasing a network size. · Realising fuzzy membership functions through clustering algorithms in unsupervised learning in SOMs and neural networks. · Representing fuzzification, fuzzy inference and defuzzification through multi-layer feed-forward connectionist networks. It must be pointed out that interpretability of the Mamdani-type neuro-fuzzy systems can be lost. 
To improve the interpretability of neuro-fuzzy systems, certain measures must be taken; important aspects of the interpretability of neuro-fuzzy systems are discussed in the literature. A recent research line addresses the data stream mining case, where neuro-fuzzy systems are sequentially updated with new incoming samples on demand and on-the-fly. Thereby, system updates include not only a recursive adaptation of model parameters, but also a dynamic evolution and pruning of model components (neurons, rules), in order to handle concept drift and dynamically changing system behavior adequately and to keep the systems/models ‘up-to-date’ at any time. Comprehensive surveys of various evolving neuro-fuzzy systems approaches are available in the literature. frbs 
Neuro-Fuzzy System  Modern neuro-fuzzy systems are usually represented as special multilayer feedforward neural networks (see for example models like ANFIS, FuNe, Fuzzy RuleNet, GARIC, or NEFCLASS and NEFCON). However, fuzzifications of other neural network architectures are also considered, for example self-organizing feature maps. In those neuro-fuzzy networks, connection weights and propagation and activation functions differ from common neural networks. Although there are a lot of different approaches, we usually use the term neuro-fuzzy system for approaches which display the following properties: · A neuro-fuzzy system is based on a fuzzy system which is trained by a learning algorithm derived from neural network theory. The (heuristic) learning procedure operates on local information, and causes only local modifications in the underlying fuzzy system. · A neuro-fuzzy system can be viewed as a 3-layer feedforward neural network. The first layer represents input variables, the middle (hidden) layer represents fuzzy rules and the third layer represents output variables. Fuzzy sets are encoded as (fuzzy) connection weights. It is not necessary to represent a fuzzy system like this to apply a learning algorithm to it. However, it can be convenient, because it represents the data flow of input processing and learning within the model. Remark: Sometimes a 5-layer architecture is used, where the fuzzy sets are represented in the units of the second and fourth layer. · A neuro-fuzzy system can always (i.e., before, during and after learning) be interpreted as a system of fuzzy rules. It is possible both to create the system out of training data from scratch and to initialize it with prior knowledge in the form of fuzzy rules. Remark: Not all neuro-fuzzy models specify learning procedures for fuzzy rule creation. · The learning procedure of a neuro-fuzzy system takes the semantic properties of the underlying fuzzy system into account. 
This results in constraints on the possible modifications applicable to the system parameters. Remark: Not all neuro-fuzzy approaches have this property. · A neuro-fuzzy system approximates an $n$-dimensional (unknown) function that is partially defined by the training data. The fuzzy rules encoded within the system represent vague samples, and can be viewed as prototypes of the training data. A neuro-fuzzy system should not be seen as a kind of (fuzzy) expert system, and it has nothing to do with fuzzy logic in the narrow sense. frbs 
NeuroIndex  The article describes a new data structure called the neuro-index. It is an alternative to well-known file indexes. The neuro-index is fundamentally different because it stores weight coefficients in a neural network rather than references of the type ‘keyword-position in a file’. 
Neuroinformatics  Neuroinformatics is a research field concerned with the organization of neuroscience data by the application of computational models and analytical tools. These areas of research are important for the integration and analysis of increasingly large-volume, high-dimensional, and fine-grain experimental data. Neuroinformaticians provide computational tools and mathematical models, and create interoperable databases for clinicians and research scientists. Neuroscience is a heterogeneous field, consisting of many and various sub-disciplines (e.g., Cognitive Psychology, Behavioral Neuroscience, and Behavioral Genetics). In order for our understanding of the brain to continue to deepen, it is necessary that these sub-disciplines are able to share data and findings in a meaningful way; Neuroinformaticians facilitate this. Neuroinformatics stands at the intersection of neuroscience and information science. Other fields, like genomics, have demonstrated the effectiveness of freely distributed databases and the application of theoretical and computational models for solving complex problems. In Neuroinformatics, such facilities allow researchers to more easily quantitatively confirm their working theories by computational modeling. Additionally, neuroinformatics fosters collaborative research, an important fact that facilitates the field’s interest in studying the multi-level complexity of the brain. There are three main directions where neuroinformatics has to be applied: 1. the development of tools and databases for management and sharing of neuroscience data at all levels of analysis, 2. the development of tools for analyzing and modeling neuroscience data, 3. the development of computational models of the nervous system and neural processes. 
NEURON  Natural language interfaces for relational databases have been explored for several decades. The majority of the work has focused on translating natural language sentences to SQL queries or narrating SQL queries in natural language. Scant attention has been paid to natural language understanding of query execution plans (QEPs) of SQL queries. In this demonstration, we present a novel generic system called NEURON that facilitates natural language interaction with QEPs. NEURON accepts a SQL query (which may include joins, aggregation, nesting, among other things) as input, executes it, and generates a natural language-based description (both in text and voice form) of the execution strategy deployed by the underlying RDBMS. Furthermore, it facilitates understanding of various features related to the QEP through a natural language-based question answering framework. NEURON can be potentially useful to database application developers in comprehending query execution strategies and to database instructors and students for pedagogical support. 
Newick Format  In mathematics, Newick tree format (or Newick notation or New Hampshire tree format) is a way of representing graph-theoretical trees with edge lengths using parentheses and commas. It was adopted by James Archie, William H. E. Day, Joseph Felsenstein, Wayne Maddison, Christopher Meacham, F. James Rohlf, and David Swofford, at two meetings in 1986, the second of which was at Newick’s restaurant in Dover, New Hampshire, US. The adopted format is a generalization of the format developed by Meacham in 1984 for the first tree-drawing programs in Felsenstein’s PHYLIP package. ggtree 
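To make the parentheses-and-commas notation concrete, the sketch below parses a simple Newick string into nested dictionaries. It is an illustrative reader for a restricted subset only (no whitespace, quoted labels or bracketed comments); the function names `parse_newick` and `leaf_names` and the dictionary layout are assumptions made for this example, not part of any package mentioned in the entry.

```python
def parse_newick(text):
    """Parse a simple Newick string into {'name', 'length', 'children'} dicts."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def parse_node():
        nonlocal pos
        children = []
        if text[pos] == "(":
            pos += 1  # consume '(' and read a comma-separated list of subtrees
            children.append(parse_node())
            while text[pos] == ",":
                pos += 1
                children.append(parse_node())
            if text[pos] != ")":
                raise ValueError(f"expected ')' at position {pos}")
            pos += 1
        # Optional node label: everything up to the next structural character.
        start = pos
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        # Optional edge length introduced by ':'.
        length = None
        if text[pos] == ":":
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = parse_node()
    if text[pos] != ";":
        raise ValueError(f"unexpected character at position {pos}")
    return root


def leaf_names(node):
    """Collect leaf labels from left to right."""
    if not node["children"]:
        return [node["name"]]
    return [name for child in node["children"] for name in leaf_names(child)]


tree = parse_newick("(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;")
print(leaf_names(tree))
```

In the example string, `F` is the root, `E` is an internal node with edge length 0.5, and `A` to `D` are leaves with their own edge lengths.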
NewSQL  NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance as NoSQL systems for online transaction processing (OLTP) read-write workloads while still maintaining the ACID guarantees of a traditional database system. 
Newton-type Alternating Minimization Algorithm (NAMA) 
We propose NAMA (Newton-type Alternating Minimization Algorithm) for solving structured nonsmooth convex optimization problems where the sum of two functions is to be minimized, one being strongly convex and the other composed with a linear mapping. The proposed algorithm is a line-search method over a continuous, real-valued, exact penalty function for the corresponding dual problem, which is computed by evaluating the augmented Lagrangian at the primal points obtained by alternating minimizations. As a consequence, NAMA relies on exactly the same computations as the classical alternating minimization algorithm (AMA), also known as the dual proximal gradient method. Under standard assumptions the proposed algorithm possesses strong convergence properties, while under mild additional assumptions the asymptotic convergence is superlinear, provided that the search directions are chosen according to quasi-Newton formulas. Due to its simplicity, the proposed method is well suited for embedded applications and large-scale problems. Experiments show that using limited-memory directions in NAMA greatly improves the convergence speed over AMA and its accelerated variant. 
Neyman-Pearson Classification  
N-Gram  In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus. An n-gram of size 1 is referred to as a “unigram”; size 2 is a “bigram” (or, less commonly, a “digram”); size 3 is a “trigram”. Larger sizes are sometimes referred to by the value of n, e.g., “four-gram”, “five-gram”, and so on. 
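The definition translates directly into a sliding window over the sequence. The following minimal sketch uses an illustrative helper name, `ngrams`, and whitespace tokenization, both chosen for this example only.

```python
def ngrams(items, n):
    """Return all contiguous length-n windows of a sequence as tuples."""
    if n < 1:
        raise ValueError("n must be a positive integer")
    # A sequence shorter than n yields an empty list.
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]


words = "to be or not to be".split()
print(ngrams(words, 1))  # unigrams
print(ngrams(words, 2))  # bigrams
print(ngrams(words, 3))  # trigrams
```

The same function works on characters (pass a string) or on any other item type, such as phonemes or base pairs, as long as the input supports slicing.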
N-Gram Machine (NGM) 
Deep neural networks (DNNs) have had great success on NLP tasks such as language modeling, machine translation and certain question answering (QA) tasks. However, the success is limited on more knowledge-intensive tasks such as QA from a big corpus. Existing end-to-end deep QA models (Miller et al., 2016; Weston et al., 2014) need to read the entire text after observing the question, and therefore their complexity in responding to a question is linear in the text size. This is prohibitive for practical tasks such as QA from Wikipedia, a novel, or the Web. We propose to solve this scalability issue by using symbolic meaning representations, which can be indexed and retrieved efficiently with complexity that is independent of the text size. More specifically, we use sequence-to-sequence models to encode knowledge symbolically and generate programs to answer questions from the encoded knowledge. We apply our approach, called the N-Gram Machine (NGM), to the bAbI tasks (Weston et al., 2015) and a special version of them (‘life-long bAbI’) which has stories of up to 10 million sentences. Our experiments show that NGM can solve both of these tasks accurately and efficiently. Unlike fully differentiable memory models, NGM’s time complexity and answering quality are not affected by the story length. The whole system of NGM is trained end-to-end with REINFORCE (Williams, 1992). To avoid high variance in gradient estimation, which is typical in discrete latent variable models, we use beam search instead of sampling. To tackle the exponentially large search space, we use a stabilized auto-encoding objective and a structure tweak procedure to iteratively reduce and refine the search space. 
Niching  Simply put, niching is a class of methods that try to converge to more than one solution during a single run. Niching is the idea of segmenting the population of the GA into disjoint sets, so that there is at least one member in each ‘interesting’ region of the fitness function; generally this means covering more than one local optimum. Algorithm of the Week: Niching in Genetic Algorithms 
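As one illustration of how a genetic algorithm can be pushed to keep several regions populated, the sketch below uses fitness sharing, a classical niching technique that the entry itself does not name. It is a sketch under stated assumptions: individuals are real numbers, distance is the absolute difference, and `shared_fitness`, `sigma_share` and `alpha` are illustrative names for this example.

```python
def shared_fitness(population, fitness, sigma_share=1.0, alpha=1.0):
    """Divide each raw fitness by a niche count (fitness sharing).

    Individuals closer than sigma_share to many others are penalised,
    which discourages the whole population from crowding onto one peak.
    """
    shared = []
    for x in population:
        niche_count = 0.0
        for y in population:
            d = abs(x - y)  # assumed distance measure for 1-D individuals
            if d < sigma_share:
                # Triangular sharing function: 1 at d = 0, 0 at d >= sigma_share.
                niche_count += 1.0 - (d / sigma_share) ** alpha
        # niche_count >= 1 because every individual is at distance 0 from itself.
        shared.append(fitness(x) / niche_count)
    return shared


# Two individuals sit on the same spot, one is alone in its own niche.
print(shared_fitness([0.0, 0.0, 5.0], lambda x: 1.0))
```

Selection based on the shared values rather than the raw fitness favours the isolated individual, so less crowded regions of the search space stay represented.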
No Free Lunch Theorem (NFL) 
In mathematical folklore, the ‘no free lunch’ theorem (sometimes pluralized) of David Wolpert and William Macready appears in the 1997 ‘No Free Lunch Theorems for Optimization’. Wolpert had previously derived no free lunch theorems for machine learning (statistical inference). In 2005, Wolpert and Macready themselves indicated that the first theorem in their paper ‘state[s] that any two optimization algorithms are equivalent when their performance is averaged across all possible problems’. The 1997 theorems of Wolpert and Macready are mathematically technical, and some find them unintuitive. The folkloric ‘no free lunch’ (NFL) theorem is an easily stated and easily understood consequence of theorems Wolpert and Macready actually prove. It is weaker than the proven theorems, and thus does not encapsulate them. Various investigators have extended the work of Wolpert and Macready substantively. http://…/No_free_lunch_in_search_and_optimization 
Node Link Diagram  Graphs are frequently drawn as node-link diagrams in which the vertices are represented as disks, boxes, or textual labels and the edges are represented as line segments, polylines, or curves in the Euclidean plane. Node-link diagrams can be traced back to the 13th-century work of Ramon Llull, who drew diagrams of this type for complete graphs in order to analyze all pairwise combinations among sets of metaphysical concepts. 
Noise Sensitivity Score (NSS) 
Deep Neural Networks (DNNs) have greatly advanced the field of computer vision by achieving state-of-the-art performance in various vision tasks. These results are not limited to the field of vision but can also be seen in speech recognition and machine translation tasks. Recently, DNNs have been found to fail badly when tested with samples that are crafted by making imperceptible changes to the original input images. This causes a gap between the validation and adversarial performance of a DNN. An effective and generalizable robustness metric for evaluating the performance of DNNs on these adversarial inputs is still missing from the literature. In this paper, we propose Noise Sensitivity Score (NSS), a metric that quantifies the performance of a DNN on a specific input under different forms of fix-directional attacks. An insightful mathematical explanation is provided for deeply understanding the proposed metric. By leveraging the NSS, we also propose a skewness-based dataset robustness metric for evaluating a DNN’s adversarial performance on a given dataset. Extensive experiments using widely used state-of-the-art architectures along with popular classification datasets, such as MNIST, CIFAR-10, CIFAR-100, and ImageNet, are used to validate the effectiveness and generalization of our proposed metrics. Instead of simply measuring a DNN’s adversarial robustness in the input domain, as in previous works, the proposed NSS is built on top of insightful mathematical understanding of the adversarial attack and gives a more explicit explanation of the robustness. 
Noise-Contrastive Estimation (NCE) 
Many parametric statistical models are not properly normalised and only specified up to an intractable partition function, which renders parameter estimation difficult. Examples of unnormalised models are Gibbs distributions, Markov random fields, and neural network models in unsupervised deep learning. In previous work, the estimation principle called noise-contrastive estimation (NCE) was introduced where unnormalised models are estimated by learning to distinguish between data and auxiliary noise. An open question is how to best choose the auxiliary noise distribution. We here propose a new method that addresses this issue. The proposed method shares with NCE the idea of formulating density estimation as a supervised learning problem but, in contrast to NCE, the proposed method leverages the observed data when generating noise samples. The noise can thus be generated in a semi-automated manner. We first present the underlying theory of the new method, show that score matching emerges as a limiting case, validate the method on continuous and discrete-valued synthetic data, and show that we can expect an improved performance compared to NCE when the data lie in a lower-dimensional manifold. Then we demonstrate its applicability in unsupervised deep learning by estimating a four-layer neural image model. 
Noisin  Recurrent neural networks (RNNs) are powerful models of sequential data. They have been successfully used in domains such as text and speech. However, RNNs are susceptible to overfitting; regularization is important. In this paper we develop Noisin, a new method for regularizing RNNs. Noisin injects random noise into the hidden states of the RNN and then maximizes the corresponding marginal likelihood of the data. We show how Noisin applies to any RNN and we study many different types of noise. Noisin is unbiased–it preserves the underlying RNN on average. We characterize how Noisin regularizes its RNN both theoretically and empirically. On language modeling benchmarks, Noisin improves over dropout by as much as 12.2% on the Penn Treebank and 9.4% on the Wikitext2 dataset. We also compared the stateoftheart language model of Yang et al. 2017, both with and without Noisin. On the Penn Treebank, the method with Noisin more quickly reaches stateoftheart performance. 
Noisy Expectation Maximization (NEM) 
We present a noise-injected version of the Expectation-Maximization (EM) algorithm: the Noisy Expectation Maximization (NEM) algorithm. The NEM algorithm uses noise to speed up the convergence of the EM algorithm. The NEM theorem shows that injected noise speeds up the average convergence of the EM algorithm to a local maximum of the likelihood surface if a positivity condition holds. The generalized form of the NEM algorithm allows for arbitrary modes of noise injection, including adding and multiplying noise to the data. We demonstrate these noise benefits on EM algorithms for the Gaussian mixture model (GMM) with both additive and multiplicative NEM noise injection. A separate theorem (not presented here) shows that the noise benefit for independent identically distributed additive noise decreases with sample size in mixture models. This theorem implies that the noise benefit is most pronounced if the data is sparse. Injecting blind noise only slowed convergence. 
NoisyNet  We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent’s policy can be used to aid efficient exploration. The parameters of the noise are learned with gradient descent along with the remaining network weights. NoisyNet is straightforward to implement and adds little computational overhead. We find that replacing the conventional exploration heuristics for A3C, DQN and dueling agents (entropy reward and $\epsilon$-greedy respectively) with NoisyNet yields substantially higher scores for a wide range of Atari games, in some cases advancing the agent from sub- to super-human performance. 
Nomogram  A nomogram, also called a nomograph, alignment chart or abaque, is a graphical calculating device, a twodimensional diagram designed to allow the approximate graphical computation of a function. The field of nomography was invented in 1884 by the French engineer Philbert Maurice d’Ocagne (18621938) and used extensively for many years to provide engineers with fast graphical calculations of complicated formulas to a practical precision. Nomograms use a parallel coordinate system invented by d’Ocagne rather than standard Cartesian coordinates. A nomogram consists of a set of n scales, one for each variable in an equation. Knowing the values of n1 variables, the value of the unknown variable can be found, or by fixing the values of some variables, the relationship between the unfixed ones can be studied. The result is obtained by laying a straightedge across the known values on the scales and reading the unknown value from where it crosses the scale for that variable. The virtual or drawn line created by the straightedge is called an index line or isopleth. 
Non Metric Space (Approximate) Library (NMSLIB) 
A NonMetric Space Library (‘NMSLIB’ <https://…/nmslib> ) wrapper, which according to the authors ‘is an efficient crossplatform similarity search library and a toolkit for evaluation of similarity search methods. The goal of the ‘NMSLIB’ <https://…/nmslib> Library is to create an effective and comprehensive toolkit for searching in generic nonmetric spaces. Being comprehensive is important, because no single method is likely to be sufficient in all cases. Also note that exact solutions are hardly efficient in high dimensions and/or nonmetric spaces. Hence, the main focus is on approximate methods’. The wrapper also includes Approximate Kernel kNearestNeighbor functions based on the ‘NMSLIB’ <https://…/nmslib> ‘Python’ Library. nmslibR 
Nonconvex Conditional Gradient Sliding (NCGS) 
We investigate a projection-free method, namely conditional gradient sliding (CGS), on batched, stochastic and finite-sum nonconvex problems. CGS is a smart combination of Nesterov’s accelerated gradient method and the Frank-Wolfe (FW) method, and outperforms FW in the convex setting by saving gradient computations. However, the study of CGS in the nonconvex setting is limited. In this paper, we propose nonconvex conditional gradient sliding (NCGS), which surpasses the nonconvex Frank-Wolfe method in the batched, stochastic and finite-sum settings. 
NonHomogeneous Markov Switching Autoregressive Models (MSAR) 
In this paper, non-homogeneous Markov-Switching Autoregressive (MSAR) models are proposed to describe wind time series. In these models, several autoregressive models are used to describe the time evolution of the wind speed, and the switching between these different models is controlled by a hidden Markov chain which represents the weather types. We first block the data by month in order to remove seasonal components and propose an MSAR model with non-homogeneous autoregressive models to describe daily components. Then we discuss extensions where the hidden Markov chain is also non-stationary to handle seasonal and interannual fluctuations. NHMSAR 
Nonlinear Dimensionality Reduction (NLDR) 
High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded non-linear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space. [Figure: top-left, a 3D dataset of 1000 points in a spiraling band (a.k.a. the Swiss roll) with a rectangular hole in the middle; top-right, the original 2D manifold used to generate the 3D dataset; bottom left and right, 2D recoveries of the manifold using the LLE and Hessian LLE algorithms, respectively, as implemented by the Modular Data Processing toolkit.] Many non-linear dimensionality reduction (NLDR) methods are related to linear methods. Non-linear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that just give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Typically those that just give a visualisation are based on proximity data – that is, distance measurements. 
Nonlinear expectation  In probability theory, a nonlinear expectation is a nonlinear generalization of the expectation. Nonlinear expectations are useful in utility theory as they more closely match human behavior than traditional expectations. 
Nonlinear Iterative Partial Least Squares (NIPALS) 
In statistics, non-linear iterative partial least squares (NIPALS) is an algorithm for computing the first few components in a principal component or partial least squares analysis. For very-high-dimensional datasets, such as those generated in the ‘omics sciences (e.g., genomics, metabolomics), it is usually only necessary to compute the first few principal components. The NIPALS algorithm calculates t1 and p1′ from X. The outer product t1p1′ can then be subtracted from X, leaving the residual matrix E1. This can then be used to calculate subsequent principal components. This results in a dramatic reduction in computational time since calculation of the covariance matrix is avoided. 
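The iteration described above can be sketched in a few lines of NumPy. This is a minimal illustration, not a reference implementation: the function name, the starting column, the tolerance and the synthetic data are all assumptions made for the example, and X is assumed to be column-centred.

```python
import numpy as np

def nipals_first_component(X, tol=1e-10, max_iter=500):
    """Return (t1, p1, E1) for a column-centred matrix X (illustrative sketch)."""
    # Assumption: start from the column with the largest variance.
    t = X[:, np.argmax(X.var(axis=0))].copy()
    for _ in range(max_iter):
        p = X.T @ t / (t @ t)       # regress the columns of X on the score t
        p /= np.linalg.norm(p)      # normalise the loading to unit length
        t_new = X @ p               # regress the rows of X on the loading p
        converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
        t = t_new
        if converged:
            break
    E = X - np.outer(t, p)          # residual matrix E1 = X - t1 p1'
    return t, p, E

rng = np.random.default_rng(0)
X = rng.normal(size=(50, 5)) @ np.diag([5.0, 2.0, 1.0, 0.5, 0.1])
X -= X.mean(axis=0)
t1, p1, E1 = nipals_first_component(X)
```

Calling the same function on E1 would give the second component. Note that no covariance matrix is formed at any point, only matrix-vector products with X.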
Nonlinear Simplex Regression Model  In this paper, we propose a simplex regression model in which both the mean and the dispersion parameters are related to covariates by nonlinear predictors. We provide closedform expressions for the score function, for Fisher’s information matrix and its inverse. Some diagnostic measures are introduced. We propose a residual, obtained using Fisher’s scoring iterative scheme for the estimation of the parameters that index the regression nonlinear predictor to the mean response and numerically evaluate its behaviour. We also derive the appropriate matrices for assessing local influence on the parameter estimates under different perturbation schemes. We also proposed a scheme for the choice of starting values for the Fisher’s iterative scheme for nonlinear simplex models. The diagnostic techniques were applied on actual data. The local influence analyses reveal that the simplex models can be a modeling alternative more robust to influential cases than the beta regression models, both to linear and nonlinear models. 
Nonlinear Variable Selection based on Derivatives (NVSD) 
We investigate structured sparsity methods for variable selection in regression problems where the target depends nonlinearly on the inputs. We focus on general nonlinear functions not limiting a priori the function space to additive models. We propose two new regularizers based on partial derivatives as nonlinear equivalents of group lasso and elastic net. We formulate the problem within the framework of learning in reproducing kernel Hilbert spaces and show how the variational problem can be reformulated into a more practical finite dimensional equivalent. We develop a new algorithm derived from the ADMM principles that relies solely on closed forms of the proximal operators. We explore the empirical properties of our new algorithm for Nonlinear Variable Selection based on Derivatives (NVSD) on a set of experiments and confirm favourable properties of our structuredsparsity models and the algorithm in terms of both prediction and variable selection accuracy. 
Nonlinearity Coefficient  For a long time, designing neural architectures that exhibit high performance was considered a dark art that required expert handtuning. One of the few wellknown guidelines for architecture design is the avoidance of exploding gradients, though even this guideline has remained relatively vague and circumstantial. We introduce the nonlinearity coefficient (NLC), a measurement of the complexity of the function computed by a neural network that is based on the magnitude of the gradient. Via an extensive empirical study, we show that the NLC is a powerful predictor of test error and that attaining a rightsized NLC is essential for optimal performance. The NLC exhibits a range of intriguing and important properties. It is closely tied to the amount of information gained from computing a single network gradient. It is tied to the error incurred when replacing the nonlinearity operations in the network with linear operations. It is not susceptible to the confounders of multiplicative scaling, additive bias and layer width. It is stable from layer to layer. Hence, we argue that the NLC is the first robust predictor of overfitting in deep networks. 
Nonlocal Neural Network  Both convolutional and recurrent operations are building blocks that process one local neighborhood at a time. In this paper, we present nonlocal operations as a generic family of building blocks for capturing longrange dependencies. Inspired by the classical nonlocal means method in computer vision, our nonlocal operation computes the response at a position as a weighted sum of the features at all positions. This building block can be plugged into many computer vision architectures. On the task of video classification, even without any bells and whistles, our nonlocal models can compete or outperform current competition winners on both Kinetics and Charades datasets. In static image recognition, our nonlocal models improve object detection/segmentation and pose estimation on the COCO suite of tasks. Code will be made available. 
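As a rough sketch of "the response at a position as a weighted sum of the features at all positions", the snippet below uses a softmax over dot-product similarities as the weighting. That particular weighting, the projection matrices `w_theta`, `w_phi`, `w_g` and the array shapes are assumptions chosen for illustration; the abstract above does not fix them.

```python
import numpy as np

def nonlocal_operation(x, w_theta, w_phi, w_g):
    """x has shape (positions, channels); returns shape (positions, d)."""
    theta, phi, g = x @ w_theta, x @ w_phi, x @ w_g
    scores = theta @ phi.T                          # similarity of every pair (i, j)
    scores -= scores.max(axis=1, keepdims=True)     # for numerical stability
    weights = np.exp(scores)
    weights /= weights.sum(axis=1, keepdims=True)   # normalise over all positions j
    return weights @ g                              # weighted sum of features at all positions

rng = np.random.default_rng(0)
x = rng.normal(size=(6, 4))                         # 6 positions, 4 channels
w_theta, w_phi, w_g = (rng.normal(size=(4, 3)) for _ in range(3))
y = nonlocal_operation(x, w_theta, w_phi, w_g)
```

Unlike a convolution, every output row here depends on every input row, regardless of distance between positions.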
Nonmetric MultiDimensional Scaling (NMDS) 
Nonmetric multidimensional scaling (MDS, also NMDS and NMS) is an ordination technique that differs in several ways from nearly all other ordination methods. First, in most ordination methods, many axes are calculated, but only a few are viewed, owing to graphical limitations. In MDS, a small number of axes are explicitly chosen prior to the analysis and the data are fitted to those dimensions; there are no hidden axes of variation. Second, most other ordination methods are analytical and therefore result in a single unique solution to a set of data. In contrast, MDS is a numerical technique that iteratively seeks a solution and stops computation when an acceptable solution has been found, or it stops after some pre-specified number of attempts. As a result, an MDS ordination is not a unique solution and a subsequent MDS analysis on the same set of data and following the same methodology will likely result in a somewhat different ordination. Third, MDS is not an eigenvalue-eigenvector technique like principal components analysis or correspondence analysis that ordinates the data such that axis 1 explains the greatest amount of variance, axis 2 explains the next greatest amount of variance, and so on. As a result, an MDS ordination can be rotated, inverted, or centered to any desired configuration. 
Nonnegative Matrix Factorization (NMF) 
Nonnegative matrix factorization (NMF), also nonnegative matrix approximation is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements. This nonnegativity makes the resulting matrices easier to inspect. Since the problem is not exactly solvable in general, it is commonly approximated numerically. NMF finds applications in such fields as computer vision, document clustering, chemometrics and recommender systems. NMF 
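One common way to approximate the factorization numerically is with multiplicative updates that minimise the squared error between V and WH. The sketch below assumes that choice; the function name, iteration count, random initialisation and test matrix are illustrative only, and other NMF algorithms exist.

```python
import numpy as np

def nmf_multiplicative(V, rank, n_iter=500, eps=1e-9, seed=0):
    """Approximate a nonnegative V by W @ H with W, H >= 0 (illustrative sketch)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Each update multiplies by a nonnegative ratio, so W and H stay nonnegative.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

rng = np.random.default_rng(1)
V = rng.random((8, 3)) @ rng.random((3, 6))   # nonnegative matrix of rank 3
W, H = nmf_multiplicative(V, rank=3)
rel_err = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

Because the solution is not unique, different seeds generally give different W and H with a similar reconstruction error.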
Nonnegative Matrix Factorization Expectation-Maximization (NMF-EM) 
Mixture models are among the most popular tools for model-based clustering. However, when the dimension and the number of clusters are large, the estimation as well as the interpretation of the clusters become challenging. We propose a reduced-dimension mixture model, where the parameters of the K components are combinations of words from a small dictionary, say H words with H ≪ K. Including a Nonnegative Matrix Factorization (NMF) in the EM algorithm makes it possible to estimate the dictionary and the parameters of the mixture simultaneously. We propose the acronym NMF-EM for this algorithm. This original approach is motivated by passenger clustering from ticketing data: we apply NMF-EM to ticketing data from two Transdev public transport networks. In this case, the words are easily interpreted as typical slots in a timetable. nmfem 
Nonparametric Behavior Clustering Inverse Reinforcement Learning  Inverse Reinforcement Learning (IRL) is the task of learning a single reward function given a Markov Decision Process (MDP) without defining the reward function, and a set of demonstrations generated by humans/experts. However, in practice, it may be unreasonable to assume that human behaviors can be explained by one reward function since they may be inherently inconsistent. Also, demonstrations may be collected from various users and aggregated to infer and predict users’ behaviors. In this paper, we introduce the Non-parametric Behavior Clustering IRL algorithm to simultaneously cluster demonstrations and learn multiple reward functions from demonstrations that may be generated from more than one behavior. Our method is iterative: it alternates between clustering demonstrations into different behavior clusters and inverse learning the reward functions until convergence. It is built upon the Expectation-Maximization formulation and non-parametric clustering in the IRL setting. Further, to improve the computational efficiency, we remove the need to completely solve multiple IRL problems for multiple clusters during the iteration steps and introduce a resampling technique to avoid generating too many unlikely clusters. We demonstrate the convergence and efficiency of the proposed method through learning multiple driver behaviors from demonstrations generated from a grid-world environment and continuous trajectories collected from autonomous robot cars using the Gazebo robot simulator. 
Nonparametric Canonical Correlation Analysis (NCCA) 
Canonical correlation analysis (CCA) is a fundamental technique in multi-view data analysis and representation learning. Several nonlinear extensions of the classical linear CCA method have been proposed, including kernel and deep neural network methods. These approaches restrict attention to certain families of nonlinear projections, which the user must specify (by choosing a kernel or a neural network architecture), and are computationally demanding. Interestingly, the theory of nonlinear CCA without any functional restrictions was studied in the population setting by Lancaster as early as the 1950s. However, these results have not inspired practical algorithms. In this paper, we revisit Lancaster’s theory, and use it to devise a practical algorithm for nonparametric CCA (NCCA). Specifically, we show that the most correlated nonlinear projections of two random vectors can be expressed in terms of the singular value decomposition of a certain operator associated with their joint density. Thus, by estimating the population density from data, NCCA reduces to solving an eigenvalue system, superficially like kernel CCA but, importantly, without having to compute the inverse of any kernel matrix. We also derive a partially linear CCA (PLCCA) variant in which one of the views undergoes a linear projection while the other is nonparametric. PLCCA turns out to have a similar form to the classical linear CCA, but with a nonparametric regression term replacing the linear regression in CCA. Using a kernel density estimate based on a small number of nearest neighbors, our NCCA and PLCCA algorithms are memory-efficient, often run much faster, and achieve better performance than kernel CCA and comparable performance to deep CCA. 
NonParametric Generalized Linear Model (NPGLM) 
In this paper, we try to solve the problem of temporal link prediction in information networks. This implies predicting the time it takes for a link to appear in the future, given its features that have been extracted at the current network snapshot. To this end, we introduce a probabilistic non-parametric approach, called ‘Non-Parametric Generalized Linear Model’ (NP-GLM), which infers the hidden underlying probability distribution of the link advent time given its features. We then present a learning algorithm for NP-GLM and an inference method to answer time-related queries. Extensive experiments conducted on both synthetic data and the real-world Sina Weibo social network demonstrate the effectiveness of NP-GLM in solving the temporal link prediction problem vis-à-vis competitive baselines. 
Nonparametric Neural Networks  Automatically determining the optimal size of a neural network for a given task without prior information currently requires an expensive global search and training many networks from scratch. In this paper, we address the problem of automatically finding a good network size during a single training cycle. We introduce *nonparametric neural networks*, a nonprobabilistic framework for conducting optimization over all possible network sizes and prove its soundness when network growth is limited via an L_p penalty. We train networks under this framework by continuously adding new units while eliminating redundant units via an L_2 penalty. We employ a novel optimization algorithm, which we term *adaptive radialangular gradient descent* or *AdaRad*, and obtain promising results. 
NonParametric Transformation Network (NPTN) 
ConvNets have been very effective in many applications where it is required to learn invariances to within-class nuisance transformations. However, through their architecture, ConvNets only enforce invariance to translation. In this paper, we introduce a new class of convolutional architectures called Non-Parametric Transformation Networks (NPTNs) which can learn general invariances and symmetries directly from data. NPTNs are a direct and natural generalization of ConvNets and can be optimized directly using gradient descent. They make no assumption regarding the structure of the invariances present in the data and in that respect are very flexible and powerful. We also model ConvNets and NPTNs under a unified framework called Transformation Networks which establishes the natural connection between the two. We demonstrate the efficacy of NPTNs on natural data such as MNIST and CIFAR-10, where they outperform ConvNet baselines with the same number of parameters. We show that they are effective in learning invariances unknown a priori directly from data from scratch. Finally, we apply NPTNs to Capsule Networks and show that they enable them to perform even better. 
NonResponse Bias  Nonresponse bias occurs in statistical surveys if the answers of respondents differ from the potential answers of those who did not answer. 
NOnstationary Space TIme variable Latent Length scale GP (NOSTILLGP) 
One of the primary aspects of sustainable development involves accurate understanding and modeling of environmental phenomena. Many of these phenomena exhibit variations in both space and time and it is imperative to develop a deeper understanding of techniques that can model spacetime dynamics accurately. In this paper we propose NOSTILLGP – NOnstationary Space TIme variable Latent Length scale GP, a generic nonstationary, spatiotemporal Gaussian Process (GP) model. We present several strategies, for efficient training of our model, necessary for realworld applicability. Extensive empirical validation is performed using three realworld environmental monitoring datasets, with diverse dynamics across space and time. Results from the experiments clearly demonstrate general applicability and effectiveness of our approach for applications in environmental monitoring. 
Nonstationary Stochastic Processes  A stochastic process (a collection of random variables ordered in time, e.g. GDP(t)) is said to be (weakly) stationary if its mean and variance are constant over time, i.e. time invariant (along with its autocovariance). Such a time series will tend to return to its mean (mean reversion) and fluctuations around this mean will have a broadly constant amplitude. Alternatively, a stationary process will not drift too far away from its mean value because of the finite variance. By contrast, a nonstationary time series will have a time-varying mean or a time-varying variance or both. lmenssp 
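A small simulation can make the contrast concrete. The sketch below compares white noise (stationary) with a random walk (nonstationary) by estimating the variance across many simulated paths at two time points; the processes, sample sizes and time points are assumptions picked for illustration.

```python
import numpy as np

rng = np.random.default_rng(0)
n_paths, n_steps = 2000, 400
shocks = rng.normal(size=(n_paths, n_steps))

white_noise = shocks                      # constant mean and variance over time
random_walk = np.cumsum(shocks, axis=1)   # variance grows with t

for name, x in [("white noise", white_noise), ("random walk", random_walk)]:
    early, late = x[:, 49].var(), x[:, 399].var()
    print(f"{name}: variance at t=50 is {early:.2f}, at t=400 is {late:.2f}")
```

For the random walk the variance at time t is proportional to t, so it has no fixed level to revert to, whereas the white-noise variance stays near 1 at both time points.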
NonUniform Fast Fourier Transform (NUFFT) 
Fourier analysis plays a natural role in a wide variety of applications, from medical imaging to radio astronomy, data analysis and the numerical solution of partial differential equations. When the sampling is uniform and the Fourier transform is desired at equispaced frequencies, the classical fast Fourier transform (FFT) has played a fundamental role in computation. The FFT requires O(N log N) work to compute N Fourier modes from N data points rather than O(N^2) work. When the data is irregular in either the ‘physical’ or ‘frequency’ domain, unfortunately, the FFT does not apply. Over the last twenty years, a number of algorithms have been developed to overcome this limitation – generally referred to as non-uniform FFTs (NUFFT), non-equispaced FFTs (NFFT) or unequally-spaced FFTs (USFFT). They achieve the same O(N log N) computational complexity, but with a larger, precision-dependent, and dimension-dependent constant. http://…/glee_nufft_sirev.pdf https://…/optimizingpythonwithnumpyandnumba 
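To show what a NUFFT approximates, the snippet below evaluates the non-uniform discrete Fourier sum directly. This is the slow O(N^2) baseline, not a NUFFT itself; the function name and the sign and frequency conventions are assumptions chosen to match `numpy.fft.fft` on a uniform grid.

```python
import numpy as np

def nudft_direct(x, c, n_modes):
    """Direct sum F[k] = sum_j c[j] * exp(-1j * k * x[j]) for k = 0..n_modes-1."""
    k = np.arange(n_modes)
    return np.exp(-1j * np.outer(k, x)) @ c

rng = np.random.default_rng(0)
n = 64
c = rng.normal(size=n) + 1j * rng.normal(size=n)

x_uniform = 2 * np.pi * np.arange(n) / n             # equispaced sample points
x_irregular = np.sort(rng.uniform(0, 2 * np.pi, n))  # irregular sample points

F_uniform = nudft_direct(x_uniform, c, n)      # agrees with np.fft.fft(c)
F_irregular = nudft_direct(x_irregular, c, n)  # the FFT cannot be applied here
```

On the uniform grid the direct sum reproduces the FFT, which is why the FFT can replace it there; on the irregular grid only the direct sum (or a NUFFT approximation of it) applies.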
Norm  In linear algebra, functional analysis and related areas of mathematics, a norm is a function that assigns a strictly positive length or size to each vector in a vector space – save possibly for the zero vector, which is assigned a length of zero. A seminorm, on the other hand, is allowed to assign zero length to some non-zero vectors (in addition to the zero vector). A norm must also satisfy certain properties pertaining to scalability and additivity which are given in the formal definition. A simple example is the 2-dimensional Euclidean space R^2 equipped with the Euclidean norm. Elements in this vector space (e.g., (3, 7)) are usually drawn as arrows in a 2-dimensional Cartesian coordinate system starting at the origin (0, 0). The Euclidean norm assigns to each vector the length of its arrow. Because of this, the Euclidean norm is often known as the magnitude. A vector space on which a norm is defined is called a normed vector space. Similarly, a vector space with a seminorm is called a seminormed vector space. It is often possible to supply a norm for a given vector space in more than one way. 
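As a small worked example of supplying more than one norm on the same space, the snippet below evaluates three standard norms on the vector (3, 7) mentioned above. The choice of the 1-norm and the maximum norm as alternatives is ours, made for illustration.

```python
import math

v = (3.0, 7.0)

euclidean = math.hypot(*v)            # sqrt(3^2 + 7^2) = sqrt(58), the length of the arrow
manhattan = sum(abs(a) for a in v)    # 1-norm: |3| + |7|
maximum = max(abs(a) for a in v)      # max-norm: largest absolute coordinate

# Scalability: scaling the vector by s scales its norm by |s|.
s = -2.0
scaled = math.hypot(*(s * a for a in v))
```

All three functions assign zero only to the zero vector, scale with |s|, and satisfy the triangle inequality, so each is a valid norm on R^2.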
Normalization  In statistics and applications of statistics, normalization can have a range of meanings. In the simplest cases, normalization of ratings means adjusting values measured on different scales to a notionally common scale, often prior to averaging. In more complicated cases, normalization may refer to more sophisticated adjustments where the intention is to bring the entire probability distributions of adjusted values into alignment. In the case of normalization of scores in educational assessment, there may be an intention to align distributions to a normal distribution. A different approach to normalization of probability distributions is quantile normalization, where the quantiles of the different measures are brought into alignment. 
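Quantile normalization can be sketched as follows: sort each measure, average the sorted values across measures to get a reference distribution, then give every observation the reference value for its rank. The function name and data are illustrative assumptions, and ties are not treated specially in this sketch.

```python
import numpy as np

def quantile_normalize(X):
    """Columns of X are measures; returns a copy whose columns share one distribution."""
    order = np.argsort(X, axis=0)                  # row indices that sort each column
    reference = np.sort(X, axis=0).mean(axis=1)    # mean of the k-th smallest values
    out = np.empty(X.shape, dtype=float)
    for j in range(X.shape[1]):
        out[order[:, j], j] = reference            # k-th smallest in column j gets reference[k]
    return out

rng = np.random.default_rng(0)
# Three measures on very different scales.
X = rng.normal(size=(6, 3)) * np.array([1.0, 10.0, 100.0]) + np.array([0.0, 5.0, -3.0])
Q = quantile_normalize(X)
```

After the transformation every column contains exactly the same set of values, while the ranking of observations within each column is unchanged.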
Normalized Mutual Information (NMI) 
NMI 
Normalized Nonnegative Models (NNM) 
We introduce normalized nonnegative models (NNM) for explorative data analysis. NNMs are partial convexifications of models from probability theory. We demonstrate their value at the example of item recommendation. We show that NNMbased recommender systems satisfy three criteria that all recommender systems should ideally satisfy: high predictive power, computational tractability, and expressive representations of users and items. Expressive user and item representations are important in practice to succinctly summarize the pool of customers and the pool of items. In NNMs, user representations are expressive because each user’s preference can be regarded as normalized mixture of preferences of stereotypical users. The interpretability of item and user representations allow us to arrange properties of items (e.g., genres of movies or topics of documents) or users (e.g., personality traits) hierarchically. 
Not only SQL (NoSQL) 
A NoSQL or Not Only SQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Motivations for this approach include simplicity of design, horizontal scaling and finer control over availability. The data structure (e.g. key-value, graph, or document) differs from the RDBMS, and therefore some operations are faster in NoSQL and some in RDBMS. There are differences, though, and the particular suitability of a given NoSQL DB depends on the problem to be solved (e.g. does the solution use graph algorithms?). The appearance of mature NoSQL databases has reduced the rationale for Java content repository (JCR) implementations. NoSQL databases are finding significant and growing industry use in big data and real-time web applications. NoSQL systems are also referred to as “Not only SQL” to emphasize that they may in fact allow SQL-like query languages to be used. Many NoSQL stores compromise consistency (in the sense of the CAP theorem) in favor of availability and partition tolerance. Barriers to the greater adoption of NoSQL stores include the use of low-level query languages, the lack of standardized interfaces, and the huge investments already made in SQL by enterprises. Most NoSQL stores lack true ACID transactions, although a few recent systems, such as FairCom c-treeACE, Google Spanner and FoundationDB, have made them central to their designs. 
No-U-Turn (NUTS) 
Algorithm by Hoffman and Gelman (2014): Hamiltonian Monte Carlo (HMC) is a Markov chain Monte Carlo (MCMC) algorithm that avoids the random walk behavior and sensitivity to correlated parameters that plague many MCMC methods by taking a series of steps informed by first-order gradient information. These features allow it to converge to high-dimensional target distributions much more quickly than simpler methods such as random walk Metropolis or Gibbs sampling. However, HMC’s performance is highly sensitive to two user-specified parameters: a step size ϵ and a desired number of steps L. In particular, if L is too small then the algorithm exhibits undesirable random walk behavior, while if L is too large the algorithm wastes computation. We introduce the No-U-Turn Sampler (NUTS), an extension to HMC that eliminates the need to set a number of steps L. NUTS uses a recursive algorithm to build a set of likely candidate points that spans a wide swath of the target distribution, stopping automatically when it starts to double back and retrace its steps. Empirically, NUTS performs at least as efficiently as (and sometimes more efficiently than) a well-tuned standard HMC method, without requiring user intervention or costly tuning runs. We also derive a method for adapting the step size parameter ϵ on the fly based on primal-dual averaging. NUTS can thus be used with no hand-tuning at all, making it suitable for applications such as BUGS-style automatic inference engines that require efficient ‘turnkey’ samplers. adnuts 
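The "double back" idea can be illustrated with a deliberately simplified sketch: run leapfrog steps forward from a starting point and stop as soon as the trajectory begins moving back towards it. This is not the NUTS algorithm itself, which builds the trajectory recursively in both directions and samples from the visited points; the function names, the standard-normal target and the step size are assumptions for illustration.

```python
import numpy as np

def leapfrog(theta, r, grad_log_p, eps):
    """One leapfrog step of Hamiltonian dynamics with step size eps."""
    r = r + 0.5 * eps * grad_log_p(theta)
    theta = theta + eps * r
    r = r + 0.5 * eps * grad_log_p(theta)
    return theta, r

def steps_until_u_turn(theta0, r0, grad_log_p, eps, max_steps=10_000):
    """Simulate forward until the distance from theta0 starts to decrease."""
    theta, r = theta0, r0
    for n in range(1, max_steps + 1):
        theta, r = leapfrog(theta, r, grad_log_p, eps)
        # (theta - theta0) . r is half the time derivative of the squared distance
        # from the start; a negative value means the path is turning back.
        if (theta - theta0) @ r < 0:
            return n, theta
    return max_steps, theta

def grad_log_p(theta):
    return -theta   # gradient of the log-density of a standard normal target

theta0 = np.array([1.0, 0.0])
r0 = np.array([0.0, 1.0])
n_steps, theta_end = steps_until_u_turn(theta0, r0, grad_log_p, eps=0.1)
```

Here the number of steps is found by the stopping rule rather than set by the user, which is the role L would otherwise play in plain HMC.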
Novel Data Streams (NDS) 
We define NDS as those data streams whose content is initiated directly by the user (patient) themselves. This would exclude data sources such as electronic health records, disease registries, vital statistics, electronic lab reporting, emergency department visits, ambulance call data, school absenteeism, prescription pharmacy sales, serology, amongst others. Although ready access to aggregated information from these excluded sources is novel in many health settings, our focus here is on those streams which are both directly initiated by the user and also not already maintained by public health departments or other health professionals. Despite this more narrow definition, our suggestions for improving NDS surveillance may also be applicable to more established surveillance systems, participatory systems (e.g., Flu Near You, influenzaNet), and new data streams aggregated from established systems, such as Biosense and the ISDS DiSTRIBuTE network. While much of the recent focus on using NDS for disease surveillance has centered on Internet search queries and Twitter posts, there are many NDS outside of these two sources. Our aim therefore is to provide a general framework for enhancing and developing NDS surveillance systems, which applies to more than just search data and Tweets. At a minimum, our definition of NDS would include Internet search data and social media, such as Google searches, Google Plus, Facebook, and Twitter posts, as well as Wikipedia access logs, restaurant reservation and review logs, non-prescription pharmacy sales, news source scraping, and prediction markets. 
Novel Integration of the Sample and Thresholded covariance estimators (NOVELIST) 
We propose a ‘NOVEL Integration of the Sample and Thresholded covariance estimators’ (NOVELIST) to estimate the large covariance (correlation) and precision matrix. NOVELIST performs shrinkage of the sample covariance (correlation) towards its thresholded version. The sample covariance (correlation) component is non-sparse and can be low-rank in high dimensions. The thresholded sample covariance (correlation) component is sparse, and its addition ensures the stable invertibility of NOVELIST. The benefits of the NOVELIST estimator include simplicity, ease of implementation, computational efficiency and the fact that its application avoids eigenanalysis. We obtain an explicit convergence rate in the operator norm over a large class of covariance (correlation) matrices when the dimension p and the sample size n satisfy log(p)/n → 0. In empirical comparisons with several popular estimators, the NOVELIST estimator in which the amount of shrinkage and thresholding is chosen by cross-validation performs well in estimating covariance and precision matrices over a wide range of models and sparsity classes. http://…/poster_NOVELIST_Sept2014.pdf novelist
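The shrinkage step described above can be sketched as a convex combination of the sample covariance and its thresholded version. This is a minimal illustration and not the novelist package: the parameter names delta (shrinkage weight) and lam (threshold), and the choice of soft thresholding applied to off-diagonal entries only, are assumptions made here.

```python
import math


def sample_covariance(rows):
    """Unbiased sample covariance of a list of observations (one row per observation)."""
    n, p = len(rows), len(rows[0])
    means = [sum(row[j] for row in rows) / n for j in range(p)]
    return [
        [
            sum((row[i] - means[i]) * (row[j] - means[j]) for row in rows) / (n - 1)
            for j in range(p)
        ]
        for i in range(p)
    ]


def soft_threshold_offdiag(matrix, lam):
    """Shrink each off-diagonal entry toward zero by lam; leave the diagonal alone."""
    out = [row[:] for row in matrix]
    for i in range(len(matrix)):
        for j in range(len(matrix)):
            if i != j:
                v = matrix[i][j]
                out[i][j] = math.copysign(max(abs(v) - lam, 0.0), v)
    return out


def novelist_sketch(rows, delta, lam):
    """Return (1 - delta) * S + delta * T(S, lam) for sample covariance S."""
    s = sample_covariance(rows)
    t = soft_threshold_offdiag(s, lam)
    p = len(s)
    return [[(1 - delta) * s[i][j] + delta * t[i][j] for j in range(p)] for i in range(p)]
```

With delta = 0 the result is the plain sample covariance, and with delta = 1 it is the thresholded matrix alone. The abstract describes choosing both parameters by cross-validation, which this sketch leaves out.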
Novelty Detection  Novelty detection is the identification of new or unknown data that a machine learning system has not been trained with and was not previously aware of, with the help of either statistical or machine learning based approaches. Novelty detection is one of the fundamental requirements of a good classification system. A machine learning system can never be trained with all the possible object classes and hence the performance of the network will be poor for those classes that are under-represented in the training set. A good classification system must have the ability to differentiate between known and unknown objects during testing. For this purpose, different models for novelty detection have been proposed. Novelty detection is a hard problem in machine learning since it depends on the statistics of the already known information. A generally applicable, parameter-free method for outlier detection in a high-dimensional space is not yet known. Novelty detection finds a variety of applications especially in signal processing, computer vision, pattern recognition, data mining and robotics. Another important application is the detection of a disease or potential fault whose class may be under-represented in the training set. The statistical approaches to novelty detection may be classified into parametric and non-parametric approaches. Parametric approaches assume a specific statistical distribution (such as a Gaussian distribution) of data and statistical modeling based on data mean and covariance, whereas non-parametric approaches do not make any assumption on the statistical properties of data. http://…/mlsp09a.pdf http://…/mlsp09b.pdf http://…i=10.1.1.3.3578&rep=rep1&type=pdf http://…/smola09a.pdf http://…/karkaliwise2013.pdf
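As a rough illustration of the parametric approach, the sketch below fits a two-dimensional Gaussian to training points via their mean and covariance, then flags a new point as novel when its squared Mahalanobis distance exceeds a chi-square cutoff. The function names, the synthetic training data and the 99% cutoff are assumptions for this example, not taken from the references above.

```python
import math
import random


def fit_gaussian_2d(points):
    """Estimate the mean and covariance (sxx, sxy, syy) of 2-D training points."""
    n = len(points)
    mx = sum(p[0] for p in points) / n
    my = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mx) ** 2 for p in points) / (n - 1)
    syy = sum((p[1] - my) ** 2 for p in points) / (n - 1)
    sxy = sum((p[0] - mx) * (p[1] - my) for p in points) / (n - 1)
    return (mx, my), (sxx, sxy, syy)


def mahalanobis_sq(point, mean, cov):
    """Squared Mahalanobis distance, using the closed-form 2x2 matrix inverse."""
    sxx, sxy, syy = cov
    det = sxx * syy - sxy * sxy
    dx, dy = point[0] - mean[0], point[1] - mean[1]
    return (syy * dx * dx - 2.0 * sxy * dx * dy + sxx * dy * dy) / det


# For 2 degrees of freedom the chi-square quantile at level q is -2 * ln(1 - q).
CUTOFF_99 = -2.0 * math.log(0.01)


def is_novel(point, mean, cov, cutoff=CUTOFF_99):
    return mahalanobis_sq(point, mean, cov) > cutoff


# Synthetic "known" data: 200 draws from a standard normal in each coordinate.
rng = random.Random(0)
training = [(rng.gauss(0.0, 1.0), rng.gauss(0.0, 1.0)) for _ in range(200)]
mean, cov = fit_gaussian_2d(training)
print(is_novel((10.0, 10.0), mean, cov), is_novel(mean, mean, cov))
```

A non-parametric detector would replace the Gaussian fit with something that makes no distributional assumption, for example a distance to the nearest training points.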
NtMalDetect  As computing systems become increasingly advanced and as users increasingly engage themselves in technology, security has never been a greater concern. In malware detection, static analysis has been the prominent approach. This approach, however, quickly falls short as malicious programs become more advanced and gain the ability to obfuscate their binaries while executing the same malicious functions, making static analysis virtually inapplicable to newer variants. The approach assessed in this paper uses dynamic analysis of malware, which may generalize better than static analysis to variants. Widely used document classification techniques were assessed in detecting malware by doing such analysis on system call traces, a form of dynamic analysis. Features considered are extracted from system call traces of benign and malicious programs, and the task of classifying these traces is treated as a binary document classification task using sparse features. The system call traces were processed to remove the parameters, leaving only the system call function names. The features were grouped into various n-grams and weighted with Term Frequency-Inverse Document Frequency. Support Vector Machines were used and optimized using a Stochastic Gradient Descent algorithm that implemented L1, L2, and ElasticNet regularization terms, the best of which achieved 98% accuracy with a 98% recall score. Additional contributions include the identification of significant system call sequences that could be avenues for further research.
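The feature pipeline described here (strip parameters, form n-grams, weight with TF-IDF) can be sketched in a few lines. The traces below are invented for illustration, and the weighting is the plain textbook tf × log(N/df) form rather than whatever variant the paper's tooling used. The classifier stage (an SVM trained by stochastic gradient descent) is omitted.

```python
import math
from collections import Counter


def strip_parameters(trace):
    """Keep only the function name of each call, e.g. 'NtClose(0x1)' -> 'NtClose'."""
    return [call.split("(", 1)[0].strip() for call in trace]


def ngrams(tokens, n):
    """Return consecutive n-grams of a token sequence as space-joined strings."""
    return [" ".join(tokens[i:i + n]) for i in range(len(tokens) - n + 1)]


def tfidf(docs):
    """Return one sparse {term: weight} dict per document."""
    n_docs = len(docs)
    doc_freq = Counter()
    for doc in docs:
        doc_freq.update(set(doc))
    vectors = []
    for doc in docs:
        if not doc:
            vectors.append({})
            continue
        counts = Counter(doc)
        vectors.append({
            term: (count / len(doc)) * math.log(n_docs / doc_freq[term])
            for term, count in counts.items()
        })
    return vectors


# Hypothetical traces, made up for this example.
trace_a = ["NtOpenFile(0x10, 1)", "NtReadFile(0x10)", "NtClose(0x10)"]
trace_b = ["NtOpenFile(0x20, 3)", "NtWriteFile(0x20)", "NtClose(0x20)"]

documents = [ngrams(strip_parameters(t), 2) for t in (trace_a, trace_b)]
for vector in tfidf(documents):
    print(vector)
```

Each resulting dictionary is a sparse feature vector of the kind a linear classifier can consume. Bigrams that occur in every trace receive zero weight, so only distinguishing call sequences contribute.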
Null Hypothesis Significance Testing (NHST) 
Null Hypothesis Significance Testing (NHST) is a statistical method for testing whether a factor of interest has an effect on our observations. For example, a t-test or an ANOVA comparing means is a typical NHST procedure. It is probably the most common form of statistical testing used in HCI. http://…/hypothesistestingisonlymostly.html
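A small worked example may help. The sketch below computes the Welch t statistic for two groups and then estimates a two-sided p-value by random permutation of the group labels. The permutation step is a swapped-in technique used here to stay within the standard library; a conventional t-test would read the p-value from the t distribution instead. The task-completion times are invented.

```python
import math
import random
from statistics import mean, variance


def welch_t(a, b):
    """Welch's t statistic for the difference in means of two samples."""
    diff = mean(a) - mean(b)
    se = math.sqrt(variance(a) / len(a) + variance(b) / len(b))
    if se == 0.0:
        return 0.0 if diff == 0.0 else math.copysign(math.inf, diff)
    return diff / se


def permutation_p_value(a, b, n_permutations=2000, seed=0):
    """Two-sided p-value under the null hypothesis that group labels are exchangeable."""
    rng = random.Random(seed)
    observed = abs(welch_t(a, b))
    pooled = list(a) + list(b)
    hits = 0
    for _ in range(n_permutations):
        rng.shuffle(pooled)
        if abs(welch_t(pooled[:len(a)], pooled[len(a):])) >= observed:
            hits += 1
    # Adding 1 to both counts keeps the estimate strictly above zero.
    return (hits + 1) / (n_permutations + 1)


# Hypothetical task-completion times (seconds) for two interface designs.
design_a = [12.1, 11.4, 13.0, 12.7, 11.9, 12.5]
design_b = [9.8, 10.4, 9.1, 10.9, 10.2, 9.6]

print(welch_t(design_a, design_b), permutation_p_value(design_a, design_b))
```

The null hypothesis is that the design has no effect on completion time. A small p-value means data this extreme would be unlikely if that were true, which is the basis on which NHST rejects the null.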
NullHop  Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphical Processing Units (GPUs) are most often used in training and deploying CNNs, their power consumption becomes a problem for real-time mobile applications. We propose a flexible and efficient CNN accelerator architecture which can support the implementation of SOA CNNs in low-power and low-latency application scenarios. This architecture exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across a wide range of convolutional network kernel sizes and numbers of input and output feature maps. We implemented the proposed architecture on an FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. We show how in RTL simulations in a 28 nm process with a clock frequency of 500 MHz, the NullHop core is able to reach over 450 GOp/s and efficiency of 368%, maintaining over 98% utilization of the MAC units and achieving a power efficiency of over 3 TOp/s/W in a core area of 5.8 mm².
Numenta Anomaly Benchmark (NAB) 
Much of the world’s data is streaming, time-series data, where anomalies give significant information in critical situations; examples abound in domains such as finance, IT, security, medical, and energy. Yet detecting anomalies in streaming data is a difficult task, requiring detectors to process data in real time, not batches, and learn while simultaneously making predictions. There are no benchmarks to adequately test and score the efficacy of real-time anomaly detectors. Here we propose the Numenta Anomaly Benchmark (NAB), which attempts to provide a controlled and repeatable environment of open-source tools to test and measure anomaly detection algorithms on streaming data. The perfect detector would detect all anomalies as soon as possible, trigger no false alarms, work with real-world time-series data across a variety of domains, and automatically adapt to changing statistics. Rewarding these characteristics is formalized in NAB, using a scoring algorithm designed for streaming data. NAB evaluates detectors on a benchmark dataset with labeled, real-world time-series data. We present these components, and give results and analyses for several open-source, commercially used algorithms. The goal for NAB is to provide a standard, open-source framework with which the research community can compare and evaluate different algorithms for detecting anomalies in streaming data.
Numerical Formal Concept Analysis (nFCA) 
Numerical Formal Concept Analysis (nFCA) technique: Formal Concept Analysis (FCA) is a powerful method in computer science (CS) for identifying overall inherent structures within and between the row and column variables (called objects and attributes in CS) of a binary data set. It is a bit like lifting up the overall hierarchical structure of a forest from a superposition based on simple local information, i.e., pairwise relationships between variables of the data. The objective of nFCA is to combine FCA and statistics to translate what an FCA can offer for binary data to numerical data. The end product of our nFCA is a pair of nFCA graphs, where the H-graph is a clustered lattice graph indicating inherent hierarchical and clustered relations and the I-graph is a complementary tree plot indicating the strength and directions of each of the relations and additional network relationships. The nFCA performs better than the conventional hierarchical clustering methods in terms of the cophenetic correlation coefficient and the relational structure. nFCA
Numerical Template Toolbox (NT2) 
The Numerical Template Toolbox (NT2) is an open-source C++ library aimed at simplifying the development, debugging and optimization of high-performance computing applications by providing a Matlab-like syntax that eases the transition between prototype and actual application. RcppNT2
nutsflow/ml  Data preprocessing is a fundamental part of any machine learning application and frequently the most time-consuming aspect when developing a machine learning solution. Preprocessing for deep learning is characterized by pipelines that lazily load data and perform data transformation, augmentation, batching and logging. Many of these functions are common across applications but require different arrangements for training, testing or inference. Here we introduce a novel software framework named nutsflow/ml that encapsulates common preprocessing operations as components, which can be flexibly arranged to rapidly construct efficient preprocessing pipelines for deep learning.
NVIDIA Deep Learning GPU Training System (DIGITS) 
The NVIDIA Deep Learning GPU Training System (DIGITS) puts the power of deep learning in the hands of data scientists and researchers. Quickly design the best deep neural network (DNN) for your data using real-time network behavior visualization. Best of all, DIGITS is a complete system so you don’t have to write any code. Get started with DIGITS in under an hour.
Nyquist-Shannon Sampling Theorem  In the field of digital signal processing, the sampling theorem is a fundamental bridge between continuous-time signals (often called ‘analog signals’) and discrete-time signals (often called ‘digital signals’). It establishes a sufficient condition between a signal’s bandwidth and the sample rate that permits a discrete sequence of samples to capture all the information from the continuous-time signal. Strictly speaking, the theorem only applies to a class of mathematical functions having a Fourier transform that is zero outside of a finite region of frequencies. Intuitively we expect that when one reduces a continuous function to a discrete sequence and interpolates back to a continuous function, the fidelity of the result depends on the density (or sample rate) of the original samples. The sampling theorem introduces the concept of a sample rate that is sufficient for perfect fidelity for the class of functions that are band-limited to a given bandwidth, such that no actual information is lost in the sampling process. It expresses the sufficient sample rate in terms of the bandwidth for the class of functions. The theorem also leads to a formula for perfectly reconstructing the original continuous-time function from the samples. Perfect reconstruction may still be possible when the sample-rate criterion is not satisfied, provided other constraints on the signal are known (see sampling of non-baseband signals and compressed sensing). The name Nyquist-Shannon sampling theorem honors Harry Nyquist and Claude Shannon. The theorem was also discovered independently by E. T. Whittaker, by Vladimir Kotelnikov, and by others. So it is also known by the names Nyquist-Shannon-Kotelnikov, Whittaker-Shannon-Kotelnikov, Whittaker-Nyquist-Kotelnikov-Shannon, and cardinal theorem of interpolation. http://…Nyquist%E2%80%93Shannonsamplingtheorem
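The reconstruction formula referred to above is the Whittaker-Shannon interpolation formula, x(t) = Σ x[n] · sinc(t · fs − n), where fs is the sample rate. The sketch below applies it to a 1 Hz sine sampled at 8 Hz, comfortably above the 2 Hz the theorem requires. The signal, the rates and the finite window of 401 samples are assumptions for illustration; with a finite window the reconstruction is only approximate between sample instants.

```python
import math


def sinc(u):
    """Normalized sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)


def reconstruct(samples, first_index, sample_rate, t):
    """Whittaker-Shannon interpolation at time t from uniformly spaced samples.

    samples[k] is the signal value at time (first_index + k) / sample_rate.
    """
    return sum(
        x * sinc(t * sample_rate - (first_index + k))
        for k, x in enumerate(samples)
    )


signal_hz = 1.0      # band limit of the test signal
sample_rate = 8.0    # must exceed 2 * signal_hz
first_index = -200   # sample indices run from -200 to 200

samples = [
    math.sin(2.0 * math.pi * signal_hz * n / sample_rate)
    for n in range(first_index, 201)
]

t = 0.3  # a time that falls between two sample instants
approx = reconstruct(samples, first_index, sample_rate, t)
exact = math.sin(2.0 * math.pi * signal_hz * t)
print(approx, exact)
```

The two printed values differ only by the truncation error of the finite sum, which shrinks as more samples on either side of t are included.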