Naive Bayes Classifier  A naive Bayes classifier is a simple probabilistic classifier based on applying Bayes’ theorem with strong (naive) independence assumptions. A more descriptive term for the underlying probability model would be “independent feature model”. An overview of statistical classifiers is given in the article on pattern recognition. 
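As an illustration of the independence assumption, the following is a minimal sketch of a categorical naive Bayes classifier in plain Python. The function names, the add-alpha smoothing and the toy weather data are assumptions made for this example and are not part of the definition above.

```python
import math
from collections import Counter, defaultdict

def train_naive_bayes(samples, labels, alpha=1.0):
    """Count class and per-feature value frequencies (hypothetical helper)."""
    class_counts = Counter(labels)
    # feature_counts[label][feature_index][value] -> count
    feature_counts = defaultdict(lambda: defaultdict(Counter))
    values = defaultdict(set)  # distinct values seen for each feature index
    for x, y in zip(samples, labels):
        for i, v in enumerate(x):
            feature_counts[y][i][v] += 1
            values[i].add(v)
    return class_counts, feature_counts, values, alpha

def predict(model, x):
    """Return the label maximizing log P(label) + sum_i log P(x_i | label)."""
    class_counts, feature_counts, values, alpha = model
    total = sum(class_counts.values())
    best_label, best_score = None, -math.inf
    for label, n in class_counts.items():
        score = math.log(n / total)  # log prior
        for i, v in enumerate(x):
            # Features are treated as independent given the class,
            # so their smoothed log-likelihoods simply add up.
            count = feature_counts[label][i][v]
            score += math.log((count + alpha) / (n + alpha * len(values[i])))
        if score > best_score:
            best_label, best_score = label, score
    return best_label

# Toy data (made up for illustration): (outlook, temperature) -> activity
samples = [("sunny", "hot"), ("sunny", "mild"), ("rainy", "mild"), ("rainy", "cool")]
labels = ["out", "out", "in", "in"]
model = train_naive_bayes(samples, labels)
print(predict(model, ("sunny", "hot")))
```

The per-feature terms are added in log space, which is the practical consequence of the "independent feature model" view.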
Named Entity Extraction  
Named Entity Recognition (NER) 
Named-entity recognition (NER) (also known as entity identification, entity chunking and entity extraction) is a subtask of information extraction that seeks to locate and classify elements in text into predefined categories such as the names of persons, organizations, locations, expressions of times, quantities, monetary values, percentages, etc. Most research on NER systems has been structured as taking an unannotated block of text, such as this one: Jim bought 300 shares of Acme Corp. in 2006. The system then produces an annotated block of text that highlights the names of entities: [Jim]Person bought 300 shares of [Acme Corp.]Organization in [2006]Time. In this example, a person name consisting of one token, a two-token company name and a temporal expression have been detected and classified. State-of-the-art NER systems for English produce near-human performance. For example, the best system entering MUC-7 scored 93.39% of F-measure while human annotators scored 97.60% and 96.95%. http://…/aijwikiner.pdf 
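The F-measure quoted above combines precision and recall. The sketch below scores a hypothetical system output against the three entities of the example sentence; the predicted set and the exact-match scoring rule are assumptions made for illustration, not the MUC-7 scoring procedure.

```python
def f_measure(gold, predicted, beta=1.0):
    """Return (precision, recall, F_beta) for two collections of (text, type) pairs."""
    gold, predicted = set(gold), set(predicted)
    true_positives = len(gold & predicted)  # exact match on text and type
    if true_positives == 0:
        return 0.0, 0.0, 0.0
    precision = true_positives / len(predicted)
    recall = true_positives / len(gold)
    b2 = beta * beta
    f = (1 + b2) * precision * recall / (b2 * precision + recall)
    return precision, recall, f

gold = [("Jim", "Person"), ("Acme Corp.", "Organization"), ("2006", "Time")]
# Hypothetical system output: two correct entities and one wrong one.
predicted = [("Jim", "Person"), ("Acme Corp.", "Organization"), ("300", "Time")]
precision, recall, f1 = f_measure(gold, predicted)
print(precision, recall, f1)
```

With two of three entities correct on both sides, precision, recall and F1 all equal 2/3.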
Named Entity Recognition and Classification (NERC) 
The term ‘Named Entity’, now widely used in Natural Language Processing, was coined for the Sixth Message Understanding Conference (MUC-6) (R. Grishman & Sundheim 1996). At that time, MUC was focusing on Information Extraction (IE) tasks where structured information of company activities and defense-related activities is extracted from unstructured text, such as newspaper articles. In defining the task, people noticed that it is essential to recognize information units like names, including person, organization and location names, and numeric expressions including time, date, money and percent expressions. Identifying references to these entities in text was recognized as one of the important subtasks of IE and was called ‘Named Entity Recognition and Classification (NERC)’. 
Named Entity Recognizer (NER) 
Stanford NER is a Java implementation of a Named Entity Recognizer. Named Entity Recognition (NER) labels sequences of words in a text which are the names of things, such as person and company names, or gene and protein names. It comes with well-engineered feature extractors for Named Entity Recognition, and many options for defining feature extractors. Included with the download are good named entity recognizers for English, particularly for the 3 classes (PERSON, ORGANIZATION, LOCATION), and we also make available on this page various other models for different languages and circumstances, including models trained on just the CoNLL 2003 English training data. The distributional similarity features in some models improve performance but the models require considerably more memory. Stanford NER is also known as CRFClassifier. The software provides a general implementation of (arbitrary order) linear chain Conditional Random Field (CRF) sequence models. That is, by training your own models, you can actually use this code to build sequence models for any task. (CRF models were pioneered by Lafferty, McCallum, and Pereira (2001); see Sutton and McCallum (2006) or Sutton and McCallum (2010) for more comprehensible introductions.) 
Natural Language Generation  Natural Language Generation (NLG) is the natural language processing task of generating natural language from a machine representation system such as a knowledge base or a logical form. Psycholinguists prefer the term language production when such formal representations are interpreted as models for mental representations. It could be said an NLG system is like a translator that converts a computer-based representation into a natural language representation. However, the methods to produce the final language are different from those of a compiler due to the inherent expressivity of natural languages. NLG may be viewed as the opposite of natural language understanding: whereas in natural language understanding the system needs to disambiguate the input sentence to produce the machine representation language, in NLG the system needs to make decisions about how to put a concept into words. Simple examples are systems that generate form letters. These do not typically involve grammar rules, but may generate a letter to a consumer, e.g. stating that a credit card spending limit was reached. More complex NLG systems dynamically create texts to meet a communicative goal. As in other areas of natural language processing, this can be done using either explicit models of language (e.g., grammars) and the domain, or using statistical models derived by analysing human-written texts. 
Natural Language Inference (NLI) 
Inference has been a central topic in artificial intelligence from the start, but while automatic methods for formal deduction have advanced tremendously, comparatively little progress has been made on the problem of natural language inference (NLI), that is, determining whether a natural language hypothesis h can justifiably be inferred from a natural language premise p. The challenges of NLI are quite different from those encountered in formal deduction: the emphasis is on informal reasoning, lexical semantic knowledge, and variability of linguistic expression. 
Natural Language Processing (NLP) 
Natural language processing (NLP) is a field of computer science, artificial intelligence, and linguistics concerned with the interactions between computers and human (natural) languages. As such, NLP is related to the area of human-computer interaction. Many challenges in NLP involve natural language understanding, that is, enabling computers to derive meaning from human or natural language input, and others involve natural language generation. NLP, openNLP 
Natural Language Query  A natural language query consists only of normal terms in the user’s language, without any special syntax or format. 
Natural Language Toolkit (NLTK) 
The Natural Language Toolkit, or more commonly NLTK, is a suite of libraries and programs for symbolic and statistical natural language processing (NLP) for the Python programming language. NLTK includes graphical demonstrations and sample data. It is accompanied by a book that explains the underlying concepts behind the language processing tasks supported by the toolkit, plus a cookbook. NLTK is intended to support research and teaching in NLP or closely related areas, including empirical linguistics, cognitive science, artificial intelligence, information retrieval, and machine learning. NLTK has been used successfully as a teaching tool, as an individual study tool, and as a platform for prototyping and building research systems. http://www.nltk.org 
Natural Language Understanding (NLU) 
Natural language understanding (NLU) is a subtopic of natural language processing in artificial intelligence that deals with machine reading comprehension. NLU is considered an AI-hard problem. The process of disassembling and parsing input is more complex than the reverse process of assembling output in natural language generation because of the occurrence of unknown and unexpected features in the input and the need to determine the appropriate syntactic and semantic schemes to apply to it, factors which are predetermined when outputting language. There is considerable commercial interest in the field because of its application to news-gathering, text categorization, voice-activation, archiving, and large-scale content analysis. 
Natural Parameter Networks (NPN) 
Neural networks (NN) have achieved state-of-the-art performance in various applications. Unfortunately, in applications where training data is insufficient, they are often prone to overfitting. One effective way to alleviate this problem is to exploit the Bayesian approach by using Bayesian neural networks (BNN). Another shortcoming of NN is the lack of flexibility to customize different distributions for the weights and neurons according to the data, as is often done in probabilistic graphical models. To address these problems, we propose a class of probabilistic neural networks, dubbed natural-parameter networks (NPN), as a novel and lightweight Bayesian treatment of NN. NPN allows the usage of arbitrary exponential-family distributions to model the weights and neurons. Different from traditional NN and BNN, NPN takes distributions as input and goes through layers of transformation before producing distributions to match the target output distributions. As a Bayesian treatment, efficient backpropagation (BP) is performed to learn the natural parameters for the distributions over both the weights and neurons. The output distributions of each layer, as byproducts, may be used as second-order representations for the associated tasks such as link prediction. Experiments on real-world datasets show that NPN can achieve state-of-the-art performance. 
ND4J  ND4J is a scientific computing library for the JVM. It is meant to be used in production environments rather than as a research tool, which means routines are designed to run fast with minimum RAM requirements. 
NearBucket Locality Sensitive Hashing (NearBucket-LSH) 
We present NearBucket-LSH, an effective algorithm for similarity search in large-scale distributed online social networks organized as peer-to-peer overlays. As communication is a dominant consideration in distributed systems, we focus on minimizing the network cost while guaranteeing good search quality. Our algorithm is based on Locality Sensitive Hashing (LSH), which limits the search to collections of objects, called buckets, that have a high probability to be similar to the query. More specifically, NearBucket-LSH employs an LSH extension that searches in near buckets, and improves search quality but also significantly increases the network cost. We decrease the network cost by considering the internals of both LSH and the P2P overlay, and harnessing their properties to our needs. We show that our NearBucket-LSH increases search quality for a given network cost compared to previous art. In many cases, the search quality increases by more than 50%. 
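The idea of probing near buckets can be sketched on a single machine as follows. This is only an illustration under stated assumptions: it uses random-hyperplane (sign) hashing, treats a "near bucket" as one whose key differs from the query's key in exactly one bit, and ignores the peer-to-peer overlay entirely. None of these choices, nor any of the function names, are taken from the abstract above.

```python
import random

def make_hyperplanes(num_bits, dim, seed=0):
    """Draw num_bits random hyperplanes in dim dimensions (hypothetical helper)."""
    rng = random.Random(seed)
    return [[rng.gauss(0.0, 1.0) for _ in range(dim)] for _ in range(num_bits)]

def bucket_key(vector, hyperplanes):
    """One bit per hyperplane: which side of the hyperplane the vector lies on."""
    return tuple(
        1 if sum(h_i * v_i for h_i, v_i in zip(h, vector)) >= 0 else 0
        for h in hyperplanes
    )

def near_buckets(key):
    """All keys at Hamming distance 1 from key (assumed notion of 'near')."""
    return [key[:i] + (1 - key[i],) + key[i + 1:] for i in range(len(key))]

def build_index(vectors, hyperplanes):
    """Map each bucket key to the names of the vectors hashed into it."""
    index = {}
    for name, vec in vectors.items():
        index.setdefault(bucket_key(vec, hyperplanes), []).append(name)
    return index

def query(index, vector, hyperplanes, probe_near=True):
    """Return candidates from the exact bucket and, optionally, its near buckets."""
    key = bucket_key(vector, hyperplanes)
    keys = [key] + (near_buckets(key) if probe_near else [])
    return [name for k in keys for name in index.get(k, [])]

planes = make_hyperplanes(num_bits=4, dim=3)
data = {"a": [1.0, 0.5, -0.2], "b": [0.9, 0.6, -0.1], "c": [-1.0, -0.4, 0.3]}
index = build_index(data, planes)
print(query(index, [1.0, 0.5, -0.2], planes))
```

Probing the extra buckets enlarges the candidate set, which is the quality gain the abstract describes; in a distributed setting each extra bucket may live on another node, which is where the added network cost comes from.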
Nearest Descent (ND) 

Nearest Neighbor Descent (NND) 

Near-Far Matching  Near-far matching is a study design technique for preprocessing observational data to mimic a pair-randomized trial. Individuals are matched to be near on measured confounders and far on levels of an instrumental variable. nearfar 
Necessary Condition Analysis (NCA) 
Theoretical ‘necessary but not sufficient’ statements are common in the organizational sciences. Traditional data analysis approaches (e.g., correlation or multiple regression) are not appropriate for testing or inducing such statements. This paper proposes Necessary Condition Analysis (NCA) as a general and straightforward methodology for identifying necessary conditions in datasets. The paper presents the logic and methodology of necessary but not sufficient contributions of organizational determinants (e.g., events, characteristics, resources, efforts) to a desired outcome (e.g., good performance). A necessary determinant must be present for achieving an outcome, but its presence is not sufficient to obtain that outcome. Without the necessary condition, there is guaranteed failure, which cannot be compensated by other determinants of the outcome. This logic and its related methodology are fundamentally different from the traditional sufficiency-based logic and methodology. Practical recommendations and free software are offered to support researchers to apply NCA. NCA 
Negative Binomial Regression (NBR) 
Negative binomial regression is for modeling count variables, usually for overdispersed count outcome variables. NegBinBetaBinreg 
Nelder-Mead Method  The Nelder-Mead method or downhill simplex method or amoeba method is a commonly applied numerical method used to find the minimum or maximum of an objective function in a many-dimensional space. It is applied to nonlinear optimization problems for which derivatives may not be known. However, the Nelder-Mead technique is a heuristic search method that can converge to non-stationary points on problems that can be solved by alternative methods. The Nelder-Mead technique was proposed by John Nelder & Roger Mead (1965). 
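A compact sketch of the downhill simplex iteration is given below. It assumes the commonly used coefficients (reflection 1, expansion 2, contraction 0.5, shrink 0.5) and a simple stopping rule based on simplex size; these choices, the function names and the test function are illustrative assumptions rather than details given in the entry above.

```python
def nelder_mead(f, x0, step=0.5, max_iter=500, tol=1e-8):
    """Minimize f starting from x0 using only function values (no derivatives)."""
    n = len(x0)
    # Initial simplex: x0 plus one point offset along each coordinate axis.
    simplex = [list(x0)]
    for i in range(n):
        point = list(x0)
        point[i] += step
        simplex.append(point)

    for _ in range(max_iter):
        simplex.sort(key=f)
        best, worst = simplex[0], simplex[-1]
        size = max(abs(p[i] - best[i]) for p in simplex for i in range(n))
        if size < tol:
            break
        # Centroid of every vertex except the worst one.
        centroid = [sum(p[i] for p in simplex[:-1]) / n for i in range(n)]

        def along(t):
            # Point on the line through worst and centroid: centroid + t * (centroid - worst).
            return [c + t * (c - w) for c, w in zip(centroid, worst)]

        reflected = along(1.0)
        if f(reflected) < f(best):
            expanded = along(2.0)
            simplex[-1] = expanded if f(expanded) < f(reflected) else reflected
        elif f(reflected) < f(simplex[-2]):
            simplex[-1] = reflected
        else:
            if f(reflected) < f(worst):
                contracted = along(0.5)    # outside contraction
                accepted = f(contracted) <= f(reflected)
            else:
                contracted = along(-0.5)   # inside contraction
                accepted = f(contracted) < f(worst)
            if accepted:
                simplex[-1] = contracted
            else:
                # Shrink every vertex halfway toward the best one.
                simplex = [best] + [
                    [b + 0.5 * (v - b) for b, v in zip(best, p)]
                    for p in simplex[1:]
                ]
    return min(simplex, key=f)

def sphere(p):
    """Toy objective with its minimum at (1, -2)."""
    return (p[0] - 1.0) ** 2 + (p[1] + 2.0) ** 2

result = nelder_mead(sphere, [0.0, 0.0])
print(result)
```

The routine never evaluates a gradient, which is why the method suits problems whose derivatives are unknown; the same property is also why it offers no general guarantee of reaching a stationary point.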
Neo4j  Neo4j is an open-source graph database, implemented in Java. The developers describe Neo4j as ‘embedded, disk-based, fully transactional Java persistence engine that stores data structured in graphs rather than in tables’. Neo4j is the most popular graph database. Neo4j version 1.0 was released in February, 2010. The community edition of the database is licensed under the free GNU General Public License (GPL) v3. The additional modules, such as online backup and high availability, are licensed under the free Affero General Public License (AGPL) v3. The database, with the additional modules, is also available under a commercial license, in a dual license model. Neo4j version 2.0 was released in December, 2013. Neo4j was developed by Neo Technology, Inc., based in the San Francisco Bay Area, US and Malmö, Sweden. RNeo4j 
Nested Association Mapping (NAM) 
Nested association mapping (NAM) is a technique designed by the labs of Edward Buckler, James Holland, and Michael McMullen for identifying and dissecting the genetic architecture of complex traits in corn (Zea mays). It is important to note that nested association mapping (unlike Association mapping) is a specific technique that cannot be performed outside of a specifically designed population such as the Maize NAM population. NAM 
Nested Chinese Restaurant Process (NCRP) 
The nested Chinese restaurant process (nCRP) is a stochastic process that assigns probability distributions to ensembles of infinitely deep, infinitely branching trees. 
Nested Dirichlet Process Mixture of Products of Multinomial Distributions (NDPMPM) 
We present a Bayesian model for estimating the joint distribution of multivariate categorical data when units are nested within groups. Such data arise frequently in social science settings, for example, people living in households. The model assumes that (i) each group is a member of a group-level latent class, and (ii) each unit is a member of a unit-level latent class nested within its group-level latent class. This structure allows the model to capture dependence among units in the same group. It also facilitates simultaneous modeling of variables at both group and unit levels. We develop a version of the model that assigns zero probability to groups and units with physically impossible combinations of variables. We apply the model to estimate multivariate relationships in a subset of the American Community Survey. Using the estimated model, we generate synthetic household data that could be disseminated as redacted public use files. Supplementary materials for this article are available online. NestedCategBayesImpute 
Nested Error Regression Model  This paper suggests the nested error regression model with uncertain random effects, which means that the random effects in each area are expressed as a mixture of a normal distribution and a positive mass at 0. For estimation of model parameters and prediction of random effects, we consider Bayesian yet objective inference by setting improper prior distributions on the model parameters. We show that, under a mild sufficient condition, the posterior distribution is proper and the posterior variances are finite, which confirms the validity of posterior inference. To generate samples from the posterior distribution, we provide the Gibbs sampling method. The full conditional distributions of the posterior distribution are all familiar forms, so the proposed methodology is easy to implement. This paper also addresses the problem of prediction of finite population means and we provide a sampling-based method to tackle this issue. We compare the proposed model with the conventional nested error regression model through simulation and empirical studies. 
Nested Sampling Algorithm  The nested sampling algorithm is a computational approach to the problem of comparing models in Bayesian statistics, developed in 2004 by physicist John Skilling. 
Nesterov’s Accelerated Gradient (NAG) 
Nesterov’s Accelerated Gradient Descent performs a simple step of gradient descent to go from x_s to y_{s+1}, and then it ‘slides’ a little bit further than y_{s+1} in the direction given by the previous point y_s. The intuition behind the algorithm is quite difficult to grasp, and unfortunately the analysis will not be very enlightening either. Nonetheless, Nesterov’s Accelerated Gradient is an optimal method (in terms of oracle complexity) for smooth convex optimization. 
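The two-step update described above can be sketched as follows. The text does not state how far to slide, so this sketch assumes one standard choice of momentum sequence (lambda_0 = 0, lambda_s = (1 + sqrt(1 + 4 lambda_{s-1}^2)) / 2, gamma_s = (1 - lambda_s) / lambda_{s+1}) and a known smoothness constant beta; the function names and the quadratic test problem are likewise assumptions for illustration.

```python
import math

def nesterov_accelerated_gradient(grad, x0, beta, steps):
    """Run the gradient step x_s -> y_{s+1}, then the slide to x_{s+1}."""
    x = list(x0)
    y_prev = list(x0)          # y_1 = x_1
    lam = 1.0                  # lambda_1, since lambda_0 = 0
    for _ in range(steps):
        lam_next = (1.0 + math.sqrt(1.0 + 4.0 * lam * lam)) / 2.0
        gamma = (1.0 - lam) / lam_next          # gamma_s <= 0
        g = grad(x)
        # Plain gradient descent step with step size 1 / beta.
        y = [x_i - g_i / beta for x_i, g_i in zip(x, g)]
        # Slide past y, away from the previous point y_prev (gamma is negative).
        x = [(1.0 - gamma) * y_i + gamma * yp_i for y_i, yp_i in zip(y, y_prev)]
        y_prev = y
        lam = lam_next
    return y_prev

def f(p):
    """Toy smooth convex objective; its gradient is 10-Lipschitz."""
    return 0.5 * (p[0] ** 2 + 10.0 * p[1] ** 2)

def grad_f(p):
    return [p[0], 10.0 * p[1]]

result = nesterov_accelerated_gradient(grad_f, [5.0, 5.0], beta=10.0, steps=200)
print(f(result))
```

Because gamma_s is never positive, the combination (1 - gamma_s) y_{s+1} + gamma_s y_s lands beyond y_{s+1} on the line from y_s through y_{s+1}, which is the "slide" in the description.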
Net#  Neural networks are one of the most popular machine learning algorithms today. One of the challenges when using neural networks is how to define a network topology given the variety of possible layer types, connections among them, and activation functions. Net# solves this problem by providing a succinct way to define almost any neural network architecture in a descriptive, easy-to-read format. This post provides a short tutorial for building a neural network using the Net# language to classify images of handwritten numeric digits in Microsoft Azure Machine Learning. 
netinf  Given a set of events that spread between a set of nodes the algorithm infers the most likely stable diffusion network that is underlying the diffusion process. NetworkInference 
Network Analysis  Network analysis is a quantitative methodology for studying properties related to connectivity and distances in graphs, with diverse applications like citation indexing and information retrieval on the Web. ➘ “Network Theory” ➘ “Social Network Analysis” A Short Course on Network Analysis Network Analysis for Wikipedia 
Network Based Diffusion Analysis (NBDA) 
Social learning has been documented in a wide diversity of animals. In free-living animals, however, it has been difficult to discern whether animals learn socially by observing other group members or asocially by acquiring a new behaviour independently. We addressed this challenge by developing network-based diffusion analysis (NBDA), which analyses the spread of traits through animal groups and takes into account that social network structure directs social learning opportunities. NBDA fits agent-based models of social and asocial learning to the observed data using maximum-likelihood estimation. The underlying learning mechanism can then be identified using model selection based on the Akaike information criterion. spatialnbda 
Network In Network (NIN) 
We propose a novel deep network structure called ‘Network In Network’ (NIN) to enhance model discriminability for local patches within the receptive field. The conventional convolutional layer uses linear filters followed by a nonlinear activation function to scan the input. Instead, we build micro neural networks with more complex structures to abstract the data within the receptive field. We instantiate the micro neural network with a multilayer perceptron, which is a potent function approximator. The feature maps are obtained by sliding the micro networks over the input in a similar manner as CNN; they are then fed into the next layer. Deep NIN can be implemented by stacking multiple instances of the structure described above. With enhanced local modeling via the micro network, we are able to utilize global average pooling over feature maps in the classification layer, which is easier to interpret and less prone to overfitting than traditional fully connected layers. We demonstrated state-of-the-art classification performance with NIN on CIFAR-10 and CIFAR-100, and reasonable performance on the SVHN and MNIST datasets. GitXiv 
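Global average pooling, as used in the classification layer described above, can be illustrated with a few lines of plain Python. The nested-list representation (one H x W map per class) and the toy values are assumptions for this sketch; the micro networks themselves are not shown.

```python
def global_average_pooling(feature_maps):
    """Collapse each H x W feature map to its mean, giving one score per map."""
    return [
        sum(sum(row) for row in fmap) / (len(fmap) * len(fmap[0]))
        for fmap in feature_maps
    ]

# Two toy 2 x 2 feature maps, one per class (values made up for illustration).
feature_maps = [
    [[1.0, 2.0], [3.0, 4.0]],
    [[0.0, 0.0], [0.0, 4.0]],
]
scores = global_average_pooling(feature_maps)
print(scores)  # [2.5, 1.0]
```

The pooling step has no trainable weights, which is one reason it is less prone to overfitting than a fully connected layer; each score can also be traced directly back to a single feature map.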
Network Mapping  Network mapping is the study of the physical connectivity of networks. Internet mapping is the study of the physical connectivity of the Internet. Network mapping discovers the devices on the network and their connectivity. It is not to be confused with network discovery or network enumeration, which discovers devices on the network and their characteristics, such as operating system, open ports, and listening network services. The field of automated network mapping has taken on greater importance as networks become more dynamic and complex in nature. 
Network Maximal Correlation (NMC) 
We introduce Network Maximal Correlation (NMC) as a multivariate measure of nonlinear association among random variables. NMC is defined via an optimization that infers (nontrivial) transformations of variables by maximizing aggregate inner products between transformed variables. We characterize a solution of the NMC optimization using geometric properties of Hilbert spaces for finite discrete and jointly Gaussian random variables. For finite discrete variables, we propose an algorithm based on alternating conditional expectation to determine NMC. We also show that empirically computed NMC converges to NMC exponentially fast in sample size. For jointly Gaussian variables, we show that under some conditions the NMC optimization is an instance of the MaxCut problem. We then illustrate an application of NMC and multiple MC in inference of graphical model for bijective, possibly nonmonotone, functions of jointly Gaussian variables generalizing the copula setup developed by Liu et al. Finally, we illustrate NMC’s utility in a real data application of learning nonlinear dependencies among genes in a cancer dataset. 
Network Meta-Analysis  I present methods for assessing the relative effectiveness of two treatments when they have not been compared directly in a randomized trial but have each been compared to other treatments. These network meta-analysis techniques allow estimation of both heterogeneity in the effect of any given treatment and inconsistency (‘incoherence’) in the evidence from different pairs of treatments. 
Network Scale-Up Method (NSUM) 
The network scale-up method was developed by a team of researchers under grants from the U.S. National Science Foundation to H. Russell Bernard and Christopher McCarty at the University of Florida. The method can be applied now to estimating the size of hard-to-count (or impossible-to-count) populations, but the method is a work in progress. Each new application provides data for improving the validity and accuracy of the estimates. As with the development of the model, these improvements require the efforts of survey researchers, mathematicians, and ethnographers. The network scale-up method was developed in conjunction with our team’s research on the rules governing who people know and how they know them. The particular list of people who people come to know in a lifetime may appear random, but the rules governing who we come to know are surely not random. One basic component of social structure is the number of people whom people know. NSUM 
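The entry above does not give a formula. As a hedged illustration, the sketch below uses the basic scale-up estimator as it is commonly stated: the share of hidden-population members among all the people that respondents know is scaled up to the total population. The function name and the survey numbers are made up for this example.

```python
def scale_up_estimate(known_in_group, network_sizes, total_population):
    """Basic scale-up estimate: N * sum(y_i) / sum(d_i).

    known_in_group[i]  -- y_i, how many members of the hidden group respondent i knows
    network_sizes[i]   -- d_i, how many people respondent i knows in total
    total_population   -- N, size of the general population
    """
    if len(known_in_group) != len(network_sizes):
        raise ValueError("one network size is required per respondent")
    return total_population * sum(known_in_group) / sum(network_sizes)

# Hypothetical survey of three respondents in a population of one million.
estimate = scale_up_estimate([2, 0, 1], [300, 200, 500], 1_000_000)
print(estimate)  # 3000.0
```

The estimate depends directly on the personal network sizes, which is why the number of people whom people know is singled out above as a basic component.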
Network Science  Network science is an interdisciplinary academic field which studies complex networks such as telecommunication networks, computer networks, biological networks, cognitive and semantic networks, and social networks. The field draws on theories and methods including graph theory from mathematics, statistical mechanics from physics, data mining and information visualization from computer science, inferential modeling from statistics, and social structure from sociology. The United States National Research Council defines network science as ‘the study of network representations of physical, biological, and social phenomena leading to predictive models of these phenomena.’ 
Network Sketching  Convolutional neural networks (CNNs) with deep architectures have substantially advanced the state-of-the-art in computer vision tasks. However, deep networks are typically resource-intensive and thus difficult to deploy on mobile devices. Recently, CNNs with binary weights have shown compelling efficiency to the community, whereas the accuracy of such models is usually unsatisfactory in practice. In this paper, we introduce network sketching as a novel technique for pursuing binary-weight CNNs, targeting more faithful inference and a better trade-off for practical applications. Our basic idea is to exploit binary structure directly in pre-trained filter banks and produce binary-weight models via tensor expansion. The whole process can be treated as a coarse-to-fine model approximation, akin to the pencil drawing steps of outlining and shading. To further speed up the generated models, namely the sketches, we also propose an associative implementation of binary tensor convolutions. Experimental results demonstrate that a proper sketch of AlexNet (or ResNet) outperforms the existing binary-weight models by large margins on the ImageNet large scale classification task, while the committed memory for network parameters only exceeds a little. 
Network Theory  In computer and network science, network theory is the study of graphs as a representation of either symmetric relations or, more generally, of asymmetric relations between discrete objects. Network theory is a part of graph theory. It has applications in many disciplines including statistical physics, particle physics, computer science, electrical engineering, biology, economics, operations research, and sociology. Applications of network theory include logistical networks, the World Wide Web, Internet, gene regulatory networks, metabolic networks, social networks, epistemological networks, etc; see List of network theory topics for more examples. Euler’s solution of the Seven Bridges of Königsberg problem is considered to be the first true proof in the theory of networks. 
Neural Block Sampling  Efficient Monte Carlo inference often requires manual construction of model-specific proposals. We propose an approach to automated proposal construction by training neural networks to provide fast approximations to block Gibbs conditionals. The learned proposals generalize to occurrences of common structural motifs both within a given model and across models, allowing for the construction of a library of learned inference primitives that can accelerate inference on unseen models with no model-specific training required. We explore several applications including open-universe Gaussian mixture models, in which our learned proposals outperform a hand-tuned sampler, and a real-world named entity recognition task, in which our sampler’s ability to escape local modes yields higher final F1 scores than single-site Gibbs. 
Neural Collaborative Filtering (NCF) 
In recent years, deep neural networks have yielded immense success on speech recognition, computer vision and natural language processing. However, the exploration of deep neural networks on recommender systems has received relatively less scrutiny. In this work, we strive to develop techniques based on neural networks to tackle the key problem in recommendation, collaborative filtering, on the basis of implicit feedback. Although some recent work has employed deep learning for recommendation, it primarily used it to model auxiliary information, such as textual descriptions of items and acoustic features of music. When it comes to modeling the key factor in collaborative filtering, the interaction between user and item features, these works still resorted to matrix factorization and applied an inner product on the latent features of users and items. By replacing the inner product with a neural architecture that can learn an arbitrary function from data, we present a general framework named NCF, short for Neural network-based Collaborative Filtering. NCF is generic and can express and generalize matrix factorization under its framework. To supercharge NCF modelling with nonlinearities, we propose to leverage a multilayer perceptron to learn the user-item interaction function. Extensive experiments on two real-world datasets show significant improvements of our proposed NCF framework over the state-of-the-art methods. Empirical evidence shows that using deeper layers of neural networks offers better recommendation performance. 
Neural Decision Trees  In this paper we propose a synergistic melting of neural networks and decision trees into a deep hashing neural network (HNN) having a modeling capability exponential with respect to its number of neurons. We first derive a soft decision tree named neural decision tree allowing the optimization of arbitrary decision function at each split node. We then rewrite this soft space partitioning as a new kind of neural network layer, namely the hashing layer (HL), which can be seen as a generalization of the known softmax layer. This HL can easily replace the standard last layer of ANN in any known network topology and thus can be used after a convolutional or recurrent neural network for example. We present the modeling capacity of this deep hashing function on small datasets where one can reach at least equally good results as standard neural networks by diminishing the number of output neurons. Finally, we show that for the case where the number of output neurons is large, the neural network can mitigate the absence of linear decision boundaries by learning for each difficult class a collection of not necessarily connected subregions of the space leading to more flexible decision surfaces. Finally, the HNN can be seen as a deep locality sensitive hashing function which can be trained in a supervised or unsupervised setting as we will demonstrate for classification and regression problems. 
Neural Decomposition (ND) 
We present a neural network technique for the analysis and extrapolation of time-series data called Neural Decomposition (ND). Units with a sinusoidal activation function are used to perform a Fourier-like decomposition of training samples into a sum of sinusoids, augmented by units with nonperiodic activation functions to capture linear trends and other nonperiodic components. We show how careful weight initialization can be combined with regularization to form a simple model that generalizes well. Our method generalizes effectively on the Mackey-Glass series, a dataset of unemployment rates as reported by the U.S. Department of Labor Statistics, a time-series of monthly international airline passengers, the monthly ozone concentration in downtown Los Angeles, and an unevenly sampled time-series of oxygen isotope measurements from a cave in north India. We find that ND outperforms popular time-series forecasting techniques including LSTM, echo state networks, ARIMA, SARIMA, SVR with a radial basis function, and Gashler and Ashmore’s model. 
Neural Hawkes Process  Many events occur in the world. Some event types are stochastically excited or inhibited (in the sense of having their probabilities elevated or decreased) by patterns in the sequence of previous events. Discovering such patterns can help us predict which type of event will happen next and when. Learning such structure should benefit various applications, including medical prognosis, consumer behavior, and social media activity prediction. We propose to model streams of discrete events in continuous time, by constructing a neurally self-modulating multivariate point process. This generative model allows past events to influence the future in complex ways, by conditioning future event intensities on the hidden state of a recurrent neural network that has consumed the stream of past events. We evaluate our model on multiple datasets and show that it significantly outperforms other strong baselines. 
Neural Machine Translation (NMT) 
Neural machine translation (NMT) is the approach to machine translation in which a large neural network is trained to maximize translation performance. It is a radical departure from the phrase-based statistical translation approaches, in which a translation system consists of subcomponents that are separately optimized. The artificial neural network (ANN) is a model inspired by the functional aspects and structure of the brain’s biological neural networks. With use of ANN, it is possible to execute a number of tasks, such as classification, clustering, and prediction, using machine learning techniques like supervised or reinforcement learning to learn or adjust net connections. A bidirectional recurrent neural network (RNN), known as an encoder, is used by the neural network to encode a source sentence for a second RNN, known as a decoder, that is used to predict words in the target language. NMT models are inspired by deep representation learning. They require only a fraction of the memory needed by traditional statistical machine translation (SMT) models. Furthermore, unlike conventional translation systems, each and every component of the neural translation model is trained jointly to maximize the translation performance. When a new neural network is created, it is trained for certain domains or applications. Once an automatic learning mechanism is established, the network practices. With time it starts operating according to its own judgment, turning into an ‘expert’. 
Neural Networks / Artificial Neural Networks (ANN) 
In computer science and related fields, artificial neural networks (ANNs) are computational models inspired by animals' central nervous systems (in particular the brain) that are capable of machine learning and pattern recognition. Artificial neural networks are generally presented as systems of interconnected "neurons" which can compute values from inputs. neural,neuralnet 
Neural Programmer  Deep neural networks have achieved impressive supervised classification performance in many tasks including image recognition, speech recognition, and sequence-to-sequence learning. However, this success has not been translated to applications like question answering that may involve complex arithmetic and logic reasoning. A major limitation of these models is their inability to learn even simple arithmetic and logic operations. For example, it has been shown that neural networks fail to learn to add two binary numbers reliably. In this work, we propose Neural Programmer, an end-to-end differentiable neural network augmented with a small set of basic arithmetic and logic operations. Neural Programmer can call these augmented operations over several steps, thereby inducing compositional programs that are more complex than the built-in operations. The model learns from a weak supervision signal, which is the result of executing the correct program; hence it does not require expensive annotation of the correct program itself. The decisions of what operations to call and what data segments to apply them to are inferred by Neural Programmer. During training, these decisions are made in a differentiable fashion so that the entire network can be trained jointly by gradient descent. We find that training the model is difficult, but it can be greatly improved by adding random noise to the gradient. On a fairly complex synthetic table-comprehension dataset, traditional recurrent networks and attentional models perform poorly while Neural Programmer typically obtains nearly perfect accuracy. 
Neural Reasoner  We propose Neural Reasoner, a framework for neural-network-based reasoning over natural language sentences. Given a question, Neural Reasoner can infer over multiple supporting facts and find an answer to the question in specific forms. Neural Reasoner has 1) a specific interaction-pooling mechanism, allowing it to examine multiple facts, and 2) a deep architecture, allowing it to model the complicated logical relations in reasoning tasks. Assuming no particular structure exists in the question and facts, Neural Reasoner is able to accommodate different types of reasoning and different forms of language expressions. Despite the model complexity, Neural Reasoner can still be trained effectively in an end-to-end manner. Our empirical studies show that Neural Reasoner can outperform existing neural reasoning systems by remarkable margins on two difficult artificial tasks (Positional Reasoning and Path Finding) proposed in earlier work. For example, it improves the accuracy on Path Finding(10K) from 33.4% to over 98%. 
Neural Semantic Encoders (NSE) 
We present a memory-augmented neural network for natural language understanding: Neural Semantic Encoders (NSE). NSE has a variable-sized encoding memory that evolves over time and maintains the understanding of input sequences through read, compose and write operations. NSE can access multiple and shared memories depending on the complexity of a task. We demonstrate the effectiveness and flexibility of NSE on five different natural language tasks: natural language inference, question answering, sentence classification, document sentiment analysis and machine translation, where NSE achieved state-of-the-art performance when evaluated on publicly available benchmarks. For example, our shared-memory model showed an encouraging result on neural machine translation, improving an attention-based baseline by approximately 1.0 BLEU. 
Neural SLAM  We present an approach for agents to learn representations of a global map from sensor data, to aid their exploration in new environments. To achieve this, we embed procedures mimicking those of traditional Simultaneous Localization and Mapping (SLAM) into the soft-attention-based addressing of external memory architectures, in which the external memory acts as an internal representation of the environment. This structure encourages the evolution of SLAM-like behaviors inside a completely differentiable deep neural network. We show that this approach can help reinforcement learning agents to successfully explore new environments where long-term memory is essential. We validate our approach in both challenging grid-world environments and preliminary Gazebo experiments. 
Neural Style Transfer  The recent work of Gatys et al. demonstrated the power of Convolutional Neural Networks (CNN) in creating artistic imagery by separating and recombining image content and style. This process of using a CNN to render the semantic content of one image in different styles is referred to as Neural Style Transfer. 
Neural Tensor Network (NTN) 
The Neural Tensor Network (NTN) replaces a standard linear neural network layer with a bilinear tensor layer that directly relates two entity vectors across multiple dimensions. The model computes a score of how likely it is that two entities are in a certain relationship. 
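The scoring step can be illustrated with a short sketch. It assumes the commonly cited formulation in which the bilinear term is added to a standard linear term and a bias, passed through tanh, and projected to a scalar; the function name `ntn_score` and the parameter names `W`, `V`, `b`, `u` are illustrative choices, not taken from the entry above.

```python
import numpy as np

def ntn_score(e1, e2, W, V, b, u):
    """Score one entity pair for one relation with a bilinear tensor layer.

    e1, e2: entity vectors of shape (d,)
    W: tensor of shape (k, d, d), one bilinear form per slice
    V: matrix of shape (k, 2d) for the standard linear part (assumed)
    b: bias of shape (k,); u: output weights of shape (k,) (assumed)
    """
    # e1^T W[k] e2 for every slice k: relates the two vectors across k dimensions
    bilinear = np.einsum("i,kij,j->k", e1, W, e2)
    linear = V @ np.concatenate([e1, e2]) + b
    return float(u @ np.tanh(bilinear + linear))

# Toy parameters: d = 2 entity dimensions, k = 1 tensor slice.
e1 = np.array([1.0, 0.0])
e2 = np.array([1.0, 0.0])
W = np.eye(2)[None, :, :]
V = np.zeros((1, 4))
b = np.zeros(1)
u = np.ones(1)
score = ntn_score(e1, e2, W, V, b, u)
```

A higher score is read as the relation being more plausible for the pair; in practice the parameters would be learned per relation rather than set by hand.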
Neural Turing Machines (NTM) 
We extend the capabilities of neural networks by coupling them to external memory resources, which they can interact with by attentional processes. The combined system is analogous to a Turing Machine or Von Neumann architecture but is differentiable end-to-end, allowing it to be efficiently trained with gradient descent. Preliminary results demonstrate that Neural Turing Machines can infer simple algorithms such as copying, sorting, and associative recall from input and output examples. Neural Turing Machines are fully differentiable computers that use backpropagation to learn their own programming. 
Neural Vector Space Model (NVSM) 
We propose the Neural Vector Space Model (NVSM), a method that learns representations of documents in an unsupervised manner for news article retrieval. In the NVSM paradigm, we learn low-dimensional representations of words and documents from scratch using gradient descent and rank documents according to their similarity with query representations that are composed from word representations. We show that NVSM performs better at document ranking than existing latent semantic vector space methods. The addition of NVSM to a mixture of lexical language models and a state-of-the-art baseline vector space model yields a statistically significant increase in retrieval effectiveness. Consequently, NVSM adds a complementary relevance signal. Next to semantic matching, we find that NVSM performs well in cases where lexical matching is needed. NVSM learns a notion of term specificity directly from the document collection without feature engineering. We also show that NVSM learns regularities related to Luhn significance. Finally, we give advice on how to deploy NVSM in situations where model selection (e.g., cross-validation) is infeasible. We find that an unsupervised ensemble of multiple models trained with different hyperparameter values performs better than a single cross-validated model. Therefore, NVSM can safely be used for ranking documents without supervised relevance judgments. 
NeuroFuzzy  In the field of artificial intelligence, neuro-fuzzy refers to combinations of artificial neural networks and fuzzy logic. Neuro-fuzzy was proposed by J. S. R. Jang. Neuro-fuzzy hybridization results in a hybrid intelligent system that synergizes these two techniques by combining the human-like reasoning style of fuzzy systems with the learning and connectionist structure of neural networks. Neuro-fuzzy hybridization is widely termed Fuzzy Neural Network (FNN) or Neuro-Fuzzy System (NFS) in the literature. A neuro-fuzzy system (the more popular term, used henceforth) incorporates the human-like reasoning style of fuzzy systems through the use of fuzzy sets and a linguistic model consisting of a set of IF-THEN fuzzy rules. The main strength of neuro-fuzzy systems is that they are universal approximators with the ability to solicit interpretable IF-THEN rules. This strength involves two contradictory requirements in fuzzy modeling: interpretability versus accuracy. In practice, one of the two properties prevails. Neuro-fuzzy research in fuzzy modeling is divided into two areas: linguistic fuzzy modeling, which is focused on interpretability (mainly the Mamdani model), and precise fuzzy modeling, which is focused on accuracy (mainly the Takagi-Sugeno-Kang (TSK) model). Although generally assumed to be the realization of a fuzzy system through connectionist networks, the term is also used to describe some other configurations, including: • Deriving fuzzy rules from trained RBF networks. • Fuzzy-logic-based tuning of neural network training parameters. • Fuzzy logic criteria for increasing a network size. • Realising fuzzy membership functions through clustering algorithms in unsupervised learning in SOMs and neural networks. • Representing fuzzification, fuzzy inference and defuzzification through multi-layer feed-forward connectionist networks. It must be pointed out that the interpretability of Mamdani-type neuro-fuzzy systems can be lost. 
To improve the interpretability of neuro-fuzzy systems, certain measures must be taken. A recent research line addresses the data stream mining case, where neuro-fuzzy systems are sequentially updated with new incoming samples on demand and on the fly. System updates then include not only a recursive adaptation of model parameters but also a dynamic evolution and pruning of model components (neurons, rules), in order to handle concept drift and dynamically changing system behavior adequately and to keep the systems/models up to date at any time. Comprehensive surveys of evolving neuro-fuzzy system approaches can be found in the literature. frbs 
NeuroFuzzy System  Modern neuro-fuzzy systems are usually represented as special multilayer feedforward neural networks (see for example models like ANFIS, FuNe, Fuzzy RuleNet, GARIC, or NEFCLASS and NEFCON). However, fuzzifications of other neural network architectures are also considered, for example self-organizing feature maps. In those neuro-fuzzy networks, connection weights and propagation and activation functions differ from common neural networks. Although there are many different approaches, we usually use the term neuro-fuzzy system for approaches which display the following properties: • A neuro-fuzzy system is based on a fuzzy system which is trained by a learning algorithm derived from neural network theory. The (heuristic) learning procedure operates on local information, and causes only local modifications in the underlying fuzzy system. • A neuro-fuzzy system can be viewed as a 3-layer feedforward neural network. The first layer represents input variables, the middle (hidden) layer represents fuzzy rules and the third layer represents output variables. Fuzzy sets are encoded as (fuzzy) connection weights. It is not necessary to represent a fuzzy system like this to apply a learning algorithm to it. However, it can be convenient, because it represents the data flow of input processing and learning within the model. Remark: Sometimes a 5-layer architecture is used, where the fuzzy sets are represented in the units of the second and fourth layer. • A neuro-fuzzy system can always (i.e. before, during and after learning) be interpreted as a system of fuzzy rules. It is possible both to create the system from training data from scratch and to initialize it with prior knowledge in the form of fuzzy rules. Remark: Not all neuro-fuzzy models specify learning procedures for fuzzy rule creation. • The learning procedure of a neuro-fuzzy system takes the semantic properties of the underlying fuzzy system into account. 
This results in constraints on the possible modifications applicable to the system parameters. Remark: Not all neuro-fuzzy approaches have this property. • A neuro-fuzzy system approximates an $n$-dimensional (unknown) function that is partially defined by the training data. The fuzzy rules encoded within the system represent vague samples, and can be viewed as prototypes of the training data. A neuro-fuzzy system should not be seen as a kind of (fuzzy) expert system, and it has nothing to do with fuzzy logic in the narrow sense. frbs 
NeuroIndex  The article describes a new data structure called the neuroindex. It is an alternative to well-known file indexes. The neuroindex is fundamentally different because it stores weight coefficients in a neural network rather than references of the type 'keyword, position in a file'. 
Neuroinformatics  Neuroinformatics is a research field concerned with the organization of neuroscience data by the application of computational models and analytical tools. These areas of research are important for the integration and analysis of increasingly large-volume, high-dimensional, and fine-grained experimental data. Neuroinformaticians provide computational tools and mathematical models, and create interoperable databases for clinicians and research scientists. Neuroscience is a heterogeneous field, consisting of many and various subdisciplines (e.g., Cognitive Psychology, Behavioral Neuroscience, and Behavioral Genetics). In order for our understanding of the brain to continue to deepen, it is necessary that these subdisciplines are able to share data and findings in a meaningful way; neuroinformaticians facilitate this. Neuroinformatics stands at the intersection of neuroscience and information science. Other fields, like genomics, have demonstrated the effectiveness of freely distributed databases and the application of theoretical and computational models for solving complex problems. In neuroinformatics, such facilities allow researchers to confirm their working theories quantitatively and more easily by computational modeling. Additionally, neuroinformatics fosters collaborative research, an important fact that facilitates the field's interest in studying the multi-level complexity of the brain. There are three main directions in which neuroinformatics is applied: 1. the development of tools and databases for management and sharing of neuroscience data at all levels of analysis, 2. the development of tools for analyzing and modeling neuroscience data, 3. the development of computational models of the nervous system and neural processes. 
Newick Format  In mathematics, Newick tree format (or Newick notation or New Hampshire tree format) is a way of representing graph-theoretical trees with edge lengths using parentheses and commas. It was adopted by James Archie, William H. E. Day, Joseph Felsenstein, Wayne Maddison, Christopher Meacham, F. James Rohlf, and David Swofford, at two meetings in 1986, the second of which was at Newick's restaurant in Dover, New Hampshire, US. The adopted format is a generalization of the format developed by Meacham in 1984 for the first tree-drawing programs in Felsenstein's PHYLIP package. ggtree 
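To make the parentheses-and-commas notation concrete, here is a minimal sketch of a reader for simple Newick strings. It assumes unquoted labels, optional `:length` suffixes and no bracketed comments; the function name `parse_newick` and the dictionary layout are illustrative choices, not part of any particular package.

```python
def parse_newick(text):
    """Parse a simple Newick string into nested dicts (name, length, children)."""
    text = text.strip()
    if not text.endswith(";"):
        raise ValueError("a Newick string must end with ';'")
    pos = 0

    def node():
        nonlocal pos
        children = []
        if text[pos] == "(":              # internal node: comma-separated subtrees
            pos += 1
            children.append(node())
            while text[pos] == ",":
                pos += 1
                children.append(node())
            if text[pos] != ")":
                raise ValueError("expected ')' at position %d" % pos)
            pos += 1
        start = pos                       # optional node label
        while text[pos] not in ",():;":
            pos += 1
        name = text[start:pos]
        length = None
        if text[pos] == ":":              # optional edge length to the parent
            pos += 1
            start = pos
            while text[pos] not in ",();":
                pos += 1
            length = float(text[start:pos])
        return {"name": name, "length": length, "children": children}

    root = node()
    if text[pos] != ";":
        raise ValueError("unexpected character at position %d" % pos)
    return root

# Three children under root F; the third is an internal node E with leaves C and D.
tree = parse_newick("(A:0.1,B:0.2,(C:0.3,D:0.4)E:0.5)F;")
```

Each pair of parentheses encloses the children of one node, the label after the closing parenthesis names that node, and the number after a colon is the length of the edge to its parent.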
NewSQL  NewSQL is a class of modern relational database management systems that seek to provide the same scalable performance as NoSQL systems for online transaction processing (OLTP) read-write workloads while still maintaining the ACID guarantees of a traditional database system. 
NeymanPearson Classification  
NGram  In the fields of computational linguistics and probability, an n-gram is a contiguous sequence of n items from a given sequence of text or speech. The items can be phonemes, syllables, letters, words or base pairs according to the application. The n-grams typically are collected from a text or speech corpus. An n-gram of size 1 is referred to as a "unigram"; size 2 is a "bigram" (or, less commonly, a "digram"); size 3 is a "trigram". Larger sizes are sometimes referred to by the value of n, e.g., "four-gram", "five-gram", and so on. 
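As a small illustration of the definition, the sketch below collects every contiguous run of n items from a sequence. The helper name `ngrams` and the example sentence are illustrative.

```python
def ngrams(items, n):
    """Return all contiguous n-item tuples from a sequence, in order."""
    if n < 1:
        raise ValueError("n must be at least 1")
    return [tuple(items[i:i + n]) for i in range(len(items) - n + 1)]

words = "to be or not to be".split()
unigrams = ngrams(words, 1)
bigrams = ngrams(words, 2)
trigrams = ngrams(words, 3)

# The same helper works on letters instead of words.
letter_bigrams = ngrams(list("abc"), 2)
```

A sequence of length m yields m - n + 1 n-grams, so a sequence shorter than n yields none.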
Niching  Simply put, niching is a class of methods that try to converge to more than one solution during a single run. Niching is the idea of segmenting the population of a genetic algorithm (GA) into disjoint sets, so that there is at least one member in each 'interesting' region of the fitness function; generally this means covering more than one local optimum. Algorithm of the Week: Niching in Genetic Algorithms 
No Free Lunch Theorem (NFL) 
In mathematical folklore, the 'no free lunch' theorem (sometimes pluralized) of David Wolpert and William Macready appears in the 1997 paper 'No Free Lunch Theorems for Optimization'. Wolpert had previously derived no free lunch theorems for machine learning (statistical inference). In 2005, Wolpert and Macready themselves indicated that the first theorem in their paper 'state[s] that any two optimization algorithms are equivalent when their performance is averaged across all possible problems'. The 1997 theorems of Wolpert and Macready are mathematically technical and some find them unintuitive. The folkloric 'no free lunch' (NFL) theorem is an easily stated and easily understood consequence of theorems Wolpert and Macready actually prove. It is weaker than the proven theorems, and thus does not encapsulate them. Various investigators have extended the work of Wolpert and Macready substantively. http://…/No_free_lunch_in_search_and_optimization 
Node Link Diagram  Graphs are frequently drawn as node-link diagrams in which the vertices are represented as disks, boxes, or textual labels and the edges are represented as line segments, polylines, or curves in the Euclidean plane. Node-link diagrams can be traced back to the 13th-century work of Ramon Llull, who drew diagrams of this type for complete graphs in order to analyze all pairwise combinations among sets of metaphysical concepts. 
NoisyNet  We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent's policy can be used to aid efficient exploration. The parameters of the noise are learned with gradient descent along with the remaining network weights. NoisyNet is straightforward to implement and adds little computational overhead. We find that replacing the conventional exploration heuristics for A3C, DQN and dueling agents (entropy reward and $\epsilon$-greedy respectively) with NoisyNet yields substantially higher scores for a wide range of Atari games, in some cases advancing the agent from sub- to super-human performance. 
Nomogram  A nomogram, also called a nomograph, alignment chart or abaque, is a graphical calculating device, a two-dimensional diagram designed to allow the approximate graphical computation of a function. The field of nomography was invented in 1884 by the French engineer Philbert Maurice d'Ocagne (1862–1938) and used extensively for many years to provide engineers with fast graphical calculations of complicated formulas to a practical precision. Nomograms use a parallel coordinate system invented by d'Ocagne rather than standard Cartesian coordinates. A nomogram consists of a set of n scales, one for each variable in an equation. Knowing the values of n-1 variables, the value of the unknown variable can be found, or by fixing the values of some variables, the relationship between the unfixed ones can be studied. The result is obtained by laying a straightedge across the known values on the scales and reading the unknown value from where it crosses the scale for that variable. The virtual or drawn line created by the straightedge is called an index line or isopleth. 
Nonconvex Conditional Gradient Sliding (NCGS) 
We investigate a projection-free method, namely conditional gradient sliding (CGS), on batched, stochastic and finite-sum nonconvex problems. CGS is a smart combination of Nesterov's accelerated gradient method and the Frank-Wolfe (FW) method, and outperforms FW in the convex setting by saving gradient computations. However, the study of CGS in the nonconvex setting is limited. In this paper, we propose nonconvex conditional gradient sliding (NCGS), which surpasses the nonconvex Frank-Wolfe method in the batched, stochastic and finite-sum settings. 
NonHomogeneous Markov Switching Autoregressive Models (MSAR) 
In this paper, non-homogeneous Markov-Switching Autoregressive (MSAR) models are proposed to describe wind time series. In these models, several autoregressive models are used to describe the time evolution of the wind speed, and the switching between these different models is controlled by a hidden Markov chain which represents the weather types. We first block the data by month in order to remove seasonal components and propose an MSAR model with non-homogeneous autoregressive models to describe daily components. Then we discuss extensions where the hidden Markov chain is also non-stationary to handle seasonal and interannual fluctuations. NHMSAR 
Nonlinear Dimensionality Reduction (NLDR) 
High-dimensional data, meaning data that requires more than two or three dimensions to represent, can be difficult to interpret. One approach to simplification is to assume that the data of interest lie on an embedded nonlinear manifold within the higher-dimensional space. If the manifold is of low enough dimension, the data can be visualised in the low-dimensional space. [Figure: top left, a 3D dataset of 1000 points in a spiraling band (a.k.a. the Swiss roll) with a rectangular hole in the middle; top right, the original 2D manifold used to generate the 3D dataset; bottom left and right, 2D recoveries of the manifold using the LLE and Hessian LLE algorithms, respectively, as implemented by the Modular Data Processing toolkit.] Many nonlinear dimensionality reduction (NLDR) methods from the history of manifold learning are related to linear methods. Nonlinear methods can be broadly classified into two groups: those that provide a mapping (either from the high-dimensional space to the low-dimensional embedding or vice versa), and those that just give a visualisation. In the context of machine learning, mapping methods may be viewed as a preliminary feature extraction step, after which pattern recognition algorithms are applied. Typically those that just give a visualisation are based on proximity data, that is, distance measurements. 
Nonlinear expectation  In probability theory, a nonlinear expectation is a nonlinear generalization of the expectation. Nonlinear expectations are useful in utility theory as they more closely match human behavior than traditional expectations. 
Nonlinear Iterative Partial Least Squares (NIPALS) 
In statistics, nonlinear iterative partial least squares (NIPALS) is an algorithm for computing the first few components in a principal component or partial least squares analysis. For very-high-dimensional datasets, such as those generated in the 'omics sciences (e.g., genomics, metabolomics), it is usually only necessary to compute the first few principal components. The NIPALS algorithm calculates the score vector t1 and the loading vector p1' from X. The outer product t1 p1' can then be subtracted from X, leaving the residual matrix E1, which can in turn be used to calculate subsequent principal components. This results in a dramatic reduction in computational time since calculation of the covariance matrix is avoided. 
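The extract-then-deflate loop described above can be sketched as follows for the principal component case. This is a minimal sketch under stated assumptions: the data are column-centred first, the starting score vector is the column with the largest variance, and the names `nipals_pca`, `tol` and `max_iter` are illustrative.

```python
import numpy as np

def nipals_pca(X, n_components=2, tol=1e-10, max_iter=1000):
    """Return scores T, loadings P and the final residual E for centred X."""
    E = X - X.mean(axis=0)                 # residual matrix, starts as centred X
    scores, loadings = [], []
    for _ in range(n_components):
        # Initial guess for the score vector: the highest-variance column.
        t = E[:, np.argmax(E.var(axis=0))].copy()
        for _ in range(max_iter):
            p = E.T @ t / (t @ t)          # project residual onto scores
            p /= np.linalg.norm(p)         # keep the loading at unit length
            t_new = E @ p                  # updated scores
            converged = np.linalg.norm(t_new - t) < tol * np.linalg.norm(t_new)
            t = t_new
            if converged:
                break
        E = E - np.outer(t, p)             # deflate: subtract t p' from the residual
        scores.append(t)
        loadings.append(p)
    return np.column_stack(scores), np.column_stack(loadings), E

# Synthetic data with clearly separated variances along each column.
rng = np.random.default_rng(0)
X = rng.normal(size=(60, 4)) * np.array([5.0, 2.0, 1.0, 0.5])
T, P, E = nipals_pca(X, n_components=2)
```

Only matrix-vector products with the residual are needed, so the covariance matrix is never formed; the loadings agree, up to sign, with the leading right singular vectors of the centred data.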
Nonmetric MultiDimensional Scaling (NMDS) 
Nonmetric multidimensional scaling (MDS, also NMDS and NMS) is an ordination technique that differs in several ways from nearly all other ordination methods. First, in most ordination methods many axes are calculated, but only a few are viewed, owing to graphical limitations. In MDS, a small number of axes are explicitly chosen prior to the analysis and the data are fitted to those dimensions; there are no hidden axes of variation. Second, most other ordination methods are analytical and therefore result in a single unique solution to a set of data. In contrast, MDS is a numerical technique that iteratively seeks a solution and stops computation when an acceptable solution has been found, or after some pre-specified number of attempts. As a result, an MDS ordination is not a unique solution, and a subsequent MDS analysis on the same set of data following the same methodology will likely result in a somewhat different ordination. Third, MDS is not an eigenvalue-eigenvector technique like principal components analysis or correspondence analysis, which ordinate the data such that axis 1 explains the greatest amount of variance, axis 2 explains the next greatest amount of variance, and so on. As a result, an MDS ordination can be rotated, inverted, or centered to any desired configuration. 
Nonnegative Matrix Factorization (NMF) 
Nonnegative matrix factorization (NMF), also called nonnegative matrix approximation, is a group of algorithms in multivariate analysis and linear algebra where a matrix V is factorized into (usually) two matrices W and H, with the property that all three matrices have no negative elements. This nonnegativity makes the resulting matrices easier to inspect. Since the problem is not exactly solvable in general, it is commonly approximated numerically. NMF finds applications in such fields as computer vision, document clustering, chemometrics and recommender systems. NMF 
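One widely used numerical approximation is the multiplicative update rule of Lee and Seung for the squared-error objective; it is shown here as one example among the group of NMF algorithms, not as the method of any particular package. The function name `nmf` and the parameters `rank`, `n_iter`, `eps` and `seed` are illustrative.

```python
import numpy as np

def nmf(V, rank, n_iter=300, eps=1e-9, seed=0):
    """Approximate nonnegative V (n x m) as W (n x rank) @ H (rank x m)."""
    rng = np.random.default_rng(seed)
    n, m = V.shape
    W = rng.random((n, rank)) + eps        # strictly positive starting point
    H = rng.random((rank, m)) + eps
    for _ in range(n_iter):
        # Multiplying by nonnegative ratios keeps W and H nonnegative throughout.
        H *= (W.T @ V) / (W.T @ W @ H + eps)
        W *= (V @ H.T) / (W @ H @ H.T + eps)
    return W, H

# A small nonnegative matrix that has an exact rank-2 factorization.
rng = np.random.default_rng(1)
V = rng.random((6, 2)) @ rng.random((2, 5))
W, H = nmf(V, rank=2)
relative_error = np.linalg.norm(V - W @ H) / np.linalg.norm(V)
```

The result depends on the random starting point and is only a local solution, which is why the factorization is described as approximate.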
Nonparametric Canonical Correlation Analysis (NCCA) 
Canonical correlation analysis (CCA) is a fundamental technique in multi-view data analysis and representation learning. Several nonlinear extensions of the classical linear CCA method have been proposed, including kernel and deep neural network methods. These approaches restrict attention to certain families of nonlinear projections, which the user must specify (by choosing a kernel or a neural network architecture), and are computationally demanding. Interestingly, the theory of nonlinear CCA without any functional restrictions was studied in the population setting by Lancaster as early as the 1950s. However, these results have not inspired practical algorithms. In this paper, we revisit Lancaster's theory and use it to devise a practical algorithm for nonparametric CCA (NCCA). Specifically, we show that the most correlated nonlinear projections of two random vectors can be expressed in terms of the singular value decomposition of a certain operator associated with their joint density. Thus, by estimating the population density from data, NCCA reduces to solving an eigenvalue system, superficially like kernel CCA but, importantly, without having to compute the inverse of any kernel matrix. We also derive a partially linear CCA (PLCCA) variant in which one of the views undergoes a linear projection while the other is nonparametric. PLCCA turns out to have a similar form to the classical linear CCA, but with a nonparametric regression term replacing the linear regression in CCA. Using a kernel density estimate based on a small number of nearest neighbors, our NCCA and PLCCA algorithms are memory-efficient, often run much faster, and achieve better performance than kernel CCA and comparable performance to deep CCA. 
NonParametric Generalized Linear Model (NPGLM) 
In this paper, we try to solve the problem of temporal link prediction in information networks. This implies predicting the time it takes for a link to appear in the future, given its features that have been extracted at the current network snapshot. To this end, we introduce a probabilistic nonparametric approach, called 'Non-Parametric Generalized Linear Model' (NP-GLM), which infers the hidden underlying probability distribution of the link advent time given its features. We then present a learning algorithm for NP-GLM and an inference method to answer time-related queries. Extensive experiments conducted on both synthetic data and the real-world Sina Weibo social network demonstrate the effectiveness of NP-GLM in solving the temporal link prediction problem vis-à-vis competitive baselines. 
NonResponse Bias  Nonresponse bias occurs in statistical surveys if the answers of respondents differ from the potential answers of those who did not answer. 
Nonstationary Stochastic Processes  A stochastic process (a collection of random variables ordered in time, e.g. GDP(t)) is said to be (weakly) stationary if its mean and variance are constant over time, i.e. time invariant (along with its autocovariance). Such a time series will tend to return to its mean (mean reversion) and fluctuations around this mean will have a broadly constant amplitude. Put differently, a stationary process will not drift too far away from its mean value because of the finite variance. By contrast, a nonstationary time series will have a time-varying mean or a time-varying variance or both. lmenssp 
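The contrast between mean reversion and drift can be illustrated with a first-order autoregression, which is an assumed example and not part of the definition above. The helper name `ar1_path` and the coefficient name `phi` are illustrative. The same single shock is fed into a mean-reverting series (phi = 0.5) and into a random walk (phi = 1).

```python
def ar1_path(shocks, phi, start=0.0):
    """Iterate x_t = phi * x_{t-1} + shock_t and return the whole path."""
    path, x = [], start
    for shock in shocks:
        x = phi * x + shock
        path.append(x)
    return path

# One unit shock followed by no further disturbance.
shocks = [1.0, 0.0, 0.0, 0.0]
mean_reverting = ar1_path(shocks, phi=0.5)   # the shock decays back towards the mean of 0
random_walk = ar1_path(shocks, phi=1.0)      # the shock never dies out
```

With |phi| < 1 the effect of a shock shrinks geometrically, so the series keeps returning to its mean. With phi = 1 every shock is permanent, and with random shocks the variance of the series grows over time, which makes it nonstationary.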
NonUniform Fast Fourier Transform (NUFFT) 
Fourier analysis plays a natural role in a wide variety of applications, from medical imaging to radio astronomy, data analysis and the numerical solution of partial differential equations. When the sampling is uniform and the Fourier transform is desired at equispaced frequencies, the classical fast Fourier transform (FFT) has played a fundamental role in computation. The FFT requires O(N log N) work to compute N Fourier modes from N data points rather than O(N^2) work. When the data is irregular in either the 'physical' or 'frequency' domain, unfortunately, the FFT does not apply. Over the last twenty years, a number of algorithms have been developed to overcome this limitation, generally referred to as non-uniform FFTs (NUFFT), non-equispaced FFTs (NFFT) or unequally-spaced FFTs (USFFT). They achieve the same O(N log N) computational complexity, but with a larger, precision-dependent, and dimension-dependent constant. http://…/glee_nufft_sirev.pdf https://…/optimizingpythonwithnumpyandnumba 
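For orientation, the sketch below evaluates the Fourier sum directly at irregular sample points. This is the slow O(N^2) computation that NUFFT algorithms approximate quickly, not an NUFFT itself; the function name `nudft` and the sign and mode conventions are assumptions made for illustration.

```python
import cmath
import math

def nudft(points, values, n_modes):
    """Direct sum F[k] = sum_j values[j] * exp(-i * k * points[j]), k = 0..n_modes-1.

    points are sample locations in [0, 2*pi) and need not be equispaced.
    """
    return [
        sum(c * cmath.exp(-1j * k * x) for x, c in zip(points, values))
        for k in range(n_modes)
    ]

# On an equispaced grid the direct sum coincides with the ordinary DFT.
grid = [2 * math.pi * j / 4 for j in range(4)]
uniform_modes = nudft(grid, [1.0, 1.0, 1.0, 1.0], 4)

# Irregular sample locations are handled by exactly the same formula.
irregular_modes = nudft([0.3, 1.1, 4.0], [2.0, -1.0, 0.5], 4)
```

Each of the output modes touches every sample, which is where the quadratic cost comes from.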
Norm  In linear algebra, functional analysis and related areas of mathematics, a norm is a function that assigns a strictly positive length or size to each vector in a vector space, save for the zero vector, which is assigned a length of zero. A seminorm, on the other hand, is allowed to assign zero length to some non-zero vectors (in addition to the zero vector). A norm must also satisfy certain properties pertaining to scalability and additivity (absolute homogeneity and the triangle inequality). A simple example is the 2-dimensional Euclidean space R^2 equipped with the Euclidean norm. Elements in this vector space (e.g., (3, 7)) are usually drawn as arrows in a 2-dimensional Cartesian coordinate system starting at the origin (0, 0). The Euclidean norm assigns to each vector the length of its arrow. Because of this, the Euclidean norm is often known as the magnitude. A vector space on which a norm is defined is called a normed vector space. Similarly, a vector space with a seminorm is called a seminormed vector space. It is often possible to supply a norm for a given vector space in more than one way. 
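As a worked example of supplying more than one norm on the same space, the sketch below evaluates three standard p-norms on the vector (3, 7) from the text. The helper name `p_norm` is illustrative.

```python
import math

def p_norm(v, p=2.0):
    """p-norm of a finite vector; p = math.inf gives the maximum norm."""
    if p == math.inf:
        return max(abs(x) for x in v)
    return sum(abs(x) ** p for x in v) ** (1.0 / p)

v = (3.0, 7.0)
euclidean = p_norm(v, 2)         # sqrt(3^2 + 7^2) = sqrt(58), the arrow's length
taxicab = p_norm(v, 1)           # |3| + |7| = 10
maximum = p_norm(v, math.inf)    # max(|3|, |7|) = 7

# Spot checks of the two defining properties for the Euclidean norm.
w = (-1.0, 2.0)
v_plus_w = tuple(a + b for a, b in zip(v, w))
triangle_holds = p_norm(v_plus_w) <= p_norm(v) + p_norm(w)
scaled = p_norm(tuple(-2.0 * a for a in v))   # equals |-2| * euclidean
```

All three functions assign zero only to the zero vector, yet they measure the same arrow differently.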
Normalization  In statistics and applications of statistics, normalization can have a range of meanings. In the simplest cases, normalization of ratings means adjusting values measured on different scales to a notionally common scale, often prior to averaging. In more complicated cases, normalization may refer to more sophisticated adjustments where the intention is to bring the entire probability distributions of adjusted values into alignment. In the case of normalization of scores in educational assessment, there may be an intention to align distributions to a normal distribution. A different approach to normalization of probability distributions is quantile normalization, where the quantiles of the different measures are brought into alignment. 
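Two of the senses mentioned above can be sketched briefly: rescaling a set of values to a common [0, 1] scale, and quantile normalization, which forces several measures to share one distribution. The function names are illustrative, min-max rescaling is just one possible choice of common scale, and the quantile sketch assumes equal-length inputs without tied values.

```python
def min_max_scale(values):
    """Rescale values linearly so the smallest maps to 0 and the largest to 1."""
    lo, hi = min(values), max(values)
    return [(x - lo) / (hi - lo) for x in values]

def quantile_normalize(columns):
    """Give every column the same distribution: the mean of the sorted columns."""
    n = len(columns[0])
    sorted_cols = [sorted(col) for col in columns]
    # Reference distribution: average of the k-th smallest value across columns.
    reference = [sum(sc[k] for sc in sorted_cols) / len(columns) for k in range(n)]
    result = []
    for col in columns:
        order = sorted(range(n), key=lambda i: col[i])   # indices, smallest value first
        out = [0.0] * n
        for rank, i in enumerate(order):
            out[i] = reference[rank]                      # keep each value's rank
        result.append(out)
    return result

scaled = min_max_scale([1.0, 3.0, 5.0])
a = [5.0, 2.0, 3.0]
b = [40.0, 10.0, 60.0]
a_norm, b_norm = quantile_normalize([a, b])
```

After quantile normalization each measure keeps its own ordering of items, but both measures contain exactly the same set of values.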
Normalized Mutual Information (NMI) 
NMI 
Normalized Nonnegative Models (NNM) 
We introduce normalized nonnegative models (NNM) for explorative data analysis. NNMs are partial convexifications of models from probability theory. We demonstrate their value using the example of item recommendation. We show that NNM-based recommender systems satisfy three criteria that all recommender systems should ideally satisfy: high predictive power, computational tractability, and expressive representations of users and items. Expressive user and item representations are important in practice to succinctly summarize the pool of customers and the pool of items. In NNMs, user representations are expressive because each user’s preference can be regarded as a normalized mixture of preferences of stereotypical users. The interpretability of item and user representations allows us to arrange properties of items (e.g., genres of movies or topics of documents) or users (e.g., personality traits) hierarchically. 
Not only SQL (NoSQL) 
A NoSQL or Not Only SQL database provides a mechanism for storage and retrieval of data that is modeled in means other than the tabular relations used in relational databases. Motivations for this approach include simplicity of design, horizontal scaling and finer control over availability. The data structure (e.g. key-value, graph, or document) differs from that of an RDBMS, and therefore some operations are faster in NoSQL and some in an RDBMS. The particular suitability of a given NoSQL database depends on the problem to be solved (e.g. does the solution use graph algorithms?). The appearance of mature NoSQL databases has reduced the rationale for Java content repository (JCR) implementations. NoSQL databases are finding significant and growing industry use in big data and real-time web applications. NoSQL systems are also referred to as “Not only SQL” to emphasize that they may in fact allow SQL-like query languages to be used. Many NoSQL stores compromise consistency (in the sense of the CAP theorem) in favor of availability and partition tolerance. Barriers to the greater adoption of NoSQL stores include the use of low-level query languages, the lack of standardized interfaces, and the huge investments already made in SQL by enterprises. Most NoSQL stores lack true ACID transactions, although a few recent systems, such as FairCom c-treeACE, Google Spanner and FoundationDB, have made them central to their designs. 
Novel Data Streams (NDS) 
We define NDS as those data streams whose content is initiated directly by the user (patient) themselves. This would exclude data sources such as electronic health records, disease registries, vital statistics, electronic lab reporting, emergency department visits, ambulance call data, school absenteeism, prescription pharmacy sales, serology, amongst others. Although ready access to aggregated information from these excluded sources is novel in many health settings, our focus here is on those streams which are both directly initiated by the user and also not already maintained by public health departments or other health professionals. Despite this narrower definition, our suggestions for improving NDS surveillance may also be applicable to more established surveillance systems, participatory systems (e.g., Flu Near You, influenzaNet), and new data streams aggregated from established systems, such as Biosense and ISDS DiSTRIBuTE network. While much of the recent focus on using NDS for disease surveillance has centered on Internet search queries and Twitter posts, there are many NDS outside of these two sources. Our aim therefore is to provide a general framework for enhancing and developing NDS surveillance systems, which applies to more than just search data and Tweets. At a minimum, our definition of NDS would include Internet search data and social media, such as Google searches, Google Plus, Facebook, and Twitter posts, as well as Wikipedia access logs, restaurant reservation and review logs, non-prescription pharmacy sales, news source scraping, and prediction markets. 
Novel Integration of the Sample and Thresholded covariance estimators (NOVELIST) 
We propose a ‘NOVEL Integration of the Sample and Thresholded covariance estimators’ (NOVELIST) to estimate the large covariance (correlation) and precision matrix. NOVELIST performs shrinkage of the sample covariance (correlation) towards its thresholded version. The sample covariance (correlation) component is non-sparse and can be low-rank in high dimensions. The thresholded sample covariance (correlation) component is sparse, and its addition ensures the stable invertibility of NOVELIST. The benefits of the NOVELIST estimator include simplicity, ease of implementation, computational efficiency and the fact that its application avoids eigenanalysis. We obtain an explicit convergence rate in the operator norm over a large class of covariance (correlation) matrices when the dimension p and the sample size n satisfy (log p)/n → 0. In empirical comparisons with several popular estimators, the NOVELIST estimator in which the amount of shrinkage and thresholding is chosen by cross-validation performs well in estimating covariance and precision matrices over a wide range of models and sparsity classes. http://…/poster_NOVELIST_Sept2014.pdf novelist 
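The idea of shrinking the sample covariance towards its thresholded version can be sketched as a convex combination of the two matrices. This is only an illustration under stated assumptions, not the authors' implementation or the API of the novelist package: the function names, the use of soft-thresholding, leaving the diagonal unthresholded, and fixed values of `lam` and `delta` (instead of cross-validation) are all choices made here.

```python
def sample_covariance(rows):
    """Unbiased sample covariance of n observations (rows) of p variables."""
    n, p = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(p)]
    return [
        [sum((r[i] - means[i]) * (r[j] - means[j]) for r in rows) / (n - 1)
         for j in range(p)]
        for i in range(p)
    ]

def soft_threshold(x, lam):
    """Shrink x towards zero by lam; values within [-lam, lam] become 0."""
    if x > lam:
        return x - lam
    if x < -lam:
        return x + lam
    return 0.0

def shrink_towards_thresholded(cov, lam, delta):
    """Return (1 - delta) * cov + delta * T(cov), where T soft-thresholds the
    off-diagonal entries at level lam (assumption: diagonal kept as is)."""
    p = len(cov)
    out = [[0.0] * p for _ in range(p)]
    for i in range(p):
        for j in range(p):
            target = cov[i][j] if i == j else soft_threshold(cov[i][j], lam)
            out[i][j] = (1 - delta) * cov[i][j] + delta * target
    return out

data = [[1.0, 2.0, 0.5], [2.0, 1.5, 0.7], [3.0, 3.5, 0.2], [4.0, 3.0, 0.9]]
cov = sample_covariance(data)
estimate = shrink_towards_thresholded(cov, lam=0.1, delta=0.5)
```

Setting `delta=0` returns the sample covariance unchanged, while `delta=1` returns the purely thresholded matrix; values in between interpolate entry by entry, so no eigenanalysis is needed.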
Novelty Detection  Novelty detection is the identification of new or unknown data that a machine learning system has not been trained with and was not previously aware of, with the help of either statistical or machine learning based approaches. Novelty detection is one of the fundamental requirements of a good classification system. A machine learning system can never be trained with all the possible object classes and hence the performance of the network will be poor for those classes that are under-represented in the training set. A good classification system must have the ability to differentiate between known and unknown objects during testing. For this purpose, different models for novelty detection have been proposed. Novelty detection is a hard problem in machine learning since it depends on the statistics of the already known information. A generally applicable, parameter-free method for outlier detection in a high-dimensional space is not yet known. Novelty detection finds a variety of applications especially in signal processing, computer vision, pattern recognition, data mining and robotics. Another important application is the detection of a disease or potential fault whose class may be under-represented in the training set. The statistical approaches to novelty detection may be classified into parametric and non-parametric approaches. Parametric approaches assume a specific statistical distribution (such as a Gaussian distribution) of data and statistical modeling based on data mean and covariance, whereas non-parametric approaches do not make any assumption on the statistical properties of data. http://…/mlsp09a.pdf http://…/mlsp09b.pdf http://…i=10.1.1.3.3578&rep=rep1&type=pdf http://…/smola09a.pdf http://…/karkaliwise2013.pdf 
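A parametric approach of the kind described above can be sketched for a single feature: fit a Gaussian to the training data and flag a test point as novel when it lies too many standard deviations from the mean. The univariate setting, the function names, the training values, and the threshold of 3 standard deviations are all assumptions chosen for illustration; a multivariate version would use the covariance matrix instead of a single standard deviation.

```python
import statistics

def fit_gaussian(train):
    """Estimate the mean and sample standard deviation of the known data."""
    return statistics.fmean(train), statistics.stdev(train)

def is_novel(x, mean, sd, threshold=3.0):
    """Flag x as novel if it is more than `threshold` standard deviations
    away from the mean of the training data."""
    return abs(x - mean) / sd > threshold

# Hypothetical sensor readings observed during training.
train = [9.8, 10.1, 10.0, 9.9, 10.2]
mean, sd = fit_gaussian(train)

typical = is_novel(10.05, mean, sd)  # close to the training data
unusual = is_novel(12.0, mean, sd)   # far outside the training range
```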
Null Hypothesis Significance Testing (NHST) 
Null Hypothesis Significance Testing (NHST) is a statistical method for testing whether the factor under study has an effect on our observations. For example, a t test or an ANOVA test for comparing means is a typical instance of NHST. It is probably the most common form of statistical testing used in HCI. http://…/hypothesistestingisonlymostly.html 
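The logic of NHST can be illustrated with an exact permutation test on the difference of two group means. Note that this is a different test from the t test and ANOVA named above; it is used here only because it needs nothing beyond the standard library. The function name and the example measurements are hypothetical.

```python
from itertools import combinations
from statistics import fmean

def permutation_p_value(a, b):
    """Two-sided exact permutation test for a difference in means.
    Null hypothesis: group labels are irrelevant, so every split of the
    pooled data into groups of the same sizes is equally likely."""
    pooled = list(a) + list(b)
    n_a, n_b = len(a), len(b)
    observed = abs(fmean(a) - fmean(b))
    total = sum(pooled)
    extreme = 0
    splits = 0
    for idx in combinations(range(len(pooled)), n_a):
        sum_a = sum(pooled[i] for i in idx)
        diff = abs(sum_a / n_a - (total - sum_a) / n_b)
        # Small tolerance so that ties with the observed value are counted.
        if diff >= observed - 1e-12:
            extreme += 1
        splits += 1
    return extreme / splits

# Hypothetical task completion times (seconds) for two interface designs.
design_a = [1.0, 2.0, 3.0]
design_b = [10.0, 11.0, 12.0]
p_value = permutation_p_value(design_a, design_b)
```

The p-value is the fraction of relabelings that produce a difference at least as large as the one observed; a small value is taken as evidence against the null hypothesis of no effect. Enumerating all splits is only feasible for small samples.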
NullHop  Convolutional neural networks (CNNs) have become the dominant neural network architecture for solving many state-of-the-art (SOA) visual processing tasks. Even though Graphical Processing Units (GPUs) are most often used in training and deploying CNNs, their power consumption becomes a problem for real-time mobile applications. We propose a flexible and efficient CNN accelerator architecture which can support the implementation of SOA CNNs in low-power and low-latency application scenarios. This architecture exploits the sparsity of neuron activations in CNNs to accelerate the computation and reduce memory requirements. The flexible architecture allows high utilization of available computing resources across a wide range of convolutional network kernel sizes and numbers of input and output feature maps. We implemented the proposed architecture on an FPGA platform and present results showing how our implementation reduces external memory transfers and compute time in five different CNNs ranging from small ones up to the widely known large VGG16 and VGG19 CNNs. We show how in RTL simulations in a 28 nm process with a clock frequency of 500 MHz, the NullHop core is able to reach over 450 GOp/s and efficiency of 368%, maintaining over 98% utilization of the MAC units and achieving a power efficiency of over 3 TOp/s/W in a core area of 5.8 mm^2. 
Numenta Anomaly Benchmark (NAB) 
Much of the world’s data is streaming, time-series data, where anomalies give significant information in critical situations; examples abound in domains such as finance, IT, security, medical, and energy. Yet detecting anomalies in streaming data is a difficult task, requiring detectors to process data in real-time, not batches, and learn while simultaneously making predictions. There are no benchmarks to adequately test and score the efficacy of real-time anomaly detectors. Here we propose the Numenta Anomaly Benchmark (NAB), which attempts to provide a controlled and repeatable environment of open-source tools to test and measure anomaly detection algorithms on streaming data. The perfect detector would detect all anomalies as soon as possible, trigger no false alarms, work with real-world time-series data across a variety of domains, and automatically adapt to changing statistics. Rewarding these characteristics is formalized in NAB, using a scoring algorithm designed for streaming data. NAB evaluates detectors on a benchmark dataset with labeled, real-world time-series data. We present these components, and give results and analyses for several open source, commercially-used algorithms. The goal for NAB is to provide a standard, open source framework with which the research community can compare and evaluate different algorithms for detecting anomalies in streaming data. 
Numerical Formal Concept Analysis (nFCA) 
Numerical Formal Concept Analysis (nFCA) technique: Formal Concept Analysis (FCA) is a powerful method in computer science (CS) for identifying overall inherent structures within and between the row and column variables (called objects and attributes in CS) of a binary data set. It is a bit like lifting up the overall hierarchical structure of a forest from a superposition based on simple local information, i.e., pairwise relationships between variables of the data. The objective of nFCA is to combine FCA and statistics to translate what an FCA can offer for binary data to numerical data. The end product of our nFCA is a pair of nFCA graphs, where the H-graph is a clustered lattice graph indicating inherent hierarchical and clustered relations and the I-graph is a complementary tree plot indicating the strength and directions of each of the relations and additional network relationships. The nFCA performs better than the conventional hierarchical clustering methods in terms of the cophenetic correlation coefficient and the relational structure. nFCA 
Numerical Template Toolbox (NT2) 
The Numerical Template Toolbox (NT2) is an Open Source C++ library aimed at simplifying the development, debugging and optimization of high-performance computing applications by providing a MATLAB-like syntax that eases the transition between prototype and actual application. RcppNT2 
nutsflow/ml  Data preprocessing is a fundamental part of any machine learning application and frequently the most time-consuming aspect when developing a machine learning solution. Preprocessing for deep learning is characterized by pipelines that lazily load data and perform data transformation, augmentation, batching and logging. Many of these functions are common across applications but require different arrangements for training, testing or inference. Here we introduce a novel software framework named nutsflow/ml that encapsulates common preprocessing operations as components, which can be flexibly arranged to rapidly construct efficient preprocessing pipelines for deep learning. 
NVIDIA Deep Learning GPU Training System (DIGITS) 
The NVIDIA Deep Learning GPU Training System (DIGITS) puts the power of deep learning in the hands of data scientists and researchers. Quickly design the best deep neural network (DNN) for your data using real-time network behavior visualization. Best of all, DIGITS is a complete system so you don’t have to write any code. Get started with DIGITS in under an hour. 
Nyquist-Shannon Sampling Theorem  In the field of digital signal processing, the sampling theorem is a fundamental bridge between continuous-time signals (often called ‘analog signals’) and discrete-time signals (often called ‘digital signals’). It establishes a sufficient condition between a signal’s bandwidth and the sample rate that permits a discrete sequence of samples to capture all the information from the continuous-time signal. Strictly speaking, the theorem only applies to a class of mathematical functions having a Fourier transform that is zero outside of a finite region of frequencies. Intuitively we expect that when one reduces a continuous function to a discrete sequence and interpolates back to a continuous function, the fidelity of the result depends on the density (or sample rate) of the original samples. The sampling theorem introduces the concept of a sample rate that is sufficient for perfect fidelity for the class of functions that are bandlimited to a given bandwidth, such that no actual information is lost in the sampling process. It expresses the sufficient sample rate in terms of the bandwidth for the class of functions. The theorem also leads to a formula for perfectly reconstructing the original continuous-time function from the samples. Perfect reconstruction may still be possible when the sample-rate criterion is not satisfied, provided other constraints on the signal are known. (See § Sampling of non-baseband signals below, and Compressed sensing.) The name Nyquist-Shannon sampling theorem honors Harry Nyquist and Claude Shannon. The theorem was also discovered independently by E. T. Whittaker, by Vladimir Kotelnikov, and by others. So it is also known by the names Nyquist-Shannon-Kotelnikov, Whittaker-Shannon-Kotelnikov, Whittaker-Nyquist-Kotelnikov-Shannon, and cardinal theorem of interpolation. http://…Nyquist%E2%80%93Shannonsamplingtheorem 
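The reconstruction formula mentioned above is commonly written as the Whittaker-Shannon interpolation x(t) = Σ_n x[n] · sinc((t − nT)/T), where T is the sampling interval. The sketch below applies it to a 1 Hz sine sampled at 8 Hz, which satisfies the sufficient condition that the sample rate exceed twice the bandwidth. The signal, the sample rate, and the truncation of the infinite sum to n = -500, ..., 500 are assumptions of this example, so the off-grid reconstruction is only approximate.

```python
import math

def sinc(u):
    """Normalized sinc: sin(pi*u) / (pi*u), with sinc(0) = 1."""
    return 1.0 if u == 0 else math.sin(math.pi * u) / (math.pi * u)

def reconstruct(t, samples, first_index, interval):
    """Truncated Whittaker-Shannon interpolation at time t.
    samples[k] is the signal value at time (first_index + k) * interval."""
    return sum(
        value * sinc((t - (first_index + k) * interval) / interval)
        for k, value in enumerate(samples)
    )

def signal(t):
    """A 1 Hz sine, bandlimited to well below half the sample rate."""
    return math.sin(2 * math.pi * 1.0 * t)

sample_rate = 8.0             # samples per second, greater than 2 * 1 Hz
interval = 1.0 / sample_rate  # T = 0.125 s
first_index = -500
samples = [signal(n * interval) for n in range(first_index, 501)]

t = 0.3  # a time instant that falls between two samples
error = abs(reconstruct(t, samples, first_index, interval) - signal(t))
```

At the sampling instants every sinc term except one vanishes, so the formula returns the stored sample itself; between samples the truncated sum approaches the true value as more terms are included.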