Pabon Lasso  Pabon Lasso is a graphical method for monitoring the efficiency of different wards of a hospital or of different hospitals. The Pabon Lasso graph is divided into four zones by lines drawn at the average bed turnover rate (BTR) and the average bed occupancy rate (BOR). The lower-left part is Zone I, the upper-left part is Zone II, the upper-right part is Zone III, and the remaining lower-right part is Zone IV. PabonLasso 
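The zone assignment described above can be sketched in a few lines. This is a minimal illustration, not the PabonLasso package: the function name, the sample figures, the choice of BOR for the horizontal axis and BTR for the vertical axis, and the handling of values exactly equal to an average (sent to the right or upper side) are all assumptions.

```python
from statistics import mean

def pabon_lasso_zones(wards):
    """Assign each ward to a zone; wards maps a name to a (bor, btr) pair.

    Assumption: BOR is the horizontal axis and BTR the vertical axis.
    """
    avg_bor = mean(bor for bor, _ in wards.values())
    avg_btr = mean(btr for _, btr in wards.values())
    zones = {}
    for name, (bor, btr) in wards.items():
        left = bor < avg_bor   # left of the average-BOR line
        low = btr < avg_btr    # below the average-BTR line
        if left and low:
            zones[name] = "I"
        elif left:
            zones[name] = "II"
        elif not low:
            zones[name] = "III"
        else:
            zones[name] = "IV"
    return zones

# Hypothetical wards: (bed occupancy rate in %, bed turnover rate)
wards = {"A": (55.0, 20.0), "B": (60.0, 45.0), "C": (85.0, 50.0), "D": (90.0, 25.0)}
print(pabon_lasso_zones(wards))
```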
Pachinko Allocation Machine (PAM) 
➘ “Pachinko Allocation Model” Variational Inference In Pachinko Allocation Machines 
Pachinko Allocation Model (PAM) 
In machine learning and natural language processing, the pachinko allocation model (PAM) is a topic model. Topic models are a suite of algorithms to uncover the hidden thematic structure of a collection of documents. The algorithm improves upon earlier topic models such as latent Dirichlet allocation (LDA) by modeling correlations between topics in addition to the word correlations which constitute topics. PAM provides more flexibility and greater expressive power than latent Dirichlet allocation. While first described and implemented in the context of natural language processing, the algorithm may have applications in other fields such as bioinformatics. The model is named for pachinko machines – a game popular in Japan, in which metal balls bounce down around a complex collection of pins until they land in various bins at the bottom. http://…/pamicml06.pdf 
Pachinkogram  Conditional Probabilities Visualisation 
Pachyderm  MapReduce without Hadoop Analyze massive datasets with Docker: Pachyderm is an open source MapReduce engine that uses Docker containers for distributed computations. Pachyderm is a completely new MapReduce engine built on top of modern tools. The biggest benefit of starting from scratch is that we get to leverage amazing advances in open source infrastructure, such as Docker and CoreOS. https://…/pfs Replacing Hadoop 
Packet Capture (pcap) 
In the field of computer network administration, pcap (packet capture) consists of an application programming interface (API) for capturing network traffic. Unix-like systems implement pcap in the libpcap library; Windows uses a port of libpcap known as WinPcap. Monitoring software may use libpcap and/or WinPcap to capture packets travelling over a network and, in newer versions, to transmit packets on a network at the link layer, as well as to get a list of network interfaces for possible use with libpcap or WinPcap. The pcap API is written in C, so other languages such as Java, .NET languages, and scripting languages generally use a wrapper; no such wrappers are provided by libpcap or WinPcap itself. C++ programs may link directly to the C API or use an object-oriented wrapper. 
Packing (PacGAN) 
Generative adversarial networks (GANs) are innovative techniques for learning generative models of complex data distributions from samples. Despite remarkable recent improvements in generating realistic images, one of their major shortcomings is that, in practice, they tend to produce samples with little diversity, even when trained on diverse datasets. This phenomenon, known as mode collapse, has been the main focus of several recent advances in GANs. Yet there is little understanding of why mode collapse happens and why existing approaches are able to mitigate it. We propose a principled approach to handling mode collapse, which we call packing. The main idea is to modify the discriminator to make decisions based on multiple samples from the same class, either real or artificially generated. We borrow analysis tools from binary hypothesis testing, in particular the seminal result of Blackwell [Bla53], to prove a fundamental connection between packing and mode collapse. We show that packing naturally penalizes generators with mode collapse, thereby favoring generator distributions with less mode collapse during the training process. Numerical experiments on benchmark datasets suggest that packing provides significant improvements in practice as well. 
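The idea of showing the discriminator several samples of the same class at once can be sketched as a plain data-preparation step. The helper below is hypothetical and assumes that packing is done by concatenating the feature vectors of m consecutive samples; the discriminator and the training loop are left out.

```python
def pack(samples, m):
    """Concatenate every m consecutive samples into one packed input.

    All samples passed in one call are assumed to come from the same class
    (all real or all generated). Leftover samples that do not fill a pack
    are dropped.
    """
    if m < 1:
        raise ValueError("m must be at least 1")
    usable = len(samples) - len(samples) % m
    return [
        [x for sample in samples[i:i + m] for x in sample]
        for i in range(0, usable, m)
    ]

real = [[1, 2], [3, 4], [5, 6], [7, 8], [9, 10]]
print(pack(real, 2))  # two packed inputs of length 4; the fifth sample is dropped
```

With m = 1 the packed inputs are the original samples, which corresponds to an unmodified discriminator input.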
Padé Approximant  In mathematics a Padé approximant is the ‘best’ approximation of a function by a rational function of given order – under this technique, the approximant’s power series agrees with the power series of the function it is approximating. The technique was developed around 1890 by Henri Padé, but goes back to Georg Frobenius who introduced the idea and investigated the features of rational approximations of power series. The Padé approximant often gives better approximation of the function than truncating its Taylor series, and it may still work where the Taylor series does not converge. For these reasons Padé approximants are used extensively in computer calculations. They have also been used as auxiliary functions in Diophantine approximation and transcendental number theory, though for sharp results ad hoc methods in some sense inspired by the Padé theory typically replace them. http://…ing Padé Approximant Coefficients Using R Pade 
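As a small worked example, the sketch below uses the [2/2] Padé approximant of exp(x), (1 + x/2 + x²/12) / (1 - x/2 + x²/12), checks with exact fractions that its power series agrees with that of exp(x) through x^4, and compares it numerically with the degree-4 Taylor polynomial. The function names are illustrative; nothing here is taken from the R Pade package.

```python
import math
from fractions import Fraction

# Taylor coefficients of exp(x) up to x^4, i.e. 1/k!
TAYLOR = [Fraction(1, math.factorial(k)) for k in range(5)]
# Numerator P and denominator Q of the [2/2] Pade approximant of exp(x)
P = [Fraction(1), Fraction(1, 2), Fraction(1, 12)]
Q = [Fraction(1), Fraction(-1, 2), Fraction(1, 12)]

def poly(coeffs, x):
    """Evaluate a polynomial given its coefficients in increasing order."""
    return sum(float(c) * x**k for k, c in enumerate(coeffs))

def pade_exp(x):
    return poly(P, x) / poly(Q, x)

def taylor_exp(x):
    return poly(TAYLOR, x)

def series_matches():
    """Check that Q(x) * exp(x) and P(x) have equal coefficients up to x^4."""
    for k in range(5):
        coeff = sum(Q[j] * TAYLOR[k - j] for j in range(len(Q)) if k - j >= 0)
        target = P[k] if k < len(P) else 0
        if coeff != target:
            return False
    return True

print(series_matches())
for x in (0.5, 1.0):
    print(x, math.exp(x), pade_exp(x), taylor_exp(x))
```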
PageRank  PageRank is an algorithm used by Google Search to rank websites in their search engine results. PageRank was named after Larry Page, one of the founders of Google. PageRank is a way of measuring the importance of website pages. According to Google: PageRank works by counting the number and quality of links to a page to determine a rough estimate of how important the website is. The underlying assumption is that more important websites are likely to receive more links from other websites. 
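The link-counting idea can be illustrated with a short power iteration. This is a textbook-style sketch and not Google's production algorithm: the damping factor of 0.85, the fixed number of iterations, the even redistribution of rank from pages without outgoing links, and the toy graph are all assumptions.

```python
def pagerank(links, damping=0.85, iterations=100):
    """Return a rank per page; links maps a page to the pages it links to."""
    pages = sorted(set(links) | {t for targets in links.values() for t in targets})
    n = len(pages)
    rank = {p: 1.0 / n for p in pages}
    for _ in range(iterations):
        # Every page starts each round with the same small baseline share.
        new = {p: (1.0 - damping) / n for p in pages}
        for p in pages:
            targets = links.get(p, [])
            if targets:
                share = damping * rank[p] / len(targets)
                for t in targets:
                    new[t] += share
            else:
                # A page without outgoing links spreads its rank evenly.
                share = damping * rank[p] / n
                for q in pages:
                    new[q] += share
        rank = new
    return rank

toy_web = {"a": ["b", "c"], "b": ["c"], "c": ["a"], "d": ["c"]}
for page, score in sorted(pagerank(toy_web).items(), key=lambda kv: -kv[1]):
    print(page, round(score, 4))
```

In this toy graph, page "c" receives links from three pages and page "d" receives none, so they end up at the top and bottom of the ranking respectively.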
Paired Lasso Regression  palasso 
PANDA  In this paper we consider a distributed convex optimization problem over time-varying networks. We propose a dual method that converges R-linearly to the optimal point given that the agents’ objective functions are strongly convex and have Lipschitz continuous gradients. The proposed method requires half as many variable exchanges per iteration as methods based on DIGing, and yields improved practical performance as empirically demonstrated. 
Pando  Volunteer computing is currently successfully used to make hundreds of thousands of machines available free of charge to projects of general interest. However, the effort and cost involved in participating in and launching such projects may explain why only a few high-profile projects use it and why only 0.1% of Internet users participate in them. In this paper we present Pando, a new web-based volunteer computing system designed to be easy to deploy and which does not require dedicated servers. The tool uses new demand-driven stream abstractions and a WebRTC overlay based on a fat tree for connecting volunteers. Together the stream abstractions and the fat-tree overlay enable a thousand browser tabs running on multiple machines to be used for computation, enough to tap into all machines bought as part of previous hardware investments made by a small or medium company or a university department. Moreover, the approach is based on a simple programming model that should be easy to use both directly by JavaScript programmers and as a compilation target by compiler writers. We provide a command-line version of the tool and all scripts and procedures necessary to replicate the experiments we made on the Grid5000 testbed. 
Panel Analysis of Nonstationarity in Idiosyncratic and Common Components (PANIC) 
PANIC (Panel Analysis of Nonstationarity in Idiosyncratic and Common components) is a method, suggested by Bai and Ng (2004), for decomposing a multivariate time series into common factors and idiosyncratic components. PANICr 
Panel Data  In statistics and econometrics, the term panel data refers to multidimensional data frequently involving measurements over time. Panel data contain observations of multiple phenomena obtained over multiple time periods for the same firms or individuals. In biostatistics, the term longitudinal data is often used instead, wherein a subject or cluster constitutes a panel member or individual in a longitudinal study. Time series and cross-sectional data are special cases of panel data that are in one dimension only (one panel member or individual for the former, one time point for the latter). ivpanel 
Panel Vector Autoregression (PVAR) 
We extend two generalized method of moments (GMM) estimators to panel vector autoregression models (PVAR) with p lags of endogenous variables, predetermined and strictly exogenous variables. This general PVAR model contains the first difference GMM estimator by Holtz-Eakin et al. (1988) <doi:10.2307/1913103>, Arellano and Bond (1991) <doi:10.2307/2297968> and the system GMM estimator by Blundell and Bond (1998) <doi:10.1016/S0304-4076(98)00009-8>. We also provide specification tests (Hansen overidentification test, lag selection criterion and stability test of the PVAR polynomial) and classical structural analysis for PVAR models such as orthogonal and generalized impulse response functions, bootstrapped confidence intervals for impulse response analysis and forecast error variance decompositions. panelvar 
PANFIS++  The concept of evolving intelligent system (EIS) provides an effective avenue for data stream mining because it is capable of coping with two prominent issues: online learning and rapidly changing environments. We note at least three uncharted territories of existing EISs: data uncertainty, temporal system dynamics, and redundant data streams. This book chapter aims at delivering a concrete solution to this problem with the development of a novel learning algorithm, namely PANFIS++. PANFIS++ generalizes PANFIS by putting forward three important components: 1) An online active learning scenario is developed to overcome redundant data streams. This module makes it possible to actively select data streams for the training process, thereby expediting execution time and enhancing generalization performance. 2) PANFIS++ is built upon an interval type-2 fuzzy system environment, which incorporates the so-called footprint of uncertainty. This component provides a degree of tolerance for data uncertainty. 3) PANFIS++ is structured under a recurrent network architecture with a self-feedback loop. This is meant to tackle the temporal system dynamics. The efficacy of PANFIS++ has been numerically validated through numerous real-world and synthetic case studies, where it delivers the highest predictive accuracy while retaining the lowest complexity. 
Pan-Sharpening Generative Adversarial Network (PSGAN) 
Remote sensing image fusion (also known as pan-sharpening) aims to generate a high-resolution multispectral image from inputs of a high spatial resolution single-band panchromatic (PAN) image and a low spatial resolution multispectral (MS) image. In this paper, we propose PSGAN, a generative adversarial network (GAN) for remote sensing image pan-sharpening. To the best of our knowledge, this is the first attempt at producing high-quality pan-sharpened images with GANs. The PSGAN consists of two parts. First, a two-stream fusion architecture is designed to generate the desired high-resolution multispectral images; then a fully convolutional network serving as a discriminator is applied to distinguish ‘real’ from ‘pan-sharpened’ MS images. Experiments on images acquired by the Quickbird and GaoFen-1 satellites demonstrate that the proposed PSGAN can fuse PAN and MS images effectively and significantly improve the results over state-of-the-art traditional and CNN-based pan-sharpening methods. 
PARAFAC Tensor Decomposition  ➘ “Tensor Rank Decomposition” 
Paragraph Vector  Many machine learning algorithms require the input to be represented as a fixed-length feature vector. When it comes to texts, one of the most common fixed-length features is bag-of-words. Despite their popularity, bag-of-words features have two major weaknesses: they lose the ordering of the words and they also ignore semantics of the words. For example, ‘powerful,’ ‘strong’ and ‘Paris’ are equally distant. In this paper, we propose Paragraph Vector, an unsupervised algorithm that learns fixed-length feature representations from variable-length pieces of texts, such as sentences, paragraphs, and documents. Our algorithm represents each document by a dense vector which is trained to predict words in the document. Its construction gives our algorithm the potential to overcome the weaknesses of bag-of-words models. Empirical results show that Paragraph Vectors outperform bag-of-words models as well as other techniques for text representations. Finally, we achieve new state-of-the-art results on several text classification and sentiment analysis tasks. GitXiv 
Paragraph Vector-based Matrix Factorization Recommender System (ParVecMF) 
Review-based recommender systems have gained noticeable ground in recent years. In addition to the rating scores, those systems are enriched with textual evaluations of items by the users. Neural language processing models, on the other hand, have already found application in recommender systems, mainly as a means of encoding user preference data, with the actual textual description of items serving only as side information. In this paper, a novel approach to incorporating the aforementioned models into the recommendation process is presented. Initially, a neural language processing model, more specifically the paragraph vector model, is used to encode textual user reviews of variable length into feature vectors of fixed length. Subsequently, this information is fused with the rating scores in a probabilistic matrix factorization algorithm, based on maximum a posteriori estimation. The resulting system, ParVecMF, is compared to a ratings’ matrix factorization approach on a reference dataset. The obtained preliminary results on a set of two metrics are encouraging and may stimulate further research in this area. 
ParaGraphE  Knowledge graph embedding aims at translating the knowledge graph into numerical representations by transforming the entities and relations into continuous low-dimensional vectors. Recently, many methods [1, 5, 3, 2, 6] have been proposed to deal with this problem, but existing single-thread implementations of them are time-consuming for large-scale knowledge graphs. Here, we design a unified parallel framework to parallelize these methods, which achieves a significant time reduction without influencing the accuracy. We name our framework ParaGraphE, which provides a library for parallel knowledge graph embedding. The source code can be downloaded from https://github.com/LIBBLE/LIBBLEMultiThread/tree/master/ParaGraphE. 
Parallel and Interacting Stochastic Approximation Annealing (PISAA) 
We present the parallel and interacting stochastic approximation annealing (PISAA) algorithm, a stochastic simulation procedure for global optimisation that extends and improves stochastic approximation annealing (SAA) by using population Monte Carlo ideas. The standard SAA algorithm guarantees convergence to the global minimum when a square-root cooling schedule is used; however, the efficiency of its performance depends crucially on its self-adjusting mechanism. Because this mechanism is based on information obtained from only a single chain, SAA may converge slowly in complex optimisation problems. The proposed algorithm involves simulating a population of SAA chains that interact with each other in a manner that ensures significant improvement of the self-adjusting mechanism and better exploration of the sampling space. Central to the proposed algorithm are the ideas of (i) recycling information from the whole population of Markov chains to design a more accurate/stable self-adjusting mechanism and (ii) incorporating more advanced proposals, such as crossover operations, for the exploration of the sampling space. PISAA presents a significantly improved performance in terms of convergence. PISAA can be implemented in parallel computing environments if available. We demonstrate the good performance of the proposed algorithm on challenging applications including Bayesian network learning and protein folding. Our numerical comparisons suggest that PISAA outperforms simulated annealing, stochastic approximation annealing, and annealing evolutionary stochastic approximation Monte Carlo, especially in high-dimensional or rugged scenarios. 
Parallel Augmented Maps (PAM) 
In this paper we introduce an interface for supporting ordered maps that are augmented to support quick ‘sums’ of values over ranges of the keys. We have implemented this interface as part of a C++ library called PAM (Parallel and Persistent Augmented Map library). This library supports a wide variety of functions on maps ranging from basic insertion and deletion to more interesting functions such as union, intersection, difference, filtering, extracting ranges, splitting, and range sums. The functions in the library are parallel, persistent (meaning that functions do not affect their inputs), and work-efficient. The underlying data structure is the augmented balanced binary search tree, which is a binary search tree in which each node is augmented with a value keeping the ‘sum’ of its subtree with respect to some user-supplied function. With this augmentation the library can be directly applied to many settings such as 2D range trees, interval trees, word index searching, and segment trees. The interface greatly simplifies the implementation of such data structures while it achieves efficiency that is significantly better than previous libraries. We tested our library and its corresponding applications. Experiments show that our implementation of set functions can get up to 50+ speedup on 72 cores. As for our range tree implementation, the sequential running time is more efficient than existing libraries such as CGAL, and can get up to 42+ speedup on 72 cores. 
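The augmentation and the persistence property can be illustrated with a toy tree. The sketch below is not the PAM C++ library or its interface: it is sequential, it does no rebalancing, the ‘sum’ is fixed to ordinary addition rather than a user-supplied function, and all names are hypothetical.

```python
from dataclasses import dataclass
from typing import Optional

@dataclass(frozen=True)
class Node:
    key: int
    value: int
    left: Optional["Node"]
    right: Optional["Node"]
    total: int  # augmented value: sum of all values stored in this subtree

def total(node):
    return node.total if node else 0

def make(key, value, left, right):
    return Node(key, value, left, right, value + total(left) + total(right))

def insert(node, key, value):
    """Return a new tree with key set to value; the input tree is unchanged."""
    if node is None:
        return make(key, value, None, None)
    if key < node.key:
        return make(node.key, node.value, insert(node.left, key, value), node.right)
    if key > node.key:
        return make(node.key, node.value, node.left, insert(node.right, key, value))
    return make(key, value, node.left, node.right)

def prefix_sum(node, key, inclusive):
    """Sum of values whose keys are below key (or equal to it, if inclusive)."""
    acc = 0
    while node:
        if key < node.key or (key == node.key and not inclusive):
            node = node.left
        else:
            # The node and its whole left subtree fall inside the prefix.
            acc += total(node.left) + node.value
            node = node.right
    return acc

def range_sum(node, lo, hi):
    """Sum of values with lo <= key <= hi, using only one path per bound."""
    return prefix_sum(node, hi, True) - prefix_sum(node, lo, False)

tree = None
for k, v in [(5, 1), (2, 2), (8, 3), (1, 4), (9, 5)]:
    tree = insert(tree, k, v)
newer = insert(tree, 3, 10)  # path copying: `tree` still describes the old map
print(range_sum(tree, 2, 8), range_sum(newer, 2, 8))
```

Because each node caches the sum of its subtree, a range query only walks two root-to-leaf paths instead of visiting every key in the range.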
Parallel Coordinates  Parallel coordinates is a common way of visualizing high-dimensional geometry and analyzing multivariate data. To show a set of points in an n-dimensional space, a backdrop is drawn consisting of n parallel lines, typically vertical and equally spaced. A point in n-dimensional space is represented as a polyline with vertices on the parallel axes; the position of the vertex on the i-th axis corresponds to the i-th coordinate of the point. This visualization is closely related to time series visualization, except that it is applied to data where the axes do not correspond to points in time, and therefore do not have a natural order. Therefore, different axis arrangements may be of interest. visova,MASS,Acinonyx 
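The mapping from points to polylines can be sketched without any plotting library. The helper below is hypothetical and assumes that each axis is scaled independently to the range [0, 1] (min-max scaling), which is a common choice but not part of the definition above.

```python
def to_polylines(points):
    """Turn each n-dimensional point into n (axis index, height) vertices."""
    n = len(points[0])
    lows = [min(p[i] for p in points) for i in range(n)]
    highs = [max(p[i] for p in points) for i in range(n)]
    lines = []
    for p in points:
        line = []
        for i in range(n):
            span = highs[i] - lows[i]
            # A constant axis has no spread; place its vertex at mid-height.
            height = (p[i] - lows[i]) / span if span else 0.5
            line.append((i, height))
        lines.append(line)
    return lines

points = [(1, 10, 100), (3, 20, 300), (2, 15, 100)]
for line in to_polylines(points):
    print(line)
```

Reordering the axes amounts to permuting the coordinates of every point before calling the helper.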
Parallel Data Assimilation Framework (PDAF) 
The Parallel Data Assimilation Framework (PDAF) is a software environment for ensemble data assimilation. PDAF simplifies the implementation of a data assimilation system with existing numerical models. With this, users can obtain a data assimilation system with less work and can focus on applying data assimilation. PDAF provides fully implemented and optimized data assimilation algorithms, in particular ensemble-based Kalman filters like LETKF and LSEIK. It allows users to easily test different assimilation algorithms and observations. PDAF is optimized for use with large-scale models that usually run on big parallel computers and is suitable for operational applications. However, it is also well suited for smaller models and even toy models. PDAF provides a standardized interface that separates the numerical model from the assimilation routines. This allows the assimilation methods and the model to be developed further independently of each other. New algorithmic developments can be readily made available through the interface such that they can be immediately applied with existing implementations. The test suite of PDAF provides small models for easy testing of algorithmic developments and for teaching data assimilation. PDAF is an open-source project. Its functionality will be further extended by input from research projects. In addition, users are welcome to contribute to the further enhancement of PDAF, e.g. by contributing additional assimilation methods or interface routines for different numerical models. 
Parallel External Memory (PEM) 
In this paper, we study parallel algorithms for private-cache chip multiprocessors (CMPs), focusing on methods for foundational problems that can scale to hundreds or even thousands of cores. By focusing on private-cache CMPs, we show that we can design efficient algorithms that need no additional assumptions about the way that cores are interconnected, for we assume that all inter-processor communication occurs through the memory hierarchy. We study several fundamental problems, including prefix sums, selection, and sorting, which often form the building blocks of other parallel algorithms. Indeed, we present two sorting algorithms, a distribution sort and a mergesort. All algorithms in the paper are asymptotically optimal in terms of the parallel cache accesses and space complexity under reasonable assumptions about the relationships between the number of processors, the size of memory, and the size of cache blocks. In addition, we study sorting lower bounds in a computational model, which we call the parallel external-memory (PEM) model, that formalizes the essential properties of our algorithms for private-cache chip multiprocessors. 
Parallel External Memory Algorithm (PEMA) 

Parallel Grid Pooling (PGP) 
Convolutional neural network (CNN) architectures utilize downsampling layers, which restrict the subsequent layers to learn spatially invariant features while reducing computational costs. However, such a downsampling operation makes it impossible to use the full spectrum of input features. Motivated by this observation, we propose a novel layer called parallel grid pooling (PGP) which is applicable to various CNN models. PGP performs downsampling without discarding any intermediate feature. It works as data augmentation and is complementary to commonly used data augmentation techniques. Furthermore, we demonstrate that a dilated convolution can naturally be represented using PGP operations, which suggests that the dilated convolution can also be regarded as a type of data augmentation technique. Experimental results based on popular image classification benchmarks demonstrate the effectiveness of the proposed method. Code is available at: https://…/akitotakeki 
Parallel Pareto Local Search based on Decomposition (PPLS/D) 
Pareto Local Search (PLS) is a basic building block in many multiobjective metaheuristics. In this paper, Parallel Pareto Local Search based on Decomposition (PPLS/D) is proposed. PPLS/D decomposes the original search space into L subregions and executes L parallel search processes in these subregions simultaneously. Inside each subregion, the PPLS/D process is first guided by a scalar objective function to approach the Pareto set quickly, then it finds non-dominated solutions in this subregion. Our experimental studies on the multiobjective Unconstrained Binary Quadratic Programming problems (mUBQPs) with two to four objectives demonstrate the efficiency of PPLS/D. We investigate the behavior of PPLS/D to understand its working mechanism. Moreover, we propose a variant of PPLS/D called PPLS/D with Adaptive Expansion (PPLS/D-AE), in which each process can search other subregions after it converges in its own subregion. Its advantages and disadvantages have been studied. 
Parallel Predictive Entropy Search (PPES) 
We develop parallel predictive entropy search (PPES), a novel algorithm for Bayesian optimization of expensive black-box objective functions. At each iteration, PPES aims to select a batch of points which will maximize the information gain about the global maximizer of the objective. Well-known strategies exist for suggesting a single evaluation point based on previous observations, while far fewer are known for selecting batches of points to evaluate in parallel. The few batch selection schemes that have been studied all resort to greedy methods to compute an optimal batch. To the best of our knowledge, PPES is the first non-greedy batch Bayesian optimization strategy. We demonstrate the benefit of this approach in optimization performance on both synthetic and real-world applications, including problems in machine learning, rocket science and robotics. 
Parallel Sets (ParSets) 
Parallel Sets (ParSets) is a visualization application for categorical data, like census and survey data, inventory, and many other kinds of data that can be summed up in a cross-tabulation. ParSets provide a simple, interactive way to explore and analyze such data. 
Parameter Hub (PHub) 
Most work in the deep learning systems community has focused on faster inference, but arriving at a trained model requires lengthy experiments. Accelerating training lets developers iterate faster and come up with better models. DNN training is often seen as a compute-bound problem, best done in a single large compute node with many GPUs. As DNNs get bigger, training requires going distributed. Distributed deep neural network (DDNN) training constitutes an important workload on the cloud. Larger DNN models and faster compute engines shift the training performance bottleneck from computation to communication. Our experiments show that existing DNN training frameworks do not scale in a typical cloud environment due to insufficient bandwidth and inefficient parameter server software stacks. We propose PHub, a high-performance parameter server (PS) software design that provides an optimized network stack and a streamlined gradient processing pipeline to benefit common PS setups, and PBox, a balanced, scalable central PS hardware that fully utilizes PHub capabilities. We show that in a typical cloud environment, PHub can achieve up to 3.8x speedup over state-of-the-art designs when training ImageNet. We discuss future directions of integrating PHub with programmable switches for in-network aggregation during training, leveraging the datacenter network topology to reduce bandwidth usage and localize data movement. 
Parameter Selection and Model Evaluation (PSME) 

Parameter Transfer Unit (PTU) 
Parameters in deep neural networks which are trained on large-scale databases can generalize across multiple domains, which is referred to as ‘transferability’. Unfortunately, transferability is usually defined as discrete states and it differs with domains and network architectures. Existing works usually apply parameter-sharing or fine-tuning heuristically, and there is no principled approach to learning a parameter transfer strategy. To address the gap, a parameter transfer unit (PTU) is proposed in this paper. The PTU learns a fine-grained nonlinear combination of activations from both the source and the target domain networks, and subsumes hand-crafted discrete transfer states. In the PTU, the transferability is controlled by two gates which are artificial neurons and can be learned from data. The PTU is a general and flexible module which can be used in both CNNs and RNNs. Experiments are conducted with various network architectures and multiple transfer domain pairs. Results demonstrate the effectiveness of the PTU as it outperforms heuristic parameter-sharing and fine-tuning in most settings. 
Parameter-Free Online Learning  We introduce an efficient algorithmic framework for model selection in online learning, also known as parameter-free online learning. Departing from previous work, which has focused on highly structured function classes such as nested balls in Hilbert space, we propose a generic meta-algorithm framework that achieves online model selection oracle inequalities under minimal structural assumptions. We give the first computationally efficient parameter-free algorithms that work in arbitrary Banach spaces under mild smoothness assumptions; previous results applied only to Hilbert spaces. We further derive new oracle inequalities for matrix classes, non-nested convex sets, and $\mathbb{R}^{d}$ with generic regularizers. Finally, we generalize these results by providing oracle inequalities for arbitrary non-linear classes in the online supervised learning model. These results are all derived through a unified meta-algorithm scheme using a novel ‘multi-scale’ algorithm for prediction with expert advice based on random playout, which may be of independent interest. 
PArameterized Clipping acTivation (PACT) 
Deep learning algorithms achieve high classification accuracy at the expense of significant computation cost. To address this cost, a number of quantization schemes have been proposed, but most of these techniques focus on quantizing weights, which are relatively smaller in size compared to activations. This paper proposes a novel quantization scheme for activations during training that enables neural networks to work well with ultra-low-precision weights and activations without any significant accuracy degradation. This technique, PArameterized Clipping acTivation (PACT), uses an activation clipping parameter $\alpha$ that is optimized during training to find the right quantization scale. PACT allows quantizing activations to arbitrary bit precisions, while achieving much better accuracy relative to published state-of-the-art quantization schemes. We show, for the first time, that both weights and activations can be quantized to 4 bits of precision while still achieving accuracy comparable to full-precision networks across a range of popular models and datasets. We also show that exploiting these reduced-precision computational units in hardware can enable a super-linear improvement in inferencing performance due to a significant reduction in the area of accelerator compute engines coupled with the ability to retain the quantized model and activation data in on-chip memories. 
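The role of the clipping parameter can be sketched for a single activation value. The function below assumes that the activation is clipped to the range [0, α] and then rounded onto a uniform grid with 2^k - 1 steps for k bits; the exact form used in the paper, and the way α is updated during training, are not taken from the text above and are not shown here.

```python
def pact_quantize(x, alpha, bits):
    """Clip x to [0, alpha], then round it to one of 2**bits uniform levels.

    Assumption: a uniform quantization grid whose scale is set by alpha.
    """
    clipped = min(max(x, 0.0), alpha)
    steps = 2**bits - 1
    return round(clipped * steps / alpha) * alpha / steps

# With alpha = 6.0 and 2 bits the representable values are 0, 2, 4 and 6.
for x in (-1.0, 0.7, 2.9, 4.2, 10.0):
    print(x, pact_quantize(x, alpha=6.0, bits=2))
```

A smaller α gives a finer grid but clips more large activations, which is the trade-off that a learned α is meant to balance.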
Parametric Gaussian Processes (PGP) 
This work introduces the concept of parametric Gaussian processes (PGPs), which is built upon the seemingly self-contradictory idea of making Gaussian processes parametric. Parametric Gaussian processes, by construction, are designed to operate in ‘big data’ regimes where one is interested in quantifying the uncertainty associated with noisy data. The proposed methodology circumvents the well-established need for stochastic variational inference, a scalable algorithm for approximating posterior distributions. The effectiveness of the proposed approach is demonstrated using an illustrative example with simulated data and a benchmark dataset in the airline industry with approximately 6 million records. 
Parametric Model  In statistics, a parametric model or parametric family or finite-dimensional model is a family of distributions that can be described using a finite number of parameters. These parameters are usually collected together to form a single k-dimensional parameter vector θ = (θ1, θ2, …, θk). Parametric models are contrasted with the semi-parametric, semi-nonparametric, and non-parametric models, all of which consist of an infinite set of ‘parameters’ for description. The distinction between these four classes is as follows: · in a ‘parametric’ model all the parameters are in finite-dimensional parameter spaces; · a model is ‘non-parametric’ if all the parameters are in infinite-dimensional parameter spaces; · a ‘semi-parametric’ model contains finite-dimensional parameters of interest and infinite-dimensional nuisance parameters; · a ‘semi-nonparametric’ model has both finite-dimensional and infinite-dimensional unknown parameters of interest. Some statisticians believe that the concepts ‘parametric’, ‘non-parametric’, and ‘semi-parametric’ are ambiguous. It can also be noted that the set of all probability measures has the cardinality of the continuum, and therefore it is possible to parametrize any model at all by a single number in the interval (0,1). This difficulty can be avoided by considering only ‘smooth’ parametric models. 
Parametric Portfolio Policies (PPP) 
We propose a novel approach to optimizing portfolios with large numbers of assets. We model directly the portfolio weight in each asset as a function of the asset’s characteristics. The coefficients of this function are found by optimizing the investor’s average utility of the portfolio’s return over the sample period. Our approach is computationally simple, is easily modified and extended (to capture the effect of transaction costs, for example), produces sensible portfolio weights, and offers robust performance in and out of sample. In contrast, the traditional approach of first modeling the joint distribution of returns and then solving for the corresponding optimal portfolio weights is not only difficult to implement for a large number of assets but also yields notoriously noisy and unstable results. We present an empirical implementation for the universe of all stocks in the CRSP-Compustat data set, exploiting the size, value, and momentum anomalies. 
Parametric Rectified Linear Unit (PReLU) 
Rectified activation units (rectifiers) are essential for state-of-the-art neural networks. In this work, we study rectifier neural networks for image classification from two aspects. First, we propose a Parametric Rectified Linear Unit (PReLU) that generalizes the traditional rectified unit. PReLU improves model fitting with nearly zero extra computational cost and little overfitting risk. Second, we derive a robust initialization method that particularly considers the rectifier nonlinearities. This method enables us to train extremely deep rectified models directly from scratch and to investigate deeper or wider network architectures. Based on our PReLU networks (PReLU-nets), we achieve 4.94% top-5 test error on the ImageNet 2012 classification dataset. This is a 26% relative improvement over the ILSVRC 2014 winner (GoogLeNet, 6.66%). To our knowledge, our result is the first to surpass human-level performance (5.1%) on this visual recognition challenge. 
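The abstract above does not spell out the unit itself. As it is usually stated, PReLU passes positive inputs through unchanged and multiplies negative inputs by a learnable slope a. The sketch below is a minimal scalar illustration under that assumption; the function names and the slope value 0.25 are illustrative choices, not taken from the entry.

```python
def prelu(x, a):
    """PReLU: identity for positive inputs, slope `a` for non-positive inputs."""
    return x if x > 0 else a * x


def prelu_grad_a(x):
    """Derivative of the output with respect to the slope `a`.

    It is zero on the positive side, which is why only negative inputs
    contribute to learning `a`.
    """
    return 0.0 if x > 0 else x


values = [-2.0, -0.5, 0.0, 1.5]
print([prelu(v, 0.25) for v in values])  # negative inputs are scaled by 0.25
print([prelu(v, 0.0) for v in values])   # a = 0 behaves like the ordinary ReLU
```

Setting a to a small fixed constant instead of learning it would give the leaky ReLU, so the learnable slope is the only extra parameter involved.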
Paranom  In this paper, we present Paranom, a parallel anomaly dataset generator. We discuss its design and provide brief experimental results demonstrating its usefulness in improving the classification correctness of LSTM-AD, a state-of-the-art anomaly detection model. 
Pareto Depth Analysis (PDA) 
We consider the problem of identifying patterns in a data set that exhibit anomalous behavior, often referred to as anomaly detection. Similarity-based anomaly detection algorithms detect abnormally large amounts of similarity or dissimilarity, e.g. as measured by nearest neighbor Euclidean distances between a test sample and the training samples. In many application domains there may not exist a single dissimilarity measure that captures all possible anomalous patterns. In such cases, multiple dissimilarity measures can be defined, including non-metric measures, and one can test for anomalies by scalarizing using a non-negative linear combination of them. If the relative importance of the different dissimilarity measures is not known in advance, as in many anomaly detection applications, the anomaly detection algorithm may need to be executed multiple times with different choices of weights in the linear combination. In this paper, we propose a method for similarity-based anomaly detection using a novel multi-criteria dissimilarity measure, the Pareto depth. The proposed Pareto depth analysis (PDA) anomaly detection algorithm uses the concept of Pareto optimality to detect anomalies under multiple criteria without having to run an algorithm multiple times with different choices of weights. The proposed PDA approach is provably better than using linear combinations of the criteria and shows superior performance on experiments with synthetic and real data sets. 
ParetoSmoothed Importance Sampling (PSIS) 
While it’s always possible to compute a variational approximation to a posterior distribution, it can be difficult to discover problems with this approximation. We propose two diagnostic algorithms to alleviate this problem. The Pareto-smoothed importance sampling (PSIS) diagnostic gives a goodness-of-fit measurement for joint distributions, while simultaneously improving the error in the estimate. The variational simulation-based calibration (VSBC) assesses the average performance of point estimates. 
ParlAI  We introduce ParlAI (pronounced ‘parlay’), an opensource software platform for dialog research implemented in Python, available at http://parl.ai. Its goal is to provide a unified framework for training and testing of dialog models, including multitask training, and integration of Amazon Mechanical Turk for data collection, human evaluation, and online/reinforcement learning. Over 20 tasks are supported in the first release, including popular datasets such as SQuAD, bAbI tasks, MCTest, WikiQA, QACNN, QADailyMail, CBT, bAbI Dialog, Ubuntu, OpenSubtitles and VQA. Included are examples of training neural models with PyTorch and Lua Torch, including both batch and hogwild training of memory networks and attentive LSTMs. 
Parle  We propose a new algorithm called Parle for parallel training of deep networks that converges 2-4x faster than a data-parallel implementation of SGD, while achieving significantly improved error rates that are nearly state-of-the-art on several benchmarks including CIFAR-10 and CIFAR-100, without introducing any additional hyper-parameters. We exploit the phenomenon of flat minima that has been shown to lead to improved generalization error for deep networks. Parle requires very infrequent communication with the parameter server and instead performs more computation on each client, which makes it well-suited to both single-machine, multi-GPU settings and distributed implementations. 
Parrondo’s Paradox  Parrondo’s paradox, a paradox in game theory, has been described as: A combination of losing strategies becomes a winning strategy. It is named after its creator, Juan Parrondo, who discovered the paradox in 1996. A more explanatory description is: ‘There exist pairs of games, each with a higher probability of losing than winning, for which it is possible to construct a winning strategy by playing the games alternately.’ Parrondo devised the paradox in connection with his analysis of the Brownian ratchet, a thought experiment about a machine that can purportedly extract energy from random heat motions popularized by physicist Richard Feynman. However, the paradox disappears when rigorously analyzed. 
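The quoted description can be checked numerically with the coin-flipping games commonly used to illustrate the paradox. The sketch below assumes those standard games, which are not given in the entry itself: game A wins with probability 1/2 − ε; game B wins with probability 1/10 − ε when the capital is a multiple of 3 and 3/4 − ε otherwise; the combined game picks A or B by a fair coin each round. Instead of simulating, it computes the long-run expected gain per round from the stationary distribution of the capital modulo 3.

```python
def stationary(win_probs, iterations=10_000):
    """Stationary distribution of (capital mod 3).

    win_probs[i] is the win probability when capital mod 3 == i;
    a win moves the capital up by 1, a loss moves it down by 1.
    """
    dist = [1 / 3, 1 / 3, 1 / 3]
    for _ in range(iterations):
        new = [0.0, 0.0, 0.0]
        for state, mass in enumerate(dist):
            p = win_probs[state]
            new[(state + 1) % 3] += mass * p
            new[(state - 1) % 3] += mass * (1 - p)
        dist = new
    return dist


def expected_gain(win_probs):
    """Long-run expected change in capital per round."""
    dist = stationary(win_probs)
    return sum(mass * (2 * p - 1) for mass, p in zip(dist, win_probs))


EPS = 0.005  # small bias that makes each game losing on its own (assumed value)
game_a = [0.5 - EPS] * 3
game_b = [0.1 - EPS, 0.75 - EPS, 0.75 - EPS]
mixed = [(a + b) / 2 for a, b in zip(game_a, game_b)]  # fair coin picks A or B

for name, game in [("A", game_a), ("B", game_b), ("A/B mixed", mixed)]:
    print(name, expected_gain(game))
```

Under these assumptions both games have negative expected gain on their own while the mixture has positive expected gain. The effect comes from game B's dependence on the capital: mixing in game A changes how often the unfavorable "multiple of 3" state is visited.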
Parseval Networks  We introduce Parseval networks, a form of deep neural networks in which the Lipschitz constant of linear, convolutional and aggregation layers is constrained to be smaller than 1. Parseval networks are empirically and theoretically motivated by an analysis of the robustness of the predictions made by deep neural networks when their input is subject to an adversarial perturbation. The most important feature of Parseval networks is to maintain weight matrices of linear and convolutional layers to be (approximately) Parseval tight frames, which are extensions of orthogonal matrices to non-square matrices. We describe how these constraints can be maintained efficiently during SGD. We show that Parseval networks match the state-of-the-art in terms of accuracy on CIFAR-10/100 and Street View House Numbers (SVHN) while being more robust than their vanilla counterpart against adversarial examples. Incidentally, Parseval networks also tend to train faster and make better use of the full capacity of the networks. 
Parsimonious Adaptive Rejection Sampling  Monte Carlo (MC) methods have become very popular in signal processing during the past decades. The adaptive rejection sampling (ARS) algorithms are well-known MC techniques which efficiently draw independent samples from univariate target densities. The ARS schemes yield a sequence of proposal functions that converge toward the target, so that the probability of accepting a sample approaches one. However, sampling from the proposal pdf becomes more computationally demanding each time it is updated. We propose the Parsimonious Adaptive Rejection Sampling (PARS) method, where an efficient trade-off between acceptance rate and proposal complexity is obtained. Thus, the resulting algorithm is faster than the standard ARS approach. 
Parsimonious Bayesian Deep Network  Combining Bayesian nonparametrics and a forward model selection strategy, we construct parsimonious Bayesian deep networks (PBDNs) that infer capacity-regularized network architectures from the data and require neither cross-validation nor fine-tuning when training the model. One of the two essential components of a PBDN is the development of a special infinite-wide single-hidden-layer neural network, whose number of active hidden units can be inferred from the data. The other one is the construction of a greedy layer-wise learning algorithm that uses a forward model selection criterion to determine when to stop adding another hidden layer. We develop both Gibbs sampling and stochastic gradient descent based maximum a posteriori inference for PBDNs, providing state-of-the-art classification accuracy and interpretable data subtypes near the decision boundaries, while maintaining low computational complexity for out-of-sample prediction. 
Parsimonious Gaussian Mixture Models  McNicholas and Murphy (2008) <doi:10.1007/s1122200890560>, McNicholas (2010) <doi:10.1016/j.jspi.2009.11.006>, McNicholas and Murphy (2010) <doi:10.1093/bioinformatics/btq498>. pgmm 
Parsimonious Learning Machine (PALM) 
Data streams have been an underlying challenge in the age of big data because they call for real-time data processing in the absence of a retraining process and/or an iterative learning approach. In the fuzzy systems community, data streams are handled by self-adaptive neuro-fuzzy systems (SANFS), characterized by a single-pass learning mode and an open structure property which enables effective handling of the fast and rapidly changing nature of data streams. The underlying bottleneck of SANFSs lies in their design principle, which involves a high number of free parameters (rule premise and rule consequent) to be adapted in the training process. This figure can even double in the case of a type-2 fuzzy system. In this work, a novel SANFS, namely the parsimonious learning machine (PALM), is proposed. PALM uses a new type of fuzzy rule based on the concept of hyperplane clustering, which significantly reduces the number of network parameters because it has no rule premise parameters. PALM is proposed for both type-1 and type-2 fuzzy systems, both of which are fully dynamic rule-based systems. That is, it is capable of automatically generating, merging and tuning the hyperplane-based fuzzy rules in a single-pass manner. The efficacy of PALM has been evaluated through a numerical study with six real-world and synthetic data streams from public databases and our own real-world project on autonomous vehicles. The proposed model shows significant improvements in terms of computational complexity and number of required parameters against several renowned SANFSs, while attaining comparable and often better predictive accuracy. 
Part of Speech (POS) 
A part of speech is a category of words (or, more generally, of lexical items) which have similar grammatical properties. Words that are assigned to the same part of speech generally display similar behavior in terms of syntax – they play similar roles within the grammatical structure of sentences – and sometimes in terms of morphology, in that they undergo inflection for similar properties. Commonly listed English parts of speech are noun, verb, adjective, adverb, pronoun, preposition, conjunction, interjection, and sometimes article or determiner. A part of speech – particularly in more modern classifications, which often make more precise distinctions than the traditional scheme does – may also be called a word class, lexical class, or lexical category, although the term lexical category refers in some contexts to a particular type of syntactic category, and may thus exclude parts of speech that are considered to be functional, such as pronouns. The term form class is also used, although this has various conflicting definitions. Word classes may be classified as open or closed: open classes (like nouns, verbs and adjectives) acquire new members constantly, while closed classes (such as pronouns and conjunctions) acquire new members infrequently, if at all. Almost all languages have the word classes noun and verb, but beyond these there are significant variations in different languages. For example, Japanese has as many as three classes of adjectives where English has one; Chinese, Korean and Japanese have a class of nominal classifiers; many languages lack a distinction between adjectives and adverbs, or between adjectives and verbs. This variation in the number of categories and their identifying properties means that analysis needs to be done for each individual language. Nevertheless, the labels for each category are assigned on the basis of universal criteria. http://…/Stative_verb 
Part of Speech Tagging (POST) 
In corpus linguistics, part-of-speech tagging (POS tagging or POST), also called grammatical tagging or word-category disambiguation, is the process of marking up a word in a text (corpus) as corresponding to a particular part of speech, based on both its definition and its context, i.e. its relationship with adjacent and related words in a phrase, sentence, or paragraph. A simplified form of this is commonly taught to school-age children, in the identification of words as nouns, verbs, adjectives, adverbs, etc. Once performed by hand, POS tagging is now done in the context of computational linguistics, using algorithms which associate discrete terms, as well as hidden parts of speech, in accordance with a set of descriptive tags. POS-tagging algorithms fall into two distinctive groups: rule-based and stochastic. E. Brill’s tagger, one of the first and most widely used English POS-taggers, employs rule-based algorithms. 
Partial Area Under a Receiver Operating Characteristic (pAUC) 
We propose a method for maximizing a partial area under a receiver operating characteristic (ROC) curve (pAUC) for binary classification tasks. In binary classification tasks, accuracy is the most commonly used measure of classifier performance. In some applications such as anomaly detection and diagnostic testing, accuracy is not an appropriate measure since prior probabilities are often greatly biased. Although in such cases the pAUC has been utilized as a performance measure, few methods have been proposed for directly maximizing the pAUC. This optimization is achieved by using a scoring function. The conventional approach utilizes a linear function as the scoring function. In contrast, we introduce nonlinear scoring functions for this purpose. Specifically, we present two types of nonlinear scoring functions based on generative models and deep neural networks. We show experimentally that nonlinear scoring functions improve on the conventional methods through an application to binary classification of real and bogus objects obtained with the Hyper Suprime-Cam on the Subaru telescope. 
Partial AutoCorrelation Function (PACF) 
In time series analysis, the partial autocorrelation function (PACF) plays an important role in data analyses aimed at identifying the extent of the lag in an autoregressive model. The use of this function was introduced as part of the Box-Jenkins approach to time series modelling, whereby plotting the partial autocorrelation function lets one determine the appropriate number of lags p in an AR(p) model or in an extended ARIMA(p,d,q) model. 
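To make the lag-selection idea concrete, the following sketch estimates the PACF from sample autocorrelations with the Durbin-Levinson recursion. That recursion is one common estimator and is an assumption here, as the entry does not name one; the function names are illustrative.

```python
def autocorrelations(series, max_lag):
    """Sample autocorrelations r[0..max_lag] of a list of floats."""
    n = len(series)
    mean = sum(series) / n
    centered = [v - mean for v in series]
    c0 = sum(v * v for v in centered)
    return [
        sum(centered[t] * centered[t - k] for t in range(k, n)) / c0
        for k in range(max_lag + 1)
    ]


def pacf(series, max_lag):
    """Partial autocorrelations at lags 0..max_lag (lag 0 is 1 by convention)."""
    r = autocorrelations(series, max_lag)
    result = [1.0]
    prev = []  # prev[j] holds the AR coefficient phi_{k-1, j+1}
    for k in range(1, max_lag + 1):
        num = r[k] - sum(prev[j] * r[k - 1 - j] for j in range(k - 1))
        den = 1.0 - sum(prev[j] * r[j + 1] for j in range(k - 1))
        phi_kk = num / den
        # Update the lower-order coefficients, then append the new one.
        prev = [prev[j] - phi_kk * prev[k - 2 - j] for j in range(k - 1)] + [phi_kk]
        result.append(phi_kk)
    return result
```

For data generated by an AR(p) process, the estimated values should be close to zero beyond lag p, which is the cutoff pattern read off the plot when choosing p.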
Partial Decision-DNNF  Model counting is the problem of computing the number of satisfying assignments of a given propositional formula. Although exact model counters can be naturally furnished by most of the knowledge compilation (KC) methods, in practice they fail to generate the compiled results needed for exact model counting for certain formulas, due to the explosion in size. Decision-DNNF is an important KC language that captures most of the practical compilers. We propose a generalized Decision-DNNF (referred to as partial Decision-DNNF) by introducing a class of new leaf vertices (called unknown vertices), and then propose an algorithm called PartialKC to randomly generate partial Decision-DNNF formulas from the given formulas. An unbiased estimate of the model count can be computed from a random partial Decision-DNNF formula. Each call of PartialKC consists of multiple calls of MicroKC, and each of the latter calls is a process of importance sampling equipped with KC technologies. The experimental results show that PartialKC is more accurate than both SampleSearch and SearchTreeSampler, that PartialKC scales better than SearchTreeSampler, and that the KC technologies can clearly accelerate sampling. 
Partial Dependency Plots  Partial dependence plots show the dependence between the target function and a set of ‘target’ features, marginalizing over the values of all other features (the complement features). Due to the limits of human perception, the size of the target feature set must be small (usually one or two); thus the target features are usually chosen among the most important features. randomForest 
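The marginalization described above can be sketched for a single target feature: fix that feature at each grid value in every data row, evaluate the model, and average. The names `partial_dependence` and `toy_model` and the tiny data set are hypothetical, chosen only for illustration; any callable that maps a feature list to a number could stand in for the model.

```python
def partial_dependence(model, rows, feature, grid):
    """Average model output with `feature` forced to each value in `grid`."""
    curve = []
    for value in grid:
        total = 0.0
        for row in rows:
            modified = list(row)       # copy so the data set is left untouched
            modified[feature] = value  # overwrite only the target feature
            total += model(modified)
        curve.append(total / len(rows))
    return curve


def toy_model(x):
    """Hypothetical fitted model: linear in x[0], quadratic in x[1]."""
    return 2.0 * x[0] + x[1] * x[1]


rows = [[0.0, 1.0], [1.0, 2.0], [2.0, 3.0]]
curve = partial_dependence(toy_model, rows, feature=0, grid=[0.0, 1.0, 2.0])
print(curve)
```

Plotting `curve` against the grid gives the partial dependence plot for feature 0; for this toy model it is a straight line with slope 2, shifted by the average of x[1] squared.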
Partial Least Squares Regression (PLS) 
Partial least squares regression (PLS regression) is a statistical method that bears some relation to principal components regression; instead of finding hyperplanes of maximum variance between the response and independent variables, it finds a linear regression model by projecting the predicted variables and the observable variables to a new space. Because both the X and Y data are projected to new spaces, the PLS family of methods are known as bilinear factor models. Partial least squares Discriminant Analysis (PLS-DA) is a variant used when the Y is categorical. PLS think twice about partial least squares 
Partial Membership Latent Dirichlet Allocation (PMLDA) 
For many years, topic models (e.g., pLSA, LDA, SLDA) have been widely used for segmenting and recognizing objects in imagery simultaneously. However, these models are confined to the analysis of categorical data, forcing a visual word to belong to one and only one topic. There are many images in which some regions cannot be assigned a crisp categorical label (e.g., transition regions between a foggy sky and the ground or between sand and water at a beach). In these cases, a visual word is best represented with partial memberships across multiple topics. To address this, we present a partial membership latent Dirichlet allocation (PMLDA) model and associated parameter estimation algorithms. PMLDA defines a novel partial membership model for word and document generation. We employ Gibbs sampling for parameter estimation. Experimental results on two natural image datasets and one SONAR image dataset show that PMLDA can produce both crisp and soft semantic image segmentations; a capability existing methods do not have. 
Partial Robust M Regression  If an appropriate weighting scheme is chosen, partial M-estimators become entirely robust to any type of outlying points, and are called Partial Robust M-estimators. It is shown that partial robust M-regression outperforms existing methods for robust PLS regression in terms of statistical precision and computational speed, while keeping good robustness properties. sprm 
Partial Transfer Learning  Adversarial learning has been successfully embedded into deep networks to learn transferable features, which reduce distribution discrepancy between the source and target domains. Existing domain adversarial networks assume a fully shared label space across domains. In the presence of big data, there is strong motivation to transfer both classification and representation models from existing big domains to unknown small domains. This paper introduces partial transfer learning, which relaxes the shared label space assumption so that the target label space is only a subspace of the source label space. Previous methods typically match the whole source domain to the target domain, which makes them prone to negative transfer in the partial transfer problem. We present Selective Adversarial Network (SAN), which simultaneously circumvents negative transfer by selecting out the outlier source classes and promotes positive transfer by maximally matching the data distributions in the shared label space. Experiments demonstrate that our models exceed state-of-the-art results for partial transfer learning tasks on several benchmark datasets. 
Partially Adaptive Momentum Estimation Method (Padam) 
Adaptive gradient methods, which adopt historical gradient information to automatically adjust the learning rate, have been observed to generalize worse than stochastic gradient descent (SGD) with momentum in training deep neural networks. This leaves how to close the generalization gap of adaptive gradient methods an open problem. In this work, we show that adaptive gradient methods such as Adam, Amsgrad, are sometimes ‘over adapted’. We design a new algorithm, called Partially adaptive momentum estimation method (Padam), which unifies the Adam/Amsgrad with SGD to achieve the best from both worlds. Experiments on standard benchmarks show that Padam can maintain fast convergence rate as Adam/Amsgrad while generalizing as well as SGD in training deep neural networks. These results would suggest practitioners pick up adaptive gradient methods once again for faster training of deep neural networks. 
Partially Linear Additive Quantile Regression  plaqr 
Partially Linear Spatial Probit Model  A partially linear probit model for spatially dependent data is considered. A triangular array setting is used to cover various patterns of spatial data. Conditional spatial heteroscedasticity and nonidentically distributed observations and a linear process for disturbances are assumed, allowing various spatial dependencies. The estimation procedure is a combination of a weighted likelihood and a generalized method of moments. The procedure first fixes the parametric components of the model and then estimates the nonparametric part using weighted likelihood; the obtained estimate is then used to construct a GMM parametric component estimate. The consistency and asymptotic distribution of the estimators are established under sufficient conditions. Some simulation experiments are provided to investigate the finite sample performance of the estimators. 
Partially Observable Markov Decision Process  A partially observable Markov decision process (POMDP) is a generalization of a Markov decision process (MDP). A POMDP models an agent decision process in which it is assumed that the system dynamics are determined by an MDP, but the agent cannot directly observe the underlying state. Instead, it must maintain a probability distribution over the set of possible states, based on a set of observations and observation probabilities, and the underlying MDP. The POMDP framework is general enough to model a variety of real-world sequential decision processes. Applications include robot navigation problems, machine maintenance, and planning under uncertainty in general. The framework originated in the operations research community, and was later adapted by the artificial intelligence and automated planning communities. An exact solution to a POMDP yields the optimal action for each possible belief over the world states. The optimal action maximizes (or minimizes) the expected reward (or cost) of the agent over a possibly infinite horizon. The sequence of optimal actions is known as the optimal policy of the agent for interacting with its environment. 
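The probability distribution the agent maintains is usually called its belief. A minimal sketch of one belief update for a fixed action follows, using the standard Bayes filter: predict with the transition probabilities, reweight by the observation probabilities, then normalize. The matrix layout, the function name, and the two-state "tiger" style numbers are assumptions made for illustration.

```python
def belief_update(belief, transition, observation_prob, obs):
    """One Bayes-filter step for a fixed action.

    belief[s]                -- current probability of state s
    transition[s][s2]        -- P(next state s2 | state s, action)
    observation_prob[s2][o]  -- P(observation o | next state s2, action)
    obs                      -- index of the observation actually received
    """
    n = len(belief)
    # Prediction: push the belief through the transition model.
    predicted = [
        sum(transition[s][s2] * belief[s] for s in range(n)) for s2 in range(n)
    ]
    # Correction: weight each state by how well it explains the observation.
    unnormalized = [observation_prob[s2][obs] * predicted[s2] for s2 in range(n)]
    total = sum(unnormalized)
    if total == 0.0:
        raise ValueError("observation has zero probability under this belief")
    return [u / total for u in unnormalized]


# Two hidden states (tiger behind left or right door); "listen" leaves the
# state unchanged and reports the correct side with probability 0.85.
listen_transition = [[1.0, 0.0], [0.0, 1.0]]
listen_observation = [[0.85, 0.15], [0.15, 0.85]]

belief = [0.5, 0.5]
belief = belief_update(belief, listen_transition, listen_observation, obs=0)
print(belief)
```

A policy for a POMDP maps such belief vectors, rather than states, to actions, which is why the exact solution is defined over every possible belief.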
Partially Observed Markov Decision Process (POMDP) 
➘ “Partially Observable Markov Decision Process” 
Partially Observed Markov Process (POMP) 
➚ “Hidden Markov Model” pomp 
Participatory Sensing  Participatory Sensing is the concept of communities (or other groups of people) contributing sensory information to form a body of knowledge. The growth in mobile devices with multiple sensors, such as the iPhone, has made participatory sensing viable on a large scale. Participatory sensing can be used to retrieve information about the environment, weather, congestion as well as any other sensory information that collectively forms knowledge. Such open communication systems could pose challenges to the veracity of transmitted information. Individual sensors may require a trusted platform or hierarchical trust structures. Additional challenges include, but are not limited to, effective incentives for participation, security, reputation and privacy. 
Particle Filter  Particle filters or Sequential Monte Carlo (SMC) methods are a set of on-line posterior density estimation algorithms that estimate the posterior density of the state-space by directly implementing the Bayesian recursion equations. The term ‘sequential Monte Carlo’ was first coined in Liu and Chen (1998). SMC methods use a sampling approach, with a set of particles to represent the posterior density. The state-space model can be non-linear and the initial state and noise distributions can take any form required. SMC methods provide a well-established methodology for generating samples from the required distribution without requiring assumptions about the state-space model or the state distributions. However, these methods do not perform well when applied to high-dimensional systems. SMC methods implement the Bayesian recursion equations directly by using an ensemble-based approach. The samples from the distribution are represented by a set of particles; each particle has a weight assigned to it that represents the probability of that particle being sampled from the probability density function. Weight disparity leading to weight collapse is a common issue encountered in these filtering algorithms; however, it can be mitigated by including a resampling step before the weights become too uneven. In the resampling step, the particles with negligible weights are replaced by new particles in the proximity of the particles with higher weights. http://…/particlefiltering.pdf 
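The predict, weight, and resample cycle described above can be sketched as a bootstrap particle filter for a one-dimensional random-walk state observed with Gaussian noise. The model, the noise levels, the particle count, and the rule "resample when the effective sample size drops below half the particles" are all assumptions chosen for illustration, not details from the entry.

```python
import math
import random


def simulate(steps, process_sd=0.1, obs_sd=0.5, rng=None):
    """Generate hidden states (random walk) and noisy observations of them."""
    rng = rng or random.Random(0)
    x, states, observations = 0.0, [], []
    for _ in range(steps):
        x += rng.gauss(0.0, process_sd)
        states.append(x)
        observations.append(x + rng.gauss(0.0, obs_sd))
    return states, observations


def particle_filter(observations, n_particles=500, process_sd=0.1,
                    obs_sd=0.5, rng=None):
    """Return the weighted-mean state estimate after each observation."""
    rng = rng or random.Random(0)
    uniform = [1.0 / n_particles] * n_particles
    particles = [rng.gauss(0.0, 1.0) for _ in range(n_particles)]
    weights = uniform[:]
    estimates = []
    for y in observations:
        # Predict: move every particle through the state transition model.
        particles = [p + rng.gauss(0.0, process_sd) for p in particles]
        # Update: scale each weight by the likelihood of the observation.
        weights = [w * math.exp(-0.5 * ((y - p) / obs_sd) ** 2)
                   for w, p in zip(weights, particles)]
        total = sum(weights)
        weights = [w / total for w in weights] if total > 0.0 else uniform[:]
        estimates.append(sum(w * p for w, p in zip(weights, particles)))
        # Resample when the weights become too uneven (low effective size).
        effective_size = 1.0 / sum(w * w for w in weights)
        if effective_size < n_particles / 2:
            particles = rng.choices(particles, weights=weights, k=n_particles)
            weights = uniform[:]
    return estimates
```

Resampling duplicates heavy particles exactly; the process noise added in the next prediction step then spreads the copies out, which is how new particles end up near the previously high-weight ones.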
Particle Swarm Optimization (PSO) 
In computer science, particle swarm optimization (PSO) is a computational method that optimizes a problem by iteratively trying to improve a candidate solution with regard to a given measure of quality. It solves a problem by having a population of candidate solutions, here dubbed particles, and moving these particles around in the search-space according to simple mathematical formulae over the particle’s position and velocity. Each particle’s movement is influenced by its local best known position, but is also guided toward the best known positions in the search-space, which are updated as better positions are found by other particles. This is expected to move the swarm toward the best solutions. 
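A minimal sketch of the global-best variant follows: each velocity is a blend of the previous velocity, a pull toward the particle's own best position, and a pull toward the swarm's best position. The inertia and acceleration values are commonly used defaults and, like the function name and signature, are assumptions rather than part of the definition above.

```python
import random


def pso(objective, dim, bounds, n_particles=30, iterations=200,
        inertia=0.729, c_personal=1.49445, c_global=1.49445, seed=0):
    """Minimize `objective` over `dim` dimensions; returns (position, value)."""
    rng = random.Random(seed)
    lo, hi = bounds
    pos = [[rng.uniform(lo, hi) for _ in range(dim)] for _ in range(n_particles)]
    vel = [[0.0] * dim for _ in range(n_particles)]
    best_pos = [p[:] for p in pos]             # each particle's own best position
    best_val = [objective(p) for p in pos]
    g = min(range(n_particles), key=lambda i: best_val[i])
    g_pos, g_val = best_pos[g][:], best_val[g]  # best position seen by the swarm
    for _ in range(iterations):
        for i in range(n_particles):
            for d in range(dim):
                r1, r2 = rng.random(), rng.random()
                vel[i][d] = (inertia * vel[i][d]
                             + c_personal * r1 * (best_pos[i][d] - pos[i][d])
                             + c_global * r2 * (g_pos[d] - pos[i][d]))
                pos[i][d] += vel[i][d]
            val = objective(pos[i])
            if val < best_val[i]:
                best_val[i], best_pos[i] = val, pos[i][:]
                if val < g_val:
                    g_val, g_pos = val, pos[i][:]
    return g_pos, g_val


def sphere(x):
    """Simple test objective with its minimum of 0 at the origin."""
    return sum(v * v for v in x)


best_position, best_value = pso(sphere, dim=2, bounds=(-5.0, 5.0))
print(best_position, best_value)
```

Positions are not clamped to the bounds after initialization here; a practical implementation would usually add clamping or a velocity limit.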
Partitional Clustering  Partitional clustering decomposes a data set into a set of disjoint clusters. Given a data set of N points, a partitioning method constructs K (N ≥ K) partitions of the data, with each partition representing a cluster. That is, it classifies the data into K groups by satisfying the following requirements: (1) each group contains at least one point, and (2) each point belongs to exactly one group. Notice that for fuzzy partitioning, a point can belong to more than one group. Many partitional clustering algorithms try to minimize an objective function. 
Partitioning Around Medoids (PAM) 
The PAM algorithm was developed by Leonard Kaufman and Peter J. Rousseeuw. It is very similar to K-means: both are partitional algorithms, in other words both break the dataset into groups, and both work by trying to minimize an error. The difference is that PAM works with medoids, which are members of the dataset that represent the group they belong to, whereas K-means works with centroids, which are artificially created points that represent their clusters. The PAM algorithm partitions a dataset of n objects into k clusters, where both the dataset and the number k are inputs of the algorithm. The algorithm works with a dissimilarity matrix, and its goal is to minimize the overall dissimilarity between the representative of each cluster and its members. 
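The medoid idea can be sketched with a simplified swap search: start from some medoids, repeatedly apply the single medoid/non-medoid swap that lowers the total dissimilarity the most, and stop when no swap helps. This is a reduced illustration rather than the full published procedure: it initializes naively with the first k points and computes Euclidean distances on the fly instead of reading a precomputed dissimilarity matrix, both of which are assumptions.

```python
import math


def total_cost(points, medoids):
    """Sum over all points of the distance to the nearest medoid."""
    return sum(min(math.dist(p, points[m]) for m in medoids) for p in points)


def pam(points, k):
    """Return (medoid indices, cluster label per point, total cost)."""
    medoids = list(range(k))  # naive initialization (assumption)
    cost = total_cost(points, medoids)
    improved = True
    while improved:
        improved = False
        best = (cost, None, None)
        # Try replacing each medoid with each non-medoid point.
        for slot in range(k):
            for candidate in range(len(points)):
                if candidate in medoids:
                    continue
                trial = medoids[:slot] + [candidate] + medoids[slot + 1:]
                trial_cost = total_cost(points, trial)
                if trial_cost < best[0]:
                    best = (trial_cost, slot, candidate)
        if best[1] is not None:
            cost = best[0]
            medoids[best[1]] = best[2]
            improved = True
    labels = [
        min(range(k), key=lambda j: math.dist(p, points[medoids[j]]))
        for p in points
    ]
    return medoids, labels, cost


points = [(0, 0), (0, 1), (1, 0), (10, 10), (10, 11), (11, 10)]
medoids, labels, cost = pam(points, k=2)
print(medoids, labels, cost)
```

Because every medoid is an actual data point, the same code works with any dissimilarity function in place of `math.dist`, which is the practical advantage over centroids.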
Parzen-Rosenblatt Kernel Density Estimation  The Parzen-window method (also known as the Parzen-Rosenblatt window method) is a widely used non-parametric approach to estimate the value of a probability density function p(x) at a specific point x from a sample of observations x_n, without requiring any knowledge or assumption about the underlying distribution. 
Parzen-Rosenblatt Window Technique  A non-parametric kernel density estimation technique for probability densities of random variables if the underlying distribution/model is unknown. A so-called window function is used to count samples within hypercubes or Gaussian kernels of a specified volume to estimate the probability density. 
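In one dimension the estimate at a point x is the average of a window function centered on each sample, scaled by the window width h: p(x) ≈ (1 / (n·h)) · Σ K((x − x_i) / h). The sketch below shows both window types mentioned above; the function names, the sample values, and the widths are illustrative assumptions.

```python
import math


def gaussian_kernel(u):
    """Standard normal density, a smooth window function."""
    return math.exp(-0.5 * u * u) / math.sqrt(2.0 * math.pi)


def box_kernel(u):
    """Unit hypercube window in 1-D: counts samples within half a width."""
    return 1.0 if abs(u) <= 0.5 else 0.0


def parzen_density(x, samples, h, kernel=gaussian_kernel):
    """Parzen-window estimate of the density at x with window width h."""
    return sum(kernel((x - s) / h) for s in samples) / (len(samples) * h)


samples = [-1.0, 0.0, 0.5, 2.0]
print(parzen_density(0.1, samples, h=0.5))                     # Gaussian window
print(parzen_density(0.1, samples, h=1.0, kernel=box_kernel))  # box window
```

The width h controls the smoothness of the estimate: small values give a spiky density that follows individual samples, large values blur the samples together.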
Passing Bablok Regression  The comparison-of-methods experiment is an important part of the validation of analytical methods and instruments. Passing and Bablok regression analysis is a statistical procedure that allows a valuable estimation of the agreement between analytical methods and of possible systematic bias between them. It is robust, non-parametric, and not sensitive to the distribution of errors or to data outliers. The assumptions for proper application of Passing and Bablok regression are continuously distributed data and a linear relationship between the data measured by the two analytical methods. Results are presented with a scatter diagram and regression line, and with a regression equation in which the intercept represents constant measurement error and the slope represents proportional measurement error. The 95% confidence intervals of the intercept and slope show whether their values differ from zero (intercept) and one (slope) only by chance, allowing a conclusion about method agreement and corrective action if necessary. The residual plot reveals outliers and identifies possible non-linearity. Furthermore, a cumulative sum linearity test is performed to investigate possible significant deviation from linearity between the two sets of data. Non-linear samples are not suitable for concluding on method agreement. deming 
Passive and Partially Active (PPA) 
Fault-tolerance techniques for stream processing engines can be categorized into passive and active approaches. A typical passive approach periodically checkpoints a processing task’s runtime states and can recover a failed task by restoring its runtime state using its latest checkpoint. On the other hand, an active approach usually employs backup nodes to run replicated tasks. Upon failure, the active replica can take over the processing of the failed task with minimal latency. However, both approaches have their own inadequacies in Massively Parallel Stream Processing Engines (MPSPE). The passive approach incurs a long recovery latency especially when a number of correlated nodes fail simultaneously, while the active approach requires extra replication resources. In this paper, we propose a new fault-tolerance framework, which is Passive and Partially Active (PPA). In a PPA scheme, the passive approach is applied to all tasks while only a selected set of tasks will be actively replicated. The number of actively replicated tasks depends on the available resources. If tasks without active replicas fail, tentative outputs will be generated before the completion of the recovery process. We also propose effective and efficient algorithms to optimize a partially active replication plan to maximize the quality of tentative outputs. We implemented PPA on top of Storm, an open-source MPSPE, and conducted extensive experiments using both real and synthetic datasets to verify the effectiveness of our approach. 
PassiveAggressive Learning (PA) 
We present a unified view for online classification, regression, and uniclass problems. This view leads to a single algorithmic framework for the three problems. We prove worst case loss bounds for various algorithms for both the realizable case and the non-realizable case. A conversion of our main online algorithm to the setting of batch learning is also discussed. The end result is new algorithms and accompanying loss bounds for the hinge-loss. 
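The abstract does not state the update rule. For binary classification, the basic passive-aggressive step as it is commonly presented leaves the weights alone when the hinge loss is zero (passive) and otherwise moves them just far enough to reach a margin of one on the current example (aggressive). The sketch below assumes that form, labels in {−1, +1}, and a non-zero input vector; the function name is illustrative.

```python
def pa_update(weights, x, y):
    """One passive-aggressive step for a linear classifier.

    weights, x -- lists of floats of equal length (x must not be all zeros)
    y          -- label, either -1 or +1
    """
    margin = y * sum(w * xi for w, xi in zip(weights, x))
    loss = max(0.0, 1.0 - margin)          # hinge loss on this example
    if loss == 0.0:
        return weights                     # passive: margin already satisfied
    tau = loss / sum(xi * xi for xi in x)  # smallest step that fixes the margin
    return [w + tau * y * xi for w, xi in zip(weights, x)]


w = pa_update([0.0, 0.0], [1.0, 2.0], 1)
print(w)
```

The variants with an aggressiveness parameter that cap or soften the step size are left out of this sketch.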
PatchNet  The ability to visually understand and interpret learned features from complex predictive models is crucial for their acceptance in sensitive areas such as health care. To move closer to this goal of truly interpretable complex models, we present PatchNet, a network that restricts global context for image classification tasks in order to easily provide visual representations of learned texture features on a predetermined local scale. We demonstrate how PatchNet provides visual heatmap representations of the learned features, and we mathematically analyze the behavior of the network during convergence. We also present a version of PatchNet that is particularly well suited for lowering false positive rates in image classification tasks. We apply PatchNet to the classification of textures from the Describable Textures Dataset and to the ISBI-ISIC 2016 melanoma classification challenge. 
PatchShuffle Regularization  This paper focuses on regularizing the training of the convolutional neural network (CNN). We propose a new regularization approach named “PatchShuffle” that can be adopted in any classification-oriented CNN model. It is easy to implement: in each mini-batch, images or feature maps are randomly chosen to undergo a transformation such that pixels within each local patch are shuffled. By generating images and feature maps with interior orderless patches, PatchShuffle creates rich local variations, reduces the risk of network overfitting, and can be viewed as a beneficial supplement to various kinds of training regularization techniques, such as weight decay, model ensemble and dropout. Experiments on four representative classification datasets show that PatchShuffle improves the generalization ability of CNNs, especially when the data is scarce. Moreover, we empirically illustrate that CNN models trained with PatchShuffle are more robust to noise and local changes in an image. 
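The transformation described above (shuffling pixels inside each local patch) can be sketched on a single-channel image stored as a list of rows. This is a simplified illustration, not the paper's implementation: the non-overlapping patch grid, the function name `patch_shuffle`, and the `patch` and `rng` parameters are assumptions made for the example.

```python
import random


def patch_shuffle(image, patch=2, rng=None):
    """Return a copy of `image` with pixels shuffled inside each patch.

    image: 2D list (rows of pixel values)
    patch: side length of the non-overlapping square patches (assumed)
    rng:   optional random.Random instance for reproducibility
    """
    rng = rng or random.Random()
    height, width = len(image), len(image[0])
    out = [row[:] for row in image]           # leave the input untouched
    for top in range(0, height, patch):
        for left in range(0, width, patch):
            # Coordinates of the pixels in this patch (clipped at the border).
            cells = [(r, c)
                     for r in range(top, min(top + patch, height))
                     for c in range(left, min(left + patch, width))]
            values = [image[r][c] for r, c in cells]
            rng.shuffle(values)               # reorder pixels within the patch only
            for (r, c), v in zip(cells, values):
                out[r][c] = v
    return out
```

In training, such a transform would be applied only to a randomly chosen subset of images or feature maps in each mini-batch, as the entry describes; each patch keeps the same multiset of values, so only local spatial order is disturbed.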
Path Analysis  In statistics, path analysis is used to describe the directed dependencies among a set of variables. This includes models equivalent to any form of multiple regression analysis, factor analysis, canonical correlation analysis, discriminant analysis, as well as more general families of models in the multivariate analysis of variance and covariance analyses (MANOVA, ANOVA, ANCOVA). In addition to being thought of as a form of multiple regression focusing on causality, path analysis can be viewed as a special case of structural equation modeling (SEM) – one in which only single indicators are employed for each of the variables in the causal model. That is, path analysis is SEM with a structural model, but no measurement model. Other terms used to refer to path analysis include causal modeling, analysis of covariance structures, and latent variable models. 
Path Correlation Data  A communication network can be modeled as a directed connected graph with edge weights that characterize performance metrics such as loss and delay. Network tomography aims to infer these edge weights from their pathwise versions measured on a set of intersecting paths between a subset of boundary vertices, and even the underlying graph when this is not known. In particular, temporal correlations between path metrics have been used to infer composite weights on the subpath formed by the path intersection. We call these subpath weights the Path Correlation Data. In this paper we ask the following question: when can the underlying weighted graph be recovered knowing only the boundary vertices and the Path Correlation Data? We establish necessary and sufficient conditions for a graph to be reconstructible from this information, and describe an algorithm to perform the reconstruction. Subject to our conditions, the result applies to directed graphs with asymmetric edge weights, and accommodates paths arising from asymmetric routing in the underlying communication network. We also describe the relationship between the graph produced by our algorithm and the true graph in the case that our conditions are not satisfied. 
Path Modeling Segmentation Tree (PATHMOX) 
One of the main issues within path modeling techniques, especially in business and marketing applications, concerns the identification of different segments in the model population. The approach proposed by the authors consists of building a tree of path models with a decision-tree-like structure by means of the PATHMOX (Path Modeling Segmentation Tree) algorithm. This algorithm is specifically designed for cases where prior information in the form of external variables (such as sociodemographic variables) is available. Inner models are compared using an extension for testing the equality of two regression models, and outer models are compared by means of a Ryan-Joiner correlation test. genpathmox 
PathNet  For artificial general intelligence (AGI) it would be efficient if multiple users trained the same giant neural network, permitting parameter reuse, without catastrophic forgetting. PathNet is a first step in this direction. It is a neural network algorithm that uses agents embedded in the neural network whose task is to discover which parts of the network to reuse for new tasks. Agents are pathways (views) through the network which determine the subset of parameters that are used and updated by the forward and backward passes of the backpropagation algorithm. During learning, a tournament selection genetic algorithm is used to select pathways through the neural network for replication and mutation. Pathway fitness is the performance of that pathway measured according to a cost function. We demonstrate successful transfer learning: fixing the parameters along a path learned on task A and re-evolving a new population of paths for task B allows task B to be learned faster than it could be learned from scratch or after fine-tuning. Paths evolved on task B reuse parts of the optimal path evolved on task A. Positive transfer was demonstrated for binary MNIST, CIFAR, and SVHN supervised learning classification tasks, and a set of Atari and Labyrinth reinforcement learning tasks, suggesting PathNets have general applicability for neural network training. Finally, PathNet also significantly improves the robustness to hyperparameter choices of a parallel asynchronous reinforcement learning algorithm (A3C). 
Pathway Induced Multiple Kernel Learning (PIMKL) 
Reliable identification of molecular biomarkers is essential for accurate patient stratification. While state-of-the-art machine learning approaches for sample classification continue to push boundaries in terms of performance, most of these methods are not able to integrate different data types and lack generalization power, limiting their application in a clinical setting. Furthermore, many methods behave as black boxes, so we have very little understanding of the mechanisms that lead to the prediction provided. While opaqueness concerning machine behaviour might not be a problem in deterministic domains, in health care, providing explanations about the molecular factors and phenotypes that are driving the classification is crucial to build trust in the performance of the predictive system. We propose Pathway Induced Multiple Kernel Learning (PIMKL), a novel methodology to classify samples reliably that can, at the same time, provide a pathway-based molecular fingerprint of the signature that underlies the classification. PIMKL exploits prior knowledge in the form of molecular interaction networks and annotated gene sets, by optimizing a mixture of pathway-induced kernels using a Multiple Kernel Learning algorithm (MKL), an approach that has demonstrated excellent performance in different machine learning applications. After optimizing the combination of kernels for prediction of a specific phenotype, the model provides a stable molecular signature that can be interpreted in the light of the ingested prior knowledge and that can be used in transfer learning tasks. 
Pathwise Calibrated Sparse Shooting Algorithm (PICASSO) 
A family of efficient algorithms, called PathwIse CalibrAted Sparse Shooting AlgOrithm, for a variety of sparse learning problems, including Sparse Linear Regression, Sparse Logistic Regression, Sparse Column Inverse Operator and Sparse Multivariate Regression. Different types of active set identification schemes are implemented, such as cyclic search, greedy search, stochastic search and proximal gradient search. In addition, the package offers a choice between convex (L1 norm) and non-convex (MCP and SCAD) regularizations. Group regularization for Sparse Linear Regression and Sparse Logistic Regression is also implemented. picasso 
Patient Rule Induction Method (PRIM) 
PRIM (Patient Rule Induction Method) is a data mining technique introduced by Friedman and Fisher (1999). Its objective is to find subregions in the input space with relatively high (low) values for the target variable. By construction, PRIM directly targets these regions rather than indirectly through the estimation of a regression function. The method is such that these subregions can be described by simple rules, as the subregions are (unions of) rectangles in the input space. There are many practical problems where finding such rectangular subregions with relatively high (low) values of the target variable is of considerable interest. Often these are problems where a decision maker wants to choose the values or ranges of the input variables so as to optimize the value of the target variable. Such types of applications can be found in the fields of medical research, financial risk analysis, and social sciences, and PRIM has been applied to these fields. While PRIM enjoys some popularity, and several modifications have even been proposed (see Becker and Fahrmeier, 2001; Cole, Galic and Zack, 2003; Leblanc et al., 2003; Nannings et al., 2008; Wu and Chipman, 2003; and Wang et al., 2004), there is to our knowledge no thorough study of its basic statistical properties. PRIMsrc, prim 
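To make the idea of a rectangular subregion with a high target mean concrete, here is a rough sketch of the top-down peeling step usually associated with PRIM: start with a box covering all points and repeatedly trim a small fraction of points from one face, choosing the trim that raises the mean of the target most. The parameter names `alpha` and `min_support`, the rank-based trimming, and the rule of stopping when no trim improves the mean are simplifying assumptions for illustration, not the behaviour of the `prim` or `PRIMsrc` packages.

```python
def prim_peel(X, y, alpha=0.1, min_support=0.2):
    """Shrink an axis-aligned box to raise the mean of y inside it.

    X: list of rows (each a list of numeric inputs)
    y: list of target values
    alpha: fraction of the remaining points peeled per step (assumed)
    min_support: smallest allowed fraction of all points in the box (assumed)
    Returns (box, inside): per-dimension (low, high) bounds and the
    indices of the points left in the box.
    """
    n, d = len(X), len(X[0])
    inside = list(range(n))
    box = [(min(row[j] for row in X), max(row[j] for row in X)) for j in range(d)]

    def mean(idx):
        return sum(y[i] for i in idx) / len(idx)

    while True:
        k = max(1, int(alpha * len(inside)))   # number of points to peel this step
        best = None
        for j in range(d):
            order = sorted(inside, key=lambda i: X[i][j])
            # Candidate peels: drop the k lowest or the k highest points on dimension j.
            for kept, side in ((order[k:], "low"), (order[:-k], "high")):
                if not kept or len(kept) < min_support * n:
                    continue
                if best is None or mean(kept) > best[0]:
                    best = (mean(kept), j, side, kept)
        if best is None or best[0] <= mean(inside):
            break                               # no peel improves the box mean
        _, j, side, inside = best
        lo, hi = box[j]
        vals = [X[i][j] for i in inside]
        box[j] = (min(vals), hi) if side == "low" else (lo, max(vals))
    return box, inside
```

The returned bounds read directly as a rule of the form "low <= x_j <= high" for each input, which is the kind of simple description the entry refers to.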
Pattern Classification  The usage of patterns in datasets to discriminate between classes, i.e., to assign a class label to a new observation based on inference. 
Pattern Sequence based Forecasting (PSF) 
This paper discusses PSF, an R package for the Pattern Sequence based Forecasting (PSF) algorithm used for univariate time series prediction. The PSF algorithm consists of two major parts: clustering and prediction. The clustering part includes selection of the cluster size and then labeling of the time series data with reference to the various clusters. The prediction part includes functions such as optimum window size selection for specific patterns and prediction of future values with reference to past pattern sequences. The PSF package consists of various functions to implement the PSF algorithm. It also contains a function that automates all other functions to obtain optimum prediction results. The aim of this package is to promote the PSF algorithm and to ease its implementation with minimum effort. This paper describes all the functions in the PSF package with their syntax and simple examples. Finally, the usefulness of this package is discussed by comparing it with auto.arima, a well-known time series forecasting function available on the CRAN repository. PSF 
Pattern Theory  Pattern theory, formulated by Ulf Grenander, is a mathematical formalism to describe knowledge of the world as patterns. It differs from other approaches to artificial intelligence in that it does not begin by prescribing algorithms and machinery to recognize and classify patterns; rather, it prescribes a vocabulary to articulate and recast the pattern concepts in precise language. In addition to the new algebraic vocabulary, its statistical approach was novel in its aim to: · Identify the hidden variables of a data set using real world data rather than artificial stimuli, which was commonplace at the time. · Formulate prior distributions for hidden variables and models for the observed variables that form the vertices of a Gibbs-like graph. · Study the randomness and variability of these graphs. · Create the basic classes of stochastic models applied by listing the deformations of the patterns. · Synthesize (sample) from the models, not just analyze signals with them. Broad in its mathematical coverage, Pattern Theory spans algebra and statistics, as well as local topological and global entropic properties. 
PatternNet  Visual patterns represent the discernible regularity in the visual world. They capture the essential nature of visual objects or scenes. Understanding and modeling visual patterns is a fundamental problem in visual recognition that has wide-ranging applications. In this paper, we study the problem of visual pattern mining and propose a novel deep neural network architecture called PatternNet for discovering patterns that are both discriminative and representative. The proposed PatternNet leverages the filters in the last convolution layer of a convolutional neural network to find locally consistent visual patches, and by combining these filters we can effectively discover unique visual patterns. In addition, PatternNet can discover visual patterns efficiently without performing expensive image patch sampling, and this advantage provides an order of magnitude speedup compared to most other approaches. We evaluate the proposed PatternNet subjectively by showing randomly selected visual patterns which are discovered by our method and quantitatively by performing image classification with the identified visual patterns and comparing our performance with the current state-of-the-art. We also directly evaluate the quality of the discovered visual patterns by leveraging the identified patterns as proposed objects in an image and compare with other relevant methods. Our proposed network and procedure, PatternNet, is able to outperform competing methods for the tasks described. 
Paxos Algorithm  A fault-tolerant file system called Echo was built at SRC in the late 80s. The builders claimed that it would maintain consistency despite any number of non-Byzantine faults, and would make progress if any majority of the processors were working. As with most such systems, it was quite simple when nothing went wrong, but had a complicated algorithm for handling failures based on taking care of all the cases that the implementers could think of. I decided that what they were trying to do was impossible, and set out to prove it. Instead, I discovered the Paxos algorithm, described in this paper. At the heart of the algorithm is a three-phase consensus protocol. Dale Skeen seems to have been the first to have recognized the need for a three-phase protocol to avoid blocking in the presence of an arbitrary single failure. However, to my knowledge, Paxos contains the first three-phase commit algorithm that is a real algorithm, with a clearly stated correctness condition and a proof of correctness. 
PC Algorithm  An algorithm that has the same input/output relations as the SGS procedure for faithful distributions but which for sparse graphs does not require the testing of higher-order independence relations in the discrete case, and in any case requires testing as few d-separation relations as possible. The procedure starts by forming the complete undirected graph, then ‘thins’ that graph by removing edges with zero-order conditional independence relations, thins again with first-order conditional independence relations, and so on. The set of variables conditioned on need only be a subset of the set of variables adjacent to one or the other of the variables conditioned. http://…/41_paper.pdf CausalFX 
Pedometrics  Pedometrics is a branch of soil science that applies mathematical and statistical methods for the study of the distribution and genesis of soils. The goal of pedometrics is to achieve a better understanding of the soil as a phenomenon that varies over different scales in space and time. This understanding is important, both for improved soil management and for our scientific appreciation of the soil and the systems (agronomic, ecological and hydrological) of which it is a part. For this reason much of pedometrics is concerned with predicting the properties of the soil in space and time, with sampling and monitoring the soil and with modelling the soil’s behaviour. Pedometricians are typically engaged in developing and applying quantitative methods to apply to these problems. These include geostatistical methods for spatial prediction, sampling designs and strategies, linear modelling methods and novel mathematical and computational techniques such as wavelet transforms, data mining and fuzzy logic. 
PeerNet  Deep learning systems have become ubiquitous in many aspects of our lives. Unfortunately, it has been shown that such systems are vulnerable to adversarial attacks, making them prone to potential unlawful uses. Designing deep neural networks that are robust to adversarial attacks is a fundamental step in making such systems safer and deployable in a broader variety of applications (e.g. autonomous driving), but more importantly is a necessary step to design novel and more advanced architectures built on new computational paradigms rather than marginally building on the existing ones. In this paper we introduce PeerNets, a novel family of convolutional networks alternating classical Euclidean convolutions with graph convolutions to harness information from a graph of peer samples. This results in a form of non-local forward propagation in the model, where latent features are conditioned on the global structure induced by the graph, that is up to 3 times more robust to a variety of white- and black-box adversarial attacks compared to conventional architectures with almost no drop in accuracy. 
Penalized Adaptive Weighted Least Squares Regression (PWLS) 
To conduct regression analysis for data contaminated with outliers, many approaches have been proposed for simultaneous outlier detection and robust regression, and the approach proposed in this manuscript is one of them. This new approach is called ‘penalized weighted least squares’ (PWLS). By assigning each observation an individual weight and incorporating a lasso-type penalty on the log-transformation of the weight vector, the PWLS is able to perform outlier detection and robust regression simultaneously. A Bayesian point of view of the PWLS is provided, and it is shown that the PWLS can be seen as an example of M-estimation. Two methods are developed for selecting the tuning parameter in the PWLS. The performance of the PWLS is demonstrated via simulations and real applications. pawls 
Penalized Maximum Tangent Likelihood Estimation  We introduce a new class of mean regression estimators, penalized maximum tangent likelihood estimation, for high-dimensional regression estimation and variable selection. We first explain the motivations for the key ingredient, maximum tangent likelihood estimation (MTE), and establish its asymptotic properties. We further propose a penalized MTE for variable selection and show that it is $\sqrt{n}$-consistent and enjoys the oracle property. The proposed class of estimators includes penalized $\ell_2$ distance, penalized exponential squared loss, penalized least trimmed square and penalized least square as special cases and can be regarded as a mixture of minimum Kullback-Leibler distance estimation and minimum $\ell_2$ distance estimation. Furthermore, we consider the proposed class of estimators under the high-dimensional setting when the number of variables $d$ can grow exponentially with the sample size $n$, and show that the entire class of estimators (including the aforementioned special cases) can achieve the optimal rate of convergence in the order of $\sqrt{\ln(d)/n}$. Finally, simulation studies and real data analysis demonstrate the advantages of the penalized MTE. 
Penalized Splines of Propensity Prediction (PSPP) 
Little and An (2004, Statistica Sinica 14, 949-968) proposed a penalized spline of propensity prediction (PSPP) method of imputation of missing values that yields robust model-based inference under the missing at random assumption. The propensity score for a missing variable is estimated and a regression model is fitted that includes the spline of the estimated logit propensity score as a covariate. The predicted unconditional mean of the missing variable has a double robustness (DR) property under misspecification of the imputation model. We show that a simplified version of PSPP, which does not center other regressors prior to including them in the prediction model, also has the DR property. We also propose two extensions of PSPP, namely, stratified PSPP and bivariate PSPP, that extend the DR property to inferences about conditional means. These extended PSPP methods are compared with the PSPP method and simple alternatives in a simulation study and applied to an online weight loss study conducted by Kaiser Permanente. ‘Robust-squared’ Imputation Models Using BART 
Pena-Yohai Initial Estimator  Pena, D., & Yohai, V. (1999) <doi:10.2307/2670164>. pyinit 
PEORL  Reinforcement learning and symbolic planning have both been used to build intelligent autonomous agents. Reinforcement learning relies on learning from interactions with the real world, which often requires an unfeasibly large amount of experience. Symbolic planning relies on manually crafted symbolic knowledge, which may not be robust to domain uncertainties and changes. In this paper we present a unified framework, PEORL, that integrates symbolic planning with hierarchical reinforcement learning (HRL) to cope with decision-making in a dynamic environment with uncertainties. Symbolic plans are used to guide the agent’s task execution and learning, and the learned experience is fed back to symbolic knowledge to improve planning. This method leads to rapid policy search and robust symbolic plans in complex domains. The framework is tested on benchmark domains of HRL. 
Percentages of Maximum Deviation from Independence (PEM) 
GDAtools 
Perception  Perception (from the Latin perceptio, percipio) is the organization, identification, and interpretation of sensory information in order to represent and understand the environment. 
Perceptron Ranking Using Interval Labeled Data (PRIL) 
In this paper, we propose an online learning algorithm, PRIL, for learning ranking classifiers using interval labeled data and show its correctness. We show its convergence in a finite number of steps if there exists an ideal classifier such that the rank given by it for an example always lies in its label interval. We then generalize this mistake bound result to the general case. We also provide a regret bound for the proposed algorithm. We propose a multiplicative update algorithm for PRIL called MPRIL. We provide its correctness and convergence results. We show the effectiveness of PRIL by showing its performance on various datasets. 
Perfect Privacy  The problem of private data disclosure is studied from an information theoretic perspective. Considering a pair of correlated random variables $(X,Y)$, where $Y$ denotes the observed data while $X$ denotes the private latent variables, the following problem is addressed: What is the maximum information that can be revealed about $Y$, while disclosing no information about $X$? Assuming that a Markov kernel maps $Y$ to the revealed information $U$, it is shown that the maximum mutual information between $Y$ and $U$, i.e., $I(Y;U)$, can be obtained as the solution of a standard linear program, when $X$ and $U$ are required to be independent, called ‘perfect privacy’. This solution is shown to be greater than or equal to the non-private information about $X$ carried by $Y$. Maximal information disclosure under perfect privacy is shown to be the solution of a linear program also when the utility is measured by the reduction in the mean square error, $\mathbb{E}[(Y-U)^2]$, or the probability of error, $\mbox{Pr}\{Y\neq U\}$. For jointly Gaussian $(X,Y)$, it is shown that perfect privacy is not possible if the kernel is applied to only $Y$; whereas perfect privacy can be achieved if the mapping is from both $X$ and $Y$; that is, if the private latent variables can also be observed at the encoder. Next, measuring the utility and privacy by $I(Y;U)$ and $I(X;U)$, respectively, the slope of the optimal utility-privacy trade-off curve is studied when $I(X;U)=0$. Finally, through a similar but independent analysis, an alternative characterization of the maximal correlation between two random variables is provided. 
Performance Analytics Decision Support Framework (PADS) 
The PADS (Performance Analytics Decision Support) Framework represents a more strategic approach to linking next-generation performance management and big data analytics technologies. The twin missions of the PADS Framework are to: 1. facilitate communication and collaboration among IT and business teams to proactively anticipate, identify and resolve application performance problems by focusing on user experience across the entire application delivery chain; and, 2. enable IT to orchestrate and manage internally and externally sourced services efficiently to improve decision-making and business outcomes. The PADS Framework can help companies ensure employee engagement and increase customer satisfaction and loyalty to drive higher operating results and market valuation. 
Performance Envelope  One way to speed up the algorithm configuration task is to use short runs instead of long runs as much as possible, but without discarding the configurations that eventually do well on the long runs. We consider the problem of selecting the top performing configurations of the Conditional Markov Chain Search (CMCS), a general algorithm schema that includes, for example, VNS. We investigate how the structure of performance on short tests links with that on long tests, showing that significant differences arise between test domains. We propose a ‘performance envelope’ method to exploit the links; the method learns when runs should be terminated and automatically adapts to the domain. 
Permutation Distribution Clustering  pdc 
Permutation Tests  A permutation test (also called a randomization test, rerandomization test, or an exact test) is a type of statistical significance test in which the distribution of the test statistic under the null hypothesis is obtained by calculating all possible values of the test statistic under rearrangements of the labels on the observed data points. In other words, the method by which treatments are allocated to subjects in an experimental design is mirrored in the analysis of that design. If the labels are exchangeable under the null hypothesis, then the resulting tests yield exact significance levels; see also exchangeability. Confidence intervals can then be derived from the tests. 
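As a small illustration of obtaining the null distribution from all rearrangements of the labels, the sketch below runs an exact two-sample permutation test on the absolute difference in means. The choice of statistic, the two-sided comparison, and the function name `exact_permutation_pvalue` are assumptions made for the example; full enumeration is only practical for small samples.

```python
from itertools import combinations


def exact_permutation_pvalue(group_a, group_b):
    """Two-sided exact permutation p-value for a difference in means.

    Enumerates every way of relabelling the pooled observations into a
    group of size len(group_a) and a group of size len(group_b).
    """
    pooled = list(group_a) + list(group_b)
    n_a, n_b = len(group_a), len(group_b)
    total = sum(pooled)

    def abs_mean_diff(sum_a):
        # Difference in means given the sum of the values labelled "a".
        return abs(sum_a / n_a - (total - sum_a) / n_b)

    observed = abs_mean_diff(sum(group_a))
    relabelings = 0
    at_least_as_extreme = 0
    for idx in combinations(range(len(pooled)), n_a):
        stat = abs_mean_diff(sum(pooled[i] for i in idx))
        relabelings += 1
        if stat >= observed - 1e-12:          # tolerance for floating-point ties
            at_least_as_extreme += 1
    return at_least_as_extreme / relabelings
```

For groups `[1, 2, 3]` and `[4, 5, 6]` there are 20 relabelings, and only the two that keep the groups fully separated reach the observed difference, so the p-value is 2/20 = 0.1.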
Perpetual Learning Machine  Despite the promise of brain-inspired machine learning, deep neural networks (DNN) have frustratingly failed to bridge the deceptively large gap between learning and memory. Here, we introduce a Perpetual Learning Machine: a new type of DNN that is capable of brain-like dynamic ‘on the fly’ learning because it exists in a self-supervised state of Perpetual Stochastic Gradient Descent. Thus, we provide the means to unify learning and memory within a machine learning framework. 
Perplexity  In information theory, perplexity is a measurement of how well a probability distribution or probability model predicts a sample. It may be used to compare probability models. 
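A common way to compute perplexity for a model on a sample is to exponentiate the average negative log-probability the model assigned to the observed items. The sketch below assumes the per-item probabilities are already available and strictly positive; the function name `perplexity` is illustrative.

```python
import math


def perplexity(probabilities):
    """Perplexity of a model on a sample.

    probabilities: the probability the model assigned to each observed
    item (each must be > 0). Lower perplexity means better prediction.
    """
    n = len(probabilities)
    avg_neg_log = -sum(math.log(p) for p in probabilities) / n
    return math.exp(avg_neg_log)
```

A model that assigns probability 0.25 to every observed item has perplexity 4, matching the intuition of being as uncertain as a uniform choice among four outcomes; comparing two models on the same sample then amounts to comparing these values.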
Persistence Diagrams  Persistence diagrams have been widely recognized as a compact descriptor for characterizing multiscale topological features in data. When many datasets are available, statistical features embedded in those persistence diagrams can be extracted by applying machine learning. In particular, the ability to explicitly analyze the inverse in the original data space from those statistical features of persistence diagrams is highly important for practical applications. 
Personalized Attention Network (PANet) 
Human visual attention is subjective and biased according to the personal preference of the viewer; however, current work on saliency detection is general and objective and does not account for the observer. This makes attention prediction for a particular person insufficiently accurate. In this work, we present the novel idea of personalized attention prediction and develop Personalized Attention Network (PANet), a convolutional network that predicts saliency in images with personal preference. The model consists of two streams which share common feature extraction layers: one stream is responsible for saliency prediction, while the other is adapted from the detection model and used to fit user preference. We automatically collect user preferences from their albums and leave users free to define what and how many categories their preferences are divided into. To train PANet, we dynamically generate ground truth saliency maps from existing detection labels and saliency labels, and the generation parameters are based on our collected dataset, which consists of 1k images. We evaluate the model with saliency prediction metrics and test the trained model on different preference vectors. The results show that our system is much better than general models in personalized saliency prediction and is efficient to use for different preferences. 
Personally Identifiable Information (PII) 
Personally identifiable information (PII), or Sensitive Personal Information (SPI), as used in US privacy law and information security, is information that can be used on its own or with other information to identify, contact, or locate a single person, or to identify an individual in context. The abbreviation PII is widely accepted in the US context, but the phrase it abbreviates has four common variants based on personal / personally, and identifiable / identifying. Not all are equivalent, and for legal purposes the effective definitions vary depending on the jurisdiction and the purposes for which the term is being used. (In other countries with privacy protection laws derived from the OECD privacy principles, the term used is more often ‘personal information’, which may be somewhat broader: in Australia’s Privacy Act 1988 (Cth) ‘personal information’ also includes information from which the person’s identity is ‘reasonably ascertainable’, potentially covering some information not covered by PII.) NIST Special Publication 800-122 defines PII as ‘any information about an individual maintained by an agency, including (1) any information that can be used to distinguish or trace an individual’s identity, such as name, social security number, date and place of birth, mother’s maiden name, or biometric records; and (2) any other information that is linked or linkable to an individual, such as medical, educational, financial, and employment information.’ So, for example, a user’s IP address as used in a communication exchange is classed as PII regardless of whether it may or may not on its own be able to uniquely identify a person. Although the concept of PII is old, it has become much more important as information technology and the Internet have made it easier to collect PII through breaches of Internet security, network security and web browser security, leading to a profitable market in collecting and reselling PII. 
PII can also be exploited by criminals to stalk or steal the identity of a person, or to aid in the planning of criminal acts. As a response to these threats, many website privacy policies specifically address the gathering of PII, and lawmakers have enacted a series of laws to limit the distribution and accessibility of PII. However, PII is a legal concept, not a technical concept. Because of the versatility and power of modern re-identification algorithms, the absence of PII data does not mean that the remaining data does not identify individuals. While some attributes may be uniquely identifying on their own, any attribute can be identifying in combination with others. These attributes have been referred to as quasi-identifiers or pseudo-identifiers. anonymizer, generator, detector 
Perturbative Neural Network (PNN) 
Convolutional neural networks are witnessing wide adoption in computer vision systems with numerous applications across a range of visual recognition tasks. Much of this progress is fueled through advances in convolutional neural network architectures and learning algorithms even as the basic premise of a convolutional layer has remained unchanged. In this paper, we seek to revisit the convolutional layer that has been the workhorse of state-of-the-art visual recognition models. We introduce a very simple, yet effective, module called a perturbation layer as an alternative to a convolutional layer. The perturbation layer does away with convolution in the traditional sense and instead computes its response as a weighted linear combination of non-linearly activated additive-noise-perturbed inputs. We demonstrate both analytically and empirically that this perturbation layer can be an effective replacement for a standard convolutional layer. Empirically, deep neural networks with perturbation layers, called Perturbative Neural Networks (PNNs), in lieu of convolutional layers perform comparably with standard CNNs on a range of visual datasets (MNIST, CIFAR-10, PASCAL VOC, and ImageNet) with fewer parameters. 
Pervasive Analytics  During eras of global economic shifts, there was always a key resource discovered that became the spark of transformation for groups of individuals that could effectively harness it. Today, that resource is data. In no uncertain terms, we are witnessing a global data rush and leading companies realize that data will grow enterprise over the next several decades as much as any capital asset. These forward-looking companies realize that to be successful, enterprises must leverage analytics in order to create a more predictable and valuable organization. In some cases they must package data in a way that adds value and informs employees, or their customers, by deploying analytics into decision-making processes everywhere. This idea is referred to as pervasive analytics. But to drive a pervasive analytics strategy and win the data rush, successful companies also recognize the need to transform the way they think about data management and processes in order to unlock the true value of data. 
Pervasive Computing  The idea that technology is moving beyond the personal computer to everyday devices with embedded technology and connectivity as computing devices become progressively smaller and more powerful. Also called ubiquitous computing, pervasive computing is the result of computer technology advancing at exponential speeds, a trend toward all man-made and some natural products having hardware and software. Pervasive computing goes beyond the realm of personal computers: it is the idea that almost any device, from clothing to tools to appliances to cars to homes to the human body to your coffee mug, can be embedded with chips to connect the device to an infinite network of other devices. The goal of pervasive computing, which combines current network technologies with wireless computing, voice recognition, Internet capability and artificial intelligence, is to create an environment where the connectivity of devices is embedded in such a way that the connectivity is unobtrusive and always available. 
Phaedra  Phaedra is an open source platform for data capture and analysis of high-content screening data. It offers functionality to · import image data from any source · assess your data with industry’s richest toolbox · improve data quality using intelligent validation methods · use built-in statistics and machine learning workflows · generate QC and analysis reports using templates 
Phased LSTM  Recurrent Neural Networks (RNNs) have become the state-of-the-art choice for extracting patterns from temporal sequences. However, current RNN models are ill-suited to process irregularly sampled data triggered by events generated in continuous time by sensors or other neurons. Such data can occur, for example, when the input comes from novel event-driven artificial sensors that generate sparse, asynchronous streams of events or from multiple conventional sensors with different update intervals. In this work, we introduce the Phased LSTM model, which extends the LSTM unit by adding a new time gate. This gate is controlled by a parametrized oscillation with a frequency range that produces updates of the memory cell only during a small percentage of the cycle. Even with the sparse updates imposed by the oscillation, the Phased LSTM network achieves faster convergence than regular LSTMs on tasks which require learning of long sequences. The model naturally integrates inputs from sensors of arbitrary sampling rates, thereby opening new areas of investigation for processing asynchronous sensory events that carry timing information. It also greatly improves the performance of LSTMs in standard RNN applications, and does so with an order-of-magnitude fewer computes at runtime. 
PhaseLin  Phase retrieval deals with the recovery of complex or real-valued signals from magnitude measurements. As shown recently, the method PhaseMax enables phase retrieval via convex optimization and without lifting the problem to a higher dimension. To succeed, PhaseMax requires an initial guess of the solution, which can be calculated via spectral initializers. In this paper, we show that with the availability of an initial guess, phase retrieval can be carried out with an ever simpler, linear procedure. Our algorithm, called PhaseLin, is the linear estimator that minimizes the mean squared error (MSE) when applied to the magnitude measurements. The linear nature of PhaseLin enables an exact and non-asymptotic MSE analysis for arbitrary measurement matrices. We furthermore demonstrate that by iteratively using PhaseLin, one arrives at an efficient phase retrieval algorithm that performs on par with existing convex and non-convex methods on synthetic and real-world data. 
PHOENICS  In this work we introduce PHOENICS, a probabilistic global optimization algorithm combining ideas from Bayesian optimization with concepts from Bayesian kernel density estimation. We propose an inexpensive acquisition function balancing the explorative and exploitative behavior of the algorithm. This acquisition function enables intuitive sampling strategies for an efficient parallel search of global minima. The performance of PHOENICS is assessed via an exhaustive benchmark study on a set of 15 discrete, quasi-discrete and continuous multidimensional functions. Unlike optimization methods based on Gaussian processes (GP) and random forests (RF), we show that PHOENICS is less sensitive to the nature of the co-domain, and outperforms GP and RF optimizations. We illustrate the performance of PHOENICS on the Oregonator, a difficult case study describing a complex chemical reaction network. We demonstrate that only PHOENICS was able to reproduce qualitatively and quantitatively the target dynamic behavior of this nonlinear reaction dynamics. We recommend PHOENICS for rapid optimization of scalar, possibly non-convex, black-box unknown objective functions. 
Picasso  Picasso is a free open-source (Eclipse Public License) web application written in Python for rendering standard visualizations useful for training convolutional neural networks. Picasso ships with occlusion maps and saliency maps, two visualizations which help reveal issues that evaluation metrics like loss and accuracy might hide: for example, learning a proxy classification task. Picasso works with the Keras and TensorFlow deep learning frameworks. Picasso can be used with minimal configuration by deep learning researchers and engineers alike across various neural network architectures. Adding new visualizations is simple: the user can specify their visualization code and HTML template separately from the application code. 
Piecewise Linear (PWL) 
In this paper, we study the representational power of deep neural networks (DNN) that belong to the family of piecewise-linear (PWL) functions, based on PWL activation units such as rectifier or maxout. We investigate the complexity of such networks by studying the number of linear regions of the PWL function. Typically, a PWL function from a DNN can be seen as a large family of linear functions acting on millions of such regions. We directly build upon the work of Montufar et al. (2014) and Raghu et al. (2017) by refining the upper and lower bounds on the number of linear regions for rectified and maxout networks. In addition to achieving tighter bounds, we also develop a novel method to perform exact enumeration or counting of the number of linear regions with a mixed-integer linear formulation that maps the input space to output. We use this new capability to visualize how the number of linear regions changes while training DNNs. 
Piecewise-Deterministic Markov Processes (PDMP) 
In probability theory, a piecewise-deterministic Markov process (PDMP) is a process whose behaviour is governed by random jumps at points in time, but whose evolution is deterministically governed by an ordinary differential equation between those times. The class of models is “wide enough to include as special cases virtually all the non-diffusion models of applied probability.” The process is defined by three quantities: the flow, the jump rate, and the transition measure. The model was first introduced in a paper by Mark H. A. Davis in 1984. EstSimPDMP 
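The three defining quantities can be made concrete with a toy simulation. The sketch below is only an illustration under assumed choices that do not come from the entry: the flow solves dx/dt = -x, the jump rate is a constant, and the transition measure adds exactly 1 to the state at every jump. The function name `simulate_pdmp` is hypothetical.

```python
import math
import random

def simulate_pdmp(x0, rate, t_end, rng):
    """Simulate a toy PDMP on [0, t_end]; return (jump times, state at t_end).

    Assumed ingredients (illustrative only):
      flow:               dx/dt = -x, so x(t + s) = x(t) * exp(-s)
      jump rate:          constant `rate`, so waiting times are exponential
      transition measure: a deterministic kick, x -> x + 1
    """
    t, x = 0.0, x0
    jump_times = []
    while True:
        wait = rng.expovariate(rate)          # time until the next random jump
        if t + wait > t_end:
            x *= math.exp(-(t_end - t))       # follow the flow to the horizon
            return jump_times, x
        x *= math.exp(-wait)                  # deterministic evolution between jumps
        x += 1.0                              # apply the jump
        t += wait
        jump_times.append(t)

jumps, x_final = simulate_pdmp(x0=2.0, rate=1.5, t_end=10.0, rng=random.Random(0))
```

With a state-dependent jump rate the waiting times are no longer exponential and would need, for example, thinning or numerical inversion; the constant rate is chosen here only to keep the sketch short.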
Pierre’s Correlogram  Rcriticor 
Pigeonring  The pigeonhole principle states that if n items are contained in m boxes, then at least one box has no fewer than n/m items. It is utilized to solve many data management problems, especially for thresholded similarity searches. Despite many pigeonhole principle-based solutions proposed in the last few decades, the condition stated by the principle is weak. It only constrains the number of items in a single box. By organizing the boxes in a ring, we observe that the number of items in multiple boxes is also constrained. We propose a new principle called the pigeonring principle which formally captures such constraints and yields stronger conditions. To utilize the pigeonring principle, we focus on problems defined in the form of identifying data objects whose similarities or distances to the query are constrained by a threshold. Many solutions to these problems utilize the pigeonhole principle to find candidates that satisfy a filtering condition. By the pigeonring principle, stronger filtering conditions can be established. We show that the pigeonhole principle is a special case of the pigeonring principle. This suggests that all the solutions based on the pigeonhole principle can potentially be accelerated by the pigeonring principle. A universal filtering framework is introduced to encompass the solutions to these problems based on the pigeonring principle. Besides, we discuss how to quickly find candidates specified by the pigeonring principle with minor modifications on top of existing algorithms. Experimental results on real datasets demonstrate the applicability of the pigeonring principle as well as the superior performance of the algorithms based on the principle. 
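To illustrate the kind of pigeonhole-based filtering the entry refers to, here is a minimal sketch of the classic filter for Hamming-distance search. It is not the pigeonring method itself. It assumes equal-length strings: if two strings differ in at most tau positions and each is cut into tau + 1 parts, at least one part must match exactly. All function names are hypothetical.

```python
def split(s, m):
    """Split s into m contiguous parts of near-equal length."""
    q, r = divmod(len(s), m)
    parts, start = [], 0
    for i in range(m):
        end = start + q + (1 if i < r else 0)
        parts.append(s[start:end])
        start = end
    return parts

def hamming(a, b):
    """Number of positions at which two equal-length strings differ."""
    return sum(x != y for x, y in zip(a, b))

def pigeonhole_candidates(query, data, tau):
    """Keep strings that share at least one of tau + 1 parts with the query.

    The filter never drops a true result (distance <= tau), but it may keep
    false positives, so candidates still need to be verified.
    """
    m = tau + 1
    q_parts = split(query, m)
    return [s for s in data
            if any(p == q for p, q in zip(split(s, m), q_parts))]

data = ["abcdef", "abcxyz", "uvwxyz", "abddef"]
candidates = pigeonhole_candidates("abcdef", data, tau=1)
results = [s for s in candidates if hamming(s, "abcdef") <= 1]
```

Here "abcxyz" survives the filter because its first half matches, and is rejected only at the verification step; stronger filtering conditions aim to remove such candidates earlier.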
Pilot-Streaming  An increasing number of scientific applications rely on stream processing for generating timely insights from data feeds of scientific instruments, simulations, and Internet-of-Things (IoT) sensors. The development of streaming applications is a complex task and requires the integration of heterogeneous, distributed infrastructure, frameworks, middleware and application components. Different application components are often written in different languages using different abstractions and frameworks. Often, additional components, such as a message broker (e.g. Kafka), are required to decouple data production and consumption and to avoid issues such as back-pressure. Streaming applications may be extremely dynamic due to factors such as variable data rates caused by the data source, adaptive sampling techniques or network congestion, and variable processing loads caused by the use of different machine learning algorithms. As a result, application-level resource management that can respond to changes in one of these factors is critical. We propose Pilot-Streaming, a framework for supporting streaming frameworks, applications and their resource management needs on HPC infrastructure. Pilot-Streaming is based on the Pilot-Job concept and enables developers to manage distributed computing and data resources for complex streaming applications. It enables applications to dynamically respond to resource requirements by adding/removing resources at runtime. This capability is critical for balancing complex streaming pipelines. To address the complexity in the development and characterization of streaming applications, we present the Streaming Mini-App framework, which supports different pluggable algorithms for data generation and processing, e.g., for reconstructing light source images using different techniques. We utilize the Mini-App framework to conduct an evaluation of Pilot-Streaming. 
PingAn  Geo-distributed data analysis in a cloud-edge system is emerging as a daily demand. To save time on wide-area data transfer, some tasks are dispatched to edge clusters that satisfy data locality. However, execution in the edge clusters is less reliable because of limited resources, overload interference and cluster-level unreachability, which makes it hard to guarantee the speed and completion of jobs. Taking into account the impact of cluster heterogeneity and costly inter-cluster data fetches, we aim to make effective copies of tasks across clusters so that arriving jobs both succeed and run efficiently. To this end, we design PingAn, an online insurance algorithm that makes redundant cross-cluster copies of tasks and is $(1+\varepsilon)$-speed $o(\frac{1}{\varepsilon^2+\varepsilon})$-competitive in the sum of the job flowtimes. PingAn shares resources among a subset of jobs, using an adjustable $\varepsilon$ fraction to fit the system load condition, and insures tasks following an efficiency-first, reliability-aware principle to optimize the effect of copies on jobs’ performance. Trace-driven simulations demonstrate that PingAn can reduce the average job flowtimes by at least $14\%$ more than the state-of-the-art speculation mechanisms. We also build PingAn in Spark on Yarn System to verify its practicality and generality. Experiments show that PingAn can reduce the average job completion time by up to $40\%$ compared to the default Spark execution. 
PinSage  Recent advancements in deep neural networks for graph-structured data have led to state-of-the-art performance on recommender system benchmarks. However, making these methods practical and scalable to web-scale recommendation tasks with billions of items and hundreds of millions of users remains a challenge. Here we describe a large-scale deep recommendation engine that we developed and deployed at Pinterest. We develop a data-efficient Graph Convolutional Network (GCN) algorithm PinSage, which combines efficient random walks and graph convolutions to generate embeddings of nodes (i.e., items) that incorporate both graph structure as well as node feature information. Compared to prior GCN approaches, we develop a novel method based on highly efficient random walks to structure the convolutions and design a novel training strategy that relies on harder-and-harder training examples to improve robustness and convergence of the model. We also develop an efficient MapReduce model inference algorithm to generate embeddings using a trained model. We deploy PinSage at Pinterest and train it on 7.5 billion examples on a graph with 3 billion nodes representing pins and boards, and 18 billion edges. According to offline metrics, user studies and A/B tests, PinSage generates higher-quality recommendations than comparable deep learning and graph-based alternatives. To our knowledge, this is the largest application of deep graph embeddings to date and paves the way for a new generation of web-scale recommender systems based on graph convolutional architectures. 
Pipeline Pilot Bayesian Classifiers (PNPBC) 
The commercial product “Pipeline Pilot” uses a Naive Bayes statistics based approach, which essentially contrasts the active samples of a target with the whole (background) compound database. It does not explicitly consider the samples labelled as inactive. Laplacian-adjusted probability estimates for the features lead to individual feature weights which are finally summed up to give the prediction. We reimplemented the “Pipeline Pilot” Naive Bayes statistics in order to use it on a multicore supercomputer, which allowed us to compare this method on our benchmark dataset. 
PixelSNAIL  Autoregressive generative models consistently achieve the best results in density estimation tasks involving high dimensional data, such as images or audio. They pose density estimation as a sequence modeling task, where a recurrent neural network (RNN) models the conditional distribution over the next element conditioned on all previous elements. In this paradigm, the bottleneck is the extent to which the RNN can model long-range dependencies, and the most successful approaches rely on causal convolutions, which offer better access to earlier parts of the sequence than conventional RNNs. Taking inspiration from recent work in meta reinforcement learning, where dealing with long-range dependencies is also essential, we introduce a new generative model architecture that combines causal convolutions with self attention. In this note, we describe the resulting model and present state-of-the-art log-likelihood results on CIFAR-10 (2.85 bits per dim) and $32 \times 32$ ImageNet (3.80 bits per dim). Our implementation is available at https://…/pixelsnailpublic. 
Plackett-Luce Model  The Plackett-Luce model is based on the concept of permutation probability. It extends the Bradley-Terry model, which handles pairwise comparisons between two objects, to compare multiple objects at a time through the permutation probability of a list of objects to be ranked. The key idea is that the permutation probability is highest for the best-ranked list of objects, decreases for worse-ranked lists, and is lowest for the worst-ranked list. PlackettLuce 
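The permutation probability can be written out directly. In the sketch below each object has a positive "worth" score (the scores and the function name are illustrative assumptions), and a ranking is built by repeatedly picking the next object with probability proportional to its worth among the objects not yet placed.

```python
import itertools

def plackett_luce_probability(ranking, worth):
    """Probability of `ranking` (best first) given positive worth scores."""
    prob = 1.0
    remaining = sum(worth[item] for item in ranking)
    for item in ranking:
        prob *= worth[item] / remaining   # chance of picking `item` next
        remaining -= worth[item]          # it leaves the pool of candidates
    return prob

worth = {"a": 3.0, "b": 2.0, "c": 1.0}    # hypothetical worth parameters
probs = {perm: plackett_luce_probability(perm, worth)
         for perm in itertools.permutations(worth)}
best = max(probs, key=probs.get)          # ranking sorted by decreasing worth
worst = min(probs, key=probs.get)         # ranking sorted by increasing worth
```

With only two objects the formula reduces to the Bradley-Terry probability worth[i] / (worth[i] + worth[j]) that i is ranked ahead of j.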
Plasticity  Plasticity is the ability of a learning algorithm to adapt to new data. 
Platt Scaling  In machine learning, Platt scaling or Platt calibration is a way of transforming the outputs of a classification model into a probability distribution over classes. The method was invented by John Platt in the context of support vector machines, replacing an earlier method by Vapnik, but can be applied to other classification models. Platt scaling works by fitting a logistic regression model to a classifier’s scores. 
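As a rough illustration of fitting a logistic regression to classifier scores, the sketch below learns a slope `a` and intercept `b` by plain gradient descent on the log loss, so that sigmoid(a * score + b) can be read as a probability. This is a simplified stand-in: it omits refinements of the original procedure such as target smoothing, and the scores, labels, learning rate and function names are made-up assumptions.

```python
import math

def fit_platt(scores, labels, lr=0.1, steps=5000):
    """Fit p(y=1 | s) = sigmoid(a * s + b) by gradient descent on log loss."""
    a, b = 0.0, 0.0
    n = len(scores)
    for _ in range(steps):
        grad_a = grad_b = 0.0
        for s, y in zip(scores, labels):
            p = 1.0 / (1.0 + math.exp(-(a * s + b)))
            grad_a += (p - y) * s         # d(log loss)/da for this sample
            grad_b += (p - y)             # d(log loss)/db for this sample
        a -= lr * grad_a / n
        b -= lr * grad_b / n
    return a, b

def calibrate(score, a, b):
    """Map a raw classifier score to a probability estimate."""
    return 1.0 / (1.0 + math.exp(-(a * score + b)))

# Hypothetical held-out decision values (e.g. SVM margins) and true labels.
scores = [-2.0, -1.0, -0.5, 0.3, 0.8, 1.5, 2.5, -0.2]
labels = [0, 0, 1, 0, 1, 1, 1, 0]
a, b = fit_platt(scores, labels)
p_high = calibrate(2.5, a, b)
p_low = calibrate(-2.0, a, b)
```

In practice the calibration map is usually fitted on data not used to train the classifier, so that the probabilities are not biased by overfitted scores.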
Plug and Play Generative Networks (PPGN) 
Generating high-resolution, photo-realistic images has been a long-standing goal in machine learning. Recently, Nguyen et al. (2016) showed one interesting way to synthesize novel images by performing gradient ascent in the latent space of a generator network to maximize the activations of one or multiple neurons in a separate classifier network. In this paper we extend this method by introducing an additional prior on the latent code, improving both sample quality and sample diversity, leading to a state-of-the-art generative model that produces high quality images at higher resolutions (227×227) than previous generative models, and does so for all 1000 ImageNet categories. In addition, we provide a unified probabilistic interpretation of related activation maximization methods and call the general class of models ‘Plug and Play Generative Networks’. PPGNs are composed of 1) a generator network G that is capable of drawing a wide range of image types and 2) a replaceable ‘condition’ network C that tells the generator what to draw. We demonstrate the generation of images conditioned on a class (when C is an ImageNet or MIT Places classification network) and also conditioned on a caption (when C is an image captioning network). Our method also improves the state of the art of Multifaceted Feature Visualization, which generates the set of synthetic inputs that activate a neuron in order to better understand how deep neural networks operate. Finally, we show that our model performs reasonably well at the task of image inpainting. While image models are used in this paper, the approach is modality-agnostic and can be applied to many types of data. 
Plus L take away R (+L -R) 
The “Plus L take away R” (+L -R) algorithm is basically a combination of SFS and SBS. It appends features to the feature subset L times, and afterwards it removes features R times, until the feature subset reaches the desired size. Variant 1: L > R. If L > R, the algorithm starts with an empty feature subset and, in step 1, adds L features to it from the feature space. It then moves on to step 2, where it removes R features from the feature subset, after which it goes back to step 1 to add L features again. These steps are repeated until the feature subset reaches the desired size k. Variant 2: R > L. If R > L, the algorithm starts with the whole feature space as the feature subset. It removes R features from it before it adds back L features from those features that were just removed. These steps are repeated until the feature subset reaches the desired size k. 
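Variant 1 can be sketched as follows. The sketch assumes a user-supplied `score` function that rates a feature subset (higher is better), greedy single-feature additions and removals as in SFS and SBS, and a stopping check after every addition; these details and all names are illustrative assumptions rather than a reference implementation.

```python
def plus_l_minus_r(features, score, k, L=2, R=1):
    """Variant 1 (L > R): grow from the empty subset until it has k features."""
    if not (L > R >= 0) or not (0 < k <= len(features)):
        raise ValueError("need L > R >= 0 and 0 < k <= number of features")
    selected = []
    while True:
        # Step 1: add L features, each time the one that helps the score most.
        for _ in range(L):
            candidates = [f for f in features if f not in selected]
            best = max(candidates, key=lambda f: score(selected + [f]))
            selected.append(best)
            if len(selected) == k:
                return selected
        # Step 2: remove R features, each time the one whose removal hurts least.
        for _ in range(R):
            drop = max(selected,
                       key=lambda f: score([g for g in selected if g != f]))
            selected.remove(drop)

# Toy criterion: each feature has a fixed, additive usefulness (hypothetical).
weights = {"a": 5.0, "b": 4.0, "c": 3.0, "d": 1.0}
subset = plus_l_minus_r(list(weights), lambda fs: sum(weights[f] for f in fs), k=3)
```

With an additive toy criterion the result matches a plain top-k selection; the removal step only changes the outcome when features interact, which is the situation the method is designed for.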
Point and Figure Chart (P&F) 
Point and figure (P&F) is a charting technique used in technical analysis. Point and figure charting is unique in that it does not plot price against time as all other techniques do. Instead it plots price against changes in direction by plotting a column of Xs as the price rises and a column of Os as the price falls. rpnf 
Point Convolutional Neural Network (PCNN) 
This paper presents Point Convolutional Neural Networks (PCNN): a novel framework for applying convolutional neural networks to point clouds. The framework consists of two operators: extension and restriction, mapping point cloud functions to volumetric functions and vice versa. A point cloud convolution is defined by pull-back of the Euclidean volumetric convolution via an extension-restriction mechanism. The point cloud convolution is computationally efficient, invariant to the order of points in the point cloud, robust to different samplings and varying densities, and translation invariant, that is, the same convolution kernel is used at all points. PCNN generalizes image CNNs and allows readily adapting their architectures to the point cloud setting. In evaluations on three central point cloud learning benchmarks, PCNN convincingly outperforms competing point cloud learning methods, and the vast majority of methods working with more informative shape representations such as surfaces and/or normals. 
Point Linking Network (PLN) 
Object detection is a core problem in computer vision. With the development of deep ConvNets, the performance of object detectors has been dramatically improved. Deep ConvNet-based object detectors, e.g., Faster R-CNN, YOLO and SSD, mainly focus on regressing the coordinates of the bounding box. Unlike these methods, which consider the bounding box as a whole, we propose a novel object bounding box representation using points and links, implemented using deep ConvNets and termed Point Linking Network (PLN). Specifically, we regress the corner/center points of the bounding box and their links using a fully convolutional network; then we map the corner points and their links back to multiple bounding boxes; finally an object detection result is obtained by fusing the multiple bounding boxes. PLN is naturally robust to object occlusion and flexible to object scale variation and aspect ratio variation. In the experiments, PLN with the Inception-v2 model achieves state-of-the-art single-model and single-scale results on the PASCAL VOC 2007, the PASCAL VOC 2012 and the COCO detection benchmarks without bells and whistles. The source code will be released. 
Point Pattern Analysis (PPA) 
Point pattern analysis (PPA) is the study of the spatial arrangements of points in (usually 2-dimensional) space. A fundamental problem of PPA is inferring whether a given arrangement is merely random or the result of some process. selectspm,stpp 
Point Process  In statistics and probability theory, a point process is a type of random process for which any one realisation consists of a set of isolated points either in time or geographical space, or in even more general spaces. For example, the occurrence of lightning strikes might be considered as a point process in both time and geographical space if each is recorded according to its location in time and space. Point processes are well studied objects in probability theory and the subject of powerful tools in statistics for modeling and analyzing spatial data, which is of interest in such diverse disciplines as forestry, plant ecology, epidemiology, geography, seismology, materials science, astronomy, telecommunications, computational neuroscience, economics and others. Point processes on the real line form an important special case that is particularly amenable to study, because the different points are ordered in a natural way, and the whole point process can be described completely by the (random) intervals between the points. These point processes are frequently used as models for random events in time, such as the arrival of customers in a queue (queueing theory), of impulses in a neuron (computational neuroscience), particles in a Geiger counter, location of radio stations in a telecommunication network or of searches on the worldwide web. mmpp 
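The remark that a point process on the real line is fully described by the random intervals between its points can be illustrated with the simplest case, a homogeneous Poisson process, whose intervals are independent exponential variables. The rate, time horizon and function name below are illustrative assumptions.

```python
import random

def poisson_process(rate, t_end, rng):
    """Return the event times of a homogeneous Poisson process on (0, t_end]."""
    times, t = [], 0.0
    while True:
        t += rng.expovariate(rate)   # random interval to the next point
        if t > t_end:
            return times
        times.append(t)

arrivals = poisson_process(rate=2.0, t_end=1000.0, rng=random.Random(42))
intervals = [b - a for a, b in zip([0.0] + arrivals, arrivals)]
mean_interval = sum(intervals) / len(intervals)   # should be close to 1 / rate
```

Replacing the exponential draw with another positive distribution gives a renewal process, one way to model, for example, customers arriving in a queue with non-memoryless gaps.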
Pointer Network (Ptr-Net) 
We introduce a new neural architecture to learn the conditional probability of an output sequence with elements that are discrete tokens corresponding to positions in an input sequence. Such problems cannot be trivially addressed by existent approaches such as sequence-to-sequence and Neural Turing Machines, because the number of target classes in each step of the output depends on the length of the input, which is variable. Problems such as sorting variable sized sequences, and various combinatorial optimization problems belong to this class. Our model solves the problem of variable size output dictionaries using a recently proposed mechanism of neural attention. It differs from the previous attention attempts in that, instead of using attention to blend hidden units of an encoder to a context vector at each decoder step, it uses attention as a pointer to select a member of the input sequence as the output. We call this architecture a Pointer Net (Ptr-Net). We show Ptr-Nets can be used to learn approximate solutions to three challenging geometric problems (finding planar convex hulls, computing Delaunay triangulations, and the planar Travelling Salesman Problem) using training examples alone. Ptr-Nets not only improve over sequence-to-sequence with input attention, but also allow us to generalize to variable size output dictionaries. We show that the learnt models generalize beyond the maximum lengths they were trained on. We hope our results on these tasks will encourage a broader exploration of neural learning for discrete problems. 
Pointer Networks  Pointer networks are a variation of the sequence-to-sequence model with attention. Instead of translating one sequence into another, they yield a succession of pointers to the elements of the input series. The most basic use of this is ordering the elements of a variable-length sequence. Basic seq2seq is an LSTM encoder coupled with an LSTM decoder. It’s most often heard of in the context of machine translation: given a sentence in one language, the encoder turns it into a fixed-size representation. The decoder transforms this into a sentence again, possibly of different length than the source. For example, ‘como estas?’ (two words) would be translated to ‘how are you?’ (three words). The model gives better results when augmented with attention. Practically it means that instead of processing the input from start to finish, the decoder can look back and forth over the input. Specifically, it has access to encoder states from each step, not just the last one. Consider how it may help with Spanish, in which adjectives usually go after nouns: ‘neural network’ becomes ‘red neuronal’. In technical terms, attention (at least this particular kind, content-based attention) boils down to dot products and weighted averages. In short, a weighted average of encoder states becomes the decoder state. Attention is just the distribution of weights. 
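The "dot products and weighted averages" description can be sketched in a few lines. The vectors below are made-up stand-ins for a decoder query and three encoder states, and the function names are hypothetical; a trained model would learn these representations. Ordinary attention returns the weighted average, whereas a pointer network uses the weights themselves to select an input position.

```python
import math

def attention_weights(query, encoder_states):
    """Softmax over dot products between the query and each encoder state."""
    scores = [sum(q * h for q, h in zip(query, state)) for state in encoder_states]
    top = max(scores)                              # subtract max for stability
    exps = [math.exp(s - top) for s in scores]
    total = sum(exps)
    return [e / total for e in exps]

def context_vector(weights, encoder_states):
    """Weighted average of encoder states (what ordinary attention returns)."""
    dim = len(encoder_states[0])
    return [sum(w * state[i] for w, state in zip(weights, encoder_states))
            for i in range(dim)]

def pointer(weights):
    """Index of the input element with the largest attention weight."""
    return max(range(len(weights)), key=weights.__getitem__)

encoder_states = [[1.0, 0.0], [0.0, 1.0], [1.0, 1.0]]   # hypothetical states
query = [0.5, 2.0]                                       # hypothetical decoder state
weights = attention_weights(query, encoder_states)
context = context_vector(weights, encoder_states)
chosen = pointer(weights)
```

Because the weights always form a distribution over however many inputs there are, the output "vocabulary" automatically grows and shrinks with the input length.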
Pointwise Convolutional Neural Network  Deep learning with 3D data such as reconstructed point clouds and CAD models has received great research interest recently. However, the capability of using point clouds with convolutional neural networks has so far not been fully explored. In this technical report, we present a convolutional neural network for semantic segmentation and object recognition with 3D point clouds. At the core of our network is pointwise convolution, a convolution operator that can be applied at each point of a point cloud. Our fully convolutional network design, while being simple to implement, can yield competitive accuracy in both semantic segmentation and object recognition tasks. 
Poisson Autoregressive Models With Exogenous Covariates (PoARX) 
This paper introduces multivariate Poisson autoregressive models with exogenous covariates (PoARX) for modelling multivariate time series of counts. We obtain conditions for the PoARX process to be stationary and ergodic before proposing a computationally efficient procedure for estimation of parameters by the method of inference functions (IFM) and obtaining asymptotic normality of these estimators. Lastly, we demonstrate an application to count data for the number of people entering and exiting a building, and show how the different aspects of the model combine to produce a strong predictive model. We conclude by suggesting some further areas of application and by listing directions for future work. 
Poisson Factorization Machine (PFM) 
The newsroom in the online ecosystem is difficult to untangle. With the prevalence of social media, interactions between journalists and individuals become visible, but a lack of understanding of the inner workings of the information feedback loop in the public sphere leaves most journalists baffled. Can we provide an organized view that characterizes journalist behavior at the individual level to better understand the ecosystem? To this end, I propose the Poisson Factorization Machine (PFM), a Bayesian analogue to matrix factorization that assumes a Poisson distribution for the generative process. The model generalizes recent studies on Poisson Matrix Factorization to account for temporal interaction, which involves a tensor-like structure, and label information. Two inference procedures are designed, one based on batch variational EM and the other a stochastic variational inference scheme that scales efficiently with data size. An important novelty in this note is that I show how to stack layers of PFM to introduce a deep architecture. This work discusses some potential results of applying the model and explains how such latent factors may be useful for analyzing latent behaviors for data exploration. 
Poisson Regression  In statistics, Poisson regression is a form of regression analysis used to model count data and contingency tables. Poisson regression assumes the response variable Y has a Poisson distribution, and assumes the logarithm of its expected value can be modeled by a linear combination of unknown parameters. A Poisson regression model is sometimes known as a log-linear model, especially when used to model contingency tables. 
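Written out, the two assumptions give the following model. The symbols $x$ (covariate vector), $\theta$ (parameter vector) and $\mu$ (conditional mean) are notation chosen here for illustration; the last line is the log-likelihood for $n$ independent observations $(x_i, y_i)$, which is what maximum likelihood fitting maximizes.

```latex
\begin{align*}
\log \mathbb{E}[Y \mid x] &= \theta^{\top} x
  \quad\Longleftrightarrow\quad
  \mu(x) = \mathbb{E}[Y \mid x] = e^{\theta^{\top} x}, \\
P(Y = y \mid x) &= \frac{\mu(x)^{y}\, e^{-\mu(x)}}{y!}, \qquad y = 0, 1, 2, \dots, \\
\ell(\theta) &= \sum_{i=1}^{n} \Bigl( y_i\, \theta^{\top} x_i - e^{\theta^{\top} x_i} - \log y_i! \Bigr).
\end{align*}
```

Because the mean is the exponential of a linear term, increasing one covariate by one unit multiplies the expected count by the exponential of its coefficient.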
Polar Transformer Network (PTN) 
Convolutional neural networks (CNNs) are equivariant with respect to translation; a translation in the input causes a translation in the output. Attempts to generalize equivariance have concentrated on rotations. In this paper, we combine the idea of the spatial transformer, and the canonical coordinate representations of groups (polar transform) to realize a network that is invariant to translation, and equivariant to rotation and scale. A conventional CNN is used to predict the origin of a polar transform. The polar transform is performed in a differentiable way, similar to the Spatial Transformer Networks, and the resulting polar representation is fed into a second CNN. The model is trained end-to-end with a classification loss. We apply the method on variations of MNIST, obtained by perturbing it with clutter, translation, rotation, and scaling. We achieve state-of-the-art performance in the rotated MNIST, with fewer parameters and faster training time than previous methods, and we outperform all tested methods in the SIM2MNIST dataset, which we introduce. 
Polarity Detection  
Polyaxon  Deep Learning library for TensorFlow for building end-to-end models and experiments. Polyaxon was built with the following goals: · Modularity: The creation of a computation graph based on modular and understandable modules, with the possibility to reuse and share the module in subsequent usage. · Usability: Training a model should be easy enough, and should enable quick experimentation. · Configurable: Models and experiments can be created using a YAML/JSON file, but also in Python files. · Extensibility: The modularity and the extensive documentation of the code make it easy to build and extend the set of provided modules. · Performance: Polyaxon is based on the internal TensorFlow code base and leverages the built-in distributed learning. · Data Preprocessing: Polyaxon provides many pipelines and data processors to support different data inputs. Github ➘ “Tensorflow” 
Polyglot Persistence  Today, most large companies are using a variety of different data storage technologies for different kinds of data. A lot of companies still use relational databases to store some data, but the persistence needs of applications are evolving from predominantly relational to a mixture of data sources. Polyglot persistence is commonly used to define this hybrid approach. Increasingly, architects are approaching the data storage problem by first figuring out how they want to manipulate the data, and then choosing the appropriate technology to fit their needs. What polyglot persistence boils down to is choice – the ability to leverage multiple data storages, depending on your use cases. http://…/polyglotpersistence 
Polyglot Processing  … So here we are. Above observations motivate me to suggest a new term that aims to capture the shift of focus toward processing of the data: polyglot processing – which essentially means using the right processing engine for a given task. To the best of my knowledge no one has suggested or attempted to define this term yet, besides a somewhat related mentioning in the realm of the Apache Bigtop project, however in a much narrower context…. 
Polyglot Programming  Beyond being something incredibly difficult to say many times in a row, polyglot programming is the use of different programming languages, frameworks, services and databases for developing individual applications. 
Polygonal Symbolic Data Analysis  ➘ “Symbolic Data Analysis” psda 
PolygonRNN++  Manually labeling datasets with object masks is extremely time consuming. In this work, we follow the idea of PolygonRNN to produce polygonal annotations of objects interactively using humans-in-the-loop. We introduce several important improvements to the model: 1) we design a new CNN encoder architecture, 2) show how to effectively train the model with Reinforcement Learning, and 3) significantly increase the output resolution using a Graph Neural Network, allowing the model to accurately annotate high-resolution objects in images. Extensive evaluation on the Cityscapes dataset shows that our model, which we refer to as PolygonRNN++, significantly outperforms the original model in both automatic (10% absolute and 16% relative improvement in mean IoU) and interactive modes (requiring 50% fewer clicks by annotators). We further analyze the cross-domain scenario in which our model is trained on one dataset, and used out of the box on datasets from varying domains. The results show that PolygonRNN++ exhibits powerful generalization capabilities, achieving significant improvements over existing pixel-wise methods. Using simple online fine-tuning we further achieve a high reduction in annotation time for new datasets, moving a step closer towards an interactive annotation tool to be used in practice. 
Polypus  In this paper we propose a new parallel architecture based on Big Data technologies for real-time sentiment analysis on microblogging posts. Polypus is a modular framework that provides the following functionalities: (1) massive text extraction from Twitter, (2) distributed non-relational storage optimized for time range queries, (3) memory-based inter-module buffering, (4) real-time sentiment classification, (5) near real-time keyword sentiment aggregation in time series, (6) an HTTP API to interact with the Polypus cluster and (7) a web interface to analyze results visually. The whole architecture is self-deployable and based on Docker containers. 
Polytomous Discrimination Index (PDI) 
Polytomous Discrimination Index (PDI), described in the paper: Van Calster B (2012) <doi:10.1007/s1065401297333>. Jialiang Li (2017) <doi:10.1177/0962280217692830>. mcca 
Pomegranate  We present pomegranate, an open source machine learning package for probabilistic modeling in Python. Probabilistic modeling encompasses a wide range of methods that explicitly describe uncertainty using probability distributions. Three widely used probabilistic models implemented in pomegranate are general mixture models, hidden Markov models, and Bayesian networks. A primary focus of pomegranate is to abstract away the complexities of training models from their definition. This allows users to focus on specifying the correct model for their application instead of being limited by their understanding of the underlying algorithms. An aspect of this focus involves the collection of additive sufficient statistics from data sets as a strategy for training models. This approach trivially enables many useful learning strategies, such as out-of-core learning, minibatch learning, and semi-supervised learning, without requiring the user to consider how to partition data or modify the algorithms to handle these tasks themselves. pomegranate is written in Cython to speed up calculations and releases the global interpreter lock to allow for built-in multithreaded parallelism, making it competitive with, or faster than, other implementations of similar algorithms. This paper presents an overview of the design choices in pomegranate, and how they have enabled complex features to be supported by simple code. 
Pontryagin Maximum Principle  Pontryagin’s maximum (or minimum) principle is used in optimal control theory to find the best possible control for taking a dynamical system from one state to another, especially in the presence of constraints for the state or input controls. It was formulated in 1956 by the Russian mathematician Lev Pontryagin and his students. It has as a special case the Euler-Lagrange equation of the calculus of variations. The principle states, informally, that the control Hamiltonian must take an extreme value over controls in the set of all permissible controls. Whether the extreme value is a maximum or a minimum depends both on the problem and on the sign convention used for defining the Hamiltonian. The usual convention leads to a maximum (hence “maximum principle”), while the opposite sign convention makes the extreme value a minimum. 
Pool Adjacent Violators Algorithm (PAVA) 
Pool Adjacent Violators Algorithm (PAVA) is a linear time (and linear memory) algorithm for linear ordering isotonic regression. 
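The pooling idea can be sketched in a few lines of Python. This is a minimal illustration, not a reference implementation: the function name `pava` and the optional `weights` argument are choices made here. Values are scanned left to right, and whenever a block mean is larger than the mean of the block that follows it, the two blocks are merged into their weighted average.

```python
def pava(values, weights=None):
    """Least-squares non-decreasing fit to `values` (illustrative sketch)."""
    if weights is None:
        weights = [1.0] * len(values)
    blocks = []  # each block is [mean, total_weight, number_of_points]
    for value, weight in zip(values, weights):
        blocks.append([value, weight, 1])
        # Pool adjacent blocks while they violate the ordering constraint.
        while len(blocks) > 1 and blocks[-2][0] > blocks[-1][0]:
            mean2, w2, n2 = blocks.pop()
            mean1, w1, n1 = blocks.pop()
            total = w1 + w2
            blocks.append([(mean1 * w1 + mean2 * w2) / total, total, n1 + n2])
    fitted = []
    for mean, _, count in blocks:
        fitted.extend([mean] * count)
    return fitted


print(pava([1, 3, 2, 4]))  # the violating pair (3, 2) is pooled to 2.5
```

Each point is pushed once and each merge removes a block, which is where the linear running time comes from.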
Population Based Training (PBT) 
Neural networks dominate the modern machine learning landscape, but their training and success still suffer from sensitivity to empirical choices of hyperparameters such as model architecture, loss function, and optimisation algorithm. In this work we present Population Based Training (PBT), a simple asynchronous optimisation algorithm which effectively utilises a fixed computational budget to jointly optimise a population of models and their hyperparameters to maximise performance. Importantly, PBT discovers a schedule of hyperparameter settings rather than following the generally suboptimal strategy of trying to find a single fixed set to use for the whole course of training. With just a small modification to a typical distributed hyperparameter training framework, our method allows robust and reliable training of models. We demonstrate the effectiveness of PBT on deep reinforcement learning problems, showing faster wall-clock convergence and higher final performance of agents by optimising over a suite of hyperparameters. In addition, we show the same method can be applied to supervised learning for machine translation, where PBT is used to maximise the BLEU score directly, and also to training of Generative Adversarial Networks to maximise the Inception score of generated images. In all cases PBT results in the automatic discovery of hyperparameter schedules and model selection which results in stable training and better final performance. 
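A toy sketch of the exploit-and-explore loop may help make the idea concrete. Everything below is an assumption made for illustration: the "model" is a single parameter `theta` trained by gradient descent on `theta ** 2`, the only hyperparameter is the learning rate `lr`, the loop is synchronous rather than asynchronous, and the bottom quarter copies the top quarter. It is not the authors' implementation.

```python
import random


def pbt(pop_size=8, steps=60, ready_every=10, seed=0):
    """Toy population based training on the loss theta**2 (illustrative only)."""
    rng = random.Random(seed)
    # Each worker holds model weights (theta) and a hyperparameter (lr).
    population = [
        {"theta": 5.0, "lr": 10 ** rng.uniform(-4, -1)} for _ in range(pop_size)
    ]

    def loss(worker):
        return worker["theta"] ** 2

    for step in range(1, steps + 1):
        # Ordinary training step: gradient descent with each worker's own lr.
        for worker in population:
            worker["theta"] -= worker["lr"] * 2 * worker["theta"]
        if step % ready_every == 0:
            ranked = sorted(population, key=loss)
            quarter = max(1, pop_size // 4)
            for loser, winner in zip(ranked[-quarter:], ranked[:quarter]):
                loser["theta"] = winner["theta"]  # exploit: copy weights
                loser["lr"] = winner["lr"] * rng.choice([0.8, 1.2])  # explore
    return min(loss(worker) for worker in population)


print(pbt())
```

Because the learning rate of a worker changes each time it is replaced, the surviving workers follow a schedule of learning rates rather than one fixed value.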
Porcellio Scaber Algorithm (PSA) 
Bio-inspired algorithms have received a significant amount of attention in both academic and engineering societies. In this paper, based on the observation of two major survival rules of a species of woodlice, i.e., porcellio scaber, we design and propose an algorithm called the porcellio scaber algorithm (PSA) for solving optimization problems, including differentiable and non-differentiable ones as well as the case with local optima. Numerical results based on benchmark problems are presented to validate the efficacy of PSA. 
Porcupine Neural Network (PNN) 
Neural networks have been used prominently in several machine learning and statistics applications. In general, the underlying optimization of neural networks is non-convex, which makes their performance analysis challenging. In this paper, we take a novel approach to this problem by asking whether one can constrain neural network weights to make its optimization landscape have good theoretical properties while at the same time, be a good approximation for the unconstrained one. For two-layer neural networks, we provide affirmative answers to these questions by introducing Porcupine Neural Networks (PNNs) whose weight vectors are constrained to lie over a finite set of lines. We show that most local optima of PNN optimizations are global while we have a characterization of regions where bad local optimizers may exist. Moreover, our theoretical and empirical results suggest that an unconstrained neural network can be approximated using a polynomially-large PNN. 
Portmanteau Test  A portmanteau test is a type of statistical hypothesis test in which the null hypothesis is well specified, but the alternative hypothesis is more loosely specified. Tests constructed in this context can have the property of being at least moderately powerful against a wide range of departures from the null hypothesis. Thus, in applied statistics, a portmanteau test provides a reasonable way of proceeding as a general check of a model’s match to a dataset where there are many different ways in which the model may depart from the underlying data generating process. Use of such tests avoids having to be very specific about the particular type of departure being tested. 
PoseNet  The convolutional neural network (CNN) has demonstrated unique advantages in audio, image and text learning; recently it has also challenged Recurrent Neural Networks (RNNs) with long short-term memory cells (LSTM) in sequence-to-sequence learning, since the computations involved in CNN are easily parallelizable whereas those involved in RNN are mostly sequential, leading to a performance bottleneck. However, unlike RNN, the native CNN lacks the history sensitivity required for sequence transformation; therefore enhancing the sequential order awareness, or position-sensitivity, becomes the key to making CNN the general deep learning model. In this work we introduce an extended CNN model with strengthened position-sensitivity, called PoseNet. A notable feature of PoseNet is the asymmetric treatment of position information in the encoder and the decoder. Experiments show that PoseNet allows us to improve the accuracy of CNN-based sequence-to-sequence learning significantly, achieving around 33-36 BLEU scores on the WMT 2014 English-to-German translation task, and around 44-46 BLEU scores on the English-to-French translation task. 
Position, Sequence and Set Similarity Measure  In this paper the author presents a new similarity measure for strings of characters based on S3M, which he expands to take into account not only the character set and sequence but also their position. After demonstrating the superiority of this new measure and discussing the need for a self-adaptive spell checker, this work is further developed into an adaptive spell checker that produces a cluster with a defined number of words for each presented misspelled word. The accuracy of this solution is measured by comparing its results against the results of the most widely used spell checker. 
Possibilistic C-Means (PCM) 
PCM partitions an m-dimensional dataset into several clusters to describe an underlying structure within the data. A possibilistic partition is defined as a matrix whose entries give the membership value of each object towards the ith cluster … The Possibilistic C-Means Algorithm: Insights and Recommendations A Possibilistic Fuzzy c-Means Clustering Algorithm PCM and APCM Revisited: An Uncertainty Perspective 
Posterior Predictive Distribution  In statistics, and especially Bayesian statistics, the posterior predictive distribution is the distribution of unobserved observations (prediction) conditional on the observed data. It can be described as the distribution that a new i.i.d. data point $\tilde{x}$ would have, given a set of $N$ existing i.i.d. observations $\mathbf{X} = \{x_1, \dots, x_N\}$. In a frequentist context, this might be derived by computing the maximum likelihood estimate (or some other estimate) of the parameter(s) given the observed data, and then plugging them into the distribution function of the new observations. 
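The contrast between the Bayesian predictive and the plug-in approach can be shown with a small coin-flip example. The Beta(a, b) prior, the Bernoulli likelihood and both function names are assumptions chosen for this sketch because the integral then has a closed form: after k successes in n trials the posterior is Beta(a + k, b + n - k), and the predictive probability of a success on the next trial is its mean.

```python
from fractions import Fraction


def posterior_predictive_success(successes, trials, a=1, b=1):
    """P(next trial succeeds | data) under a Beta(a, b) prior."""
    # Mean of the Beta(a + successes, b + trials - successes) posterior.
    return Fraction(a + successes, a + b + trials)


def plug_in_success(successes, trials):
    """Plug the maximum likelihood estimate into the Bernoulli distribution."""
    return Fraction(successes, trials)


# Three successes in three trials.
print(plug_in_success(3, 3))               # 1
print(posterior_predictive_success(3, 3))  # 4/5
```

The plug-in answer treats the estimate as exact and predicts a success with certainty, while the posterior predictive still carries the remaining uncertainty about the parameter.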
Posterior Probability  In Bayesian statistics, the posterior probability of a random event or an uncertain proposition is the conditional probability that is assigned after the relevant evidence or background is taken into account. Similarly, the posterior probability distribution is the probability distribution of an unknown quantity, treated as a random variable, conditional on the evidence obtained from an experiment or survey. “Posterior”, in this context, means after taking into account the relevant evidence related to the particular case being examined. 
Posterior Sampling for Pure Exploration (PSPE) 
In several realistic situations, an interactive learning agent can practice and refine its strategy before going on to be evaluated. For instance, consider a student preparing for a series of tests. She would typically take a few practice tests to know which areas she needs to improve upon. Based on the scores she obtains in these practice tests, she would formulate a strategy for maximizing her scores in the actual tests. We treat this scenario in the context of an agent exploring a fixed-horizon episodic Markov Decision Process (MDP), where the agent can practice on the MDP for some number of episodes (not necessarily known in advance) before starting to incur regret for its actions. During practice, the agent’s goal must be to maximize the probability of following an optimal policy. This is akin to the problem of Pure Exploration (PE). We extend the PE problem of Multi-Armed Bandits (MAB) to MDPs and propose a Bayesian algorithm called Posterior Sampling for Pure Exploration (PSPE), which is similar to its bandit counterpart. We show that the Bayesian simple regret converges at an optimal exponential rate when using PSPE. When the agent starts being evaluated, its goal would be to minimize the cumulative regret incurred. This is akin to the problem of Reinforcement Learning (RL). The agent uses the Posterior Sampling for Reinforcement Learning algorithm (PSRL) initialized with the posteriors of the practice phase. We hypothesize that this PSPE + PSRL combination is an optimal strategy for minimizing regret in RL problems with an initial practice phase. We show empirical results which prove that having a lower simple regret at the end of the practice phase results in having lower cumulative regret during evaluation. 
Potential Confounding Factor (PCF) 

Potts Model  Potts model in Potts, R. B. (1952) <doi:10.1017/S0305004100027419> PottsUtils 
Power Linear Unit (PoLU) 
In this paper, we introduce ‘Power Linear Unit’ (PoLU) which increases the nonlinearity capacity of a neural network and thus helps improve its performance. PoLU adopts several advantages of previously proposed activation functions. First, the output of PoLU for positive inputs is designed to be identity to avoid the gradient vanishing problem. Second, PoLU has a non-zero output for negative inputs such that the output mean of the units is close to zero, hence reducing the bias shift effect. Third, there is a saturation on the negative part of PoLU, which makes it more noise-robust for negative inputs. Furthermore, we prove that PoLU is able to map more portions of every layer’s input to the same space by using the power function and thus increases the number of response regions of the neural network. We use image classification for comparing our proposed activation function with others. In the experiments, MNIST, CIFAR-10, CIFAR-100, Street View House Numbers (SVHN) and ImageNet are used as benchmark datasets. The neural networks we implemented include the widely-used ELU-Network, ResNet-50, and VGG16, plus a couple of shallow networks. Experimental results show that our proposed activation function outperforms other state-of-the-art models with most networks. 
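The three properties listed above can be illustrated with a small scalar function. The exact formula below, `(1 - x) ** (-n) - 1` for negative inputs with a power parameter `n`, is an assumption used here because it is the identity for positive inputs, non-zero for negative inputs and saturates at -1; the paper should be consulted for the authors' own definition.

```python
def polu(x, n=2.0):
    """Illustrative power-based activation; the negative branch is an assumed form."""
    if x >= 0:
        return x  # identity for non-negative inputs
    # Power function on the negative side: non-zero output, saturating at -1.
    return (1.0 - x) ** (-n) - 1.0


for value in (2.0, 0.0, -1.0, -10.0, -1000.0):
    print(value, polu(value))
```

Larger values of `n` make the negative branch approach its saturation level faster.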
Power Normal Distribution (PN) 
… PowerNormal 
Praaline  This paper presents Praaline, an open-source software system for managing, annotating, analysing and visualising speech corpora. Researchers working with speech corpora are often faced with multiple tools and formats, and they need to work with ever-increasing amounts of data in a collaborative way. Praaline integrates and extends existing time-proven tools for spoken corpora analysis (Praat, Sonic Visualiser and a bridge to the R statistical package) in a modular system, facilitating automation and reuse. Users are exposed to an integrated, user-friendly interface from which to access multiple tools. Corpus metadata and annotations may be stored in a database, locally or remotely, and users can define the metadata and annotation structure. Users may run a customisable cascade of analysis steps, based on plug-ins and scripts, and update the database with the results. The corpus database may be queried, to produce aggregated datasets. Praaline is extensible using Python or C++ plug-ins, while Praat and R scripts may be executed against the corpus data. A series of visualisations, editors and plug-ins are provided. Praaline is free software, released under the GPL license. 
Prais-Winsten Estimation  In econometrics, Prais-Winsten estimation is a procedure meant to take care of the serial correlation of type AR(1) in a linear model. Conceived by Sigbert Prais and Christopher Winsten in 1954, it is a modification of Cochrane-Orcutt estimation in the sense that it does not lose the first observation and leads to more efficiency as a result. prais 
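The difference from Cochrane-Orcutt is easiest to see in the data transformation itself. The sketch below assumes the AR(1) coefficient `rho` is already known (in practice it is estimated, usually iteratively); the function names are illustrative. Both methods quasi-difference observations 2 to T, but Prais-Winsten also keeps the first observation by rescaling it with sqrt(1 - rho^2). The same transformation is applied to the response and to every regressor before running ordinary least squares.

```python
import math


def prais_winsten_transform(series, rho):
    """Quasi-difference a series, keeping a rescaled first observation."""
    first = math.sqrt(1.0 - rho ** 2) * series[0]
    rest = [series[t] - rho * series[t - 1] for t in range(1, len(series))]
    return [first] + rest


def cochrane_orcutt_transform(series, rho):
    """Quasi-difference a series, dropping the first observation."""
    return [series[t] - rho * series[t - 1] for t in range(1, len(series))]


y = [1.0, 2.0, 3.0]
print(len(prais_winsten_transform(y, 0.6)))    # 3 observations kept
print(len(cochrane_orcutt_transform(y, 0.6)))  # 2 observations kept
```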
Preattentive Processing  Preattentive processing is the unconscious accumulation of information from the environment. All available information is preattentively processed. Then, the brain filters and processes what is important. Information that has the highest salience (a stimulus that stands out the most) or relevance to what a person is thinking about is selected for further and more complete analysis by conscious (attentive) processing. Understanding how preattentive processing works is useful in advertising, in education, and for prediction of cognitive ability. 
Precision  ➘ “Precision and Recall” 
Precision and Recall  In pattern recognition and information retrieval with binary classification, precision (also called positive predictive value) is the fraction of retrieved instances that are relevant, while recall (also known as sensitivity) is the fraction of relevant instances that are retrieved. Both precision and recall are therefore based on an understanding and measure of relevance. Suppose a program for recognizing dogs in scenes from a video identifies 7 dogs in a scene containing 9 dogs and some cats. If 4 of the identifications are correct, but 3 are actually cats, the program’s precision is 4/7 while its recall is 4/9. When a search engine returns 30 pages only 20 of which were relevant while failing to return 40 additional relevant pages, its precision is 20/30 = 2/3 while its recall is 20/60 = 1/3. In statistics, if the null hypothesis is that all and only the relevant items are retrieved, absence of type I and type II errors corresponds respectively to maximum precision (no false positive) and maximum recall (no false negative). The above pattern recognition example contained 7 – 4 = 3 type I errors and 9 – 4 = 5 type II errors. Precision can be seen as a measure of exactness or quality, whereas recall is a measure of completeness or quantity. In simple terms, high precision means that an algorithm returned substantially more relevant results than irrelevant, while high recall means that an algorithm returned most of the relevant results. http://…/recallprecision.pdf 
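The two worked examples in this entry can be checked with a few lines of Python. The helper name `precision_recall` and its argument names are illustrative; exact fractions are used so that the results match the figures quoted in the text.

```python
from fractions import Fraction


def precision_recall(true_pos, false_pos, false_neg):
    """Return (precision, recall) as exact fractions."""
    precision = Fraction(true_pos, true_pos + false_pos)
    recall = Fraction(true_pos, true_pos + false_neg)
    return precision, recall


# Dog recognizer: 7 identifications, 4 correct, 9 dogs in the scene.
dog_precision, dog_recall = precision_recall(true_pos=4, false_pos=3, false_neg=5)
print(dog_precision, dog_recall)  # 4/7 4/9

# Search engine: 30 pages returned, 20 relevant, 40 relevant pages missed.
web_precision, web_recall = precision_recall(true_pos=20, false_pos=10, false_neg=40)
print(web_precision, web_recall)  # 2/3 1/3
```

False positives correspond to the type I errors and false negatives to the type II errors mentioned in the entry.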
Preconditioned Stochastic Gradient Descent (PSGD) 
This paper studies the performance of preconditioned stochastic gradient descent (PSGD), which can be regarded as an enhanced stochastic Newton method with the ability to handle gradient noise and non-convexity at the same time. We have improved the implementation of PSGD, unveiled its relationship to equilibrated stochastic gradient descent (ESGD) and batch normalization, and provided a software package (https://…/psgd_tf) implemented in TensorFlow to compare variations of PSGD and stochastic gradient descent (SGD) on a wide range of benchmark problems with commonly used neural network models, e.g., convolutional and recurrent neural networks. Comparison results clearly demonstrate the advantages of PSGD in terms of convergence speed and generalization performance. 
Predicted Relevance Model (PRM) 
Evaluation of search engines relies on assessments of search results for selected test queries, from which we would ideally like to draw conclusions in terms of relevance of the results for general (e.g., future, unknown) users. In practice however, most evaluation scenarios only allow us to conclusively determine the relevance towards the particular assessor that provided the judgments. A factor that cannot be ignored when extending conclusions made from assessors towards users is the possible disagreement on relevance, assuming that a single gold truth label does not exist. This paper presents and analyzes the Predicted Relevance Model (PRM), which allows predicting a particular result’s relevance for a random user, based on an observed assessment and knowledge of the average disagreement between assessors. With the PRM, existing evaluation metrics designed to measure binary assessor relevance can be transformed into more robust and effectively graded measures that evaluate relevance towards a random user. It also leads to a principled way of quantifying multiple graded or categorical relevance levels for use as gains in established graded relevance measures, such as normalized discounted cumulative gain (nDCG), which nowadays often use heuristic and data-independent gain values. Given a set of test topics with graded relevance judgments, the PRM allows evaluating systems on different scenarios, such as their capability of retrieving top results, or how well they are able to filter out non-relevant ones. Its use in actual evaluation scenarios is illustrated on several information retrieval test collections. 
Prediction Advantage (PA) 
We introduce the Prediction Advantage (PA), a novel performance measure for prediction functions under any loss function (e.g., classification or regression). The PA is defined as the performance advantage relative to the Bayesian risk restricted to knowing only the distribution of the labels. We derive the PA for well-known loss functions, including 0/1 loss, cross-entropy loss, absolute loss, and squared loss. In the latter case, the PA is identical to the well-known R-squared measure, widely used in statistics. The use of the PA ensures meaningful quantification of prediction performance, which is not guaranteed, for example, when dealing with noisy imbalanced classification problems. We argue that among several known alternative performance measures, PA is the best (and only) quantity ensuring meaningfulness for all noise and imbalance levels. 
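A small numeric sketch may clarify the idea. It assumes the PA takes the form 1 - risk(model) / risk(baseline), where the baseline is the best predictor that knows only the label distribution; this form is an assumption made here, chosen because it reduces to R-squared under squared loss as the abstract states. The function names are illustrative, and the baseline risks are estimated from the same sample for simplicity.

```python
from collections import Counter


def prediction_advantage(risk_model, risk_baseline):
    """Assumed form of the PA: relative risk reduction over a label-only baseline."""
    return 1.0 - risk_model / risk_baseline


def pa_squared_loss(y_true, y_pred):
    """Baseline under squared loss: always predict the label mean."""
    n = len(y_true)
    mean_label = sum(y_true) / n
    risk_model = sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / n
    risk_baseline = sum((t - mean_label) ** 2 for t in y_true) / n
    return prediction_advantage(risk_model, risk_baseline)


def pa_zero_one_loss(y_true, y_pred):
    """Baseline under 0/1 loss: always predict the majority class."""
    n = len(y_true)
    majority_count = Counter(y_true).most_common(1)[0][1]
    risk_model = sum(t != p for t, p in zip(y_true, y_pred)) / n
    risk_baseline = (n - majority_count) / n
    return prediction_advantage(risk_model, risk_baseline)


print(pa_squared_loss([1, 2, 3, 4], [1.5, 2, 3, 3.5]))

# 90% accuracy on imbalanced labels, obtained by always predicting class 0.
labels = [0] * 9 + [1]
print(pa_zero_one_loss(labels, [0] * 10))
```

In the second example the accuracy is 90%, yet the advantage over the majority-class baseline is zero, which is the kind of imbalance effect the abstract refers to.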
Prediction Difference Analysis  This article presents the prediction difference analysis method for visualizing the response of a deep neural network to a specific input. When classifying images, the method highlights areas in a given input image that provide evidence for or against a certain class. It overcomes several shortcomings of previous methods and provides great additional insight into the decision making process of classifiers. Making neural network decisions interpretable through visualization is important both to improve models and to accelerate the adoption of black-box classifiers in application areas such as medicine. We illustrate the method in experiments on natural images (ImageNet data), as well as medical images (MRI brain scans). 
Prediction Interval  In statistical inference, specifically predictive inference, a prediction interval is an estimate of an interval in which future observations will fall, with a certain probability, given what has already been observed. Prediction intervals are often used in regression analysis. Prediction intervals are used in both frequentist statistics and Bayesian statistics: a prediction interval bears the same relationship to a future observation that a frequentist confidence interval or Bayesian credible interval bears to an unobservable population parameter: prediction intervals predict the distribution of individual future points, whereas confidence intervals and credible intervals of parameters predict the distribution of estimates of the true population mean or other quantity of interest that cannot be observed. Prediction Interval, the wider sister of Confidence Interval 
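As a worked case, assume an i.i.d. normal sample $X_1, \dots, X_n$ with unknown mean $\mu$ and variance (the normal model and the notation are assumptions of this example, not part of the entry). With sample mean $\bar{X}_n$, sample standard deviation $S_n$ and the Student-t quantile $t_{n-1,\,1-\alpha/2}$, the two intervals at level $1-\alpha$ are:

```latex
\begin{align*}
  \text{prediction interval for } X_{n+1}: \quad
    & \bar{X}_n \pm t_{n-1,\,1-\alpha/2}\, S_n \sqrt{1 + \frac{1}{n}} \\
  \text{confidence interval for } \mu: \quad
    & \bar{X}_n \pm t_{n-1,\,1-\alpha/2}\, S_n \sqrt{\frac{1}{n}}
\end{align*}
```

The extra 1 under the square root accounts for the variability of the new observation itself, which is why the prediction interval is always the wider of the two and does not shrink to zero as $n$ grows.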
PredictionIO  PredictionIO is an open source machine learning server for software developers to create predictive features, such as personalization, recommendation and content discovery. 
PredictionPerformancePlot  ROCR 
Predictive Analysis Library (PAL) 
The Predictive Analysis Library (PAL) defines functions that can be called from within SQLScript procedures to perform analytic algorithms. This release of PAL includes classic and universal predictive analysis algorithms in eight data-mining categories: · Clustering · Classification · Association · Time Series · Preprocessing · Statistics · Social Network Analysis · Miscellaneous 
Predictive Analytics / Predictive Analysis (PA) 
Predictive analytics encompasses a variety of statistical techniques from modeling, machine learning, and data mining that analyze current and historical facts to make predictions about future, or otherwise unknown, events. In business, predictive models exploit patterns found in historical and transactional data to identify risks and opportunities. Models capture relationships among many factors to allow assessment of risk or potential associated with a particular set of conditions, guiding decision making for candidate transactions. Predictive analytics is used in actuarial science, marketing, financial services, insurance, telecommunications, retail, travel, healthcare, pharmaceuticals and other fields. One of the most well-known applications is credit scoring, which is used throughout financial services. Scoring models process a customer’s credit history, loan application, customer data, etc., in order to rank-order individuals by their likelihood of making future credit payments on time. A well-known example is FICO. 
Predictive CLR  In this paper we explore different regression models based on Clusterwise Linear Regression (CLR). CLR aims to find the partition of the data into $k$ clusters, such that linear regressions fitted to each of the clusters minimize overall mean squared error on the whole data. The main obstacle to using the fitted regression models for prediction on unseen test points is the absence of a reasonable way to obtain CLR cluster labels when the values of the target variable are unknown. In this paper we propose two novel approaches to solve this problem. The first approach, predictive CLR, builds a separate classification model to predict test CLR labels. The second approach, constrained CLR, utilizes a set of user-specified constraints that enforce certain points to go to the same clusters. Assuming the constraint values are known for the test points, they can be directly used to assign CLR labels. We evaluate these two approaches on three UCI ML datasets as well as on a large corpus of health insurance claims. We show that both of the proposed algorithms significantly improve over the known CLR-based regression methods. Moreover, predictive CLR consistently outperforms linear regression and random forest, and shows comparable performance to support vector regression on UCI ML datasets. The constrained CLR approach achieves the best performance on the health insurance dataset, while enjoying only $\approx 20$ times increased computational time over linear regression. 
Predictive Maintenance (PdM) 
Predictive maintenance (PdM) techniques are designed to help determine the condition of in-service equipment in order to predict when maintenance should be performed. This approach promises cost savings over routine or time-based preventive maintenance, because tasks are performed only when warranted. The main promise of predictive maintenance is to allow convenient scheduling of corrective maintenance, and to prevent unexpected equipment failures. The key is ‘the right information in the right time’. By knowing which equipment needs maintenance, maintenance work can be better planned (spare parts, people, etc.) and what would have been ‘unplanned stops’ are transformed to shorter and fewer ‘planned stops’, thus increasing plant availability. Other potential advantages include increased equipment lifetime, increased plant safety, fewer accidents with negative impact on environment, and optimized spare parts handling. 
Predictive Model Markup Language (PMML) 
The Predictive Model Markup Language (PMML) is an XML-based file format developed by the Data Mining Group to provide a way for applications to describe and exchange models produced by data mining and machine learning algorithms. It supports common models such as logistic regression and feedforward neural networks. Since PMML is an XML-based standard, the specification comes in the form of an XML schema. http://…/pmmlrevolution.html 
Predictive Neural Network (PNN) 
Recurrent neural networks are a powerful means to cope with time series. We show that already linearly activated recurrent neural networks can approximate any time-dependent function f(t) given by a number of function values. The approximation can effectively be learned by simply solving a linear equation system; no backpropagation or similar methods are needed. Furthermore, the network size can be reduced by taking only the most relevant components of the network. Thus, in contrast to others, our approach not only learns network weights but also the network architecture. The networks have interesting properties: In the stationary case they end up in ellipse trajectories in the long run, and they allow the prediction of further values and compact representations of functions. We demonstrate this by several experiments, among them multiple superimposed oscillators (MSO) and robotic soccer. Predictive neural networks outperform the previous state-of-the-art for the MSO task with a minimal number of units. 
Predictive Personalization  Predictive personalization is defined as the ability to predict customer behavior, needs or wants – and tailor offers and communications very precisely. Social data is one source of providing this predictive analysis, particularly social data that is structured. Predictive personalization is a much more recent means of personalization and can be used well to augment current personalization offerings. 
Predictive Quality and Maintenance (PQM) 
PQM solutions, which harness data gathered by the Internet of Things (IoT) together with data from traditional legacy systems, focus on detecting and addressing quality and maintenance issues before they turn into serious problems, for example, problems that can cause unplanned downtime. 
Predictive State Recurrent Neural Networks (PSRNN) 
We present a new model, called Predictive State Recurrent Neural Networks (PSRNNs), for filtering and prediction in dynamical systems. PSRNNs draw on insights from both Recurrent Neural Networks (RNNs) and Predictive State Representations (PSRs), and inherit advantages from both types of models. Like many successful RNN architectures, PSRNNs use (potentially deeply composed) bilinear transfer functions to combine information from multiple sources, so that one source can act as a gate for another. These bilinear functions arise naturally from the connection to state updates in Bayes filters like PSRs, in which observations can be viewed as gating belief states. We show that PSRNNs can be learned effectively by combining backpropagation through time (BPTT) with an initialization based on a statistically consistent learning algorithm for PSRs called two-stage regression (2SR). We also show that PSRNNs can be factorized using tensor decomposition, reducing model size and suggesting interesting theoretical connections to existing multiplicative architectures such as LSTMs. We applied PSRNNs to 4 datasets, and showed that we outperform several popular alternative approaches to modeling dynamical systems in all cases. 
Predictive State Representation (PSR) 
In computer science, a predictive state representation (PSR) is a way to model the state of a controlled dynamical system from a history of actions taken and resulting observations. A PSR captures the state of a system as a vector of predictions for future tests (experiments) that can be done on the system. A test is a sequence of action-observation pairs, and its prediction is the probability of the test’s observation sequence happening if the test’s action sequence were to be executed on the system. One of the advantages of using a PSR is that the predictions are directly related to observable quantities. This is in contrast to other models of dynamical systems, such as partially observable Markov decision processes (POMDPs), where the state of the system is represented as a probability distribution over unobserved nominal states. 
Predictron  One of the key challenges of artificial intelligence is to learn models that are effective in the context of planning. In this document we introduce the predictron architecture. The predictron consists of a fully abstract model, represented by a Markov reward process, that can be rolled forward multiple ‘imagined’ planning steps. Each forward pass of the predictron accumulates internal rewards and values over multiple planning depths. The predictron is trained end-to-end so as to make these accumulated values accurately approximate the true value function. We applied the predictron to procedurally generated random mazes and a simulator for the game of pool. The predictron yielded significantly more accurate predictions than conventional deep neural network architectures. 
PredRNN++  We present PredRNN++, an improved recurrent network for video predictive learning. In pursuit of a greater spatiotemporal modeling capability, our approach increases the transition depth between adjacent states by leveraging a novel recurrent unit, named Causal LSTM, which reorganizes the spatial and temporal memories in a cascaded mechanism. However, there is still a dilemma in video predictive learning: increasingly deep-in-time models have been designed for capturing complex variations, while introducing more difficulties in the gradient backpropagation. To alleviate this undesirable effect, we propose a Gradient Highway architecture, which provides alternative shorter routes for gradient flows from outputs back to long-range inputs. This architecture works seamlessly with causal LSTMs, enabling PredRNN++ to capture short-term and long-term dependencies adaptively. We assess our model on both synthetic and real video datasets, showing its ability to ease the vanishing gradient problem and yield state-of-the-art prediction results even in a difficult object-occlusion scenario. 
Preference Mapping  Preference mapping makes it possible to build maps that are useful in a variety of domains. A preference map is a decision support tool for analyses where a configuration of objects has been obtained from a first analysis (PCA, MCA, MDS), and where a table with complementary data describing the objects is available (attributes or preference data). There are two types of preference mapping methods: (1) external preference mapping, or PREFMAP, and (2) internal preference mapping. SensoMineR 
Preferential Attachment (PA) 
A preferential attachment process is any of a class of processes in which some quantity, typically some form of wealth or credit, is distributed among a number of individuals or objects according to how much they already have, so that those who are already wealthy receive more than those who are not. ‘Preferential attachment’ is only the most recent of many names that have been given to such processes. They are also referred to under the names ‘Yule process’, ‘cumulative advantage’, ‘the rich get richer’, and, less correctly, the ‘Matthew effect’. They are also related to Gibrat’s law. The principal reason for scientific interest in preferential attachment is that it can, under suitable circumstances, generate power law distributions. 
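The 'rich get richer' mechanism described above can be illustrated with a minimal sketch. The function name `preferential_attachment` and its parameters are hypothetical, and the fixed population used here is a simplifying assumption: it shows the allocation rule only, whereas power-law behaviour generally needs further conditions (for example, new items arriving over time, as in a Yule process).

```python
import random

def preferential_attachment(n_items, n_rounds, seed=0):
    """Hand out n_rounds units one at a time; each unit goes to an item
    with probability proportional to what that item already holds."""
    rng = random.Random(seed)
    holdings = [1] * n_items  # every item starts with one unit so it can be chosen
    for _ in range(n_rounds):
        winner = rng.choices(range(n_items), weights=holdings, k=1)[0]
        holdings[winner] += 1
    return holdings

holdings = preferential_attachment(n_items=50, n_rounds=5000)
# Sorting shows how unevenly the units end up being distributed.
print(sorted(holdings, reverse=True)[:5])
```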
Pre-Partitioned Generalized Matrix-Vector Multiplication (PMV) 
How can we analyze enormous networks, including the Web and social networks, which have hundreds of billions of nodes and edges? Network analyses have been conducted by various graph mining methods including shortest path computation, PageRank, connected component computation, random walk with restart, etc. These graph mining methods can be expressed as generalized matrix-vector multiplication, which consists of a few operations inspired by typical matrix-vector multiplication. Recently, several graph processing systems based on matrix-vector multiplication or their own primitives have been proposed to deal with large graphs; however, they all have failed on Web-scale graphs due to insufficient memory space or the lack of consideration for I/O costs. In this paper, we propose PMV (Pre-partitioned generalized Matrix-Vector multiplication), a scalable distributed graph mining method based on generalized matrix-vector multiplication on distributed systems. PMV significantly decreases the communication cost, which is the main bottleneck of distributed systems, by partitioning the input graph in advance and judiciously applying execution strategies based on the density of the pre-partitioned sub-matrices. Experiments show that PMV succeeds in processing up to 16x larger graphs than existing distributed memory-based graph mining methods, and requires 9x less time than previous disk-based graph mining methods by reducing I/O costs significantly. 
Prescriptive Analytics  Prescriptive analytics not only anticipates what will happen and when it will happen, but also why it will happen. Further, prescriptive analytics suggests decision options on how to take advantage of a future opportunity or mitigate a future risk and shows the implication of each decision option. Prescriptive analytics can continually take in new data to repredict and represcribe, thus automatically improving prediction accuracy and prescribing better decision options. Prescriptive analytics ingests hybrid data, a combination of structured (numbers, categories) and unstructured data (videos, images, sounds, texts), and business rules to predict what lies ahead and to prescribe how to take advantage of this predicted future without compromising other priorities. 
PRESISTANT  Data preprocessing is one of the most time consuming and relevant steps in a data analysis process (e.g., classification task). A given data preprocessing operator (e.g., transformation) can have positive, negative or zero impact on the final result of the analysis. Expert users have the required knowledge to find the right preprocessing operators. However, when it comes to nonexperts, they are overwhelmed by the amount of preprocessing operators and it is challenging for them to find operators that would positively impact their analysis (e.g., increase the predictive accuracy of a classifier). Existing solutions either assume that users have expert knowledge, or they recommend preprocessing operators that are only ‘syntactically’ applicable to a dataset, without taking into account their impact on the final analysis. In this work, we aim at providing assistance to nonexpert users by recommending data preprocessing operators that are ranked according to their impact on the final analysis. We developed a tool PRESISTANT, that uses Random Forests to learn the impact of preprocessing operators on the performance (e.g., predictive accuracy) of 5 different classification algorithms, such as J48, Naive Bayes, PART, Logistic Regression, and Nearest Neighbor. Extensive evaluations on the recommendations provided by our tool, show that PRESISTANT can effectively help nonexperts in order to achieve improved results in their analytical tasks. 
PRESS  Nonlinear models are frequently applied to determine the optimal supply of natural gas to a given residential unit based on economic and technical factors, or used to fit nonlinear data from biochemical and pharmaceutical assays. In this article we propose PRESS statistics and prediction coefficients for a class of nonlinear beta regression models, namely $P^2$ statistics. We aim to use both prediction coefficients and goodness-of-fit measures as model selection criteria. In this sense, we introduce for beta regression models under nonlinearity the use of model selection criteria based on robust pseudo-$R^2$ statistics. Monte Carlo simulation results on the finite-sample behavior of both the prediction-based model selection criterion $P^2$ and the pseudo-$R^2$ statistics are provided. Three applications to real data are presented. The linear application relates to the distribution of natural gas for home usage in São Paulo, Brazil. Given the economic risk of overestimating or underestimating the distribution of gas, it was necessary to construct prediction limits and to select the best predicted and fitted model for constructing them; this is the aim of the first application. Additionally, the two nonlinear applications highlight the importance of considering both the goodness of prediction and the goodness of fit of the competing models. 
PRESTO  In query optimisation accurate cardinality estimation is essential for finding optimal query plans. It is especially challenging for RDF due to the lack of explicit schema and the excessive occurrence of joins in RDF queries. Existing approaches typically collect statistics based on the counts of triples and estimate the cardinality of a query as the product of its join components, where errors can accumulate even when the estimation of each component is accurate. As opposed to existing methods, we propose PRESTO, a cardinality estimation method that is based on the counts of subgraphs instead of triples and uses a probabilistic method to estimate cardinalities of RDF queries as a whole. PRESTO avoids some major issues of existing approaches and is able to accurately estimate arbitrary queries under a bound memory constraint. We evaluate PRESTO with YAGO and show that PRESTO is more accurate for both simple and complex queries. 
Pretty Quick Version of R (pqR) 
pqR is a new version of the R interpreter. It is based on R-2.15.0, distributed by the R Core Team (at r-project.org), but improves on it in many ways, mostly ways that speed it up, but also by implementing some new features and fixing some bugs. pqR is an open-source project licensed under the GPL. One notable improvement in pqR is that it is able to do some numeric computations in parallel with each other, and with other operations of the interpreter, on systems with multiple processors or processor cores. 
Price of Fairness (PoF) 
We introduce a flexible family of fairness regularizers for (linear and logistic) regression problems. These regularizers all enjoy convexity, permitting fast optimization, and they span the range from notions of group fairness to strong individual fairness. By varying the weight on the fairness regularizer, we can compute the efficient frontier of the accuracy-fairness trade-off on any given dataset, and we measure the severity of this trade-off via a numerical quantity we call the Price of Fairness (PoF). The centerpiece of our results is an extensive comparative study of the PoF across six different datasets in which fairness is a primary consideration. 
Primal-Dual Active-Set (PDAS) 
Isotonic regression (IR) is a non-parametric calibration method used in supervised learning. For performing large-scale IR, we propose a primal-dual active-set (PDAS) algorithm which, in contrast to the state-of-the-art Pool Adjacent Violators (PAV) algorithm, can be parallelized and is easily warm-started, and is thus well suited to online settings. We prove that, like the PAV algorithm, our PDAS algorithm for IR is convergent and has a work complexity of O(n), though our numerical experiments suggest that our PDAS algorithm is often faster than PAV. In addition, we propose PDAS variants (with safeguarding to ensure convergence) for solving related trend filtering (TF) problems, providing the results of experiments to illustrate their effectiveness. 
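The entry does not spell out the PDAS steps, so they are not reproduced here. For orientation, the PAV baseline it is compared against can be sketched as follows for the unweighted least-squares case; the function name `pav` is hypothetical and this is an illustration, not the authors' code.

```python
def pav(y):
    """Pool Adjacent Violators: least-squares non-decreasing fit to y."""
    blocks = []  # each block is [sum of values, number of values]
    for value in y:
        blocks.append([value, 1])
        # Merge backwards while the previous block's mean exceeds the last one's.
        while len(blocks) > 1 and blocks[-2][0] / blocks[-2][1] > blocks[-1][0] / blocks[-1][1]:
            s, c = blocks.pop()
            blocks[-1][0] += s
            blocks[-1][1] += c
    fit = []
    for s, c in blocks:
        fit.extend([s / c] * c)  # every point in a block gets the block mean
    return fit

print(pav([1, 3, 2, 4]))  # [1.0, 2.5, 2.5, 4.0]
```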
Primal-Dual Group Convolutional Neural Networks (PDGCNets) 
In this paper, we present a simple and modularized neural network architecture, named primal-dual group convolutional neural networks (PDGCNets). The main point lies in a novel building block, a pair of two successive group convolutions: primal group convolution and dual group convolution. The two group convolutions are complementary: (i) the convolution on each primal partition in primal group convolution is a spatial convolution, while on each dual partition in dual group convolution, the convolution is a point-wise convolution; (ii) the channels in the same dual partition come from different primal partitions. We discuss one representative advantage: the block is wider than a regular convolution while the number of parameters and the computation complexity are preserved. We also show that regular convolutions, group convolution with summation fusion (as used in ResNeXt), and the Xception block are special cases of primal-dual group convolutions. Empirical results over standard benchmarks (CIFAR-10, CIFAR-100, SVHN and ImageNet) demonstrate that our networks are more efficient in using parameters and computation complexity, with similar or higher accuracy. 
Prim’s Algorithm  In computer science, Prim’s algorithm is a greedy algorithm that finds a minimum spanning tree for a connected weighted undirected graph. This means it finds a subset of the edges that forms a tree that includes every vertex, where the total weight of all the edges in the tree is minimized. The algorithm was developed in 1930 by Czech mathematician Vojtěch Jarník and later independently by computer scientist Robert C. Prim in 1957 and rediscovered by Edsger Dijkstra in 1959. Therefore it is also sometimes called the DJP algorithm, the Jarník algorithm, or the PrimJarník algorithm. Other algorithms for this problem include Kruskal’s algorithm and Borůvka’s algorithm. These algorithms find the minimum spanning forest in a possibly disconnected graph. By running Prim’s algorithm for each connected component of the graph, it can also be used to find the minimum spanning forest. 
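A minimal sketch of the greedy step described above, using a binary heap for the frontier edges. The adjacency-list layout (`vertex -> list of (weight, neighbour)`) and the name `prim_mst` are assumptions made for the example.

```python
import heapq

def prim_mst(graph, start):
    """graph maps each vertex to a list of (weight, neighbour) pairs.
    Returns (total_weight, edges) for the component containing start."""
    visited = {start}
    frontier = [(w, start, v) for w, v in graph[start]]
    heapq.heapify(frontier)
    total, edges = 0, []
    while frontier and len(visited) < len(graph):
        w, u, v = heapq.heappop(frontier)  # cheapest edge leaving the tree
        if v in visited:
            continue  # stale entry: both endpoints are already in the tree
        visited.add(v)
        total += w
        edges.append((u, v, w))
        for w2, x in graph[v]:
            if x not in visited:
                heapq.heappush(frontier, (w2, v, x))
    return total, edges

example = {
    "a": [(1, "b"), (4, "c")],
    "b": [(1, "a"), (2, "c"), (5, "d")],
    "c": [(4, "a"), (2, "b"), (3, "d")],
    "d": [(5, "b"), (3, "c")],
}
total, edges = prim_mst(example, "a")
print(total, edges)  # 6 [('a', 'b', 1), ('b', 'c', 2), ('c', 'd', 3)]
```

Calling the function once per connected component, with a start vertex from each, gives the minimum spanning forest mentioned in the entry.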
Principal Component Analysis (PCA) 
Principal component analysis (PCA) is a statistical procedure that uses orthogonal transformation to convert a set of observations of possibly correlated variables into a set of values of linearly uncorrelated variables called principal components. The number of principal components is less than or equal to the number of original variables. This transformation is defined in such a way that the first principal component has the largest possible variance (that is, accounts for as much of the variability in the data as possible), and each succeeding component in turn has the highest variance possible under the constraint that it is orthogonal to (i.e., uncorrelated with) the preceding components. Principal components are guaranteed to be independent if the data set is jointly normally distributed. PCA is sensitive to the relative scaling of the original variables.
http://…roperapplicationsofprincipalcomponent 
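As a small illustration of the first principal component being the direction of largest variance, here is a sketch that uses only the standard library. It finds the leading eigenvector of the sample covariance matrix by power iteration, which is a technique chosen for brevity rather than how PCA is usually computed in practice; the function name and the toy data are hypothetical.

```python
import math

def first_principal_component(rows, iterations=200):
    """Return (unit direction, variance along it) of the first principal component."""
    n, d = len(rows), len(rows[0])
    means = [sum(r[j] for r in rows) / n for j in range(d)]
    centered = [[r[j] - means[j] for j in range(d)] for r in rows]
    # Sample covariance matrix of the centered data.
    cov = [[sum(c[i] * c[j] for c in centered) / (n - 1) for j in range(d)]
           for i in range(d)]
    v = [1.0] * d  # arbitrary start; must not be orthogonal to the answer
    for _ in range(iterations):
        w = [sum(cov[i][j] * v[j] for j in range(d)) for i in range(d)]
        norm = math.sqrt(sum(x * x for x in w))
        v = [x / norm for x in w]
    variance = sum(v[i] * cov[i][j] * v[j] for i in range(d) for j in range(d))
    return v, variance

# Toy data lying close to the line y = 2x.
data = [(1.0, 2.0), (2.0, 4.1), (3.0, 5.9), (4.0, 8.0), (5.0, 10.1)]
direction, variance = first_principal_component(data)
print(direction, variance)
```

Because PCA is sensitive to the relative scaling of the variables, rescaling one column of `data` changes the direction returned.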
Principal Component Pursuit (PCP) 
see section 1.2 ➘ “Robust Principal Component Analysis” 
Principal Component Regression (PCR) 
In statistics, principal component regression (PCR) is a regression analysis technique that is based on principal component analysis (PCA). Typically, it considers regressing the outcome (also known as the response or the dependent variable) on a set of covariates (also known as predictors, or explanatory variables, or independent variables) based on a standard linear regression model, but uses PCA for estimating the unknown regression coefficients in the model. In PCR, instead of regressing the dependent variable on the explanatory variables directly, the principal components of the explanatory variables are used as regressors. One typically uses only a subset of all the principal components for regression, thus making PCR some kind of a regularized procedure. Often the principal components with higher variances (the ones based on eigenvectors corresponding to the higher eigenvalues of the sample variancecovariance matrix of the explanatory variables) are selected as regressors. However, for the purpose of predicting the outcome, the principal components with low variances may also be important, in some cases even more important. One major use of PCR lies in overcoming the multicollinearity problem which arises when two or more of the explanatory variables are close to being collinear. PCR can aptly deal with such situations by excluding some of the lowvariance principal components in the regression step. In addition, by usually regressing on only a subset of all the principal components, PCR can result in dimension reduction through substantially lowering the effective number of parameters characterizing the underlying model. This can be particularly useful in settings with highdimensional covariates. Also, through appropriate selection of the principal components to be used for regression, PCR can lead to efficient prediction of the outcome based on the assumed model. Sketching for Principal Component Regression 
Principal Covariates Regression (PCovR) 
A method for multivariate regression is proposed that is based on the simultaneous least-squares minimization of Y residuals and X residuals by a number of orthogonal X components. By lending increasing weight to the X variables relative to the Y variables, the procedure moves from ordinary least-squares regression to principal component regression, forming a relatively simple alternative for continuum regression. PCovR 
Principal Differences Analysis (PDA) 
We introduce principal differences analysis (PDA) for analyzing differences between high-dimensional distributions. The method operates by finding the projection that maximizes the Wasserstein divergence between the resulting univariate populations. Relying on the Cramer-Wold device, it requires no assumptions about the form of the underlying distributions, nor the nature of their inter-class differences. A sparse variant of the method is introduced to identify features responsible for the differences. We provide algorithms for both the original minimax formulation as well as its semidefinite relaxation. In addition to deriving some convergence results, we illustrate how the approach may be applied to identify differences between cell populations in the somatosensory cortex and hippocampus as manifested by single-cell RNA-seq. Our broader framework extends beyond the specific choice of Wasserstein divergence. 
Principal Orthogonal ComplEment Thresholding (POET) 
Estimate large covariance matrices in approximate factor models by thresholding principal orthogonal complements. POET 
Principal Stratification Sensitivity Analyses  sensitivityPStrat 
Principal Variance Component Analysis (PVCA) 
‘Batch effects’ are often present in microarray data due to any number of factors, including, for example, a poor experimental design or the combination of gene expression data from different studies with limited standardization. To estimate the variability of experimental effects including batch, a novel hybrid approach known as principal variance component analysis (PVCA) has been developed. The approach leverages the strengths of two very popular data analysis methods: first, principal component analysis (PCA) is used to efficiently reduce data dimension while maintaining the majority of the variability in the data; then, variance components analysis (VCA) fits a mixed linear model using factors of interest as random effects to estimate and partition the total variability. The PVCA approach can be used as a screening tool to determine which sources of variability (biological, technical or other) are most prominent in a given microarray data set. Using the eigenvalues associated with their corresponding eigenvectors as weights, associated variations of all factors are standardized and the magnitude of each source of variability (including each batch effect) is presented as a proportion of total variance. Although PVCA is a generic approach for quantifying the corresponding proportion of variation of each effect, it can be a handy assessment for estimating batch effect before and after batch normalization. 
Prior Probability  In Bayesian statistical inference, a prior probability distribution, often called simply the prior, of an uncertain quantity p is the probability distribution that would express one’s uncertainty about p before some evidence is taken into account. For example, p could be the proportion of voters who will vote for a particular politician in a future election. It is meant to attribute uncertainty, rather than randomness, to the uncertain quantity. The unknown quantity may be a parameter or latent variable. One applies Bayes’ theorem, multiplying the prior by the likelihood function and then normalizing, to get the posterior probability distribution, which is the conditional distribution of the uncertain quantity, given the data. A prior is often the purely subjective assessment of an experienced expert. Some will choose a conjugate prior when they can, to make calculation of the posterior distribution easier. Parameters of prior distributions are called hyperparameters, to distinguish them from parameters of the model of the underlying data. 
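The voter example can be made concrete with a conjugate prior. In this sketch the prior on the proportion p is assumed to be a Beta(alpha, beta) distribution and the data are yes/no responses, so the posterior is again a Beta distribution; alpha and beta are the hyperparameters. The function names and the poll numbers are made up for illustration.

```python
def beta_binomial_update(alpha, beta, successes, failures):
    """Posterior Beta parameters after observing binomial data under a Beta prior."""
    return alpha + successes, beta + failures

def beta_mean(alpha, beta):
    """Mean of a Beta(alpha, beta) distribution."""
    return alpha / (alpha + beta)

prior = (2.0, 2.0)  # hypothetical prior, weakly centred on p = 0.5
posterior = beta_binomial_update(*prior, successes=30, failures=20)

print(beta_mean(*prior))      # 0.5
print(posterior)              # (32.0, 22.0)
print(beta_mean(*posterior))  # 32/54, about 0.593
```

With a conjugate prior the multiply-and-normalize step of Bayes' theorem reduces to adding counts to the hyperparameters, which is why such priors make the posterior easy to compute.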
PriorAware Dual Decomposition (PADD) 
Spectral topic modeling algorithms operate on matrices/tensors of word co-occurrence statistics to learn topic-specific word distributions. This approach removes the dependence on the original documents and produces substantial gains in efficiency and provable topic inference, but at a cost: the model can no longer provide information about the topic composition of individual documents. Recently, Thresholded Linear Inverse (TLI) was proposed to map the observed words of each document back to its topic composition. However, its linear characteristics limit the inference quality, because it does not consider the important prior information over topics. In this paper, we evaluate the Simple Probabilistic Inverse (SPI) method and a novel Prior-aware Dual Decomposition (PADD) that is capable of learning document-specific topic compositions in parallel. Experiments show that PADD successfully leverages topic correlations as a prior, notably outperforming TLI and learning quality topic compositions comparable to Gibbs sampling on various data. 
Priority Queue Training (PQT) 
We consider the task of program synthesis in the presence of a reward function over the output of programs, where the goal is to find programs with maximal rewards. We employ an iterative optimization scheme, where we train an RNN on a dataset of K best programs from a priority queue of the generated programs so far. Then, we synthesize new programs and add them to the priority queue by sampling from the RNN. We benchmark our algorithm, called priority queue training (or PQT), against genetic algorithm and reinforcement learning baselines on a simple but expressive Turing complete programming language called BF. Our experimental results show that our simple PQT algorithm significantly outperforms the baselines. By adding a program length penalty to the reward function, we are able to synthesize short, human readable programs. 
PrivacyPreserving Adversarial Network (PPAN) 
We propose a data-driven framework for optimizing privacy-preserving data release mechanisms toward the information-theoretically optimal tradeoff between minimizing distortion of useful data and concealing sensitive information. Our approach employs adversarially-trained neural networks to implement randomized mechanisms and to perform a variational approximation of mutual information privacy. We empirically validate our Privacy-Preserving Adversarial Networks (PPAN) framework with experiments conducted on discrete and continuous synthetic data, as well as the MNIST handwritten digits dataset. With the synthetic data, we find that our model-agnostic PPAN approach achieves tradeoff points very close to the optimal tradeoffs that are analytically derived from model knowledge. In experiments with the MNIST data, we visually demonstrate a learned tradeoff between minimizing the pixel-level distortion and concealing the written digit. 
Private Incremental Regression  Data is continuously generated by modern data sources, and a recent challenge in machine learning has been to develop techniques that perform well in an incremental (streaming) setting. In this paper, we investigate the problem of private machine learning, where as common in practice, the data is not given at once, but rather arrives incrementally over time. We introduce the problems of private incremental ERM and private incremental regression where the general goal is to always maintain a good empirical risk minimizer for the history observed under differential privacy. Our first contribution is a generic transformation of private batch ERM mechanisms into private incremental ERM mechanisms, based on a simple idea of invoking the private batch ERM procedure at some regular time intervals. We take this construction as a baseline for comparison. We then provide two mechanisms for the private incremental regression problem. Our first mechanism is based on privately constructing a noisy incremental gradient function, which is then used in a modified projected gradient procedure at every timestep. This mechanism has an excess empirical risk of $\approx\sqrt{d}$, where $d$ is the dimensionality of the data. While from the results of [Bassily et al. 2014] this bound is tight in the worstcase, we show that certain geometric properties of the input and constraint set can be used to derive significantly better results for certain interesting regression problems. 
Privileged MultiLabel Learning (PrML) 
This paper presents privileged multi-label learning (PrML) to explore and exploit the relationship between labels in multi-label learning problems. We suggest that each individual label can not only be implicitly connected with other labels via the low-rank constraint over label predictors, but its performance on examples can also receive explicit comments from the other labels, which together act as an Oracle teacher. We generate a privileged label feature for each example and its individual label, and then integrate it into the framework of low-rank based multi-label learning. The proposed algorithm can therefore comprehensively explore and exploit label relationships by inheriting all the merits of privileged information and low-rank constraints. We show that PrML can be efficiently solved by a dual coordinate descent algorithm using an iterative optimization strategy with cheap updates. Experiments on benchmark datasets show that through privileged label features, the performance can be significantly improved and PrML is superior to several competing methods in most cases. 
PrivyNet  Massive data exist among user local platforms that usually cannot support deep neural network (DNN) training due to computation and storage resource constraints. Cloudbased training schemes can provide beneficial services, but rely on excessive user data collection, which can lead to potential privacy risks and violations. In this paper, we propose PrivyNet, a flexible framework to enable DNN training on the cloud while protecting the data privacy simultaneously. We propose to split the DNNs into two parts and deploy them separately onto the local platforms and the cloud. The local neural network (NN) is used for feature extraction. To avoid local training, we rely on the idea of transfer learning and derive the local NNs by extracting the initial layers from pretrained NNs. We identify and compare three factors that determine the topology of the local NN, including the number of layers, the depth of output channels, and the subset of selected channels. We also propose a hierarchical strategy to determine the local NN topology, which is flexible to optimize the accuracy of the target learning task under the constraints on privacy loss, local computation, and storage. To validate PrivyNet, we use the convolutional NN (CNN) based image classification task as an example and characterize the dependency of privacy loss and accuracy on the local NN topology in detail. We also demonstrate that PrivyNet is efficient and can help explore and optimize the tradeoff between privacy loss and accuracy. 
Probabilistic Adaptive Computation Time  We present a probabilistic model with discrete latent variables that control the computation time in deep learning models such as ResNets and LSTMs. A prior on the latent variables expresses the preference for faster computation. The amount of computation for an input is determined via amortized maximum a posteriori (MAP) inference. MAP inference is performed using a novel stochastic variational optimization method. The recently proposed Adaptive Computation Time mechanism can be seen as an ad-hoc relaxation of this model. We demonstrate training using the general-purpose Concrete relaxation of discrete variables. Evaluation on ResNet shows that our method matches the speed-accuracy trade-off of Adaptive Computation Time, while allowing for evaluation with a simple deterministic procedure that has a lower memory footprint. 
Probabilistic Computing  The MIT Probabilistic Computing Project aims to build software and hardware systems that augment human and machine intelligence. We are currently focused on probabilistic programming. Probabilistic programming is an emerging field that draws on probability theory, programming languages, and systems programming to provide concise, expressive languages for modeling and generalpurpose inference engines that both humans and machines can use. Our research projects include BayesDB and Picture, domainspecific probabilistic programming platforms aimed at augmenting intelligence in the fields of data science and computer vision, respectively. BayesDB, which is open source and in use by organizations like the Bill & Melinda Gates Foundation and JPMorgan, lets users who lack statistics training understand the probable implications of data by writing queries in a simple, SQLlike language. Picture, a probabilistic language being developed in collaboration with Microsoft, lets users solve hard computer vision problems such as inferring 3D models of faces, human bodies and novel generic objects from single images by writing short (<50 line) computer graphics programs that generate and render random scenes. Unlike bottomup vision algorithms, Picture programs build on prior knowledge about scene structure and produce complete 3D wireframes that people can manipulate using ordinary graphics software. The core platform for our research is Venture, an interactive platform suitable for teaching and applications in fields ranging from statistics to robotics. ➚ “BayesDB” 
Probabilistic Conditional Preference Network (PCP-net) 
In order to represent the preferences of a group of individuals, we introduce Probabilistic CP-nets (PCP-nets). PCP-nets provide a compact language for representing probability distributions over preference orderings. We argue that they are useful for aggregating preferences or modelling noisy preferences. Then we give efficient algorithms for the main reasoning problems, namely for computing the probability that a given outcome is preferred to another one, and the probability that a given outcome is optimal. As a by-product, we obtain an unexpected linear-time algorithm for checking dominance in a standard, tree-structured CP-net. 
Probabilistic Data Structure  Probabilistic data structures are a group of data structures that are extremely useful for big data and streaming applications. Generally speaking, these data structures use hash functions to randomize and compactly represent a set of items. Collisions are ignored, but errors can be kept below a chosen threshold. Compared with error-free approaches, these structures use much less memory and have constant query time. They usually support union and intersection operations and therefore can be easily parallelized. http://…/Category:Probabilistic_data_structures 
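As an illustration of the idea above, here is a minimal sketch of one such structure, a Bloom filter. The class name `BloomFilter`, its parameters, and the use of salted SHA-256 to derive bit positions are assumptions made for this example, not something the entry prescribes. Lookups can return false positives (from hash collisions) but never false negatives, and the union of two filters with identical parameters is a bitwise OR.

```python
import hashlib


class BloomFilter:
    """Illustrative Bloom filter: a bit array plus several hash functions."""

    def __init__(self, num_bits=1024, num_hashes=3):
        self.num_bits = num_bits
        self.num_hashes = num_hashes
        self.bits = 0  # a Python int serves as the bit array

    def _positions(self, item):
        # Derive num_hashes bit positions by salting one hash with an index
        # (an assumption of this sketch; real libraries use faster hashes).
        for i in range(self.num_hashes):
            digest = hashlib.sha256(f"{i}:{item}".encode()).digest()
            yield int.from_bytes(digest[:8], "big") % self.num_bits

    def add(self, item):
        for pos in self._positions(item):
            self.bits |= 1 << pos

    def __contains__(self, item):
        # True means "possibly present"; False means "definitely absent".
        return all((self.bits >> pos) & 1 for pos in self._positions(item))

    def union(self, other):
        # Only meaningful when both filters share the same parameters.
        merged = BloomFilter(self.num_bits, self.num_hashes)
        merged.bits = self.bits | other.bits
        return merged


left, right = BloomFilter(), BloomFilter()
left.add("apple")
right.add("pear")
both = left.union(right)
```

Memory use is fixed by `num_bits` regardless of how many items are added; the false-positive rate grows as the bit array fills.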
Probabilistic D-Clustering  We present a new iterative method for probabilistic clustering of data. Given clusters, their centers and the distances of data points from these centers, the probability of cluster membership at any point is assumed inversely proportional to the distance from (the center of) the cluster in question. This assumption is our working principle. The method is a generalization, to several centers, of the Weiszfeld method for solving the Fermat-Weber location problem. At each iteration, the distances (Euclidean, Mahalanobis, etc.) from the cluster centers are computed for all data points, and the centers are updated as convex combinations of these points, with weights determined by the above principle. Computations stop when the centers stop moving. Progress is monitored by the joint distance function, a measure of distance from all cluster centers, that evolves during the iterations, and captures the data in its low contours. The method is simple, fast (requiring a small number of cheap iterations) and insensitive to outliers. 
Probabilistic Dependency Networks  ➚ “Dependency Network” 
Probabilistic Distance Clustering (PD-clustering) 
Probabilistic distance clustering (PD-clustering) is an iterative, distribution-free, probabilistic clustering method. PD-clustering assigns units to a cluster according to their probability of membership, under the constraint that the product of the probability and the distance of each point to any cluster centre is a constant. PD-clustering is a flexible method that can be used with non-spherical clusters, outliers, or noisy data. Factor PD-clustering (FPDC) is a recently proposed factor clustering method that involves a linear transformation of variables and a clustering that optimizes the PD-clustering criterion. It allows clustering of high-dimensional data sets. 
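The constraint above (probability times distance is the same constant for every cluster, with the probabilities summing to one) determines the membership probabilities of a point once its distances to the centres are known: each probability is proportional to the inverse distance. The following is a minimal sketch of that single step only; the function name `membership_probabilities` is hypothetical, and the centre-update step of the full method is not shown.

```python
def membership_probabilities(distances):
    """Membership probabilities for one point, given its distances to each centre.

    Solves p_k * d_k = constant with sum(p_k) = 1, i.e. p_k is proportional
    to 1 / d_k. A point lying exactly on one or more centres is shared
    equally among those centres.
    """
    zeros = [d == 0 for d in distances]
    if any(zeros):
        hits = sum(zeros)
        return [1.0 / hits if z else 0.0 for z in zeros]
    inverse = [1.0 / d for d in distances]
    total = sum(inverse)
    return [w / total for w in inverse]


probs = membership_probabilities([1.0, 3.0])
```

For distances 1 and 3 the probabilities are 0.75 and 0.25, and the product of probability and distance is 0.75 for both clusters.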
Probabilistic Eigenvalue Shaping (PES) 
We consider a nonlinear Fourier transform (NFT)-based transmission scheme, where data is embedded into the imaginary part of the nonlinear discrete spectrum. Inspired by probabilistic amplitude shaping, we propose a probabilistic eigenvalue shaping (PES) scheme as a means to increase the data rate of the system. We exploit the fact that for an NFT-based transmission scheme the pulses in the time domain are of unequal duration by transmitting them with a dynamic symbol interval and find a capacity-achieving distribution. The PES scheme shapes the information symbols according to the capacity-achieving distribution and transmits them together with the parity symbols at the output of a low-density parity-check encoder, suitably modulated, via time-sharing. We furthermore derive an achievable rate for the proposed PES scheme. We verify our results with simulations of the discrete-time model as well as with split-step Fourier simulations. 
Probabilistic Event Calculus (PEC) 
We present PEC, an Event Calculus (EC) style action language for reasoning about probabilistic causal and narrative information. It has an action language style syntax similar to that of the EC variant Modular-E. Its semantics is given in terms of possible worlds which constitute possible evolutions of the domain, and builds on that of EFEC, an epistemic extension of EC. We also describe an ASP implementation of PEC and show the sense in which this is sound and complete. 
Probabilistic Generative Adversarial Network (PGAN) 
We introduce the Probabilistic Generative Adversarial Network (PGAN), a new GAN variant based on a new kind of objective function. The central idea is to integrate a probabilistic model (a Gaussian Mixture Model, in our case) into the GAN framework which supports a new kind of loss function (based on likelihood rather than classification loss), and at the same time gives a meaningful measure of the quality of the outputs generated by the network. Experiments with MNIST show that the model learns to generate realistic images, and at the same time computes likelihoods that are correlated with the quality of the generated images. We show that PGAN is better able to cope with instability problems that are usually observed in the GAN training procedure. We investigate this from three aspects: the probability landscape of the discriminator, gradients of the generator, and the perfect discriminator problem. 
Probabilistic Graphical Model (PGM) 
Uncertainty is unavoidable in real-world applications: we can almost never predict with certainty what will happen in the future, and even in the present and the past, many important aspects of the world are not observed with certainty. Probability theory gives us the basic foundation to model our beliefs about the different possible states of the world, and to update these beliefs as new evidence is obtained. These beliefs can be combined with individual preferences to help guide our actions, and even in selecting which observations to make. While probability theory has existed since the 17th century, our ability to use it effectively on large problems involving many interrelated variables is fairly recent, and is due largely to the development of a framework known as Probabilistic Graphical Models (PGMs). This framework, which spans methods such as Bayesian networks and Markov random fields, uses ideas from discrete data structures in computer science to efficiently encode and manipulate probability distributions over high-dimensional spaces, often involving hundreds or even many thousands of variables. These methods have been used in an enormous range of application domains, which include: web search, medical and fault diagnosis, image understanding, reconstruction of biological networks, speech recognition, natural language processing, decoding of messages sent over a noisy communication channel, robot navigation, and many more. ➚ “Graphical Model” 
Probabilistic Latent Feature Models  Probabilistic Latent Feature Models assume that objects and attributes can be represented as a set of binary latent features and that the strength of object-attribute associations can be explained as a non-compensatory (e.g., disjunctive or conjunctive) mapping of latent features. plfm 
Probabilistic Latent Semantic Analysis (PLSA) 
Probabilistic latent semantic analysis (PLSA), also known as probabilistic latent semantic indexing (PLSI, especially in information retrieval circles), is a statistical technique for the analysis of two-mode and co-occurrence data. In effect, one can derive a low-dimensional representation of the observed variables in terms of their affinity to certain hidden variables, just as in latent semantic analysis. PLSA evolved from latent semantic analysis. Compared to standard latent semantic analysis, which stems from linear algebra and downsizes the occurrence tables (usually via a singular value decomposition), probabilistic latent semantic analysis is based on a mixture decomposition derived from a latent class model. 
Probabilistic Learning in Control (PILCO) 
Synthesizing Neural Network Controllers with Probabilistic Model based Reinforcement Learning 
Probabilistic Metric Space  A probabilistic metric space is a generalization of metric spaces where the distance is no longer valued in nonnegative real numbers, but instead is valued in distribution functions. 
Probabilistic Neural Network (PNN) 
A probabilistic neural network (PNN) is a feedforward neural network, which was derived from the Bayesian network and a statistical algorithm called Kernel Fisher discriminant analysis. It was introduced by D.F. Specht in the early 1990s. In a PNN, the operations are organized into a multilayered feedforward network with four layers: · Input layer · Hidden layer · Pattern layer/Summation layer · Output layer 
Probabilistic Neural Programs  We present probabilistic neural programs, a framework for program induction that permits flexible specification of both a computational model and inference algorithm while simultaneously enabling the use of deep neural networks. Probabilistic neural programs combine a computation graph for specifying a neural network with an operator for weighted nondeterministic choice. Thus, a program describes both a collection of decisions as well as the neural network architecture used to make each one. We evaluate our approach on a challenging diagram question answering task where probabilistic neural programs correctly execute nearly twice as many programs as a baseline model. 
Probabilistic Partial Least Squares (PPLS) 
With a rapid increase in volume and complexity of data sets there is a need for methods that can extract useful information in these data sets. Dimension reduction approaches such as Partial least squares (PLS) are increasingly being utilized for finding relationships between two data sets. However these methods often lack a probabilistic formulation, hampering development of more flexible models. Moreover dimension reduction methods in general suffer from identifiability problems, causing difficulties in combining and comparing results from multiple studies. We propose Probabilistic PLS (PPLS) as an extension of PLS to model the overlap between two data sets. The likelihood formulation provides opportunities to address issues typically present in data, such as missing entries and heterogeneity between subjects. We show that the PPLS parameters are identifiable up to sign. We derive Maximum Likelihood estimators that respect the identifiability conditions by using an EM algorithm with a constrained optimization in the M step. A simulation study is conducted and we observe a good performance of the PPLS estimates in various scenarios, when compared to PLS estimates. Most notably the estimates seem to be robust against departures from normality. To illustrate the PPLS model, we apply it to real IgG glycan data from two cohorts. We infer the contributions of each variable to the correlated part and observe very similar behavior across cohorts. 
Probabilistic Programming  A probabilistic programming language is a high-level language that makes it easy for a developer to define probability models and then ‘solve’ these models automatically. These languages incorporate random events as primitives and their runtime environment handles inference. Building a model thus becomes a matter of programming, which enables a clean separation between modeling and inference. This can vastly reduce the time and effort associated with implementing new models and understanding data. Just as high-level programming languages transformed developer productivity by abstracting away the details of the processor and memory architecture, probabilistic languages promise to free the developer from the complexities of high-performance probabilistic inference. ProbabilisticProgramming.org 
Probabilistic Programming for Advancing Machine Learning (PPAML) 
Machine learning – the ability of computers to understand data, manage results and infer insights from uncertain information – is the force behind many recent revolutions in computing. Email spam filters, smartphone personal assistants and self-driving vehicles are all based on research advances in machine learning. Unfortunately, even as the demand for these capabilities is accelerating, every new application requires a Herculean effort. Teams of hard-to-find experts must build expensive, custom tools that are often painfully slow and can perform unpredictably against large, complex data sets. The Probabilistic Programming for Advancing Machine Learning (PPAML) program aims to address these challenges. Probabilistic programming is a new programming paradigm for managing uncertain information. Using probabilistic programming languages, PPAML seeks to greatly increase the number of people who can successfully build machine learning applications and make machine learning experts radically more effective. Moreover, the program seeks to create more economical, robust and powerful applications that need less data to produce more accurate results – features inconceivable with today’s technology. http://…/wiki 
Probabilistic Programming Language (PPL) 
A probabilistic programming language (PPL) is a programming language designed to describe probabilistic models and then perform inference in those models. PPLs are closely related to graphical models and Bayesian networks, but are more expressive and flexible. Probabilistic programming represents an attempt to ‘unify general purpose programming with probabilistic modeling.’ Probabilistic reasoning is a foundational technology of machine learning. It is used by companies such as Google, Amazon.com and Microsoft. Probabilistic reasoning has been used for predicting stock prices, recommending movies, diagnosing computers, detecting cyber intrusions and image detection. PPLs often extend from a basic language. The choice of underlying basic language depends on the similarity of the model to the basic language’s ontology, as well as commercial considerations and personal preference. For instance, Dimple and Chimple are based on Java, Infer.NET is based on the .NET framework, while PRISM extends from Prolog. However, some PPLs such as WinBUGS and Stan offer a self-contained language, with no obvious origin in another language. Several PPLs are in active development, including some in beta test. 
Probabilistic Record Linkage  Probabilistic record linkage (PRL) is the process of determining which records in two databases correspond to the same underlying entity in the absence of a unique identifier. Bayesian solutions to this problem provide a powerful mechanism for propagating uncertainty due to uncertain links between records (via the posterior distribution). However, computational considerations severely limit the practical applicability of existing Bayesian approaches. We propose a new computational approach, providing both a fast algorithm for deriving point estimates of the linkage structure that properly account for one-to-one matching and a restricted MCMC algorithm that samples from an approximate posterior distribution. Our advances make it possible to perform Bayesian PRL for larger problems, and to assess the sensitivity of results to varying prior specifications. We demonstrate the methods on simulated data and an application to a post-enumeration survey for coverage estimation in the Italian census. 
Probabilistic Supervised Learning  Predictive modelling and supervised learning are central to modern data science. With predictions from an ever-expanding number of supervised black-box strategies – e.g., kernel methods, random forests, deep learning aka neural networks – being employed as a basis for decision making processes, it is crucial to understand the statistical uncertainty associated with these predictions. As a general means to approach the issue, we present an overarching framework for black-box prediction strategies that not only predict the target but also their own predictions’ uncertainty. Moreover, the framework allows for fair assessment and comparison of disparate prediction strategies. For this, we formally consider strategies capable of predicting full distributions from feature variables, so-called probabilistic supervised learning strategies. Our work draws from prior work including Bayesian statistics, information theory, and modern supervised machine learning, and in a novel synthesis leads to (a) new theoretical insights such as a probabilistic bias-variance decomposition and an entropic formulation of prediction, as well as to (b) new algorithms and meta-algorithms, such as composite prediction strategies, probabilistic boosting and bagging, and a probabilistic predictive independence test. Our black-box formulation also leads (c) to a new modular interface view on probabilistic supervised learning and a modelling workflow API design, which we have implemented in the newly released skpro machine learning toolbox, extending the familiar modelling interface and meta-modelling functionality of sklearn. The skpro package provides interfaces for construction, composition, and tuning of probabilistic supervised learning strategies, together with orchestration features for validation and comparison of any such strategy – be it frequentist, Bayesian, or other. 
Probability Collectives (PC) 
Probability Collectives is a broad framework for analyzing and controlling distributed systems. It is based on deep formal connections relating game theory, statistical physics, and distributed control/optimization. http://…/Library http://…/9783319159997 
Probability Density Function (PDF) 
In probability theory, a probability density function (pdf), or density of a continuous random variable, is a function that describes the relative likelihood for this random variable to take on a given value. The probability of the random variable falling within a particular range of values is given by the integral of this variable’s density over that range – that is, it is given by the area under the density function but above the horizontal axis and between the lowest and greatest values of the range. The probability density function is nonnegative everywhere, and its integral over the entire space is equal to one. 
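To make the "area under the density" statement concrete, the sketch below integrates a density numerically over a range. It assumes the standard normal density from Python's `statistics.NormalDist` as the example distribution and a simple trapezoidal rule; the helper name `probability_in_range` is hypothetical.

```python
from statistics import NormalDist


def probability_in_range(pdf, low, high, steps=10_000):
    """Approximate P(low <= X <= high) as the area under pdf (trapezoidal rule)."""
    width = (high - low) / steps
    total = 0.5 * (pdf(low) + pdf(high))
    for i in range(1, steps):
        total += pdf(low + i * width)
    return total * width


standard_normal = NormalDist(mu=0.0, sigma=1.0)
# Probability that a standard normal variable falls within one standard deviation.
within_one_sigma = probability_in_range(standard_normal.pdf, -1.0, 1.0)
# The same probability from the cumulative distribution function, for comparison.
reference = standard_normal.cdf(1.0) - standard_normal.cdf(-1.0)
```

Both values come out close to 0.6827. The density value at a single point, such as `standard_normal.pdf(0.0)`, is not itself a probability.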
Probability Mass Function (PMF) 
In probability theory and statistics, a probability mass function (pmf) is a function that gives the probability that a discrete random variable is exactly equal to some value. The probability mass function is often the primary means of defining a discrete probability distribution, and such functions exist for either scalar or multivariate random variables whose domain is discrete. A probability mass function differs from a probability density function (pdf) in that the latter is associated with continuous rather than discrete random variables; the values of the latter are not probabilities as such: a pdf must be integrated over an interval to yield a probability. Ake 
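A small worked example may help here: the probability mass function of the sum of two fair six-sided dice, built by counting outcomes. The dice setting and the function name `two_dice_pmf` are assumptions chosen for illustration.

```python
from collections import Counter
from fractions import Fraction
from itertools import product


def two_dice_pmf():
    """Map each possible sum of two fair dice to its exact probability."""
    counts = Counter(a + b for a, b in product(range(1, 7), repeat=2))
    return {total: Fraction(n, 36) for total, n in counts.items()}


pmf = two_dice_pmf()
```

Unlike a density, each value of the pmf is directly a probability: `pmf[7]` is 1/6, and the values over all eleven possible sums add up to exactly 1.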
Probability of Default (PD) 
Probability of default (PD) is a financial term describing the likelihood of a default over a particular time horizon. It provides an estimate of the likelihood that a borrower will be unable to meet its debt obligations. PD is used in a variety of credit analyses and risk management frameworks. Under Basel II, it is a key parameter used in the calculation of economic capital or regulatory capital for a banking institution. LDPD 
Probability of Default Calibration  ➘ “Probability of Default” LDPD 
Probability of Exceedance (POE) 
The ‘probability of exceedance’ curves give the forecast probability that a temperature or precipitation quantity, shown on the horizontal axis, will be exceeded at the location in question, for the given season at the given lead time. http://…5868_calculateexceedanceprobability.htm https://…/risk.pdf http://…/INTR.html http://…/Cumulative_frequency_analysis http://…/poe_index.php?lead=1&var=t http://…/Survival_function 
Probability of Informed Trading (PIN) 
Introduced by Easley et al. (1996) <doi:10.1111/j.1540-6261.1996.tb04074.x>. InfoTrad 
Probability Theory  Probability theory is the branch of mathematics concerned with probability, the analysis of random phenomena. The central objects of probability theory are random variables, stochastic processes, and events: mathematical abstractions of nondeterministic events or measured quantities that may either be single occurrences or evolve over time in an apparently random fashion. If an individual coin toss or the roll of dice is considered to be a random event, then if repeated many times the sequence of random events will exhibit certain patterns, which can be studied and predicted. Two representative mathematical results describing such patterns are the law of large numbers and the central limit theorem. As a mathematical foundation for statistics, probability theory is essential to many human activities that involve quantitative analysis of large sets of data. Methods of probability theory also apply to descriptions of complex systems given only partial knowledge of their state, as in statistical mechanics. A great discovery of twentieth century physics was the probabilistic nature of physical phenomena at atomic scales, described in quantum mechanics. 
Probably Approximately Correct (PAC) 
Probably Approximately Correct (PAC) Bayes framework (McAllester, 1999). ➘ “Probably Approximately Correct Learning” Data-dependent PAC-Bayes priors via differential privacy 
Probably Approximately Correct Learning (PAC Learning, WARL) 
In computational learning theory, probably approximately correct learning (PAC learning) is a framework for mathematical analysis of machine learning. It was proposed in 1984 by Leslie Valiant. In this framework, the learner receives samples and must select a generalization function (called the hypothesis) from a certain class of possible functions. The goal is that, with high probability (the “probably” part), the selected function will have low generalization error (the “approximately correct” part). The learner must be able to learn the concept given any arbitrary approximation ratio, probability of success, or distribution of the samples. The model was later extended to treat noise (misclassified samples). An important innovation of the PAC framework is the introduction of computational complexity theory concepts to machine learning. In particular, the learner is expected to find efficient functions (time and space requirements bounded to a polynomial of the example size), and the learner itself must implement an efficient procedure (requiring an example count bounded to a polynomial of the concept size, modified by the approximation and likelihood bounds). 
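To give the "probably" and "approximately" parts some numbers, the sketch below evaluates the classical sample-size bound for one special case that the entry itself does not discuss: a finite hypothesis class H and a learner that outputs a hypothesis consistent with all samples. In that setting, m ≥ (ln|H| + ln(1/δ)) / ε samples suffice for error at most ε with probability at least 1 − δ. The function name `pac_sample_bound` is hypothetical.

```python
import math


def pac_sample_bound(hypothesis_count, epsilon, delta):
    """Samples sufficient for a consistent learner over a finite hypothesis class.

    epsilon: tolerated generalization error (the "approximately correct" part).
    delta:   tolerated probability of failure (the "probably" part).
    """
    if hypothesis_count < 1 or not (0 < epsilon < 1) or not (0 < delta < 1):
        raise ValueError("need hypothesis_count >= 1 and 0 < epsilon, delta < 1")
    return math.ceil((math.log(hypothesis_count) + math.log(1.0 / delta)) / epsilon)


samples_needed = pac_sample_bound(hypothesis_count=1000, epsilon=0.1, delta=0.05)
```

The bound grows only logarithmically in the number of hypotheses and in 1/δ, but linearly in 1/ε.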
Probably Certifiably Correct Algorithm (PCC) 
Many optimization problems of interest are known to be intractable, and while there are often heuristics that are known to work on typical instances, it is usually not easy to determine a posteriori whether the optimal solution was found. In this short note, we discuss algorithms that not only solve the problem on typical instances, but also provide a posteriori certificates of optimality, probably certifiably correct (PCC) algorithms. As an illustrative example, we present a fast PCC algorithm for minimum bisection under the stochastic block model and briefly discuss other examples. 
probit  In probability theory and statistics, the probit function is the quantile function associated with the standard normal distribution. It has applications in exploratory statistical graphics and specialized regression modeling of binary response variables. 
Procedural Content Generation via Machine Learning (PCGML) 
This survey explores Procedural Content Generation via Machine Learning (PCGML), defined as the generation of game content using machine learning models trained on existing content. As the importance of PCG for game development increases, researchers explore new avenues for generating high-quality content with or without human involvement; this paper addresses the relatively new paradigm of using machine learning (in contrast with search-based, solver-based, and constructive methods). We focus on what is most often considered functional game content such as platformer levels, game maps, interactive fiction stories, and cards in collectible card games, as opposed to cosmetic content such as sprites and sound effects. In addition to using PCG for autonomous generation, co-creativity, mixed-initiative design, and compression, PCGML is suited for repair, critique, and content analysis because of its focus on modeling existing content. We discuss various data sources and representations that affect the resulting generated content. Multiple PCGML methods are covered, including neural networks, long short-term memory (LSTM) networks, autoencoders, and deep convolutional networks; Markov models, $n$-grams, and multi-dimensional Markov chains; clustering; and matrix factorization. Finally, we discuss open problems in the application of PCGML, including learning from small datasets, lack of training data, multi-layered learning, style transfer, parameter tuning, and PCG as a game mechanic. 
Process Mining  Process mining is a process management technique that allows for the analysis of business processes based on event logs. The basic idea is to extract knowledge from event logs recorded by an information system. Process mining aims at improving this by providing techniques and tools for discovering process, control, data, organizational, and social structures from event logs. Process Mining 
Procrustes Analysis  In statistics, Procrustes analysis is a form of statistical shape analysis used to analyse the distribution of a set of shapes. The name Procrustes refers to a bandit from Greek mythology who made his victims fit his bed either by stretching their limbs or cutting them off. 
Product Community Question Answering (PCQA) 
Product Community Question Answering (PCQA) provides useful information about products and their features (aspects) that may not be well addressed by product descriptions and reviews. We observe that a product’s compatibility issues with other products are frequently discussed in PCQA and such issues are more frequently addressed in accessories, i.e., via a yes/no question ‘Does this mouse work with Windows 10?’. In this paper, we address the problem of extracting compatible and incompatible products from yes/no questions in PCQA. This problem can naturally have a two-stage framework: first, we perform Complementary Entity (product) Recognition (CER) on yes/no questions; second, we identify the polarities of yes/no answers to assign the complementary entities a compatibility label (compatible, incompatible or unknown). We leverage an existing unsupervised method for the first stage and a 3-class classifier by combining a distant PU-learning method (learning from positive and unlabeled examples) together with a binary classifier for the second stage. The benefit of using distant PU-learning is that it can help to expand more implicit yes/no answers without using any human annotated data. We conduct experiments on 4 products to show that the proposed method is effective. 
Product Intelligence  What makes Product Intelligence interesting to us as a field of focus is that it is a superb application for Big Data – providing highly targeted, real time intelligence that serves up insights INSIDE of the new product development process at the exact moment when conclusive, authoritative insight is most needed. When it’s literally make or break. What Is It? What differentiates product intelligence from other research? It provides realtime, datadriven insights for new product development decisions and innovation initiatives based on the large multiples – the scale of big data. What features will attract consumers to my product? How do customers perceive it relative to competing products? In which geographic markets will it be the most successful? Product intelligence can tell you this. Imagine this…you’re developing a personal hair care product and you’re looking for a particular niche, let’s say hair color in China, which could be called a mature market. You can listen to 25 people or perhaps 500 or 5000 in focus groups or online panels. Or you can listen to 500,000. That’s the unique advantage and why big data got the name Big. 
Product Logarithm  
Professor Forcing  The Teacher Forcing algorithm trains recurrent networks by supplying observed sequence values as inputs during training and using the network’s own one-step-ahead predictions to do multi-step sampling. We introduce the Professor Forcing algorithm, which uses adversarial domain adaptation to encourage the dynamics of the recurrent network to be the same when training the network and when sampling from the network over multiple time steps. We apply Professor Forcing to language modeling, vocal synthesis on raw waveforms, handwriting generation, and image generation. Empirically we find that Professor Forcing acts as a regularizer, improving test likelihood on character level Penn Treebank and sequential MNIST. We also find that the model qualitatively improves samples, especially when sampling for a large number of time steps. This is supported by human evaluation of sample quality. Trade-offs between Professor Forcing and Scheduled Sampling are discussed. We produce t-SNEs showing that Professor Forcing successfully makes the dynamics of the network during training and sampling more similar. 
Profiling  In software engineering, profiling (“program profiling”, “software profiling”) is a form of dynamic program analysis that measures, for example, the space (memory) or time complexity of a program, the usage of particular instructions, or frequency and duration of function calls. The most common use of profiling information is to aid program optimization. Profiling is achieved by instrumenting either the program source code or its binary executable form using a tool called a profiler (or code profiler). A number of different techniques may be used by profilers, such as event-based, statistical, instrumented, and simulation methods. 
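As one concrete instance of the above, the sketch below uses Python's built-in deterministic profiler, `cProfile`, to record the frequency and duration of function calls. The workload `slow_sum` and the helper `profile_call` are hypothetical names introduced only for this example.

```python
import cProfile
import io
import pstats


def slow_sum(n):
    """Deliberately simple workload to have something to measure."""
    total = 0
    for i in range(n):
        total += i * i
    return total


def profile_call(func, *args):
    """Run func under cProfile; return its result and a text report."""
    profiler = cProfile.Profile()
    result = profiler.runcall(func, *args)
    stream = io.StringIO()
    stats = pstats.Stats(profiler, stream=stream)
    stats.sort_stats("cumulative").print_stats(5)  # five most expensive entries
    return result, stream.getvalue()


result, report = profile_call(slow_sum, 1000)
```

The report lists call counts and cumulative time per function; printing `report` shows where the time went, which is the usual starting point for optimization.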
Progressive Expectation Maximization (PEM) 

Projection Matrix  A projection matrix P is an n×n square matrix that gives a vector space projection from R^n to a subspace W. The columns of P are the projections of the standard basis vectors, and W is the image of P. A square matrix P is a projection matrix iff P^2 = P. 
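The defining property P^2 = P can be checked numerically. The sketch below builds one particular kind of projection matrix, the orthogonal projection onto the line spanned by a vector v, using P = v vᵀ / (vᵀ v); this choice of subspace and the helper names are assumptions for illustration, written in plain Python without external libraries.

```python
def projection_onto_line(v):
    """Orthogonal projection matrix onto span{v}: P = v v^T / (v^T v)."""
    norm_sq = sum(x * x for x in v)
    if norm_sq == 0:
        raise ValueError("v must be non-zero")
    return [[a * b / norm_sq for b in v] for a in v]


def matmul(A, B):
    """Product of two matrices given as lists of rows."""
    return [
        [sum(A[i][k] * B[k][j] for k in range(len(B))) for j in range(len(B[0]))]
        for i in range(len(A))
    ]


v = [1.0, 2.0, 2.0]
P = projection_onto_line(v)
P_squared = matmul(P, P)
# Column j of P is the projection of the j-th standard basis vector.
first_column = [row[0] for row in P]
```

Here `P_squared` equals `P` up to rounding, and `first_column` is a multiple of `v`, so it lies in the subspace W onto which P projects.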
Projection Pursuit (PP) 
Projection pursuit (PP) is a type of statistical technique which involves finding the most “interesting” possible projections in multidimensional data. Often, projections which deviate more from a normal distribution are considered to be more interesting. As each projection is found, the data are reduced by removing the component along that projection, and the process is repeated to find new projections; this is the “pursuit” aspect that motivated the technique known as matching pursuit. The idea of projection pursuit is to locate the projection or projections from high-dimensional space to low-dimensional space that reveal the most details about the structure of the data set. Once an interesting set of projections has been found, existing structures (clusters, surfaces, etc.) can be extracted and analyzed separately. Projection pursuit has been widely used for blind source separation, so it is very important in independent component analysis. Projection pursuit seeks one projection at a time such that the extracted signal is as non-Gaussian as possible. 
Projection Pursuit Classification Forest (PPforest) 
PPforest 
Projection Pursuit Classification Tree (PPtree) 
In this paper, we propose a new classification tree, the projection pursuit classification tree (PPtree). It combines tree-structured methods with projection pursuit dimension reduction. This tree originates from the projection pursuit method for classification. In each node, one of the projection pursuit indices using class information – LDA, Lr or PDA indices – is maximized to find the projection with the most separated group view. On this optimized data projection, the tree splitting criteria are applied to separate the groups. These steps are iterated until the last two classes are separated. The main advantages of this tree are that it effectively uses correlation between variables to find separations, and that it has a visual representation of the differences between groups in a 1-dimensional space that can be used to interpret results. Also, in each node of the tree, the projection coefficients represent the variable importance for the group separation. This information is very helpful for selecting variables in classification problems. PPtreeViz 
Projection Weighted Canonical Correlation Analysis (projection weighted CCA) 
Comparing different neural network representations and determining how representations evolve over time remain challenging open questions in our understanding of the function of neural networks. Comparing representations in neural networks is fundamentally difficult as the structure of representations varies greatly, even across groups of networks trained on identical tasks, and over the course of training. Here, we develop projection weighted CCA (Canonical Correlation Analysis) as a tool for understanding neural networks, building off of SVCCA, a recently proposed method. We first improve the core method, showing how to differentiate between signal and noise, and then apply this technique to compare across a group of CNNs, demonstrating that networks which generalize converge to more similar representations than networks which memorize, that wider networks converge to more similar solutions than narrow networks, and that trained networks with identical topology but different learning rates converge to distinct clusters with diverse representations. We also investigate the representational dynamics of RNNs, across both training and sequential timesteps, finding that RNNs converge in a bottomup pattern over the course of training and that the hidden state is highly variable over the course of a sequence, even when accounting for linear transforms. Together, these results provide new insights into the function of CNNs and RNNs, and demonstrate the utility of using CCA to understand representations. 
ProjectionNet  Deep neural networks have become ubiquitous for applications related to visual recognition and language understanding tasks. However, it is often prohibitive to use typical neural networks on devices like mobile phones or smart watches since the model sizes are huge and cannot fit in the limited memory available on such devices. While these devices could make use of machine learning models running on highperformance data centers with CPUs or GPUs, this is not feasible for many applications because data can be privacy sensitive and inference needs to be performed directly ‘on’ device. We introduce a new architecture for training compact neural networks using a joint optimization framework. At its core lies a novel objective that jointly trains using two different types of networks–a full trainer neural network (using existing architectures like Feedforward NNs or LSTM RNNs) combined with a simpler ‘projection’ network that leverages random projections to transform inputs or intermediate representations into bits. The simpler network encodes lightweight and efficienttocompute operations in bit space with a low memory footprint. The two networks are trained jointly using backpropagation, where the projection network learns from the full network similar to apprenticeship learning. Once trained, the smaller network can be used directly for inference at low memory and computation cost. We demonstrate the effectiveness of the new approach at significantly shrinking the memory requirements of different types of neural networks while preserving good accuracy on visual recognition and text classification tasks. We also study the question ‘how many neural bits are required to solve a given task?’ using the new framework and show empirical results contrasting model predictive capacity (in bits) versus accuracy on several datasets. 
Projective Sparse Latent Space Network Models  In typical latent-space network models, nodes have latent positions, which are all drawn independently from a common distribution. As a consequence, the number of edges in a network scales quadratically with the number of nodes, resulting in a dense graph sequence as the number of nodes grows. We propose an adjustment to latent-space network models which allows the number of edges to scale linearly with the number of nodes, to scale quadratically, or at any intermediate rate. Our models also form projective families, making statistical inference and prediction well-defined. Built through point processes, our models are related to both the Poisson random connection model and the graphex framework. 
Propensity Score Matching (PSM) 
In the statistical analysis of observational data, propensity score matching (PSM) is a statistical matching technique that attempts to estimate the effect of a treatment, policy, or other intervention by accounting for the covariates that predict receiving the treatment. PSM attempts to reduce the bias due to confounding variables that could be found in an estimate of the treatment effect obtained from simply comparing outcomes among units that received the treatment versus those that did not. The technique was first published by Paul Rosenbaum and Donald Rubin in 1983, and implements the Rubin causal model for observational studies. The possibility of bias arises because the apparent difference in outcome between these two groups of units may depend on characteristics that affected whether or not a unit received a given treatment, rather than on the effect of the treatment per se. In randomized experiments, the randomization enables unbiased estimation of treatment effects; for each covariate, randomization implies that treatment groups will be balanced on average, by the law of large numbers. Unfortunately, for observational studies, the assignment of treatments to research subjects is typically not random. Matching attempts to mimic randomization by creating a sample of units that received the treatment that is comparable on all observed covariates to a sample of units that did not receive the treatment. For example, one may be interested to know the consequences of smoking or the consequences of going to university. The people ‘treated’ are simply those – the smokers, or the university graduates – who in the course of everyday life undergo whatever it is that is being studied by the researcher. In both of these cases it is infeasible (and perhaps unethical) to randomly assign people to smoking or a university education, so observational studies are required. 
The treatment effect estimated by simply comparing a particular outcome – rate of cancer or lifetime earnings – between those who smoked and did not smoke, or between those who attended university and those who did not, would be biased by any factors that predict smoking or university attendance, respectively. PSM attempts to control for these differences to make the treated and untreated groups more comparable. 
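The matching step can be sketched as follows. This is a minimal sketch under stated assumptions: the propensity scores are taken as already estimated (for example by a logistic regression, not shown), matching is one-to-one nearest neighbour with replacement, and the caliper value, the function names and the toy data are all hypothetical.

```python
def match_on_propensity(units, caliper=0.1):
    """Pair each treated unit with the control whose score is closest.

    `units` is a list of dicts with keys 'treated' (bool), 'score'
    (estimated propensity, assumed given) and 'outcome'.
    Matching is with replacement; pairs whose scores differ by more
    than `caliper` are dropped as poor matches.
    """
    treated = [u for u in units if u["treated"]]
    controls = [u for u in units if not u["treated"]]
    pairs = []
    for t in treated:
        nearest = min(controls, key=lambda c: abs(c["score"] - t["score"]))
        if abs(nearest["score"] - t["score"]) <= caliper:
            pairs.append((t, nearest))
    return pairs


def att(pairs):
    """Average treatment effect on the treated, from matched pairs."""
    return sum(t["outcome"] - c["outcome"] for t, c in pairs) / len(pairs)


# Hypothetical toy data, for illustration only.
units = [
    {"treated": True,  "score": 0.80, "outcome": 12.0},
    {"treated": True,  "score": 0.55, "outcome": 9.0},
    {"treated": True,  "score": 0.95, "outcome": 15.0},
    {"treated": False, "score": 0.78, "outcome": 10.0},
    {"treated": False, "score": 0.50, "outcome": 8.0},
    {"treated": False, "score": 0.20, "outcome": 5.0},
]
pairs = match_on_propensity(units)
effect = att(pairs)
```

Here the treated unit with score 0.95 has no control within the caliper and is left unmatched, which shows how matching trades sample size for comparability.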
Property Graph  The term property graph has come to denote an attributed, multirelational graph. That is, a graph where the edges are labeled and both vertices and edges can have any number of key/value properties associated with them. 
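One way to picture this structure is a small in-memory model. The class and method names below are hypothetical and the edges are assumed to be directed; real graph databases expose their own APIs.

```python
from dataclasses import dataclass, field
from typing import Any, Dict, List


@dataclass
class Vertex:
    id: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class Edge:
    source: str
    target: str
    label: str
    properties: Dict[str, Any] = field(default_factory=dict)


@dataclass
class PropertyGraph:
    vertices: Dict[str, Vertex] = field(default_factory=dict)
    edges: List[Edge] = field(default_factory=list)

    def add_vertex(self, vertex_id, **properties):
        self.vertices[vertex_id] = Vertex(vertex_id, dict(properties))

    def add_edge(self, source, target, label, **properties):
        # Multi-relational: the same pair of vertices may be linked by
        # several edges that carry different labels.
        self.edges.append(Edge(source, target, label, dict(properties)))

    def neighbors(self, vertex_id, label=None):
        """Targets of outgoing edges, optionally filtered by edge label."""
        return [e.target for e in self.edges
                if e.source == vertex_id and (label is None or e.label == label)]


g = PropertyGraph()
g.add_vertex("alice", age=34)
g.add_vertex("bob", age=29)
g.add_edge("alice", "bob", "knows", since=2015)
g.add_edge("alice", "bob", "works_with", project="x")
```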
Prophet  Today Facebook is open sourcing Prophet, a forecasting tool available in Python and R. Forecasting is a data science task that is central to many activities within an organization. For instance, large organizations like Facebook must engage in capacity planning to efficiently allocate scarce resources and goal setting in order to measure performance relative to a baseline. Producing high quality forecasts is not an easy problem for either machines or for most analysts. We have observed two main themes in the practice of creating a variety of business forecasts: · Completely automatic forecasting techniques can be brittle and they are often too inflexible to incorporate useful assumptions or heuristics. · Analysts who can produce high quality forecasts are quite rare because forecasting is a specialized data science skill requiring substantial experience. https://…/prophet Prophet: Automatic Forecasting Procedure 
Proportional Hazards Model  Proportional hazards models are a class of survival models in statistics. Survival models relate the time that passes before some event occurs to one or more covariates that may be associated with that quantity of time. In a proportional hazards model, the unique effect of a unit increase in a covariate is multiplicative with respect to the hazard rate. For example, taking a drug may halve one’s hazard rate for a stroke occurring, or, changing the material from which a manufactured component is constructed may double its hazard rate for failure. Other types of survival models such as accelerated failure time models do not exhibit proportional hazards. The accelerated failure time model describes a situation where the biological or mechanical life history of an event is accelerated. 
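The multiplicative effect can be made concrete with the usual exponential (Cox-style) form, h(t | x) = h0(t) · exp(β·x). The sketch below assumes that form; the baseline hazard value is hypothetical and chosen only for illustration.

```python
import math


def hazard(baseline_hazard, coefficients, covariates):
    """h(t | x) = h0(t) * exp(sum_j beta_j * x_j), at one fixed time t."""
    linear_predictor = sum(b * x for b, x in zip(coefficients, covariates))
    return baseline_hazard * math.exp(linear_predictor)


# A coefficient of log(0.5) means a unit increase halves the hazard,
# as in the drug example above.
beta_drug = math.log(0.5)
h0 = 0.02  # hypothetical baseline hazard at some time t

without_drug = hazard(h0, [beta_drug], [0])
with_drug = hazard(h0, [beta_drug], [1])
hazard_ratio = with_drug / without_drug
```

The ratio does not depend on `h0`, which is the "proportional" part of the name: the covariate scales the hazard by the same factor at every time point.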
Proportional Subdistribution Hazards (PSH) 
The proportional hazards model for the subdistribution that Fine and Gray (1999) propose aims at modeling the cumulative incidence of an event of interest. ➘ “Proportional Hazards Model” crrp 
Protocols and Structures for Inference (PSI) 
The Protocols and Structures for Inference (PSI) project has developed an architecture for presenting machine learning algorithms, their inputs (data) and outputs (predictors) as resourceoriented RESTful web services in order to make machine learning technology accessible to a broader range of people than just machine learning researchers. Currently, many machine learning implementations (e.g., in toolkits such as Weka, Orange, Elefant, Shogun, SciKit.Learn, etc.) are tied to specific choices of programming language, and data sets to particular formats (e.g., CSV, svmlight, ARFF). This limits their accessibility, since new users may have to learn a new programming language to run a learner or write a parser for a new data format, and their interoperability, requiring data format converters and multiple language platforms. While there is also a growing number of machine learning web services, each has its own API and is tailored to suit a different subset of machine learning activities. Standardizing the World of Machine Learning Web Service APIs 
Proximal Alternating Direction Network  Deep learning models have gained great success in many real-world applications. However, most existing networks are typically designed in heuristic manners, and thus lack rigorous mathematical principles and derivations. Several recent studies build deep structures by unrolling a particular optimization model that involves task information. Unfortunately, due to the dynamic nature of network parameters, their resultant deep propagation networks do not possess the nice convergence property that the original optimization scheme does. This paper provides a novel proximal unrolling framework to establish deep models by integrating experimentally verified network architectures and rich cues of the tasks. More importantly, we prove in theory that 1) the propagation generated by our unrolled deep model globally converges to a critical point of a given variational energy, and 2) the proposed framework is still able to learn priors from training data to generate a convergent propagation even when task information is only partially available. Indeed, these theoretical results are the best we can ask for, unless stronger assumptions are enforced. Extensive experiments on various real-world applications verify the theoretical convergence and demonstrate the effectiveness of the designed deep models. 
Proximal Policy Optimization  We propose a new family of policy gradient methods for reinforcement learning, which alternate between sampling data through interaction with the environment, and optimizing a ‘surrogate’ objective function using stochastic gradient ascent. Whereas standard policy gradient methods perform one gradient update per data sample, we propose a novel objective function that enables multiple epochs of minibatch updates. The new methods, which we call proximal policy optimization (PPO), have some of the benefits of trust region policy optimization (TRPO), but they are much simpler to implement, more general, and have better sample complexity (empirically). Our experiments test PPO on a collection of benchmark tasks, including simulated robotic locomotion and Atari game playing, and we show that PPO outperforms other online policy gradient methods, and overall strikes a favorable balance between sample complexity, simplicity, and walltime. 
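The abstract does not spell out the surrogate objective. A commonly used form from the PPO paper is the clipped objective, the mean of min(r·A, clip(r, 1−ε, 1+ε)·A), where r is the probability ratio between new and old policy and A is the advantage estimate. The sketch below evaluates that quantity for a batch; the function name, the ε = 0.2 default and the toy numbers are assumptions made for illustration.

```python
import math


def clipped_surrogate(new_log_probs, old_log_probs, advantages, epsilon=0.2):
    """Mean of min(r * A, clip(r, 1 - eps, 1 + eps) * A) over a batch."""
    total = 0.0
    for new_lp, old_lp, adv in zip(new_log_probs, old_log_probs, advantages):
        ratio = math.exp(new_lp - old_lp)  # pi_new(a|s) / pi_old(a|s)
        clipped_ratio = max(1.0 - epsilon, min(1.0 + epsilon, ratio))
        # Taking the minimum removes any incentive to push the ratio
        # outside [1 - eps, 1 + eps] in the direction that helps.
        total += min(ratio * adv, clipped_ratio * adv)
    return total / len(advantages)


# Hypothetical batch: ratios of 1.5, 0.5 and 1.0 against the old policy.
new_lp = [math.log(1.5), math.log(0.5), 0.0]
old_lp = [0.0, 0.0, 0.0]
adv = [1.0, -1.0, 2.0]
objective = clipped_surrogate(new_lp, old_lp, adv)
```

Because the objective stays meaningful after the policy has moved a little, the same batch can be reused for several epochs of minibatch updates, which is the property the abstract highlights. In practice the objective would be maximized with an automatic-differentiation framework rather than evaluated in plain Python.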
Proximity Measure  In order to understand and act on situations that are represented by a set of objects, very often we are required to compare them. Humans perform this comparison subconsciously using the brain. In the context of artificial intelligence, however, we should be able to describe how the machine might perform this comparison. In this context, one of the basic elements that must be specified is the proximity measure between objects. proxy 
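Three widely used proximity measures for numeric vectors are sketched below: two distances (smaller means closer) and one similarity (larger means closer). The function names are illustrative and are not taken from the proxy package.

```python
import math


def euclidean_distance(a, b):
    """Straight-line distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan_distance(a, b):
    """Sum of absolute coordinate differences."""
    return sum(abs(x - y) for x, y in zip(a, b))


def cosine_similarity(a, b):
    """Cosine of the angle between two non-zero vectors."""
    dot = sum(x * y for x, y in zip(a, b))
    norm_a = math.sqrt(sum(x * x for x in a))
    norm_b = math.sqrt(sum(y * y for y in b))
    return dot / (norm_a * norm_b)


p, q = (0.0, 0.0), (3.0, 4.0)
d_euclid = euclidean_distance(p, q)
d_manhattan = manhattan_distance(p, q)
```

Which measure is appropriate depends on the data: cosine similarity ignores vector length, while the two distances do not.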
Proximity Variational Inference (PVI) 
Variational inference is a powerful approach for approximate posterior inference. However, it is sensitive to initialization and can be subject to poor local optima. In this paper, we develop proximity variational inference (PVI). PVI is a new method for optimizing the variational objective that constrains subsequent iterates of the variational parameters to robustify the optimization path. Consequently, PVI is less sensitive to initialization and optimization quirks and finds better local optima. We demonstrate our method on three proximity statistics. We study PVI on a Bernoulli factor model and sigmoid belief network with both real and synthetic data and compare to deterministic annealing (Katahira et al., 2008). We highlight the flexibility of PVI by designing a proximity statistic for Bayesian deep learning models such as the variational autoencoder (Kingma and Welling, 2014; Rezende et al., 2014). Empirically, we show that PVI consistently finds better local optima and gives better predictive performance. 
Proximity-Ambiguity Sensitive (PAS) 
Distributed representations of words (aka word embedding) have proven helpful in solving natural language processing (NLP) tasks. Training distributed representations of words with neural networks has lately been a major focus of researchers in the field. Recent work on word embedding, the Continuous Bag-of-Words (CBOW) model and the Continuous Skip-gram (Skip-gram) model, has produced particularly impressive results, significantly speeding up the training process to enable word representation learning from large-scale data. However, neither CBOW nor Skip-gram pays enough attention to word proximity in terms of model or word ambiguity in terms of linguistics. In this paper, we propose Proximity-Ambiguity Sensitive (PAS) models (i.e. PAS CBOW and PAS Skip-gram) to produce high quality distributed representations of words considering both word proximity and ambiguity. From the model perspective, we introduce proximity weights as parameters to be learned in PAS CBOW and used in PAS Skip-gram. By better modeling word proximity, we reveal the strength of pooling-structured neural networks in word representation learning. The proximity-sensitive pooling layer can also be applied to other neural network applications that employ pooling layers. From the linguistics perspective, we train multiple representation vectors per word. Each representation vector corresponds to a particular group of POS tags of the word. By using PAS models, we achieved a 16.9% increase in accuracy over state-of-the-art models. 
PRUNE  The majority of contemporary mobile devices and personal computers are based on heterogeneous computing platforms that consist of a number of CPU cores and one or more Graphics Processing Units (GPUs). Despite the high volume of these devices, there are few existing programming frameworks that target full and simultaneous utilization of all CPU and GPU devices of the platform. This article presents a dataflowflavored Model of Computation (MoC) that has been developed for deploying signal processing applications to heterogeneous platforms. The presented MoC is dynamic and allows describing applications with data dependent runtime behavior. On top of the MoC, formal design rules are presented that enable application descriptions to be simultaneously dynamic and decidable. Decidability guarantees compiletime application analyzability for deadlock freedom and bounded memory. The presented MoC and the design rules are realized in a novel Open Source programming environment ‘PRUNE’ and demonstrated with representative application examples from the domains of image processing, computer vision and wireless communications. Experimental results show that the proposed approach outperforms the stateoftheart in analyzability, flexibility and performance. 
Pruned Exact Linear Time (PELT) 
This approach is based on the algorithm of Jackson et al. (2005, ‘An algorithm for optimal partitioning of data on an interval’), but involves a pruning step within the dynamic program. This pruning reduces the computational cost of the method, but does not affect the exactness of the resulting segmentation. It can be applied to find changepoints under a range of statistical criteria such as penalised likelihood, quasi-likelihood (Braun et al., 2000, ‘Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation’) and cumulative sum of squares (Inclan and Tiao, 1994, ‘Use of cumulative sums of squares for retrospective detection of changes of variance’; Picard et al., 2011, ‘Joint segmentation, calling and normalization of multiple cgh profiles’). In simulations we compare PELT with both Binary Segmentation and Optimal Partitioning. We show that PELT can be calculated orders of magnitude faster than Optimal Partitioning, particularly for long data sets. Whilst asymptotically PELT can be quicker, we find that in practice Binary Segmentation is quicker on the examples we consider, and we believe this would be the case in almost all applications. However, we show that PELT leads to a substantially more accurate segmentation than Binary Segmentation. changepoint 
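The dynamic program with its pruning step can be sketched for one simple criterion. The sketch assumes a change-in-mean problem with a squared-error segment cost and a fixed penalty per changepoint; the function name and the toy data are hypothetical, and the changepoint package offers far more cost functions and options than this.

```python
def pelt_mean_changepoints(data, penalty):
    """Penalised optimal partitioning with PELT pruning (change in mean)."""
    n = len(data)
    # Prefix sums give any segment's squared-error cost in O(1).
    s1 = [0.0] * (n + 1)
    s2 = [0.0] * (n + 1)
    for i, y in enumerate(data):
        s1[i + 1] = s1[i] + y
        s2[i + 1] = s2[i] + y * y

    def cost(start, end):
        """Sum of squared deviations from the mean of data[start:end]."""
        total = s1[end] - s1[start]
        return (s2[end] - s2[start]) - total * total / (end - start)

    best = [0.0] * (n + 1)      # best[t]: optimal penalised cost of data[:t]
    best[0] = -penalty
    last_change = [0] * (n + 1)  # start index of the final segment of data[:t]
    candidates = [0]
    for t in range(1, n + 1):
        values = [best[s] + cost(s, t) + penalty for s in candidates]
        idx = min(range(len(values)), key=values.__getitem__)
        best[t] = values[idx]
        last_change[t] = candidates[idx]
        # Pruning: a start point s whose cost without the penalty already
        # exceeds best[t] can never be optimal for any later t.
        candidates = [s for s, v in zip(candidates, values)
                      if v - penalty <= best[t]]
        candidates.append(t)

    # Walk back through the stored segment starts.
    changepoints = []
    t = n
    while last_change[t] > 0:
        t = last_change[t]
        changepoints.append(t)
    return sorted(changepoints)


# Toy series: the mean jumps from about 0 to about 5 after 20 points.
series = [0.1, -0.1] * 10 + [5.1, 4.9] * 10
found = pelt_mean_changepoints(series, penalty=2.0)
```

Without the pruning line this is plain Optimal Partitioning, which examines every earlier index at every step; the pruning only discards candidates that provably cannot win, so the segmentation is unchanged.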
Pruning  Pruning is a technique in machine learning that reduces the size of decision trees by removing sections of the tree that provide little power to classify instances. The dual goal of pruning is reduced complexity of the final classifier as well as better predictive accuracy by the reduction of overfitting and removal of sections of a classifier that may be based on noisy or erroneous data. 
PruningKOSR  Motivated by many practical applications in logistics and mobilityasaservice, we study the topk optimal sequenced routes (KOSR) querying on large, general graphs where the edge weights may not satisfy the triangle inequality, e.g., road network graphs with travel times as edge weights. The KOSR querying strives to find the topk optimal routes (i.e., with the topk minimal total costs) from a given source to a given destination, which must visit a number of vertices with specific vertex categories (e.g., gas stations, restaurants, and shopping malls) in a particular order (e.g., visiting gas stations before restaurants and then shopping malls). To efficiently find the topk optimal sequenced routes, we propose two algorithms PruningKOSR and StarKOSR. In PruningKOSR, we define a dominance relationship between two partiallyexplored routes. The partiallyexplored routes that can be dominated by other partiallyexplored routes are postponed being extended, which leads to a smaller searching space and thus improves efficiency. In StarKOSR, we further improve the efficiency by extending routes in an A* manner. With the help of a judiciously designed heuristic estimation that works for general graphs, the cost of partially explored routes to the destination can be estimated such that the qualified complete routes can be found early. In addition, we demonstrate the high extensibility of the proposed algorithms by incorporating Hop Labeling, an effective label indexing technique for shortest path queries, to further improve efficiency. Extensive experiments on multiple realworld graphs demonstrate that the proposed methods significantly outperform the baseline method. Furthermore, when k=1, StarKOSR also outperforms the stateoftheart method for the optimal sequenced route queries. 
PSDBSCAN  We present PSDBSCAN, a communication-efficient parallel DBSCAN algorithm that combines the disjoint-set data structure and the Parameter Server framework in Platform of AI (PAI). Since data points within the same cluster may be distributed over different workers, which results in several disjoint sets, merging them incurs large communication costs. In our algorithm, we employ a fast global union approach to union the disjoint sets to alleviate the communication burden. Experiments over datasets of different scales demonstrate that PSDBSCAN outperforms PDSDBSCAN with a 2-10 times speedup in communication efficiency. We have released our PSDBSCAN in an algorithm platform called Platform of AI (PAI – https://pai.base.shuju.aliyun.com ) in Alibaba Cloud. We have also demonstrated how to use the method in PAI. 
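The disjoint-set structure at the centre of this approach can be sketched on a single machine. The sketch does not reproduce the Parameter Server framework or the paper's fast global union; it only shows how links found by different workers merge into shared cluster labels, and the names and example links are hypothetical.

```python
class DisjointSet:
    """Union-find with path halving and union by size."""

    def __init__(self, n):
        self.parent = list(range(n))
        self.size = [1] * n

    def find(self, x):
        while self.parent[x] != x:
            self.parent[x] = self.parent[self.parent[x]]  # path halving
            x = self.parent[x]
        return x

    def union(self, a, b):
        ra, rb = self.find(a), self.find(b)
        if ra == rb:
            return ra
        if self.size[ra] < self.size[rb]:
            ra, rb = rb, ra
        self.parent[rb] = ra  # attach the smaller tree under the larger
        self.size[ra] += self.size[rb]
        return ra


def merge_local_clusters(n_points, worker_edges):
    """Combine the 'same cluster' links reported by each worker."""
    ds = DisjointSet(n_points)
    for edges in worker_edges:
        for a, b in edges:
            ds.union(a, b)
    return [ds.find(i) for i in range(n_points)]


# Two hypothetical workers each report links between point indices.
labels = merge_local_clusters(6, [[(0, 1), (2, 3)], [(1, 2), (4, 5)]])
```

Points 0 to 3 end up in one cluster because the second worker's link (1, 2) bridges the two sets found by the first worker.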
PSDVec  PSDVec is a Python/Perl toolbox that learns word embeddings, i.e. the mapping of words in a natural language to continuous vectors which encode the semantic/syntactic regularities between the words. PSDVec implements a word embedding learning method based on a weighted low-rank positive semidefinite approximation. To scale up the learning process, we implement a blockwise online learning algorithm to learn the embeddings incrementally. This strategy greatly reduces the learning time of word embeddings on a large vocabulary, and can learn the embeddings of new words without relearning the whole vocabulary. On 9 word similarity/analogy benchmark sets and 2 Natural Language Processing (NLP) tasks, PSDVec produces embeddings that have the best average performance among popular word embedding tools. PSDVec provides a new option for NLP practitioners. 
Pseudoinverse Learning (PIL) 
A Vest of the Pseudoinverse Learning Algorithm 
PTree Programming  We propose a novel method for automatic program synthesis. PTree Programming represents the program search space through a single probabilistic prototype tree. From this prototype tree we form program instances which we evaluate on a given problem. The error values from the evaluations are propagated through the prototype tree. We use them to update the probability distributions that determine the symbol choices of further instances. The iterative method is applied to several symbolic regression benchmarks from the literature. It outperforms standard Genetic Programming to a large extent. Furthermore, it relies on a concise set of parameters which are held constant for all problems. The algorithm can be employed for most of the typical computational intelligence tasks such as classification, automatic program induction, and symbolic regression. 
PUNlist  In this paper, we propose a novel data structure called PUNlist, which maintains both the utility information about an itemset and a utility upper bound to facilitate the mining of high utility itemsets. Based on PUNlists, we present a method, called MIP (Mining high utility Itemset using PUNLists), for fast mining of high utility itemsets. The efficiency of MIP is achieved with three techniques. First, itemsets are represented by a highly condensed data structure, PUNlist, which avoids costly, repeated utility computation. Second, the utility of an itemset can be efficiently calculated by scanning the PUNlist of the itemset, and the PUNlists of long itemsets can be constructed quickly from the PUNlists of short itemsets. Third, by employing the utility upper bound stored in the PUNlists as the pruning strategy, MIP directly discovers high utility itemsets from the search space, called the set-enumeration tree, without generating numerous candidates. Extensive experiments on various synthetic and real datasets show that PUNlist is very effective, since MIP is on average at least an order of magnitude faster than recently reported algorithms. 
PyCharm  PyCharm is an Integrated Development Environment (IDE) used in computer programming, specifically for the Python language. It is developed by the Czech company JetBrains. It provides code analysis, a graphical debugger, an integrated unit tester, integration with version control systems (VCSes), and supports web development with Django. PyCharm is cross-platform, with Windows, macOS and Linux versions. The Community Edition is released under the Apache License, and there is also a Professional Edition, released under a proprietary license, which has extra features. 
Pycnophylactic Interpolation  Thiessen polygons are an extreme case: we assume homogeneity within the polygons and abrupt changes at the borders. This is unlikely to be correct; for example, precipitation or population totals don’t have abrupt changes at arbitrary borders. Tobler developed pycnophylactic interpolation to overcome this problem. Here, values are reassigned by mass-preserving reallocation. The basic principle is that the volume of the attribute within a region remains the same. However, it is assumed that a better representation of the variation is a smooth surface. The volume (the sum within each region) remains constant, whilst the surface becomes smoother. The solution is iterative, and the stopping point is arbitrary. pycno 
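The smooth-then-restore-mass iteration can be sketched on a one-dimensional strip of cells. This is a simplified illustration under stated assumptions: non-negative zone totals, equally sized zones, a three-cell moving average as the smoother and multiplicative rescaling to restore each zone's total. The function name is hypothetical and the pycno package works on two-dimensional polygons.

```python
def pycno_1d(zone_totals, cells_per_zone, iterations=50):
    """Smooth a step function while keeping every zone's total fixed."""
    zones = []
    values = []
    for z, total in enumerate(zone_totals):
        zones.extend([z] * cells_per_zone)
        # Start from the homogeneous (step-function) allocation.
        values.extend([total / cells_per_zone] * cells_per_zone)
    n = len(values)
    for _ in range(iterations):
        # Smoothing step: average each cell with its neighbours
        # (edge cells reuse their own value for the missing neighbour).
        smoothed = []
        for i in range(n):
            left = values[i - 1] if i > 0 else values[i]
            right = values[i + 1] if i < n - 1 else values[i]
            smoothed.append((left + values[i] + right) / 3.0)
        # Mass-preserving step: rescale each zone back to its total.
        for z, total in enumerate(zone_totals):
            idx = [i for i in range(n) if zones[i] == z]
            zone_sum = sum(smoothed[i] for i in idx)
            factor = total / zone_sum if zone_sum > 0 else 0.0
            for i in idx:
                smoothed[i] *= factor
        values = smoothed
    return zones, values


totals = [10.0, 40.0, 10.0]
zones, surface = pycno_1d(totals, cells_per_zone=5)
```

The number of iterations plays the role of the arbitrary stopping point mentioned above: more iterations give a smoother surface, while the per-zone sums stay equal to the original totals.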
PyData  PyData is an educational program of NumFOCUS, a 501(c)3 nonprofit organization in the United States. PyData provides a forum for the international community of users and developers of data analysis tools to share ideas and learn from each other. The global PyData network promotes discussion of best practices, new approaches, and emerging technologies for data management, processing, analytics, and visualization. PyData communities approach data science using many languages, including (but not limited to) Python, Julia, and R. We aim to be an accessible, communitydriven conference, with novice to advanced level presentations. PyData tutorials and talks bring attendees the latest project features along with cuttingedge use cases. 
PyMC3  Probabilistic Programming (PP) allows flexible specification of statistical Bayesian models in code. PyMC3 is a new, open-source PP framework with an intuitive and readable, yet powerful, syntax that is close to the natural syntax statisticians use to describe models. It features next-generation Markov chain Monte Carlo (MCMC) sampling algorithms such as the No-U-Turn Sampler (NUTS; Hoffman, 2014), a self-tuning variant of Hamiltonian Monte Carlo (HMC; Duane, 1987). This class of samplers works well on high dimensional and complex posterior distributions and allows many complex models to be fit without specialized knowledge about fitting algorithms. HMC and NUTS take advantage of gradient information from the likelihood to achieve much faster convergence than traditional sampling methods, especially for larger models. NUTS also has several self-tuning strategies for adaptively setting the tunable parameters of Hamiltonian Monte Carlo, which means you usually don’t need to have specialized knowledge about how the algorithms work. PyMC3, Stan (Stan Development Team, 2014), and the LaplacesDemon package for R are currently the only PP packages to offer HMC. 
Pyomo  Pyomo is a Pythonbased opensource software package that supports a diverse set of optimization capabilities for formulating, solving, and analyzing optimization models. A core capability of Pyomo is modeling structured optimization applications. Pyomo can be used to define general symbolic problems, create specific problem instances, and solve these instances using commercial and opensource solvers. Pyomo’s modeling objects are embedded within a fullfeatured highlevel programming language providing a rich set of supporting libraries, which distinguishes Pyomo from other algebraic modeling languages like AMPL, AIMMS and GAMS. Pyomo supports a wide range of problem types, including: · Linear programming · Quadratic programming · Nonlinear programming · Mixedinteger linear programming · Mixedinteger quadratic programming · Mixedinteger nonlinear programming · Stochastic programming · Generalized disjunctive programming · Differential algebraic equations · Bilevel programming · Mathematical programs with equilibrium constraints Pyomo also supports iterative analysis and scripting capabilities within a fullfeatured programming language. Further, Pyomo has also proven an effective framework for developing highlevel optimization and analysis tools. For example, the PySP package provides generic solvers for stochastic programming. PySP leverages the fact that Pyomo’s modeling objects are embedded within a fullfeatured highlevel programming language, which allows for transparent parallelization of subproblems using Python parallel communication libraries. 
Pyramid Attention Network (PAN) 
A Pyramid Attention Network (PAN) is proposed to exploit the impact of global contextual information in semantic segmentation. Different from most existing works, we combine an attention mechanism and a spatial pyramid to extract precise dense features for pixel labeling, instead of complicated dilated convolutions and artificially designed decoder networks. Specifically, we introduce a Feature Pyramid Attention module that applies a spatial pyramid attention structure to the high-level output and combines it with global pooling to learn a better feature representation, and a Global Attention Upsample module on each decoder layer that provides global context as guidance for low-level features to select category localization details. The proposed approach achieves state-of-the-art performance on the PASCAL VOC 2012 and Cityscapes benchmarks, with a new record mIoU of 84.0% on PASCAL VOC 2012 while training without the COCO dataset. 
pyRecLab  This paper introduces pyRecLab, a software library written in C++ with Python bindings which makes it possible to quickly train, test and develop recommender systems. Although there are several software libraries for this purpose, only a few let developers get started quickly with the most traditional methods, permitting them to try different parameters and approach several tasks without a significant loss of performance. The few libraries that have all these features are available in languages such as Java, Scala or C#, which is a disadvantage for less experienced programmers who are more used to the popular Python programming language. In this article we introduce details of pyRecLab, along with a performance analysis in terms of error metrics (MAE and RMSE) and train/test time. We benchmark it against the popular Java-based library LibRec, showing similar results. We expect programmers with little experience and people interested in quickly prototyping recommender systems to benefit from pyRecLab. 
PyStruct  PyStruct aims at being an easytouse structured learning and prediction library. Currently it implements only maxmargin methods and a perceptron, but other algorithms might follow. The learning algorithms implemented in PyStruct have various names, which are often used loosely or differently in different communities. Common names are conditional random fields (CRFs), maximummargin Markov random fields (M3N) or structural support vector machines. If you are new to structured learning, have a look at What is structured learning?. The goal of PyStruct is to provide a welldocumented tool for researchers as well as nonexperts to make use of structured prediction algorithms. The design tries to stay as close as possible to the interface and conventions of scikitlearn. 
Python Package Index (PyPI) 
The Python Package Index is a repository of software for the Python programming language. 
PyTorch  PyTorch is a Python package that provides two high-level features: · Tensor computation (like numpy) with strong GPU acceleration · Deep Neural Networks built on a tape-based autograd system You can reuse your favorite Python packages such as numpy, scipy and Cython to extend PyTorch when needed. 
PyUnfold  PyUnfold is a Python package for incorporating imperfections of the measurement process into a data analysis pipeline. In an ideal world, we would have access to the perfect detector: an apparatus that makes no error in measuring a desired quantity. However, in real life, detectors have finite resolutions, characteristic biases that cannot be eliminated, less than full detection efficiencies, and statistical and systematic uncertainties. By building a matrix that encodes a detector’s smearing of the desired true quantity into the measured observable(s), a deconvolution can be performed that provides an estimate of the true variable. This deconvolution process is known as unfolding. The unfolding method implemented in PyUnfold accomplishes this deconvolution via an iterative procedure, providing results based on physical expectations of the desired quantity. Furthermore, tedious bookkeeping for both statistical and systematic errors produces precise final uncertainty estimates. 
Pyxley  Web-based dashboards are the most straightforward way to share insights with clients and business partners. For R users, Shiny provides a framework that allows data scientists to create interactive web applications without having to write Javascript, HTML, or CSS. Despite Shiny’s utility and success as a dashboard framework, there is no equivalent in Python. There are packages in development, such as Spyre, but nothing that matches Shiny’s level of customization. We have written a Python package, called Pyxley, to not only help simplify the development of web applications, but to provide a way to easily incorporate custom Javascript for maximum flexibility. This is enabled through Flask, PyReact, and Pandas. Pyxley: Python Powered Dashboards 