Existing models of computation, such as a Turing machine (hereafter, TM), do not consider the agent involved in interpreting the outcome of the computation. We argue that a TM, or any other computation model, has no significance if its output is not interpreted by some agent. Furthermore, we argue that including the interpreter in the model definition sheds light on some of the difficult problems faced in computation and mathematics. We provide an analytic process framework to address this limitation. The framework can be overlaid on existing concepts of computation to address many practical and philosophical concerns such as the P vs NP problem. In addition, we provide constructive proof for the P vs NP problem under the assumption that the class NP comprises of problems solvable by non-deterministic algorithms. We utilize the observation that deterministic computational procedures lack fundamental capacity to fully simulate their non-deterministic variant.
We study the problem of matching agents who arrive at a marketplace over time and leave after d time periods. Agents can only be matched while they are present in the marketplace. Each pair of agents can yield a different match value, and the planner’s goal is to maximize the total value over a finite time horizon. First we study the case in which vertices arrive in an adversarial order. We provide a randomized 0.25-competitive algorithm building on a result by Feldman et al. (2009) and Lehman et al. (2006). We extend the model to the case in which departure times are drawn independently from a distribution with non-decreasing hazard rate, for which we establish a 1/8-competitive algorithm. When the arrival order is chosen uniformly at random, we show that a batching algorithm, which computes a maximum-weighted matching every (d+1) periods, is 0.279-competitive.
NL4Py is a NetLogo controller software for Python, for the rapid, parallel execution of NetLogo models. NL4Py provides both headless (no graphical user interface) and GUI NetLogo workspace control through Python. Spurred on by the increasing availability of open-source computation and machine learning libraries on the Python package index, there is an increasing demand for such rapid, parallel execution of agent-based models through Python. NetLogo, being the language of choice for a majority of agent-based modeling driven research projects, requires an integration to Python for researchers looking to perform statistical analyses of agent-based model output using these libraries. Unfortunately, until the recent introduction of PyNetLogo, and now NL4Py, such a controller was unavailable. This article provides a detailed introduction into the usage of NL4Py and explains its client-server software architecture, highlighting architectural differences to PyNetLogo. A step-by-step demonstration of global sensitivity analysis and parameter calibration of the Wolf Sheep Predation model is then performed through NL4Py. Finally, NL4Py’s performance is benchmarked against PyNetLogo and its combination with IPyParallel, and shown to provide significant savings in execution time over both configurations.
Because of their effectiveness in broad practical applications, LSTM networks have received a wealth of coverage in scientific journals, technical blogs, and implementation guides. However, in most articles, the inference formulas for the LSTM network and its parent, RNN, are stated axiomatically, while the training formulas are omitted altogether. In addition, the technique of ‘unrolling’ an RNN is routinely presented without justification throughout the literature. The goal of this paper is to explain the essential RNN and LSTM fundamentals in a single document. Drawing from concepts in signal processing, we formally derive the canonical RNN formulation from differential equations. We then propose and prove a precise statement, which yields the RNN unrolling technique. We also review the difficulties with training the standard RNN and address them by transforming the RNN into the ‘Vanilla LSTM’ network through a series of logical arguments. We provide all equations pertaining to the LSTM system together with detailed descriptions of its constituent entities. Albeit unconventional, our choice of notation and the method for presenting the LSTM system emphasizes ease of understanding. As part of the analysis, we identify new opportunities to enrich the LSTM system and incorporate these extensions into the Vanilla LSTM network, producing the most general LSTM variant to date. The target reader has already been exposed to RNNs and LSTM networks through numerous available resources and is open to an alternative pedagogical approach. A Machine Learning practitioner seeking guidance for implementing our new augmented LSTM model in software for experimentation and research will find the insights and derivations in this tutorial valuable as well.
Fuzzy clustering methods identify naturally occurring clusters in a dataset, where the extent to which different clusters are overlapped can differ. Most methods have a parameter to fix the level of fuzziness. However, the appropriate level of fuzziness depends on the application at hand. This paper presents Entropy $c$-Means (ECM), a method of fuzzy clustering that simultaneously optimizes two contradictory objective functions, resulting in the creation of fuzzy clusters with different levels of fuzziness. This allows ECM to identify clusters with different degrees of overlap. ECM optimizes the two objective functions using two multi-objective optimization methods, Non-dominated Sorting Genetic Algorithm II (NSGA-II), and Multiobjective Evolutionary Algorithm based on Decomposition (MOEA/D). We also propose a method to select a suitable trade-off clustering from the Pareto front. Experiments on challenging synthetic datasets as well as real-world datasets show that ECM leads to better cluster detection compared to the conventional fuzzy clustering methods as well as previously used multi-objective methods for fuzzy clustering.
Modeling spillover effects from observational data is an important problem in economics, business, and other fields of research. % It helps us infer the causality between two seemingly unrelated set of events. For example, if consumer spending in the United States declines, it has spillover effects on economies that depend on the U.S. as their largest export market. In this paper, we aim to infer the causation that results in spillover effects between pairs of entities (or units), we call this effect as \textit{paired spillover}. To achieve this, we leverage the recent developments in variational inference and deep learning techniques to propose a generative model called Linked Causal Variational Autoencoder (LCVA). Similar to variational autoencoders (VAE), LCVA incorporates an encoder neural network to learn the latent attributes and a decoder network to reconstruct the inputs. However, unlike VAE, LCVA treats the \textit{latent attributes as confounders that are assumed to affect both the treatment and the outcome of units}. Specifically, given a pair of units $u$ and $\bar{u}$, their individual treatment and outcomes, the encoder network of LCVA samples the confounders by conditioning on the observed covariates of $u$, the treatments of both $u$ and $\bar{u}$ and the outcome of $u$. Once inferred, the latent attributes (or confounders) of $u$ captures the spillover effect of $\bar{u}$ on $u$. Using a network of users from job training dataset (LaLonde (1986)) and co-purchase dataset from Amazon e-commerce domain, we show that LCVA is significantly more robust than existing methods in capturing spillover effects.
We propose two methods for exact Gaussian process (GP) inference and learning on massive image, video, spatial-temporal, or multi-output datasets with missing values (or ‘gaps’) in the observed responses. The first method ignores the gaps using sparse selection matrices and a highly effective low-rank preconditioner is introduced to accelerate computations. The second method introduces a novel approach to GP training whereby response values are inferred on the gaps before explicitly training the model. We find this second approach to be greatly advantageous for the class of problems considered. Both of these novel approaches make extensive use of Kronecker matrix algebra to design massively scalable algorithms which have low memory requirements. We demonstrate exact GP inference for a spatial-temporal climate modelling problem with 3.7 million training points as well as a video reconstruction problem with 1 billion points.
Technical computing is a challenging application area for programming languages to address. This is evinced by the unusually large number of specialized languages in the area (e.g. MATLAB, R), and the complexity of common software stacks, often involving multiple languages and custom code generators. We believe this is ultimately due to key characteristics of the domain: highly complex operators, a need for extensive code specialization for performance, and a desire for permissive high-level programming styles allowing productive experimentation. The Julia language attempts to provide a more effective structure for this kind of programming by allowing programmers to express complex polymorphic behaviors using dynamic multiple dispatch over parametric types. The forms of extension and reuse permitted by this paradigm have proven valuable for technical computing. We report on how this approach has allowed domain experts to express useful abstractions while simultaneously providing a natural path to better performance for high-level technical code.
Finding the largest few principal components of a matrix of genetic data is a common task in genome-wide association studies (GWASs), both for dimensionality reduction and for identifying unwanted factors of variation. We describe a simple random matrix model for matrices that arise in GWASs, showing that the singular values have a bulk behavior that obeys a Marchenko-Pastur distributed with a handful of large outliers. We also implement Golub-Kahan-Lanczos (GKL) bidiagonalization in the Julia programming language, providing thick restarting and a choice between full and partial reorthogonalization strategies to control numerical roundoff. Our implementation of GKL bidiagonalization is up to 36 times faster than software tools used commonly in genomics data analysis for computing principal components, such as EIGENSOFT and FlashPCA, which use dense LAPACK routines and randomized subspace iteration respectively.
Sparse deep neural networks(DNNs) are efficient in both memory and compute when compared to dense DNNs. But due to irregularity in computation of sparse DNNs, their efficiencies are much lower than that of dense DNNs on general purpose hardwares. This leads to poor/no performance benefits for sparse DNNs. Performance issue for sparse DNNs can be alleviated by bringing structure to the sparsity and leveraging it for improving runtime efficiency. But such structural constraints often lead to sparse models with suboptimal accuracies. In this work, we jointly address both accuracy and performance of sparse DNNs using our proposed class of neural networks called HBsNN ( Hierarchical Block Sparse Neural Networks).
Traditional chatbots usually need a mass of human dialogue data, especially when using supervised machine learning method. Though they can easily deal with single-turn question answering, for multi-turn the performance is usually unsatisfactory. In this paper, we present Lingke, an information retrieval augmented chatbot which is able to answer questions based on given product introduction document and deal with multi-turn conversations. We will introduce a fine-grained pipeline processing to distill responses based on unstructured documents, and attentive sequential context-response matching for multi-turn conversations.
In this work we give new density estimators by averaging classical density estimators such as the histogram, the frequency polygon and the kernel density estimators obtained over different bootstrap samples of the original data. We prove the L 2-consistency of these new estimators and compare them to several similar approaches by extensive simulations. Based on them, we give also a way to construct non parametric pointwise confidence intervals for the target density.
Multi-layer neural networks have lead to remarkable performance on many kinds of benchmark tasks in text, speech and image processing. Nonlinear parameter estimation in hierarchical models is known to be subject to overfitting. One approach to this overfitting and related problems (local minima, colinearity, feature discovery etc.) is called dropout (Srivastava, et al 2014, Baldi et al 2016). This method removes hidden units with a Bernoulli random variable with probability $p$ over updates. In this paper we will show that Dropout is a special case of a more general model published originally in 1990 called the stochastic delta rule ( SDR, Hanson, 1990). SDR parameterizes each weight in the network as a random variable with mean $\mu_{w_{ij}}$ and standard deviation $\sigma_{w_{ij}}$. These random variables are sampled on each forward activation, consequently creating an exponential number of potential networks with shared weights. Both parameters are updated according to prediction error, thus implementing weight noise injections that reflect a local history of prediction error and efficient model averaging. SDR therefore implements a local gradient-dependent simulated annealing per weight converging to a bayes optimal network. Tests on standard benchmarks (CIFAR) using a modified version of DenseNet shows the SDR outperforms standard dropout in error by over 50% and in loss by over 50%. Furthermore, the SDR implementation converges on a solution much faster, reaching a training error of 5 in just 15 epochs with DenseNet-40 compared to standard DenseNet-40’s 94 epochs.