Approximating distance is one of the key challenge in a facility location problem. Several algorithms have been proposed, however, none of them focused on estimating distance between two concave regions. In this work, we present an algorithm to estimate the distance between two irregular regions of a facility location problem. The proposed algorithm can identify the distance between concave shape regions. We also discuss some relevant properties of the proposed algorithm. A distance-sensitive capacity location model is introduced to test the algorithm. Moreover, sSeveral special geometric cases are discussed to show the advantages and insights of the algorithm.
Over the past decade, various methods have been proposed for the reconstruction of networks modeled as Gaussian Graphical Models. In this work we analyzed three different approaches: the Graphical Lasso (GLasso), Graphical Ridge (GGMridge) and Local Partial Correlation (LPC). For the evaluation of the methods, we used high dimensional data generated from simulated random graphs (Erd\’os-R\’enyi, Barab\’asi-Albert, Watts-Strogatz). The performance was assessed through the Receiver Operating Characteristic (ROC) curve. In addition the methods were used for reconstruction of co-expression network, for differentially expressed genes in human cervical cancer data. LPC method outperformed the GLasso in most of the simulation cases, even though GGMridge produced better ROC curves then both other methods. LPC obtained similar outcomes as GGMridge in real data studies.
Distributed model training suffers from communication overheads due to frequent gradient updates transmitted between compute nodes. To mitigate these overheads, several studies propose the use of sparsified stochastic gradients. We argue that these are facets of a general sparsification method that can operate on any possible atomic decomposition. Notable examples include element-wise, singular value, and Fourier decompositions. We present ATOMO, a general framework for atomic sparsification of stochastic gradients. Given a gradient, an atomic decomposition, and a sparsity budget, ATOMO gives a random unbiased sparsification of the atoms minimizing variance. We show that methods such as QSGD and TernGrad are special cases of ATOMO and show that sparsifiying gradients in their singular value decomposition (SVD), rather than the coordinate-wise one, can lead to significantly faster distributed training.
We develop a theoretical framework for the frequentist assessment of Bayesian model selection, specifically its ability to select the (Kullback-Leibler) optimal model and to portray the corresponding uncertainty. The contribution is not proving consistency for a specific prior, but giving a general strategy for such proofs. Its basis applies to any model, prior, sample size, parameter dimensionality and (although only briefly exploited here) under model misspecification. As an immediate consequence the framework also characterizes a strong form of convergence for $L_0$ penalties and associated pseudo-posterior probabilities of potential interest for uncertainty quantification. The main advantage of the framework is that, instead of studying complex high-dimensional stochastic sums, it suffices to bound certain Bayes factor tails and use standard tools to determine the convergence of deterministic series. As a second contribution we deploy the framework to canonical linear regression. These findings give a high-level description of when one can achieve consistency and at what rate for a wide class of priors as a function of the data-generating truth, sample size and dimensionality. They also indicate when it is possible to use less sparse priors to improve inherent sparsity vs. power trade-offs that are not adequately captured by studying asymptotically optimal rates. Our empirical illustrations align with these findings, underlining the importance of considering the problem at hand’s characteristics to judge the quality of model selection procedures, rather than relying purely on asymptotics.
Conformal Prediction is a machine learning methodology that produces valid prediction regions under mild conditions. In this paper, we explore the application of making predictions over multiple data sources of different sizes without disclosing data between the sources. We propose that each data source applies a transductive conformal predictor independently using the local data, and that the individual predictions are then aggregated to form a combined prediction region. We demonstrate the method on several data sets, and show that the proposed method produces conservatively valid predictions and reduces the variance in the aggregated predictions. We also study the effect that the number of data sources and size of each source has on aggregated predictions, as compared with equally sized sources and pooled data.
The training of Deep Neural Networks usually needs tremendous computing resources. Therefore many deep models are trained in large cluster instead of single machine or GPU. Though major researchs at present try to run whole model on all machines by using asynchronous asynchronous stochastic gradient descent (ASGD), we present a new approach to train deep model parallely — split the model and then seperately train different parts of it in different speed.
This paper is concerned with trust modeling for networked computing systems. Of particular interest to this paper is the observation that trust is a subjective notion that is invisible, implicit and uncertain in nature, therefore it may be suitable for being expressed by subjective probabilities and then modeled on the basis of Bayesian principle. In spite of a few attempts to model trust in the Bayesian paradigm, the field lacks a global comprehensive overview of Bayesian methods and their theoretical connections to other alternatives. This paper presents a study to fill in this gap. It provides a comprehensive review and analysis of the literature, showing that a large deal of existing work, whether or not proposed based on Bayesian principle, can cast into a general Bayesian paradigm termed subjective Bayesian trust (SBT) theory here. The SBT framework can thus act as a general theoretical infrastructure for comparing or analyzing theoretical ties among existing trust models, and for developing novel models. The aim of this study is twofold. One is to gain insights about Bayesian philosophy in modeling trust. The other is to drive current research step ahead in seeking a high-level, abstract way of modeling and evaluating trust.
The method to estimate the predictability of human mobility was proposed in [C. Song \emph{et al.}, Science {\bf 327}, 1018 (2010)], which is extensively followed in exploring the predictability of disparate time series. However, the ambiguous description in the original paper leads to some misunderstandings, including the inconsistent logarithm bases in the entropy estimator and the entropy-predictability-conversion equation, as well as the details in the calculation of the Lempel-Ziv estimator, which further results in remarkably overestimated predictability. This paper demonstrates the degree of overestimation by four different types of theoretically generated time series and an empirical data set, and shows the intrinsic deviation of the Lempel-Ziv estimator for highly random time series. This work provides a clear picture on this issue and thus helps researchers in correctly estimating the predictability of time series.
Modern deep artificial neural networks have achieved impressive results through models with very large capacity—compared to the number of training examples—that control overfitting with the help of different forms of regularization. Regularization can be implicit, as is the case of stochastic gradient descent and parameter sharing in convolutional layers, or explicit. Most common explicit regularization techniques, such as weight decay and dropout, reduce the effective capacity of the model and typically require the use of deeper and wider architectures to compensate for the reduced capacity. Although these techniques have been proven successful in terms of improved generalization, they seem to waste capacity. In contrast, data augmentation techniques do not reduce the effective capacity and improve generalization by increasing the number of training examples. In this paper we systematically analyze the effect of data augmentation on some popular architectures and conclude that data augmentation alone—without any other explicit regularization techniques—can achieve the same performance or higher as regularized models, especially when training with fewer examples, and exhibits much higher adaptability to changes in the architecture.
Learning to infer Bayesian posterior from a few-shot dataset is an important step towards robust meta-learning due to the model uncertainty inherent in the problem. In this paper, we propose a novel Bayesian model-agnostic meta-learning method. The proposed method combines scalable gradient-based meta-learning with nonparametric variational inference in a principled probabilistic framework. During fast adaptation, the method is capable of learning complex uncertainty structure beyond a point estimate or a simple Gaussian approximation. In addition, a robust Bayesian meta-update mechanism with a new meta-loss prevents overfitting during meta-update. Remaining an efficient gradient-based meta-learner, the method is also model-agnostic and simple to implement. Experiment results show the accuracy and robustness of the proposed method in various tasks: sinusoidal regression, image classification, active learning, and reinforcement learning.
This paper deals with neural networks as dynamical systems governed by differential or difference equations. It shows that the introduction of skip connections into network architectures, such as residual networks and dense networks, turns a system of static equations into a system of dynamical equations with varying levels of smoothness on the layer-wise transformations. Closed form solutions for the state space representations of general dense networks, as well as $k^{th}$ order smooth networks, are found in general settings. Furthermore, it is shown that imposing $k^{th}$ order smoothness on a network architecture with $d$-many nodes per layer increases the state space dimension by a multiple of $k$, and so the effective embedding dimension of the data manifold is $k \cdot d$-many dimensions. It follows that network architectures of these types reduce the number of parameters needed to maintain the same embedding dimension by a factor of $k^2$ when compared to an equivalent first-order, residual network, significantly motivating the development of network architectures of these types. Numerical simulations were run to validate parts of the developed theory.
There has been growing interests in recent years from both practical and research perspectives for session-based recommendation tasks as long-term user profiles do not often exist in many real-life recommendation applications. In this case, recommendations for user’s immediate next actions need to be generated based on patterns in anonymous short sessions. An often overlooked aspect is that new items with limited observations arrive continuously in many domains (e.g. news and discussion forums). Therefore, recommendations need to be adaptive to such frequent changes. In this paper, we benchmark a new nonparametric method called context tree (CT) against various state-of-the-art methods on extensive datasets for session-based recommendation task. Apart from the standard static evaluation protocol adopted by previous literatures, we include an adaptive configuration to mimic the situation when new items with limited observations arrives continuously. Our results show that CT outperforms two best-performing approaches (recurrent neural network; heuristic-based nearest neighbor) in majority of the tested configurations and datasets. We analyze reasons for this and demonstrate that it is because of the better adaptation to changes in the domain, as well as the remarkable capability to learn static sequential patterns. Moreover, our running time analysis illustrates the efficiency of using CT as other nonparametric methods.
As neural networks become widely deployed in different applications and on different hardware, it has become increasingly important to optimize inference time and model size along with model accuracy. Most current techniques optimize model size, model accuracy and inference time in different stages, resulting in suboptimal results and computational inefficiency. In this work, we propose a new technique called Smallify that optimizes all three of these metrics at the same time. Specifically we present a new method to simultaneously optimize network size and model performance by neuron-level pruning during training. Neuron-level pruning not only produces much smaller networks but also produces dense weight matrices that are amenable to efficient inference. By applying our technique to convolutional as well as fully connected models, we show that Smallify can reduce network size by 35X with a 6X improvement in inference time with similar accuracy as models found by traditional training techniques.
A great proportion of sequence-to-sequence (Seq2Seq) models for Neural Machine Translation (NMT) adopt Recurrent Neural Network (RNN) to generate translation word by word following a sequential order. As the studies of linguistics have proved that language is not linear word sequence but sequence of complex structure, translation at each step should be conditioned on the whole target-side context. To tackle the problem, we propose a new NMT model that decodes the sequence with the guidance of its structural prediction of the context of the target sequence. Our model generates translation based on the structural prediction of the target-side context so that the translation can be freed from the bind of sequential order. Experimental results demonstrate that our model is more competitive compared with the state-of-the-art methods, and the analysis reflects that our model is also robust to translating sentences of different lengths and it also reduces repetition with the instruction from the target-side context for decoding.
LexNLP is an open source Python package focused on natural language processing and machine learning for legal and regulatory text. The package includes functionality to (i) segment documents, (ii) identify key text such as titles and section headings, (iii) extract over eighteen types of structured information like distances and dates, (iv) extract named entities such as companies and geopolitical entities, (v) transform text into features for model training, and (vi) build unsupervised and supervised models such as word embedding or tagging models. LexNLP includes pre-trained models based on thousands of unit tests drawn from real documents available from the SEC EDGAR database as well as various judicial and regulatory proceedings. LexNLP is designed for use in both academic research and industrial applications, and is distributed at https://…/lexpredict-lexnlp.