Social goods are goods that grant value not only to their owners but also to the owners’ surroundings, be it their families, friends or office mates. The benefit a non-owner derives from the good is affected by many factors, including the type of the good, its availability, and the social status of the non-owner. Depending on the magnitude of the benefit and on the price of the good, a potential buyer might stay away from purchasing the good, hoping to free ride on others’ purchases. A revenue-maximizing seller who sells social goods must take these considerations into account when setting prices for the good. The literature on optimal pricing has advanced considerably over the last decade, but little is known about optimal pricing schemes for selling social goods. In this paper, we conduct a systematic study of revenue-maximizing pricing schemes for social goods: we introduce a Bayesian model for this scenario, and devise nearly-optimal pricing schemes for various types of externalities, both for simultaneous sales and for sequential sales.
The Big Data phenomenon has spawned large-scale linear programming problems. In many cases, these problems are non-stationary. In this paper, we describe a new scalable algorithm called NSLP for solving high-dimensional, non-stationary linear programming problems on modern cluster computing systems. The algorithm consists of two phases: Quest and Targeting. The Quest phase calculates a solution of the system of inequalities defining the constraint system of the linear programming problem under the condition of dynamic changes in input data. To this end, the apparatus of Fejer mappings is used. The Targeting phase forms a special system of points having the shape of an n-dimensional axisymmetric cross. The cross moves in the n-dimensional space in such a way that the solution of the linear programming problem is located all the time in an ‘-vicinity of the central point of the cross.
Persistence diagrams have been widely recognized as a compact descriptor for characterizing multiscale topological features in data. When many datasets are available, statistical features embedded in those persistence diagrams can be extracted by applying machine learnings. In particular, the ability for explicitly analyzing the inverse in the original data space from those statistical features of persistence diagrams is significantly important for practical applications. In this paper, we propose a unified method for the inverse analysis by combining linear machine learning models with persistence images. The method is applied to point clouds and cubical sets, showing the ability of the statistical inverse analysis and its advantages.
This paper proposes a new algorithm for recovery of belief network structure from data handling hidden variables. It consists essentially in an extension of the CI algorithm of Spirtes et al. by restricting the number of conditional dependencies checked up to k variables and in an extension of the original CI by additional steps transforming so called partial including path graph into a belief network. Its correctness is demonstrated.
The goal of our paper is to survey and contrast the major advances in two of the most commonly used high-dimensional techniques, namely, the Lasso and horseshoe regularization methodologies. Lasso is a gold standard for best subset selection of predictors while the horseshoe is a state-of-the-art Bayesian estimator for sparse signals. Lasso is scalable and fast using convex optimization whilst the horseshoe is a non-convex penalty. Our novel perspective focuses on three aspects, (i) efficiency and scalability of computation and (ii) methodological development and performance and (iii) theoretical optimality in high dimensional inference for the Gaussian sparse model and beyond.
Ad-hoc retrieval models can benefit from considering different patterns in the interactions between a query and a document, effectively assessing the relevance of a document for a given user query. Factors to be considered in this interaction include (i) the matching of unigrams and ngrams, (ii) the proximity of the matched query terms, (iii) their position in the document, and (iv) how the different relevance signals are combined over different query terms. While previous work has successfully modeled some of these factors, not all aspects have been fully explored. In this work, we close this gap by proposing different neural components and incorporating them into a single architecture, leading to a novel neural IR model called RE-PACRR. Extensive comparisons with established models on TREC Web Track data confirm that the proposed model yields promising search results.
This study proposed an exhaustive stable/reproducible rule-mining algorithm combined to a classifier to generate both accurate and interpretable models. Our method first extracts rules (i.e., a conjunction of conditions about the values of a small number of input features) with our exhaustive rule-mining algorithm, then constructs a new feature space based on the most relevant rules called ‘local features’ and finally, builds a local predictive model by training a standard classifier on the new local feature space. This local feature space is easy interpretable by providing a human-understandable explanation under the explicit form of rules. Furthermore, our local predictive approach is as powerful as global classical ones like logistic regression (LR), support vector machine (SVM) and rules based methods like random forest (RF) and gradient boosted tree (GBT).
The goal of this tutorial is to introduce key models, algorithms, and open questions related to the use of optimization methods for solving problems arising in machine learning. It is written with an INFORMS audience in mind, specifically those readers who are familiar with the basics of optimization algorithms, but less familiar with machine learning. We begin by deriving a formulation of a supervised learning problem and show how it leads to various optimization problems, depending on the context and underlying assumptions. We then discuss some of the distinctive features of these optimization problems, focusing on the examples of logistic regression and the training of deep neural networks. The latter half of the tutorial focuses on optimization algorithms, first for convex logistic regression, for which we discuss the use of first-order methods, the stochastic gradient method, variance reducing stochastic methods, and second-order methods. Finally, we discuss how these approaches can be employed to the training of deep neural networks, emphasizing the difficulties that arise from the complex, nonconvex structure of these models.
Consider a binary decision making process where a single machine learning classifier replaces a multitude of humans. We raise questions about the resulting loss of diversity in the decision making process. We study the potential benefits of using random classifier ensembles instead of a single classifier in the context of fairness-aware learning and demonstrate various attractive properties: (i) an ensemble of fair classifiers is guaranteed to be fair, for several different measures of fairness, (ii) an ensemble of unfair classifiers can still achieve fair outcomes, and (iii) an ensemble of classifiers can achieve better accuracy-fairness trade-offs than a single classifier. Finally, we introduce notions of distributional fairness to characterize further potential benefits of random classifier ensembles.
We consider the problem of learning the functions computing children from parents in a Structural Causal Model once the underlying causal graph has been identified. This is in some sense the second step after causal discovery. Taking a probabilistic approach to estimating these functions, we derive a natural myopic active learning scheme that identifies the intervention which is optimally informative about all of the unknown functions jointly, given previously observed data. We test the derived algorithms on simple examples, to demonstrate that they produce a structured exploration policy that significantly improves on unstructured base-lines.
It is widely observed that deep learning models with learned parameters generalize well, even with much more model parameters than the number of training samples. We systematically investigate the underlying reasons why deep neural networks often generalize well, and reveal the difference between the minima (with the same training error) that generalize well and those they don’t. We show that it is the characteristics the landscape of the loss function that explains the good generalization capability. For the landscape of loss function for deep networks, the volume of basin of attraction of good minima dominates over that of poor minima, which guarantees optimization methods with random initialization to converge to good minima. We theoretically justify our findings through analyzing 2-layer neural networks; and show that the low-complexity solutions have a small norm of Hessian matrix with respect to model parameters. For deeper networks, extensive numerical evidence helps to support our arguments.
Vectors of data are at the heart of machine learning and data mining. Recently, vector quantization methods have shown great promise in reducing both the time and space costs of operating on vectors. We introduce a vector quantization algorithm that can compress vectors over 12x faster than existing techniques while also accelerating approximate vector operations such as distance and dot product computations by up to 10x. Because it can encode over 2GB of vectors per second, it makes vector quantization cheap enough to employ in many more circumstances. For example, using our technique to compute approximate dot products in a nested loop can multiply matrices faster than a state-of-the-art BLAS implementation, even when our algorithm must first compress the matrices. In addition to showing the above speedups, we demonstrate that our approach can accelerate nearest neighbor search and maximum inner product search by over 100x compared to floating point operations and up to 10x compared to other vector quantization methods. Our approximate Euclidean distance and dot product computations are not only faster than those of related algorithms with slower encodings, but also faster than Hamming distance computations, which have direct hardware support on the tested platforms. We also assess the errors of our algorithm’s approximate distances and dot products, and find that it is competitive with existing, slower vector quantization algorithms.
We introduce NoisyNet, a deep reinforcement learning agent with parametric noise added to its weights, and show that the induced stochasticity of the agent’s policy can be used to aid efficient exploration. The parameters of the noise are learned with gradient descent along with the remaining network weights. NoisyNet is straightforward to implement and adds little computational overhead. We find that replacing the conventional exploration heuristics for A3C, DQN and dueling agents (entropy reward and $\epsilon$-greedy respectively) with NoisyNet yields substantially higher scores for a wide range of Atari games, in some cases advancing the agent from sub to super-human performance.