Information diffusion models typically assume a discrete timeline in which an information token spreads in the network. Since users in real-world networks vary significantly in their intensity and periods of activity, our objective in this work is to answer the question: how can we determine a temporal scale that best agrees with the observed information propagation within a network? A key limitation of existing approaches is that they aggregate the timeline into fixed-size windows, which may not fit all network nodes’ activity periods. We propose the notion of a heterogeneous network clock: a mapping of events to discrete timestamps that best explains their occurrence according to a given cascade propagation model. We focus on the widely adopted independent cascade (IC) model and formalize the optimal clock as the one that maximizes the likelihood of all observed cascades. The single optimal clock (OC) problem can be solved exactly in polynomial time. However, we prove that learning multiple optimal clocks (kOC), corresponding to temporal patterns of groups of network nodes, is NP-hard. We propose scalable solutions that run in almost linear time in the total number of cascade activations and discuss approximation guarantees for each variant. Our algorithms and their detected clocks enable improved cascade size classification (up to 8 percent F1 lift) and improved missing cascade data inference (0.15 better recall). We also demonstrate that the network clocks exhibit consistency within the type of content diffusing in the network and are robust with respect to the propagation probability parameters of the IC model.
Data-based judgments feed into artificial intelligence applications, but they can undergo paradoxical reversal when seemingly unnecessary additional data is provided. Examples of this are Simpson’s reversal and the disjunction effect, where beliefs about the data change once it is presented or aggregated differently. Sometimes the significance of the difference can be evaluated using statistical tests such as Pearson’s chi-squared or Fisher’s exact test, but this may not be helpful in threshold-based decision systems that operate with incomplete information. To mitigate risks in the use of algorithms in decision-making, we consider the question of how to model beliefs. We argue that the evidence supports the view that beliefs are not classical statistical variables and that they should, in the general case, be considered as superposition states of disjoint or polar outcomes. We analyze the disjunction effect from the perspective of the belief as a quantum vector.
Running Deep Neural Network (DNN) models on devices with limited computational capability is a challenge due to large compute and memory requirements. Quantized Neural Networks (QNNs) have emerged as a potential solution to this problem, promising to offer most of the DNN accuracy benefits with much lower computational cost. However, harvesting these benefits on existing mobile CPUs is a challenge since operations on highly quantized datatypes are not natively supported in most instruction set architectures (ISAs). In this work, we first describe a streamlining flow to convert all QNN inference operations to integer ones. Afterwards, we provide techniques based on processing one bit position at a time (bit-serial) to show how QNNs can be efficiently deployed using common bitwise operations. We demonstrate the potential of QNNs on mobile CPUs with microbenchmarks and on a quantized AlexNet, which is 3.5x faster than an optimized 8-bit baseline.
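The bit-serial idea described above can be illustrated with a minimal sketch. It assumes unsigned operands that fit in the stated bit widths; the function name `bit_serial_dot` and the bit-plane packing into Python integers are illustrative choices, not the authors' implementation. Each pair of bit planes is combined with a bitwise AND followed by a population count, and the result is weighted by the corresponding power of two.

```python
def bit_serial_dot(a, b, a_bits, b_bits):
    """Dot product of two unsigned integer vectors, one bit position at a time.

    Assumption: every element of `a` fits in `a_bits` bits and every element
    of `b` fits in `b_bits` bits; higher bits would be silently ignored.
    """
    if len(a) != len(b):
        raise ValueError("vectors must have the same length")

    def bit_plane(values, bit):
        # Pack bit `bit` of every element into one integer (element k -> bit k).
        plane = 0
        for k, v in enumerate(values):
            plane |= ((v >> bit) & 1) << k
        return plane

    a_planes = [bit_plane(a, i) for i in range(a_bits)]
    b_planes = [bit_plane(b, j) for j in range(b_bits)]

    total = 0
    for i, pa in enumerate(a_planes):
        for j, pb in enumerate(b_planes):
            # AND selects positions where both bits are set; popcount counts them.
            matches = bin(pa & pb).count("1")
            total += matches << (i + j)  # weight by 2**(i + j)
    return total


if __name__ == "__main__":
    weights = [3, 1, 2, 0]      # 2-bit values
    activations = [1, 2, 3, 1]  # 2-bit values
    print(bit_serial_dot(weights, activations, 2, 2))
```

The cost grows with the product of the two bit widths, which is why this style of computation is attractive mainly for highly quantized operands.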
Knowledge graphs are known to be helpful for the task of question answering (QA), since they provide well-structured relational information between entities and allow one to further infer indirect facts. However, it is challenging to build QA systems which can learn to reason over knowledge graphs based on question-answer pairs alone. First, when people ask questions, their expressions are noisy (for example, typos in text or variations in pronunciation), which makes it non-trivial for the QA system to match the mentioned entities to the knowledge graph. Second, many questions require multi-hop logical reasoning over the knowledge graph to retrieve the answers. To address these challenges, we propose a novel and unified deep learning architecture and an end-to-end variational learning algorithm which can handle noise in questions and learn multi-hop reasoning simultaneously. Our method achieves state-of-the-art performance on a recent benchmark dataset in the literature. We also derive a series of new benchmark datasets, including questions for multi-hop reasoning, questions paraphrased by a neural translation model, and questions in human voice. Our method yields very promising results on all these challenging datasets.
Assisting users by suggesting completed queries as they type is a common feature of search systems known as query auto-completion. A query auto-completion engine may use prior signals and available information (e.g., whether the user is anonymous, has a history, or visited the site before the search) in order to improve its recommendations. There are many possible strategies for query auto-completion, and a challenge is to design one optimal engine that considers and uses all available information. When different strategies are used to produce the suggestions, it becomes hard to rank these heterogeneous suggestions. An alternative strategy could be to aggregate several engines in order to enhance the diversity of recommendations by combining the capacity of each engine to digest available information differently, while keeping the simplicity of each engine. The main objective of this research is therefore to find a mixture of query completion engines that would beat any engine taken alone. We tackle this problem in the bandit setting and evaluate four strategies to overcome this challenge. Experiments conducted on three real datasets show that a mixture of engines can outperform a single engine.
On electronic game platforms, different payment transactions have different levels of risk. Risk is generally higher for digital goods in e-commerce. However, it differs based on the product and its popularity, the offer type (packaged game, virtual currency for a game, or subscription service), storefront and geography. Existing fraud policies and models make decisions independently for each transaction based on transaction attributes, payment velocities, user characteristics, and other relevant information. However, suspicious transactions may still evade detection, and hence we propose a broad learning approach leveraging a graph-based perspective to uncover relationships among suspicious transactions, i.e., inter-transaction dependency. Our focus is to detect suspicious transactions by capturing common fraudulent behaviors that would not be considered suspicious when considered in isolation. In this paper, we present HitFraud, which leverages heterogeneous information networks for collective fraud detection by exploring correlated and fast-evolving fraudulent behaviors. First, a heterogeneous information network is designed to link entities of interest in the transaction database via different semantics. Then, graph-based features are efficiently discovered from the network by exploiting the concept of meta-paths, and decisions on frauds are made collectively on test instances. Experiments on real-world payment transaction data from Electronic Arts demonstrate that the prediction performance is effectively boosted by HitFraud with fast convergence, where the computation of meta-path-based features is largely optimized. Notably, recall can be improved by up to 7.93% and F-score by 4.62% compared to baselines.
In the multiple linear regression setting, we propose a general framework, termed weighted orthogonal components regression (WOCR), which encompasses many known methods as special cases, including ridge regression and principal components regression. WOCR makes use of the monotonicity inherent in orthogonal components to parameterize the weight function. The formulation allows for efficient determination of tuning parameters and hence is computationally advantageous. Moreover, WOCR offers insights for deriving new, better variants. Specifically, we advocate weighting components based on their correlations with the response, which leads to enhanced predictive performance. Both simulation studies and real data examples are provided to assess and illustrate the advantages of the proposed methods.
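To make the special cases concrete, the following worked equations use standard textbook notation, which is an assumption here and not taken from the source. Let the centered design matrix have thin singular value decomposition $X = U D V^\top$ with orthogonal components $u_1, \dots, u_p$ (the columns of $U$) and singular values $d_1 \ge \dots \ge d_p > 0$. Ridge regression with penalty $\lambda$ and principal components regression with $k$ retained components can then both be written as weighted sums over the same components; the exact weight parameterization used by WOCR may differ.

```latex
\hat{y} \;=\; \sum_{j=1}^{p} w_j \, u_j u_j^\top y,
\qquad
w_j^{\text{OLS}} = 1,
\qquad
w_j^{\text{ridge}} = \frac{d_j^2}{d_j^2 + \lambda},
\qquad
w_j^{\text{PCR}} = \mathbf{1}\{\, j \le k \,\}.
```

In both cases the weights are non-increasing in $j$, which is one way to read the monotonicity mentioned above.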
In this paper, we investigate the online non-convex optimization problem, which generalizes the classic online convex optimization problem by relaxing the convexity assumption on the cost function. For this type of problem, the classic exponential weighting online algorithm has recently been shown to attain a sub-linear regret of $O(\sqrt{T\log T})$. In this paper, we introduce a novel recursive structure to the online algorithm to define a recursive exponential weighting algorithm that attains a regret of $O(\sqrt{T})$, matching the well-known regret lower bound. To the best of our knowledge, this is the first online algorithm with provable $O(\sqrt{T})$ regret for the online non-convex optimization problem.
In information fusion, conflict is an important concept. Indeed, combining several imperfect experts or sources can give rise to conflict. In the theory of belief functions, this notion has been discussed extensively. The mass appearing on the empty set under the conjunctive combination rule is generally considered as conflict, but it is not really a conflict. Several measures of conflict have been proposed, along with approaches to manage this conflict or to decide with conflicting mass functions. We recall some of them in this chapter and we propose a discussion to consider the conflict in information fusion with the theory of belief functions.
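As a small illustration of the mass that appears on the empty set, the sketch below implements the unnormalized conjunctive combination rule, $m(A) = \sum_{B \cap C = A} m_1(B)\, m_2(C)$. Representing mass functions as dictionaries keyed by `frozenset`, the function name `conjunctive_combination`, and the example masses are all assumptions made for the illustration.

```python
from itertools import product


def conjunctive_combination(m1, m2):
    """Unnormalized conjunctive combination of two mass functions.

    Each mass function maps a frozenset (a focal element) to its mass.
    The mass assigned to frozenset() is the quantity usually read as conflict.
    """
    combined = {}
    for (b, mass_b), (c, mass_c) in product(m1.items(), m2.items()):
        a = b & c  # intersection of the two focal elements
        combined[a] = combined.get(a, 0.0) + mass_b * mass_c
    return combined


if __name__ == "__main__":
    # Two hypothetical sources over the frame {"a", "b"}.
    source_1 = {frozenset({"a"}): 0.6, frozenset({"a", "b"}): 0.4}
    source_2 = {frozenset({"b"}): 0.5, frozenset({"a", "b"}): 0.5}
    result = conjunctive_combination(source_1, source_2)
    for focal, mass in result.items():
        print(sorted(focal), mass)
```

In this example the two sources partly disagree, so a share of the mass lands on the empty set; whether that share should be interpreted as conflict is the question the chapter discusses.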
Empirical Bayes is a versatile approach to ‘learn from a lot’ in two ways: first, from a large number of variables and second, from a potentially large amount of prior information, e.g. stored in public repositories. We review applications of a variety of empirical Bayes methods to a broad spectrum of prediction methods including penalized regression, random forest, linear discriminant analysis, and Bayesian models with sparse or dense priors. We discuss ‘formal’ empirical Bayes methods which maximize the marginal likelihood, but also more informal approaches based on other data summaries. We contrast empirical Bayes to cross-validation and full Bayes, and discuss hybrid approaches. To study the relation between the quality of an empirical Bayes estimator and $p$, the number of variables, we derive the expected mean squared error of a simple empirical Bayes estimator in a linear model setting. We argue that empirical Bayes is particularly useful when the prior contains multiple parameters, modeling a priori information on variables, termed ‘co-data’. In particular, we present two novel examples that allow for co-data. First, a Bayesian spike-and-slab setting that facilitates inclusion of multiple co-data sources and types; second, a hybrid empirical Bayes-full Bayes ridge regression approach for estimation of the posterior predictive interval.
In this paper, we introduce the Action Schema Network (ASNet): a neural network architecture for learning generalised policies for probabilistic planning problems. By mimicking the relational structure of planning problems, ASNets are able to adopt a weight-sharing scheme which allows the network to be applied to any problem from a given planning domain. This allows the cost of training the network to be amortised over all problems in that domain. Further, we propose a training method which balances exploration and supervised training on small problems to produce a policy which remains robust when evaluated on larger problems. In experiments, we show that ASNet’s learning capability allows it to significantly outperform traditional non-learning planners in several challenging domains.
We approximate analytic queries on streaming data with weighted reservoir sampling. For a stream of tuples of a data warehouse, we show how to approximate some OLAP queries. For a stream of graph edges from a social network, we approximate the communities as the large connected components of the edges in the reservoir. We show that for a model of random graphs which follow a power law degree distribution, the community detection algorithm is a good approximation. Given two streams of graph edges from two sources, we define the Community Correlation as the fraction of the nodes in communities in both streams. Although we do not store the edges of the streams, we can approximate the Community Correlation and define the Integration of two streams. We illustrate this approach with Twitter streams associated with TV programs.
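The pipeline described above, sampling edges into a reservoir and reading communities off its large connected components, can be sketched as follows. The source does not say which weighted sampling scheme or edge weights are used; this sketch assumes the Efraimidis-Spirakis key method (key $u^{1/w}$ with $u$ uniform on $[0,1)$ and weight $w > 0$) and a hypothetical `min_size` threshold for what counts as a large component.

```python
import heapq
import random


def weighted_reservoir(stream, k, rng):
    """Keep k items from a stream of (item, weight) pairs.

    Assumed scheme: each item gets key u ** (1 / weight); the k largest
    keys are retained. Weights must be strictly positive.
    """
    heap = []  # min-heap of (key, arrival index, item)
    for index, (item, weight) in enumerate(stream):
        key = rng.random() ** (1.0 / weight)
        entry = (key, index, item)
        if len(heap) < k:
            heapq.heappush(heap, entry)
        elif key > heap[0][0]:
            heapq.heapreplace(heap, entry)  # evict the smallest key
    return [item for _, _, item in heap]


def large_components(edges, min_size):
    """Connected components with at least min_size nodes, via union-find."""
    parent = {}

    def find(x):
        parent.setdefault(x, x)
        while parent[x] != x:
            parent[x] = parent[parent[x]]  # path halving
            x = parent[x]
        return x

    for u, v in edges:
        parent[find(u)] = find(v)

    groups = {}
    for node in list(parent):
        groups.setdefault(find(node), set()).add(node)
    return [nodes for nodes in groups.values() if len(nodes) >= min_size]


if __name__ == "__main__":
    rng = random.Random(0)
    # Hypothetical edge stream: (edge, weight) pairs.
    stream = [((1, 2), 1.0), ((2, 3), 1.0), ((4, 5), 1.0), ((3, 1), 2.0)]
    reservoir = weighted_reservoir(stream, k=3, rng=rng)
    print(large_components(reservoir, min_size=3))
```

Only the reservoir is held in memory, so the component computation runs on at most `k` edges regardless of the stream length.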
Multi-agent settings are quickly gathering importance in machine learning. Beyond a plethora of recent work on deep multi-agent reinforcement learning, hierarchical reinforcement learning, generative adversarial networks and decentralized optimization can all be seen as instances of this setting. However, the presence of multiple learning agents in these settings renders the training problem non-stationary and often leads to unstable training or undesired final results. We present Learning with Opponent-Learning Awareness (LOLA), a method that reasons about the anticipated learning of the other agents. The LOLA learning rule includes an additional term that accounts for the impact of the agent’s policy on the anticipated parameter update of the other agents. We show that the LOLA update rule can be efficiently calculated using an extension of the likelihood ratio policy gradient update, making the method suitable for model-free reinforcement learning. This method thus scales to large parameter and input spaces and nonlinear function approximators. Preliminary results show that the encounter of two LOLA agents leads to the emergence of tit-for-tat and therefore cooperation in the infinitely iterated prisoners’ dilemma, while independent learning does not. In this domain, LOLA also receives higher payouts compared to a naive learner, and is robust against exploitation by higher order gradient-based methods. Applied to infinitely repeated matching pennies, only LOLA agents converge to the Nash equilibrium. We also apply LOLA to a grid world task with an embedded social dilemma using deep recurrent policies. Again, by considering the learning of the other agent, LOLA agents learn to cooperate out of selfish interests.
The nonnegative matrix factorization is a widely used, flexible matrix decomposition, finding applications in biology, image and signal processing and information retrieval, among other areas. Here we present a related matrix factorization. A multi-objective optimization problem finds conical combinations of templates that approximate a given data matrix. The templates are chosen so that as far as possible only the initial data set can be represented this way. However, the templates are not required to be nonnegative nor convex combinations of the original data.
In contrast with goal-oriented dialogue, social dialogue has no clear measure of task success. Consequently, evaluation of these systems is notoriously hard. In this paper, we review current evaluation methods, focusing on automatic metrics. We conclude that turn-based metrics often ignore the context and do not account for the fact that several replies are valid, while end-of-dialogue rewards are mainly hand-crafted. Both lack grounding in human perceptions.
In this article we introduce Merge and Select (a methodology) and factorMerger (an R package) for exploration and visualization of k-group comparisons. Comparison of k groups is one of the most important issues in exploratory analyses and it has a wide range of applications. The classical solution is to test a null hypothesis that observations from all groups come from the same distribution. If the global null hypothesis is rejected, a more detailed analysis of differences among pairs of groups is performed. The traditional approach is to use pairwise post hoc tests in order to verify which groups differ significantly. However, this approach fails with a large number of groups in both the interpretation and visualization layers. The Merge and Select methodology solves this problem by using an easy-to-understand description of LRT-based similarity among groups.
Deep Neural Networks (DNNs) have been shown to be vulnerable to adversarial examples, which are data points cleverly constructed to fool the classifier. Such attacks can be devastating in practice, especially as DNNs are being applied to an ever-increasing number of critical tasks like image recognition in autonomous driving. In this paper, we introduce a new perspective on the problem. We do so by first defining robustness of a classifier to adversarial exploitation. Next, we show that the problems of adversarial example generation and defense can both be posed as learning problems, which are duals of each other. We also show formally that our defense aims to increase robustness of the classifier. We demonstrate the efficacy of our techniques by experimenting with the MNIST and CIFAR-10 datasets.