# If you did not already know

Fuzzy Cognitive Map
A Fuzzy cognitive map is a cognitive map within which the relations between the elements (e.g. concepts, events, project resources) of a “mental landscape” can be used to compute the “strength of impact” of these elements. The theory behind that computation is fuzzy logic. …
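As a rough illustration of how such a "strength of impact" might be computed, the sketch below iterates a tiny three-concept map. The update rule (a sigmoid of the weighted incoming activations), the concept names, and the weights are all assumptions chosen for illustration; they are not taken from the definition above, and other update rules exist.

```python
import math


def sigmoid(x: float) -> float:
    """Squash any real value into the interval (0, 1)."""
    return 1.0 / (1.0 + math.exp(-x))


def fcm_step(state, weights):
    """One synchronous update: concept i receives the squashed, weighted
    sum of the activations of all concepts j that influence it."""
    n = len(state)
    return [
        sigmoid(sum(weights[j][i] * state[j] for j in range(n)))
        for i in range(n)
    ]


def run_fcm(state, weights, steps=20):
    """Apply the update rule a fixed number of times."""
    for _ in range(steps):
        state = fcm_step(state, weights)
    return state


# Hypothetical map: weights[j][i] in [-1, 1] is the influence of concept j
# on concept i (0 = workload, 1 = stress, 2 = output).
weights = [
    [0.0, 0.8, 0.0],   # workload raises stress
    [0.0, 0.0, -0.6],  # stress lowers output
    [0.0, 0.0, 0.0],   # output influences nothing
]
final_state = run_fcm([1.0, 0.0, 0.0], weights)
```

In this toy map, a concept with no incoming edges settles at 0.5 (the sigmoid of zero), a positively influenced concept ends above 0.5, and a negatively influenced one ends below it.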
Discrete Dantzig Selector
We propose a new high-dimensional linear regression estimator: the Discrete Dantzig Selector, which minimizes the number of nonzero regression coefficients, subject to a budget on the maximal absolute correlation between the features and the residuals. We show that the estimator can be expressed as a solution to a Mixed Integer Linear Optimization (MILO) problem—a computationally tractable framework that enables the computation of provably optimal global solutions. Our approach has the appealing characteristic that even if we terminate the optimization problem at an early stage, it exits with a certificate of sub-optimality on the quality of the solution. We develop new discrete first order methods, motivated by recent algorithmic developments in first order continuous convex optimization, to obtain high quality feasible solutions for the Discrete Dantzig Selector problem. Our proposal leads to advantages over the off-the-shelf state-of-the-art integer programming algorithms, which include superior upper bounds obtained for a given computational budget. When a solution obtained from the discrete first order methods is passed as a warm-start to a MILO solver, the performance of the latter improves significantly. Exploiting problem specific information, we propose enhanced MILO formulations that further improve the algorithmic performance of the MILO solvers. We demonstrate, both theoretically and empirically, that, in a wide range of regimes, the statistical properties of the Discrete Dantzig Selector are superior to those of popular $\ell_{1}$-based approaches. For problem instances with $p \approx 2500$ features and $n \approx 900$ observations, our computational framework delivers optimal solutions in a few minutes and certifies optimality within an hour. …
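Read literally, the description above corresponds to the following optimization problem. The symbols are assumptions introduced here: $X$ is the $n \times p$ feature matrix, $y$ the response vector, $\beta$ the coefficient vector, and $\delta$ the correlation budget.

$$
\hat{\beta} \in \arg\min_{\beta \in \mathbb{R}^p} \; \|\beta\|_0 \quad \text{subject to} \quad \|X^\top (y - X\beta)\|_\infty \le \delta
$$

One standard way to write such a problem as a mixed integer linear program is a big-M formulation, sketched below under the assumption that a bound $M$ on the magnitude of every coefficient is known. The abstract does not spell out the authors' exact formulation, so this is only indicative.

$$
\min_{\beta, z} \; \sum_{i=1}^{p} z_i \quad \text{subject to} \quad -\delta \le x_j^\top (y - X\beta) \le \delta \;\; (j = 1, \dots, p), \quad -M z_i \le \beta_i \le M z_i, \quad z_i \in \{0, 1\} \;\; (i = 1, \dots, p)
$$

Here $x_j$ is the $j$-th column of $X$, and $z_i = 0$ forces $\beta_i = 0$, so the objective counts the nonzero coefficients.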
Robust Principal Component Analysis (ROBPCA)
We introduce a new method for robust principal component analysis (PCA). Classical PCA is based on the empirical covariance matrix of the data and hence is highly sensitive to outlying observations. Two robust approaches have been developed to date. The first approach is based on the eigenvectors of a robust scatter matrix such as the minimum covariance determinant or an S-estimator and is limited to relatively low-dimensional data. The second approach is based on projection pursuit and can handle high-dimensional data. Here we propose the ROBPCA approach, which combines projection pursuit ideas with robust scatter matrix estimation. ROBPCA yields more accurate estimates at noncontaminated datasets and more robust estimates at contaminated data. ROBPCA can be computed rapidly, and is able to detect exact-fit situations. As a by-product, ROBPCA produces a diagnostic plot that displays and classifies the outliers. We apply the algorithm to several datasets from chemometrics and engineering. …

# What’s new on arXiv

There are many application scenarios for which the computational performance and memory footprint of the prediction phase of Deep Neural Networks (DNNs) need to be optimized. Binary Deep Neural Networks (BDNNs) have been shown to be an effective way of achieving this objective. In this paper, we show how Convolutional Neural Networks (CNNs) can be implemented using binary representations. Espresso is a compact, yet powerful library written in C/CUDA that features all the functionalities required for the forward propagation of CNNs, in a binary file less than 400KB, without any external dependencies. Although it is mainly designed to take advantage of massive GPU parallelism, Espresso also provides an equivalent CPU implementation for CNNs. Espresso provides special convolutional and dense layers for BCNNs, leveraging bit-packing and bit-wise computations for efficient execution. These techniques provide a speed-up of matrix-multiplication routines, and at the same time, reduce memory usage when storing parameters and activations. We experimentally show that Espresso is significantly faster than existing implementations of optimized binary neural networks ($\approx$ 2 orders of magnitude). Espresso is released under the Apache 2.0 license and is available at http://…/espresso.
This paper considers the distributed optimization problem over a network, where the objective is to optimize a global function formed by a sum of local functions, using only local computation and communication. We develop an Accelerated Distributed Nesterov Gradient Descent (Acc-DNGD) method. When the objective function is convex and $L$-smooth, we show that it achieves a $O(\frac{1}{t^{1.4-\epsilon}})$ convergence rate for all $\epsilon\in(0,1.4)$. We also show the convergence rate can be improved to $O(\frac{1}{t^2})$ if the objective function is a composition of a linear map and a strongly-convex and smooth function. When the objective function is $\mu$-strongly convex and $L$-smooth, we show that it achieves a linear convergence rate of $O([ 1 - O( (\frac{\mu}{L})^{5/7} )]^t)$, where $\frac{L}{\mu}$ is the condition number of the objective.
Inference of latent feature models in the Bayesian nonparametric setting is generally difficult, especially in high dimensional settings, because it usually requires proposing features from some prior distribution. In special cases, where the integration is tractable, we could sample feature assignments according to a predictive likelihood. However, this still may not be efficient in high dimensions. We present a novel method to accelerate the mixing of latent variable model inference by proposing feature locations from the data, as opposed to the prior. This sampling method is efficient for proper mixing of the Markov chain Monte Carlo sampler, computationally attractive because this method can be performed in parallel, and is theoretically guaranteed to converge to the posterior distribution as its limiting distribution.
Approximate probabilistic inference algorithms are central to many fields. Examples include sequential Monte Carlo inference in robotics, variational inference in machine learning, and Markov chain Monte Carlo inference in statistics. A key problem faced by practitioners is measuring the accuracy of an approximate inference algorithm on a specific dataset. This paper introduces the auxiliary inference divergence estimator (AIDE), an algorithm for measuring the accuracy of approximate inference algorithms. AIDE is based on the observation that inference algorithms can be treated as probabilistic models and the random variables used within the inference algorithm can be viewed as auxiliary variables. This view leads to a new estimator for the symmetric KL divergence between the output distributions of two inference algorithms. The paper illustrates application of AIDE to algorithms for inference in regression, hidden Markov, and Dirichlet process mixture models. The experiments show that AIDE captures the qualitative behavior of a broad class of inference algorithms and can detect failure modes of inference algorithms that are missed by standard heuristics.
In this paper we introduce RankPL, a modeling language that can be thought of as a qualitative variant of a probabilistic programming language with a semantics based on Spohn’s ranking theory. Broadly speaking, RankPL can be used to represent and reason about processes that exhibit uncertainty expressible by distinguishing ‘normal’ from ‘surprising’ events. RankPL allows (iterated) revision of rankings over alternative program states and supports various types of reasoning, including abduction and causal inference. We present the language, its denotational semantics, and a number of practical examples. We also discuss an implementation of RankPL that is available for download.
We present an accelerated algorithm for hierarchical density based clustering. Our new algorithm improves upon HDBSCAN*, which itself provided a significant qualitative improvement over the popular DBSCAN algorithm. The accelerated HDBSCAN* algorithm provides comparable performance to DBSCAN, while supporting variable density clusters, and eliminating the need for the difficult to tune distance scale parameter. This makes accelerated HDBSCAN* the default choice for density based clustering. Library available at: https://…/hdbscan
A number of real world problems in many domains (e.g. sociology, biology, political science and communication networks) can be modeled as dynamic networks with nodes representing entities of interest and edges representing interactions among the entities at different points in time. A common representation for such models is the snapshot model, where a network is defined at logical time-stamps. An important problem under this model is change point detection. In this work we devise an effective and efficient three-step approach for detecting change points in dynamic networks under the snapshot model. Our algorithm achieves up to 9X speedup over the state-of-the-art while improving quality on both synthetic and real world networks.
The success of deep neural networks has inspired many to wonder whether other learners could benefit from deep, layered architectures. We present a general framework called forward thinking for deep learning that generalizes the architectural flexibility and sophistication of deep neural networks while also allowing for (i) different types of learning functions in the network, other than neurons, and (ii) the ability to adaptively deepen the network as needed to improve results. This is done by training one layer at a time, and once a layer is trained, the input data are mapped forward through the layer to create a new learning problem. The process is then repeated, transforming the data through multiple layers, one at a time, rendering a new dataset, which is expected to be better behaved, and on which a final output layer can achieve good performance. In the case where the neurons of deep neural nets are replaced with decision trees, we call the result a Forward Thinking Deep Random Forest (FTDRF). We demonstrate a proof of concept by applying FTDRF on the MNIST dataset. We also provide a general mathematical formulation that allows for other types of deep learning problems to be considered.
We introduce recurrent additive networks (RANs), a new gated RNN which is distinguished by the use of purely additive latent state updates. At every time step, the new state is computed as a gated component-wise sum of the input and the previous state, without any of the non-linearities commonly used in RNN transition dynamics. We formally show that RAN states are weighted sums of the input vectors, and that the gates only contribute to computing the weights of these sums. Despite this relatively simple functional form, experiments demonstrate that RANs outperform both LSTMs and GRUs on benchmark language modeling problems. This result shows that many of the non-linear computations in LSTMs and related networks are not essential, at least for the problems we consider, and suggests that the gates are doing more of the computational work than previously understood.
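The claim that the state is just a weighted sum of the inputs can be checked on a toy recurrence. The sketch below is a deliberate simplification and not the paper's architecture: the gates are scalars computed from a hypothetical summary of the current input and previous state, the parameters `w_i`, `w_f`, `b_i`, `b_f` are made up, and the input is used directly as the candidate content.

```python
import math


def sigmoid(x):
    return 1.0 / (1.0 + math.exp(-x))


def ran_states(inputs, w_i=0.5, w_f=-0.5, b_i=0.0, b_f=1.0):
    """Run a toy additive recurrence over a list of equal-length vectors.

    Returns the final state and the per-input weights that reproduce it.
    The scalar gates and their parameters are illustrative assumptions.
    """
    dim = len(inputs[0])
    state = [0.0] * dim
    weights = []
    for x in inputs:
        summary = (sum(x) + sum(state)) / dim
        i_gate = sigmoid(w_i * summary + b_i)  # input gate
        f_gate = sigmoid(w_f * summary + b_f)  # forget gate
        # Purely additive update: no tanh or other non-linearity on the state.
        state = [i_gate * xv + f_gate * sv for xv, sv in zip(x, state)]
        # Earlier inputs are scaled by the forget gate; the new input enters
        # with the input gate as its weight.
        weights = [w * f_gate for w in weights] + [i_gate]
    return state, weights


inputs = [[1.0, 0.0], [0.0, 2.0], [3.0, 1.0]]
state, weights = ran_states(inputs)
# Rebuild the state as an explicit weighted sum of the input vectors.
reconstructed = [sum(w * x[d] for w, x in zip(weights, inputs)) for d in range(2)]
```

The gates only determine the weights; the state itself never leaves the span of the inputs, which is the property the abstract describes.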
The growing pressure on cloud application scalability has accentuated storage performance as a critical bottleneck. Although cache replacement algorithms have been extensively studied, cache prefetching, which reduces latency by retrieving items before they are actually requested, remains an underexplored area. Existing approaches to history-based prefetching, in particular, provide too few benefits for real systems for the resources they cost. We propose MITHRIL, a prefetching layer that efficiently exploits historical patterns in cache request associations. MITHRIL is inspired by sporadic association rule mining and only relies on the timestamps of requests. Through evaluation of 135 block-storage traces, we show that MITHRIL is effective, giving an average of a 55% hit ratio increase over LRU and PROBABILITY GRAPH, and a 36% hit ratio gain over AMP, at reasonable cost. We further show that MITHRIL can supplement any cache replacement algorithm and be readily integrated into existing systems. Furthermore, we demonstrate the improvement comes from MITHRIL being able to capture mid-frequency blocks.
We propose a novel neural network structure called CrossNets, which considers architectures on directed acyclic graphs. This structure builds on previous generalizations of feed forward models, such as ResNets, by allowing for all forward cross connections between layers (both adjacent and non-adjacent). The addition of cross connections within the network increases information flow across the whole network, leading to better training and testing performances. The superior performance of the network is tested against four benchmark datasets: MNIST, CIFAR-10, CIFAR-100, and SVHN. We conclude with a proof of convergence for CrossNets to a local minimum for error, where weights for connections are chosen through backpropagation with momentum.
Deep reinforcement learning (DRL) methods such as the Deep Q-Network (DQN) have achieved state-of-the-art results in a variety of challenging, high-dimensional domains. This success is mainly attributed to the power of deep neural networks to learn rich domain representations for approximating the value function or policy. Batch reinforcement learning methods with linear representations, on the other hand, are more stable and require less hyper parameter tuning. Yet, substantial feature engineering is necessary to achieve good results. In this work we propose a hybrid approach — the Least Squares Deep Q-Network (LS-DQN), which combines rich feature representations learned by a DRL algorithm with the stability of a linear least squares method. We do this by periodically re-training the last hidden layer of a DRL network with a batch least squares update. Key to our approach is a Bayesian regularization term for the least squares update, which prevents over-fitting to the more recent data. We tested LS-DQN on five Atari games and demonstrate significant improvement over vanilla DQN and Double-DQN. We also investigated the reasons for the superior performance of our method. Interestingly, we found that the performance improvement can be attributed to the large batch size used by the LS method when optimizing the last layer.
We present a novel method for frequentist statistical inference in $M$-estimation problems, based on stochastic gradient descent (SGD) with a fixed step size: we demonstrate that the average of such SGD sequences can be used for statistical inference, after proper scaling. An intuitive analysis using the Ornstein-Uhlenbeck process suggests that such averages are asymptotically normal. From a practical perspective, our SGD-based inference procedure is a first order method, and is well-suited for large scale problems. To show its merits, we apply it to both synthetic and real datasets, and demonstrate that its accuracy is comparable to classical statistical methods, while requiring potentially far less computation.
The method introduced in this paper aims at helping deep learning practitioners faced with an overfit problem. The idea is to replace, in a multi-branch network, the standard summation of parallel branches with a stochastic affine combination. Applied to 3-branch residual networks, shake-shake regularization improves on the best single shot published results on CIFAR-10 and CIFAR-100 by reaching test errors of 2.86% and 15.85%. Experiments on architectures without skip connections or Batch Normalization show encouraging results and open the door to a large set of applications. Code is available at https://…/shake-shake.
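The "stochastic affine combination" can be pictured with a few lines of code. This is a minimal sketch of the forward pass only, assuming one skip connection plus two residual branches whose outputs are already computed; the function name and the use of the mean weight 0.5 at evaluation time are choices made here for illustration, and the handling of gradients is not shown.

```python
import random


def shake_shake_combine(skip, branch_a, branch_b, training=True, rng=random):
    """Combine a skip connection with two residual branch outputs.

    During training the branches are mixed with a random weight alpha drawn
    uniformly from [0, 1); at evaluation time the mean weight 0.5 is used.
    The two branch weights always sum to 1 (an affine combination).
    """
    alpha = rng.random() if training else 0.5
    return [
        s + alpha * a + (1.0 - alpha) * b
        for s, a, b in zip(skip, branch_a, branch_b)
    ]


skip = [1.0, 2.0]
branch_a = [2.0, 0.0]
branch_b = [0.0, 4.0]

eval_out = shake_shake_combine(skip, branch_a, branch_b, training=False)
train_out = shake_shake_combine(skip, branch_a, branch_b, rng=random.Random(0))
```

A plain residual block would instead return `skip + branch_a + branch_b`; the random mixing is what acts as the regularizer.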
We introduce a novel framework for adversarial training where the target distribution is annealed between the uniform distribution and the data distribution. We posited a conjecture that learning under continuous annealing in the nonparametric regime is stable irrespective of the divergence measures in the objective function and proposed an algorithm, dubbed $\beta$-GAN, in corollary. In this framework, the fact that the initial support of the generative network is the whole ambient space combined with annealing are key to balancing the minimax game. In our experiments on synthetic data, MNIST, and CelebA, $\beta$-GAN with a fixed annealing schedule was stable and did not suffer from mode collapse.
Despite incredible recent advances in machine learning, building machine learning applications remains prohibitively time-consuming and expensive for all but the best-trained, best-funded engineering organizations. This expense comes not from a need for new and improved statistical models but instead from a lack of systems and tools for supporting end-to-end machine learning application development, from data preparation and labeling to productionization and monitoring. In this document, we outline opportunities for infrastructure supporting usable, end-to-end machine learning applications in the context of the nascent DAWN (Data Analytics for What’s Next) project at Stanford.
Collecting labeled data is costly and thus is a critical bottleneck in real-world classification tasks. To mitigate the problem, we consider a complementary label, which specifies a class that a pattern does not belong to. Collecting complementary labels would be less laborious than ordinary labels since users do not have to carefully choose the correct class from many candidate classes. However, complementary labels are less informative than ordinary labels and thus a suitable approach is needed to better learn from complementary labels. In this paper, we show that an unbiased estimator of the classification risk can be obtained only from complementary labels, if a loss function satisfies a particular symmetric condition. We theoretically prove the estimation error bounds for the proposed method, and experimentally demonstrate the usefulness of the proposed algorithms.
Identifying a set of homogeneous clusters in a heterogeneous dataset is one of the most important classes of problems in statistical modeling. In the realm of unsupervised partitional clustering, k-means is a very important algorithm for this. In this technical report, we develop a new k-means variant called Augmented k-means, which is a hybrid of k-means and logistic regression. During each iteration, logistic regression is used to predict the current cluster labels, and the cluster belonging probabilities are used to control the subsequent re-estimation of cluster means. Observations which can’t be firmly identified into clusters are excluded from the re-estimation step. This can be valuable when the data exhibit many characteristics of real datasets such as heterogeneity, non-sphericity, substantial overlap, and high scatter. Augmented k-means frequently outperforms k-means by more accurately classifying observations into known clusters and / or converging in fewer iterations. We demonstrate this on both simulated and real datasets. Our algorithm is implemented in Python and will be available with this report.
Reinforcement learning is a general and powerful framework with which to study and implement artificial intelligence. Recent advances in deep learning have enabled RL algorithms to achieve impressive performance in restricted domains such as playing Atari video games (Mnih et al., 2015) and, recently, the board game Go (Silver et al., 2016). However, we are still far from constructing a generally intelligent agent. Many of the obstacles and open questions are conceptual: What does it mean to be intelligent? How does one explore and learn optimally in general, unknown environments? What, in fact, does it mean to be optimal in the general sense? The universal Bayesian agent AIXI (Hutter, 2005) is a model of a maximally intelligent agent, and plays a central role in the sub-field of general reinforcement learning (GRL). Recently, AIXI has been shown to be flawed in important ways; it doesn’t explore enough to be asymptotically optimal (Orseau, 2010), and it can perform poorly with certain priors (Leike and Hutter, 2015). Several variants of AIXI have been proposed to attempt to address these shortfalls: among them are entropy-seeking agents (Orseau, 2011), knowledge-seeking agents (Orseau et al., 2013), Bayes with bursts of exploration (Lattimore, 2013), MDL agents (Leike, 2016a), Thompson sampling (Leike et al., 2016), and optimism (Sunehag and Hutter, 2015). We present AIXIjs, a JavaScript implementation of these GRL agents. This implementation is accompanied by a framework for running experiments against various environments, similar to OpenAI Gym (Brockman et al., 2016), and a suite of interactive demos that explore different properties of the agents, similar to REINFORCEjs (Karpathy, 2015). We use AIXIjs to present numerous experiments illustrating fundamental properties of, and differences between, these agents.
Logic regression was developed more than a decade ago as a tool to construct predictors from Boolean combinations of binary covariates. It has been mainly used to model epistatic effects in genetic association studies, which is very appealing due to the intuitive interpretation of logic expressions to describe the interaction between genetic variations. Nevertheless logic regression has remained less well known than other approaches to epistatic association mapping. Here we will adopt an advanced evolutionary algorithm called GMJMCMC (Genetically modified Mode Jumping Markov Chain Monte Carlo) to perform Bayesian model selection in the space of logic regression models. After describing the algorithmic details of GMJMCMC we perform a comprehensive simulation study that illustrates its performance given logic regression terms of various complexity. Specifically GMJMCMC is shown to be able to identify three-way and even four-way interactions with relatively large power, a level of complexity which has not been achieved by previous implementations of logic regression. We apply GMJMCMC to reanalyze QTL mapping data for Recombinant Inbred Lines in Arabidopsis thaliana and from a backcross population in Drosophila where we identify several interesting epistatic effects.
With the increase of online customer opinions on specialised websites and social networks, the need for automatic systems that help organise and classify customer reviews by domain-specific aspect/categories and sentiment polarity is greater than ever. Supervised approaches to Aspect Based Sentiment Analysis obtain good results for the domain/language they are trained on, but obtaining manually labelled data for training supervised systems for all domains and languages tends to be very costly and time consuming. In this work we describe W2VLDA, an unsupervised system based on topic modelling that, combined with some other unsupervised methods and a minimal configuration, performs aspect/category classification, aspect-terms/opinion-words separation and sentiment polarity classification for any given domain and language. We also evaluate the performance of the aspect and sentiment classification in the multilingual SemEval 2016 task 5 (ABSA) dataset. We show competitive results for several languages (English, Spanish, French and Dutch) and domains (hotels, restaurants, electronic-devices).
As opposed to standard empirical risk minimization (ERM), distributionally robust optimization aims to minimize the worst-case risk over a larger ambiguity set containing the original empirical distribution of the training data. In this work, we describe a minimax framework for statistical learning with ambiguity sets given by balls in Wasserstein space. In particular, we prove a generalization bound that involves the covering number properties of the original ERM problem. As an illustrative example, we provide generalization guarantees for domain adaptation problems where the Wasserstein distance between the source and target domain distributions can be reliably estimated from unlabeled samples.
We study algorithms for online nonparametric regression that learn the directions along which the regression function is smoother. Our algorithm learns the Mahalanobis metric based on the gradient outer product matrix $\boldsymbol{G}$ of the regression function (automatically adapting to the effective rank of this matrix), while simultaneously bounding the regret —on the same data sequence— in terms of the spectrum of $\boldsymbol{G}$. As a preliminary step in our analysis, we generalize a nonparametric online learning algorithm by Hazan and Megiddo by enabling it to compete against functions whose Lipschitzness is measured with respect to an arbitrary Mahalanobis metric.

# If you did not already know

Exponential Moving Average
An exponential moving average (EMA), also known as an exponentially weighted moving average (EWMA), is a type of infinite impulse response filter that applies weighting factors which decrease exponentially. The weighting for each older datum decreases exponentially, never reaching zero. The graph at right shows an example of the weight decrease. …
Exponential Moving Average (EMA)
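The recursive form of the filter is short enough to write out. In this sketch the smoothing factor is called `alpha` and the first observation seeds the average; both are common conventions assumed here, not something the definition above prescribes.

```python
def ema(values, alpha):
    """Exponential moving average with smoothing factor alpha in (0, 1].

    Each output is alpha * current value + (1 - alpha) * previous output,
    so an observation k steps back carries weight alpha * (1 - alpha) ** k.
    """
    if not 0.0 < alpha <= 1.0:
        raise ValueError("alpha must be in (0, 1]")
    smoothed = []
    for value in values:
        if not smoothed:
            smoothed.append(float(value))  # seed with the first observation
        else:
            smoothed.append(alpha * value + (1.0 - alpha) * smoothed[-1])
    return smoothed


result = ema([10, 20, 30], alpha=0.5)
```

With `alpha=0.5` the three inputs above give 10.0, 15.0 and 22.5; a smaller `alpha` makes older observations fade more slowly.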
Pruned Exact Linear Time
This approach is based on the algorithm of Jackson et al. (2005 (‘An algorithm for optimal partitioning of data on an interval’)), but involves a pruning step within the dynamic program. This pruning reduces the computational cost of the method, but does not affect the exactness of the resulting segmentation. It can be applied to find changepoints under a range of statistical criteria such as penalised likelihood, quasi-likelihood (Braun et al., 2000 (‘Multiple changepoint fitting via quasilikelihood, with application to DNA sequence segmentation’)) and cumulative sum of squares (Inclan and Tiao, 1994 (‘Use of cumulative sums of squares for retrospective detection of changes of variance.’); Picard et al., 2011 (‘Joint segmentation, calling and normalization of multiple cgh profiles’)). In simulations we compare PELT with both Binary Segmentation and Optimal Partitioning. We show that PELT can be calculated orders of magnitude faster than Optimal Partitioning, particularly for long data sets. Whilst asymptotically PELT can be quicker, we find that in practice Binary Segmentation is quicker on the examples we consider, and we believe this would be the case in almost all applications. However, we show that PELT leads to a substantially more accurate segmentation than Binary Segmentation. …
Pruned Exact Linear Time (PELT)
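The dynamic program with its pruning step can be sketched compactly. The version below assumes a particular cost, the sum of squared deviations from the segment mean (a change-in-mean model), and a pruning constant of zero, which is valid for that cost; the function name and the example data are illustrative only.

```python
def pelt(data, penalty):
    """Return changepoint positions (indices where a new segment starts)."""
    n = len(data)
    # Prefix sums give the cost of any segment in constant time.
    sum1, sum2 = [0.0], [0.0]
    for value in data:
        sum1.append(sum1[-1] + value)
        sum2.append(sum2[-1] + value * value)

    def cost(start, end):
        """Squared error of data[start:end] around its own mean."""
        total = sum1[end] - sum1[start]
        return (sum2[end] - sum2[start]) - total * total / (end - start)

    best = [-penalty] + [0.0] * n   # best[t]: optimal penalised cost of data[:t]
    last = [0] * (n + 1)            # last[t]: start of the final segment
    candidates = [0]
    for t in range(1, n + 1):
        scored = [(best[s] + cost(s, t) + penalty, s) for s in candidates]
        best[t], last[t] = min(scored)
        # Pruning: drop any start s that can never be optimal at a later time.
        candidates = [s for value, s in scored if value - penalty <= best[t]]
        candidates.append(t)

    changepoints = []
    t = n
    while last[t] > 0:
        t = last[t]
        changepoints.append(t)
    return sorted(changepoints)


series = [0.0] * 10 + [10.0] * 10
found = pelt(series, penalty=5.0)
```

On this step series the single mean shift is recovered at index 10. Without the pruning line the loop is plain Optimal Partitioning; the pruning only discards candidates that cannot win later, which is why the segmentation stays exact.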
Negative Binomial Regression
Negative binomial regression is for modeling count variables, usually for over-dispersed count outcome variables. …
Negative Binomial Regression (NBR)
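Over-dispersion means the variance of the counts exceeds their mean, which a Poisson model cannot represent. Under the commonly used NB2 parameterization with a log link (an assumption here, since the definition above does not fix one), the model is:

$$
\mathbb{E}[Y \mid x] = \mu = \exp(x^\top \beta), \qquad \operatorname{Var}(Y \mid x) = \mu + \alpha \mu^2, \quad \alpha \ge 0
$$

The dispersion parameter $\alpha$ controls the excess variance, and the Poisson model is recovered as $\alpha \to 0$.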

# Document worth reading: “Unsupervised learning of phase transitions: from principal component analysis to variational autoencoders”

We employ unsupervised machine learning techniques to learn latent parameters which best describe states of the two-dimensional Ising model and the three-dimensional XY model. These methods range from principal component analysis to artificial neural network based variational autoencoders. The states are sampled using a Monte-Carlo simulation above and below the critical temperature. We find that the predicted latent parameters correspond to the known order parameters. The latent representation of the states of the models in question are clustered, which makes it possible to identify phases without prior knowledge of their existence or the underlying Hamiltonian. Furthermore, we find that the reconstruction loss function can be used as a universal identifier for phase transitions.

# Distilled News

In this live webinar, on May 24th at 11AM Central, learn how Anaconda empowers data scientists to encapsulate and deploy their data science projects as live applications with a single click.
A look at 2 topics in A/B testing: Ensuring that bucket assignment is truly random, and conducting an A/B test on an opt-in feature.
KNIME Analytics Platform solves your complex data puzzles
This post will be about replicating the Marcos Lopez de Prado algorithm from his paper “Building Diversified Portfolios that Outperform Out-of-Sample”. This algorithm is one that attempts to make a tradeoff between the classic mean-variance optimization algorithm that takes into account a covariance structure, but is unstable, and an inverse volatility algorithm that ignores covariance, but is more stable. This is a paper that I struggled with until I ran the code in Python (I have Anaconda installed but have trouble installing some packages such as Keras because I’m on Windows…would love to have someone walk me through setting up a Linux dual-boot), as I assumed that the clustering algorithm actually was able to concretely group every asset into a particular cluster (i.e. ETF 1 would be in cluster 1, ETF 2 in cluster 3, etc.). Turns out, that isn’t at all the case. Here’s how the algorithm actually works.
This is the second part of the series on Instrumental Variables. For other parts of the series follow the tag instrumental variables. In this exercise set we will build on the example from part-1. We will now consider an over-identified case i.e. we have multiple IVs for an endogenous variable. We will also look at tests for endogeneity and over-identifying restrictions.
This post discusses a number of options that are available in R for analyzing data from max-diff experiments, using the package flipMaxDiff. For a more detailed explanation of how to analyze max-diff, and what the outputs mean, you should read the post How max-diff analysis works. The post will cover the processes of installing packages, importing your data and experimental design, before discussing counting analysis and the more powerful, and valid, latent class analysis.
Principal components analysis (PCA) is a statistical technique that identifies underlying linear patterns in a data set so that it can be expressed in terms of another data set of significantly lower dimension without much loss of information. The final data set should explain most of the variance of the original data set while using fewer variables. The final variables are called principal components.
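For two variables the whole procedure fits in a few lines, because the eigen-decomposition of a 2×2 covariance matrix has a closed form. The sketch below is an illustration under that restriction, not a general implementation; the function name and the toy data are assumptions, and the data must not be constant.

```python
import math


def pca_2d(points):
    """Reduce 2-D points to 1-D scores along the first principal component.

    Returns (direction, scores, explained), where explained is the share of
    total variance captured by the first component.
    """
    n = len(points)
    mean_x = sum(x for x, _ in points) / n
    mean_y = sum(y for _, y in points) / n
    centered = [(x - mean_x, y - mean_y) for x, y in points]

    # Entries of the sample covariance matrix [[sxx, sxy], [sxy, syy]].
    sxx = sum(x * x for x, _ in centered) / (n - 1)
    syy = sum(y * y for _, y in centered) / (n - 1)
    sxy = sum(x * y for x, y in centered) / (n - 1)

    # Closed-form eigenvalues of a symmetric 2x2 matrix.
    half_trace = (sxx + syy) / 2.0
    spread = math.hypot((sxx - syy) / 2.0, sxy)
    largest, smallest = half_trace + spread, half_trace - spread

    # Angle of the eigenvector that belongs to the largest eigenvalue.
    theta = 0.5 * math.atan2(2.0 * sxy, sxx - syy)
    direction = (math.cos(theta), math.sin(theta))

    scores = [x * direction[0] + y * direction[1] for x, y in centered]
    explained = largest / (largest + smallest)
    return direction, scores, explained


# Toy data lying exactly on the line y = 2x.
points = [(1.0, 2.0), (2.0, 4.0), (3.0, 6.0), (4.0, 8.0)]
direction, scores, explained = pca_2d(points)
```

Because the toy points are perfectly collinear, one component carries essentially all of the variance and the direction has slope 2; with noisy data `explained` drops below 1 and indicates how much information the one-dimensional summary keeps.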

# R Packages worth a look

Stubbing and Setting Expectations on ‘HTTP’ Requests (webmockr)
Stubbing and setting expectations on ‘HTTP’ requests. Includes tools for stubbing ‘HTTP’ requests, including expected request conditions and response conditions. Match on ‘HTTP’ method, query parameters, request body, headers and more.

‘alabama’ Plugin for the ‘R’ Optimization Infrastructure (ROI.plugin.alabama)
Enhances the R Optimization Infrastructure (‘ROI’) package with the ‘alabama’ solver for solving nonlinear optimization problems.

Automated Linear Regression Diagnostic (lindia)
Provides a set of streamlined functions that allow easy generation of linear regression diagnostic plots necessary for checking linear model assumptions. This package is meant for easy scheming of linear regression diagnostics, while preserving merits of ‘The Grammar of Graphics’ as implemented in ‘ggplot2’. See the ‘ggplot2’ website for more information regarding the specific capability of graphics.

R Wrapper to the spaCy NLP Library (spacyr)
An R wrapper to the ‘Python’ ‘spaCy’ ‘NLP’ library, from <http://spacy.io>.

Power Analysis Tool for Joint Testing Hazards with Competing Risks Data (powerCompRisk)
A power analysis tool for jointly testing the cause-1 cause-specific hazard and the any-cause hazard with competing risks data.

# If you did not already know

Kernel Fisher Discriminant Analysis
In statistics, kernel Fisher discriminant analysis (KFD), also known as generalized discriminant analysis and kernel discriminant analysis, is a kernelized version of linear discriminant analysis. It is named after Ronald Fisher. Using the kernel trick, LDA is implicitly performed in a new feature space, which allows non-linear mappings to be learned.
“Linear Discriminant Analysis”
Kernel Fisher Discriminant Analysis (KFD,KFDA)
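
A minimal two-class sketch of the kernel trick described above, written in plain Python. It assumes an RBF kernel, a small ridge term `reg` for numerical stability, and toy data (an inner cluster surrounded by outer points) that no straight line can separate. The helper names `rbf`, `solve` and `fit_kfd` are hypothetical, and the formulation follows the common textbook form of KFD rather than any specific implementation.

```python
import math

def rbf(a, b, gamma=1.0):
    """Gaussian (RBF) kernel between two points."""
    return math.exp(-gamma * sum((x - y) ** 2 for x, y in zip(a, b)))

def solve(A, b):
    """Solve A x = b by Gaussian elimination with partial pivoting."""
    n = len(b)
    M = [row[:] + [b[i]] for i, row in enumerate(A)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(M[r][col]))
        M[col], M[pivot] = M[pivot], M[col]
        for r in range(col + 1, n):
            f = M[r][col] / M[col][col]
            for c in range(col, n + 1):
                M[r][c] -= f * M[col][c]
    x = [0.0] * n
    for i in reversed(range(n)):
        x[i] = (M[i][n] - sum(M[i][j] * x[j] for j in range(i + 1, n))) / M[i][i]
    return x

def fit_kfd(X, y, gamma=1.0, reg=1e-3):
    """Return expansion coefficients alpha of the discriminant in feature space."""
    l = len(X)
    K = [[rbf(a, b, gamma) for b in X] for a in X]
    N = [[0.0] * l for _ in range(l)]   # within-class scatter in kernel form
    means = {}
    for label in (0, 1):
        idx = [i for i, t in enumerate(y) if t == label]
        means[label] = [sum(K[n][m] for m in idx) / len(idx) for n in range(l)]
        for n in range(l):
            for p in range(l):
                N[n][p] += sum((K[n][m] - means[label][n]) * (K[p][m] - means[label][p])
                               for m in idx)
    for i in range(l):
        N[i][i] += reg                  # ridge term (assumption) keeps N invertible
    return solve(N, [means[1][n] - means[0][n] for n in range(l)])

# Hypothetical data: class 0 near the origin, class 1 on a surrounding ring.
X = [(0.0, 0.0), (0.3, 0.0), (0.0, -0.3), (-0.2, 0.2),
     (2.0, 0.0), (0.0, 2.0), (-2.0, 0.0), (0.0, -2.0)]
y = [0, 0, 0, 0, 1, 1, 1, 1]
alpha = fit_kfd(X, y)

def project(x):
    """Project a point onto the learned discriminant direction."""
    return sum(a * rbf(xn, x) for a, xn in zip(alpha, X))

projections = [project(x) for x in X]
mean_proj = {label: sum(p for p, t in zip(projections, y) if t == label) / y.count(label)
             for label in (0, 1)}
print(mean_proj)
```

The discriminant is never formed explicitly in the feature space; only kernel evaluations between data points are needed, which is what allows a non-linear mapping to be learned.
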
Multi-Advisor Reinforcement Learning
This article deals with a novel branch of Separation of Concerns, called Multi-Advisor Reinforcement Learning (MAd-RL), where a single-agent RL problem is distributed to $n$ learners, called advisors. Each advisor tries to solve the problem with a different focus. Their advice is then communicated to an aggregator, which is in control of the system. For the local training, three off-policy bootstrapping methods are proposed and analysed: local-max bootstraps with the local greedy action, rand-policy bootstraps with respect to the random policy, and agg-policy bootstraps with respect to the aggregator’s greedy policy. MAd-RL is positioned as a generalisation of Reinforcement Learning with Ensemble methods. An experiment is held on a simplified version of the Ms. Pac-Man Atari game. The results confirm the theoretical relative strengths and weaknesses of each method. … Multi-Advisor Reinforcement Learning
Neural Networks / Artificial Neural Networks
In computer science and related fields, artificial neural networks (ANNs) are computational models, inspired by animals’ central nervous systems (in particular the brain), that are capable of machine learning as well as pattern recognition. Artificial neural networks are generally presented as systems of interconnected “neurons” which can compute values from inputs. … Neural Networks / Artificial Neural Networks (ANN)
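
To make "interconnected neurons which can compute values from inputs" concrete, here is a small Python sketch of a two-layer network with hand-set weights that computes XOR. The weights, the step activation, and the function names are assumptions chosen for illustration. Nothing is learned here; in practice the weights would be fitted from data.

```python
def step(z):
    """Threshold activation: the neuron fires (1) when its net input is positive."""
    return 1 if z > 0 else 0

def neuron(inputs, weights, bias):
    """A single neuron: weighted sum of inputs plus bias, passed through the activation."""
    return step(sum(w * x for w, x in zip(weights, inputs)) + bias)

def tiny_network(x1, x2):
    # Hidden layer: one neuron behaves like OR, the other like AND (hand-set weights).
    h_or = neuron([x1, x2], [1.0, 1.0], -0.5)
    h_and = neuron([x1, x2], [1.0, 1.0], -1.5)
    # Output neuron fires when OR is on but AND is off, which is XOR.
    return neuron([h_or, h_and], [1.0, -1.0], -0.5)

inputs = [(0, 0), (0, 1), (1, 0), (1, 1)]
outputs = [tiny_network(a, b) for a, b in inputs]
print(outputs)  # [0, 1, 1, 0]
```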

# If you did not already know

Partitional Clustering
Partitional clustering decomposes a data set into a set of disjoint clusters. Given a data set of N points, a partitioning method constructs K (N ≥ K) partitions of the data, with each partition representing a cluster. That is, it classifies the data into K groups by satisfying the following requirements:
(1) each group contains at least one point, and
(2) each point belongs to exactly one group. Notice that for fuzzy partitioning, a point can belong to more than one group.
Many partitional clustering algorithms try to minimize an objective function. …
Partitional Clustering
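
One well-known partitional method is k-means, which minimizes the sum of squared distances from points to their cluster centroids. The Python sketch below is a bare-bones version under stated assumptions: the toy points, the naive initialization from the first K points, and the fixed iteration count are all choices made for this example. It does not handle empty clusters robustly; an empty cluster simply keeps its previous centroid.

```python
def sq_dist(a, b):
    """Squared Euclidean distance between two points."""
    return sum((x - y) ** 2 for x, y in zip(a, b))

def kmeans(points, k, iterations=20):
    centroids = [points[i] for i in range(k)]  # naive init: first k points (assumption)
    labels = [0] * len(points)
    for _ in range(iterations):
        # Assignment step: every point receives exactly one label (requirement 2).
        labels = [min(range(k), key=lambda c: sq_dist(p, centroids[c])) for p in points]
        # Update step: move each centroid to the mean of its members.
        for c in range(k):
            members = [p for p, lab in zip(points, labels) if lab == c]
            if members:  # an empty cluster keeps its old centroid
                centroids[c] = tuple(sum(x) / len(members) for x in zip(*members))
    return labels, centroids

# Hypothetical data: two well-separated groups of three points each.
points = [(1.0, 1.0), (9.0, 9.0), (1.2, 0.8), (0.8, 1.1), (9.2, 8.9), (8.8, 9.1)]
labels, centroids = kmeans(points, k=2)

# Check requirement (1): each of the K groups contains at least one point.
sizes = [labels.count(c) for c in range(2)]
print(labels, sizes)
```

Storing one label per point makes requirement (2) hold by construction; a fuzzy partitioning would instead store a membership weight for every point and cluster pair.
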
Recurrent Collective Classification
We propose a new method for training iterative collective classifiers for labeling nodes in network data. The iterative classification algorithm (ICA) is a canonical method for incorporating relational information into classification. Yet, existing methods for training ICA models rely on the assumption that relational features reflect the true labels of the nodes. This unrealistic assumption introduces a bias that is inconsistent with the actual prediction algorithm. In this paper, we introduce recurrent collective classification (RCC), a variant of ICA analogous to recurrent neural network prediction. RCC accommodates any differentiable local classifier and relational feature functions. We provide gradient-based strategies for optimizing over model parameters to more directly minimize the loss function. In our experiments, this direct loss minimization translates to improved accuracy and robustness on real network data. We demonstrate the robustness of RCC in settings where local classification is very noisy, settings that are particularly challenging for ICA. … Recurrent Collective Classification (RCC)
Data Acceleration
Data technologies are evolving rapidly, but organizations have adopted most of these in piecemeal fashion. As a result, enterprise data – whether related to customer interactions, business performance, computer notifications, or external events in the business environment – is vastly underutilized. Moreover, companies’ data ecosystems have become complex and littered with data silos. This makes the data more difficult to access, which in turn limits the value that organizations can get out of it. Indeed, according to a recent Gartner, Inc. report, 85 percent of Fortune 500 organizations will be unable to exploit Big Data for competitive advantage through 2015. Furthermore, a recent Accenture study found that half of all companies have concerns about the accuracy of their data, and the majority of executives are unclear about the business outcomes they are getting from their data analytics programs. To unlock the value hidden in their data, companies must start treating data as a supply chain, enabling it to flow easily and usefully through the entire organization – and eventually throughout each company’s ecosystem of partners, including suppliers and customers. The time is right for this approach. For one thing, new external data sources are becoming available, providing fresh opportunities for data insights. In addition, the tools and technology required to build a better data platform are available and in use. These provide a foundation on which companies can construct an integrated, end-to-end data supply chain. … Data Acceleration