Distilled News

Website Heatmaps – Tools, Features & Best Practices

Heatmapping is a simple and efficient way to analyze visitor interaction and user behavior on your website. If you are running a Conversion Rate Optimization (CRO) project for your e-commerce, startup, or other online business, it is indispensable to run some website heatmaps, such as click, mouse-movement, or scroll heatmaps.


In my last post I did some drawings based on L-Systems. These drawings are done sequentially. At any step, the state of the drawing can be described by the position (coordinates) and the orientation of the pencil. In that case I only used two kinds of operators: drawing a straight line and turning by a constant angle.

Do Deep Neural Networks Suffer from Crowding?

Crowding is a visual effect suffered by humans, in which an object that can be recognized in isolation can no longer be recognized when other objects, called flankers, are placed close to it. In this work, we study the effect of crowding in artificial Deep Neural Networks for object recognition. We analyze both standard deep convolutional neural networks (DCNNs) and a new version of DCNNs that is (1) multi-scale and (2) has convolution filter sizes that change with eccentricity with respect to the center of fixation. Such networks, which we call eccentricity-dependent, are a computational model of the feedforward path of the primate visual cortex. Our results reveal that the eccentricity-dependent model, trained on target objects in isolation, can recognize such targets in the presence of flankers, if the targets are near the center of the image, whereas DCNNs cannot. Also, for all tested networks, when trained on targets in isolation, we find that recognition accuracy of the networks decreases the closer the flankers are to the target and the more flankers there are. We find that visual similarity between the target and flankers also plays a role and that pooling in early layers of the network leads to more crowding. Additionally, we show that incorporating the flankers into the images of the training set does not improve performance with crowding.

Free Guidebook: Build a complete predictive maintenance strategy

Predictive maintenance is widely considered to be the obvious next step for any business with high-capital assets: harness machine learning to control rising equipment maintenance costs and pave the way for self-maintenance through artificial intelligence (AI).

Top Modules and Features of Business Intelligence Tools

What makes BI tools great? Which features are important when selecting a good BI tool? Let’s have a look.

Securely store API keys in R scripts with the “secret” package

If you use an API key to access a secure service, or need to use a password to access a protected database, you’ll need to provide these ‘secrets’ in your R code somewhere. That’s easy to do if you just include those keys as strings in your code, but it’s not very secure: your private keys and passwords are then stored in plain text on your hard drive, and if you email your script they’re available to anyone who can intercept that email. It’s also really easy to inadvertently include those keys in a public repo if you use GitHub or similar code-sharing services. To address this problem, Gábor Csárdi and Andrie de Vries created the secret package for R. The secret package integrates with OpenSSH, providing R functions that allow you to create a vault for keys on your local machine, define trusted users who can access those keys, and then include encrypted keys in R scripts or packages that can only be decrypted by you or by people you trust.
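The secret package itself is R-specific, but the underlying discipline of keeping keys out of source files applies everywhere. As a loose Python analogue (an assumption for illustration, not part of the package), one common pattern reads secrets from the environment instead of hard-coding them:

```python
import os

def get_api_key(name="MY_SERVICE_KEY"):
    """Fetch a secret from an environment variable instead of the script.

    A loose Python analogue of the principle above; the R 'secret' package
    goes further by encrypting a vault of keys for named, trusted users.
    The variable name here is a hypothetical example.
    """
    key = os.environ.get(name)
    if key is None:
        raise RuntimeError(
            f"Set the {name} environment variable; never commit keys to a repo."
        )
    return key
```

The script then carries no plain-text secret, so committing or emailing it leaks nothing.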

Multiple Factor Analysis to analyse several data tables

How can we take into account and compare information from different sources? Multiple Factor Analysis is a principal component method that deals with datasets containing quantitative and/or categorical variables that are structured into groups. Here is a course with videos presenting the method, named Multiple Factor Analysis.

Multiple Correspondence Analysis with FactoMineR

How can we analyse categorical data? Here is a course with videos presenting Multiple Correspondence Analysis (MCA) in the French way. The best-known use of Multiple Correspondence Analysis is the analysis of surveys. Four videos present a course on MCA, highlighting the way to interpret the data. You will then find videos presenting how to implement MCA in FactoMineR, how to deal with missing values in MCA thanks to the missMDA package, and lastly a video on drawing interactive graphs with Factoshiny. Finally, you will see that the new package FactoInvestigate allows you to obtain an interpretation of your MCA results automatically. With this course, you will be able to perform and interpret MCA results on your own.

Document worth reading: “Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes”

It is widely observed that deep learning models with learned parameters generalize well, even with far more model parameters than training samples. We systematically investigate the underlying reasons why deep neural networks often generalize well, and reveal the difference between minima (with the same training error) that generalize well and those that don’t. We show that it is the characteristics of the landscape of the loss function that explain the good generalization capability. In the loss landscape of deep networks, the volume of the basin of attraction of good minima dominates that of poor minima, which guarantees that optimization methods with random initialization converge to good minima. We theoretically justify our findings by analyzing 2-layer neural networks, and show that the low-complexity solutions have a small norm of the Hessian matrix with respect to the model parameters. For deeper networks, extensive numerical evidence helps to support our arguments. Towards Understanding Generalization of Deep Learning: Perspective of Loss Landscapes

If you did not already know

Byzantine Gradient Descent google
We consider the problem of distributed statistical machine learning in adversarial settings, where some unknown and time-varying subset of working machines may be compromised and behave arbitrarily to prevent an accurate model from being learned. This setting captures the potential adversarial attacks faced by Federated Learning, a modern machine learning paradigm proposed by Google researchers that has been intensively studied for ensuring user privacy. Formally, we focus on a distributed system consisting of a parameter server and $m$ working machines. Each working machine keeps $N/m$ data samples, where $N$ is the total number of samples. The goal is to collectively learn the underlying true model parameter of dimension $d$. In classical batch gradient descent methods, the gradients reported to the server by the working machines are aggregated via simple averaging, which is vulnerable to a single Byzantine failure. In this paper, we propose a Byzantine gradient descent method based on the geometric median of means of the gradients. We show that our method can tolerate $q \le (m-1)/2$ Byzantine failures, and the parameter estimate converges in $O(\log N)$ rounds with an estimation error of $\sqrt{d(2q+1)/N}$, hence approaching the optimal error rate $\sqrt{d/N}$ in the centralized and failure-free setting. The total computational complexity of our algorithm is of $O((Nd/m) \log N)$ at each working machine and $O(md + kd \log^3 N)$ at the central server, and the total communication cost is of $O(m d \log N)$. We further provide an application of our general results to the linear regression problem. A key challenge in the above problem is that Byzantine failures create arbitrary and unspecified dependency among the iterations and the aggregated gradients. We prove that the aggregated gradient converges uniformly to the true gradient function. …
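The aggregation rule named in the abstract, the geometric median of means, can be sketched in a few lines of NumPy. This is a toy illustration of the idea (with the geometric median computed by Weiszfeld iterations), not the paper's algorithm or its parameter choices:

```python
import numpy as np

def geometric_median(points, iters=100, eps=1e-8):
    """Weiszfeld iteration for the geometric median of the row vectors."""
    y = points.mean(axis=0)
    for _ in range(iters):
        d = np.linalg.norm(points - y, axis=1) + eps  # avoid division by zero
        w = 1.0 / d
        y = (w[:, None] * points).sum(axis=0) / w.sum()
    return y

def robust_aggregate(gradients, n_groups):
    """Geometric median of means of the reported gradients.

    Partition the m reported gradients into groups, average within each
    group, then take the geometric median of the group means. The group
    count is a free parameter in this sketch; the paper ties it to the
    number q of tolerated Byzantine machines.
    """
    groups = np.array_split(gradients, n_groups)
    means = np.stack([g.mean(axis=0) for g in groups])
    return geometric_median(means)
```

For instance, if nine honest machines report a gradient near (1, 1) and one Byzantine machine reports (1000, -1000), simple averaging is destroyed while the median-of-means aggregate stays near (1, 1).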

Least-Angle Regression (LARS) google
In statistics, least-angle regression (LARS) is a regression algorithm for high-dimensional data, developed by Bradley Efron, Trevor Hastie, Iain Johnstone and Robert Tibshirani. Suppose we expect a response variable to be determined by a linear combination of a subset of potential covariates. Then the LARS algorithm provides a means of producing an estimate of which variables to include, as well as their coefficients. Instead of giving a vector result, the LARS solution consists of a curve denoting the solution for each value of the L1 norm of the parameter vector. The algorithm is similar to forward stepwise regression, but instead of including variables at each step, the estimated parameters are increased in a direction equiangular to each one’s correlations with the residual. …
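The solution curve described above can be traced with scikit-learn's implementation. A minimal sketch on synthetic data (the data and true coefficients are made up for illustration):

```python
import numpy as np
from sklearn.linear_model import lars_path

rng = np.random.default_rng(0)
n, p = 200, 5
X = rng.standard_normal((n, p))
true_beta = np.array([3.0, 0.0, 0.0, -2.0, 0.0])   # sparse ground truth
y = X @ true_beta + 0.01 * rng.standard_normal(n)

# lars_path returns the whole solution curve rather than a single vector:
# one coefficient vector per breakpoint of the L1 norm, plus the order in
# which variables become active.
alphas, active, coefs = lars_path(X, y, method="lar")
```

With nearly noiseless data, the two truly relevant variables enter the active set first, and the end of the path coincides with the full least-squares fit.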

False Positive Rate google
In statistics, when performing multiple comparisons, the term false positive ratio, also known as the false alarm ratio, usually refers to the probability of falsely rejecting the null hypothesis for a particular test. The false positive rate (or “false alarm rate”) usually refers to the expectancy of the false positive ratio.
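In terms of confusion-matrix counts, the quantity is just the fraction of actual negatives that get flagged positive:

```python
def false_positive_rate(fp, tn):
    """FPR = FP / (FP + TN): the share of true negatives falsely flagged.

    fp: count of false positives; tn: count of true negatives.
    """
    return fp / (fp + tn)
```

For example, 5 false alarms among 100 actual negatives gives an FPR of 0.05.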

Magister Dixit

“Some decisions you need to make are big enough to change the course for your business. And your past experiences may not be good predictors of the future. More data are within your reach to understand what was previously unknown. Sophisticated analytical tools are available to you to ‘see’ a wider range of possibilities and evaluate them quickly. Now is a good time for an upgrade in your decision making capabilities.” PwC (2014)

Document worth reading: “Towards Statistical Reasoning in Description Logics over Finite Domains (Full Version)”

We present a probabilistic extension of the description logic $\mathcal{ALC}$ for reasoning about statistical knowledge. We consider conditional statements over proportions of the domain and are interested in the probabilistic-logical consequences of these proportions. After introducing some general reasoning problems and analyzing their properties, we present first algorithms and complexity results for reasoning in some fragments of Statistical $\mathcal{ALC}$. Towards Statistical Reasoning in Description Logics over Finite Domains (Full Version)

What’s new on arXiv

Low-Rank Kernel Subspace Clustering

Most state-of-the-art subspace clustering methods only work with linear (or affine) subspaces. In this paper, we present a kernel subspace clustering method that can handle non-linear models. While an arbitrary kernel can non-linearly map data into high-dimensional Hilbert feature space, the data in the resulting feature space are very unlikely to have the desired subspace structures. By contrast, we propose to learn a low-rank kernel mapping, with which the mapped data in feature space are not only low-rank but also self-expressive, such that the low-dimensional subspace structures are present and manifested in the high-dimensional feature space. We have evaluated the proposed method extensively on both motion segmentation and image clustering benchmarks, and obtained superior results, outperforming the kernel subspace clustering method that uses standard kernels and other state-of-the-art linear subspace clustering methods.

Online Multi-Armed Bandit

We introduce a novel variant of the multi-armed bandit problem, in which bandits are streamed one at a time to the player, and at each point, the player can either choose to pull the current bandit or move on to the next bandit. Once a player has moved on from a bandit, they may never visit it again, which is a crucial difference between our problem and classic multi-armed bandit problems. In this online context, we study Bernoulli bandits (bandits with payout Ber(p_i) for some underlying mean p_i) with underlying means drawn i.i.d. from various distributions, including the uniform distribution, and in general, all distributions that have a CDF satisfying certain differentiability conditions near zero. In all cases, we suggest several strategies and investigate their expected performance. Furthermore, we bound the performance of any optimal strategy and show that the strategies we have suggested are indeed optimal up to a constant factor. We also investigate the case where the distribution from which the underlying means are drawn is not known ahead of time. Again, we are able to suggest algorithms that are optimal up to a constant factor for this case, given certain mild conditions on the universe of distributions.
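The irrevocable, streaming structure of the problem is easy to simulate. Below is a deliberately naive strategy (a toy for illustration, not one of the strategies from the paper): pull each arriving Bernoulli arm once and commit to the first arm whose initial pull succeeds.

```python
import random

def first_success_strategy(means, horizon):
    """Toy strategy for the streaming bandit setting (NOT from the paper).

    Scan the arriving arms in order, pulling each once; return the index of
    the first arm whose initial pull succeeds, to which the player would
    then commit for the rest of the horizon. Skipped arms can never be
    revisited, matching the online constraint described above.
    """
    for i, p in enumerate(means):
        horizon -= 1
        if horizon <= 0:
            return i                 # out of pulls: stuck with current arm
        if random.random() < p:      # first pull succeeded: commit here
            return i
    return len(means) - 1            # every first pull failed
```

With means drawn uniformly on [0, 1], conditioning on a first-pull success biases the committed arm's true mean toward 2/3 rather than the population average of 1/2, which is why even this naive rule beats picking an arm blindly.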

graph2vec: Learning Distributed Representations of Graphs

Recent works on representation learning for graph structured data predominantly focus on learning distributed representations of graph substructures such as nodes and subgraphs. However, many graph analytics tasks such as graph classification and clustering require representing entire graphs as fixed length feature vectors. While the aforementioned approaches are naturally unequipped to learn such representations, graph kernels remain the most effective way of obtaining them. However, these graph kernels use handcrafted features (e.g., shortest paths, graphlets, etc.) and hence are hampered by problems such as poor generalization. To address this limitation, in this work, we propose a neural embedding framework named graph2vec to learn data-driven distributed representations of arbitrary sized graphs. graph2vec’s embeddings are learnt in an unsupervised manner and are task agnostic. Hence, they could be used for any downstream task such as graph classification, clustering and even seeding supervised representation learning approaches. Our experiments on several benchmark and large real-world datasets show that graph2vec achieves significant improvements in classification and clustering accuracies over substructure representation learning approaches and is competitive with state-of-the-art graph kernels.

Deep Learning to Attend to Risk in ICU

Modeling physiological time-series in ICU is of high clinical importance. However, data collected within ICU are irregular in time and often contain missing measurements. Since absence of a measure would signify its lack of importance, the missingness is indeed informative and might reflect the decision making by the clinician. Here we propose a deep learning architecture that can effectively handle these challenges for predicting ICU mortality outcomes. The model is based on Long Short-Term Memory, and has layered attention mechanisms. At the sensing layer, the model decides whether to observe and incorporate parts of the current measurements. At the reasoning layer, evidences across time steps are weighted and combined. The model is evaluated on the PhysioNet 2012 dataset showing competitive and interpretable results.

Iris: A Conversational Agent for Complex Tasks

Today’s conversational agents are restricted to simple standalone commands. In this paper, we present Iris, an agent that draws on human conversational strategies to combine commands, allowing it to perform more complex tasks that it has not been explicitly designed to support: for example, composing one command to ‘plot a histogram’ with another to first ‘log-transform the data’. To enable this complexity, we introduce a domain specific language that transforms commands into automata that Iris can compose, sequence, and execute dynamically by interacting with a user through natural language, as well as a conversational type system that manages what kinds of commands can be combined. We have designed Iris to help users with data science tasks, a domain that requires support for command combination. In evaluation, we find that data scientists complete a predictive modeling task significantly faster (2.6 times speedup) with Iris than a modern non-conversational programming environment. Iris supports the same kinds of commands as today’s agents, but empowers users to weave together these commands to accomplish complex goals.

Optimization by gradient boosting

Gradient boosting is a state-of-the-art prediction technique that sequentially produces a model in the form of linear combinations of simple predictors—typically decision trees—by solving an infinite-dimensional convex optimization problem. We provide in the present paper a thorough analysis of two widespread versions of gradient boosting, and introduce a general framework for studying these algorithms from the point of view of functional optimization. We prove their convergence as the number of iterations tends to infinity and highlight the importance of having a strongly convex risk functional to minimize. We also present a reasonable statistical context ensuring consistency properties of the boosting predictors as the sample size grows. In our approach, the optimization procedures are run forever (that is, without resorting to an early stopping strategy), and statistical regularization is basically achieved via an appropriate L^2 penalization of the loss and strong convexity arguments.
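The functional-optimization view described above is concrete for squared loss, where the negative functional gradient is simply the residual. A minimal sketch of the generic scheme using regression stumps (an illustration of gradient boosting in general, not the paper's code):

```python
import numpy as np
from sklearn.tree import DecisionTreeRegressor

def boost(X, y, n_rounds=50, lr=0.1, depth=1):
    """Least-squares gradient boosting: a linear combination of stumps.

    With squared loss, the negative functional gradient at the current fit
    F is just the residual y - F, so each round fits a shallow tree to the
    residuals and takes a small step in that direction.
    """
    pred = np.full(len(y), y.mean())           # F_0: the best constant fit
    trees = []
    for _ in range(n_rounds):
        residual = y - pred                    # -gradient of (1/2)(y - F)^2
        tree = DecisionTreeRegressor(max_depth=depth).fit(X, residual)
        pred += lr * tree.predict(X)           # F_m = F_{m-1} + lr * h_m
        trees.append(tree)
    return trees, pred
```

Each round reduces the training error, illustrating the sequential linear combination of simple predictors the abstract refers to; the paper's analysis concerns what happens as the number of rounds tends to infinity.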

Neural Reranking for Named Entity Recognition

We propose a neural reranking system for named entity recognition (NER). The basic idea is to leverage recurrent neural network models to learn sentence-level patterns that involve named entity mentions. In particular, given an output sentence produced by a baseline NER model, we replace all entity mentions, such as ‘Barack Obama’, with their entity types, such as ‘PER’. The resulting sentence patterns contain direct output information, yet are less sparse without specific named entities. For example, ‘PER was born in LOC’ can be such a pattern. LSTM and CNN structures are utilised for learning deep representations of such sentences for reranking. Results show that our system can significantly improve the NER accuracies over two different baselines, giving the best reported results on a standard benchmark.
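The mention-to-type substitution step is simple to sketch (a toy helper for illustration, not the authors' code):

```python
def to_pattern(sentence, entities):
    """Replace entity mentions with their entity types.

    entities: list of (mention, type) pairs, e.g. [('Barack Obama', 'PER')].
    Longer mentions are replaced first so that sub-spans of a mention are
    not clobbered by a shorter overlapping one.
    """
    for mention, etype in sorted(entities, key=lambda e: -len(e[0])):
        sentence = sentence.replace(mention, etype)
    return sentence
```

Applied to a baseline NER output such as "Barack Obama was born in Hawaii", this yields the pattern "PER was born in LOC" that the reranker scores.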

Trial without Error: Towards Safe Reinforcement Learning via Human Intervention

AI systems are increasingly applied to complex tasks that involve interaction with humans. During training, such systems are potentially dangerous, as they haven’t yet learned to avoid actions that could cause serious harm. How can an AI system explore and learn without making a single mistake that harms humans or otherwise causes serious damage? For model-free reinforcement learning, having a human ‘in the loop’ and ready to intervene is currently the only way to prevent all catastrophes. We formalize human intervention for RL and show how to reduce the human labor required by training a supervised learner to imitate the human’s intervention decisions. We evaluate this scheme on Atari games, with a Deep RL agent being overseen by a human for four hours. When the class of catastrophes is simple, we are able to prevent all catastrophes without affecting the agent’s learning (whereas an RL baseline fails due to catastrophic forgetting). However, this scheme is less successful when catastrophes are more complex: it reduces but does not eliminate catastrophes and the supervised learner fails on adversarial examples found by the agent. Extrapolating to more challenging environments, we show that our implementation would not scale (due to the infeasible amount of human labor required). We outline extensions of the scheme that are necessary if we are to train model-free agents without a single catastrophe.

Translational Recommender Networks

Representing relationships as translations in vector space lives at the heart of many neural embedding models such as word embeddings and knowledge graph embeddings. In this work, we study the connections of this translational principle with collaborative filtering algorithms. We propose Translational Recommender Networks (TransRec), a new attentive neural architecture that utilizes the translational principle to model the relationships between user and item pairs. Our model employs a neural attention mechanism over a Latent Relational Attentive Memory (LRAM) module to learn the latent relations between user-item pairs that best explain the interaction. By exploiting adaptive user-item specific translations in vector space, our model also alleviates the geometric inflexibility problem of other metric learning algorithms while enabling greater modeling capability and fine-grained fitting of users and items in vector space. The proposed architecture not only demonstrates state-of-the-art performance across multiple recommendation benchmarks but also boasts improved interpretability. Qualitative studies over the LRAM module show evidence that our proposed model is able to infer and encode explicit sentiment, temporal and attribute information despite being trained only on implicit feedback. As such, this ascertains the ability of TransRec to uncover hidden relational structure within implicit datasets.

On Lasso refitting strategies

A well-known drawback of l1-penalized estimators is the systematic shrinkage of the large coefficients towards zero. A simple remedy is to treat Lasso as a model-selection procedure and to perform a second refitting step on the selected support. In this work we formalize the notion of refitting and provide oracle bounds for arbitrary refitting procedures of the Lasso solution. One of the most widely used refitting techniques, based on least squares, may bring a problem of interpretability, since the signs of the refitted estimator might be flipped with respect to the original estimator. This problem arises from the fact that least-squares refitting considers only the support of the Lasso solution, discarding any information about signs or amplitudes. To this end we define a sign-consistent refitting as an arbitrary refitting procedure preserving the signs of the first-step Lasso solution, and provide oracle inequalities for such estimators. Finally, we consider special refitting strategies: Bregman Lasso and Boosted Lasso. Bregman Lasso has the fruitful property of converging to the sign-consistent least-squares refitting (least squares with sign constraints), which provides greater interpretability. We additionally study the Bregman Lasso refitting in the case of orthogonal design, providing simple intuition behind the proposed method. Boosted Lasso, in contrast, uses information about the magnitudes of the first Lasso step and allows us to develop better oracle rates for prediction. Finally, we conduct an extensive numerical study to show advantages of one approach over others in different synthetic and semi-real scenarios.
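The basic two-step scheme, Lasso for support selection followed by least-squares refitting on that support, can be sketched on synthetic data (the data and penalty level are made up for illustration; this is the plain least-squares refit, not the sign-constrained or Bregman variants):

```python
import numpy as np
from sklearn.linear_model import Lasso

rng = np.random.default_rng(2)
n, p = 200, 10
X = rng.standard_normal((n, p))
beta = np.zeros(p)
beta[0], beta[7] = 5.0, -4.0                 # sparse ground truth
y = X @ beta + 0.1 * rng.standard_normal(n)

# Step 1: Lasso selects a support but shrinks the large coefficients.
lasso = Lasso(alpha=0.5).fit(X, y)
support = np.flatnonzero(lasso.coef_)

# Step 2: unpenalized least squares restricted to the selected support.
coef_s, *_ = np.linalg.lstsq(X[:, support], y, rcond=None)
refit = np.zeros(p)
refit[support] = coef_s
```

The refitted coefficients undo the systematic shrinkage on the large entries, so they land closer to the truth than the raw Lasso estimate does.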

Learning to select data for transfer learning with Bayesian Optimization

Domain similarity measures can be used to gauge adaptability and select suitable data for transfer learning, but existing approaches define ad hoc measures that are deemed suitable for respective tasks. Inspired by work on curriculum learning, we propose to learn data selection measures using Bayesian Optimization and evaluate them across models, domains and tasks. Our learned measures outperform existing domain similarity measures significantly on three tasks: sentiment analysis, part-of-speech tagging, and parsing. We show the importance of complementing similarity with diversity, and that learned measures are, to some degree, transferable across models, domains, and even tasks.

MAG: A Multilingual, Knowledge-base Agnostic and Deterministic Entity Linking Approach

Entity linking has recently been the subject of a significant body of research. Currently, the best performing approaches rely on trained mono-lingual models. Porting these approaches to other languages is consequently a difficult endeavor as it requires corresponding training data and retraining of the models. We address this drawback by presenting a novel multilingual, knowledge-base agnostic and deterministic approach to entity linking, dubbed MAG. MAG is based on a combination of context-based retrieval on structured knowledge bases and graph algorithms. We evaluate MAG on 23 data sets and in 7 languages. Our results show that the best approach trained on English datasets (PBOH) achieves a micro F-measure that is up to 4 times worse on datasets in other languages. MAG, on the other hand, achieves state-of-the-art performance on English datasets and reaches a micro F-measure that is up to 0.6 higher than that of PBOH on non-English languages.

Piecewise Deterministic Markov Chain Monte Carlo

A novel class of non-reversible Markov chain Monte Carlo schemes relying on continuous-time piecewise deterministic Markov Processes has recently emerged. In these algorithms, the state of the Markov process evolves according to a deterministic dynamics which is modified using a Markov transition kernel at random event times. These methods enjoy remarkable features including the ability to update only a subset of the state components while other components implicitly keep evolving. However, several important problems remain open. The deterministic dynamics used so far do not exploit the structure of the target. Moreover, exact simulation of the event times is feasible for an important yet restricted class of problems and, even when it is, it is application specific. This limits the applicability of these methods and prevents the development of a generic software implementation. In this paper, we introduce novel MCMC methods addressing these limitations by bringing together piecewise deterministic Markov processes, Hamiltonian dynamics and slice sampling. We propose novel continuous-time algorithms relying on exact Hamiltonian flows and novel discrete-time algorithms which can exploit complex dynamics such as approximate Hamiltonian dynamics arising from symplectic integrators. We demonstrate the performance of these schemes on a variety of applications.

Expected exponential loss for gaze-based video and volume ground truth annotation
Modeling the SBC Tanzania Production-Distribution Logistics Network
Some useful theorems for asymptotic formulas and their applications to skew plane partitions and cylindric partitions
Polylogarithmic Approximation Algorithms for Weighted-$\mathcal{F}$-Deletion Problems
Packing chromatic number versus chromatic and clique number
Pancreas Segmentation in MRI using Graph-Based Decision Fusion on Convolutional Neural Networks
End-to-End Information Extraction without Token-Level Supervision
Multi-label Music Genre Classification from Audio, Text, and Images Using Deep Features
Feedback Vertex Set Inspired Kernel for Chordal Vertex Deletion
Theoretical insights into the optimization landscape of over-parameterized shallow neural networks
The Multivariate Hawkes Process in High Dimensions: Beyond Mutual Excitation
Projected Power Iteration for Network Alignment
Pathological OCT Retinal Layer Segmentation using Branch Residual U-shape Networks
Automatized Generation of Alphabets of Symbols
Performance Evaluation of Distributed Computing Environments with Hadoop and Spark Frameworks
Comparative Performance Analysis of Neural Networks Architectures on H2O Platform for Various Activation Functions
Automatic Backward Differentiation for American Monte-Carlo Algorithms (Conditional Expectation)
Improving Naive Bayes for Regression with Optimised Artificial Surrogate Data
A characterization of Fibonacci numbers
Probing many-body localization in a disordered quantum magnet
Almost sure growth of supercritical multi-type continuous state branching process
Random initial conditions for semi-linear PDEs
Improving Adherence to Heart Failure Management Guidelines via Abductive Reasoning
An Ensemble Boosting Model for Predicting Transfer to the Pediatric Intensive Care Unit
Attitude Control of a 2U Cubesat by Magnetic and Air Drag Torques
Query-Focused Video Summarization: Dataset, Evaluation, and A Memory Network Based Approach
Bad News for Chordal Partitions
Visual Question Answering with Memory-Augmented Networks
On basic graphs of symmetric graphs of valency five
A weak version of path-dependent functional Itô calculus
Stochastic Near-Optimal Controls for Path-Dependent Systems
Optimal Equilibrium for Time-Inconsistent Stopping Problems — the Discrete-Time Case
Practical Locally Private Heavy Hitters
Tracking as Online Decision-Making: Learning a Policy from Streaming Videos with Reinforcement Learning
MoCoGAN: Decomposing Motion and Content for Video Generation
Jackknife Empirical Likelihood-based inference for S-Gini indices
In-Order Transition-based Constituent Parsing
Coalition formation for Multi-agent Pursuit based on Neural Network and AGRMF Model
Covariant Information Theory and Emergent Gravity
‘Maximizing rigidity’ revisited: a convex programming approach for generic 3D shape reconstruction from multiple perspective views
A Real-time Image Reconstruction System for Particle Treatment Planning Using Proton Computed Tomography (pCT)
Fully polynomial FPT algorithms for some classes of bounded clique-width graphs
Energy Conservation and Decoupling in Optical Fibers with Brick-Walls Attenuation Profile
Convergence to consensus of the general finite-dimensional Cucker-Smale model with time-varying delays
Strong Local Nondeterminism of Spherical Fractional Brownian Motion
Residual Features and Unified Prediction Network for Single Stage Detection
Discrete Extremes
Estimation and testing of survival functions via generalized fiducial inference with censored data
Line-Recovery by Programmable Particles
A simple method for the existence of a density for stochastic evolutions with rough coefficients
Dynamics of quantum information in many-body localized systems
Designing Effective Inter-Pixel Information Flow for Natural Image Matting
Chordal decomposition in operator-splitting methods for sparse semidefinite programs
Speeding up the Köhler’s method of contrast thresholding
Optimal Storage under Unsynchronized Mobile Byzantine Faults
Asymptotic degree distribution in preferential attachment graph models with multiple type edges
Geometric Rescaling Algorithms for Submodular Function Minimization
Every finite non-solvable group admits an Oriented Regular Representation
Well-posedness for SDEs driven by different type of noises
Optimal conflict-free colouring with respect to a subset of intervals
From Quenched Disorder to Continuous Time Random Walk
Lower Bounds for Searching Robots, some Faulty
Eigenvalues and Wiener index of the Zero Divisor graph $\Gamma[\mathbb{Z}_n]$
On explicit order 1.5 approximations with varying coefficients: the case of super-linear diffusion coefficients
Truly Sub-cubic Algorithms for Language Edit Distance and RNA Folding via Fast Bounded-Difference Min-Plus Product
On consistency of optimal pricing algorithms in repeated posted-price auctions with strategic buyer
Classification of finite groups that admit an oriented regular representation
Markov loops topology
Towards Bidirectional Hierarchical Representations for Attention-Based Neural Machine Translation
The Power of Constraint Grammars Revisited
To Normalize, or Not to Normalize: The Impact of Normalization on Part-of-Speech Tagging
LIG-CRIStAL System for the WMT17 Automatic Post-Editing Task
Approximate Directed Minimum Degree Spanning Tree in Polynomial Time
Tight Analysis of Randomized Greedy MIS
Differentially Private Testing of Identity and Closeness of Discrete Distributions
On the Parallel Undecided-State Dynamics with Two Colors
Fully Automatic and Real-Time Catheter Segmentation in X-Ray Fluoroscopy
Queues Driven by Hawkes Processes
When You Must Forget: beyond strong persistence when forgetting in answer set programming
Preliminary Exploration of Formula Embedding for Mathematical Information Retrieval: can mathematical formulae be embedded like a natural language?
Oscillations in networks of networks stem from adaptive nodes with memory
On the analysis of signals in a permutation Lempel-Ziv complexity – permutation Shannon entropy plane
Cosmological model discrimination with Deep Learning
A moment-generating formula for Erdős-Rényi component sizes
A Unified Framework for Capacitated Covering Problems in Metric and Geometric Spaces
Weak Modular Product of Bipartite Graphs, Bicliques and Isomorphism
Construction of exact constants of motion and effective models for many-body localized systems
Online codes for analog signals
A Discrete Bouncy Particle Sampler
Moment bounds for some fractional stochastic heat equations on the ball
Auxiliary Objectives for Neural Error Detection Models
Detecting Off-topic Responses to Visual Prompts
Discrete-type approximations for non-Markovian optimal stopping problems: Part I
Artificial Error Generation with Machine Translation and Syntactic Patterns
Coloring Down: $3/2$-approximation for special cases of the weighted tree augmentation problem
Aesthetic-Driven Image Enhancement by Adversarial Learning
Spanning Euler families in hypergraphs with certain vertex cuts
Random eigenfunctions on flat tori: universality for the number of intersections
Exploring text datasets by visualizing relevant words
A Simple Language Model based on PMI Matrix Approximations
Computation Rate Maximization for Wireless Powered Mobile Edge Computing
Quasi-device-independent witnessing of genuine multilevel quantum coherence
Cyclic pseudo-Loupekine snarks
Reverse Curriculum Generation for Reinforcement Learning

Book Memo: “Mastering Machine Learning with Python in Six Steps”

A Practical Implementation Guide to Predictive Data Analytics Using Python
Master machine learning with Python in six steps and explore fundamental to advanced topics, all designed to make you a worthy practitioner.
This book’s approach is based on the “Six degrees of separation” theory, which states that everyone and everything is a maximum of six steps away. Mastering Machine Learning with Python in Six Steps presents each topic in two parts: theoretical concepts and practical implementation using suitable Python packages.
You’ll learn the fundamentals of Python programming language, machine learning history, evolution, and the system development frameworks. Key data mining/analysis concepts, such as feature dimension reduction, regression, time series forecasting and their efficient implementation in Scikit-learn are also covered. Finally, you’ll explore advanced text mining techniques, neural networks and deep learning techniques, and their implementation.
All the code presented in the book is available in the form of IPython notebooks to enable you to try out these examples and extend them to your advantage.

R Packages worth a look

Compare Two Data Frames and Summarise the Difference (dataCompareR)
Easy comparison of two tabular data objects in R. Specifically designed to show differences between two sets of data in a useful way that should make it easier to understand the differences, and if necessary, help you work out how to remedy them. Aims to offer a more useful output than all.equal() when your two data sets do not match, but isn’t intended to replace all.equal() as a way to test for equality.

Fitting Tails by the Empirical Residual Coefficient of Variation (ercv)
Provides a simple and trustworthy methodology for the analysis of extreme values, multiple threshold tests for a generalized Pareto distribution, and an automatic threshold selection algorithm. See del Castillo, J., Daoudi, J. and Lockhart, R. (2014) <doi:10.1111/sjos.12037>.

Tools to Transform and Query Data with ‘Apache’ ‘Drill’ (sergeant)
‘Apache Drill’ is a low-latency distributed query engine designed to enable data exploration and analytics on both relational and non-relational datastores, scaling to petabytes of data. Methods are provided for working with ‘Apache Drill’ instances via the REST API, the JDBC interface (optional), DBI methods, and ‘dplyr’/‘dbplyr’ idioms.

Bayesian Nonparametric Spectral Density Estimation Using B-Spline Priors (bsplinePsd)
Implementation of a Metropolis-within-Gibbs MCMC algorithm to flexibly estimate the spectral density of a stationary time series. The algorithm updates a nonparametric B-spline prior using the Whittle likelihood to produce pseudo-posterior samples and is based on the work presented by Edwards, Meyer, and Christensen (2017) <arXiv:1707.04878>.

Find Graph Centrality Indices (centiserve)
Calculates centrality indices additional to the ‘igraph’ package centrality functions.

Distilled News

Debugging & Visualising training of Neural Network with TensorBoard

I started my deep learning journey a few years back and have learnt a lot in that time. But even after all these efforts, every neural network I train provides me with a new experience. If you have tried to train a neural network, you must know my plight! Through all this time, I have developed a workflow, which I will share with you today. I cannot guarantee it will work every time, but at least it may guide how you approach the problem. I will also share a tool which I find is a useful addition to the deep learning toolbox: TensorBoard.

Structural Changes in Global Warming

In time series analysis, structural changes represent shocks impacting the evolution over time of the data generating process. That matters because one of the key assumptions of the Box-Jenkins methodology is that the structure of the data generating process does not change over time. How can structural changes be identified? The strucchange package can help with that, and this tutorial shows how.
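strucchange implements formal tests and breakpoint dating; purely as an illustration of the underlying idea (not the package's actual method), here is a naive single-breakpoint search in Python that splits a series where two constant-mean segments fit best:

```python
# Naive single-breakpoint search: NOT the strucchange method, just an
# illustration of detecting a shift in the mean of a series.
def best_split(series):
    """Return the index that best splits `series` into two constant-mean
    segments, minimizing the total within-segment sum of squared errors."""
    def sse(seg):
        m = sum(seg) / len(seg)
        return sum((x - m) ** 2 for x in seg)
    best_i, best_cost = None, float("inf")
    for i in range(1, len(series)):
        cost = sse(series[:i]) + sse(series[i:])
        if cost < best_cost:
            best_i, best_cost = i, cost
    return best_i

# Made-up series whose mean jumps from 0 to 5 at index 10.
data = [0.0] * 10 + [5.0] * 10
print(best_split(data))  # 10
```

For real analyses, strucchange's statistical machinery (e.g. breakpoints()) is the appropriate tool rather than this brute-force heuristic.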

Data Science: Performance of Python vs Pandas vs Numpy

Speed is a key factor for any data scientist. In business, you do not usually work with toy datasets of thousands of samples; it is more likely that your datasets will contain millions or hundreds of millions of samples. Customer orders, web logs, billing events, stock prices: datasets now are huge. I assume you do not want to spend hours or days waiting for your data processing to complete. The biggest dataset I have worked with so far contained over 30 million records. When I ran my data processing script for the first time on this dataset, the estimated time to complete was around 4 days! I do not have a very powerful machine (a MacBook Air with an i5 and 4 GB of RAM), but the most I could accept was running the script over one night, not multiple days. Thanks to some clever tricks, I was able to decrease this running time to a few hours. This post explains the first step to achieving good data processing performance: choosing the right library/framework for your dataset.
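As a minimal, hypothetical sketch of why the choice of library matters, the snippet below times the same sum of squares computed with a pure-Python loop and with NumPy's vectorized operations (the data and size here are made up; absolute timings will vary by machine):

```python
import time
import numpy as np

n = 1_000_000
data = list(range(n))               # plain Python list
arr = np.arange(n, dtype=np.int64)  # NumPy array with the same values

t0 = time.perf_counter()
total_py = sum(x * x for x in data)  # pure-Python loop
t_py = time.perf_counter() - t0

t0 = time.perf_counter()
total_np = int((arr * arr).sum())    # vectorized NumPy
t_np = time.perf_counter() - t0

assert total_py == total_np
print(f"pure Python: {t_py:.3f}s, NumPy: {t_np:.3f}s")
```

On most machines the vectorized version is substantially faster; that kind of gap is what turns a multi-day script into an overnight one.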

General Aspects in Selecting Best Variables

This chapter covers the following topics:
• The best variables ranking from conventional machine learning algorithms, either predictive or clustering.
• The nature of selecting variables with and without predictive models.
• The effect of variables working in groups (intuition and information theory).
• Exploring the best variable subset in practice using R.
Selecting the best variables is also known as feature selection, selecting the most important predictors, selecting the best predictors, among others.
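As a toy illustration of one simple univariate ranking criterion (absolute Pearson correlation with the target; just one of the many approaches the chapter surveys, and the data below is made up), in Python:

```python
from math import sqrt

def pearson(xs, ys):
    """Pearson correlation coefficient between two numeric sequences."""
    n = len(xs)
    mx, my = sum(xs) / n, sum(ys) / n
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    sx = sqrt(sum((x - mx) ** 2 for x in xs))
    sy = sqrt(sum((y - my) ** 2 for y in ys))
    return cov / (sx * sy)

# Made-up data: two candidate predictors and a numeric target.
target = [1.0, 2.0, 3.0, 4.0, 5.0]
features = {
    "x1": [2.0, 4.0, 6.0, 8.0, 10.0],  # perfectly correlated with target
    "x2": [5.0, 1.0, 4.0, 2.0, 3.0],   # weakly (negatively) correlated
}

# Rank predictors by the absolute value of their correlation with the target.
ranking = sorted(features, key=lambda f: abs(pearson(features[f], target)),
                 reverse=True)
print(ranking)  # ['x1', 'x2']
```

Note that a univariate criterion like this ignores the "variables working in groups" effect the chapter discusses: a variable useless on its own can still matter in combination with others.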

Twitter analysis using R (Semantic analysis of French elections)

To perform the analysis, I needed a large number of tweets, and I wanted to use all of the tweets concerning the election. The Twitter search API is limited, since you only have access to a sample of tweets. The streaming API, on the other hand, allows you to collect the data in real time and to capture almost all tweets. Hence, I used the streamR package. I collected tweets in 60-second batches and saved them to .json files. Using batches instead of one large file reduces RAM consumption: instead of reading and then subsetting one large file, you can subset each batch and then merge them. Here is the code to collect the data with streamR.
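The post's code is in R (streamR), but the batch-then-filter idea is language-agnostic. A minimal Python sketch, where `batches` stands in for the 60-second .json files and `keep` for the per-batch subsetting step (records and names here are hypothetical):

```python
def process_batches(batches, keep):
    """Filter each small batch before merging, so the full unfiltered
    dataset never has to sit in memory at once."""
    merged = []
    for batch in batches:
        merged.extend(t for t in batch if keep(t))
    return merged

# Two hypothetical 60-second batches of tweet records.
batches = [
    [{"text": "go vote! #election"}, {"text": "nice lunch today"}],
    [{"text": "#election debate tonight"}],
]
tweets = process_batches(batches, keep=lambda t: "#election" in t["text"])
print(len(tweets))  # 2
```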

Facets: An Open Source Visualization Tool for Machine Learning Training Data

Getting the best results out of a machine learning (ML) model requires that you truly understand your data. However, ML datasets can contain hundreds of millions of data points, each consisting of hundreds (or even thousands) of features, making it nearly impossible to understand an entire dataset in an intuitive fashion. Visualization can help unlock nuances and insights in large datasets. A picture may be worth a thousand words, but an interactive visualization can be worth even more. Working with the PAIR initiative, we’ve released Facets, an open source visualization tool to aid in understanding and analyzing ML datasets. Facets consists of two visualizations that allow users to see a holistic picture of their data at different granularities. Get a sense of the shape of each feature of the data using Facets Overview, or explore a set of individual observations using Facets Dive. These visualizations allow you to debug your data which, in machine learning, is as important as debugging your model. They can easily be used inside of Jupyter notebooks or embedded into webpages. In addition to the open source code, we’ve also created a Facets demo website. This website allows anyone to visualize their own datasets directly in the browser without the need for any software installation or setup, without the data ever leaving your computer.

What is the future of deep learning? Are most machine learning experts turning to deep learning?

Yes, most faculty, graduate students, and a lot of engineering teams in industry have already abandoned everything else and shifted to deep learning. Most new graduate students I meet in applied areas such as computer vision know nothing about probabilistic graphical models, for instance, and their proposed solution to any problem is a CNN/LSTM/GAN.

Machine Learning Applied to Big Data, Explained

Machine learning with Big Data is, in many ways, different than ‘regular’ machine learning. This informative image is helpful in identifying the steps in machine learning with Big Data, and how they fit together into a process of their own.

R Programming Notes – Part 2

In an older post, I discussed a number of functions that are useful for programming in R. I wanted to expand on that topic by covering other functions, packages, and tools that are useful. Over the past year, I have been working as an R programmer, and these are some of the lessons that have become fundamental in my work.

Textual entailment with TensorFlow

Textual entailment is a simple exercise in logic that attempts to discern whether one sentence can be inferred from another. A computer program that takes on the task of textual entailment attempts to categorize an ordered pair of sentences into one of three categories. The first category, called “positive entailment,” occurs when you can use the first sentence to prove that a second sentence is true. The second category, “negative entailment,” is the inverse of positive entailment: it occurs when the first sentence can be used to disprove the second sentence. Finally, if the two sentences have no correlation, they are considered to have a “neutral entailment.” Textual entailment is useful as a component in much larger applications. For example, question-answering systems may use textual entailment to verify an answer from stored information. Textual entailment may also enhance document summarization by filtering out sentences that don’t include new information. Other natural language processing (NLP) systems find similar uses for entailment. This article will guide you through building a simple and fast-to-train neural network that performs textual entailment using TensorFlow.
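The three categories can be pictured as ordered (premise, hypothesis, label) pairs; the sentences below are illustrative examples I made up, not from the article:

```python
examples = [
    ("A soccer game with multiple males playing.",
     "Some men are playing a sport.",
     "positive"),   # the first sentence proves the second
    ("A man inspects a uniform.",
     "The man is sleeping.",
     "negative"),   # the first sentence disproves the second
    ("An older and a younger man are smiling.",
     "Two men are smiling at cats.",
     "neutral"),    # no correlation between the two
]
for premise, hypothesis, label in examples:
    print(f"{label}: {premise!r} -> {hypothesis!r}")
```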

Automatically Fitting the Support Vector Machine Cost Parameter

In an earlier post I discussed how to avoid overfitting when using Support Vector Machines. This was achieved using cross validation. In cross validation, prediction accuracy is maximized by varying the cost parameter. Importantly, prediction accuracy is calculated on a different subset of the data from that used for training. In this blog post I take that concept a step further, by automating the manual search for the optimal cost. The data set I’ll be using describes different types of glass based upon physical attributes and chemical composition. You can read more about the data here, but for the purposes of my analysis all you need to know is that the outcome variable is categorical (7 types of glass) and the 4 predictor variables are numeric.
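The post automates the search in R, but the pattern itself is just a grid search over the cost, scored by cross-validated accuracy. A minimal language-agnostic sketch in Python, where `evaluate` is a placeholder standing in for "cross-validated accuracy of an SVM trained with this cost" (the peak at cost = 1 is made up):

```python
def tune_cost(costs, evaluate):
    """Return the cost value with the highest evaluation score."""
    return max(costs, key=evaluate)

# Hypothetical scorer: pretend cross-validated accuracy peaks at cost = 1.
best = tune_cost([0.01, 0.1, 1, 10, 100], evaluate=lambda c: -abs(c - 1))
print(best)  # 1
```

Because the evaluation uses held-out folds rather than the training data, the selected cost is less prone to overfitting, which is the point of the original cross-validation setup.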

Generalized Additive Models and Mixed-Effects in Agriculture

In the previous post I explored the use of linear models in the forms most commonly used in agricultural research. Clearly, when we are talking about linear models we are implicitly assuming that all relations between the dependent variable y and the predictors x are linear. In fact, in a linear model we can specify different shapes for the relation between y and x, for example by including polynomials (read for example: https://…/fitting-polynomial-regression-r ). However, we can do that only in cases where we can clearly see a particular shape of the relation, for example quadratic. The problem is that in many cases we can see from a scatterplot that we have a non-linear distribution of the points, but it is difficult to understand its form. Moreover, in a linear model the interpretation of polynomial coefficients becomes more difficult, and this may decrease their usefulness. An alternative approach is provided by Generalized Additive Models, which allow us to fit models with non-linear smoothers without specifying a particular shape a priori.
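To make the polynomial point concrete: a model that is "linear" is linear in its coefficients, so it can still capture a curved relation by including an x² term. A small sketch with NumPy on made-up, noise-free quadratic data (the post itself works in R):

```python
import numpy as np

# Made-up data with a known quadratic relation: y = 2 + 0.5*x - 1.5*x^2.
x = np.linspace(-3, 3, 50)
y = 2.0 + 0.5 * x - 1.5 * x ** 2

# Fit y ~ b0 + b1*x + b2*x^2; np.polyfit returns the highest degree first.
b2, b1, b0 = np.polyfit(x, y, deg=2)
print(round(b0, 3), round(b1, 3), round(b2, 3))
```

This works because we specified the quadratic shape a priori; a GAM's smoothers would let the data suggest the shape instead.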

Artificial Intuition – A Breakthrough Cognitive Paradigm

In a previous post, I introduced the Meta Meta-Model of Deep Learning. However, I did not introduce its details. A word of warning for the reader: the concepts in this section are in flux and undergoing a lot of changes. Therefore, this article is just a reflection of my current understanding of the language of the Deep Learning Meta Meta-Model. That's definitely a mouthful, so to make life simpler for everyone, I just call this the Deep Learning Canonical Patterns. These patterns are documented in the Deep Learning Design Patterns Wiki. In this post I will explore further the characteristics of Artificial Intuition, with the goal of describing a set of patterns that can aid us in formulating novel architectures for Deep Learning. In a previous post, “Deep Learning and Artificial Intuition”, I introduced the idea that there are two distinct cognitive mechanisms, one based on logical inference and another based on intuition. At least six decades have been spent exploring cognitive mechanisms based on logical inference without making much progress towards AGI. Deep Learning, a breakthrough discovered in 2012, revealed a promising alternative research approach based on a different cognitive paradigm.