Book Memo: “Machine Learning with R”

R gives you access to the cutting-edge software you need to prepare data for machine learning. No previous knowledge required – this book will take you methodically through every stage of applying machine learning.
• Harness the power of R for statistical computing and data science
• Use R to apply common machine learning algorithms with real-world applications
• Prepare, examine, and visualize data for analysis
• Understand how to choose between machine learning models
• Packed with clear instructions to explore, forecast, and classify data

If you did not already know

OntoSeg google
Text segmentation (TS) aims at dividing long text into coherent segments which reflect the subtopic structure of the text. It is beneficial to many natural language processing tasks, such as Information Retrieval (IR) and document summarisation. Current approaches to text segmentation are similar in that they all use word-frequency metrics to measure the similarity between two regions of text, so that a document is segmented based on the lexical cohesion between its words. Various NLP tasks are now moving towards the semantic web and ontologies, such as ontology-based IR systems, to capture the conceptualizations associated with user needs and contents. Text segmentation based on lexical cohesion between words is hence no longer sufficient for such tasks. This paper proposes OntoSeg, a novel approach to text segmentation based on the ontological similarity between text blocks. The proposed method uses ontological similarity to explore conceptual relations between text segments and a Hierarchical Agglomerative Clustering (HAC) algorithm to represent the text as a tree-like hierarchy that is conceptually structured. The rich structure of the created tree further allows the segmentation of text in a linear fashion at various levels of granularity. The proposed method was evaluated on a well-known dataset, and the results show that using ontological similarity in text segmentation is very promising. We also enhance the proposed method by combining ontological similarity with lexical similarity, and the results show improved segmentation quality. …
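The adjacent-merge HAC step described above can be sketched in a few lines. `segment` and `jaccard` are hypothetical names, and plain Jaccard overlap of concept sets stands in for the paper's ontological similarity:

```python
def jaccard(seg_a, seg_b):
    """Similarity of two segments, each a list of concept sets."""
    A = set().union(*seg_a)
    B = set().union(*seg_b)
    return len(A & B) / len(A | B)

def segment(blocks, similarity, n_segments):
    """Agglomeratively merge the most similar pair of *adjacent* segments
    until only n_segments remain."""
    segments = [[b] for b in blocks]
    while len(segments) > n_segments:
        sims = [similarity(segments[i], segments[i + 1])
                for i in range(len(segments) - 1)]
        i = max(range(len(sims)), key=sims.__getitem__)
        segments[i:i + 2] = [segments[i] + segments[i + 1]]
    return segments
```

Merging only adjacent segments preserves the linear order of the text, which is what lets the resulting tree be cut at any level to yield a linear segmentation at the desired granularity.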

Contrastive-center Loss google
The deep convolutional neural network (CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastive-center loss, which can further enhance the discriminative power of the features, for it learns a class center for each class. The proposed contrastive-center loss simultaneously considers intra-class compactness and inter-class separability, by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers. Experiments on different datasets demonstrate the effectiveness of contrastive-center loss. …
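The penalized ratio can be sketched in a few lines of numpy; the exact normalization and the center-update rule in the paper may differ, so treat this as a minimal illustration:

```python
import numpy as np

def contrastive_center_loss(features, labels, centers, delta=1e-6):
    """Sketch of contrastive-center loss: for each sample, penalize the ratio
    of the squared distance to its own class center (intra-class compactness)
    over the summed squared distances to all other centers (inter-class
    separability). delta avoids division by zero."""
    total = 0.0
    for x, y in zip(features, labels):
        d_own = np.sum((x - centers[y]) ** 2)
        d_other = sum(np.sum((x - c) ** 2)
                      for j, c in enumerate(centers) if j != y)
        total += 0.5 * d_own / (d_other + delta)
    return total / len(features)
```

Features sitting exactly on their class centers give zero loss; features drifting toward a wrong center drive the ratio, and hence the loss, up.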

Shared Learning Framework google
Deep Reinforcement Learning has been able to achieve amazing successes in a variety of domains, from video games to continuous control, by trying to maximize the cumulative reward. However, most of these successes rely on algorithms that require a large amount of training data to obtain results on par with human-level performance. This is not feasible if we are to deploy these systems on real-world tasks, and hence there has been an increased thrust in exploring data-efficient algorithms. To this end, we propose the Shared Learning framework aimed at making $Q$-ensemble algorithms data-efficient. To achieve this, we look into principles of transfer learning, which studies the benefits of information exchange across tasks in reinforcement learning, and adapt transfer to learning our value function estimates in a novel manner. In this paper, we consider the special case of transfer between the value function estimates in the $Q$-ensemble architecture of BootstrappedDQN. We further empirically demonstrate how our proposed framework can help in speeding up the learning process in $Q$-ensembles with minimum computational overhead on a suite of Atari 2600 Games. …

R Packages worth a look

Steiner Tree Approach for Graph Analysis (SteinerNet)
A set of graph functions to find Steiner trees on graphs. It provides tools for analysing Steiner tree applications on networks, with applications in biological pathway network analysis (Sadeghi 2013) <doi:10.1186/1471-2105-14-144>.

Organizing Data in a Hypercube (hypercube)
Provides methods for organizing data in a hypercube (i.e. a multi-dimensional cube). Cubes are generated from molten data frames. Each cube can be manipulated with five operations: rotation (changeDimensionOrder()), dicing and slicing (add.selection(), remove.selection()), drilling down (add.aggregation()), and rolling up (remove.aggregation()).

Binning Variables to Use in Logistic Regression (logiBin)
Fast binning of multiple variables using parallel processing. A summary of all the variables binned is generated which provides the information value, entropy, an indicator of whether the variable follows a monotonic trend or not, etc. It supports rebinning of variables to force a monotonic trend as well as manual binning based on pre-specified cuts. The cut points of the bins are based on conditional inference trees as implemented in the partykit package. The conditional inference framework is described by Hothorn T, Hornik K, Zeileis A (2006) <doi:10.1198/106186006X133933>.
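The information value reported per binned variable is a standard credit-scoring statistic. logiBin itself is an R package; this Python sketch (with the hypothetical name `information_value`) only illustrates the quantity, computed from good/bad counts per bin:

```python
import numpy as np

def information_value(bins_good, bins_bad):
    """Information value of a binned variable: sum over bins of
    (pct_good - pct_bad) * WoE, where WoE = log(pct_good / pct_bad).
    Assumes every bin has nonzero counts in both classes."""
    g = np.asarray(bins_good, float)
    b = np.asarray(bins_bad, float)
    pg, pb = g / g.sum(), b / b.sum()
    woe = np.log(pg / pb)
    return float(np.sum((pg - pb) * woe))
```

A variable whose bins split goods and bads identically carries zero information value; the more the per-bin class mix diverges, the larger the IV.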

Projection Pursuit Classification Forest (PPforest)
Implements the projection pursuit forest algorithm for supervised classification.

Simulations of Matrix Variate Distributions (matrixsampling)
Provides samplers for various matrix variate distributions: Wishart, inverse-Wishart, normal, t, inverted-t, Beta type I and Beta type II. Allows simulation of the noncentral Wishart distribution without the integer restriction on the degrees of freedom.

Effect Modification in Observational Studies Using the Submax Method (submax)
Effect modification occurs if a treatment effect is larger or more stable in certain subgroups defined by observed covariates. The submax or subgroup-maximum method of Lee et al. (2017) <arXiv:1702.00525> does an overall test and separate tests in subgroups, correcting for multiple testing using the joint distribution.

Document worth reading: “Knowledge Transfer Between Artificial Intelligence Systems”

We consider the fundamental question: how could a legacy ‘student’ Artificial Intelligence (AI) system learn from a legacy ‘teacher’ AI system or a human expert without complete re-training and, most importantly, without requiring significant computational resources? Here ‘learning’ is understood as an ability of one system to mimic responses of the other and vice versa. We call such learning an Artificial Intelligence knowledge transfer. We show that if internal variables of the ‘student’ Artificial Intelligence system have the structure of an $n$-dimensional topological vector space and $n$ is sufficiently high then, with probability close to one, the required knowledge transfer can be implemented by simple cascades of linear functionals. In particular, for $n$ sufficiently large, with probability close to one, the ‘student’ system can successfully and non-iteratively learn $k\ll n$ new examples from the ‘teacher’ (or correct the same number of mistakes) at the cost of two additional inner products. The concept is illustrated with an example of knowledge transfer from a pre-trained convolutional neural network to a simple linear classifier with HOG features. Knowledge Transfer Between Artificial Intelligence Systems
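A toy numpy illustration of the separability claim (not the paper's actual procedure): in a high-dimensional feature space, a linear functional built from one misclassified sample separates it from all other samples with high probability, so a "mistake corrector" is just an inner product plus a threshold:

```python
import numpy as np

rng = np.random.default_rng(0)
n, d = 200, 500          # many samples in a high-dimensional feature space
X = rng.standard_normal((n, d))
err = X[0]               # the one sample the legacy system gets wrong

# Project everything onto the direction of the mistake. Random
# high-dimensional vectors are quasi-orthogonal, so only the mistake
# itself scores high along this direction.
w = err / np.linalg.norm(err)
scores = X @ w
# threshold midway between the mistake's score and the best of the rest
theta = 0.5 * (scores[0] + scores[1:].max())
corrector = lambda x: x @ w > theta   # fires only on the mistake
```

With `d = 500` the mistake's score is near `sqrt(d)` while every other projection behaves like a standard normal, so the gap is enormous; this is the geometry behind the "two inner products per correction" claim.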

Book Memo: “Data Mining and Analysis”

Fundamental Concepts and Algorithms
The fundamental algorithms in data mining and analysis form the basis for the emerging field of data science, which includes automated methods to analyze patterns and models for all kinds of data, with applications ranging from scientific discovery to business intelligence and analytics. This textbook for senior undergraduate and graduate data mining courses provides a broad yet in-depth overview of data mining, integrating related concepts from machine learning and statistics. The main parts of the book include exploratory data analysis, pattern mining, clustering, and classification. The book lays the basic foundations of these tasks, and also covers cutting-edge topics such as kernel methods, high-dimensional data analysis, and complex graphs and networks. With its comprehensive coverage, algorithmic perspective, and wealth of examples, this book offers solid guidance in data mining for students, researchers, and practitioners alike. Key features:
• Covers both core methods and cutting-edge research
• Algorithmic approach with open-source implementations
• Minimal prerequisites: all key mathematical concepts are presented, as is the intuition behind the formulas
• Short, self-contained chapters with class-tested examples and exercises allow for flexibility in designing a course and for easy reference
• Supplementary website with lecture slides, videos, project ideas, and more

What’s new on arXiv

The Effectiveness of Data Augmentation in Image Classification using Deep Learning

In this paper, we explore and compare multiple solutions to the problem of data augmentation in image classification. Previous work has demonstrated the effectiveness of data augmentation through simple techniques, such as cropping, rotating, and flipping input images. We artificially constrain our access to data to a small subset of the ImageNet dataset, and compare each data augmentation technique in turn. Among the more successful data augmentation strategies are the traditional transformations mentioned above. We also experiment with GANs to generate images of different styles. Finally, we propose a method to allow a neural net to learn augmentations that best improve the classifier, which we call neural augmentation. We discuss the successes and shortcomings of this method on various datasets.
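A minimal numpy sketch of the "traditional transformations" (flip, rotation, crop); real pipelines would sample these per epoch and resize rather than zero-pad, so this only shows the shape of the idea:

```python
import numpy as np

def augment(img, rng):
    """Randomly apply simple augmentations: horizontal flip, a random
    quarter-turn rotation, and a random 3/4-size crop padded back with
    zeros so the output shape matches the input."""
    if rng.random() < 0.5:
        img = np.fliplr(img)
    img = np.rot90(img, k=rng.integers(0, 4))
    h, w = img.shape[:2]
    ch, cw = 3 * h // 4, 3 * w // 4
    y, x = rng.integers(0, h - ch + 1), rng.integers(0, w - cw + 1)
    crop = img[y:y + ch, x:x + cw]
    out = np.zeros_like(img)
    out[:ch, :cw] = crop
    return out

rng = np.random.default_rng(0)
batch = [augment(np.arange(64.0).reshape(8, 8), rng) for _ in range(4)]
```

Each call yields a differently transformed view of the same image, which is exactly how these techniques multiply a small dataset.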

Ellipsoid Method for Linear Programming made simple

In this paper, the ellipsoid method for linear programming is derived using only minimal knowledge of algebra and matrices. Unfortunately, most authors first describe the algorithm and only later prove its correctness, which requires a good knowledge of linear algebra.

Mathematics of Deep Learning

Recently there has been a dramatic increase in the performance of recognition systems due to the introduction of deep architectures for representation learning and classification. However, the mathematical reasons for this success remain elusive. This tutorial will review recent work that aims to provide a mathematical justification for several properties of deep networks, such as global optimality, geometric stability, and invariance of the learned representations.

A short characterization of relative entropy

We prove characterization theorems for relative entropy (also known as Kullback-Leibler divergence), q-logarithmic entropy (also known as Tsallis entropy), and q-logarithmic relative entropy. All three have been characterized axiomatically before, but we show that earlier proofs can be simplified considerably, at the same time relaxing some of the hypotheses.
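For reference, the quantities being characterized, in a small numpy sketch (the q-logarithmic entropy recovers Shannon entropy as q approaches 1):

```python
import numpy as np

def kl(p, q):
    """Relative entropy D(p || q) = sum_i p_i * log(p_i / q_i),
    with the convention 0 * log 0 = 0."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    mask = p > 0
    return float(np.sum(p[mask] * np.log(p[mask] / q[mask])))

def tsallis(p, qpar):
    """q-logarithmic (Tsallis) entropy: (1 - sum_i p_i**q) / (q - 1)."""
    p = np.asarray(p, float)
    return float((1.0 - np.sum(p ** qpar)) / (qpar - 1.0))
```

Relative entropy is zero exactly when the two distributions coincide and positive otherwise, which is the property the axiomatic characterizations build on.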

Evolving Unsupervised Deep Neural Networks for Learning Meaningful Representations

Deep Learning (DL) aims at learning the \emph{meaningful representations}. A meaningful representation refers to one that gives rise to significant performance improvement of associated Machine Learning (ML) tasks by replacing the raw data as the input. However, optimal architecture design and model parameter estimation in DL algorithms are widely considered to be intractable. Evolutionary algorithms are well suited to complex and non-convex problems due to their gradient-free nature and insensitivity to local optima. In this paper, we propose a computationally economical algorithm for evolving \emph{unsupervised deep neural networks} to efficiently learn \emph{meaningful representations}, which is very suitable in the current Big Data era where sufficient labeled data for training is often expensive to acquire. In the proposed algorithm, finding an appropriate architecture and the initialized parameter values for a ML task at hand is modeled by a computationally efficient gene encoding approach, which is employed to effectively model the task with a large number of parameters. In addition, a local search strategy is incorporated to facilitate exploitation for further improving the performance. Furthermore, a small proportion of labeled data is utilized during the evolutionary search to guarantee that the learnt representations are meaningful. The performance of the proposed algorithm has been thoroughly investigated over classification tasks. Specifically, the proposed algorithm consistently reaches a classification error rate of 1.15\% on MNIST, which is a very promising result against state-of-the-art unsupervised DL algorithms.

MentorNet: Regularizing Very Deep Neural Networks on Corrupted Labels

Recent studies have discovered that deep networks are capable of memorizing the entire data even when the labels are completely random. Since deep models are trained on big data where labels are often noisy, the ability to overfit noise can lead to poor performance. To overcome the overfitting on corrupted training data, we propose a novel technique to regularize deep networks in the data dimension. This is achieved by learning a neural network called MentorNet to supervise the training of the base network, namely, StudentNet. Our work is inspired by curriculum learning and advances the theory by learning a curriculum from data by neural networks. We demonstrate the efficacy of MentorNet on several benchmarks. Comprehensive experiments show that it is able to significantly improve the generalization performance of the state-of-the-art deep networks on corrupted training data.

A Two-stage Online Monitoring Procedure for High-Dimensional Data Streams

Advanced computing and data acquisition technologies have made possible the collection of high-dimensional data streams in many fields. Efficient online monitoring tools which can correctly identify any abnormal data stream for such data are highly sought after. However, most of the existing monitoring procedures directly apply the false discovery rate (FDR) controlling procedure to the data at each time point, and the FDR at each time point (the point-wise FDR) is either specified by users or determined by the in-control (IC) average run length (ARL). If the point-wise FDR is specified by users, the resulting procedure lacks control of the global FDR and keeps users in the dark in terms of the IC-ARL. If the point-wise FDR is determined by the IC-ARL, the resulting procedure does not give users the flexibility to choose the number of false alarms (Type-I errors) they can tolerate when identifying abnormal data streams, which often makes the procedure too conservative. To address those limitations, we propose a two-stage monitoring procedure that can control both the IC-ARL and Type-I errors at the levels specified by users. As a result, the proposed procedure allows users to choose not only how often they expect any false alarms when all data streams are IC, but also how many false alarms they can tolerate when identifying abnormal data streams. With this extra flexibility, our proposed two-stage monitoring procedure is shown in the simulation study and real data analysis to outperform the existing methods.
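The point-wise building block under discussion, the Benjamini-Hochberg step-up procedure applied to one time point's p-values, can be sketched as:

```python
import numpy as np

def benjamini_hochberg(pvals, fdr=0.05):
    """Point-wise FDR control via the Benjamini-Hochberg step-up procedure:
    flag every stream whose p-value is at or below p_(k), where k is the
    largest i with p_(i) <= i * fdr / m."""
    p = np.asarray(pvals, float)
    m = len(p)
    order = np.argsort(p)
    thresh = fdr * np.arange(1, m + 1) / m
    below = p[order] <= thresh
    flagged = np.zeros(m, dtype=bool)
    if below.any():
        k = np.max(np.nonzero(below)[0])
        flagged[order[:k + 1]] = True
    return flagged
```

Running this independently at every time point is exactly the practice the paper critiques: it controls the FDR per time point, but says nothing about the global FDR or the in-control ARL across the whole monitoring horizon.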

Relation Extraction: A Survey

With the advent of the Internet, large amounts of digital text are generated every day in the form of news articles, research publications, blogs, question answering forums and social media. It is important to develop techniques for extracting information automatically from these documents, as a lot of important information is hidden within them. This extracted information can be used to improve access and management of knowledge hidden in large text corpora. Several applications such as Question Answering and Information Retrieval would benefit from this information. Entities like persons and organizations form the most basic unit of the information. Occurrences of entities in a sentence are often linked through well-defined relations; e.g., occurrences of person and organization in a sentence may be linked through relations such as ‘employed at’. The task of Relation Extraction (RE) is to identify such relations automatically. In this paper, we survey several important supervised, semi-supervised and unsupervised RE techniques. We also cover the paradigms of Open Information Extraction (OIE) and Distant Supervision. Finally, we describe some of the recent trends in RE techniques and possible future research directions. This survey would be useful for three kinds of readers: i) newcomers in the field who want to quickly learn about RE; ii) researchers who want to know how the various RE techniques evolved over time and what the possible future research directions are; and iii) practitioners who just need to know which RE technique works best in various settings.

Point-wise Convolutional Neural Network

Deep learning with 3D data such as reconstructed point clouds and CAD models has received great research interest recently. However, the capability of using point clouds with convolutional neural networks has so far not been fully explored. In this technical report, we present a convolutional neural network for semantic segmentation and object recognition with 3D point clouds. At the core of our network is point-wise convolution, a convolution operator that can be applied at each point of a point cloud. Our fully convolutional network design, while being simple to implement, can yield competitive accuracy in both semantic segmentation and object recognition tasks.

Transfer Adversarial Hashing for Hamming Space Retrieval
Balance and Frustration in Signed Networks under Different Contexts
Stochastic Low-Rank Bandits
Learning Disentangling and Fusing Networks for Face Completion Under Structured Occlusions
Asymptotic properties of expansive Galton-Watson trees
Sixty years of percolation
Variance reduction via empirical variance minimization: convergence and complexity
Everything You Always Wanted to Know About TREC RTS* (*But Were Afraid to Ask)
Convex programming in optimal control and information theory
Adaptation to criticality through organizational invariance in embodied agents
Can Balloons Produce Li-Fi? A Disaster Management Perspective
Stability Selection for Structured Variable Selection
Energy-Efficient Non-Orthogonal Transmission under Reliability and Finite Blocklength Constraints
Duality of optimization problems with gauge functions
Interference Characterization in Downlink Li-Fi Optical Attocell Networks
UV-GAN: Adversarial Facial UV Map Completion for Pose-invariant Face Recognition
The Enhanced Hybrid MobileNet
On Generalized Edge Corona Product of Graphs
Calculus for directional limiting normal cones and subdifferentials
Sensitivity of rough differential equations: an approach through the Omega lemma
Differentiable lower bound for expected BLEU score
A Quantum Extension of Variational Bayes Inference
Regularization and Optimization strategies in Deep Convolutional Neural Network
Efficient Computation of the Stochastic Behavior of Partial Sum Processes
Bayesian graphical compositional regression for microbiome data
Gorenstein liaison for toric ideals of graphs
Limit theorems for the Multiplicative Binomial Distribution (MBD)
On the critical threshold for continuum AB percolation
Random permutations without macroscopic cycles
Error Performance of Wireless Powered Cognitive Relay Networks with Interference Alignment
On the Capacity of Wireless Powered Cognitive Relay Network with Interference Alignment
Ergodic Capacity Analysis of Wireless Powered AF Relaying Systems over $α$-$μ$ Fading Channels
Exponential convergence of testing error for stochastic gradient methods
Self-normalized Cramer type moderate deviations for martingales
Penalty Dual Decomposition Method For Nonsmooth Nonconvex Optimization
Biggins’ Martingale Convergence for Branching Lévy Processes
Approximation of Sojourn Times of Gaussian Processes
Random non-Abelian circulant matrices. Spectrum of random convolution operators on large finite groups
Multiple testing for outlier detection in functional data
GMM-Based Synthetic Samples for Classification of Hyperspectral Images With Limited Training Data
Creating New Language and Voice Components for the Updated MaryTTS Text-to-Speech Synthesis Platform
A Multimodal Corpus of Expert Gaze and Behavior during Phonetic Segmentation Tasks
Open data, open review and open dialogue in making social sciences plausible
Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments
A duality principle for a semi-linear model in micro-magnetism
The Hyperbolic-type Point Process
Explicit bounds for Lipschitz constant of solution to basic problem in calculus of variations
Ballpark Crowdsourcing: The Wisdom of Rough Group Comparisons
Explicit bounds for solutions to optimal control problems
Symbol detection in online handwritten graphics using Faster R-CNN
MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features
Optimal Stochastic Desencoring and Applications to Calibration of Market Models
A Permutation Test on Complex Sample Data
Self-Supervised Depth Learning for Urban Scene Understanding
Rethinking Spatiotemporal Feature Learning For Video Understanding
A User-Study on Online Adaptation of Neural Machine Translation to Human Post-Edits
Active phase for activated random walks on $\mathbb{Z}^d$, $ d \geq 3$, with density less than one and arbitrary sleeping rate
Rough Fuzzy Quadratic Minimum Spanning Tree Problem
Spatial-temporal wind field prediction by Artificial Neural Networks
A study of elliptic partial differential equations with jump diffusion coefficients
A combinatorial description of the centralizer algebras connected to the Links-Gould Invariant
Distance magic labelings of product graphs
Geometric ergodicity for some space-time max-stable Markov chains
Closing in on Time and Space Optimal Construction of Compressed Indexes
Refuting the cavity-method threshold for random 3-SAT
The Edge Universality of Correlated Matrices
Performance Analysis of Approximate Message Passing for Distributed Compressed Sensing
Approximate controllability for Navier–Stokes equations in $\mathrm{3D}$ rectangles under Lions boundary conditions
Reasoning in Systems with Elements that Randomly Switch Characteristics
FFT-Based Deep Learning Deployment in Embedded Systems
Statistical physics on a product of trees
Learning Objectives for Treatment Effect Estimation
The trisection genus of standard simply connected PL 4-manifolds
Multiplicative Convolution of Real Asymmetric and Real Antisymmetric Matrices
Recognizing Linked Domain in Polynomial Time
Tensor Sensing for RF Tomographic Imaging
Combination Networks with or without Secrecy Constraints: The Impact of Caching Relays
Localization of Extended Quantum Objects
Real-time Egocentric Gesture Recognition on Mobile Head Mounted Displays
Fractal dimension of interfaces in Edwards-Anderson spin glasses for up to six space dimensions
An Improved Feedback Coding Scheme for the Wire-tap Channel
Persistent Memory Programming Abstractions in Context of Concurrent Applications
Predicting Station-level Hourly Demands in a Large-scale Bike-sharing Network: A Graph Convolutional Neural Network Approach
The List Linear Arboricity of Graphs
Permuted composition tableaux, 0-Hecke algebra and labeled binary trees
QPTAS and Subexponential Algorithm for Maximum Clique on Disk Graphs
Local False Discovery Rate Based Methods for Multiple Testing of One-Way Classified Hypotheses
Learning Low-shot facial representations via 2D warping
Deep Prior
Lock-free B-slack trees: Highly Space Efficient B-trees
Unsupervised Histopathology Image Synthesis
Magnetotransport in a model of a disordered strange metal
Parametrizations of $k$-Nonnegative Matrices: Cluster Algebras and $k$-Positivity Tests
Reservation-Based Federated Scheduling for Parallel Real-Time Tasks
Step bunching with both directions of the current: Vicinal W(110) surfaces versus atomistic scale model
A Particle Swarm Optimization-based Flexible Convolutional Auto-Encoder for Image Classification
Pediatric Bone Age Assessment Using Deep Convolutional Neural Networks
Outcome Based Matching
Statistical Inference in Fractional Poisson Ornstein-Uhlenbeck Process
Neural networks catching up with finite differences in solving partial differential equations in higher dimensions
Nonparametric Adaptive CUSUM Chart for Detecting Arbitrary Distributional Changes
Quantum ergodicity in the SYK model
Weakly Supervised Action Localization by Sparse Temporal Pooling Network
Extreme 3D Face Reconstruction: Looking Past Occlusions
Learning to Navigate by Growing Deep Networks
Optimized Sampling for Multiscale Dynamics
Learning Binary Residual Representations for Domain-specific Video Streaming
DAMPE squib? Significance of the 1.4 TeV DAMPE excess
The central limit theorem for the number of clusters of the Arratia flow
The Sound and the Fury: Hiding Communications in Noisy Wireless Networks with Interference Uncertainty
Range Queries in Non-blocking $k$-ary Search Trees
Optimality Of Community Structure In Complex Networks
Detection and Attention: Diagnosing Pulmonary Lung Cancer from CT by Imitating Physicians
Corrigendum to ‘SPN graphs: when copositive $=$ SPN’
Multi-appearance Segmentation and Extended 0-1 Program for Dense Small Object Tracking
Passing the Brazilian OAB Exam: data preparation and some experiments
An Enhanced Access Reservation Protocol with a Partial Preamble Transmission Mechanism in NB-IoT Systems
Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition
A Statistical Model with Qualitative Input
Queueing Analysis for Block Fading Rayleigh Channels in the Low SNR Regime
Age of Information in Two-way Updating Systems Using Wireless Power Transfer
Nonlinearity-tolerant 8D modulation formats by set-partitioning PDM-QPSK
$\forall \exists \mathbb{R}$-completeness and area-universality
Optimized Interface Diversity for Ultra-Reliable Low Latency Communication (URLLC)
Fast robust correlation for high dimensional data
Structural and computational results on platypus graphs
Fluctuation Theorem and Thermodynamic Formalism
Analysis of Latency and MAC-layer Performance for Class A LoRaWAN
Rasa: Open Source Language Understanding and Dialogue Management
Rate of Change Analysis for Interestingness Measures
Towards Deep Modeling of Music Semantics using EEG Regularizers
Semi-Automatic Algorithm for Breast MRI Lesion Segmentation Using Marker-Controlled Watershed Transformation
Cellular Automata Applications in Shortest Path Problem
Constrained BSDEs driven by a non quasi-left-continuous random measure and optimal control of PDMPs on bounded domains
Approximation Algorithms for Replenishment Problems with Fixed Turnover Times
Data Structures for Representing Symmetry in Quadratically Constrained Quadratic Programs
Response of entanglement to annealed vis-à-vis quenched disorder in quantum spin models
Isogeometric shape optimization for nonlinear ultrasound focusing
Context-specific independencies for ordinal variables in chain regression models
Robust Estimation of Similarity Transformation for Visual Object Tracking with Correlation Filters
Generalized Degrees of Freedom of the Symmetric Cache-Aided MISO Broadcast Channel with Partial CSIT
Intrinsic Point of Interest Discovery from Trajectory Data
Image Super-resolution via Feature-augmented Random Forest
Proximodistal Exploration in Motor Learning as an Emergent Property of Optimization
The evaluation of geometric Asian power options under time changed mixed fractional Brownian motion
Poisson brackets symmetry from the pentagon-wheel cocycle in the graph complex
A Performance Evaluation of Local Features for Image Based 3D Reconstruction
Strictly proper kernel scores and characteristic kernels on compact spaces
A Bayesian Clearing Mechanism for Combinatorial Auctions
Constraint and Mathematical Programming Models for Integrated Port Container Terminal Operations
A quantum algorithm to train neural networks using low-depth circuits
Quantifying over boolean announcements
Prior Distributions for the Bradley-Terry Model of Paired Comparisons
Deep CNN ensembles and suggestive annotations for infant brain MRI segmentation
The effect of asymmetry of the coil block on self-assembly in ABC coil-rod-coil triblock copolymers
Model comparison for Gibbs random fields using noisy reversible jump Markov chain Monte Carlo
A Probability Monad as the Colimit of Finite Powers
Analysis and calibration of a linear model for structured cell populations with unidirectional motion : Application to the morphogenesis of ovarian follicles
Monotonic Chunkwise Attention
Equilibria in the Tangle
Partisan gerrymandering with geographically compact districts
Systems of BSDEs with oblique reflection and related optimal switching problem
swordfish: Efficient Forecasting of New Physics Searches without Monte Carlo

If you did not already know

Stacked Kernel Network (SKN) google
Kernel methods are powerful tools to capture nonlinear patterns behind data. They implicitly learn high (even infinite) dimensional nonlinear features in the Reproducing Kernel Hilbert Space (RKHS) while making the computation tractable by leveraging the kernel trick. Classic kernel methods learn a single layer of nonlinear features, whose representational power may be limited. Motivated by the recent success of deep neural networks (DNNs) that learn multi-layer hierarchical representations, we propose a Stacked Kernel Network (SKN) that learns a hierarchy of RKHS-based nonlinear features. SKN interleaves several layers of nonlinear transformations (from a linear space to an RKHS) and linear transformations (from an RKHS to a linear space). Similar to DNNs, an SKN is composed of multiple layers of hidden units, but each is parameterized by an RKHS function rather than a finite-dimensional vector. We propose three ways to represent the RKHS functions in SKN: (1) nonparametric representation, (2) parametric representation and (3) random Fourier feature representation. Furthermore, we expand SKN into a CNN architecture called Stacked Kernel Convolutional Network (SKCN). SKCN learns a hierarchy of RKHS-based nonlinear features by convolution, with each filter also parameterized by an RKHS function rather than a finite-dimensional matrix as in a CNN, which makes it suitable for image inputs. Experiments on various datasets demonstrate the effectiveness of SKN and SKCN, which outperform competitive methods. …
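The third representation, random Fourier features, is the easiest to sketch: interleave RFF maps (which approximate an RBF-kernel RKHS embedding) with linear projections. `rff_layer` and `stacked_kernel_net` are hypothetical names, and fixed random matrices stand in for the learned linear maps:

```python
import numpy as np

def rff_layer(X, n_features, gamma, rng):
    """Random Fourier feature map z(x) = sqrt(2/D) * cos(Wx + b), which
    approximates the feature map of the RBF kernel exp(-gamma * ||x - y||^2)."""
    W = rng.normal(0.0, np.sqrt(2.0 * gamma), size=(X.shape[1], n_features))
    b = rng.uniform(0.0, 2.0 * np.pi, size=n_features)
    return np.sqrt(2.0 / n_features) * np.cos(X @ W + b)

def stacked_kernel_net(X, depths=(64, 32), gamma=0.5, seed=0):
    """Interleave nonlinear RKHS-style maps with linear projections,
    mimicking SKN's layer structure (untrained, for illustration only)."""
    rng = np.random.default_rng(seed)
    H = X
    for d in depths:
        H = rff_layer(H, d, gamma, rng)                    # linear space -> RKHS features
        H = H @ rng.standard_normal((d, d)) / np.sqrt(d)   # RKHS -> linear space
    return H
```

Each pass through `rff_layer` plays the role of one RKHS-valued hidden layer; in the actual SKN the linear maps (and kernel parameters) would be trained end-to-end rather than drawn at random.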

TauCharts google
JavaScript charts with a focus on data, design and flexibility: a free, open-source D3.js-based library. TauCharts is a data-focused charting library whose goal is to help people build complex interactive visualizations easily.
Achieve Charting Zen With TauCharts

AOGParsing Operator google
This paper presents a method of learning qualitatively interpretable models in object detection using popular two-stage region-based ConvNet detection systems (i.e., R-CNN). R-CNN consists of a region proposal network and a RoI (Region-of-Interest) prediction network. By interpretable models, we focus on weakly-supervised extractive rationale generation, that is, learning to unfold latent discriminative part configurations of object instances automatically and simultaneously in detection without using any supervision for part configurations. We utilize a top-down hierarchical and compositional grammar model embedded in a directed acyclic AND-OR Graph (AOG) to explore and unfold the space of latent part configurations of RoIs. We propose an AOGParsing operator to substitute the RoIPooling operator widely used in R-CNN, so the proposed method is applicable to many state-of-the-art ConvNet based detection systems. The AOGParsing operator aims to harness both the explainable rigor of top-down hierarchical and compositional grammar models and the discriminative power of bottom-up deep neural networks through end-to-end training. In detection, a bounding box is interpreted by the best parse tree derived from the AOG on-the-fly, which is treated as the extractive rationale generated for interpreting detection. In learning, we propose a folding-unfolding method to train the AOG and ConvNet end-to-end. In experiments, we build on top of R-FCN and test the proposed method on the PASCAL VOC 2007 and 2012 datasets, with performance comparable to state-of-the-art methods. …

Distilled News

Dummy Variable for Examining Structural Instability in Regression: An Alternative to Chow Test

One of the fast-growing economies in the era of globalization is the Ethiopian economy. Among the lower-income countries, it has emerged as one of the rare countries to achieve a double-digit growth rate in Gross Domestic Product (GDP). However, there is a great deal of debate regarding the double-digit growth rate, especially during the recent global recession period. So it becomes a question of empirical research whether there has been a structural change in the relationship between the GDP of Ethiopia and the regressor (time). How do we find out whether a structural change has in fact occurred? To answer this question, we consider the GDP of Ethiopia (measured in constant 2010 US$) over the period 1981 to 2015. Like many other countries in the world, Ethiopia adopted a policy of regulated globalization during the early nineties of the last century. So our aim is to examine whether the GDP of Ethiopia has undergone any structural change following the major policy shift due to the adoption of the globalization policy. To answer this question, we have two options in statistical and econometric research. The most important classes of tests on structural change are the tests from the generalized fluctuation test framework (Kuan and Hornik, 1995) on the one hand and tests based on F statistics (Hansen, 1992; Andrews, 1993; Andrews and Ploberger, 1994) on the other. The first class includes in particular the CUSUM and MOSUM tests and the fluctuation test, while the Chow and supF tests belong to the latter. A topic that has gained more interest recently is monitoring structural change, i.e., starting after a history phase (without structural changes) to analyze new observations and detect a structural change as soon after its occurrence as possible.
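The dummy-variable alternative named in the title can be sketched on simulated data (the series, break date, and coefficients below are illustrative assumptions, not Ethiopian GDP figures): interact a post-break indicator with the trend, and the shift coefficients play the role of a Chow test.

```python
import numpy as np

# Simulate a trending series with a deliberate break at t = 20:
# the slope changes from 1.0 to 3.0 (a toy assumption).
rng = np.random.default_rng(0)
t = np.arange(40, dtype=float)
break_at = 20
y = 5.0 + 1.0 * t + np.where(t >= break_at, 2.0 * (t - break_at), 0.0)
y += rng.normal(0.0, 0.5, size=t.size)

# Dummy-variable regression: y = b0 + b1*t + b2*D + b3*(D*t) + e,
# where D = 1 after the suspected break date. b2 captures an intercept
# shift and b3 a slope shift; jointly testing b2 = b3 = 0 is the
# single-regression counterpart of running a Chow test.
D = (t >= break_at).astype(float)
X = np.column_stack([np.ones_like(t), t, D, D * t])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
b0, b1, b2, b3 = beta
print(f"slope before break: {b1:.2f}, extra slope after: {b3:.2f}")
```

Because all observations sit in one regression, the dummy-variable approach also shows *which* parameters moved, which the plain Chow test does not.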

Exploring data with pandas and MapD using Apache Arrow

At MapD, we’ve long been big fans of the PyData stack, and are constantly working on ways for our open source GPU-accelerated analytic SQL engine to play nicely with the terrific tools in the most popular stack that supports open data science. We are founding collaborators of GOAI (the GPU Open Analytics Initiative), working with the awesome folks at Anaconda and our friends at NVIDIA. In GOAI, we use Apache Arrow to mediate efficient, high-performance data interchange for analytics and AI workflows. A big reason for doing this is to make MapD itself easily accessible to Python tools. For starters, this means supporting modern Python database interfaces like DBAPI. pymapd (built with help from Continuum) is a Pythonic interface to MapD’s SQL engine supporting DBAPI 2.0, and it has some extra goodness in being able to use our built-in Arrow support for both data loading and query result output.
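The point of DBAPI 2.0 is that the calling pattern is the same across drivers. Since a MapD server can't be assumed here, this sketch shows that shared connect/cursor/execute/fetch pattern using the stdlib sqlite3 driver as a stand-in for pymapd (the table and data are made up):

```python
import sqlite3

# DB-API 2.0 pattern that pymapd also follows: connect -> cursor ->
# execute -> fetch. sqlite3 (stdlib) stands in for a MapD connection.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.execute("CREATE TABLE flights (carrier TEXT, delay REAL)")
cur.executemany("INSERT INTO flights VALUES (?, ?)",
                [("AA", 12.0), ("AA", 3.0), ("DL", 7.5)])
cur.execute("SELECT carrier, AVG(delay) FROM flights GROUP BY carrier "
            "ORDER BY carrier")
rows = cur.fetchall()
print(rows)  # [('AA', 7.5), ('DL', 7.5)]
conn.close()
```

Swapping `sqlite3.connect(...)` for a pymapd connection leaves the rest of the code unchanged, which is exactly what a standard database interface buys you.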

The Line Between Commercial and Industrial Data Science

The purpose, tasks, and required skill sets of data scientists differ dramatically between commercial and industrial environments.

10 Surprising Ways Machine Learning is Being Used Today

1. Predicting whether a criminal defendant is a flight risk.
2. Using Twitter to diagnose psychopathy.
3. Helping cyclists win the Tour de France.
4. Identifying endangered whales.
5. Translating legalese.
6. Preventing money laundering.
7. Figuring out which message board threads will be closed.
8. Predicting hospital wait times.
9. Calculating auction prices.
10. Predicting earthquakes.

How to Improve my ML Algorithm? Lessons from Andrew Ng’s experience

You have worked for weeks on building your machine learning system and the performance is not something you are satisfied with. You think of multiple ways to improve your algorithm’s performance: collect more data, add more hidden units, add more layers, change the network architecture, change the basic algorithm, etc. But which one of these will give the best improvement on your system? You can either try them all, investing a lot of time to find out what works for you, or you can use the following tips from Ng’s experience.
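The core of Ng's advice is the bias/variance diagnostic: compare training error to the achievable target (avoidable bias) and dev error to training error (variance), and let the larger gap pick which of the fixes above to try first. A toy decision rule (the thresholds and wording are illustrative assumptions, not a quote from Ng):

```python
def suggest_next_step(train_err, dev_err, target_err=0.0):
    """Toy version of the bias/variance diagnostic Andrew Ng popularized:
    the larger of the two gaps decides which family of fixes to try first."""
    avoidable_bias = train_err - target_err   # gap to achievable performance
    variance = dev_err - train_err            # gap between train and dev
    if avoidable_bias > variance:
        return "high bias: try a bigger model, train longer, or a new architecture"
    return "high variance: try more data, regularization, or a simpler model"

print(suggest_next_step(train_err=0.15, dev_err=0.16, target_err=0.02))
print(suggest_next_step(train_err=0.02, dev_err=0.12, target_err=0.01))
```

The first call flags bias (training error is far from the target), the second flags variance (dev error is far from training error), so the two cases lead to different fixes from the list above.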

The 10 Deep Learning Methods AI Practitioners Need to Apply

Interest in machine learning has exploded over the past decade. You see machine learning in computer science programs, industry conferences, and the Wall Street Journal almost daily. For all the talk about machine learning, many conflate what it can do with what they wish it could do. Fundamentally, machine learning is using algorithms to extract information from raw data and represent it in some type of model. We use this model to infer things about other data we have not yet modeled.

TensorFlow for Short-Term Stocks Prediction

In this post you will see an application of Convolutional Neural Networks to stock market prediction, using a combination of stock prices with sentiment analysis.
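The building block of such a model is a 1-D convolution sliding a filter over a window of prices. This sketch uses a hand-picked difference kernel (an illustrative assumption; in the post's CNN the filters are learned) so the resulting feature map is easy to read:

```python
import numpy as np

# Slide a 3-day filter over a toy price series. The kernel [-1, 0, 1]
# responds to rising windows: output[i] = prices[i+2] - prices[i].
prices = np.array([100.0, 101.0, 103.0, 102.0, 102.5, 105.0, 104.0])
kernel = np.array([-1.0, 0.0, 1.0])

# np.convolve flips its second argument, so pass kernel[::-1] to get
# the cross-correlation a CNN layer actually computes.
feature_map = np.convolve(prices, kernel[::-1], mode="valid")
print(feature_map)  # [ 3.   1.  -0.5  3.   1.5]
```

A CNN stacks many such learned filters (plus nonlinearities), but each one is still this windowed dot product over the series.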

Top Data Science and Machine Learning Methods Used in 2017

The most used methods are Regression, Clustering, Visualization, Decision Trees/Rules, and Random Forests; Deep Learning is used by only 20% of respondents; we also analyze which methods are most ‘industrial’ and most ‘academic’.

Robust Algorithms for Machine Learning

Machine learning is often held out as a magical solution to hard problems that will absolve us mere humans from ever having to actually learn anything. But in reality, for data scientists and machine learning engineers, there are a lot of problems that are much more difficult to deal with than simple object recognition in images or playing board games with finite rule sets. For the majority of these problems, it pays to have a variety of approaches to help you reduce the noise and anomalies, so you can focus on something more tractable. One approach is to design more robust algorithms, where the testing error is consistent with the training error, or the performance is stable after adding noise to the dataset. The idea of any traditional (non-Bayesian) statistical test is the same: we compute a number (called a ‘statistic’) from the data, and use the known distribution of that number to answer the question, ‘What are the odds of this happening by chance?’ That answer is the p-value.
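When the statistic's distribution isn't known in closed form, a permutation test makes the "odds of this happening by chance" question concrete: shuffle the labels many times and count how often the shuffled statistic is at least as extreme as the observed one. A minimal sketch (the difference-in-means statistic and toy data are illustrative choices):

```python
import numpy as np

def permutation_p_value(a, b, n_perm=2000, seed=0):
    """Two-sample permutation test. Statistic: absolute difference in
    means. p-value: fraction of label shuffles at least as extreme as
    the observed difference."""
    rng = np.random.default_rng(seed)
    observed = abs(a.mean() - b.mean())
    pooled = np.concatenate([a, b])
    count = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        diff = abs(pooled[: len(a)].mean() - pooled[len(a):].mean())
        count += diff >= observed
    return (count + 1) / (n_perm + 1)  # add-one correction keeps p > 0

rng = np.random.default_rng(1)
same = permutation_p_value(rng.normal(0, 1, 50), rng.normal(0, 1, 50))
shifted = permutation_p_value(rng.normal(0, 1, 50), rng.normal(1.5, 1, 50))
print(same, shifted)  # large p for identical groups, tiny p for shifted ones
```

Because the null distribution is built from the data itself, this is one of the more robust ways to get a p-value without distributional assumptions.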

Monitoring and Improving the Performance of Machine Learning Models

It’s critical to have “humans in the loop” when automating the deployment of machine learning (ML) models. Why? Because models often perform worse over time. This course covers the human-directed safeguards that prevent poorly performing models from deploying into production and the techniques for evaluating models over time. We’ll use ModelDB to capture the appropriate metrics that help you identify poorly performing models. We’ll review the many factors that affect model performance (e.g., changing users and user preferences, stale data) and the variables that lose predictive power. We’ll explain how to utilize classification and prediction scoring methods such as precision-recall, ROC, and Jaccard similarity. We’ll also show you how ModelDB allows you to track provenance and metrics for model performance and health; how to integrate ModelDB with SparkML; and how to use the ModelDB APIs to store information when training models in Spark ML. Learners should have basic familiarity with the following: Scala or Python; Hadoop, Spark, or Pandas; SBT or Maven; cloud platforms like Amazon Web Services; Bash, Docker, and REST.
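Two of those scoring methods fit in a few lines, which helps when comparing them. Precision, recall, and Jaccard similarity all come from the same confusion-matrix counts (the labels below are made-up toy data):

```python
def precision_recall_jaccard(y_true, y_pred):
    """Precision, recall, and Jaccard similarity for binary labels,
    all derived from true-positive / false-positive / false-negative counts."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp)
    recall = tp / (tp + fn)
    jaccard = tp / (tp + fp + fn)  # intersection over union of positives
    return precision, recall, jaccard

p, r, j = precision_recall_jaccard([1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(p, r, j)  # 0.666..., 0.666..., 0.5
```

Jaccard is the strictest of the three here: it penalizes both false positives and false negatives in one number, which makes it a useful single metric to track over time.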

Training and Exporting Machine Learning Models in Spark

Spark ML provides a rich set of tools and models for training, scoring, evaluating, and exporting machine learning models. This video walks you through each step in the process. You’ll explore the basics of Spark’s DataFrames, Transformer, Estimator, Pipeline, and Parameter, and how to utilize the Spark API to create model uniformity and comparability. You’ll learn how to create meaningful models and labels from a raw dataset; train and score a variety of models; target price predictions; compare results using MAE, MSE, and other scores; and employ the SparkML evaluator to automate the parameter-tuning process using cross validation. To complete the lesson, you’ll learn to export and serialize a Spark trained model as PMML (an industry standard for model serialization), so you can deploy in applications outside the Spark cluster environment.
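The score comparison step mentioned above rests on two simple formulas. A sketch of MAE and MSE on hypothetical price predictions (the numbers are made up, and in Spark you would use the built-in evaluators rather than hand-rolling these):

```python
def mae(y_true, y_pred):
    """Mean absolute error: average magnitude of the residuals."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)

def mse(y_true, y_pred):
    """Mean squared error: penalizes large residuals much more heavily."""
    return sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true)

actual = [200000.0, 350000.0, 120000.0]     # hypothetical target prices
predicted = [210000.0, 330000.0, 125000.0]  # hypothetical model output
print(mae(actual, predicted), mse(actual, predicted))
```

MAE stays in the units of the target (dollars here), while MSE squares them; that difference is why the two scores can rank the same pair of models differently when one model makes a few large errors.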

Deploying Machine Learning Models as Microservices Using Docker

Modern applications running in the cloud often rely on REST-based microservices architectures by using Docker containers. Docker enables your applications to communicate between one another and to compose and scale various components. Data scientists use these techniques to efficiently scale their machine learning models to production applications. This video teaches you how to deploy machine learning models behind a REST API—to serve low latency requests from applications—without using a Spark cluster. In the process, you’ll learn how to export models trained in SparkML; how to work with Docker, a convenient way to build, deploy, and ship application code for microservices; and how a model scoring service should support single on-demand predictions and bulk predictions. Learners should have basic familiarity with the following: Scala or Python; Hadoop, Spark, or Pandas; SBT or Maven; cloud platforms like Amazon Web Services; Bash, Docker, and REST.

Deploying Spark ML Pipelines in Production on AWS

Translating a Spark application from running in a local environment to running on a production cluster in the cloud requires several critical steps, including publishing artifacts, installing dependencies, and defining the steps in a pipeline. This video is a hands-on guide through the process of deploying your Spark ML pipelines in production. You’ll learn how to create a pipeline that supports model reproducibility—making your machine learning models more reliable—and how to update your pipeline incrementally as the underlying data change. Learners should have basic familiarity with the following: Scala or Python; Hadoop, Spark, or Pandas; SBT or Maven; Amazon Web Services such as S3, EMR, and EC2; Bash, Docker, and REST.

An Introduction to Machine Learning Models in Production

This course lays out the common architecture, infrastructure, and theoretical considerations for managing an enterprise machine learning (ML) model pipeline. Because automation is the key to effective operations, you’ll learn about open source tools like Spark, Hive, ModelDB, and Docker and how they’re used to bridge the gap between individual models and a reproducible pipeline. You’ll also learn how effective data teams operate; why they use a common process for building, training, deploying, and maintaining ML models; and how they’re able to seamlessly push models into production. The course is designed for the data engineer transitioning to the cloud and for the data scientist ready to use model deployment pipelines that are reproducible and automated. Learners should have basic familiarity with: cloud platforms like Amazon Web Services; Scala or Python; Hadoop, Spark, or Pandas; SBT or Maven; Bash, Docker, and REST.

GPU-accelerated TensorFlow on Kubernetes

Many workflows that utilize TensorFlow need GPUs to efficiently train models on image or video data. Yet these same workflows typically also involve multi-stage data pre-processing and post-processing, which might not need to run on GPUs. This mix of processing stages, illustrated in Figure 1, results in data science teams running things requiring CPUs in one system while trying to manage GPU resources separately by yelling across the office: “Hey, is anyone using the GPU machine?” A unified methodology is desperately needed for scheduling multi-stage workflows, managing data, and offloading certain portions of the workflows to GPUs.

Pipes in R Tutorial For Beginners

You might have already seen or used the pipe operator when working with packages such as dplyr and magrittr. But do you know where pipes and the famous %>% operator come from, what exactly they are, and how, when, and why you should use them? Can you also come up with some alternatives?
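The idea behind %>% is language-independent: it rewrites inside-out nested calls like f(g(x)) as a left-to-right chain, x %>% g %>% f. Python has no pipe operator, so this is only a conceptual stand-in (the `pipe` helper is an invented illustration, not a library function):

```python
from functools import reduce

def pipe(value, *funcs):
    """Conceptual stand-in for magrittr's %>%: feed `value` through
    `funcs` left to right, so pipe(x, g, f) reads like x %>% g %>% f
    instead of the inside-out f(g(x))."""
    return reduce(lambda acc, f: f(acc), funcs, value)

# Nested:  round(sum([1.5, 2.5, 3.25]), 1)
# Piped:   the same computation, read in execution order.
result = pipe([1.5, 2.5, 3.25], sum, lambda x: round(x, 1))
print(result)
```

The readability gain is the whole point: each step appears in the order it runs, which is what the tutorial's %>% examples demonstrate in R.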

R in the Windows Subsystem for Linux

R has been available for Windows since the very beginning, but if you have a Windows machine and want to use R within a Linux ecosystem, that’s easy to do with the new Fall Creators Update (version 1709). If you need access to the gcc toolchain for building R packages, or simply prefer the bash environment, it’s easy to get things up and running. Once you have things set up, you can launch a bash shell and run R at the terminal like you would in any Linux system. And that’s because this is a Linux system: the Windows Subsystem for Linux is a complete Linux distribution running within Windows. This page provides the details on installing Linux on Windows, but here are the basic steps you need and how to get the latest version of R up and running within it.

Introduction to Skewness

In previous posts here, here, and here, we spent quite a bit of time on portfolio volatility, using the standard deviation of returns as a proxy for volatility. Today we will begin a two-part series on additional statistics that aid our understanding of return dispersion: skewness and kurtosis. Beyond being fancy words and required vocabulary for CFA level 1, these two concepts are both important and fascinating for lovers of returns distributions. For today, we will focus on skewness. Skewness is the degree to which returns are asymmetric around the mean. Since a normal distribution is symmetric around the mean, skewness can be taken as one measure of how returns are not distributed normally. Why does skewness matter? If portfolio returns are right, or positively, skewed, it implies numerous small negative returns and a few large positive returns. If portfolio returns are left, or negatively, skewed, it implies numerous small positive returns and a few large negative returns. The phrase “large negative returns” should trigger Pavlovian sweating for investors, even if it’s preceded by a diminutive modifier like “just a few”. For a portfolio manager, a negatively skewed distribution of returns implies a portfolio at risk of rare but large losses. This makes us nervous and is a bit like saying, “I’m healthy, except for my occasional massive heart attack.” Let’s get to it.
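The post works in R, but the statistic itself is just the third standardized moment: the mean of the cubed z-scores. A sketch in Python (the return series below is a made-up illustration of the "many small gains, one large loss" profile):

```python
import numpy as np

def sample_skewness(returns):
    """Third standardized moment: mean of cubed z-scores. Negative values
    flag the left-skewed, rare-large-loss profile the post warns about."""
    r = np.asarray(returns, dtype=float)
    z = (r - r.mean()) / r.std()
    return (z ** 3).mean()

# Mostly small positive returns plus one large loss -> negative skew.
returns = [0.01, 0.012, 0.008, 0.011, 0.009, 0.013, -0.15]
print(sample_skewness(returns))  # clearly negative
```

Cubing preserves sign, so a few large losses below the mean dominate the sum and drag the statistic negative, which is exactly the asymmetry the standard deviation alone cannot see.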

A minimal Project Tree in R

The main ideas were:
•To ensure reproducibility within a stable working directory tree. She proposes the very concise here::here(), but other methods are available, such as the template or ProjectTemplate packages.
•To avoid wreaking havoc on others’ computers with rm(list = ls())!

Introduction to Computational Linguistics and Dependency Trees in data science

In recent years, the combination of deep learning fundamentals with Natural Language Processing techniques has brought great improvements in information mining tasks on unstructured text data. Models are now able to recognize natural language and speech at levels comparable to humans. Despite such improvements, discrepancies in the results still exist, as the information is sometimes encoded deep in the syntax and syntactic structure of the corpus.
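A dependency parse is usually stored as one head index per token (with 0 marking the root), and the tree structure follows from inverting that array. A minimal sketch (the sentence and its head indices are hand-made assumptions, not the output of a real parser):

```python
# One head index per token, 1-based; 0 marks the root.
tokens = ["She", "reads", "long", "books"]
heads = [2, 0, 4, 2]  # "She" -> "reads", "reads" -> ROOT,
                      # "long" -> "books", "books" -> "reads"

def children_of(heads):
    """Invert the head array into a child list per token (1-based)."""
    kids = {i: [] for i in range(len(heads) + 1)}
    for child, head in enumerate(heads, start=1):
        kids[head].append(child)
    return kids

tree = children_of(heads)
root = tree[0][0]                           # the token whose head is 0
print(tokens[root - 1])                     # reads
print([tokens[c - 1] for c in tree[root]])  # ['She', 'books']
```

Walking this tree (rather than the flat token sequence) is what lets downstream tasks reach information "coded deep in the syntax", e.g. recovering that "books", not "long", is the object of "reads".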

Artificial Intelligence and the Move Towards Preventive Healthcare

In this special guest feature, Waqaas Al-Siddiq, Founder and CEO of Biotricity, discusses how AI’s ability to crunch Big Data will play a key role in the healthcare industry’s shift toward preventive care. A physician’s ability to find the relevant data they need to make a diagnosis will be augmented by new AI-enhanced technologies. Waqaas, the founder of Biotricity, is a serial entrepreneur, a former investment advisor, and an expert in wireless communication technology. Academically, he was distinguished for his various innovative designs in digital, analog, embedded, and micro-electro-mechanical products. His work was published at various conferences, including IEEE venues and the National Communication Council. Waqaas has a dual Bachelor’s degree in Computer Engineering and Economics, a Master’s in Computer Engineering from Rochester Institute of Technology, and a Master’s in Business Administration from Henley Business School. He is completing his Doctorate in Business Administration at Henley, with a focus on Transformative Innovations and Billion Dollar Markets.