# Book Memo: “Machine Learning with R”

**17**
*Sunday*
Dec 2017

Posted Books


**17**
*Sunday*
Dec 2017

Posted What is ...

**OntoSeg**

Text segmentation (TS) aims at dividing long text into coherent segments which reflect the subtopic structure of the text. It benefits many natural language processing tasks, such as Information Retrieval (IR) and document summarisation. Current approaches to text segmentation are similar in that they all use word-frequency metrics to measure the similarity between two regions of text, so that a document is segmented based on the lexical cohesion between its words. Various NLP tasks are now moving towards the semantic web and ontologies, such as ontology-based IR systems, to capture the conceptualizations associated with user needs and contents. Text segmentation based on lexical cohesion between words is hence no longer sufficient for such tasks. This paper proposes OntoSeg, a novel approach to text segmentation based on the ontological similarity between text blocks. The proposed method uses ontological similarity to explore conceptual relations between text segments, and a Hierarchical Agglomerative Clustering (HAC) algorithm to represent the text as a conceptually structured, tree-like hierarchy. The rich structure of the created tree further allows the segmentation of text in a linear fashion at various levels of granularity. The proposed method was evaluated on a well-known dataset, and the results show that using ontological similarity in text segmentation is very promising. We also enhance the proposed method by combining ontological similarity with lexical similarity, which improves segmentation quality. …

**Contrastive-center Loss**

The deep convolutional neural network (CNN) has significantly raised the performance of image classification and face recognition. Softmax is usually used as supervision, but it only penalizes the classification loss. In this paper, we propose a novel auxiliary supervision signal called contrastive-center loss, which can further enhance the discriminative power of the features, for it learns a class center for each class. The proposed contrastive-center loss simultaneously considers intra-class compactness and inter-class separability, by penalizing the contrastive values between: (1) the distances of training samples to their corresponding class centers, and (2) the sum of the distances of training samples to their non-corresponding class centers. Experiments on different datasets demonstrate the effectiveness of contrastive-center loss. …

**Shared Learning Framework**

Deep Reinforcement Learning has been able to achieve amazing successes in a variety of domains, from video games to continuous control, by trying to maximize the cumulative reward. However, most of these successes rely on algorithms that require a large amount of data to train in order to obtain results on par with human-level performance. This is not feasible if we are to deploy these systems on real-world tasks, and hence there has been an increased thrust in exploring data-efficient algorithms. To this end, we propose the Shared Learning framework aimed at making $Q$-ensemble algorithms data-efficient. To achieve this, we look into some principles of transfer learning, which aims to study the benefits of information exchange across tasks in reinforcement learning, and adapt transfer to the learning of our value function estimates in a novel manner. In this paper, we consider the special case of transfer between the value function estimates in the $Q$-ensemble architecture of BootstrappedDQN. We further empirically demonstrate how our proposed framework can help in speeding up the learning process in $Q$-ensembles with minimal computational overhead on a suite of Atari 2600 games. …

**17**
*Sunday*
Dec 2017

Posted R Packages

*Steiner Tree Approach for Graph Analysis*

A set of graph functions to find Steiner trees on graphs. It provides tools for analysing Steiner tree applications on networks, and has applications in biological pathway network analysis (Sadeghi 2013) <doi:10.1186/1471-2105-14-144>.

Provides methods for organizing data in a hypercube (i.e. a multi-dimensional cube). Cubes are generated from molten data frames. Each cube can be manipulated with five operations: rotation (changeDimensionOrder()), dicing and slicing (add.selection(), remove.selection()), drilling down (add.aggregation()), and rolling up (remove.aggregation()).
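The four kinds of cube manipulation described above (rotate, dice/slice, drill down, roll up) are standard OLAP operations. As a language-neutral illustration (plain Python over molten records; the function names and data here are illustrative, not the package's API), dicing filters on a dimension's labels and rolling up aggregates a dimension away:

```python
from collections import defaultdict

# Molten records: one row per (year, region, product, value).
rows = [("2016", "EU", "A", 10), ("2016", "US", "A", 7),
        ("2017", "EU", "B", 5),  ("2017", "US", "A", 3)]

def dice(rows, dim, keep):
    """Dice/slice: keep only records whose `dim`-th label is in `keep`."""
    return [r for r in rows if r[dim] in keep]

def roll_up(rows, dim):
    """Roll up: aggregate the value over dimension `dim`."""
    agg = defaultdict(int)
    for r in rows:
        key = tuple(v for i, v in enumerate(r[:-1]) if i != dim)
        agg[key] += r[-1]
    return dict(agg)
```

Drilling down is the inverse of rolling up: it re-introduces a previously aggregated dimension.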

Fast binning of multiple variables using parallel processing. A summary of all the variables binned is generated, which provides the information value, entropy, an indicator of whether the variable follows a monotonic trend or not, etc. It supports rebinning of variables to force a monotonic trend as well as manual binning based on pre-specified cuts. The cut points of the bins are based on conditional inference trees as implemented in the partykit package. The conditional inference framework is described by Hothorn T, Hornik K, Zeileis A (2006) <doi:10.1198/106186006X133933>.
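The "information value" summary mentioned above is a standard weight-of-evidence statistic for a binned variable against a binary outcome. A generic sketch (not the package's implementation):

```python
import numpy as np

def information_value(x_bin, y):
    """Information value of one binned variable.
    x_bin: array of bin labels; y: binary outcome (1 = event).
    IV = sum over bins of (p_event - p_nonevent) * WoE,
    where WoE = log(p_event / p_nonevent) within the bin."""
    x_bin, y = np.asarray(x_bin), np.asarray(y)
    n_event, n_nonevent = (y == 1).sum(), (y == 0).sum()
    iv = 0.0
    for b in np.unique(x_bin):
        e = ((x_bin == b) & (y == 1)).sum() / n_event
        ne = ((x_bin == b) & (y == 0)).sum() / n_nonevent
        if e > 0 and ne > 0:          # skip empty cells
            iv += (e - ne) * np.log(e / ne)
    return iv
```

A bin assignment that carries no information about the outcome yields an IV of zero; more predictive binnings yield larger values.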

Implements projection pursuit forest algorithm for supervised classification.

Provides samplers for various matrix variate distributions: Wishart, inverse-Wishart, normal, t, inverted-t, Beta type I and Beta type II. Also allows simulation of the noncentral Wishart distribution without the integer restriction on the degrees of freedom.

Effect modification occurs if a treatment effect is larger or more stable in certain subgroups defined by observed covariates. The submax or subgroup-maximum method of Lee et al. (2017) <arXiv:1702.00525> does an overall test and separate tests in subgroups, correcting for multiple testing using the joint distribution.

**16**
*Saturday*
Dec 2017

Posted Documents

**Knowledge Transfer Between Artificial Intelligence Systems**

We consider the fundamental question: how could a legacy ‘student’ Artificial Intelligence (AI) system learn from a legacy ‘teacher’ AI system or a human expert without complete re-training and, most importantly, without requiring significant computational resources? Here ‘learning’ is understood as the ability of one system to mimic responses of the other and vice versa. We call such learning Artificial Intelligence knowledge transfer. We show that if the internal variables of the ‘student’ AI system have the structure of an $n$-dimensional topological vector space and $n$ is sufficiently high then, with probability close to one, the required knowledge transfer can be implemented by simple cascades of linear functionals. In particular, for $n$ sufficiently large, with probability close to one, the ‘student’ system can successfully and non-iteratively learn $k\ll n$ new examples from the ‘teacher’ (or correct the same number of mistakes) at the cost of two additional inner products. The concept is illustrated with an example of knowledge transfer from a pre-trained convolutional neural network to a simple linear classifier with HOG features.
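The high-dimensional intuition behind "one linear functional per correction" is that independent random vectors in high dimension are nearly orthogonal, so a single inner-product test fires on one target example and (with probability close to one) on nothing else. A toy sketch; the data, dimension, and threshold are illustrative assumptions, not the paper's setup:

```python
import numpy as np

rng = np.random.default_rng(1)
n = 200                        # dimension of the student's internal variables
X = rng.normal(size=(50, n))   # internal representations of 50 inputs
base_pred = np.zeros(50, int)  # pretend the student labels everything 0
x_new = X[7]                   # one "teacher" example the student gets wrong

# One-example corrector: a single linear functional that (w.h.p.) fires
# only on x_new, since unrelated high-dim vectors have near-zero overlap.
w = x_new / np.linalg.norm(x_new)
theta = 0.9 * np.linalg.norm(x_new)    # x_new itself scores ||x_new||
corrected = np.where(X @ w > theta, 1, base_pred)
```

Only the mistaken example changes label; every other prediction is untouched, and applying the corrector costs one extra inner product per input.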

**16**
*Saturday*
Dec 2017

Posted Magister Dixit

“Knowledge is some compilation of data that allows you to make decisions, and what we find today is that computers are making a lot of decisions automatically.” – Yann LeCun

**16**
*Saturday*
Dec 2017

Posted Books

**16**
*Saturday*
Dec 2017

Posted arXiv Papers

**The Effectiveness of Data Augmentation in Image Classification using Deep Learning**

In this paper, we explore and compare multiple solutions to the problem of data augmentation in image classification. Previous work has demonstrated the effectiveness of data augmentation through simple techniques, such as cropping, rotating, and flipping input images. We artificially constrain our access to data to a small subset of the ImageNet dataset, and compare each data augmentation technique in turn. One of the more successful data augmentation strategies is the set of traditional transformations mentioned above. We also experiment with GANs to generate images of different styles. Finally, we propose a method to allow a neural net to learn augmentations that best improve the classifier, which we call neural augmentation. We discuss the successes and shortcomings of this method on various datasets.
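The "traditional transformations" the abstract refers to (cropping, rotating, flipping) can be sketched directly with NumPy; the crop fraction and nearest-neighbour resize below are illustrative choices, not the paper's exact pipeline:

```python
import numpy as np

def augment(img, rng):
    """Apply one randomly chosen classical augmentation to a square image:
    horizontal flip, 90-degree rotation, or crop-and-resize."""
    choice = rng.integers(3)
    if choice == 0:
        return img[:, ::-1]          # horizontal flip
    if choice == 1:
        return np.rot90(img)         # 90-degree rotation
    h, w = img.shape[:2]
    top, left = rng.integers(h // 4 + 1), rng.integers(w // 4 + 1)
    crop = img[top:top + 3 * h // 4, left:left + 3 * w // 4]
    # nearest-neighbour resize back to the original shape
    ys = np.arange(h) * crop.shape[0] // h
    xs = np.arange(w) * crop.shape[1] // w
    return crop[ys][:, xs]
```

Each call returns an image of the original shape, so augmented samples can be fed to the classifier unchanged.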

**Ellipsoid Method for Linear Programming made simple**

In this paper, the ellipsoid method for linear programming is derived using only minimal knowledge of algebra and matrices. Unfortunately, most authors first describe the algorithm and only later prove its correctness, which requires a good knowledge of linear algebra.
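The algorithm itself is short: maintain an ellipsoid (center $x$, shape matrix $P$) known to contain the feasible set, and whenever the center violates a constraint, cut through the center and shrink. A minimal central-cut sketch for LP feasibility (find $x$ with $Ax \le b$), assuming a full-dimensional feasible set inside a ball of radius `R` and dimension $n \ge 2$:

```python
import numpy as np

def ellipsoid_feasible(A, b, R=10.0, max_iter=1000, tol=1e-7):
    """Find x with A @ x <= b via the central-cut ellipsoid method.
    Assumes n >= 2 and a full-dimensional feasible set in the ball ||x|| <= R."""
    n = A.shape[1]
    x = np.zeros(n)
    P = (R ** 2) * np.eye(n)             # initial ball of radius R
    for _ in range(max_iter):
        viol = A @ x - b
        i = np.argmax(viol)
        if viol[i] <= tol:
            return x                     # center is feasible
        a = A[i]                         # most violated constraint
        g = P @ a / np.sqrt(a @ P @ a)   # normalized step direction
        x = x - g / (n + 1)              # shift the center
        P = (n ** 2 / (n ** 2 - 1.0)) * (P - (2.0 / (n + 1)) * np.outer(g, g))
    return None
```

Each cut shrinks the ellipsoid's volume by a fixed factor, which is the whole correctness argument: the feasible region cannot be contained forever in a shrinking ellipsoid unless a feasible center is found.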

Recently there has been a dramatic increase in the performance of recognition systems due to the introduction of deep architectures for representation learning and classification. However, the mathematical reasons for this success remain elusive. This tutorial will review recent work that aims to provide a mathematical justification for several properties of deep networks, such as global optimality, geometric stability, and invariance of the learned representations.

**A short characterization of relative entropy**

We prove characterization theorems for relative entropy (also known as Kullback-Leibler divergence), q-logarithmic entropy (also known as Tsallis entropy), and q-logarithmic relative entropy. All three have been characterized axiomatically before, but we show that earlier proofs can be simplified considerably, at the same time relaxing some of the hypotheses.
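For concreteness, the three quantities being characterized can be computed directly from their standard definitions (a minimal sketch assuming strictly positive, normalized distributions):

```python
import numpy as np

def kl_divergence(p, q):
    """Relative entropy D(p || q) = sum_i p_i * log(p_i / q_i).
    Assumes strictly positive distributions."""
    p, q = np.asarray(p, float), np.asarray(q, float)
    return float(np.sum(p * np.log(p / q)))

def tsallis_entropy(p, q_param):
    """q-logarithmic (Tsallis) entropy S_q(p) = (1 - sum_i p_i^q) / (q - 1);
    recovers Shannon entropy in the limit q -> 1."""
    p = np.asarray(p, float)
    if q_param == 1.0:
        return float(-np.sum(p * np.log(p)))   # Shannon limit
    return float((1.0 - np.sum(p ** q_param)) / (q_param - 1.0))
```

The q → 1 limit can be checked numerically: Tsallis entropy at q close to 1 approaches the Shannon entropy of the same distribution.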

**Evolving Unsupervised Deep Neural Networks for Learning Meaningful Representations**

Deep Learning (DL) aims at learning \emph{meaningful representations}. A meaningful representation refers to one that gives rise to significant performance improvement in associated Machine Learning (ML) tasks when replacing the raw data as the input. However, optimal architecture design and model parameter estimation in DL algorithms are widely considered to be intractable. Evolutionary algorithms are much preferable for complex and non-convex problems due to their inherent characteristics of being gradient-free and insensitive to local optima. In this paper, we propose a computationally economical algorithm for evolving \emph{unsupervised deep neural networks} to efficiently learn \emph{meaningful representations}, which is very suitable in the current Big Data era where sufficient labeled data for training is often expensive to acquire. In the proposed algorithm, finding an appropriate architecture and the initialized parameter values for an ML task at hand is modeled by a computationally efficient gene encoding approach, which is employed to effectively model the task with a large number of parameters. In addition, a local search strategy is incorporated to facilitate exploitation for further improving the performance. Furthermore, a small proportion of labeled data is utilized during the evolutionary search to guarantee that the learnt representations are meaningful. The performance of the proposed algorithm has been thoroughly investigated over classification tasks. Specifically, the proposed algorithm consistently reaches a classification error rate on MNIST that is very promising against state-of-the-art unsupervised DL algorithms.

**MentorNet: Regularizing Very Deep Neural Networks on Corrupted Labels**

Recent studies have discovered that deep networks are capable of memorizing the entire data even when the labels are completely random. Since deep models are trained on big data where labels are often noisy, the ability to overfit noise can lead to poor performance. To overcome the overfitting on corrupted training data, we propose a novel technique to regularize deep networks in the data dimension. This is achieved by learning a neural network called MentorNet to supervise the training of the base network, namely, StudentNet. Our work is inspired by curriculum learning and advances the theory by learning a curriculum from data by neural networks. We demonstrate the efficacy of MentorNet on several benchmarks. Comprehensive experiments show that it is able to significantly improve the generalization performance of the state-of-the-art deep networks on corrupted training data.

**A Two-stage Online Monitoring Procedure for High-Dimensional Data Streams**

Advanced computing and data acquisition technologies have made possible the collection of high-dimensional data streams in many fields. Efficient online monitoring tools which can correctly identify any abnormal data stream for such data are highly sought after. However, most of the existing monitoring procedures directly apply the false discovery rate (FDR) controlling procedure to the data at each time point, and the FDR at each time point (the point-wise FDR) is either specified by users or determined by the in-control (IC) average run length (ARL). If the point-wise FDR is specified by users, the resulting procedure lacks control of the global FDR and keeps users in the dark in terms of the IC-ARL. If the point-wise FDR is determined by the IC-ARL, the resulting procedure does not give users the flexibility to choose the number of false alarms (Type-I errors) they can tolerate when identifying abnormal data streams, which often makes the procedure too conservative. To address those limitations, we propose a two-stage monitoring procedure that can control both the IC-ARL and Type-I errors at the levels specified by users. As a result, the proposed procedure allows users to choose not only how often they expect any false alarms when all data streams are IC, but also how many false alarms they can tolerate when identifying abnormal data streams. With this extra flexibility, our proposed two-stage monitoring procedure is shown in the simulation study and real data analysis to outperform the existing methods.
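The point-wise FDR control that the abstract repeatedly refers to is typically the Benjamini–Hochberg step-up procedure applied to the p-values of all streams at one time point. A minimal sketch of generic BH (not the paper's two-stage rule):

```python
import numpy as np

def benjamini_hochberg(pvals, alpha=0.05):
    """Benjamini-Hochberg step-up procedure.
    Returns a boolean mask of hypotheses rejected at FDR level alpha."""
    p = np.asarray(pvals, float)
    m = p.size
    order = np.argsort(p)
    thresh = alpha * np.arange(1, m + 1) / m       # i/m * alpha
    below = p[order] <= thresh
    k = np.max(np.nonzero(below)[0]) + 1 if below.any() else 0
    reject = np.zeros(m, bool)
    reject[order[:k]] = True                       # reject the k smallest
    return reject
```

The two-stage procedure described above wraps a rule of this kind so that both the IC-ARL and the tolerated number of Type-I errors are user-specified.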

**Relation Extraction : A Survey**

With the advent of the Internet, a large amount of digital text is generated every day in the form of news articles, research publications, blogs, question answering forums and social media. It is important to develop techniques for extracting information automatically from these documents, as a lot of important information is hidden within them. This extracted information can be used to improve access to and management of the knowledge hidden in large text corpora. Several applications such as Question Answering and Information Retrieval would benefit from this information. Entities like persons and organizations form the most basic unit of this information. Occurrences of entities in a sentence are often linked through well-defined relations; e.g., occurrences of a person and an organization in a sentence may be linked through relations such as employed at. The task of Relation Extraction (RE) is to identify such relations automatically. In this paper, we survey several important supervised, semi-supervised and unsupervised RE techniques. We also cover the paradigms of Open Information Extraction (OIE) and Distant Supervision. Finally, we describe some of the recent trends in RE techniques and possible future research directions. This survey would be useful for three kinds of readers: i) newcomers in the field who want to quickly learn about RE; ii) researchers who want to know how the various RE techniques evolved over time and what the possible future research directions are; and iii) practitioners who just need to know which RE technique works best in various settings.

**Point-wise Convolutional Neural Network**

Deep learning with 3D data such as reconstructed point clouds and CAD models has received great research interest recently. However, the capability of using point clouds with convolutional neural networks has so far not been fully explored. In this technical report, we present a convolutional neural network for semantic segmentation and object recognition with 3D point clouds. At the core of our network is point-wise convolution, a convolution operator that can be applied at each point of a point cloud. Our fully convolutional network design, while being simple to implement, can yield competitive accuracy in both semantic segmentation and object recognition tasks.
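The core idea of a convolution "applied at each point" is that, lacking a grid, neighbours are gathered by distance and weighted per distance bin. A deliberately simplified sketch of that idea (not the paper's exact operator; the binning scheme and weights here are illustrative):

```python
import numpy as np

def pointwise_conv(points, feats, weights, radius=1.0):
    """Toy point-wise convolution: for every point, neighbours within
    `radius` (including the point itself) are grouped into len(weights)
    distance bins; the output feature is the weighted sum of each bin's
    mean feature."""
    n_bins = len(weights)
    out = np.zeros_like(feats, dtype=float)
    for i, p in enumerate(points):
        dist = np.linalg.norm(points - p, axis=1)
        for b in range(n_bins):
            lo, hi = radius * b / n_bins, radius * (b + 1) / n_bins
            mask = (dist >= lo) & (dist < hi)
            if mask.any():
                out[i] += weights[b] * feats[mask].mean(axis=0)
    return out
```

Because the operator produces one output feature vector per input point, such layers can be stacked into a fully convolutional network over the raw point cloud.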

• Transfer Adversarial Hashing for Hamming Space Retrieval

• Balance and Frustration in Signed Networks under Different Contexts

• Stochastic Low-Rank Bandits

• Learning Disentangling and Fusing Networks for Face Completion Under Structured Occlusions

• Asymptotic properties of expansive Galton-Watson trees

• Sixty years of percolation

• Variance reduction via empirical variance minimization: convergence and complexity

• Everything You Always Wanted to Know About TREC RTS* (*But Were Afraid to Ask)

• Convex programming in optimal control and information theory

• Adaptation to criticality through organizational invariance in embodied agents

• Can Balloons Produce Li-Fi? A Disaster Management Perspective

• Stability Selection for Structured Variable Selection

• Energy-Efficient Non-Orthogonal Transmission under Reliability and Finite Blocklength Constraints

• Duality of optimization problems with gauge functions

• Interference Characterization in Downlink Li-Fi Optical Attocell Networks

• UV-GAN: Adversarial Facial UV Map Completion for Pose-invariant Face Recognition

• The Enhanced Hybrid MobileNet

• On Generalized Edge Corona Product of Graphs

• Calculus for directional limiting normal cones and subdifferentials

• Sensitivity of rough differential equations: an approach through the Omega lemma

• Differentiable lower bound for expected BLEU score

• A Quantum Extension of Variational Bayes Inference

• Regularization and Optimization strategies in Deep Convolutional Neural Network

• Efficient Computation of the Stochastic Behavior of Partial Sum Processes

• Bayesian graphical compositional regression for microbiome data

• Gorenstein liaison for toric ideals of graphs

• Limit theorems for the Multiplicative Binomial Distribution (MBD)

• On the critical threshold for continuum AB percolation

• Random permutations without macroscopic cycles

• Error Performance of Wireless Powered Cognitive Relay Networks with Interference Alignment

• On the Capacity of Wireless Powered Cognitive Relay Network with Interference Alignment

• Ergodic Capacity Analysis of Wireless Powered AF Relaying Systems over $α$-$μ$ Fading Channels

• Exponential convergence of testing error for stochastic gradient methods

• Self-normalized Cramer type moderate deviations for martingales

• Penalty Dual Decomposition Method For Nonsmooth Nonconvex Optimization

• Biggins’ Martingale Convergence for Branching Lévy Processes

• Approximation of Sojourn Times of Gaussian Processes

• Random non-Abelian circulant matrices. Spectrum of random convolution operators on large finite groups

• Multiple testing for outlier detection in functional data

• GMM-Based Synthetic Samples for Classification of Hyperspectral Images With Limited Training Data

• Creating New Language and Voice Components for the Updated MaryTTS Text-to-Speech Synthesis Platform

• A Multimodal Corpus of Expert Gaze and Behavior during Phonetic Segmentation Tasks

• Open data, open review and open dialogue in making social sciences plausible

• Generic Machine Learning Inference on Heterogenous Treatment Effects in Randomized Experiments

• A duality principle for a semi-linear model in micro-magnetism

• The Hyperbolic-type Point Process

• Explicit bounds for Lipschitz constant of solution to basic problem in calculus of variations

• Ballpark Crowdsourcing: The Wisdom of Rough Group Comparisons

• Explicit bounds for solutions to optimal control problems

• Symbol detection in online handwritten graphics using Faster R-CNN

• MaskLab: Instance Segmentation by Refining Object Detection with Semantic and Direction Features

• Optimal Stochastic Desencoring and Applications to Calibration of Market Models

• A Permutation Test on Complex Sample Data

• Self-Supervised Depth Learning for Urban Scene Understanding

• Rethinking Spatiotemporal Feature Learning For Video Understanding

• A User-Study on Online Adaptation of Neural Machine Translation to Human Post-Edits

• Active phase for activated random walks on $\mathbb{Z}^d$, $ d \geq 3$, with density less than one and arbitrary sleeping rate

• Rough Fuzzy Quadratic Minimum Spanning Tree Problem

• Spatial-temporal wind field prediction by Artificial Neural Networks

• A study of elliptic partial differential equations with jump diffusion coefficients

• A combinatorial description of the centralizer algebras connected to the Links-Gould Invariant

• Distance magic labelings of product graphs

• Geometric ergodicity for some space-time max-stable Markov chains

• Closing in on Time and Space Optimal Construction of Compressed Indexes

• Refuting the cavity-method threshold for random 3-SAT

• The Edge Universality of Correlated Matrices

• Performance Analysis of Approximate Message Passing for Distributed Compressed Sensing

• Approximate controllability for Navier–Stokes equations in $\mathrm{3D}$ rectangles under Lions boundary conditions

• Reasoning in Systems with Elements that Randomly Switch Characteristics

• FFT-Based Deep Learning Deployment in Embedded Systems

• Statistical physics on a product of trees

• Learning Objectives for Treatment Effect Estimation

• The trisection genus of standard simply connected PL 4-manifolds

• Multiplicative Convolution of Real Asymmetric and Real Antisymmetric Matrices

• Recognizing Linked Domain in Polynomial Time

• Tensor Sensing for RF Tomographic Imaging

• Combination Networks with or without Secrecy Constraints: The Impact of Caching Relays

• Localization of Extended Quantum Objects

• Real-time Egocentric Gesture Recognition on Mobile Head Mounted Displays

• Fractal dimension of interfaces in Edwards-Anderson spin glasses for up to six space dimensions

• An Improved Feedback Coding Scheme for the Wire-tap Channel

• Persistent Memory Programming Abstractions in Context of Concurrent Applications

• Predicting Station-level Hourly Demands in a Large-scale Bike-sharing Network: A Graph Convolutional Neural Network Approach

• The List Linear Arboricity of Graphs

• Permuted composition tableaux, 0-Hecke algebra and labeled binary trees

• QPTAS and Subexponential Algorithm for Maximum Clique on Disk Graphs

• Local False Discovery Rate Based Methods for Multiple Testing of One-Way Classified Hypotheses

• Learning Low-shot facial representations via 2D warping

• Deep Prior

• Lock-free B-slack trees: Highly Space Efficient B-trees

• Unsupervised Histopathology Image Synthesis

• Magnetotransport in a model of a disordered strange metal

• Parametrizations of $k$-Nonnegative Matrices: Cluster Algebras and $k$-Positivity Tests

• Reservation-Based Federated Scheduling for Parallel Real-Time Tasks

• Step bunching with both directions of the current: Vicinal W(110) surfaces versus atomistic scale model

• A Particle Swarm Optimization-based Flexible Convolutional Auto-Encoder for Image Classification

• Pediatric Bone Age Assessment Using Deep Convolutional Neural Networks

• Outcome Based Matching

• Statistical Inference in Fractional Poisson Ornstein-Uhlenbeck Process

• Neural networks catching up with finite differences in solving partial differential equations in higher dimensions

• Nonparametric Adaptive CUSUM Chart for Detecting Arbitrary Distributional Changes

• Quantum ergodicity in the SYK model

• Weakly Supervised Action Localization by Sparse Temporal Pooling Network

• Extreme 3D Face Reconstruction: Looking Past Occlusions

• Learning to Navigate by Growing Deep Networks

• Optimized Sampling for Multiscale Dynamics

• Learning Binary Residual Representations for Domain-specific Video Streaming

• DAMPE squib? Significance of the 1.4 TeV DAMPE excess

• The central limit theorem for the number of clusters of the Arratia flow

• The Sound and the Fury: Hiding Communications in Noisy Wireless Networks with Interference Uncertainty

• Range Queries in Non-blocking $k$-ary Search Trees

• Optimality Of Community Structure In Complex Networks

• Detection and Attention: Diagnosing Pulmonary Lung Cancer from CT by Imitating Physicians

• Corrigendum to ‘SPN graphs: when copositive $=$ SPN’

• Multi-appearance Segmentation and Extended 0-1 Program for Dense Small Object Tracking

• Passing the Brazilian OAB Exam: data preparation and some experiments

• An Enhanced Access Reservation Protocol with a Partial Preamble Transmission Mechanism in NB-IoT Systems

• Learning Compact Recurrent Neural Networks with Block-Term Tensor Decomposition

• A Statistical Model with Qualitative Input

• Queueing Analysis for Block Fading Rayleigh Channels in the Low SNR Regime

• Age of Information in Two-way Updating Systems Using Wireless Power Transfer

• Nonlinearity-tolerant 8D modulation formats by set-partitioning PDM-QPSK

• $\forall \exists \mathbb{R}$-completeness and area-universality

• Optimized Interface Diversity for Ultra-Reliable Low Latency Communication (URLLC)

• Fast robust correlation for high dimensional data

• Structural and computational results on platypus graphs

• Fluctuation Theorem and Thermodynamic Formalism

• Analysis of Latency and MAC-layer Performance for Class A LoRaWAN

• Rasa: Open Source Language Understanding and Dialogue Management

• Rate of Change Analysis for Interestingness Measures

• Towards Deep Modeling of Music Semantics using EEG Regularizers

• Semi-Automatic Algorithm for Breast MRI Lesion Segmentation Using Marker-Controlled Watershed Transformation

• Cellular Automata Applications in Shortest Path Problem

• Constrained BSDEs driven by a non quasi-left-continuous random measure and optimal control of PDMPs on bounded domains

• Approximation Algorithms for Replenishment Problems with Fixed Turnover Times

• Data Structures for Representing Symmetry in Quadratically Constrained Quadratic Programs

• Response of entanglement to annealed vis-à-vis quenched disorder in quantum spin models

• Isogeometric shape optimization for nonlinear ultrasound focusing

• Context-specific independencies for ordinal variables in chain regression models

• Robust Estimation of Similarity Transformation for Visual Object Tracking with Correlation Filters

• Generalized Degrees of Freedom of the Symmetric Cache-Aided MISO Broadcast Channel with Partial CSIT

• Intrinsic Point of Interest Discovery from Trajectory Data

• Image Super-resolution via Feature-augmented Random Forest

• Proximodistal Exploration in Motor Learning as an Emergent Property of Optimization

• The evaluation of geometric Asian power options under time changed mixed fractional Brownian motion

• Poisson brackets symmetry from the pentagon-wheel cocycle in the graph complex

• A Performance Evaluation of Local Features for Image Based 3D Reconstruction

• Strictly proper kernel scores and characteristic kernels on compact spaces

• A Bayesian Clearing Mechanism for Combinatorial Auctions

• Constraint and Mathematical Programming Models for Integrated Port Container Terminal Operations

• A quantum algorithm to train neural networks using low-depth circuits

• Quantifying over boolean announcements

• Prior Distributions for the Bradley-Terry Model of Paired Comparisons

• Deep CNN ensembles and suggestive annotations for infant brain MRI segmentation

• The effect of asymmetry of the coil block on self-assembly in ABC coil-rod-coil triblock copolymers

• Model comparison for Gibbs random fields using noisy reversible jump Markov chain Monte Carlo

• A Probability Monad as the Colimit of Finite Powers

• Analysis and calibration of a linear model for structured cell populations with unidirectional motion : Application to the morphogenesis of ovarian follicles

• Monotonic Chunkwise Attention

• Equilibria in the Tangle

• Partisan gerrymandering with geographically compact districts

• Systems of BSDEs with oblique reflection and related optimal switching problem

• swordfish: Efficient Forecasting of New Physics Searches without Monte Carlo

**16**
*Saturday*
Dec 2017

Posted What is ...

**Stacked Kernel Network (SKN)**

Kernel methods are powerful tools to capture nonlinear patterns behind data. They implicitly learn high- (even infinite-) dimensional nonlinear features in the Reproducing Kernel Hilbert Space (RKHS) while making the computation tractable by leveraging the kernel trick. Classic kernel methods learn a single layer of nonlinear features, whose representational power may be limited. Motivated by the recent success of deep neural networks (DNNs) that learn multi-layer hierarchical representations, we propose a Stacked Kernel Network (SKN) that learns a hierarchy of RKHS-based nonlinear features. SKN interleaves several layers of nonlinear transformations (from a linear space to a RKHS) and linear transformations (from a RKHS to a linear space). Similar to DNNs, a SKN is composed of multiple layers of hidden units, but each is parameterized by a RKHS function rather than a finite-dimensional vector. We propose three ways to represent the RKHS functions in SKN: (1) nonparametric representation, (2) parametric representation and (3) random Fourier feature representation. Furthermore, we expand SKN into a CNN architecture called Stacked Kernel Convolutional Network (SKCN). SKCN learns a hierarchy of RKHS-based nonlinear features by convolutional operations, with each filter also parameterized by a RKHS function rather than a finite-dimensional matrix as in a CNN, which makes it suitable for image inputs. Experiments on various datasets demonstrate the effectiveness of SKN and SKCN, which outperform competitive methods. …

**TauCharts**

JavaScript charts with a focus on data, design and flexibility. A free, open-source, D3.js-based library. TauCharts is a data-focused charting library; our goal is to help people build complex interactive visualizations easily.

Achieve Charting Zen With TauCharts …

**AOGParsing Operator**

This paper presents a method of learning qualitatively interpretable models in object detection using popular two-stage region-based ConvNet detection systems (i.e., R-CNN). R-CNN consists of a region proposal network and a RoI (Region-of-Interest) prediction network. By interpretable models, we focus on weakly-supervised extractive rationale generation, that is, learning to unfold latent discriminative part configurations of object instances automatically and simultaneously in detection, without using any supervision for part configurations. We utilize a top-down hierarchical and compositional grammar model embedded in a directed acyclic AND-OR Graph (AOG) to explore and unfold the space of latent part configurations of RoIs. We propose an AOGParsing operator to substitute the RoIPooling operator widely used in R-CNN, so the proposed method is applicable to many state-of-the-art ConvNet-based detection systems. The AOGParsing operator aims to harness both the explainable rigor of top-down hierarchical and compositional grammar models and the discriminative power of bottom-up deep neural networks through end-to-end training. In detection, a bounding box is interpreted by the best parse tree derived from the AOG on-the-fly, which is treated as the extractive rationale generated for interpreting detection. In learning, we propose a folding-unfolding method to train the AOG and ConvNet end-to-end. In experiments, we build on top of R-FCN and test the proposed method on the PASCAL VOC 2007 and 2012 datasets with performance comparable to state-of-the-art methods. …

**16**
*Saturday*
Dec 2017

Posted Distilled News

**Dummy Variable for Examining Structural Instability in Regression: An Alternative to Chow Test**

One of the fast-growing economies in the era of globalization is the Ethiopian economy. Among the lower-income group countries, it has emerged as one of the rare countries to achieve a double-digit growth rate in Gross Domestic Product (GDP). However, there is a great deal of debate regarding the double-digit growth rate, especially during the recent global recession period. So, it becomes a question of empirical research whether there is a structural change in the relationship between the GDP of Ethiopia and the regressor (time). How do we find out whether a structural change has in fact occurred? To answer this question, we consider the GDP of Ethiopia (measured in constant 2010 US$) over the period 1981 to 2015. Like many other countries in the world, Ethiopia adopted a policy of regulated globalization during the early nineties of the last century. So, our aim is to examine whether the GDP of Ethiopia has undergone any structural changes following the major policy shift due to the adoption of the globalization policy. To answer this question, we have two options in statistical and econometric research. The most important classes of tests on structural change are the tests from the generalized fluctuation test framework (Kuan and Hornik, 1995) on the one hand and tests based on F statistics (Hansen, 1992; Andrews, 1993; Andrews and Ploberger, 1994) on the other. The first class includes in particular the CUSUM and MOSUM tests and the fluctuation test, while the Chow and the supF tests belong to the latter. A topic that has gained more interest rather recently is monitoring structural change, i.e., starting after a history phase (without structural changes) to analyze new observations, so as to detect a structural change as soon after its occurrence as possible.
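The dummy-variable alternative to the Chow test interacts a break indicator with the regressor, so intercept and slope shifts are estimated in one regression: y = b0 + b1·t + b2·D + b3·(D·t) + u, where D = 1 after the candidate break. A sketch on simulated data (the break date, coefficients, and noise are illustrative assumptions, not Ethiopia's actual GDP series):

```python
import numpy as np

rng = np.random.default_rng(0)
t = np.arange(35, dtype=float)        # e.g. 35 annual observations, 1981-2015
D = (t >= 12).astype(float)           # dummy: 1 after the assumed break year

# Simulated series with a true intercept shift (+1.0) and slope shift (+0.8)
y = 2.0 + 0.5 * t + D * (1.0 + 0.8 * t) + rng.normal(0, 0.1, t.size)

# OLS on [1, t, D, D*t]: b2 and b3 capture the structural change
X = np.column_stack([np.ones_like(t), t, D, D * t])
beta, *_ = np.linalg.lstsq(X, y, rcond=None)
```

Testing b2 = b3 = 0 jointly reproduces the Chow test, while the individual t-tests show whether the break is in the level, the trend, or both.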

**Exploring data with pandas and MapD using Apache Arrow**

At MapD, we’ve long been big fans of the PyData stack, and are constantly working on ways for our open source GPU-accelerated analytic SQL engine to play nicely with the terrific tools in the most popular stack that supports open data science. We are founding collaborators of GOAI (the GPU Open Analytics Initiative), working with the awesome folks at Anaconda and H2O.ai, and our friends at NVIDIA. In GOAI, we use Apache Arrow to mediate efficient, high-performance data interchange for analytics and AI workflows. A big reason for doing this is to make MapD itself easily accessible to Python tools. For starters, this means supporting modern Python database interfaces like DBAPI. pymapd (built with help from Continuum) is a pythonic interface to MapD’s SQL engine supporting DBAPI 2.0, and it has some extra goodness in being able to use our in-built Arrow support for both data loading and query result output.

**The Line Between Commercial and Industrial Data Science**

The purpose, tasks, and required skillsets are dramatically different for data scientists and their work in commercial and industrial environments.

**10 Surprising Ways Machine Learning is Being Used Today**

1. Predicting whether a criminal defendant is a flight risk.

2. Using Twitter to diagnose psychopathy.

3. Helping cyclists win the Tour de France.

4. Identifying endangered whales.

5. Translating legalese.

6. Preventing money laundering.

7. Figuring out which message board threads will be closed.

8. Predicting hospital wait times.

9. Calculating auction prices.

10. Predicting earthquakes.

**How to Improve my ML Algorithm? Lessons from Andrew Ng’s experience**

You have worked for weeks on building your machine learning system, and you are not satisfied with its performance. You think of multiple ways to improve it: collect more data, add more hidden units, add more layers, change the network architecture, change the basic algorithm, etc. But which of these will give the best improvement on your system? You can try them all and invest a lot of time finding out what works for you, or you can use the following tips from Ng’s experience.
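The core of Ng’s advice is a bias/variance diagnostic: compare human-level, training, and dev-set error, and let the larger gap choose the fix. A minimal sketch of that decision rule (the thresholds and suggested remedies here are illustrative summaries, not quotes from the article):

```python
def diagnose(human_err, train_err, dev_err):
    """Pick the next improvement to try from the error gaps."""
    avoidable_bias = train_err - human_err   # gap to human-level performance
    variance = dev_err - train_err           # gap between train and dev sets
    if avoidable_bias >= variance:
        return "high bias: try a bigger model, longer training, or a new architecture"
    return "high variance: try more data, regularization, or early stopping"

print(diagnose(human_err=0.01, train_err=0.08, dev_err=0.10))  # bias-dominated
print(diagnose(human_err=0.01, train_err=0.02, dev_err=0.12))  # variance-dominated
```

The point is orthogonalization: each gap suggests a different class of remedy, so you avoid spending weeks on a fix that targets the wrong gap.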

**The 10 Deep Learning Methods AI Practitioners Need to Apply**

Interest in machine learning has exploded over the past decade. You see machine learning in computer science programs, industry conferences, and the Wall Street Journal almost daily. For all the talk about machine learning, many conflate what it can do with what they wish it could do. Fundamentally, machine learning is using algorithms to extract information from raw data and represent it in some type of model. We use this model to infer things about other data we have not yet modeled.

**TensorFlow for Short-Term Stocks Prediction**

In this post you will see an application of Convolutional Neural Networks to stock market prediction, using a combination of stock prices with sentiment analysis.

**Top Data Science and Machine Learning Methods Used in 2017**

The most used methods are Regression, Clustering, Visualization, Decision Trees/Rules, and Random Forests; Deep Learning is used by only 20% of respondents; we also analyze which methods are most ‘industrial’ and most ‘academic’.

**Robust Algorithms for Machine Learning**

Machine learning is often held out as a magical solution to hard problems that will absolve us mere humans from ever having to actually learn anything. But in reality, for data scientists and machine learning engineers, there are a lot of problems that are much more difficult to deal with than simple object recognition in images, or playing board games with finite rule sets. For the majority of problems, it pays to have a variety of approaches to help you reduce the noise and anomalies, to focus on something more tractable. One approach is to design more robust algorithms where the testing error is consistent with the training error, or the performance is stable after adding noise to the dataset. The idea of any traditional (non-Bayesian) statistical test is the same: we compute a number (called a ‘statistic’) from the data, and use the known distribution of that number to answer the question, ‘What are the odds of this happening by chance?’ The answer to that question is the p-value.
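The statistic-plus-distribution idea can be made concrete, and robust, with a permutation test: instead of assuming a known distribution, shuffle the group labels many times and ask how often the shuffled statistic is at least as extreme as the observed one. The data below are made up for illustration:

```python
import random

def permutation_p_value(a, b, n_perm=10000, seed=0):
    """Two-sample permutation test on the absolute difference in means."""
    rng = random.Random(seed)
    observed = abs(sum(a) / len(a) - sum(b) / len(b))
    pooled = a + b
    hits = 0
    for _ in range(n_perm):
        rng.shuffle(pooled)
        pa, pb = pooled[:len(a)], pooled[len(a):]
        if abs(sum(pa) / len(pa) - sum(pb) / len(pb)) >= observed:
            hits += 1
    return hits / n_perm  # fraction of shuffles as extreme as the data

a = [2.1, 2.5, 2.2, 2.8, 2.6]   # made-up group measurements
b = [1.2, 1.4, 1.1, 1.5, 1.3]
p = permutation_p_value(a, b)
print(p)  # small: the gap is unlikely to be chance
```

Because the null distribution is built from the data itself, the test makes no normality assumption, which is one simple way to trade a little power for robustness.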

**Monitoring and Improving the Performance of Machine Learning Models**

It’s critical to have “humans in the loop” when automating the deployment of machine learning (ML) models. Why? Because models often perform worse over time. This course covers the human-directed safeguards that prevent poorly performing models from deploying into production and the techniques for evaluating models over time. We’ll use ModelDB to capture the appropriate metrics that help you identify poorly performing models. We’ll review the many factors that affect model performance (e.g., changing users and user preferences, stale data) and the variables that lose predictive power. We’ll explain how to utilize classification and prediction scoring methods such as precision-recall, ROC, and Jaccard similarity. We’ll also show you how ModelDB allows you to track provenance and metrics for model performance and health; how to integrate ModelDB with SparkML; and how to use the ModelDB APIs to store information when training models in Spark ML. Learners should have basic familiarity with the following: Scala or Python; Hadoop, Spark, or Pandas; SBT or Maven; cloud platforms like Amazon Web Services; Bash, Docker, and REST.
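The scoring methods the course names (precision, recall, Jaccard similarity) all fall out of the same confusion-matrix counts. This sketch is independent of ModelDB and Spark; it only pins down the definitions:

```python
def binary_scores(y_true, y_pred):
    """Precision, recall, and Jaccard similarity from binary labels."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    return {
        "precision": tp / (tp + fp),
        "recall": tp / (tp + fn),
        "jaccard": tp / (tp + fp + fn),  # intersection over union of positives
    }

scores = binary_scores([1, 1, 0, 1, 0, 0], [1, 0, 0, 1, 1, 0])
print(scores)
```

Tracking these per model version over time is exactly the kind of metric a registry like ModelDB is meant to capture, so degradation shows up as a trend rather than a surprise.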

**Training and Exporting Machine Learning Models in Spark**

Spark ML provides a rich set of tools and models for training, scoring, evaluating, and exporting machine learning models. This video walks you through each step in the process. You’ll explore the basics of Spark’s DataFrames, Transformer, Estimator, Pipeline, and Parameter, and how to utilize the Spark API to create model uniformity and comparability. You’ll learn how to create meaningful models and labels from a raw dataset; train and score a variety of models; target price predictions; compare results using MAE, MSE, and other scores; and employ the SparkML evaluator to automate the parameter-tuning process using cross validation. To complete the lesson, you’ll learn to export and serialize a Spark trained model as PMML (an industry standard for model serialization), so you can deploy in applications outside the Spark cluster environment.

**Deploying Machine Learning Models as Microservices Using Docker**

Modern applications running in the cloud often rely on REST-based microservices architectures by using Docker containers. Docker enables your applications to communicate between one another and to compose and scale various components. Data scientists use these techniques to efficiently scale their machine learning models to production applications. This video teaches you how to deploy machine learning models behind a REST API—to serve low latency requests from applications—without using a Spark cluster. In the process, you’ll learn how to export models trained in SparkML; how to work with Docker, a convenient way to build, deploy, and ship application code for microservices; and how a model scoring service should support single on-demand predictions and bulk predictions. Learners should have basic familiarity with the following: Scala or Python; Hadoop, Spark, or Pandas; SBT or Maven; cloud platforms like Amazon Web Services; Bash, Docker, and REST.

**Deploying Spark ML Pipelines in Production on AWS**

Translating a Spark application from running in a local environment to running on a production cluster in the cloud requires several critical steps, including publishing artifacts, installing dependencies, and defining the steps in a pipeline. This video is a hands-on guide through the process of deploying your Spark ML pipelines in production. You’ll learn how to create a pipeline that supports model reproducibility—making your machine learning models more reliable—and how to update your pipeline incrementally as the underlying data change. Learners should have basic familiarity with the following: Scala or Python; Hadoop, Spark, or Pandas; SBT or Maven; Amazon Web Services such as S3, EMR, and EC2; Bash, Docker, and REST.

**An Introduction to Machine Learning Models in Production**

This course lays out the common architecture, infrastructure, and theoretical considerations for managing an enterprise machine learning (ML) model pipeline. Because automation is the key to effective operations, you’ll learn about open source tools like Spark, Hive, ModelDB, and Docker and how they’re used to bridge the gap between individual models and a reproducible pipeline. You’ll also learn how effective data teams operate; why they use a common process for building, training, deploying, and maintaining ML models; and how they’re able to seamlessly push models into production. The course is designed for the data engineer transitioning to the cloud and for the data scientist ready to use model deployment pipelines that are reproducible and automated. Learners should have basic familiarity with: cloud platforms like Amazon Web Services; Scala or Python; Hadoop, Spark, or Pandas; SBT or Maven; Bash, Docker, and REST.

**GPU-accelerated TensorFlow on Kubernetes**

Many workflows that utilize TensorFlow need GPUs to efficiently train models on image or video data. Yet these same workflows typically also involve multi-stage data pre-processing and post-processing, which might not need to run on GPUs. This mix of processing stages, illustrated in Figure 1, results in data science teams running things requiring CPUs in one system while trying to manage GPU resources separately by yelling across the office: “Hey, is anyone using the GPU machine?” A unified methodology is desperately needed for scheduling multi-stage workflows, managing data, and offloading certain portions of the workflows to GPUs.

**Pipes in R Tutorial For Beginners**

You might have already seen or used the pipe operator when working with packages such as dplyr and magrittr. But do you know where pipes and the famous %>% operator come from, what exactly they are, or how, when, and why you should use them? Can you also come up with some alternatives?
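The essence of %>% is that it takes the value on its left and feeds it as the first argument of the call on its right, so `x %>% f() %>% g()` reads top-to-bottom instead of inside-out as `g(f(x))`. A rough Python analogue of that threading (the `pipe` helper below is our own illustration, not part of R or magrittr):

```python
from functools import reduce

def pipe(value, *funcs):
    """Thread `value` through each function left to right, like x %>% f %>% g."""
    return reduce(lambda acc, f: f(acc), funcs, value)

# Inside-out: sorted, then reversed, then take the first element.
nested = sorted([4, 1, 3])[::-1][0]

# Piped: same steps, read in execution order.
result = pipe([4, 1, 3], sorted, lambda xs: xs[::-1], lambda xs: xs[0])
print(result)  # 4
```

The piped form makes long transformation chains readable in the order they happen, which is exactly the readability argument the tutorial makes for %>% in R.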

**R in the Windows Subsystem for Linux**

R has been available for Windows since the very beginning, but if you have a Windows machine and want to use R within a Linux ecosystem, that’s easy to do with the new Fall Creators Update (version 1709). If you need access to the gcc toolchain for building R packages, or simply prefer the bash environment, it’s easy to get things up and running. Once you have things set up, you can launch a bash shell and run R at the terminal like you would in any Linux system. And that’s because this is a Linux system: the Windows Subsystem for Linux is a complete Linux distribution running within Windows. This page provides the details on installing Linux on Windows, but here are the basic steps you need and how to get the latest version of R up and running within it.

In previous posts here, here, and here, we spent quite a bit of time on portfolio volatility, using the standard deviation of returns as a proxy for volatility. Today we will begin a two-part series on additional statistics that aid our understanding of return dispersion: skewness and kurtosis. Beyond being fancy words and required vocabulary for CFA level 1, these two concepts are both important and fascinating for lovers of returns distributions. For today, we will focus on skewness. Skewness is the degree to which returns are asymmetric around the mean. Since a normal distribution is symmetric around the mean, skewness can be taken as one measure of how returns are not distributed normally. Why does skewness matter? If portfolio returns are right, or positively, skewed, it implies numerous small negative returns and a few large positive returns. If portfolio returns are left, or negatively, skewed, it implies numerous small positive returns and a few large negative returns. The phrase “large negative returns” should trigger Pavlovian sweating for investors, even if it’s preceded by a diminutive modifier like “just a few”. For a portfolio manager, a negatively skewed distribution of returns implies a portfolio at risk of rare but large losses. This makes us nervous and is a bit like saying, “I’m healthy, except for my occasional massive heart attack.” Let’s get to it.
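Concretely, skewness is the standardized third central moment of the return series. A minimal sketch on a made-up return series (this is the population version, g1 = m3 / m2^(3/2); real packages apply small-sample bias corrections on top of this):

```python
def skewness(xs):
    """Population skewness: third central moment over variance^(3/2)."""
    n = len(xs)
    mu = sum(xs) / n
    m2 = sum((x - mu) ** 2 for x in xs) / n   # variance
    m3 = sum((x - mu) ** 3 for x in xs) / n   # third central moment
    return m3 / m2 ** 1.5

# Many small gains punctuated by one big loss: the pattern the post warns about.
left_tail = [0.01, 0.012, 0.011, 0.009, -0.08]
print(round(skewness(left_tail), 3))  # negative: left-skewed returns
```

Cubing the deviations is what makes the measure directional: the one large negative deviation dominates the sum, flagging exactly the rare-but-large-loss profile described above.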

The main ideas were:

•To ensure reproducibility within a stable working directory tree. She proposes the very concise here::here(), but other methods are available, such as the template or ProjectTemplate packages.

•To avoid wreaking havoc on others’ computers with rm(list = ls()).

**Introduction to Computational Linguistics and Dependency Trees in data science**

In recent years, the combination of deep learning fundamentals with Natural Language Processing techniques has shown great improvement in information-mining tasks on unstructured text data. The models are now able to recognize natural language and speech at levels comparable to humans. Despite such improvements, discrepancies in the results still exist, as sometimes the information is encoded very deep in the syntax and syntactic structure of the corpus.

**Artificial Intelligence and the Move Towards Preventive Healthcare**

In this special guest feature, Waqaas Al-Siddiq, Founder and CEO of Biotricity, discusses how AI’s ability to crunch Big Data will play a key role in the healthcare industry’s shift toward preventive care. A physician’s ability to find the relevant data they need to make a diagnosis will be augmented by new AI-enhanced technologies. Waqaas, the founder of Biotricity, is a serial entrepreneur, a former investment advisor, and an expert in wireless communication technology. Academically, he was distinguished for his various innovative designs in digital, analog, embedded, and micro-electro-mechanical products. His work was published in various conferences such as IEEE and the National Communication Council. Waqaas has a dual Bachelor’s degree in Computer Engineering and Economics, a Master’s in Computer Engineering from Rochester Institute of Technology, and a Master’s in Business Administration from Henley Business School. He is completing his Doctorate in Business Administration at Henley, with a focus on Transformative Innovations and Billion Dollar Markets.

**15**
*Friday*
Dec 2017

Posted Magister Dixit

“Digital leaders know their data. They convert their information into actionable business insight. Considering that more data is shared online every second today than was stored in the entire Internet 20 years ago, it’s no wonder that differentiating products and services requires advanced tools.” Mark Barrenechea (September 11, 2015)