Distilled News

Automatic machine learning for data scientists

JustML provides automatic machine learning model selection, training and deployment in the cloud.

Data Exchange and Marketplace, a New Business Model in Making.

The Internet of Things (IoT), also known as the Internet of Objects, refers to the networked interconnection of numerous everyday physical devices (20 billion by 2020, according to Gartner). Such devices will be an integral part of next-generation computing, and they will produce astronomical data volumes, catapulting us into the world of zettabytes and yottabytes. Data is the new oil: for some organizations it is a byproduct of operations, while for others the same data can be a catalyst to capture new insights, build AI models and drive innovation.

R 3.5.0 is released

The build system rolled up R-3.5.0.tar.gz (codename ‘Joy in Playing’) this morning. The list below details the changes in this release. You can get the source code from http://…/R-3.5.0.tar.gz or wait for it to be mirrored at a CRAN site nearer to you. Binaries for various platforms will appear in due course.

An Introduction to Greta

I was surprised by greta. I had assumed that the tensorflow and reticulate packages would eventually enable R developers to look beyond deep learning applications and exploit the TensorFlow platform to create all manner of production-grade statistical applications. But I wasn’t thinking Bayesian. After all, Stan is probably everything a Bayesian modeler could want. Stan is a powerful, production-level probability distribution modeling engine with a slick R interface, deep documentation, and a dedicated development team. But greta lets users write TensorFlow-based Bayesian models directly in R! What could be more charming? greta removes the barrier of learning an intermediate modeling language while still promising to deliver high-performance MCMC models that run anywhere TensorFlow can go.

Absolute and Weighted Frequency of Words in Text

In this tutorial, you’ll learn about absolute and weighted word frequency in text mining and how to calculate it with defaultdict and pandas DataFrames.
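The idea can be sketched in a few lines, using `defaultdict` and a pandas DataFrame as the tutorial does (the two-document corpus below is made up for illustration):

```python
from collections import defaultdict

import pandas as pd

# Hypothetical two-document corpus for illustration.
docs = [
    "the cat sat on the mat",
    "the dog sat on the log",
]

# Absolute frequency: raw token counts accumulated in a defaultdict.
abs_freq = defaultdict(int)
for doc in docs:
    for word in doc.lower().split():
        abs_freq[word] += 1

# Weighted frequency: counts normalised by the corpus size, so the
# numbers stay comparable across corpora of different lengths.
freq = pd.DataFrame({"abs_freq": abs_freq}).sort_values(
    "abs_freq", ascending=False
)
freq["wtd_freq"] = freq["abs_freq"] / freq["abs_freq"].sum()
```

Sorting by absolute frequency puts the most common tokens first; the weighted column sums to 1 by construction.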

Qualitative before Quantitative: How Qualitative Methods Support Better Data Science

Have you ever been embarrassed by the first iteration of one of your machine learning projects, where you didn’t include obvious and important features? In the practical hustle and bustle of trying to build models, we can often forget about the observation step in the scientific method and jump straight to hypothesis testing.

Swiftapply – automatically efficient pandas apply operations

Easily apply any function to a pandas dataframe in the fastest available manner. Time is precious. There is absolutely no reason to waste it waiting for your function to be applied to your pandas series (1 column) or dataframe (>1 columns). Don’t get me wrong, pandas is an amazing tool for Python users, and a majority of the time pandas operations are very quick. Here, I wish to take the pandas apply function under close inspection. This function is incredibly useful, because it lets you easily apply any function that you’ve specified to your pandas series or dataframe. But there is a cost: the apply function essentially acts as a for loop, and a slow one at that, processing your function as a linear O(n) operation.
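The cost of that hidden loop is easy to see in a small sketch (the column name and function below are made up): the row-wise apply and the vectorized expression compute the same result, but apply pays a Python function call per element while the vectorized form runs as one NumPy expression:

```python
import numpy as np
import pandas as pd

# Toy frame; the column name and the function are illustrative.
df = pd.DataFrame({"x": np.arange(10_000, dtype=float)})

# apply(): effectively a Python-level for loop, one call per element.
slow = df["x"].apply(lambda v: v ** 2 + 1)

# Vectorized equivalent: one NumPy expression over the whole column,
# typically orders of magnitude faster for numeric work.
fast = df["x"] ** 2 + 1

assert slow.equals(fast)
```

Timing the two with `%timeit` (or `time.perf_counter`) makes the gap concrete; the vectorized path is what tools like swiftapply try to reach automatically.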

R Packages worth a look

Graphical Tools of Histogram PCA (GraphPCA)
Histogram principal component analysis is a generalization of PCA to histogram-valued data, which is well suited to summarizing complex and big data by using histograms as variables. The functions implemented provide numerical and graphical tools for this extension of PCA. Sun Makosso Kallyth (2016) <doi:10.1002/sam.11270>. Sun Makosso Kallyth and Edwin Diday (2012) <doi:10.1007/s11634-012-0108-0>.

Convert Plot to ‘grob’ or ‘ggplot’ Object (ggplotify)
Converts a plot function call (using an expression or formula) to a ‘grob’ or ‘ggplot’ object that is compatible with the ‘grid’ and ‘ggplot2’ ecosystems. With this package, one can, for example, use ‘cowplot’ to align plots produced by ‘base’ graphics, ‘grid’, ‘lattice’, ‘vcd’ etc. by converting them to ‘ggplot’ objects.

Linear Model with Tree-Based Lasso Regularization for Rare Features (rare)
Implementation of an alternating direction method of multipliers algorithm for fitting a linear model with tree-based lasso regularization, which is proposed in Algorithm 1 of Yan and Bien (2018) <arXiv:1803.06675>. The package allows efficient model fitting on the entire 2-dimensional regularization path for large datasets. The complete set of functions also makes the entire process of tuning regularization parameters and visualizing results hassle-free.

Tools, Measures and Statistical Tests for Cultural Evolution (cultevo)
Provides tools for measuring the compositionality of signalling systems (in particular the information-theoretic measure due to Spike (2016) <http://…/25930> and the Mantel test for distance matrix correlation (after Dietz 1983) <doi:10.1093/sysbio/32.1.21>), functions for computing string and meaning distance matrices as well as an implementation of the Page test for monotonicity of ranks (Page 1963) <doi:10.1080/01621459.1963.10500843> with exact p-values up to k = 22.

Power Analysis for a SMART Design (smartsizer)
A set of tools for determining the necessary sample size in order to identify the optimal dynamic treatment regime in a sequential, multiple assignment, randomized trial (SMART). Utilizes multiple comparisons with the best methodology to adjust for multiple comparisons. Designed for an arbitrary SMART design. Please see Artman (2018) <arXiv:1804.04587> for more details.

If you did not already know

Takeuchi’s Information Criteria (TIC) google
Takeuchi’s Information Criteria (TIC) is a linearization of maximum likelihood estimator bias which shrinks the model parameters towards the maximum entropy distribution, even when the model is mis-specified. In statistical machine learning, $L_2$ regularization (a.k.a. ridge regression) also introduces a parameterized bias term with the goal of minimizing out-of-sample entropy, but generally requires a numerical solver to find the regularization parameter. …
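As a sketch of the contrast the entry draws, here is a minimal NumPy implementation of ridge regression via its closed-form estimator on synthetic data (all names and values are illustrative); the L2 penalty visibly shrinks the coefficient vector toward zero:

```python
import numpy as np

rng = np.random.default_rng(0)

# Synthetic regression problem; sizes and coefficients are illustrative.
n, p = 200, 5
X = rng.normal(size=(n, p))
true_w = np.array([1.0, -2.0, 0.5, 0.0, 3.0])
y = X @ true_w + rng.normal(scale=0.1, size=n)

def ridge(X, y, lam):
    """Closed-form ridge estimate: (X'X + lam I)^{-1} X'y."""
    return np.linalg.solve(X.T @ X + lam * np.eye(X.shape[1]), X.T @ y)

w_ols = ridge(X, y, lam=0.0)   # no penalty: ordinary least squares
w_reg = ridge(X, y, lam=10.0)  # L2 penalty shrinks the coefficients

assert np.linalg.norm(w_reg) < np.linalg.norm(w_ols)
```

Choosing `lam` is the part that typically requires a numerical search (e.g. cross-validation), which is the point of comparison with TIC above.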

Latent Sequence Decompositions (LSD) google
We present the Latent Sequence Decompositions (LSD) framework. LSD decomposes sequences with variable-length output units as a function of both the input sequence and the output sequence. We present a training algorithm which samples valid extensions and an approximate decoding algorithm. We experiment with the Wall Street Journal speech recognition task. Our LSD model achieves 12.9% WER compared to a character baseline of 14.8% WER. When combined with a convolutional network on the encoder, we achieve 9.2% WER. …

Domain Adaptation (DA) google
Domain Adaptation is a field associated with machine learning and transfer learning. This scenario arises when we aim to learn, from a source data distribution, a well-performing model for a different (but related) target data distribution. For instance, one of the tasks of the common spam filtering problem consists in adapting a model from one user (the source distribution) to a new one who receives significantly different emails (the target distribution). Note that, when more than one source distribution is available, the problem is referred to as multi-source domain adaptation.
Domain Adaptation with Randomized Expectation Maximization

What’s new on arXiv

Learning Bayesian Networks from Big Data with Greedy Search: Computational Complexity and Efficient Implementation

Learning the structure of Bayesian networks from data is known to be a computationally challenging, NP-hard problem. The literature has long investigated how to perform structure learning from data containing large numbers of variables, following a general interest in high-dimensional applications (‘small n, large p’) in systems biology and genetics. More recently, data sets with large numbers of observations (the so-called ‘big data’) have become increasingly common; and many of these data sets are not high-dimensional, having only a few tens of variables. We revisit the computational complexity of Bayesian network structure learning in this setting, showing that the common choice of measuring it with the number of estimated local distributions leads to unrealistic time complexity estimates for the most common class of score-based algorithms, greedy search. We then derive more accurate expressions under common distributional assumptions. These expressions suggest that the speed of Bayesian network learning can be improved by taking advantage of the availability of closed form estimators for local distributions with few parents. Furthermore, we find that using predictive instead of in-sample goodness-of-fit scores improves both speed and accuracy at the same time. We demonstrate these results on large real-world environmental data and on reference data sets available from public repositories.

Causal network discovery by iterative conditioning: comparison of algorithms

Estimating causal interactions in complex networks is an important problem encountered in many fields of current science. While a theoretical solution for detecting the graph of causal interactions has previously been formulated in the framework of prediction improvement, it generally requires the computation of high-dimensional information functionals, a situation invoking the curse of dimensionality with increasing network size. Recently, several methods have been proposed to alleviate this problem, based on iterative procedures for the assessment of conditional (in)dependences. In the current work, we present a comparison of several such prominent approaches. This is done both by theoretical comparison of the algorithms, using a formulation in a common framework, and by numerical simulations including realistic complex coupling patterns. The theoretical analysis shows that the algorithms are strongly related; in particular situations, one algorithm is equivalent to the first phase of another. Moreover, numerical simulations suggest that, under suitable parameter choices, the accuracy of most of the algorithms is almost indistinguishable. However, particularly for large networks, there are substantial differences in their computational demands: some of the algorithms are relatively more efficient for sparse networks, while others perform better for dense networks. The most recent variant of the algorithm by Runge et al. provides a promising speedup, particularly for large sparse networks, although it appears to lead to a substantial decrease in accuracy in some scenarios. Based on the analysis of the reviewed algorithms, we propose a hybrid approach that provides competitive results in terms of both computational efficiency and accuracy.

Automatic Language Identification in Texts: A Survey

Language identification (LI) is the problem of determining the natural language that a document or part thereof is written in. Automatic LI has been extensively researched for over fifty years. Today, LI is a key part of many text processing pipelines, as text processing techniques generally assume that the language of the input text is known. Research in this area has recently been especially active. This article provides a brief history of LI research, and an extensive survey of the features and methods used so far in the LI literature. For describing the features and methods we introduce a unified notation. We discuss evaluation methods, applications of LI, as well as off-the-shelf LI systems that do not require training by the end user. Finally, we identify open issues, survey the work to date on each issue, and propose future directions for research in LI.
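Many of the surveyed features are character n-gram statistics; a toy sketch of profile matching (the tiny "training" profiles below are made up, whereas real LI systems build them from large corpora):

```python
from collections import Counter

def char_ngrams(text, n=3):
    """Character n-gram profile with padded word boundaries."""
    padded = f" {text.lower()} "
    return Counter(padded[i:i + n] for i in range(len(padded) - n + 1))

# Deliberately tiny, made-up training profiles for illustration only.
profiles = {
    "en": char_ngrams("the quick brown fox jumps over the lazy dog"),
    "de": char_ngrams("der schnelle braune fuchs springt ueber den faulen hund"),
}

def identify(text):
    """Pick the language whose profile overlaps the input the most."""
    probe = char_ngrams(text)
    return max(
        profiles,
        key=lambda lang: sum(
            min(count, profiles[lang][gram]) for gram, count in probe.items()
        ),
    )
```

For example, `identify("the dog")` matches the English profile on trigrams like `"the"` and `"dog"`, while `identify("der hund")` matches the German one.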

Knowledge-based end-to-end memory networks

End-to-end dialog systems have become very popular because they hold the promise of learning directly from human-to-human dialog interaction. Retrieval and generative methods have been explored in this area with mixed results. A key element that is missing so far is the incorporation of a priori knowledge about the task at hand. This knowledge may exist in the form of structured or unstructured information. As a first step in this direction, we present a novel approach, Knowledge-based end-to-end memory networks (KB-memN2N), which allows special handling of named entities for goal-oriented dialog tasks. We present results on two datasets: the DSTC6 challenge dataset and the dialog bAbI tasks.

Statistical and Economic Evaluation of Time Series Models for Forecasting Arrivals at Call Centers

Call center managers are interested in obtaining accurate point and distributional forecasts of call arrivals in order to achieve an optimal balance between service quality and operating costs. We present a strategy for selecting forecast models of call arrivals which is based on three pillars: (i) flexibility of the loss function; (ii) statistical evaluation of forecast accuracy; (iii) economic evaluation of forecast performance using money metrics. We implement fourteen time series models and seven forecast combination schemes on three series of daily call arrivals. Although we focus mainly on point forecasts, we also analyze density forecast evaluation. We show that modeling second moments is important both for point and density forecasting, and that the simple Seasonal Random Walk model is always outperformed by more general specifications. Our results suggest that call center managers should invest in the use of forecast models which describe both first and second moments of call arrivals.
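The Seasonal Random Walk benchmark that the abstract says is always outperformed simply repeats the last observed seasonal cycle; a minimal sketch with made-up arrival counts:

```python
def snaive(series, season, horizon):
    """Seasonal naive forecast: repeat the last observed seasonal cycle."""
    last_cycle = series[-season:]
    return [last_cycle[h % season] for h in range(horizon)]

# Made-up daily arrival counts following a period-3 pattern.
arrivals = [120, 95, 80, 130, 100, 85]
forecast = snaive(arrivals, season=3, horizon=4)  # [130, 100, 85, 130]
```

Any model that cannot beat this one-liner on held-out data is not worth deploying, which is why it is the standard baseline in forecast evaluations.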

Dropping Networks for Transfer Learning

In natural language understanding, many challenges require learning relationships between two sequences for various tasks such as similarity, relatedness, paraphrasing and question matching. Some of these tasks are inherently closer in nature, so knowledge acquired on one is more easily transferred and adapted to another. However, transferring all knowledge may be undesirable and can lead to sub-optimal results due to negative transfer. Hence, this paper focuses on the transferability of both instances and parameters across natural language understanding tasks, using an ensemble-based transfer learning method to circumvent such issues. The primary contribution of this paper is the combination of Dropout and Bagging for improved transferability in neural networks, referred to herein as Dropping. Secondly, we present a straightforward yet novel approach to incorporating source Dropping Networks into a target task for few-shot learning that mitigates negative transfer. This is achieved by using a decaying parameter chosen according to the slope changes of a smoothed spline error curve at sub-intervals during training. We compare the approach against hard parameter sharing, soft parameter sharing and single-task learning to evaluate its effectiveness. The aforementioned adjustment leads to improved transfer learning performance and results comparable to the current state of the art using only a few instances from the target task.

Measurement Errors in R

This paper presents an R package to handle and represent measurements with errors in a very simple way. We briefly introduce the main concepts of metrology and propagation of uncertainty, and discuss related R packages. Building upon this, we introduce the ‘errors’ package, which provides a class for associating uncertainty metadata, automated propagation and reporting. Working with ‘errors’ enables transparent, lightweight, less error-prone handling and convenient representation of measurements with errors. Finally, we discuss the advantages, limitations and future work of computing with errors.

Approximate Abstractions of Markov Chains with Interval Decision Processes (Extended Version)

This work introduces a new abstraction technique for reducing the state space of large, discrete-time labelled Markov chains. The abstraction leverages the semantics of interval Markov decision processes and the existing notion of approximate probabilistic bisimulation. Whilst standard abstractions make use of abstract points that are taken from the state space of the concrete model and serve as representatives for sets of concrete states, in this work the abstract structure is constructed from abstract points that are not necessarily selected from the states of the concrete model; rather, they are a function of these states. The resulting model presents a smaller one-step bisimulation error, when compared to a like-sized, standard Markov chain abstraction. We outline a method to perform probabilistic model checking, and show that the computational complexity of the new method is comparable to that of standard abstractions based on approximate probabilistic bisimulations.

Towards Symbolic Reinforcement Learning with Common Sense

Deep Reinforcement Learning (deep RL) has made several breakthroughs in recent years in applications ranging from complex control tasks in unmanned vehicles to game playing. Despite their success, deep RL still lacks several important capacities of human intelligence, such as transfer learning, abstraction and interpretability. Deep Symbolic Reinforcement Learning (DSRL) seeks to incorporate such capacities to deep Q-networks (DQN) by learning a relevant symbolic representation prior to using Q-learning. In this paper, we propose a novel extension of DSRL, which we call Symbolic Reinforcement Learning with Common Sense (SRL+CS), offering a better balance between generalization and specialization, inspired by principles of common sense when assigning rewards and aggregating Q-values. Experiments reported in this paper show that SRL+CS learns consistently faster than Q-learning and DSRL, achieving also a higher accuracy. In the hardest case, where agents were trained in a deterministic environment and tested in a random environment, SRL+CS achieves nearly 100% average accuracy compared to DSRL’s 70% and DQN’s 50% accuracy. To the best of our knowledge, this is the first case of near perfect zero-shot transfer learning using Reinforcement Learning.

Black-box Adversarial Attacks with Limited Queries and Information

Current neural network-based classifiers are susceptible to adversarial examples even in the black-box setting, where the attacker only has query access to the model. In practice, the threat model for real-world systems is often more restrictive than the typical black-box model of full query access. We define three realistic threat models that more accurately characterize many real-world classifiers: the query-limited setting, the partial-information setting, and the label-only setting. We develop new attacks that fool classifiers under these more restrictive threat models, where previous methods would be impractical or ineffective. We demonstrate that our methods are effective against an ImageNet classifier under our proposed threat models. We also demonstrate a targeted black-box attack against a commercial classifier, overcoming the challenges of limited query access, partial information, and other practical issues to attack the Google Cloud Vision API.
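Query-limited black-box attacks of this kind typically estimate gradients from model queries alone; below is a generic sketch of NES-style gradient estimation on a stand-in score function (an assumption for illustration: this is not the paper's actual attack, and real attacks query a classifier's outputs rather than this toy function):

```python
import numpy as np

rng = np.random.default_rng(1)

# Stand-in "black box": the attacker can query this score but cannot
# differentiate it. Real attacks query a classifier's output instead.
def score(x):
    return -np.sum((x - 3.0) ** 2)

def nes_gradient(f, x, sigma=0.1, samples=50):
    """Estimate the gradient of f at x from queries alone
    (natural-evolution-strategies style antithetic sampling)."""
    grad = np.zeros_like(x)
    for _ in range(samples):
        u = rng.normal(size=x.shape)
        grad += (f(x + sigma * u) - f(x - sigma * u)) * u
    return grad / (2 * sigma * samples)

# Gradient ascent on the estimated gradient drives x toward the
# maximiser of the black-box score (here, the point 3.0).
x = np.zeros(4)
for _ in range(100):
    x += 0.05 * nes_gradient(score, x)
```

Each ascent step costs `2 * samples` queries, which is exactly the budget that the query-limited threat model constrains.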

Exploiting Partially Annotated Data for Temporal Relation Extraction
Mixed Quality of Service in Cell-Free Massive MIMO
LightRel SemEval-2018 Task 7: Lightweight and Fast Relation Classification
Learn and Pick Right Nodes to Offload
Exploiting Prior Information in Block Sparse Signals
Robust User Scheduling with COST 2100 Channel Model for Massive MIMO Networks
On the Impact of Fixed Point Hardware for Optical Fiber Nonlinearity Compensation Algorithms
A Robust Process to Identify Pivots inside Sub-communities In Social Networks
Understanding UAV Cellular Communications: From Existing Networks to Massive MIMO
Question Answering Resources Applied to Slot-Filling
Optimum Sidelobe Level Reduction in ACF of NLFM Waveform
Sparse Travel Time Estimation from Streaming Data
SolidWorx: A Resilient and Trustworthy Transactive Platform for Smart and Connected Communities
Complex Network Analysis of Men Single ATP Tennis Matches
Same Representation, Different Attentions: Shareable Sentence Representation Learning from Multiple Tasks
Maximum entropy priors with derived parameters in a specified distribution
Union bound for quantum information processing
Micro-Net: A unified model for segmentation of various objects in microscopy images
The connected metric dimension at a vertex of a graph
Deep Learning in Spiking Neural Networks
Price Competition with Geometric Brownian motion in Exchange Rate Uncertainty
Local White Matter Architecture Defines Functional Brain Dynamics
Boolean functions on high-dimensional expanders
Word Embedding Perturbation for Sentence Classification
A Deep Convolutional Neural Network for Lung Cancer Diagnostic
Maximizing Profit with Convex Costs in the Random-order Model
Stochastic Dynamics II: Finite Random Dynamical Systems, Linear Representation, and Entropy Production
Torus polynomials: an algebraic approach to ACC lower bounds
Towards Practical Constrained Monotone Submodular Maximization
Large Receptive Field Networks for High-Scale Image Super-Resolution
Exchange of Renewable Energy among Prosumers using Blockchain with Dynamic Pricing
Advancing Tabu and Restart in Local Search for Maximum Weight Cliques
Embedding Hypertrees into Steiner Triple Systems
Parabolically induced functions and equidistributed pairs
syGlass: Interactive Exploration of Multidimensional Images Using Virtual Reality Head-mounted Displays
A neural interlingua for multilingual machine translation
Linguistically-Informed Self-Attention for Semantic Role Labeling
Spell Once, Summon Anywhere: A Two-Level Open-Vocabulary Language Model
Towards a Unified Natural Language Inference Framework to Evaluate Sentence Representations
Efficient Object Tracking based on Circular and Structural Multi-level Learners
Synthesizing Distributed Energy Resources in Microgrids with Temporal Logic Specifications
q-Gauss summation formula and q-Analogues of Ramanujan-type series for 1/pi
The amazing world of simplicial complexes
Some New Constructions of Quantum MDS Codes
Poisson statistics at the edge of Gaussian $β$-ensemble at high temperature
Mem2Seq: Effectively Incorporating Knowledge Bases into End-to-End Task-Oriented Dialog Systems
Econometric Modeling of Regional Electricity Spot Prices in the Australian Market
Adapted Performance Assessment For Drivers Through Behavioral Advantage
Multi-scale prediction for robust hand detection and classification
A direct approach to false discovery rates by decoy permutations
Parsing Tweets into Universal Dependencies
An Empirical Comparison of PDDL-based and ASP-based Task Planners
Adapting Blockchain Technology for Scientific Computing
N-fold Superposition: Improving Neural Networks by Reducing the Noise in Feature Maps
PeRView: A Framework for Personalized Review Selection Using Micro-Reviews
How Bad is the Freedom to Flood-It?
Constructing Locally Dense Point Clouds Using OpenSfM and ORB-SLAM2
Constructing Permutation Arrays using Partition and Extension
Memory Attention Networks for Skeleton-based Action Recognition
Progressive refinement: a method of coarse-to-fine image parsing using stacked network
On the Relationship Between Ehrhart Unimodality and Ehrhart Positivity
Clinical Assistant Diagnosis for Electronic Medical Record Based on Convolutional Neural Network
On the Diachronic Stability of Irregularity in Inflectional Morphology
Spectral characterization of the complete graph removing a path of small length
To Create What You Tell: Generating Videos from Captions
Deterministic and Randomized Diffusion based Iterative Generalized Hard Thresholding (DiFIGHT) for Distributed Sparse Signal Recovery
NLITrans at SemEval-2018 Task 12: Transfer of Semantic Knowledge for Argument Comprehension
Jointly Localizing and Describing Events for Dense Video Captioning
Deep Semantic Hashing with Generative Adversarial Networks
PlusEmo2Vec at SemEval-2018 Task 1: Exploiting emotion knowledge from emoji and #hashtags
Memory Matching Networks for One-Shot Image Recognition
On the Design of an Intelligent Speed Advisory System for Cyclists
Succinct Oblivious RAM
Fully Convolutional Adaptation Networks for Semantic Segmentation
Guaranteeing Consistency in a Motion Planning and Control Architecture Using a Kinematic Bicycle Model
MVTec D2S: Densely Segmented Supermarket Dataset
Layered Based Augmented Complex Kalman Filter for Fast Forecasting-Aided State Estimation of Distribution Networks
Representational Issues in the Debate on the Standard Model of the Mind
Deep cross-domain building extraction for selective depth estimation from oblique aerial imagery
A Multi-Beam NOMA Framework for Hybrid mmWave Systems
Minimum Symbol Error Rate-Based Constant Envelope Precoding for Multiuser Massive MISO Downlink
Exploiting Semantics in Neural Machine Translation with Graph Convolutional Networks
Making an Appraiser Work for You
Bilingual Embeddings with Random Walks over Multilingual Wordnets
Online Non-Preemptive Scheduling to Minimize Weighted Flow-time on Unrelated Machines
Asymptotics of even-even correlations in the Ising model
Varieties of Signature Tensors
Understanding Cross-sectional Dependence in Panel Data
Taskonomy: Disentangling Task Transfer Learning
Energy Efficiency of Rate-Splitting Multiple Access, and Performance Benefits over SDMA and NOMA
Client Selection for Federated Learning with Heterogeneous Resources in Mobile Edge
Semantic Parsing with Syntax- and Table-Aware SQL Generation
Randomized Mixture Models for Probability Density Approximation and Estimation
Optimal B-Robust Estimation for the Parameters of Marshall-Olkin Extended Burr XII Distribution and Application for Modeling Data from Pharmacokinetics Study
Deep Facial Expression Recognition: A Survey
Multi-focus Image Fusion using dictionary learning and Low-Rank Representation
DenseFuse: A Fusion Approach to Infrared and Visible Images
Log-transformed kernel density estimation for positive data
VLocNet++: Deep Multitask Learning for Semantic Visual Localization and Odometry
Gaussian Material Synthesis
Convolutional capsule network for classification of breast cancer histology images
BrainSlug: Transparent Acceleration of Deep Learning Through Depth-First Parallelism
STAN: Spatio-Temporal Adversarial Networks for Abnormal Event Detection
Nonlinear state-space modelling of the kinematics of an oscillating circular cylinder in a fluid flow
Stable Lévy processes in a cone
Path Planning in Support of Smart Mobility Applications using Generative Adversarial Networks
Some properties of Bowlin and Brin’s color graphs
Abdominal multi-organ segmentation with organ-attention networks and statistical fusion
An algorithm to compute the Hoffman constant of a system of linear constraints
Efficient Pose Tracking from Natural Features in Standard Web Browsers
Nesting Monte Carlo for high-dimensional Non Linear PDEs
A Spoofing Benchmark for the 2018 Voice Conversion Challenge: Leveraging from Spoofing Countermeasures for Speech Artifact Assessment
Viability approach to finite-time stability
Explicit solutions to utility maximization problems in a regime-switching market model via Laplace transforms
Decorrelated Batch Normalization
Attention Based Natural Language Grounding by Navigating Virtual Environment
Mixing Context Granularities for Improved Entity Linking on Question Answering Data across Entity Categories
Error Bounds for FDD Massive MIMO Channel Covariance Conversion with Set-Theoretic Methods
One-dimensional scaling limits in a planar Laplacian random growth model
Robust Beamforming with Pilot Reuse Scheduling in a Heterogeneous Cloud Radio Access Network
Joint Enhancement and Denoising Method via Sequential Decomposition
The Neumann Boundary Problem for Elliptic Partial Differential Equations with Nonlinear Divergence Terms
High Dimensional Estimation and Multi-Factor Models
Beyond Narrative Description: Generating Poetry from Images by Multi-Adversarial Training
Cosmic Microwave Background Constraints in Light of Priors Over Reionization Histories
ASR Performance Prediction on Unseen Broadcast Programs using Convolutional Neural Networks
Systems of Reflected Stochastic PDEs in a Convex Domain: Analytical Approach
Parallel and I/O-efficient Randomisation of Massive Networks using Global Curveball Trades
On the Asymptotic Normality of Adaptive Multilevel Splitting
Correlations for symplectic and orthogonal Schur measures
ALIGNet: Partial-Shape Agnostic Alignment via Unsupervised Learning
Person Identification from Partial Gait Cycle Using Fully Convolutional Neural Network
On the Capacity of the Peak Power Constrained Vector Gaussian Channel: An Estimation Theoretic Perspective
A New Channel Boosted Convolution Neural Network using Transfer Learning
VectorDefense: Vectorization as a Defense to Adversarial Examples
From the master equation to mean field game limit theory: A central limit theorem
A Conditional Gradient Framework for Composite Convex Minimization with Applications to Semidefinite Programming
Entropy bounds for grammar compression
Eigenvector Computation and Community Detection in Asynchronous Gossip Models
From the master equation to mean field game limit theory: Large deviations and concentration of measure
On the circular correlation coefficients for bivariate von Mises distributions on a torus
False Information on Web and Social Media: A Survey
Spatio-Temporal Neural Networks for Space-Time Series Forecasting and Relations Discovery
Measuring Within and Between Group Inequality in Early-Life Mortality Over Time: A Bayesian Approach with Application to India
Light-weight Head Pose Invariant Gaze Tracking
Perron-Frobenius Theorem for Rectangular Tensors and Directed Hypergraphs
Leveraging Friendship Networks for Dynamic Link Prediction in Social Interaction Networks
The random normal matrix model: insertion of a point charge
Guided Attention for Large Scale Scene Text Verification
Single-User mmWave Massive MIMO: SVD-based ADC Bit Allocation and Combiner Design
On capacities of the two-user union channel with complete feedback
Towards Learning Sparsely Used Dictionaries with Arbitrary Supports
Zero-Shot Visual Imitation
Constrained optimal design of automotive radar arrays using the Weiss-Weinstein Bound
Benchmarking projective simulation in navigation problems

Book Memo: “Introduction to Nature-Inspired Optimization”

Introduction to Nature-Inspired Optimization brings together many of the innovative mathematical methods for non-linear optimization that have their origins in the way various species behave in order to optimize their chances of survival. The book describes each method, examines their strengths and weaknesses, and where appropriate, provides the MATLAB code to give practical insight into the detailed structure of these methods and how they work.
Nature-inspired algorithms emulate processes found in the natural world, spurring interest in optimization. Lindfield and Penny provide concise coverage of all the major algorithms, including genetic algorithms, artificial bee colony algorithms, ant colony optimization and the cuckoo search algorithm, among others. This book provides a quick reference for practicing engineers, researchers and graduate students who work in the field of optimization.
• Applies concepts in nature and biology to develop new algorithms for nonlinear optimization
• Offers working MATLAB® programs for the major algorithms described, applying them to a range of problems
• Provides useful comparative studies of the algorithms, highlighting their strengths and weaknesses
• Discusses the current state-of-the-field and indicates possible areas of future development

Book Memo: “Natural Language Processing with PyTorch”

Build Intelligent Language Applications Using Deep Learning
Natural Language Processing (NLP) offers unbounded opportunities for solving interesting problems in artificial intelligence, making it the latest frontier for developing intelligent, deep learning-based applications. If you’re a developer or researcher ready to dive deeper into this rapidly growing area of artificial intelligence, this practical book shows you how to use the PyTorch deep learning framework to implement recently discovered NLP techniques. To get started, all you need is a machine learning background and experience programming with Python. Authors Delip Rao and Goku Mohandas provide a solid grounding in PyTorch and deep learning algorithms for building applications involving semantic representation of text. Each chapter includes several code examples and illustrations.

Document worth reading: “One Big Net For Everything”

I apply recent work on ‘learning to think’ (2015) and on PowerPlay (2011) to the incremental training of an increasingly general problem solver, continually learning to solve new tasks without forgetting previous skills. The problem solver is a single recurrent neural network (or similar general purpose computer) called ONE. ONE is unusual in the sense that it is trained in various ways, e.g., by black box optimization / reinforcement learning / artificial evolution as well as supervised / unsupervised learning. For example, ONE may learn through neuroevolution to control a robot through environment-changing actions, and learn through unsupervised gradient descent to predict future inputs and vector-valued reward signals as suggested in 1990. User-given tasks can be defined through extra goal-defining input patterns, also proposed in 1990. Suppose ONE has already learned many skills. Now a copy of ONE can be re-trained to learn a new skill, e.g., through neuroevolution without a teacher. Here it may profit from re-using previously learned subroutines, but it may also forget previous skills. Then ONE is retrained in PowerPlay style (2011) on stored input/output traces of (a) ONE’s copy executing the new skill and (b) previous instances of ONE whose skills are still considered worth memorizing. Simultaneously, ONE is retrained on old traces (even those of unsuccessful trials) to become a better predictor, without additional expensive interaction with the environment. More and more control and prediction skills are thus collapsed into ONE, as in the chunker-automatizer system of the neural history compressor (1991). This forces ONE to relate partially analogous skills (with shared algorithmic information) to each other, creating common subroutines in the form of shared subnetworks of ONE, to greatly speed up subsequent learning of additional, novel but algorithmically related skills. One Big Net For Everything

What’s new on arXiv

Learning More Robust Features with Adversarial Training

In recent years, it has been found that neural networks can be easily fooled by adversarial examples, which is a potential safety hazard in some safety-critical applications. Many researchers have proposed various methods to make neural networks more robust to white-box adversarial attacks, but an effective method has not been found so far. In this short paper, we focus on the robustness of the features learned by neural networks. We show that the features learned by neural networks are not robust, and find that the robustness of the learned features is closely related to the resistance of neural networks against adversarial examples. We also find that adversarial training against the fast gradient sign method (FGSM) does not make the learned features very robust, even though it can make the trained networks very resistant to FGSM attacks. We then propose a method, which can be seen as an extension of adversarial training, to train neural networks to learn more robust features. We perform experiments on MNIST and CIFAR-10 to evaluate our method, and the results show that it greatly improves the robustness of the learned features and the resistance to adversarial attacks.
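As a toy illustration of the FGSM attack discussed above, here is a minimal sketch on a logistic-regression "network"; the weights, input and epsilon are made-up assumptions for illustration, not the paper's models.

```python
import numpy as np

def sigmoid(z):
    return 1.0 / (1.0 + np.exp(-z))

def fgsm_perturb(x, y, w, b, eps):
    """Move x by eps in the direction of the sign of the loss gradient."""
    p = sigmoid(w @ x + b)      # predicted probability of class 1
    grad_x = (p - y) * w        # d(cross-entropy)/dx for a logistic model
    return x + eps * np.sign(grad_x)

w = np.array([1.0, -2.0, 0.5])  # illustrative weights
x = np.array([0.2, 0.1, -0.3])  # illustrative input
x_adv = fgsm_perturb(x, y=1.0, w=w, b=0.0, eps=0.1)

# Every coordinate moves by exactly eps, the hallmark of FGSM.
print(np.abs(x_adv - x))
```

Because only the sign of the gradient is used, the perturbation has the same maximal size in every coordinate, which is what makes FGSM a cheap but strong baseline attack.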

PEORL: Integrating Symbolic Planning and Hierarchical Reinforcement Learning for Robust Decision-Making

Reinforcement learning and symbolic planning have both been used to build intelligent autonomous agents. Reinforcement learning relies on learning from interactions with the real world, which often requires an infeasibly large amount of experience. Symbolic planning relies on manually crafted symbolic knowledge, which may not be robust to domain uncertainties and changes. In this paper we present a unified framework {\em PEORL} that integrates symbolic planning with hierarchical reinforcement learning (HRL) to cope with decision-making in a dynamic environment with uncertainties. Symbolic plans are used to guide the agent’s task execution and learning, and the learned experience is fed back to symbolic knowledge to improve planning. This method leads to rapid policy search and robust symbolic plans in complex domains. The framework is tested on benchmark domains of HRL.

Right Answer for the Wrong Reason: Discovery and Mitigation

Exposing the weaknesses of neural models is crucial for improving their performance and robustness in real-world applications. One common approach is to examine how input perturbations affect the output. Our analysis takes this to an extreme on natural language processing tasks by removing as many words as possible from the input without changing the model prediction. For question answering and natural language inference, this often reduces the inputs to just one or two words, while model confidence remains largely unchanged. This is an undesirable behavior: the model gets the Right Answer for the Wrong Reason (RAWR). We introduce a simple training technique that mitigates this problem while maintaining performance on regular examples.
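The greedy word-removal procedure can be sketched with a stand-in classifier; the keyword-based "model" below is a hypothetical placeholder, not the authors' QA or NLI systems.

```python
# A hypothetical classifier: label 1 iff any keyword is present.
def predict(words):
    return 1 if {"good", "great"} & set(words) else 0

def reduce_input(words):
    """Greedily drop words as long as the prediction is unchanged."""
    label = predict(words)
    reduced = list(words)
    changed = True
    while changed:
        changed = False
        for i in range(len(reduced)):
            candidate = reduced[:i] + reduced[i + 1:]
            if candidate and predict(candidate) == label:
                reduced = candidate
                changed = True
                break
    return reduced

sentence = "the movie was really great fun".split()
print(reduce_input(sentence))  # collapses to the single decisive word
```

For a brittle model like this one, the input shrinks to a single word while the prediction (and, for a probabilistic model, its confidence) stays the same, which is exactly the RAWR symptom described above.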

Value-aware Quantization for Training and Inference of Neural Networks

We propose a novel value-aware quantization which applies aggressively reduced precision to the majority of data while separately handling a small amount of large data in high precision, which reduces total quantization errors under very low precision. We present new techniques to apply the proposed quantization to training and inference. The experiments show that our method with 3-bit activations (with 2% of large ones) can give the same training accuracy as full-precision training while offering significant (41.6% and 53.7%) reductions in the memory cost of activations in ResNet-152 and Inception-v3 compared with the state-of-the-art method. Our experiments also show that deep networks such as Inception-v3, ResNet-101 and DenseNet-121 can be quantized for inference with 4-bit weights and activations (with 1% 16-bit data) within 1% top-1 accuracy drop.
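The core idea, coarse precision for the bulk of values and high precision for the few large ones, can be sketched as follows; the bit-width, threshold rule and data are illustrative assumptions, not the paper's exact procedure.

```python
import numpy as np

def value_aware_quantize(x, bits=3, large_frac=0.02):
    """Quantize small values to `bits` levels; keep the largest exact."""
    x = np.asarray(x, dtype=np.float64)
    k = max(1, int(round(large_frac * x.size)))
    thresh = np.sort(np.abs(x).ravel())[-k]   # smallest "large" magnitude
    large = np.abs(x) >= thresh               # kept in full precision
    levels = 2 ** bits - 1
    scale = thresh if thresh > 0 else 1.0
    out = x.copy()
    small = ~large
    out[small] = np.round(x[small] / scale * levels) / levels * scale
    return out, large

x = np.array([0.01, -0.02, 0.03, 5.0])
q, large = value_aware_quantize(x, bits=3, large_frac=0.25)
print(q, large)  # the single large value survives untouched
```

Because the large outliers are excluded before choosing the quantization range, the remaining values use the low-precision levels much more effectively than uniform quantization over the full range would.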

Understanding AI Data Repositories with Automatic Query Generation

We describe a set of techniques to generate queries automatically based on one or more ingested input corpora. These queries require no a priori domain knowledge, and hence no human domain experts. Thus, these auto-generated queries help address the epistemological question of how we know what we know, or more precisely in this case, how an AI system with ingested data knows what it knows. These auto-generated queries can also be used to identify and remedy problem areas in ingested material — areas for which the knowledge of the AI system is incomplete or even erroneous. Similarly, the proposed techniques facilitate tests of AI capability — both in terms of coverage and accuracy. By removing humans from the main learning loop, our approach also allows more effective scaling of AI and cognitive capabilities to provide (1) broader coverage in a single domain such as health or geology; and (2) more rapid deployment to new domains. The proposed techniques also allow ingested knowledge to be extended naturally. Our investigations are early, and this paper provides a description of the techniques. Assessment of their efficacy is our next step for future work.

Sequential Network Transfer: Adapting Sentence Embeddings to Human Activities and Beyond

We study the problem of adapting neural sentence embedding models to the domain of human activities to capture their relations in different dimensions. We introduce a novel approach, Sequential Network Transfer, and show that it largely improves the performance on all dimensions. We also extend this approach to other semantic similarity datasets, and show that the resulting embeddings outperform traditional transfer learning approaches in many cases, achieving state-of-the-art results on the Semantic Textual Similarity (STS) Benchmark. To account for the improvements, we provide some interpretation of what the networks have learned. Our results suggest that Sequential Network Transfer is highly effective for various sentence embedding models and tasks.

CactusNets: Layer Applicability as a Metric for Transfer Learning

Deep neural networks trained over large datasets learn features that are both generic to the whole dataset, and specific to individual classes in the dataset. Learned features tend towards generic in the lower layers and specific in the higher layers of a network. Methods like fine-tuning are made possible because of the ability for one filter to apply to multiple target classes. Much like in the human brain, this behavior can also be used to cluster and separate classes. However, to the best of our knowledge there is no metric for how applicable learned features are to specific classes. In this paper we propose a definition and metric for measuring the applicability of learned features to individual classes, and use this applicability metric to estimate input applicability and produce a new method of unsupervised learning we call the CactusNet.

What’s Going On in Neural Constituency Parsers? An Analysis

A number of differences have emerged between modern and classic approaches to constituency parsing in recent years, with structural components like grammars and feature-rich lexicons becoming less central while recurrent neural network representations rise in popularity. The goal of this work is to analyze the extent to which information provided directly by the model structure in classical systems is still being captured by neural methods. To this end, we propose a high-performance neural model (92.08 F1 on PTB) that is representative of recent work and perform a series of investigative experiments. We find that our model implicitly learns to encode much of the same information that was explicitly provided by grammars and lexicons in the past, indicating that this scaffolding can largely be subsumed by powerful general-purpose neural machinery.

Probabilistic Analysis of Balancing Scores for Causal Inference

Propensity scores are often used to stratify treatment and control groups of subjects in observational data, in order to remove confounding bias when estimating the causal effect of the treatment on an outcome in the so-called potential outcome causal modeling framework. In this article, we try to gain some insight into the basic behavior of propensity scores in a probabilistic sense. We give a simple analysis of their usage, confined to the case of discrete confounding covariates and outcomes. While clarifying the behavior of the propensity score, our analysis shows how the so-called prognostic score can be derived simultaneously. The prognostic score is derived in a limited sense in the current literature, whereas our derivation is more general and shows all possibilities of obtaining the score; we call it the outcome score. We argue that applying both the propensity score and the outcome score is the most efficient way to reduce dimension in the confounding covariates, as opposed to the current belief that the propensity score alone is the most efficient way.
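With a single discrete confounder, propensity-score stratification reduces to stratifying on the covariate levels, since the score e(x) = P(T=1 | X=x) is constant within each level. A minimal sketch with made-up data:

```python
from collections import defaultdict

# Toy observational records: (covariate x, treatment t, outcome y).
data = [(0, 0, 1.0), (0, 0, 1.2), (0, 1, 2.0),
        (1, 0, 2.0), (1, 1, 3.0), (1, 1, 3.2)]

# Group subjects into strata of equal propensity score, i.e. by x.
strata = defaultdict(list)
for x, t, y in data:
    strata[x].append((t, y))

# Within each stratum, compare treated and control means, then
# average the stratum effects weighted by stratum size.
effects, weights = [], []
for x, rows in strata.items():
    treated = [y for t, y in rows if t == 1]
    control = [y for t, y in rows if t == 0]
    effects.append(sum(treated) / len(treated) - sum(control) / len(control))
    weights.append(len(rows))

ate = sum(e * w for e, w in zip(effects, weights)) / sum(weights)
print(round(ate, 3))  # stratified estimate of the average treatment effect
```

The point of the stratification is visible here: within a stratum the treated and control subjects share the same confounder value, so their outcome difference is free of the confounding bias that a naive pooled comparison would carry.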

Is feature selection secure against training data poisoning?

Learning in adversarial settings is becoming an important task for application domains where attackers may inject malicious data into the training set to subvert normal operation of data-driven technologies. Feature selection has been widely used in machine learning for security applications to improve generalization and computational efficiency, although it is not clear whether its use may be beneficial or even counterproductive when training data are poisoned by intelligent attackers. In this work, we shed light on this issue by providing a framework to investigate the robustness of popular feature selection methods, including LASSO, ridge regression and the elastic net. Our results on malware detection show that feature selection methods can be significantly compromised under attack (we can reduce LASSO to almost random choices of feature sets by careful insertion of less than 5% poisoned training samples), highlighting the need for specific countermeasures.

A geometric view on Pearson’s correlation coefficient and a generalization of it to non-linear dependencies

Measuring the strength or degree of statistical dependence between two random variables is a common problem in many domains. Pearson’s correlation coefficient \rho is an accurate measure of linear dependence. We show that \rho is a normalized, Euclidean-type distance between the joint probability distribution of the two random variables and the distribution obtained when their independence is assumed while keeping their marginal distributions. The normalizing constant is the geometric mean of two maximal distances, each between the joint probability distribution under full linear dependence (preserving the respective marginal distributions) and that under independence. Its usage is restricted to linear dependence because it is based on Euclidean-type distances, which are generally not metrics, and the full dependence considered is linear. Therefore, we argue that if a suitable distance metric is used while considering all possible maximal dependences, then any non-linear dependence can be measured; but then one must define all the full dependences. The Hellinger distance, which is a metric, can be used as the distance measure between probability distributions to obtain a generalization of \rho for the discrete case.
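In the 2x2 (binary) case the geometric view is easy to verify numerically: the Euclidean distance between the joint distribution and the product of its marginals is proportional to the covariance, and hence to \rho. The probabilities below are made up for illustration.

```python
import numpy as np

joint = np.array([[0.3, 0.2],
                  [0.1, 0.4]])        # P(X=i, Y=j), made-up numbers
px = joint.sum(axis=1)                # marginal of X
py = joint.sum(axis=0)                # marginal of Y
indep = np.outer(px, py)              # joint under assumed independence

delta = joint[1, 1] - px[1] * py[1]   # covariance of the 0/1 variables
rho = delta / np.sqrt(px[0] * px[1] * py[0] * py[1])

dist = np.linalg.norm(joint - indep)  # Euclidean distance between the two
# In the 2x2 case every cell differs by +/- delta, so dist = 2*|delta|.
print(np.isclose(dist, 2 * abs(delta)))
```

So in this smallest case the distance from independence carries exactly the information in the covariance, and dividing by the right normalizing constant recovers \rho, which is the structure the abstract generalizes.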

Viewing Simpson’s Paradox

The well-known Simpson’s paradox is puzzling and surprising for many, especially for empirical researchers and users of statistics, although there is no surprise as far as the mathematical details are concerned. Much has been written about the paradox, but most of it is beyond the grasp of such users. This short article explains the phenomenon in an easy-to-grasp way using simple algebra and geometry. The mathematical conditions under which the paradox can occur are made explicit, and a simple geometrical illustration is used to describe it. We consider the reversal of the association between two binary variables, say X and Y, by a third binary variable, say Z. We show that it is always possible to define Z algebraically for non-extreme dependence between X and Y; therefore, the occurrence of the paradox depends on identifying Z with a practical meaning in a given context of interest, which is up to the subject domain expert. Finally, we discuss the paradox in predictive contexts, since the literature argues that the paradox is resolved using causal reasoning.
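A small numeric sketch of the reversal with binary X, Y and Z; the counts are of the classic illustrative kind (the kidney-stone flavor), not taken from the article.

```python
def rate(success, total):
    return success / total

# Within each level of Z, X=1 has the higher success rate ...
assert rate(81, 87) > rate(234, 270)      # Z = 0 stratum
assert rate(192, 263) > rate(55, 80)      # Z = 1 stratum

# ... yet aggregated over Z, the ordering reverses.
agg_x1 = rate(81 + 192, 87 + 263)
agg_x0 = rate(234 + 55, 270 + 80)
print(agg_x1 < agg_x0)  # the association between X and Y has flipped
```

The reversal is possible because the strata have very unequal sizes across the two X groups, which is precisely the kind of algebraic condition the article makes explicit.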

Generative Stock Question Answering

We study the problem of stock related question answering (StockQA): automatically generating answers to stock related questions, just like professional stock analysts providing action recommendations on stocks upon users’ requests. StockQA is quite different from previous QA tasks since (1) the answers in StockQA are natural language sentences (rather than entities or values), and due to the dynamic nature of StockQA, it is scarcely possible to get reasonable answers in an extractive way from the training data; and (2) StockQA requires properly analyzing the relationship between keywords in the QA pair and the numerical features of a stock. We propose to address the problem with a memory-augmented encoder-decoder architecture, and integrate different mechanisms of number understanding and generation, which is a critical component of StockQA. We build a large-scale Chinese dataset containing over 180K StockQA instances, based on which various technique combinations are extensively studied and compared. Experimental results show that a hybrid word-character model with separate character components for number processing achieves the best performance.\footnote{The data is publicly available at \url{http://…/}.}

Expert Finding in Community Question Answering: A Review

The recent rapid development of Community Question Answering (CQA) satisfies users’ quest for professional and personal knowledge about anything. In CQA, one central issue is to find users with the expertise and willingness to answer the given questions. Expert finding in CQA often exhibits very different challenges compared to traditional methods: sparse data and new features violate the fundamental assumptions of traditional recommendation systems. This paper focuses on reviewing and categorizing the current progress on expert finding in CQA. We classify all the existing solutions into four categories: matrix factorization based models (MF-based models), gradient boosting tree based models (GBT-based models), deep learning based models (DL-based models) and ranking based models (R-based models). We find that MF-based models outperform the other categories of models in the field of expert finding in CQA. Moreover, we use innovative diagrams to clarify several important concepts of ensemble learning, and find that ensemble models combining several specific single models can further boost performance. Further, we compare the performance of different models on different types of matching tasks, including text vs. text, graph vs. text, audio vs. text and video vs. text. The results can help model selection for expert finding in practice. Finally, we explore some potential future issues in expert finding research in CQA.

Empirical Equilibrium

We introduce empirical equilibrium, the prediction in a game that selects the Nash equilibria that can be approximated by a sequence of payoff-monotone distributions, a well-documented proxy for empirically plausible behavior. Then, we reevaluate implementation theory based on this equilibrium concept. We show that in a partnership dissolution environment with complete information, two popular auctions that are essentially equivalent for the Nash equilibrium prediction, can be expected to differ in fundamental ways when they are operated. Besides the direct policy implications, two general consequences follow. First, a mechanism designer may not be constrained by typical invariance properties. Second, a mechanism designer who does not account for the empirical plausibility of equilibria may inadvertently design implicitly biased mechanisms.

Generating Natural Language Adversarial Examples

Deep neural networks (DNNs) are vulnerable to adversarial examples, perturbations to correctly classified examples which can cause the network to misclassify. In the image domain, these perturbations can often be made virtually indistinguishable to human perception, causing humans and state-of-the-art models to disagree. However, in the natural language domain, small perturbations are clearly perceptible, and the replacement of a single word can drastically alter the semantics of the document. Given these challenges, we use a population-based optimization algorithm to generate semantically and syntactically similar adversarial examples. We demonstrate via a human study that 94.3% of the generated examples are classified to the original label by human evaluators, and that the examples are perceptibly quite similar. We hope our findings encourage researchers to pursue improving the robustness of DNNs in the natural language domain.

Swarm Intelligence: Past, Present and Future

Many optimization problems in science and engineering are challenging to solve, and the current trend is to use swarm intelligence (SI) and SI-based algorithms to tackle such challenging problems. Some significant developments have been made in recent years, though there are still many open problems in this area. This paper provides a short but timely analysis about SI-based algorithms and their links with self-organization. Different characteristics and properties are analyzed here from both mathematical and qualitative perspectives. Future research directions are outlined and open questions are also highlighted.

Fine-grained Entity Typing through Increased Discourse Context and Adaptive Classification Thresholds

Fine-grained entity typing is the task of assigning fine-grained semantic types to entity mentions. We propose a neural architecture which learns a distributional semantic representation that leverages a greater amount of semantic context — both document and sentence level information — than prior work. We find that additional context improves performance, with further improvements gained by utilizing adaptive classification thresholds. Experiments show that our approach, without reliance on hand-crafted features, achieves state-of-the-art results on three benchmark datasets.

Differentially Private k-Means with Constant Multiplicative Error
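For orientation, the standard building block in this line of work is the Laplace mechanism applied to per-cluster sums and counts. The sketch below shows that one noisy-mean step only; it is not the paper's improved construction, and the epsilon, bounds and data are illustrative assumptions.

```python
import numpy as np

rng = np.random.default_rng(0)

def dp_mean(points, eps, bound=1.0):
    """Noisy mean of points in [-bound, bound]^d via the Laplace mechanism.

    The privacy budget eps is split between the noisy sum and the
    noisy count; the sensitivity bounds here are illustrative.
    """
    points = np.asarray(points, dtype=float)
    n, d = points.shape
    noisy_sum = points.sum(axis=0) + rng.laplace(0, 2 * bound * d / eps, size=d)
    noisy_count = n + rng.laplace(0, 2 / eps)
    return noisy_sum / max(noisy_count, 1.0)

pts = rng.uniform(-1, 1, size=(1000, 2))
est = dp_mean(pts, eps=1.0)
print(est)  # close to the true mean when the cluster is large
```

The key limitation the abstract refers to is visible in this primitive: the noise is independent of the data spread, so for small or poorly separated clusters the noisy centers degrade, which is part of why earlier private k-means constructions only achieved super-constant approximation factors.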

We design new differentially private algorithms for the Euclidean k-means problem, both in the centralized model and in the local model of differential privacy. In both models, our algorithms achieve significantly improved error rates over the previous state-of-the-art. In addition, in the local model, our algorithm significantly reduces the number of needed interactions. Although the problem has been widely studied in the context of differential privacy, all of the existing constructions achieve only super constant approximation factors. We present, for the first time, efficient private algorithms for the problem with constant multiplicative error.

Multi-modal space structure: a new kind of latent correlation for multi-modal entity resolution

Multi-modal data is becoming more common than before because of big data issues. Finding semantically equal or similar objects from different data sources (called entity resolution) is one of the core problems of multi-modal tasks. Current models for solving this problem usually need a large amount of paired data to find the latent correlation between multi-modal data, which is costly. A new kind of latent correlation is proposed in this article. With this correlation, multi-modal objects can be uniformly represented in a commonly shared space. A classification-based model is designed for the multi-modal entity resolution task. With the proposed method, the demand for training data can be greatly decreased.

A Channel-based Exact Inference Algorithm for Bayesian Networks

This paper describes a new algorithm for exact Bayesian inference that is based on a recently proposed compositional semantics of Bayesian networks in terms of channels. The paper concentrates on the ideas behind this algorithm, involving a linearisation (‘stretching’) of the Bayesian network, followed by a combination of forward state transformation and backward predicate transformation, while evidence is accumulated along the way. The performance of a prototype implementation of the algorithm in Python is briefly compared to a standard implementation (pgmpy): first results show competitive performance.
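The two transformations are easy to picture with a single channel: in this semantics a channel is a column-stochastic matrix, forward state transformation is a matrix-vector product, and backward predicate transformation uses the transpose. The numbers below are made up for illustration.

```python
import numpy as np

channel = np.array([[0.9, 0.2],      # P(B=0 | A=0), P(B=0 | A=1)
                    [0.1, 0.8]])     # P(B=1 | A=0), P(B=1 | A=1)
prior = np.array([0.5, 0.5])         # state (distribution) on A

# Forward state transformation: push the state on A through the channel.
state_on_b = channel @ prior
print(state_on_b)                    # distribution on B

# Backward predicate transformation: pull a predicate on B back to A.
predicate_b = np.array([1.0, 0.0])   # the (sharp) predicate "B = 0"
predicate_a = channel.T @ predicate_b
print(predicate_a)                   # likelihood of "B = 0" given each A
```

Chaining such forward and backward passes along a stretched (linearised) network, while conditioning on accumulated evidence, is the shape of the algorithm the abstract describes.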

Learning from the experts: From expert systems to machine learned diagnosis models

Expert diagnostic support systems have been extensively studied. The practical application of these systems in real-world scenarios has been somewhat limited due to well-understood shortcomings, such as extensibility. More recently, machine learned models for medical diagnosis have gained momentum since they can learn and generalize patterns found in very large datasets like electronic health records. These models also have shortcomings. In particular, there is no easy way to incorporate prior knowledge from existing literature or experts. In this paper, we present a method to merge both approaches by using expert systems as generative models that create simulated data on which models can be learned. We demonstrate that such a learned model not only preserves the original properties of the expert systems but also addresses some of their limitations. Furthermore, we show how this approach can also be used as the starting point to combine expert knowledge with knowledge extracted from other data sources such as electronic health records.

Bridgeout: stochastic bridge regularization for deep neural networks

A major challenge in training deep neural networks is overfitting, i.e. inferior performance on unseen test examples compared to performance on training examples. To reduce overfitting, stochastic regularization methods have shown superior performance compared to deterministic weight penalties on a number of image recognition tasks. Stochastic methods such as Dropout and Shakeout, in expectation, are equivalent to imposing a ridge and elastic-net penalty on the model parameters, respectively. However, the choice of the norm of weight penalty is problem dependent and is not restricted to \{L_1,L_2\}. Therefore, in this paper we propose the Bridgeout stochastic regularization technique and prove that it is equivalent to an L_q penalty on the weights, where the norm q can be learned as a hyperparameter from data. Experimental results show that Bridgeout results in sparse model weights, improved gradients and superior classification performance compared to Dropout and Shakeout on synthetic and real datasets.

Neural Sentence Location Prediction for Summarization

A competitive baseline in sentence-level extractive summarization of news articles is the Lead-3 heuristic, where only the first 3 sentences are extracted. The success of this method is due to the tendency for writers to implement progressive elaboration in their work by writing the most important content at the beginning. In this paper, we introduce the Lead-like Recognizer (LeadR) to show how the Lead heuristic can be extended to summarize multi-section documents where it would not usually work well. This is done by introducing a neural model which produces a probability distribution over positions for sentences, so that we can locate sentences with introduction-like qualities. To evaluate the performance of our model, we use the task of summarizing multi-section documents. LeadR outperforms several baselines on this task, including a simple extension of the Lead heuristic designed for the task. Our work suggests that predicted position is a strong feature to use when extracting summaries.
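The Lead-3 baseline itself is a one-liner; a minimal sketch with a naive sentence splitter (the splitter is an assumption for illustration, not specified by the paper):

```python
import re

def lead3(article):
    """Extract the first three sentences as the summary (Lead-3)."""
    sentences = re.split(r"(?<=[.!?])\s+", article.strip())
    return " ".join(sentences[:3])

doc = "First point. Second point. Third point. Fourth point."
print(lead3(doc))
```

The baseline works only because news writers front-load the important content; LeadR's contribution is to predict *where* the introduction-like sentences sit, so the same idea transfers to multi-section documents where position 1-3 is no longer special.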

Unsupervised Discrete Sentence Representation Learning for Interpretable Neural Dialog Generation

The encoder-decoder dialog model is one of the most prominent methods used to build dialog systems in complex domains. Yet it is limited because it cannot output interpretable actions as in traditional systems, which hinders humans from understanding its generation process. We present an unsupervised discrete sentence representation learning method that can integrate with any existing encoder-decoder dialog model for interpretable response generation. Building upon variational autoencoders (VAEs), we present two novel models, DI-VAE and DI-VST, that improve VAEs and can discover interpretable semantics via either autoencoding or context prediction. Our methods have been validated on real-world dialog datasets to discover semantic representations and enhance encoder-decoder models with interpretable generation.

Decoupled Networks

Inner product-based convolution has been a central component of convolutional neural networks (CNNs) and the key to learning visual representations. Inspired by the observation that CNN-learned features are naturally decoupled with the norm of features corresponding to the intra-class variation and the angle corresponding to the semantic difference, we propose a generic decoupled learning framework which models the intra-class variation and semantic difference independently. Specifically, we first reparametrize the inner product to a decoupled form and then generalize it to the decoupled convolution operator which serves as the building block of our decoupled networks. We present several effective instances of the decoupled convolution operator. Each decoupled operator is well motivated and has an intuitive geometric interpretation. Based on these decoupled operators, we further propose to directly learn the operator from data. Extensive experiments show that such decoupled reparameterization renders significant performance gain with easier convergence and stronger robustness.
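The reparameterization at the heart of this framework can be sketched directly: write <w, x> = ||w|| ||x|| cos(theta) and replace it with a general h(||w||, ||x||) * g(theta). The particular h and g below are illustrative choices that recover the plain inner product, not the paper's proposed instances.

```python
import numpy as np

def decoupled_op(w, x, h=lambda nw, nx: nw * nx, g=np.cos):
    """Decoupled operator: magnitude term h(...) times angular term g(theta)."""
    nw, nx = np.linalg.norm(w), np.linalg.norm(x)
    cos_theta = np.clip(w @ x / (nw * nx), -1.0, 1.0)  # guard rounding
    return h(nw, nx) * g(np.arccos(cos_theta))

w = np.array([1.0, 0.0])
x = np.array([3.0, 4.0])

# With h = product of norms and g = cos, we recover the inner product.
print(np.isclose(decoupled_op(w, x), w @ x))
```

Once the magnitude and angle are separated like this, each factor can be redesigned (or learned) independently, which is what lets the framework model intra-class variation (norms) and semantic difference (angle) with different functional forms.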

On the stab number of rectangle intersection graphs
From Weakly Chaotic Dynamics to Deterministic Subdiffusion via Copula Modeling
Mapping Images to Psychological Similarity Spaces Using Neural Networks
A Self-paced Regularization Framework for Partial-Label Learning
Sampling the Riemann-Theta Boltzmann Machine
The Statistical Model for Ticker, an Adaptive Single-Switch Text-Entry Method for Visually Impaired Users
Generalized Linear Model for Gamma Distributed Variables via Elastic Net Regularization
Generating Descriptions from Structured Data Using a Bifocal Attention Mechanism and Gated Orthogonalization
A Mixed Hierarchical Attention based Encoder-Decoder Approach for Standard Table Summarization
Robust Probabilistic Analysis of Transmission Power Systems based on Equivalent Circuit Formulation
Stochastic subgradient method converges on tame functions
Enumeration in Incremental FPT-Time
Inseparability and Conservative Extensions of Description Logic Ontologies: A Survey
Genus From Sandpile Torsor Algorithm
Spectral gap in random bipartite biregular graphs and its applications
Metrics that respect the support
Broadcast Domination of Triangular Matchstick Graphs and the Triangular Lattice
A Deep Representation Empowered Distant Supervision Paradigm for Clinical Information Extraction
Decidability of Timed Communicating Automata
Identification of Induction Motors with Smart Circuit Breakers
An Aggregated Multicolumn Dilated Convolution Network for Perspective-Free Counting
Autotune: A Derivative-free Optimization Framework for Hyperparameter Tuning
Spectrally Efficient OFDM System Design under Disguised Jamming
Efficient Contextualized Representation: Language Model Pruning for Sequence Labeling
A Multi-Axis Annotation Scheme for Event Temporal Relations
A New Formulation of The Shortest Path Problem with On-Time Arrival Reliability
On mean-field \(GI/GI/1\) queueing model: existence, uniqueness, convergence
A Metropolis-Hastings algorithm for posterior measures with self-decomposable priors
HandyNet: A One-stop Solution to Detect, Segment, Localize & Analyze Driver Hands
ConnNet: A Long-Range Relation-Aware Pixel-Connectivity Network for Salient Segmentation
Online Improper Learning with an Approximation Oracle
Large Scale Automated Reading of Frontal and Lateral Chest X-Rays using Dual Convolutional Neural Networks
Sherali-Adams Integrality Gaps Matching the Log-Density Threshold
Modulus of continuity for polymer fluctuations and weight profiles in Poissonian last passage percolation
Current large deviations for partially asymmetric particle systems on a ring
Joint entity recognition and relation extraction as a multi-head selection problem
Inter-Annotator Agreement Networks
DeepRec: A deep encoder-decoder network for directly solving the PET reconstruction inverse problem
Massive quality factors of disorder-induced cavity modes in photonic crystal waveguides through long-range correlations
Subgoal Discovery for Hierarchical Dialogue Policy Learning
A 0.086-mm$^2$ 9.8-pJ/SOP 64k-Synapse 256-Neuron Online-Learning Digital Spiking Neuromorphic Processor in 28nm CMOS
Comment on ‘Sum of squares of uniform random variables’ by I. Weissman
Propensity Score Methods for Merging Observational and Experimental Datasets
On the ground state of spiking network activity in mammalian cortex
Designing Practical PTASes for Minimum Feedback Vertex Set in Planar Graphs
Gradient Masking Causes CLEVER to Overestimate Adversarial Perturbation Size
Estimating 3D Human Pose on a Configurable Bed from a Single Pressure Image
Multi-lingual Common Semantic Space Construction via Cluster-consistent Word Embedding
Stability analysis of event-triggered anytime control with multiple control laws
Massively Parallel Cross-Lingual Learning in Low-Resource Target Language Translation
Line arrangements and r-Stirling partitions
Event Extraction with Generative Adversarial Imitation Learning
Dynamic Ensemble Selection VS K-NN: why and when Dynamic Selection obtains higher classification performance?
Neural-inspired sensors enable sparse, efficient classification of spatiotemporal data
Social Bots for Online Public Health Interventions
A Cell-Division Search Technique for Inversion with Application to Picture-Discovery and Magnetotellurics
Stochastic Answer Networks for Natural Language Inference
Entity-aware Image Caption Generation
A Nutritional Label for Rankings
A Deep Learning Approach for Air Pollution Forecasting in South Korea Using Encoder-Decoder Networks & LSTM
Taylor’s law for Human Linguistic Sequences
Periodic solution of stochastic process in the distributional sense
Random weighted averages, partition structures and generalized arcsine laws
Unsupervised Natural Language Generation with Denoising Autoencoders
Chain, Generalization of Covering Code, and Deterministic Algorithm for k-SAT
Learning to Refine Human Pose Estimation
Multi-task Learning for Universal Sentence Representations: What Syntactic and Semantic Information is Captured?
Optimization of a plate with holes
A Stable and Effective Learning Strategy for Trainable Greedy Decoding
Genealogical distance under selection
Decoupling Structure and Lexicon for Zero-Shot Semantic Parsing
Coloring of cozero-divisor graphs of commutative von Neumann regular rings
Resolving the Lord’s Paradox
Multi-view registration of unordered range scans by fast correspondence propagation of multi-scale descriptors
DuoRC: Towards Complex Language Understanding with Paraphrased Reading Comprehension
Entire Space Multi-Task Model: An Effective Approach for Estimating Post-Click Conversion Rate
Best subset selection in linear regression via bi-objective mixed integer linear programming
On Associative Confounder Bias
Variational Inference In Pachinko Allocation Machines
Extrofitting: Enriching Word Representation and its Vector Space with Semantic Lexicons
Formal Verification of Platoon Control Strategies
Automated essay scoring with string kernels and word embeddings
Faster Shift-Reduce Constituent Parsing with a Non-Binary, Bottom-Up Strategy
Eval all, trust a few, do wrong to none: Comparing sentence generation models
Efficient Beam Training and Channel Estimation for Millimeter Wave Communications Under Mobility
Finer Tight Bounds for Coloring on Clique-Width
Neural Davidsonian Semantic Proto-role Labeling
Conditional heteroskedasticity in crypto-asset returns
Parallel Implementations of Cellular Automata for Traffic Models
Context-Attentive Embeddings for Improved Sentence Representations
Capacity of Multiple One-Bit Transceivers in a Rayleigh Environment
Macdonald denominators for affine root systems, orthogonal theta functions, and elliptic determinantal point processes
Global Convergence Analysis of the Flower Pollination Algorithm: A Discrete-Time Markov Chain Approach
Stability of the Stochastic Gradient Method for an Approximated Large Scale Kernel Machine
Learning in Games with Cumulative Prospect Theoretic Preferences
Sufficient conditions for the global rigidity of periodic graphs
Integrating Stance Detection and Fact Checking in a Unified Corpus
A 2/3-Approximation Algorithm for Vertex-weighted Matching in Bipartite Graphs
Tracing Equilibrium in Dynamic Markets via Distributed Adaptation
ShapeStacks: Learning Vision-Based Physical Intuition for Generalised Object Stacking
Synthesized Texture Quality Assessment via Multi-scale Spatial and Statistical Texture Attributes of Image and Gradient Magnitude Coefficients
Modeling and Experimental Verification of Adaptive 100% Stator Ground Fault Protection Schemes for Synchronous Generators
Angiodysplasia Detection and Localization Using Deep Convolutional Neural Networks
Ramanujan Graphs and Digraphs
New counts for the number of triangulations of cyclic polytopes
Cross-lingual Semantic Parsing
Learning Myelin Content in Multiple Sclerosis from Multimodal MRI through Adversarial Training
Predicting User Performance and Bitcoin Price Using Block Chain Transaction Network
First Impressions: A Survey on Computer Vision-Based Apparent Personality Trait Analysis
Semi-supervised User Geolocation via Graph Convolutional Networks
Multi-Head Decoder for End-to-End Speech Recognition
HeteroMed: Heterogeneous Information Network for Medical Diagnosis
Nonparametric Bayesian Instrumental Variable Analysis: Evaluating Heterogeneous Effects of Arterial Access Sites for Opening Blocked Blood Vessels
Query Focused Variable Centroid Vectors for Passage Re-ranking in Semantic Search
Adversarial Training for Community Question Answer Selection Based on Multi-scale Matching
Attenuate Locally, Win Globally: An Attenuation-based Framework for Online Stochastic Matching with Timeouts
A Scalable Neural Shortlisting-Reranking Approach for Large-Scale Domain Classification in Natural Language Understanding
Efficient Large-Scale Domain Classification with Personalized Attention
MQGrad: Reinforcement Learning of Gradient Quantization in Parameter Server
On a positivity preserving numerical scheme for jump-extended CIR process: the alpha-stable case
Spin torque oscillator for microwave assisted magnetization reversal
Inducing and Embedding Senses with Scaled Gumbel Softmax
A Spherical Probability Distribution Model of the User-Induced Mobile Phone Orientation
Anchor-based Nearest Class Mean Loss for Convolutional Neural Networks
Tunable glassiness on a two-dimensional atomic spin array
IIIDYT at SemEval-2018 Task 3: Irony detection in English tweets
Swarm robotics in wireless distributed protocol design for coordinating robots involved in cooperative tasks
A Primal-Dual Online Deterministic Algorithm for Matching with Delays
Rician $K$-Factor-Based Analysis of XLOS Service Probability in 5G Outdoor Ultra-Dense Networks
On the Mean Residence Time in Stochastic Lattice-Gas Models
Sampling in Uniqueness from the Potts and Random-Cluster Models on Random Regular Graphs
A constrained risk inequality for general losses
Performance Impact Caused by Hidden Bias of Training Data for Recognizing Textual Entailment
Matching Fingerphotos to Slap Fingerprint Images

Distilled News

A Comprehensive Guide to Understand and Implement Text Classification in Python

One of the most widely used natural language processing tasks across business problems is “Text Classification”. The goal of text classification is to automatically assign text documents to one or more predefined categories. Some examples of text classification are:
• Understanding audience sentiment from social media,
• Detection of spam and non-spam emails,
• Auto tagging of customer queries, and
• Categorization of news articles into defined topics.
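The classic recipe for this task can be sketched in a few lines. This is a minimal illustration, not the guide's actual code: the tiny corpus and labels below are made up, and it assumes scikit-learn is installed.

```python
# Minimal text-classification sketch: TF-IDF features feeding a
# linear classifier. The example documents are invented for illustration.
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Tiny illustrative corpus: spam vs. non-spam ("ham") emails.
texts = [
    "win a free prize now", "limited offer click here",
    "meeting at noon tomorrow", "please review the attached report",
]
labels = ["spam", "spam", "ham", "ham"]

# Pipeline: vectorize the raw text, then fit a logistic regression.
model = make_pipeline(TfidfVectorizer(), LogisticRegression())
model.fit(texts, labels)

pred = model.predict(["claim your free offer"])[0]
print(pred)
```

Real applications would swap in a labeled corpus, a train/test split, and possibly richer features or a neural model, but the fit/predict structure stays the same.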

Humans-in-the-Loop? Which Humans? Which Loop?

For people working in Artificial Intelligence, the term “Human-in-the-Loop” is familiar: a human in the process to validate and improve the AI. There are as many situations where it applies as there are AI applications. However, there are still some distinctly different ways it can be deployed, even within the same application.

Machine Learning is Fun Part 6: How to do Speech Recognition with Deep Learning

… But speech recognition has been around for decades, so why is it just now hitting the mainstream? The reason is that deep learning finally made speech recognition accurate enough to be useful outside of carefully controlled environments.

Text Classification using machine learning

The goal is to improve the category classification performance for a set of text posts. The evaluation metric is the macro F1 score.
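For reference, the macro F1 metric mentioned above computes an F1 score per class and averages them with equal class weight, so minority classes count as much as majority ones. A small sketch with invented labels, assuming scikit-learn:

```python
# Macro F1: per-class F1 scores averaged without class weighting.
from sklearn.metrics import f1_score

# Invented ground truth and predictions over three classes.
y_true = ["a", "a", "b", "b", "c", "c"]
y_pred = ["a", "a", "b", "c", "c", "c"]

# Per-class F1: a=1.0, b=2/3, c=0.8; macro F1 is their plain mean.
macro = f1_score(y_true, y_pred, average="macro")
print(round(macro, 3))
```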

Why Deep Learning is perfect for NLP (Natural Language Processing)

Deep learning brings multiple benefits by learning several levels of representation of natural language. Here we cover the motivation for using deep learning and distributed representations for NLP, word embeddings and several methods to compute them, and their applications.
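The core idea behind distributed representations can be shown in a few lines: each word is a dense vector, and semantic similarity is measured by cosine similarity between vectors. The three-dimensional vectors below are made up purely for illustration; real embeddings come from models such as word2vec or GloVe and have hundreds of dimensions.

```python
# Toy distributed representations: words as dense vectors, with
# similarity measured by cosine similarity. Vectors are invented.
import numpy as np

embeddings = {
    "king":  np.array([0.8, 0.6, 0.1]),
    "queen": np.array([0.7, 0.7, 0.2]),
    "apple": np.array([0.1, 0.2, 0.9]),
}

def cosine(u, v):
    """Cosine similarity: dot product of the normalized vectors."""
    return float(np.dot(u, v) / (np.linalg.norm(u) * np.linalg.norm(v)))

# Related words should score higher than unrelated ones.
king_queen = cosine(embeddings["king"], embeddings["queen"])
king_apple = cosine(embeddings["king"], embeddings["apple"])
print(king_queen > king_apple)
```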

NLP – Building a Question Answering model

I recently completed a course on NLP through Deep Learning (CS224N) at Stanford and loved the experience. I learnt a whole bunch of new things. For my final project I worked on a question answering model built on the Stanford Question Answering Dataset (SQuAD). In this blog, I want to cover the main building blocks of a question answering model.
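One building block common to SQuAD-style models (not necessarily the post's exact architecture) is span selection: the network scores each context token as a possible answer start and end, and the model extracts the highest-scoring valid span. A sketch with invented scores:

```python
# Span selection for extractive QA: given per-token start and end
# scores over the context, pick the best span with start <= end.
# The scores below are made up for illustration.
import numpy as np

start_scores = np.array([0.1, 2.0, 0.3, 0.2])
end_scores   = np.array([0.2, 0.4, 1.8, 0.1])

best, best_span = -np.inf, (0, 0)
for i in range(len(start_scores)):
    for j in range(i, len(end_scores)):  # enforce start <= end
        s = start_scores[i] + end_scores[j]
        if s > best:
            best, best_span = s, (i, j)

print(best_span)
```

Real systems also cap the span length and compute these scores from learned token representations, but the extraction step is essentially this argmax.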

Packaging Shiny applications: A deep dive

This post is long overdue. The information contained herein has been built up over years of deploying and hosting Shiny apps, particularly in production environments, and mainly where those Shiny apps are very large and contain a lot of code.


Weird but (sometimes) useful charts