DeepTracker: Visualizing the Training Process of Convolutional Neural Networks

Deep convolutional neural networks (CNNs) have achieved remarkable success in various fields. However, training an excellent CNN is practically a trial-and-error process that consumes a tremendous amount of time and computing resources. To accelerate the training process and reduce the number of trials, experts need to understand what has occurred in the training process and why the resulting CNN behaves as it does. However, current popular training platforms, such as TensorFlow, provide only limited, general information, such as training/validation errors, which is far from enough to serve this purpose. To bridge this gap and help domain experts with their training tasks in a practical environment, we propose a visual analytics system, DeepTracker, to facilitate the exploration of the rich dynamics of CNN training processes and to identify the unusual patterns hidden in the huge volume of training logs. Specifically, we combine a hierarchical index mechanism and a set of hierarchical small multiples to help experts explore the entire training log at different levels of detail. We also introduce a novel cube-style visualization to reveal the complex correlations among multiple types of heterogeneous training data, including neuron weights, validation images, and training iterations. Three case studies are conducted to demonstrate how DeepTracker provides its users with valuable knowledge in an industry-level CNN training process, namely, in our case, training ResNet-50 on the ImageNet dataset. We show that our method can be easily applied to other state-of-the-art ‘very deep’ CNN models.


Spectral-Pruning: Compressing deep neural network via spectral analysis

The model size of deep neural networks is growing larger and larger to achieve superior performance in complicated tasks. This makes it difficult to deploy deep neural networks on small edge-computing devices. To overcome this problem, model compression methods have been attracting much attention. However, there has been little theoretical background explaining what kind of quantity determines the compression ability. To resolve this issue, we develop a new theoretical framework for model compression and propose a new method called Spectral-Pruning based on the theory. Our theoretical analysis is based on the observation that the eigenvalues of the covariance matrix of the output from nodes in the internal layers often show rapid decay. We define the ‘degree of freedom’ to quantify the intrinsic dimensionality of the model by using the eigenvalue distribution and show that the compression ability is essentially controlled by this quantity. Along with this, we give a generalization error bound for the compressed model. Unlike existing methods, our proposed method is applicable to a wide range of models, e.g., ones that possess complicated branches as implemented in SegNet and ResNet. Our method makes use of both the ‘input’ and the ‘output’ of each layer and is easy to implement. We apply our method to several datasets to justify our theoretical analyses and show that the proposed method achieves state-of-the-art performance.
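
As a rough illustration of how an eigenvalue spectrum can quantify intrinsic dimensionality, the sketch below computes the sum of mu_j / (mu_j + lam) over the eigenvalues mu_j, a standard "effective degrees of freedom" quantity from ridge-type analyses. The abstract does not state the paper's exact definition, so this formula, the function name `degrees_of_freedom`, and the parameter `lam` are assumptions made for illustration only.

```python
def degrees_of_freedom(eigenvalues, lam):
    """Effective degrees of freedom: sum_j mu_j / (mu_j + lam).

    Assumed definition for illustration; `eigenvalues` are the non-negative
    eigenvalues of a layer's output covariance matrix and `lam` > 0.
    """
    if lam <= 0:
        raise ValueError("lam must be positive")
    return sum(mu / (mu + lam) for mu in eigenvalues)


# A rapidly decaying spectrum (mu_j = 2^-j) versus a flat one, both with 10 nodes.
decaying = [2.0 ** -j for j in range(10)]
flat = [1.0] * 10

print(degrees_of_freedom(decaying, lam=0.01))
print(degrees_of_freedom(flat, lam=0.01))
```

Under this assumed definition, the decaying spectrum yields a smaller value than the flat one, which matches the intuition that a layer with fast eigenvalue decay can be compressed to fewer nodes.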


Semantic-Unit-Based Dilated Convolution for Multi-Label Text Classification

We propose a novel model for multi-label text classification, which is based on sequence-to-sequence learning. The model generates higher-level semantic unit representations with multi-level dilated convolution as well as a corresponding hybrid attention mechanism that extracts information at both the word level and the semantic-unit level. Our designed dilated convolution effectively reduces dimension and supports an exponential expansion of receptive fields without loss of local information, and the attention-over-attention mechanism is able to capture more summary-relevant information from the source context. Results of our experiments show that the proposed model has significant advantages over the baseline models on the RCV1-V2 and Ren-CECps datasets, and our analysis demonstrates that our model is competitive with the deterministic hierarchical models and is more robust in classifying low-frequency labels.
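
The claim about exponentially expanding receptive fields can be checked with a little arithmetic. The sketch below uses the standard formula for stacked stride-1 dilated convolutions; the kernel width and dilation rates are illustrative choices, not the settings used in the paper.

```python
def receptive_field(kernel_size, dilations):
    """Receptive field (in input positions) of stacked 1-D dilated convolutions
    with stride 1: 1 + sum((kernel_size - 1) * d) over each layer's dilation d."""
    return 1 + sum((kernel_size - 1) * d for d in dilations)


# Three layers, kernel width 3: doubling dilations versus no dilation.
print(receptive_field(3, [1, 2, 4]))  # 15
print(receptive_field(3, [1, 1, 1]))  # 7
```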


Semi-Autoregressive Neural Machine Translation

Existing approaches to neural machine translation are typically autoregressive models. While these models attain state-of-the-art translation quality, they suffer from low parallelizability and are thus slow at decoding long sequences. In this paper, we propose a novel model for fast sequence generation: the semi-autoregressive Transformer (SAT). The SAT keeps the autoregressive property globally but relaxes it locally, and is thus able to produce multiple successive words in parallel at each time step. Experiments conducted on English-German and Chinese-English translation tasks show that the SAT achieves a good balance between translation quality and decoding speed. On WMT’14 English-German translation, the SAT achieves a 5.58× speedup while maintaining 88% translation quality, significantly better than the previous non-autoregressive methods. When producing two words at each time step, the SAT is almost lossless (only 1% degradation in BLEU score).
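
The decoding scheme described above can be sketched as a loop that emits a group of k tokens per step. This is a toy illustration only: `predict_group` is a hypothetical stand-in for a trained model, and the fixed `TARGET` sentence is made up so that the step counts can be checked.

```python
def semi_autoregressive_decode(predict_group, k, max_len, eos="</s>"):
    """Generate tokens k at a time.

    `predict_group(prefix, k)` is a hypothetical model call returning the next
    k tokens given everything generated so far. Tokens within a group come from
    one decoding step; groups are produced one after another.
    Returns (tokens, number_of_decoding_steps).
    """
    tokens, steps = [], 0
    while len(tokens) < max_len:
        group = predict_group(tokens, k)
        if not group:
            break
        steps += 1
        for tok in group:
            if tok == eos or len(tokens) >= max_len:
                return tokens, steps
            tokens.append(tok)
    return tokens, steps


TARGET = ["we", "propose", "a", "fast", "decoder", "</s>"]


def toy_model(prefix, k):
    # Stand-in for a trained model: reads the next k tokens from a fixed target.
    return TARGET[len(prefix):len(prefix) + k]


for k in (1, 2, 4):
    tokens, steps = semi_autoregressive_decode(toy_model, k, max_len=20)
    print(k, steps, tokens)
```

With k = 1 the loop reduces to ordinary autoregressive decoding; larger k cuts the number of sequential steps, which is the source of the speedup, at the cost of weaker dependencies inside each group.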


Evolutionary dynamics of cryptocurrency transaction networks: An empirical study

Cryptocurrency is a well-developed blockchain technology application that is currently a hot topic throughout the world. The public availability of transaction histories offers an opportunity to analyze and compare different cryptocurrencies. In this paper, we present a dynamic network analysis of three representative blockchain-based cryptocurrencies: Bitcoin, Ethereum, and Namecoin. By analyzing the accumulated network growth, we find that, unlike most other networks, these cryptocurrency networks do not always densify over time, and they change constantly, with relatively low node and edge repetition ratios. We therefore construct separate networks on a monthly basis, trace the changes of typical network characteristics (including degree distribution, degree assortativity, clustering coefficient, and the largest connected component) over time, and compare the three. We find that the degree distributions of these monthly transaction networks cannot be well fitted by the famous power-law distribution. At the same time, the currencies differ in their network properties: for example, both the Bitcoin and Ethereum networks are heavy-tailed with disassortative mixing, but only the former can be treated as a small world. These network properties reflect the evolutionary characteristics and competitive power of these three cryptocurrencies and provide a foundation for future research.
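
A minimal sketch of the monthly-network construction, assuming transactions arrive as `(month, sender, receiver)` records. The record layout, the function names, and the toy data are all hypothetical; the paper's own preprocessing is not described in the abstract.

```python
from collections import defaultdict


def monthly_networks(transactions):
    """Group (month, sender, receiver) records into one directed edge set per month."""
    edges_by_month = defaultdict(set)
    for month, sender, receiver in transactions:
        edges_by_month[month].add((sender, receiver))
    return dict(edges_by_month)


def summarize(edges):
    """Return (node count, edge count, edges per node) for one edge set."""
    nodes = {n for edge in edges for n in edge}
    return len(nodes), len(edges), len(edges) / len(nodes)


# Hypothetical toy records; real data would come from the public ledgers.
transactions = [
    ("2017-01", "a", "b"), ("2017-01", "a", "c"), ("2017-01", "a", "b"),
    ("2017-02", "d", "e"), ("2017-02", "e", "f"), ("2017-02", "f", "d"),
]

for month, edges in sorted(monthly_networks(transactions).items()):
    print(month, summarize(edges))
```

Tracking the edges-per-node ratio across months is one simple way to see whether a network densifies; the other characteristics named in the abstract would each need their own computation.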


Deep Learning: Computational Aspects

In this article we review computational aspects of Deep Learning (DL). Deep learning uses network architectures consisting of hierarchical layers of latent variables to construct predictors for high-dimensional input-output models. Training a deep learning architecture is computationally intensive, and efficient linear algebra libraries are key for training and inference. Stochastic gradient descent (SGD) optimization and batch sampling are used to learn from massive data sets.
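
To make the SGD-with-batch-sampling idea concrete, here is a minimal sketch on a one-variable linear model rather than a deep network. The learning rate, batch size, epoch count, and the synthetic data are arbitrary illustrative choices.

```python
import random


def sgd_linear_regression(xs, ys, lr=0.05, batch_size=4, epochs=200, seed=0):
    """Fit y ~ w * x + b by mini-batch SGD on the mean squared error."""
    rng = random.Random(seed)
    w, b = 0.0, 0.0
    indices = list(range(len(xs)))
    for _ in range(epochs):
        rng.shuffle(indices)  # batch sampling: a fresh random order each epoch
        for start in range(0, len(indices), batch_size):
            batch = indices[start:start + batch_size]
            # Gradients of the batch-mean squared error with respect to w and b.
            grad_w = sum(2 * (w * xs[i] + b - ys[i]) * xs[i] for i in batch) / len(batch)
            grad_b = sum(2 * (w * xs[i] + b - ys[i]) for i in batch) / len(batch)
            w -= lr * grad_w
            b -= lr * grad_b
    return w, b


xs = [i / 10 for i in range(20)]   # inputs in [0, 1.9]
ys = [3.0 * x - 1.0 for x in xs]   # noiseless targets from y = 3x - 1
w, b = sgd_linear_regression(xs, ys)
print(round(w, 2), round(b, 2))    # close to 3.0 and -1.0
```

Each update touches only a small batch, which is what lets the same procedure scale to data sets too large to process in one pass.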


Detecting Outliers in Data with Correlated Measures

Advances in sensor technology have enabled the collection of large-scale datasets. Such datasets can be extremely noisy and often contain a significant number of outliers that result from sensor malfunction or human operation faults. In order to utilize such data for real-world applications, it is critical to detect outliers so that models built from these datasets will not be skewed by them. In this paper, we propose a new outlier detection method that utilizes the correlations in the data (e.g., taxi trip distance vs. trip time). Unlike existing outlier detection methods, we build a robust regression model that explicitly models the outliers and detects them simultaneously with the model fitting. We validate our approach on real-world datasets against methods specifically designed for each dataset as well as state-of-the-art outlier detectors. Our outlier detection method achieves better performance, demonstrating its robustness and generality. Last, we report interesting case studies on some outliers that result from atypical events.
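
One common way to model outliers explicitly inside a regression is to give every point its own shift term and shrink most shifts to zero. The sketch below uses that mean-shift formulation with soft-thresholding; it is an assumption chosen for illustration, since the abstract does not give the paper's actual model, and all function names and the `threshold` value are hypothetical.

```python
def fit_line(xs, ys):
    """Ordinary least squares for y ~ a * x + b."""
    n = len(xs)
    mean_x, mean_y = sum(xs) / n, sum(ys) / n
    sxx = sum((x - mean_x) ** 2 for x in xs)
    sxy = sum((x - mean_x) * (y - mean_y) for x, y in zip(xs, ys))
    a = sxy / sxx
    return a, mean_y - a * mean_x


def soft_threshold(r, t):
    """Shrink r toward zero by t; values with |r| <= t become exactly 0."""
    if r > t:
        return r - t
    if r < -t:
        return r + t
    return 0.0


def robust_fit(xs, ys, threshold, iterations=50):
    """Alternate between fitting the line and estimating per-point shifts s_i.

    Model (an assumption for this sketch): y_i = a * x_i + b + s_i + noise,
    with most s_i equal to zero. Points with s_i != 0 are reported as outliers.
    """
    shifts = [0.0] * len(xs)
    for _ in range(iterations):
        a, b = fit_line(xs, [y - s for y, s in zip(ys, shifts)])
        shifts = [soft_threshold(y - (a * x + b), threshold) for x, y in zip(xs, ys)]
    outliers = [i for i, s in enumerate(shifts) if s != 0.0]
    return a, b, outliers


xs = list(range(10))
ys = [2.0 * x + 1.0 for x in xs]
ys[5] += 50.0  # inject one gross error
a, b, outliers = robust_fit(xs, ys, threshold=10.0)
print(outliers)  # [5]
```

Because the shift terms absorb most of the gross error, the fitted slope stays near the true value of 2, and the flagged indices fall out of the fit itself rather than from a separate detection pass.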


Predicting Semantic Relations using Global Graph Properties

Semantic graphs, such as WordNet, are resources which curate natural language on two distinguishable layers. On the local level, individual relations between synsets (semantic building blocks) such as hypernymy and meronymy enhance our understanding of the words used to express their meanings. Globally, analysis of graph-theoretic properties of the entire net sheds light on the structure of human language as a whole. In this paper, we combine global and local properties of semantic graphs through the framework of Max-Margin Markov Graph Models (M3GM), a novel extension of Exponential Random Graph Model (ERGM) that scales to large multi-relational graphs. We demonstrate how such global modeling improves performance on the local task of predicting semantic relations between synsets, yielding new state-of-the-art results on the WN18RR dataset, a challenging version of WordNet link prediction in which ‘easy’ reciprocal cases are removed. In addition, the M3GM model identifies multirelational motifs that are characteristic of well-formed lexical semantic ontologies.


Predefined Sparseness in Recurrent Sequence Models

Inducing sparseness while training neural networks has been shown to yield models with a lower memory footprint but similar effectiveness to dense models. However, sparseness is typically induced starting from a dense model, and thus this advantage does not hold during training. We propose techniques to enforce sparseness upfront in recurrent sequence models for NLP applications, to also benefit training. First, in language modeling, we show how to increase hidden state sizes in recurrent layers without increasing the number of parameters, leading to more expressive models. Second, for sequence labeling, we show that word embeddings with predefined sparseness lead to similar performance as dense embeddings, at a fraction of the number of trainable parameters.


Don’t Give Me the Details, Just the Summary! Topic-Aware Convolutional Neural Networks for Extreme Summarization

We introduce extreme summarization, a new single-document summarization task which does not favor extractive strategies and calls for an abstractive modeling approach. The idea is to create a short, one-sentence news summary answering the question ‘What is the article about?’. We collect a real-world, large-scale dataset for this task by harvesting online articles from the British Broadcasting Corporation (BBC). We propose a novel abstractive model which is conditioned on the article’s topics and based entirely on convolutional neural networks. We demonstrate experimentally that this architecture captures long-range dependencies in a document and recognizes pertinent content, outperforming an oracle extractive system and state-of-the-art abstractive approaches when evaluated automatically and by humans.


What Makes Natural Scene Memorable?

Recent studies on image memorability have shed light on the visual features that make generic images, object images or face photographs memorable. However, a clear understanding and reliable estimation of natural scene memorability remain elusive. In this paper, we attempt to answer the question: ‘what exactly makes a natural scene memorable?’ Specifically, we first build LNSIM, a large-scale natural scene image memorability database (containing 2,632 images and memorability annotations). Then, we mine our database to investigate how low-, middle- and high-level handcrafted features affect the memorability of natural scenes. In particular, we find that the high-level feature of scene category is rather correlated with natural scene memorability. Thus, we propose a deep neural network based natural scene memorability (DeepNSM) predictor, which takes advantage of scene category. Finally, the experimental results validate the effectiveness of DeepNSM.


Adaptive Structural Learning of Deep Belief Network for Medical Examination Data and Its Knowledge Extraction by using C4.5

Deep Learning has a hierarchical network architecture to represent the complicated features of input patterns. The adaptive structural learning method of Deep Belief Network (DBN) has been developed. The method can discover an optimal number of hidden neurons for given input data in a Restricted Boltzmann Machine (RBM) by a neuron generation-annihilation algorithm, and generate a new hidden layer in the DBN by an extension of the algorithm. In this paper, the proposed adaptive structural learning of DBN was applied to comprehensive medical examination data for cancer prediction. The prediction system shows higher classification accuracy (99.8% for training and 95.5% for test) than the traditional DBN. Moreover, explicit knowledge about the relation between input and output patterns was extracted from the trained DBN network by C4.5. Some characteristics, extracted in the form of IF-THEN rules for finding cancer at an early stage, are reported in this paper.


Dynamical systems theory for causal inference with application to synthetic control methods

To estimate treatment effects in panel data, suitable control units need to be selected to generate counterfactual outcomes. To guard against cherry-picking of potential controls, which is an important concern in practice, we leverage results from dynamical systems theory. Specifically, key results on delay embeddings in dynamical systems (Takens, 1981) show that under fairly general assumptions a dynamical system can be reconstructed up to a one-to-one mapping from scalar observations of the system. This suggests a quantified measure of strength of the dynamical relationship between any two time series variables. The key idea in this paper is to use this measure to ensure that selected control units are dynamically related to treated units, and thus guard against cherry-picking of controls. We illustrate our approach on the synthetic control methodology of Abadie and Gardeazabal (2003), which generates counterfactuals using a model of treated unit outcomes fitted on outcomes from control units. In this setting, we propose to screen out control units that have a weak dynamical relationship to the single treated unit before the model is fit. In simulated studies, we show that the standard synthetic control methodology can be biased towards any desirable direction by adversarially creating artificial control units, but the bias is largely mitigated if we apply the aforementioned screening. In real-world applications, the proposed approach contributes to more reliable control selection, and thus more robust estimation of treatment effects.
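
The delay-embedding step behind this result is simple to write down: each scalar observation is stacked with its own lagged values to form a point in a reconstructed state space. The sketch below shows only that step; the embedding dimension `dim` and `lag` are illustrative parameters, and the paper's measure of dynamical strength, which the abstract does not specify, is not implemented here.

```python
def delay_embed(series, dim, lag=1):
    """Return delay vectors (x_t, x_{t-lag}, ..., x_{t-(dim-1)*lag}) for every
    t at which all lagged values exist."""
    first = (dim - 1) * lag
    return [
        tuple(series[t - j * lag] for j in range(dim))
        for t in range(first, len(series))
    ]


print(delay_embed([1, 2, 3, 4, 5], dim=3, lag=1))
# [(3, 2, 1), (4, 3, 2), (5, 4, 3)]
```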


A new Taxonomy of Continuous Global Optimization Algorithms

Surrogate-based optimization and nature-inspired metaheuristics have become the state-of-the-art in solving real-world optimization problems. Still, it is difficult for beginners and even experts to get an overview that explains their advantages in comparison to the large number of available methods in the scope of continuous optimization. Available taxonomies lack the integration of surrogate-based approaches and thus their embedding in the larger context of this broad field. This article presents a taxonomy of the field, which further matches the idea of nature-inspired algorithms, as it is based on the human behavior in path finding. Intuitive analogies make it easy to conceive the most basic principles of the search algorithms, even for beginners and non-experts in this area of research. However, this scheme does not oversimplify the high complexity of the different algorithms, as the class identifier only defines a descriptive meta-level of the algorithm search strategies. The taxonomy was established by exploring and matching algorithm schemes, extracting similarities and differences, and creating a set of classification indicators to distinguish between five distinct classes. In practice, this taxonomy allows recommendations for the applicability of the corresponding algorithms and helps developers trying to create or improve their own algorithms.


A Study of Reinforcement Learning for Neural Machine Translation

Recent studies have shown that reinforcement learning (RL) is an effective approach for improving the performance of neural machine translation (NMT) systems. However, due to its instability, successful RL training is challenging, especially in real-world systems where deep models and large datasets are leveraged. In this paper, taking several large-scale translation tasks as testbeds, we conduct a systematic study on how to train better NMT models using reinforcement learning. We provide a comprehensive comparison of several important factors (e.g., baseline reward, reward shaping) in RL training. Furthermore, since it remains unclear whether RL is still beneficial when monolingual data is used, we propose a new method that leverages RL to further boost the performance of NMT systems trained with source/target monolingual data. By integrating all our findings, we obtain competitive results on the WMT14 English-German, WMT17 English-Chinese, and WMT17 Chinese-English translation tasks, in particular setting a new state of the art on the WMT17 Chinese-English translation task.


Piecewise Linear Approximation in Data Streaming: Algorithmic Implementations and Experimental Analysis

Piecewise Linear Approximation (PLA) is a well-established tool to reduce the size of the representation of time series by approximating the series by a sequence of line segments while keeping the error introduced by the approximation within some predetermined threshold. With the recent rise of edge computing, PLA algorithms find a completely new set of applications with the emphasis on reducing the volume of streamed data. In this study, we identify two scenarios set in a data-stream processing context: data reduction in sensor transmissions and datacenter storage. In connection to those scenarios, we identify several streaming metrics and propose streaming protocols as algorithmic implementations of the state-of-the-art PLA techniques. In an experimental evaluation, we measure the quality of the reviewed methods and protocols and evaluate their performance against those streaming statistics. All known methods have deficiencies when it comes to handling streaming-like data, e.g. inflation of the input stream, high latency or poor average error. Our experimental results highlight the challenges raised when transferring those classical methods into the stream processing world and present alternative techniques to overcome them and balance the related trade-offs.
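
For readers unfamiliar with PLA, the sketch below shows a naive greedy variant that grows each segment until the error threshold would be violated. It is not one of the methods or protocols evaluated in the paper, and it assumes evenly spaced samples held in memory rather than a true stream; the function name and `eps` value are illustrative.

```python
def greedy_pla(ys, eps):
    """Approximate ys[0..n-1] (sampled at x = 0, 1, 2, ...) by connected line
    segments, each joining two samples, so that every sample lies within eps
    of its segment. Returns a list of (start_index, end_index) pairs."""
    def fits(start, end):
        slope = (ys[end] - ys[start]) / (end - start)
        return all(
            abs(ys[start] + slope * (i - start) - ys[i]) <= eps
            for i in range(start + 1, end)
        )

    segments, start = [], 0
    while start < len(ys) - 1:
        end = start + 1
        # Extend the segment one sample at a time while the error bound holds.
        while end + 1 < len(ys) and fits(start, end + 1):
            end += 1
        segments.append((start, end))
        start = end
    return segments


print(greedy_pla([0, 1, 2, 3, 4, 3, 2, 1, 0], eps=0.1))
# [(0, 4), (4, 8)]
```

Nine samples collapse to two segments here. Note that a segment can only be emitted once the next sample breaks the bound, which hints at the latency trade-off the abstract mentions.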


Extracting Sentiment Attitudes From Analytical Texts

In this paper we present the RuSentRel corpus, which consists of analytical texts in the sphere of international relations. For each document we annotated the sentiments of the author toward mentioned named entities, and the sentiments of relations between mentioned entities. In the current experiments, we considered the problem of extracting sentiment relations between entities for whole documents as a three-class machine learning task. We experimented with conventional machine-learning methods (Naive Bayes, SVM, Random Forest).


Nowcasting the Stance of Social Media Users in a Sudden Vote: The Case of the Greek Referendum
Scale Drift Correction of Camera Geo-Localization using Geo-Tagged Images
Rain Streak Removal for Single Image via Kernel Guided CNN
Whitney’s Theorem, Triangular Sets and Probabilistic Descent on Manifolds
Doubly Robust Sure Screening for Elliptical Copula Regression Model
Hamilton cycles in vertex-transitive graphs of order a product of two primes
On the joint distribution of the marginals of multipartite random quantum states
Convolutional Neural Networks for Aerial Vehicle Detection and Recognition
Analyzing Learned Representations of a Deep ASR Performance Prediction Model
Malliavin regularity and weak approximation of semilinear SPDE with Lévy noise
Title-Guided Encoding for Keyphrase Generation
Automatic 3D bi-ventricular segmentation of cardiac images by a shape-constrained multi-task deep learning approach
Vector Approximate Message Passing Algorithm for Structured Perturbed Sensing Matrix
Hypercoercivity of Piecewise Deterministic Markov Process-Monte Carlo
Asymptotically good edge correspondence colouring
CGIntrinsics: Better Intrinsic Image Decomposition through Physically-Based Rendering
Label and Sample: Efficient Training of Vehicle Object Detector from Sparsely Labeled Data
A Perspective on Unique Information: Directionality, Intuitions, and Secret Key Agreement
Adversarially Regularising Neural NLI Models to Integrate Logical Background Knowledge
Single Image Dehazing Based on Generic Regularity
Ensemble Learning Applied to Classify GPS Trajectories of Birds into Male or Female
Online Human Activity Recognition using Low-Power Wearable Devices
Autonomous Driving without a Burden: View from Outside with Elevated LiDAR
Discriminative but Not Discriminatory: A Comparison of Fairness Definitions under Different Worldviews
Semi-Supervised Event Extraction with Paraphrase Clusters
Bayesian inference for a single factor copula stochastic volatility model using Hamiltonian Monte Carlo
Identifying Domain Adjacent Instances for Semantic Parsers
Multi-Level Network Embedding with Boosted Low-Rank Matrix Approximation
Secrecy Performance Analysis of UAV Transmissions Subject to Eavesdropping and Jamming
Rule Module Inheritance with Modification Restrictions
Strong and Weak Optimizations in Classical and Quantum Models of Stochastic Processes
Scientific Relation Extraction with Selectively Incorporated Concept Embeddings
The Disparate Effects of Strategic Classification
Localized solar power prediction based on weather data from local history and global forecasts
Fast Super-resolution 3D SAR Imaging Using an Unfolded Deep Network
Novel Time Asynchronous NOMA schemes for Downlink Transmissions
Fast and Accurate Recognition of Chinese Clinical Named Entities with Residual Dilated Convolutions
Approach for Video Classification with Multi-label on YouTube-8M Dataset
IIIDYT at IEST 2018: Implicit Emotion Classification With Deep Contextualized Word Representations
Exploring the Applications of Faster R-CNN and Single-Shot Multi-box Detection in a Smart Nursery Domain
Tableau Correspondences and Representation Theory
Regression Adjustments for Estimating the Global Treatment Effect in Experiments with Interference
HMS-Net: Hierarchical Multi-scale Sparsity-invariant Network for Sparse Depth Completion
Empirical Analysis of Common Subgraph Isomorphism Approaches to the Lost-in-Space Star Identification Problem
Deeply Supervised Depth Map Super-Resolution as Novel View Synthesis
Stereo Computation for a Single Mixture Image
Explicit 3-colorings for exponential graphs
Generalized Capsule Networks with Trainable Routing Procedure
Augmenting Bottleneck Features of Deep Neural Network Employing Motor State for Speech Recognition at Humanoid Robots
Generating Text through Adversarial Training using Skip-Thought Vectors
Is the Sibuya distribution a progeny?
Bisplit graphs satisfy the Chen-Chvátal conjecture
Harnack Inequality and Applications for SDEs Driven by $G$-Brownian motion
Wide Activation for Efficient and Accurate Image Super-Resolution
On determination of Zero-sum $\ell$-generalized Schur Numbers for some linear equations
simNet: Stepwise Image-Topic Merging Network for Generating Detailed and Comprehensive Image Captions
Stars of Empty Simplices
Human migration patterns in large scale spatial with the resume data
Comparing Attention-based Convolutional and Recurrent Neural Networks: Success and Limitations in Machine Reading Comprehension
Automorphisms of Kronrod-Reeb graphs of Morse functions on compact surfaces
Hadamard full propelinear codes with associated group $C_{2t}\times C_2$; rank and kernel
Generalisation in humans and deep neural networks
Learning from Positive and Unlabeled Data under the Selected At Random Assumption
Natural Language Inference with Hierarchical BiLSTM Max Pooling Architecture
On the convergence of optimistic policy iteration for stochastic shortest path problem
Intrinsic wavelet regression for surfaces of Hermitian positive definite matrices
Identifiability of Low-Rank Sparse Component Analysis
Learning behavioral context recognition with multi-stream temporal convolutional networks
Solving Partition Problems Almost Always Requires Pushing Many Vertices Around
Learning Multilingual Word Embeddings in a Latent Metric Space: A Geometric Approach
Field Formulation of Parzen Data Analysis
Deep Stochastic Attraction and Repulsion Embedding for Image Based Localization
Improving Cross-Lingual Word Embeddings by Meeting in the Middle
Amobee at IEST 2018: Transfer Learning from Language Models
Sparsity in Deep Neural Networks – An Empirical Investigation with TensorQuant
Transparent Tx and Rx Waveform Processing for 5G New Radio Mobile Communications
A Directed Information Learning Framework for Event-Driven M2M Traffic Prediction
SPULTRA: Low-Dose CT Image Reconstruction with Joint Statistical and Learned Image Models
Empirical likelihood for linear models with spatial errors
An Auto-Encoder Matching Model for Learning Utterance-Level Semantic Dependency in Dialogue Generation
Theoretical Foundations of the A2RD Project: Part I
The Martin Gardner Polytopes
Beyond expectation: Deep joint mean and quantile regression for spatio-temporal problems
Discriminative Representation Combinations for Accurate Face Spoofing Detection
Attentive Sequence to Sequence Translation for Localizing Clips of Interest by Natural Language Descriptions
Exponential inequalities for nonstationary Markov Chains
The Complexity of Student-Project-Resource Matching-Allocation Problems
A Monotone Preservation Result for Boolean Queries Expressed as a Containment of Conjunctive Queries
Gradient-based Training of Slow Feature Analysis by Differentiable Approximate Whitening
Real-Time MDNet
A strong baseline for question relevancy ranking
Analysis of temporal properties of wind extremes
WiSeBE: Window-based Sentence Boundary Evaluation
Multi-operator spectrum sharing using matching game in small cells network
Binary additive MRD codes with minimum distance n-1 must contain a semifield spread set
Central limit theorems for non-symmetric random walks on nilpotent covering graphs: Part II
Summarizing Opinions: Aspect Extraction Meets Sentiment Prediction and They Are Both Weakly Supervised
Accelerating Asynchronous Stochastic Gradient Descent for Neural Machine Translation
Facial Information Recovery from Heavily Damaged Images using Generative Adversarial Network- PART 1
An exactly solvable record model for rainfall
BézierGAN: Automatic Generation of Smooth Curves from Interpretable Low-Dimensional Parameters
A note on palindromic length of Sturmian sequences
Improved Breast Mass Segmentation in Mammograms with Conditional Residual U-net
Realizing quantum linear regression with auxiliary qumodes
Which Emoji Talks Best for My Picture?
Random generation under the Ewens distribution
Efficient Data Ingestion and Query Processing for LSM-Based Storage Systems
Phase transition for the interchange and quantum Heisenberg models on the Hamming graph
Fair redistricting is hard
Statistics on Multisets
Communication-Rounds Tradeoffs for Common Randomness and Secret Key Generation
Efficient size estimation and impossibility of termination in uniform dense population protocols
Deep Learning for Stress Field Prediction Using Convolutional Neural Networks
Turning Cliques into Paths to Achieve Planarity
Opportunistic Treating Interference as Noise
Smoothed Dilated Convolutions for Improved Dense Prediction
Unsupervised Multilingual Word Embeddings
Locality of the critical probability for transitive graphs of exponential growth
Improving Information Extraction from Images with Learned Semantic Models
Why Self-Attention? A Targeted Evaluation of Neural Machine Translation Architectures
Dissecting Contextual Word Embeddings: Architecture and Representation