Sparkle: Optimizing Spark for Large Memory Machines and Analytics

Spark is an in-memory analytics platform that targets commodity server environments today. It relies on the Hadoop Distributed File System (HDFS) to persist intermediate checkpoint states and final processing results. In Spark, immutable data are used for storing data updates in each iteration, making it inefficient for long running, iterative workloads. A non-deterministic garbage collector further worsens this problem. Sparkle is a library that optimizes memory usage in Spark. It exploits large shared memory to achieve better data shuffling and intermediate storage. Sparkle replaces the current TCP/IP-based shuffle with a shared memory approach and proposes an off-heap memory store for efficient updates. We performed a series of experiments on scale-out clusters and scale-up machines. The optimized shuffle engine leveraging shared memory provides 1.3x to 6x faster performance relative to Vanilla Spark. The off-heap memory store along with the shared-memory shuffle engine provides more than 20x performance increase on a probabilistic graph processing workload that uses a large-scale real-world hyperlink graph. While Sparkle benefits most from running on large memory machines, it also achieves 1.6x to 5x performance improvements over a scale-out cluster with an equivalent hardware setting.

Steiner Distance in Graphs–A Survey

For a connected graph $G$ of order at least 2 and $S\subseteq V(G)$, the Steiner distance $d_G(S)$ among the vertices of $S$ is the minimum size among all connected subgraphs whose vertex sets contain $S$. In this paper, we summarize the known results on the Steiner distance parameters, including Steiner distance, Steiner diameter, Steiner center, Steiner median, Steiner interval, Steiner distance hereditary graph, Steiner distance stable graph, average Steiner distance, and Steiner Wiener index. It also contains some conjectures and open problems for further studies.
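The definition above can be made concrete with a small brute-force sketch. It is only an illustration of the definition, not a method from the survey: it assumes the graph is given as an adjacency dictionary, measures size as the number of edges, and uses the fact that a smallest connected subgraph containing S is a tree, so a connected vertex set T containing S contributes |T| - 1 edges. The function names are hypothetical, and the search is exponential in the number of vertices.

```python
from itertools import combinations


def is_connected(adj, vertices):
    """Return True if the subgraph induced by `vertices` is connected."""
    vertices = set(vertices)
    if not vertices:
        return True
    start = next(iter(vertices))
    seen = {start}
    stack = [start]
    while stack:
        u = stack.pop()
        for v in adj[u]:
            if v in vertices and v not in seen:
                seen.add(v)
                stack.append(v)
    return seen == vertices


def steiner_distance(adj, S):
    """Brute-force Steiner distance of S in the graph `adj` (vertex -> neighbours).

    Tries vertex sets T containing S in order of increasing size; the first
    T that induces a connected subgraph has a spanning tree with |T| - 1
    edges, which is the minimum size of a connected subgraph containing S.
    """
    S = set(S)
    others = [v for v in adj if v not in S]
    for k in range(len(others) + 1):
        for extra in combinations(others, k):
            if is_connected(adj, S | set(extra)):
                return len(S) + k - 1
    raise ValueError("S is not contained in a single connected component")


# Star with centre 0 and leaves 1, 2, 3: joining the leaves needs the centre.
star = {0: [1, 2, 3], 1: [0], 2: [0], 3: [0]}
# Path 0 - 1 - 2 - 3: for two vertices this is the ordinary distance.
path = {0: [1], 1: [0, 2], 2: [1, 3], 3: [2]}
print(steiner_distance(star, {1, 2, 3}))  # 3
print(steiner_distance(path, {0, 3}))     # 3
```

For |S| = 2 the value coincides with the usual shortest-path distance, which is why the Steiner distance is viewed as a generalization of it.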

Semi-supervised Conditional GANs

We introduce a new model for building conditional generative models in a semi-supervised setting to conditionally generate data given attributes by adapting the GAN framework. The proposed semi-supervised GAN (SS-GAN) model uses a pair of stacked discriminators to learn the marginal distribution of the data, and the conditional distribution of the attributes given the data respectively. In the semi-supervised setting, the marginal distribution (which is often harder to learn) is learned from the labeled + unlabeled data, and the conditional distribution is learned purely from the labeled data. Our experimental results demonstrate that this model performs significantly better compared to existing semi-supervised conditional GAN models.

A Brief Survey of Deep Reinforcement Learning

Deep reinforcement learning is poised to revolutionise the field of AI and represents a step towards building autonomous systems with a higher level understanding of the visual world. Currently, deep learning is enabling reinforcement learning to scale to problems that were previously intractable, such as learning to play video games directly from pixels. Deep reinforcement learning algorithms are also applied to robotics, allowing control policies for robots to be learned directly from camera inputs in the real world. In this survey, we begin with an introduction to the general field of reinforcement learning, then progress to the main streams of value-based and policy-based methods. Our survey will cover central algorithms in deep reinforcement learning, including the deep Q-network, trust region policy optimisation, and asynchronous advantage actor-critic. In parallel, we highlight the unique advantages of deep neural networks, focusing on visual understanding via reinforcement learning. To conclude, we describe several current areas of research within the field.

Agent-based computing from multi-agent systems to agent-based Models: a visual survey

Agent-Based Computing is a diverse research domain concerned with the building of intelligent software based on the concept of ‘agents’. In this paper, we use Scientometric analysis to analyze all sub-domains of agent-based computing. Our data consists of 1,064 journal articles indexed in the ISI Web of Knowledge and published during a twenty-year period: 1990-2010. These were retrieved using a topic search with various keywords commonly used in sub-domains of agent-based computing. In our proposed approach, we have employed a combination of two applications for analysis, namely Network Workbench and CiteSpace: Network Workbench allowed for the analysis of complex network aspects of the domain, while detailed visualization-based analysis of the bibliographic data was performed using CiteSpace. Our results include the identification of the largest cluster based on keywords, the timeline of publication of index terms, the core journals and key subject categories. We also identify the core authors, top countries of origin of the manuscripts along with core research institutes. Finally, our results have interestingly revealed the strong presence of agent-based computing in a number of non-computing related scientific domains including Life Sciences, Ecological Sciences and Social Sciences.

Transitory Queueing Networks

Queueing networks are notoriously difficult to analyze sans both Markovian and stationarity assumptions. Much of the theoretical contribution towards performance analysis of time-inhomogeneous single class queueing networks has focused on Markovian networks, with the recent exception of work in Liu and Whitt (2011) and Mandelbaum and Ramanan (2010). In this paper, we introduce transitory queueing networks as a model of inhomogeneous queueing networks, where a large, but finite, number of jobs arrive at queues in the network over a fixed time horizon. The queues offer FIFO service, and we assume that the service rate can be time-varying. The non-Markovian dynamics of this model complicate the analysis of network performance metrics, necessitating approximations. In this paper we develop fluid and diffusion approximations to the number-in-system performance metric by scaling up the number of external arrivals to each queue, following Honnappa et al. (2014). We also discuss the implications for bottleneck detection in tandem queueing networks.

Incremental Import Vector Machines for Classifying Hyperspectral Data

In this paper we propose an incremental learning strategy for import vector machines (IVM), which is a sparse kernel logistic regression approach. We use the procedure for the concept of self-training for sequential classification of hyperspectral data. The strategy comprises the inclusion of new training samples to increase the classification accuracy and the deletion of non-informative samples to be memory- and runtime-efficient. Moreover, we update the parameters in the incremental IVM model without re-training from scratch. Therefore, the incremental classifier is able to deal with large data sets. The performance of the IVM in comparison to support vector machines (SVM) is evaluated in terms of accuracy, and experiments are conducted to assess the potential of the probabilistic outputs of the IVM. Experimental results demonstrate that the IVM and SVM perform similarly in terms of classification accuracy. However, the number of import vectors is significantly lower than the number of support vectors, and thus the computation time during classification can be decreased. Moreover, the probabilities provided by the IVM are more reliable than the probabilistic information derived from an SVM’s output. In addition, the proposed self-training strategy can increase the classification accuracy. Overall, the IVM and its incremental version are worthwhile for the classification of hyperspectral data.

Boltzmann machines for time-series

We review Boltzmann machines extended for time-series. These models often have recurrent structure, and backpropagation through time (BPTT) is used to learn their parameters. The per-step computational complexity of BPTT in online learning, however, grows linearly with respect to the length of the preceding time-series (i.e., the learning rule is not local in time), which limits the applicability of BPTT in online learning. We then review dynamic Boltzmann machines (DyBMs), whose learning rule is local in time. DyBM’s learning rule relates to spike-timing dependent plasticity (STDP), which has been postulated and experimentally confirmed for biological neural networks.

A Capacity Scaling Law for Artificial Neural Networks

In this article, we derive the calculation of two critical numbers that quantify the capabilities of artificial neural networks with gating functions, such as sign, sigmoid, or rectified linear units. First, we derive the calculation of the Vapnik-Chervonenkis dimension of a network with binary output layer, which is the theoretical limit for perfect fitting of the training data. Second, we derive what we call the MacKay dimension of the network. This is a theoretical limit indicating necessary catastrophic forgetting, i.e., the upper limit for most uses of the network. Our derivation of the capacity is embedded into a Shannon communication model, which allows measuring the capacities of neural networks in bits. We then compare our theoretical derivations with experiments using different network configurations, diverse neural network implementations, varying activation functions, and several learning algorithms to confirm our upper bound. The result is that the capacity of a fully connected perceptron network scales strictly linearly with the number of weights.

Improving Deep Learning using Generic Data Augmentation

Deep artificial neural networks require a large corpus of training data in order to effectively learn, where collection of such training data is often expensive and laborious. Data augmentation overcomes this issue by artificially inflating the training set with label preserving transformations. Recently there has been extensive use of generic data augmentation to improve Convolutional Neural Network (CNN) task performance. This study benchmarks various popular data augmentation schemes to allow researchers to make informed decisions as to which training methods are most appropriate for their data sets. Various geometric and photometric schemes are evaluated on a coarse-grained data set using a relatively simple CNN. Experimental results, run using 4-fold cross-validation and reported in terms of Top-1 and Top-5 accuracy, indicate that cropping in geometric augmentation significantly increases CNN task performance.
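As an illustration of label-preserving transformations, the sketch below inflates a toy data set with one geometric scheme each for flipping and cropping. It is a minimal example under stated assumptions, not the benchmark setup of the study: images are plain nested lists of pixel values, the helper names are hypothetical, and a real pipeline would resize crops (or crop every input) so that all images share one shape.

```python
import random


def horizontal_flip(image):
    """Mirror an image (list of rows) left to right."""
    return [row[::-1] for row in image]


def random_crop(image, crop_h, crop_w, rng):
    """Cut a crop_h x crop_w window at a random position inside the image."""
    h, w = len(image), len(image[0])
    top = rng.randint(0, h - crop_h)    # randint is inclusive on both ends
    left = rng.randint(0, w - crop_w)
    return [row[left:left + crop_w] for row in image[top:top + crop_h]]


def augment(dataset, crop_h, crop_w, seed=0):
    """Return the original samples plus a flipped and a cropped copy of each.

    Every derived image keeps the label of the image it came from, which is
    what makes the transformation label preserving.
    """
    rng = random.Random(seed)
    augmented = []
    for image, label in dataset:
        augmented.append((image, label))
        augmented.append((horizontal_flip(image), label))
        augmented.append((random_crop(image, crop_h, crop_w, rng), label))
    return augmented


toy_image = [[1, 2, 3], [4, 5, 6], [7, 8, 9]]
toy_dataset = [(toy_image, "cat")]
inflated = augment(toy_dataset, crop_h=2, crop_w=2)
print(len(inflated))  # 3 samples from 1 original
```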

Neural Block Sampling

Efficient Monte Carlo inference often requires manual construction of model-specific proposals. We propose an approach to automated proposal construction by training neural networks to provide fast approximations to block Gibbs conditionals. The learned proposals generalize to occurrences of common structural motifs both within a given model and across models, allowing for the construction of a library of learned inference primitives that can accelerate inference on unseen models with no model-specific training required. We explore several applications including open-universe Gaussian mixture models, in which our learned proposals outperform a hand-tuned sampler, and a real-world named entity recognition task, in which our sampler’s ability to escape local modes yields higher final F1 scores than single-site Gibbs.

nuts-flow/ml: data pre-processing for deep learning

Data preprocessing is a fundamental part of any machine learning application and frequently the most time-consuming aspect when developing a machine learning solution. Preprocessing for deep learning is characterized by pipelines that lazily load data and perform data transformation, augmentation, batching and logging. Many of these functions are common across applications but require different arrangements for training, testing or inference. Here we introduce a novel software framework named nuts-flow/ml that encapsulates common preprocessing operations as components, which can be flexibly arranged to rapidly construct efficient preprocessing pipelines for deep learning.
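The idea of composable, lazily evaluated preprocessing stages can be sketched with plain Python generators. This deliberately does not use the nuts-flow/ml API, whose actual interface is not described in the abstract; the stage names `transform`, `batch`, and `build_pipeline` are hypothetical.

```python
from itertools import islice


def transform(samples, fn):
    """Lazily apply `fn` to each sample as it is requested."""
    for sample in samples:
        yield fn(sample)


def batch(samples, size):
    """Lazily group samples into lists of at most `size` elements."""
    iterator = iter(samples)
    while True:
        chunk = list(islice(iterator, size))
        if not chunk:
            return
        yield chunk


def build_pipeline(source, fn, batch_size):
    """Chain the stages; nothing is computed until a batch is requested."""
    return batch(transform(source, fn), batch_size)


processed = []  # records which samples have actually been transformed


def double(x):
    processed.append(x)
    return 2 * x


pipeline = build_pipeline(iter(range(5)), double, batch_size=2)
first = next(pipeline)       # only now are the first two samples processed
print(first, processed)      # [0, 2] [0, 1]
```

Because each stage is an independent generator, the same components can be rearranged for training, testing, or inference, for example by dropping an augmentation stage at test time.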

Vector Space Model as Cognitive Space for Text Classification

In this era of digitization, knowing a user’s sociolect aspects has become essential for building user-specific recommendation systems. These sociolect aspects can be found by mining the language that users share as text in social media and reviews. This paper describes the experiment that was performed for the PAN Author Profiling 2017 shared task. The objective of the task is to find the sociolect aspects of users from their tweets. The sociolect aspects considered in this experiment are the user’s gender and native language. Here, users’ tweets written in a language different from their native language are represented as a document-term matrix with document frequency as the constraint. Classification is then done using a Support Vector Machine with gender and native language as target classes. This experiment attains an average accuracy of 73.42% in gender prediction and 76.26% in the native language identification task.
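A document-term matrix constrained by document frequency can be sketched as follows. The abstract does not state the tokenizer or the threshold that was used, so whitespace tokenization and the `min_df` parameter are assumptions made purely for illustration, and the function name is hypothetical.

```python
from collections import Counter


def document_term_matrix(docs, min_df=2):
    """Build term counts per document, keeping terms found in >= min_df documents.

    Returns (vocabulary, matrix), where matrix[i][j] is the number of times
    vocabulary[j] occurs in docs[i].
    """
    tokenized = [doc.lower().split() for doc in docs]

    # Document frequency: in how many documents does each term occur?
    doc_freq = Counter()
    for tokens in tokenized:
        doc_freq.update(set(tokens))

    vocabulary = sorted(t for t, count in doc_freq.items() if count >= min_df)
    column = {term: j for j, term in enumerate(vocabulary)}

    matrix = []
    for tokens in tokenized:
        row = [0] * len(vocabulary)
        for token in tokens:
            if token in column:
                row[column[token]] += 1
        matrix.append(row)
    return vocabulary, matrix


tweets = ["good morning world", "good night world world", "hello there"]
vocabulary, matrix = document_term_matrix(tweets, min_df=2)
print(vocabulary)  # ['good', 'world']
print(matrix)      # [[1, 1], [1, 2], [0, 0]]
```

Each row of the matrix would then serve as the feature vector for one author or document when training a classifier such as an SVM.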

The Microsoft 2017 Conversational Speech Recognition System

We describe the 2017 version of Microsoft’s conversational speech recognition system, in which we update our 2016 system with recent developments in neural-network-based acoustic and language modeling to further advance the state of the art on the Switchboard speech recognition task. The system adds a CNN-BLSTM acoustic model to the set of model architectures we combined previously, and includes character-based and dialog session aware LSTM language models in rescoring. For system combination we adopt a two-stage approach, whereby subsets of acoustic models are first combined at the senone/frame level, followed by a word-level voting via confusion networks. We also added a confusion network rescoring step after system combination. The resulting system yields a 5.1% word error rate on the 2000 Switchboard evaluation set.

Practical Minimum Cut Algorithms

The minimum cut problem for an undirected edge-weighted graph asks us to divide its set of nodes into two blocks while minimizing the weight sum of the cut edges. Here, we introduce a linear-time algorithm to compute near-minimum cuts. Our algorithm is based on cluster contraction using label propagation and Padberg and Rinaldi’s contraction heuristics [SIAM Review, 1991]. We give both sequential and shared-memory parallel implementations of our algorithm. Extensive experiments on both real-world and generated instances show that our algorithm finds the optimal cut on nearly all instances significantly faster than other state-of-the-art algorithms while our error rate is lower than that of other heuristic algorithms. In addition, our parallel algorithm shows good scalability.
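To make the problem statement concrete, the sketch below evaluates the weight of a cut and finds a minimum cut by exhaustive search on a tiny graph. It is not the linear-time algorithm of the paper, which relies on label propagation and contraction heuristics; it is an exponential-time reference that only illustrates the objective being minimized. The function names and the edge-list format `(u, v, weight)` are assumptions.

```python
from itertools import combinations


def cut_weight(edges, block):
    """Sum the weights of edges with exactly one endpoint in `block`."""
    block = set(block)
    return sum(w for u, v, w in edges if (u in block) != (v in block))


def brute_force_min_cut(nodes, edges):
    """Try every split of `nodes` into two non-empty blocks (needs >= 2 nodes).

    The first node is fixed in one block so that each split is examined once.
    Returns (weight, block) for a split of minimum weight.
    """
    nodes = list(nodes)
    first, rest = nodes[0], nodes[1:]
    best = None
    for k in range(len(rest)):          # leave at least one node outside
        for extra in combinations(rest, k):
            block = {first, *extra}
            weight = cut_weight(edges, block)
            if best is None or weight < best[0]:
                best = (weight, block)
    return best


# A 4-cycle with two heavy and two light edges.
cycle_edges = [("a", "b", 3), ("b", "c", 1), ("c", "d", 3), ("a", "d", 1)]
weight, block = brute_force_min_cut(["a", "b", "c", "d"], cycle_edges)
print(weight, sorted(block))  # 2 ['a', 'b']
```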

Notes: A Continuous Model of Neural Networks. Part I: Residual Networks

In this series of notes, we try to model neural networks as discretizations of continuous flows on the space of data, which we call the flow model. The idea comes from an observation of their similarity in mathematical structures. This conceptual analogy has not been proven useful yet, but it seems interesting to explore. In this part, we start with a linear transport equation (with nonlinear transport velocity field) and obtain a class of residual type neural networks. If the transport velocity field has a special form, the obtained network is found similar to the original ResNet. This neural network can be regarded as a discretization of the continuous flow defined by the transport flow. In the end, a summary of the correspondence between neural networks and transport equations is presented, followed by some general discussions.
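One common way to read this correspondence is sketched below, under the assumption that data points move along dx/dt = v(x, t) and that time is discretized with the forward Euler method; the notes' own derivation may differ in detail. Each Euler step x_{n+1} = x_n + h * v(x_n, t_n) has the form of a residual block: an identity shortcut plus a learned update. The function name and the example velocity fields are hypothetical.

```python
import math


def residual_forward(x0, velocity, num_layers, step):
    """Push a data point through `num_layers` forward Euler steps.

    Each step computes x + step * velocity(x, t), i.e. an identity shortcut
    plus a residual term, which mirrors the structure of a residual block.
    """
    x = list(x0)
    for n in range(num_layers):
        v = velocity(x, n * step)
        x = [xi + step * vi for xi, vi in zip(x, v)]
    return x


def decay(x, t):
    """Simple linear velocity field used to check the discretization."""
    return [-xi for xi in x]


def tanh_field(x, t):
    """A nonlinear velocity field, standing in for a learned layer."""
    return [math.tanh(xi) for xi in x]


# With v(x) = -x each step multiplies x by (1 - step), so ten steps of
# size 0.1 give 0.9 ** 10 times the input.
out = residual_forward([1.0], decay, num_layers=10, step=0.1)
print(out[0], 0.9 ** 10)
```

Shrinking the step while increasing the number of layers moves the discrete network toward the continuous flow, which is the sense in which depth plays the role of time.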

Learning Spread-out Local Feature Descriptors

We propose a simple yet powerful regularization technique that can be used to significantly improve both the pairwise and triplet losses in learning local feature descriptors. The idea is that in order to fully utilize the expressive power of the descriptor space, good local feature descriptors should be sufficiently ‘spread-out’ over the space. In this work, we propose a regularization term that maximizes the spread of the feature descriptors, inspired by the properties of the uniform distribution. We show that the proposed regularization with triplet loss outperforms existing Euclidean distance based descriptor learning techniques by a large margin. As an extension, the proposed regularization technique can also be used to improve image-level deep feature embedding.

The Stochastic Replica Approach to Machine Learning: Stability and Parameter Optimization
Cross-Lingual Dependency Parsing for Closely Related Languages – Helsinki’s Submission to VarDial 2017
Anomalous elasticity, fluctuations and disorder in elastic membranes
Neural machine translation for low-resource languages
Security, Privacy and Safety Evaluation of Dynamic and Static Fleets of Drones
Geometry Of The Expected Value Set And The Set-Valued Sample Mean Process
Dynamic Connectivity Game for Adversarial Internet of Battlefield Things Systems
Identification of individual coherent sets associated with flow trajectories using Coherent Structure Coloring
The Natural Stories Corpus
Data-Driven Tree Transforms and Metrics
Deterministic coding theorems for blind sensing: optimal measurement rate and fractal dimension
A Stronger Foundation for Computer Science and P=NP
The Wellposedness of FBSDEs (II)
Boolean Unateness Testing with $\widetilde{O}(n^{3/4})$ Adaptive Queries
Pentavalent symmetric graphs admitting transitive non-abelian characteristically simple groups
CLaC @ QATS: Quality Assessment for Text Simplification
The CLaC Discourse Parser at CoNLL-2016
On the Contribution of Discourse Structure on Text Complexity Assessment
ClaC: Semantic Relatedness of Words and Phrases
Measuring the Effect of Discourse Relations on Blog Summarization
Coarsening model on $\mathbb{Z}^d$ with biased zero-energy flips and an exponential large deviation bound for ASEP
Serre’s Properties for Quadratic Generated Domains from Graphs
A Proof of Willcocks’s Conjecture
Discovery of Visual Semantics by Unsupervised and Self-Supervised Representation Learning
Analysing Soccer Games with Clustering and Conceptors
The distinguishing number and the distinguishing index of line and graphoidal graphs
Applying Deep Bidirectional LSTM and Mixture Density Network for Basketball Trajectory Prediction
Visual Forecasting by Imitating Dynamics in Natural Sequences
High Voltage Insulator Surface Evaluation Using Image Processing
Convergence of series of strongly integrable random variables
Common change point estimation in panel data from the least squares and maximum likelihood viewpoints
A Data and Model-Parallel, Distributed and Scalable Framework for Training of Deep Networks in Apache Spark
Spanning Simplicial Complexes of Uni-Cyclic Multigraphs
Image2song: Song Retrieval via Bridging Image Content and Lyric Words
Tail and moment estimates for a class of random chaoses of order two
The CLaC Discourse Parser at CoNLL-2015
Martingale representations in progressive enlargement setting: the role of the accessible jump times
Decomposition of mean-field Gibbs distributions into product measures
Heat kernel estimates for time fractional equations
A plurality problem with three colors and query size three
Power Allocation for Adaptive OFDM Index Modulation in Cooperative Networks
Outage Performance Analysis of Multicarrier Relay Selection for Cooperative Networks
UE4Sim: A Photo-Realistic Simulator for Computer Vision Applications
The Spatial Outage Capacity of Wireless Networks
What Drives the International Development Agenda? An NLP Analysis of the United Nations General Debate 1970-2016
A novel agent-based simulation framework for sensing in complex adaptive environments
Percolation Thresholds in Hyperbolic Lattices
Event-Radar: Real-time Local Event Detection System for Geo-Tagged Tweet Streams
Regularized Estimation and Testing for High-Dimensional Multi-Block Vector-Autoregressive Models
Teaching UAVs to Race Using UE4Sim
Arabic Multi-Dialect Segmentation: bi-LSTM-CRF vs. SVM
An Improved Multi-Output Gaussian Process RNN with Real-Time Validation for Early Sepsis Detection
Identifying down and up-regulated chromosome regions using RNA-Seq data
Computer-aided diagnosis of lung nodule using gradient tree boosting and Bayesian optimization
Group twin coloring of graphs
An FPT algorithm for planar multicuts with sources and sinks on the outer face
Electricity Theft Detection using Machine Learning
Designing virus-resistant, high-performance networks: a game-formation approach
Accelerating Kernel Classifiers Through Borders Mapping
Asymptotically optimal appointment schedules with customer no-shows
A Deep Q-Network for the Beer Game with Partial Information
X-PACS: eXPlaining Anomalies by Characterizing Subspaces
Solving a New 3D Bin Packing Problem with Deep Reinforcement Learning Method
Innovations orthogonalization: a solution to the major pitfalls of EEG/MEG ‘leakage correction’
Fundamental Limits of Weak Recovery with Applications to Phase Retrieval
Software-Defined Robotics — Idea & Approach
Message Passing in C-RAN: Joint User Activity and Signal Detection
The Helsinki Neural Machine Translation System
Neural Machine Translation with Extended Context
Golden Angle Modulation
Time-dependent Real-space Renormalization-Group Approach: application to an adiabatic random quantum Ising model
On line arrangements over fields with $1-ad$ structure
New extremal singly even self-dual codes of lengths $64$ and $66$
An End-to-End Trainable Neural Network Model with Belief Tracking for Task-Oriented Dialog
BSDEs with weak reflections and partial hedging of American options
Customers’ abandonment strategy in an M/G/1 queue
Kirchhoff Index As a Measure of Edge Centrality in Weighted Networks: Nearly Linear Time Algorithms
Neural Networks Compression for Language Modeling
On the topology effects in wireless sensor networks based prognostics and health management
Applying Data Augmentation to Handwritten Arabic Numeral Recognition Using Deep Learning Neural Networks
An improved watermarking scheme for Internet applications
Shapelet-based Sparse Representation for Landcover Classification of Hyperspectral Images
On the construction of small subsets containing special elements in a finite field
Edge-regular graphs with regular cliques
Stochastic Primal-Dual Proximal ExtraGradient Descent for Compositely Regularized Optimization
An Efficient Single Chord-based Accumulation Technique (SCA) to Detect More Reliable Corners
Attentive Semantic Video Generation using Captions
Binary functions, degeneracy, and alternating dimaps
Perceptual audio loss function for deep learning
LSTM Network for Inflected Abbreviation Expansion
A Batch Noise Contrastive Estimation Approach for Training Large Vocabulary Language Models
Efficient Online Inference for Infinite Evolutionary Cluster models with Applications to Latent Social Event Discovery
Quantum state certification
Ergodicity of the KPZ Fixed Point
Boltzmann machines and energy-based models
Modelling Word Burstiness in Natural Language: A Generalised Polya Process for Document Language Models in Information Retrieval
Product Matrix Minimum Storage Regenerating Codes with Flexible Number of Helpers
Conversion of Mersenne Twister to double-precision floating-point numbers
Learning to Paraphrase for Question Answering
Joint Multi-view Face Alignment in the Wild
The rectangular representation of the double affine Hecke algebra via elliptic Schur-Weyl duality
Portuguese Word Embeddings: Evaluating on Word Analogies and Natural Language Tasks
DeepBreath: Deep Learning of Breathing Patterns for Automatic Stress Recognition using Low-Cost Thermal Imaging in Unconstrained Settings
Quantile-based Mean-Field Games with Common Noise
More cat than cute? Interpretable Prediction of Adjective-Noun Pairs
Multi-version Coding for Consistent Distributed Storage of Correlated Data Updates
Solitons in a modified discrete nonlinear Schroedinger equation
Efficient algorithms for scheduling equal-length jobs with processing set restrictions on uniform parallel batch machines
Least Sparsity of $p$-norm based Optimization Problems with $p > 1$
Lower bounds on the sizes of defining sets in full $n$-Latin squares and full designs
An appetizer to modern developments on the Kardar-Parisi-Zhang universality class
Compact modes in quasi one dimensional coupled magnetic oscillators
Scientific Information Extraction with Semi-supervised Neural Tagging
ExSIS: Extended Sure Independence Screening for Ultrahigh-dimensional Linear Models
First passage problems for upwards skip-free random walks via the $Φ,W,Z$ paradigm
Block Markov Superposition Transmission of BCH Codes with Iterative Erasures-and-Errors Decoders
The perimeter of uniform and geometric words: a probabilistic analysis
Distantly Supervised Road Segmentation
e-Counterfeit: a mobile-server platform for document counterfeit detection
Revisiting knowledge transfer for training object class detectors
Some Distributions on Finite Rooted Binary Trees
Evasion Attacks against Machine Learning at Test Time
Well-posedness and Optimal Regularity of Stochastic Evolution Equations with Multiplicative Noises
Scalable Kernelization for Maximum Independent Sets
Physiological Gaussian Process Priors for the Hemodynamics in fMRI Analysis
Generalized chordality, vertex separators and hyperbolicity on graphs
Economic Design of Memory-Type Control Charts: The Fallacy of the Formula Proposed by Lorenzen and Vance (1986)
Total variation regularization of multi-material topology optimization
Sparsity Within and Across Overlapping Groups
Approximate and exact controllability of linear difference equations
Optimally Gathering Two Robots
Seernet at EmoInt-2017: Tweet Emotion Intensity Estimator
Network of families in a contemporary population: regional and cultural assortativity
Sparse polynomial interpolation: compressed sensing, super resolution, or Prony?
Numerical methods for SDEs with drift discontinuous on a set of positive reach
Local asymptotics for the area under the random walk excursion
Sobolev regularity for first order Mean Field Games
Counting Walks in the Quarter Plane
Segmentation of retinal cysts from Optical Coherence Tomography volumes via selective enhancement
On the approximation by single hidden layer feedforward neural networks with fixed weights
A subspace code of size $333$ in the setting of a binary $q$-analog of the Fano plane
Recognizing Involuntary Actions from 3D Skeleton Data Using Body States
Fake News in Social Networks
Deep Convolutional Neural Networks for Massive MIMO Fingerprint-Based Positioning
GraphR: Accelerating Graph Processing Using ReRAM
Probabilistic Relation Induction in Vector Space Embeddings
Partial-Duplex Amplify-and-Forward Relaying: Spectral Efficiency Analysis under Self-Interference
Distribution flows associated with positivity preserving coercive forms
Simple and Near-Optimal Distributed Coloring for Sparse Graphs
The CARESSES EU-Japan project: making assistive robots culturally competent
A refined count of Coxeter element factorizations
Asymptotics of empirical eigen-structure for high dimensional sample covariance matrices of general form
Employing Weak Annotations for Medical Image Analysis Problems
Bounds on absolutely maximally entangled states from shadow inequalities, and the quantum MacWilliams identity
A general framework for Vecchia approximations of Gaussian processes
Network Model Selection for Task-Focused Attributed Network Inference
Efficient Nonparametric Bayesian Inference For X-Ray Transforms
Nonlinear association structures in flexible Bayesian additive joint models