LRC: Dependency-Aware Cache Management for Data Analytics Clusters

Memory caches are being aggressively used in today’s data-parallel systems such as Spark, Tez, and Piccolo. However, prevalent systems employ rather simple cache management policies, notably the Least Recently Used (LRU) policy, that are oblivious to the application semantics of data dependency, expressed as a directed acyclic graph (DAG). Without this knowledge, memory caching can at best be performed by ‘guessing’ the future data access patterns based on historical information (e.g., the access recency and/or frequency), which frequently results in inefficient, erroneous caching with a low hit ratio and a long response time. In this paper, we propose a novel cache replacement policy, Least Reference Count (LRC), which exploits the application-specific DAG information to optimize the cache management. LRC evicts the cached data blocks whose reference count is the smallest. The reference count is defined, for each data block, as the number of dependent child blocks that have not been computed yet. We demonstrate the efficacy of LRC through both empirical analysis and cluster deployments against popular benchmarking workloads. Our Spark implementation shows that, compared with LRU, LRC speeds up typical applications by 60%.
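The eviction rule described in this abstract can be illustrated with a minimal sketch. This is not the authors' Spark implementation: the class name `LRCCache`, its methods, the dictionary-based DAG format, and the alphabetical tie-breaking are all assumptions made for illustration. It only shows the stated idea: a block's reference count is the number of its dependent child blocks not yet computed, and the cached block with the smallest count is evicted first.

```python
from collections import defaultdict


class LRCCache:
    """Toy Least Reference Count cache (illustrative sketch, hypothetical API)."""

    def __init__(self, capacity, dag):
        # dag maps each block to the list of child blocks that depend on it.
        self.capacity = capacity
        # Initially no child has been computed, so the reference count of a
        # block equals its number of children.
        self.ref_count = {block: len(children) for block, children in dag.items()}
        # Reverse index: child -> parents, used when a child gets computed.
        self.parents = defaultdict(list)
        for parent, children in dag.items():
            for child in children:
                self.parents[child].append(parent)
        self.cached = set()

    def mark_computed(self, block):
        # Once a child is computed, each parent has one fewer pending dependent.
        for parent in self.parents[block]:
            self.ref_count[parent] -= 1

    def put(self, block):
        """Cache a block; return the evicted block, or None if nothing was evicted."""
        if block in self.cached:
            return None
        evicted = None
        if len(self.cached) >= self.capacity:
            # Evict the cached block with the smallest reference count.
            # Ties are broken by block name (an assumption, for determinism).
            evicted = min(self.cached, key=lambda b: (self.ref_count.get(b, 0), b))
            self.cached.remove(evicted)
        self.cached.add(block)
        return evicted


# Usage: A feeds C and D, B feeds only D.
dag = {"A": ["C", "D"], "B": ["D"], "C": [], "D": []}
cache = LRCCache(capacity=2, dag=dag)
cache.put("A")
cache.put("B")
cache.mark_computed("D")   # A still has C pending; B has no pending children
victim = cache.put("C")    # B is evicted because its reference count is 0
```

In this example an LRU policy would evict A, the least recently inserted block, even though A is still needed to compute C; the reference count keeps it cached.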


Improving Classification by Improving Labelling: Introducing Probabilistic Multi-Label Object Interaction Recognition

This work deviates from easy-to-define class boundaries for object interactions. For the task of object interaction recognition, often captured using an egocentric view, we show that semantic ambiguities in verbs and recognising sub-interactions along with concurrent interactions result in legitimate class overlaps (Figure 1). We thus aim to model the mapping between observations and interaction classes, as well as class overlaps, towards a probabilistic multi-label classifier that emulates human annotators. Given a video segment containing an object interaction, we model the probability for a verb, out of a list of possible verbs, to be used to annotate that interaction. The probability is learnt from crowdsourced annotations, and is tested on two public datasets, comprising 1405 video sequences for which we provide annotations on 90 verbs. We outperform conventional single-label classification by 11% and 6% on the two datasets respectively, and show that learning from annotation probabilities outperforms majority voting and enables discovery of co-occurring labels.
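The contrast between annotation probabilities and majority voting can be made concrete with a small sketch. It covers only how a per-segment training target might be derived from crowdsourced verb choices; the classifier itself is not shown, and the function names, the verb list, and the assumption that each annotator picks exactly one verb are hypothetical rather than taken from the paper.

```python
from collections import Counter


def verb_probabilities(annotations, verbs):
    """Empirical probability that each verb is chosen for one video segment.

    annotations: list of verbs picked by annotators (one verb each, an assumption).
    verbs: the full list of candidate verbs.
    """
    if not annotations:
        raise ValueError("at least one annotation is required")
    counts = Counter(annotations)
    total = len(annotations)
    return {verb: counts[verb] / total for verb in verbs}


def majority_vote(annotations):
    """Single-label target: the most frequently chosen verb."""
    return Counter(annotations).most_common(1)[0][0]


# Usage: four annotators describe the same interaction.
votes = ["open", "open", "pull", "open"]
soft_target = verb_probabilities(votes, ["open", "pull", "push"])
hard_target = majority_vote(votes)
```

The soft target keeps the information that "pull" is also a plausible description of the segment, which the single majority label discards.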


A Hybrid Deep Learning Approach for Texture Analysis

Texture classification is a problem that has various applications such as remote sensing and forest species recognition. Solutions tend to be custom fit to the dataset used but fail to generalize. The Convolutional Neural Network (CNN) in combination with a Support Vector Machine (SVM) forms a robust pairing of a powerful invariant feature extractor and an accurate classifier. The fusion of experts provides stability in classification rates across different datasets.


K-Means Clustering using Tabu Search with Quantized Means

The Tabu Search (TS) metaheuristic has been proposed for K-Means clustering as an alternative to Lloyd’s algorithm, which, for all its ease of implementation and fast runtime, has the major drawback of being trapped at local optima. While the TS approach can yield superior performance, it involves a high computational complexity. Moreover, the difficulty of parameter selection in the existing TS approach makes it even less attractive. This paper presents an alternative, low-complexity formulation of the TS optimization procedure for K-Means clustering. This approach does not require many parameter settings. We initially constrain the centers to points in the dataset. We then aim at evolving these centers using a unique neighborhood structure that makes use of gradient information of the objective function. This results in an efficient exploration of the search space, after which the means are refined. The proposed scheme is implemented in MATLAB and tested on four real-world datasets, and it achieves a significant improvement over the existing TS approach in terms of the intra-cluster sum of squares and computational time.


MSE estimates for multitaper spectral estimation and off-grid compressive sensing

Localization of a microtubule organizing center by kinesin motors

A recursive point process model for infectious diseases

Fast and Flexible Successive-Cancellation List Decoders for Polar Codes

Speeding up TestU01 with the use of HTCondor

Flare: Native Compilation for Heterogeneous Workloads in Apache Spark

Millimeter Wave MIMO Channel Estimation Based on Adaptive Compressed Sensing

3D spatially-resolved optical energy density enhanced by wavefront shaping

A Novel Millimeter-Wave Channel Simulator and Applications for 5G Wireless Communications

Forcing clique immersions through chromatic number

Semi-Automatic Segmentation and Ultrasonic Characterization of Solid Breast Lesions

Millimeter Wave Small-Scale Spatial Statistics in an Urban Microcell Scenario

Efficient regularization with wavelet sparsity constraints in PAT

Mean-Field Controllability and Decentralized Stabilization of Markov Chains, Part I: Global Controllability and Rational Feedbacks

TokTrack: A Complete Token Provenance and Change Tracking Dataset for the English Wikipedia

On the Robustness of Convolutional Neural Networks to Internal Architecture and Weight Perturbations

SINR and Throughput of Dense Cellular Networks with Stretched Exponential Path Loss

The Dependence of Machine Learning on Electronic Medical Record Quality

Improved NN-JPDAF for Joint Multiple Target Tracking and Feature Extraction

Mixing Time of Random Walk on Poisson Geometry Small World

Supervisor Synthesis of POMDP based on Automata Learning

A Nonconvex Splitting Method for Symmetric Nonnegative Matrix Factorization: Convergence Analysis and Optimality

Combinatorial metrics: MacWilliams-type identities, isometries and extension property

An Asymptotically Tighter Bound on Sampling for Frequent Itemsets Mining

View Adaptive Recurrent Neural Networks for High Performance Human Action Recognition from Skeleton Data

Experimental Identification of Hard Data Sets for Classification and Feature Selection Methods with Insights on Method Selection

Diffusion L0-norm constraint improved proportionate LMS algorithm for sparse distributed estimation

The Multi-Armed Bandit Problem: An Efficient Non-Parametric Solution

Evolutionary Stability of Reputation Management System in Peer to Peer Networks

Deep Direct Regression for Multi-Oriented Scene Text Detection

Projective divisible binary codes

Multi-Level Discovery of Deep Options

An online slow manifold approach for efficient optimal control of multiple time-scale kinetics

Interacting Conceptual Spaces I : Grammatical Composition of Concepts

Arc-transitive cyclic and dihedral covers of pentavalent symmetric graphs of order twice a prime

Are crossing dependencies really scarce?

Hyper Zagreb Index of Bridge and Chain Graphs

Taming Tail Latency for Erasure-coded, Distributed Storage Systems

Anderson localization of a Rydberg electron along a classical orbit

Event-based State Estimation: An Emulation-based Approach

Nonparametric Bayesian analysis for support boundary recovery

Scalable Person Re-identification on Supervised Smoothed Manifold

A new class of three-weight linear codes from weakly regular plateaued functions

A randomized primal distributed algorithm for partitioned and big-data non-convex optimization

Optimal Service Elasticity in Large-Scale Distributed Systems

A duality-based approach for distributed min-max optimization with application to demand side management

Feature Fusion using Extended Jaccard Graph and Stochastic Gradient Descent for Robot

Smart Augmentation – Learning an Optimal Data Augmentation Strategy

The KMS Condition for the homoclinic equivalence relation and Gibbs probabilities

Self-organized pattern formation of run-and-tumble chemotactic bacteria: Instability analysis of a kinetic chemotaxis model

DeepVisage: Making face recognition simple yet with powerful generalization skills

Smart Meter Privacy with Renewable Energy and a Storage Device

Stochastic Calculus with respect to Gaussian Processes: Part I

Zero controllability in discrete-time structured systems

Volterra differential equations with singular kernels

Reasoning by Cases in Structured Argumentation

Asymmetric Learning Vector Quantization for Efficient Nearest Neighbor Classification in Dynamic Time Warping Spaces

A Bitcoin-inspired infinite-server model with a random fluid limit

A vehicle-to-infrastructure communication based algorithm for urban traffic control

Combinatorial Ricci curvature on cell-complex and Gauss-Bonnet Theorem

On the compensator in the Doob-Meyer decomposition of the Snell envelope

A bijective proof of the hook-length formula for skew shapes

Calendar.help: Designing a Workflow-Based Scheduling Agent with Humans in the Loop

Modeling and Estimation for Self-Exciting Spatio-Temporal Models of Terrorist Activity

Metric random matchings with applications

Linear classifier design under heteroscedasticity in Linear Discriminant Analysis

Moments of the Hermitian Matrix Jacobi process

Object Region Mining with Adversarial Erasing: A Simple Classification to Semantic Segmentation Approach

Constant Threshold Intersection Graphs of Orthodox Paths in Trees

Virtualization technology for distributed time sensitive domains

Batch-normalized joint training for DNN-based distant speech recognition

Medical Image Retrieval using Deep Convolutional Neural Network

Overcoming Catastrophic Forgetting by Incremental Moment Matching

Long-Term Evolution of Genetic Programming Populations

Multiscale Granger causality

regsem: Regularized Structural Equation Modeling

Content-Based Image Retrieval Based on Late Fusion of Binary and Local Descriptors

Multi-stage Multi-recursive-input Fully Convolutional Networks for Neuronal Boundary Detection

Local Deep Neural Networks for Age and Gender Classification

Partitions of multigraphs under degree constraints

Generalized Nash Equilibrium Problem by the Alternating Direction Method of Multipliers

ALLSAT compressed with wildcards. Part 2: All k-models of a BDD

An Extension of Feller’s Strong Law of Large Numbers

Interactive Natural Language Acquisition in a Multi-modal Recurrent Neural Architecture

Mean-Field Controllability and Decentralized Stabilization of Markov Chains, Part II: Asymptotic Controllability and Polynomial Feedbacks

Radiomics strategies for risk assessment of tumour failure in head-and-neck cancer

Rejection-free Ensemble MCMC with applications to Factorial Hidden Markov Models

Joint Modeling of Event Sequence and Time Series with Attentional Twin Recurrent Neural Networks

An Algorithmic Approach to the Asynchronous Computability Theorem

Turing instability in a model with two interacting Ising lines: hydrodynamic limit

A Dynamic Programming Principle for Distribution-Constrained Optimal Stopping

PonyGE2: Grammatical Evolution in Python

Crowdsourcing Universal Part-Of-Speech Tags for Code-Switching
