Sampling High Throughput Data for Anomaly Detection of Data-Base Activity

Data leakage and theft from databases pose a serious threat to organizations. Data Security and Data Privacy protection systems (DSDP) monitor data access and usage to identify leakage or suspicious activities that should be investigated. Because of the high-velocity nature of database systems, such systems audit only a portion of the vast number of transactions that take place. Anomalies are investigated by a Security Officer (SO) in order to choose the proper response. In this paper we investigate the effect of sampling methods based on the risk each transaction poses and propose a new ‘combined sampling’ method that captures a more varied sample.

Attacking Automatic Video Analysis Algorithms: A Case Study of Google Cloud Video Intelligence API

Due to the growth of video data on the Internet, automatic video analysis has gained a lot of attention from academia as well as companies such as Facebook, Twitter and Google. In this paper, we examine the robustness of video analysis algorithms in adversarial settings. Specifically, we propose targeted attacks on two fundamental classes of video analysis algorithms, namely video classification and shot detection. We show that an adversary can subtly manipulate a video in such a way that a human observer would perceive the content of the original video, but the video analysis algorithm will return the adversary’s desired outputs. We then apply the attacks to the recently released Google Cloud Video Intelligence API. The API takes a video file and returns the video labels (objects within the video), shot changes (scene changes within the video) and shot labels (description of video events over time). Through experiments, we show that the API generates video and shot labels by processing only the first frame of every second of the video. Hence, an adversary can deceive the API into outputting only her desired video and shot labels by periodically inserting an image into the video at the rate of one frame per second. We also show that the pattern of shot changes returned by the API can be mostly recovered by an algorithm that compares the histograms of consecutive frames. Based on this equivalent model, we develop a method for slightly modifying the video frames in order to deceive the API into generating our desired pattern of shot changes. We perform extensive experiments with different videos and show that our attacks are consistently successful across videos with different characteristics. Finally, we propose introducing randomness to video analysis algorithms as a countermeasure to our attacks.
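
The histogram comparison of consecutive frames mentioned above can be illustrated with a minimal sketch. This is not the authors' equivalent model: the frame representation (a flat list of grayscale values), the L1 histogram distance, the bin count, the threshold, and the names `histogram` and `shot_changes` are all assumptions chosen for illustration.

```python
from typing import List, Sequence

Frame = Sequence[int]  # assumed: a non-empty, flattened grayscale frame, values in 0..255


def histogram(frame: Frame, bins: int = 8) -> List[float]:
    """Return the normalized intensity histogram of one frame."""
    counts = [0] * bins
    for pixel in frame:
        counts[min(pixel * bins // 256, bins - 1)] += 1
    total = len(frame)
    return [c / total for c in counts]


def shot_changes(frames: Sequence[Frame], threshold: float = 0.5, bins: int = 8) -> List[int]:
    """Return every index i at which frame i differs strongly from frame i - 1.

    The difference is the L1 distance between the two normalized histograms,
    which ranges from 0.0 (identical) to 2.0 (disjoint).
    """
    changes = []
    for i in range(1, len(frames)):
        prev_hist = histogram(frames[i - 1], bins)
        curr_hist = histogram(frames[i], bins)
        distance = sum(abs(a - b) for a, b in zip(prev_hist, curr_hist))
        if distance > threshold:
            changes.append(i)
    return changes


dark = [10] * 16     # toy 4x4 frame, all dark pixels
bright = [240] * 16  # toy 4x4 frame, all bright pixels
print(shot_changes([dark, dark, bright, bright, dark]))  # [2, 4]
```

A detector of this kind depends only on the intensity distribution of each frame, which suggests why small, targeted changes to frame histograms could shift the reported shot boundaries.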

Collaborative Filtering using Denoising Auto-Encoders for Market Basket Data

Recommender systems (RS) help users navigate large sets of items in the search for ‘interesting’ ones. One approach to RS is Collaborative Filtering (CF), which is based on the idea that similar users are interested in similar items. Most model-based approaches to CF seek to train a machine-learning/data-mining model based on sparse data; the model is then used to provide recommendations. While most of the proposed approaches are effective for small-size situations, the combinatorial nature of the problem makes them impractical for medium-to-large instances. In this work we present a novel approach to CF that works by training a Denoising Auto-Encoder (DAE) on corrupted baskets, i.e., baskets from which one or more items have been removed. The DAE is then forced to learn to reconstruct the original basket given its corrupted input. Due to recent advancements in optimization and other technologies for training neural-network models (such as DAE), the proposed method results in a scalable and practical approach to CF. The contribution of this work is twofold: (1) it identifies missing items in observed baskets and thus directly provides a CF model; and (2) it constructs a generative model of baskets which may be used, for instance, in simulation analysis or as part of a more complex analytical method.
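
To make the idea of corrupted baskets concrete, the following sketch builds (corrupted input, original target) pairs of the kind a denoising auto-encoder would be trained on. It covers only the data preparation step, not the network itself. The binary item encoding, the uniform random removal, and the names `encode`, `corrupt` and `training_pairs` are assumptions for illustration, not details taken from the paper.

```python
import random
from typing import List, Sequence, Set, Tuple


def encode(basket: Set[str], catalog: Sequence[str]) -> List[int]:
    """Encode a basket as a 0/1 vector over the item catalog."""
    return [1 if item in basket else 0 for item in catalog]


def corrupt(basket: Set[str], n_remove: int, rng: random.Random) -> Set[str]:
    """Return a copy of the basket with up to n_remove items removed at random."""
    n_remove = min(n_remove, len(basket))
    removed = set(rng.sample(sorted(basket), n_remove))
    return basket - removed


def training_pairs(
    baskets: Sequence[Set[str]],
    catalog: Sequence[str],
    n_remove: int = 1,
    seed: int = 0,
) -> List[Tuple[List[int], List[int]]]:
    """Build (corrupted, original) vector pairs for reconstruction training."""
    rng = random.Random(seed)
    pairs = []
    for basket in baskets:
        noisy = corrupt(basket, n_remove, rng)
        pairs.append((encode(noisy, catalog), encode(basket, catalog)))
    return pairs


catalog = ["bread", "milk", "eggs", "tea"]
baskets = [{"bread", "milk"}, {"milk", "eggs", "tea"}]
for noisy, target in training_pairs(baskets, catalog):
    print(noisy, "->", target)
```

A model trained to map each corrupted vector back to its target learns which items tend to be missing from a partial basket, which is the recommendation task itself.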

Distance and Similarity Measures Effect on the Performance of K-Nearest Neighbor Classifier – A Review

The K-nearest neighbor (KNN) classifier is one of the simplest and most common classifiers, yet its performance competes with the most complex classifiers in the literature. The core of this classifier depends mainly on measuring the distance or similarity between the tested example and the training examples. This raises a major question: which of the large number of available distance and similarity measures should be used for the KNN classifier? This review attempts to answer that question by evaluating the performance (measured by accuracy, precision and recall) of the KNN using a large number of distance measures, tested on a number of real-world datasets, with and without adding different levels of noise. The experimental results show that the performance of the KNN classifier depends significantly on the distance used, with large gaps between the performances of different distances. We found that a recently proposed non-convex distance performed the best on most datasets compared to the other tested distances. In addition, the performance of the KNN degraded by only about 20% when the noise level reached 90%; this is true for all the distances used. This means that the KNN classifier using any of the top 10 distances tolerates noise to a certain degree. Moreover, the results show that some distances are less affected by the added noise than others.
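
The dependence of KNN on the chosen distance can be seen in a small sketch with a pluggable distance function. The three distances shown (Euclidean, Manhattan, Chebyshev) are standard examples and are not claimed to be the ones ranked in the review; the toy data and the function names are assumptions for illustration.

```python
import math
from collections import Counter
from typing import Callable, List, Sequence, Tuple

Point = Sequence[float]
Distance = Callable[[Point, Point], float]


def euclidean(a: Point, b: Point) -> float:
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))


def manhattan(a: Point, b: Point) -> float:
    return sum(abs(x - y) for x, y in zip(a, b))


def chebyshev(a: Point, b: Point) -> float:
    return max(abs(x - y) for x, y in zip(a, b))


def knn_predict(train: List[Tuple[Point, str]], query: Point, k: int, distance: Distance) -> str:
    """Predict the majority label among the k training points nearest to query."""
    neighbors = sorted(train, key=lambda pair: distance(pair[0], query))[:k]
    votes = Counter(label for _, label in neighbors)
    return votes.most_common(1)[0][0]


# Toy data: the nearest neighbor of the origin changes with the distance used.
train = [((3.0, 3.0), "a"), ((0.0, 4.0), "b")]
query = (0.0, 0.0)
print(knn_predict(train, query, 1, euclidean))  # b
print(knn_predict(train, query, 1, chebyshev))  # a
```

Under the Euclidean distance the point (0, 4) is closer to the origin (4.0 versus about 4.24), while under the Chebyshev distance the point (3, 3) is closer (3 versus 4), so the predicted label flips.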

Training Neural Networks with Very Little Data — A Draft

Deep neural networks are complex architectures composed of many layers of nodes, resulting in a large number of parameters, including weights and biases, that must be estimated through training the network. Larger and more complex networks typically require more training data for adequate convergence than their simpler counterparts. The data available to train these networks is often limited or imbalanced. We propose the radial transform in polar coordinate space for image augmentation to facilitate the training of neural networks from limited source data. Pixel-wise coordinate transforms provide representations of the original image in the polar coordinate system; they both augment the data and increase the diversity of poorly represented classes. Experiments performed on MNIST and a set of multimodal medical images using the AlexNet and GoogLeNet neural network models show high classification accuracy using the proposed method.
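
A polar-coordinate resampling of an image can be sketched as follows. This is one plausible reading of a pixel-wise radial transform, not the authors' exact formulation: the nearest-pixel sampling, the clamping at the image border, and the name `radial_transform` are assumptions. Each output row corresponds to a radius and each column to an angle around a chosen center, so different centers yield different augmented views of the same image.

```python
import math
from typing import List, Tuple

Image = List[List[int]]  # assumed: a 2D grid of pixel values indexed as image[y][x]


def radial_transform(image: Image, center: Tuple[int, int], n_radii: int, n_angles: int) -> Image:
    """Resample image on a polar grid around center = (cy, cx).

    Output pixel [r][t] is the input pixel nearest to the point at radius r
    and angle 2*pi*t/n_angles from the center, clamped to the image border.
    """
    height, width = len(image), len(image[0])
    cy, cx = center
    out = []
    for r in range(n_radii):
        row = []
        for t in range(n_angles):
            theta = 2 * math.pi * t / n_angles
            y = int(round(cy + r * math.sin(theta)))
            x = int(round(cx + r * math.cos(theta)))
            y = min(max(y, 0), height - 1)
            x = min(max(x, 0), width - 1)
            row.append(image[y][x])
        out.append(row)
    return out


# Toy 5x5 image whose pixel value encodes its position as 10 * y + x.
image = [[10 * y + x for x in range(5)] for y in range(5)]
for row in radial_transform(image, center=(2, 2), n_radii=3, n_angles=4):
    print(row)
```

Calling the function with several different centers on one source image produces several distinct training samples, which is how a transform of this kind can enlarge a small dataset.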

Constrained Community Detection in Social Networks

Community detection in networks is the process of identifying unusually well-connected sub-networks and is a central component of many applied network analyses. The paradigm of modularity optimization stipulates a partition of the network’s vertices which maximizes the difference between the fraction of edges within groups (communities) and the expected fraction if edges were randomly distributed. The modularity objective function incorporates the network’s topology exclusively and has been extensively studied whereas the integration of constraints or external information on community composition has largely remained unexplored. We impose a penalty function on the modularity objective function to regulate the constitution of communities and apply our methodology in identifying health care communities (HCCs) within a network of hospitals such that the number of cardiac defibrillator surgeries performed within each HCC exceeds a minimum threshold. This restriction permits meaningful comparisons in cardiac care among the resulting health care communities by standardizing the distribution of cardiac care across the hospital network.
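
The combination of modularity with a penalty term can be illustrated on a toy network. The modularity formula below is the standard one for an undirected, unweighted graph; the penalty (a linear shortfall below a minimum per-community volume), its weight, and the names `modularity` and `penalized_modularity` are assumptions for illustration and are not taken from the paper.

```python
from typing import Dict, Hashable, List, Tuple

Edge = Tuple[Hashable, Hashable]


def modularity(edges: List[Edge], community: Dict[Hashable, Hashable]) -> float:
    """Standard modularity: sum over communities of e_c/m - (d_c/(2m))^2."""
    m = len(edges)
    internal: Dict[Hashable, int] = {}
    degree: Dict[Hashable, int] = {}
    for u, v in edges:
        cu, cv = community[u], community[v]
        degree[cu] = degree.get(cu, 0) + 1
        degree[cv] = degree.get(cv, 0) + 1
        if cu == cv:
            internal[cu] = internal.get(cu, 0) + 1
    return sum(internal.get(c, 0) / m - (d / (2 * m)) ** 2 for c, d in degree.items())


def penalized_modularity(
    edges: List[Edge],
    community: Dict[Hashable, Hashable],
    volume: Dict[Hashable, float],
    min_volume: float,
    penalty_weight: float,
) -> float:
    """Modularity minus a penalty for communities whose total volume is too low."""
    totals: Dict[Hashable, float] = {}
    for node, label in community.items():
        totals[label] = totals.get(label, 0) + volume[node]
    shortfall = sum(max(0, min_volume - total) for total in totals.values())
    return modularity(edges, community) - penalty_weight * shortfall


# Two triangles joined by one edge; nodes 3..5 have low volume (hypothetical counts).
edges = [(0, 1), (1, 2), (0, 2), (3, 4), (4, 5), (3, 5), (2, 3)]
volume = {0: 5, 1: 5, 2: 5, 3: 1, 4: 1, 5: 1}
split = {0: "A", 1: "A", 2: "A", 3: "B", 4: "B", 5: "B"}
merged = {node: "A" for node in range(6)}

print(modularity(edges, split), modularity(edges, merged))
print(penalized_modularity(edges, split, volume, 10, 0.1),
      penalized_modularity(edges, merged, volume, 10, 0.1))
```

In this toy case plain modularity prefers the two-triangle split, whereas the penalized objective prefers the merged partition because community B falls short of the minimum volume.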

Theoretical Foundation of Co-Training and Disagreement-Based Algorithms

Disagreement-based approaches generate multiple classifiers and exploit the disagreement among them with unlabeled data to improve learning performance. Co-training is a representative paradigm, which trains two classifiers separately on two sufficient and redundant views; for applications where there is only one view, several successful variants of co-training that use two different classifiers on single-view data have been proposed. Several important issues concerning these disagreement-based approaches remain unsolved. In this article we present theoretical analyses that address these issues, providing a theoretical foundation for co-training and disagreement-based approaches.

Extractive Summarization using Deep Learning

This paper proposes a text summarization approach for factual reports using a deep learning model. This approach consists of three phases: feature extraction, feature enhancement, and summary generation, which work together to assimilate core information and generate a coherent, understandable summary. We explore various features to improve the set of sentences selected for the summary, and use a Restricted Boltzmann Machine to enhance and abstract those features, improving the resulting accuracy without losing important information. The sentences are scored based on those enhanced features and an extractive summary is constructed. Experiments carried out on several articles demonstrate the effectiveness of the proposed approach.

The Trimmed Lasso: Sparsity and Robustness

Nonconvex penalty methods for sparse modeling in linear regression have been a topic of fervent interest in recent years. Herein, we study a family of nonconvex penalty functions that we call the trimmed Lasso and that offers exact control over the desired level of sparsity of estimators. We analyze its structural properties and in doing so show the following: 1) Drawing parallels between robust statistics and robust optimization, we show that the trimmed-Lasso-regularized least squares problem can be viewed as a generalized form of total least squares under a specific model of uncertainty. In contrast, this same model of uncertainty, viewed instead through a robust optimization lens, leads to the convex SLOPE (or OWL) penalty. 2) Further, in relating the trimmed Lasso to commonly used sparsity-inducing penalty functions, we provide a succinct characterization of the connection between trimmed-Lasso-like approaches and penalty functions that are coordinate-wise separable, showing that the trimmed penalties subsume existing coordinate-wise separable penalties, with strict containment in general. 3) Finally, we describe a variety of exact and heuristic algorithms, both existing and new, for trimmed Lasso regularized estimation problems. We include a comparison between the different approaches and an accompanying implementation of the algorithms.
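
The abstract does not state the penalty itself. The sketch below assumes the usual definition of the trimmed Lasso, in which the penalty with parameter k is the sum of all but the k largest absolute coefficients; the function name `trimmed_lasso` is hypothetical. Under that assumption the penalty is zero exactly when the coefficient vector has at most k nonzero entries, which is what gives direct control over sparsity.

```python
from typing import Sequence


def trimmed_lasso(beta: Sequence[float], k: int) -> float:
    """Sum of the absolute values of beta, excluding the k largest in magnitude.

    Assumed definition: with k = 0 this reduces to the ordinary L1 (Lasso)
    penalty, and it equals zero whenever beta has at most k nonzero entries.
    """
    magnitudes = sorted((abs(b) for b in beta), reverse=True)
    return sum(magnitudes[k:])


beta = [3.0, -0.5, 0.0, 2.0, 0.1]
print(trimmed_lasso(beta, 0))                       # ordinary L1 norm of beta
print(trimmed_lasso(beta, 2))                       # the two largest entries are not penalized
print(trimmed_lasso([3.0, 0.0, 0.0, 2.0, 0.0], 2))  # 0.0, since only 2 entries are nonzero
```

Because the largest k coefficients escape the penalty entirely, they are not shrunk toward zero the way they would be under the ordinary Lasso.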

A learning framework for winner-take-all networks with stochastic synapses
On Bounds of Spectral Efficiency of Optimally Beamformed NLOS Millimeter Wave Links
The Stochastic-Calculus Approach to Multi-Receiver Poisson Channels
New solution approaches for the maximum-reliability stochastic network interdiction problem
Uniqueness of Gibbs Measures for Continuous Hardcore Models
Semantically-Secured Message-Key Trade-off over Wiretap Channels with Random Parameters
The Complexity of Distributed Edge Coloring with Small Palettes
On the Spectral Norms of Pseudo-Wigner and Related Matrices
Z-knotted triangulations
Emotion Detection on TV Show Transcripts with Sequence-based Convolutional Neural Networks
PBW bases and marginally large tableaux in types B and C
Superadditivity in trade-off capacities of quantum channels
An ELU Network with Total Variation for Image Denoising
Cyber-Physical Interference Modeling for Predictable Reliability of Inter-Vehicle Communications
Situation Recognition with Graph Neural Networks
Optimization of Heterogeneous Coded Caching
Improved Answer Selection with Pre-Trained Word Embeddings
Optimum thresholding using mean and conditional mean square error
On $k$-normality of Very Ample Lattice Polytopes
Graphettes: Constant-time determination of graphlet and orbit identity including (possibly disconnected) graphlets up to size 8
Spectral Methods for Passive Imaging: Non-asymptotic Performance and Robustness
Approximation of Minimal Functions by Extreme Functions
On a topological version of Pach’s overlap theorem
Benchmark Environments for Multitask Learning in Continuous Domains
On the Euler discretization error of Brownian motion about random times
Graph Classification via Deep Learning with Virtual Nodes
Continuous Representation of Location for Geolocation and Lexical Dialectology using Mixture Density Networks
Deep Edge-Aware Saliency Detection
Quasi-PTAS for Scheduling with Precedences using LP Hierarchies
Dockerface: an easy to install and use Faster R-CNN face detector in a Docker container
Enumerations relating braid and commutation classes
A novel sandwich algorithm for empirical Bayes analysis of rank data
Towards Learning Reward Functions from User Interactions
Streaming Periodicity with Mismatches
Fluency-Guided Cross-Lingual Image Captioning
Learning body-affordances to simplify action spaces
BiRank: Towards Ranking on Bipartite Graphs
Monocular Dense 3D Reconstruction of a Complex Dynamic Scene from Two Perspective Frames
Bringing Background into the Foreground: Making All Classes Equal in Weakly-supervised Video Semantic Segmentation
Shared Spectrum Access Communications: A Neutral Host Micro Operator Approach
Linking systems of difference sets
The square of a planar cubic graph is $7$-colorable
Throughput Enhancement of Multicarrier Cognitive M2M Networks: Universal-Filtered OFDM Systems
Sobolev regularity for the porous medium equation with a force
Resource Allocation in Shared Spectrum Access Communications for Operators with Diverse Service Requirements
A class of cyclotomic linear codes and their generalized Hamming weights
Discrete time Pontryagin maximum principle for optimal control problems under state-action-frequency constraints
Skill of global raw and postprocessed ensemble predictions of rainfall over northern tropical Africa
Supercritical Superprocesses: Proper Normalization and Non-degenerate Strong Limit
Distributed Weighted Sum-Rate Maximization in Multicell MU-MIMO OFDMA Downlink
Smart Meter Privacy via the Trapdoor Channel
Coexistence of Systems with Different Multicarrier Waveforms in LSA Communications
Knock-Knock: Acoustic Object Recognition by using Stacked Denoising Autoencoders
Granular hopping conduction in (Ag,Mo)$_x$(SnO$_2$)$_{1-x}$ films in the dielectric regime
Efficient Downlink Channel Probing and Uplink Feedback in FDD Massive MIMO Systems
Convex Approximated Weighted Sum-Rate Maximization for Multicell Multiuser OFDM
Actively Learning what makes a Discrete Sequence Valid
Well-posedness of the martingale problem for non-local perturbations of Lévy-type generators
Comparison of Decoding Strategies for CTC Acoustic Models
On the centre of mass of a random walk
Ensemble Methods for Personalized E-Commerce Search Challenge at CIKM Cup 2016
Lifting tropical bitangents
On Vector ARMA Models Consistent with a Finite Matrix Covariance Sequence
Learning with Rethinking: Recurrently Improving Convolutional Neural Networks through Feedback
Edge-magic labelings for constellations and armies of caterpillars
Sparse Inverse Covariance Estimation for High-throughput microRNA Sequencing Data in the Poisson Log-Normal Graphical Model
Linear algebraic analogues of the graph isomorphism problem and the Erdős-Rényi model
Pathological Pulmonary Lobe Segmentation from CT Images using Progressive Holistically Nested Neural Networks and Random Walker
Directed Ramsey number for trees
DesnowNet: Context-Aware Deep Network for Snow Removal
Stable matchings in high dimensions via the Poisson-weighted infinite tree
Polynomial-time algorithms for the Longest Induced Path and Induced Disjoint Paths problems on graphs of bounded mim-width
Sample Efficient Estimation and Recovery in Sparse FFT via Isolation on Average
Improved Regularization of Convolutional Neural Networks with Cutout
Efficient Intersection Control for Minimally Guided Vehicles: A Self-Organised and Decentralized Approach
Database of Parliamentary Speeches in Ireland, 1919-2013
Interactions between species introduce spurious associations in microbiome studies
Automatic Summarization of Online Debates
Gold Standard Online Debates Summaries and First Experiments Towards Automatic Summarization of Online Debate Data
Segmentation-Aware Convolutional Networks Using Local Attention Masks
A Robust Consensus Algorithm for Current Sharing and Voltage Regulation in DC Microgrids