# Book Memo: “Convex Analysis and Global Optimization”

**04**
*Sunday*
Dec 2016

Posted Books

**04**
*Sunday*
Dec 2016

Posted R Packages

*Plotting for Bayesian Models*

Plotting functions for posterior analysis, model checking, and MCMC diagnostics. The package is designed not only to provide convenient functionality for users, but also a common set of functions that can be easily used by developers working on a variety of R packages for Bayesian modeling, particularly (but not exclusively) packages interfacing with Stan.

Generates a frequency distribution. The frequency distribution includes raw frequencies, percentages in each category, and cumulative frequencies. The frequency distribution can be stored as a data frame.
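The three components named above (raw frequencies, percentages, cumulative frequencies) are easy to mirror outside R; a minimal Python sketch on invented data, not the package's own implementation:

```python
from collections import Counter

def frequency_distribution(values):
    """Raw frequencies, percentages in each category, and cumulative frequencies."""
    counts = Counter(values)
    total = len(values)
    rows, cum = [], 0
    for category, freq in sorted(counts.items()):
        cum += freq
        rows.append({"category": category, "freq": freq,
                     "percent": 100.0 * freq / total, "cum_freq": cum})
    return rows

table = frequency_distribution(["a", "b", "a", "c", "a", "b"])
# the row for "a": freq 3, percent 50.0, cum_freq 3
```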

An implementation of the generated effect modifier (GEM) method. This method constructs composite variables by linearly combining pre-treatment scalar patient characteristics to create optimal treatment effect modifiers in linear models. The optimal linear combination is called a GEM. Treatment is assumed to have been assigned at random. For reference, see E Petkova, T Tarpey, Z Su, and RT Ogden. Generated effect modifiers (GEMs) in randomized clinical trials. Biostatistics (First published online: July 27, 2016, <doi:10.1093/biostatistics/kxw035>).

Provides functions for the Bayesian analysis of extreme value models. The ‘rust’ package <https://…/package=rust> is used to simulate a random sample from the required posterior distribution. The functionality of ‘revdbayes’ is similar to the ‘evdbayes’ package <https://…/package=evdbayes>, which uses Markov Chain Monte Carlo (MCMC) methods for posterior simulation. See the ‘revdbayes’ website for more information, documentation and examples.

Log-linear modeling is a popular method for the analysis of contingency table data. When the table is sparse, the data can fall on the boundary of the convex support, and we say that ‘the MLE does not exist’ in the sense that some parameters cannot be estimated. However, an extended MLE always exists, and a subset of the original parameters will be estimable. The ‘eMLEloglin’ package determines which sampling zeros contribute to the non-existence of the MLE. These problematic zero cells can be removed from the contingency table and the model can then be fit (as far as is possible) using the glm() function.
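The boundary problem is visible already in the smallest case: in a 2x2 table the interaction parameter of the saturated log-linear model equals the log odds ratio, so a single sampling zero drives its MLE to infinity. A minimal sketch of the phenomenon the package addresses (not the 'eMLEloglin' algorithm itself):

```python
import math

def log_odds_ratio(table):
    """Interaction parameter of the saturated log-linear model for a 2x2 table.
    With a sampling zero the estimate is infinite: the MLE does not exist."""
    (a, b), (c, d) = table
    if 0 in (a, b, c, d):
        return math.inf if (b == 0 or c == 0) else -math.inf
    return math.log((a * d) / (b * c))

print(log_odds_ratio([[5, 3], [2, 4]]))  # finite: the MLE exists
print(log_odds_ratio([[5, 0], [2, 4]]))  # inf: the zero cell lies on the boundary
```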

**04**
*Sunday*
Dec 2016

Posted What is ...

One of the most important ideas in a research project is the unit of analysis. The unit of analysis is the major entity that you are analyzing in your study. For instance, any of the following could be a unit of analysis in a study:

• individuals

• groups

• artifacts (books, photos, newspapers)

• geographical units (town, census tract, state)

• social interactions (dyadic relations, divorces, arrests)

Why is it called the ‘unit of analysis’ and not something else (like, the unit of sampling)? Because it is the analysis you do in your study that determines what the unit is. For instance, if you are comparing the children in two classrooms on achievement test scores, the unit is the individual child because you have a score for each child. On the other hand, if you are comparing the two classes on classroom climate, your unit of analysis is the group, in this case the classroom, because you only have a classroom climate score for the class as a whole and not for each individual student. For different analyses in the same study you may have different units of analysis. If you decide to base an analysis on student scores, the individual is the unit. But you might decide to compare average classroom performance. In this case, since the data that goes into the analysis is the average itself (and not the individuals’ scores) the unit of analysis is actually the group. Even though you had data at the student level, you use aggregates in the analysis. In many areas of social research these hierarchies of analysis units have become particularly important and have spawned a whole area of statistical analysis sometimes referred to as hierarchical modeling. This is true in education, for instance, where we often compare classroom performance but collected achievement data at the individual student level. … Unit of Analysis
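The classroom example above can be made concrete in code; a toy sketch with invented scores, showing that the two analyses operate on different numbers of observations:

```python
# Hypothetical achievement scores for two classrooms.
classrooms = {
    "A": [72, 85, 90, 78],
    "B": [88, 64, 75],
}

# Unit of analysis = the individual child: one observation per student.
student_scores = [s for scores in classrooms.values() for s in scores]

# Unit of analysis = the group: one observation per classroom (the average).
class_means = {c: sum(v) / len(v) for c, v in classrooms.items()}

print(len(student_scores))  # 7 observations at the student level
print(len(class_means))     # 2 observations at the classroom level
```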

**04**
*Sunday*
Dec 2016

Posted Magister Dixit

“Enormous data sets often consist of enormous numbers of small sets of data, none of which by themselves are enough to solve the thing you are interested in, and they fit together in some complicated way.” John D. Cook (15 December 2010)

**03**
*Saturday*
Dec 2016

Posted Books

**03**
*Saturday*
Dec 2016

Posted R Packages

*Relabel Loadings from MCMC Output for Confirmatory Factor Analysis*

In confirmatory factor analysis (CFA), structural constraints typically ensure that the model is identified up to all possible reflections, i.e., column sign changes of the matrix of loadings. Such reflection invariance is problematic for Bayesian CFA when the reflection modes are not well separated in the posterior distribution. Imposing rotational constraints — fixing some loadings to be zero or positive in order to pick a factor solution that corresponds to one reflection mode — may not provide a satisfactory solution for Bayesian CFA. The function ‘relabel’ uses the relabeling algorithm of Erosheva and Curtis to correct for sign invariance in MCMC draws from CFA models. The MCMC draws should come from Bayesian CFA models that are fit without rotational constraints.

Selection of the number of clusters in cluster analysis using stability methods.

Interactive visualizations and tabulations for diffrprojects. All presentations are based on the htmlwidgets framework, allowing for interactivity via HTML and JavaScript, RStudio viewer integration, RMarkdown integration, as well as Shiny compatibility.

Estimation and inference methods for the continuous threshold expectile regression. It can fit the continuous threshold expectile regression and test for the existence of a change point. See: Feipeng Zhang and Qunhua Li (2016), ‘A continuous threshold expectile regression’, submitted.

Solve systematic reserve design problems using integer programming techniques. To solve problems most efficiently, users can install optional packages not available on CRAN: the ‘gurobi’ optimizer (available from <http://…/>) and the conservation prioritization package ‘marxan’ (available from <https://…/marxan>).

**03**
*Saturday*
Dec 2016

Posted What is ...

The Web Ontology Language (OWL) is a family of knowledge representation languages for authoring ontologies. Ontologies are a formal way to describe taxonomies and classification networks, essentially defining the structure of knowledge for various domains: the nouns representing classes of objects and the verbs representing relations between the objects. Ontologies resemble class hierarchies in object-oriented programming, but there are several critical differences. Class hierarchies are meant to represent structures used in source code that evolve fairly slowly (typically monthly revisions), whereas ontologies are meant to represent information on the Internet and are expected to be evolving almost constantly. Similarly, ontologies are typically far more flexible, as they are meant to represent information on the Internet coming from all sorts of heterogeneous data sources. Class hierarchies, on the other hand, are meant to be fairly static and rely on far less diverse and more structured sources of data such as corporate databases. The OWL languages are characterized by formal semantics. They are built upon a W3C XML standard for objects called the Resource Description Framework (RDF). OWL and RDF have attracted significant academic, medical and commercial interest. In October 2007, a new W3C working group was started to extend OWL with several new features as proposed in the OWL 1.1 member submission. W3C announced the new version of OWL on 27 October 2009. This new version, called OWL 2, soon found its way into semantic editors such as Protégé and semantic reasoners such as Pellet, RacerPro, FaCT++ and HermiT. The OWL family contains many species, serializations, syntaxes and specifications with similar names. OWL and OWL2 are used to refer to the 2004 and 2009 specifications, respectively. Full species names will be used, including specification version (for example, OWL2 EL). When referring more generally, OWL Family will be used. … Web Ontology Language (OWL)

**03**
*Saturday*
Dec 2016

Posted Magister Dixit

“Data modeling, simulation, and other digital tools are reshaping how we innovate.” Bob McDonald

**02**
*Friday*
Dec 2016

Posted Distilled News

**Extracting Tables from PDFs in R using the Tabulizer Package**

Recently I wanted to extract a table from a PDF file so that I could work with the table in R. Specifically, I wanted to get data on layoffs in California from the California Employment Development Department. The EDD publishes a list of all of the layoffs in the state that fall under the WARN act. Unfortunately, the tables are available only in PDF format. I wanted an interactive version of the data that I could work with in R and export to a CSV file. Fortunately, the tabulizer package in R makes this a cinch. In this post, I will use this scenario as a working example to show how to extract data from a PDF file using the tabulizer package.

**Change Point Detection. Part I – a frequentist approach.**

Change Point Detection (CPD) refers to the problem of estimating the time at which the statistical properties of a time series… well… change. It originates in the 1950s as a method used to automatically detect failures in industrial processes (quality control), and it is currently an active area of research that can boast a website of its own. CPD is a generally interesting problem with lots of potential applications other than quality control, ranging from predicting anomalies in the electricity market (and, more generally, in financial markets) to detecting security attacks in a network system or even detecting electrical activity in the brain. The point is to have an algorithm that can automatically detect changes in the properties of the time series for us to make the appropriate decisions. Whatever the application, the general framework is always the same: the underlying probability distribution function of the time series is assumed to change at one (or more) moments in time.
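A minimal frequentist sketch of the single change-point case: scan every split of the series and keep the one maximizing a standardized difference of means. The data and the statistic here are illustrative assumptions, not the method from the post:

```python
import math
import random

def detect_change_point(x):
    """Return the split index maximizing the scaled difference in means."""
    n = len(x)
    best_k, best_stat = None, -1.0
    for k in range(2, n - 2):
        left, right = x[:k], x[k:]
        mean_l = sum(left) / k
        mean_r = sum(right) / (n - k)
        # Scale by sqrt(k(n-k)/n) so splits near the edges are not favored.
        stat = abs(mean_l - mean_r) * math.sqrt(k * (n - k) / n)
        if stat > best_stat:
            best_k, best_stat = k, stat
    return best_k

random.seed(1)
# Synthetic series: the mean shifts from 0.0 to 2.0 at index 50.
series = [random.gauss(0.0, 0.5) for _ in range(50)] + \
         [random.gauss(2.0, 0.5) for _ in range(50)]
cp = detect_change_point(series)
```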

**Efficiently Saving and Sharing Data in R**

After spending a day the other week struggling to make sense of a federal data set shared in an archaic format (an ASCII fixed-format .dat file), I was reminded that effective distribution and sharing of data requires that it use the minimum amount of disk space and be rapidly accessible to potential users. In this post I test four different file formats available to R users: comma-separated values (write.csv()), an object representation as ASCII text (dput()), a serialized R object (saveRDS()), and a Stata file (write.dta() from the foreign package). For reference, rds files seem to be identical to Rdata files except that they hold a single object rather than potentially multiple objects.
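The same text-versus-binary trade-off is easy to probe from Python, as an analogous illustration (CSV and pickle standing in for write.csv() and saveRDS(); the data are invented):

```python
import csv
import os
import pickle
import tempfile

rows = [{"id": i, "value": i * 0.5} for i in range(1000)]

tmp = tempfile.mkdtemp()
csv_path = os.path.join(tmp, "data.csv")
pkl_path = os.path.join(tmp, "data.pkl")

# Text format: human-readable, widely portable.
with open(csv_path, "w", newline="") as f:
    writer = csv.DictWriter(f, fieldnames=["id", "value"])
    writer.writeheader()
    writer.writerows(rows)

# Serialized binary object: fast to reload, Python-specific.
with open(pkl_path, "wb") as f:
    pickle.dump(rows, f, protocol=pickle.HIGHEST_PROTOCOL)

print(os.path.getsize(csv_path), os.path.getsize(pkl_path))
```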

We introduce the task of Visual Dialog, which requires an AI agent to hold a meaningful dialog with humans in natural, conversational language about visual content. Specifically, given an image, a dialog history, and a question about the image, the agent has to ground the question in the image, infer context from history, and answer the question accurately. Visual Dialog is disentangled enough from a specific downstream task so as to serve as a general test of machine intelligence, while being grounded in vision enough to allow objective evaluation of individual responses and benchmark progress. We develop a novel two-person chat data-collection protocol to curate a large-scale Visual Dialog dataset (VisDial). Data collection is underway and, on completion, VisDial will contain 1 dialog with 10 question-answer pairs on all ~200k images from COCO, for a total of 2M dialog question-answer pairs. We introduce a family of neural encoder-decoder models for Visual Dialog with 3 encoders — Late Fusion, Hierarchical Recurrent Encoder and Memory Network — and 2 decoders (generative and discriminative), which outperform a number of sophisticated baselines. We propose a retrieval-based evaluation protocol for Visual Dialog where the AI agent is asked to sort a set of candidate answers and is evaluated on metrics such as the mean reciprocal rank of the human response. We quantify the gap between machine and human performance on the Visual Dialog task via human studies. Our dataset, code, and trained models will be released publicly. Putting it all together, we demonstrate the first ‘visual chatbot’!
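The retrieval metric mentioned in the abstract, mean reciprocal rank, is simple to compute; a sketch assuming each dialog round records the 1-based rank the model assigned to the human response among the candidate answers:

```python
def mean_reciprocal_rank(human_ranks):
    """human_ranks: 1-based rank of the human answer among sorted candidates."""
    return sum(1.0 / r for r in human_ranks) / len(human_ranks)

# Hypothetical ranks for four dialog rounds.
print(mean_reciprocal_rank([1, 2, 1, 4]))  # 0.6875
```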

**DGraph: A Scalable, Distributed, Low Latency, High Throughput Graph Database**

Dgraph is a distributed, low-latency, high-throughput, native graph database. It is written from scratch in the Go language, with concurrency and scalability in mind.

**02**
*Friday*
Dec 2016

Posted arXiv Papers

**Principal component analysis of periodically correlated functional time series**

Within the framework of functional data analysis, we develop principal component analysis for periodically correlated time series of functions. We define the components of the above analysis, including periodic, operator-valued filters, score processes and the inversion formulas. We show that these objects are defined via convergent series under a simple condition requiring summability of the Hilbert-Schmidt norms of the filter coefficients, and that they possess optimality properties. We explain how the Hilbert space theory reduces to an approximate finite-dimensional setting which is implemented in a custom-built R package. A data example and a simulation study show that the new methodology is superior to existing tools if the functional time series exhibit periodic characteristics.

**PAG2ADMG: A Novel Methodology to Enumerate Causal Graph Structures**

Causal graphs, such as directed acyclic graphs (DAGs) and partial ancestral graphs (PAGs), represent causal relationships among variables in a model. Methods exist for learning DAGs and PAGs from data and for converting DAGs to PAGs. However, these methods output only a single causal graph consistent with the independencies/dependencies (the Markov equivalence class) estimated from the data. Many distinct graphs may be consistent with that equivalence class, and a data modeler may wish to select among these using domain knowledge. In this paper, we present a method that makes this possible. We introduce PAG2ADMG, the first method for enumerating all causal graphs consistent with a given Markov equivalence class, under certain assumptions. PAG2ADMG converts a given PAG into a set of acyclic directed mixed graphs (ADMGs). We prove the correctness of the approach and demonstrate its efficiency relative to brute-force enumeration.

**Towards Robust Deep Neural Networks with BANG**

Machine learning models, including state-of-the-art deep neural networks, are vulnerable to small perturbations that cause unexpected classification errors. This unexpected lack of robustness raises fundamental questions about their generalization properties and poses a serious concern for practical deployments. As such perturbations can remain imperceptible – commonly called adversarial examples that demonstrate an inherent inconsistency between vulnerable machine learning models and human perception – some prior work casts this problem as a security issue as well. Despite the significance of the discovered instabilities and ensuing research, their cause is not well understood, and no effective method has been developed to address the problem highlighted by adversarial examples. In this paper, we present a novel theory to explain why this unpleasant phenomenon exists in deep neural networks. Based on that theory, we introduce a simple, efficient and effective training approach, Batch Adjusted Network Gradients (BANG), which significantly improves the robustness of machine learning models. While the BANG technique does not rely on any form of data augmentation or the application of adversarial images for training, the resultant classifiers are more resistant to adversarial perturbations while maintaining or even enhancing classification performance overall.

**A New Method for Classification of Datasets for Data Mining**

The decision tree is an important method for both induction research and data mining, mainly used for model classification and prediction. The ID3 algorithm is the most widely used decision tree algorithm so far. In this paper, ID3’s shortcoming of inclining to choose attributes with many values is discussed, and a new decision tree algorithm, an improved version of ID3, is proposed. In the proposed algorithm, attributes are divided into groups and the selection measure is applied to these groups; if the information gain is not good, the attribute values are divided into groups again. These steps are repeated until a good classification/misclassification ratio is obtained. The proposed algorithm classifies the data sets more accurately and efficiently.
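ID3's selection measure is information gain, and its bias toward many-valued attributes follows directly from the formula; a minimal sketch on invented data, showing that a useless identifier attribute scores just as well as a genuinely predictive one:

```python
import math
from collections import Counter

def entropy(labels):
    n = len(labels)
    return -sum((c / n) * math.log2(c / n) for c in Counter(labels).values())

def information_gain(values, labels):
    """Entropy of the labels minus the weighted entropy after splitting on the attribute."""
    n = len(labels)
    split = {}
    for v, y in zip(values, labels):
        split.setdefault(v, []).append(y)
    remainder = sum(len(ys) / n * entropy(ys) for ys in split.values())
    return entropy(labels) - remainder

labels = ["yes", "yes", "no", "no"]
print(information_gain(["a", "a", "b", "b"], labels))  # 1.0: a perfect split
print(information_gain(["1", "2", "3", "4"], labels))  # 1.0: a unique-ID attribute looks just as good
```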

**Learning in an Uncertain World: Representing Ambiguity Through Multiple Hypotheses**

Many prediction tasks contain uncertainty. In the case of next-frame or future prediction the uncertainty is inherent in the task itself, as it is impossible to foretell what exactly is going to happen in the future. Another source of uncertainty or ambiguity is the way data is labeled. Sometimes not all objects of interest are annotated in a given image or the annotation is ambiguous, e.g. in the form of occluded joints in human pose estimation. We present a method that is able to handle these problems by predicting not a single output but multiple hypotheses. More precisely, we propose a framework for re-formulating existing single prediction models as multiple hypothesis prediction (MHP) problems as well as a meta loss and an optimization procedure to train the resulting MHP model. We consider three entirely different applications, i.e. future prediction, image classification and human pose estimation, and demonstrate how existing single hypothesis predictors (SHPs) can be turned into MHPs. The performed experiments show that the resulting MHP outperforms the existing SHP and yields additional insights regarding the variation and ambiguity of the predictions.
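The core of the MHP idea can be sketched as an oracle ("winner-takes-all") meta loss: for each sample, only the best of the K hypotheses is penalized. A toy version with squared error (the paper's actual meta loss is a relaxed variant; this is an illustrative assumption):

```python
def mhp_meta_loss(y_true, hypotheses):
    """Average over samples of the squared error of the *best* hypothesis."""
    total = 0.0
    for y, hyps in zip(y_true, hypotheses):
        total += min((h - y) ** 2 for h in hyps)
    return total / len(y_true)

# An ambiguous target is covered if either of two hypotheses is close:
y_true = [1.0, -1.0]
two_modes = [[1.0, -1.0], [1.0, -1.0]]  # each sample picks its nearest mode
one_mean = [[0.0], [0.0]]               # a single hypothesis must average the modes
print(mhp_meta_loss(y_true, two_modes))  # 0.0
print(mhp_meta_loss(y_true, one_mean))   # 1.0
```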

**Training Bit Fully Convolutional Network for Fast Semantic Segmentation**

Fully convolutional neural networks give accurate, per-pixel predictions for input images and have applications such as semantic segmentation. However, a typical FCN usually requires a lot of floating point computation and large run-time memory, which effectively limits its usability. We propose a method to train a Bit Fully Convolutional Network (BFCN), a fully convolutional neural network that has low bit-width weights and activations. Because most of its computation-intensive convolutions are performed between low bit-width numbers, a BFCN can be accelerated by an efficient bit-convolution implementation. On CPU, the dot product operation between two bit vectors can be reduced to bitwise operations and popcounts, which can offer much higher throughput than 32-bit multiplications and additions. To validate the effectiveness of BFCN, we conduct experiments on the PASCAL VOC 2012 semantic segmentation task and Cityscapes. Our BFCN with 1-bit weights and 2-bit activations, which runs 7.8x faster on CPU or requires less than 1% of the resources on FPGA, can achieve comparable performance to the 32-bit counterpart.
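The bitwise trick mentioned above is standard for vectors with entries in {-1, +1} packed into machine words: matching bits contribute +1 and differing bits -1, so the dot product reduces to an XOR and a popcount. A small sketch (Python ints stand in for hardware words):

```python
def bit_dot(a_bits, b_bits, n):
    """Dot product of two length-n vectors with entries in {-1, +1},
    packed as integers whose i-th bit is 1 for +1 and 0 for -1.
    dot = n - 2 * popcount(a XOR b)."""
    return n - 2 * bin((a_bits ^ b_bits) & ((1 << n) - 1)).count("1")

# Vectors (+1, -1, +1, +1) and (+1, +1, -1, +1), packed MSB-first:
print(bit_dot(0b1011, 0b1101, 4))  # 0
```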

**Bootstrapping incremental dialogue systems: using linguistic knowledge to learn from minimal data**

We present a method for inducing new dialogue systems from very small amounts of unannotated dialogue data, showing how word-level exploration using Reinforcement Learning (RL), combined with an incremental and semantic grammar – Dynamic Syntax (DS) – allows systems to discover, generate, and understand many new dialogue variants. The method avoids the use of expensive and time-consuming dialogue act annotations, and supports more natural (incremental) dialogues than turn-based systems. Here, language generation and dialogue management are treated as a joint decision/optimisation problem, and the MDP model for RL is constructed automatically. With an implemented system, we show that this method enables a wide range of dialogue variations to be automatically captured, even when the system is trained from only a single dialogue. The variants include question-answer pairs, over- and under-answering, self- and other-corrections, clarification interaction, split-utterances, and ellipsis. This generalisation property results from the structural knowledge and constraints present within the DS grammar, and highlights some limitations of recent systems built using machine learning techniques only.

**Spatial Decompositions for Large Scale SVMs**

Although support vector machines (SVMs) are theoretically well understood, their underlying optimization problem becomes very expensive if, for example, hundreds of thousands of samples and a non-linear kernel are considered. Several approaches have been proposed in the past to address this serious limitation. In this work we investigate a decomposition strategy that learns on small, spatially defined data chunks. Our contributions are twofold: On the theoretical side we establish an oracle inequality for the overall learning method using the hinge loss, and show that the resulting rates match those known for SVMs solving the complete optimization problem with Gaussian kernels. On the practical side we compare our approach to learning SVMs on small, randomly chosen chunks. Here it turns out that for comparable training times our approach is significantly faster during testing and also reduces the test error in most cases significantly. Furthermore, we show that our approach easily scales up to 10 million training samples: including hyper-parameter selection using cross validation, the entire training only takes a few hours on a single machine. Finally, we report an experiment on 32 million training samples.

**Definition Modeling: Learning to define word embeddings in natural language**

Distributed representations of words have been shown to capture lexical semantics, as demonstrated by their effectiveness in word similarity and analogical relation tasks. But, these tasks only evaluate lexical semantics indirectly. In this paper, we study whether it is possible to utilize distributed representations to generate dictionary definitions of words, as a more direct and transparent representation of the embeddings’ semantics. We introduce definition modeling, the task of generating a definition for a given word and its embedding. We present several definition model architectures based on recurrent neural networks, and experiment with the models over multiple data sets. Our results show that a model that controls dependencies between the word being defined and the definition words performs significantly better, and that a character-level convolution layer designed to leverage morphology can complement word-level embeddings. Finally, an error analysis suggests that the errors made by a definition model may provide insight into the shortcomings of word embeddings.

• On the number of maximum independent sets in Doob graphs

• Efficient quantum tomography II

• Synchronization over Cartan motion groups via contraction

• Empirical Bayes Methods, Reference Priors, Cross Entropy and the EM Algorithm

• Efficient Estimation in Single Index Models through Smoothing splines

• Sparse generalised polynomials

• Two Methods For Wild Variational Inference

• Model based approach for household clustering with mixed scale variables

• Texture Enhancement via High-Resolution Style Transfer for Single-Image Super-Resolution

• Semi-supervised Kernel Metric Learning Using Relative Comparisons

• Beyond standard benchmarks: Parameterizing performance evaluation in visual object tracking

• Free-Endpoint Optimal Control of Inhomogeneous Bilinear Ensemble Systems

• Computer Assisted Composition with Recurrent Neural Networks

• Optimizing Quantiles in Preference-based Markov Decision Processes

• A representation-theoretic interpretation of positroid classes

• Noise-Tolerant Life-Long Matrix Completion via Adaptive Sampling

• Shape Completion using 3D-Encoder-Predictor CNNs and Shape Synthesis

• Flow polytopes with Catalan volumes

• Robust Optimization for Tree-Structured Stochastic Network Design

• Menu-Based Pricing for Charging of Electric Vehicles with Vehicle-to-Grid Service

• When to Reset Your Keys: Optimal Timing of Security Updates via Learning

• Bayesian Non-parametric Simultaneous Quantile Regression for Complete and Grid Data

• Two-weight and three-weight codes from trace codes over $\mathbb{F}_p+u\mathbb{F}_p+v\mathbb{F}_p+uv\mathbb{F}_p$

• Video Scene Parsing with Predictive Feature Learning

• Optimal three-weight cubic codes

• A Novel Artificial Fish Swarm Algorithm for Pattern Recognition with Convex Optimization

• Three-weight codes and the quintic construction

• A Simple Generalization of a Result for Random Matrices with Independent Sub-Gaussian Rows

• Trace Codes with Few Weights over $\mathbb{F}_p+u\mathbb{F}_p$

• Predicting Long-term Outcomes of Educational Interventions Using the Evolutionary Causal Matrices and Markov Chain Based on Educational Neuroscience

• Secure Polar Coding for the Two-Way Wiretap Channel

• Blind Estimation of Sparse Multi-User Massive MIMO Channels

• CDVAE: Co-embedding Deep Variational Auto Encoder for Conditional Variational Generation

• Estimation and Model Identification of Locally Stationary Varying-Coefficient Additive Models

• RMPE: Regional Multi-person Pose Estimation

• Strong Second-Order Karush–Kuhn–Tucker Optimality Conditions for Vector Optimization

• Bounding the Dimension of Points on a Line

• BASS Net: Band-Adaptive Spectral-Spatial Feature Learning Neural Network for Hyperspectral Image Classification

• Domain Adaptation for Named Entity Recognition in Online Media with Word Embeddings

• Decentralized Consensus Optimization with Asynchrony and Delays

• $1/f^α$ power spectrum in the Kardar-Parisi-Zhang universality class

• Representable Chow classes of a product of projective spaces

• Adversarial Images for Variational Autoencoders

• Global and fixed-terminal cuts in digraphs

• Maximum likelihood drift estimation for Gaussian process with stationary increments

• On the critical branching random walk II: Branching capacity and branching recurrence

• Minimal clusters of four planar regions with the same area

• Online Offering Strategies for Storage-Assisted Renewable Power Producer in Hour-Ahead Market

• Monge’s Optimal Transport Distance with Applications for Nearest Neighbour Image Classification

• An impossibility theorem for paired comparisons

• Efficient Orthogonal Parametrisation of Recurrent Neural Networks Using Householder Reflections

• Equilibrium Computation in Atomic Splittable Singleton Congestion Games

• Flight Dynamics-based Recovery of a UAV Trajectory using Ground Cameras

• Learning Potential Energy Landscapes using Graph Kernels

• Estimating a monotone probability mass function with known flat regions

• Towards a multigrid method for the minimum-cost flow problem

• Analysis of the Human-Computer Interaction on the Example of Image-based CAPTCHA by Association Rule Mining

• Local conditions for exponentially many subdivisions

• A distributed voltage stability margin for power distribution networks

• Mismatched Multi-letter Successive Decoding for the Multiple-Access Channel

• Learning to Generate Images of Outdoor Scenes from Attributes and Semantic Layouts

• Fully Convolutional Crowd Counting On Highly Congested Scenes

• The Coconut Model with Heterogeneous Strategies and Learning

• Interaction Networks for Learning about Objects, Relations and Physics

• Robust Coordinated Transmission and Generation Expansion Planning Considering Ramping Requirements and Construction Periods

• On Coreferring Text-extracted Event Descriptions with the aid of Ontological Reasoning

• Video Captioning with Multi-Faceted Attention

• Persistent random walks. II. Functional Scaling Limits

• An Evaluation of Models for Runtime Approximation in Link Discovery

• Multilingual Multiword Expressions

• Convex hulls of random walks: Expected number of faces and face probabilities

• Remarks on Lagrange Multiplier Rules in Set Valued Optimization

• The Hat Game and covering codes

• Sparsity Preserving Algorithms for Octagons

• A theory of pictures for quasi-posets

• Consensus Control for Linear Systems with Optimal Energy Cost

• Anisotropic (2+1)d growth and Gaussian limits of q-Whittaker processes

• Optimal discrimination designs for semi-parametric models

• Recovering the uniform boundary observability with spectral Legendre-Galerkin formulations of the 1-D wave equation

• A Theoretical Framework for Robustness of (Deep) Classifiers Under Adversarial Noise

• Hippocampus Temporal Lobe Epilepsy Detection using a Combination of Shape-based Features and Spherical Harmonics Representation

• A Compositional Object-Based Approach to Learning Physical Dynamics

• Global Minimum for a Finsler Elastica Minimal Path Approach

• A Diffeomorphic Approach to Multimodal Registration with Mutual Information: Applications to CLARITY Mouse Brain Images

• Large-scale Validation of Counterfactual Learning Methods: A Test-Bed

• Optimization of image description metrics using policy gradient methods

• Multi-modal Variational Encoder-Decoders

• Playing Doom with SLAM-Augmented Deep Reinforcement Learning

• Using Random Boundary Conditions to simulate disordered quantum spin models in 2D-systems

• Tuning the Scheduling of Distributed Stochastic Gradient Descent with Bayesian Optimization

• Temporal Attention-Gated Model for Robust Sequence Classification

• Diet2Vec: Multi-scale analysis of massive dietary data

• Anomaly Detection in Video Using Predictive Convolutional Long Short-Term Memory Networks

• Hypervolume-based Multi-objective Bayesian Optimization with Student-t Processes

• Learning Shape Abstractions by Assembling Volumetric Primitives

• Connectivity properties of Branching Interlacements

• Computerized Multiparametric MR image Analysis for Prostate Cancer Aggressiveness-Assessment

• Deep Variational Information Bottleneck

• Distributed Nash Equilibrium Seeking via the Alternating Direction Method of Multipliers

• Bulk Universality for Generalized Wigner Matrices With Few Moments

• TorontoCity: Seeing the World with a Million Eyes

• Double robust matching estimators for high dimensional confounding adjustment

• On some Euler-Mahonian distributions

• Generalizing Skills with Semi-Supervised Reinforcement Learning

• Stationary random walks on the lattice

• Blocking duality for $p$-modulus on networks

• Efficient Pose and Cell Segmentation using Column Generation