“You shouldn’t be collecting Big Data under the premise that more data is better, cooler, sexier, etc.” Pradyumna S. Upadrashta ( February 13, 2015 )
‘C’ and ‘Java’ Source Code Generator for Fitted Glm Objects (glm.deploy)
Provides two functions that generate source code implementing the predict function of fitted glm objects. In this version, code can be generated for either ‘C’ or ‘Java’. The idea is to provide a tool for the easy and fast deployment of glm predictive models into production. The source code generated by this package implements two functions/methods: one implements the equivalent of predict(type=’response’), while the second implements predict(type=’link’). Source code is written to disk as a .c or .java file in the specified path. In the case of ‘C’, an .h file is also generated.
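The split into link-scale and response-scale predictions is easy to picture. As a sketch (the coefficients and function names below are hypothetical, not glm.deploy’s actual generated output, and it is written in Python rather than the emitted ‘C’ or ‘Java’), the generated code for a fitted logistic glm amounts to:

```python
import math

# Hypothetical coefficients from a fitted logistic-regression glm
INTERCEPT = -1.5
COEFS = [0.8, -0.3]

def glm_predict_link(x):
    # Equivalent of predict(type="link"): the linear predictor eta
    return INTERCEPT + sum(c * xi for c, xi in zip(COEFS, x))

def glm_predict_response(x):
    # Equivalent of predict(type="response"): inverse logit of eta
    return 1.0 / (1.0 + math.exp(-glm_predict_link(x)))
```

For other glm families only the inverse-link function in the response method changes; the link method is always the linear predictor.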
Bayesian Synthetic Likelihood with Graphical Lasso (BSL)
Bayesian synthetic likelihood (BSL, Price et al. (2018) <doi:10.1080/10618600.2017.1302882>) is an alternative to standard, non-parametric approximate Bayesian computation (ABC). BSL assumes a multivariate normal distribution for the summary statistic likelihood and is suitable when the distribution of the model summary statistics is sufficiently regular. This package provides a Metropolis-Hastings Markov chain Monte Carlo implementation of BSL and of BSL with graphical lasso (BSLasso, An et al. (2018) <https://…/>), which is computationally more efficient when the dimension of the summary statistic is high. Extensions to this package are planned.
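The core synthetic-likelihood step is simple to sketch. Assuming a scalar summary statistic for brevity (the package handles multivariate summaries with a full covariance matrix, and BSLasso additionally regularizes that covariance with the graphical lasso), a minimal Python illustration might look like:

```python
import math
import random

def synthetic_loglik(s_obs, simulate, n=200, seed=0):
    # BSL idea, reduced to a scalar summary: simulate n datasets under the
    # model, fit a normal to the simulated summaries, then evaluate the
    # observed summary under that fitted normal.
    rng = random.Random(seed)
    sims = [simulate(rng) for _ in range(n)]
    mu = sum(sims) / n
    var = sum((s - mu) ** 2 for s in sims) / (n - 1)
    return -0.5 * math.log(2 * math.pi * var) - (s_obs - mu) ** 2 / (2 * var)
```

Inside a Metropolis-Hastings loop, this quantity replaces the intractable log-likelihood of the summary statistic at each proposed parameter value.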
ANSI Control Sequence Aware String Functions (fansi)
Counterparts to R string manipulation functions that account for the effects of ANSI text formatting control sequences.
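To see why such counterparts are needed: naive nchar/substr-style operations count the bytes of the escape sequences themselves. A minimal Python sketch of an ANSI-aware string width (handling only SGR color/style sequences, a simplification of what fansi covers):

```python
import re

# SGR (color/style) escape sequences: ESC [ ... m
ANSI_SGR = re.compile(r"\x1b\[[0-9;]*m")

def display_width(s):
    # Length of the string as it appears on screen, ignoring SGR codes
    return len(ANSI_SGR.sub("", s))
```

For example, a red-colored "red" has a naive length of 12 characters but a display width of 3.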
Photometry Tools (ProFound)
Core package containing all the tools for simple and advanced source extraction. This is used to create inputs for ‘ProFit’, or for source detection, extraction and photometry in its own right.
Report Functions to Create HTML and PDF Files (R3port)
Create and combine HTML and PDF reports from within R. Tables and listings can be designed for reporting, and R plots can be included.
2. Think Like a Freak — by Dubner & Levitt
3. Innumeracy — by John Allen Paulos
4. Naked Statistics — by Charles Wheelan
5. Practical Statistics for Data Scientists — by Andrew & Peter Bruce
6. Think Stats — by Allen B. Downey
Static and Dynamic Robust PCA via Low-Rank + Sparse Matrix Decomposition: A Review
Principal Components Analysis (PCA) is one of the most widely used dimension reduction techniques. Robust PCA (RPCA) refers to the problem of PCA when the data may be corrupted by outliers. Recent work by Candes, Wright, Li, and Ma defined RPCA as a problem of decomposing a given data matrix into the sum of a low-rank matrix (true data) and a sparse matrix (outliers). The column space of the low-rank matrix then gives the PCA solution. This simple definition has led to a large amount of interesting new work on provably correct, fast, and practically useful solutions to the RPCA problem. More recently, the dynamic (time-varying) version of the RPCA problem has been studied and a series of provably correct, fast, and memory-efficient tracking solutions have been proposed. Dynamic RPCA (or robust subspace tracking) is the problem of tracking data lying in a (slowly) changing subspace while being robust to sparse outliers. This article provides an exhaustive review of the last decade of literature on RPCA and its dynamic counterpart (robust subspace tracking), along with describing their theoretical guarantees, discussing the pros and cons of various approaches, and providing empirical comparisons of performance and speed.
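The decomposition at the heart of this definition is easy to set up concretely. A minimal Python sketch (the data is invented for illustration; recovering L and S given only M is what the surveyed algorithms actually do):

```python
# "True data": a rank-1 matrix L built from an outer product u v^T
u = [1.0, 2.0, 3.0]
v = [4.0, 5.0, 6.0, 7.0]
L = [[ui * vj for vj in v] for ui in u]

# "Outliers": a sparse matrix S with a single gross corruption
S = [[0.0] * len(v) for _ in u]
S[1][2] = 25.0

# Observed data: M = L + S. RPCA takes only M as input and must
# separate it back into the low-rank part and the sparse part.
M = [[L[i][j] + S[i][j] for j in range(len(v))] for i in range(len(u))]
```

The column space of the recovered L (here spanned by u alone, since L has rank 1) is the robust PCA solution.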
Algorithmic Social Intervention
Social and behavioral interventions are a critical tool for governments and communities to tackle deep-rooted societal challenges such as homelessness, disease, and poverty. However, real-world interventions are almost always plagued by limited resources and limited data, which creates a computational challenge: how can we use algorithmic techniques to enhance the targeting and delivery of social and behavioral interventions? The goal of my thesis is to provide a unified study of such questions, collectively considered under the name ‘algorithmic social intervention’. This proposal introduces algorithmic social intervention as a distinct area with characteristic technical challenges, presents my published research in the context of these challenges, and outlines open problems for future work. A common technical theme is decision making under uncertainty: how can we find actions which will impact a social system in desirable ways under limitations of knowledge and resources? The primary application area for my work thus far is public health, e.g. HIV or tuberculosis prevention. For instance, I have developed a series of algorithms which optimize social network interventions for HIV prevention. Two of these algorithms have been pilot-tested in collaboration with LA-area service providers for homeless youth, with preliminary results showing substantial improvement over status-quo approaches. My work also spans other topics in infectious disease prevention and underlying algorithmic questions in robust and risk-aware submodular optimization. …
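The submodular-optimization thread can be illustrated with the textbook greedy algorithm for maximum coverage (a stand-in example, not the author’s actual network-intervention algorithms): choose k outreach sets to reach as many individuals as possible. Because coverage is submodular, the greedy rule carries a (1 - 1/e) approximation guarantee.

```python
def greedy_max_coverage(sets, k):
    # Classic greedy for submodular coverage: repeatedly pick the set
    # that covers the most not-yet-covered elements.
    covered, chosen = set(), []
    for _ in range(k):
        best = max(range(len(sets)), key=lambda i: len(sets[i] - covered))
        chosen.append(best)
        covered |= sets[best]
    return chosen, covered
```

The robust and risk-aware variants mentioned above complicate this picture by optimizing against uncertainty in which elements each set actually covers.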
Competitive Intelligence (CI)
Competitive intelligence is the action of defining, gathering, analyzing, and distributing intelligence about products, customers, competitors, and any aspect of the environment needed to support executives and managers in making strategic decisions for an organization. Competitive intelligence essentially means understanding and learning what’s happening in the world outside your business so you can be as competitive as possible. It means learning as much as possible, as soon as possible, about one’s industry in general, one’s competitors, or even one’s county’s particular zoning rules. In short, it empowers you to anticipate and face challenges head-on. A more focused definition of CI regards it as the organizational function responsible for the early identification of risks and opportunities in the market before they become obvious; experts also call this process early signal analysis. This definition draws attention to the difference between the dissemination of widely available factual information (such as market statistics, financial reports, and newspaper clippings) performed by functions such as libraries and information centers, and competitive intelligence, which is a perspective on developments and events aimed at yielding a competitive edge.
Competitive Intelligence and 6 Tips for Its Effective Use …
In this paper, we present a novel non-parametric clustering technique, which is based on an iterative algorithm that peels off layers of points around the clusters. Our technique is based on the notion that each latent cluster is comprised of layers that surround its core, where the external layers, or border points, implicitly separate the clusters. Analyzing the K-nearest neighbors of the points makes it possible to identify the border points and associate them with points of inner layers. Our clustering algorithm iteratively identifies border points, peels them, and separates the latent clusters. We show that the peeling process adapts to the local density and successfully separates adjacent clusters. A notable quality of the Border-Peeling algorithm is that it does not require any parameter tuning in order to outperform state-of-the-art finely-tuned non-parametric clustering methods, including Mean-Shift and DBSCAN. We further assess our technique on high-dimensional datasets that vary in size and characteristics. In particular, we analyze the space of deep features that were trained by a convolutional neural network. …
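A heavily simplified sketch of the border-identification idea (using mean k-NN distance as the local-density proxy and a fixed ratio threshold, which are illustrative choices, not the paper’s actual association scheme):

```python
import math

def knn_mean_dists(points, k):
    # Mean distance to the k nearest neighbors: a simple density proxy;
    # border points sit in sparser regions, so their value is larger.
    out = []
    for i, p in enumerate(points):
        d = sorted(math.dist(p, q) for j, q in enumerate(points) if j != i)
        out.append(sum(d[:k]) / k)
    return out

def peel_borders(points, k=3, ratio=1.5):
    # One peeling step: mark as border those points whose mean k-NN
    # distance exceeds `ratio` times the dataset median, and remove them.
    d = knn_mean_dists(points, k)
    median = sorted(d)[len(d) // 2]
    border = [i for i, x in enumerate(d) if x > ratio * median]
    core = [p for i, p in enumerate(points) if i not in set(border)]
    return core, border
```

Iterating this step peels successive layers; the full algorithm additionally associates each peeled border point with an inner-layer point so clusters can be reassembled.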
“Hadoop has an irreparably fractured ecosystem.” Joey Zwicker ( February 12, 2015 )