Machine Learning Canvas

The Machine Learning Canvas is a template for developing new or documenting existing intelligent systems based on Machine Learning. It is a visual chart with elements describing the key aspects of such systems: value proposition, data to learn from (to create predictive models), utilization of predictions (to create proposed value), requirements, and measures of performance. It helps teams of data scientists, software engineers, product and business managers, align their activities.

From Strategy to Implementation – Planning an AI-First Company

Our recent series of articles on AI strategies shows the options available for the strategic direction of your AI-first company. Here are some thoughts on moving from strategy to implementation, including some useful tools to help in planning.

MnasNet: Towards Automating the Design of Mobile Machine Learning Models

Convolutional neural networks (CNNs) have been widely used in image classification, face recognition, object detection and many other domains. Unfortunately, designing CNNs for mobile devices is challenging because mobile models need to be small and fast, yet still accurate. Although significant effort has been made to design and improve mobile models, such as MobileNet and MobileNetV2, manually creating efficient models remains challenging when there are so many possibilities to consider. Inspired by recent progress in AutoML neural architecture search, we wondered if the design of mobile CNN models could also benefit from an AutoML approach. In ‘MnasNet: Platform-Aware Neural Architecture Search for Mobile’, we explore an automated neural architecture search approach for designing mobile models using reinforcement learning. To deal with mobile speed constraints, we explicitly incorporate the speed information into the main reward function of the search algorithm, so that the search can identify a model that achieves a good trade-off between accuracy and speed. In doing so, MnasNet is able to find models that run 1.5x faster than state-of-the-art hand-crafted MobileNetV2 and 2.4x faster than NASNet, while reaching the same ImageNet top 1 accuracy.

Python IF, ELIF, and ELSE Statements

1 1 Sejal Jaiswal August 8th, 2018 python +1 Python IF, ELIF, and ELSE Statements In this tutorial, you will learn exclusively about Python if else statements. While writing code in any language, you will have to control the flow of your program. This is generally the case when there is decision making involved – you will want to execute a certain line of codes if a condition is satisfied, and a different set of code incase it is not. In Python, you have the if, elif and the else statements for this purpose. In this tutorial, you will work with an example to learn about the simple if statement and gradually move on to if-else and then the if-elif-else statements. You will also learn about nesting and see an nested if example. Lets get started….

Extending the OpenImageR package with Gabor feature extraction

This blog post illustrates the new functionality of the OpenImageR package (Gabor Feature Extraction). The Gabor features have been used extensively in image analysis and processing (Character and Face recognition). Gabor (Nobel prize winner, an electrical engineer, and physicist) used the following wording which I think it´s worth mentioning in this blog post, ‘You can´t predict the future, but you can invent it.’ (source).

New from RStudio: Package Manager

One of the few remaining hurdles when working with R in the enterprise is consistent access to CRAN. Often desktop class systems will have unrestricted access while server systems might not have any access at all. This inconsistency often stems from security concerns about allowing servers access to the internet. There have been many different approaches to solving this problem, with some organisations reluctantly allowing outbound access to CRAN, some rolling their own internal CRAN-like repositories, and others installing a fixed set of packages and leaving it at that.

RStudio Package Manager

RStudio Package Manager is a new repository management server to organize and centralize R packages across your team, department, or entire organization. Get offline access to CRAN, automate CRAN syncs, share local packages, restrict package access, find packages across repositories, and more. Experience reliable and consistent package management, optimized for teams who use R.

Use R to write multiple tables to a single Excel file

The possibility of saving several tables in a single file is a nice feature of Excel. When sharing results with colleagues, it might be useful to compact everything in a single file. As a bioinformatician, I am too lazy to do that manually, and I searched the web for tools that allow doing that. I found out that there are at least two R packages that work well: xlsx and openxlsx.

ShinyProxy 2.0.1 is out!

Although Shiny apps are very popular for interactive data analysis purposes, many organizations communicated a need to more closely integrate these apps within larger applications and portals. In previous releases we broke down the walls to make this happen: hiding the navbar, single-sign on, theming the landing page and advanced networking support were only a few steps in that direction. With ShinyProxy 2.0.1 we finished the job by implementing a REST API that allows to manage (launch, shut down) Shiny apps and consume the content programmatically in unprecedented ways. Data scientists can now keep ownership of their Shiny app knowing that it will seamlessly integrate with any other technology that is used by their organization´s websites, applications or portals.

What they forgot to teach you about R

In this workshop you´ll learn holistic workflows that address the most common sources of friction in data analysis. We´ll work on project-oriented workflows, version control for data science (Git/GitHub!), and how to plan for collaboration, communication, and iteration (incl. RMarkdown). In terms of your R skills, expect to come away with new knowledge of your R installation, how to maintain it, robust strategies for working with the file system, and ways to use the purrr package for repetitive tasks.

mmpf: Monte-Carlo Methods for Prediction Functions

Machine learning methods can often learn high-dimensional functions which generalize well but are not human interpretable. The mmpf package marginalizes prediction functions using Monte-Carlo methods, allowing users to investigate the behavior of these learned functions, as on a lower dimensional subset of input features: partial dependence and variations thereof. This makes machine learning methods more useful in situations where accurate prediction is not the only goal, such as in the social sciences where linear models are commonly used because of their interpretability. Many methods for estimating prediction functions produce estimated functions which are not directly human-interpretable because of their complexity: for example, they may include high dimensional interactions and/or complex nonlinearities. While a learning method´s capacity to automatically learn interactions and nonlinearities is attractive when the goal is prediction, there are many cases where users want good predictions and the ability to understand how predictions depend on the features. mmpf implements general methods for interpreting prediction functions using Monte-Carlo methods. These methods allow any function which generates predictions to be be interpreted. mmpf is currently used in other packages for machine learning like edarf and mlr (Jones and Linder, 2016; Bischl et al., 2016).

MGLM: An R Package for Multivariate Categorical Data Analysis

Data with multiple responses is ubiquitous in modern applications. However, few tools are available for regression analysis of multivariate counts. The most popular multinomial-logit model has a very restrictive mean-variance structure, limiting its applicability to many data sets. This article introduces an R package MGLM, short for multivariate response generalized linear models, that expands the current tools for regression analysis of polytomous data. Distribution fitting, random number generation, regression, and sparse regression are treated in a unifying framework. The algorithm, usage, and implementation details are discussed.

ArCo: An R package to Estimate Artificial Counterfactuals

In this paper we introduce the ArCo package for R which consists of a set of functions to implement the the Artificial Counterfactual (ArCo) methodology to estimate causal effects of an intervention (treatment) on aggregated data and when a control group is not necessarily available. The ArCo method is a two-step procedure, where in the first stage a counterfactual is estimated from a large panel of time series from a pool of untreated peers. In the second-stage, the average treatment effect over the post-intervention sample is computed. Standard inferential procedures are available. The package is illustrated with both simulated and real datasets.

Bayesian Testing, Variable Selection and Model Averaging in Linear Models using R with BayesVarSel

In this paper, objective Bayesian methods for hypothesis testing and variable selection in linear models are considered. The focus is on BayesVarSel, an R package that computes posterior probabilities of hypotheses/models and provides a suite of tools to properly summarize the results. We introduce the usage of specific functions to compute several types of model averaging estimations and predictions weighted by posterior probabilities. BayesVarSel contains exact algorithms to perform fast computations in problems of small to moderate size and heuristic sampling methods to solve large problems. We illustrate the functionalities of the package with several data examples.

R Package imputeTestbench to Compare Imputation Methods for Univariate Time Series

Missing observations are common in time series data and several methods are available to impute these values prior to analysis. Variation in statistical characteristics of univariate time series can have a profound effect on characteristics of missing observations and, therefore, the accuracy of different imputation methods. The imputeTestbench package can be used to compare the prediction accuracy of different methods as related to the amount and type of missing data for a user-supplied dataset. Missing data are simulated by removing observations completely at random or in blocks of different sizes depending on characteristics of the data. Several imputation algorithms are included with the package that vary from simple replacement with means to more complex interpolation methods. The testbench is not limited to the default functions and users can add or remove methods as needed. Plotting functions also allow comparative visualization of the behavior and effectiveness of different algorithms. We present example applications that demonstrate how the package can be used to understand differences in prediction accuracy between methods as affected by characteristics of a dataset and the nature of missing data.

ICSOutlier: Unsupervised Outlier Detection for Low-Dimensional Contamination Structure

Detecting outliers in a multivariate and unsupervised context is an important and ongoing problem notably for quality control. Many statistical methods are already implemented in R and are briefly surveyed in the present paper. But only a few lead to the accurate identification of potential outliers in the case of a small level of contamination. In this particular context, the Invariant Coordinate Selection (ICS) method shows remarkable properties for identifying outliers that lie on a low-dimensional subspace in its first invariant components. It is implemented in the ICSOutlier package. The main function of the package, ics.outlier, offers the possibility of labelling potential outliers in a completely automated way. Four examples, including two real examples in quality control, illustrate the use of the function. Comparing with several other approaches, it appears that ICS is generally as efficient as its competitors and shows an advantage in the context of a small proportion of outliers lying in a low-dimensional subspace. In quality control, the method may help in properly identifying some defective products while not detecting too many false positives.

GitHub Python Data Science Spotlight: AutoML, NLP, Visualization, ML Workflows

This post includes a wide spectrum of data science projects, all of which are open source and are present on GitHub repositories.
1. Auto-Keras – This is an automated machine learning (AutoML) package
2. Finetune – Scikit-learn style model finetuning for NLP
3. GluonNLP – NLP made easy
4. animatplot – A python package for animating plots build on matplotlib
5. MLflow – Open source platform for the machine learning lifecycle