The EU Cannot Shape the Future of AI with Regulation

To cut smoking, the US government taxes tobacco, yet it subsidises tobacco farming. The EU’s approach to AI displays a similar contradiction: it funds AI research while subjecting it to the world’s strictest regulations. The European Commission recently announced plans to increase that funding, to make more data available for use in AI, and to work with EU member states on a strategy for deploying AI in the European economy. But at the same time, the EU’s new General Data Protection Regulation (GDPR) puts tight restrictions on uses of AI that involve personal data, and EU policymakers continue to search for additional restrictions on AI to address their remaining fears. Unlike tobacco, AI has many beneficial uses, and the potential risks depend on how it is developed and used over the long term. The irony is that if Europe over-regulates AI now, it will miss its chance for global influence over the technology’s future.


Quantile Regression (home made)

After my series of posts on classification algorithms, it’s time to get back to R code, this time for quantile regression. Yes, I still want to get a better understanding of optimization routines in R. Before looking at quantile regression, let us compute the median, or more generally a quantile, from a sample.
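For a flavour of that warm-up step, here is a minimal sketch of my own (not the post’s code) that recovers a quantile from a sample by minimizing the check (pinball) loss with base R’s optimize():

    # Recover the tau-quantile of a sample by minimizing the check (pinball) loss
    set.seed(1)
    x   <- rnorm(100)
    tau <- 0.5  # tau = 0.5 gives the median

    # Check loss: rho_tau(u) = u * (tau - 1{u < 0}), summed over the sample
    loss <- function(q) sum((x - q) * (tau - (x < q)))

    # One-dimensional minimization over the range of the data
    q_hat <- optimize(loss, interval = range(x))$minimum

    q_hat                     # should be close to...
    quantile(x, probs = tau)  # ...the built-in estimate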


IoT on AWS: Machine Learning Models and Dashboards from Sensor Data

I developed my first IoT project using my notebook as an IoT device and AWS IoT as the infrastructure, with this ‘simple’ idea: collect the CPU temperature from my notebook running on Ubuntu, send it to AWS IoT, store the data, and make it available for machine learning models and dashboards.


Statistics, Causality, and What Claims are Difficult to Swallow: Judea Pearl debates Kevin Gray

Recently, renowned computer scientist and artificial intelligence researcher Judea Pearl released his latest book, ‘The Book of Why: The New Science of Cause and Effect,’ co-authored with Dana Mackenzie. While the book has quickly become a best seller, it has also struck a nerve with some readers. After its release, marketing scientist and analytics consultant (and regular KDnuggets contributor) Kevin Gray penned what could be considered both a review of and a retort to the book, which ran on KDnuggets. Pearl then responded to Gray, who responded in turn, after which Pearl responded again… you get the picture.


Execute Anomaly Detection at Scale

Anomaly Detection: What, Why, and How. Anomaly detection can be useful in a number of fields and industries where rare events are highly important or impactful, yet hard to find within the data.
Because of its wide array of applications, mastering anomaly detection is incredibly valuable. In this guidebook you will find:
• A breakdown of the types of anomalies and anomaly detection use cases.
• A step-by-step guide to running an anomaly detection project, both from a business and a technical perspective.
• A walkthrough of an example fraud detection case, including code samples.
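To make the idea concrete, here is a minimal, self-contained R sketch (my own illustration, not taken from the guidebook) that flags points sitting far from the median in robust z-score units:

    # Flag univariate anomalies: points more than 3 robust z-scores from the median
    set.seed(42)
    x <- c(rnorm(500), 8, -7, 12)  # mostly 'normal' data plus a few injected outliers

    robust_z  <- (x - median(x)) / mad(x)  # mad() is scaled to be consistent with sd
    anomalies <- which(abs(robust_z) > 3)

    x[anomalies]  # recovers the injected extreme values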


Data Lake – the evolution of data processing

In recent years, rapid technological advancement has led to a dramatic increase in information traffic. Mobile networks have increased coverage and data throughput, and landlines are slowly being upgraded from copper to fiber optics. Thanks to this, more and more people are constantly online through various devices using many different services. Numerous cheap information-sensing IoT devices gather ever more data sets – aerial information, images, sound, RFID readings, weather data, etc. All this progress results in more data being shared online. Data sets have risen rapidly in both volume and complexity, and traditional data processing applications have proved inadequate to deal with them. This vast volume of data has introduced new challenges in data capture, storage, analysis, search, sharing, transfer, visualization, querying, updating, and information privacy. Inevitably, these challenges required completely new architecture designs and new technologies that help us store, analyze, and gain insights from these large and complex data sets. Here I will present the Data Lake architecture, which introduces an interesting twist on storing and processing data. The Data Lake is not a revolution in the big data world or a one-size-fits-all solution, but a simple evolutionary step in data processing that came about naturally.


Taming LSTMs: Variable-sized mini-batches and why PyTorch is good for your health

If you’ve used PyTorch you have likely experienced euphoria, increased energy, and may even have felt like walking in the sun for a bit. Your life feels complete again. That is, until you tried to have variable-sized mini-batches using RNNs. All hope is not lost. After reading this, you’ll be back to fantasies of you + PyTorch eloping into the sunset while your Recurrent Networks achieve new accuracies you’ve only read about on arXiv.


Working with Your Facebook Data in R

I recently learned that you can download all of your Facebook data, so I decided to check it out and bring it into R. To access your data, go to Facebook and click on the white down arrow in the upper-right corner. From there, select Settings, then, from the column on the left, ‘Your Facebook Information.’ When you get the Facebook Information screen, select ‘View’ next to ‘Download Your Information.’ On this screen, you’ll be able to select the kind of data you want, a date range, and a format. I only wanted my posts, so under ‘Your Information,’ I deselected everything but the first item on the list, ‘Posts.’ (Note that this will still download all photos and videos you posted, so it will be a large file.) To make it easy to bring into R, I selected JSON under Format (the other option is HTML).
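If you want a head start on the import step, a minimal sketch with the jsonlite package looks like this (the file path below is illustrative, not a guaranteed name; use the actual path inside your downloaded archive, and note the field layout may differ between exports):

    # Read the downloaded Facebook posts into R
    library(jsonlite)

    # Hypothetical path; Facebook's archive nests JSON files inside folders
    posts <- fromJSON("posts/your_posts_1.json", simplifyDataFrame = TRUE)

    # Assuming each post carries a Unix-epoch 'timestamp' field,
    # convert it to POSIXct for date-based analysis
    posts$timestamp <- as.POSIXct(posts$timestamp, origin = "1970-01-01")

    str(posts, max.level = 1)  # inspect the structure before going further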


Detecting unconscious bias in models, with R

There’s growing awareness that the data we collect, and in particular the variables we include as factors in our predictive models, can lead to unwanted bias in outcomes: from loan applications, to law enforcement, and in many other areas. In some instances, such bias is even directly regulated by laws like the Fair Housing Act in the US. But even if we explicitly remove ‘obvious’ variables like sex, age, or ethnicity from predictive models, unconscious bias might still be a factor in our predictions as a result of highly correlated proxy variables included in our model. As a result, we need to be aware of the biases in our model and take steps to address them. For an excellent general overview of the topic, I highly recommend watching the recent presentation by Rachel Thomas, ‘Analyzing and Preventing Bias in ML’. And for a practical demonstration of one way you can go about detecting proxy bias in R, take a look at the vignette created by my colleague Paige Bailey for the rOpenSci conference, ‘Ethical Machine Learning: Spotting and Preventing Proxy Bias’.
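One simple way to probe for such leakage (a sketch of my own, not the approach from either the talk or the vignette) is to check how well the ‘innocent’ features reconstruct the protected attribute you removed:

    # Sketch: can the model's features reconstruct a protected attribute?
    # Assumes a hypothetical data frame `df` holding the model features plus a
    # binary factor `protected` (e.g., sex) that is NOT used in the main model.
    proxy_check <- glm(protected ~ ., data = df, family = binomial())

    # Classification accuracy well above the base rate is a red flag:
    # the features jointly act as a proxy for the protected attribute
    pred_class <- predict(proxy_check, type = "response") > 0.5
    mean(pred_class == (df$protected == levels(df$protected)[2]))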


Mixed Models with Adaptive Gaussian Quadrature

In this post, I would like to introduce my new R package GLMMadaptive for fitting mixed-effects models for non-Gaussian grouped/clustered outcomes using marginal maximum likelihood. Admittedly, there are a number of packages available for fitting similar models, e.g., lme4, glmmsr, glmmTMB, glmmEP, and glmmML, among others; more information on other available packages can be found in the GLMM-FAQ. GLMMadaptive differs from these packages in that it approximates the integrals over the random effects in the definition of the marginal log-likelihood using an adaptive Gaussian quadrature rule, while allowing for multiple correlated random effects.
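A minimal call looks like the following sketch (the data set and variable names are placeholders of mine, not from the package’s examples):

    # Mixed-effects logistic regression fitted via adaptive Gaussian quadrature.
    # `df` is a hypothetical long-format data set with a binary outcome `y`,
    # a covariate `time`, and a grouping factor `id`; the random-effects formula
    # allows correlated random intercepts and slopes per subject.
    library(GLMMadaptive)

    fm <- mixed_model(fixed = y ~ time, random = ~ time | id,
                      data = df, family = binomial())

    summary(fm)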