Machine Learning with R Caret – Part 1

This blog post series is on machine learning with R using the caret package. In this part, we will first perform exploratory data analysis (EDA) on a real-world dataset, and then apply non-regularized linear regression to solve a supervised regression problem on that dataset. We will predict power output given a set of environmental readings from various sensors in a natural gas-fired power generation plant.


Clustering with Deep Learning: Taxonomy and New Methods

Clustering is a fundamental machine learning method. The quality of its results depends on the data distribution. For this reason, deep neural networks can be used to learn better representations of the data. In this paper, we propose a systematic taxonomy for clustering with deep learning, in addition to a review of methods from the field. Based on our taxonomy, creating new methods is more straightforward. We also propose a new approach which is built on the taxonomy and overcomes some limitations of previous work. Our experimental evaluation on image datasets shows that the method approaches state-of-the-art clustering quality, and performs better in some cases.


Our Final Kaggle Dataset Publishing Awards Winners’ Interviews (November 2017 and December 2017)

As we move into 2018, the monthly Datasets Publishing Awards has concluded. We’re pleased to have recognized many publishers of high-quality, original, and impactful datasets. It was only a little over a year ago that we opened up our public Datasets platform to data enthusiasts all over the world to share their work. We’ve now reached almost 10,000 public datasets, making choosing winners each month a difficult task! These interviews feature the stories and backgrounds of the November and December winners of the prize.


Kogentix Automated Machine Learning Platform

Kogentix Automated Machine Learning Platform is the only solution we have seen that runs natively on Spark and includes all of the elements required to build and run a machine learning application.


Operational Best Practices for Enterprise Data Science

Join Team Anaconda for a live webinar, Jan 30, 2pm CT, as we tackle the four main concerns we hear from our customers and show you best practices for managing enterprise data science: scalability, security, integration, and governance.


Managing Machine Learning Workflows with Scikit-learn Pipelines Part 3: Multiple Models, Pipelines, and Grid Searches

In this post, we will be using grid search to optimize models built from a number of different types of estimators, which we will then compare in order to properly evaluate the best hyperparameters that each model has to offer.
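
As a minimal sketch of that workflow (with an illustrative dataset, estimators, and parameter grids rather than the post's actual code), each candidate estimator gets its own pipeline and grid, and GridSearchCV picks the best hyperparameters for each:

```python
# Sketch: compare two estimator types, each wrapped in its own pipeline,
# tuning hyperparameters with GridSearchCV. Dataset, models, and grids
# are illustrative assumptions, not the post's exact code.
from sklearn.datasets import load_iris
from sklearn.pipeline import Pipeline
from sklearn.preprocessing import StandardScaler
from sklearn.linear_model import LogisticRegression
from sklearn.ensemble import RandomForestClassifier
from sklearn.model_selection import GridSearchCV

X, y = load_iris(return_X_y=True)

candidates = [
    ("logreg",
     Pipeline([("scale", StandardScaler()),
               ("clf", LogisticRegression(max_iter=1000))]),
     {"clf__C": [0.1, 1.0, 10.0]}),
    ("rf",
     Pipeline([("clf", RandomForestClassifier(random_state=0))]),
     {"clf__n_estimators": [100, 200], "clf__max_depth": [None, 5]}),
]

for name, pipe, grid in candidates:
    search = GridSearchCV(pipe, grid, cv=5, scoring="accuracy")
    search.fit(X, y)
    print(name, search.best_score_, search.best_params_)
```

Comparing the cross-validated best scores of each tuned pipeline is what lets you pick between model families on equal footing.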


A quick intro to experience mapping

In this video segment, James Kalbach introduces the concept of experience mapping. The experience someone has when interacting with a service, company, or brand is abstract and invisible. Modeling their actions, thoughts, and feelings provides new insight into improvements and opportunities.


Anomaly detection with Apache MXNet

In recent years, the term “anomaly detection” (also referred to as “outlier detection”) has started popping up more and more on the internet and in conference presentations. This is not a new topic by any means, though; niche fields have been using it for a long time. Nowadays, due to advances in banking, auditing, the Internet of Things (IoT), etc., anomaly detection has become a fairly common task in a broad spectrum of domains. As with other tasks that have widespread applications, anomaly detection can be tackled using multiple techniques and tools, which can cause a lot of confusion about what it is for and how it works. This article looks at how different types of neural networks can be applied to detect anomalies in time series data with Apache MXNet, a fast and scalable training and inference framework with an easy-to-use, concise API, working in Python with Jupyter notebooks.
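
The article surveys several network architectures; purely as a hypothetical illustration of the general idea (not the article's exact code), the sketch below trains a small autoencoder on sliding windows of a series with MXNet Gluon and flags windows whose reconstruction error is unusually large. The window size, layer sizes, and threshold are assumptions.

```python
# Hypothetical sketch: autoencoder-based anomaly scoring for a time series
# with MXNet Gluon. Window size, architecture, and threshold are illustrative.
import numpy as np
import mxnet as mx
from mxnet import nd, autograd, gluon

window = 32
series = np.sin(np.linspace(0, 60, 2000)) + 0.05 * np.random.randn(2000)
series[1500:1510] += 3.0  # injected anomaly for demonstration

# Slice the series into overlapping windows.
windows = nd.array(np.stack([series[i:i + window]
                             for i in range(len(series) - window)]))

net = gluon.nn.HybridSequential()
net.add(gluon.nn.Dense(16, activation="relu"),
        gluon.nn.Dense(4, activation="relu"),
        gluon.nn.Dense(16, activation="relu"),
        gluon.nn.Dense(window))
net.initialize(mx.init.Xavier())
trainer = gluon.Trainer(net.collect_params(), "adam", {"learning_rate": 1e-3})
loss_fn = gluon.loss.L2Loss()

for epoch in range(50):  # learn to reconstruct mostly "normal" behaviour
    with autograd.record():
        loss = loss_fn(net(windows), windows)
    loss.backward()
    trainer.step(windows.shape[0])

# Score each window by reconstruction error; large errors suggest anomalies.
errors = nd.mean(nd.square(net(windows) - windows), axis=1).asnumpy()
threshold = errors.mean() + 3 * errors.std()
print("anomalous window starts:", np.where(errors > threshold)[0])
```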


Probabilistic interpretation of AUC

Unfortunately this was not taught in any of my statistics or data analysis classes at university (wtf, it so needs to be :scream_cat:). So it took me some time until I learned that the AUC has a nice probabilistic meaning.
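
Concretely, the AUC is the probability that a randomly chosen positive example is scored higher than a randomly chosen negative one (ties counted as one half). A small sketch with made-up scores, checking the pairwise estimate against scikit-learn's roc_auc_score:

```python
# Sketch: AUC as P(score of random positive > score of random negative),
# with ties counted as one half. Labels and scores are made up for illustration.
import itertools
import numpy as np
from sklearn.metrics import roc_auc_score

y = np.array([1, 1, 1, 0, 0, 0, 0])
scores = np.array([0.9, 0.7, 0.4, 0.8, 0.3, 0.2, 0.1])

pos = scores[y == 1]
neg = scores[y == 0]
pairwise_auc = np.mean([1.0 if p > n else 0.5 if p == n else 0.0
                        for p, n in itertools.product(pos, neg)])

print(pairwise_auc, roc_auc_score(y, scores))  # both ~0.833
```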


Which Implied Volatility Ratio Is Best?

This post will be about comparing a volatility signal built from three different implied volatility indices to predict when to enter a short volatility position. In volatility trading, there are three implied volatility indices with a reasonably long trading history: the VIX (everyone knows this one); the VXV (recently renamed the VIX3M), which is like the VIX but measured over a three-month period; and the VXMT, which measures implied volatility over a six-month period.
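
The post itself works through the index data in detail; purely as a hypothetical illustration of what a term-structure ratio signal might look like, here is a small pandas sketch. The column names, smoothing window, and threshold below are made-up assumptions, not the post's actual rules.

```python
# Hypothetical sketch of a term-structure ratio signal, assuming daily closes
# for a short- and longer-dated implied volatility index are already loaded
# into a pandas DataFrame (column names are illustrative).
import pandas as pd

def short_vol_signal(iv: pd.DataFrame,
                     short_col: str = "VIX",
                     long_col: str = "VXMT",
                     lookback: int = 10) -> pd.Series:
    """Return 1 (short volatility) when the smoothed ratio of short-dated to
    longer-dated implied volatility is below 1 (term structure in contango),
    else 0. Smoothing window and threshold are arbitrary choices here."""
    ratio = iv[short_col] / iv[long_col]
    smoothed = ratio.rolling(lookback).mean()
    return (smoothed < 1.0).astype(int)

# Example with synthetic data:
idx = pd.date_range("2017-01-02", periods=5, freq="B")
iv = pd.DataFrame({"VIX": [11.0, 11.5, 12.0, 11.2, 10.9],
                   "VXMT": [15.0, 15.2, 15.1, 15.0, 14.8]}, index=idx)
print(short_vol_signal(iv, lookback=3))
```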


Exploratory Data Analysis & Data Preparation with ‘funModeling’

This package contains a set of functions related to exploratory data analysis, data preparation, and model performance. It is used by people coming from business, research, and teaching (professors and students).