Explainable Artificial Intelligence (XAI)

Dramatic success in machine learning has led to a torrent of Artificial Intelligence (AI) applications. Continued advances promise to produce autonomous systems that will perceive, learn, decide, and act on their own. However, the effectiveness of these systems is limited by the machines’ current inability to explain their decisions and actions to human users. The Department of Defense is facing challenges that demand more intelligent, autonomous, and symbiotic systems. Explainable AI, especially explainable machine learning, will be essential if future warfighters are to understand, appropriately trust, and effectively manage an emerging generation of artificially intelligent machine partners.


Solving Multi-Label Classification problems (Case studies included)

For some reason, regression and classification problems end up taking most of the attention in the machine learning world. People don’t realize how wide a variety of machine learning problems exists. I, on the other hand, love exploring different kinds of problems and sharing what I learn with the community here. Previously, I shared my learnings on genetic algorithms. Continuing my search, I intend to cover a topic that gets far less attention but remains a nagging problem in the data science community: multi-label classification. In this article, I will give you an intuitive explanation of what multi-label classification entails, along with an illustration of how to solve the problem. I hope it will show you the breadth of what data science encompasses. So let’s get on with it!


Boosting the accuracy of your Machine Learning models

Tired of getting low accuracy on your machine learning models? Boosting is here to help. Boosting is a popular ensemble technique that increases the accuracy of your model, much as racers use a nitrous boost to increase the speed of their car. Boosting uses a base machine learning algorithm to fit the data. This can be any algorithm, but the Decision Tree is the most widely used. For an answer to why, just keep reading. Boosting is also easily explained using Decision Trees, and this will be the focus of this article. The article builds on other approaches, besides boosting, that improve the accuracy of Decision Trees.


Beyond Datawarehouse – The Data Lake

Over the last few years, organizations have made a strategic decision to turn big data into competitive advantage. Owing to rapid changes in the BI and DW space, big data has been driving organizations to explore how to integrate it into the existing EDW infrastructure. The process of extracting data from multiple sources, such as social media, weblogs, and sensor data, and transforming that data to suit the organization’s analytical needs is central to the integration challenge. This process is what is famously called ETL (Extract, Transform & Load). It’s essential to think unconventionally about storing such huge volumes of data and about processing the data economically. Hence there is a compelling reason to integrate different technology components in the right places, besides selecting the right technology components. Thanks to the Open Source Foundation, we have a lot of options to embrace big data at very economical cost and with very high computing throughput. The Data Lake has become synonymous with big data.


R Nonlinear Regression Analysis

Nonlinear Regression and Generalized Linear Models: Regression is nonlinear when at least one of its parameters appears nonlinearly. It is commonly used to analyze data in industries such as retail and banking. It also helps to draw conclusions and predict future trends on the basis of users’ activities on the net. Nonlinear regression analysis is the process of building a nonlinear function. On the basis of independent variables, this process predicts the outcome of a dependent variable with the help of model parameters that depend on the degree of relationship among the variables. Generalized linear models (GLMs) extend regression to cases where the variance in the sample data is not constant or where the errors are not normally distributed.


Basics of Linear Regression

Regression analysis is a statistical tool to determine relationships between different types of variables. Variables that remain unaffected by changes made in other variables are known as independent variables (also called predictor or explanatory variables), while those that are affected are known as dependent variables (also called response variables). Linear regression is a statistical procedure used to predict the value of a response variable on the basis of one or more predictor variables.
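
As a rough illustration of the idea, here is a minimal Python sketch of simple linear regression with a single predictor, fitted by ordinary least squares. The function names and the sample numbers are made up for this example and do not come from the article.

```python
def fit_simple_linear_regression(x, y):
    """Return (intercept, slope) of the least-squares line y = intercept + slope * x."""
    n = len(x)
    if n != len(y) or n < 2:
        raise ValueError("x and y must have the same length (at least 2)")

    mean_x = sum(x) / n
    mean_y = sum(y) / n

    # Sum of squared deviations of the predictor, and the cross-deviation sum.
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    if sxx == 0:
        raise ValueError("the predictor must take at least two distinct values")
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))

    slope = sxy / sxx
    intercept = mean_y - slope * mean_x
    return intercept, slope


def predict(intercept, slope, x_new):
    """Predict the response for a new predictor value."""
    return intercept + slope * x_new


# Hypothetical data: the response lies exactly on the line y = 1 + 2x.
predictor = [1.0, 2.0, 3.0, 4.0, 5.0]
response = [3.0, 5.0, 7.0, 9.0, 11.0]

intercept, slope = fit_simple_linear_regression(predictor, response)
print(f"intercept={intercept:.2f}, slope={slope:.2f}")
print(f"prediction at x=6: {predict(intercept, slope, 6.0):.2f}")
```

With more than one predictor the same principle applies, but the fit is usually done with matrix routines from a numerical library rather than by hand.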


Dynamic Programming in Python: Bayesian Blocks

Of all the programming styles I have learned, dynamic programming is perhaps the most beautiful. It can take problems that, at first glance, look ugly and intractable, and solve them with clean, concise code. Where a simplistic algorithm might accomplish something by brute force, dynamic programming steps back, breaks the task into a smaller set of sequential parts, and then proceeds in the most efficient way possible.
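
To make the "smaller sequential parts" idea concrete, here is a small Python sketch. It is not the Bayesian Blocks algorithm itself; it uses a simpler classic problem (the fewest coins needed to make a given amount) chosen only for illustration, and the function name is hypothetical.

```python
def min_coins(coins, amount):
    """Fewest coins from `coins` that sum to `amount`, or None if impossible.

    best[t] holds the answer for the sub-problem "make total t". Each entry
    is built from smaller totals that have already been solved, which is the
    core of the dynamic programming approach.
    """
    unreachable = float("inf")
    best = [0] + [unreachable] * amount

    for total in range(1, amount + 1):
        for coin in coins:
            # Reuse the stored answer for the smaller total (total - coin).
            if coin <= total and best[total - coin] + 1 < best[total]:
                best[total] = best[total - coin] + 1

    return None if best[amount] == unreachable else best[amount]


# A greedy strategy would pick 4 + 1 + 1 (three coins);
# the table finds 3 + 3 (two coins).
print(min_coins([1, 3, 4], 6))
```

Each sub-problem is solved once and stored, so the work grows with `amount * len(coins)` rather than with the number of possible coin combinations.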


Deep Learning is not the AI future



Unbottling “.msg” Files in R

There was a discussion on Twitter about the need to read in “.msg” files using R. The “MSG” file format is one of the many binary abominations created by Microsoft to lock folks and users into their platform and tools. Thankfully, they (eventually) provided documentation for the MSG file format which helped me throw together a small R package — msgxtractr — that can read in these ‘.msg’ files and produce a list as a result.


How to prepare and apply machine learning to your dataset

If you are a newbie in the world of machine learning, then this tutorial is exactly what you need in order to introduce yourself to this exciting new part of the data science world.
This post includes a full machine learning project that will guide you step by step to create a “template,” which you can use later on other datasets.

In this step-by-step tutorial you will:
1. Use one of the most popular machine learning packages in R.
2. Explore a dataset by using statistical summaries and data visualization.
3. Build 5 machine-learning models, pick the best, and build confidence that the accuracy is reliable.

The process of a machine learning project may not always be exactly the same, but there are certain standard and necessary steps:
1. Define Problem.
2. Prepare Data.
3. Evaluate Algorithms.
4. Improve Results.
5. Present Results.


One-way ANOVA in R

Suppose that, as a business manager, you are responsible for testing and comparing the lifetimes of four brands (Apollo, Bridgestone, CEAT and Falken) of automobile tyres. The lifetime of each sample observation is measured as mileage run, in thousands of miles. For each brand of automobile tyre, a sample of 15 observations has been collected. On the basis of this information, you have to make your decision regarding the four brands of automobile tyre. The data is provided in csv file format (in a file called tyre.csv).
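
The article itself works in R. As a language-neutral illustration of what a one-way ANOVA computes, here is a minimal Python sketch of the F statistic. The mileage figures below are invented for the example (three per brand rather than 15) and are not taken from tyre.csv; the function name is also hypothetical.

```python
def one_way_anova_f(groups):
    """Return (F statistic, df_between, df_within) for a list of samples."""
    k = len(groups)
    n_total = sum(len(g) for g in groups)
    grand_mean = sum(sum(g) for g in groups) / n_total
    group_means = [sum(g) / len(g) for g in groups]

    # Variation of the group means around the grand mean.
    ss_between = sum(
        len(g) * (m - grand_mean) ** 2 for g, m in zip(groups, group_means)
    )
    # Variation of the observations around their own group mean.
    ss_within = sum(
        (value - m) ** 2 for g, m in zip(groups, group_means) for value in g
    )
    if ss_within == 0:
        raise ValueError("no within-group variation; F is undefined")

    df_between = k - 1
    df_within = n_total - k
    f_stat = (ss_between / df_between) / (ss_within / df_within)
    return f_stat, df_between, df_within


# Hypothetical lifetimes in thousands of miles, made up for illustration.
tyre_mileage = {
    "Apollo": [33.0, 36.0, 35.0],
    "Bridgestone": [32.0, 30.0, 34.0],
    "CEAT": [36.0, 35.0, 37.0],
    "Falken": [38.0, 37.0, 39.0],
}

f_stat, df_between, df_within = one_way_anova_f(list(tyre_mileage.values()))
print(f"F({df_between}, {df_within}) = {f_stat:.2f}")
```

A large F value suggests the brand means differ by more than within-brand variation would explain. Turning F into a p-value requires the F distribution, which a statistics library such as SciPy (`scipy.stats.f_oneway`) provides.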