Data Cleaning and Wrangling With R

One of the biggest issues when working with data in any context is cleaning and merging datasets. You will often find yourself collating data across multiple files, and you will need to rely on R for tasks you would normally handle in Excel with functions like VLOOKUP. The 10 tips I give below for data manipulation in R are not exhaustive; there are myriad ways to use R for the same purpose. However, they are particularly useful for Excel users who want to apply similar data-sorting methods within R itself.
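In R, a VLOOKUP-style lookup is usually done with base R's `merge()`. For example, `merge(sales, prices, by = "id", all.x = TRUE)` performs a left join on hypothetical `sales` and `prices` data frames. To keep all examples here in one language, the sketch below shows the same lookup logic in plain Python. The `vlookup_merge` helper and the table contents are made up for illustration. Like VLOOKUP, it keeps the first match for a duplicated key, whereas R's `merge()` would return one row per match.

```python
# Hypothetical example: a VLOOKUP-style left join in plain Python.
# Every row of `left` is kept; columns from `right` are filled in where the
# key matches and set to None otherwise (like all.x = TRUE in R's merge()).


def vlookup_merge(left, right, key):
    """Left-join two lists of dicts on `key` (hypothetical helper)."""
    # Build the lookup table, keeping the FIRST row per key, as VLOOKUP does.
    lookup = {}
    for row in right:
        lookup.setdefault(row[key], row)
    # Collect the non-key column names from the right-hand table.
    extra_cols = sorted({col for row in right for col in row if col != key})
    merged = []
    for row in left:
        match = lookup.get(row[key], {})
        new_row = dict(row)
        for col in extra_cols:
            new_row[col] = match.get(col)  # None when there is no match
        merged.append(new_row)
    return merged


sales = [
    {"id": 1, "units": 10},
    {"id": 2, "units": 5},
    {"id": 3, "units": 7},
]
prices = [
    {"id": 1, "price": 2.5},
    {"id": 2, "price": 4.0},
]

result = vlookup_merge(sales, prices, "id")
for row in result:
    print(row)
```

The row with `id` 3 has no price, so it is kept with `price` set to `None`. That is the Python counterpart of the `NA` that R's left join would produce.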


Analyzing Obesity Across the USA

The main aim of this project is to study which US states had the most obese populations among adults, children, and teens. A second objective is to learn how to scrape data from an HTML page in R using the rvest package, and how to generate beautiful maps using the ggplot2 and maps packages. A similar post was published earlier on DataScience+.


Visualizing High-Dimensional Data in Augmented Reality

Imagine walking into your office on a Monday morning, just a couple of years from now. You pour yourself a cup of coffee, check the news, and then put on a pair of AR glasses. You find yourself surrounded by a sea of gently glowing, colored orbs. The orbs represent all of the data that drives your business. You know this data well. The patterns and colors of these orbs are like a fingerprint. But there’s something atypical about the data floating over the coffee maker. You reach out and select that data. A summary of all the relevant details appears on a nearby computer screen. If something matters to your business, your systems track it. When you want to consume all of that info, you use this immersive visualization: bursting beyond the bounds of a computer screen, information-dense, efficient, and aesthetically pleasing.


Text Classifier Algorithms in Machine Learning

One of the main ML problems is text classification, which is used, for example, to detect spam, define the topic of a news article, or choose the correct meaning of a word in context. The Statsbot team has already written about how to train your own model for detecting spam emails, spam messages, and spam user comments. For this article, we asked a data scientist, Roman Trusov, to go deeper into text analysis with machine learning.
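The following sketch makes the spam-detection use case concrete with one classic text classifier: multinomial Naive Bayes with add-one (Laplace) smoothing, written in plain Python. It is not necessarily the approach the article goes on to use. The tokenizer is deliberately naive, and the tiny training set is invented for illustration.

```python
import math
from collections import Counter, defaultdict


def tokenize(text):
    """Lowercase and split on whitespace (a deliberately naive tokenizer)."""
    return text.lower().split()


class NaiveBayesTextClassifier:
    """Multinomial Naive Bayes with add-one (Laplace) smoothing."""

    def fit(self, docs, labels):
        self.class_counts = Counter(labels)      # number of documents per class
        self.word_counts = defaultdict(Counter)  # word frequencies per class
        self.vocab = set()
        for doc, label in zip(docs, labels):
            tokens = tokenize(doc)
            self.word_counts[label].update(tokens)
            self.vocab.update(tokens)
        self.total_docs = len(docs)
        return self

    def _log_posterior(self, tokens, label):
        # log P(label) + sum of log P(token | label), up to a shared constant
        log_prob = math.log(self.class_counts[label] / self.total_docs)
        denom = sum(self.word_counts[label].values()) + len(self.vocab)
        for tok in tokens:
            if tok not in self.vocab:
                continue  # ignore words never seen during training
            log_prob += math.log((self.word_counts[label][tok] + 1) / denom)
        return log_prob

    def predict(self, text):
        tokens = tokenize(text)
        return max(self.class_counts, key=lambda c: self._log_posterior(tokens, c))


train_docs = [
    "win a free prize now",
    "free money click now",
    "meeting moved to monday",
    "lunch with the team on friday",
]
train_labels = ["spam", "spam", "ham", "ham"]

clf = NaiveBayesTextClassifier().fit(train_docs, train_labels)
print(clf.predict("claim your free prize"))   # → spam
print(clf.predict("team meeting on monday"))  # → ham
```

Working in log space avoids numeric underflow when multiplying many small word probabilities. The +1 smoothing keeps a word that never appeared in one class from zeroing out that class entirely.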


How Artificial Intelligence is Outpacing Humans

I’d second that. Artificial Intelligence has been pushing the boundaries of human imagination. Today’s machines can do many things we could not have imagined 20 years ago. Artificial Intelligence has changed the way we look at learning and inventing. From drug discovery to sports analysis to protecting the oceans, AI has made its presence felt everywhere. But is artificial intelligence outperforming humans? The answer is an undoubted yes. In this article, I will try to sum up where and how.


SQL for Data Analysis – Tutorial for Beginners – ep5

Combining tables is a key part of data analysis, and SQL is really good at it! Of course, each case is different, but too many times I’ve run into analytical tasks where joining two (very big) data tables took around 20-30 minutes in Python and bash, and ~10-20 seconds in SQL. I’m not saying I couldn’t have done the task in Python or bash at all… but SQL JOIN was certainly the easiest solution! So let’s learn how to use SQL JOIN to boost your analytics projects!
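Here is a minimal sketch of the two most common joins. It runs through Python's built-in `sqlite3` module so that it is self-contained. The `customers` and `orders` tables, their columns, and their rows are hypothetical. The `INNER JOIN` and `LEFT JOIN` syntax is standard SQL, which SQLite supports.

```python
import sqlite3

# In-memory database with two small, made-up tables.
conn = sqlite3.connect(":memory:")
cur = conn.cursor()
cur.executescript("""
    CREATE TABLE customers (id INTEGER PRIMARY KEY, name TEXT);
    CREATE TABLE orders (order_id INTEGER PRIMARY KEY, customer_id INTEGER, amount REAL);
    INSERT INTO customers VALUES (1, 'Ann'), (2, 'Bob'), (3, 'Cara');
    INSERT INTO orders VALUES (10, 1, 25.0), (11, 1, 40.0), (12, 2, 15.5);
""")

# INNER JOIN: one row per matching (customer, order) pair; Cara has no orders.
inner_rows = cur.execute("""
    SELECT c.name, o.amount
    FROM customers AS c
    INNER JOIN orders AS o ON o.customer_id = c.id
    ORDER BY o.order_id
""").fetchall()

# LEFT JOIN + GROUP BY: every customer with total spend; COALESCE turns NULL into 0.
left_rows = cur.execute("""
    SELECT c.name, COALESCE(SUM(o.amount), 0) AS total
    FROM customers AS c
    LEFT JOIN orders AS o ON o.customer_id = c.id
    GROUP BY c.id
    ORDER BY c.id
""").fetchall()

print(inner_rows)  # → [('Ann', 25.0), ('Ann', 40.0), ('Bob', 15.5)]
print(left_rows)   # → [('Ann', 65.0), ('Bob', 15.5), ('Cara', 0)]
conn.close()
```

The `LEFT JOIN` keeps Cara even though she has no orders, which an `INNER JOIN` would silently drop. Choosing between the two is usually the first decision when combining tables.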


Revisiting the Unreasonable Effectiveness of Data

There has been remarkable success in the field of computer vision over the past decade, much of which can be directly attributed to the application of deep learning models to this machine perception task. Furthermore, since 2012 there have been significant advances in the representation capabilities of these systems due to (a) deeper models with higher complexity, (b) increased computational power, and (c) the availability of large-scale labeled data. And while every year brings further increases in computational power and model complexity (from the 7-layer AlexNet to the 101-layer ResNet), available datasets have not scaled accordingly. A 101-layer ResNet with significantly more capacity than AlexNet is still trained with the same 1M images from ImageNet circa 2011. As researchers, we have always wondered: if we scale up the amount of training data 10x, will the accuracy double? How about 100x or maybe even 300x? Will the accuracy plateau, or will we continue to see increasing gains with more and more data?


The Guerrilla Guide to Machine Learning with Julia

This post is a lean look at learning machine learning with Julia. It is a complete, if very short, course for the quick study hacker with no time (or patience) to spare.


Dealing with S3 methods in R with a simple example

R has three object systems: S3, S4 and RC. S3 is by far the easiest to work with, and it can make your code much more understandable and organized, especially if you are working on a package. The idea is very simple. First, we assign a class to an R object; then we define methods (functions) for that class, based on generic functions that you either create yourself or take from those already available.
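In R itself, this pattern looks like `area <- function(shape) UseMethod("area")`, plus methods such as `area.circle`, with the class set via `class(x) <- "circle"`. To stay in one language, the sketch below emulates that S3 dispatch idea in Python. Each object carries a class vector, and the generic tries `generic.class` for each class in order before falling back to `generic.default`. The `area` generic and the `circle`/`square`/`triangle` classes are hypothetical examples. This emulation shows the mechanism only and does not reflect how R implements it internally.

```python
import math

# Registry of methods keyed by "generic.class", mirroring S3 naming (e.g. area.circle).
METHODS = {}


def method(name):
    """Register a function as the method `name` (e.g. "area.circle")."""
    def register(func):
        METHODS[name] = func
        return func
    return register


def use_method(generic, obj, *args):
    """Dispatch like R's UseMethod(): try each class in order, then 'default'."""
    for cls in obj["class"] + ["default"]:
        func = METHODS.get(f"{generic}.{cls}")
        if func is not None:
            return func(obj, *args)
    raise TypeError(f"no applicable method for '{generic}'")


def area(shape):
    """The generic: it does nothing except dispatch on the object's class."""
    return use_method("area", shape)


@method("area.circle")
def area_circle(shape):
    return math.pi * shape["r"] ** 2


@method("area.square")
def area_square(shape):
    return shape["side"] ** 2


@method("area.default")
def area_default(shape):
    return float("nan")  # fallback when no class-specific method exists


# Objects are plain dicts with a class vector, like class(x) <- c("circle", "shape").
circle = {"class": ["circle", "shape"], "r": 1.0}
square = {"class": ["square", "shape"], "side": 3}
triangle = {"class": ["triangle", "shape"]}

print(area(circle))
print(area(square))
print(area(triangle))
```

Adding support for a new class means registering one more `area.<class>` method. The generic itself never changes, which is what keeps S3 code organized.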