Boost your data science skills. Learn linear algebra.

I’d like to introduce a series of blog posts and their corresponding Python notebooks gathering notes on the Deep Learning Book by Ian Goodfellow, Yoshua Bengio, and Aaron Courville (2016). The aim of these notebooks is to help beginners and advanced beginners grasp the linear algebra concepts underlying deep learning and machine learning. Acquiring these skills can boost your ability to understand and apply various data science algorithms; in my opinion, linear algebra is one of the bedrocks of machine learning, deep learning, and data science. These notes cover Chapter 2, on linear algebra. I liked this chapter because it gives a sense of what is most used in the domain of machine learning and deep learning. It is thus a great syllabus for anyone who wants to dive into deep learning and acquire the linear algebra concepts useful for better understanding deep learning algorithms.
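As a taste of the material, here is a minimal NumPy sketch (mine, not taken from the notebooks themselves) of a few Chapter 2 staples: matrix-vector products, norms, and the eigendecomposition of a symmetric matrix.

```python
import numpy as np

A = np.array([[2.0, 1.0],
              [1.0, 3.0]])        # a symmetric square matrix
x = np.array([1.0, -1.0])         # a vector

# Matrix-vector product: the linear map A applied to x.
y = A @ x

# Norms: L2 (Euclidean) and L1.
l2 = np.linalg.norm(x)            # sqrt(1 + 1)
l1 = np.linalg.norm(x, ord=1)     # |1| + |-1| = 2

# Eigendecomposition of a symmetric matrix: A = Q diag(w) Q^T.
w, Q = np.linalg.eigh(A)
reconstructed = Q @ np.diag(w) @ Q.T
assert np.allclose(A, reconstructed)
```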


25 Open Datasets for Deep Learning Every Data Scientist Must Work With

The key to getting better at deep learning (or most fields in life) is practice. Practice on a variety of problems, from image processing to speech recognition. Each of these problems has its own unique nuances and approaches. But where can you get this data? A lot of research papers you see these days use proprietary datasets that are usually not released to the general public. This becomes a problem if you want to learn and apply your newly acquired skills. If you have faced this problem, we have a solution for you: a curated list of openly available datasets for your perusal. In this article, we have listed a collection of high-quality datasets that every deep learning enthusiast should work on to apply and improve their skill set. Working on these datasets will make you a better data scientist, and the amount you will learn will be invaluable in your career. We have also included papers with state-of-the-art (SOTA) results for you to go through and improve your models. (For a quick way to get started with the first dataset, see the loading sketch after the list.)
1. MNIST
2. MS-COCO
3. ImageNet
4. Open Images Dataset
5. VisualQA
6. The Street View House Numbers (SVHN)
7. CIFAR-10
8. Fashion-MNIST
9. IMDB Reviews
10. Twenty Newsgroups
11. Sentiment140
12. WordNet
13. Yelp Reviews
14. The Wikipedia Corpus
15. The Blog Authorship Corpus
16. Machine Translation of European Languages
17. Free Spoken Digit Dataset
18. Free Music Archive (FMA)
19. Ballroom
20. Million Song Dataset
21. LibriSpeech
22. VoxCeleb
23. Twitter Sentiment Analysis
24. Age Detection of Indian Actors
25. Urban Sound Classification
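As promised above, here is a hedged sketch of loading MNIST, the first dataset on the list. Keras bundles a loader for it, which is one of several ways to obtain the data; the preprocessing shown is a common convention, not a requirement.

```python
# Minimal sketch: loading MNIST via the Keras dataset loader.
from tensorflow.keras.datasets import mnist

(x_train, y_train), (x_test, y_test) = mnist.load_data()
print(x_train.shape)  # (60000, 28, 28) grayscale digit images
print(y_train[:10])   # integer labels 0-9

# Typical preprocessing: scale pixel values to [0, 1] and flatten.
x_train = x_train.reshape(-1, 784).astype("float32") / 255.0
x_test = x_test.reshape(-1, 784).astype("float32") / 255.0
```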


Using Machine Learning to Discover Neural Network Optimizers

Deep learning models have been deployed in numerous Google products, such as Search, Translate, and Photos. The choice of optimization method plays a major role when training deep learning models. For example, stochastic gradient descent works well in many situations, but more advanced optimizers can be faster, especially for training very deep networks. Coming up with new optimizers for neural networks, however, is challenging due to the non-convex nature of the optimization problem. On the Google Brain team, we wanted to see whether it would be possible to automate the discovery of new optimizers, in a way similar to how AutoML has been used to discover new competitive neural network architectures.
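For a sense of what a "discovered" optimizer can look like: the research behind this post reports update rules such as PowerSign, which scales the gradient up or down depending on whether its sign agrees with a running average of past gradients. Below is a simplified NumPy sketch of that idea; the exact published rule and hyperparameters may differ, so treat this as an illustration rather than the paper's definition.

```python
import numpy as np

def powersign_step(w, grad, m, lr=0.01, alpha=np.e, beta=0.9):
    """One PowerSign-style update (simplified sketch).
    m is an exponential moving average of past gradients."""
    m = beta * m + (1.0 - beta) * grad
    # Scale up when the gradient agrees in sign with its moving
    # average (momentum-like agreement), scale down when it disagrees.
    scale = alpha ** (np.sign(grad) * np.sign(m))
    w = w - lr * scale * grad
    return w, m

# Tiny demo on a quadratic bowl f(w) = ||w||^2 / 2, where grad = w.
w, m = np.array([1.0, -2.0]), np.zeros(2)
for _ in range(100):
    w, m = powersign_step(w, grad=w, m=m, lr=0.1)
print(w)  # approaches the minimum at the origin
```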


What machine learning engineers need to know

In this episode of the Data Show, I spoke with Jesse Anderson, managing director of the Big Data Institute, and my colleague Paco Nathan, who recently became co-chair of JupyterCon. This conversation grew out of a recent email thread the three of us had on machine learning engineers, a new job role that LinkedIn recently pegged as the fastest-growing job in the U.S. In our email discussion, there was some disagreement on whether such a specialized job role/title was needed in the first place. As Eric Colson pointed out in his beautiful keynote at Strata Data San Jose, creating specialized roles too soon can slow down your data team.


Introducing purging: An R package for addressing mediation effects

Mediation can occur when one independent variable swamps the effect of another, suggesting high correlation between the two variables. Though there are some great packages for mediation analysis out there, the intuition for when it is needed is often unclear, especially for younger graduate students. In this blog post, my goal is to give an intuitive overview of mediation and offer a simple method for “purging” variables of mediation effects so they can be used simultaneously in multivariate analysis. The purging process detailed in this blog is implemented in my recently released R package, purging, which is available on CRAN or at my GitHub.
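The package itself is written in R, but the core residualization idea is easy to illustrate. Here is a hedged Python sketch of one common "purging" approach: regress the mediated variable on the mediating variable and keep the residuals, which are by construction uncorrelated with the mediator. The simulated data and coefficients are mine, for illustration only; consult the package for its actual implementation.

```python
import numpy as np

rng = np.random.default_rng(0)

# Simulated data: x2 is partly driven by x1 (x1 swamps x2's effect).
x1 = rng.normal(size=500)
x2 = 0.8 * x1 + rng.normal(scale=0.5, size=500)

# Regress x2 on x1 via least squares and keep the residuals.
X = np.column_stack([np.ones_like(x1), x1])
coef, *_ = np.linalg.lstsq(X, x2, rcond=None)
x2_purged = x2 - X @ coef

# The purged variable carries x2's unique variation, uncorrelated
# with x1, so both can be used together in a multivariate model.
print(np.corrcoef(x1, x2_purged)[0, 1])  # ~0
```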


Facebook’s Cambridge Analytica Trouble Highlights IT Data Privacy Concerns

It’s been a rough couple of weeks for Facebook. The company has lost $80 billion in market value (depending on what day you check the stock price). Its CEO has been repeatedly called on to testify before Congress regarding the company’s data privacy practices. The attorneys general for 37 US states have asked the company for more details on how it monitors the way third-party developers handle customer data. The FTC has opened an investigation of the company. Some observers are even asking if this is the beginning of the end for Facebook.


Processing Huge Dataset with Python

This tutorial introduces the processing of a huge dataset in Python. It allows you to work with large amounts of data on your own laptop. With this method, you can use aggregation functions on a dataset that is too large to load into a DataFrame all at once.
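One standard way to do this with pandas is chunked reading: aggregate each chunk, then combine the partial results, so the full file never has to fit in memory at once. A minimal sketch of the idea, assuming a hypothetical huge.csv with category and amount columns (file and column names are mine, not from the tutorial):

```python
import pandas as pd

# Hypothetical file and columns, for illustration only.
chunks = pd.read_csv("huge.csv", chunksize=1_000_000)

# Aggregate each chunk, then combine the partial sums.
partials = [c.groupby("category")["amount"].sum() for c in chunks]
total = pd.concat(partials).groupby(level=0).sum()
print(total)
```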


Tidy Sentiment Analysis in R

Learn how to perform tidy sentiment analysis in R on Prince’s songs: sentiment over time, song-level sentiment, the impact of bigrams, and much more!


Using Tensorflow Object Detection to do Pixel Wise Classification

In the past, I have used the TensorFlow Object Detection API to implement object detection, with the output being bounding boxes around the different objects of interest in an image. For more details, please see my earlier article. TensorFlow recently added new functionality, and we can now extend the API to determine the pixel-by-pixel location of objects of interest.
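To give a sense of what changes in practice: models exported from the Object Detection API with mask support (e.g. Mask R-CNN variants) expose a detection_masks output tensor alongside the usual boxes, scores, and classes. Below is a hedged TF1-style sketch of running such a frozen graph; the tensor names follow the API's exported-model convention, and the model path and dummy input are hypothetical.

```python
import numpy as np
import tensorflow as tf

# Hypothetical path to a mask-enabled exported model.
PATH_TO_FROZEN_GRAPH = "mask_rcnn/frozen_inference_graph.pb"

graph = tf.Graph()
with graph.as_default():
    graph_def = tf.GraphDef()
    with tf.gfile.GFile(PATH_TO_FROZEN_GRAPH, "rb") as f:
        graph_def.ParseFromString(f.read())
    tf.import_graph_def(graph_def, name="")

with tf.Session(graph=graph) as sess:
    image = np.zeros((1, 480, 640, 3), dtype=np.uint8)  # stand-in input
    outputs = sess.run(
        {
            "boxes": graph.get_tensor_by_name("detection_boxes:0"),
            "scores": graph.get_tensor_by_name("detection_scores:0"),
            "classes": graph.get_tensor_by_name("detection_classes:0"),
            # The new part: per-detection binary masks.
            "masks": graph.get_tensor_by_name("detection_masks:0"),
        },
        feed_dict={graph.get_tensor_by_name("image_tensor:0"): image},
    )
print(outputs["masks"].shape)  # (1, num_detections, mask_h, mask_w)
```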


DT 0.4: Editing Tables, Smart Filtering, and More

It has been more than two years since we announced the initial version of the DT package. Today we want to highlight a few significant changes and new features in the recent releases, v0.3 and v0.4. The full list of changes can be found in the release notes.