DataScience: Elevate

DataScience: Elevate is a full-day event dedicated to data science best practices. Register today to hear from experts at Uber, Facebook, Salesforce, and more. DataScience: Elevate provides a closer look at how today’s top companies use machine learning and artificial intelligence to do better business. Free to attend, this multi-city event features presentations, panels, and networking sessions designed to elevate data science work and connect you with the companies that are driving change in enterprise data science.


10 Audio Processing Tasks to get you started with Deep Learning Applications (with Case Studies)

Imagine a world where machines understand what you want and how you are feeling when you call at a customer care – if you are unhappy about something, you speak to a person quickly. If you are looking for a specific information, you may not need to talk to a person (unless you want to!). This is going to be the new order of the world – you can already see this happening to a good degree. Check out the highlights of 2017 in the data science industry. You can see the breakthroughs that deep learning was bringing in a field which were difficult to solve before. One such field that deep learning has a potential to help solving is audio/speech processing, especially due to its unstructured nature and vast impact. So for the curious ones out there, I have compiled a list of tasks that are worth getting your hands dirty when starting out in audio processing. I’m sure there would be a few more breakthroughs in time to come using Deep Learning. The article is structured to explain each task and its importance. There is also a research paper that goes in the details of that specific task, along with a case study that would help you get started in solving the task. So let’s get cracking!


Supervised Learning Use Cases: Low-Hanging Fruit in Data Science for Businesses

In the past few years, machine learning (ML) has revolutionized the way we do business. A disruptive breakthrough that differentiates machine learning from other approaches to automation is a step away from the rules-based programming. ML algorithms allowed engineers to leverage data without explicitly programming machines to follow specific paths of problem-solving. Instead, machines themselves arrive at the right answers based on the data they have. This capability made business executives reconsider the ways they use data to make decisions. In layman terms, machine learning is applied to make forecasts on incoming data using historic data as a training example. For instance, you may want to predict a customer lifetime value in an eCommerce store measuring the net profit of the future relationship with a customer. If you already have historic data on different customer interactions with your website and net profits associated with these customers, you may want to use machine learning. It will allow for early detection of those customers who are likely to bring the most net profit enabling you to focus greater service effort on them. While there are multiple learning styles, i.e. the approaches to training algorithms using data, the most common style is called supervised learning. This time, we’ll talk about this branch of data science and explain why it is considered low-hanging fruit for businesses that plan to embark on the ML initiative, additionally describing the most common use cases.


Data Science, Machine Learning, and AI Conferences and Events 2018

We’ve compiled a list of the hottest events and conferences from the world of Data Science, Machine Learning and Artificial Intelligence happening in 2018. Below are all the links you need to get yourself to these great events!


Top 15 Scala Libraries for Data Science in 2018

In this article, we have outlined some of the Scala libraries that can be very useful while performing major data scientific tasks. They have proved to be highly helpful and effective for achieving the best results.


An Intuitive Introduction to Generative Adversarial Networks

In this article, I’ll talk about Generative Adversarial Networks, or GANs for short. GANs are one of the very few machine learning techniques which has given good performance for generative tasks, or more broadly unsupervised learning. In particular, they have given splendid performance for a variety of image generation related tasks. Yann LeCun, one of the forefathers of deep learning, has called them “the best idea in machine learning in the last 10 years”. Most importantly, the core conceptual ideas associated with a GAN are quite simple to understand (and in-fact, you should have a good idea about them by the time you finish reading this article).


Are you monitoring your machine learning systems?

How are you monitoring your Python applications? Take the short survey – the results will be published on KDnuggets and you will get all the details.


Propensity Score Matching in R

Propensity scores are an alternative method to estimate the effect of receiving treatment when random assignment of treatments to subjects is not feasible.


Gradient Boosting in TensorFlow vs XGBoost

Tensorflow 1.4 was released a few weeks ago with an implementation of Gradient Boosting, called TensorFlow Boosted Trees (TFBT). Unfortunately, the paper does not have any benchmarks, so I ran some against XGBoost. For many Kaggle-style data mining problems, XGBoost has been the go-to solution since its release in 2006. It’s probably as close to an out-of-the-box machine learning algorithm as you can get today, as it gracefully handles un-normalized or missing data, while being accurate and fast to train.


Introducing RLlib: A composable and scalable reinforcement learning library

In a previous post, I outlined emerging applications of reinforcement learning (RL) in industry. I began by listing a few challenges facing anyone wanting to apply RL, including the need for large amounts of data, and the difficulty of reproducing research results and deriving the error estimates needed for mission-critical applications. Nevertheless, the success of RL in certain domains has been the subject of much media coverage. This has sparked interest, and companies are beginning to explore some of the use cases and applications I described in my earlier post. Many tasks and professions, including software development, are poised to incorporate some forms of AI-powered automation. In this post, I’ll describe how RISE Lab’s Ray platform continues to mature and evolve just as companies are examining use cases for RL. Assuming one has identified suitable use cases, how does one get started with RL? Most companies that are thinking of using RL for pilot projects will want to take advantage of existing libraries.


Package Management for Reproducible R Code

Any programming environment should be optimized for its task, and not all tasks are alike. For example, if you are exploring uncharted mountain ranges, the portability of a tent is essential. However, when building a house to weather hurricanes, investing in a strong foundation is important. Similarly, when beginning a new data science programming project, it is prudent to assess how much effort should be put into ensuring the code is reproducible. Note that it is certainly possible to go back later and “shore up” the reproducibility of a project where it is weak. This is often the case when an “ad-hoc” project becomes an important production analysis. However, the first step in starting a project is to make a decision regarding the trade-off between the amount of time to set up the project and the probability that the project will need to be reproducible in arbitrary environments.


Managing Machine Learning Workflows with Scikit-learn Pipelines Part 2: Integrating Grid Search

Another simple yet powerful technique we can pair with pipelines to improve performance is grid search, which attempts to optimize model hyperparameter combinations.
Advertisements