Open Source Tools for Enterprise Data Science

Proprietary solutions, once the mainstay of enterprise data science, are now being eclipsed by open source projects like R, Spark, and TensorFlow. There are several reasons for this trend: open source tools offer endless opportunities for collaboration and contribution, and many have matured to the point that they provide real value, even at the enterprise level. Did you know that 62% of analytics professionals prefer the open source languages Python and R to the proprietary legacy solution SAS? Open source tools are winning at the enterprise level, but deciding which ones to add to your tech stack is not so straightforward.

Scaling Machine Learning

In this episode of the Data Show, I spoke with Reza Zadeh, adjunct professor at Stanford University, co-organizer of ScaledML, and co-founder of Matroid, a startup focused on commercial applications of deep learning and computer vision. Zadeh is also the co-author of the forthcoming book TensorFlow for Deep Learning (now in early release). Our conversation took place on the eve of the recent ScaledML conference and focused largely on practical, real-world strategies for scaling machine learning. In particular, we spoke about the rise of deep learning, hardware/software interfaces for machine learning, and the many commercial applications of computer vision.

Microsoft R Server 9.1 now available

During today’s Data Amp online event, Joseph Sirosh announced the new Microsoft R Server 9.1, which is available to customers now. In addition, the updated Microsoft R Client, which has the same capabilities for local use, is available free for everyone on both Windows and, new to this update, Linux.

Machine Learning Using Support Vector Machines

Support Vector Machines (SVMs) are a classification method that separates data using hyperplanes. The concept is intuitive and easy to understand: given labeled data, an SVM finds a separating hyperplane that maximizes the margin between classes, and with more than two classes, multiple separating hyperplanes divide the data space into segments so that each segment contains only one kind of data. The technique is generally useful for irregular data, that is, data whose distribution is unknown.
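To make the idea concrete, here is a minimal sketch in Python of a binary linear SVM. It is an illustration under stated assumptions, not a reference implementation: it trains with Pegasos-style stochastic subgradient descent on the L2-regularized hinge loss (one common training method, swapped in here for the dedicated solvers that production libraries use), folds the bias into the weight vector through a constant feature, and handles only two classes labeled +1 and -1. The names `train_linear_svm` and `predict` are hypothetical.

```python
import random


def dot(u, v):
    """Inner product of two equal-length vectors."""
    return sum(a * b for a, b in zip(u, v))


def train_linear_svm(X, y, lam=0.01, epochs=100, seed=0):
    """Pegasos-style subgradient descent on the L2-regularized hinge loss.

    X: list of feature vectors; y: labels in {+1, -1}.
    Assumption: a constant 1.0 feature is appended so the bias is learned
    as an extra (also regularized) weight, a simplification of the textbook SVM.
    """
    rng = random.Random(seed)
    Xa = [list(x) + [1.0] for x in X]      # augment with the bias feature
    w = [0.0] * len(Xa[0])
    order = list(range(len(Xa)))
    t = 0
    for _ in range(epochs):
        rng.shuffle(order)
        for i in order:
            t += 1
            eta = 1.0 / (lam * t)                      # decreasing step size
            violates = y[i] * dot(w, Xa[i]) < 1        # inside margin or misclassified
            w = [(1.0 - eta * lam) * wj for wj in w]   # shrink: regularizer gradient
            if violates:                               # hinge-loss subgradient step
                w = [wj + eta * y[i] * xj for wj, xj in zip(w, Xa[i])]
    return w


def predict(w, x):
    """Return +1 or -1 depending on which side of the hyperplane x falls."""
    return 1 if dot(w, list(x) + [1.0]) >= 0 else -1


# Usage: two linearly separable clusters.
X = [[2, 2], [3, 1], [1, 3], [2, 3], [-2, -2], [-3, -1], [-1, -3], [-2, -3]]
y = [1, 1, 1, 1, -1, -1, -1, -1]
w = train_linear_svm(X, y)
print([predict(w, x) for x in X])  # → [1, 1, 1, 1, -1, -1, -1, -1]
```

Multiple classes are typically handled by combining several such binary classifiers (for example, one-vs-rest), which is how multiple hyperplanes come to divide the space; that extension is not shown here.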

R for Enterprise: Understanding R’s Startup

R’s startup behavior is incredibly powerful. R sets environment variables, loads base packages, and understands whether you’re running a script, an interactive session, or even a build command. Most R users will never have to change R’s startup process. In fact, for portability and reproducibility of code, we recommend that users not modify R’s startup profile. But for system administrators, package developers, and R enthusiasts, customizing the launch process can be a valuable tool and can help avoid common gotchas. R’s behavior is thoroughly documented in R’s base documentation: “Initialization at Start of an R Session”. This post will elaborate on the official documentation and provide some examples.
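As an illustration of one step in that process, the sketch below models in Python how R chooses the user profile, following the rules in R's `?Startup` help page: if the `R_PROFILE_USER` environment variable is set, that file is used; otherwise R looks for `.Rprofile` in the current directory and then in the home directory; and `--no-init-file` (implied by `--vanilla`) skips the step. The function `find_user_rprofile` is hypothetical and is not how R itself is implemented. It ignores the site-wide profile, simplifies tilde expansion, and does not model what happens when `R_PROFILE_USER` names a missing file.

```python
from pathlib import Path
from typing import Mapping, Optional


def find_user_rprofile(
    env: Mapping[str, str],
    cwd: Path,
    home: Path,
    no_init_file: bool = False,
) -> Optional[Path]:
    """Return the user profile R would source, or None (hypothetical model)."""
    # --no-init-file (or --vanilla) skips the user profile entirely.
    if no_init_file:
        return None
    # 1. An explicit R_PROFILE_USER wins; unset or empty is treated as absent.
    override = env.get("R_PROFILE_USER", "")
    if override:
        # Simplified tilde expansion: only "~" and "~/..." for the current user.
        if override == "~" or override.startswith("~/"):
            return home / override[2:]
        return Path(override)
    # 2. Otherwise look for .Rprofile in the current directory, then in home.
    for directory in (cwd, home):
        candidate = directory / ".Rprofile"
        if candidate.is_file():
            return candidate
    return None
```

The ordering explains a common gotcha: a project-level `.Rprofile` in the working directory is used instead of the one in your home directory, not in addition to it.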

Negative Results on Negative Images: Major Flaw in Deep Learning?

This is an overview of recent research outlining the limitations of image recognition with deep neural networks. But should this really be considered a ‘limitation’?
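For readers unfamiliar with the term: a negative image inverts every pixel's intensity, so dark regions become light and vice versa while shapes and edges stay where they were. A minimal Python sketch for an 8-bit grayscale image, assuming a list-of-rows representation (the function name `negative` is hypothetical; color images would be inverted per channel):

```python
def negative(image, max_value=255):
    """Invert a grayscale image given as rows of ints in [0, max_value]."""
    return [[max_value - pixel for pixel in row] for row in image]


img = [[0, 64], [128, 255]]
print(negative(img))  # → [[255, 191], [127, 0]]
```

Applying the function twice returns the original image, so no information is lost by the transformation.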