When “learning Python” becomes “practicing R” (spoiler)

Fifteen years ago, a student of mine told me that I should start learning Python, that it was really a great language. Students started to learn it, but I kept postponing. A few years ago, I also started Python for Kids with my son, which is actually really nice. That was nice, but not really challenging. A few weeks ago, I also started a crash course in Python, taught by Pierre. The truth is I think I will probably give up. I keep telling myself (1) I can do anything much faster in R and (2) Python is not intuitive, especially when you have been practicing R for almost 20 years… Last week, I also had to link Python and R for our pricing game: Ali wrote some template code in Python, and I had to translate it into R. And it was difficult…

Cognitive Search Is More Than Just AI

Cognitive search, widely accepted as the next evolution of enterprise search, offers the potential for dramatic improvements in the accuracy, relevance, and efficiency of insight discovery. Although some see cognitive search as simply traditional search enhanced by machine learning and artificial intelligence, it is actually a sophisticated combination of capabilities that makes it distinct from, and superior to, traditional enterprise search. Cognitive search goes well beyond search engines to bring together myriad data sources, along with sophisticated tagging automation and personalization, vastly improving how an organization’s employees find, discover, and access the information they need to do their jobs.

Compare outlier detection methods with the OutliersO3 package

There are many different methods for identifying outliers and a lot of them are available in R. But are outliers a matter of opinion? Do all methods give the same results? Articles on outlier methods use a mixture of theory and practice. Theory is all very well, but outliers are outliers because they don’t follow theory. Practice involves testing methods on data, sometimes with data simulated based on theory, better with ‘real’ datasets. A method can be considered successful if it finds the outliers we all agree on, but do we all agree on which cases are outliers? The Overview Of Outliers (O3) plot is designed to help compare and understand the results of outlier methods. It is implemented in the OutliersO3 package and was presented at last year’s useR! in Brussels. Six methods from other R packages are included (and, as usual, thanks are due to the authors for making their functions available in packages).

Analyzing Metadata for CRAN Packages

I have been searching for various ways to find information about R packages for some time now, but I only recently learned about the CRAN_package_db() function in the base tools package. If a colleague hadn’t pointed it out to me, I am sure I would never have found it on my own.

Operationalizing machine learning

Dinesh Nirmal explains how real-world machine learning reveals assumptions embedded in business processes that cause expensive misunderstandings.

Data science in the cloud

Alex Smola shares lessons learned from AWS SageMaker, an integrated framework for handling all stages of analysis.

Differentiating via data science

Eric Colson explains why companies must now think very differently about the role and placement of data science in organizations.

Text Processing in R

This tutorial goes over some basic concepts and commands for text processing in R. R is not the only way to process text, nor is it always the best way. Python is the de facto programming language for processing text, with a lot of built-in functionality that makes it easy to use and pretty fast, as well as a number of very mature and full-featured packages such as NLTK and TextBlob. Basic shell scripting can also be many orders of magnitude faster for processing extremely large text corpora; for a classic reference, see Unix for Poets. Yet there are good reasons to want to use R for text processing, namely that we can do it, and that we can fit it in with the rest of our analyses. Furthermore, there is a lot of very active development going on in the R text analysis community right now (see especially the quanteda package).