Quotes

|Quotes| = 380
Jake Porway Data is new eyes.
Satya Nadella
(February 21, 2017)
Bots are the new apps.
Daniel Tunkelang Failure is a great teacher.
Brian Mitchell
(08 Dec 15)
Work Smarter and Not Harder.
Jonathan Lenaghan Having no competitors is bad.
Anna Smith Work on positivity and patience.
Miss Piggy Never eat more than you can lift.
Isabelle Nuage
(February 24, 2015)
Big Data by itself is of little use.
Ray Major
(November 6, 2014)
Better Predictions = Higher Profits.
Ed Burns
(June 2015)
Machine learning automates analytics.
John Langford
(2005)
Prefer simplicity in algorithm design.
Eric Jonas Life is too short to not be having fun.
Kira Radinsky Working with data is like an adventure.
William Edwards Deming
(1900-1993)
In God we trust, all others bring data.
Jon Greenberg
(January 2, 2015)
It’s a good time to be a data scientist!
Manoj Sharma
(December 30, 2014)
A picture is worth a thousand statistics.
David Cearley
(2014)
Every app now needs to be an analytic app.
Caitlin Smallwood It’s very obvious how different people are.
Patrick Gerald
(May 22, 2015)
No good business succeeds without analytics.
Joey Zwicker
(12. February 2015)
Hadoop has an irreparably fractured ecosystem.
Timothy E. Carone
(January 30, 2015)
Big Data is the oxygen for autonomous systems.
Amy Heineike Data science is already kind of a broad church.
Kamil Bartocha
(26. Apr 2015)
Academia and business are two different worlds.
Ryan Irwin
(August 20, 2014)
Have a sense of humor, and never stop learning!
John Keats Nothing ever becomes real till it is experienced.
Daniel Tunkelang Anything that looks interesting is probably wrong.
Emil Gumbel It’s impossible for the improbable to never occur.
John Foreman I find it tough to find and hire the right people.
One of Barbara Doyle’s Professors If you need statistics to prove it, it isn’t true.
Kira Radinsky The person you hire has to understand the business.
Miguel de Cervantes from Don Quixote By a small sample, we may judge of the whole piece.
Kirk Borne
(2015)
Data science: It’s greater than the sum of the parts.
Light, Singer and Willett You can’t fix by analysis what you bungled by design.
Victor Hugo Nothing is stronger than an idea whose time has come.
Larry Hardesty
(August 15, 2014)
In the age of big data, visualization tools are vital.
Daniel Tunkelang Intuition is really a well-trained association network.
George Box Essentially, all models are wrong, but some are useful.
Joel Cadwell
(August 21, 2014)
R makes it so easy to fit many models to the same data.
Mitchell A. Sanders
(August 27, 2013)
If you can’t use the tools, you can’t analyze the data.
Aditi Joshi
(August 4, 2015)
Behind every successful person, there is tons of coffee.
Attributed to Einstein Models should be as simple as possible, but not more so.
Rob Kitchin Big data should complement small data, not replace them.
Sherlock Holmes (Arthur Conan Doyle) It is a capital mistake to theorize before one has data.
Ed Burns
(September 2015)
Big data analytics architecture requires integration push.
Niels Bohr Prediction is very difficult, especially about the future.
John McCarthy He who refuses to do arithmetic is doomed to talk nonsense.
Daniel Tunkelang As data scientists, our job is to extract signal from noise.
Christopher Bishop
(2013)
Half of what we do at Microsoft Research is Machine Learning.
Kamil Bartocha
(26. Apr 2015)
You will spend most of your time cleaning and preparing data.
Robin Bloor
(April 13, 2015)
The ‘great age of silo building’ ended some time around 2005.
Daniel Tunkelang Search is the problem at the heart of the information economy.
Eric Jonas What really matters is who’s actually using and paying for it.
Foster Provost & Tom Fawcett
(2014)
Computing similarity is one of the main tools of data science.
Victor Hu It is hard to know what you really need until you dig into it.
Hans Rosling The idea is to go from numbers to information to understanding.
Ed Burns
(2012)
Visualization makes sense of messy data – if you don’t mess up.
Isaiah, XXX 8 Now go, write it before them in a table, and note it in a book.
Rishi Shah
(September 24, 2014)
Big data profitability depends on your employees’ data literacy.
Gandhi If one takes care of the means, the end will take care of itself.
Chris Wiggins The main driver of my ideas has been seeing people doing it ‘wrong’.
Jake Porway Every company has data that can help make the world a better place.
Albert Einstein If you can’t explain it simply, you don’t understand it well enough.
Mark van Rijmenam
(08.01.2015)
Great insights are achieved when you combine different data sources.
Newspaper headline posted on Maya Bar Hillel’s board. Every third person in Israel saw 1.8 public theater shows last year.
Gil Allouche
(August 26, 2014)
Real time data (Analytics) isn’t just a good idea, it is a necessity.
Lewis Platt If we only knew what we know, we would be 30 percent more productive.
arago
(2015)
All analysis starts with an understandable set of data and algorithms.
Jake Porway There’s almost no limit to where data and data science can be applied.
PWC
(2014)
It’s no longer good enough to make decisions based on intuition alone.
Shayne Miel You have to turn your inputs into things the algorithm can understand.
Matthew Zeiler
(2014)
Google is not really a search company. It’s a machine-learning company.
Andre Karpistsenko There is a big part of intuition in choosing the most important problem.
Caitlin Smallwood The top things for people in hiring are hunger and insatiable curiosity.
Josh Bloom
(2014)
The first rule of data science is: don’t ask how to define data science.
Jake Porway Data scientists in the business world are all generally well-compensated.
John Foreman Talking to users is crucial because they point you in the right direction.
Michael L. Brodie
(03.07.2015)
Doubt everything. Use evidence-based methods to verify things that matter.
Caitlin Smallwood You imagine a data set & you salivate at just thinking about that data set.
Kamil Bartocha
(26. Apr 2015)
There is no fully automated Data Science. You need to get your hands dirty.
Steve Jobs
(May 25, 1998)
A lot of times, people don’t know what they want until you show it to them.
Daniel Tunkelang One thing we’ve learned is that there’s no such thing as over-communicating.
Caitlin Smallwood Discovering things about people through their data was a really cool thing.
Jeff Dean
(November 2014)
Anything humans can do in 0.1 sec, the right big 10-layer network can do too.
Jeffrey Fry Having more data does not always give you the power to make better decisions.
John Foreman Twitter is probably the best place to start conversations about data science.
Deepak Mohapatra
(2014)
Anytime you can correlate a person, location and time, you can identify schemes.
Evans and Richardson
(2011)
Powerful language of graphs might make formulations and models more transparent.
Kaiser Fung One of the biggest myths of Big Data is that data alone produce complete answers.
Lana Klein
(01.01.2015)
Analytics today is at the point of high awareness and very little understanding.
Lord Ernest Rutherford If your experiment needs statistics, you ought to have done a better experiment.
Andre Karpistsenko The idea or the initial enthusiasm is just a small part of doing something great.
Bob McDonald Data modeling, simulation, and other digital tools are reshaping how we innovate.
Jonathan Lenaghan It is your location history that is important, not necessarily where you are right now.
n.n. Multi-Criteria Decision Making is the aim to order multidimensional alternatives.
ATKearney Is Big Data the 21st century equivalent of the Industrial Revolution? We think so.
eoda
(2015)
R is in the process of becoming the multi-platform lingua franca of data analysis.
Foster Provost & Tom Fawcett
(2014)
Increasingly, business decisions are being made automatically by computer systems.
Manoj Sharma
(December 30, 2014)
One of the most important steps in the Data Analytics process is Feature Selection.
Richard Pugh
(25 June 2015)
In my opinion, the single most important skill for a data scientist is … Empathy.
Andre Karpistsenko The core lesson from tool-and-method explorations is that there is NO silver bullet.
Brian Caffo Like nearly all aspects of statistics, good modeling decisions are context dependent.
Jake Porway
(October 1, 2015)
We must convey what constitutes data, what it can be used for, and why it’s valuable.
Joel Cadwell
(August 21, 2014)
Naming is an art, yet be careful not to add surplus meaning by being overly creative.
arago
(2015)
To become valuable for statistics and machine learning the data has to be centralized.
Jake Porway The world will be more effective if everyone can at least converse about data science.
Jonathan Lenaghan Losing somebody else’s money is one of the most horrible sinking feelings in the world.
John Tukey Numerical quantities focus on expected values, graphical summaries on unexpected values.
Tony Fisher
(May 15, 2015)
Today, big data is considered a differentiator. Soon, it will be considered a commodity.
Confucius Tell me, and I will forget. Show me and I may remember. Involve me, and I will understand.
Pete Werner
(March 14, 2015)
Much of R is built around the assumption you are working with a table-like data structure.
George Bernard Shaw You see things and you say ‘why’. But I dream things that never were; and I say, ‘why not’?
Jeffrey Heer It’s an absolute myth that you can send an algorithm over raw data and have insights pop up.
John W. Tukey
(1977)
The greatest value of a picture is when it forces us to notice what we never expected to see.
Tamara Dull
(March 20, 2015)
The data lake is essential for any organization who wants to take full advantage of its data.
Thomas J. Watson The great accomplishments of man have resulted from the transmission of ideas and enthusiasm.
John Mount
(April 19, 2013)
Machine learning and statistics may be the stars, but data science orchestrates the whole show.
Strategy&
(2014)
Big data can significantly increase top-line revenues and markedly reduce operational expenses.
Max Kuhn, Kjell Johnson Unfortunately, the predictive models that are most powerful are usually the least interpretable.
(Tweet)
(2014)
When you staff a project with people who are skilled and fascinated by the problem, you get gold.
Henri Poincaré
(1854–1912)
With logic one can construct proofs, but one cannot gain new insights; that takes intuition.
John Cook
(26 March 2015)
Statistics aims to build accurate models … Machine learning aims to solve problems more directly.
Pierre Simon, Marquis de Laplace The most important questions of life are, for the most part, really only problems of probability.
Pradyumna S. Upadrashta
(February 13, 2015)
You shouldn’t be collecting Big Data under the premise that more data is better, cooler, sexier, etc.
Yann LeCun It’s useful for a company to have its scientists actually publish what they do. It keeps them honest.
BI Community What is the most used feature in any business intelligence solution? It is the Export to Excel button.
Brian Fanzo
(January 9, 2015)
If you’re not doing more listening and analytics than pushing or posting, then you’re doing social wrong!
Chris Wiggins The most exciting thing is realizing that something everybody thinks is new is actually really damn old.
David Hilbert Mathematics knows no races or geographic boundaries; for mathematics, the cultural world is one country.
Gabriel Lowy
(February 24, 2015)
Big data does not change the relationship between data quality and decision outcomes. It underscores it.
John Foreman What we focus on, and this is going to sound goofy for a data scientist – is the happiness of our users.
Milton Friedman The only relevant test of the validity of a hypothesis is comparison of its predictions with experience.
W. Edwards Deming The only useful function for a statistician is to make predictions, and thus provide a basis for action.
Krzysztof Zawadzki
(August 30, 2014)
Finding a data scientist is hard. Finding people who understand who a data scientist is, is equally hard.
Eric Jonas The biggest thing people should be working on is problems they find interesting, exciting, and meaningful.
Linus Torvalds Bad programmers worry about the code. Good programmers worry about data structures and their relationships.
A. N. Whitehead The aim of science is to seek the simplest explanation of complex facts… Seek simplicity and distrust it.
Eran Levy
(12.02.2015)
Mashing up multiple data sources to generate a single source of truth is an integral part of data analysis.
TJ Laher
(November 14, 2014)
Leading organizations have already begun to see serious returns on deploying a pervasive analytics strategy.
H.G. Wells/Samuel S. Wilks
(1895/1951)
Statistical thinking will one day be as necessary for efficient citizenship as the ability to read and write.
Michael Greene
(2014)
To find new trends and strong patterns from large complex data sets, a strong analytics foundation is needed.
John Foreman If you’re solving problems appropriately and you can explain yourself well, you’re not going to lose your job.
Jonathan Lenaghan It is very important to be self-critical: always question your assumptions and be paranoid about your outputs.
Andre Karpistsenko Maybe the most important thing is to surround yourself with people greater than you are and to learn from them.
Andrew Gelman
(28 April 2015)
Measurement, measurement, measurement. It’s central to statistics. It’s central to how we learn about the world.
Ivan Vasilev The hidden layer is where the (neural) network stores its internal abstract representation of the training data.
P. Dawid
(1979)
Causal inference is one of the most important, most subtle, and most neglected of all the problems of Statistics.
R. A. Fisher … the actual and physical conduct of an experiment must govern the statistical procedure of its interpretation.
Xavier Conort The algorithms we used are very standard for Kagglers. […] We spent most of our efforts in feature engineering.
Yann LeCun Most of the knowledge in the world in the future is going to be extracted by machines and will reside in machines.
Thomas Carlyle A judicious man looks on statistics not to get knowledge, but to save himself from having ignorance foisted on him.
Dikesh Jariwala
(29.12.2016)
There are four basic presentation types for charts:
1. Comparison
2. Composition
3. Distribution
4. Relationship
Nina Zumel
(January 5, 2015)
The true purpose of a test procedure is to estimate how well a classifier will work in future production situations.
Dell
(2014)
Big data is about infrastructure, while analytics is about enabling informed decisions and measuring business impact.
Pelin Thorogood
(August 21, 2014)
We really need people who have the left brain and right working in balance, while also knowledgeable of the business.
Cassius J. Keyser Absolute certainty is a privilege of uneducated minds - and fanatics. It is, for scientific folk, an unattainable ideal.
Arthur Samuel
(1959)
[Machine learning is the] field of study that gives computers the ability to learn without being explicitly programmed.
Chris Lynch Big data is at the foundation of all the megatrends that are happening today, from social to mobile to cloud to gaming.
Hadley Wickham Any real data analysis involves data manipulation (sometimes called wrangling or munging), visualization and modelling.
n.n. Data does replace heuristics, hard-coded rules, assumptions and beliefs. Machine learning only enables data to do that.
Alex Jones
(22.05.2015)
Creating a hodge-podge of pretty pictures of every datapoint is a guaranteed way to destroy the value of a visualization.
Hilaire Belloc Statistics are the triumph of the quantitative method, and the quantitative method is the victory of sterility and death.
John Foreman E-mail data is powerful, because as a communications channel it generates more revenue per recipient than Social Channels.
Kluge et al.
(2001)
Knowledge is at the heart of much of today’s global economy and managing knowledge has become vital to companies’ success.
Ed Burns
(August 2014)
One of the keys to success in big data analytics projects is building strong ties between data analysts and business units.
Yann LeCun The data sets are truly gigantic. There are some areas where there’s more data than we can currently process intelligently.
Datameer
(June 2014)
Traditional BI looks at data through a soda straw. Big data analytics looks at data through powerful, wide-angle binoculars.
Amy Heineike The key is figuring out how you get those three things: the right problem, the right data, and the right methodology to meld.
Mike Barlow
(2017)
Thanks to a perfect storm of recent advances in the tech industry, AI has risen from the ashes and regained its aura of cool.
Daniel Tunkelang Query understanding offers the opportunity to bridge the gap between what the searcher means and what the machine understands.
Antoine de Saint-Exupéry A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.
Michael Walker
(October 14, 2014)
Beware of tech firms selling you data tech with fantastic claims of finding meaning in data and creating competitive advantage.
Paul Roehrig, Ben Pring
(2013)
It’s a new era in business, one in which growth will be driven as much by insight and foresight as by physical products and assets.
Suetonia Palmer
(10.06.2015)
‘Innovative statistical techniques’ are important, but the key to getting good results here is a mind-boggling amount of actual work.
Diego Kuonen The key element for a successful (big) data analytics and data science future is statistical rigor and statistical thinking of humans.
Foster Provost & Tom Fawcett
(2014)
Take big data to mean datasets that are too large for traditional data-processing systems and that therefore require new technologies.
Jake Porway
(October 1, 2015)
Data is not truth, and tech is not an answer in-and-of-itself. Without designing for the humans on the other end, our work is in vain.
Claudia Perlich The conversation is based around how to properly deal with even more sensitive information about where exactly people spend their lives.
Pradyumna S. Upadrashta
(February 13, 2015)
Before jumping on the Big Data bandwagon, I think it is important to ask the question of whether the problem you have requires much data.
David Puglia, FrontRange
(30. December 2014)
In comparison to IPv4’s 4.3 billion IP addresses, IPv6 can assign about 340 trillion trillion trillion addresses and corresponding devices.
Yann LeCun You don’t want to just hire clones of the same person, because then they will all want to explore the same things. You want some diversity.
Josh Wills A data scientist is someone who is better at statistics than any software engineer and better at software engineering than any statistician.
Andrew Ng Coming up with features is difficult, time-consuming, requires expert knowledge. “Applied machine learning” is basically feature engineering.
Christophe Bourguignat
(Sep 16, 2014)
Complex models are to Data Science, what “haute couture” is to the clothing industry : they are not made to be daily used, but are necessary.
John Foreman
(1/30/2015)
If your goal is to positively impact the business, not to build a clustering algorithm that leverages storm and the Twitter API, you’ll be OK.
Erin Shellman As a data scientist, even if you don’t have the domain expertise you can learn it, and can work on any problem that can be quantitatively described.
John W. Tukey The combination of some data and an aching desire for an answer does not ensure that a reasonable answer can be extracted from a given body of data.
Joel Cadwell
(November 14, 2014)
R also becomes the interface to a diverse range of applications. This is R’s unique selling proposition. It is where one goes for new ways of seeing.
Michele Nemschoff
(August 30, 2014)
Big data isn’t just for developers and analysts in the technical arena. In today’s digital age, big data has become a powerful tool across industries.
European Union’s General Data Protection Regulation (GDPR)
(Dec. 2016)
Organizations that use ML to make user-impacting decisions must be able to fully explain the data and algorithms that resulted in a particular decision.
Foster Provost & Tom Fawcett
(2014)
At a high level, data science is a set of fundamental principles that support and guide the principled extraction of information and knowledge from data.
H. James Harrington
(1929)
If you can’t measure something, you can’t understand it. If you can’t understand it, you can’t control it. If you can’t control it, you can’t improve it.
John W. Tukey Far better an approximate answer to the right question which is often vague, than an exact answer to the wrong question which can always be made precise.
import.io
(August 20, 2015)
Every number has a story. As a data scientist, you have the incredible job of digging in and analyzing massive sets of numbers to find what that story is.
RStudio Data science is the process of turning data into understanding and actionable insight. Two key data science tools are data manipulation and visualization.
Michael Young
(17.04.2015)
For many organisations, the accessibility of the tools and products to deliver analytics and data mining has led to an increased awareness of the benefits.
Ashish Jain
(March 29, 2015)
The next breakthrough in data analysis may not be in individual algorithms, but in the ability to rapidly combine, deploy, and maintain existing algorithms.
Eric Jonas Graduate students, perhaps because of an adherence to sunk cost fallacy, often write really great surveys of the field at the beginning of their PhD thesis.
Andre Karpistsenko Getting through life, through those uncertainties in a way, when you look back and see things still connect and exist, that’s the biggest measure of success.
Jonathan Lenaghan People under pressure to find patterns are prone to fall into the common human fallacies of over insufficient data and over-reading correlation as causation.
Kelly Sheridan
(4/27/2015)
Before implementing an advanced analytics strategy, you might have plenty of questions. The key is to be sure you’re asking, and addressing, the correct ones.
Yann LeCun Knowledge is some compilation of data that allows you to make decisions, and what we find today is that computers are making a lot of decisions automatically.
Foster Provost, Tom Fawcett
(2013)
However, there is confusion about what exactly data science is, and this confusion could lead to disillusionment as the concept diffuses into meaningless buzz.
Third Nature
(2013)
Data warehouses have not been able to keep up with business demands for new sources of information, new types of data, more complex analysis and greater speed.
Eric Jonas When I evaluate machine learning papers, what I am looking to find out is whether the technique worked or not. This is something that the world needs to know …
Tess Nesbitt Sampling – analyzing representative portions of the available information – can help speed development time on models, enabling them to be deployed more quickly.
Zachary Chase Lipton
(28.04.2015)
The whole reason we turn to machine learning and not handcraft decision rules is that for many problems, simple, easily understood decision theory is insufficient.
John Foreman It’s essential for a data science team to hire people who can really speak about the technical things they’ve done in a way that nontechnical people can understand.
David Lewis-Williams Scientists do not collect data randomly and utterly comprehensively. The data they collect are only those that they consider ‘relevant’ to some hypothesis or theory.
Fred Brooks Show me your flowchart and conceal your tables, and I shall continue to be mystified. Show me your tables, and I won’t usually need your flowchart; it’ll be obvious.
Ronald Fisher To call in the statistician after the experiment is done may be no more than asking him to perform a post-mortem – he may be able to say what the experiment died of.
Kaan Turnali
(Feb 21, 2015)
Passion Matters: Some people go to work. Others get up each morning and work with a desire to make a difference. We don’t need to save the world to make a difference.
Kaiser Fung Before getting into the methodological issues, one needs to ask the most basic question. Did the researchers check the quality of the data or just take the data as is?
Brian Hopkins
(June 27, 2015)
I saw it coming last year. Big data isn’t what it used to be. Not because firms are disillusioned with the technology, but rather because the term is no longer helpful.
W. H. Auden Thou shalt not answer questionnaires Or quizzes upon world affairs, Nor with compliance Take any test. Thou shalt not sit with statisticians nor commit A social science.
Hadley Wickham I think there are three main steps in a data science project: you collect data (and questions), analyze it (using visualization and models), then communicate the results.
Justin Washtell
(November 3, 2014)
The central premise of predictive modeling is precisely that one size does not fit all – otherwise we would just assign the same outcome to all cases and be done with it.
Davenport & Beck
(2001)
Attention is focused mental engagement on a particular item of information. Items come into our awareness, we attend to a particular item, and then we decide whether to act.
John Foreman Vendors are there to sell you a tool for a problem you may or may not have yet, and they’re very good at convincing you that you need it whether you actually need it or not.
Victor Hu Hiring data scientists is very exciting at this time because in some ways there are no established guidelines on how to do it. People have skills in so many different areas.
William S. Cleveland
(2000-2014)
Data analysis needs to be part of the blood stream of each department and all should be aware of the workings of subject matter investigations and derive stimulus from them.
Daniel Tunkelang It’s easy to be lazy and look at aggregates. Drilling down into the differences and looking at specific examples is often what gives us a real understanding of what’s going on.
Istvan Hajnal
(February 23, 2015)
My advice to the market research world is to stop conceptualizing so much when it comes to Big Data and Data Science and simply apply the new techniques where appropriate.
ATKearney Although Big Data processes large, diverse data sets to reveal complex relationships, humans are the crucial ingredient for interpreting the data and relationships into insights.
Colorado Reed
(2014)
What should I do if I want to get ‘better’ at machine learning, but I don’t know what I want to learn? Excellent question! My answer: consistently work your way through textbooks.
William C. Blackwelder … a hypothesis test tells us whether the observed data are consistent with the null hypothesis, and a confidence interval tells us which hypotheses are consistent with the data.
S. N. D. North The science of statistics is the chief instrumentality through which the progress of civilization is now measured, and by which its development hereafter will be largely controlled.
Valdis Krebs Innovation happens at the intersection of two or more different, yet similar, groups. Where one technology meets another, one discipline meets another, one department meets another.
Mrs. Dillman
(2013)
We didn’t know in the past that strawberry Pop-Tarts increase in sales, like seven times their normal sales rate, ahead of a hurricane, and the pre-hurricane top-selling item was beer.
IBM
(June 2015)
IBM will educate one million data scientists and data engineers on Apache Spark through extensive partnerships with AMPLab, DataCamp, MetiStream, Galvanize and Big Data University MOOC.
Martyn Jones
(March 12, 2015)
Is Big Data really about high volumes, high velocity and high variety, or is it in fact about much noise, too much pomposity and abundant similarity leading to unnecessary high anxiety?
n.n. We Learn . . .
10% of what we read
20% of what we hear
30% of what we see
50% of what we see and hear
70% of what we discuss
80% of what we experience
95% of what we teach others.
n.n. Learning new tools and techniques in data science is sort of like running on a treadmill – you have to run continuously to stay on top of it. The minute you stop, you start falling behind.
Tavish Srivastava
(May 19, 2015)
Machine Learning algorithms are like solving a Rubik’s Cube. You grapple at the beginning to figure out the hidden algorithm, but once learnt, some can even solve it in less than 7 seconds.
Tom Gilley
(April 21, 2015)
It’s easy to underestimate your data. And even more so – not think of your IoT data as valuable. However, if you examine your data, you never know the sort of insights you could discover.
Yann LeCun The idea that somehow you can put a bunch of research scientists together and then put some random manager who’s not a scientist directing them doesn’t work. I’ve never ever seen it work.
Sundar Pichai Machine learning is a core, transformative way by which we’re rethinking everything we’re doing. We’re thoughtfully applying it across all our products, be it search, ads, YouTube or Play.
Dell
(2014)
To quickly detect and respond to issues, organizations need an analytics platform that offers rich statistical process control (SPC) functionality as well as real-time monitoring and alerting.
Julie Hunt
(April 7, 2015)
The need to analyze data is at the foundation of every effective data management strategy, whether the analysis is handled from the business perspective or the technology side of the equation.
Nikhil Buduma
(29 December 2014)
In general, choosing smart training cases is a very good idea. There’s lots of research that shows that by engineering a clever training set, you can make your neural net a lot more effective.
David McCandless By visualizing information, we turn it into a landscape that you can explore with your eyes, a sort of information map. And when you’re lost in information, an information map is kind of useful.
Kaiser Fung
(May 2015)
Story time is the moment in a report on data analysis when the author deftly moves from reporting a finding of data to the telling of stories based on assumptions that do not come from the data.
Robert Neuhaus Feature engineering and feature selection are not mutually exclusive. They are both useful. I’d say feature engineering is more important though, especially because you can’t really automate it.
William S. Cleveland
(2000-2014)
Model building is complex because it requires combining information from exploring the data and information from sources external to the data such as subject matter theory and other sets of data.
Yann LeCun The amount of human brainpower on the planet is actually increasing exponentially as well, but with a very, very, very small exponent. It’s very slow growth rate compared to the data growth rate.
David Lillis
(2014)
One data manipulation task that you need to do in pretty much any data analysis is recode data. It’s almost never the case that the data are set up exactly the way you need them for your analysis.
SBS documentary “The Age of Big Data” Data is becoming a powerful and most valuable commodity in the 21st century. It is leading to scientific insights and new ways of understanding human behaviour. Data can also make you rich. Very rich.
Lord Kelvin When you can measure what you are speaking about and express it in numbers, you know something about it. When you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind.
Roger Ehrenberg The biggest lesson is to have a very clear set of customers that you’re going to serve, notwithstanding the fact you may be building something that can ultimately help many different types of customers.
European Union’s General Data Protection Regulation (GDPR)
(Dec. 2016)
How could a result be explained, especially a result of a machine learning model, without a versioned record of what data was input to generate the result and what data was output representing the result?
Lyndsay Wise
(February 21, 2015)
One of the benefits of cloud analytics and computing in general is the ability for small and mid-sized companies to take advantage of technology and applications that may have previously been out of reach.
John Foreman
(05/08/2014)
What’s better: a simple model that’s used, updated, and kept running? Or a complex model that works when you babysit it but the moment you move on to another problem no one knows what the hell it’s doing?
William S. Cleveland
(2000-2014)
Theory, both mathematical and non-mathematical theory, is vital to data science. … Tools of data science – models and methods together with computational methods and computing systems – link data and theory.
Jeroen Janssens Data scientists love to create interesting models and exciting data visualizations. However, before they get to that point, usually much effort goes into obtaining, scrubbing, and exploring the required data.
Mark Hammond
(2017)
There are 18 million developers in the world, but only one in a thousand have expertise in artificial intelligence. To a lot of developers, AI is inscrutable and inaccessible. We’re trying to ease the burden.
Michal Klos
(January 28, 2015)
We are in the Golden Age of Data. For those of us on the front-lines, it doesn’t feel that way. Every step forward this technology takes, the need for deeper analytics takes two. We’re constantly catching up.
Simon Moss
(21.12.2015)
A key differentiator between heterogeneous analytics and traditional BI is the ability to rapidly deploy ideas into solutions, adapt to changes in the environment and maintain flexibility of one’s various assets.
Fatih Hamurcu
(May 7, 2015)
On a sequential computer, the fast algorithm is the best algorithm, but for new science area, I believe we need more creative approaches for algorithm design in order to extract more valuable insight in real-time.
Kaiser Fung We are not saying that statisticians should not tell stories. Story-telling is one of our responsibilities. What we want to see is a clear delineation of what is data-driven and what is theory (i.e., assumptions).
Kune, Konugurthi, Agarwal, Chillarige, Buyya
(2015)
Big Data and traditional data warehousing systems, however, have similar goals to deliver business value through the analysis of data, but they differ in the analytics methods and the organization of the data.
Suman Malekani
(January 29, 2015)
While working on Big Data & planning to implement it for the benefit of business, it is very important to explain the insights & valuable knowledge in a way that non-technical business user can actually understand.
Dr. Olly Downs
(May 18, 2015)
Most of the big data investment focus to date has been on the underlying infrastructure, while development of the applications that make use of that infrastructure – and that deliver actual business value – has lagged.
Gartner
(2014)
Data integration features have gained prominence during the last year as companies struggled to incorporate new data sources in their analysis, a process that can consume a sizable percentage of the total project time.
Jeff Leek
(09.02.2015)
To evaluate a person’s work or their productivity requires three things:
1. To be an expert in what they do
2. To have absolutely no reason to care whether they succeed or not
3. To have time available to evaluate them.
Jeroen Janssens
(August 20, 2014)
We data scientists love to create exciting data visualizations and insightful statistical models. However, before we get to that point, usually much effort goes into obtaining, scrubbing, and exploring the required data.
Jeff Leek Data science is the process of formulating a quantitative question that can be answered with data, collecting and cleaning the data, analyzing the data, and communicating the answer to the question to a relevant audience.
R. A. Fisher … the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only to give the facts a chance of disproving the null hypothesis.
Amir Hajian
(2017)
A good data scientist in my mind is the person that takes the science part in data science very seriously; a person who is able to find problems and solve them using statistics, machine learning, and distributed computing.
William Vorhies
(October 8, 2014)
Many first time users of predictive models are happy to have the benefit of a good model with which to target their marketing initiatives and don’t ask the equally important question, is this the best model we can be using?
Julia Evans Cleaning up data to the point where you can work with it is a huge amount of work. If you’re trying to reconcile a lot of sources of data that you don’t control like in this flight search example, it can take 80% of your time.
aschinchon.wordpress.com
(08.05.2015)
One robust way to determine if two time series, x_t and y_t, are related is to analyze whether there exists an equation like y_t = βx_t + u_t such that the residuals (u_t) are stationary (their mean and variance do not change when shifted in time).
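The idea in the quote above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the data are simulated (x_t is a random walk and u_t is white noise), the names ols_slope and lag1_autocorr are hypothetical helpers, and the lag-1 autocorrelation is used as a rough stand-in for a formal stationarity check such as an augmented Dickey-Fuller test on the residuals.

```python
import random

def ols_slope(x, y):
    """Least-squares slope for y = beta * x + u (no intercept, as in the quoted equation)."""
    return sum(a * b for a, b in zip(x, y)) / sum(a * a for a in x)

def lag1_autocorr(u):
    """Sample lag-1 autocorrelation; values close to 1 hint at a non-stationary series."""
    mean = sum(u) / len(u)
    num = sum((u[i] - mean) * (u[i - 1] - mean) for i in range(1, len(u)))
    den = sum((v - mean) ** 2 for v in u)
    return num / den

random.seed(0)
n = 2000

# Assumption: x_t is a simulated random walk, so it is not stationary on its own.
x = [0.0]
for _ in range(n - 1):
    x.append(x[-1] + random.gauss(0, 1))

# Assumption: y_t follows x_t with slope 2 plus stationary white noise u_t.
true_beta = 2.0
y = [true_beta * xt + random.gauss(0, 1) for xt in x]

beta_hat = ols_slope(x, y)
residuals = [yt - beta_hat * xt for xt, yt in zip(x, y)]

print("estimated beta:", beta_hat)
print("lag-1 autocorrelation of x_t:", lag1_autocorr(x))
print("lag-1 autocorrelation of residuals:", lag1_autocorr(residuals))
```

In this simulation the series x_t itself is highly persistent, while the residuals behave like noise, which is the pattern the quote describes. On real data a heuristic like this is not enough; a proper unit-root test on the residuals would be needed.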
Eric Jonas Academic culture teaches you that you’re dumb and that you’re probably wrong because most things never work, nature is very hard, and the best you can hope for is working on interesting problems and making a tiny bit of progress.
Rick Delgado
(January 2015)
Myths change with understanding. Misunderstandings on some of the current myths surrounding big data as follows will fade away: big data is made for big business, big data adoption is high and machine learning overcomes human bias.
Donald Rumsfeld There are known knowns. These are things we know that we know. There are known unknowns. That is to say, there are things that we know we don’t know. But there are also unknown unknowns. There are things we don’t know we don’t know.
Marcia Kaufman, Daniel Kirsch
(2014)
It is no longer sufficient for businesses to understand what has happened in the past, rather it has become essential to ask what will happen in the future, to anticipate trends and to take action that optimize results for business.
Eric Jonas The right thing to do is to not build a tool company but to build a consultancy based on the tools. Identify the company, identify the market, and build a consultancy. Later, if that works, you can then pivot to being a tool company.
Stephan Duquesnoy
(2015)
Comment of a DeepLearning user: As a side-note, even though I’m good with pattern-based thinking, I do not have an academic background. I lack patience and feel the need to create, rather than to completely understand what I’m doing.
Eric Colson, Brad Klingenberg, Jeff Magnusson
(March 31, 2015)
Data science can directly enable a strategic differentiator if the company’s core competency depends on its data and analytic capabilities. When this happens, the company becomes supportive to data science instead of the other way around.
Eric Jonas I actually think a lot of the future is in small data …. As the big data hype cycle crests, we’re going to see more and more people recognizing that what they really want to be doing is asking interesting questions of smaller data sets.
Alex Jones
(September 18, 2014)
Over time, more industries will fundamentally change or be disrupted as companies begin to leverage analytics, enhance efficiency, and allow data to drive decisions. Simply put, the competitive environment will necessitate data science capabilities.
Jonathan Symonds
(January 8, 2015)
The standard Machine Learning workflow is:
1. Get the data
2. Transform the data to create meaningful entities
3. Transform data for Machine Learning algorithms
4. Build supervised/unsupervised models/representations
5. Deploy the model in production.
Analise Polsky
(2014)
Improving Visual Data Discovery:
1. Always have new data sources.
2. Always have new techniques.
3. Always have new tools and platforms.
Visual data discovery is not once and done. It is an iterative process that requires communication and exploration.
Nikhil Buduma
(29 December 2014)
So what’s the idea behind backpropagation? We don’t know what the hidden units ought to be doing, but what we can do is compute how fast the error changes as we change a hidden activity. Essentially we’ll be trying to find the path of steepest descent!
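A tiny worked example may make the quoted idea concrete. The sketch below is not taken from the quoted article; it assumes a made-up network with one input, one sigmoid hidden unit and one linear output, and a squared error. It computes how fast the error changes with the hidden activity, chains that into weight gradients, and compares the result with a finite-difference estimate.

```python
import math

def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))

def forward(w1, w2, x):
    h = sigmoid(w1 * x)   # hidden activity
    o = w2 * h            # network output (linear output unit)
    return h, o

def error(w1, w2, x, t):
    _, o = forward(w1, w2, x)
    return 0.5 * (o - t) ** 2

def backprop(w1, w2, x, t):
    h, o = forward(w1, w2, x)
    dE_do = o - t                       # error derivative at the output
    dE_dh = dE_do * w2                  # how fast the error changes with the hidden activity
    dE_dw2 = dE_do * h                  # gradient for the output weight
    dE_dw1 = dE_dh * h * (1 - h) * x    # chain rule through the sigmoid to the input weight
    return dE_dh, dE_dw1, dE_dw2

# Illustrative values only (assumptions, not from the source).
w1, w2, x, t = 0.5, -1.5, 2.0, 1.0
dE_dh, dE_dw1, dE_dw2 = backprop(w1, w2, x, t)

# Finite-difference check of both weight gradients.
eps = 1e-6
numeric_dw1 = (error(w1 + eps, w2, x, t) - error(w1 - eps, w2, x, t)) / (2 * eps)
numeric_dw2 = (error(w1, w2 + eps, x, t) - error(w1, w2 - eps, x, t)) / (2 * eps)

# One small step against the gradient (the direction of steepest descent).
lr = 0.1
error_before = error(w1, w2, x, t)
error_after = error(w1 - lr * dE_dw1, w2 - lr * dE_dw2, x, t)

print("analytic vs numeric dE/dw1:", dE_dw1, numeric_dw1)
print("analytic vs numeric dE/dw2:", dE_dw2, numeric_dw2)
print("error before and after one step:", error_before, error_after)
```

The hidden unit has no target of its own, but dE_dh says in which direction its activity should move, and that is all the weight update needs.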
Nikhil Garg
(19.05.2017)
Most would agree that the single biggest bottleneck for all machine learning is software engineering. We all collectively in the tech industry are still figuring out the best practices, tools, abstractions, and systems that can enable large organizations
datacamp.com
(March 4th, 2015)
Data Science has its own language. So, if you want to have at least a slight chance of surviving in the enterprise world of tomorrow -with its obsessive focus on collecting and analyzing data- you better have started yesterday with learning this terminol.
Kaiser Fung
(September 2014)
This is the norm in statistical analysis. Every time you sit down to write something up, you notice additional nuances or nits. Sometimes, the problem is severe enough I have to re-run everything. Other times, you just decide to gloss over it and move on.
Bill Franks
(December 10, 2015)
One of the legendary events in the history of analytics was the original Netflix prize. The event led to a terrific example of the need to focus on not only theoretical results, but also pragmatically achievable results, when developing analytic processes.
Brandon Rohrer
(Dec 19, 2015)
Before data science can build the solution to simplify your life or make you lots of money, you have to give it some high quality raw materials to work with. Just like making a pizza, the better the ingredients you start with, the better the final product.
Richard Fichera Part of Hadoop’s appeal is that it is not specifically optimized for any specific solution or data type but rather a general framework for parallel processing, so your developers and data scientists can add any relevant data, whatever its format or source.
Richard A. Becker, William S. Cleveland
(1996)
Making graphs is very basic to data analysis. Whether you use the leading edge of statistical methods, or whether you want to quickly see the main features of your data, graphs are a must. They are the single most powerful class of tools for analyzing data.
Alon Hazan, Yoel Shoshan, Daniel Khapun, Roy Aladjem, Vadim Ratner
(29 May 2018)
Deep neural networks have demonstrated impressive performance in various machine learning tasks. However, they are notoriously sensitive to changes in data distribution. Often, even a slight change in the distribution can lead to drastic performance reduction.
Guerrilla Analytics
(July 21, 2015)
Data Science done well tells you:
• what you didn’t already know about the data
• what an appropriate algorithm should be, given what you now know about the data
• what the measurable expectations of that algorithm should be when it is automated in production
T. Alan Keahey Analytics plays a key role by helping to reduce the size and complexity of big data to a point where it can be effectively visualized and understood. In the best scenario, the visualization and analytics are integrated so that they work seamlessly with each other.
Nathan Yau What is good visualization? It is a representation of data that helps you see what you otherwise would have been blind to if you looked only at the naked source. It enables you to see trends, patterns and outliers that tell you about yourself and what surrounds you.
Vladimir N. Vapnik
(1999)
After the success of the SVM in solving real-life problems, the interest in statistical learning theory significantly increased. For the first time, abstract mathematical results in statistical learning theory have a direct impact on algorithmic tools of data analysis.
Zachary Chase Lipton
(January 2015)
Generally, the systems implementation of machine learning methodology and ongoing software maintenance challenges are an understudied area that will continue to grow in importance as machine learning systems become more commonplace in commercial and open source software.
Foster Provost & Tom Fawcett
(2014)
Understanding the fundamental concepts, and having frameworks for organizing data-analytic thinking, not only will allow one to interact competently, but will help to envision opportunities for improving data-driven decision making or to see data-oriented competitive threats.
R. A. Fisher If … we choose a group of social phenomena with no antecedent knowledge of the causation or absence of causation among them, then the calculation of correlation coefficients, total or partial, will not advance us a step toward evaluating the importance of the causes at work.
Ferris Jumah
(Sep 3, 2014)
We see that machine learning, data mining, data analysis and statistics are all highly ranking skills in the (Data Science Skill) network. This indicates that being able to understand and represent data mathematically, with statistical intuition, is a key skill for data scientists.
Kune, Konugurthi, Agarwal, Chillarige, Buyya
(2015)
Big Data technologies are being adopted widely for information exploitation with the help of new analytics tools and large scale computing infrastructure to process huge variety of multi-dimensional data in several areas ranging from business intelligence to scientific explorations.
H. Simon The aim … is to provide a clear and rigorous basis for determining when a causal ordering can be said to hold between two variables or groups of variables in a model . . . . The concepts refer to a model – a system of equations – and not to the ‘real’ world the model purports to describe.
Ajay Kelkar
(September 2, 2014)
So consumers are happy to share personal information as long as they see a “value add” for themselves. And organisations with trust-based information sharing relationships with customers will have significant competitive advantage over those with traditional data gathering relationships.
Dean Abbott
(December 06, 2015)
This kind of mindset is not learned in a university program; it is part of the personality of the individual. Good predictive modelers need to have a forensic mindset and intellectual curiosity, whether or not they understand the mathematics enough to derive the equations for linear regression.
Rao Naveen
(2017)
There’s been a lot of talk about trying to make AI work on existing infrastructure. But the sad reality is that you’re always going to end up with something that’s far less than state-of-the-art. And I don’t mean it will be 30 or 40 percent slower. It’s more likely to be a thousand times slower
Gilles Louppe
(July 2014)
There is often no need to build single models over immensely large datasets. Good performance can often be achieved by building models on (very) small random parts of the data and then combining them all in an ensemble, thereby avoiding all practical burdens of making large data fit into memory.
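A minimal sketch of the approach described in the quote above, under loud assumptions: the data are synthetic (y = 3x + 1 plus noise), the base learner is a plain least-squares line chosen only to keep the example dependency-free, and fit_line and predict are hypothetical helper names. Each model sees only 100 of the 10,000 rows, and the ensemble averages their predictions.

```python
import random

def fit_line(points):
    """Ordinary least squares for y = slope * x + intercept on a small sample."""
    n = len(points)
    mean_x = sum(p[0] for p in points) / n
    mean_y = sum(p[1] for p in points) / n
    sxx = sum((p[0] - mean_x) ** 2 for p in points)
    sxy = sum((p[0] - mean_x) * (p[1] - mean_y) for p in points)
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x

random.seed(1)

# Assumption: a synthetic "large" dataset with a known linear signal.
data = []
for _ in range(10000):
    x = random.uniform(0, 10)
    data.append((x, 3.0 * x + 1.0 + random.gauss(0, 1)))

# Fit 50 models, each on a small random part (1%) of the data.
models = [fit_line(random.sample(data, 100)) for _ in range(50)]

def predict(x):
    """Ensemble prediction: the average of the individual model predictions."""
    return sum(slope * x + intercept for slope, intercept in models) / len(models)

print("ensemble prediction at x=5:", predict(5.0))   # the noiseless signal is 16.0
print("ensemble prediction at x=0:", predict(0.0))   # the noiseless signal is 1.0
```

No single fit ever needs the whole dataset in one pass. In practice the base learner would be whatever model suits the problem; the linear fit here only stands in for it.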
Mark Barrenechea
(September 11, 2015)
Digital leaders know their data. They convert their information into actionable business insight. Considering that more data is shared online every second today than was stored in the entire Internet 20 years ago, it’s no wonder that differentiating products and services requires advanced tools.
Jonas Salk Reason alone will not serve. Intuition alone can be improved by reason, but reason alone without intuition can easily lead the wrong way … both are necessary. For myself, that’s how my mind works, and that’s how I work … It’s this combination that must be recognized and acknowledged and valued.
Jeffrey Heer, Michael Bostock, Vadim Ogievetsky
(2010)
Graphical Perception Experiments find that spatial position (as in a scatter plot or bar chart) leads to the most accurate decoding of numerical data and is generally preferable to visual variables such as angle, one-dimensional length, two-dimensional area, three-dimensional volume, and color saturation.
Hernán Resnizky
(May 15, 2015)
Sometimes some data scientists seem to ignore this: you can think of using the most sophisticated and trendy algorithm, come up with brilliant ideas, imagine the most creative visualizations but, if you do not know how to get the data and handle it in the exact way you need it, all of this becomes worthless.
Kevin Daly
(10.11.2014)
Big data is not for the faint of heart; you and your team must be willing to master many disciplines in order to be successful. You’ll need understanding of code, hardware, virtualization, networking, databases (SQL & NoSQL), ETL, Cloud, and more. Don’t fool yourself, you’ll need some serious skills on-board.
Lana Klein
(01.01.2015)
Remember that the most critical thing is not building an analytic solution but making sure that your organization starts using it: that means creating buy-in, working to build adoption, educating and training, redesigning processes to include analytics. Give it time, be persistent, improve and results will follow!
SAP
(2013)
The SAP Real-Time Data Platform, with SAP HANA at its core combines Hadoop with SAP Sybase IQ and other SAP technologies to provide a single platform for OLTP and analytics, with common administration, operational management, and lifecycle management support for structured, unstructured, and semistructured data.
Randy Bartlett
(18.05.2015)
The ‘information rush’ is producing a sense of urgency; a great deal of opportunity; and spectacular breakthroughs coming from everywhere. Meanwhile, the combination of low statistics literacy and overzealous promotional hype is facilitating dysfunctional data analysis, which is more detrimental than UFO sightings.
Tamara Dull
(September 24, 2014)
Today, we live in an always-on digital world. We work online. We socialize online. We shop online. We bank online. We support causes online. Not to mention, we drive on toll roads with our EZPasses, go to Disney World with our MagicBands, and check our personal stats with our Fitbits. We are living in a big data world.
Enric Junqué de Fortuny, David Martens, Foster Provost
(2014)
This study provides a clear illustration that larger data indeed can be more valuable assets for predictive analytics. This implies that institutions with larger data assets – plus the skill to take advantage of them – potentially can obtain substantial competitive advantage over institutions without such access or skill.
Nikhil Buduma
(29 December 2014)
[In Neural Networks] It is not required that a neuron has its outlet connected to the inputs of every neuron in the next layer. In fact, selecting which neurons to connect to which other neurons in the next layer is an art that comes from experience. Allowing maximal connectivity will more often than not result in overfitting.
Christophe Bourguignat
(Sep 16, 2014)
In real organizations, people need dead simple story-telling – Which features are you using? How do your algorithms work? What is your strategy? etc. … If your models are not parsimonious enough, you risk losing the audience’s confidence. Convincing stakeholders is a key driver for success, and people trust what they understand.
David Smith
(August 18, 2014)
While there are projects underway to help automate the data cleaning process and reduce the time it takes, the task of automation is made difficult by the fact that the process is as much art as science, and no two data preparation tasks are the same. That’s why flexible, high-level languages like R are a key part of the process.
Mark van Rijmenam
(October 16, 2014)
Although such Business Intelligence is still quite common and does give you at least some insights, the fast-changing world of today requires a different approach. Organisations today should strive for a holistic overview of their internal and external data that is analysed on the spot and returned graphically via live storylines.
Michael Yamnitsky
(Feb. 5, 2015)
Software Goes Invisible: Software is getting smarter, thanks to predictive analytics, machine learning, and artificial intelligence (AI). Whereas the current generation of software is about enabling smarter decision-making for humans, we’re starting to see “invisible software” capable of performing tasks without human intervention.
John Von Neumann The sciences do not try to explain, they hardly even try to interpret, they mainly make models. By a model is meant a mathematical construct which, with the addition of certain verbal interpretations, describes observed phenomena. The justification of such a mathematical construct is solely and precisely that it is expected to work.
Chiyuan Zhang, Samy Bengio, Moritz Hardt, Benjamin Recht, Oriol Vinyals
(10 Nov 2016)
Indeed, in neural networks, we almost always choose our model as the output of running stochastic gradient descent. Appealing to linear models, we analyze how SGD acts as an implicit regularizer. For linear models, SGD always converges to a solution with small norm. Hence, the algorithm itself is implicitly regularizing the solution.
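The claim about linear models can be checked on a toy problem. The sketch below is an illustration, not the authors’ experiment: it assumes a squared loss, a zero initial weight vector, a fixed step size, and a made-up underdetermined system with two equations and three unknowns. Many weight vectors fit the data exactly; SGD started at zero ends up at the one with the smallest Euclidean norm.

```python
import math
import random

# Two equations, three unknowns: infinitely many exact solutions (illustrative data).
X = [[1.0, 0.0, 1.0],
     [0.0, 1.0, 1.0]]
y = [2.0, 3.0]

def dot(a, b):
    return sum(ai * bi for ai, bi in zip(a, b))

def norm(v):
    return math.sqrt(dot(v, v))

random.seed(0)
w = [0.0, 0.0, 0.0]   # assumption: start at zero, so updates stay in the span of the rows of X
lr = 0.1

for _ in range(5000):
    i = random.randrange(len(X))          # pick one training example at random
    residual = dot(w, X[i]) - y[i]        # gradient of 0.5 * residual**2 is residual * X[i]
    w = [wj - lr * residual * xj for wj, xj in zip(w, X[i])]

# Minimum-norm solution X^T (X X^T)^{-1} y, worked out by hand for this small system.
min_norm_solution = [1.0 / 3.0, 4.0 / 3.0, 5.0 / 3.0]
# Another exact solution with a larger norm, for comparison.
other_solution = [2.0, 3.0, 0.0]

print("SGD solution:", w)
print("norm of SGD solution:", norm(w))
print("norm of the other exact solution:", norm(other_solution))
```

Because every update is a multiple of a data row, the iterate never leaves the row space of X, and the only exact solution in that subspace is the minimum-norm one. With a different starting point the limit would differ.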
The Economist
(2014)
The end of data scientists. Data science moves from the specialist to the everyman. Familiarity with data analysis becomes part of the skill set of ordinary business users, not experts with “analyst” in their titles. Organizations that use data to make decisions are more successful, and those that don’t use data begin to fall behind.
Foster Provost & Tom Fawcett
(2014)
On a scale less grand, but probably more common, data-analytics projects reach into all business units. Employees throughout these units must interact with the data-science team. If these employees do not have a fundamental grounding in the principles of data-analytic thinking, they will not really understand what is happening in the business.
Dan Hirpara
(3/30/2015)
What data fusion brings to the table is the idea that end-users, whether they are humans or machines, are brought into the data processing loop as collaborators. By iteratively combining multiple data streams in new and interesting ways, driven by the changing needs of users, data fusion produces a wide variety of ways to aggregate data streams.
Gil Allouche
(January 9, 2015)
Improvements in technology and big data trends have given rise to improvements in machine learning. The sheer volume of data is growing exponentially, and companies are looking for faster speeds and real-time analytics. Cognitive computing combines machine learning and artificial intelligence to go beyond data mining and provide actionable insights.
Mark van Rijmenam
(September 2, 2014)
All these new Big Data applications require a new way of working. As a result, General Motors is currently undergoing a massive cultural change to become data-driven; hiring thousands of new employees will have a profound effect on the company culture, but in the end all existing and new employees must learn and adapt to this new, data-driven and information-centric culture.
Hal Varian If you are looking for a career where your services will be in high demand, you should find something where you provide a scarce, complementary service to something that is getting ubiquitous and cheap. So what’s getting ubiquitous and cheap? Data. And what is complementary to data? Analysis. So my recommendation is to take lots of courses about how to manipulate and analyze data.
Joyce Jackson In many applications, particularly in the business domain, the data is not stationary, but rather changing and evolving. This changing data may make previously discovered patterns invalid and as a result, there is clearly a need for incremental methods that are able to update changing models, and for strategies to identify and manage patterns of temporal change in knowledge bases.
Wojciech Bolanowski
(21.01.2015)
Numerous changes and innovations have come to life recently. The pace of the digital revolution is unimaginable, considering it keeps on increasing. There is no doubt that most of the approaching digital changes are potentially disruptive to older habits, businesses, and beliefs. Unconditionally, they are changing the former way of life on the globe. They push the whole of humanity into something very new and completely unknown.
John Geer
(May 6, 2015)
There is predictable data as far as the eye can see. Millions of variables quietly tracing the path we thought, and perhaps hoped, they would. Because there are so many, noticing when one of these variables does something unexpected is a task that is unsolvable by diligence alone. In order to spot these rare unexpected observations, we need an often-overlooked statistical analysis: anomaly detection.
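One simple way to flag the unexpected observations mentioned in the quote above is a robust z-score built from the median and the median absolute deviation (MAD). The sketch below is an assumption-laden illustration, not the method of the quoted article: the readings are made up, find_anomalies is a hypothetical helper, and the 0.6745 scaling with a cutoff of 3.5 is a common rule of thumb rather than a universal setting.

```python
import statistics

def find_anomalies(values, threshold=3.5):
    """Return (index, value) pairs whose robust z-score exceeds the threshold."""
    median = statistics.median(values)
    mad = statistics.median([abs(v - median) for v in values])
    if mad == 0:
        return []  # no spread to measure against; this simple rule cannot decide
    anomalies = []
    for index, value in enumerate(values):
        robust_z = 0.6745 * (value - median) / mad
        if abs(robust_z) > threshold:
            anomalies.append((index, value))
    return anomalies

# Made-up readings: a variable that behaves predictably except for one observation.
readings = [10.1, 9.8, 10.0, 10.3, 9.9, 10.2, 25.0, 10.0, 9.7, 10.1]
anomalies = find_anomalies(readings)
print(anomalies)
```

The median and MAD are used instead of the mean and standard deviation because a single extreme value barely moves them, so the outlier cannot hide itself by inflating the scale it is measured against.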
Jeff Leek
(Feb. 14, 2014)
Since most people performing data analysis are not statisticians there is a lot of room for error in the application of statistical methods. This error is magnified enormously when naive analysts are given too many “researcher degrees of freedom”. If a naive analyst can pick any of a range of methods and does not understand how they work, they will generally pick the one that gives them maximum benefit.
Daniel Kirsch
(2014)
R, an open source programming language for computational statistics, visualization and data, is becoming a ubiquitous tool in advanced analytics offerings. Nearly every top vendor of advanced analytics has integrated R into their offering so that they can now import R models. This allows data scientists, statisticians and other sophisticated enterprise users to leverage R within their analytics package.
TIBCO
(2017)
Data scientists know that it is futile to impose raw math and statistics on people who are not adept at them. The goal is to get an analytics platform into the hands of people who can build the models for use all around the organization. Every analytics platform claims ease of use, but that is not enough. It must be sufficiently powerful to meet the needs of data scientists yet easy enough for LOB staff to use.
Daniel Gutierrez
(December 31, 2014)
What hiring companies consider requirements for being a data scientist. Here is a short list for an honest assessment:
– Are you really good at math – undeterred with calculus, differential equations, and linear algebra? Are you also strong in statistics and probability theory?
– Do you also know R and/or Python for developing machine learning algorithms?
– Do you have deep domain knowledge of a particular industry?
Marissa Mayer
(January 24, 2013)
The Web is so vast … you need to extend categorization and make sense of the content and have a Web ordered for you … One of the key pieces is you have to understand and decide what the Ontology of entities is. Meaning how things are named and how are they organized into hierarchies … By mapping people’s search habits you pull all their content together and have a feed of information that is the web ordered for you.
PWC
(2014)
Some decisions you need to make are big enough to change the course for your business. And your past experiences may not be good predictors of the future. More data are within your reach to understand what was previously unknown. Sophisticated analytical tools are available to you to ‘see’ a wider range of possibilities and evaluate them quickly. Now is a good time for an upgrade in your decision making capabilities.
Judy Selby
(April 20, 2015)
Big Data’s undeniable impact on companies’ goodwill and reputation has permeated the landscape of corporate valuation. Recent research confirms that companies need to face the new normal whereby corporate reputations suffer after mishaps with data under their control. Today’s companies must appreciate that their use, misuse and governance of Big Data can have an impactful effect on their goodwill and resulting valuation.
James Robert Lloyd
(2014)
Making sense of data is one of the great challenges of the information age we live in. While it is becoming easier to collect and store all kinds of data, from personal medical data, to scientific data, to public data, and commercial data, there are relatively few people trained in the statistical and machine learning methods required to test hypotheses, make predictions, and otherwise create interpretable knowledge from this data.
Michael Walker
(November 19, 2014)
Simply looking at big data (e.g., total offensive or defensive yards) will not provide the right information – and only focusing on the single data point of pass completion percentage will not provide the valuable intelligence to help reach the goal of improving pass completion percentage. Only integrating and analyzing a variety of smaller smart data points will provide the actionable knowledge to make the best possible decisions.
Joyce Jackson Monitoring and maintenance are important issues if the data mining result becomes part of the day-to-day business and its environment. A careful preparation of a maintenance strategy helps to avoid unnecessarily long periods of incorrect usage of data mining results. To monitor the deployment of the data mining result(s), the project needs a detailed plan on the monitoring process. This plan takes into account the specific type of deployment.
Gordon S. Linoff
(September 15, 2014)
In any case, I come to the conclusion that Data Science is just another term in a long-line of terms. Whether called statistics or customer analytics or data mining or analytics or data science, the goal is the same. Computers have been and are gathering incredible amounts of data about people, businesses, markets, economies, needs, desires, and solutions – there will always be people who take up the challenge of transforming the data into solutions.
Anmol Rajpurohit
(May 15, 2014)
For a long time, Predictive Analytics has been primarily the responsibility of the Data Science and Analytics team, but this outlook is changing fast. While Data Science team still remains the primary contributor, the responsibility is increasingly being shared with database management, BI, LOB (Line of Business) analysts and others. This clearly demonstrates the need for better training and support for the non-technical users of Predictive Analytics.
Avi Kalderon
(JAN 27, 2015)
Without effective data governance and data management, big data can mean big problems for many organizations already struggling with more data than they can handle. That “lake” they are building can very easily become a “cesspool” without appropriate data management practices that are adapted to this new platform. The solution? Firms need to actively adapt their data governance and data management capabilities – from implementing to ongoing maintenance.
Jeffrey P. Bigham
(2017)
A machine isn’t a human. It’s not going to necessarily incorporate bias even from biased training data in the same way that a human would. Machine learning isn’t necessarily going to adopt – for lack of a better word – a clearly racist bias. It’s likely to have some kind of much more nuanced bias that is far more difficult to predict. It may, say, come up with very specific instances of people it doesn’t want to hire that may not even be related to human bias.
Foster Provost & Tom Fawcett
(2013)
We should expect a ‘Big Data 2.0’ phase to follow ‘Big Data 1.0’. Once firms have become capable of processing massive data in a flexible fashion, they should begin asking: ‘What can I do that I couldn’t do before, or do better than I could do before?’ This is likely to be the golden era of data science. The principles and techniques (introduced currently e.g. due to ‘Predictive Analytics’ and HANA) will be applied far more broadly and deeply than they are today.
Marilyn Matz
(September 2, 2014)
Hadoop is well suited for simple parallel problems but it comes up short for large-scale complex analytics. A growing number of complex analytics use cases are proving to be unworkable in Hadoop. Some examples include recommendation engines based on millions of customers and products, running massive correlations across giant arrays of genetic sequencing data and applying powerful noise reduction algorithms to finding actionable information in sensor and image data.
Gregor Heinrich
(2008)
The intuition behind ‘latent semantic analysis’ (LSA) is to find the latent structure of ‘topics’ or ‘concepts’ in a text corpus, which captures the meaning of the text that is imagined to be obscured by “word choice” noise. The term ‘latent semantic analysis’ has been coined by Deerwester et al. who empirically showed that the co-occurrence structure of terms in text documents can be used to recover this latent topic structure, notably without any usage of background knowledge.
Albert Einstein You believe in a God who plays dice, and I in complete law and order in a world which objectively exists, and which I, in a wildly speculative way, am trying to capture. I firmly believe, but hope that someone will discover a more realistic way, or rather a more tangible basis than it has been my lot to do. Even the great initial success of the quantum theory does not make me believe in the fundamental dice game, although I am well aware that your younger colleagues interpret this as a consequence of senility.
Mkhuseli Mthukwane
(August 27, 2015)
Data Science forms the very substratum of an Analytics Practitioner’s work; it’s what sets us apart from Statisticians or Mathematicians. However, in some instances we cannot rely on it alone; we need to employ other measures to increase its definitiveness. In any event I am sure many Data Scientists use math and other means to augment the potency of their Analytics, some not even scientific at all. It is undeniably prudent to do so where necessary, especially in fields that demand a higher standard of accuracy and care.
Mirko Krivanek
(October 1, 2014)
‘The end of the Data Scientist Bubble’. This was the subject of a provocative article posted on Oracle’s blog, two days ago. It certainly shows how far from the reality some big companies are. They confuse people who call themselves data scientists (or get assigned that job title), with those who are true data scientists, and might use a different job title. Many times, the issue is internal politics that create the confusion, and not recognizing a real data scientist with success stories to share, or not leveraging them.
Paul Barsch
(October 13, 2014)
Forecasting is hard, and even those who sometimes get it right, often fail on a continuous basis. But fear not, there are three steps you can take to drastically improve your forecast accuracy, but you’ll have to be willing to put in the work, and possibly put your ego aside to get there.
1) First, understand that domain knowledge of a particular area doesn’t necessarily mean you’ll see the future better than anyone else.
2) Second, if you want better forecasts, run your expert opinions by others.
3) Third, bring your data – in fact, bring all of them.
Mark van Rijmenam
(October 16, 2014)
In the fast moving world of today, data is being created at lightning speed. Data comes from an infinite variety of sources and all this data can be used to discover valuable business insights. Combining internal and external data can enable organisations to beat the competition, as the analysis will provide valuable insights. The more business users that work with such insights, the better your organisation will become. Organisations should therefore strive for a data-driven, information-centric culture, where every business user makes decisions based on data.
Durgesh Kaushik
(October 9, 2015)
Analytics, no matter how advanced, do not remove the need for human insights. On the contrary, there is a compelling need for skilled people with the ability to understand data, think from the business point of view and come up with insights. For this very reason technology professionals with Analytics skills are finding themselves in high demand as businesses look to harness the power of Big Data. A professional with Analytical skills can master the ocean of Big Data and become a vital asset to an organization, boosting the business and their career.
Foster Provost & Tom Fawcett
(2013)
It is important to understand data science even if you never intend to do it yourself, because data analysis is now so critical to business strategy. Businesses increasingly are driven by data analytics, so there is great professional advantage in being able to interact competently with and within such businesses. Understanding the fundamental concepts, and having frameworks for organizing data-analytic thinking not only will allow one to interact competently, but will help to envision opportunities for improving data-driven decision-making, or to see data-oriented competitive threats.
Robert Morison Robert Morison, lead faculty member for the International Institute for Analytics, provided three reasons businesses experience big data failures. Briefly, they are as follows:
1. As cited in the piece, clinging to a traditional IT project management style. Solution: Think R&D.
2. Businesses are taken in by the hype and make their first big data project a big deal. Solution: Businesses should start with a smaller project that will “move the proverbial needle.”
3. Reasonably good analytics are done, but they are not adopted. Solution: The business has to own the problem or the ambition to improve.
Shahbaz Ali
(DEC 24, 2014)
When data is locked in silos, organizations are unable to find and include all enterprise data for use with big data analytics tools. Planning to implement a data-centric data management strategy enables the distributed metadata repository to be a source for analytics tools, as it can be used to provide real-time insight, without having to migrate data from silos to a separate analytics platform. It also enhances the quality of results, because having more relevant data often produces more accurate analysis. If organizations can harness all of their data, they will attain a greater competitive advantage.
Strategy& Big data have the potential to improve or transform existing business operations and reshape entire economic sectors. Big data can pave the way for disruptive, entrepreneurial companies and allow new industries to emerge. The technological aspect is important, but insufficient to allow big data to show their full potential and to stop companies from feeling swamped by this information. What matters is to reshape internal decision-making culture so that executives base their judgments on data rather than hunches. Research already indicates that companies that have managed this are more likely to be productive and profitable than the competition.
Foster Provost & Tom Fawcett
(2013)
Success in today’s data-oriented business environment requires being able to think about how these fundamental concepts (Data Mining, Predictive Analytics) apply to particular business problems – to think data-analytically. Data should be thought of as a business asset, and once we are thinking in this direction we start to ask whether (and how much) we should invest in data. Thus, an understanding of these fundamental concepts is important not only for data scientists themselves, but for anyone working with data scientists, employing data scientists, investing in data-heavy ventures, or directing the application of analytics in an organization.
Philipp Max Hartmann, Mohamed Zaki, Niels Feldmann, Andy Neely In the field of ‘big data’, Gartner identified five different types of data source used to ‘exploit big data’ in a company (Buytendijk et al., 2013): ‘Operational data comes from transaction systems, the monitoring of streaming data and sensor data; Dark data is data that you already own but don’t use: emails, contracts, written reports and so forth; Commercial data may be structured or unstructured, and is purchased from industry organisations, social media providers and so on; Social data comes from Twitter, Facebook and other interfaces; Public data can have numerous formats and topics, such as economic data, socio-demographic data and even weather data.’
Tracey Wallace
(September 8, 2014)
Our Collective Data Science Duty: Here’s the thing, technology is empowering the public in never before seen ways, and data is the backbone of that shift. Between wearable tech and digital identity platforms, people are creating more data every day than has ever been created in decades, no, centuries past. Each of us is essentially our own personal data scientist, and those working in the digital space have very much been their own statisticians for quite some time. It’s why platforms like Google Analytics, Omniture and more are so popular across the industry. They put the power of analytics in the hands of users, requiring little training but returning lots of measurability.
Tom Phelan
(February 10, 2015)
An agile environment is one that’s adaptive and promotes evolutionary development and continuous improvement. It fosters flexibility and champions fast failures. Perhaps most importantly, it helps software development teams build and deliver optimal solutions as rapidly as possible. That’s because in today’s competitive market chock-full of tech-savvy customers used to new apps and app updates every day and copious amounts of data with which to work, IT teams can no longer respond to IT requests with months-long development cycles. It doesn’t matter if the request is from a product manager looking to map the next rev’s upgrade or a data scientist asking for a new analytics model.
Or Shani
(January 27, 2015)
What was once just a figment of the imagination of some of our most famous science fiction writers, artificial intelligence (AI) is taking root in our everyday lives. We’re still a few years away from having robots at our beck and call, but AI has already had a profound impact in more subtle ways. Weather forecasts, email spam filtering, Google’s search predictions, and voice recognition, such as Apple’s Siri, are all examples. What these technologies have in common are machine-learning algorithms that enable them to react and respond in real time. There will be growing pains as AI technology evolves, but the positive effect it will have on society in terms of efficiency is immeasurable.
Jeff Leek
(17.03.2015)
Data science done well looks easy – and that is a big problem for data scientists. The really tricky twist is that bad data science looks easy too. You can scrape a data set off the web and slap a machine learning algorithm on it no problem. So how do you judge whether a data science project is really ‘hard’ and whether the data scientist is an expert? Just like with anything, there is no easy shortcut to evaluating data science projects. You have to ask questions about the details of how the data were collected, what kind of biases might exist, why they picked one data set over another, etc. In the meantime, don’t be fooled by what looks like simple data science – it can often be pretty effective.
Guerrilla Analytics
(July 21, 2015)
Data Scientists and automation (data products, algorithms, production code, whatever) are complementary functions. Good Data Science supports automation. It quickly adds value by investigating, testing, and quantifying hypotheses about existing data and potential new data. Simply switching on software ignores the reality of working with data, regardless of the claims of that software. Data is full of nuances, errors and unknown relationships that are best discovered and tested by an expert Data Scientist. This takes time and does not scale but it does not have to scale. It is the necessary prudent investment that you make before spending months in product development and automation of the wrong algorithm on the wrong or broken data.
Mike Barlow
(2017)
Top takeaways from my interviews with experts from organizations offering AI products and services:
• AI is too big for any single device or system
• AI is a distributed phenomenon
• AI will deliver value to users through devices, but the heavy lifting will be performed in the cloud
• AI is a two-way street, with information passed back and forth between local devices and remote systems
• AI apps and interfaces will be designed and engineered increasingly for nontechnical users
• Companies will incorporate AI capabilities into new products and services routinely
• A new generation of AI-enriched products and services will be connected and supported through the cloud
• AI in the cloud will become a standard combination, like peanut butter and jelly
Foster Provost & Tom Fawcett
(2013)
On a scale less grand, but probably more common, data analytics projects reach into all business units. Employees throughout these units must interact with the data science team. If these employees do not have a fundamental grounding in the principles of data-analytic thinking, they will not really understand what is happening in the business. This lack of understanding is much more damaging in data science projects than in other technical projects, because the data science is supporting improved decision-making. This requires a close interaction between the data scientists and the business people responsible for decision-making. Firms where the business people do not understand what the data scientists are doing are at a substantial disadvantage, because they waste time and effort or, worse, because they ultimately make wrong decisions.
Alice Zheng
(2015)
If we think of training the model as a part of it, then even after you’ve trained a model and evaluated it and found it to be good by some evaluation metric standards, when you deploy it, where it actually goes and faces users, then there’s a different set of metrics that would impact the users. You might measure: how long do users actually interact with this model? Does it actually make a difference in the length of time? Did they used to interact less and now they’re more engaged, or vice versa? That’s different from whatever evaluation metric that you used, like AUC or per class accuracy or precision and recall. … It’s probably not enough to just say this model has a .85 F1 score and expect someone who has not done any data science to understand what that means. How good are the results? What does it actually mean to the end users of the product?
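To make the offline metrics named above (precision, recall, F1) concrete, here is a minimal sketch that computes them from binary labels and then restates them in plain language for a non-specialist. The labels and the function name are invented for illustration and are not from the interview.

```python
def precision_recall_f1(y_true, y_pred):
    """Precision, recall and F1 for binary labels (1 = positive, 0 = negative)."""
    tp = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 1)
    fp = sum(1 for t, p in zip(y_true, y_pred) if t == 0 and p == 1)
    fn = sum(1 for t, p in zip(y_true, y_pred) if t == 1 and p == 0)
    precision = tp / (tp + fp) if tp + fp else 0.0
    recall = tp / (tp + fn) if tp + fn else 0.0
    if precision + recall == 0:
        return precision, recall, 0.0
    f1 = 2 * precision * recall / (precision + recall)
    return precision, recall, f1


# Hypothetical ground truth and model predictions for ten items.
y_true = [1, 1, 1, 1, 0, 0, 0, 0, 0, 0]
y_pred = [1, 1, 1, 0, 1, 0, 0, 0, 0, 0]

precision, recall, f1 = precision_recall_f1(y_true, y_pred)
print(f"F1 = {f1:.2f}")
# The same result phrased for someone who has never met an F1 score:
print(f"Of the items the model flagged, {precision:.0%} were right; "
      f"it found {recall:.0%} of the items it should have flagged.")
```

The second print statement is the point of the quote: the single F1 number hides the two questions an end user actually cares about, and neither says anything about engagement once the model is deployed.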
Philip Russom
(2013)
Managing big data for analytics is not the same as managing DW data for reporting. In fact, the two are almost opposites … . For example, reporting is about seeing the latest values of the numbers that you track over time via a report. Obviously, you know the report, the business entities it represents, and the data warehouse that feeds the report. An analysis is more about discovering variables you don’t know, based on data that you probably don’t know very well. Also, a report requires a solid audit trail, so its data must be managed with well-documented metadata and possibly master data, too. Since most analyses have no expectation of an audit trail, there’s no need to manage one. That’s just a sampling of the differences. The point is to embrace Big Data Management for analytics as a unique practice that doesn’t follow all the strict rules we’re taught for reporting and data warehousing.
Vincent Granville
(November 15, 2014)
A different perspective on what data scientists are capable of:
• Imagine dozens of scenarios and rank them by chance of occurring
• Get siloed data from various departments (finance, sales, marketing, product, IT)
• Analyze the data in connection with the scenarios (including checking data validity)
• Get external data (competitive intelligence) as needed
• Find the causes (not just correlations)
• Find the remedies
• Detect issues well before anyone else can see them, by looking in summary data
• Complete the analysis with a 48-hour turnaround
Such a data scientist, who can save a company billions, is usually not hired, for the following reasons:
• Companies are looking for coders, not business solvers, when they hire a data guru, despite claiming the contrary
• A data scientist without Python on his resume is unlikely to ever get hired
• Hard work gets rewarded, smart work does not.
Randy Bartlett
(18.05.2015)
Today’s information rush is exemplified by the great promise of overflowing observational data, hyper communications, and the approaching Internet of Things. The promotional hype initially comes from journals, self-glorifying books, and vendors, all with a certain perspective that is not informed by practice experience; publishers are unable to discern qualifications. This creates misinformation stampedes with energized statistics deniers writing amplifying blogs, presentation decks, et al., which further mischaracterize and even adulterate statistics. The downstream echoes talk everyone into believing their own hyped fabrications. Two of the problems are that (1) selling good statistics practice can be less lucrative than cutting some serious corners, and (2) promoting services, workshops, data-analysis results, etc. is easier when not encumbered by competently wielding and accurately depicting statistics.
Yanir Seroussi
(19.10.2015)
People like simple explanations for complex phenomena. If you work as a data scientist, or if you are planning to become/hire one, you’ve probably seen storytelling listed as one of the key skills that data scientists should have. Unlike “real” scientists that work in academia and have to explain their results mostly to peers who can handle technical complexities, data scientists in industry have to deal with non-technical stakeholders who want to understand how the models work. However, these stakeholders rarely have the time or patience to understand how things truly work. What they want is a simple hand-wavy explanation to make them feel as if they understand the matter – they want a story, not a technical report (an aside: don’t feel too smug, there is a lot of knowledge out there and in matters that fall outside of our main interests we are all non-technical stakeholders who get fed simple stories).
Ray Major
(October 8, 2014)
Descriptive Analytics: insight into the past

Which use data aggregation and data mining techniques to provide insight into the past and answer: “What has happened?”
Use Descriptive statistics when you need to understand at an aggregate level what is going on in your company, and when you want to summarize and describe different aspects of your business.
Predictive Analytics: understanding the future

Which use statistical models and forecasting techniques to understand the future and answer: “What could happen?”
Use Predictive analysis any time you need to know something about the future, or fill in the information that you do not have.
Prescriptive Analytics: advise on possible outcomes

Which use optimization and simulation algorithms to advise on possible outcomes and answer: “What should we do?”
Use prescriptive statistics anytime you need to provide users with advice on what action to take.
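The three levels described above can be sketched on one tiny, made-up sales series. Everything here (the numbers, the straight-line forecast, the cost weights and the function names) is an assumption chosen for illustration; the source does not prescribe any particular technique.

```python
from statistics import mean

# Hypothetical unit sales for four consecutive months.
sales = [100, 110, 120, 130]

# Descriptive: "What has happened?" Summarize the past.
average_sales = mean(sales)


# Predictive: "What could happen?" Fit a least-squares line and extrapolate.
def fit_line(ys):
    xs = range(len(ys))
    x_bar, y_bar = mean(xs), mean(ys)
    slope = (sum((x - x_bar) * (y - y_bar) for x, y in zip(xs, ys))
             / sum((x - x_bar) ** 2 for x in xs))
    return slope, y_bar - slope * x_bar


slope, intercept = fit_line(sales)
forecast = intercept + slope * len(sales)  # next month's expected sales


# Prescriptive: "What should we do?" Choose the action with the lowest cost.
def best_stock_level(options, expected_demand, over_cost=1.0, under_cost=3.0):
    """Pick the stock level that minimizes a simple over/under-stocking cost."""
    def cost(stock):
        return (over_cost * max(0, stock - expected_demand)
                + under_cost * max(0, expected_demand - stock))
    return min(options, key=cost)


recommended = best_stock_level([120, 140, 160], forecast)
print(average_sales, forecast, recommended)
```

Each step consumes the output of the previous one: the summary describes the history, the fitted line extends it, and the cost function turns the forecast into a recommended action.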
Strategy& There is no general rule dictating how organizations should navigate the stages of big data maturity. They must each decide for themselves, based on their own situation – the competitive environment they are operating in, their business model, and their existing internal capabilities. In less-advanced sectors, with executives still grappling with existing data, making intelligent use of what they already possess may have a substantial impact on decision making.
The main priorities for executives are to:
• develop a clear (big) data strategy;
• prove the value of data in pilot schemes;
• identify the owner for “big data” in the organization and formally establish a “Chief Data Scientist” position (where applicable);
• recruit/train talent to ask the right questions and technical personnel to provide the systems and tools to allow data scientists to answer those questions;
• position big data as an integral element of the operating model; and
• establish a data-driven decision culture and launch a communication campaign around it.
Mark van Rijmenam
(31 Dec. 2014)
Pattern Analytics can be defined as a discipline of Big Data that enables business leaders to understand how different variables of the business interact and are linked with each other. Variables can be of any kind and within any data source, structured as well as unstructured. Such patterns can indicate opportunities for innovation or threats of disruption for your business and therefore require action. Finding patterns within the data and sifting it out is difficult. Machine learning can contribute in helping us humans find patterns that are relevant, but too difficult for us to see. This enables organizations to find patterns they act on. Business leaders can learn from these patterns and use them in their decision-making process. Business leaders therefore should rely less on their gut feeling and years of experience, and more on the data. Pattern Analytics does not require predefined models; the algorithms will do the work for you and find whatever is relevant in a combination of large sets of data. The key with pattern analytics is automatically revealing intelligence that is hidden in the data and these insights will help you grow your business.
Alice Zheng
(2015)
There’s structure in it, but it’s kind of a different form. … It’s spit out by machines and programs. There’s structure, but that structure is difficult to understand for humans. … So, you can’t just throw all of it into an algorithm and expect the algorithm to be able to make sense of it. You really have to process the features, do a lot of pre-processing, and first do things like extract out the frequent sequences, maybe, or figure out what’s the right way to represent IP addresses, for instance. Maybe you don’t want to represent latency by the actual latency number, which could have a very skewed distribution, with lots and lots of large numbers. You might want to assign them into bins or something. There are a lot of things that you need to do to get the data into a format that’s friendly to the model, and then you want to choose the right model. Maybe after you choose the model, you realize this model really is suitable for numeric data and not categorical data. Then you need to go back to the feature engineering part and figure out the best way to represent the data. … I hesitate to say anything critical because half of my friends are in machine learning, which is all about algorithms. I think we already have enough algorithms. It’s not that we don’t need more and better algorithms. I think a much, much bigger challenge is data itself, features, and feature engineering.
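The binning idea mentioned above can be sketched as follows: instead of feeding raw latency values with a long tail into a model, each value is replaced by the index of a coarse, roughly logarithmic bucket. The bucket edges and the sample values are assumptions made for this example only.

```python
from bisect import bisect_right

# Hypothetical bucket edges in milliseconds: <10, 10-99, 100-999, 1000-9999, >=10000.
LATENCY_EDGES_MS = [10, 100, 1_000, 10_000]


def latency_bin(latency_ms, edges=LATENCY_EDGES_MS):
    """Return the index of the bucket that a latency value falls into."""
    if latency_ms < 0:
        raise ValueError("latency must be non-negative")
    return bisect_right(edges, latency_ms)


# A skewed sample: most requests are fast, a few are extremely slow.
latencies = [3, 45, 120, 2_500, 98_000]
binned = [latency_bin(x) for x in latencies]
print(binned)
```

After binning, the slowest request differs from the fastest by four bucket steps rather than by a factor of tens of thousands, which is often friendlier to models that are sensitive to feature scale. Whether fixed edges, quantile bins or a log transform works best depends on the data and the model.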
Michael Jordan
(1998)
Graphical models are a marriage between probability theory and graph theory. They provide a natural tool for dealing with two problems that occur throughout applied mathematics and engineering, uncertainty and complexity, and in particular they are playing an increasingly important role in the design and analysis of machine learning algorithms. Fundamental to the idea of a graphical model is the notion of modularity: a complex system is built by combining simpler parts. Probability theory provides the glue whereby the parts are combined, ensuring that the system as a whole is consistent, and providing ways to interface models to data. The graph theoretic side of graphical models provides both an intuitively appealing interface by which humans can model highly-interacting sets of variables as well as a data structure that lends itself naturally to the design of efficient general-purpose algorithms. Many of the classical multivariate probabilistic systems studied in fields such as statistics, systems engineering, information theory, pattern recognition and statistical mechanics are special cases of the general graphical model formalism; examples include mixture models, factor analysis, hidden Markov models, Kalman filters and Ising models. The graphical model framework provides a way to view all of these systems as instances of a common underlying formalism. This view has many advantages; in particular, specialized techniques that have been developed in one field can be transferred between research communities and exploited more widely. Moreover, the graphical model formalism provides a natural framework for the design of new systems.
Istvan Hajnal
(February 23, 2015)
There are a few trends in the Big Data and Data Science world that can be of interest to market researchers:
• Visualization. There is a lot of interest in the Big Data and Data Science world for everything that has to do with Visualization. I’ll admit that sometimes it is Visualize to Impress rather than to Inform, but when it comes to informing clearly, communicating in a simple and understandable way, storytelling, and so on, we market researchers have a head start.
• Natural Language Processing. One of the 4 V’s of Big Data stands for Variety. Very often this refers to unstructured data, which sometimes refers to free text. Big Data and Data Science folks, for instance, start to analyze text that is entered in the free fields of production systems. This problem is not dissimilar to what we do when we analyse open questions. Again market research has an opportunity to play a role here. By the way, it goes beyond sentiment analysis. Techniques that I’ve seen successfully used in the Big Data / Data Science world are topic generation and document classification. Think about analysing customer complaints, for instance.
• Deep Learning. Deep learning risks becoming the next fad, largely because of the name Deep. But deep here does not refer to profound, but rather to the fact that you have multiple hidden layers in a neural network. And a neural network is basically a logistic regression (OK, I simplify a bit here). So absolutely no magic here, but absolutely great results. Deep learning is a machine learning technique that tries to model high-level abstractions by using so-called learning representations of data, where data is transformed to a representation of that data that is easier to use with other Machine Learning techniques. A typical example is a picture that consists of pixels. These pixels can be represented by more abstract elements such as edges, shapes, and so on. These edges and shapes can in turn be further represented by simple objects, and so on. In the end, this example leads to systems that are able to reasonably describe pictures in broad terms, but nonetheless useful for practical purposes, especially when processing by humans is not an option. How can this be applied in Market Research? Already today (shallow) Neural networks are used in Market Research. One research company I know uses neural networks to classify products sold in stores in broad buckets such as petfood, clothing, and so on, based on the free field descriptions that come with the barcode data that the stores deliver.
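The remark that a neural network is "basically a logistic regression" can be illustrated with a small sketch: each unit below is a logistic regression, and a hidden layer is just several such units whose outputs feed another one. The weights are hand-picked here for illustration (nothing is trained), and the XOR task is an assumption chosen because a single logistic unit cannot solve it.

```python
import math


def logistic_unit(inputs, weights, bias):
    """One logistic regression: weighted sum passed through a sigmoid."""
    z = sum(w * x for w, x in zip(weights, inputs)) + bias
    return 1.0 / (1.0 + math.exp(-z))


def layer(inputs, units):
    """A layer is a list of (weights, bias) pairs applied to the same inputs."""
    return [logistic_unit(inputs, w, b) for w, b in units]


# Hand-picked weights (not learned): the hidden units act like OR and NAND,
# and the output unit acts like AND on the hidden outputs, which yields XOR.
HIDDEN = [([20.0, 20.0], -10.0), ([-20.0, -20.0], 30.0)]
OUTPUT = [([20.0, 20.0], -30.0)]


def network(x1, x2):
    hidden_out = layer([x1, x2], HIDDEN)
    return layer(hidden_out, OUTPUT)[0]


for a in (0, 1):
    for b in (0, 1):
        print(a, b, round(network(a, b)))
```

The hidden layer re-represents the two inputs as "at least one is on" and "not both are on", and the final logistic unit only has to combine those two features. Deeper networks repeat this re-representation step several times and learn the weights from data.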
Alistair Croll, Benjamin Yoskovitz
(2013)
What makes a good metric?

Here are some rules of thumb for what makes a good metric: a number that will drive the changes you’re looking for.

A good metric is comparative.

Being able to compare a metric to other time periods, groups of users, or competitors helps you understand which way things are moving. “Increased conversion from last week” is more meaningful than “2% conversion”.
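A minimal sketch of a comparative metric, assuming invented visitor and conversion counts: the conversion rate is computed for two weeks and then reported as a change relative to the earlier week.

```python
def conversion_rate(conversions, visitors):
    """Share of visitors who converted, as a fraction between 0 and 1."""
    if visitors <= 0:
        raise ValueError("visitors must be positive")
    return conversions / visitors


def relative_change(current, previous):
    """Change of the current value relative to the previous one."""
    if previous == 0:
        raise ValueError("previous value must be non-zero")
    return (current - previous) / previous


# Hypothetical figures for two consecutive weeks.
last_week = conversion_rate(conversions=40, visitors=2_000)
this_week = conversion_rate(conversions=54, visitors=2_250)
change = relative_change(this_week, last_week)

print(f"This week's conversion: {this_week:.1%}")
print(f"Change versus last week: {change:+.0%}")
```

The first line on its own is the bare "2% conversion" style of number; the second line adds the comparison that tells you which way things are moving.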

A good metric is understandable.

If people can’t remember it and discuss it, it’s much harder to turn a change in the data into a change in the culture.

A good metric is a ratio or a rate.

Accountants and financial analysts have several ratios they look at to understand, at a glance, the fundamental health of a company. You need some, too.

There are several reasons ratios tend to be the best metrics:

1 Ratios are easier to act on. Think about driving a car. Distance travelled is informational. But speed (distance per hour) is something you can act on, because it tells you about your current state, and whether you need to go faster or slower to get to your destination on time.

2 Ratios are inherently comparative. If you compare a daily metric to the same metric over a month, you’ll see whether you’re looking at a sudden spike or a long-term trend. In a car, speed is one metric, but speed right now over average speed this hour shows you a lot about whether you’re accelerating or slowing down.

3 Ratios are also good for comparing factors that are somehow opposed, or for which there’s an inherent tension. In a car, this might be distance covered divided by traffic tickets. The faster you drive, the more distance you cover, but the more tickets you get. This ratio might suggest whether or not you should be breaking the speed limit.

A good metric changes the way you behave.

This is by far the most important criterion for a metric: what will you do differently based on changes in the metric?

1 “Accounting” metrics like daily sales revenue, when entered into your spreadsheet, need to make your predictions more accurate. These metrics form the basis of Lean Startup’s innovation accounting, showing you how close you are to an ideal model and whether your actual results are converging on your business plan.

2 “Experimental” metrics, like the results of a test, help you to optimize the product, pricing, or market. Changes in these metrics will significantly change your behavior. Agree on what that change will be before you collect the data: if the pink website generates more revenue than the alternative, you’re going pink; if more than half your respondents say they won’t pay for a feature, don’t build it; if your curated MVP doesn’t increase order size by 30%, try something else. Drawing a line in the sand is a great way to enforce a disciplined approach.

A good metric changes the way you behave precisely because it’s aligned to your goals of keeping users, encouraging word of mouth, acquiring customers efficiently, or generating revenue.

If you want to choose the right metrics, you need to keep five things in mind:

1 Qualitative versus quantitative metrics

Qualitative metrics are unstructured, anecdotal, revealing, and hard to aggregate; quantitative metrics involve numbers and statistics, and provide hard numbers but less insight.

2 Vanity versus actionable metrics

Vanity metrics might make you feel good, but they don’t change how you act. Actionable metrics change your behavior by helping you pick a course of action.

3 Exploratory versus reporting metrics

Exploratory metrics are speculative and try to find unknown insights to give you the upper hand, while reporting metrics keep you abreast of normal, managerial, day-to-day operations.

4 Leading versus lagging metrics

Leading metrics give you a predictive understanding of the future; lagging metrics explain the past. Leading metrics are better because you still have time to act on them: the horse hasn’t left the barn yet.

5 Correlated versus causal metrics

If two metrics change together, they’re correlated, but if one metric causes another metric to change, they’re causal. If you find a causal relationship between something you want (like revenue) and something you can control (like which ad you show), then you can change the future.

Analysts look at specific metrics that drive the business, called key performance indicators (KPIs). Every industry has KPIs: if you’re a restaurant owner, it’s the number of covers (tables) in a night; if you’re an investor, it’s the return on an investment; if you’re a media website, it’s ad clicks; and so on.
Vincent Granville
(2014)
Data Science

First, let’s start by describing data science, the new discipline.

Job titles include data scientist, chief scientist, senior analyst, director of analytics and many more. It covers all industries and fields, but especially digital analytics, search technology, marketing, fraud detection, astronomy, energy, healthcare, social networks, finance, forensics, security (NSA), mobile, telecommunications, and weather forecasts.

Projects include taxonomy creation (text mining, big data), clustering applied to big data sets, recommendation engines, simulations, rule systems for statistical scoring engines, root cause analysis, automated bidding, forensics, exo-planets detection, and early detection of terrorist activity or pandemics. An important component of data science is automation, machine-to-machine communications, as well as algorithms running non-stop in production mode (sometimes in real time), for instance to detect fraud, predict weather or predict home prices for each home (Zillow).

An example of a data science project is the creation of the fastest growing data science Twitter profile, for computational marketing. It leverages big data, and is part of a viral marketing / growth hacking strategy that also includes automated high quality, relevant, syndicated content generation (in short, digital publishing version 3.0).

Unlike most other analytic professions, data scientists are assumed to have great business acumen and domain expertise, one of the reasons why they tend to succeed as entrepreneurs. There are many types of data scientists, as data science is a broad discipline. Many senior data scientists master their art/craftsmanship and possess the whole spectrum of skills and knowledge; they really are the unicorns that recruiters can’t find. Hiring managers and uninformed executives favor narrow technical skills over combined deep, broad and specialized business domain expertise, a byproduct of the current education system that favors discipline silos, while true data science is a silo destructor. Unicorn data scientists (a misnomer, because they are not rare; some are famous VCs) usually work as consultants, or as executives. Junior data scientists tend to be more specialized in one aspect of data science, possess more hot technical skills (Hadoop, Pig, Cassandra) and will have no problem finding a job if they received appropriate training and/or have work experience with companies such as Facebook, Google, eBay, Apple, Intel, Twitter, Amazon, Zillow etc. Data science projects for potential candidates can be found here.



Data science overlaps with

Computer science: computational complexity, Internet topology and graph theory, distributed architectures such as Hadoop, data plumbing (optimization of data flows and in-memory analytics), data compression, computer programming (Python, Perl, R) and processing sensor and streaming data (to design cars that drive automatically)

Statistics: design of experiments including multivariate testing, cross-validation, stochastic processes, sampling, model-free confidence intervals, but not p-values or obscure tests of hypotheses that are subject to the curse of big data

Machine learning and data mining: data science indeed fully encompasses these two domains.

Operations research: data science encompasses most of operations research as well as any techniques aimed at optimizing decisions based on analysing data.

Business intelligence: every BI aspect of designing/creating/identifying great metrics and KPIs, creating database schemas (be it NoSQL or not), dashboard design and visuals, and data-driven strategies to optimize decisions and ROI, is data science.



Comparison with other analytic disciplines

Machine learning: Very popular computer science discipline, data-intensive, part of data science and closely related to data mining. Machine learning is about designing algorithms (like data mining), but emphasis is on prototyping algorithms for production mode, and designing automated systems (bidding algorithms, ad targeting algorithms) that automatically update themselves, constantly train/retrain/update training sets/cross-validate, and refine or discover new rules (fraud detection) on a daily basis. Python is now a popular language for ML development. Core algorithms include clustering and supervised classification.

Data mining: This discipline is about designing algorithms to extract insights from rather large and potentially unstructured data (text mining), sometimes called nugget discovery, for instance unearthing a massive botnet after looking at 50 million rows of data. Techniques include pattern recognition, feature selection, clustering and supervised classification, and encompass a few statistical techniques (though without the p-values or confidence intervals attached to most statistical methods being used). Instead, emphasis is on robust, data-driven, scalable techniques, without much interest in discovering causes or interpretability. Data mining thus has some intersection with statistics, and it is a subset of data science. Data mining is applied computer engineering, rather than a mathematical science. Data miners use open source software such as RapidMiner.

Predictive modeling: Not a discipline per se. Predictive modeling projects occur in all industries across all disciplines. Predictive modeling applications aim at predicting the future based on past data, usually but not always based on statistical modeling. Predictions often come with confidence intervals. The roots of predictive modeling are in statistical science.

Statistics. Currently, statistics is mostly about surveys (typically performed with SPSS software), theoretical academic research, bank and insurance analytics (marketing mix optimization, cross-selling, fraud detection, usually with SAS and R), statistical programming, social sciences, global warming research (and space weather modeling), economic research, clinical trials (pharmaceutical industry), medical statistics, epidemiology, biostatistics, and government statistics. Agencies hiring statisticians include the Census Bureau, IRS, CDC, BLS, SEC, and EPA (environmental/spatial statistics). Jobs requiring a security clearance are well paid and relatively secure, but the well paid jobs in the pharmaceutical industry (the golden goose for statisticians) are threatened by a number of factors: outsourcing, company mergers, and pressures to make healthcare affordable. Because of the big influence of the conservative, risk-averse pharmaceutical industry, statistics has become a narrow field not adapting to new data, and not innovating, losing ground to data science, industrial statistics, operations research, data mining and machine learning, where the same clustering, cross-validation and statistical training techniques are used, albeit in a more automated way and on bigger data. Many professionals who were called statisticians 10 years ago have seen their job title changed to data scientist or analyst in the last few years.

Industrial statistics. Statistics frequently performed by non-statisticians (engineers with good statistical training), working on engineering projects such as yield optimization or load balancing (system analysts). They use very applied statistics, and their framework is closer to six sigma, quality control and operations research, than to traditional statistics. Also found in oil and manufacturing industries. Techniques used include time series, ANOVA, experimental design, survival analysis, signal processing (filtering, noise removal, deconvolution), spatial models, risk and reliability models.

Mathematical optimization. Solves business optimization problems with techniques such as the simplex algorithm, Fourier transforms (signal processing), differential equations, and software such as Matlab. These practitioners are found in big companies such as IBM, research labs, the NSA and the finance industry (which sometimes recruits physics or engineering graduates). These professionals sometimes solve the exact same problems as statisticians do, using the exact same techniques, though they use different names. Mathematicians use least squares optimization for interpolation or extrapolation; statisticians use linear regression for predictions and model fitting, but both concepts are identical, and rely on the exact same mathematical machinery: it’s just two names describing the same thing. Mathematical optimization is however closer to operations research than to statistics. The choice of hiring a mathematician rather than another practitioner (data scientist) is often dictated by historical reasons, especially for organizations such as NSA or IBM.
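The claim that least squares optimization and linear regression are the same machinery can be checked on a small example. The sketch below fits a straight line two ways: by solving the normal equations that minimize the sum of squared errors (the optimization view), and by the textbook regression formulas, slope = cov(x, y) / var(x) (the statistics view). The function names and the data points are made up for illustration.

```python
from statistics import fmean


def least_squares_line(xs, ys):
    """Optimization view: minimize sum((a + b*x - y)**2) via the normal equations."""
    n = len(xs)
    sx, sy = sum(xs), sum(ys)
    sxx = sum(x * x for x in xs)
    sxy = sum(x * y for x, y in zip(xs, ys))
    # Normal equations:  n*a + sx*b = sy   and   sx*a + sxx*b = sxy
    det = n * sxx - sx * sx
    a = (sy * sxx - sx * sxy) / det  # intercept (Cramer's rule)
    b = (n * sxy - sx * sy) / det    # slope (Cramer's rule)
    return a, b


def regression_line(xs, ys):
    """Statistics view: slope = cov(x, y) / var(x), intercept from the means."""
    mx, my = fmean(xs), fmean(ys)
    cov = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    var = sum((x - mx) ** 2 for x in xs)
    slope = cov / var
    return my - slope * mx, slope


# Illustrative data only.
xs = [1.0, 2.0, 3.0, 4.0, 5.0]
ys = [2.1, 3.9, 6.2, 7.8, 10.1]

a1, b1 = least_squares_line(xs, ys)
a2, b2 = regression_line(xs, ys)
print(f"least squares: intercept={a1:.4f}, slope={b1:.4f}")
print(f"regression:    intercept={a2:.4f}, slope={b2:.4f}")
```

Both functions return the same intercept and slope up to floating-point rounding, because the regression formulas are simply the closed-form solution of the least squares problem.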

Actuarial sciences. Just a subset of statistics focusing on insurance (car, health, etc.) using survival models: predicting when you will die, what your health expenditures will be based on your health status (smoker, gender, previous diseases) to determine your insurance premiums. Also predicts extreme floods and weather events to determine premiums. These latter models are notoriously erroneous (recently) and have resulted in far bigger payouts than expected. For some reason, this is a very vibrant, secretive community of statisticians who do not call themselves statisticians anymore (job title is actuary). They have seen their average salary increase nicely over time: access to the profession is restricted and regulated just like for lawyers, for no reason other than protectionism to boost salaries and reduce the number of qualified applicants to job openings. Actuarial sciences is indeed data science (a sub-domain).

HPC. High performance computing, not a discipline per se, but should be of concern to data scientists, big data practitioners, computer scientists and mathematicians, as it can redefine the computing paradigms in these fields. If quantum computers ever become successful, they will totally change the way algorithms are designed and implemented. HPC should not be confused with Hadoop and Map-Reduce: HPC is hardware-related, Hadoop is software-related (though heavily relying on Internet bandwidth and server configuration and proximity).

Operations research. Abbreviated as OR. They separated from statistics a while back (like 20 years ago), but they are like twin brothers, and their respective organizations (INFORMS and ASA) partner together. OR is about decision science and optimizing traditional business projects: inventory management, supply chain, pricing. They heavily use Markov chain models, Monte Carlo simulations, queuing and graph theory, and software such as AIMS, Matlab or Informatica. Big, traditional old companies use OR; new and small ones (start-ups) use data science to handle pricing, inventory management or supply chain problems. Many operations research analysts are becoming data scientists, as there is far more innovation and thus growth prospects in data science, compared to OR. Also, OR problems can be solved by data science. OR has a significant overlap with six sigma (see below), also solves econometric problems, and has many practitioners/applications in the army and defense sectors.

Six sigma. It’s more a way of thinking (a business philosophy, if not a cult) rather than a discipline, and was heavily promoted by Motorola and GE a few decades ago. Used for quality control and to optimize engineering processes (see entry on industrial statistics in this article), by large, traditional companies. They have a LinkedIn group with 270,000 members, twice as large as any other analytic LinkedIn group including our data science group. Their motto is simple: focus your efforts on the 20% of your time that yields 80% of the value. Applied, simple statistics are used (simple stuff works most of the time, I agree), and the idea is to eliminate sources of variance in business processes, to make them more predictable and improve quality. Many people consider six sigma to be old stuff that will disappear. Perhaps, but the fundamental concepts are solid and will remain: these are also fundamental concepts for all data scientists. You could say that six sigma is a much simpler, if not simplistic, version of operations research (see above entry), where statistical modeling is kept to a minimum. Risks: when unqualified people use non-robust black-box statistical tools to solve problems, it can result in disasters. In some ways, six sigma is a discipline more suited for business analysts (see business intelligence entry below) than for serious statisticians.

Quant. Quant people are just data scientists working for Wall Street on problems such as high frequency trading or stock market arbitraging. They use C++ and Matlab, come from prestigious universities, and earn big bucks, but lose their jobs right away when ROI goes south too quickly. They can also be employed in energy trading. Many who were fired during the great recession now work on problems such as click arbitraging, ad optimization and keyword bidding. Quants have backgrounds in statistics (few of them), mathematical optimization, and industrial statistics.

Artificial intelligence. It’s coming back. The intersection with data science is pattern recognition (image analysis) and the design of automated (some would say intelligent) systems to perform various tasks, in machine-to-machine communication mode, such as identifying the right keywords (and right bid) on Google AdWords (pay-per-click campaigns involving millions of keywords per day). I also consider smart search (creating a search engine returning the results that you expect and being much broader than Google) one of the greatest problems in data science, arguably also an AI and machine learning problem.

Computer science. Data science has some overlap with computer science: Hadoop and Map-Reduce implementations, algorithmic and computational complexity to design fast, scalable algorithms, data plumbing, and problems such as Internet topology mapping, random number generation, encryption, data compression, and steganography (though these problems overlap with statistical science and mathematical optimization as well).

Econometrics. Why it became separated from statistics is unclear. So many branches disconnected themselves from statistics, as they became less generic and started developing their own ad-hoc tools. But in short, econometrics is heavily statistical in nature, using time series models such as auto-regressive processes. It also overlaps with operations research (itself overlapping with statistics!) and mathematical optimization (simplex algorithm). Econometricians like ROC and efficiency curves (so do six sigma practitioners, see corresponding entry in this article). Many do not have a strong statistical background, and Excel is their main or only tool.

Data engineering. Performed by software engineers (developers) or architects (designers) in large organizations (sometimes by data scientists in tiny companies), this is the applied part of computer science (see entry in this article), to power systems that allow all sorts of data to be easily processed in-memory or near-memory, and to flow nicely to (and between) end-users, including heavy data consumers such as data scientists. A sub-domain currently under attack is data warehousing, as this term is associated with static, siloed conventional databases, data architectures, and data flows, threatened by the rise of NoSQL, NewSQL and graph databases. Transforming these old architectures into new ones (only when needed), or making them compatible with new ones, is a lucrative business.

Business intelligence. Abbreviated as BI. Focuses on dashboard creation, metric selection, producing and scheduling data reports (statistical summaries) sent by email or delivered/presented to executives, competitive intelligence (analyzing third party data), as well as involvement in database schema design (working with data architects) to collect useful, actionable business data efficiently. Typical job title is business analyst, but some are more involved with marketing, product or finance (forecasting sales and revenue). They typically have an MBA degree. Some have learned advanced statistics such as time series, but most only use (and need) basic stats, and light analytics, relying on IT to maintain databases and harvest data. They use tools such as Excel (including cubes and pivot tables, but not advanced analytics), Brio (Oracle browser client), BIRT, MicroStrategy or Business Objects (as end-users to run queries), though some of these tools are increasingly equipped with better analytic capabilities. Unless they learn how to code, they are competing with some polyvalent data scientists that excel in decision science, insights extraction and presentation (visualization), KPI design, business consulting, and ROI/yield/business/process optimization. BI and market research (but not competitive intelligence) are currently experiencing a decline, while AI is experiencing a comeback. This could be cyclical. Part of the decline is due to not adapting to new types of data (e.g. unstructured text) that require engineering or data science techniques to process and extract value.

Data analysis. This is the new term for business statistics since at least 1995, and it covers a large spectrum of applications including fraud detection, advertising mix modeling, attribution modeling, sales forecasts, cross-selling optimization (retail), user segmentation, churn analysis, computing the lifetime value of a customer and cost of acquisition, and so on. Except in big companies, data analyst is a junior role; these practitioners have much narrower knowledge and experience than data scientists, and they lack (and don’t need) business vision. They are detail-oriented and report to managers such as data scientists or directors of analytics. In big companies, someone with a job title such as data analyst III might be very senior, yet they usually are specialized and lack the broad knowledge gained by data scientists working in a variety of companies large and small.

Business analytics. Same as data analysis, but restricted to business problems only. Tends to have a bit more of a financial, marketing or ROI flavor. Popular job titles include data analyst and data scientist, but not business analyst (see the business intelligence entry, a different domain).



Finally, there are more specialized analytic disciplines that recently emerged: health analytics, computational chemistry and bioinformatics (genome research), for instance.

5 thoughts on “Quotes”

  1. Dear Michael !
    I liked your Quotes really. You can see my work at http://happydatascientist.blogspot.com/2015/04/data-scientists-at-work-thoughts-that.html. Also you can 2 video here. It’s original for kdnuggets post 😉
    my best regards
    andy

    • Hello Andy, thank you very much for your hint. I had a look at your list and found 40 which were not in my list right now. My list now contains >700 from which I publish one a day. So at least another 2 Years …. There are some typos in your list, e.g “better plac”. You might have a look. Thank you very much, Michael

      • Hello Michael !
        Thanks a lot for your attention to my humble work. I have fixed typo “better place” and hope for best. How did you find videos for #1, #2 interviews quotes ?
        I hope you enjoy it too :)) I saw your web site and found it very useful for me.
        So thanks again for your attention.
        andy

  2. Very nice post. I simply stumbled upon your weblog and wanted to say that
    I’ve really loved browsing your blog posts. In any case
    I will be subscribing for your rss feed and I’m hoping you write again soon!

  3. hatemgkotb said:

    This is simply AMAZING!
