|Quotes| = 112
Daniel Tunkelang Failure is a great teacher.
Isabelle Nuage
(February 24, 2015)
Big Data by itself is of little use.
Eric Jonas Life is too short to not be having fun.
Kira Radinsky Working with data is like an adventure.
William Edwards Deming
In God we trust, all others bring data.
Timothy E. Carone
(January 30, 2015)
Big Data is the oxygen for autonomous systems.
Amy Heineike Data science is already kind of a broad church.
Daniel Tunkelang Anything that looks interesting is probably wrong.
John Foreman I find it tough to find and hire the right people.
Larry Hardesty
(August 15, 2014)
In the age of big data, visualization tools are vital.
Joel Cadwell
(August 21, 2014)
R makes it so easy to fit many models to the same data.
Niels Bohr Prediction is very difficult, especially about the future.
Christopher Bishop
Half of what we do at Microsoft Research is Machine Learning.
Daniel Tunkelang Search is the problem at the heart of the information economy.
Victor Hu It is hard to know what you really need until you dig into it.
Isaiah, XXX 8 Now go, write it before them in a table, and note it in a book.
Rishi Shah
(September 24, 2014)
Big data profitability depends on your employee’s data literacy.
Chris Wiggins The main driver of my ideas has been seeing people doing it ‘wrong’
All analysis starts with an understandable set of data and algorithms.
Shayne Miel You have to turn your inputs into things the algorithm can understand.
Josh Bloom
The first rule of data science is: don’t ask how to define data science.
Caitlin Smallwood You imagine a data set & you salivate at just thinking about that data set.
Jeff Dean
(November 2014)
Anything humans can do in 0.1 sec, the right big 10-layer network can do too.
Jeffrey Fry Having more data does not always give you the power to make better decisions.
Kaiser Fung One of the biggest myths of Big Data is that data alone produce complete answers.
Andre Karpistsenko The idea or the initial enthusiasm is just a small part of doing something great.
Bob McDonald Data modeling, simulation, and other digital tools are reshaping how we innovate.
ATKearney Is Big Data the 21st century equivalent of the Industrial Revolution? We think so.
Foster Provost & Tom Fawcett
Increasingly, business decisions are being made automatically by computer systems.
Andre Karpistsenko The core lesson from tool-and-method explorations is that there is NO silver bullet.
John W. Tukey
The greatest value of a picture is when it forces us to notice what we never expected to see.
Tamara Dull
(March 20, 2015)
The data lake is essential for any organization who wants to take full advantage of its data.
Henri Poincaré
With logic one can construct proofs, but one cannot gain new insights; that takes intuition.
John Cook
(26 March 2015)
Statistics aims to build accurate models … Machine learning aims to solve problems more directly.
Pierre Simon, Marquis de Laplace The most important questions of life are, for the most part, really only problems of probability.
Yann LeCun It’s useful for a company to have its scientists actually publish what they do. It keeps them honest.
BI Community What is the most used feature in any business intelligence solution? It is the Export to Excel button.
David Hilbert Mathematics knows no races or geographic boundaries; for mathematics, the cultural world is one country.
John Foreman What we focus on, and this is going to sound goofy for a data scientist – is the happiness of our users.
Milton Friedman The only relevant test of the validity of a hypothesis is comparison of its predictions with experience.
Eran Levy
Mashing up multiple data sources to generate a single source of truth is an integral part of data analysis.
TJ Laher
(November 14, 2014)
Leading organizations have already begun to see serious returns on deploying a pervasive analytics strategy.
Michael Greene
To find new trends and strong patterns from large complex data sets, a strong analytics foundation is needed.
Andrew Gelman
(28 April 2015)
Measurement, measurement, measurement. It’s central to statistics. It’s central to how we learn about the world.
Ivan Vasilev The hidden layer is where the (neural) network stores its internal abstract representation of the training data.
P. Dawid
Causal inference is one of the most important, most subtle, and most neglected of all the problems of Statistics.
Xavier Conort The algorithms we used are very standard for Kagglers. […] We spent most of our efforts in feature engineering.
Yann LeCun Most of the knowledge in the world in the future is going to be extracted by machines and will reside in machines.
Thomas Carlyle A judicious man looks on statistics not to get knowledge, but to save himself from having ignorance foisted on him.
n.n. Data does replace heuristics, hard-coded rules, assumptions and beliefs. Machine learning only enables data to do that.
Antoine de Saint-Exupéry A designer knows he has achieved perfection not when there is nothing left to add, but when there is nothing left to take away.
Paul Roehrig, Ben Pring
It’s a new era in business, one in which growth will be driven as much by insight and foresight as by physical products and assets.
Jake Porway
(October 1, 2015)
Data is not truth, and tech is not an answer in-and-of-itself. Without designing for the humans on the other end, our work is in vain.
Claudia Perlich The conversation is based around how to properly deal with even more sensitive information about where exactly people spend their lives.
Yann LeCun You don’t want to just hire clones of the same person, because then they will all want to explore the same things. You want some diversity.
Andrew Ng Coming up with features is difficult, time-consuming, requires expert knowledge. “Applied machine learning” is basically feature engineering.
John Foreman
If your goal is to positively impact the business, not to build a clustering algorithm that leverages storm and the Twitter API, you’ll be OK.
Michele Nemschoff
(August 30, 2014)
Big data isn’t just for developers and analysts in the technical arena. In today’s digital age, big data has become a powerful tool across industries.
European Union’s General Data Protection Regulation (GDPR)
(Dec. 2016)
Organizations that use ML to make user-impacting decisions must be able to fully explain the data and algorithms that resulted in a particular decision.
H. James Harrington
If you can’t measure something, you can’t understand it. If you can’t understand it, you can’t control it. If you can’t control it, you can’t improve it.
Eric Jonas Graduate students, perhaps because of an adherence to sunk cost fallacy, often write really great surveys of the field at the beginning of their PhD thesis.
Foster Provost, Tom Fawcett
However, there is confusion about what exactly data science is, and this confusion could lead to disillusionment as the concept diffuses into meaningless buzz.
Kaiser Fung Before getting into the methodological issues, one needs to ask the most basic question. Did the researchers check the quality of the data or just take the data as is?
W. H. Auden Thou shalt not answer questionnaires Or quizzes upon world affairs, Nor with compliance Take any test. Thou shalt not sit with statisticians nor commit A social science.
Justin Washtell
(November 3, 2014)
The central premise of predictive modeling is precisely that one size does not fit all – otherwise we would just assign the same outcome to all cases and be done with it.
William S. Cleveland
Data analysis needs to be part of the blood stream of each department and all should be aware of the workings of subject matter investigations and derive stimulus from them.
Martyn Jones
(March 12, 2015)
Is Big Data really about high volumes, high velocity and high variety, or is it in fact about much noise, too much pomposity and abundant similarity leading to unnecessary high anxiety?
Yann LeCun The idea that somehow you can put a bunch of research scientists together and then put some random manager who’s not a scientist directing them doesn’t work. I’ve never ever seen it work.
Sundar Pichai Machine learning is a core, transformative way by which we’re rethinking everything we’re doing. We’re thoughtfully applying it across all our products, be it search, ads, YouTube or Play.
David McCandless By visualizing information, we turn it into a landscape that you can explore with your eyes, a sort of information map. And when you’re lost in information, an information map is kind of useful.
Kaiser Fung
(May 2015)
Story time is the moment in a report on data analysis when the author deftly moves from reporting a finding of data to the telling of stories based on assumptions that do not come from the data.
Robert Neuhaus Feature engineering and feature selection are not mutually exclusive. They are both useful. I’d say feature engineering is more important though, especially because you can’t really automate it.
William S. Cleveland
Model building is complex because it requires combining information from exploring the data and information from sources external to the data such as subject matter theory and other sets of data.
SBS documentary “The Age of Big Data” Data is becoming a powerful and most valuable commodity in 21st century. It is leading to scientific insights and new ways of understanding human behaviour. Data can also make you rich. Very rich.
Lord Kelvin When you can measure what you are speaking about and express it in numbers, you know something about it. When you cannot express it in numbers, your knowledge is of a meagre and unsatisfactory kind.
European Union’s General Data Protection Regulation (GDPR)
(Dec. 2016)
How could a result be explained, especially a result of a machine learning model, without a versioned record of what data was input to generate the result and what data was output representing the result?
William S. Cleveland
Theory, both mathematical and non-mathematical theory, is vital to data science. … Tools of data science – models and methods together with computational methods and computing systems – link data and theory.
Kune, Konugurthi, Agarwal, Chillarige, Buyya
Big Data and traditional data warehousing systems, however, have the similar goals to deliver business value through the analysis of data, but, they differ in the analytics methods and the organization of the data.
Suman Malekani
(January 29, 2015)
While working on Big Data & planning to implement it for the benefit of business, it is very important to explain the insights & valuable knowledge in a way that non-technical business user can actually understand.
Dr. Olly Downs
(May 18, 2015)
Most of the big data investment focus to date has been on the underlying infrastructure, while development of the applications that make use of that infrastructure – and that deliver actual business value – has lagged.
Data integration features have gained prominence during the last year as companies struggled to incorporate new data sources in their analysis, a process that can consume a sizable percentage of the total project time.
Jeff Leek
To evaluate a person’s work or their productivity requires three things:
1. To be an expert in what they do
2. To have absolutely no reason to care whether they succeed or not
3. To have time available to evaluate them.
R. A. Fisher … the null hypothesis is never proved or established, but is possibly disproved, in the course of experimentation. Every experiment may be said to exist only to give the facts a chance of disproving the null hypothesis.
Julia Evans Cleaning up data to the point where you can work with it is a huge amount of work. If you’re trying to reconcile a lot of sources of data that you don’t control like in this flight search example, it can take 80% of your time.
Eric Colson, Brad Klingenberg, Jeff Magnusson
(March 31, 2015)
Data science can directly enable a strategic differentiator if the company’s core competency depends on its data and analytic capabilities. When this happens, the company becomes supportive to data science instead of the other way around.
Analise Polsky
Improving Visual Data Discovery:
1. Always have new data sources.
2. Always have new techniques.
3. Always have new tools and platforms.
Visual data discovery is not once and done. It is an iterative process that requires communication and exploration.
(March 4th, 2015)
Data Science has its own language. So, if you want to have at least a slight chance of surviving in the enterprise world of tomorrow -with its obsessive focus on collecting and analyzing data- you better have started yesterday with learning this terminol.
Richard Fichera Part of Hadoop’s appeal is that it is not specifically optimized for any specific solution or data type but rather a general framework for parallel processing, so your developers and data scientists can add any relevant data, whatever its format or source.
Vladimir N. Vapnik
After the success of the SVM in solving real-life problems, the interest in statistical learning theory significantly increased. For the first time, abstract mathematical results in statistical learning theory have a direct impact on algorithmic tools of data analysis.
Zachary Chase Lipton
(January 2015)
Generally, the systems implementation of machine learning methodology and ongoing software maintenance challenges are an understudied area that will continue to grow in importance as machine learning systems become more commonplace in commercial and open source software.
Rao Naveen
There’s been a lot of talk about trying to make AI work on existing infrastructure. But the sad reality is that you’re always going to end up with something that’s far less than state-of-the-art. And I don’t mean it will be 30 or 40 percent slower. It’s more likely to be a thousand times slower.
Jeffrey Heer, Michael Bostock, Vadim Ogievetsky
Graphical Perception Experiments find that spatial position (as in a scatter plot or bar chart) leads to the most accurate decoding of numerical data and is generally preferable to visual variables such as angle, one-dimensional length, two-dimensional area, three-dimensional volume, and color saturation.
Lana Klein
Remember that the most critical thing is not building analytic solution but making sure that your organization starts using it: that means creating buy-in, working to build adoption, educating and training, redesigning processes to include analytics. Give it time, be persistent, improve and results will follow!
Enric Junqué de Fortuny, David Martens, Foster Provost
This study provides a clear illustration that larger data indeed can be more valuable assets for predictive analytics. This implies that institutions with larger data assets – plus the skill to take advantage of them – potentially can obtain substantial competitive advantage over institutions without such access or skill.
Nikhil Buduma
(29 December 2014)
[In Neural Networks] It is not required that a neuron has its outlet connected to the inputs of every neuron in the next layer. In fact, selecting which neurons to connect to which other neurons in the next layer is an art that comes from experience. Allowing maximal connectivity will more often than not result in overfitting.
Christophe Bourguignat
(Sep 16, 2014)
In real organizations, people need dead simple story-telling – Which features are you using ? How your algorithms work ? What is your strategy ? etc. … If your models are not parsimonious enough, you risk to lose the audience confidence. Convincing stakeholders is a key driver for success, and people trust what they understand.
Mark van Rijmenam
(October 16, 2014)
Although such Business Intelligence is still quite common and does give you at least some insights, the fast-changing world of today requires a different approach. Organisations today should strive for a holistic overview of their internal and external data that is analysed on the spot and returned graphically via live storylines.
John Von Neumann The sciences do not try to explain, they hardly even try to interpret, they mainly make models. By a model is meant a mathematical construct which, with the addition of certain verbal interpretations, describes observed phenomena. The justification of such a mathematical construct is solely and precisely that it is expected to work.
Foster Provost & Tom Fawcett
On a scale less grand, but probably more common, data-analytics projects reach into all business units. Employees throughout these units must interact with the data-science team. If these employees do not have a fundamental grounding in the principles of data-analytic thinking, they will not really understand what is happening in the business.
Gil Allouche
(January 9, 2015)
Improvements in technology and big data trends have given rise to improvements in machine learning. The sheer volume of data is growing exponentially, and companies are looking for faster speeds and real-time analytics. Cognitive computing combines machine learning and artificial intelligence to go beyond data mining and provide actionable insights.
Some decisions you need to make are big enough to change the course for your business. And your past experiences may not be good predictors of the future. More data are within your reach to understand what was previously unknown. Sophisticated analytical tools are available to you to ‘see’ a wider range of possibilities and evaluate them quickly. Now is a good time for an upgrade in your decision making capabilities.
Avi Kalderon
(JAN 27, 2015)
Without effective data governance and data management, big data can mean big problems for many organizations already struggling with more data than they can handle. That “lake” they are building can very easily become a “cesspool” without appropriate data management practices that are adapted to this new platform. The solution? Firms need to actively adapt their data governance and data management capabilities – from implementing to ongoing maintenance.
Mkhuseli Mthukwane
(August 27, 2015)
Data Science forms the very substratum of an Analytics Practitioners’ work, it’s what sets us apart from Statisticians or Mathematicians. However in some instances we cannot rely on it alone, we need to employ other measures to increase its definitiveness. In any event I am sure many Data Scientists use math and other means to augment the potency of their Analytics, some not even scientific at all. It is undeniably prudent to do so where necessary, especially in fields that demand a higher standard of accuracy and care.
Durgesh Kaushik
(October 9, 2015)
Analytics no matter how advanced they are, does not remove the need for human insights. On the contrary, there is a compelling need for skilled people with the ability to understand data, think from the business point of view and come up with insights. For this very reason technology professionals with Analytics skill are finding themselves in high demand as businesses look to harness the power of Big Data. A professional with the Analytical skills can master the ocean of Big Data and become a vital asset to an organization, boosting the business and their career.
Shahbaz Ali
(DEC 24, 2014)
When data is locked in silos, organizations are unable to find and include all enterprise data for use with big data analytics tools. Planning to implement a data centric data management strategy enables the distributed metadata repository to be a source for analytics tools, as it can be used to provide real-time insight, without having to migrate data from silos to a separate analytics platform. It also enhances the quality of results, because having more relevant data often produces more accurate analysis. If organizations can harness all of its data, they will attain a greater competitive advantage.
Philipp Max Hartmann, Mohamed Zaki, Niels Feldmann, Andy Neely In the field of ‘big data’, Gartner identified five different types of data source used to ‘exploit big data’ in a company (Buytendijk et al., 2013): ‘Operational data comes from transaction systems, the monitoring of streaming data and sensor data; Dark data is data that you already own but don’t use: emails, contracts, written reports and so forth; Commercial data may be structured or unstructured, and is purchased from industry organisations, social media providers and so on; Social data comes from Twitter, Facebook and other interfaces; Public data can have numerous formats and topics, such as economic data, socio-demographic data and even weather data.’
Tracey Wallace
(September 8, 2014)
Our Collective Data Science Duty: Here’s the thing, technology is empowering the public in never before seen ways, and data is the backbone of that shift. Between wearable tech and digital identity platforms, people are creating more data every day than has ever been created in decades, no, centuries past. Each of us is essentially our own personal data scientist, and those working in the digital space have very much been their own statisticians for quite some time. It’s why platforms like Google Analytics, Omniture and more are so popular across the industry. They put the power of analytics in the hands of users, requiring little training but returning lots of measurability.
Jeff Leek
Data science done well looks easy – and that is a big problem for data scientists. The really tricky twist is that bad data science looks easy too. You can scrape a data set off the web and slap a machine learning algorithm on it no problem. So how do you judge whether a data science project is really ‘hard’ and whether the data scientist is an expert? Just like with anything, there is no easy shortcut to evaluating data science projects. You have to ask questions about the details of how the data were collected, what kind of biases might exist, why they picked one data set over another, etc. In the meantime, don’t be fooled by what looks like simple data science – it can often be pretty effective.
Mike Barlow
Top takeaways from my interviews with experts from organizations offering AI products and services:
• AI is too big for any single device or system
• AI is a distributed phenomenon
• AI will deliver value to users through devices, but the heavy lifting will be performed in the cloud
• AI is a two-way street, with information passed back and forth between local devices and remote systems
• AI apps and interfaces will be designed and engineered increasingly for nontechnical users
• Companies will incorporate AI capabilities into new products and services routinely
• A new generation of AI-enriched products and services will be connected and supported through the cloud
• AI in the cloud will become a standard combination, like peanut butter and jelly
Strategy& There is no general rule dictating how organizations should navigate the stages of big data maturity. They must each decide for themselves, based on their own situation – the competitive environment they are operating in, their business model, and their existing internal capabilities. In less-advanced sectors, with executives still grappling with existing data, making intelligent use of what they already possess may have a substantial impact on decision making.
The main priorities for executives are to:
• develop a clear (big) data strategy;
• prove the value of data in pilot schemes;
• identify the owner for “big data” in the organization and formally establish a “Chief Data Scientist” position (where applicable);
• recruit/train talent to ask the right questions and technical personnel to provide the systems and tools to allow data scientists to answer those questions;
• position big data as an integral element of the operating model; and establish a data-driven decision culture and launch a communication campaign around it.
Alice Zheng
There’s structure in it, but it’s kind of a different form. … It’s spit out by machines and programs. There’s structure, but that structure is difficult to understand for humans. … So, you can’t just throw all of it into an algorithm and expect the algorithm to be able to make sense of it. You really have to process the features, do a lot of pre-processing, and first do things like extract out the frequent sequences, maybe, or figure out what’s the right way to represent IP addresses, for instance. Maybe you don’t want to represent latency by the actual latency number, which could have a very skewed distribution, with lots and lots of large numbers. You might want to assign them into bins or something. There are a lot of things that you need to do to get the data into a format that’s friendly to the model, and then you want to choose the right model. Maybe after you choose the model, you realize this model really is suitable for numeric data and not categorical data. Then you need to go back to the feature engineering part and figure out the best way to represent the data. … I hesitate to say anything critical because half of my friends are in machine learning, which is all about algorithms. I think we already have enough algorithms. It’s not that we don’t need more and better algorithms. I think a much, much bigger challenge is data itself, features, and feature engineering.
Istvan Hajnal
(February 23, 2015)
There are few trends in the Big Data and Data Science world that can be of interest to market researchers:
• Visualization. There is a lot of interest in the Big Data and Data Science world for everything that has to do with Visualization. I’ll admit that sometimes it is Visualize to Impress rather than to Inform, but when it comes to informing clearly, communicating in a simple and understandable way, storytelling, and so on, we market researchers have a head start.
• Natural Language Processing. One of the 4 V’s of Big Data stands for Variety. Very often this refers to unstructured data, which sometimes refers to free text. Big Data and Data Science folks, for instance, start to analyze text that is entered in the free fields of production systems. This problem is not dissimilar to what we do when we analyse open questions. Again market research has an opportunity to play a role here. By the way, it goes beyond sentiment analysis. Techniques that I’ve seen successfully used in the Big Data / Data Science world are topic generation and document classification. Think about analysing customer complaints, for instance.
• Deep Learning. Deep learning risks to become the next fad, largely because of the name Deep. But deep here does not refer to profound, but rather to the fact that you have multiple hidden layers in a neural network. And a neural network is basically a logistic regression (OK, I simplify a bit here). So absolutely no magic here, but absolutely great results. Deep learning is a machine learning technique that tries to model high-level abstractions by using so called learning representations of data where data is transformed to a representation of that data that is easier to use with other Machine Learning techniques. A typical example is a picture that constitutes of pixels. These pixels can be represented by more abstract elements such as edges, shapes, and so on. These edges and shapes can on their turn be further represented by simple objects, and so on. In the end, this example, leads to systems that are able to reasonably describe pictures in broad terms, but nonetheless useful for practical purposes, especially, when processing by humans is not an option. How can this be applied in Market Research? Already today (shallow) Neural networks are used in Market Research. One research company I know uses neural networks to classify products sold in stores in broad buckets such as petfood, clothing, and so on, based on the free field descriptions that come with the barcode data that the stores deliver.

5 thoughts on “Quotes”

  1. Dear Michael !
    I liked your Quotes really. You can see my work at Also you can 2 video here. It’s original for kdnuggets post 😉
    my best regards


    • Hello Andy, thank you very much for your hint. I had a look at your list and found 40 which were not in my list right now. My list now contains >700 from which I publish one a day. So at least another 2 Years …. There are some typos in your list, e.g “better plac”. You might have a look. Thank you very much, Michael


      • Hello Michael !
        Thanks a lot for your attention to my humble work. I have fixed typo “better place” and hope for best. How did you find videos for #1, #2 interviews quotes ?
        I hope you enjoy it too :)) I saw your web site and found it very useful for me.
        So thanks again for your attention.


  2. Very nice post. I simply stumbled upon your weblog and wanted to say that
    I’ve really loved browsing your blog posts. In any case
    I will be subscribing for your rss feed and I’m hoping you write again soon!


  3. hatemgkotb said:

    This is simply AMAZING!

