The best data mining books

3 authors have picked their favorite books about data mining and why they recommend each book.

Rage Inside the Machine

By Robert Elliott Smith,

Book cover of Rage Inside the Machine: The Prejudice of Algorithms, and How to Stop the Internet Making Bigots of Us All

OK, I’m biased here because Rob is an old friend of mine. We first met at academic conferences and had several heated debates (arguments). But after spending a little time together at a workshop, we realised that each of us probably knew what he was talking about after all. To be clear, this is Robert Elliott Smith, not the Rob Smith who writes about “Artificial Superintelligence”. Those books definitely do not make this list.

Our Rob is a coherent, grounded scientist with bags of real-world experience, and he brings his knowledge to this title with gusto, telling us about how AI is affecting our lives in ways you never thought possible – and often not in a good way. If you want to understand what can go wrong with AI and what we should be doing to stop it, don’t read about singularities or other such nonsense, read this.


Who am I?

I’ve been a geeky kid all my life. (I don’t think I’ve quite grown up yet.) Born in the 1970s, my childhood was a wonderful playground of building robots and software. I was awarded one of the early degrees in AI, and a PhD in genetic algorithms. I’ve since spent 25 years exploring how to make computers think, build, invent, compose… and I’ve also spent 20 years writing popular science books. I’m lucky enough to be a Professor in one of the world’s best universities for Computer Science and Machine Learning: UCL, and I guess I’ve written two or three hundred scientific papers over the years. I still think I know nothing at all about real or artificial intelligence, but then does anyone?


I wrote...

Artificial Intelligence and Robotics: Ten Short Lessons

By Peter J. Bentley,

What is my book about?

In Artificial Intelligence and Robotics: Ten Short Lessons, leading expert Peter J. Bentley breaks down the fast-moving world of computers into ten pivotal lessons, presenting the reader with the essential information they need to understand our most powerful technology and its remarkable implications for our species.

From the origins and motivation behind the birth of AI and robotics to using smart algorithms that allow us to build good robots, from the technologies that enable computers to understand a huge range of sensory information, including language and communication, to the challenges of emotional intelligence, unpredictable environments, and imagination in artificial intelligence, this is a cutting-edge, expert-led guide for curious minds. Packed full of easy-to-understand diagrams, pictures, and fact boxes, these ten lessons cover all the basics, as well as the latest understanding and developments, to enlighten the nonscientist.

Be Data Literate

By Jordan Morrow,

Book cover of Be Data Literate: The Data Literacy Skills Everyone Needs to Succeed

Not everybody needs to be a data scientist, but everybody does need to be data literate. Without an intentional focus on evangelism and building a strong data culture in your organization, it will be an uphill battle to make meaningful change. This book helps individuals and leaders understand what data literacy is and how we can build it like any other skill.


Who am I?

I am a leader in analytics and AI strategy, and have a broad range of experience in aviation, energy, financial services, and the public sector.  I have worked with several major organizations to help them establish a leadership position in data science and to unlock real business value using advanced analytics. 


I wrote...

Minding the Machines: Building and Leading Data Science and Analytics Teams

By Jeremy Adamson,

What is my book about?

In Minding the Machines: Building and Leading Data Science and Analytics Teams, AI and analytics strategy expert Jeremy Adamson delivers an accessible and insightful roadmap to structuring and leading a successful analytics team. The book explores the tasks, strategies, methods, and frameworks necessary for an organization beginning their first foray into the analytics space or one that is rebooting its team for the umpteenth time in search of success.

Perfect for executives, managers, team leads, and other business leaders tasked with structuring and leading a successful analytics team, Minding the Machines is also an indispensable resource for data scientists and analysts who seek to better understand how their individual efforts fit into their team’s overall results.

Machine Learning For Absolute Beginners

By Oliver Theobald,

Book cover of Machine Learning For Absolute Beginners: A Plain English Introduction

This could be the first stop on your brand-new machine learning journey. I personally like how technical concepts are translated into plain English: each chapter starts with a concise, clear, high-level overview of an ML algorithm or methodology, followed by lots of visual examples and real-world scenarios. I can guarantee you won’t get lost halfway. The book focuses on introducing you to ML with minimal math. But if you want to grasp more of the math, the next book I recommend is waiting for you.


Who am I?

I have worked as a machine learning engineer, applying my ML expertise in the computational advertising and search domains. I am the author of 8 machine learning books. My first book was ranked the #1 bestseller in its category on Amazon in 2017 and 2018 and was translated into many languages. I am also an ML education enthusiast and used to teach ML courses in Toronto, Canada.


I wrote...

Python Machine Learning By Example: Build intelligent systems using Python, TensorFlow 2, PyTorch, and scikit-learn

By Yuxi (Hayden) Liu,

What is my book about?

Python Machine Learning By Example begins with an introduction to important ML concepts and implementations using Python. Each chapter of the book walks you through an industry-adopted application. At the same time, this book provides actionable insights into the key fundamentals of ML with Python.

With the help of this extended and updated 3rd edition, you’ll understand how to tackle data-driven problems and implement your solutions with popular Python packages such as TensorFlow, PyTorch, scikit-learn, and Keras. To aid your understanding of popular ML algorithms, the book covers interesting and easy-to-follow examples such as a recommendation engine, stock price prediction with artificial neural networks, clothing categorization, sequence prediction, decision making with reinforcement learning, and more. Hayden applies his expertise to demonstrate implementations of algorithms in Python, both from scratch and with libraries.

By the end of the book, you’ll have put together a broad picture of the ML ecosystem and will be well-versed in the best practices of applying ML techniques to make the most of new opportunities.

Introduction to Machine Learning with Python

By Andreas C. Müller, Sarah Guido,

Book cover of Introduction to Machine Learning with Python: A Guide for Data Scientists

This book is more advanced than the first book I recommended. It presents the theoretical and practical aspects of ML step by step, from the bottom up. Each chapter elaborates at length on a core building block in the ML life cycle. For example, feature engineering, supervised learning, and model evaluation have their own separate chapters, with intuitive discussions of how they work. Most of the concepts are taught through the simple yet powerful Python library scikit-learn, so it won’t overburden you with heavy programming. This book will be perfect for practitioners with some understanding of statistics and linear algebra.


Fundamentals of Machine Learning for Predictive Data Analytics, Second Edition

By John D. Kelleher, Brian Mac Namee, Aoife D'Arcy

Book cover of Fundamentals of Machine Learning for Predictive Data Analytics, Second Edition: Algorithms, Worked Examples, and Case Studies

Another practical book that I highly recommend. Its intuitive structure is the first thing I like about it. It gives you a comprehensive walkthrough of the ML workflow, from data exploration to learning. It offers abundant practical guidance that prepares you for real-world challenges, such as how to handle outliers and impute missing data. As an ML practitioner, I appreciate the dedicated case studies throughout the book. They really get learners excited about future real-world applications.


Power Pivot and Power BI

By Rob Collie, Avichal Singh,

Book cover of Power Pivot and Power BI: The Excel User's Guide to DAX, Power Query, Power BI & Power Pivot in Excel 2010-2016

Rob Collie was a pioneer at Microsoft. After leaving the Excel team, he helped architect the tools that would become Power Pivot and Power BI. He left Microsoft and started his own consultancy, helping big companies answer important questions in just a few hours with Power Pivot.

His books are the best-selling books in the category. 

While Matt Allington will get you up to speed, Rob will explain every nuance of Power Pivot, DAX, and Power BI.


Who am I?

I’ve been running the MrExcel website since 1998 and have written 66 books about Excel. I am an Excel generalist – I know a fair amount about almost every aspect of Excel. But I respect the specialists who become experts on one part of Excel and offer deep knowledge dives into those portions of Excel. Cleaning data with Power Query, calculating “impossible” calculations with DAX, and then presenting them on interactive dashboards are some of the deep dives that you will learn on this list.


I wrote...

Power Excel 2019 with MrExcel: Master Pivot Tables, Subtotals, VLOOKUP, Power Query, Dynamic Arrays & Data Analysis

By Bill Jelen,

What is my book about?

657 Excel mysteries solved. Direct from MrExcel himself, Bill Jelen. If you use Excel 10, 20, 40 hours per week, you will save 50 hours per year with the tricks in this book.

Competing on Analytics

By Thomas H. Davenport, Jeanne G. Harris,

Book cover of Competing on Analytics: The New Science of Winning

This is a foundational book on analytics and data science as a business function and helped to shape the development of the practice. It provides a view of the discipline through a business lens and avoids deep technical examinations. Though much has changed in the 15 years since it was originally published, it is still essential reading for a leader in the field. No book since has captured as well the competitive differentiation that analytics provides.


R for Data Science

By Hadley Wickham, Garrett Grolemund,

Book cover of R for Data Science: Import, Tidy, Transform, Visualize, and Model Data

For those intending to use R with an eye on the popular 'Tidyverse' suite of packages (which facilitate the handling, manipulation, and visualisation of data sets), it's hard to go past this book. From the founding contributors of the RStudio/Tidyverse worlds, this is a great way to learn about this dialect of R against the overarching backdrop of statistical data analysis and data science.


Who am I?

I’m an applied statistician and academic researcher/lecturer at New Zealand’s oldest university – the University of Otago. R facilitates everything I do – research, academic publication, and teaching. It’s the latter part of my job that motivated my own book on R. From first-year statistics students who have never seen R to my own Ph.D. students using R to implement novel and highly complex statistical methods and models, my experience is that all ultimately love the ease with which the R language permits exploration, visualisation, analysis, and inference of one’s data. The ever-growing need in today’s society for skilled statisticians and data scientists means there's never been a better time to learn this essential language.


I wrote...

The Book of R: A First Course in Programming and Statistics

By Tilman M. Davies,

What is my book about?

The Book of R is a comprehensive, beginner-friendly guide to R, the world’s most popular programming language for statistical analysis. Even if you have no programming experience and little more than a grounding in the basics of mathematics, you’ll find everything you need to begin using R effectively for statistical analysis.

You’ll start with the basics, like how to handle data and write simple programs, before moving on to more advanced topics, like producing statistical summaries of your data and performing statistical tests and modelling. You’ll even learn how to create impressive data visualisations with R’s basic graphics tools and contributed packages, like ggplot2 and ggvis, as well as interactive 3D visualisations using the rgl package.

Programming Collective Intelligence

By Toby Segaran,

Book cover of Programming Collective Intelligence: Building Smart Web 2.0 Applications

This was my favorite book when I started my career. It talks about how information is processed, in an intelligent way, in the internet age. It acts as a tutorial that teaches developers how to code their own ML programs, from online dating services to document analyzers and search engines. The author did an excellent job of explaining abstract ML algorithms with clear examples. His Python coding style reads clearly, which makes the book more beginner-friendly.

Don’t be put off when you learn that this book is more than a decade old. It was a visionary book back in the day and it is still relevant today.


The Elements of Statistical Learning

By Trevor Hastie, Robert Tibshirani, Jerome Friedman

Book cover of The Elements of Statistical Learning: Data Mining, Inference, and Prediction

This book might as well be called Introduction to Machine Learning, and it is probably one of the only books truly deserving of the title. Did you know neural networks have been used for decades to scan checks at the bank? They are called Boltzmann machines. Have you ever heard of how decision trees were used in old-school data mining? You could only get them from proprietary software packages from the early 2000s.

In quant trading, you will constantly face compute power constraints, so it is invaluable to understand the mathematical foundations of the most old-school machine learning methods out there. Researchers 20 years ago used to do a lot of impressive work with a lot less computing power.


Who am I?

I am a financial data scientist. I think it is important that data scientists are highly specialized if they want to be effective in their careers. I run a business called Conlan Scientific out of Charlotte, NC, where my team of financial data scientists and I tackle complicated machine learning problems for our clients. Quant trading is a gladiator’s arena of financial data science. Anyone can try it, but few succeed at it. I am sharing my top five list of math books that are essential to success in this field. I hope you enjoy.


I wrote...

Algorithmic Trading with Python: Quantitative Methods and Strategy Development

By Chris Conlan,

What is my book about?

Algorithmic Trading with Python discusses modern quant trading methods in Python with a heavy focus on pandas, numpy, and scikit-learn. After establishing an understanding of technical indicators and performance metrics, readers will walk through the process of developing a trading simulator, strategy optimizer, and financial machine learning pipeline. 

This book maintains a high standard of reproducibility. All code and data are self-contained in a GitHub repo. The data includes hyper-realistic simulated price data and alternative data based on real securities. 
