The most recommended big data books

Who picked these books? Meet our 34 experts.

34 authors created a book list connected to big data, and here are their favorite big data books.
When you buy books, we may earn a commission that helps keep our lights on (or join the rebellion as a member).


Book cover of Data Feminism

Aubrey Clayton, author of Bernoulli's Fallacy: Statistical Illogic and the Crisis of Modern Science

From my list for data scientists trying to be ethical people.

Why am I passionate about this?

I studied statistics and data science for years before anyone suggested to me that these topics might have an ethical dimension, or that my numerical tools were products of human beings with motivations specific to their time and place. I’ve since written about the history and philosophy of mathematical probability and statistics, and I’ve come to understand how much that historical background matters and how important it is that the next generation of data scientists understand where these ideas come from and their potential to do harm. I hope anyone who reads these books avoids getting blinkered by the ideas that data = objectivity and that science is morally neutral.

Aubrey's book list for data scientists trying to be ethical people

Why did Aubrey love this book?

If you’ve never thought of “intersectional feminism” or “the gender binary” as essentially data-scientific terms, please allow this book to correct that. Data science is a locus of power, and that power can be wielded in the service of oppression or liberation. This book raises essential questions about the predominantly white, male, technocratic interests served by the traditional narratives of data analysis and what feminism and data science have to offer each other. Bottom line: the data doesn’t speak for itself, never has, and never will.

By Catherine D'Ignazio, Lauren F. Klein

Why should I read it?

1 author picked Data Feminism as one of their favorite books, and they share why you should read it.

What is this book about?

A new way of thinking about data science and data ethics that is informed by the ideas of intersectional feminism.

Today, data science is a form of power. It has been used to expose injustice, improve health outcomes, and topple governments. But it has also been used to discriminate, police, and surveil. This potential for good, on the one hand, and harm, on the other, makes it essential to ask: Data science by whom? Data science for whom? Data science with whose interests in mind? The narratives around big data and data science are overwhelmingly white, male, and techno-heroic. In…


Book cover of Predict and Surveil: Data, Discretion, and the Future of Policing

Luke Hunt, author of Police Deception and Dishonesty: The Logic of Lying

From my list on the cluster-f*ck we call policing.

Why am I passionate about this?

I’m an Associate Professor in the University of Alabama’s Department of Philosophy. I worked as an FBI Special Agent before making the natural transition to academic philosophy. Being a professor was always a close second to Quantico, but that scene in Point Break in which Keanu Reeves and Patrick Swayze fight Anthony Kiedis on the beach made it seem like the FBI would be more fun than academia. In my current position as a professor at the University of Alabama, I teach in my department’s Jurisprudence Specialization. My primary research interests are at the intersection of philosophy of law, political philosophy, and criminal justice. I’ve written three books on policing.

Luke's book list on the cluster-f*ck we call policing

Why did Luke love this book?

I love this book because it reminds us of the many ways that technology can affect justice.

It is tempting to think sophisticated tactics such as “predictive policing” can solve all problems relating to human bias. However, Brayne shows that data and algorithms do not eliminate bias and discretion. Instead, high-tech police tools simply make bias less overt and visible, which erodes the public’s ability to hold the police accountable.

I especially enjoyed how the book flips the script, considering diverse ways to use these tools to help the public. For example, how can municipalities use technology to analyze the underlying factors that contribute to policing problems in the first place?

By Sarah Brayne

Why should I read it?

1 author picked Predict and Surveil as one of their favorite books, and they share why you should read it.

What is this book about?

The scope of criminal justice surveillance, from the police to the prisons, has expanded rapidly in recent decades. At the same time, the use of big data has spread across a range of fields, including finance, politics, health, and marketing. While law enforcement's use of big data is hotly contested, very little is known about how the police actually use it in daily operations and with what consequences.

In Predict and Surveil, Sarah Brayne offers an unprecedented, inside look at how police use big data and new surveillance technologies, leveraging on-the-ground fieldwork with one of the most technologically advanced law…


Book cover of Discriminating Data: Correlation, Neighborhoods, and the New Politics of Recognition

David Theo Goldberg, author of The Threat of Race: Reflections on Racial Neoliberalism

From my list on spotlighting race and neoliberalization.

Why am I passionate about this?

I grew up and completed the formative years of my college education in Cape Town, South Africa, while active also in anti-apartheid struggles. My Ph.D. dissertation in the 1980s focused on the elaboration of key racial ideas in the modern history of philosophy. I have published extensively on race and racism in the U.S. and globally, in books, articles, and public media. My interests have especially focused on the transforming logics and expressions of racism over time, and its updating to discipline and constrain its conventional targets anew and new targets more or less conventionally. My interest has always been to understand racism in order to face it down.

David's book list on spotlighting race and neoliberalization

Why did David love this book?

Digital technology, like technology generally, is commonly assumed to be value neutral. Wendy Chun reveals that structurally embedded in digital operating systems and data collection are values that reproduce and extend existing modes of discriminating while also originating new ones. In prompting and promoting the grouping together of people who are alike—in habits, culture, looks, and preferences—the logic of the algorithm reproduces and amplifies discriminatory trends. Chun reveals how the logics of the digital reinforce the restructuring of racism by the neoliberal turn that my own book lays out.

By Wendy Hui Kyong Chun, Alex Barnett (illustrator)

Why should I read it?

1 author picked Discriminating Data as one of their favorite books, and they share why you should read it.

What is this book about?

How big data and machine learning encode discrimination and create agitated clusters of comforting rage.

In Discriminating Data, Wendy Hui Kyong Chun reveals how polarization is a goal—not an error—within big data and machine learning. These methods, she argues, encode segregation, eugenics, and identity politics through their default assumptions and conditions. Correlation, which grounds big data’s predictive potential, stems from twentieth-century eugenic attempts to “breed” a better future. Recommender systems foster angry clusters of sameness through homophily. Users are “trained” to become authentically predictable via a politics and technology of recognition. Machine learning and data analytics thus seek to disrupt…


Book cover of Out of the Crisis

Steve Fenton, author of Web Operations Dashboards, Monitoring, & Alerting

From my list on DevOps from before DevOps was invented.

Why am I passionate about this?

I'm a programmer and technical author at Octopus Deploy and I'm deeply interested in DevOps. Since the 1950s, people have been studying software delivery in search of better ways of working. We’ve seen many revolutions since Lincoln Labs first introduced us to phased delivery, with lightweight methods transforming how we wrote software at the turn of the century. My interest in DevOps goes beyond my enthusiasm for methods in general, because we now have a great body of research that adds to our empirical observations on the ways we work.

Steve's book list on DevOps from before DevOps was invented

Why did Steve love this book?

Before Agile and Lean had rocked the software development industry, W. Edwards Deming was busy forging this new world of work.

Out of the Crisis is predominantly a management book, but it’s really the spark that started the lightweight movement in software delivery. A key concept in the book is how to identify the work system's performance, separate from the performance of individuals.

By W. Edwards Deming

Why should I read it?

3 authors picked Out of the Crisis as one of their favorite books, and they share why you should read it.

What is this book about?

Essential reading for managers and leaders, this is the classic work on management, problem solving, quality control, and more, based on the famous theory, 14 Points for Management.

In his classic Out of the Crisis, W. Edwards Deming describes the foundations for a completely new and transformational way to lead and manage people, processes, and resources. Translated into twelve languages and continuously in print since its original publication, it has proved highly influential. Research shows that Deming’s approach has high levels of success and sustainability. Readers today will find Deming’s insights relevant, significant, and effective in business thinking and practice. This…


Book cover of The Genome War: How Craig Venter Tried to Capture the Code of Life and Save the World

Roger Highfield, author of The Dance of Life: Symmetry, Cells and How We Become Human

From my list on what big data is and how it impacts us.

Why am I passionate about this?

I’m the Science Director of the Science Museum Group, based at the Science Museum in London, and visiting professor at the Dunn School, University of Oxford, and Department of Chemistry, University College London. Every time I write a book I swear that it will be my last and yet I'm now working on my ninth, after earlier forays into the physics of Christmas and the love life of Albert Einstein. Working with Peter Coveney of UCL, we're exploring ideas about computation and complexity we tackled in our two earlier books, along with the revolutionary implications of creating digital twins of people from the colossal amount of patient data now flowing from labs worldwide.

Roger's book list on what big data is and how it impacts us

Why did Roger love this book?

This might not look like a big data book but, for me, the race to read the human genome marks the birth of big data in biology, in the form of a tsunami of DNA sequencing data. I edited Craig Venter’s A Life Decoded, the first genetic autobiography, which explored the implications of becoming the first person to gaze upon all six billion ‘letters’ of their own genetic code. While working on Craig’s extraordinary story I came across The Genome War and thought James Shreeve did a brilliant job in describing the drama, rivalry, and personalities in the race to sequence the very first human genomes between government-backed scientists and Celera, Craig’s company.

By James Shreeve

Why should I read it?

2 authors picked The Genome War as one of their favorite books, and they share why you should read it.

What is this book about?

The long-awaited story of the science, the business, the politics, the intrigue behind the scenes of the most ferocious competition in the history of modern science—the race to map the human genome.
On May 10, 1998, biologist Craig Venter, director of the Institute for Genomic Research, announced that he was forming a private company that within three years would unravel the complete genetic code of human life—seven years before the projected finish of the U.S. government’s Human Genome Project. Venter hoped that by decoding the genome ahead of schedule, he would speed up the pace of biomedical research and save…


Book cover of Advanced Analytics with Spark: Patterns for Learning from Data at Scale

Tomasz Lelek, author of Software Mistakes and Tradeoffs: How to make good programming decisions

From my list on big data processing ecosystem.

Why am I passionate about this?

I am motivated by working on products that many people use. I've been a part of companies that deliver products impacting millions of people. To that end, I work in the Big Data ecosystem and strive to simplify it by contributing to Dremio's Data LakeHouse solution. I have worked on projects using Spark, HDFS, Cassandra, and Kafka. I have been working in the software engineering industry for ten years now, and I've tried to share my experience and lessons learned in the Software Mistakes and Tradeoffs book, hoping that it will help current and future engineers create better software and, in turn, happier users.

Tomasz's book list on big data processing ecosystem

Why did Tomasz love this book?

Apache Spark has a very high barrier to entry for newcomers to the Big Data ecosystem.

However, it is a key tool that almost everyone uses for running distributed processing. I recommend that everyone read this book before delving into production solutions based on Apache Spark.

This book will help you alleviate many Spark problems, such as serialization, memory utilization, and parallelization of processing.

By Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills

Why should I read it?

1 author picked Advanced Analytics with Spark as one of their favorite books, and they share why you should read it.

What is this book about?

In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. You'll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques (classification, collaborative filtering, and anomaly detection, among others) to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you'll find these patterns useful for…


Book cover of Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow 3e: Concepts, Tools, and Techniques to Build Intelligent Systems

Tomasz Lelek, author of Software Mistakes and Tradeoffs: How to make good programming decisions

From my list on big data processing ecosystem.

Tomasz's book list on big data processing ecosystem

Why did Tomasz love this book?

Hands-On Machine Learning presents an end-to-end approach to many problems that can be solved with machine learning.

Every concept and topic is backed by running code that you can experiment with and adapt to your real-world problems.

Thanks to this book, you will be able to understand the state of the art of today's machine learning and feel comfortable using the most up-to-date ML methods.

By Aurélien Géron

Why should I read it?

1 author picked Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow 3e as one of their favorite books, and they share why you should read it.

What is this book about?

Through a recent series of breakthroughs, deep learning has boosted the entire field of machine learning. Now, even programmers who know close to nothing about this technology can use simple, efficient tools to implement programs capable of learning from data. This best-selling book uses concrete examples, minimal theory, and production-ready Python frameworks (scikit-learn, Keras, and TensorFlow) to help you gain an intuitive understanding of the concepts and tools for building intelligent systems.

With this updated third edition, author Aurélien Géron explores a range of techniques, starting with simple linear regression and progressing to deep neural networks. Numerous code examples and exercises throughout…


Book cover of Dear Data

Roger Highfield, author of The Dance of Life: Symmetry, Cells and How We Become Human

From my list on what big data is and how it impacts us.

Roger's book list on what big data is and how it impacts us

Why did Roger love this book?

Over a single year, Giorgia Lupi, an Italian living in New York, and Stefanie Posavec, an American in London, exchanged hand-drawn postcards to chart the granular details of their lives using clusters, plots, and graphs. We featured the outpourings of these talented “information designers” in a 2016 Science Museum exhibition on big data and these striking images, in turn, paved the way for their book, Dear Data, which provides a remarkable portrait of these artists. An intimate and human take on big data that invites us all to ponder how to represent our own lives.   

By Giorgia Lupi, Stefanie Posavec

Why should I read it?

2 authors picked Dear Data as one of their favorite books, and they share why you should read it.

What is this book about?

From an award-winning project comes an inspiring, collaborative book that makes data artistic, personal - and open to all

Each week for a year, Giorgia and Stefanie sent each other a postcard describing what had happened to them during that week around a particular theme. But they didn't write it, they drew it: a week of smiling, a week of apologies, a week of desires.

Presenting their fifty-two cards, along with thoughts and ideas about the data-drawing process, Dear Data hopes to inspire you to draw, slow down and make connections with other people, to see the world through a…


Book cover of The Exponential Age: How Accelerating Technology is Transforming Business, Politics and Society

Tom Wheeler, author of From Gutenberg to Google: The History of Our Future

From my list on today’s roadmap to tomorrow.

Why am I passionate about this?

I have been fortunate to have spent the last 40 years of my professional life dealing with new networks and new technology, from the early days of cable television and mobile communications to the development of digital video and the transmission of data over cable lines and satellite. It was a career topped off with the privilege of being the Chairman of the Federal Communications Commission (FCC), with regulatory responsibility for approximately 1/6th of the American economy (on which the other 5/6ths depended).

Tom's book list on today’s roadmap to tomorrow

Why did Tom love this book?

Tech analyst and investor Azeem Azhar concisely pulls together his take on how the arc of technology has moved from linear to exponential, both in its development and in its impact on society and business.

Azhar brings great insight into how exponential growth – creating an “exponential gap” – has put strains not only on businesses, but also on government and society writ large. 

By Azeem Azhar

Why should I read it?

1 author picked The Exponential Age as one of their favorite books, and they share why you should read it.

What is this book about?

*2021 Financial Times Best Book of the Year*


A bold exploration and call-to-arms over the widening gap between AI, automation, and big data—and our ability to deal with its effects


We are living in the first exponential age.

High-tech innovations are created at dazzling speeds; technological forces we barely understand remake our homes and workplaces; centuries-old tenets of politics and economics are upturned by new technologies. It all points to a world that is getting faster at a dizzying pace.


Azeem Azhar, renowned technology analyst and host of the Exponential View podcast, offers a revelatory new model for understanding how…


Book cover of Forewarned: A Sceptic's Guide to Prediction

David F. Hendry, author of Forecasting: An Essential Introduction

From my list on getting an insight into forecasting.

Why am I passionate about this?

Accurate and precise forecasting is essential for successful planning and policy from economics to epidemiology. We have been keen to understand why so many forecasts turn out to be highly inaccurate since making dreadful forecasts ourselves, and advising UK government agencies (Treasury, Parliament, Bank of England) during turbulent periods. As simple extrapolation often beats model-based forecasting, we have been developing improved methods that draw on the best aspects of both, and have published more than 60 articles and 6 books attracting more than 6000 citations by other scholars. Our recommended books cover a wide range of forecasting methods—suggesting there is no optimal way to look into the future.

David's book list on getting an insight into forecasting

Why did David love this book?

When can we trust a forecast? Given how often forecasts end up being very wide of the mark, a degree of scepticism might well be warranted. Paul Goodwin provides an entertaining account of forecasting, arguing that intuition may serve us well in some settings, but that computer-based analysis of big data might be expected to prevail in others.        

By Paul Goodwin

Why should I read it?

1 author picked Forewarned as one of their favorite books, and they share why you should read it.

What is this book about?

Whether it's an unforeseen financial crash, a shock election result or a washout summer that threatens to ruin a holiday in the sun, forecasts are part and parcel of our everyday lives. We rely wholeheartedly on them, and become outraged when things don't go exactly to plan.

But should we really put so much trust in predictions? Perhaps gut instincts can trump years of methodically compiled expert knowledge? And when exactly is a forecast not a forecast? Forewarned will answer all of these intriguing questions, and many more.

Packed with fun anecdotes and startling facts, Forewarned is a myth-busting guide…

