The best books for data scientists trying to be ethical people

By Aubrey Clayton

Who am I?

I studied statistics and data science for years before anyone ever suggested to me that these topics might have an ethical dimension, or that my numerical tools were products of human beings with motivations specific to their time and place. I’ve since written about the history and philosophy of mathematical probability and statistics, and I’ve come to understand just how important that historical background is, and how critical it is that the next generation of data scientists understand where these ideas come from and their potential to do harm. I hope anyone who reads these books avoids getting blinkered by the ideas that data = objectivity and that science is morally neutral.


I wrote...

Bernoulli's Fallacy: Statistical Illogic and the Crisis of Modern Science

By Aubrey Clayton

What is my book about?

There is a logical flaw in the statistical methods used across experimental science. This fault is not a minor academic quibble: it underlies a reproducibility crisis now threatening entire disciplines. In an increasingly statistics-reliant society, this same deeply rooted error shapes decisions in medicine, law, and public policy with profound consequences. The foundation of the problem is a misunderstanding of probability and its role in making inferences from observations.

Aubrey Clayton traces the history of how statistics went astray, beginning with the groundbreaking work of the seventeenth-century mathematician Jacob Bernoulli and winding through gambling, astronomy, and genetics. Clayton recounts the feuds among rival schools of statistics, exploring the surprisingly human problems that gave rise to the discipline and the all-too-human shortcomings that derailed it. 

The books I picked & why

Social Sciences as Sorcery

By Stanislav Andreski

Why this book?

This book is now 50 years old, but its message is as relevant and important now as when it was written. In a series of witty essays that border on rants, Andreski attacks much of social science as fluff obscured by technical jargon and methodology. In particular, he laments the growth of quantitative methods as an attempt to add objectivity to social science and make it appear “harder.” True objectivity is about more than mechanical number-crunching, he says; it’s about a commitment to fairness and resisting the temptations of wishful thinking – a challenge anyone who works with data concerning people and their lives should take seriously.


Biology as Ideology: The Doctrine of DNA

By Richard C. Lewontin

Why this book?

People need less Dawkins in their lives and more Lewontin, whose thought-provoking, accessible writing about evolutionary biology stands in fierce opposition to the trend toward genetic determinism that seems to be all the rage nowadays. We are not simply our genes, Lewontin says, because the effects DNA has on our lives are mediated by social and environmental factors, many of which we can influence. While it’s nominally about biology, I also read this as a critique of causal inference generally. What we consider a “cause” reveals our ideological commitment to maintaining certain aspects of the world, and we should be careful what causal lessons we draw from data.


The Golem: What You Should Know about Science

By Harry M. Collins, Trevor Pinch

Why this book?

The thing you should know about science is that it’s a human enterprise. As a result, it’s dependent on human factors like social consensus and prejudice. In this series of case studies of famously expensive and difficult-to-replicate experiments probing the limits of scientific understanding from biology to theoretical physics, Collins and Pinch show how scientific knowledge gathering is rarely straightforward because there are always alternative explanations available for the data. Was the phenomenon real or was the experiment set up badly? We can never know for sure, but we decide collectively what we believe. Scientists are experts participating in human culture, they argue, not mysterious clergy issuing declarations of absolute truth.


Superior: The Return of Race Science

By Angela Saini

Why this book?

The fact that race is a social construct and not a biological reality seems to be a lesson that we are destined to learn and re-learn many times. Saini uses a personal, journalistic style to tell the story of the pernicious myth of biological race in the sciences, drawing a continuous line from scientific racists like Francis Galton in the 1800s to present-day medicine and right-wing politics. The story is alternately funny and horrifying, with incredibly timely significance. It should be read by all data-adjacent individuals as a cautionary tale about avoiding the mistakes of the past and present. 


Data Feminism

By Catherine D'Ignazio, Lauren F. Klein

Why this book?

If you’ve never thought of “intersectional feminism” or “the gender binary” as essentially data-scientific terms, please allow this book to correct that. Data science is a locus of power, and that power can be wielded in the service of oppression or liberation. This book raises essential questions about the predominantly white, male, technocratic interests served by the traditional narratives of data analysis and what feminism and data science have to offer each other. Bottom line: the data doesn’t speak for itself, never has, and never will.

