30 books like Database Internals

By Alex Petrov

Here are 30 books that Database Internals fans have personally recommended if you like Database Internals. Shepherd is a community of 10,000+ authors and super readers sharing their favorite books with the world.


Book cover of Designing Data-Intensive Applications: The Big Ideas Behind Reliable, Scalable, and Maintainable Systems

Tomasz Lelek Author Of Software Mistakes and Tradeoffs: How to make good programming decisions

From my list on big data processing ecosystem.

Why am I passionate about this?

I am motivated by working on products that many people use, and I've been part of companies that deliver products impacting millions of people. To that end, I work in the Big Data ecosystem and strive to simplify it by contributing to Dremio's Data LakeHouse solution. I have worked on projects using Spark, HDFS, Cassandra, and Kafka. I have been in the software engineering industry for ten years, and I've tried to share my experience and lessons learned in the Software Mistakes and Tradeoffs book, hoping that it will help the current and next generations of engineers create better software and, in turn, happier users.

Tomasz's book list on big data processing ecosystem

Why did Tomasz love this book?

Designing Data-Intensive Applications is the best book for learning the main principles behind every system that stores and processes large amounts of data.

You'll learn about distributed storage systems, their tradeoffs (availability, consistency, fault-tolerance), stream processing systems, and the main algorithms.

These are the critical concepts behind almost every successful company that needs to create scalable solutions.

By Martin Kleppmann

Why should I read it?

1 author picked Designing Data-Intensive Applications as one of their favorite books, and they share why you should read it.

What is this book about?

Data is at the center of many challenges in system design today. Difficult issues need to be figured out, such as scalability, consistency, reliability, efficiency, and maintainability. In addition, we have an overwhelming variety of tools, including NoSQL datastores, stream or batch processors, and message brokers. What are the right choices for your application? How do you make sense of all these buzzwords? In this practical and comprehensive guide, author Martin Kleppmann helps you navigate this diverse landscape by examining the pros and cons of various technologies for processing and storing data. Software keeps changing, but the fundamental principles remain…


Book cover of Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow 3e: Concepts, Tools, and Techniques to Build Intelligent Systems

Tomasz Lelek Author Of Software Mistakes and Tradeoffs: How to make good programming decisions

From my list on big data processing ecosystem.


Why did Tomasz love this book?

The Hands-on Machine Learning book presents an end-to-end approach to many problems that can be solved with machine learning.

Every concept and topic is backed by running code that you can experiment with and adapt to your real-world problems.

Thanks to this book, you will be able to understand the state of the art of today's machine learning and feel comfortable using the most up-to-date ML methods.

By Aurélien Géron

Why should I read it?

1 author picked Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow 3e as one of their favorite books, and they share why you should read it.

What is this book about?

Through a recent series of breakthroughs, deep learning has boosted the entire field of machine learning. Now, even programmers who know close to nothing about this technology can use simple, efficient tools to implement programs capable of learning from data. This best-selling book uses concrete examples, minimal theory, and production-ready Python frameworks--scikit-learn, Keras, and TensorFlow--to help you gain an intuitive understanding of the concepts and tools for building intelligent systems.

With this updated third edition, author Aurélien Géron explores a range of techniques, starting with simple linear regression and progressing to deep neural networks. Numerous code examples and exercises throughout…


Book cover of Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale

Tomasz Lelek Author Of Software Mistakes and Tradeoffs: How to make good programming decisions

From my list on big data processing ecosystem.


Why did Tomasz love this book?

Apache Kafka is the backbone of almost every streaming-based system today.

The solutions created and implemented in Kafka are the key concepts in every streaming system that you will work with.

This book will help you fully understand Kafka's architecture, internals, and APIs, and become an expert in this technology.

By Neha Narkhede, Gwen Shapira, Todd Palino

Why should I read it?

1 author picked Kafka as one of their favorite books, and they share why you should read it.

What is this book about?

Every enterprise application creates data, whether it's log messages, metrics, user activity, outgoing messages, or something else. And how to move all of this data becomes nearly as important as the data itself. If you're an application architect, developer, or production engineer new to Apache Kafka, this practical guide shows you how to use this open source streaming platform to handle real-time data feeds.

Engineers from Confluent and LinkedIn who are responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream-processing applications with this platform. Through detailed examples, you'll learn Kafka's…


Book cover of Advanced Analytics with Spark: Patterns for Learning from Data at Scale

Tomasz Lelek Author Of Software Mistakes and Tradeoffs: How to make good programming decisions

From my list on big data processing ecosystem.


Why did Tomasz love this book?

Apache Spark has a steep learning curve for newcomers to the Big Data ecosystem.

However, it is a key tool that almost everyone uses for distributed processing. I recommend that everyone read this book before delving into production solutions based on Apache Spark.

This book will help you alleviate many Spark problems, such as those with serialization, memory utilization, and parallelization of processing.

By Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills

Why should I read it?

1 author picked Advanced Analytics with Spark as one of their favorite books, and they share why you should read it.

What is this book about?

In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. You'll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques (classification, collaborative filtering, and anomaly detection, among others) to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you'll find these patterns useful for…


Book cover of Master Your Data with Power Query in Excel and Power BI: Leveraging Power Query to Get & Transform Your Task Flow

Bill Jelen Author Of Power Excel 2019 with MrExcel: Master Pivot Tables, Subtotals, VLOOKUP, Power Query, Dynamic Arrays & Data Analysis

From my list on to go from Excel to Power Query and Power BI.

Why am I passionate about this?

I’ve been running the MrExcel website since 1998 and have written 66 books about Excel. I am an Excel generalist – I know a fair amount about almost every aspect of Excel. But I respect the specialists who become experts on one part of Excel and offer deep knowledge dives into those portions of Excel. Cleaning data with Power Query, calculating “impossible” calculations with DAX, and then presenting them on interactive dashboards are some of the deep dives that you will learn on this list.

Bill's book list on to go from Excel to Power Query and Power BI

Why did Bill love this book?

Microsoft quietly slipped the Get & Transform tools onto the Data tab in Excel in 2016. These tools are incredibly powerful – you clean your data once and Excel will remember how to clean your data every month, every week, every day, every hour. Ken Puls and Miguel Escobar will show you all of the best tricks for using these tools.

By Ken Puls, Miguel Escobar

Why should I read it?

1 author picked Master Your Data with Power Query in Excel and Power BI as one of their favorite books, and they share why you should read it.

What is this book about?

Power Query is the amazing new data cleansing tool in both Excel and Power BI Desktop. Do you find yourself performing the same data cleansing steps day after day? Power Query will make it faster to clean your data the first time. While Power Query is powerful, the interface is subtle—there are tools hiding in plain sight that are easy to miss. Go beyond the obvious and take Power Query to new levels with this book.


Book cover of R for Data Science: Import, Tidy, Transform, Visualize, and Model Data

Tilman M. Davies Author Of The Book of R: A First Course in Programming and Statistics

From my list on intro to programming and data science with R.

Why am I passionate about this?

I’m an applied statistician and academic researcher/lecturer at New Zealand’s oldest university – the University of Otago. R facilitates everything I do – research, academic publication, and teaching. It’s the latter part of my job that motivated my own book on R. From first-year statistics students who have never seen R to my own Ph.D. students using R to implement novel and highly complex statistical methods and models, my experience is that all ultimately love the ease with which the R language permits exploration, visualisation, analysis, and inference of one’s data. The ever-growing need in today’s society for skilled statisticians and data scientists means there's never been a better time to learn this essential language.

Tilman's book list on intro to programming and data science with R

Why did Tilman love this book?

For those intending to use R with an eye on the popular 'Tidyverse' suite of packages – which facilitate the handling, manipulation, and visualisation of data sets – it's hard to go past this book. From the founding contributors of the RStudio/Tidyverse worlds, this is a great way to learn about this dialect of R against the overarching backdrop of statistical data analysis and data science.

By Hadley Wickham, Garrett Grolemund

Why should I read it?

1 author picked R for Data Science as one of their favorite books, and they share why you should read it.

What is this book about?

Learn how to use R to turn raw data into insight, knowledge, and understanding. This book introduces you to R, RStudio, and the tidyverse, a collection of R packages designed to work together to make data science fast, fluent, and fun. Suitable for readers with no previous programming experience, R for Data Science is designed to get you doing data science as quickly as possible. Authors Hadley Wickham and Garrett Grolemund guide you through the steps of importing, wrangling, exploring, and modeling your data and communicating the results. You'll get a complete, big-picture understanding of the data science cycle, along…


Book cover of An Ugly Truth: Inside Facebook's Battle for Domination

Roger Highfield Author Of The Dance of Life: Symmetry, Cells and How We Become Human

From my list on what big data is and how it impacts us.

Why am I passionate about this?

I’m the Science Director of the Science Museum Group, based at the Science Museum in London, and visiting professor at the Dunn School, University of Oxford, and Department of Chemistry, University College London. Every time I write a book I swear that it will be my last and yet I'm now working on my ninth, after earlier forays into the physics of Christmas and the love life of Albert Einstein. Working with Peter Coveney of UCL, we're exploring ideas about computation and complexity we tackled in our two earlier books, along with the revolutionary implications of creating digital twins of people from the colossal amount of patient data now flowing from labs worldwide.

Roger's book list on what big data is and how it impacts us

Why did Roger love this book?

‘They trust me….dumb f*cks.’ This telling exchange from the Harvard days of Facebook co-founder and CEO Mark Zuckerberg appears in An Ugly Truth, which shines a harsh light on the tech behemoth that, ultimately, is built on the data of billions of people. As Meta, Zuckerberg’s new business incarnation, wafts into the virtual worlds of the metaverse, the story of Facebook is far from over, which makes this engaging book a tad unsatisfying. Nonetheless, it is a vivid example of how with Big Data comes Big Responsibility.

By Sheera Frenkel, Cecilia Kang

Why should I read it?

1 author picked An Ugly Truth as one of their favorite books, and they share why you should read it.

What is this book about?

'An explosive new book' Daily Mail

'[A] careful, comprehensive interrogation of every major Facebook scandal. An Ugly Truth provides the kind of satisfaction you might get if you hired a private investigator to track a cheating spouse: it confirms your worst suspicions and then gives you all the dates and details you need to cut through the company's spin' New York Times


Award-winning New York Times reporters Sheera Frenkel and Cecilia Kang unveil the tech story of our times in this riveting, behind-the-scenes expose that offers the definitive account of Facebook's fall from grace. Once one of Silicon Valley's…


Book cover of Information is Beautiful

Roger Highfield Author Of The Dance of Life: Symmetry, Cells and How We Become Human

From my list on what big data is and how it impacts us.


Why did Roger love this book?

Big data can be beautiful and visualisations make for a wonderful coffee-table book. In Information is Beautiful, David McCandless turns dry-as-dust data into pop art to show the kind of world we live in, linking politics to life expectancy, women’s education to GDP growth, and more. Through colourful graphics, we get vivid and novel perspectives on current obsessions, from maps of cliches to the most fashionable colours. The book is a testament to how the power of big data comes from being able to distill information to reveal hidden patterns and discern trends.

By David McCandless

Why should I read it?

1 author picked Information is Beautiful as one of their favorite books, and they share why you should read it.

What is this book about?

A visual guide to the way the world really works

Every day, every hour, every minute we are bombarded by information - from television, from newspapers, from the internet, we're steeped in it, maybe even lost in it. We need a new way to relate to it, to discover the beauty and the fun of information for information's sake.
No dry facts, theories or statistics. Instead, Information is Beautiful contains visually stunning displays of information that blend the facts with their connections, their context and their relationships - making information meaningful, entertaining and beautiful.
This is information like you have…


Book cover of Forewarned: A Sceptic's Guide to Prediction

David F. Hendry Author Of Forecasting: An Essential Introduction

From my list on getting an insight into forecasting.

Why am I passionate about this?

Accurate and precise forecasting is essential for successful planning and policy from economics to epidemiology. We have been keen to understand why so many forecasts turn out to be highly inaccurate since making dreadful forecasts ourselves, and advising UK government agencies (Treasury, Parliament, Bank of England) during turbulent periods. As simple extrapolation often beats model-based forecasting, we have been developing improved methods that draw on the best aspects of both, and have published more than 60 articles and 6 books attracting more than 6000 citations by other scholars. Our recommended books cover a wide range of forecasting methods—suggesting there is no optimal way to look into the future.

David's book list on getting an insight into forecasting

Why did David love this book?

When can we trust a forecast? Given how often forecasts end up being very wide of the mark, a degree of scepticism might well be warranted. Paul Goodwin provides an entertaining account of forecasting, arguing that intuition may serve us well in some settings, but that computer-based analysis of big data might be expected to prevail in others.

By Paul Goodwin

Why should I read it?

1 author picked Forewarned as one of their favorite books, and they share why you should read it.

What is this book about?

Whether it's an unforeseen financial crash, a shock election result or a washout summer that threatens to ruin a holiday in the sun, forecasts are part and parcel of our everyday lives. We rely wholeheartedly on them, and become outraged when things don't go exactly to plan.

But should we really put so much trust in predictions? Perhaps gut instincts can trump years of methodically compiled expert knowledge? And when exactly is a forecast not a forecast? Forewarned will answer all of these intriguing questions, and many more.

Packed with fun anecdotes and startling facts, Forewarned is a myth-busting guide…


Book cover of The Genome War: How Craig Venter Tried to Capture the Code of Life and Save the World

Roger Highfield Author Of The Dance of Life: Symmetry, Cells and How We Become Human

From my list on what big data is and how it impacts us.


Why did Roger love this book?

This might not look like a big data book but, for me, the race to read the human genome marks the birth of big data in biology, in the form of a tsunami of DNA sequencing data. I edited Craig Venter’s A Life Decoded, the first genetic autobiography, which explored the implications of becoming the first person to gaze upon all six billion ‘letters’ of their own genetic code. While working on Craig’s extraordinary story I came across The Genome War and thought James Shreeve did a brilliant job in describing the drama, rivalry, and personalities in the race to sequence the very first human genomes between government-backed scientists and Celera, Craig’s company.

By James Shreeve

Why should I read it?

2 authors picked The Genome War as one of their favorite books, and they share why you should read it.

What is this book about?

The long-awaited story of the science, the business, the politics, the intrigue behind the scenes of the most ferocious competition in the history of modern science—the race to map the human genome.
On May 10, 1998, biologist Craig Venter, director of the Institute for Genomic Research, announced that he was forming a private company that within three years would unravel the complete genetic code of human life—seven years before the projected finish of the U.S. government’s Human Genome Project. Venter hoped that by decoding the genome ahead of schedule, he would speed up the pace of biomedical research and save…


5 book lists we think you will like!

Interested in big data, data mining, and artificial intelligence?

10,000+ authors have recommended their favorite books and what they love about them. Browse their picks for the best books about big data, data mining, and artificial intelligence.

Big Data: Explore 29 books about big data
Data Mining: Explore 13 books about data mining
Artificial Intelligence: Explore 284 books about artificial intelligence