52 books like Designing Data-Intensive Applications

By Martin Kleppmann

Here are 52 books that fans of Designing Data-Intensive Applications have personally recommended. Shepherd is a community of 12,000+ authors and super readers sharing their favorite books with the world.

Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow 3e: Concepts, Tools, and Techniques to Build Intelligent Systems

Tomasz Lelek Author Of Software Mistakes and Tradeoffs: How to make good programming decisions

From my list on big data processing ecosystem.

Why am I passionate about this?

I am motivated by working on products that many people use, and I've been part of companies whose products reach millions of people. To that end, I work in the Big Data ecosystem and strive to simplify it by contributing to Dremio's Data LakeHouse solution. I have worked on projects using Spark, HDFS, Cassandra, and Kafka. I have been working in the software engineering industry for ten years, and I've tried to share my experience and lessons learned in the Software Mistakes and Tradeoffs book, hoping that it will help current and future engineers create better software and, in turn, happier users.

Tomasz's book list on big data processing ecosystem

Why did Tomasz love this book?

The Hands-on Machine Learning book presents an end-to-end approach to many problems that can be solved with machine learning.

Every concept and topic is backed by running code that you can experiment with and adapt to your real-world problems.

Thanks to this book, you will understand the state of the art in machine learning and feel comfortable using the most up-to-date ML methods.

By Aurélien Géron

Why should I read it?

1 author picked Hands-On Machine Learning with Scikit-Learn, Keras, and TensorFlow 3e as one of their favorite books, and they share why you should read it.

What is this book about?

Through a recent series of breakthroughs, deep learning has boosted the entire field of machine learning. Now, even programmers who know close to nothing about this technology can use simple, efficient tools to implement programs capable of learning from data. This best-selling book uses concrete examples, minimal theory, and production-ready Python frameworks--scikit-learn, Keras, and TensorFlow--to help you gain an intuitive understanding of the concepts and tools for building intelligent systems.

With this updated third edition, author Aurelien Geron explores a range of techniques, starting with simple linear regression and progressing to deep neural networks. Numerous code examples and exercises throughout…


Kafka: The Definitive Guide: Real-Time Data and Stream Processing at Scale

Tomasz Lelek Author Of Software Mistakes and Tradeoffs: How to make good programming decisions

From my list on big data processing ecosystem.

Why did Tomasz love this book?

Apache Kafka is the backbone of almost every streaming-based system today.

The solutions designed and implemented in Kafka embody the key concepts of every streaming system you will work with.

This book will help you fully understand the Kafka architecture, its internals, and its APIs, and become an expert in this technology.

By Neha Narkhede, Gwen Shapira, Todd Palino

Why should I read it?

1 author picked Kafka as one of their favorite books, and they share why you should read it.

What is this book about?

Every enterprise application creates data, whether it's log messages, metrics, user activity, outgoing messages, or something else. And how to move all of this data becomes nearly as important as the data itself. If you're an application architect, developer, or production engineer new to Apache Kafka, this practical guide shows you how to use this open source streaming platform to handle real-time data feeds.

Engineers from Confluent and LinkedIn who are responsible for developing Kafka explain how to deploy production Kafka clusters, write reliable event-driven microservices, and build scalable stream-processing applications with this platform. Through detailed examples, you'll learn Kafka's…


Advanced Analytics with Spark: Patterns for Learning from Data at Scale

Tomasz Lelek Author Of Software Mistakes and Tradeoffs: How to make good programming decisions

From my list on big data processing ecosystem.

Why did Tomasz love this book?

Apache Spark has a very high barrier to entry for newcomers to the Big Data ecosystem.

However, it is a key tool that almost everyone uses for distributed processing. I recommend that everyone read this book before delving into production solutions based on Apache Spark.

This book will help you alleviate many Spark problems, such as serialization, memory utilization, and parallelization of processing.

By Sandy Ryza, Uri Laserson, Sean Owen, Josh Wills

Why should I read it?

1 author picked Advanced Analytics with Spark as one of their favorite books, and they share why you should read it.

What is this book about?

In this practical book, four Cloudera data scientists present a set of self-contained patterns for performing large-scale data analysis with Spark. The authors bring Spark, statistical methods, and real-world data sets together to teach you how to approach analytics problems by example. You'll start with an introduction to Spark and its ecosystem, and then dive into patterns that apply common techniques (classification, collaborative filtering, and anomaly detection, among others) to fields such as genomics, security, and finance. If you have an entry-level understanding of machine learning and statistics, and you program in Java, Python, or Scala, you'll find these patterns useful for…


Database Internals: A Deep-Dive Into How Distributed Data Systems Work

Tomasz Lelek Author Of Software Mistakes and Tradeoffs: How to make good programming decisions

From my list on big data processing ecosystem.

Why did Tomasz love this book?

Database Internals will take you one step further in your understanding of how distributed databases work.

The author has a lot of experience with one of the most successful distributed databases, Apache Cassandra, and shares his knowledge of the low-level details and internals of distributed databases.

By Alex Petrov

Why should I read it?

1 author picked Database Internals as one of their favorite books, and they share why you should read it.

What is this book about?

When it comes to choosing, using, and maintaining a database, understanding its internals is essential. But with so many distributed databases and tools available today, it's often difficult to understand what each one offers and how they differ. With this practical guide, Alex Petrov guides developers through the concepts behind modern database and storage engine internals.

Throughout the book, you'll explore relevant material gleaned from numerous books, papers, blog posts, and the source code of several open source databases. These resources are listed at the end of parts one and two. You'll discover that the most significant distinctions among many…


Kubernetes in Action

Magnus Larsson Author Of Microservices with Spring Boot 3 and Spring Cloud: Build resilient and scalable microservices using Spring Cloud, Istio, and Kubernetes

From my list on mastering Java and Spring-based microservices.

Why am I passionate about this?

My passion for developing production-ready, cooperating microservices began in 2008 when I first started assisting customers in creating distributed systems—long before the term “microservices” was coined. During that time, I faced significant challenges, including grappling with the “Eight Fallacies of Distributed Computing”. Since then, I’ve dedicated most of my career to deepening my understanding of these complexities and finding ways to address them through robust architecture, design patterns, and the right tools.

Magnus' book list on mastering Java and Spring-based microservices

Why did Magnus love this book?

Kubernetes is the go-to tool for orchestrating a landscape of cooperating microservices, making it a crucial skill to master.

This book guided me through Kubernetes, from the basics, such as pods, services, and deployments, to more advanced topics, like its inner workings and auto-scaling resources. What I particularly appreciate is the balance between theory and practical examples, reinforced by exercises with GitHub-hosted source code, which I also found helpful as a starting point for building real-world applications.

By Marko Luksa

Why should I read it?

3 authors picked Kubernetes in Action as one of their favorite books, and they share why you should read it.

What is this book about?

Description

With Kubernetes, users don't have to worry about which specific machine in their data center their application is running on. Each layer in their application is decoupled from other layers so they can scale, update, and maintain them independently.

Kubernetes in Action teaches developers how to use Kubernetes to deploy self-healing scalable distributed applications. By the end, readers will be able to build and deploy applications in a proper way to take full advantage of the Kubernetes platform.

Key features

* Easy to follow guide

* Hands-on examples

* Clearly-written

Audience

The book is for both application developers as…


Infrastructure as Code: Dynamic Systems for the Cloud Age

Yevgeniy Brikman Author Of Fundamentals of DevOps and Software Delivery: A Hands-On Guide to Deploying and Managing Software in Production

From my list on practical, hands-on books on DevOps and software delivery.

Why am I passionate about this?

I’ve spent more than a decade working on infrastructure, from my early days at LinkedIn, where we had to do a massive DevOps transformation to save the company, to co-founding Gruntwork, where I had the opportunity to work with hundreds of companies on their software delivery practices. From all of this, I can say the following with certainty: the DevOps best practices that a handful of the top tech companies have figured out are not filtering down to the rest of the industry. This is making the entire software industry slower, less effective, and less secure—and I see it as my mission to fix that.

Yevgeniy's book list on practical, hands-on books on DevOps and software delivery

Why did Yevgeniy love this book?

This is a book for practitioners, by a practitioner, full of practical learnings that I was able to start using in my work immediately.

I especially appreciated the parts teaching the core principles of infrastructure as code (e.g., systems are disposable, consistent, can easily be reproduced, etc.), core practices of infrastructure as code (e.g., use definition files, self-documented systems and processes, version all the things, etc.), and the idea of antifragile systems (rather than just systems that you prevent from breaking) and autonomic systems (rather than just automated systems).

By Kief Morris

Why should I read it?

1 author picked Infrastructure as Code as one of their favorite books, and they share why you should read it.

What is this book about?

Six years ago, Infrastructure as Code was a new concept. Today, as even banks and other conservative organizations plan moves to the cloud, development teams for companies worldwide are attempting to build large infrastructure codebases. With this practical book, Kief Morris of ThoughtWorks shows you how to effectively use principles, practices, and patterns pioneered by DevOps teams to manage cloud-age infrastructure.

Ideal for system administrators, infrastructure engineers, software developers, team leads, and architects, this updated edition demonstrates how you can exploit cloud and automation technology to make changes easily, safely, quickly, and responsibly. You'll learn how to define everything as…


Continuous Delivery: Reliable Software Releases through Build, Test, and Deployment Automation

Yevgeniy Brikman Author Of Fundamentals of DevOps and Software Delivery: A Hands-On Guide to Deploying and Managing Software in Production

From my list on practical, hands-on books on DevOps and software delivery.

Why did Yevgeniy love this book?

This is one of those books that changed how I thought about and approached software development. First, the book addressed the pain points that I had run into so often: the problems with infrequent, manual deployments, the outages caused by changing configuration rather than source code, the nightmare of merge conflicts that you get from long-lived feature branches, and so on.

Then, it showed how to flip the typical software development process on its head through CI / CD, changing the default from “our software is broken, and we need an integration and release process to get it working” to “our software is always working, and we can release it at any time.” Once I read it, I could never go back to the old way.

By Jez Humble, David Farley

Why should I read it?

1 author picked Continuous Delivery as one of their favorite books, and they share why you should read it.

What is this book about?

Winner of the 2011 Jolt Excellence Award!

Getting software released to users is often a painful, risky, and time-consuming process. This groundbreaking new book sets out the principles and technical practices that enable rapid, incremental delivery of high quality, valuable new functionality to users. Through automation of the build, deployment, and testing process, and improved collaboration between developers, testers, and operations, delivery teams can get changes released in a matter of hours (sometimes even minutes), no matter what the size of a project or the complexity of its code base.

Jez Humble and David Farley begin by presenting the foundations of a rapid,…


The Practice of Cloud System Administration: DevOps and SRE Practices for Web Services, Volume 2

Yevgeniy Brikman Author Of Fundamentals of DevOps and Software Delivery: A Hands-On Guide to Deploying and Managing Software in Production

From my list on practical, hands-on books on DevOps and software delivery.

Why did Yevgeniy love this book?

This book felt like a chance to sit with a few experienced Ops people and hear their war stories.

The book is full of concrete, actionable learnings that are essential for running software, including operational requirements (e.g., configuration, draining, hot swaps, feature toggles, graceful degradation, etc.), software architecture (e.g., three-tier web service, four-tier web service, load balancing models, etc.), scaling patterns (e.g., horizontal duplication, service splits, caching, etc.), resiliency patterns (e.g., software vs. hardware resiliency, spare capacity, failure domains, etc.), and much more.

I loved being able to pick up decades of experience and hard-won knowledge by just flipping through a few pages of a book!

By Thomas Limoncelli, Strata Chalup, Christina Hogan

Why should I read it?

1 author picked The Practice of Cloud System Administration as one of their favorite books, and they share why you should read it.

What is this book about?

"There's an incredible amount of depth and thinking in the practices described here, and it's impressive to see it all in one place."

-Win Treese, coauthor of Designing Systems for Internet Commerce

The Practice of Cloud System Administration, Volume 2, focuses on "distributed" or "cloud" computing and brings a DevOps/SRE sensibility to the practice of system administration. Unsatisfied with books that cover either design or operations in isolation, the authors created this authoritative reference centered on a comprehensive approach.

Case studies and examples from Google, Etsy, Twitter, Facebook, Netflix, Amazon, and other industry giants are explained in practical ways that…


Master Your Data with Power Query in Excel and Power BI: Leveraging Power Query to Get & Transform Your Task Flow

Bill Jelen Author Of Power Excel 2019 with MrExcel: Master Pivot Tables, Subtotals, VLOOKUP, Power Query, Dynamic Arrays & Data Analysis

From my list on going from Excel to Power Query and Power BI.

Why am I passionate about this?

I’ve been running the MrExcel website since 1998 and have written 66 books about Excel. I am an Excel generalist – I know a fair amount about almost every aspect of Excel. But I respect the specialists who become experts on one part of Excel and offer deep knowledge dives into those portions of Excel. Cleaning data with Power Query, calculating “impossible” calculations with DAX, and then presenting them on interactive dashboards are some of the deep dives that you will learn on this list.

Bill's book list on going from Excel to Power Query and Power BI

Why did Bill love this book?

Microsoft quietly slipped the Get & Transform tools onto the Data tab in Excel in 2016. These tools are incredibly powerful – you clean your data once and Excel will remember how to clean your data every month, every week, every day, every hour. Ken Puls and Miguel Escobar will show you all of the best tricks for using these tools.

By Ken Puls, Miguel Escobar

Why should I read it?

1 author picked Master Your Data with Power Query in Excel and Power BI as one of their favorite books, and they share why you should read it.

What is this book about?

Power Query is the amazing new data cleansing tool in both Excel and Power BI Desktop. Do you find yourself performing the same data cleansing steps day after day? Power Query will make it faster to clean your data the first time. While Power Query is powerful, the interface is subtle—there are tools hiding in plain sight that are easy to miss. Go beyond the obvious and take Power Query to new levels with this book.


Kafka in Action

Magnus Larsson Author Of Microservices with Spring Boot 3 and Spring Cloud: Build resilient and scalable microservices using Spring Cloud, Istio, and Kubernetes

From my list on mastering Java and Spring-based microservices.

Why did Magnus love this book?

Apache Kafka is the industry standard for real-time event streaming, an essential component for large-scale, high-performance microservice ecosystems.

Although I was new to Kafka when I read this book, it quickly brought me up to speed on key concepts that underpin its scalability and real-time capabilities, such as the commit log, topic partitions, and consumer groups. The book also introduces other critical Kafka features like the schema registry, Kafka Connect, and stream processing with Kafka Streams and ksqlDB. The practical examples provided were straightforward to apply and adapt to my own use cases.

By Dylan Scott, Viktor Gamov, Dave Klein

Why should I read it?

1 author picked Kafka in Action as one of their favorite books, and they share why you should read it.

What is this book about?

Kafka in Action is a practical, hands-on guide to building Kafka-based data pipelines. Filled with real-world use cases and scenarios, this book probes Kafka's most common use cases, ranging from simple logging through managing streaming data systems for message routing, analytics, and more.

In systems that handle big data, streaming data, or fast data, it's important to get your data pipelines right. Apache Kafka is a wicked-fast distributed streaming platform that operates as more than just a persistent log or a flexible message queue.

Key Features

* Understanding Kafka's concepts

* Implementing Kafka as a message queue

* Setting up…

