Why am I passionate about this?

I started my career as a research scientist building machine learning algorithms for weather forecasting. Twenty years later, I found myself at a precision agriculture startup creating models that provided guidance to farmers on when to plant, what to plant, etc. So, I am part of the movement from academia to industry. Now, at Google Cloud, my team builds cross-industry solutions and I see firsthand what our customers need in their data science teams. This set of books is what I suggest when a CTO asks how to upskill their workforce, or when a graduate student asks me how to break into the industry.


I wrote

Data Science on the Google Cloud Platform: Implementing End-To-End Real-Time Data Pipelines: From Ingest to Machine Learning

By Valliappa Lakshmanan


The books I picked & why

Effective Pandas

Why did I love this book?

Even if you are ultimately going to be working with terabytes of data, you’ll start out doing exploratory data analysis. The tool that you’ll use for that is most likely going to be Pandas. One of the best investments that you can make when becoming a data scientist is to become a Pandas expert, and there is no better book than Harrison’s to help you get there. Plus, many of the interview questions you will face during the hiring process will probably involve Pandas. Blow your interviewers out of the water by showing them corners of the Pandas library they didn’t even know existed!

By Matt Harrison

Why should I read it?

1 author picked Effective Pandas as one of their favorite books, and they share why you should read it.

What is this book about?

Best practices for manipulating data with Pandas. This book will arm you with years of knowledge and experience that are condensed into an easy to follow format. Rather than taking months reading blogs and websites and searching mailing lists and groups, this book will teach you how to write good Pandas code.

It covers: Series manipulation; creating columns; summary statistics; grouping, pivoting, and cross-tabulation; time series data; visualization; chaining; debugging code; and more.


Jumpstart Snowflake: A Step-by-Step Guide to Modern Cloud Analytics

Why did I love this book?

In industry, your data is very likely to live within a data warehouse such as BigQuery, Redshift, or Snowflake. Therefore, to be an effective data scientist in the industry, you should learn how to use data warehouses effectively. 

Once you learn data warehousing and SQL with any one of these products, it is quite easy to pick up another. So which one do you start with?

You can use Snowflake on all three of the major public clouds. Because it’s a standalone product, it is the most similar to a “traditional” data warehouse and can be picked up easily even if you are not familiar with cloud computing. That makes it a good data warehouse to start with, and is the reason my second book pick is this book on Snowflake.

BigQuery is also available on all three major public clouds, but it works best (and is used most commonly) on Google Cloud. Because BigQuery is truly serverless (you pay by the query and never deal with clusters or virtual data warehouses), it is quite unlike traditional data warehouses, and you will have to learn some public cloud concepts in order to use it. On the other hand, starting with BigQuery has several advantages: first, it offers 1 TB of querying per month for free; second, it has machine learning built in, and Google Colab even offers a free Jupyter notebook from which to access BigQuery; and third, it’s the best choice for production use cases, as BigQuery is typically more scalable and less expensive than the alternatives. If you are willing to learn public cloud, start with the Definitive Guide to BigQuery.

AWS is the most widely used cloud, and Redshift is the most widely used data warehouse on AWS. Your organization probably already has a Redshift cluster set up and ready to go. The path of least resistance might be to learn data warehousing using the AWS book on Redshift.

By Dmitry Anoshin, Dmitry Shirokov, Donna Strok

Why should I read it?

1 author picked Jumpstart Snowflake as one of their favorite books, and they share why you should read it.

What is this book about?

Explore the modern market of data analytics platforms and the benefits of using Snowflake computing, the data warehouse built for the cloud.

With the rise of cloud technologies, organizations prefer to deploy their analytics using cloud providers such as Amazon Web Services (AWS), Microsoft Azure, or Google Cloud Platform. Cloud vendors are offering modern data platforms for building cloud analytics solutions to collect data and consolidate into single storage solutions that provide insights for business users. The core of any analytics framework is the data warehouse, and previously customers did not have many choices of platform to use.

Snowflake was…


Predictive Analytics: The Power to Predict Who Will Click, Buy, Lie, or Die

Why did I love this book?

As a data scientist in the industry, it is very helpful to understand the business context behind the problems that you are solving. In many cases, you are trying to predict behavior—who is likely to buy an item, who is likely to click on a link, who is likely to repay a loan, etc.

This book by Eric Siegel is a great introduction to predictive analytics as used in real life. It will help you frame data science problems in standard ways. For example, suppose you are asked to score sales leads so that salespeople can prioritize their efforts. How would you do it? The common way to frame this problem is to predict the customer lifetime value (LTV) of every sales lead. Before you can do prediction, though, you have to be able to do analysis.

The way you estimate the LTV is to break the problem into three sub-problems: finding the average order value, the average number of transactions per year, and how long an average customer sticks with your product. Once you know how to estimate the LTV of existing customers, you will be able to create a system to predict LTV by comparing the attributes of the sales lead to your existing customer base. This is by no means obvious, and reading a book like this will help you learn the typical approach.
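To make the three-part breakdown concrete, here is a minimal sketch in Python. It assumes the three estimates are simply multiplied together, which is a common simplification and not something spelled out above. The function names and the sample numbers are hypothetical, chosen only for illustration.

```python
def average_order_value(order_amounts):
    """Mean value of a list of order amounts (hypothetical helper)."""
    return sum(order_amounts) / len(order_amounts)


def estimate_ltv(avg_order_value, orders_per_year, lifespan_years):
    """Simple LTV estimate: the product of the three sub-problem answers.

    Assumption: the three quantities are treated as independent averages
    and multiplied; real systems often refine this (discounting, churn
    models, margins), none of which is modeled here.
    """
    return avg_order_value * orders_per_year * lifespan_years


# Illustrative, made-up inputs for an existing customer base.
aov = average_order_value([40.0, 60.0])  # average order value
ltv = estimate_ltv(aov, orders_per_year=4, lifespan_years=3)
print(f"Estimated LTV: {ltv:.2f}")
```

Keeping the three inputs as separate arguments mirrors the decomposition: each one can be estimated (and later predicted for a new sales lead) on its own before being combined.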

By Eric Siegel

Why should I read it?

1 author picked Predictive Analytics as one of their favorite books, and they share why you should read it.

What is this book about?

"Mesmerizing & fascinating..." -The Seattle Post-Intelligencer

"The Freakonomics of big data." -Stein Kretsinger, founding executive of Advertising.com

Award-winning | Used by over 30 universities | Translated into 9 languages

An introduction for everyone. In this rich, fascinating - surprisingly accessible - introduction, leading expert Eric Siegel reveals how predictive analytics (aka machine learning) works, and how it affects everyone every day. Rather than a "how to" for hands-on techies, the book serves lay readers and experts alike by covering new case studies and the latest state-of-the-art techniques.

Prediction is booming. It reinvents industries and runs the world. Companies, governments, law…


The Art of Statistics: How to Learn from Data

Why did I love this book?

What if you are faced with a problem for which a standard approach doesn’t yet exist? In such a case, you will need to be able to figure out the approach from first principles. This book will help you learn how to derive insights starting from raw data.

By David Spiegelhalter

Why should I read it?

2 authors picked The Art of Statistics as one of their favorite books, and they share why you should read it.

What is this book about?

'A statistical national treasure' Jeremy Vine, BBC Radio 2

'Required reading for all politicians, journalists, medics and anyone who tries to influence people (or is influenced) by statistics. A tour de force' Popular Science

Do busier hospitals have higher survival rates? How many trees are there on the planet? Why do old men have big ears? David Spiegelhalter reveals the answers to these and many other questions - questions that can only be addressed using statistical science.

Statistics has played a leading role in our scientific understanding of the world for centuries, yet we are all familiar with the way…


Fundamentals of Data Visualization: A Primer on Making Informative and Compelling Figures

Why did I love this book?

It is not enough for a data scientist to be able to analyze data and build ML models. You have to be able to communicate the insights to decision-makers concisely and accurately. This book shows you bad and good visualizations side by side, and you’ll be surprised by how often you would have defaulted to the bad way without its guidance!

By Claus O. Wilke

Why should I read it?

1 author picked Fundamentals of Data Visualization as one of their favorite books, and they share why you should read it.

What is this book about?

Effective visualization is the best way to communicate information from the increasingly large and complex datasets in the natural and social sciences. But with the increasing power of visualization software today, scientists, engineers, and business analysts often have to navigate a bewildering array of visualization choices and options.

This practical book takes you through many commonly encountered visualization problems, and it provides guidelines on how to turn large datasets into clear and compelling figures. What visualization type is best for the story you want to tell? How do you make informative figures that are visually pleasing? Author Claus O. Wilke…


Explore my book 😀

Data Science on the Google Cloud Platform: Implementing End-To-End Real-Time Data Pipelines: From Ingest to Machine Learning

By Valliappa Lakshmanan

What is my book about?

This hands-on guide shows data engineers and data scientists how to implement an end-to-end data pipeline, using statistical and machine learning methods and tools on Google Cloud Platform (GCP).

Through the course of this updated second edition, you'll work through a sample business decision by employing a variety of data science approaches. Follow along by implementing these statistical and machine learning solutions in your own project on GCP, and discover how this platform provides a transformative and more collaborative way of doing data science.
