Free Ebook Review: Cloud Data Lakes for Dummies, Snowflake Special Edition
Lately I’ve been getting targeted advertisements on Instagram (of all places) that promote free ebooks from various organizations. With all the time on my hands, I thought it might be fun to review them. Here’s the first entry: a book on Cloud Data Lakes provided by Snowflake.
The Book:
Title: Cloud Data Lakes for Dummies, Snowflake Special Edition
Author: David Baum
Length: 52 Pages
Chapters:
- Diving into Cloud Data Lakes
- Explaining Why the Modern Data Lake Emerged
- Reducing Risk, Protecting Data
- Strategies for Modernizing a Data Lake
- Assessing the Benefits of a Modern Cloud Data Lake
- Six Steps for Planning Your Cloud Data Lake
Review
This book is a decent high-level overview of what modern Data Lakes are, aimed at a non-technical crowd. It doesn’t offer many tangible insights into what implementing a modern cloud Data Lake might entail, which is unfortunate but understandable. As with many technical implementations, the devil is in the details: the technologies chosen and the intricacies of each platform (e.g. AWS S3 vs GCP GCS). I also found the chapters slightly repetitive, but there are some good takeaways.
In the author’s words, modern data lakes are:
“a place where structured and semi-structured data can be staged in its raw form - either in the data warehouse itself or in an associated object storage service. Modern data lakes provide a harmonious environment that blends these object storage options to easily store, load, integrate, and analyze data in order to derive the deepest insights to inform data-driven decision-making.”
Here’s an example of how vague the book can be. In Chapter 2 (Explaining Why the Modern Data Lake Emerged), the author describes the difference between traditional data warehouses, data lakes, and modern data lakes:
“Traditional data lakes […] are capable of storing these mixed data types. That’s just the start, though. In order to analyze that data, you need deeply technical data analytics and data science professionals, who are in short supply. If you can hire these experts, they may end up spending an inordinate amount of time deriving usable insights from the data. If you’re relying on either a traditional data warehouse or a traditional data lake, you’ll rarely gain all insights possible.
With a modern, cloud-built data lake, you get the power of a data warehouse and the flexibility of the data lake, and you leave the limitations of both systems behind. You also get the unlimited resources of the cloud automatically.”
One of the best takeaways in this book comes in Chapter 4 (Strategies for Modernizing a Data Lake):
Five Characteristics of a Data Lake Built for the Cloud:
A cloud-optimized architecture will simplify your data lake. For maximum flexibility and control, make sure that your cloud data lake service has the following characteristics:
- A multi-cluster, shared-data architecture
- Independent scaling of compute and storage resources
- The ability to add users without affecting performance
- Tools to load and query data simultaneously, without degrading performance
- A robust metadata service that is fundamental to the object storage environment
Lastly, in Chapter 6 (Six Steps for Planning Your Cloud Data Lake), the author provides the following steps (the excerpt below shows only the list headers):
- Identify the data mix
- Consider the repository
- Define the data pipeline
- Check pertinent use cases
- Apply governance and security
- Keep it simple
Score: 3/5*
Cloud Data Lakes for Dummies, Snowflake Special Edition is a decent read for a free ebook, and I liked that it wasn’t as overt an advertisement for its provider, Snowflake, as other whitepapers I’ve read. I’d score it a solid 3/5* for non-technical audiences looking to learn about Data Lakes in general.
*As this is the first review in the free ebook series, that score may be subject to change.