Modern Data Architectures with Python: A practical guide to building and deploying data pipelines, data warehouses, and data lakes with Python - Softcover

Brian Lipp

 
9781801070492: Modern Data Architectures with Python: A practical guide to building and deploying data pipelines, data warehouses, and data lakes with Python

Synopsis

Learn to build scalable and reliable data ecosystems using Data Mesh, Databricks Spark, and Kafka.

Key Features

  • Develop modern data skills in emerging technologies
  • Learn pragmatic design methodologies like Data Mesh and Lakehouse
  • Grow a deeper understanding of data governance

Book Description

Modern Data Architectures with Python will teach you how to integrate your machine learning and data science workstreams into your data platform. You will also learn how to take your data and build open lakehouses that can work with any technology. This book will give you deep hands-on experience with tools like Kafka, Apache Spark, MongoDB, Neo4j, Delta Lake, MLflow, and SQL Dashboards.

By the end of this journey, you will have amassed a wealth of hands-on and theoretical knowledge to architect your own data ecosystems.

What you will learn

  • Understand data patterns such as Delta Architecture
  • Learn key details of Spark internals and how to increase performance
  • Discover how to design critical data diagrams
  • Explore MLOps with tools like AutoML and MLflow
  • Learn to build data products in a data mesh
  • Discover data governance and how to build confidence in your data
  • Learn how to introduce data visualizations and dashboards into your data practice

Who This Book Is For

This book is for developers, analytics engineers, and managers looking to further develop a data ecosystem within their organization. Basic Python knowledge is useful but not required, and prior experience with data is helpful but not necessary to read the book and complete the labs.

Table of Contents

  1. Modern Data Processing Architectures
  2. Basics of Data Analytics Engineering
  3. Cloud Storage and Processing Concepts
  4. Python Batch and Stream Processing with Spark
  5. Streaming Data with Kafka
  6. Python MLOps
  7. Python and SQL based Visualizations
  8. Integrating CI into your workflow
  9. Data Orchestration
  10. Data Governance
  11. Introduction to Saturn Insurance, Deploying CI and ELT
  12. Data Governance and Dashboards

About the Author

Brian Lipp is a technology polyglot, engineer, and solution architect with a wide skill set across many technology domains. His programming background ranges from R, Python, and Scala to Go and Rust. He has worked on big data systems, data lakes, data warehouses, and backend software engineering. Brian earned a Master of Science, CSIS from Pace University in 2009. He is currently a Sr. Data Engineer working with large tech firms to build data ecosystems.