Programming MapReduce with Scalding - Softcover

Antonios Chalkiopoulos

 
9781783287017: Programming MapReduce with Scalding

Synopsis

Scalding is a Scala library that makes it easy to write MapReduce jobs for Hadoop. It is similar to other MapReduce platforms such as Pig and Hive, but offers a higher level of abstraction by leveraging the full power of Scala and the JVM. Scalding is built on top of Cascading, a Java application framework for building data analytics and data management systems.

This book is a practical guide to setting up a development environment and implementing simple and complex MapReduce transformations in Scala, using a test-driven methodology and best practices. It also shows how to integrate your projects with external data stores and how to execute machine learning algorithms.

The book first introduces how the Cascading framework lets you reason about MapReduce applications at a higher level of abstraction, and then dives into how Scala, through Scalding, provides a DSL that allows elegant development of testable applications. It then teaches you how to test Scalding jobs and how to define specifications using Behavior-driven Development (BDD). It goes on to demonstrate how to monitor and maintain cluster stability and how to efficiently access MySQL, HBase, and Hive tables.

This book provides hands-on information, starting from proof-of-concept applications and progressing to production-ready implementations. If you have some development experience and are willing to be introduced to functional programming for the purpose of implementing scalable MapReduce applications, then this book is for you.

About the Author

Antonios Chalkiopoulos is a developer living in London and a professional working with Hadoop and Big Data technologies. He has delivered a number of complex MapReduce applications in Scalding to a production HDFS cluster of more than 40 nodes. He is a contributor to Scalding and other open source projects, and he is interested in cloud technologies, NoSQL databases, distributed real-time computation systems, and machine learning. He was involved in a number of Big Data projects before discovering Scala and Scalding. Most of the content of this book comes from the experience and knowledge he accumulated while working with a great team of engineers.