Sameer Wadkar

I have been working on extremely high-volume Hadoop/HBase Big Data implementations since 2011. Prior to that, I implemented distributed systems and high-traffic web sites for a wide variety of clients, ranging from federal agencies to investment banking companies.

My book, "Pro Apache Hadoop", is one of the first books covering Hadoop 2.0 and YARN. My goal was to write a book that explains not only what Hadoop does but also why it does it. In practice, effective Hadoop development requires a reasonable understanding of how Hadoop works under the hood, so in the book I have delved into the underlying Hadoop design. Also, most discussions of Hadoop describe esoteric use-cases involving Data Science, Machine Learning, or Unstructured Data. However, most applications of Hadoop are in the ETL domain. Large vendors even have a term for it: ETL Offloading. My book focuses on this use-case. It uses the ubiquitous SQL language and explains Hadoop in terms of SQL concepts. I did this on purpose. Firstly, you will learn Hadoop in the context of something you are already very familiar with. Secondly, you will be able to apply the lessons learned to solve commonly encountered real-world problems.

I am also passionate about open-source. My GitHub page is https://github.com/sameerwadkar.

My open-source contributions include modifying the implementation of the Latent Dirichlet Allocation algorithm in the Mallet API so that it can scale to millions of documents on a single machine.

I am also an avid chess player and play actively on www.chessclub.com.

