Discover modern, next-generation sequencing libraries from the powerful Python ecosystem to perform cutting-edge research and analyze large amounts of biological data
Key Features
- Perform complex bioinformatics analysis using the most essential Python libraries and applications
- Implement next-generation sequencing, metagenomics, automated analysis, population genetics, and much more
- Explore various statistical and machine learning techniques for bioinformatics data analysis
Book Description
Bioinformatics is an active research field that uses a range of simple-to-advanced computations to extract valuable information from biological data, and this book will show you how to manage these tasks using Python.
This updated edition of the Bioinformatics with Python Cookbook begins with a quick overview of the various tools and libraries in the Python ecosystem that will help you convert, analyze, and visualize biological datasets. As you advance through the chapters, you'll cover key techniques for next-generation sequencing, single-cell analysis, genomics, metagenomics, population genetics, phylogenetics, and proteomics with the help of real-world examples. You'll learn how to work with important pipeline systems, such as Galaxy servers and Snakemake, and understand the various modules in Python for functional and asynchronous programming. This book will also help you explore topics such as SNP discovery using statistical approaches under high-performance computing frameworks, including Dask and Spark, and the application of machine learning algorithms to bioinformatics.
By the end of this bioinformatics Python book, you'll be equipped with the knowledge to implement the latest programming techniques and frameworks, empowering you to work with bioinformatics data at any scale.
What you will learn
- Become well-versed with data processing libraries such as NumPy, pandas, Arrow, and Zarr in the context of bioinformatics analysis
- Interact with genomic databases
- Solve real-world problems in the fields of population genetics, phylogenetics, and proteomics
- Connect with the RCSB Protein Data Bank using GraphQL
- Build bioinformatics pipelines using a Galaxy server and Snakemake
- Work with functools and itertools for functional and asynchronous programming
- Perform parallel processing with Dask on biological data
- Explore PCA techniques with scikit-learn
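
As a rough illustration of the functools and itertools item above, the sketch below counts overlapping k-mers in a DNA string using a lazy generator, `functools.reduce`, and `itertools.islice`. It is not taken from the book; the sequence and the helper names `kmers` and `count_kmer` are made up for this example, and only the Python standard library is assumed.

```python
from functools import reduce
from itertools import islice


def kmers(sequence, k):
    """Lazily yield every overlapping k-mer of `sequence` (hypothetical helper)."""
    return (sequence[i:i + k] for i in range(len(sequence) - k + 1))


def count_kmer(counts, kmer):
    """Reducer: return a new dict with the count for `kmer` incremented."""
    return {**counts, kmer: counts.get(kmer, 0) + 1}


# Made-up sequence, used purely for illustration.
dna = "ATGCGATATGC"

# Fold the lazy stream of 3-mers into a dictionary of counts.
counts = reduce(count_kmer, kmers(dna, 3), {})

# islice takes only the first three k-mers without materializing the rest.
first_three = list(islice(kmers(dna, 3), 3))

print(counts)
print(first_three)
```

Because `kmers` returns a generator, nothing is computed until `reduce` or `islice` pulls values from it, which is one reason this style can suit long sequences. Building a new dictionary on every step keeps the reducer free of side effects but is slower than mutating in place; for real workloads `collections.Counter` would likely be the more practical choice.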
Who This Book Is For
This book is for bioinformatics analysts, data scientists, computational biologists, researchers, and Python developers who want to address intermediate-to-advanced biological and bioinformatics problems. Working knowledge of the Python programming language is expected. Basic knowledge of biology would be helpful.
Table of Contents
- Python and the Surrounding Software Ecology
- Using Data Processing Libraries: NumPy, pandas, Arrow, and Zarr
- Next Generation Sequencing
- Advanced NGS Data Processing
- Working with Genomes
- Population Genetics
- Phylogenetics
- Using the Protein Data Bank
- Bioinformatics Pipelines
- Functional and Asynchronous Programming
- Parallel Processing with Dask
- Machine Learning for Bioinformatics
Tiago Antao is a bioinformatician currently working in the field of genomics. A former computer scientist, Tiago moved into computational biology with an MSc in Bioinformatics from the Faculty of Sciences at the University of Porto (Portugal) and a PhD on the spread of drug-resistant malaria from the Liverpool School of Tropical Medicine (UK). As a postdoctoral researcher, Tiago worked with human datasets at the University of Cambridge (UK) and with mosquito whole-genome sequencing data at the University of Oxford (UK), before helping to set up the bioinformatics infrastructure at the University of Montana. He currently works as a data engineer in the biotechnology field in Boston, MA. He is one of the co-authors of Biopython, a major bioinformatics package written in Python.