Robert Lalonde
Aug 02, 2021

Personalized Immunotherapy – Machine Learning meets Next Generation Sequencing

Gritstone bio, Inc., a clinical-stage biotechnology company developing next generation cancer and infectious disease immunotherapies, leverages both Nextflow and Nextflow Tower as key components of its genomics and machine learning efforts. On the machine learning front, Nextflow manages the data pipelines and workflows for model training and selection and then produces a benchmark report on the effectiveness of the model. Nextflow is paired with AWS Batch to create a seamless workflow that improves model development speed and virtually eliminates the need to manage the underlying infrastructure.

Nextflow performs the workflow parallelization steps and spreads work across instance types to maximize throughput and leverage the most cost-effective instances at the right stage of the process. Spot instances and related restarts due to spot reclamation can be handled in an automated fashion, and GPU-based instances are leveraged for specific compute stages where necessary, with lower-cost instances being utilized at other stages.
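The automated restart behavior described above can be illustrated with a minimal Python sketch. It is not Gritstone's implementation, and it is not how Nextflow is configured (Nextflow exposes this kind of behavior through process directives such as errorStrategy and maxRetries). The exception class SpotReclaimed and the function run_with_retries are hypothetical names introduced only for illustration.

```python
class SpotReclaimed(Exception):
    """Hypothetical signal that a spot instance was reclaimed mid-task."""


def run_with_retries(task, max_retries=3):
    """Run task(attempt), retrying only when the failure is a spot reclamation.

    `task` is any callable that takes the 1-based attempt number.
    Other exceptions propagate immediately, since retrying would not help.
    """
    total_attempts = max_retries + 1  # the first run plus the allowed retries
    for attempt in range(1, total_attempts + 1):
        try:
            return task(attempt)
        except SpotReclaimed:
            if attempt == total_attempts:
                raise  # retries exhausted, surface the failure


# Usage: a stand-in task that is "reclaimed" on its first two attempts.
attempts_seen = []


def flaky_training_step(attempt):
    attempts_seen.append(attempt)
    if attempt < 3:
        raise SpotReclaimed()
    return "done"


result = run_with_retries(flaky_training_step)
```

The point of the sketch is that only reclamation-type failures trigger a rerun, which is what lets lower-cost spot capacity be used without manual intervention.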

The effort uses extremely large amounts of training data, sharded across several instances to train the model, ultimately producing an ensemble model greater than the sum of its parts. The new model is then compared with other designs, and the results are graphed against prior models to gauge performance.
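The shard-then-ensemble idea can be sketched in a few lines of Python. This is a toy under stated assumptions: the article does not say how the shard models are combined, so simple averaging of predictions is assumed here, and the "model" is a one-parameter least-squares fit rather than a TensorFlow network. The function names shard, train_on_shard, and ensemble are hypothetical.

```python
from statistics import mean


def shard(rows, n_shards):
    """Split rows into n_shards round-robin shards."""
    return [rows[i::n_shards] for i in range(n_shards)]


def train_on_shard(rows):
    """Toy model: fit y = w * x by least squares through the origin."""
    numerator = sum(x * y for x, y in rows)
    denominator = sum(x * x for x, _ in rows)
    w = numerator / denominator
    return lambda x: w * x


def ensemble(models):
    """Combine shard models by averaging their predictions (an assumption)."""
    return lambda x: mean(m(x) for m in models)


# Usage: synthetic data following y = 2x, split across three shards.
data = [(x, 2 * x) for x in range(1, 13)]
models = [train_on_shard(s) for s in shard(data, 3)]
predict = ensemble(models)
prediction = predict(5)
```

In a real pipeline each call to train_on_shard would correspond to a separate task running on its own instance, with the ensemble assembled once all shards finish.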

“Previously, our model selection process relied on sequential hypothesis testing with days between idea and outcome. Now, we can test several alternative designs within a single day to select the right model design given the data at hand.”

Joshua Klein, Proteomics Bioinformatics Scientist at Gritstone bio, Inc.

In addition to Nextflow, AWS Batch, and S3, the effort uses Docker, Python, TensorFlow, TensorBoard, and various in-house technologies and pre-processing tools.

“Prior to Nextflow-based pipelines, workflows were assembled using shell scripts and infrastructure was managed manually. Nextflow, AWS Batch and S3 create a vastly more seamless, repeatable, and manageable workflow solution that allows us to optimize the cost of cloud infrastructure utilization.”

Michael Kroell, Director of Cloud Engineering at Gritstone bio, Inc.

About Gritstone

Gritstone bio, Inc. (Nasdaq: GRTS), a clinical-stage biotechnology company, is developing the next generation of immunotherapies against multiple cancer types and infectious diseases. Gritstone develops its products by leveraging two key pillars: first, a proprietary machine learning-based platform, Gritstone EDGE™, which is designed to predict antigens that are presented on the surface of cells, such as tumor or virally infected cells, that can be seen by the immune system; and, second, the ability to develop and manufacture potent immunotherapies utilizing these antigens to potentially drive the patient’s immune system to specifically attack and destroy disease-causing cells. The company’s lead oncology programs include an individualized neoantigen-based immunotherapy, GRANITE, and an “off-the-shelf” shared neoantigen-based immunotherapy, SLATE, which are being evaluated in clinical studies. Within its infectious disease pipeline, Gritstone is advancing CORAL, a COVID-19 program to develop a second-generation vaccine, with support from departments within the National Institutes of Health (NIH) and the Bill & Melinda Gates Foundation, as well as a license agreement with La Jolla Institute for Immunology. Additionally, the company has a global collaboration for the development of a therapeutic HIV vaccine with Gilead Sciences.

About Seqera

Seqera is the leading provider of open source and commercial workflow orchestration software required for data pipeline processing, cloud infrastructure, and secure collaboration. The core open source technology, Nextflow, transforms the building of massively scalable and distributed computing solutions. The software enables developers and data scientists to create and securely deploy data applications in the cloud and/or on traditional on-premises infrastructure. The company’s products are widely used by leaders in the life sciences segment and are also used by enterprises across computationally intensive data pipeline applications, including machine learning and AI, manufacturing, and financial services.