Seqera Studios: 4 real-world use cases for interactive analysis
Bioinformatics extends beyond pipeline runs. Pipeline results often require additional analysis and human interpretation to yield meaningful insights for scientific reporting and publication. Interactive environments facilitate this process.
Yet interactive analysis presents challenges. Analyzing data in place is not always possible, which leads to time-consuming data transfers between storage systems. Bioinformaticians must also navigate multiple programming languages, libraries, and tools while staying within infrastructure limits.
Studios: Closing the loop between pipelines and interactive analysis
Studios is a powerful feature of the Seqera Platform that facilitates a seamless transition between bioinformatics pipelines and interactive analysis. Through the creation of on-demand containers, Studios enables you to use familiar tools such as Jupyter and RStudio notebooks, Visual Studio Code IDEs, and Xpra remote desktops in a dedicated environment customized to your research needs. Each Studio session provides an interactive environment for real-time data analysis and collaboration. In this blog post, we showcase real-world applications using Studios:
- Jupyter: Visualize protein structure prediction data with Python and py3Dmol
- RStudio: Explore RNA-Seq and differential expression data in a custom Shiny (R) app
- Xpra: Visualize genetic variants with Integrative Genomics Viewer (IGV)
- VS Code: Create a portable Nextflow development environment
💡Read the full Studios for Interactive Analysis guide now
1. Python-based interactive visualization of protein structures in a Jupyter notebook
After running bioinformatics pipelines, researchers often use Jupyter notebooks for further downstream analysis. With Seqera Studios, this can be done without moving your pipeline data. We’ll demonstrate how to visualize predicted protein structures (generated by AlphaFold2 and ESMFold in nf-core/proteinfold) using Biopython and py3Dmol in a Jupyter studio.
💡Hint: Studios provides pre-built, version-controlled container template images (for Jupyter, RStudio, Xpra, and VS Code) which are regularly patched with up-to-date packages.
Create a Jupyter studio
Create a compute environment or reuse an existing one, mount the pipeline results (H1065 sequence public data from the nf-core/proteinfold test profile, or your own pipeline data), and install any packages and scripts (such as Biopython and py3Dmol) needed for interactive protein visualization. See the Studios guide for detailed instructions and the full Python script.
💡Hint: Studios allows you to dynamically build and customize analysis environments with your choice of Conda packages or container templates.
Visualize protein structures
Use the Jupyter studio to interactively visualize the AlphaFold2 and ESMFold predicted protein structures of the H1065 sequence in a composite 3D image. The provided Python script uses a very simple implementation of the Kabsch algorithm to superimpose the AlphaFold2 and ESMFold structure predictions so that they can be compared.
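For readers unfamiliar with it, the Kabsch algorithm finds the rotation that best superimposes two sets of matched 3D points. The following is a minimal NumPy sketch of that idea, not the script from the Studios guide. The function name `kabsch_superimpose` and the toy coordinates are illustrative assumptions; in practice the inputs would be matched atom coordinates (for example, C-alpha atoms) taken from the two predicted structures.

```python
import numpy as np


def kabsch_superimpose(P, Q):
    """Rotate point set P onto Q (both N x 3, matched row by row).

    Returns the 3 x 3 rotation matrix and the RMSD after superposition.
    Hypothetical helper for illustration; not the guide's script.
    """
    P = np.asarray(P, dtype=float)
    Q = np.asarray(Q, dtype=float)
    if P.shape != Q.shape or P.ndim != 2 or P.shape[1] != 3:
        raise ValueError("P and Q must both have shape (N, 3)")

    # Remove translation by centering both point sets on their centroids.
    Pc = P - P.mean(axis=0)
    Qc = Q - Q.mean(axis=0)

    # Cross-covariance matrix and its singular value decomposition.
    H = Pc.T @ Qc
    U, _, Vt = np.linalg.svd(H)

    # Flip the last axis if needed so the result is a proper rotation
    # (determinant +1) rather than a reflection.
    d = 1.0 if np.linalg.det(Vt.T @ U.T) >= 0 else -1.0
    R = Vt.T @ np.diag([1.0, 1.0, d]) @ U.T

    # Apply the rotation and measure the remaining deviation.
    aligned = Pc @ R.T
    rmsd = float(np.sqrt(np.mean(np.sum((aligned - Qc) ** 2, axis=1))))
    return R, rmsd


# Toy check: Q is P rotated 90 degrees about the z axis and then shifted.
P = np.array([[0.0, 0.0, 0.0], [1.0, 0.0, 0.0], [0.0, 2.0, 0.0], [0.0, 0.0, 3.0]])
Rz = np.array([[0.0, -1.0, 0.0], [1.0, 0.0, 0.0], [0.0, 0.0, 1.0]])
Q = P @ Rz.T + np.array([5.0, -2.0, 1.0])

R, rmsd = kabsch_superimpose(P, Q)
print(np.round(R, 3))
print(f"RMSD after superposition: {rmsd:.6f}")
```

Because the toy example differs only by a rigid motion, the recovered rotation matches `Rz` and the RMSD is essentially zero. For two real predictions of the same sequence, the remaining RMSD gives a rough measure of how much the models disagree.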
💡 Hint: See Add data using Data Explorer to add and visualize your own proteinfold data.
Video 1: Python-based interactive visualization of protein structures in a Jupyter notebook
2. Downstream analysis of RNA-Seq data and differential expression statistics with RStudio
Combining Studios and RStudio notebooks facilitates seamless interactive analysis using R libraries and tools. We’ll show you how to create a Shiny application in an RStudio notebook to explore data from the nf-core/rnaseq and nf-core/differentialabundance pipelines.
Create an RStudio notebook studio
After creating or selecting a compute environment and mounting the public nf-core data (or your own pipeline data), launch your interactive studio with the pre-built RStudio container image template. See the full guide for detailed instructions.
💡Hint: See Add a cloud bucket to use your own pipeline data for interactive analysis.
Differential expression analysis with RShiny
The RDS data file used in the guide was produced by quantifying gene expression in public RNA sequencing data with the nf-core/rnaseq pipeline, then deriving differential expression statistics with nf-core/differentialabundance. The resulting data is explored in a custom user interface (ShinyNGS) via PCA plots, volcano plots, heatmaps, and tables.
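To make the volcano plot concrete: each gene is placed at its log2 fold change on the x-axis and at -log10 of its adjusted p-value on the y-axis, and genes that pass both cutoffs are highlighted. The sketch below illustrates that logic in plain Python. It is not how ShinyNGS is implemented, and the function name `classify_gene`, the cutoffs (1.0 and 0.05), and the gene values are all assumptions chosen for illustration.

```python
import math


def classify_gene(log2_fold_change, padj, lfc_cutoff=1.0, padj_cutoff=0.05):
    """Return (x, y, status) for one gene on a volcano plot.

    The cutoffs are illustrative defaults, not values taken from ShinyNGS.
    """
    # Clamp very small p-values so that log10(0) cannot occur.
    y = -math.log10(max(padj, 1e-300))
    if padj < padj_cutoff and log2_fold_change >= lfc_cutoff:
        status = "up"
    elif padj < padj_cutoff and log2_fold_change <= -lfc_cutoff:
        status = "down"
    else:
        status = "not significant"
    return log2_fold_change, y, status


# Made-up example values: gene -> (log2 fold change, adjusted p-value).
results = {"geneA": (2.3, 0.0004), "geneB": (-1.8, 0.01), "geneC": (0.4, 0.2)}
for gene, (lfc, padj) in results.items():
    x, y, status = classify_gene(lfc, padj)
    print(f"{gene}: x={x:.2f}, y={y:.2f}, {status}")
```

Requiring both an effect-size cutoff and a significance cutoff keeps genes with tiny but statistically significant changes from dominating the highlighted set.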
Video 2: Downstream analysis of RNA-Seq data and differential expression statistics with RStudio
3. Genetic variant exploration using IGV with Xpra
Spinning up an Xpra remote desktop adjacent to your data can facilitate seamless interactive analysis and pipeline troubleshooting. Here, we use public data from the 1000 Genomes project to perform genetic variant exploration with IGV in an interactive Xpra studio.
Create an Xpra studio
After selecting a compute environment and mounting the data (the 1000 Genomes S3 bucket), the standardized Xpra container image template can be customized with IGV and samtools to suit the needs of this analysis. See the full guide for detailed instructions on creating an Xpra remote desktop studio.
View genetic variants in IGV using Xpra
With IGV pre-installed in the Xpra environment, we can launch it and import the 1000 Genomes project data to search for genes of interest, explore exons, and visualize coverage graphs.
💡 Hint: Studios lets multiple users connect to the same session, so you can cross-validate findings with colleagues in real time.
Video 3: Genetic variant exploration using IGV with Xpra
4. Creating a custom Nextflow development environment with nf-core tools using VS Code
Use Studios with VS Code to build an interactive development environment with all the tools you need to code and run Nextflow. For example, we created an interactive VS Code studio with Conda and nf-core tools, then executed nf-core/fetchngs before creating a new pipeline with the help of nf-core tools and the Nextflow VS Code extension.
Create a VS Code studio
To create a custom VS Code development environment, use the provided VS Code container image, an existing compute environment, and a YAML field to specify Conda, nf-core tools, and other packages to pre-install in your Studio. See the full guide for detailed instructions.
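As a rough illustration of what such a package specification might contain, the sketch below renders a Conda environment definition as YAML text. It assumes the field accepts a standard Conda environment file; the helper `render_conda_env` is hypothetical, and the channel and package list is only an example (the `nextflow` and `nf-core` packages are distributed through Bioconda). See the full guide for the exact format Studios expects.

```python
def render_conda_env(channels, dependencies):
    """Render a minimal Conda environment definition as YAML text.

    Hypothetical helper for illustration; it handles only flat lists of
    channel names and package specs.
    """
    lines = ["channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    lines += [f"  - {dependency}" for dependency in dependencies]
    return "\n".join(lines) + "\n"


# Example package set (an assumption, adjust to your own needs).
env_yaml = render_conda_env(
    channels=["conda-forge", "bioconda"],
    dependencies=["python", "nextflow", "nf-core"],
)
print(env_yaml)
```

The printed text can be pasted into the YAML field or saved as an environment file and reused across studios.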
Writing Nextflow pipelines in VS Code
Once the VS Code studio has been created, the nf-core/fetchngs pipeline can be run with the Conda profile. You can also create a new Nextflow pipeline with nf-core tools and create a VS Code project for your pipeline in the interactive environment.
Video 4: Create a Python Conda environment with nf-core tools to develop Nextflow pipelines
💡Hint: The Nextflow VS Code extension makes use of the Nextflow language server for syntax highlighting, code navigation, code completion, and diagnostics in Nextflow scripts and configuration files.
Why Studios?
Seqera Studios lets you seamlessly transition from bioinformatics workflows to secure, interactive analysis environments within your own infrastructure, consolidating data and analytics in one unified location. With built-in templates, you can quickly create flexible, customizable environments by adding packages, libraries, and scripts tailored to your research needs. Additionally, multiple users can connect to the same Studio session, facilitating real-time collaboration within and across organizations.
💡Interested in finding out more about interactive analysis with Studios? Get started with the full guide now.