Rob Newman
Oct 29, 2024

Introducing Data Studios - Custom environments

Today we announced the release of Data Studios custom environments, which let users dynamically build container templates, or bring their own, to create reproducible custom analysis environments.

Tertiary analysis environments can be inflexible and difficult to standardize across an organization because they ship with default packages that cannot easily be customized in a reproducible way. Over time, packages may release new features, become incompatible with other dependencies or technologies, or be found to contain security vulnerabilities that necessitate an upgrade or, in some cases, even a downgrade. This makes reproducing containers across teams very challenging and can lead to inconsistent results.

Additionally, organizations are increasingly prioritizing strict governance of package versions for security and regulatory compliance. Package management is commonly handled centrally by compliance and science teams. For reproducibility, a centrally managed environment configuration ensures that every environment has exactly the same packages and versions, so analysis results are consistent and repeatable.

Building reproducible analysis environments

Data Studios custom environments solve these problems by providing:

  1. Flexibility – Users can easily install any Conda package into a Data Studios container template by providing a list of packages (and versions).
  2. Reproducibility – Teams can align and standardize their analysis environments by providing a simple configuration file, or using their own container registry templates, as the source for their analysis environments.
  3. Startup velocity – Building custom analysis containers is now a point-and-click action, and templates are cached, allowing for rapid boot times.

There are two methods to build your own custom analysis environment: container augmentation and BYO container registry.

Method 1: Container augmentation

Seqera already provides four pre-built “vanilla” container template images (VSCode, RStudio, Jupyter, and Xpra) that are version-controlled, regularly patched with an up-to-date set of packages, and publicly available. To create a custom environment, provide your own organization-maintained, YAML-formatted list of versioned (“pinned”) Conda packages. When adding a new session, select your Seqera-managed template of choice and attach the package list, either by pasting it directly into the form or by uploading it as a file attachment.
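As a rough illustration of what a pinned package list might look like, the Python sketch below renders a Conda-style environment file from a dictionary of package versions and rejects any entry that is not pinned to an exact version. The helper name `render_conda_environment`, the channels, and the package names and versions are all illustrative assumptions. The layout follows the standard Conda environment file convention, so check the Seqera documentation for the exact schema the form accepts.

```python
def render_conda_environment(packages, channels=("conda-forge", "bioconda")):
    """Render a Conda-style environment file with exactly pinned versions.

    `packages` maps package name -> exact version string.
    This helper is hypothetical and not part of any Seqera tooling.
    """
    lines = ["channels:"]
    lines += [f"  - {channel}" for channel in channels]
    lines.append("dependencies:")
    for name, version in sorted(packages.items()):
        # Reject missing versions, wildcards, and range specifiers.
        if not version or any(ch in version for ch in "*<>"):
            raise ValueError(f"{name} is not pinned to an exact version: {version!r}")
        lines.append(f"  - {name}={version}")
    return "\n".join(lines) + "\n"


# Illustrative package list; names and versions are placeholders.
pinned = {"numpy": "1.26.4", "pandas": "2.2.2", "samtools": "1.20"}
environment_yaml = render_conda_environment(pinned)
print(environment_yaml)
```

Generating the file from a single reviewed source, rather than editing it by hand per team, is one way to keep the list identical across sessions.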

On submission, a new custom container is built by augmenting the Seqera-provided base image with the defined Conda packages, using Seqera’s OSS container augmentation service, Wave. This results in a container stored in the public Wave container registry and attached to the session. You then start the session as normal. The session detail page displays a permanent record of the package list, and you can access the Wave build report at any time by clicking on the “Build report summary” link. This report also includes a comprehensive Wave security scan of the container at build-time.

Method 2: BYO container registry

If your organization operates a container registry, you can instead “bring your own” container by providing a URI that points directly to your centrally managed Docker or Singularity container template. Many public container registries are already supported, as are private registries hosted in Amazon ECR. To use your own container template, first update it to include the Seqera-managed connect base container, which supplies the specific drivers and libraries needed to run on the Seqera platform. Then, when adding a new session, instead of selecting a Seqera-provided base image, select the new “Prebuilt container image” option in the Template dropdown field and paste the URI of the container template into the Container identifier form field.
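Because reproducibility depends on the URI always resolving to the same image, a team might want to check that a container reference is pinned before pasting it into the form. The sketch below is one minimal way to do that in Python, assuming Docker-style image references. The function `is_pinned_image` and the registry hostnames are hypothetical, and this check is not something the platform itself is known to perform.

```python
def is_pinned_image(uri):
    """Return True if an image URI has a digest or an explicit non-'latest' tag.

    Hypothetical helper; assumes Docker-style references.
    """
    # A content digest identifies one exact image.
    if "@sha256:" in uri:
        return True
    # The tag, if present, follows the last ':' in the final path component,
    # so a registry port such as 'localhost:5000/img' is not mistaken for a tag.
    last_component = uri.rsplit("/", 1)[-1]
    if ":" not in last_component:
        return False
    tag = last_component.rsplit(":", 1)[1]
    return tag not in ("", "latest")


# Placeholder URIs for illustration only.
candidates = [
    "registry.example.com/analysis/rstudio:2024.10.1",
    "registry.example.com/analysis/rstudio",
    "localhost:5000/analysis/rstudio:latest",
]
for candidate in candidates:
    print(candidate, is_pinned_image(candidate))
```

A digest gives the strongest guarantee, since a tag can later be moved to point at a different image.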

Summary

Flexible, reproducible and portable custom analysis environments are a critical part of data governance and GxP standardization, and bring Data Studios one step closer to a general availability release.

Interested in trying it out? Start a free trial today or contact your Seqera Account Manager now!