In this episode, Phil Ewels speaks with Dr. Olga Botvinnik, founder and CEO of Seanome, who shares her journey from computational biology to marine genomics. Olga discusses her career path, including her PhD research on single-cell alternative splicing and her work with non-model organisms. She explains her motivation to create Seanome, a non-profit research institute focused on annotating proteins of unknown function, with a particular focus on marine life.
The conversation covers the importance of marine genomics, the potential applications of studying ocean biodiversity, and the tools Seanome is developing to aid researchers in this field. Additionally, Olga talks about her return to the Nextflow and nf-core community and the plans for her participation in upcoming nf-core events.
Key links
Here are some of the links that were mentioned during the podcast:
- Seanome
- Seanome blog
- Kmerseek
- Kmerseek Nextflow pipeline
- nf-core/proteinannotator
- Botryllus schlosseri (colonial tunicates)
- Arctic Clam blog post
- Documentary about collaborator curing his father’s neuropathy with arctic clams
- Ayelet Voskoboynik
Podcast overview
In this episode of the Nextflow podcast, host Phil Ewels chats with Dr. Olga Botvinnik, who shares insights into her remarkable career and her current work with Seanome, a nonprofit focused on marine genomics. This episode provides a glimpse into Olga’s efforts to advance genomic research.
Meet Olga Botvinnik
Dr. Olga Botvinnik’s career path has taken her from MIT to UC San Diego and various research roles. She is the founder of Seanome, where her work aims to expand our understanding of marine genomics and how it can help protect our planet.
Olga’s Genomic Journey
Olga discusses her early career, including her time at the Chan Zuckerberg Biohub. Her work has often centered on understanding complex genomic patterns and developing valuable tools for annotating genes and proteins. Olga’s research addresses the challenge of studying understudied organisms using innovative bioinformatics tools.
The Foundation of Seanome
Founded in late 2024, Seanome was born out of Olga’s interest in marine life and its genetic mysteries. Seanome’s primary mission is to develop tools to annotate proteins without known functions, starting with Arctic clam research. This research is crucial as it holds potential implications for treating neurological diseases by investigating the genetic basis of compound production in these clams.
Colonial Tunicates and Genomic Advances
One project Olga highlights is the study of the colonial tunicate Botryllus schlosseri. This marine organism has an immune gene whose function is surprisingly similar to that of the human HLA genes used in organ transplant matching, despite having no sequence similarity to them. Olga emphasizes the potential for discovery within the ocean’s diverse ecosystems, focusing on how similar functions can evolve independently across species.
Future Goals for Seanome
Olga shares her vision for Seanome, aiming to create the world’s best tool for protein annotation. She aspires to collaborate with initiatives like the Earth BioGenome Project, leveraging global genomic data to enhance our understanding of biodiversity. Additionally, she sees potential in working with Indigenous communities to integrate traditional knowledge, reinforcing science’s role in ecological preservation.
Bioinformatics and Nextflow
Olga’s return to the Nextflow community includes participation in the nf-core hackathon, where she will help develop nf-core/proteinannotator, a pipeline for running and comparing protein annotation tools.
Science Communication: Bioinformatics Beyoncé
Beyond her genomic pursuits, Olga has made a name for herself in science communication. Once adopting the moniker “Bioinformatics Beyoncé,” she has a passion for making science accessible and engaging. Olga shares how she plans to continue live streaming her work, offering an authentic glimpse into the daily life of a researcher, from debugging code to sharing scientific insights.
Conclusion
Dr. Olga Botvinnik’s work exemplifies how cross-disciplinary research can drive advancements in genetics. Her dedication to marine genomics through Seanome underscores the potential of genomic research to explore new areas of science. As Olga continues her work, her efforts highlight the possibilities when passion and science come together.
Full transcript
Podcast Introduction
Phil Ewels: Hi. Welcome to the Nextflow podcast. You’re listening to Episode 49, going out in March 2025.
My name is Phil Ewels. I’m open source product manager at Seqera, working with Nextflow, nf-core, MultiQC, and Wave.
This is the first episode of 2025. I’ve been a bit slow getting started, but I’m really excited about today’s episode. We’ve got Olga Botvinnik, who’s going to join us today and tell us all about her involvement with the community in the past, and also a little bit about what she’s doing now.
Olga, thanks for joining us today.
Olga Botvinnik: Thank you. It’s a pleasure to be here.
Phil Ewels: So, Olga, you and I, we go back quite a few years now. I think we first met in San Francisco in 2018, I want to say. Probably.
Olga Botvinnik: That sounds familiar. Yeah, I was using a different workflow manager...
Phil Ewels: We don’t talk about those.
Olga Botvinnik: Until I saw the light of Nextflow.
Phil Ewels: That’s great. Yeah, so back then, I basically invited myself along to the Chan Zuckerberg Biohub where you were working in San Francisco and we got chatting and then that was right at the genesis of nf-core.
But before we get too carried away with reminiscing, tell us a little bit about who you are, and your career so far.
Olga’s background
Olga Botvinnik: Sure. So. My name is Dr. Olga Botvinnik. I am the founder and CEO of Seanome, which is a non profit research institute for marine genomics.
And like many of you, we write code to find patterns in DNA, and specifically we are looking for patterns in genes or proteins that are previously unknown, and figuring out how we could actually make them known and understood so that we can better serve our planet and serve human health.
And I guess the brief overview is I was lucky to go to MIT for undergrad, where I studied math and biological engineering. I was one of like two people to double major in that because back then there was no computational biology grouping.
I stayed in Boston for a year, to work at the Broad Institute where I was lucky to work with Jill Mesirov, and then I went and did a master’s degree at UC Santa Cruz and that wasn’t enough school for me so I decided to do a PhD at UC San Diego where I finished in 2017.
And there I studied computational analysis of single cell alternative splicing. So I’ve been thinking about single cell and how to annotate genes for a while, because with alternative splicing, you’re really thinking about what is the difference when you include this exon or not? And that turns out to be pretty tricky.
Phil Ewels: Must have been really early days for single cell techniques back then.
Olga Botvinnik: Yeah, it was. The data set I was using was a hundred and seventy cell dataset, all generated with a Fluidigm C1 system.
So back then we didn’t really know how much depth you needed per cell. So we just went overboard, and got a few million reads per cell with the Smart-seq2 technology.
And unfortunately, with Smart-seq2 and with, I think, any 3’ end or polyA enrichment, you’re gonna get mostly the polyA tails, so we just didn’t get the full transcript coverage that we were hoping for. But, yeah, it was early days, like I basically built an internal version of Scanpy for myself, and many other tools.
Yeah, so I came to CZ Biohub to work on more single cell projects, so I worked on the flagship Tabula Muris mouse cell atlas, the Tabula Muris Senis mouse aging cell atlas, and helped out a bit on Tabula Sapiens, so that’s the human cell atlas.
I was really excited about the Tabula Microcebus, so that’s on the mouse lemur, which is not a mouse, it is a primate. It’s like the smallest primate and the fastest reproducing primate that we know of.
And, I think working on that organism is really what got me interested in: How do you annotate organisms? How do you think about working with organisms that are so understudied compared to human? And how do you work with them well?
Because coming from the human world, we’re very, very spoiled with nearly every gene annotated. Though still 2,000 of them don’t have a known function, for those of you out there who think that in human everything is solved.
And so I came up with a way to annotate cell types across species without the need for a reference genome. And what that means is that when you align to reference genome, you could be throwing out maybe 50 or more percent of your data because the DNA sequence reads just don’t align.
So if you convert the reads to a protein alphabet and then take another abstraction away of thinking about just the biochemical properties of the protein, then you’re able to annotate across species in a fairly coarse way, not quite as granularly as this is a helper T cell or whatnot. It wasn’t as refined, but still, at the coarse level, I thought it was very promising.
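As a rough illustration of the first step described above, converting DNA reads to a protein alphabet, here is a minimal Python sketch. It assumes six-frame translation with the standard genetic code, which the conversation does not spell out; the function names and the toy read are hypothetical and are not taken from any Seanome tool.

```python
from itertools import product

# Standard genetic code with codons enumerated in TCAG order.
# Assumption: the conversation does not say which translation table is used.
BASES = "TCAG"
AMINO_ACIDS = "FFLLSSSSYY**CC*WLLLLPPPPHHQQRRRRIIIMTTTTNNKKSSRRVVVVAAAADDEEGGGG"
CODON_TABLE = {
    "".join(codon): aa for codon, aa in zip(product(BASES, repeat=3), AMINO_ACIDS)
}
COMPLEMENT = str.maketrans("ACGT", "TGCA")


def reverse_complement(dna: str) -> str:
    """Return the reverse complement of an uppercase DNA string."""
    return dna.translate(COMPLEMENT)[::-1]


def translate(dna: str) -> str:
    """Translate complete codons only; unknown codons (e.g. with N) become 'X'."""
    codons = (dna[i:i + 3] for i in range(0, len(dna) - 2, 3))
    return "".join(CODON_TABLE.get(codon, "X") for codon in codons)


def six_frame_translations(read: str) -> list[str]:
    """Translate a read in three forward and three reverse-complement frames."""
    read = read.upper()
    strands = (read, reverse_complement(read))
    return [translate(strand[offset:]) for strand in strands for offset in range(3)]


if __name__ == "__main__":
    # Toy read, made up for illustration.
    for frame in six_frame_translations("ATGGCCAAA"):
        print(frame)
```

Translating every frame avoids needing a reference genome to decide which frame is the real one, at the cost of carrying five wrong translations per read that later steps have to tolerate or filter.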
Phil Ewels: Out of curiosity, this is a really interesting technique. It reminds me a little bit of bisulfite making a three letter library, simplifying the genomic code. Once you have those orthologs, can you go back to the underlying DNA sequence and see what percentage identity you have?
Olga Botvinnik: Yeah, that’s a great question. So we do keep the underlying protein sequence now. So we’re working with the protein words now, and then we abstract away into a reduced protein alphabet. Right now we’re using a two letter alphabet. When we create that from the protein sequences, we do keep, here was the original protein sequence, and this is the encoded version.
So yeah, in the end we do want to see what was the original versus the end. I love the question of what is the sequence identity because I think it’s going to be quite low.
Phil Ewels: Which is kind of fascinating in itself.
Olga Botvinnik: So then I went to Arcadia Science for a little bit, working on non-model organisms, and I think that wasn’t quite the right fit for me. So then I decided to learn more about drug discovery by joining BridgeBio, where I was there for about two and a half years.
I had a really wonderful time at BridgeBio, fantastic colleagues, really smart people working on such meaningful work, making therapies for people with no other option, for rare Mendelian diseases.
We had quarterly patient days, and those were all like, tears, and then really productive work, like, I need to get drugs to these patients.
But I really missed basic science, and I missed thinking about how to make science better for everyone and contributing to the more global scientific community, and how could we make tools that enabled the researchers studying these understudied organisms.
I think really fascinating to me is that even though these organisms are understudied, it’s the majority of organisms on the planet, right? By some measures we have around 10 million animal species on the planet, and less than 1 percent of them have a genome in NCBI genomes.
That’s rapidly changing with the Earth BioGenome Project and other projects like that. But still, what that means is that the database that we have is extremely biased.
When we’re using tools like AlphaFold to predict structure, it’s almost like we trained AlphaFold on, like, Russian mathematics textbooks primarily, a.k.a. the human species, and then are trying to apply that to everything else.
It’s just not quite the same. And I think there’s similar, but different rules in other organisms and about what’s compatible with life there. But we just don’t really see the full biodiversity of life, and to be able to annotate it.
Colonial tunicates
Olga Botvinnik: So, that’s what got me interested in creating Seanome. And I got really excited about the ocean, partly from my colleague, Ayelet Voskoboynik, at Stanford’s Hopkins Marine Station, who works on the coolest organism in the world, which is Botryllus schlosseri, and it’s a colonial tunicate that has an immune gene that’s very similar to our HLA genes, to human organ transplant genes.
Phil Ewels: What did you say the creature was? A colonial?
Olga Botvinnik: It’s a colonial tunicate. It’s called Botryllus schlosseri. They get their name because they have a tunic around them, but they are our closest relative that is not a vertebrate, but is still a chordate.
So it has a brain, a heart, a stomach, has a central nervous system, lives in colonies, so it almost looks like a little daisy flower, where each petal is one of the zooids that group together into a colony.
For two of these colonies to fuse, because they are sessile like trees, when they are first born, they’re like a little tadpole.
And then they have to make the biggest decision of their life, which is where do they live. And they find a place to live, they live there for a while. If they expand and get big enough to meet another colony, now they have to decide, ooh, are we gonna be friends, or are we gonna be enemies?
And so it turns out there’s this one gene that decides if they are friends or enemies. And the allele has to be the same, kind of like our HLA genes for organ transplant. There needs to be an HLA match.
This gene, it’s called Botryllus Histocompatibility Factor, or BHF, has no homology, no sequence similarity, to any human genes. And yet it has a function very, very similar to that of human genes.
And, I just got so fascinated by that and wanted to make a tool to annotate proteins of unknown function, so that’s been my calling. And that’s really what got me started with annotating proteins and got me excited about the possibilities in the ocean.
Phil Ewels: Fantastic. And tell us a little bit about your new venture and how long have you been working under the new name of Seanome and what is it you’re trying to achieve?
Starting Seanome with Arctic clams
Olga Botvinnik: I’ve been working on Seanome since fall of 2024, so just a few months.
The initial project we’re working on is one, making a tool to annotate proteins of unknown function. The initial application is working on arctic clams.
What is really beautiful and interesting about Arctic clams is that, I met this researcher, Max Glanz, at University of Florida, who has set out to cure his father’s neuropathy.
What he discovered along the way is that the toxic compound that causes his father’s neuropathy is the exact same as something produced by the Arctic clam. And to our knowledge the only organisms on the planet that produce these products are Humans with this disease and Arctic clams.
So presumably there’s something in the Arctic clam genome that can degrade or modify or otherwise be part of the biosynthesis pathway for this compound. It’s a sphingolipid biosynthesis. So that’s our first application.
So far we are six strong with volunteers. I’m very lucky to have six people join me on this journey so far. And I’ve been coding, fundraising, all of the above. It’s been an interesting journey.
I did speak at the Plant and Animal Genome Conference with some of the preliminary work that I started with the Botryllus data set, and got lots of really good feedback, which was really exciting. So yeah, I had a talk and a poster there, and I’m looking forward to more soon.
Vision for Seanome
Phil Ewels: What’s your long term vision for this?
Olga Botvinnik: The long term mission is to build the best protein annotator in the world.
And I think our approach will shine in certain areas, in certain data sets, certain applications, while other tools will do better in others.
For example, what we’ll be doing during the next little hackathon is starting to build this nf-core/proteinannotator tool, and our tool Kmerseek will be one of many that can be evaluated.
One, we are going to do our best to make the most useful protein annotator, but two, we also want to acknowledge that our tool will have strengths and weaknesses and give people the opportunity to compare between them.
That’s on the technical side. The grand vision is to create a long lasting institute that continues to build tools and focus on the computational development for researchers of non model organisms focusing in on the ocean.
I hope to work with Indigenous communities. This is something I really have a lot to learn about. I think that what we can offer as a servant to the Indigenous peoples and local communities is: We can annotate mystery proteins. Most proteins in the world are a mystery. What is useful to you in terms of sovereignty, ecological importance, from traditional ecological knowledge? How can we use genomics to maybe formalize some of this knowledge? I don’t know. But the point is that it needs to be very co creation driven and driven by the benefit for the communities.
The most important thing for me is to create and share knowledge.
Earth BioGenome Project
Phil Ewels: You mentioned earlier about the Earth BioGenome Project. Is that something that you see affecting your work? There’s a huge volume of more diverse genomic data being pumped into these databases right now. And is that something you intend to work with directly or you think will be beneficial on the side?
Olga Botvinnik: Yeah, I think they are like, my ideal collaborator because they are generating so much new data, and I think are most interested in annotating genomes as quickly and as effectively as possible.
Until we have something that can really get into people’s hands and is working, it’s hard to really establish the collaboration because I don’t want to be just, like, smoke and mirrors. I need to have something that’s really useful to people and we’re still under development. So I think until we have something concrete it’s hard to justify the investment in time on their side. But yeah, I think that working with them would be really, really beautiful. We are focusing on the ocean, but ultimately our tools are applicable to any organism.
Animal genomics not metagenomics
Olga Botvinnik: I should also mention that we’re not doing metagenomics. We’re doing animal genomics. Animals and eukaryotes and multicellular organisms have just different constraints than microbes.
Animals and more complex organisms with complex communication mechanisms have a higher percentage of disordered protein regions. I think this is a place where our approach can really shine because we don’t use the underlying structure, say in the Protein Data Bank. And disordered protein regions are literally dark in the Protein Data Bank. They are not present because they’re disordered, they can’t be crystallized. So I think that’s an area where we can shine, that will be a place where we can show our strength.
Phil Ewels: Could your techniques work with prokaryotes and non animal organisms? Or is it that there’s something fundamentally different between how these organisms work?
Olga Botvinnik: Yeah, I haven’t tested that yet. Right now, our focus is going to be optimizing for animals and metazoans, multicellular organisms. I don’t see why they wouldn’t work for prokaryotes and archaea. I think each will require different approaches.
Building nf-core/proteinannotator
Phil Ewels: You mentioned the nf-core pipeline that you’re starting to work on. Tell us a little bit about this pipeline.
Olga Botvinnik: Sure! So, I’ve made the bold choice to not write any code until we have drawn the metro map of that pipeline, which is pretty exciting.
Where Kmerseek will fall in, it will be alongside tools like BLAST, DIAMOND, Foldseek, eggNOG, InParanoid, and OrthoFinder, to identify something like orthologs, except we’re not really interested in orthologs in the genetic sense, that these genes have the same ancestor. I’m most interested in functional homologs, that these are functionally interchangeable.
So the best example I know of is Bcl-2 in Humans and CED-9 in C. elegans, which are both apoptosis regulators. And a paper in 1994, amazingly, like 30 years ago now, showed that these two genes, even though they only have 23 percent sequence similarity, are functionally interchangeable from Human to C. elegans. You can delete the Bcl-2 in humans, insert CED-9 version into humans, get the same function. Similarly in C. elegans, you can remove CED-9, put in Bcl-2, and get the same function.
I’m really interested in finding those kind of comparisons and that is a different question than orthology. But I think the tools can be quite similar and the benchmarks can be similar. Although the philosophy will be different.
Phil Ewels: So are you going to try and build any benchmarks into the pipeline, or is that something you expect people to do themselves?
Olga Botvinnik: Yeah, I would love for that pipeline to become the tool we use to benchmark against the various data sets.
So, the SCOPe, Structural Classification of Proteins extended, dataset from Berkeley, or the Quest for Orthologs data set. We can really design that pipeline to be a comparison benchmark tool, which I think is really exciting.
So I can make the right tool and the right decisions. I want to know, for this gene, what are all the possible annotations from all the tools? And which ones do I trust, which ones do I not trust? And I think that will be the most interesting way to apply and compare across the different tools.
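The question posed above, "for this gene, what are all the possible annotations from all the tools, and which ones do I trust?", can be sketched as a small collation step. This is only an illustration in Python: the tool names, protein IDs and annotations are made-up placeholders, the agreement fraction is one simple assumed trust signal, and none of it reflects how nf-core/proteinannotator is or will be implemented.

```python
from collections import Counter, defaultdict


def collate(calls):
    """Group (tool, protein_id, annotation) records into {protein_id: {tool: annotation}}."""
    table = defaultdict(dict)
    for tool, protein_id, annotation in calls:
        table[protein_id][tool] = annotation
    return dict(table)


def consensus(per_tool):
    """Return the most common annotation and the fraction of tools that agree with it."""
    counts = Counter(per_tool.values())
    annotation, votes = counts.most_common(1)[0]
    return annotation, votes / len(per_tool)


if __name__ == "__main__":
    # Hypothetical tool outputs, invented purely for illustration.
    calls = [
        ("tool_a", "prot1", "DNA-binding domain"),
        ("tool_b", "prot1", "DNA-binding domain"),
        ("tool_c", "prot1", "kinase domain"),
        ("tool_a", "prot2", "membrane tether"),
    ]
    for protein_id, per_tool in collate(calls).items():
        annotation, agreement = consensus(per_tool)
        print(protein_id, per_tool, annotation, round(agreement, 2))
```

A table like this keeps every tool's answer visible next to the consensus, so a disagreement such as the one on prot1 stays available as a lead for an experiment instead of being averaged away.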
Phil Ewels: What are the outputs of the pipeline you’re building, once you’ve got your proteins with similar functional orthology? Where do you go with those results and what can you do with them?
Olga Botvinnik: Yeah, as of like, today, the output is a region of a protein in UniProt and what protein sequence that is.
What we’re working on is taking that region and going back to what domain is annotated in UniProt for that. So we’re only using Swiss-Prot for that, cause that’s the very well curated data set, and I think has the most trustworthy results.
The goal is to have the domain annotated, so this could be a DNA binding domain, a phosphorylation site, a membrane tether, and so on. And what you do from there is do an experiment.
What I’m making with Seanome is a 0 to 1 tool. I’m really interested in proteins where, when you run BLAST, Foldseek, HMMER, these tools, you get literally zero results. Or you get results that are, like, mixed and confusing.
I’m really interested in what could this protein do? What is its function? And our goal is to create hypotheses.
Instead of saying, I literally don’t know what this protein does, I can now go do an ELISA to see if this binds this particular protein. I can now go do a DNA binding experiment. I can now do a P-32 phosphorylation experiment, to see if this is phosphorylated.
So of course, we’ll have to work in tandem with experimental labs to make sure our results and predictions are useful and informative.
Phil Ewels: One thing I was curious about. I remember your first pipeline, I think it was, was called kmermaid.
I was wondering if there’s any kind of overlap or kind of history that connects these pipelines. Are there any similarities here?
Olga Botvinnik: Yeah, the underlying tool that we use is called sourmash, to do the k-mer sampling and, in particular, the conversion of our protein sequence to a reduced alphabet. We use a two-letter alphabet of hydrophobic and polar, which turns out to partition the amino acid space exactly in two, so 10 and 10 each. We use sourmash to do that because it’s very optimized to use k-mers. It has a Rust plugin, so it can be, like, stupidly fast. What we need to figure out is that going back to the original k-mers is still not as fast as I would like.
What’s beautiful about sourmash is the new Branchwater plugin, which is a very Rust-optimized version that can search millions of DNA records in seconds. So that speed can really enable completely new questions that you just weren’t able to ask before.
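The two-letter hydrophobic/polar encoding and k-mer comparison described above can be sketched in plain Python. This is a minimal illustration, not the sourmash or Kmerseek implementation: the 10/10 grouping below is one commonly used hydrophobic/polar split and is an assumption here, and the function names, the two toy peptides and the k-mer size are hypothetical.

```python
# Assumed 10/10 hydrophobic/polar split of the 20 standard amino acids.
HYDROPHOBIC = set("AFGILMPVWY")
POLAR = set("CDEHKNQRST")


def encode_hp(protein: str) -> str:
    """Re-encode a protein as 'h' (hydrophobic) and 'p' (polar) letters."""
    encoded = []
    for residue in protein.upper():
        if residue in HYDROPHOBIC:
            encoded.append("h")
        elif residue in POLAR:
            encoded.append("p")
        else:
            raise ValueError(f"unexpected residue: {residue!r}")
    return "".join(encoded)


def kmers(sequence: str, k: int) -> set[str]:
    """Return the set of all overlapping k-mers in a sequence."""
    return {sequence[i:i + k] for i in range(len(sequence) - k + 1)}


def jaccard(a: set[str], b: set[str]) -> float:
    """Jaccard similarity of two k-mer sets (0.0 if both are empty)."""
    union = a | b
    return len(a & b) / len(union) if union else 0.0


def percent_identity(a: str, b: str) -> float:
    """Percent of matching positions between two equal-length, ungapped sequences."""
    if len(a) != len(b):
        raise ValueError("sequences must have equal length")
    return 100.0 * sum(x == y for x, y in zip(a, b)) / len(a)


if __name__ == "__main__":
    # Two made-up peptides that share no residues position by position.
    seq_a, seq_b = "MKVLAAGIVE", "LRIAGVPLFD"
    k = 5  # hypothetical k-mer size
    print(percent_identity(seq_a, seq_b))
    print(encode_hp(seq_a), encode_hp(seq_b))
    print(jaccard(kmers(seq_a, k), kmers(seq_b, k)))
    print(jaccard(kmers(encode_hp(seq_a), k), kmers(encode_hp(seq_b), k)))
```

The toy pair has no identical residues and no shared amino acid k-mers, yet both peptides collapse to the same hydrophobic/polar string, which is the kind of signal a reduced alphabet is meant to surface. The trade-off is that a two-letter alphabet makes short k-mers far less specific, so real use would need longer k-mers than this example.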
Software engineering in bioinformatics
Olga Botvinnik: It reminds me of the, like, Bowtie paper. So Ben Langmead posted on Twitter a while ago the rejection letter from Nature Biotech for the Bowtie 2 paper. The improvement was speed, that it was 10x faster, or whatever the value was, than before. And the editor said that faster is not an innovation. And yet, with the innovation of Bowtie 2 and it being faster, now things like RNA-seq become a lot easier. Those become so much more routine.
So I find it interesting in the more academic science community, that the innovations in engineering aren’t always recognized as breakthroughs. That new tools enable new science is a big philosophy of mine. I believe that I am making new kinds of microscopes to look at sequence data.
My happy place is looking at A, C, T’s and G’s and thinking about, hmm, how could this make more sense? How could I turn all these letters into a table? Into something that is interpretable and then can be used to come up with new hypotheses.
And I want that to be a kind of guiding principle of the organization, that yes, we will publish. Yes, we will make tools. But what really guides us is, like, how useful is it to the people using it? What problems are they running into? How can we make it easier to use? What edge cases are missing, things like that.
That engineering mindset for science, I haven’t found in too many places.
Phil Ewels: I remember now why you and I get on so well, we’re kindred spirits in a sense.
Olga Botvinnik: Yeah. Yeah. I mean, yes, of course, Seqera is doing that.
In academic science, I don’t see that as often. There’s some places that do it very well. They’re swimming upstream, I’d say. It’s hard to get grants to do software maintenance.
Like where you need to maintain tools because so much of bioinformatics just relies on this foundational bedrock of bioinformatics, and yet saying we need the grant to make sure that 80 percent of bioinformatics, 80 percent of biology is like actually possible, somehow doesn’t transfer, doesn’t click.
Phil Ewels: I think it’s probably not specific to bioinformatics in fairness, but usually at this point in the conversation, we reference the XKCD comic with the tower of building blocks sitting on a tiny little block maintained by someone in Nebraska since 20... I think it’s probably a widespread thing, but I totally agree. It’s difficult sometimes to convince people that the engineering side of bioinformatics is important and, like you say, can enable new questions to be asked.
nf-core early days
Phil Ewels: Let’s go back in time. You’re a former member of the nf-core core team, which maybe many listeners won’t realize. Tell us a little bit about your memories of the start of nf-core and the Nextflow community back then.
Olga Botvinnik: Yeah, I basically got tired of seeing the same pipeline for RNA-seq being written over and over again in so many different ways, when if we all combined our efforts and really battle-tested it in multiple different situations, that could be a really robust pipeline.
I mean, Nextflow has a learning curve. We can certainly admit that. And I just found people to be really helpful. Especially then when, it was like me posting on Google Groups and being like, um, I just did these like wacky Nextflow gymnastics. And what do you think of that? And then Paolo would be like, that’s very clever, but you could do it this way.
And I’m like, okay, I guess I overthought this.
Phil Ewels: I still do that to this day, if it makes you feel any better.
Olga Botvinnik: Okay, great. And I guess, like, finding the joy in debugging. I really enjoyed that. And yeah, being part of the Slack community was great. I went to the, I think, 2019 Barcelona summit. Yeah, the 2019 summit was a lot of fun. It was really great to meet people that I’d been coding with online, in person, and getting together to think about, like, how could this pipeline be better?
I think part of why nf-core really took off is the permissive licensing, because I think what academics maybe don’t realize is that one, there’s a lot of companies out there doing bioinformatics and two, they’re running into really wild, crazy situations by doing production level bioinformatics with like thousands and thousands and thousands of samples per day that an academic lab just really won’t run into.
If the licensing is such that the project can’t benefit from those innovations, that means that the people in academia don’t get the benefits of the thinking by people who are in a for-profit company. I think the licensing is very thoughtful, to be able to benefit from both industry and academic collaborations.
It also means that when you go to a hackathon, there’s people there who are in industry or academia and working together, like, all on the same thing. I think that’s just so beautiful. Like it feels like this is what science is really about. Like we’re making new tools, new abilities for everyone on the planet to use.
The reality is like very few people have the bandwidth you need to build a pipeline like nf-core/rnaseq, building that from scratch internally, you could definitely do that, but you will probably run into all these different edge cases that someone else has already thought about and fixed and patched for you. So I think that’s really great.
Phil Ewels: Your career took you away from Nextflow and nf-core for a few years and now you’re starting to come back into the community and I’m curious about what’s changed. You were there in the very early days, when we were probably only, I don’t know, 50 or 100 people, and now we’re 11,000 or more. How does it look? Is there anything that’s kind of struck you as being different?
Olga Botvinnik: Yeah, I mean, the Barcelona Summit was like, I don’t know, yeah, maybe 50 people in a teaching classroom kind of size. And I was looking at the Boston Summit coming up and I was like, Whoa, this is a production. There’s a podium with the Seqera logo on it. This is wild. So just like the, I think the growth and the formalization has been very cool and very interesting.
Like I noticed the help hours, and I think that’s really crucial, because getting help with using or setting up or writing your Nextflow pipelines, is super helpful. So, so helpful.
Part of why I wanted to join the community was cause there was maybe me and a few other people, and it’s a little lonely. And then you join a community and you’re like, oh my god, you love RNA-seq? I love RNA-seq too! Let’s compare aligners! Yeah, let’s do it! There’s just a lot of fun, excitement around that, and that hasn’t changed. I think that’s still there. I think that’s really nice.
The size and scale has certainly increased. But I think the joy and the excitement, and the community togetherness of realizing, yeah, we’re not alone. We’re all bioinformaticians, like in this together, building together. Um, I think is really great.
Phil Ewels: If I had to pick one thing to keep, and I think that would probably be that friendliness and that community feel. So I’m happy to hear that you still feel that.
And I mean, coming full circle now, I think you’re helping to organize a local site for the upcoming nf-core Hackathon in a few weeks. Is that right?
Olga Botvinnik: Yes, yes. Helping to organise the San Francisco site for nf-core.
Yeah, I’m really excited for it. I’m excited to work on the protein annotator tool. So within Seanome, we’ll be working on that. I think my plan is to use some online tools so we can edit a metro map together, and then start building. But I think it’ll be very nice to have our North Star, what we’re building, figured out ahead of time.
So, yeah, I’m excited for that. Excited to meet people in person. I haven’t been to an nf-core event in a while. I did still write Nextflow pipelines at BridgeBio, they just didn’t end up outside of their ecosystem. It’ll be really fun to build with other people again. Like, oh, wow, I’m, like, coding with all my friends. This is great.
Bioinformatics Beyoncé
Phil Ewels: You’ve led me very nicely on to my next topic actually, which was to step a little bit away from Nextflow and nf-core and talk a little bit more about your science communication efforts over the years. In the past you, you’re gonna have to remind me of your username, but it was like Bioinformatics Beyonce or something like that.
Olga Botvinnik: It was, yes. I am a really big Beyoncé fan. I went to my first concert in 2008? At the time her eponymous album, Beyoncé, had just come out. So I created, yeah, a Bioinformatics Beyoncé persona. Using that styling from her album. That was a lot of fun, to code around that.
I think for Seanome, I’m going to be streaming as well. Just, here I am doing science, I am coding today, cause I’m a big fan of peeling back the curtain. And I think the reality of science, if you’re not within the field as a science practitioner, you may not know what the day to day could look like.
The day to day could look like, uh, I’m editing this figure in Canva or I am like debugging this Python thing, or Nextflow thing. I think the reality of day to day science is it can be interesting to people to share. I think I was sharing like, oh, I was doing like some genome assembly on a super beefy machine and sharing the like htop of, oh, yeah, it’s taking up all 128 cores and 2 terabytes of RAM and all the processes going, and I just thought it was a lot of fun and to connect with people who I may not have.
And give them a glimpse of what it’s like to sit behind the desk and, do the science and get questions that help me think about science differently.
That help me think about what am I doing? Why am I doing what I’m doing? How is this useful to the people I’m talking to? How can I communicate more clearly the impact of what I’m doing?
And streaming I find nice because, then I don’t have to do too much editing afterwards. So I think that’s gonna be a fun venue. I look forward to it.
Phil Ewels: Did you have much of a following? Were people watching live, were you giving a commentary, or were you just sat there typing?
Olga Botvinnik: I was commenting, and I did have five or six regular people who would come up and I was really amazed to learn that Twitch had a, at least at that time, had a pretty strong scientific streaming community of people doing science like geology or astronomy or even people like working on their cars and having a video and mic setup just in the garage they were working on and talking through what they were doing.
I thought that was just pretty cool, that people were sharing in all these different ways. And I was really amazed to see that people were so interested in science. So yeah, a bit of a following, hope to grow more.
We also have a blog, the blog.seanome.org, which is probably the best place to learn more. That’s where I update the most. That’s a lot of fun too, because I try my best to make it as accessible as possible, not super technical, and focus more on a broader audience of why studying ocean animals is important, what are the impacts that we could have, what are the things we’re working on, and, yeah, still posting and working on it there.
Phil Ewels: Brilliant.
Seanome future steps
Phil Ewels: Looking forward, what should we be looking out for, for you and for Seanome? Have you got any big milestones in your sights that you’re looking forward to?
Olga Botvinnik: Yeah, I’m going to be going to some conferences in the future, looking to get our initial Kmerseek release out.
I think initially we’ll be focusing on the Nextflow pipeline version, because I have a CLI that wraps several commands together, but some of them have longer processes than others. So I think putting it into a Nextflow pipeline where some of the processes can just sit and run for a while, but not be blocking others from finishing will be the way to go.
And we’ll be working on a paper for that as well.
Hopefully going to Boston for the Summit as well, so you can hunt me down there too. Happy to talk to anyone about career, industry versus academia, I get that question a lot. Yeah. And anyone interested in annotating proteins of unknown function, I would love to talk to you about the problems you’re facing, any specific tools you’re thinking about, data sets you’re thinking about, I’d love to hear especially if you’re having issues with disordered protein regions and low sequence similarity. Those are the two things I would love to hear your struggles with.
Phil Ewels: Are there any final thoughts you’d like to leave us with?
Final thoughts: Why the ocean
Olga Botvinnik: Mmm. Sure. I’d love to just plug why the ocean a bit more.
So by my calculations, the planet has around 14 quadrillion, uh, unique proteins. And in UniProt, our biggest protein database, we have about 250 million that are catalogued. So quadrillion to million is what, nine orders of magnitude difference? And, that’s insane, if you think about total addressable market, from a startup perspective, like that is so huge.
And why the ocean is we can bite off a tiny piece of that 14 quadrillion, just 43 billion proteins in ocean animals. And I think that is a much more manageable size.
But also the ocean has such a huge diversity of animal life. Almost 80 percent of animal life lives in the ocean. And think of all the organisms that can’t live on land, like jellyfish, and whales, and octopi, that have unique body plans that just can’t live on Earth.
The organisms there also experience a lot more pathogens. So just in a single teaspoon of water, we see a hundred million viruses. So the organisms we see in the ocean are a huge goldmine for novel antivirals, antifungals, antibiotics, and have some of the longest lived organisms. So jellyfish that are immortal, that go back to an embryonic state and live life forever. That could really transform how we think about aging.
Or unfreezable arctic fish that have ice binding antifreeze proteins. And that could really transform how we think about organ preservation. Potentially preserve organs indefinitely.
So I think the potential of applications from the ocean is huge. It’s just absolutely, beautifully huge, and the ocean is so understudied. So I think we can really hit a lovely sweet spot of making tools for organisms that are understudied, but also have such immense potential.
Phil Ewels: I definitely feel inspired after that.
Olga Botvinnik: I’m glad to hear. Glad to hear.
Goodbyes
Phil Ewels: All right, Olga, thanks so much for your time. And I hope to run into you soon at an nf-core Nextflow event.
Olga Botvinnik: Yeah. Great. Yeah. I hope to see you too soon Phil.
Phil Ewels: Cheers.
Olga Botvinnik: Cheers. Bye.