Introducing Harshil Patel, Head of Scientific Development

WOWZAS! It has been a crazy few years… all of a sudden, I find myself as Head of Scientific Development at Seqera! For the benefit of those of you who are currently backpacking in previously undiscovered areas of the Amazon and hence can't read further, here are the take-home points:

My primary role will be to continue nurturing the symbiotic relationship between the nf-core community, Nextflow, and Seqera. We have thrived due to each other's success and will continue to implement solutions to make it even easier to develop, maintain and monitor your Nextflow pipelines.

Seqera is officially in the business of helping to solve your pipeline needs! Please reach out to us if you would like assistance with developing new or existing pipelines.

It is a privilege being able to do what I love every day, especially alongside like-minded folks, and I really look forward to crossing paths with you shortly!

For those of you with an annoyingly good Wi-Fi signal, please bear with me while I take this moment to introduce myself and to reminisce a little.

Enter student life, a.k.a. "The time of your life" - for a reason. I attended various universities in the UK and walked away with a few degrees, eventually culminating in a bioinformatics Ph.D. titled "The Structural Analysis of Metabolism on a Genomic Scale". Biology always made sense to me while growing up, and as a consequence, I obtained my first degree in Neuroscience. I didn't fancy the prospect of donning safety specs for the foreseeable future or setting myself on fire, for that matter, so I pivoted to a Master's conversion in bioinformatics before doing my Ph.D. Coming from a bioLOGICAL background, it was definitely an interesting process becoming more efficient at writing less efficient code - perseverance and repetition are everything when learning to program! I am extremely fortunate to have had great mentors to learn and grow with who have helped shape my career so far. The next step was to get myself into the "real world" and convert my degrees into cash as there was only so long that I could rely on the generous graduate payments from the government (and the Bank of Mum and Dad).

Enter Aengus Stewart, a.k.a. "The ponytailed Irishman" - for a reason. In May 2010, through sheer persistence and some good old-fashioned luck, Aengus saw the light (or took a massive punt) and recruited me as a Bioinformatics Officer for the Bioinformatics & Biostatistics Facility (BABS) at Cancer Research UK, London Research Institute (LRI). I didn't realize it at the time but BABS would eventually become an extension of my family – they were ever-present through some seminal moments as I progressed and developed professionally and personally.

Enter Genomics, a.k.a. "Save as" - for a reason. I entered the genomics space knowing nothing... I mean nothing, but at a time when some of the household names such as BWA, SAMtools, and Picard were still relatively early in their development. I found myself looking for mistakes in the yeast genome in SAM format, and after a couple of bouts of conjunctivitis, I decided to write my own home-baked variant caller instead. I am still a firm believer that if you really want to understand genomics, you need to be relatively well versed in THE standard formats; keeping Uncle SAM close is no exception.

As bioinformaticians, we spend most of our time moving data from pillar to post. Where Coca-Cola has a seemingly infinite variety of flavors, we have the equivalent in file formats. Finding an existing tool to do the job is a true art form. We are lucky to have a fantastic community at our fingertips – 9 times out of 10, there is already something out there to leverage. However, if you choose to write your own and think it may be helpful to others, please consider writing it in a way that allows it to be shared – I learned this the hard way. You won't be directly contributing to world peace. Still, sharing the fruits of your blood, sweat, and stress-induced hair loss is satisfying, and it will be massively appreciated all around.

Enter The Francis Crick Institute, a.k.a. "The best biomedical research institute in Europe" - for a reason. Cancer Research UK collaborated with 5 other world-leading biomedical research organizations, and in 2015, after years of planning, we infiltrated the shiny new Francis Crick Institute. I wasn't complaining – in my eyes, it was an architectural wonder right in the heart of London! However, it is the people within that make it a true marvel. It is a melting pot of Nobel prize winners and world-renowned scientists of all shapes and sizes working in an open, collaborative, and uncompromising environment. The mission, should we choose to accept, was to answer some of the most difficult questions that science has to offer - no pressure then! Where there is science, there are experiments, and nowadays, where there are experiments, there is a big pile of data that needs to be analyzed. This is where we bioinformaticians come in; "cue Ghostbusters theme song"! Given the breadth of cutting-edge research that was carried out at both the LRI and The Crick and the free rein bestowed upon me by Aengus, I was able to get my hands dirty analyzing all sorts of crazy but well thought out experiments! Working in this environment, I acquired by osmosis key skills that seem to be a cornerstone of any good CV, like meeting deadlines and project and human management. My publication tally was increasing slowly and steadily due to collaboration with some of the world's top scientists. As a side effect, I was accumulating a mass of redundant Python scripts, one per project, and as a result, became a true Jedi master in the art of Ctrl+C, Ctrl+V. There had to be another solution?!

Enter Nextflow, a.k.a. "The Groovy DSL" - for a reason. We performed a minimal benchmark comparing workflow languages within BABS and decided to settle on Nextflow. I went about converting quite a complex but mature pipeline I had written for ATAC-seq analysis to Nextflow. This was primarily because I was fed up with duplicating code all over the place and manually turning the handle on portions of the script that required more complex job dependency management on our Slurm HPC cluster. Little did I know that this was my first subconscious and unintentional contribution to the bioinformatics community…

Enter Phil Ewels, a.k.a. "@tallphil" - for a reason. During the early days of nf-core, Phil was invited to visit the sequencing facility at The Crick while he was on a pilgrimage to the UK. I presented the ATAC-seq pipeline I was writing at the time (in person - oh how times have changed…), and he swiftly suggested that I add it to nf-core. I had finally found a way to give back to the community, woooohoooooo, job done! No, no... let's not get ahead of ourselves. Anyone who knows anything about nf-core will also know that literally everything developed within the framework has a guideline, or a standard, or a test for a guideline, or a test for a standard, or a standard guideline test... everything has to be version controlled and documented with backward compatibility… 🤯! To be fair, I have my own OCD inclinations, so it didn't take much convincing to get me on board. I realized the importance of nurturing a community around portable, reproducible and scalable workflows. I just had to find the time to contribute whilst juggling the bread and butter project work with scientists at The Crick, time with my family, and oh, maintaining a level of my own sanity.

Once again, Aengus "saw the light" and realized the importance of my contributions to nf-core. He had a vested interest in my going rogue and knew that my contributions would ultimately filter back to make things easier for BABS, The Crick, and the community. At the time, a core (pardon the pun) group of us, primarily Phil, Alex Peltzer, Sven Fillinger and I, really started building the buzz around the nf-core framework by giving talks anywhere and to anyone who would listen. We began prioritizing the development of the framework by improving the tooling and website, adding pipelines, and most importantly, the holy grail of nf-core – carving our guidelines into stone. We had a dedicated group of "believers" that were mostly Europe-centric owing partly to the origins of Nextflow. A couple of years later, we now have contributors from virtually every continent, ~2100 members on our Slack channel, 48 best practice pipelines (8 of which I am responsible for), oh, and a Nature Biotech paper. How and when did that happen??? As time went on, I developed an (un)healthy obsession with Nextflow. Mwahahahahahahaha.

Enter Paolo Di Tommaso and Evan Floden, a.k.a. "Yoda and Obi-Wan, respectively" - for a reason. I attended the official Nextflow conference at the CRG, Barcelona in 2018. Evan was wrestling with his PhD but still quite heavily involved in the community engagement and development of Nextflow. Paolo was being hounded for feature requests – a victim of his own success and a true testament to how far they had come! I attended the same conference the following year, each time making more friends and connections and expanding my network along the way. I was slowly but steadily establishing myself as a major contributor to both the nf-core / Nextflow communities and to the mojito consumption rate at the customary conference events. I roped Paolo and Evan into giving a Nextflow training course at The Crick right before a 3-day nf-core Hackathon. We just about managed to squeeze in this event the week before the UK woke up and smelt the COVID-19 coffee. It was early days for Seqera, and I still vividly remember Evan awaiting calls from investors while we were tucking into a burger and fries. Over the beers that followed, we entered into a gentleman's agreement that I would join them on their crusade one day when the time was right. I knew I had to be a part of this revolution...

Enter pandemic - sigh.

Enter Seqera, a.k.a. "The NASA of Workflows" - for a reason. After 11 invaluable and unforgettable years, my time in BABS came to an end this summer. I have grown, and I am grateful. I have now been a part of the Seqera family for a couple of months, and as expected, it is a genuinely different working environment: a welcome challenge, with a level of dynamism that would be a struggle to replicate in a research setting. Even before I started working here, I was really impressed by the periodic improvements being rolled out on Nextflow and Tower, and I am now able to witness first-hand the sheer amount of work behind the scenes. We are a small, dedicated, and highly experienced team. We are growing rapidly; opinions matter, are respected, and are heard right at the top. Everyone is on the same page and is genuine in their belief in the importance of our product. We know that our work is empowering our users and customers to streamline their data analysis needs, now and in the future.

I digress massively, having already pointed out the main reason I was supposed to write this blog. There has always been a fantastic synergy between the nf-core and Nextflow communities, and Seqera has only accelerated that. We now have the resources to make workflow development, deployment and monitoring even better! I would like to personally thank everyone who has been a part of my journey – you know who you are, especially my gorgeous family! I will continue to be heavily involved within the nf-core community and its development. We can now offer workflow development as a service to Seqera customers. The intention is to use a collaborative model so that we can exchange domain expertise with our customers for the construction of workflows that suit their requirements.

Please do get in touch if you think we may be able to assist you in any way!