Why I'm hyped about Julia for Bioinformatics

Published at: 6/1/2022
Categories: bioinformatics
Author: emiller

Julia might just change the game for bioinformatics. Bioinformatics is starting to heat up as a field, and I think we have a lot of unique issues dating back to the days of Perl and the Human Genome Project.

Based off a recent recommendation from my friend Teco

Two Language Problem

Plenty of people have written about this in the context of Julia. Many scientific communities use a high-level "scripting" language to manage their business logic and then drop into a low-level language such as C, Fortran, or Rust to optimize the bottlenecks.

In bioinformatics, we have a five-language problem. There's the Perl script written before the first generation of next-generation sequencing. There are packages like DESeq2 that aren't going to leave anyone's toolbox anytime soon but that no one is going to rewrite, so instead we spend time teaching R to every incoming generation of bioinformaticians just so they can use these influential packages. Then there's Python, Rust, and every bioinformatician's favorite, Bash. And that's before you want to contribute to any of the GATK CLI tools, which are written in Java.

Now, workflow managers such as Snakemake and Nextflow have fixed the majority of those problems. Just stick those scripts in a container that acts as a time capsule of versions long forgotten, use them for just what you need, and put them back in the closet.

But what about new scripts? When you're starting a new bioinformatics project in 2022, do you follow in the footsteps of the pioneers who came before you and learn how to call C functions in an R package? Or would you rather just see this beauty?

img

But what about my C functions that are highly optimized?

What about our in-house Python library that interacts with all of our sample tracking and pulls files temporarily from S3? "I don't have time to figure out how to untangle that mess," you say. The Julia community wants to keep all of those important scientific scripts that have already been written. Packages such as RCall.jl, PyCall.jl, and plenty of others housed under the JuliaInterop organization allow legacy code to fit right into new code. As scientists, we are constantly standing on the shoulders of giants, and Julia is enabling that.
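
For the C case in particular, Julia's built-in ccall can reach a compiled function without any wrapper package. Here is a minimal sketch, assuming a Unix-like system where the C standard library's strlen is visible to the running Julia process. The helper name c_strlen is hypothetical and exists only for illustration.

```julia
# Call a C function directly from Julia: no glue code, no separate compile step.
# Assumption: a Unix-like system where libc's `strlen` is available to the process.

# Hypothetical helper: byte length of a sequence string, computed by C's strlen.
# ccall takes the function name, the C return type, a tuple of C argument types,
# and then the actual arguments.
c_strlen(seq::String) = Int(ccall(:strlen, Csize_t, (Cstring,), seq))

println(c_strlen("GATTACA"))
```

The same pattern should extend to an in-house shared library by passing a (function, library) tuple as the first argument to ccall. RCall.jl and PyCall.jl play the equivalent role for R and Python code.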

BioJulia

Julia has a wonderful open-source structure. Since it was created only a decade ago, there isn't much legacy to maintain, and everything can be fresh and inclusive. Take, for example, the heavy use of organizations to house code repositories, which helps keep packages from going unmaintained and orphaned when the creator moves on.

While it's a relatively small community, they've covered a wide range of the file types that we deal with on a daily basis. I hope to dogfood most of the packages, fill in some missing pieces, improve documentation, and create some content for the community. The part that I found difficult was finding the people based on the website. I joined the Gitter to ask a question, and luckily someone saw the message and told me that most of the real-time chat happens in the #biology channel in the Julia Slack.

Designed for Scientific Computing

It seems like bioinformaticians always want to take the most efficient path, but we're solving for a different variable than most. In a recent episode of Screaming in the Cloud, Lynn Langit mentioned that in finance, they care about getting the results as quickly as possible. The cost of the computing isn't a factor. In bioinformatics, we're trying to solve for both time and cost, looking for the local minimum between the two. There are plenty of times when results don't need to be instant, and waiting a few days to stretch a grant out is a necessary evil.

Julia is fast, and that's been said numerous times, so you're probably guessing I'm going to say you can save money by decreasing your compute time. While that's true, the piece that I think people aren't solving for in that equation is developer time. Julia was designed from the ground up as a general programming language for scientists (Why We Created Julia).

We want a language that's open source, with a liberal license. We want the speed
of C with the dynamism of Ruby. We want a language that's homoiconic, with true
macros like Lisp, but with obvious, familiar mathematical notation like Matlab.
We want something as usable for general programming as Python, as easy for
statistics as R, as natural for string processing as Perl, as powerful for
linear algebra as Matlab, as good at gluing programs together as the shell.
Something that is dirt simple to learn, yet keeps the most serious hackers
happy. We want it interactive and we want it compiled.

That sounds like a bioinformatician's dream to me! Think of all the time we can save on developer experience by not trying to hack out some extra speed or fixing broken dependencies (or a complete lack of specified dependencies!). We can treat legacy code like the crown jewel in our metaphorical software crown and wrap it in some gold Julia code like it deserves.

Call to action

I plan on working to increase the visibility of Julia, specifically for bioinformatics. I always love this Venn diagram for explaining to people what bioinformatics is. I think it'll be natural to use JuliaData, JuliaStats, and BioJulia to cover all three areas and show people how they intersect.

img

How to get started with Julia
