
Optimizing QuPath intensity measurements: 12.5 hr to 2min

Published at
8/31/2024
Categories
performance
bioinformatics
kotlin
architecture
Author
dchaley

Spatial biology analyzes tissue sample images to derive patterns and data. A key first step is identifying cells on the image and gathering quantitative measurements about those cells.

In our ongoing work scaling DeepCell on GCP Batch, we'd previously gotten pretty efficient at the first part: segmenting the image into cells. But we hit a major performance roadblock for the next step: generating quantitative measurements.

The measurements are fairly straightforward:

  • size of each cell (convert pixels in each detected cell to physical dimensions, assuming some number of microns per pixel)
  • pixel intensity of each cell
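To make the two measurements concrete, here is a minimal sketch of computing them from a segmentation. It assumes (this is not from the article) that segmentation produces a label mask where 0 is background and each cell has a unique positive label, and that the intensity channel is a same-sized array. `measureCells` and `CellStats` are hypothetical names, and this is not how QuPath implements its measurements internally.

```kotlin
// Hypothetical per-cell summary: pixel count, physical area, mean intensity.
data class CellStats(val pixelCount: Int, val areaMicrons2: Double, val meanIntensity: Double)

// Assumes `labels` is a flattened label mask (0 = background, >0 = cell id)
// and `intensities` holds the pixel values at the same positions.
fun measureCells(
    labels: IntArray,
    intensities: DoubleArray,
    micronsPerPixel: Double,
): Map<Int, CellStats> {
    require(labels.size == intensities.size) { "mask and image must be the same size" }
    val counts = HashMap<Int, Int>()
    val sums = HashMap<Int, Double>()
    for (i in labels.indices) {
        val label = labels[i]
        if (label == 0) continue // skip background pixels
        counts[label] = (counts[label] ?: 0) + 1
        sums[label] = (sums[label] ?: 0.0) + intensities[i]
    }
    // Each pixel covers micronsPerPixel x micronsPerPixel square microns.
    val pixelArea = micronsPerPixel * micronsPerPixel
    return counts.mapValues { (label, count) ->
        CellStats(count, count * pixelArea, sums.getValue(label) / count)
    }
}
```

Both measurements are a single pass over the pixels, which is why a multi-hour runtime points at overhead elsewhere rather than at the arithmetic itself.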

For a ~140M-pixel image, measuring the detected cells took about 12.5 hours (‼️). That's … not great 😩 What the heck?? We're just counting pixels and reading pixel values. An HD image is ~2M pixels, and computers (and TVs & phones) render more than 30 of those per second.
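A crude back-of-envelope version of that comparison, assuming a 1920 × 1080 HD frame and 30 frames per second (rendering and measuring are different workloads, so this is only an order-of-magnitude check):

$$
\frac{140 \times 10^{6}\ \text{px}}{1920 \times 1080\ \text{px/frame}} \approx 67.5\ \text{frames},
\qquad
\frac{67.5\ \text{frames}}{30\ \text{frames/s}} \approx 2.25\ \text{s}
$$

Against 12.5 h = 45,000 s, that is a gap of roughly 45,000 / 2.25 = 20,000×.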

Profiling to the rescue. The great thing about JVM code is that it's extremely easy to profile. In IntelliJ, just click "profile" instead of "run".

Screenshot of profiler button

Here's the resulting flamegraph.

IntelliJ flamegraph adding cell measurements

Of note, 99.9% of the time spent adding intensity measurements (84% of the total runtime) goes to simply reading the image.

Screenshot of time spent in readRegion: 84.25% of all, 99.88% of parent, amounting to 79.5 seconds

OK: so we need to stop reading the image repeatedly. In our case, the entire image can (for now) fit into RAM. If only we could prefetch the image once, then read regions out of that in-memory copy.
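Whether "fits into RAM" holds is a quick calculation. The helper below is a hypothetical sketch: the article does not state the image's channel count or bit depth, so those are inputs, and the example dimensions are placeholders chosen only to total ~140M pixels.

```kotlin
// Hypothetical back-of-envelope helper: bytes needed to hold a fully decoded
// image in RAM. Channel count and bit depth must be supplied by the caller.
fun estimateInMemoryBytes(width: Long, height: Long, channels: Int, bitsPerSample: Int): Long {
    require(bitsPerSample > 0 && bitsPerSample % 8 == 0) { "assumes whole-byte samples" }
    return width * height * channels * (bitsPerSample / 8)
}

// Size in decimal megabytes (1 MB = 1,000,000 bytes).
fun toMegabytes(bytes: Long): Double = bytes / 1_000_000.0
```

For example, a hypothetical 14,000 × 10,000 image with 2 channels at 16 bits would need `estimateInMemoryBytes(14_000, 10_000, 2, 16)` = 560,000,000 bytes (560 MB), which must fit within the JVM heap (set with `-Xmx`).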

Sounds like a great use case for the Proxy pattern. We need an ImageServer that behaves just like the original image server, except that it reads from an in-memory image rather than from disk (or wherever the wrapped server reads from).

The resulting code is quite simple. Here's the pull request. We override the abstract ImageServer, wrapping another ImageServer and forwarding all methods to the original.
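The shape of that proxy, with every pass-through member forwarded by hand, can be sketched as follows. `RegionSource` is a hypothetical, much-simplified stand-in for QuPath's `ImageServer` (not its real API), and `PrefetchingSource` is an illustrative name, not the class from the pull request.

```kotlin
import java.awt.image.BufferedImage

// Hypothetical, much-simplified stand-in for QuPath's ImageServer (NOT its real API).
interface RegionSource {
    val path: String
    val width: Int
    val height: Int
    fun readRegion(x: Int, y: Int, w: Int, h: Int): BufferedImage
}

// Proxy: implements the same interface and wraps another instance.
class PrefetchingSource(private val wrapped: RegionSource) : RegionSource {
    // Pass-through members, forwarded explicitly one by one.
    override val path: String get() = wrapped.path
    override val width: Int get() = wrapped.width
    override val height: Int get() = wrapped.height

    // Whole image, read from the wrapped source on first use only.
    // `by lazy` uses synchronized initialization by default.
    private val fullImage: BufferedImage by lazy {
        wrapped.readRegion(0, 0, wrapped.width, wrapped.height)
    }

    // The one behavior that changes: regions are cut from the in-memory image.
    override fun readRegion(x: Int, y: Int, w: Int, h: Int): BufferedImage =
        fullImage.getSubimage(x, y, w, h)
}
```

With a real interface that has dozens of members, this hand-written forwarding is what piles up as boilerplate.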

UPDATE 2024-09-10: Thanks to Adrián Szegedi (GitHub: HawkSK), the code is even simpler (PR#42): no need to explicitly forward methods. Instead, we use Kotlin's delegation syntax, which implicitly forwards non-overridden methods. This removes 100 lines of boilerplate 💪🏻
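Kotlin's interface delegation (`: Interface by instance`) can be sketched on a simplified interface. `RegionSource` is again a hypothetical stand-in for QuPath's `ImageServer`, not its real API, and `PrefetchingSource` is an illustrative name.

```kotlin
import java.awt.image.BufferedImage

// Hypothetical, much-simplified stand-in for QuPath's ImageServer (NOT its real API).
interface RegionSource {
    val path: String
    val width: Int
    val height: Int
    fun readRegion(x: Int, y: Int, w: Int, h: Int): BufferedImage
}

// `RegionSource by wrapped`: the compiler generates forwarding for every member
// not overridden here (path, width, height); only readRegion is hand-written.
class PrefetchingSource(private val wrapped: RegionSource) : RegionSource by wrapped {
    // Read the whole image once, on first use (synchronized by default).
    private val fullImage: BufferedImage by lazy {
        wrapped.readRegion(0, 0, wrapped.width, wrapped.height)
    }

    override fun readRegion(x: Int, y: Int, w: Int, h: Int): BufferedImage =
        fullImage.getSubimage(x, y, w, h)
}
```

Note that Kotlin delegation works only for interfaces, and it is composition rather than inheritance: if the wrapped object calls its own `readRegion` internally, that call goes to the wrapped implementation, not to the proxy's override.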

The one non-forwarded method is the core operation: reading a region.

It now extracts the requested region from the entire (prefetched) image:

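  // Reads the entire image (full resolution, z = 0, t = 0) from the wrapped server, once.
  // @Synchronized so concurrent readRegion calls can't trigger duplicate prefetches.
  @Synchronized
  private fun readFullImage(): BufferedImage {
    prefetchedImage?.let { return it }

    logger.info("Prefetching full image at path: ${wrappedImageServer.path}")

    val wholeImageRequest = RegionRequest.createInstance(
      wrappedImageServer.path,
      1.0, // downsample: full resolution
      0,   // x
      0,   // y
      wrappedImageServer.width,
      wrappedImageServer.height
    )
    return wrappedImageServer.readRegion(wholeImageRequest).also { prefetchedImage = it }
  }

  override fun readRegion(request: RegionRequest?): BufferedImage {
    requireNotNull(request) { "request must not be null" }
    require(request.z == 0 && request.t == 0) {
      "PrefetchedImageServer only supports z=0 and t=0"
    }

    // Coordinates are interpreted at full resolution (the prefetch used downsample = 1.0).
    return readFullImage().getSubimage(request.x, request.y, request.width, request.height)
  }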

This way, we read the image only once and fetch every subregion from the in-memory image.
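Part of why serving regions from memory is cheap: `BufferedImage.getSubimage` returns a view that shares the parent's pixel data rather than copying it. The sketch below (a hypothetical check, not from the article) writes through a subimage and reads the change back from the parent.

```kotlin
import java.awt.image.BufferedImage

// Returns true if a subimage shares pixel storage with its parent image.
fun subimageSharesPixels(): Boolean {
    val full = BufferedImage(100, 100, BufferedImage.TYPE_BYTE_GRAY)
    val region = full.getSubimage(10, 20, 5, 5)
    // Write through the view at its local (0, 0)...
    region.raster.setSample(0, 0, 0, 200)
    // ...and look for the change at the matching parent coordinate (10, 20).
    return full.raster.getSample(10, 20, 0) == 200
}
```

The flip side of sharing is that any caller that modifies a returned region would also modify the prefetched image, so this approach relies on measurement code only reading pixels.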

Here's the real-world speed-up (on Google Batch):

Google Batch jobs showing new runtime 2min 14s, and old runtime 12hr 25min

| Before (min) | After (min) | Delta |
|---|---|---|
| 745 | 2 | -743 min (-99.7%) |
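Using the exact Batch runtimes from the screenshot rather than the rounded minutes, the speed-up factor works out to roughly:

$$
\frac{12\ \text{h}\ 25\ \text{min}}{2\ \text{min}\ 14\ \text{s}}
= \frac{44{,}700\ \text{s}}{134\ \text{s}} \approx 334\times
$$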

In the words of the great Tina Turner: Boom, Shaka Laka.
