Optimizing QuPath intensity measurements: 12.5 hr to 2min
Spatial biology analyzes tissue sample images to derive patterns and data. A key first step is identifying cells on the image and gathering quantitative measurements about those cells.
In our ongoing work scaling DeepCell on GCP Batch, we'd previously gotten pretty efficient at the first part: segmenting the image into cells. But we hit a major performance roadblock for the next step: generating quantitative measurements.
The measurements are fairly straightforward:
- size of each cell (convert pixels in each detected cell to physical dimensions, assuming some number of microns per pixel)
- pixel intensity of each cell
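To make the two measurements concrete, here is a minimal sketch. The helper names and the idea of representing a cell as a list of its pixel values are assumptions for illustration, not QuPath's API.

```kotlin
// Hypothetical helpers, not QuPath API: a cell is represented here by the
// intensity values of the pixels inside its detected outline.

/** Area in square microns, given a pixel count and the pixel size in microns. */
fun cellAreaMicrons2(pixelCount: Int, micronsPerPixel: Double): Double =
    pixelCount * micronsPerPixel * micronsPerPixel

/** Mean intensity over the cell's pixels; NaN for a cell with no pixels. */
fun meanIntensity(pixelValues: List<Int>): Double =
    if (pixelValues.isEmpty()) Double.NaN else pixelValues.average()
```

For example, a cell covering 100 pixels at an assumed 0.5 microns per pixel has an area of 25 square microns.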
Yet for a ~140M-pixel image, it took about 12.5 hours (‼️) to measure the detected cells. That's … not great 😩 What the heck?? We're just counting pixels and reading pixel values. An HD image is ~2M pixels, and computers (and TVs and phones) render more than 30 of those per second.
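As a back-of-envelope check using only the numbers above (rendering is not the same work as measuring, so this is just an order-of-magnitude comparison):

$$
\frac{140\,\text{M pixels}}{2\,\text{M pixels/frame}} = 70 \text{ frames}, \qquad
\frac{70 \text{ frames}}{30 \text{ frames/s}} \approx 2.3 \text{ s}, \qquad
\frac{12.5 \text{ h} \times 3600 \text{ s/h}}{2.3 \text{ s}} \approx 1.9 \times 10^{4}
$$

So the measurement step was taking roughly four orders of magnitude longer than it takes to push the same number of pixels to a screen.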
Profiling to the rescue. The great thing about JVM code is that it's extremely easy to profile. Just click "profile" instead of "run".
Here's the resulting flamegraph.
The key finding: 99.9% of the time spent adding intensity measurements (84% of the total time) goes to simply reading the image.
OK: so we need to not read the image repeatedly. In our case, the entire image can (for now) fit into RAM. If only we could simply prefetch the image, then read regions out of that in-memory image.
Sounds like a great use case for the Proxy pattern. We need an ImageServer that behaves just like the original image server, except that it reads from an in-memory image, not from disk (or wherever the wrapped server reads).
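The idea can be sketched independently of QuPath. In the sketch below, `PixelSource`, `SlowSource`, and `PrefetchedSource` are hypothetical stand-ins invented for illustration; the proxy exposes the same interface as the source it wraps, but reads that source only once.

```kotlin
// Hypothetical stand-ins for illustration; these are not QuPath types.
interface PixelSource {
    val width: Int
    val height: Int
    fun readRegion(x: Int, y: Int, w: Int, h: Int): IntArray
}

/** Simulates an expensive source and counts how often it is hit. */
class SlowSource(override val width: Int, override val height: Int) : PixelSource {
    var reads = 0
        private set

    override fun readRegion(x: Int, y: Int, w: Int, h: Int): IntArray {
        reads++
        // Each pixel value encodes its own position (row * width + column).
        return IntArray(w * h) { i -> (y + i / w) * width + (x + i % w) }
    }
}

/** Proxy: same interface, but reads the wrapped source once and serves from memory. */
class PrefetchedSource(private val wrapped: PixelSource) : PixelSource {
    override val width: Int get() = wrapped.width
    override val height: Int get() = wrapped.height

    // Fetched on first use, then kept in memory.
    private val pixels: IntArray by lazy { wrapped.readRegion(0, 0, width, height) }

    override fun readRegion(x: Int, y: Int, w: Int, h: Int): IntArray =
        IntArray(w * h) { i -> pixels[(y + i / w) * width + (x + i % w)] }
}
```

Callers that only know about `PixelSource` cannot tell the difference, apart from speed: however many regions they request from a `PrefetchedSource`, the wrapped source is read a single time.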
The resulting code is quite simple. Here's the pull request. We override the abstract ImageServer, wrapping another ImageServer and forwarding all methods to the original.
UPDATE 2024-09-10: Thanks to Adrián Szegedi (GitHub HawkSK) the code is even simpler (PR#42): no need to explicitly forward methods. Instead we use Kotlin's delegation syntax which implicitly forwards non-overridden methods. This removes 100 lines of boilerplate 💪🏻
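For readers unfamiliar with the delegation syntax, here is a minimal sketch. `TileServer` and its implementations are hypothetical and much smaller than the real ImageServer; the point is only that `by wrapped` forwards everything the class does not override. Note that Kotlin allows this form of delegation for interfaces only.

```kotlin
// Hypothetical interface for illustration; not the QuPath ImageServer.
interface TileServer {
    val path: String
    val width: Int
    val height: Int
    fun readTile(x: Int, y: Int): String
}

class DiskTileServer(override val path: String) : TileServer {
    override val width = 1024
    override val height = 768
    override fun readTile(x: Int, y: Int) = "disk:$x,$y"
}

// `by wrapped` makes the compiler generate forwarding implementations for
// every member of TileServer that this class does not override itself.
class InMemoryTileServer(private val wrapped: TileServer) : TileServer by wrapped {
    // The one member we handle ourselves instead of forwarding.
    override fun readTile(x: Int, y: Int) = "memory:$x,$y"
}
```

Here `InMemoryTileServer(DiskTileServer("a.tiff")).path` is forwarded to the wrapped server, while `readTile` uses the override.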
The one non-forwarded method is the core operation: reading a region.
That one turns into extracting the region from the entire (prefetched) image:
    // Reads the whole image from the wrapped server once; later calls are no-ops.
    private fun readFullImage() {
        if (prefetchedImage != null)
            return
        logger.info("Prefetching full image at path: ${wrappedImageServer.path}")
        val wholeImageRequest = RegionRequest.createInstance(
            wrappedImageServer.path,
            1.0,  // downsample: full resolution
            0,    // x
            0,    // y
            wrappedImageServer.width,
            wrappedImageServer.height
        )
        prefetchedImage = wrappedImageServer.readRegion(wholeImageRequest)
    }

    override fun readRegion(request: RegionRequest?): BufferedImage {
        // A null request also ends up here, because null?.z != 0.
        if (request?.z != 0 || request.t != 0)
            throw IllegalArgumentException("PrefetchedImageServer only supports z=0 and t=0")
        readFullImage()
        // getSubimage returns a view that shares pixel data with the prefetched image (no copy).
        return prefetchedImage!!.getSubimage(request!!.x, request.y, request.width, request.height)
    }
This way, we only read the image once, and fetch all subregions from the in-memory image.
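The sub-region step relies on `BufferedImage.getSubimage`, which returns a view onto the same pixel data instead of copying it. A small standalone sketch of that behavior follows; the helper names are made up for illustration.

```kotlin
import java.awt.image.BufferedImage

/** Builds a small RGB image whose pixel at (x, y) stores y * width + x. */
fun makeTestImage(width: Int, height: Int): BufferedImage {
    val img = BufferedImage(width, height, BufferedImage.TYPE_INT_RGB)
    for (y in 0 until height)
        for (x in 0 until width)
            img.setRGB(x, y, y * width + x)
    return img
}

/** Returns the RGB bits (alpha masked off) at the top-left corner of a sub-region. */
fun topLeftOfRegion(full: BufferedImage, x: Int, y: Int, w: Int, h: Int): Int =
    full.getSubimage(x, y, w, h).getRGB(0, 0) and 0xFFFFFF
```

Sub-image coordinates start at (0, 0), so `topLeftOfRegion(img, 2, 3, 2, 2)` reads the pixel stored at (2, 3) in the full image. Because the data is shared, a later write to the full image is visible through a previously obtained sub-image, and a region that extends past the image bounds throws a `RasterFormatException`.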
Here's the speed-up in the real world (Google Batch):
Before (min) | After (min) | Delta |
---|---|---|
745 | 2 | -743 min (-99.7%) |
In the words of the great Tina Turner: Boom, Shaka Laka.