Poor man's parallel in Bash
Original post on my blog, happy to include feedback!
Cover: Paris bibliothèques, via Clawmarks
Topics:
- Running scripts in parallel
- Tools to limit concurrent jobs
- Shell process handling cheatsheet
- Limit concurrent jobs with Bash
- Bonus: One-liners
1) Running scripts in parallel
does not take much effort: I've been speeding up builds by running commands simultaneously with an appended ampersand (`&`):
# stuff can happen concurrently
# use `&` to run in a sub-shell
cmd1 &
cmd2 &
cmd3 &
# wait on sub-processes
wait
# these need to happen sequentially
cmd4
cmd5
echo Done!
Job control is a shell feature: commands are put into background processes and run at the same time.
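To see the effect, here's a minimal sketch timing two one-second sleeps started side by side:
# both sleeps run in the background, so the
# group finishes in roughly 1 second, not 2
time ( sleep 1 & sleep 1 & wait )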
Now assuming you want to loop over more than a few commands, e.g. converting files:
for file in *.jpg; do
# start optimizing every file at once
jpegoptim -m 45 "${file}" &
done
# finish queue
wait
Running a lot of processes this way is still faster than a regular loop. But compared to just a few concurrent jobs there are no further speed gains – possibly even slowdowns once disk I/O becomes the bottleneck [citation needed].
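How many jobs count as "a few"? A common rule of thumb is the number of CPU cores – a sketch for detecting it (assuming `nproc` from GNU coreutils, with a `sysctl` fallback for macOS):
# pick a job limit from the core count;
# `nproc` is GNU coreutils, `sysctl -n hw.ncpu` covers macOS
max_jobs=$(nproc 2>/dev/null || sysctl -n hw.ncpu)
echo "limiting to ${max_jobs} concurrent jobs"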
So you'll want to use
2) Tools to limit concurrent jobs
by either 1) installing dedicated tools like `parallel` or `xjobs`, or 2) relying on `xargs`, which is feature-rich but more complicated.
Transforming `wait` code to `xargs` is described here: an example for parallel batch jobs. The article notes small differences between POSIX flavours – e.g. different handling of separators on BSD/macOS.
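For reference, the `xargs` route can be as short as this – a sketch assuming an `xargs` with `-P` support (both the GNU and BSD/macOS versions provide it):
# NUL-separated file list survives spaces in names;
# -P3 runs up to 3 jpegoptim processes at once
printf '%s\0' *.jpg | xargs -0 -n1 -P3 jpegoptim -m 45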
We'll be choosing option 3) – digging into the features of `wait` and `jobs` to manage processes.
Quoting this great summary, here are some example commands for
3) Shell process handling
# run child process, save process id via `$!`
cmd3 & pid=$!
# get job list
jobs
# get job ids only
# note: not available on zsh
jobs -p
# only wait on job at position `n`
# note: slots may turn up empty while
# newer jobs rest in the queue's tail
wait %n
# wait on last job in list
wait %%
# wait on next finishing process
# note: needs Bash 4.3
wait -n
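Combining a few of these: a sketch that saves each child's PID via `$!` and checks its exit status afterwards (the failure message is my own addition):
pids=()
for file in *.jpg; do
  jpegoptim -m 45 "${file}" & pids+=($!)
done
# `wait <pid>` returns that child's exit status
for pid in "${pids[@]}"; do
  wait "${pid}" || echo "job ${pid} failed"
done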
Taking our example from before, we make sure to
4) Limit concurrent jobs with Bash
each time a process finishes, using `wait -n`:
for file in *.jpg; do
  jpegoptim -m 45 "${file}" &
  # fewer than 3 jobs running? continue the loop
  if [[ $(jobs|wc -l) -lt 3 ]]; then continue; fi
  # at 3 jobs, wait for the next one to finish, then loop
  wait -n
done
# finish queue
wait
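The same pattern can be wrapped in a reusable helper – a sketch, where `run_limited` and `MAX_JOBS` are names I'm introducing here:
MAX_JOBS=3
run_limited() {
  # block while MAX_JOBS children are already running
  while [[ $(jobs|wc -l) -ge ${MAX_JOBS} ]]; do
    wait -n  # needs Bash 4.3+
  done
  "$@" &
}
for file in *.jpg; do
  run_limited jpegoptim -m 45 "${file}"
done
wait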
Sadly, this won't work on macOS, where the stock Bash is frozen at an old version (3.2). We replace the `wait -n` command with `wait %%` to wait on the 3rd/last job in the queue – an OK compromise for small groups (a 1/3 chance of hitting the fastest/slowest/medium job):
for file in *.jpg; do
  jpegoptim -m 45 "${file}" &
  # fewer than 3 jobs running? continue the loop
  if [[ $(jobs|wc -l) -lt 3 ]]; then continue; fi
  # at 3 jobs, wait for the last in line, then loop
  wait %%
done
# finish queue
wait
To develop the code further, one could check the Bash version or detect alternative shells (zsh is the default on macOS) and switch implementations depending on context.
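As a minimal sketch of such a version check (`wait_one` is just an illustrative name):
# pick the waiting strategy at runtime:
# Bash 4.3+ has `wait -n`, older versions fall back to `wait %%`
wait_one() {
  if (( BASH_VERSINFO[0] > 4 || (BASH_VERSINFO[0] == 4 && BASH_VERSINFO[1] >= 3) )); then
    wait -n
  else
    wait %%
  fi
}
I keep using these: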
5) Bonus: One-liners
# sequential, slow
time ( for file in *.jpg; do jpegoptim -m 45 "${file}"; done )
# concurrent, messy
time ( for file in *.jpg; do jpegoptim -m 45 "${file}" & done; wait )
# concurrent, fast/compatible
time ( for file in *.jpg; do jpegoptim -m 45 "${file}" & if [[ $(jobs|wc -l) -lt 3 ]]; then continue; fi; wait %%; done; wait )
# concurrent, fastest
time ( for file in *.jpg; do jpegoptim -m 45 "${file}" & if [[ $(jobs|wc -l) -lt 3 ]]; then continue; fi; wait -n; done; wait )
Fun Fact
As the 20th birthday post by `parallel` author Ole Tange explains, the original version leveraged `make`, because it allows asynchronous processes as well.