Logo

dev-resources.site

for different kinds of informations.

Let me count the ways (PWC 244 Count Smaller)

Published at
11/21/2023
Categories
perl
perlweeklychallenge
pwc
benchmark
Author
boblied
Author
7 person written this
boblied
open
Let me count the ways (PWC 244 Count Smaller)

TLDR

One pass over sorted data is a better strategy than searching through random data.

The Problem

Perl Weekly Challenge 244, Task 1, Count Smaller asks us to categorize data for its relative values:

You are given an array of integers. Write a 
script to calculate the number of integers 
smaller than the integer at each index.
Enter fullscreen mode Exit fullscreen mode

Example 1

Input: @int = (8, 1, 2, 2, 3)
Output: (4, 0, 1, 1, 3)

For index = 0, count of elements less than 8 is 4.
For index = 1, count of elements less than 1 is 0.
For index = 2, count of elements less than 2 is 1.
For index = 3, count of elements less than 2 is 1.
For index = 4, count of elements less than 3 is 3.

Example 2

Input: @int = (6, 5, 4, 8)
Output: (2, 1, 0, 3)

Example 3

Input: @int = (2, 2, 2)
Output: (0, 0, 0)

Idea 1: Brute force search

The first idea, as usual, is brute force. For every element in the array, scan the array for values that are less than that. An optimization that immediately occurs to me is that the scan for smaller values would be easier if the numbers are sorted; we can quit early and don't have to scan the entire list every time.

sub countSmaller_A($nums)
{
    my @sorted = sort { $a <=> $b } $nums->@*;
    my @smaller = ();
    for my $i ( $nums->@* )
    {
        my $count = 0;
        for my $j ( @sorted )
        {
            last if $j >= $i;
            $count++
        }
        push @smaller, $count;
    }
    return \@smaller;
}
Enter fullscreen mode Exit fullscreen mode

Searching in a sorted list has an optimization. We could find the index of the first element in @sorted that is greater than $i using binary search (or, more realistically, using List::Utils::bsearch_index because my odds of getting a binary search algorithm right are slim). Maybe I'll come back to this, but this is basically an O(n^2) algorithm, so there are probably better ideas to explore.

Idea 2: Frequency counts

The second approach is to count how many times each value occurs, saving a frequency table. Then select from that table all the lesser numbers and add them up. This is a bit of a Perl tour-de-force, using array operations up the wazoo. The code is compact, but the complexity might not be worth it.

sub countSmaller_B($nums)
{
    use List::Util qw/sum0/;
    my %freq;
    $freq{$_}++ for $nums->@*;

    return [ map { my $i = $_; sum0 @freq{ grep { $_ < $i } keys %freq } } $nums->@* ];
}
Enter fullscreen mode Exit fullscreen mode

Building the frequency table is relatively obvious, but what's going on in that last line?

  • return [ ... ] -- the answer is going to be an array reference to the list of numbers
  • return [ map { my $i = $_; ... } $nums->@* ] -- the list of numbers is going to be generated by mapping each of the original numbers to its count of smaller numbers. Each of the original numbers is going to be assigned to $i while we're doing the transformation
  • grep { $_ < $i } keys %freq -- we're going to select all keys from the frequency table %freq where the key is less than the number we're looking at
  • @freq{ grep...} -- Using the list of keys from the grep, take a hash slice from the %freq table. This will give us the counts of each value less than $i.
  • sum0 @freq... -- The sum0 function adds up the values in an array, defaulting to 0 if the array is empty.

Idea 3: Make one pass over sorted data

Let's return for a moment to that sorted array we built back in Idea 1. Once the array is sorted, it's pretty easy to find all the smaller numbers -- it's a monotonically increasing sequence as we scan from left to right.

There are two complications: repeated values, and the fact that the answer has to be in the same order as the original array. Hashes for caches and hash slices are going to come to our rescue.

sub countSmaller_C($nums)
{
    my @sorted = sort { $a <=> $b } $nums->@*;

    my %smaller;
    my $lessCount = 0;
    while ( defined(my $i = shift @sorted) )
    {
        if ( ! exists $smaller{$i} )
        {
            $smaller{$i} = $lessCount;
        }
        $lessCount++
    }

    return [ @smaller{$nums->@*} ];
}
Enter fullscreen mode Exit fullscreen mode

The first thing we do is take the hit of doing the sort, on the premise that we're going to get that performance back with a simpler search loop.

The %smaller variable is a hash that is going to contain our answer, but in an arbitrary order. We will un-arbitrary it at the end with a hash slice that uses the original array for keys -- that will solve our ordering problem.

The $lessCount variable is an accumulator that will count up the number of smaller values as we pass over the sorted array.

The while loop will take elements off the sorted array from left to right. We saw last week that shifting elements off the array can be more efficient than using array indexing. A quick note: the test has to include defined because there could be a zero in the data, and that would test as false.

Within the loop, we use the %smaller array as a cache for values we've already seen. That solves our repeated-values problem.

Time for benchmarking

This is only an exhibition; no wagering, please. As I've been doing for the past week, I brought out the Benchmark module for some simple comparison.

sub runBenchmark($repeat)
{
    use Benchmark qw/cmpthese/;
    my @data = map { int(rand(100)) } 1..100;

    cmpthese($repeat, {
        "simple   " => sub { countSmaller_A(\@data) },
        "frequency" => sub { countSmaller_B(\@data) },
        "one pass " => sub { countSmaller_C(\@data) },
    });
}
Enter fullscreen mode Exit fullscreen mode

The benchmark generates some bigger data than task examples, and then runs the functions thousands of times to get something like a statistically valid result.

And here's the big reveal, sorted from slowest to fastest:

             Rate frequency simple    one pass 
frequency  1339/s        --      -63%      -95%
simple     3647/s      172%        --      -88%
one pass  29268/s     2085%      702%        --
Enter fullscreen mode Exit fullscreen mode

First of all, those compact array operations in Idea 2 look cute, but there's a lot of data access going on underneath. We were better off with using more readable loops from Idea 1.
But, wow! Look at the one pass solution! Literally ten times faster. Better algorithms beat micro-optimizations.

perlweeklychallenge Article's
30 articles in total
Favicon
Maximally Indexed Indices (PWC 298)
Favicon
PWC 296 String Compression
Favicon
PWC 293 Similar Dominos Done Badly
Favicon
Jump, but Don't Break the Game (PWC 295)
Favicon
Ups and Downs, Beginnings and Ends (PWC 297)
Favicon
Consecutive Sequences of Permutations, Anyone? (PWC 294)
Favicon
PWC 293 Similar Dominos Done Badly
Favicon
Domino Frequencies and the Vectorized Boomerang (PWC 293)
Favicon
PWC 287 Strength in Numbers
Favicon
PWC 281 Knight's Move
Favicon
PWC 274 Waiting at the Bus Stop
Favicon
PWC 269 Two of Us Distributing Elements
Favicon
PWC 267 Positively Perl Street
Favicon
PWC 268 Games Numbers Play
Favicon
PWC 263.1 Don't Sort It, Be Happy
Favicon
PWC 262 Grep it once, grep it twice
Favicon
PWC 258 How do I sum thee? Let me count the ones.
Favicon
PWC 259 Something I think's worthwhile if a parser makes you smile
Favicon
PWC 254, Task 2: I, Too, Wish to Make the "Vowel Movement" Joke
Favicon
PWC 254 Task 1, Three Power
Favicon
PWC 252 Special Numbers
Favicon
PWC 246 Random use of algebra
Favicon
Let me count the ways (PWC 244 Count Smaller)
Favicon
PWC 241 Triplets, sorting, and a profiling digression
Favicon
PWC 242 Flip the Script
Favicon
PWC 243 Sweeping the floor
Favicon
PWC 238 Running and Persistence
Favicon
PWC 239 Adagio for frayed knots
Favicon
PWC 235 Steppin' in a Slide Zone
Favicon
PWC 236 Task 1 A Change Would Do You Good

Featured ones: