dev-resources.site
for different kinds of informations.
PWC 243 Sweeping the floor
Perl Weekly Challenge 243 had some more exercises in pair-wise iteration over arrays. While I was still contemplating it, James Curtis-Smith posted one of his characteristically code-dense solutions, and it prompted me to rethink what I was doing. Let's have a look.
Task 1, Reverse Pairs
You are given an array of integers.
Write a script to return the number of
reverse pairs in the given array.
A reverse pair is a pair (i, j) where:
a) 0 <= i < j < nums.length
and b) nums[i] > 2 * nums[j].
Example 1
Input:
@nums = (1, 3, 2, 3, 1)
Output:2
(1, 4) => nums[1] = 3, nums[4] = 1, 3 > 2 * 1
(3, 4) => nums[3] = 3, nums[4] = 1, 3 > 2 * 1
Example 2
Input: @nums = (2, 4, 3, 5, 1)
Output: 3
(1, 4) => nums[1] = 4, nums[4] = 1, 4 > 2 * 1
(2, 4) => nums[2] = 3, nums[4] = 1, 3 > 2 * 1
(3, 4) => nums[3] = 5, nums[4] = 1, 5 > 2 * 1
Thoughts
Well, that's oddly specific, but okay. This is the typical two-index march over an array, finding all the pairs. It's simplified by specifying that i < j
, so it's not really all pairs, but just the half. We've seen this many times in challenges, and the stereotypical solution is a nested loop, where i
goes over all the indices, and j
moves from i+1
to the end.
for ( my $i = 0 ; $i < $#nums ; $i++ )
{
for ( my $j = $i + 1 ; $j <= $#nums ; $j++ )
{
$count++ if $nums[$i] > 2 * $nums[$j];
}
}
This is clean and obvious and has analogous code in most procedural languages with array indexing. But that indexing sure is annoying. It would be nice to deal with the values directly, wouldn't it?
A characteristic of this loop is that as we march along with i
, we use that element once and then we're done with it. The j
index is moving along the elements to the right of i
. For all practical purposes, it's as if the array was getting smaller each time.
So let's make it smaller each time. Instead of indirectly getting the i
-th elements, let's just take it off the array:
while ( my $i = shift @nums ) { ... }
Now we have the next value in $i
, and the remainder of the array in @nums
. To process the rest, we can iterate over the elements directly, without having to access via index. With that, we get simpler code with many fewer brackets.
sub reversePairs(@nums)
{
my $count = 0;
while ( my $i = shift @nums )
{
for my $j ( @nums )
{
$count++ if $i > 2 * $j;
}
}
return $count;
}
It looks better, and Lisp programmers will be feeling some warm fuzzy car
and cdr
vibes, but what about efficiency? Is there a hidden cost in what looks like array modification? As a matter of fact, the shifting solution is much better for performance. Here's a sample Benchmark
comparison:
Rate basic shifting
basic 3125/s -- -44%
shifting 5556/s 78% --
There is one thing to be careful of: this idiom destroys the array while operating on it. That's not much of a problem here, because the array is passed by value. If we were passed an array reference, however, and shifted values off that, the calling function might be surprised to find itself with an empty array.
Task 2, Floor Sum
You are given an array of positive integers (>=1).
Write a script to return the sum of
floor(nums[i] / nums[j])
where 0 <= i,j < nums.length.
The floor() function returns the integer
part of the division.
Example 1
Input:
@nums = (2, 5, 9)
Output:10
floor(2 / 5) = 0 floor(2 / 9) = 0 floor(2 / 2) = 1
floor(5 / 9) = 0. floor(5 / 5) = 1 floor(5 / 2) = 2
floor(9 / 9) = 1 floor(9 / 2) = 4. floor(9 / 5) = 1
Example 2
Input:
@nums = (7, 7, 7, 7, 7, 7, 7)
Output:49
Thoughts
Once again, we have a problem that searches for pairs in an array. This time, however, we have a random array, so we need to consider every possible pair. However, there is a curious kind of anti-symmetry: the floor function will be zero whenever the numerator is less than the denominator.
An aside on the floor function: the task is making it easy on us by specifying the array as containing positive integers. Not only can we skip over division-by-zero worries, that means the floor function is equivalent to integer truncation of the division, int(x/y)
. If we really need to handle the more general case of floating point and negative numbers, there has been a floor function in Perl for a long time in the POSIX
module, and more recently, there is now a version among the experimental builtins:
use builtin qw/floor ceil/; no warnings "experimental::builtin";
Alright, so let's start with obvious solutions. Taking the problem description literally, let's loop over all pairs of i
and j
and find the sum.
sub floorSum(@nums)
{
my $sum = 0;
for ( my $i = 0 ; $i <= $#nums ; $i++ )
{
for ( my $j = 0 ; $j <= $#nums ; $j++ )
{
$sum += int($nums[$i]/$nums[$j]);
}
}
return $sum;
}
That works, but leaves me a little queasy about the O(n^2) performance of that double loop. Could we do better? We could, if the array was sorted in descending order. In that case, the floor function would only give us non-zero numbers when the j
index is less than the i
index -- this will give us a problem similar to Task 1. Well, almost. Think about example 2, where all the values are equal. We can't really move j
only to the right of i
-- we have to consider the possibility of equal values. This leads to a solution like this:
sub fs(@nums)
{
my @N = sort { $b <=> $a } @nums;
my $sum = 0;
my $jstart = 0;
for ( my $i = 0 ; $i <= $#N ; $i++ )
{
$jstart++ while ( $N[$jstart] > $N[$i] );
for ( my $j = $jstart ; $j <= $#N && $N[$i] >= $N[$j] ; $j++ )
{
$sum += int( $N[$i] / $N[$j] );
}
}
return $sum;
}
There's a sort step in here, which could be kind of expensive, but there will be many fewer floor
operations. Does it pay off? Yes, it does. Sorting is probably O(n
logn
), and the reduction of the loop will mean the number of operations is approximately in the neighborhood of sum-from-1-to-n. Instead of n
^2 operations we'll have (n)(n+1)/2, plus the cost of the sort. If n
is 100, n
^2 is 10,000 operations, but the sort is about 200, and the double loop is about 5000. We'll benchmark later, but this seems worth doing.
Now back to that thought from James Curtis-Smith. All those square brackets and indirection are mental clutter. Could we access the array directly as we did in Task 1? James points out how to do it, and gives a concise and minimal solution. Credit for the answer is his, but here's my version of it, made a bit more readable.
sub fs_s(@nums)
{
my $sum = scalar @nums;
while ( my $i = shift @nums )
{
for my $j ( @nums )
{
$sum += int($i/$j) + int($j/$i);
}
}
return $sum;
}
What's going on? First of all, the floor will always be 1 when i
=j
, so we start with the sum being at least the length of the array. Then, as before, we can take each i
value and apply it against the remainder of the array. We want the contributions of each pair i,j and j,i but one of them will be zero when the floor function is applied. Since were graced with having no zeroes or negative values, we can use simple integer division to get our sum.
So, was all that worth it? Here's a Benchmark comparison, showing the four versions from slowest to fastest. basic
is the brute-force double loop; sort
is the first optimization with a sort step; golf
is James' version; and ungolf
is my version, more readable and optimizing the integer division instead of using POSIX::floor
Rate basic sort golf ungolf
basic 885/s -- -18% -49% -62%
sort 1079/s 22% -- -37% -53%
golf 1724/s 95% 60% -- -25%
ungolf 2308/s 161% 114% 34% --
I will note that I did another version that passed an array reference instead of copying the array by value, but it was hard to benchmark because the versions that shift values destroy the array -- as noted above -- and therefore the repeated calls in the benchark loop don't work. I had to recopy the array to make the benchmark work, and that defeats the purpose of using a reference.
Once again, letting Perl do the annoying bookkeeping work of iterating over arrays pays off, not just in simplicity of code, but in performance.
Featured ones: