Logo

dev-resources.site

for different kinds of informations.

How to Perform a Reverse Dictionary Lookup in Python: Generator Expressions and More

Published at
6/23/2020
Categories
performance
testing
python
Author
Jeremy Grifski
Categories
3 categories in total
performance
open
testing
open
python
open
How to Perform a Reverse Dictionary Lookup in Python: Generator Expressions and More

Welcome to yet another Python tutorial. Today, we’re taking a look at dictionaries and how we can perform a reverse dictionary lookup. In words, how do we get a key from a dictionary given a value?

As it turns out, there are three main solutions. First, we could try explicitly looping over the dictionary using something like my_dict.items(). Alternatively, we could create a generator expression: next(key for key, value in my_dict.items() if value == value_to_find). Finally, we could invert the dictionary completely to retrieve the key like normal.

Problem Introduction

Awhile back, I wrote an article about how to invert a dictionary. In other words, how do we swap keys and values in a dictionary? Well, as it turns out, sometimes we don’t need to flip an entire dictionary. All we need is a key given a value.

Normally when we use a dictionary, we pass it a key to retrieve a value. But, what if we want to retrieve a key given a value? In other words, what if we want to perform a reverse dictionary lookup. For example, given the following dictionary, we might want to retrieve the first key that matches the value “red”:

my_dict = { 
  "color": "red", 
  "width": 17, 
  "height": 19
}

In this case, we would expect our solution to return “color”. Of course, there might be multiple keys that match. How do we decide which one to grab?

Luckily, we won’t be digging into the nuance in this article. Instead, we’ll be looking at a handful of solutions that return the first key or every key that matches the value.

Solutions

In this article, we’ll take a look at a few ways to perform a reverse ditionary lookup. As always, we’ll kick things off with a brute force solution. Then, we’ll look at some more sophisticated solutions.

Reverse Dictionary Lookup by Brute Force

Perhaps a straightforward way of solving this problem is to iterate over the dictionary until we find the value we’re looking for:

my_dict = {"color": "red", "width": 17, "height": 19}
value_to_find = "red"
for key, value in my_dict.items(): 
  if value == value_to_find: 
    print(f'{key}: {value}')

In this case, we’re searching the dictionary for the value “red”. During each iteration, we’ll check if the value we’re looking for matches the current value. If it does, we print the results.

If we copy this solution verbatim, it will actually spit out all the matching keys. In this case, we’ll only see “color: red”. That said, a larger dictionary could yield duplicates.

At any rate, there are plenty of more interesting solutions ahead!

Reverse Dictionary Lookup Using a Generator Expression

Instead of looping over our dictionary explicitly, we could leverage a generator expression (PEP 289) which looks a lot like a list comprehension:

my_dict = {"color": "red", "width": 17, "height": 19}
value_to_find = "red"
key = next(key for key, value in my_dict.items() if value == value_to_find)
print(f'{key}: {value_to_find}')

Naturally, the difference between a list comprehension and a generator expression is that there’s no list created. In other words, we save memory and possibly time.

In the example above, instead of generating a list of all of the key-value pairs and iterating over them, we repeatedly generate a new key-value pair until we find one that matches. This clever bit of code is basically a condensed version of our loop from our brute forced solution. Of course, the iteration stops when we find what we need.

Again, be aware that this solution will only return the first key that matches our lookup value. If we wanted more than one key, we’d have to store the generator expression:

exp = (key for key, value in my_dict.items() if value == value_to_find)
next(exp) # First matching key
next(exp) # Second matching key

If we call next more times than there are matches, we get a StopIteration error. As a workaround, we can use a for-each loop directly:

exp = (key for key, value in my_dict.items() if value == value_to_find)
for key in exp: 
  print(key)

Now, isn’t that nice?

Reverse Dictionary Lookup Using an Inverse Dictionary

As I mentioned in the problem description, we can always completely flip the dictionary:

my_dict = {"color": "red", "width": 17, "height": 19}
value_to_find = "red"
my_inverted_dict = {value: key for key, value in my_dict.items()}
key = my_inverted_dict[value_to_find]

If you haven’t had a chance to read the other article, basically this solution takes advantage of a dictionary comprehension. In other words, it constructs a new dictionary from the original dictionary. Naturally, the part that does the magic is value: key which reverses the mapping.

Unfortunately, this solution won’t work for every circumstance because not all values are hashable (e.g. lists), but it gets the job done. Likewise, it only saves the last key for any duplicate values. As a result, other possible keys are lost.

If we want a solution that generates a list of keys, we can do something like the following:

my_dict = {"color": "red", "width": 17, "height": 19}
value_to_find = "red"
my_inverted_dict = dict()
for key, value in my_dict.items(): 
  my_inverted_dict.setdefault(value, list()).append(key)
keys = my_inverted_dict[value_to_find]

In this example, we end up with a list of keys rather than a single key.

Performance

As always, let’s take a look at the performance of each of these solutions. First, we’ll need to set them up in strings:

setup = """
my_dict = {"color": "red", "width": 17, "height": 19}
value_to_find = "red"
"""

brute_force_single = """
for key, value in my_dict.items(): 
  if value == value_to_find: 
    break
"""

brute_force_multi = """
for key, value in my_dict.items(): 
  if value == value_to_find: 
    pass
"""

generator_single = """
next(key for key, value in my_dict.items() if value == value_to_find)
"""

generator_multi = """
exp = (key for key, value in my_dict.items() if value == value_to_find)
for key in exp:
  pass
"""

inverse_single = """
my_inverted_dict = {value: key for key, value in my_dict.items()}
my_inverted_dict[value_to_find]
"""

inverse_multi = """
my_inverted_dict = dict()
for key, value in my_dict.items(): 
  my_inverted_dict.setdefault(value, list()).append(key)
my_inverted_dict[value_to_find]
"""

For the sake of completeness, I adapted each solution to each possible scenario. Either we want a single key, or we want many keys. As a result, each test is labeled single or multi, respectively.

In terms of testing, here are the results:

>>> import timeit
>>> min(timeit.repeat(setup=setup, stmt=brute_force_single))
0.19409550000000309
>>> min(timeit.repeat(setup=setup, stmt=brute_force_multi))
0.3046430999997938
>>> min(timeit.repeat(setup=setup, stmt=generator_single))
0.6223289999998087
>>> min(timeit.repeat(setup=setup, stmt=generator_multi))
0.6531434000003173
>>> min(timeit.repeat(setup=setup, stmt=inverse_single))
0.5350638999998409
>>> min(timeit.repeat(setup=setup, stmt=inverse_multi))
1.2309030999999777

Weirdly enough, the generator expression solution is actually quite slow. Perhaps, there’s a bit of overhead with creating a generator expression. I was interested to see how this solution scales with larger dictionaries, so I updated the setup string and reran my tests:

>>> setup = """
my_dict = {"color": "red", "width": 17, "height": 19, "health": 15, 
"depth": 100, "direction": "north", "material": "metal", "power": 17, 
"strength": 17, "weight": 111, "x": 0, "y": 0, "z": 0, 
"song": "Madeline", "band": "The Wonder Years", 
"friend": "rupert"}
value_to_find = "red"
"""
>>> min(timeit.repeat(setup=setup, stmt=brute_force_single))
0.18737550000059855
>>> min(timeit.repeat(setup=setup, stmt=brute_force_multi))
0.9153716000000713
>>> min(timeit.repeat(setup=setup, stmt=generator_single))
0.5850626999999804
>>> min(timeit.repeat(setup=setup, stmt=generator_multi))
1.2661715000003824
>>> min(timeit.repeat(setup=setup, stmt=inverse_single))
1.4036990000004153
>>> min(timeit.repeat(setup=setup, stmt=inverse_multi))
5.085829500000727

Again, I was a bit bothered by the results, so I tried changing the value we were searching for:

>>> setup = """
my_dict = {"color": "red", "width": 17, "height": 19, 
"health": 15, "depth": 100, "direction": "north", 
"material": "metal", "power": 17, "strength": 17, 
"weight": 111, "x": 0, "y": 0, "z": 0, "song": 
"Madeline", "band": "The Wonder Years", 
"friend": "rupert"}
value_to_find = "The Wonder Years"
"""
>>> min(timeit.repeat(setup=setup, stmt=brute_force_single))
0.8808984999996028
>>> min(timeit.repeat(setup=setup, stmt=brute_force_multi))
0.9333926999997857
>>> min(timeit.repeat(setup=setup, stmt=generator_single))
1.303262800000084
>>> min(timeit.repeat(setup=setup, stmt=generator_multi))
1.295239500000207
>>> min(timeit.repeat(setup=setup, stmt=inverse_single))
1.3928389000002426
>>> min(timeit.repeat(setup=setup, stmt=inverse_multi))
5.030787800000326

Again, brute force has the best performance. When I looked into why, I found that there’s a bit of overhead as I suspected. If I had the time, I’d probably run each of this solutions through cProfiler as outlined in my performance article. That said, I’ll defer to the responses in this Stack Overflow thread.

Overall, it looks like each solution performs in the order they were presented. In other words, brute force is slightly faster than a generator expression. Meanwhile, flipping the dictionary can be extremely costly.

Challenge

With all the fun stuff out of the way, let’s take a look at your challenge. Since covering the reverse dictionary lookup, I thought it would be fun to challenge you with the following:

Look at all three solutions above (or 6 if you include the various requirements). Can you break down exactly why each solution performs the way it does? In other words, can you explain the differences in performance between each solution? Why would looping over a dictionary be faster than using a generator expression? Why wouldn’t flipping the dictionary be fastest?

As I alluded to previously, you can use any tools at your disposal to support your reasoning. For instance, you might try using cProfile to exam the inner workings of each solution. Likewise, you might try running various tests like I did with timeit. Perhaps a plot of each solution under different workloads would help you figure out asymptotic runtimes.

Maybe, you don’t want to run any empirical testing tools at all. Instead, you want to look directly at the source code and trace what work it has to do to accomplish our task. Whatever you choose to do, make sure you share your results on Twitter using the hashtag #RenegadePython!

In case you’re wondering, I kicked things off with a quick execution of cProfile.run() on our brute_force_single solution:

// Detect dark theme var iframe = document.getElementById('tweet-1251974523018260486-234'); if (document.body.className.includes('dark-theme')) { iframe.src = "https://platform.twitter.com/embed/Tweet.html?id=1251974523018260486&theme=dark" }

I wonder what the other solutions look like under the hood!

A Little Recap

And with that, we’re done! Here’s all the solutions from this article in one place:

my_dict = {"color": "red", "width": 17, "height": 19}
value_to_find = "red"

# Brute force solution (fastest) -- single key
for key, value in my_dict.items(): 
  if value == value_to_find: 
    print(f'{key}: {value}') 
    break

# Brute force solution -- multiple keys
for key, value in my_dict.items(): 
  if value == value_to_find: 
    print(f'{key}: {value}')

# Generator expression -- single key
key = next(key for key, value in my_dict.items() if value == value_to_find)
print(f'{key}: {value_to_find}')

# Generator expression -- multiple keys
exp = (key for key, value in my_dict.items() if value == value_to_find)

for key in exp:
  print(f'{key}: {value}')

# Inverse dictionary solution -- single key
my_inverted_dict = {value: key for key, value in my_dict.items()}
print(f'{my_inverted_dict[value_to_find]}: {value_to_find}')

# Inverse dictionary solution (slowest) -- multiple keys
my_inverted_dict = dict()
for key, value in my_dict.items(): 
  my_inverted_dict.setdefault(value, list()).append(key)
print(f'{my_inverted_dict[value_to_find]}: {value_to_find}')

With all that out of the way, it’s time for me to ask you for a little help! Specifically, I’d love it if you hopped on my mailing list or even became a patron. In addition, I’m trying to grow my YouTube channel, so head on over and subscribe.

If you have the time, I’d appreciate it if you stuck around to check out some of these related articles:

Otherwise, thanks for stopping by! I appreciate it.

The post How to Perform a Reverse Dictionary Lookup in Python: Generator Expressions and More appeared first on The Renegade Coder.

Featured ones: