dev-resources.site

Hungary for the Power: A Closer Look at Hungarian Notation

Published at
2/13/2022
Categories
elm
programming
functional
Author
John Pavlick

Few concepts within the Global Programming Mindshare are as universally reviled as Hungarian notation. Everybody Knows that it's completely unnecessary, aesthetically displeasing, and considered harmful[1]. If you're not familiar, the idea is generally interpreted as follows:

Hungarian notation is a naming convention that adds a prefix to each variable name, so as to encode its type in its name.

And it's generally accepted that examples look like this:

string str_firstName;

int i_ageInYears;

bool b_enabled;

Most people are happy to dismiss this immediately; to consign it to the trashbin of Universally Acknowledged Bad Programming Thoughts, right along with GOTO and Internet Explorer 8. And at first glance, it's not hard to see why; programmers are, by virtue, lazy. There's no compelling reason to annotate a variable's type by encoding it in its name - that's what we have compilers for! If you're dumb enough to type int result = firstName + ageInYears;, you deserve whatever abuse you get from your compiler. It should be obvious that those two things are probably not the Same Kind of Thing. Right?

You'd probably think so, but I'm nothing if not the sworn enemy of "What Everyone Knows"; so come now, let us reason together.

I'm going to start by quoting the whitepaper, because it doesn't seem like anybody else has read it.

...the concept of "type" in this context is determined by the set of operations that can be applied to a quantity. The test for type equivalence is simple: could the same set of operations be meaningfully applied to the quantities in questions? If so, the types are thought to be the same. If there are operations that apply to a quantity in exclusion of others, the type of the quantity is different.

Note that the above definition of type (which, incidentally, is suggested by languages such as SIMULA and Smalltalk) is a superset of the more common definition, which takes only the quantity's representation into account. Naturally, if the representations of x and y are different, there will exist some operations that could be applied to x but not y, or the reverse.[2]

"The whitepaper", as it happens, was written by one Charles Simonyi. You may not have heard of him, but you're no doubt familiar with his work; he is the mind behind Microsoft Word and Microsoft Excel[3]. He's probably on a yacht right now[4], and he is almost definitely smarter than you are.

Regardless, I'm hard-pressed to think of a more correct definition for a type. Most software developers I've talked to think that a "type" is a "thing" like an int, a string, or a bool, that defines a set of operations over a given category of values; and while they're not wrong[5], I can't say that they're right, either - because that's too broad a definition. "What's an ocean?" "Oh, it's a thing you can drown in." Sure, it's not a false statement; but by that reasoning, I have an ocean in my bathroom.

If you don't understand why this is a big deal, let's outline a simple example. There's a difference, you know, between an int value storing someone's age, and an int value storing a for loop's index value. The compiler doesn't know about it, because if you try to multiply someone's age by an iterator's index, as long as they're both ints, the compiler will be happy; but will your program be correct?

Just because an operation is possible, that doesn't mean that it's correct.
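To make that concrete, here is a minimal sketch in C. The function name nonsense and its parameter names are made up for illustration; the only point is that nothing in the language objects to it.

```c
/* Both quantities share one representation (int), so C treats them as
   the same type even though they mean entirely different things. */
int nonsense(int ageInYears, int loopIndex)
{
    /* Legal C: years multiplied by a position in a loop.
       The result is a number, but not a meaningful one. */
    return ageInYears * loopIndex;
}
```

Calling nonsense(30, 4) hands back an int like any other, and nothing downstream can tell that it was never a sensible quantity.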

So anyway - Simonyi goes on to suggest some naming conventions to give programmers the tools to help themselves avoid this footgun - to make the representations of x and y different, so as to discourage performing valid-but-nonsensical operations on them.

Let's look at another example - let's think about Microsoft Excel for a moment. In a spreadsheet, your two most important primitives are "Row" and "Column". If you needed to implement a spreadsheet in C, you'd probably start by defining a multi-dimensional array - rows of columns - to hold your data, with a cell represented by the data stored at the coordinate of the indices for the row and the column. Since you can access an item in an array by its index, this means that you can write a function with a signature like void clearCellData(int rowIndex, int columnIndex). And it's great - but what happens when you transpose your inputs for the row and column? What happens when you write a function void clearRow(int rowIndex) but you accidentally pass in a column index? The compiler is as pleased as ever, and you won't even get an error at runtime (unless it's for a null reference) - but you've deleted every cell in the wrong row.[6]
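A rough sketch of that failure, assuming a tiny 3x3 sheet of ints. Only clearRow comes from the text above; cells, fillCells, rowSum, and clearSelectedRow are hypothetical names invented for the illustration, and the sheet is square so that the stray column index is still in bounds.

```c
#define ROW_COUNT 3
#define COL_COUNT 3

/* A small square sheet: any column index is also a valid row index. */
static int cells[ROW_COUNT][COL_COUNT];

/* Set every cell in the sheet to the same value. */
void fillCells(int value)
{
    for (int r = 0; r < ROW_COUNT; r++)
        for (int c = 0; c < COL_COUNT; c++)
            cells[r][c] = value;
}

/* Zero every cell in one row. */
void clearRow(int rowIndex)
{
    for (int c = 0; c < COL_COUNT; c++)
        cells[rowIndex][c] = 0;
}

/* Add up one row, to make the damage easy to inspect. */
int rowSum(int rowIndex)
{
    int sum = 0;
    for (int c = 0; c < COL_COUNT; c++)
        sum += cells[rowIndex][c];
    return sum;
}

/* Buggy caller: it means to clear the selected row but passes the column. */
void clearSelectedRow(int selectedRow, int selectedColumn)
{
    (void)selectedRow;         /* the argument that should have been used */
    clearRow(selectedColumn);  /* an int is an int, so this compiles */
}
```

After fillCells(1) and clearSelectedRow(0, 2), row 0 is untouched and row 2 is gone, with no complaint from the compiler or at runtime.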

If you'd been careful to follow Hungarian Notation as outlined in the whitepaper, you'd be much less likely to make that mistake; if every int representing a "Row" was prefixed with row_, and if every int representing a "Column" was prefixed with col_, you'd be much less likely to call clearRow(col_selected). And that, my friends, is the pragmatism behind the reasoning for Hungarian Notation. Nowhere in the whitepaper will you find mention of prefixing strings with str_ and ints with i_, solely on the basis that they're strings and ints.
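Here is roughly what that convention might look like in C. The row_ and col_ prefixes come from the paragraph above; the sheet, setCell, getCell, and clearSelectedRow are hypothetical names for the sketch. Note that the compiler still enforces nothing here; the prefixes only make a mismatch visible to whoever reads the call.

```c
#define ROW_COUNT 3
#define COL_COUNT 3

static int cells[ROW_COUNT][COL_COUNT];

/* Every int that means "row" starts with row_;
   every int that means "column" starts with col_. */
void clearRow(int row_target)
{
    for (int col_i = 0; col_i < COL_COUNT; col_i++)
        cells[row_target][col_i] = 0;
}

void setCell(int row_target, int col_target, int value)
{
    cells[row_target][col_target] = value;
}

int getCell(int row_target, int col_target)
{
    return cells[row_target][col_target];
}

void clearSelectedRow(int row_selected, int col_selected)
{
    (void)col_selected;      /* clearRow(col_selected) would read as wrong */
    clearRow(row_selected);  /* a row_ value going into a row_ parameter */
}
```

The mistake is still possible, but it now has to survive someone typing, and someone else reviewing, a col_ name in a row_ slot.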

Well now, that certainly doesn't sound so bad, does it?

The trouble, it seems, is that everybody else got it wrong.

Spend enough time[7] researching Hungarian Notation, and you'll come across the name Charles Petzold. This particular Charles was also an Important Programmer at Microsoft, and in his book "Programming Windows", he explains it thusly:

Many Windows programmers use a variable-naming convention known as "Hungarian Notation," in honor of the legendary Microsoft programmer Charles Simonyi. Very simply, the variable name begins with a lowercase letter or letters that denote the data type of the variable. For example, the sz prefix in szCmdLine stands for "string terminated by zero." The h prefix in hInstance and hPrevInstance stands for "handle;" the i prefix in iCmdShow stands for "integer."[8]

And that's it! That's his whole statement on the topic. Huh. I guess Other Charles didn't read the whitepaper. Unfortunately, this is the version of the idea that has found widespread adoption. The important difference here is that the idea behind Hungarian Notation concerns itself with semantics, not with the way that the data is handled by the compiler or its output.

If you've been following along, you'll note that earlier, I didn't say, "The naming convention was the reasoning behind Hungarian Notation"; that was merely the pragmatism that drove it. The reasoning behind Hungarian notation should be familiar to every Elm developer that's ever been force-fed Richard Feldman talks in the Elm Slack: making impossible states impossible[9].

The goal was never to make sure that you knew that an int was an int; it was to give you better tools to restrict your program in such a way that it only ever performed computations that were semantically valid.

Wait, so you're certainly not suggesting...

... that you go out and prefix all of your program's values with semantic information? Well, it certainly couldn't hurt - but in the forty-plus years that have passed since these ideas were introduced, we've built better tools and thought about these problems a lot more. We're better, and we can do better.

We have Elm, and Elm has custom types, and with custom types we can implement domain-specific types as easily as we can breathe. The Elm Patterns book briefly outlines this in the section about Type Blindness[10], and Joel Quenneville has an excellent demonstration of the pattern applied at this footnote's link[11], but I'll summarize it here.

In the context of our earlier example - given that we had a two-dimensional array that we needed to access by its indices[12], we could wrap the Int values with custom types, like so:

type RowIndex = RowIndex Int

type ColIndex = ColIndex Int

All of a sudden, we have compile-time verification for our program's semantics.

We can keep going! Wrapping primitives means that we have to unwrap them (i.e., some form of (\(RowIndex index) -> index)) when we need to use the primitive value, which is a little extra work - but it's worth it, because now we have to think a little harder about what kinds of operations we're going to need to perform on our values. We have now shifted into the realm of considering our program's semantics, which means that we are now motivated to write functions to use our values. Consider:

nextRowIndex : RowIndex -> RowIndex
nextRowIndex (RowIndex index) =
    RowIndex (index + 1)

prevRowIndex : RowIndex -> RowIndex
prevRowIndex (RowIndex index) =
    RowIndex (index - 1)

Now, if we accidentally pass a ColIndex to nextRowIndex, we won't get back a semantically incorrect value; our code simply won't compile. You won't get that with a function signature that uses only primitives, such as nextRowIndex : Int -> Int.
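For comparison, the same guarantee can be approximated back in C, the language of the spreadsheet example. This is only a sketch, assuming single-member struct wrappers that borrow the names of the Elm types above; it is not something the whitepaper proposes.

```c
/* Two distinct struct types with identical layouts. Because the types
   differ, C will not silently convert one into the other. */
typedef struct { int value; } RowIndex;
typedef struct { int value; } ColIndex;

RowIndex nextRowIndex(RowIndex row)
{
    return (RowIndex){ row.value + 1 };
}

RowIndex prevRowIndex(RowIndex row)
{
    return (RowIndex){ row.value - 1 };
}

/* With a ColIndex in hand, this call is rejected at compile time,
   because ColIndex and RowIndex are incompatible types:

       ColIndex col = { 2 };
       nextRowIndex(col);
*/
```

It costs a .value at every use and lacks Elm's pattern matching, but the mix-up that a naming convention could only discourage now becomes a compile error.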

This is powerful. By including a value's semantics in the definition of the type used to store and refer to that value, we can use our compiler to verify our program's correctness - and with Elm, we have the tools to do just that. This, truly, is the spirit in which Hungarian Notation was developed and presented; and that spirit lives on, every time an Elm developer writes type Dollar = Dollar Int.

In conclusion, Elm is Hungarian Notation. Thank you for your time. No - stop - where are you going? I thought we were FRIENDS

Oh, and Mr. Simonyi, if you need an Elm developer on that yacht, Microsoft has my number on file from the number of times they've coerced me into giving it to them; call me anytime.

  1. http://catb.org/jargon/html/C/considered-harmful.html

  2. https://docs.microsoft.com/en-us/previous-versions/visualstudio/visual-studio-6.0/aa260976(v=vs.60)#type-calculus

  3. https://en.wikipedia.org/wiki/Charles_Simonyi#Microsoft

  4. https://en.wikipedia.org/wiki/Charles_Simonyi#Personal_life

  5. https://rationalwiki.org/wiki/Not_even_wrong

  6. The Joel talks about this, here, and his prose is much nicer than mine: https://www.joelonsoftware.com/2005/05/11/making-wrong-code-look-wrong/ - but he still misses the point that I intend to make in this essay. I'm getting close to it, I promise.

  7. Five minutes, max

  8. The top search result for "charles petzold programming windows online" was this sketchy public Google Drive Link, which you may download at your own risk and peril: https://docs.google.com/file/d/0B73JwvIHVHaiSFdpekJCOUdoeE0/view?resourcekey=0-JFez95uS9jETJQdFGKY_1w

  9. https://www.youtube.com/watch?v=IcgmSRJHu_8

  10. https://sporto.github.io/elm-patterns/basic/type-blindness.html

  11. https://thoughtbot.com/upcase/videos/domain-specific-types-in-elm

  12. I know that accessing an array's value by index isn't very Elm-y, but it fits the illustration; assume that it's for interop with some dirty old JavaScript, if that makes you feel better
