dev-resources.site
for different kinds of informations.
In Defence of the Long-ish Function
I've recently been getting into Threads and I came across a Blog post from See Ash Code. She says she considers herself a hacker rather than a pro and came across some common advice that almost all beginners will hear at some point in their coding journey from @codingcossack (sorry the link doesn't work, threads is kind of... MVP right now).
The advice? Keep functions short: Each function should do one thing well.
Classic. π€
But it's interesting too because I remember getting this advice and being the unhealthily compulsive perfectionist I can be sometimes, I took it to heart. I would spend hours trying to figure out where I should separate out my code. As Ash noted in her post:
what exactly was short, and what exactly was one thing?
It took me years to figure out something that in retrospect is kind of obvious...
Sorry for the shitty meme, I was weak.
The Problems with Large Functions
So why is this advice so prevalent? Because it's not without merit.
Large Functions Can be Hard to Understand
I once got handed a custom PHP "CMS" where just the startup function was a 2000 line long mess of spaghetti code and was asked to add new features. In the end we had to rewrite it.
I don't think separating out that code into 200 to 400-odd functions would have fixed it but I would have al least appreciated the effort.
Can't be Easily Reused or Tested
If you have a small unit of code, it's more likely that you can use it somewhere else without it doing something else you don't want it to do.
Also the smaller the unit of code you have to test, the less things you have to test about it.
Bad arguments against large functions
- Often contain multiple responsibilities
- Prone to bugs and hard to maintain
I've already addressed these with being hard to understand and can't be reused. Both are the same fundamental issue when you think about why they are bad.
The Problems with Small Functions
This is the dirty secret of small functions is that they have their own costs and they often get overlooked because because small, clean functions look prettier. They are more aesthetically pleasing.
Levels of Indirection
Indirection is when in order to get to something, you have to go through something else.
A long function just containing loops and if statements has one level of indirection. A short function where you hide everything behind a well-named function is easy to understand but when it comes to debugging in a complex application you have to keep that map of the program flow in your head. Plus you can get the benefits of the well-named functions by sprinkling in a couple of comments on the non-trivial code.
Speaking of hiding things...
Hiding Complexity
That code you hid behind that abstraction is still there. You didn't make your app simpler you just hid the code behind an abstraction. Not to say abstractions are bad, making abstractions pretty much our entire job after all but if it was inline and you had a problem (unclosed db handler perhaps) as you were passing through you would be more likely to spot it.
Time Spent on Solutioning
Let's say I have a function that reads something from the file system, does a bit of processing and writes to the database. Three responsibilities so three functions, right?
- What do I name those functions?
- Should they be a class?
- What if I want to extend one of them?
- What are the input and output types?
- Should I pass the DB handle in directly or create an abstraction?
There's a lot more questions you could ask and maybe you have the discipline not to but the temptation is there. After all what if you want to extend it later?.
When to Split
There are two concrete scenarios where you should split code out into it's own module:
- When you can abstract the code to make it easier to deal with another task or you find yourself writing the same code 3-or-so times.
- When you need to test code and want to leverage dependency injection.
There's also a more fuzzy rule that's hard to define. Does my code stink and can I get away with it stinking?
This is the crux of my argument. Long functions are a code smell but they're not inherently bad just because they're long.
Another thing Ash mentioned in her post about excessive indentation, this is another code smell, probably worse than the long function actually and is a fairly strong indicator you should consider a bit of refactoring.
Wrap Up
"It depends" is an inherently unsatisfying answer and it wouldn't fit into the format @codingcossack was making his post for which is why I decided to opine on it.
The advice to keep functions short and focused on a single responsibility is well-intentioned. It encourages modular, reusable code that is easier to understand and maintain. However, taken to an extreme, it can also lead to over-engineering simple solutions and unnecessary abstraction layers.
In summary, keep functions reasonably focused, break out logic when it clearly needs reuse or enables better testing, and refactor smelly code. But don't waste time endlessly splitting simple procedures for the sake of following a "rule." Strive for balance and use common sense based on the situation at hand.
Answering Ash's Question
That said, Iβm not sure if this part is right. It split out the part that reads the loop state file into itβs own function, returning a value for i and j. But my understanding has been that those are local variables, and you canβt pass them from one function to another. Does anyone know if thatβs true? Or is it just kind of an assumption because there are so very few cases where you would need to pass the index of a loop from one function to another?
Bit off-topic but while I'm here you absolutely can pass the index around, it's just a pointer to a value:
def print_number(i):
print(i)
for i in range(5):
print_number(i)
0
1
2
3
4
As for "should you" extract this method?
def read_loop_state(folder_name):
state_file = f"{folder_name}/loop_state.txt"
if os.path.exists(state_file):
with open(state_file, 'r') as f:
state = f.read().strip().split(',')
i = int(state[0])
j = int(state[1])
else:
i = 0
j = 0
return i, j
It's not bad but it's not needed. Focus on other things until you find a compelling reason to do so.
Featured ones: