🇬🇧🇺🇸 Drastically improve your code quality with these 5 tips
As a software engineer which works with code every day, it is crucial for it to be clean and nicely structured.
One thing that most software engineers are not aware of, is that your job is not to write clever code (or code at all), it is to resolve problems. The code just happens to be the tool to achieve that. In medium and large projects, you will always end up working on code somebody else wrote, and your colleagues will need to read the code you write.
It is common for people to write code that just works, without paying attention to its quality. Its quality is an afterthought, more than often reserved as a last step of "cleaning things up if I have time", which is almost never the case.
Bad variable names, twisted and obfuscated logic, improperly separation of concerns and many other issues lead to frustration, not being able to properly estimate the complexity of a task because it's always more complicated than it should be, small changed breaking the whole system for all kinds of obscure reasons.
Code quality matters, and that shouldn't be up for debate.
If you want to improve your code quality (and you should), here are 5 tips you can start applying right away.
1. Communicate intention through names
Everything in our code has a name: classes, methods, functions, variables, modules. The single most impactful thing you can do is to name your things appropriately, so that the reader knows instantly what it is about.
- Long descriptive names are almost always better than short names.
- Boolean returning functions/methods should always have a logical predicate name (eg. is_active, has_children, did_succeed)
- Methods should always have action names (verbs).
- Methods that return something, should indicate so. For example, validate_name is not a proper name for a method that returns something. What does it return? Boolean specifying if the name is valid or not? Or if it is invalid? A list of errors? There's no way to know unless we read the function's code. When you force somebody to read the code of something to understand what it does, it is not named properly. Better names for a function that returns the list of validation errors for a name would be get_name_validation_errors, a better name for a function that just checks if a name is valid or not would be is_name_valid. Choose the names wisely to offer hints about the return value.
- Class names should avoid generic names, and do one thing only. No class name should include generic zero-information words like Utils, Processor, Data. There are always better alternatives.
- Avoid redundant and zero-information words in your names. For example, take this line of code:
instances_list = get_customers(active=True)
The name instances_list doesn't offer us much information. We know it is an iterable because of the function name (getters + plural should always return an iterable of something). instances doesn't give us information either. Instances of what? What do they represent? What can we do with each instance?
A better variable name for that case would have been active_customers . This way, we know exactly what we are dealing with.
And from here, we get another rule we can apply: avoid implementation details in your names , unless, it's a very specific algorithm implementation, or the code is generic enough to not be tied to our business logic.
Our namings should be tied to the business domain of the application. If we are building an e-commerce store, all the code should be full of customers, products, carts, payments, invoices, etc.
Imagine a new person jumping in the project, and they have to work on an area that is cluttered with variable named string , list , dict , instance and they have a question about the business logic of the app. What do they ask? They would not know what lists, dicts, instances are and why they are important to the customers. It would slow them down to a grind, and would affect the performance of the whole team. Only if the code could tell the story of all the business cases it handles (good naming offers that).
2. Many many many small functions
Too many times, a lot of logic is put inside a single function. There are too many almost god-like functions and classes out there, with hundreds of lines of code, code so nested you have to scroll horizontally to see the end of the line. Variables scattered all around. Complex control loops that abuse break s and continue s, which makes following the code flow a nightmare.
My advice is to always break down a bigger feature in smaller methods, with a maximum of three parameters. This way, we obtain a better segmentation of the business logic, we get smaller units that are easier testable, and we "compose" bigger pieces of code out of these smaller units. When a bug occurs, it's easier to spot that a function (with a good and suggestive name) does something wrong, rather than trying to spot down an off-by-one error inside a very nested loop.
So, just to make sure it sticks, I'll repeat: many many many small functions with at most three parameters. It's important to have small functions that can be easily tested and verified that they work properly. When reading the code, it is much easier to spot flaws in the business logic and implementation, and just understand easier what the code does.
3. Unit tests
You need to write unit tests all the time. A lot of developers get discouraged and don't enjoy testing because after they implement a task, they have huge methods with a lot of parameters, and to test that is an awful experience.
Too many cases to consider, things are heavily coupled together and can't be tested in isolation. Just a mess.
But if you split your code in many small functions, you can test all your work easier: for each small function, a few unit tests would be sufficient. With few parameters, you get way fewer cases to tests, with fewer possible combinations. You start testing the more nested and simpler functions (condition checking, simple computations), and the more such small functions you test, you are actually testing the whole feature as well. By the time you work your way upward to the main feature, you'd have actually tested 80% of the feature already.
4. Separation of concerns / things should do one thing only
Things should do one thing only. When we talk about one thing only, we tend to not understand what it means because it's a pretty vague term.
To do a thing only, we need to have many small functions to begin with. When you describe a piece of business logic, it more than often needs to do multiple things, and as you explain, you will use "and" a lot. When you do that, you just segmented the business logic into multiple parts, that can be put in different classes/functions.
Usually, big functions are structured in steps: do this, then do that, then do this other thing. When you have an if statement with an else branch, there are two things being done: one for the true case, one for the else case. When there are nested for s, if s or switch statements, there are multitude of things being done, which can be split away in their own thing, either classes or different functions.
Avoid feature envy, delegate work properly to your classes
Feature envy is when you have a piece of code doing some complex stuff with the information extracted from another class.
For example, take the following code:
class Customer(object):
    ...
    def get_total_interest_amount(self):
        total = {}
        for account in self.get_accounts():
            account = Account.for_account_number(account_num)
            for tx in account.get_transactions():
                if tx.currency not in total:
                    total[tx.currency] = 0
                total[tx.currency] += tx.amount * account.interest_rate
            if account.promotion_type == 'fidelity_plus':
                total['RON'] = total.get('RON', 0) + 10
        return total
Yeah, weird example, but I think it will manage to get the idea across. The biggest issue here, is that the customer class rummages through the account instance to get various information, such as the transactions, the interest rate, and the promotion type. All this information belongs to the account, and the customer has no business doing all these computations itself.
The proper way to do this is to have a method in the Account class that does all these computations, and have the customer class just call it instead. The code would then become something similar to
def dict_add(d1, d2):
    # very important to not alter the original input parameters
    # we want to avoid "surprise" side effects
    res = {}
    for k in set(d1.keys()).union(set(d2.keys())):
      res[k] = d1.get(k, 0) = d2.get(k, 0)
    return res
class Customer(object):
    ...
    def get_total_interest_amount(self):
        total = {}
        for account in self.get_accounts():
            total = dict_add(total, account.get_total_interest())
        return total
This way, the customer doesn't need to have any knowledge about the structure and the information contained in the account. All it needs to know is that it has a method which returns the results in a certain shape.
When the account class will inevitably evolve (maybe it's promotion_type attribute gets removed, or other conditions for computing the interest are added), we don't have to do any change in the Customer class. We call it decoupling, because one component can change independently of another.
6ish: bonus Python tricks
6.1 Avoid returning more than two results at once
When you need to return more values from a function, avoid at all costs returning more than two items. If you return more than two values, you will cause the code that calls it to become a weird mess that will make your colleagues want to quit:
results = get_multiple_results()
# then, the following code will be cluttered with
results[0]
results[1]
results[2]
results[3]
....
Who the hell is results[0] , results[1] , results[2] , etc? No way to tell, unless you read the code of get_multiple_result() . And it's just so easy to introduce bugs. Juggling with indexes this way to extract the values you need, usually leads to using the wrong value around. It's just too easy for somebody unfamiliar with the code to put an 2 instead of 3 and not see it right away. The only way to catch these issues is through extensive and long debug sessions. They can be avoided by wrapping the returned values in a class (or namedtuple ), and allow the calling code to access the values as named attributes. results.success_count and results.failed_count are much better than results[0] and results[1] .
6.2 Avoid meta-programming
A lot of over-zealous programmers tend to over-use meta-programming in Python. From things like patching runtime methods instead of using inheritance to creating classes dynamically at runtime. These things makes the code harder to understand, impossible to track issues just from looking at the code (because everything happens at runtime), and breaks IDE suggestions (which is already pretty clunky anyway).
6.3 Don't use cool hip features nobody knows about.
Avoid using features of the language that are not usually known by the vast majority of Python developers.
Probably the most notorious thing you should avoid is putting else branches on for statements (yeah, it's a thing).
Another example is the walrus operator ( := , doing assignment inline in an if). Although this one is useful in some specific cases, over-using it causes more trouble and makes the code harder to read. Python developers expect assignments as a separate statement, not inside an if condition.
Conclusion
That's all! I hope these tips will help you write better code. If you follow them, I guarantee your code will stand out from the crowd, and your team will recognize and praise it.
