As a software engineer which works with code every day, it is crucial for it to be clean and nicely structured.
One thing that most software engineers are not aware of, is that your job is not to write clever code (or code at all), it is to resolve problems. The code just happens to be the tool to achieve that. In medium and large projects, you will always end up working on code somebody else wrote, and your colleagues will need to read the code you write.
It is common for people to write code that just works, without paying attention to its quality. Its quality is an afterthought, more than often reserved as a last step of "cleaning things up if I have time", which is almost never the case.
Bad variable names, twisted and obfuscated logic, improperly separation of concerns and many other issues lead to frustration, not being able to properly estimate the complexity of a task because it's always more complicated than it should be, small changed breaking the whole system for all kinds of obscure reasons.
Code quality matters, and that shouldn't be up for debate.
If you want to improve your code quality (and you should), here are 5 tips you can start applying right away.
Everything in our code has a name: classes, methods, functions, variables, modules. The single most impactful thing you can do is to name your things appropriately, so that the reader knows instantly what it is about.
instances_list = get_customers(active=True)
The name instances_list
doesn't offer us much information. We know it is an iterable because of the function name
(getters + plural should always return an iterable of something). instances
doesn't give us information either.
Instances of what? What do they represent? What can we do with each instance?
A better variable name for that case would have been active_customers
. This way, we know exactly what we are dealing
with.
And from here, we get another rule we can apply: avoid implementation details in your names , unless, it's a very specific algorithm implementation, or the code is generic enough to not be tied to our business logic.
Our namings should be tied to the business domain of the application. If we are building an e-commerce store, all the code should be full of customers, products, carts, payments, invoices, etc.
Imagine a new person jumping in the project, and they have to work on an area that is cluttered with variable named
string
, list
, dict
, instance
and they have a question about the business logic of the app. What do they ask?
They would not know what lists, dicts, instances are and why they are important to the customers. It would
slow them down to a grind, and would affect the performance of the whole team. Only if the code could tell the
story of all the business cases it handles (good naming offers that).
Too many times, a lot of logic is put inside a single function. There are too many almost god-like functions and
classes out there, with hundreds of lines of code, code so nested you have to scroll horizontally to see the end of the
line. Variables scattered all around. Complex control loops that abuse break
s and continue
s, which makes following
the code flow a nightmare.
My advice is to always break down a bigger feature in smaller methods, with a maximum of three parameters. This way, we obtain a better segmentation of the business logic, we get smaller units that are easier testable, and we "compose" bigger pieces of code out of these smaller units. When a bug occurs, it's easier to spot that a function (with a good and suggestive name) does something wrong, rather than trying to spot down an off-by-one error inside a very nested loop.
So, just to make sure it sticks, I'll repeat: many many many small functions with at most three parameters. It's important to have small functions that can be easily tested and verified that they work properly. When reading the code, it is much easier to spot flaws in the business logic and implementation, and just understand easier what the code does.
You need to write unit tests all the time. A lot of developers get discouraged and don't enjoy testing because after they implement a task, they have huge methods with a lot of parameters, and to test that is an awful experience.
Too many cases to consider, things are heavily coupled together and can't be tested in isolation. Just a mess.
But if you split your code in many small functions, you can test all your work easier: for each small function, a few unit tests would be sufficient. With few parameters, you get way fewer cases to tests, with fewer possible combinations. You start testing the more nested and simpler functions (condition checking, simple computations), and the more such small functions you test, you are actually testing the whole feature as well. By the time you work your way upward to the main feature, you'd have actually tested 80% of the feature already.
Things should do one thing only. When we talk about one thing only, we tend to not understand what it means because it's a pretty vague term.
To do a thing only, we need to have many small functions to begin with. When you describe a piece of business logic, it more than often needs to do multiple things, and as you explain, you will use "and" a lot. When you do that, you just segmented the business logic into multiple parts, that can be put in different classes/functions.
Usually, big functions are structured in steps: do this, then do that, then do this other thing. When you have
an if statement with an else branch, there are two things being done: one for the true case, one for the else case.
When there are nested for
s, if
s or switch
statements, there are multitude of things being done, which can be
split away in their own thing, either classes or different functions.
Feature envy is when you have a piece of code doing some complex stuff with the information extracted from another class.
For example, take the following code:
class Customer(object):
...
def get_total_interest_amount(self):
total = {}
for account in self.get_accounts():
account = Account.for_account_number(account_num)
for tx in account.get_transactions():
if tx.currency not in total:
total[tx.currency] = 0
total[tx.currency] += tx.amount * account.interest_rate
if account.promotion_type == 'fidelity_plus':
total['RON'] = total.get('RON', 0) + 10
return total
Yeah, weird example, but I think it will manage to get the idea across. The biggest issue here, is that the customer class rummages through the account instance to get various information, such as the transactions, the interest rate, and the promotion type. All this information belongs to the account, and the customer has no business doing all these computations itself.
The proper way to do this is to have a method in the Account
class that does all these computations, and have
the customer class just call it instead. The code would then become something similar to
def dict_add(d1, d2):
# very important to not alter the original input parameters
# we want to avoid "surprise" side effects
res = {}
for k in set(d1.keys()).union(set(d2.keys())):
res[k] = d1.get(k, 0) = d2.get(k, 0)
return res
class Customer(object):
...
def get_total_interest_amount(self):
total = {}
for account in self.get_accounts():
total = dict_add(total, account.get_total_interest())
return total
This way, the customer doesn't need to have any knowledge about the structure and the information contained in the account. All it needs to know is that it has a method which returns the results in a certain shape.
When the account class will inevitably evolve (maybe it's promotion_type attribute gets removed, or other conditions
for computing the interest are added), we don't have to do any change in the Customer
class. We call it decoupling,
because one component can change independently of another.
When you need to return more values from a function, avoid at all costs returning more than two items. If you return more than two values, you will cause the code that calls it to become a weird mess that will make your colleagues want to quit:
results = get_multiple_results()
# then, the following code will be cluttered with
results[0]
results[1]
results[2]
results[3]
....
Who the hell is results[0]
, results[1]
, results[2]
, etc? No way to tell, unless you read the code of
get_multiple_result()
. And it's just so easy to introduce bugs. Juggling with indexes this way to
extract the values you need, usually leads to using the wrong value around. It's just too easy for somebody
unfamiliar with the code to put an 2
instead of 3
and not see it right away. The only way to catch these
issues is through extensive and long debug sessions. They can be avoided by wrapping the returned values
in a class (or namedtuple ), and
allow the calling code to access the values as named attributes. results.success_count
and results.failed_count
are much better than results[0]
and results[1]
.
A lot of over-zealous programmers tend to over-use meta-programming in Python. From things like patching runtime methods instead of using inheritance to creating classes dynamically at runtime. These things makes the code harder to understand, impossible to track issues just from looking at the code (because everything happens at runtime), and breaks IDE suggestions (which is already pretty clunky anyway).
Avoid using features of the language that are not usually known by the vast majority of Python developers.
Probably the most notorious thing you should avoid is putting else
branches on for
statements (yeah, it's a thing).
Another example is the walrus operator ( :=
, doing assignment inline in an if). Although this one is useful in some
specific cases, over-using it causes more trouble and makes the code harder to read. Python developers expect
assignments as a separate statement, not inside an if condition.
That's all! I hope these tips will help you write better code. If you follow them, I guarantee your code will stand out from the crowd, and your team will recognize and praise it.