Code Fragrances

We all know what a code smell is and we all know that sinking sensation of diminishing confidence that we feel when we discover one. Sadly, I hardly ever hear about their opposite: code fragrances.

Code fragrances are all the little extra things we can do in our code to be better than good enough. They’re typically not covered by style guides and may not be required practices, and they often go without saying because they are both good to do and not that hard to do. Their absence is not a defect: code without them can be correct, but it may not be as helpful as it could be.

Here is a list of the ones I’ve collected.

For Code In All Languages

  • Referenced, contextual symbolism is favored over isolated, context-free literalism. java.lang.Math.PI is used and not a literal numeric value such as 3.14159D. httplib.OK is used and not the literal numeric value 200. The code is full of constants, and wherever possible constants are defined and shared by the code that controls the protocol to which those constants are germane (so clients use constants defined by servers rather than servers using constants defined by clients, or clients and servers both use constants defined by some protocol specification independent of both). There are very few literal values of any type to be found in the functions that comprise the behavioral sections of the code.
  • The components, modules, interfaces, and subdivisions within the code are tractable, clear, and sensible. The methods of a class or the elements of a module make sense both individually and considered together as a group. Interfaces have comprehensible semantics that elegantly express the author’s behavioral intent in the precise language of code.
  • The documentation is complete and appropriate to its level of abstraction. Public APIs of modules and classes are clear, can be understood without reading a single statement of code, and make no reference to non-public elements of the code. The documentation is properly phrased, capitalized, and punctuated and could be understood if it were read aloud.
    • The documentation is appropriately formatted. Indentation, example code, and links, citations, and cross-references are all where and what they are supposed to be. The documentation renders properly and legibly after extraction with documentation-generating tools.
  • Modules are appropriately coupled. Relationships between modules that can be unidirectional are unidirectional and not bidirectional. There are no functions that interact with the database, the network, and the user interface all at once.
  • Lines of encapsulation are clear and unviolated. No module reaches around or otherwise circumvents the declared API of another module. No module makes self-use of its own client API.
  • Object state and lifecycle are clear. Every object that can be immutable is immutable. For those that aren’t, the reasons why are clear and the causes and effects of state changes are justified. Classes have no methods that are legal to call for instances in some states but illegal to call for instances in other states.
  • Single-assignment form is heavily favored or even used exclusively. Fields are never assigned a value and later assigned another value without the first value having been read from the field. Excepting iterator indices, it is rare that fields are ever reassigned. Illustrated, code is of the form const field = condition ? expr1 : expr2; or field = expr1 if condition else expr2; or final Type field; if (condition) field = expr1; else field = expr2; but never field = expr2; if (condition) field = expr1;.
  • Control flow is as explicit as possible and the implicit control flow associated with return statements is made structurally explicit. Illustrated, code is of the form if (cond1) return expr1; else if (cond2) return expr2; else return expr3; instead of if (cond1) return expr1; if (cond2) return expr2; return expr3;.
  • An explanatory comment is included with all lint and compiler warning suppressions. Static analysis tools are not perfect, but they are not so imperfect that it can safely be assumed that most readers will already know why any given warning is being suppressed.
  • Identifiers are defined before they are used. In fiction, characters are generally “introduced” by the author and described in isolation before they appear in a situation or interact with other characters. Plays feature a dramatis personæ before the dramatic text. Recipes list ingredients before the instructions for processing them, and the preparation of each intermediate ingredient is placed before the preparation of the final dish. That compilers and interpreters are able to process programs with identifiers in nearly any order is no reason to write programs with identifiers in an unexamined order. Code written for later human reading should, all else being equal, define each logical element before it is referenced, in the order in which the code is read. Mutually recursive elements can frustrate this, so they should be minimized and, when required, placed very close to one another in the code body.
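
A few of the points above can be sketched together in Python. This is only an illustration, not a prescription: the function names and the retry constants are invented for the example, and http.HTTPStatus from the Python 3 standard library stands in for the httplib.OK mentioned earlier.

```python
import math
from http import HTTPStatus

# Hypothetical tuning values, named once at module scope so that no
# bare numbers appear in the function bodies below.
BASE_DELAY_SECONDS = 0.5
MAX_DELAY_SECONDS = 8.0
MAX_BACKOFF_ATTEMPTS = 4


def circle_area(radius: float) -> float:
    """Return the area of a circle, using math.pi rather than 3.14159."""
    return math.pi * radius ** 2


def retry_delay_seconds(attempt: int) -> float:
    """Return the delay before the given zero-based retry attempt."""
    # Single assignment: the name is bound exactly once, by one expression,
    # instead of being given a default and then conditionally overwritten.
    delay = (
        BASE_DELAY_SECONDS * 2 ** attempt
        if attempt < MAX_BACKOFF_ATTEMPTS
        else MAX_DELAY_SECONDS
    )
    return delay


def describe_status(status_code: int) -> str:
    """Return a short label for an HTTP status code."""
    # Explicit control flow: every return sits in its own branch, so the
    # structure shows that exactly one of the three outcomes is chosen.
    if status_code == HTTPStatus.OK:
        return "ok"
    elif status_code == HTTPStatus.NOT_FOUND:
        return "missing"
    else:
        return "other"
```

Each name is defined before its first use when the file is read top to bottom, and the status codes come from the module that owns the protocol rather than from literals in the caller.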

For Python Code

  • Everything that is a class is something that should be a class. Classes are used to define types rather than used as namespaces or a way of grouping things together. Classes are instantiated, and when they are instantiated, something of interest is done with the instances of the class.
  • Everything that is in a class must be in that class. Every instance method makes use of its “self” parameter (instance methods required by some implemented type specification are excepted). Most constants are module-scope constants, and what few class-scope constants are present play a role in interface implementation.
  • None is treated like the special value it is rather than assumed to be included in every type. If a function can return None, its docstring describes the circumstances under which it does so. If a function parameter may be None, its description makes clear how the function behaves for such input.
  • Constants are favored over literal values placed in function bodies. The only string literals present in function bodies are messages to other programmers such as the message text written to the log or included in a raised exception. Values of semantic significance that when changed would change the behavior of the code are extracted out of the code’s functions into constants.
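
As a minimal sketch of the Python points above, assuming a hypothetical Endpoint type and parse_endpoint function made up for this example: the class defines a real, immutable type whose method uses self, the constants live at module scope, and the docstring says exactly when None comes back and what a None argument means.

```python
from dataclasses import dataclass
from typing import Optional

# Module-scope constants (hypothetical values for this example).
DEFAULT_PORT = 8080
ADDRESS_SEPARATOR = ":"


@dataclass(frozen=True)
class Endpoint:
    """An immutable host and port pair."""

    host: str
    port: int

    def address(self) -> str:
        """Return the endpoint in host:port form."""
        return f"{self.host}{ADDRESS_SEPARATOR}{self.port}"


def parse_endpoint(text: str, default_port: Optional[int] = None) -> Optional[Endpoint]:
    """Parse "host" or "host:port" into an Endpoint.

    If default_port is None, DEFAULT_PORT is used when text names no port.

    Returns None when the host part is empty or when a port is given but
    is not a decimal integer. Never raises for malformed input.
    """
    fallback_port = DEFAULT_PORT if default_port is None else default_port
    host, separator, port_text = text.partition(ADDRESS_SEPARATOR)
    if not host:
        return None
    elif not separator:
        return Endpoint(host, fallback_port)
    elif port_text.isdecimal():
        return Endpoint(host, int(port_text))
    else:
        return None
```

A caller can then treat None as the special case it is, for example by checking "if endpoint is None" before calling endpoint.address(), rather than assuming every result is an Endpoint.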