Update (Jan 2, 2025): Improved language and image from feedback.
In this post, I want to show you how to make the docstrings of class attributes runtime accessible. These docstrings are defined as string literals¹ directly below the attributes, like this:
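For illustration, a class documented in this style might look as follows. The names Settings and WIDTH are hypothetical; COLOR is the attribute mentioned further below. The last two lines show that Python evaluates these string literals but does not store them anywhere:

```python
class Settings:
    """Settings of a widget."""

    COLOR: str = "red"
    """Color used to highlight the widget."""

    WIDTH: int = 100
    """Width of the widget in pixels."""


# Only the class docstring is kept; the attribute docstrings are discarded.
print(Settings.__doc__)  # Settings of a widget.
# The value "red" is a plain str, so its __doc__ is just the docstring of str.
print(Settings.COLOR.__doc__ == str.__doc__)  # True
```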
By runtime accessible, I mean that, given a class and the name of an attribute, we can get the docstring of that attribute, i.e. with a function like this:
This has been a long-standing issue for many Python users, but it is still not integrated into the language or the standard library. Various tools, e.g. the documentation tools Sphinx and Griffe, solve this issue internally in a way similar to mine. Unfortunately, I was not able to find a reusable standalone solution.
An extended version of the code has been released on GitHub and as a package on PyPI.
My personal motivation was to show docstrings of parameters and enum choices in a GUI. Using this project, I was able to display the documentation of the COLOR-attribute as a tooltip:
Extracting the docstrings of methods is easy and built into Python: you can access the docstring of a method via its __doc__ attribute.
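For example (the class and method names are made up for illustration):

```python
class Widget:
    def redraw(self) -> None:
        """Redraw the widget on screen."""


# The method docstring is stored on the function object itself.
print(Widget.redraw.__doc__)  # Redraw the widget on screen.
```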
But this does not work for class attributes, since attaching a docstring to an attribute's value could lead to various problems. Just two examples:
First, the docstring of MyInt would be overwritten by the docstring of FlexibleVector.scalar_type:
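To make this first problem concrete, here is a sketch that emulates by hand what attaching the attribute docstring to the value would do. Python itself never does this; the docstring texts are made up:

```python
class MyInt(int):
    """An int with additional helper methods."""


class FlexibleVector:
    scalar_type: type = MyInt
    """Type of the vector's components."""


# Hypothetical: attach the attribute docstring to the attribute's value.
FlexibleVector.scalar_type.__doc__ = "Type of the vector's components."

# The value is the class MyInt itself, so its own docstring is gone.
print(MyInt.__doc__)  # Type of the vector's components.
```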
Second, many values, e.g. instances of builtin types such as int or str, do not allow assigning a __doc__ attribute to them.
Now that we have established the problem, let’s have a look at what has been tried in the past.
There is a long-standing discussion about the best way to document class attributes. Let’s first have a look at what has been proposed in the past, so you can make an informed decision on whether you should use the approach shown in this blog post.
Back in the year 2000, PEP 224 proposed exactly the syntax shown in the introduction, but was rejected.
Many people (including Guido van Rossum himself) argue that this approach is bad, since the docstring sits below the attribute and often directly above another attribute, which can lead to confusion.
Side note: Simply moving the docstring above the assignment is not an option, since it would be ambiguous whether the docstring belongs to the class (or module) or to the first attribute.
One year later, PEP 257 states that string literals¹ directly after class attributes should be considered “attribute docstrings”, although they are not runtime accessible. Verbatim, it says:
String literals occurring elsewhere in Python code may also act as documentation. They are not recognized by the Python bytecode compiler and are not accessible as runtime object attributes (i.e. not assigned to __doc__), but two types of extra docstrings may be extracted by software tools:
- String literals occurring immediately after a simple assignment at the top level of a module, class, or __init__ method are called “attribute docstrings”. […]
Hence, the Python language recognizes them as official docstrings, but won’t help us access them at runtime.
An alternative approach is described in the more recent PEP 727 (2023). It proposes defining the docstring as an annotation:
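A minimal sketch of the PEP 727 style, assuming a typing_extensions version that ships Doc (if it is missing, a small stand-in class with the same documentation attribute is defined so the snippet still runs). The helper pep727_docs is hypothetical; it reads the Doc metadata back via typing.get_type_hints(..., include_extras=True):

```python
from typing import Annotated, get_type_hints

try:
    from typing_extensions import Doc  # PEP 727's Doc, shipped in typing_extensions
except ImportError:
    # Stand-in with the same `documentation` attribute, only so this sketch
    # runs without typing_extensions installed.
    class Doc:
        def __init__(self, documentation: str) -> None:
            self.documentation = documentation


class Settings:
    COLOR: Annotated[str, Doc("Color used to highlight the widget.")] = "red"


def pep727_docs(cls: type) -> dict:
    """Hypothetical helper: map attribute names to their Doc(...) texts."""
    docs = {}
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    for name, hint in get_type_hints(cls, include_extras=True).items():
        for meta in getattr(hint, "__metadata__", ()):
            if isinstance(meta, Doc):
                docs[name] = meta.documentation
    return docs


print(pep727_docs(Settings))
```

Unlike attribute docstrings, this information lives in the class annotations and is available without access to the source code.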
While this PEP has a lot of support from popular libraries and is already implemented in typing_extensions, the discussion brought up various issues, so that the original sponsor of the PEP suggested withdrawing it.
Personally, I also don’t like the syntax, since it interrupts the reader while reading the assignment, i.e. the actual logic of the code.
Summary: Since PEP 257, string literals after assignments in a module or class are officially recognized as “attribute docstrings”. But Python does not offer a built-in way to extract them.
In recent years, multiple popular projects, such as Pydantic and mkdocs, have added support for the syntax initially proposed in PEP 224.
There are further discussions² ³ if you are interested in more technical arguments. During my research, I found various interesting alternatives, which I present in a separate section. You can then decide for yourself which solution suits you best.
My verdict: Since I use both of these projects, don’t mind the syntax, and wanted to access the docstrings of class attributes to show them as tooltips in a GUI, I decided to implement this as a reusable standalone solution.
Next, I want to quickly explain how the solution works. This is an entirely optional read; you can skip directly to Limitations or Alternatives.
You can find an extended implementation usable as a package on GitHub and PyPI.
Instead of implementing the function from the introduction directly, we implement a function that extracts all attribute docstrings of a class as a mapping. A convenience function that returns just the requested attribute’s docstring can then easily be built on top of it.
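The implementation can be divided into three steps:
1. Get the source code of the class using the inspect-module.
2. Parse the source code using the ast-module to get an abstract syntax tree (AST).
3. Traverse the AST with an ast.NodeVisitor to collect the attribute docstrings.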
As code, this looks as follows:
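A minimal sketch of the first two steps, assuming the class source is available on disk. The function name parse_class_source is hypothetical, and step 3, the visitor, is not part of it yet. The call to textwrap.dedent is an extra precaution of this sketch, so that classes nested inside other classes or functions, whose source is indented, can still be parsed:

```python
import ast
import inspect
import textwrap


def parse_class_source(cls: type) -> ast.Module:
    """Hypothetical helper covering steps 1 and 2 for a single class."""
    # Step 1: read the class source; raises OSError/TypeError if unavailable.
    source = inspect.getsource(cls)
    # Indented (nested) class definitions would be a syntax error otherwise.
    source = textwrap.dedent(source)
    # Step 2: parse into an abstract syntax tree whose body is the class.
    return ast.parse(source)


tree = parse_class_source(textwrap.TextWrapper)
print(type(tree.body[0]).__name__, tree.body[0].name)  # ClassDef TextWrapper
```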
What remains to be implemented is the AttributeDocstringVisitor.
To understand what we have to do, let’s have a look at the dumped AST of the class from the introduction:
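To reproduce such a dump yourself, you can parse a small class with ast.parse and print it with ast.dump. The class below is a hypothetical stand-in for the one from the introduction; the indent= parameter requires Python 3.9 or later:

```python
import ast

source = '''
class Settings:
    COLOR: str = "red"
    """Color used to highlight the widget."""
'''

class_def = ast.parse(source).body[0]
# indent= pretty-prints the nested node structure.
print(ast.dump(class_def, indent=4))
```

In the output, the AnnAssign node for COLOR is directly followed by an Expr node wrapping a Constant with the docstring text, which is exactly the pattern the visitor has to look for.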
We have to go through all statements in the body of a ClassDef to find Expr nodes containing Constant nodes whose values are strings, if they directly follow AnnAssign (or Assign) nodes. The string value is the docstring we have to store.
To build the mapping, we can get the attribute name from the target id of the AnnAssign node. For Assign nodes, we additionally have to handle possibly multiple targets (e.g. a = b = 1).
The NodeVisitor implements the visitor design pattern. When it encounters a node (e.g. ClassDef), the visitor automatically calls the corresponding method (visit_ClassDef).
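A tiny, hypothetical visitor illustrates this dispatch: visit() looks up a method named visit_ followed by the node’s class name and falls back to generic_visit(), which visits all child nodes:

```python
import ast


class ClassNameCollector(ast.NodeVisitor):
    """Hypothetical example: records the name of every class definition."""

    def __init__(self) -> None:
        self.names = []

    def visit_ClassDef(self, node: ast.ClassDef) -> None:
        # Called automatically because type(node).__name__ == "ClassDef".
        self.names.append(node.name)
        # Continue into the class body so nested classes are visited too.
        self.generic_visit(node)


collector = ClassNameCollector()
collector.visit(ast.parse("class Outer:\n    class Inner:\n        pass\n"))
print(collector.names)  # ['Outer', 'Inner']
```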
After visiting a class, the docstrings will be in visitor.docs.
That is all there is.
The main limitation is that the source code of the class has to be accessible via inspect. If this is not possible (e.g. when the class is defined in a REPL), the extraction fails and returns no docstrings.
Some people argue that the documentation of class attributes is closely tied to the class and should therefore live in the class docstring. The actual syntax depends on your docstring style. Here is an example for Google-style docstrings:
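A hypothetical class documented this way, using a Google-style Attributes: section (the names are made up for illustration). Since everything lives in the regular class docstring, it is available at runtime even without source access, but only as unparsed text:

```python
import inspect


class Settings:
    """Settings of a widget.

    Attributes:
        COLOR: Color used to highlight the widget.
        WIDTH: Width of the widget in pixels.
    """

    COLOR: str = "red"
    WIDTH: int = 100


# inspect.getdoc() returns the class docstring with indentation cleaned up.
print(inspect.getdoc(Settings))
```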
Then you can use a docstring parser (e.g. docstring_parser) to extract the information.
While I understand this argument, I prefer not to mangle docstrings too much and would rather use a solution that is more integrated into the language.
Since documenting enums is a common use case for class attribute docstrings, here are some specialized options that only work for enums:
The library enum-tools offers the class decorator @document_enum to extract the docstrings of the attributes of an enum. It also traverses the AST as we do, but is more specialized for enums.
Requires slightly different syntax:
If you are fine with auto-incrementing values, you can set the docstring as the value. This approach uses a custom enum class that moves the given value into the __doc__ attribute and assigns an automatically incremented number as the actual value.
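A minimal sketch of such a custom enum class, based on the custom __new__() pattern from Python’s enum documentation. The name DocEnum and the example members are hypothetical, and this is not necessarily how any particular library implements it:

```python
import enum


class DocEnum(enum.Enum):
    """Enum whose member values are docstrings; real values auto-increment."""

    def __new__(cls, doc: str) -> "DocEnum":
        # Members created so far determine the next value: 1, 2, 3, ...
        value = len(cls.__members__) + 1
        member = object.__new__(cls)
        member._value_ = value
        # The string written in the class body becomes the member's docstring.
        member.__doc__ = doc
        return member


class Color(DocEnum):
    RED = "Used for errors."
    GREEN = "Used for success."


print(Color.RED.value, Color.RED.__doc__)  # 1 Used for errors.
```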
PEP 727 is already implemented in typing_extensions and is there to stay for backwards compatibility. It remains to be seen whether this approach gains a foothold after all, as some popular libraries prefer it. The decision here might depend on your ecosystem (i.e. the libraries you are using) and your personal preference regarding the syntax.
Defining the docstring as a special dunder attribute:
You can then use getattr(cls, f"__doc__{attr_name}__") to access the docstring. This works, but that’s about it: be careful when renaming attributes, since the docstring attribute has to be renamed as well. To my knowledge, this approach is not very widespread.
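A minimal sketch of this approach (class, attribute, and helper names are hypothetical). Because the docstring attribute’s name ends with two underscores, Python does not apply private name mangling to it:

```python
from __future__ import annotations


class Settings:
    COLOR: str = "red"
    __doc__COLOR__ = "Color used to highlight the widget."


def get_dunder_docstring(cls: type, attr_name: str) -> str | None:
    """Hypothetical helper: look up the special docstring attribute, if any."""
    return getattr(cls, f"__doc__{attr_name}__", None)


print(get_dunder_docstring(Settings, "COLOR"))  # Color used to highlight the widget.
```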
Sphinx supports a special comment syntax (#:) for documenting class attributes.
This approach is even harder to access at runtime, since comments are not stored in the AST at all. Its advantage is compatibility with Sphinx. Since Sphinx can read these comments, there is probably a way, but I did not research how.
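Comments are dropped by ast, but the standard library’s tokenize module keeps them as COMMENT tokens. As a rough sketch (not how Sphinx implements it, and only handling #: comments placed on the lines directly above an attribute), they could be collected like this; the function name is hypothetical:

```python
from __future__ import annotations

import io
import keyword
import tokenize


def hash_colon_comments(source: str) -> dict[str, str]:
    """Map attribute names to the '#:' comment block directly above them."""
    docs: dict[str, str] = {}
    pending: list[str] = []
    for tok in tokenize.generate_tokens(io.StringIO(source).readline):
        if tok.type == tokenize.COMMENT:
            if tok.string.startswith("#:"):
                pending.append(tok.string[2:].strip())
            else:
                pending = []  # an ordinary comment breaks the block
        elif tok.type == tokenize.NAME and pending and not keyword.iskeyword(tok.string):
            # The first name after the comment block is the documented attribute.
            docs[tok.string] = " ".join(pending)
            pending = []
        elif tok.type not in (tokenize.NL, tokenize.NEWLINE, tokenize.INDENT, tokenize.DEDENT):
            pending = []  # any other token means the comment was not attached
    return docs


source = '''
class Settings:
    #: Color used to highlight the widget.
    COLOR: str = "red"
    WIDTH: int = 100
'''
print(hash_colon_comments(source))
```

Like the AST-based solution, this still needs the source code at runtime.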
Alternatively, you can also use the attribute docstrings with Sphinx.
Extracting attribute docstrings from the source is not difficult. The approach is somewhat limited, since it requires access to the source code at runtime. There are various alternatives, each with its own trade-offs.
I will be using the method presented in this post, since it follows the syntax officially recognized in PEP 257 and is compatible with various tools such as Pydantic and mkdocs.
1. A string literal is simply a hard-coded string. In most languages, you would assign a string to a variable (e.g. x = "Hello World") or use it as part of an expression (e.g. ",".join(["my", "list", "of", "strings"])). But in Python, you can also put a string literal on a standalone line. If it appears directly below a function signature or class declaration, it is interpreted as a docstring.
2. Original mailing list discussion of PEP 224. Interestingly, it mentions that this is one of the first publicly discussed PEPs.