Runtime Accessible Class Attribute Docstrings in Python

Update (Jan 2, 2025): Improved language and image from feedback.

In this post, I want to show you how to access docstrings that are defined as string literals [1] below class attributes, like this:

class MyClass:
    a: int = 5
    """This is the docstring of a."""

By runtime accessible, I mean that there is a function which, given a class and the name of an attribute, returns the docstring of that attribute, i.e. a function like this:

def get_attribute_docstring(cls: type, attribute_name: str) -> str:
    ...

get_attribute_docstring(MyClass, 'a')
# Output: 'This is the docstring of a.'

This has been a long-standing issue for many Python users, but it is still not integrated into the language or the standard library. Various tools, e.g. documentation tools such as Sphinx and Griffe, solve this issue internally in a way similar to mine. Unfortunately, I was not able to find a reusable standalone solution.

An extended version of the code has been released on GitHub and as a package on PyPI.

My personal motivation was to show docstrings of parameters and enum choices in a GUI. Using this project, I was able to display the documentation of the COLOR-attribute as a tooltip:

Figure 1: Screenshot from my web-based workflow editor for computer vision. The framework introspects programmatically defined Node classes to build a UI for configuring the nodes and shows documentation alongside them.

Why doesn’t this work out of the box?

Extracting the docstrings of methods is easy and built into Python: you can access them via the __doc__ attribute.

But this does not work for class attributes, since it could lead to various problems. Just two examples:

  1. It could overwrite existing documentation associated with the value of the attribute. In the following sample, the docstring of MyInt would be overwritten by the docstring of FlexibleVector.scalar_type:
from numbers import Number
from typing import Type

class MyInt(Number):
    """Much better implementation of integer."""
    ...

class FlexibleVector:
    scalar_type: Type[Number] = MyInt
    """Type of the scalar values."""
  2. Furthermore, some objects might be immutable, so you cannot simply add a __doc__ attribute to them.
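The second point is easy to reproduce. The following is a minimal sketch; the class `Settings` and its attribute are made up for illustration and are not part of the code in this post:

```python
class Settings:
    retries = 5


# The attribute value is a plain int, which does not allow setting __doc__.
try:
    Settings.retries.__doc__ = "Number of retries."
    assigned = True
except AttributeError:
    assigned = False

# Reading __doc__ only yields the documentation of the int type itself.
shows_int_doc = Settings.retries.__doc__ == int.__doc__

print(assigned, shows_int_doc)
```

So even reading `__doc__` from the value is misleading here: it describes the type of the value, not the attribute.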

Now that we have established the problem, let’s have a look at what has been tried in the past.

Background and should you do this?

There is a long-standing discussion about the best way to document class attributes. Let’s first have a look at what has been proposed in the past, so you can make an informed decision on whether to use the approach shown in this blog post.

PEP 224 (rejected)

Back in the year 2000, PEP 224 proposed exactly the syntax shown in the introduction, but was rejected.

Many people (including Guido van Rossum himself) argue that this syntax is bad, since the docstring sits below the attribute and often directly above another attribute, which could lead to confusion.

Side note: Simply moving it above the assignment is not an option, since it would be ambiguous whether the docstring belongs to the class (or module) or to the first attribute.

PEP 257 (accepted)

One year later, PEP 257 stated that string literals [1] directly after class attributes should be considered “attribute docstrings”, although they are not runtime accessible. Verbatim, it says:

String literals occurring elsewhere in Python code may also act as documentation. They are not recognized by the Python bytecode compiler and are not accessible as runtime object attributes (i.e. not assigned to __doc__), but two types of extra docstrings may be extracted by software tools:

  1. String literals occurring immediately after a simple assignment at the top level of a module, class, or __init__ method are called “attribute docstrings”. […]

Hence, the Python language recognizes them as official docstrings, but won’t help us access them at runtime.

PEP 727 (draft)

An alternative approach is described in the recent (2023) PEP 727. It proposes to define the docstring as an annotation:

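from typing import Annotated
# PEP 727 proposes typing.Doc; so far it is only available in typing_extensions
from typing_extensions import Doc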

class MyClass:
    a: Annotated[int, Doc("This is the docstring of a.")] = 5

While this PEP has a lot of support from popular libraries and is already implemented in typing_extensions, the discussion has brought up various issues, to the point that the original sponsor of the PEP suggested withdrawing it.

Personally, I also don’t like the syntax, since it interrupts the reader from reading the assignment, i.e. the actual logic of the code.

The current state of docstrings below the assignment

Summary: Since PEP 257, string literals placed directly after assignments in a module or class are officially recognized as “attribute docstrings”. But Python does not offer a builtin way to extract them.

In recent years, multiple popular projects have started to support the syntax initially proposed in PEP 224:

  1. Griffe, which is the library used by the python handler of mkdocstrings to extract all information from python files to display them in a mkdocs documentation.
  2. Pydantic (since version 2.7), which populates the description of fields with the attribute-docstring.

There are more discussions [2] [3] if you are interested in more technical arguments. During my research I found various interesting alternatives, which I present in a separate section. You can then decide for yourself which solution suits you best.

My verdict: Since I use both of these projects, don’t mind the syntax, and wanted to access the docstrings of class attributes to show them as tooltips in a GUI, I decided to implement this as a reusable standalone solution.

Next, I want to quickly explain how the solution works. This is an entirely optional read, and you can skip directly to Limitations or Alternatives.

Implementation

You can find an extended implementation usable as a package on GitHub and PyPI.

Instead of implementing the function from the introduction directly, we implement a function which extracts all attribute docstrings as a mapping. A convenience function that just returns the requested attribute’s docstring can then be implemented easily on top of it.

The implementation can be divided into three steps:

  1. Extract source code using the inspect-module
  2. Parse it using the ast-module to get an abstract syntax tree (AST)
  3. Extract the attribute docstrings from the AST with an ast.NodeVisitor

As code this will look as follows:

import inspect
import textwrap
import ast

def get_attribute_docstrings(cls):
    # 1. Extract Source Code
    try:
        source = inspect.getsource(cls)
    except (TypeError, OSError):
        # Return an empty dict, if the source could not be resolved.
        return {}
    # dedent to allow parsing of nested classes
    source = textwrap.dedent(source)
    # 2. Parse code
    tree = ast.parse(source)
    # 3. Visit AST and return collected docs
    visitor = AttributeDocstringVisitor()
    visitor.visit(tree)
    return visitor.docs
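The dedent step is easy to overlook, so here is a minimal sketch of why it is needed. It assumes that the source of a nested class comes back with the indentation it has in the file, which is how inspect.getsource behaves; the string `nested_source` is a hand-written stand-in for such a result:

```python
import ast
import textwrap

# Stand-in for what inspect.getsource returns for a class that is
# defined inside another class or function: the indentation is kept.
nested_source = (
    "    class Inner:\n"
    "        a: int = 5\n"
    '        """This is the docstring of a."""\n'
)

# Parsing the indented text directly fails ...
try:
    ast.parse(nested_source)
    parsed_without_dedent = True
except IndentationError:
    parsed_without_dedent = False

# ... while the dedented text parses into a regular ClassDef.
tree = ast.parse(textwrap.dedent(nested_source))
class_name = tree.body[0].name

print(parsed_without_dedent, class_name)
```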

What remains to be implemented is the AttributeDocstringVisitor. To understand what we have to do, let’s have a look at the dumped AST of the class from the introduction:

Module(
    body=[
        ClassDef(
            name='MyClass',
            bases=[],
            keywords=[],
            body=[
                AnnAssign(
                    target=Name(id='a', ctx=Store()),
                    annotation=Name(id='int', ctx=Load()),
                    value=Constant(value=5),
                    simple=1),
                Expr(
                    value=Constant(value='This is the docstring of a.'))],
            decorator_list=[],
            type_params=[])],
    type_ignores=[])
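If you want to reproduce a dump like the one above, ast.dump is all you need. A minimal sketch (the indent argument requires Python 3.9 or newer, and the exact fields in the output depend on the Python version):

```python
import ast

source = '''
class MyClass:
    a: int = 5
    """This is the docstring of a."""
'''

tree = ast.parse(source)
# indent=4 pretty-prints the tree instead of putting it on one line.
dump = ast.dump(tree, indent=4)
print(dump)
```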

We have to go through all statements in the body of a ClassDef and find Expr nodes that contain a Constant node with a string value and directly follow an AnnAssign (or Assign) node. That string is the docstring we have to store. To build the mapping, we take the attribute name from the id of the AnnAssign node’s target. For Assign nodes, we additionally have to deal with possible multiple targets; the visitor below simply ignores such assignments.

from typing import Dict, Optional

class AttributeDocstringVisitor(ast.NodeVisitor):

    def __init__(self):
        self.docs: Dict[str, str] = {}
        self.last_attr_name: Optional[str] = None

    def visit_Assign(self, node: ast.Assign):
        # ignore multi-assignments `a = b = 5` and assignments such as `a[1] = 5`
        if len(node.targets) == 1 and isinstance(node.targets[0], ast.Name):
            self.last_attr_name = node.targets[0].id
        else:
            self.last_attr_name = None

    def visit_AnnAssign(self, node: ast.AnnAssign):
        # Handle annotated assignments
        if isinstance(node.target, ast.Name):
            self.last_attr_name = node.target.id
        else:
            self.last_attr_name = None

    def visit_Expr(self, node: ast.Expr):
        # Check if the expression is a docstring for the last attribute
        if isinstance(node.value, ast.Constant) and isinstance(node.value.value, str):
            if self.last_attr_name:
                # Removes leading/trailing whitespace 
                # (especially necessary for multi-line docstrings)
                docstring = inspect.cleandoc(node.value.value)
                self.docs[self.last_attr_name] = docstring
        # Reset the last attribute name after processing
        self.last_attr_name = None

The NodeVisitor implements the visitor design pattern. When it encounters a node (e.g. ClassDef), the visitor automatically calls the corresponding method (visit_ClassDef). If no such method is defined, as is the case for ClassDef here, generic_visit simply descends into the child nodes. After visiting a class, the docstrings will be in visitor.docs.

That is all there is.

Limitations

  1. Source Code Access: This implementation requires access to the source code via inspect. If this is not possible (e.g. when the class is defined in a REPL), then it will fail and return no docstrings.
  2. Inheritance (solved in full version): The implementation above does not consider inherited attributes, since it only parses the specified class. This can be alleviated by traversing the parent classes in the MRO and parsing them until the attribute and its docstring are found.
  3. Inner Classes (solved in full version): The implementation above does not ignore inner classes. It will add all attributes of inner classes to the dictionary.
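To illustrate how the last two limitations could be addressed, here is a minimal sketch. It is not the code of the released package: `class_body_docstrings` is a hypothetical helper that only looks at the top-level statements of a class body (so inner classes and method bodies are never entered), and `get_attribute_docstring` walks the MRO as described above.

```python
import ast
import inspect
import textwrap
from typing import Dict, Optional


def class_body_docstrings(source: str) -> Dict[str, str]:
    """Collect attribute docstrings from the top level of one class body."""
    class_def = ast.parse(textwrap.dedent(source)).body[0]
    if not isinstance(class_def, ast.ClassDef):
        return {}
    docs: Dict[str, str] = {}
    # Only pairs of neighbouring top-level statements are inspected.
    for statement, following in zip(class_def.body, class_def.body[1:]):
        if isinstance(statement, ast.AnnAssign):
            targets = [statement.target]
        elif isinstance(statement, ast.Assign):
            targets = statement.targets
        else:
            continue
        is_docstring = (
            isinstance(following, ast.Expr)
            and isinstance(following.value, ast.Constant)
            and isinstance(following.value.value, str)
        )
        # Like the visitor above, skip `a = b = 5` and `a[1] = 5`.
        if is_docstring and len(targets) == 1 and isinstance(targets[0], ast.Name):
            docs[targets[0].id] = inspect.cleandoc(following.value.value)
    return docs


def get_attribute_docstring(cls: type, attribute_name: str) -> Optional[str]:
    """Return the first docstring found for the attribute along the MRO."""
    for klass in cls.__mro__:
        try:
            source = inspect.getsource(klass)
        except (TypeError, OSError):
            continue  # e.g. builtins such as `object` have no source
        docs = class_body_docstrings(source)
        if attribute_name in docs:
            return docs[attribute_name]
    return None
```

Because the MRO starts with the class itself, a docstring in a subclass takes precedence over one in a parent class. Note that this sketch returns None when nothing is found, whereas the signature in the introduction promises a str.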

Alternatives

In the docstring of the class

Some people argue that the docstring of a class attribute is closely tied to the class and should therefore be documented there. The actual syntax depends on your docstring style. Here is an example for Google style docstrings:

class MyClass:
    """This is the docstring of MyClass.
    
    Attributes:
        a: This is the docstring of a.
        b: This is the docstring of b.
    """
    a: int = 5
    b: str = "Hello"

Then you can use a docstring parser (e.g. docstring_parser) to extract the information:

from docstring_parser import parse

docstring = parse(MyClass.__doc__)
docstring.params[0].arg_name, docstring.params[0].description
# Output: ('a', 'This is the docstring of a.')

While I understand this argument, I prefer not mangling docstrings too much and prefer a solution which is more integrated in the language.

Enums

Since documenting enums is a common use case for class attribute docstrings, here are some specialized options which only work for enums:

enum-tools decorator

The library enum-tools offers the class decorator @document_enum to extract the docstrings of the attributes of an enum. It also traverses the AST as we do, but is more specialized for enums.

import enum_tools.documentation
from enum_tools.documentation import document_enum
from enum import Enum

# No idea why this has to be set. Maybe it is a runtime optimization,
# such that it is only executed during some workflows (e.g. Sphinx)
# See: [Issue 29](https://github.com/domdfcoding/enum_tools/issues/29)
enum_tools.documentation.INTERACTIVE = True

@document_enum
class MyEnum(Enum):
    a = 5
    """This is the docstring of a."""

MyEnum.a.__doc__
# Output: 'This is the docstring of a.'

aenum

Requires slightly different syntax:

from aenum import Enum

class MyEnum(Enum):
    a = 5, """This is the docstring of a."""

MyEnum.a.__doc__
# Output: 'This is the docstring of a.'

Value as docstring

If you are fine with auto-incrementing values, you can set the docstring as the value. A custom enum class then stores the string in the member’s __doc__ attribute and assigns an automatically incremented number as the actual value.
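A minimal sketch of how such a custom enum class could look, modelled on the AutoNumber recipe from the enum documentation. The names `DocEnum` and `Color` are made up for illustration:

```python
from enum import Enum


class DocEnum(Enum):
    """Hypothetical base class: the assigned string becomes the docstring."""

    def __new__(cls, doc: str):
        member = object.__new__(cls)
        # Auto-increment: number the members in definition order, from 1.
        member._value_ = len(cls.__members__) + 1
        member.__doc__ = doc
        return member


class Color(DocEnum):
    RED = "Pure red."
    GREEN = "Pure green."


print(Color.RED.value, Color.RED.__doc__)
print(Color.GREEN.value, Color.GREEN.__doc__)
```

The price is that you no longer control the values yourself, so this only fits enums where the concrete value does not matter.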

PEP 727

PEP 727 is already implemented in typing_extensions and is there to stay for backwards-compatibility. It remains to be seen if this approach still manages to get a foothold, since some popular libraries prefer it. The decision here might depend on your ecosystem (i.e. the libraries you are using) and personal preference of the syntax.
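Reading such annotations at runtime only needs the typing module. In the following minimal sketch, `Doc` is a small stand-in class so that the example has no third-party dependency; it mimics the `documentation` attribute described in PEP 727, and with typing_extensions installed you would import the real class instead. The function name `get_annotated_docs` is made up:

```python
from dataclasses import dataclass
from typing import Annotated, Dict, get_args, get_origin, get_type_hints


@dataclass(frozen=True)
class Doc:
    """Stand-in for typing_extensions.Doc (assumption: same attribute name)."""

    documentation: str


class MyClass:
    a: Annotated[int, Doc("This is the docstring of a.")] = 5
    b: str = "Hello"


def get_annotated_docs(cls: type) -> Dict[str, str]:
    docs: Dict[str, str] = {}
    # include_extras=True keeps the Annotated metadata instead of stripping it.
    for name, hint in get_type_hints(cls, include_extras=True).items():
        if get_origin(hint) is Annotated:
            # The first argument is the actual type, the rest is metadata.
            for metadata in get_args(hint)[1:]:
                if isinstance(metadata, Doc):
                    docs[name] = metadata.documentation
    return docs


print(get_annotated_docs(MyClass))
```

In contrast to the AST-based approach, this does not need access to the source code.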

Guido van Rossum’s suggestion from 2000

Defining the docstring as a special dunder attribute:

class MyClass:
    a: int = 5
    __doc__a__ = """This is the docstring of a."""

You can then use getattr(cls, f"__doc__{attr_name}__") to access the docstring.
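As a minimal sketch, with a default so that undocumented attributes do not raise an AttributeError (the function name `get_dunder_docstring` is made up):

```python
from typing import Optional


class MyClass:
    a: int = 5
    __doc__a__ = """This is the docstring of a."""
    b: str = "Hello"


def get_dunder_docstring(cls: type, attr_name: str) -> Optional[str]:
    # Names that end with two underscores are not name-mangled,
    # so the attribute can be looked up under its literal name.
    return getattr(cls, f"__doc__{attr_name}__", None)


print(get_dunder_docstring(MyClass, "a"))
print(get_dunder_docstring(MyClass, "b"))
```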

Works, but that’s it. Be careful when renaming. To my knowledge, this approach is not very widespread.

Non-alternative: Poor Sphinx users

Sphinx recognizes special #: comments for documenting class attributes.

class MyClass:
    #: This is the docstring of a.
    a: int = 5
    b: str = "Hello"  #: This is the docstring of b.

These comments are even more difficult to access at runtime, since comments are not stored in the AST at all. The advantage is that they are compatible with Sphinx. There is probably a way, since Sphinx can access them, but I did not research how.

Alternatively, you can also use the attribute docstrings with Sphinx.

Summary

Extracting the attribute docstrings from the source is not difficult. It is somewhat limited by requiring source code access at runtime. There are various alternatives, each with their own trade-offs.

I will be using the method presented in this post, since it follows the syntax officially recognized in PEP 257 and is compatible with various tools such as Pydantic and mkdocs.


  1. A string literal is simply a hard-coded string. In most languages, you would assign a string to a variable (e.g. x = "Hello World") or use it as part of an expression (e.g. ",".join(["my", "list", "of", "strings"])). But in Python you can also put a string literal on a line of its own. If it appears directly below a function signature or class declaration, it is interpreted as a docstring.

  2. Original mailing list discussion of PEP 224. Interestingly, it mentions that this is one of the first publicly discussed PEPs.

  3. People in 2023 still asking for this feature