Some time ago I had a problem where I had to exclude parts of one of my documents based on a build-time condition. I wanted to maintain one document but produce two versions: one that contained internal details of a product and one that didn't.
The class directive
As the title reveals, the solution is the Docutils class directive. But before we get to the main point, let us first explore what it can do and how it works. This will make the next (main) step clearer.
The class directive allows adding a class attribute to document nodes. For example, this can be useful for applying styles when generating HTML output.
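The class attribute can be applied in two ways: by placing the directive directly before the element it should affect, or by passing an indented block as the directive's content.
For example, we might want to add a subsection class to a subsection of a document, paragraph and emphasised classes to a paragraph (we can apply more than one) and a list class to a list.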
Title
=====

Subtitle
--------

.. class:: subsection

Section is a structural element
"""""""""""""""""""""""""""""""

.. class:: paragraph emphasised

A paragraph is a simple body element.

.. class:: list

- Bullet list,
- is a compound body element.
The above document will be converted to a document tree that will look something like the below:
<document>
    <title>
        Title
    <subtitle>
        Subtitle
    <section classes="subsection">
        <title>
            Section is a structural element
        <paragraph classes="paragraph emphasised">
            A paragraph is a simple body element.
        <bullet_list classes="list">
            <list_item>
                <paragraph>
                    Bullet list,
            <list_item>
                <paragraph>
                    is a compound body element.
The above output is so-called pseudo-XML, a format Docutils uses to present a document tree in an easy-to-understand way. Note that some of the node attributes were removed for simplicity.
As you can see, some elements in the tree were assigned a classes attribute. Those are the same elements that we preceded with class directives. The general rule is that the directive applies to the whole node that follows it, so:
- Class applied to a structural node applies to the whole node, e.g. to the whole section.
- Class applied to a simple body element applies only to that element, e.g. to a paragraph.
- Class applied to a compound body element applies to the whole element but not to its children, e.g. to the top element of a list but not the list elements.
You might wonder what to do when we want to apply a class to more than one body element. Do we have to precede every element with a class directive? If we follow the previous example then yes, but there's a better way:
Title
=====

Subtitle
--------

Section is a structural element
"""""""""""""""""""""""""""""""

.. class:: a-block

    A paragraph is a simple body element.

    - Bullet list
    - is a compound body element.
The above example will give a document tree with the below structure:
<document>
    <title>
        Title
    <subtitle>
        Subtitle
    <section>
        <title>
            Section is a structural element
        <paragraph classes="a-block">
            A paragraph is a simple body element.
        <bullet_list classes="a-block">
            <list_item>
                <paragraph>
                    Bullet list
            <list_item>
                <paragraph>
                    is a compound body element.
Notice how the contents of the indented block were placed directly in the subsection and the class a-block was applied to each of the indented elements. This works with body elements only: structural elements such as sections can't be placed in the class directive's indented block.
Class directives can also be nested:
Title
=====

Subtitle
--------

Section is a structural element
"""""""""""""""""""""""""""""""

.. class:: level-0

    Paragraph at level 0.

    .. class:: level-1

        Paragraph at level 1.

        .. class:: level-2

            Paragraphs at level 2.

            .. class:: level-3

                Paragraph at level 3.
Which would result in:
<document>
    <title>
        Title
    <subtitle>
        Subtitle
    <section>
        <title>
            Section is a structural element
        <paragraph classes="level-0">
            Paragraph at level 0.
        <paragraph classes="level-1 level-0">
            Paragraph at level 1.
        <paragraph classes="level-2 level-1 level-0">
            Paragraphs at level 2.
        <paragraph classes="level-3 level-2 level-1 level-0">
            Paragraph at level 3.
That's pretty cool, right? 😁
In all the above examples, we applied the class directive to whole compound body elements, but it's also possible to apply it to individual items, e.g. in the case of a list:
.. class:: list

- Bullet list,

  .. class:: list-item

- is a compound
- body element
In that case the class is applied to the second item only:
<bullet_list classes="list">
    <list_item>
        <paragraph>
            Bullet list,
    <list_item classes="list-item">
        <paragraph>
            is a compound
    <list_item>
        <paragraph>
            body element
Just remember that the class directive, in this case, must be properly aligned with the preceding list item, otherwise it won't work as expected.
Now that we know everything (or at least enough 😉) about the class directive let's move to the main point.
Excluding nodes of a specific class
The lengthy explanation above led us to this point where we can now comfortably try and tackle the main problem. This part will be much shorter.
In Docutils, classes can be used for one more thing besides their usual purposes: excluding nodes from the document. This is done by passing a class name to the --strip-elements-with-class option of the rst2xxx.py family of commands, e.g.:
$ rst2html5.py --strip-elements-with-class=internal doc.rst
The above command will generate an HTML document with all of the nodes with an internal class removed. That's right, it's that easy 😉.
So if doc.rst looks like this:
Title
=====

.. class:: internal

Internal section
----------------

Internal content.

.. class:: external

External section
----------------

External content.
Then the HTML output will look something like:
<!DOCTYPE html>
<html xmlns="http://www.w3.org/1999/xhtml" xml:lang="en" lang="en">
<head>
  <!-- ... -->
</head>
<body>
  <div class="document" id="title">
    <h1 class="title">Title</h1>
    <div class="external section" id="external-section">
      <h1>External section</h1>
      <p>External content.</p>
    </div>
  </div>
</body>
</html>
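The same stripping can also be tried from Python. The sketch below is not from the original setup: it assumes Docutils is installed, that the option is exposed to settings_overrides under the name strip_elements_with_classes (the destination of --strip-elements-with-class), and it renders pseudo-XML instead of HTML so the result is easy to inspect. The render helper is a hypothetical name.

```python
from docutils.core import publish_string

SOURCE = """\
Title
=====

.. class:: internal

Internal section
----------------

Internal content.

.. class:: external

External section
----------------

External content.
"""


def render(source, strip=()):
    """Render reStructuredText to pseudo-XML, optionally stripping classes."""
    overrides = {
        "output_encoding": "unicode",  # return str rather than bytes
        # Assumed settings name behind --strip-elements-with-class.
        "strip_elements_with_classes": list(strip),
    }
    return publish_string(source, writer_name="pseudoxml",
                          settings_overrides=overrides)


full = render(SOURCE)
public = render(SOURCE, strip=["internal"])
print(public)
```

Comparing full and public should show that the section marked internal is present in the first and missing from the second.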
Further, we'll explore how the class directive is implemented. Feel free to skip that part if you're not interested.
How does it work?
Docutils has two concepts that are involved in processing classes. There is a directive and there is a transform.
Directive
Directives are an extension mechanism for the reStructuredText markup language. Each directive starts with a double full stop and whitespace, followed by the directive type, a double colon and whitespace, e.g. .. class:: class-name.
Directives are a very flexible way of extending Docutils, so let's take a look at the code of the class directive.
Each directive extends the docutils.parsers.rst.Directive base class and implements the run method:
from docutils.parsers.rst import Directive

class Class(Directive):

    required_arguments = 1
    optional_arguments = 0
    final_argument_whitespace = True
    has_content = True

    def run(self):
        # ...
We can pass some configuration parameters as class variables. For example, required_arguments = 1 tells the parser that the directive takes one required argument, which in this case holds the class names (final_argument_whitespace = True lets that argument contain spaces, so several names can be given), and has_content = True tells the parser that the class directive accepts an indented block as its content.
Next, let's look at what is inside the run method:
def run(self):
    class_value = directives.class_option(self.arguments[0])
    node_list = []
    if self.content:
        # ...
    else:
        # ...
    return node_list
The run method deals with two cases:
- If the self.content variable is not empty, an indented block was passed to the directive.
- If self.content is empty, the directive will affect the next node in the document tree (the next sibling).
The first case where an indented block was passed to the directive is handled like below:
container = nodes.Element()
self.state.nested_parse(self.content, self.content_offset, container)
for node in container:
    node['classes'].extend(class_value)
node_list.extend(container.children)
First, we create an empty element that will serve as a container for the content of the nested block: container = nodes.Element().
Next, we parse the directive's content with the call to nested_parse(). This creates nodes from self.content and adds them to the container as children.
We then iterate over all of the container's children and add the class names to each node's classes attribute.
As the last step, we add all of the container's children to the node_list that is returned from the run method.
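The steps above also explain the order of class names in the nested example earlier (level-1 level-0 and so on): an inner directive finishes first, and the outer one then appends its own class. The following is a toy model of that behaviour in plain Python, not Docutils code; Node and run_class_directive are hypothetical names, and nested directives are written as tuples instead of being parsed.

```python
class Node:
    """Hypothetical stand-in for a Docutils node: a name plus a class list."""

    def __init__(self, name):
        self.name = name
        self.classes = []


def run_class_directive(class_names, content):
    """Mimic the indented-block branch of run().

    `content` is a list whose items are either paragraph names (str) or
    nested directives written as (class_names, content) tuples.
    """
    container = []  # plays the role of nodes.Element()
    for item in content:  # plays the role of nested_parse()
        if isinstance(item, tuple):
            container.extend(run_class_directive(*item))
        else:
            container.append(Node(item))
    # Inner directives have already added their classes at this point.
    for node in container:
        node.classes.extend(class_names)
    return container


nodes = run_class_directive(
    ["level-0"],
    ["Paragraph at level 0.",
     (["level-1"], ["Paragraph at level 1.",
                    (["level-2"], ["Paragraph at level 2."])])],
)
for node in nodes:
    print(" ".join(node.classes), "|", node.name)
```

The innermost paragraph ends up with level-2 level-1 level-0, matching the order seen in the pseudo-XML.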
In the second case:
pending = nodes.pending(
    misc.ClassAttribute,
    {'class': class_value, 'directive': self.name},
    self.block_text)
self.state_machine.document.note_pending(pending)
node_list.append(pending)
Since there is no content directly associated with the class directive we will apply the class to the next element in the element tree (in other words to a sibling). To do that we need to defer this operation until the whole document is parsed (since we don't know our sibling yet).
To postpone applying our class to the sibling we create and insert a pending node. This node will then be processed by a Transform.
To create the pending node we instantiate nodes.pending with 3 arguments: the misc.ClassAttribute transform class (the operation that will be executed on the pending node later), a details dictionary (available later as pending.details) and self.block_text (a string containing the whole directive).
We then register the node with the document via note_pending() and add it to the node_list, which will be returned from the run() method.
That's pretty much everything that the class directive implementation does. In the next section (Transform) we'll see what happens with the pending node that we created.
Transform
Transforms are run after the whole document has been parsed and their purpose is to change the document tree in place. They can perform different operations like resolving references or removing elements based on a certain condition.
We'll take a look at the ClassAttribute transform that is used by the Class directive.
Each transform derives from the docutils.transforms.Transform class and implements the apply method.
class ClassAttribute(Transform):

    default_priority = 210

    def apply(self):
        # ...
That's pretty straightforward. One thing worth mentioning is that transforms are run in order of their priorities, hence the default_priority class attribute above.
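The ordering is what makes stripping by class possible: the class has to be attached before any transform tries to remove nodes that carry it. Here is a toy illustration, assuming that lower numbers run first and that the transform doing the removal is StripClassesAndElements with a priority of 420 (both are assumptions, not taken from the code shown here).

```python
# (priority, name) pairs; lower numbers are assumed to run first.
transforms = [
    (420, "StripClassesAndElements"),  # assumed name and priority
    (210, "ClassAttribute"),           # default_priority shown above
]

for priority, name in sorted(transforms):
    print(priority, name)
```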
Next, let's take a look at the apply method implementation:
def apply(self):
    pending = self.startnode
    parent = pending.parent
    child = pending
    while parent:
        for index in range(parent.index(child) + 1, len(parent)):
            # ...
            return
        else:
            child = parent
            parent = parent.parent
    error = self.document.reporter.error(...)
    pending.replace_self(error)
Let's look at how the loops work first and then we'll check what's happening inside the for loop.
First, we get a reference to the pending node that was added by the Class directive and to its parent. Then we enter a while loop that runs as long as the parent node exists.
Inside the while loop there is a for ... else construct, which loops over the indices in a range. If the loop finishes without a break (which includes the case where there are no indices to loop over at all), execution continues in the else block.
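For readers who haven't met for ... else before, here is a small standalone illustration that is unrelated to Docutils; first_even is a hypothetical helper. Note that apply() leaves the loop with return rather than break, which skips the else block in the same way.

```python
def first_even(numbers):
    """Return the first even number, or None if the loop finishes without a hit."""
    for number in numbers:
        if number % 2 == 0:
            found = number
            break
    else:
        # Reached when the loop was never left via break,
        # including when `numbers` is empty.
        found = None
    return found


print(first_even([1, 3, 4, 5]))  # 4
print(first_even([1, 3, 5]))     # None
print(first_even([]))            # None
```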
In apply(), the range covers the indices of the children that come after the pending node (i.e. the siblings that follow it in the tree). If there are such siblings, we execute the body of the for loop and return (provided the sibling is eligible, which depends on what's in the loop). If there are no siblings following the pending node (which can happen if we put .. class:: before a section, for example), we go one level up in the node tree (the else clause) and repeat the operation until we find a suitable node.
Finally, if we don't find any nodes that we can apply our operations to, we replace the pending node with an error node.
The operations we apply to the nodes once we find them are pretty straightforward:
for index in range(parent.index(child) + 1, len(parent)):
    element = parent[index]
    if (isinstance(element, nodes.Invisible) or
        isinstance(element, nodes.system_message)):
        continue
    element['classes'] += pending.details['class']
    pending.parent.remove(pending)
    return
The whole idea here is to get the first eligible sibling that follows the pending node (or its parent, or its parent's parent, etc.) and add the class names to its classes attribute (same as we did in the Class implementation). When that happens, we remove the pending node and return.
Nodes of type nodes.Invisible and nodes.system_message are skipped in the loop, but the details are not important here. Let's just say that those node types don't qualify as regular nodes, so classes can't be applied to them.
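To see the whole search in isolation, here is a simplified model in plain Python. It is only a sketch under stated assumptions: Node and apply_pending are hypothetical names, an invisible flag stands in for the nodes.Invisible and nodes.system_message checks, and the function returns None where Docutils would insert an error node.

```python
class Node:
    """Hypothetical minimal tree node; not the Docutils node API."""

    def __init__(self, name, children=(), invisible=False):
        self.name = name
        self.invisible = invisible
        self.classes = []
        self.parent = None
        self.children = list(children)
        for child in self.children:
            child.parent = self


def apply_pending(pending, class_names):
    """Add class_names to the first visible node following `pending`."""
    child, parent = pending, pending.parent
    while parent is not None:
        position = parent.children.index(child)
        for element in parent.children[position + 1:]:
            if element.invisible:
                continue
            element.classes += class_names
            pending.parent.children.remove(pending)
            return element
        # No eligible sibling at this level: continue one level up.
        child, parent = parent, parent.parent
    return None  # Docutils reports an error here instead


# The pending node is the last child of section-1, so the search
# has to climb to the document level to find section-2.
pending = Node("pending", invisible=True)
first = Node("section-1", [Node("paragraph"), pending])
second = Node("section-2", [Node("paragraph")])
root = Node("document", [first, second])

target = apply_pending(pending, ["internal"])
print(target.name, target.classes)
```

This mirrors the doc.rst example, where a class directive written at the end of one section ends up marking the section that follows.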