Module documentation

Parser for epytext strings. Epytext is a lightweight markup whose primary intended application is Python documentation strings. This parser converts Epytext strings to a simple DOM-like representation (encoded as a tree of Element objects and strings). Epytext strings can contain the following structural blocks:

  • epytext: The top-level element of the DOM tree.
  • para: A paragraph of text. Paragraphs contain no newlines, and all spaces are soft.
  • section: A section or subsection.
  • field: A tagged field. These fields provide information about specific aspects of a Python object, such as the description of a function's parameter, or the author of a module.
  • literalblock: A block of literal text. This text should be displayed verbatim, as plaintext. The parser removes the appropriate amount of leading whitespace from each line in the literal block.
  • doctestblock: A block containing sample Python code, formatted according to the specifications of the doctest module.
  • ulist: An unordered list.
  • olist: An ordered list.
  • li: A list item. This tag is used both for unordered list items and for ordered list items.

Additionally, the following inline regions may be used within para blocks (a combined example follows this list):

  • code: Source code and identifiers.
  • math: Mathematical expressions.
  • index: A term which should be included in an index, if one is generated.
  • italic: Italicized text.
  • bold: Bold-faced text.
  • uri: A Uniform Resource Identifier (URI) or Uniform Resource Locator (URL).
  • link: A Python identifier which should be hyperlinked to the named object's documentation, when possible.
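
For illustration, here is a short epytext string that exercises several of these blocks and inline regions (a hedged sketch, not taken from the module's own documentation):

    Introduction
    ============
    This paragraph contains I{italic}, B{bold}, and C{inline code}
    regions, plus a link to L{parse}.

      1. An ordered list item.
         - A nested unordered list item.

    A paragraph that ends with a double colon introduces a literal
    block::

        This   text    is preserved exactly as written.

    >>> print(1 + 1)
    2

    @param x: An example field describing a parameter.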

The returned DOM tree will conform to the following Document Type Definition:

   <!ENTITY % colorized '(code | math | index | italic |
                          bold | uri | link | symbol)*'>

   <!ELEMENT epytext ((para | literalblock | doctestblock |
                      section | ulist | olist)*, fieldlist?)>

   <!ELEMENT para (#PCDATA | %colorized;)*>

   <!ELEMENT section (para | literalblock | doctestblock |
                      section | ulist | olist)+>

   <!ELEMENT fieldlist (field+)>
   <!ELEMENT field (tag, arg?, (para | literalblock | doctestblock |
                                ulist | olist)+)>
   <!ELEMENT tag (#PCDATA)>
   <!ELEMENT arg (#PCDATA)>

   <!ELEMENT literalblock (#PCDATA | %colorized;)*>
   <!ELEMENT doctestblock (#PCDATA)>

   <!ELEMENT ulist (li+)>
   <!ELEMENT olist (li+)>
   <!ELEMENT li (para | literalblock | doctestblock | ulist | olist)+>
   <!ATTLIST li bullet NMTOKEN #IMPLIED>
   <!ATTLIST olist start NMTOKEN #IMPLIED>

   <!ELEMENT uri     (name, target)>
   <!ELEMENT link    (name, target)>
   <!ELEMENT name    (#PCDATA | %colorized;)*>
   <!ELEMENT target  (#PCDATA)>

   <!ELEMENT code    (#PCDATA | %colorized;)*>
   <!ELEMENT math    (#PCDATA | %colorized;)*>
   <!ELEMENT italic  (#PCDATA | %colorized;)*>
   <!ELEMENT bold    (#PCDATA | %colorized;)*>
   <!ELEMENT indexed (#PCDATA | %colorized;)*>
   <!ATTLIST code style CDATA #IMPLIED>

   <!ELEMENT symbol (#PCDATA)>
   <!ELEMENT wbr EMPTY>
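
As a usage sketch (a minimal example; the import path is an assumption, and Element's tag/children attributes follow the class summary below):

    # Minimal sketch: parse a small epytext string and walk the tree.
    # The import path is an assumption about where this module lives.
    from pydoctor.epydoc.markup.epytext import parse

    errors = []
    tree = parse("First paragraph.\n\n  - A list item.\n", errors)

    def walk(node, depth=0):
        # Leaves are plain strings; interior nodes are Element objects.
        if isinstance(node, str):
            print("  " * depth + repr(node))
        else:
            print("  " * depth + node.tag)
            for child in node.children:
                walk(child, depth + 1)

    if tree is not None:   # parse() returns None on non-fatal errors
        walk(tree)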
Class Element A very simple DOM-like representation for parsed epytext documents. Each epytext document is encoded as a tree whose nodes are Element objects, and whose leaves are strings. Each node is marked by a tag and zero or more attributes.
Class ParsedEpytextDocstring Undocumented
Class Token Tokens are an intermediate data structure used while constructing the structuring DOM tree for a formatted docstring. There are five types of Token.
Exception ColorizingError An error generated while colorizing a paragraph.
Exception StructuringError An error generated while structuring a formatted documentation string.
Exception TokenizationError An error generated while tokenizing a formatted documentation string.
Function get_parser Get the parse_docstring function.
Function gettext Return the text inside the epytext element(s).
Function parse Return a DOM tree encoding the contents of an epytext string. Any errors generated during parsing will be stored in errors.
Function parse_docstring Parse the given docstring, which is formatted using epytext; and return a ParsedDocstring representation of its contents.
Function slugify A generic slugifier utility (currently only for Latin-based scripts). Example:
Constant SYMBOLS A list of the escape symbols that are supported by epydoc. Currently the following symbols are supported:
Variable __doc__ Undocumented
Variable symblist Undocumented
Function _add_list Add a new list item or field to the DOM tree, with the given bullet or field tag. When necessary, create the associated list.
Function _add_para Colorize the given paragraph, and add it to the DOM tree.
Function _add_section Add a new section to the DOM tree, with the given heading.
Function _colorize Given a string containing the contents of a paragraph, produce a DOM Element encoding that paragraph. Colorized regions are represented using DOM Elements, and text is represented using DOM Texts.
Function _colorize_link Undocumented
Function _pop_completed_blocks Pop any completed blocks off the stack. This includes any blocks that we have dedented past, as well as any list item blocks that we've dedented to. The top element on the stack should only be a list if we're about to start a new list item (i.e., if the next token is a bullet).
Function _tokenize Split a given formatted docstring into an ordered list of Tokens, according to the epytext markup rules.
Function _tokenize_doctest Construct a Token containing the doctest block starting at lines[start], and append it to tokens. block_indent should be the indentation of the doctest block. Any errors generated while tokenizing the doctest block will be appended to errors.
Function _tokenize_listart Construct Tokens for the bullet and the first paragraph of the list item (or field) starting at lines[start], and append them to tokens. bullet_indent should be the indentation of the list item. Any errors generated while tokenizing will be appended to errors.
Function _tokenize_literal Construct a Token containing the literal block starting at lines[start], and append it to tokens. block_indent should be the indentation of the literal block. Any errors generated while tokenizing the literal block will be appended to errors.
Function _tokenize_para Construct a Token containing the paragraph starting at lines[start], and append it to tokens. para_indent should be the indentation of the paragraph. Any errors generated while tokenizing the paragraph will be appended to errors.
Constant _BRACE_RE Undocumented
Constant _BULLET_RE Undocumented
Constant _COLORIZING_TAGS Undocumented
Constant _ESCAPES Undocumented
Constant _FIELD_BULLET Undocumented
Constant _FIELD_BULLET_RE Undocumented
Constant _HEADING_CHARS Undocumented
Constant _LINK_COLORIZING_TAGS Undocumented
Constant _LIST_BULLET_RE Undocumented
Constant _OLIST_BULLET Undocumented
Constant _SYMBOLS Undocumented
Constant _TARGET_RE Undocumented
Constant _ULIST_BULLET Undocumented

Get the parse_docstring function.

def gettext(node: str | Element | list[str | Element]) -> list[str]: (source)

Return the text inside the epytext element(s).
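
A hedged usage sketch (the output shown is illustrative; exact whitespace depends on the parser):

    errors = []
    tree = parse("Some B{bold} text.", errors)   # parse() from this module
    if tree is not None:
        print(gettext(tree))   # e.g. ['Some ', 'bold', ' text.']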

def parse(text: str, errors: list[ParseError]) -> Element | None: (source)

Return a DOM tree encoding the contents of an epytext string. Any errors generated during parsing will be stored in errors.

Parameters
    text (str): The epytext string to parse.
    errors (list[ParseError]): A list where any errors generated during parsing will be stored. If no list is specified, then fatal errors will generate exceptions, and non-fatal errors will be ignored.
Returns
    Element | None: A DOM tree encoding the contents of an epytext string, or None if non-fatal errors were encountered and no errors accumulator was provided.
Raises
    ParseError: If errors is None and an error is encountered while parsing.
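
For example, a sketch of the accumulator behavior (whether this particular input produces an error, and the ParseError import path, are assumptions):

    from pydoctor.epydoc.markup import ParseError   # import path is an assumption

    errors: list[ParseError] = []
    tree = parse("An unbalanced {brace.", errors)
    for err in errors:
        print(err)   # each accumulated ParseError describes one problem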
def parse_docstring(docstring: str, errors: list[ParseError]) -> ParsedDocstring: (source)

Parse the given docstring, which is formatted using epytext; and return a ParsedDocstring representation of its contents.

Parameters
    docstring (str): The docstring to parse.
    errors (list[ParseError]): A list where any errors generated during parsing will be stored.
Returns
    ParsedDocstring: Undocumented
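
A brief usage sketch (ParsedEpytextDocstring, listed above, is the expected concrete type):

    errors = []
    parsed = parse_docstring("Summary.\n\n@param x: A value.\n", errors)
    if not errors:
        print(type(parsed).__name__)   # e.g. ParsedEpytextDocstring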
def slugify(string: str) -> str: (source)

A generic slugifier utility (currently only for Latin-based scripts). Example:

>>> slugify("Héllo Wörld")
"hello-world"

A list of the escape symbols that are supported by epydoc. Currently the following symbols are supported:

    # Arrows
    '<-', '->', '^', 'v',

    # Greek letters
    'alpha', 'beta', 'gamma', 'delta', 'epsilon', 'zeta',
    'eta', 'theta', 'iota', 'kappa', 'lambda', 'mu',
    'nu', 'xi', 'omicron', 'pi', 'rho', 'sigma',
    'tau', 'upsilon', 'phi', 'chi', 'psi', 'omega',
    'Alpha', 'Beta', 'Gamma', 'Delta', 'Epsilon', 'Zeta',
    'Eta', 'Theta', 'Iota', 'Kappa', 'Lambda', 'Mu',
    'Nu', 'Xi', 'Omicron', 'Pi', 'Rho', 'Sigma',
    'Tau', 'Upsilon', 'Phi', 'Chi', 'Psi', 'Omega',

    # HTML character entities
    'larr', 'rarr', 'uarr', 'darr', 'harr', 'crarr',
    'lArr', 'rArr', 'uArr', 'dArr', 'hArr',
    'copy', 'times', 'forall', 'exist', 'part',
    'empty', 'isin', 'notin', 'ni', 'prod', 'sum',
    'prop', 'infin', 'ang', 'and', 'or', 'cap', 'cup',
    'int', 'there4', 'sim', 'cong', 'asymp', 'ne',
    'equiv', 'le', 'ge', 'sub', 'sup', 'nsub',
    'sube', 'supe', 'oplus', 'otimes', 'perp',

    # Alternate (long) names
    'infinity', 'integral', 'product',
    '>=', '<=',
Value
['<-',
 '->',
 '^',
 'v',
 'alpha',
 'beta',
 'gamma',
...
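
In epytext markup these symbols are written with the S{...} inline tag; a hedged illustration:

    An arrow S{->}, the Greek letter S{alpha}, and the symbol S{infin}.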

__doc__ = (source)

Undocumented

symblist = (source)

Undocumented

def _add_list(bullet_token: Token, stack: list[Element], indent_stack: list[int | None], errors: list[ParseError]): (source)

Add a new list item or field to the DOM tree, with the given bullet or field tag. When necessary, create the associated list.

def _add_para(para_token: Token, stack: list[Element], indent_stack: list[int | None], errors: list[ParseError]): (source)

Colorize the given paragraph, and add it to the DOM tree.

def _add_section(heading_token: Token, stack: list[Element], indent_stack: list[int | None], errors: list[ParseError]): (source)

Add a new section to the DOM tree, with the given heading.

def _colorize(token: Token, errors: list[ParseError], tagName: str = 'para') -> Element: (source)

Given a string containing the contents of a paragraph, produce a DOM Element encoding that paragraph. Colorized regions are represented using DOM Elements, and text is represented using DOM Texts.

Parameters
    token (Token): Undocumented
    errors (list[ParseError]): A list of errors. Any newly generated errors will be appended to this list.
    tagName (str): The element tag for the DOM Element that should be generated.
Returns
    Element: A DOM Element encoding the given paragraph.
def _colorize_link(link: Element, token: Token, end: int, errors: list[ParseError]): (source)

Undocumented

def _pop_completed_blocks(token: Token, stack: list[Element], indent_stack: list[int | None]): (source)

Pop any completed blocks off the stack. This includes any blocks that we have dedented past, as well as any list item blocks that we've dedented to. The top element on the stack should only be a list if we're about to start a new list item (i.e., if the next token is a bullet).

def _tokenize(text: str, errors: list[ParseError]) -> list[Token]: (source)

Split a given formatted docstring into an ordered list of Tokens, according to the epytext markup rules.

Parameters
    text (str): The epytext string.
    errors (list[ParseError]): A list where any errors generated during parsing will be stored. If no list is specified, then errors will generate exceptions.
Returns
    list[Token]: A list of the Tokens that make up the given string.
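
A hedged sketch of calling this internal helper directly (Token's printed form is an assumption):

    errors = []
    tokens = _tokenize("A paragraph.\n\n  - A list item.\n", errors)
    for tok in tokens:
        print(tok)   # one Token per paragraph, bullet, heading, etc.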
def _tokenize_doctest(lines: list[str], start: int, block_indent: int, tokens: list[Token], errors: list[ParseError]) -> int: (source)

Construct a Token containing the doctest block starting at lines[start], and append it to tokens. block_indent should be the indentation of the doctest block. Any errors generated while tokenizing the doctest block will be appended to errors.

Parameters
    lines (list[str]): The list of lines to be tokenized.
    start (int): The index into lines of the first line of the doctest block to be tokenized.
    block_indent (int): The indentation of lines[start]. This is the indentation of the doctest block.
    tokens (list[Token]): Undocumented
    errors (list[ParseError]): A list where any errors generated during parsing will be stored. If no list is specified, then errors will generate exceptions.
Returns
    int: The line number of the first line following the doctest block.
def _tokenize_listart(lines: list[str], start: int, bullet_indent: int, tokens: list[Token], errors: list[ParseError]) -> int: (source)

Construct Tokens for the bullet and the first paragraph of the list item (or field) starting at lines[start], and append them to tokens. bullet_indent should be the indentation of the list item. Any errors generated while tokenizing will be appended to errors.

Parameters
    lines (list[str]): The list of lines to be tokenized.
    start (int): The index into lines of the first line of the list item to be tokenized.
    bullet_indent (int): The indentation of lines[start]. This is the indentation of the list item.
    tokens (list[Token]): Undocumented
    errors (list[ParseError]): A list of the errors generated by parsing. Any new errors generated while tokenizing this list item will be appended to this list.
Returns
    int: The line number of the first line following the list item's first paragraph.
def _tokenize_literal(lines: list[str], start: int, block_indent: int, tokens: list[Token], errors: list[ParseError]) -> int: (source)

Construct a Token containing the literal block starting at lines[start], and append it to tokens. block_indent should be the indentation of the literal block. Any errors generated while tokenizing the literal block will be appended to errors.

Parameters
    lines (list[str]): The list of lines to be tokenized.
    start (int): The index into lines of the first line of the literal block to be tokenized.
    block_indent (int): The indentation of lines[start]. This is the indentation of the literal block.
    tokens (list[Token]): Undocumented
    errors (list[ParseError]): A list of the errors generated by parsing. Any new errors generated while tokenizing this literal block will be appended to this list.
Returns
    int: The line number of the first line following the literal block.
def _tokenize_para(lines: list[str], start: int, para_indent: int, tokens: list[Token], errors: list[ParseError]) -> int: (source)

Construct a Token containing the paragraph starting at lines[start], and append it to tokens. para_indent should be the indentation of the paragraph. Any errors generated while tokenizing the paragraph will be appended to errors.

Parameters
    lines (list[str]): The list of lines to be tokenized.
    start (int): The index into lines of the first line of the paragraph to be tokenized.
    para_indent (int): The indentation of lines[start]. This is the indentation of the paragraph.
    tokens (list[Token]): Undocumented
    errors (list[ParseError]): A list of the errors generated by parsing. Any new errors generated while tokenizing this paragraph will be appended to this list.
Returns
    int: The line number of the first line following the paragraph.
_BRACE_RE = (source)

Undocumented

Value
re.compile(r'[\{\}]')
_BULLET_RE = (source)

Undocumented

Value
re.compile((_ULIST_BULLET + '|' + _OLIST_BULLET + '|' + _FIELD_BULLET))
_COLORIZING_TAGS: dict[str, str] = (source)

Undocumented

Value
{'C': 'code',
 'M': 'math',
 'I': 'italic',
 'B': 'bold',
 'U': 'uri',
 'L': 'link',
 'E': 'escape',
...
_ESCAPES: dict[str, str] = (source)

Undocumented

Value
{'lb': '{', 'rb': '}'}
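
These escapes back the E{...} colorizing tag: because braces delimit epytext markup, E{lb} and E{rb} are how a docstring spells literal braces. For example:

    In epytext source:  a dict literal: E{lb}'key': 1E{rb}
    Rendered text:      a dict literal: {'key': 1}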
_FIELD_BULLET: str = (source)

Undocumented

Value
'@\\w+( [^{}:\\n]+)?:'
_FIELD_BULLET_RE = (source)

Undocumented

Value
re.compile(_FIELD_BULLET)
_HEADING_CHARS: str = (source)

Undocumented

Value
'=-~'
_LINK_COLORIZING_TAGS: list[str] = (source)

Undocumented

Value
['link', 'uri']
_LIST_BULLET_RE = (source)

Undocumented

Value
re.compile((_ULIST_BULLET + '|' + _OLIST_BULLET))
_OLIST_BULLET: str = (source)

Undocumented

Value
'(\\d+[.])+( +|$)'
_SYMBOLS = (source)

Undocumented

Value
set(SYMBOLS)
_TARGET_RE = (source)

Undocumented

Value
re.compile(r'^(.*?)\s*<(?:URI:|L:)?([^<>]+)>$')
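
A quick check of what this pattern captures, recreating it here for illustration:

    import re
    _TARGET_RE = re.compile(r'^(.*?)\s*<(?:URI:|L:)?([^<>]+)>$')
    m = _TARGET_RE.match('Python <http://www.python.org/>')
    if m:
        print(m.group(1))   # 'Python'  (the link name)
        print(m.group(2))   # 'http://www.python.org/'  (the target)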
_ULIST_BULLET: str = (source)

Undocumented

Value
'[-]( +|$)'
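
To illustrate how the bullet patterns in this section combine (recreated from the values shown above):

    import re

    _ULIST_BULLET = '[-]( +|$)'
    _OLIST_BULLET = '(\\d+[.])+( +|$)'
    _FIELD_BULLET = '@\\w+( [^{}:\\n]+)?:'
    _BULLET_RE = re.compile(_ULIST_BULLET + '|' + _OLIST_BULLET + '|' + _FIELD_BULLET)

    for line in ('- unordered item',
                 '1. ordered item',
                 '1.2. nested ordered item',
                 '@param x: a field',
                 'plain paragraph'):
        print(line, '->', bool(_BULLET_RE.match(line)))
    # The first four match; the plain paragraph does not.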