Documentation with Verso

2. Verso Markup🔗

Lean's documentation markup language is a close relative of Markdown, but it's not identical to it.

2.1. Design Principles🔗

Syntax errors

Fail fast rather than producing unexpected output or having complicated rules.

Reduce lookahead

Parsing should succeed or fail as locally as possible.

Extensibility

There should be dedicated features for compositionally adding new kinds of content, rather than relying on a collection of ad-hoc textual subformats.

Assume Unicode

Lean users are used to entering Unicode directly and have good tools for it, so there's no need to support alternative textual syntaxes for characters not on keyboards such as em dashes or typographical quotes.

Markdown compatibility

Users benefit from existing muscle memory and familiarity when it doesn't lead to violations of the other principles.

2.2. Syntax🔗

Like Markdown, Lean's markup has three primary syntactic categories:

Document structure

Headers, footnote definitions, and named links give greater structure to a document. They may not be nested inside of blocks.

Block elements

The main organization of written text, including paragraphs, lists, and quotations. Some blocks may be nested: for example, lists may contain other lists.

Inline elements

The ordinary content of written text, such as text itself, bold or emphasized text, and hyperlinks.

2.2.1. Document Structure🔗

Documents are organized into parts. A part is a logical division of the content of the document, such as a chapter, section, subsection, or volume. Parts may have associated metadata, such as authors, publication dates, internal identifiers for cross-referencing, or desired URLs; the specific metadata associated with a part is determined by the document's genre.

A part contains a sequence of blocks followed by a sequence of sub-parts, either of which may be empty.

A part is introduced with a header. Headers consist of one or more hash marks (#) at the beginning of a line followed by a sequence of inlines. The number of hash marks indicates the nesting of a header, and headers with more hash marks indicate parts at a lower level. A lower-level header introduces a sub-part of the preceding header, while a header at a level at least as high as the preceding header terminates that part. In other words, header levels induce a tree structure on the document.

Headers must be well-nested: the first header in a document must have exactly one hash mark, and all subsequent headers may have at most one more hash mark than the preceding header. This is a syntactic requirement: regardless of whether a Verso file contains the text of a whole book, a chapter, or just a single section, its outermost header must be introduced with a single hash mark. Headers indicate the logical nesting structure of the document, rather than the headers to be chosen when rendering the document to output formats such as HTML.

Metadata may be associated with a header by following it with a metadata block. Metadata blocks begin and end with %%%, and they contain any syntax that would be acceptable in a Lean structure initializer.

Header
Verso Markup
# Top-Level Header
Result
<h1>Top-Level Header</h1>
Blah
Verso Markup
a b c
Result
<p> a b c </p>

2.2.2. Block Syntax🔗

Paragraphs are undecorated: any sequence of inlines that is not another block is a paragraph. Paragraphs continue until a blank line (that is, a line containing only whitespace) or another block begins:

Paragraphs
Verso Markup
This is one paragraph.
Even though this sentence is on a
new line, the paragraph continues.

This is a new paragraph.
* This list stopped the paragraph.
* As in Markdown and SGML, lists
  are not part of paragraphs.
Result
<p>
  This is one paragraph. Even
  though this sentence is on a new
  line, the paragraph continues.
</p>

<p> This is a new paragraph. </p>

<ul>
  <li>
    <p>
      This list stopped the
      paragraph.
    </p>
  </li>
  <li>
    <p>
      As in Markdown and SGML,
      lists   are not part of
      paragraphs.
    </p>
  </li>
</ul>

2.2.2.1. Lists🔗

There are three kinds of lists:

Unordered lists

Unordered lists indicate that the order of the items in the list is not of primary importance. They correspond to <ul> in HTML or \begin{itemize} in LaTeX.

Ordered lists

Unordered lists indicate that the order of the items in the list is important. They correspond to <ol> in HTML or \begin{enumerate} in LaTeX.

Description lists

Description lists associate a term with more information. This very list is a description list. They correspond to <dl> in HTML or \begin{description} in LaTeX.

A line that begins with zero or more spaces followed by a *, -, or + starts an item of an unordered list. This character is called the list item indicator. Subsequent items are part of the same list if they use the same indicator character and have the same indentation. Any subsequent blocks whose first character is indented further than the indicator are part of the item itself; items may contain multiple blocks, or even other lists.

Unordered Lists
Verso Markup
* A list with two items
* They both start in column 0
Result
<ul>
  <li>
    <p> A list with two items </p>
  </li>
  <li>
    <p>
      They both start in column 0
    </p>
  </li>
</ul>
Indented Unordered Lists
Verso Markup
 * A list with two items
 * They both start in column 1
Result
<ul>
  <li>
    <p> A list with two items </p>
  </li>
  <li>
    <p>
      They both start in column 1
    </p>
  </li>
</ul>
List Indicators
Verso Markup
 * Two lists
 + They have different indicators.
Result
<ul>
  <li> <p> Two lists </p> </li>
</ul>

<ul>
  <li>
    <p>
      They have different
      indicators.
    </p>
  </li>
</ul>
List Indentation Sensitivity
Verso Markup
 * A list with one element
* Another list, due to different
  indentation
Result
<ul>
  <li>
    <p>
      A list with one element
    </p>
  </li>
</ul>

<ul>
  <li>
    <p>
      Another list, due to
      different   indentation
    </p>
  </li>
</ul>
Sub-Lists
Verso Markup
* A list with one item.
  It contains this paragraph

  * And this other sub-list

  And this paragraph
Result
<ul>
  <li>
    <p>
      A list with one item.   It
      contains this paragraph
    </p><ul>
      <li>
        <p>
          And this other sub-list
        </p>
      </li>
    </ul><p>
      And this paragraph
    </p>
  </li>
</ul>

A line that starts with zero or more spaces, followed by one or more digits and then either a period (e.g. 1.) or a closing parenthesis (1)), begins an item of an ordered list. Like unordered lists, ordered lists are sensitive to both indentation and indicator characters.

Ordered Lists
Verso Markup
1. First, write a number.
2. Then, an indicator (`.` or `)`)
Result
<ol>
  <li>
    <p> First, write a number. </p>
  </li>
  <li>
    <p>
      Then, an indicator
      (<code>"."</code> or
      <code>")"</code>)
    </p>
  </li>
</ol>
Ordered Lists and Indicators
Verso Markup
1. Two lists
2) They have different indicators.
Result
<ol>
  <li> <p> Two lists </p> </li>
</ol>

<ol>
  <li>
    <p>
      They have different
      indicators.
    </p>
  </li>
</ol>

Description lists consist of a sequence of description items that have the same indentation. A description item is a line that starts with zero or more spaces, followed by a colon, followed by a sequence of inlines and a blank line. After the blank line, there must be one or more blocks with indentation greater than the colon.

Description Lists
Verso Markup
: Item 1

  Description of item 1

: Item 2

  Description of item 2
Result
<dl>
    <dt>  Item 1 </dt>
  <dd>
    <p> Description of item 1 </p>
  </dd>
  
    <dt>  Item 2 </dt>
  <dd>
    <p> Description of item 2 </p>
  </dd>
   </dl>

2.2.2.2. Quotes🔗

Quotation blocks show that a sequence of blocks were spoken by someone other than the author. They consist of a line that starts with zero or more spaces, followed by a greater-than sign (>), followed by a sequence of blocks that are indented more than the greater-than sign.

Quotations
Verso Markup
It is said that:
> Quotations are excellent.

  This paragraph is part of the
  quotation.

 So is this one.

But not this one.
Result
<p> It is said that: </p>

<blockquote>
  <p>
    Quotations are excellent.
  </p>
  <p>
    This paragraph is part of the  
    quotation.
  </p>
  <p> So is this one. </p>
</blockquote>

<p> But not this one. </p>

2.2.2.3. Code Blocks🔗

Code blocks begin with three or more back-ticks at the start of a line (they may be preceded by spaces). This is referred to as a fence. They may optionally have a name and a sequence of arguments after the back-ticks. The code block continues until a line that contains only the same number of back-ticks at the same indentation. Every line in the code block must be at least as indented as the fence, and it denotes the string that results from removing the indentation from each line. Code blocks may not contain a sequence of back-ticks that is at least as long as the fence; to add more back-ticks to the code block, the fence must be made larger.

Code Blocks
Verso Markup
```
This is a code block
with two lines
```
Result
<codeblock>
  1|This is a code block⏎
  2|with two lines⏎
</codeblock>
Code Blocks as Extensions
Verso Markup
```lean
-- This code block is named
-- `lean`. It was called with
-- no arguments.
def x := 5
```
Result
<lean >
  1|-- This code block is named⏎
  2|-- `lean`. It was called with⏎
  3|-- no arguments.⏎
  4|def x := 5⏎
</lean>
Indented Code Blocks
Verso Markup
  ```lean
  -- This indented code block
  -- denotes the de-indented
  -- string
  def x := 5
  ```
Result
<lean >
  1|-- This indented code block⏎
  2|-- denotes the de-indented⏎
  3|-- string⏎
  4|def x := 5⏎
</lean>
Code Blocks and Arguments
Verso Markup
```lean (error := true)
-- This code block is named
-- `lean`, called with the
-- named argument `error`
-- set to `true`.
def x : String := 5
```
Result
<lean  error="true">
  1|-- This code block is named⏎
  2|-- `lean`, called with the⏎
  3|-- named argument `error`⏎
  4|-- set to `true`.⏎
  5|def x : String := 5⏎
</lean>

When a code block has a name, then the name is resolved in the current Lean namespace and used to select an implementation. The chapter on Verso markup extensions has more details on this process.

2.2.2.4. Directives🔗

A directive is a kind of block with no meaning other than that assigned by extensions, akin to a custom environment in LaTeX or a <div> tag in HTML. Directives begin with three or more colons and a name with zero or more arguments, and end with the same number of colons at the same indentation on a line by themselves. They may contain any number of blocks, which must be indented at least as much as the colons. Nested directives must begin and end with strictly fewer colons than the surrounding directives.

The chapter on Verso markup extensions describes the processing of directives in more detail.

This is an empty directive:

Directives
Verso Markup
:::nothing
:::
Result
<nothing >  </nothing>

This directive contains a directive and a paragraph:

Nested Directives
Verso Markup
::::outer
:::inner
Hello
:::
This is a paragraph
::::
Result
<outer >
  <inner > <p> Hello </p> </inner>
  <p> This is a paragraph </p>
</outer>

2.2.2.5. Commands🔗

A line that consists of only a set of curly braces that contain a name and zero or more arguments is a command. The name is used to select an implementation for the command, which is then invoked during elaboration. The chapter on Verso markup extensions has more details on this process.

Commands
Verso Markup
{include 0 MyChapter}
Result
<include  0 MyChapter/>

2.2.3. Inline Syntax🔗

Emphasis is written with underscores:

Emphasis
Verso Markup
Here's some _emphasized_ text
Result
<p>
  Here's some
  <emph> emphasized </emph> text
</p>

Emphasis can be nested by using more underscores for the outer emphasis:

Nested Emphasis
Verso Markup
Here's some __emphasized text
with _extra_ emphasis__ inside
Result
<p>
  Here's some
  <emph>
    emphasized text with
    <emph> extra </emph> emphasis
  </emph>
  inside
</p>

Strong emphasis (bold) is written with asterisks:

Strong Emphasis (Bold)
Verso Markup
Here's some *bold* text
Result
<p>
  Here's some <bold>bold</bold>
  text
</p>

Hyperlinks consist of the link text in square brackets followed by the target in parentheses:

Links
Verso Markup
[Lean](https://lean-lang.org)
Result
<p>
  <a href="https://lean-lang.org">
    Lean
  </a>
</p>

Link targets may also be named:

Named Link Targets
Verso Markup
The [Lean website][lean]

[lean]: https://lean-lang.org
Result
<p>
  The
  <a href="(value of «lean»)">Lean
  website</a>
</p>

where
  «lean» := https://lean-lang.org

Literal code elements are written in back-ticks. The same number of back-ticks that opens a code element must be used to close it, and the code element may not contain a sequence of back-ticks that's at least as long as its opener and closer.

Literal Code
Verso Markup
The definition of `main`
Result
<p>
  The definition of
  <code>"main"</code>
</p>

As a special case, code inlines that being and end with a space denote the string that results from removing one leading and trailing space. This makes it possible to represent values that begin or end with back-ticks:

Spaces in Code
Verso Markup
`` `quotedName ``
Result
<p> <code>"`quotedName"</code> </p>

or with spaces:

Multiple Spaces in Code
Verso Markup
``  one space  ``
Result
<p> <code>" one space "</code> </p>

Images require both alternative text and an address for the image:

Images
Verso Markup
![Alt text](image.png)
Result
<p>
  <img
    src="image.png"
    alt="Alt text"/>
</p>
Named Image Addresses
Verso Markup
![Alternative text][image]

[image]: image.png
Result
<p>
  <img
    src="value of «image»"
    alt="Alternative text"/>
</p>

where «image» := image.png

TeX math can be included using a single or double dollar sign followed by code. Two dollar signs results in display-mode math, so $`\sum_{i=0}^{10} i` results in \sum_{i=0}^{10} i while $$`\sum_{i=0}^{10} i` results in: \sum_{i=0}^{10} i

A role indicates that subsequent inlines have a special meaning. Roles can be used to trigger special handling of code (e.g. elaboration), register definitions and uses of technical terms, add marginal notes, and much more. They correspond to custom commands in LaTeX. A role consists of curly braces that contain a name and arguments, all on the same line, followed immediately either by a self-delimiting inline or by square brackets that contain zero or more inlines.

An inline is self-delimiting if it has distinct start and end tokens that make its ending position clear. Emphasis, strong emphasis, code, links, images, math, and roles are self-delimiting.

This role contains multiple inlines using square brackets:

Roles With Explicit Scope
Verso Markup
{ref "chapter-tag"}[the chapter on
$`\frac{1}{2}`-powered syntax]
Result
<p>
  <ref "chapter-tag">
    the chapter on  
    <math contents="\\frac{1}{2}"/>
    -powered syntax
  </ref>
</p>

This one takes a single inline code element without needing square brackets:

Single-Argument Roles
Verso Markup
{lean}`2 + f 4`
Result
<p>
  <lean >
    <code>"2 + f 4"</code>
  </lean>
</p>

2.3. Differences from Markdown🔗

This is a quick "cheat sheet" for those who are used to Markdown, documenting the differences.

2.3.1. Syntax Errors🔗

While Markdown includes a set of precedence rules to govern the meaning of mismatched delimiters (such as in what _is *bold_ or emph*?), these are syntax errors in Lean's markup. Similarly, Markdown specifies that unmatched delimiters (such as * or _) should be included as characters, while Lean's markup requires explicit escaping of delimiters.

This is based on the principle that, for long-form technical writing, it's better to catch typos while writing than while reviewing the text later.

2.3.2. Reduced Lookahead🔗

In Markdown, whether [this][here] is a link depends on whether here is defined as a link reference target somewhere in the document. In Lean's markup, it is always a link, and it is an error if here is not defined as a link target.

2.3.3. Header Nesting🔗

In Lean's markup, every document already has a title, so there's no need to use the highest level header (#) to specify one. Additionally, all documents are required to use # for their top-level header, ## for the next level, and so forth, because a single file may represent a section, a chapter, or even a whole book. Authors should not need to maintain a global mapping from header levels to document structures, so Lean's markup automatically assigns these based on the structure of the document.

2.3.4. Genre-Specific Extensions🔗

Markdown has no standard way for specific tools or styles of writing to express domain- or genre-specific concepts. Lean's markup provides standard syntaxes to use for this purpose, enabling compositional extensions.

2.3.5. Fewer Unused Features🔗

Markdown has a number of features that are rarely used in practice. They have been removed from Verso to reduce surprises while using it and make documents more predictable. This includes:

Other Markdown features don't make sense for non-HTML output, and can be implemented by a genre using code blocks or directives. They have also been removed from Verso. In particular, this includes HTML blocks, raw HTML and thematic breaks.

Finally, some Markdown features are used by a minority of authors, and make sense in all backends, but were not deemed worth the complexity budget. In particular, this includes auto-links.