cl-markless

1.0.0

A parser implementation for Markless

About cl-markless

This is an implementation of the Markless standard at version 1.0. It parses plaintext from a stream into an abstract syntax tree (AST) composed of strings and component objects. From there the AST can be compiled into a target markup language like HTML.

How To

To parse a Markless document, simply call the function parse:

(cl-markless:parse "Hello!" T)

This will return the generated AST. From there on you can manipulate, inspect, or compile it further.

(cl-markless:output * :format 'cl-markless:debug)

Note in particular that cl-markless requires LF (Unix-style) line endings. CR (classic Mac OS) or CRLF (Windows) line endings have to be converted ahead of time, or the parse results will not be as expected.
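The conversion can be done in plain Common Lisp before the text reaches the parser. The following is a minimal sketch; normalize-line-endings is a hypothetical helper name and not part of cl-markless.

```lisp
;; Hypothetical helper, not part of cl-markless.
(defun normalize-line-endings (text)
  "Return a copy of TEXT with CRLF and lone CR line endings rewritten to LF."
  (with-output-to-string (out)
    (loop with end = (length text)
          with i = 0
          while (< i end)
          do (let ((ch (char text i)))
               (cond ((char= ch #\Return)
                      ;; Every CR becomes a single LF ...
                      (write-char #\Linefeed out)
                      ;; ... and the LF half of a CRLF pair is skipped.
                      (when (and (< (1+ i) end)
                                 (char= (char text (1+ i)) #\Linefeed))
                        (incf i)))
                     (t (write-char ch out)))
               (incf i)))))
```

The result can then be passed on in the same way as the string in the earlier call, for instance (cl-markless:parse (normalize-line-endings text) T).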

Writing Markless

You can find a lengthy tutorial on the Markless website.

Extending cl-markless

The Markless standard permits extension in a few ways, all of which cl-markless supports. Furthermore, cl-markless allows the seamless addition of output compilers to allow integrating more target languages.

Directives

Adding a new directive involves some work, as parsing the proper syntax can be complicated. Please read the Parser Algorithm section as well. Generally, the parsing behaviour is controlled via six central functions: prefix, begin, invoke, end, consume-prefix, and consume-end.

All of these functions return a cursor — an index into the current line. consume-prefix and consume-end may also return NIL in order to signify a failed match.

Depending on the type of directive and the complexity of its syntax, some or most of these functions require specific methods for your directive. Within those methods, the directive can manipulate the parser state using

Naturally a directive can do whatever it pleases when called, so the above is only an outline of the most useful functions.

Block Directives

Block directives must be a subclass of block-directive and define methods on prefix, begin, consume-prefix, and optionally invoke. By default the invoke on a block directive will call read-block, causing further blocks to be read.

If your directive can only span a single line, you should subclass singular-line-directive instead, for which only methods on prefix, begin, and optionally invoke are necessary. By default read-inline is called for invoke.

Inline Directives

Inline directives must be a subclass of inline-directive and define methods on prefix, begin, consume-end, and optionally invoke. By default the invoke on an inline directive will call read-inline.

If your directive has a constant prefix that is also the same as its ending suffix, you should subclass surrounding-inline-directive instead.

Instructions

Adding a new instruction type requires the following steps:

  1. Add a new component that is a subclass of instruction.

  2. Add a method to parse-instruction that specialises on this new component class, and use it to parse the line into the appropriate instance of your instruction component.

  3. Add a method to evaluate-instruction that specialises on your component class and performs whatever task that your instruction should allow.

Embed Types

A new embed type requires only two steps:

  1. Add a new component that is a subclass of embed.

  2. Add methods to embed-option-allowed-p specialised on your component and each permitted option that simply return T.

Embed Options

Adding new embed options requires a couple more steps:

  1. Add a new component that is a subclass of embed-option.

  2. Add a method to parse-embed-option-type specialised on your new option class, which parses the given option string into a new instance of your class.

  3. Define methods on embed-option-allowed-p that just return T for all embed types that your new option would be appropriate for.
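The verification step described for embed types and embed options is a case of multiple dispatch on the (option, embed) pair. The following self-contained model illustrates the pattern. It defines its own stand-in classes and generic function, so it can be run without cl-markless loaded. The argument order, the NIL default, and the diagram, image, and scale-option classes are assumptions made for illustration.

```lisp
;; Stand-ins that mirror the names in the text; not the cl-markless definitions.
(defclass embed () ())
(defclass embed-option () ())

(defgeneric embed-option-allowed-p (option embed)
  (:documentation "True if OPTION may be attached to EMBED.")
  ;; Assumed default: any combination not listed explicitly is rejected.
  (:method ((option embed-option) (embed embed)) nil))

;; Hypothetical extension: two embed types and one new option.
(defclass diagram (embed) ())
(defclass image (embed) ())
(defclass scale-option (embed-option) ())

;; Permit the new option on diagrams only.
(defmethod embed-option-allowed-p ((option scale-option) (embed diagram)) t)
```

With these definitions a scale-option is accepted for a diagram and rejected for an image, since only the former pair has a method that returns T.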

Compound Options

Adding new compound options requires a procedure similar to that for embed options:

  1. Add a new component that is a subclass of compound-option.

  2. Add a method to parse-compound-option-type specialised on your new option class, which parses the given option string into a new instance of your class.

Unlike embed options, there is no verification step, since any combination of compound options is allowed.

Colour and Size Names

cl-markless includes a number of colour and size names for use in compound options out of the box. To add or change names, modify *color-table* and *size-table*.

Output Translators

Defining a new output translator is only a matter of defining a subclass of output-translator and adding the appropriate methods to output-component. For the common case of writing to a stream, define-output makes this a bit shorter. Since each format has very different constraints on what its output should look like, nothing beyond these two functions is offered.

Parser Algorithm

The parser operates via a set of directives and a stack. Each entry in the stack holds a component and a directive. At the beginning of a parse, the stack is emptied and its first item is filled with fresh root-directive and root-component instances. The parser then operates as follows:

  1. If the stream has things to read, it reads a new line via read-full-line.

    1. If it does not have anything new to read, the parse is completed:

      1. The stack is unwound to 0, causing all directives still active to be ended.

      2. The root-component is returned.

  2. process-stack is called on the parser, its stack, and the new line.

    1. The stack is traversed upwards, calling consume-prefix on each directive in turn and updating the cursor.

      1. If consume-prefix returns NIL for a directive:

        1. The stack is unwound to and including the current point, calling end on each directive that is popped off the stack.

    2. invoke is called on the directive at the top of the stack and the cursor is updated.

    3. If the cursor is not yet at the end of the line, go back to 2.2.

  3. Go back to 1.
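To make the control flow concrete, the loop above can be modelled in a few dozen lines. This is a toy sketch and not the cl-markless source. It reads from a list of lines instead of a stream, and it knows only three made-up directives: a root, a "| " blockquote, and a paragraph. Directives are keywords, components are plain structs, and all helper names and lambda lists are chosen for illustration.

```lisp
;;; Toy model of the parser loop; none of this is cl-markless source code.
(defstruct node kind (children '()))  ; stand-in for a component
(defstruct entry directive node)      ; one stack item

(defvar *stack* (make-array 8 :adjustable t :fill-pointer 0))

(defun stack-top () (aref *stack* (1- (length *stack*))))

(defun stack-push (directive kind)
  "Create a node, attach it to the node on top of the stack, and push both."
  (let ((node (make-node :kind kind)))
    (when (plusp (length *stack*))
      (push node (node-children (entry-node (stack-top)))))
    (vector-push-extend (make-entry :directive directive :node node) *stack*)
    node))

(defun unwind-to (size)
  "Pop entries until SIZE remain. Ending a directive here only restores the
document order of its node's children, which were collected in reverse."
  (loop while (> (length *stack*) size)
        do (let ((node (entry-node (vector-pop *stack*))))
             (setf (node-children node) (reverse (node-children node))))))

(defun quote-prefix-p (line cursor)
  (and (<= (+ cursor 2) (length line))
       (string= "| " line :start2 cursor :end2 (+ cursor 2))))

(defun read-block (line cursor)
  "Start a blockquote or a paragraph at CURSOR; do nothing on a blank rest."
  (cond ((= cursor (length line)) cursor)
        ((quote-prefix-p line cursor) (stack-push :quote :blockquote) (+ cursor 2))
        (t (stack-push :paragraph :paragraph) cursor)))

(defgeneric consume-prefix (directive line cursor)
  (:documentation "Return the cursor after DIRECTIVE's prefix, or NIL.")
  (:method ((directive (eql :root)) line cursor)
    (declare (ignore line))
    cursor)
  (:method ((directive (eql :quote)) line cursor)
    (when (quote-prefix-p line cursor) (+ cursor 2)))
  (:method ((directive (eql :paragraph)) line cursor)
    ;; A blank line or a quote prefix ends the paragraph.
    (unless (or (= cursor (length line)) (quote-prefix-p line cursor))
      cursor)))

(defgeneric invoke (directive line cursor)
  (:documentation "Consume input at CURSOR and return the new cursor.")
  (:method ((directive (eql :root)) line cursor) (read-block line cursor))
  (:method ((directive (eql :quote)) line cursor) (read-block line cursor))
  (:method ((directive (eql :paragraph)) line cursor)
    (push (subseq line cursor) (node-children (entry-node (stack-top))))
    (length line)))

(defun process-stack (line)
  (let ((cursor 0))
    ;; Step 2.1: walk from the bottom of the stack to the top.
    (loop for i from 0
          while (< i (length *stack*))
          do (let ((next (consume-prefix (entry-directive (aref *stack* i))
                                         line cursor)))
               (cond (next (setf cursor next))
                     ;; Failed match: unwind to and including this entry.
                     (t (unwind-to i) (return)))))
    ;; Steps 2.2 and 2.3: invoke the top directive until the line is consumed.
    (loop do (setf cursor (invoke (entry-directive (stack-top)) line cursor))
          until (>= cursor (length line)))))

(defun parse-lines (lines)
  "Steps 1 and 3: feed LINES one by one, then unwind and return the root node."
  (setf (fill-pointer *stack*) 0)
  (let ((root (stack-push :root :root)))
    (dolist (line lines)
      (process-stack line))
    (unwind-to 0)
    root))
```

Calling (parse-lines '("| a" "| b" "" "c")) yields a root node whose children are a blockquote, which holds one paragraph with the strings "a" and "b", followed by a paragraph with the string "c". The blank line makes consume-prefix fail for the blockquote, which unwinds both the blockquote and its paragraph before the root is invoked again.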

You may note that in this algorithm it is not typical for the cursor to move backwards, and outright impossible to go back a line. This is despite the fact that Markless may seem to force a lot of backtracking on invalidly matched inline directives. The crux is that when an inline directive is aborted via end, it can call change-class on the component it inserted to transform it into an invisible parent-component, and then push its consumed prefix to the front of the child array.
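The change-class trick can be shown in isolation with standard CLOS. The sketch below uses its own stand-in classes named after the ones in the text; the bold class, the abort-inline function, and the vector-push-front helper are hypothetical and only illustrate the idea.

```lisp
;; Stand-in classes; the real component classes are defined by cl-markless.
(defclass parent-component ()
  ((children :initform (make-array 0 :adjustable t :fill-pointer t)
             :accessor children)))
(defclass bold (parent-component) ())

(defun vector-push-front (element vector)
  "Insert ELEMENT at index 0, shifting the existing elements one slot right."
  (vector-push-extend element vector)   ; grow the vector by one slot
  (loop for i from (1- (length vector)) downto 1
        do (setf (aref vector i) (aref vector (1- i))))
  (setf (aref vector 0) element)
  vector)

(defun abort-inline (component prefix)
  "Demote COMPONENT to a plain container and restore PREFIX as literal text."
  ;; CHANGE-CLASS keeps the object's identity and its CHILDREN slot, so the
  ;; parent that already references COMPONENT does not need to be touched.
  (change-class component 'parent-component)
  (vector-push-front prefix (children component))
  component)
```

Because the instance is modified in place, nothing that was already parsed has to be read again: the text consumed so far stays in the child array, and the prefix simply reappears in front of it.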

A similar strategy can be employed for block directives that need to match more than the standard two prefix characters: on an invalid match they can pretend to be a paragraph and insert the paragraph directive and component into the stack instead of their own. Since Markless guarantees that each directive must match a unique prefix, this strategy is possible without excessive backtracking and reparsing.

Tests

This implementation includes a test suite that should cover most of the aspects of the Markless standard. The tests are intentionally formatted in a simple way that should allow re-using them to verify other implementations for correctness. See the tests directory for more information.

To run the test suite on this implementation:

(asdf:test-system :cl-markless)

Output Formats

Additional output formats are provided by external systems.

Standalone Executable

You can create a standalone executable of cl-markless with a command line interface. See cl-markless-standalone.

System Information

1.0.0
Nicolas Hafner
Artistic

Definition Index