CoDL is a format for representing tree-structured data, designed for documents that may
be edited by both humans and computers. CoDL keeps markup to a minimum: spaces and
newlines define structure, while #
is the only other meaningful character, for starting
a comment.
- Models tree-structured data
- Symbolic markup is minimal, making it more enjoyable to write
- Automatic modifications don't reformat a document
- Documents may be untyped or typed
- Allows embedded textual content (such as XML or JSON) without escaping
- Lightweight data schemas with simple syntax
- User-extensible data verification
- Fast and lightweight binary format (BiCoDL)
- Safe schema evolution with compatibility checking
- Allows comments, which may be attached to data, or not
- Both data and schemas are composable
Here is an example of CoDL being used to describe a project and modules:
import parent
project main
module alpha
name Alpha
description This is a description
# Todo: tidy up this section
# Previously called "beta"
module gamma
name Gamma
description
This is a longer description which flows onto
more than one line.
In this example, the keywords import
, project
or name
are not part of CoDL
, but may be part of a
schema that defines the data's structure, such as:
import ref!
project id!
module id!
name value
description? value&
links? link+
Each line defines a keyword, and gives names (e.g. ref
or value
) to its parameters, indicating where it
may appear in the document with indentation. The symbols !
, ?
and &
define the multiplicity of a keyword;
details like whether it is required, unique or may be repeated.
Writing CoDL is easy. Each line contains some words, or data, which represent a node in the tree. Each line is indented by a number of spaces. If the indentation is the same as the previous line, then the node is a sibling or peer—it shares the same parent node. If it is indented two spaces more than the previous line, then it is a child.
If the line has less indentation than the previous line, then it is the child of an earlier node: the last one with two fewer spaces of indentation—exactly as the visual appearance implies.
Any other indentation, including having an odd number of spaces, is considered an error. There is one exception to this for supporting multiline strings, which is explained below.
Each data line contains one or more words (or character sequences), separated by spaces. Any number of spaces may appear between words without any semantic significance,
Blank lines may appear anywhere. This format has no significant punctuation other than whitespace, and it should seem very natural to a human reader.
A CoDL document may contain human-readable comments. They contain no data, and their contents is not interpreted but they are part of the CoDL metamodel, and can only appear in certain places in a document.
Comments always begin with a #
character. The #
must either start a line or be preceded by at least one
space, and must always be followed by one or more spaces.
For example, the line,
email [email protected] # The user's email address
would contain the data, email [email protected]
, and the comment, The user's email address
, but the line,
url https://example.com/page#ref
and,
reference #foo
would not contain any comments.
A comment may also appear alone on a line, but the whitespace around it is significant: it must exist at a valid indentation level, that is, preceded by an even number of spaces, and up to one level higher than the previous line.
For example, like this,
usr
local
bin
# This is a valid comment
or this,
usr
local
bin
# This is a valid comment
but not this,
usr
local
bin
# This is a valid comment
or this:
usr
local
bin
# This is a valid comment
Comments are attached to data nodes, and their attachment is determined by the whitespace around them. Comments will attach to a data node if they appear on the line immediately preceding the data, at the same level of indentation. If that node is deleted by a computer editor, the comment will be deleted too.
Standalone comments may also be followed by blank line, in which case their are attached to the parent node. Such comments will be retained even when data nodes around them are modified, but will be removed if the parent node is deleted.
An uninterrupted sequence of comment lines at the same indentation level is treated as a single comment.
There are two special rules relating to comments on the first line of a CoDL document: if the first line is
a comment (one or more lines long), then it must be followed by a blank line; and the requirement that the
#
be followed by a space is relaxed only for a comment on the first line of the document.
These two exceptions facilitate the inclusion of a shebang line at the start of a document, such as,
#!/usr/bin/env processor
model
data
Sometimes it is necessary to write a value containing more than one line of text, or which contains spaces,
or the character sequence #
, without being considered a comment. This is possible using a double indent:
instead of writing a key and its value on the same line, such as,
dog
name Fido
description furry
we can write:
dog
name Fido
description
Furry, brown and cuddly.
A double-indented value continues so long as its indentation level is maintained. Thus, in,
dog
name Fido
description
Furry, brown
and cuddly.
the value of description
would be, Furry, brown\nand cuddly
: the newline character (\n
) is part of the
value, but the six spaces of indentation are not. Nevertheless, additional spaces may be included:
description
Furry, brown
and cuddly
would be interpreted as a Furry, brown\n and cuddly
, but any subsequent line with less indentation would
terminate the multiline value, and be interpreted as new data.
This is particularly useful for embedding other languages in CoDL. For example,
data
representations
json
{ "name": "Fido", "description": "furry" }
xml
<dog>
<name>Fido</name>
<description>furry</description>
</dog>
markdown
# Dog
*Fido* is a furry dog.
Note, in particular, that since the markdown value, markdown
, is indented as a multiline value,
# Dog
is not interpreted as a comment.
When embedding CoDL data in a host language, we often want to add additional indentation so that the embedded CoDL aligns with the surrounding code. When a CoDL document is parsed, the indentation of the first line containing data is noted, and subtracted from all subsequent lines.
For example, in Scala we might write,
object Data:
val animal: String = """
Animal dog
name Fido
legs 4
tail yes
"""
It is therefore also possible to interpret any contiguous fragment of a CoDL document provided no line contains less indentation than the first line of data.
From the example at the top of the page, the fragment,
module alpha
name Alpha
description This is a description
is itself a valid CoDL document.
CoDL can provide a convenient way of storing or transmitting tree-structured data. But for fast serialization and deserialization, a binary form exists, as a direct translation of the same data model, which can be written and read faster, and which uses less memory. This is called BiCoDL.
Although CoDL is a binary format, in the sense that it is not primarily for human consumption, it contains only valid printable UTF-8 characters, making it seamless to copy/paste or to embed within other textual data formats, such as XML or JSON.
BCoDL should use the custom Media Type application/x-bcodl
. BiCoDL data always begins with the
byte sequence, b1
c0
d1
, which looks like ±ÀÑ
when interpreted as UTF-8
.
A CoDL schema is an untyped CoDL document which may be used to verify other CoDL documents. Its structure should mirror that of the documents it verifies: each line begins with, and defines, a keyword which may appear at that position in the tree. The words that follow are the names of the parameters to that keyword.
Additionally, each keyword or paramater may be suffixed (without space) by a qualifying symbol.
?
: parameter or keyword is optional; may appear zero or once+
: parameter or keyword is required and may appear once or many times*
: parameter or keyword may appear zero, once or many times!
: parameter is required and must be unique&
: parameter is the concatenation of remaining words, including spaces and terminated by a newline~
: keyword may also appear zero, once or many times in this section or any descendant
For example,
child? arg
specifies the keyword child
which may optionally appear once, with a required argument called arg
. This schema
would validate the document,
child alpha
Or, for example,
employee+ id! name&
specifies the employee
keyword which must appear at least once, but may be repeated, with an id
parameter that
must be unique, and a name
parameter, which is the concatenation of the rest of the line. Such a schema would
verify, for example,
employee sgs Simon G. Smith
employee rp Richard Price
Todo