Commit

documented explicit form #85
proycon committed Aug 15, 2020
1 parent a1c8e5a commit 3985178
Showing 4 changed files with 36 additions and 2 deletions.
30 changes: 30 additions & 0 deletions docs/source/form.rst
@@ -0,0 +1,30 @@
.. _form:

Form
======================

In addition to the normal form of FoLiA XML, there is an additional *explicit* form. This form of XML serialisation is
functionally equivalent to the normal form, but any defaults that are implicit in the normal form are expressed
explicitly instead. Documents can always be converted from one form to the other without any gain or loss of
information; the explicit form merely makes certain information more directly accessible, at the cost of
redundancy, larger file size and a higher memory footprint.

The reason for the existence of this explicit form is to help parsers, especially those that do not implement the full FoLiA
logic. Parsers that cannot deal with a document in normal form should themselves invoke ``foliavalidator --explicit`` to
convert it to explicit form prior to parsing it.

The explicit form is declared by the attribute ``form="explicit"`` on the FoLiA root tag. When this attribute is absent, or not set to ``explicit``, behaviour is unchanged and the normal form is used.
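As a rough illustration of the workflow described above, the Python sketch below (standard library only) checks the ``form`` attribute on the root element and shells out to ``foliavalidator --explicit`` only for normal-form input. The function names are hypothetical, and the sketch assumes the converted document is written to standard output; consult ``foliavalidator --help`` for the options your installed version actually requires.

```python
import subprocess
import xml.etree.ElementTree as ET


def is_explicit_form(data: bytes) -> bool:
    """True if the FoLiA root element carries form="explicit".

    An absent form attribute, or any other value, means normal form.
    """
    root = ET.fromstring(data)
    return root.get("form") == "explicit"


def load_as_explicit(path: str) -> bytes:
    """Return the FoLiA document at `path` in explicit form (hypothetical helper).

    ASSUMPTION: `foliavalidator --explicit FILE` writes the converted
    document to standard output; check your installed version.
    """
    with open(path, "rb") as f:
        data = f.read()
    if is_explicit_form(data):
        return data  # already explicit: hand it to the parser as-is
    # Normal form: delegate the conversion to the reference tooling.
    result = subprocess.run(
        ["foliavalidator", "--explicit", path],
        capture_output=True,
        check=True,  # raise CalledProcessError if the tool fails
    )
    return result.stdout
```

Checking the root attribute first avoids running the external tool on documents that are already in explicit form.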

In explicit form, all defaults are made explicit:

- All annotations that carry a set have an explicit ``set`` attribute, and its value never refers to an alias.
- All annotations associated with a processor have an explicit ``processor`` attribute.
- Layers themselves carry a ``set`` attribute if the span elements within them carry a set.
- All text-content elements explicitly declare their class (so ``<t>`` becomes ``<t class="current">``).
- Predefined features/subsets are serialised explicitly using ``<feat>`` elements rather than as XML attributes.
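To illustrate what these defaults mean for a parser, the following Python sketch (standard library only) reads the class of each ``<t>`` element. In normal form the parser has to supply the implicit default ``current`` itself; in explicit form it can rely on the attribute being present. The function name is hypothetical and the two XML fragments are schematic, not complete FoLiA documents.

```python
import xml.etree.ElementTree as ET

# Fully qualified tag name of the FoLiA text-content element.
FOLIA_T = "{http://ilk.uvt.nl/folia}t"


def text_classes(fragment: str, explicit: bool) -> list:
    """Return the class of every <t> element in a FoLiA XML fragment."""
    root = ET.fromstring(fragment)
    classes = []
    for t in root.iter(FOLIA_T):
        if explicit:
            # Explicit form: the class attribute is always serialised.
            classes.append(t.attrib["class"])
        else:
            # Normal form: an absent class implies the default "current".
            classes.append(t.get("class", "current"))
    return classes


# Schematic fragments carrying the same content in both forms.
NORMAL = '<s xmlns="http://ilk.uvt.nl/folia"><t>Hello</t><t class="original">Helo</t></s>'
EXPLICIT = '<s xmlns="http://ilk.uvt.nl/folia"><t class="current">Hello</t><t class="original">Helo</t></s>'

print(text_classes(NORMAL, explicit=False))   # ['current', 'original']
print(text_classes(EXPLICIT, explicit=True))  # ['current', 'original']
```

Both calls yield the same result, but only the normal-form branch needs to know about the default.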

Certain FoLiA internals are made explicit:

- All annotation elements get a ``typegroup`` attribute that makes explicit what kind of annotation element we are dealing with. Possible values are *structure, inline, span, higherorder, textmarkup, content, layer*. For example, ``<w>`` becomes ``<w typegroup="structure">`` and ``<pos>`` becomes ``<pos typegroup="inline">``. This enables XPath expressions such as "select the deepest structural ancestor".
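The following Python sketch shows how ``typegroup`` can be used to find the deepest structural ancestor of an element. Python's built-in ``xml.etree.ElementTree`` does not support the XPath ``ancestor`` axis, so the sketch walks a child-to-parent map instead; with a full XPath 1.0 engine such as lxml, the equivalent expression would be ``ancestor::*[@typegroup='structure'][1]``. The function name is hypothetical, and the fragment is schematic: it omits attributes such as ``set`` and ``processor`` that a real explicit-form document would carry.

```python
import xml.etree.ElementTree as ET

FOLIA_NS = "{http://ilk.uvt.nl/folia}"
XML_ID = "{http://www.w3.org/XML/1998/namespace}id"


def nearest_structural_ancestor(root, node):
    """Return the deepest ancestor of `node` with typegroup="structure".

    ElementTree keeps no parent pointers, so a child->parent map is built
    first. Returns None if no such ancestor exists.
    """
    parent_of = {child: parent for parent in root.iter() for child in parent}
    current = parent_of.get(node)
    while current is not None:
        if current.get("typegroup") == "structure":
            return current
        current = parent_of.get(current)
    return None


# Schematic explicit-form fragment: a sentence containing one word with a PoS tag.
FRAGMENT = (
    '<s xmlns="http://ilk.uvt.nl/folia" xml:id="s.1" typegroup="structure">'
    '<w xml:id="s.1.w.1" typegroup="structure">'
    '<pos class="N" typegroup="inline"/>'
    '</w></s>'
)

root = ET.fromstring(FRAGMENT)
pos = root.find(f".//{FOLIA_NS}pos")
word = nearest_structural_ancestor(root, pos)
print(word.get(XML_ID))  # s.1.w.1
```

Because the check is a plain attribute comparison, the walk needs no knowledge of which FoLiA element names count as structural.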


1 change: 1 addition & 0 deletions docs/source/guidelines.rst
@@ -37,6 +37,7 @@ For developers
very good reason to do so, do NOT assume your documents are neatly subdivided into e.g. only paragraphs and
sentences. There may be lists, figures, divisions. Generally speaking, you'll often want to descend into the deepest
structural nodes that have text. The FoLiA libraries provide a high-level API for you to do this.
6. If you don't use a FoLiA library, you may want to consider accepting only FoLiA documents in so-called *explicit form* (see :ref:`form`). Explicit form does not rely on any implicit defaults but makes everything explicit in the XML, which keeps the logic in your parser simpler. Any normal form document can be converted into an explicit form one and vice versa, without loss. If you receive a normal form document (which is the norm), run an external tool such as ``foliavalidator --explicit`` to convert it into explicit form before parsing it. It is strongly recommended not to shift this burden onto the user, as they may be confused by it.

Conventions
-----------------------
1 change: 1 addition & 0 deletions docs/source/index.rst
@@ -25,6 +25,7 @@ version: 2.2.1
annotation_types
foreign_annotation
querying
form
implementations
guidelines

6 changes: 4 additions & 2 deletions docs/source/libraries.csv
@@ -38,8 +38,8 @@ Validation,... with RDF+turtle sets,yes,no,no,yes
Validation,... with legacy XML sets,yes,no,no,yes
Validation,RelaxNG schema generation,yes,no,no,yes

Serialisation,XML (canonical),yes,yes,yes,yes
Serialisation,JSON,yes,no,no,yes
Serialisation,XML (normal form),yes,yes,yes,yes
Serialisation,JSON (not standardised),yes,no,no,yes
Serialisation,RDF,no,no,no,no

Querying,select() mechanism,yes,yes,yes,yes
@@ -67,3 +67,5 @@ Serialisation Details,Default set,yes,yes,yes,yes
Serialisation Details,Set aliases,yes,yes,?,yes
Serialisation Details,Default processor,yes,yes,yes,no FoLiA v2
Serialisation Details,Default annotator (old-style),yes,yes,no,yes
Serialisation Details,Read explicit form,yes,yes,yes,no
Serialisation Details,Write explicit form,yes,no,no,no
