Implementations of an Annotation Graph API for linguistic annotations.
Annotation Graphs are a data structure conceived by Steven Bird and Mark Liberman (http://lanl.arxiv.org/abs/cs/9907003, http://xxx.lanl.gov/PS_cache/cs/pdf/9903/9903003v1.pdf).
The structure is designed to be a tool-independent way of representing annotated linguistic data, and essentially defines an Annotation Graph as a directed acyclic graph where:
- nodes are 'anchors' that represent a point in time (in seconds) or a point in a text (in characters), although the time/character offset itself is optional, and
- edges are 'annotations', which have a 'label' (the content of the annotation) and a 'type' (the kind of annotation, analogous to a 'tier' or 'layer').
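To make the structure concrete, here is a minimal Java sketch of an annotation graph under these definitions. The names Anchor, Annotation, Graph and isAcyclic are hypothetical illustrations, not the API of this library (see the API documentation for the real classes). The acyclicity check uses Kahn's topological-sort algorithm, treating anchors as nodes and annotations as edges.

```java
import java.util.ArrayDeque;
import java.util.ArrayList;
import java.util.Deque;
import java.util.HashMap;
import java.util.List;
import java.util.Map;

// Hypothetical node: a point in time (seconds) or in a text (characters).
class Anchor {
  final String id;
  final Double offset; // null when the offset is unknown (offsets are optional)

  Anchor(String id, Double offset) {
    this.id = id;
    this.offset = offset;
  }
}

// Hypothetical edge: a labelled annotation of a given type (tier/layer).
class Annotation {
  final String type;  // e.g. "word"
  final String label; // the content of the annotation
  final Anchor start;
  final Anchor end;

  Annotation(String type, String label, Anchor start, Anchor end) {
    this.type = type;
    this.label = label;
    this.start = start;
    this.end = end;
  }
}

// Hypothetical container; annotations must only use anchors created by addAnchor.
class Graph {
  final List<Anchor> anchors = new ArrayList<>();
  final List<Annotation> annotations = new ArrayList<>();

  Anchor addAnchor(String id, Double offset) {
    Anchor anchor = new Anchor(id, offset);
    anchors.add(anchor);
    return anchor;
  }

  Annotation addAnnotation(String type, String label, Anchor start, Anchor end) {
    Annotation annotation = new Annotation(type, label, start, end);
    annotations.add(annotation);
    return annotation;
  }

  // Kahn's algorithm: the graph is acyclic iff every anchor can be removed
  // in an order where each removed anchor has no remaining incoming edges.
  boolean isAcyclic() {
    Map<Anchor, Integer> inDegree = new HashMap<>();
    Map<Anchor, List<Anchor>> successors = new HashMap<>();
    for (Anchor anchor : anchors) {
      inDegree.put(anchor, 0);
      successors.put(anchor, new ArrayList<>());
    }
    for (Annotation annotation : annotations) {
      successors.get(annotation.start).add(annotation.end);
      inDegree.merge(annotation.end, 1, Integer::sum);
    }
    Deque<Anchor> ready = new ArrayDeque<>();
    for (Anchor anchor : anchors) {
      if (inDegree.get(anchor) == 0) ready.add(anchor);
    }
    int removed = 0;
    while (!ready.isEmpty()) {
      Anchor anchor = ready.remove();
      removed++;
      for (Anchor next : successors.get(anchor)) {
        if (inDegree.merge(next, -1, Integer::sum) == 0) ready.add(next);
      }
    }
    return removed == anchors.size();
  }
}

class AnnotationGraphDemo {
  public static void main(String[] args) {
    Graph graph = new Graph();
    Anchor a0 = graph.addAnchor("a0", 0.0);
    Anchor a1 = graph.addAnchor("a1", 0.42);
    Anchor a2 = graph.addAnchor("a2", null); // offset not known
    graph.addAnnotation("word", "hello", a0, a1);
    graph.addAnnotation("word", "world", a1, a2);
    System.out.println(graph.isAcyclic()); // true
  }
}
```

Keeping the offset nullable reflects the point that anchors need not carry a time or character offset; the graph remains well-formed as long as it has no cycles.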
This particular implementation, which is used by LaBB-CAT (developed by the NZILBB), includes extra features that allow tier hierarchies and parent/child constraints to be defined. More details on these extra features are described in http://dx.doi.org/10.1016/j.csl.2017.01.004
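As an illustration only, the following Java sketch shows one way a tier hierarchy with a parent/child constraint could be modelled. LayerDef and LayeredAnnotation are hypothetical names, and the specific constraint checked here (a child's parent annotation must be on the parent tier and must span the child) is an assumption for the example, not necessarily the rule this implementation enforces.

```java
import java.util.ArrayList;
import java.util.List;

// Hypothetical tier definition with an optional parent tier.
class LayerDef {
  final String id;
  final LayerDef parent; // null for a top-level tier

  LayerDef(String id, LayerDef parent) {
    this.id = id;
    this.parent = parent;
  }

  // Tier ids from the top of the hierarchy down to this tier.
  List<String> path() {
    List<String> ids = new ArrayList<>();
    for (LayerDef layer = this; layer != null; layer = layer.parent) {
      ids.add(0, layer.id);
    }
    return ids;
  }
}

// Hypothetical annotation linked to a parent annotation on the parent tier.
class LayeredAnnotation {
  final LayerDef layer;
  final LayeredAnnotation parent; // null for annotations on a top-level tier
  final double start; // seconds
  final double end;   // seconds

  LayeredAnnotation(LayerDef layer, LayeredAnnotation parent, double start, double end) {
    this.layer = layer;
    this.parent = parent;
    this.start = start;
    this.end = end;
  }

  // Assumed constraint: the parent sits on the parent tier and contains this span.
  boolean satisfiesParentConstraint() {
    if (layer.parent == null) {
      return parent == null;
    }
    return parent != null
        && parent.layer == layer.parent
        && parent.start <= start
        && end <= parent.end;
  }
}

class LayerDemo {
  public static void main(String[] args) {
    LayerDef turn = new LayerDef("turn", null);
    LayerDef word = new LayerDef("word", turn);
    LayeredAnnotation t = new LayeredAnnotation(turn, null, 0.0, 5.0);
    LayeredAnnotation w = new LayeredAnnotation(word, t, 1.0, 1.5);
    System.out.println(word.path() + " " + w.satisfiesParentConstraint()); // [turn, word] true
  }
}
```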
API documentation is available at https://nzilbb.github.io/ag/
Apart from its use within LaBB-CAT, the object model can be used for other purposes, such as format conversion, e.g.:
- vtt-to-textgrid - a utility for converting subtitles downloaded from YouTube to Praat TextGrids
- trs-to-eaf - a utility for converting Transcriber files to ELAN files
These use the serializers/deserializers in the formatter directory of this repository to read a file in one format, convert it to an annotation graph, and then write that graph out as a file in another format. As pointed out by Cochran et al. (2007, Report from TILR Working Group 1: Tools interoperability and input/output formats), this avoids having to write O(n²) explicit conversion algorithms between n formats; only O(n) format conversions, to and from the annotation graph, are required.
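The saving can be made concrete with a simple count, assuming one direct converter is needed for each ordered pair of distinct formats (A to B and B to A counted separately), compared with one deserializer and one serializer per format when an annotation graph sits in the middle:

```latex
\underbrace{n(n-1)}_{\text{direct converters}} \in O(n^2)
\qquad\text{vs.}\qquad
\underbrace{n}_{\text{deserializers}} + \underbrace{n}_{\text{serializers}} = 2n \in O(n)
```

For example, with n = 10 formats that is 90 direct converters versus 20 serializers/deserializers.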
This exemplifies an approach to linguistic data interoperability that Witt et al. (2009) call the interlingua philosophy of interoperability, and it uses annotation graphs as an 'interlingua', similar to the work of Schmidt et al. (2008). However, rather than using a third file format as a persistent intermediary, the annotation graph models of the linguistic data are ephemeral, existing in memory only for the duration of the conversion.
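The hub-and-spoke conversion described above can be sketched in Java as follows. The sketch is hypothetical: Transcript stands in for an annotation graph, and Converter, register and convert are illustrative names rather than classes in the formatter directory. Each format contributes one reader and one writer, and the intermediate model is created in memory and discarded when convert returns.

```java
import java.util.ArrayList;
import java.util.HashMap;
import java.util.List;
import java.util.Map;
import java.util.function.Function;

// Hypothetical in-memory stand-in for an annotation graph.
class Transcript {
  final List<String> utterances = new ArrayList<>();
}

// Hub-and-spoke converter: one reader and one writer per format, never one per pair.
class Converter {
  private final Map<String, Function<String, Transcript>> readers = new HashMap<>();
  private final Map<String, Function<Transcript, String>> writers = new HashMap<>();

  void register(String format, Function<String, Transcript> reader,
                Function<Transcript, String> writer) {
    readers.put(format, reader);
    writers.put(format, writer);
  }

  // The Transcript only exists for the duration of this call (an ephemeral interlingua).
  String convert(String input, String from, String to) {
    Function<String, Transcript> reader = readers.get(from);
    Function<Transcript, String> writer = writers.get(to);
    if (reader == null || writer == null) {
      throw new IllegalArgumentException("Unsupported conversion: " + from + " -> " + to);
    }
    return writer.apply(reader.apply(input));
  }
}

class ConverterDemo {
  public static void main(String[] args) {
    Converter converter = new Converter();
    // Toy format 1: one utterance per line.
    converter.register("lines",
        text -> {
          Transcript t = new Transcript();
          for (String line : text.split("\n")) t.utterances.add(line);
          return t;
        },
        t -> String.join("\n", t.utterances));
    // Toy format 2: numbered lines such as "1: hello".
    converter.register("numbered",
        text -> {
          Transcript t = new Transcript();
          for (String line : text.split("\n")) t.utterances.add(line.substring(line.indexOf(": ") + 2));
          return t;
        },
        t -> {
          StringBuilder out = new StringBuilder();
          for (int i = 0; i < t.utterances.size(); i++) {
            if (i > 0) out.append('\n');
            out.append(i + 1).append(": ").append(t.utterances.get(i));
          }
          return out.toString();
        });
    System.out.println(converter.convert("hello\nworld", "lines", "numbered"));
  }
}
```

Adding a third toy format would need only one more register call, after which it can be converted to and from both existing formats.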
More format conversions are available here
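Building from source requires:
- The JDK for at least Java 8, e.g. on Debian/Ubuntu:
sudo apt install default-jdk
- Maven, e.g.:
sudo apt install maven
To build all modules:
mvn package
To build only specific modules, name them with -pl, e.g.:
mvn package -pl :nzilbb.ag
mvn package -pl :nzilbb.transcriber
mvn package -pl :nzilbb.transcriber.deepspeech
etc...
To run the unit tests:
mvn test
To generate the documentation:
cd ag
mvn site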
OSSRH (Sonatype's OSS Repository Hosting) is the repository service through which nzilbb.ag modules are deployed (published) to the central Maven repository.
There are two types of deployment:
- snapshot: a transient deployment that can be updated during development/testing
- release: an official published version that cannot be changed once it's deployed
A snapshot deployment is done when the module version (the version tag in pom.xml) ends with -SNAPSHOT. Otherwise, any deployment is a release.
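The rule amounts to a one-line check. This Java sketch is purely illustrative (DeploymentKind is a hypothetical name); in practice Maven makes this distinction itself when deploying, based on the same -SNAPSHOT suffix.

```java
// Hypothetical helper mirroring the rule above; Maven applies the rule itself when deploying.
final class DeploymentKind {
  static final String SNAPSHOT_SUFFIX = "-SNAPSHOT";

  private DeploymentKind() {}

  // A version ending in -SNAPSHOT is a snapshot deployment; anything else is a release.
  static boolean isSnapshot(String version) {
    return version.endsWith(SNAPSHOT_SUFFIX);
  }

  static String describe(String version) {
    return version + " -> " + (isSnapshot(version) ? "snapshot" : "release");
  }

  public static void main(String[] args) {
    System.out.println(describe("1.1.0-SNAPSHOT")); // 1.1.0-SNAPSHOT -> snapshot
    System.out.println(describe("1.1.0"));          // 1.1.0 -> release
  }
}
```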
To perform a snapshot deployment:
- Ensure the version in pom.xml is suffixed with -SNAPSHOT
- Execute the command:
mvn clean deploy -pl :nzilbb.ag
To perform a release deployment:
- Ensure the version in pom.xml isn't suffixed with -SNAPSHOT, e.g. by using something like the following command from within the ag directory:
mvn versions:set -DnewVersion=1.1.0 -pl :nzilbb.ag
- Execute the command:
mvn clean deploy -P release -pl :nzilbb.ag
- Happy with everything? Complete the release with:
mvn nexus-staging:release -P release -pl :nzilbb.ag
Otherwise:
mvn nexus-staging:drop -P release -pl :nzilbb.ag
...and start again.
- Regenerate the citation file:
mvn cff:create -pl :nzilbb.ag
- Commit/push all changes and create a release in GitHub
To release another module (e.g. formatters, annotators, etc.):
- Ensure the version in pom.xml isn't suffixed with -SNAPSHOT
NB Don't use mvn versions:set for this if the module is a nzilbb.formatter, because it would also update the versions in nzilbb.converter projects, which are set manually in their pom.xml files.
- Execute the command, e.g. for the Praat formatter:
mvn clean deploy -P release -pl :nzilbb.formatter.praat
- Happy with everything? Complete the release with:
mvn nexus-staging:release -P release -pl :nzilbb.formatter.praat
Otherwise:
mvn nexus-staging:drop -P release -pl :nzilbb.formatter.praat
...and start again.
- Start a new -SNAPSHOT version.
NB Don't use mvn versions:set for this if the module is a nzilbb.formatter, because it would also update the versions in nzilbb.converter projects, which are set manually in their pom.xml files.
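Choosing the next -SNAPSHOT version by hand can be sketched as follows. NextSnapshot is a hypothetical helper, and incrementing the last numeric component (e.g. 1.1.0 to 1.1.1-SNAPSHOT) is an assumed convention; the project may choose a different next version. The resulting string would then be entered into the version tag in pom.xml manually.

```java
// Hypothetical helper; assumes versions like "1.1.0" whose last component is numeric.
final class NextSnapshot {
  private NextSnapshot() {}

  // Increments the last numeric component and appends -SNAPSHOT,
  // e.g. "1.1.0" -> "1.1.1-SNAPSHOT".
  static String after(String releasedVersion) {
    if (releasedVersion.endsWith("-SNAPSHOT")) {
      throw new IllegalArgumentException("Already a snapshot: " + releasedVersion);
    }
    int lastDot = releasedVersion.lastIndexOf('.');
    String prefix = releasedVersion.substring(0, lastDot + 1);
    int last = Integer.parseInt(releasedVersion.substring(lastDot + 1));
    return prefix + (last + 1) + "-SNAPSHOT";
  }

  public static void main(String[] args) {
    System.out.println(after("1.1.0")); // 1.1.1-SNAPSHOT
  }
}
```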