ICLT
The Icelandic Centre for Language Technology


SOFTWARE AND RESOURCES

Software

ICLT and its collaborators have developed the following free and open-source systems:

  • Apertium-IceNLP. A shallow-transfer machine translation system between Icelandic and English, based on the Apertium platform and IceNLP (see below). The Apertium-related code and data are available at http://sourceforge.net/projects/apertium/. A prototype of the Apertium-IceNLP system is available.
  • IceNLP - a natural language toolkit for processing and analyzing Icelandic. The main components of IceNLP are: a tokeniser, a morphological analyser (IceMorphy), a linguistic rule-based part-of-speech tagger (IceTagger), a trigram tagger (TriTagger), a shallow (finite-state) parser (IceParser), and a lemmatizer (Lemmald). IceNLP is available from http://icenlp.sourceforge.net/. A web interface for IceNLP is available.
  • CombiTagger - a language- and tagset-independent system for developing and evaluating combined taggers. CombiTagger is available from http://combitagger.sourceforge.net/.

Language resources

  • Icelandic Parsed Historical Corpus. The corpus is syntactically parsed and annotated for full phrase structure, using an adaptation of the annotation scheme of the Penn parsed corpora of historical English. It is distributed as raw UTF-8 data in labeled bracketing format and is therefore compatible with various existing programs, including CorpusSearch. Further information on the annotation guidelines and project organization can be found on the project wiki.
  • The Icelandic Frequency Dictionary (IFD) corpus. A PoS-tagged corpus containing about 590,000 tokens. The underlying tagset contains about 700 possible tags. The corpus is available for research and development purposes; to obtain it, please contact Sigrún Helgadóttir at the Árni Magnússon Institute for Icelandic Studies.
  • BIN - the morphological database of Icelandic. This database is available for research and development purposes under a license (the license text is available only in Icelandic). For further information, please contact Kristín Bjarnadóttir at the Árni Magnússon Institute for Icelandic Studies.
  • A phonetically transcribed Icelandic word list. This is a Microsoft Office Excel file containing almost 56,000 common Icelandic word forms, transcribed in both IPA and SAMPA.
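
As an illustration of the labeled bracketing format mentioned for the Icelandic Parsed Historical Corpus, the following minimal Python sketch reads one bracketed tree into nested tuples and lists its tagged words. It is not part of any ICLT tool: the function names parse_brackets and leaves are hypothetical, and the sample tree and its labels are made up for illustration rather than taken from the corpus. The sketch assumes well-formed input with balanced brackets.

```python
import re


def parse_brackets(text):
    """Parse one labeled-bracketing tree into nested (label, children) tuples.

    Assumes balanced brackets; malformed input raises an error.
    """
    # Split into "(", ")" and runs of non-space, non-bracket characters.
    tokens = re.findall(r"\(|\)|[^\s()]+", text)
    pos = 0

    def node():
        nonlocal pos
        if tokens[pos] != "(":
            raise ValueError("expected '(' at token %d" % pos)
        pos += 1
        label = ""
        # A node may have an empty label, as in wrapper brackets "( (IP ...))".
        if tokens[pos] not in ("(", ")"):
            label = tokens[pos]
            pos += 1
        children = []
        while tokens[pos] != ")":
            if tokens[pos] == "(":
                children.append(node())
            else:
                children.append(tokens[pos])  # a terminal (word) string
                pos += 1
        pos += 1  # consume the closing ")"
        return (label, children)

    return node()


def leaves(tree):
    """Return (tag, word) pairs for all terminals, in left-to-right order."""
    label, children = tree
    pairs = []
    for child in children:
        if isinstance(child, tuple):
            pairs.extend(leaves(child))
        else:
            pairs.append((label, child))
    return pairs


# Made-up example tree; labels and words are illustrative only.
sample = "(IP-MAT (NP-SBJ (PRO-N hann)) (VBDI kom))"
tree = parse_brackets(sample)
print(leaves(tree))
```

Keeping each node as a (label, children) pair keeps the structure easy to traverse; for real corpus queries, a dedicated tool such as CorpusSearch is the more appropriate choice.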