NLP resources for a rare language morphological analyzer: Danish case

dc.citation.conferenceComputational linguistics and intelligent systems (COLINS 2017)
dc.contributor.affiliationV.N. Karazin Kharkiv National University, Kharkiv, Ukraineuk_UA
dc.contributor.authorKotov, Mykhailo
dc.coverage.countryUAuk_UA
dc.coverage.placenameKharkivuk_UA
dc.date.accessioned2018-02-22T11:35:04Z
dc.date.available2018-02-22T11:35:04Z
dc.date.issued2017
dc.description.abstractThe paper discusses the characteristics and practical application of the natural language processing resources available for developing a morphological analysis solution for a rare language. The case under consideration describes the pipeline design needed to prepare the grammatical resources for Danish. Rare not only in terms of distribution but also in the amount of natural language resources available, Danish poses a significant problem for applying third-party tools to NLP tasks. The paper focuses on part-of-speech tagging and lemmatization, typical but indispensable pre-processing tasks in developing a morphological analyzer as a custom NLP solution.uk_UA
dc.format.pages31-36
dc.identifier.citationKotov M. NLP resources for a rare language morphological analyzer: Danish case / Mykhailo Kotov // Computational linguistics and intelligent systems (COLINS 2017) : proceedings of the 1st International conference, Kharkiv, Ukraine, 21 April 2017 / National Technical University «KhPI», Lviv Polytechnic National University. – Kharkiv, 2017. – P. 31–36. – Bibliography: 12 titles.uk_UA
dc.identifier.urihttps://ena.lpnu.ua/handle/ntb/39456
dc.language.isoenuk_UA
dc.publisherNational Technical University «KhPI»uk_UA
dc.relation.referencesen1. Andor, D., Alberti, C., Weiss, D., Severyn, A., Presta, A., Ganchev, K., Petrov, S., and Collins, M. (2016). Globally normalized transition-based neural networks. In: Proceedings of the 54th Annual Meeting of the Association for Computational Linguistics (pp. 2442–2452).
2. Apache OpenNLP Developer Documentation. (2017). Retrieved from https://opennlp.apache.org/documentation/1.5.2-incubating/manual/opennlp.html
3. Asmussen, J. (2015). Survey of POS taggers. Approaches to making words tell who they are (Technical Report DK-CLARIN WP 2.1). Retrieved from http://korpus.dsl.dk/clarin/corpus-doc/pos-survey.pdf
4. Brill, E. (1995). Transformation-based error-driven learning and natural language processing: a case study in part-of-speech tagging. Comput. Linguist., 21, 543–565.
5. Hansen, D.H. (2000). Træning og brug af Brill-taggeren på danske tekster (Ontoquery Technical report). Retrieved from https://cst.dk/online/pos_tagger/Brill_tagger.pdf
6. Johannsen, A. (2014). A trainable Part-of-Speech Tagger and Dependency Parser for Danish. Available at: https://github.com/andersjo/danish_dependency_parser/blob/master/README.md
7. Jongejan, B., and Dalianis, H. (2009). Automatic training of lemmatization rules that handle morphological changes in pre-, in- and suffixes alike. In: Proceedings of the Joint Conference of the 47th Annual Meeting of the ACL and the 4th International Joint Conference on Natural Language Processing of the AFNLP (pp. 145–153).
8. Lewis, M.P., Simons, G.F., and Fennig, C.D. (eds.). (2013). Ethnologue: Languages of the World. Dallas, Texas: SIL International.
9. Ling, W., Dyer, C., Black, A.W., Trancoso, I., Fermandez, R., Amir, S., Marujo, L., and Luis, T. (2015). Finding function in form: Compositional character models for open vocabulary word representation. In: Proceedings of the 2015 Conference on Empirical Methods in Natural Language Processing (pp. 1520–1530).
10. Manning, C.D., Raghavan, P., Schütze, H. (2008). Introduction to Information Retrieval. New York: Cambridge University Press.
11. Marquez i Villodre, L. (1999). Part-of-speech Tagging: A Machine Learning Approach based on Decision Trees (Doctoral dissertation). Retrieved from https://upcommons.upc.edu/bitstream/handle/2117/93974/TLMV1de2.pdf
12. taggerXML [Computer software and its adaptations]. Retrieved from http://cst.dk/download/uk/index.html#taggeruk_UA
dc.subjectmorphological analyzeruk_UA
dc.subjectlemmatizationuk_UA
dc.subjectpart-of-speech tagginguk_UA
dc.subjectHunspelluk_UA
dc.subjectOpenNLPuk_UA
dc.subjectSnowball stemmeruk_UA
dc.subjectSyntaxNetuk_UA
dc.subjectword-listuk_UA
dc.titleNLP resources for a rare language morphological analyzer: Danish caseuk_UA
dc.typeConference Abstractuk_UA

Files

Original bundle

Name: 006-031-036.pdf
Size: 374.55 KB
Format: Adobe Portable Document Format