The Grammarly team in Kyiv has initiated the creation of the first annotated GEC dataset for the Ukrainian language and is inviting those interested to join the project!
- What is a GEC dataset?
It’s a collection of everyday texts: essays, blog and social media posts, reviews, emails, and other common pieces of writing. Such texts contain grammar, spelling, and mechanical mistakes (because who doesn’t make them?). Correcting this class of writing issues is the task known as grammatical error correction (GEC). Once the texts are collected, professional linguists will annotate them, marking and classifying each error. The Grammarly team is excited about creating such a GEC dataset and making it available for everyone to use!
- How will it help the Ukrainian language?
It will accelerate the development of new online systems for grammar checking and writing assistance.
- How will it help the scientific community?
It will support the creation of more open tools for natural language processing (NLP) research on the Ukrainian language.
- How can someone join the project?
It’s very easy — please go to the project webpage and share your text in Ukrainian (e.g., an essay, a translation, or a social media post). Or you can create a new text by following the simple instructions on the website. There is no limit to the number of texts you may add.
Text collection will run until September 13. Follow project updates on the Grammarly Kyiv Facebook page and be among the first to know when the dataset is published!
Join this exciting project and share the information with your colleagues and friends! Let’s work together to create the first annotated GEC dataset for the Ukrainian language!