Wolaita Language POS Tagger
Wolaytta is a language of the Ometo group, which belongs to the Omotic branch of the Afroasiatic family (or phylum). It is spoken in the Wolaytta zone, the administrative unit northwest of Lake Abaya in southwestern Ethiopia, about 400 km from the capital city of Addis Ababa. Almost all native Wolaytta people of the zone appear to speak the language [5], so it is widely used throughout the zone. It is written in a Latin-based script with 26 basic characters. The language serves as an official language of the Wolayita nation of Ethiopia. Primary school education is now conducted in Wolaytta as the medium of instruction, and the language is also offered as a subject in junior and secondary schools. Wolaytta language, literature and folklore are offered as a field of study at Wolayita Sodo University. The language plays a crucial role for the people of the zone in social, political, religious and economic activities [13].
Thus, the general objective of the project is to adapt existing technology to develop a part-of-speech tagger for the Wolaytta language.
Technical objectives:
- Lexical categories: the study focuses on broad (coarse-grained) part-of-speech categories of the language.
- HMM tagger: a Hidden Markov Model decoded with the Viterbi algorithm.
- Hybrid tagger: HMM (Viterbi) combined with a rule-based approach.
- Neural taggers: LSTM, CNN, and GRU + CNN models for the Wolaita POS tagger.
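
To illustrate the HMM objective above, here is a minimal sketch of a bigram HMM tagger with Viterbi decoding. It is not the project's implementation. The function names (`train_hmm`, `log_prob`, `viterbi`), the add-one smoothing, the placeholder tokens (`w1` ... `w4`) and the tag labels (`N`, `V`, `ADJ`) are all assumptions made for illustration. They are not real Wolaytta data and not the project's tagset.

```python
import math
from collections import Counter, defaultdict


def train_hmm(tagged_sentences):
    """Count tag transitions and word emissions from lists of (word, tag) pairs."""
    transitions = defaultdict(Counter)  # previous tag -> counts of next tag
    emissions = defaultdict(Counter)    # tag -> counts of emitted word
    vocab = set()
    for sentence in tagged_sentences:
        prev = "<s>"  # sentence-start pseudo-tag
        for word, tag in sentence:
            transitions[prev][tag] += 1
            emissions[tag][word] += 1
            vocab.add(word)
            prev = tag
    return transitions, emissions, vocab


def log_prob(counter, key, n_outcomes):
    """Add-one smoothed log probability (a simplifying assumption)."""
    return math.log((counter[key] + 1) / (sum(counter.values()) + n_outcomes))


def viterbi(words, transitions, emissions, vocab):
    """Return the most probable tag sequence for `words` under the bigram HMM."""
    if not words:
        return []
    tags = list(emissions)
    n_tags = len(tags)
    n_words = len(vocab) + 1  # one extra slot for unknown words
    # best[i][t] = (log score of the best path ending in tag t at position i, backpointer)
    best = [{}]
    for t in tags:
        score = (log_prob(transitions["<s>"], t, n_tags)
                 + log_prob(emissions[t], words[0], n_words))
        best[0][t] = (score, None)
    for i in range(1, len(words)):
        best.append({})
        for t in tags:
            emit = log_prob(emissions[t], words[i], n_words)
            best[i][t] = max(
                (best[i - 1][p][0] + log_prob(transitions[p], t, n_tags) + emit, p)
                for p in tags
            )
    # Follow backpointers from the best final tag.
    tag = max(tags, key=lambda t: best[-1][t][0])
    path = [tag]
    for i in range(len(words) - 1, 0, -1):
        tag = best[i][tag][1]
        path.append(tag)
    return path[::-1]


# Hypothetical toy corpus: placeholder tokens, not real Wolaytta words.
toy_corpus = [
    [("w1", "N"), ("w2", "V")],
    [("w3", "ADJ"), ("w1", "N"), ("w2", "V")],
    [("w1", "N"), ("w4", "V")],
]
model = train_hmm(toy_corpus)
print(viterbi(["w3", "w1", "w4"], *model))
```

A hybrid tagger in the sense of the list above could, for example, apply hand-written rules to the output of `viterbi` or to unknown words, but how the rules are combined with the HMM is not specified here.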