nguni

Exploring the digitization of Nguni languages, with the aim of increasing their digital footprint.

Goals

The goal of this project is to produce:
- A language model: a probability distribution over sequences of words
- A large, comprehensive dataset
- A dictionary-like resource that spells out the use and meaning of each word or phrase in different contexts: how it is used and misused, correct spellings and common misspellings, pronunciation and alternative pronunciations, how it has evolved over time, its origin, etc.
- A text-to-speech (TTS) model (low priority)
- Documentation of all of the above in Nguni languages (low priority)
- Text analysis (low priority)
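
To make "probability distribution over sequences of words" concrete, here is a minimal sketch of a bigram model with add-one smoothing, using only the Python standard library. It is an illustration, not the project's chosen approach: the three-sentence corpus is a toy example, and the function names and the `<unk>` placeholder are assumptions made for this sketch.

```python
import math
from collections import Counter

UNK = "<unk>"  # assumed placeholder for words never seen in training


def tokenize(sentence):
    """Lowercase, split on whitespace, and add sentence boundary markers."""
    return ["<s>"] + sentence.lower().split() + ["</s>"]


def train_bigram(sentences):
    """Count contexts and bigrams, and collect the vocabulary of possible next words."""
    contexts, bigrams = Counter(), Counter()
    vocab = {"</s>", UNK}
    for sentence in sentences:
        tokens = tokenize(sentence)
        vocab.update(tokens[1:])           # every token that can follow another
        contexts.update(tokens[:-1])       # every token that has a successor
        bigrams.update(zip(tokens, tokens[1:]))
    return contexts, bigrams, vocab


def next_word_prob(prev, word, contexts, bigrams, vocab):
    """P(word | prev) with add-one (Laplace) smoothing."""
    return (bigrams[(prev, word)] + 1) / (contexts[prev] + len(vocab))


def sentence_log_prob(sentence, contexts, bigrams, vocab):
    """Log-probability of a whole sentence under the chain of bigram probabilities."""
    tokens = [t if t in vocab or t == "<s>" else UNK for t in tokenize(sentence)]
    return sum(
        math.log(next_word_prob(prev, word, contexts, bigrams, vocab))
        for prev, word in zip(tokens, tokens[1:])
    )


# Toy corpus for illustration only; a real model needs far more data.
corpus = ["sawubona mngane", "sawubona mfowethu", "ngiyabonga kakhulu"]
contexts, bigrams, vocab = train_bigram(corpus)

seen = sentence_log_prob("sawubona mngane", contexts, bigrams, vocab)
scrambled = sentence_log_prob("mngane sawubona", contexts, bigrams, vocab)
print(seen > scrambled)  # the word order seen in training scores higher
```

A model like this is the simplest thing that could drive keyboard autocomplete: given the previous word, rank the candidates by `next_word_prob`. Neural models replace the counting with learned parameters but assign probabilities to sequences in the same way.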

Scope

The project focuses only on South African Nguni languages (hence the name), excluding Mozambican and Zimbabwean languages, with isiZulu and isiXhosa as a starting point.

Vision

To live in a world where I can:

- voice type in isiXhosa/isiZulu
- get keyboard autocomplete in isiXhosa/isiZulu
- and finally stop seeing computers put a squiggly line under my name

The bigger picture is to bring Nguni culture and heritage into the modern world and open the door to a wide range of possibilities, such as:

- Closing the literacy and computer-literacy gap by allowing anyone to access modern tools in their native language
- Making it possible to learn and teach in isiXhosa/isiZulu
- Using isiXhosa/isiZulu to communicate at any level
- Preserving and protecting culture and heritage

What it takes

- Collect Language Data: Gather isiXhosa/isiZulu language data to train the language model. This can include isiXhosa/isiZulu books, articles, news, and other text sources. Publicly available datasets such as the South African National Corpus can also be used.
- Preprocess and Clean the Data: Remove unwanted characters, punctuation, and other non-text elements.
- Train a Language Model: Train the model using a framework such as PyTorch, TensorFlow, or Keras.
- Fine-tune the Model: To improve the accuracy of the isiXhosa/isiZulu language model, it may need fine-tuning. This involves training the model on a smaller, more specific dataset to improve its performance on a particular task.
- Test and Evaluate the Model: Finally, test and evaluate the model to confirm that its results are accurate. Track metrics such as perplexity, accuracy, and F1 score.
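
The cleaning and evaluation steps above can be sketched in a few lines of standard-library Python. This is a rough illustration under stated assumptions, not the project's pipeline: `clean_text` and `unigram_perplexity` are hypothetical helper names, the choice to keep apostrophes and hyphens (which occur inside isiXhosa/isiZulu words) is an assumption, and a smoothed unigram model stands in for whatever model is eventually trained.

```python
import math
import re
import unicodedata
from collections import Counter


def clean_text(raw):
    """Normalize Unicode, lowercase, and drop punctuation, digits and extra spaces.

    Apostrophes and hyphens are kept because they can be part of a word
    (an assumption for this sketch; adjust to the orthography rules in use).
    """
    text = unicodedata.normalize("NFC", raw).lower()
    text = re.sub(r"[^\w\s'-]|[\d_]", " ", text)
    return re.sub(r"\s+", " ", text).strip()


def unigram_perplexity(train_tokens, test_tokens):
    """Perplexity of test_tokens under an add-one smoothed unigram model.

    Lower is better: the model is less 'surprised' by the test text.
    """
    counts = Counter(train_tokens)
    total = sum(counts.values())
    vocab_size = len(counts) + 1  # one extra slot shared by all unseen words
    log_prob = sum(
        math.log((counts[tok] + 1) / (total + vocab_size)) for tok in test_tokens
    )
    return math.exp(-log_prob / len(test_tokens))


# Toy text for illustration only.
train_tokens = clean_text("Sawubona mngane. Sawubona mfowethu!").split()
familiar = unigram_perplexity(train_tokens, ["sawubona"])
unfamiliar = unigram_perplexity(train_tokens, ["ngiyabonga"])
print(familiar < unfamiliar)  # text resembling the training data scores lower
```

In practice the test tokens should come from held-out text that the model never saw during training; otherwise perplexity will look better than it really is.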
