A Transformer model to auto insert Vietnamese accent marks

Finetuning XLM-Roberta to auto insert Vietnamese accent marks (diacritics)
vietnamese accent marks
finetuned xlm-roberta
Author

Peter Hoang

Published

July 3, 2024

This project was completed quite some time ago, but the model had not been published until now. I’m glad to say it is now available on the HuggingFace hub here.

The model was finetuned from XLM-Roberta (multilingual Roberta), a Transformer encoder, for the task of inserting Vietnamese accent marks.

Accent mark insertion was modelled as a token classification task, where the label assigned to each token corresponds to the transformation needed to insert its accents. For a more detailed description of the experiment, please refer to this blog post.
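
To make the "label as transformation" idea concrete, here is a minimal sketch in plain Python. It is only an illustration under my own assumptions, not the label scheme the published model actually uses: tokens are assumed to be whitespace-separated words (the real model works on XLM-Roberta subword tokens), and the label format (`index:character` pairs, plus a `KEEP` label) and the helper names `strip_accents`, `derive_label` and `apply_label` are hypothetical. In a real setup a classifier would predict the label for each token; here the labels are simply derived from an accented sentence to show how training targets could be built and then applied.

```python
import unicodedata

KEEP = "KEEP"  # hypothetical label meaning "this token needs no accents"


def strip_accents(text: str) -> str:
    """Remove Vietnamese diacritics from text."""
    decomposed = unicodedata.normalize("NFD", text)
    plain = "".join(ch for ch in decomposed if not unicodedata.combining(ch))
    # 'đ' and 'Đ' have no Unicode decomposition, so map them by hand.
    return plain.replace("đ", "d").replace("Đ", "D")


def derive_label(plain: str, accented: str) -> str:
    """Encode the edits that turn `plain` into `accented` as one label string."""
    # In NFC every Vietnamese letter is a single code point, so both
    # words have the same length and can be compared position by position.
    accented = unicodedata.normalize("NFC", accented)
    if len(plain) != len(accented):
        raise ValueError("plain and accented words must have the same length")
    edits = [f"{i}:{a}" for i, (p, a) in enumerate(zip(plain, accented)) if p != a]
    return ",".join(edits) if edits else KEEP


def apply_label(plain: str, label: str) -> str:
    """Apply a label (as a classifier would predict it) to an unaccented word."""
    if label == KEEP:
        return plain
    chars = list(plain)
    for edit in label.split(","):
        index, char = edit.split(":")
        chars[int(index)] = char
    return "".join(chars)


# Build (token, label) training pairs from an accented sentence,
# then check that applying the labels restores the original text.
sentence = "Tiếng Việt"
plain_words = strip_accents(sentence).split()
labels = [derive_label(p, a) for p, a in zip(plain_words, sentence.split())]
restored = " ".join(apply_label(p, l) for p, l in zip(plain_words, labels))
```

The appeal of this framing is that the set of distinct labels is finite and fairly small, so restoring accents reduces to ordinary per-token classification rather than free-form text generation.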

The HF model page linked above also contains detailed instructions on how to use the model, from input to output.