Create a SentenceTransformer in Dhivehi using ELECTRA

Description
Dhivehi is a low-resource language. Because little training data is available, ELECTRA seems like a good option, as it requires less computing power and training data than other pretrained models.

Model
An electra-small model pretrained on Dhivehi is available here

Discord channel

To chat and organise with other people interested in this project, head over to our Discord and:

  • Follow the instructions on the #join-course channel

  • Join the #sentence-transformers-dhivehi channel

Just make sure you comment here to indicate that you’ll be contributing to this project :slight_smile:

Hey @ashraq, thanks for proposing this interesting project! One question: what do you mean by creating a “sentence transformer”? Are you talking about adding a pooling layer to the electra-small model and then training that on a Dhivehi corpus?

Do you also happen to have access to, or know of, a Dhivehi corpus to train on?

Yes, the idea is to add a pooling layer on top of electra-small and fine-tune it as a sentence transformer for Dhivehi.
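For anyone new to the idea, here is a minimal sketch of what a mean pooling layer does, written in plain Python so it runs without downloading a model. It assumes the encoder has already produced one vector per token; the names `mean_pool`, `toy_tokens` and `toy_mask` are made up for illustration, and the numbers are toy values, not real ELECTRA outputs. In the actual project the pooling would sit on top of electra-small and be trained with a library such as sentence-transformers.

```python
from typing import List


def mean_pool(token_embeddings: List[List[float]], attention_mask: List[int]) -> List[float]:
    """Average the token vectors of one sentence, skipping padding positions."""
    if not token_embeddings:
        raise ValueError("need at least one token embedding")
    if len(token_embeddings) != len(attention_mask):
        raise ValueError("need exactly one mask entry per token")

    dim = len(token_embeddings[0])
    total = [0.0] * dim
    count = 0
    for vector, keep in zip(token_embeddings, attention_mask):
        if keep:  # mask value 1 = real token, 0 = padding
            count += 1
            for i, value in enumerate(vector):
                total[i] += value

    if count == 0:
        raise ValueError("attention mask contains no real tokens")
    return [value / count for value in total]


# Toy stand-in for encoder output (hypothetical values):
# 3 real tokens followed by 1 padding token, hidden size 2.
toy_tokens = [[1.0, 2.0], [3.0, 4.0], [5.0, 6.0], [9.0, 9.0]]
toy_mask = [1, 1, 1, 0]

sentence_vector = mean_pool(toy_tokens, toy_mask)
print(sentence_vector)
```

The padding token is ignored, so the result is a single fixed-size vector for the whole sentence regardless of its length. Fine-tuning then adjusts the encoder so that these pooled vectors end up close together for sentences with similar meaning.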

Yes, there is a corpus available. In fact, I have been preparing data for this task since last week, before I came to know about the event. So I think this will be a wonderful opportunity.

This sounds like a great project indeed! I’ve created a Discord channel (see topic description) in case you and others want to use it.

Btw, are there any limitations on the instance we can choose on AWS SageMaker during this event?

As far as I know you can choose a p3 instance if it’s available or a T4 if not :slight_smile: