Now a unique machine translation tool from Hindi to Gondi
This app, developed by Microsoft Research Lab, CGNet Swara and IIIT Naya Raipur, during the lockdown, hopes to motivate youth from the Gond Adivasi community to learn the Gondi language
These days, Arka Manikrao from Telangana, Rainuram Markam from Chhattisgarh and Rabindranath from Odisha have been united by a common project: an Interactive Neural Machine Translation (INMT) tool, which translates sentences from Hindi to Gondi and vice-versa. This initiative is being led by Microsoft Research Lab and CGNet Swara, an Indian voice-based online portal that gives people in the forests of central tribal India a platform for expression by reporting local news and stories through a phone call. Anurag Shukla, a young student from the International Institute of Information Technology, Naya Raipur, has been lending his technical support to the project as well.
Interestingly, most of the project has been executed during the Covid-19-induced lockdown. “We did one workshop at the Microsoft Research Lab office in Bengaluru in 2019. But the app was developed during the lockdown and is likely to be released later this month,” says Shubhranshu Choudhary, a former journalist who co-founded CGNet Swara. So, for the past four months, more than 150 Gondi speakers from all walks of life, spread across the six states of Maharashtra, Chhattisgarh, Odisha, Andhra Pradesh, Telangana and Madhya Pradesh, have been sitting at home, translating sentences from Hindi to Gondi and back. One example is the Hindi sentence “Aag tho hamaare liye bhagwaan jaisa hai” (“Fire is like a god to us”), which when translated into Gondi reads: “Kis tho maawa lane bhagwaan leka aayun”.
Even though the language is spoken by nearly 12 million Gond Adivasis, it is not standardised, with different versions spoken in the six states. There is also no written literature in the language, with the dialects having been passed down orally over the centuries. Consequently, there are no local teachers, and those who come from outside speak only Hindi. “As a result, the new generation speaks mostly Hindi and the language is disappearing. This tool is an effort to save it,” says Dinesh Watti, a resident of Madhya Pradesh, who hails from the community and is leading the project on behalf of CGNet Swara. “Hopefully, when the youth have this app, easily accessible on the mobile, they will be motivated to learn the language.”
This project is an extension of the Microsoft Research Lab’s work on natural language processing, which includes a focus on low-resource languages such as Gondi, for which very little data is available. The team took on this project after getting to know of CGNet Swara’s various other language initiatives. “Gondi is a very good language to use as a case study as it has a substantial speaker base across six states. It is not endangered and yet zero resources are available for the same. Through CGNet Swara, we became aware of the various issues that the Gond Adivasis face, and how access to the language could help the cultural identity of the community,” says Kalika Bali, principal researcher, Microsoft Research Lab. By bringing together technology and language and providing easy access to people, the team hopes to inspire others who want to do similar work with other communities.
Instead of a top-down approach, in which the decisions of the technologists would be imposed on the community, the project focused on the ideas and desires of the community members. The workshop in Bengaluru, attended by academic partners, the CGNet Swara team and community members, turned out to be an enriching experience for all present. “Almost none of the Gond Adivasi representatives understood English. And not everyone at the lab speaks Hindi. But it was nice to see young people at the lab make an effort to communicate,” says Bali. One teammate, a Hindi speaker, volunteered to translate on the spot. Another intern got so inspired that he put together a hackathon project, a mobile app that would allow Gondi speakers to access content in the language on the mobile. And from there, the idea of the app came into being. “However, an Interactive Neural Machine Translation tool requires a substantial amount of data for the model to work. And there was very little data available in Gondi at that time,” adds Bali.
So, from 2019 onwards, after the Bengaluru meeting, the CGNet team embarked on a “yatra”. “Representatives from the community convened at Pharasgaon in Kondagaon district of Chhattisgarh and discussed the ways of collecting data. IIIT Naya Raipur and teachers from Bastar also started helping at that point,” says Watti.
As data started coming in, the project overcame its first hurdle last month, when its bank of translated sentences crossed 20,000. Today, the base has increased to 35,000 sentences and the aim is to have translations of at least 1 lakh sentences. “With the app, a window will open up for the community. Anything written in any major language in the world can then be translated and communicated to them. By making it a mobile app and not a browser-based one, the idea is to make it accessible in places where not many devices are available and internet services are inconsistent. It can even work offline,” says Bali.
Those from the community working on the app may or may not be tech-savvy. But the CGNet team is handholding them through the process by sending videos of the format and having a technical team call them constantly. “The various members see the sentences in Hindi and then submit audio of the translations. We are trying to do this in standardised Gondi,” says Watti.
This is one of the many language projects that CGNet Swara has been working on, one of the foremost being the standardised Gondi dictionary. Already 3,000-plus words have been added to it. In a 2018 interview with Lounge, Choudhary had talked about the significance of such initiatives. “If everyone has a standardised dictionary, then journalists, administrators or teachers can emerge from within the community. They don’t need to drop out of schools and take up the guns. They could work with All India Radio to start a news service in Gondi,” he had said. Recently the team has worked on another project, translating 400 children’s books by Pratham Books into Gondi. “The Chhattisgarh government recently announced that education in the state will start in tribal languages. The New Education Policy also states that education will be imparted in the mother tongue. But there are no books in Gondi. So, we have worked with Pratham Books and translated 400 books. This has not been published yet but is ready material, if the government wants. This will be the first written material done in standard Gondi,” he says.
FIRST PUBLISHED 14.08.2020 | 12:30 PM IST