“Building a Multilingual Wikipedia”
Communications of the ACM, April 2021, Vol. 64 No. 4, Pages 38-41
By Denny Vrandečić
“Wikifunctions and Abstract Wikipedia are expected to drive a number of research directions in knowledge representation, natural language generation, collaborative systems, and computer-aided software engineering.”
Wikipedia has more than 50 million articles in approximately 300 languages. The content in these languages is independently created and maintained. The knowledge in Wikipedia is very unevenly distributed across languages: some languages have more than a million articles, but more than 50 languages have only a few hundred articles or fewer. More importantly, the number of contributors is also very unevenly distributed: English Wikipedia has more than 418,000 contributors, while the second-most active edition, Spanish, drops to 90,000. More than half of the language editions have fewer than 10 contributors making more than four edits per month. To assume that fewer than 10 active contributors can write and maintain a comprehensive encyclopedia in their spare time is optimistic at best.
To close these knowledge gaps, we are building a multilingual Wikipedia where content is created only once but made available in all languages. The multilingual Wikipedia has two main components: Abstract Wikipedia, where the content is created and maintained in a language-independent notation, and Wikifunctions, a project to create, catalog, and maintain functions. For the multilingual Wikipedia, the most important function is one that takes content from Abstract Wikipedia and renders it in natural language, which in turn gets integrated into Wikipedia proper.
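The division of labor between the two components can be illustrated with a toy sketch. Everything in it is an assumption made for illustration: the `Instantiation` record, the `LEXICON` table, and the `render` function are hypothetical names, not the actual notation of Abstract Wikipedia or the API of Wikifunctions. The point is only that one language-independent record is written once and a per-language function turns it into a sentence.

```python
from dataclasses import dataclass
from typing import Callable, Dict


@dataclass(frozen=True)
class Instantiation:
    """Hypothetical language-independent content: 'entity is an instance of kind'."""
    entity: str  # illustrative identifier, not a real Wikidata ID
    kind: str


# Hypothetical per-language lexicon; German entries carry their indefinite article
# so the renderer does not have to guess grammatical gender.
LEXICON: Dict[str, Dict[str, Dict[str, str]]] = {
    "en": {
        "san_francisco": {"lemma": "San Francisco"},
        "city": {"lemma": "city"},
    },
    "de": {
        "san_francisco": {"lemma": "San Francisco"},
        "city": {"lemma": "Stadt", "indefinite": "eine"},
    },
}


def render_en(content: Instantiation) -> str:
    """Render the record as an English sentence."""
    lex = LEXICON["en"]
    noun = lex[content.kind]["lemma"]
    # Simplified article choice based on the first letter only.
    article = "an" if noun[0].lower() in "aeiou" else "a"
    return f"{lex[content.entity]['lemma']} is {article} {noun}."


def render_de(content: Instantiation) -> str:
    """Render the same record as a German sentence."""
    lex = LEXICON["de"]
    noun = lex[content.kind]
    return f"{lex[content.entity]['lemma']} ist {noun['indefinite']} {noun['lemma']}."


# Stand-in for a catalog of rendering functions, one per language.
RENDERERS: Dict[str, Callable[[Instantiation], str]] = {
    "en": render_en,
    "de": render_de,
}


def render(content: Instantiation, language: str) -> str:
    """Look up the renderer for the requested language and apply it."""
    return RENDERERS[language](content)


fact = Instantiation(entity="san_francisco", kind="city")
for language in ("en", "de"):
    print(language, render(fact, language))
```

In this sketch, adding a language means adding a lexicon and a renderer; the content record itself is not touched, which mirrors the goal of creating content only once.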
This will considerably reduce the effort required to create and maintain a comprehensive, current encyclopedia in many languages. It will allow more people to share more knowledge in more languages than ever before. It will be particularly useful for under-served languages, providing an important way to help improve education and ready access to knowledge in many countries.
About the Author:
Denny Vrandečić is Head of Special Projects at the Wikimedia Foundation in San Francisco, CA, USA.
- Architecture for a multilingual Wikipedia, Cornell University on arXiv.
- “Knowledge beyond the Graph: Toward a Multilingual Wikipedia” video. Part of the NSF Convergence Accelerator series.