OBDILCI's project to create indicators of the presence of languages on the Internet started in 2017. It reached version 3.2 in April 2023 and version 4 in May 2023.
The project has been conducted by OBDILCI, in collaboration with the UNESCO Chair on Language Policies for Multilingualism from version 2 (April 2021) onwards. It has been funded by Organisation de la Francophonie, for previous studies and for versions 1 and 3, and by the Brazilian Ministry of Foreign Affairs, via Instituto da Lingua Portuguesa, for version 2. The implementation of the database access and the publication of the methodology in Frontiers in Research Metrics and Analytics have been funded by Délégation générale à la langue française et aux langues de France, of the French Ministry of Culture, together with the Permanent Delegation of Brazil to UNESCO.
The indicators produced by OBDILCI are available under a CC-BY-SA 4.0 license, either as Excel files from https://obdilci.org/Results or through a database query at https://obdilci.org/Base. The results from version 3.0 are fully described in the peer-reviewed, open data article "Resource: Indicators on the Presence of Languages in Internet". The indicators' figures are displayed with 2, 3 or 4 decimal places, depending on the context. However, the contents indicator computed by the model comes with a rather large confidence interval (±20%), so differences of less than 20% are not significant when comparing the contents of two languages.
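As an illustration of the ±20% rule, the comparison between two contents indicators can be sketched as follows. This is a minimal sketch, not part of the OBDILCI tooling: the function name `is_significant` is hypothetical, and measuring the difference relative to the larger of the two values is an assumption, since the text does not state which value serves as the reference.

```python
def is_significant(value_a: float, value_b: float, tolerance: float = 0.20) -> bool:
    """Return True when two indicator values differ by at least `tolerance`.

    Assumption: the difference is measured relative to the larger value.
    """
    larger = max(value_a, value_b)
    if larger <= 0:
        # Two zero (or non-positive) indicators cannot be ranked.
        return False
    relative_difference = abs(value_a - value_b) / larger
    return relative_difference >= tolerance


# Hypothetical percentages, for illustration only.
print(is_significant(5.0, 4.5))  # difference of 10%: not significant
print(is_significant(5.0, 3.0))  # difference of 40%: significant
```

With this reading, two languages whose indicators are 5.0 and 4.5 should be treated as tied, regardless of how many decimal places are displayed.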
The English names of the languages corresponding to ISO 639 codes are those used by Ethnologue. To translate the names into French and Portuguese, the process starts from https://en.wikipedia.org/wiki/Wikipedia:WikiProject_Languages/List_of_ISO_639-3_language_codes_(2019), takes the English Wikipedia article for each language, and follows Wikipedia's "other languages" link to the French article and, from there, to the Portuguese article.
If no French article is linked, the same process is applied starting from the Portuguese article. If no corresponding translation is found, further research is carried out. If no translation can be confirmed, the English name is kept as the translation.
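The fallback chain described above can be sketched as a small lookup function. This is only one possible reading of the process, under stated assumptions: the interlanguage links are represented by a hypothetical dictionary `LINKS` rather than fetched from Wikipedia, `RESEARCHED` stands in for names found by further research, and the article titles are placeholders, not real data.

```python
# Hypothetical interlanguage links: (wiki, article title) -> {wiki: title}.
LINKS = {
    ("en", "Example language"): {"fr": "Langue exemple"},
    ("fr", "Langue exemple"): {"pt": "Língua exemplo"},
    ("en", "Other language"): {"pt": "Outra língua"},
    ("pt", "Outra língua"): {"fr": "Autre langue"},
}

# Hypothetical names found by further research: (wiki, English name) -> name.
RESEARCHED = {("fr", "Third language"): "Troisième langue"}


def translated_names(english_name, links, researched=None):
    """Return (french, portuguese) names, falling back to the English name."""
    researched = researched or {}
    en_links = links.get(("en", english_name), {})
    french = en_links.get("fr")
    if french is not None:
        # Usual path: English article -> French article -> Portuguese article.
        portuguese = links.get(("fr", french), {}).get("pt")
    else:
        # No French article: start from the Portuguese article instead.
        portuguese = en_links.get("pt")
        if portuguese is not None:
            french = links.get(("pt", portuguese), {}).get("fr")
    # Further research, then the English name as a last resort.
    french = french or researched.get(("fr", english_name)) or english_name
    portuguese = portuguese or researched.get(("pt", english_name)) or english_name
    return french, portuguese


print(translated_names("Example language", LINKS, RESEARCHED))
print(translated_names("Third language", LINKS, RESEARCHED))
```

Keeping the English name as the final fallback means every language always has a value in each of the three name columns.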
For the localized name the first source is the autonyms field of https://www.ethnologue.com/language/isocode and the second source is the Wikipedia article of the corresponding language.
For macro-languages, with a few exceptions for macro-languages that group a very large number of languages, all forms of the corresponding languages are searched for and listed, separated by a "+".
A small number of localized names could not be found and are left empty (this may be the consequence of a Unicode conflict or of the lack of a standard).
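The selection of localized names can be sketched in the same spirit. This is a minimal sketch under stated assumptions: `AUTONYMS`, `WIKIPEDIA_NAMES` and `MACRO_MEMBERS` are hypothetical in-memory tables with placeholder codes and names, and the exceptions for very large macro-languages are not modelled.

```python
# Hypothetical placeholder data; codes and names are not real entries.
AUTONYMS = {"xx1": "Name One"}                 # first source: Ethnologue autonyms
WIKIPEDIA_NAMES = {"xx2": "Name Two"}          # second source: Wikipedia article
MACRO_MEMBERS = {"xxm": ["xx1", "xx2", "xx3"]}  # macro-language -> member codes


def localized_name(code, autonyms, wikipedia_names, macro_members=None):
    """Return the localized name for a code, or "" when none is found."""

    def lookup(single_code):
        # Prefer the autonym, then the Wikipedia name, else leave empty.
        return autonyms.get(single_code) or wikipedia_names.get(single_code) or ""

    members = (macro_members or {}).get(code)
    if not members:
        return lookup(code)
    names = []
    for member in members:
        name = lookup(member)
        if name and name not in names:  # skip missing names and duplicates
            names.append(name)
    return "+".join(names)


print(localized_name("xxm", AUTONYMS, WIKIPEDIA_NAMES, MACRO_MEMBERS))
print(repr(localized_name("xx3", AUTONYMS, WIKIPEDIA_NAMES, MACRO_MEMBERS)))
```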
If you would like to help fill these gaps, please use the contact form and indicate your source.