The Tower of Babel Meets Web 2.0: User-Generated Content and Its Applications in a Multilingual Context


Abstract in English

This study explores language's fragmenting effect on user-generated content by examining the diversity of knowledge representations across 25 Wikipedia language editions. This diversity is measured at two levels: the concepts that are included in each edition and the ways in which these concepts are described. We demonstrate that the diversity present is greater than has been presumed in the literature and has a significant influence on applications that use Wikipedia as a source of world knowledge. We close by explaining how knowledge diversity can be leveraged to create culturally aware applications and hyperlingual applications.
