Topical Phrase Extraction from Clinical Reports by Incorporating both Local and Global Context


Abstract in English

Making sense of words often requires to simultaneously examine the surrounding context of a term as well as the global themes characterizing the overall corpus. Several topic models have already exploited word embeddings to recognize local context, however, it has been weakly combined with the global context during the topic inference. This paper proposes to extract topical phrases corroborating the word embedding information with the global context detected by Latent Semantic Analysis, and then combine them by means of the P{o}lya urn model. To highlight the effectiveness of this combined approach the model was assessed analyzing clinical reports, a challenging scenario characterized by technical jargon and a limited word statistics available. Results show it outperforms the state-of-the-art approaches in terms of both topic coherence and computational cost.

Download