Review and Implementation of Topic Modeling in Hindi

Santosh Kumar Ray, Amir Ahmad, Ch Aswani Kumar

Research output: Contribution to journal › Article › peer-review

35 Citations (Scopus)

Abstract

Due to the widespread use of electronic devices and the growing popularity of social media, text data is being generated at a rate never seen before. It is not possible for humans to read all of the generated data and find what is being discussed in their field of interest. Topic modeling is a technique for identifying the topics present in a large set of text documents. In this paper, we discuss the widely used techniques and tools for topic modeling. There has been a lot of research on topic modeling in English, but there has been little progress in resource-scarce languages like Hindi, despite Hindi being spoken by millions of people across the world. We discuss the challenges faced in developing topic models for Hindi, and we apply the Latent Semantic Indexing (LSI), Non-negative Matrix Factorization (NMF), and Latent Dirichlet Allocation (LDA) algorithms to topic modeling in Hindi. The outcomes of topic modeling algorithms are usually difficult for the common user to interpret, so we use various visualization techniques to represent them in a meaningful way. We then use metrics such as perplexity and coherence to evaluate the topic models. The results of topic modeling in Hindi appear promising and are comparable to some results reported in the literature on English datasets.
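
As an illustration of the coherence evaluation mentioned in the abstract, the following minimal Python sketch computes a UMass-style coherence score for a topic's top words. The abstract does not say which coherence variant or tokenizer was used, so the UMass formula, the whitespace tokenization, the toy four-document Hindi corpus, and the names `umass_coherence`, `documents`, `coherent_topic`, and `mixed_topic` are all assumptions made for illustration, not the authors' implementation.

```python
import math

def umass_coherence(topic_words, documents):
    """UMass-style coherence for an ordered list of a topic's top words.

    Sums log((D(w_m, w_l) + 1) / D(w_l)) over every pair with l < m,
    where D counts the documents that contain all of the given words.
    """
    # Assumption: plain whitespace tokenization; real Hindi text would
    # likely also need normalization and stop-word removal.
    doc_sets = [set(doc.split()) for doc in documents]

    def doc_freq(*words):
        return sum(1 for tokens in doc_sets if all(w in tokens for w in words))

    score = 0.0
    for m in range(1, len(topic_words)):
        for l in range(m):
            d_l = doc_freq(topic_words[l])
            if d_l == 0:
                continue  # word never occurs in the corpus; skip the pair
            co_occurrences = doc_freq(topic_words[m], topic_words[l])
            score += math.log((co_occurrences + 1) / d_l)
    return score

# Hypothetical toy corpus: two cricket documents and two election documents.
documents = [
    "क्रिकेट मैच भारत जीत",
    "भारत क्रिकेट टीम मैच",
    "चुनाव सरकार मतदान नेता",
    "सरकार चुनाव नेता भाषण",
]

coherent_topic = ["क्रिकेट", "मैच", "भारत"]  # words drawn from one theme
mixed_topic = ["क्रिकेट", "चुनाव", "नेता"]   # words drawn from two themes

coherent_score = umass_coherence(coherent_topic, documents)
mixed_score = umass_coherence(mixed_topic, documents)
print(coherent_score, mixed_score)
```

On this toy corpus, the topic whose words co-occur in the same documents scores higher than the topic that mixes two themes, which is the behaviour a coherence metric is meant to capture.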

Original language: English
Pages (from-to): 979-1007
Number of pages: 29
Journal: Applied Artificial Intelligence
Volume: 33
Issue number: 11
Publication status: Published - Sept 19 2019

ASJC Scopus subject areas

  • Artificial Intelligence
