TUTCRIS - Tampere University of Technology

BL-LDA: Bringing bigram to supervised topic model

Research output: peer-reviewed

Details

Original language: English
Title of host publication: Proceedings - 2015 International Conference on Computational Science and Computational Intelligence, CSCI 2015
Publisher: Institute of Electrical and Electronics Engineers Inc.
Pages: 83-88
Number of pages: 6
ISBN (electronic): 9781467397957
DOI - permanent links
Status: Published - 2 March 2016
OKM publication type: A4 Article in conference proceedings
Event: International Conference on Computational Science and Computational Intelligence, CSCI 2015 - Las Vegas, United States
Duration: 7 December 2015 - 9 December 2015

Conference

Conference: International Conference on Computational Science and Computational Intelligence, CSCI 2015
Country: United States
City: Las Vegas
Period: 7/12/15 - 9/12/15

Abstract

With the increasing amount of data being published on the Web, it is difficult to analyze its content within a short time. Topic modeling techniques can summarize textual data that contains several topics. Both labels (such as categories or tags) and word co-occurrence play a significant role in understanding textual data. However, many conventional topic modeling techniques are limited by the bag-of-words assumption. In this paper, we develop a probabilistic model called Bigram Labeled Latent Dirichlet Allocation (BL-LDA) to address this limitation. The proposed BL-LDA incorporates bigrams into the Labeled LDA (L-LDA) technique. Extensive experiments on Yelp data show that the proposed scheme is more accurate than L-LDA.
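
The bag-of-words limitation mentioned in the abstract can be illustrated with a minimal Python sketch. This is not the BL-LDA model, whose details are not given in this record; it only shows, under the assumption of simple whitespace tokenization, that two texts with different meanings can have identical word counts while their bigram counts differ. The function names and the two review snippets are hypothetical and are not taken from the paper or its Yelp data.

```python
from collections import Counter


def unigram_counts(tokens):
    """Bag-of-words view: count each word, discarding word order."""
    return Counter(tokens)


def bigram_counts(tokens):
    """Count adjacent word pairs, which preserves local word order."""
    return Counter(zip(tokens, tokens[1:]))


# Two made-up review snippets (illustrative only, not from the paper's data).
a = "service not good food great".split()
b = "service great food not good".split()

# The same words occur equally often in both snippets, so a bag-of-words
# representation cannot tell them apart.
same_bag = unigram_counts(a) == unigram_counts(b)

# The adjacent pairs differ, so a bigram representation can.
same_bigrams = bigram_counts(a) == bigram_counts(b)

print("identical bag-of-words:", same_bag)
print("identical bigram counts:", same_bigrams)
```

A bigram-aware topic model would condition on pairs like these rather than on isolated words; how BL-LDA does so is described in the paper itself, not in this sketch.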