Text Classification - Survey

Tasks

  • short text classification
  • long text classification
  • very short text (a single word) classification

Domain-specific text classification

  • aspect-level classification
  • ss

methods:

  • word-level
    • tfidf + svm/lr
    • fastText (Facebook; used only as a baseline)
    • lstm, bilstm
    • lstm + attention
    • cnn code1 code2
    • gated cnn
    • rcnn
  • char-level
    • what does char-level input contribute? see NLP.md
    • char cnn (Zhang and LeCun, 2015)
    • char rnn
    • char-CRNN (Xiao and Cho, 2016)
    • char-rnn + word rnn (Finding Function in Form: Compositional Character Models for Open Vocabulary Word Representation)
    • char-cnn + word rnn
  • Hierarchical:
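
The first word-level entry above (tfidf + svm/lr) is the usual starting baseline, and the same pipeline can also run on character n-grams. Below is a minimal sketch in plain Python, assuming binary labels and a toy dataset. The names `tokenize` and `TfidfLogReg` are hypothetical and written only for illustration; in practice a library such as scikit-learn would normally be used instead.

```python
import math
from collections import Counter


def tokenize(text, level="word", n=3):
    """Split text into word tokens or overlapping character n-grams."""
    s = text.lower()
    if level == "word":
        return s.split()
    return [s[i:i + n] for i in range(max(len(s) - n + 1, 1))]


class TfidfLogReg:
    """Binary classifier: L2-normalized TF-IDF features + logistic regression (SGD)."""

    def __init__(self, level="word", lr=0.5, epochs=200):
        self.level, self.lr, self.epochs = level, lr, epochs

    @staticmethod
    def _sigmoid(z):
        z = max(min(z, 30.0), -30.0)  # clamp to avoid overflow in exp
        return 1.0 / (1.0 + math.exp(-z))

    def _features(self, text):
        # Term counts weighted by IDF; tokens unseen in training are dropped.
        counts = Counter(t for t in tokenize(text, self.level) if t in self.idf)
        vec = {t: c * self.idf[t] for t, c in counts.items()}
        norm = math.sqrt(sum(v * v for v in vec.values())) or 1.0
        return {t: v / norm for t, v in vec.items()}

    def _score(self, x):
        return self._sigmoid(self.b + sum(self.w[t] * v for t, v in x.items()))

    def fit(self, texts, labels):
        n_docs = len(texts)
        df = Counter()
        for text in texts:
            df.update(set(tokenize(text, self.level)))
        # Smoothed inverse document frequency.
        self.idf = {t: math.log((1 + n_docs) / (1 + d)) + 1.0 for t, d in df.items()}
        self.w = {t: 0.0 for t in self.idf}
        self.b = 0.0
        feats = [self._features(t) for t in texts]
        for _ in range(self.epochs):
            for x, y in zip(feats, labels):
                g = self._score(x) - y  # gradient of the log loss w.r.t. the logit
                self.b -= self.lr * g
                for t, v in x.items():
                    self.w[t] -= self.lr * g * v
        return self

    def predict_proba(self, text):
        return self._score(self._features(text))

    def predict(self, text):
        return int(self.predict_proba(text) >= 0.5)


# Toy data, invented for this example only.
train_texts = [
    "great movie loved it",
    "wonderful acting and great story",
    "loved the wonderful cast",
    "terrible movie hated it",
    "awful acting and boring story",
    "hated the boring plot",
]
train_labels = [1, 1, 1, 0, 0, 0]

clf = TfidfLogReg(level="word").fit(train_texts, train_labels)
pos_pred = clf.predict("great wonderful loved")
neg_pred = clf.predict("awful boring hated")
```

Passing `level="char"` switches the same model to character trigrams, which is one cheap way to compare word-level and char-level inputs before moving on to the neural models in the list.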

datasets &

paper & implementation

http://www.jianshu.com/p/4fbc4939509f
[RNN+Attention-code]: https://richliao.github.io/supervised/classification/2016/12/26/textclassifier-RNN/

* [1] P. Bojanowski*, E. Grave*, A. Joulin, T. Mikolov, Enriching Word Vectors with Subword Information
* [2] A. Joulin, E. Grave, P. Bojanowski, T. Mikolov, Bag of Tricks for Efficient Text Classification
* [3] A. Joulin, E. Grave, P. Bojanowski, M. Douze, H. Jégou, T. Mikolov, FastText.zip: Compressing text classification models
  • https://github.com/scharmchi/char-level-cnn-tf
  • !!!!!! char-level deep learning https://offbit.github.io/how-to-read/ https://github.com/offbit/char-models

tutorial & survey & blog

http://www.jeyzhang.com/cnn-apply-on-modelling-sentence.html
https://zhuanlan.zhihu.com/p/25928551

web service

1. watson NLC: https://www.ibm.com/watson/developercloud/natural-language-classifier/api/v1
2. songfang NLC

code

  • collection of models https://github.com/brightmart/text_classification