Unsupervised Segmentation
Boundary prediction
Methods using Branching Entropy (BE)
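As a rough illustration of the idea, the sketch below computes the right-side branching entropy of each substring, that is, the entropy of the character that follows it in a corpus. A high value suggests many different continuations and therefore a likely word boundary. The function name `right_branching_entropy`, the `max_len` parameter, and the toy corpus are assumptions made for this example and are not taken from any of the libraries listed here.

```python
import math
from collections import Counter, defaultdict


def right_branching_entropy(corpus, max_len=3):
    """Map each substring (up to max_len chars) to the entropy of its next character."""
    followers = defaultdict(Counter)
    for sentence in corpus:
        for i in range(len(sentence)):
            for n in range(1, max_len + 1):
                j = i + n
                if j >= len(sentence):
                    break  # no following character to count
                followers[sentence[i:j]][sentence[j]] += 1

    entropy = {}
    for prefix, counts in followers.items():
        total = sum(counts.values())
        # Shannon entropy (in bits) of the next-character distribution
        entropy[prefix] = -sum(
            (c / total) * math.log2(c / total) for c in counts.values()
        )
    return entropy


# Hypothetical toy corpus: "the" is followed by d, c, h, d
corpus = ["thedog", "thecat", "thehat", "thedot"]
be = right_branching_entropy(corpus)
```

In this toy corpus the entropy after "th" is zero (it is always followed by "e"), while the entropy after "the" is higher, which is the kind of rise a boundary predictor would look for.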
library
WordRank
KR-WordRank, a Korean keyword extractor that does not use a tokenizer
Articles
- A Simple and Effective Unsupervised Word Segmentation Approach
- KR-WordRank: An Unsupervised Korean Word Extraction Method Improving on WordRank
Library
- wordseg_wordrank in Python
Others
- Mutual Information (MI) / Sun, Shen, and Tsou (1998)
- WordEnds / Fleck (2008)
- Minimum Description Length (MDL) criterion and local statistics BE / Zhikov, Takamura, and Okumura (2010)
Word recognition
Accessor Variety
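A minimal sketch of the accessor variety idea: for each substring, count the distinct characters seen immediately before it and immediately after it, and take the smaller of the two counts. This simplified version ignores sentence boundaries, which the original formulation treats specially. The function name `accessor_variety`, the `max_len` parameter, and the toy corpus are assumptions for illustration only.

```python
from collections import defaultdict


def accessor_variety(corpus, max_len=3):
    """Map each substring to min(#distinct left neighbors, #distinct right neighbors)."""
    left = defaultdict(set)
    right = defaultdict(set)
    for sentence in corpus:
        for i in range(len(sentence)):
            for n in range(1, max_len + 1):
                j = i + n
                if j > len(sentence):
                    break
                s = sentence[i:j]
                if i > 0:
                    left[s].add(sentence[i - 1])   # character before s
                if j < len(sentence):
                    right[s].add(sentence[j])      # character after s
    candidates = set(left) | set(right)
    return {s: min(len(left[s]), len(right[s])) for s in candidates}


# Hypothetical toy corpus in which "dog" appears in varied contexts
corpus = ["adogb", "cdoge", "fdogg", "xdoy"]
av = accessor_variety(corpus)
```

Here "dog" has three distinct neighbors on each side, while "og" is always preceded by "d", so "dog" scores higher as a word candidate.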
library
Methods using Cohesion Probability
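The sketch below assumes one common prefix-based formulation of cohesion: for a prefix of n characters, the score is (count(prefix) / count(first character))^(1/(n-1)), the geometric mean of the character-to-character transition probabilities along the prefix. Whether this matches a particular library's definition is an assumption, and the function name `cohesion_scores`, the `max_len` parameter, and the toy tokens are hypothetical.

```python
from collections import Counter


def cohesion_scores(tokens, max_len=5):
    """Score each token prefix of length >= 2 by its forward cohesion."""
    prefix_counts = Counter()
    for token in tokens:
        for n in range(1, min(len(token), max_len) + 1):
            prefix_counts[token[:n]] += 1

    scores = {}
    for prefix, count in prefix_counts.items():
        n = len(prefix)
        if n < 2:
            continue  # cohesion is undefined for a single character
        first_char_count = prefix_counts[prefix[0]]
        scores[prefix] = (count / first_char_count) ** (1 / (n - 1))
    return scores


# Hypothetical whitespace-separated tokens
tokens = ["abc", "abc", "abd", "ax"]
scores = cohesion_scores(tokens)
```

A prefix whose characters almost always occur together gets a score near 1, so the highest-scoring prefix of a token is a natural candidate for its word part.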
library
Others
- Description Length Gain (DLG) / Kit and Wilks (1999)
- Dirichlet Process (DP) and Hierarchical Dirichlet Process (HDP) / Goldwater, Griffiths, and Johnson (2006)
Word Embedding
Methods using deep learning
library