Word frequency counting with NLTK

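NLTK provides FreqDist for frequency counting. The example below counts part-of-speech tags in the news category of the Brown corpus: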

import nltk
from nltk.corpus import brown

# Point NLTK at a local copy of the corpus data (adjust the path for your machine)
nltk.data.path = ["C:\\nltk_data\\nltk_data-gh-pages\\packages"]
tagged_words = brown.tagged_words(categories='news')
# print(tagged_words[:3])
# [('The', 'AT'), ('Fulton', 'NP-TL'), ('County', 'NN-TL')]
tagged_sents = brown.tagged_sents(categories='news')
# print(tagged_sents[:3])
# [[('The', 'AT'), ('Fulton', 'NP-TL')...
# Keep only the POS tag of each token, so the FreqDist below counts tags, not words
tags = [tag for (word, tag) in tagged_words]
# print(nltk.FreqDist(tags).max())
# NN
# print(nltk.FreqDist(tags).get('NN'))
# 13162
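
The same counting can be tried without downloading the corpus. The sketch below uses the standard library's collections.Counter, which is the class FreqDist subclasses, so indexing and most_common() behave the same way. The sample list is a small hand-made stand-in for brown.tagged_words(), not real corpus data, and the variable names are only illustrative.

```python
from collections import Counter

# Hypothetical hand-made sample in the same (word, tag) shape that
# brown.tagged_words() returns; it is NOT real corpus data.
sample_tagged_words = [
    ("The", "AT"), ("jury", "NN"), ("said", "VBD"),
    ("the", "AT"), ("election", "NN"), ("produced", "VBD"),
    ("no", "AT"), ("evidence", "NN"), ("of", "IN"), ("fraud", "NN"),
]

# Count tags, mirroring nltk.FreqDist(tags) in the listing above.
tag_counts = Counter(tag for (_word, tag) in sample_tagged_words)

# Count words instead, lowercased so that "The" and "the" are merged.
word_counts = Counter(word.lower() for (word, _tag) in sample_tagged_words)

# most_common(1) returns a list holding the single highest-count (item, count) pair.
top_tag, top_tag_count = tag_counts.most_common(1)[0]

# Relative frequency of a tag: its count divided by the total number of tokens.
nn_share = tag_counts["NN"] / sum(tag_counts.values())

print(top_tag, top_tag_count)
print(word_counts["the"])
print(nn_share)
```

Replacing Counter with nltk.FreqDist should give the same counts here; FreqDist adds helpers such as max() and freq() on top of the Counter interface.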

