Installing the lda Module in Python and Learning to Use It

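      When I first came across LDA I was quite curious about it. Today a project required me to look into some NLP topics, and the first thing that came to mind was LDA, one of the classic topic models. Many libraries, such as sklearn and gensim, already ship an LDA implementation, but I prefer a standalone module, so I installed the lda package. Installation is simple; just run the following command: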

pip install lda

      For details, see the lda package's documentation and API reference.

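      As a beginner I cannot yet work through it systematically. The API looks fairly simple, so this time I ran the example from the documentation to learn from it, with the goal of getting a basic feel for how lda works. The code is below; the documentation and comments are clear enough that I won't explain much further.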

#!/usr/bin/env python
# encoding: utf-8


'''
__Author__: 沂水寒城
Purpose: install the lda module and learn to use it
'''


import numpy as np
import lda
import lda.datasets
import matplotlib.pyplot as plt



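def load_data():
    '''
    Load the Reuters sample data bundled with lda.
    '''
    X = lda.datasets.load_reuters()
    vocab = lda.datasets.load_reuters_vocab()
    titles = lda.datasets.load_reuters_titles()
    print(X.shape)  # (number of documents, vocabulary size)
    print(X.sum())  # total number of word occurrences
    return X, vocab, titles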


def Model(X, vocab, titles):
    '''
    Fit the LDA model and print the results.
    '''
    model = lda.LDA(n_topics=20, n_iter=1500, random_state=1)
    model.fit(X)  # model.fit_transform(X) is also available
    topic_word = model.topic_word_  # model.components_ also works
    n_top_words = 8
    # Print the n_top_words most probable words of each topic
    for i, topic_dist in enumerate(topic_word):
        topic_words = np.array(vocab)[np.argsort(topic_dist)][:-(n_top_words+1):-1]
        print('Topic {}: {}'.format(i, ' '.join(topic_words)))
    print('-|'*50)
    # Print the most probable topic for the first 10 documents
    doc_topic = model.doc_topic_
    for i in range(10):
        print("{} (top topic: {})".format(titles[i], doc_topic[i].argmax()))
    print('-|'*50)
    # Plot the recorded log likelihoods, skipping the first 5 values
    plt.plot(model.loglikelihoods_[5:])
    plt.savefig('lda_test.png')


if __name__=='__main__':
    X,vocab,titles=load_data()
    Model(X,vocab,titles)
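
      The slicing expression that picks each topic's top words is quite compact, so here is a minimal sketch of what it does on a toy distribution. The vocabulary and probabilities below are made up for illustration and do not come from the Reuters data.

```python
import numpy as np

# Hypothetical toy vocabulary and one topic's word distribution (not Reuters data).
vocab = np.array(['church', 'pope', 'film', 'music', 'city'])
topic_dist = np.array([0.10, 0.40, 0.05, 0.30, 0.15])
n_top_words = 3

# argsort returns the indices that sort the probabilities in ascending order.
order = np.argsort(topic_dist)

# [:-(n_top_words+1):-1] walks backwards from the end and stops after
# n_top_words items, i.e. the most probable words, highest first.
top_words = vocab[order][:-(n_top_words + 1):-1]
print(' '.join(top_words))

# An equivalent form that may be easier to read: reverse, then take the first n.
same_words = vocab[order[::-1][:n_top_words]]
print(' '.join(same_words))
```

      Both forms select the same words; the second simply separates the "reverse" step from the "take n" step.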

      The output is as follows:

(395L, 4258L)
84010
INFO:lda:n_documents: 395
INFO:lda:vocab_size: 4258
INFO:lda:n_words: 84010
INFO:lda:n_topics: 20
INFO:lda:n_iter: 1500
INFO:lda:<0> log likelihood: -1051748
INFO:lda:<10> log likelihood: -719800
INFO:lda:<20> log likelihood: -699115
INFO:lda:<30> log likelihood: -689370
INFO:lda:<40> log likelihood: -684918
INFO:lda:<50> log likelihood: -681322
INFO:lda:<60> log likelihood: -678979
INFO:lda:<70> log likelihood: -676598
INFO:lda:<80> log likelihood: -675383
INFO:lda:<90> log likelihood: -673316
INFO:lda:<100> log likelihood: -672761
INFO:lda:<110> log likelihood: -671320
INFO:lda:<120> log likelihood: -669744
INFO:lda:<130> log likelihood: -669292
INFO:lda:<140> log likelihood: -667940
INFO:lda:<150> log likelihood: -668038
INFO:lda:<160> log likelihood: -667429
INFO:lda:<170> log likelihood: -666475
INFO:lda:<180> log likelihood: -665562
INFO:lda:<190> log likelihood: -664920
INFO:lda:<200> log likelihood: -664979
INFO:lda:<210> log likelihood: -664722
INFO:lda:<220> log likelihood: -664459
INFO:lda:<230> log likelihood: -664360
INFO:lda:<240> log likelihood: -663600
INFO:lda:<250> log likelihood: -664164
INFO:lda:<260> log likelihood: -663826
INFO:lda:<270> log likelihood: -663458
INFO:lda:<280> log likelihood: -663393
INFO:lda:<290> log likelihood: -662904
INFO:lda:<300> log likelihood: -662294
INFO:lda:<310> log likelihood: -662031
INFO:lda:<320> log likelihood: -662430
INFO:lda:<330> log likelihood: -661601
INFO:lda:<340> log likelihood: -662108
INFO:lda:<350> log likelihood: -662152
INFO:lda:<360> log likelihood: -661899
INFO:lda:<370> log likelihood: -661012
INFO:lda:<380> log likelihood: -661278
INFO:lda:<390> log likelihood: -661085
INFO:lda:<400> log likelihood: -660418
INFO:lda:<410> log likelihood: -660510
INFO:lda:<420> log likelihood: -660343
INFO:lda:<430> log likelihood: -659789
INFO:lda:<440> log likelihood: -659336
INFO:lda:<450> log likelihood: -659039
INFO:lda:<460> log likelihood: -659329
INFO:lda:<470> log likelihood: -658707
INFO:lda:<480> log likelihood: -658879
INFO:lda:<490> log likelihood: -658819
INFO:lda:<500> log likelihood: -658407
INFO:lda:<510> log likelihood: -658651
INFO:lda:<520> log likelihood: -658111
INFO:lda:<530> log likelihood: -658018
INFO:lda:<540> log likelihood: -658111
INFO:lda:<550> log likelihood: -657925
INFO:lda:<560> log likelihood: -657860
INFO:lda:<570> log likelihood: -657494
INFO:lda:<580> log likelihood: -657723
INFO:lda:<590> log likelihood: -657591
INFO:lda:<600> log likelihood: -657557
INFO:lda:<610> log likelihood: -657505
INFO:lda:<620> log likelihood: -657730
INFO:lda:<630> log likelihood: -657304
INFO:lda:<640> log likelihood: -657208
INFO:lda:<650> log likelihood: -657518
INFO:lda:<660> log likelihood: -657541
INFO:lda:<670> log likelihood: -657381
INFO:lda:<680> log likelihood: -657575
INFO:lda:<690> log likelihood: -656985
INFO:lda:<700> log likelihood: -656815
INFO:lda:<710> log likelihood: -656930
INFO:lda:<720> log likelihood: -656538
INFO:lda:<730> log likelihood: -656291
INFO:lda:<740> log likelihood: -656417
INFO:lda:<750> log likelihood: -656747
INFO:lda:<760> log likelihood: -656600
INFO:lda:<770> log likelihood: -656269
INFO:lda:<780> log likelihood: -656311
INFO:lda:<790> log likelihood: -656069
INFO:lda:<800> log likelihood: -656228
INFO:lda:<810> log likelihood: -656178
INFO:lda:<820> log likelihood: -655694
INFO:lda:<830> log likelihood: -655997
INFO:lda:<840> log likelihood: -656224
INFO:lda:<850> log likelihood: -656197
INFO:lda:<860> log likelihood: -655889
INFO:lda:<870> log likelihood: -656180
INFO:lda:<880> log likelihood: -656997
INFO:lda:<890> log likelihood: -655989
INFO:lda:<900> log likelihood: -655615
INFO:lda:<910> log likelihood: -655584
INFO:lda:<920> log likelihood: -656602
INFO:lda:<930> log likelihood: -656083
INFO:lda:<940> log likelihood: -656294
INFO:lda:<950> log likelihood: -656257
INFO:lda:<960> log likelihood: -656243
INFO:lda:<970> log likelihood: -656028
INFO:lda:<980> log likelihood: -655603
INFO:lda:<990> log likelihood: -656012
INFO:lda:<1000> log likelihood: -655849
INFO:lda:<1010> log likelihood: -655376
INFO:lda:<1020> log likelihood: -655417
INFO:lda:<1030> log likelihood: -655856
INFO:lda:<1040> log likelihood: -655197
INFO:lda:<1050> log likelihood: -655938
INFO:lda:<1060> log likelihood: -655529
INFO:lda:<1070> log likelihood: -655092
INFO:lda:<1080> log likelihood: -655119
INFO:lda:<1090> log likelihood: -656215
INFO:lda:<1100> log likelihood: -655602
INFO:lda:<1110> log likelihood: -655296
INFO:lda:<1120> log likelihood: -655547
INFO:lda:<1130> log likelihood: -655580
INFO:lda:<1140> log likelihood: -655604
INFO:lda:<1150> log likelihood: -655168
INFO:lda:<1160> log likelihood: -655281
INFO:lda:<1170> log likelihood: -655409
INFO:lda:<1180> log likelihood: -655517
INFO:lda:<1190> log likelihood: -654922
INFO:lda:<1200> log likelihood: -655304
INFO:lda:<1210> log likelihood: -655852
INFO:lda:<1220> log likelihood: -655184
INFO:lda:<1230> log likelihood: -655650
INFO:lda:<1240> log likelihood: -655606
INFO:lda:<1250> log likelihood: -656086
INFO:lda:<1260> log likelihood: -655698
INFO:lda:<1270> log likelihood: -655351
INFO:lda:<1280> log likelihood: -655686
INFO:lda:<1290> log likelihood: -654801
INFO:lda:<1300> log likelihood: -654973
INFO:lda:<1310> log likelihood: -655186
INFO:lda:<1320> log likelihood: -655128
INFO:lda:<1330> log likelihood: -655365
INFO:lda:<1340> log likelihood: -655338
INFO:lda:<1350> log likelihood: -655219
INFO:lda:<1360> log likelihood: -655115
INFO:lda:<1370> log likelihood: -654930
INFO:lda:<1380> log likelihood: -655209
INFO:lda:<1390> log likelihood: -654940
INFO:lda:<1400> log likelihood: -655055
INFO:lda:<1410> log likelihood: -655286
INFO:lda:<1420> log likelihood: -655316
INFO:lda:<1430> log likelihood: -655257
INFO:lda:<1440> log likelihood: -654964
INFO:lda:<1450> log likelihood: -654884
INFO:lda:<1460> log likelihood: -655493
INFO:lda:<1470> log likelihood: -655415
INFO:lda:<1480> log likelihood: -655192
INFO:lda:<1490> log likelihood: -655728
INFO:lda:<1499> log likelihood: -655858
Topic 0: british churchill sale million major letters west britain
Topic 1: church government political country state people party against
Topic 2: elvis king fans presley life concert young death
Topic 3: yeltsin russian russia president kremlin moscow michael operation
Topic 4: pope vatican paul john surgery hospital pontiff rome
Topic 5: family funeral police miami versace cunanan city service
Topic 6: simpson former years court president wife south church
Topic 7: order mother successor election nuns church nirmala head
Topic 8: charles prince diana royal king queen parker bowles
Topic 9: film french france against bardot paris poster animal
Topic 10: germany german war nazi letter christian book jews
Topic 11: east peace prize award timor quebec belo leader
Topic 12: n't life show told very love television father
Topic 13: years year time last church world people say
Topic 14: mother teresa heart calcutta charity nun hospital missionaries
Topic 15: city salonika capital buddhist cultural vietnam byzantine show
Topic 16: music tour opera singer israel people film israeli
Topic 17: church catholic bernardin cardinal bishop wright death cancer
Topic 18: harriman clinton u.s ambassador paris president churchill france
Topic 19: city museum art exhibition century million churches set
-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
0 UK: Prince Charles spearheads British royal revolution. LONDON 1996-08-20 (top topic: 8)
1 GERMANY: Historic Dresden church rising from WW2 ashes. DRESDEN, Germany 1996-08-21 (top topic: 13)
2 INDIA: Mother Teresa's condition said still unstable. CALCUTTA 1996-08-23 (top topic: 14)
3 UK: Palace warns British weekly over Charles pictures. LONDON 1996-08-25 (top topic: 8)
4 INDIA: Mother Teresa, slightly stronger, blesses nuns. CALCUTTA 1996-08-25 (top topic: 14)
5 INDIA: Mother Teresa's condition unchanged, thousands pray. CALCUTTA 1996-08-25 (top topic: 14)
6 INDIA: Mother Teresa shows signs of strength, blesses nuns. CALCUTTA 1996-08-26 (top topic: 14)
7 INDIA: Mother Teresa's condition improves, many pray. CALCUTTA, India 1996-08-25 (top topic: 14)
8 INDIA: Mother Teresa improves, nuns pray for "miracle". CALCUTTA 1996-08-26 (top topic: 14)
9 UK: Charles under fire over prospect of Queen Camilla. LONDON 1996-08-26 (top topic: 8)
-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|-|
[Finished in 21.6s]
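
      The "top topic" printed for each title is just the index of the largest entry in that document's row of `doc_topic_`, where each row is the document's distribution over topics. A small sketch with a hand-made matrix (all values hypothetical) shows the idea:

```python
import numpy as np

# Hypothetical document-topic matrix: 3 documents x 4 topics, each row sums to 1.
doc_topic = np.array([
    [0.70, 0.10, 0.10, 0.10],
    [0.05, 0.15, 0.20, 0.60],
    [0.25, 0.50, 0.15, 0.10],
])

# Index of the most probable topic for every document.
top_topics = doc_topic.argmax(axis=1)
# Probability of that topic, which shows how dominant it is.
top_probs = doc_topic.max(axis=1)

for i, (t, p) in enumerate(zip(top_topics, top_probs)):
    print("doc {}: top topic {} (p={:.2f})".format(i, t, p))
```

      Looking at the probability as well as the index can help, since a document whose top topic has only a small lead is assigned less confidently.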

       The resulting plot (saved as lda_test.png) is shown below:

    

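      From the results above, the log likelihood rises steeply in the early iterations and the model gradually stabilizes as the iteration count on the horizontal axis grows. So in practice, if the curve is still rising when training stops, you can increase the number of iterations to get a better result.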
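
      One rough way to judge whether more iterations are likely to help is to compare the average log likelihood over the last few recorded values with the average over the values just before them. The sketch below is only an illustration: the helper name `has_plateaued`, the window size, and the tolerance are my own assumptions, not part of the lda API. The numbers are copied from the log above.

```python
def has_plateaued(loglikelihoods, window=5, tol=1e-3):
    """Return True if the mean of the last `window` values differs from the
    mean of the preceding `window` values by less than `tol` (relative)."""
    if len(loglikelihoods) < 2 * window:
        return False  # not enough history to compare two windows
    recent = loglikelihoods[-window:]
    previous = loglikelihoods[-2 * window:-window]
    mean_recent = sum(recent) / float(window)
    mean_previous = sum(previous) / float(window)
    return abs(mean_recent - mean_previous) / abs(mean_previous) < tol


# Logged values for iterations 0-90 (one value every 10 iterations).
early = [-1051748, -719800, -699115, -689370, -684918,
         -681322, -678979, -676598, -675383, -673316]
# Logged values for iterations 1400-1490.
late = [-655055, -655286, -655316, -655257, -654964,
        -654884, -655493, -655415, -655192, -655728]

print(has_plateaued(early))  # still improving quickly
print(has_plateaued(late))   # changes are within the tolerance
```

      In a real run the same check could be applied to `model.loglikelihoods_`; the threshold would need tuning for the data at hand.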
