Day 8. Developing Simplified Chinese Psychological Linguistic Analysis Dictionary for Microblog
2022-07-27 05:12:00 【无知的研究生】
Title:
Developing Simplified Chinese Psychological Linguistic Analysis Dictionary for Microblog
为微博开发简体中文心理语言分析词典
Keywords:
LIWC,
Traditional Chinese, 繁体中文
Simplified Chinese, 简体中文
microblog, 微博
text analysis. 文本分析
Abstract:
The words that people use can reveal their emotional states, intentions, thinking styles, individual differences, and more. LIWC (Linguistic Inquiry and Word Count) has been widely used for psychological text analysis, and its dictionary is its core. A Traditional Chinese version of the LIWC dictionary has been released; it is a translation of the English LIWC dictionary. However, Simplified Chinese, which is the world’s most widely used language, differs subtly from Traditional Chinese. Furthermore, both the English LIWC dictionary and the Traditional Chinese dictionary were developed for relatively formal text. Microblogs have become increasingly popular in China. The original LIWC dictionaries give little consideration to popular microblog words, which makes them less applicable to microblog text analysis. In this study, a Simplified Chinese LIWC dictionary is established according to the LIWC categories. After the Traditional Chinese dictionary was translated into Simplified Chinese, the five thousand words most frequently used on microblogs were added to the dictionary. Four graduate students of psychology rated whether each word belonged in a category. The reliability and validity of the Simplified Chinese LIWC dictionary were tested by these four judges. This new dictionary could contribute to future text analysis on microblogs.
人们使用的词语可以揭示他们的情绪状态、意图、思维方式、个体差异等。 语言查询和词数统计(LIWC)被广泛应用于心理语篇分析,词典是其核心。《LIWC词典》的繁体中文版已经发行,它是LIWC英语词典的翻译。然而,作为世界上使用最广泛的语言,简体中文与繁体中文有着微妙的区别。此外,英语LIWC词典和繁体中文词典都是为相对正式的文本而开发的。如今微博在中国越来越流行。原有的LIWC词典对微博流行词的考虑较少,不适合微博文本分析。本研究根据LIWC的分类,建立了一个简体中文LIWC词典。在将繁体中文词典翻译成简体中文后,微博上最常用的五千个单词被加入词典。四名心理学研究生对每个词是否属于一个范畴进行了评分。通过这四位评委对《简化汉语LIWC词典》的信度和效度进行了检验。这部新词典将有助于今后微博上所有的文本分析。
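As a rough illustration of the dictionary-based counting the abstract describes, here is a minimal Python sketch. It assumes the text has already been segmented into tokens (Chinese text needs a word segmenter first), and the dictionary entries and category names below are toy values invented for the example, not taken from the SCLIWC dictionary.

```python
from collections import Counter

# Toy dictionary mapping a word to the categories it belongs to.
# These entries and category names are hypothetical, for illustration only.
TOY_DICT = {
    "happy": {"affect", "posemo"},
    "sad": {"affect", "negemo"},
    "friend": {"social"},
    "think": {"cogmech"},
}


def category_percentages(tokens, dictionary):
    """Return, per category, the percentage of tokens that fall in it."""
    counts = Counter()
    for tok in tokens:
        # A word may belong to several categories; count it once in each.
        for cat in dictionary.get(tok, ()):
            counts[cat] += 1
    total = len(tokens)
    if total == 0:
        return {}
    return {cat: 100.0 * n / total for cat, n in counts.items()}


tokens = ["i", "think", "my", "friend", "is", "happy"]
result = category_percentages(tokens, TOY_DICT)
for cat in sorted(result):
    print(f"{cat}: {result[cat]:.1f}%")
```

Each category score is a share of all tokens in the text, so words missing from the dictionary lower every percentage. This is why adding frequent microblog words to the dictionary matters for this kind of analysis.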
Conclusion:
The percentage of words captured by the SCLIWC dictionary indicates that word usage in internet environments such as Sina microblog is much more diverse than in formal text materials [9, 14]. The percentage of words captured by the SCMBWC dictionary improves by more than 10 percent, with more words captured in particular in the psychological processes category and its subcategories, such as social processes, affective processes, and cognitive processes. The internal reliability and external validity of the two dictionaries are well supported by four groups of judges. SCLIWC bridges the gap between the LIWC software and Simplified Chinese. Moreover, SCMBWC suggests a promising approach for further text analysis of Simplified Chinese in various internet environments.
SCLIWC词典所捕获单词的百分比表明,新浪微博等网络环境下的词汇使用比正式文本材料[9, 14]更加多样化。SCMBWC词典收录词的百分比提高了10%以上,尤其是在心理过程类及其子类中,如社会过程、情感过程等,捕捉到了更多的词汇,这两部词典的内部信度和外部效度都得到了四组评委的充分保证。SCLIWC弥补了LIWC软件与简体中文之间的差距。此外,SCMBWC为进一步分析各种网络环境下的简体中文文本提供了一种很有前景的方法。
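The "percentage of words captured" mentioned in the conclusion can be sketched as a simple coverage rate: the share of tokens in a text that appear in a dictionary's vocabulary. The snippet below is only an assumed reading of that metric, and both vocabularies are toy sets made up for the example. They stand in for a base dictionary and one extended with popular microblog words, and are not the actual SCLIWC or SCMBWC word lists.

```python
def capture_rate(tokens, vocabulary):
    """Percentage of tokens that appear in the given vocabulary."""
    if not tokens:
        return 0.0
    hits = sum(1 for tok in tokens if tok in vocabulary)
    return 100.0 * hits / len(tokens)


# Hypothetical vocabularies: a base set and the same set plus slang terms.
base_vocab = {"think", "friend", "happy"}
extended_vocab = base_vocab | {"lol", "omg"}

tokens = ["omg", "my", "friend", "is", "so", "happy", "lol", "lol"]

base = capture_rate(tokens, base_vocab)
extended = capture_rate(tokens, extended_vocab)
print(f"base: {base:.1f}%  extended: {extended:.1f}%")
```

Running the same text through both vocabularies makes the comparison direct: any gain comes only from the added words.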