Chinese name extraction (toy code: the accuracy is too low, so this is just for fun)
2022-07-02 13:16:00 【Fantasy elves_ cq】
Python official website: https://www.python.org/
Self-study is nothing mysterious. Over a lifetime, a person always spends more time studying alone than studying in school, and more time without a teacher than with one.
—— Hua Luogeng

- 1. Origin of this note
- 2. Directory structure
- 3. Trial run results
- 4. Complete source code for this exercise
Prompted by this comment, I took the plunge and gave it a try.
I happen to have a list of the Hundred Family Surnames on hand, so I combined it with a list of characters commonly used in Chinese given names to build a toy: "Chinese name extraction".
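The idea of pairing a surname list with a list of common name characters can be sketched in a few lines. This is a simplified illustration, not the code from this post: the tiny `SURNAMES` and `NAME_CHARS` samples and the `extract_names` function are hypothetical stand-ins for the real data files, and the sketch assumes a given name of one or two characters.

```python
import re

# Hypothetical miniature stand-ins for the surname and name-character lists.
SURNAMES = ["诸葛", "刘", "关", "张", "曹"]
NAME_CHARS = "备羽飞操亮云德"


def extract_names(text, surnames=SURNAMES, name_chars=NAME_CHARS):
    """Return every 'surname + one or two name characters' match in text."""
    # Put longer (compound) surnames first so they win over one-character ones.
    ordered = sorted(surnames, key=len, reverse=True)
    alternatives = "|".join(re.escape(s) for s in ordered)
    # Assumption: a given name is one or two characters from name_chars.
    pattern = re.compile(f"(?:{alternatives})[{re.escape(name_chars)}]{{1,2}}")
    return set(pattern.findall(text))


if __name__ == "__main__":
    sample = "刘备与关羽、张飞结义,后请诸葛亮出山,共抗曹操。"
    print(extract_names(sample))
```

Any ordinary word that happens to start with a surname character and continue with a common name character is matched as well, which is one reason the accuracy of this kind of toy stays low.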
Directory structure of the "toy"
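Judging only from the `open()` calls in the source code, the script expects a `data/` folder next to it holding four comma-separated word lists plus the texts to scan. A minimal sketch for checking that layout before running the script; `EXPECTED_DATA_FILES` and `missing_data_files` are hypothetical names, and the per-file descriptions in the comments are inferred from the file names rather than stated by the author.

```python
from pathlib import Path

# File paths copied from the open() calls in the source code.
# The descriptions are guesses based on the file names.
EXPECTED_DATA_FILES = [
    "data/firstnames_one_100.txt",  # presumably single-character surnames
    "data/firstnames_two_85.txt",   # presumably compound surnames
    "data/boy_names.txt",           # characters common in boys' names
    "data/girl_names.txt",          # characters common in girls' names
]


def missing_data_files(root="."):
    """Return the expected data files that do not exist under root."""
    return [p for p in EXPECTED_DATA_FILES if not (Path(root) / p).is_file()]


if __name__ == "__main__":
    print(missing_data_files())
```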
Trial run (using two texts, "The romance of The Three Kingdoms .txt" and "Dafeng is a watchman _19.txt", as test material)
Complete source code for this exercise
#!/usr/bin/env python
# coding: utf-8
'''
filename = 're_Chinese_name.py'
author = ' The dream spirit _cq'
time = '2022-06-29'
'''
from os import system
from re import findall  # Load the findall function from the re module.


class re_Chinese_name:
    ''' Extract Chinese names from text. '''

    def __init__(self):
        system('clear')  # Clear the terminal (Unix-like systems only).
        # Every data file is a comma-separated list; UTF-8 encoding is assumed.
        with open('data/firstnames_one_100.txt', encoding='utf-8') as f:
            self.firstnames = f.read().strip().split(',')
        with open('data/firstnames_two_85.txt', encoding='utf-8') as f:
            self.firstnames_two = f.read().strip().split(',')
        self.firstnames.extend(self.firstnames_two)
        with open('data/boy_names.txt', encoding='utf-8') as f:
            self.names_chr = f.read()
        with open('data/girl_names.txt', encoding='utf-8') as f:
            self.names_chr += f.read()
        self.names = "".join(self.names_chr.strip().split(','))
        # input(f"\n\n surnames: {self.firstnames}\n name characters: {self.names_chr}")

    def get_names(self, text):
        ''' Extract names; text is the text to extract names from. '''
        names = []
        for firstname in self.firstnames:
            if firstname in text:
                # Surname plus the next three word characters; isname() trims the excess.
                re_s = f"{firstname}" r'\w{3}'
                # print(re_s)  # Debugging statement.
                names.extend(findall(re_s, text))
        print(' Tidying up the extracted names ... '.center(39, '~'))
        return set(self.isname(list(set(names))))

    def isname(self, names_list):
        ''' Decide which candidates are Chinese names and trim them. '''
        names = []
        n = self.names_chr
        for name in names_list:
            if name[:2] in self.firstnames_two:
                # Compound surname: keep one or two name characters after it.
                # Explicit slices keep unchecked trailing characters out of the result.
                if name[2] in n and name[3] in n:
                    names.append(name[:4])
                elif name[2] in n:
                    names.append(name[:3])
            else:
                # Single-character surname: keep up to three name characters.
                if name[1] in n and name[2] in n and name[3] in n:
                    names.append(name)
                elif name[1] in n and name[2] in n:
                    names.append(name[:3])
                elif name[1] in n:
                    names.append(name[:2])
        return names


if __name__ == '__main__':
    rn = re_Chinese_name()
    with open('data/ The romance of The Three Kingdoms .txt', encoding='utf-8') as f:
        names = rn.get_names(f.read())
    with open('data/ Dafeng is a watchman _19.txt', encoding='utf-8') as f:
        names2 = rn.get_names(f.read())
    title = ' re Extract Chinese names '.center(44, '~')
    print(f"\n\n{title}\n\n《 The romance of The Three Kingdoms 》:\n{','.join(names)}"
          f"\n\n《 Dafeng is a watchman 》 Chapter nineteen :\n{','.join(names2)}\n\n")
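The trimming step in `isname` can also be written as one small helper, which makes the rule easier to test in isolation. This is only a sketch: `trim_candidate` and its parameters are hypothetical names, and the default `max_given=2` is an assumption (the source code above accepts up to three characters after a single-character surname).

```python
def trim_candidate(candidate, surname_len, name_chars, max_given=2):
    """Cut a raw 'surname + following characters' match down to a plausible name.

    Returns None when the first character after the surname is not a known
    name character.
    """
    kept = 0
    for ch in candidate[surname_len:surname_len + max_given]:
        if ch not in name_chars:
            break  # Stop at the first character that is not a name character.
        kept += 1
    return candidate[:surname_len + kept] if kept else None


if __name__ == "__main__":
    chars = "备亮"  # Hypothetical sample of common name characters.
    print(trim_candidate("刘备与关", 1, chars))    # 刘备
    print(trim_candidate("诸葛亮出山", 2, chars))  # 诸葛亮
    print(trim_candidate("张结义后", 1, chars))    # None
```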