Chinese Name Extraction (toy code: the accuracy is poor, so treat it as a bit of fun)
2022-07-02 09:46:00 【梦幻精灵_cq】
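Python official site: https://www.python.org/
Free: an expert's free "bible" tutorial, 《python 完全自学教程》 (a complete Python self-study tutorial), which goes well beyond the basics.
Self-study is nothing mysterious. Over a lifetime, a person always spends more time studying alone than studying in school, and more time without a teacher than with one.
Hua Luogeng (华罗庚)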

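Prompted by this comment, I "sacrificed myself" and gave it a try.
I happen to have a collection of the Hundred Family Surnames (百家姓), so I combined it with a list of "characters commonly used in Chinese given names" to build a toy: "Chinese name extraction".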
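The idea can be sketched in a few lines before looking at the full script. This is a minimal illustration, not the post's code: the tiny `SURNAMES` and `NAME_CHARS` collections and the `extract_names` helper are made up for the example, and it looks at only one or two characters after the surname, which is a simplification.

```python
import re

# Hypothetical miniature data; the post loads the real lists from files.
SURNAMES = ["曹", "刘", "诸葛"]
NAME_CHARS = set("操备亮丕")


def extract_names(text, surnames=SURNAMES, name_chars=NAME_CHARS):
    """Return the set of surname + given-name candidates found in text."""
    found = set()
    for surname in surnames:
        # A surname followed by one or two word characters.
        pattern = re.escape(surname) + r"(\w{1,2})"
        for match in re.finditer(pattern, text):
            given = ""
            # Keep leading characters only while they are known name characters.
            for ch in match.group(1):
                if ch not in name_chars:
                    break
                given += ch
            if given:
                found.add(surname + given)
    return found


print(sorted(extract_names("曹操与刘备煮酒论英雄,诸葛亮未至。")))
# ['刘备', '曹操', '诸葛亮']
```

Any surname followed by a common name character counts as a hit, whether or not it is really a person, which is one reason the accuracy of this approach is limited.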
Directory structure of the "toy"
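Judging from the paths the script opens, the data files sit in a `data/` directory next to it, and each list file is read as a whole and split on ASCII commas. The sketch below shows that loading step in isolation. The `load_list` helper, the sample file name and its contents are made up for illustration, and UTF-8 encoding is an assumption.

```python
from pathlib import Path
import tempfile


def load_list(path):
    """Read one comma-separated data file into a list of strings."""
    raw = Path(path).read_text(encoding="utf-8")  # assumes UTF-8 data files
    return [item.strip() for item in raw.strip().split(",") if item.strip()]


# Demonstration with a throwaway file; the contents are made up.
with tempfile.TemporaryDirectory() as tmp:
    sample = Path(tmp) / "firstnames_sample.txt"
    sample.write_text("赵,钱,孙,李\n", encoding="utf-8")
    surnames = load_list(sample)

print(surnames)
# ['赵', '钱', '孙', '李']
```

Dropping empty items guards against a trailing comma in a file, which would otherwise add an empty string to the surname list.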
Trial run (using the two texts "三国演义.txt" and "大奉打更人_19.txt" as guinea pigs)
Complete source code for this exercise
#!/usr/bin/env python
# coding: utf-8
'''
filename = 're_Chinese_name.py'
author = '梦幻精灵_cq'
time = '2022-06-29'
'''
from os import system
from re import escape, findall  # regex helpers used to find name candidates


class re_Chinese_name:
    ''' Extract Chinese personal names from text. '''

    def __init__(self):
        system('clear')  # clear the terminal (Unix-like systems only)
        # All data files are comma-separated; UTF-8 encoding is assumed here.
        with open('data/firstnames_one_100.txt', encoding='utf-8') as f:
            self.firstnames = f.read().strip().split(',')
        with open('data/firstnames_two_85.txt', encoding='utf-8') as f:
            self.firstnames_two = f.read().strip().split(',')
        self.firstnames.extend(self.firstnames_two)
        # Characters commonly used in given names (boys' list plus girls' list).
        with open('data/boy_names.txt', encoding='utf-8') as f:
            self.names_chr = f.read()
        with open('data/girl_names.txt', encoding='utf-8') as f:
            self.names_chr += f.read()
        self.names = "".join(self.names_chr.strip().split(','))
        #input(f"\n\nSurnames: {self.firstnames}\nName characters: {self.names_chr}")

    def get_names(self, text):
        ''' Extract names; text is the text to search. '''
        names = []
        for firstname in self.firstnames:
            if firstname in text:
                # Candidate: the surname followed by three word characters.
                re_s = escape(firstname) + r'\w{3}'
                #print(re_s)  # debug statement
                names.extend(findall(re_s, text))
        print(' Sorting out the extracted names... '.center(39, '~'))
        names = self.isname(list(set(names)))
        return set(names)

    def isname(self, names_list):
        ''' Decide which candidates look like names and trim them. '''
        names = []
        n = self.names_chr
        for name in names_list:
            if name[:2] in self.firstnames_two:
                # Two-character surname: only name[2] and name[3] are checked,
                # so keep at most four characters (the original appended the
                # unchecked tail as well).
                if name[2] in n and name[3] in n:
                    names.append(name[:4])
                elif name[2] in n:
                    names.append(name[:3])
            else:
                # One-character surname followed by up to three name characters.
                if name[1] in n and name[2] in n and name[3] in n:
                    names.append(name)
                elif name[1] in n and name[2] in n:
                    names.append(name[:3])
                elif name[1] in n:
                    names.append(name[:2])
        return names


if __name__ == '__main__':
    rn = re_Chinese_name()
    with open('data/三国演义.txt', encoding='utf-8') as f:
        names = rn.get_names(f.read())
    with open('data/大奉打更人_19.txt', encoding='utf-8') as f:
        names2 = rn.get_names(f.read())
    title = ' Chinese names extracted with re '.center(44, '~')
    print(f"\n\n{title}\n\n"
          f"Romance of the Three Kingdoms (三国演义):\n{','.join(names)}\n\n"
          f"大奉打更人, chapter 19:\n{','.join(names2)}\n\n")
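Two properties of the candidate pattern in `get_names` (a surname followed by `\w{3}`) help explain the low accuracy. First, `\w` does not match punctuation, so a name followed by a comma within the next three characters is never captured. Second, `findall` returns non-overlapping matches, so a second name that starts inside an earlier match is skipped. The sketch below compares the post's pattern with a hypothetical variant. The function names are made up, and the lookahead version is one possible refinement, not the author's method.

```python
import re


def candidates_fixed(surname, text):
    """The post's pattern: the surname plus exactly three word characters."""
    return re.findall(re.escape(surname) + r"\w{3}", text)


def candidates_flexible(surname, text):
    """Hypothetical variant: one to three following characters, overlaps allowed."""
    # The lookahead consumes nothing, so the next search starts one character later.
    pattern = r"(?=(" + re.escape(surname) + r"\w{1,3}))"
    return re.findall(pattern, text)


print(candidates_fixed("曹", "曹操,曹丕出征"))     # ['曹丕出征']
print(candidates_flexible("曹", "曹操,曹丕出征"))  # ['曹操', '曹丕出征']
print(candidates_fixed("曹", "曹操曹丕"))          # ['曹操曹丕']
print(candidates_flexible("曹", "曹操曹丕"))       # ['曹操曹丕', '曹丕']
```

Candidates from the flexible variant vary in length, so a trimming step such as `isname` would need to check each candidate's length before indexing into it.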