Chinese Name Extraction (toy code: accuracy is too low, so treat it as just for fun)
2022-07-02 09:46:00 【梦幻精灵_cq】
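Python official site: https://www.python.org/
Free: an expert's free "bible" tutorial, 《python 完全自学教程》 (a complete Python self-study tutorial), which goes well beyond the basics.
"Self-study is nothing mysterious. Over a lifetime, a person always spends more time studying alone than studying at school, and more time without a teacher than with one."
-- Hua Luogeng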


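Prompted by this comment, I "sacrificed myself" and gave it a try.
I happened to have the Hundred Family Surnames (百家姓) on file, so I combined it with a list of "characters commonly used in Chinese given names" to build a toy: "Chinese name extraction".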
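The idea can be sketched in a few lines: locate a known surname, then keep the characters that follow for as long as they belong to a set of common given-name characters. This is a simplified variant and not the author's exact method (the full source uses a fixed-width regex window and trims afterwards). `SURNAMES`, `NAME_CHARS`, and `extract_names` are hypothetical names, and the tiny lists stand in for the real data files.

```python
import re

# Hypothetical sample data; the post's real lists are read from data/*.txt.
SURNAMES = ["诸葛", "刘", "关", "张"]
NAME_CHARS = set("备羽飞亮云")


def extract_names(text, max_given=2):
    """Return surname + 1..max_given common given-name characters, in text order."""
    # Try longer surnames first so that a compound surname such as 诸葛 wins.
    ordered = sorted(SURNAMES, key=len, reverse=True)
    surname_pattern = "|".join(re.escape(s) for s in ordered)
    found = []
    for m in re.finditer(surname_pattern, text):
        given = ""
        for ch in text[m.end(): m.end() + max_given]:
            if ch not in NAME_CHARS:
                break  # stop at the first character that is not a name character
            given += ch
        if given:  # a bare surname with no given name is not counted
            found.append(m.group() + given)
    return found


sample = "刘备、关羽、张飞三人请诸葛亮出山。"
print(extract_names(sample))  # ['刘备', '关羽', '张飞', '诸葛亮']
```

With real text the result depends heavily on how complete the two lists are, which is one reason the toy's accuracy is limited.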
Directory structure of the "toy"

Trial run of the code (using two texts, "三国演义.txt" and "大奉打更人_19.txt", as guinea pigs)


Full source code for this exercise
#!/usr/bin/env python
# coding: utf-8
'''
filename = 're_Chinese_name.py'
author = '梦幻精灵_cq'
time = '2022-06-29'
'''
from os import system
from re import findall  # Load the findall function from the re module.


class re_Chinese_name:
    ''' Extract Chinese names from text. '''

    def __init__(self):
        system('clear')  # Clear the terminal (Unix-like systems only).
        # Each data file holds entries separated by full-width commas.
        with open('data/firstnames_one_100.txt') as f:
            self.firstnames = f.read().strip().split(',')
        with open('data/firstnames_two_85.txt') as f:
            self.firstnames_two = f.read().strip().split(',')
        self.firstnames.extend(self.firstnames_two)
        with open('data/boy_names.txt') as f:
            self.names_chr = f.read()
        with open('data/girl_names.txt') as f:
            self.names_chr += f.read()
        self.names = "".join(self.names_chr.strip().split(','))
        #input(f"\n\nSurnames: {self.firstnames}\nName characters: {self.names_chr}")

    def get_names(self, text):
        ''' Extract names; text is the text to extract names from. '''
        names = []
        for firstname in self.firstnames:
            if firstname in text:
                # Candidate: the surname followed by three word characters.
                re_s = firstname + r'\w{3}'
                #print(re_s)  # Debug statement.
                names.extend(findall(re_s, text))
        print(' Sorting out the extracted names... '.center(39, '~'))
        names = self.isname(list(set(names)))
        return set(names)

    def isname(self, names_list):
        ''' Decide which candidates are Chinese names and trim them. '''
        names = []
        n = self.names_chr
        for name in names_list:
            if name[:2] in self.firstnames_two:
                # Two-character surname: keep one or two given-name characters.
                # Slicing (instead of appending the whole candidate) also trims
                # the five-character candidates matched for compound surnames.
                if name[2] in n and name[3] in n:
                    names.append(name[:4])
                elif name[2] in n:
                    names.append(name[:3])
            else:
                # One-character surname: keep up to three given-name characters.
                if name[1] in n and name[2] in n and name[3] in n:
                    names.append(name)
                elif name[1] in n and name[2] in n:
                    names.append(name[:3])
                elif name[1] in n:
                    names.append(name[:2])
        return names


if __name__ == '__main__':
    rn = re_Chinese_name()
    with open('data/三国演义.txt') as f:
        names = rn.get_names(f.read())
    with open('data/大奉打更人_19.txt') as f:
        names2 = rn.get_names(f.read())
    title = ' Extracting Chinese names with re '.center(44, '~')
    print(f"\n\n{title}\n\nRomance of the Three Kingdoms (三国演义):\n{','.join(names)}"
          f"\n\nChapter 19 of 大奉打更人:\n{','.join(names2)}\n\n")
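One likely reason for the low accuracy is visible in `get_names`: the pattern is the surname followed by exactly three word characters (`\w{3}`), so a name that is directly followed by punctuation is never matched. The snippet below demonstrates this on a made-up sentence and compares it with a looser `\w{1,3}` pattern. The looser pattern is only a hypothetical alternative; using it in the class would also require `isname` to cope with candidates shorter than four characters.

```python
from re import findall

# Made-up sample; full-width punctuation is not matched by \w.
text = "曹操,字孟德。曹操引兵至。"

# The pattern used in the post: surname + exactly three word characters.
strict = findall("曹" + r"\w{3}", text)

# Hypothetical looser variant: surname + one to three word characters.
loose = findall("曹" + r"\w{1,3}", text)

print(strict)  # ['曹操引兵']
print(loose)   # ['曹操', '曹操引兵']
```

The first "曹操" is lost by the strict pattern because a comma follows it after only one character. In both cases the candidates still need trimming against the given-name character list.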
