“Ran out of input” while using WikiExtractor
2022-06-09 05:18:00 【kaims】
When processing the downloaded wiki dump file (https://dumps.wikimedia.org/zhwiki/latest/zhwiki-latest-pages-articles.xml.bz2) with the Wikipedia Extractor tool (GitHub - attardi/wikiextractor: A tool for extracting plain text from Wikipedia dumps), I ran the following python command:
python Wikiextractor.py -b 10M -o zh_extracted zhwiki-latest-pages-articles.xml.bz2
and it failed with the error EOFError: Ran out of input.
After searching on Baidu and Google, I found a solution in the Stack Overflow question "wikidata - "EOFError: Ran out of input" while use Wikipedia Extractor as a parser for Wikipedia Data Dump File": the error may be caused by a StringIO problem on Windows, and running the same command on Linux avoids it.
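For context, here is a minimal sketch of where this exact message comes from in CPython: unpickling from a stream that contains no bytes raises EOFError with the text "Ran out of input". Whether this is the path WikiExtractor hits on Windows is an assumption, not something the Stack Overflow answer confirms. The helper name load_from_empty_stream is hypothetical and used only for illustration.

```python
import io
import pickle
import sys


def load_from_empty_stream():
    """Try to unpickle from an empty byte stream and return the error text."""
    try:
        pickle.load(io.BytesIO(b""))
    except EOFError as exc:
        # CPython reports an empty pickle stream with this exception.
        return str(exc)
    return None


message = load_from_empty_stream()
print("platform:", sys.platform)  # "win32" on Windows, "linux" on Linux
print("error text:", message)
```

If the traceback on Windows ends inside pickle or multiprocessing, that would be consistent with the advice above to run the extraction on Linux instead.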