Automatically download and crop key information from the daily arXiv papers.

Last update: Jul 30, 2022

Overview

Arxiv daily quick overview

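Features: filter each day's newest arXiv papers by keyword, automatically fetch their abstracts, and automatically crop the tables and figures out of each paper.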
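The keyword filtering step can be pictured with a minimal sketch. The `Paper` class and `filter_by_keywords` function below are hypothetical names used for illustration only; they are not taken from this project's code, and the matching rule (case-insensitive substring search over title and abstract) is an assumption.

```python
from dataclasses import dataclass


@dataclass
class Paper:
    # Hypothetical container; the project's real data structure may differ.
    title: str
    abstract: str


def filter_by_keywords(papers, keywords):
    """Keep papers whose title or abstract contains any keyword (case-insensitive)."""
    lowered = [k.lower() for k in keywords]
    kept = []
    for paper in papers:
        text = f"{paper.title} {paper.abstract}".lower()
        if any(k in text for k in lowered):
            kept.append(paper)
    return kept


papers = [
    Paper("Table Detection with Faster R-CNN", "We study document layout analysis."),
    Paper("A Note on Graph Coloring", "Purely combinatorial results."),
]
selected = filter_by_keywords(papers, ["layout", "OCR"])
print([p.title for p in selected])
```

Only the first paper is kept here, because "layout" appears in its abstract while neither keyword appears in the second one.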

1 Test environment

Ubuntu 16+
Python 3.7
torch 1.9
Colab GPU

2 Usage demo

First download the weights from baiduyun (extraction code: il87) and place them at code/ParseServer/models/PubLayNet/faster_rcnn_R_50_FPN_3x/model_final.pth
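Before starting the notebook it can help to confirm that the weights file is where the project expects it. The check below is a small sketch, not part of the project; `weights_ready` is a hypothetical helper, and it assumes the script is run from the project root.

```python
from pathlib import Path

# Path taken from the instructions above, relative to the project root.
WEIGHTS = Path("code/ParseServer/models/PubLayNet/faster_rcnn_R_50_FPN_3x/model_final.pth")


def weights_ready(path=WEIGHTS):
    """Return True if the weights file exists and is not empty."""
    path = Path(path)
    return path.is_file() and path.stat().st_size > 0


if not weights_ready():
    print(f"Weights not found at {WEIGHTS}; download them first.")
```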

2.1 Environment setup

You can run the project locally or on Colab. Local use is taken as the example here.

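1. Install the GPU version of PyTorch beforehand.
2. Start jupyter notebook in the project root directory and run Overview_RUNME_Local.ipynb.
3. On the first run, install the environment first.
4. Start the document layout analysis service, and confirm it has started normally before running the next step.
5. Fill in the keywords to filter by as needed. Set needPDF=True if you need the PDF files, and needZip=True if you need the results packed into an archive.
6. Once started, downloading and document layout analysis run at the same time, cropping out the needed content. Downloaded files are saved under the ./arxiv directory. If needZip=True, a ./arxiv.zip file is produced.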
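The packing behaviour described in step 6 can be sketched with the standard library. This is an illustration under assumptions, not the notebook's actual code: `pack_results` and its `need_zip` parameter are hypothetical names that mirror the needZip option.

```python
import shutil
from pathlib import Path


def pack_results(out_dir="./arxiv", need_zip=True):
    """Zip the output directory next to itself; return the archive path or None."""
    out = Path(out_dir)
    if not need_zip or not out.is_dir():
        return None
    # shutil.make_archive appends ".zip" to the base name on its own,
    # so "./arxiv" becomes "./arxiv.zip".
    archive = shutil.make_archive(
        str(out), "zip", root_dir=str(out.parent), base_dir=out.name
    )
    return Path(archive)
```

Calling `pack_results("./arxiv", need_zip=True)` after a run would write the archive beside the output directory, and the archive keeps the top-level arxiv folder so that extracting it restores the same layout.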

2.2 Colab

Compress the code directory and upload it to the root of your Google Drive.
Run Overview_RUNME_Colab.ipynb in Colab. The remaining steps are the same as in 2.1.

3 Results

After extracting the archive locally, you can view the results with the Typora markdown viewer.

The abs.md file in each folder holds the description of that folder's PDF.
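If you prefer to skim the results from a script rather than in Typora, the abs.md files can be gathered as in the sketch below. `collect_abstracts` is a hypothetical helper, and it assumes each paper has its own folder directly under ./arxiv, which the text above does not state explicitly.

```python
from pathlib import Path


def collect_abstracts(root="./arxiv"):
    """Map each paper folder name to the text of its abs.md file."""
    abstracts = {}
    # Assumed layout: <root>/<paper folder>/abs.md
    for md in sorted(Path(root).glob("*/abs.md")):
        abstracts[md.parent.name] = md.read_text(encoding="utf-8")
    return abstracts


for name, text in collect_abstracts().items():
    print(name, len(text), "characters")
```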

ps: Irregular page layout leads to messy crops, which also says something, indirectly, about the quality of the paper.

Other

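ps: The code was piled up on a "good enough as long as it works" basis. If you find a bug, describe it clearly in an issue. The project is maintained periodically.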

Owner

HeoLis
