A first experience using PDFBox to parse a PDF and extract its key data
2022-07-28 13:43:00 【Hua Weiyun】
A requirement came up that meant I had to extract data from a PDF. I had never done anything like this, so it was unfamiliar territory, and the workload of unfamiliar work is hard to estimate. As soon as the requirement was described, I looked into PDF parsing tools. Once I knew which tools were available, the final parsing code turned out to be only a few lines, and it is simple. Compared with the pressure of the unknown beforehand, that was a relief. The next time similar PDF handling comes up, I will have experience to draw on.
pdfbox
Of course, there are other parsing tools, such as Apache Tika, iText, and pdfparser. I will only introduce the one I used this time; the others should be similar, so please explore them yourself.
Working code example:
import java.awt.image.BufferedImage;
import java.io.File;
import java.io.FileOutputStream;
import java.io.IOException;
import java.util.UUID;
import javax.imageio.ImageIO;
import org.apache.pdfbox.cos.COSName;
import org.apache.pdfbox.pdmodel.PDDocument;
import org.apache.pdfbox.pdmodel.PDPage;
import org.apache.pdfbox.pdmodel.PDResources;
import org.apache.pdfbox.pdmodel.encryption.InvalidPasswordException;
import org.apache.pdfbox.pdmodel.graphics.image.PDImageXObject;
import org.apache.pdfbox.text.PDFTextStripper;

public class PdfTest {
    public static void main(String[] args) {
        String path = "D:\\temp\\temp\\test.pdf";
        File file = new File(path);
        // Prefer try-with-resources over try-catch-finally
        try (PDDocument document = PDDocument.load(file)) {
            int pageSize = document.getNumberOfPages();
            // Read page by page
            for (int i = 0; i < pageSize; i++) {
                // Text content
                PDFTextStripper stripper = new PDFTextStripper();
                // Output the text sorted by position on the page
                stripper.setSortByPosition(true);
                // Page numbers are 1-based for the stripper
                stripper.setStartPage(i + 1);
                stripper.setEndPage(i + 1);
                String text = stripper.getText(document);
                System.out.println(text.trim());
                System.out.println("-=-=-=-=-=-=-=-=-=-=-=-=-");
                // Image content
                PDPage page = document.getPage(i);
                PDResources resources = page.getResources();
                if (resources == null) {
                    continue; // no resources on this page, so no images
                }
                for (COSName cosName : resources.getXObjectNames()) {
                    if (resources.isImageXObject(cosName)) {
                        PDImageXObject pdImage = (PDImageXObject) resources.getXObject(cosName);
                        BufferedImage image = pdImage.getImage();
                        // Save each image under a random file name
                        try (FileOutputStream out = new FileOutputStream("D:\\temp\\temp\\" + UUID.randomUUID() + ".png")) {
                            ImageIO.write(image, "png", out);
                        } catch (IOException e) {
                            System.err.println("Failed to save image: " + e.getMessage());
                        }
                    }
                }
            }
        } catch (InvalidPasswordException e) {
            System.err.println("The PDF is password-protected: " + e.getMessage());
        } catch (IOException e) {
            System.err.println("Failed to read the PDF: " + e.getMessage());
        }
    }
}

I have tested this myself and it works. The code needs essentially no changes: just put the target file at the target location and you can run the test.
PDDocument.load can also take an input stream instead of a File, which is more convenient in some cases.
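Once PDFTextStripper has returned a page's text, the key data still has to be picked out of that string. The following is a minimal sketch of one way to do it, assuming the fields of interest appear as "Label: value" lines. The class name KeyFieldExtractor, the sample labels, and the line format are all hypothetical and not part of PDFBox; it uses only the Java standard library, so a real document may well need a different pattern.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper (not part of PDFBox): pulls "Label: value" pairs
// out of text that has already been extracted from a PDF.
class KeyFieldExtractor {

    // One field per line: a label, an ASCII or full-width (\uFF1A) colon,
    // then a non-empty value. Spaces and tabs around each part are ignored.
    private static final Pattern FIELD = Pattern.compile(
            "^[ \\t]*([^:\\uFF1A\\r\\n]+?)[ \\t]*[:\\uFF1A][ \\t]*(\\S.*?)[ \\t]*$",
            Pattern.MULTILINE);

    // Returns the fields in document order; the first occurrence of a label wins.
    static Map<String, String> extract(String text) {
        Map<String, String> fields = new LinkedHashMap<>();
        Matcher matcher = FIELD.matcher(text);
        while (matcher.find()) {
            fields.putIfAbsent(matcher.group(1), matcher.group(2));
        }
        return fields;
    }

    public static void main(String[] args) {
        // Stand-in for the string returned by stripper.getText(document)
        String text = "Invoice No: A-1001\nTotal: 42.50\nThank you for your order";
        extract(text).forEach((label, value) -> System.out.println(label + " = " + value));
    }
}
```

In the example above, the text variable printed for each page could be passed to KeyFieldExtractor.extract. Lines without a colon, or with nothing after the colon, are skipped.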