C#/VB.NET: Extract Tables from a PDF
2022-08-03 10:56:00 [InfoQ]
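Program environment: the code below requires the Spire.PDF for .NET library (it uses the Spire.Pdf and Spire.Pdf.Utilities namespaces).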
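Steps to extract tables from a PDF:
- Create a PdfDocument object and call PdfDocument.LoadFromFile() to load the document.
- Create a PdfTableExtractor object for the document, then call PdfTableExtractor.ExtractTable(int pageIndex) on each page to extract the tables it contains.
- For each table, use PdfTable.GetRowCount() and PdfTable.GetColumnCount() to loop over the cells, and call PdfTable.GetText(int rowIndex, int columnIndex) to get the text of the cell at a given row and column.
- Save the extracted table content to a TXT file.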
Full code (C#):
using Spire.Pdf;
using Spire.Pdf.Utilities;
using System.IO;
using System.Text;

namespace ExtractTable
{
    class Program
    {
        static void Main(string[] args)
        {
            // Instantiate a PdfDocument object
            PdfDocument pdf = new PdfDocument();
            // Load the PDF document
            pdf.LoadFromFile("编程语言1.pdf");

            // Create a StringBuilder to collect the extracted text
            StringBuilder builder = new StringBuilder();

            // Instantiate a PdfTableExtractor for the loaded document
            PdfTableExtractor extractor = new PdfTableExtractor(pdf);

            // Declare an array to hold the tables found on each page
            PdfTable[] tableLists;

            // Loop through the PDF pages
            for (int pageIndex = 0; pageIndex < pdf.Pages.Count; pageIndex++)
            {
                // Extract the tables on the current page
                tableLists = extractor.ExtractTable(pageIndex);

                // Skip pages on which no table was found
                if (tableLists != null && tableLists.Length > 0)
                {
                    // Loop through the tables
                    foreach (PdfTable table in tableLists)
                    {
                        // Get the number of rows and columns in the table
                        int row = table.GetRowCount();
                        int column = table.GetColumnCount();

                        // Loop through the rows and columns
                        for (int i = 0; i < row; i++)
                        {
                            for (int j = 0; j < column; j++)
                            {
                                // Get the text of the cell at row i, column j
                                string text = table.GetText(i, j);
                                // Append the cell text followed by a space separator
                                builder.Append(text + " ");
                            }
                            // End the row with a line break
                            builder.Append("\r\n");
                        }
                    }
                }
            }

            // Save the extracted table content to a TXT file
            File.WriteAllText("提取表格.txt", builder.ToString());
            // Release the resources held by the document
            pdf.Close();
        }
    }
}

VB.NET:
Imports Spire.Pdf
Imports Spire.Pdf.Utilities
Imports System.IO
Imports System.Text

Namespace ExtractTable
    Class Program
        Private Shared Sub Main(args As String())
            ' Instantiate a PdfDocument object
            Dim pdf As New PdfDocument()
            ' Load the PDF document
            pdf.LoadFromFile("编程语言1.pdf")

            ' Create a StringBuilder to collect the extracted text
            Dim builder As New StringBuilder()

            ' Instantiate a PdfTableExtractor for the loaded document
            Dim extractor As New PdfTableExtractor(pdf)

            ' Declare an array to hold the tables found on each page
            Dim tableLists As PdfTable()

            ' Loop through the PDF pages
            For pageIndex As Integer = 0 To pdf.Pages.Count - 1
                ' Extract the tables on the current page
                tableLists = extractor.ExtractTable(pageIndex)

                ' Skip pages on which no table was found
                If tableLists IsNot Nothing AndAlso tableLists.Length > 0 Then
                    ' Loop through the tables
                    For Each table As PdfTable In tableLists
                        ' Get the number of rows and columns in the table
                        Dim row As Integer = table.GetRowCount()
                        Dim column As Integer = table.GetColumnCount()

                        ' Loop through the rows and columns
                        For i As Integer = 0 To row - 1
                            For j As Integer = 0 To column - 1
                                ' Get the text of the cell at row i, column j
                                Dim text As String = table.GetText(i, j)
                                ' Append the cell text followed by a space separator
                                builder.Append(text & " ")
                            Next
                            ' End the row with a line break
                            builder.Append(vbCrLf)
                        Next
                    Next
                End If
            Next

            ' Save the extracted table content to a TXT file
            File.WriteAllText("提取表格.txt", builder.ToString())
            ' Release the resources held by the document
            pdf.Close()
        End Sub
    End Class
End Namespace
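The text-assembly part of the loop (turning row and column indices into lines of text) does not depend on Spire.PDF itself. As a minimal sketch, assuming the cell texts have already been copied into a string[,] array, that step can be written and tested on its own. TableText and Format are hypothetical helper names, not part of Spire.PDF. Unlike the code above, this version puts a tab between cells and adds no trailing separator, so the resulting TXT file can also be opened as tab-separated data.

```csharp
using System.Text;

// Hypothetical helper (not part of Spire.PDF): formats a grid of cell texts
// as one tab-separated line per table row.
public static class TableText
{
    // cells[row, column] holds the text of each cell, e.g. the values
    // collected from PdfTable.GetText(row, column) in the loop above.
    public static string Format(string[,] cells)
    {
        var builder = new StringBuilder();
        int rows = cells.GetLength(0);
        int columns = cells.GetLength(1);

        for (int i = 0; i < rows; i++)
        {
            for (int j = 0; j < columns; j++)
            {
                // Separate cells with a tab, but add none after the last cell
                if (j > 0)
                {
                    builder.Append('\t');
                }
                builder.Append(cells[i, j]);
            }
            // Same Windows-style line ending as the original code
            builder.Append("\r\n");
        }
        return builder.ToString();
    }
}
```

For example, TableText.Format(new string[,] { { "a", "b" }, { "c", "d" } }) returns "a\tb\r\nc\td\r\n"; the result can then be passed to File.WriteAllText as in the full code.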