Reading table data from Word documents in C#
2022-06-12 22:13:00 【ViperL1】
A few days ago I had a project that needed to extract table data from Word files and process it. Most of the solutions online are built on Office's COM components, which have one drawback: they do not work if Office is not installed on the machine. Since I had always used NPOI for Excel work, I naturally wanted to solve this problem with NPOI as well.
So I found the following code:
using System.Collections.Generic;
using System.IO;
using NPOI.XWPF.UserModel;

private List<string> GetDoc(string Path)
{
    if (string.IsNullOrEmpty(Path))
        return null;                                  // The file path is empty
    List<string> Result = new List<string>();         // Result container
    var list = new List<XWPFTableCell>();             // Cells already visited
    // Open the stream; the using block closes it when done
    // (this is the key, otherwise the next file cannot be opened)
    using (FileStream stream = new FileStream(Path, FileMode.Open))
    {
        XWPFDocument docx = new XWPFDocument(stream);
        // Loop through the contents of the first table
        foreach (var row in docx.Tables[0].Rows)
        {
            foreach (var cell in row.GetTableCells())
            {
                if (!list.Contains(cell))             // Skip cells that were already added
                {
                    list.Add(cell);
                    Result.Add(cell.GetText());
                }
            }
        }
    }
    return Result;
}
But this approach has a drawback of its own: NPOI only supports the .docx file format here, and reading a .doc file throws an error immediately!
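Because the failure depends on the real file format rather than on the file name, it can help to look at the first bytes of the file before handing it to NPOI. The following is a minimal sketch, not part of NPOI or Spire: the class `WordFormatSniffer` and its members are hypothetical names. It relies on two well-known signatures: legacy .doc files are stored in an OLE compound file (starting with `D0 CF 11 E0`), while .docx files are ZIP packages (starting with `50 4B 03 04`). The signature identifies only the container type, so other Office files with the same container will also match.

```csharp
using System;
using System.IO;

static class WordFormatSniffer
{
    public enum Container { Unknown, OleCompound, ZipPackage }

    // Classifies a file by its first four bytes (hypothetical helper).
    public static Container Detect(byte[] header)
    {
        if (header == null || header.Length < 4)
            return Container.Unknown;
        // OLE compound file signature: used by legacy .doc (and also .xls, .ppt)
        if (header[0] == 0xD0 && header[1] == 0xCF && header[2] == 0x11 && header[3] == 0xE0)
            return Container.OleCompound;
        // ZIP local file header: used by .docx (and any other ZIP-based file)
        if (header[0] == 0x50 && header[1] == 0x4B && header[2] == 0x03 && header[3] == 0x04)
            return Container.ZipPackage;
        return Container.Unknown;
    }

    // Reads the first four bytes of the file at the given path and classifies them.
    public static Container Detect(string path)
    {
        byte[] header = new byte[4];
        using (FileStream fs = File.OpenRead(path))
        {
            int read = fs.Read(header, 0, header.Length);
            if (read < header.Length)
                return Container.Unknown;
        }
        return Detect(header);
    }
}
```

A file that reports `OleCompound` is a candidate for conversion before it is passed to `GetDoc`, even if it has been renamed to end in .docx.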
Then I found another free component, Free Spire.Doc, and the following code:
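using System.Collections.Generic;
using Spire.Doc;
using Spire.Doc.Documents;
using Spire.Doc.Fields;

private List<string> GetDocX(string Path)
{
    if (string.IsNullOrEmpty(Path))
        return null;                                  // The file path is empty
    List<string> Result = new List<string>();
    Spire.Doc.Document doc = new Spire.Doc.Document();
    doc.LoadFromFile(Path);
    // Note: this takes the first table inside the first text box,
    // not a table placed directly in the document body
    TextBox textbox = doc.TextBoxes[0];
    Spire.Doc.Table table = textbox.Body.Tables[0] as Spire.Doc.Table;
    foreach (TableRow row in table.Rows)
    {
        foreach (TableCell cell in row.Cells)
        {
            // Collect the text of every paragraph in the cell
            foreach (Paragraph paragraph in cell.Paragraphs)
            {
                Result.Add(paragraph.Text);
            }
        }
    }
    return Result;
}
But for some reason it failed to pick up the table in the .doc file.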

I then tried its GetText() method to see whether the text content could be extracted directly. My preliminary guess is that the problem is related to the file format.

I considered writing a matching function to parse the extracted text, but the format was too complex and too many general cases could not be handled, so I gave up. If your format is not complicated, this is still a workable solution.
The final method: first use the Spire component to convert the .doc file to .docx, then process the content with NPOI. It works great!!!
private string ChangeToDocx(string Path)
{
    if (string.IsNullOrEmpty(Path))
        return "";                                    // The file path is empty
    Spire.Doc.Document doc = new Spire.Doc.Document();
    doc.LoadFromFile(Path);                           // Open the file
    // Strings are immutable, so the new path must be stored;
    // ChangeExtension swaps the .doc suffix for .docx
    string newPath = System.IO.Path.ChangeExtension(Path, ".docx");
    doc.SaveToFile(newPath, FileFormat.Docx);         // Save as .docx
    return newPath;
}
The main function calls it as follows (if the file is not a .doc, no conversion is needed, which saves overhead):
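// Compare the extension exactly: ".docx" also contains ".doc"
if (string.Equals(System.IO.Path.GetExtension(Path), ".doc", System.StringComparison.OrdinalIgnoreCase))
{
    string newPath = ChangeToDocx(Path);
    result = GetDoc(newPath);
}
else
{
    result = GetDoc(Path);                            // Already .docx, read it directly
}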
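The extension handling in this step is easy to get subtly wrong: ".docx" also contains ".doc", and C# strings are immutable, so a replaced path has to be captured in a new variable. Below is a minimal sketch of one way to isolate both checks using only `System.IO.Path`. The class `DocPathHelper` and its two methods are hypothetical names introduced for illustration; they are not part of NPOI or Spire.

```csharp
using System;
using System.IO;

static class DocPathHelper
{
    // True only for a legacy .doc file; ".docx" does not match.
    public static bool NeedsConversion(string path)
    {
        return string.Equals(Path.GetExtension(path), ".doc",
                             StringComparison.OrdinalIgnoreCase);
    }

    // Returns a new path with the extension swapped to .docx.
    // The input string itself is never modified.
    public static string ToDocxPath(string path)
    {
        return Path.ChangeExtension(path, ".docx");
    }
}
```

For example, `DocPathHelper.NeedsConversion("report.docx")` is false, while `DocPathHelper.ToDocxPath("report.doc")` gives "report.docx". Comparing the extension exactly avoids converting files that are already .docx.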