Web Crawler Knowledge Summary
2022-07-01 08:21:00 【hellolianhua】
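1. Fetching a page's HTML
Use WebClient to request the page. The method below takes a URL and a request body, sends a POST, and returns the response HTML as a string.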
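The HttpPost method below sets Content-Type to application/x-www-form-urlencoded, so despite its name the paraJsonStr argument is expected to look like key1=value1&key2=value2. As a minimal sketch of building such a body, assuming the fields are available as key/value pairs (the FormBody class, the Build method, and the sample field names are hypothetical and not part of the original post):

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormBody
{
    // Hypothetical helper: joins the pairs as key1=value1&key2=value2,
    // percent-encoding each part so that '&', '=' and spaces survive intact.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }

    public static void Main()
    {
        // Sample fields, made up for illustration
        var fields = new List<KeyValuePair<string, string>>
        {
            new KeyValuePair<string, string>("keyword", "web crawler"),
            new KeyValuePair<string, string>("page", "1"),
        };
        Console.WriteLine(Build(fields)); // keyword=web%20crawler&page=1
    }
}
```

The resulting string can be passed as the second argument of HttpPost. Uri.EscapeDataString writes a space as %20 rather than +, which form parsers normally decode the same way.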
public static string HttpPost(string url, string paraJsonStr)
{
// WebClient is IDisposable, so release it once the request completes
using (WebClient webClient = new WebClient())
{
    // Declares a form-encoded body; for a JSON payload use "application/json" instead
    webClient.Headers.Add("Content-Type", "application/x-www-form-urlencoded");
    byte[] postData = System.Text.Encoding.UTF8.GetBytes(paraJsonStr);
    // Send the POST and decode the response bytes as UTF-8
    byte[] responseData = webClient.UploadData(url, "POST", postData);
    return System.Text.Encoding.UTF8.GetString(responseData);
}
}
2. Regular-expression matching
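As a self-contained illustration of capture groups, here is a hedged sketch that pulls the text of every h1 element out of an HTML string. The pattern used in the author's snippet that follows, >(.*)</h1>, is greedy and starts at the first '>' on a line, so it can capture more than the heading text. The stricter pattern, the HeadingExtractor class, and the sample HTML here are assumptions for illustration only.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HeadingExtractor
{
    // Returns the inner text of every <h1> element found in html.
    public static List<string> ExtractH1(string html)
    {
        var result = new List<string>();
        // [^>]* skips any attributes; (.*?) is lazy, so each match stops at the first </h1>.
        // Singleline lets '.' also match line breaks inside a heading.
        foreach (Match m in Regex.Matches(html, @"<h1[^>]*>(.*?)</h1>", RegexOptions.Singleline))
        {
            // Groups[0] is the whole match; Groups[1] is the first capture group
            result.Add(m.Groups[1].Value.Trim());
        }
        return result;
    }

    public static void Main()
    {
        // Sample input, made up for illustration
        string html = "<div><h1 class=\"t\">First</h1><p>x</p><h1>Second</h1></div>";
        foreach (string title in ExtractH1(html))
        {
            Console.WriteLine(title);
        }
    }
}
```

Regular expressions are adequate for small, predictable pages like this; for irregular markup an HTML parser is usually the more robust choice.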
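// Capture the text between a '>' and the closing </h1> tag.
// (.*) is greedy, so it runs to the last </h1> on the line.
MatchCollection matches = Regex.Matches(str, @">(.*)</h1>");
foreach (Match match in matches)
{
    GroupCollection groups = match.Groups;
    // groups[1] is the captured content
    Console.WriteLine(groups[1].Value);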
}
Collecting matches into a dictionary
Dictionary<string, string> resultDict = new Dictionary<string, string>();
// Named groups: "key" is the <p> text, "value" is everything inside the <ul> that follows
foreach (Match matchedGroup in Regex.Matches(str, @"<p>(?<key>.*)</p>\n<ul>\n(?<value>[\w\W\n]*?)</ul>"))
{
    string key = matchedGroup.Groups["key"].Value;
    Console.WriteLine(key);
    // Keep only the entries whose key contains the target text
    if (key.Contains("hahhaha"))
    {
        // The indexer overwrites a repeated key, whereas Add would throw
        resultDict[key] = matchedGroup.Groups["value"].Value;
    }
}
3. Writing to files
Writing to a .txt file
using (StreamWriter sw = new StreamWriter("C:\\hhaha.txt"))
{
    // Write each key and its value on consecutive lines
    foreach (KeyValuePair<string, string> x in resultDict)
    {
        sw.WriteLine(x.Key);
        sw.WriteLine(x.Value);
    }
}
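Writing to a .csv file. The second argument in new StreamWriter(path, true) turns on append mode: if the file already exists, new content is added to its end instead of overwriting it; if the file does not exist, it is created.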
using (StreamWriter sw = new StreamWriter("C:\\Users\\haha1.CSV", true))
{
    if (type == "people")
    {
        sw.WriteLine("name," + name);
        // aValue is a list; a comma inside an item splits it across several columns
        aValue.ToList().ForEach(x => { sw.WriteLine("hobby," + (x)); });
    }
    else if (type == "animals")
    {
        // "长相" is the row label and means "appearance"
        sValue.ToList().ForEach(x => { sw.WriteLine("长相," + (x)); });
        sw.WriteLine("");
    }
    else
    {
        // Other types are not written
    }
}
New lines can also be appended to a file like this:
//File.AppendAllLines($"C:\\Users\\hobby.CSV", Value.ToArray());
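The comment in the CSV snippet above notes that a comma inside a value spills into extra columns. One common remedy is to quote such fields before writing them. The following is a minimal sketch of that idea combined with File.AppendAllLines; the CsvAppend class, its method names, the temp-file path, and the sample row are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvAppend
{
    // Wraps a field in double quotes when it contains a comma, a quote, or a
    // line break, doubling any embedded quotes (the usual CSV convention).
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0)
        {
            return field;
        }
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Appends one CSV row to path; File.AppendAllLines creates the file if it is missing.
    public static void AppendRow(string path, IEnumerable<string> fields)
    {
        File.AppendAllLines(path, new[] { string.Join(",", fields.Select(Escape)) });
    }

    public static void Main()
    {
        // Hypothetical output location in the temp directory
        string path = Path.Combine(Path.GetTempPath(), "crawler_demo.csv");
        // The second field contains a comma, so it is written as "reading, hiking"
        AppendRow(path, new[] { "hobby", "reading, hiking" });
        Console.WriteLine(File.ReadAllText(path));
    }
}
```

Because the row is appended, running the sketch repeatedly keeps adding lines to the same file. A spreadsheet should then show the quoted value in a single column.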