Web Crawler Essentials: A Summary
2022-07-01 08:21:00 【hellolianhua】
1. Fetching a page's HTML
Use WebClient to request the page. The method below takes a URL and a POST body and returns the response HTML as a string.
// Requires: using System.Net; using System.Text;
public static string HttpPost(string url, string paraJsonStr)
{
    // WebClient is IDisposable, so release it with a using block.
    using (WebClient webClient = new WebClient())
    {
        // The body is declared as form data; change this header if the server expects JSON.
        webClient.Headers.Add("Content-Type", "application/x-www-form-urlencoded");
        byte[] postData = Encoding.UTF8.GetBytes(paraJsonStr);
        byte[] responseData = webClient.UploadData(url, "POST", postData);
        return Encoding.UTF8.GetString(responseData);
    }
}
2. Regex matching
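// Requires: using System.Text.RegularExpressions;
// The lazy (.*?) stops at the first </h1>; a greedy (.*) would run to the last one on the line.
MatchCollection matches = Regex.Matches(str, @">(.*?)</h1>");
foreach (Match match in matches)
{
    GroupCollection groups = match.Groups;
    // groups[1] holds the captured content
    Console.WriteLine(groups[1].Value);
}
Collecting matches into a dictionary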
Dictionary<string, string> resultDict = new Dictionary<string, string>();
// Named groups: "key" is the <p> text, "value" is everything inside the following <ul>.
// [\s\S] matches any character, including line breaks.
foreach (Match matchedGroup in Regex.Matches(str, @"<p>(?<key>.*?)</p>\n<ul>\n(?<value>[\s\S]*?)</ul>"))
{
    string key = matchedGroup.Groups["key"].Value;
    Console.WriteLine(key);
    if (key.Contains("hahhaha"))
    {
        // The indexer overwrites a repeated key; Add would throw on a duplicate.
        resultDict[key] = matchedGroup.Groups["value"].Value;
    }
}
3. Writing to files
Writing a .txt file:
// Requires: using System.IO;
using (StreamWriter sw = new StreamWriter("C:\\hhaha.txt"))
{
    // Write each key and its value on consecutive lines.
    foreach (var x in resultDict)
    {
        sw.WriteLine(x.Key);
        sw.WriteLine(x.Value);
    }
}
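The same write can be sketched as a small self-contained program. The sample dictionary, the temp-file path, and the explicit UTF-8 encoding are assumptions added for illustration; they are not part of the original code. Passing the encoding explicitly is one way to make sure scraped non-ASCII text is stored predictably.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Text;

public static class TxtWriterSketch
{
    // Writes each key and its value on consecutive lines.
    public static void WriteDict(string path, IDictionary<string, string> dict)
    {
        // append: false overwrites an existing file; UTF-8 is chosen explicitly.
        using (var sw = new StreamWriter(path, false, Encoding.UTF8))
        {
            foreach (var pair in dict)
            {
                sw.WriteLine(pair.Key);
                sw.WriteLine(pair.Value);
            }
        }
    }

    public static void Main()
    {
        // Hypothetical sample data standing in for resultDict.
        var sample = new Dictionary<string, string> { { "title", "hello" } };
        string path = Path.Combine(Path.GetTempPath(), "crawler_sketch.txt");
        WriteDict(path, sample);
        Console.WriteLine(File.ReadAllText(path));
    }
}
```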
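Writing a CSV file. In new StreamWriter("path", true), the second argument true turns on append mode: if the file already exists, new data is added to the end of it instead of overwriting it; if the file does not exist, it is created.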
// type, name, aValue and sValue come from the surrounding code.
using (StreamWriter sw = new StreamWriter("C:\\Users\\haha1.CSV", true))
{
    if (type == "people")
    {
        sw.WriteLine("name," + name);
        // aValue is a list; an item that contains a comma will spill into several columns
        foreach (var x in aValue)
        {
            sw.WriteLine("hobby," + x);
        }
    }
    else if (type == "animals")
    {
        // "长相" is the row label, meaning "appearance"
        foreach (var x in sValue)
        {
            sw.WriteLine("长相," + x);
        }
        sw.WriteLine("");
    }
}
New lines can also be appended to an existing file this way:
File.AppendAllLines("C:\\Users\\hobby.CSV", Value.ToArray());
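The comment in the CSV code notes that a value containing a comma spills into extra columns. One common way to avoid that is to quote such fields following the RFC 4180 convention. The sketch below is an assumption about how this could be handled, not the original author's method; the names CsvSketch, Escape and ToRow, the sample hobbies, and the temp-file path are all hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.IO;
using System.Linq;

public static class CsvSketch
{
    // Quotes a field if it contains a comma, a double quote, or a line break,
    // and doubles any embedded double quotes (RFC 4180 convention).
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\r', '\n' }) < 0)
        {
            return field;
        }
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Joins already-separated fields into one CSV row.
    public static string ToRow(IEnumerable<string> fields)
    {
        return string.Join(",", fields.Select(f => Escape(f)));
    }

    public static void Main()
    {
        // Hypothetical sample data; the second item contains a comma.
        var hobbies = new List<string> { "reading", "hiking, camping" };
        string path = Path.Combine(Path.GetTempPath(), "crawler_sketch.csv");

        // append: true adds rows to an existing file or creates the file.
        using (var sw = new StreamWriter(path, true))
        {
            foreach (var hobby in hobbies)
            {
                sw.WriteLine(ToRow(new[] { "hobby", hobby }));
            }
        }
    }
}
```

With this helper, the item "hiking, camping" is written as a single quoted field, so spreadsheet tools that follow the same convention should keep it in one column.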