Web Scraping: A Summary of Key Points
2022-07-01 08:21:00 【hellolianhua】
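1. Fetching a page's HTML
Use WebClient to request the page. The HttpPost method in this section takes a URL and a request body, sends the body as a POST request, and returns the response HTML as a string.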
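The HttpPost method in this section declares its body as application/x-www-form-urlencoded, so the server will usually expect key=value pairs joined by & rather than raw JSON. A minimal sketch of building such a body follows; the class name FormBody and the sample field names are assumptions made for illustration, not part of the original post.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

public static class FormBody
{
    // Builds "k1=v1&k2=v2", percent-encoding every key and value
    // so that characters such as '&', '=' and spaces cannot break the body.
    public static string Build(IEnumerable<KeyValuePair<string, string>> fields)
    {
        return string.Join("&", fields.Select(f =>
            Uri.EscapeDataString(f.Key) + "=" + Uri.EscapeDataString(f.Value)));
    }
}
```

The resulting string can be passed as the second argument of HttpPost. If the target server expects JSON instead, the Content-Type header would need to be application/json.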
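// Requires: using System.Net; using System.Text;
public static string HttpPost(string url, string paraJsonStr)
{
    // Dispose the WebClient when done so its resources are released.
    using (WebClient webClient = new WebClient())
    {
        // The body is declared as form-urlencoded, so paraJsonStr should hold
        // key=value pairs; use "application/json" if the server expects JSON.
        webClient.Headers.Add("Content-Type", "application/x-www-form-urlencoded");
        byte[] postData = Encoding.UTF8.GetBytes(paraJsonStr);
        byte[] responseData = webClient.UploadData(url, "POST", postData);
        // Assumes the response is UTF-8 encoded.
        return Encoding.UTF8.GetString(responseData);
    }
}
2. Regex matching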
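To make heading extraction concrete, here is a small self-contained sketch that can be run against a sample string. The class name HeadingExtractor and the method name ExtractH1 are assumptions for illustration. The pattern is anchored on the opening <h1> tag and uses a lazy quantifier, so two headings on the same line are not merged into one match.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class HeadingExtractor
{
    // Returns the inner text of every <h1>...</h1> pair found in html.
    public static List<string> ExtractH1(string html)
    {
        var result = new List<string>();
        // [^>]* skips any attributes on the opening tag;
        // (.*?) is lazy, so each match stops at the first </h1>.
        foreach (Match match in Regex.Matches(html, @"<h1[^>]*>(.*?)</h1>"))
        {
            // Groups[0] is the whole match; Groups[1] is the captured text.
            result.Add(match.Groups[1].Value);
        }
        return result;
    }
}
```

Regular expressions work for simple, predictable markup like this. For deeply nested or irregular HTML, a dedicated HTML parser is usually more robust.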
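// Requires: using System.Text.RegularExpressions;
// The lazy (.*?) stops at the first </h1>, so two headings on one line stay separate.
MatchCollection matches = Regex.Matches(str, @"<h1[^>]*>(.*?)</h1>");
foreach (Match match in matches)
{
    // Groups[0] is the whole match; Groups[1] is the captured heading text.
    Console.WriteLine(match.Groups[1].Value);
}
Collecting matches into a dictionary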
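Named capture groups make it easy to turn paired markup into key/value entries. The sketch below is a runnable version of the idea, assuming each <p> title is followed directly by a <ul> block on the next line. The class name ListParser and the sample markup in the usage note are hypothetical.

```csharp
using System;
using System.Collections.Generic;
using System.Text.RegularExpressions;

public static class ListParser
{
    // Maps each <p> title to the raw contents of the <ul> that follows it.
    public static Dictionary<string, string> Parse(string html)
    {
        var result = new Dictionary<string, string>();
        // \r?\n accepts Unix and Windows line endings;
        // [\s\S]*? lazily matches any characters, including newlines.
        string pattern = @"<p>(?<key>.*?)</p>\r?\n<ul>\r?\n(?<value>[\s\S]*?)</ul>";
        foreach (Match match in Regex.Matches(html, pattern))
        {
            // The indexer overwrites a repeated key instead of throwing.
            result[match.Groups["key"].Value] = match.Groups["value"].Value.Trim();
        }
        return result;
    }
}
```

For example, calling ListParser.Parse on "<p>Fruits</p>\n<ul>\n<li>apple</li>\n</ul>" yields a single entry whose key is Fruits and whose value is <li>apple</li>.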
Dictionary<string, string> resultDict = new Dictionary<string, string>();
// Named groups: "key" is the <p> text, "value" is everything inside the following <ul>.
// \r?\n accepts both Unix and Windows line endings.
foreach (Match match in Regex.Matches(str, @"<p>(?<key>.*?)</p>\r?\n<ul>\r?\n(?<value>[\s\S]*?)</ul>"))
{
    string key = match.Groups["key"].Value;
    Console.WriteLine(key);
    if (key.Contains("hahhaha"))
    {
        // The indexer overwrites a repeated key; Add would throw on a duplicate.
        resultDict[key] = match.Groups["value"].Value;
    }
}
3. Writing to a file
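Writing to a .txt file
// Requires: using System.IO;
using (StreamWriter sw = new StreamWriter(@"C:\hhaha.txt"))
{
    // Write each key on one line and its value on the next.
    foreach (KeyValuePair<string, string> pair in resultDict)
    {
        sw.WriteLine(pair.Key);
        sw.WriteLine(pair.Value);
    }
}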
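Writing to a CSV file. In new StreamWriter(path, true), the second argument true turns on append mode: if the file already exists, new data is added to the end instead of overwriting it, and if the file does not exist, it is created.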
using (StreamWriter sw = new StreamWriter(@"C:\Users\haha1.CSV", true))
{
    if (type == "people")
    {
        sw.WriteLine("name," + name);
        // aValue is a list; an item that contains a comma spills into several columns.
        foreach (var x in aValue)
        {
            sw.WriteLine("hobby," + x);
        }
    }
    else if (type == "animals")
    {
        // "长相" means "appearance".
        foreach (var x in sValue)
        {
            sw.WriteLine("长相," + x);
        }
        sw.WriteLine("");
    }
    // Any other type: nothing is written.
}
This appends new lines to the file. File.AppendAllLines does the same in a single call:
//File.AppendAllLines(@"C:\Users\hobby.CSV", Value.ToArray());
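As the comment in the CSV snippet notes, a value that contains a comma ends up spread across several columns. The usual remedy (the convention described in RFC 4180) is to wrap such fields in double quotes and double any embedded quotes. A minimal sketch follows; the class name Csv and its methods are assumptions for illustration, not part of the original code.

```csharp
using System;
using System.Linq;

public static class Csv
{
    // Quotes a field only when it contains a comma, a double quote, or a line break.
    public static string Escape(string field)
    {
        if (field.IndexOfAny(new[] { ',', '"', '\n', '\r' }) < 0)
        {
            return field;
        }
        // Embedded double quotes are escaped by doubling them.
        return "\"" + field.Replace("\"", "\"\"") + "\"";
    }

    // Joins already-separated fields into one CSV line.
    public static string Row(params string[] fields)
    {
        return string.Join(",", fields.Select(f => Escape(f)));
    }
}
```

With this helper, a line such as sw.WriteLine("hobby," + x) could become sw.WriteLine(Csv.Row("hobby", x)), which keeps each list item in a single column even if it contains commas.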