A scheme for crawlers to collect public opinion data
2022-06-24 12:33:00 【User 6172015】
Simply put, a web crawler is a program that visits websites, or connects to their API interfaces, to retrieve data. It extracts the required information from web pages and saves it to a new document. Crawlers can collect many kinds of data, including files, pictures, and videos, but they must not be used for illegal purposes. In the era of Internet big data, crawlers mainly serve search engines, supplying them with comprehensive and up-to-date data.
Web crawlers can also collect public opinion data from news sites, social media, forums, blogs, and similar sources, and this is one of the common ways to acquire such data. Typically, a crawler program uses proxy IPs to collect data from the websites of interest. Public opinion data can also be purchased on data trading markets or obtained from professional public opinion analysis teams, although those teams generally also rely on crawlers with proxy IPs to collect the data they analyze.
With short video now so popular, a crawler program can also collect data from the two mainstream short-video apps, TikTok and Kwai, for public opinion analysis. The resulting statistics can be compiled into tables and delivered as a data report. The following code shows one collection approach for reference:
using System;
using System.IO;
using System.Net;
using System.Text;

// Target page to visit
string targetUrl = "http://httpbin.org/ip";
// Proxy server (product website: www.16yun.cn)
string proxyHost = "http://t.16yun.cn";
string proxyPort = "31111";
// Proxy authentication information
string proxyUser = "username";
string proxyPass = "password";
// Setting up a proxy server
WebProxy proxy = new WebProxy(string.Format("{0}:{1}", proxyHost, proxyPort), true);
ServicePointManager.Expect100Continue = false;
var request = WebRequest.Create(targetUrl) as HttpWebRequest;
request.AllowAutoRedirect = true;
request.KeepAlive = true;
request.Method = "GET";
request.Proxy = proxy;
//request.Proxy.Credentials = CredentialCache.DefaultCredentials;
request.Proxy.Credentials = new System.Net.NetworkCredential(proxyUser, proxyPass);
// Set up the Proxy-Tunnel header (optional)
// Random ran = new Random();
// int tunnel = ran.Next(1, 10000);
// request.Headers.Add("Proxy-Tunnel", tunnel.ToString());
//request.Timeout = 20000;
//request.ServicePoint.ConnectionLimit = 512;
//request.UserAgent = "Mozilla/5.0 (Windows NT 6.3; Win64; x64) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/48.0.2564.82 Safari/537.36";
//request.Headers.Add("Cache-Control", "max-age=0");
//request.Headers.Add("DNT", "1");
//String encoded = System.Convert.ToBase64String(System.Text.Encoding.GetEncoding("ISO-8859-1").GetBytes(proxyUser + ":" + proxyPass));
//request.Headers.Add("Proxy-Authorization", "Basic " + encoded);
using (var response = request.GetResponse() as HttpWebResponse)
using (var sr = new StreamReader(response.GetResponseStream(), Encoding.UTF8))
{
    // Read the whole response body and print it
    string htmlStr = sr.ReadToEnd();
    Console.WriteLine(htmlStr);
}
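The step of compiling collected data into a table for a report could be sketched roughly as follows. This is only an illustration under stated assumptions, not the article's own method: the `Post` class, the `OpinionReport` helper, the keyword, and the sample records are all hypothetical, and real input would come from parsing the pages fetched by the request code.

```csharp
using System;
using System.Collections.Generic;
using System.Linq;

// Hypothetical record: one collected post and the platform it came from.
public class Post
{
    public string Platform { get; }
    public string Text { get; }
    public Post(string platform, string text) { Platform = platform; Text = text; }
}

public static class OpinionReport
{
    // Count posts that mention the keyword (case-insensitive), grouped by platform.
    public static Dictionary<string, int> CountMentions(IEnumerable<Post> posts, string keyword)
    {
        return posts
            .Where(p => p.Text.IndexOf(keyword, StringComparison.OrdinalIgnoreCase) >= 0)
            .GroupBy(p => p.Platform)
            .ToDictionary(g => g.Key, g => g.Count());
    }

    // Render the counts as a plain-text table, highest count first.
    public static string ToTable(Dictionary<string, int> counts)
    {
        var lines = new List<string> { string.Format("{0,-12}{1,8}", "Platform", "Mentions") };
        foreach (var row in counts.OrderByDescending(kv => kv.Value).ThenBy(kv => kv.Key))
        {
            lines.Add(string.Format("{0,-12}{1,8}", row.Key, row.Value));
        }
        return string.Join(Environment.NewLine, lines);
    }
}

public static class Program
{
    public static void Main()
    {
        // Made-up sample data standing in for crawled posts.
        var posts = new List<Post>
        {
            new Post("News", "Delivery delays reported in several cities"),
            new Post("Forum", "My delivery arrived on time"),
            new Post("Forum", "Great customer service"),
            new Post("Blog", "Why delivery fees keep rising")
        };

        Console.WriteLine(OpinionReport.ToTable(OpinionReport.CountMentions(posts, "delivery")));
    }
}
```

Counting keyword mentions per platform is just one simple statistic. A real report would likely need deduplication, time ranges, and sentiment labels, none of which are shown here.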