33 - NodeJS simple proxy pool (probably the end): the painful experience of using a proxy with SuperAgent, and a reminder to be careful

2022-06-09 10:36:00 BerryBC

Zero. Despair

I don't understand what my younger self was thinking, however many years ago, when I wrote:
Forty four - NodeJS simple proxy pool (rise)
Did I ever test anything back then?
There was a fatal mistake in it.

Which led to this:
Thirty-two - NodeJS simple proxy pool (over and over?) and SuperAgent's timeout that is not a timeout when using a proxy

If you are not interested in the details, just skip to the bottom.
The conclusion is there.

One. Continuing to troubleshoot the excessive TCP connections

Why on earth!?
I was very confused, so I checked netstat. Here is the process:

$ sudo netstat -tnlpoa

tcp        0      0 172.17.0.7:50136        182.54.207.74:80        ESTABLISHED 10418/node /home/Be  off (0.00/0/0)
tcp        0      1 172.17.0.7:51250        49.76.80.223:80         SYN_SENT    10418/node /home/Be  on (12.42/6/0)
tcp        0      0 172.17.0.7:53422        73.239.197.175:80       ESTABLISHED 10418/node /home/Be  off (0.00/0/0)
tcp        0      0 172.17.0.7:50170        122.192.175.180:9999    ESTABLISHED 10418/node /home/Be  off (0.00/0/0)
tcp        0      1 172.17.0.7:57588        185.100.15.48:80        SYN_SENT    10418/node /home/Be  on (11.39/5/0)
tcp        0      1 172.17.0.7:36296        223.242.224.59:80       SYN_SENT    10418/node /home/Be  on (61.76/6/0)
tcp        1      0 127.0.0.1:39800         127.0.0.1:52107         CLOSE_WAIT  1490/python3         off (0.00/0/0)
tcp        0      0 172.17.0.7:43576        85.196.183.162:80       ESTABLISHED 10418/node /home/Be  off (0.00/0/0)
tcp        0      0 172.17.0.7:47874        180.254.227.43:80       ESTABLISHED 10418/node /home/Be  off (0.00/0/0)



$ sudo netstat -tnlpoa|grep 80|wc -l
311

# ----- mongo:

> db.tbProxy.find({ p: null })
{ "_id" : ObjectId("5e46b85a7e384044fb93c7a2"), "u" : "183.166.103.182", "p" : null, "ft" : ISODate("2020-02-14T15:10:18.526Z"), "fail" : 0 }
{ "_id" : ObjectId("5e46b85a7e384044fb93c7a3"), "u" : "82.137.244.59", "p" : null, "ft" : ISODate("2020-02-14T15:10:18.526Z"), "fail" : 0 }
{ "_id" : ObjectId("5e46b85a7e384044fb93c7a4"), "u" : "110.78.154.71", "p" : null, "ft" : ISODate("2020-02-14T15:10:18.527Z"), "fail" : 0 }

> db.tbProxy.find({ p: null }).count()
3426

I did see connections on ports 27017 (MongoDB) and 9999 (a common port for high-anonymity proxies).
But it turns out that the main cause of the problem is port 80, which accounts for more than half of the connections.
What!?
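One caveat about the count above: `grep 80` matches any line that contains "80" anywhere, including addresses such as 180.254.227.43, so 311 is an upper bound for port 80 rather than an exact figure. A minimal sketch of a stricter count, grouping connections by remote port, might look like the following. The function name `countByRemotePort` is my own, and the column layout is assumed to match the netstat output shown earlier.

```javascript
// Counts TCP connections per remote port from `netstat -tn`-style output.
// Assumed columns: proto, recv-q, send-q, local address, foreign address, state, ...
function countByRemotePort(strNetstat) {
    const objCount = {};
    for (const strLine of strNetstat.split('\n')) {
        const arrCol = strLine.trim().split(/\s+/);
        // Skip headers, blank lines and anything that is not a TCP row.
        if (!arrCol[0].startsWith('tcp') || arrCol.length < 6) continue;
        const strForeign = arrCol[4];
        const strPort = strForeign.slice(strForeign.lastIndexOf(':') + 1);
        objCount[strPort] = (objCount[strPort] || 0) + 1;
    }
    return objCount;
}

// Four rows copied from the netstat output above.
const strSample = [
    'tcp        0      0 172.17.0.7:50136        182.54.207.74:80        ESTABLISHED 10418/node /home/Be  off (0.00/0/0)',
    'tcp        0      1 172.17.0.7:51250        49.76.80.223:80         SYN_SENT    10418/node /home/Be  on (12.42/6/0)',
    'tcp        0      0 172.17.0.7:50170        122.192.175.180:9999    ESTABLISHED 10418/node /home/Be  off (0.00/0/0)',
    'tcp        1      0 127.0.0.1:39800         127.0.0.1:52107         CLOSE_WAIT  1490/python3         off (0.00/0/0)'
].join('\n');

// Port 80 appears twice; 9999 and 52107 once each.
console.log(countByRemotePort(strSample));
```

Feeding it the full netstat dump instead of `strSample` would show how many connections really target port 80.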

Two. Searching the code

When saving proxies, I mainly use the following code:

            request.get(strWebURL).set(that.objHeader).timeout({ response: that.intTimeout, deadline: that.intTimeout * 2 }).use(superagentCheerio).then((res) => {
                ....
                let arrProxyList = strTableContent.match(/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}[\n\s]*\d{1,4}/g);
                for (const strOneProxy of arrProxyList) {
                    ....
                }
                // console.log('Finished scraping ' + strWebURL + ', captured ' + arrProxyList.length)
                ....
            }).catch((err) => {
                // If it fails, it fails; no error is reported.
                ....
            });
        };

Here is the problem: many proxy records have no port at all (the documents in MongoDB where p is null).
So I went through the websites one by one and found that the free proxy sites format their lists differently:
Some separate the IP and the port with whitespace or line breaks,
and some, horribly, write them as IP:port.

The code:

t.match(/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}[\n\s]*\d{1,4}/g)

This does not match the second format. To match it, a colon has to be added inside the brackets, as follows:

t.match(/\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}[\n\s:]*\d{1,4}/g)

That is what I get for not working carefully. Alas.
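To see why the old pattern produced records with an IP but no port, here is a small sketch that runs both patterns on the same text. The sample string and the `splitProxy` helper are hypothetical; the post does not show how the matched string is split into `u` and `p`, so that part is only my assumption.

```javascript
// The original and the corrected pattern from the text above.
const reOld = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}[\n\s]*\d{1,4}/g;
const reNew = /\d{1,3}\.\d{1,3}\.\d{1,3}\.\d{1,3}[\n\s:]*\d{1,4}/g;

// Hypothetical page text: one whitespace-separated entry, one colon-separated entry.
const strSample = '82.137.244.59\n 8080 and 183.166.103.182:9999';

// With the old pattern the second entry still matches, but only through
// backtracking: the last octet shrinks to "18" and the trailing "2" is taken
// as the "port", so the match is the bare IP without ":9999".
const arrOld = strSample.match(reOld);
const arrNew = strSample.match(reNew);

// Assumed splitting step (not from the post): separate IP and port.
function splitProxy(strOneProxy) {
    const arrPart = strOneProxy.split(/[\n\s:]+/);
    return { u: arrPart[0], p: arrPart.length > 1 ? Number(arrPart[1]) : null };
}

console.log(arrOld.map(splitProxy)); // second entry has p: null
console.log(arrNew.map(splitProxy)); // second entry has p: 9999
```

Under that assumption, the bare-IP matches would explain the `p: null` documents in MongoDB. Note also that `\d{1,4}` cuts a five-digit port such as 12345 down to 1234; if the source sites list such ports, `\d{1,5}` may be worth considering.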

Three. Using a proxy with SuperAgent

Back then I must have been out of my mind, because I wrote the following code:

const superagentCheerio = require('superagent-cheerio');
const request = require('superagent');

...
    verifyTheProxy(arrProxy, ctlIO, funCB) {
        let that = this;
        let funCheck = function(item, funCB) {
            let strProxy = 'http://' + item;
            request.get('https://www.baidu.com').timeout({ response: that.intTimeout, deadline: that.intTimeout * 3 }).use(superagentCheerio).proxy(strProxy).set(that.objHeader).then((res) => {
                ....
                // Update the proxy record: it has been verified
            }).catch((err) => {
                ....
            });
        };
        async.eachLimit(arrProxy, 6, funCheck, (err) => {
            ....
        });
    };

Is there a problem?
I didn't think there was any problem, until I searched the official SuperAgent website.
There is no proxy method at all!
Then I searched for SuperAgent and Proxy, and found this:
superagent-proxy


So is my code wrong?
Let me try it directly in the NodeJS REPL first:

G:\Programme\Working\NodeJS\Working\Proxy_Pool>node
> const superagentCheerio = require('superagent-cheerio');
undefined

> (node:872) ExperimentalWarning: The http2 module is an experimental API.
const request = require('superagent');
undefined

> let intTimeout=10000;
undefined

> let strProxy='http://122.xx.xx.xx:xxxx';
undefined

> request.get('https://www.baidu.com').timeout({ response: intTimeout, deadline: intTimeout * 3 }).use(superagentCheerio).proxy(strProxy).then((res) => { console.log(res) })
TypeError: request.get(...).timeout(...).use(...).proxy is not a function

Why?
Why "not a function"!?
Does that mean my NodeJS check of whether a proxy can connect has been failing all along!?
That no matter how the connection goes, it actually ends up straight in catch!?
I gave it a try in VSCode.
It's true …


I … well …
I immediately added this:
require('superagent-proxy')(request);

const superagentCheerio = require('superagent-cheerio');
const request = require('superagent');
// With this one line, that's all it takes!!!!!!
require('superagent-proxy')(request);
...
    verifyTheProxy(arrProxy, ctlIO, funCB) {
        let that = this;
        let funCheck = function(item, funCB) {
            let strProxy = 'http://' + item;
            request.get('https://www.baidu.com').timeout({ response: that.intTimeout, deadline: that.intTimeout * 3 }).use(superagentCheerio).proxy(strProxy).set(that.objHeader).then((res) => {
                ....
                // Update the proxy record: it has been verified
            }).catch((err) => {
                ....
            });
        };
        async.eachLimit(arrProxy, 6, funCheck, (err) => {
            ....
        });
    };
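A mistake like this is easy to miss because nothing complains at startup. One possible safeguard is a check that runs once before any proxy is verified. The sketch below assumes that superagent-proxy works by adding a `proxy` method to `request.Request.prototype`; that is my understanding of the package, not something stated in this post. The names `hasProxySupport` and `assertProxySupport` are hypothetical, and the two fake objects only stand in for the real libraries so that the snippet runs on its own.

```javascript
// Returns true when a superagent-like export has `.proxy()` on its Request prototype.
function hasProxySupport(objSuperagent) {
    const clsRequest = objSuperagent && objSuperagent.Request;
    return Boolean(clsRequest) && typeof clsRequest.prototype.proxy === 'function';
}

// Fails loudly at startup instead of silently failing on every request.
function assertProxySupport(objSuperagent) {
    if (!hasProxySupport(objSuperagent)) {
        throw new Error("No .proxy() on superagent; call require('superagent-proxy')(request) first");
    }
}

// Stand-ins for illustration only (not the real libraries).
function FakeRequest() {}
const objPlain = { Request: FakeRequest }; // like superagent on its own

function FakePatchedRequest() {}
FakePatchedRequest.prototype.proxy = function (strUri) { return this; };
const objPatched = { Request: FakePatchedRequest }; // like superagent after patching

console.log(hasProxySupport(objPlain));   // false
console.log(hasProxySupport(objPatched)); // true
```

In the real module, the idea would be to call `assertProxySupport(request)` right after the `require` lines, so that a missing `require('superagent-proxy')(request)` is caught immediately.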

Boss. Life takes effort

But you will find that, no matter how hard you try, effort is not enough without talent. Let's just forget it.

Copyright notice
This article was written by [BerryBC]. Please include a link to the original when reposting. Thank you.
https://yzsam.com/2022/160/202206090959013240.html