How to identify fake crawlers?
2022-07-31 21:11:00 【oHuangBing】
When we examine website logs, we often encounter all kinds of crawlers. Some are legitimate, such as search engine crawlers (the Baidu, Google and Bing search engine crawlers, YandexBot, etc.) and crawlers built for various other purposes; a catalog of them can be viewed here: list crawlers.
However, not all crawlers on the Internet are beneficial. Some try to hide by imitating the characteristics of real crawlers. Fake crawlers go further: they impersonate search engine crawlers while scraping your website's data. Their User-agent looks identical to a search engine's, but their IP address does not belong to that search engine. In these cases we need to accurately identify the IP addresses of the fake crawlers.
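Besides an online query tool, Google documents a DNS-based way to verify Googlebot: reverse-resolve the visiting IP, check that the hostname belongs to the search engine's domain, then forward-resolve that hostname and confirm it maps back to the same IP. The Python sketch below applies that check. The suffix table for Bing, Baidu and Yandex is an assumption based on those engines' published guidance and should be re-checked against their current documentation; `is_genuine_crawler`, `VERIFIED_SUFFIXES` and the helper names are hypothetical, and only IPv4 forward lookups are handled.

```python
import socket
from typing import Callable, List

# Assumed PTR hostname suffixes per crawler (verify against each engine's docs).
VERIFIED_SUFFIXES = {
    "googlebot": (".googlebot.com", ".google.com"),
    "bingbot": (".search.msn.com",),
    "baiduspider": (".baidu.com", ".baidu.jp"),
    "yandexbot": (".yandex.ru", ".yandex.net", ".yandex.com"),
}


def reverse_lookup(ip: str) -> str:
    """Reverse DNS (PTR) lookup; raises socket.herror if there is no record."""
    return socket.gethostbyaddr(ip)[0]


def forward_lookup(hostname: str) -> List[str]:
    """Forward DNS lookup returning the hostname's IPv4 addresses."""
    return socket.gethostbyname_ex(hostname)[2]


def is_genuine_crawler(
    ip: str,
    bot: str,
    reverse: Callable[[str], str] = reverse_lookup,
    forward: Callable[[str], List[str]] = forward_lookup,
) -> bool:
    """Return True only if ip passes the reverse + forward DNS check for bot."""
    suffixes = VERIFIED_SUFFIXES.get(bot.lower())
    if suffixes is None:
        return False  # Unknown crawler name: nothing to verify against.
    try:
        host = reverse(ip).rstrip(".").lower()
    except OSError:  # socket.herror and socket.gaierror subclass OSError
        return False  # No PTR record: treat as unverified.
    # Step 1: the PTR hostname must sit under one of the crawler's domains.
    if not host.endswith(suffixes):
        return False
    # Step 2: the hostname must resolve back to the same IP; otherwise anyone
    # who controls the PTR record of their own IP could forge step 1.
    try:
        return ip in forward(host)
    except OSError:
        return False
```

Called with the default resolvers, `is_genuine_crawler("34.68.229.128", "Googlebot")` performs live DNS queries, so its result depends on the network. The `reverse` and `forward` parameters are injectable so the logic can be tested without DNS.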
With the Crawler IP Query Tool, we can easily identify fake crawlers. For example:
34.68.229.128 Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
This is a simplified record from my log: the IP address comes first, followed by the User-agent of the visiting crawler. Judging by the User-agent, it claims to be the Google search engine's spider (Googlebot).
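To pull suspicious entries out of a log in this simplified format, a short script can split each line into IP and User-agent and report which crawler the User-agent claims to be. This is a minimal sketch that assumes the "IP, space, User-agent" layout shown above (real access-log formats such as Apache's combined format differ); `parse_log_line`, `claimed_crawler` and the `CRAWLER_TOKENS` list are hypothetical names chosen for illustration.

```python
from typing import Optional, Tuple

# Assumed User-agent tokens for the crawlers mentioned in this article.
CRAWLER_TOKENS = ("Googlebot", "bingbot", "Baiduspider", "YandexBot")


def parse_log_line(line: str) -> Tuple[str, str]:
    """Split a simplified log line into (ip, user_agent) at the first space."""
    ip, _, user_agent = line.strip().partition(" ")
    return ip, user_agent


def claimed_crawler(user_agent: str) -> Optional[str]:
    """Return the crawler the User-agent claims to be, or None (case-insensitive)."""
    ua = user_agent.lower()
    for token in CRAWLER_TOKENS:
        if token.lower() in ua:
            return token
    return None


if __name__ == "__main__":
    line = ("34.68.229.128 Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
            "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Mobile "
            "Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)")
    ip, ua = parse_log_line(line)
    print(ip, claimed_crawler(ua))  # prints: 34.68.229.128 Googlebot
```

The User-agent only tells us what the visitor claims to be; the IP address still has to be checked, because anyone can copy a User-agent string.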
Querying the IP shows that this is a fake Google spider; the screenshot is as follows:

We only need to enter the IP address of the suspected crawler to see information about it. This way, no crawler, real or fake (like the real and fake Li Kui of the Chinese saying), can escape our notice.
To see more fake bots, visit listcrawlers fake bot, which catalogs the fake bots commonly seen on the Internet.
Summary
This article explained what fake crawlers are and how to use the crawler IP query tool to identify them accurately.