How to identify fake crawlers?
2022-07-31 21:11:00 【oHuangBing】
When we examine website logs, we often encounter various crawlers. Some are legitimate, for example search engine crawlers (Baidu, Google, Bing, YandexBot, and so on), along with crawlers that serve various other purposes, which can be viewed here: list crawlers.
However, not all crawlers on the Internet are beneficial. Some crawlers try to hide themselves by imitating the characteristics of real crawlers. These are fake crawlers: they impersonate search engines and crawl the data of your website. Although the User-Agent looks the same as the search engine's, the IP address does not belong to the search engine. In that case, we need to accurately identify the IP addresses of these fake crawlers.
With the Crawler IP Query Tool, we can easily identify fake crawlers. For example:
34.68.229.128 Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Mobile Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)
This is a simplified log record of mine. The IP address comes first, followed by the User-Agent of the visiting crawler. Judging by the User-Agent, it claims to be a spider of the Google search engine.
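As a minimal sketch of reading such a record, the snippet below splits the simplified line into its IP address and User-Agent, then reports which search engine the User-Agent claims to be. The function names and the `CRAWLER_TOKENS` table are hypothetical, introduced only for illustration, and the sketch assumes the simplified `<ip> <user-agent>` layout shown above rather than a full web server log format.

```python
# Hypothetical token table for illustration; extend it for other crawlers.
CRAWLER_TOKENS = {
    "Googlebot": "Google",
    "bingbot": "Bing",
    "Baiduspider": "Baidu",
    "YandexBot": "Yandex",
}


def parse_log_line(line):
    """Split a simplified record of the form '<ip> <user-agent>'."""
    ip, _, user_agent = line.strip().partition(" ")
    return ip, user_agent


def claimed_crawler(user_agent):
    """Return the search engine a User-Agent claims to be, or None."""
    lowered = user_agent.lower()
    for token, engine in CRAWLER_TOKENS.items():
        if token.lower() in lowered:
            return engine
    return None


line = (
    "34.68.229.128 Mozilla/5.0 (Linux; Android 6.0.1; Nexus 5X Build/MMB29P) "
    "AppleWebKit/537.36 (KHTML, like Gecko) Chrome/85.0.4183.121 Mobile "
    "Safari/537.36 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"
)
ip, user_agent = parse_log_line(line)
print(ip, claimed_crawler(user_agent))
```

The User-Agent only tells us what the visitor claims to be; the IP address still has to be checked separately.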
By querying this IP address in the tool, we can see that it is a fake Google spider.

We only need to enter the IP address of the suspected crawler to see information about it. This way, neither the real Li Kui nor the fake one (that is, real and fake crawlers) can escape our eyes.
If we want to see more fake bots, we can go here: listcrawlers fake bot, which collects the common fake bots on the Internet.
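For readers who want to run a similar check themselves, one commonly used approach for Googlebot is a reverse DNS lookup followed by a forward DNS confirmation. The sketch below is not how the query tool above works (the article does not say); it is an illustration under stated assumptions: the function names are hypothetical, the accepted host suffixes are `googlebot.com` and `google.com`, and only IPv4 is handled. The lookup functions are injectable so the logic can be exercised without network access.

```python
import socket

# Assumed host suffixes for genuine Googlebot reverse DNS names.
GOOGLE_SUFFIXES = (".googlebot.com", ".google.com")


def _reverse(ip):
    """Reverse DNS: IP address -> primary host name."""
    return socket.gethostbyaddr(ip)[0]


def _forward(host):
    """Forward DNS: host name -> list of IPv4 addresses."""
    return socket.gethostbyname_ex(host)[2]


def is_verified_googlebot(ip, reverse_lookup=None, forward_lookup=None):
    """Return True only if the IP reverse-resolves to a Google host
    and that host resolves back to the same IP."""
    reverse_lookup = reverse_lookup or _reverse
    forward_lookup = forward_lookup or _forward
    try:
        host = reverse_lookup(ip)
    except OSError:  # covers socket.herror and socket.gaierror
        return False
    if not host.lower().rstrip(".").endswith(GOOGLE_SUFFIXES):
        return False
    try:
        return ip in forward_lookup(host)
    except OSError:
        return False
```

Calling `is_verified_googlebot("34.68.229.128")` performs live DNS lookups, so the result depends on the DNS records at the time of the query. The forward confirmation step matters because whoever controls an IP's reverse DNS record can make it return any name they like.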
Summary
This article introduced what fake crawlers are and how to use the crawler IP query tool to identify them accurately.