Implementing a Sensitive-Word Filtering System in PHP
2022-07-01 16:48:00 【全栈程序员站长】
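Code description
1. Sensitive-word dictionary maintenance script:
reload_dict.php rebuilds the trie-tree file from the dictionary file, so the tree can be refreshed automatically whenever the word list changes.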
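PHP
<?php
// Set the memory limit
ini_set('memory_limit', '128M');
// Open the sensitive-word dictionary
$handle = fopen('dict.txt', 'r');
if ($handle === false) {
    exit("cannot open dict.txt\n");
}
// Create an empty trie-tree filter
$resTrie = trie_filter_new();
while (($line = fgets($handle)) !== false) {
    $item = trim($line);
    // Compare against '' rather than empty(), so a word such as "0" is not skipped
    if ($item === '') {
        continue;
    }
    // Add the sensitive words to the trie-tree one by one
    trie_filter_store($resTrie, $item);
}
fclose($handle);
// Write the trie-tree file
$blackword_tree = 'blackword.tree';
trie_filter_save($resTrie, $blackword_tree);
2. Trie-tree object helper class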
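To make the trie-tree object more concrete, here is a minimal pure-PHP sketch of the idea behind the store and search steps. It is not the trie_filter extension: the class SimpleTrie and its methods store and searchAll are hypothetical names, and the "longest match at each position" rule is an assumption made for illustration, so the real extension may report matches differently.

```php
<?php
// Hypothetical pure-PHP stand-in for the trie_filter extension (illustration only).
final class SimpleTrie
{
    // Nested arrays keyed by single bytes; the key '' marks the end of a stored word.
    private $root = [];

    // Add one word to the trie, byte by byte.
    public function store(string $word): void
    {
        $node = &$this->root;
        for ($i = 0, $n = strlen($word); $i < $n; $i++) {
            $byte = $word[$i];
            if (!isset($node[$byte])) {
                $node[$byte] = [];
            }
            $node = &$node[$byte];
        }
        $node[''] = true;
    }

    // Return [byteOffset, byteLength] pairs, taking the longest stored word at each position.
    public function searchAll(string $text): array
    {
        $hits = [];
        $len = strlen($text);
        for ($start = 0; $start < $len; $start++) {
            $node = $this->root;
            $longest = 0;
            for ($j = $start; $j < $len && isset($node[$text[$j]]); $j++) {
                $node = $node[$text[$j]];
                if (isset($node[''])) {
                    $longest = $j - $start + 1;
                }
            }
            if ($longest > 0) {
                $hits[] = [$start, $longest];
                $start += $longest - 1; // continue after the matched word
            }
        }
        return $hits;
    }
}

$trie = new SimpleTrie();
foreach (['bad', 'badword', '敏感词'] as $word) {
    $trie->store($word);
}
$hits = $trie->searchAll('a badword and 敏感词');
echo json_encode($hits), PHP_EOL; // [[2,7],[14,9]]
```

Each pair is a byte offset and a byte length, which is why the three-character Chinese word is reported with length 9 in UTF-8.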
FilterHelper.php returns the trie-tree object. It avoids rebuilding the object on every call and keeps the loaded tree file in sync with the sensitive-word dictionary.
PHP
<?php
/**
 * Filter helper
 *
 * getResTrie returns the trie-tree object;
 * getFilterWords extracts the matched strings
 *
 * @author W.Y.P ([email protected])
 */
class FilterHelper
{
    // trie-tree object
    private static $_resTrie = null;
    // Modification time of the tree file that is currently loaded
    private static $_mtime = null;

    /**
     * Prevent instantiation
     */
    private function __construct() {}

    /**
     * Prevent cloning
     */
    private function __clone() {}

    /**
     * Return the trie-tree object, reloading it when the tree file has changed
     *
     * @param string $tree_file path of the trie-tree file
     * @param int $new_mtime modification time of the tree file at call time
     * @return resource
     */
    public static function getResTrie($tree_file, $new_mtime)
    {
        if (is_null(self::$_mtime)) {
            self::$_mtime = $new_mtime;
        }
        if (($new_mtime != self::$_mtime) || is_null(self::$_resTrie)) {
            self::$_resTrie = trie_filter_load($tree_file);
            self::$_mtime = $new_mtime;
            // Log the time at which the dictionary file was reloaded
            echo date('Y-m-d H:i:s') . "\tdictionary reload success!\n";
        }
        return self::$_resTrie;
    }

    /**
     * Extract the matched sensitive words from the original string
     *
     * @param string $str original string
     * @param array $res list of [offset, length] pairs; [1, 3] means start at
     *                   position 1 and take a length of 3 (both are passed to
     *                   substr(), which counts bytes)
     * @return array
     */
    public static function getFilterWords($str, $res)
    {
        $result = array();
        foreach ($res as $k => $v) {
            $word = substr($str, $v[0], $v[1]);
            if (!in_array($word, $result)) {
                $result[] = $word;
            }
        }
        return $result;
    }
}
3. HTTP interface for the filter
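Before the HTTP layer, it may help to see the extraction step in isolation. The sketch below copies the logic of getFilterWords into a stand-alone function (extractWords is a hypothetical name) and feeds it hand-written [offset, length] pairs. It assumes the pairs are byte-based, which is what the use of substr() in FilterHelper implies; the pairs here are written by hand, not produced by trie_filter_search_all.

```php
<?php
// Stand-alone copy of the extraction logic, so it runs without FilterHelper.php.
function extractWords(string $str, array $pairs): array
{
    $result = [];
    foreach ($pairs as [$offset, $length]) {
        $word = substr($str, $offset, $length); // byte-based slice
        if (!in_array($word, $result, true)) {
            $result[] = $word;                  // keep each word only once
        }
    }
    return $result;
}

// In UTF-8 each of these Chinese characters takes 3 bytes and the comma takes 1,
// so the two occurrences of the word start at byte 6 and byte 22.
$text  = '这是敏感词,还是敏感词';
$pairs = [[6, 9], [22, 9]];
$words = extractWords($text, $pairs);
echo json_encode($words, JSON_UNESCAPED_UNICODE), PHP_EOL; // ["敏感词"]
```

Because the offsets are bytes, swapping substr() for a character-based function would cut multibyte text at the wrong place.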
filter.php uses Swoole to expose the filter as an HTTP interface.
PHP
<?php
// Set the script's memory limit; adjust it to the size of the dictionary
ini_set('memory_limit', '512M');
// Set the time zone
date_default_timezone_set('Asia/Shanghai');
// Load the helper class
require_once('FilterHelper.php');
// IP address and port the HTTP server binds to
$serv = new swoole_http_server("182.92.177.16", 9502);
/**
 * Handle a request
 */
$serv->on('Request', function($request, $response) {
    // Read the GET parameter
    $content = isset($request->get['content']) ? $request->get['content'] : '';
    $result = '';
    if (!empty($content)) {
        // Path of the trie-tree file; defaults to the current directory
        $tree_file = 'blackword.tree';
        // Clear the file status cache
        clearstatcache();
        // Modification time of the tree file at request time
        $new_mtime = filemtime($tree_file);
        // Get the latest trie-tree object
        $resTrie = FilterHelper::getResTrie($tree_file, $new_mtime);
        // Run the filter
        $arrRet = trie_filter_search_all($resTrie, $content);
        // Extract the matched sensitive words
        $a_data = FilterHelper::getFilterWords($content, $arrRet);
        $result = json_encode($a_data);
    }
    // Set the HTTP response metadata and send the result
    $response->cookie("User", "W.Y.P");
    $response->header("X-Server", "W.Y.P WebServer(Unix) (Red-Hat/Linux)");
    // The header value must not repeat the header name
    $response->header('Content-Type', 'text/html; charset=utf-8');
    $response->end($result);
});
$serv->start();
Publisher: 全栈程序员栈长. Please credit the source when reposting: https://javaforall.cn/130918.html Original link: https://javaforall.cn
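The reload logic shared by FilterHelper::getResTrie and the request handler (clear the stat cache, read the file's modification time, reload only when it changed) can be tried without the extension. The following is a minimal sketch under stated assumptions: MtimeCache, its get method and the loads counter are hypothetical names, and a plain line reader stands in for trie_filter_load.

```php
<?php
// Hypothetical helper: caches a loaded value and reloads it only when the file's mtime changes.
final class MtimeCache
{
    private static $value = null;
    private static $mtime = null;
    public static $loads = 0; // number of (re)loads, kept only for this demo

    public static function get(string $file, callable $loader)
    {
        clearstatcache(true, $file); // drop PHP's cached stat data for this file
        $mtime = filemtime($file);
        if ($mtime === false) {
            throw new RuntimeException("cannot stat $file");
        }
        if (self::$value === null || $mtime !== self::$mtime) {
            self::$value = $loader($file);
            self::$mtime = $mtime;
            self::$loads++;
        }
        return self::$value;
    }
}

// Stand-in loader: reads the non-empty lines of the file.
$loader = function (string $f): array {
    return file($f, FILE_IGNORE_NEW_LINES | FILE_SKIP_EMPTY_LINES);
};

$file = tempnam(sys_get_temp_dir(), 'dict');
file_put_contents($file, "foo\n");
touch($file, 1000000000);                  // pin the mtime so the demo is deterministic
$first = MtimeCache::get($file, $loader);  // first call loads the file
$again = MtimeCache::get($file, $loader);  // same mtime: served from the cache

file_put_contents($file, "foo\nbar\n");
touch($file, 1000000100);                  // simulate a later rebuild of the file
$fresh = MtimeCache::get($file, $loader);  // mtime changed: reloaded
unlink($file);

echo MtimeCache::$loads, PHP_EOL; // 2
```

The static properties only persist between calls inside one long-running process, such as a Swoole worker. If several worker processes are running, each one keeps and reloads its own copy.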