Implementing a sensitive-word filtering system in PHP (worth bookmarking)
2022-07-01 16:48:00 【全栈程序员站长】
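Hello everyone, good to see you again. I'm your friend 全栈君.
Code walkthrough
1. Sensitive-word dictionary maintenance and update script:
reload_dict.php rebuilds the trie-tree file from the dictionary, so the word list can be refreshed automatically.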
PHP
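<?php
// Raise the memory limit; adjust to the size of the dictionary
ini_set('memory_limit', '128M');
// Open the sensitive-word dictionary (one word per line)
$handle = fopen('dict.txt', 'r');
if ($handle === false) {
    fwrite(STDERR, "cannot open dict.txt\n");
    exit(1);
}
// Create an empty trie-tree filter
$resTrie = trie_filter_new();
while (($line = fgets($handle)) !== false) {
    $item = trim($line);
    // Skip blank lines; compare with '' so that a word such as "0" is kept
    if ($item === '') {
        continue;
    }
    // Add each sensitive word to the trie-tree
    trie_filter_store($resTrie, $item);
}
fclose($handle);
// Write the trie-tree to a file
$blackword_tree = 'blackword.tree';
trie_filter_save($resTrie, $blackword_tree);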
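To make the store and search steps above more concrete, here is a minimal pure-PHP sketch of a byte-level trie. It is not the trie_filter extension: the function names `sketch_trie_store` and `sketch_trie_search_all` are hypothetical, and the [offset, length] result shape is an assumption based on how the helper class in this article consumes search results. The extension may pick or order matches differently.

```php
<?php
// Hypothetical helpers for illustration only; not part of trie_filter.

/** Insert one word into a nested-array trie, one byte per level. */
function sketch_trie_store(array &$trie, string $word): void
{
    if ($word === '') {
        return; // nothing to store
    }
    $node = &$trie;
    $len = strlen($word);
    for ($i = 0; $i < $len; $i++) {
        $byte = $word[$i];
        if (!isset($node['next'][$byte])) {
            $node['next'][$byte] = ['next' => [], 'end' => false];
        }
        $node = &$node['next'][$byte];
    }
    $node['end'] = true; // marks the end of a stored word
}

/** Return every match as [byte offset, byte length], scanning from each start position. */
function sketch_trie_search_all(array $trie, string $text): array
{
    $matches = [];
    $len = strlen($text);
    for ($start = 0; $start < $len; $start++) {
        $node = $trie;
        for ($i = $start; $i < $len; $i++) {
            $byte = $text[$i];
            if (!isset($node['next'][$byte])) {
                break; // no stored word continues with this byte
            }
            $node = $node['next'][$byte];
            if ($node['end']) {
                $matches[] = [$start, $i - $start + 1];
            }
        }
    }
    return $matches;
}

// Usage: build a small trie and scan a string.
$trie = ['next' => [], 'end' => false];
sketch_trie_store($trie, 'bad');
sketch_trie_store($trie, 'badge');
$matches = sketch_trie_search_all($trie, 'a bad badge');
```

Because lookups walk the tree byte by byte, the cost of checking one start position depends on the length of the longest stored word rather than on the number of words in the dictionary, which is the usual reason for choosing a trie for this kind of filter.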
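2. Utility class for obtaining the trie-tree object
FilterHelper.php returns the trie-tree object. It avoids rebuilding the object on every call and keeps the loaded tree in sync with the sensitive-word dictionary.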
PHP
<?php
/**
 * Filter helper
 *
 * getResTrie returns the trie-tree object;
 * getFilterWords extracts the matched strings
 *
 * @author W.Y.P ([email protected])
 */
class FilterHelper
{
    // trie-tree object
    private static $_resTrie = null;
    // Modification time of the loaded dictionary tree
    private static $_mtime = null;

    /**
     * Prevent instantiation
     */
    private function __construct() {}

    /**
     * Prevent cloning
     */
    private function __clone() {}

    /**
     * Return the trie-tree object, reloading it when the tree file has changed
     *
     * @param string $tree_file path of the dictionary tree file
     * @param int    $new_mtime modification time of the tree file at call time
     * @return mixed the trie-tree handle returned by trie_filter_load
     */
    public static function getResTrie($tree_file, $new_mtime)
    {
        if (is_null(self::$_mtime)) {
            self::$_mtime = $new_mtime;
        }
        if (($new_mtime != self::$_mtime) || is_null(self::$_resTrie)) {
            self::$_resTrie = trie_filter_load($tree_file);
            self::$_mtime = $new_mtime;
            // Log the time at which the dictionary file was reloaded
            echo date('Y-m-d H:i:s') . "\tdictionary reload success!\n";
        }
        return self::$_resTrie;
    }

    /**
     * Extract the matched sensitive words from the original string
     *
     * @param string $str original string
     * @param array  $res list of [offset, length] pairs as consumed by substr();
     *                    [1, 3] means 3 bytes starting at offset 1
     * @return array unique matched words
     */
    public static function getFilterWords($str, $res)
    {
        $result = array();
        foreach ($res as $k => $v) {
            $word = substr($str, $v[0], $v[1]);
            if (!in_array($word, $result)) {
                $result[] = $word;
            }
        }
        return $result;
    }
}
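The extraction step is easy to get wrong with multi-byte text, so here is a small standalone sketch of it. It assumes the search results are byte offsets and byte lengths, which is what the use of `substr` in `getFilterWords` suggests; the function name `sketch_extract_words` and the sample offsets are made up for illustration, and the file is assumed to be saved as UTF-8.

```php
<?php
// Hypothetical standalone version of the extraction step, for illustration.

/**
 * Cut each [byte offset, byte length] pair out of $str and
 * return the distinct words in order of first appearance.
 */
function sketch_extract_words(string $str, array $pairs): array
{
    $words = [];
    foreach ($pairs as $pair) {
        [$offset, $length] = $pair;
        $word = substr($str, $offset, $length); // byte-based, like the pairs
        if (!in_array($word, $words, true)) {
            $words[] = $word;
        }
    }
    return $words;
}

// In UTF-8 each of these Chinese characters takes 3 bytes, so the
// three-character word starts at byte 3 and is 9 bytes long.
$text  = 'abc敏感词def敏感词';
$pairs = [[3, 9], [15, 9]];   // hand-written sample pairs
$words = sketch_extract_words($text, $pairs);
```

If the pairs were character-based instead, a character-aware function such as `mb_substr` would be needed; mixing the two conventions cuts words in the middle of a character.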
3. HTTP interface for external filtering requests
filter.php uses Swoole to expose the filter as an HTTP interface.
PHP
<?php
// Maximum memory for the script; adjust to the size of the dictionary
ini_set('memory_limit', '512M');
// Set the time zone
date_default_timezone_set('Asia/Shanghai');
// Load the helper class
require_once('FilterHelper.php');
// IP address and port the HTTP service binds to
$serv = new swoole_http_server("182.92.177.16", 9502);
/**
 * Handle a request
 */
$serv->on('Request', function($request, $response) {
    // Read the GET parameter
    $content = isset($request->get['content']) ? $request->get['content'] : '';
    $result = '';
    if (!empty($content)) {
        // Path of the dictionary tree file, by default in the current directory
        $tree_file = 'blackword.tree';
        // Clear the file status cache so that filemtime() is not stale
        clearstatcache();
        // Modification time of the tree file at request time
        $new_mtime = filemtime($tree_file);
        // Get the up-to-date trie-tree object
        $resTrie = FilterHelper::getResTrie($tree_file, $new_mtime);
        // Run the filter
        $arrRet = trie_filter_search_all($resTrie, $content);
        // Extract the matched sensitive words
        $a_data = FilterHelper::getFilterWords($content, $arrRet);
        $result = json_encode($a_data);
    }
    // Set the response metadata and send the result
    $response->cookie("User", "W.Y.P");
    $response->header("X-Server", "W.Y.P WebServer(Unix) (Red-Hat/Linux)");
    // The header value must not repeat the header name
    $response->header('Content-Type', 'text/html; charset=utf-8');
    $response->end($result);
});
$serv->start();
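A caller only needs to send a GET request with a `content` parameter and decode the JSON array that comes back. The sketch below shows one way a client might do that; the helper names are hypothetical, and the host `127.0.0.1` in the usage comment is an assumption that should be replaced with the address the service is actually bound to. An empty response body is treated as "no matches", since the service returns an empty string when `content` is missing.

```php
<?php
// Hypothetical client-side helpers for the filter service, for illustration.

/** Build the request URL; http_build_query URL-encodes the content. */
function sketch_build_filter_url(string $host, int $port, string $content): string
{
    return sprintf('http://%s:%d/?%s', $host, $port, http_build_query(['content' => $content]));
}

/** Decode the response body into a list of matched words. */
function sketch_parse_filter_response(string $body): array
{
    if ($body === '') {
        return []; // the service sends an empty body when there is nothing to check
    }
    $decoded = json_decode($body, true);
    return is_array($decoded) ? $decoded : [];
}

// Usage (requires the service to be running; host and port are assumptions):
// $url   = sketch_build_filter_url('127.0.0.1', 9502, 'text to check');
// $body  = file_get_contents($url);
// $words = sketch_parse_filter_response($body === false ? '' : $body);
```

Long texts can exceed practical URL length limits, so a production client would probably want a POST endpoint; the service as shown only reads GET parameters.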
Published by 全栈程序员栈长. Please credit the source when reposting: https://javaforall.cn/130918.html Original link: https://javaforall.cn