Implementing a Sensitive-Word Filtering System in PHP
2022-07-01 16:48:00 【全栈程序员站长】
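Hello everyone, good to see you again. I'm your friend 全栈君.
Code description
1. Dictionary maintenance and update script:
reload_dict.php rebuilds the trie-tree file from the sensitive-word dictionary, so the dictionary can be refreshed automatically.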
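For readers without the trie_filter extension at hand, the idea behind a trie-based filter can be sketched in plain PHP. This is only an illustrative stand-in, not the extension's implementation: `SimpleTrie`, `store` and `searchAll` are hypothetical names, and the `[offset, length]` result shape is an assumption modelled on how the article's code passes match results to `substr()`.

```php
<?php
// Illustrative pure-PHP trie; NOT the trie_filter extension.
// All class and method names here are hypothetical.
final class SimpleTrie
{
    // Nested arrays keyed by single bytes; the key 'end' marks a complete word.
    private $root = [];

    // Insert one word, one byte at a time.
    public function store(string $word): void
    {
        $node = &$this->root;
        $len = strlen($word);
        for ($i = 0; $i < $len; $i++) {
            $byte = $word[$i];
            if (!isset($node[$byte])) {
                $node[$byte] = [];
            }
            $node = &$node[$byte];
        }
        $node['end'] = true;
    }

    // Return [offset, length] pairs (in bytes) for every stored word found in $text.
    public function searchAll(string $text): array
    {
        $hits = [];
        $len = strlen($text);
        for ($start = 0; $start < $len; $start++) {
            $node = $this->root;
            for ($i = $start; $i < $len; $i++) {
                $byte = $text[$i];
                if (!isset($node[$byte])) {
                    break;
                }
                $node = $node[$byte];
                if (isset($node['end'])) {
                    $hits[] = [$start, $i - $start + 1];
                }
            }
        }
        return $hits;
    }
}

$trie = new SimpleTrie();
foreach (['bad', 'badword', '敏感'] as $w) {
    $trie->store($w);
}

$text = 'a badword and 敏感词';
$hits = $trie->searchAll($text);

// Byte offsets pair naturally with substr(), which is also byte-based.
$words = [];
foreach ($hits as [$offset, $length]) {
    $words[] = substr($text, $offset, $length);
}
```

Because the sketch walks the text byte by byte, multi-byte UTF-8 words are matched without any special handling, as long as the dictionary and the input share the same encoding. The article's own script, which relies on the extension, follows.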
PHP
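<?php
// Raise the memory limit; adjust to the size of the dictionary
ini_set('memory_limit', '128M');
// Open the sensitive-word dictionary (one word per line)
$handle = fopen('dict.txt', 'r');
if ($handle === false) {
    exit("Unable to open dict.txt\n");
}
// Create an empty trie-tree filter
$resTrie = trie_filter_new();
while (($line = fgets($handle)) !== false) {
    $item = trim($line);
    // Skip blank lines (empty() would also skip the word "0")
    if ($item === '') {
        continue;
    }
    // Add each sensitive word to the trie-tree
    trie_filter_store($resTrie, $item);
}
fclose($handle);
// Write the trie-tree file
$blackword_tree = 'blackword.tree';
trie_filter_save($resTrie, $blackword_tree);
2. Utility class for obtaining the trie-tree object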
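FilterHelper.php returns the trie-tree object. It avoids rebuilding the object on every call and reloads it whenever the tree file changes, so the loaded tree stays in sync with the sensitive-word dictionary.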
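The reload-on-change idea can be tried in isolation, without the extension. The following is a minimal sketch under stated assumptions: `MtimeCache` is a hypothetical class, the loader is passed in as a callable (here plain `file_get_contents`) instead of `trie_filter_load`, and `touch()` is used only to force a different modification time in the demonstration.

```php
<?php
// Hypothetical stand-in for the caching logic; the loader is injected
// so the sketch runs without the trie_filter extension.
final class MtimeCache
{
    private $value = null;
    private $mtime = null;
    public $loads = 0; // how many times the loader actually ran

    public function get(string $file, callable $loader)
    {
        // Drop PHP's cached stat info so filemtime() sees the current value.
        clearstatcache(true, $file);
        $mtime = filemtime($file);
        if ($this->value === null || $mtime !== $this->mtime) {
            $this->value = $loader($file);
            $this->mtime = $mtime;
            $this->loads++;
        }
        return $this->value;
    }
}

$file = tempnam(sys_get_temp_dir(), 'tree');
file_put_contents($file, 'v1');

$cache = new MtimeCache();
$load = function (string $path) {
    return file_get_contents($path);
};

$first  = $cache->get($file, $load); // first call: loader runs
$second = $cache->get($file, $load); // unchanged file: cached value

file_put_contents($file, 'v2');
touch($file, time() + 10);           // force a different mtime for the demo
$third  = $cache->get($file, $load); // changed file: loader runs again

unlink($file);
```

In a long-running process the cached value survives between requests, which is why one reload per dictionary update is enough. The helper class itself follows.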
PHP
<?php
/**
 * Filter helper
 *
 * getResTrie     returns the trie-tree object;
 * getFilterWords extracts the matched words from the original string
 *
 * @author W.Y.P
 */
class FilterHelper
{
    // trie-tree object
    private static $_resTrie = null;
    // Modification time of the tree file that is currently loaded
    private static $_mtime = null;

    /**
     * Prevent instantiation
     */
    private function __construct() {}

    /**
     * Prevent cloning
     */
    private function __clone() {}

    /**
     * Return the trie-tree object, reloading it when the tree file has changed
     *
     * @param string $tree_file path to the tree file
     * @param int $new_mtime modification time of the tree file at call time
     * @return mixed trie-tree handle returned by trie_filter_load()
     */
    public static function getResTrie($tree_file, $new_mtime)
    {
        if (is_null(self::$_mtime)) {
            self::$_mtime = $new_mtime;
        }
        if (($new_mtime != self::$_mtime) || is_null(self::$_resTrie)) {
            self::$_resTrie = trie_filter_load($tree_file);
            self::$_mtime = $new_mtime;
            // Log the time at which the dictionary file was reloaded
            echo date('Y-m-d H:i:s') . "\tdictionary reload success!\n";
        }
        return self::$_resTrie;
    }

    /**
     * Extract the matched sensitive words from the original string
     *
     * @param string $str original string
     * @param array $res matches; each entry is [offset, length], e.g. [1, 3]
     *                   means 3 bytes starting at offset 1 (substr() is byte-based)
     * @return array unique matched words
     */
    public static function getFilterWords($str, $res)
    {
        $result = array();
        foreach ($res as $v) {
            $word = substr($str, $v[0], $v[1]);
            if (!in_array($word, $result, true)) {
                $result[] = $word;
            }
        }
        return $result;
    }
}
3. HTTP endpoint exposing the filter
filter.php uses Swoole to expose the filter as an HTTP endpoint for external callers.
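Seen from the caller's side, the interface amounts to a GET request carrying a `content` parameter, answered with a JSON array of matched words or an empty body when `content` is missing. A minimal client-side sketch follows; `buildFilterUrl` and `parseFilterResponse` are hypothetical helpers, and the host used here is a placeholder rather than the address in the article.

```php
<?php
// Hypothetical client-side helpers; host and port are placeholders.
function buildFilterUrl(string $host, int $port, string $content): string
{
    // http_build_query URL-encodes the text, including multi-byte characters.
    $query = http_build_query(['content' => $content]);
    return sprintf('http://%s:%d/?%s', $host, $port, $query);
}

function parseFilterResponse(string $body): array
{
    // An empty body means no content was submitted.
    if ($body === '') {
        return [];
    }
    $words = json_decode($body, true);
    return is_array($words) ? $words : [];
}

$url = buildFilterUrl('127.0.0.1', 9502, 'some text to check');
// $body = file_get_contents($url); // requires the service to be running
$words = parseFilterResponse('["bad","badword"]');
$none  = parseFilterResponse('');
```

Keeping URL construction and response parsing separate from the network call makes both easy to check without a live server. The server listing follows.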
PHP
<?php
// Maximum memory for the script; adjust to the size of the dictionary
ini_set('memory_limit', '512M');
// Set the time zone
date_default_timezone_set('Asia/Shanghai');
// Load the helper file
require_once('FilterHelper.php');
// IP address and port the HTTP service binds to
$serv = new swoole_http_server("182.92.177.16", 9502);
/**
 * Handle requests
 */
$serv->on('Request', function($request, $response) {
    // Read the GET request parameter
    $content = isset($request->get['content']) ? $request->get['content'] : '';
    $result = '';
    if ($content !== '') {
        // Path to the tree file; defaults to the current directory
        $tree_file = 'blackword.tree';
        // Clear the file status cache
        clearstatcache();
        // Modification time of the tree file at request time
        $new_mtime = filemtime($tree_file);
        // Get the latest trie-tree object
        $resTrie = FilterHelper::getResTrie($tree_file, $new_mtime);
        // Run the filter
        $arrRet = trie_filter_search_all($resTrie, $content);
        // Extract the matched sensitive words
        $a_data = FilterHelper::getFilterWords($content, $arrRet);
        $result = json_encode($a_data);
    }
    // Set the HTTP response metadata and send the result
    $response->cookie("User", "W.Y.P");
    $response->header("X-Server", "W.Y.P WebServer(Unix) (Red-Hat/Linux)");
    // The header value must not repeat the header name
    $response->header('Content-Type', 'text/html; charset=utf-8');
    $response->end($result);
});
$serv->start();
Published by 全栈程序员栈长. When reposting, please credit the source: https://javaforall.cn/130918.html
Original link: https://javaforall.cn