Converts XML to Python objects

Overview

untangle

Documentation

  • Converts XML to a Python object.
  • Siblings with the same name are grouped into a list.
  • Children can be accessed with parent.child, attributes with element['attribute'].
  • You can call the parse() method with a filename, a URL or an XML string.
  • Substitutes -, . and : with _, so <foobar><foo-bar/></foobar> can be accessed with foobar.foo_bar, <foo.bar.baz/> with foo_bar_baz, and <foo:bar><foo:baz/></foo:bar> with foo_bar.foo_baz.
  • Works with Python 3.7 to 3.10 as of release 1.2.0; earlier releases also worked with Python 2.7, 3.4 to 3.6 and pypy

Installation

With pip:

pip install untangle

With conda:

conda install -c conda-forge untangle

Conda feedstock maintained by @htenkanen. Issues and questions about conda-forge packaging or installation can be raised here.

Usage

(See and run examples.py or this blog post: Read XML painlessly for more info)

import untangle
obj = untangle.parse(resource)

resource can be:

  • a URL
  • a filename
  • an XML string

Running the above code and passing this XML:

<?xml version="1.0"?>
<root>
	<child name="child1"/>
</root>

allows it to be navigated from the untangled object like this:

obj.root.child['name'] # u'child1'

Changelog

see CHANGELOG.md

Comments
  • Add parse_raw() to handle large XML strings

    In Windows environments the path length limitation causes a path too long exception to be raised from parse(). To combat this, I've written a parse_raw() function that accepts an XML string and will not attempt to estimate what type of data it was given. This seemed like a better choice over adding a type parameter to parse() because of the several options it could be given: filepath, url, xml string, stream, or some value to indicate that it needs to be figured out (like None).

    This has been confirmed to work on Windows 7 with Python 3.4.3 using a document longer than 300 characters.

    opened by turt2live 14
  • accessing a tag's name

    I wanted to get the name of a tag and I could do it by writing tag._name. The underscore suggests that it's a private variable, but getting a tag's name is a common task, so there should be a public way to do it. tag.name, which seemed the intuitive solution, didn't work.

    opened by jabbalaci 7
  • can't dir() the parsed object

    I'm not sure if this is a SAX problem. If it is, just close it. But here's the problem:

    >>> filename = 'posts-july-2012.xml'
    >>> from untangle import parse
    >>> o = parse(filename)
    >>> dir(o)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "/usr/local/lib/python2.7/dist-packages/untangle.py", line 66, in __getattr__
        raise IndexError('Unknown key <%s>' % key)
    IndexError: Unknown key <__dir__>
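
The traceback shows __getattr__ being asked for __dir__ and answering with IndexError. Python's attribute protocol expects AttributeError for a missing attribute, and built-ins such as hasattr() rely on that. A minimal sketch of the difference on Python 3, using two hypothetical classes that are not untangle's actual code:

```python
class RaisesIndexError:
    """Hypothetical node: unknown attributes raise IndexError, as in the traceback."""

    def __getattr__(self, key):
        raise IndexError("Unknown key <%s>" % key)


class RaisesAttributeError:
    """Hypothetical node: unknown attributes raise AttributeError instead."""

    def __getattr__(self, key):
        raise AttributeError("Unknown key <%s>" % key)


def probe(obj, name):
    """Return hasattr(obj, name), or a short message if IndexError leaks out."""
    try:
        return hasattr(obj, name)
    except IndexError as exc:
        return "IndexError: %s" % exc


print(probe(RaisesAttributeError(), "child"))  # False
print(probe(RaisesIndexError(), "child"))      # IndexError: Unknown key <child>
```

With AttributeError, introspection helpers treat the attribute as simply absent; with IndexError, the exception propagates to the caller.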
    
    opened by iffy 7
  • Make untangle installable from conda-forge?

    Hi @stchris!

    Thanks for creating this very useful package! I use it myself in here: https://github.com/HTenkanen/transx2gtfs

    I was just creating a conda-forge recipe for my library, and realized that untangle is not currently available from conda.

    Would you be okay with the idea of adding untangle to conda-forge? conda-forge is strict about requiring that all dependencies of a library come from conda-forge (this is how the reliability of the whole system is ensured). Hence, any library that depends on untangle (such as mine) cannot easily be published on conda-forge.

    If you are open to this idea, I am happy to help with writing a conda-forge recipe for untangle. The process does not require any modifications to the current repo; it is done by forking the "conda-forge/staged-recipes" repo.

    More information about the whole conda process can be found here: https://conda-forge.org/

    opened by HTenkanen 5
  • Better error message when file is not found

    If the file to parse does not exist, untangle's error message is rather cryptic:

    Python 2.7.12 (v2.7.12:d33e0cf91556, Jun 27 2016, 15:19:22) [MSC v.1500 32 bit (Intel)] on win32
    Type "help", "copyright", "credits" or "license" for more information.
    >>> import untangle
    >>> untangle.parse('sfsdf')
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "C:\Python27\lib\site-packages\untangle.py", line 143, in parse
        parser.parse(StringIO(filename))
      File "C:\Python27\lib\xml\sax\expatreader.py", line 110, in parse
        xmlreader.IncrementalParser.parse(self, source)
      File "C:\Python27\lib\xml\sax\xmlreader.py", line 125, in parse
        self.close()
      File "C:\Python27\lib\xml\sax\expatreader.py", line 225, in close
        self.feed("", isFinal = 1)
      File "C:\Python27\lib\xml\sax\expatreader.py", line 217, in feed
        self._err_handler.fatalError(exc)
      File "C:\Python27\lib\xml\sax\handler.py", line 38, in fatalError
        raise exception
    xml.sax._exceptions.SAXParseException: <unknown>:1:0: syntax error
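
One way a caller could get a clearer error is to classify the argument before handing it to parse(). The helper below is a hypothetical sketch, not untangle's behavior: the name describe_resource and its rules (URL prefix, existing path, leading "<") are assumptions made for illustration.

```python
import os


def describe_resource(resource):
    """Hypothetical pre-check: say what kind of input a string looks like.

    Returns "url", "file" or "xml", and raises FileNotFoundError when the
    string is none of these (for example, a path that does not exist).
    """
    text = resource.strip()
    if text.startswith(("http://", "https://")):
        return "url"
    if os.path.exists(resource):
        return "file"
    if text.startswith("<"):
        return "xml"
    raise FileNotFoundError(
        "%r is neither an existing file, a URL, nor XML text" % resource
    )


print(describe_resource("<root/>"))  # xml
try:
    describe_resource("sfsdf")
except FileNotFoundError as exc:
    print(exc)
```

Calling such a check first turns the SAX syntax error above into a message that names the missing file.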
    
    bug 
    opened by techtonik 5
  • Optional xml.sax features

    I'm using untangle to parse XML files and it's great. However, I'm operating offline, and by default xml.sax tries to load external entities such as DTDs. Loading external entities is a controllable parser "feature."

    This PR adds the ability to pass xml.sax parser features as extra arguments to parse(), so for example

    untangle.parse(my_xml, feature_external_ges=False)
    

    becomes

    parser.setFeature(xml.sax.handler.feature_external_ges, False)
    

    parse() raises AttributeError if a nonexistent feature is requested.
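
The mapping described above can be sketched with the standard library alone. The helper name make_parser_with_features is made up for this example; xml.sax.make_parser, setFeature, getFeature and xml.sax.handler.feature_external_ges are standard-library names.

```python
import xml.sax
from xml.sax import handler


def make_parser_with_features(**features):
    """Create a SAX parser and apply features given by their handler names."""
    parser = xml.sax.make_parser()
    for name, value in features.items():
        # getattr raises AttributeError when xml.sax.handler has no such name.
        parser.setFeature(getattr(handler, name), value)
    return parser


parser = make_parser_with_features(feature_external_ges=False)
print(parser.getFeature(handler.feature_external_ges))

try:
    make_parser_with_features(feature_does_not_exist=True)
except AttributeError as exc:
    print("rejected:", exc)
```

Looking the feature up by name keeps the keyword arguments identical to the constants documented in xml.sax.handler.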

    opened by ransford 5
  • How do I access deeply nested children?

    (This issue was initially in a comment for another issue, and I thought it might be best to start a new issue instead of hijacking an old one. Apologies for the confusion.)

    I have an XML file with deeply nested elements, and they are all under higher level elements with the same name. An example:

    <MeasurementRecords attrib="something">
        <HistoryRecords>
            <ValueItemId>100_0000100004_3788_Resource-0.customId_WSx Data Precip Type</ValueItemId>
            <List>
                <HistoryRecord>
                    <Value>60</Value>
                    <State>Valid</State>
                    <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
                </HistoryRecord>
            </List>
        </HistoryRecords>
        <HistoryRecords>
            <ValueItemId>100_0000100004_3788_Resource-0.customId_Specific Enthalpy (INS)</ValueItemId>
            <List>
                <HistoryRecord>
                    <Value>33</Value>
                    <State>Valid</State>
                    <TimeStamp>2016-04-20T12:40:00Z</TimeStamp>
                </HistoryRecord>
            </List>
        </HistoryRecords>
    </MeasurementRecords>
    

    How do I access the <Value> of the Specific Enthalpy element? From other examples I assume I should loop through all the HistoryRecords elements. But when I do that, it appears the children are NOT in the object. My attempt so far:

    for HistoryRecord in RSPobj.MeasurementRecords.HistoryRecords:
        if HistoryRecord.ValueItemId.cdata == "100_0000100004_3788_Resource-0.customId_Specific Enthalpy (INS)":
            pprint(HistoryRecord.ValueItemId)
    

    Gives me:

    $ python parseRSPXMLfiles.py
    Element(name = ValueItemId, attributes = {}, cdata = 100_0000100004_3788_Resource-0.customId_Specific Enthalpy (INS))
    

    Where are all the children?

    I was expecting to be able to do something like this:

    pprint(HistoryRecord.ValueItemId.List.HistoryRecord.Value)
    

    But that gives me this error:

    Traceback (most recent call last):
      File "parseRSPXMLfiles.py", line 17, in <module>
        pprint(HistoryRecord.ValueItemId.List.HistoryRecord.Value)
      File "/usr/lib/python2.7/site-packages/untangle.py", line 66, in __getattr__
        raise IndexError('Unknown key <%s>' % key)
    IndexError: Unknown key <List>
    

    FYI, this:

    pprint(dir(HistoryRecord.ValueItemId))
    

    Results in [] being printed.

    opened by jakehawkes 5
  • Using in class

    I may be missing something, but following your example of using untangle does not work inside a class. For example:

    import untangle

    class Foo:
        def load(self):
            # Parse the XML file into an untangle object
            xmldoc = untangle.parse('test.xml')
    

    I have tried assigning it to a class variable but was unable to. Any help would be appreciated.

    opened by excellentingenuity 5
  • UnicodeEncodeError

    Hi! Cool project. I was looking for something like this. I came across a bug.

    I have some xml with unicode chars:

    <?xml version="1.0" encoding="UTF-8"?>
    <page>
        <menu>
        <name>Привет мир</name>
        <items>
            <item>
                <name>Пункт 1</name>
                <url>http://example1.com</url>
            </item>
            <item>
                <name>Пункт 2</name>
                <url>http://example2.com</url>
            </item>
        </items>
        </menu>
    </page>
    
    
    >>> obj = untangle.parse("1.xml")
    >>> obj.page.menu.name
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    UnicodeEncodeError: 'ascii' codec can't encode characters in position 46-51: ordinal not in range(128)
    
    opened by un1t 5
  • support 'in' and 'not in' operator

    Adds support for the in and not in operators by implementing the __contains__() method, as in the following example:

    >>> 'child' in untangle.parse('<root/>').root
    False
    >>> 'child' in untangle.parse('<root><child>value</child></root>').root
    True
    

    This could be useful when dealing with dynamic XML API responses.

    opened by reverbc 4
  • XML is okay but I get a strange error

    Element <None> with attributes None, children [Element(name = products, attributes = {}, cdata = 
    
    )] and cdata 
    None
    

    I have no idea why. But the XML is okay.

    unclear 
    opened by tvdsluijs 3
  • Bump black from 22.6.0 to 22.12.0

    Bumps black from 22.6.0 to 22.12.0.

    Release notes

    Sourced from black's releases.

    22.12.0

    Preview style

    • Enforce empty lines before classes and functions with sticky leading comments (#3302)
    • Reformat empty and whitespace-only files as either an empty file (if no newline is present) or as a single newline character (if a newline is present) (#3348)
    • Implicitly concatenated strings used as function args are now wrapped inside parentheses (#3307)
    • Correctly handle trailing commas that are inside a line's leading non-nested parens (#3370)

    Configuration

    • Fix incorrectly applied .gitignore rules by considering the .gitignore location and the relative path to the target file (#3338)
    • Fix incorrectly ignoring .gitignore presence when more than one source directory is specified (#3336)

    Parser

    • Parsing support has been added for walruses inside generator expression that are passed as function args (for example, any(match := my_re.match(text) for text in texts)) (#3327).

    Integrations

    • Vim plugin: Optionally allow using the system installation of Black via let g:black_use_virtualenv = 0 (#3309)

    22.10.0

    Highlights

    • Runtime support for Python 3.6 has been removed. Formatting 3.6 code will still be supported until further notice.

    Stable style

    • Fix a crash when # fmt: on is used on a different block level than # fmt: off (#3281)

    Preview style

    ... (truncated)

    Commits
    • 2ddea29 Prepare release 22.12.0 (#3413)
    • 5b1443a release: skip bad macos wheels for now (#3411)
    • 9ace064 Bump peter-evans/find-comment from 2.0.1 to 2.1.0 (#3404)
    • 19c5fe4 Fix CI with latest flake8-bugbear (#3412)
    • d4a8564 Bump sphinx-copybutton from 0.5.0 to 0.5.1 in /docs (#3390)
    • 2793249 Wordsmith current_style.md (#3383)
    • d97b789 Remove whitespaces of whitespace-only files (#3348)
    • c23a5c1 Clarify that Black runs with --safe by default (#3378)
    • 8091b25 Correctly handle trailing commas that are inside a line's leading non-nested ...
    • ffaaf48 Compare each .gitignore found with an appropiate relative path (#3338)
    • Additional commits viewable in compare view

    Dependabot compatibility score

    Dependabot will resolve any conflicts with this PR as long as you don't alter it yourself. You can also trigger a rebase manually by commenting @dependabot rebase.


    Dependabot commands and options

    You can trigger Dependabot actions by commenting on this PR:

    • @dependabot rebase will rebase this PR
    • @dependabot recreate will recreate this PR, overwriting any edits that have been made to it
    • @dependabot merge will merge this PR after your CI passes on it
    • @dependabot squash and merge will squash and merge this PR after your CI passes on it
    • @dependabot cancel merge will cancel a previously requested merge and block automerging
    • @dependabot reopen will reopen this PR if it is closed
    • @dependabot close will close this PR and stop Dependabot recreating it. You can achieve the same result by closing it manually
    • @dependabot ignore this major version will close this PR and stop Dependabot creating any more for this major version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this minor version will close this PR and stop Dependabot creating any more for this minor version (unless you reopen the PR or upgrade to it yourself)
    • @dependabot ignore this dependency will close this PR and stop Dependabot creating any more for this dependency (unless you reopen the PR or upgrade to it yourself)
    dependencies 
    opened by dependabot[bot] 0
  • Update setuptools requirement from ^62.6.0 to ^65.6.3

    Updates the requirements on setuptools to permit the latest version.

    Changelog

    Sourced from setuptools's changelog.

    v65.6.3

    Misc

    • #3709: Fix condition to patch distutils.dist.log to only apply when using distutils from the stdlib.

    v65.6.2

    No significant changes.

    v65.6.1

    Documentation changes

    • #3689: Documented that distutils.cfg might be ignored unless SETUPTOOLS_USE_DISTUTILS=stdlib.

    Misc

    • #3678: Improve clib builds reproducibility by sorting sources -- by :user:danigm
    • #3684: Improved exception/traceback when invalid entry-points are specified.
    • #3690: Fixed logging errors: 'underlying buffer has been detached' (issue #1631).
    • #3693: Merge pypa/[email protected] with compatibility fix for distutils.log.Log.
    • #3695, #3697, #3698, #3699: Changed minor text details (spelling, spaces ...)
    • #3696: Removed unnecessary coding: utf-8 annotations
    • #3704: Fixed temporary build directories interference with auto-discovery.

    v65.6.0

    Changes

    v65.5.1

    Misc

    ... (truncated)

    Commits
    • 6f7dd7c Bump version: 65.6.2 → 65.6.3
    • 0f513c1 Merge pull request #3709 from abravalheri/issue-3707
    • a4db65f Remove wrong comment
    • 5801753 Add news fragment
    • 4c267c7 Replace condition to patch distutils.dist.log
    • 7049c73 Add simple regression test for logging patches
    • e515641 Bump version: 65.6.1 → 65.6.2
    • bd60014 Minor adjustments in changelog
    • 00f59ef Bump version: 65.6.0 → 65.6.1
    • b0f42b9 Adequate news fragment file names
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Bump build from 0.8.0 to 0.9.0

    Bumps build from 0.8.0 to 0.9.0.

    Changelog

    Sourced from build's changelog.

    0.9.0 (2022-10-27)

    • Hide a Python 3.11.0 unavoidable warning with venv (PR #527)
    • Fix infinite recursion error in check_dependency with circular dependencies (PR #512, Fixes #511)
    • Only import colorama on Windows (PR #494, Fixes #493)
    • Flush output more often to reduce interleaved output (PR #494)
    • Small API cleanup, like better __all__ and srcdir being read only. (PR #477)
    • Only use importlib_metadata when needed (PR #401)
    • Clarify in printout when build dependencies are being installed (PR #514)

    Commits
    • 7b002bb release 0.9.0
    • 9c60690 docs: update changelog
    • a3700d3 env: avoid warning on Windows 3.11.0
    • 3b36b6e tests: skip toml vs. tomli test on 3.11+
    • dd5ec7e tests: ignore warning from pytest-dist + pytest-cov
    • 4e7e64c ci: move to final release of 3.11
    • b1acadc main: disable colorama on Linux and flush output (#494)
    • a1de450 pre-commit: bump repositories (#524)
    • aaaf4f8 tests: better isolate test_venv_fail
    • 03f93d5 pre-commit: bump repositories
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Bump pytest from 7.1.2 to 7.2.0

    Bumps pytest from 7.1.2 to 7.2.0.

    Release notes

    Sourced from pytest's releases.

    7.2.0

    pytest 7.2.0 (2022-10-23)

    Deprecations

    • #10012: Update pytest.PytestUnhandledCoroutineWarning to a deprecation; it will raise an error in pytest 8.

    • #10396: pytest no longer depends on the py library. pytest provides a vendored copy of py.error and py.path modules but will use the py library if it is installed. If you need other py.* modules, continue to install the deprecated py library separately, otherwise it can usually be removed as a dependency.

    • #4562: Deprecate configuring hook specs/impls using attributes/marks.

      Instead use pytest.hookimpl and pytest.hookspec. For more details, see the docs (legacy-path-hooks-deprecated).

    • #9886: The functionality for running tests written for nose has been officially deprecated.

      This includes:

      • Plain setup and teardown functions and methods: this might catch users by surprise, as setup() and teardown() are not pytest idioms, but part of the nose support.
      • Setup/teardown using the @with_setup decorator.

      For more details, consult the deprecation docs (nose-deprecation).

    Features

    • #9897: Added shell-style wildcard support to testpaths.

    Improvements

    • #10218: @pytest.mark.parametrize() (and similar functions) now accepts any Sequence[str] for the argument names, instead of just list[str] and tuple[str, ...].

      (Note that str, which is itself a Sequence[str], is still treated as a comma-delimited name list, as before).

    • #10381: The --no-showlocals flag has been added. This can be passed directly to tests to override --showlocals declared through addopts.

    • #3426: Assertion failures with strings in NFC and NFD forms that normalize to the same string now have a dedicated error message detailing the issue, and their utf-8 representation is expressed instead.

    • #7337: A warning is now emitted if a test function returns something other than None. This prevents a common mistake among beginners that expect that returning a bool (for example return foo(a, b) == result) would cause a test to pass or fail, instead of using assert.

    • #8508: Introduce multiline display for warning matching via pytest.warns and enhance match comparison for _pytest._code.ExceptionInfo.match as returned by pytest.raises.

    • #8646: Improve pytest.raises. Previously passing an empty tuple would give a confusing error. We now raise immediately with a more helpful message.

    • #9741: On Python 3.11, use the standard library's tomllib to parse TOML.

      tomli is no longer a dependency on Python 3.11.

    • #9742: Display assertion message without escaped newline characters with -vv.

    • #9823: Improved error message that is shown when no collector is found for a given file.

    ... (truncated)

    Commits
    • 3af3f56 Prepare release version 7.2.0
    • bc2c3b6 Merge pull request #10408 from NateMeyvis/patch-2
    • d84ed48 Merge pull request #10409 from pytest-dev/asottile-patch-1
    • ffe49ac Merge pull request #10396 from pytest-dev/pylib-hax
    • d352098 allow jobs to pass if codecov.io fails
    • c5c562b Fix typos in CONTRIBUTING.rst
    • d543a45 add deprecation changelog for py library vendoring
    • f341a5c Merge pull request #10407 from NateMeyvis/patch-1
    • 1027dc8 [pre-commit.ci] auto fixes from pre-commit.com hooks
    • 6b905ee Add note on tags to CONTRIBUTING.rst
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
  • Accessing a child element named 'children'

    Currently an XML element with a child element named 'children' will be parsed into a Python object with children being the list of child elements, not the 'children' element itself. Is there any way to access this element?

    opened by willburden 0
  • Bump flake8 from 4.0.1 to 5.0.4

    Bumps flake8 from 4.0.1 to 5.0.4.

    Commits
    • 6027577 Release 5.0.4
    • 213e006 Merge pull request #1653 from asottile/lower-bound-importlib-metadata
    • e94ee2b require sufficiently new importlib-metadata
    • 318a86a Merge pull request #1646 from televi/main
    • 7b8b374 Clarify entry point naming
    • 7160561 Merge pull request #1649 from PyCQA/pre-commit-ci-update-config
    • 84d56a8 [pre-commit.ci] pre-commit autoupdate
    • ff6569b Release 5.0.3
    • e76b59a Merge pull request #1648 from PyCQA/invalid-syntax-partial-parse
    • 25e8ff1 ignore config files that partially parse as flake8 configs
    • Additional commits viewable in compare view

    dependencies 
    opened by dependabot[bot] 0
Releases (1.2.1)
  • 1.2.1 (Jul 2, 2022)

    Bugfixes

    • (SECURITY) Use defusedxml to prevent external entity exploits by @stchris in https://github.com/stchris/untangle/pull/94

    Full Changelog: https://github.com/stchris/untangle/compare/1.2.0...1.2.1

  • 1.2.0 (Jul 1, 2022)

    Major changes

    • (SECURITY) Prevent XML SAX vulnerability: External Entities injection (#60)
    • support for python keywords as element names (#43)
    • support Element truthiness on Python 3 (#68)
    • dropped support for Python 3.4-3.6 and pypy; untangle currently supports Python 3.7-3.10
    • fixed setup.py warning (#77)

    Development related changes

    • dropped support for Python 2.6, 3.3
    • formatted code with black
    • flake8 linter enforced in CI
    • main is now the default branch
    • switch to Github Actions
    • switch to poetry and pytest

    Changes by PRs

    • adding trailing underscore for python keywords by @reverbc in https://github.com/stchris/untangle/pull/43
    • support 'in' and 'not in' operator by @reverbc in https://github.com/stchris/untangle/pull/40
    • Make hasattribute work with 3.6 by @CarlosSolrac in https://github.com/stchris/untangle/pull/57
    • Added equality tests for #63 by @stchris in https://github.com/stchris/untangle/pull/67
    • Make untangle available from conda-forge by @HTenkanen in https://github.com/stchris/untangle/pull/75
    • Make main the default branch by @stchris in https://github.com/stchris/untangle/pull/81
    • Github Actions for CI by @stchris in https://github.com/stchris/untangle/pull/84
    • Release to testpypi by @stchris in https://github.com/stchris/untangle/pull/85
    • Remove Python versions mentioned in docs by @stchris in https://github.com/stchris/untangle/pull/86
    • Fix simple typo: addded -> added by @timgates42 in https://github.com/stchris/untangle/pull/71
    • Remove duplicated tests module by @stchris in https://github.com/stchris/untangle/pull/88
    • Add a test with an unicode element name by @stchris in https://github.com/stchris/untangle/pull/89
    • Disable feature_external_ges by @stchris in https://github.com/stchris/untangle/pull/90
    • Updating pip & Suggestion for better doc in parse method by @domi877 in https://github.com/stchris/untangle/pull/76
    • Make classifiers a list not a tuple by @juliangilbey in https://github.com/stchris/untangle/pull/77
    • Support Element truthiness on Python 3 by @davidjb in https://github.com/stchris/untangle/pull/68
    • Prepare 1.2.0 release by @stchris in https://github.com/stchris/untangle/pull/91
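
    The external-entity fix (#60, and the "Disable feature_external_ges" PR above) concerns a standard `xml.sax` parser feature. The sketch below shows, with the standard library only, what switching that feature off looks like. It illustrates the technique and is not taken from untangle's source; `TagCollector` and `parse_without_external_entities` are hypothetical names.

```python
import io
import xml.sax
from xml.sax.handler import ContentHandler, feature_external_ges


class TagCollector(ContentHandler):
    """Hypothetical handler that records element names in document order."""

    def __init__(self):
        super().__init__()
        self.tags = []

    def startElement(self, name, attrs):
        self.tags.append(name)


def parse_without_external_entities(xml_text):
    """Parse an XML string with external general entities disabled."""
    parser = xml.sax.make_parser()
    # Tell the SAX parser not to resolve external general entities,
    # which is the vector used by external entity (XXE) injection.
    parser.setFeature(feature_external_ges, False)
    handler = TagCollector()
    parser.setContentHandler(handler)
    # Feed the document as a byte stream.
    parser.parse(io.BytesIO(xml_text.encode("utf-8")))
    return handler.tags


print(parse_without_external_entities('<root><child name="child1"/></root>'))
```

    Recent Python versions already leave this feature off by default, so setting it explicitly mainly protects older interpreters and documents the intent.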

    New Contributors

    • @reverbc made their first contribution in https://github.com/stchris/untangle/pull/43
    • @CarlosSolrac made their first contribution in https://github.com/stchris/untangle/pull/57
    • @HTenkanen made their first contribution in https://github.com/stchris/untangle/pull/75
    • @timgates42 made their first contribution in https://github.com/stchris/untangle/pull/71
    • @domi877 made their first contribution in https://github.com/stchris/untangle/pull/76
    • @juliangilbey made their first contribution in https://github.com/stchris/untangle/pull/77
    • @davidjb made their first contribution in https://github.com/stchris/untangle/pull/68

    Full Changelog: https://github.com/stchris/untangle/compare/1.1.1...1.2.0

  • 1.1.1(Jun 30, 2022)

    What's Changed

    • Better tests by @stchris in https://github.com/stchris/untangle/pull/23
    • Moved CHANGELOG to a new file by @stchris in https://github.com/stchris/untangle/pull/24
    • Update trove classifiers to match travis.yml by @cfournie in https://github.com/stchris/untangle/pull/25
    • Added python3.6 to travis builds by @stchris in https://github.com/stchris/untangle/pull/31
    • Optional xml.sax features by @ransford in https://github.com/stchris/untangle/pull/26
    • Updated Python 3.6 related docs and metadata by @stchris in https://github.com/stchris/untangle/pull/34
    • Added SAX feature toggle docs. #32 #26 by @stchris in https://github.com/stchris/untangle/pull/35
    • Make sure that unicode strings are parsed properly. #17 by @stchris in https://github.com/stchris/untangle/pull/36
    • Release preparation for 1.1.1 by @stchris in https://github.com/stchris/untangle/pull/38
    • Feature/flake8 by @stchris in https://github.com/stchris/untangle/pull/39

    New Contributors

    • @cfournie made their first contribution in https://github.com/stchris/untangle/pull/25
    • @ransford made their first contribution in https://github.com/stchris/untangle/pull/26

    Full Changelog: https://github.com/stchris/untangle/compare/1.1.0...1.1.1

Owner
Christian Stefanescu
This project takes a special TXT file as input, divides its content into a list of HTML objects, and then creates an HTML file from them.

1 Jan 10, 2022
Python binding to Modest engine (fast HTML5 parser with CSS selectors).

A fast HTML5 parser with CSS selectors using Modest engine. Installation From PyPI using pip: pip install selectolax Development version from github:

Artem Golubin 710 Jan 04, 2023
Pythonic HTML Parsing for Humans™

Requests-HTML: HTML Parsing for Humans™ This library intends to make parsing HTML (e.g. scraping the web) as simple and intuitive as possible. When us

Python Software Foundation 12.9k Jan 01, 2023
Converts XML to Python objects

untangle Documentation Converts XML to a Python object. Siblings with similar names are grouped into a list. Children can be accessed with parent.chil

Christian Stefanescu 567 Nov 30, 2022
A python HTML builder library.

PyML A python HTML builder library. Goals Fully functional html builder similar to the javascript node manipulation. Implement an html parser that ret

Arjix 8 Jul 04, 2022
Modded MD conversion to HTML

MDPortal A module to convert a md-esque lang to html Basically I ruined md in an attempt to convert it to html Overview Here is a demo file from parse

Zeb 1 Nov 27, 2021
A jquery-like library for python

pyquery: a jquery-like library for python pyquery allows you to make jquery queries on xml documents. The API is as much as possible the similar to jq

Gael Pasgrimaud 2.2k Dec 29, 2022
Generate HTML using python 3 with an API that follows the DOM standard specification.

Generate HTML using python 3 with an API that follows the DOM standard specification. A JavaScript API and tons of cool features. Can be used as a fast prototyping tool.

byteface 114 Dec 14, 2022
inscriptis -- HTML to text conversion library, command line client and Web service

inscriptis -- HTML to text conversion library, command line client and Web service A python based HTML to text conversion library, command line client

webLyzard technology 122 Jan 07, 2023
The lxml XML toolkit for Python

What is lxml? lxml is the most feature-rich and easy-to-use library for processing XML and HTML in the Python language. It's also very fast and memory

2.3k Jan 02, 2023
Standards-compliant library for parsing and serializing HTML documents and fragments in Python

html5lib html5lib is a pure-python library for parsing HTML. It is designed to conform to the WHATWG HTML specification, as is implemented by all majo

1k Dec 27, 2022
The awesome document factory

The Awesome Document Factory WeasyPrint is a smart solution helping web developers to create PDF documents. It turns simple HTML pages into gorgeous s

Kozea 5.4k Jan 07, 2023
Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes

Bleach Bleach is an allowed-list-based HTML sanitizing library that escapes or strips markup and attributes. Bleach can also linkify text safely, appl

Mozilla 2.5k Dec 29, 2022
Lektor-html-pretify - Lektor plugin to pretify the HTML DOM using Beautiful Soup

html-pretify Lektor plugin to pretify the HTML DOM using Beautiful Soup. How doe

Chaos Bodensee 2 Nov 08, 2022
A library for converting HTML into PDFs using ReportLab

XHTML2PDF The current release of xhtml2pdf is xhtml2pdf 0.2.5. Release Notes can be found here: Release Notes As with all open-source software, its us

2k Dec 27, 2022
A HTML-code compiler-thing that lets you reuse HTML code.

RHTML RHTML stands for Reusable-Hyper-Text-Markup-Language, and is pronounced "Rech-tee-em-el" despite how its abbreviation is. As the name stands, RH

Duckie 4 Nov 15, 2021
Python module that makes working with XML feel like you are working with JSON

xmltodict xmltodict is a Python module that makes working with XML feel like you are working with JSON, as in this "spec": print(json.dumps(xmltod

Martín Blech 5k Jan 04, 2023
Safely add untrusted strings to HTML/XML markup.

MarkupSafe MarkupSafe implements a text object that escapes characters so it is safe to use in HTML and XML. Characters that have special meanings are

The Pallets Projects 514 Dec 31, 2022
Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API

Dominate Dominate is a Python library for creating and manipulating HTML documents using an elegant DOM API. It allows you to write HTML pages in pure

Tom Flanagan 1.5k Jan 09, 2023