Utilities
Contains a function that extracts HTML/XML tags from a text. It first parses the text as well-formed XML using ETree and uses Beautiful Soup as a fallback.
get_tags(text: str) -> dict
Get information about the tags in a text.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| text | str | The text to be analyzed. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| dict | dict | A dict with the keys "tags" and "attributes". "tags" is a list of unique tag names in the data, and "attributes" is a list of dicts containing the attributes and values for those tags that have attributes. |
Note
The procedure first tries to parse the markup as well-formed XML using ETree; if that fails, it falls back to Beautiful Soup's parser.
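The parse-then-fall-back behavior described in the note can be sketched roughly as follows. This is a minimal illustration, not the library's actual code: the name `get_tags_sketch`, the choice of `"html.parser"` as the Beautiful Soup backend, and the exact shape of each entry in `"attributes"` (one dict per element, keyed by tag name) are all assumptions.

```python
import xml.etree.ElementTree as ET


def get_tags_sketch(text: str) -> dict:
    """Hypothetical re-implementation for illustration only."""
    tags = []        # unique tag names, in order of first appearance
    attributes = []  # one entry per element that carries attributes
    try:
        # First attempt: strict parsing, which requires well-formed XML.
        root = ET.fromstring(text)
        elements = [(el.tag, dict(el.attrib)) for el in root.iter()]
    except ET.ParseError:
        # Fallback: Beautiful Soup tolerates malformed markup.
        # Imported lazily so the strict path needs only the standard library.
        from bs4 import BeautifulSoup

        soup = BeautifulSoup(text, "html.parser")  # backend is an assumption
        elements = [(el.name, dict(el.attrs)) for el in soup.find_all(True)]
    for name, attrs in elements:
        if name not in tags:
            tags.append(name)
        if attrs:
            # Assumed entry shape: {tag_name: {attribute: value, ...}}
            attributes.append({name: attrs})
    return {"tags": tags, "attributes": attributes}


result = get_tags_sketch('<root><p id="a">Hi</p><p>Bye</p></root>')
print(result)
# {'tags': ['root', 'p'], 'attributes': [{'p': {'id': 'a'}}]}
```

Catching `ET.ParseError` is what triggers the fallback in this sketch, so markup such as an unclosed `<p>` would be routed to Beautiful Soup (which must then be installed) instead of raising.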