Utils
The utils component of Scrubber contains helper functions shared by the other components.
lexos.scrubber.utils.get_tags(text)
Get information about the tags in a text.
Parameters:
Name | Type | Description | Default |
---|---|---|---|
text | str | The text to be analyzed. | required |
Returns:
Type | Description |
---|---|
dict | A dict with the keys "tags" and "attributes": "tags" is a list of the unique tag names in the data, and "attributes" is a list of dicts containing the attributes and values of those tags that have attributes. |
Note
The function first tries to parse the markup as well-formed XML using ETree; if that fails, it falls back to BeautifulSoup's parser.
Source code in lexos\scrubber\utils.py
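The parse-then-fall-back strategy described in the note can be sketched in plain Python. This is a minimal sketch, not the Lexos implementation: get_tags_sketch and _TagCollector are hypothetical names, the shape of each "attributes" entry ({"tag": ..., "attributes": {...}}) is an assumption, tags are listed in order of first appearance, and the standard library's html.parser is swapped in for BeautifulSoup so the example has no third-party dependencies.

```python
import xml.etree.ElementTree as ET
from html.parser import HTMLParser


class _TagCollector(HTMLParser):
    """Hypothetical lenient fallback: records tag names and attributes."""

    def __init__(self) -> None:
        super().__init__()
        self.tags: list[str] = []
        self.attributes: list[dict] = []

    def handle_starttag(self, tag: str, attrs: list) -> None:
        # HTMLParser also routes self-closing tags (<br/>) through this method.
        if tag not in self.tags:
            self.tags.append(tag)
        if attrs:
            # Assumed entry shape: the tag name plus a dict of its attributes.
            self.attributes.append({"tag": tag, "attributes": dict(attrs)})


def get_tags_sketch(text: str) -> dict:
    """Hypothetical sketch of the XML-first, lenient-fallback strategy."""
    tags: list[str] = []
    attributes: list[dict] = []
    try:
        # First attempt: treat the markup as well-formed XML.
        root = ET.fromstring(text)
        for elem in root.iter():  # iter() yields the root and all descendants
            if elem.tag not in tags:
                tags.append(elem.tag)
            if elem.attrib:
                attributes.append({"tag": elem.tag, "attributes": dict(elem.attrib)})
    except ET.ParseError:
        # Fallback: a forgiving HTML parser (stand-in for BeautifulSoup here).
        collector = _TagCollector()
        collector.feed(text)
        collector.close()
        tags, attributes = collector.tags, collector.attributes
    return {"tags": tags, "attributes": attributes}
```

For example, '<doc><p id="a">Hi</p><p>there</p></doc>' is well-formed and takes the XML path, while '<p class="x">one<p>two' is not (the p elements are never closed) and goes through the fallback. One side effect of the swapped-in parser: html.parser lowercases tag and attribute names, whereas the XML path preserves their case.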