Resources
Mappings for removing or transforming character patterns.
HTMLTextExtractor
Bases: HTMLParser
Simple subclass of `html.parser.HTMLParser`.
Collects the data elements fed to the parser (anything that is not a tag,
comment, processing instruction, and so on), then makes them available as
stripped, concatenated text via `HTMLTextExtractor.get_text()`.
Note
Users probably shouldn't deal with this class directly;
instead, use `remove.remove_html_tags()`.
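The collect-then-join pattern described above can be sketched with the standard library alone. This is a minimal illustration, not the Lexos source: the class name `SketchTextExtractor`, the `_chunks` attribute, and the `strip_tags` helper are hypothetical names chosen for this example.

```python
from html.parser import HTMLParser
from typing import List, Optional


class SketchTextExtractor(HTMLParser):
    """Hypothetical stand-in (not the Lexos class): collects text nodes."""

    def __init__(self) -> None:
        super().__init__()
        self._chunks: List[str] = []  # assumed internal storage for text pieces

    def handle_data(self, data: str) -> None:
        # HTMLParser calls this for text found between tags.
        self._chunks.append(data)

    def get_text(self, sep: Optional[str] = "") -> str:
        # Join the collected pieces, then strip surrounding whitespace.
        return (sep or "").join(self._chunks).strip()


def strip_tags(html: str) -> str:
    """Hypothetical helper: feed the markup, then return the collected text."""
    parser = SketchTextExtractor()
    parser.feed(html)
    parser.close()  # flush any text still buffered by the parser
    return parser.get_text()


print(strip_tags("<p>Hello, <b>world</b>!</p>"))  # Hello, world!
```

Tags never reach `handle_data`, so only the text between them ends up in the result.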
Methods:
| Name | Description |
|---|---|
| `__init__` | Initialize the parser. |
| `get_text` | Return the collected text. |
| `handle_data` | Handle data elements. |
Source code in lexos/scrubber/resources.py
__init__()
get_text(sep: Optional[str] = '') -> str
Return the collected text.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `sep` | `Optional[str]` | The separator to join the collected text with. | `''` |
Returns:
| Name | Type | Description |
|---|---|---|
| `str` | `str` | The collected text. |
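Because `sep` defaults to the empty string, text from adjacent elements is joined with nothing between the pieces. The sketch below shows the effect on a plain list of strings. The `join_chunks` function is a hypothetical illustration, and treating `None` the same as `''` is an assumption rather than documented Lexos behavior.

```python
from typing import List, Optional


def join_chunks(chunks: List[str], sep: Optional[str] = "") -> str:
    """Hypothetical illustration of joining collected text with a separator."""
    # Assumption: a None separator is treated like the empty string.
    return (sep or "").join(chunks).strip()


chunks = ["Hello", "World"]  # pieces as a parser might collect them
print(join_chunks(chunks))       # HelloWorld
print(join_chunks(chunks, " "))  # Hello World
```

Passing a space or newline as the separator is a simple way to keep words from adjacent elements from running together.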
handle_data(data: Any) -> None
Handle data elements.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Any` | The data element(s) to handle. | required |
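To see what actually arrives in `handle_data`, it can help to log the calls made by the standard-library parser. The `DataLogger` class below is a hypothetical example, not part of Lexos. It relies on the default `convert_charrefs=True` behavior of `html.parser.HTMLParser`, under which character references are decoded before the text is delivered.

```python
from html.parser import HTMLParser
from typing import List


class DataLogger(HTMLParser):
    """Hypothetical example class: records every string passed to handle_data."""

    def __init__(self) -> None:
        super().__init__()
        self.seen: List[str] = []

    def handle_data(self, data: str) -> None:
        self.seen.append(data)


logger = DataLogger()
logger.feed("<p>one &amp; two</p><!-- note --><p>three</p>")
logger.close()
print(logger.seen)  # ['one & two', 'three']
```

The comment goes to `handle_comment` instead, so it never shows up among the data elements.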