Util
Utility functions for the milestones module.
API Documentation: util.py
LexosBaseModel
pydantic-model
Bases: BaseModel
A base model that inherits from Pydantic's BaseModel but also validates spaCy objects.
Config:
- `arbitrary_types_allowed`: `True`
- `json_schema_extra`: `DocJSONSchema.schema()`
Source code in lexos/milestones/util.py
chars_to_tokens(doc: Doc) -> dict[int, int]
Generate a character-to-token mapping for _match_regex().
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `doc` | `Doc` | A spaCy doc. | required |
Returns:
| Type | Description |
|---|---|
| `dict[int, int]` | A dict mapping character indexes to token indexes. |
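
The mapping described above can be illustrated with a minimal sketch. This is not the code in `util.py`; it assumes the mapping is built from each token's character offset and length, which correspond to the spaCy `Token.idx`, `Token.i`, and `Token.text` attributes. `StandInToken` and `chars_to_tokens_sketch` are hypothetical names used so the example runs without spaCy installed.

```python
from dataclasses import dataclass


@dataclass
class StandInToken:
    """Hypothetical stand-in exposing the spaCy Token attributes used below."""

    i: int  # index of the token within the doc
    idx: int  # character offset of the token's first character
    text: str  # verbatim token text


def chars_to_tokens_sketch(tokens) -> dict[int, int]:
    """Map every character index covered by a token to that token's index."""
    mapping: dict[int, int] = {}
    for token in tokens:
        # Assumption: only characters inside a token are mapped.
        for char_index in range(token.idx, token.idx + len(token.text)):
            mapping[char_index] = token.i
    return mapping


# "Hello world" tokenized as ["Hello", "world"].
tokens = [StandInToken(0, 0, "Hello"), StandInToken(1, 6, "world")]
mapping = chars_to_tokens_sketch(tokens)
print(mapping[0], mapping[4], mapping[6])  # 0 0 1
print(5 in mapping)  # False: the space between tokens is not mapped
```

A mapping like this lets a regex match, which reports character offsets, be translated into token indexes. In this sketch, inter-token whitespace has no entry; whether the library behaves the same way is not stated in this page.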
ensure_list(item: Any) -> list[Any]
Ensure that the input is a list.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `item` | `Any` | The item to ensure is a list. | required |
Returns:
| Type | Description |
|---|---|
| `list[Any]` | The item as a list. |
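
As a rough illustration of the behavior described above, the sketch below wraps any non-list value in a list and returns lists unchanged. It is an assumption about how such a helper typically works, not the source in `util.py`; `ensure_list_sketch` is a hypothetical name.

```python
from typing import Any


def ensure_list_sketch(item: Any) -> list[Any]:
    """Return item unchanged if it is already a list, else wrap it in one."""
    if isinstance(item, list):
        return item
    # Assumption: strings and tuples are wrapped whole, not split into elements.
    return [item]


print(ensure_list_sketch("chapter"))  # ['chapter']
print(ensure_list_sketch(["a", "b"]))  # ['a', 'b']
```

Wrapping rather than calling `list(item)` matters for strings: `list("chapter")` would produce a list of single characters.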
filter_doc(doc: Doc, spans: list[Span]) -> Doc
Filter a doc to remove tokens by index, retaining custom extensions.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `doc` | `Doc` | A spaCy doc. | required |
| `spans` | `list[Span]` | The span(s) to remove from the doc. | required |
Returns:
| Type | Description |
|---|---|
| `Doc` | A new doc with the spans removed. |
lowercase_spacy_rules(patterns: list[list[dict[str, Any]]], old_key: list[str] | str = ['TEXT', 'ORTH'], new_key: str = 'LOWER') -> list
Convert spaCy Rule Matcher patterns to lowercase.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `patterns` | `list[list[dict[str, Any]]]` | A list of spaCy Rule Matcher patterns. | required |
| `old_key` | `list[str] \| str` | A dictionary key or list of keys to rename. | `['TEXT', 'ORTH']` |
| `new_key` | `str` | The new key name. | `'LOWER'` |
Returns:
| Type | Description |
|---|---|
| `list` | A list of spaCy Rule Matcher patterns. |
move_milestone(doc: Doc, spans: list[Span], start: str) -> list[Span]
Move the milestone start to a new token index.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `doc` | `Doc` | A spaCy doc. | required |
| `spans` | `list[Span]` | The span(s) to use for identifying token attributes. | required |
| `start` | `str` | Set milestone start to the token before or after the milestone span. May be "before" or "after". | required |
Returns:
| Type | Description |
|---|---|
| `list[Span]` | A list of new milestone spans. |