Processors
process_data(data: Any, docs: Optional[int | str | list[int] | list[str]] = None, limit: Optional[int] = Field(None, gt=0, description='Limit on number of terms to return')) -> dict[str, int]
Process any supported data type into a consistent format of term counts.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Any` | The input data to process. | required |
| `docs` | `Optional[int \| str \| list[int] \| list[str]]` | Optional document selection for multi-document data. | `None` |
| `limit` | `Optional[int]` | Optional limit on the number of terms to return. | `Field(None, gt=0, description='Limit on number of terms to return')` |

Returns:
| Type | Description |
|---|---|
| `dict[str, int]` | Dictionary with terms as keys and counts as values. |

Raises:
| Type | Description |
|---|---|
| `LexosException` | If the data type is unsupported. |

Source code in lexos/visualization/processors.py
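The effect of `limit` can be pictured with a minimal, standard-library sketch. It does not call Lexos; `limit_terms` is a hypothetical helper, and the choice to keep the most frequent terms is an assumption, since the reference above only states that `limit` caps the number of terms returned.

```python
from collections import Counter
from typing import Optional


def limit_terms(counts: Counter, limit: Optional[int] = None) -> dict[str, int]:
    """Hypothetical helper (not part of Lexos): keep at most `limit` terms."""
    if limit is not None and limit <= 0:
        # Mirrors the documented gt=0 constraint on `limit`.
        raise ValueError("limit must be greater than 0")
    # Assumption: terms are ranked by frequency before truncation.
    # most_common(None) returns every term, most frequent first.
    return dict(counts.most_common(limit))


tokens = ["the", "cat", "saw", "the", "dog", "the", "cat"]
top_two = limit_terms(Counter(tokens), limit=2)
all_terms = limit_terms(Counter(tokens))
```

With `limit=None` every term is kept, which matches the documented default.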
filter_docs(df: pd.DataFrame, docs: Optional[list[int] | list[str]] = None) -> pd.DataFrame
Filter the documents in a DTM.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | A Document Term Matrix. | required |
| `docs` | `Optional[list[int] \| list[str]]` | A list of document indices or labels used to filter the DTM. | `None` |

Returns:
| Type | Description |
|---|---|
| `DataFrame` | A filtered DTM. |

Source code in lexos/visualization/processors.py
process_dataframe(df: pd.DataFrame, docs: Optional[int | str | list[int] | list[str]] = None) -> Counter
Generate a term frequency dictionary from a DTM.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `df` | `DataFrame` | A Document Term Matrix as a DataFrame. | required |
| `docs` | `Optional[int \| str \| list[int] \| list[str]]` | A document index or label, or a list of indices or labels, used to filter the DTM. | `None` |

Returns:
| Name | Type | Description |
|---|---|---|
| `Counter` | `Counter` | A Counter object with the terms as keys and the counts as values. |

Source code in lexos/visualization/processors.py
process_dtm(dtm: DTM, docs: Optional[int | str | list[int] | list[str]] = None) -> dict[str, int]
Generate a term frequency dictionary from a DTM.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `dtm` | `DTM` | A Document Term Matrix object. | required |
| `docs` | `Optional[int \| str \| list[int] \| list[str]]` | A document index or label, or a list of indices or labels, used to filter the DTM. | `None` |

Returns:
| Type | Description |
|---|---|
| `dict[str, int]` | A dictionary with the terms as keys and the counts as values. |

Source code in lexos/visualization/processors.py
process_list(data: list[list[Doc | Span] | list[str] | list[Token]], docs: Optional[int | list[int]]) -> Counter
Process a list of docs, spans, strings, or tokens.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `list[list[Doc \| Span] \| list[str] \| list[Token]]` | The data. | required |
| `docs` | `Optional[int \| list[int]]` | A document index, or a list of document indices, to select from the data. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| `Counter` | `Counter` | A Counter object with the terms as keys and the counts as values. |

Source code in lexos/visualization/processors.py
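As a rough illustration of index-based selection over list data, the sketch below counts terms in chosen token lists using only the standard library. `count_selected` is a hypothetical stand-in, not the Lexos function, and merging the selected documents into a single `Counter` is an assumption based on the documented return type.

```python
from collections import Counter
from typing import Optional, Union


def count_selected(
    data: list[list[str]], docs: Optional[Union[int, list[int]]] = None
) -> Counter:
    """Hypothetical stand-in: count terms in the selected token lists."""
    if docs is None:
        indices = range(len(data))  # no selection: use every document
    elif isinstance(docs, int):
        indices = [docs]  # a single index selects one document
    else:
        indices = docs
    counts: Counter = Counter()
    for i in indices:
        # Assumption: selected documents are merged into one Counter.
        counts.update(data[i])
    return counts


corpus = [["a", "b", "a"], ["b", "c"], ["c", "c"]]
merged = count_selected(corpus, docs=[0, 2])
single = count_selected(corpus, docs=1)
```

Here `docs` has a default of `None` for convenience; the signature documented above lists it as required.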
process_docs(data: list[Doc] | list[Span], docs: Optional[int | list[int]]) -> Counter
Process multiple docs or spans.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `list[Doc] \| list[Span]` | The data. | required |
| `docs` | `Optional[int \| list[int]]` | A document index, or a list of document indices, to select from the data. | required |

Returns:
| Name | Type | Description |
|---|---|---|
| `Counter` | `Counter` | A Counter object with the terms as keys and the counts as values. |

Source code in lexos/visualization/processors.py
process_item(data: Doc | Span | list[str] | list[Token]) -> Counter
Process single docs, spans, and strings, or flat lists of strings or tokens.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `Doc \| Span \| list[str] \| list[Token]` | The data. | required |

Returns:
| Type | Description |
|---|---|
| `Counter` | A Counter object with the terms as keys and the counts as values. |

Source code in lexos/visualization/processors.py
multicloud_processor(data: DTM | pd.DataFrame | list[Doc] | list[Span] | list[list[str]] | list[list[Token]] | list[dict[str, int]], docs: Optional[int | str | list[int] | list[str]] = None) -> list[dict[str, int]]
Process data into list of term-count dicts for multicloud visualization.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `DTM \| pd.DataFrame \| list[Doc] \| list[Span] \| list[list[str]] \| list[list[Token]] \| list[dict[str, int]]` | The data. | required |
| `docs` | `Optional[int \| str \| list[int] \| list[str]]` | A document index or label, or a list of indices or labels, to select from the data. | `None` |

Returns:
| Type | Description |
|---|---|
| `list[dict[str, int]]` | A list of dictionaries with the terms as keys and the counts as values. |

Source code in lexos/visualization/processors.py
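The documented return shape, one term-count dictionary per document, can be sketched with the standard library alone. `per_document_counts` is a hypothetical stand-in that handles only lists of string tokens; it is not how Lexos processes DTMs, DataFrames, or spaCy objects.

```python
from collections import Counter


def per_document_counts(docs_tokens: list[list[str]]) -> list[dict[str, int]]:
    """Hypothetical stand-in: build one term-count dict per document."""
    # Each inner list is treated as one document's tokens.
    return [dict(Counter(tokens)) for tokens in docs_tokens]


clouds = per_document_counts([["a", "b", "a"], ["b", "c"]])
```

Unlike the merged counts produced for a single cloud, each document keeps its own dictionary here, so a term shared between documents appears once per document.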
get_rows(lst, n) -> Iterator[int]
Yield successive n-sized rows from a list of documents.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `lst` | `list` | A list of documents. | required |
| `n` | `int` | The number of columns in the row. | required |

Yields:
| Type | Description |
|---|---|
| `int` | A generator with the documents separated into rows. |

Source code in lexos/visualization/processors.py
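Yielding successive n-sized rows is commonly done with a slicing generator. The sketch below is one plausible shape, not the Lexos source: `chunk_rows` is a hypothetical name, and it yields lists of documents, which is an assumption drawn from the description rather than from the annotated `Iterator[int]` return type.

```python
from typing import Iterator


def chunk_rows(lst: list, n: int) -> Iterator[list]:
    """Hypothetical stand-in: yield successive n-sized slices of lst."""
    for start in range(0, len(lst), n):
        # The final row is shorter when len(lst) is not a multiple of n.
        yield lst[start : start + n]


rows = list(chunk_rows(["d1", "d2", "d3", "d4", "d5"], 2))
```

Because the function is a generator, rows are produced lazily; wrapping the call in `list()` materializes them all at once.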
_process_list_data(data: list, docs: Optional[int | str | list[int] | list[str]] = None) -> Counter
Process list-type data inputs.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| `data` | `list` | List data to process. | required |
| `docs` | `Optional[int \| str \| list[int] \| list[str]]` | Optional document selection. | `None` |

Returns:
| Name | Type | Description |
|---|---|---|
| `Counter` | `Counter` | Counter object with term counts. |

Source code in lexos/visualization/processors.py