Pipeline
Allows the user to customize a "pipeline," an order in which to perform different scrubbing operations. For example: removing all digits before replacing all phone numbers would have a very different effect than replacing all phone numbers before removing all digits.
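The effect of ordering can be seen in a small sketch. The two functions below are hypothetical stand-ins written with the standard `re` module, not the Lexos implementations, and the phone-number pattern is deliberately simplistic.

```python
import re


# Hypothetical stand-ins for scrubbing steps; not the Lexos functions.
def remove_digits(text: str) -> str:
    """Delete every digit character."""
    return re.sub(r"\d", "", text)


def replace_phone_numbers(text: str) -> str:
    """Replace simple NNN-NNN-NNNN phone numbers with a placeholder."""
    return re.sub(r"\b\d{3}-\d{3}-\d{4}\b", "_PHONE_", text)


text = "Call 555-123-4567 before 5pm."

# Digits first: the phone number is destroyed before it can be recognized.
digits_first = replace_phone_numbers(remove_digits(text))
# Phone numbers first: the placeholder survives the digit removal.
phones_first = remove_digits(replace_phone_numbers(text))

print(digits_first)  # Call -- before pm.
print(phones_first)  # Call _PHONE_ before pm.
```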
pipe(func: Callable, *args, **kwargs) -> Callable
Apply functools.partial and add __name__ to the partial function.
This allows the function to be passed to the pipeline along with keyword arguments.
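As a rough illustration of the idea, a wrapper of this kind might look like the sketch below. The name `pipe_sketch` and the helper `strip_chars` are hypothetical, and the actual Lexos source may differ in its details.

```python
from functools import partial
from typing import Callable


def pipe_sketch(func: Callable, *args, **kwargs) -> Callable:
    """Bind arguments to func and copy its name onto the partial object."""
    bound = partial(func, *args, **kwargs)
    # partial objects have no __name__ by default, so set one explicitly.
    bound.__name__ = func.__name__
    return bound


# Hypothetical scrubbing function, used only for illustration.
def strip_chars(text: str, chars: str = "") -> str:
    return "".join(ch for ch in text if ch not in chars)


strip_bangs = pipe_sketch(strip_chars, chars="!")
print(strip_bangs.__name__)      # strip_chars
print(strip_bangs("Hi! Bye!"))   # Hi Bye
```

Carrying the original name on the partial object lets code that inspects `__name__` (for logging or display, say) treat bound and unbound functions alike.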
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| func | Callable | A callable. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| Callable | Callable | A partial function with |
Source code in lexos/scrubber/pipeline.py
make_pipeline(*funcs: Callable[[str], str]) -> Callable[[str], str]
Make a callable pipeline.
Make a callable pipeline that passes a text through a series of functions in sequential order, then outputs a (scrubbed) text string.
This function is intended as a lightweight convenience for users, allowing them to flexibly specify which scrubbing functions to apply and in which order, while treating the whole thing as a single callable.
This function requires cytoolz. Install it with: python -m pip install cytoolz
Use pipe (a wrapper around functools.partial) to pass arguments to preprocessors.
```python
from lexos import scrubber
from lexos.scrubber.pipeline import make_pipeline, pipe

# Build the pipeline once, then call it like any other function.
scrub = make_pipeline(
    scrubber.replace.hashtags,
    scrubber.replace.emojis,
    pipe(scrubber.remove.punctuation, only=[".", "?", "!"]),
)
scrub("@spacy_io is OSS for industrial-strength NLP in Python developed by @explosion_ai 💥")
# '_USER_ is OSS for industrial-strength NLP in Python developed by _USER_ _EMOJI_'
```
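The sequential composition described above can be sketched in plain Python. This version uses `functools.reduce` from the standard library instead of cytoolz, so it is an illustration of the technique rather than the Lexos implementation; `make_pipeline_sketch` is a hypothetical name.

```python
from functools import reduce
from typing import Callable


def make_pipeline_sketch(*funcs: Callable[[str], str]) -> Callable[[str], str]:
    """Return a callable that applies funcs to a text from left to right."""

    def pipeline(text: str) -> str:
        # Feed the output of each function into the next one.
        return reduce(lambda acc, func: func(acc), funcs, text)

    return pipeline


shout = make_pipeline_sketch(str.strip, str.upper, lambda t: t + "!")
print(shout("  hello  "))  # HELLO!
```

With no functions supplied, the sketch returns the text unchanged, which matches the empty default for `*funcs`.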
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| *funcs | Callable[[str], str] | A series of functions to be applied to the text. | () |
Returns:
| Type | Description |
|---|---|
| Callable[[str], str] | Pipeline composed of |
Source code in lexos/scrubber/pipeline.py
make_pipeline_from_tuple(funcs: tuple) -> tuple
Return a pipeline from a tuple.
Parameters:
| Name | Type | Description | Default |
|---|---|---|---|
| funcs | tuple | A tuple containing callables or string names of functions. | required |
Returns:
| Name | Type | Description |
|---|---|---|
| tuple | tuple | A pipeline composed of the functions in |
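One way a tuple of callables and string names could be resolved is sketched below. The `REGISTRY` dictionary and the `resolve_funcs` helper are assumptions made for illustration; how Lexos actually maps names to functions is not shown in this section.

```python
from typing import Callable, Dict, Tuple, Union

# Hypothetical name-to-function registry; Lexos may resolve names differently.
REGISTRY: Dict[str, Callable[[str], str]] = {
    "lower": str.lower,
    "strip": str.strip,
}

Step = Union[str, Callable[[str], str]]


def resolve_funcs(funcs: Tuple[Step, ...]) -> Tuple[Callable[[str], str], ...]:
    """Turn a tuple of callables and/or registered names into callables."""
    resolved = []
    for item in funcs:
        if callable(item):
            resolved.append(item)  # already a function, keep it as is
        elif item in REGISTRY:
            resolved.append(REGISTRY[item])  # look the name up
        else:
            raise KeyError(f"Unknown function name: {item!r}")
    return tuple(resolved)


steps = resolve_funcs(("strip", str.upper, "lower"))
text = "  Hello  "
for step in steps:
    text = step(text)
print(text)  # hello
```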