Skip to content

Pipeline¤

The pipeline component of Scrubber is used to manage an ordered application of Scrubber component functions to text.

lexos.scrubber.pipeline.make_pipeline(*funcs) ¤

Make a callable pipeline.

Make a callable pipeline that passes a text through a series of functions in sequential order, then outputs a (scrubbed) text string.

This function is intended as a lightweight convenience for users, allowing them to flexibly specify scrubbing options and their order,which (and in which order) preprocessing treating the whole thing as a single callable.

python -m pip install cytoolz is required for this function to work.

Use pipe (an alias for functools.partial) to pass arguments to preprocessors.

from lexos import scrubber
scrubber = Scrubber.pipeline.make_pipeline(
    scrubber.replace.hashtags,
    scrubber.replace.emojis,
    pipe(scrubber.remove.punctuation, only=[".", "?", "!"])
)
scrubber("@spacy_io is OSS for industrial-strength NLP in Python developed by @explosion_ai 💥")
'_USER_ is OSS for industrial-strength NLP in Python developed by _USER_ _EMOJI_'

Parameters:

Name Type Description Default
*funcs dict

A series of functions to be applied to the text.

()

Returns:

Type Description
Callable[[str], str]

Pipeline composed of *funcs that applies each in sequential order.

Source code in lexos\scrubber\pipeline.py
29
30
31
32
33
34
35
36
37
38
39
40
41
42
43
44
45
46
47
48
49
50
51
52
53
54
55
56
57
58
59
60
def make_pipeline(*funcs: Callable[[str], str]) -> Callable[[str], str]:
    """Make a callable pipeline.

    Make a callable pipeline that passes a text through a series of
    functions in sequential order, then outputs a (scrubbed) text string.

    This function is intended as a lightweight convenience for users,
    allowing them to flexibly specify scrubbing options and their order,which (and in which order) preprocessing
    treating the whole thing as a single callable.

    `python -m pip install cytoolz` is required for this function to work.

    Use `pipe` (an alias for `functools.partial`) to pass arguments to preprocessors.

    ```python
    from lexos import scrubber
    scrubber = Scrubber.pipeline.make_pipeline(
        scrubber.replace.hashtags,
        scrubber.replace.emojis,
        pipe(scrubber.remove.punctuation, only=[".", "?", "!"])
    )
    scrubber("@spacy_io is OSS for industrial-strength NLP in Python developed by @explosion_ai 💥")
    '_USER_ is OSS for industrial-strength NLP in Python developed by _USER_ _EMOJI_'
    ```

    Args:
        *funcs (dict): A series of functions to be applied to the text.

    Returns:
        Pipeline composed of ``*funcs`` that applies each in sequential order.
    """
    return functoolz.compose_left(*funcs)

lexos.scrubber.pipeline.make_pipeline_from_tuple(funcs) ¤

Return a pipeline from a tuple.

Parameters:

Name Type Description Default
funcs tuple

A tuple containing callables or string names of functions.

required

Returns a tuple of functions.

Source code in lexos\scrubber\pipeline.py
62
63
64
65
66
67
68
69
70
def make_pipeline_from_tuple(funcs: tuple) -> tuple:
    """Return a pipeline from a tuple.

    Args:
        funcs (tuple): A tuple containing callables or string names of functions.

    Returns a tuple of functions.
    """
    return make_pipeline(*[eval(x) if isinstance(x, str) else x for x in funcs])

Note

lexos.scrubber.pipeline.make_pipeline_from_tuple is deprecated. It should not be necessary if you are using lexos.scrubber.registry.

lexos.scrubber.pipeline.pipe(func, *args, **kwargs) ¤

Apply functool.partial and add __name__ to the partial function.

This allows the function to be passed to the pipeline along with keyword arguments.

Parameters:

Name Type Description Default
func Callable

A callable.

required

Returns:

Type Description
Callable

A partial function with __name__ set to the name of the function.

Source code in lexos\scrubber\pipeline.py
11
12
13
14
15
16
17
18
19
20
21
22
23
24
25
26
27
def pipe(func: Callable, *args, **kwargs) -> Callable:
    """Apply functool.partial and add `__name__` to the partial function.

    This allows the function to be passed to the pipeline along with
    keyword arguments.

    Args:
        func (Callable): A callable.
    Returns:
        A partial function with `__name__` set to the name of the function.
    """
    if not args and not kwargs:
        return func
    else:
        partial_func = partial(func, *args, **kwargs)
        update_wrapper(partial_func, func)
        return partial_func