Scrubber

Scrubber is a destructive preprocessing module containing a set of functions for manipulating text. It draws heavily on the Textacy code base but adapts some of that library's functions to modify or extend their behavior.

Scrubber is divided into eight submodules:

normalize: A set of functions for massaging text into standardized forms.
pipeline: A set of functions for feeding multiple components into a scrubbing function.
registry: A registry of scrubbing functions, which can be looked up by name.
remove: A set of functions for removing strings and patterns from text.
replace: A set of functions for replacing strings and patterns in text.
resources: A set of constants, classes, and functions used by the other components of the Scrubber module.
scrubber: Contains the lexos.scrubber.scrubber.Scrub class for managing scrubbing pipelines.
utils: A set of utility functions shared by the other components of the Scrubber module.
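
The registry and pipeline submodules work together: functions are looked up by name and then chained into a single scrubbing function. The sketch below illustrates that general pattern using only the Python standard library. It is not the Lexos API: REGISTRY, register, build_pipeline, and the three component functions are hypothetical names invented for illustration.

```python
import re
from functools import partial, reduce
from typing import Callable, Dict

# Hypothetical registry (not the Lexos one): maps a name to a text function.
REGISTRY: Dict[str, Callable[..., str]] = {}


def register(name: str) -> Callable:
    """Decorator that stores a function in REGISTRY under the given name."""
    def decorator(func: Callable[..., str]) -> Callable[..., str]:
        REGISTRY[name] = func
        return func
    return decorator


@register("lower_case")
def lower_case(text: str) -> str:
    """Normalize: convert text to lowercase."""
    return text.lower()


@register("digits")
def remove_digits(text: str, replacement: str = "") -> str:
    """Remove: substitute every digit with `replacement` (empty by default)."""
    return re.sub(r"\d", replacement, text)


@register("whitespace")
def normalize_whitespace(text: str) -> str:
    """Normalize: collapse runs of whitespace and trim both ends."""
    return re.sub(r"\s+", " ", text).strip()


def build_pipeline(*funcs: Callable[[str], str]) -> Callable[[str], str]:
    """Return one function that applies each component in order."""
    def scrub(text: str) -> str:
        return reduce(lambda acc, func: func(acc), funcs, text)
    return scrub


# Look components up by name; use partial() to pre-bind keyword arguments.
scrub = build_pipeline(
    REGISTRY["lower_case"],
    partial(REGISTRY["digits"], replacement=""),
    REGISTRY["whitespace"],
)

result = scrub("  Hello   World 2023!  ")
print(result)
```

Order matters in a pipeline like this one: whitespace normalization runs last here so that any gaps left by removing digits are collapsed as well.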