Getting Started

Overview

Lexos is a library for constructing text analysis workflows. This normally means a step-by-step pipeline of collecting, processing, analyzing, and visualizing data. (The distinction between analysis and visualization, however, is often blurred because most visualizations require some form of analysis.) Lexos offers different modules for performing these steps. The Loader and Corpus modules collect and create containers for storing and accessing data. The Scrubber module enables you to perform preprocessing steps on the texts in your data, such as normalizing whitespace or removing certain character patterns. The Tokenizer module uses Natural Language Processing (NLP) tools to extract features from your data, most importantly countable tokens. These tokens can be transformed into a document-term matrix with the DTM module. A typical workflow is shown below (dotted lines indicate optional steps).

flowchart LR
   id1{Data} --> id2(((Loader))) & id3[(Corpus)]-. Preprocessing .-> id4{Scrubber}-. Feature Recognition .-> id5{Tokenizer} --> id6{DTM}

The DTM module allows you to extract basic statistics which you can use to interpret your data.
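To make the stages concrete, the sketch below walks two tiny in-memory texts through the same sequence using only the Python standard library. It does not use the Lexos API: the names `scrub`, `tokenize`, and `build_dtm` are hypothetical stand-ins for the kind of work the Scrubber, Tokenizer, and DTM modules do, and the regular-expression tokenizer is far simpler than the NLP tools Lexos relies on.

```python
import re
from collections import Counter


def scrub(text):
    """Normalize whitespace and lowercase the text (stand-in for Scrubber)."""
    return re.sub(r"\s+", " ", text).strip().lower()


def tokenize(text):
    """Extract runs of word characters as tokens (stand-in for Tokenizer)."""
    return re.findall(r"\w+", text)


def build_dtm(docs):
    """Return (vocab, rows); each row holds one document's term counts."""
    counts = [Counter(tokens) for tokens in docs]
    vocab = sorted(set().union(*counts))
    rows = [[c[term] for term in vocab] for c in counts]
    return vocab, rows


# Hypothetical sample data standing in for loaded files
texts = ["The cat  sat.\n", "The dog sat on the cat."]
docs = [tokenize(scrub(t)) for t in texts]
vocab, rows = build_dtm(docs)

# A basic statistic: total count of each term across all documents
totals = {term: sum(row[i] for row in rows) for i, term in enumerate(vocab)}
```

Each row of `rows` corresponds to one document and each column to one term in `vocab`, which is the shape a document-term matrix takes regardless of the library that builds it.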

Lexos modules do not always have to be used in a strict sequential order. For instance, you can feed scrubbed or tokenized texts back into a corpus. You can also split your data at any time in the workflow with the Cutter module.
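As a rough illustration of what splitting data means here, the following sketch cuts a list of tokens into fixed-size segments in plain Python. It is not the Cutter module's API; `cut_by_size` is a hypothetical helper, and Cutter may offer different or additional ways of splitting.

```python
def cut_by_size(tokens, size):
    """Split a token list into consecutive segments of at most `size` tokens."""
    if size < 1:
        raise ValueError("size must be a positive integer")
    return [tokens[i:i + size] for i in range(0, len(tokens), size)]


# Hypothetical sample data: a short, already tokenized text
tokens = "it was the best of times it was the worst of times".split()
segments = cut_by_size(tokens, 5)
```

Each segment can then be treated as a document in its own right, for example by adding it back into a corpus.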

The workflow above can be extended with a second one that takes the DTM on to analysis, visualization, or export.

flowchart LR
   id1{DTM}-.->id2(Analysis) & id3([Visualization]) & id4{Export}

Before You Get Started

Before you get started, make sure that you have installed Lexos.

Basic Usage

Lexos workflows can be run conveniently in Jupyter notebooks by importing the relevant module (or the required functions and classes from the module). For instance, you can import and use the Loader as follows:

# Import the Lexos smart loader
from lexos.io.smart import Loader

# Instantiate the Loader object
loader = Loader()

# Load a text file
loader.load("myfile.txt")

This will work in a standalone script as well. Any errors will be printed to your notebook or console.

If you are designing an app that uses Lexos "under the hood", it is good practice to import the LexosException class and wrap the last line above in a try...except block:

from lexos.exceptions import LexosException

try:
    loader.load("myfile.txt")
except LexosException as e:
    print(e)

This will enable your application to handle errors without stopping the program.
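In an application that loads several files, one possible extension of this pattern is to record failures and carry on with the remaining files. The sketch below is an assumption about how an app might be organized, not part of Lexos: `load_many` is a hypothetical helper, the logger name `myapp` is arbitrary, and the local `LexosException` class is only a stand-in so the example runs without Lexos installed.

```python
import logging

logger = logging.getLogger("myapp")


class LexosException(Exception):
    """Stand-in so this sketch runs without Lexos installed.

    In a real app, use `from lexos.exceptions import LexosException` instead.
    """


def load_many(load_one, paths):
    """Call `load_one(path)` for each path; return (loaded, failed) path lists."""
    loaded, failed = [], []
    for path in paths:
        try:
            load_one(path)
        except LexosException as e:
            # Log the problem and continue with the next file
            logger.warning("Could not load %s: %s", path, e)
            failed.append(path)
        else:
            loaded.append(path)
    return loaded, failed
```

With the `loader` object from the earlier example, this could be called as `load_many(loader.load, ["myfile.txt", "other.txt"])`. The application can then decide whether to report the paths in `failed` to the user or retry them.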

To learn about each of the individual modules in the Lexos API, browse through the pages in this tutorial, which take you through the modules and their applications one by one.