Corpus
The corpus module handles document management and statistical analysis in the Lexos ecosystem. It provides centralized storage, metadata management, and inter-module communication so that analysis modules can work from a single corpus. By default, a corpus is entirely file-based; optionally, it can be managed in a SQLite database.
Core Classes
Corpus (corpus.py)
The corpus module contains the main container for managing collections of documents. It provides document storage, metadata management, and inter-module communication capabilities.
Record (record.py)
The record module implements an individual document container with robust metadata and serialization capabilities.
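To make the idea of a document container with metadata and serialization concrete, here is a minimal sketch. It is not the Lexos Record class: the name `SketchRecord`, its fields, and its `to_json`/`from_json` methods are all assumptions chosen for illustration, built only on the Python standard library.

```python
import json
import uuid
from dataclasses import asdict, dataclass, field


@dataclass
class SketchRecord:
    """Hypothetical stand-in for a document record (not the Lexos API)."""

    name: str
    content: str
    # Free-form metadata; values are assumed to be JSON-serializable.
    meta: dict = field(default_factory=dict)
    # Each record gets a unique identifier unless one is supplied.
    id: str = field(default_factory=lambda: str(uuid.uuid4()))

    def to_json(self) -> str:
        """Serialize every field, including the metadata, to a JSON string."""
        return json.dumps(asdict(self), sort_keys=True)

    @classmethod
    def from_json(cls, payload: str) -> "SketchRecord":
        """Rebuild a record from the string produced by to_json()."""
        return cls(**json.loads(payload))


record = SketchRecord(name="sample", content="Some text.", meta={"author": "Anon"})
restored = SketchRecord.from_json(record.to_json())
```

Because the identifier is serialized along with the content, the restored record compares equal to the original, which is the property a round trip through storage needs.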
CorpusStats (corpus_stats.py)
The CorpusStats module provides methods for generating statistics about a corpus.
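As a rough illustration of the kind of statistics one might compute over a corpus, the sketch below counts tokens per document and summarizes them. The function name `sketch_corpus_stats`, the whitespace tokenization, and the returned keys are assumptions for this example; they are not taken from CorpusStats.

```python
from collections import Counter
from statistics import mean, pstdev


def sketch_corpus_stats(docs: dict[str, str]) -> dict:
    """Summarize token counts for a mapping of document name to text.

    Assumption: tokens are lowercased, whitespace-separated strings.
    """
    counts = {name: Counter(text.lower().split()) for name, text in docs.items()}
    totals = {name: sum(c.values()) for name, c in counts.items()}
    # The vocabulary is the union of the distinct tokens in all documents.
    vocabulary = set().union(*counts.values())
    return {
        "num_docs": len(docs),
        "tokens_per_doc": totals,
        "mean_tokens": mean(totals.values()) if totals else 0.0,
        # Population standard deviation of the per-document token counts.
        "stdev_tokens": pstdev(totals.values()) if totals else 0.0,
        "vocabulary_size": len(vocabulary),
    }


stats = sketch_corpus_stats({"a": "the cat sat", "b": "the dog"})
```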
LexosModelCache and RecordsDict (utils.py)
The utils module provides utility classes for efficient model management and type-safe record storage.
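One common way to get type-safe storage in Python is a mapping that validates keys and values on assignment. The sketch below shows that general technique only; `TypedStore` is a hypothetical class and makes no claim about how RecordsDict is implemented.

```python
from collections import UserDict


class TypedStore(UserDict):
    """Hypothetical mapping that accepts only string keys and typed values."""

    def __init__(self, value_type: type, *args, **kwargs):
        # Set the expected type first: UserDict.__init__ routes any
        # initial data through __setitem__, which needs it.
        self.value_type = value_type
        super().__init__(*args, **kwargs)

    def __setitem__(self, key, value):
        if not isinstance(key, str):
            raise TypeError(f"key must be str, got {type(key).__name__}")
        if not isinstance(value, self.value_type):
            raise TypeError(
                f"value must be {self.value_type.__name__}, "
                f"got {type(value).__name__}"
            )
        super().__setitem__(key, value)


store = TypedStore(int)
store["first"] = 1  # accepted
try:
    store["second"] = "not an int"  # rejected
    rejected = False
except TypeError:
    rejected = True
```

Subclassing `UserDict` rather than `dict` means `update()` and the constructor also pass through the validation in `__setitem__`.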
SQLite Database
Database management is implemented in two modules:
SQLiteBackend (database.py)
The database module provides the main database functionality.
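For readers unfamiliar with storing documents in SQLite, here is a minimal sketch using Python's built-in `sqlite3` module. The table layout and the helper names `open_store`, `save_record`, and `load_record` are assumptions made for this example; they do not describe the SQLiteBackend schema or API.

```python
import json
import sqlite3


def open_store(path: str = ":memory:") -> sqlite3.Connection:
    """Open a database and create the (hypothetical) records table."""
    conn = sqlite3.connect(path)
    conn.execute(
        "CREATE TABLE IF NOT EXISTS records ("
        "id TEXT PRIMARY KEY, "
        "name TEXT NOT NULL, "
        "content TEXT NOT NULL, "
        "meta TEXT NOT NULL DEFAULT '{}')"
    )
    return conn


def save_record(conn, record_id, name, content, meta=None):
    """Insert a record, or overwrite the existing one with the same id."""
    with conn:  # commits on success, rolls back on error
        conn.execute(
            "INSERT OR REPLACE INTO records (id, name, content, meta) "
            "VALUES (?, ?, ?, ?)",
            (record_id, name, content, json.dumps(meta or {})),
        )


def load_record(conn, record_id):
    """Return the record as a dict, or None if the id is unknown."""
    row = conn.execute(
        "SELECT id, name, content, meta FROM records WHERE id = ?",
        (record_id,),
    ).fetchone()
    if row is None:
        return None
    return {"id": row[0], "name": row[1], "content": row[2],
            "meta": json.loads(row[3])}


conn = open_store()
save_record(conn, "doc-1", "sample", "Some text.", {"author": "Anon"})
loaded = load_record(conn, "doc-1")
```

Metadata is stored as a JSON string so that arbitrary keys survive without schema changes; a real backend might instead use dedicated columns.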
SQLiteCorpus (integration.py)
The integration module provides the handler for integration with the main corpus API.
Corpus Analysis Report
The corpus_analysis_report module provides a helper function for generating a comprehensive analysis of the contents of a corpus instance.