DFR Browser 2
Overview
DFR Browser 2 is a web-based topic modelling browser that provides interactive visualizations and analysis tools for exploring topic models generated by MALLET. DFR Browser 2 is based on Andrew Goldstone's original dfr-browser. It reproduces all the major functionality of the original, but with an entirely new architecture and additional features. For full documentation, see the DFR Browser 2 repository.
The dfr_browser2 module provides a small helper class, Browser, that automates the steps required to prepare and open a DFR Browser 2 distribution. This helper is designed to be used programmatically from Python and can produce a small static browser bundle. It performs the following functions:
- Validates that all required MALLET output files exist
- Auto-generates topic_coords.csv from topic-state.gz if the coordinates file is missing
- Copies a template DFR Browser 2 folder into a working browser folder
- Copies all MALLET output and metadata files into the browser's data/ folder, along with optional files (diagnostics.xml) if present
- Copies an optional file containing the documents used to generate the topic model
- Manages configuration settings for the browser
- Checks port availability before starting the server
- Starts a simple HTTP server to serve the browser and opens it in a web browser
If you have not yet generated a MALLET topic model, it is recommended that you start with the MALLET tutorial.
Browser Class
To create a simple instance of a browser, supply a path to the directory where your MALLET files are located and, optionally, a path to the directory where you want to save the browser:
```python
from lexos.topic_modeling.dfr_browser2 import Browser

b = Browser(
    mallet_files_path="/path/to/mallet_files",
    browser_path="/tmp/dfr_browser_output",  # optional (temporary folder created if omitted)
)
b.serve()
```
Calling the serve() method starts a localhost server and opens the browser in your system's default web browser. The server runs in a subprocess, so you can continue working in your Python session while the browser is running. You can also start the server from the command line by running the server.py script in your DFR Browser 2 folder.
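This section does not document how serve() launches its server. The general idea of running a static file server in a child process can be sketched with the standard library alone. This is a minimal sketch that uses Python's built-in http.server module as the server. The names start_static_server, wait_until_up and stop_static_server are hypothetical and are not part of the Lexos API.

```python
import subprocess
import sys
import time
import urllib.request
from pathlib import Path


def start_static_server(folder: Path, port: int = 8000) -> subprocess.Popen:
    """Hypothetical helper: serve `folder` on localhost from a child process."""
    return subprocess.Popen(
        [sys.executable, "-m", "http.server", str(port),
         "--bind", "127.0.0.1", "--directory", str(folder)],
        stdout=subprocess.DEVNULL,
        stderr=subprocess.DEVNULL,
    )


def wait_until_up(port: int, timeout: float = 10.0) -> bool:
    """Poll the server until it answers or the timeout expires."""
    deadline = time.monotonic() + timeout
    while time.monotonic() < deadline:
        try:
            with urllib.request.urlopen(f"http://127.0.0.1:{port}/", timeout=1):
                return True
        except OSError:  # URLError and connection errors are OSError subclasses
            time.sleep(0.2)
    return False


def stop_static_server(proc: subprocess.Popen) -> None:
    """Terminate the child process and wait for it to exit."""
    proc.terminate()
    proc.wait(timeout=5)
```

Because the server runs in a separate process, the calling session stays free. A call such as webbrowser.open(f"http://localhost:{port}/") would then open the page in the default browser.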

Note
DFR Browser 2 must be served from a server; otherwise, it will not have full functionality. The serve() method checks whether the specified port is available before starting the server. If the port is already in use, you will receive an error message with instructions for finding and terminating the conflicting process. Alternatively, you can specify a different port using the port parameter. The default port is 8000, which may conflict with Jupyter notebooks or other local servers.
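One common way to check whether a port is free is to try binding a socket to it. The helper below is a hypothetical sketch of such a check, not the code serve() uses. It only tests the loopback address, so a server bound to some other specific interface might not be detected.

```python
import socket


def port_is_available(port: int, host: str = "127.0.0.1") -> bool:
    """Hypothetical helper: return True if a TCP socket can bind to (host, port).

    A failed bind (for example "address already in use") means the port is taken.
    """
    with socket.socket(socket.AF_INET, socket.SOCK_STREAM) as sock:
        try:
            sock.bind((host, port))
        except OSError:
            return False
    return True
```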
Tip
If you're running this in a Jupyter notebook, the browser will open in a new tab. You can stop the server by calling b.stop_server() or by restarting the kernel.
The example below demonstrates a fuller set of options:
```python
from lexos.topic_modeling.dfr_browser2 import Browser

b = Browser(
    mallet_files_path="/path/to/mallet_files",
    browser_path="/tmp/dfr_browser_output",  # optional (temporary folder created if omitted)
    data_path="/path/to/docs.txt",  # optional
    template_path="/path/to/dfr_browser2/template",
    filename_map={"doc-topics.txt": "doc-topic.txt"},
    config={"application": {"name": "My Browser"}},
    port=5000,
)
```
The data_path parameter is the path to your original training data file. It should be a tab-separated file with 2 columns per line (ID, content) or 3 columns per line (ID, label, content). The lines are validated during initialization, and the file is copied to data/docs.txt.
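If your documents are already held in Python, you can write a file in this layout with a few lines of code. This is a minimal sketch, and write_docs_tsv is a hypothetical helper. It collapses internal whitespace (including tabs and line breaks) to single spaces. That choice is an assumption, made so that each document stays on one line with the expected number of columns.

```python
from pathlib import Path


def write_docs_tsv(rows, path):
    """Hypothetical helper: write (ID, content) or (ID, label, content) rows as TSV."""
    rows = [tuple(row) for row in rows]
    # Check every row first so that a bad row does not leave a half-written file.
    for row in rows:
        if len(row) not in (2, 3):
            raise ValueError(f"expected 2 or 3 fields, got {len(row)}: {row!r}")
    path = Path(path)
    with path.open("w", encoding="utf-8", newline="\n") as fh:
        for row in rows:
            # Collapse tabs, newlines, and runs of spaces so each document stays on one line.
            fields = [" ".join(str(field).split()) for field in row]
            fh.write("\t".join(fields) + "\n")
    return path
```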
Because DFR Browser 2 is a separate package, the copy bundled with Lexos may not be the latest distribution. If so, you can download the latest version from the DFR Browser 2 repository and set template_path to point to it.
The Browser class assumes canonical names like doc-topic.txt for your input files. You can rename your files to the canonical names or map the current names of your files to the canonical ones with filename_map. See below for further information on using this parameter.
Every instance of DFR Browser 2 has a config.json file containing the browser configuration. You can pass configuration values as a dictionary through the Browser class's config parameter, and they will be merged into this file. For a discussion of the configuration options, see the DFR Browser 2 repository.
As mentioned earlier, you can also set the port on which the browser is served with the port parameter.
Typical Usage Flow
- Construct the Browser object. Initialization checks the required files, auto-generates missing files (like topic_coords.csv), copies the template, copies files into data/, validates TSVs, and writes config.json.
- Start serving using b.serve(). This checks port availability, runs an HTTP server in a subprocess, and opens the browser in your default web browser (or prints instructions if a web browser cannot be opened).
- Continue working, or stop the server using b.stop_server() when done.
Required MALLET Files and Alternate Filenames
The Browser checks a small set of required files in mallet_files_path and supports some alternate names. Canonical names are:
- metadata.csv (required)
- topic-keys.txt (required)
- doc-topic.txt (or doc-topics.txt) (required): synonyms for the document-to-topic mapping
- topic-state.gz (or state.gz) (required): state.gz is supported as an alternate name
- topic_coords.csv (or topic-coords.csv) (optional): coordinates for topic display; auto-generated from topic-state.gz if missing
- diagnostics.xml (optional): MALLET diagnostics file, copied if present
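The lookup of canonical and alternate names can be sketched as a table of accepted names for each canonical file. find_mallet_files is a hypothetical helper, not the Browser's own code. The order in which alternates are tried is also an assumption.

```python
from pathlib import Path

# Canonical name -> accepted names, following the list above.
REQUIRED = {
    "metadata.csv": ["metadata.csv"],
    "topic-keys.txt": ["topic-keys.txt"],
    "doc-topic.txt": ["doc-topic.txt", "doc-topics.txt"],
    "topic-state.gz": ["topic-state.gz", "state.gz"],
}
OPTIONAL = {
    "topic_coords.csv": ["topic_coords.csv", "topic-coords.csv"],
    "diagnostics.xml": ["diagnostics.xml"],
}


def find_mallet_files(folder):
    """Hypothetical helper: map each canonical name to the first matching file.

    Raises FileNotFoundError listing every required file with no accepted name
    present; missing optional files are simply left out of the result.
    """
    folder = Path(folder)
    found, missing = {}, []
    for table, required in ((REQUIRED, True), (OPTIONAL, False)):
        for canonical, candidates in table.items():
            match = next(
                (folder / name for name in candidates if (folder / name).is_file()),
                None,
            )
            if match is not None:
                found[canonical] = match
            elif required:
                missing.append(canonical)
    if missing:
        raise FileNotFoundError(f"Missing required mallet files: {', '.join(missing)}")
    return found
```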
Behavior notes:
- Auto-generation: If topic_coords.csv is missing, the Browser automatically generates it from topic-state.gz using Jensen-Shannon divergence and multidimensional scaling (MDS). This ensures that the browser has topic coordinates for visualization even if they were not pre-computed.
- If you supply a filename_map, keys are treated as the original filenames found in mallet_files_path, and values are the destination names to use inside data/.
- filename_map is flexible: if you reverse the mapping (i.e., use the destination as the key and the source as the value), Browser attempts to detect and correct the mapping.
- For canonicalization, when both doc-topic.txt and doc-topics.txt are present, Browser deduplicates them and selects the canonical destination doc-topic.txt.
- Optional files like diagnostics.xml are automatically copied if they exist in the MALLET output directory.
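The auto-generation step names two techniques: Jensen-Shannon divergence between topics, and multidimensional scaling of the resulting distance matrix. The sketch below shows one way to combine them with NumPy. It makes three assumptions that are not documented for Browser: MALLET's standard topic-state.gz layout, a small smoothing pseudo-count, and classical (Torgerson) MDS as the scaling variant. It returns coordinates only. The column layout of topic_coords.csv is not described here, so writing the file is left out. All function names are hypothetical.

```python
import gzip
from collections import defaultdict

import numpy as np


def topic_word_matrix(state_path, smoothing=0.01):
    """Hypothetical helper: per-topic word distributions from topic-state.gz.

    Assumes '#'-prefixed header lines, then one token per line as
    'doc source pos typeindex type topic'. Fields are read from the right
    so a source path containing spaces does not shift them.
    """
    counts = defaultdict(lambda: defaultdict(int))
    n_types = 0
    with gzip.open(state_path, "rt", encoding="utf-8") as fh:
        for line in fh:
            if line.startswith("#") or not line.strip():
                continue
            fields = line.split()
            type_index, topic = int(fields[-3]), int(fields[-1])
            counts[topic][type_index] += 1
            n_types = max(n_types, type_index + 1)
    if not counts:
        raise ValueError("no token lines found in the state file")
    # A small pseudo-count keeps every probability positive, so the logs stay finite.
    matrix = np.full((max(counts) + 1, n_types), smoothing)
    for topic, row in counts.items():
        for type_index, n in row.items():
            matrix[topic, type_index] += n
    return matrix / matrix.sum(axis=1, keepdims=True)


def jsd_matrix(p):
    """Pairwise Jensen-Shannon divergence (base 2, so values lie in [0, 1])."""
    n = p.shape[0]
    d = np.zeros((n, n))
    for i in range(n):
        for j in range(i + 1, n):
            m = 0.5 * (p[i] + p[j])
            kl_im = np.sum(p[i] * np.log2(p[i] / m))
            kl_jm = np.sum(p[j] * np.log2(p[j] / m))
            d[i, j] = d[j, i] = 0.5 * (kl_im + kl_jm)
    return d


def classical_mds(d, k=2):
    """Classical (Torgerson) MDS: k-dimensional coordinates from a distance matrix."""
    n = d.shape[0]
    centering = np.eye(n) - np.ones((n, n)) / n
    b = -0.5 * centering @ (d ** 2) @ centering
    eigvals, eigvecs = np.linalg.eigh(b)  # eigenvalues in ascending order
    top = np.argsort(eigvals)[::-1][:k]  # indices of the k largest
    return eigvecs[:, top] * np.sqrt(np.maximum(eigvals[top], 0.0))


def topic_coordinates(state_path):
    """One (x, y) row per topic."""
    return classical_mds(jsd_matrix(topic_word_matrix(state_path)))
```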
Example: filename_map
- Standard mapping, when you want doc-topics.txt to be called doc-topic.txt in data/: filename_map={"doc-topics.txt": "doc-topic.txt"}
- You can also specify a partial map that renames only some files.
- If your mapping is reversed (the key is the destination), Browser will attempt to handle it by checking whether the key or the value exists in mallet_files_path and swapping the mapping as necessary.
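The mapping rules above can be expressed as a small resolution function. resolve_filename_map is hypothetical. Two of its behaviors are assumptions: it raises an error when neither name exists, and it leaves unmapped files under their own names. The sketch does not handle deduplication of doc-topic.txt and doc-topics.txt.

```python
def resolve_filename_map(available, filename_map):
    """Hypothetical helper: turn a user filename_map into {source: destination}.

    `available` is the collection of file names present in mallet_files_path.
    A pair whose key exists is read as source -> destination; if only the
    value exists, the pair is treated as reversed and swapped.
    """
    available = set(available)
    resolved = {}
    for key, value in filename_map.items():
        if key in available:
            resolved[key] = value
        elif value in available:
            resolved[value] = key  # reversed mapping detected
        else:
            raise FileNotFoundError(f"Neither {key!r} nor {value!r} is in the MALLET folder")
    # Files not mentioned in the map keep their own names (assumption of this sketch).
    for name in available:
        resolved.setdefault(name, name)
    return resolved
```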
data_path TSV Validation
If data_path is provided, Browser ensures that it is a non-directory file and validates each row. Each non-empty row must contain exactly 2 or 3 tab-separated columns. If any row fails this validation, a ValueError is raised.
Browser will copy the data file to data/docs.txt and update the config accordingly.
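The validation rule can be sketched as follows. validate_docs_tsv is a hypothetical helper. Raising ValueError for a directory path is an assumption, since this section only says that the path must be a non-directory file.

```python
from pathlib import Path


def validate_docs_tsv(path):
    """Hypothetical helper: check every non-empty line has 2 or 3 tab-separated columns.

    Returns the number of non-empty rows checked; raises ValueError naming the
    first offending line.
    """
    path = Path(path)
    if path.is_dir():
        raise ValueError(f"{path} is a directory, not a file")  # assumed error type
    n_rows = 0
    with path.open(encoding="utf-8") as fh:
        for line_no, line in enumerate(fh, start=1):
            line = line.rstrip("\r\n")
            if not line.strip():
                continue  # empty rows are skipped
            n_cols = len(line.split("\t"))
            if n_cols not in (2, 3):
                raise ValueError(
                    f"{path}, line {line_no}: expected 2 or 3 columns, found {n_cols}"
                )
            n_rows += 1
    return n_rows
```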
Merging config.json
Browser reads the template config.json (if it exists) and then merges the config provided by the caller. The merge rules are:
- User-supplied config keys are preserved and take precedence: Browser will not overwrite keys in self.config.
- Browser sets *_file values (e.g., doc_topic_file, topic_keys_file, topic_state_file) to the files that were copied into data/, but only when the user did not specify those keys in self.config.
- Template values are used only where they are not overridden by the user or the copying process.
The merged config.json is written to browser_path/config.json, and the merged config is saved back to b.config so that it is available in memory. Assigning b.config later (either with b.config = {...} or by calling config_browser()) also writes the new config to disk.
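The precedence described above (user config over copied file paths over template) can be sketched as two successive dictionary merges. deep_merge and merge_config are hypothetical helpers. Two details are assumptions: nested dictionaries are merged recursively, and *_file keys sit at the top level of config.json.

```python
import copy


def deep_merge(base, override):
    """Return a new dict in which `override` wins and nested dicts merge recursively."""
    merged = copy.deepcopy(base)
    for key, value in override.items():
        if isinstance(value, dict) and isinstance(merged.get(key), dict):
            merged[key] = deep_merge(merged[key], value)
        else:
            merged[key] = copy.deepcopy(value)
    return merged


def merge_config(template, copied_files, user):
    """Hypothetical helper: precedence is user > copied files > template.

    `copied_files` maps keys such as "doc_topic_file" to paths inside data/.
    """
    with_files = deep_merge(template, copied_files)
    return deep_merge(with_files, user)
```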
To explicitly update the config programmatically after the Browser instance has been initialized, assign a new dictionary to b.config or call config_browser(). Either approach sets b.config, which triggers a write to the config.json file.
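A Python property setter is what lets an assignment also write to disk. The class below is a hypothetical stand-in, not the Browser implementation, and shows the pattern in isolation. The real setter may also merge values rather than replace them.

```python
import json
from pathlib import Path


class ConfigBackedFolder:
    """Hypothetical class: assigning .config also writes folder/config.json."""

    def __init__(self, folder):
        self.folder = Path(folder)
        self._config = {}

    @property
    def config(self):
        return self._config

    @config.setter
    def config(self, value):
        # Keep the new dict in memory, then persist it in the same step.
        self._config = dict(value)
        (self.folder / "config.json").write_text(
            json.dumps(self._config, indent=2), encoding="utf-8"
        )
```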
Version Behavior
The Browser.version property returns the DFR Browser 2 version number. The Browser class has a class-level BROWSER_VERSION attribute that serves as the default. If the template or the user-specified config contains application.version, the property returns that value; otherwise, it returns BROWSER_VERSION.
The property is intentionally defensive — if the config.json file is malformed, it will simply fall back to the default BROWSER_VERSION.
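The fallback behavior can be sketched as a function that treats any read or parse problem as a reason to return the default. read_browser_version and DEFAULT_VERSION are hypothetical. The placeholder string stands in for the real BROWSER_VERSION value, which is not given here.

```python
import json
from pathlib import Path

DEFAULT_VERSION = "0.0.0"  # hypothetical placeholder for Browser.BROWSER_VERSION


def read_browser_version(config_path, default=DEFAULT_VERSION):
    """Hypothetical helper: application.version from config.json, else `default`.

    A missing file, invalid JSON, an unexpected structure, or a missing key
    all fall back to the default instead of raising.
    """
    try:
        config = json.loads(Path(config_path).read_text(encoding="utf-8"))
        version = config["application"]["version"]
    except (OSError, ValueError, KeyError, TypeError):
        return default
    return str(version) if version else default
```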
Troubleshooting & Tips
- If you get FileNotFoundError: Missing required mallet files, check mallet_files_path for the expected files and confirm their names, or provide a filename_map to rename or canonicalize the source files.
- If you get RuntimeError: Server failed to start with a port conflict message, either use a different port by passing port=XXXX to serve(), or follow the instructions in the error message to terminate the conflicting process.
- If topic_coords.csv is missing, it will be automatically generated from topic-state.gz. This may take a few seconds for large topic models.
- If a config.json key is missing or incorrect, check whether you set that key in config (user-specified values override template values) or whether Browser wrote the path of a copied data file into the merged config.json.
- Use Browser.BROWSER_VERSION or b.version to inspect or assert the configured version.
- To stop a running server, call b.stop_server() or interrupt the Python process.