Code Components

DoMD-ChemFAST is a modular platform for constructing molecular dynamics simulations. This manual details its code architecture, core module functionalities, and data management protocols.

Directory Structure

The directory structure of DoMD follows the principle of “Logic-Data-Interface” separation. Below is the core file tree of the project with functional descriptions:

DoMD/
├── domd_cgbuilder/
│   ├── HSP_predictor/
│   │   ├── models/
│   │   │   ├── __init__.py
│   │   │   ├── dDPredictor.pt
│   │   │   ├── dHPredictor.pt
│   │   │   └── dPPredictor.pt
│   │   ├── __init__.py
│   │   └── hsp_models.py
│   ├── __init__.py
│   ├── _conf_gen.py
│   ├── cg_ff.py
│   ├── cg_ff_old.py
│   └── cg_mol.py
├── domd_database/
│   ├── forcefield/
│   │   ├── gaff/
│   │   │   ├── data/
│   │   │   │   ├── amber_10.pkl
│   │   │   │   └── gaff_10.db
│   │   │   └── add_data_to_db.py
│   │   └── oplsaa/
│   │       ├── data/
│   │       │   ├── boss_bonded.sb
│   │       │   ├── ffbonded.itp
│   │       │   ├── ffnonbonded.itp
│   │       │   └── STaGE_opls_tomoltemplate_opls.txt
│   │       ├── add_data_to_db.py
│   │       └── make_db.py
│   └── __init__.py
├── domd_forcefield/
│   ├── gaff/
│   │   ├── resources/
│   │   │   └── gaff_10.db
│   │   ├── __init__.py
│   │   ├── database.py
│   │   ├── gaff.py
│   │   ├── gaff_db.py
│   │   ├── gaff_types.py
│   │   └── ml.py
│   ├── oplsaa/
│   │   ├── ml_functions/
│   │   │   ├── resources/
│   │   │   │   ├── angle_idx.pkl
│   │   │   │   ├── bond_idx.pkl
│   │   │   │   ├── di_idx.pkl
│   │   │   │   ├── idx_angle.pkl
│   │   │   │   ├── idx_bond.pkl
│   │   │   │   ├── idx_di.pkl
│   │   │   │   ├── idx_imps.pkl
│   │   │   │   ├── idx_nonbond.pkl
│   │   │   │   ├── imps_idx.pkl
│   │   │   │   ├── minAngle.pt
│   │   │   │   ├── minBond.pt
│   │   │   │   ├── minCharge.pt
│   │   │   │   ├── minDi.pt
│   │   │   │   ├── minDi_add.pt
│   │   │   │   ├── minImp.pt
│   │   │   │   ├── minNonbond.pt
│   │   │   │   └── nbtype_an_hash.pkl
│   │   │   ├── __init__.py
│   │   │   └── models.py
│   │   ├── resources/
│   │   │   ├── opls.db # This file is not included in the repository due to its size. Please refer to the Data Management section for setup instructions.
│   │   │   └── readme.md
│   │   ├── __init__.py
│   │   ├── database.py
│   │   ├── ml.py
│   │   ├── opls.py
│   │   ├── opls_db.py
│   │   └── opls_types.py
│   ├── __init__.py
│   ├── charge_model.py
│   ├── forcefield.py
│   └── functions.py
├── domd_tools/
│   ├── __init__.py
│   ├── aa_builder.py
│   ├── coarse_grain.py
│   ├── force_field.py
│   ├── gmx_output.py
│   └── manage_db.py
├── domd_topology/
│   ├── __init__.py
│   ├── _mapping.py
│   ├── functions.py
│   ├── reactor.py
│   └── reactor_old.py
├── domd_xyz/
│   ├── embed/
│   │   ├── __init__.py
│   │   ├── embed_with_cg_xyz.py
│   │   └── optimize_orientation.py
│   ├── __init__.py
│   └── embed_molecule.py
├── misc/
│   ├── io/
│   │   ├── __init__.py
│   │   ├── assemble.py
│   │   ├── gmx_reader.py
│   │   ├── gmx_writer.py
│   │   ├── xml_reader.py
│   │   └── xml_writer.py
│   ├── __init__.py
│   ├── aa_molecule.py
│   ├── cg_system.py
│   ├── draw.py
│   └── logger.py
├── polyimides_dataset/
│   ├── dbapp.zip
│   └── readme.md
├── .gitignore
├── .readthedocs.yaml
├── ChemFAST-logo.png
├── DoMD-logo.png
├── DoMDlogo-square.png
├── environment.yml
├── environment_gpu.yml
├── LICENSE
├── MAINFEST.in
├── README.md
└── setup.py

Core Modules Explained

DoMD consists of several core sub-packages that work together to turn SMILES strings into GROMACS input files.

1. domd_topology (Topology Engine)

This is the logical core of DoMD, responsible for handling chemical connectivity.

* Function: Parses SMILES/SMARTS strings and executes the S-CGFG (Stochastic Coarse-Grained Fine-Graining) algorithm.
* Key Class: Reactor. It reads reaction templates to connect disconnected monomers or coarse-grained beads into a complete All-Atom (AA) topology graph (NetworkX Graph).
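To make the idea of joining disconnected monomers into one topology graph concrete, here is a minimal sketch. It is not the Reactor implementation: plain dictionaries stand in for the NetworkX graph, the reaction template is reduced to an explicit list of new inter-monomer bonds, and the names `connect_monomers` and `is_connected` are hypothetical.

```python
def connect_monomers(monomers, bonds_between):
    """Merge per-monomer atom graphs into one all-atom graph.

    monomers: list of {"atoms": [element, ...], "bonds": [(i, j), ...]}
              with atom indices local to each monomer.
    bonds_between: list of ((mono_a, atom_a), (mono_b, atom_b)) pairs,
              standing in for the bonds a reaction template would create.
    Returns (atoms, adjacency) keyed by global atom index.
    """
    atoms, adjacency, offsets = {}, {}, []
    next_index = 0
    for mono in monomers:
        offsets.append(next_index)  # global index of this monomer's atom 0
        for local, element in enumerate(mono["atoms"]):
            atoms[next_index + local] = element
            adjacency[next_index + local] = set()
        for i, j in mono["bonds"]:
            a, b = next_index + i, next_index + j
            adjacency[a].add(b)
            adjacency[b].add(a)
        next_index += len(mono["atoms"])
    # Add the new bonds that link the monomers together.
    for (ma, ia), (mb, ib) in bonds_between:
        a, b = offsets[ma] + ia, offsets[mb] + ib
        adjacency[a].add(b)
        adjacency[b].add(a)
    return atoms, adjacency


def is_connected(adjacency):
    """Depth-first check that every atom is reachable from the first one."""
    if not adjacency:
        return True
    start = next(iter(adjacency))
    seen, stack = {start}, [start]
    while stack:
        for neighbour in adjacency[stack.pop()]:
            if neighbour not in seen:
                seen.add(neighbour)
                stack.append(neighbour)
    return len(seen) == len(adjacency)


# Two C-C fragments (heavy atoms only), joined tail-to-head.
fragment = {"atoms": ["C", "C"], "bonds": [(0, 1)]}
atoms, adjacency = connect_monomers([fragment, fragment], [((0, 1), (1, 0))])
print(len(atoms), is_connected(adjacency))
```

Keeping a per-monomer offset table is what lets local atom indices from a template be translated into indices of the merged graph.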

2. domd_forcefield (Parameterization)

This is the physical core of DoMD, responsible for assigning physical parameters to the topology graph.

* Function: Assigns atomic charges, Lennard-Jones parameters, and bonded parameters (Bond/Angle/Dihedral).
* Strategy: Uses a Hybrid Strategy. It prioritizes querying verified experimental parameters (BOSS/LigParGen) from domd_database. For unknown fragments, it utilizes built-in GAT (Graph Attention Networks) models for high-precision prediction.
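The database-first, model-second lookup described above can be sketched as follows. This is an illustration only: the table layout, the atom type labels, the numeric values, and the function names are all assumptions made for the example, and a simple callable stands in for the GAT model.

```python
import sqlite3


def make_demo_db():
    """Build a tiny in-memory table; schema and values are made up."""
    conn = sqlite3.connect(":memory:")
    conn.execute(
        "CREATE TABLE bond_params (type_a TEXT, type_b TEXT, r0 REAL, k REAL)"
    )
    # Atom types are stored in sorted order so lookups are symmetric.
    conn.execute("INSERT INTO bond_params VALUES ('CT', 'CT', 0.15, 200000.0)")
    conn.commit()
    return conn


def assign_bond(conn, type_a, type_b, predictor):
    """Return bond parameters, preferring the database over the model."""
    key = tuple(sorted((type_a, type_b)))
    row = conn.execute(
        "SELECT r0, k FROM bond_params WHERE type_a = ? AND type_b = ?", key
    ).fetchone()
    if row is not None:
        return {"r0": row[0], "k": row[1], "source": "database"}
    # Fallback: no database entry, so ask the (stand-in) ML model.
    r0, k = predictor(type_a, type_b)
    return {"r0": r0, "k": k, "source": "model"}


def dummy_predictor(type_a, type_b):
    """Placeholder for a trained model; returns fixed illustrative values."""
    return 0.14, 250000.0


conn = make_demo_db()
known = assign_bond(conn, "CT", "CT", dummy_predictor)
unknown = assign_bond(conn, "XX", "CT", dummy_predictor)
print(known["source"], unknown["source"])
```

Recording the `source` of each parameter makes it easy to audit afterwards which parts of a system were taken from the database and which were predicted.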

3. domd_xyz (Geometry Embedding)

This package generates 3D coordinates from the topology graph.

* Algorithm: Uses Fragment Embedding technology. It generates local coordinates for rigid fragments, maps them back to the positions of Coarse-Grained (CG) beads, and performs rotational optimization to eliminate steric clashes.
* Feature: Supports the large=N parameter, which uses spatial grid decomposition to accelerate the construction of macromolecules.
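Spatial grid decomposition is commonly implemented as a cell list: atoms are binned into cubic cells, and only atoms in the same or adjacent cells are compared. The sketch below shows that general technique for clash detection. How DoMD uses its grid internally is not described here, so the function name `find_clashes` and the single `cutoff` parameter are assumptions for illustration.

```python
import math
from collections import defaultdict
from itertools import product


def find_clashes(coords, cutoff):
    """Return sorted index pairs (i, j), i < j, closer than `cutoff`.

    Cells have edge length `cutoff`, so any pair within the cutoff must
    sit in the same cell or in one of the 26 neighbouring cells.
    """
    cells = defaultdict(list)
    for index, (x, y, z) in enumerate(coords):
        cell = (
            math.floor(x / cutoff),
            math.floor(y / cutoff),
            math.floor(z / cutoff),
        )
        cells[cell].append(index)

    clashes = set()
    for (cx, cy, cz), members in cells.items():
        for dx, dy, dz in product((-1, 0, 1), repeat=3):
            # .get() avoids creating empty cells while iterating.
            neighbours = cells.get((cx + dx, cy + dy, cz + dz), ())
            for i in members:
                for j in neighbours:
                    if i < j and math.dist(coords[i], coords[j]) < cutoff:
                        clashes.add((i, j))
    return sorted(clashes)


points = [
    (0.0, 0.0, 0.0),
    (0.05, 0.0, 0.0),
    (1.0, 1.0, 1.0),
    (1.02, 1.0, 1.0),
    (5.0, 5.0, 5.0),
]
print(find_clashes(points, cutoff=0.1))
```

For roughly uniform densities this reduces the pair search from quadratic to close to linear in the number of atoms, which is why grid-based searches pay off for large macromolecules.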

4. domd_cgbuilder (Coarse-Graining)

This package handles bottom-up coarse-grained modeling.

* Function: Predicts Hansen Solubility Parameters (HSP) based on chemical structure to derive interaction potentials (\(\epsilon\)) between CG beads. It also handles the definition of Rigid Bodies.
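As background for the HSP-based approach, the standard Hansen distance between two sets of solubility parameters (dispersion, polar, hydrogen-bonding) is Ra = sqrt(4(dD1 - dD2)^2 + (dP1 - dP2)^2 + (dH1 - dH2)^2). The sketch below computes only this distance. How DoMD converts predicted HSP values into \(\epsilon\) is not specified in this manual, so no such mapping is shown, and the function name and example values are hypothetical.

```python
import math


def hansen_distance(hsp_a, hsp_b):
    """Hansen distance Ra between two (dD, dP, dH) triples.

    The dispersion term carries the conventional factor of 4.
    """
    d_disp = hsp_a[0] - hsp_b[0]
    d_polar = hsp_a[1] - hsp_b[1]
    d_hbond = hsp_a[2] - hsp_b[2]
    return math.sqrt(4.0 * d_disp**2 + d_polar**2 + d_hbond**2)


# Illustrative (dD, dP, dH) values for two bead types, not measured data.
bead_a = (17.0, 3.0, 4.0)
bead_b = (17.0, 0.0, 0.0)
print(hansen_distance(bead_a, bead_b))
```

A small Ra indicates chemically similar beads, so any scheme that derives bead-bead interactions from HSP would be expected to treat such pairs as more compatible.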

5. domd_tools (User Interface)

This layer is for direct user interaction. It is recommended to import functions directly from this module when writing scripts, rather than calling low-level modules.

* create_cg_system: One-click generation of coarse-grained simulation files.
* build_aa_topology: Executes Backmapping.
* assign_ff_parameters: Automated force field assignment.
* get_gmx: Exports final simulation files.

Data Management

DoMD relies on extensive force field database files, which are typically not included in the Git repository due to their size.

OPLS Database Setup: Users must ensure that the opls.db file is placed in the correct path; otherwise, the force field engine will not function:

# Correct Path
domd_forcefield/oplsaa/resources/opls.db

This database contains pre-calculated parameters for millions of molecules and serves as the foundation of the Hybrid strategy.
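A short pre-flight check can catch a missing database before a long run starts. The sketch below is one possible way to do this and is not part of DoMD: the helper name `check_database` is hypothetical, and the relative path is passed in as an argument so that it can be set to the location given above.

```python
from pathlib import Path


def check_database(repo_root, relative_path):
    """Return the resolved database path, or raise if it is unusable."""
    db_path = Path(repo_root) / relative_path
    if not db_path.is_file():
        raise FileNotFoundError(f"Force field database not found: {db_path}")
    if db_path.stat().st_size == 0:
        # An empty file usually points to an interrupted download or copy.
        raise ValueError(f"Force field database is empty: {db_path}")
    return db_path
```

Calling this helper at the top of a script, with the repository root and the database's relative path, turns a confusing failure deep inside the force field engine into an immediate and readable error.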

Workflow Integration

The modular design of DoMD allows for flexible pipeline integration. A typical data flow is as follows:

  1. Input: SMILES + Reaction Template

  2. CGBuilder: domd_cgbuilder \(\rightarrow\) .xml (for HOOMD/GALAMOST)

  3. Simulation: (External MD Engine) \(\rightarrow\) Relaxed Configuration

  4. Backmapping: domd_topology + domd_xyz \(\rightarrow\) AA Graph + Coords

  5. Typing: domd_forcefield \(\rightarrow\) Parameterized System

  6. Output: misc.io \(\rightarrow\) .gro / .top