Wrapped functions

The domd_tools module serves as the high-level “Chemical Compiler” interface for ChemFAST. It encapsulates the complex S-CGFG algorithms, Force Field assignment engines, and I/O exporters into four primary stages: Coarse-Graining, Backmapping, Parameterization, and Exporting.

The main APIs for each stage are:

  • Coarse-Grained System Generation: create_cg_system

  • CG Parameter Prediction: get_cg_ff

  • All-Atom Backmapping: build_aa_topology

  • Force Field Assignment: assign_ff_parameters

  • System Export: get_gmx

This document first explains the theoretical logic and API usage for each stage, followed by a complete end-to-end example.

Part 1: The Compilation Pipeline & wrapped functions

1. Chemical Definition Logic

Before utilizing the API, the system’s chemical composition and reaction logic must be defined using standard human-readable descriptors:

  • SMILES: Defines the atomic connectivity and structure of individual monomers.

  • SMARTS: Defines the reaction rules for polymerization or crosslinking. Correct atom map indices within SMARTS (e.g., [#7H2:1]) are crucial for defining regio-specificity (e.g., ensuring Head-to-Tail connectivity).

2. Coarse-Grained System Generation

To generate a relaxed, pre-equilibrated topology, ChemFAST maps the SMILES/SMARTS into a Coarse-Grained (CG) representation. The force field parameters for these CG beads are derived automatically using the Hansen Solubility Parameter (HSP) predictor.

Wrapped function: ``create_cg_system``

create_cg_system(molecules, reaction_template, cg_config, rc=1.0, rho=0.85, boxl=None, program='hoomd', filename='chemfast_cg')
  • molecules (dict): Dictionary defining monomers (SMILES).

  • reaction_template (dict): SMARTS rules defining how monomers connect.

  • cg_config (list[dict]): The structural blueprint (e.g., number of chains, topology types, rigidity).

  • rc (float, default: 1.0): Cutoff radius for initial geometric embedding.

  • rho (float, default: 0.85): Target number density. ChemFAST calculates the box size automatically based on this density and the total number of generated beads.

  • boxl (float, default: None): Explicit box length. If provided, overrides rho.

  • program (str, default: 'hoomd'): Target engine for the CG output XML.

  • filename (str, default: 'chemfast_cg'): Generates output files like out_chemfast_cg.xml.

Note

Standalone Parameter Prediction: If you only want to predict the CG force field parameters without building a full simulation box, you can bypass create_cg_system and use the HSP predictor directly. This is extremely useful for rapidly deriving parameters for novel monomers.

Example: Call get_cg_ff(molecules, reaction_template, iterations=3) from domd_cgbuilder.cg_ff to extract bonded and non-bonded parameters.

import json
from domd_cgbuilder.cg_ff import get_cg_ff

def predict_cg_params(molecules, reaction_template, iterations=3):
    graphs, prod_mols, stats, all_bonded, pairs, rigid_meta = get_cg_ff(molecules,reaction_template,iterations=iterations, create_epsilon=True)

    # Extract and format parameters
    nonBondPara = {k:stats[k] for k in stats if '-' not in k}
    BondPara = {}
    AngPara = {}
    for k in stats:
        if '-' in k:
            if len(k.split('-')) == 2:
                BondPara[k] = stats[k]
            if len(k.split('-')) == 3:
                AngPara[k] = stats[k]
    for k in pairs:
        stats[str(k)] = pairs[k]

    # Export to JSON
    json_str = json.dumps(stats, indent=4)
    with open('cg_parameters.txt', 'w', encoding='utf-8') as f:
        f.write(json_str)
    return

# Execute prediction
mols = {
    'A': {'smiles': 'Nc1ccc(Oc2ccc(N)cc2)cc1', 'file': None},
    'B': {'smiles': 'O1C(=O)c2cc(Oc3cc4C(=O)OC(=O)c4cc3)ccc2C1=O', 'file': None},
}
reaction_template = {
    'B-A': {
        'cg_reactant_list': [('A', 'B')],
        'smarts': '[#7H2:1].[#6:3](=[#8:4])[#8:2][#6:5]=[#8:6]>>[#6:3](=[#8:4])[#7:1][#6:5]=[#8:6].[#8:2]',
        'prod_idx': [0]
    }
}
predict_cg_params(mols, reaction_template)
# Outputs 'cg_parameters.txt' containing the HSP-derived bonded and non-bonded parameters.

3. All-Atom Backmapping (CG to AA)

This stage reconstructs the all-atom details from an equilibrated CG trajectory using geometric embedding algorithms and chemical SMARTS rules.

Wrapped function: ``build_aa_topology``

build_aa_topology(mols, reaction_template, xml, reactions=[], large=500, chunks_per_d=1, parallel=False)
  • mols & reaction_template: The atomic structures and reaction rules that will replace the CG beads.

  • xml (str): Path to the relaxed CG configuration file (e.g., final.xml).

  • reactions (list, default: []): Specific reaction tuples. If left empty, ChemFAST automatically infers connectivity directly from the bond section of the XML.

  • large (int, default: 500): Atom count threshold. If a polymer network exceeds this number of atoms, ChemFAST switches to a grid-based embedding strategy to ensure stability and avoid spatial overlaps.

  • chunks_per_d (int, default: 1): Grid subdivision parameter for embedding large molecules.

  • parallel (bool, default: False): If set to True, utilizes multiprocessing to generate AA coordinates in parallel. Highly recommended for systems with thousands of chains.

Note

Advanced Usage: Standalone Topology Building and Backmapping

Under the hood, the build_aa_topology function executes a two-phase pipeline: first, it logically constructs the full all-atom connectivity (Topology Building), and second, it folds these structures into the 3D space defined by the CG beads (Backmapping).

If you require granular control—for instance, if you only want to generate the 2D polymer structures without 3D coordinates, or if you want to custom-embed a single specific polymer chain for testing—you can decouple these steps and call the underlying modules directly.

Phase 1: Standalone Topology Building (Constructing the 2D Graph) You can use the Reactor class to chemically link the monomers based on the CG connectivity matrix, bypassing any 3D embedding. This is highly useful for generating massive libraries of polymer SMILES or topological graphs.

from domd_topology.reactor import Reactor
from misc.cg_system import read_cg_topology
from misc.io.xml_reader import XmlParser
from domd_topology.functions import set_molecule_id_for_h
from rdkit import Chem
from misc.logger import logger
logger.setLevel('INFO')

reaction_template = ...
mols = ...

# 1. Parse the CG simulation output
xml = XmlParser('final.xml')
cg_sys, cg_mols = read_cg_topology(xml, mols)
print(cg_mols[0].edges(data=True))

# 2. Extract bond connections to act as reaction triggers
reactions = [(bond[0], bond[1], bond[2]) for bond in xml.data['bond']]

# 3. Initialize the Reactor and synthesize the 2D all-atom structures
reactor = Reactor(mols, reaction_template)
aa_mols, meta = reactor.process(cg_mols, reactions)

# 4. Standardize the RDKit molecules (add hydrogens for full valency)
[Chem.SanitizeMol(_) for _ in aa_mols]
aa_mols_h = [Chem.AddHs(m) for m in aa_mols]
[set_molecule_id_for_h(m) for m in aa_mols_h]

# Output: aa_mols_h now contains the fully connected 2D polymer structures!

Phase 2: Standalone Backmapping (Embedding into 3D Space) Once you have the synthesized 2D RDKit molecule (aa_mol_h) and its corresponding 3D CG scaffold (cg_mol), you can embed the all-atom structure into the simulation box independently using the embed_molecule function.

from domd_xyz.embed_molecule import embed_molecule
import numpy as np

# Define the simulation box boundaries
# the box information can be extracted from the XML or defined based on your system's requirements
# box = np.array([xml.box.lx, xml.box.ly, xml.box.lz]).astype(float) *10  # Extracted from XML, usually in nm, convert to Angstroms
box = np.array([100.0, 100.0, 100.0]) # Angstroms, adjust as needed based on your CG system

# Select a specific chain (e.g., the first molecule in the system)
target_aa_mol = aa_mols_h[0]
target_cg_mol = cg_mols[0]

# Embed the 2D molecule into the 3D CG beads
# 'large' triggers grid-decomposition for massive chains; 'chunk_per_d' defines grid resolution
conf = embed_molecule(target_aa_mol, target_cg_mol, box=box, large=500, chunk_per_d=1)

# Extract the generated 3D coordinates
positions = conf.GetPositions()

# Output: 'positions' is a NumPy array containing the exact 3D coordinates of the embedded atoms.

4. Force Field Assignment

The assignment engine processes the reconstructed all-atom graphs to allocate charges, bonds, angles, and dihedrals.

Wrapped function: ``assign_ff_parameters``

assign_ff_parameters(molecules, strategies, custom_rules=[], parallel=False)
  • molecules (list): The NetworkX graphs returned by the backmapping step.

  • strategies (list[str]): Specifies the parameterization strategy for each molecule. Options include:

    • 'hybrid' (default): A robust method utilizing the BOSS database, GMX template rules, and ML models. The assignment logic follows a strict ML -> GMX -> DB priority sequence. If a parameter is predicted/found in a higher priority method, it safely overwrites the lower one.

    • 'ml': Pure Graph Attention Network (GAT) ML model.

    • 'db': Pure BOSS database lookup.

    • 'tpl': Pure GMX rules.

  • custom_rules (list[str], default: ['none']): Provides fine-grained control over interaction parameterization. You can pass specific rule flags to override the default assignment logic:

    • 'none' (default): Applies the standard strategy logic without custom overrides.

    • 'all': Forces all parameters (both bonded and non-bonded) to be generated by the ML model, regardless of the overarching strategy chosen.

    • 'bonded': Forces only the bonded parameters (bonds, angles, dihedrals) to be generated by the ML model, leaving non-bonded parameters to the standard hierarchy.

  • parallel (bool, default: False): Distributes parameter assignment across CPU cores.

5. System Export

Assembles the global geometry, connectivity, and force field data into formats ready for MD engines.

Wrapped function: ``get_gmx``

get_gmx(molecules, confs, ffs, box, filename='chemfast', crossbk_scale=1.0, crossak_scale=1.0, charge_scale=1.0)
  • molecules, confs, ffs, box: The accumulated outputs from all previous steps (topology graphs, coordinates, parameters, and box dimensions).

  • filename (str, default: 'chemfast'): Output prefix (generates .gro, .top, and .xml).

  • crossbk_scale, crossak_scale, charge_scale (float, default: 1.0): Scaling factors for bonds, angles, and charges that span across different CG beads.

Note

This scaling protocol (typically set to < 1.0) is specifically designed to preserve local stereochemistry (such as chirality and cis/trans isomerism) during Energy Minimization (EM). By temporarily softening these boundary interactions, it prevents the system from experiencing unphysically high energy spikes that could trigger impossible conformational inversions. The charge_scale factor is used with polyelectrolytes, usually charge_scale=0.5-0.8.

Part 2: End-to-End Example (Polyimide)

To demonstrate how these APIs are chained together in a real-world scenario, we walk through the polyimides.py example script. This script utilizes the compiler to build a cross-linked polyimide network.

The Complete Script

Here is the complete code for the end-to-end workflow. You can copy and run this directly if you have the required pre-equilibrated XML file.

from domd_tools import create_cg_system, build_aa_topology, assign_ff_parameters, get_gmx
from misc.logger import logger

logger.setLevel('INFO')

# 1. Chemical Definition
smiA = 'Nc1ccc(Oc2ccc(N)cc2)cc1'
smiB = 'O1C(=O)c2cc(Oc3cc4C(=O)OC(=O)c4cc3)ccc2C1=O'

mols = {
    'A': {'smiles': smiA, 'file': None},
    'B': {'smiles': smiB, 'file': None},
}

reaction_template = {
    'B-A': {
        'cg_reactant_list': [('A', 'B')],
        'smarts': '[#7H2:1].[#6:3](=[#8:4])[#8:2][#6:5]=[#8:6]>>[#6:3](=[#8:4])[#7:1][#6:5]=[#8:6].[#8:2]',
        'prod_idx': [0]
    }
}

# 2. CG Pre-equilibrium (Optional if pre-equilibrated XML is provided)
"""
meta1 = {'type': ['A','B']*10, 'bond': '0-1,1-2...', 'N': 20, 'is_rigid': False, 'is_angle': True}
meta2 = {'type': ['A','B']*5, 'bond': '0-1,1-2...', 'N': 20, 'is_rigid': False, 'is_angle': True}
create_cg_system(mols, reaction_template, [meta1, meta2])
"""

# We use a pre-equilibrated CG configuration for this example
xmlfile = 'CG_pre_eq/final.xml'

# 3. AA Backmapping
confs, mol_graphs, box = build_aa_topology(mols, reaction_template, xmlfile)

# 4. Force Field Assignment
ffs = assign_ff_parameters(mol_graphs, strategies=['ml'] * len(mol_graphs))

# 5. GROMACS Output
get_gmx(mol_graphs, confs, ffs, box, filename='chemfast')

Step-by-Step Breakdown

Let’s break down what is happening at each stage of the script above.

Step 1: Chemical Definition We start by feeding the script the SMILES of our two constituent monomers (A and B). The reaction_template uses SMARTS to tell the compiler exactly how an amine group on monomer A reacts with an anhydride group on monomer B.

Step 2: CG Pre-equilibrium If starting from scratch, we would define topological blueprints (meta1, meta2) and call create_cg_system to generate a random walk box and predicted HSP parameters. To save computational time in this tutorial, we bypass this and directly load a pre-equilibrated trajectory (CG_pre_eq/final.xml).

Step 3: All-Atom Backmapping The build_aa_topology() function parses the final.xml box. It reads the coarse-grained positions, applies our SMARTS reaction rules to link the monomers, and embeds the all-atom geometries into the 3D space, returning the coordinates (confs) and the topology maps (mol_graphs).

Step 4: Force Field Assignment We pass our reconstructed topology graphs to assign_ff_parameters(). By providing the list ['ml'] * len(mol_graphs), we are explicitly instructing the compiler to use the pure Graph Attention Network (GAT) ML model to predict and assign force field parameters for every single molecule in the system.

Step 5: System Export Finally, get_gmx() takes all the gathered data—the connectivity graphs, the 3D coordinates, the assigned force fields, and the box dimensions—and translates them into formats ready for MD engines.

Outputs Generated:

  • chemfast.gro: The equilibrated all-atom coordinates.

  • chemfast.top: The complete system topology (atom types, charges, interactions).

  • out_chemfast.xml: Topology and coordinates formatted for PyGAMD/GALAMOST.