Wrapped functions
The domd_tools module serves as the high-level “Chemical Compiler” interface for ChemFAST. It encapsulates the complex S-CGFG algorithms, Force Field assignment engines, and I/O exporters into four primary stages: Coarse-Graining, Backmapping, Parameterization, and Exporting.
The main APIs for each stage are:
Coarse-Grained System Generation:
create_cg_systemCG Parameter Prediction:
get_cg_ffAll-Atom Backmapping:
build_aa_topologyForce Field Assignment:
assign_ff_parametersSystem Export:
get_gmx
This document first explains the theoretical logic and API usage for each stage, followed by a complete end-to-end example.
Part 1: The Compilation Pipeline & wrapped functions
1. Chemical Definition Logic
Before utilizing the API, the system’s chemical composition and reaction logic must be defined using standard human-readable descriptors:
SMILES: Defines the atomic connectivity and structure of individual monomers.
SMARTS: Defines the reaction rules for polymerization or crosslinking. Correct atom map indices within SMARTS (e.g.,
[#7H2:1]) are crucial for defining regio-specificity (e.g., ensuring Head-to-Tail connectivity).
2. Coarse-Grained System Generation
To generate a relaxed, pre-equilibrated topology, ChemFAST maps the SMILES/SMARTS into a Coarse-Grained (CG) representation. The force field parameters for these CG beads are derived automatically using the Hansen Solubility Parameter (HSP) predictor.
Wrapped function: ``create_cg_system``
create_cg_system(molecules, reaction_template, cg_config, rc=1.0, rho=0.85, boxl=None, program='hoomd', filename='chemfast_cg')
molecules (dict): Dictionary defining monomers (SMILES).
reaction_template (dict): SMARTS rules defining how monomers connect.
cg_config (list[dict]): The structural blueprint (e.g., number of chains, topology types, rigidity).
rc (float, default:
1.0): Cutoff radius for initial geometric embedding.rho (float, default:
0.85): Target number density. ChemFAST calculates the box size automatically based on this density and the total number of generated beads.boxl (float, default:
None): Explicit box length. If provided, overridesrho.program (str, default:
'hoomd'): Target engine for the CG output XML.filename (str, default:
'chemfast_cg'): Generates output files likeout_chemfast_cg.xml.
Note
Standalone Parameter Prediction:
If you only want to predict the CG force field parameters without building a full simulation box, you can bypass create_cg_system and use the HSP predictor directly. This is extremely useful for rapidly deriving parameters for novel monomers.
Example: Call get_cg_ff(molecules, reaction_template, iterations=3) from domd_cgbuilder.cg_ff to extract bonded and non-bonded parameters.
import json
from domd_cgbuilder.cg_ff import get_cg_ff
def predict_cg_params(molecules, reaction_template, iterations=3):
graphs, prod_mols, stats, all_bonded, pairs, rigid_meta = get_cg_ff(molecules,reaction_template,iterations=iterations, create_epsilon=True)
# Extract and format parameters
nonBondPara = {k:stats[k] for k in stats if '-' not in k}
BondPara = {}
AngPara = {}
for k in stats:
if '-' in k:
if len(k.split('-')) == 2:
BondPara[k] = stats[k]
if len(k.split('-')) == 3:
AngPara[k] = stats[k]
for k in pairs:
stats[str(k)] = pairs[k]
# Export to JSON
json_str = json.dumps(stats, indent=4)
with open('cg_parameters.txt', 'w', encoding='utf-8') as f:
f.write(json_str)
return
# Execute prediction
mols = {
'A': {'smiles': 'Nc1ccc(Oc2ccc(N)cc2)cc1', 'file': None},
'B': {'smiles': 'O1C(=O)c2cc(Oc3cc4C(=O)OC(=O)c4cc3)ccc2C1=O', 'file': None},
}
reaction_template = {
'B-A': {
'cg_reactant_list': [('A', 'B')],
'smarts': '[#7H2:1].[#6:3](=[#8:4])[#8:2][#6:5]=[#8:6]>>[#6:3](=[#8:4])[#7:1][#6:5]=[#8:6].[#8:2]',
'prod_idx': [0]
}
}
predict_cg_params(mols, reaction_template)
# Outputs 'cg_parameters.txt' containing the HSP-derived bonded and non-bonded parameters.
3. All-Atom Backmapping (CG to AA)
This stage reconstructs the all-atom details from an equilibrated CG trajectory using geometric embedding algorithms and chemical SMARTS rules.
Wrapped function: ``build_aa_topology``
build_aa_topology(mols, reaction_template, xml, reactions=[], large=500, chunks_per_d=1, parallel=False)
mols & reaction_template: The atomic structures and reaction rules that will replace the CG beads.
xml (str): Path to the relaxed CG configuration file (e.g.,
final.xml).reactions (list, default:
[]): Specific reaction tuples. If left empty, ChemFAST automatically infers connectivity directly from the bond section of the XML.large (int, default:
500): Atom count threshold. If a polymer network exceeds this number of atoms, ChemFAST switches to a grid-based embedding strategy to ensure stability and avoid spatial overlaps.chunks_per_d (int, default:
1): Grid subdivision parameter for embedding large molecules.parallel (bool, default:
False): If set toTrue, utilizes multiprocessing to generate AA coordinates in parallel. Highly recommended for systems with thousands of chains.
Note
Advanced Usage: Standalone Topology Building and Backmapping
Under the hood, the build_aa_topology function executes a two-phase pipeline: first, it logically constructs the full all-atom connectivity (Topology Building), and second, it folds these structures into the 3D space defined by the CG beads (Backmapping).
If you require granular control—for instance, if you only want to generate the 2D polymer structures without 3D coordinates, or if you want to custom-embed a single specific polymer chain for testing—you can decouple these steps and call the underlying modules directly.
Phase 1: Standalone Topology Building (Constructing the 2D Graph)
You can use the Reactor class to chemically link the monomers based on the CG connectivity matrix, bypassing any 3D embedding. This is highly useful for generating massive libraries of polymer SMILES or topological graphs.
from domd_topology.reactor import Reactor
from misc.cg_system import read_cg_topology
from misc.io.xml_reader import XmlParser
from domd_topology.functions import set_molecule_id_for_h
from rdkit import Chem
from misc.logger import logger
logger.setLevel('INFO')
reaction_template = ...
mols = ...
# 1. Parse the CG simulation output
xml = XmlParser('final.xml')
cg_sys, cg_mols = read_cg_topology(xml, mols)
print(cg_mols[0].edges(data=True))
# 2. Extract bond connections to act as reaction triggers
reactions = [(bond[0], bond[1], bond[2]) for bond in xml.data['bond']]
# 3. Initialize the Reactor and synthesize the 2D all-atom structures
reactor = Reactor(mols, reaction_template)
aa_mols, meta = reactor.process(cg_mols, reactions)
# 4. Standardize the RDKit molecules (add hydrogens for full valency)
[Chem.SanitizeMol(_) for _ in aa_mols]
aa_mols_h = [Chem.AddHs(m) for m in aa_mols]
[set_molecule_id_for_h(m) for m in aa_mols_h]
# Output: aa_mols_h now contains the fully connected 2D polymer structures!
Phase 2: Standalone Backmapping (Embedding into 3D Space)
Once you have the synthesized 2D RDKit molecule (aa_mol_h) and its corresponding 3D CG scaffold (cg_mol), you can embed the all-atom structure into the simulation box independently using the embed_molecule function.
from domd_xyz.embed_molecule import embed_molecule
import numpy as np
# Define the simulation box boundaries
# the box information can be extracted from the XML or defined based on your system's requirements
# box = np.array([xml.box.lx, xml.box.ly, xml.box.lz]).astype(float) *10 # Extracted from XML, usually in nm, convert to Angstroms
box = np.array([100.0, 100.0, 100.0]) # Angstroms, adjust as needed based on your CG system
# Select a specific chain (e.g., the first molecule in the system)
target_aa_mol = aa_mols_h[0]
target_cg_mol = cg_mols[0]
# Embed the 2D molecule into the 3D CG beads
# 'large' triggers grid-decomposition for massive chains; 'chunk_per_d' defines grid resolution
conf = embed_molecule(target_aa_mol, target_cg_mol, box=box, large=500, chunk_per_d=1)
# Extract the generated 3D coordinates
positions = conf.GetPositions()
# Output: 'positions' is a NumPy array containing the exact 3D coordinates of the embedded atoms.
4. Force Field Assignment
The assignment engine processes the reconstructed all-atom graphs to allocate charges, bonds, angles, and dihedrals.
Wrapped function: ``assign_ff_parameters``
assign_ff_parameters(molecules, strategies, custom_rules=[], parallel=False)
molecules (list): The NetworkX graphs returned by the backmapping step.
strategies (list[str]): Specifies the parameterization strategy for each molecule. Options include:
'hybrid'(default): A robust method utilizing the BOSS database, GMX template rules, and ML models. The assignment logic follows a strict ML -> GMX -> DB priority sequence. If a parameter is predicted/found in a higher priority method, it safely overwrites the lower one.'ml': Pure Graph Attention Network (GAT) ML model.'db': Pure BOSS database lookup.'tpl': Pure GMX rules.
custom_rules (list[str], default:
['none']): Provides fine-grained control over interaction parameterization. You can pass specific rule flags to override the default assignment logic:'none'(default): Applies the standard strategy logic without custom overrides.'all': Forces all parameters (both bonded and non-bonded) to be generated by the ML model, regardless of the overarching strategy chosen.'bonded': Forces only the bonded parameters (bonds, angles, dihedrals) to be generated by the ML model, leaving non-bonded parameters to the standard hierarchy.
parallel (bool, default:
False): Distributes parameter assignment across CPU cores.
5. System Export
Assembles the global geometry, connectivity, and force field data into formats ready for MD engines.
Wrapped function: ``get_gmx``
get_gmx(molecules, confs, ffs, box, filename='chemfast', crossbk_scale=1.0, crossak_scale=1.0, charge_scale=1.0)
molecules, confs, ffs, box: The accumulated outputs from all previous steps (topology graphs, coordinates, parameters, and box dimensions).
filename (str, default:
'chemfast'): Output prefix (generates.gro,.top, and.xml).crossbk_scale, crossak_scale, charge_scale (float, default:
1.0): Scaling factors for bonds, angles, and charges that span across different CG beads.
Note
This scaling protocol (typically set to < 1.0) is specifically designed to preserve local stereochemistry (such as chirality and cis/trans isomerism) during Energy Minimization (EM). By temporarily softening these boundary interactions, it prevents the system from experiencing unphysically high energy spikes that could trigger impossible conformational inversions. The charge_scale factor is used with polyelectrolytes, usually charge_scale=0.5-0.8.
Part 2: End-to-End Example (Polyimide)
To demonstrate how these APIs are chained together in a real-world scenario, we walk through the polyimides.py example script. This script utilizes the compiler to build a cross-linked polyimide network.
The Complete Script
Here is the complete code for the end-to-end workflow. You can copy and run this directly if you have the required pre-equilibrated XML file.
from domd_tools import create_cg_system, build_aa_topology, assign_ff_parameters, get_gmx
from misc.logger import logger
logger.setLevel('INFO')
# 1. Chemical Definition
smiA = 'Nc1ccc(Oc2ccc(N)cc2)cc1'
smiB = 'O1C(=O)c2cc(Oc3cc4C(=O)OC(=O)c4cc3)ccc2C1=O'
mols = {
'A': {'smiles': smiA, 'file': None},
'B': {'smiles': smiB, 'file': None},
}
reaction_template = {
'B-A': {
'cg_reactant_list': [('A', 'B')],
'smarts': '[#7H2:1].[#6:3](=[#8:4])[#8:2][#6:5]=[#8:6]>>[#6:3](=[#8:4])[#7:1][#6:5]=[#8:6].[#8:2]',
'prod_idx': [0]
}
}
# 2. CG Pre-equilibrium (Optional if pre-equilibrated XML is provided)
"""
meta1 = {'type': ['A','B']*10, 'bond': '0-1,1-2...', 'N': 20, 'is_rigid': False, 'is_angle': True}
meta2 = {'type': ['A','B']*5, 'bond': '0-1,1-2...', 'N': 20, 'is_rigid': False, 'is_angle': True}
create_cg_system(mols, reaction_template, [meta1, meta2])
"""
# We use a pre-equilibrated CG configuration for this example
xmlfile = 'CG_pre_eq/final.xml'
# 3. AA Backmapping
confs, mol_graphs, box = build_aa_topology(mols, reaction_template, xmlfile)
# 4. Force Field Assignment
ffs = assign_ff_parameters(mol_graphs, strategies=['ml'] * len(mol_graphs))
# 5. GROMACS Output
get_gmx(mol_graphs, confs, ffs, box, filename='chemfast')
Step-by-Step Breakdown
Let’s break down what is happening at each stage of the script above.
Step 1: Chemical Definition
We start by feeding the script the SMILES of our two constituent monomers (A and B). The reaction_template uses SMARTS to tell the compiler exactly how an amine group on monomer A reacts with an anhydride group on monomer B.
Step 2: CG Pre-equilibrium
If starting from scratch, we would define topological blueprints (meta1, meta2) and call create_cg_system to generate a random walk box and predicted HSP parameters. To save computational time in this tutorial, we bypass this and directly load a pre-equilibrated trajectory (CG_pre_eq/final.xml).
Step 3: All-Atom Backmapping
The build_aa_topology() function parses the final.xml box. It reads the coarse-grained positions, applies our SMARTS reaction rules to link the monomers, and embeds the all-atom geometries into the 3D space, returning the coordinates (confs) and the topology maps (mol_graphs).
Step 4: Force Field Assignment
We pass our reconstructed topology graphs to assign_ff_parameters(). By providing the list ['ml'] * len(mol_graphs), we are explicitly instructing the compiler to use the pure Graph Attention Network (GAT) ML model to predict and assign force field parameters for every single molecule in the system.
Step 5: System Export
Finally, get_gmx() takes all the gathered data—the connectivity graphs, the 3D coordinates, the assigned force fields, and the box dimensions—and translates them into formats ready for MD engines.
Outputs Generated:
chemfast.gro: The equilibrated all-atom coordinates.chemfast.top: The complete system topology (atom types, charges, interactions).out_chemfast.xml: Topology and coordinates formatted for PyGAMD/GALAMOST.