ff_assess

Background

ff_assess is the model assessment subprogram of Fast-Forward. Once you have generated a model using ff_inter and performed a test simulation with your new coarse-grained model, ff_assess can be used to validate how faithful the model is to the pseudo-atomistic reference it was generated from. In short, ff_assess compares the reference distributions used to generate the new coarse-grained topology with what was actually simulated.

Options

-f TRAJFILE           simulated trajectory file (default: None)
-s TPRFILE            simulated tpr file (default: None)
-i [ITP_FILES ...]    itp file (default: None)
-d REFERENCE          Path to directory with reference distributions (default: None)
-plots [PLOTS]        Make plots comparing distributions. Optionally provide a path (default: current dir) (default:
                  None)
-outliers             exclude outliers from overall score (default: False)
-plot-data            save data for making plots as single pickle file (default: False)
-score-weight HELLINGER_WEIGHT
                  weight of the Hellinger distance in the distance score (default: 0.7)
-include-constraints  fully include constrained distances in the distance score calculation (default: False)
-dists                Save text files with time series and distribution data for interactions (default: False)

Example

The simulated trajectory of a new model of a molecule could be assessed like so:

ff_assess -f vis.xtc -s vis.tpr -i molecule.itp -d /path/to/reference/files/ -plots

Note

For a single molecule, it is usually sensible to extract the trajectory of only the molecule beforehand, and to correct it for PBC artifacts using GROMACS. For example:

gmx trjconv -f trajectory.xtc -s topology.tpr -pbc mol -center -o vis.xtc
gmx trjconv -f trajectory.xtc -s topology.tpr -pbc mol -center -o vis.gro -e 0

The vis.gro file can also be used, together with an appropriate topology file, to generate the tpr file of the solvent-free system using gmx grompp.

In the ff_assess command given above, the tpr and trajectory files now come from a “real” simulation of the molecule being parameterised.

In addition to the new topology and trajectory, ff_assess also requires the model itp used, and the files generated by ff_inter containing the distribution information, indicated by -i and -d respectively. The interactions annotated in the itp will be assessed against their reference distributions, and a warning is issued if the relevant files cannot be found.

What is the subprogram actually doing?

ff_assess works in three steps:

  1. Find interactions annotated in the input itp file

  2. Generate distributions of the annotated interactions as they were actually sampled in the input trajectory

  3. Perform statistical tests comparing the simulated and reference distributions. Plot comparisons and generate reports.
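Step 2 can be illustrated with a minimal sketch, assuming NumPy and a coordinate array that has already been corrected for periodic boundaries. The function names, the bin range and the synthetic coordinates are hypothetical and chosen for illustration only; they are not taken from the Fast-Forward source.

```python
import numpy as np

def distance_time_series(coords, i, j):
    """Distance between beads i and j in every frame.

    coords has shape (n_frames, n_beads, 3). Periodic boundary
    artifacts are assumed to have been removed beforehand.
    """
    return np.linalg.norm(coords[:, i, :] - coords[:, j, :], axis=1)

def normalised_histogram(values, bin_edges):
    """Histogram whose bin probabilities sum to 1."""
    counts, _ = np.histogram(values, bins=bin_edges)
    total = counts.sum()
    if total == 0:
        raise ValueError("no samples fall inside the bin range")
    return counts / total

# Synthetic two-bead 'trajectory' (hypothetical data, distances in nm).
rng = np.random.default_rng(0)
coords = np.zeros((1000, 2, 3))
coords[:, 1, 0] = rng.normal(0.47, 0.02, size=1000)

edges = np.linspace(0.0, 1.0, 101)  # hypothetical bin edges
dist = distance_time_series(coords, 0, 1)
prob = normalised_histogram(dist, edges)
```

Using the same bin edges for the simulated and the reference data keeps the two distributions directly comparable bin by bin.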

The main quantity ff_assess calculates is a score based on the Hellinger distance between the two distributions. For two discrete probability distributions P = (p_1, ..., p_k) and Q = (q_1, ..., q_k) defined over the same k bins, the distance is defined by:

\[H(P,Q) = \frac{1}{\sqrt{2}}\sqrt{\sum_{i=1}^k (\sqrt{p_i} - \sqrt{q_i} )^2}\]

H(P,Q) is then a score in the interval [0,1], with 0 indicating a perfect match and 1 indicating no overlap between the two distributions. Fast-Forward also calculates a modified value that includes an additional term to ensure agreement in the mean as well as overall similarity between the distributions. The size of this contribution is controlled by the -score-weight flag of ff_assess. By default, the plain Hellinger distance is weighted 70%, with the remaining 30% coming from the correction term.
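The definition above can be sketched in a few lines of NumPy. The hellinger function follows the formula directly. The weighted_score function is only an assumption about how the two contributions might be combined: the exact form of the correction term is not specified here, so it is passed in as a hypothetical value mean_term in [0,1] and blended linearly.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two discrete distributions on the same bins."""
    p = np.asarray(p, dtype=float)
    q = np.asarray(q, dtype=float)
    if p.shape != q.shape:
        raise ValueError("distributions must use the same bins")
    # Normalise so that each distribution sums to 1.
    p = p / p.sum()
    q = q / q.sum()
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

def weighted_score(hellinger_distance, mean_term, weight=0.7):
    """Linear blend of the Hellinger distance and a correction term.

    ASSUMPTION: the linear form and mean_term are placeholders, not
    the formula used by Fast-Forward.
    """
    return weight * hellinger_distance + (1.0 - weight) * mean_term

identical = hellinger([0.2, 0.5, 0.3], [0.2, 0.5, 0.3])  # perfect match
disjoint = hellinger([1.0, 0.0], [0.0, 1.0])             # no overlap
blended = weighted_score(0.2, 0.4)                       # default weight 0.7
```

The two limiting cases reproduce the ends of the interval: identical distributions give 0 and distributions with no shared bins give 1.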

Common pitfalls in the use of ff_assess

Pitfall 1: Poor scores from large bead sizes

Consider the case where you have a small or tiny bead in between two large beads. The large beads may have an equilibrium angle at atomistic resolution which is impossible to access at CG resolution. In this case, the distributions may have little to no overlap because it is simply not possible to overcome the force generated by the nonbonded potential.

Pitfall 2: Use of constraints

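If constraints are used in a molecule, distance scores may be negatively affected. Constrained beads sit at a single distance throughout the simulation, which will poorly match the broader distance distribution observed in the pseudo-CG reference. The -include-constraints option determines whether constrained distances are fully included in the distance score calculation; by default they are not.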
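The effect can be illustrated numerically. The sketch below, assuming NumPy, compares a hypothetical broad reference distribution with a constrained distance that occupies a single bin at the same position. The bin range, centre and width are invented for illustration and do not come from any real molecule.

```python
import numpy as np

def hellinger(p, q):
    """Hellinger distance between two normalised discrete distributions."""
    return np.sqrt(np.sum((np.sqrt(p) - np.sqrt(q)) ** 2)) / np.sqrt(2)

# Hypothetical bins in nm, shared by both distributions.
edges = np.linspace(0.0, 1.0, 101)
centres = 0.5 * (edges[:-1] + edges[1:])

# Broad pseudo-CG reference: Gaussian centred on 0.475 nm, width 0.03 nm.
reference = np.exp(-0.5 * ((centres - 0.475) / 0.03) ** 2)
reference /= reference.sum()

# Constrained distance: every frame falls in the bin centred on 0.475 nm.
constrained = np.zeros_like(centres)
constrained[np.argmin(np.abs(centres - 0.475))] = 1.0

score = hellinger(reference, constrained)
```

Although both distributions are centred on the same distance, the score is far from 0, because almost all of the reference probability lies in bins that the constrained distance never visits.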

Pitfall 3: Misuse of dihedrals

Fast-Forward makes it easy to generate many dihedral potentials across your molecule. While this may help lower the interaction score (by improving the match to pseudo-CG distributions), it may have little to no effect on the distance score. In such a case, the additional dihedrals should not be used.