Type I modular polyketide synthases (PKSs) and nonribosomal peptide synthetases (NRPSs) have a unique modular structure in which the product of each module, and therefore of each megasynthase, is determined by the catalytic domains that make up that module. This modularity makes it possible to predict polyketide intermediates and final products using only the sequence of catalytic domains in the cognate PKS. This unusual property has fueled significant research efforts to use engineered PKSs and NRPSs in combinatorial biosynthesis.
Because protein-protein interactions are complex, a common heuristic in PKS/NRPS engineering is to design chimeric PKS/NRPSs that are as close as possible to a naturally occurring PKS/NRPS. Using this guiding principle, we propose the following paradigm for designing a chimeric PKS/NRPS capable of producing a small molecule compound of interest:
If the target is a natural product analog, a starting point does not need to be identified, and ClusterCAD can be applied simply to select the donor catalytic domains required to effect the desired structural changes in the final polyketide or nonribosomal peptide product. While the goal of ClusterCAD is to identify potential parent PKS/NRPS starting points and donor catalytic domains, it will likely prove important to consider additional factors when designing chimeric PKS/NRPSs. For example, modules from well-characterized clusters, especially modules that have previously been shown to be well expressed in the host organism of choice, are particularly attractive choices for engineering. We therefore emphasize that ClusterCAD is intended to augment, rather than supersede, the expert domain knowledge of the experienced PKS or NRPS researcher.
ClusterCAD is based on the Minimum Information about a Biosynthetic Gene cluster (MIBiG) database. In order to construct the database entries for ClusterCAD, we first identified the MIBiG entries that were annotated as type I modular PKS or nonribosomal peptide clusters. Annotations for these clusters were generated using the antibiotics and Secondary Metabolite Analysis SHell (antiSMASH) software. The resulting output was parsed using a whitelist of recognized catalytic domains in order to refine analysis of each cluster based on supported PKS/NRPS catalytic domains. Domain annotations, which include predictions for acyltransferase (AT) and adenylation (A) domain substrate specificity and ketoreductase (KR) domain stereochemical outcome, were then used to generate predictions of the polyketide or nonribosomal peptide intermediates expected to be produced by each module in the cluster.
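The whitelist step described above can be illustrated with a minimal sketch. It assumes each annotation has already been parsed into a dictionary with a "type" key. The whitelist contents, the record layout, and the function name are hypothetical and are not taken from the ClusterCAD code base or from the antiSMASH output format.

```python
# Hypothetical whitelist of supported catalytic domain types.
# The actual list used by ClusterCAD is not given in this text.
SUPPORTED_DOMAINS = {"KS", "AT", "KR", "DH", "ER", "ACP", "TE", "C", "A", "PCP"}


def filter_domains(annotations, whitelist=SUPPORTED_DOMAINS):
    """Keep only annotations whose domain type is whitelisted, preserving order."""
    return [a for a in annotations if a["type"] in whitelist]


# Illustrative parsed annotations for one subunit (coordinates are made up).
raw_annotations = [
    {"type": "KS", "start": 10, "stop": 430},
    {"type": "AT", "start": 540, "stop": 840},
    {"type": "unsupported_domain", "start": 850, "stop": 900},
    {"type": "ACP", "start": 950, "stop": 1020},
]

kept = filter_domains(raw_annotations)
print([a["type"] for a in kept])
```

Preserving the input order matters here because the order of domains along the subunit is what the later structure prediction relies on.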
To validate the intermediate and final structure predictions, the predicted final structure was compared against the known final structure. SMILES structures for known final products were taken from the MIBiG database, or were generated with the ChemAxon Naming tool from the text description of the final structure in MIBiG. Finally, additional structures were obtained by a literature search and manually incorporated into ClusterCAD.
A comparison between the predicted and known final structures was used to manually curate each ClusterCAD entry to perform the following corrections:
The entry for each cluster contains links to the corresponding MIBiG database and NCBI Nucleotide database entries, as well as an indication of whether the cluster has been manually reviewed for consistency with experimental evidence. Cluster entries may also provide Cluster Notes, where curation notes and/or relevant publications and references may be viewed. Buttons to display annotations of AT or A substrate specificity and KR stereochemical outcome are also provided. Clicking on the final product or polyketide/peptide intermediate chemical structures will display SMILES representations of these structures. Further, clicking on the name of a module will provide links to the NCBI Protein database entry for that module, the nucleotide and amino acid sequences for the module, and precomputed secondary structure and relative solvent accessibility annotations if available.
Note that AT substrates with an "_ACP" suffix in the name represent ACP linked substrates, whereas those without a suffix represent CoA linked substrates.
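For anyone processing these annotations programmatically, the naming convention could be interpreted along the following lines. This is only a sketch: the function name and the example substrate names are hypothetical, and only the "_ACP" suffix rule comes from the note above.

```python
ACP_SUFFIX = "_ACP"


def split_substrate(name):
    """Return (base_name, carrier) for an AT substrate annotation.

    Names ending in "_ACP" are treated as ACP linked; all others as CoA linked.
    """
    if name.endswith(ACP_SUFFIX):
        return name[: -len(ACP_SUFFIX)], "ACP"
    return name, "CoA"


# Hypothetical substrate names, used purely for illustration.
print(split_substrate("substrateX_ACP"))
print(split_substrate("substrateX"))
```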
The structure search tool was designed to enable the identification of a truncated PKS/NRPS to use as a starting point for PKS/NRPS engineering, and takes as input a small molecule chemical structure in the form of a SMILES string or a structure that is drawn in an interactive GUI. Matches to the query structure are ranked using AP (atom pair) descriptors and the Tanimoto coefficient similarity metric.
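The ranking step can be sketched as follows. This example assumes that each structure has already been reduced to a set of descriptor features, and it uses the set form of the Tanimoto coefficient (shared features divided by total distinct features). Generating atom pair descriptors from a SMILES string requires a cheminformatics toolkit and is not shown. The feature strings, candidate names, and function names are hypothetical, and this is not presented as ClusterCAD's own implementation.

```python
def tanimoto(a, b):
    """Tanimoto coefficient between two feature sets: |A & B| / |A | B|."""
    a, b = set(a), set(b)
    union = a | b
    if not union:
        return 0.0  # convention chosen here for two empty feature sets
    return len(a & b) / len(union)


def rank_by_similarity(query, library):
    """Return (name, score) pairs sorted from most to least similar to the query."""
    scored = [(name, tanimoto(query, features)) for name, features in library.items()]
    return sorted(scored, key=lambda pair: pair[1], reverse=True)


# Made-up descriptor features standing in for atom pair descriptors.
query = {"C-C:1", "C-O:1", "C-O:2"}
library = {
    "candidate_a": {"C-C:1", "C-O:1"},
    "candidate_b": {"C-N:1"},
    "candidate_c": {"C-C:1", "C-O:1", "C-O:2", "C-C:2"},
}

for name, score in rank_by_similarity(query, library):
    print(f"{name}\t{score:.3f}")
```

A score of 1 means the two feature sets are identical and 0 means they share nothing, so sorting in descending order puts the closest matches first.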
The sequence search tool was designed to enable researchers to select donor catalytic domains for domain exchange experiments. The tool was designed to enable flexible queries, allowing researchers to test hypotheses regarding which domain-domain interactions may be important in facilitating successful domain exchanges. The sequence search tool takes as input a valid amino acid sequence and performs a BLAST search against a BLAST database containing all of the subunits in ClusterCAD.
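Before submitting a query, it can be useful to check locally that the input looks like an amino acid sequence. The sketch below assumes that "valid" means a non-empty string made up only of the 20 standard one-letter amino acid codes. That definition, along with the function names, is an assumption made for illustration and may differ from the checks ClusterCAD actually applies.

```python
# The 20 standard amino acids, one-letter codes.
STANDARD_AMINO_ACIDS = set("ACDEFGHIKLMNPQRSTVWY")


def clean_sequence(text):
    """Drop FASTA header lines, remove all whitespace, and uppercase the rest."""
    body = "".join(
        line for line in text.splitlines() if not line.lstrip().startswith(">")
    )
    return "".join(body.split()).upper()


def is_valid_protein(sequence):
    """True if the sequence is non-empty and uses only standard residue codes."""
    return len(sequence) > 0 and set(sequence) <= STANDARD_AMINO_ACIDS


# Made-up query in FASTA form.
query = clean_sequence(">example query\nmktl\nAVG")
print(query, is_valid_protein(query))
```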
ClusterCAD 2.0: an updated computational platform for chimeric type I polyketide synthase and nonribosomal peptide synthetase design. Tao, X.B., LaFrance, S., Xing, Y., Nava, A.A., Martin, H.G., Keasling, J.D., Backman, T.W.H. Nucleic Acids Research, 2022 Nov. https://doi.org/10.1093/nar/gkac1075
ClusterCAD: a computational platform for type I modular polyketide synthase design. Eng, C.H.*, Backman, T.W.H.*, Bailey, C.B., Magnan, C., Martin, H.G., Katz, L., Baldi, P., Keasling, J.D. Nucleic Acids Research, 2017 Oct. https://doi.org/10.1093/nar/gkx893 *co-first authors