Type I modular polyketide synthases (PKSs) have a unique modular structure in which the product of each module, and therefore of each megasynthase, is determined by the catalytic domains that make up each module. This modularity allows polyketide intermediates and final products to be predicted from the sequence of catalytic domains in the cognate PKS alone. This unusual property has fueled significant research efforts to use engineered PKSs for combinatorial biosynthesis.
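Because each module's contribution is fixed by its domains, the prediction can be illustrated with a toy model. The Python sketch below is a hypothetical simplification, not ClusterCAD's implementation. It ignores the loading module, stereochemistry, and SMILES generation. For each extension module it records only the AT substrate label (labels such as `mal` and `mmal` are illustrative) and the oxidation state that the module's reductive domains leave at the beta carbon.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Module:
    """Hypothetical extension module: AT substrate label plus its reductive domains."""
    at_substrate: str
    reductive_domains: tuple = ()  # any of "KR", "DH", "ER"


def beta_carbon_state(domains):
    """Oxidation state left at the beta carbon by the KR -> DH -> ER sequence.

    DH acts on the KR product and ER acts on the DH product, so a downstream
    domain has no effect when an upstream one is missing.
    """
    if "KR" not in domains:
        return "keto"
    if "DH" not in domains:
        return "hydroxyl"
    if "ER" not in domains:
        return "alkene"
    return "saturated"


def predict_intermediates(modules):
    """Return the intermediate after each module as a list of (substrate, beta state) units."""
    chain = []
    intermediates = []
    for module in modules:
        chain.append((module.at_substrate, beta_carbon_state(module.reductive_domains)))
        intermediates.append(list(chain))  # snapshot of the growing chain at this module
    return intermediates
```

For example, `predict_intermediates([Module("mmal", ("KR",)), Module("mal", ("KR", "DH", "ER")), Module("mmal")])` yields three intermediates, the last of which ends with a beta-keto unit because the third module has no reductive domains.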
Because of the complexity of protein-protein interactions, a heuristic commonly used in PKS engineering is to design chimeric PKSs that are as close as possible to a naturally occurring PKS. Following this principle, we propose the following paradigm for designing a chimeric PKS capable of producing a small molecule of interest:
If the target is a natural product analog, no starting point needs to be identified; ClusterCAD can instead be used to select the donor catalytic domains required to effect the desired structural changes in the final polyketide product. Although ClusterCAD is designed to identify potential parent PKS starting points and donor catalytic domains, additional factors will likely be important when designing a chimeric PKS. For example, modules from well-characterized clusters, particularly modules previously shown to be well expressed in the host organism of choice, are attractive choices for engineering. We therefore emphasize that ClusterCAD is intended to augment, rather than supersede, the expert domain knowledge of the experienced PKS researcher.
ClusterCAD is based on the Minimum Information about a Biosynthetic Gene cluster (MIBiG) database. To construct the database entries for ClusterCAD, we first identified the MIBiG entries annotated as type I modular PKS clusters. Annotations for these clusters were generated using the antibiotics and Secondary Metabolite Analysis SHell (antiSMASH) software. The resulting output was parsed using a whitelist of recognized catalytic domains, so that analysis of each cluster was truncated at any subunit containing a non-ribosomal peptide synthetase (NRPS) or another unusual catalytic domain not supported by ClusterCAD. Domain annotations, which include predictions of acyltransferase (AT) substrate specificity and ketoreductase (KR) stereochemical outcome, were then used to predict the polyketide intermediate expected from each module in the PKS cluster.
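The whitelist-based truncation can be sketched as follows. The whitelist is a hypothetical placeholder rather than ClusterCAD's actual list, and subunits are assumed to arrive as (name, domain list) pairs in cluster order. In this sketch, analysis stops before the first subunit that contains any domain outside the whitelist, such as the condensation (C), adenylation (A), and peptidyl carrier protein (PCP) domains of an NRPS.

```python
# Hypothetical whitelist of supported PKS domains; the real ClusterCAD list may differ.
SUPPORTED_DOMAINS = frozenset({"KS", "AT", "KR", "DH", "ER", "ACP", "TE"})


def truncate_at_unsupported(subunits, supported=SUPPORTED_DOMAINS):
    """Return subunits in order, stopping before the first one with an unsupported domain."""
    kept = []
    for name, domains in subunits:
        unsupported = [d for d in domains if d not in supported]
        if unsupported:
            break  # e.g. an NRPS subunit with C, A, or PCP domains ends the analysis here
        kept.append((name, domains))
    return kept
```

Whether the offending subunit itself is excluded, as here, or partially retained is a design choice this sketch makes for simplicity.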
To validate the intermediate and final structure predictions, we compared each predicted final structure against the known final structure. SMILES strings for known final products were taken from the MIBiG database or generated with the ChemAxon Naming tool from the text description of the final structure in MIBiG. Finally, additional structures were obtained through a literature search and manually incorporated into ClusterCAD.
A comparison between the predicted and known final structures was used to manually curate each ClusterCAD entry with the following corrections:
The entry for each cluster contains links to the corresponding MIBiG and NCBI Nucleotide database entries. Buttons to display annotations of AT substrate specificity and KR stereochemical outcome are also provided. Clicking on the chemical structure of the final product or of a polyketide intermediate displays its SMILES representation. Clicking on the name of a module provides links to the NCBI Protein database entry for that module, the nucleotide and amino acid sequences of the module, and precomputed secondary structure and relative solvent accessibility annotations.
Note that AT substrates with an "_ACP" suffix in the name represent ACP-linked substrates, whereas those without the suffix represent CoA-linked substrates.
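When processing exported annotations, the naming convention can be applied mechanically. A minimal sketch, with a hypothetical helper name and illustrative substrate labels:

```python
ACP_SUFFIX = "_ACP"


def parse_at_substrate(label):
    """Split an AT substrate label into (base name, carrier) using the "_ACP" suffix convention."""
    if label.endswith(ACP_SUFFIX):
        return label[: -len(ACP_SUFFIX)], "ACP"  # suffix present: ACP-linked substrate
    return label, "CoA"  # no suffix: CoA-linked substrate
```

For example, `parse_at_substrate("mxmal_ACP")` returns `("mxmal", "ACP")`, while `parse_at_substrate("mmal")` returns `("mmal", "CoA")`.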
The structure search tool was designed to enable the identification of a truncated PKS to use as a starting point for PKS engineering, and takes as input a small molecule chemical structure in the form of a SMILES string or a structure that is drawn in an interactive GUI. Matches to the query structure are ranked using AP (atom pair) descriptors and the Tanimoto coefficient similarity metric.
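To make the ranking concrete, here is a simplified Python sketch of atom pair descriptors scored with the Tanimoto coefficient. It is not ClusterCAD's code. It skips SMILES parsing (molecules are hand-built graphs of heavy atoms), and it uses a reduced atom type of element plus heavy-atom degree, whereas the original AP descriptor also encodes the pi-electron count. The names `molecule`, `atom_pairs`, `tanimoto`, and `rank_matches` are hypothetical.

```python
from collections import Counter, deque
from itertools import combinations


def molecule(elements, bonds):
    """Build (atom -> element, atom -> neighbours) from an element list and bond pairs."""
    adjacency = {i: set() for i in range(len(elements))}
    for i, j in bonds:
        adjacency[i].add(j)
        adjacency[j].add(i)
    return dict(enumerate(elements)), adjacency


def bond_distances(adjacency, source):
    """Topological (bond-count) distance from source to every reachable atom, via BFS."""
    dist = {source: 0}
    queue = deque([source])
    while queue:
        atom = queue.popleft()
        for neighbour in adjacency[atom]:
            if neighbour not in dist:
                dist[neighbour] = dist[atom] + 1
                queue.append(neighbour)
    return dist


def atom_pairs(elements, adjacency):
    """Multiset of (type_i, distance, type_j) descriptors over all heavy-atom pairs.

    Simplified atom type: element symbol + number of heavy-atom neighbours.
    """
    types = {a: f"{el}{len(adjacency[a])}" for a, el in elements.items()}
    distances = {a: bond_distances(adjacency, a) for a in elements}
    pairs = Counter()
    for a, b in combinations(sorted(elements), 2):
        d = distances[a].get(b)
        if d is None:  # atoms in disconnected fragments form no pair
            continue
        t1, t2 = sorted((types[a], types[b]))  # order-independent descriptor
        pairs[(t1, d, t2)] += 1
    return pairs


def tanimoto(p, q):
    """Tanimoto coefficient on multisets: sum of minimum counts / sum of maximum counts."""
    union = sum((p | q).values())
    return sum((p & q).values()) / union if union else 0.0


def rank_matches(query, library):
    """Sort library entries by similarity to the query descriptors, best first."""
    scores = [(name, tanimoto(query, desc)) for name, desc in library.items()]
    return sorted(scores, key=lambda item: item[1], reverse=True)
```

For ethanol versus 1-propanol this sketch gives 2/7: the molecules share two atom pair descriptors out of seven in the combined multiset. A production tool would derive the molecular graph and richer atom types directly from the query SMILES.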
The sequence search tool was designed to help researchers select donor catalytic domains for domain exchange experiments. It supports flexible queries, allowing researchers to test hypotheses about which domain-domain interactions may be important for successful domain exchange. The tool takes as input a valid amino acid sequence and performs a BLAST search against a BLAST database containing all of the subunits in ClusterCAD.
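A hedged Python sketch of the two ends of such a search follows: a basic check that the query looks like a protein sequence, and parsing of BLAST tabular results ranked by bit score. It assumes hits in the standard 12-column tabular format (as written by `blastp -outfmt 6`). The accepted residue alphabet and the function names are assumptions, not ClusterCAD's actual validation rules.

```python
# Assumed alphabet: 20 standard residues plus common ambiguity/rare codes (B, Z, X, U, O).
VALID_RESIDUES = frozenset("ACDEFGHIKLMNPQRSTVWYBZXUO")

# Column order of BLAST's default tabular output (-outfmt 6).
OUTFMT6_FIELDS = ("qseqid", "sseqid", "pident", "length", "mismatch", "gapopen",
                  "qstart", "qend", "sstart", "send", "evalue", "bitscore")


def is_valid_protein(sequence):
    """True if the sequence, ignoring whitespace and case, uses only accepted residue letters."""
    residues = "".join(sequence.split()).upper()
    return bool(residues) and set(residues) <= VALID_RESIDUES


def parse_blast_tabular(text):
    """Parse -outfmt 6 lines into dicts, sorted by bit score (best hit first)."""
    hits = []
    for line in text.splitlines():
        if not line.strip() or line.startswith("#"):
            continue  # skip blank lines and comment lines
        row = dict(zip(OUTFMT6_FIELDS, line.split("\t")))
        for key in ("pident", "evalue", "bitscore"):
            row[key] = float(row[key])
        hits.append(row)
    return sorted(hits, key=lambda hit: hit["bitscore"], reverse=True)
```

Sorting by bit score rather than e-value keeps the ranking independent of database size, which is one reasonable choice when comparing hits from a single search.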
ClusterCAD: a computational platform for type I modular polyketide synthase design. Eng, C.H.*, Backman, T.W.H.*, Bailey, C.B., Magnan, C., Martin, H.G., Katz, L., Baldi, P., Keasling, J.D. Nucleic Acids Research, 2017 Oct. https://doi.org/10.1093/nar/gkx893 *co-first authors