One of the new features in Compound Discoverer 3.1 (as well as Proteome Discoverer 2.4) is the Scripting node. This node lets users enhance their workflows with custom calculations, for example in the form of R or Python scripts. Result tables from Compound Discoverer can be exported, read and processed by a script, and the results can be returned to Compound Discoverer as new columns or new tables. Please find the manual as well as examples below.
BY DOWNLOADING ANY SOFTWARE, SCRIPTS, TEMPLATES, DOCUMENTATION AND/OR OTHER MATERIALS (COLLECTIVELY “MATERIALS”) FROM THIS WEBSITE, YOU AND ANY COMPANY OR INSTITUTION YOU REPRESENT (COLLECTIVELY “YOU”) ACKNOWLEDGE AND AGREE AS FOLLOWS: (1) THE MATERIALS ARE PROVIDED “AS IS” WITHOUT WARRANTY OF ANY KIND, EXPRESS OR IMPLIED, AND (2) THERMO FISHER SCIENTIFIC INC., ITS AFFILIATES AND EMPLOYEES WILL NOT BE RESPONSIBLE FOR ANY DAMAGES ARISING FROM YOUR USE OF THE MATERIALS, INCLUDING BUT NOT LIMITED TO DAMAGES ASSOCIATED WITH LOSS OR CORRUPTION OF DATA, INACCURATE RESULTS, AND/OR DIMINISHED INSTRUMENT PERFORMANCE.
1.) Hello World (sort of, in R). A very simple example that demonstrates how to export a table (Compounds), read the Molecular Weight column, add a value (+H) and return the result as a new column to CD.
This example also shows how to register a node as a "standalone" scripting node, so that it appears in the list of the nodes like a "regular" node. Multiple scripting nodes can be installed that way and the usage can be simplified for the end user by hiding the underlying scripting mechanism.
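The calculation behind the Hello World example can be sketched in a few lines. The original script is in R; the Python sketch below only illustrates the idea and does not use the Scripting node's real file layout. The tab-separated text, the column names "Compounds ID" and "MW + H", and the proton mass used for "+H" are all assumptions made for illustration.

```python
import csv
import io

# Assumed value for "+H": the mass of a proton in Da.
PROTON_MASS = 1.007276

# Stand-in for an exported Compounds table (hypothetical column names and layout).
EXPORTED = "Compounds ID\tMolecular Weight\n1\t180.06339\n2\t194.08038\n"


def add_plus_h_column(table_text):
    """Read the table, add PROTON_MASS to each Molecular Weight, return the rows."""
    reader = csv.DictReader(io.StringIO(table_text), delimiter="\t")
    rows = []
    for row in reader:
        mw = float(row["Molecular Weight"])
        # New column holding the shifted mass, formatted to five decimals.
        row["MW + H"] = f"{mw + PROTON_MASS:.5f}"
        rows.append(row)
    return rows


rows = add_plus_h_column(EXPORTED)
for row in rows:
    print(row["Compounds ID"], row["Molecular Weight"], row["MW + H"])
```

In a real scripting node the rows would be written back to a file that Compound Discoverer imports; that step is described in the manual and is left out here.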
2.) CheckMark (R). An example of how to use Options for Columns, here PositionAfter and SpecialCellRenderer. This script creates a column with green check marks for all compounds with a molecular weight < 400. The new column is positioned right after the Molecular Weight column in the Compounds table.
3.) Calculate mass difference for Features (Python). This example script in Python calculates the mass difference between Molecular Weight and m/z * charge in the Features table. The result is returned as a new column.
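As a rough illustration of the calculation in example 3, the sketch below computes Molecular Weight minus m/z * charge for a small in-memory table. The dictionary keys and the new column name "Mass Difference" are hypothetical, and reading and returning the table through the Scripting node is omitted.

```python
def mass_difference(molecular_weight, mz, charge):
    """Return Molecular Weight minus m/z * charge, as described in the text."""
    return molecular_weight - mz * charge


# Hypothetical rows standing in for the Features table.
features = [
    {"Molecular Weight": 180.06339, "m/z": 181.07067, "Charge": 1},
    {"Molecular Weight": 342.11621, "m/z": 172.06538, "Charge": 2},
]

for feature in features:
    # Store the result under a new key, mirroring a new column in the table.
    feature["Mass Difference"] = mass_difference(
        feature["Molecular Weight"], feature["m/z"], feature["Charge"]
    )

for feature in features:
    print(round(feature["Mass Difference"], 5))
```

The sketch takes the charge as a positive number; how signed charges in negative mode should be handled is a choice the text does not specify.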
4.) Add m/z column to the Compounds table (R). This script creates a column with m/z values for each compound, positioned right after the Molecular Weight column in the Compounds table. To populate the "m/z" column, the script currently does the following for each compound: it finds the sample in which the compound has the highest abundance, looks for a molecular ion, and falls back to the ion with the highest abundance if no molecular ion was detected. This can of course be customized to select representative m/z values in a different way.
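The selection procedure in example 4 could look roughly like the following Python sketch (the original script is in R). The data layout, the field names, and the set of labels treated as "molecular ion" are assumptions for illustration only.

```python
# Assumed labels that count as a molecular ion; adjust to the labels in your data.
MOLECULAR_IONS = {"[M+H]+1", "[M-H]-1"}


def representative_mz(areas, ions):
    """Pick one m/z value for a compound.

    areas: dict mapping sample name to the compound's abundance in that sample.
    ions:  list of dicts with keys "sample", "ion", "mz" and "abundance".
    """
    # Step 1: the sample in which the compound is most abundant.
    best_sample = max(areas, key=areas.get)
    candidates = [ion for ion in ions if ion["sample"] == best_sample]
    if not candidates:
        return None
    # Step 2: prefer a molecular ion; otherwise fall back to all ions of that sample.
    molecular = [ion for ion in candidates if ion["ion"] in MOLECULAR_IONS]
    pool = molecular or candidates
    # Step 3: within the chosen pool, take the most abundant ion.
    return max(pool, key=lambda ion: ion["abundance"])["mz"]


areas = {"Sample A": 1.2e6, "Sample B": 3.4e6}
ions = [
    {"sample": "Sample A", "ion": "[M+H]+1", "mz": 181.0707, "abundance": 9.0e5},
    {"sample": "Sample B", "ion": "[M+Na]+1", "mz": 203.0526, "abundance": 2.5e6},
    {"sample": "Sample B", "ion": "[M+H]+1", "mz": 181.0708, "abundance": 8.0e5},
]
mz_with_molecular_ion = representative_mz(areas, ions)
mz_without_molecular_ion = representative_mz(areas, ions[:2])
print(mz_with_molecular_ion, mz_without_molecular_ion)
```

Changing MOLECULAR_IONS or the key function in step 3 is one way to customize which m/z value is treated as representative.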
5.) Add m/z and Delta Mass [ppm] columns to the Expected Compounds table (R). This script creates a column with m/z values for each expected compound, based on features detected in the sample with highest abundance for this compound. It also copies the delta mass in ppm for this feature to the Expected Compounds table.
6.) Calculate O/C and H/C ratio (R). This script calculates the O/C and the H/C ratio for each compound and returns the result as two new columns to the Compounds table.
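The ratio calculation in example 6 amounts to counting atoms in the elemental formula. The Python sketch below (the original script is in R) assumes formulas written as element symbols followed by optional counts, with or without spaces, such as "C6 H12 O6"; the function names are hypothetical.

```python
import re

# An element symbol (capital letter, optional lowercase letter) and an optional count.
ELEMENT_PATTERN = re.compile(r"([A-Z][a-z]?)(\d*)")


def element_counts(formula):
    """Return a dict of element symbol to atom count for a simple formula string."""
    counts = {}
    for symbol, number in ELEMENT_PATTERN.findall(formula):
        # A missing count means one atom, e.g. the "N" in "C5 H5 N".
        counts[symbol] = counts.get(symbol, 0) + (int(number) if number else 1)
    return counts


def oc_hc_ratios(formula):
    """Return (O/C, H/C), or (None, None) if the formula contains no carbon."""
    counts = element_counts(formula)
    carbon = counts.get("C", 0)
    if carbon == 0:
        return None, None
    return counts.get("O", 0) / carbon, counts.get("H", 0) / carbon


glucose = oc_hc_ratios("C6 H12 O6")
caffeine = oc_hc_ratios("C8H10N4O2")
print(glucose, caffeine)
```

The pattern reads "Cl" as chlorine rather than carbon, but it does not handle brackets, isotope labels or charges; formulas of that kind would need a fuller parser.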
7.) Show KEGG IDs in the Compounds table (R). This script creates a new column "KEGG ID (top mzCloud hit)" in the Compounds table. It looks up the KEGG ID of the top mzCloud hit for each compound in the mzCloud Results table and copies it into the new column.
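The lookup in example 7 is essentially "best hit per compound". The Python sketch below (the original script is in R) assumes that the top hit is the one with the highest match score; the field names "compound_id", "score" and "kegg_id" are hypothetical and do not reflect the actual column names of the mzCloud Results table.

```python
def top_hit_kegg_ids(mzcloud_hits):
    """Map each compound ID to the KEGG ID of its highest-scoring mzCloud hit."""
    best = {}
    for hit in mzcloud_hits:
        compound_id = hit["compound_id"]
        # Keep the hit if it is the first one seen or beats the current best score.
        if compound_id not in best or hit["score"] > best[compound_id]["score"]:
            best[compound_id] = hit
    return {compound_id: hit["kegg_id"] for compound_id, hit in best.items()}


# Hypothetical rows standing in for the mzCloud Results table.
hits = [
    {"compound_id": 1, "score": 78.2, "kegg_id": "C00031"},
    {"compound_id": 1, "score": 91.5, "kegg_id": "C00267"},
    {"compound_id": 2, "score": 88.0, "kegg_id": ""},
]
kegg_by_compound = top_hit_kegg_ids(hits)
print(kegg_by_compound)
```

A top hit without a KEGG ID yields an empty string here, so the corresponding cell in the new column would stay empty.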
8.) Calculate logP values (R). This script creates a new column "logP" in the Compounds table. It uses OpenBabel to calculate the logP value for the currently assigned structure for each compound.
9.) Export to BioCyc (R). Compound Discoverer includes the capability to map compounds to BioCyc pathways and to visualize pathways with omics data overlay directly in CD. However, there is much more you can do with BioCyc, such as the Cellular Overview with interactive omics data overlay (screenshot below), the Omics Dashboard, a pathway collage, etc.
This script exports Compound Discoverer results in the BioCyc "Omics data file" format for import into BioCyc. Result files (.csv) are written to the folder in which the cdResult file is located. A separate CSV file is generated for each BioCyc database (e.g. HUMAN, MOUSE, ECOLI, etc.) that was selected during the mapping process in CD. The result table is checked for duplicated BioCyc IDs; of each set of duplicates, only the entry with the highest overall peak area is kept.
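The duplicate handling described above can be sketched as follows in Python (the original script is in R). The sketch assumes that "overall peak area" means the sum of a compound's areas across all samples; the field names and example IDs are hypothetical, and writing the BioCyc CSV files is left out.

```python
def drop_duplicate_ids(rows):
    """Keep one row per BioCyc ID: the one with the largest summed peak area."""
    best = {}
    for row in rows:
        current = best.get(row["biocyc_id"])
        # Assumption: overall peak area is the sum of the per-sample areas.
        if current is None or sum(row["areas"]) > sum(current["areas"]):
            best[row["biocyc_id"]] = row
    return list(best.values())


# Hypothetical mapped compounds; "areas" holds one peak area per sample.
rows = [
    {"biocyc_id": "GLC", "name": "Compound 12", "areas": [1.0e6, 2.0e6]},
    {"biocyc_id": "GLC", "name": "Compound 47", "areas": [4.0e6, 1.5e6]},
    {"biocyc_id": "CIT", "name": "Compound 03", "areas": [7.0e5, 9.0e5]},
]
unique_rows = drop_duplicate_ids(rows)
for row in unique_rows:
    print(row["biocyc_id"], row["name"], sum(row["areas"]))
```

In case of an exact tie, this sketch keeps the row that was encountered first.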