There was some discussion about this at the KNIME UGM last week, and then Simon posted the following request to the RDKit, Indigo, and CDK forums:
With the RDKit, Indigo, and CDK chemistry nodes becoming more widely used and expanded, and often used together within a workflow, are there any plans to allow the Molecule to CDK node to accept Indigo and RDKit molecules, to save on the number of translator nodes required?
I think this would be a great benefit to KNIME users and could (perhaps) reduce the amount of duplication of work currently going on among the open-source node packages.
An outline of how this could work, as something to poke holes in (apologies in advance if my rough outline doesn't feel like Java... that's the Python/C++ programmer showing):
We create a type KnimeMolCell that supports getSmilesValue() and getSdfValue() as well as some new methods: hasSmilesValue(), hasSdfValue(), hasCustomValue(), getCustomValue() and setCustomValue() [probably need to refine the names a bit]. The custom value methods are used to return specialized molecule types like RDKit, Indigo, CDK, Maestro, etc. They each take an argument defining which custom type is of interest.
Nodes that use these cells can rely on them having at least a SMILES or SDF value to work with. They can then check to see if there is a more specialized format available and, if it's not there, add it. For example, an RDKit node would start by checking if there is an RDKit value available; if so, it would use that value. If not, it would create one based on the SMILES or SDF data and store it on the cell.
There would, obviously, be a strong constraint on nodes working with these types: modifying the SMILES or SDF once the node is created would not, under any circumstances, be allowed.
This would have the advantage that RDKit, Indigo, and CDK nodes could be chained without converter nodes in the middle, and without the nodes having to repeatedly re-process the molecules.
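To make the outline a bit more concrete, here is a rough Java sketch of the cell and of the check-then-create pattern a node would follow. Everything in it is an assumption for illustration: KnimeMolCell does not exist in KNIME, the string keys for the custom types are just one possible way to pass "which custom type is of interest", and FakeToolkitMol and ToolkitNodeSketch are hypothetical stand-ins for a real toolkit molecule and a real node.

```java
import java.util.HashMap;
import java.util.Map;

// Hypothetical cell type from the outline above; not an existing KNIME class.
final class KnimeMolCell {
    // SMILES/SDF are final: they can never change after the cell is created.
    private final String smiles; // null if only SDF was supplied
    private final String sdf;    // null if only SMILES was supplied
    // Toolkit-specific molecule objects, keyed by a type name (an assumption).
    private final Map<String, Object> customValues = new HashMap<>();

    KnimeMolCell(String smiles, String sdf) {
        if (smiles == null && sdf == null) {
            throw new IllegalArgumentException("need a SMILES or an SDF value");
        }
        this.smiles = smiles;
        this.sdf = sdf;
    }

    boolean hasSmilesValue() { return smiles != null; }
    boolean hasSdfValue() { return sdf != null; }
    String getSmilesValue() { return smiles; }
    String getSdfValue() { return sdf; }

    boolean hasCustomValue(String type) { return customValues.containsKey(type); }
    Object getCustomValue(String type) { return customValues.get(type); }
    void setCustomValue(String type, Object value) { customValues.put(type, value); }
}

// Stand-in for a toolkit molecule; a real node would build its own object here.
final class FakeToolkitMol {
    final String source;
    FakeToolkitMol(String source) { this.source = source; }
}

// Hypothetical node logic: reuse the cached value if present, else create it.
final class ToolkitNodeSketch {
    static final String TYPE = "RDKit"; // key name is an assumption
    static int parseCount = 0;          // counts how often we had to re-parse

    static FakeToolkitMol moleculeFor(KnimeMolCell cell) {
        if (cell.hasCustomValue(TYPE)) {
            return (FakeToolkitMol) cell.getCustomValue(TYPE);
        }
        // Fall back to the guaranteed generic representation.
        String text = cell.hasSmilesValue() ? cell.getSmilesValue() : cell.getSdfValue();
        parseCount++;
        FakeToolkitMol mol = new FakeToolkitMol(text);
        cell.setCustomValue(TYPE, mol); // store it for downstream nodes
        return mol;
    }
}
```

Calling ToolkitNodeSketch.moleculeFor(cell) twice on the same cell would parse only once; the second call just returns the stored object. A real implementation would also have to deal with serialization and thread safety, which this sketch ignores.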
comments? thoughts?
-greg