suggestion: Advanced regular expression node

azhdari
Joined: 09/08/2010

Regular expressions are a powerful way to clean and prepare data, especially text and web data. It would be excellent to have an advanced regular expression node in KNIME.

I use a commercial regex tool (PowerGREP) for my task, but it's based on grep (http://en.wikipedia.org/wiki/Grep), which is fast, powerful, and publicly available.

Regular expressions can be used for searching, filtering, replacing, merging, and splitting data with complex patterns.
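To make those operations concrete, here is a minimal sketch in Python's standard `re` module. It is not KNIME-specific, and the sample rows and the `id=(\d+)` pattern are made up purely for illustration.

```python
import re

# Hypothetical sample rows and pattern, for illustration only.
rows = ["id=17; name=Ann", "comment only", "id=42; name=Bob"]
pattern = re.compile(r"id=(\d+)")

# Searching: pull the captured digits out of the first row.
first_id = pattern.search(rows[0]).group(1)

# Filtering: keep only the rows that contain a match.
matching = [row for row in rows if pattern.search(row)]

# Replacing: mask every matched id.
masked = [pattern.sub("id=#", row) for row in rows]

# Splitting: break a row into fields on ";" plus optional whitespace.
fields = re.split(r";\s*", rows[0])

# Merging: join the fields back together with a different separator.
merged = " | ".join(fields)
```

The same pattern object is reused for every operation, so it only has to be compiled once.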

thor
Joined: 02/12/2007

You can already use regular expressions in the Row Filter node as well as in the String Manipulation node. If this is not enough, the Java Snippet node is the Swiss Army knife.

azhdari
Joined: 09/08/2010

Hi, thanks for the reply.

Those are very basic solutions. I need, for example, to find 40,000 regex patterns and replace them in my data (about 300k rows).

Row Filter can match only one regex pattern at a time, and String Replace (Dictionary) matches literal strings, not regex patterns.

thor
Joined: 02/12/2007

40k regular expressions sounds like a lot... You could use two loops, one over the regexes and the other over the data, but I suspect that this will be very slow. I'll check whether a dictionary replacer with regular expressions would be of general use.
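For what it's worth, the two-loop idea can be sketched outside KNIME in a few lines of Python. This is only an illustration under assumptions: the function names `compile_rules` and `apply_rules` and the sample rules and rows are hypothetical, and it is not how any KNIME node is implemented. Compiling each pattern once up front avoids re-parsing it for every row, but the work still grows with rows times patterns, so 40k patterns over 300k rows would likely remain slow.

```python
import re

def compile_rules(rules):
    """Compile (pattern, replacement) pairs once so each regex is parsed a single time."""
    return [(re.compile(pattern), replacement) for pattern, replacement in rules]

def apply_rules(rows, compiled_rules):
    """Outer loop over the data, inner loop over the regexes; rules are applied in order."""
    result = []
    for text in rows:
        for regex, replacement in compiled_rules:
            text = regex.sub(replacement, text)
        result.append(text)
    return result

# Hypothetical dictionary of regex -> replacement, for illustration only.
rules = [
    (r"\s+", " "),                    # collapse runs of whitespace
    (r"\bcolour\b", "color"),         # normalize a spelling
    (r"(\d+)\s*kg\b", r"\1 kilograms"),  # expand a unit, keeping the number
]
rows = ["red   colour", "12kg of  apples", "no match here"]

cleaned = apply_rules(rows, compile_rules(rules))
```

Because the rules run in order, a later pattern sees the output of the earlier ones, so rule order matters when patterns can overlap.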