Here we present nodes and workflows for processing next-generation sequencing (NGS) results. Most of these nodes are not specific to NGS data and may be useful in other contexts as well.
The workflows presented here come with sample data and are pre-executed. Rather than only showing how to use a specific node (which is what the node help is for), each one solves a specific NGS-related problem.
Last but not least, no guarantee of any kind is given for any of the information here. We are very happy to discuss anything described on these pages and welcome contributions from other KNIME NGS users and developers.
Kind regards,
Bernd (baj)
About the Nodes
Name of the node | Description |
---|---|
IO | |
BEDGraphWriter | Writes out BED files. |
Bio Sequence Reader | Reads sequence information (RNA, DNA, protein) from FASTA, GenBank, UniProt, EMBL, and INSDseq files. Annotations are not read. |
FQCReader | Reads a file containing multiple tables. Works together with FQCRow2Table, Table2FQCrow, and FQCWriter. |
FQCWriter | Writes multiple tables to a file. Works together with FQCRow2Table, Table2FQCrow, and FQCReader. |
FastQReader | Reads a FASTQ file into a table. Each FASTQ entry (i.e., 4 lines) is translated into one row. This node uses BioJava. |
FastQWriter | Writes a table out to a FASTQ file. This node uses BioJava. |
GenbankAnnotReader | Reads only the annotation information from a GenBank file. See the example workflow below on how to convert GenBank to GFF. |
ROIReader | Reads files in the regions of interest (ROI) format. See the example workflow and the node annotation for further information. |
SAMReader | Reads SAM or BAM files. |
ROI | |
PositionStr2Position (to be deprecated) | Part of the ROI concept. |
RegionOverlap | Identifies overlapping regions. This node is usually used within a sub-workflow that splits the data set by chromosome. The data from the first input port is retained. |
TOOLS | |
Bash | Executes commands in bash or cmd.exe (see the inline documentation). |
CmdwInput | Similar to the Bash node, except that it takes an input table and executes the strings contained in that table. |
CollectionLinePlot | Displays SVG graphs in a table view. Uses numerical collections. |
CountSorted | Counts occurrences within a sorted column. It is faster than the Value Counter node and useful for counting identical reads from a FASTQ file once they are sorted by sequence. It also uses only a minimal amount of memory. |
FQCRow2Table | Each row coming from the FQCReader represents a table. This node converts such a row into a KNIME table. Works together with FQCWriter, Table2FQCrow, and FQCReader. |
GetSequenceName | Gets the name of a sequence object as a string. |
IGVview | Enables linking to IGV through a table view. |
JoinSorted | Creates a full outer join of two sorted tables. |
NGSConcat | Concatenates two tables with identical table specs. |
Table2FQCrow | Converts a whole table into an FQC row to be written out using FQCWriter. Works together with FQCRow2Table, FQCWriter, and FQCReader. |
TableSpecs | Retrieves simple statistics for a table and its columns: column type, index, and lower and upper bounds (first output table), and the number of rows and columns (second output table). This is very similar to the KNIME nodes "Extract Table Dimension" and "Extract Table Spec". |
Wait | Does nothing other than synchronize executions. The same can be achieved using the flow variable ports of existing nodes. |
DEPRECATED | |
GetRegions (deprecated) | The ROI concept is now implemented in SeqAn (http://www.seqan.de/projects/ngs-roi/). |
Seq2PosIncidents (deprecated) | Part of the ROI concept. |
OneString (deprecated) | Superseded by a KNIME node (TableCreator). |
PileupCounts (deprecated) | Part of the ROI concept. |
GroupByLoopStart (deprecated) | Superseded by the KNIME version... |
AdapterRemoval (deprecated) | The algorithm is now implemented in SeqAn (seqan.de, https://projets.pasteur.fr/projects/pf2workflows/repository/show/jagla/apps/clean_ngs). |
AdapterRemovalAdv (deprecated) | See AdapterRemoval above. |
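The FastQReader entry states that one four-line FASTQ entry becomes one table row. The Python sketch below illustrates that grouping; it assumes plain, unwrapped four-line records and is not the node's BioJava-based implementation. `FastqRow` and `fastq_rows` are hypothetical names used only for illustration.

```python
from typing import Iterable, Iterator, NamedTuple


class FastqRow(NamedTuple):
    """One table row built from one four-line FASTQ entry (hypothetical type)."""
    name: str
    sequence: str
    quality: str


def fastq_rows(lines: Iterable[str]) -> Iterator[FastqRow]:
    """Group FASTQ lines into four-line entries and yield one row per entry.

    Assumes unwrapped records: header, sequence, '+' line, quality.
    """
    it = (line.rstrip("\r\n") for line in lines)
    while True:
        header = next(it, None)
        if header is None:
            return  # no more entries
        seq, plus, qual = next(it, None), next(it, None), next(it, None)
        if qual is None:
            raise ValueError(f"truncated FASTQ entry: {header}")
        if not header.startswith("@") or not plus.startswith("+"):
            raise ValueError(f"malformed FASTQ entry: {header}")
        if len(seq) != len(qual):
            raise ValueError(f"sequence/quality length mismatch: {header}")
        yield FastqRow(header[1:], seq, qual)
```

Applied to `"@r1\nACGT\n+\nIIII\n".splitlines()`, the generator yields a single row, `FastqRow(name='r1', sequence='ACGT', quality='IIII')`.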
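CountSorted can work with minimal memory because, in a sorted column, equal values form contiguous runs, so only the current value and its count need to be kept. The sketch below shows this idea with Python's `itertools.groupby`; `count_sorted` is a hypothetical name, not the node's code.

```python
from itertools import groupby
from typing import Iterable, Iterator, Tuple, TypeVar

T = TypeVar("T")


def count_sorted(values: Iterable[T]) -> Iterator[Tuple[T, int]]:
    """Yield (value, count) for each run of equal, adjacent values.

    On a sorted column every distinct value forms exactly one run, so only the
    current run is held in memory, unlike a dictionary-based counter.
    """
    for value, run in groupby(values):
        yield value, sum(1 for _ in run)
```

For `["AAA", "AAA", "ACG", "TTT", "TTT", "TTT"]` this yields `("AAA", 2)`, `("ACG", 1)`, `("TTT", 3)`. On unsorted input such as `["A", "B", "A"]`, the value `"A"` is reported twice with count 1, which is why the reads must be sorted by sequence first.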
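A full outer join of two sorted tables, as done by JoinSorted, can be computed in a single merge pass without building a hash table. The following sketch assumes both tables are sequences of tuples sorted ascending by the value in their first column (an assumption; the node's actual key handling may differ), and `join_sorted` is a hypothetical name. Rows without a partner are paired with `None`, and duplicate keys produce every left/right combination, as in a relational full outer join.

```python
from itertools import groupby
from operator import itemgetter
from typing import Any, Callable, Iterable, Iterator, Optional, Tuple

Row = Tuple[Any, ...]


def join_sorted(
    left: Iterable[Row],
    right: Iterable[Row],
    key: Callable[[Row], Any] = itemgetter(0),
) -> Iterator[Tuple[Optional[Row], Optional[Row]]]:
    """Full outer join of two tables that are both sorted ascending by `key`."""
    left_groups, right_groups = groupby(left, key), groupby(right, key)
    lg, rg = next(left_groups, None), next(right_groups, None)
    while lg is not None or rg is not None:
        if rg is None or (lg is not None and lg[0] < rg[0]):
            # Key only present on the left: pair each row with None.
            for row in lg[1]:
                yield row, None
            lg = next(left_groups, None)
        elif lg is None or rg[0] < lg[0]:
            # Key only present on the right.
            for row in rg[1]:
                yield None, row
            rg = next(right_groups, None)
        else:
            # Equal keys: emit every left/right combination, then advance both.
            right_rows = list(rg[1])
            for lrow in lg[1]:
                for rrow in right_rows:
                    yield lrow, rrow
            lg, rg = next(left_groups, None), next(right_groups, None)
```

Joining `[("chr1", 10), ("chr2", 5)]` with `[("chr2", "x"), ("chr3", "y")]` yields `(("chr1", 10), None)`, `(("chr2", 5), ("chr2", "x"))`, and `(None, ("chr3", "y"))`. Each input is read once, so memory use is bounded by the largest group of equal keys.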
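Assuming RegionOverlap keeps those regions of the first input that overlap at least one region of the second input, the check for a single chromosome (matching the per-chromosome sub-workflow described above) could look like the sketch below. It assumes half-open `[start, end)` coordinates as in BED files and uses sorting plus a running maximum of end coordinates, which is not necessarily the algorithm the node implements; `overlapping_regions` is a hypothetical name.

```python
from bisect import bisect_left
from itertools import accumulate
from typing import List, Tuple

Region = Tuple[int, int]  # half-open [start, end), BED-style (assumption)


def overlapping_regions(first: List[Region], second: List[Region]) -> List[Region]:
    """Return the regions of `first` that overlap at least one region of `second`.

    Both lists are assumed to lie on the same chromosome. Two half-open
    regions overlap when each one starts before the other one ends.
    """
    ordered = sorted(second)
    starts = [start for start, _ in ordered]
    # max_end[i] = largest end among the first i + 1 regions in start order.
    max_end = list(accumulate((end for _, end in ordered), max))
    kept = []
    for start, end in first:
        n = bisect_left(starts, end)  # regions of `second` starting before `end`
        if n and max_end[n - 1] > start:
            kept.append((start, end))
    return kept
```

With `first = [(0, 10), (20, 30), (40, 50)]` and `second = [(5, 8), (30, 35), (45, 60)]`, the result is `[(0, 10), (40, 50)]`: `(20, 30)` only touches `(30, 35)` at its excluded end point.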
Workflows
Name of the workflow | Description |
---|---|
FastQ-stats | Descriptive statistics of Illumina results in FASTQ format (usually computed before mapping). |
Genbank-GFF conversion | Example workflow showing how to convert GenBank files into GFF format. |
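A common descriptive statistic for FASTQ data is the mean Phred quality at each read position. The sketch below computes it from the quality strings; it is an illustration and not necessarily what the FastQ-stats workflow calculates. The ASCII offset is an assumption you must match to your data (33 for Sanger and Illumina 1.8+, 64 for older Illumina pipelines), and `mean_quality_per_position` is a hypothetical name.

```python
from typing import Iterable, List


def mean_quality_per_position(qualities: Iterable[str], offset: int = 33) -> List[float]:
    """Mean Phred score at each read position across all quality strings.

    Reads may differ in length; each position is averaged over the reads
    that actually reach it.
    """
    totals: List[int] = []
    counts: List[int] = []
    for qual in qualities:
        for pos, ch in enumerate(qual):
            if pos == len(totals):  # first read long enough to reach this position
                totals.append(0)
                counts.append(0)
            totals[pos] += ord(ch) - offset
            counts[pos] += 1
    return [t / c for t, c in zip(totals, counts)]
```

For the Phred+33 strings `["II#", "5I"]` the result is `[30.0, 40.0, 2.0]`, since `I` encodes 40, `5` encodes 20, and `#` encodes 2.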
Workflows showing the use of the nodes
Name of the workflow | Description |
---|---|
FASTQReader | A single node reading data from NCBI SRA (SRR001356, Illumina sequencing of a mouse brain transcript fragment library). |
FASTQWriter | A simple workflow that reads a FASTQ file, truncates the sequence and quality strings to their first position, and writes out the result. |
Count sorted | A simple workflow that reads a FASTQ file, sorts the data by sequence, applies both the Value Counter and the CountSorted nodes, and then sorts by the counts. |
GetRegions | A simple workflow that uses SAMReader, Seq2PosIncidents, CountSorted, PositionStr2Position, and GetRegions. |
RegionOverlapp | Intersects annotations from the UCSC database with regions of interest. |
Bash example | Executes a command (ls) on the command line. |
CmdwInput example | Executes a command (ls) on the command line. |
Roi workflow | A workflow showing some features of the ROI (region of interest) concept in the context of next-generation sequencing projects. |
Advanced ROI workflow | An advanced workflow showing how to work with multiple miRNA experiments, including how to display multiple samples. |
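The truncate-and-write step of the FASTQWriter example workflow can be sketched in a few lines of Python. This only illustrates the resulting FASTQ text; it does not reproduce the node's BioJava-based writer, and `to_fastq` with its `keep` parameter is a hypothetical name.

```python
from typing import Iterable, Tuple


def to_fastq(records: Iterable[Tuple[str, str, str]], keep: int = 1) -> str:
    """Format (name, sequence, quality) records as FASTQ text.

    Sequence and quality are truncated to their first `keep` positions,
    mirroring the truncation step of the FASTQWriter example workflow.
    """
    return "".join(
        f"@{name}\n{seq[:keep]}\n+\n{qual[:keep]}\n" for name, seq, qual in records
    )
```

For example, `to_fastq([("r1", "ACGT", "IIII")])` returns `"@r1\nA\n+\nI\n"`. Truncating sequence and quality together keeps the two strings the same length, which every valid FASTQ entry requires.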
Source Code
The source code can be accessed at https://anonymous:knime@community.knime.org/svn/nodes4knime/trunk/org.pasteur.
License
The NGS nodes are released under GPLv2.