Gene and Protein Tag Cloud Example

Tag Cloud Example Workflow

The workflow starts with a list of documents that were downloaded from PubMed, parsed, and saved as a data table beforehand. The data is available as a drop file in the corresponding drop directory.

The documents are assigned to two categories and then split into two sets based on those category assignments. The first set consists of documents about humans and AIDS, the second of documents about mice and cancer.
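The split described above can be sketched outside the workflow in plain Python. This is only an illustration of the idea, not the workflow's own implementation: the document titles, the category labels `human_aids` and `mouse_cancer`, and the helper `split_by_category` are all hypothetical.

```python
from collections import defaultdict

# Hypothetical documents; titles and category labels are invented for illustration.
documents = [
    {"title": "doc A", "category": "human_aids"},
    {"title": "doc B", "category": "mouse_cancer"},
    {"title": "doc C", "category": "human_aids"},
]


def split_by_category(docs):
    """Group documents into one list per category label."""
    groups = defaultdict(list)
    for doc in docs:
        groups[doc["category"]].append(doc)
    return dict(groups)


sets = split_by_category(documents)
for category, docs in sets.items():
    print(category, [d["title"] for d in docs])
```

Each resulting set can then be processed separately, as the two branches of the workflow do.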

Part-of-speech tags and gene names are recognized and assigned by the corresponding taggers (the POS tagger and the ABNER tagger), so that a color can be assigned to each term later on, based on its tag type.

After transformation into a bag of words, the data is preprocessed by various filters. Then the assigned tags are extracted for each term and converted to strings (for coloring purposes). Afterwards, the term frequencies are computed and a color is assigned to each term based on its tag. Finally, the Tag Cloud is used to visualize the terms.
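The frequency and coloring steps above can be approximated in a few lines of Python. This is a rough sketch under stated assumptions, not what the workflow nodes do internally: the sample terms, the tag names in `GENE_TAGS`, and the helper `build_cloud` are hypothetical, and the red/gray scheme simply mirrors the coloring of the resulting tag cloud.

```python
from collections import Counter

# Hypothetical (term, tag) pairs, as they might look after tagging and filtering.
tagged_terms = [
    ("BRCA1", "PROTEIN"), ("tumor", "NN"), ("BRCA1", "PROTEIN"),
    ("mouse", "NN"), ("p53", "PROTEIN"), ("tumor", "NN"), ("BRCA1", "PROTEIN"),
]

# Assumed set of tag values that mark a term as a gene or protein.
GENE_TAGS = {"PROTEIN", "DNA", "RNA"}


def build_cloud(pairs):
    """Return (term, frequency, color) triples, most frequent term first."""
    counts = Counter(term for term, _ in pairs)
    tags = dict(pairs)  # if a term carries several tags, the last one wins
    return [
        (term, freq, "red" if tags[term] in GENE_TAGS else "gray")
        for term, freq in counts.most_common()
    ]


cloud = build_cloud(tagged_terms)
for term, freq, color in cloud:
    print(term, freq, color)
```

In a tag cloud, the frequency would typically drive the font size of each term, while the color shows at a glance whether the term was tagged as a gene or protein.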

This workflow requires the Text Processing plugin.

Download workflow: Text Processing tag cloud example workflow

Tag Cloud of the most frequent terms. Words are colored red if they represent genes or proteins, and gray otherwise.