Classification Example Workflow
The workflow starts with a list of documents that were downloaded from PubMed, parsed, and saved as a data table beforehand. The data is available as a drop file in the corresponding drop directories.
The documents are assigned to two categories and split into two sets based on those assignments. The first set consists of documents about human and AIDS, the second set consists of documents about mouse and cancer.
After transformation into a bag of words, the data is preprocessed by various filters. Then the most important keywords are extracted and, based on these keywords, the documents are transformed into document vectors.
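Outside of the workflow nodes, the step from raw text to document vectors can be illustrated with a minimal Python sketch. It is not what the Text Processing plugin does internally: the stop-word list, the sample sentences, and the function names are hypothetical, and keyword extraction is approximated here by simply keeping the terms that occur in the most documents.

```python
import re
from collections import Counter

# Hypothetical stop-word list, for illustration only.
STOP_WORDS = {"the", "a", "of", "in", "and", "is", "to", "with"}

def bag_of_words(text):
    """Lower-case the text, tokenize it, and drop stop words and very short tokens."""
    tokens = re.findall(r"[a-z0-9][a-z0-9-]*", text.lower())
    return Counter(t for t in tokens if t not in STOP_WORDS and len(t) > 2)

def top_keywords(bags, k):
    """Keep the k terms found in the most documents (ties broken alphabetically).

    Assumption: document frequency stands in for the keyword extraction step.
    """
    doc_freq = Counter()
    for bag in bags:
        doc_freq.update(bag.keys())
    ranked = sorted(doc_freq.items(), key=lambda item: (-item[1], item[0]))
    return [term for term, _ in ranked[:k]]

def document_vector(bag, keywords):
    """Binary vector: 1 if the keyword occurs in the document, otherwise 0."""
    return [1 if term in bag else 0 for term in keywords]

# Made-up sample documents, not taken from the PubMed data set.
docs = [
    "Tumor growth in mice with cancer",
    "Cancer therapy reduces tumor size in mice",
    "AIDS progression in human patients",
]
bags = [bag_of_words(d) for d in docs]
keywords = top_keywords(bags, 3)
vectors = [document_vector(b, keywords) for b in bags]
print(keywords)
print(vectors)
```

Each document ends up as a fixed-length list of numbers, one entry per keyword, which is the form a classifier expects as input.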
The document vectors are a numerical representation of the documents and are then used for classification with a decision tree, a support vector machine, and a k-nearest-neighbor classifier.
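To show how a classifier can operate on such vectors, here is a small sketch of k-nearest-neighbor classification in plain Python. The toy vectors, the keyword columns, the label names, and the choice of Euclidean distance are assumptions made for illustration; the workflow itself uses the corresponding learner and predictor nodes.

```python
import math
from collections import Counter

def euclidean(a, b):
    """Euclidean distance between two equal-length vectors."""
    return math.sqrt(sum((x - y) ** 2 for x, y in zip(a, b)))

def knn_predict(train_vectors, train_labels, query, k=3):
    """Return the majority label among the k training vectors closest to the query."""
    order = sorted(
        range(len(train_vectors)),
        key=lambda i: euclidean(train_vectors[i], query),
    )
    votes = Counter(train_labels[i] for i in order[:k])
    return votes.most_common(1)[0][0]

# Hypothetical toy data: the columns stand for the keywords (tumor, mice, hiv).
train_vectors = [[1, 1, 0], [1, 0, 0], [0, 1, 0], [0, 0, 1], [0, 0, 1]]
train_labels = [
    "mouse_cancer", "mouse_cancer", "mouse_cancer",
    "human_aids", "human_aids",
]

print(knn_predict(train_vectors, train_labels, [1, 1, 0], k=3))
print(knn_predict(train_vectors, train_labels, [0, 0, 1], k=3))
```

A new document is assigned the category that dominates among its nearest neighbors in keyword space, so the quality of the result depends directly on which keywords were kept.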
This workflow requires the Text Processing plugin.
The decision tree view shows the terms with the highest information gain, for example IL-18, tumor, and mice.
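Information gain measures how much knowing whether a term occurs in a document reduces the uncertainty about its category. The following Python sketch computes it for a single binary term feature. The labels and term indicators are made up, and the decision tree learner may use a different split criterion depending on its configuration.

```python
import math
from collections import Counter

def entropy(labels):
    """Shannon entropy (in bits) of a non-empty list of class labels."""
    total = len(labels)
    return -sum(
        (n / total) * math.log2(n / total) for n in Counter(labels).values()
    )

def information_gain(has_term, labels):
    """Entropy reduction obtained by splitting the documents on term presence."""
    with_term = [l for h, l in zip(has_term, labels) if h]
    without_term = [l for h, l in zip(has_term, labels) if not h]
    remainder = sum(
        len(part) / len(labels) * entropy(part)
        for part in (with_term, without_term)
        if part  # skip an empty side of the split
    )
    return entropy(labels) - remainder

# Hypothetical labels for four documents.
labels = ["mouse_cancer", "mouse_cancer", "human_aids", "human_aids"]

# A term that occurs only in the mouse/cancer documents separates the classes fully.
gain_perfect = information_gain([1, 1, 0, 0], labels)
# A term that occurs equally often in both classes tells us nothing.
gain_useless = information_gain([1, 0, 1, 0], labels)
print(gain_perfect, gain_useless)
```

Terms with a high gain end up near the root of the tree, which is why they are the ones that stand out in the decision tree view.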