Palladian Nodes for KNIME (trusted extension)

Palladian is a Java-based toolkit which provides functionality to perform typical Internet Information Retrieval tasks. It provides a collection of algorithms for text processing focused on classification, extraction of various types of information, and retrieval.

The nodes are intended to integrate with existing KNIME nodes, such as the KNIME Textprocessing and KNIME XML-Processing nodes.

The growing collection of Palladian KNIME nodes makes Palladian’s capabilities available directly within KNIME, whether to complement and extend existing workflows or to prototype quickly without writing any code. The current version features the following nodes:

Installation instructions for the nodes can be found here: http://tech.knime.org/community

More information about the Palladian toolkit is available here: http://palladian.ws/

If you have any questions, comments, or problems, we are happy to hear from you: mail@palladian.ws

License

The Palladian extension is released under the Palladian Free Software License Version 2.1.

The Palladian KNIME Nodes were created by Philipp Katz, Klemens Muthmann, David Urbansky; 2011 – 2016.

There’s even more — check out the Selenium Nodes!

For advanced web scraping, task automation, and web application testing, the Selenium Nodes allow you to control your browser from KNIME.

Version history

Note: The given date shows when the mentioned modification was added to the nightly build. The stable versions usually follow later.

  • [2016-01-15] Added FreeGeoIP node
  • [2016-01-04] Added RankingService for Hacker News
  • [2016-01-04] Missing value handling in RankingServices node
  • [2015-12-20] Added new column-based distance calculation node
  • [2015-12-09] Fixed proxy issue in HttpRetriever node
  • [2015-12-01] Removed obsolete RankingServices: Friendfeed Stats, Friendfeed Aggregated Stats, Twitter
  • [2015-12-01] Removed obsolete WebSearchers: WebKnox News
  • [2015-11-01] UrlDomainExtractor node to extract domain from URLs, optionally without subdomains
  • [2015-09-17] Replaced "accept self-signed certificates" by "accept all certificates" option in HttpRetriever
  • [2015-06-22] HttpRetriever also accepts StringValues as HTTP entity and allows specifying an arbitrary content type
  • [2015-06-22] Additional preprocessing options for TextClassifierLearner node: stemming, stop word removal for German and English language
  • [2015-06-01] Setting for HttpRetriever to allow self-signed SSL certificates
  • [2015-05-31] Removed temporary debugging code in HtmlParser, which was causing an exception with an invalid encoding string
  • [2015-05-28] HtmlParser node additionally accepts binary object cells as input
  • [2015-05-27] Improve missing value handling in FeedDiscovery node
  • [2015-05-23] Cookie support for new HttpRetriever node (optional input and output tables)
  • [2015-05-23] Ability to specify HTTP methods in new HttpRetriever node by input column
  • [2015-05-23] HttpResultDataExtractor node optionally creates a binary instead of a string cell
  • [2015-05-23] New HttpRetrieverNode can send binary data, which can be specified through an optional input column
  • [2015-05-23] Added FormEncodedHttpEntityCreator node to convert key-value data to form-encoded input for HttpRetriever
  • [2015-05-23] Possibility to input HTTP headers in HttpRetriever requests.
  • [2015-05-23] Mark old HttpRetriever node as deprecated
  • [2015-05-01] Change default file extension for text classifier models from '.gz' to 'palladianDictionaryModel', ability to drop models and create appropriate TextClassifierModelReader dialog
  • [2015-04-30] Stop training via TextClassifierLearner when memory is getting full (using KNIME's MemoryWarningSystem)
  • [2015-04-20] Fix guessing of category column in TextClassifierLearner node
  • [2015-04-17] Better handling of missing values in HttpRetriever, return IntCell instead of LongCell for HTTP status codes
  • [2015-04-16] Better handling of missing values in ContentExtractor and HtmlParser nodes
  • [2015-04-10] Added MapQuest cell renderer
  • [2015-04-07] Added MapQuestGeocoder node
  • [2015-03-28] Added GoogleAddressGeocoder node
  • [2015-03-26] Added Jaro–Winkler string distance measure
  • [2015-02-24] Renamed CoordinateParser to LatitudeLongitudeToCoordinate node
  • [2015-02-24] Added CoordinateToLatitudeLongitude node
  • [2015-02-24] Added ReverseLocationLookupNode
  • [2015-02-23] Added HttpResultToStringNode
  • [2014-11-20] Output warning in HtmlParser when processing http URLs (should use HttpRetriever)
  • [2014-11-10] Use explicitly given encoding in HtmlParser node when processing HttpResults
  • [2014-08-28] Possibility for weighted inputs for TextClassifierLearner node
  • [2014-08-27] Additional preprocessing options for TextClassifierLearner node: case sensitivity, border padding
  • [2014-08-05] Offer additional languages in WebSearcher node
  • [2014-08-04] Add SocialMentionSearcher to WebSearcher node
  • [2014-08-04] Mark RMSE node as deprecated
  • [2014-07-10] Provide accuracy values in ThresholdAnalyzer node
  • [2014-07-10] Changed pruning capabilities to updated Palladian functionality
  • [2014-06-17] Automatically trim spaces when entering API keys in preferences
  • [2014-05-14] WebSearcher node appends a column with tags
  • [2014-05-14] Provide paging for Twitter searcher in WebSearcher node
  • [2014-05-03] Ability to switch scoring algorithms in TextClassifierPredictor node (expert mode)
  • [2014-04-30] InformationGain node
  • [2014-04-25] Fixing shifted month in DateParserNode (+ adding test)
  • [2014-04-23] TextClassifierModelToTable outputs second table with category priors
  • [2014-03-14] Setting for maximum number of terms for TextClassifierLearner
  • [2014-03-10] TextClassifierModelToTable provides the term counts as column
  • [2014-02-27] Cut off irrelevant parts of the graph in ThresholdAnalyzer node (values on the left that are no different from their successors)
  • [2014-02-25] Greatly reduce memory consumption when training with TextClassifierLearner node
  • [2014-02-25] Fix NullPointerException in ThresholdAnalyzer node
  • [2014-02-21] Try to auto-select positive class column in ThresholdAnalyzer node
  • [2014-02-20] Give statistics about text classifier dictionary on output port's tooltip
  • [2014-02-20] TextClassifierModelToTable node to write a Palladian text classifier dictionary to a KNIME table.
  • [2014-01-23] Output warning to log, in case a deprecated searcher is used.
  • [2014-01-23] WebSearcher can append a column with the total number of results available for a given query (in case the specific searcher provides this information)
  • [2014-01-18] FeedParser now allows input of XML documents
  • [2013-12-26] DateExtractor now optionally appends a column with the parse pattern used for extracting a specific date.
  • [2013-12-26] DateExtractor now handles date/time precision correctly (e.g. only extract date without time in case it is appropriate)
  • [2013-12-26] WebSearcher node adds a column providing GeoCoordinate values (in case this information is provided by the actual search engine; currently, YouTube, Twitter, Instagram, Flickr, Panoramio provide coordinates for some results)
  • [2013-12-25] Provide additional short version rendering for GeoCoordinate values (besides full precision and DMS)
  • [2013-12-25] Log output from the Palladian library is now piped to KNIME's integrated node logger.
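
The 2015-05-23 entry above mentions the FormEncodedHttpEntityCreator node, which converts key-value data to form-encoded input for HttpRetriever. As a rough illustration of what form encoding means, the following sketch builds an application/x-www-form-urlencoded string with plain JDK classes. It is not the node's implementation and does not use the Palladian API. The class name FormEncodingSketch and the method formEncode are hypothetical names chosen for this example, and Java 10 or later is assumed for the URLEncoder overload that takes a Charset.

```java
import java.net.URLEncoder;
import java.nio.charset.StandardCharsets;
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.StringJoiner;

// Hypothetical illustration class; not part of Palladian or KNIME.
public class FormEncodingSketch {

    /**
     * Encodes key-value pairs as application/x-www-form-urlencoded text.
     * Keys and values are percent-encoded as UTF-8 and joined as key=value
     * pairs separated by '&', in the iteration order of the given map.
     */
    public static String formEncode(Map<String, String> params) {
        StringJoiner joiner = new StringJoiner("&");
        for (Map.Entry<String, String> entry : params.entrySet()) {
            String key = URLEncoder.encode(entry.getKey(), StandardCharsets.UTF_8);
            String value = URLEncoder.encode(entry.getValue(), StandardCharsets.UTF_8);
            joiner.add(key + "=" + value);
        }
        return joiner.toString();
    }

    public static void main(String[] args) {
        // LinkedHashMap keeps insertion order, so the output is deterministic.
        Map<String, String> params = new LinkedHashMap<>();
        params.put("q", "knime nodes");
        params.put("lang", "de&en");
        // Prints: q=knime+nodes&lang=de%26en
        System.out.println(formEncode(params));
    }
}
```

A string produced this way would typically be sent as a request body together with the content type application/x-www-form-urlencoded. How the node itself maps table columns to keys and values is not described here.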