
Visual Interactive Creation and Validation of Text Clustering Workflows to Explore Document Collections

The exploration of text document collections is a complex and cumbersome task. Clustering techniques can help by grouping documents according to their content to generate overviews. However, the underlying clustering workflow, comprising preprocessing, feature selection, and the selection and parameterization of a clustering algorithm, offers several degrees of freedom. Since no single "best" clustering workflow exists, users have to evaluate clustering results with respect to the data and analysis tasks at hand. We present an interactive system for creating and validating text clustering workflows with the goal of exploring document collections. The system allows users to control every step of the text clustering workflow. First, users are supported in the feature selection process by feature ranking based on feature selection metrics and by linguistic filtering (e.g., part-of-speech filtering). Second, users can choose between different clustering methods and their parameterizations. Third, the clustering results can be explored based on cluster content (documents and relevant feature terms) and on cluster quality measures. Fourth, the results of different clusterings can be compared, and frequent document subsets in clusters can be identified. We validate the usefulness of the system with a usage scenario describing how users can explore document collections in a visual and interactive way.
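The abstract describes the workflow steps only at a high level. The following is a minimal, standard-library-only Python sketch of one such workflow, written under assumptions that do not come from the paper: document frequency as the feature selection metric, a tiny stop-word list standing in for linguistic filtering, smoothed TF-IDF weighting, k-means with deterministic farthest-first seeding as the clustering method, the silhouette coefficient as the cluster quality measure, and "document pairs that share a cluster in every run" as a simplified stand-in for frequent document subsets. All function names (`tokenize`, `rank_features`, `kmeans`, `stable_pairs`, and so on) are hypothetical illustrations, not the authors' system.

```python
import math
import re
from collections import Counter
from itertools import combinations

# Hypothetical stop-word list; a simplified stand-in for linguistic filtering.
STOPWORDS = {"the", "a", "an", "and", "of", "to", "in", "is", "for", "on"}

def tokenize(text):
    """Lowercase, keep alphabetic runs, and drop stop words."""
    return [t for t in re.findall(r"[a-z]+", text.lower()) if t not in STOPWORDS]

def rank_features(docs_tokens, top_k):
    """Rank terms by document frequency (one simple feature selection metric)."""
    df = Counter(t for toks in docs_tokens for t in set(toks))
    ranked = sorted(df, key=lambda t: (-df[t], t))  # ties broken alphabetically
    return ranked[:top_k], df

def tfidf_vectors(docs_tokens, features, df):
    """L2-normalized TF-IDF vectors over the selected features (smoothed IDF)."""
    n = len(docs_tokens)
    vectors = []
    for toks in docs_tokens:
        tf = Counter(toks)
        vec = [tf[f] * (math.log((1 + n) / (1 + df[f])) + 1) for f in features]
        norm = math.sqrt(sum(x * x for x in vec)) or 1.0  # avoid dividing by 0
        vectors.append([x / norm for x in vec])
    return vectors

def farthest_first(vectors, k):
    """Deterministic seeding: start at vector 0, then repeatedly add the
    vector farthest from its nearest seed."""
    seeds = [vectors[0]]
    while len(seeds) < k:
        idx = max(range(len(vectors)),
                  key=lambda i: min(math.dist(vectors[i], s) for s in seeds))
        seeds.append(vectors[idx])
    return [list(s) for s in seeds]

def kmeans(vectors, k, max_iter=50):
    """Lloyd's k-means with Euclidean distance; returns one label per document."""
    centroids = farthest_first(vectors, k)
    labels = None
    for _ in range(max_iter):
        new_labels = [min(range(k), key=lambda c: math.dist(v, centroids[c]))
                      for v in vectors]
        if new_labels == labels:
            break  # assignments stopped changing
        labels = new_labels
        for c in range(k):
            members = [v for v, lab in zip(vectors, labels) if lab == c]
            if members:  # an empty cluster keeps its previous centroid
                centroids[c] = [sum(col) / len(members) for col in zip(*members)]
    return labels

def silhouette(vectors, labels):
    """Mean silhouette coefficient (Euclidean), a common cluster quality measure."""
    if len(set(labels)) < 2:
        raise ValueError("silhouette needs at least two clusters")
    scores = []
    for i, v in enumerate(vectors):
        own = [math.dist(v, w) for j, w in enumerate(vectors)
               if labels[j] == labels[i] and j != i]
        if not own:
            scores.append(0.0)  # singleton clusters score 0 by convention
            continue
        a = sum(own) / len(own)  # mean distance within the own cluster
        b = min(sum(math.dist(v, w) for j, w in enumerate(vectors) if labels[j] == c)
                / labels.count(c)
                for c in set(labels) if c != labels[i])  # nearest other cluster
        scores.append((b - a) / max(a, b) if max(a, b) > 0 else 0.0)
    return sum(scores) / len(scores)

def stable_pairs(clusterings):
    """Document pairs that share a cluster in every run (simplified stand-in
    for frequent document subsets)."""
    n = len(clusterings[0])
    return [(i, j) for i, j in combinations(range(n), 2)
            if all(lab[i] == lab[j] for lab in clusterings)]

docs = [
    "neural networks for image classification",
    "deep neural networks and image recognition",
    "stock market prices and trading",
    "trading strategies in the stock market",
]
tokens = [tokenize(d) for d in docs]
runs = []
for top_k in (6, 4):  # two workflows that differ only in the feature selection step
    features, df = rank_features(tokens, top_k)
    X = tfidf_vectors(tokens, features, df)
    labels = kmeans(X, k=2)
    runs.append(labels)
    print(top_k, features, labels, round(silhouette(X, labels), 3))
print(stable_pairs(runs))  # prints [(0, 1), (2, 3)]
```

On this toy corpus both runs separate the two topics, so only the within-topic pairs survive the comparison. Swapping the ranking metric, the seeding strategy, the number of clusters, or the quality measure changes the outcome, which illustrates the degrees of freedom the abstract refers to.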

Keywords: CLUSTERING; TEXT ANALYSIS; VISUAL ANALYTICS

Document Type: Research Article

Publication date: 29 January 2017

About this publication
For more than 30 years, the Electronic Imaging Symposium has been serving those in the broad community, from academia and industry, who work on imaging science and digital technologies. The breadth of the Symposium covers the entire imaging science ecosystem, from capture (sensors, cameras) through image processing (image quality, color and appearance) to how we and our surrogate machines see and interpret images. Applications covered include augmented reality, autonomous vehicles, machine vision, data analysis, digital and mobile photography, security, virtual reality, and human vision. IS&T began sole sponsorship of the meeting in 2016. All papers presented at EI's 20+ conferences are open access.

    Please note: For purposes of its Digital Library content, IS&T defines Open Access as papers that will be downloadable in their entirety for free in perpetuity. Copyright restrictions on papers vary; see individual paper for details.
