Summarization and Classification of CNN.com Articles using the TF*IDF Family of Metrics

TF*IDF (term frequency times inverse document frequency) is a common metric used to automatically discover keywords in documents for use in classification and other text processing applications. We are interested in whether these measures can help identify the most relevant sentences for summarization and classification purposes. However, there are many ways to define TF*IDF, and to date no attempt has been made to gauge the relative value of these different forms systematically. We investigate a comprehensive family of 112 TF*IDF measures (corresponding to an a priori estimate of 20 degrees of freedom among these measures) applied to 3000 CNN articles belonging to 12 classes such as Business, Sport, and World. The assumption is that at least some sets of these measures must be effective for document summarization and classification. The goal is to identify the summaries provided by TF*IDF measures that best match human-generated summaries, as well as to find effective TF*IDF definitions for classification purposes.
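
For illustration only, a minimal sketch of one common TF*IDF variant (raw term count times log(N/df)) and its use for ranking sentences is given here. It is not claimed to be any of the 112 measures studied in the paper; the function names tokenize, tf_idf, and rank_sentences, the tokenization rule, and the mean-weight sentence score are all assumptions made for the example.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase alphanumeric runs count as terms.
    return re.findall(r"[a-z0-9]+", text.lower())


def tf_idf(docs):
    """Return one {term: weight} dict per document.

    TF is the raw count of the term in the document; IDF is
    log(N / df), where df is the number of documents containing
    the term. This is one variant among many possible definitions.
    """
    tokenized = [tokenize(d) for d in docs]
    n_docs = len(tokenized)
    df = Counter()
    for tokens in tokenized:
        df.update(set(tokens))  # count each term once per document
    weights = []
    for tokens in tokenized:
        tf = Counter(tokens)
        weights.append(
            {t: c * math.log(n_docs / df[t]) for t, c in tf.items()}
        )
    return weights


def rank_sentences(doc, term_weights):
    """Order the sentences of doc by mean TF*IDF weight, highest first."""
    # Assumption: sentences end at '.', '!' or '?' followed by whitespace.
    sentences = [
        s.strip() for s in re.split(r"(?<=[.!?])\s+", doc) if s.strip()
    ]

    def score(sentence):
        tokens = tokenize(sentence)
        if not tokens:
            return 0.0
        return sum(term_weights.get(t, 0.0) for t in tokens) / len(tokens)

    return sorted(sentences, key=score, reverse=True)
```

In this sketch, the top-ranked sentences would serve as an extractive summary, and the per-document weight dictionaries could be used as features for a classifier. Changing how TF or IDF is computed (for example, log-scaled counts or smoothed document frequencies) yields other members of the TF*IDF family.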

Document Type: Research Article

Publication date: April 1, 2016

More about this publication?
  • The IS&T (digital) Archiving Conference offers a unique opportunity for imaging scientists and those working in the cultural heritage community (curators, archivists, librarians, photographers etc) from around the world to come together to discuss the most pressing issues related to the digital preservation and stewardship of hardcopy, and other cultural heritage documents and objects. Authors come from museums, archives, libraries, government institutions, industry and academia. Cutting edge topics related to multispectral and 3D imaging, as well as best practices for workflow, sharing, standards, and asset/collection management and dissemination are explored in papers presented at this annual, international event.

    Please note: For purposes of its Digital Library content, IS&T defines Open Access as papers that will be downloadable in their entirety for free in perpetuity. Copyright restrictions on papers vary; see individual paper for details.
