Citations in documents carry important information about the sources that authors cite and about the importance and impact of those sources. Automatic identification of citations in documents is therefore an important task. For various reasons, citations in rabbinic literature are more difficult to identify and extract than citations in scientific papers written in English. The aim of this research is to automatically identify undated citations in a unique data set: rabbinic documents written in Hebrew-Aramaic. We formulate four feature sets: orthographic, quantitative, stopword-based, and n-gram-based. We performed experiments on all combinations of these feature sets using six common machine learning methods and InfoGain. A combination of all four feature sets using logistic regression achieves an accuracy of 91.98%, an improvement of 16.53% over a baseline result.
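The abstract names four feature sets and an information-gain measure but does not define them, so the following is only a minimal sketch of what such a pipeline could look like. Every name in it (`extract_features`, `information_gain`, the `orth_`, `quant_`, `stop_`, and `bigram_` keys) is hypothetical, the specific features are assumptions rather than the authors' definitions, and English toy text stands in for Hebrew-Aramaic.

```python
import math
from collections import Counter


def extract_features(text, stopwords):
    """Build one feature dict per text fragment (all features are illustrative)."""
    tokens = text.split()
    n = len(tokens) or 1  # avoid division by zero on empty input
    features = {
        # Orthographic (assumed): punctuation that often surrounds a citation.
        "orth_quotes": text.count('"'),
        "orth_parens": text.count("(") + text.count(")"),
        # Quantitative (assumed): simple length statistics.
        "quant_num_words": len(tokens),
        "quant_avg_word_len": sum(len(t) for t in tokens) / n,
    }
    lowered = [t.lower() for t in tokens]
    # Stopword-based (assumed): relative frequency of each stopword.
    for sw in sorted(stopwords):
        features["stop_" + sw] = lowered.count(sw) / n
    # N-gram-based (assumed): raw counts of word bigrams.
    for a, b in zip(lowered, lowered[1:]):
        key = "bigram_" + a + "_" + b
        features[key] = features.get(key, 0) + 1
    return features


def entropy(labels):
    """Shannon entropy (in bits) of a list of class labels."""
    total = len(labels)
    counts = Counter(labels)
    return -sum((c / total) * math.log2(c / total) for c in counts.values())


def information_gain(feature_values, labels):
    """Entropy reduction in `labels` after splitting on a discrete feature."""
    total = len(labels)
    groups = {}
    for value, label in zip(feature_values, labels):
        groups.setdefault(value, []).append(label)
    remainder = sum(len(g) / total * entropy(g) for g in groups.values())
    return entropy(labels) - remainder


if __name__ == "__main__":
    sample = extract_features("as it is written in the book", {"in", "the"})
    print(sample["quant_num_words"], sample["bigram_in_the"])
    # A feature that separates the two classes perfectly vs. one that does not.
    print(information_gain([1, 1, 0, 0], ["cite", "cite", "other", "other"]))
    print(information_gain([1, 0, 1, 0], ["cite", "cite", "other", "other"]))
```

In a setup like this, features could be ranked by information gain and the retained ones passed to a classifier such as logistic regression; how the original study combined these steps is not stated in the abstract.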

Keywords: Hebrew-Aramaic documents; citation identification; knowledge discovery; machine learning methods; undated documents

Document Type: Research Article

Affiliations: 1: Department of Computer Science, Jerusalem College of Technology, Jerusalem, Israel 2: Department of Computer Science, Bar-Ilan University, Ramat-Gan, Israel

Publication date: March 1, 2011
