Some Statistical Properties and Zipf's Law in Korean Text Corpus

Author: Choi S.-W.

Source: Journal of Quantitative Linguistics, Volume 7, Number 1, April 2000, pp. 19-30(12)

Publisher: Routledge, part of the Taylor & Francis Group

Abstract:

Some statistical characteristics of Korean texts are analyzed through experiments on large corpora. We obtain the number of occurrences of syllables and of words in Korean texts. The entropy of syllables is estimated using a finite-context model, and the digram and trigram entropies of syllables are also estimated. The entropy of words is estimated using the same model. We examine how closely Korean text obeys the well-known Zipf's law. Two mathematical models are constructed by modifying the Mandelbrot distribution and are simulated for Korean texts. The coefficient B in the Mandelbrot distribution is determined experimentally for our models. We compare Zipf's law in Korean text with that in English and in French. According to Mandelbrot, B > 1 in all the usual cases; however, we obtain B < 1 in some range of the rank-frequency distribution of Korean text. We also verified that the coefficient B depends on the language, not on the kind or the size of the corpus.
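
The quantities named in the abstract can be illustrated with a minimal Python sketch. It is not the author's method: the abstract does not state the form of the modified Mandelbrot models, so the code assumes the standard Zipf-Mandelbrot form f(r) = C / (r + rho)^B and estimates B only in the simplest case rho = 0, as the negated least-squares slope of log frequency against log rank. The function names (context_entropy, rank_frequency, loglog_slope) are hypothetical, and the entropy is a plain plug-in estimate, which tends to underestimate the true entropy on small samples.

```python
import math
from collections import Counter


def context_entropy(symbols, order=0):
    """Plug-in estimate, in bits, of the entropy of a symbol given the
    `order` symbols before it (order=0: unigram, 1: digram, 2: trigram).
    Assumption: an order-k finite-context model estimated by raw counts."""
    n = len(symbols) - order
    if n <= 0:
        raise ValueError("sequence too short for the requested order")
    # Counts of (context + next symbol) and of the context alone.
    joint = Counter(tuple(symbols[i:i + order + 1]) for i in range(n))
    context = Counter(tuple(symbols[i:i + order]) for i in range(n))
    return -sum(c / n * math.log2(c / context[gram[:-1]])
                for gram, c in joint.items())


def rank_frequency(tokens):
    """Return (rank, frequency) pairs, most frequent token first."""
    counts = sorted(Counter(tokens).values(), reverse=True)
    return list(enumerate(counts, start=1))


def loglog_slope(pairs):
    """Least-squares slope of log(frequency) against log(rank).
    With rho = 0 (an assumption), -slope is a rough estimate of B."""
    xs = [math.log(r) for r, _ in pairs]
    ys = [math.log(f) for _, f in pairs]
    mx, my = sum(xs) / len(xs), sum(ys) / len(ys)
    sxx = sum((x - mx) ** 2 for x in xs)
    sxy = sum((x - mx) * (y - my) for x, y in zip(xs, ys))
    return sxy / sxx
```

For Korean text, one plausible way to use this is to pass list(text) as the syllable sequence, assuming the text is in a normalized form where each Hangul syllable is a single precomposed character, and text.split() as the word sequence, assuming whitespace tokenization. Fitting the slope over separate rank ranges would be one way to look for regions where B falls below 1.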

Document Type: Research article

DOI: 10.1076/0929-6174(200004)07:01;1-3;FT019
