
Crawling Social Media to Create Morphological Resource of Under-Resourced Language: Melanau Language

Building a morphological analyser for an under-resourced language requires a morphological resource. Because few such resources exist in digital form, they are usually created by digitisation, which is a tedious and time-consuming task. The objective of this work is to develop new steps for creating morphological resources from social media. The steps begin with crawling blogs and tweets. A limited list of words of the under-resourced language was used to reduce the number of crawled web pages. Then the crawled pages and tweets were normalised. This step cleaned and transformed the informal, noisy crawled data into a clean wordlist for the next process, dictionary lookup validation. Lastly, the wordlist was validated, because language mixing made the spelling standard uncertain. At this stage, an edit distance algorithm, namely Jaro-Winkler, was applied to determine how accurately each spelling matched the standard by comparing it with the dictionary. The findings suggest that a large number of dictionary word entries could improve the accuracy of the poor results. The developed steps can assist other researchers in creating validated morphological resources, or even language resources, for under-resourced languages.
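
The normalisation and dictionary lookup validation steps described in the abstract can be illustrated with a minimal Python sketch. It is not the authors' implementation: the cleaning rules in `normalise`, the helper names, and the 0.9 acceptance threshold are all assumptions made for illustration, and the abstract does not state which Jaro-Winkler parameters were used (the sketch uses the common defaults, a scaling factor of 0.1 and a prefix length of at most 4).

```python
import re


def normalise(text):
    """Turn noisy social media text into a list of lowercase word tokens.

    Assumed cleaning rules: drop URLs, @mentions and #hashtags, keep only
    alphabetic tokens (allowing internal apostrophes and hyphens).
    """
    text = text.lower()
    text = re.sub(r"https?://\S+|[@#]\w+", " ", text)
    return re.findall(r"[a-z]+(?:['-][a-z]+)*", text)


def jaro(s1, s2):
    """Jaro similarity in the range [0, 1]."""
    if s1 == s2:
        return 1.0
    len1, len2 = len(s1), len(s2)
    if len1 == 0 or len2 == 0:
        return 0.0
    # Characters only count as matching within this distance of each other.
    window = max(max(len1, len2) // 2 - 1, 0)
    matched1 = [False] * len1
    matched2 = [False] * len2
    matches = 0
    for i, ch in enumerate(s1):
        for j in range(max(0, i - window), min(len2, i + window + 1)):
            if not matched2[j] and s2[j] == ch:
                matched1[i] = matched2[j] = True
                matches += 1
                break
    if matches == 0:
        return 0.0
    # Count matched characters that appear in a different order.
    k = 0
    out_of_order = 0
    for i in range(len1):
        if matched1[i]:
            while not matched2[k]:
                k += 1
            if s1[i] != s2[k]:
                out_of_order += 1
            k += 1
    transpositions = out_of_order // 2
    return (matches / len1 + matches / len2
            + (matches - transpositions) / matches) / 3


def jaro_winkler(s1, s2, scaling=0.1, max_prefix=4):
    """Jaro similarity boosted for strings that share a common prefix."""
    base = jaro(s1, s2)
    prefix = 0
    for a, b in zip(s1, s2):
        if a != b or prefix == max_prefix:
            break
        prefix += 1
    return base + prefix * scaling * (1 - base)


def validate(words, dictionary, threshold=0.9):
    """Map each word to (closest dictionary entry, score, accepted?).

    The dictionary is assumed to be non-empty; the threshold is an
    illustrative value, not one taken from the paper.
    """
    results = {}
    for word in words:
        entry, score = max(((e, jaro_winkler(word, e)) for e in dictionary),
                           key=lambda pair: pair[1])
        results[word] = (entry, score, score >= threshold)
    return results
```

A crawled post would be passed through `normalise` first, and the resulting tokens through `validate` together with the dictionary wordlist. Tokens whose best score falls below the threshold are candidates for manual review, since they may be misspellings, words from another language, or valid words missing from the dictionary.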

Keywords: Morphological Resource; Social Media; Under-Resourced Language

Document Type: Research Article

Affiliations: 1: Faculty of Computing and Informatics, Multimedia University, Persiaran Multimedia, 63100 Cyberjaya, Selangor, Malaysia 2: Department of Information System, Faculty of Computer Science and Information Technology, Universiti Malaysia Sarawak, 94300 Kota Samarahan, Sarawak, Malaysia

Publication date: November 1, 2017

More about this publication?
  • ADVANCED SCIENCE LETTERS is an international peer-reviewed journal with very wide-ranging coverage. It consolidates research activities in all areas of (1) Physical Sciences, (2) Biological Sciences, (3) Mathematical Sciences, (4) Engineering, (5) Computer and Information Sciences, and (6) Geosciences, and publishes original short communications, full research papers, and timely brief (mini) reviews, with authors' photos and biographies, encompassing basic and applied research and current developments in the educational aspects of these scientific areas.