BENTRCIA, Rahima (2017) Fouille de Texte et Analyse pour Extraction/Découverte de la Connaissance du Saint Coran [Text Mining and Analysis for Knowledge Extraction/Discovery from the Holy Quran]. Doctoral thesis, Université de Batna 2.
Abstract
There is an immense need for information systems that rely on the Arabic Quranic text to present precise and comprehensive knowledge about the Quran to the world. This motivates our research, which uses the Quran as a corpus and exploits text mining techniques to perform three tasks: extracting semantic relations between words linked by the conjunction AND, analyzing the order of the words in such conjunctive phrases, and measuring the similarity between Quran chapters using lexical and statistical measures.

Semantic relations are a vital component of any ontology, and many Natural Language Processing applications depend strongly on them. The first part of the thesis therefore extracts semantic relations from the holy Quran, written in Arabic script, to enrich the automatic construction of a Quran ontology. We focus on semantic relations resulting from proposed conjunctive patterns in which two words are linked by the conjunction AND. These words can be nouns, proper nouns, or adjectives. The strength of each relation is measured by the correlation coefficient between the two linked words. Finally, we assess the significance of this method through hypothesis testing with Student's t-test.

Moreover, some aspects of the semantic relations that may exist between words can be inferred from patterns of word co-occurrence, so statistics computed on these patterns provide further information about such relations. This motivates the second part of our research, an analytical study of one type of these patterns, the AND conjunctive phrases, in the holy Quran. First, we propose a set of AND conjunctive patterns to extract the conjunctive phrases from the Quranic Arabic Corpus, which we convert to Arabic script. Then, we analyze the order of the two words that form each conjunctive phrase. We report three cases: words that occur in a specific order in the conjunctive phrase and appear only once in the Quran, words that occur in a specific order and are repeated many times, and words that occur in different positions in the conjunctive phrase, once or many times. Finally, we show that different word orders in the conjunctive phrase yield different contextual meanings as well as different values of the association between the linked words.

Measuring similarity between documents is an important task in information retrieval. A crucial issue, however, is selecting an efficient similarity measure that improves the running time and performance of such systems. In the last part of the thesis, we present a lexical approach to extracting similar words and phrases from Arabic texts, represented by Quran chapters (Surahs). Furthermore, we measure the similarity between these chapters using three statistical metrics: cosine, Jaccard, and correlation distances.
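As a rough illustration of the three chapter-level metrics named above, the following minimal sketch computes cosine, Jaccard, and correlation distances between two token sequences. It uses the standard textbook definitions; the abstract does not specify the thesis's tokenization, weighting, or exact formulas, so the whitespace tokenizer, the function names, and the toy inputs (placeholder tokens, not Quranic text) are all assumptions made for illustration.

```python
import math
from collections import Counter


def term_counts(text):
    """Count whitespace-separated tokens (an assumed stand-in for real Arabic tokenization)."""
    return Counter(text.split())


def to_vectors(counts_a, counts_b):
    """Align two count dictionaries on their shared, sorted vocabulary."""
    vocab = sorted(set(counts_a) | set(counts_b))
    return [counts_a[w] for w in vocab], [counts_b[w] for w in vocab]


def cosine_distance(u, v):
    """1 - cos(u, v); raises ValueError if either vector has zero norm."""
    dot = sum(x * y for x, y in zip(u, v))
    norm_u = math.sqrt(sum(x * x for x in u))
    norm_v = math.sqrt(sum(y * y for y in v))
    if norm_u == 0 or norm_v == 0:
        raise ValueError("cosine distance is undefined for a zero vector")
    return 1.0 - dot / (norm_u * norm_v)


def jaccard_distance(counts_a, counts_b):
    """1 - |A ∩ B| / |A ∪ B|, computed on the sets of distinct tokens."""
    set_a, set_b = set(counts_a), set(counts_b)
    if not set_a and not set_b:
        raise ValueError("Jaccard distance is undefined for two empty sets")
    return 1.0 - len(set_a & set_b) / len(set_a | set_b)


def correlation_distance(u, v):
    """1 - Pearson correlation, i.e. cosine distance of the mean-centered vectors."""
    mean_u = sum(u) / len(u)
    mean_v = sum(v) / len(v)
    return cosine_distance([x - mean_u for x in u], [y - mean_v for y in v])


# Toy "chapters" made of placeholder tokens (hypothetical data).
chapter_a = term_counts("a b b c")
chapter_b = term_counts("b c c d")
u, v = to_vectors(chapter_a, chapter_b)

print(round(cosine_distance(u, v), 3))
print(round(jaccard_distance(chapter_a, chapter_b), 3))
print(round(correlation_distance(u, v), 3))
```

For these toy inputs the distances come out to about 0.333, 0.5, and 1.0 respectively, which shows that the three metrics can rank the same pair of texts quite differently.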
Item Type: | Thesis (Doctoral) |
---|---|
Subjects: | Informatique (Computer Science) |
Divisions: | Faculté des mathématiques et de l'informatique > Département d'informatique |
Date Deposited: | 24 Oct 2017 09:41 |
Last Modified: | 24 Oct 2017 09:41 |
URI: | http://eprints.univ-batna2.dz/id/eprint/1488 |