Summarizing Arabic Text with AI
System Accurately Summarizes Information Written in Arabic and Could Have Great Impact in Many Sectors, Including Government
The amount of information we consume from multiple sources is growing like never before. To handle this massive amount of information, people are turning to computers to provide summarizations that can be quickly read. Advanced algorithms and machine learning techniques have recently been developed by major software companies that successfully paraphrase lengthy English texts better than anything previously available. However, work on algorithms capable of summarizing Arabic text has not been progressing as quickly, until now.
A group of researchers from Khalifa University’s Emirates ICT Innovation Center (EBTIC) and the Khalifa University College of Engineering have developed artificial intelligence (AI) algorithms that can automatically summarize long Arabic texts to produce coherent briefs. Their system overcomes some of the major challenges to automating Arabic summarization, which stem from the complicated nature of the Arabic language.
“The Arabic language tends to have a high degree of ambiguity due to its structure,” explained Ahmad Al-Rubaie, EBTIC’s Head of Research, Operations and Strategy. “In addition, the presence of Arabic colloquial and classical variations, marks for pronunciation above and below letters that are often omitted when writing but change word meanings, and differences between formal written classical Arabic and its modern counterpart complicate matters further. Moreover, summarization evaluation standards for Arabic summarization have not reached the maturity of the English language, although this is starting to change, but it remains that different Arabic summarization systems use different evaluation methods.”
To tackle the issues of automatic Arabic summarization using AI, EBTIC proposed and implemented a unique system that combines the advantages of current state-of-the-art summarization research with methods specifically developed for the Arabic language’s complex structure.
The system was developed by Lamees Al Qassem as part of her MSc by Research in Engineering. Al Qassem was supervised by Dr. Hassan Barada, Associate Dean of the College of Engineering, Dr. Di Wang, EBTIC Senior Researcher, Ahmad Al-Rubaie and Dr. Nawaf Almoosa, Director of EBTIC and Assistant Professor of Electrical Engineering and Computer Science. She is now pursuing her PhD in Engineering at KU.

Developed as a complete end-to-end solution, the system was designed with a back-end component that collected articles from various UAE-based newspapers and online news outlets and archived them in order to produce a summary of each article. The output was served to users through a mobile application developed for the Android operating system. Summaries of relevant stories were provided to users based on their profiles. A limited trial was conducted at KU to test and improve the system, followed by demonstrations at various showcases and events where EBTIC was involved. The most recent of these events was EBTIC’s 10th Anniversary celebration in April 2019. The system remains operational and there are current plans to further develop it for use by EBTIC partners.
The EBTIC Arabic text summarizer leverages Natural Language Processing (NLP), the branch of AI that helps computers understand, interpret and produce written human language. It works by first running the Arabic text through an algorithm designed by Al Qassem that detects and extracts nouns, as nouns are representative of the key information contained in sentences. The extracted nouns, as well as a number of other features selected through research and experimentation, are fed into a “Fuzzy Logic” engine, a type of scoring system that determines the degree to which each sentence is important and thus whether it should be included in the final summary. Fuzzy Logic is well suited for determining how important a sentence is in an article or text.
“In traditional binary logic the importance of a sentence in an article or text can be either important or not important. In Fuzzy Logic, the level of importance can be an infinite range of importance values, enabling the representation of vague concepts. For example, in Fuzzy Logic, a sentence can be very important, slightly important, not that important, or not important at all,” Dr. Di Wang said.
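The idea of graded importance can be sketched in a few lines of Python. This is a minimal illustration only, not EBTIC's implementation: the two input features (`noun_ratio` and `position_score`), the membership functions and the rule table are all assumptions chosen for the example, since the article does not specify which features or rules the actual system uses.

```python
from itertools import product


def clamp(x: float) -> float:
    """Keep a feature value inside the [0, 1] range the memberships expect."""
    return max(0.0, min(1.0, x))


# Hypothetical membership functions over [0, 1]. Each returns the degree
# (0.0 to 1.0) to which a feature value counts as low, medium or high.
def low(x: float) -> float:
    return max(0.0, 1.0 - 2.0 * x)


def medium(x: float) -> float:
    return max(0.0, 1.0 - abs(2.0 * x - 1.0))


def high(x: float) -> float:
    return max(0.0, 2.0 * x - 1.0)


LEVELS = (low, medium, high)


def importance(noun_ratio: float, position_score: float) -> float:
    """Fuzzy importance score in [0, 1] for one sentence.

    Assumed rule table: the rule "noun_ratio is level i AND position_score
    is level j" outputs (i + j) / 4, so (low, low) -> 0.0 and
    (high, high) -> 1.0. AND is taken as min, and the rule outputs are
    combined by a weighted average of their firing strengths.
    """
    n, p = clamp(noun_ratio), clamp(position_score)
    weighted_sum = 0.0
    total_strength = 0.0
    for i, j in product(range(3), repeat=2):
        strength = min(LEVELS[i](n), LEVELS[j](p))
        weighted_sum += strength * (i + j) / 4.0
        total_strength += strength
    # At least one rule always fires for inputs in [0, 1].
    return weighted_sum / total_strength


def summarize(sentences, features, k):
    """Return the k highest-scoring sentences in their original order."""
    scores = [importance(n, p) for n, p in features]
    ranked = sorted(range(len(sentences)), key=lambda i: scores[i], reverse=True)
    return [sentences[i] for i in sorted(ranked[:k])]


if __name__ == "__main__":
    # Placeholder sentences with made-up (noun_ratio, position_score) pairs.
    sentences = ["s0", "s1", "s2", "s3"]
    features = [(1.0, 1.0), (0.0, 0.0), (0.75, 0.75), (0.5, 0.5)]
    for s, (n, p) in zip(sentences, features):
        print(s, round(importance(n, p), 3))
    print(summarize(sentences, features, k=2))
```

Unlike a binary important/not-important label, each sentence here receives a continuous score, so a summarizer can rank sentences and keep only as many as the desired summary length allows.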
The team followed a rigorous evaluation criterion when researching and developing the summarization system, which has already garnered significant interest from EBTIC’s partners. A paper describing the system titled “Automatic Arabic Text Summarization Based on Fuzzy Logic” won Best Paper Award at the 2019 UAE Graduate Student Research Competition (GSRC). An extended version of the paper has been accepted for publication at the International Conference on Natural Language Processing and Speech Recognition 2019 to be held at Trento University in Italy. Three other papers have already been published.
“The proposed method and its components were tested and compared to existing state-of-the-art systems, at both the component level and system level,” Dr. Barada explained. “For example, our noun extraction algorithm was evaluated against the Stanford morphological analyser to ensure it matches or outperforms the current state-of-the-art. The team also looked at the various Arabic summarization engines available and evaluated their system against the others using the same Arabic texts and evaluation method where available and possible.”
“The research and development process was demanding, but the result was that we outperformed most, if not all, existing methods that have been implemented,” Al-Rubaie said.
Dr. Almoosa added, “The team is currently in discussion with a UAE government agency to adapt the system to be able to summarize information that is more demanding, which would require further enhancements and a greater understanding of the underlying meanings in sentences.”
Erica Solomon
Senior Editor
5 September 2019