Impact Feature Vectorization Methods on Arabic Large Data Using Logistic Regression Classification

Ali Shafah; Ahmed Suleiman; Samira Alshafah

Published: Dec 24, 2023

Keywords: Arabic Text Classification, Large Data, Big Data, Feature Vectorization, TF-IDF, BoW, N-gram

Ali Shafah

Data Analysis Department, Faculty of Economics, University of Zawia, Zawia, Libya

Ahmed Suleiman

Computer Department, Faculty of Education, University of Zawia, Zawia, Libya

Samira Alshafah

Computer Department, Faculty of Education, University of Zawia, Zawia, Libya

Abstract

Text categorization is the task of assigning text documents to a predefined set of categories. This study presents an experimental comparison of feature vectorization methods for classifying a large Arabic corpus with a logistic regression classifier. The methods compared are N-gram, Bag of Words (BoW), and Term Frequency–Inverse Document Frequency (TF-IDF). The corpus contains about 111,000 Arabic documents divided into five categories: news, sports, culture, economics, and varied. Each method was evaluated with three performance metrics. The logistic regression model using TF-IDF with N-gram (1,2), that is, unigrams and bigrams, achieved the highest accuracy (96%), followed by Bag of Words (95%).
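To make the compared representations concrete, the following is a minimal pure-Python sketch, not the authors' code: it builds Bag of Words and TF-IDF vectors over unigrams and bigrams (N-gram (1,2)) and trains a multinomial (softmax) logistic regression by stochastic gradient descent. Everything beyond the abstract is an assumption and hypothetical: whitespace tokenization with no Arabic normalization or stemming, the smoothed IDF formula log((1 + N) / (1 + df)) + 1 with L2 normalization, the learning rate and epoch count, and the four invented toy documents. The study's actual preprocessing, library, and hyperparameters are not stated in the abstract.

```python
import math
from collections import Counter

# Hypothetical sketch: BoW / TF-IDF over N-grams + softmax logistic regression.
# Assumption: plain whitespace tokenization; no Arabic normalization or stemming.

def ngrams(tokens, n_min=1, n_max=2):
    """All contiguous n-grams with n_min <= n <= n_max, joined by spaces."""
    return [" ".join(tokens[i:i + n])
            for n in range(n_min, n_max + 1)
            for i in range(len(tokens) - n + 1)]

def terms(doc, ngram_range=(1, 2)):
    return ngrams(doc.split(), *ngram_range)

def build_vocab(docs, ngram_range=(1, 2)):
    """Map every n-gram seen in the training documents to a column index."""
    vocab = {}
    for doc in docs:
        for t in terms(doc, ngram_range):
            vocab.setdefault(t, len(vocab))
    return vocab

def bow(doc, vocab, ngram_range=(1, 2)):
    """Bag of Words: raw n-gram counts; n-grams outside the vocabulary are dropped."""
    vec = [0.0] * len(vocab)
    for t, c in Counter(terms(doc, ngram_range)).items():
        if t in vocab:
            vec[vocab[t]] = float(c)
    return vec

def fit_idf(docs, vocab, ngram_range=(1, 2)):
    """Assumed smoothed IDF: log((1 + N) / (1 + df)) + 1."""
    df = [0] * len(vocab)
    for doc in docs:
        for t in set(terms(doc, ngram_range)):
            if t in vocab:
                df[vocab[t]] += 1
    n = len(docs)
    return [math.log((1 + n) / (1 + d)) + 1 for d in df]

def tfidf(doc, vocab, idf, ngram_range=(1, 2)):
    """TF-IDF: counts weighted by IDF, then L2-normalized."""
    v = [c * w for c, w in zip(bow(doc, vocab, ngram_range), idf)]
    norm = math.sqrt(sum(x * x for x in v)) or 1.0
    return [x / norm for x in v]

def scores(W, b, x):
    return [sum(w * xi for w, xi in zip(Wk, x)) + bk for Wk, bk in zip(W, b)]

def train_softmax(X, y, n_classes, lr=0.5, epochs=200):
    """Multinomial logistic regression fitted by plain SGD on cross-entropy (no regularization)."""
    W = [[0.0] * len(X[0]) for _ in range(n_classes)]
    b = [0.0] * n_classes
    for _ in range(epochs):
        for x, t in zip(X, y):
            s = scores(W, b, x)
            m = max(s)  # subtract the max score for numerical stability
            e = [math.exp(v - m) for v in s]
            z = sum(e)
            for k in range(n_classes):
                g = e[k] / z - (1.0 if k == t else 0.0)  # dLoss/dscore_k
                W[k] = [w - lr * g * xi for w, xi in zip(W[k], x)]
                b[k] -= lr * g
    return W, b

def predict(W, b, x):
    s = scores(W, b, x)
    return max(range(len(s)), key=s.__getitem__)

# Invented toy data (hypothetical): 0 = sports, 1 = economics.
train_docs = ["فاز الفريق في المباراة", "انتهت المباراة بفوز الفريق",
              "ارتفع سعر النفط في السوق", "شهد السوق ارتفاع الأسعار"]
labels = [0, 0, 1, 1]

vocab = build_vocab(train_docs)
idf = fit_idf(train_docs, vocab)
X = [tfidf(d, vocab, idf) for d in train_docs]
W, b = train_softmax(X, labels, n_classes=2)
print(predict(W, b, tfidf("الفريق في المباراة", vocab, idf)))  # 0 (sports)
print(predict(W, b, tfidf("سعر النفط", vocab, idf)))           # 1 (economics)
```

Replacing `tfidf(d, vocab, idf)` with `bow(d, vocab)` gives the Bag of Words variant, and passing `ngram_range=(1, 1)` drops the bigrams. A corpus of roughly 111,000 documents would call for sparse matrices and an optimized solver rather than these Python lists.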

How to Cite

Shafah, A., Suleiman, A., & Alshafah, S. (2023). Impact Feature Vectorization Methods on Arabic Large Data Using Logistic Regression Classification. University of Zawia Journal of Engineering Sciences and Technology, 1(1). Retrieved from https://journals.zu.edu.ly/index.php/UZJEST/article/view/49

Issue

Vol. 1 No. 1 (2023): University of Zawia Journal of Engineering Sciences and Technology

Section

Information Technology

ISSN

ISSN: 3006-2764

DOI: https://doi.org/10.26629/uzjest