Optimizing News Categorization with Machine Learning: A Comprehensive Study Using Naive Bayes (MultinomialNB) Classifier

Ahmed Mansoori, Khalaf Tahat, Dina Naser Tahat, Mohammad Habes, Said A. Salloum

Research output: Chapter in Book/Report/Conference proceeding › Chapter

3 Citations (Scopus)

Abstract

The rapid growth of online news content calls for efficient automated categorization systems to manage and organize large volumes of information. This study addresses the need for effective news article classification using a Multinomial Naive Bayes (MultinomialNB) classifier. We use the News Aggregator dataset from the UCI Machine Learning Repository, which contains over 400,000 news articles labeled as business, science and technology, entertainment, or health. Preprocessing includes handling missing values, text normalization, and term frequency-inverse document frequency (TF-IDF) vectorization. The trained model achieved an overall accuracy of 89.6%, with particularly high precision and recall in the ‘Entertainment’ category. A confusion matrix, ROC curve, and learning curve provide a detailed assessment of model performance. The results show that the Naive Bayes classifier is effective for news categorization and point to areas for improvement, particularly in distinguishing ‘Science and Technology’ and ‘Health’ articles. The study demonstrates a practical application of machine learning to organizing news content, with implications for improving automated news categorization systems.
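The pipeline described in the abstract (text normalization, TF-IDF weighting, then a multinomial Naive Bayes classifier) can be illustrated with a minimal, dependency-free sketch. This is not the authors' code: the eight headlines are invented for illustration, the function names are hypothetical, and the smoothed IDF formula and Laplace smoothing constant (alpha = 1) are assumptions, since the abstract does not state the settings used.

```python
import math
import re
from collections import Counter, defaultdict

def tokenize(text):
    """Lowercase and keep alphabetic runs (a simple form of text normalization)."""
    return re.findall(r"[a-z]+", text.lower())

def fit_idf(docs):
    """Smoothed inverse document frequency: ln((1 + n) / (1 + df)) + 1 (assumed variant)."""
    n = len(docs)
    df = Counter(term for doc in docs for term in set(doc))
    return {term: math.log((1 + n) / (1 + count)) + 1 for term, count in df.items()}

def tfidf(doc, idf):
    """TF-IDF weights for one tokenized document; terms unseen in training are dropped."""
    return {t: c * idf[t] for t, c in Counter(doc).items() if t in idf}

def train_nb(vectors, labels, vocab, alpha=1.0):
    """Multinomial Naive Bayes over TF-IDF weights with additive smoothing."""
    weight = defaultdict(lambda: defaultdict(float))
    for vec, label in zip(vectors, labels):
        for term, w in vec.items():
            weight[label][term] += w
    model = {}
    for label, n_docs in Counter(labels).items():
        total = sum(weight[label].values()) + alpha * len(vocab)
        log_prior = math.log(n_docs / len(labels))
        log_like = {t: math.log((weight[label].get(t, 0.0) + alpha) / total) for t in vocab}
        model[label] = (log_prior, log_like)
    return model

def predict(model, vec):
    """Return the class with the highest log posterior score."""
    scores = {
        label: log_prior + sum(w * log_like[t] for t, w in vec.items())
        for label, (log_prior, log_like) in model.items()
    }
    return max(scores, key=scores.get)

# Hypothetical toy headlines, NOT taken from the News Aggregator dataset.
train = [
    ("Stocks rally as bank profits rise", "business"),
    ("Oil prices fall on weak market demand", "business"),
    ("New smartphone chip boosts battery life", "science and technology"),
    ("Software update fixes browser security flaw", "science and technology"),
    ("Film star wins award at music festival", "entertainment"),
    ("Singer announces new album and tour", "entertainment"),
    ("Vaccine trial shows strong flu protection", "health"),
    ("Doctors warn of rising diabetes risk", "health"),
]

docs = [tokenize(text) for text, _ in train]
labels = [label for _, label in train]
idf = fit_idf(docs)
vectors = [tfidf(doc, idf) for doc in docs]
model = train_nb(vectors, labels, vocab=set(idf))

def classify(headline):
    """Normalize, vectorize, and classify a single headline."""
    return predict(model, tfidf(tokenize(headline), idf))

print(classify("Bank shares rise as market rallies"))
```

On a dataset of the size reported above, a library implementation would normally replace these hand-written functions; the sketch only makes the order of the steps concrete.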

Original language: English
Title of host publication: Studies in Big Data
Publisher: Springer Science and Business Media Deutschland GmbH
Pages: 169-178
Number of pages: 10
Publication status: Published - 2025

Publication series

Name: Studies in Big Data
Volume: 158
ISSN (Print): 2197-6503
ISSN (Electronic): 2197-6511

Keywords

  • Automated news systems
  • Data preprocessing
  • Machine learning
  • MultinomialNB
  • Naive Bayes
  • Natural language processing
  • News categorization
  • Text classification
  • TF-IDF
  • UCI machine learning repository

ASJC Scopus subject areas

  • Control and Systems Engineering
  • Engineering (miscellaneous)
  • Computer Science Applications
  • Artificial Intelligence
