NLP classifier · Course project

Fake News Detection

Fine-tuned BERT to 99.5% accuracy on the WELFake fake-vs-real news benchmark.

Overview

Group NLP project built around the WELFake dataset (~72K labeled articles). The work compared classical feature pipelines against a fine-tuned BERT classifier for fake-news detection.

My contribution

BERT fine-tuning, preprocessing/lemmatization workflow, and benchmark evaluation against classical embedding baselines.

Problem

Fake news spreads quickly on social platforms, but reliable detection needs models that generalize beyond simple keyword heuristics on noisy article text.

Approach

  • Built a preprocessing pipeline with lemmatization, stopword handling, and stratified train/test splits on WELFake.
  • Compared TF-IDF, Doc2Vec, Word2Vec, and Sentence2Vec embeddings with logistic regression, random forest, XGBoost, and k-NN baselines.
  • Fine-tuned BERT on cleaned article text and evaluated against the classical embedding baselines.
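One of the classical baselines above (TF-IDF features with logistic regression on a stratified split) can be sketched roughly as follows. This is a minimal illustration and not the project's actual code: the toy texts, the 0/1 label encoding, the 25% test size, and the random seed are all assumptions, and the lemmatization step is left out to keep the example self-contained.

```python
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score
from sklearn.model_selection import train_test_split
from sklearn.pipeline import make_pipeline

# Toy stand-in for WELFake articles; texts and labels are made up for illustration.
texts = [
    "Scientists confirm miracle cure hidden by the government",
    "Celebrity secretly replaced by clone, insiders claim",
    "Shocking trick melts fat overnight, doctors furious",
    "Aliens endorse candidate in leaked secret memo",
    "Central bank holds interest rates steady this quarter",
    "City council approves budget for road repairs",
    "Researchers publish peer-reviewed study on sleep patterns",
    "Parliament passes revised data protection bill",
]
labels = [0, 0, 0, 0, 1, 1, 1, 1]  # assumed encoding: 0 = fake, 1 = real

# Stratified split keeps the fake/real ratio the same in train and test.
X_train, X_test, y_train, y_test = train_test_split(
    texts, labels, test_size=0.25, stratify=labels, random_state=42
)

# TF-IDF with built-in English stopword removal, followed by logistic regression.
baseline = make_pipeline(
    TfidfVectorizer(lowercase=True, stop_words="english"),
    LogisticRegression(max_iter=1000),
)
baseline.fit(X_train, y_train)

accuracy = accuracy_score(y_test, baseline.predict(X_test))
print(f"baseline accuracy: {accuracy:.3f}")
```

Swapping `LogisticRegression` for another scikit-learn estimator (for example `RandomForestClassifier` or `KNeighborsClassifier`) leaves the rest of the pipeline unchanged, which is one way to compare several classifiers on the same features.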

Result

Fine-tuned BERT reached 99.5% accuracy on WELFake, outperforming the classical embedding pipelines built earlier in the project.

Stack

Python · BERT · scikit-learn · Gensim · NLTK · pandas
Team repo (course)