NLP classifier · Course project
Fake News Detection
Fine-tuned BERT to 99.5% accuracy on the WELFake fake-vs-real news benchmark.
Overview
Group NLP project built around the WELFake dataset (~72K labeled articles). The work compared classical feature pipelines against a fine-tuned BERT classifier for fake-news detection.
My contribution
BERT fine-tuning, preprocessing/lemmatization workflow, and benchmark evaluation against classical embedding baselines.
Problem
Fake news spreads quickly on social platforms. Detecting it reliably in noisy article text requires models that generalize beyond simple keyword heuristics.
Approach
- Built a preprocessing pipeline with lemmatization, stopword handling, and stratified train/test splits on WELFake.
- Compared TF-IDF, Doc2Vec, Word2Vec, and Sentence2Vec embeddings with logistic regression, random forest, XGBoost, and k-NN baselines.
- Fine-tuned BERT on cleaned article text and evaluated against the classical embedding baselines.
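To illustrate the stratified split from the first step, here is a minimal standard-library sketch. It is not the project's code: the function name `stratified_split`, the 20% test fraction, the seed, and the toy label list are all assumptions made for the example. The idea is to shuffle and split each class separately, so the fake/real ratio is the same in the train and test sets.

```python
import random
from collections import defaultdict


def stratified_split(labels, test_fraction=0.2, seed=0):
    """Return (train_idx, test_idx) with the label ratio preserved in both.

    Hypothetical helper for illustration; test_fraction and seed are
    assumed values, not settings taken from the project.
    """
    # Group example indices by their class label.
    by_label = defaultdict(list)
    for idx, label in enumerate(labels):
        by_label[label].append(idx)

    rng = random.Random(seed)
    train_idx, test_idx = [], []
    for label in sorted(by_label):
        indices = by_label[label]
        rng.shuffle(indices)  # randomize within the class only
        n_test = round(len(indices) * test_fraction)
        test_idx.extend(indices[:n_test])
        train_idx.extend(indices[n_test:])
    return sorted(train_idx), sorted(test_idx)


# Toy labels (assumed): 60 articles of class 0 and 40 of class 1.
labels = [0] * 60 + [1] * 40
train_idx, test_idx = stratified_split(labels, test_fraction=0.2, seed=42)
```

With these toy labels, the test set holds 12 articles of class 0 and 8 of class 1, matching the 60/40 ratio of the full list. Libraries such as scikit-learn offer the same behavior through a `stratify` argument, which is likely the more practical choice on a dataset of this size.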
Result
Fine-tuned BERT reached 99.5% accuracy on WELFake, outperforming the classical embedding pipelines built earlier in the project.
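For reference, accuracy here means the fraction of articles whose predicted label matches the true label. The sketch below shows that calculation on a tiny made-up example; the function name `accuracy` and the label lists are illustrative assumptions, not project data or results.

```python
def accuracy(y_true, y_pred):
    """Fraction of predictions that match the true labels."""
    if len(y_true) != len(y_pred):
        raise ValueError("y_true and y_pred must have the same length")
    if not y_true:
        raise ValueError("cannot compute accuracy on an empty list")
    correct = sum(t == p for t, p in zip(y_true, y_pred))
    return correct / len(y_true)


# Made-up labels for illustration only (1 and 0 stand for the two classes).
y_true = [1, 0, 1, 1, 0, 0, 1, 0]
y_pred = [1, 0, 1, 0, 0, 0, 1, 0]
score = accuracy(y_true, y_pred)  # 7 of 8 correct
```

On this toy input the score is 0.875. Accuracy is easiest to interpret when the classes are roughly balanced, which is one reason a stratified split is useful for this kind of comparison.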