Sentiment analysis is a key application of natural language processing (NLP) in e-commerce, particularly for understanding customer feedback. This study benchmarks five classical machine learning algorithms on a dataset of Amazon product reviews: Naive Bayes, Support Vector Machine (SVM), Logistic Regression (LR), Decision Tree (DT), and Random Forest (RF). Using both CountVectorizer and TF-IDF for feature extraction, we evaluate each model on accuracy and F1-score over multiple runs to account for variance. Logistic Regression achieves the highest performance (F1: 0.847, Accuracy: 0.843), while Decision Tree performs worst. We further analyze overfitting, the bias-variance trade-off, and the explainability of each model’s behavior.
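The benchmarking protocol summarized above can be illustrated with a minimal sketch. It assumes scikit-learn, binary sentiment labels, a stratified 75/25 split re-drawn with a different seed on each run, a linear-kernel SVM, and default hyperparameters. The abstract specifies none of these choices, so they are illustrative assumptions. The toy sentences are hypothetical placeholders rather than the Amazon review data, and the helper name `benchmark` is ours.

```python
from statistics import mean, stdev

from sklearn.ensemble import RandomForestClassifier
from sklearn.feature_extraction.text import CountVectorizer, TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.metrics import accuracy_score, f1_score
from sklearn.model_selection import train_test_split
from sklearn.naive_bayes import MultinomialNB
from sklearn.pipeline import make_pipeline
from sklearn.svm import SVC
from sklearn.tree import DecisionTreeClassifier

# Hypothetical toy data standing in for the review dataset (1 = positive).
POSITIVE = [
    "great product works perfectly", "love it excellent quality",
    "fast shipping and great value", "highly recommend this item",
    "excellent sound and great battery", "works great very happy",
    "perfect fit love the quality", "amazing value highly recommend",
]
NEGATIVE = [
    "terrible product stopped working", "poor quality broke quickly",
    "waste of money very disappointed", "do not buy this item",
    "awful sound and poor battery", "broke after a week terrible",
    "bad fit poor quality", "disappointed waste of money",
]
texts = POSITIVE + NEGATIVE
labels = [1] * len(POSITIVE) + [0] * len(NEGATIVE)

# Two feature extractors and five classifiers; hyperparameters are assumptions.
VECTORIZERS = {"count": CountVectorizer, "tfidf": TfidfVectorizer}
MODELS = {
    "NB": lambda seed: MultinomialNB(),
    "SVM": lambda seed: SVC(kernel="linear", random_state=seed),
    "LR": lambda seed: LogisticRegression(max_iter=1000, random_state=seed),
    "DT": lambda seed: DecisionTreeClassifier(random_state=seed),
    "RF": lambda seed: RandomForestClassifier(n_estimators=50, random_state=seed),
}


def benchmark(texts, labels, n_runs=5, test_size=0.25):
    """Return mean and std of accuracy and F1 per (vectorizer, model) pair.

    n_runs must be at least 2 so that a standard deviation is defined.
    """
    results = {}
    for vec_name, make_vec in VECTORIZERS.items():
        for model_name, make_model in MODELS.items():
            accs, f1s = [], []
            for seed in range(n_runs):
                # A fresh stratified split per run exposes split-to-split variance.
                x_tr, x_te, y_tr, y_te = train_test_split(
                    texts, labels, test_size=test_size,
                    stratify=labels, random_state=seed,
                )
                # Fitting the vectorizer inside the pipeline keeps test text
                # out of the vocabulary and IDF statistics.
                pipe = make_pipeline(make_vec(), make_model(seed))
                pipe.fit(x_tr, y_tr)
                pred = pipe.predict(x_te)
                accs.append(accuracy_score(y_te, pred))
                f1s.append(f1_score(y_te, pred, zero_division=0))
            results[(vec_name, model_name)] = {
                "accuracy_mean": mean(accs), "accuracy_std": stdev(accs),
                "f1_mean": mean(f1s), "f1_std": stdev(f1s),
            }
    return results


results = benchmark(texts, labels)
for (vec_name, model_name), scores in sorted(results.items()):
    print(f"{vec_name:5s} {model_name:3s} "
          f"acc={scores['accuracy_mean']:.3f}±{scores['accuracy_std']:.3f} "
          f"f1={scores['f1_mean']:.3f}±{scores['f1_std']:.3f}")
```

Reporting the standard deviation alongside the mean is one way to make run-to-run variance visible. Scores on the toy data say nothing about the figures reported in this study.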