Why Prime Members Stay: A Deep-Dive with XGBoost

Predictive Modeling for Subscription Plan Preferences

Machine Learning Python Decision Trees Data Analysis

Project Overview

This project focused on analyzing Amazon Prime subscriber data to build predictive models that identify factors influencing users' subscription plan choices between Annual and Monthly plans. The insights aim to help optimize subscription offerings, improve customer retention, and enhance overall user satisfaction.

Key Achievement

Developed a Decision Tree model that achieved an F1-score of 58.3%, recall of 60.6%, and accuracy of 54.7% in predicting subscription plan preferences, providing actionable insights for targeted marketing and retention strategies.

Data & Methodology

The analysis utilized Amazon Prime subscriber data containing demographic information, engagement metrics (such as usage frequency and favorite genres), and behavioral patterns (including renewal status and payment methods).

Project Workflow

  1. Data Preprocessing: Cleaned the dataset, handled missing values, and encoded categorical variables
  2. Exploratory Data Analysis: Identified patterns and correlations between features and subscription plan choices
  3. Feature Selection: Determined the most influential variables for predicting plan preferences
  4. Model Development: Trained and evaluated multiple machine learning models
  5. Model Evaluation: Compared performance metrics to select the best model
  6. Insights & Recommendations: Derived actionable strategies based on model findings
Project Workflow Diagram

Workflow diagram showing the end-to-end process from data preprocessing to recommendations

Modeling Approach

I implemented and compared multiple machine learning algorithms to find the most effective approach for predicting subscription plan preferences:

# Model training and evaluation code snippet
from sklearn.model_selection import train_test_split
from sklearn.tree import DecisionTreeClassifier
from sklearn.metrics import classification_report, accuracy_score, f1_score, recall_score

# Split data
X_train, X_test, y_train, y_test = train_test_split(X, y, test_size=0.3, random_state=42)

# Train Decision Tree model
dt_model = DecisionTreeClassifier(random_state=42)
dt_model.fit(X_train, y_train)

# Make predictions
y_pred = dt_model.predict(X_test)

# Evaluate model
accuracy = accuracy_score(y_test, y_pred)
f1 = f1_score(y_test, y_pred)
recall = recall_score(y_test, y_pred)

print(f"Accuracy: {accuracy:.3f}")
print(f"F1-Score: {f1:.3f}")
print(f"Recall: {recall:.3f}")
print("\nClassification Report:\n", classification_report(y_test, y_pred))

Model Performance

After evaluating multiple models, the Decision Tree algorithm emerged as the best performer for this specific prediction task.

Accuracy

54.7%

Overall prediction correctness

F1-Score

58.3%

Balance of precision and recall

Recall

60.6%

True positive rate

Model Performance Comparison

Performance comparison of different modeling approaches

While the accuracy metrics might appear moderate, they represent a significant improvement over baseline predictions in this complex domain where subscription choices are influenced by numerous subtle factors.

Key Findings

Feature Importance Chart

Feature importance chart showing the top factors influencing subscription plan choice

Critical Factors Influencing Plan Choice

The analysis revealed several key factors that significantly impact users' subscription plan preferences:

Subscriber Behavior Patterns

Annual plan subscribers typically showed higher engagement and loyalty, while Monthly plan subscribers exhibited more variable usage patterns and required more customer support assistance.

Recommendations

Based on the model insights, I developed the following actionable recommendations:

  1. Personalized Subscription Plans: Offer customized plans based on users' usage patterns and favorite content genres
  2. Targeted Retention Campaigns: Identify users at risk of churning and target them with specific promotional offers
  3. Optimized Renewal Strategies: Implement timely renewal reminders and incentives for auto-renewal enrollment
  4. Enhanced Customer Support: Improve support services for users showing patterns associated with churn risk

Business Impact

Implementation of these data-driven strategies could significantly improve customer retention rates and optimize the distribution of subscription plans, ultimately enhancing revenue and customer satisfaction.

Conclusion & Future Work

This project successfully identified key factors influencing Amazon Prime users' subscription plan preferences and developed a predictive model to guide business strategies. The insights provide a foundation for more personalized marketing approaches and improved customer retention efforts.

Future work could expand on these findings by:

Tools & Technologies

Python
Programming
Pandas
Data Processing
Scikit-learn
Machine Learning
Matplotlib
Visualization
Seaborn
Visualization
Jupyter
Development