Sportswear ESG News Classifier
Multi-label text classification for sportswear brand ESG news
Overview
This project implements an end-to-end machine learning pipeline for classifying news articles about sportswear brands into Environmental, Social, Governance (ESG), and Digital Transformation categories. The system monitors 50+ global sportswear brands including Nike, Adidas, Puma, Lululemon, and Patagonia.
ESG Categories
The classifier identifies four main categories with ternary sentiment (positive/neutral/negative):
| Category | Description | Examples |
|---|---|---|
| Environmental | Climate, emissions, sustainability | Carbon neutrality commitments, recycling programs |
| Social | Labor, diversity, community | Worker rights, DEI initiatives, community programs |
| Governance | Ethics, transparency, leadership | Board changes, ethical sourcing, transparency reports |
| Digital Transformation | Technology, innovation | Digital retail, supply chain tech, AI adoption |
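The label space described above (four categories, each paired with one of three sentiments) can be written down as a small schema. This is a minimal sketch, not the project's actual data model: the names `Category`, `Sentiment`, `Label`, and `validate_labels` are hypothetical, and the rule that a category appears at most once per article is an assumption.

```python
from __future__ import annotations

from dataclasses import dataclass
from enum import Enum


class Category(Enum):
    ENVIRONMENTAL = "Environmental"
    SOCIAL = "Social"
    GOVERNANCE = "Governance"
    DIGITAL_TRANSFORMATION = "Digital Transformation"


class Sentiment(Enum):
    POSITIVE = "positive"
    NEUTRAL = "neutral"
    NEGATIVE = "negative"


@dataclass(frozen=True)
class Label:
    """One (category, sentiment) pair attached to an article."""
    category: Category
    sentiment: Sentiment


def validate_labels(labels: list[Label]) -> list[Label]:
    """Multi-label check: several categories are allowed, each at most once (assumed rule)."""
    seen = set()
    for label in labels:
        if label.category in seen:
            raise ValueError(f"duplicate category: {label.category.value}")
        seen.add(label.category)
    return labels


# Hypothetical article that touches two categories with different sentiments.
article_labels = validate_labels([
    Label(Category.ENVIRONMENTAL, Sentiment.POSITIVE),
    Label(Category.SOCIAL, Sentiment.NEGATIVE),
])
```

Using enums rather than free-form strings means a misspelled category or sentiment fails at construction time instead of silently creating a fifth class.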
Live News Feed
The live feed presents the classified ESG news articles in real time. It includes 382 articles with interactive filtering by brand and ESG category.
Technical Architecture
The pipeline consists of the following integrated phases:
Data Collection
Automated collection from NewsData.io and GDELT APIs, with intelligent scraping and language detection.
LLM Labeling
Claude Sonnet classifies articles into ESG categories with evidence extraction and sentiment analysis.
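One way to make LLM output safe to store is to validate it before it reaches the database. The sketch below assumes the model is asked to reply with JSON of the form `{"labels": [{"category", "sentiment", "evidence"}]}`; that reply format, the function name `parse_llm_labels`, and the sample reply are all hypothetical and not taken from the project. The call to the LLM itself is deliberately left out.

```python
import json

ALLOWED_CATEGORIES = {"Environmental", "Social", "Governance", "Digital Transformation"}
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


def parse_llm_labels(raw: str) -> list:
    """Parse and validate a JSON reply in the assumed format.

    Raises ValueError on unknown categories, unknown sentiments,
    or labels that carry no supporting evidence.
    """
    payload = json.loads(raw)
    parsed = []
    for item in payload.get("labels", []):
        category = item.get("category")
        sentiment = item.get("sentiment")
        evidence = (item.get("evidence") or "").strip()
        if category not in ALLOWED_CATEGORIES:
            raise ValueError(f"unknown category: {category!r}")
        if sentiment not in ALLOWED_SENTIMENTS:
            raise ValueError(f"unknown sentiment: {sentiment!r}")
        if not evidence:
            raise ValueError(f"missing evidence for {category}")
        parsed.append({"category": category, "sentiment": sentiment, "evidence": evidence})
    return parsed


# Hypothetical reply, written by hand for illustration only.
sample_reply = json.dumps({
    "labels": [
        {
            "category": "Environmental",
            "sentiment": "positive",
            "evidence": "The brand committed to carbon neutrality.",
        }
    ]
})
labels = parse_llm_labels(sample_reply)
```

Requiring a non-empty evidence string for every label keeps each classification traceable to a passage in the article.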
ML Pre-filters
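Two lightweight classifiers screen articles before LLM labeling and reduce API costs by 40%: a Random Forest model (FP) filters out non-sportswear brand mentions, and a Logistic Regression model (EP) identifies articles with ESG content.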
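The cost saving comes from ordering: cheap classifiers run first, and only articles that pass both reach the paid LLM call. The following is a minimal sketch of that cascade, assuming each pre-filter can be treated as a function returning `True` or `False`. The keyword rules stand in for the trained models purely for illustration, and `run_cascade`, the stub functions, and the sample headlines are hypothetical.

```python
from typing import Callable, List, Tuple


def run_cascade(
    articles: List[str],
    is_sportswear: Callable[[str], bool],   # stands in for the FP filter
    is_esg: Callable[[str], bool],          # stands in for the EP filter
    label_with_llm: Callable[[str], list],  # stands in for the LLM labeling call
) -> Tuple[List[Tuple[str, list]], int]:
    """Return labeled articles and the number of LLM calls actually made."""
    labeled = []
    llm_calls = 0
    for text in articles:
        if not is_sportswear(text):
            continue  # dropped by the first filter, no LLM cost
        if not is_esg(text):
            continue  # dropped by the second filter, no LLM cost
        labeled.append((text, label_with_llm(text)))
        llm_calls += 1
    return labeled, llm_calls


# Placeholder rules, NOT the trained classifiers.
def stub_fp(text: str) -> bool:
    return "spotted" not in text.lower()


def stub_ep(text: str) -> bool:
    return "emissions" in text.lower()


def stub_llm(text: str) -> list:
    return ["Environmental"]


# Hypothetical headlines for illustration.
articles = [
    "Puma pledges to cut factory emissions by 2030",
    "A puma was spotted near a hiking trail",
    "Puma launches a new running shoe colorway",
]
labeled, llm_calls = run_cascade(articles, stub_fp, stub_ep, stub_llm)
```

In this toy run only one of the three headlines reaches the LLM stub; the other two are dropped by the first and second filter respectively.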
MLOps
MLflow tracking, Evidently drift monitoring, and automated retraining pipeline.
Deployment
Docker containers on Google Cloud Run with CI/CD via GitHub Actions.
Technology Stack
- Data Collection: Python, PostgreSQL + pgvector, NewsData.io, GDELT
- ML Pipeline: scikit-learn, sentence-transformers, spaCy
- LLM Integration: Claude Sonnet (Anthropic), OpenAI embeddings
- MLOps: MLflow, Evidently AI, Docker, GitHub Actions
- Deployment: FastAPI, Google Cloud Run
Model Performance
| Model | Test F2 | Recall | Purpose |
|---|---|---|---|
| Random Forest + Sentence Transformers | 0.974 | 98.8% | Filters non-sportswear brand mentions (e.g., "Puma" the animal) |
| Logistic Regression + TF-IDF/LSA | 0.931 | 100% | Identifies ESG content before detailed classification |
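The F2 score is the F-beta score with beta = 2, which weights recall more heavily than precision. That suits a pre-filter, where wrongly discarding a relevant article costs more than passing an irrelevant one downstream. The sketch below computes F-beta and, as an illustration, solves for the precision implied by the reported F2 and recall. The implied values are derived here, not reported by the project; they assume both figures come from the same test set and ignore rounding. The function names are hypothetical.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta = (1 + beta^2) * P * R / (beta^2 * P + R)."""
    b2 = beta ** 2
    denom = b2 * precision + recall
    return 0.0 if denom == 0 else (1 + b2) * precision * recall / denom


def implied_precision(f_score: float, recall: float, beta: float = 2.0) -> float:
    """Rearrange the F-beta formula for P: P = F * R / ((1 + beta^2) * R - beta^2 * F)."""
    b2 = beta ** 2
    return f_score * recall / ((1 + b2) * recall - b2 * f_score)


# Reported figures from the performance summary.
rf_precision = implied_precision(0.974, 0.988)  # roughly 0.92
lr_precision = implied_precision(0.931, 1.000)  # roughly 0.73

# Round-trip check: plugging the implied precision back in recovers the reported F2.
rf_f2 = f_beta(rf_precision, 0.988)
lr_f2 = f_beta(lr_precision, 1.000)
```

Under these assumptions, the Logistic Regression filter trades noticeably lower precision for perfect recall, which is consistent with a stage meant to avoid missing ESG content.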