Sportswear ESG News Classifier

Multi-label text classification for sportswear brand ESG news

Overview

This project implements an end-to-end machine learning pipeline for classifying news articles about sportswear brands into Environmental, Social, Governance (ESG), and Digital Transformation categories. The system monitors 50+ global sportswear brands including Nike, Adidas, Puma, Lululemon, and Patagonia.

ESG Categories

The classifier identifies four main categories with ternary sentiment (positive/neutral/negative):

Category               | Description                        | Examples
Environmental          | Climate, emissions, sustainability | Carbon neutrality commitments, recycling programs
Social                 | Labor, diversity, community        | Worker rights, DEI initiatives, community programs
Governance             | Ethics, transparency, leadership   | Board changes, ethical sourcing, transparency reports
Digital Transformation | Technology, innovation             | Digital retail, supply chain tech, AI adoption
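
One way to represent a single classification result is sketched below, assuming each category that applies to an article carries its own sentiment. The names `Category`, `Sentiment`, and `ArticleLabels` are hypothetical and are not taken from the project's code.

```python
from dataclasses import dataclass, field
from enum import Enum


class Category(Enum):
    ENVIRONMENTAL = "environmental"
    SOCIAL = "social"
    GOVERNANCE = "governance"
    DIGITAL_TRANSFORMATION = "digital_transformation"


class Sentiment(Enum):
    NEGATIVE = -1
    NEUTRAL = 0
    POSITIVE = 1


@dataclass
class ArticleLabels:
    """Multi-label result: zero or more categories, each with a sentiment."""
    brand: str
    labels: dict[Category, Sentiment] = field(default_factory=dict)

    def categories(self) -> list[str]:
        # Sorted so the output is stable regardless of insertion order.
        return sorted(c.value for c in self.labels)


# Illustrative article touching two categories at once.
example = ArticleLabels(
    brand="Patagonia",
    labels={
        Category.ENVIRONMENTAL: Sentiment.POSITIVE,
        Category.GOVERNANCE: Sentiment.NEUTRAL,
    },
)
```

Keeping the sentiment per category, rather than per article, lets one article be positive on one dimension and neutral or negative on another.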

Live News Feed

View the classified ESG news articles in real time:

Browse ESG News Feed

The feed includes 382 articles with interactive filtering by brand and ESG category.

Technical Architecture

The pipeline consists of the following integrated phases:

Data Collection

Automated collection from NewsData.io and GDELT APIs, with intelligent scraping and language detection.

LLM Labeling

Claude Sonnet classifies articles into ESG categories with evidence extraction and sentiment analysis.
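
The labeling step can be pictured as asking the model for structured JSON and validating it before storage. The sketch below assumes a hypothetical response schema (`labels`, `category`, `sentiment`, `evidence`); the project's actual prompt and schema are not shown here, and no API call is made.

```python
import json

ALLOWED_CATEGORIES = {
    "environmental", "social", "governance", "digital_transformation",
}
ALLOWED_SENTIMENTS = {"positive", "neutral", "negative"}


def parse_llm_labels(raw: str) -> list[dict]:
    """Validate a JSON labeling response and return only well-formed labels.

    The schema is an assumption for illustration, not the project's format.
    """
    payload = json.loads(raw)
    valid = []
    for item in payload.get("labels", []):
        category = item.get("category")
        sentiment = item.get("sentiment")
        evidence = (item.get("evidence") or "").strip()
        # Drop labels with unknown values or no supporting quote.
        if category not in ALLOWED_CATEGORIES:
            continue
        if sentiment not in ALLOWED_SENTIMENTS:
            continue
        if not evidence:
            continue
        valid.append(
            {"category": category, "sentiment": sentiment, "evidence": evidence}
        )
    return valid


# A made-up response, used only to exercise the parser.
sample_response = json.dumps({
    "labels": [
        {"category": "environmental", "sentiment": "positive",
         "evidence": "The company pledged to cut emissions."},
        {"category": "sports", "sentiment": "positive", "evidence": "n/a"},
        {"category": "social", "sentiment": "negative", "evidence": ""},
    ]
})
parsed = parse_llm_labels(sample_response)
```

Requiring a non-empty evidence string is one simple guard against labels the model cannot support with a quote from the article.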

ML Pre-filters

A Random Forest false-positive (FP) classifier and a Logistic Regression ESG pre-filter (EP) classifier screen articles before LLM labeling, reducing API costs by 40%.
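
The cost saving comes from running the two cheap classifiers first and sending only the surviving articles to the LLM. Below is a minimal sketch of that gating, with toy keyword rules standing in for the trained models. The functions `select_for_llm`, `is_sportswear`, and `has_esg_content` are hypothetical, and the order of the two filters is an assumption.

```python
from typing import Callable, Iterable


def select_for_llm(
    articles: Iterable[str],
    is_sportswear: Callable[[str], bool],
    has_esg_content: Callable[[str], bool],
) -> list[str]:
    """Keep only articles that pass both pre-filters."""
    return [a for a in articles if is_sportswear(a) and has_esg_content(a)]


# Toy keyword rules standing in for the trained classifiers.
def is_sportswear(text: str) -> bool:
    lowered = text.lower()
    return "sneaker" in lowered or "apparel" in lowered


def has_esg_content(text: str) -> bool:
    lowered = text.lower()
    return any(k in lowered for k in ("emissions", "workers", "board"))


articles = [
    "Puma apparel line targets lower emissions",
    "A puma was spotted near the hiking trail",
    "New sneaker colorway released this week",
    "Apparel maker's board approves audit of factory workers",
]
to_label = select_for_llm(articles, is_sportswear, has_esg_content)

# Share of articles that never reach the LLM in this toy example.
skipped_fraction = 1 - len(to_label) / len(articles)
```

In this toy run half of the articles are skipped; the real saving depends on how much of the incoming feed is off-topic.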

MLOps

MLflow tracking, Evidently drift monitoring, and an automated retraining pipeline.

Deployment

Docker containers on Google Cloud Run with CI/CD via GitHub Actions.

Technology Stack

  • Data Collection: Python, PostgreSQL + pgvector, NewsData.io, GDELT
  • ML Pipeline: scikit-learn, sentence-transformers, spaCy
  • LLM Integration: Claude Sonnet (Anthropic), OpenAI embeddings
  • MLOps: MLflow, Evidently AI, Docker, GitHub Actions
  • Deployment: FastAPI, Google Cloud Run

Model Performance

False Positive Classifier

Model: Random Forest + Sentence Transformers

Test F2: 0.974

Recall: 98.8%

Filters out mentions that do not refer to the sportswear brand (e.g., "Puma" the animal)
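
For reference, F2 is the F-beta score with beta = 2, which weights recall more heavily than precision. That presumably suits a filter where wrongly discarding a real sportswear article costs more than letting an off-topic one through. A small sketch of the formula follows; the input values are illustrative, not the project's results.

```python
def f_beta(precision: float, recall: float, beta: float = 2.0) -> float:
    """F-beta score: (1 + beta^2) * P * R / (beta^2 * P + R)."""
    if precision == 0 and recall == 0:
        return 0.0
    b2 = beta * beta
    return (1 + b2) * precision * recall / (b2 * precision + recall)


# With beta = 2, perfect recall keeps the score high even at modest precision.
high_recall = f_beta(precision=0.5, recall=1.0)     # 2.5 / 3
high_precision = f_beta(precision=1.0, recall=0.5)  # 2.5 / 4.5
```

Swapping precision and recall changes the score, which is the point of using F2 instead of the symmetric F1.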

ESG Pre-filter Classifier

Model: Logistic Regression + TF-IDF/LSA

Test F2: 0.931

Recall: 100%

Identifies ESG content before detailed classification
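
A TF-IDF/LSA plus Logistic Regression model of this kind can be assembled in scikit-learn roughly as follows. This is a minimal sketch on a made-up six-sentence corpus; the component count, class weighting, and decision threshold are assumptions rather than the project's configuration. LSA is implemented here with `TruncatedSVD` applied to the TF-IDF matrix.

```python
from sklearn.decomposition import TruncatedSVD
from sklearn.feature_extraction.text import TfidfVectorizer
from sklearn.linear_model import LogisticRegression
from sklearn.pipeline import make_pipeline

# Made-up training sentences: 1 = ESG-related, 0 = not ESG-related.
texts = [
    "Brand commits to carbon neutrality across its factories",
    "Workers at supplier plant demand fair wages",
    "Board publishes transparency report on ethical sourcing",
    "New running shoe launches in three colors",
    "Team signs kit sponsorship deal for next season",
    "Store opens with weekend discounts on jackets",
]
labels = [1, 1, 1, 0, 0, 0]

model = make_pipeline(
    TfidfVectorizer(),                             # sparse term weights
    TruncatedSVD(n_components=2, random_state=0),  # LSA: dense low-rank topics
    LogisticRegression(class_weight="balanced"),
)
model.fit(texts, labels)

# Probability of the ESG class for an unseen headline.
probabilities = model.predict_proba(
    ["Retailer reports lower emissions from recycling program"]
)[:, 1]

# A threshold below 0.5 trades precision for recall (the value is an assumption).
flagged = probabilities >= 0.3
```

Lowering the decision threshold is one common way to push a pre-filter toward very high recall, at the cost of passing more non-ESG articles on to the next stage.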

Source Code

View on GitHub