
MSPO Public API Scraper

Overview

Scraper for the public certification data of the Malaysian Sustainable Palm Oil (MSPO) scheme: a database of certified palm oil entities, smallholders, and supply chain actors.

Authentication

None required. Public API.

Source

| Property | Value |
| --- | --- |
| Base URL | https://api.mspots.org.my/api |
| Portal | https://emspo.org.my/public |
| Format | JSON |
| Update Frequency | Daily |
| Data Provider | MPOCC (Malaysian Palm Oil Certification Council) |

Available Public Endpoints

Smallholders

| Endpoint | Type | Records | Description |
| --- | --- | --- | --- |
| spoc/get-spoc-list-public | Bulk | ~306k | Independent smallholders (SPOC) |
| groupmanager/list-public | Bulk | ~2.3k | Group Manager smallholders |
| osh/all-smallholder | Bulk | ~47.5k | Organised smallholders (FELDA, etc.) |

Reference Data

| Endpoint | Type | Records | Description |
| --- | --- | --- | --- |
| references/states | Reference | 25 | Malaysian states |
| references/countries | Reference | ~200 | Countries |
| spoc/zones | Reference | 11 | MSPO zones |
| spoc/zone-state | Reference | 16 | Zone-state mapping |
| references/cb | Reference | ~30 | Certification bodies |
| references/entity-types | Reference | ~10 | Entity type codes |

Note: Public endpoints ignore pagination parameters and return all records in a single response.
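Because each public endpoint needs no authentication and returns everything at once, a single request per endpoint is enough. The sketch below shows one way to do that with the standard library. `build_url` and `fetch_json` are hypothetical helper names, and the use of a plain GET request and the long timeout are assumptions, not details taken from the scraper itself.

```python
import json
import urllib.request

BASE_URL = "https://api.mspots.org.my/api"  # from the Source table


def build_url(endpoint: str) -> str:
    """Join the base URL and an endpoint path such as 'spoc/zones'."""
    return f"{BASE_URL}/{endpoint.lstrip('/')}"


def fetch_json(endpoint: str, timeout: float = 120.0):
    """Request one public endpoint and decode the JSON body.

    No auth header and no pagination loop: the full dataset is expected
    in one response. The HTTP method (GET) is an assumption.
    """
    request = urllib.request.Request(
        build_url(endpoint),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

For example, `fetch_json("spoc/zones")` would request the zones reference list. The shape of the decoded payload is not documented here, so inspect it before relying on any field names.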

Workflow

1. Fetch JSON → 2. Clean/Transform → 3. Save to GCS (Parquet) → 4. Upsert Supabase
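The four steps above can be sketched as a small function that takes each I/O step as a callable, so the GCS and Supabase clients can be swapped for fakes in tests. All names here (`clean_records`, `run_pipeline`, the callable parameters) are hypothetical, and the cleaning rule shown (trim strings, empty string becomes `None`) is only an illustrative assumption about what "Clean/Transform" might involve.

```python
from typing import Any, Callable, Iterable

Row = dict[str, Any]


def clean_records(records: Iterable[Row]) -> list[Row]:
    """Step 2 (illustrative): trim strings and turn empty strings into None."""
    cleaned = []
    for record in records:
        row = {}
        for key, value in record.items():
            if isinstance(value, str):
                value = value.strip() or None
            row[key] = value
        cleaned.append(row)
    return cleaned


def run_pipeline(
    endpoint: str,
    fetch: Callable[[str], Iterable[Row]],
    save_parquet: Callable[[str, list[Row]], None],
    upsert: Callable[[str, list[Row]], None],
) -> int:
    """Run the four steps in order and return the number of rows processed."""
    raw = fetch(endpoint)          # 1. Fetch JSON
    rows = clean_records(raw)      # 2. Clean/Transform
    save_parquet(endpoint, rows)   # 3. Save to GCS (Parquet)
    upsert(endpoint, rows)         # 4. Upsert Supabase
    return len(rows)
```

Saving before upserting means the Parquet copy in GCS exists even if the database step fails, which keeps a re-run from needing a second fetch.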

GCS Storage

gs://calee_data/raw/mspo/api/
├── smallholders/
│   ├── spoc/
│   │   ├── list-public/         # Independent smallholders (~306k)
│   │   ├── zones/
│   │   ├── zone-state/
│   │   ├── daerah/
│   │   └── spocs/
│   ├── gm/
│   │   ├── list-public/         # Group Manager smallholders (~2.3k)
│   │   └── groupmanagers/
│   └── osh/
│       └── all-smallholder/     # Organised smallholders (~47.5k)
├── references/
│   ├── states/
│   ├── countries/
│   ├── cb/
│   ├── audit-scopes/
│   ├── audit-types/
│   └── entity-types/
├── certification/               # Internal only (Prefect)
└── entity/                      # Internal only (Prefect)
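The layout above can be turned into a lookup from API endpoint to storage prefix. This is a minimal sketch: the pairing of each endpoint with its folder is inferred from the names in the tree rather than confirmed by the scraper code, and `ENDPOINT_FOLDERS` and `gcs_prefix` are hypothetical names. File names inside each folder are not specified in this README, so the helper stops at the folder level.

```python
GCS_ROOT = "gs://calee_data/raw/mspo/api"

# Endpoint -> folder, read off the tree above (inferred pairing, an assumption).
ENDPOINT_FOLDERS = {
    "spoc/get-spoc-list-public": "smallholders/spoc/list-public",
    "spoc/zones": "smallholders/spoc/zones",
    "spoc/zone-state": "smallholders/spoc/zone-state",
    "groupmanager/list-public": "smallholders/gm/list-public",
    "osh/all-smallholder": "smallholders/osh/all-smallholder",
    "references/states": "references/states",
    "references/countries": "references/countries",
    "references/cb": "references/cb",
    "references/entity-types": "references/entity-types",
}


def gcs_prefix(endpoint: str) -> str:
    """Return the GCS folder prefix for a public endpoint."""
    try:
        folder = ENDPOINT_FOLDERS[endpoint]
    except KeyError:
        raise ValueError(f"no GCS folder known for endpoint {endpoint!r}") from None
    return f"{GCS_ROOT}/{folder}/"
```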

Quick Start

cd cron/scrapers/mspo

# Run all endpoints
uv run python main.py

# List available endpoints
uv run python main.py --list

# Sync specific endpoint
uv run python main.py --endpoint spoc     # Independent smallholders
uv run python main.py --endpoint gm       # Group Manager smallholders
uv run python main.py --endpoint osh      # Organised smallholders
uv run python main.py --endpoint refs     # Reference data
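For readers who want to see how flags like these are typically wired up, here is a sketch using `argparse`. It is not the actual `main.py`, which this README does not show. The function names and the behaviour of returning an empty list for `--list` are assumptions; only the flag names and the four endpoint keys come from the commands above.

```python
import argparse

# Keys and descriptions taken from the Quick Start comments.
ENDPOINT_GROUPS = {
    "spoc": "Independent smallholders",
    "gm": "Group Manager smallholders",
    "osh": "Organised smallholders",
    "refs": "Reference data",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="MSPO public API scraper")
    parser.add_argument("--list", action="store_true",
                        help="list available endpoints and exit")
    parser.add_argument("--endpoint", choices=sorted(ENDPOINT_GROUPS),
                        help="sync one endpoint group; omit to run all")
    return parser


def select_groups(argv: list[str]) -> list[str]:
    """Return the endpoint groups to sync for the given command-line arguments."""
    args = build_parser().parse_args(argv)
    if args.list:
        for name, description in ENDPOINT_GROUPS.items():
            print(f"{name:5} {description}")
        return []  # listing only, nothing to sync
    return [args.endpoint] if args.endpoint else list(ENDPOINT_GROUPS)
```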

Environment Variables

| Variable | Description |
| --- | --- |
| SUPABASE_URL | Supabase project URL |
| SUPABASE_SERVICE_ROLE_KEY | Supabase service role key |
| DISCORD_WEBHOOK_NOTIFICATIONS | Discord webhook for failure alerts |
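A scraper run is easier to debug if missing variables are reported up front rather than at the first database call. The sketch below shows one way to check them. `load_settings` is a hypothetical helper, and treating the Discord webhook as optional is an assumption: this README does not say whether the scraper requires it.

```python
import os

REQUIRED = ("SUPABASE_URL", "SUPABASE_SERVICE_ROLE_KEY")
OPTIONAL = ("DISCORD_WEBHOOK_NOTIFICATIONS",)  # assumed optional: alerts only


def load_settings(environ=None) -> dict:
    """Read the variables from the table above, failing fast if any required one is unset."""
    environ = os.environ if environ is None else environ
    missing = [name for name in REQUIRED if not environ.get(name)]
    if missing:
        raise RuntimeError("missing environment variables: " + ", ".join(missing))
    return {name: environ.get(name) for name in REQUIRED + OPTIONAL}
```

Passing a plain dictionary as `environ` makes the check testable without touching the real process environment.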

Schedule

Daily at 06:00 SGT (Singapore Time, UTC+8)

# Server is UTC, so 06:00 SGT = 22:00 UTC (previous day)
0 22 * * * /home/leeca/workspace/cron/scrapers/mspo/run.sh
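The SGT-to-UTC conversion behind that crontab line can be checked with a few lines of Python. `sgt_to_utc_cron` is a hypothetical helper written for this illustration. A fixed UTC+8 offset is used because Singapore does not observe daylight saving time.

```python
from datetime import datetime, timedelta, timezone

SGT = timezone(timedelta(hours=8), "SGT")  # fixed offset, no DST in Singapore


def sgt_to_utc_cron(hour: int, minute: int = 0) -> str:
    """Return a daily cron expression in UTC for a wall-clock time in SGT."""
    local = datetime(2024, 1, 1, hour, minute, tzinfo=SGT)  # the date is arbitrary
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * *"


print(sgt_to_utc_cron(6))  # 0 22 * * *
```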

Migration Status

Completed (Public Endpoints - Standalone Scraper)

| Endpoint | Type | Records | Status |
| --- | --- | --- | --- |
| spoc/get-spoc-list-public | Bulk | ~306k | Done |
| groupmanager/list-public | Bulk | ~2.3k | Done |
| osh/all-smallholder | Bulk | ~47.5k | Done |
| Reference data (11 endpoints) | Reference | Various | Done |

Remaining (Internal Endpoints - Prefect Scraper)

| Endpoint | Type | Records | Portal Page | Notes |
| --- | --- | --- | --- | --- |
| entity | Paginated | ~130k | Certified Entity | Requires session auth |
| certification/detail-2 | Per-entity | ~130k | Entity Details | Requires session auth |

Note: Entity and certification endpoints require the internal Prefect scraper with session authentication.

Documentation

Smallholders

Reference