MSPO Public API Scraper
Overview
Scraper for the public Malaysian Sustainable Palm Oil (MSPO) certification data: a database of certified palm oil entities, smallholders, and supply chain actors.
Authentication
None required. Public API.
Source
| Property | Value |
| --- | --- |
| Base URL | https://api.mspots.org.my/api |
| Portal | https://emspo.org.my/public |
| Format | JSON |
| Update Frequency | Daily |
| Data Provider | MPOCC (Malaysian Palm Oil Certification Council) |
Available Public Endpoints
Smallholders
| Endpoint | Type | Records | Description |
| --- | --- | --- | --- |
| spoc/get-spoc-list-public | Bulk | ~306k | Independent smallholders (SPOC) |
| groupmanager/list-public | Bulk | ~2.3k | Group Manager smallholders |
| osh/all-smallholder | Bulk | ~47.5k | Organised smallholders (FELDA, etc.) |
Reference Data
| Endpoint | Type | Records | Description |
| --- | --- | --- | --- |
| references/states | Reference | 25 | Malaysian states |
| references/countries | Reference | ~200 | Countries |
| spoc/zones | Reference | 11 | MSPO zones |
| spoc/zone-state | Reference | 16 | Zone-state mapping |
| references/cb | Reference | ~30 | Certification bodies |
| references/entity-types | Reference | ~10 | Entity type codes |
Note: Public endpoints return ALL records in a single response regardless of pagination parameters.
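Because each public endpoint returns its full dataset in one response, a single request per endpoint is enough. The following is a minimal sketch using only the standard library. It assumes a plain GET request with no extra headers beyond `Accept`; the HTTP method and the shape of the response body are not documented here, and `build_url` and `fetch_json` are illustrative names rather than functions from the scraper.

```python
import json
import urllib.request

BASE_URL = "https://api.mspots.org.my/api"


def build_url(endpoint: str, base_url: str = BASE_URL) -> str:
    """Join the base URL and an endpoint path with exactly one slash."""
    return f"{base_url.rstrip('/')}/{endpoint.lstrip('/')}"


def fetch_json(endpoint: str, timeout: float = 120.0):
    """Fetch one public endpoint and decode the JSON body.

    Assumption: a plain GET is accepted. Bulk endpoints return hundreds of
    thousands of records at once, so the timeout is deliberately generous.
    """
    request = urllib.request.Request(
        build_url(endpoint),
        headers={"Accept": "application/json"},
    )
    with urllib.request.urlopen(request, timeout=timeout) as response:
        return json.load(response)
```

Calling `fetch_json("references/states")` would then return the decoded payload for the states reference endpoint, provided the GET assumption holds.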
Workflow
1. Fetch JSON → 2. Clean/Transform → 3. Save to GCS (Parquet) → 4. Upsert Supabase
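The four steps can be sketched as a small pipeline with injectable stages. This is an illustration only, not the scraper's actual code: `clean_records` and `run_pipeline` are hypothetical names, the cleaning rules (trim strings, turn empty strings into `None`) are an assumption about what "Clean/Transform" involves, and the GCS and Supabase writers are passed in as callables so no client library API is assumed.

```python
from typing import Any, Callable, Dict, Iterable, List

Record = Dict[str, Any]


def clean_records(raw: Iterable[Record]) -> List[Record]:
    """Trim string values and convert empty strings to None (assumed rules)."""
    cleaned = []
    for row in raw:
        out = {}
        for key, value in row.items():
            if isinstance(value, str):
                value = value.strip() or None
            out[key] = value
        cleaned.append(out)
    return cleaned


def run_pipeline(
    fetch: Callable[[], Iterable[Record]],
    save: Callable[[List[Record]], Any],
    upsert: Callable[[List[Record]], Any],
    transform: Callable[[Iterable[Record]], List[Record]] = clean_records,
) -> int:
    """Run the four workflow steps in order and return the row count."""
    raw = fetch()          # 1. Fetch JSON
    rows = transform(raw)  # 2. Clean/Transform
    save(rows)             # 3. Save to GCS (Parquet)
    upsert(rows)           # 4. Upsert Supabase
    return len(rows)
```

Keeping the writers injectable makes the flow easy to exercise locally with in-memory stand-ins before pointing it at real storage.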
GCS Storage
gs://calee_data/raw/mspo/api/
├── smallholders/
│ ├── spoc/
│ │ ├── list-public/ # Independent smallholders (~306k)
│ │ ├── zones/
│ │ ├── zone-state/
│ │ ├── daerah/
│ │ └── spocs/
│ ├── gm/
│ │ ├── list-public/ # Group Manager smallholders (~2.3k)
│ │ └── groupmanagers/
│ └── osh/
│ └── all-smallholder/ # Organised smallholders (~47.5k)
├── references/
│ ├── states/
│ ├── countries/
│ ├── cb/
│ ├── audit-scopes/
│ ├── audit-types/
│ └── entity-types/
├── certification/ # Internal only (Prefect)
└── entity/ # Internal only (Prefect)
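As a rough sketch, the layout above can be captured as a mapping from API endpoint to storage prefix. The mapping below is inferred from the directory names and comments in the tree, and the date-stamped file name is purely hypothetical, since the actual object naming scheme is not documented here.

```python
from datetime import date

GCS_ROOT = "gs://calee_data/raw/mspo/api"

# Inferred from the directory tree; not taken from the scraper's code.
ENDPOINT_PREFIXES = {
    "spoc/get-spoc-list-public": "smallholders/spoc/list-public",
    "groupmanager/list-public": "smallholders/gm/list-public",
    "osh/all-smallholder": "smallholders/osh/all-smallholder",
    "spoc/zones": "smallholders/spoc/zones",
    "spoc/zone-state": "smallholders/spoc/zone-state",
    "references/states": "references/states",
    "references/countries": "references/countries",
    "references/cb": "references/cb",
    "references/entity-types": "references/entity-types",
}


def gcs_object_uri(endpoint: str, run_date: date) -> str:
    """Build a gs:// URI for one endpoint's Parquet output.

    Hypothetical naming: one file per run, named after the run date.
    """
    prefix = ENDPOINT_PREFIXES[endpoint]
    return f"{GCS_ROOT}/{prefix}/{run_date.isoformat()}.parquet"
```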
Quick Start
cd cron/scrapers/mspo
# Run all endpoints
uv run python main.py
# List available endpoints
uv run python main.py --list
# Sync specific endpoint
uv run python main.py --endpoint spoc # Independent smallholders
uv run python main.py --endpoint gm # Group Manager smallholders
uv run python main.py --endpoint osh # Organised smallholders
uv run python main.py --endpoint refs # Reference data
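For readers extending the CLI, the flags above could be wired up roughly as follows with `argparse`. This is a sketch under assumptions: the real `main.py` is not shown here, and `build_parser`, `parse_args`, and `targets` are illustrative names. Only the `--list` and `--endpoint` flags and the four endpoint keys come from the commands above.

```python
import argparse
from typing import List, Optional, Sequence

# Endpoint keys and descriptions as used in the Quick Start commands.
ENDPOINT_GROUPS = {
    "spoc": "Independent smallholders",
    "gm": "Group Manager smallholders",
    "osh": "Organised smallholders",
    "refs": "Reference data",
}


def build_parser() -> argparse.ArgumentParser:
    parser = argparse.ArgumentParser(description="MSPO public API scraper")
    parser.add_argument("--list", action="store_true",
                        help="List available endpoints and exit")
    parser.add_argument("--endpoint", choices=list(ENDPOINT_GROUPS),
                        help="Sync a single endpoint group")
    return parser


def parse_args(argv: Optional[Sequence[str]] = None) -> argparse.Namespace:
    return build_parser().parse_args(argv)


def targets(args: argparse.Namespace) -> List[str]:
    """Return the endpoint groups to sync: one if --endpoint is given, else all."""
    return [args.endpoint] if args.endpoint else list(ENDPOINT_GROUPS)
```

Using `choices` means an unknown endpoint key is rejected by the parser with a usage message instead of failing later during the sync.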
Environment Variables
| Variable | Description |
| --- | --- |
| SUPABASE_URL | Supabase project URL |
| SUPABASE_SERVICE_ROLE_KEY | Supabase service role key |
| DISCORD_WEBHOOK_NOTIFICATIONS | Discord webhook for failure alerts |
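A fail-fast check at startup avoids discovering a missing variable only after a long download. The sketch below is one possible approach, not the scraper's actual configuration code: `Settings` and `load_settings` are hypothetical names, and treating the Discord webhook as optional is an assumption.

```python
import os
from dataclasses import dataclass
from typing import Mapping, Optional

REQUIRED_VARS = ("SUPABASE_URL", "SUPABASE_SERVICE_ROLE_KEY")


@dataclass(frozen=True)
class Settings:
    supabase_url: str
    supabase_service_role_key: str
    discord_webhook: Optional[str]  # assumed optional: alerts are skipped if unset


def load_settings(env: Mapping[str, str] = os.environ) -> Settings:
    """Read settings from the environment, raising if a required one is missing."""
    missing = [name for name in REQUIRED_VARS if not env.get(name)]
    if missing:
        raise RuntimeError("Missing environment variables: " + ", ".join(missing))
    return Settings(
        supabase_url=env["SUPABASE_URL"],
        supabase_service_role_key=env["SUPABASE_SERVICE_ROLE_KEY"],
        discord_webhook=env.get("DISCORD_WEBHOOK_NOTIFICATIONS") or None,
    )
```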
Schedule
Daily at 06:00 SGT (6 AM Singapore time)
# Server is UTC, so 06:00 SGT = 22:00 UTC (previous day)
0 22 * * * /home/leeca/workspace/cron/scrapers/mspo/run.sh
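The time-zone arithmetic behind the cron entry can be checked with a few lines of Python. This sketch models SGT as a fixed UTC+8 offset, which holds because Singapore does not observe daylight saving time; `sgt_to_utc` is an illustrative helper name.

```python
from datetime import datetime, timedelta, timezone

# Singapore Time is a fixed UTC+8 offset with no daylight saving.
SGT = timezone(timedelta(hours=8), "SGT")


def sgt_to_utc(local: datetime) -> datetime:
    """Interpret a naive datetime as SGT and convert it to UTC."""
    return local.replace(tzinfo=SGT).astimezone(timezone.utc)


# 06:00 SGT on 15 January corresponds to 22:00 UTC on 14 January,
# which is why the cron hour field is 22.
run_utc = sgt_to_utc(datetime(2025, 1, 15, 6, 0))
```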
Migration Status
Completed (Public Endpoints - Standalone Scraper)
| Endpoint | Type | Records | Status |
| --- | --- | --- | --- |
| spoc/get-spoc-list-public | Bulk | ~306k | Done |
| groupmanager/list-public | Bulk | ~2.3k | Done |
| osh/all-smallholder | Bulk | ~47.5k | Done |
| Reference data (11 endpoints) | Reference | Various | Done |
Remaining (Internal Endpoints - Prefect Scraper)
| Endpoint | Type | Records | Portal Page | Notes |
| --- | --- | --- | --- | --- |
| entity | Paginated | ~130k | Certified Entity | Requires session auth |
| certification/detail-2 | Per-entity | ~130k | Entity Details | Requires session auth |
Note: Entity and certification endpoints require the internal Prefect scraper with session authentication.
Documentation
Smallholders
Reference