Universal Mill List (UML) Scraper
Overview
Scrapes the Universal Mill List (UML), a public database of RSPO-certified palm oil mills worldwide published by Rainforest Alliance.
Authentication
None required. Public download.
Source
| Property | Value |
|---|---|
| URL | https://www.rainforest-alliance.org/business/certification/the-universal-mill-list/ |
| Format | Excel (.xlsx) or CSV |
| Update Frequency | Monthly |
| Data Provider | Rainforest Alliance / RSPO |
Workflow
1. Scrape the page for the download link
2. Download the Excel/CSV file
3. Save the file to GCS
4. Parse and clean the data
5. Upsert into Supabase
6. Replace the SQL Server table
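Step 1 of the workflow could be sketched as follows. This is a minimal illustration using only the standard library, not necessarily how the scraper does it: it assumes the download link is a plain `<a href>` ending in `.xlsx` or `.csv`. The names `DownloadLinkParser` and `find_download_links` are hypothetical, and the file path in the usage example is made up.

```python
from html.parser import HTMLParser
from urllib.parse import urljoin

PAGE_URL = "https://www.rainforest-alliance.org/business/certification/the-universal-mill-list/"


class DownloadLinkParser(HTMLParser):
    """Collects anchor hrefs whose path ends in .xlsx or .csv (assumed link shape)."""

    def __init__(self, base_url):
        super().__init__()
        self.base_url = base_url
        self.links = []

    def handle_starttag(self, tag, attrs):
        if tag != "a":
            return
        href = dict(attrs).get("href")
        if not href:
            return
        # Ignore any query string when checking the file extension.
        path = href.split("?", 1)[0].lower()
        if path.endswith((".xlsx", ".csv")):
            # Resolve relative links against the page URL.
            self.links.append(urljoin(self.base_url, href))


def find_download_links(html, base_url=PAGE_URL):
    """Return absolute URLs of candidate UML downloads found in the page HTML."""
    parser = DownloadLinkParser(base_url)
    parser.feed(html)
    return parser.links
```

For example, feeding it `'<a href="/files/uml.xlsx">Download</a>'` (a made-up path) yields one absolute URL on the `www.rainforest-alliance.org` host. If the page builds the link with JavaScript, this approach would find nothing and a different strategy would be needed.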
GCS Storage
Database Tables
Supabase: traceability.uml_data
| Column | Type | Description |
|---|---|---|
| uml_id | text | Unique mill identifier (PK) |
| group_name | text | RSPO member group name |
| parent_company | text | Parent company name |
| company_name | text | Operating company name |
| mill_name | text | Mill name |
| address | text | Mill address |
| rspo_status | text | RSPO certification status |
| rspo_type | text | Type of RSPO certification |
| date_rspo_certification_status | date | Certification date |
| latitude | float | GPS latitude (-90 to 90) |
| longitude | float | GPS longitude (-180 to 180) |
| gps_coordinates | geometry | PostGIS Point (SRID 4326) |
| iso | text | ISO country code |
| country | text | Country name |
| province | text | Province/state |
| district | text | District |
| state | text | State |
| confidence_level | text | GPS confidence level |
| alternative_name | text | Alternative mill names |
| updated_at | timestamptz | Last sync timestamp |
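The latitude, longitude and gps_coordinates columns are related: the point can be derived from the two floats. The sketch below shows one way to do this, assuming the geometry value is sent as an EWKT string (`SRID=4326;POINT(lon lat)`, longitude first) and that out-of-range coordinates are stored as NULL. The helper name `to_ewkt_point` is hypothetical.

```python
def to_ewkt_point(latitude, longitude):
    """Build an EWKT point for SRID 4326, or None if the coordinates are unusable."""
    if latitude is None or longitude is None:
        return None
    lat, lon = float(latitude), float(longitude)
    # Ranges taken from the column descriptions above; NaN fails both checks.
    if not (-90 <= lat <= 90 and -180 <= lon <= 180):
        return None
    # PostGIS points are ordered X Y, i.e. longitude then latitude.
    return f"SRID=4326;POINT({lon} {lat})"
```

Swapping latitude and longitude is the most common mistake here. A point that should be in Southeast Asia but ends up off the coast of Antarctica usually means the order was reversed.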
Unique Constraint: uml_id
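Because the upsert is keyed on uml_id, a batch that contains the same identifier twice is rejected by PostgreSQL (`ON CONFLICT DO UPDATE` cannot affect a row a second time in one statement). A cleaning step could guard against this, for instance by keeping the last occurrence of each identifier. This is a sketch under that assumption; `dedupe_by_uml_id` is a hypothetical helper and the sample identifiers in the usage note are made up.

```python
def dedupe_by_uml_id(rows):
    """Keep the last row seen for each uml_id; drop rows with no identifier."""
    latest = {}
    for row in rows:
        key = (row.get("uml_id") or "").strip()
        if not key:
            continue  # a row without a primary key cannot be upserted
        # Store the trimmed key so stray whitespace does not create duplicates.
        latest[key] = {**row, "uml_id": key}
    return list(latest.values())
```

Given two rows with identifiers `"PO1000000001"` and `" PO1000000001 "`, only the second row survives, with its identifier trimmed.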
SQL Server: traceability.uml_data
Identical schema. Table is fully replaced on each sync.
Quick Start
Environment Variables
| Variable | Description |
|---|---|
| SUPABASE_URL | Supabase project URL |
| SUPABASE_KEY | Supabase service role key |
| SQLSERVER_HOST_JINLEE | SQL Server hostname |
| SQLSERVER_DATABASE_JINLEE | SQL Server database name |
| SQLSERVER_USER_JINLEE | SQL Server username |
| SQLSERVER_PASSWORD_JINLEE | SQL Server password |
| DISCORD_WEBHOOK_NOTIFICATIONS | Discord webhook for failure alerts |
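A quick pre-flight check can confirm that these variables are set before a run. The sketch below assumes all seven are required, which may not hold for the webhook in particular. The function name `missing_env_vars` is hypothetical.

```python
import os

# Names taken from the table above; treating every one as required is an assumption.
REQUIRED_ENV_VARS = (
    "SUPABASE_URL",
    "SUPABASE_KEY",
    "SQLSERVER_HOST_JINLEE",
    "SQLSERVER_DATABASE_JINLEE",
    "SQLSERVER_USER_JINLEE",
    "SQLSERVER_PASSWORD_JINLEE",
    "DISCORD_WEBHOOK_NOTIFICATIONS",
)


def missing_env_vars(environ=os.environ):
    """Return the names of required variables that are unset or empty."""
    return [name for name in REQUIRED_ENV_VARS if not environ.get(name)]


if __name__ == "__main__":
    missing = missing_env_vars()
    if missing:
        raise SystemExit("Missing environment variables: " + ", ".join(missing))
    print("All required environment variables are set.")
```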
Schedule
Daily at 21:00 SGT (9 PM Singapore time)
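Most schedulers expect cron expressions in UTC, so the local time has to be converted. The sketch below does that conversion, assuming a standard five-field cron syntax and a fixed UTC+8 offset for Singapore (which does not observe daylight saving time). The function name `sgt_to_utc_cron` is hypothetical, and whether this project's scheduler uses UTC is not stated in the source.

```python
from datetime import datetime, timedelta, timezone

# Singapore Time is a fixed UTC+8 offset.
SGT = timezone(timedelta(hours=8), "SGT")


def sgt_to_utc_cron(hour, minute=0):
    """Return a daily five-field cron expression in UTC for a given SGT time."""
    # The date is arbitrary; only the time of day matters for a daily job.
    local = datetime(2024, 1, 1, hour, minute, tzinfo=SGT)
    utc = local.astimezone(timezone.utc)
    return f"{utc.minute} {utc.hour} * * *"


print(sgt_to_utc_cron(21))  # 0 13 * * *
```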