Universal Mill List Report

Overview
| Property | Value |
|---|---|
| URL | https://www.rainforest-alliance.org/business/certification/the-universal-mill-list/ |
| Source | Rainforest Alliance |
| Format | Excel (.xlsx) or CSV |
| Update Frequency | Monthly |
| Authentication | None (public download) |
Data Source
The UML is published in two places:
- WRI Global Forest Watch - Updated every 6 months
- Rainforest Alliance - Updated monthly (used by this scraper)
The scraper parses the page HTML to find the "Download the Excel version" button and extracts the download URL.
Raw Data Structure
The Excel/CSV file contains the following columns:
┌──────────┬─────────────┬────────────────┬──────────────┬───────────┬─────────┐
│ UML ID │ Group Name │ Parent Company │ Company Name │ Mill Name │ Address │
├──────────┼─────────────┼────────────────┼──────────────┼───────────┼─────────┤
│ PO-00001 │ Sime Darby │ Sime Darby Plt │ SD Palm... │ Mill A │ Jln... │
│ PO-00002 │ IOI Group │ IOI Corp Bhd │ IOI Olein │ Mill B │ Lot... │
│ PO-00003 │ Wilmar │ Wilmar Int'l │ Wilmar M'sia │ Mill C │ No... │
└──────────┴─────────────┴────────────────┴──────────────┴───────────┴─────────┘
Continued...
┌─────────────┬───────────┬─────────────────────────────────┬──────────┬──────────┐
│ RSPO Status │ RSPO Type │ Date RSPO Certification Status │ Latitude │Longitude │
├─────────────┼───────────┼─────────────────────────────────┼──────────┼──────────┤
│ Certified │ IP │ 12/15/2023 │ 3.1234 │ 101.5678 │
│ Certified │ MB │ 06/20/2022 │ 2.9876 │ 102.3456 │
│ RSPO Member │ - │ - │ 1.2345 │ 110.9876 │
└─────────────┴───────────┴─────────────────────────────────┴──────────┴──────────┘
Continued...
┌─────────────────┬─────┬─────────────┬──────────┬──────────┬───────┐
│ GPS Coordinates │ ISO │ Country │ Province │ District │ State │
├─────────────────┼─────┼─────────────┼──────────┼──────────┼───────┤
│ 3.1234,101.5678 │ MY │ Malaysia │ Selangor │ Klang │ - │
│ 2.9876,102.3456 │ MY │ Malaysia │ Pahang │ Bentong │ - │
│ 1.2345,110.9876 │ ID │ Indonesia │ Riau │ Pelalawan│ - │
└─────────────────┴─────┴─────────────┴──────────┴──────────┴───────┘
Additional columns: Confidence Level, Alternative Name
Data Transformations
Column Name Standardization
All column names are converted to lowercase with underscores:
- UML ID → uml_id
- Parent Company → parent_company
- GPS Coordinates → gps_coordinates
Date Format Conversion
- Input:
MM/DD/YYYY(e.g.,12/15/2023) - Output:
YYYY-MM-DD(e.g.,2023-12-15)
GPS Coordinate Processing
String coordinates are converted to PostGIS WKT format:
# Input: "3.1234,101.5678" (lat,lon)
# Output: "POINT (101.5678 3.1234)" (lon,lat for PostGIS)
from shapely.geometry import Point
def process_gps(coord_str: str) -> str:
lat, lon = map(float, coord_str.split(","))
return Point(lon, lat).wkt # PostGIS expects (lon, lat)
Coordinate Validation
- Latitude clipped to [-90, 90]
- Longitude clipped to [-180, 180]
- Missing values filled with 0.0
Encoding Detection
For CSV files, character encoding is auto-detected using chardet:
Tidy Data Structure
Table: traceability.uml_data
┌──────────┬─────────────┬────────────────┬───────────┬─────────────┬───────────┐
│ uml_id │ group_name │ parent_company │ mill_name │ rspo_status │ rspo_type │
├──────────┼─────────────┼────────────────┼───────────┼─────────────┼───────────┤
│ PO-00001 │ Sime Darby │ Sime Darby Plt │ Mill A │ Certified │ IP │
│ PO-00002 │ IOI Group │ IOI Corp Bhd │ Mill B │ Certified │ MB │
│ PO-00003 │ Wilmar │ Wilmar Int'l │ Mill C │ RSPO Member │ │
└──────────┴─────────────┴────────────────┴───────────┴─────────────┴───────────┘
┌────────────┬───────────┬─────────────────────────┬─────────────┬────────────────────────┐
│ latitude │ longitude │ gps_coordinates │ country │ updated_at │
├────────────┼───────────┼─────────────────────────┼─────────────┼────────────────────────┤
│ 3.1234 │ 101.5678 │ POINT (101.5678 3.1234) │ Malaysia │ 2025-12-14T21:00:00+08 │
│ 2.9876 │ 102.3456 │ POINT (102.3456 2.9876) │ Malaysia │ 2025-12-14T21:00:00+08 │
│ 1.2345 │ 110.9876 │ POINT (110.9876 1.2345) │ Indonesia │ 2025-12-14T21:00:00+08 │
└────────────┴───────────┴─────────────────────────┴─────────────┴────────────────────────┘
Schema
| Column | Type | Nullable | Description |
|---|---|---|---|
uml_id |
text | NO | Unique mill identifier (Primary Key) |
group_name |
text | YES | RSPO member group name |
parent_company |
text | YES | Parent company name |
company_name |
text | YES | Operating company name |
mill_name |
text | YES | Mill name |
address |
text | YES | Mill physical address |
rspo_status |
text | YES | RSPO certification status |
rspo_type |
text | YES | Certification type (IP, MB, etc.) |
date_rspo_certification_status |
date | YES | Date of certification status |
latitude |
float8 | YES | GPS latitude (0.0 if missing) |
longitude |
float8 | YES | GPS longitude (0.0 if missing) |
gps_coordinates |
geometry | YES | PostGIS Point geometry (SRID 4326) |
iso |
text | YES | ISO 3166-1 alpha-2 country code |
country |
text | YES | Country name |
province |
text | YES | Province or state |
district |
text | YES | District |
state |
text | YES | State (if different from province) |
confidence_level |
text | YES | GPS coordinate confidence |
alternative_name |
text | YES | Alternative mill names |
updated_at |
timestamptz | YES | Last sync timestamp (SGT) |
GCS Storage
gs://calee_data/raw/uml/
├── 20251214_210000.xlsx # Raw file with timestamp
├── 20251213_210000.xlsx
└── ...
Naming Convention: {YYYYMMDD}_{HHMMSS}.xlsx
Example Queries
Count Mills by Country
SELECT country, COUNT(*) as mill_count
FROM traceability.uml_data
GROUP BY country
ORDER BY mill_count DESC
LIMIT 10;
Find RSPO Certified Mills in Malaysia
SELECT uml_id, mill_name, parent_company, rspo_status, rspo_type
FROM traceability.uml_data
WHERE country = 'Malaysia'
AND rspo_status = 'Certified'
ORDER BY parent_company, mill_name;
Mills with Valid GPS Coordinates
SELECT uml_id, mill_name, latitude, longitude, gps_coordinates
FROM traceability.uml_data
WHERE latitude != 0 AND longitude != 0
AND gps_coordinates IS NOT NULL;
Spatial Query: Mills within Radius
SELECT uml_id, mill_name, country,
ST_Distance(
gps_coordinates::geography,
ST_MakePoint(101.6869, 3.1390)::geography
) / 1000 as distance_km
FROM traceability.uml_data
WHERE gps_coordinates IS NOT NULL
AND ST_DWithin(
gps_coordinates::geography,
ST_MakePoint(101.6869, 3.1390)::geography,
100000 -- 100km radius
)
ORDER BY distance_km;
Load from GCS (DuckDB)
RSPO Status Values
| Status | Description |
|---|---|
Certified |
Fully RSPO certified |
RSPO Member |
RSPO member but not certified |
Not Certified |
Not RSPO certified |
Suspended |
Certification suspended |
Withdrawn |
Certification withdrawn |
RSPO Type Values
| Type | Full Name | Description |
|---|---|---|
IP |
Identity Preserved | Certified sustainable oil traceable to single mill |
SG |
Segregated | Certified sustainable oil, may mix mills |
MB |
Mass Balance | Mix of certified and conventional oil |
B&C |
Book & Claim | Credits purchased, oil may be conventional |
Data Quality Notes
- Missing Coordinates: Mills with missing GPS data have latitude/longitude set to 0.0
- Coordinate Precision: Original data may have varying precision levels
- Date Format: Input dates in MM/DD/YYYY format, stored as YYYY-MM-DD
- String Nulls: Empty strings used for missing text values
- File Format: Both Excel (.xlsx) and CSV formats supported