Skip to content

Universal Mill List Report

Rainforest Alliance Download Page

Overview

Property Value
URL https://www.rainforest-alliance.org/business/certification/the-universal-mill-list/
Source Rainforest Alliance
Format Excel (.xlsx) or CSV
Update Frequency Monthly
Authentication None (public download)

Data Source

The UML is published in two places:

  1. WRI Global Forest Watch - Updated every 6 months
  2. Rainforest Alliance - Updated monthly (used by this scraper)

The scraper parses the page HTML to find the "Download the Excel version" button and extracts the download URL.

Raw Data Structure

The Excel/CSV file contains the following columns:

┌──────────┬─────────────┬────────────────┬──────────────┬───────────┬─────────┐
│ UML ID   │ Group Name  │ Parent Company │ Company Name │ Mill Name │ Address │
├──────────┼─────────────┼────────────────┼──────────────┼───────────┼─────────┤
│ PO-00001 │ Sime Darby  │ Sime Darby Plt │ SD Palm...   │ Mill A    │ Jln...  │
│ PO-00002 │ IOI Group   │ IOI Corp Bhd   │ IOI Olein    │ Mill B    │ Lot...  │
│ PO-00003 │ Wilmar      │ Wilmar Int'l   │ Wilmar M'sia │ Mill C    │ No...   │
└──────────┴─────────────┴────────────────┴──────────────┴───────────┴─────────┘

Continued...
┌─────────────┬───────────┬─────────────────────────────────┬──────────┬──────────┐
│ RSPO Status │ RSPO Type │ Date RSPO Certification Status  │ Latitude │Longitude │
├─────────────┼───────────┼─────────────────────────────────┼──────────┼──────────┤
│ Certified   │ IP        │ 12/15/2023                      │ 3.1234   │ 101.5678 │
│ Certified   │ MB        │ 06/20/2022                      │ 2.9876   │ 102.3456 │
│ RSPO Member │ -         │ -                               │ 1.2345   │ 110.9876 │
└─────────────┴───────────┴─────────────────────────────────┴──────────┴──────────┘

Continued...
┌─────────────────┬─────┬─────────────┬──────────┬──────────┬───────┐
│ GPS Coordinates │ ISO │ Country     │ Province │ District │ State │
├─────────────────┼─────┼─────────────┼──────────┼──────────┼───────┤
│ 3.1234,101.5678 │ MY  │ Malaysia    │ Selangor │ Klang    │ -     │
│ 2.9876,102.3456 │ MY  │ Malaysia    │ Pahang   │ Bentong  │ -     │
│ 1.2345,110.9876 │ ID  │ Indonesia   │ Riau     │ Pelalawan│ -     │
└─────────────────┴─────┴─────────────┴──────────┴──────────┴───────┘

Additional columns: Confidence Level, Alternative Name

Data Transformations

Column Name Standardization

All column names are converted to lowercase with underscores: - UML IDuml_id - Parent Companyparent_company - GPS Coordinatesgps_coordinates

Date Format Conversion

  • Input: MM/DD/YYYY (e.g., 12/15/2023)
  • Output: YYYY-MM-DD (e.g., 2023-12-15)

GPS Coordinate Processing

String coordinates are converted to PostGIS WKT format:

# Input: "3.1234,101.5678" (lat,lon)
# Output: "POINT (101.5678 3.1234)" (lon,lat for PostGIS)

from shapely.geometry import Point

def process_gps(coord_str: str) -> str:
    lat, lon = map(float, coord_str.split(","))
    return Point(lon, lat).wkt  # PostGIS expects (lon, lat)

Coordinate Validation

  • Latitude clipped to [-90, 90]
  • Longitude clipped to [-180, 180]
  • Missing values filled with 0.0

Encoding Detection

For CSV files, character encoding is auto-detected using chardet:

import chardet
encoding = chardet.detect(content)["encoding"] or "utf-8"

Tidy Data Structure

Table: traceability.uml_data

┌──────────┬─────────────┬────────────────┬───────────┬─────────────┬───────────┐
│ uml_id   │ group_name  │ parent_company │ mill_name │ rspo_status │ rspo_type │
├──────────┼─────────────┼────────────────┼───────────┼─────────────┼───────────┤
│ PO-00001 │ Sime Darby  │ Sime Darby Plt │ Mill A    │ Certified   │ IP        │
│ PO-00002 │ IOI Group   │ IOI Corp Bhd   │ Mill B    │ Certified   │ MB        │
│ PO-00003 │ Wilmar      │ Wilmar Int'l   │ Mill C    │ RSPO Member │           │
└──────────┴─────────────┴────────────────┴───────────┴─────────────┴───────────┘

┌────────────┬───────────┬─────────────────────────┬─────────────┬────────────────────────┐
│ latitude   │ longitude │ gps_coordinates         │ country     │ updated_at             │
├────────────┼───────────┼─────────────────────────┼─────────────┼────────────────────────┤
│ 3.1234     │ 101.5678  │ POINT (101.5678 3.1234) │ Malaysia    │ 2025-12-14T21:00:00+08 │
│ 2.9876     │ 102.3456  │ POINT (102.3456 2.9876) │ Malaysia    │ 2025-12-14T21:00:00+08 │
│ 1.2345     │ 110.9876  │ POINT (110.9876 1.2345) │ Indonesia   │ 2025-12-14T21:00:00+08 │
└────────────┴───────────┴─────────────────────────┴─────────────┴────────────────────────┘

Schema

Column Type Nullable Description
uml_id text NO Unique mill identifier (Primary Key)
group_name text YES RSPO member group name
parent_company text YES Parent company name
company_name text YES Operating company name
mill_name text YES Mill name
address text YES Mill physical address
rspo_status text YES RSPO certification status
rspo_type text YES Certification type (IP, MB, etc.)
date_rspo_certification_status date YES Date of certification status
latitude float8 YES GPS latitude (0.0 if missing)
longitude float8 YES GPS longitude (0.0 if missing)
gps_coordinates geometry YES PostGIS Point geometry (SRID 4326)
iso text YES ISO 3166-1 alpha-2 country code
country text YES Country name
province text YES Province or state
district text YES District
state text YES State (if different from province)
confidence_level text YES GPS coordinate confidence
alternative_name text YES Alternative mill names
updated_at timestamptz YES Last sync timestamp (SGT)

GCS Storage

gs://calee_data/raw/uml/
├── 20251214_210000.xlsx    # Raw file with timestamp
├── 20251213_210000.xlsx
└── ...

Naming Convention: {YYYYMMDD}_{HHMMSS}.xlsx

Example Queries

Count Mills by Country

SELECT country, COUNT(*) as mill_count
FROM traceability.uml_data
GROUP BY country
ORDER BY mill_count DESC
LIMIT 10;

Find RSPO Certified Mills in Malaysia

SELECT uml_id, mill_name, parent_company, rspo_status, rspo_type
FROM traceability.uml_data
WHERE country = 'Malaysia'
  AND rspo_status = 'Certified'
ORDER BY parent_company, mill_name;

Mills with Valid GPS Coordinates

SELECT uml_id, mill_name, latitude, longitude, gps_coordinates
FROM traceability.uml_data
WHERE latitude != 0 AND longitude != 0
  AND gps_coordinates IS NOT NULL;

Spatial Query: Mills within Radius

SELECT uml_id, mill_name, country,
       ST_Distance(
         gps_coordinates::geography,
         ST_MakePoint(101.6869, 3.1390)::geography
       ) / 1000 as distance_km
FROM traceability.uml_data
WHERE gps_coordinates IS NOT NULL
  AND ST_DWithin(
    gps_coordinates::geography,
    ST_MakePoint(101.6869, 3.1390)::geography,
    100000  -- 100km radius
  )
ORDER BY distance_km;

Load from GCS (DuckDB)

SELECT *
FROM read_parquet('gs://calee_data/raw/uml/*.xlsx')
LIMIT 100;

RSPO Status Values

Status Description
Certified Fully RSPO certified
RSPO Member RSPO member but not certified
Not Certified Not RSPO certified
Suspended Certification suspended
Withdrawn Certification withdrawn

RSPO Type Values

Type Full Name Description
IP Identity Preserved Certified sustainable oil traceable to single mill
SG Segregated Certified sustainable oil, may mix mills
MB Mass Balance Mix of certified and conventional oil
B&C Book & Claim Credits purchased, oil may be conventional

Data Quality Notes

  1. Missing Coordinates: Mills with missing GPS data have latitude/longitude set to 0.0
  2. Coordinate Precision: Original data may have varying precision levels
  3. Date Format: Input dates in MM/DD/YYYY format, stored as YYYY-MM-DD
  4. String Nulls: Empty strings used for missing text values
  5. File Format: Both Excel (.xlsx) and CSV formats supported