The RAdiation transfer Model Intercomparison (RAMI) activity focuses on the benchmarking of canopy radiative transfer (RT) models. For the current fourth phase of RAMI, six highly realistic virtual plant environments were constructed on the basis of intensive field data collected from forest stands (both deciduous and coniferous) as well as test sites in Europe and South Africa. Twelve RT modelling groups provided simulations of canopy-scale (directional and hemispherically integrated) radiative quantities, as well as a series of binary hemispherical photographs acquired from different locations within the virtual canopies. The simulation results showed much greater variance than those recently analysed for the abstract canopy scenarios of RAMI-IV. Canopy complexity is among the most likely drivers behind the operator-induced errors that gave rise to the discrepancies. Conformity testing is introduced to separate the simulation results into acceptable and non-acceptable contributions. More specifically, a shared risk approach is used to evaluate the compliance of RT model simulations on the basis of reference data generated with the weighted ensemble averaging technique from ISO-13528. However, using concepts from legal metrology, the uncertainty of this reference solution is shown to prevent a confident assessment of model performance with respect to the selected tolerance intervals. As an alternative, guarded risk decision rules are presented that account explicitly for the uncertainty associated with the reference and candidate methods. Both guarded acceptance and guarded rejection approaches are used to make confident statements about the acceptance and/or rejection of RT model simulations with respect to the predefined tolerance intervals. © 2015 The Authors.
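The difference between the shared risk and guarded risk decision rules mentioned above can be illustrated with a minimal Python sketch. It is not the procedure used in the study; it follows the generic guard-band formulation common in conformity assessment. All names (`Tolerance`, `guard_band`, `classify`) are hypothetical, the numerical values are invented for illustration, and the guard band is assumed to be an expanded uncertainty obtained by combining the reference and candidate standard uncertainties in quadrature with a coverage factor of 2.

```python
import math
from dataclasses import dataclass


@dataclass(frozen=True)
class Tolerance:
    """Tolerance interval for the deviation of a candidate from the reference."""
    lower: float
    upper: float


def guard_band(u_ref: float, u_cand: float, k: float = 2.0) -> float:
    # Assumption: expanded uncertainty from independent standard uncertainties
    # combined in quadrature; k = 2 is an illustrative coverage factor.
    return k * math.hypot(u_ref, u_cand)


def shared_risk_accept(deviation: float, tol: Tolerance) -> bool:
    # Shared risk: uncertainty is ignored, so the risk of a wrong decision
    # is shared between the two parties.
    return tol.lower <= deviation <= tol.upper


def guarded_accept(deviation: float, tol: Tolerance, guard: float) -> bool:
    # Guarded acceptance: the acceptance zone is the tolerance interval
    # shrunk by the guard band on both sides.
    return tol.lower + guard <= deviation <= tol.upper - guard


def guarded_reject(deviation: float, tol: Tolerance, guard: float) -> bool:
    # Guarded rejection: reject only when the deviation lies outside the
    # tolerance interval widened by the guard band.
    return deviation < tol.lower - guard or deviation > tol.upper + guard


def classify(deviation: float, tol: Tolerance, guard: float) -> str:
    """Combine both guarded rules into a three-way verdict."""
    if guarded_accept(deviation, tol, guard):
        return "accept"
    if guarded_reject(deviation, tol, guard):
        return "reject"
    return "inconclusive"


if __name__ == "__main__":
    tol = Tolerance(lower=-0.05, upper=0.05)       # illustrative tolerance
    guard = guard_band(u_ref=0.006, u_cand=0.008)  # illustrative uncertainties
    for deviation in (0.01, 0.04, 0.08):
        print(deviation, shared_risk_accept(deviation, tol),
              classify(deviation, tol, guard))
```

In this sketch a deviation that passes the shared risk test can still be "inconclusive" under the guarded rules, which is the situation in which the uncertainty of the reference prevents a confident statement about compliance.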