Abstract
In this paper, we discuss the Bile Acid Comparison and Harmonisation project, a sub-study of the Trial of URsodeoxycholic acid vs RIFampicin in early-onset severe Intrahepatic Cholestasis of pregnancy, giving an overview of the current state of affairs for total bile acid measurements.
Introduction
In this paper, we discuss the Bile Acid Comparison and Harmonisation project (BACH), a sub-study of the Trial of URsodeoxycholic acid vs RIFampicin in early-onset severe Intrahepatic Cholestasis of pregnancy (TURRIFIC), giving an overview of the current state of affairs for total bile acid (BA) measurements. Readers can refer to the original BACH protocol for a more detailed explanation. 1
Imagine a conversation between an obstetric physician and his chemical pathologist colleague:
Q. We are planning a clinical trial reliant on the results of bile acid tests.
A. Did you know that the results of bile acid tests are not standardised?
Firstly, let us introduce two terms commonly used in laboratory medicine: standardisation and harmonisation.
Standardisation: results for a particular analyte are equivalent and are traceable to the International System of Units (SI units) through a high-order primary reference material and/or standard reference measurement procedures.
Harmonisation: results are equivalent, being based on a consensus approach using mean values obtained with the different methods.
Slightly different in approach, both standardisation and harmonisation have the same end goal – making results equivalent between different laboratories and different methods. Where applicable, standardisation is the preferred approach, as this incorporates a full traceability chain to higher-order references through linkage to SI units.
It is important to recognise that BA exist as a mixture of different species, arising through various modifications and conjugations. What is measured, and then reported, is the total BA concentration.
Common to all BA species is the presence of a 3-hydroxyl group, which the enzyme 3α-hydroxysteroid dehydrogenase (3α-HSD) converts to a keto-steroid. This conversion generates thio-nicotinamide-adenine dinucleotide (thio-NADH): the increase in absorbance of thio-NADH can be measured and is directly proportional to the concentration of total BA (Figure 1).

Figure 1. Total bile acid (TBA) measurements are non-specific: (a) the 3-hydroxy group (circled) of all bile acids is converted to a ketosteroid (b) via the enzymatic action of 3α-hydroxysteroid dehydrogenase (3α-HSD). (Enzymatic reaction modified from Ducroq et al., Ann Clin Biochem, 2010.)
For many standard analytes, e.g. creatinine, a primary reference material (pure creatinine, 2-amino-1-methyl-imidazol-4-one) is measured by a reference procedure at a metrology institute and used to assign values to a primary calibrator. Manufacturers then use this primary calibration material to calibrate their reagents and instruments, and to assign values to the materials used to calibrate instruments in the routine laboratory that produce patient results.
Patient results are then traceable all the way back to the standard reference material, and any differences in variability between different manufacturers’ creatinine tests can be globally addressed by alignment of the manufacturers’ reagents to this primary reference material. This allows global application of estimated glomerular filtration rate calculations and clinical guidelines.
For BA, however, there is no international standard reference material nor is there a reference measurement method. Moreover, BA exist as a mixture of species, but it is not universally agreed which BA species should be selected to use as a calibrator.
Each individual manufacturer therefore prepares their own master calibrator, assigning values to the product calibrator, which is then used by each individual routine laboratory to generate patient results.
However, the calibrators used by manufacturers vary in several respects, so patient results from one manufacturer may not be equivalent to those from another. Outcomes from clinical trials, and the recommendations of clinical guidelines based on those trials, may therefore be applicable only to the particular methods used.
The study by Danese et al., 2 evaluating three common total BA assays in routine clinical use, found that three different BA species were used as calibrators by manufacturers: chenodeoxycholic acid (Randox), glycocholic acid (Diazyme) and taurodeoxycholic acid (Sentinel). Alarmingly, the study found that the manufacturer-assigned target values of the Randox and Diazyme calibrators were 10–20% lower than the measured values in the material. In addition, the 3α-HSD enzyme being used showed varying activity or sensitivity to the different BA species.
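To illustrate why a mis-assigned calibrator value matters, the sketch below assumes a single-point calibration in which the measured signal is directly proportional to total BA concentration. This is a simplifying assumption for illustration only; the calibration schemes used by the individual manufacturers are not described here, and all numbers (signals, concentrations, the 15% under-assignment) are hypothetical.

```python
def patient_result(sample_signal, calibrator_signal, calibrator_assigned_value):
    """Single-point calibration through the origin (an assumed, simplified model).

    The reported concentration scales with whatever value is assigned
    to the calibrator.
    """
    return sample_signal / calibrator_signal * calibrator_assigned_value


# Hypothetical calibrator that truly contains 50 µmol/L but is labelled 15% lower.
true_calibrator_value = 50.0
assigned_calibrator_value = true_calibrator_value * 0.85

calibrator_signal = 0.200  # hypothetical absorbance change for the calibrator
sample_signal = 0.160      # hypothetical absorbance change for a patient sample

reported = patient_result(sample_signal, calibrator_signal, assigned_calibrator_value)
unbiased = patient_result(sample_signal, calibrator_signal, true_calibrator_value)
relative_bias = (reported - unbiased) / unbiased

print(f"reported {reported:.1f} µmol/L, unbiased {unbiased:.1f} µmol/L, "
      f"bias {relative_bias:.0%}")
```

Under this assumed model, an error in the assigned calibrator value carries over proportionally to every patient result produced with that calibrator.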
Analysis
TURRIFIC relies heavily on the measurement of serum BA provided by local laboratories. Recognising that there is heterogeneity in results between different assays, we implemented BACH. 1
In this project, we first simulated laboratory data from external quality assurance programmes to determine the feasibility of total BA recalibration using a reference set of patient samples and a consensus value assignment approach. From the simulations, we determined that a minimum of 40 samples would be needed to achieve 90% power to detect a bias of 5%. We then demonstrated that mathematical recalibration of total BA results was both plausible and feasible, with a high probability of successfully harmonising results across participating laboratories. 1
We then manufactured a set of 60 reference samples, using BA species as a universal calibrator, including 49 serum samples, 4 samples spiked with individual BA species, and 7 samples spiked with bile fluid, with a view to assaying these in the 15 different laboratories used by the trial collaborators, using a range of common reagent manufacturers for the BA assays. 1
Our study acted as fortuitous post-market surveillance of BA reagents: we identified a commutability issue (the mathematical relationship between the patient samples and the calibrator material) with one assay at a participating laboratory testing our representative patient samples. In week 5, we identified a negative bias of 13% in our comparison samples when the reagent lot number was changed. The relevant laboratory was using a commercial internal quality control (QC) material with known BA values, and no change from target values was seen with that material. However, the BACH results did change, as these were representative intact serum samples derived from pooled patient serum, and the bias exceeded the Royal College of Pathologists of Australasia (RCPA) analytical performance specifications (±4 µmol/L at concentrations ≤ 40 µmol/L; ±10% at concentrations > 40 µmol/L) at values greater than 35 µmol/L 3 (Figure 2).

Figure 2. BACH: post-market surveillance bias detection in BA reagent. Left: percentage differences from the All Procedure Median (all units in µmol/L) for the various Australian participating laboratories (multicoloured circles), from samples tested in week 1 to week 8 of the BACH study. Orange squares from week 5 onwards denote the results of the laboratory that displayed a negative bias with a new reagent lot. Right: graphical representation of the percentage bias for the negative shift with a new reagent lot at one participating laboratory. The dotted red line represents the Royal College of Pathologists of Australasia (RCPA) analytical performance specifications (±4 µmol/L at concentrations ≤ 40 µmol/L; ±10% at concentrations > 40 µmol/L).
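The RCPA analytical performance specification quoted above can be written as a small check. This is a minimal sketch: the function names are illustrative, and treating the all-procedure median as the target value is an assumption made for the example.

```python
def within_rcpa_spec(measured, target):
    """Return True if `measured` lies within the allowable limits around `target`.

    Limits as quoted in the text: ±4 µmol/L when the target is <= 40 µmol/L,
    and ±10% of the target when the target is > 40 µmol/L.
    """
    allowed = 4.0 if target <= 40.0 else 0.10 * target
    return abs(measured - target) <= allowed


def percent_bias(measured, target):
    """Signed bias of `measured` relative to `target`, in percent."""
    return 100.0 * (measured - target) / target


# Hypothetical targets with a uniform -13% shift applied to each.
shift = -0.13
target_low, target_high = 30.0, 60.0
low_ok = within_rcpa_spec(target_low * (1 + shift), target_low)
high_ok = within_rcpa_spec(target_high * (1 + shift), target_high)

print(low_ok, high_ok)
```

Because the limit is absolute at low concentrations and proportional above 40 µmol/L, the same percentage shift can pass at a low target and fail at a higher one.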
Investigation by the relevant manufacturer revealed that ‘low calibration absorbance may lead to failed calibration or low-end inaccuracy’, indicating that only BA results at lower concentrations were impacted, in contrast to our findings. Nevertheless, the affected reagent lots were recalled by the Australian Therapeutic Goods Administration.
We have suggested that there is a need for improvement in laboratory post-market surveillance of BA reagents by using quality testing materials with demonstrated commutability and/or real-time patient-based QC, such as using moving averages of patient results.
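As a rough illustration of patient-based QC with moving averages, the sketch below flags the point at which the moving mean of consecutive results drifts beyond a chosen percentage of a target mean. The window size, target and limit are hypothetical parameters, and the input is an idealised series; real patient results are far more variable and would need more careful handling (for example, excluding extreme values).

```python
from collections import deque


def moving_average_flags(results, window, target, limit_pct):
    """Return a list of (index, moving_mean, flagged) tuples.

    An entry is produced once `window` results are available. `flagged` is
    True when the moving mean deviates from `target` by more than `limit_pct`
    percent.
    """
    buffer = deque(maxlen=window)  # keeps only the most recent `window` results
    output = []
    for index, value in enumerate(results):
        buffer.append(value)
        if len(buffer) == window:
            moving_mean = sum(buffer) / window
            deviation_pct = abs(moving_mean - target) / target * 100.0
            output.append((index, moving_mean, deviation_pct > limit_pct))
    return output


# Idealised series: ten results at 20 µmol/L, then ten results shifted by -13%.
baseline = [20.0] * 10
shifted = [20.0 * 0.87] * 10
flags = moving_average_flags(baseline + shifted, window=5, target=20.0, limit_pct=10.0)
first_flag = next(index for index, _, flagged in flags if flagged)

print(first_flag)
```

In this idealised series the shift starts at index 10 and is flagged a few results later, once enough shifted values have entered the window; a shorter window reacts faster but is more sensitive to noise.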
In further unpublished data, we have identified between-laboratory variability or heterogeneity for results on the same samples exceeding the RCPA analytical performance specifications. From the participating laboratories in the BACH study, we have had good representation from the three major reagent manufacturers – Randox, Abbott and Diazyme. BA results from laboratories using Randox reagents tended to be lower than those using Abbott reagents, which in turn tended to be lower than those using Diazyme reagents. In addition, there was a non-linear response when the Diazyme reagent was being used, with results showing less bias at higher BA concentrations. The Dialab reagent displayed similar behaviour and we therefore wonder if the Dialab reagent was perhaps a re-branded Diazyme reagent, although enquiry of the manufacturer so far remains unanswered.
Importantly, when relevant clinical decision points in relation to BA values (40 and 100 µmol/L) are considered, 4 there is variability in results surrounding these decision points. At 40 µmol/L, the threshold at which there is an increased risk of spontaneous pre-term birth and neonatal asphyxia, the estimated values ranged from 35 to 46 µmol/L, while at 100 µmol/L, at which the risk of stillbirth is increased, the estimated values ranged from 88 to 110 µmol/L (Figure 3).

Figure 3. BACH: impact at clinical decision points. Fitted BA values for each of the 15 participating laboratories, with 95% confidence intervals derived from regression coefficients, at thresholds of 40 µmol/L and 100 µmol/L. At 40 µmol/L, the threshold at which there is an increased risk of spontaneous pre-term birth and neonatal asphyxia, the estimated values ranged from 35 to 46 µmol/L, while at 100 µmol/L, at which the risk of stillbirth is increased, the estimated values ranged from 88 to 110 µmol/L.
This variability in results makes application of a common reference interval or thresholds to use in clinical guidelines difficult, as results are not equivalent. In other words, any clinical guidelines developed would need to be anchored to the BA methodology used in that clinical trial.
Standardisation of the assays would require a universal calibrator material, generally a serum-based material spiked with a pure BA species, that behaves similarly to patient serum samples across all the different manufacturers' assays. Alternatively, the assays could be harmonised by having a number of laboratories measure several samples and then using the mean or median result for each sample as the consensus value.
As part of the BACH project, four BA species were spiked into low BA concentration serum to investigate the differing activities of the enzymatic assays. Using the coefficients derived from principal component analysis of all the serum samples, we assessed the variability between laboratories for individual BA species, in order to identify those species that behave similarly to native serum samples. The coefficients for glycodeoxycholic, chenodeoxycholic and cholic acid lay within the region of serum samples, suggesting similar variability between laboratories as that of serum samples. However, lithocholic acid did not behave similarly to serum samples (Figure 4).

Figure 4. Standardisation: search for a universal calibrator. Outcomes from principal component analysis (PCA), with plots representing two-dimensional space. The first principal component (PC1) accounts for 99.8% of the variability in results and the second principal component (PC2) accounts for 0.1% of the variability between results. The datasets are normalised during PCA, hence those points closest to the mean have values closer to zero, while more extreme values are further away from zero (either positive or negative depending on direction).
Discussion
There is increasing use of mass spectrometry methods for measuring total BA and profiles, which, combined with the availability of isotope-labelled BA standards, could facilitate calibrator target value setting by a higher order measurement method. Standardisation of BAs is therefore possible.
Consensus value approaches have been used in the past to harmonise results: the assay for thyroid-stimulating hormone is one example of a commonly requested laboratory test for which a harmonisation initiative has been undertaken. 5
Using such an approach, participating laboratories in the BACH project measured the same samples according to a specific protocol to ensure that the pre-analytical and analytical phases were consistent between participants. The consensus value was determined for each sample, using the all-procedure median. The median was selected as this parameter is more robust and resistant to the influence of any outliers in the results. Regression approaches were then used to convert the observed values from the laboratories to harmonised values.
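The consensus-and-regression idea described above can be sketched as follows. This is not the BACH analysis code: ordinary least squares is assumed here as one possible regression approach, the laboratory names and values are synthetic, and the helper names are illustrative.

```python
from statistics import mean, median, stdev


def fit_line(x, y):
    """Ordinary least-squares fit of y = slope * x + intercept."""
    mean_x, mean_y = mean(x), mean(y)
    sxx = sum((xi - mean_x) ** 2 for xi in x)
    sxy = sum((xi - mean_x) * (yi - mean_y) for xi, yi in zip(x, y))
    slope = sxy / sxx
    return slope, mean_y - slope * mean_x


def harmonise(observed):
    """observed: dict of lab name -> list of results (same sample order in each).

    Returns the per-sample consensus (all-procedure median) and the harmonised
    results obtained by regressing the consensus on each lab's observed values.
    """
    labs = list(observed)
    n_samples = len(observed[labs[0]])
    consensus = [median(observed[lab][i] for lab in labs) for i in range(n_samples)]
    harmonised = {}
    for lab in labs:
        slope, intercept = fit_line(observed[lab], consensus)
        harmonised[lab] = [slope * value + intercept for value in observed[lab]]
    return consensus, harmonised


def mean_between_lab_cv(results):
    """Average, over samples, of the between-laboratory CV in percent."""
    labs = list(results)
    n_samples = len(results[labs[0]])
    cvs = []
    for i in range(n_samples):
        values = [results[lab][i] for lab in labs]
        cvs.append(100.0 * stdev(values) / mean(values))
    return mean(cvs)


# Synthetic example: three labs with different proportional and constant biases.
reference = [10.0, 20.0, 40.0, 60.0, 100.0, 150.0]
observed = {
    "lab_A": [0.90 * v for v in reference],
    "lab_B": list(reference),
    "lab_C": [1.10 * v + 1.0 for v in reference],
}
consensus, harmonised = harmonise(observed)
cv_before = mean_between_lab_cv(observed)
cv_after = mean_between_lab_cv(harmonised)

print(f"between-lab CV before: {cv_before:.1f}%, after: {cv_after:.1f}%")
```

The synthetic biases here are perfectly linear, so the fitted lines remove them almost entirely; real laboratory data carry imprecision and non-linearity, which leave residual variability after harmonisation.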
Harmonisation, using the all-procedure median and regression techniques with values obtained from the BACH study, reduced the average between-laboratory variability from 8.6% to 3.3%, a reduction of 62% (Figure 5).

Figure 5. Harmonisation: BACH reduced laboratory variability (all units in µmol/L).
Also, a majority of results for serum samples are within the RCPA Analytical Performance Specifications post-harmonisation. Additionally, we now know the correlations between results from the laboratories servicing TURRIFIC recruitment sites. This allows us to convert BA results to harmonised values and subsequently to pool results from all trial sites for statistical analysis.
Extending this further, manufacturers could subsequently apply these regression coefficients to their products, so that their reagents produce harmonised values. One difficulty with this approach is that periodic re-measurement of the reference panel samples will be needed to guard against long-term drift or bias in the manufacturers' assays.
Summary
Harmonisation of BA results is achievable and has been demonstrated both in simulations and using remainder patient serum samples. An expanded reference panel needs to be developed and additional BA methods explored. While the top-down approach of standardisation using a higher-order reference material is more difficult to achieve, this would provide the highest level of traceability and result equivalence. Consideration of this must be given when selecting the reference method(s) used in standardisation initiatives when linked to clinical outcome studies, as the method(s) must be sustainable long-term. For example, the Diabetes Control and Complications Trial in 1987 used an ion-exchange HPLC method for HbA1c measurement. While this method has been stable for over 30 years, it has required a network of reference laboratories to maintain this stability and subsequent unbroken linkage to the original trial. 6
In the search for a universal calibrator material, glycodeoxycholic acid appears to be a good starting point, in combination with candidate mass spectrometry reference methods available.
There is significant variability introduced by reagent manufacturers, through non-linearity, a lack of commutability and poor calibrator target-setting, which needs to be addressed by laboratories through enhanced post-market surveillance mechanisms.
Given that we now know the relationships between the different BA reagents, and that BA results can be converted to harmonised values for pooling in statistical analysis, individual participant results in different studies could now be converted and included in future meta-analyses involving BA results.
Footnotes
Acknowledgements
We acknowledge all the participating laboratories and the staff testing the BACH samples. This work is dedicated to the memories of Professor Hanns-Ulrich Marschall and Dr Michael Metz.
Contributorship
WH drafted the original manuscript. CM performed the simulation, summarised statistical results, produced the graphical output and reviewed the manuscript. Both authors have read and approved the final manuscript.
Declaration of conflicting interests
The authors declared no potential conflicts of interest with respect to the research, authorship, and/or publication of this article.
Ethical approval
Ethical approval for the study has been granted by the Women's and Children's Hospital Human Research Ethics Committee (HREC/18/WCHN/36).
Funding
The authors disclosed receipt of the following financial support for the research, authorship, and/or publication of this article: Financial support for sample transportation to Australian laboratories was generously provided by the Australasian Association of Clinical Biochemists (AACB).
Guarantor
WM “Bill” Hague is the guarantor of the present work.
