Abstract
BACKGROUND:
To ensure quality and stability, monitoring systems are recommended to analyze pharmaceutical manufacturing processes.
OBJECTIVE:
This study was performed to predict powder X-ray diffraction (PXRD) patterns of formulation powders through attenuated total reflectance (ATR)-infrared (IR) spectroscopy in a nondestructive manner along with chemometrics.
RESULTS:
Caffeine anhydrate, acetaminophen, and lactose monohydrate were ground and mixed at six weight ratios. The six sample groups were evaluated using ATR-IR spectroscopy and PXRD analysis. Partial least squares (PLS) models were constructed to predict the PXRD intensities of the samples from the ATR-IR spectra. The prediction accuracy of the constructed PLS regression models was as high as R² = 0.993.
CONCLUSIONS:
Linear relationships were obtained between the prediction data set and reference PXRD intensity at each degree. 2D PLS regression coefficient analysis enabled the analysis of the correlation between PXRD patterns and IR spectra.
Keywords
Introduction
Powder X-ray diffraction (PXRD) analysis is a widely used quantification method for evaluating the powder quality of drugs, metals, and ceramics. Crystals composed of organic molecules exhibit unique PXRD patterns. The specific PXRD patterns of crystals enable the analysis of their polymorphism, crystallinity, quantity and properties [1,2]. However, PXRD measurement is time-consuming and often involves destructive sample preparation (e.g., drying or powdering).
Attenuated total reflectance (ATR)-infrared (IR) spectroscopy offers high sensitivity, requires no pretreatment, and provides sufficient time resolution for the evaluation of rapid chemical reactions and phase transformations [3–5]. This type of spectroscopy affords chemical and physical information on materials with infrared-active vibrations (symmetrical/anti-symmetrical stretching, scissoring, etc.) and light diffusivity. However, resolving and quantifying multiple components, such as polymorphs, from peak-area integration is difficult. In these cases, chemometrics can be a powerful tool to quantify polymorph contents from ATR-IR spectra [6].
The development of process analytical technology (PAT) is required by the U.S. Food and Drug Administration in the pharmaceutical manufacturing process [7]. The PAT guidelines recommend the analysis of pharmaceutical manufacturing processes via in-line/on-line monitoring systems to ensure their quality and stability [8–10]. Various analytical tools, such as terahertz, Raman, and near-IR spectroscopy, have been used to examine various manufacturing processes [11–13].
The partial least squares (PLS) method is commonly used to analyze large amounts of data. The PLS method constructs multiple linear regressions from the measured data set, and the constructed PLS model can then estimate correlated data. The model decomposes the data matrices into scores and loadings using both the measured and predicted data sets. For IR spectroscopy, the loadings and regression vectors reflect the principal components of the spectra and the vibrations of specific molecular groups.
In previous studies, we reported multivariate analyses of the pseudo-polymorphic transformation of carbamazepine [14] and the phase transformation of calcium phosphate [15] using ATR-IR spectroscopy data sets.
In this study, we predicted the PXRD patterns of multi-component pharmaceutical formulations of caffeine anhydrate, acetaminophen and lactose monohydrate from their respective IR spectra with chemometric methods. The PXRD patterns were predicted using multiple PLS regression models constructed from a single set of spectral data. Linear relationships were obtained between the predicted intensity data set and the reference PXRD intensity in each of the constructed PLS regression models. 2D PLS regression coefficient analysis enabled the analysis of the relationship between the diffractograms and ATR-IR spectra.
Materials and methods
Materials
The pharmaceutical bulk powders of caffeine anhydrate form I (CA), acetaminophen (AA), and lactose monohydrate (LC) were purchased from Shizuoka Caffeine Co. Ltd. (Japan), Iwaki Seiyaku Co. Ltd. (Japan), and DMV Co. Ltd. (Germany), respectively. These bulk material powders were ground and mixed in a mortar to create various formulations. Table 1 lists the contents of the CA, AA, and LC bulk materials as wt% for all the formulations used in this study.
The composition of the six prepared formulation powder groups
The ATR-IR spectra of the samples were measured from 2000 to 600 cm−1 using an FT/IR-6200 (JASCO Co., Tokyo, Japan) instrument with 64 accumulations at a resolution of 4 cm−1. The spectrometer was equipped with a germanium ATR sampling accessory, and each sample was measured 18 times at room temperature.
Powder X-ray diffraction analysis
The samples were ground to prevent preferred orientation. The formulations were loosely packed in a glass PXRD sample holder. The samples in the holders were analyzed using a powder X-ray diffraction (PXRD) system (Miniflex, Rigaku Co. Ltd., Tokyo, Japan) at room temperature. The measurement conditions were as follows: (1) Ni-filtered; (2) Cu K𝛼 radiation (𝜆 = 0.1540 nm); (3) voltage 30 kV; (4) current 15 mA; (5) step slit, 0.2°; (6) counting time, 1.0 s; (7) time constant 1 s; and (8) measurement range (2𝜃) 5.0–45.0°. The averaged PXRD profiles were obtained from six replicate measurements, and the profiles were normalized by an area normalization method.
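The averaging and area normalization of replicate profiles can be sketched as follows. This is a minimal illustration rather than the authors' procedure: the source does not state how the area was computed or whether averaging preceded normalization, so the trapezoidal rule, the unit-area target, and the function names `average_profiles` and `area_normalize` are assumptions, and the numbers are invented for the example.

```python
def average_profiles(profiles):
    """Point-by-point mean of replicate profiles measured on the same 2-theta grid."""
    n = len(profiles)
    return [sum(column) / n for column in zip(*profiles)]


def area_normalize(two_theta, intensity):
    """Scale a profile so that its trapezoidal area equals 1 (assumed convention)."""
    if len(two_theta) != len(intensity) or len(two_theta) < 2:
        raise ValueError("need two equal-length sequences with at least 2 points")
    area = 0.0
    for i in range(1, len(two_theta)):
        step = two_theta[i] - two_theta[i - 1]
        area += 0.5 * (intensity[i] + intensity[i - 1]) * step
    if area <= 0:
        raise ValueError("profile area must be positive")
    return [value / area for value in intensity]


# Hypothetical replicate scans on a shared grid (illustrative numbers only).
grid = [5.0, 5.2, 5.4, 5.6]
replicates = [[10.0, 30.0, 20.0, 10.0], [12.0, 28.0, 22.0, 8.0]]
mean_profile = average_profiles(replicates)
normalized = area_normalize(grid, mean_profile)
```

Normalizing to unit area removes differences in overall scattered intensity between samples, so that only the shape of the diffractogram enters the regression.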
Partial least squares regression
The concentrations of drugs and excipient in the mixed formulation powders were estimated using the PLS calibration models. The averaged PXRD intensities of each sample were used for further analysis as the response variable. Since the IR spectra were recorded 18 times for a single sample, 108 ATR-IR spectra for the six formulation groups were collected in total for PLS analysis. In this study, a generalized non-linear iterative partial least squares (NIPALS) algorithm was used for the PLS regression analysis [16]. PLS regression can construct a mathematical relationship between the measured independent variables (
Cross-validation using the leave-one-out method [19] was applied to evaluate the PLS calibration models. The optimum number of factors was taken as that giving a minimum in the plot of the predicted residual sum of squares (PRESS2𝜃) against the number of PLS components. PRESS2𝜃 can be defined as follows:
The multivariate analysis software UNSCRAMBLER X version 10.4 (CAMO Software, Computer Aided Modelling, Trondheim, Norway) was used to evaluate the data set.
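The leave-one-out evaluation can be illustrated with a small single-response (PLS1) NIPALS routine. This is a sketch under stated assumptions, not a reproduction of the commercial software used in the study: it assumes one model per response (for example, the intensity at one diffraction angle), mean-centering only, and the conventional PRESS, i.e. the sum of squared differences between each left-out reference value and its prediction. All function names and the data are hypothetical.

```python
def _dot(a, b):
    return sum(x * y for x, y in zip(a, b))


def pls1_fit(X, y, n_components):
    """Fit a PLS1 model with NIPALS; returns means and (w, p, q) per component."""
    n, m = len(X), len(X[0])
    x_mean = [sum(row[j] for row in X) / n for j in range(m)]
    y_mean = sum(y) / n
    E = [[row[j] - x_mean[j] for j in range(m)] for row in X]  # centered X
    f = [value - y_mean for value in y]                        # centered y
    components = []
    for _ in range(n_components):
        w = [_dot([E[i][j] for i in range(n)], f) for j in range(m)]
        norm = _dot(w, w) ** 0.5
        if norm < 1e-12:  # nothing left in X that covaries with y
            break
        w = [value / norm for value in w]
        t = [_dot(row, w) for row in E]                        # scores
        tt = _dot(t, t)
        p = [_dot([E[i][j] for i in range(n)], t) / tt for j in range(m)]
        q = _dot(f, t) / tt
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
        f = [f[i] - q * t[i] for i in range(n)]
        components.append((w, p, q))
    return x_mean, y_mean, components


def pls1_predict(model, x):
    """Predict one response by replaying the deflation steps on a new sample."""
    x_mean, y_mean, components = model
    e = [value - mean for value, mean in zip(x, x_mean)]
    y_hat = y_mean
    for w, p, q in components:
        t = _dot(e, w)
        y_hat += q * t
        e = [ev - t * pv for ev, pv in zip(e, p)]
    return y_hat


def loo_press(X, y, n_components):
    """Leave-one-out predicted residual sum of squares."""
    press = 0.0
    for i in range(len(X)):
        model = pls1_fit(X[:i] + X[i + 1:], y[:i] + y[i + 1:], n_components)
        press += (y[i] - pls1_predict(model, X[i])) ** 2
    return press


# Synthetic, noise-free example: y is an exact linear function of X.
X = [[1.0, 0.0, 2.0], [0.0, 1.0, 1.0], [2.0, 1.0, 0.0], [1.0, 2.0, 3.0],
     [3.0, 0.0, 1.0], [2.0, 3.0, 2.0], [0.0, 2.0, 0.0]]
y = [2.0 * row[0] - row[1] + 0.5 for row in X]
press_by_components = {a: loo_press(X, y, a) for a in (1, 2, 3)}
```

Plotting `press_by_components` against the number of components and taking the minimum mirrors the selection rule described in the text. With real, noisy spectra the minimum usually occurs well before the full rank is reached.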
ATR-IR spectroscopy analysis of pharmaceutical formulation powders
Figure 1 shows the ATR-IR spectra of the ground pharmaceutical bulk powder samples. The absorbance peaks at 1696 and 1650 cm−1 were assigned to the C=O stretching vibration of CA form I [21]. The peaks at 1604, 1508 and 1441 cm−1 can be assigned to the C–C stretching of the aromatic benzene ring of AA [22]. The absorbance peaks at 1650 and 1560 cm−1 were assigned to the C=O stretching vibration of the amide I band of AA and the N–H in-plane bending of AA, respectively [22]. The absorbance peak at 1369 cm−1 corresponds to the symmetric CH3 bending vibrations of AA, and the peak at 1327 cm−1 was assigned to the C–N stretching vibration of AA [22]. The absorbance peak at 1250 cm−1 corresponds to the C–O stretching vibrations of AA [22]. The IR vibrations of the carboxyl groups of AA give a significant peak at 1700 cm−1. It was reported that the polymorphic transformation of AA significantly affects the IR peak shift [23]. The IR absorbance region of 1100–1000 cm−1 contains the C–O stretching vibration peak of LC [24]. The infrared absorbance of the samples depends on the concentrations of their pharmaceutical constituents.

ATR-IR spectra of the pharmaceutical formulation powders. Closed circles, triangles and squares show C=O of CA, AA and LC, respectively.
Principal component analysis (PCA) was used to clarify the data set for subsequent IR peak evaluation [25]. Upon validation, the cumulative percent variances of the 108 IR spectra were 57.3%, 81.0%, 91.2%, and 97.3% for PC1, PC2, PC3 and PC4, respectively. The PCA results suggested that the spectral data set was highly collinear for two major reasons: the mixed samples contained three materials, and the PCA loadings are orthogonal to each other. Given this high collinearity, the PLS method was considered a suitable way to predict the PXRD intensity at each degree with high accuracy. Dowling et al. [26] reported that the IR spectra of Alhydrogel® exhibited high collinearity. Kazarian and Chan [27] reported mapping quantification of biomedical samples using ATR-IR spectra and chemometrics methods. This suggests that PCA pretreatment is useful for understanding the data components of the obtained measurements.
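The cumulative percent variance quoted above can be computed with a NIPALS-style PCA, sketched below in plain Python. This is an illustrative assumption about the calculation (mean-centering only, calibration variance rather than the validated variance reported in the text); the function name `pca_cumulative_variance` and the toy data are hypothetical.

```python
def pca_cumulative_variance(X, n_components, n_iter=500, tol=1e-12):
    """Cumulative percent variance explained by successive NIPALS components."""
    n, m = len(X), len(X[0])
    means = [sum(row[j] for row in X) / n for j in range(m)]
    E = [[row[j] - means[j] for j in range(m)] for row in X]  # centered data
    total = sum(value * value for row in E for value in row)
    if total == 0:
        raise ValueError("data have no variance")
    cumulative, explained = [], 0.0
    for _ in range(n_components):
        # Start from the residual column with the largest sum of squares.
        col_ss = [sum(E[i][j] ** 2 for i in range(n)) for j in range(m)]
        j0 = col_ss.index(max(col_ss))
        if col_ss[j0] <= tol * total:  # residual exhausted
            cumulative.append(100.0 * explained / total)
            continue
        t = [E[i][j0] for i in range(n)]
        for _ in range(n_iter):
            tt = sum(value * value for value in t)
            p = [sum(E[i][j] * t[i] for i in range(n)) / tt for j in range(m)]
            norm = sum(value * value for value in p) ** 0.5
            p = [value / norm for value in p]                  # unit loading
            t_new = [sum(E[i][j] * p[j] for j in range(m)) for i in range(n)]
            change = sum((a - b) ** 2 for a, b in zip(t_new, t))
            t = t_new
            if change <= tol * sum(value * value for value in t):
                break
        explained += sum(value * value for value in t)
        cumulative.append(100.0 * explained / total)
        # Deflate: remove the fitted component before extracting the next one.
        E = [[E[i][j] - t[i] * p[j] for j in range(m)] for i in range(n)]
    return cumulative


# Toy data whose variance splits 80 % / 20 % between the two axes.
toy = [[2.0, 0.0], [0.0, 1.0], [-2.0, 0.0], [0.0, -1.0]]
toy_cumulative = pca_cumulative_variance(toy, 2)
```

For spectra, each row of `X` would be one spectrum and each column one wavenumber; a cumulative variance that rises slowly over several components, as reported for the IR data, is what this function would return in that case.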
Figure 2 shows the powder X-ray diffractograms of the mixed powder samples from 5.0 to 45.0°. Significant peak changes were observed at 10.4, 11.8, 13.8, 15.4, 18.2, 19.0, 20.2, 20.8, 23.4, 24.2, 26.4, 27.0, 28.4, 31.6, and 33.0° among the six samples. The specific diffraction peaks are marked in the figure with closed circles, triangles and squares for CA, AA and LC, respectively.

PXRD patterns of the pharmaceutical formulation powders. Closed circles, triangles and squares show CA, AA and LC, respectively.
The diffractograms were analyzed by PCA using the non-linear iterative partial least squares (NIPALS) algorithm [30] with cross-validation in 20 segments. The cumulative percent variance of the PCA model of the PXRD data was 53.3%, 95.6%, 99.5%, and 99.8% at PC1, PC2, PC3, and PC4, respectively. Given the relationship between the cumulative percent variance and the material content, these values are consistent with a three-component data set. The variance results suggest that the PXRD patterns of the samples exhibit low collinearity.
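One way to read a component count off such a variance series is to stop once the next component adds less than some small gain. The sketch below applies this to the four PXRD values quoted above; the 1-percentage-point threshold and the function name `components_at_plateau` are assumptions made for illustration, not a criterion stated in the source.

```python
def components_at_plateau(cumulative_percent, min_gain=1.0):
    """Smallest component count after which the next component adds < min_gain."""
    for k in range(1, len(cumulative_percent)):
        gain = cumulative_percent[k] - cumulative_percent[k - 1]
        if gain < min_gain:
            return k
    return len(cumulative_percent)  # no plateau within the listed components


# Cumulative percent variance of the PXRD PCA model reported in the text.
pxrd_cumulative = [53.3, 95.6, 99.5, 99.8]
n_selected = components_at_plateau(pxrd_cumulative)
```

With these values the fourth component adds only about 0.3 percentage points, so the function returns three components, in line with the three ingredients of the formulations.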
Table 2 lists the cumulative percent variance (%) of the IR spectra calibration and validation data set (54 spectra). To construct PLS models with high prediction accuracy, it is important to prevent overfitting [31]. The number of components should therefore be chosen where the increase in cumulative percent variance reaches a plateau. Gowen et al. [32] reported that the selection of the number of principal components for the latent variables is important for constructing PLS models. The increase in cumulative percent variance flattens at PC4, and thus four components were selected for the developed PLS models.
Cumulative percent variance on the constructed PLS regression models
Figures 3(a–c) show score plots of the constructed PLS models used to predict PXRD patterns. The figures are 2D scatter plots of the scores for two specified PCs, which yield information regarding clustering among the sample groups. Puhakka et al. [33] estimated articular cartilage properties using multivariate analysis of optical coherence tomography signals, evaluating the constructed PLS models using score plots. Mercier et al. [34] used the PAT tool for bioprocess development data based on the evaluation of PC score plots. In Fig. 3, points plotted close together indicate similar IR data for the given principal components in the constructed PLS models, whereas large distances indicate differences in the IR spectra. The PC1 vs. PC2 plot (Fig. 3(a)) shows good separation among the IR spectra of the samples, except for sample groups E and F. The PC1 vs. PC3 plot (Fig. 3(b)) shows well-separated E and F groups. This separation suggests that models constructed with components up to PC4 can decompose the IR spectra to predict PXRD patterns.

Score plot of the constructed PLS models. (a) PC1 vs. PC2, (b) PC1 vs. PC3, (c) PC1 vs. PC4.

(a) Actual PXRD patterns and predicted intensities in the region of 10–20 degrees for tablet groups A and B. (b) Prediction accuracy of PXRD intensities based on the constructed PLS models.

2D PLS regression coefficient analysis. (x: degrees of PXRD, y: wavenumber of the IR spectra, z: correlation values.)
Figures 4(a,b) show the PXRD patterns of the ternary mixtures predicted from their IR spectra using the constructed robust PLS models. Figure 4(a) shows the actual PXRD patterns in the partial region of 10–20° together with the predicted intensities in counts per second. The predicted PXRD patterns correlate closely with the actual patterns for tablet group A. This suggests that the method enables the prediction of PXRD patterns from ATR-IR spectra. Figure 4(b) shows the relationship between the predicted and actual PXRD intensities from the constructed PLS models with four latent variables. The plot comprises 10854 points (2𝜃 range, 5–45°; resolution, 0.2°; n = 54), and the prediction regression line shows a high correlation value (R² = 0.993). The composition could be easily determined from the IR or PXRD data with chemometrics [28]. This prediction method shows that IR spectra, treated as mathematical matrices, can be converted into PXRD patterns via PLS regression models. The results suggest that the constructed PLS models capture correlations between the IR spectra and PXRD patterns through the amounts of the components in the pharmaceutical formulation. Lopes et al. [35] reported that the multi-block PLS method can be used to model industrial pharmaceutical production processes. In Fig. 4(b), each line represents the reference PXRD intensities of a formulation group, and the predicted PXRD intensities are shown with error bars. The prediction accuracy appears sufficient for evaluating sample powders to control pharmaceutical quality in manufacturing process analysis. The developed models can predict PXRD patterns from IR measurements without pretreatment or sample destruction. This method represents a novel type of process analytical technology that can be used for pharmaceutical production.
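The point count quoted for Fig. 4(b) follows from the measurement grid: 5–45° in 0.2° steps gives 201 angles, and 201 × 54 spectra gives 10854 points. The sketch below checks that arithmetic and shows one way to compute a correlation value of the kind reported. Treating R² as the squared Pearson correlation between reference and predicted intensities is an assumption, and the function names and sample numbers are hypothetical.

```python
def grid_points(start, stop, step):
    """Number of points on an inclusive, evenly spaced grid."""
    return int(round((stop - start) / step)) + 1


def r_squared(reference, predicted):
    """Squared Pearson correlation between reference and predicted values."""
    n = len(reference)
    mean_ref = sum(reference) / n
    mean_pred = sum(predicted) / n
    s_rp = sum((r - mean_ref) * (p - mean_pred) for r, p in zip(reference, predicted))
    s_rr = sum((r - mean_ref) ** 2 for r in reference)
    s_pp = sum((p - mean_pred) ** 2 for p in predicted)
    if s_rr == 0 or s_pp == 0:
        raise ValueError("both series must vary")
    return s_rp * s_rp / (s_rr * s_pp)


angles_per_pattern = grid_points(5.0, 45.0, 0.2)
total_points = angles_per_pattern * 54  # 54 spectra, as stated in the text

# Illustrative intensities only; not data from the study.
example_r2 = r_squared([10.0, 20.0, 30.0, 40.0], [11.0, 19.0, 31.0, 39.0])
```

Pooling every angle of every spectrum into one scatter plot, as the text describes, means the single R² value summarizes all of the per-angle models at once.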
Moreover, these results suggest that formulation analysis methods with high time resolution based on IR spectroscopy can improve production rates in the pharmaceutical industry. Many chemometrics articles have reported relationships between predicted values and data sets based on regression coefficient values [36,37]. In this study, we plotted the regression coefficients in 2D and evaluated the relationship between the predictions and the experimentally determined data sets.
Figure 5 shows the 2D PLS regression coefficient analysis of the constructed PLS models, together with the PXRD patterns and IR spectra of each bulk powder. This method can analyze unknown complex correlations between two spectral data sets using the constructed PLS models. It could identify components based on the 2𝜃 values of the PXRD diffractogram peaks (X), the wavenumbers in the IR spectra (Y), and the regression coefficients (b) obtained from the 200 PLS models. An (X, Y) crossing point with a high b value indicates a positive correlation between the IR wavenumber and the diffraction angle. The regression coefficient describes the relationship between the IR vibrations of various functional groups and the crystal structure. A negative b value indicates an inverse correlation between the two signals. Positive b values were observed at the cross points (12.0: 1060, 1650 and 1700), (20.8: 1060, 1430, 1504, 1560 and 1604), (26.4: 1700), and (27.0: 1060 and 1700). Negative b values were found at the cross points (12.0: 1220, 1250, 1430, 1504, 1560 and 1606), (15.4: 1060), (18.2: 1060), (20.8: 1700), (23.4: 1060), (24.2: 1060), and (26.4: 1060). Some additional correlations were suggested by the PXRD patterns of the bulk powders. The positive b values at (12.0, 23.4, 24.2: 1700) were attributed to CA, because CA exhibits a strong IR peak from the C=O vibration at 1700 cm−1 [21]. Correlations of AA show positive b values at (20.8: 1430, 1504, 1560, and 1604). Correlations of LC with positive b values were observed at (20.8, 17.0: 1060), and the peak at 1060 cm−1 originated from LC [23]. The analysis thus reveals correlation points between IR absorbance peaks and PXRD patterns.
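Reading such a map amounts to stacking the regression vector of each per-angle model into a matrix and inspecting the sign and size of b at each (angle, wavenumber) crossing point. The sketch below shows this under stated assumptions: the function name `strongest_cross_points` is hypothetical, and the coefficient magnitudes are invented; only their signs follow the cross points listed in the text.

```python
def strongest_cross_points(coef_map, angles, wavenumbers, top=3):
    """Return the most positive and most negative (b, angle, wavenumber) triples.

    coef_map[i][j] is the regression coefficient b of the model for angles[i]
    at IR wavenumber wavenumbers[j].
    """
    points = [
        (coef_map[i][j], angles[i], wavenumbers[j])
        for i in range(len(angles))
        for j in range(len(wavenumbers))
    ]
    positive = sorted((pt for pt in points if pt[0] > 0), reverse=True)[:top]
    negative = sorted(pt for pt in points if pt[0] < 0)[:top]
    return positive, negative


# Invented magnitudes; signs match (12.0: 1060, 1700) and (20.8: 1060) positive,
# (12.0: 1250) and (20.8: 1700) negative. (20.8: 1250) is not stated, so it is 0.
angles = [12.0, 20.8]
wavenumbers = [1060, 1250, 1700]
coef_map = [
    [0.4, -0.3, 0.9],
    [0.7, 0.0, -0.5],
]
positive, negative = strongest_cross_points(coef_map, angles, wavenumbers, top=2)
```

Ranking the cross points in this way gives a compact list of the IR bands that rise or fall together with a given diffraction peak, which is the information the contour plot conveys visually.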
In this study, we primarily used IR spectroscopy and PXRD patterns. This method is versatile; multivariable data (for example, near IR, mass spectra, and chromatography) could be further applied to clarify the relationships described above [38–44].
In this study, we predicted the PXRD patterns of multiple components in pharmaceutical formulation powders from their IR spectra using the constructed PLS models. The predictions showed high accuracy at each degree of the diffractograms, and the 2D PLS coefficient analysis was used to identify relationships between crystal structure and molecular group vibrations. The method can be widely applied to the analysis of data from a number of fields.
Footnotes
Acknowledgements
This study was supported by JSPS KAKENHI grant number JP18H0615. The authors would like to thank Prof. Purnendu K. Dasgupta, Chemistry and Biochemistry, University of Texas at Arlington, Prof. Kenichi Hamada and Dr. Ji-young Bae, Institute of Biomedical Sciences, Tokushima University and Prof. Satoru Goto, Faculty of Pharmaceutical Sciences, Tokyo University of Science, for their help in interpreting the significance of the results. The authors would furthermore like to thank Noda, Chiba, Japan.
Conflict of interest
The authors declare that they have no conflicts of interest.
