Improving Prediction of Nutrient Recovery via Struvite Precipitation from Organic Waste Digestate

Abstract

Growing organic waste generation in the residential, industrial, and agricultural sectors results in large amounts of organic waste being landfilled or incinerated, which contributes to environmental pollution. Opportunities exist to recover valuable resources from organic waste, with potential economic and environmental benefits. One common strategy for managing organic waste is anaerobic digestion (AD). The liquid effluent from AD, called digestate, is a concentrated source of phosphorus and nitrogen. These nutrients can be recovered via struvite precipitation. The overall study goal was to quantify the effectiveness of five statistical and machine learning (ML) models in predicting the percentage of nutrients recovered via struvite precipitation from digestate derived from different organic waste streams. The five models were multiple linear regression (MLR), polynomial regression (PLR), K-nearest neighbors (KNN), random forest (RF), and eXtreme Gradient Boosting (XGBoost). Nine combinations of input parameters were developed to quantify the effects of multiple parameters on nutrient recovery efficiency. RF and XGBoost had the best performance in predicting nutrient recovery efficiency among the five developed models. Both models had a coefficient of determination (R²) above 0.90 for phosphate and ammonium recoveries and a root mean square error (RMSE) of 2.00–7.67. The comparison of different combinations indicated that predictions of PO₄³⁻ and NH₄⁺ recoveries (%) were most influenced by the following input variables: pH, Mg:P and N:P molar ratios, mixing speed, reaction temperature, hydraulic retention time, and concentrations of sodium, potassium, calcium, magnesium, ammonium, and phosphate. We concluded that ML models can provide useful predictions of nutrient recovery via struvite precipitation and could therefore help optimize the operation of resource recovery systems.

Introduction

The discharge of untreated or undertreated organic waste (e.g., animal manure and sewage sludge) into surface water is a key source of eutrophication (Möller and Müller, 2012; Orner et al., 2020; Sampat et al., 2019), which adversely affects access to clean drinking water and safe water recreation (Kakade et al., 2021; Kogler et al., 2021). In the United States, eutrophication affects 50–80% of freshwater ecosystems, contributing to an annual economic loss of $2.2 billion (Kogler et al., 2021). Sustainable organic waste management practices such as anaerobic digestion (AD) and composting could divert these wastes, reducing eutrophication while also producing valuable resources such as fertilizer and energy (Sganzerla et al., 2023; Wang et al., 2023a, Wang et al., 2023b; Wang and Ng, 2019).

Resource recovery technologies can reduce the negative impacts of conventional organic waste management (e.g., landfilling, untreated discharge) on the economy, environment, and human health (Ddiba et al., 2022; Wu and Vaneeckhaute, 2022). The United Nations Sustainable Development Goals (SDGs) address nutrient pollution and the damage that it may cause to ecosystem functions (United Nations, 2022) (Supplementary Data S1 (Section A), Supplementary Fig. S1). AD is one of the most popular resource recovery technologies for treating organic waste streams (Rocamora et al., 2020). It can be a sustainable method for treating wastes rich in organic matter and moisture, such as organic waste (e.g., food waste), industrial waste (e.g., dairy waste), and municipal waste (e.g., sewage sludge) (Avinash and Mishra, 2023; Bella and Rao, 2023; Di Capua et al., 2020; Prasanna Kumar et al., 2024; Xu et al., 2018). The primary products of AD are biogas and digestate (Agarwal et al., 2022; Lorick et al., 2020; Tariq et al., 2024). Renewable natural gas (biomethane) can be obtained from biogas after carbon dioxide (CO₂) and other impurities have been removed via post-treatment (Angenent et al., 2022). Globally, large amounts of digestate are produced each year as the number of AD plants increases (e.g., 180 million tons annually in the European Union) (Doyeni et al., 2021; Weckerle et al., 2023). Untreated digestate poses a serious threat to soil, water bodies, and the atmosphere (Weckerle et al., 2023). Digestate is a source of nutrients, toxic elements (e.g., heavy metals), and pathogens (Golovko et al., 2022; Zhu et al., 2014). Excessive discharge of these constituents could lead to water and soil pollution, and pathogens from digestate could also contaminate food, affecting human health. Digestate can also release volatile emissions that pollute the air; it has been estimated that digestate releases 139 g of CO₂-eq per kg produced (Barampouti et al., 2020).
Digestate contains high levels of inorganic nitrogen and phosphorus compounds (Campos et al., 2019). When recovered and used as fertilizers, these compounds benefit the environment by reducing the use of synthetic fertilizers in agriculture (Ekstrand et al., 2022; Orner et al., 2021b; Pan et al., 2019).

Our study focused on recovering nutrients from the liquid fraction of the digestate. Many technologies are used to recover nutrients from digestate, including struvite precipitation and ammonia stripping (Fig. 1) (Wu and Vaneeckhaute, 2022). Struvite precipitation is a chemical process that removes phosphorus and nitrogen from digestate with high efficiency (Mousavi et al., 2024; Pandey and Chen, 2021; Zhou and Wu, 2012). Struvite is a slow-release fertilizer that could reduce global reliance on synthetic fertilizers produced from phosphate rock (Battaz et al., 2024). Struvite precipitation is considered preferable to other technologies for several reasons. It recovers ammonium and phosphorus simultaneously, whereas ammonia stripping removes only ammonia from wastewater (Corona et al., 2021). In addition, struvite precipitation is regarded as economically viable when used in wastewater treatment plants (Achilleos et al., 2022). Constructed wetlands or lagoons are other low-cost technologies that can further treat the liquid effluent from struvite precipitation so that it can later be used for irrigation or aquaculture (Orner et al., 2021a; Styles et al., 2018). Recent studies have used life cycle assessment and techno-economic analysis to assess the sustainability of integrating struvite precipitation with anaerobic digesters. Multiple studies have found that struvite precipitation can simultaneously reduce the carbon footprint and eutrophication and be economically beneficial when the struvite is used as fertilizer (Aghdam, 2022; Mayor et al., 2023; Orner et al., 2021a).

FIG. 1.

Resource recovery feedstocks, technologies, and final products. The red dashed box represents this study’s main treatment track.

Predicting the efficiency of nutrient recovery technologies such as struvite precipitation with conventional methods such as laboratory experiments is time-consuming and costly (Leng et al., 2024; Pavan et al., 2022; Soo et al., 2023). Alternative approaches, such as computer modeling, are increasingly being investigated to obtain outcomes more cheaply, quickly, and accurately (Vaneeckhaute et al., 2020). Several types of prediction models have been used for this purpose in the literature, including regression models, equilibrium-based (thermodynamic) models, and kinetic-based models (Hurairah et al., 2021; Liao et al., 2020; Saadabadi et al., 2020). Recently, machine learning (ML) has gained considerable attention for addressing sustainability challenges through the development of highly accurate predictive models. ML algorithms use statistical analysis and computer software to identify hidden patterns in datasets (Rodrigues et al., 2021; Tsui et al., 2023).

Compared with statistical models, few scientific publications have used ML tools to model the recovery efficiency of phosphorus in the form of orthophosphate (PO₄³⁻) and nitrogen in the form of ammonium (NH₄⁺) during struvite precipitation (Li et al., 2021). Supplementary Table S1 summarizes previous studies that used statistical and ML models to predict the efficiency of PO₄³⁻ and NH₄⁺ recovery by struvite precipitation. Data-driven models of struvite precipitation have commonly focused on recovering nutrients from a single organic waste stream, and digestate as a source for nutrient recovery by struvite precipitation has not been well studied (Astals et al., 2021; Lavanya et al., 2019; Leng et al., 2024; Nageshwari et al., 2022). Furthermore, statistical models have been the most widely used tools for predicting nutrient recovery (McIntosh et al., 2022). Nageshwari et al. (2022) used a small dataset (n = 100) to predict the efficiency of struvite precipitation, and less than 5% of the collected data were for digestate, which is problematic considering that larger sample sizes can give more reliable outcomes (Zaki et al., 2023). Another study used struvite precipitation to recover phosphorus and nitrogen from synthetic wastewater (Leng et al., 2024) and suggested that ML models such as gradient boosting regression and random forest (RF) could be used to optimize phosphorus and nitrogen recovery by struvite precipitation. According to the sensitivity analysis of the models developed by Leng et al., the initial P concentration, pH, and Mg:P:N molar ratio played an important role in maximizing nutrient recovery by struvite precipitation. ML models can therefore help organic waste management facilities optimize control parameters, particularly when the models are developed and tested with larger sample sizes, which increases the statistical reliability of the optimization.
In summary, using ML tools to model nutrient recovery in the struvite crystallization of organic waste digestate requires further study.

The overall goal of this research was to investigate the effectiveness of statistical and ML models in predicting the nutrient recovery efficiency of struvite precipitation from different organic waste digestates and to compare the effectiveness of the two types of data-driven models. The study had three objectives: (1) to investigate the effects of different parameters (e.g., pH, molar ratio, and reaction temperature) on nutrient recovery efficiency, (2) to identify the effective sample size for modeling, and (3) to identify the most accurate model for predicting nutrient recovery efficiency.

Materials and Methods

Data sources

The database used in this study was compiled from The Research Repository at West Virginia University, ProQuest, and Google Scholar. The following keywords were searched in different combinations: “organic waste,” “struvite,” “struvite precipitation,” “digestate,” “anaerobic digestion,” “regression,” and “machine learning.” From this keyword search, the dataset for this study was derived from 26 peer-reviewed publications that reported nutrient recovery efficiency by struvite precipitation (at both laboratory and pilot scale) from the digestate of organic feedstocks such as sewage sludge, swine manure, and chicken slurry (Fig. 2). Supplementary Table S2 lists the studies used and the number of data points taken from each one. The distribution of data points across studies indicates that the dataset is not dominated by any single study.

FIG. 2.

The distribution of digestate derived from organic feedstock sources in the dataset used by this study.

A struvite precipitation system typically consists of two tanks: a mixing tank for the organic waste digestate and a settling tank for the precipitate. In the process, NaOH is added to the PO₄³⁻- and NH₄⁺-rich digestate to maintain the pH, and magnesium is added to achieve efficient struvite precipitation. Struvite formation can be described by the following equation (Jabr et al., 2019; Zhang et al., 2024):
$\mathrm{Mg^{2+} + NH_{4}^{+} + PO_{4}^{3-} + 6H_{2}O \rightarrow MgNH_{4}PO_{4} \cdot 6H_{2}O}$ (1)
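Because Equation (1) consumes Mg²⁺, NH₄⁺, and PO₄³⁻ in a 1:1:1 molar ratio, the ion present in the fewest moles caps the amount of struvite that can form. The sketch below works through that arithmetic. It is illustrative only and assumes complete precipitation of the limiting ion, no competing precipitates (e.g., calcium phosphates), and hypothetical input concentrations expressed as mg/L of Mg, NH₄⁺-N, and PO₄³⁻-P; the function name is hypothetical and not part of this study's workflow.

```python
# Sketch: theoretical struvite yield from Eq. (1), assuming 1:1:1 Mg:N:P
# stoichiometry, complete precipitation of the limiting ion, and no competing
# precipitates. Inputs are hypothetical, in mg/L of Mg, NH4+-N, and PO4(3-)-P.

MOLAR_MASS = {"Mg": 24.305, "NH4-N": 14.007, "PO4-P": 30.974}  # g/mol (element basis)
STRUVITE_MOLAR_MASS = 245.41  # g/mol for MgNH4PO4·6H2O


def theoretical_struvite_yield(mg_mg_l: float, nh4_n_mg_l: float, po4_p_mg_l: float):
    """Return (limiting ion, maximum struvite mass in mg/L)."""
    mmol_l = {
        "Mg": mg_mg_l / MOLAR_MASS["Mg"],  # mg/L divided by g/mol gives mmol/L
        "NH4-N": nh4_n_mg_l / MOLAR_MASS["NH4-N"],
        "PO4-P": po4_p_mg_l / MOLAR_MASS["PO4-P"],
    }
    limiting = min(mmol_l, key=mmol_l.get)  # ion present in the fewest moles
    # mmol/L multiplied by g/mol gives mg/L of struvite
    return limiting, mmol_l[limiting] * STRUVITE_MOLAR_MASS


limiting_ion, max_struvite_mg_l = theoretical_struvite_yield(150.0, 800.0, 100.0)
print(limiting_ion, round(max_struvite_mg_l, 1))
```

With these hypothetical inputs, phosphate is limiting, which is why Mg and N are often dosed in excess of P in practice; measured recoveries are usually below this stoichiometric ceiling.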

Struvite formation is affected by several factors, including organic waste composition, pH, degree of supersaturation, and the molar ratios of magnesium to phosphorus (Mg:P) and nitrogen to phosphorus (N:P) (Corona et al., 2021; Korchef et al., 2023; Zhang et al., 2024). The cumulative effects of these factors have not been well studied. For example, few studies have reported the effects of parameters such as reaction temperature on struvite yield and particle size for different wastewater sources (Moyo et al., 2024; Shaddel et al., 2020). This study therefore focused on the chemistry of the process (i.e., effects on nucleation and nutrient recovery); the biology was not considered.
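The Mg:P and N:P predictors are molar ratios, whereas source studies often report mass concentrations. The sketch below shows the conversion and, as a related worked example, the additional magnesium needed to reach a target Mg:P. It assumes concentrations expressed as mg/L of Mg, NH₄⁺-N, and PO₄³⁻-P; the function names and values are hypothetical, not taken from the compiled dataset.

```python
# Sketch: mass concentrations -> Mg:P and N:P molar ratios, plus the extra Mg
# needed to reach a target Mg:P. Inputs are hypothetical, in mg/L of Mg,
# NH4+-N, and PO4(3-)-P.

M_MG, M_N, M_P = 24.305, 14.007, 30.974  # molar masses, g/mol


def molar_ratios(mg_mg_l: float, nh4_n_mg_l: float, po4_p_mg_l: float):
    """Return (Mg:P, N:P) on a molar basis."""
    p_mmol = po4_p_mg_l / M_P
    return (mg_mg_l / M_MG) / p_mmol, (nh4_n_mg_l / M_N) / p_mmol


def mg_dose_for_target(mg_mg_l: float, po4_p_mg_l: float, target_mg_to_p: float):
    """Additional Mg (mg/L as Mg) needed to raise Mg:P to the target; 0 if already met."""
    required_mmol = target_mg_to_p * po4_p_mg_l / M_P
    present_mmol = mg_mg_l / M_MG
    return max(0.0, required_mmol - present_mmol) * M_MG


mg_p, n_p = molar_ratios(mg_mg_l=50.0, nh4_n_mg_l=800.0, po4_p_mg_l=100.0)
extra_mg = mg_dose_for_target(mg_mg_l=50.0, po4_p_mg_l=100.0, target_mg_to_p=1.2)
print(f"Mg:P = {mg_p:.2f}, N:P = {n_p:.2f}, extra Mg = {extra_mg:.1f} mg/L")
```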

Based on the data available in the literature, two scenarios were developed to identify the effects of multiple parameters on nutrient recovery efficiency. Scenario I analyzed fewer parameters, such as pH, reaction temperature, and the Mg:P and N:P molar ratios. Scenario II analyzed more parameters, including pH, mixing speed, reaction temperature, hydraulic retention time (HRT), and concentrations of sodium, potassium, calcium, magnesium, ammonium, and phosphate. These two scenarios have not been well studied in the literature.

Preprocessing of the dataset

Data preprocessing is essential to ensure the preparation of a robust dataset before evaluation using statistical and ML models (Zaki et al., 2023). Preprocessing involves data cleaning, imputation, and normalization.

Data cleaning and imputation

Data cleaning reduces noise in the data by removing outliers and improving data quality. In most cases, keeping outliers in the dataset lowers confidence in the prediction models (Soo et al., 2023). The datasets were first imputed, and outliers were then removed. The Scenario I dataset had no missing values. Missing data in Scenario II were imputed using the mean imputation method (Altuhaifa et al., 2023), in which the missing values of each variable are replaced with the mean of that variable's observed values (Chen and McCoy, 2024).
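Mean imputation as described above can be sketched with scikit-learn's `SimpleImputer`. The column names and values below are hypothetical and only illustrate the rule; they are not records from the Scenario II dataset.

```python
import numpy as np
import pandas as pd
from sklearn.impute import SimpleImputer

# Hypothetical Scenario II-style records with missing entries (illustrative only).
records = pd.DataFrame({
    "pH": [8.5, 9.0, np.nan, 9.5],
    "Ca_mg_L": [120.0, np.nan, 80.0, 100.0],
})

# Mean imputation: each NaN is replaced by the mean of the observed values in its column.
imputer = SimpleImputer(strategy="mean")
imputed = pd.DataFrame(imputer.fit_transform(records), columns=records.columns)

# Equivalent pandas expression of the same rule.
imputed_pd = records.fillna(records.mean())
print(imputed)
```

Mean imputation keeps every row, which matters for small sets such as n = 74, but it shrinks the variance of imputed columns, so heavily imputed predictors deserve some caution when interpreting feature importance.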

Removing outliers can also help achieve more reliable and accurate predictions. The interquartile range (IQR) rule was applied to all observations in Scenario I and Scenario II to remove outliers. The IQR is the difference between the upper quartile (Q3) and the lower quartile (Q1) of the data. Data points above Q3 + 1.5 × IQR or below Q1 − 1.5 × IQR are considered outliers (Jeong et al., 2020). On average, 10% of the data were removed by this cleaning process across both scenarios and all combinations.
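A minimal pandas sketch of the 1.5 × IQR rule follows. It assumes (this is not stated in the text) that a row is dropped when any of the listed columns falls outside its fences; the column name and values are hypothetical.

```python
import pandas as pd


def remove_iqr_outliers(df: pd.DataFrame, columns, k: float = 1.5) -> pd.DataFrame:
    """Drop rows where any listed column lies outside [Q1 - k*IQR, Q3 + k*IQR]."""
    keep = pd.Series(True, index=df.index)
    for col in columns:
        q1, q3 = df[col].quantile(0.25), df[col].quantile(0.75)
        iqr = q3 - q1
        keep &= df[col].between(q1 - k * iqr, q3 + k * iqr)  # inclusive fences
    return df[keep]


# Hypothetical recovery values; 5.0 sits far below the lower fence.
data = pd.DataFrame({"PO4_recovery_pct": [80.0, 82.0, 85.0, 88.0, 90.0, 91.0, 5.0]})
cleaned = remove_iqr_outliers(data, ["PO4_recovery_pct"])
print(cleaned)
```

Here Q1 = 81 and Q3 = 89 (pandas' default linear interpolation), so the fences are 69 and 101, and only the 5.0 row is removed. Running imputation first, as in this study, matters because `between` treats NaN as outside the fences.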

Data normalization

The normalization of data is a preprocessing step that transforms the input features into a comparable range to facilitate an unbiased analysis of all data, thereby improving prediction accuracy (Chen et al., 2022; Singh and Singh, 2020). The input data were normalized using the z-score technique, as shown in Equation (2):
$z_{i} = (y_{i} - \bar{y}) / \sigma$ (2)
where $y_{i}$ is the input variable value, $\bar{y}$ is the average of all $y_{i}$ values, and $\sigma$ is the standard deviation of all $y_{i}$ values. The dataset was normalized using StandardScaler within the sklearn package in Python.
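The sketch below checks that sklearn's `StandardScaler` reproduces Equation (2) column by column. The predictor values are hypothetical. Note that `StandardScaler` divides by the population standard deviation (ddof = 0), which is also NumPy's default for `np.std`.

```python
import numpy as np
from sklearn.preprocessing import StandardScaler

# Hypothetical predictor matrix: columns are pH and Mg:P molar ratio.
X = np.array([[7.5, 1.0], [8.5, 1.5], [9.5, 2.0], [10.0, 1.2]])

scaler = StandardScaler()
Z_sklearn = scaler.fit_transform(X)

# Eq. (2) applied column by column with the population standard deviation (ddof=0).
Z_manual = (X - X.mean(axis=0)) / X.std(axis=0)
print(np.round(Z_sklearn, 3))
```

In a train/test workflow, the scaler would typically be fitted on the training split only (`scaler.fit(X_train)`) and then applied to the test split with `scaler.transform(X_test)`, so that test-set statistics do not leak into training.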

Feature relationship

Understanding the features is a crucial step in developing statistical and ML models. The purpose of this step is to understand the relationships between the input variables (i.e., predictors) and the target variable. Pearson's correlation coefficient (r) measures the linear relationship between two variables, where r = 1 indicates a perfect positive linear relationship, r = −1 indicates a perfect negative linear relationship, and r = 0 indicates no linear relationship (Ghasemi and Naser, 2023). The DataFrame.corr() method in the pandas library in Python was used to calculate r between all parameters in the dataset. Pearson's correlation analysis is an effective way to screen a dataset for multicollinearity. However, because it only captures pairwise relationships between input variables, it cannot detect multicollinearity that involves three or more predictors jointly. Therefore, we used the variance inflation factor (VIF) to identify predictors with high levels of multicollinearity. VIF values exceeding 5 indicate moderate multicollinearity, while VIF values exceeding 10 indicate high multicollinearity (Chan et al., 2022). The variance_inflation_factor function in the statsmodels library in Python was used to quantify multicollinearity.
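The VIF of predictor j is $1 / (1 - R_j^2)$, where $R_j^2$ comes from regressing predictor j on all other predictors plus an intercept. The sketch below computes it directly with NumPy so that it has no statsmodels dependency; statsmodels' `variance_inflation_factor` should give the same values when a constant column is included in the design matrix. The synthetic data are hypothetical and built so that column `c` is almost `a + b`: its pairwise Pearson r with `a` is only about 0.7, yet its VIF is far above 10, which is the higher-order collinearity that pairwise correlation misses.

```python
import numpy as np
import pandas as pd


def vif_table(X: pd.DataFrame) -> pd.Series:
    """VIF_j = 1 / (1 - R_j^2), with R_j^2 from an OLS regression of
    predictor j on all other predictors plus an intercept."""
    values = X.to_numpy(dtype=float)
    n = values.shape[0]
    out = {}
    for j, name in enumerate(X.columns):
        target = values[:, j]
        design = np.column_stack([np.ones(n), np.delete(values, j, axis=1)])
        coef, *_ = np.linalg.lstsq(design, target, rcond=None)
        resid = target - design @ coef
        r2 = 1.0 - (resid @ resid) / np.sum((target - target.mean()) ** 2)
        out[name] = np.inf if r2 >= 1.0 else 1.0 / (1.0 - r2)
    return pd.Series(out, name="VIF")


# Synthetic predictors (illustrative only): "c" is nearly a + b, "d" is independent.
rng = np.random.default_rng(0)
a, b, d = rng.normal(size=(3, 200))
X = pd.DataFrame({"a": a, "b": b, "c": a + b + 0.05 * rng.normal(size=200), "d": d})

corr = X.corr(method="pearson")       # pairwise Pearson r
vif = vif_table(X)                    # joint multicollinearity across predictors
flagged = vif[vif > 10].index.tolist()  # "high" multicollinearity threshold from the text
print(corr.round(2), vif.round(1), flagged, sep="\n")
```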

Data splitting

The dataset was split into two subsets (one for training and one for testing) for both scenarios to develop accurate models and avoid overfitting. Overfitting occurs when a model fits the training dataset well but fails to fit unseen testing data (Kernbach and Staartjes, 2022; Yeom et al., 2018). Two data-splitting methods are commonly used in the resource recovery literature: the single hold-out approach and k-fold cross-validation (Zaki et al., 2023). In the single hold-out approach, the dataset is split once into training and testing datasets, typically at a 70:30, 80:20, or 90:10 ratio (Hany et al., 2021; Jha, 2023; Nguyen et al., 2021). The train_test_split function in the sklearn Python package was used for the hold-out splits. The k-fold cross-validation approach splits the dataset into k subsets, or folds; in each of k rounds, k − 1 folds are used to train the model and the remaining fold is used for testing, so that each fold serves as the test set once. The average of the k scores was used to estimate the generalization performance of the final model (Zhang and Liu, 2023). The KFold class in the sklearn Python package with k = 5 was used for this purpose. Considering the large variation in sample size across the different combinations (n = 74–1,062), we applied several training/testing splits typically found in the organic waste resource recovery literature (70:30, 80:20, 90:10, and k-fold), since different splitting ratios in past studies have produced the best-performing models for different sample sizes (Zaki et al., 2023). These ratios were compared to determine which had the greatest influence on the accuracy of the developed models, and the best training-to-testing ratio for each model was then selected for modeling the outcomes.
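The splitting comparison described above can be sketched with sklearn's `train_test_split`, `KFold`, and `cross_val_score`. The data are synthetic, and the RF settings, random seeds, and use of test R² as the selection score are assumptions for illustration rather than the study's exact configuration.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.model_selection import KFold, cross_val_score, train_test_split

# Synthetic regression data (illustrative only).
rng = np.random.default_rng(0)
X = rng.uniform(size=(200, 2))
y = 100 * X[:, 0] - 50 * X[:, 1] + rng.normal(scale=2.0, size=200)

results = {}

# Single hold-out splits at the ratios used in the study.
for label, test_size in {"70:30": 0.30, "80:20": 0.20, "90:10": 0.10}.items():
    X_tr, X_te, y_tr, y_te = train_test_split(X, y, test_size=test_size, random_state=42)
    model = RandomForestRegressor(n_estimators=100, random_state=0).fit(X_tr, y_tr)
    results[label] = model.score(X_te, y_te)  # test-set R^2

# Fivefold cross-validation: each fold is the test set once; scores are averaged.
kf = KFold(n_splits=5, shuffle=True, random_state=42)
cv_scores = cross_val_score(
    RandomForestRegressor(n_estimators=100, random_state=0), X, y, cv=kf, scoring="r2"
)
results["5-fold"] = cv_scores.mean()

best_split = max(results, key=results.get)
print({k: round(v, 3) for k, v in results.items()}, "best:", best_split)
```

A single hold-out score depends on which rows land in the small test set (20 rows at 90:10 here), whereas the k-fold mean averages over five test sets, which is one reason the best ratio can differ between combinations.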

Model selection and development

Two statistical models and three ML models were developed to predict PO₄³⁻ and NH₄⁺ recovery efficiency via struvite precipitation from the digestate of different organic waste streams. The statistical models were multiple linear regression (MLR) and polynomial regression (PLR), and the ML models were K-nearest neighbors (KNN), RF, and eXtreme Gradient Boosting (XGBoost). More details about each model are available in Supplementary Data S1 (Sections A.2–A.6). The framework for developing the five statistical and ML models to predict the efficiency of PO₄³⁻ and NH₄⁺ recovery is shown in Fig. 3. The study was structured around combinations that vary in sample size and number of input variables; each combination pairs a set of input variables and data points with a target, NH₄⁺ or PO₄³⁻ recovery efficiency (Table 1). Zaki et al. (2023) noted that predictor variables can be selected in several ways, such as correlation analysis and stepwise regression. Accordingly, we used correlation analysis, VIF, and feature importance because past studies found this procedure to be more effective for small samples, such as those in our study. For each combination, all five models were developed to predict the nutrient recovery efficiency of struvite precipitation. For Scenario I, Combinations 1, 2, and 3 were used to predict PO₄³⁻ recovery (%) with predictors including pH, reaction temperature, and the Mg:P and N:P molar ratios (Table 1), and Combinations 4, 5, 6, and 7 were used to predict NH₄⁺ recovery (%) with predictors including pH, N:P, Mg:P, and reaction temperature (Table 1).
For Scenario II, Combinations 8 and 9 used a larger number of predictors to predict PO₄³⁻ and NH₄⁺ recoveries (%), including pH, reaction temperature, mixing speed, HRT, and concentrations of sodium, potassium, calcium, magnesium, ammonium, and phosphate. Scenario I focused on the effect of sample size on the prediction models, while Scenario II focused on the influence of the number of input variables. Feedstock type was not used as a predictor because splitting the dataset by feedstock type would substantially reduce the sample size of each subset and thus weaken the explanatory power of the developed models.
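A minimal sketch of the five model types in scikit-learn (plus the separate `xgboost` package) is shown below. The hyperparameters (polynomial degree 2, k = 5 neighbors, tree counts, learning rate) and the column names in `COMBINATIONS` are assumptions for illustration; the study's actual settings are described in its Supplementary Data. XGBoost is skipped if the package is not installed.

```python
import numpy as np
from sklearn.ensemble import RandomForestRegressor
from sklearn.linear_model import LinearRegression
from sklearn.neighbors import KNeighborsRegressor
from sklearn.pipeline import make_pipeline
from sklearn.preprocessing import PolynomialFeatures, StandardScaler

try:  # XGBoost lives in a separate package
    from xgboost import XGBRegressor
except ImportError:
    XGBRegressor = None


def build_models(random_state: int = 0) -> dict:
    """Hypothetical configurations of MLR, PLR, KNN, RF, and XGBoost."""
    models = {
        "MLR": make_pipeline(StandardScaler(), LinearRegression()),
        "PLR": make_pipeline(StandardScaler(), PolynomialFeatures(degree=2), LinearRegression()),
        "KNN": make_pipeline(StandardScaler(), KNeighborsRegressor(n_neighbors=5)),
        "RF": RandomForestRegressor(n_estimators=300, random_state=random_state),
    }
    if XGBRegressor is not None:
        models["XGBoost"] = XGBRegressor(
            n_estimators=300, learning_rate=0.05, max_depth=4, random_state=random_state
        )
    return models


# Input sets for a few Table 1 combinations (column names are hypothetical).
COMBINATIONS = {
    1: ["pH", "Mg:P"],
    3: ["pH", "Mg:P", "T", "N:P"],
    5: ["pH", "N:P", "Mg:P"],
}

# Synthetic data shaped like Combination 3 (illustrative only).
rng = np.random.default_rng(1)
X_demo = rng.uniform(size=(120, len(COMBINATIONS[3])))
y_demo = 60 + 30 * X_demo[:, 0] + 10 * X_demo[:, 1] ** 2 + rng.normal(scale=1.0, size=120)

train_r2 = {name: m.fit(X_demo, y_demo).score(X_demo, y_demo) for name, m in build_models().items()}
print({k: round(v, 3) for k, v in train_r2.items()})
```

Training R² alone favors the more flexible models (PLR can never fit the training data worse than MLR, because its features include MLR's), which is why the study judges models on testing R² and RMSE as well.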

FIG. 3.

Theoretical framework of this study.

Table 1.

The Data-Driven Models for Different Combinations of Input-Output Variables for Scenario I and Scenario II

Combination #	Input variables	Output	Scenario #	# of studies	Sample size (n)	Organic feedstock used
1	pH, Mg:P	PO₄³⁻ Recovery (%)	I	20	1,062	SW, CW, DW, FW, PM, POM, SS, SM, SW, WW
2	pH, Mg:P, T	PO₄³⁻ Recovery (%)	I	7	538	SW, DW, SS, SM, WW
3	pH, Mg:P, T, N:P	PO₄³⁻ Recovery (%)	I	4	251	SW, SS, WW
4	pH, N:P	NH₄⁺ Recovery (%)	I	10	506	SW, CM, FW, SS, SM, SWW, WW
5	pH, N:P, Mg:P	NH₄⁺ Recovery (%)	I	10	506	SW, CM, FW, SS, SM, SWW, WW
6	pH, N:P, T	NH₄⁺ Recovery (%)	I	3	217	SW, SS, WW
7	pH, N:P, Mg:P, T	NH₄⁺ Recovery (%)	I	3	232	SW, SS, WW
8	pH, MS, T, HRT, SC, KC, CC, MC, NC, PC	PO₄³⁻ Recovery (%)	II	2	74	SW, SS
9	pH, MS, T, HRT, SC, KC, CC, MC, NC, PC	NH₄⁺ Recovery (%)	II	2	74	SW, SS

Each combination was analyzed with the Multiple Linear Regression (MLR), Polynomial Regression (PLR), K-Nearest Neighbors (KNN), Random Forest (RF), and eXtreme Gradient Boosting (XGBoost) models.

T, Reaction temperature; MS, Mixing speed; HRT, Hydraulic Retention Time; SC, Sodium concentration; KC, Potassium concentration; CC, Calcium concentration; MC, Magnesium concentration; NC, Ammonium concentration; PC, Phosphate concentration; SW, Slaughterhouse wastewater; CW, Cattle manure; DW, Dairy waste; FW, Food wastewater; PM, Pig manure; POM, Poultry manure; SS, Sewage sludge; SM, Swine manure; SW, Swine wastewater; WW, Wastewater; SWW, Swine wastewater.

Model evaluation

The five models were compared to determine which model performed best and which sample size was most effective for future predictions. Because all models were regression (not classification) models, their performance was evaluated using two criteria: the coefficient of determination (R²) and the root mean square error (RMSE). R² was calculated for both the training and testing predictions as follows:
$R^{2} = 1 - \frac{\sum_{i} (y_{i} - \hat{y}_{i})^{2}}{\sum_{i} (y_{i} - \bar{y})^{2}}$ (3)

The RMSE measures the typical magnitude of the difference between the target values predicted by a model and the actual target values in the dataset. The ideal RMSE value is 0, which means there is no residual error between the actual and predicted values. The RMSE was calculated using Equation (4) (Shyu et al., 2023):
$RMSE = \sqrt{\frac{1}{n} \sum_{i = 1}^{n} (\hat{y}_{i} - y_{i})^{2}}$ (4)
where $y_{i}$ is the observed value, $\hat{y}_{i}$ is the predicted value, $\bar{y}$ is the mean of the observed values, and n is the number of samples in the evaluated dataset. If a data-driven model did not meet the evaluation criteria (high R² and low RMSE), it was rejected, and the data preprocessing steps were repeated to redevelop the model and test its performance. The scikit-learn library (version 1.3.0) in Python was used to preprocess the dataset and implement the data-driven models, and the code was run in Google Colab notebooks.
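Equations (3) and (4) can be checked by hand on a small hypothetical example and against scikit-learn's `r2_score` and `mean_squared_error`. RMSE is taken as the square root of the MSE rather than through a version-specific keyword argument.

```python
import numpy as np
from sklearn.metrics import mean_squared_error, r2_score


def r2(y_true, y_pred) -> float:
    """Eq. (3): 1 - SS_res / SS_tot, with SS_tot taken around the mean of y_true."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    ss_res = np.sum((y_true - y_pred) ** 2)
    ss_tot = np.sum((y_true - y_true.mean()) ** 2)
    return float(1.0 - ss_res / ss_tot)


def rmse(y_true, y_pred) -> float:
    """Eq. (4): square root of the mean squared residual."""
    y_true, y_pred = np.asarray(y_true, float), np.asarray(y_pred, float)
    return float(np.sqrt(np.mean((y_pred - y_true) ** 2)))


# Hypothetical observed and predicted recoveries (%).
y_obs = [85.0, 90.0, 70.0, 95.0]
y_hat = [83.0, 91.0, 74.0, 93.0]
print(round(r2(y_obs, y_hat), 4), rmse(y_obs, y_hat))
```

Here the residuals are 2, −1, −4, and 2, so SS_res = 25 and RMSE = √(25/4) = 2.5, while SS_tot = 350 around the mean of 85, giving R² = 1 − 25/350 ≈ 0.929. Because SS_res can exceed SS_tot on unseen data, testing R² can be negative, as seen for several models in Table 2.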

Feature importance using tree-based algorithms

Feature importance analysis, the last step in the process, was used to determine the contribution of each input variable to the ML predictions of the target variable. A weight was assigned to each predictor based on how strongly it influenced the output variable (Wei et al., 2024). This post-model sensitivity measure is built into tree-based models such as RF and XGBoost (Zaki et al., 2024) but is not available for KNN. The scores were calculated from the weight assigned to each input parameter in the trained model. The higher the score of an input variable, the more influence it had on the predictions, and the more likely it is to be retained during feature selection.
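The sketch below reads impurity-based importances from a fitted scikit-learn `RandomForestRegressor` via its `feature_importances_` attribute (XGBoost's `XGBRegressor` exposes an attribute of the same name). The synthetic data are hypothetical and constructed so that the target depends strongly on pH, weakly on Mg:P, and not at all on temperature.

```python
import numpy as np
import pandas as pd
from sklearn.ensemble import RandomForestRegressor

rng = np.random.default_rng(0)
n = 300
X = pd.DataFrame({
    "pH": rng.uniform(7.5, 10.5, n),
    "Mg:P": rng.uniform(0.8, 2.0, n),
    "T": rng.uniform(20, 35, n),
})
# Synthetic target: driven mainly by pH, weakly by Mg:P, not by T (illustrative only).
y = 40 + 15 * (X["pH"] - 7.5) + 5 * X["Mg:P"] + rng.normal(0, 2, n)

rf = RandomForestRegressor(n_estimators=200, random_state=0).fit(X, y)

# Impurity-based importances are normalized to sum to 1 across predictors.
importances = pd.Series(rf.feature_importances_, index=X.columns).sort_values(ascending=False)
print(importances.round(3))
```

Impurity-based scores are computed on the training data and can favor continuous or high-cardinality predictors; a model-agnostic alternative such as `sklearn.inspection.permutation_importance` would also apply to KNN, MLR, and PLR.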

The variability of the predictors is addressed through the feature importance analysis. Because the feature importance attribute is available only for RF and XGBoost and not for the other models, these relationships were further examined through Pearson's correlation analysis. Variance in the dataset could also affect the predictability of the models; therefore, the data preprocessing steps (imputation, outlier removal, and normalization) were used to build a robust dataset before developing the models (see the Preprocessing of the Dataset section).

Results and Discussion

A description of the dataset and statistical analyses can be found in Supplementary Data S1 (Sections A.7–A.14, Supplementary Table S3, and Supplementary Figs. S2–S8). Information regarding data correlation is also in Supplementary Data S1 (Section A.15, Supplementary Figs. S9–S16, and Supplementary Tables S4–S11). Results of nutrient recovery prediction, the role of input variables, and study implications are detailed in the following subsections.

Predictions of PO₄³⁻ and NH₄⁺ recovery (%)

Effects of input variables and the sample size on prediction

Input variables and sample sizes were investigated to identify the best model for predicting PO₄³⁻ and NH₄⁺ recovery during struvite precipitation of various organic waste digestates. Table 2 summarizes the results of the most accurate models for each combination, drawn from Supplementary Tables S12–S15, based on the different training/testing splitting ratios and the values of R² and RMSE. The emboldened row for each combination in Table 2 represents the best-performing of the five developed models (statistical or ML). For example, for Combination 3, RF with a 90:10 splitting ratio provided the most accurate model: its R² values (training = 0.94, testing = 0.72) were higher than those of the other models (training = 0.38–0.86, testing = 0.34–0.71), and its RMSE values (training = 2.00, testing = 4.40) were lower than those of the other models (training = 2.84–5.87, testing = 4.92–7.48). Both statistical models (MLR and PLR) of Combination 3 (R² = 0.38–0.56, RMSE = 5.00–5.87, Table 2) achieved better predictions than those of Combination 1 (R² = 0.06–0.08, RMSE = 26.81–27.01, Table 2) and Combination 2 (R² = 0.28–0.37, RMSE = 16.78–17.90, Table 2). The ML models had superior predictive accuracy (R² = 0.59–0.94, RMSE = 2.00–18.00, Table 2) compared with the statistical models for Combinations 1–3. The ML models of Combination 3 had better prediction performance (R² = 0.83–0.94, RMSE = 2.00–3.11, Table 2) than similar models in Combinations 1 and 2 (R² = 0.59–0.90, RMSE = 6.87–18.00, Table 2). Interestingly, Combination 3 had the smallest sample size (n = 251) compared with Combinations 1 (n = 1,062) and 2 (n = 538).

Table 2.

Summary of the Best Performance Statistical and ML Models Based on Different Splitting Ratios Used

Combination #	Model name	Training R²	Training RMSE	Testing R²	Testing RMSE	Splitting ratio
1	MLR	0.064	27.01	0.06	26.96	70:30
	PLR	0.08	26.81	0.07	26.80	70:30
	KNN	0.75	13.92	0.49	19.83	70:30
	RF	0.84	11.05	0.66	15.50	90:10
	XGBoost	0.59	18.00	0.60	16.72	90:10
2	MLR	0.28	17.9	0.20	19.63	70:30
	PLR	0.37	16.78	0.24	19.06	70:30
	KNN	0.85	8.26	0.52	12.28	K-fold
	RF	0.90	6.87	0.68	11.83	80:20
	XGBoost	0.79	9.81	0.62	12.36	K-fold
3	MLR	0.38	5.87	0.34	7.48	K-fold
	PLR	0.56	5.00	0.55	6.15	K-fold
	KNN	0.86	2.84	0.71	4.97	K-fold
	RF	0.94	2.00	0.72	4.40	90:10
	XGBoost	0.83	3.11	0.71	4.92	K-fold
4	MLR	0.15	37.05	0.14	36.16	K-fold
	PLR	0.67	22.83	0.65	23.24	70:30
	KNN	0.93	10.68	0.90	12.43	K-fold
	RF	0.97	6.91	0.87	13.82	K-fold
	XGBoost	0.86	15.1	0.76	19.04	90:10
5	MLR	0.36	32.19	0.11	37.37	K-fold
	PLR	0.70	22.20	0.67	21.79	90:10
	KNN	0.96	8.09	0.94	9.76	80:20
	RF	0.98	6.11	0.95	9.45	K-fold
	XGBoost	0.94	10.19	0.90	12.27	70:30
6	MLR	0.32	2.74	0.15	3.28	70:30
	PLR	0.38	2.61	0.13	3.31	70:30
	KNN	0.65	1.97	−0.19	3.89	70:30
	RF	0.85	1.28	−0.04	3.64	70:30
	XGBoost	0.67	1.90	0.13	3.31	70:30
7	MLR	0.29	3.12	0.18	3.71	70:30
	PLR	0.35	2.99	0.22	3.61	70:30
	KNN	0.71	1.97	0.19	3.90	80:20
	RF	0.86	1.36	0.30	3.43	70:30
	XGBoost	0.68	2.11	0.26	3.52	70:30
8	MLR	0.7	14.06	−0.54	35.46	80:20
	PLR	0.94	6.21	−0.51	33.3	80:20
	KNN	0.84	10.33	0.70	13.06	K-fold
	RF	0.91	7.55	0.83	11.78	80:20
	XGBoost	0.96	5.42	0.93	7.68	80:20
9	MLR	0.82	10.22	0.53	22.39	90:10
	PLR	0.95	7.05	−0.55	56.61	90:10
	KNN	0.87	8.93	0.80	12.87	80:20
	RF	0.89	8.85	0.89	7.50	K-fold
	XGBoost	0.92	7.67	0.87	8.32	K-fold

Emboldened rows represent the best-performing model for each combination.

KNN, K-nearest neighbor; ML, machine learning; MLR, multiple linear regression; PLR, polynomial regression; RF, random forest; RMSE, root mean square error; XGBoost, eXtreme Gradient Boosting.

For the remaining models (Combinations 4–7), the effects of sample size and number of input variables were inconclusive. The ML models in Combinations 4 and 5 exhibited superior prediction performance (R² = 0.86–0.98, RMSE = 6.11–15.10, Table 2), surpassing the performance of the other models (R² = 0.28–0.94, RMSE = 1.28–18.00, Table 2). Scenario II had a smaller sample size (n = 74) and more input variables than Scenario I. The statistical models in Scenario II (MLR) had slightly better prediction performance (training R² = 0.70–0.82 and testing R² = −0.54 to 0.53, Table 2) than the same models in Scenario I (training R² = 0.06–0.67 and testing R² = 0.06–0.65, Table 2). The ML models in Scenario II likewise outperformed similar models in Scenario I.

Changing the sample size had no clear effect on the performance of the data-driven models: the two sample sizes associated with the best ML performance fell in very different ranges, n = 506 (Combinations 4 and 5) and n = 74 (Combinations 8 and 9). At the same time, the comparison of different combinations indicated that the input variables pH, Mg:P and N:P molar ratios, mixing speed, reaction temperature, HRT, and concentrations of sodium, potassium, calcium, magnesium, ammonium, and phosphate were the best choices for predicting PO₄³⁻ and NH₄⁺ recovery (%).

This study's findings are similar to those of Nageshwari et al. (2022), who studied the effects of different parameters on nutrient recovery from different wastewater sources. Several factors common to both studies had a positive impact on the results, including pH, mixing speed, reaction temperature, and magnesium, ammonium, and phosphate concentrations. However, their study did not discuss the effect of sample size on the developed models; they used a single sample size of 100, unlike the range of sample sizes in our study. Our results do agree with those of Leng et al. (2024) regarding the use of different sample sizes. Using sample sizes of 210 and 510, they reported that sample size had a significant effect on their results, which partly aligns with our results (e.g., n = 506). Furthermore, pH and the Mg:P:N molar ratio were common factors in both studies and had a significant impact on the outcomes. Finally, the different combinations of data used in this study did not show a consistent trend across the different training-to-testing ratios (see Table 2, where some models with a small sample size performed best with a 70:30 ratio while others with a small sample size performed best with k-fold cross-validation).

Prediction performance of the statistical and ML models

Across all combinations (1–9), the statistical models (MLR and PLR) had lower prediction accuracy (R² = 0.06–0.95, RMSE = 2.61–37.05, Table 2) than the ML models in both scenarios (R² = 0.59–0.98, RMSE = 1.28–18.00, Table 2). This gap was most pronounced in Combinations 8 and 9, where multicollinearity may have contributed to the poor performance of the statistical models (variance inflation factor [VIF] > 10 for sodium, potassium, and ammonium concentrations). The statistical models were also prone to overfitting (training R² = 0.70–0.95 vs. testing R² = −0.55 to 0.53, Table 2). Some KNN models, such as those in Combinations 4 and 5, gave robust and accurate predictions (R² = 0.93–0.96, RMSE = 8.09–10.68, Table 2). Others, such as those in Combinations 1 and 6, performed poorly because of overfitting (training R² = 0.65–0.75 vs. testing R² = −0.19 to 0.49, Table 2). The RF model (Scenario I) and the XGBoost model (Scenario II) predicted PO₄³⁻ and NH₄⁺ recovery efficiency more accurately (R² = 0.92–0.98, RMSE = 2.00–7.67, Table 2, Fig. 4a and b) than all other models (R² = 0.06–0.97, RMSE = 1.90–37.05, Table 2).
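The multicollinearity diagnosis above relies on the variance inflation factor. The minimal sketch below implements the textbook definition, VIF_j = 1/(1 − R²_j), where R²_j comes from regressing predictor j on all other predictors. The concentrations are invented so that sodium and potassium move together. The threshold of 10 is the common rule of thumb quoted in the text, not a value computed from our dataset. In practice, a library routine such as `variance_inflation_factor` in statsmodels (`statsmodels.stats.outliers_influence`) would normally replace the hand-rolled least squares shown here.

```python
"""Sketch: variance inflation factor (VIF) for detecting multicollinearity.

VIF_j = 1 / (1 - R²_j), where R²_j comes from regressing predictor j on all
other predictors (with an intercept). The cation data below are hypothetical.
"""


def solve(a, b):
    """Solve the linear system a x = b (Gauss-Jordan, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(n):
            if r != col:
                factor = m[r][col] / m[col][col]
                m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    return [m[i][n] / m[i][i] for i in range(n)]


def ols_r_squared(y, xs):
    """R² of least squares y ~ intercept + xs (centering replaces the intercept)."""
    n = len(y)
    y_mean = sum(y) / n
    yc = [v - y_mean for v in y]
    cols = []
    for c in xs:
        c_mean = sum(c) / n
        cols.append([v - c_mean for v in c])
    p = len(cols)
    xtx = [[sum(a * b for a, b in zip(cols[i], cols[j])) for j in range(p)] for i in range(p)]
    xty = [sum(a * b for a, b in zip(cols[i], yc)) for i in range(p)]
    beta = solve(xtx, xty)
    fitted = [sum(beta[j] * cols[j][i] for j in range(p)) for i in range(n)]
    ss_res = sum((a - f) ** 2 for a, f in zip(yc, fitted))
    ss_tot = sum(a * a for a in yc)
    return 1.0 - ss_res / ss_tot


def vif(columns):
    """Return {name: VIF} for a dict of equally long predictor columns."""
    result = {}
    for name, y in columns.items():
        others = [col for other, col in columns.items() if other != name]
        result[name] = 1.0 / (1.0 - ols_r_squared(y, others))
    return result


# Hypothetical concentrations (mg/L): K is roughly 2 x Na, so both get a high VIF
predictors = {
    "Na": [200, 340, 410, 520, 610, 700, 820, 905],
    "K": [410, 690, 815, 1050, 1220, 1395, 1650, 1800],
    "NH4": [900, 1500, 1100, 2100, 1300, 2500, 1700, 2300],
}
for name, value in vif(predictors).items():
    print(f"{name}: VIF = {value:.1f}")
```

Collinear predictors inflate the variance of MLR and PLR coefficient estimates. Tree-based models are generally less sensitive to this, which is consistent with the gap observed in Combinations 8 and 9.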

FIG. 4.

Linear fit of actual versus predicted values for the ML models developed for (a) PO₄³⁻ recovery (RF, Combination 3), (b) NH₄⁺ recovery (RF, Combination 5), (c) PO₄³⁻ recovery (XGBoost, Combination 8), and (d) NH₄⁺ recovery (XGBoost, Combination 9). The dashed line represents the line of equality (y = x). The models were developed using different training and testing schemes: a 90:10 split for RF (Combination 3), k-fold cross-validation for RF (Combination 5), an 80:20 split for XGBoost (Combination 8), and fivefold cross-validation for XGBoost (Combination 9). ML, machine learning; RF, random forest; XGBoost, eXtreme Gradient Boosting.

In Combinations 8 and 9 (Scenario II), the XGBoost models predicted PO₄³⁻ and NH₄⁺ recovery more accurately than RF and KNN (XGBoost [Combination 8], training R² = 0.92–0.96 and testing R² = 0.87–0.93, Fig. 4c and d). Across combinations, RF (RF [Combination 3] and RF [Combination 5]) was the best prediction model for Scenario I, and XGBoost (XGBoost [Combination 8] and XGBoost [Combination 9]) was the best for Scenario II. Overall, the tree-based models (RF and XGBoost) outperformed KNN, and the ML models outperformed the statistical models.

The role of input variables in prediction

We investigated the role of the input variables in prediction using the best-performing models identified in the “Prediction Performance of the Statistical and ML Models” section: RF [Combination 3], RF [Combination 5], XGBoost [Combination 8], and XGBoost [Combination 9]. The Mg:P molar ratio had the largest effect on PO₄³⁻ recovery efficiency, followed by the N:P molar ratio (RF [Combination 3], Fig. 5a). This is expected because the amounts of magnesium and ammonium relative to phosphate in the wastewater govern struvite (MgNH₄PO₄·6H₂O) formation and hence PO₄³⁻ recovery (Wang et al., 2021). Stoichiometrically, struvite precipitation requires an Mg:N:P molar ratio of 1:1:1 (Wang et al., 2021). Some studies have also shown that PO₄³⁻ recovery increases with the N:P molar ratio (Moyo et al., 2023; Wang et al., 2021). In our feature importance analysis, reaction temperature and pH contributed less to the model than the Mg:P and N:P molar ratios (RF [Combination 3], Fig. 5a). Reaction temperature affects struvite solubility and hence precipitation in wastewater (Shaddel et al., 2020). A high Mg:P molar ratio could reduce the effect of reaction temperature on PO₄³⁻ recovery efficiency (Otieno et al., 2023). The N:P molar ratio was the second most important factor (<30% of the total contribution), and pH was the third (<20%). For NH₄⁺ recovery (RF, Combination 5), the N:P molar ratio dominated, accounting for more than 70% of the total contribution. The Mg:P molar ratio and pH contributed less than 30% (Fig. 5b).
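The Mg:P and N:P molar ratios discussed above are usually derived from mass concentrations. The short sketch below converts concentrations to molar ratios and identifies which ion limits a 1:1:1 struvite stoichiometry. It assumes that ammonium and phosphate are reported as NH₄⁺-N and PO₄³⁻-P in mg/L. If whole-ion concentrations are reported, the ion molar masses should be used instead. The function names and the sample digestate are hypothetical.

```python
"""Sketch: Mg:P and N:P molar ratios from mass concentrations (mg/L).

Assumes ammonium is reported as NH4+-N and phosphate as PO4^3--P; if whole
ions are reported, substitute the ion molar masses.
"""

MOLAR_MASS = {"Mg": 24.305, "N": 14.007, "P": 30.974}  # g/mol


def molar_ratios(mg_mg_l, nh4_n_mg_l, po4_p_mg_l):
    """Return (Mg:P, N:P) molar ratios for one digestate sample."""
    mmol_mg = mg_mg_l / MOLAR_MASS["Mg"]  # mg/L divided by g/mol gives mmol/L
    mmol_n = nh4_n_mg_l / MOLAR_MASS["N"]
    mmol_p = po4_p_mg_l / MOLAR_MASS["P"]
    return mmol_mg / mmol_p, mmol_n / mmol_p


def limiting_ion(mg_mg_l, nh4_n_mg_l, po4_p_mg_l):
    """Ion present in the smallest molar amount, i.e., limiting for 1:1:1 struvite."""
    mmol = {
        "Mg": mg_mg_l / MOLAR_MASS["Mg"],
        "N": nh4_n_mg_l / MOLAR_MASS["N"],
        "P": po4_p_mg_l / MOLAR_MASS["P"],
    }
    return min(mmol, key=mmol.get)


# Hypothetical digestate: ammonium-rich, magnesium-poor
mg_p, n_p = molar_ratios(mg_mg_l=40.0, nh4_n_mg_l=1200.0, po4_p_mg_l=150.0)
print(f"Mg:P = {mg_p:.2f}, N:P = {n_p:.1f}, limiting: {limiting_ion(40.0, 1200.0, 150.0)}")
```

In an ammonium-rich, magnesium-poor digestate such as this hypothetical one, Mg:P rather than N:P constrains PO₄³⁻ recovery. This is consistent with the Mg:P ratio ranking first for PO₄³⁻ recovery in RF (Combination 3).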

FIG. 5.

Feature importance analysis results for the RF models for (a) PO₄³⁻ recovery (Combination 3) and (b) NH₄⁺ recovery (Combination 5), and for the XGBoost models for (c) PO₄³⁻ recovery (Combination 8) and (d) NH₄⁺ recovery (Combination 9). MS, mixing speed; RT, reaction temperature; HRT, hydraulic retention time; Na, sodium concentration; K, potassium concentration; Ca, calcium concentration; Mg, magnesium concentration; AC, ammonium concentration; PC, phosphate concentration; P%, PO₄³⁻ recovery; N%, NH₄⁺ recovery.

Fig. 5c shows the contributions of the input variables to the prediction of PO₄³⁻ recovery efficiency by XGBoost (Combination 8, Scenario II). Ammonium concentration was the most important predictor of PO₄³⁻ recovery (>35% of the total contribution). According to the literature, pH values of 7–11 are considered optimal for PO₄³⁻ and NH₄⁺ recovery (Balaguer-Barbosa, 2018; Hakimi et al., 2020; Otieno et al., 2023). Consistent with this, pH, which ranged from 8 to 11 in our dataset (Supplementary Table S3), had the second-highest effect on PO₄³⁻ recovery, with more than 15% of the total contribution (Fig. 5c).

Mixing speed played a relatively minor role in the predictions of the models for Combinations 8 and 9. Mixing generally promotes crystal nucleation and struvite growth (Korchef et al., 2023). Mixing speeds of 100–200 rpm have been reported as optimal for producing large crystals (González-Morales et al., 2019). The mixing speeds in this study ranged from 150 to 700 rpm, largely above the optimum range reported in the literature, which could explain their limited effect on nutrient recovery (Korchef et al., 2023; Perwitasari et al., 2023). High mixing speeds can break struvite crystals, hindering struvite growth and PO₄³⁻ recovery (Siciliano et al., 2020). Mixing intensity also affects other factors, such as the reaction time (which was not considered in this study) and the Mg:N:P molar ratio, and thereby affects nutrient recovery by struvite (Rodlia et al., 2020). Consistent with our findings, González-Morales et al. (2019) found that PO₄³⁻ recovery was not affected by mixing speed. Similarly, reaction temperature (20°C–25°C; Supplementary Table S3) contributed very little to the prediction model compared with the other parameters. This was possibly because of its narrow range in the dataset.

Struvite precipitation is often influenced by foreign ions in solution, such as calcium, potassium, and sodium (Hakimi et al., 2020). According to Kubar et al. (2021), calcium and potassium concentrations enhance PO₄³⁻ recovery efficiency. In our results, calcium concentration had the third-highest contribution to the prediction model, together with PO₄³⁻ concentration (around 12%). Potassium concentration ranked fourth, together with sodium concentration (around 8%) (Fig. 5c).

The low contribution of magnesium concentration (<5%) to the model could be explained by the low variability of Mg²⁺ concentrations compared with NH₄⁺ concentrations in the wastewater sources (Supplementary Table S3). HRT, an operational parameter, also contributed less (<6%) than pH (around 19%). Together, ammonium concentration and reaction pH accounted for the largest share of the prediction (>53% of the total contribution).
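The percentage contributions quoted in this section are importance scores normalized to sum to 100%. This section does not state which importance measure produced them, so the sketch below illustrates one common model-agnostic option: permutation importance. Shuffling one feature breaks its link to the target, and the resulting increase in error is normalized into percentage shares. Built-in RF impurity importance or XGBoost gain would be alternatives. `toy_model`, the feature labels, and the data are hypothetical stand-ins, not the trained models from this study.

```python
"""Sketch: permutation importance normalized to percentage contributions.

`toy_model` is a hypothetical stand-in for a trained RF/XGBoost model, and
the data are synthetic.
"""
import random


def mse(actual, predicted):
    return sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual)


def permutation_contributions(predict, rows, targets, n_repeats=20, seed=0):
    """Percent of total importance per feature (shares sum to 100)."""
    rng = random.Random(seed)
    baseline = mse(targets, [predict(r) for r in rows])
    raw = []
    for j in range(len(rows[0])):
        increases = []
        for _ in range(n_repeats):
            column = [r[j] for r in rows]
            rng.shuffle(column)  # break the link between feature j and the target
            permuted = [r[:j] + [v] + r[j + 1:] for r, v in zip(rows, column)]
            increases.append(mse(targets, [predict(r) for r in permuted]) - baseline)
        raw.append(max(0.0, sum(increases) / n_repeats))
    total = sum(raw)
    return [100.0 * v / total for v in raw]


def toy_model(row):
    """Hypothetical model: output depends strongly on x0, weakly on x1, not on x2."""
    x0, x1, x2 = row
    return 50.0 + 30.0 * x0 + 10.0 * x1 + 0.0 * x2


data_rng = random.Random(1)
rows = [[data_rng.random() for _ in range(3)] for _ in range(200)]
targets = [toy_model(r) for r in rows]
shares = permutation_contributions(toy_model, rows, targets)
for name, share in zip(["x0 (strong)", "x1 (weak)", "x2 (unused)"], shares):
    print(f"{name}: {share:.1f}%")
```

Normalized shares show relative, not absolute, influence. A feature with little variability in the data, as suggested for Mg²⁺ above, tends to receive a small share regardless of its chemical importance.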

For NH₄⁺ recovery, the operational parameters (pH, HRT, reaction temperature, and mixing speed) accounted for most of the influence on the prediction model (around 85% of the total contribution) (Fig. 5d). Reaction pH (8–11) had the largest effect, with more than 47% of the total contribution. Previous studies have reported that pH values within the optimum range (7–11) favor struvite crystallization and NH₄⁺ recovery (Otieno et al., 2023; Rodlia et al., 2020). HRT, reaction temperature, and mixing speed contributed more to the NH₄⁺ recovery model (25%, 8%, and 5%, respectively) than to the PO₄³⁻ recovery model. The concentrations of coexisting cations such as calcium, sodium, and potassium had little effect on the NH₄⁺ recovery predictions. In wastewater, such coexisting cations can replace magnesium ions, hindering struvite formation and reducing NH₄⁺ recovery (Ye et al., 2018). Phosphate and magnesium concentrations also contributed little to the prediction (<1% and <5%, respectively).

Implications of the developed models

To our knowledge, this study is the first to predict nutrient recovery efficiency from the digestate of various organic waste streams using multiple statistical and ML models. Previous ML models were developed using only a small number of digestate data points (∼7) combined with data from other wastewater sources (Nageshwari et al., 2022). Comparing model performance (e.g., R² values) with previous studies is challenging because data collection procedures, including the choice of input variables and sample sizes, vary between studies. In this study, we investigated how sample size, the number and type of input variables, and the choice of statistical or ML model influence predictions of phosphorus and nitrogen recovery efficiency from organic waste digestate by struvite precipitation. Overall, ML models outperformed statistical models in predicting nutrient recovery efficiency, and the effect of sample size was less apparent than the effect of input variables. Notably, there is a dearth of literature on applying data-driven models to assess the struvite precipitation efficiency of digestate. We used published studies to develop a general model that organic waste management facilities can use to mitigate eutrophication and produce fertilizer that supports food security. The potential public benefit of this work is therefore to promote cost-effective nutrient recovery from existing organic waste streams using data science.

Previously, nutrient recovery efficiency was estimated mainly with theoretical models, such as chemical equilibrium models. Using Visual MINTEQ, a chemical equilibrium model, Çelen et al. (2007) estimated PO₄³⁻ recovery from liquid swine manure through struvite precipitation. The model inputs were magnesium, calcium, potassium, orthophosphate, and NH₄⁺ concentrations, alkalinity, and pH. The results showed a PO₄³⁻ recovery efficiency of 97% based on actual experimental values. Jia et al. (2017) applied Visual MINTEQ to optimize NH₄⁺ recovery from anaerobic digester effluent. They showed that the Mg²⁺:PO₄³⁻ ratio significantly affected NH₄⁺ recovery and achieved over 96% recovery efficiency with the model. These studies suggest that data-driven models could achieve similar outcomes while reducing the need for laboratory experiments and the cost of the associated supplies. Future researchers could integrate the RF and XGBoost models developed in this study with theoretical models such as Visual MINTEQ to build hybrid models with improved predictions (Mehrani et al., 2022). One way to build such a hybrid would be to predict nutrient recovery efficiency with the theoretical model first. An ML model would then predict the residuals, that is, the differences between the actual and theoretically predicted values (Quaghebeur et al., 2022; Xu et al., 2024).
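The residual-learning strategy described above can be sketched as follows. Everything in the sketch is an assumption made for illustration. `equilibrium_prediction` is a hypothetical placeholder for a chemical equilibrium model run and does not call Visual MINTEQ. KNN serves as the ML component only because it fits in a few lines; given our results, RF or XGBoost would be the natural choice. The inputs (pH and Mg:P molar ratio) are not scaled, although a real KNN model would normally require scaling. The data are synthetic.

```python
"""Sketch: a residual hybrid of a mechanistic model and an ML correction.

`equilibrium_prediction` is a hypothetical placeholder for a chemical
equilibrium model run (it does not call Visual MINTEQ). KNN stands in for the
ML component; inputs are (pH, Mg:P molar ratio) and are left unscaled here.
"""
import math


def equilibrium_prediction(ph, mg_p):
    """Hypothetical mechanistic estimate of PO4 recovery (%)."""
    return min(100.0, 40.0 + 8.0 * (ph - 7.0) + 25.0 * min(mg_p, 1.5))


def knn_predict(train_x, train_y, query, k=3):
    """Mean target of the k training points nearest to query (Euclidean distance)."""
    ranked = sorted(zip(train_x, train_y), key=lambda pair: math.dist(pair[0], query))
    return sum(y for _, y in ranked[:k]) / k


class ResidualHybrid:
    """Mechanistic prediction plus an ML-predicted residual (actual - mechanistic)."""

    def __init__(self, k=3):
        self.k = k
        self.train_x, self.residuals = [], []

    def fit(self, xs, ys):
        self.train_x = list(xs)
        # The ML part only has to learn what the mechanistic model gets wrong.
        self.residuals = [y - equilibrium_prediction(*x) for x, y in zip(xs, ys)]
        return self

    def predict(self, x):
        correction = knn_predict(self.train_x, self.residuals, x, self.k)
        return equilibrium_prediction(*x) + correction


# Synthetic "observations": the mechanistic model overestimates by 5 points
train = [(8.0, 1.0), (8.5, 1.2), (9.0, 1.0), (9.5, 1.4), (10.0, 1.1)]
observed = [equilibrium_prediction(ph, mg_p) - 5.0 for ph, mg_p in train]
hybrid = ResidualHybrid(k=3).fit(train, observed)
print(round(equilibrium_prediction(9.2, 1.1), 1), round(hybrid.predict((9.2, 1.1)), 1))
```

The theoretical model carries the chemistry, so the ML component only has to learn a systematic correction. This may require less training data than learning recovery efficiency from scratch.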

These results suggest that ML methods can support modeling, forecasting, and decision-making for struvite precipitation. This could be particularly useful for large-scale systems that integrate anaerobic digesters and struvite precipitation to recover both energy and fertilizer. ML models could help address feedstock fluctuations and optimize digestate treatment for each digestion case (e.g., monodigestion and codigestion). Accordingly, the ML models presented in this study may help system operators optimize control parameters, for example, in combination with response surface methodology (RSM). RSM is a statistical method that investigates the relationship between process factors and responses and helps optimize process outcomes (Pereira et al., 2021). The developed models could then be tested at the optimal experimental conditions to validate their predictions and confirm the desired results. Typical RSM applications in the literature use small datasets (n = 10–30), depending on the experimental procedure (Zaki et al., 2023). Therefore, one way to improve the optimization capabilities of RSM in struvite precipitation could be to combine it with our models, which were developed with larger sample sizes (n = 74–1,026). This could help operators improve decision-making by increasing the statistical significance of the optimized control parameters. This approach could also increase the amount of nutrients (such as nitrogen and phosphorus) recovered for fertilizer production and improve the economic viability of AD systems for managing organic waste. Finally, optimizing the parameters controlled by the system could contribute to other relevant SDGs, such as SDG 1 (No Poverty), SDG 2 (Zero Hunger), and SDG 12 (Responsible Consumption and Production).
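As a hedged illustration of the surrogate-plus-RSM idea above, the sketch fits a second-order response surface in a single factor (pH) to predictions from a trained model and locates the stationary point. `surrogate` is a hypothetical quadratic placeholder for an RF or XGBoost model, and the design points are arbitrary. A real RSM study would use several factors (e.g., pH, Mg:P molar ratio, and HRT) and a formal design such as a central composite design.

```python
"""Sketch: a one-factor second-order response surface fitted to surrogate predictions.

`surrogate` is a hypothetical placeholder for a trained RF/XGBoost model;
real RSM work would use several factors and a formal experimental design.
"""


def solve(a, b):
    """Solve a small dense system a x = b (Gaussian elimination, partial pivoting)."""
    n = len(b)
    m = [list(row) + [rhs] for row, rhs in zip(a, b)]
    for col in range(n):
        pivot = max(range(col, n), key=lambda r: abs(m[r][col]))
        m[col], m[pivot] = m[pivot], m[col]
        for r in range(col + 1, n):
            factor = m[r][col] / m[col][col]
            m[r] = [x - factor * y for x, y in zip(m[r], m[col])]
    x = [0.0] * n
    for i in range(n - 1, -1, -1):  # back substitution
        x[i] = (m[i][n] - sum(m[i][j] * x[j] for j in range(i + 1, n))) / m[i][i]
    return x


def fit_quadratic(xs, ys):
    """Least-squares fit of y = b0 + b1*u + b2*u**2 with u = x - mean(x).

    Centering x keeps the normal equations well conditioned.
    """
    center = sum(xs) / len(xs)
    basis = [[1.0, x - center, (x - center) ** 2] for x in xs]
    xtx = [[sum(row[i] * row[j] for row in basis) for j in range(3)] for i in range(3)]
    xty = [sum(row[i] * y for row, y in zip(basis, ys)) for i in range(3)]
    return center, solve(xtx, xty)


def stationary_point(center, coeffs):
    """x that maximizes the fitted surface; requires a concave fit (b2 < 0)."""
    _, b1, b2 = coeffs
    if b2 >= 0:
        raise ValueError("fitted surface has no maximum")
    return center - b1 / (2.0 * b2)


def surrogate(ph):
    """Hypothetical stand-in for a trained model's predicted NH4+ recovery (%)."""
    return 88.0 - 6.0 * (ph - 9.6) ** 2


design_ph = [8.0, 8.5, 9.0, 9.5, 10.0, 10.5, 11.0]
center, coeffs = fit_quadratic(design_ph, [surrogate(ph) for ph in design_ph])
best_ph = stationary_point(center, coeffs)
print(f"optimum pH = {best_ph:.2f}, predicted recovery = {surrogate(best_ph):.1f}%")
```

Because the trained model can be queried at any design point at no experimental cost, the response surface can be fitted to many more points than the 10–30 runs typical of laboratory RSM. The suggested optimum would still need experimental confirmation.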

In many rural areas of the world, improper disposal of organic waste from organic waste management facilities causes environmental pollution. Our models could help the managers of these facilities integrate nutrient recovery effectively. This could accelerate the reduction of nutrient pollution (SDGs 6 and 14) and help local communities earn revenue by selling struvite as fertilizer (SDGs 1 and 12).

Conclusion

This study investigated the efficiency of nutrient (PO₄³⁻ and NH₄⁺) recovery from organic waste digestate via struvite precipitation. Determining nutrient recovery efficiency for a given organic feedstock typically requires a series of laboratory experiments, which can be expensive and time-consuming. We therefore used statistical and ML models to predict the nutrient recovery efficiency of struvite precipitation. Across the scenarios tested, ML models provided better predictions than statistical models. The tree-based ML models (RF and XGBoost) gave the best predictions of PO₄³⁻ and NH₄⁺ recovery. Changing the sample size had no clear effect on model performance, because the two sample sizes associated with improved ML performance were widely different (n = 506 and n = 74). By contrast, the comparison of combinations indicated that the best input variables for predicting PO₄³⁻ and NH₄⁺ recovery (%) were pH, Mg:P and N:P molar ratios, mixing speed, reaction temperature, HRT, and the concentrations of sodium, potassium, calcium, magnesium, ammonium, and phosphate. More experimental data are needed to improve the models. Future studies should also develop ML models that include feedstock type as a predictor. The practical application of our models has three benefits. First, organic waste management facilities can use the models as a cost-effective tool to promote nutrient recovery in rural areas. Second, the scientific community can use the models to develop hybrid models for improved predictions. Third, the models can promote a circular economy and help address the SDGs by mitigating global climate change issues related to waste management.

Footnotes

Authors’ Contributions

H.A.: Conceptualization, data curation, formal analysis, investigation, methodology, validation, visualization, writing–original draft, and writing–review and editing. M.T.Z.: Investigation, methodology, software, and writing–review and editing. K.D.O.: Conceptualization, project administration, supervision, and writing–review and editing.

Author Disclosure Statement

No conflict of interest.

Funding Information

No funding to disclose.

Supplementary Material

References

1. Achilleos, Roberts, Williams. Struvite precipitation within wastewater treatment: A problem or a circular economy opportunity? Heliyon, 2022; 8(7):e09862; doi: 10.1016/j.heliyon.2022.e09862

2. Agarwal, Kumar, Ghosh, et al. Anaerobic digestion of sugarcane bagasse for biogas production and digestate valorization. Chemosphere, 2022; 295:133893; doi: 10.1016/j.chemosphere.2022.133893

3. Aghdam. Environmental Impact Assessment (LCA) and Techno-Economic Assessment (TEA) of struvite recovery in swine manure. M.Sc. North Carolina State University: United States – North Carolina; 2022.

4. Altuhaifa, Win, Su. Predicting lung cancer survival based on clinical data using machine learning: A review. Comput Biol Med, 2023; 165:107338; doi: 10.1016/j.compbiomed.2023.107338

5. Angenent, Usack, Sun, et al. Upgrading anaerobic digestion within the energy economy – the methane platform. In: Resource Recovery from Water. (Pikaar, Guest, Ganigué R, eds) IWA Publishing; 2022; pp. 141–158; doi: 10.2166/9781780409566_0141

6. Astals, Martínez-Martorell, Huete-Hernández, et al. Nitrogen recovery from pig slurry by struvite precipitation using a low-cost magnesium oxide. Sci Total Environ, 2021; 768:144284; doi: 10.1016/j.scitotenv.2020.144284

7. Avinash, Mishra. Enhancing biogas production in anaerobic digestion of MSW with addition of bio-solids and various moisture sources. Fuel, 2023; 354:129414; doi: 10.1016/j.fuel.2023.129414

8. Balaguer-Barbosa. Recovery of Nutrients from Anaerobically Digested Enhanced Biological Phosphorus Removal (EBPR) Sludge through Struvite Precipitation. M.S.E.V. University of South Florida: United States – Florida; 2018.

9. Barampouti, Mai, Malamis, et al. Exploring technological alternatives of nutrient recovery from digestate as a secondary resource. Renew Sustain Energy Rev, 2020; 134:110379; doi: 10.1016/j.rser.2020.110379

10. Battaz, Djazi, Allal, et al. Phosphorus recovery as struvite from wastewater by using seawater, brine and natural brine. Desalination Water Treat, 2024; 317:100082; doi: 10.1016/j.dwt.2024.100082

11. Bella, Rao. Anaerobic digestion of dairy wastewater: Effect of different parameters and co-digestion options—A review. Biomass Conv Bioref, 2023; 13(4):2527–2552; doi: 10.1007/s13399-020-01247-2

12. Campos, Crutchik, Franchi, et al. Nitrogen and phosphorus recovery from anaerobically pretreated agro-food wastes: A review. Front Sustain Food Syst, 2019; 2.

13. Çelen, Buchanan, Burns, et al. Using a chemical equilibrium model to predict amendments required to precipitate phosphorus as struvite in liquid swine manure. Water Res, 2007; 41(8):1689–1696; doi: 10.1016/j.watres.2007.01.018

14. Chan JY-L, Leow SMH, Bea, et al. Mitigating the multicollinearity problem and its machine learning approach: A review. Mathematics, 2022; 10(8):1283; doi: 10.3390/math10081283

15. Chen W-Y, Chan, Lim, et al. Artificial Neural Network (ANN) modelling for biogas production in pre-commercialized Integrated Anaerobic-Aerobic Bioreactors (IAAB). Water, 2022; 14(9):1410; doi: 10.3390/w14091410

16. Chen, McCoy. Missing values handling for machine learning portfolios. J Financ Econ, 2024; 155:103815; doi: 10.1016/j.jfineco.2024.103815

17. Corona, Hidalgo, Martín-Marroquín, et al. Study of the influence of the reaction parameters on nutrients recovering from digestate by struvite crystallisation. Environ Sci Pollut Res Int, 2021; 28(19):24362–24374; doi: 10.1007/s11356-020-08400-4

18. Ddiba, Andersson, Rosemarin, et al. The circular economy potential of urban organic waste streams in low- and middle-income countries. Environ Dev Sustain, 2022; 24(1):1116–1144; doi: 10.1007/s10668-021-01487-w

19. Di Capua, Spasiano, Giordano, et al. High-solid anaerobic digestion of sewage sludge: Challenges and opportunities. Appl Energy, 2020; 278:115608; doi: 10.1016/j.apenergy.2020.115608

20. Doyeni, Stulpinaite, Baksinskaite, et al. The effectiveness of digestate use for fertilization in an agricultural cropping system. Plants (Basel), 2021; 10(8):1734; doi: 10.3390/plants10081734

21. Ekstrand E-M, Björn, Karlsson, et al. Identifying targets for increased biogas production through chemical and organic matter characterization of digestate from full-scale biogas plants: What remains and why? Biotechnol Biofuels Bioprod, 2022; 15(1):16; doi: 10.1186/s13068-022-02103-3

22. Ghasemi, Naser. Tailoring 3D printed concrete through explainable artificial intelligence. Structures, 2023; 56:104850; doi: 10.1016/j.istruc.2023.07.040

23. Golovko, Ahrens, Schelin, et al. Organic micropollutants, heavy metals and pathogens in anaerobic digestate based on food waste. J Environ Manage, 2022; 313:114997; doi: 10.1016/j.jenvman.2022.114997

24. González-Morales, Camargo-Valero, Molina-Pérez, et al. Effect of the stirring speed on the struvite formation using the centrate from a WWTP. Rev Fac Ing Univ Antioquia, 2019(92):42–50.

25. Hakimi, Jegatheesan, Navaratna. The potential of adopting struvite precipitation as a strategy for the removal of nutrients from Pre-AnMBR treated abattoir wastewater. J Environ Manage, 2020; 259:109783; doi: 10.1016/j.jenvman.2019.109783

26. Hany, Atef, Mostafa, et al. Detection COVID-19 using machine learning from blood tests. In: 2021 International Mobile, Intelligent, and Ubiquitous Computing Conference (MIUCC), 2021; pp. 229–234; doi: 10.1109/MIUCC52538.2021.9447639

27. Hurairah, Halim, Aziz. Stabilized leachate treatment by using combination of struvite precipitation and coagulation-flocculation methods: RSM optimization. IOP Conf Ser: Earth Environ Sci, 2021; 646(1):12026; doi: 10.1088/1755-1315/646/1/012026

28. Jabr, Saidan, Al-Hmoud. Phosphorus recovery by struvite formation from Al Samra municipal wastewater treatment plant in Jordan. Desalination Water Treat, 2019; 146:315–325; doi: 10.5004/dwt.2019.23608

29. Jeong, Woo, Park. Machine learning methodology for management of shipbuilding master data. Int J Nav Archit Ocean Eng, 2020; 12:428–439; doi: 10.1016/j.ijnaoe.2020.03.005

30. Jha. A systematic study on student performance prediction from the perspective of machine learning and data mining approaches. In: 2023 8th International Conference on Communication and Electronics Systems (ICCES), 2023; pp. 1336–1342; doi: 10.1109/ICCES57224.2023.10192866

31. Jia, Zhang, Krampe, et al. Applying a chemical equilibrium model for optimizing struvite precipitation for ammonium recovery from anaerobic digester effluent. J Clean Prod, 2017; 147:297–305; doi: 10.1016/j.jclepro.2017.01.116

32. Kakade, Salama E-S, Han, et al. World eutrophic pollution of lake and river: Biotreatment potential and future perspectives. Environ Technol Innov, 2021; 23:101604; doi: 10.1016/j.eti.2021.101604

33. Kernbach, Staartjes. Foundations of machine learning-based clinical prediction modeling: Part II—generalization and overfitting. In: Machine Learning in Clinical Neuroscience. (Staartjes, Regli, Serra, eds). Acta Neurochirurgica Supplement. Springer International Publishing: Cham; 2022; pp. 15–21; doi: 10.1007/978-3-030-85292-4_3

34. Kogler, Farmer, Simon, et al. Systematic evaluation of emerging wastewater nutrient removal and recovery technologies to inform practice and advance resource efficiency. ACS Est Eng, 2021; 1(4):662–684; doi: 10.1021/acsestengg.0c00253

35. Korchef, Abouda, Souid. Optimizing struvite crystallization at high stirring rates. Crystals, 2023; 13(4):711; doi: 10.3390/cryst13040711

36. Kubar, Huang, Sajjad, et al. The recovery of phosphate and ammonium from biogas slurry as value-added fertilizer by biochar and struvite co-precipitation. Sustainability, 2021; 13(7):3827; doi: 10.3390/su13073827

37. Lavanya, Ramesh, Nandhini. Phosphate recovery from swine wastewater by struvite precipitation and process optimization using response surface methodology. Desalination Water Treat, 2019; 164:134–143; doi: 10.5004/dwt.2019.24447

38. Leng, Kang, Xu, et al. Machine-learning-aided prediction and optimization of struvite recovery from synthetic wastewater. J Water Process Eng, 2024; 58:104896; doi: 10.1016/j.jwpe.2024.104896

39. , Emaminejad, Aguiar, et al. Evaluating long-term treatment performance and cost of nutrient removal at water resource recovery facilities under stochastic influent characteristics using artificial neural networks as surrogates for plantwide modeling. ACS Est Eng, 2021; 1(11):1517–1529; doi: 10.1021/acsestengg.1c00179

40. Liao, Liu, Tian, et al. Phosphorous removal and high-purity struvite recovery from hydrolyzed urine with spontaneous electricity production in Mg-air fuel cell. Chem Eng J, 2020; 391:123517; doi: 10.1016/j.cej.2019.123517

41. Lorick, Macura, Ahlström, et al. Effectiveness of struvite precipitation and ammonia stripping for recovery of phosphorus and nitrogen from anaerobic digestate: A systematic review. Environ Evid, 2020; 9(1):27; doi: 10.1186/s13750-020-00211-x

42. Mayor, Vinardell, Ganesan, et al. Life-Cycle assessment and techno-economic evaluation of the value chain in nutrient recovery from wastewater treatment plants for agricultural application. Sci Total Environ, 2023; 892:164452; doi: 10.1016/j.scitotenv.2023.164452

43. McIntosh, Hunt, Thompson Brewster, et al. Struvite production from dairy processing waste. Sustainability, 2022; 14(23):15807; doi: 10.3390/su142315807

44. Mehrani M-J, Bagherzadeh, Zheng, et al. Application of a hybrid mechanistic/machine learning model for prediction of nitrous oxide (N2O) production in a nitrifying sequencing batch reactor. Process Saf Environ Prot, 2022; 162:1015–1024; doi: 10.1016/j.psep.2022.04.058

45. Möller, Müller. Effects of anaerobic digestion on digestate nutrient availability and crop growth: A review. Eng Life Sci, 2012; 12(3):242–257; doi: 10.1002/elsc.201100085

46. Mousavi, Goyette, Zhao, et al. Struvite-driven integration for enhanced nutrient recovery from chicken manure digestate. Bioengineering, 2024; 11(2):145; doi: 10.3390/bioengineering11020145

47. Moyo, Simate, Hobane, et al. Characterization, kinetics and thermodynamic evaluation of struvite produced using ferrochrome slag as a magnesium source. South Afr J Chem Eng, 2024; 47(1):83–90; doi: 10.1016/j.sajce.2023.10.012

48. Moyo, Simate, Mamvura, et al. Recovering phosphorus as struvite from anaerobic digestate of pig manure with ferrochrome slag as a magnesium source. Heliyon, 2023; 9(4):e15506; doi: 10.1016/j.heliyon.2023.e15506

49. Nageshwari, Senthamizhan, Balasubramanian. Sustaining struvite production from wastewater through machine learning based modelling and process validation. Sustain Energy Technol Assess, 2022; 53:102608; doi: 10.1016/j.seta.2022.102608

50. Nguyen, Ly H-B, Ho, et al. Influence of data splitting on performance of machine learning models in prediction of shear strength of soil. Math Probl Eng, 2021; 2021:e4832864–e15; doi: 10.1155/2021/4832864

51. Orner, Camacho-Céspedes, Cunningham, et al. Assessment of nutrient fluxes and recovery for a small-scale agricultural waste management system. J Environ Manage, 2020; 267:110626; doi: 10.1016/j.jenvman.2020.110626

52. Orner, Cornejo, Rojas Camacho, et al. Improving life cycle economic and environmental sustainability of animal manure management in marginalized farming communities through resource recovery. Environ Eng Sci, 2021a; 38(5):310–319; doi: 10.1089/ees.2020.0262

53. Orner, Smith, Breunig, et al. Fertilizer demand and potential supply through nutrient recovery from organic waste digestate in California. Water Res, 2021b; 206:117717; doi: 10.1016/j.watres.2021.117717

54. Otieno, Funani, Khune, et al. Struvite recovery from anaerobically digested waste-activated sludge: A short review. J Mater Res, 2023; 38(16):3815–3826; doi: 10.1557/s43578-023-01108-4

55. Pan, Zhang, Zicari. Integrated Processing Technologies for Food and Agricultural By-Products. Academic Press; 2019.

56. Pandey, Chen. Technologies to recover nitrogen from livestock manure—a review. Sci Total Environ, 2021; 784:147098; doi: 10.1016/j.scitotenv.2021.147098

57. Pavan, Arvind, Nikhil, et al. Predicting performance of in-situ microbial enhanced oil recovery process and screening of suitable microbe-nutrient combination from limited experimental data using physics informed machine learning approach. Bioresour Technol, 2022; 351:127023; doi: 10.1016/j.biortech.2022.127023

58. Pereira LMS, Milan, Tapia-Blácido. Using response surface methodology (RSM) to optimize 2G bioethanol production: A review. Biomass Bioenergy, 2021; 151:106166; doi: 10.1016/j.biombioe.2021.106166

59. Perwitasari, Fauziyah, Tola, et al. Phosphate removal from wastewater for the manufacture of struvite fertilizer. Tech Romanian J Appl Sci Technol, 2023; 16:330–333; doi: 10.47577/technium.v16i.10006

60. Prasanna Kumar, Mishra, Chinnam, et al. A comprehensive study on anaerobic digestion of organic solid waste: A review on configurations, operating parameters, techno-economic analysis and current trends. Biotechnol Notes, 2024; 5:33–49; doi: 10.1016/j.biotno.2024.02.001

61.

Quaghebeur

, Torfs

, De Baets

, et al. Hybrid differential equations: Integrating mechanistic and data-driven techniques for modelling of water systems. Water Res, 2022; 213:118166; doi: 10.1016/j.watres.2022.118166

62.

Rocamora

, Wagland

, Villa

, et al. Dry anaerobic digestion of organic waste: A review of operational parameters and their impact on process performance. Bioresour Technol, 2020; 299:122681; doi: 10.1016/j.biortech.2019.122681

63.

Rodrigues

, Florea

, de Oliveira

MCF

, et al. Big data and machine learning for materials science. Discov Mater, 2021; 1(1):12; doi: 10.1007/s43939-021-00012-0

64.

Saadabadi

S A

, Patel

, Woudstra

, et al. Thermodynamic analysis of solid oxide fuel cell integrated system fuelled by ammonia from struvite precipitation process. Fuel Cells, 2020; 20(2):143–157; doi: 10.1002/fuce.201900143

65.

Sampat

, Zavala

, Ruiz-Mercado

. Hidden Economic Impacts of Harmful Algae Blooms. NIH: Orlando, FL; 2019.

66.

Sganzerla

, da Rosa

, Barroso

TLCT

, et al. Techno-economic assessment of on-site production of biomethane, bioenergy, and fertilizer from small-scale anaerobic digestion of jabuticaba by-product. Methane, 2023; 2(2):113–128; doi: 10.3390/methane2020009

67.

Shaddel

, Grini

, Ucar

, et al. Struvite crystallization by using raw seawater: Improving economics and environmental footprint while maintaining phosphorus recovery and product quality. Water Res, 2020; 173:115572; doi: 10.1016/j.watres.2020.115572

68.

Shyu

H-Y

, Castro

, Bair

, et al. Development of a soft sensor using machine learning algorithms for predicting the water quality of an onsite wastewater treatment system. ACS Environ Au, 2023; 3(5):308–318; doi: 10.1021/acsenvironau.2c00072

69. Siciliano, Limonti, Curcio, et al. Advances in struvite precipitation technologies for nutrients removal and recovery from aqueous waste and wastewater. Sustainability, 2020; 12(18):7538; doi: 10.3390/su12187538

70. Singh, Singh. Investigating the impact of data normalization on classification performance. Appl Soft Comput, 2020; 97:105524; doi: 10.1016/j.asoc.2019.105524

71. Soo, Wang, et al. Machine learning for nutrient recovery in the smart city circular economy—A review. Process Saf Environ Prot, 2023; 173:529–557; doi: 10.1016/j.psep.2023.02.065

72. Styles, Adams, Thelin, et al. Life cycle assessment of biofertilizer production and use compared with conventional liquid digestate management. Environ Sci Technol, 2018; 52(13):7468–7476; doi: 10.1021/acs.est.8b01619

73. Tariq, Mehmood, Abbas, et al. Digestate quality and biogas enhancement with laterite mineral and biochar: Performance and mechanism in anaerobic digestion. Renew Energy, 2024; 220:119703; doi: 10.1016/j.renene.2023.119703

74. Tsui T-H, van Loosdrecht MCM, Dai, et al. Machine learning and circular bioeconomy: Building new resource efficiency from diverse waste streams. Bioresour Technol, 2023; 369:128445; doi: 10.1016/j.biortech.2022.128445

75. United Nations. The Sustainable Development Goals Report 2022. 2022.

76. Vaneeckhaute, Meers, Belia, et al. Modeling and optimization of nutrient recovery from wastes. In: Biorefinery of Inorganics. John Wiley & Sons, Ltd; 2020; pp. 381–404; doi: 10.1002/9781118921487.ch8-1

77. Wang, Bouwman, Van Gils, et al. Hindcasting harmful algal bloom risk due to land-based nutrient pollution in the Eastern Chinese coastal seas. Water Res, 2023a; 231:119669; doi: 10.1016/j.watres.2023.119669

78. Wang, Hu, Wang, et al. A critical review on dry anaerobic digestion of organic waste: Characteristics, operational conditions, and improvement strategies. Renew Sustain Energy Rev, 2023b; 176:113208; doi: 10.1016/j.rser.2023.113208

79. Wang, Mou, Liu, et al. Phosphorus recovery from wastewater by struvite in response to initial nutrients concentration and nitrogen/phosphorus molar ratio. Sci Total Environ, 2021; 789:147970; doi: 10.1016/j.scitotenv.2021.147970

80. Wang, Ng. Robustness of resource recovery systems under feedstock uncertainty. Prod Oper Manag, 2019; 28(3):628–649; doi: 10.1111/poms.12944

81. Rodlia, Ikhlas, Pandebesie, et al. The effect of mixing rate on struvite recovery from the fertilizer industry. IOP Conf Ser Earth Environ Sci, 2020; 506(1):012013; doi: 10.1088/1755-1315/506/1/012013

82. Weckerle, Ewald, Guth, et al. Biogas digestate as a sustainable phytosterol source for biotechnological cascade valorization. Microb Biotechnol, 2023; 16(2):337–349; doi: 10.1111/1751-7915.14174

83. Wei, Yu, Tian, et al. Comparative performance of three machine learning models in predicting influent flow rates and nutrient loads at wastewater treatment plants. ACS ES&T Water, 2024; 4(3):1024–1035; doi: 10.1021/acsestwater.3c00155

84. , Vaneeckhaute. Nutrient recovery from wastewater: A review on the integrated physicochemical technologies of ammonia stripping, adsorption and struvite precipitation. Chem Eng J, 2022; 433:133664; doi: 10.1016/j.cej.2021.133664

85. , Li, Ge, et al. Anaerobic digestion of food waste—challenges and opportunities. Bioresour Technol, 2018; 247:1047–1058; doi: 10.1016/j.biortech.2017.09.020

86. , Pooi, Yeap, et al. Hybrid model composed of machine learning and ASM3 predicts performance of industrial wastewater treatment. J Water Process Eng, 2024; 65:105888; doi: 10.1016/j.jwpe.2024.105888

87. , Ngo, Guo, et al. A critical review on ammonium recovery from wastewater for sustainable wastewater management. Bioresour Technol, 2018; 268:749–758; doi: 10.1016/j.biortech.2018.07.111

88. Yeom, Giacomelli, Fredrikson, et al. Privacy risk in machine learning: Analyzing the connection to overfitting. In: 2018 IEEE 31st Computer Security Foundations Symposium (CSF), 2018; pp. 268–282; doi: 10.1109/CSF.2018.00027

89. Zaki, Rowles, Adjeroh, et al. A critical review of data science applications in resource recovery and carbon capture from organic waste. ACS ES&T Eng, 2023; 3(10):1424–1467; doi: 10.1021/acsestengg.3c00043

90. Zhang, Liu C-A. Model averaging prediction by K-fold cross-validation. J Econom, 2023; 235(1):280–301; doi: 10.1016/j.jeconom.2022.04.007

91. Zhang, Zhang, Wang, et al. Simultaneous recovery of carbon, nitrogen, and phosphorous from waste activated sludge: Influence of pH in anaerobic fermentation. Environ Eng Sci, 2024; 41(6):243–250; doi: 10.1089/ees.2023.0304

92. Zhou, Wu. Improving the prediction of ammonium nitrogen removal through struvite precipitation. Environ Sci Pollut Res Int, 2012; 19(2):347–360; doi: 10.1007/s11356-011-0520-6

93. Zhu, Qiang, Guo, et al. Sequential extraction of anaerobic digestate sludge for the determination of partitioning of heavy metals. Ecotoxicol Environ Saf, 2014; 102:18–24; doi: 10.1016/j.ecoenv.2013.12.033

Supplementary Material

For Open Access articles published under a Creative Commons License, all supplemental material carries the same license as the article it is associated with.

For non-Open Access articles published, all supplemental material carries a non-exclusive license, and permission requests for re-use of supplemental material or any part of supplemental material shall be sent directly to the copyright owner as specified in the copyright notice associated with the article.

Predictions of PO₄³⁻ and NH₄⁺ recovery (%)