Comparative forecasting of water quality index using long short-term memory and extreme gradient boosting

dc.contributor.author Dissanayake, H.K.H.K.
dc.contributor.author Punchi-Manage, S.A.R.
dc.date.accessioned 2025-11-05T16:52:14Z
dc.date.available 2025-11-05T16:52:14Z
dc.date.issued 2025-11-07
dc.description.abstract Monitoring and forecasting river water quality is critical for sustainable water resource management, particularly in densely populated regions. This study evaluates and compares the performance of deep learning and machine learning models in forecasting the Weighted Arithmetic Water Quality Index (WAWQI) across 20 monitoring stations along the River Thames. Physicochemical parameters collected from 2009 to 2017, including pH, temperature, total suspended solids (TSS) and nitrates, were used to compute WAWQI. Missing values were approximated by linear interpolation. Long Short-Term Memory (LSTM) neural networks were trained on eight-month input sequences to capture temporal dependencies, while Extreme Gradient Boosting (XGBoost) models used lagged WAWQI values (lags 1–3) and time-based features. Both models were developed independently for each site and evaluated using an 80:20 train-test split. For benchmarking, classical time-series models such as Seasonal Auto-Regressive Integrated Moving Average (SARIMA) and Prophet were also applied. Model performance was evaluated using Root Mean Squared Error (RMSE), Mean Absolute Error (MAE) and Mean Absolute Percentage Error (MAPE). Both LSTM and XGBoost outperformed the classical time-series models. LSTM outperformed XGBoost at most sites, achieving lower RMSE at 12 sites and lower MAE and MAPE at 13 sites, which highlights its strength in capturing seasonal and temporal trends. Although the MAPE improvement was relatively small (at most 1.0%), it was consistent across sites. LSTM performance declined at sites with irregular or noisy data. At those sites, XGBoost provided more robust predictions by modelling non-linear relationships and adapting to varying patterns. In addition, LSTM required longer training times, whereas XGBoost was considerably faster and more efficient for frequent retraining.
These findings emphasise the importance of tailoring model selection to site-specific characteristics. Combining temporal deep learning approaches with tree-based methods can enhance the reliability and scalability of water quality forecasting, supporting informed decision-making in environmental monitoring and water resource management.
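The abstract states that WAWQI was computed from pH, temperature, TSS and nitrates after missing values were filled by linear interpolation. It does not give the formula or the standards used. As a hedged illustration, the sketch below uses the commonly cited weighted arithmetic form, WQI = Σ(W_i·Q_i) / ΣW_i, where W_i = K/S_i, K = 1/Σ(1/S_i) and Q_i = 100·(V_i − V_io)/(S_i − V_io). The permissible standards S_i and ideal values V_io in `STANDARDS` and `IDEALS` are hypothetical placeholders, not the study's values. The helper names `interpolate_linear` and `wawqi` are likewise illustrative.

```python
from typing import Optional

# HYPOTHETICAL permissible standards S_i and ideal values V_io.
# Illustrative placeholders only; the study's actual values are not given.
STANDARDS = {"pH": 8.5, "temperature": 25.0, "TSS": 25.0, "nitrate": 50.0}
IDEALS = {"pH": 7.0, "temperature": 0.0, "TSS": 0.0, "nitrate": 0.0}


def interpolate_linear(values: list[Optional[float]]) -> list[Optional[float]]:
    """Fill interior gaps (None) with straight-line interpolation between the
    nearest observed neighbours. Leading and trailing gaps are left as None."""
    out = list(values)
    known = [i for i, v in enumerate(values) if v is not None]
    for a, b in zip(known, known[1:]):
        step = (values[b] - values[a]) / (b - a)
        for i in range(a + 1, b):
            out[i] = values[a] + step * (i - a)
    return out


def wawqi(sample: dict[str, float]) -> float:
    """Weighted arithmetic WQI for one sample: sum(W_i * Q_i) / sum(W_i)."""
    k = 1.0 / sum(1.0 / STANDARDS[p] for p in sample)  # proportionality constant K
    num = den = 0.0
    for p, v in sample.items():
        w = k / STANDARDS[p]  # unit weight W_i = K / S_i
        q = 100.0 * (v - IDEALS[p]) / (STANDARDS[p] - IDEALS[p])  # quality rating Q_i
        num += w * q
        den += w
    return num / den
```

Under these placeholder standards, a sample sitting exactly at every standard scores 100 and a sample at every ideal value scores 0. In the WAWQI convention, higher scores indicate poorer water quality.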
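The abstract describes two ways of turning each site's WAWQI series into training samples. The LSTM uses eight-month input sequences, while XGBoost uses lagged values (lags 1–3) plus time-based features, and both are evaluated with an 80:20 split. The sketch below shows one way to build these samples. It rests on several assumptions that the abstract does not confirm: monthly observations, a chronological (unshuffled) split, and calendar month and year as the time-based features. The function names are hypothetical.

```python
from datetime import date


def make_windows(series: list[float], window: int = 8):
    """LSTM-style samples: `window` consecutive values -> the next value."""
    n = len(series) - window
    X = [series[i:i + window] for i in range(n)]
    y = [series[i + window] for i in range(n)]
    return X, y


def make_lag_features(series: list[float], dates: list[date], lags=(1, 2, 3)):
    """Tabular rows for a tree model: lagged values plus calendar features.
    ASSUMPTION: month and year stand in for the unspecified time-based features."""
    rows, targets = [], []
    for t in range(max(lags), len(series)):
        row = {f"lag_{k}": series[t - k] for k in lags}
        row["month"] = dates[t].month
        row["year"] = dates[t].year
        rows.append(row)
        targets.append(series[t])
    return rows, targets


def chronological_split(X, y, train_frac: float = 0.8):
    """ASSUMPTION: first 80% of samples train, last 20% test, no shuffling."""
    cut = int(len(X) * train_frac)
    return X[:cut], X[cut:], y[:cut], y[cut:]
```

To fit the models, the windows would typically be reshaped to (samples, 8, 1) for a Keras LSTM layer, and the lag rows converted to a numeric feature matrix for `xgboost.XGBRegressor`. Both of these are assumptions about tooling, not details reported in the study.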
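The model comparison in the abstract rests on RMSE, MAE and MAPE computed on each site's test set. For reference, the sketch below implements the standard definitions of these metrics in plain Python. MAPE is expressed in percent. It is undefined when an observed value is zero, so this version raises an error in that case. The function names are illustrative.

```python
import math


def rmse(actual: list[float], predicted: list[float]) -> float:
    """Root mean squared error."""
    return math.sqrt(sum((a - p) ** 2 for a, p in zip(actual, predicted)) / len(actual))


def mae(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute error."""
    return sum(abs(a - p) for a, p in zip(actual, predicted)) / len(actual)


def mape(actual: list[float], predicted: list[float]) -> float:
    """Mean absolute percentage error, in percent."""
    if any(a == 0 for a in actual):
        raise ValueError("MAPE is undefined when an actual value is zero")
    return 100.0 * sum(abs((a - p) / a) for a, p in zip(actual, predicted)) / len(actual)
```

Per-site tallies such as "lower RMSE at 12 sites" would presumably be obtained by computing these metrics for both models at each site and counting where each model scores lower.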
dc.identifier.citation Proceedings of the Postgraduate Institute of Science Research Congress (RESCON)-2025, University of Peradeniya, p. 89
dc.identifier.issn 3051-4622
dc.identifier.uri https://ir.lib.pdn.ac.lk/handle/20.500.14444/5979
dc.language.iso en
dc.publisher Postgraduate Institute of Science (PGIS), University of Peradeniya, Sri Lanka
dc.relation.ispartofseries Volume 12
dc.subject Environmental monitoring
dc.subject Extreme gradient boosting
dc.subject Long short-term memory
dc.subject Time-series forecasting
dc.subject Water quality index
dc.title Comparative forecasting of water quality index using long short-term memory and extreme gradient boosting
dc.type Article

Files

Original bundle

Name: 18 RESCON 2025 CMS-41.pdf
Size: 288.65 KB
Format: Adobe Portable Document Format