A long short-term memory approach to predicting air quality based on social media data
Published in Atmospheric Environment, 2020
Recommended citation: Zhai Weixin, Cheng Chengqi. A long short-term memory approach to predicting air quality based on social media data [J]. Atmospheric Environment, 2020: 117411.
Abstract: Air pollution, such as PM2.5 (particulate matter with an aerodynamic equivalent diameter of less than 2.5 μm), PM10 (particulate matter with an aerodynamic equivalent diameter of less than 10 μm), NOx and SOx, is a global concern because it may cause many chronic and fatal diseases, especially in developing countries. An important step in addressing air pollution problems is the timely and accurate prediction of air quality. Traditional prediction methods rely mainly on meteorological data, regression models, remote sensing data and various retrieval methods. Numerous studies have suggested that deep learning methods may be able to make accurate predictions for complex systems. In this paper, we propose a long short-term memory (LSTM) approach to predicting air quality that uses meteorological data and investigates Chinese social media as a proxy for public perceptions of and responses to air quality. We gathered daily air quality data, meteorological data and Weibo check-in data for Beijing, China from January 1, 2015 to December 31, 2016. The average sentiment of the related Weibo posts was selected as the public response proxy. The proposed model was evaluated on these real-world data. In terms of the root-mean-square error (RMSE) and the mean absolute error (MAE), our method produced better predictions of the PM2.5, PM10, O3, NO2, SO2 and CO concentrations than traditional methods. We focused on prediction performance during the 2015 China Victory Day Parade period, when social and political factors played an important role in air quality prediction. The results indicate that the proposed method, which incorporates public response data, is especially suitable for predicting air quality during extreme short-term social events and provides timely social measurement of, and feedback on, environmental problems.
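The abstract does not specify the network configuration, so the following is only a minimal plain-Python sketch of the generic ingredients it names. It averages per-day Weibo sentiment scores into a public-response feature, steps a standard LSTM cell over a sequence of daily feature vectors, and scores predictions with RMSE = sqrt((1/n) Σ (y_i − ŷ_i)²) and MAE = (1/n) Σ |y_i − ŷ_i|. The post format `(date, score)`, the helper names (`daily_mean_sentiment`, `lstm_step`, `rmse`, `mae`), the gate parameter layout, the feature order and the linear readout are all hypothetical illustrations, not the authors' model or data.

```python
import math
from collections import defaultdict


def daily_mean_sentiment(posts):
    """Average sentiment per day. Posts are (date_str, score) pairs (hypothetical format)."""
    totals = defaultdict(float)
    counts = defaultdict(int)
    for day, score in posts:
        totals[day] += score
        counts[day] += 1
    return {day: totals[day] / counts[day] for day in totals}


def sigmoid(z):
    return 1.0 / (1.0 + math.exp(-z))


def matvec(w, x):
    """Multiply a matrix (list of rows) by a vector (list)."""
    return [sum(wij * xj for wij, xj in zip(row, x)) for row in w]


def lstm_step(x, h, c, params):
    """One step of a standard LSTM cell (generic formulation, not the paper's exact setup).

    params maps gate name ("i", "f", "o", "g") -> (W, U, b), where
    W is hidden x input, U is hidden x hidden and b has length hidden.
    """
    def gate(name, act):
        w, u, b = params[name]
        pre = [a + r + bk for a, r, bk in zip(matvec(w, x), matvec(u, h), b)]
        return [act(p) for p in pre]

    i = gate("i", sigmoid)    # input gate
    f = gate("f", sigmoid)    # forget gate
    o = gate("o", sigmoid)    # output gate
    g = gate("g", math.tanh)  # candidate cell state
    c_new = [fk * ck + ik * gk for fk, ck, ik, gk in zip(f, c, i, g)]
    h_new = [ok * math.tanh(ck) for ok, ck in zip(o, c_new)]
    return h_new, c_new


def rmse(y_true, y_pred):
    """Root-mean-square error."""
    return math.sqrt(sum((t - p) ** 2 for t, p in zip(y_true, y_pred)) / len(y_true))


def mae(y_true, y_pred):
    """Mean absolute error."""
    return sum(abs(t - p) for t, p in zip(y_true, y_pred)) / len(y_true)


if __name__ == "__main__":
    # Made-up posts and features for illustration only (not data from the paper).
    posts = [("2015-09-01", 0.2), ("2015-09-01", -0.4), ("2015-09-02", 0.5)]
    sentiment = daily_mean_sentiment(posts)

    # Hypothetical daily feature vectors: [PM2.5, temperature, humidity, mean sentiment],
    # assumed to be already normalized.
    seq = [[0.8, 0.3, 0.6, sentiment["2015-09-01"]],
           [0.5, 0.4, 0.5, sentiment["2015-09-02"]]]
    hidden, n_in = 2, 4
    params = {name: ([[0.1] * n_in for _ in range(hidden)],
                     [[0.1] * hidden for _ in range(hidden)],
                     [0.0] * hidden)
              for name in "ifog"}
    h, c = [0.0] * hidden, [0.0] * hidden
    for x in seq:
        h, c = lstm_step(x, h, c, params)

    # Hypothetical linear readout from the final hidden state to a next-day value.
    pred = sum(0.5 * hk for hk in h)
    print("toy next-day prediction:", pred)
    print("RMSE:", rmse([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
    print("MAE:", mae([1.0, 2.0, 3.0], [1.0, 2.0, 5.0]))
```

In practice the weights would be learned by backpropagation through time in a deep learning framework rather than set by hand. The hand-written cell above only makes the gating arithmetic and the two error metrics explicit.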