Urban protection data and inference
We designed a questionnaire to collect data on the perceptions and protective behaviors of different regions and groups regarding air pollution. After strict quality control (including deleting samples with obvious logical errors, missing data, or inconsistent addresses), we retained 1072 valid questionnaires (see Supplementary Figs. 2–6 for initial statistics on some important questionnaire indicators). This study was approved by the Ethics Committee of the Beijing Institute of Technology (No. 221103). All procedures performed in this study were in accordance with the 1964 Helsinki declaration and its later amendments or comparable ethical standards. Participants were allowed to fill in the questionnaire only after they understood the purpose of the survey and agreed to the publication of the research results. Online informed consent was obtained from all participants.
The settings of the core variables are as follows:

ATTR_{i}: The attention ratio (ATTR_{i}) is the proportion of people in group i (defined, for example, by region, gender, or age) who pay attention to air pollution information. It is computed over all samples in the survey questionnaire. Each respondent was asked how often they pay attention to air pollution in daily life. The question has five options, from lowest to highest frequency: almost never, occasionally, generally, often, and almost every day. Respondents who answered often or above are coded as 1, and all others as 0. Those coded as 1 are considered to pay attention to air pollution information. Aggregating by group then gives the proportion of people in each group who pay attention to air pollution information.

MR_{i}, CODR_{i}, and ACR_{i}: These three variables record whether respondents wear masks or cancel going out in air-polluted weather (outside the pandemic period), and whether they have air purification equipment in their workplace and residence. Each answer is coded as 1 for “Yes” and 0 otherwise. Like ATTR_{i}, these variables are used as ratios after group aggregation: the rate of group i wearing masks (MR_{i}), canceling going out (CODR_{i}), and having indoor air purification equipment (ACR_{i}, the average of the rates for workplaces and residences).

ODR_{i}: The outdoor activity ratio is based on the average daily hours that individuals spend outdoors during the non-epidemic period, from which the proportion of time spent outdoors (ODR_{i}) is calculated for group i.
To extrapolate the questionnaire results to all prefecture-level cities, we introduced a transfer learning method into our work (see Supplementary Material S3). The idea of transfer learning is to exploit similarities in data, task type, or models so that models and knowledge learned in an old domain can be applied to a new one. For the problem and data in this paper, the required predictions are calculated as follows:
Step 1 Align the provincial statistical feature data (source domain) with the urban feature data (target domain) using the CORAL algorithm^{42}:
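The coding and group aggregation described for the attention ratio can be sketched as follows. This is a minimal illustration, not the authors' processing script: the option labels, the function name `attention_ratio`, and the toy sample are all assumptions made for the example.

```python
from collections import defaultdict

# Five answer options, ordered from lowest to highest frequency.
# The exact wording of the labels is an assumption for illustration.
FREQUENCY_LEVELS = ["almost never", "occasionally", "generally", "often", "almost every day"]
ATTENTIVE_FROM = FREQUENCY_LEVELS.index("often")  # "often" or above is coded as 1


def attention_ratio(responses):
    """Return {group: share of respondents coded 1} from (group, answer) pairs."""
    counts = defaultdict(lambda: [0, 0])  # group -> [attentive, total]
    for group, answer in responses:
        coded = 1 if FREQUENCY_LEVELS.index(answer) >= ATTENTIVE_FROM else 0
        counts[group][0] += coded
        counts[group][1] += 1
    return {group: attentive / total for group, (attentive, total) in counts.items()}


# Hypothetical toy sample of (group label, answer) pairs.
sample = [
    ("female", "often"),
    ("female", "occasionally"),
    ("male", "almost every day"),
    ("male", "often"),
    ("male", "often"),
    ("male", "almost never"),
]
ratios = attention_ratio(sample)
```

The same aggregation applies to the binary mask, cancel-outing, and purifier answers, since each is already coded 0 or 1 before averaging within a group.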
$$\begin{array}{c}D_{\mathrm{s}}=\left[F_{s1}^{T},F_{s2}^{T}\dots F_{sm}^{T}\right]_{m\times n}\end{array}$$
(1)
$$\begin{array}{c}D_{\mathrm{t}}=\left[F_{t1}^{T},F_{t2}^{T}\dots F_{tm}^{T}\right]_{m\times k}\end{array}$$
(2)
$$\begin{array}{c}C_{s}=\Sigma_{s}+eye\left(m\right)\end{array}$$
(3)
$$\begin{array}{c}C_{t}=\Sigma_{t}+eye\left(m\right)\end{array}$$
(4)
$$\begin{array}{c}D_{\mathrm{s}}^{new}=D_{s}*C_{s}^{-\frac{1}{2}}*C_{t}^{\frac{1}{2}}\end{array}$$
(5)
Equations (1) and (2) represent the feature datasets of the source domain and target domain, respectively; \(F_{m}^{T}\) is the m^{th} feature of the dataset. In Eqs. (3) and (4), \(\Sigma_{s}\) and \(\Sigma_{t}\) are the feature covariance matrices of the two domains and \(eye(m)\) is the identity matrix of size m. The source domain features are provincial statistical data from the China Statistical Yearbook 2020^{43}, and the target domain features are urban statistical data from the China Urban Statistical Yearbook 2020^{44}. Both domains share the same set of statistical indicators: 18 indicators in fields such as economics, environment, education, and population structure. As these indicators differ greatly in magnitude between the city level and the provincial level, we divide each indicator by the total population of the region to obtain per capita values, so that the feature scales of the source domain and target domain are the same.
Step 2 Train a supervised machine learning model on the transformed source domain data, then use the trained model to predict the city-level variables.
The model architecture is shown in Fig. 2. \(D_{\mathrm{s}}^{new}\) is the input feature data, and the five variables defined above are the prediction targets of five separate training tasks. We selected four machine learning models as candidates: random forest, Lasso regression, Ridge regression, and support vector machine. These models are simple and efficient in structure, and their easy-to-use regularization techniques limit overfitting. During training, grid search is used to automatically select the best hyperparameters for each task’s model, and five-fold cross-validation is used to verify the accuracy of each model. We then select the best-performing model for each task and finally predict the corresponding variables of each city from the city-level dataset (\(D_{\mathrm{t}}\)).
The cross-validation and test results support the validity and accuracy of our model (see Supplementary Material S5 and Table 1). For the age, gender, and urban and rural groups, we used the full original questionnaire sample to calculate the variables of each group (see Supplementary Material S5 and Table 2).
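The hyperparameter selection in Step 2 can be illustrated with a small sketch. To stay self-contained it uses only one of the four candidate models, ridge regression in closed form, rather than the full candidate set. The function names, the penalty grid, and the synthetic data are assumptions for illustration and do not reproduce the study's configuration.

```python
import numpy as np


def fit_ridge(x, y, alpha):
    """Closed-form ridge regression with an unpenalised intercept."""
    x_mean, y_mean = x.mean(axis=0), y.mean()
    xc, yc = x - x_mean, y - y_mean
    coef = np.linalg.solve(xc.T @ xc + alpha * np.eye(x.shape[1]), xc.T @ yc)
    return coef, y_mean - x_mean @ coef


def cv_mse(x, y, alpha, n_folds=5, seed=0):
    """Mean squared error of ridge regression under k-fold cross-validation."""
    order = np.random.default_rng(seed).permutation(len(y))
    errors = []
    for fold in np.array_split(order, n_folds):
        train = np.setdiff1d(order, fold)  # all indices not in the held-out fold
        coef, intercept = fit_ridge(x[train], y[train], alpha)
        pred = x[fold] @ coef + intercept
        errors.append(np.mean((y[fold] - pred) ** 2))
    return float(np.mean(errors))


def grid_search(x, y, alphas):
    """Return (best alpha, CV error per alpha) over a grid of penalty strengths."""
    scores = {alpha: cv_mse(x, y, alpha) for alpha in alphas}
    return min(scores, key=scores.get), scores


# Synthetic stand-in for aligned source features and one target variable.
rng = np.random.default_rng(1)
features = rng.normal(size=(60, 5))
target = features @ np.array([0.5, -0.2, 0.0, 0.3, 0.0]) + rng.normal(scale=0.1, size=60)
best_alpha, scores = grid_search(features, target, alphas=[0.01, 0.1, 1.0, 10.0, 1000.0])
```

Repeating the search for each candidate model and each of the five targets, then keeping the model with the lowest cross-validated error per target, gives the per-task selection described above.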
Calculation of equivalent PM_{2.5}
This research refers to the integrated population weighted exposure (IPWE) model created by Shen et al.^{45} and enhances it accordingly. The IPWE model distinguishes between household air pollution (HAP) and outside ambient air pollution (AAP) and incorporates people’s activity patterns into the model. We added outdoor PM_{2.5} permeability and people’s protective behavior led by risk information to the model (see Supplementary Material S4) and developed the IBEPEM to assess people’s real PM_{2.5} exposure concentration.
Equation (6) expresses the IBEPEM model based on the previous assumptions. The urban attention ratio and the protective behavior ratios are obtained from the prediction results of Section “Urban protection data and inference”, and each follows an \(N(\mu_{i}, \theta^{2})\) distribution, where \(\mu_{i}\) is the predicted value of the indicator for city i and \(\theta\) is the indicator’s standard deviation. \(\mathrm{pm}_{i,t}\) represents the average concentration of PM_{2.5} in city i on day t. It is derived from the data of over 2000 surface air quality monitoring sites operated by China’s Ministry of Ecology and Environment^{46}. The air quality index for city i on day t is denoted by \(AQI_{i,t}\). \(IEPE_{i}\) is the annual equivalent comprehensive PM_{2.5} exposure value for city i. \(Threshold\) is the AQI value at which the air quality level of “lightly polluted” is reached. \(DM\) is the mask’s protective effect, that is, the PM_{2.5} attenuation rate after filtering by the mask. The protective effect conforms to the Chinese government-issued group standard F9053 for “PM_{2.5} protective masks”^{47}. Following Xiang et al.^{48}, \(DH_{i}\) represents the protective effect of buildings in various areas, that is, the attenuation rate of outdoor PM_{2.5} when it penetrates indoors. \(DAC\) is the purification efficiency of air purification equipment, that is, the attenuation rate of the PM_{2.5} concentration after the equipment has cleaned the indoor air. These values are derived from existing measured data^{49,50,51,52}, and we take the mean of these studies as the decay rate value. To account for uncertainty, we assume that every decay rate follows a normal distribution whose mean is its survey or reference value (see Supplementary Material S2 for the corresponding variance settings).
$$\left\{\begin{array}{l}ODR_{i}=ODR_{i}*\left(1-CODR_{i}*ATTR_{i}\right)\\ MR_{i}=MR_{i}*ATTR_{i}\\ ACR_{i}=ACR_{i}*ATTR_{i}\\ IEPE_{AAP,i}=\frac{1}{T}\left\{\begin{array}{l}\sum_{t=1}^{T}\mathrm{pm}_{i,t}*ODR_{i}*\left(MR_{i}*DM+1-MR_{i}\right),\ if\ AQI_{i,t}>Threshold\\ \sum_{t=1}^{T}\mathrm{pm}_{i,t}*ODR_{i},\ else\end{array}\right.\\ IEPE_{HAP,i}=\frac{1}{T}\left\{\begin{array}{l}\sum_{t=1}^{T}\mathrm{pm}_{i,t}*\left(1-ODR_{i}\right)*DH_{i}*\left(ACR_{i}*DAC+1-ACR_{i}\right),\ if\ AQI_{i,t}>Threshold\\ \sum_{t=1}^{T}\mathrm{pm}_{i,t}*\left(1-ODR_{i}\right)*DH_{i},\ else\end{array}\right.\\ IEPE_{i}=IEPE_{AAP,i}+IEPE_{HAP,i}.\end{array}\right.$$
(6)
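A minimal sketch of the exposure calculation for a single city is given below. It reads the equation literally: the attention-adjusted ratios are computed once and used on both polluted and non-polluted days, and DM, DH, and DAC are treated as the fraction of PM_{2.5} that remains after the mask, the building envelope, and the purifier, respectively. The function name, argument names, and all numeric inputs are hypothetical.

```python
def equivalent_exposure(pm, aqi, odr, codr, attr, mr, acr, dm, dh, dac, threshold=100):
    """Annual equivalent PM2.5 exposure for one city.

    pm and aqi are equal-length daily series; the other arguments are scalars.
    Returns (outdoor part, indoor part, total).
    """
    # Protective behaviour applies only to the share of people who follow the information.
    odr_eff = odr * (1 - codr * attr)
    mr_eff = mr * attr
    acr_eff = acr * attr

    outdoor = indoor = 0.0
    for pm_t, aqi_t in zip(pm, aqi):
        if aqi_t > threshold:
            # Polluted day: masks outdoors and purifiers indoors reduce exposure.
            outdoor += pm_t * odr_eff * (mr_eff * dm + 1 - mr_eff)
            indoor += pm_t * (1 - odr_eff) * dh * (acr_eff * dac + 1 - acr_eff)
        else:
            outdoor += pm_t * odr_eff
            indoor += pm_t * (1 - odr_eff) * dh
    days = len(pm)
    return outdoor / days, indoor / days, (outdoor + indoor) / days


# Hypothetical two-day series with one polluted day.
settings = dict(odr=0.2, codr=0.5, mr=0.4, acr=0.4, dm=0.1, dh=0.5, dac=0.2)
protected = equivalent_exposure([100, 50], [150, 80], attr=0.5, **settings)
unprotected = equivalent_exposure([100, 50], [150, 80], attr=0.0, **settings)
```

Setting `attr` to zero switches off every information-driven behaviour, so comparing the two results isolates the exposure reduction attributable to attention in this toy case.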
Table 2 displays the settings of several indicators for scenarios S0–S5. “Yes” indicates that the actual value of the indicator is maintained. The values 0 and 1 denote that the indicator is fixed at that value. “No” indicates that the indicator is not considered. According to our survey results, residents generally refer to the overall air quality level rather than to the AQI value of PM_{2.5} alone, and are only likely to take protective measures when the air pollution level reaches “lightly polluted” (AQI > 100) or above. Both China and the United States take the highest AQI value among all pollutants at each moment as the current overall AQI value and designate the corresponding pollutant as the primary pollutant^{41,53}. According to the overall AQI value, air quality is divided into six levels: excellent, good, lightly polluted, moderately polluted, heavily polluted, and severely polluted. The difference is that when the PM_{2.5} concentration is below 150 μg/m^{3} and PM_{2.5} is the primary pollutant, China’s AQI value may be lower than that of the United States (see Supplementary Material S10). Therefore, we map the Chinese air quality level to a new air quality level and AQI value based on the PM_{2.5} level in the US standard. In summary, we use 100 as the AQI threshold in our model. The protection level parameters for Beijing residents are set as in Column S5 with the subscript BJ.
Premature death estimation
This study mainly uses the integrated exposure-response (IER) model developed by Burnett et al. and GBD 2019 disease data to estimate PM_{2.5}-related premature death. The IER model is a widely recognized model for estimating the risk of premature death related to PM_{2.5} concentration^{54}, and its calculation method is shown in Eq. (7).
$$RR_{IER}\left(z\right)=\left\{\begin{array}{ll}1+\alpha\left(1-e^{-\gamma\left(z-z_{cf}\right)^{\delta}}\right), & if\ z>z_{cf}\\ 1, & else\end{array}\right.$$
(7)
Here, z represents the annual mean equivalent PM_{2.5} concentration calculated for each city in Section “Calculation of equivalent PM_{2.5}”. \(z_{cf}\) is the minimum PM_{2.5} concentration above which additional risk occurs. \(\alpha\), \(\gamma\), and \(\delta\) are obtained by fitting this equation. This paper focuses on the four major causes of PM_{2.5}-related premature mortality, namely ischemic heart disease (IHD), stroke, chronic obstructive pulmonary disease (COPD), and lung cancer (LC). The \(z_{cf}\), \(\alpha\), \(\gamma\), and \(\delta\) parameter values for these four diseases are from the Institute for Health Metrics and Evaluation (IHME), with 1000 simulated parameter sets per disease. The final calculation of PM_{2.5}-related premature death for each city is shown in Eq. (8):
$$\begin{array}{c}AC_{i,k}=\frac{RR_{i,k}-1}{RR_{i,k}}\times B_{k}\times P_{i},\end{array}$$
(8)
where \(AC_{i,k}\) and \(RR_{i,k}\) are the number of PM_{2.5}-related additional deaths and the relative risk of disease k in the ith city or group, respectively. \(B_{k}\) is the baseline incidence of disease k, taken from GBD 2019^{4}. \(P_{i}\) is the total population of city or group i. To obtain interval estimates of PM_{2.5}-related premature death, 1000 Monte Carlo simulations were performed for all parameters.
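The relative risk and attributable death calculations, together with a simple interval over parameter draws, can be sketched as follows. The parameter draws below are random placeholders and are not the IHME values. The function names, the baseline rate, and the population figure are likewise assumptions for illustration.

```python
import math
import random


def relative_risk(z, alpha, gamma, delta, z_cf):
    """IER relative risk at concentration z; equals 1 at or below z_cf."""
    if z <= z_cf:
        return 1.0
    return 1.0 + alpha * (1.0 - math.exp(-gamma * (z - z_cf) ** delta))


def attributable_deaths(rr, baseline_rate, population):
    """PM2.5-related deaths from the attributable fraction (RR - 1) / RR."""
    return (rr - 1.0) / rr * baseline_rate * population


def death_interval(z, parameter_sets, baseline_rate, population):
    """Return the 2.5th, 50th, and 97.5th percentile of deaths across parameter draws."""
    draws = sorted(
        attributable_deaths(relative_risk(z, *params), baseline_rate, population)
        for params in parameter_sets
    )

    def pick(quantile):
        return draws[min(len(draws) - 1, int(quantile * len(draws)))]

    return pick(0.025), pick(0.5), pick(0.975)


rng = random.Random(0)
# Hypothetical (alpha, gamma, delta, z_cf) draws; NOT the IHME parameter values.
fake_sets = [
    (rng.uniform(0.8, 1.2), rng.uniform(0.01, 0.02), rng.uniform(0.8, 1.0), rng.uniform(5.0, 8.0))
    for _ in range(1000)
]
low, mid, high = death_interval(35.0, fake_sets, baseline_rate=1e-3, population=1_000_000)
```

In the study the exposure and behaviour parameters are also sampled, so a fuller version would draw z itself inside the loop rather than holding it fixed.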
Reduction amount of premature death and distribution of environmental risk information
Weibo (China’s equivalent of Twitter) and the Baidu Index are the two main sources of environmental risk information (ERI). Sina Weibo is the largest open social networking platform in China. It was founded in 2009 and had 450 million monthly active users and 250 million daily active users by 2018^{55}. Baidu is the largest search engine in China. Using distributed crawlers, we searched the public application program interfaces (APIs) of these two platforms for content containing the environment-related keywords listed in Supplementary Table 3. After information extraction, cleaning, and conversion, approximately 2.3 million original microblogs related to the environment were obtained. During 2013–2020, these microblogs were forwarded approximately 140 million times, and more than 30 million people participated in the discussion. In addition to the Weibo data, we obtained daily search index data for 294 cities during 2013–2020 as a supplement. We used all environment-related Weibo originals and reposts from different regions, together with the Baidu search index, as the total distribution of regional environmental information. Equation (9) defines the per capita access to ERI:
$$\begin{array}{c}ERI_{i}=\frac{1}{P_{i}}\sum_{T+1}^{T+t}\left(W_{i,t}+B_{i,t}\right),\end{array}$$
(9)
where \(W_{i,t}\) and \(B_{i,t}\) are the total number of original and reposted environment-related microblogs and the search index in city i at time t, respectively. The time range is \([T+1, T+t]\). \(P_{i}\) is the total population of city i.
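As a small worked instance of the per capita ERI definition, the sketch below sums the microblog counts and search index over a time window and divides by population. The function name and all numbers are hypothetical.

```python
def per_capita_eri(weibo_counts, baidu_index, population):
    """Per capita ERI for one city over a time window."""
    if len(weibo_counts) != len(baidu_index):
        raise ValueError("both series must cover the same time steps")
    return sum(w + b for w, b in zip(weibo_counts, baidu_index)) / population


# Hypothetical three-day window for one city of two million people.
eri = per_capita_eri(weibo_counts=[120, 80, 100], baidu_index=[300, 250, 150], population=2_000_000)
```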
The relationship between the reduction of premature death and the distribution of ERI is shown in Eq. (10).
$$\begin{array}{c}DDP10k_{i}=\beta \cdot ERI_{i}+\sum_{k}\gamma_{k}X_{k,i}.\end{array}$$
(10)
\(DDP10k_{i}\) is the number of PM_{2.5}-related premature deaths avoided through active protection per 10,000 people in city i. \(X_{k,i}\) denotes the kth covariate of the ith city. All variables are log transformed. \(\gamma_{k}\) is the coefficient of the kth covariate; β is our target coefficient, representing the percentage change in \(DDP10k_{i}\) for every 1% change in ERI.
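The log-log specification can be illustrated with an ordinary least squares fit on synthetic data. Two assumptions are made here that the text does not state: the estimator is plain OLS, and an intercept is included. The function name, the single covariate, and the generated data are hypothetical.

```python
import numpy as np


def loglog_ols(ddp10k, eri, covariates):
    """Ordinary least squares on log-transformed variables.

    Returns (intercept, beta, gammas); beta is the elasticity of DDP10k with respect to ERI.
    """
    design = np.column_stack([np.ones(len(eri)), np.log(eri), np.log(covariates)])
    coefs, *_ = np.linalg.lstsq(design, np.log(ddp10k), rcond=None)
    return coefs[0], coefs[1], coefs[2:]


# Noise-free toy data generated with beta = 0.3 and one covariate coefficient of 0.5.
rng = np.random.default_rng(0)
eri = rng.uniform(0.1, 2.0, size=50)
covariate = rng.uniform(1.0, 10.0, size=(50, 1))
ddp = np.exp(0.2) * eri**0.3 * covariate[:, 0] ** 0.5
intercept, beta, gammas = loglog_ols(ddp, eri, covariate)
```

Because both sides are in logarithms, the fitted `beta` reads directly as an elasticity: a value of 0.3 would mean that a 1% increase in per capita ERI is associated with a 0.3% increase in avoided deaths per 10,000 people.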