Qiang Li: Corruption drives brain drain: Cross-country evidence from machine learning - Guangzhou University Research Center for Clean Governance


Published: 2023-06-06 Source: Economic Modelling (2023) Authors: Qiang Li, Lian An, Ren Zhang

Corruption drives brain drain:

Cross-country evidence from machine learning

Qiang Li, Lian An, Ren Zhang

Abstract:

The impact of good governance, particularly control of corruption, on migration has been a subject of intense debate. This paper addresses this governance-migration nexus by employing a novel approach that combines machine learning techniques with empirical analysis across 130 countries. By isolating the effects of corruption from other governance qualities, we reveal that corruption dominates other governance qualities as the primary driver of brain drain, which challenges prevailing notions. Through the adoption of an innovative machine-learning technique, we select an optimal set of instrumental variables, enhancing the accuracy and reliability of our results. We show that weakened property rights protection, unmet basic needs, and declining well-being are crucial channels that propagate the effects of corruption. Furthermore, our results remain robust to fixed-effect estimations, falsification tests, and alternative corruption measures. By highlighting the dominant role of corruption in driving brain drain, our results underscore the urgent need for effective anti-corruption measures and improved governance practices to retain talent.

Keywords: Corruption; Brain drain; Good governance; IV-Lasso; Machine learning

1. Introduction

Brain drain refers to the phenomenon of highly educated individuals migrating from developing to developed countries. According to the United Nations Global Migration Database, the average emigration rate from developing countries has tripled since 1960. Research examining the causes of emigration has proposed various push factors, including but not limited to political conflicts (De Haas, 2005), weak institutions (Ariu et al., 2016), economic freedom (Meierrieks and Renner, 2017), and political institutions (Arif, 2022); however, no consensus view on this topic has been reached.

A burgeoning stream of literature has explored the relationship between migration and corruption, defined as the abuse of entrusted power for private gains (Kaufmann, 2004), which plays a crucial role in migrants’ decision-making. On the one hand, corruption can directly affect migration as people tend to prefer communities with optimal bundles of taxes and public goods (Banzhaf and Walsh, 2008; Tiebout, 1956). If we consider corruption a tax (Aidt, 2003), individuals are likely to “vote with their feet” for a lower corruption tax. On the other hand, corruption can indirectly affect migration by impacting economic development (Dzhumashev, 2014; Farooq et al., 2013; Saha and Gounder, 2013), financial development (Krifa-Schneider et al., 2022), unemployment (Cooray and Dzhumashev, 2018; Lim, 2019), and income inequality (Young, 2013).

Previous literature generally finds a positive correlation between corruption and emigration (Giang et al., 2020; Poprawe, 2015), particularly among highly-skilled workers (Dimant et al., 2013). Figure 1 illustrates a positive relationship between corruption, as measured by the Corruption Perception Index (CPI), and the human flight and brain drain index (hereafter referred to as brain drain).

Despite the significant positive correlation, estimating the causal effect of corruption on brain drain is challenging for several reasons. First, the estimation may be plagued by omitted variable bias from unobserved asymmetric factors correlated with emigration decisions and corruption. Second, previous research finds a bidirectional relationship between corruption and emigration (Baudassé et al., 2018), which biases estimates when attempting to establish a causal relationship between corruption and migration. Third, the other governance qualities may confound the relationship between corruption and migration. Corruption is one of the six dimensions of the governance indicator. The other five dimensions include (1) voice and accountability, (2) political stability and absence of violence, (3) government effectiveness, (4) regulatory quality, and (5) the rule of law (Kaufmann, 2004). Previous literature usually focuses on one dimension only while omitting others, potentially leading to biased corruption estimates. Fourth, measurement errors should be considered since the scale of corruption and brain drain are rarely accurately measured.

Identifying the causal effect of corruption on brain drain can be challenging if endogeneity is not adequately addressed. Previous studies have used relatively weak instrumental variables (IVs) or IVs that may not satisfy the exclusion restriction condition. For instance, Ariu et al. (2016) use the complexity of a country’s name as an IV for the quality of institutions. Although the variable is correlated with the cross-country variation in the quality of institutions, it may affect migration flows through its impact on economic and non-economic attitudes (Chen, 2013). Cooray and Schneider (2016) use the latitude and initial level of corruption as instruments for corruption; however, since corruption interacts with the other two variables in their specifications, the estimation may suffer from an under-identification problem. Moreover, most previous work pays little attention to the role of other governance qualities, which could lead to omitted variable bias. Additionally, some literature aggregates all six dimensions of the governance indicator into their first principal component, failing to identify the dominant factor driving the brain drain. Finally, the issue of measurement error has not been adequately addressed.

This study used the recently developed post-least absolute shrinkage and selection operator (post-Lasso) method to select the most relevant instruments to identify the causal effect of corruption on brain drain from a set of 33 candidate IVs. We simultaneously treat corruption and other governance qualities as endogenous variables. The post-Lasso IV approach enhances the first-stage predictive relationship between endogenous variables and IVs by discarding weak instruments. To establish the validity of the selected instruments, we test the exclusion restriction for the selected instruments with a newly developed exclusion restrictions test by Kripfganz and Kiviet (2021). We also report the instrument-free estimators to verify our results and provide robustness checks using different corruption measures.

Our study reveals several significant findings. First, corruption, not other governance qualities, is the key driver of brain drain. Our estimate indicates that a 1-point increase in corruption is associated with a 0.11-point increase in the brain drain index, while the estimated coefficient on “other governance qualities” is not statistically significant. Our results also have great economic significance. For example, if we consider Malawi, a country with the highest increase in corruption (7 points) from 2012 to 2021, our results suggest that real gross domestic product (GDP) (in 2011 U.S. dollars) in Malawi must increase by 1.18% to offset the negative effect of increased corruption on brain drain. We further confirm our findings through robustness checks, including alternative corruption measures, a placebo test with refugees as the dependent variable, different samples, fixed-effect (FE) estimations, and excluding influential outliers. Second, our results suggest that the ordinary least squares (OLS) method underestimates the effects of corruption on brain drain owing to omitted variable bias, reverse causality, and measurement errors. Third, our study demonstrates that the IV-Lasso approach provides a more precise estimator than the other two IV specifications regarding the standard deviations of the estimators. Therefore, the IV-Lasso approach improves the efficiency of the estimation.

Our study is situated at the intersection of two relevant research areas. The first is the literature on migration determinants, including works by Bertoli et al. (2016). The second area is the literature on corruption and its impact, including studies by Cooray and Dzhumashev (2018), Farooq et al. (2013), Keneck-Massil et al. (2021), Lee et al. (2020), Lu et al. (2021), and Song et al. (2021). Our study also connects to the broader literature on good governance (as corruption is one dimension of institutions), which has been a subject of academic research and policy discussions for several decades. Whether corruption dominates other governance qualities in driving brain drain remains unclear.

This paper contributes to the growing literature on the determinants of international migration and the consequences of corruption. Our contribution is threefold. First, we employ a machine learning method that identifies important IVs from data, relaxing the identification assumption that requires significant IVs to be known a priori. To the best of our knowledge, this is the first study that uses machine learning to investigate the effects of corruption on brain drain. Our methodology employs a structural model based on the IV-Lasso approach, which improves estimation precision without increasing finite-sample bias. Second, we provide new insights into good governance by identifying corruption as the most critical factor driving brain drain. This study is also the first to simultaneously include endogenous corruption and “other governance qualities” in a model, enabling us to isolate corruption as the dominant cause of brain drain. Our main conclusion that corruption drives brain drain has important policy implications. Third, we use the latest development in the exclusion restriction test to demonstrate that measurement error in corruption is one of the primary sources of endogeneity. These findings shed light on the implications of corruption effects in previous studies, highlighting the need for international organizations, such as the World Bank, to make significant efforts to measure corruption accurately.

The remainder of the paper is organized as follows. Section 2 explains the Lasso method, Section 3 introduces the empirical model and the data used in our study, and Section 4 presents the empirical baseline results. Finally, Section 5 performs robustness checks, while Section 6 provides our conclusions.

2. Description of the IV-Lasso method

Following Tibshirani (1996) and Belloni et al. (2012), this paper makes the first endeavor to infer the causal relationship between corruption and brain drain with the post-Lasso IV approach. This method strives to improve the inference by selecting the most relevant IVs from a large pool of candidates. The IV approach generally enables consistent estimations even when the explanatory variables correlate with the error terms in a regression model; however, inferences from the IV approach are imprecise when instruments are weakly correlated with endogenous regressors. The weak IV problem can be ameliorated when many valid IVs are in stock and pressed into service concomitantly (Hansen et al., 2008); however, the refinement does not come without cost.

In this section, we show that finite-sample bias is introduced when using numerous IVs, so that econometricians must contend with a trade-off between efficiency and finite-sample bias. To overcome this issue, we introduce the post-Lasso IV approach, a machine-learning approach that can break the trade-off when numerous IVs are available. In particular, it improves the efficiency of the estimator and minimizes the finite-sample bias.

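In general, a regression model with IVs can be formulated as follows:

$$y_i = \beta_1 d_i + x_i'\beta_2 + \varepsilon_i \qquad (1)$$

$$d_i = z_i'\pi_1 + x_i'\pi_2 + \nu_i \qquad (2)$$

where $y_i$ denotes the outcome variable, and $d_i$ represents the endogenous regressor of interest for $i = 1, 2, \dots, N$. $x_i$ is a $(k_1+1) \times 1$ vector of exogenous regressors, and the $k_2 \times 1$ vector $z_i$ stores the IVs excluded from (1). Intercepts are subsumed in $\beta_2$ and $\pi_2$. Error terms $\varepsilon_i$ and $\nu_i$ are assumed to have zero means with variances $\sigma_\varepsilon^2$ and $\sigma_\nu^2$ and covariance $\sigma_{\varepsilon,\nu}$. The two-stage least squares (2SLS) estimator of the coefficients of interest $\beta = (\beta_1, \beta_2')'$ is specified as follows:

$$\hat{\beta}_{2SLS} = (X' P_Z X)^{-1} X' P_Z y \qquad (3)$$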

where matrix $X$ collects the regressors $d$ and $x$, matrix $Z$ comprises the IVs, and $P_Z = Z(Z'Z)^{-1}Z'$ is the projection matrix. Although the 2SLS estimator has been widely used to tame endogeneity issues, the efficiency of the estimator rests on the correlation between the IVs and the endogenous regressor. Bound et al. (1995) illustrate the inconsistency of 2SLS relative to OLS when $k_1 = 0$ and $k_2 = 1$:

$$\frac{\operatorname{plim}\hat{\beta}_{2SLS} - \beta}{\operatorname{plim}\hat{\beta}_{OLS} - \beta} = \frac{\rho_{z,\varepsilon} / \rho_{d,\varepsilon}}{\rho_{z,d}} \qquad (4)$$

where $\rho$ denotes the simple Pearson correlation coefficient. If $z$ and $d$ are weakly correlated, even a minimal correlation between $z$ and $\varepsilon$ can make the inconsistency of 2SLS substantial. One can mitigate the inconsistency by adding more instruments, i.e., increasing the correlation between $z$ and $d$; however, this may intensify the finite-sample bias, with the 2SLS estimator converging to the OLS estimator as the number of instruments ($k_2$) increases. As shown by Buse (1992), when $k_1 = 0$:

$$E[\hat{\beta}_{2SLS}] - \beta \approx \frac{\sigma_{\varepsilon,\nu}}{\Pi' Z'Z\, \Pi}\,(k_2 - 2) \qquad (5)$$

where $\Pi$ is the vector of the first-stage coefficients. Adding instruments increases the value of the denominator $\Pi' Z'Z\, \Pi$, as well as the second term in (5), $k_2 - 2$. As a result, the finite-sample bias grows as weak instruments are added, because they raise $k_2$ while contributing little to the denominator.
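The two effects described above can be put into numbers with a minimal sketch. It assumes the standard forms of the relative-inconsistency ratio, (ρ_{z,ε}/ρ_{d,ε})/ρ_{z,d}, and of the approximate finite-sample bias, σ_{ε,ν}(k2 − 2)/(Π′Z′ZΠ). The function names and every input value are hypothetical and chosen only for illustration.

```python
def relative_inconsistency(rho_z_eps: float, rho_d_eps: float, rho_z_d: float) -> float:
    """Inconsistency of IV relative to OLS with one instrument and no controls."""
    return (rho_z_eps / rho_d_eps) / rho_z_d


def approx_2sls_bias(sigma_eps_nu: float, k2: int, concentration: float) -> float:
    """Approximate bias sigma_eps_nu * (k2 - 2) / (Pi' Z'Z Pi).

    `concentration` stands for the quadratic form Pi' Z'Z Pi.
    """
    return sigma_eps_nu * (k2 - 2) / concentration


# A nearly exogenous but weak instrument can be worse than OLS (ratio above 1).
weak = relative_inconsistency(rho_z_eps=0.01, rho_d_eps=0.30, rho_z_d=0.02)
strong = relative_inconsistency(rho_z_eps=0.01, rho_d_eps=0.30, rho_z_d=0.50)

# Thirty extra instruments that add nothing to Pi' Z'Z Pi only raise the bias.
bias_few = approx_2sls_bias(sigma_eps_nu=0.5, k2=3, concentration=50.0)
bias_many = approx_2sls_bias(sigma_eps_nu=0.5, k2=33, concentration=50.0)

print(f"relative inconsistency, weak IV:   {weak:.3f}")
print(f"relative inconsistency, strong IV: {strong:.3f}")
print(f"approximate bias with 3 IVs:  {bias_few:.3f}")
print(f"approximate bias with 33 IVs: {bias_many:.3f}")
```

With these made-up inputs, the weak instrument leaves 2SLS more inconsistent than OLS, and the bias rises with the instrument count when the added instruments carry no first-stage signal.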

Therefore, adding instruments generates a trade-off between estimator precision and finite-sample bias; the efficiency gains from using more instruments come at the cost of finite-sample bias. To alleviate the bias-efficiency trade-off, Belloni et al. (2012) propose a post-Lasso IV method for selecting the optimal instruments, which comprises two steps. The first step applies a data-driven penalty method to choose optimal instruments for endogenous variables. The second step conducts estimation using the selected IVs. The approach helps maintain estimation precision while significantly reducing the number of instruments.

The standard Lasso estimator of the first-stage regression is obtained by minimizing the sum of the usual least-squares objective function and a penalty function:

$$\hat{\pi}_1 = \arg\min_{\pi_1} \frac{1}{N}\sum_{i=1}^{N}\left(d_i^* - z_i^{*\prime}\pi_1\right)^2 + \frac{\lambda}{N}\lVert \pi_1 \rVert_1 \qquad (6)$$

where $d_i^*$ and $z_i^*$ are the regressor and instruments after premultiplication by the projection matrix $M_x = I_N - P_x$, which partials out the control variables $x$ in Equation (2), and $\lVert \pi_1 \rVert_1$ is the $\ell_1$-norm of $\pi_1$. Because the norm is non-differentiable at zero, the penalty in Equation (6) sets the coefficients of negligible or insignificant instruments exactly to zero, reducing the number of instruments. In other words, the Lasso coefficients minimize the standard least-squares objective plus a model-size penalty equal to the sum of the coefficients' absolute values, which decreases overfitting and enhances the model's interpretability.

The non-negative tuning parameter λ, also known as the regularization parameter or penalty level, controls the overall level of penalty on selecting redundant instruments. When λ=0, the IV-Lasso method is reduced to 2SLS. The choice of λ is guided by various techniques, such as cross-validation, information criteria, and rigorous penalization (Ahrens et al., 2020). Rigorous penalization chooses a λ that guarantees an optimal convergence rate for prediction and parameter estimation, which facilitates causal inference in the presence of many instruments (or control variables); therefore, it is frequently used in the Lasso approach. In practice, the default λ for the Lasso approach in the independent and identically distributed case is $\lambda = 2c\sqrt{N}\,\Phi^{-1}(1 - \gamma/(2p)) \cdot \mathrm{rmse}$, where $\Phi^{-1}$ is the inverse of the standard normal distribution function, p is the number of penalized regressors, c and γ are constants with default values of 1.1 and 0.1/log(N), and rmse is an estimate of the standard deviation of the error term (Belloni et al., 2012). Since the convergence rate of rigorous penalization is faster than that of cross-validation, and the cross-validated penalty parameter is often smaller than desired (Chetverikov et al., 2021), we adopt rigorous penalization.
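As a rough illustration of the penalty level just described, the sketch below evaluates 2c√N Φ⁻¹(1 − γ/(2p)) · rmse with the stated defaults c = 1.1 and γ = 0.1/log(N). The function name `rigorous_lambda` is hypothetical, and the inputs (130 observations, 33 candidate instruments, unit rmse) are illustrative assumptions rather than values taken from the estimation.

```python
import math
from statistics import NormalDist
from typing import Optional


def rigorous_lambda(n: int, p: int, rmse: float, c: float = 1.1,
                    gamma: Optional[float] = None) -> float:
    """Penalty level 2c * sqrt(N) * Phi^{-1}(1 - gamma / (2p)) * rmse (i.i.d. case)."""
    if gamma is None:
        gamma = 0.1 / math.log(n)  # default value stated in the text
    quantile = NormalDist().inv_cdf(1.0 - gamma / (2.0 * p))
    return 2.0 * c * math.sqrt(n) * quantile * rmse


# Illustrative inputs only: N = 130, p = 33, and an error s.d. of 1.
lam = rigorous_lambda(n=130, p=33, rmse=1.0)
print(f"lambda = {lam:.2f}")
```

The penalty scales linearly with rmse and grows slowly with the number of candidate instruments p, which is how a large instrument pool leads to stricter selection.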

We adopt the Lasso estimator as our instrument selection method, with the first-stage regression coefficients estimated via a shrinkage procedure. The post-Lasso IV estimator discards the Lasso coefficient estimates and employs the set of data-dependent instruments selected by the Lasso approach to refit the first-stage OLS regression, mitigating the Lasso shrinkage bias; thus, the Lasso approach can be employed in causal inference (Athey and Imbens, 2019). Notably, the IV selection procedure does not require prior knowledge of which variables are “important” since the information is inferred from the data. In addition, Belloni et al. (2012) demonstrate that the post-Lasso IV estimators are root-n consistent and asymptotically normal under approximate sparsity. By selecting the optimal instruments from a large set of variables, the post-Lasso IV approach improves inference precision without increasing the finite-sample bias (Kim et al., 2019). In summary, the machine learning procedure identifies important IVs from data, relaxing the identification assumption that requires significant IVs to be known a priori. Therefore, the machine learning approach provides us an opportunity to utilize sparse IVs that standard 2SLS cannot deliver.
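The select-then-refit logic can be sketched on simulated data. This is a simplified illustration under stated assumptions, not the estimation routine used in the paper: it has one endogenous regressor, no controls other than an intercept (removed by demeaning), a plain coordinate-descent Lasso for the objective (1/N)·RSS + (λ/N)·‖π‖₁, and a crude rmse equal to the standard deviation of the regressor. All names and the data-generating process are hypothetical.

```python
import math
import random
from statistics import NormalDist


def demean(values):
    """Subtract the mean, which partials out an intercept."""
    mean = sum(values) / len(values)
    return [v - mean for v in values]


def soft_threshold(value, threshold):
    """Shrink toward zero; values inside the threshold become exactly 0."""
    if value > threshold:
        return value - threshold
    if value < -threshold:
        return value + threshold
    return 0.0


def lasso(d, Z, lam, sweeps=50):
    """Coordinate descent for (1/N) * RSS + (lam/N) * ||pi||_1; Z is a list of columns."""
    pi = [0.0] * len(Z)
    resid = list(d)
    for _ in range(sweeps):
        for j, zj in enumerate(Z):
            zz = sum(v * v for v in zj)
            # Inner product of instrument j with the residual that excludes j.
            rho = sum(z * r for z, r in zip(zj, resid)) + pi[j] * zz
            new = soft_threshold(rho, lam / 2.0) / zz
            resid = [r - (new - pi[j]) * z for z, r in zip(zj, resid)]
            pi[j] = new
    return pi


def ols(y, cols):
    """OLS coefficients via Gauss-Jordan elimination on the normal equations."""
    k = len(cols)
    A = [[sum(a * b for a, b in zip(cols[r], cols[c])) for c in range(k)]
         + [sum(a * b for a, b in zip(cols[r], y))] for r in range(k)]
    for c in range(k):
        pivot = max(range(c, k), key=lambda r: abs(A[r][c]))
        A[c], A[pivot] = A[pivot], A[c]
        for r in range(k):
            if r != c:
                f = A[r][c] / A[c][c]
                A[r] = [x - f * q for x, q in zip(A[r], A[c])]
    return [A[r][k] / A[r][r] for r in range(k)]


# Simulated data (all values hypothetical): only instrument 0 is relevant.
random.seed(0)
n, p, beta = 500, 10, 0.5
Z = [demean([random.gauss(0, 1) for _ in range(n)]) for _ in range(p)]
v = [random.gauss(0, 1) for _ in range(n)]
eps = [0.8 * vi + 0.6 * random.gauss(0, 1) for vi in v]  # correlated with v, so d is endogenous
d_raw = [z0 + vi for z0, vi in zip(Z[0], v)]
d = demean(d_raw)
y = demean([beta * di + ei for di, ei in zip(d_raw, eps)])

# Step 1: Lasso first stage with the rigorous penalty level (crude rmse: s.d. of d).
rmse = math.sqrt(sum(di * di for di in d) / n)
gamma = 0.1 / math.log(n)
lam = 2 * 1.1 * math.sqrt(n) * NormalDist().inv_cdf(1 - gamma / (2 * p)) * rmse
selected = [j for j, coef in enumerate(lasso(d, Z, lam)) if coef != 0.0]
if not selected:
    raise RuntimeError("Lasso selected no instruments")

# Step 2: refit the first stage by OLS on the selected instruments (post-Lasso).
pi_post = ols(d, [Z[j] for j in selected])
d_hat = [sum(c * Z[j][i] for c, j in zip(pi_post, selected)) for i in range(n)]

# Step 3: 2SLS with the refitted first stage, compared with naive OLS.
beta_iv = sum(a * b for a, b in zip(d_hat, y)) / sum(a * b for a, b in zip(d_hat, d))
beta_ols = sum(a * b for a, b in zip(d, y)) / sum(di * di for di in d)
print("selected instruments:", selected)
print(f"post-Lasso IV: {beta_iv:.3f}   OLS: {beta_ols:.3f}   true beta: {beta}")
```

In this design, OLS is biased upward because the regressor shares the component v with the error, while the post-Lasso IV estimate should land near the true coefficient once the irrelevant instruments are dropped. Production implementations add features omitted here, such as penalty loadings and an iterated rmse estimate.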

Besides the Lasso approach, forward and backward selection methods are widely used in variable selection; however, these methods have limitations. A once-eliminated variable cannot be reintroduced in backward selection, whereas a variable, once selected, cannot be removed in forward selection. As a result, neither method provides a comprehensive view of the correlation structure among endogenous and instrumental variables. The Lasso approach overcomes these limitations in two ways. First, it considers all possible combinations of instruments and selects the most efficient combination of instrumental variables. Compared to backward and forward selection methods, the Lasso approach is more efficient as the number of instrumental variables and the selection of the variables are jointly determined. Second, the Lasso approach minimizes the sum of the usual least-squares objective function and a model size penalty, preventing researchers from including redundant instruments. This feature is particularly beneficial as it ensures that the selected variables are efficient and parsimonious.

3. Model and data

The following regression was specified to estimate the effect of corruption on brain drain:

$$Y_i = \beta_0 + \beta_1 CPI_i + \beta_2 OGQ_i + X_i'\beta_3 + \varepsilon_i \qquad (7)$$

where $Y_i$ is the brain drain index in country i. We use CPI to represent corruption, whereas OGQ is the first principal component of the other governance qualities. Additionally, X is a vector of control variables, which includes commonly-used economic and social variables in the literature, such as real GDP at constant prices, human development index, unemployment rate, percentage of the land surface within 100 km of the nearest ice-free coast, population-weighted exposure to PM2.5, business freedom, internal and international conflict, and religious fractionalization. The error term specific to country i is denoted as $\varepsilon_i$.

The human flight and brain drain index is a tool used to measure brain drain in a country. The index tracks the voluntary emigration of the middle class, such as entrepreneurs, and the forced displacement of professionals, such as people fleeing their country owing to actual or feared persecution or repression. The index is positively associated with human displacement. Notably, previous literature has used migration as a proxy for brain drain; however, measuring the migration flow from the source country (the country of origin) to the destination country can be problematic (Beine et al., 2011). Therefore, we did not adopt migration flow as our dependent variable.

The primary independent variable is the CPI from Transparency International (TI), a composite index with a score of 100 indicating very little corruption and 0 indicating very high corruption. For convenience, we reverse the CPI scale so that a lower score is associated with lower corruption.
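The reversal can be sketched as follows. The text does not state the exact transformation, so subtracting the published score from 100 is an assumption (it keeps the 0 to 100 range and flips the direction). The function name and the two example scores are hypothetical.

```python
def reverse_cpi(cpi_score: float) -> float:
    """Flip the 0-100 CPI so that higher values mean more corruption.

    Assumes the reversal is 100 minus the published score.
    """
    if not 0 <= cpi_score <= 100:
        raise ValueError("CPI scores lie between 0 and 100")
    return 100 - cpi_score


# Hypothetical published scores, for illustration only.
published = {"Country A": 89, "Country B": 14}
reversed_scores = {name: reverse_cpi(score) for name, score in published.items()}
print(reversed_scores)
```

After the reversal, a positive coefficient on the CPI reads directly as "more corruption, more brain drain".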

Following Winter (2020), our regression includes real GDP (in 2011 U.S. dollars) as a control variable for a country’s economic development, which is correlated with emigration tendencies and corruption. We also include the human development index, which measures average achievement in a healthy life, knowledge, and a decent standard of living.

Overwhelming evidence suggests that being unemployed dramatically affects people’s life satisfaction (Li and An, 2020) and, thus, their willingness to emigrate. According to Dimant et al. (2013), jobs are awarded not based on merits but on political connections in a corrupt country. Such cronyism can lead to high unemployment, slow economic growth, and emigration; therefore, our analysis includes the unemployment rate as a control variable.

We include the religious fractionalization index to account for cultural diversity following Li et al. (2018). The business freedom score ranges from 0 to 100, where 100 represents the highest degree of business freedom. According to Meierrieks and Renner (2017), business freedom is inversely associated with brain drain from developing to developed countries. Recent studies suggest that poor air quality is a factor that prompts people to emigrate (Banzhaf and Walsh, 2008); therefore, we include population-weighted exposure to PM2.5 as a control variable. Additionally, we incorporate the percentage of land surface area within 100 km of the nearest ice-free coast as another control variable following Banzhaf and Walsh (2008). Bang and Mitra (2013) and Christensen et al. (2018) find that civil war and conflicts also affect brain drain. Therefore, we also include ongoing domestic and international conflicts measured by the Institute for Economics and Peace (2018) in our regression analysis.

We use an IV estimation strategy with three specifications to establish causal inference. The first specification uses three instruments (Three IVs, hereafter): the constructed proportion of pre-industrial cousin marriage, the distance from the equator (latitude), and the tropical climate. Cousin marriage may lead to small groups of related people, which can encourage favoritism and corruption (Giuliano and Nunn, 2018); therefore, pre-industrial cousin marriage could be a valid instrument for corruption if it is uncorrelated with omitted unobservables. Additionally, Hall and Jones (1999) argue that the distance from the equator, measured by the absolute value of latitude in degrees divided by 90, is correlated with governance quality. Acemoglu et al. (2005) propose that geography, such as latitude, impacts institutions in a time-varying manner. For instance, Europeans created “latitude-specific” technology that only worked in temperate latitudes, not tropical soils; therefore, we use latitudes and tropical climates as instruments for corruption.

Despite its simplicity, the Three IVs specification could be ineffective owing to potentially weak correlations among cousin marriage, latitude, tropical climate, corruption, and other governance qualities. More importantly, these instruments may correlate with the omitted confounding factors associated with migration. For example, tropical countries tend to have high ultraviolet radiation intensity, significantly shaping the incumbents’ window of opportunity through the disease channel (Vu, 2021). Therefore, tropical climates may affect migration owing to the disease burden. Considering these limitations, we view the first specification as an attempt to replicate the results from previous literature.

With these caveats in mind, we further improve our estimation efficiency by introducing a set of new instruments in the second specification. These include pre-industrial marital composition, such as monogamy and polygamy, and local democracy, such as local leader election by consensus rather than hereditary appointment. Monogamous societies tend to be less corrupt than polygamous societies, where officials may embezzle public funds to cover the expenses of having multiple wives and children (Sanderson, 2001). Similarly, pre-industrial local democracy is associated with less corruption, as contemporary political development has deep roots in history (Giuliano and Nunn, 2013; Li and An, 2020). We utilize the original data from Murdock (1967), considering that pre-industrial marital composition, cousin marriage, and local democracy are non-linearly correlated with the independent variables. The second specification enhances efficiency using 33 indicators as instruments (All IVs, hereafter). We propose a third specification that employs the post-Lasso IV approach to address the trade-off between efficiency and finite-sample bias associated with a large set of instruments. This machine-learning method selects the most appropriate controls and IVs to improve the first-stage predictive relationship between endogenous and instrumental variables (Danquah et al., 2021). By allowing for the optimal selection of IVs from All IVs, the post-Lasso IV estimator maintains efficiency while alleviating finite-sample bias (IV Lasso, hereafter).

We consider corruption and other dimensions of governance qualities as endogenous variables simultaneously. To reduce the number of governance indicators in the specification, we construct a new variable called “other governance qualities.” This variable is the principal component of several governance indicators, including voice and accountability, political stability and absence of violence, government effectiveness, regulatory quality, and the rule of law following Ariu et al. (2016). This variable is orthogonal to construction variables and allows us to capture the common variation in governance indicators.
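A minimal sketch of how a first principal component can be extracted from five governance indicators is given below. It assumes a PCA on standardized indicators (that is, on the correlation matrix) and uses power iteration to find the leading eigenvector. The function name and the six-country data are hypothetical and are not taken from the QoG dataset.

```python
import math


def first_principal_component(columns, iterations=500):
    """Return (scores, variance share) of the first principal component.

    `columns` is a list of equal-length lists, one per indicator. Indicators
    are standardized first, so the PCA is on the correlation matrix.
    """
    n, k = len(columns[0]), len(columns)
    std_cols = []
    for col in columns:
        mean = sum(col) / n
        sd = math.sqrt(sum((v - mean) ** 2 for v in col) / (n - 1))
        std_cols.append([(v - mean) / sd for v in col])
    corr = [[sum(a * b for a, b in zip(ca, cb)) / (n - 1) for cb in std_cols]
            for ca in std_cols]
    # Power iteration converges to the leading eigenvector of `corr`
    # (the equal-weights start works when indicators are positively correlated).
    w = [1.0 / math.sqrt(k)] * k
    for _ in range(iterations):
        w_new = [sum(corr[r][c] * w[c] for c in range(k)) for r in range(k)]
        norm = math.sqrt(sum(x * x for x in w_new))
        w = [x / norm for x in w_new]
    eigenvalue = sum(w[r] * sum(corr[r][c] * w[c] for c in range(k)) for r in range(k))
    scores = [sum(w[c] * std_cols[c][i] for c in range(k)) for i in range(n)]
    return scores, eigenvalue / k  # the trace of a correlation matrix equals k


# Hypothetical scores for six countries on the five non-corruption dimensions.
voice = [-1.4, -0.9, -0.1, 0.2, 1.0, 1.5]
stability = [-1.6, -0.6, -0.3, 0.4, 0.7, 1.7]
effectiveness = [-1.3, -0.9, -0.2, 0.1, 1.1, 1.6]
regulatory = [-1.5, -0.7, 0.0, 0.3, 0.8, 1.4]
rule_of_law = [-1.7, -0.8, -0.3, 0.5, 0.9, 1.5]

ogq, share = first_principal_component(
    [voice, stability, effectiveness, regulatory, rule_of_law])
print("first-component scores:", [round(s, 2) for s in ogq])
print(f"share of total variance: {share:.2%}")
```

Because the indicators move closely together, a single component summarizes most of their variation, which is the property that makes one composite regressor a reasonable stand-in for the five separate dimensions.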

To address the challenge of measuring corruption, we adopt different measures to test the robustness of our results, as in Li et al. (2018). Specifically, in addition to the commonly-used CPI, we employ alternative measures, such as freedom from corruption, control of corruption, and the Bayesian Corruption Index (BCI). Our dependent variable, brain drain, is assessed with the human flight and brain drain index, which primarily captures voluntary emigration information of the middle class and a small portion of forcibly displaced professionals or intellectuals. We also conduct placebo tests with the number of refugees.

We exploit Murdock’s (1967) Ethnographic Atlas and the QoG data. Murdock (1967) provides pre-industrial ethnographic data for over 100 ancestral characteristics of 1,265 ethnic groups. The QoG dataset compiles various indicators, including the corruption index, brain drain index, and real GDP, from publicly available sources, including the World Bank, TI, Fund for Peace, and Giuliano and Nunn (2018).

Since our IVs do not change over time, we use the cross-country variation to identify the causal effect of corruption on brain drain. Our baseline results were based on data from 2017; we regress Equation (7) for each year from 2012 to 2017 to test whether our main results are robust to time variations. Moreover, CPI is not ideal for time-series analysis since each country’s CPI score comprises a 3-year moving average (Lambsdorff, 2007). To check whether our main results are sensitive to omitted time-invariant factors, we estimate the FE model using data from 2005, 2010, and 2015 to catch the time variation in the CPI.

Table 1 presents the descriptive statistics of all variables in 2017. The mean of the CPI is 55.75, with a standard deviation of 19.69. Hungary (55) and Oman (56) are close to the mean, while New Zealand has the lowest CPI score (11) and Syria has the highest CPI score (86). Cameroon is about one standard deviation above the mean (75), while France is about one standard deviation below it (35). The mean value of the first principal component of the governance qualities (excluding corruption) is 0.08, and this component accounts for 85% of the total variability in governance quality. Figure A1 in the appendix shows that the first principal component captures significant variances in the original five governance quality indices.
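The distances from the mean quoted above can be checked with a few lines of arithmetic, using only the mean, the standard deviation, and the country scores reported in this paragraph. The helper name `z_score` is hypothetical.

```python
MEAN_CPI, SD_CPI = 55.75, 19.69  # reversed CPI in 2017, as reported in the text


def z_score(score: float) -> float:
    """Distance from the sample mean in standard deviations."""
    return (score - MEAN_CPI) / SD_CPI


for country, score in {"Cameroon": 75, "France": 35,
                       "New Zealand": 11, "Syria": 86}.items():
    print(f"{country}: {z_score(score):+.2f} s.d. from the mean")
```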

4. Baseline results

We initiate the regression analysis by employing OLS to estimate Equation (7), and the outcomes are presented in Column 1 of Table 2. The CPI coefficient on the human flight index is 0.042, and its robust standard error is 0.015. Our findings suggest that an escalation in corruption leads to a rise in brain drain in a country. The coefficient on “other governance qualities” is positive but fails to reach any conventional significance level. Consequently, the result highlights that corruption is the main driving factor behind a country’s brain drain.

The signs of all other control variables are consistent with our expectations. Higher GDP reduces brain drain, while a lower human development index is associated with a higher brain drain at the 1% significance level. This result suggests that improving human development, such as quality of life and living conditions, can reduce a country’s brain drain. Furthermore, an increase in unemployment is associated with an increase in brain drain. Surprisingly, more business freedom encourages emigration, although this result is not statistically significant. Land surface area within 100 km of the nearest ice-free coast does not significantly correlate with brain drain. We did not find that air pollution, measured by average exposure to PM2.5, impacts brain drain. In contrast, religious fractionalization increases brain drain, although this result is not statistically significant. Finally, ongoing internal and international conflicts positively correlate with brain drain, but this result is not statistically significant.

OLS analysis may produce biased and inconsistent parameter estimates when endogeneity is present; therefore, we adopt three specifications (Three IVs, All IVs, and IV Lasso) to establish a reliable causal relationship. Column 2 in Table 2 presents the 2SLS results using pre-industrial cousin marriage, latitude, and tropical climates as instruments (Three IVs). The coefficient estimate for CPI is 0.18, with a standard error of 0.12, and the coefficient estimate for other governance qualities remains insignificant. We employ the KP Wald F statistic to test the strength of the IVs; the result is low at 1.30, indicating that the Three IVs model is weakly identified. The weak IVs result in a large standard error for the coefficient on corruption, which is about eight times that of the OLS specification; however, the p-value of the overidentification test is 0.936, so the null hypothesis that the three instrumental variables are exogenous conditional on the other control variables cannot be rejected. Therefore, using pre-industrial cousin marriage, latitude, and tropical climate alone may not provide strong enough IVs to establish a causal relationship. This evidence suggests that additional IVs may be necessary to improve the efficiency of the estimation.

Column 3 in Table 2 (All IVs) adds pre-industrial marital composition (such as monogamy and polygamy) and local democracy (such as local leader election by consensus rather than hereditary appointment) as IVs to improve the first-stage correlations. The coefficient estimate for CPI is 0.046, with a standard error of 0.026. The coefficient for “other governance qualities” is similar to the OLS results in magnitude and significance. The KP Wald F statistic is 19.13, indicating that the model is not weakly identified; however, the p-value for the overidentification test is 0.037, suggesting that not all IVs are exogenous. The results indicate that while using more IVs increases the first-stage correlation, some included IVs may not be exogenous; therefore, an optimal set of IVs must account for this issue.
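To compare the precision of the three estimates reported so far, the sketch below computes t-ratios and approximate 95% confidence intervals from the coefficients and standard errors quoted in the text. The normal critical value of 1.96 is an assumption made here for illustration and may differ from the inference procedure behind Table 2.

```python
# (coefficient on CPI, standard error) as quoted in the text.
reported = {
    "OLS": (0.042, 0.015),
    "Three IVs": (0.18, 0.12),
    "All IVs": (0.046, 0.026),
}

t_ratios = {}
for name, (coef, se) in reported.items():
    t_ratios[name] = coef / se
    lower, upper = coef - 1.96 * se, coef + 1.96 * se  # normal approximation
    print(f"{name:<10} t = {t_ratios[name]:.2f}   95% CI = [{lower:.3f}, {upper:.3f}]")

# Ratio of standard errors: how much precision the weak instruments cost.
se_ratio = reported["Three IVs"][1] / reported["OLS"][1]
print(f"Three IVs s.e. / OLS s.e. = {se_ratio:.1f}")
```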

We address the weak-identification and overidentification issues encountered in our previous analyses by adopting the post-IV-Lasso method to select the optimal set of instruments and control variables from all available options. Specifically, the Lasso optimization procedure identifies the absence of preferred cousin marriage (V25), political succession for the local community by councils or other collective bodies (V94), and latitude as the most suitable instruments for corruption and other governance qualities.
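The selection step can be illustrated with a minimal sketch. It is not the post-IV-Lasso procedure itself: it runs a plain Lasso by coordinate descent on simulated first-stage data, with a hand-picked penalty rather than the data-driven penalty of Belloni et al. (2012). The data, the number of candidates, and the function names (`soft_threshold`, `centre`, `lasso`) are all assumptions made for illustration.

```python
import random

def soft_threshold(value, penalty):
    """Shrink value toward zero; return exactly 0 inside [-penalty, penalty]."""
    if value > penalty:
        return value - penalty
    if value < -penalty:
        return value + penalty
    return 0.0

def centre(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def lasso(columns, target, penalty, sweeps=100):
    """Coordinate descent for (1/2n)*||y - Xb||^2 + penalty*||b||_1."""
    n = len(target)
    coefs = [0.0] * len(columns)
    residual = list(target)
    for _ in range(sweeps):
        for j, col in enumerate(columns):
            scale = sum(c * c for c in col) / n
            # Covariance of column j with the residual that leaves column j out.
            rho = sum(c * (r + coefs[j] * c) for c, r in zip(col, residual)) / n
            updated = soft_threshold(rho, penalty) / scale
            step = updated - coefs[j]
            if step != 0.0:
                residual = [r - step * c for r, c in zip(residual, col)]
                coefs[j] = updated
    return coefs

rng = random.Random(0)
n_obs, n_candidates = 400, 10
candidates = [centre([rng.gauss(0, 1) for _ in range(n_obs)])
              for _ in range(n_candidates)]
# Simulated first stage (assumption): only candidates 0 and 1 truly predict
# the endogenous regressor; the other eight are pure noise.
endogenous = centre([
    1.0 * candidates[0][i] - 0.8 * candidates[1][i] + rng.gauss(0, 1)
    for i in range(n_obs)
])

coefs = lasso(candidates, endogenous, penalty=0.3)
selected = [j for j, b in enumerate(coefs) if b != 0.0]
print("selected candidate instruments:", selected)
```

In a "post" Lasso step, the selected candidates would then be used as instruments in an ordinary 2SLS regression, so the Lasso shrinkage affects only which instruments enter, not the final coefficient estimates.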

We must address some caveats before presenting our results using the selected IVs. Because our chosen IVs are predetermined, they are arguably exogenous; however, they may still be correlated with omitted confounding factors related to migration. For instance, pre-industrial cousin marriage practices may affect a society’s openness to outsiders, potentially impacting migration through both economic and non-economic attitudes. Similarly, political succession through elections could act as an incentive or disincentive for emigration; therefore, there is a risk that our instruments may not be valid. Nevertheless, as Danquah et al. (2021) indicated, “using the IV LASSO with proper penalty parameters theoretically guarantees that any instruments selected are not simply spuriously correlated to the endogenous variable but have true predictive power. This means that the IV-Lasso approach could select no instruments as there may be no set of variables with sufficient predictive power to achieve the required standard.” If the IV model is weakly identified, Belloni et al. (2012) recommend using weak-identification-robust hypothesis tests and confidence sets based on the Chernozhukov et al. (2013) sup-score test to check the relevance of instruments. This test is a high-dimensional version of the Anderson and Rubin (1949) test. Our IV-Lasso specification has a KP Wald F statistic of 126.49, which provides additional evidence to reject the null hypothesis that the model is weakly identified.

Our results in Column 4 show that the IV-Lasso approach yields a CPI coefficient of 0.11, while the coefficients on other governance qualities remain insignificant. This estimate lies between the OLS estimator and the Three IVs estimator, suggesting that OLS may underestimate the actual causal effect, while weak IVs may overestimate it. Notably, the IV-Lasso estimator is not negligible from an economic standpoint. For example, consider Malawi, whose real GDP in 2012 was 16,538.37 million (in 2011 U.S. dollars) and which became 7 points more corrupt from 2012 to 2021. Our estimation suggests that Malawi’s real GDP would have to increase by 244.69 million (1.18%)¹⁴ to offset the increase in corruption. The standard error in the IV-Lasso specification is 0.053, larger than that from OLS and All IVs but much smaller than that from the Three IVs specification. Our findings indicate that the IV-Lasso approach effectively reduces issues of finite-sample bias and inconsistency while maintaining efficiency.
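One reason OLS can understate an effect is classical measurement error in the regressor, which attenuates the OLS coefficient toward zero. The sketch below uses simulated data, not the paper's data, to show this attenuation and how a single valid instrument recovers the true coefficient. The sample size, error variances, and true effect of 0.5 are arbitrary assumptions, and the one-instrument ratio estimator is a simplification of 2SLS.

```python
import random

def covariance(a, b):
    mean_a, mean_b = sum(a) / len(a), sum(b) / len(b)
    return sum((x - mean_a) * (y - mean_b) for x, y in zip(a, b)) / (len(a) - 1)

rng = random.Random(1)
n_obs, true_effect = 5000, 0.5

# Simulated data (assumption): the instrument shifts the true regressor and
# is unrelated to both the outcome error and the measurement error.
instrument = [rng.gauss(0, 1) for _ in range(n_obs)]
true_regressor = [z + rng.gauss(0, 1) for z in instrument]
# We only observe the regressor with classical measurement error.
observed_regressor = [x + rng.gauss(0, 1.5) for x in true_regressor]
outcome = [true_effect * x + rng.gauss(0, 1) for x in true_regressor]

# OLS slope: cov(x, y) / var(x), biased toward zero by the measurement error.
ols_estimate = (covariance(observed_regressor, outcome)
                / covariance(observed_regressor, observed_regressor))
# Just-identified IV slope: cov(z, y) / cov(z, x).
iv_estimate = (covariance(instrument, outcome)
               / covariance(instrument, observed_regressor))

print("true effect:", true_effect)
print("OLS estimate:", round(ols_estimate, 3))
print("IV estimate:", round(iv_estimate, 3))
```

With these assumed variances, the OLS slope converges to about 0.5 × 2 / 4.25 ≈ 0.24, while the IV slope converges to 0.5.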

The first-stage results are presented in the appendix. Column 1 shows a positive correlation between political succession for the local community by councils or other collective bodies and the CPI; however, the absence of preferred cousin marriage is not significantly associated with corruption, which contradicts Giuliano and Nunn’s (2018) findings. This discrepancy may be due to the intercorrelation between cousin marriage and democracy (Schulz, 2022), with cousin marriage becoming insignificant once democracy is included as a regressor. Conversely, Column 2 reveals a significant correlation between cousin marriage and other governance qualities (excluding corruption), whereas local democracy is not significantly correlated. These results suggest that cousin marriage and local democracy can relate to governance differently, which is reasonable given that good governance is multi-dimensional. Latitude does not significantly correlate with corruption or other governance qualities; however, we reject the null hypothesis that the coefficients on all three IVs are jointly zero (F statistics are 4.16 and 5.41, respectively). Notably, the adjusted R² in both first-stage regressions is greater than 0.70, indicating that our optimal IVs, selected by IV-Lasso, predict corruption and other governance qualities quite well; therefore, these findings provide confidence in our IV-Lasso approach.

The Lasso approach is efficient in selecting the relevant instrumental variables. To verify the validity of the instrumental variables, we employ a new approach developed by Kiviet (2020) to test whether the instrumental variables used in the IV-Lasso model satisfy the exclusion condition. This approach overcomes the limitation of the traditional overidentification test, which relies on the untested assumption that at least as many valid instruments as endogenous regressors exist in a structural model.¹⁵ Additionally, the approach provides kinky least-squares (KLS) estimators that do not depend on exclusion restrictions but instead impose admissible assumptions on the degree of regressor endogeneity. We first report the exclusion restriction test results and then apply the instrument-free KLS estimators to corruption.

We begin by presenting the test results, allowing the levels of endogeneity in corruption to fall within the range of [–0.75, 0.75]¹⁶ while fixing the endogeneity of the other governance qualities at 0.35.¹⁷ The left panel of Figure 2 shows that the KLS intervals somewhat overlap with the corresponding 2SLS confidence intervals, indicating that the 2SLS approach with the IVs is identical to the IV-Lasso specification. Additionally, the left panel reveals that the instrument-free KLS estimators decline as the postulated endogeneity of corruption increases in the range of [–0.4, 0], indicating that the measurement error in corruption is a significant source of endogeneity. The right panel of Figure 2 displays the p-values for the exclusion restriction test on the three individual instruments selected by the IV-Lasso approach and their combination. The results suggest (at the 5% significance level) that the absence of preferred cousin marriage is not a valid instrument if the postulated endogeneity of corruption falls within the [–0.46, –0.2] range and the endogeneity of the other governance qualities is fixed at 0.35; however, we do not reject the null hypothesis that the other two instruments (V94 and latitude) are valid at conventional significance levels. Importantly, when all three instruments are used together, they are valid at any conventional significance level.

[Insert figure 2 here]

Furthermore, we employ an instrument-free approach to provide KLS estimators for corruption. The right panel of Figure 2 shows that latitude is the instrument least likely to be invalid for corruption; therefore, we assume that the other two selected instruments (V25 and V94) are correlated with the error term and thus invalid. To account for the endogeneity of V25 and V94, we vary their postulated endogeneity within the range of [–0.4, 0.4]¹⁸ and produce three-dimensional surface and contour plots. The upper left panel in Figure 3 shows the estimated effects of corruption on brain drain, while the lower left panel shows the corresponding contour plots of the coefficients on corruption. The right panel displays the corresponding p-values. Our results indicate that the coefficients on corruption are near 0.10 when the postulated endogeneity of V25 approaches 0.40 and the postulated endogeneity of V94 nears –0.40. Moreover, a significant proportion of the corruption estimates in the contour plots is positive, suggesting a positive and significant relationship between corruption and brain drain. These findings, together with the exclusion restrictions test and the KLS estimators, support the validity of the IVs selected by the IV-Lasso approach; thus, we can identify the causal relationship between corruption and brain drain. In conclusion, our quantitative estimations confirm that corruption contributes to the emigration of professionals.

[Insert figure 3 here]

An important question is whether corruption affects brain drain more than the other governance qualities. To answer this question, we compare the effects of each governance indicator on brain drain while controlling for the first principal component of the remaining governance qualities. Table 3 presents these results. Column 1 replicates our baseline IV-Lasso estimate for ease of comparison, while Columns 2–6 report IV-Lasso estimates for the other five governance indicators. For example, Column 2 regresses brain drain on government effectiveness and other governance qualities; the latter is the first principal component of political stability, the rule of law, regulatory quality, voice and accountability, and the CPI. Other governance qualities in Columns 3–6 follow the same principle. Our analysis shows that government effectiveness, political stability, the rule of law, regulatory quality, and voice and accountability do not significantly affect brain drain. Comparing Columns 1–6, we can infer that corruption has a more pronounced impact on brain drain than a country’s other five governance indicators. In other words, our findings suggest that corruption, rather than the other governance indicators, drives brain drain.
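As a rough illustration of how several governance indicators can be collapsed into one "other governance qualities" control, the sketch below extracts a first principal component by power iteration on the sample covariance matrix. The five indicators are simulated from a single common factor, which is an assumption made for illustration; whether the indicators are standardized before the decomposition is also not specified here, and the function names are hypothetical.

```python
import random

def centre(values):
    mean = sum(values) / len(values)
    return [v - mean for v in values]

def covariance_matrix(columns):
    n = len(columns[0])
    centred = [centre(col) for col in columns]
    return [[sum(a * b for a, b in zip(ci, cj)) / (n - 1) for cj in centred]
            for ci in centred]

def first_principal_component(matrix, iterations=500):
    """Power iteration: return (largest eigenvalue, unit-length loadings)."""
    k = len(matrix)
    vector = [1.0 / k ** 0.5] * k
    for _ in range(iterations):
        product = [sum(matrix[i][j] * vector[j] for j in range(k)) for i in range(k)]
        norm = sum(p * p for p in product) ** 0.5
        vector = [p / norm for p in product]
    eigenvalue = sum(
        vector[i] * sum(matrix[i][j] * vector[j] for j in range(k)) for i in range(k)
    )
    return eigenvalue, vector

rng = random.Random(3)
n_countries, n_indicators = 300, 5
# Simulated indicators (assumption): one shared governance factor plus noise.
factor = [rng.gauss(0, 1) for _ in range(n_countries)]
indicators = [[f + rng.gauss(0, 0.5) for f in factor] for _ in range(n_indicators)]

cov = covariance_matrix(indicators)
eigenvalue, loadings = first_principal_component(cov)
explained_share = eigenvalue / sum(cov[i][i] for i in range(n_indicators))

# Component score per country: centred indicators projected on the loadings.
centred_indicators = [centre(col) for col in indicators]
scores = [sum(loadings[j] * centred_indicators[j][i] for j in range(n_indicators))
          for i in range(n_countries)]
print("share of variance explained:", round(explained_share, 2))
```

The `scores` list plays the role of the single control variable that would enter the regression alongside the governance indicator of interest.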

5. Robustness checks and mechanisms

We conduct five additional analyses to test the robustness of our baseline results. These tests focus on (1) alternative measures of corruption, (2) placebo tests using refugees as the dependent variable, (3) excluding influential outliers and tail dependence, (4) cross-validation checks using different samples, and (5) an alternative estimation strategy using FE and random effects (RE) models.

Several considerations determine our choice of robustness checks. First, corruption is a complex phenomenon that is difficult to measure accurately. Personal perceptions may introduce measurement errors and affect the results. Second, the human flight index includes skilled workers and refugees; if refugees comprise a large portion of the index, it may not accurately capture brain drain. Our placebo tests aim to confirm that the emigration variable measures brain drain. Third, outliers may unduly influence our estimates. Fourth, our results may depend on the specific sample we use, and finally, the effect of corruption on brain drain may differ across countries at different stages of development.

Our baseline analysis utilized a commonly used corruption measure from TI. We next use different corruption measures to verify the robustness of our preliminary results. Specifically, we incorporated three measures: control of corruption, freedom from corruption, and the BCI. The control of corruption measurement is derived from the World Bank’s Worldwide Governance Indicators (WGIs). This measure captures various forms of corruption and evaluates grand corruption in the political sphere as well as the inclination of elites to engage in “state capture.” The scale of this measure ranges from approximately –2.5 (weak) to 2.5 (strong). The freedom from corruption measurement is based on TI’s CPI. This measure considers the prevalence of corruption in a country, with higher levels of corruption leading to lower scores. The BCI measures the overall perceived corruption level. This index is considered an alternative to the CPI and WGI corruption metrics. It ranges from 0 to 100, where a score of 0 indicates the least corrupt and 100 represents the most corrupt.

The three corruption measures used in this study have different scales, making their coefficients incomparable. To enable a more accurate comparison, we standardized these measures to have a mean of 0 and a standard deviation of 1.¹⁹ Table 4 presents the results of our robustness check using alternative corruption measures; only estimates on corruption and other governance qualities are reported to save space. The coefficients on these alternative measures show a similar pattern to the CPI, all significant in our IV-Lasso specifications. Furthermore, our findings indicate that the impact of corruption on brain drain remains relatively stable across different corruption measures, demonstrating the robustness of our results.
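The standardization is a z-score transformation, sketched below with made-up scores on two of the scales described above. The numbers are purely illustrative, and the sketch uses the sample standard deviation, which is an assumption. Standardizing rescales each measure but does not change its direction, so aligning measures where higher means "cleaner" with those where higher means "more corrupt" is a separate step not shown here.

```python
from statistics import mean, stdev

def standardize(values):
    """Rescale values to mean 0 and (sample) standard deviation 1."""
    centre, spread = mean(values), stdev(values)
    return [(v - centre) / spread for v in values]

# Hypothetical scores for five countries (illustrative numbers only).
control_of_corruption = [-1.2, -0.4, 0.3, 1.1, 1.9]   # scale of about -2.5 to 2.5
bci = [62.0, 51.0, 40.0, 28.0, 15.0]                  # scale of 0 to 100

z_control = standardize(control_of_corruption)
z_bci = standardize(bci)
print([round(z, 2) for z in z_control])
print([round(z, 2) for z in z_bci])
```

After this step, a one-unit change in either measure corresponds to one standard deviation, so regression coefficients on the two measures are on a comparable footing.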

The brain drain index captures both voluntary emigration and forced displacement, such as refugee movements. One potential concern is that our main results could be biased if corruption is correlated with forced displacement rather than voluntary migration. To address this concern, we conduct additional tests using refugees as the dependent variable and present the results in Table 5. Our findings indicate that the coefficients on the CPI are insignificant across all specifications, suggesting that corruption drives voluntary emigration rather than forced displacement.

We conducted tests to check the robustness of our results to the influence of outliers and tail dependence. Figure 1 reveals the presence of some outliers and tail dependence between corruption and brain drain, raising concerns that our results could be driven by extreme values (Liu et al., 2012). To address these concerns, we employed trimming and winsorizing strategies; trimming involves discarding values at the distribution tails, while winsorizing recodes extreme values to less extreme ones. We use the same specification as our baseline IV-Lasso estimation after applying trimming or winsorizing. The first four columns of Table 6 show the results of the tests employing trimming, while the last column shows the results using winsorizing. Column 1 excludes the five largest countries by population, GDP per capita, or area. Columns 2 and 3 exclude the five least corrupt and the five most corrupt countries, respectively. The exclusionary strategies in the first three tests were somewhat arbitrary; however, Column 4 adopts a more systematic approach to deal with influential observations. We identify influential observations using DFBETA, the difference in the OLS coefficient estimate on the CPI when an observation is included in versus excluded from the sample, remove them, and then run IV-Lasso with the remaining sample.

Following Belsley et al. (1980), we omitted observations with |DFBETA| > 2/√N, where N is the number of observations (in our case, 130). Column 5 employs winsorizing by recoding the lowest and highest 2.5% of the values of corruption and brain drain to the value of the 2.5th and 97.5th percentile, respectively. The CPI coefficients remain positive and statistically significant in all five tests, confirming that outliers and tail dependence do not drive the results.
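The outlier treatments above can be sketched on toy data. This is a simplified illustration under stated assumptions: tails are cut by order statistics rather than interpolated percentiles, and the scaled DFBETA is computed for the slope of a bivariate regression by refitting without each observation, whereas the paper's OLS includes further controls. All function names and data are hypothetical.

```python
def trim(values, fraction):
    """Drop the lowest and highest `fraction` of the observations."""
    k = int(round(fraction * len(values)))
    ordered = sorted(values)
    return ordered[k:len(ordered) - k]

def winsorize(values, fraction):
    """Recode each tail to the nearest value that is left unchanged."""
    k = int(round(fraction * len(values)))
    ordered = sorted(values)
    low, high = ordered[k], ordered[len(ordered) - k - 1]
    return [min(max(v, low), high) for v in values]

def fit_line(x, y):
    """Return slope, residual standard deviation, and sum of squares of x."""
    n = len(x)
    mean_x, mean_y = sum(x) / n, sum(y) / n
    sxx = sum((a - mean_x) ** 2 for a in x)
    slope = sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx
    intercept = mean_y - slope * mean_x
    rss = sum((b - intercept - slope * a) ** 2 for a, b in zip(x, y))
    return slope, (rss / (n - 2)) ** 0.5, sxx

def dfbetas_slope(x, y):
    """Scaled change in the slope when each observation is left out."""
    full_slope, _, full_sxx = fit_line(x, y)
    result = []
    for i in range(len(x)):
        loo_slope, loo_sd, _ = fit_line(x[:i] + x[i + 1:], y[:i] + y[i + 1:])
        result.append((full_slope - loo_slope) / (loo_sd / full_sxx ** 0.5))
    return result

# Toy sample of 40 values with one extreme observation.
sample = list(range(1, 40)) + [1000]
trimmed = trim(sample, 0.025)
winsorized = winsorize(sample, 0.025)

# Toy regression data: a clean line plus small noise, with the last point
# moved far off the line so that it is influential.
x = [float(i) for i in range(20)]
y = [2.0 * i + 0.3 * ((7 * i) % 5 - 2) for i in range(20)]
y[19] = 0.0
cutoff = 2 / len(x) ** 0.5
flagged = [i for i, d in enumerate(dfbetas_slope(x, y)) if abs(d) > cutoff]
print("observations exceeding the cutoff:", flagged)
```

Trimming shortens the sample, while winsorizing keeps every observation and only pulls the extremes inward; the DFBETA rule instead targets observations by their influence on one coefficient rather than by their position in the distribution.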

Macroeconomic fluctuations may affect the robustness of our results. We conduct additional analyses using data from different years to address this concern. TI changed its methodology in computing the CPI after 2012; therefore, we regressed Equation (7) yearly from 2012 to 2016 using the same specification as our baseline IV-Lasso model. We also averaged corruption and all other control variables from 2012 to 2017 and ran the same regression as our baseline IV-Lasso specification to check the robustness of our results. This test also helps identify any potential measurement errors. Our findings in Table 7 show that the coefficients on CPI are significant, while the coefficients on “other governance qualities” remain insignificant. This result supports our conclusion that fluctuations over time do not drive our findings.

Our benchmark results may be driven by omitted time-invariant factors correlated with CPI, such as pre-industrial culture. To address this issue, we conducted a robustness check using a panel data model with data from 2005, 2010, and 2015 (a 5-year interval). The results are presented in Table 8. The first two columns report the FE and RE estimation results, excluding other control variables, showing that corruption has no significant impact on human migration; however, in the RE specification, other governance qualities are significant. The Sargan–Hansen statistic is 15.31, with a p-value of 0.00, indicating that the FE specification is preferred. When we control for all other variables, as in our baseline specifications, CPI remains significant, and FE is still preferred over RE. Table 8 demonstrates that our baseline results are robust to omitted time-invariant factors; however, we must interpret the results cautiously, as TI changed the methodology for estimating CPI in 2012.
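To see why an FE specification guards against omitted time-invariant factors, the sketch below simulates a panel of 130 units over three periods in which an unobserved unit-level factor affects both the regressor and the outcome. Pooled OLS absorbs that factor into the slope, while the within (FE) estimator removes it by demeaning each unit's data. The data-generating process, the true effect of 0.5, and the function names are assumptions made for illustration, and no other controls are included.

```python
import random

def slope(x, y):
    mean_x, mean_y = sum(x) / len(x), sum(y) / len(y)
    sxx = sum((a - mean_x) ** 2 for a in x)
    return sum((a - mean_x) * (b - mean_y) for a, b in zip(x, y)) / sxx

def demean_within(panel):
    """Subtract each unit's own time average, then flatten to one list."""
    flat = []
    for unit_values in panel:
        unit_mean = sum(unit_values) / len(unit_values)
        flat.extend(v - unit_mean for v in unit_values)
    return flat

rng = random.Random(2)
n_units, n_periods, true_effect = 130, 3, 0.5

x_panel, y_panel = [], []
for _ in range(n_units):
    # Unobserved time-invariant factor (assumption): raises both x and y.
    unit_effect = rng.gauss(0, 1)
    xs = [unit_effect + rng.gauss(0, 1) for _ in range(n_periods)]
    ys = [true_effect * x + 2.0 * unit_effect + rng.gauss(0, 1) for x in xs]
    x_panel.append(xs)
    y_panel.append(ys)

pooled_estimate = slope([v for unit in x_panel for v in unit],
                        [v for unit in y_panel for v in unit])
fe_estimate = slope(demean_within(x_panel), demean_within(y_panel))
print("pooled OLS:", round(pooled_estimate, 3))
print("within (FE):", round(fe_estimate, 3))
```

An RE estimator would only be consistent here if the unit effect were uncorrelated with the regressor, which is the assumption that a test of FE against RE examines.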

We next analyze the transmission mechanism following Dell (2010). Corruption can deplete talent through various channels. For instance, corrupt and ineffective government institutions can foster distrust among citizens, which is linked to lower happiness levels (Helliwell and Huang, 2008). Additionally, corruption can slow investment, reduce foreign direct investment (Cruz et al., 2023), distort government expenditure, and exacerbate income inequality. In addition to these channels, we propose that weak property rights protection, unsatisfied basic needs, and erosion in well-being could potentially contribute to the emigration of skilled workers.

Table 9 presents the results from the channel analysis. Column 1 suggests that corruption negatively correlates with equal opportunity, measured on a scale of 1 to 10, with a lower value indicating less equality. Social capital, which assesses the level of trust, is positively correlated with corruption, as indicated in Column 2. Column 3 shows that higher corruption lowers property rights protection, while the last two columns report the result of regressing basic needs and foundations of well-being on corruption. As shown, corruption decreases the population’s capacity to satisfy basic human needs, such as medical care, clean water, sanitation, adequate shelter, and personal safety. Furthermore, corruption erodes the foundations of well-being, such as education, obtaining information, and communicating freely. Previous literature suggests that property rights protection, basic needs, and well-being are associated with immigration (Zhang et al., 2018), and corruption is correlated with property rights protection (Acemoglu and Verdier, 1998), basic needs (Li et al., 2018), and well-being (Li and An, 2020). Therefore, we can infer that corruption drains away skilled workers and professionals through weakened property rights protection, unsatisfied basic needs, and erosion in well-being.

6. Conclusion

In today’s economy, skilled workers are essential for driving economic growth, making it imperative for governments worldwide to understand the factors contributing to the brain drain. Corruption has received little attention from scholars in this context despite its significance. The relationship between corruption and the emigration of highly skilled workers is particularly complicated owing to endogeneity issues. This study aims to address this gap in the literature by examining the relationship between corruption and brain drain using data from 130 countries. Using the Lasso method, our model innovatively addresses endogeneity concerns by selecting an optimal set of instrumental variables from a group of pre-industrial variables.

Our study provides fresh evidence that corruption is a driving force behind the brain drain, and our results show that corruption, not other aspects of governance quality, contributes to brain drain. We conducted various robustness checks, which supported our conclusion that corruption leads to the loss of talented individuals. The IV-Lasso method is more successful in improving the efficiency of estimations than other specifications that use the same instruments as previous studies (Three IVs) and those that use many more instrumental variables (All IVs).

Our paper holds crucial policy implications. First, although the significance of good governance in addressing the brain drain has been acknowledged for decades, the specific aspect of governance driving brain drain has remained ambiguous. Our study establishes that corruption, not other governance qualities, is the primary driver of brain drain. This insight can guide policymakers and groups interested in directing their efforts toward controlling corruption to mitigate brain drain. Second, our finding that measurement error in corruption is the primary source of endogeneity underscores the urgent need for better measures of corruption and its consequences. To this end, international organizations such as the World Bank, the United Nations, and the international community should collaborate to collect and share macro-, meso-, and micro-level data to improve corruption measurements. Third, our estimations highlight the importance of addressing societal corruption for academics, government officials, and policymakers alike. Governments must take action to prevent brain drain and retain skilled workers by cracking down on corrupt forces within society. Overall, our study emphasizes the critical role of good governance and the need for concerted efforts to combat corruption in addressing the brain drain issue.

Our empirical analysis provides opportunities for future studies in the following areas. First, while the IV-Lasso approach is a reliable method for selecting optimal IVs and identifying causal effects, other machine learning algorithms may also enhance the efficiency of the IV approach. Therefore, exploring alternative approaches to address endogeneity problems is a promising area for future research. Second, due to the nature of our IVs, our sample size is relatively limited in the time-series dimension. However, our approach can be used to investigate dynamic causal relationships in time series analysis along the lines of Mertens and Ravn (2013) and Yang and Zhang (2023), which is another fruitful avenue for future research.

References:

Acemoglu, D., Johnson, S., Robinson, J.A., 2005. Institutions as a Fundamental Cause of Long-Run Growth, in: Handbook of Economic Growth. pp. 385–472. https://doi.org/10.1016/S1574-0684(05)01006-3

Acemoglu, D., Verdier, T., 1998. Property Rights, Corruption and the Allocation of Talent: a General Equilibrium Approach. The Economic Journal 108, 1381–1403. https://doi.org/10.1111/1468-0297.00347

Ahrens, A., Hansen, C.B., Schaffer, M.E., 2020. lassopack: Model selection and prediction with regularized regression in Stata. The Stata Journal: Promoting communications on statistics and Stata 20, 176–235. https://doi.org/10.1177/1536867X20909697

Aidt, T.S., 2003. Economic Analysis of Corruption: A Survey. The Economic Journal 113, F632–F652. https://doi.org/10.1046/j.0013-0133.2003.00171.x

Akaike, H., 1974. A new look at the statistical model identification. IEEE transactions on automatic control 19, 716–723.

Anderson, T.W., Rubin, H., 1949. Estimation of the Parameters of a Single Equation in a Complete System of Stochastic Equations. The Annals of Mathematical Statistics 20, 46–63. https://doi.org/10.1214/aoms/1177730090

Arif, I., 2022. Educational attainment, corruption, and migration: An empirical analysis from a gravity model. Economic Modelling 110, 105802. https://doi.org/10.1016/j.econmod.2022.105802

Ariu, A., Docquier, F., Squicciarini, M.P., 2016. Governance quality and net migration flows. Regional Science and Urban Economics 60, 238–248. https://doi.org/10.1016/j.regsciurbeco.2016.07.006

Athey, S., Imbens, G.W., 2019. Machine Learning Methods That Economists Should Know About. Annual Review of Economics 11, 685–725. https://doi.org/10.1146/annurev-economics-080217-053433

Bang, J.T., Mitra, A., 2013. Civil War, Ethnicity, and the Migration of Skilled Labor. Eastern Econ J 39, 387–401. https://doi.org/10.1057/eej.2012.18

Banzhaf, H.S., Walsh, R.P., 2008. Do People Vote with Their Feet? An Empirical Test of Tiebout’s Mechanism. The American Economic Review 98, 843–863. https://doi.org/10.2307/29730097

Baudassé, T., Bazillier, R., Issifou, I., 2018. Migration and institutions: Exit and voice (from abroad)? Journal of Economic Surveys 32, 727–766. https://doi.org/10.1111/joes.12212

Beine, M., Docquier, F., Özden, Ç., 2011. Diasporas. Journal of Development Economics 95, 30–41. https://doi.org/10.1016/j.jdeveco.2009.11.004

Belloni, A., Chen, D., Chernozhukov, V., Hansen, C., 2012. Sparse Models and Methods for Optimal Instruments With an Application to Eminent Domain. Econometrica 80, 2369–2429. https://doi.org/10.3982/ECTA9626

Belsley, D.A., Kuh, E., Welsch, R.E., 1980. Regression diagnostics: identifying influential data and sources of collinearity. Wiley.

Bertoli, S., Brücker, H., Fernández-Huertas Moraga, J., 2016. The European crisis and migration to Germany. Regional Science and Urban Economics 60, 61–72. https://doi.org/10.1016/j.regsciurbeco.2016.06.012

Bound, J., Jaeger, D.A., Baker, R.M., 1995. Problems with Instrumental Variables Estimation when the Correlation between the Instruments and the Endogenous Explanatory Variable is Weak. Journal of the American Statistical Association 90, 443–450. https://doi.org/10.1080/01621459.1995.10476536

Buse, A., 1992. The Bias of Instrumental Variable Estimators. Econometrica 60, 173–180. https://doi.org/10.2307/2951682

Chen, M.K., 2013. The Effect of Language on Economic Behavior: Evidence from Savings Rates, Health Behaviors, and Retirement Assets. American Economic Review 103, 690–731. https://doi.org/10.1257/aer.103.2.690

Chernozhukov, V., Chetverikov, D., Kato, K., 2013. Gaussian approximations and multiplier bootstrap for maxima of sums of high-dimensional random vectors. The Annals of Statistics 41, 2786–2819. https://doi.org/10.1214/13-AOS1161

Chetverikov, D., Liao, Z., Chernozhukov, V., 2021. On cross-validated lasso in high dimensions. The Annals of Statistics 49, 1300–1317.

Christensen, J., Onul, D., Singh, P., 2018. Impact of Ethnic Civil Conflict on Migration of Skilled Labor. Eastern Econ J 44, 18–29. https://doi.org/10.1057/s41302-016-0069-7

Cooray, A., Dzhumashev, R., 2018. The effect of corruption on labour market outcomes. Economic Modelling 74, 207–218. https://doi.org/10.1016/j.econmod.2018.05.015

Cooray, A., Schneider, F., 2016. Does corruption promote emigration? An empirical examination. Journal of Population Economics 29, 293–310. https://doi.org/10.1007/s00148-015-0563-y

Cruz, M.D., Jha, C.K., Kırşanlı, F., Sedai, A.K., 2023. Corruption and FDI in natural resources: The role of economic downturn and crises. Economic Modelling 119, 106122. https://doi.org/10.1016/j.econmod.2022.106122

Danquah, M., Iddrisu, A.M., Boakye, E.O., Owusu, S., 2021. Do gender wage differences within households influence women’s empowerment and welfare? Evidence from Ghana. Journal of Economic Behavior & Organization 188, 916–932. https://doi.org/10.1016/j.jebo.2021.06.014

De Haas, H., 2005. International migration, remittances and development: myths and facts. Third World Quarterly 26, 1269–1284. https://doi.org/10.1080/01436590500336757

Dell, M., 2010. The Persistent Effects of Peru’s Mining Mita. Econometrica 78, 1863–1903. https://doi.org/10.2139/ssrn.1596425

Dimant, E., Krieger, T., Meierrieks, D., 2013. The effect of corruption on migration, 1985–2000. Applied Economics Letters 20, 1270–1274. https://doi.org/10.1080/13504851.2013.806776

Dzhumashev, R., 2014. Corruption and growth: The role of governance, public spending, and economic development. Economic Modelling 37, 202–215. https://doi.org/10.1016/j.econmod.2013.11.007

Farooq, A., Shahbaz, M., Arouri, M., Teulon, F., 2013. Does corruption impede economic growth in Pakistan? Economic Modelling 35, 622–633. https://doi.org/10.1016/j.econmod.2013.08.019

Giang, L.T., Nguyen, C.V., Nguyen, H.Q., 2020. The Impacts of Economic Growth and Governance on Migration: Evidence from Vietnam. The European Journal of Development Research. https://doi.org/10.1057/s41287-020-00262-3

Giuliano, P., Nunn, N., 2018. Ancestral Characteristics of Modern Populations. Economic History of Developing Regions 33, 1–17. https://doi.org/10.1080/20780389.2018.1435267

Giuliano, P., Nunn, N., 2013. The Transmission of Democracy: From the Village to the Nation-State. American Economic Review Papers & Proceedings 103, 86–92. https://doi.org/10.1257/aer.103.3.86

Gray, J.P., 1998. Ethnographic atlas codebook. World Cultures 10, 86–136.

Hall, R.E., Jones, C.I., 1999. Why do some countries produce so much more output per worker than others? The Quarterly Journal of Economics 114, 83–116.

Hansen, C., Hausman, J., Newey, W., 2008. Estimation with many instrumental variables. Journal of Business & Economic Statistics 26, 398–422.

Helliwell, J.F., Huang, H., 2008. How’s Your Government? International Evidence Linking Good Government and Well-Being. British Journal of Political Science 38, 595–619. https://doi.org/10.1017/S0007123408000306

Kaufmann, D., 2004. Governance Matters III: Governance Indicators for 1996, 1998, 2000, and 2002. The World Bank Economic Review 18, 253–287. https://doi.org/10.1093/wber/lhh041

Keneck-Massil, J., Nomo-Beyala, C., Owoundi, F., 2021. The corruption and income inequality puzzle: Does political power distribution matter? Economic Modelling 103, 105610. https://doi.org/10.1016/j.econmod.2021.105610

Kim, J.S., Jiang, B., Li, C., Yang, H.-S., 2019. Returns to women’s education using optimal IV selection. Applied Economics 19, 815–830. https://doi.org/10.1080/00036846.2018.1524126

Kiviet, J.F., 2020. Testing the impossible: Identifying exclusion restrictions. Journal of Econometrics 218, 294–316. https://doi.org/10.1016/j.jeconom.2020.04.018

Krifa-Schneider, H., Matei, I., Sattar, A., 2022. FDI, corruption and financial development around the world: A panel non-linear approach. Economic Modelling 110, 105809. https://doi.org/10.1016/j.econmod.2022.105809

Kripfganz, S., Kiviet, J.F., 2021. kinkyreg: Instrument-free inference for linear regression models with endogenous regressors. The Stata Journal: Promoting communications on statistics and Stata 21, 772–813. https://doi.org/10.1177/1536867X211045575

Lambsdorff, J.G., 2007. The Institutional Economics of Corruption and Reform. Cambridge University Press, Cambridge. https://doi.org/10.1017/CBO9780511492617

Lee, C.-C., Wang, C.-W., Ho, S.-J., 2020. Country governance, corruption, and the likelihood of firms’ innovation. Economic Modelling 92, 326–338. https://doi.org/10.1016/j.econmod.2020.01.013

Li, Q., An, L., 2020. Corruption Takes Away Happiness: Evidence from a Cross-National Study. Journal of Happiness Studies 21, 485–504. https://doi.org/10.1007/s10902-019-00092-z

Li, Q., An, L., Xu, J., Baliamoune-Lutz, M., 2018. Corruption costs lives: evidence from a cross-country study. The European Journal of Health Economics 19, 153–165. https://doi.org/10.1007/s10198-017-0872-z

Lim, K.Y., 2019. Modelling the dynamics of corruption and unemployment with heterogeneous labour. Economic Modelling 79, 98–117. https://doi.org/10.1016/j.econmod.2018.10.004

Liu, H., Han, F., Yuan, M., Lafferty, J., Wasserman, L., 2012. High-dimensional semiparametric Gaussian copula graphical models. The Annals of Statistics 40, 2293–2326. https://doi.org/10.1214/12-AOS1037

Lu, J., Zhang, H., Meng, B., 2021. Corruption, firm productivity, and gains from import liberalization in China. Economic Modelling 101, 105555. https://doi.org/10.1016/j.econmod.2021.105555

Meierrieks, D., Renner, L., 2017. Stymied ambition: does a lack of economic freedom lead to migration? Journal of Population Economics. https://doi.org/10.1007/s00148-017-0633-4

Mertens, K., Ravn, M.O., 2013. The Dynamic Effects of Personal and Corporate Income Tax Changes in the United States. American Economic Review 103, 1212–1247. https://doi.org/10.1257/aer.103.4.1212

Murdock, G.P., 1967. Ethnographic Atlas, University of Pittsburgh Press, Pittsburgh, Pa. https://doi.org/10.1126/science.159.3818.968

Poprawe, M., 2015. On the relationship between corruption and migration: empirical evidence from a gravity model of migration. Public Choice 163, 337–354. https://doi.org/10.1007/s11127-015-0255-x

Rohwer, A., 2009. Measuring Corruption: A comparison between the Transparency International’s Corruption Perceptions Index and the World Bank’s Worldwide Governance Indicators. CESifo DICE Report 7, 42–52.

Saha, S., Gounder, R., 2013. Corruption and economic development nexus: Variations across income levels in a non-linear framework. Economic Modelling 31, 70–79. https://doi.org/10.1016/j.econmod.2012.11.012

Sanderson, S.K., 2001. Explaining Monogamy and Polygyny in Human Societies: Comment on Kanazawa and Still. Social Forces 80, 329–335. https://doi.org/10.1353/sof.2001.0087

Schulz, J.F., 2022. Kin Networks and Institutional Development. The Economic Journal 132, 2578–2613. https://doi.org/10.1093/ej/ueac027

Song, C.-Q., Chang, C.-P., Gong, Q., 2021. Economic growth, corruption, and financial development: Global evidence. Economic Modelling 94, 822–830. https://doi.org/10.1016/j.econmod.2020.02.022

Staiger, D., Stock, J.H., 1997. Instrumental Variables Regression with Weak Instruments. Econometrica 65, 557–586. https://doi.org/10.2307/2171753

Stock, J.H., Yogo, M., 2005. Testing for Weak Instruments in Linear IV Regression, in: Identification and Inference for Econometric Models. Cambridge University Press, pp. 80–108. https://doi.org/10.1017/CBO9780511614491.006

Tibshirani, R., 1996. Regression Shrinkage and Selection Via the Lasso. Journal of the Royal Statistical Society: Series B (Methodological) 58, 267–288. https://doi.org/10.1111/j.2517-6161.1996.tb02080.x

Tiebout, C.M., 1956. A Pure Theory of Local Expenditures. Journal of Political Economy 64, 416–424. https://doi.org/10.1086/257839

van Donkelaar, A., Martin, R.V., Brauer, M., Hsu, N.C., Kahn, R.A., Levy, R.C., Lyapustin, A., Sayer, A.M., Winker, D.M., 2016. Global Estimates of Fine Particulate Matter using a Combined Geophysical-Statistical Method with Information from Satellites, Models, and Monitors. Environmental Science & Technology 50, 3762–3772. https://doi.org/10.1021/acs.est.5b05833

Vu, T.V., 2021. Climate, diseases, and the origins of corruption. Economics of Transition and Institutional Change 29, 621–649. https://doi.org/10.1111/ecot.12293

Wan, F., 2018. A note on a naive regression-based test on the validity of an instrumental variable. Statistics in Medicine 37, 4330–4333. https://doi.org/10.1002/sim.7904

Winter, S., 2020. “It’s the Economy, Stupid!”: On the Relative Impact of Political and Economic Determinants on Migration. Population Research and Policy Review 39, 207–252. https://doi.org/10.1007/s11113-019-09529-y

Yang, Y., Zhang, R., 2023. Twisting Theories to Suit Facts: Revisiting the Effects of Technology Shocks. Available at SSRN 4340425.

Young, A., 2013. Inequality, the Urban-Rural Gap, and Migration. The Quarterly Journal of Economics 128, 1727–1785. https://doi.org/10.1093/qje/qjt025

Zhang, J., Leoncini, R., Tsai, Y., 2018. Intellectual property rights protection, labour mobility and wage inequality. Economic Modelling 70, 239–244. https://doi.org/10.1016/j.econmod.2017.11.006

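Note: The annotations are pending review; readers who need the original article should download it from the official source.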

Please cite this article as: Li, Q., An, L., Zhang, R., Corruption drives brain drain: Cross-country evidence from machine learning, Economic Modelling (2023), doi: https://doi.org/10.1016/j.econmod.2023.106379.