Abstract
Introducing a new cross-national dataset on the ethnicity of refugees, covering the years 1975–2009, this study analyzes refugee flight patterns. We argue that the asylum destination of refugees is not haphazard but determined by trans-border ethnic linkages. Building on migration theories, we elaborate a theoretical framework for the direction of refugee movements, which includes spatial, temporal and cultural pull factors. The statistical results suggest that refugees flee to nearby countries with ethnic kin populations and a history of accepting other co-ethnic refugees. Thus, sub-national refugee characteristics, such as ethnicity, are essential to understanding the flight direction of refugees.
Introduction
Afghanistan is currently the largest refugee producer in the world, with 2.56 million refugees, followed closely by Syria with 2.47 million. Of those fleeing from Afghanistan, approximately 1.6 million fled to Pakistan, 800,000 to Iran and 10,000 to India and several Western states. Other countries in the proximate region, such as Turkmenistan, Uzbekistan and China, host no significant numbers of Afghan refugees. Furthermore, in 2013, 1.1 million people escaped Somalia. Neighboring Kenya hosts almost half of these refugees. The remaining Somali refugees are settled in Ethiopia (241,000), Yemen (230,500) and Djibouti (19,200), and some are found in other African and Western countries (UNHCR, 2014a, b). These two examples demonstrate that refugee figures strongly differ between asylum states. While refugees are pushed out of their home and involuntarily leave their country of origin, their presence in asylum states is not by chance. Yet why are refugees from the same country of origin unequally distributed among potential asylum states? A better understanding of the destination choice of refugees facilitates the allocation of aid resources and assistance for refugees and host countries.
Although the flight destination of refugees is not haphazard, we have little comparative knowledge on the flight routes of refugees or their journey between home and host state (BenEzer and Zetter, 2014). In addition, no study has yet systematically considered the impact of cultural factors, such as the ethnicity of refugees, on the direction of flight. However, a stronger focus on quantitative refugee information is crucial to understand the causes and consequences of refugee movements as a global and recurring phenomenon (Stein, 1981: 320). Therefore, we introduce a new dataset on the ethnicity of refugees that enables us to systematically assess the flight direction of refugees and test the widespread assumption that refugees are often pulled by transnational ethnic linkages to countries with fellow group members.
The article proceeds as follows: we start by reviewing existing approaches to migrant and refugee flight patterns, whereby most findings are from qualitative case studies. The theoretical section draws on logic derived from general migration models and extends it to a systematic approach explaining forced migration patterns. Considering spatial, temporal and cultural features, we hypothesize that refugees flee to neighboring countries with co-ethnic populations and a history of accepting refugees. In the next section, a new cross-national dataset on the ethnicity of refugees is introduced. We explain coding procedures and present examples. Then, we estimate the predicted number of ethnic refugees in all possible countries of asylum with negative binomial hurdle models and the results are subject to several robustness checks. After examining the empirical results, which support the assumption that many refugees follow trans-border ethnic linkages, we discuss the consequences for potential asylum states.
Literature review: Patterns of voluntary and forced migrations
The following section reviews existing literature on migration patterns. Throughout “human history flight has been an important form of migration” (Petersen, 1958: 261). Refugees emigrate owing to push factors such as violence or persecution, while other migrants, at least to some extent, voluntarily leave their home. Building on Crisp’s (1999a) and Wahlbeck’s (2002) arguments that theories of diaspora or transnationalism also apply to refugee movements, we draw on theory explaining migration destination choices to explain refugee flight trends.
The process of migration includes two stages: the decision to migrate and the direction of migration (BenEzer and Zetter, 2014; Dorigo and Tobler, 1983; Moore and Shellman, 2006; Roseman, 1983; Stein, 1981). Migration is the reaction to push factors in the place of residence and pull factors in another location. Since refugees are forced to migrate, the question arises whether or not they make choices in their departure and their flight destination. Scholarship differs on this issue. Some authors claim that refugees are passive migrants dependent upon institutional forces and thus have no choice in their decision to flee or their destination (Day and White, 2002; Petersen, 1958). Other researchers state that, even if flight is immediate and there are few choices of destination, some form of refugee decision-making process exists (Adhikari, 2012; Castles and Loughna, 2003; Moore and Shellman, 2006; Riddle and Buckley, 1998; Robinson and Segrott, 2002). Several studies explicitly argue that refugees do not arbitrarily leave their home to any possible asylum country but that they are pulled in certain directions (Davenport et al., 2003; Iqbal, 2007; Melander and Öberg, 2007; Morrison, 1993; Neumayer, 2004). We build on these latter works and claim that, while refugees are forced to leave and therefore have little impact on the decision to move or not, they can still consider positive incentives or opportunistic reasons when deciding where to flee.
Given that refugees have some choice, where do they go? The literature unanimously emphasizes geographical proximity as the most important factor for flight patterns (e.g. Iqbal, 2007; Melander and Öberg, 2007; Schmeidl, 1997). Owing to their acute situation, refugees are dependent on spatial proximity and mostly settle in countries contiguous to their home state. Most refugees flee by restricted means and lack resources and thus nearby safe havens are more feasible to reach. Forced migrants initially arrive in border areas and are often unable to move further (Schmeidl, 1997: 296). Also, many refugees refuse to journey on because of their desire to return to their place of origin as soon as possible and to maintain connections with the home state (Crisp and Jacobsen, 1998: 29). Moreover, the access to a possible host country is determined by the terrain, mountains, deserts or obstacles like large bodies of water, which was an issue in the Great Lakes refugee crisis where flight directions during the Rwandan genocide were heavily impacted by the fact that the refugees were unable to cross Lake Kivu. Confirming this claim, the UNHCR (2014a) registers more than 80% of the worldwide refugee caseload within their region of origin in countries of first asylum. However, we elaborate in the theoretical section that vicinity alone does not resolve flight patterns to countries of first asylum, but that the settlement location, ethnic ties and a migration history matter as well.
In addition to geography, previous research highlights the following four major pull categories that affect the refugees’ journey: first, “agency” impacts flight patterns, because many refugees must rely on networks of human smugglers and traffickers. Second, refugees are pulled by political factors, because they have incentives to relocate to places offering more secure institutional conditions, such as peace, public order, rule of law and democracy (e.g. Castles and Loughna, 2003; Iqbal, 2007) as well as liberal immigration laws (Robinson and Segrott, 2002; Schaeffer, 2010). Third, similar to other migrants, refugees consider economic and ecological pull factors, such as better standards of living or employment opportunities (Moore and Shellman, 2006; Morrison, 1993; Neumayer, 2004; Schaeffer, 2010; Warziniack, 2013). 2 Finally, refugees use social networks in facilitating their flight (Hein, 1993: 49). On the one hand, refugees traditionally flee along ethnic linkages to neighboring countries or along past colonial ties, because of existing networks, which facilitate travel and lower assimilation costs (e.g. Moore and Shellman, 2004, 2007; Newland, 1993; Schmeidl, 1997). On the other hand, even more than permanent cultural linkages, most authors stress previous migration movements in determining refugee flight directions (e.g. Crisp, 1999a; Havinga and Böcker, 1999; Iqbal, 2007; Koser, 1997; Melander and Öberg, 2007; Moore and Shellman, 2006; Neumayer, 2004; Riddle and Buckley, 1998; Rubin and Moore, 2007).
Other than broad qualitative evidence for refugee flight patterns along cultural networks, no comparative study provides support for this logic and sub-national refugee characteristics have not been systematically considered. While recent years saw an increase in quantitative refugee studies that focus either on the push and pull factors causing refugee outflows (e.g. Iqbal, 2007; Melander and Öberg, 2007; Moore and Shellman, 2007; Rubin and Moore, 2007; Warziniack, 2013) or the consequences of refugees in host countries (e.g. Choi and Salehyan, 2013; Lischer, 2005; Salehyan and Gleditsch, 2006; Salehyan, 2008), qualitative case analysis is still predominant within the refugee field. Hence, refugee flight patterns have so far not received adequate comparative attention. The new global dataset on the ethnicity of refugees contributes to filling this gap. This study on the direction of refugees to countries of first asylum advances the understanding of global forced migration trends by including the ethnicity of refugees and quantitatively evaluates the widespread assumption that refugees follow trans-border ethnic linkages.
Spatio-temporal pull model of the flight direction of refugees
Refugees and their ethnicity
The following section introduces a new theoretical model explaining the flight direction of ethnic refugee groups. We suggest that cultural pull and spatio-temporal factors affect the refugees’ destination choice. Before outlining the theory of refugee flight patterns, we define the key concepts “refugee” and “ethnicity”.
Building on the UNHCR’s refugee definition, 3 we regard a refugee as a person who had to leave their country of origin because of conflict or persecution. 4 Political violence that occurs during conflicts, wars, regime changes or under oppressive regimes is the major source of refugee outflows (Davenport et al., 2003; Melander and Öberg, 2006; Moore and Shellman, 2004; Schmeidl, 1997; Uzonyi, 2014; Weiner, 1993, 1996; Wood, 1994; Zolberg et al., 1986).
Because we argue that ethnicity influences refugee destination choices, this paragraph elaborates our understanding of ethnicity. Emphasizing the constructivist nature of ethnicity, we assume that ethnic group membership can be based on different markers with varying relevance in different political scenarios, such as a common language, religion or physical features (Cederman et al., 2010: 13). Lischer (2005: 22–25) and Lebson (2013: 4, 5) argue that the group-based persecution and the experience of flight strengthen the ethnic identity of a refugee group and create strong social and politicized units among them. Thus, ethnicity is especially relevant for refugees, compared with people who have not been affected by conflict. Nevertheless, we acknowledge that many ethnic refugee communities are heterogeneous in terms of political, social and economic views or within-group identification (see e.g. Koinova, 2009: 2).
Many ethnic groups live in more than one country, because ethnic group membership rarely corresponds to state borders, which often were arbitrarily drawn. However, the perception of being a community and the loyalty toward other group members are not restricted by country borders (Davis and Moore, 1997; Saideman, 2002). These transnational ties affect regional politics, in terms of information exchange, trade, common interests and migration processes.
The logic of refugee flight along transnational ethnic linkages
Civil conflicts, which are responsible for the majority of refugees (Schmeidl, 1997; Wood, 1994), often do not affect an entire country, but only certain regions and ethnic groups. This especially applies to territorial land disputes (Walter, 2003). Many ethnic groups are not evenly distributed across their country but are concentrated in particular territories and cities (Wucherpfennig et al., 2011). Consequently, the risk of becoming a refugee is not equal for all groups living in a country (Petersen, 1958: 261). Building on the spatial proximity argument brought forward by the literature as outlined above, we expect the countries that border the settlement territory of an ethnic group involved in conflict to be most attractive to refugees. Hence, forced migrants do not haphazardly spread out to their home country’s neighbors. Contributing to previous research, we claim that, in addition to the distance between countries, the refugees’ settlement location matters for understanding the flight direction. This leads to the first hypothesis:
Trans-border ethnic linkages are strongly correlated to spatial proximity because kinship connections are mostly found in regionally concentrated and contiguous territories. Therefore, we assume that, among the nearby located potential host countries, those with cultural linkages to ethnic kin are most likely to pull refugees. First, ethnic kinship ties lead to informational advantages because of a shared language. Established information networks provide refugees with knowledge of the security situation in the possible asylum state. Second, cross-border ethnic linkages entail existing transportation networks decreasing the costs of flight (Simmons and Elkins, 2004; Zhukov and Stewart, 2013), such as established and maintained streets or public transport. Third, knowledge of the area that has to be crossed and settled decreases flight costs, which is more likely in the territory of co-ethnics. Many forced migrants have to use hidden tracks as they might be further persecuted or involuntarily repatriated. Fourth, refugees can expect a higher degree of acceptance, tolerance and support from ethnic kin groups in the host country (Weiner, 1993: 105). Informal help makes refugees with local ethnic brethren less reliant on international organizations and humanitarian aid. Finally, thanks to a shared language, religion and habits or similar physical appearance, kin refugees are more likely to integrate in the asylum state. Hence, refugees should have strong reasons to relocate to countries with cultural similarities. Therefore, we hypothesize that:
Besides linkages to local residents, refugees are pulled to countries with relevant co-ethnic migrants such as prior refugee movements. Many refugees follow family and friendship ties and use informal contacts (Day and White, 2002; Havinga and Böcker, 1999; Riddle and Buckley, 1998). Thus, refugees are temporally dependent on previous migration movements (Iqbal, 2007; Rubin and Moore, 2007; Stark, 2004). Past flights facilitate and decrease the cost of future flights because of established transportation networks, including official routes and illegal human smuggling agents (Koser, 1997; Lee, 1966). Also, former and future migrants exchange information, increasing this group’s knowledge of the territory (Day and White, 2002; Faist, 1998; Roseman, 1983). Further, earlier refugee crises imply an already established relief infrastructure in the host state, making a “refugee route … more inviting” (Stark, 2004: 328). In addition, earlier settlers often send remittances or provide assistance to newly arriving refugees (Koser, 1997), such as sharing information on registration or accommodation.
However, forced migrants do not haphazardly follow any previous emigration from their home country. Only refugee groups able to resort to border-crossing kinship ties will be affected by transnational networks. For instance, the exchange of information between former and future refugees depends on a common language. Also, co-operation and assistance depend on a shared group identity and loyalty. Forced migrants of a different ethnicity are unlikely to maintain connections to other identity groups in the home state and should not systematically pull compatriots who belong to other ethnic groups. Furthermore, refugees have no reason to follow a competing ethnic group from the home country, as this may spread the existing rivalry across the border. Hence, migration cycles depend on ethnic group membership and are not homogeneous within countries. Assuming that flight destinations are conditional on time, we expect a path dependency and thus state the following third hypothesis:
New data on the ethnicity of refugees
To test the hypotheses and systematically assess the flight direction of refugees, we need data on the ethnicity of refugees. However, this data has not been readily available until now.
We understand that the ethnic background of a refugee group can be a very sensitive issue. Therefore, refugees are sometimes afraid to share this information and the UNHCR is often unwilling to record and publish it. However, many refugee groups are not homogeneous but consist of several ethnic groups. For instance, in 2009, Kenya hosted refugees from Ethiopia belonging to the Oromo, Amhara or Tigre ethnic group as well as refugees with different ethnic backgrounds from Sudan, Somalia, Uganda, Rwanda and the Democratic Republic of Congo. Knowledge on the ethnicity of refugees is not only important to understand the direction of flight, but the ethnicity might also determine whether a person becomes a forced migrant (Moore and Shellman, 2006; Petersen, 1958). Moreover, the ethnicity can influence how a refugee is received in the asylum country, as for instance cultural similarities with the host population facilitate integration, and refugee status is often provided on ethnic group grounds (Alexander, 1999; Helton, 1983). Also, ethnicity is crucial in the planning of refugee settlements to prevent ethnic rivalries. Hence, data on the ethnicity of refugees is a necessity because refugee groups should not only be analyzed on the country level, but also on a more disaggregated sub-national ethnic group level. To fill this gap in refugee studies, we introduce a new cross-national dataset on the Ethnicity of Refugees (ER).
Refugee stock data and dyadic information on countries of asylum and countries of origin were obtained from the UNHCR (2014b) and the UNRWA (2010). 5 Within each country-dyadic refugee population, we systematically tried to identify up to three of the largest ethnic groups and indicated their share of the total refugee caseload. Although information on the ethnicity of refugees is not provided directly by the UNHCR, it was nevertheless possible to collect information on the ethnic group membership of refugee groups relying on reports and qualitative country assessments from the UNHCR, USCRI, several NGOs, conflict narratives, news articles and several country experts. We used the group list of the Ethnic Power Relations (EPR-ETH) dataset (Cederman et al., 2010) as a source to identify ethnic groups living in a refugee sending country. We collected the ethnicity data for refugee groups that consist of at least 2000 refugees per year between neighboring countries and countries with a maximum distance of 950 km between their borders. Information on borders was obtained from CShapes (Weidmann et al., 2010). Thus, our dataset covers mainly first refugee movements but not secondary flows to third states. Within this framework, we are able to provide worldwide information on the ethnic background of refugees covering the years 1975–2009. The number and the ethnicity of refugees between country-dyads are time variant. These annual shifts are crucial; for example, Rwandan refugee outflows alternated between Hutus and Tutsis.
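The two sampling thresholds described above can be illustrated with a short sketch. It is not the coding procedure actually used for the dataset; the record type `DyadYear`, its field names and the sample records are hypothetical, and only the two thresholds (at least 2000 refugees, at most 950 km between borders) come from the text.

```python
from dataclasses import dataclass

MIN_REFUGEES = 2000      # minimum yearly refugee stock stated in the text
MAX_DISTANCE_KM = 950    # maximum border-to-border distance stated in the text


@dataclass(frozen=True)
class DyadYear:
    """One country-dyad-year; all field names are hypothetical."""
    origin: str
    asylum: str
    year: int
    refugees: int              # refugee stock in that year
    border_distance_km: float  # 0.0 for neighboring countries


def qualifies_for_coding(d: DyadYear) -> bool:
    """Return True if the dyad-year meets both sampling thresholds."""
    return d.refugees >= MIN_REFUGEES and d.border_distance_km <= MAX_DISTANCE_KM


# Made-up records, used only to exercise the filter (not real data).
records = [
    DyadYear("A", "B", 1990, 15000, 0.0),     # neighbor, large stock: kept
    DyadYear("A", "C", 1990, 1500, 0.0),      # neighbor, stock too small: dropped
    DyadYear("A", "D", 1990, 40000, 1200.0),  # large stock, too distant: dropped
    DyadYear("A", "E", 1990, 2000, 950.0),    # exactly on both thresholds: kept
]
coded = [d for d in records if qualifies_for_coding(d)]
```

Both comparisons are inclusive here, which is our reading of "at least 2000" and "maximum distance of 950 km".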
Since precise numbers of refugees from each ethnic group are seldom available, we indicate whether a certain ethnic group within a refugee population was dominant, a majority or a minority. Reports on refugees often give approximate evidence such as: “more than 31,000 [people] from Afghanistan, mostly Hindus, fled to India during the rise of the Taliban in the 1990s” (US Committee for Refugees and Immigrants, 2009). The categorical coding is based on descriptions such as “many refugees” or “few refugees”. These terms and descriptions may have different meanings in different contexts (Cohen, 2013: 466–467). However, it is unlikely that very large ethnic refugee groups are not mentioned in any report or described as small ones. Thus, although the three-point scale is rough and the numbers are approximate, the dataset still permits cross-national comparison of ethnic refugee group shares and estimation of the absolute size of each ethnic refugee group. We applied the rule that, if the refugee population consisted of one dominant ethnic group, then we multiplied the size of the refugee stock, that is, the number obtained from the UNHCR, by the factor 0.95, since there is confidence that at least 95% of the refugees belong to the concerned group. If there was one majority ethnic group within the refugee population, we multiplied it by 0.65. If several ethnic groups were identified within a refugee movement (the coding rules allow for a maximum of three ethnic groups), the multiplying factors were readjusted according to the rules displayed in Table 1. The total share is mostly below 1 in order to account for uncertainty. 6 For example, Ethiopia hosted 21,018 refugees from Eritrea in 2008; the majority of them were ethnic Tigrinya, the first minority was ethnic Afar, and the second minority was ethnic Kunama.
According to these rules, the Tigrinya caseload constitutes 12,611 persons (0.6 × 21,018), the Afar 6305 (0.3 × 21,018) and the Kunama 1051 (0.05 × 21,018).
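The arithmetic of this example can be reproduced with a minimal sketch. The function name `estimate_group_sizes` is hypothetical, and the share factors are passed in explicitly rather than looked up, because the full set of readjustment rules is given in Table 1; only the factors 0.6, 0.3 and 0.05 used below are taken from the example in the text.

```python
def estimate_group_sizes(stock: int, factors: dict[str, float]) -> dict[str, int]:
    """Multiply a dyadic refugee stock by each group's share factor.

    `factors` maps a group name to its multiplying factor. The checks mirror
    two statements in the text: at most three groups are coded per dyad, and
    the total share does not exceed 1.
    """
    if len(factors) > 3:
        raise ValueError("the coding rules allow at most three ethnic groups")
    if sum(factors.values()) > 1.0 + 1e-9:
        raise ValueError("share factors must not sum to more than 1")
    return {group: round(stock * share) for group, share in factors.items()}


# Factors from the Eritrea-Ethiopia 2008 example (majority plus two minorities).
eritrea_2008 = estimate_group_sizes(
    21018, {"Tigrinya": 0.6, "Afar": 0.3, "Kunama": 0.05}
)
print(eritrea_2008)
```

The three estimates sum to 19,967, about 95% of the stock of 21,018, which reflects the deliberate allowance for uncertainty in the total share.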
Table 1. Share of ethnic refugee group within refugee stock
The ER dataset can be summarized as follows: 3129 country-dyad-years from 1975 to 2009 in the UNHCR dataset qualified for the ethnicity coding; that is, a country-dyadic refugee stock consists of at least 2000 refugees and the countries are within a maximum distance of 950 km. We identified 516 different ethnic refugee groups from 189 countries of origin constituting 5197 ethnic refugee group-years of which we estimated the size. Of those, 1895 (36%) belong to the category dominant, 939 (18%) are a majority and the remaining 2363 (46%) are a minority. The average ethnic refugee group-year, calculated according to the previously presented estimation rules, consists of 60,000 refugees, and the median ethnic refugee group size is 7000. Four groups constitute more than a million people: the Hutu from Rwanda in Zaire (1994–1995), the Palestinians in Jordan (1992–2009), the Somali in Ethiopia (1979–1980) and the Pashtun from Afghanistan in Pakistan (1980–2009). Approximately 60% of the country-dyadic refugee populations are ethnically homogeneous, that is, they are composed of one dominant ethnic group with only small additional minority ethnic groups. The remaining 40% are either composed of a majority with one or two minorities or several minority ethnic groups. Thus, not pursuing a disaggregated approach would underestimate the ethnic complexity of refugee movements.
Conflicts and consequent refugee movements are rare events; thus, most countries neither produce nor host refugees. Between 1975 and 2009, 80% of the refugees relocated to a neighboring country or a country in proximity to their home state. Focusing on estimated absolute numbers, refugees who found ethnic kin in the country of asylum constitute 46%. Thus, refugees who flee along border-crossing ethnic linkages constitute a considerable share of the total refugee caseload. The remaining part of this study analyzes these descriptive findings statistically.
Research design
The unit of analysis is directed ethnic group-country-dyad-years, where the origin country in the dyad produces ethnically identified refugees. Since we disaggregate refugee stocks to the ethnic group-level, directed country-dyad-years may appear up to three times in the dataset. For instance, ethnic Azande and ethnic Dinka fled from Sudan to the Central African Republic simultaneously. Thus, our sample includes all years with refugee outflows. To concentrate on pull factors, which determine the flight direction, we only analyze countries producing refugees.
Figure 1 illustrates the outflow of ethnic Acholi from Uganda in 1984. According to the sampling rules the first observation is ethnic Acholi Uganda–Sudan in 1984, the second observation Acholi Uganda–Ethiopia, the third observation Acholi Uganda–Kenya, and so on. Supporting the hypotheses, the majority of Acholi refugees escaped to Sudan, which borders their settlement territory and hosts a local ethnic Acholi minority. The other neighboring countries of Uganda received comparatively few Acholi refugees, or none as in the case of Zaire.

Figure 1. Example of unit of analysis: Acholi refugee outflow from Uganda in 1984.
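The construction of the unit of analysis can be sketched as follows: every refugee-producing ethnic group is paired with every potential country of asylum in a given year, and dyads without recorded refugees keep a count of zero. The function `build_observations` and its argument names are hypothetical. The list of hosts follows the Uganda 1984 illustration, but the refugee counts are placeholders chosen only to show the structure and are not figures from the dataset.

```python
from itertools import product


def build_observations(origin, year, groups, candidate_hosts, refugee_counts):
    """Return one row per ethnic group and potential country of asylum.

    `refugee_counts` maps (group, host) to an estimated number of refugees;
    missing pairs are treated as zero so that non-receiving hosts stay in
    the sample.
    """
    rows = []
    for group, host in product(groups, candidate_hosts):
        rows.append({
            "group": group,
            "origin": origin,
            "asylum": host,
            "year": year,
            "refugees": refugee_counts.get((group, host), 0),
        })
    return rows


# Placeholder counts (NOT real data); Zaire is left out, so its count is zero.
placeholder_counts = {
    ("Acholi", "Sudan"): 1000,
    ("Acholi", "Ethiopia"): 10,
    ("Acholi", "Kenya"): 10,
}
rows = build_observations(
    "Uganda", 1984, ["Acholi"],
    ["Sudan", "Ethiopia", "Kenya", "Zaire"],
    placeholder_counts,
)
```

Keeping the zero rows matters for the later estimation, because the binary part of a hurdle model needs the potential hosts that received no refugees.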
We consider the spatial dimension of refugee movements by controlling for the minimal distance between an ethnic group’s settlement territory and the possible country of asylum, because refugees are more likely to relocate to nearby countries. Data on settlement zones of ethnic groups was obtained from the GeoEPR dataset (Wucherpfennig et al., 2011). We also include the minimal distance between the home and the host country (Weidmann et al., 2010).
To account for the temporal dependence of refugee flights, that is, the hypothesis that refugees follow previous ethnic kin refugees, we count the years with co-ethnic refugees within a dyad. Further, considering duration dependence, we use natural cubic splines of the years with no co-ethnic refugees with three knots that are placed at equally spaced intervals (Beck et al., 1998).
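A minimal sketch of the two temporal quantities is given below. It assumes, as our own simplification, that both are measured up to the previous year so that the current outcome does not enter its own predictor; the text does not specify this detail. The function name `temporal_covariates` and the sample series are hypothetical, and the natural cubic spline transformation of the second quantity is not implemented here.

```python
def temporal_covariates(counts):
    """Compute temporal covariates for one ethnic group-dyad.

    `counts` is a chronologically ordered list of yearly co-ethnic refugee
    counts. For each year the function returns a tuple:
      (number of earlier years with co-ethnic refugees,
       number of consecutive earlier years without co-ethnic refugees).
    """
    out = []
    years_with = 0      # running count of earlier years with refugees
    spell_without = 0   # length of the current run of refugee-free years
    for n in counts:
        out.append((years_with, spell_without))
        if n > 0:
            years_with += 1
            spell_without = 0
        else:
            spell_without += 1
    return out


# Made-up yearly counts for a single dyad (not real data).
series = [0, 0, 500, 800, 0, 0, 300]
covs = temporal_covariates(series)
```

In this sketch the refugee-free spell resets to zero whenever refugees are observed, which is the usual input to duration-dependence splines in the sense of Beck et al. (1998).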
At the country level, we account for political and economic pull factors that affect the refugees’ destination choice. For these variables, the difference between the sending and the receiving country is used, because many refugees are strongly orientated toward their country of origin and compare their current status with other places (Colic-Peisker and Walker, 2003: 345–346). We include the difference in the share of excluded population between the host and the origin country given by EPR-ETH. We assume that refugees move toward host countries with a high share of people included in state politics, that is, where the political performance is better than in the origin country. As an alternative measurement of the quality of democracy, the difference in the X Polity IV index (Vreeland, 2008) between the host and the source country is included. We claim that refugees flee to countries that are more democratic than their country of origin. Also, refugees, seen as rational utility maximizers, should flee to wealthier countries than their home country, because poor countries have less capacity to accommodate refugees, since providing shelter and food depends on financial and natural resources. Therefore, the difference in the logged and one year lagged annual gross domestic product (purchasing power parity-adjusted real per capita GDP) between the host and the source country is included. GDP data are taken from several sources (Fearon and Laitin, 2003; Gleditsch, 2008; Heston et al., 2011; World Bank, 2011). However, a low GDP and medium X Polity IV value as proxies for low state capacity and consequent accessibility of a state could also lead to more refugees, because forced migrants often have to rely on porous borders to enter a country. Thus, the capacity of a potential asylum state, whether it is either willing to accommodate refugees or unable to control its borders, determines the flight routes of forced migrants (Adamson, 2006). 
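As an illustration of how such a host-minus-origin difference can be computed, the following sketch builds the gap in logged, one-year-lagged GDP per capita. The function name `lagged_log_gdp_gap` and the GDP values are hypothetical; only the definition (host minus source, logged, lagged by one year) follows the text.

```python
import math


def lagged_log_gdp_gap(gdp_host, gdp_origin, year):
    """Host minus origin log GDP per capita, both lagged by one year.

    `gdp_host` and `gdp_origin` map a year to GDP per capita. A positive
    value means the potential host was wealthier than the origin country
    in the previous year.
    """
    return math.log(gdp_host[year - 1]) - math.log(gdp_origin[year - 1])


# Made-up GDP series (not real data).
host = {1999: 2000.0, 2000: 2100.0}
origin = {1999: 1000.0, 2000: 900.0}
gap = lagged_log_gdp_gap(host, origin, 2000)  # uses the 1999 values
```

Because the difference of logs equals the log of the ratio, the gap here is the log of the host's GDP relative to the origin's, so a host twice as wealthy yields log(2) regardless of the absolute level.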
Further, we control for the population size (logged) of the country of asylum, because, as mentioned above, more populous countries can more easily absorb a high number of refugees. Population data are taken from Penn World Tables (Heston et al., 2011). Since several studies found that asylum policies do not systematically impact the refugees’ destination choice (see e.g. Day and White, 2002; Schaeffer, 2010) and because no cross-national data is readily available, we do not control for refugee policies of host states.
Although we focus on pull factors, we consider the push mechanism of violence and control for the severity of the conflict that produced the refugees by including the (logged + 1) number of battle deaths in the country of origin provided by Lacina and Gleditsch (2005). 8 Conflict intensities substantially differ between the various countries of origin. A severe conflict is likely to produce more refugees. Usually, unarmed people become refugees; therefore, we control for the impact on civilians by including the roughly estimated annual number of civilians intentionally killed, as obtained from the genocide/politicide indicator from the Political Instability Task Force (2012) consisting of 11 categories (0, 0.5, 1, …, 5; Political Instability Task Force, 2009). A high level of hostilities and fatalities usually generates massive refugee outflows (Kathman, 2011). Both severity dimensions are only available for the country level and do not distinguish between targeted ethnic groups. Table 2 displays the descriptive statistics of the variables.
Table 2. Summary statistics
Model and results
We apply hurdle models with negative binomial distribution and country of origin-clustered robust standard errors to assess refugee flight patterns. The dependent variable, the number of ethnic refugees, is a count variable with overdispersion and an excessive number of zero observations. The distribution of the number of ethnically identified refugees is highly right-skewed even when focusing on non-zero observations exclusively. 9 The count of refugees is strongly zero-inflated because few dyads experience refugee movements: refugees seldom flee to all possible countries of asylum. Hence, we use a hurdle model, which is appropriate if two different processes generate the data: one for the binary distinction between zero and non-zero observations, and one for the count of non-zero observations once the “hurdle is crossed” (Mullahy, 1986: 345). Thus, we distinguish two outcomes: first, whether a country hosts any refugees; and second, the count of refugees. Some factors determine whether any refugees are found in a country, and a different, but not mutually exclusive, set of variables determines the count of these refugees, once there are refugees in a given asylum state. This distinguishes the hurdle model from the more commonly used zero-inflated count model, which assumes two different processes generating the zero observations. In the present analysis, we only have one process generating zeros because we reduced the sample to relevant directed dyads consisting of refugee-sending states and potential countries of asylum.
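The logic of the two-part model can be made concrete with a short sketch of the hurdle probability function. It is a generic textbook formulation, not the estimation code used for this study. The symbols are assumptions introduced for illustration: `pi` is the probability that the hurdle is crossed (in practice the fitted value of the logit part), `mu` is the mean of the untruncated negative binomial (in practice an exponential function of the covariates) and `alpha` is the overdispersion parameter in the common NB2 parameterization.

```python
import math


def nb_pmf(y, mu, alpha):
    """Negative binomial (NB2) probability of count y with mean mu.

    The variance is mu + alpha * mu**2, so alpha > 0 captures overdispersion.
    """
    r = 1.0 / alpha
    log_p = (math.lgamma(y + r) - math.lgamma(r) - math.lgamma(y + 1)
             + r * math.log(r / (r + mu)) + y * math.log(mu / (r + mu)))
    return math.exp(log_p)


def hurdle_pmf(y, pi, mu, alpha):
    """Probability of observing y refugees under the hurdle model.

    Zeros come only from the binary part; positive counts follow a
    zero-truncated negative binomial, rescaled by the crossing probability.
    """
    if y == 0:
        return 1.0 - pi
    return pi * nb_pmf(y, mu, alpha) / (1.0 - nb_pmf(0, mu, alpha))


def hurdle_mean(pi, mu, alpha):
    """Expected count implied by both parts of the hurdle model."""
    return pi * mu / (1.0 - nb_pmf(0, mu, alpha))


# Illustrative parameter values (not estimates from the paper).
total = sum(hurdle_pmf(y, 0.3, 5.0, 0.5) for y in range(400))
mean_by_sum = sum(y * hurdle_pmf(y, 0.3, 5.0, 0.5) for y in range(400))
```

In this formulation the probability of a zero depends only on `pi`, which is why a single process generates the zeros, whereas a zero-inflated model would add the zeros of the count distribution on top of a separate zero-generating process.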
The two data-generating processes can be modeled separately. Consequently, the hurdle model contains two parts (McDowell, 2003: 178): we use logit for the binary outcome of whether any refugees are present. The count of ethnically identified refugees is governed by a zero-truncated negative binomial distribution. The number of refugees $r_{o,a}$ from country of origin $o$ in country of asylum $a$ is based on a vector of variables specific to the country of origin
Table 3 shows the results of the hurdle regression models. The first part displays the zero-truncated negative binomial regression model of the count of refugees greater than zero, while the second part depicts the binary logit regression for zero or non-zero refugees. In line with the two different processes generating ethnic refugees, the two equations include different sets of explanatory variables.
Table 3. Regression results, number of ethnic refugees
Standard errors in parentheses (clustered on country of origin).
Cubic splines of years without refugees not shown.
*p < 0.1, **p < 0.05, ***p < 0.01.
The main independent variable in the first model measures whether or not the refugees have transnational ethnic linkages to the country of asylum. The spatio-temporal and control variables are added in the second model, including a binary variable measuring whether the countries of origin and asylum are neighbors. In Model 3, the logged distance between an ethnic group’s settlement territory and the possible country of asylum is added instead of this neighbor variable. The Polity control variable is omitted in the fourth model because it has many missing values; dropping it yields a higher number of non-zero observations. In the fifth model, the refugees’ TEK linkages are disaggregated according to whether they affect politically included or excluded ethnic groups. Model 6 includes an interaction term of the distance between an ethnic group’s settlement area and the possible country of asylum with the presence of trans-border ethnic ties. In all models, we test for overdispersion with the parameter α, which is positive and significant, confirming that the observations are overdispersed and follow a negative binomial rather than a Poisson distribution.
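The role of α can be illustrated with a rough method-of-moments calculation. Under the NB2 parameterization (an assumption here), the variance is μ + αμ², so α = 0 recovers the Poisson case. The sketch below is not the likelihood-based test reported in the models, and the counts are invented for illustration.

```python
from statistics import mean, variance

def moment_alpha(counts):
    """Method-of-moments dispersion estimate: alpha = (var - mean) / mean^2."""
    m = mean(counts)
    v = variance(counts)  # sample variance
    return (v - m) / (m * m)

# Hypothetical, heavily right-skewed counts (not from the ER dataset).
toy_counts = [1, 2, 2, 3, 5, 8, 40, 250]
print(moment_alpha(toy_counts) > 0)  # positive alpha signals overdispersion
```

A clearly positive estimate means the variance exceeds the mean by more than a Poisson model allows.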
The first bivariate model confirms Hypothesis 2 that refugees move along transnational ethnic linkages: country-dyads where the refugees produced in the origin country have trans-border ethnic ties to the possible country of asylum receive significantly higher numbers of refugees than country-dyads without ethnic links. Thus, ethnic pull factors are a determinant of refugee flight patterns. The second part of Table 3, displaying the binary model of refugee presence, reveals that trans-border ethnic ties also increase a country’s chances of hosting refugees.
The second model indicates that temporal factors also have a strong impact on the count of refugees: the number of refugees within a dyad is positively and significantly affected by previous refugee movements, yielding evidence for the third hypothesis. Similarly, neighboring countries observe a higher predicted count of refugees compared with the other countries in the sample, which are within a maximum distance of 950 km. Hence, Hypothesis 1, that spatial factors matter for the flight direction of refugees, is supported. Further, the larger the kin group compared with other ethnic groups in the host country, the higher the predicted number of co-ethnic refugees. Also, the total population size of the potential host country has a positive and significant effect on refugee numbers. Thus, refugees are particularly pulled by large co-ethnic groups. However, against expectations, conflict-involved kin groups do not significantly deter forced migrants. As a push factor, the number of battle-related deaths in the home country significantly increases the number of refugees within a dyad. The magnitude of civilian deaths, however, has no significant effect. The remaining control variables accounting for political and economic pull factors do not have a significant impact. This is in contrast to previous studies that suggest that migrants are pulled by democracy and wealth. Hence, using more refined refugee data, we find that geography and cultural similarities are better predictors of refugee destinations than the economic situation or the governance type in a country.
The logit part of Model 2 indicates that the probability of hosting refugees is significantly determined by spatio-temporal factors; for example, neighboring dyads are more likely to observe refugees. Hence, distance matters, and refugees, owing to their restricted means of transportation, usually relocate to neighboring countries. Further, the probability of receiving refugees increases with every year in which the dyad observed refugees. Thus, temporal dynamics are important because refugees follow previous forced migrants. The chances of hosting refugees decrease with the ongoing absence of refugee movements, and refugee-producing conflicts are less likely to break out after longer periods of peace.
Instead of the binary variable measuring whether the country of origin and the possible country of asylum are neighbors, in Model 3, the logged minimal distance between an ethnic group’s settlement territory and the possible host country is added to further test Hypothesis 1. The coefficient is negative and significant. Hence, the farther away a country is from the region where a group lives, the lower the predicted number of refugees is, which underlines the relevance of spatial proximity in refugee flight studies. The effect of trans-border ethnic ties on refugee figures remains positive and significant when controlling for the distance. However, the logit model reveals that precise spatio-temporal features alone account for the risk of hosting refugees, because the coefficient for trans-border ethnic ties becomes insignificant. This is partially explained by the strong correlation between transnational ethnic ties and distance between two countries as cross-border co-ethnics are almost exclusively settled in contiguous territories.
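Because distance enters in logged form and the count equation uses a log link, the coefficient can be read as an elasticity of the underlying negative binomial mean: scaling the distance by a factor k multiplies the mean by k raised to the coefficient. The sketch below assumes a plain log(distance) term, and the coefficient value is hypothetical, not an estimate from Table 3.

```python
def count_ratio(beta_log_distance, distance_factor):
    """Multiplicative change in the NB mean when distance is scaled by distance_factor."""
    return distance_factor ** beta_log_distance

beta = -0.5  # hypothetical coefficient, NOT taken from Table 3
print(count_ratio(beta, 2.0))  # effect of doubling the distance
```

With a negative coefficient the ratio is below one, which corresponds to the lower predicted refugee numbers at greater distances.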
In Model 4, we excluded the control variable measuring the difference in the Polity value between the countries of origin and asylum. Now we have an increased number of non-zero observations, that is, directed dyads experiencing a refugee movement. Again, ceteris paribus, the predicted number of refugees is higher in country-dyads where the refugee group has ethnic linkages, where the distance between the group’s settlement territory and the asylum state is short and where there were earlier refugee flows. Based on Model 4, Figure 2 displays the difference in the average predicted count of ethnically identified refugees for countries with and without trans-border ethnic ties.

Figure 2. Predicted count of refugees with and without trans-border ethnic linkages.
To test the claim that refugees consider the political status of their kin group, in Model 5, the main explanatory variable is divided into two binary variables measuring (1) whether the transnational ethnic kin group of the refugees is politically powerful or not, and (2) whether or not it is excluded from central decision-making. The assumption that refugees flee toward politically powerful ethnic kin groups is supported, but only at the 10% significance level. This suggests that refugees not only consider cultural similarities in their destination choice, but to a lesser degree also consider political factors. However, political marginalization does not significantly deter refugees. Many more EPR-TEK groups are politically excluded than included, and refugees, if they are not naturalized, will not have political rights in the country of asylum; thus, political opportunities play a less important role, particularly in the short term.
Assuming the effect of cross-border cultural linkages to be conditional on the distance, in Model 6, we added an interaction term of ethnic kin links and the logged distance. The constituent coefficients change neither sign nor significance. To be able to interpret the interaction term, Figure 3 shows the first difference in the mean predicted number of refugees between dyads that are linked by ethnic ties and those that are not, as a function of increasing distance. The gray area depicts the 95% confidence interval.

Figure 3. Interaction effect of transnational ethnic ties and distance (logged) on refugee numbers.
The figure demonstrates that, while trans-border ethnic ties do not significantly affect the count of refugees in contiguous states, transnational ethnic linkages significantly increase the predicted number of refugees in countries that are more distant from the home territory of an ethnic group. The binary logit regressions of Models 4–6 reveal that the probability of hosting any refugees mainly depends on the distance to a possible host country and previous refugee groups. A Wald test of the full count model compared with a restricted model without trans-border ethnic kin confirms that including transnational ethnic linkages produces a better fit to the number of refugees.
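The mechanics of such a first difference can be sketched for the negative binomial mean with a log link. All coefficient values and names below are hypothetical and chosen only to show the computation; they are not the Model 6 estimates, and the confidence band in the figure would additionally require the estimated coefficient covariance matrix, which is not reproduced here.

```python
import math

# Hypothetical coefficients for illustration only; NOT the estimates in Table 3.
COEFS = {"b0": 8.0, "b_tek": 0.2, "b_dist": -0.6, "b_int": 0.15}

def predicted_mean(tek, log_dist, b0, b_tek, b_dist, b_int):
    """NB mean exp(eta) for a dyad with (tek=1) or without (tek=0) ethnic ties."""
    return math.exp(b0 + b_tek * tek + b_dist * log_dist + b_int * tek * log_dist)

def first_difference(log_dist, coefs):
    """Predicted mean with ties minus predicted mean without ties at a given logged distance."""
    return predicted_mean(1, log_dist, **coefs) - predicted_mean(0, log_dist, **coefs)

def tie_ratio(log_dist, coefs):
    """Ratio of the two predicted means; depends only on the tie and interaction terms."""
    return math.exp(coefs["b_tek"] + coefs["b_int"] * log_dist)

for log_dist in (0.0, 2.0, 4.0, 6.0):
    print(log_dist, first_difference(log_dist, COEFS), tie_ratio(log_dist, COEFS))
```

With a positive interaction coefficient, the relative advantage of dyads with ethnic ties grows with distance, even though the absolute counts fall for both types of dyad.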
Sensitivity analysis
As a robustness check of the results supporting the hypotheses that refugees travel short distances and follow ethnic ties or previous refugees, we estimate five additional models, displayed in Table 4.
Table 4. Robustness checks, number of ethnic refugees
Standard errors in parentheses (clustered on country of origin).
Cubic splines of years without refugees not shown.
*p < 0.1, **p < 0.05, ***p < 0.01.
Model 7 includes dummy variables for the region of the country of origin to correct for unit-specific heterogeneity. They comprise America (baseline category), Europe, Sub-Saharan Africa, North Africa, the Middle East, West Asia and South East Asia. The partially significant coefficients for the world regions reveal that there are regional differences, and the impact of border-crossing ethnic connections on the predicted count of ethnic refugees becomes insignificant. However, the positive and significant effect of the size of the trans-border kin group on the number of refugees stands.
The eighth model is estimated only for ethnic refugee groups consisting of fewer than 100,000 refugees. It excludes the observations above the 98.5th percentile and allows us to check whether the outliers with high numbers of refugees, up to over 1 million, have strong leverage on the results. The coefficient for ethnic linkages remains positive and significant, albeit only at the 5% level, which is a consequence of the smaller number of observations. Similarly, the size of the TEK group is positive and significant with low standard errors.
In Model 9, we use different estimations of the refugee figures presented in Table 1. Numbers of ethnic refugee groups have to be handled with care because they are estimated according to whether the coders identified the group as dominant, majority or minority within a refugee population. Thus, we recalculated the absolute ethnic refugee group sizes with more conservative numbers to guard against overestimating the real refugee group sizes. We multiplied the absolute size of the country-dyadic refugee stock, as obtained from the UNHCR (2014b) and the UNRWA (2010), by the factor 0.75 for dominant ethnic refugee groups, 0.5 for majority groups and 0.05 for minorities. These multipliers are the lowest values still in accordance with the coding instructions of the ER dataset. Yet, using these restricted refugee group sizes does not change the results.
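The recalculation can be sketched as a simple lookup. The multipliers are those stated above; the category labels, function name and example stock are hypothetical and serve only to show the arithmetic.

```python
# Conservative multipliers as stated in the text; the dictionary keys are assumed labels.
CONSERVATIVE_SHARE = {"dominant": 0.75, "majority": 0.5, "minority": 0.05}

def conservative_group_size(dyad_stock, category):
    """Conservative size of an ethnic refugee group within a country-dyadic refugee stock."""
    return dyad_stock * CONSERVATIVE_SHARE[category]

# Hypothetical dyadic stock of 200,000 refugees.
for category in ("dominant", "majority", "minority"):
    print(category, conservative_group_size(200_000, category))
```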
Model 10 is calculated with a zero-inflated negative binomial (ZINB) regression. The ZINB model is similar to the hurdle model but is more frequently used in social sciences. Comparing the coefficients of this model with Model 4 of Table 3 shows very consistent results. Thus, the claim that refugee numbers depend on ethnic networks and spatial proximity is again supported. Only the coefficients of the second part of the ZINB model change signs, because, in contrast to the hurdle model, here the probability of not hosting refugees is computed.
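The sign change can be illustrated by comparing how the two models produce zeros. In the sketch below (NB2 parameterization and all names are assumptions), the hurdle logit models the probability of a non-zero count, whereas the ZINB inflation equation models the probability of a structural zero; since logistic(−η) = 1 − logistic(η), the same relationship appears with opposite coefficient signs.

```python
import math

def logistic(eta):
    """Inverse logit."""
    return 1.0 / (1.0 + math.exp(-eta))

def nb_pmf0(mu, alpha):
    """P(Y = 0) under a negative binomial (NB2) with mean mu and dispersion alpha."""
    r = 1.0 / alpha
    return (r / (r + mu)) ** r

def p_zero_hurdle(pi_nonzero):
    """Hurdle model: all zeros come from not crossing the hurdle."""
    return 1.0 - pi_nonzero

def p_zero_zinb(psi_always_zero, mu, alpha):
    """ZINB: zeros come from structural zeros plus zeros of the count process."""
    return psi_always_zero + (1.0 - psi_always_zero) * nb_pmf0(mu, alpha)

eta = 0.7  # hypothetical linear predictor
print(logistic(eta), logistic(-eta))  # P(non-zero) vs. the complementary probability
```

A covariate that raises the chance of hosting refugees therefore carries a positive sign in the hurdle logit and a negative sign in the ZINB inflation equation.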
Model 11 estimates the probability of receiving ethnic kin refugees with a simple logit model. Consistent with the second part of the hurdle models, the dependent variable is binary (1 for country-dyads with refugees and 0 without), but we include all explanatory variables used in the count equations. In contrast to the count models, countries where refugees find ethnic kin have no significantly higher probability of hosting these refugees. However, the coefficient for the relative size of the EPR-TEK group is positive and significant. Thus, larger ethnic groups have a higher probability of hosting any co-ethnic refugees. Again, time dynamics and space affect the chances of receiving refugees.
Conclusion
This study introduces a new cross-national dataset on the ethnicity of refugees and examines the flight direction of refugee groups to countries of first asylum. Qualitative evidence suggests that many forced migrants seek refuge among trans-border ethnic kin. However, comparative studies mainly focus on the different push factors producing refugees. Only a few authors analyze pull mechanisms, finding that refugees tend to be drawn to democratic and wealthy states and that they are highly affected by spatio-temporal measures, that is, distance and previous migration flows. This study provides the first systematic test of the claim that refugees follow transnational ethnic linkages. The analysis of the suggested causal logic combining refugee destinations and border-crossing group identities increases the understanding of the relevance of ethnicity of refugees and contributes to the knowledge of refugee flight destinations, consequently facilitating the management of refugee crises.
Drawing on migration theory and assuming that refugees and voluntary migrants have common characteristics, we claim that refugees are not only affected by so-called push factors, such as violence and persecution in the country of origin, but also by pull factors impacting the direction of the refugee movement. Hence, forced migrants do not leave their homes for haphazard host states. Therefore, we introduce an ethno-political pull model with spatio-temporal features to approach the complexity of refugee flight processes. We hypothesize that, first, refugees seek asylum in countries as close as possible to their ethnic group’s settlement territory because of feasibility and lower costs. Second, refugees are pulled to countries with cultural and ethnic similarities. Trans-border ethnic groups often feature established border-crossing networks such as transportation or information flows. Also, cultural similarities between refugees and the host population facilitate accommodation and integration. Third, we hypothesize that refugees move along similar patterns as earlier co-ethnic refugee groups.
Since no comparative information on the ethnicity of forced migrants was available, we present novel worldwide data on the ethnicity of refugees. The dataset records the ethnic composition of all country-dyadic refugee stocks consisting of at least 2000 persons as obtained from the UNHCR (2014b) and covers the years from 1975 to 2009. The statistical results indicate that refugees follow transnational ethnic linkages. Countries that are linked by ethnic ties to a refugee group have a higher predicted count of refugees than countries without these ethnic ties. Thus, although forced to leave their home, refugees consider cultural pull factors directing them toward certain countries of asylum. Also, we find that possible host countries where the refugees’ kin group is relatively large have a higher predicted count of refugees. The causal mechanism behind this is that refugees are pulled by larger co-ethnic groups, which have better capacities to absorb the influx. Further, confirming earlier quantitative and qualitative studies, all models reveal a strong spatial dependence of refugee movements: refugees are more likely to move to neighboring states, in particular, to countries of asylum that are located close to the settlement area of the concerned ethnic group. Finally, refugees often follow previous refugee or migration groups. Hence, refugees are temporally dependent on previous migration flows because of established transportation networks and aid facilities. However, focusing on ethnic refugee groups, in contrast to the country-level approach of previous studies (e.g. Iqbal, 2007; Moore and Shellman, 2007), we did not find significant evidence that refugees are pulled by other political or economic factors. Governments refusing to host refugees, thus, falsely raise fears that refugees seek wealth only.
To conclude, the results confirm that many ethnic refugee groups flee to kin groups in neighboring countries, suggesting that sub-national refugee characteristics such as ethnicity are essential to comprehend refugee flight patterns. Hence, this study has shown that refugee locations are not haphazard and are dependent on ethnic linkages. This knowledge can help stakeholders to better prepare and distribute humanitarian aid in future refugee crises.
Footnotes
Appendix
Acknowledgements
We thank Lars-Erik Cederman, Simon Hug, Idean Salehyan, Anne Hammerstad, Sarah Lischer, Fabien Cottier, Caroline Hartzell and three anonymous reviewers for their helpful comments. We also thank Nadja Schloss for her valuable research assistance.
Funding
This research was funded by the Swiss Network for International Studies.
