Using geolocated Twitter data to study recent patterns of international and internal migration in OECD countries

Ingmar Weber, Yahoo! Research Barcelona
Kiran Garimella, Qatar Computing Research Institute
Emilio Zagheni, Queens College, City University of New York (CUNY)
Bogdan State, Stanford University

Data about migration flows are largely inconsistent across countries, typically outdated, and often nonexistent. Despite the importance of migration as a driver of demographic change, migration statistics are of limited availability. Researchers generally rely on census data to estimate flows indirectly, but little can be inferred about specific years between censuses or about recent trends. The increasing availability of geolocated data from online sources has opened up new opportunities to track recent trends in migration patterns and to improve our understanding of the relationships between internal and international migration. In this paper, we use geolocated data for about 500,000 users of the social network website Twitter in OECD countries, covering the period May 2011 to April 2013. For the subsample of users who posted geolocated tweets regularly, we evaluated movements within and between countries over independent four-month periods. Since Twitter users are not representative of the OECD population, we cannot infer migration rates at a single point in time. However, using a difference-in-differences approach, we could evaluate trends in out-migration rates for single countries, as well as the heterogeneity in mobility patterns of migrants and non-migrants. We estimated the age and gender of users by applying face recognition software (Face++) to their profile pictures. Preliminary results indicate that the approach may be useful for predicting turning points in migration trends, which is particularly relevant for migration forecasting. We observed considerable heterogeneity in the relationship between within-country and cross-country mobility across OECD countries. Our analysis relies exclusively on publicly available data that could potentially be available in real time and could be used to monitor migration trends.
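The difference-in-differences idea mentioned in the abstract can be illustrated with a minimal sketch. The abstract does not specify the estimator, so everything below is an assumption made for illustration: the record layout `(user_id, country_at_start, country_at_end)`, the function names, the use of a pooled rate over all observed users as the reference group, and the toy data. The intuition is that if the selection bias of Twitter users is roughly constant over time, comparing the change in one country's observed out-migration rate with the change in the reference rate removes much of that bias.

```python
from collections import Counter

def out_migration_rates(observations):
    """Share of users per origin country who ended the period elsewhere.

    observations: iterable of (user_id, country_at_start, country_at_end).
    This record layout is hypothetical, chosen only for the example.
    """
    at_risk = Counter()
    movers = Counter()
    for _user, origin, destination in observations:
        at_risk[origin] += 1
        if destination != origin:
            movers[origin] += 1
    return {country: movers[country] / at_risk[country] for country in at_risk}

def pooled_rate(observations):
    """Out-migration rate over all users, used here as the reference group."""
    observations = list(observations)
    if not observations:
        raise ValueError("no observations in this period")
    moved = sum(1 for _user, origin, dest in observations if dest != origin)
    return moved / len(observations)

def diff_in_diff(rate_t0, rate_t1, ref_t0, ref_t1):
    """Change in a country's rate minus the change in the reference rate."""
    return (rate_t1 - rate_t0) - (ref_t1 - ref_t0)

# Toy data for two four-month periods (invented, not from the study).
period_0 = [("u1", "MX", "MX"), ("u2", "MX", "MX"), ("u3", "MX", "MX"),
            ("u4", "MX", "US"), ("u5", "US", "US"), ("u6", "US", "US"),
            ("u7", "US", "US"), ("u8", "US", "US"), ("u9", "US", "US"),
            ("u10", "US", "MX")]
period_1 = [("u1", "MX", "MX"), ("u2", "MX", "MX"), ("u3", "MX", "US"),
            ("u4", "MX", "US"), ("u5", "US", "US"), ("u6", "US", "US"),
            ("u7", "US", "US"), ("u8", "US", "US"), ("u9", "US", "US"),
            ("u10", "US", "MX")]

rates_0 = out_migration_rates(period_0)
rates_1 = out_migration_rates(period_1)
effect = diff_in_diff(rates_0["MX"], rates_1["MX"],
                      pooled_rate(period_0), pooled_rate(period_1))
print(f"Relative change in MX out-migration rate: {effect:+.2f}")
```

A positive value suggests that out-migration from the country rose faster than in the reference group. Because the quantity is relative, it speaks to trends only and not to the level of migration at any single point in time.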


Presented in Session 47: International migration and migrant populations