This article explores new methods for gathering and analyzing spatially rich demographic data using mobile phones. It describes a pilot study (the Human Mobility Project) in which volunteers around the world were successfully recruited to share GPS and cellular tower information on their trajectories and respond to dynamic, location-based surveys using an open-source Android application. The pilot study illustrates the great potential of mobile phone methodology for moving spatial measures beyond residential census units and investigating a range of important social phenomena, including the heterogeneity of activity spaces, the dynamic nature of spatial segregation, and the contextual dependence of subjective well-being.
Questions of human location and movement are of central importance in demography, but answers are limited by our frequent inability to observe and measure where people are and where they are going. Census data tell us about general patterns of human settlement and migration but very little about where people spend their time, or where and how they travel. Interviews and surveys give us more detail, but these are constrained by scale and self-reporting errors, and they often fail to provide a dynamic picture. We are thus confronted with an important obstacle as we try to better understand a wide range of human activity, from immigration and urbanization to social networks; from the environmental effects of population growth and the health effects of environmental exposure to the spread of diseases and the distribution of social or cultural capital.
Mobile phone techniques offer a way around this obstacle. With their increasing worldwide popularity and sophistication, the research potential of mobile phones is quickly becoming apparent, and the possibilities for using them to record location and movement are especially promising. Any given mobile phone can typically be traced to the cellular tower of closest proximity, and a user’s movement can be estimated by analyzing the phone’s transitions from one tower to another. More accurate location data may be obtained if a phone is capable of processing signal information from multiple cellular towers, from nearby devices, or from satellites (Eagle et al. 2009b). Phones with these capabilities are now firmly entrenched in the market and gaining widespread use around the world. That many of these phones have a variety of additional sensing capabilities beyond location, store huge amounts of data, and function as miniature computers adds to their usefulness (Raento et al. 2009).
In this article, we describe a pilot study—the Human Mobility Project (HMP)—that we conducted to explore the use of mobile phones in demographic research and to test one particular technique for which they may be especially well suited: a dynamic, location-based survey. The pilot study uses global positioning system (GPS) and cellular tower data from mobile phones not only to determine subjects’ residence locations (to which we are limited when using census data) and observe how they respond to questions at a fixed time and place (to which we are limited when relying on traditional survey methods) but also to examine where they spend time when not at home, their trajectories, and how they respond to questions at a variety of different times and places. The study provides a partial glimpse at the answers to such questions as how traditional concepts of neighborhood might be adjusted to better account for the varied spaces in which people carry out their day-to-day activities, how patterns of spatial segregation observed in residential housing permeate other aspects of life, and how subjective well-being depends on spatial context. This study demonstrates the powerful ways in which demographers may use mobile phones to move beyond residential census units and “put people into place” (Entwisle 2007), more fully capturing the complex relationship between spatial behavior and local environment. In addition, this study identifies some of the limitations and risks associated with mobile phone research, including the need for appropriate safeguards to protect subjects’ privacy in light of the particular problems presented by spatial data (Gutmann et al. 2008; VanWey et al. 2005).
We start with an overview of the basic problem of studying human movement, the mobile phone technology that can be used for this purpose, and the research that has made use of this technology to date. We then describe our pilot study and present our results. Finally, we evaluate the study methodology and discuss ways in which this research complements and extends the field of demography.
Mobile Phones as Demographic Instruments
The scientific value of observing human spatial variation has long been apparent (e.g., Anderson and Massey 2001; Du Bois 1899; Park and Burgess 1925), and spatial questions are increasingly at the core of research in a variety of disciplines. Specific research questions are diverse but may be grouped into three basic categories: (1) those that aim to describe and explain patterns of location and movement (e.g., Golledge and Stimson 1997; Hägerstrand 1970), (2) those concerned with how people influence their spatial context (e.g., Cassels et al. 2005), and (3) those concerned with how spatial context influences people (e.g., Inagami et al. 2007). In calling on demographers to take the lead in researching how neighborhoods influence health, Entwisle (2007) demonstrated the interconnectedness of the three categories and the importance of treating them together. People are not merely passive objects on which neighborhoods operate, she argued, but rather agents who make choices about where to spend their time and who modify those places through their presence and activities (see also Cummins et al. 2007). Entwisle’s point can be extended beyond health research and highlights a need that mobile phone methods are well suited to address.
By making it possible to systematically follow people as they move, mobile phones help overcome the limitations of censuses and other methods that focus on static places.1 They allow us to reconstruct detailed movement trajectories and analyze the space through which people move as they go about their daily activities (Golledge and Stimson 1997). Although this can be done using more traditional methods, such as time-space diaries or interviews (Basta et al. 2010; Goodchild and Janelle 1984; Janelle et al. 1998; Jones and Pebley 2012; Kwan 2008, 1999; Lee and Kwan 2011; Matthews 2011; Wang et al. 2012; Wong and Shaw 2011), mobile phones avoid many of the accuracy and logistical problems inherent in self-reporting (Golob and Meurs 1986; Kwan and Lee 2004; Murakami and Wagner 1999; Stopher et al. 2007). Moreover, because people incorporate mobile phones into their daily lives naturally, these devices are particularly well suited for measuring people’s reactions and capturing information about their surroundings in real time, facilitating techniques such as ecological momentary assessment or the construction of precise individual environmental exposure profiles (Ahas 2011; Borrell 2011; Kwan 2009; Nusser et al. 2006; Raento et al. 2009; Shiffman et al. 2008).
Mobile phone location tracking has been the subject of a body of research that has expanded rapidly over the past decade as an array of commercial and scientific applications have become technologically and socially feasible (Ahas 2011; Ahas and Mark 2005; Asakura and Hato 2004). Many mobile phones can estimate their own locations using signals from cellular towers, satellites, or other beacons, but even those that lack this capability can be localized externally based on the traces they leave on cellular networks and other receivers (such as Bluetooth or Wi-Fi devices). Data collection techniques can be divided between active positioning, in which subjects volunteer to be tracked and data are collected from their phones or their phones’ traces, and passive positioning, in which trace data are obtained directly from network operators without contacting the individuals being tracked (and also, consequently, without much information about these individuals, apart from location) (Ahas 2011). Alongside these collection techniques has come the development of methods for organizing and evaluating spatiotemporal data, including techniques for analyzing geospatial lifelines—the paths formed in space-time by successive location estimates of each tracked individual (Laube et al. 2005, 2007).
Within the social sciences, mobile phones have been used to address such topics as social networks (Eagle et al. 2009a; Pentland 2007), mobility and migration (Ahas 2011; Gonzalez et al. 2008; Wesolowski and Eagle 2010), crowds and urban dynamics (Calabrese et al. 2007, 2010; Reades et al. 2007), transportation behavior (Ahas 2011; Asakura and Hato 2001, 2004; Reddy et al. 2010), health behavior (Anokwa et al. 2009; Madan et al. 2010), and happiness (Killingsworth and Gilbert 2010; MacKerron 2012). All this mobile phone research has come alongside the emergence within the social sciences of a “mobilities” paradigm, which emphasizes the role of movement in creating social and material realities (Adey 2010; Büscher and Urry 2009; Büscher et al. 2011; Sheller and Urry 2006; Urry 2007).
The results of this research have been impressive, but much remains to be explored in terms of methods and their integration within theoretical frameworks (D’Andrea et al. 2011). Scholars have complained that existing methods deal poorly with the “fleeting,” transitory nature of movement (Büscher et al. 2011; Law and Urry 2004). D’Andrea et al. (2011:151) noted that methodological development has lagged behind empirical and conceptual advances and “the examination of basic research protocols has only emerged recently.” In particular, they emphasized the need for methods that link micro- and macro-level observation.
From the perspective of demography, mobile phone methods offer new sources of data to answer traditional questions while also suggesting new ways of formulating questions and thinking about populations to begin with. Mobile phones can be used to measure traditional migration events in populations that are otherwise hard to reach. For example, Wesolowski and Eagle (2010) used cellular tower data to infer changes of residence within a Nairobi slum that lacked the administrative or census records that would have been used for this purpose in other settings. By taking this approach, they were able to measure a high rate of residential turnover, much of it attributable to movement within the slum—something that would have been very difficult to detect and quantify using traditional methods.
Mobile phones also make it possible to understand population in a much more dynamic way, untethered to residence, adding an empirical, demographic grounding to the emerging mobilities paradigm (D’Andrea et al. 2011). Isaacman et al. (2010) used mobile phone call records to analyze daily movement patterns in New York City and Los Angeles, finding significant variation by day of the week, city, and even neighborhood. Calabrese et al. (2007, 2010), Carioua et al. (2010), Ratti et al. (2006, 2007), and Reades et al. (2009) tracked the ebbs and flows of urban populations throughout the day as people go to and from work, attend concerts, visit their weekend homes, and engage in all the other activities that we tend to ignore when asking traditional demographic questions like, “What is the city’s population?”
Viewing population from this perspective can help us to think about what we even mean when we talk about a city and how we should draw urban boundaries. It can help us to better understand what Bertolini and Dijst (2003) described as the increasing divergence between modern cities’ social and physical dimensions. Similarly, it can help us find more meaningful ways to define neighborhoods for research purposes given the inherent limitations of commonly used census units, which exclude a great deal of spatial information (Lee et al. 2008; Reardon et al. 2008) and fail to capture individuals’ actual experiences and exposures (Basta et al. 2010; Chaix et al. 2009; Entwisle 2007; Kwan 2009; Matthews 2011).
The dynamic perspective can also help us to investigate a variety of questions that blend demography with other disciplines. For example, the individual health consequences of environmental exposure and the environmental and epidemiological impacts of population growth depend on the activities in which the population’s members are engaged and the spaces through which they move (Cassels et al. 2005; Cummins et al. 2007; de Castro et al. 2006; Inagami et al. 2007; Martens and Hall 2000). Tracking human movement and environmental exposure can thus provide more insight than simply counting the number of residents within an administratively defined boundary, and mobile phones offer a way to do this at high resolutions (Wiehe et al. 2008). Similarly, many of the social consequences of residential segregation depend on where people spend time when not at home (Ellis et al. 2004). Although place of residence may sometimes be a good proxy for this, there is much to be gained by measuring movement outside the home more precisely. Studies based on self-reported places of work and travel diaries show notable differences in urban space use depending on individuals’ sociodemographic characteristics (Ellis et al. 2004; Jones and Pebley 2012; Zandvliet and Dijst 2006).
In our Human Mobility Project (HMP) pilot study, we tested the feasibility of using mobile phone positioning to obtain spatially rich demographic data and administer a dynamic, location-based survey using participants from around the world who volunteered by downloading an application and installing it on their own phones. The linking of phone position with survey administration is at the forefront of methodological developments in this area (Ahas 2011), providing a way to merge localized response information with larger-scale movement patterns (D’Andrea et al. 2011). Moreover, the active positioning of volunteers with their own phones anywhere in the world is an approach that has been less well explored than the collection of cellular tower data (Blumenstock et al. 2010; Calabrese et al. 2007, 2010; Carioua et al. 2010; Isaacman et al. 2010; Ratti et al. 2006, 2007; Reades et al. 2009; Wesolowski and Eagle 2010) or the tracking of volunteers who are given phones and who often travel through fixed areas containing pre-positioned beacons (Asakura and Hato 2004; Eagle and Pentland 2006, 2009; Eagle et al. 2009a; Raento et al. 2009; Wiehe et al. 2008). These latter two approaches can be quite powerful because cellular towers yield massive quantities of data, while phone distribution improves sample representativeness and pre-positioned beacons boost accuracy. However, the former lacks demographic detail, and the latter has limited scalability.
Because our approach makes it possible to ascertain respondents’ characteristics, it has the potential to provide richer data than that obtained from cellular towers while also allowing larger scales and broader ranges of geographic coverage than phone distribution studies. By harnessing the efforts of large numbers of volunteers to participate remotely in the creation of knowledge, our approach is in the spirit of what Goodchild (2007) described as the volunteered geographic information movement. It is similar to approaches currently being implemented by researchers at the London School of Economics (MacKerron 2012)2 and Harvard (Killingsworth and Gilbert 2010).3 Whereas those studies focus on, respectively, environmental and psychological determinants of happiness, we focused on methodology and explored its application to demographic and sociological questions.
We centered the study on three substantive topics: activity space, spatial segregation, and subjective well-being.
Activity space is the term behavioral geographers use to describe the “locations within which an individual has direct contact as a result of his or her day-to-day activities” (Golledge and Stimson 1997:279). It is the component of the individual’s overall environmental interaction that involves movement and direct contact, as opposed to communication. Activity space is potentially a much better measure of environmental exposure than measures based on census units (Basta et al. 2010; Kwan 2009; Matthews 2011), but the latter continue to dominate research in many fields. It has long been recognized that people spend significant time far from home and that the very notion of “neighborhood” is immensely difficult to define (Foley 1950; Matthews 2008). Gathering data on activity space and refining our understanding of place have been flagged as important priorities, particularly for research on health and environment (Matthews 2011). In our study, we explore subjects’ activity spaces, analyzing the sizes of these spaces using the minimum convex polygon method, which takes the area of the smallest convex polygon that may be drawn around an individual’s locations (Buliung and Kanaroglou 2006; Jones and Pebley 2012).
Spatial segregation along lines of race, ethnicity, and class is an important topic of study because it is usually associated with deep and pervasive inequality (Anderson and Massey 2001; Massey and Denton 1993). What social scientists generally measure when examining this topic is residential segregation (e.g., Massey and Denton 1988), but segregation occurs throughout the day as people engage in a variety of activities (Ellis et al. 2004; Wang et al. 2012; Wong and Shaw 2011). This may be partly driven by the interplay between home locations and the locations of jobs, schools, and other socially significant places. Indeed, the mismatch between suburban job locations and segregated central city neighborhoods provides one explanation of how segregation concentrates unemployment and poverty (Kain 1968; Wilson 1987, 1996). Segregation outside the home may be driven also by a variety of other factors ranging from transportation network design to policing tactics to individual preferences. As a starting point for investigating the question using mobile phones, we explored the amount of time subjects spent in census blocks with different racial characteristics.
Finally, subjective well-being has been a topic of social inquiry since ancient times, and it constitutes an important measure by which to judge the impact of policies and events on people’s lives (Helliwell 2003; MacKerron 2012). Subjective well-being is generally conceived to encompass both affective components (such as feelings of happiness or pain) and cognitive components (such as evaluations of life satisfaction) (Diener et al. 1999). People’s self-reports about both types of components may be influenced by a variety of time-varying personal and environmental factors, and their reports about affective components, in particular, may depend on whether they are communicated as the event is being experienced or recalled afterward (Kahneman and Krueger 2006). Consequently, there are important reasons to employ methods that can record people’s subjective feelings along with contemporaneous contextual data. This is what we attempted to do in this pilot study, employing a location-based survey drawn from the group of methods known as experience sampling (Csikszentmihalyi and Hunter 2003). Although experience sampling methods generally ask a broader array of affect questions as well as questions about activities and surroundings, we kept our survey simple and relied for context on basic location information—namely, distance from home—to test what may be gained by combining this type of survey with a positioning device.
The pilot study had three main components: (1) an open source application that subjects could install for free on any device with an Android operating system, (2) a structured query language (SQL) database that communicated with the phone application to trigger location-based surveys and collect data, and (3) a project website where subjects could securely register and activate the application, provide us with basic demographic information about themselves (age, sex, and race4), and view some of the data being collected. All user interfaces were in English.
We chose to write the application for the Android platform because it is an open source operating system used by a growing number and variety of mobile devices around the world, and because its ability to run applications and processes in the background makes it well suited to location monitoring (Arnold 2010; Chang et al. 2010). The Android operating system is used on hundreds of phone models available on every continent, and by late 2011, it was installed on more than one-half of all new smartphones sold worldwide (Yarow 2011).
We made the application available for download for three weeks in early 2010 via the Android Market (now, Google Play). Those who downloaded it were directed to the HMP webpage, where they could activate the application and consent to participate in the study.
The application ran in the background of subjects’ phones and recorded location estimates at intervals of 2, 5, 10, 30, or 60 minutes.5 The estimates were based on GPS satellite signals when available, and otherwise on cellular tower signals. The application transmitted estimates to a Princeton server, and this server triggered surveys on phones when they entered predetermined locations. This feature would make it possible to focus, for instance, on subjective well-being near major landmarks. For the pilot study, our interest was in the ordinary places people visit throughout the day, so we continuously updated the trigger points to include subjects’ actual locations, and we added a stochastic component to the trigger itself to avoid overwhelming subjects with too many surveys.
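The triggering logic described above can be illustrated with a minimal Python sketch. It is not the project's server code: the function names, the 0.5 km radius, and the 10 % sending probability are hypothetical values chosen for illustration, and the study does not report the actual parameters.

```python
import math
import random
from typing import Iterable, Optional, Tuple

EARTH_RADIUS_KM = 6371.0  # mean Earth radius


def haversine_km(lat1: float, lon1: float, lat2: float, lon2: float) -> float:
    """Great-circle distance in km between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    d_phi = phi2 - phi1
    d_lambda = math.radians(lon2 - lon1)
    a = math.sin(d_phi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(d_lambda / 2) ** 2
    return 2 * EARTH_RADIUS_KM * math.asin(math.sqrt(a))


def should_trigger_survey(
    lat: float,
    lon: float,
    trigger_points: Iterable[Tuple[float, float]],
    radius_km: float = 0.5,          # hypothetical trigger radius
    send_probability: float = 0.1,   # hypothetical stochastic component
    rng: Optional[random.Random] = None,
) -> bool:
    """Return True if a survey should be sent for this location estimate.

    A survey is sent only when the phone is within radius_km of some
    trigger point AND a random draw falls below send_probability.
    """
    rng = rng or random.Random()
    near_trigger = any(
        haversine_km(lat, lon, t_lat, t_lon) <= radius_km
        for t_lat, t_lon in trigger_points
    )
    return near_trigger and rng.random() < send_probability
```

In this sketch, lowering `send_probability` reduces the number of surveys sent to a subject who stays near a trigger point, which is the role the stochastic component plays in the text.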
Although it would be possible to use multiple survey questions, each one triggered at a different location or time, we used just one question for the pilot: “How happy are you?” Subjects could respond by touching the phone’s screen to select between zero and five stars (with half-star increments) or the response of “Ask me later.”
During the three weeks when the application was available from the Android Market, 869 people downloaded it, and 270 registered and consented to be research subjects. Their self-reported races and sexes are summarized in Table 1, and self-reported ages are shown in Fig. 1.
We collected location estimates and survey responses from subjects during a five-week period. Subjects’ participation times varied, but 128 subjects participated for at least 24 hours, 62 participated for more than one week, and 13 participated for more than two weeks. The mean and median participation times were, respectively, 6.3 days and 2.2 days. We received a total of 68,198 location estimates, of which 29 % were based on GPS signals and the rest on cellular tower signals. We retained and analyzed 67,294 location estimates while discarding 904 (1.3 %) because of accuracy concerns.6 Within this analyzed subset, the mean and median Android-reported accuracies were, respectively, 1,210 meters and 1,000 meters for all estimates combined, 261 meters and 48 meters for the GPS estimates, and 1,586 meters and 1,318 meters for the cellular tower estimates. Figure 2 maps the approximate centroid of each subject’s set of location estimates.7 Most subjects were in the United States, but the project also drew volunteers from Australia, Canada, China, France, Germany, Israel, Japan, Norway, South Korea, Spain, Sweden, and the United Kingdom.
The individual movement patterns revealed by each subject’s location estimates exhibit varying degrees of regularity. As an example, Figs. 3, 4, 5 and 6 plot the movement of two subjects with notably different behavioral patterns. We selected these subjects because their differences are so clear and because their participation times are among the highest, each lasting a little more than 30 days. Figures 3 and 4 show the subjects’ full sets of location estimates (points), with sequential estimates connected by straight lines, and scale indicated by axis tick marks (1 km intervals). To protect privacy, we transformed the coordinates such that the location estimates maintain their relative distances, but it is impossible to identify subjects’ actual locations on Earth.8
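One generic way to obtain coordinates that keep relative distances while hiding absolute position is a rigid transformation (a random rotation plus a random translation). The Python sketch below illustrates only that distance-preserving property; it is an assumption and not necessarily the transformation used in the study, and the function name and offset range are hypothetical. It assumes the points have already been projected to planar coordinates in kilometers.

```python
import math
import random
from typing import List, Tuple

Point = Tuple[float, float]  # planar (x, y) in km; the projection is assumed


def rigid_transform(points: List[Point], rng: random.Random) -> List[Point]:
    """Rotate all points by one random angle and shift them by one random offset.

    Rotation and translation leave every pairwise distance unchanged.
    """
    theta = rng.uniform(0.0, 2.0 * math.pi)
    dx = rng.uniform(-1000.0, 1000.0)  # hypothetical offset range, in km
    dy = rng.uniform(-1000.0, 1000.0)
    cos_t, sin_t = math.cos(theta), math.sin(theta)
    return [
        (cos_t * x - sin_t * y + dx, sin_t * x + cos_t * y + dy)
        for x, y in points
    ]
```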
Most of the estimates in these plots lie on top of one another and cannot be distinguished. Figures 5 and 6, therefore, plot the same subjects’ patterns with separate panels for weekday (a) and weekend (b) locations and a small amount of random noise added to each estimate to produce artificial spreading and thus aid in visualization. Location estimates in these plots are marked with points shaded (in the online version, colored) by the hour of the day when transmitted (in the subject’s time zone). As in the previous plots, sequential location estimates are connected by straight lines and scale is indicated by axis tick marks (1 km intervals).
Each of the displayed subjects’ movement patterns is characterized by a high degree of clustering, and this is the case for many of our other subjects as well. In addition, each subject has at least two clusters, suggesting that daily movements are not random but rather concentrated in specific locations that have some functional saliency to the subject. Moreover, the clustering of shades in Figs. 5 and 6 suggests that when subjects return to locations on multiple occasions, it is often at the same time of day. Finally, we see different patterns on weekends than on weekdays. Some areas are visited only on weekdays, and some only on weekends. Even in areas visited on both weekdays and weekends, the timing of the visits may differ, as can be seen by the predominance of different sets of shades in each panel of Figs. 5 and 6.
There are also notable differences between the activity spaces of these two subjects. For example, Subject 1 (Figs. 3 and 5) traveled through a much larger area than did Subject 2 (Figs. 4 and 6). Likewise, the main clusters of location estimates are spread farther apart in the case of Subject 1 than in that of Subject 2. The area of the smallest convex polygon within which each movement pattern fits (without the addition of any random noise) is 827 km² for Subject 1 but only 8 km² for Subject 2.
These types of differences can be seen in the movement patterns of our other subjects as well. Among the 96 who participated for more than 24 hours and provided more than 50 location estimates, the smallest convex polygons within which their movement patterns fit range from 0.01 km² to 56,000 km², with a median of 150 km² and an interquartile range of 60 to 394 km².
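The minimum convex polygon measure used for these areas can be sketched in a few lines of Python. The example below builds the convex hull with Andrew's monotone chain algorithm and takes its area with the shoelace formula. It assumes that location estimates have already been projected to planar coordinates in kilometers (the study does not specify a projection here), and the function names are hypothetical.

```python
from typing import List, Tuple

Point = Tuple[float, float]  # planar (x, y) in km; the projection is assumed


def convex_hull(points: List[Point]) -> List[Point]:
    """Return hull vertices in counter-clockwise order (Andrew's monotone chain)."""
    pts = sorted(set(points))
    if len(pts) <= 2:
        return pts

    def cross(o: Point, a: Point, b: Point) -> float:
        # Positive when o -> a -> b makes a counter-clockwise turn.
        return (a[0] - o[0]) * (b[1] - o[1]) - (a[1] - o[1]) * (b[0] - o[0])

    lower: List[Point] = []
    for p in pts:
        while len(lower) >= 2 and cross(lower[-2], lower[-1], p) <= 0:
            lower.pop()
        lower.append(p)

    upper: List[Point] = []
    for p in reversed(pts):
        while len(upper) >= 2 and cross(upper[-2], upper[-1], p) <= 0:
            upper.pop()
        upper.append(p)

    # The last point of each chain is the first point of the other.
    return lower[:-1] + upper[:-1]


def polygon_area(vertices: List[Point]) -> float:
    """Area of a simple polygon by the shoelace formula (0 for fewer than 3 vertices)."""
    n = len(vertices)
    if n < 3:
        return 0.0
    twice_area = sum(
        vertices[i][0] * vertices[(i + 1) % n][1]
        - vertices[(i + 1) % n][0] * vertices[i][1]
        for i in range(n)
    )
    return abs(twice_area) / 2.0


def activity_space_area_km2(points: List[Point]) -> float:
    """Area of the smallest convex polygon containing all location estimates."""
    return polygon_area(convex_hull(points))
```

Because the hull is determined by the outermost points, a single distant location estimate can enlarge the area considerably, which is consistent with the very wide range of areas reported above.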
For subjects in the United States, we compared activity-space areas to the areas of the census blocks and tracts within which each subject resided, using U.S. census unit boundaries from 2000. We estimated the home census block for each subject as the one in which he or she spent at least four consecutive hours bracketing 3:00 a.m.9 We estimated the amount of time subjects spent in any given census block by summing the times elapsed between successive location estimates lying within that block. This is a modified version of the approach taken by Bartumeus et al. (2010) and Barraquand and Benhamou (2008), and it relies on the assumption that when successive points lie within the same block, the subject remained in that block during the intervening time. The approach ignores time spent between successive points that fall on either side of a block boundary as well as time spent in a census block prior to the subject’s first location estimate or subsequent to his or her last.
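The two estimation rules in this paragraph can be sketched as follows. This is an illustrative Python reading of the text and not the study's code: it assumes each location estimate has already been assigned to a census block, that timestamps are expressed as hours since local midnight of the first day, and that the first qualifying overnight stay determines the home block (the text does not say how multiple candidates were resolved). All names are hypothetical.

```python
import math
from collections import defaultdict
from typing import Dict, Hashable, List, Optional, Tuple

# (hours since local midnight of day 0, census block id), sorted by time
Fix = Tuple[float, Hashable]


def hours_per_block(fixes: List[Fix]) -> Dict[Hashable, float]:
    """Sum elapsed time between successive fixes that lie in the same block.

    Intervals whose endpoints fall in different blocks are ignored,
    as is any time before the first fix or after the last.
    """
    totals: Dict[Hashable, float] = defaultdict(float)
    for (t0, b0), (t1, b1) in zip(fixes, fixes[1:]):
        if b0 == b1:
            totals[b0] += t1 - t0
    return dict(totals)


def home_block(
    fixes: List[Fix], min_hours: float = 4.0, anchor_hour: float = 3.0
) -> Optional[Hashable]:
    """Return the block of the first unbroken stay of at least min_hours
    that brackets anchor_hour (3:00 a.m. local time), or None if none exists."""
    run_start = 0
    for i in range(1, len(fixes) + 1):
        # A run ends at the last fix or when the block id changes.
        if i == len(fixes) or fixes[i][1] != fixes[run_start][1]:
            start, block = fixes[run_start]
            end = fixes[i - 1][0]
            # First 3:00 a.m. occurring at or after the start of the run.
            first_anchor = 24 * math.ceil((start - anchor_hour) / 24) + anchor_hour
            if end - start >= min_hours and first_anchor <= end:
                return block
            run_start = i
    return None
```

For example, a subject observed in block "H" from 10:00 p.m. to 4:00 a.m. would have "H" returned as the home block, whereas a daytime stay of any length would not qualify.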
Using this approach, we can estimate home U.S. census blocks for 65 of the subjects for whom we calculated activity-space areas.10 We also identify the census tracts in which these blocks were located. The median home block area was 0.2 km², with an interquartile range of 0.03 to 0.7 km², and the median home tract area was 4.9 km², with an interquartile range of 2.7 to 10.9 km². In contrast, the median activity-space area for these subjects was 156.8 km², with an interquartile range of 61.0 to 377.2 km². If we take the ratio of each subject’s residential tract area to activity-space area, the median is 0.037 and the interquartile range is 0.009 to 0.155.
In terms of the percentage of each subject’s total participation time spent within his or her home tract, the median was 61 %, with an interquartile range of 23 % to 84 %. For home census blocks, the median was 38 %, and the interquartile range 15 % to 65 %. If we recalculate the percentages for home tracts excluding time spent in the home block itself, the median was 4 %, and the interquartile range 0 % to 36 %.
The racial compositions of residential census units are the ingredients of traditional spatial segregation measures (Massey and Denton 1988), and we used data from the 2000 census to calculate the proportion of white residents (PWR) in subjects’ home census block populations. For white subjects whose home blocks were estimated (n = 58), the median PWR was .89, with an interquartile range of .79 to .98. For black and Latino subjects combined (n = 9), it was .74, with an interquartile range of .70 to .77.
To move beyond the traditional focus on residence, we calculated, for each subject, the mean PWR in the census blocks in which that subject spent time, weighted by the amount of time spent in each block. Median results were similar to those based on residence, but the interquartile ranges move closer together, overlapping at .76 to .96 for whites versus .72 to .82 for blacks and Latinos. The ranges overlap further if we exclude time spent in the home block, although this requires a loss of data because some of these subjects spent insufficient time outside their home blocks to make the calculation possible. Subtracting home-block PWR from mean non-home-block PWR for each of these subjects yields a median difference of –.02 for whites (n = 51), with an interquartile range of –.10 to .06, and .01 for blacks and Latinos (n = 8), with an interquartile range of –.10 to .03. In other words, at least one-half of the subjects had greater exposure to members of their own race in their home blocks than outside them. However, the reverse was also true: when they left their home blocks, at least one-quarter of the whites spent time in higher-PWR areas, and one-quarter of blacks and Latinos spent time in lower-PWR areas. Thus, time spent outside the home block can both attenuate and intensify segregation.
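The time-weighted exposure measure described here amounts to a weighted mean of block-level PWR, with time spent in each block as the weight. A minimal Python sketch follows; the function and argument names are hypothetical, and skipping blocks without census data is an assumption of the sketch.

```python
from typing import Dict, Hashable, Optional


def time_weighted_pwr(
    hours_by_block: Dict[Hashable, float],
    pwr_by_block: Dict[Hashable, float],
    exclude_block: Optional[Hashable] = None,
) -> Optional[float]:
    """Mean proportion of white residents across visited blocks, weighted by time.

    Pass the home block as exclude_block to obtain the non-home-block mean.
    Returns None when no time remains after exclusions.
    """
    weighted_sum = 0.0
    total_hours = 0.0
    for block, hours in hours_by_block.items():
        if block == exclude_block or block not in pwr_by_block:
            continue  # skip the excluded block and blocks lacking census data
        weighted_sum += hours * pwr_by_block[block]
        total_hours += hours
    return weighted_sum / total_hours if total_hours > 0 else None
```

The difference reported in the text corresponds to `time_weighted_pwr(hours, pwr, exclude_block=home) - pwr[home]`, which is undefined (here, `None`) for subjects who spent no measurable time outside the home block.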
Another way to view segregation outside the home block is in terms of the proportion of each subject’s time spent in census blocks of given racial compositions. Figure 7 summarizes this information for blocks grouped into quartiles by PWR, with time in the home block excluded. To increase sample size for these calculations, we consider all subjects whom we observed for at least four hours in non-home census blocks (including subjects for whom we could not estimate a home census block). The biggest differences can be seen in the percentage of time spent in blocks with PWR greater than .75: for white subjects (n = 81), the median is 83 % of the subject’s time, with an interquartile range of 38 % to 99 %; for blacks (n = 5), it is 43 %, with an interquartile range of 0 % to 95 %; and for Latinos (n = 9), it is 7 %, with an interquartile range of 0 % to 25 %.
We do not attempt to analyze the extent to which these results are driven by spatial autocorrelation (cf. Zenk et al. 2005), and we recognize that our samples are small and likely biased. Nonetheless, it is apparent from these data how a dynamic, time-and-space approach to human movement can help us to advance beyond traditional measures of spatial segregation.
The approach also allows us to examine the link between subjective well-being and geographic context. During the study, the application sent a total of 485 surveys to subjects’ phones, with each survey triggered by the location of the phone and asking subjects to rate their happiness on a scale from 0 (least happy) to 5 (most happy) in increments of 0.5. We received 232 responses, with a pooled mean of 3.8 and a pooled median of 4. Given that more than one-half of the surveys went unanswered, it is possible that these results are biased upward by a correlation between subjects’ happiness and their willingness to answer the survey at a given time. In particular, it may be that subjects were more likely to respond when they were in a better mood. Leaving that concern aside, however, we explored the relationship between subjects’ self-reported happiness and their distance from home.
Having already estimated census blocks of residence, we inferred that each subject’s home was at the point where the bivariate normal kernel density estimate of the subject’s locations within the home census block reached its maximum value (Venables and Ripley 2002).11 We then calculated how far each subject was from home when sending each survey response. In total, we analyzed 168 survey responses from 36 subjects (Fig. 8; color figure available online).12
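The two steps above (locating the density peak and measuring distance from it) can be illustrated with a simplified sketch. It is not the procedure used in the study: instead of evaluating a bivariate normal kernel density estimate on a grid, it evaluates a Gaussian kernel density only at the observed points and picks the densest one, with a fixed bandwidth in degrees that is an arbitrary assumption. Distances use the haversine formula with an assumed mean Earth radius of 6,371 km.

```python
import math


def density_peak(points, bandwidth):
    """Return the observed (lat, lon) point with the highest kernel density.

    Simplification: density is evaluated only at the observed points, using
    an isotropic Gaussian kernel in degree space with a fixed bandwidth.
    """
    def density(p):
        return sum(
            math.exp(-((p[0] - q[0]) ** 2 + (p[1] - q[1]) ** 2)
                     / (2 * bandwidth ** 2))
            for q in points
        )
    return max(points, key=density)


def haversine_km(lat1, lon1, lat2, lon2):
    """Great-circle distance in kilometers between two points in degrees."""
    radius_km = 6371.0  # assumed mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_km * math.asin(math.sqrt(a))


# Made-up location estimates inside a subject's home census block.
home_block_points = [
    (40.3500, -74.6500),
    (40.3501, -74.6501),
    (40.3499, -74.6499),
    (40.3520, -74.6480),  # isolated estimate, far from the cluster
]
home = density_peak(home_block_points, bandwidth=0.0005)

# Distance from the estimated home to a hypothetical survey-response location.
response_location = (40.3600, -74.6500)
distance_km = haversine_km(home[0], home[1], *response_location)
```

A grid-based estimate can place the peak between observed points, so the two approaches need not agree exactly; the sketch is meant only to convey the logic.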
We found that male subjects tended to describe themselves as less happy the farther they were from their homes, whereas there was no statistically discernible relationship between female subjects’ reported happiness and distance from home.14 This pattern held even when we controlled for subjects’ race. Each 10 km increase in distance from home is associated with a decrease in reported male happiness of between 0.6 and 1.1 points, estimated with 95 % confidence (Fig. 9). This finding is also robust to alternate model specifications, including pooled-subject linear models and ordered logit models.
Figure 9 displays the relationship between distance from home and observed and predicted happiness responses in the multilevel model for white, Latino, and black men and women. The lines in panels a–c are drawn using coefficients estimated in each of 999 final iterations of the Gibbs sampler, saved after the sequences reached approximate convergence, in order to indicate uncertainty (see Gelman and Hill 2007). The lines in panel d of Fig. 9 are drawn using the mean estimates.
Our study demonstrates that it is possible to obtain detailed information on the locations, demographic characteristics, and context-dependent responses of people all over the world simply by having them download and run an application on their mobile phones. It illustrates the great potential in adopting an approach that lies somewhere between the large-scale collection of coarse mobile network operator data and the small-scale collection of detailed data from devices that have been distributed to subjects or pre-positioned in bounded research areas. By capturing more subject-specific information than the former while offering greater scalability than the latter, this middle approach is well suited for pursuing a wide range of demographic inquiries. The capacity of this approach to incorporate dynamic, location-based surveys is particularly valuable for a range of research questions that depend on subjects’ descriptions of their surroundings, activities, or mental states as they are experiencing them.
The study also helps to clarify many of the methodological issues that must be confronted in this type of research. Representativeness and generalizability are clearly concerns when subjects self-select into a study that requires them to possess a smartphone and register on the Internet because there are systematic differences in mobile phone and Internet use across demographic groups (Blumenstock et al. 2010; Goodchild 2007). However, traditional survey methods present their own problems of representativeness and generalizability. Although the problems may be more serious when using mobile phone methods for certain purposes, the difference is one of degree, and it is likely to shrink with time as the technology becomes more widely accessible. Moreover, mobile phone methods alleviate many of the problems presented by traditional methods, inducing participation by people who would not be willing to respond to paper surveys or interviews and minimizing reporting errors and logistical complications. Indeed, there may be much to be gained from using mobile phone methods in combination with traditional ones such that their strengths complement each other.
Another issue that must be confronted is privacy. One of the major complications of mobile phone localization studies is that the spatially explicit data they produce can be difficult to anonymize properly, particularly if the data are linked to information on social characteristics and if the goal is to make data available publicly (Gutmann et al. 2008; VanWey et al. 2005). In our pilot study, access to the raw data we collected is limited to our research group, with one exception: we gave subjects online access to their own location data (as well as the option for them to request that we delete their data). In terms of processed data, we allowed subjects to view online static maps showing aggregate survey responses from all subjects, and we have presented additional aggregate data in the present article. The only individual-level data we have presented here have been transformed to make it impossible to identify any subject or the actual locations they visited during the study.15
From a substantive perspective, the study provides strong evidence of the importance of moving beyond static, census-unit measurements when investigating the relationship between people and place (Entwisle 2007). We found a great deal of variation in the sizes of subjects’ home census tracts, with the top quartile living in tracts more than four times larger than those of the bottom quartile. This alone should make one cautious about relying too heavily on tract-based measures, as Matthews (2011) argued. The problem is made even clearer by our finding that the residential census tract accounts for less than 15.5 % of the total area in which three-quarters of the subjects moved and less than 3.7 % for one-half of them. This is consistent with findings by Matthews (2011) and Basta et al. (2010), as well as earlier work by Foley (1950) and others (see Matthews 2008). If we care about the environment to which these people were exposed, their spatial choices, or the places they may have influenced through their presence or activities (Entwisle 2007), their residential census tracts alone tell only a small part of the story.
Incorporating time into the analysis gives more weight to home tracts but leads to the same conclusion. One-half of the subjects we measured spent less than 61 % of the study time within their census tracts of residence, and one-quarter spent less than 23 % of the time in these tracts. Moreover, while in their census tracts of residence, subjects spent most of this time in their census blocks of residence—and much of it presumably in their homes (Basta et al. 2010). These subjects’ census blocks of residence accounted for a small fraction of the areas of their census tracts of residence—less than 9 % for three-fourths of them—and if we exclude time spent in the census blocks of residence, one-half of the subjects spent less than 4 % of the time in their census tracts of residence, and one-quarter spent none of the time in these tracts. In other words, the residential census tracts are simultaneously overinclusive and underinclusive.
Mobile phone location tracking helps us overcome both problems by tightening measures of local environment to each individual’s immediate surroundings while also letting these measures follow the individual as he or she moves through space. This gives us a dynamic “personal exposure” (Chaix et al. 2009) that includes much more information than residence-based measures. The racial composition of the populations our subjects encountered outside their home blocks differed notably from the racial composition of their home blocks themselves. Incorporating the non-home blocks into our analysis expanded our view of segregation and interracial exposure.
This is only part of what a moving local environment allows us to accomplish. We used U.S. census information on residential racial compositions as our environmental variable, but with a larger and more concentrated sample, we could estimate the changing racial composition (or other characteristics) of each location based on the sampled individuals themselves. In this way, we would not only temporally refine our measure of exposure but also illuminate the ways in which individuals choose the places they visit and modify the characteristics of these places through their presence and activities (Entwisle 2007).
Finally, just as we can tighten measures of local context around individuals as they move, so too can we tighten our measures of individuals’ dynamic responses to this context. Our location-based survey made it possible to identify significant differences between male and female subjects in terms of the relationship between subjective well-being and distance from home. That male subjects tended to respond differently (reporting lower happiness) the farther they were from home is additional evidence of the limits of residential measures. Had we surveyed these people only in their homes, our picture would have been much less complete.
Our work opens the door to new possibilities for incorporating mobile phones into demographic research. We explored the potential that these new technologies have for better understanding activity space, segregation outside the home, the geographic context of subjective well-being, and perhaps many other aspects of human behavior. That we were able to recruit 270 volunteers in 13 countries, examine their demographic characteristics and movement patterns, link these to census block data, and then have subjects respond in real time to surveys triggered by their locations, all in a low-cost and time-efficient manner, suggests the power of these new methodologies for more broad-based inquiry.
The authors thank Hazer Inaltekin, Spencer Lucian, and David Potere for their contributions to this project during its initial phases, and Matthew Salganik for his invaluable guidance and contributions throughout. The work was funded by a grant from the Center for Information Technology Policy at Princeton University. Institutional support was provided by National Institutes of Health Training Grant T32HD07163 and Infrastructure Grant R24HD047879.
The difference is between selecting people and observing the space through which they move and selecting places and observing the people who move through them. This is a basic choice that must be confronted in studying any collection of moving objects, and it is often the case that the first approach (which mobile phones facilitate) is more illuminating but also harder to implement (Ōkubo and Levin 2001).
We use “race” throughout the article to refer to both racial and ethnic self-identification. Subjects could identify as “White,” “Black, African, African American,” “Spanish/Hispanic/Latino,” “Asian,” “American Indian or Alaska Native,” “Pacific Islander,” or “Other,” and we treat these as nonoverlapping categories for simplicity.
We gave subjects the ability to select the interval.
We discarded estimates with Android-reported accuracy measurements of more than 5 km and those that would have required subjects to have been traveling faster than 54 m per second.
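One way such a filter might be implemented is sketched below. The two thresholds come from the note above, but everything else is an assumption: the tuple layout, the choice to compute speed relative to the most recently retained estimate, and the decision to drop estimates whose timestamps do not advance.

```python
import math

MAX_ACCURACY_M = 5000.0  # discard estimates with reported accuracy > 5 km
MAX_SPEED_MPS = 54.0     # discard estimates implying travel faster than 54 m/s


def haversine_m(lat1, lon1, lat2, lon2):
    """Great-circle distance in meters between two points given in degrees."""
    radius_m = 6371000.0  # assumed mean Earth radius
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = (math.sin(dphi / 2) ** 2
         + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2)
    return 2 * radius_m * math.asin(math.sqrt(a))


def filter_fixes(fixes):
    """Keep plausible location estimates.

    fixes: list of (timestamp_s, lat, lon, accuracy_m), sorted by time
           (hypothetical layout chosen for this sketch).
    """
    kept = []
    for fix in fixes:
        timestamp, lat, lon, accuracy = fix
        if accuracy > MAX_ACCURACY_M:
            continue
        if kept:
            prev_time, prev_lat, prev_lon, _ = kept[-1]
            elapsed = timestamp - prev_time
            if elapsed <= 0:
                continue  # assumption: speed is undefined, so drop the fix
            speed = haversine_m(prev_lat, prev_lon, lat, lon) / elapsed
            if speed > MAX_SPEED_MPS:
                continue
        kept.append(fix)
    return kept


# Made-up sequence: one inaccurate fix and one implausible jump.
fixes = [
    (0, 40.000, -74.0, 20.0),
    (60, 40.001, -74.0, 6000.0),  # accuracy worse than 5 km
    (120, 40.001, -74.0, 30.0),
    (180, 41.000, -74.0, 30.0),   # roughly 111 km in 60 s
    (240, 40.002, -74.0, 25.0),
]
kept = filter_fixes(fixes)
kept_times = [fix[0] for fix in kept]
```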
The centroid is calculated as the mean longitude and mean latitude for each subject. A small amount of random noise is added to each centroid to aid in visualization and protect subjects’ privacy, using the random perturbation masking technique described in Armstrong et al. (1999).
We did this by applying what Armstrong et al. (1999) referred to as a “scale transformation mask.” We multiplied each subject’s matrix of location estimates by a randomly generated number. Although one might assume that it is sufficient to simply remove coordinates from the axes, the transformation is necessary because depending on how the graphs are generated, their files may contain underlying coordinate data that can be extracted. We used a multiplicative transformation, rather than an additive one, to make it impossible to reconstruct the original coordinates using scale information. Although this transformation does not preserve actual distances between points, it is possible to retain approximate map scale by measuring and transforming the actual distances between each subject’s maximum and minimum latitude and longitude coordinates.
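A minimal sketch of a multiplicative mask of this kind follows. The range from which the random factor is drawn, the function name, and the example coordinates are assumptions; the factor is returned here only so that its effect can be inspected, whereas in practice it would be discarded.

```python
import math
import random


def scale_transformation_mask(points, rng, low=0.5, high=2.0):
    """Multiply every coordinate of one subject by a single random factor.

    points: list of (x, y) coordinate pairs for one subject.
    rng: a random.Random instance.
    low, high: assumed bounds for the random factor (not from the study).
    """
    factor = rng.uniform(low, high)
    masked = [(x * factor, y * factor) for x, y in points]
    return masked, factor


def euclidean(p, q):
    """Straight-line distance between two coordinate pairs."""
    return math.hypot(p[0] - q[0], p[1] - q[1])


# Made-up coordinates for one subject.
points = [(40.35, -74.65), (40.36, -74.66), (40.40, -74.60)]
masked, factor = scale_transformation_mask(points, random.Random(0))

# Every pairwise distance is rescaled by the same factor, so the shape of
# the point pattern survives even though actual distances do not.
ratio_01 = euclidean(masked[0], masked[1]) / euclidean(points[0], points[1])
ratio_02 = euclidean(masked[0], masked[2]) / euclidean(points[0], points[2])
```

Because all pairwise distances are rescaled by the same factor, the relative geometry of each subject's locations is unchanged, which is why an approximate map scale can still be recovered from the extent of the original coordinates.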
When there was more than one such block, we used the one that occurred most frequently in the data.
In addition to excluding responses from subjects whose home locations were not estimated (meaning all outside the United States), we also excluded one extreme outlier in terms of distance from home.
These included variables for sex, race (black, white, and Latino were the only responses given in the data analyzed), distance from home, and the interaction between distance from home and sex. Additional models were tested using variables for time of day, weekend, working hours (9:00 a.m.–5:00 p.m.), and census block racial characteristics, but these variables did not have clear relationships to the survey responses or sufficiently improve model fit to warrant inclusion in the final analysis.
These conclusions are based on our estimated coefficients on the variable interacting sex with distance from home.