Methodology
The JPMorgan Chase Institute conducted the analyses in this brief using de-identified data on credit and debit card transactions of residents in the Detroit and New York metropolitan areas. The sample period is restricted to the second quarter of each year from 2013 to 2016. We chose quarterly observations to avoid the volatility we observed in monthly observations, and the second quarter of 2016 was the most recent quarter for which we had complete data at the time of the analysis. The sample population is restricted to residents of the central city within each metropolitan area, but the merchants they frequent may be anywhere within that metropolitan area. The final sample captures over 197 million transactions from more than 1 million residents across the 12 months (four second quarters) we included.
Each transaction record in the data carries the zip codes of both the resident and the merchant as attributes. The resident's zip code captures the location of their home, as opposed to their place of work. The distance associated with a single transaction is the distance in miles between the centroid of the resident’s zip code and the centroid of the merchant’s zip code.
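The distance calculation described above can be sketched as follows. This is a minimal illustration, not the Institute's implementation: the brief does not state which distance formula was used, so the great-circle (haversine) formula, the Earth radius constant, and the `centroids` lookup table are all assumptions of this sketch, and the zip code labels and coordinates are hypothetical.

```python
import math

# Mean Earth radius in miles (an assumption of this sketch).
EARTH_RADIUS_MILES = 3958.8


def haversine_miles(lat1, lon1, lat2, lon2):
    """Great-circle distance in miles between two latitude/longitude points."""
    phi1, phi2 = math.radians(lat1), math.radians(lat2)
    dphi = phi2 - phi1
    dlam = math.radians(lon2 - lon1)
    a = math.sin(dphi / 2) ** 2 + math.cos(phi1) * math.cos(phi2) * math.sin(dlam / 2) ** 2
    return 2 * EARTH_RADIUS_MILES * math.asin(math.sqrt(a))


def transaction_distance(resident_zip, merchant_zip, centroids):
    """Distance between the centroids of the resident's and merchant's zip codes.

    `centroids` is a hypothetical lookup: zip code -> (latitude, longitude).
    """
    lat1, lon1 = centroids[resident_zip]
    lat2, lon2 = centroids[merchant_zip]
    return haversine_miles(lat1, lon1, lat2, lon2)


# Hypothetical centroids, one degree of latitude apart.
example_centroids = {"ZIP_A": (42.0, -83.0), "ZIP_B": (43.0, -83.0)}
example_distance = transaction_distance("ZIP_A", "ZIP_B", example_centroids)
```

Note that a transaction in which the resident and the merchant share a zip code has a measured distance of zero under this scheme, whatever the true distance between them.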
The distances reported as “typical” represent the median distance in a given distribution. Typical “merchant distances” represent the median distance between residents and the merchants they have chosen to patronize. To identify this value, we first sort the distances in the relevant intersection (e.g. all transactions in Detroit in Q2 2016) from shortest to longest. Within this intersection, we count the total number of transactions for each resident-merchant zip code pairing. We then calculate the cumulative transaction total across these zip code pairings, in order of distance, and the total number of transactions within the intersection. The percentile value associated with each distance is the ratio of the cumulative sum to the total number of transactions in the intersection, expressed as a percentage. Once percentile values are associated with each distance in the intersection, we identify the median as the first distance with a percentile value over 50. Other quartiles are identified in a similar fashion, using thresholds of 25 and 75.
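The procedure above amounts to a transaction-weighted quantile over zip code pairings. A minimal sketch, assuming each transaction is available as a tuple of resident zip code, merchant zip code, and distance in miles (the function names and the input layout are hypothetical, not the Institute's code):

```python
from collections import Counter


def distance_percentiles(transactions):
    """Return (distance, percentile) rows, one per resident-merchant zip pairing.

    `transactions` is an iterable of (resident_zip, merchant_zip, distance_miles).
    """
    counts = Counter()
    distance_of = {}
    for resident_zip, merchant_zip, distance in transactions:
        pair = (resident_zip, merchant_zip)
        counts[pair] += 1              # transactions per zip code pairing
        distance_of[pair] = distance   # one distance per pairing
    total = sum(counts.values())       # all transactions in the intersection

    rows = []
    cumulative = 0
    # Walk the pairings from shortest to longest distance.
    for pair in sorted(counts, key=lambda p: distance_of[p]):
        cumulative += counts[pair]
        rows.append((distance_of[pair], 100.0 * cumulative / total))
    return rows


def quantile_distance(transactions, threshold=50.0):
    """First distance whose percentile value is strictly over `threshold`."""
    for distance, percentile in distance_percentiles(transactions):
        if percentile > threshold:
            return distance
    return None


# Hypothetical intersection: 3 transactions at 0 miles, 2 at 2 miles, 5 at 5 miles.
sample = (
    [("A", "A", 0.0)] * 3
    + [("A", "B", 2.0)] * 2
    + [("A", "C", 5.0)] * 5
)
median_distance = quantile_distance(sample)            # threshold of 50
lower_quartile = quantile_distance(sample, 25.0)
upper_quartile = quantile_distance(sample, 75.0)
```

In the hypothetical sample, the cumulative percentiles are 30, 50, and 100, so the rule "first distance over 50" skips the 2-mile pairing (exactly 50) and selects 5 miles as the median.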
The median distance per transaction (or merchant distance) is a useful metric that conveys more information than a hard boundary at a fixed distance from a given point. The boundary can tell us whether or not a merchant is within a 20-minute walk, but it provides no information about how far inside or outside of the boundary that merchant lies. However, our implementation of the approach does have limitations.
1. Centroid-to-centroid distance is used instead of address-to-address distance because latitude and longitude information is removed for privacy reasons before the data reach the Institute. If the resident and merchant are located close to the centroids of their respective zip codes, the measurement error associated with a centroid-to-centroid measurement will be small. If the resident and merchant are located close to the borders of their respective zip codes, material errors in distance measurement can arise. The potential size of these errors grows with the size of the zip code. The risk of this type of measurement error is larger in Detroit, which features an average zip code size of 18.9 square miles, than it is in New York, which features an average zip code size of 9.7 square miles. However, we have no a priori rationale supporting the belief that the distribution of residents and merchants is systematically biased towards or away from the centroid in each zip code. If we assume that residents and merchants are randomly distributed within the zip code, that assumption implies that our distance measurements across the entire sample are unbiased (even if individual measurements contain error). The 20-minute neighborhood approach avoids this particular type of measurement error altogether by not attempting to provide the information contained in a continuous measure of distance.
2. Merchants may be spatially arranged in a way that promotes “trip chaining”, which is the practice of patronizing multiple stores in a single trip. Since we treat each transaction equally, we do not account for short trips between merchants within a single shopping excursion. Again, the 20-minute neighborhood approach avoids this particular type of measurement error by not attempting to provide the information contained in a continuous measure of distance. Under that approach, transactions and their associated weights do not come into play.
3. Our measurement frame is focused on the relationship between the resident’s home and the merchant’s location. Even if residents do not live near merchants they wish to patronize, they may work close to them. Since we do not observe where residents work, it is possible we may overstate the marginal effort residents must expend to reach merchants. In this case, the 20-minute approach suffers from the same vulnerability. It is less important to have a grocery store in your neighborhood if one is easily accessible from your place of employment.
The limitations of the median “centroid-to-centroid distance” approach are far outweighed by the information contained in a continuous measure of the potential distance residents must travel. Even with these limitations, the measure offers at least a rough bucketing of distance that can be used for triage. Stark and persistent differences across zip codes and segments of the population are clear enough signals for action.