Replication instructions for “Digital Financial Services Go a Long Way: Transaction Costs and Financial Inclusion”

Pierre Bachas, Paul Gertler, Sean Higgins, Enrique Seira

Introduction

This document explains how to replicate the results in Bachas et al. (2018a). The paper can be downloaded from https://www.aeaweb.org/articles?id=10.1257/pandp.20181013, and the replication files can be downloaded from https://www.aeaweb.org/doi/10.1257/pandp.20181013.data. Some results in the paper use publicly available data and can be replicated immediately with our replication files; the remaining results use confidential administrative data, so replicating them requires first obtaining access to those data.

Folder structure

After downloading the replication code and data, unzip them into a folder on your computer. This parent folder, which directly contains the folders described below, is the main folder: set it as the global $main in the Stata .do files and as the working directory with setwd() in the .R scripts. The folders are:

adofiles contains the .ado files required to run our Stata scripts.
data contains the raw data that is publicly available. To replicate the portions of the paper using confidential data, the raw confidential data should also be placed in data.
- data/shapefiles contains the shapefiles
- data/shapefiles/Roads/build (currently empty) will contain the Open Street Map files
- Other raw data files are included directly in data (i.e., not in a subfolder)
graphs (initially empty) is the folder to which graphs produced by the scripts will be written.
logs (initially empty) is the folder in which log files will be written.
proc (initially empty) is the folder in which processed data sets will be saved.
scripts contains the replication code.
waste (initially empty) saves temporary files produced as part of the data preparation.
- waste/block_distances (initially empty) saves temporary files for distance calculations
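
Empty folders do not always survive unzipping. As a minimal sketch (not part of the replication package), the following R function records whether the folders shipped with the package are present and then creates the folders described above as empty. The function name `prepare_folders` and its argument are assumptions made for illustration; the folder names are taken from the list above.

```r
# Hypothetical helper, not part of the replication package.
prepare_folders <- function(main) {
  # Folders that should already exist after unzipping; check them first,
  # because creating data/shapefiles/Roads/build below also creates data
  shipped <- c("adofiles", "data", "scripts")
  present <- stats::setNames(dir.exists(file.path(main, shipped)), shipped)

  # Folders described above as initially (or currently) empty
  empty <- c("graphs", "logs", "proc", "waste", "waste/block_distances",
             "data/shapefiles/Roads/build")
  for (path in file.path(main, empty)) {
    dir.create(path, recursive = TRUE, showWarnings = FALSE)
  }

  present
}
```

Calling prepare_folders() with the path to the main folder returns a named logical vector; a FALSE entry means that folder was not found before the empty folders were created.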

Data

The replication files use the following data sets.

Administrative data from Bansefi

Administrative data from Bansefi is confidential, and hence is not provided with the replication data. The contact person to request access is Ana Lilia Urquieta, alurquieta@bansefi.gob.mx. The following data from Bansefi is used:

Saldos Promedio Cuentahorro.dta are average balances from accounts without debit cards from 2009-2011, provided to the authors in 2012.
Saldos Promedio Debicuenta.dta are average balances from accounts with debit cards from 2009-2011, provided to the authors in 2012.
SP_*.txt are average monthly balances from Bansefi accounts for additional years, where the * wildcard denotes the year of the data set. These were provided to the authors in 2015.
DatosGenerales.txt is an account-level data set with information about each account, provided to the authors in 2015.
MOV*.txt are raw transaction-level data, where the * wildcard denotes the year of the data set. These were provided to the authors in 2015.

Administrative data from Oportunidades

Administrative data from Oportunidades (now Prospera) is confidential, and hence is not provided with the replication data. The contact person to request access is Rogelio Grados, rogelio.grados@prospera.gob.mx. The following data from Prospera is used:

620mil_cuentas_con_MZ.dbf maps account numbers in Bansefi’s data to the Census block on which each beneficiary lives.

Shapefiles of Census blocks

We use the shapefiles giving the polygon corresponding to each Census block, which allows us to calculate the block centroid and use it as an approximation of the beneficiary’s location. These data are publicly available, but because all results using them also rely on confidential data, and because the shapefiles are large, we do not include them in the replication data.

The Census block shapefiles are publicly available from Mexico’s National Statistical Institute (INEGI); we use the version that has been compiled by Diego Valle Jones. To download them, go to https://blog.diegovalle.net/2013/06/shapefiles-of-mexico-agebs-manzanas-etc.html and enter your email address to receive the files by email. The zip file you receive should have a subfolder called scince_2010. Place this folder inside of data/shapefiles/ to run the replication code.
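
To illustrate what a block centroid is, here is a minimal base R sketch of the area-weighted centroid of a single polygon, computed with the shoelace formula. It treats coordinates as planar, which is only an approximation for longitude and latitude, and it is not the code in 14_block_centroids.R. The function name `polygon_centroid` is hypothetical.

```r
# Hypothetical helper: area-weighted centroid of one simple polygon.
# x and y are the vertex coordinates in order around the ring.
polygon_centroid <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 3)
  # Pair each vertex with the next one, wrapping around to close the ring
  x_next <- c(x[-1], x[1])
  y_next <- c(y[-1], y[1])
  # Cross products of consecutive vertices (shoelace terms)
  cross <- x * y_next - x_next * y
  area <- sum(cross) / 2  # signed area; the sign cancels in the ratios below
  c(x = sum((x + x_next) * cross) / (6 * area),
    y = sum((y + y_next) * cross) / (6 * area))
}
```

For a rectangular block the result is the midpoint of the rectangle; for irregular blocks it is the balance point of the polygon, which is why it can serve as a single representative location for everyone living on that block.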

Road network from Open Street Maps

The road network we use to calculate road distances is publicly available from Open Street Maps. The necessary file, mexico-latest.osm.pbf, can be downloaded from https://download.geofabrik.de/north-america/mexico.html. We do not include it with the replication data because it is a large file. Once downloaded, mexico-latest.osm.pbf should be placed in the folder data/shapefiles/Roads/build. To prepare the maps it is necessary to run osrminstall.cmd and osrmprepare by Huber and Rust (2016). This process is included in the do file account_block_merge.do if you set local zero_run = 1 and local first_run = 1 under // Control center. (They are currently set to 0; if rerunning the do file after initially building the maps, you will want to set them to 0 to not repeat the slow process of building the maps each time.)

Note that the map may have changed slightly relative to when we downloaded it, and hence replication results could differ slightly (but we do not expect any substantive changes). Users who have obtained access to the other administrative data and wish to exactly replicate the results in our paper should contact us and we will share the exact version of mexico-latest.osm.pbf we used.

Payment Methods Survey

The Payment Methods Survey, conducted by Oportunidades, is publicly available from https://evaluacion.prospera.gob.mx/es/eval_cuant/p_bases_cuanti.php. We have included the data set, medios_pago_titular_beneficiarios.dta, in the data folder.

Geocoordinates of ATMs

This data set, Geocoordinates_ATMs_Mexico.csv, was constructed by the authors based on publicly available information on the location of ATMs from banks’ websites. It is included in the data folder and is also available from https://doi.org/10.7910/DVN/U27JIM.

Geocoordinates of Bansefi branches

This data set, Geocoordinates_Bansefi_branches.csv, was constructed by the authors based on publicly available information on Bansefi locations. It is included in the data folder and is also available from https://doi.org/10.7910/DVN/DT81PP.

Shapefiles for Cuernavaca

The road shapefiles for Cuernavaca, used for the example map in Figure 2, are publicly available from INEGI. They are included in the data/shapefiles folder. There are two sets of shapefiles:

The 170070001l.* files contain the polygon that is the border of Cuernavaca locality. (170070001 is INEGI’s code for Cuernavaca locality.)
The 170070001v.* files contain the roads in Cuernavaca.

Scripts

More thorough descriptions of each script than those provided here are included in scripts/0_master.do.

Preliminary code

Three files in the scripts folder do not need to be run directly:

0_master.do describes each script in the order they should be run to produce all results, as well as the input and output data sets of each script.
server_header.doh sets global macros for folders, and tells Stata where to look for user-written .ado files included with the replication code. This script is called automatically by the .do files.
myfunctions.R includes functions used by the other .R scripts, and is called automatically by those scripts.

The scripts include both Stata and R code. The code has most recently been tested on Stata version 14.2 and R version 3.4.2. For Stata, any necessary user-written ado files are included in the adofiles folder, and the .do files include code to automatically look for commands in this folder. For R, the user will need to install packages used by the scripts (always under a # PACKAGES header) using install.packages().

Administrative data preparation

Inputs: raw administrative data from Bansefi. The data must be requested from Bansefi to run this replication code. The scripts are:

1_saldos_dataprep.do imports raw average balance data
2_datosgenerales_dataprep.do imports raw account-level data
3_movimientos_dataprep.do imports raw transactions data
4_sucursales_dataprep.do creates a data set with the client, account, and branch IDs for future merges
5_bimswitch.do calculates the bimester (2-month period) during which each beneficiary received a card
6_avgbal.do creates a data set with average balances by bimester
7_transactions_redef_bim.do creates a data set with transactions by redefined bimester, where the redefined bimester accounts for payments shifted to the end of the previous bimester, as described in Bachas et al. (2018b).
8_mechanical_effect.do calculates the account-level mechanical effect, as described in Bachas et al. (2018b).
9_net_savings.do calculates net savings by bimester
10_balance_checks.do creates a data set of balance checks at the transaction level
11_withdrawals_deposits_group.do groups transactions by bimester
12_transaction_bimester_relative.do creates a data set on balance checks, deposits, and withdrawals by bimester relative to receiving a card
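
The bimester used throughout these scripts is a 2-month period. As a minimal sketch, assuming bimesters follow the calendar (January and February form bimester 1, November and December form bimester 6), a date can be mapped to its bimester as follows. This calendar alignment and the function name `bimester_of` are assumptions for illustration; the sketch does not reproduce bimestrify.ado or the redefined bimester built in 7_transactions_redef_bim.do.

```r
# Hypothetical helper: map dates to (year, bimester), assuming
# calendar-aligned bimesters (months 1-2 -> 1, 3-4 -> 2, ..., 11-12 -> 6).
bimester_of <- function(date) {
  date <- as.Date(date)
  month <- as.integer(format(date, "%m"))
  year <- as.integer(format(date, "%Y"))
  # Integer division groups consecutive pairs of months
  data.frame(year = year, bimester = (month + 1L) %/% 2L)
}
```

The function is vectorized, so it can be applied to a whole column of transaction dates at once.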

Distance calculations

Inputs: raw administrative data from Bansefi and Prospera. The data must be requested from Bansefi and Prospera to run this replication code. The scripts are:

13_account_block_dataprep.R reads in raw data from Prospera with the account to Census block mapping and creates a data set of unique Census blocks on which beneficiaries live
14_block_centroids.R uses Census block shapefiles to calculate centroids of all Census blocks
15_distances_calculate.do merges block centroids with ATM and branch coordinates and calculates road distances
16_distances_append.do appends the files with distances, which were calculated in chunks

Distance densities (Figure 1)

Inputs: processed data sets from above scripts. Because this replication code depends on previous code, administrative data must be requested from Bansefi and Prospera before running these scripts. The scripts are:

17_distances_density.do graphs kernel density estimates of households’ distance to the nearest ATM and nearest Bansefi bank branch.

Example locality map (Figure 2)

Inputs: publicly available ATM and branch geocoordinates and road shapefiles. This code does not depend on any of the previous code, and hence can be run on its own to replicate our map. The replication version maps the roads, ATMs, and Bansefi branch, since these are based on public data. It does not map the household location. The scripts are:

18_locality_example_map.R maps all ATMs, the Bansefi branch, and all roads in Cuernavaca locality.

The map produced by this script includes more ATMs than Figure 2 of our paper, due to a bug in the code we used to determine which ATMs were in the locality of Cuernavaca (based on the ATMs’ geocoordinates). The 18_locality_example_map.R script instead uses the new sf::st_join (Pebesma 2018). The error only affects the map in Figure 2, and not any of the results in the paper, since our distance calculations use the full set of ATMs in the country (not restricting to particular localities, as we do for this map). The map produced by this replication code is included below:

Results from survey (Figure 3)

Inputs: Payment Methods Survey (questionnaire of beneficiaries), conducted by Oportunidades. This code does not depend on any of the previous code and uses publicly available data included in our replication files, and hence can be run on its own to replicate Figure 3. The scripts are:

19_survey_questions.do graphs transport taken and activity forgone to withdraw the transfer, before and after receiving a card

Results from administrative data and distances (Figures 4, 5, 6)

20_transaction_distance.do correlates changes in withdrawals and number of balance checks with distance gains to access account
21_savings_distance.do correlates changes in savings with distance gains to access account

Other files

`.ado` files

The .ado files in the adofiles folder are called automatically by the .do files that use them. These files are:

_gbom.ado from the egenmore package to generate first day of month (Cox 2016)
_geom.ado from the egenmore package to generate last day of month (Cox 2016)
bimestrify.ado for data cleaning (written by us)
fre.ado for ordered tabulations (Jann 2007)
geonear.ado to determine closest ATMs by geodesic distance (Picard 2012), which is an input for our dimensionality-reduction algorithm when calculating road distances
stringify.ado to convert to string and add padding (written by us)
time.ado to timestamp log and other output files (written by us)
uniquevals.ado to calculate number of unique observations (written by us)

The osrmprepare and osrmtime ado files (Huber and Rust 2016) are not included; they are installed directly as part of 15_distances_calculate.do since there are a number of ancillary files as well.
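
As an illustration of the geodesic pre-selection mentioned for geonear.ado, the sketch below finds the few candidates closest to one origin by great-circle distance, so that the slower road-distance calculation would only need to run on those candidates. It uses the haversine formula on a sphere, which is a common approximation and not necessarily what geonear.ado computes. The function names, the choice of k, and the coordinates are all made up for illustration.

```r
# Great-circle distance in kilometers (haversine formula on a sphere).
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  # pmin guards against rounding pushing sqrt(a) slightly above 1
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Hypothetical helper: the k candidates closest to one origin point.
# cand is a data frame with columns id, lat, and lon.
nearest_k <- function(origin_lat, origin_lon, cand, k = 3) {
  d <- haversine_km(origin_lat, origin_lon, cand$lat, cand$lon)
  keep <- order(d)[seq_len(min(k, nrow(cand)))]
  data.frame(id = cand$id[keep], km = d[keep])
}

# Made-up coordinates, for illustration only
atms <- data.frame(id = c("atm_1", "atm_2", "atm_3", "atm_4"),
                   lat = c(18.92, 18.95, 19.10, 18.80),
                   lon = c(-99.23, -99.20, -99.10, -99.30))
closest <- nearest_k(18.93, -99.22, atms, k = 2)
```

Restricting the road-distance step to a handful of geodesically close candidates trades a small risk of missing the true road-nearest point for a large reduction in the number of routes to compute.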

`.here`

The .here file in the main folder is included to enable the here::here() function in R to work with relative file paths.

`README`

This README file and supporting files:

README.html (can be opened in a browser and looks cleaner than the pdf)
README.pdf
README.Rmd is the original source code for generating the README in R Markdown.
README_bib.bib contains the bibliographical references for the README

Replication instructions

To replicate Figure 2, which should appear identical to the version included above:

Open 18_locality_example_map.R
Uncomment the line with setwd() by removing the #, then replace the path in quotation marks with the path to the replication folder on your computer (i.e., the folder that is a direct parent to the data and scripts folders).
If the packages under # PACKAGES are not already installed, install them with the following code:
```
install.packages(c("sf", "tidyverse", "magrittr", "here"))
```
Note that ggplot2 >= 3.0.0 is required, for geom_sf(). Since ggplot2 is included in tidyverse, if you install.packages("tidyverse") you will get the latest version of ggplot2 (Wickham 2016).
Run the revised 18_locality_example_map.R file in R.
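
If some of the packages listed under # PACKAGES are already installed, a small check avoids reinstalling them. The sketch below is one way to do this in base R; `missing_packages` and `ggplot2_ok` are hypothetical helper names, not functions from the replication scripts.

```r
# Hypothetical helper: which of the requested packages are not installed?
missing_packages <- function(pkgs) {
  installed <- vapply(pkgs, requireNamespace, logical(1), quietly = TRUE)
  pkgs[!installed]
}

# Hypothetical helper: is the installed ggplot2 recent enough for geom_sf()?
ggplot2_ok <- function() {
  requireNamespace("ggplot2", quietly = TRUE) &&
    utils::packageVersion("ggplot2") >= "3.0.0"
}
```

Passing the result of missing_packages(c("sf", "tidyverse", "magrittr", "here")) to install.packages() installs only what is absent, and ggplot2_ok() returning FALSE afterwards indicates that ggplot2 still needs to be updated.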

To replicate Figure 3:

Open 19_survey_questions.do
Uncomment the commented-out line with global main by removing the **, then replace the path in quotation marks with the path to the replication folder on your computer (i.e., the folder that is a direct parent to the data and scripts folders).
Run the revised 19_survey_questions.do in Stata.

To replicate Figures 1, 4, 5, and 6, which use confidential administrative data:

After requesting and obtaining the administrative data, make sure the data sets have the same names as those used in the replication scripts, and place them in the data folder.
Repeat the instructions above to specify your file path in each script.
For R scripts, install additional packages as needed (the packages are always listed under # PACKAGES)
Run the replication scripts in the order indicated by the numbers in their file names.
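
Depending on the system, an alphabetical file listing can place 10_balance_checks.do before 1_saldos_dataprep.do, so it helps to sort by the numeric prefix explicitly. The following is a minimal base R sketch of that ordering; the function name `order_scripts` is hypothetical, and the function only returns the ordered file names (it does not run anything, since the .do files must be run in Stata).

```r
# Hypothetical helper: order script file names by their numeric prefix.
order_scripts <- function(files) {
  name <- basename(files)
  # Keep only files that start with a number followed by an underscore
  numbered <- grepl("^[0-9]+_", name)
  prefix <- as.integer(sub("_.*$", "", name[numbered]))
  files <- files[numbered]
  # 0_master.do describes the run order and is not itself a step
  keep <- prefix > 0
  files[keep][order(prefix[keep])]
}
```

For example, order_scripts(list.files("scripts")) gives the run order when the working directory is the main folder; helper files without a numeric prefix, such as myfunctions.R and server_header.doh, are dropped because they are called automatically.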

Contact

The replication code was written by Pierre Bachas and Sean Higgins, who can be contacted at pbachas@worldbank.org and seanhiggins@berkeley.edu.

References

Bachas, Pierre, Paul Gertler, Sean Higgins, and Enrique Seira. 2018a. “Digital Financial Services Go a Long Way: Transaction Costs and Financial Inclusion.” American Economic Association Papers & Proceedings 108: 444–48.

———. 2018b. “How Debit Cards Enable the Poor to Save More.” NBER Working Paper 23252.

Cox, Nicholas. 2016. “EGENMORE: Stata modules to extend the generate function.” Statistical Software Components, Boston College Department of Economics.

Huber, Stephan, and Christoph Rust. 2016. “Calculate Travel Time and Distance with OpenStreetMap Data Using the Open Source Routing Machine (OSRM).” Stata Journal 16: 416–23.

Jann, Ben. 2007. “FRE: Stata module to display one-way frequency table.” Statistical Software Components, Boston College Department of Economics.

Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10: 439–46.

Picard, Robert. 2012. “GEONEAR: Stata Module to Find Nearest Neighbors Using Geodetic Distances.” Statistical Software Components, Boston College Department of Economics.

Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag.
