This document explains how to replicate the results in Bachas et al. (2018a). The paper can be downloaded from https://www.aeaweb.org/articles?id=10.1257/pandp.20181013, and the replication files can be downloaded from https://www.aeaweb.org/doi/10.1257/pandp.20181013.data. Because some results in the paper use publicly available data while others use confidential administrative data, some results can be immediately replicated using our replication files, but replicating the remaining results requires obtaining access to administrative data.
After downloading the replication code and data, unzip the archive into a folder on your computer. The parent folder that contains the folders described below is the main folder: set it as the global $main in the Stata .do files and as the working directory using setwd() in the .R scripts. These folders are:
- adofiles contains the .ado files required to run our Stata scripts.
- data contains the raw data that is publicly available. To replicate the portions of the paper using confidential data, the raw confidential data should also be placed in data.
  - data/shapefiles contains the shapefiles.
  - data/shapefiles/Roads/build (currently empty) will contain the OpenStreetMap files.
  - All other data files belong directly in data (i.e., not in a subfolder).
- graphs (initially empty) is the folder to which graphs produced by the scripts will be written.
- logs (initially empty) is the folder in which log files will be written.
- proc (initially empty) is the folder in which processed data sets will be saved.
- scripts contains the replication code.
- waste (initially empty) saves temporary files produced as part of the data preparation.
  - waste/block_distances (initially empty) saves temporary files for distance calculations.

The replication files use the following data sets.
Administrative data from Bansefi is confidential, and hence is not provided with the replication data. The contact person to request access is Ana Lilia Urquieta, alurquieta@bansefi.gob.mx. The following data from Bansefi is used:
- Saldos Promedio Cuentahorro.dta are average balances from accounts without debit cards from 2009-2011, provided to the authors in 2012.
- Saldos Promedio Debicuenta.dta are average balances from accounts with debit cards from 2009-2011, provided to the authors in 2012.
- SP_*.txt are average monthly balances from Bansefi accounts for additional years, where the * wildcard denotes the year of the data set. These were provided to the authors in 2015.
- DatosGenerales.txt is an account-level data set with information about each account, provided to the authors in 2015.
- MOV*.txt are raw transaction-level data, where the * wildcard denotes the year of the data set. These were provided to the authors in 2015.

Administrative data from Oportunidades (now Prospera) is confidential, and hence is not provided with the replication data. The contact person to request access is Rogelio Grados, rogelio.grados@prospera.gob.mx. The following data from Prospera is used:
- 620mil_cuentas_con_MZ.dbf, which maps account numbers in Bansefi’s data to the Census block on which that beneficiary lives.

We use the shapefiles giving the polygon corresponding to each Census block, which allows us to calculate the block centroid and use it as an approximation of the beneficiary’s location. These data are publicly available, but because all results using them combine them with confidential data, and because the shapefiles are large, we do not include them in the replication data.
The Census block shapefiles are publicly available from Mexico’s National Statistical Institute (INEGI); we use the version that has been compiled by Diego Valle Jones. To download them, go to https://blog.diegovalle.net/2013/06/shapefiles-of-mexico-agebs-manzanas-etc.html and enter your email address to receive the files. The zip file you receive should have a subfolder called scince_2010. Place this folder inside of data/shapefiles/ to run the replication code.
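As an illustration of the block-centroid approximation described above, the following is a minimal sketch in base R of a polygon centroid computed with the shoelace formula. It is not taken from the replication scripts: the function name polygon_centroid and the rectangle coordinates are hypothetical, and the sketch assumes planar (projected) coordinates. With real shapefiles one would more likely rely on a spatial package, for example sf::st_centroid().

```r
# Centroid of a simple (non-self-intersecting) polygon via the shoelace formula.
# `x` and `y` are vertex coordinates in order; the ring need not be closed.
polygon_centroid <- function(x, y) {
  stopifnot(length(x) == length(y), length(x) >= 3)
  # Close the ring if the first vertex is not repeated at the end
  if (x[1] != x[length(x)] || y[1] != y[length(y)]) {
    x <- c(x, x[1])
    y <- c(y, y[1])
  }
  n <- length(x)
  i <- seq_len(n - 1)
  # Signed cross products of consecutive vertices
  cross <- x[i] * y[i + 1] - x[i + 1] * y[i]
  area <- sum(cross) / 2
  stopifnot(area != 0)
  c(x = sum((x[i] + x[i + 1]) * cross) / (6 * area),
    y = sum((y[i] + y[i + 1]) * cross) / (6 * area))
}

# Hypothetical Census block: a 2 x 4 rectangle in projected coordinates
block_x <- c(0, 2, 2, 0)
block_y <- c(0, 0, 4, 4)
centroid <- polygon_centroid(block_x, block_y)
print(centroid)
```

For irregular blocks the centroid can fall outside the polygon, which is one reason it is only an approximation of where a beneficiary lives.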
The road network we use to calculate road distances is publicly available from OpenStreetMap. The necessary file, mexico-latest.osm.pbf, can be downloaded from https://download.geofabrik.de/north-america/mexico.html. We do not include it with the replication data because it is a large file. Once downloaded, mexico-latest.osm.pbf should be placed in the folder data/shapefiles/Roads/build. To prepare the maps it is necessary to run osrminstall.cmd and osrmprepare by Huber and Rust (2016). This process is included in the do file account_block_merge.do if you set local zero_run = 1 and local first_run = 1 under // Control center. (Both are currently set to 0. After the maps have been built once, set them back to 0 so that the slow map-building step is not repeated on every run.)
Note that the map may have changed slightly relative to when we downloaded it, and hence replication results could differ slightly (but we do not expect any substantive changes). Users who have obtained access to the other administrative data and wish to exactly replicate the results in our paper should contact us and we will share the exact version of mexico-latest.osm.pbf we used.
The Payment Methods Survey, conducted by Oportunidades, is publicly available from https://evaluacion.prospera.gob.mx/es/eval_cuant/p_bases_cuanti.php. We have included the data set, medios_pago_titular_beneficiarios.dta, in the data folder.
This data set, Geocoordinates_ATMs_Mexico.csv, was constructed by the authors based on publicly available information on the location of ATMs from banks’ websites. It is included in the data folder and is also available from https://doi.org/10.7910/DVN/U27JIM.

This data set, Geocoordinates_Bansefi_branches.csv, was constructed by the authors based on publicly available information on Bansefi locations. It is included in the data folder and is also available from https://doi.org/10.7910/DVN/DT81PP.
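Before running any scripts it can help to confirm which of the data sets listed above are actually present. The sketch below is one possible way to do that in base R and is not part of the replication files: the helper check_inputs is hypothetical, and the demonstration uses a temporary folder standing in for data so that it runs anywhere. The file names are those given in this README; the wildcard files (SP_*.txt and MOV*.txt) are left out and could be checked separately with Sys.glob().

```r
# Report which expected input files are present in a data folder.
check_inputs <- function(data_dir, files) {
  present <- file.exists(file.path(data_dir, files))
  data.frame(file = files, present = present, stringsAsFactors = FALSE)
}

# Public data sets included with the replication files
public_files <- c(
  "medios_pago_titular_beneficiarios.dta",
  "Geocoordinates_ATMs_Mexico.csv",
  "Geocoordinates_Bansefi_branches.csv"
)

# Confidential data sets that must be requested from Bansefi and Prospera
confidential_files <- c(
  "Saldos Promedio Cuentahorro.dta",
  "Saldos Promedio Debicuenta.dta",
  "DatosGenerales.txt",
  "620mil_cuentas_con_MZ.dbf"
)

# Demonstration: a temporary folder containing only the public files
demo_dir <- file.path(tempdir(), "data_demo")
dir.create(demo_dir, showWarnings = FALSE)
invisible(file.create(file.path(demo_dir, public_files)))

status <- check_inputs(demo_dir, c(public_files, confidential_files))
print(status)
```

In actual use, data_dir would be the data folder inside your main replication folder rather than a temporary directory.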
More thorough descriptions of each script than those provided here are included in scripts/0_master.do.
Three files in the scripts folder do not need to be run directly:
- 0_master.do describes each script in the order they should be run to produce all results, as well as the input and output data sets of each script.
- server_header.doh sets global macros for folders, and tells Stata where to look for user-written .ado files included with the replication code. This script is called automatically by the .do files.
- myfunctions.R includes functions used by the other .R scripts, and is called automatically by those scripts.

The scripts include both Stata and R code. The code has most recently been tested on Stata version 14.2 and R version 3.4.2. For Stata, any necessary user-written ado files are included in the adofiles folder, and the .do files include code to automatically look for commands in this folder. For R, the user will need to install packages used by the scripts (always under a # PACKAGES header) using install.packages().
Inputs: raw administrative data from Bansefi. The data must be requested from Bansefi to run this replication code. The scripts are:
- 1_saldos_dataprep.do imports raw average balance data.
- 2_datosgenerales_dataprep.do imports raw account-level data.
- 3_movimientos_dataprep.do imports raw transactions data.
- 4_sucursales_dataprep.do creates a data set with the client, account, and branch IDs for future merges.
- 5_bimswitch.do calculates the bimester (2-month period) during which each beneficiary received a card.
- 6_avgbal.do creates a data set with average balances by bimester.
- 7_transactions_redef_bim.do creates a data set with transactions by redefined bimester, where the redefined bimester accounts for payments shifted to the end of the previous bimester, as described in Bachas et al. (2018b).
- 8_mechanical_effect.do calculates the account-level mechanical effect, as described in Bachas et al. (2018b).
- 9_net_savings.do calculates net savings by bimester.
- 10_balance_checks.do creates a data set of balance checks at the transaction level.
- 11_withdrawals_deposits_group.do groups transactions by bimester.
- 12_transaction_bimester_relative.do creates a data set on balance checks, deposits, and withdrawals by bimester relative to receiving a card.

Inputs: raw administrative data from Bansefi and Prospera. The data must be requested from Bansefi and Prospera to run this replication code. The scripts are:
- 13_account_block_dataprep.R reads in raw data from Prospera with the account to Census block mapping and creates a data set of unique Census blocks on which beneficiaries live.
- 14_block_centroids.R uses Census block shapefiles to calculate centroids of all Census blocks.
- 15_distances_calculate.do merges block centroids with ATM and branch coordinates and calculates road distances.
- 16_distances_append.do appends together the files with distances, which were calculated in chunks.

Inputs: processed data sets from above scripts. Because this replication code depends on previous code, administrative data must be requested from Bansefi and Prospera before running these scripts. The scripts are:
- 17_distances_density.do graphs kernel density estimates of households’ distance to the nearest ATM and nearest Bansefi bank branch.

Inputs: publicly available ATM and branch geocoordinates and road shapefiles. This code does not depend on any of the previous code, and hence can be run on its own to replicate our map. The replication version maps roads, ATMs, and the Bansefi branch, since these are based on public data. It does not map the household location. The scripts are:
- 18_locality_example_map.R maps all ATMs, the Bansefi branch, and all roads in the locality of Cuernavaca.

The map produced by this script includes more ATMs than Figure 2 of our paper, due to a bug in the code we used to determine which ATMs were in the locality of Cuernavaca (based on the ATMs’ geocoordinates). The 18_locality_example_map.R script instead uses the new sf::st_join (Pebesma 2018). The error only affects the map in Figure 2, and not any of the results in the paper, since our distance calculations use the full set of ATMs in the country (not restricting to particular localities, as we do for this map). The map produced by this replication code is included below:
Inputs: Payment Methods Survey (questionnaire of beneficiaries), conducted by Oportunidades. This code does not depend on any of the previous code and uses publicly available data included in our replication files, and hence can be run on its own to replicate Figure 3. The scripts are:
- 19_survey_questions.do graphs transport taken and activity forgone to withdraw the transfer, before and after receiving a card.

Inputs: processed data sets from above scripts. Because this replication code depends on previous code, administrative data must be requested from Bansefi and Prospera before running these scripts. The scripts are:
- 20_transaction_distance.do correlates changes in withdrawals and number of balance checks with distance gains to access account.
- 21_savings_distance.do correlates changes in savings with distance gains to access account.

.ado files
The .ado files in the adofiles folder are called automatically by the .do files that use them. These files are:
- _gbom.ado from the egenmore package to generate first day of month (Cox 2016)
- _geom.ado from the egenmore package to generate last day of month (Cox 2016)
- bimestrify.ado for data cleaning (written by us)
- fre.ado for ordered tabulations (Jann 2007)
- geonear.ado to determine closest ATMs by geodesic distance (Picard 2012), which is an input for our dimensionality-reduction algorithm when calculating road distances
- stringify.ado to convert to string and add padding (written by us)
- time.ado to timestamp log and other output files (written by us)
- uniquevals.ado to calculate number of unique observations (written by us)

The osrmprepare and osrmtime ado files (Huber and Rust 2016) are not included; they are installed directly as part of 15_distances_calculate.do since there are a number of ancillary files as well.
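To illustrate the idea of using straight-line distance to narrow down candidate ATMs before computing road distances, here is a minimal base R sketch. It is an assumption-laden illustration, not the authors' algorithm and not how geonear works internally: it uses the haversine (spherical) formula rather than geodetic distances on an ellipsoid, the helper names haversine_km and nearest_k are hypothetical, and the coordinates are invented.

```r
# Great-circle (haversine) distance in km between points in decimal degrees.
# Vectorized over the second point, so one origin can be compared to many.
haversine_km <- function(lat1, lon1, lat2, lon2, radius_km = 6371) {
  to_rad <- pi / 180
  dlat <- (lat2 - lat1) * to_rad
  dlon <- (lon2 - lon1) * to_rad
  a <- sin(dlat / 2)^2 +
    cos(lat1 * to_rad) * cos(lat2 * to_rad) * sin(dlon / 2)^2
  2 * radius_km * asin(pmin(1, sqrt(a)))
}

# Indices of the k candidate points closest to one origin
nearest_k <- function(lat0, lon0, lats, lons, k = 2) {
  d <- haversine_km(lat0, lon0, lats, lons)
  head(order(d), k)
}

# Hypothetical coordinates (not taken from the replication data)
atm_lat <- c(18.92, 18.95, 19.40, 18.93)
atm_lon <- c(-99.23, -99.20, -99.10, -99.25)

# Keep only the two ATMs nearest to a hypothetical block centroid
closest <- nearest_k(18.921, -99.231, atm_lat, atm_lon, k = 2)
print(closest)
```

Road distances would then only need to be computed for the retained candidates, which is one plausible way such a pre-filter reduces the number of routing queries.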
.here
The .here file in the main folder is included to enable the here::here() function in R to work with relative file paths.
README
This README file and supporting files:
- README.html (can be opened in a browser and looks cleaner than the pdf)
- README.pdf
- README.Rmd is the original source code for generating the README in R Markdown.
- README_bib.bib contains the bibliographical references for the README.
To replicate Figure 2, which should appear identical to the version included above:
1. Open 18_locality_example_map.R.
2. Uncomment the setwd() line by removing the #, then replace the path in quotation marks with the path to the replication folder on your computer (i.e., the folder that is a direct parent to the data and scripts folders).
3. If the packages under # PACKAGES are not already installed, install them with the following code:
   install.packages(c("sf", "tidyverse", "magrittr", "here"))
   Note that ggplot2 >= 3.0.0 is required for geom_sf(). Because ggplot2 is included in tidyverse, running install.packages("tidyverse") installs the latest version of ggplot2 (Wickham 2016).
4. Run the revised 18_locality_example_map.R file in R.
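The package check described above can be sketched as follows, so that only missing packages are installed. This is a hedged convenience snippet rather than part of the replication code: the helper missing_packages is hypothetical, and the install.packages() call is left commented out so that nothing is installed without your consent.

```r
# Packages named in the install.packages() call above
needed <- c("sf", "tidyverse", "magrittr", "here")

# Return the subset of `pkgs` that is not currently installed.
# system.file() returns "" for a package that cannot be found.
missing_packages <- function(pkgs) {
  installed <- vapply(
    pkgs,
    function(p) nzchar(system.file(package = p)),
    logical(1)
  )
  pkgs[!installed]
}

to_install <- missing_packages(needed)
if (length(to_install) > 0) {
  message("Missing packages: ", paste(to_install, collapse = ", "))
  # install.packages(to_install)  # uncomment to install
}

# geom_sf() requires ggplot2 >= 3.0.0; warn if an older version is found
if (nzchar(system.file(package = "ggplot2")) &&
    utils::packageVersion("ggplot2") < "3.0.0") {
  warning("ggplot2 >= 3.0.0 is required for geom_sf(); please update it.")
}
```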
To replicate Figure 3:
1. Open 19_survey_questions.do.
2. Uncomment the global main line by removing the **, then replace the path in quotation marks with the path to the replication folder on your computer (i.e., the folder that is a direct parent to the data and scripts folders).
3. Run 19_survey_questions.do in Stata.

To replicate Figures 1, 4, 5, and 6, which use confidential administrative data:
1. Place the confidential data in the data folder.
2. Install the R packages used by the .R scripts (listed under # PACKAGES).

The replication code was written by Pierre Bachas and Sean Higgins, who can be contacted at pbachas@worldbank.org and seanhiggins@berkeley.edu.
Bachas, Pierre, Paul Gertler, Sean Higgins, and Enrique Seira. 2018a. “Digital Financial Services Go a Long Way: Transaction Costs and Financial Inclusion.” American Economic Association Papers & Proceedings 108: 444–48.
———. 2018b. “How Debit Cards Enable the Poor to Save More.” NBER Working Paper 23252.
Cox, Nicholas. 2016. “EGENMORE: Stata modules to extend the generate function.” Statistical Software Components, Boston College Department of Economics.
Huber, Stephan, and Christoph Rust. 2016. “Calculate Travel Time and Distance with OpenStreetMap Data Using the Open Source Routing Machine (OSRM).” Stata Journal 16: 416–23.
Jann, Ben. 2007. “FRE: Stata module to display one-way frequency table.” Statistical Software Components, Boston College Department of Economics.
Pebesma, Edzer. 2018. “Simple Features for R: Standardized Support for Spatial Vector Data.” The R Journal 10: 439–46.
Picard, Robert. 2012. “GEONEAR: Stata Module to Find Nearest Neighbors Using Geodetic Distances.” Statistical Software Components, Boston College Department of Economics.
Wickham, Hadley. 2016. ggplot2: Elegant Graphics for Data Analysis. New York: Springer-Verlag.