What is Propensity Score Matching (PSM)?
PSM is a statistical matching technique that attempts to estimate the effect of a treatment, policy or other intervention by accounting for the covariates that predict receiving the treatment. See, for example, Rosenbaum and Rubin (1983) – the pioneers of PSM.
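(For the technically minded: Rosenbaum and Rubin define the propensity score as e(x) = Pr(Z = 1 | X = x), the probability of receiving the treatment given the observed covariates x, and show that matching on this single score balances the covariates between the treated and comparison groups.)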
Why use PSM?
Many of you will have been to a particular university or school and achieved a particular result. Have you ever wondered what the result would have been if you had attended somewhere else? To determine this you would need to account for the covariates, using information on people like you who studied the same course, and then you could estimate this counterfactual outcome using PSM.
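(Put another way: each of us has two potential results – the one under the choice we actually made and the one under the alternative – but only one of them is ever observed. PSM estimates the missing, counterfactual one from matched individuals with similar propensity scores who made the other choice.)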
What do you need to do PSM?
· A rich data source (i.e. lots of information that predicts the outcome of interest);
· An outcome of interest;
· A “good” predictive model (based on the rich data source and outcome) – see an earlier blog, “My suggested strategy for building a “good” predictive model” (a minimal sketch of this step follows this list);
· A program (see the appendix for this); and
· The necessary software platform, e.g. the SAS Language.
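To make the “good” predictive model step concrete, here is a minimal sketch in SAS (the data set mydata, the treatment flag treated and the covariates age and sex are hypothetical names; the appendix gives the full, general program). The propensity scores are simply the predicted probabilities from a logistic regression of the treatment indicator on the covariates:

proc logistic data=mydata descending;
  class sex / param=ref ref=first;
  model treated = age sex;          /* model who receives the treatment     */
  output out=propen prob=pscore;    /* pscore = estimated propensity score  */
run;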
Some examples of PSM being used (and where it could be used)
1. In justice, policy makers wanted to assess the success of judicial mediation, and the analysts needed to remove confounding – they used PSM (Ministry of Justice, 2010).
2. In medicine, an analyst carrying out a case-control study needs an accurate estimate of dose response. Cases are matched to controls on factors such as age, gender and smoking status, because these are confounding variables that must be accounted for to reduce bias – PSM was used to do this (Foster, 2003).
3. In education, the performance of institutions such as universities and colleges is of interest to stakeholders who require accurate estimates of outcomes such as retention rates. Morton et al. (2010) controlled for differences in student background characteristics (the covariates) and performed PSM on the Scottish cohort of students to estimate their counterfactual outcome.
4. In finance, the consumer magazine Which? reported “High-street banks failing on customer satisfaction”, and I subsequently wrote that they were comparing apples and pears (“Big traditional banks worst on customer satisfaction” or should that be “Comparing apples and pears”?). In particular, I said that “Any particular bank in the league table will offer different products, and have different customers, to any other bank in the list – after all, they need to do this to suit their customers and to gain competitive advantage through niche products. But I believe this product offering, to different customers, has the impact that the age and gender make-up of customers at some banks will be different to the age and gender make-up of customers at other banks.” You can see where this is going, can’t you? They would get a better estimate of customer satisfaction per se after controlling for the make-up of their customers and the different products that they offer. A researcher could use PSM, say by looking at just “the leaders” (First Direct) and “the laggers” (Santander). But practically I appreciate that the data may not be captured, and even if it were, it would probably have restricted availability (to maintain competitive advantage, and so on).
What issues have I heard about when using PSM?
· PSM doesn’t consider the multilevel nature of the data. [My answer to this] You would require a considerable amount of qualitative and quantitative data to take account of the multilevel nature, but that could be an avenue for future work!
· You can get the same propensity scores for different combinations of predictor variables. [My answer to this] Yes, I accept that this could be the case. But I still maintain that it provides a method of determining the counterfactual outcome, and is better than nothing at all.
· The hot deck procedure (see Penny et al, 2007) is more robust than PSM. [My answer to this] It’s more complicated to apply, and takes up more resources. But see the references below (and maybe I will write a future blog on this).
References
Coca-Perraillon, M. (2007) Local and Global Optimal Propensity Score Matching. SAS Global Forum 2007, Orlando, Florida, April 16th–19th 2007.
Foster, E. M. (2003) Propensity Score Matching: An Illustrative Analysis of Dose Response. Medical Care 41(10), 1183-1192.
Ministry of Justice (2010) Evaluating the use of judicial mediation in Employment Tribunals. Ministry of Justice Research Series 7/10.
Morton, I. D. (2009) The Use of Hot Decking and Propensity Score Matching in Comparing Student Outcomes. MSc Dissertation, Edinburgh Napier University.
Morton, I., Penny, K. I., Ashraf, M. Z., and Duffy, J. C. (2010) An analysis of student retention rates using propensity score matching. SAES Working Paper Series, Edinburgh Napier University.
Penny, K. I., Ashraf, M. Z., and Duffy, J. C. (2007) The Use of Hot Deck Imputation to Compare Performance of Further Education Colleges. Journal of Computing and Information Technology 15(4), 313-318.
Rosenbaum, P. R. and Rubin, D. B. (1983) The central role of the propensity score in observational studies for causal effects. Biometrika 70(1), 41-56.
Ian Morton has built propensity scoring models for the financial services sector, for a utility company, and for the public sector. He has given a number of presentations on the technique of propensity score matching, and has also co-authored a forthcoming peer-reviewed journal article. It’s not published yet (as of May 2013), but for a flavour of its contents you could look at either of the two references above with my name on them.
(This is my personal blog; views are my own and not those of my present or past employers.)
Appendix – Here is the complete SAS program
/*****************************************************************/
/* here is the Coca-Perraillon (2007) matching macro             */
/* see the references to that paper                              */
/* it does nearest neighbour, caliper and radius matching        */
/*****************************************************************/
%macro PSMatching(datatreatment=, datacontrol=, method=, numberofcontrols=, caliper=, replacement=);
/* Create copies of the treated units if N > 1 */
data _Treatment0(drop= i);
set &datatreatment;
do i= 1 to &numberofcontrols;
RandomNumber= ranuni(12345);
output;
end;
run;
/* Randomly sort both datasets */
proc sort data= _Treatment0 out= _Treatment(drop= RandomNumber);
by RandomNumber;
run;
data _Control0;
set &datacontrol;
RandomNumber= ranuni(45678);
run;
proc sort data= _Control0 out= _Control(drop= RandomNumber);
by RandomNumber;
run;
data Matched(keep = IdSelectedControl MatchedToTreatID);
length pscoreC 8;
length idC 8;
/* Load Control dataset into the hash object */
if _N_= 1 then do;
declare hash h(dataset: "_Control", ordered: 'no');
declare hiter iter('h');
h.defineKey('idC');
h.defineData('pscoreC', 'idC');
h.defineDone();
call missing(idC, pscoreC);
end;
/* Open the treatment */
set _Treatment;
%if %upcase(&method) ~= RADIUS %then %do;
retain BestDistance 99;
%end;
/* Iterate over the hash */
rc= iter.first();
if (rc=0) then BestDistance= 99;
do while (rc = 0);
/* Caliper */
%if %upcase(&method) = CALIPER %then %do;
if (pscoreT - &caliper) <= pscoreC <= (pscoreT + &caliper) then do;
ScoreDistance = abs(pscoreT - pscoreC);
if ScoreDistance < BestDistance then do;
BestDistance = ScoreDistance;
IdSelectedControl = idC;
MatchedToTreatID = idT;
end;
end;
%end;
/* NN */
%if %upcase(&method) = NN %then %do;
ScoreDistance = abs(pscoreT - pscoreC);
if ScoreDistance < BestDistance then do;
BestDistance = ScoreDistance;
IdSelectedControl = idC;
MatchedToTreatID = idT;
end;
%end;
%if %upcase(&method) = NN or %upcase(&method) = CALIPER %then %do;
rc = iter.next();
/* Output the best control and remove it */
if (rc ~= 0) and BestDistance ~= 99 then do;
output;
%if %upcase(&replacement) = NO %then %do;
rc1 = h.remove(key: IdSelectedControl);
%end;
end;
%end;
/* Radius */
%if %upcase(&method) = RADIUS %then %do;
if (pscoreT - &caliper) <= pscoreC <= (pscoreT + &caliper) then do;
IdSelectedControl = idC;
MatchedToTreatID = idT;
output;
end;
rc = iter.next();
%end;
end;
run;
/* Delete temporary tables. Comment this step out when debugging */
proc datasets;
delete _:(gennum=all);
run;
quit;
%mend PSMatching;
/****************************************/
/* that’s the end of the matching macro */
/****************************************/
/***********************************************/
/* this part builds the propensity score model */
/***********************************************/
PROC LOGISTIC DATA=<dataset> DESCENDING;
class <class variables> / param=ref ref=first;
model <outcome> = <independent variables>
      / SELECTION=STEPWISE RISKLIMITS LACKFIT RSQUARE PARMLABEL;
OUTPUT OUT=Propen prob=prob;
RUN;
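/* Note: in this program <outcome> is the variable identifying who received */
/* the treatment (it takes the values "Treatment" and "Control" in the data */
/* steps below), and prob holds the fitted probability from the model, i.e. */
/* the propensity score for each record.                                    */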
proc sort data=propen;by <outcome>;run;
/* set up the data for the matching macro in two separate data sets */
data treatment(rename=(prob=pscoreT));
set propen;
idT=_n_;
if <outcome> = "Treatment" then output treatment;
run;
data control(rename=(prob=pscoreC));
set propen;
idC=_n_;
if <outcome> = "Control" then output control;
run;
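/* note: the matching macro expects these exact variable names - pscoreT */
/* and idT in the treatment data set, and pscoreC and idC in the control */
/* data set                                                              */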
/*****************************************************/
/* this part does the actual matching */
/* I have shown CALIPER METHOD of doing the matching */
/* caliper of 0.0001 gets n treatments out of m */
/*****************************************************/
%PSMatching(datatreatment=treatment, datacontrol=control, method=caliper,
            numberofcontrols=1, caliper=0.0001, replacement=no);
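/* the macro writes its results to a data set called Matched, which holds */
/* IdSelectedControl and MatchedToTreatID (see the keep= in the macro)    */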
proc sort data=matched; by idselectedcontrol; run;
/* need to rename to allow merging */
data caliper1(rename=(idselectedcontrol=idC)); set matched; run;
/* merge the original file with the matched file */
data merged(keep=result result2 idC matchedtotreatid);
merge control(in=a) caliper1(in=b);
by idC;
if a and b;
run;
/*****************************************************************/
/* now go and summarise the results                               */
/* this part produces the estimate of the counterfactual outcome  */
/*****************************************************************/
proc summary data=merged nway missing;
class matchedtotreatid;
var result2;
output out=caliper2(drop=_type_) sum=;
run;
proc summary data=caliper2 nway missing;
var result2;
output out=caliper3(drop=_type_) sum=;
run;
data caliper4;
set caliper3;
answer=(_freq_-result2)/_freq_;
run;
proc print; title "The counterfactual outcome using caliper=0.0001"; run;
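/**********************************************************************/
/* In words (assuming result2 is a 0/1 flag for the event of          */
/* interest, e.g. withdrawal, and one matched control per treated     */
/* unit as set by numberofcontrols=1): caliper2 holds, for each       */
/* matched treated unit, the sum of result2 over its matched          */
/* control(s); caliper3 adds these up, with _FREQ_ counting the       */
/* matched treated units; so answer is the proportion of matched      */
/* controls without the event - the estimated counterfactual rate     */
/* for the treated group.                                             */
/**********************************************************************/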