When reviewing impact analysis results, you should make sure that the comparison group calibration error -- the difference between the predicted and actual persistence rates of the matched comparison group -- is minimal; our recommended rule of thumb is < 3% for N > 500. The comparison group calibration error can be calculated from the Raw Data File, which you can download from the Initiative Impact page:
Calibration Error = ABS( "Comparison Group Outcome (Predicted)" – "Comparison Group Outcome (Actual)" )
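The formula above can be sketched in a few lines of Python. This is an illustrative helper, not part of the product; the percentages in the example are made up, and only the two column names come from the formula.

```python
def calibration_error(predicted: float, actual: float) -> float:
    """Absolute difference between the comparison group's predicted and
    actual persistence rates (the two outcome columns in the Raw Data File)."""
    return abs(predicted - actual)

# Example: predicted 78.2% vs. actual 80.1% persistence -> 1.9 points,
# which falls under the recommended < 3% rule of thumb for N > 500.
err = calibration_error(0.782, 0.801)
```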
In many cases, calibration issues can be ignored: impact results are based on the difference of differences between actual and predicted persistence rates for the matched groups (see “How is Persistence Lift Calculated” to learn more), so calibration issues affecting both matched groups should cancel out as long as the participant and comparison groups’ eligibility criteria are similar and appropriately defined in your uploaded initiative data file. For example, suppose a highly effective program is available at the same time as the initiative you are analyzing in Impact and affects a large portion of the student population; if the PPSM models do not include a variable representing participation in that program, it can cause a calibration issue. However, the difference-of-differences impact calculation absorbs this type of calibration error, and the calibration error warning message can be ignored.
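The cancellation described above can be shown with a short numeric sketch. The structure follows the difference-of-differences description in this article; the function name and all numbers are illustrative assumptions, not the product's actual implementation.

```python
def persistence_lift(part_actual, part_pred, comp_actual, comp_pred):
    """Difference of differences: the participant group's gap between
    actual and predicted persistence, minus the comparison group's gap."""
    return (part_actual - part_pred) - (comp_actual - comp_pred)

# Suppose an unmodeled campus-wide program lifts BOTH groups' actual
# persistence by the same 2 points above what the PPSM models predict:
bias = 0.02
lift = persistence_lift(0.85 + bias, 0.80, 0.82 + bias, 0.80)
# The shared 2-point bias appears in both gaps and cancels, so the lift
# equals (0.85 - 0.80) - (0.82 - 0.80) = 0.03, as if the bias were absent.
```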
In other cases, legitimate calibration issues can arise for the following reasons:
Stringent eligibility criteria for participant and comparison groups are not accounted for in the PPSM models. Example: an initiative that targets a smaller, specific subgroup of students, such as on-ground students who skip multiple classes, when no PPSM model variable captures attendance. This problem can be less severe for students in mixed modalities, since LMS engagement features can serve as a surrogate variable for attendance.
Eligibility criteria are directly related to the persistence outcome being measured. For example, suppose treatment is defined as meeting with an advisor, and the eligible comparison group is defined as students who could not register because the registration hold was lifted only for those who met with advisors. In that case, future information leaks into the group definitions, and the impact analysis results would be inaccurate.
In pre-post matching, widely varying persistence trends or data non-stationarity over time (e.g., changes in registration policies, academic standing criteria, or student population mix) can cause calibration errors.
Small N can cause calibration issues, since the variance of group-level predictions increases as N decreases.
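The small-N point can be quantified with a standard statistical sketch (an assumption for illustration, not the product's internal calculation): if each student persists independently with probability p, the standard error of the group's observed persistence rate scales as 1/sqrt(N), so small groups show larger chance gaps between predicted and actual rates even when the model is well calibrated.

```python
import math

def standard_error(p: float, n: int) -> float:
    """Standard error of an observed persistence rate for a group of n
    students, each persisting independently with probability p."""
    return math.sqrt(p * (1 - p) / n)

# For p = 0.8, a group of 2000 students vs. a group of 50:
se_large = standard_error(0.8, 2000)  # about 0.9 percentage points
se_small = standard_error(0.8, 50)    # about 5.7 percentage points
```

At N = 50, a 3-point calibration gap is well within normal sampling noise, which is one reason the < 3% rule of thumb is stated for N > 500.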
In general, the more the participant and comparison groups differ in ways not represented by the PPSM model variables, the more caution you should exercise when interpreting calibration errors.