Volume 16, Issue 3, Pages 295-300, March 2010
Patient-Reported Outcomes for Acute Graft-versus-Host Disease Prevention and Treatment Trials
Patient-reported outcomes (PROs) such as health-related quality of life, functional status, and symptom burden have been recognized by the U.S. Food and Drug Administration (FDA) as legitimate measures of clinical benefit for sponsors seeking drug approval. However, in practice, very few agents have been approved based on these endpoints. Successful use of PROs in registration trials requires rigorous methods to overcome numerous logistic and analytic barriers. Acute graft-versus-host disease (aGVHD) is associated with high morbidity and mortality, and its prevention and treatment are the goals of many clinical trials in the hematopoietic cell transplantation (HCT) research community. This article summarizes issues to be considered in the use of PROs as endpoints in aGVHD prevention and treatment trials.
Key Words: Acute graft-versus-host disease, Patient-reported outcomes, Food and Drug Administration, Clinical trials
Introduction
Patient-reported outcomes (PROs) refer to health-related quality of life, functional status, and symptom burden as perceived and reported by patients. For example, symptoms are subjective phenomena reported by patients that indicate a change in normal functioning, sensation, or appearance because of disease [1]. Patient-reported measurement tools include surveys, interviews, or patient diaries. These instruments try to capture what people actually experience with a treatment approach. Patient-reported measures are complementary to physical exam findings and laboratory testing, and are the primary source for much of the clinician-reported symptom information in the chart. For example, patient self-report is the most direct means of capturing severity of nausea, pain, and anorexia, and the only way to capture information about fatigue and patient-perceived illness impact. In recognition of this reality, the Common Terminology Criteria for Adverse Events is undergoing revision to include PRO items for symptom severity [2]. In summary, PROs reflect the patient's personal experience with disease and treatment.
Acute graft-versus-host disease (aGVHD) primarily involves the skin as an erythematous rash, the liver as a cholestatic or hepatitic process, or the gastrointestinal (GI) system with nausea, vomiting, diarrhea, and abdominal pain. Initial treatment for aGVHD includes corticosteroids, with other immunosuppressive agents added as needed. If symptoms or side effects are moderate to severe, patients may require hospitalization for hydration, nutritional support, intravenous delivery of medications, monitoring, treatment of infections, and other supportive care. Both the aGVHD disease process and the effects of treatments used to prevent or treat GVHD may affect PROs.
In May 2009, the U.S. Food and Drug Administration (FDA), in collaboration with several institutes of the National Institutes of Health (National Heart, Lung, and Blood Institute; National Cancer Institute; and National Institute of Allergy and Infectious Diseases), the Center for International Blood and Marrow Transplant Research (CIBMTR), and the American Society for Blood and Marrow Transplantation (ASBMT), convened a meeting to discuss endpoints in aGVHD trials, particularly with regard to the FDA approval process. This article summarizes the discussion about the role of PROs in trial design and interpretation based on 4 questions posed by the conference organizers.
What are the Requirements for PROs to be Primary or Supportive Endpoints?
The FDA requires evidence that treatments provide “clinical benefit” defined functionally as “living longer or living better” before it will consider drug approval. Draft guidance from the FDA states that the amount and kind of PRO evidence to support a labeling claim is the same as that required for any other labeling claim [3]. Patient-reported endpoints may refer to simple concepts, such as single symptoms (eg, pain or nausea) or complex concepts, such as improvement in functioning (eg, working) or psychologic state (eg, mood). Evidence of improvement in simple PRO endpoints is not recognized for complex claims such as improved health-related quality of life. Although it may seem self-evident that decreasing nausea or diarrhea would lead to better quality of life, a sponsor needs to show actual effects on the claim of improved quality of life. The draft guidance also provides insights into the FDA's opinion about several other issues in PRO assessment and analysis such as susceptibility to bias. For example, the guidance notes that cognitive biases may affect patient responses so PROs are considered unreliable in unblinded studies. PRO instruments should capture current status and actual functioning. Recall over more than a short period of time or asking patients to estimate what they may be able to accomplish is subject to substantial bias. The guidance also provides practical advice for sponsors designing trials. Because missing data often compromise analytic plans, reasons for missing data should be recorded during the trial so they can inform the subsequent analysis.
Similar to the use of a new diagnostic tool, the FDA needs to certify a PRO tool to ensure that it is sufficiently validated to support the intended claim in the target population. A previously validated instrument that is modified in any way is considered a different instrument. If the study population differs substantially from the population in which the instrument was validated, the validation may need to be repeated to ensure psychometric integrity. The FDA may choose to review the instrument development and validation process in detail. For example, the FDA may ask to review the process of instrument creation including patient interviews and focus group transcripts, cognitive debriefing procedures, and readability tests. The FDA may evaluate the text of the questions and the response options offered to assess construct validity and ensure absence of ceiling or floor effects. They will determine whether the recall period is appropriate for the study, and evaluate the instrument's psychometric properties including reliability, validity, sensitivity to change, and clinically meaningful differences. Finally, they will review the planned study procedures to ensure accurate data capture, check instrument formatting, and review planned methods of data collection to make sure that results will be considered accurate at the conclusion of the trial.
What Challenges Will be Encountered, Especially for aGVHD Trials?
There are a number of general challenges to use of PROs as endpoints in clinical trials. First, it is notoriously difficult to collect complete PRO data. PROs are not available retrospectively or from other surrogate sources. Collection of PROs requires active patient cooperation, which is difficult although not impossible to achieve when patients are very ill. For example, Wang et al. [4] reported only 1.7% missing PRO data in a group of 30 patients who completed the M.D. Anderson Symptom Inventory (MDASI) twice weekly during the first 30 inpatient days after allogeneic HCT. Outpatients and those obtaining care in multiple health care settings offer different data collection challenges. Regardless of the setting of a clinical trial, a data collection structure must be put in place that is committed and able to collect all data as completely as possible. Many new technologies, such as interactive voice response systems and Web-based applications, are making collection of PRO data across settings easier and more complete.
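As a rough illustration of the bookkeeping this implies, the sketch below tallies the proportion of scheduled PRO assessments that were missed and the recorded reasons for each miss. The record layout, patient identifiers, and reasons are hypothetical and are not data from any study cited here; a real trial would define these fields in its protocol.

```python
from collections import Counter

def missing_summary(records):
    """Summarize missed assessments.

    records: list of (patient_id, scheduled_day, completed, reason) tuples.
    This layout is an assumption made for illustration only.
    """
    total = len(records)
    missed = [r for r in records if not r[2]]
    # Proportion of scheduled assessments that were not completed
    rate = len(missed) / total if total else 0.0
    # Reasons recorded at the time of the miss, to inform later analysis
    reasons = Counter(r[3] for r in missed)
    return rate, reasons

# Hypothetical schedule: three patients, assessments on days 0, 3, and 7
records = [
    ("P01", 0, True, None),
    ("P01", 3, True, None),
    ("P01", 7, False, "too ill"),
    ("P02", 0, True, None),
    ("P02", 3, False, "staff unavailable"),
    ("P02", 7, True, None),
    ("P03", 0, True, None),
    ("P03", 3, True, None),
]

rate, reasons = missing_summary(records)
print(f"missing: {rate:.1%}")
for reason, count in reasons.items():
    print(f"  {reason}: {count}")
```

Recording the reason when an assessment is missed, rather than reconstructing it afterward, lets the analysis distinguish data missing because of illness from data missing for administrative reasons.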
Frequency and timing of PRO assessments during aGVHD trials may be critical in detecting a difference in PRO endpoints. Symptoms from aGVHD may begin several days before the diagnosis of disease and worsen until several days after the initiation of effective therapy [5]. Symptoms may then decline rapidly in responding patients after initiation of effective therapy, so an assessment at 100 days or 6 months may miss important differences.
Different survey instruments are often required for children or non-English-speaking patients, which increases trial costs and reduces the effective sample size because these patients are often analyzed as separate subsets. Perhaps the greatest challenge is the fact that PRO tools are clearly intended for research, and currently, separate mechanisms must be established for their collection. Physicians cannot just order PRO measurement as they can a clinical test, contributing to the perception that these are “extra” tasks and expendable because they often do not directly contribute to patient care. Although low-cost data collection options such as telephone and computer technology are being developed, these are not widely used yet [6, 7]. A notable exception is the assessment of patient-reported pain severity, which has become routine in hospitals and clinics since being mandated by the Joint Commission on Accreditation of Healthcare Organizations in 2001 [8], suggesting that routine assessment of other symptoms such as fatigue and distress should also be possible [9, 10].
It is also challenging to analyze and interpret PROs. Longitudinal statistical methods are often used, but they sometimes generate results that are difficult to interpret. Missing data are a major problem. The fact that patient-reported data are subjective rather than objective is both their great strength and their great weakness. In the context of a trial intended for FDA submission, the fact that PROs are open to bias, particularly in open-label studies, is a serious limitation. However, because the FDA will not accept some of the most common aGVHD trial designs (unblinded, phase II), sponsors will have to increase the rigor of their designs, which offers the potential to measure unbiased PROs.
There are some special considerations when using PROs in the context of aGVHD trials. First, some aGVHD such as liver involvement is usually not symptomatic. Second, many symptoms and signs overlap with other common HCT toxicities. Although this is also true for more “objective” measures of aGVHD severity such as laboratory tests and diarrhea, the high background “noise” in PROs from conditioning regimen toxicity, drug side effects, infections, and other complications adds to the background of considerable intrinsic measurement variability. Patients with aGVHD are often quite ill, with multiple concurrent complications. Perhaps this is why more work has been done looking at PROs in chronic GVHD (cGVHD) than in aGVHD, because patients are often more clinically stable further out from HCT.
In What Circumstances can Short-term PROs or Health-Related Quality of Life (rather than response or survival) be an Endpoint for aGVHD Prevention or Treatment Trials?
To clinicians, it is clear that patients with moderate to severe aGVHD, especially GI GVHD, are miserable. Yet, it is hard to recommend PROs as primary endpoints for GVHD prevention trials because the prophylactic agents currently available are not highly effective, the change in GVHD rates targeted tends to be small, and the background noise is often too great to allow a difference in PROs to be detected. Thus, the most appropriate role of PROs currently is likely to be in treatment trials. We believe that although prolonged disease-free survival (DFS) is paramount, an agent that offers similar GVHD control and survival but decreases symptoms such as frequency of diarrhea and pain intensity, decreases hospitalization days, or decreases side effects from treatment, may provide a compelling case for FDA approval.
Improvement in PROs may or may not correlate with objective response criteria because they may be fundamentally measuring different aspects of aGVHD or have differing sensitivities to change. For example, amount of diarrhea may or may not correlate with symptoms of anorexia, nausea, abdominal pain, and need for hospitalization. Also, GVHD response measures are categoric (percent body surface area, bilirubin level, volume of diarrhea, overall grade). Improvements within organ stages may not result in changes to overall grade, but could be associated with improvement in symptoms. Conversely, decreasing diarrhea from 1700 cc per day to 1400 cc per day may not be noticeable to patients but would be considered improvement in GVHD severity by objective measures.
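The categoric nature of organ staging can be made concrete with a small sketch. The stool-volume cutoffs below (more than 500, 1000, and 1500 mL per day for adult GI stages 1, 2, and 3) are an assumption taken from commonly used consensus staging criteria, not from this article; any trial would use the thresholds in its own protocol. Stage 4, which depends on severe abdominal pain or ileus rather than volume alone, is deliberately left out.

```python
def gi_stage_from_stool_volume(ml_per_day):
    """Map adult daily stool volume (mL) to a GI stage of 0 to 3.

    Thresholds are assumed for illustration (commonly cited consensus
    cutoffs); they are not specified in the article itself.
    """
    if ml_per_day > 1500:
        return 3
    if ml_per_day > 1000:
        return 2
    if ml_per_day > 500:
        return 1
    return 0

# A drop from 1700 to 1400 mL/day crosses a cutoff, so the stage improves
# even though the patient may not notice the change.
print(gi_stage_from_stool_volume(1700), gi_stage_from_stool_volume(1400))

# A drop from 1400 to 1100 mL/day stays within one stage, so the staged
# measure shows no change even if symptoms improve.
print(gi_stage_from_stool_volume(1400), gi_stage_from_stool_volume(1100))
```

Under these assumed cutoffs, equal-sized changes in volume can register as improvement or as no change depending only on where they fall relative to a threshold, which is one reason staged response and patient-reported symptoms need not move together.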
Are There Any Validated PROs or Other Health-Related Quality-of-life Tools That can be used for aGVHD Trials, and What Data Support Their Validation?
Currently there are no validated instruments that meet the FDA requirements for a patient-reported instrument in GVHD studies intended to support approval of a claim. Three instruments have been used most frequently to assess patients with aGVHD. The Medical Outcomes Study Short Form 36 (SF-36) is a 36-item generic functional status tool that has been used in many healthy and ill populations. It takes approximately 6 minutes to complete, and provides 8 subscales and 2 summary scores [11, 12]. The Functional Assessment of Cancer Therapy (FACT)-bone marrow transplant (BMT) survey contains 37 items and also takes 6 minutes to complete. It provides 4 core subscales, 1 BMT module score, and 1 summary score [13]. The MDASI contains 13 symptom items and 6 interference items. It takes 5 minutes to complete and provides 2 summary scores for symptom severity and interference [14]. Figure 1 shows examples of items from these 3 instruments.

Figure 1
Examples of items from the MOS-SF36 (Medical Outcomes Study Short-Form 36), FACT-BMT (Functional Assessment of Cancer Therapy-Bone Marrow Transplant module), and MDASI (M.D. Anderson Symptom Inventory). See text for details.
There are no studies that directly compare these 3 instruments in patients with aGVHD. The MDASI focuses on specific symptoms and measures maximal symptom severity within the previous 24 hours. Frequency and duration of symptoms per se are not measured, although they may be reflected in the section measuring aggregate impact of symptoms on functioning. The FACT-BMT includes items referring to symptoms, functional impact, and “satisfaction/bother/enjoyment” within each domain. The most common version used in HCT asks respondents to report about the previous week. The FACT-BMT provides domain scores, but not symptom or symptom cluster scores. The SF-36 was designed as a generic functional status instrument. The acute version, measuring status over the previous week, is usually used in HCT studies. Some questions on the SF-36 can be confusing for HCT patients to interpret. For example, many questions refer to work or housework or normal social activities with family and friends, but HCT patients have limitations on these activities by virtue of undergoing transplantation or being hospitalized, independent of their current health.
Although these instruments are well validated in general and used extensively in HCT populations, their psychometric properties in patients with aGVHD are not well established. Studies evaluating clinically meaningful differences and sensitivity to change are particularly needed.
What Data Exist about Patient-Reported Measures in aGVHD?
Several studies provide some insight into how aGVHD does or does not affect patient-reported measures. The NHLBI T cell depletion trial enrolled 410 patients from 1995 to 2000 who were randomized to ex vivo T cell depletion or a cyclosporine (CsA) and methotrexate (MTX) chemoprophylaxis regimen. Patient-reported measures including the FACT-BMT, SF-36, and a depression scale were collected at baseline and after HCT at day 100, 6 months, 1 year, and 3 years. There was a similar trajectory of QOL changes in both groups [15], although there was twice as much grade III-IV aGVHD in the non-T cell depleted group (37% versus 18%, P < .0001) [16].
The group at the Dana-Farber Cancer Institute studied 96 patients transplanted from 1999 to 2004 who provided a baseline Short Form 12 (SF-12) and FACT-BMT with at least 1 follow-up at 6 or 12 months. Grade II-IV aGVHD was associated with worse quality of life at 6 months, whereas cGVHD was associated with worse quality of life at 1 year [17].
The group at M.D. Anderson has reported that grade I-IV aGVHD was associated with greater symptom burden during days 22 to 100 after HCT than no aGVHD. They studied 125 patients with a baseline MDASI and between 6 and 20 follow-up assessments [18].
Some ongoing aGVHD prevention and treatment trials include PROs that will further help define the role of these endpoints in GVHD trials. For example, the recently completed, randomized, placebo-controlled trial of mesenchymal stem cells for steroid-refractory aGVHD treatment collected the FACT-BMT at treatment days 0, 30, 100, and 180 (N = 280). A randomized, placebo-controlled trial of oral beclomethasone dipropionate for initial treatment of GI GVHD (target N = 166) will collect the MDASI weekly from enrollment through 80 days. In both of these FDA registration trials, PROs are designated “additional endpoints.” The upcoming Clinical Trials Network Protocol 0802, a randomized, placebo-controlled trial of prednisone versus prednisone and mycophenolate mofetil (N = 372), will collect the MDASI prior to randomization and at day 56, when the primary endpoint of GVHD response is assessed. Results of the PRO analyses from these trials are eagerly awaited.
Future studies should test whether patient-reported measures can predict survival as well as objective measures of aGVHD response.
Summary
Based on past work and the FDA's currently stated position, our opinion is that very narrow PRO claims could be sought in GVHD treatment trials, with the focus on symptoms rather than composite endpoints or broad claims. The field urgently needs a validated patient-reported instrument that meets the FDA requirement for validation rigor. If such an instrument were to become available, experience with the instrument in a phase II study is recommended to better plan for the phase III trial design.
aGVHD is a common complication following allogeneic HCT. Treatments that improve patients' experience with aGVHD either through prevention or successful treatment should be considered for FDA approval. However, there are still daunting methodologic barriers to proving, by FDA standards, that a treatment improves symptoms or health-related quality of life related to aGVHD.
Acknowledgments
Financial disclosure: The authors have nothing to disclose.
References
1. Symptom distress—the concept: past and present. Semin Oncol Nurs. 1987;3:242–247.
2. Patient Reported Outcome (PRO-CTCAE). Available at: https://wiki.nci.nih.gov/display/CTMS/Patient+Reported+Outcome+(PRO-CTCAE). Accessed July 3, 2009.
3. Food and Drug Administration. Draft Guidance for Industry. PRO measures: use in medical product development to support labeling claims. 2006. Available at: http://www.fda.gov/cder/guidance. Accessed May 4, 2009.
4. Serum interleukin-6 predicts the development of multiple symptoms at nadir of allogeneic hematopoietic stem cell transplantation. Cancer. 2008;113:2102–2109.
5. Measuring multiple symptoms and inflammatory cytokines related to acute GVHD in AML/MDS patients undergoing allogeneic BMT [abstract 1426]. Blood. 2005;106:412A.
6. Feasibility and acceptability to patients of a longitudinal system for evaluating cancer-related symptoms and quality of life: pilot study of an e/Tablet data-collection system in academic oncology. J Pain Symptom Manage. 2009;37:1027–1038.
7. Long-term toxicity monitoring via electronic patient-reported outcomes in patients receiving chemotherapy. J Clin Oncol. 2007;25:5374–5380.
8. The new JCAHO pain standards: implications for pain management nurses. Pain Manag Nurs. 2000;1:3–12.
9. Could fatigue become the sixth vital sign? ONS News. 2003;18(1):4–5.
10. Emotional distress: the sixth vital sign in cancer care. J Clin Oncol. 2005;23:6440–6441.
11. SF-36 Physical and Mental Health Summary Scales: A User's Manual. New England Medical Center. Boston, MA: The Health Institute; 1994.
12. SF-36 Health Survey: A Manual and Interpretation Guide. New England Medical Center. Boston, MA: The Health Institute; 1993.
13. Quality of life measurement in bone marrow transplantation: development of the Functional Assessment of Cancer Therapy-Bone Marrow Transplant (FACT-BMT) scale. Bone Marrow Transplant. 1997;19:357–368.
14. Assessing symptom distress in cancer patients: the M.D. Anderson Symptom Inventory. Cancer. 2000;89:1634–1646.
15. The effect of unrelated donor marrow transplantation on health-related quality of life: a report of the unrelated donor marrow transplantation trial (T-cell depletion trial). Biol Blood Marrow Transplant. 2006;12:648–655.
16. Effect of graft-versus-host disease prophylaxis on 3-year disease-free survival in recipients of unrelated donor bone marrow (T-cell depletion trial): a multi-centre, randomised phase II-III trial. Lancet. 2005;366:733–741.
17. Quality of life associated with acute and chronic graft-versus-host disease. Bone Marrow Transplant. 2006;38:305–310.
18. Measuring the symptom burden of allogeneic hematopoietic stem cell transplantation in patients with and without acute graft-versus-host disease [abstract 49]. Biol Blood Marrow Transplant. 2009;15:20–21.
PII: S1083-8791(09)00409-1
doi:10.1016/j.bbmt.2009.08.021
© 2010 Published by Elsevier Inc.