Outcome Measures Review Process
Although past evidence suggested that clinicians in the rehabilitation field did not regularly use outcome measures (Cole et al. 1994; Deathe et al. 2002; Skinner et al. 2006), mounting evidence shows that more clinicians now report their findings using an ordinal or otherwise quantifiable outcome measure (Kay et al. 2001; Skinner et al. 2006). Reasons for this shift include:
- Good science and good clinical practice depend upon sound information, which in turn relies on sound measurement.
- Measurement enables health care professionals and researchers to describe, predict and evaluate in order to provide benchmarks and summarize change related to the condition and care of individuals with spinal cord injury.
- Using datasets facilitates tracking of patient outcomes in relation to healthcare costs.
- Clinical investigators recognize that using an appropriate outcome measure to determine the validity of a therapeutic intervention is the key to establishing or changing models of best practice.
There is a sincere desire to move beyond the minimal data collected through datasets such as the mandatory Canadian Institute for Health Information (CIHI) Rehabilitation Minimum Data Set or the Functional Independence Measure (FIM). Nevertheless, validated measures are lacking for many disciplines within rehabilitation research, and there is uncertainty about the strengths and limitations of each type of assessment.
SCIRE Outcome Measures provides information on the psychometric properties and the clinical use of 120+ measures, giving the reader the necessary confidence to move their clinical practice and research forward on a more rigorous basis.
Why is there a need to assess the psychometric or clinometric properties of an outcome measure in different clinical populations?
Many outcome measures have a considerable body of research supporting their validity and reliability. Is it then necessary to test an outcome measure in different diagnostic populations? Absolutely, yes. Gold standard measures used across populations may be deficient in measuring characteristics of the SCI population. Also, measures not developed specifically for SCI may contain items that do not apply; such items may affect how seriously an individual answers, confound the data and prevent meaningful interpretation of the results.
Example: Functional Independence Measure (FIM)
The FIM is the gold standard for the assessment of basic function. Despite its popularity and universal recognition, attempts to use it across a broad range of disabling physical disorders, including SCI, have revealed deficiencies and inadequacies. In response, Catz and colleagues (1997) created the Spinal Cord Independence Measure (SCIM). Their results demonstrated that the SCIM is more responsive (better able to detect change) than the FIM. Now in its third version, the SCIM III is gaining international acceptance as the measure of choice for assessing functioning after SCI.
Example: Short Form-36 (SF-36) and Short Form-12 (SF-12)
The SF-36 and SF-12 are extremely popular generic surveys of health-related quality of life (QOL). These surveys include items oriented around activity limitation at the personal level, as well as participation restriction at the societal level (e.g. can you lift and carry an object? can you climb stairs?). Clearly, a sizeable proportion of the SCI population would not be able to complete many of these activities.
This is why it is critical to verify, first and foremost, that each survey item is appropriate for the level of SCI being assessed: inappropriate items can alter the individual’s response (seriousness to answer) or confound the data from each study cohort. This stance does not suggest that new measures should be created for every diagnosis, health condition or situation. It does, however, recommend that existing measures be validated for each study population, so that they are sufficiently accurate and sensitive to detect a meaningful difference in a functionally significant clinical endpoint between the experimental and control groups of a trial.
Selecting the Measures
Measures were selected for review according to the following criteria:
- Measures with Level 1 studies, defined by Kalpakjian et al. (2009) as studies whose primary aim is to evaluate the psychometric properties of a measure.
- Measures that are familiar and of interest to clinicians.*
- Measures (N = 4) that are commonly known and used internationally. SCIRE Professional currently has 120+ Outcome Measures that have been validated for people with SCI.
- *For version 1, a table identifying all measures used in SCI was developed, and clinicians (nurses, occupational therapists, physiatrists, physical therapists, psychologists, recreation therapists and social workers) from GF Strong Rehabilitation Centre (Vancouver, British Columbia) and Parkwood Hospital (London, Ontario) were surveyed. Measures that received at least 5 tallies of interest and/or familiarity were then selected for review.
- A similar process was used to identify pertinent measures for the updates: another table was developed listing all new measures along with those measures that were not included originally. Clinicians and scientists then reviewed the list and selected measures to include in the updates.
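The version 1 tally rule above can be expressed as a simple filter. The sketch below is illustrative only: the `Response` record, the clinician IDs and the measure names are hypothetical, and it assumes that a clinician who marked a measure for interest, familiarity or both contributes exactly one tally to that measure.

```python
from dataclasses import dataclass


@dataclass(frozen=True)
class Response:
    """Hypothetical survey record: one clinician's answer about one measure."""
    clinician_id: str
    measure: str
    interested: bool
    familiar: bool


MIN_TALLIES = 5  # "at least 5 tallies of interest and/or familiarity"


def select_measures(responses, min_tallies=MIN_TALLIES):
    """Return measures flagged (interest and/or familiarity) by at least min_tallies clinicians."""
    supporters = {}
    for r in responses:
        if r.interested or r.familiar:
            # A set means each clinician counts at most once per measure (assumption).
            supporters.setdefault(r.measure, set()).add(r.clinician_id)
    return sorted(m for m, ids in supporters.items() if len(ids) >= min_tallies)


if __name__ == "__main__":
    demo = [Response(f"c{i}", "Measure A", True, i % 2 == 0) for i in range(5)]
    demo += [Response(f"c{i}", "Measure B", False, True) for i in range(4)]
    demo += [Response("c9", "Measure C", False, False)]
    print(select_measures(demo))  # ['Measure A']
```

Counting distinct clinicians rather than raw rows is a design choice made here so that one clinician ticking both "interest" and "familiarity" does not produce two tallies; the source does not say how such cases were counted.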
Searching the Literature
- PubMed, MEDLINE, CINAHL, Embase, HaPI, PsycINFO and SPORTDiscus electronic databases were searched.
- Additional searching was conducted by reviewing the reference lists of papers obtained from the electronic search.
The key word “spinal cord injury” was used for each of the databases. Depending on the database, it was combined with different combinations of the following terms:
- validation studies, instrument validation, external validity, internal validity, criterion-related validity, concurrent validity, discriminant validity, content validity, face validity, predictive validity, reliability, interrater reliability, intrarater reliability, test-retest reliability, reproducibility, responsiveness, sensitivity to change
- evidence-based medicine
- outcome measures, clinical assessment tools, scales and measures.
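Combining the key word with the terms above can be sketched as generating Boolean search strings. This is a hypothetical illustration: the quoting and AND/OR syntax shown is generic, each database (PubMed, CINAHL, Embase, etc.) has its own field tags and operators, and the exact search strings used in the review are not reported here.

```python
BASE_TERM = "spinal cord injury"

# A small subset of the combination terms listed above (illustrative, not exhaustive).
COMBINATION_TERMS = [
    "reliability",
    "test-retest reliability",
    "responsiveness",
    "outcome measures",
]


def quote(term):
    """Wrap a term in double quotes so it is searched as a phrase."""
    return f'"{term}"'


def pairwise_queries(terms, base=BASE_TERM):
    """One query per term, of the form: base AND term."""
    return [f"{quote(base)} AND {quote(t)}" for t in terms]


def combined_query(terms, base=BASE_TERM):
    """A single query of the form: base AND (term1 OR term2 OR ...)."""
    ors = " OR ".join(quote(t) for t in terms)
    return f"{quote(base)} AND ({ors})"


if __name__ == "__main__":
    for q in pairwise_queries(COMBINATION_TERMS[:2]):
        print(q)
    print(combined_query(COMBINATION_TERMS))
```

Pairwise queries make it easy to see how many hits each term contributes; the single OR query retrieves the same union of records in one search.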
A database file was established using RefWorks to organize potential articles of interest. After duplicate manuscripts were eliminated, data extractors reviewed titles and abstracts to retain relevant papers. The retained articles were then read in full, and the relevant information (reliability, validity and responsiveness coefficients and descriptions) was extracted.
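Duplicate removal in the review was handled through RefWorks. Purely to illustrate the idea, the sketch below shows one simple way records exported from several databases could be de-duplicated, by matching on a normalized title plus publication year. The `Record` type, the matching key and the example titles are all hypothetical assumptions, not the procedure RefWorks or the review team used.

```python
import re
from typing import NamedTuple


class Record(NamedTuple):
    """Hypothetical citation record exported from one database."""
    title: str
    year: int
    source_db: str


def normalize_title(title):
    """Lowercase, replace punctuation with spaces and collapse whitespace."""
    cleaned = re.sub(r"[^a-z0-9 ]+", " ", title.lower())
    return " ".join(cleaned.split())


def deduplicate(records):
    """Keep the first record seen for each (normalized title, year) key."""
    seen = set()
    unique = []
    for rec in records:
        key = (normalize_title(rec.title), rec.year)
        if key not in seen:
            seen.add(key)
            unique.append(rec)
    return unique


if __name__ == "__main__":
    recs = [
        Record("Reliability of Scale X.", 2007, "PubMed"),
        Record("reliability of scale-x", 2007, "CINAHL"),  # same paper, different formatting
        Record("Reliability of Scale X", 2008, "Embase"),  # different year, kept
    ]
    for r in deduplicate(recs):
        print(r.source_db, r.title)
```

Exact-key matching like this misses duplicates with substantive title differences (e.g. a translated or abbreviated title), which is why manual checking during title and abstract screening remains necessary.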
Classifying the Measures
To cater to our different audiences, we used two frameworks: 1) the International Classification of Functioning, Disability and Health (ICF) and 2) clinical areas.
The ICF is a conceptual framework developed by the World Health Organization (WHO 2001). See Figure 1. The advantages of using this framework include:
- It is well recognized and used by the international community.
- It was created to provide standard language for use when discussing health and health-related domains.
- Other reviews of outcome measures have used the ICF for similar purposes (Salter et al. 2005).
Figure 1. Overview of the International Classification of Functioning, Disability and Health
- The measures were classified according to the body function/structure, activity and participation constructs.
- We included an additional dimension in order to help classify QOL measures.
- For version 1, three classifiers knowledgeable about both outcome measures and the ICF independently categorized all of the measures.
- The classifiers later met to reconcile any disagreement about classification of the measures.
- When a multidimensional measure covered more than one construct (e.g. activity and participation) they placed it in the category where the measure had the most items.
- Upon classification into the main domains, the measures were further categorized into appropriate subcategories based on the ICF definitions.
- During the version 3 update, new measures were found that did not fit easily into any one of the ICF domains or the QOL category. For now, we have included these measures in the ‘Body Functions/Structures’ category. While we acknowledge this is not a perfect fit, we consider it the best option until new iterations of the ICF are available.
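The rule for multidimensional measures (place the measure in the construct with the most items) amounts to taking the construct with the largest item count. A minimal sketch, assuming each item of a measure has already been assigned to one construct; the input format is hypothetical. The source does not say how ties were handled, so this version returns every tied construct so that the tie can be settled when the classifiers meet.

```python
from collections import Counter

# Constructs used for classification above: ICF components plus the added QOL dimension.
CONSTRUCTS = ("body function/structure", "activity", "participation", "QOL")


def classify_by_item_count(item_constructs):
    """Return the construct(s) covered by the most items of a measure.

    item_constructs: one construct label per item (hypothetical input format).
    A single-element list is a clear placement; a longer list means a tie
    that needs to be resolved by the classifiers.
    """
    unknown = set(item_constructs) - set(CONSTRUCTS)
    if unknown:
        raise ValueError(f"unrecognized constructs: {sorted(unknown)}")
    counts = Counter(item_constructs)
    if not counts:
        raise ValueError("measure has no items")
    top = max(counts.values())
    return sorted(c for c, n in counts.items() if n == top)


if __name__ == "__main__":
    items = ["activity"] * 7 + ["participation"] * 3
    print(classify_by_item_count(items))  # ['activity']
```

Returning a list rather than a single label keeps ties visible instead of silently picking one construct.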
In response to clinical input received in Round 1 of the Delphi process, we also categorized the measures into “clinical areas” to make this website easier to use for readers unfamiliar with ICF terminology. The clinical areas used are:
- Assistive Technology
- Community Reintegration
- Lower Limb & Walking
- Mental Health
- Neurological Impairment & Autonomic Dysfunction
- Other Affected Physiological Systems
- Quality of Life & Health Status
- Self Care & Daily Living
- Sexuality & Reproduction
- Skin Health
- Upper Limb
- Wheeled Mobility
Who was involved?
- The team of reviewers who assessed each measure consisted of clinicians and scientists with long-established expertise in a wide variety of relevant research areas. For details, please see the SCIRE Team page.
What did they do?
- Data were extracted from papers reporting findings on the psychometric properties and several “pragmatic” factors (acceptability, feasibility, etc.) of each measure. Extraction relied heavily on the methods and standards of Fitzpatrick et al. (1998).
- Data were evaluated: our evaluation criteria and, where possible, the standards for quantifying each rating are presented in Table 1.
- Summaries were generated for each measure.
Many books and manuscripts classify and discuss psychometric principles and standards for the selection or validation of clinical measures. Rather than replicate that material here, we refer the reader to key texts such as Health Measurement Scales (Streiner and Norman 2003) and Foundations of Clinical Research (Portney and Watkins 2000), specifically chapters 4, 5, and 6 on reliability and validity. For an excellent overview with insightful tips for selecting measures directly related to rehabilitation, refer to Physical Rehabilitation Outcome Measures (Finch et al. 1999).