Evidence Review Process

We systematically review multiple databases (MEDLINE/PubMed, CINAHL®, EMBASE, PsycINFO) to identify and synthesize all relevant literature published from 1980 to present day. MeSH headings and keywords for multiple clinical areas are paired with spinal cord injury (SCI), tetraplegia, quadriplegia or paraplegia. Studies are included if they:

  • are published in English
  • study at least 3 human subjects, at least 50% of whom had an SCI
  • use a measurable outcome associated with the treatment
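The three inclusion criteria above can be read as a simple screening filter. The following is a minimal sketch, not the review team's actual tooling; the `Study` record and its field names are hypothetical and exist only for illustration.

```python
from dataclasses import dataclass


@dataclass
class Study:
    # Hypothetical screening record; field names are illustrative only.
    language: str                 # publication language, e.g. "English"
    n_subjects: int               # number of human subjects studied
    n_with_sci: int               # how many of those subjects had an SCI
    has_measurable_outcome: bool  # outcome associated with the treatment


def meets_inclusion_criteria(study: Study) -> bool:
    """Apply the three inclusion criteria listed above."""
    if study.language.strip().lower() != "english":
        return False
    if study.n_subjects < 3:
        return False
    # "At least 50%" is inclusive, so exactly half the sample qualifies.
    if study.n_with_sci * 2 < study.n_subjects:
        return False
    return study.has_measurable_outcome
```

For example, a study of 10 subjects of whom 5 had an SCI passes the population criterion, while one with 4 of 10 does not.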

Over time, new areas of study and SCI rehabilitation topics and keywords (e.g., pressure ulcers) are identified by a multi-disciplinary team of expert scientists, clinicians, policy-makers, and people with SCI. The reference sections of meta-analyses, systematic reviews and review articles are then hand-searched, because hand searching can provide higher rates of return than electronic searching within a particular subject area (Hopewell et al. 2007). Keywords used for each specific topic are outlined in Appendix 1 below.

External clinical evidence can inform, but can never replace, individual clinical expertise, and it is this expertise that decides whether the external evidence applies to the individual patient at all and, if so, how it should be integrated into a clinical decision… Evidence based medicine is the conscientious, explicit, and judicious use of current best evidence in making decisions about the care of individual patients. The practice of evidence based medicine means integrating individual clinical expertise with the best available external clinical evidence from systematic research. (Sackett et al. 1996)

Quality Assessment Tools and Data Extraction

Methodological quality of individual RCTs is assessed using the Physiotherapy Evidence Database (PEDro) scale. The PEDro database was originally developed to give access to bibliographic details and abstracts of randomized controlled trials (RCTs), quasi-randomized studies and systematic reviews in physiotherapy. The PEDro scale has been used to assess both pharmacological and non-pharmacological studies, with good agreement between raters at the individual item level and in total PEDro scores (Foley et al. 2006). Maher et al. (2003) found that the reliability of PEDro scale item ratings varied from “fair” to “substantial,” while the reliability of the total PEDro score was “fair” to “good”. The PEDro scale has 11 items: the first relates to external validity and the other ten assess the internal validity of a clinical trial. One point is given for each satisfied criterion (except for the first item, which is rated YES or NO and not counted), yielding a maximum score of ten. A higher score indicates better study quality. The following cut-points are used: 9-10 (excellent); 6-8 (good); 4-5 (fair); <4 (poor). A point for a particular criterion is awarded only if the article explicitly reports that the criterion was met. The scoring system is detailed in Appendix 2 below. Two independent raters review each article, and scoring discrepancies are resolved through discussion.
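The scoring arithmetic described above can be sketched as follows. This is an illustrative sketch only: the function names are hypothetical, and each item rating is assumed to be recorded as a simple yes/no (True/False) value.

```python
def pedro_score(items: list[bool]) -> int:
    """Total PEDro score from 11 yes/no item ratings.

    items[0] is the eligibility criterion (external validity); it is
    recorded but not counted, so the maximum score is 10.
    """
    if len(items) != 11:
        raise ValueError("the PEDro scale has exactly 11 items")
    return sum(items[1:])


def pedro_category(score: int) -> str:
    """Map a total score to the cut-points quoted above."""
    if not 0 <= score <= 10:
        raise ValueError("score must be between 0 and 10")
    if score >= 9:
        return "excellent"
    if score >= 6:
        return "good"
    if score >= 4:
        return "fair"
    return "poor"


def discrepancies(rater_a: list[bool], rater_b: list[bool]) -> list[int]:
    """1-based item numbers on which two raters disagree."""
    return [i + 1 for i, (a, b) in enumerate(zip(rater_a, rater_b)) if a != b]
```

The list returned by `discrepancies` is simply the set of items the two raters would need to discuss; how the discussion is resolved is outside the scope of the sketch.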

Methodological quality of systematic reviews of RCTs is assessed using AMSTAR (A MeaSurement Tool to Assess systematic Reviews). Agreement between two raters is required for each AMSTAR assessment, which lowers the risk of bias. AMSTAR is reproduced in Appendix 3 below.

Data are extracted into tables. Sample subject characteristics (Population), nature of the treatment (Intervention), measurements (Outcome Measures), and key results are presented in the tables. Where a single study spans multiple chapters (e.g., treadmill training affects cardiorespiratory, lower extremity and bone health), the results focus on the outcomes relevant to that chapter.

Specific SCI rehabilitation topics are identified by a multi-disciplinary team of expert scientists, clinicians, consumers with SCI and policy-makers. These topics are searched with additional keywords generated by expert scientists and clinicians in SCI rehabilitation who are familiar with the topic, and more titles and abstracts are reviewed. MeSH headings are used with the keywords. Keywords are paired with spinal cord injury, tetraplegia, quadriplegia, paraplegia, spinal cord impaired or spinal cord lesion. The reference lists of previous review articles, systematic reviews and clinical practice guidelines are hand-searched. Hand searching may provide higher rates of return than electronic searching within a particular subject area (Hopewell et al. 2002).
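The pairing of topic keywords with the SCI population terms can be illustrated with a generic boolean query builder. This is a sketch under stated assumptions: the `(A OR B) AND (C OR D)` form is a generic illustration, not the exact syntax of MEDLINE, EMBASE, CINAHL or PsycINFO, and the function name is hypothetical.

```python
# Population terms taken from the paragraph above.
SCI_TERMS = [
    "spinal cord injury",
    "tetraplegia",
    "quadriplegia",
    "paraplegia",
    "spinal cord impaired",
    "spinal cord lesion",
]


def build_query(topic_keywords: list[str]) -> str:
    """Pair topic keywords with the SCI population terms.

    The generic boolean form is an illustration only; each database
    has its own field tags and MeSH syntax.
    """
    if not topic_keywords:
        raise ValueError("at least one topic keyword is required")
    topic = " OR ".join(f'"{k}"' for k in topic_keywords)
    population = " OR ".join(f'"{t}"' for t in SCI_TERMS)
    return f"({topic}) AND ({population})"
```

Calling `build_query(["pressure ulcer"])` yields one topic clause joined by AND to the six population terms.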


The PEDro scale is used to assess the methodological quality of individual RCTs.

1. eligibility criteria were specified

Note on administration: This criterion is satisfied if the report describes the source of subjects and a list of criteria used to determine who was eligible to participate in the study.

Explanation: This criterion influences external validity, but not the internal or statistical validity of the trial. It has been included in the PEDro scale so that all items of the Delphi scale are represented on the PEDro scale. This item is not used to calculate the PEDro score.

2. subjects were randomly allocated to groups (in a crossover study, subjects were randomly allocated an order in which treatments were received)

Note on administration: A study is considered to have used random allocation if the report states that allocation was random. The precise method of randomisation need not be specified. Procedures such as coin-tossing and dice-rolling should be considered random. Quasi-randomised allocation procedures such as allocation by hospital record number or birth date, or alternation, do not satisfy this criterion.

Explanation: Random allocation ensures that (within the constraints provided by chance) treatment and control groups are comparable.

3. allocation was concealed

Note on administration: Concealed allocation means that the person who determined if a subject was eligible for inclusion in the trial was unaware, when this decision was made, of which group the subject would be allocated to. A point is awarded for this criterion, even if it is not stated that allocation was concealed, when the report states that allocation was by sealed opaque envelopes or that allocation involved contacting the holder of the allocation schedule who was “off-site”.

Explanation: “Concealment” refers to whether the person who determined if subjects were eligible for inclusion in the trial was aware, at the time he or she made this decision, which group the next subject would be allocated to. Potentially, if allocation is not concealed, the decision about whether or not to include a person in a trial could be influenced by knowledge of whether the subject was to receive treatment or not. This could produce systematic biases in otherwise random allocation. There is empirical evidence that concealment predicts effect size (concealment is associated with a finding of more modest treatment effects; see Schulz et al, JAMA 1995;273:408-12).

4. the groups were similar at baseline regarding the most important prognostic indicators

Note on administration: At a minimum, in studies of therapeutic interventions, the report must describe at least one measure of the severity of the condition being treated and at least one (different) key outcome measure at baseline. The rater must be satisfied that the groups’ outcomes would not be expected to differ, on the basis of baseline differences in prognostic variables alone, by a clinically significant amount. This criterion is satisfied even if only baseline data of study completers are presented.

Explanation: This criterion may provide an indication of potential bias arising by chance with random allocation. Gross discrepancies between groups may be indicative of inadequate randomisation procedures.

5. there was blinding of all subjects

Note on administration: Blinding means the person in question (subject, therapist or assessor) did not know which group the subject had been allocated to. In addition, subjects and therapists are only considered to be “blind” if it could be expected that they would have been unable to distinguish between the treatments applied to different groups. In trials in which key outcomes are self-reported (e.g., visual analogue scale, pain diary), the assessor is considered to be blind if the subject was blind.

Explanation: Blinding of subjects involves ensuring that subjects were unable to discriminate whether they had or had not received the treatment. When subjects have been blinded, the reader can be satisfied that the apparent effect (or lack of effect) of treatment was not due to placebo effects or Hawthorne effects (an experimental artifact in which subjects’ responses are distorted by how they expect the experimenters want them to respond).

6. there was blinding of all therapists who administered the therapy

Note on administration: Blinding means the person in question (subject, therapist or assessor) did not know which group the subject had been allocated to. In addition, subjects and therapists are only considered to be “blind” if it could be expected that they would have been unable to distinguish between the treatments applied to different groups. In trials in which key outcomes are self-reported (e.g., visual analogue scale, pain diary), the assessor is considered to be blind if the subject was blind.

Explanation: Blinding of therapists involves ensuring that therapists were unable to discriminate whether individual subjects had or had not received the treatment. When therapists have been blinded, the reader can be satisfied that the apparent effect (or lack of effect) of treatment was not due to the therapists’ enthusiasm or lack of enthusiasm for the treatment or control conditions.

7. there was blinding of all assessors who measured at least one key outcome

Note on administration: Blinding means the person in question (subject, therapist or assessor) did not know which group the subject had been allocated to. In addition, subjects and therapists are only considered to be “blind” if it could be expected that they would have been unable to distinguish between the treatments applied to different groups. In trials in which key outcomes are self-reported (e.g., visual analogue scale, pain diary), the assessor is considered to be blind if the subject was blind.

Explanation: Blinding of assessors involves ensuring that assessors were unable to discriminate whether individual subjects had or had not received the treatment. When assessors have been blinded, the reader can be satisfied that the apparent effect (or lack of effect) of treatment was not due to the assessors’ biases impinging on their measures of outcomes.

8. measures of at least one key outcome were obtained from more than 85% of the subjects initially allocated to groups

Note on administration: This criterion is only satisfied if the report explicitly states both the number of subjects initially allocated to groups and the number of subjects from whom key outcome measures were obtained. In trials in which outcomes are measured at several points in time, a key outcome must have been measured in more than 85% of subjects at one of those points in time.

Explanation: It is important that measurements of outcome are made on all subjects who are randomised to groups. Subjects who are not followed up may differ systematically from those who are, and this potentially introduces bias. The magnitude of the potential bias increases with the proportion of subjects not followed up.
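The 85% threshold in criterion 8 is a strict inequality, which matters at the boundary. A minimal sketch of the check, with a hypothetical function name, might look like this:

```python
def satisfies_followup(n_allocated: int, n_measured: int) -> bool:
    """True if a key outcome was obtained from more than 85% of subjects."""
    if n_allocated <= 0 or not 0 <= n_measured <= n_allocated:
        raise ValueError("need n_allocated > 0 and 0 <= n_measured <= n_allocated")
    # Integer arithmetic avoids floating-point trouble at exactly 85%.
    return n_measured * 100 > n_allocated * 85
```

With 40 subjects allocated, 35 measured (87.5%) satisfies the criterion, whereas 34 measured (exactly 85%) does not, because the criterion requires more than 85%.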

9. all subjects for whom outcome measures were available received the treatment or control condition as allocated or, where this was not the case, data for at least one key outcome was analysed by “intention to treat”

Note on administration: An intention to treat analysis means that, where subjects did not receive treatment (or the control condition) as allocated, and where measures of outcomes were available, the analysis was performed as if subjects received the treatment (or control condition) they were allocated to. This criterion is satisfied, even if there is no mention of analysis by intention to treat, if the report explicitly states that all subjects received treatment or control conditions as allocated.

Explanation: Almost inevitably there are protocol violations in clinical trials. Protocol violations may involve subjects not receiving treatment as planned, or receiving treatment when they should not have. Analysis of data according to how subjects were treated (instead of according to how subjects should have been treated) may produce biases. It is probably important that, when the data are analysed, analysis is done as if each subject received the treatment or control condition as planned. This is usually referred to as “analysis by intention to treat”. For a discussion of analysis by intention to treat see Elkins and Moseley, J Physiother 2015;61(3):165-7.
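The difference between analysing by allocation (intention to treat) and analysing by treatment actually received can be seen in a small worked example. The records below are invented purely for illustration and do not come from any study.

```python
from statistics import mean

# Hypothetical records: (allocated group, group actually received, outcome)
records = [
    ("treatment", "treatment", 12.0),
    ("treatment", "control", 6.0),    # protocol violation
    ("treatment", "treatment", 10.0),
    ("control", "control", 5.0),
    ("control", "control", 7.0),
    ("control", "treatment", 9.0),    # protocol violation
]


def group_means(rows, by_allocation: bool) -> dict[str, float]:
    """Mean outcome per group, grouping by allocation (ITT) or by receipt."""
    groups: dict[str, list[float]] = {}
    for allocated, received, outcome in rows:
        key = allocated if by_allocation else received
        groups.setdefault(key, []).append(outcome)
    return {name: mean(values) for name, values in groups.items()}


itt = group_means(records, by_allocation=True)          # as randomised
as_treated = group_means(records, by_allocation=False)  # as received
```

In this invented data set the between-group difference is smaller under intention to treat than under the as-treated grouping, which shows how the choice of analysis can change the apparent size of the effect.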

10. the results of between-group statistical comparisons are reported for at least one key outcome

Note on administration: A between-group statistical comparison involves statistical comparison of one group with another. Depending on the design of the study, this may involve comparison of two or more treatments, or comparison of treatment with a control condition. The analysis may be a simple comparison of outcomes measured after the treatment was administered, or a comparison of the change in one group with the change in another (when a factorial analysis of variance has been used to analyse the data, the latter is often reported as a group x time interaction). The comparison may be in the form of hypothesis testing (which provides a “p” value, describing the probability that the groups differed only by chance) or in the form of an estimate (for example, the mean or median difference, or a difference in proportions, or number needed to treat, or a relative risk or hazard ratio) and its confidence interval.

Explanation: In clinical trials, statistical tests are performed to determine if the difference between groups is greater than can plausibly be attributed to chance.

11. the study provides both point measures and measures of variability for at least one key outcome

Note on administration: A point measure is a measure of the size of the treatment effect. The treatment effect may be described as a difference in group outcomes, or as the outcome in (each of) all groups. Measures of variability include standard deviations, standard errors, confidence intervals, interquartile ranges (or other quantile ranges), and ranges. Point measures and/or measures of variability may be provided graphically (for example, SDs may be given as error bars in a Figure) as long as it is clear what is being graphed (for example, as long as it is clear whether error bars represent SDs or SEs). Where outcomes are categorical, this criterion is considered to have been met if the number of subjects in each category is given for each group.

Explanation: Clinical trials potentially provide relatively unbiased estimates of the size of treatment effects. The best estimate (point estimate) of the treatment effect is the difference between (or ratio of) the outcomes of treatment and control groups. A measure of the degree of uncertainty associated with this estimate can only be calculated if the study provides measures of variability.
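To show why both a point measure and a measure of variability are needed, the following sketch computes a between-group mean difference and an approximate 95% confidence interval from reported means, standard deviations and group sizes. It assumes the large-sample normal approximation (z = 1.96); small trials would normally use a t-distribution instead, and the function name is hypothetical.

```python
import math


def mean_difference_ci(mean_t, sd_t, n_t, mean_c, sd_c, n_c, z=1.96):
    """Between-group mean difference with an approximate 95% CI.

    Uses the large-sample normal approximation (z = 1.96); this is an
    assumption of the sketch, not a recommendation for small samples.
    """
    diff = mean_t - mean_c
    # Standard error of the difference between two independent means.
    se = math.sqrt(sd_t**2 / n_t + sd_c**2 / n_c)
    return diff, diff - z * se, diff + z * se
```

Without the standard deviations (or another measure of variability) the standard error cannot be computed, so only the point estimate would be available.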

For all criteria

Points are only awarded when a criterion is clearly satisfied. If on a literal reading of the trial report it is possible that a criterion was not satisfied, a point should not be awarded for that criterion.

For criteria 4 and 7-11

Key outcomes are those outcomes which provide the primary measure of the effectiveness (or lack of effectiveness) of the therapy. In most studies, more than one variable is used as an outcome measure.

AMSTAR (A MeaSurement Tool to Assess systematic Reviews) is used to assess the methodological quality of systematic reviews of RCTs.

1. Was an ‘a priori’ design provided?

___ Yes   ___ No   ___ Can’t Answer   ___ N/A

The research question and inclusion criteria should be established before the conduct of the review.

Note: Need to refer to a protocol, ethics approval, or pre-determined/a priori published research objective to score a ‘yes’.

2. Was there duplicate study selection and data extraction?

___ Yes   ___ No   ___ Can’t Answer   ___ N/A

There should be at least two independent data extractors and a consensus procedure for disagreements in place.

Note: 2 people do study selection, 2 people do data extraction, consensus process or one person checks the other’s work.

3. Was a comprehensive literature search performed?

___ Yes   ___ No   ___ Can’t Answer   ___ N/A

At least two electronic sources should be searched. The report must include years and databases used (e.g., Central, EMBASE, MEDLINE). Key words and/or MeSH terms must be stated and, where feasible, the search strategy should be provided. All searches should be supplemented by consulting current contents, reviews, textbooks, specialized registers, or experts in the particular field of study, and by reviewing the references in the studies found.

Note: If at least 2 sources + one supplementary strategy used, select ‘yes’ (Cochrane register/Central counts as 2 sources; a grey literature search counts as supplementary).
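The scoring note for item 3 amounts to a small counting rule, sketched below. The function name and the way Central is recognised (a case-insensitive match on the string "central") are assumptions made for illustration.

```python
def comprehensive_search(databases: set[str], supplementary: set[str]) -> bool:
    """Apply the scoring note for AMSTAR item 3.

    'yes' needs at least 2 sources plus one supplementary strategy.
    Cochrane Central counts as 2 sources on its own.
    """
    normalized = {d.strip().lower() for d in databases}
    n_sources = len(normalized)
    if "central" in normalized:
        n_sources += 1  # Central counts double
    return n_sources >= 2 and len(supplementary) >= 1
```

Under this rule a review that searched only Central plus grey literature scores ‘yes’, while one that searched MEDLINE and EMBASE with no supplementary strategy does not.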

4. Was the status of publication (i.e., grey literature) used as an inclusion criterion?

___ Yes   ___ No   ___ Can’t Answer   ___ N/A

The authors should state that they searched for reports regardless of their publication type. The authors should state whether or not they excluded any reports (from the systematic review), based on their publication status, language, etc.

Note: If review indicates that there was a search for ‘grey literature’ or ‘unpublished literature’ indicate ‘yes’. SIGLE database, dissertations, conference proceedings, and trial registries are all considered grey for this purpose. If searching a source that contains both grey and non-grey, it must specify that they were searching for grey/unpublished literature.

5. Was a list of studies (included and excluded) provided?

___ Yes   ___ No   ___ Can’t Answer   ___ N/A

A list of included and excluded studies should be provided.

Note: Acceptable if the excluded studies are referenced. If there is an electronic link to the list but the link is dead, select ‘no’.

6. Were the characteristics of the included studies provided?

___ Yes   ___ No   ___ Can’t Answer   ___ N/A

In an aggregated form, such as a table, data from the original studies should be provided on the participants, interventions and outcomes. The ranges of characteristics in all the studies analyzed (e.g., age, race, sex, relevant socioeconomic data, disease status, duration, severity, or other diseases) should be reported.

Note: Acceptable if not in table format as long as they are described as above.

7. Was the scientific quality of the included studies assessed and documented?

___ Yes   ___ No   ___ Can’t Answer   ___ N/A

‘A priori’ methods of assessment should be provided (e.g., for effectiveness studies if the authors chose to include only randomized, double-blind, placebo controlled studies, or allocation concealment as inclusion criteria); for other types of studies alternative items will be relevant.

Note: Can include use of a quality scoring tool or checklist, e.g., Jadad scale, risk of bias, sensitivity analysis, etc., or a description of quality items, with some kind of result for EACH study (‘low’ or ‘high’ is fine, as long as it is clear which studies score ‘low’ and ‘high’; a summary score/range for all studies is not acceptable).

8. Was the scientific quality of the included studies used appropriately in formulating conclusions?

___ Yes   ___ No   ___ Can’t Answer   ___ N/A

The results of the methodological rigor and scientific quality should be considered in the analysis and the conclusions of the review, and explicitly stated in formulating recommendations.

Note: Might say something such as ‘the results should be interpreted with caution due to poor quality of included studies.’ Cannot score ‘yes’ for this question if scored ‘no’ for question 7.
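The dependency between questions 7 and 8 is easy to overlook when a form is filled in by hand. The sketch below checks a completed form for that rule and for missing answers; the answer codes (`"yes"`, `"no"`, `"cant_answer"`, `"na"`) and the function name are hypothetical conventions chosen for this illustration.

```python
# Hypothetical answer codes for the four options on each AMSTAR item.
VALID = {"yes", "no", "cant_answer", "na"}


def check_amstar(answers: dict[int, str]) -> list[str]:
    """Return a list of problems found in a completed 11-item AMSTAR form."""
    problems = []
    for item in range(1, 12):
        if answers.get(item) not in VALID:
            problems.append(f"item {item}: missing or invalid answer")
    # Item 8 cannot be 'yes' when item 7 is 'no'.
    if answers.get(7) == "no" and answers.get(8) == "yes":
        problems.append("item 8: cannot be 'yes' when item 7 is 'no'")
    return problems
```

An empty list means the form is complete and internally consistent with respect to this one rule; it says nothing about whether the individual ratings are correct.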

9. Were the methods used to combine the findings of studies appropriate?

___ Yes   ___ No   ___ Can’t Answer   ___ N/A

For the pooled results, a test should be done to ensure the studies were combinable, to assess their homogeneity (i.e., Chi-squared test for homogeneity). If heterogeneity exists, a random effects model should be used and/or the clinical appropriateness of combining should be taken into consideration (i.e., is it sensible to combine?).

Note: Indicate ‘yes’ if they mention or describe heterogeneity, i.e., if they explain that they cannot pool because of heterogeneity/variability between interventions.

10. Was the likelihood of publication bias assessed?

___ Yes   ___ No   ___ Can’t Answer   ___ N/A

An assessment of publication bias should include a combination of graphical aids (e.g., funnel plot, other available tests) and/or statistical tests (e.g., Egger regression test, Hedges-Olkin).

Note: If no test values or funnel plot included, score ‘no’. Score ‘yes’ if it mentions that publication bias could not be assessed because there were fewer than 10 included studies.

11. Was the conflict of interest included?

___ Yes   ___ No   ___ Can’t Answer   ___ N/A

Potential sources of support should be clearly acknowledged in both the systematic review and the included studies.

Note: To get a ‘yes,’ it must indicate the source of funding or support for the systematic review AND for each of the included studies.

Source: Shea et al. BMC Medical Research Methodology 2007, 7:10. Additional notes (in italics) made by Michelle Weir, Julia Worswick, and Carolyn Wayne based on conversations with Bev Shea and/or Jeremy Grimshaw in June and October 2008 and July and September 2010.