This CPG adheres to the 2010 AACE Protocol for Standardized Production of Clinical Practice Guidelines published in Endocrine Practice (5). This updated protocol describes a more transparent methodology for rating the clinical evidence and synthesizing recommendation grades. The protocol also stipulates a rigorous multilevel review process.

The process began with an outline for reviewing the principal clinical aspects of hypothyroidism. Computerized and manual searches of the medical literature and various databases, primarily Medline®, were based on specific section titles, so as to avoid both the inclusion of unnecessary detail and the exclusion of important studies. Compilation of the bibliography was a continual and dynamic process. Once the principal clinical aspects of hypothyroidism were defined, questions were formulated, and recommendations were then developed to address them. The grading of recommendations was based on consensus among the authors.

The final document was approved by the American Association of Clinical Endocrinologists (AACE) and American Thyroid Association (ATA), and was officially endorsed by the American Association of Diabetes Educators (AADE), American Association of Endocrine Surgeons (AAES), American Academy of Otolaryngology—Head and Neck Surgery (AAO-HNS), American College of Endocrinology (ACE), Italian Association of Clinical Endocrinologists (AME), American Society for Metabolic & Bariatric Surgery (ASMBS), The Endocrine Society of Australia (ESA), International Association of Endocrine Surgeons (IAES), Latin American Thyroid Society (LATS), and Ukrainian Association of Endocrine Surgeons (UAES).


The purpose of these guidelines is to present an updated evidence-based framework for the diagnosis, treatment, and follow-up of patients with hypothyroidism.

Guidelines for CPGs

Current guidelines for CPGs in clinical medicine emphasize an evidence-based approach rather than simply expert opinion (6). Although a purely evidence-based approach is not applicable to every clinical scenario, we have incorporated it into these CPGs to provide objectivity.

Levels of scientific substantiation and recommendation grades (transparency)

All clinical data that are incorporated in these CPGs have been evaluated in terms of levels of scientific substantiation. The detailed methodology for assigning evidence levels (ELs) to the references used in these CPGs has been reported by Mechanick et al. (7), from which Table 2 is taken. The authors’ EL ratings of the references are included in the References section. The four-step approach that the authors used to grade recommendations is summarized in Tables 3, 4, 5, and 6 of the 2010 Standardized Production of Clinical Practice Guidelines (5), from which Table 3 is taken. By explicitly providing numerical and semantic descriptors of the clinical evidence as well as relevant subjective factors and study flaws, the updated protocol has greater transparency than the 2008 AACE protocol described by Mechanick et al. (7).

In these guidelines, the grading system reflects the strength of a recommendation, not its directive wording. For example, in some grading systems “should not” implies that there is substantial evidence to support a recommendation. The grading method employed in this guideline, however, allows the authors to use such language even when the best evidence level available is “expert opinion.” Although different grading systems were employed, an effort was made to keep these recommendations consistent with related portions of “Hyperthyroidism and Other Causes of Thyrotoxicosis: Management Guidelines of the American Thyroid Association and American Association of Clinical Endocrinologists” (8,9), as well as the “Guidelines of the American Thyroid Association for the Diagnosis and Management of Thyroid Disease During Pregnancy and Postpartum” (10).


Level Description Comments
1 Prospective, randomized, controlled trials (large)
    Data are derived from a substantial number of trials with adequate statistical power involving a substantial number of outcome data subjects.
    Large meta-analyses using raw or pooled data or incorporating quality ratings
    Well-controlled trial at one or more centers
    Consistent pattern of findings in the population for which the recommendation is made (generalizable data)
    Compelling nonexperimental, clinically obvious evidence (e.g., thyroid hormone treatment for myxedema coma); “all-or-none” indication
2 Prospective controlled trials with or without randomization (limited body of outcome data)
    Limited number of trials, small population sites in trials
    Well-conducted single-arm prospective cohort study
    Limited but well-conducted meta-analyses
    Inconsistent findings or results not representative for the target population
    Well-conducted case-control study
3 Other experimental outcome data and nonexperimental data
    Nonrandomized, controlled trials
    Uncontrolled or poorly controlled trials
    Any randomized clinical trial with one or more major or three or more minor methodological flaws
    Retrospective or observational data
    Case reports or case series
    Conflicting data with weight of evidence unable to support a final recommendation
4 Expert opinion
    Inadequate data for inclusion in level 1, 2, or 3; necessitates an expert panel’s synthesis of the literature and a consensus
    Experience based
    Theory driven

     Levels 1, 2, and 3 represent a given level of scientific substantiation or proof. Level 4 or Grade D represents unproven claims. It is the “best evidence” based on the individual ratings of clinical reports that contributes to a final grade recommendation.

     Source: Mechanick et al., 2008 (7).


2010 AACE Protocol for Production of Clinical Practice Guidelines—Step III: Grading of recommendations; How different evidence levels can be mapped to the same recommendation grade
Best evidence level Subjective factor impact Two-thirds consensus Mapping (a) Recommendation grade
1 None Yes Direct A
2 Positive Yes Adjust up A
2 None Yes Direct B
1 Negative Yes Adjust down B
3 Positive Yes Adjust up B
3 None Yes Direct C
2 Negative Yes Adjust down C
4 Positive Yes Adjust up C
4 None Yes Direct D
3 Negative Yes Adjust down D
1,2,3,4 N/A No Adjust down D

     Adopted by the AACE and the ATA for the Hypothyroidism CPG.

     (a) Starting with the left column, best evidence levels (BELs), subjective factors, and consensus map to recommendation grades in the right column. When subjective factors have little or no impact (“none”), then the BEL is directly mapped to recommendation grades. When subjective factors have a strong impact, then recommendation grades may be adjusted up (“positive” impact) or down (“negative” impact). If a two-thirds consensus cannot be reached, then the recommendation grade is D.

     Source: Mechanick et al., 2010 (5).

     N/A, not applicable (regardless of the presence or absence of strong subjective factors, the absence of a two-thirds consensus mandates a recommendation grade D).
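Because the Step III mapping is a small rule table, it can be restated as a function, which makes the "direct / adjust up / adjust down" logic explicit. The sketch below is illustrative only: the names `recommendation_grade` and `Impact` are hypothetical, and the handling of the two combinations the table does not list (BEL 1 with a positive impact, BEL 4 with a negative impact) is an assumption, namely that the grade is simply held at A or D.

```python
from enum import Enum

# Recommendation grades ordered from strongest (A) to weakest (D).
GRADES = ("A", "B", "C", "D")


class Impact(Enum):
    """Impact of subjective factors on the grade (hypothetical encoding)."""
    NONE = 0       # direct mapping: BEL 1 -> A, 2 -> B, 3 -> C, 4 -> D
    POSITIVE = -1  # "adjust up": one step toward A
    NEGATIVE = 1   # "adjust down": one step toward D


def recommendation_grade(bel: int, impact: Impact, two_thirds_consensus: bool) -> str:
    """Map a best evidence level (BEL) to a recommendation grade.

    Sketch of the Step III table; clamping at A and D for combinations
    the table does not list is an assumption, not part of the protocol.
    """
    if bel not in (1, 2, 3, 4):
        raise ValueError("BEL must be 1, 2, 3, or 4")
    # Without a two-thirds consensus the grade is D regardless of BEL or impact.
    if not two_thirds_consensus:
        return "D"
    # Start from the direct mapping, shift one step for a strong subjective
    # factor, then clamp to the A..D range (assumed behavior at the ends).
    index = (bel - 1) + impact.value
    index = max(0, min(index, len(GRADES) - 1))
    return GRADES[index]


if __name__ == "__main__":
    # Rows of the Step III table: (BEL, impact, consensus, expected grade).
    rows = [
        (1, Impact.NONE, True, "A"),
        (2, Impact.POSITIVE, True, "A"),
        (2, Impact.NONE, True, "B"),
        (1, Impact.NEGATIVE, True, "B"),
        (3, Impact.POSITIVE, True, "B"),
        (3, Impact.NONE, True, "C"),
        (2, Impact.NEGATIVE, True, "C"),
        (4, Impact.POSITIVE, True, "C"),
        (4, Impact.NONE, True, "D"),
        (3, Impact.NEGATIVE, True, "D"),
    ]
    for bel, impact, consensus, expected in rows:
        assert recommendation_grade(bel, impact, consensus) == expected
    print("all table rows reproduced")
```

Encoding "adjust up" and "adjust down" as a single one-step shift reproduces every row of the table, and checking consensus first mirrors the rule that a missing two-thirds consensus yields grade D irrespective of the other columns.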


Study Subclinical Overt TSH cutoff (mIU/L) Comment
NHANES III 4.3% 0.3% 4.5  
Colorado Thyroid Disease Prevalence 8.5% 0.4% 5.0 Not on thyroid hormone
Framingham     10.0 Over age 60 years: 5.9% women; 2.3% men; 39% of whom had subnormal T4
British Whickham     10.0 9.3% women; 1.2% men

     Sources: Hollowell et al., 2002 (11); Canaris et al., 2000 (12); Sawin et al., 1985 (13); Vanderpump et al., 1995 (14); Vanderpump and Tunbridge, 2002 (15).

     NHANES, National Health and Nutrition Examination Survey.


Increased TBG Decreased TBG Binding inhibitors
Inherited Inherited Salicylates
Pregnancy Androgens Furosemide
Neonatal state Anabolic steroids Free fatty acids
Estrogens Glucocorticoids Phenytoin
Hepatitis Severe illness Carbamazepine
Porphyria Hepatic failure NSAIDs (variable, transient)
Heroin Nephrosis Heparin
Methadone Nicotinic acid  
Mitotane L-Asparaginase  
SERMs (e.g., tamoxifen, raloxifene)    

     TBG, T4-binding globulin; SERMs, selective estrogen receptor modulators; NSAIDs, nonsteroidal anti-inflammatory drugs.


Test Method Comments
Free T4 index or free T4 estimate Product of total T4 and thyroid hormone binding ratio or T3-resin uptake Normal values in pregnancy and with alterations in TBG binding
Direct immunoassay of free T4 With physical separation using equilibrium dialysis or ultrafiltration Reduced values in pregnancy compared to nonpregnant reference ranges; normal values with alterations in TBG binding
Direct immunoassay of free T4 Without physical separation using anti-T4 antibody Reduced values in pregnancy compared to nonpregnant reference ranges; normal values with alterations in TBG binding
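The free T4 index in the first row is a calculated estimate rather than a direct measurement. As a minimal illustration, assuming the uptake term is expressed as a thyroid hormone binding ratio (THBR) normalized to a reference serum (the usual convention, although details vary by assay), the product can be written as:

```latex
\mathrm{FT_4I} = \mathrm{TT_4} \times \mathrm{THBR},
\qquad
\mathrm{THBR} = \frac{\text{uptake, patient serum}}{\text{uptake, reference serum}}
```

When TBG is increased (for example, in pregnancy), total T4 rises while the binding ratio falls, so the two changes tend to offset in the product; this is consistent with the "normal values" comment for the free T4 index above.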


A shortcoming of this evidence-based methodology is that many recommendations in these CPGs are based on weak scientific data (Level 3) or consensus opinion (Level 4) rather than strong scientific data (Levels 1 and 2). Other problems include (i) subjectivity on the part of the authors when weighing positive and negative, or epidemiologic versus experimental, data to arrive at an evidence-based recommendation grade or consensus opinion; (ii) subjectivity on the part of the authors when weighing subjective attributes, such as cost-effectiveness and risk-to-benefit ratios, to arrive at an evidence-based recommendation grade or consensus opinion; (iii) potentially incomplete review of the literature by the authors despite extensive diligence; and (iv) bias in the available publications, which originate predominantly from experienced clinicians and large academic medical centers and may, therefore, not reflect the experience at large. The authors have tried to address these shortcomings through an a priori methodology, multiple levels of review, and discussions with three experts (see Acknowledgments).

Summary of recommendation grades

The recommendations are evidence-based (Grades A, B, and C) or based on expert opinion because of a lack of conclusive clinical evidence (Grade D). The “best evidence” rating level (BEL), which corresponds to the best conclusive evidence found, accompanies the recommendation grade. Details regarding the mapping of clinical evidence ratings to these recommendation grades have already been provided [see Levels of scientific substantiation and recommendation grades (transparency)]. In this CPG, a substantial number of recommendations are upgraded or downgraded because the underlying conclusions may not apply in other situations (non-generalizability). For example, what applies to an elderly population with established cardiac disease may not apply to a younger population without cardiac risk factors. Whenever expert opinion resulted in upgrading or downgrading a recommendation, this is explicitly stated after the recommendation.