Total laryngectomy (TLE) has for many years been a valid curative treatment of laryngeal and hypopharyngeal carcinoma. However, a TLE can have a substantial impact on patients' lives, due to the crucial role the larynx has in speech and communication. 1 , 2 That is why besides oncological outcome, the preservation of laryngeal functions has progressively gained an important role in treatment decision‐making in patients with laryngeal and hypopharyngeal carcinoma. 3 , 4 Larynx‐preserving alternatives to TLE are radiotherapy and/or chemotherapy, and different types of organ‐preserving surgeries, like transoral laser microsurgery and external partial laryngectomy. 5 , 6
However, not all cases are suited for larynx‐preserving treatment. Mainly patients with a higher T classification are at risk for undertreatment, and thus recurrent disease, when not opting for primary TLE. Patients with recurrent disease have lower survival rates and salvage TLE after radiotherapy comes with more complications than primary TLE. 7 , 8 , 9 Inadequate patient selection for larynx‐preserving treatment may therefore have contributed to the fact that survival rates of patients with laryngeal carcinoma have barely improved in the last 20 years. 3 , 4 , 10 , 11
Identification of imaging variables that are associated with recurrent disease may help in treatment decision‐making. Various imaging modalities, such as magnetic resonance imaging (MRI), computed tomography (CT), and 18F‐fluorodeoxyglucose positron emission tomography (FDG‐PET), may be used to identify patients who are unlikely to benefit from radiotherapy. Quantitative prognostic factors on pretreatment imaging could help improve patient selection and may increase the overall survival rate of patients with laryngeal and hypopharyngeal carcinoma.
Several possible prognostic imaging variables have been described in the literature. In this paper, we conducted a systematic review to identify pretreatment, quantitative imaging variables that are associated with recurrent disease in patients with laryngeal or hypopharyngeal carcinoma treated with chemoradiotherapy. A meta‐analysis of the imaging variables was performed.
This study is compliant with the Preferred Reporting Items for Systematic Reviews and Meta‐Analyses (PRISMA) criteria. 12
A systematic search was performed in PubMed/Medline and Embase (1990–July 2020) with synonyms of “larynx or hypopharynx carcinoma” AND (“imaging modalities” OR “imaging variables”) AND “recurrences” AND “radiotherapy” AND “prognosis” (see Table S1, Supporting Information for full search strategy). The title and abstract of all articles were first screened by one author (Hilde J. G. Smits) to identify relevant articles. The following inclusion criteria were applied: (1) study population of patients with laryngeal or hypopharyngeal carcinoma; (2) treatment with chemoradiotherapy; (3) tumor imaging before the start of the treatment; (4) recurrences or local control was the studied outcome; and (5) ≥10 recurrences were reported within the study population. Case reports, reviews, conference abstracts, non‐English articles, and animal studies were excluded. Full text analysis determined whether articles met the inclusion criteria. Finally, additional eligible articles were selected by cross‐reference check of the included articles.
The quality of the selected studies was assessed with a scoring system evaluating 10 predefined criteria (Table S2). The scoring system was based on the Newcastle Ottawa Quality Assessment Scale 13 and adapted to fit the topic of this review. 14 , 15 , 16 A study that scored <8 out of a maximum of 13 points for the 10 criteria was excluded from further analysis. The quality assessment was performed independently by two authors (Hilde J. G. Smits and Jan W. Dankbaar). Disagreements between authors on the scores were solved by consensus.
Study characteristics were extracted from all high quality articles. If two or more articles were found to have an overlapping patient population, we included the results of both articles if unique variables were studied. If the same variable was reported, only the result from the most recent article was included in this review.
For each imaging variable, effect estimates (risk ratio [RR], odds ratio, or hazard ratio) with 95% confidence interval (CI) were collected. If no relevant effect estimates were reported, crude data were used to calculate a univariate RR if possible. The consistency and clinical relevance of the effect estimates were assessed for each variable and they were classified as either a prognostic factor, a nonprognostic factor or a factor with inconsistent evidence (Figure 1). 15 , 16 The level of evidence of each prognostic and nonprognostic factor was then determined based on the number of studies that reported a significant or neutral effect estimate (Figure 1).
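The calculation of a univariate RR from crude data can be sketched as follows. This is a minimal illustration, assuming the standard log‐scale standard error for the 95% CI; the source does not state which CI method was used, and the function name and the example counts are hypothetical.

```python
import math


def risk_ratio(events_with, n_with, events_without, n_without, z=1.96):
    """Univariate risk ratio (RR) with a CI computed on the log scale.

    events_with / n_with: recurrences and patients with the imaging variable.
    events_without / n_without: recurrences and patients without it.
    Assumption: no zero cells (a continuity correction would be needed).
    """
    if events_with == 0 or events_without == 0:
        raise ValueError("zero events: RR or its standard error is undefined")
    rr = (events_with / n_with) / (events_without / n_without)
    # Standard error of log(RR) for two independent binomial proportions
    se_log_rr = math.sqrt(
        1 / events_with - 1 / n_with + 1 / events_without - 1 / n_without
    )
    lower = math.exp(math.log(rr) - z * se_log_rr)
    upper = math.exp(math.log(rr) + z * se_log_rr)
    return rr, lower, upper, se_log_rr


# Hypothetical counts: 12 of 30 patients with the variable recurred,
# versus 8 of 50 patients without it.
rr, lower, upper, se = risk_ratio(12, 30, 8, 50)
print(f"RR = {rr:.2f} [{lower:.2f}-{upper:.2f}]")
```

The log‐scale standard error returned here is also the quantity needed for inverse variance weighting when RRs are pooled.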
Variables were sorted by imaging modality and if possible, a formal meta‐analysis was performed. Inverse variance random effect models were used to pool RRs in Review Manager (RevMan, Version 5.3. Copenhagen: The Nordic Cochrane Centre, The Cochrane Collaboration, 2014). The weight for each study was calculated based on the variance, and pooled RRs and 95% confidence intervals were reported.
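The pooling step can be illustrated with a short sketch. It assumes the DerSimonian‐Laird estimator for the between‐study variance, which is a common choice for inverse variance random effects models; the source only names the model type, so the estimator, the function name, and the input values below are assumptions for illustration.

```python
import math


def pool_random_effects(log_rrs, ses, z=1.96):
    """Pool log risk ratios with inverse variance weights.

    log_rrs: per-study log(RR); ses: per-study standard errors of log(RR).
    Between-study variance tau^2 is estimated with DerSimonian-Laird
    (an assumption; the source does not name the estimator).
    """
    k = len(log_rrs)
    w = [1 / se ** 2 for se in ses]  # fixed-effect weights
    fixed = sum(wi * yi for wi, yi in zip(w, log_rrs)) / sum(w)
    # Cochran's Q: weighted squared deviations from the fixed-effect mean
    q = sum(wi * (yi - fixed) ** 2 for wi, yi in zip(w, log_rrs))
    c = sum(w) - sum(wi ** 2 for wi in w) / sum(w)
    tau2 = max(0.0, (q - (k - 1)) / c) if c > 0 else 0.0
    # Random-effects weights add tau^2 to each study's variance
    w_star = [1 / (se ** 2 + tau2) for se in ses]
    pooled = sum(wi * yi for wi, yi in zip(w_star, log_rrs)) / sum(w_star)
    se_pooled = math.sqrt(1 / sum(w_star))
    return (
        math.exp(pooled),
        math.exp(pooled - z * se_pooled),
        math.exp(pooled + z * se_pooled),
        tau2,
    )


# Hypothetical inputs: three studies with log(RR) and its standard error
pooled_rr, lower, upper, tau2 = pool_random_effects(
    [math.log(2.5), math.log(1.8), math.log(3.1)], [0.39, 0.30, 0.45]
)
print(f"pooled RR = {pooled_rr:.2f} [{lower:.2f}-{upper:.2f}], tau^2 = {tau2:.3f}")
```

When the studies agree closely, tau^2 is estimated as zero and the result reduces to the fixed‐effect inverse variance estimate; larger heterogeneity widens the interval and evens out the study weights.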
The literature search yielded 2115 results after the duplicates were removed (Figure 2). Two hundred and eighty‐five articles remained after the title and abstract screening, of which 37 articles met all inclusion criteria upon full text analysis. Of those, five articles reported neither the crude data nor ratios with confidence interval and could not be further analyzed, three articles had overlapping patient populations, and two articles scored <8 on the quality assessment. One additional publication was found through cross‐reference. In the 28 remaining articles, the following imaging modalities were used: CT (17 articles), MRI (5), FDG‐PET (3), dual‐energy CT (1), 11C‐tyrosine‐PET (1), and sonography (1). The articles about the latter three modalities were excluded from the review, as were the articles about FDG‐PET, since the variables were too heterogeneous to analyze. The remaining 22 articles about CT and MRI were included in this review, and their study characteristics are shown in Table 1. 17 , 18 , 19 , 20 , 21 , 22 , 23 , 24 , 25 , 26 , 27 , 28 , 29 , 30 , 31 , 32 , 33 , 34 , 35 , 36 , 37 , 38 The included articles cover 28 imaging variables on CT, of which 15 are studied in more than one article, and 18 imaging variables on MRI, of which 6 are studied in more than one article.
| Study | Design | Subsites | T classification | Sample size | Eligible sample size | #LF | LC calculation method | Follow‐up (months) | Quality score | Variables studied |
|---|---|---|---|---|---|---|---|---|---|---|
| Agarwal et al. 17 | PC | L, H | 1–4 | 60 | 60 | 18 | 2‐year LC | Median 24 | 11 | ENT*, MPP* |
| Bockel, van et al. 18 | RC | SPG, G | 2–4 | 150 | 150 | 33 | 2‐year LC | Mean 52 | 9 | TV |
| Chen et al. 19 | RC | H | 1–4 | 63 | 63 | 30 | 5‐year LC | Median 38 | 10 | LAI* |
| Dagan et al. 20 | RC | G | 2 | 80 | 80 | 18 | #LF | Median 85 | 8 | PAGS, SGE, SPGE* |
| Dziegielewski et al. 21 | PC | G | 3 | 107 | 60 | 22 | #LF | NA | 9 | TV |
| Hermans et al. (1) 22 | RC | G | 1–4 | 287 | 119 | 28 | #LF | >36 | 9 | TV, ACI, PCI, SGE, PEGS, PAGS, ArCI, ArCAI, CCI, TCI, CI, VCI, ELS, #CSI |
| Hermans et al. (2) 23 | RC | SPG | 1–4 | 147 | 103 | 34 | #LF | >24 | 10 | TV, ACI, PCI, SGE, PEGS, PAGS, ARI, ARAI, CCI, TCI, CI, VCI, ELS, #CSI |
| Janssens et al. 24 | PC | SPG, G | 2–4 | 270 | 270 | 64 | 5‐year LC | Median 44 | 11 | TV |
| Kraas et al. 25 | RC | SPG | 1–4 | 28 | 28 | 11 | #LF | Mean 34 | 9 | TV |
| Mancuso et al. 26 | RC | SPG | 1–4 | 63 | 63 | 16 | #LF | >24 | 10 | TV, PEGS |
| Murakami et al. 27 | PC | G | 1, 2 | 68 | 68 | 16 | 2‐year LC | Mean 27 | 11 | TV, ACI, SGE, ArCI, ArCAI, TCI, VEN, TDM*, TD*, TCAI* |
| Nur et al. 28 | RC | G | is, 1, 2 | 114 | 112 | 15 | 5‐year LC | Median 65 | 9 | ACI |
| Pameijer et al. 29 | RC | G | 3 | 42 | 42 | 12 | #LF | >24 | 8 | TV, ACI, PCI, SGE, PAGS, ArCAI, VEN, PAGS‐FV*, #SSI*, CTVC* |
| Rutkowski et al. 30 | RC | G | 2 | 115 | 115 | 33 | #LF | >36 | 9 | TV |
| Scherl et al. 31 | RC | L, H | 3, 4 | 463 | 236 | 42 | 5‐year LC | Mean 47.6 | 8 | CI |
| Tsou et al. 32 | RC | H | 2–4 | 51 | 51 | 29 | #LF | Mean 24.6 | 8 | TV, CN* |
| Zouhair et al. 33 | RC | G | 1, 2 | 122 | 122 | 19 | #LF | Median 85 | 8 | ACI |
| Castelijns et al. (1) 34 | RC | L | 1–4 | 80 | 80 | 30 | #LF | >24 | 12 | TV, VCI* |
| Castelijns et al. (2) 35 | RC | L | 1–4 | 80 | 80 | 30 | #LF | >24 | 11 | CAI*, CI*, #CSI* |
| Ljumanovic et al. (1) 36 | RC | SPG | 1–4 | 84 | 84 | 28 | 5‐year LC | >24 | 10 | TV, ACI, HE, SGE, PEGS, ELA, ELTC*, TCI*, CCI*, GE* |
| Ljumanovic et al. (2) 37 | RC | G | 1–4 | 118 | 118 | 39 | 2‐year LC | >24 | 12 | TV, ACI, HE, SGE, PEGS, ELA, TC/CAI* |
| Ljumanovic et al. (3) 38 | RC | SPG, G | 1–4 | 64 | 64 | 28 | 2‐year LC | >24 | 11 | TV, SGE, PAGS*, DCE* |
Most articles analyzed tumor volume (TV) as a dichotomized variable, using various cut‐off points ranging from 0.28 to 19 cm3. For the purpose of the meta‐analysis, the results of articles that used similar cut‐off points (±0.5 cm3) were grouped (for example, the variable TV >2 cm3 on CT contains the range of cut‐off points 1.6–2.5 cm3). The level of evidence of all variables studied in multiple articles is shown in Tables 2, 3, 4. Results of variables or TV cut‐off points that were only studied in one article can be found in Tables S3 and S4. No articles published univariate RRs, so we calculated all univariate RRs from crude data. These data were either the number of recurrences within the study period or the local control rates. Only a few studies provided other estimates or multivariable analyses. These results were included in Tables 2, 3, 4, but the pooled RRs are based only on the univariate RRs.
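Where only local control rates are available, a univariate RR of local failure can be approximated as sketched below. This is an illustration under a stated assumption, namely that 1 minus the local control rate is treated as the cumulative risk of failure, which ignores censoring; the source does not describe the exact conversion, and the function name and example values are hypothetical.

```python
def rr_from_local_control(lc_with, n_with, lc_without, n_without):
    """Approximate a univariate RR of local failure from local control rates.

    lc_with / lc_without: local control rates (0-1) in patients with and
    without the imaging variable. Treating 1 - LC as the cumulative risk
    ignores censoring, so the result is only an approximation.
    """
    risk_with = 1.0 - lc_with
    risk_without = 1.0 - lc_without
    if risk_without == 0:
        raise ValueError("RR is undefined when the comparison group has no failures")
    # Approximate failure counts, usable for a confidence interval
    failures_with = round(risk_with * n_with)
    failures_without = round(risk_without * n_without)
    return risk_with / risk_without, failures_with, failures_without


# Hypothetical example: 60% local control in 40 patients with the variable,
# 80% local control in 60 patients without it.
rr, failures_with, failures_without = rr_from_local_control(0.60, 40, 0.80, 60)
print(f"RR = {rr:.2f}, approximate failures: {failures_with} vs {failures_without}")
```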
Seven imaging variables were found to be prognostic factors with a strong level of evidence, all of them CT‐variables (Table 2). TV >1 cm3 (pooled RR = 3.03 [95% CI = 1.98–4.66]), TV >2 cm3 (pooled RR = 2.48 [1.71–3.61]), TV >8 cm3 (pooled RR = 2.09 [1.28–3.41]), anterior commissure involvement (pooled RR = 2.19 [1.45–3.32]), posterior commissure involvement (pooled RR = 2.44 [1.56–3.81]), subglottic extension (pooled RR = 2.25 [1.59–3.19]), and arytenoid cartilage adjacent or invaded (pooled RR = 2.10 [1.39–3.19]) were all found to increase the risk of recurrence significantly in three or more studies. Arytenoid cartilage adjacent or invaded means the tumor has grown adjacent to the arytenoid or has invaded the arytenoid cartilage.
A moderate level of evidence was found for TV >6 cm3 (pooled RR = 2.46 [1.39–4.35]), TV >16 cm3 (pooled RR = 2.94 [1.98–4.37]), pre‐epiglottic space involvement (pooled RR = 2.12 [1.40–3.22]), paraglottic space involvement (pooled RR = 1.72 [1.18–2.50]), and cricoid cartilage involvement (pooled RR = 2.17 [1.27–3.71]) as prognostic factors on CT (Table 2). On MRI, TV >3 cm3 (pooled RR = 2.44 [1.54–3.88]), and pre‐epiglottic space involvement (pooled RR = 2.62 [1.33–5.14]) were found to have a moderate level of evidence for being a prognostic factor of recurrent disease (Table 3).
An additional three variables on CT had a limited level of evidence: laryngeal ventricle involvement, any cartilage involvement and vocal cord involvement (Table 2). For MRI, four variables were found to have a limited level of evidence: anterior commissure involvement, subglottic extension, hypopharyngeal extension and extralaryngeal spread beyond anterior commissure (Table 3).
Only two variables were found to be nonprognostic factors with limited level of evidence: extralaryngeal spread (pooled RR = 1.35 [0.85–2.15]) and the involvement of more than one cartilage site (pooled RR = 1.48 [0.78–2.82]) (Table S5). Both are CT variables.
On CT, TV >4 cm3, arytenoid cartilage invasion and thyroid cartilage involvement were found to have inconsistent evidence (Table 4). TV >4 cm3 was studied in five articles, four of which reported this cut‐off point to be significantly prognostic of tumor recurrence, but one study reported a neutral RR of 0.87 [0.47–1.61]. Arytenoid cartilage invasion was found to be significant in two out of three studies and thyroid cartilage involvement in only one out of three studies. The other studies for these variables all showed neutral effect estimates.
In an effort to determine the optimal TV cut‐off point, we stratified the studies according to the tumor subsite of their populations. This could only be done for supraglottic and glottic tumors, with five studies analyzing TV in supraglottic tumors and seven studies in glottic tumors. The sensitivity and specificity of all TV cut‐off points found in these studies were plotted for both glottic and supraglottic tumors (Figure 3). This graph contains results of both CT and MRI studies.
In general, lower cut‐off points result in a higher sensitivity and lower specificity, while the reverse is true for higher cut‐off points. Glottic tumors seem to have a lower optimal cut‐off point (between 1 and 4 cm3) than supraglottic tumors (between 4 and 10 cm3). However, there is a high variability in sensitivity and specificity between studies, so no optimal cut‐off point could be determined.
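The trade‐off between sensitivity and specificity across cut‐off points can be illustrated with a small sketch. The tumor volumes and recurrence outcomes below are hypothetical and serve only to show the calculation; they are not data from the included studies.

```python
def sensitivity_specificity(volumes, recurred, cutoff):
    """Sensitivity and specificity of 'TV > cutoff' for predicting recurrence.

    volumes: tumor volumes in cm3; recurred: matching booleans.
    A tumor above the cut-off counts as a positive test result.
    """
    pairs = list(zip(volumes, recurred))
    tp = sum(1 for v, r in pairs if v > cutoff and r)
    fn = sum(1 for v, r in pairs if v <= cutoff and r)
    tn = sum(1 for v, r in pairs if v <= cutoff and not r)
    fp = sum(1 for v, r in pairs if v > cutoff and not r)
    return tp / (tp + fn), tn / (tn + fp)


# Hypothetical cohort of eight patients
volumes = [0.5, 1.2, 2.0, 3.5, 5.0, 7.5, 9.0, 12.0]
recurred = [False, False, True, False, True, False, True, True]

for cutoff in (1, 4, 8):
    sens, spec = sensitivity_specificity(volumes, recurred, cutoff)
    print(f"TV > {cutoff} cm3: sensitivity = {sens:.2f}, specificity = {spec:.2f}")
```

In this toy cohort, raising the cut‐off from 1 to 8 cm3 lowers the sensitivity and raises the specificity, in line with the general pattern described above.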
We also compared the sensitivity and specificity of anatomical subsite parameters for all studies on both CT and MRI (Figure 4). Most parameters show a great variability across studies, with a few exceptions. All studies that analyzed cricoid cartilage involvement, extralaryngeal spread beyond anterior commissure, and posterior commissure involvement found a high specificity (90%–100%) and a low sensitivity (0%–30%). Findings for any cartilage involvement were also fairly consistent, with a specificity between 60% and 70% and a sensitivity between 30% and 50%.
In this systematic review and meta‐analysis, the prognostic value of quantitative imaging variables was assessed for recurrent laryngeal or hypopharyngeal carcinoma after radiotherapy.
TV is one of the most studied prognostic imaging factors and this review supports that patients with higher TV have an increased risk of recurrence. All cut‐off points were a prognostic factor, except for TV >4 cm3, which had inconsistent evidence. However, only one study that used 4 cm3 as a cut‐off point reported a neutral effect size, while the other four found an increased risk of recurrence for TV >4 cm3. Therefore, it seems likely that this one study is an anomaly.
We also attempted to find an optimal TV cut‐off point that can be used to predict recurrences. This cut‐off point seems to be lower for glottic tumors compared to supraglottic tumors. However, there is a lot of variability in sensitivity and specificity of studies that used the same cut‐off points. This can at least partially be explained by heterogeneity in study populations. Sensitivity and specificity are influenced by the prevalence of a prognostic factor within a study population. If a population is more likely to contain larger tumors (above the cut‐off point), the sensitivity will be higher and the specificity will be lower. Therefore, heterogeneity in, for example, T classification distribution can cause a lot of variability. To gain more insight into the differences between patient groups, studies should use more homogeneous patient cohorts or stratify their results by tumor location and T classification.
Besides heterogeneity in study populations, a lack of uniformity in TV delineation might also complicate the results. Differences in image quality, measurement methods, slice thickness, and delineation guidelines can lead to different conclusions on TV. Moreover, interobserver variability of TV delineation in the head and neck region on both CT and MRI is still a much debated topic, with some studies finding no significant variability between observers, 39 , 40 while others do. 41 , 42 , 43
Moreover, Ligtenberg et al. 44 found that delineations on CT, MRI and FDG‐PET were all larger than the reference TV delineation on pathology. Between modalities, significant differences were found between CT and MRI and between CT and FDG‐PET. This indicates that CT measurements cannot be compared to MRI measurements when it comes to TV.
In the current TNM classification of laryngeal carcinoma, TV is not taken into account. 45 For hypopharyngeal carcinoma, tumor dimension is considered in the form of the one‐dimensional greatest tumor diameter. 45 For head and neck sites in general, previous reviews have pleaded for the addition of TV to the TNM classification system, 46 , 47 arguing that TV assessment should at the very least supplement TNM classification in clinical decision‐making. Although our analysis does indicate a significant impact of TV on recurrences after radiotherapy, only a few studies included a multivariable analysis with both the T classification and TV. Therefore, it is difficult to determine the added prognostic value of TV.
The present study found strong evidence that involvement of the anterior commissure (ACI) is a prognostic factor of local recurrence after radiotherapy. This is in line with two recent reviews studying ACI in early glottic carcinoma by Tulli et al. 48 and Eskiizmir et al. 49 A third review by Hendriksma et al. 50 found conflicting evidence, arguing that ACI should not be judged as a binary variable, but rather on a classification scale. 51 In the present review, all included studies treated ACI as a binary variable.
We found strong evidence of subglottic extension as a prognostic factor. It must be noted, however, that the definition of subglottic extension is ambiguous. The boundaries of the subglottis itself are controversial and most publications use different definitions. 52 No study included in this review provided any definition for subglottic extension.
A third anatomical prognostic factor for which we found strong evidence was arytenoid cartilage adjacent or invaded (ArCAI), while just arytenoid cartilage invasion (ArCI) gave inconsistent evidence. The difference between the two parameters is that both ArCAI and ArCI include tumors that have invaded the cartilage, but ArCAI also includes the tumors that have grown adjacent to it. These results suggest that the risk of recurrence is already higher when the tumor has reached the arytenoid cartilage and that actual invasion does not necessarily increase that risk further. It is possible that patients with invaded cartilage received a more rigorous treatment regime, but this was not mentioned in the articles that made the distinction. More research into the prognostic value of cartilage adjacent tumors is needed in order to draw any conclusions.
The current TNM classification system 45 for all laryngeal subsites as well as for the hypopharynx takes into account infiltrated subsites, like pre‐epiglottic and paraglottic space invasion, and thyroid and cricoid cartilage involvement. The inclusion of these variables is supported by this review, except for thyroid cartilage involvement, for which we found inconsistent evidence. However, the two studies by Hermans et al. 22 , 23 that reported a neutral effect estimate both had a very low prevalence of thyroid cartilage involvement within their study population (Table 4). It might therefore be an anomaly that these patients did relatively well.
As for the sensitivity and specificity of the anatomical variables as prognostic factors, most parameters showed great variability across studies. Three parameters had a high specificity and low sensitivity in all studies: posterior commissure involvement, cricoid cartilage involvement, and extralaryngeal spread beyond anterior commissure. In all study populations in which these variables were analyzed, the prevalence of these factors was relatively low, explaining the low sensitivity. However, because the specificity is so high, they might serve as a prognostic factor in those patients who do have these factors. Since posterior commissure was only analyzed in three studies and the other two factors in two studies, more research is necessary in order to draw definite conclusions.
Only 5 out of the 22 included articles studied MRI variables, all from the same research group. Due to overlap in patient pools, only the results of one or two articles could be compared for each variable. MR imaging in laryngeal carcinoma has its challenges due to relatively long acquisition times and significant organ movement. However, MRI techniques have improved a great deal over the last years and in the absence of motion artifacts, MRI is superior to CT in determining tumor extension. The imaging modality should not be overlooked when it comes to its prognostic value, especially more advanced MRI methods, like diffusion‐weighted imaging (DWI) and dynamic contrast‐enhanced (DCE) MRI. 53 , 54 On DWI‐MRI, the apparent diffusion coefficient (ADC) has often been discussed as a prognostic factor in different head and neck sites, but mean or median ADC often does not reach significance. 54 , 55 , 56 More promising are ADC parameters that take into account tumor heterogeneity, like ADC kurtosis. 54 , 57 , 58 On DCE‐MRI, higher Ktrans, a measure of vascular permeability, is associated with better treatment response and long‐term outcome in various head and neck sites. 53 , 54 , 59 , 60
So few studies with FDG‐PET variables were included because most studies we found used outcome measures other than local control. FDG‐PET is also not routinely used for pretreatment staging and is more commonly used in diagnostic studies than prognostic ones. This does not mean that FDG‐PET is not a promising modality for the prediction of radiotherapy outcome. High pretreatment metabolic tumor volume and total lesion glycolysis are both associated with worse patient outcomes in head and neck cancer. 61 , 62
Furthermore, this review only looked at imaging variables, but clinical variables also play a role in patient prognosis. Factors like age, sex, tobacco use and WHO performance status have all been shown to affect patient outcome after radiotherapy in either laryngeal or hypopharyngeal carcinoma, or head and neck cancer in general. 49 , 63 , 64 These clinical factors should also be taken into account when making treatment decisions.
For all variables included in the meta‐analysis, the pooled RRs are based on few studies. Therefore, the uncertainty in the pooled RRs is large and they serve more as an indication than as conclusive RRs.
Another limitation is that only very few studies included a multivariable analysis; therefore, there is little control for possible confounders. For example, while TV was found to be a prognostic factor for recurrence, there is a correlation between TV and TNM classification. This review cannot control for this, due to the lack of multivariable analyses.
Publication bias might also play a role, as negative results are less likely to be published. Because of the limited number of studies per variable we could not test publication bias with a funnel plot. 65 There is also no formal way to assess selective reporting bias. If an article did not provide the crude data or effect estimates of non‐significant results, its findings could not be used in this review.
There is heterogeneity between the tumor and treatment characteristics in the study populations, as well as in follow‐up time and the outcome measures that were used (Table 1). The three outcome measures were the number of recurrences discovered during the follow‐up time, the 2‐year local control rate, and the 5‐year local control rate. No differentiation was made based on the outcome, since most laryngeal and hypopharyngeal recurrences present within 2 years after treatment. 8 , 66 As for tumor characteristics, the prognostic value of different variables is likely to vary between different locations or tumor classifications, which might have affected the results. However, this review aimed to find prognostic imaging factors for all laryngeal and hypopharyngeal tumors and an inverse variance random effects model was applied to adjust for variation between studies. More research is needed to allow for further stratification.
We also noticed that only a few articles reported the time between the pretreatment imaging and the start of treatment (6 of the 23 articles). When they did, this interval could be quite long (one study reported an average of 43 days between imaging and start of treatment 38). In this time period tumors will likely grow, causing discrepancies between the tumor on pretreatment imaging and the tumor at the start of treatment. Future studies assessing the prognostic value of imaging factors should report on the interval between imaging and treatment and aim to keep this time as short as possible.
Pretreatment tumor volume determined on CT has clear prognostic value in laryngeal and hypopharyngeal carcinoma treated with radiotherapy. Due to heterogeneity in study populations and methodology, no conclusions can be drawn on optimal cut‐off points.
On CT, there is strong evidence for the prognostic value of anterior and posterior commissure involvement, subglottic extension, and arytenoid cartilage extension. Moderate evidence was found for pre‐epiglottic space involvement, paraglottic space involvement and cricoid cartilage involvement as prognostic factors on CT. On MRI, a moderate level of evidence was only found for tumor volume and pre‐epiglottic space involvement. This is partly due to the low number of articles studying prognostic imaging variables on MRI. More research is needed in order to accurately assess the prognostic power of MR imaging variables.
Data sharing not applicable ‐ no new data generated.