TUTCRIS - Tampere University of Technology

Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer

Research output: Peer-reviewed

Standard

Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. / Bejnordi, Babak Ehteshami; Veta, Mitko; Van Diest, Paul Johannes; Van Ginneken, Bram; Karssemeijer, Nico; Litjens, Geert; Van Der Laak, Jeroen A.W.M.; Hermsen, Meyke; Manson, Quirine F.; Balkenhol, Maschenka; Geessink, Oscar; Stathonikos, Nikolaos; Van Dijk, Marcory C.R.F.; Bult, Peter; Beca, Francisco; Beck, Andrew H.; Wang, Dayong; Khosla, Aditya; Gargeya, Rishab; Irshad, Humayun; Zhong, Aoxiao; Dou, Qi; Li, Quanzheng; Chen, Hao; Lin, Huang Jing; Heng, Pheng Ann; Haß, Christian; Bruni, Elia; Wong, Quincy; Halici, Ugur; Öner, Mustafa Ümit; Cetin-Atalay, Rengul; Berseth, Matt; Khvatkov, Vitali; Vylegzhanin, Alexei; Kraus, Oren; Shaban, Muhammad; Rajpoot, Nasir; Awan, Ruqayya; Sirinukunwattana, Korsuk; Qaiser, Talha; Tsang, Yee Wah; Tellez, David; Annuscheit, Jonas; Hufnagl, Peter; Valkonen, Mira; Kartasalo, Kimmo; Latonen, Leena; Ruusuvuori, Pekka; Liimatainen, Kaisa; CAMELYON16 Consortium.

In: JAMA - Journal of the American Medical Association, Vol. 318, No. 22, 12.12.2017, p. 2199-2210.

Harvard

Bejnordi, BE, Veta, M, Van Diest, PJ, Van Ginneken, B, Karssemeijer, N, Litjens, G, Van Der Laak, JAWM, Hermsen, M, Manson, QF, Balkenhol, M, Geessink, O, Stathonikos, N, Van Dijk, MCRF, Bult, P, Beca, F, Beck, AH, Wang, D, Khosla, A, Gargeya, R, Irshad, H, Zhong, A, Dou, Q, Li, Q, Chen, H, Lin, HJ, Heng, PA, Haß, C, Bruni, E, Wong, Q, Halici, U, Öner, MÜ, Cetin-Atalay, R, Berseth, M, Khvatkov, V, Vylegzhanin, A, Kraus, O, Shaban, M, Rajpoot, N, Awan, R, Sirinukunwattana, K, Qaiser, T, Tsang, YW, Tellez, D, Annuscheit, J, Hufnagl, P, Valkonen, M, Kartasalo, K, Latonen, L, Ruusuvuori, P, Liimatainen, K & CAMELYON16 Consortium 2017, 'Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer', JAMA - Journal of the American Medical Association, vol. 318, no. 22, pp. 2199-2210. https://doi.org/10.1001/jama.2017.14585

APA

Bejnordi, B. E., Veta, M., Van Diest, P. J., Van Ginneken, B., Karssemeijer, N., Litjens, G., ... CAMELYON16 Consortium (2017). Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA - Journal of the American Medical Association, 318(22), 2199-2210. https://doi.org/10.1001/jama.2017.14585

Vancouver

Bejnordi BE, Veta M, Van Diest PJ, Van Ginneken B, Karssemeijer N, Litjens G et al. Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. JAMA - Journal of the American Medical Association. 2017 Dec 12;318(22):2199-2210. https://doi.org/10.1001/jama.2017.14585

Author

Bejnordi, Babak Ehteshami ; Veta, Mitko ; Van Diest, Paul Johannes ; Van Ginneken, Bram ; Karssemeijer, Nico ; Litjens, Geert ; Van Der Laak, Jeroen A.W.M. ; Hermsen, Meyke ; Manson, Quirine F. ; Balkenhol, Maschenka ; Geessink, Oscar ; Stathonikos, Nikolaos ; Van Dijk, Marcory C.R.F. ; Bult, Peter ; Beca, Francisco ; Beck, Andrew H. ; Wang, Dayong ; Khosla, Aditya ; Gargeya, Rishab ; Irshad, Humayun ; Zhong, Aoxiao ; Dou, Qi ; Li, Quanzheng ; Chen, Hao ; Lin, Huang Jing ; Heng, Pheng Ann ; Haß, Christian ; Bruni, Elia ; Wong, Quincy ; Halici, Ugur ; Öner, Mustafa Ümit ; Cetin-Atalay, Rengul ; Berseth, Matt ; Khvatkov, Vitali ; Vylegzhanin, Alexei ; Kraus, Oren ; Shaban, Muhammad ; Rajpoot, Nasir ; Awan, Ruqayya ; Sirinukunwattana, Korsuk ; Qaiser, Talha ; Tsang, Yee Wah ; Tellez, David ; Annuscheit, Jonas ; Hufnagl, Peter ; Valkonen, Mira ; Kartasalo, Kimmo ; Latonen, Leena ; Ruusuvuori, Pekka ; Liimatainen, Kaisa ; CAMELYON16 Consortium. / Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer. In: JAMA - Journal of the American Medical Association. 2017 ; Vol. 318, No. 22. pp. 2199-2210.

BibTeX

@article{8d8a98908d624c3db7afe4a0149ab227,
title = "Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer",
abstract = "IMPORTANCE: Application of deep learning algorithms to whole-slide pathology images can potentially improve diagnostic accuracy and efficiency. OBJECTIVE: Assess the performance of automated deep learning algorithms at detecting metastases in hematoxylin and eosin-stained tissue sections of lymph nodes of women with breast cancer and compare it with pathologists' diagnoses in a diagnostic setting. DESIGN, SETTING, AND PARTICIPANTS: Researcher challenge competition (CAMELYON16) to develop automated solutions for detecting lymph node metastases (November 2015-November 2016). A training data set of whole-slide images from 2 centers in the Netherlands with (n = 110) and without (n = 160) nodal metastases verified by immunohistochemical staining were provided to challenge participants to build algorithms. Algorithm performance was evaluated in an independent test set of 129 whole-slide images (49 with and 80 without metastases). The same test set of corresponding glass slides was also evaluated by a panel of 11 pathologists with time constraint (WTC) from the Netherlands to ascertain likelihood of nodal metastases for each slide in a flexible 2-hour session, simulating routine pathology workflow, and by 1 pathologist without time constraint (WOTC). EXPOSURES: Deep learning algorithms submitted as part of a challenge competition or pathologist interpretation. MAIN OUTCOMES AND MEASURES: The presence of specific metastatic foci and the absence vs presence of lymph node metastasis in a slide or image using receiver operating characteristic curve analysis. The 11 pathologists participating in the simulation exercise rated their diagnostic confidence as definitely normal, probably normal, equivocal, probably tumor, or definitely tumor. RESULTS: The area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.556 to 0.994. 
The top-performing algorithm achieved a lesion-level, true-positive fraction comparable with that of the pathologist WOTC (72.4{\%} [95{\%} CI, 64.3{\%}-80.4{\%}]) at a mean of 0.0125 false-positives per normal whole-slide image. For the whole-slide image classification task, the best algorithm (AUC, 0.994 [95{\%} CI, 0.983-0.999]) performed significantly better than the pathologists WTC in a diagnostic simulation (mean AUC, 0.810 [range, 0.738-0.884]; P <.001). The top 5 algorithms had a mean AUC that was comparable with the pathologist interpreting the slides in the absence of time constraints (mean AUC, 0.960 [range, 0.923-0.994] for the top 5 algorithms vs 0.966 [95{\%} CI, 0.927-0.998] for the pathologist WOTC). CONCLUSIONS AND RELEVANCE: In the setting of a challenge competition, some deep learning algorithms achieved better diagnostic performance than a panel of 11 pathologists participating in a simulation exercise designed to mimic routine pathology workflow; algorithm performance was comparable with an expert pathologist interpreting whole-slide images without time constraints. Whether this approach has clinical utility will require evaluation in a clinical setting.",
author = "Bejnordi, {Babak Ehteshami} and Mitko Veta and {Van Diest}, {Paul Johannes} and {Van Ginneken}, Bram and Nico Karssemeijer and Geert Litjens and {Van Der Laak}, {Jeroen A.W.M.} and Meyke Hermsen and Manson, {Quirine F.} and Maschenka Balkenhol and Oscar Geessink and Nikolaos Stathonikos and {Van Dijk}, {Marcory C.R.F.} and Peter Bult and Francisco Beca and Beck, {Andrew H.} and Dayong Wang and Aditya Khosla and Rishab Gargeya and Humayun Irshad and Aoxiao Zhong and Qi Dou and Quanzheng Li and Hao Chen and Lin, {Huang Jing} and Heng, {Pheng Ann} and Christian Ha{\ss} and Elia Bruni and Quincy Wong and Ugur Halici and {\"O}ner, {Mustafa {\"U}mit} and Rengul Cetin-Atalay and Matt Berseth and Vitali Khvatkov and Alexei Vylegzhanin and Oren Kraus and Muhammad Shaban and Nasir Rajpoot and Ruqayya Awan and Korsuk Sirinukunwattana and Talha Qaiser and Tsang, {Yee Wah} and David Tellez and Jonas Annuscheit and Peter Hufnagl and Mira Valkonen and Kimmo Kartasalo and Leena Latonen and Pekka Ruusuvuori and Kaisa Liimatainen and {CAMELYON16 Consortium}",
note = "INT=tut-bmt,{"}Valkonen, Mira{"} EXT={"}Liimatainen, Kaisa{"}",
year = "2017",
month = "12",
day = "12",
doi = "10.1001/jama.2017.14585",
language = "English",
volume = "318",
pages = "2199--2210",
journal = "JAMA : JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION",
issn = "0098-7484",
publisher = "American Medical Association",
number = "22",

}
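
The abstract reports slide-level performance as the area under the receiver operating characteristic curve (AUC), and the 11 pathologists rated each slide on a 5-point confidence scale (definitely normal to definitely tumor). As an illustration only, the following is a minimal Python sketch of the empirical (Mann-Whitney) AUC for such ordinal ratings. The mapping of the five categories to scores 1-5, the function name `auc_mann_whitney`, and the toy data are assumptions for illustration; this is not the study's analysis code or data.

```python
from typing import Sequence

# Hypothetical mapping of the study's 5-point confidence scale to ordinal scores.
RATING_SCALE = {
    "definitely normal": 1,
    "probably normal": 2,
    "equivocal": 3,
    "probably tumor": 4,
    "definitely tumor": 5,
}


def auc_mann_whitney(scores: Sequence[float], labels: Sequence[int]) -> float:
    """Empirical ROC AUC: the probability that a randomly chosen positive
    slide (label 1) scores higher than a randomly chosen negative slide
    (label 0), with ties counted as one half."""
    pos = [s for s, y in zip(scores, labels) if y == 1]
    neg = [s for s, y in zip(scores, labels) if y == 0]
    if not pos or not neg:
        raise ValueError("need at least one positive and one negative slide")
    wins = 0.0
    for p in pos:
        for n in neg:
            if p > n:
                wins += 1.0
            elif p == n:
                wins += 0.5  # tied ratings are common on a 5-point scale
    return wins / (len(pos) * len(neg))


if __name__ == "__main__":
    # Toy data (invented): three slides with metastases, three without.
    ratings = ["definitely tumor", "probably tumor", "equivocal",
               "probably normal", "definitely normal", "equivocal"]
    labels = [1, 1, 1, 0, 0, 0]
    scores = [RATING_SCALE[r] for r in ratings]
    print(f"AUC = {auc_mann_whitney(scores, labels):.3f}")  # prints AUC = 0.944
```

The pairwise loop costs O(P x N) comparisons, which is negligible for a 129-slide test set; a rank-based Mann-Whitney formula with midranks gives the same value for larger sets. Confidence intervals and significance tests such as those reported in the abstract are not reproduced here.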

RIS (suitable for import to EndNote)

TY - JOUR

T1 - Diagnostic assessment of deep learning algorithms for detection of lymph node metastases in women with breast cancer

AU - Bejnordi, Babak Ehteshami

AU - Veta, Mitko

AU - Van Diest, Paul Johannes

AU - Van Ginneken, Bram

AU - Karssemeijer, Nico

AU - Litjens, Geert

AU - Van Der Laak, Jeroen A.W.M.

AU - Hermsen, Meyke

AU - Manson, Quirine F.

AU - Balkenhol, Maschenka

AU - Geessink, Oscar

AU - Stathonikos, Nikolaos

AU - Van Dijk, Marcory C.R.F.

AU - Bult, Peter

AU - Beca, Francisco

AU - Beck, Andrew H.

AU - Wang, Dayong

AU - Khosla, Aditya

AU - Gargeya, Rishab

AU - Irshad, Humayun

AU - Zhong, Aoxiao

AU - Dou, Qi

AU - Li, Quanzheng

AU - Chen, Hao

AU - Lin, Huang Jing

AU - Heng, Pheng Ann

AU - Haß, Christian

AU - Bruni, Elia

AU - Wong, Quincy

AU - Halici, Ugur

AU - Öner, Mustafa Ümit

AU - Cetin-Atalay, Rengul

AU - Berseth, Matt

AU - Khvatkov, Vitali

AU - Vylegzhanin, Alexei

AU - Kraus, Oren

AU - Shaban, Muhammad

AU - Rajpoot, Nasir

AU - Awan, Ruqayya

AU - Sirinukunwattana, Korsuk

AU - Qaiser, Talha

AU - Tsang, Yee Wah

AU - Tellez, David

AU - Annuscheit, Jonas

AU - Hufnagl, Peter

AU - Valkonen, Mira

AU - Kartasalo, Kimmo

AU - Latonen, Leena

AU - Ruusuvuori, Pekka

AU - Liimatainen, Kaisa

AU - CAMELYON16 Consortium

N1 - INT=tut-bmt,"Valkonen, Mira" EXT="Liimatainen, Kaisa"

PY - 2017/12/12

Y1 - 2017/12/12

N2 - IMPORTANCE: Application of deep learning algorithms to whole-slide pathology images can potentially improve diagnostic accuracy and efficiency. OBJECTIVE: Assess the performance of automated deep learning algorithms at detecting metastases in hematoxylin and eosin-stained tissue sections of lymph nodes of women with breast cancer and compare it with pathologists' diagnoses in a diagnostic setting. DESIGN, SETTING, AND PARTICIPANTS: Researcher challenge competition (CAMELYON16) to develop automated solutions for detecting lymph node metastases (November 2015-November 2016). A training data set of whole-slide images from 2 centers in the Netherlands with (n = 110) and without (n = 160) nodal metastases verified by immunohistochemical staining were provided to challenge participants to build algorithms. Algorithm performance was evaluated in an independent test set of 129 whole-slide images (49 with and 80 without metastases). The same test set of corresponding glass slides was also evaluated by a panel of 11 pathologists with time constraint (WTC) from the Netherlands to ascertain likelihood of nodal metastases for each slide in a flexible 2-hour session, simulating routine pathology workflow, and by 1 pathologist without time constraint (WOTC). EXPOSURES: Deep learning algorithms submitted as part of a challenge competition or pathologist interpretation. MAIN OUTCOMES AND MEASURES: The presence of specific metastatic foci and the absence vs presence of lymph node metastasis in a slide or image using receiver operating characteristic curve analysis. The 11 pathologists participating in the simulation exercise rated their diagnostic confidence as definitely normal, probably normal, equivocal, probably tumor, or definitely tumor. RESULTS: The area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.556 to 0.994. The top-performing algorithm achieved a lesion-level, true-positive fraction comparable with that of the pathologist WOTC (72.4% [95% CI, 64.3%-80.4%]) at a mean of 0.0125 false-positives per normal whole-slide image. For the whole-slide image classification task, the best algorithm (AUC, 0.994 [95% CI, 0.983-0.999]) performed significantly better than the pathologists WTC in a diagnostic simulation (mean AUC, 0.810 [range, 0.738-0.884]; P <.001). The top 5 algorithms had a mean AUC that was comparable with the pathologist interpreting the slides in the absence of time constraints (mean AUC, 0.960 [range, 0.923-0.994] for the top 5 algorithms vs 0.966 [95% CI, 0.927-0.998] for the pathologist WOTC). CONCLUSIONS AND RELEVANCE: In the setting of a challenge competition, some deep learning algorithms achieved better diagnostic performance than a panel of 11 pathologists participating in a simulation exercise designed to mimic routine pathology workflow; algorithm performance was comparable with an expert pathologist interpreting whole-slide images without time constraints. Whether this approach has clinical utility will require evaluation in a clinical setting.

AB - IMPORTANCE: Application of deep learning algorithms to whole-slide pathology images can potentially improve diagnostic accuracy and efficiency. OBJECTIVE: Assess the performance of automated deep learning algorithms at detecting metastases in hematoxylin and eosin-stained tissue sections of lymph nodes of women with breast cancer and compare it with pathologists' diagnoses in a diagnostic setting. DESIGN, SETTING, AND PARTICIPANTS: Researcher challenge competition (CAMELYON16) to develop automated solutions for detecting lymph node metastases (November 2015-November 2016). A training data set of whole-slide images from 2 centers in the Netherlands with (n = 110) and without (n = 160) nodal metastases verified by immunohistochemical staining were provided to challenge participants to build algorithms. Algorithm performance was evaluated in an independent test set of 129 whole-slide images (49 with and 80 without metastases). The same test set of corresponding glass slides was also evaluated by a panel of 11 pathologists with time constraint (WTC) from the Netherlands to ascertain likelihood of nodal metastases for each slide in a flexible 2-hour session, simulating routine pathology workflow, and by 1 pathologist without time constraint (WOTC). EXPOSURES: Deep learning algorithms submitted as part of a challenge competition or pathologist interpretation. MAIN OUTCOMES AND MEASURES: The presence of specific metastatic foci and the absence vs presence of lymph node metastasis in a slide or image using receiver operating characteristic curve analysis. The 11 pathologists participating in the simulation exercise rated their diagnostic confidence as definitely normal, probably normal, equivocal, probably tumor, or definitely tumor. RESULTS: The area under the receiver operating characteristic curve (AUC) for the algorithms ranged from 0.556 to 0.994. The top-performing algorithm achieved a lesion-level, true-positive fraction comparable with that of the pathologist WOTC (72.4% [95% CI, 64.3%-80.4%]) at a mean of 0.0125 false-positives per normal whole-slide image. For the whole-slide image classification task, the best algorithm (AUC, 0.994 [95% CI, 0.983-0.999]) performed significantly better than the pathologists WTC in a diagnostic simulation (mean AUC, 0.810 [range, 0.738-0.884]; P <.001). The top 5 algorithms had a mean AUC that was comparable with the pathologist interpreting the slides in the absence of time constraints (mean AUC, 0.960 [range, 0.923-0.994] for the top 5 algorithms vs 0.966 [95% CI, 0.927-0.998] for the pathologist WOTC). CONCLUSIONS AND RELEVANCE: In the setting of a challenge competition, some deep learning algorithms achieved better diagnostic performance than a panel of 11 pathologists participating in a simulation exercise designed to mimic routine pathology workflow; algorithm performance was comparable with an expert pathologist interpreting whole-slide images without time constraints. Whether this approach has clinical utility will require evaluation in a clinical setting.

U2 - 10.1001/jama.2017.14585

DO - 10.1001/jama.2017.14585

M3 - Article

VL - 318

SP - 2199

EP - 2210

JO - JAMA : JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION

JF - JAMA : JOURNAL OF THE AMERICAN MEDICAL ASSOCIATION

SN - 0098-7484

IS - 22

ER -
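
The abstract also reports a lesion-level true-positive fraction at a mean of 0.0125 false positives per normal whole-slide image. Since the test set contains 80 slides without metastases, that rate corresponds to a single false positive in total (0.0125 x 80 = 1). The following minimal Python sketch shows one way such an operating point could be read off a list of scored detections. The `Detection` record, the function `tpf_at_fp_rate`, the assumption that detections have already been matched to annotated lesions, and the choice to count false positives only on normal slides are all hypothetical simplifications; this is not the CAMELYON16 evaluation code.

```python
from dataclasses import dataclass
from typing import Optional, Sequence


@dataclass(frozen=True)
class Detection:
    score: float              # algorithm confidence for this detection
    lesion_id: Optional[str]  # annotated lesion the detection hits, or None
    on_normal_slide: bool     # True if the slide contains no metastases


def tpf_at_fp_rate(detections: Sequence[Detection], n_lesions: int,
                   n_normal_slides: int, max_mean_fp: float) -> float:
    """Highest lesion-level true-positive fraction reachable at a score
    threshold whose mean number of false positives per normal slide does
    not exceed max_mean_fp."""
    best = 0.0
    # Sweep candidate thresholds from the most to the least confident detection.
    for t in sorted({d.score for d in detections}, reverse=True):
        kept = [d for d in detections if d.score >= t]
        # Several detections on one lesion count as a single true positive.
        hit = {d.lesion_id for d in kept if d.lesion_id is not None}
        # Assumption: only misses on normal slides are counted as false positives.
        fps = sum(1 for d in kept if d.lesion_id is None and d.on_normal_slide)
        if fps / n_normal_slides <= max_mean_fp:
            best = max(best, len(hit) / n_lesions)
    return best


if __name__ == "__main__":
    # Toy data (invented): 4 lesions, 4 normal slides.
    dets = [
        Detection(0.95, "L1", False),
        Detection(0.90, "L2", False),
        Detection(0.80, None, True),
        Detection(0.70, "L3", False),
        Detection(0.60, None, True),
        Detection(0.50, "L4", False),
    ]
    print(tpf_at_fp_rate(dets, n_lesions=4, n_normal_slides=4, max_mean_fp=0.25))  # prints 0.75
```

In this sketch, detections on slides with metastases that miss every annotated lesion are neither true nor false positives; how the official challenge scoring treated them is not stated in this record.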