Deep Learning Based Evaluation of Skeletal Maturation: A Comparative Analysis of Five Hand-Wrist Methods


TENTAŞ S., ÖZDEN S.

Orthodontics and Craniofacial Research, vol. 28, no. 6, pp. 943-954, 2025 (SCI-Expanded, Scopus)

  • Publication Type: Article / Full Article
  • Volume: 28, Issue: 6
  • Publication Date: 2025
  • DOI: 10.1111/ocr.70008
  • Journal Name: Orthodontics and Craniofacial Research
  • Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, CAB Abstracts, CINAHL, MEDLINE
  • Page Numbers: pp. 943-954
  • Keywords: artificial intelligence, bone age, deep learning, skeletal maturation
  • İnönü University Affiliated: Yes

Abstract

Objective: This study evaluates the effectiveness of deep learning algorithms in skeletal age estimation by comparing the diagnostic reliability of five hand-wrist maturation (HWM) assessment methods.

Materials and Methods: A total of 6572 hand-wrist radiographs from orthodontic patients aged 8–16 years were retrospectively analysed. The radiographs were categorised into five groups according to the HWM classification method: (I) Björk's nine-stage method, (II) Hägg and Taranger's five-stage method, (III) Chapman's four-stage method, (IV) a three-stage classification based on ossification of the hook of the hamate and (V) a simplified three-stage classification based on Björk's method. YOLOv8x-based deep learning models were trained separately for each group. The dataset was split into training, validation and test subsets. Performance was evaluated using accuracy, precision, recall, F1 score and AUC.

Results: The YOLOv8x-cls model demonstrated high classification performance across all five groups. Group IV and Group II achieved the highest accuracy and F1 scores, with average F1 values of 0.99 and 0.96, respectively. Group III and Group V also performed strongly (F1 = 0.93 and 0.92). In Group I, classification performance was slightly lower in the S-H2 and MP3-Cap stages (F1 = 0.72–0.74), which correspond to the pubertal growth peak, whereas the early and late skeletal maturation stages were classified with high accuracy and F1 scores. ROC curve analysis supported these findings: the AUC values for MP3-Cap and S-H2 were 0.70 and 0.75, respectively, while most other stages across all groups achieved higher AUC values.

Conclusion: Deep learning models proved effective in evaluating skeletal maturation across five different HWM methods. Performance was particularly high for anatomically distinct regions such as the MP3, the adductor sesamoid and the hamate bone, which can be reliably identified by general dentists, enabling earlier referrals and timely orthodontic interventions.
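For readers unfamiliar with the metrics named in the abstract, the following minimal Python sketch shows how accuracy and per-stage precision, recall and F1 score can be computed from true and predicted stage labels. It is an illustration only, not the authors' evaluation code: the function names, the three stage labels and the toy predictions are hypothetical, and the unweighted (macro) mean is an assumption, since the abstract does not state how its average F1 values were obtained.

```python
from typing import Dict, Sequence


def per_class_metrics(y_true: Sequence[str], y_pred: Sequence[str]) -> Dict[str, Dict[str, float]]:
    """Return one-vs-rest precision, recall and F1 for every class label."""
    report = {}
    for label in sorted(set(y_true) | set(y_pred)):
        pairs = list(zip(y_true, y_pred))
        tp = sum(1 for t, p in pairs if t == label and p == label)  # true positives
        fp = sum(1 for t, p in pairs if t != label and p == label)  # false positives
        fn = sum(1 for t, p in pairs if t == label and p != label)  # false negatives
        precision = tp / (tp + fp) if tp + fp else 0.0
        recall = tp / (tp + fn) if tp + fn else 0.0
        denom = precision + recall
        f1 = 2 * precision * recall / denom if denom else 0.0
        report[label] = {"precision": precision, "recall": recall, "f1": f1}
    return report


def accuracy(y_true: Sequence[str], y_pred: Sequence[str]) -> float:
    """Fraction of samples whose predicted label equals the true label."""
    return sum(t == p for t, p in zip(y_true, y_pred)) / len(y_true)


def macro_f1(report: Dict[str, Dict[str, float]]) -> float:
    """Unweighted mean of the per-class F1 scores (an assumed averaging scheme)."""
    return sum(m["f1"] for m in report.values()) / len(report)


if __name__ == "__main__":
    # Hypothetical toy labels; these are not data from the study.
    y_true = ["early", "early", "peak", "peak", "late", "late"]
    y_pred = ["early", "early", "peak", "late", "late", "late"]
    report = per_class_metrics(y_true, y_pred)
    for stage, metrics in report.items():
        print(stage, {name: round(value, 2) for name, value in metrics.items()})
    print("accuracy:", round(accuracy(y_true, y_pred), 2))
    print("macro F1:", round(macro_f1(report), 2))
```

In this toy example the single "peak" radiograph misread as "late" lowers recall for the peak stage and precision for the late stage, which is the same kind of per-stage effect the abstract reports for the S-H2 and MP3-Cap stages.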