Swin-MFINet: Swin transformer based multi-feature integration network for detection of pixel-level surface defects

Uzen, Huseyin; Turkoglu, Muammer; Yanikoglu, Berrin; Hanbay, Davut

doi:10.1016/j.eswa.2022.118269

Uzen H., Turkoglu M., Yanikoglu B., Hanbay D.

Expert Systems with Applications, vol. 209, 2022 (SCI-Expanded)

Publication Type: Article / Full Article
Volume: 209
Publication Date: 2022
DOI: 10.1016/j.eswa.2022.118269
Journal Name: Expert Systems with Applications
Journal Indexes: Science Citation Index Expanded (SCI-EXPANDED), Scopus, Academic Search Premier, PASCAL, Aerospace Database, Applied Science & Technology Source, Communication Abstracts, Computer & Applied Sciences, INSPEC, Metadex, Public Affairs Index, Civil Engineering Abstracts
Keywords: Pixel-Level Surface Defects Detection, Swin Transformers, Encoder-Decoder Network, Convolutional Neural Network, CONVOLUTIONAL NEURAL-NETWORK, CLASSIFICATION
Affiliated with İnönü University: Yes

Abstract

Automatic surface defect detection is critical for manufacturing industries such as steel, fabric, and marble. This study proposes a Swin transformer-based Multi-Feature Integration Network (Swin-MFINet) for pixel-level surface defect detection. The proposed model consists of an encoder, a Swin transformer-based decoder, and Multi-Feature Integration (MFI) modules. In the encoder, a pre-trained Inception network is used to extract key features from small datasets. In the decoder, global semantic features are obtained from these initial features using the Swin transformer block, a recent transformer architecture. A convolution layer is also used in the last step of the decoder, since transformers are limited in capturing fine spatial details such as edges, colors, and textures, which are important for detecting some small defects. In the final module, MFI, feature maps from different decoder stages are combined, and a channel squeeze-spatial excitation block is applied to emphasize important features. Finally, a prediction map is obtained by applying a convolution layer followed by a sigmoid activation function to the MFI module output. The performance of the proposed model is analyzed on the MT and MVTec datasets, which contain surface defect images. The proposed model obtained mIoU scores of 81.37% and 77.07%, respectively, on these two datasets. These results outperform the state of the art for surface defect detection.
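
The MFI step described in the abstract (combine decoder feature maps, apply channel squeeze-spatial excitation, then a convolution and sigmoid) can be illustrated with a minimal NumPy sketch. This is not the authors' implementation. It assumes that the decoder maps have already been resized to a common resolution, that "combined" means channel-wise concatenation, and that both convolutions are 1x1. The function names (`conv1x1`, `spatial_excitation`, `mfi_head`, `iou`) and the random weights are hypothetical, and the `iou` helper computes a single-mask IoU that may differ from the mIoU averaging used in the paper.

```python
import numpy as np

def sigmoid(x):
    return 1.0 / (1.0 + np.exp(-x))

def conv1x1(x, weight, bias):
    """1x1 convolution on a (C_in, H, W) map; weight has shape (C_out, C_in)."""
    return np.einsum("oc,chw->ohw", weight, x) + bias[:, None, None]

def spatial_excitation(x, weight, bias):
    """Channel squeeze and spatial excitation: squeeze all channels into one
    (1, H, W) gate with a 1x1 convolution and sigmoid, then rescale each pixel."""
    gate = sigmoid(conv1x1(x, weight, bias))
    return x * gate

def mfi_head(stage_maps, sse_w, sse_b, out_w, out_b):
    """Assumed MFI flow: concatenate stage maps, excite, predict per-pixel scores."""
    fused = np.concatenate(stage_maps, axis=0)         # (sum of C_i, H, W)
    excited = spatial_excitation(fused, sse_w, sse_b)  # same shape as fused
    return sigmoid(conv1x1(excited, out_w, out_b))     # (1, H, W), values in (0, 1)

def iou(pred_mask, true_mask):
    """Intersection over union of two boolean masks (1.0 if both are empty)."""
    inter = np.logical_and(pred_mask, true_mask).sum()
    union = np.logical_or(pred_mask, true_mask).sum()
    return inter / union if union else 1.0

# Illustrative usage with random stand-ins for three decoder stage outputs.
rng = np.random.default_rng(0)
maps = [rng.standard_normal((4, 8, 8)) for _ in range(3)]
channels = sum(m.shape[0] for m in maps)
pred = mfi_head(
    maps,
    sse_w=rng.standard_normal((1, channels)), sse_b=np.zeros(1),
    out_w=rng.standard_normal((1, channels)), out_b=np.zeros(1),
)
defect_mask = pred[0] > 0.5  # threshold the sigmoid output into a binary mask
```

In a trained network the weights would be learned and the prediction map would be compared against ground-truth masks with a metric such as `iou`; the 0.5 threshold here is likewise an assumption.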