Segmentation of unstructured text on book cover images using a convolutional network based on the U-Net architecture

Pavel Nikolaev

Abstract


This paper presents a convolutional neural network for segmenting text on book cover images. The structure of the network is given, including all of its constituent blocks and layers and their parameters, and the operating principle of each part is described in detail. The network is based on the U-Net model, whose encoder-decoder structure makes it possible to generate new images: the encoder part is responsible for image recognition, and the decoder part generates the new image. The proposed network produces binary (black-and-white) masks in which the text is highlighted in one color and all other elements in the other, thereby separating the text from the rest of the image. The network is trained and tested on a dataset of 200 examples that the author collected and labeled. Despite the small amount of data, the U-Net-based network trains well and shows acceptable performance, as confirmed by the test results. The trained network can be used in practice; in particular, it is intended to improve the accuracy of text recognition on book covers.
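
The binary mask described above can be illustrated with a minimal sketch. It assumes the network outputs a per-pixel probability that the pixel belongs to text, and that the mask is obtained by thresholding. The abstract does not state how the mask is produced, so the function name `to_binary_mask`, the threshold of 0.5, and the choice of white text on a black background are all hypothetical.

```python
def to_binary_mask(probabilities, threshold=0.5, text_value=255, background_value=0):
    """Turn a 2D grid of per-pixel text probabilities into a two-color mask.

    All parameters are illustrative assumptions, not values from the paper.
    """
    return [
        [text_value if p >= threshold else background_value for p in row]
        for row in probabilities
    ]


# Hypothetical 3x4 network output: probability that each pixel is text.
probabilities = [
    [0.02, 0.91, 0.88, 0.05],
    [0.10, 0.76, 0.49, 0.03],
    [0.01, 0.04, 0.07, 0.02],
]

mask = to_binary_mask(probabilities)
for row in mask:
    print(row)
```

In this sketch every pixel at or above the threshold takes the text color and every other pixel takes the background color, which is one simple way to separate text from the remaining cover elements.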


Full Text:

PDF (Russian)



ISSN: 2307-8162