Audio-Visual CNN using Transfer Learning for TV Commercial Break Detection
Muhammad Zha'farudin Pudya Wardana(1*), Moh. Edi Wibowo(2)
(1) Master Program in Computer Science, FMIPA UGM, Yogyakarta
(2) Department of Computer Science and Electronics, FMIPA UGM, Yogyakarta
(*) Corresponding Author
Abstract
TV commercial detection is challenging because of the variety of programs and TV channels. Deep learning methods have shown good results on this problem, but they need long training with many epochs to reach high accuracy.
This research uses transfer learning to reduce training time, limiting the number of training epochs to 20. From the video data, the audio feature is extracted as a Mel-spectrogram representation, and the visual features are taken from a video frame. The datasets were gathered by recording programs from various TV channels in Indonesia. Pre-trained CNN models (MobileNetV2, InceptionV3, and DenseNet169) are re-trained and used to detect commercials at the shot level. Post-processing then clusters the shots into commercial and non-commercial segments.
The best result is achieved by the Audio-Visual CNN with transfer learning, which reaches an accuracy of 93.26% in only 20 training epochs. It is both faster to train and more accurate than the CNN model without transfer learning, which reaches an accuracy of 88.17% after 77 training epochs. Adding post-processing raises the accuracy of the Audio-Visual CNN with transfer learning to 96.42%.
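The abstract does not say how the post-processing groups shots into segments. As a rough illustration only, the sketch below assumes one common approach: a majority vote over neighboring shot labels, followed by merging consecutive shots with the same label. The function names, the window size, and the example predictions are hypothetical and are not taken from the paper.

```python
from itertools import groupby


def smooth_labels(labels, window=3):
    """Majority-vote each shot label over a centered window of neighbors.

    Assumption: labels are binary (1 = commercial, 0 = non-commercial).
    """
    half = window // 2
    smoothed = []
    for i in range(len(labels)):
        neighborhood = labels[max(0, i - half): i + half + 1]
        # A strict majority of 1s marks the shot as commercial; ties go to 0.
        smoothed.append(1 if 2 * sum(neighborhood) > len(neighborhood) else 0)
    return smoothed


def shots_to_segments(labels):
    """Merge consecutive shots with the same label into (label, first, last) runs."""
    segments = []
    start = 0
    for label, run in groupby(labels):
        length = len(list(run))
        segments.append((label, start, start + length - 1))
        start += length
    return segments


# Hypothetical shot-level predictions from a classifier.
predicted = [0, 0, 1, 0, 0, 1, 1, 0, 1, 1, 0, 0]
smoothed = smooth_labels(predicted, window=3)
segments = shots_to_segments(smoothed)
print(smoothed)
print(segments)
```

In this sketch, an isolated commercial prediction surrounded by non-commercial shots is flipped, which is one plausible way that segment-level post-processing could raise accuracy above the shot-level result.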
DOI: https://doi.org/10.22146/ijccs.76058
Copyright (c) 2023 IJCCS (Indonesian Journal of Computing and Cybernetics Systems)
This work is licensed under a Creative Commons Attribution-ShareAlike 4.0 International License.