10.33826/ijmras/v06i06.5

An analysis of Video Categorization, including its Approaches, Results, Performance, Problems, Solutions, and Future Directions

Abstract

Internet accessibility and bandwidth have improved dramatically in recent years. Because connecting to the Internet is now so inexpensive, information in the form of text, audio, and video spreads widely and rapidly. Predicting the appropriate category for this video content is necessary for a variety of applications, and several machine-learning approaches have been developed to categorize video automatically and reduce manual effort. Existing review articles on video classification have a number of drawbacks, including limited analysis, poor organization, failure to identify research gaps or draw conclusions, and inadequate description of the benefits, drawbacks, and future directions of the methods they cover. This review aims to overcome those limitations. Its goal is to provide a comprehensive overview of the current state of video categorization by analyzing and comparing the many approaches now in use and recommending the one that has proven most accurate and efficient. First, we examine how videos are categorized, covering taxonomy, current applications, processing pipelines, and datasets. Second, we survey current machine-learning and deep-learning models, together with their drawbacks, challenges, flaws, open problems, datasets, and performance evaluations. A significant part of this review is a study of video classification systems, including their characteristics, tools, advantages, and disadvantages, undertaken to compare the methods they employ. Finally, we provide a tabular overview of the key aspects. We find that a combined CNN and RNN technique outperforms purely CNN-based approaches in terms of both accuracy and automatic feature extraction.
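To make the concluding comparison concrete, the sketch below shows one common way of combining the two model families discussed here: a CNN extracts per-frame features and an RNN (an LSTM in this case) aggregates them over time into a clip-level prediction. This is a minimal illustrative example in PyTorch; the layer sizes, class names, and toy data are assumptions for demonstration only, not the configuration of any system surveyed in this review.

```python
import torch
import torch.nn as nn

class CNNRNNVideoClassifier(nn.Module):
    """Per-frame CNN features aggregated over time by an LSTM."""
    def __init__(self, num_classes, feat_dim=128, hidden_dim=64):
        super().__init__()
        # Small per-frame CNN encoder (a stand-in for, e.g., a pretrained backbone).
        self.cnn = nn.Sequential(
            nn.Conv2d(3, 16, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.Conv2d(16, 32, kernel_size=3, stride=2, padding=1), nn.ReLU(),
            nn.AdaptiveAvgPool2d(1), nn.Flatten(),
            nn.Linear(32, feat_dim),
        )
        # LSTM turns the sequence of frame features into a clip-level representation.
        self.rnn = nn.LSTM(feat_dim, hidden_dim, batch_first=True)
        self.head = nn.Linear(hidden_dim, num_classes)

    def forward(self, clips):
        # clips: (batch, time, channels, height, width)
        b, t, c, h, w = clips.shape
        # Fold time into the batch so the CNN sees individual frames.
        feats = self.cnn(clips.reshape(b * t, c, h, w)).reshape(b, t, -1)
        _, (h_n, _) = self.rnn(feats)   # h_n: (num_layers, batch, hidden_dim)
        return self.head(h_n[-1])       # logits: (batch, num_classes)

# Toy usage: 2 clips of 8 RGB frames at 64x64 resolution, 5 target classes.
model = CNNRNNVideoClassifier(num_classes=5)
logits = model(torch.randn(2, 8, 3, 64, 64))
print(logits.shape)  # torch.Size([2, 5])
```

The design choice this illustrates is the division of labor behind the review's conclusion: the CNN learns spatial features from each frame automatically, while the RNN models temporal dependencies across frames, which a frame-wise CNN alone cannot do.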

Keywords
  • Video classification
  • Machine learning
  • Deep learning
  • Video

This work is licensed under a Creative Commons Attribution 4.0 International License.

How to Cite

LIN ZHENQUAN. (2023). An analysis of Video Categorization, including its Approaches, Results, Performance, Problems, Solutions, and Future Directions. International Journal of Multidisciplinary Research and Studies, 6(06), 01–19. https://doi.org/10.33826/ijmras/v06i06.5
