Image-to-Video Retrieval by ResNet50
DOI:
https://doi.org/10.15379/ijmst.v10i3.3411
Keywords:
Confusion Matrix, Convolutional Neural Networks, Image-to-Video Retrieval, Mean Average Precision (mAP), ResNet50
Abstract
The objective of this research is to create a computer system that can retrieve a video clip from a single query image. The developed system is called the “Image-to-Video Retrieval System (I2VRS)”. The system uses the convolutional neural network (CNN) “ResNet50”, as provided in MATLAB, to retrieve clips from the video dataset. ResNet50 is one of the more powerful CNNs for image recognition. A dedicated dataset, the I2VRS dataset, was created for the system; it consists of 101 video clips, each containing 1,000 video frames. Each clip is about 60 s long and stored in the .MP4 file format. The system is also tested on 100 images that were not part of the training data, taken directly with a mobile phone in the .JPEG file format. The mean average precision (mAP) of the system is 0.9825, with a training time of 5,668.7 s. The average access time to retrieve a video clip is 1.5726 s per image.
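The abstract reports an mAP figure but does not say how it was computed. As an illustration only, the following minimal Python sketch shows the standard definition of mean average precision for a retrieval task. The function names and the clip IDs are hypothetical and are not taken from the paper, and the authors' MATLAB evaluation may differ.

```python
from typing import Hashable, Sequence, Set, Tuple


def average_precision(ranked_ids: Sequence[Hashable],
                      relevant_ids: Set[Hashable]) -> float:
    """Average precision for one query.

    ranked_ids: clip IDs returned by the system, best match first.
    relevant_ids: clip IDs that are correct for this query.
    """
    if not relevant_ids:
        return 0.0
    hits = 0
    precision_sum = 0.0
    for rank, clip_id in enumerate(ranked_ids, start=1):
        if clip_id in relevant_ids:
            hits += 1
            # Precision measured at the rank of each relevant item.
            precision_sum += hits / rank
    # Relevant clips that never appear in the ranking contribute zero.
    return precision_sum / len(relevant_ids)


def mean_average_precision(
        queries: Sequence[Tuple[Sequence[Hashable], Set[Hashable]]]) -> float:
    """Mean of the per-query average precision values."""
    if not queries:
        return 0.0
    return sum(average_precision(r, rel) for r, rel in queries) / len(queries)


# Hypothetical example: two query images, each with one correct clip.
example_queries = [
    (["clip_07", "clip_12", "clip_03"], {"clip_07"}),  # correct clip at rank 1
    (["clip_12", "clip_07", "clip_03"], {"clip_07"}),  # correct clip at rank 2
]
example_map = mean_average_precision(example_queries)
```

When each query image has exactly one correct clip, as in this example, the average precision of a query reduces to the reciprocal of the rank at which that clip is returned.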