Video Analysis with Tensor Decomposition in Python
Artificial intelligence, deep learning, convolutional neural networks, reinforcement learning: these revolutionary advances in machine learning have made many previously impossible tasks possible. Despite these strengths, they also have drawbacks and limitations. For example, because neural networks require very large training sets, they are prone to overfitting, and these algorithms are usually designed for specific tasks, so their capabilities do not transfer well to other tasks.
One of the main challenges of computer vision is the amount of data involved: an image is usually represented as a matrix with millions of elements, and a video contains thousands of such images. Moreover, such data is often noisy. Unsupervised learning methods that reduce the dimensionality of the data are therefore an essential tool for improving many algorithms.
With this in mind, tensor decomposition is very useful for high-dimensional data. Implementing tensor decomposition in Python to analyze video extracts important information from the data and can serve as a preprocessing step for other methods.
High-dimensional data
High-dimensional data analysis involves a set of problems, one of which is that the number of features can be larger than the number of observations. In many applications (such as regression), this causes speed and model-learning problems, such as overfitting or even the inability to fit a model at all. This is common in computer vision, materials science, and even business, since an enormous amount of data is now captured on the internet.
One solution is to find a low-dimensional representation of the data and use it as the training observations in the model, since dimensionality reduction alleviates the problems above. The lower-dimensional space usually retains most of the information in the original data, so the reduced data is a good substitute for it. Splines, regularization, and tensor decomposition are examples of this approach. Here we study the latter method and practice one of its applications.
Mathematical concepts
The core concept of this article is the tensor:
A number is a 0-dimensional tensor
A vector is a 1-dimensional tensor
A matrix is a 2-dimensional tensor
Beyond that, we refer directly to the dimension of the tensor
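These orders map directly onto NumPy's notion of array dimension. The following quick illustration is mine, not from the original article:

# Quick illustration (not from the original article): tensor order == ndim
import numpy as np

print(np.array(3.0).ndim)             # 0: a number is a 0-dimensional tensor
print(np.array([1.0, 2.0]).ndim)      # 1: a vector is a 1-dimensional tensor
print(np.eye(2).ndim)                 # 2: a matrix is a 2-dimensional tensor
print(np.zeros((600, 800, 3)).ndim)   # 3: e.g. an 800x600 RGB image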
This data structure is particularly useful for storing images or videos. In the traditional RGB model [1], a single image can be represented by a 3-dimensional tensor: each color channel (red, green, blue) has its own matrix, and the value of a given pixel in that matrix encodes the intensity of the color channel. Each pixel has (x, y) coordinates in the matrix, and the size of the matrix depends on the resolution of the image.
Going further, a video is simply a series of frames, each of which is an image. This becomes hard to visualize, but a video can be stored in a 4D tensor: three dimensions store a single frame, and the fourth encodes the passage of time.
To be concrete, take a 60-second video at 60 frames per second (fps) with a resolution of 800×600. That video can be stored in an 800×600×3×3600 tensor, which amounts to over five billion elements! That is far too many to build a reliable model, and this is why we need tensor decomposition.
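As a quick sanity check on that count (a minimal sketch, not from the original article), the product of the dimensions can be computed directly:

# Sanity check (not from the original article): element count of the video tensor
import numpy as np

shape = (800, 600, 3, 60 * 60)   # width, height, color channels, frames
print(int(np.prod(shape)))       # 5184000000 -> over five billion elements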
There is a large literature on tensor decomposition, and I recommend Kolda and Bader's overview [2]. In particular, Tucker decomposition has many applications, such as tensor regression with a tensor as the target [3] or predictor [4] variable. The key point is that it allows the extraction of a core tensor, a compressed version of the original data. If this reminds you of PCA, that is no accident: one of the steps of Tucker decomposition is in fact an extension of SVD, namely the higher-order singular value decomposition (HOSVD).
Existing algorithms allow the extraction of the core tensor and the factor matrices (the latter are not used in our application). The hyperparameter is the n-rank. The main idea is that the higher the n-rank, the more accurate the decomposition. The n-rank also determines the size of the core tensor: with a small n-rank, the reconstructed tensor may not match the original exactly, but the dimensionality of the data is much lower; the trade-off depends on the application at hand.
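The following is a minimal sketch of this trade-off on a random tensor (illustrative only, and assuming a recent TensorLy where the keyword is rank; the original code further below uses the older ranks):

# Illustrative sketch (not from the original article): the higher the n-rank,
# the closer the reconstruction, at the cost of a larger core tensor
import numpy as np
import tensorly as tl
from tensorly.decomposition import tucker

X = tl.tensor(np.random.rand(20, 20, 20))
for n_rank in ([2, 2, 2], [10, 10, 10]):
    core, factors = tucker(X, rank=n_rank)          # keyword `ranks` in older TensorLy versions
    X_hat = tl.tucker_to_tensor((core, factors))    # reconstruct from the decomposition
    rel_error = tl.norm(X - X_hat) / tl.norm(X)
    print(n_rank, core.shape, float(rel_error))     # larger n-rank -> smaller error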
Extracting this core tensor is very useful, as the practical example below shows.
Application
As a toy example, I captured three 10-second videos on my phone:
The patio of my favorite café
Parking lot
Cars on the freeway during the afternoon commute
I uploaded them, along with the notebook code, to GitHub [5]. The main goal is to determine whether we can rigorously rank the potential pairs of videos by similarity, under the hypothesis that the parking lot and commute videos are the most similar.
Before the analysis, this data is loaded and processed with the OpenCV Python library, in the following steps:
Create VideoCapture objects and extract the number of frames in each
Since one video is shorter, use its length to truncate the other two for a better comparison
# Import libraries
import cv2
import numpy as np
import random
import tensorly as tl
from tensorly.decomposition import tucker

# Create VideoCapture objects
parking_lot = cv2.VideoCapture('parking_lot.MOV')
patio = cv2.VideoCapture('patio.MOV')
commute = cv2.VideoCapture('commute.MOV')

# Get number of frames in each video
parking_lot_frames = int(parking_lot.get(cv2.CAP_PROP_FRAME_COUNT))
patio_frames = int(patio.get(cv2.CAP_PROP_FRAME_COUNT))
commute_frames = int(commute.get(cv2.CAP_PROP_FRAME_COUNT))
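The snippet above stops before the video tensors themselves are built, yet the subsetting code below indexes parking_lot_tensor, patio_tensor and commute_tensor. The following is a minimal sketch of how those 4D tensors could be read with OpenCV; the helper read_video_tensor is my own name, not from the original notebook:

# Minimal sketch (not from the original notebook): read each video into a
# (frames, height, width, channels) NumPy array, truncated to the shortest video
def read_video_tensor(capture, n_frames):
    frames = []
    for _ in range(n_frames):
        ret, frame = capture.read()   # frame is a (height, width, 3) BGR array
        if not ret:
            break
        frames.append(frame)
    return np.stack(frames)

min_frames = min(parking_lot_frames, patio_frames, commute_frames)
parking_lot_tensor = read_video_tensor(parking_lot, min_frames)
patio_tensor = read_video_tensor(patio, min_frames)
commute_tensor = read_video_tensor(commute, min_frames)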
Randomly sample 50 frames from these tensors to speed up the subsequent operations:
# Set the seed for reproducibility
random.seed(42)
random_frames = random.sample(range(0, commute_frames), 50)

# Use these random frames to subset the tensors
subset_parking_lot = parking_lot_tensor[random_frames, :, :, :]
subset_patio = patio_tensor[random_frames, :, :, :]
subset_commute = commute_tensor[random_frames, :, :, :]

# Convert the three tensors to double precision
subset_parking_lot = subset_parking_lot.astype('d')
subset_patio = subset_patio.astype('d')
subset_commute = subset_commute.astype('d')
After completing these steps, we have three 50×1080×1920×3 tensors.
Results
To determine how similar these videos are, we can rank them. The L2 norm of the difference between two tensors is a common measure of similarity: the smaller the value, the more similar the tensors. Mathematically, the norm of a tensor $\mathcal{X}$ is

$$\|\mathcal{X}\| = \sqrt{\sum_{i_1=1}^{I_1} \sum_{i_2=1}^{I_2} \cdots \sum_{i_N=1}^{I_N} x_{i_1 i_2 \cdots i_N}^2}$$

so the norm of a difference is analogous to a Euclidean distance.
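As a small check of that definition (illustrative, not from the original code), TensorLy's default norm is exactly this square root of the sum of squared entries:

# Illustrative check (not from the original code): tl.norm defaults to the L2 norm
import numpy as np
import tensorly as tl

A = np.random.rand(5, 4, 3)
B = np.random.rand(5, 4, 3)
print(float(tl.norm(A - B)))                   # TensorLy's norm of the difference
print(float(np.sqrt(((A - B) ** 2).sum())))    # same value, computed by hand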
Performing this operation on the full tensors yields unsatisfactory results.
# Parking lot and patio
parking_patio_naive_diff = tl.norm(subset_parking_lot - subset_patio)

# Parking lot and commute
parking_commute_naive_diff = tl.norm(subset_parking_lot - subset_commute)

# Patio and commute
patio_commute_naive_diff = tl.norm(subset_patio - subset_commute)
Looking at the resulting similarities:
Not only is there no clear ranking among the videos, but the parking lot and patio videos appear to be the most similar, in sharp contrast to the original hypothesis.
Let's see whether Tucker decomposition can improve the results.
The TensorLy library makes it relatively easy to decompose tensors, although it is somewhat slow: all we need is the tensor and its n-rank. Although the AIC criterion is a common way to find an optimal value for this parameter, the optimum is unnecessary in this particular case, because the purpose is comparison: all three decompositions need the same n-rank so that their core tensors are comparable. We therefore choose n-rank = [2, 2, 2, 2], a good trade-off between accuracy and speed. Incidentally, n-rank = [5, 5, 5, 5] already exceeded the capabilities of LAPACK (the low-level linear algebra package used under the hood), which shows how computationally expensive these methods are.
After extracting the core tensors, the same comparison can be made.
# Get core tensor for the parking lot video
core_parking_lot, factors_parking_lot = tucker(subset_parking_lot, ranks=[2, 2, 2, 2])

# Get core tensor for the patio video
core_patio, factors_patio = tucker(subset_patio, ranks=[2, 2, 2, 2])

# Get core tensor for the commute video
core_commute, factors_commute = tucker(subset_commute, ranks=[2, 2, 2, 2])

# Compare core parking lot and patio
parking_patio_diff = tl.norm(core_parking_lot - core_patio)
int(parking_patio_diff)

# Compare core parking lot and commute
parking_commute_diff = tl.norm(core_parking_lot - core_commute)
int(parking_commute_diff)

# Compare core patio and commute
patio_commute_diff = tl.norm(core_patio - core_commute)
int(patio_commute_diff)
Looking at the resulting similarities:
These results make sense: although the patio video is different from both the parking lot and commute videos, the latter two are closer to each other by almost an order of magnitude.
Conclusion
In this article, I showed how an unsupervised learning method can provide insight into data. Only after reducing the dimensionality with Tucker decomposition to extract the core tensors from the videos does the comparison become meaningful. We confirmed that the parking lot video is the most similar to the commute video.
As video becomes an increasingly popular data source, this technique has many potential applications. The first that comes to mind (owing to my passion for TV and for how video streaming services use data) is improving existing recommendation systems by checking the similarity between key scenes of trailers or movies/TV shows. The second is materials science, where heated metals could be classified by the similarity between their infrared videos and a benchmark. To make these methods fully scalable, the computational cost must be addressed: on my computer, the Tucker decomposition was very slow, even though there were only three 10-second videos. Parallelization is one potential way to speed up the processing.
Beyond these direct applications, the technique can also be combined with some of the methods mentioned in the introduction. Using the core tensors, instead of entire images or videos, as the training points in a neural network could help with overfitting and speed up training, strengthening that method by addressing its two main problems.
Through this example, we have had a first look at tensor decomposition tools and learned their practical usage through code. Readers who want to download the code to consolidate their understanding can find it at [5]. For a deeper understanding of tensor decomposition, please consult the references; a detailed treatment of the theory will follow in a later article.
References
[1] RGB model: https://en.wikipedia.org/wiki/RGB_color_model
[2] Overview: http://www.kolda.net/publication/TensorReview.pdf
[3] Target: https://arxiv.org/abs/1807.10278
[4] Prediction: https://arxiv.org/pdf/1706.03423.pdf
[5] Code: https://github.com/celestinhermez/video-analysis-tensor-decomposition
[6] Original English article: https://towardsdatascience.com/video-analysis-with-tensor-decomposition-in-python-3a1fe088831c