OpenCV's SVD is slow

Tweet


SVD (singular value decomposition) of OpenCV 2.3 and later

LAPACK was used for OpenCV 2.2 and former. OpenCV 2.3 and later use original implementation.

Computation speed of OpenCV 2.3 and later is slower than that of OpenCV and former.
Significant slow down of SVD in OpenCV 2.3 #4313
solve CV_SVD very slow in the latest opencv #7563
SVD performance issue #7917

OpenCV 3.1's SVD is implemented in JacobiSVDImple_ in lapack.cpp. I don't know the reason why this happens. I first thought that the process when the singular value is zero makes the computation speed slow. So, I checked the computation speed of the matrix filled with 1 and that of the matrix filled with random values.


Source code of random matrix

const int ROW = 2000;
const int COL = 2000;
RNG gen(getTickCount());
Mat mat(ROW, COL, CV_64F);
gen.fill(mat, RNG::UNIFORM, 0.0, 1.0);
int64 start = getTickCount();
cout << "start = " << start << endl;
SVD svd(mat);
int64 end = getTickCount();
cout << "end = " << end << endl;
int calctime = (int)((end - start) * 1000 / getTickFrequency());
cout << "duration = " << calctime << " [msec]" << endl;
cout << "W = " << endl << svd.w.rowRange(Range(0, 9)) << endl;


Source code of 1 matrix

const int ROW = 2000;
const int COL = 2000;
Mat mat = Mat::ones(ROW, COL, CV_64F);
int64 start = getTickCount();
cout << "start = " << start << endl;
SVD svd(mat);
int64 end = getTickCount();
cout << "end = " << end << endl;
int calctime = (int)((end - start) * 1000 / getTickFrequency());
cout << "duration = " << calctime << " [msec]" << endl << endl;
cout << "W = " << endl << svd.w.rowRange(Range(0, 9)) << endl;


Benchmark

Matrix size 2000-times-2000

Random matrix

Language Library Computation time [millisecond]
MATLAB   3737
Python numpy 3878
C/C++ OpenCV 2.2 27523
Python OpenCV 147870
C/C++ OpenCV 3.1 187424
C/C++ NRC 437539
C/C++ Eigen BDC 921375
C/C++ Eigen Jacobi 940722

Ones matrix

Language Library Computation time [millisecond]
C/C++ Eigen Jacobi 170
C/C++ OpenCV 2.2 2501
MATLAB   18784
Python numpy 21017
C/C++ NRC 542057
Python OpenCV 9003935
C/C++ OpenCV 3.1 9004074
C/C++ Eigen BDC error termination

Summary

Language Library Matrix Computation time [millisecond]
MATLAB random 3737
ones 18784
Python numpy random 3878
ones 21017
OpenCV random 147870
ones 9003935
C/C++ OpenCV 3.1 random 187424
ones 9004074
OpenCV 2.2 random 27523
ones 2501
Eigen Jacobi random 940722
ones 170
Eigen BDC random 921375
ones error termination
NRC random 437539
ones 542057

Sample source code of other library

MATLAB and numpy are optimized for CPU to get high performance.

Conclusion: OpenCV 3.1 is 3600 times slower than OpenCV 2.2
OpenCV 3.1's SVD is 3600 times slower than OpenCV 2.2's SVD #10084


Denormal number

If there is denormal number (subnormal number) in floating point variables, the computation speed will be extremely bad. You know, the computation speed becomes bad when there is NaN (Not a Number) such as infinity (INF) caused by division-by-zero, or NaN caused by sqrt or log with minus values.
https://en.wikipedia.org/wiki/Denormal_number
https://stackoverflow.com/questions/9314534/why-does-changing-0-1f-to-0-slow-down-performance-by-10x

Below is the result when the denormal number is forced to be zero. Two sentences are added as follows.
#include <xmmintrin.h>
_mm_setcsr(_mm_getcsr() | 0x8040);

Matrix size 2000-times-2000

Random matrix

Language Library Computation time [millisecond]
C/C++ Eigen BDC 12652
C/C++ OpenCV 2.2 48290
C/C++ OpenCV 3.1 166341
C/C++ NRC 375849
C/C++ Eigen Jacobi 920501

Ones matrix

Language Library Computation time [millisecond]
C/C++ Eigen Jacobi 195
C/C++ OpenCV 2.2 2230
C/C++ NRC 6026
C/C++ OpenCV 3.1 138215
C/C++ Eigen BDC error termination

Summary

Language Library Matrix Denormal Computation time [millisecond]
C/C++ OpenCV 3.1 random default 187424
FTZ DAZ 166341
ones default 9004074
FTZ DAZ 138215
OpenCV 2.2 random default 27523
FTZ DAZ 48290
ones default 2501
FTZ DAZ 2230
Eigen Jacobi random default 940722
FTZ DAZ 920501
ones default 170
FTZ DAZ 195
Eigen BDC random default 921375
FTZ DAZ 12652
ones default error termination
FTZ DAZ error termination
NRC random default 437539
FTZ DAZ 375849
ones default 542057
FTZ DAZ 6026

Conclusion: OpenCV 3.1 is 70 times slower than OpenCV 2.2


Reliability

First, second, and third singular values of 2000-by-2000 matrix. Note that random matrix is different from each other, so the slight difference of values does not matter.

Matrix Language Library Denormal 1st SV 2nd SV 3rd SV
random MATLAB 1000.3 25.7 25.7
Python numpy 1000.4 25.8 25.6
OpenCV 1000.2 25.8 25.7
C/C++ OpenCV 3.1 default 999.9 25.7 25.5
FTZ DAZ 1000.1 25.8 25.7
OpenCV 2.2 default 999.7 25.7 25.5
FTZ DAZ 1000.1 25.7 25.6
Eigen Jacobi default 51.3 51.0 51.0
FTZ DAZ 51.3 51.2 51.0
Eigen BDC default 51.5 51.3 51.2
FTZ DAZ 51.6 51.3 51.0
NRC default 1000.4 25.7 25.6
FTZ DAZ 1000.2 25.8 25.7
ones MATLAB 2000.0 0.0 0.0
Python numpy 2000.0 0.0 0.0
OpenCV 2000.0 0.0 0.0
C/C++ OpenCV 3.1 default 2000.0 0.0 0.0
FTZ DAZ 2000.0 0.0 0.0
OpenCV 2.2 default 2000.0 0.0 0.0
FTZ DAZ 2000.0 0.0 0.0
Eigen Jacobi default 2000.0 0.0 0.0
FTZ DAZ 2000.0 0.0 0.0
Eigen BDC default error termination
FTZ DAZ error termination
NRC default 2000.0 0.0 0.0
FTZ DAZ 2000.0 0.0 0.0

Conclusion: Are you okay, Eigen?


OpenMP

Some of the libraries do parallel computing. Visual Studio of above results are of default settings, and the belows enabled OpenMP.

[C/C++]-[Language]-[Open MP Support] set [Yes]
[C/C++]-[Code Generation]-[Enable Enhanced Instruction Set] set [Advanced Vector Extensions 2]

Library Matrix Denormal OpenMP [millisecond]
OpenCV 3.1 random default default 187424
OpenMP 156288
FTZ DAZ default 166341
OpenMP 167507
ones FTZ DAZ default 138215
OpenMP 127292
OpenCV 2.2 random default default 27523
OpenMP 27171
FTZ DAZ default 48290
OpenMP 28803
ones default default 2501
OpenMP 2133
FTZ DAZ default 2230
OpenMP 2078
Eigen Jacobi random default default 940722
OpenMP 19606
FTZ DAZ default 920501
OpenMP 993473
ones default default 170
OpenMP 186
FTZ DAZ default 195
OpenMP 174
Eigen BDC random default default 921375
OpenMP 999517
FTZ DAZ default 12652
OpenMP 8018
NRC random default default 437539
OpenMP 408642
FTZ DAZ default 375849
OpenMP 408991
ones default default 542057
OpenMP 508781
FTZ DAZ default 6026
OpenMP 5687

Conclusion: SVD is not suitable for parallel computing, thus the computation speed does not differ so much when either using parallel computing or not, even if the software library partially uses parallel computing.


Qiita

This webpage is the entry of 2017/Dec/6th, Japanese Qiita OpenCV advent calender 2017.
https://qiita.com/advent-calendar/2017/opencv


Aftermath

Upcoming version of OpenCV may include the LAPACK implementation of SVD.
https://github.com/opencv/opencv/wiki/OE-12.-Lapack


Back