LAPACK was used for OpenCV 2.2 and former. OpenCV 2.3 and later use original implementation.
Computation speed of
OpenCV 2.3 and later is slower than that of OpenCV and former.
Significant slow down of
SVD in OpenCV 2.3 #4313
solve CV_SVD very slow in
the latest opencv #7563
SVD performance issue
#7917
OpenCV 3.1's SVD is implemented in JacobiSVDImple_ in lapack.cpp. I don't know the reason why this happens. I first thought that the process when the singular value is zero makes the computation speed slow. So, I checked the computation speed of the matrix filled with 1 and that of the matrix filled with random values.
const int ROW = 2000;
const int COL = 2000;
RNG gen(getTickCount());
Mat mat(ROW, COL, CV_64F);
gen.fill(mat, RNG::UNIFORM, 0.0, 1.0);
int64 start = getTickCount();
cout << "start = " << start << endl;
SVD svd(mat);
int64 end = getTickCount();
cout << "end = " << end << endl;
int calctime = (int)((end - start) * 1000 / getTickFrequency());
cout << "duration = " << calctime << " [msec]" << endl;
cout << "W = " << endl << svd.w.rowRange(Range(0, 9)) << endl;
const int ROW = 2000;
const int COL = 2000;
Mat mat = Mat::ones(ROW, COL, CV_64F);
int64 start = getTickCount();
cout << "start = " << start << endl;
SVD svd(mat);
int64 end = getTickCount();
cout << "end = " << end << endl;
int calctime = (int)((end - start) * 1000 / getTickFrequency());
cout << "duration = " << calctime << " [msec]" << endl << endl;
cout << "W = " << endl << svd.w.rowRange(Range(0, 9)) << endl;
Matrix size 2000-times-2000
Random matrix
Language | Library | Computation time [millisecond] |
MATLAB | 3737 | |
Python | numpy | 3878 |
C/C++ | OpenCV 2.2 | 27523 |
Python | OpenCV | 147870 |
C/C++ | OpenCV 3.1 | 187424 |
C/C++ | NRC | 437539 |
C/C++ | Eigen BDC | 921375 |
C/C++ | Eigen Jacobi | 940722 |
Ones matrix
Language | Library | Computation time [millisecond] |
C/C++ | Eigen Jacobi | 170 |
C/C++ | OpenCV 2.2 | 2501 |
MATLAB | 18784 | |
Python | numpy | 21017 |
C/C++ | NRC | 542057 |
Python | OpenCV | 9003935 |
C/C++ | OpenCV 3.1 | 9004074 |
C/C++ | Eigen BDC | error termination |
Summary
Language | Library | Matrix | Computation time [millisecond] |
MATLAB | random | 3737 | |
ones | 18784 | ||
Python | numpy | random | 3878 |
ones | 21017 | ||
OpenCV | random | 147870 | |
ones | 9003935 | ||
C/C++ | OpenCV 3.1 | random | 187424 |
ones | 9004074 | ||
OpenCV 2.2 | random | 27523 | |
ones | 2501 | ||
Eigen Jacobi | random | 940722 | |
ones | 170 | ||
Eigen BDC | random | 921375 | |
ones | error termination | ||
NRC | random | 437539 | |
ones | 542057 |
Sample source code of other library
MATLAB and numpy are optimized for CPU to get high performance.
Conclusion: OpenCV 3.1 is 3600 times slower than OpenCV 2.2
OpenCV 3.1's SVD is 3600
times slower than OpenCV 2.2's SVD #10084
If there is denormal number (subnormal number) in floating point variables,
the computation speed will be extremely bad. You know, the computation speed
becomes bad when there is NaN (Not a Number) such as infinity (INF) caused by
division-by-zero, or NaN caused by sqrt or log with minus values.
https://en.wikipedia.org/wiki/Denormal_number
https://stackoverflow.com/questions/9314534/why-does-changing-0-1f-to-0-slow-down-performance-by-10x
Below is the result when the denormal number is forced to be zero. Two
sentences are added as follows.
#include <xmmintrin.h>
_mm_setcsr(_mm_getcsr() | 0x8040);
Matrix size 2000-times-2000
Random matrix
Language | Library | Computation time [millisecond] |
C/C++ | Eigen BDC | 12652 |
C/C++ | OpenCV 2.2 | 48290 |
C/C++ | OpenCV 3.1 | 166341 |
C/C++ | NRC | 375849 |
C/C++ | Eigen Jacobi | 920501 |
Ones matrix
Language | Library | Computation time [millisecond] |
C/C++ | Eigen Jacobi | 195 |
C/C++ | OpenCV 2.2 | 2230 |
C/C++ | NRC | 6026 |
C/C++ | OpenCV 3.1 | 138215 |
C/C++ | Eigen BDC | error termination |
Summary
Language | Library | Matrix | Denormal | Computation time [millisecond] |
C/C++ | OpenCV 3.1 | random | default | 187424 |
FTZ DAZ | 166341 | |||
ones | default | 9004074 | ||
FTZ DAZ | 138215 | |||
OpenCV 2.2 | random | default | 27523 | |
FTZ DAZ | 48290 | |||
ones | default | 2501 | ||
FTZ DAZ | 2230 | |||
Eigen Jacobi | random | default | 940722 | |
FTZ DAZ | 920501 | |||
ones | default | 170 | ||
FTZ DAZ | 195 | |||
Eigen BDC | random | default | 921375 | |
FTZ DAZ | 12652 | |||
ones | default | error termination | ||
FTZ DAZ | error termination | |||
NRC | random | default | 437539 | |
FTZ DAZ | 375849 | |||
ones | default | 542057 | ||
FTZ DAZ | 6026 |
Conclusion: OpenCV 3.1 is 70 times slower than OpenCV 2.2
First, second, and third singular values of 2000-by-2000 matrix. Note that random matrix is different from each other, so the slight difference of values does not matter.
Matrix | Language | Library | Denormal | 1st SV | 2nd SV | 3rd SV |
random | MATLAB | 1000.3 | 25.7 | 25.7 | ||
Python | numpy | 1000.4 | 25.8 | 25.6 | ||
OpenCV | 1000.2 | 25.8 | 25.7 | |||
C/C++ | OpenCV 3.1 | default | 999.9 | 25.7 | 25.5 | |
FTZ DAZ | 1000.1 | 25.8 | 25.7 | |||
OpenCV 2.2 | default | 999.7 | 25.7 | 25.5 | ||
FTZ DAZ | 1000.1 | 25.7 | 25.6 | |||
Eigen Jacobi | default | 51.3 | 51.0 | 51.0 | ||
FTZ DAZ | 51.3 | 51.2 | 51.0 | |||
Eigen BDC | default | 51.5 | 51.3 | 51.2 | ||
FTZ DAZ | 51.6 | 51.3 | 51.0 | |||
NRC | default | 1000.4 | 25.7 | 25.6 | ||
FTZ DAZ | 1000.2 | 25.8 | 25.7 | |||
ones | MATLAB | 2000.0 | 0.0 | 0.0 | ||
Python | numpy | 2000.0 | 0.0 | 0.0 | ||
OpenCV | 2000.0 | 0.0 | 0.0 | |||
C/C++ | OpenCV 3.1 | default | 2000.0 | 0.0 | 0.0 | |
FTZ DAZ | 2000.0 | 0.0 | 0.0 | |||
OpenCV 2.2 | default | 2000.0 | 0.0 | 0.0 | ||
FTZ DAZ | 2000.0 | 0.0 | 0.0 | |||
Eigen Jacobi | default | 2000.0 | 0.0 | 0.0 | ||
FTZ DAZ | 2000.0 | 0.0 | 0.0 | |||
Eigen BDC | default | error termination | ||||
FTZ DAZ | error termination | |||||
NRC | default | 2000.0 | 0.0 | 0.0 | ||
FTZ DAZ | 2000.0 | 0.0 | 0.0 |
Conclusion: Are you okay, Eigen?
Some of the libraries do parallel computing. Visual Studio of above results are of default settings, and the belows enabled OpenMP.
[C/C++]-[Language]-[Open MP Support] set [Yes]
[C/C++]-[Code Generation]-[Enable Enhanced Instruction Set] set [Advanced Vector
Extensions 2]
Library | Matrix | Denormal | OpenMP | [millisecond] |
OpenCV 3.1 | random | default | default | 187424 |
OpenMP | 156288 | |||
FTZ DAZ | default | 166341 | ||
OpenMP | 167507 | |||
ones | FTZ DAZ | default | 138215 | |
OpenMP | 127292 | |||
OpenCV 2.2 | random | default | default | 27523 |
OpenMP | 27171 | |||
FTZ DAZ | default | 48290 | ||
OpenMP | 28803 | |||
ones | default | default | 2501 | |
OpenMP | 2133 | |||
FTZ DAZ | default | 2230 | ||
OpenMP | 2078 | |||
Eigen Jacobi | random | default | default | 940722 |
OpenMP | 19606 | |||
FTZ DAZ | default | 920501 | ||
OpenMP | 993473 | |||
ones | default | default | 170 | |
OpenMP | 186 | |||
FTZ DAZ | default | 195 | ||
OpenMP | 174 | |||
Eigen BDC | random | default | default | 921375 |
OpenMP | 999517 | |||
FTZ DAZ | default | 12652 | ||
OpenMP | 8018 | |||
NRC | random | default | default | 437539 |
OpenMP | 408642 | |||
FTZ DAZ | default | 375849 | ||
OpenMP | 408991 | |||
ones | default | default | 542057 | |
OpenMP | 508781 | |||
FTZ DAZ | default | 6026 | ||
OpenMP | 5687 |
Conclusion: SVD is not suitable for parallel computing, thus the computation speed does not differ so much when either using parallel computing or not, even if the software library partially uses parallel computing.
This webpage is the entry of 2017/Dec/6th, Japanese Qiita OpenCV advent
calender 2017.
https://qiita.com/advent-calendar/2017/opencv
Upcoming version of OpenCV may include the LAPACK implementation of SVD.
https://github.com/opencv/opencv/wiki/OE-12.-Lapack