OpenCV's SVD is slow

SVD (singular value decomposition) of OpenCV 2.3 and later

LAPACK was used for OpenCV 2.2 and former. OpenCV 2.3 and later use original implementation.

Computation speed of OpenCV 2.3 and later is slower than that of OpenCV and former.
Significant slow down of SVD in OpenCV 2.3 #4313
solve CV_SVD very slow in the latest opencv #7563
SVD performance issue #7917

OpenCV 3.1's SVD is implemented in JacobiSVDImple_ in lapack.cpp. I don't know the reason why this happens. I first thought that the process when the singular value is zero makes the computation speed slow. So, I checked the computation speed of the matrix filled with 1 and that of the matrix filled with random values.

Source code of random matrix

const int ROW = 2000;
const int COL = 2000;
RNG gen(getTickCount());
Mat mat(ROW, COL, CV_64F);
gen.fill(mat, RNG::UNIFORM, 0.0, 1.0);
int64 start = getTickCount();
cout << "start = " << start << endl;
SVD svd(mat);
int64 end = getTickCount();
cout << "end = " << end << endl;
int calctime = (int)((end - start) * 1000 / getTickFrequency());
cout << "duration = " << calctime << " [msec]" << endl;
cout << "W = " << endl << svd.w.rowRange(Range(0, 9)) << endl;

Source code of 1 matrix

const int ROW = 2000;
const int COL = 2000;
Mat mat = Mat::ones(ROW, COL, CV_64F);
int64 start = getTickCount();
cout << "start = " << start << endl;
SVD svd(mat);
int64 end = getTickCount();
cout << "end = " << end << endl;
int calctime = (int)((end - start) * 1000 / getTickFrequency());
cout << "duration = " << calctime << " [msec]" << endl << endl;
cout << "W = " << endl << svd.w.rowRange(Range(0, 9)) << endl;

Benchmark

Windows 10 Home
64 bit
Intel Core i7-4710HQ CPU @ 2.50GHz 2.50GHz
16.0 GB
Visual Studio Community 2015
Python 3.6.3
MATLAB R2017b
numpy 1.13.3
Numerical Recipes in C 3rd edition
Eigen 3.3.4
OpenCV 3.3.0 [Python]
OpenCV 3.1 [C/C++] 64bit
OpenCV 2.2 [C/C++] 32bit

Matrix size 2000-times-2000

Random matrix

Language	Library	Computation time [millisecond]
MATLAB		3737
Python	numpy	3878
C/C++	OpenCV 2.2	27523
Python	OpenCV	147870
C/C++	OpenCV 3.1	187424
C/C++	NRC	437539
C/C++	Eigen BDC	921375
C/C++	Eigen Jacobi	940722

Ones matrix

Language	Library	Computation time [millisecond]
C/C++	Eigen Jacobi	170
C/C++	OpenCV 2.2	2501
MATLAB		18784
Python	numpy	21017
C/C++	NRC	542057
Python	OpenCV	9003935
C/C++	OpenCV 3.1	9004074
C/C++	Eigen BDC	error termination

Summary

Language	Library	Matrix	Computation time [millisecond]
MATLAB		random	3737
MATLAB		ones	18784
Python	numpy	random	3878
	numpy	ones	21017
	OpenCV	random	147870
	OpenCV	ones	9003935
C/C++	OpenCV 3.1	random	187424
	OpenCV 3.1	ones	9004074
	OpenCV 2.2	random	27523
	OpenCV 2.2	ones	2501
	Eigen Jacobi	random	940722
	Eigen Jacobi	ones	170
	Eigen BDC	random	921375
	Eigen BDC	ones	error termination
	NRC	random	437539
	NRC	ones	542057

Sample source code of other library

MATLAB and numpy are optimized for CPU to get high performance.

Conclusion: OpenCV 3.1 is 3600 times slower than OpenCV 2.2
OpenCV 3.1's SVD is 3600 times slower than OpenCV 2.2's SVD #10084

Denormal number

If there is denormal number (subnormal number) in floating point variables, the computation speed will be extremely bad. You know, the computation speed becomes bad when there is NaN (Not a Number) such as infinity (INF) caused by division-by-zero, or NaN caused by sqrt or log with minus values.
https://en.wikipedia.org/wiki/Denormal_number
https://stackoverflow.com/questions/9314534/why-does-changing-0-1f-to-0-slow-down-performance-by-10x

Below is the result when the denormal number is forced to be zero. Two sentences are added as follows.
#include <xmmintrin.h>
_mm_setcsr(_mm_getcsr() | 0x8040);

Windows 10 Home
64 bit
Intel Core i7-4710HQ CPU @ 2.50GHz 2.50GHz
16.0 GB
Visual Studio Community 2015
Numerical Recipes in C 3rd edition
Eigen 3.3.4
OpenCV 3.1 [C/C++] 64bit
OpenCV 2.2 [C/C++] 32bit

Matrix size 2000-times-2000

Random matrix

Language	Library	Computation time [millisecond]
C/C++	Eigen BDC	12652
C/C++	OpenCV 2.2	48290
C/C++	OpenCV 3.1	166341
C/C++	NRC	375849
C/C++	Eigen Jacobi	920501

Ones matrix

Language	Library	Computation time [millisecond]
C/C++	Eigen Jacobi	195
C/C++	OpenCV 2.2	2230
C/C++	NRC	6026
C/C++	OpenCV 3.1	138215
C/C++	Eigen BDC	error termination

Summary

Language	Library	Matrix	Denormal	Computation time [millisecond]
C/C++	OpenCV 3.1	random	default	187424
		random	FTZ DAZ	166341
		ones	default	9004074
		ones	FTZ DAZ	138215
	OpenCV 2.2	random	default	27523
		random	FTZ DAZ	48290
		ones	default	2501
		ones	FTZ DAZ	2230
	Eigen Jacobi	random	default	940722
		random	FTZ DAZ	920501
		ones	default	170
		ones	FTZ DAZ	195
	Eigen BDC	random	default	921375
		random	FTZ DAZ	12652
		ones	default	error termination
		ones	FTZ DAZ	error termination
	NRC	random	default	437539
		random	FTZ DAZ	375849
		ones	default	542057
		ones	FTZ DAZ	6026

Conclusion: OpenCV 3.1 is 70 times slower than OpenCV 2.2

Reliability

First, second, and third singular values of 2000-by-2000 matrix. Note that random matrix is different from each other, so the slight difference of values does not matter.

Matrix	Language	Library	Denormal	1st SV	2nd SV	3rd SV
random	MATLAB			1000.3	25.7	25.7
	Python	numpy		1000.4	25.8	25.6
	Python	OpenCV		1000.2	25.8	25.7
	C/C++	OpenCV 3.1	default	999.9	25.7	25.5
		OpenCV 3.1	FTZ DAZ	1000.1	25.8	25.7
		OpenCV 2.2	default	999.7	25.7	25.5
		OpenCV 2.2	FTZ DAZ	1000.1	25.7	25.6
		Eigen Jacobi	default	51.3	51.0	51.0
		Eigen Jacobi	FTZ DAZ	51.3	51.2	51.0
		Eigen BDC	default	51.5	51.3	51.2
		Eigen BDC	FTZ DAZ	51.6	51.3	51.0
		NRC	default	1000.4	25.7	25.6
		NRC	FTZ DAZ	1000.2	25.8	25.7
ones	MATLAB			2000.0	0.0	0.0
	Python	numpy		2000.0	0.0	0.0
	Python	OpenCV		2000.0	0.0	0.0
	C/C++	OpenCV 3.1	default	2000.0	0.0	0.0
		OpenCV 3.1	FTZ DAZ	2000.0	0.0	0.0
		OpenCV 2.2	default	2000.0	0.0	0.0
		OpenCV 2.2	FTZ DAZ	2000.0	0.0	0.0
		Eigen Jacobi	default	2000.0	0.0	0.0
		Eigen Jacobi	FTZ DAZ	2000.0	0.0	0.0
		Eigen BDC	default	error termination
		Eigen BDC	FTZ DAZ	error termination
		NRC	default	2000.0	0.0	0.0
		NRC	FTZ DAZ	2000.0	0.0	0.0

Conclusion: Are you okay, Eigen?

OpenMP

Some of the libraries do parallel computing. Visual Studio of above results are of default settings, and the belows enabled OpenMP.

[C/C++]-[Language]-[Open MP Support] set [Yes]
[C/C++]-[Code Generation]-[Enable Enhanced Instruction Set] set [Advanced Vector Extensions 2]

Library	Matrix	Denormal	OpenMP	[millisecond]
OpenCV 3.1	random	default	default	187424
		default	OpenMP	156288
		FTZ DAZ	default	166341
		FTZ DAZ	OpenMP	167507
	ones	FTZ DAZ	default	138215
	ones	FTZ DAZ	OpenMP	127292
OpenCV 2.2	random	default	default	27523
		default	OpenMP	27171
		FTZ DAZ	default	48290
		FTZ DAZ	OpenMP	28803
	ones	default	default	2501
		default	OpenMP	2133
		FTZ DAZ	default	2230
		FTZ DAZ	OpenMP	2078
Eigen Jacobi	random	default	default	940722
		default	OpenMP	19606
		FTZ DAZ	default	920501
		FTZ DAZ	OpenMP	993473
	ones	default	default	170
		default	OpenMP	186
		FTZ DAZ	default	195
		FTZ DAZ	OpenMP	174
Eigen BDC	random	default	default	921375
		default	OpenMP	999517
		FTZ DAZ	default	12652
		FTZ DAZ	OpenMP	8018
NRC	random	default	default	437539
		default	OpenMP	408642
		FTZ DAZ	default	375849
		FTZ DAZ	OpenMP	408991
	ones	default	default	542057
		default	OpenMP	508781
		FTZ DAZ	default	6026
		FTZ DAZ	OpenMP	5687

Conclusion: SVD is not suitable for parallel computing, thus the computation speed does not differ so much when either using parallel computing or not, even if the software library partially uses parallel computing.

Qiita

This webpage is the entry of 2017/Dec/6th, Japanese Qiita OpenCV advent calender 2017.
https://qiita.com/advent-calendar/2017/opencv

Aftermath

Upcoming version of OpenCV may include the LAPACK implementation of SVD.
https://github.com/opencv/opencv/wiki/OE-12.-Lapack

Back