This page has been moved to tech0105.html
OpenCVの特異値分解だが,バージョン2.2まではLAPACKを使っていた.しかし,OpenCVのバージョン2.3以降は独自の実装になっている.
OpenCV 2.3以降はOpenCV 2.2以前と比べて,計算速度が遅い.
Significant slow down of
SVD in OpenCV 2.3 #4313
solve CV_SVD very slow in
the latest opencv #7563
SVD performance issue
#7917
OpenCV 3.1におけるSVDの実装は,lapack.cppのJacobiSVDImpl_関数である.数値計算の専門家ではない私が見ても原因は分からない.ソースコードを見た当初は,特異値がゼロのときの処理が遅いのではないか,と思った.そこで,ランダムな値の入った行列と,1という数値だけが入った行列で計算時間を確認した.
const int ROW = 2000;
const int COL = 2000;
RNG gen(getTickCount());
Mat mat(ROW, COL, CV_64F);
gen.fill(mat, RNG::UNIFORM, 0.0, 1.0);
int64 start = getTickCount();
cout << "start = " << start << endl;
SVD svd(mat);
int64 end = getTickCount();
cout << "end = " << end << endl;
int calctime = (int)((end - start) * 1000 / getTickFrequency());
cout << "duration = " << calctime << " [msec]" << endl;
cout << "W = " << endl << svd.w.rowRange(Range(0, 9)) << endl;
const int ROW = 2000;
const int COL = 2000;
Mat mat = Mat::ones(ROW, COL, CV_64F);
int64 start = getTickCount();
cout << "start = " << start << endl;
SVD svd(mat);
int64 end = getTickCount();
cout << "end = " << end << endl;
int calctime = (int)((end - start) * 1000 / getTickFrequency());
cout << "duration = " << calctime << " [msec]" << endl << endl;
cout << "W = " << endl << svd.w.rowRange(Range(0, 9)) << endl;
Matrix size 2000-times-2000
Random matrix
Language | Library | Computation time [millisecond] |
MATLAB | 3737 | |
Python | numpy | 3878 |
C/C++ | OpenCV 2.2 | 27523 |
Python | OpenCV | 147870 |
C/C++ | OpenCV 3.1 | 187424 |
C/C++ | NRC | 437539 |
C/C++ | Eigen BDC | 921375 |
C/C++ | Eigen Jacobi | 940722 |
Ones matrix
Language | Library | Computation time [millisecond] |
C/C++ | Eigen Jacobi | 170 |
C/C++ | OpenCV 2.2 | 2501 |
MATLAB | 18784 | |
Python | numpy | 21017 |
C/C++ | NRC | 542057 |
Python | OpenCV | 9003935 |
C/C++ | OpenCV 3.1 | 9004074 |
C/C++ | Eigen BDC | error termination |
まとめ
Language | Library | Matrix | Computation time [millisecond] |
MATLAB | random | 3737 | |
ones | 18784 | ||
Python | numpy | random | 3878 |
ones | 21017 | ||
OpenCV | random | 147870 | |
ones | 9003935 | ||
C/C++ | OpenCV 3.1 | random | 187424 |
ones | 9004074 | ||
OpenCV 2.2 | random | 27523 | |
ones | 2501 | ||
Eigen Jacobi | random | 940722 | |
ones | 170 | ||
Eigen BDC | random | 921375 | |
ones | error termination | ||
NRC | random | 437539 | |
ones | 542057 |
MATLABやnumpyはCPUのパフォーマンスを最大限に発揮できるように内部で最適化されているため,計算速度が速い.
結論:OpenCV 3.1はOpenCV 2.2と比べて3600倍遅い.
OpenCV 3.1's SVD is 3600
times slower than OpenCV 2.2's SVD #10084
浮動小数点の変数に非正規化数(denormal number / subnormal
number)が含まれると,その値との演算が異常に遅くなる.ゼロでの割り算で発生する無限(INF)や,負の値をsqrtやlogに与えた場合などに生じるNaN(Not
a Number)が計算に使用されると異常に遅くなるのと同じ理由である.
https://en.wikipedia.org/wiki/Denormal_number
https://stackoverflow.com/questions/9314534/why-does-changing-0-1f-to-0-slow-down-performance-by-10x
非正規化数をゼロにして実行してみた.ソースコードには以下の2行を追加した.
#include <xmmintrin.h>
_mm_setcsr(_mm_getcsr() | 0x8040);
Matrix size 2000-times-2000
Random matrix
Language | Library | Computation time [millisecond] |
C/C++ | Eigen BDC | 12652 |
C/C++ | OpenCV 2.2 | 48290 |
C/C++ | OpenCV 3.1 | 166341 |
C/C++ | NRC | 375849 |
C/C++ | Eigen Jacobi | 920501 |
Ones matrix
Language | Library | Computation time [millisecond] |
C/C++ | Eigen Jacobi | 195 |
C/C++ | OpenCV 2.2 | 2230 |
C/C++ | NRC | 6026 |
C/C++ | OpenCV 3.1 | 138215 |
C/C++ | Eigen BDC | error termination |
まとめ
Language | Library | Matrix | Denormal | Computation time [millisecond] |
C/C++ | OpenCV 3.1 | random | default | 187424 |
FTZ DAZ | 166341 | |||
ones | default | 9004074 | ||
FTZ DAZ | 138215 | |||
OpenCV 2.2 | random | default | 27523 | |
FTZ DAZ | 48290 | |||
ones | default | 2501 | ||
FTZ DAZ | 2230 | |||
Eigen Jacobi | random | default | 940722 | |
FTZ DAZ | 920501 | |||
ones | default | 170 | ||
FTZ DAZ | 195 | |||
Eigen BDC | random | default | 921375 | |
FTZ DAZ | 12652 | |||
ones | default | error termination | ||
FTZ DAZ | error termination | |||
NRC | random | default | 437539 | |
FTZ DAZ | 375849 | |||
ones | default | 542057 | ||
FTZ DAZ | 6026 |
結論:OpenCV 3.1はOpenCV 2.2と比べて70倍遅い.
2000×2000行列に対する第1~3特異値の一覧.randomは同じ数値の行列ではないので,参考程度にご覧下さい.
Matrix | Language | Library | Denormal | 1st SV | 2nd SV | 3rd SV |
random | MATLAB | 1000.3 | 25.7 | 25.7 | ||
Python | numpy | 1000.4 | 25.8 | 25.6 | ||
OpenCV | 1000.2 | 25.8 | 25.7 | |||
C/C++ | OpenCV 3.1 | default | 999.9 | 25.7 | 25.5 | |
FTZ DAZ | 1000.1 | 25.8 | 25.7 | |||
OpenCV 2.2 | default | 999.7 | 25.7 | 25.5 | ||
FTZ DAZ | 1000.1 | 25.7 | 25.6 | |||
Eigen Jacobi | default | 51.3 | 51.0 | 51.0 | ||
FTZ DAZ | 51.3 | 51.2 | 51.0 | |||
Eigen BDC | default | 51.5 | 51.3 | 51.2 | ||
FTZ DAZ | 51.6 | 51.3 | 51.0 | |||
NRC | default | 1000.4 | 25.7 | 25.6 | ||
FTZ DAZ | 1000.2 | 25.8 | 25.7 | |||
ones | MATLAB | 2000.0 | 0.0 | 0.0 | ||
Python | numpy | 2000.0 | 0.0 | 0.0 | ||
OpenCV | 2000.0 | 0.0 | 0.0 | |||
C/C++ | OpenCV 3.1 | default | 2000.0 | 0.0 | 0.0 | |
FTZ DAZ | 2000.0 | 0.0 | 0.0 | |||
OpenCV 2.2 | default | 2000.0 | 0.0 | 0.0 | ||
FTZ DAZ | 2000.0 | 0.0 | 0.0 | |||
Eigen Jacobi | default | 2000.0 | 0.0 | 0.0 | ||
FTZ DAZ | 2000.0 | 0.0 | 0.0 | |||
Eigen BDC | default | error termination | ||||
FTZ DAZ | error termination | |||||
NRC | default | 2000.0 | 0.0 | 0.0 | ||
FTZ DAZ | 2000.0 | 0.0 | 0.0 |
結論:Eigenライブラリの挙動がおかしい.
ライブラリによっては,並列計算で計算しているものもある.上記のVisual Studioはデフォルトの設定だが,OpenMPを有効にして実行してみた結果は以下の通り.
[C/C++]-[言語]-[OpenMPのサポート]を[はい]に
[C/C++]-[コード生成]-[拡張命令セットを有効にする]を[Advanced Vector Extensions 2]に
Library | Matrix | Denormal | OpenMP | [millisecond] |
OpenCV 3.1 | random | default | default | 187424 |
OpenMP | 156288 | |||
FTZ DAZ | default | 166341 | ||
OpenMP | 167507 | |||
ones | FTZ DAZ | default | 138215 | |
OpenMP | 127292 | |||
OpenCV 2.2 | random | default | default | 27523 |
OpenMP | 27171 | |||
FTZ DAZ | default | 48290 | ||
OpenMP | 28803 | |||
ones | default | default | 2501 | |
OpenMP | 2133 | |||
FTZ DAZ | default | 2230 | ||
OpenMP | 2078 | |||
Eigen Jacobi | random | default | default | 940722 |
OpenMP | 19606 | |||
FTZ DAZ | default | 920501 | ||
OpenMP | 993473 | |||
ones | default | default | 170 | |
OpenMP | 186 | |||
FTZ DAZ | default | 195 | ||
OpenMP | 174 | |||
Eigen BDC | random | default | default | 921375 |
OpenMP | 999517 | |||
FTZ DAZ | default | 12652 | ||
OpenMP | 8018 | |||
NRC | random | default | default | 437539 |
OpenMP | 408642 | |||
FTZ DAZ | default | 375849 | ||
OpenMP | 408991 | |||
ones | default | default | 542057 | |
OpenMP | 508781 | |||
FTZ DAZ | default | 6026 | ||
OpenMP | 5687 |
結論:特異値分解は並列計算に向いていないアルゴリズムであるため,例えライブラリの一部の計算に並列計算ライブラリが使用されていたとしても,特異値分解の速度はあまり変わらない.
このページは,日本のQiitaのOpenCVのアドベントカレンダー2017の12月6日に記事として書きました.
https://qiita.com/advent-calendar/2017/opencv