All of these problems came up while compiling caffe to verify the 3Dpose_ssl#661b5d1 project.
Include and Library Paths
In Makefile.config, the include and library paths need to be adjusted to match the environment. In a Conda virtual environment, package headers are installed to $CONDA_PREFIX/include and package libraries to $CONDA_PREFIX/lib. Therefore, these paths need to be added.
Additionally, the author has specified ANACONDA_HOME in Makefile.config. Since Conda is used here and the corresponding paths are already configured, this line needs to be commented out so that the build picks up the right paths.
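The two adjustments above can be sketched as a small shell helper. This is only an illustration: use_conda_paths is a hypothetical name, and it assumes Makefile.config uses the INCLUDE_DIRS and LIBRARY_DIRS variables and assigns ANACONDA_HOME with := at the start of a line, as in the stock caffe example config.

```shell
# Sketch only: use_conda_paths is a hypothetical helper, not part of caffe.
use_conda_paths() {
  cfg=$1
  # Comment out the author's ANACONDA_HOME assignment (a .bak backup is kept).
  sed -i.bak 's/^\(ANACONDA_HOME[[:space:]]*:=.*\)$/# \1/' "$cfg"
  # make imports environment variables, so $(CONDA_PREFIX) resolves when
  # make is run inside the activated Conda environment.
  cat >> "$cfg" <<'EOF'
INCLUDE_DIRS += $(CONDA_PREFIX)/include
LIBRARY_DIRS += $(CONDA_PREFIX)/lib
EOF
}
# Usage: use_conda_paths Makefile.config
```

Appending with += puts the Conda directories after the existing entries. If system headers shadow the Conda ones, editing the original INCLUDE_DIRS line by hand to list the Conda path first may be the safer choice.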
Protobuf Version Mismatch
Protobuf is used to serialize structured data in caffe. It is highly version-sensitive, and if the wrong version is installed, errors like the following are thrown during compilation.
caffe/include/caffe/proto/caffe.pb.h:17:2: error: #error This file was generated by an older version of protoc which is
#error This file was generated by an older version of protoc which is
caffe/include/caffe/proto/caffe.pb.h:18:2: error: #error incompatible with your Protocol Buffer headers. Please
#error incompatible with your Protocol Buffer headers. Please
caffe/include/caffe/proto/caffe.pb.h:19:2: error: #error regenerate this file with a newer version of protoc.
#error regenerate this file with a newer version of protoc.
Digging into the generated caffe/include/caffe/proto/caffe.pb.h and searching for the error message reveals more detail. The version number 3006000 found there hints that protobuf 3.6.0 was used to generate the headers, so installing protobuf 3.6.0 in Conda solves the problem.
(conda)$ conda install protobuf=3.6.0
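The step from 3006000 to 3.6.0 follows protobuf's version encoding, which packs major, minor and patch as major * 1000000 + minor * 1000 + patch. A minimal sketch of the decoding (decode_protobuf_version is a hypothetical helper name):

```shell
# Decode a protobuf version integer such as 3006000 into major.minor.patch.
decode_protobuf_version() {
  n=$1
  echo "$((n / 1000000)).$((n / 1000 % 1000)).$((n % 1000))"
}

decode_protobuf_version 3006000   # prints 3.6.0
```

The result can then be compared against the output of protoc --version in the active environment.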
Missing cblas.h
BLAS is an essential dependency of caffe.
Basic Linear Algebra Subprograms (BLAS) is a specification that prescribes a set of low-level routines for performing common linear algebra operations such as vector addition, scalar multiplication, dot products, linear combinations, and matrix multiplication. They are the de facto standard low-level routines for linear algebra libraries; the routines have bindings for both C and Fortran.
BLAS can be implemented by any project; OpenBLAS is a popular open-source implementation, and Conda provides a prebuilt OpenBLAS package.
(conda)$ conda install -c anaconda openblas
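After installing, it can help to confirm where cblas.h actually ended up before rebuilding. The following is a small sketch; find_header is a hypothetical helper, and the directories passed to it are assumptions about a typical layout.

```shell
# Print the first directory entry that contains the given header.
# Returns non-zero when the header is not found in any directory.
find_header() {
  header=$1
  shift
  for dir in "$@"; do
    if [ -f "$dir/$header" ]; then
      printf '%s\n' "$dir/$header"
      return 0
    fi
  done
  return 1
}
# Usage: find_header cblas.h "$CONDA_PREFIX/include" /usr/local/include /usr/include
```

If the header is found under $CONDA_PREFIX/include, the include path added to Makefile.config earlier should be enough for the compiler to see it.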
Incorrect Number of cuDNN Parameters
Newer versions of cuDNN changed the cudnnSetConvolution2dDescriptor function so that it needs one more parameter (the trailing cudnnDataType_t in the signature shown in the error below). This causes compilation errors with older caffe sources.
CXX src/caffe/data_transformer.cpp
In file included from ./include/caffe/util/device_alternate.hpp:40:0,
from ./include/caffe/common.hpp:19,
from ./include/caffe/blob.hpp:8,
from ./include/caffe/data_transformer.hpp:6,
from src/caffe/data_transformer.cpp:8:
./include/caffe/util/cudnn.hpp: In function ‘const char* cudnnGetErrorString(cudnnStatus_t)’:
./include/caffe/util/cudnn.hpp:21:10: warning: enumeration value ‘CUDNN_STATUS_RUNTIME_PREREQUISITE_MISSING’ not handled in switch [-Wswitch]
switch (status) {
^
./include/caffe/util/cudnn.hpp: In function ‘void caffe::cudnn::setConvolutionDesc(cudnnConvolutionStruct**, cudnnTensorDescriptor_t, cudnnFilterDescriptor_t, int, int, int, int)’:
./include/caffe/util/cudnn.hpp:113:70: error: too few arguments to function ‘cudnnStatus_t cudnnSetConvolution2dDescriptor(cudnnConvolutionDescriptor_t, int, int, int, int, int, int, cudnnConvolutionMode_t, cudnnDataType_t)’
pad_h, pad_w, stride_h, stride_w, 1, 1, CUDNN_CROSS_CORRELATION));
^
./include/caffe/util/cudnn.hpp:15:28: note: in definition of macro ‘CUDNN_CHECK’
cudnnStatus_t status = condition; \
^
In file included from ./include/caffe/util/cudnn.hpp:5:0,
from ./include/caffe/util/device_alternate.hpp:40,
from ./include/caffe/common.hpp:19,
from ./include/caffe/blob.hpp:8,
from ./include/caffe/data_transformer.hpp:6,
from src/caffe/data_transformer.cpp:8:
/usr/local/cuda-8.0/include/cudnn.h:500:27: note: declared here
cudnnStatus_t CUDNNWINAPI cudnnSetConvolution2dDescriptor( cudnnConvolutionDescriptor_t convDesc,
^
Makefile:585: recipe for target '.build_release/src/caffe/data_transformer.o' failed
make: *** [.build_release/src/caffe/data_transformer.o] Error 1
The problem and a solution are discussed in the caffe issues:
Faced the same problem. It's happening due to cudnn.hpp (Location: include/caffe/util/cudnn.hpp) . Update cudnn.hpp file. It is not considering the current cuDNN versions.
Since the 3Dpose_ssl project uses an old version of caffe, the solution is quite simple: replacing cudnn.hpp with the one from the latest caffe addresses the problem.
(conda)$ wget https://raw.githubusercontent.com/BVLC/caffe/master/include/caffe/util/cudnn.hpp -O include/caffe/util/cudnn.hpp
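To see which cuDNN version the compiler is picking up, the version macros can be read from the header. This is a sketch under assumptions: cudnn_version is a hypothetical helper, and it relies on CUDNN_MAJOR, CUDNN_MINOR and CUDNN_PATCHLEVEL being defined in cudnn.h, which holds for cuDNN releases before 8 (later releases moved them to cudnn_version.h).

```shell
# Print major.minor.patch from the version macros in a cuDNN header.
cudnn_version() {
  header=$1
  awk '
    $1 == "#define" && $2 == "CUDNN_MAJOR"      { major = $3 }
    $1 == "#define" && $2 == "CUDNN_MINOR"      { minor = $3 }
    $1 == "#define" && $2 == "CUDNN_PATCHLEVEL" { patch = $3 }
    END { if (major != "") print major "." minor "." patch }
  ' "$header"
}
# Usage: cudnn_version /usr/local/cuda-8.0/include/cudnn.h
```

The header path in the usage line is the one named in the compiler error above; a different installation may keep cudnn.h elsewhere.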
Cannot Link libjpeg or libpng
(conda)$ make all
CXX/LD -o .build_release/tools/upgrade_solver_proto_text.bin
/usr/bin/ld: warning: libjpeg.so.8, needed by /public/wl4/anaconda3/envs/pose27/lib/libopencv_highgui.so, not found (try using -rpath or -rpath-link)
/usr/bin/ld: warning: libpng16.so.16, needed by /public/wl4/anaconda3/envs/pose27/lib/libopencv_highgui.so, not found (try using -rpath or -rpath-link)
According to facebookarchive/caffe2 Issue#1693: Cannot link OpenCV because of libjpeg (Anaconda), there are two possible solutions here:
1. Export the LD_LIBRARY_PATH environment variable to equal your Anaconda lib directory
2. Install the OpenCV Anaconda package and make sure that Caffe2 uses it (preferred)
So the solution is to make sure the corresponding packages are installed in the Conda environment and to point LD_LIBRARY_PATH at its lib directory.
(conda)$ conda install -c anaconda libpng jpeg
(conda)$ LD_LIBRARY_PATH=$CONDA_PREFIX/lib make all -j56
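Whether the libraries now resolve can be checked without a full rebuild by filtering ldd output for unresolved entries. A minimal sketch, where missing_libs is a hypothetical helper that reads ldd output on standard input:

```shell
# Print the names of shared libraries that ldd reports as "not found".
missing_libs() {
  awk '/=> not found/ { print $1 }'
}
# Usage:
#   LD_LIBRARY_PATH=$CONDA_PREFIX/lib \
#     ldd "$CONDA_PREFIX/lib/libopencv_highgui.so" | missing_libs
```

Empty output means every dependency of the library was found on the current search path.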
Again... Protobuf Linking Failure
(conda)$ protoc --version
libprotoc 3.6.0
(conda)$ LD_LIBRARY_PATH=$CONDA_PREFIX/lib make all
CXX/LD -o .build_release/tools/upgrade_solver_proto_text.bin
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::WireFormatLite::WriteStringMaybeAliased(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::io::CodedOutputStream::WriteStringWithSizeToArray(std::string const&, unsigned char*)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::Message::GetTypeName() const'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::MessageFactory::InternalRegisterGeneratedFile(char const*, void (*)(std::string const&))'
.build_release/lib/libcaffe.so: undefined reference to `leveldb::DB::Open(leveldb::Options const&, std::string const&, leveldb::DB**)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::Message::DebugString() const'
.build_release/lib/libcaffe.so: undefined reference to `google::base::CheckOpMessageBuilder::NewString()'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::OnShutdownDestroyString(std::string const*)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::WireFormatLite::WriteBytesMaybeAliased(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::MessageLite::ParseFromString(std::string const&)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::NameOfEnum(google::protobuf::EnumDescriptor const*, int)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::fixed_address_empty_string'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::WireFormatLite::WriteString(int, std::string const&, google::protobuf::io::CodedOutputStream*)'
.build_release/lib/libcaffe.so: undefined reference to `leveldb::Status::ToString() const'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::AssignDescriptors(std::string const&, google::protobuf::internal::MigrationSchema const*, google::protobuf::Message const* const*, unsigned int const*, google::protobuf::Metadata*, google::protobuf::EnumDescriptor const**, google::protobuf::ServiceDescriptor const**)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::internal::WireFormatLite::ReadBytes(google::protobuf::io::CodedInputStream*, std::string*)'
.build_release/lib/libcaffe.so: undefined reference to `google::protobuf::Message::InitializationErrorString() const'
collect2: error: ld returned 1 exit status
This is caused by protobuf having been built with an incompatible compiler version. The environment, CentOS 7, ships GCC 4.8.5, while the prebuilt binary packages in Conda are built with GCC 7. GCC 5 introduced a new libstdc++ ABI for std::string, so the Conda libraries export symbols that take std::__cxx11::basic_string, while objects compiled with GCC 4.8.5 reference the old std::string symbols listed above.
To solve the problem, rebuild protobuf manually with the system GCC.
Notice that the same problem also happens when linking to leveldb::DB::Open(leveldb::Options const&, std::string const&, leveldb::DB**). Rebuilding leveldb with the current GCC and adding the build output directory to LIBRARY_PATH (and to LD_LIBRARY_PATH at run time) solves the problem.
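One way to confirm this kind of mismatch is to look at the symbols a library exports: libraries built with GCC 5 or later under the default ABI mention std::__cxx11 in any symbol that involves std::string. The sketch below assumes nm from binutils is available; string_abi is a hypothetical helper that reads nm -DC output on standard input.

```shell
# Report whether a symbol listing contains std::__cxx11 names.
# "pre-cxx11" can also mean the library exposes no std::string at all.
string_abi() {
  if grep -q '__cxx11'; then
    echo "cxx11"
  else
    echo "pre-cxx11"
  fi
}
# Usage: nm -DC "$CONDA_PREFIX/lib/libprotobuf.so" | string_abi
```

A "cxx11" result for a library linked by objects from GCC 4.8.5 matches the undefined references to std::string const& seen in the linker output.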
CUDA Driver Version is Insufficient for CUDA Runtime Version
Compilation now succeeds, but running caffe to train a model still fails with an error.
(conda)$ LD_LIBRARY_PATH=$CONDA_PREFIX/lib:/usr/local/cuda/lib64:$LEVELDB_SRC/out-shared ./build/tools/caffe train -gpu=all -solver=$MODEL/solver.prototxt -weights=$WEIGHTS/pose_iter_320000.caffemodel 2>&1 | tee -a train.log
F1228 17:38:43.572676 103290 caffe.cpp:93] Check failed: error == cudaSuccess (35 vs. 0) CUDA driver version is insufficient for CUDA runtime version
*** Check failure stack trace: ***
@ 0x2b83da5b7a3d google::LogMessage::Fail()
@ 0x2b83da5bce7a google::LogMessage::SendToLog()
@ 0x2b83da5b9b20 google::LogMessage::Flush()
@ 0x2b83da5b9e0d google::LogMessageFatal::~LogMessageFatal()
@ 0x4082fe get_gpus()
@ 0x409215 train()
@ 0x406adc main
@ 0x2b83fa54ec05 __libc_start_main
@ 0x407523 (unknown)
Check the CUDA driver version and CUDA runtime version first.
# Check CUDA Toolkit version
(conda)$ /usr/local/cuda/bin/nvcc --version
nvcc: NVIDIA (R) Cuda compiler driver
Copyright (c) 2005-2019 NVIDIA Corporation
Built on Sun_Jul_28_19:07:16_PDT_2019
Cuda compilation tools, release 10.1, V10.1.243
# Check NVIDIA driver version
(conda)$ cat /proc/driver/nvidia/version
NVRM version: NVIDIA UNIX x86_64 Kernel Module 418.87.00 Thu Aug 8 15:35:46 CDT 2019
GCC version: gcc version 4.8.5 20150623 (Red Hat 4.8.5-16) (GCC)
According to the CUDA Toolkit Documentation, running a CUDA application requires a system with a driver that is compatible with the CUDA Toolkit. CUDA 10.1 requires NVIDIA driver version 418.39 or later, and the installed 418.87 clearly satisfies that, so the driver itself is not the cause. It must be a dynamic library problem.
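The driver comparison can be done mechanically instead of by eye. A minimal sketch of a dotted-version comparison (version_ge is a hypothetical helper; the 418.39 minimum is the figure quoted above for CUDA 10.1):

```shell
# Succeed when dotted version $1 is greater than or equal to $2.
# Missing trailing components are treated as 0.
version_ge() {
  awk -v a="$1" -v b="$2" 'BEGIN {
    n = split(a, x, ".")
    m = split(b, y, ".")
    len = (n > m) ? n : m
    for (i = 1; i <= len; i++) {
      xi = (i <= n) ? x[i] + 0 : 0
      yi = (i <= m) ? y[i] + 0 : 0
      if (xi > yi) exit 0
      if (xi < yi) exit 1
    }
    exit 0
  }'
}

version_ge 418.87.00 418.39 && echo "driver is new enough for CUDA 10.1"
```

With the driver ruled out, the next step is to check which CUDA runtime library the caffe binary actually loads.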
(conda)$ LD_LIBRARY_PATH=$CONDA_PREFIX/lib:/usr/local/cuda/lib64:$LEVELDB_SRC/out-shared ldd ./build/tools/caffe | grep cuda
libcudart.so.10.2 => /public/wl4/anaconda3/envs/pose27/lib/libcudart.so.10.2 (0x00002b365d546000)
Checking the CUDA packages in the Conda environment shows that the CUDA 10.2 libraries come from the cudnn and cudatoolkit packages.
(conda)$ conda list | grep cuda
cudatoolkit 10.2.89 hfd86e86_0 anaconda
cudnn 7.6.5 cuda10.2_0 anaconda
Therefore, installing a cudnn build that depends on CUDA 10.1 solves the problem.
(conda)$ conda search --info cudnn
....
cudnn 7.6.5 cuda10.1_0
----------------------
file name : cudnn-7.6.5-cuda10.1_0.conda
name : cudnn
version : 7.6.5
build : cuda10.1_0
build number: 0
size : 179.9 MB
license : Proprietary
subdir : linux-64
url : https://repo.anaconda.com/pkgs/main/linux-64/cudnn-7.6.5-cuda10.1_0.conda
md5 : 48850e851b910b694192f417e860fba3
timestamp : 2019-12-19 21:21:03 UTC
dependencies:
- cudatoolkit >=10.1,<10.2
....
(conda)$ conda install cudnn=7.6.5=cuda10.1_0
Collecting package metadata (current_repodata.json): done
Solving environment: done
## Package Plan ##
environment location: /public/wl4/anaconda3/envs/pose27
added / updated specs:
- cudnn==7.6.5=cuda10.1_0
The following packages will be downloaded:
package | build
---------------------------|-----------------
cudatoolkit-10.1.243 | h6bb024c_0 347.4 MB defaults
cudnn-7.6.5 | cuda10.1_0 179.9 MB defaults
------------------------------------------------------------
Total: 527.4 MB
The following packages will be SUPERSEDED by a higher-priority channel:
cudatoolkit anaconda::cudatoolkit-10.2.89-hfd86e8~ --> pkgs/main::cudatoolkit-10.1.243-h6bb024c_0
cudnn anaconda::cudnn-7.6.5-cuda10.2_0 --> pkgs/main::cudnn-7.6.5-cuda10.1_0
Proceed ([y]/n)? y
Downloading and Extracting Packages
cudnn-7.6.5 | 179.9 MB | ################################################################################################################################################################################################## | 100%
cudatoolkit-10.1.243 | 347.4 MB | ################################################################################################################################################################################################## | 100%
Preparing transaction: done
Verifying transaction: done
Executing transaction: done
Check failed: Caffe::root_solver() || root_net_ root_net_ needs to be set for all non-root solvers
According to the discussion in "Move root_net_ check in net constructor" (#4806), the cause and the fix are as follows:
Tips: this problem is caused because the lstm network does not support muti-gpu, you should change the cpp file and remake the caffe.