Basic environment: CentOS 7, NVIDIA drivers (CUDA 10.1.243), Conda 4.10.
I hate Conda!
Python 2 Environment
The first attempt uses Python 2.7, since the README says:
The code has been tested in python 2.7, Tensorflow 1.13
Create Conda environment:
$ conda create -n mgn-py27 python=2.7 cudatoolkit=10.1 cudnn=7.6.5
$ conda activate mgn-py27
MultiGarment-Network has two dependencies that we need to install manually.
Install dirt
Install dependencies:
(mgn-py27)$ conda install cmake gcc_linux-64
(mgn-py27)$ conda install tensorflow-gpu=1.13
But pip list does not show the tensorflow-gpu package, so we need to install it again from PyPI before building and installing dirt:
(mgn-py27)$ pip install --ignore-installed tensorflow==1.13.1
(mgn-py27)$ pip install .
Install Mesh
(mgn-py27)$ conda install gxx_linux-64 opencv
(mgn-py27)$ git checkout 1761d544686b3735991954947a8befa759891eb4
(mgn-py27)$ make
(mgn-py27)$ cd dist && pip install psbody_mesh-0.1-cp27-cp27mu-linux_x86_64.whl
Run MultiGarment-Network
(mgn-py27)$ conda install matplotlib
(mgn-py27)$ pip install "scikit-learn<0.18" chumpy
(mgn-py27)$ python test_network.py
Using dirt renderer.
....
Done
freeglut (mesh_viewer): ERROR: Internal error <FBConfig with necessary capabilities not found> in function fgOpenWindow
Python 3 Environment (WIP)
Create a Conda environment:
$ conda create -n mgn-py36 python=3.6 cudatoolkit=10.0 cudnn=7.6.5
$ conda activate mgn-py36
Install dirt
Note the troubleshooting section in the description:
If you are using TensorFlow 1.14, there are some binary compatibility issues when using older versions of python (e.g. 2.7 and 3.5), due to compiler version mismatches. These result in a segfault at tensorflow::shape_inference::InferenceContext::GetAttr or similar. To resolve, either upgrade python to 3.7, or downgrade TensorFlow to 1.13, or build DIRT with gcc 4.8
Thus we will use the older TensorFlow 1.13 to avoid this known issue. And since the package from conda-forge messes up the dependencies, we must explicitly select the anaconda channel:
(mgn-py36)$ conda install -c anaconda tensorflow-gpu=1.13
(mgn-py36)$ pip install tensorflow-gpu==1.13.1
Then install the remaining build dependencies:
(mgn-py36)$ conda install cmake gcc_linux-64
Finally, build and install dirt, but with some environment variables set:
(mgn-py36)$ export CUDA_HOME=/usr/local/cuda-10.1
(mgn-py36)$ export PATH=$CUDA_HOME/bin:$PATH
(mgn-py36)$ pip install .
(mgn-py36)$ python tests/square_test.py
....
successful: all pixels agree
Install Mesh
(mgn-py36)$ conda install boost
(mgn-py36)$ conda install pyopengl pillow pyzmq pyyaml
(mgn-py36)$ conda install gxx_linux-64
(mgn-py36)$ make all
(mgn-py36)$ make tests
....
Ran 28 tests in 5.882s
Convert MultiGarment-Network to Python 3
(mgn-py36)$ conda install matplotlib scikit-learn==0.17 chumpy
Since we are running Python 3, we need to replace cPickle with _pickle, as the former does not exist.
(mgn-py36)$ find ./ -type f -name \*.py -exec sed -i -e 's/cPickle/_pickle/g' {} \;
(mgn-py36)$ 2to3 -w -n MultiGarmentNetwork/
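As an alternative to rewriting every file with sed, the import itself can be made version-agnostic. This is only a minimal sketch, not what the original code does: the alias pkl matches the name used in test_network.py, and the sample payload is made up for illustration. In Python 3 the plain pickle module already uses the C implementation (_pickle) when it is available.

```python
# Version-agnostic pickle import (illustrative sketch, not from the repository).
try:
    import cPickle as pkl  # Python 2: C-accelerated pickle
except ImportError:
    import pickle as pkl   # Python 3: cPickle was merged into pickle

# Round-trip a small, made-up object to confirm the alias works.
payload = pkl.dumps({"faces": [0, 1, 2]}, protocol=2)
restored = pkl.loads(payload)
```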
Because the data files were pickled with Python 2, unpickling them will fail:
Traceback (most recent call last):
File "test_network.py", line 171, in <module>
_, faces = pkl.load(f)
UnicodeDecodeError: 'ascii' codec can't decode byte 0x8c in position 16: ordinal not in range(128)
You will need to change the pickle loading to:
_, faces = pkl.load(open(file_path, 'rb'), encoding='latin1')
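The same fix can be wrapped in a small helper that also closes the file. This is a sketch under assumptions: load_py2_pickle is a hypothetical name, not part of MultiGarment-Network, and the encoding argument exists only in Python 3. latin1 maps every byte to a code point, so Python 2 byte strings always decode.

```python
import pickle


def load_py2_pickle(file_path):
    """Load a pickle written by Python 2 from Python 3 (hypothetical helper)."""
    # Binary mode is required; the context manager closes the file afterwards.
    with open(file_path, "rb") as f:
        # The default encoding is ASCII, which fails on bytes >= 0x80.
        return pickle.load(f, encoding="latin1")
```

With such a helper, the loading line would become _, faces = load_py2_pickle(file_path).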
A too-old scikit-learn is also a problem:
Traceback (most recent call last):
File "test_network.py", line 176, in <module>
pca_verts[garment] = pkl.load(f)
File "$CONDA_HOME/lib/python3.6/site-packages/sklearn/decomposition/__init__.py", line 10, in <module>
from .kernel_pca import KernelPCA
File "$CONDA_HOME/lib/python3.6/site-packages/sklearn/decomposition/kernel_pca.py", line 13, in <module>
from ..metrics.pairwise import pairwise_kernels
File "$CONDA_HOME/lib/python3.6/site-packages/sklearn/metrics/__init__.py", line 33, in <module>
from . import cluster
File "$CONDA_HOME/lib/python3.6/site-packages/sklearn/metrics/cluster/__init__.py", line 8, in <module>
from .supervised import adjusted_mutual_info_score
File "$CONDA_HOME/lib/python3.6/site-packages/sklearn/metrics/cluster/supervised.py", line 14, in <module>
from scipy.misc import comb
ImportError: cannot import name 'comb'
Changing the import in sklearn/metrics/cluster/supervised.py to from scipy.special import comb will solve the problem.
At this point the network should run without syntax errors. Some people report that the network runs after the above changes.
But here we still get this error:
Traceback (most recent call last):
File "test_network.py", line 185, in <module>
pred = get_results(m, dat)
File "test_network.py", line 55, in get_results
out = m([images, vertex_label, J_2d])
File "$CONDA_PREFIX/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 592, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/public/wl4/clothing/MultiGarmentNetwork-py3/network/base_network.py", line 336, in call
garm_model_outputs = [fe(latent_code_offset_ShapeMerged) for fe in self.garmentModels]
File "/public/wl4/clothing/MultiGarmentNetwork-py3/network/base_network.py", line 336, in <listcomp>
garm_model_outputs = [fe(latent_code_offset_ShapeMerged) for fe in self.garmentModels]
File "$CONDA_PREFIX/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 592, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/public/wl4/clothing/MultiGarmentNetwork-py3/network/base_network.py", line 65, in call
x = self.PCA_(pca_comp)
File "$CONDA_PREFIX/lib/python3.6/site-packages/tensorflow/python/keras/engine/base_layer.py", line 592, in __call__
outputs = self.call(inputs, *args, **kwargs)
File "/public/wl4/clothing/MultiGarmentNetwork-py3/network/custom_layers.py", line 33, in call
return tf.reshape(tf.matmul(x, self.components) + self.mean, (-1, K.int_shape(self.mean)[0] / 3, 3))
File "$CONDA_PREFIX/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 7161, in reshape
tensor, shape, name=name, ctx=_ctx)
File "$CONDA_PREFIX/lib/python3.6/site-packages/tensorflow/python/ops/gen_array_ops.py", line 7206, in reshape_eager_fallback
ctx=_ctx, name=name)
File "$CONDA_PREFIX/lib/python3.6/site-packages/tensorflow/python/eager/execute.py", line 66, in quick_execute
six.raise_from(core._status_to_exception(e.code, message), None)
File "<string>", line 3, in raise_from
tensorflow.python.framework.errors_impl.InvalidArgumentError: Value for attr 'Tshape' of float is not in the list of allowed values: int32, int64
; NodeDef: {{node Reshape}}; Op<name=Reshape; signature=tensor:T, shape:Tshape -> output:T; attr=T:type; attr=Tshape:type,default=DT_INT32,allowed=[DT_INT32, DT_INT64]> [Op:Reshape]
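A likely cause, not confirmed in these notes: in Python 3 the / operator always returns a float, so K.int_shape(self.mean)[0] / 3 in custom_layers.py puts a float into the target shape, and 2to3 does not rewrite / to //. Integer division (//) should keep the shape integral. The sketch below shows the difference in plain Python; num_coords is a hypothetical stand-in for K.int_shape(self.mean)[0].

```python
num_coords = 27  # hypothetical stand-in for K.int_shape(self.mean)[0]

# True division: the middle entry becomes a float in Python 3.
shape_true_div = (-1, num_coords / 3, 3)

# Floor division: the middle entry stays an int, as Reshape requires.
shape_floor_div = (-1, num_coords // 3, 3)

print(shape_true_div)   # (-1, 9.0, 3)
print(shape_floor_div)  # (-1, 9, 3)
```

Under that assumption, the reshape in custom_layers.py would use K.int_shape(self.mean)[0] // 3.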
There is also a repository containing a Python 3 version of MultiGarment-Network: