Background
One can often use a pre-built python distribution, such as Anaconda or Enthought Canopy, to set up a python working environment. But these distributions have the following common problems, which reduce their usefulness:
- It's hard to upgrade packages without an internet connection. It's also hard to install packages not contained in the distribution.
- They do not play well with existing packages on the target systems. For example, Anaconda bundles mpich, which conflicts with system MPI installations.
- They have poor performance. Since they are built on relatively old platforms, possibly without optimizations, they are often slower than a native python. For example, I found anaconda python very slow to start on a RHEL 6.3 system.
For me, it's essential to deploy my python working environments with the following considerations in mind:
- Easy to install/upgrade packages on demand, even under restricted connectivity. Some systems I work with do not have internet connections.
- It shall integrate well with existing python packages. Some packages, such as PyQt5, require deep system integration, so it is better to install them with the system package manager.
- Good performance. The distributed packages shall run fast and be responsive on the target platform.
The key ideas
The first is pip user install. With pip, we have the most complete python package collection: almost all packages are available on PyPI, and we can always make home-brew python packages pip-compatible. pip ships with python since python 2.7.9 and 3.4, and is easy to install for earlier versions, so it serves as a built-in package manager. pip supports installing packages into the user's directory and plays well with existing packages. For example, pip install skips a package if a compatible version is already installed, and a user install adds the user site directory to the standard python module search paths, instead of overwriting them.
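For example, a user install and a quick check of where the packages land might look like this (numpy is just a placeholder package):
pip install --user numpy
python -m site --user-site
The second command prints the user site directory, which python already includes in its module search path.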
The second is the python requirements file. pip supports installing packages and their dependencies from a requirements file. That is, you list the packages you want, and pip automatically resolves the dependencies, grabs the proper packages and installs them. This makes both installation and upgrade a piece of cake.
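For illustration, a requirements file is just a plain text list of packages, one per line, optionally with version constraints. The packages and versions below are examples only:
numpy>=1.9
scipy
matplotlib==1.4.3
ipython>=3.0,<4.0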
The last is python wheels. Python wheels are simply pre-built python packages. The real power of wheels is that you can build your own wheels for every package in a requirements file, together with all their dependencies. After that, you can install directly from the wheels, with no network connection needed.
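For instance, installing a single pre-built wheel is a one-liner. The filename below is a hypothetical example; real wheel names encode the package version, python version and platform:
pip install --user numpy-1.9.2-cp27-none-linux_x86_64.whl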
In practice
With the above three key ideas, I developed the following method to solve the python working environment distribution problem:
- Set up a base CPython. One can choose to rely on the system python, or compile their own CPython (see the sketch after this list).
- Maintain a requirements file. List only the packages you need, optionally with version constraints.
- Build wheels for distribution:
pip install --user --upgrade wheel
pip wheel -r requirements.txt -w ${WHEELSDIR}
- Distribute the wheels and install packages from them:
pip install --user --upgrade -r requirements.txt --no-index -f ${WHEELSDIR}
- Enjoy.
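For the first step, compiling your own CPython is roughly the following; the version number and install prefix below are examples only:
wget https://www.python.org/ftp/python/2.7.10/Python-2.7.10.tgz
tar xzf Python-2.7.10.tgz && cd Python-2.7.10
./configure --prefix=$HOME/opt/python
make && make install
$HOME/opt/python/bin/python -m ensurepip --upgrade
The last command bootstraps pip into the new installation (available since python 2.7.9 and 3.4).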
The above method is used to distribute my python working environments on 4 different platforms: two Fedora 22 workstations, one RHEL 6.3 system and one RHEL 5.4 system. It works.