How can we update a large Python 2 codebase to Python 3?


Introduction

Python 2, often called Legacy Python, was the dominant version of the language for many years. Its last release series, Python 2.7, reached end of life on January 1, 2020, after an unusually long support period. Python 3.x was introduced as its replacement, with a host of improvements and fixes over the Python 2.x versions. However, Python 3.x is not backward compatible with Python 2.x, so Python 2 codebases have to be upgraded explicitly to benefit from Python 3 and its continued support. The biggest reasons developers cite for upgrading are a) developer productivity (cleaner handling of text, plus language features that exist only in Python 3) and b) performance improvements, with recent Python 3 releases running many tasks faster.

Methods Used for Updating to Python 3

  • Rewriting the codebase in Python 3

  • Using the Porting process

Method 1: Rewriting the Entire Codebase

This way of upgrading is practical only for small codebases, because whoever does the rewrite needs a working understanding of how the entire codebase fits together. Rewriting the code in Python 3 lets us use Python 3 features from the start, which can make the code shorter and more efficient. With other migration methods, Python 3-only features often cannot be used until the whole codebase has been migrated. A rewrite avoids this problem, and also gives us an opportunity to clean up any code block we have wanted to improve for a long time.

However, for a large codebase a full rewrite is usually too slow and too risky, which is where the porting process comes in.

Method 2: Using the Porting Process

On the other hand, we can use the Python porting process as officially described in the documentation. At a high level, this porting is a three-step process −

  • Auto conversion

  • Manual changes

  • Runtime validation and fix
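To give a feel for the auto conversion step, here is a minimal before-and-after sketch. The function `summarize` is a hypothetical example, and the converted form is roughly what a tool such as 2to3 would be expected to produce; actual tool output may differ in detail.

```python
# Python 2 original, shown as comments because it is not valid Python 3:
#
#   def summarize(counts):
#       for name, n in counts.iteritems():
#           print "%s: %d" % (name, n)
#       if counts.has_key("errors"):
#           print "errors found"
#       return [i * 2 for i in xrange(3)]
#
# The same (hypothetical) function after automated conversion:

def summarize(counts):
    for name, n in counts.items():       # iteritems() -> items()
        print("%s: %d" % (name, n))      # print statement -> print() function
    if "errors" in counts:               # has_key() -> "in" operator
        print("errors found")
    return [i * 2 for i in range(3)]     # xrange() -> range()
```

Changes of this mechanical kind are what the automated step handles well. Anything that depends on what the data means, such as whether a value is text or bytes, is left for the manual and runtime steps.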

However, a prerequisite to all this is installing Python 3 first, along with its relevant packages and libraries. Let us see the process for Windows.

Download and install −

https://www.python.org/ftp/python/3.7.4/python-3.7.4.exe

This installs Python 3. After this, the porting process can be started with an automated conversion tool such as 2to3, Modernize, or Futurize. The tool converts much of the code to Python 3 syntax, although the following concerns must still be taken care of −

Update the setup.py File to Denote Python 3 Compatibility

The classifiers in the setup.py file should be updated to include Programming Language :: Python :: 3 (classifiers for specific versions are also available). Classifiers are metadata only: they tell users and package indexes which versions the project supports, but they do not stop the package from being installed on Python 2. To enforce a minimum version at install time, the python_requires argument can be set as well. Together, these make it clear that the codebase targets Python 3 and help prevent a slide back to Python 2-only code.
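As a rough sketch, the relevant metadata might look like the following. The project name, version number, and the 3.7 minimum are assumptions chosen for illustration. In a real setup.py, these keyword arguments would be passed to setuptools.setup().

```python
# Hypothetical packaging metadata; in setup.py this would be used as
# setuptools.setup(**SETUP_KWARGS).
SETUP_KWARGS = {
    "name": "example_project",      # hypothetical project name
    "version": "1.0.0",             # hypothetical version
    "classifiers": [
        # Informational only: shown on package indexes, not enforced.
        "Programming Language :: Python :: 3",
        "Programming Language :: Python :: 3.7",
    ],
    # Checked by pip at install time (assumed minimum version).
    "python_requires": ">=3.7",
}
```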

Use the Modernize or Futurize Scripts

As Python 3 is not backward compatible, every module in the codebase must be brought up to Python 3 standards. Tools like Modernize or Futurize can be run over the code to do much of this automatically. Modernize is the more conservative of the two and targets the common subset of Python 2 and Python 3, relying on the six library. Futurize tries to make the code read like Python 3 and brings Python 3 idioms to Python 2. Not every module uses every affected feature, but a few basic behaviors differ everywhere. Hence the official Python documentation recommends adding the following imports at the top of each module that still has to run under Python 2 −

from __future__ import absolute_import
from __future__ import division
from __future__ import print_function 

These imports give the module Python 3 semantics for imports, division, and print, so the code will not regress and stop working under Python 3 because of these basic differences. To catch regressions as they happen, we can also use the Pylint project, whose --py3k flag warns when our code begins to deviate from Python 3 compatibility. This saves us from rerunning Modernize or Futurize over the code regularly just to spot regressions. Note that, according to the porting guide, Pylint supports only Python 3.4 or higher. Newer Pylint releases appear to have dropped the --py3k checker, so an older release may be needed for this step.
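As a small illustration of what these imports change, the sketch below behaves the same on Python 2 and Python 3. The functions `average` and `halves` are hypothetical examples, not part of any library.

```python
from __future__ import absolute_import, division, print_function


def average(values):
    # With "division" enabled, / is true division on Python 2 as well,
    # so the result is not silently truncated to an integer.
    return sum(values) / len(values)


def halves(count):
    # Use // explicitly wherever integer (floor) division is intended.
    return count // 2


# With "print_function" enabled, print is a function on both versions.
print("average:", average([1, 2]), "halves:", halves(7))
```

On Python 3 the three imports have no effect, because this behavior is already the default. They matter only while the module must also run on Python 2.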

Use Feature Detection While Importing to Ensure Version Compatibility

There might be instances where the Python 2 codebase relies on modules or features that are named differently, or are missing, in Python 3. Instead of checking the interpreter version (version detection), it is more robust to check whether the feature we need is actually available (feature detection), typically by attempting an import and falling back when it fails. This avoids wrong assumptions about what future versions will provide, which prevents further problems later on.

try:
    from importlib import abc
except ImportError:
    from importlib2 import abc
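The same pattern works for built-in names and relocated standard library modules. The following is a minimal sketch in which `text_type` and `is_text` are hypothetical names introduced for illustration:

```python
# Version detection (fragile) would look like this:
#   import sys
#   if sys.version_info[0] == 2: text_type = unicode
#
# Feature detection asks for the name itself and falls back if it is missing.
try:
    text_type = unicode   # this name exists only on Python 2
except NameError:
    text_type = str       # on Python 3, str is already the text type

try:
    from urllib.parse import urlparse   # Python 3 location
except ImportError:
    from urlparse import urlparse       # Python 2 location


def is_text(value):
    """Return True if value is a text string on the running interpreter."""
    return isinstance(value, text_type)
```

Because the code never inspects the version number, it keeps working if a later release moves or restores a name.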

Check Comparisons Between Binary and Text Data

In Python 3, text (str) and binary data (bytes) cannot be mixed implicitly: concatenating them raises a TypeError, and comparing them with == always returns False. This step cannot be fully automated by any tool, so every place where the code handles both kinds of data has to be reviewed by hand and backed by tests. The underlying reason is that Python 3 bytes do not behave like the old str type in Legacy Python. For example, indexing a bytes object returns an integer rather than a one-character string.
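One common way to handle this is to decode bytes to text once, at the boundary where data enters the program, and to encode again only on the way out. The sketch below targets Python 3. The helpers `to_text` and `to_bytes` are hypothetical, and UTF-8 is an assumed encoding.

```python
def to_text(data, encoding="utf-8"):
    """Return data as str, decoding only if it is bytes (hypothetical helper)."""
    if isinstance(data, bytes):
        return data.decode(encoding)
    return data


def to_bytes(data, encoding="utf-8"):
    """Return data as bytes, encoding only if it is str (hypothetical helper)."""
    if isinstance(data, str):
        return data.encode(encoding)
    return data


raw = b"caf\xc3\xa9"     # bytes as they might arrive from a file or socket
text = to_text(raw)      # decoded once, at the boundary

# Behaviors that differ from the Python 2 str type:
first = raw[0]           # an int on Python 3, not a one-character string
same = (raw == text)     # bytes and str never compare equal

try:
    raw + text           # mixing the two types is an error
    mixed_ok = True
except TypeError:
    mixed_ok = False
```

Tests that feed non-ASCII input through such boundaries tend to expose the places where the old code mixed the two types silently.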

Having Good Test Coverage

Good test coverage is very important: it catches bugs introduced by the port, and it shortens the upgrade by showing exactly where changes are needed. Tools such as Coverage.py help greatly here by reporting which lines the tests executed and which they did not. A failure can then be traced to code the tests actually exercise, and untested code can be covered before it is ported.
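As a minimal sketch, a test like the following pins down current behavior so that the same suite can be run before and after porting. The function `format_price` is a hypothetical stand-in for real project code, and `run_tests` is a small helper added only for this example.

```python
import unittest


def format_price(cents):
    """Hypothetical function being ported; stands in for real project code."""
    # // keeps integer division explicit, so the result is the same on
    # Python 2 and Python 3.
    return "%d.%02d" % (cents // 100, cents % 100)


class FormatPriceTest(unittest.TestCase):
    def test_whole_amount(self):
        self.assertEqual(format_price(500), "5.00")

    def test_fractional_amount(self):
        self.assertEqual(format_price(1999), "19.99")

    def test_returns_text(self):
        self.assertIsInstance(format_price(1), str)


def run_tests():
    """Run the test case and return the unittest result object."""
    suite = unittest.defaultTestLoader.loadTestsFromTestCase(FormatPriceTest)
    return unittest.TextTestRunner(verbosity=0).run(suite)
```

With Coverage.py installed, a suite like this can be measured by running coverage run -m unittest discover and then coverage report -m, which lists the line numbers that were never executed.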

Conclusion

Here we have gone through two ways of migrating a large codebase from Python 2 to Python 3, along with the constraints to look out for and some tools that help. The exact process will vary to some extent depending on the codebase in question and on the libraries and modules it uses. Still, these general steps should be enough to convert most codebases to Python 3 and modernize the code.

Updated on: 02-May-2023
