py3compat layer

This is a Python 2/3 compatibility layer for Python extensions. It allows extensions to work on both Python versions with minimal #ifdefs or additional boilerplate.

This README serves as a reference, a porting guide, and documentation of the porting strategy.

Should I Care?

Not necessarily. Samba does not officially support Python 3.

As a Samba developer, you should ensure that your changes work under Python 2. The porting effort can be left to volunteers interested in that.

Python versions

The targeted Python versions are 2.6, 2.7, and 3.3+. Many features were backported from the 3.x series to 2.6, so not supporting 2.5 and below makes life significantly easier.

Design of this library is guided by these principles, in rough order:

Strings, Bytes and Unicode

The most disruptive change between 2.x and 3.x series of Python is the unicode/bytes split.

In Python 2, the PyString_* family of functions was used for bytestrings, with implicit conversions when Unicode strings were involved.

In Python 3, Unicode strings (PyUnicode_*) and bytes (PyBytes_*) are separate types; aside from certain specific operations, they cannot be mixed.

This library defines PyStr_* and PyBytes_* macros to ease porting.

Use PyStr_* for human-readable Unicode strings, i.e. the "str" type in each respective Python version. Unicode strings SHOULD NOT have embedded null characters. The PyStr_* macros are defined by this library, and correspond to PyUnicode_* on Python 3 and PyString_* on Python 2.

Use PyBytes_* for binary data, that is, the "bytes" type in Python 3 and "str" in Python 2. The macros are provided for Python 2; on Python 3 they are the native PyBytes_* API.

Use PyUnicode_* for explicit Unicode strings: "str" on Python 3 and "unicode" on Python 2. Only do this if you would have used PyUnicode_* in code for Python 2. (Incidentally, use of PyUnicode_* in Samba is quite limited.)

Use PyString_* for code that is Python 2 only. If you do, your code will fail to compile for Python 3. Until Samba starts supporting Python 3, PyString_* is your safest choice. Python 3 support can be added later by a contributor interested in porting.

To summarize:

Name          py3 type   py2 type   Usage
PyStr_*       str        str        Unicode data
PyBytes_*     bytes      str        Binary data
PyUnicode_*   str        unicode    Explicit Unicode
PyString_*    <error>    str        Unported code

String size

When dealing with Unicode strings, the concept of “size” is tricky, since the number of characters doesn't necessarily correspond to the number of bytes in the UTF-8 representation.

To prevent subtle errors, this library does not provide the PyStr_Size function.

Instead, use PyStr_AsUTF8AndSize. Under Python 3, this is an alias for PyUnicode_AsUTF8AndSize. Under Python 2, it functions like Python 3's PyUnicode_AsUTF8AndSize, except that the string is not encoded (as it is not a Unicode string), the size pointer must not be NULL, and the size may be stored even if an error occurs.
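To make the character-versus-byte distinction concrete, here is a small standalone sketch that needs no Python headers. The helper utf8_codepoints is hypothetical and not part of this library. It counts code points in a UTF-8 buffer. That count can differ from the byte count, and the byte count is what PyUnicode_AsUTF8AndSize stores through its size pointer on Python 3.

```c
#include <stddef.h>

/* Hypothetical helper (not part of py3compat): count Unicode code points
 * in a UTF-8 buffer of nbytes bytes. Every byte that is not a
 * continuation byte (bit pattern 10xxxxxx) starts a new code point.
 * The input is assumed to be valid UTF-8; nothing is validated. */
static size_t utf8_codepoints(const char *s, size_t nbytes)
{
    size_t count = 0;
    for (size_t i = 0; i < nbytes; i++) {
        if (((unsigned char)s[i] & 0xC0) != 0x80) {
            count++;
        }
    }
    return count;
}
```

For example, "caf\xc3\xa9" ("café") is 5 bytes long but contains 4 code points.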

Integers

Use PyInt_* and PyLong_* as you would have used them in Python 2. For Python 3, this library aliases the removed PyInt_* functions/macros to PyLong_*.

Module Initialization

The module creation process was overhauled in Python 3. This library provides a compatibility wrapper so the Python 3 syntax can be used.

PyModuleDef and PyModule_Create

Defining a module with this library is similar to the Python 3 way.

First, create a PyModuleDef structure:

static struct PyModuleDef moduledef = {
    PyModuleDef_HEAD_INIT,
    .m_name = "spam",
    .m_doc = PyDoc_STR("Python wrapper for the spam submodule."),
    .m_size = -1,
    .m_methods = spam_methods,
};

Then, where a Python 2 module would have

m = Py_InitModule3("spam", spam_methods, "Python wrapper ...");

use instead

m = PyModule_Create(&moduledef);

For m_size, use -1. (For a module that supports multiple subinterpreters, 0 is also accepted, but this is tricky to achieve portably.) Additional members of the PyModuleDef structure are accepted, but unused under Python 2; do not rely on them. See the Python documentation for details.

Module creation entrypoint

Instead of the void init<name> function in Python 2, or a Python3-style PyObject *PyInit_<name> function, use the MODULE_INIT_FUNC macro to define an initialization function, and return the created module from it:

MODULE_INIT_FUNC(name)
{
    ...
    m = PyModule_Create(&moduledef);
    ...
    if (error) {
        return NULL;
    }
    ...
    return m;
}

Under Python 3, the macro expands to the PyInit_<name> function header (including a prototype, to squelch -Wmissing-prototypes warnings). For Python 2, it additionally defines an init<name> function that calls PyInit_<name> and discards the result.
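The expansion can be illustrated with a standalone sketch. It is not this library's actual definition. It uses a stand-in demo_object type instead of PyObject, a hypothetical DEMO_MODULE_INIT_FUNC macro, and a hypothetical DEMO_PY3 switch in place of a real Python version check, so it compiles without Python.h. The sketch shows the token-pasting technique only. The macro emits prototypes, then (in the Python 2 branch) an init<name> wrapper that discards the result, and finally the PyInit_<name> header that the user's braces complete.

```c
/* Stand-in for PyObject so the sketch compiles without Python.h. */
typedef struct demo_object { int dummy; } demo_object;

/* Counter so the wrapper's call into PyInit_<name> can be observed. */
static int demo_init_calls = 0;

#ifdef DEMO_PY3
/* Python 3 flavour: prototype plus the function header. */
#define DEMO_MODULE_INIT_FUNC(name)                       \
    demo_object *PyInit_ ## name(void);                   \
    demo_object *PyInit_ ## name(void)
#else
/* Python 2 flavour: additionally define init<name>, which calls
 * PyInit_<name> and throws the returned module away. */
#define DEMO_MODULE_INIT_FUNC(name)                       \
    static demo_object *PyInit_ ## name(void);            \
    void init ## name(void);                              \
    void init ## name(void) { (void)PyInit_ ## name(); }  \
    static demo_object *PyInit_ ## name(void)
#endif

static demo_object demo_module;

/* The user writes only the body, exactly as in the README example. */
DEMO_MODULE_INIT_FUNC(spam)
{
    demo_init_calls++;
    return &demo_module;
}
```

Built without DEMO_PY3, calling initspam() runs the body once and discards the returned pointer, which mirrors the Python 2 behaviour described above.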

Adding module-level constants

Samba includes code like this:

PyModule_AddObject(m, "RDWR", PyInt_FromLong(O_RDWR));
PyModule_AddObject(m, "__version__", PyString_FromString(MOD_VERSION));

Python 2.6 introduced convenience functions for this, which are shorter to write:

PyModule_AddIntConstant(m, "RDWR", O_RDWR);
PyModule_AddStringConstant(m, "__version__", MOD_VERSION);

PyModule_AddStringConstant will use the string type native to the current Python version.

Comparisons

The __cmp__-based object comparison has been removed in favor of rich comparison. This means that instead of

static int cmp(PyObject *_obj1, PyObject *_obj2)

function in the tp_compare slot, there is now

static PyObject* richcmp(PyObject *_obj1, PyObject *_obj2, int op)

in the tp_richcompare slot. The op argument specifies the comparison operation: Py_EQ (==), Py_GT (>), Py_LE (<=), etc.

This mechanism is available in Python 2.6.

Additionally, Python 3 brings a semantic change. Previously, objects of disparate types were ordered according to type, where the ordering of types was undefined (but consistent across an invocation of Python). In Python 3, objects of different types are unorderable. To explicitly fall back to default behavior, the richcmp function can return NotImplemented.

To help porting from __cmp__ operations, this library defines a convenience macro, PY_RICHCMP, which evaluates to the right PyObject * result based on two values orderable by C's comparison operators. A typical rich comparison function will look something like this:

static PyObject* mytype_richcmp(PyObject *obj1, PyObject *obj2, int op)
{
    if (mytype_Check(obj1) && mytype_Check(obj2)) {
        return PY_RICHCMP(get_data(obj1), get_data(obj2), op);
    }
    Py_RETURN_NOTIMPLEMENTED;
}

where get_data returns e.g. a pointer or int, and mytype_Check checks if get_data can be called on an object (usually via PyObject_TypeCheck). If a "cmp"-style function is provided by the wrapped C library, use PY_RICHCMP(cmp(obj1, obj2), 0, op).

(The API is meant to discourage implementing cmp as (obj1 - obj2), which is undefined with pointers that aren't part of an array.)
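The logic behind a PY_RICHCMP-style evaluation can be sketched in plain C. This is an illustration under stated assumptions, not the library's macro. The DEMO_ opcodes are hypothetical names, numbered like CPython's Py_LT through Py_GE (0 to 5). demo_richcmp returns a C bool where the real macro yields a PyObject *. demo_cmp is a hypothetical "cmp"-style function that avoids subtraction entirely.

```c
#include <stdbool.h>

/* Comparison opcodes numbered like CPython's Py_LT..Py_GE (0..5),
 * redefined under hypothetical names so no Python.h is needed. */
enum demo_op { DEMO_LT, DEMO_LE, DEMO_EQ, DEMO_NE, DEMO_GT, DEMO_GE };

/* Evaluate "a <op> b" using C's own comparison operators.
 * Returns a C bool; the real macro produces a Python object instead. */
static bool demo_richcmp(long a, long b, int op)
{
    switch (op) {
    case DEMO_LT: return a < b;
    case DEMO_LE: return a <= b;
    case DEMO_EQ: return a == b;
    case DEMO_NE: return a != b;
    case DEMO_GT: return a > b;
    case DEMO_GE: return a >= b;
    default:      return false; /* unknown op; real code would signal an error */
    }
}

/* "cmp"-style result (-1, 0 or 1) computed without a - b, which can
 * overflow for operands of large magnitude. */
static int demo_cmp(long a, long b)
{
    return (a > b) - (a < b);
}
```

A call such as demo_richcmp(demo_cmp(x, y), 0, op) has the same shape as PY_RICHCMP(cmp(obj1, obj2), 0, op): the three-way result is compared against 0 with the requested operator.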

This library defines the Py_RETURN_NOTIMPLEMENTED macro if it's not provided by your Python version.

Python Objects

In Python 3, object structs no longer have "ob_type" as a direct member: to conform to C's strict aliasing rules (see PEP 3123), PyObject_HEAD now expands to an embedded "ob_base" struct. Use the Py_TYPE macro instead. Similarly, use the Py_REFCNT and Py_SIZE macros to access the ob_refcnt and ob_size members.
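A standalone sketch shows why an accessor macro keeps working after the layout change. The demo_type, demo_object and spam_object structs and the DEMO_TYPE / DEMO_REFCNT macros are hypothetical stand-ins, not CPython's definitions. The object header is embedded as the first member, and the macro casts to the header type before reading the field. That is valid C, because a pointer to a struct, suitably converted, points to its first member.

```c
#include <stddef.h>

/* Stand-ins for CPython's type and object header structs. */
typedef struct demo_type { const char *name; } demo_type;

typedef struct demo_object {
    ptrdiff_t ob_refcnt;
    demo_type *ob_type;
} demo_object;

/* Python 3 style layout: the header is a single embedded member,
 * so spam_object has no direct ob_type field. */
typedef struct {
    demo_object ob_base;   /* plays the role of PyObject_HEAD */
    int data;
} spam_object;

/* Py_TYPE / Py_REFCNT style accessors: convert to the header type
 * first, then read the field. Works for any struct laid out as above. */
#define DEMO_TYPE(o)   (((demo_object *)(o))->ob_type)
#define DEMO_REFCNT(o) (((demo_object *)(o))->ob_refcnt)
```

With this layout, code written as obj->ob_type no longer compiles for spam_object, while DEMO_TYPE(obj) works unchanged.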

For initialization of type objects, the sequence

PyObject_HEAD_INIT(NULL)
0, /* ob_size */

must be replaced with

PyVarObject_HEAD_INIT(NULL, 0)

Python 3 removed the type flags Py_TPFLAGS_HAVE_WEAKREFS and Py_TPFLAGS_HAVE_ITER. This library defines them as 0.

Unhandled Changes

The CObject API is not in Python 3. Samba's use of CObject is very limited, and it is expected to be replaced by other mechanisms.

The Build System

This section is quite Samba-specific.

ABI flags in library filename

In Python 3, extension libraries have a flag attached to the filename. Instead of spam.so, the library is named spam.cpython-34m.so. Files with the shorter "generic" name are loaded only if the file specific to the given Python version does not exist. In addition to making the target Python version clear, this allows modules for several Python versions to be kept in one directory.
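The lookup order can be sketched as follows. The code is a hypothetical model of the preference rule, not Python's import machinery. pick_extension and in_list are illustrative names, and the list of existing files stands in for a real directory. The version-tagged name is tried first, and the generic .so name is the fallback.

```c
#include <stdbool.h>
#include <stddef.h>
#include <stdio.h>
#include <string.h>

/* Hypothetical helper: is name one of the n entries in files? */
static bool in_list(const char *name, const char *const *files, size_t n)
{
    for (size_t i = 0; i < n; i++) {
        if (strcmp(name, files[i]) == 0) {
            return true;
        }
    }
    return false;
}

/* Hypothetical lookup: prefer "<module><tagged_suffix>" (for example
 * "spam.cpython-34m.so"), else fall back to the generic "<module>.so".
 * The chosen name is written to out; returns false if neither exists. */
static bool pick_extension(const char *module, const char *tagged_suffix,
                           const char *const *files, size_t nfiles,
                           char *out, size_t outlen)
{
    snprintf(out, outlen, "%s%s", module, tagged_suffix);
    if (in_list(out, files, nfiles)) {
        return true;
    }
    snprintf(out, outlen, "%s.so", module);
    return in_list(out, files, nfiles);
}
```

If a directory holds both spam.so and spam.cpython-34m.so, the tagged file wins. This is what lets builds for several Python versions share one directory.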

Samba's build system uses this feature. Libraries with a plain .so extension are built for Python 2 only. (This may change if and when Samba officially supports Python 3, or even its stable ABI.)

Building for Python 3

To build for Python 3, define the environment variable PYTHON=/usr/bin/python3 at configure time. This only works when building components that have been ported to Python 3.

Configuring with "--extra-python=/usr/bin/python3" will build a copy of the Python 3 bindings in addition to the ones for Python 2. These will get tested with make test.

To support this, the wscript needs to be changed. In the configure step, use SAMBA_CHECK_PYTHON instead of calling find_program, check_tool and check_python_version individually.

Then, for each

bld.SAMBA_PYTHON('<modulename>',
                 ...,
                 realname='<modulename>.so')

create a corresponding call for the "extrapython" module:

if bld.env['EXTRA_PYTHON']:
    bld.SAMBA_PYTHON('extra-<modulename>',
                     ...,
                     realname='<modulename>.so',
                     extra_python=True)

For non-module libraries that use the Python runtime, i.e. those built with "pyfeature='pyembed'", use the pyembed_libname helper to derive the filename. This name must then be used by all dependencies. For example:

name = bld.pyembed_libname('<modulename>')
bld.SAMBA_LIBRARY(name,
                  pyfeature='pyembed',
                  pc_files='<modulename>.pc',
                  ...)

Then, also build a version for "extrapython":

if bld.env['EXTRA_PYTHON']:
    name = bld.pyembed_libname('<modulename>', extrapython=True)
    bld.SAMBA_LIBRARY(name,
                      pyfeature='extrapyembed',
                      pc_files=None,
                      ...)

The @PYTHON_SO_ABI_FLAG@ conf variable is available for use in templates, such as .pc.in files for pkgconfig. It should be affixed to "pyembed" library names.

Testing Python libraries

The function samba_utils.RUN_PYTHON_TESTS will run given test scripts. When building for two Python versions, it'll test on both.

References

The "Migrating C extensions" chapter from Lennart Regebro's book "Porting to Python 3" is a good general guide.

The Python documentation contains a C porting guide as well.