[Translation] Pointers in Python: what’s the point?




If you have ever worked with low-level languages like C or C++, you have probably heard of pointers. They can make some pieces of code much more efficient. But they can also confuse beginners, and even experienced developers, and lead to memory management bugs. Are there pointers in Python, and can you emulate them somehow?

Pointers are widely used in C and C++. In essence, they are variables that hold the memory addresses of other variables. To refresh your knowledge of pointers, read this review.

After reading this article, you will better understand the Python object model and find out why pointers do not really exist in the language. For the cases where you need to imitate the behavior of pointers, you will learn how to emulate them without the accompanying memory management nightmare.

In this article, you will:

  • Find out why there are no pointers in Python.
  • Learn the difference between C variables and names in Python.
  • Learn how to emulate pointers in Python.
  • Use ctypes to experiment with real pointers.

Note: In this article, “Python” refers to the reference implementation of Python in C, known as CPython. All discussion of the language internals is valid for CPython 3.7, but may not hold for later versions.

Why are there no pointers in Python?


The truth is that I don't know. Could pointers exist natively in Python? Probably, but they seem to go against the Zen of Python, because they encourage implicit changes rather than explicit ones. Pointers are often complex, especially for beginners. Worse, they push you toward bad decisions or toward doing something really dangerous, like reading from a memory area you should not read from.

Python tries to abstract implementation details, such as memory addresses, away from the user. The language often favors ease of use over speed. As a result, pointers in Python do not make much sense. But don't worry: by default, the language still gives you some of the benefits of using pointers.

To understand pointers in Python, let's take a quick look at some details of the language implementation. In particular, you need to understand:

  1. What mutable and immutable objects are.
  2. How variables/names work in Python.

Hold on to your memory addresses, let's go!

Objects in Python


Everything in Python is an object. For example, open a REPL and see how isinstance() is used:

    >>> isinstance(1, object)
    True
    >>> isinstance(list(), object)
    True
    >>> isinstance(True, object)
    True
    >>> def foo():
    ...     pass
    ...
    >>> isinstance(foo, object)
    True

This code demonstrates that everything in Python really is an object. Each object contains at least three pieces of data:

  • Reference count.
  • Type.
  • Value.

The reference count is used for memory management. For details, see Memory Management in Python. The type is used at the CPython level to ensure type safety at runtime. And the value is the actual value associated with the object.
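As a rough illustration of these three pieces of data, the sketch below inspects a list from Python code. It assumes CPython, where sys.getrefcount() reports the reference count (the number it returns includes a temporary reference made by the call itself). The names data and alias are only illustrative.

```python
import sys

data = [1, 2, 3]             # illustrative object

# Type: every object knows its own type.
print(type(data))            # <class 'list'>

# Value: the contents the object holds.
print(data)                  # [1, 2, 3]

# Reference count (CPython-specific): binding another name adds one reference.
before = sys.getrefcount(data)
alias = data
after = sys.getrefcount(data)
print(after - before)        # 1
```

Only the difference between the two counts is meaningful here; the absolute numbers depend on how the interpreter happens to hold references at that moment.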

But not all objects are the same. There is one important distinction: objects are either mutable or immutable. Understanding this difference will help you peel back the first layer of the onion known as “Python pointers.”

Mutable and immutable objects


There are two kinds of objects in Python:

  1. Immutable objects (cannot be changed);
  2. Mutable objects (can be changed).

Understanding this difference is the first key to navigating the world of pointers in Python. Here is the immutability of some popular types:

    Type         Immutable?
    int          Yes
    float        Yes
    bool         Yes
    complex      Yes
    tuple        Yes
    frozenset    Yes
    str          Yes
    list         No
    set          No
    dict         No

As you can see, many of the commonly used primitive types are immutable. You can check this by writing some Python code. You will need two built-in tools:

  1. id() returns the memory address of the object;
  2. is returns True if and only if two objects have the same memory address.

You can run this code in a REPL:

    >>> x = 5
    >>> id(x)
    94529957049376

Here we set the variable x to 5. If you try to change the value using addition, you will get a new object:

    >>> x += 1
    >>> x
    6
    >>> id(x)
    94529957049408

Although it may seem that this code simply changes the value of x, you actually get a new object in return.

The str type is also immutable:

    >>> s = "real_python"
    >>> id(s)
    140637819584048
    >>> s += "_rocks"
    >>> s
    'real_python_rocks'
    >>> id(s)
    140637819609424

In this case too, s ends up with a different memory address after the += operation.

Bonus: The += operator translates into different method calls.

For some objects, such as list, += translates into __iadd__() (in-place add). This method modifies the object itself and returns it, so the ID stays the same. However, str and int do not have this method, so __add__() is called instead of __iadd__().

Learn more about this in the Python data model documentation.
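The difference between the two method calls can be checked directly. The following is a small sketch with illustrative names (numbers, text, original); it only relies on the behavior described above.

```python
numbers = [1, 2]
before = id(numbers)
numbers += [3]                  # list defines __iadd__, so it is extended in place
print(id(numbers) == before)    # True

text = "real_python"
original = text
text += "_rocks"                # str has no __iadd__, so __add__ builds a new string
print(text is original)         # False
print(original)                 # real_python

# Which types define the in-place method at all?
print(hasattr(list, "__iadd__"))   # True
print(hasattr(str, "__iadd__"))    # False
print(hasattr(int, "__iadd__"))    # False
```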

If we try to change the string s directly, we get an error:

    >>> s[0] = "R"
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    TypeError: 'str' object does not support item assignment

The above code fails, and Python reports that str does not support this kind of change, which is consistent with the str type being immutable.
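If you do need a string with a different first character, the usual approach is to build a new string and rebind the name. A minimal sketch (the name original is only there to show that the old object is untouched):

```python
s = "real_python"
original = s

# Build a new string from the pieces instead of assigning to s[0].
s = "R" + s[1:]

print(s)          # Real_python
print(original)   # real_python
```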

Compare this with a mutable object, for example a list:

    >>> my_list = [1, 2, 3]
    >>> id(my_list)
    140637819575368
    >>> my_list.append(4)
    >>> my_list
    [1, 2, 3, 4]
    >>> id(my_list)
    140637819575368

This code demonstrates the main difference between the two kinds of objects. Initially, my_list has an ID. Even after 4 is appended to the list, my_list still has the same ID. The reason is that the list type is mutable.

Here is another demonstration of list mutability, this time with assignment:

    >>> my_list[0] = 0
    >>> my_list
    [0, 2, 3, 4]
    >>> id(my_list)
    140637819575368

In this code, we changed my_list by setting its first element to 0. The list nevertheless kept the same ID after this operation. The next step on our path to understanding Python is exploring how it handles variables.

Understanding Variables


Variables in Python are fundamentally different from variables in C and C++. In fact, Python does not have variables at all. It has names instead.

This may sound pedantic, and for the most part it is. Most of the time, you can think of Python names as variables, but you need to understand the difference. This is especially important when studying a topic as tricky as pointers.

To make this easier to understand, let's see how variables work in C and what they represent, and then compare that with how names work in Python.

Variables in C


Take the code that defines the variable x:

    int x = 2337;

Executing this short line involves several distinct steps:

  1. Allocating enough memory for an integer.
  2. Assigning the value 2337 to that memory location.
  3. Indicating that x points to that value.

Simplified memory may look like this:



Here, the variable x has the fake memory address 0x7f1 and the value 2337. If you later want to change the value of x, you can do so:

    x = 2338;

This code assigns the new value 2338 to the variable x, overwriting the previous value. This means that the variable x is mutable. The updated memory layout for the new value:



Note that the location of x has not changed, only the value itself. This is important. It tells us that x is the memory location, not just a name for it.

You can also think about this in terms of ownership. In a sense, x owns the memory location. At first, x is an empty box that can hold exactly one integer.

When you assign a value to x, you put the value in the box that x owns. If you want to introduce a new variable y, you can add this line:

    int y = x;

This code creates a new box called y and copies the value from x into it. Now the memory layout looks like this:



Notice the new location of y: 0x7f5. Although the value of x was copied to y, the variable y owns a new address in memory. Therefore, you can overwrite the value of y without affecting x:

    y = 2339;

Now the memory layout looks like this:



Again, you changed the value of y, but not its location. In addition, you did not affect the original variable x.
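The same box-like behavior can be observed from Python with the standard ctypes module, which exposes C-style integers. This is only a sketch for illustration: ctypes.c_int and ctypes.addressof() are real ctypes calls, while the names x, y, and address_before simply mirror the C example.

```python
import ctypes

x = ctypes.c_int(2337)           # a C int living in its own block of memory
address_before = ctypes.addressof(x)

x.value = 2338                   # overwrite the value in place, like `x = 2338;` in C
print(ctypes.addressof(x) == address_before)   # True: same location, new value

y = ctypes.c_int(x.value)        # copy the value into a second box, like `int y = x;`
y.value = 2339                   # changing y leaves x alone
print(x.value, y.value)          # 2338 2339
print(ctypes.addressof(x) != ctypes.addressof(y))  # True: two separate locations
```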

With names in Python, the situation is completely different.

Names in Python


Python does not have variables; it has names. You can use the term “variables” if you like, but it is important to know the difference between variables and names.

Let's take the equivalent code from the C example above and write it in Python:

    >>> x = 2337

As in C, executing this code involves several distinct steps:

  1. A PyObject is created.
  2. The PyObject's typecode is set to integer.
  3. The PyObject's value is set to 2337.
  4. The name x is created.
  5. x is pointed at the new PyObject.
  6. The PyObject's reference count is incremented by 1.

Note: PyObject is not the same thing as Python's object. It is specific to CPython and is the base structure of all Python objects.

PyObject is defined as a C struct, so if you are wondering why you cannot access the typecode or the reference count directly, the reason is that you do not have direct access to these structures. Calling functions like sys.getrefcount() can help you peek at some of the internals.

In memory, it might look like this:



Here, the memory layout is very different from the C layout shown above. Instead of x owning the block of memory in which 2337 is stored, the newly created Python object owns the memory in which 2337 lives. The Python name x does not directly own any memory address, the way the C variable x owned a static slot in memory.

If you want to assign x a new value, try this code:

    >>> x = 2338

What happens here differs from what happens in C, but is not too different from the original binding in Python.

In this code:

  • A new PyObject is created.
  • The PyObject's typecode is set to integer.
  • The PyObject's value is set to 2338.
  • x is pointed at the new PyObject.
  • The reference count of the new PyObject is incremented by 1.
  • The reference count of the old PyObject is decremented by 1.

Now the memory layout looks like this:



This illustration shows that x points to a reference to an object and does not own the memory area as before. You can also see that the statement x = 2338 is not an assignment, but rather a binding of the name x to a reference.

In addition, the previous object (which held 2337) now sits in memory with a reference count of 0 and will be cleaned up by the garbage collector.
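You can watch this cleanup happen with a weak reference, which observes an object without keeping it alive. The sketch below assumes CPython, where an object is normally freed as soon as its reference count drops to zero. Because int objects do not support weak references, it uses a hypothetical Box class as a stand-in.

```python
import weakref

class Box:
    """Hypothetical stand-in for any object that supports weak references."""
    def __init__(self, value):
        self.value = value

x = Box(2337)
watcher = weakref.ref(x)     # observes the object without adding a reference

print(watcher() is x)        # True: the first object is still alive

x = Box(2338)                # rebinding x drops the last reference to the first Box

print(watcher() is None)     # True on CPython: the old object is already gone
```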

You can introduce a new name y, as in the C example:

    >>> y = x

A new name appears in memory, but not necessarily a new object:



Now you can see that no new Python object was created, only a new name that points to the same object. In addition, the object's reference count increased by 1. You can check the identity of the objects to confirm that they are the same:

    >>> y is x
    True

This code shows that x and y are the same object. But make no mistake: y is still immutable. For example, you can perform an addition on y:

    >>> y += 1
    >>> y is x
    False

After the addition, you are returned a new Python object. Now the memory looks like this:



A new object has been created, and y now points to it. Interestingly, we would end up in exactly the same final state if we had bound y to 2339 directly:

    >>> y = 2339

After this statement, we arrive at the same final memory state as with the addition. To recap: in Python you do not assign variables, you bind names to references.
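The consequence of binding names to references becomes most visible with a mutable object: two names for the same list both see a change, whereas += on an int simply rebinds one name. A short sketch with illustrative names:

```python
x = [2337]
y = x                 # y is a second name for the same list
y.append(2338)        # mutate the shared object through y
print(x)              # [2337, 2338]
print(y is x)         # True

a = 2337
b = a
b += 1                # int is immutable: b is rebound to a new object
print(a, b)           # 2337 2338
```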

About interned objects in Python


Now you understand how new objects are created in Python and how names are bound to them. It's time to talk about interned objects.

Consider the following Python code:

    >>> x = 1000
    >>> y = 1000
    >>> x is y
    True

As before, x and y are names pointing to the same Python object. But the object holding the value 1000 is not guaranteed to always have the same memory address. For example, if you add two numbers to get 1000, you will end up with a different address:

    >>> x = 1000
    >>> y = 499 + 501
    >>> x is y
    False

This time, the line x is y returns False. If this is confusing, don't worry. Here is what happens when this code is executed:

  1. A Python object is created (1000).
  2. The name x is bound to it.
  3. A Python object is created (499).
  4. A Python object is created (501).
  5. These two objects are added together.
  6. A new Python object is created (1000).
  7. The name y is bound to it.

Technical detail: the steps described above take place only when this code is executed inside the REPL. If you take the example above, paste it into a file, and run the file, the line x is y returns True.

The reason is that the CPython compiler is smart: it performs peephole optimizations that try to save execution steps wherever possible. Details can be found in the source code of the CPython peephole optimizer.
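One way to see this optimization at work is to compile a statement yourself and look at the constants stored in the resulting code object. This is a sketch that assumes a CPython version which folds constant expressions at compile time; compile() and the co_consts attribute are standard, while the "<demo>" file name and the namespace dictionary are arbitrary choices.

```python
# Compile the statement without running it.
code = compile("y = 499 + 501", "<demo>", "exec")

# If the addition was folded at compile time, the code object
# already stores the result 1000 as a constant.
print(1000 in code.co_consts)

# Running the compiled code then only has to bind the name y.
namespace = {}
exec(code, namespace)
print(namespace["y"])   # 1000
```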

But isn't this wasteful? Well, yes, but that is the price you pay for all the great benefits of Python. You never have to worry about cleaning up these intermediate objects, or even need to know that they exist! The nice thing is that these operations are relatively fast, and you never had to know about them until now.

The creators of Python wisely noticed this overhead and decided to make a few optimizations. The result is behavior that may surprise newcomers:

    >>> x = 20
    >>> y = 19 + 1
    >>> x is y
    True

This example uses almost the same code as above, except that this time the result is True. This is the result of interned objects. Python pre-creates a certain subset of objects in memory and keeps them in the global namespace for everyday use.

Which objects are interned depends on the Python implementation. In CPython 3.7, the interned objects are:

  1. Integers from -5 to 256.
  2. Strings that contain only ASCII letters, digits, or underscores.

This is done because these values are used very often in many programs. By interning them, Python avoids allocating memory again and again for constantly used objects.
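The integer cache can be observed without involving the compiler by creating the numbers at runtime. The sketch below assumes CPython and builds the integers with int() from strings so that no compile-time optimization interferes; the variable names are illustrative.

```python
small_a = int("20")
small_b = int("20")
print(small_a is small_b)    # True on CPython: 20 is inside the cached range

big_a = int("1000")
big_b = int("1000")
print(big_a is big_b)        # False on CPython: each call builds a new object
print(big_a == big_b)        # True: the values are still equal
```

This is also a reminder to compare values with == and to reserve is for identity checks.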

Strings that are shorter than 20 characters and contain only ASCII letters, digits, or underscores will be interned, because they are assumed to be used as identifiers:

    >>> s1 = "realpython"
    >>> id(s1)
    140696485006960
    >>> s2 = "realpython"
    >>> id(s2)
    140696485006960
    >>> s1 is s2
    True

Here, s1 and s2 point to the same address in memory. If we used a character other than an ASCII letter, digit, or underscore, we would get a different result:

    >>> s1 = "Real Python!"
    >>> s2 = "Real Python!"
    >>> s1 is s2
    False

This example uses an exclamation mark, so the strings are not interned and are different objects in memory.

Bonus: If you want these objects to refer to the same interned object, you can use sys.intern(). One use of this function is described in the documentation:

Interning strings is useful to gain a little performance on dictionary lookup: if the keys in a dictionary are interned and the lookup key is interned, the key comparisons (after hashing) can be done by a pointer compare instead of a string compare. (Source)
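Here is a small sketch of sys.intern() in action. The strings are assembled at runtime with str.join() so that they start out as two separate objects on CPython; the names s1, s2, i1, and i2 are illustrative.

```python
import sys

# Build two equal strings at runtime so they are distinct objects.
s1 = "".join(["Real", " ", "Python!"])
s2 = "".join(["Real", " ", "Python!"])
print(s1 == s2)    # True
print(s1 is s2)    # False

# sys.intern() returns the single interned copy for a given value.
i1 = sys.intern(s1)
i2 = sys.intern(s2)
print(i1 is i2)    # True
```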

Interned objects often confuse programmers. Just remember that, when in doubt, you can always use id() and is to check whether two names refer to the same object.

Emulation of pointers in Python


The fact that Python has no native pointers does not mean that you cannot get the benefits of pointers. There are actually several ways to emulate pointers in Python. Here we look at two of them:

  1. Using mutable types as pointers.
  2. Using custom Python objects.

Using mutable types as pointers


You already know what mutable types are. It is their mutability that lets us emulate the behavior of pointers. Suppose you need to replicate this code:

    void add_one(int *x) {
        *x += 1;
    }

This code takes a pointer to an integer (*x) and increments the value by 1. Here is the main function that runs the code:

    #include <stdio.h>

    int main(void) {
        int y = 2337;
        printf("y = %d\n", y);
        add_one(&y);
        printf("y = %d\n", y);
        return 0;
    }

In this snippet, we set y to 2337, print the current value, increment it by 1, and then print the new value. The output is:

    y = 2337
    y = 2338

One way to replicate this behavior in Python is to use a mutable type. For example, use a list and modify its first element:

    >>> def add_one(x):
    ...     x[0] += 1
    ...
    >>> y = [2337]
    >>> add_one(y)
    >>> y[0]
    2338

Here, add_one(x) accesses the first element and increments its value by 1. Using a list means that the caller sees the modified value. So does this mean there are pointers in Python? No. This behavior is possible only because list is a mutable type. If you try to use a tuple, you get an error:

    >>> z = (2337,)
    >>> add_one(z)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
      File "<stdin>", line 2, in add_one
    TypeError: 'tuple' object does not support item assignment

This code demonstrates that tuple is immutable, so it does not support item assignment.
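To see why the list is needed at all, it helps to compare the list version with a naive version that takes a plain int. In this sketch the function names add_one_rebind and add_one_in_place are made up for the comparison.

```python
def add_one_rebind(n):
    n += 1             # rebinds the local name n; the caller's object is untouched


def add_one_in_place(box):
    box[0] += 1        # mutates the list that the caller also refers to


value = 2337
add_one_rebind(value)
print(value)           # 2337

boxed = [2337]
add_one_in_place(boxed)
print(boxed[0])        # 2338
```

The one-element list plays the role of the memory cell that a C pointer would refer to.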

list is not the only mutable type; pointers are also often emulated with dict.

Suppose you have an application that needs to keep track of every time an interesting event happens. One way to do this is to create a dictionary and use one of its items as a counter:

    >>> counters = {"func_calls": 0}
    >>> def bar():
    ...     counters["func_calls"] += 1
    ...
    >>> def foo():
    ...     counters["func_calls"] += 1
    ...     bar()
    ...
    >>> foo()
    >>> counters["func_calls"]
    2

In this example, the counters dictionary is used to track the number of function calls. After calling foo(), the counter has increased to 2, as expected. This works because dict is mutable.

Remember, this only emulates pointer behavior; it has nothing to do with real pointers in C and C++. If anything, these operations are more expensive than the equivalent ones in C or C++.

Using Python Objects


dict is a great way to emulate pointers in Python, but sometimes it gets tedious to remember which key name you used, especially if you use the dictionary in different parts of the application. This is where a custom Python class can help.

Suppose you need to track metrics in an application. A great way to abstract away the annoying details is to create a class:

    class Metrics(object):
        def __init__(self):
            self._metrics = {
                "func_calls": 0,
                "cat_pictures_served": 0,
            }

This code defines the Metrics class. It still uses a dictionary to hold the actual data, which lives in the _metrics member variable. This gives you the mutability you need. Now you just need a way to access these values. You can do this with properties:

    class Metrics(object):
        # ...

        @property
        def func_calls(self):
            return self._metrics["func_calls"]

        @property
        def cat_pictures_served(self):
            return self._metrics["cat_pictures_served"]

Here we use @property. If you are not familiar with decorators, read the article Primer on Python Decorators. In this case, the @property decorator lets you access func_calls and cat_pictures_served as if they were attributes:

    >>> metrics = Metrics()
    >>> metrics.func_calls
    0
    >>> metrics.cat_pictures_served
    0

Being able to access these names as attributes means that you have abstracted away the fact that the values are stored in a dictionary. You have also made the attribute names more explicit. Of course, you need to be able to increment the values as well:

    class Metrics(object):
        # ...

        def inc_func_calls(self):
            self._metrics["func_calls"] += 1

        def inc_cat_pics(self):
            self._metrics["cat_pictures_served"] += 1

We introduced two new methods:

  1. inc_func_calls()
  2. inc_cat_pics()

They modify the values in the metrics dictionary. You now have a class that can be modified in the same way as a pointer:

    >>> metrics = Metrics()
    >>> metrics.inc_func_calls()
    >>> metrics.inc_func_calls()
    >>> metrics.func_calls
    2

You can read func_calls and call inc_func_calls() in different parts of the application and thereby emulate pointers in Python. This is useful when you have something like metrics that must be read and updated frequently in different parts of an application.

Note: In this case, explicitly creating inc_func_calls() and inc_cat_pics() instead of using @property.setter prevents users from setting these values to an arbitrary int or to an invalid value such as a dict.

Here is the complete source code for the Metrics class:

    class Metrics(object):
        def __init__(self):
            self._metrics = {
                "func_calls": 0,
                "cat_pictures_served": 0,
            }

        @property
        def func_calls(self):
            return self._metrics["func_calls"]

        @property
        def cat_pictures_served(self):
            return self._metrics["cat_pictures_served"]

        def inc_func_calls(self):
            self._metrics["func_calls"] += 1

        def inc_cat_pics(self):
            self._metrics["cat_pictures_served"] += 1
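The protection offered by leaving out a setter can be checked with a quick experiment. The sketch below uses a trimmed copy of the Metrics class with a single counter so that it runs on its own; the blocked flag exists only for the demonstration.

```python
class Metrics:
    """Trimmed copy of the Metrics class, kept short for the demonstration."""

    def __init__(self):
        self._metrics = {"func_calls": 0}

    @property
    def func_calls(self):
        return self._metrics["func_calls"]

    def inc_func_calls(self):
        self._metrics["func_calls"] += 1


metrics = Metrics()
metrics.inc_func_calls()

try:
    metrics.func_calls = 100       # no setter was defined for this property
except AttributeError as error:
    blocked = True
    print("assignment rejected:", error)
else:
    blocked = False

print(metrics.func_calls)          # 1
```

The counter can therefore only move through inc_func_calls(), which is the point of not defining a setter.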

Real pointers using ctypes


Maybe there are pointers in Python after all, specifically in CPython? With the built-in ctypes module, you can create real C-style pointers. If you are not familiar with ctypes, you can read the article Extending Python With C Libraries and the "ctypes" Module.

You may need this when you have to call a C library function that requires pointers. Let's return to the add_one() C function mentioned above:

    void add_one(int *x) {
        *x += 1;
    }

Recall that this code increments the value that x points to by 1. To use it, we first compile the code into a shared object. Assuming the file above is saved as add.c, you can do this with gcc:

    $ gcc -c -Wall -Werror -fpic add.c
    $ gcc -shared -o libadd1.so add.o

The first command compiles the C source file into the object file add.o. The second command takes this unlinked object file and produces the shared object libadd1.so.

libadd1.so should now be in your current directory. You can use ctypes to load it into Python:

    >>> import ctypes
    >>> add_lib = ctypes.CDLL("./libadd1.so")
    >>> add_lib.add_one
    <_FuncPtr object at 0x7f9f3b8852a0>

The ctypes.CDLL call returns an object that represents the shared library libadd1. Since you defined add_one() in it, you can access this function as if it were any other Python object. But before calling the function, you should specify its signature. That way Python can make sure you pass the right type to the function.

In our case, the function signature is a pointer to an integer, and ctypes lets you specify it with the following code:

    >>> add_one = add_lib.add_one
    >>> add_one.argtypes = [ctypes.POINTER(ctypes.c_int)]

Here we set the function signature to match what C expects. Now, if we try to call this code with the wrong type, we get a helpful error instead of undefined behavior:

    >>> add_one(1)
    Traceback (most recent call last):
      File "<stdin>", line 1, in <module>
    ctypes.ArgumentError: argument 1: <class 'TypeError'>: \
    expected LP_c_int instance instead of int

Python raises an error explaining that add_one() expects a pointer, not just an integer. Fortunately, ctypes has a way to pass pointers to such functions. First, we declare a C-style integer:

    >>> x = ctypes.c_int()
    >>> x
    c_int(0)

Here we created an integer x with the value 0. ctypes provides the handy byref() function, which lets you pass a variable by reference.

Note: Passing by reference is the opposite of passing a variable by value.

When passing by reference, you pass a reference to the original variable, so modifications are reflected in it. When passing by value, the function receives a copy of the original variable, and modifications to that copy do not affect the original.

To call add_one(), you can use this code:

    >>> add_one(ctypes.byref(x))
    998793640
    >>> x
    c_int(1)

Great! Your integer was incremented by 1. Congratulations, you have successfully used real pointers in Python.
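If you only want to experiment and do not have a compiler at hand, ctypes can also create and dereference pointers without any shared library. The following sketch uses ctypes.pointer() and the contents attribute, both part of the standard ctypes API; it does not call the add_one() C function, it just repeats the same increment from the Python side.

```python
import ctypes

x = ctypes.c_int(2337)
ptr = ctypes.pointer(x)        # a real pointer to the memory that x occupies

ptr.contents.value += 1        # dereference and modify, like `*x += 1;` in C
print(x.value)                 # 2338

ptr[0] = 5000                  # pointers also support C-style indexing
print(x.value)                 # 5000
```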

Conclusion


Now you have a better understanding of the relationship between Python objects and pointers. Although some of the distinctions between names and variables may seem pedantic, grasping these core terms improves your understanding of how Python handles variables.

We also learned some ways to emulate pointers in Python:

  • Using mutable objects as low-overhead pointers.
  • Creating custom Python objects for ease of use.
  • Unlocking real pointers with the ctypes module.

These methods allow you to emulate pointers in Python without sacrificing the memory safety that the language provides.
