Insecure Deserialization Attacks with Python Pickle Module

The words "insecure deserialization" might sound harmless when put together, but they can be a nightmare for Python application developers.

An insecure deserialization attack can wreak havoc on your Python application, compromising its security and leaving your sensitive data at risk.

Serialization and deserialization are meant to work as a pair: the object you get back after deserialization should be a replica of the original item prior to serialization. This blog sheds light on what happens when that process is abused, and arms you with the knowledge to defend your Python application from insecure deserialization attacks.

Without further ado, let’s get right into it!

What is Serialization/Deserialization?

Serialization is the process of converting the state of a data object into a byte stream. You can store this byte stream in any file-like object, such as a file on disk or an in-memory stream.

Deserialization is the process of reconstructing a data structure or object from a series of bytes or a string in order to instantiate the object for consumption. It is the reverse of serialization, which converts a data structure or object into a series of bytes for storage or transmission across devices.
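To make the round trip concrete, here is a minimal sketch using the pickle module that the rest of this post focuses on. The sample dictionary is made up for illustration; any picklable object behaves the same way.

```python
import pickle

# Hypothetical sample data; any picklable object works the same way.
original = {"user": "alice", "roles": ["admin", "dev"], "active": True}

# Serialization: object -> byte stream
blob = pickle.dumps(original)

# Deserialization: byte stream -> a new object with the same state
restored = pickle.loads(blob)

print(type(blob).__name__)   # bytes
print(restored == original)  # True
print(restored is original)  # False
```

The restored object is equal to the original but is a separate copy, which is exactly what "a replica of the original" means here.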

Serialization and Deserialization in Python

Python provides three primary modules for serialization and deserialization:

Marshal Module: This is the oldest module, used primarily to read and write Python’s compiled bytecode. Although it can serialize objects, it’s generally not recommended due to potential format changes that may affect compatibility.
Pickle Module: This module serializes and deserializes Python objects into a binary format. Although this format isn’t human-readable, it is efficient and supports custom objects. The Pickle module is used exclusively for Python objects and does not facilitate data exchange between different programming languages. Its main drawbacks are security (unpickling untrusted data can execute arbitrary code) and the lack of interoperability. Key functions include:

pickle.dumps() pickles (serializes) data. It takes the object to be pickled as its argument and returns the result as a bytes object.
pickle.loads() unpickles (deserializes) data. It takes a bytes-like object containing the pickled byte stream and returns the reconstructed object.

JSON Module: This newer module works with JSON, a lightweight and human-readable data format that supports interoperability with other languages. It handles standard Python types such as bool, dict, int, float, list, str, tuple, and None (tuples are written as JSON arrays, so they come back as lists). JSON is ideal for scenarios requiring cross-language data exchange.
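The differences between the three modules are easiest to see side by side. The following is a small sketch, not a benchmark; the record dictionary is a made-up example.

```python
import json
import marshal
import pickle

# Hypothetical record containing a tuple, which JSON has no direct type for.
record = {"point": (1, 2), "name": "demo"}

via_pickle = pickle.loads(pickle.dumps(record))
via_marshal = marshal.loads(marshal.dumps(record))
via_json = json.loads(json.dumps(record))

print(via_pickle["point"])   # (1, 2)
print(via_marshal["point"])  # (1, 2)
print(via_json["point"])     # [1, 2]

# JSON only covers a handful of basic types: a set is rejected,
# while pickle round-trips it without complaint.
try:
    json.dumps({"tags": {"a", "b"}})
    json_accepts_set = True
except TypeError:
    json_accepts_set = False

pickle_set = pickle.loads(pickle.dumps({"a", "b"}))
print(json_accepts_set)  # False
print(pickle_set == {"a", "b"})  # True
```

That flexibility is what makes pickle convenient, and, as the next sections show, it is also what makes it dangerous with untrusted input.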

Each module serves different needs, so your choice will depend on your specific requirements for data handling and compatibility. Here, we will discuss the vulnerabilities associated with the Pickle module, as well as exploitation and mitigation strategies.

Vulnerability Overview

Insecure deserialization happens when the data being deserialized abuses the application’s logic to perform unintended tasks. These tasks range from Denial of Service (DoS) attacks to spawning a reverse shell or executing arbitrary code on the target.

The pickle module implements binary protocols for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.

In the walkthrough that follows, the goal is to achieve Remote Code Execution (RCE) through this weakness.
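Why can a byte stream run code at all? A pickle can record "call this callable with these arguments" as the recipe for rebuilding an object, and a class controls that recipe through its __reduce__ method. The sketch below uses a deliberately harmless callable (os.getcwd); the Demo class is a hypothetical name used only for illustration.

```python
import os
import pickle

class Demo:
    # Hypothetical class for illustration. __reduce__ tells pickle how to
    # rebuild the object: "call this callable with these arguments".
    def __reduce__(self):
        return (os.getcwd, ())

payload = pickle.dumps(Demo())

# No Demo instance comes back. Unpickling calls os.getcwd() and
# returns whatever that call returns.
result = pickle.loads(payload)
print(type(result).__name__)  # str
```

Swap the harmless callable for something like os.system and the same mechanism becomes code execution, which is what the exploitation section relies on.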

Exploitation

For the exploitation part, we are going to use OWASP SKF Lab. We’ll need to install Docker on our machine to set it up.

To pull and run the lab image, use the following commands:

$ sudo docker pull blabla1337/owasp-skf-lab:des-pickle
$ sudo docker run --rm -ti -p 5000:5000 blabla1337/owasp-skf-lab:des-pickle

Now that the app is running, let’s go for exploitation!

Step 1: Navigate to the URL http://0.0.0.0:5000/ in any browser. Make sure your browser is configured to send its traffic through the Burp proxy.

Step 2: Now open the exploit.py file available on GitHub in any text editor.

Observation: This code creates a pickle-serialized object that calls os.system with sleep 10 as its argument. Deserializing this object should make the system sleep for 10 seconds.
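The lab's exploit.py is not reproduced here. As a rough sketch of what a script matching that description could look like, consider the following. The class name, the build_payload helper, and the base64 encoding step are assumptions made for illustration, so check the actual file for the encoding the lab expects. Building the payload does not run the command; only unpickling it does.

```python
import base64
import os
import pickle

class Exploit:
    # Hypothetical class name. When this object is unpickled, pickle will
    # call os.system("sleep 10") instead of rebuilding an Exploit instance.
    def __reduce__(self):
        return (os.system, ("sleep 10",))

def build_payload() -> bytes:
    # Assumption: the target accepts the pickle as base64 text.
    return base64.b64encode(pickle.dumps(Exploit()))

if __name__ == "__main__":
    print(build_payload().decode())
```

Only use a payload like this against systems you are authorized to test, such as the local lab container.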

Step 3: Run the exploit.py file in the terminal and observe the output.

Step 4: Copy the output of the previous step, paste it into the text field, and click the Submit button. Capture the request in Burp Suite.

Step 5: Send the captured request to Repeater and click ‘Send’. Observe the 10-second delay in the response.

Observation: This confirms that the sleep 10 command was executed successfully, and Remote Code Execution was achieved.

Mitigation

Mitigating insecure deserialization in Python comes down to two main aspects: how you treat user input, and whether you verify a digital signature on serialized data before deserializing it. The methodologies below explain both in more detail.

1. User-supplied data, such as URL parameters, POST data payloads, and cookies, must always be regarded as untrusted. Never unpickle data received from an unauthenticated or untrusted source.

To validate the data, add a signature to the serialized data at the moment you write it out, then check that signature before deserializing. In other words, generate a digital signature when you write out a byte stream for later use, and verify that signature when reading the objects back in.
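One way to sketch this sign-then-verify idea is with an HMAC from the standard library. Strictly speaking, an HMAC is a keyed message authentication code rather than a public-key digital signature, but it serves the same purpose when the writer and reader share a secret. The function names and the tag-then-data layout below are assumptions made for illustration.

```python
import hashlib
import hmac
import pickle

DIGEST_SIZE = hashlib.sha256().digest_size  # 32 bytes for SHA-256

def dump_signed(obj, key: bytes) -> bytes:
    """Pickle obj and prepend an HMAC-SHA256 tag computed with key."""
    data = pickle.dumps(obj)
    tag = hmac.new(key, data, hashlib.sha256).digest()
    return tag + data

def load_signed(blob: bytes, key: bytes):
    """Verify the tag first; unpickle only if it matches."""
    tag, data = blob[:DIGEST_SIZE], blob[DIGEST_SIZE:]
    expected = hmac.new(key, data, hashlib.sha256).digest()
    # compare_digest runs in constant time to avoid leaking the tag.
    if not hmac.compare_digest(tag, expected):
        raise ValueError("signature mismatch; refusing to unpickle")
    return pickle.loads(data)
```

The key must stay secret and must never be derived from user input. Anyone who holds the key can still craft a malicious pickle, so this protects data you wrote yourself and does not make attacker-supplied pickles safe.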

2. The easiest way to avoid deserialization vulnerabilities is to avoid native object serialization formats such as pickle altogether. If you need to accept structured data from an HTTP request, XML or JSON are more common formats and less prone to malicious use. YAML files can be deserialized securely by turning off the support for custom classes with the yaml.SafeLoader class.
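As a minimal sketch of the data-only approach, the example below parses JSON with the standard library and checks the resulting types before using them. The parse_profile function and its name and age fields are hypothetical.

```python
import json

def parse_profile(raw: str) -> dict:
    """Parse a JSON object and keep only the fields we expect."""
    data = json.loads(raw)  # yields only dict, list, str, numbers, bool, None
    if not isinstance(data, dict):
        raise ValueError("expected a JSON object")
    name = data.get("name")
    age = data.get("age")
    # bool is a subclass of int in Python, so exclude it explicitly.
    if not isinstance(name, str) or not isinstance(age, int) or isinstance(age, bool):
        raise ValueError("unexpected field types")
    return {"name": name, "age": age}

print(parse_profile('{"name": "alice", "age": 30}'))
# {'name': 'alice', 'age': 30}
```

With PyYAML, the analogous safe call is yaml.load(stream, Loader=yaml.SafeLoader), for which yaml.safe_load(stream) is a shorthand. Both build only plain data types and refuse YAML tags that would construct arbitrary Python objects.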

Conclusion

Although this article is specific to Python (and, in the PyYAML example, specific to a Python library), it’s important to note that this is certainly not a problem that is limited to Python. Applications written in Java, PHP, ASP.NET, and other languages can also be susceptible to insecure deserialization vulnerabilities.

Serialization and deserialization vary greatly depending on the programming language, serialization formats, and software libraries used. As a result, there is fortunately no ‘one-size-fits-all’ approach to attacking an insecure deserialization vulnerability. That can make the vulnerability harder to find and exploit, but it does not make it any less dangerous.
