The words “Insecure Deserialization” might sound harmless, but they can be a nightmare for Python application developers.
An Insecure Deserialization attack can wreak havoc on your Python application, compromising its security and leaving your sensitive data at risk.
Serialization and deserialization together are meant to ensure that the reconstructed object is a faithful replica of the original item as it was before serialization. This blog sheds light on the dangers of Insecure Deserialization attacks and arms you with the knowledge to defend your Python application from this threat.
Without further ado, let’s get right into it!
Serialization is the process of converting the state of a data object into a byte stream. You can store this byte stream in any file-like object, such as a file on disk or an in-memory stream.
Deserialization is the process of reconstructing a data structure or object from a series of bytes or a string in order to instantiate the object for consumption. It is the reverse of serialization, which converts a data structure or object into a series of bytes for storage or transmission across devices.
Python’s pickle module is used for serialization and deserialization in Python. The pickle format is Python-specific, so it cannot be used to exchange data with programs written in other languages. A key thing to note is that pickle is well known for its security and interoperability issues.
The following functions are used for serialization and deserialization in Python:
pickle.dumps() is used to pickle (serialize) data. It takes the object to be pickled as its argument and returns a bytes object.
pickle.loads() is used to unpickle (deserialize) data. It takes a bytes-like object containing the pickled byte stream and returns the reconstructed object.
Insecure Deserialization happens when the data being deserialized abuses the application’s logic to perform unintended tasks. These tasks range from causing a Denial of Service (DoS) to spawning a reverse shell or executing arbitrary code on the target.
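As a minimal sketch of the two functions above, the following round-trips a small dictionary through pickle. The `profile` data is made up for illustration, and unpickling is only safe here because the bytes were produced by the same script.

```python
import pickle

# Hypothetical sample data, used only for illustration.
profile = {"user": "alice", "roles": ["admin", "dev"], "active": True}

# Serialize: Python object -> bytes.
blob = pickle.dumps(profile)

# Deserialize: bytes -> a new, equal Python object.
# This is safe only because we created `blob` ourselves.
restored = pickle.loads(blob)

print(type(blob).__name__)   # bytes
print(restored == profile)   # True: same contents
print(restored is profile)   # False: a separate copy
```

The restored object is equal to the original but is a distinct object, which is what "a replica of the original item" means in practice.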
The pickle module implements binary protocols for serializing and de-serializing a Python object structure. “Pickling” is the process whereby a Python object hierarchy is converted into a byte stream, and “unpickling” is the inverse operation whereby a byte stream (from a binary file or bytes-like object) is converted back into an object hierarchy.
The goal is to achieve Remote Code Execution through this weakness.
For the exploitation part, we are going to use OWASP SKF Lab. We’ll need to install Docker on our machine to set it up.
For installation, use the following commands:
Now that the app is running, let’s go for exploitation!
Step 1: Navigate to the URL http://0.0.0.0:5000/ in any browser. Make sure your browser is configured to send its traffic through the Burp Suite proxy.
Step 2: Now go to the exploit.py file available on GitHub and open it in any text editor.
Observation: This code creates a pickle-serialized object that calls os.system with sleep 10 as its argument. Deserializing this object should cause the system to sleep for 10 seconds.
Step 3: Run the exploit.py file in the terminal and observe the output.
Step 4: Copy the output of the previous step, paste it into the text field, and click the Submit button. Capture the request in Burp Suite.
Step 5: Send the captured request to Repeater and click Send. Observe the 10-second delay in the response.
Observation: This proves that the sleep 10 command was executed successfully and that Remote Code Execution was achieved.
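The payload described in Step 2 can be sketched roughly as follows. This is not the lab's actual exploit.py: the class name `SleepPayload` is hypothetical, and the base64 encoding is an assumption about how the form expects its input. The script only builds and prints the payload; it never unpickles it, so nothing is executed locally.

```python
import base64
import os
import pickle


class SleepPayload:
    """Hypothetical stand-in for the class defined in exploit.py."""

    def __reduce__(self):
        # pickle stores this (callable, args) pair in the byte stream.
        # Whoever unpickles the stream ends up calling os.system("sleep 10").
        return (os.system, ("sleep 10",))


# Build the malicious byte stream. Nothing runs at this point.
raw = pickle.dumps(SleepPayload())

# Assumption: the target form takes text, so the bytes are base64-encoded.
encoded = base64.b64encode(raw).decode("ascii")
print(encoded)
```

The key point is that `__reduce__` lets the serialized data choose which callable the unpickler invokes, which is why loading untrusted pickles amounts to running untrusted code.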
Mitigating insecure deserialization in Python comes down to two main aspects: how user input is handled, and whether serialized data is protected by a digital signature. The methodologies below explain how both help prevent deserialization attacks.
1. User-supplied data, such as URL parameters, POST data payloads, and cookies, must always be regarded as untrusted. Never unpickle data received from an unauthenticated or untrusted source.
To validate the data, we first add a signature to the serialized output; then, at the moment of deserializing, we check that signature before loading anything. In other words, generate a digital signature when writing out a byte stream for later use, and verify that signature when reading the objects back in.
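One way to sketch this sign-then-verify flow is with the standard library's hmac module, which uses a keyed hash as the signature. The function names `dumps_signed` and `loads_signed` and the `SECRET_KEY` value are hypothetical; a real application would load the key from protected configuration.

```python
import hashlib
import hmac
import pickle

# Hypothetical key for illustration; never hard-code a real secret.
SECRET_KEY = b"replace-with-a-real-secret"
SIG_LEN = hashlib.sha256().digest_size  # 32 bytes for SHA-256


def dumps_signed(obj) -> bytes:
    """Pickle obj and prepend an HMAC-SHA256 signature of the pickle."""
    payload = pickle.dumps(obj)
    signature = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    return signature + payload


def loads_signed(blob: bytes):
    """Verify the signature first; unpickle only if it matches."""
    signature, payload = blob[:SIG_LEN], blob[SIG_LEN:]
    expected = hmac.new(SECRET_KEY, payload, hashlib.sha256).digest()
    # compare_digest performs a constant-time comparison.
    if not hmac.compare_digest(signature, expected):
        raise ValueError("signature mismatch: refusing to unpickle")
    return pickle.loads(payload)


blob = dumps_signed({"user": "alice"})
print(loads_signed(blob))
```

Because verification happens before `pickle.loads()` is called, a byte stream that was modified or forged without the key is rejected before any of it is deserialized.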
2. The easiest way to avoid deserialization vulnerabilities is to avoid native object serialization altogether. If you need to accept structured data from an HTTP request, XML or JSON are more common formats and less prone to malicious use. YAML files can be deserialized securely by turning off support for custom classes with the yaml.SafeLoader class.
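A minimal sketch of safe YAML loading with PyYAML (a third-party package, assumed to be installed) might look like this. The sample documents are made up for illustration; the second one uses a Python-specific tag that an unsafe loader could turn into a call to os.system.

```python
import yaml  # PyYAML, third-party: pip install pyyaml

# Hypothetical, harmless input containing only plain data types.
untrusted = """
name: alice
roles:
  - admin
  - dev
"""

# SafeLoader builds only basic types (dict, list, str, int, ...).
data = yaml.load(untrusted, Loader=yaml.SafeLoader)
print(data)

# A document that asks the loader to call a Python function.
malicious = "!!python/object/apply:os.system ['sleep 10']"

try:
    yaml.load(malicious, Loader=yaml.SafeLoader)
    rejected = False
except yaml.YAMLError as exc:
    # SafeLoader has no constructor for python/* tags, so it raises.
    rejected = True
    print("rejected:", exc)
```

`yaml.safe_load(untrusted)` is shorthand for the same SafeLoader call.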
Although this article is specific to Python (and, in the PyYAML example, to one Python library), it’s important to note that this is certainly not a problem limited to Python. Applications written in Java, PHP, ASP.NET, and other languages can also be susceptible to insecure deserialization vulnerabilities.
Serialization and deserialization vary greatly depending on the programming language, serialization formats, and software libraries used. As a result, there’s no ‘one-size-fits-all’ approach to attacking an insecure deserialization vulnerability. That can make the vulnerability harder to find and exploit, but it does not make it any less dangerous.