When an object is translated to a byte sequence, and the byte sequence can be kept on a disk or sent through streams, the process is called serialization. The reverse, i.e. building an object from byte sequence, is called deserialization
Serialization converts the object into a specific format of data bytes for storage and distribution. As the technology is diverse in OS and languages used by it, so it is crucial to have a neutral format of data bytes, which can be interpreted the same way by each one of them.
Data can be serialized into some formats, such as XML, JSON, YAML, BSON, MessagePack and Protobuf and among them, XML and JSON are wild used by most web applications.
Deserialization converts the stream of data bytes into its object format. However, the reconstructed object merely replicates the original object.
This makes it easier to interpret the data, and the data becomes compatible among all languages also vis-à-vis data storage and transmission.
During deserialization of data, vulnerability occurs when an attacker manipulates the data during serialization, which is then passed for deserialization without any validation or sanitization of data.
This causes unintended behaviour in program flow, which may lead to Cross-Site Script, Denial of Service, Authentication Bypass, Local File Inclusion and Arbitrary Remote Code execution
Serialization and Deserialization in different languages.
PHP: For Object Serialization, PHP uses serialize() method, which returns a string containing a stream of bytes of any value that can be stored in PHP. For Deserialization, the unserialize() method is used to recreate the object from a stream of bytes.
Python: There are few libraries in python, which are used for the Serialization and Deserialization of Objects:
i. Pickle: The pickle.dumps() is used to serialize and pickle.loads() is used to deserialize the data.
ii. JsonPickle: In this module, the Objects are serialized to and deserialized from JSON. Here, serialization is done using jsonpickle.encode() and deserialization using jsonpickle.decode() .
iii.PyYAML/ruamel.yaml: This module processes the YAML(Yet Another Markup Language) data. The serialization is done using yaml.dump() and deserialization using yaml.load() .
Java: Java implements serialization using Java.io.Serializable, the writeObject method from ObjectOutputStream serializes the object and readObject from
ObjectInputStream deserializes the data. Other JAVA API used for deserialization are :
i. readObjectNodData() , readResolve() , readExternal() , readUnshared() methods from java.io.ObjectInputStream
ii. readObject() method from java.beans.XMLDecoder
iii. fromXML() method from com.thoughtworks.xstream.XStream
.Net: The serialization in dotNet applications is done using serialize() method and deserialization using the deserialize() method. However, certain conditions need to be met for Objects to be serialized and deserialized in the dotNet application, which will be explained in a later series of this blog.
Ruby: In Ruby on Rails application, Marshal Class using marshal.dump() and marshal.load() method handles the serialization and deserialization. Due to the use of separate class, this process is also referred to as Marshalling and Unmarshalling for ruby applications.
NodeJS: For NodeJS applications, the node-serialize and funcster library is used, which have serialize() and unserialize() method for serialization and deserialization, respectively.
Serialization and deserialization are cornerstones upon which data are converted to storable, transmittable or reconstructable form. They play a crucial role in the computing world.