

Outside of symlinks and hard links, filesystems generally enforce uniqueness of file names. If you glance at corkami’s ZIP 101 poster, you can see that ZIP is meant to be read from the bottom up, unlike many other formats. But odd things can happen if you read in the other direction. Two data structures in the ZIP format can hold the file name: the Central Directory Entry and the Local File Header. The Central Directory is at the end of the file (right before the End of Central Directory structure mentioned later), and a Local File Header is prefixed to each of the stored files, earlier in the file. A parser can be confused when the two structures give different names for the same data.
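To make the two name-bearing structures concrete, here is a minimal Python sketch that reads a ZIP bottom-up, pulls the name from each Central Directory Entry, follows the stored offset to the matching Local File Header, and reports any disagreement. The function names (`central_directory_names`, `local_header_name`, `name_mismatches`, `build_demo_archive`) are hypothetical, and the sketch assumes a single-disk archive without ZIP64 extensions, encryption or a signature-bearing archive comment. It illustrates the idea and is not the detection approach described in this post.

```python
import io
import struct
import zipfile

LFH_SIG = b"PK\x03\x04"   # Local File Header signature
CDE_SIG = b"PK\x01\x02"   # Central Directory Entry signature
EOCD_SIG = b"PK\x05\x06"  # End of Central Directory signature


def central_directory_names(data):
    """Read bottom-up: locate the EOCD, then walk the Central Directory.

    Returns a list of (name, local_header_offset) tuples.
    """
    eocd = data.rfind(EOCD_SIG)
    if eocd < 0:
        raise ValueError("no End of Central Directory record found")
    # EOCD: total entries at +10, directory size at +12, directory offset at +16
    total_entries, _cd_size, cd_offset = struct.unpack_from("<HII", data, eocd + 10)
    names = []
    pos = cd_offset
    for _ in range(total_entries):
        if data[pos:pos + 4] != CDE_SIG:
            raise ValueError("bad Central Directory Entry signature")
        name_len, extra_len, comment_len = struct.unpack_from("<HHH", data, pos + 28)
        (lfh_offset,) = struct.unpack_from("<I", data, pos + 42)
        name = data[pos + 46:pos + 46 + name_len].decode("utf-8", "replace")
        names.append((name, lfh_offset))
        pos += 46 + name_len + extra_len + comment_len
    return names


def local_header_name(data, offset):
    """Read the name stored in the Local File Header at the given offset."""
    if data[offset:offset + 4] != LFH_SIG:
        raise ValueError("bad Local File Header signature")
    (name_len,) = struct.unpack_from("<H", data, offset + 26)
    return data[offset + 30:offset + 30 + name_len].decode("utf-8", "replace")


def name_mismatches(data):
    """Return (central_name, local_name) pairs where the two structures disagree."""
    mismatches = []
    for central_name, offset in central_directory_names(data):
        local_name = local_header_name(data, offset)
        if local_name != central_name:
            mismatches.append((central_name, local_name))
    return mismatches


def build_demo_archive():
    """Build a one-file ZIP, then overwrite only the Local File Header name.

    The replacement has the same length as the original name, so no
    offsets or length fields need to change.
    """
    buf = io.BytesIO()
    with zipfile.ZipFile(buf, "w") as zf:
        zf.writestr("report.txt", b"hello")
    clean = buf.getvalue()
    # The first occurrence of the name is the one in the Local File Header.
    return clean.replace(b"report.txt", b"invoice.js", 1)


if __name__ == "__main__":
    print(name_mismatches(build_demo_archive()))
```

Run against the demo archive, the check reports one pair: the Central Directory still says `report.txt` while the Local File Header says `invoice.js`. Python's own `zipfile` module builds `namelist()` from the Central Directory, so it would list only `report.txt`; a parser that scans forward from the start of the file would see the other name first.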

ZIP forms the basis of Microsoft Office Open XML files (docx, xlsx, pptx file extensions), Java Archives (JAR), Android Packages (APK) and Electronic Publication (EPUB) files. ZIP structures are also found inside self-extracting EXEs, and hiding in PDFs and other obscure formats. The redundancies inherent in the ZIP format, combined with Postel’s Law (i.e., “be conservative in what you send, be liberal in what you accept”), have created a wide variety of “acceptable” ZIPs. On top of that variety, the choices made by ZIP parsers can lead to a single ZIP producing different output depending on which parser reads it. In this post, we highlight some of the redundancies and parts of the format that leave the interpretation of a given file in the hands of the application reading it. These differences in interpretation can be exploited by adversaries, and we’ll hint at what we’re doing to see these maliciously crafted ZIPs from all angles.

The ZIP format has been explained visually by corkami and others, but for the sake of this example I will reduce the format to two chunks of data that can provide a name for a stored file.

ZIP files are a known vector for phishing campaigns, ransomware and other malicious activity. Because the format isn’t generally executable (self-extracting ZIPs aside), it hasn’t gotten as much attention as executable formats. This blog post looks at how the format can be exploited and shares the solution we came up with.

Compressed file formats come in many flavors, such as tarballs (.tar.gz), RAR archives (.rar) and 7-Zip (.7z), but ZIP has become the foundation for widely used file formats in addition to becoming the generic term for compressing and bundling files.
