$ zip -usd ziphelp.zip *.txt
sd: Zipfile name 'ziphelp.zip'
sd: Command line read
sd: Reading archive
zip warning: expected 3 entries but found 0
zip error: Zip file structure invalid (ziphelp.zip)
Add debug = 1 to the top of win32\makefile.w32 then run these commands to generate PDBs along with the other binaries.
set LOCAL_ZIP=-DDEBUG
nmake -f win32\makefile.w32
Investigating the Failure
A quick way to set up Visual Studio to debug this issue is to create a new C++ console application then change the command, command arguments, and working directory of the project as shown below.
Pressing F10 should now start debugging the zip binary. The highlights are:
The question now is why scanzipf_regnew does not find any entries since there are 3 PK12 and 3 PK34 signatures in the file. Let us inspect the file offsets right after the code seeks to in_cd_start_offset (from step 5 above) and immediately before looking for the next signature to process.
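The signature counts can also be double-checked outside the debugger by scanning the raw bytes of the archive. The snippet below is only a sketch under a few assumptions: count_sig is a hypothetical helper name (it is not part of Info-ZIP), it relies on a grep that supports the -a and -o options (GNU grep does), and it counts every occurrence of the 4-byte pattern, so a chance match inside compressed data would inflate the result.

```shell
# count_sig FILE OCTAL_TAIL
# Hypothetical helper: counts occurrences of "PK" followed by the two bytes
# given as printf octal escapes, for example '\001\002'.
count_sig() {
    # Build the 4-byte pattern; no signature byte is NUL or newline.
    sig=$(printf "PK$2")
    # -a: treat binary data as text, -o: one match per output line,
    # -F: fixed string. LC_ALL=C makes grep compare raw bytes.
    n=$(LC_ALL=C grep -a -o -F -e "$sig" "$1" | wc -l)
    # Arithmetic expansion strips the padding some wc versions add.
    echo $((n))
}

# files.zip is the archive name used by the test script in this post.
if [ -f files.zip ]; then
    count_sig files.zip '\001\002'   # central directory headers (PK12)
    count_sig files.zip '\003\004'   # local file headers (PK34)
    count_sig files.zip '\005\006'   # end of central directory record
fi
```

If the first two counts match the number of files that were added, the archive contents are probably fine and the problem lies in how the file is being read.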
This shows a value of 0x0000000000002168, which is not the proper offset to seek to in the zip file. Recall that in_cd_start_offset was 0x218e, which is the second-last byte of line 537 in the zipfile’s hexdump output. Could this be an error in the standard library fseek and ftell functions? (One question I have not answered yet: why does the scan fail when it starts from an earlier offset?)
Pressing F12 to see the definitions of zftello and zfseeko went to the wrong place, but it did reveal that these are not the standard library functions. Visual Studio opened their definitions in tailor.h instead of the implementations actually being called. It turns out zftello and zfseeko are implemented in win32/win32i64.c, and the comments above them raise some huge red flags.
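The offset arithmetic here is easy to check with a few lines of shell. This is just a sketch: hexdump_line and bytes_at are hypothetical helper names, the 16-bytes-per-line layout is an assumption about the hexdump format, and files.zip stands in for whichever archive is being inspected.

```shell
# Which 1-based hexdump line and 0-based column hold a given byte offset,
# assuming 16 bytes per hexdump line.
hexdump_line() {
    echo "line $(( $1 / 16 + 1 )) column $(( $1 % 16 ))"
}

# Print COUNT bytes of FILE starting at OFFSET as a plain hex string.
bytes_at() {
    od -A n -t x1 -j "$2" -N "$3" "$1" | tr -d ' \n'
    echo
}

hexdump_line $((0x218e))        # prints "line 537 column 14"
echo $(( 0x218e - 0x2168 ))     # prints 38: the reported offset is 38 bytes short

if [ -f files.zip ]; then
    # A central directory header starts with 504b0102 (PK\001\002).
    bytes_at files.zip $((0x218e)) 4
fi
```

Column 14 of a 16-byte line is the second-last byte, which agrees with the hexdump observation, and the 38-byte gap shows how far the reported position is from where the central directory actually starts.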
/* 64-bit buffered ftello
*
* Win32 does not provide a 64-bit buffered
* ftell (in the published api anyway) so below provides
* hopefully close version.
* We have not gotten _telli64 to work with buffered
* streams. Below cheats by using fgetpos improperly and
* may not work on other ports.
*/
/* 64-bit buffered fseeko
*
* Win32 does not provide a 64-bit buffered
* fseeko so use _lseeki64 and fflush. Note
* that SEEK_CUR can lose track of location
* if fflush is done between the last buffered
* io and this call.
*/
Looks like this custom seek/tell code is responsible for the incorrect offsets into the zip file! We can work around this by simply removing these custom implementations.
Windows Specific Bug?
So why didn’t Cygwin’s zip.exe have this issue? Running zip.exe -v shows this compiler/OS description:
Compiled with gcc 4.8.3 for Unix (Cygwin) on Jun 23 2014.
The Cygwin OS name in parentheses is defined in unix/unix.c only if __CYGWIN__ is defined. Under this condition, however, the custom zftello and zfseeko implementations are not included in the zip sources being compiled. Therefore, the issue does not occur in Cygwin’s distributed zip binary.
After figuring out how to build the Info-ZIP sources, I put together a few commands to test the zip binary, starting by creating a few text files to zip.
zip -h > help.txt
zip -h2 > help2.txt
zip -L > license.txt
Unfortunately, the zip -qru ./files.zip -i *.txt command from the OpenJDK is not what we need. To actually create a zip file, use only the -u flag:
zip -u files.zip *.txt
To test that the files were zipped successfully, unzip the files and compare them to the original files. Here’s the whole script for this:
echo "---Creating temp directory---"
mkdir temp; cd temp
echo "---Creating text files---"
zip -h > help.txt
zip -h2 > help2.txt
zip -L > license.txt
echo "---Adding text files to a new repo---"
git init
git add *.txt
git commit -m "Add original text files"
echo "---Zipping text files---"
zip -u files.zip *.txt
echo "---Removing text files---"
rm *.txt
echo "---Unzipping text files---"
unzip files.zip
echo "---Checking unzipped files---"
git diff
When using the zip binary for Windows, something strange happens when running the zip command a second time:
$ zip -u files.zip *.txt
zip warning: files.zip not found or empty
adding: help.txt (176 bytes security) (deflated 49%)
adding: help2.txt (176 bytes security) (deflated 62%)
adding: license.txt (176 bytes security) (deflated 54%)
$ zip -u files.zip *.txt
zip warning: expected 3 entries but found 0
zip error: Zip file structure invalid (files.zip)
Info-ZIP supports a -sd flag that shows diagnostic information while it runs. It reveals that something is going wrong when reading the archive.
$ zip -usd files.zip *.txt
sd: Zipfile name 'files.zip'
sd: Command line read
sd: Reading archive
zip warning: expected 3 entries but found 0
zip error: Zip file structure invalid (files.zip)
In the last post, I described how to build the Info-ZIP sources. When using the resulting zip binaries in Cygwin, some important path handling issues come up. The paths passed to the zip binary when building the OpenJDK in Cygwin use forward slashes. The Cygwin User’s Guide has a section on File Access that outlines the support for POSIX and Win32-style file paths.
The Windows file system APIs support forward slashes in file paths. The zip source code uses the fopen CRT function, which eventually ends up calling CreateFileW. The CreateFileW docs state that you may use either forward slashes (/) or backslashes (\) in the lpFileName parameter. The translation of paths from Win32 to NT happens in a function called RtlDosPathNameToRelativeNtPathName_U as discussed in the Definitive Guide on Win32 to NT Path Conversion. Since this is a built-in Windows function, it does not support the /cygdrive/ style prefixes. Running the simple test program argtofile in Cygwin easily demonstrates this.
The /cygdrive/ prefixes will therefore not work for programs compiled for Windows (such as the zip binary compiled directly with Visual C++), so the cygpath command is necessary to translate these paths to Win32-style file paths. To peek into how cygpath works, we can take advantage of the fact that its source code is available online. I found it easier to browse the sources after cloning the repo:
The scenario of interest is what happens when cygpath -u ~ is invoked. In this case, we want to see how the “/cygdrive/” string is prefixed to the computed path.
normalizes the path by calling normalize_win32_path
before finally iterating through the mount items to find the path’s prefix in the mount table.
Searching for the cygpath \s*( regex also leads to the vcygpath function in winsup/utils/path.cc. That appears to be more directly related to the cygpath command, although I have not yet worked out exactly how. Searching for the \"cygdrive\" regex reveals that this is a magic string used in many places in the codebase.
All this shows that there is indeed some complexity behind maintaining the POSIX/Win32-style file path mapping in Cygwin but it should be possible to add some basic logic to the Windows Info-ZIP build to handle /cygdrive/ prefixes in its file arguments. The question I have at this point is how does compiling the zip binaries for the Cygwin environment (the shipping configuration) result in proper handling of POSIX-style filenames?
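As a rough illustration of what that "basic logic" might look like, the simplest case of the mapping can be sketched in shell. This is an assumption-heavy sketch and not how cygpath or Cygwin's mount table actually works: cygdrive_to_win is a hypothetical name, and it only handles the default /cygdrive/<letter>/ prefix, ignoring custom mount points and a reconfigured cygdrive prefix. It keeps forward slashes since the Windows file APIs accept them.

```shell
# Hypothetical helper: map /cygdrive/<letter>/rest to <letter>:/rest and
# pass every other path through unchanged.
cygdrive_to_win() {
    case "$1" in
        /cygdrive/[A-Za-z]/*)
            rest=${1#/cygdrive/}        # e.g. c/Users/me/files.zip
            drive=${rest%%/*}           # e.g. c
            remainder=${rest#*/}        # e.g. Users/me/files.zip
            printf '%s:/%s\n' "$drive" "$remainder"
            ;;
        /cygdrive/[A-Za-z])
            # Bare drive root such as /cygdrive/c
            printf '%s:/\n' "${1#/cygdrive/}"
            ;;
        *)
            # Relative paths and other POSIX paths are left alone.
            printf '%s\n' "$1"
            ;;
    esac
}

cygdrive_to_win /cygdrive/c/Users/me/files.zip   # prints c:/Users/me/files.zip
cygdrive_to_win ./files.zip                      # prints ./files.zip
```

A real implementation inside the zip sources would have to do the same prefix check in C on each file argument, and would still miss anything that depends on the Cygwin mount table.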
curl -Lo zip30.tar.gz https://sourceforge.net/projects/infozip/files/Zip%203.x%20%28latest%29/3.0/zip30.tar.gz/download
tar xvf zip30.tar.gz
cd ./zip30
git init; git add *; git commit -m "Commit original Info-ZIP sources"
Now that we have the sources, let’s see how to build them. The scenario I’m working on is Windows specific so we need Visual Studio 2019 with the Desktop Development with C++ workload installed. I’ll be building a 32-bit zip executable. Launch the x86 Native Tools Command Prompt for VS 2019 and change to the zip30 source directory to start building. Some digging around reveals a makefile with build instructions (that seem one directory off). Here’s the command to build a 32-bit executable from the sources (note that building fails due to various errors that need to be addressed):
nmake -f win32\makefile.w32
Carriage Return (CR) Name Collisions
The first error is this rather cryptic mess of syntax errors:
Microsoft (R) Program Maintenance Utility Version 14.29.30133.0
Copyright (C) Microsoft Corporation. All rights reserved.
cl -nologo -c -W3 -O2 -DWIN32 -DASM_CRC -ML zip.c
cl : Command line warning D9002 : ignoring unknown option '-ML'
zip.c
C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um\winnt.h(18822): error C2143: syntax error: missing ':' before 'constant'
C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um\winnt.h(18822): error C2143: syntax error: missing ';' before ':'
C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um\winnt.h(18822): error C2059: syntax error: ':'
C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um\winnt.h(18823): error C2143: syntax error: missing '{' before ':'
C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um\winnt.h(18823): error C2059: syntax error: ':'
C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um\winnt.h(18824): error C2059: syntax error: '}'
C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um\winnt.h(18825): error C2059: syntax error: '}'
C:\Program Files (x86)\Windows Kits\10\include\10.0.19041.0\um\winnt.h(18826): error C2059: syntax error: '}'
zip.c(5746): warning C4267: '=': conversion from 'size_t' to 'ush', possible loss of data
zip.c(5838): warning C4267: '=': conversion from 'size_t' to 'ush', possible loss of data
NMAKE : fatal error U1077: '"C:\Program Files (x86)\Microsoft Visual Studio\2019\Enterprise\VC\Tools\MSVC\14.29.30133\bin\HostX86\x86\cl.EXE"' : return code '0x2'
Stop.
Turns out this option was removed in Visual Studio 2010 as per the Microsoft C/C++ change history since the linker no longer supports optimizing for Windows 98. This is clearly a safe flag to remove from the linker flags in win32\makefile.w32.
Update the Branding
Change the VERSION string from “3.0” to “3.0-ioHardenedZIP”
Update the REVDATE from “July 5th 2008” to the current date (“December 18th 2021” in my case)
Update the about text to indicate that it is a custom build.
Testing the Zip Build
The sources should now build successfully in the x86 Native Tools Command Prompt for VS 2019. The OpenJDK build uses the -qru flags for creating zip files so we can easily test the zip executable by creating a zip of the Info-ZIP help and license text.
zip -h > help.txt
zip -h2 > help2.txt
zip -L > license.txt
zip -qru ./files.zip -i *.txt
We need to verify whether the zip was correctly created. Saving this for another day.