Question:
We process thousands of different zip files every week. I recently came across some ZIP files, created in a non-DOS environment, whose filenames contain characters that are illegal in DOS.
Currently, with the CkZip classes I can either Extract() a zip entry into a directory, or Inflate() a zip entry into memory and write it out to disk myself. Unfortunately, neither approach works when the entry is extremely large and its filename contains illegal DOS characters.
There are two problems. First, after writing the entry to disk I need to "process" it and move it elsewhere, but if CkZip changes the name to comply with DOS naming conventions, my program has no way of figuring out what you’ve renamed it to. Second, the uncompressed file is so large that decompressing it into a CkByteArray causes an out-of-memory error.
So I’d like to suggest adding another overload of Extract() that takes the full path and filename to decompress into, or some property on CkZipEntry that tells my program what the DOS filename would be if the entry were written to disk.
FWIW, the entry in the zip file has a filename of "*.*" (without quotes). I’ve also seen filenames with question marks and other illegal DOS characters in them.
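To make "illegal in DOS" concrete: per Microsoft's file-naming rules, Windows rejects the characters < > : " / \ | ? * and control characters (codes 0-31) in a file name, does not reliably support names that end in a space or period, and reserves device names such as CON, PRN, AUX, NUL, COM1-COM9 and LPT1-LPT9, with or without an extension. Below is a minimal sketch, assuming it is applied to a single path component rather than a full path, of a check an application could run on each entry name before extracting. The function name isValidWindowsFileName is hypothetical; it is not part of CkZip and does not describe how Chilkat itself validates names.

```cpp
#include <cctype>
#include <string>

// Hypothetical helper (not a Chilkat API): returns true if `name`, a single
// path component, can be used as a file name on Windows.
bool isValidWindowsFileName(const std::string& name)
{
    if (name.empty())
        return false;

    // Characters Windows does not allow in a file name, plus control characters.
    const std::string reserved = "<>:\"/\\|?*";
    for (unsigned char c : name)
    {
        if (c < 32 || reserved.find(static_cast<char>(c)) != std::string::npos)
            return false;
    }

    // Names ending in a space or period are not reliably supported.
    char last = name.back();
    if (last == '.' || last == ' ')
        return false;

    // Reserved device names are invalid with or without an extension
    // (e.g. "CON" and "con.txt"); the comparison is case-insensitive.
    std::string base = name.substr(0, name.find('.'));
    for (char& ch : base)
        ch = static_cast<char>(std::toupper(static_cast<unsigned char>(ch)));
    static const char* const devices[] = {
        "CON", "PRN", "AUX", "NUL",
        "COM1", "COM2", "COM3", "COM4", "COM5", "COM6", "COM7", "COM8", "COM9",
        "LPT1", "LPT2", "LPT3", "LPT4", "LPT5", "LPT6", "LPT7", "LPT8", "LPT9"};
    for (const char* dev : devices)
    {
        if (base == dev)
            return false;
    }
    return true;
}
```

With this check, the "*.*" entry from the question is rejected, as are names such as "con.txt" that contain no illegal characters at all.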
Answer:
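This C++ example shows how to handle filenames in .zip archives that contain characters (such as ? and *) that are invalid in DOS and Windows filenames. The approach is to rename each offending entry with put_FileName before calling Unzip, so your program decides, and therefore knows, the name each file is written under: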
#include <stdio.h>
#include "CkZip.h"
#include "CkZipEntry.h"
#include "CkString.h"

void BadFilenameCharsInZip()
{
    // First, create a .zip containing filenames with invalid DOS/Windows characters:
    CkZip zip;
    if (!zip.UnlockComponent("anything"))
    {
        printf("%s\n", zip.lastErrorText());
        return;
    }
    zip.NewZip("badCharsInFilename.zip");

    // The question-mark character cannot be used in a Windows filename.
    // AppendString returns a new CkZipEntry object, which the caller deletes.
    CkZipEntry *entryAdded = zip.AppendString("?abc?.txt", "this is a test this is a test this is a test");
    delete entryAdded;

    // The asterisk character cannot be used in a Windows filename.
    entryAdded = zip.AppendString("*xyz.txt", "this is a test this is a test this is a test");
    delete entryAdded;

    if (!zip.WriteZipAndClose())
    {
        printf("%s\n", zip.lastErrorText());
        return;
    }
    // OK, badCharsInFilename.zip is written. If you try to unzip it with a typical
    // zip utility, you'll see the problem.

    // Open the .zip with CkZip:
    CkZip zip2;
    if (!zip2.OpenZip("badCharsInFilename.zip"))
    {
        printf("%s\n", zip2.lastErrorText());
        return;
    }
    int i;
    int n = zip2.get_NumEntries();

    // Iterate over the entries and print the filenames (to confirm that
    // CkZip has not changed them in any way).
    for (i = 0; i < n; i++)
    {
        CkZipEntry *e = zip2.GetEntryByIndex(i);
        if (e)
        {
            printf("%s\n", e->fileName());
            delete e;
        }
    }

    // Now try to unzip.
    int numUnzipped = zip2.Unzip("badCharsDir");
    printf("numUnzipped = %d\n", numUnzipped);
    printf("%s\n", zip2.lastErrorText());
    // The Unzip method returns -1, and this is what we find in the log:
    /*
    ChilkatLog:
      Unzip:
        DllDate: Dec 4 2007
        Username: Chilkat
        Component: Visual C++ 6.0
        UnzipDir: badCharsDir
        UnzipFailedFilename: badCharsDir\?abc?.txt
        UnzipFailedFilename: badCharsDir\*xyz.txt
        NumUnzipped: 0
        Not all files extracted successfully.
    */
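    // Loop over the zip entries, renaming each one in whatever way the app chooses.
    // Here '?' becomes 'Q' and '*' becomes 'A'; other characters Windows rejects
    // (< > : " |) could be replaced the same way. Make sure the chosen mapping
    // cannot give two entries the same name.
    for (i = 0; i < n; i++)
    {
        CkZipEntry *e = zip2.GetEntryByIndex(i);
        if (e)
        {
            CkString strFilename;
            e->get_FileName(strFilename);
            strFilename.replaceChar('?', 'Q');
            strFilename.replaceChar('*', 'A');
            // Because the app chose the new name, it knows exactly which
            // file Unzip will create for this entry.
            e->put_FileName(strFilename.getString());
            printf("%s\n", e->fileName());
            delete e;
        }
    }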
    // Now try unzipping again... everything works!
    numUnzipped = zip2.Unzip("badCharsDir");
    printf("numUnzipped = %d\n", numUnzipped);
    printf("%s\n", zip2.lastErrorText());
}
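The fixed mapping in the rename loop ('?' to 'Q', '*' to 'A') can make two different entries end up with the same name, for example "?abc?.txt" and an existing "QabcQ.txt". One way to guard against that is to build a rename table for all entries before extracting anything. The following is a minimal, standard-library-only sketch of that idea; the helper names (sanitizeEntryName, buildRenameMap), the '_' replacement character, and the "~N" suffix are all hypothetical choices, and the sketch does not handle reserved device names or trailing dots. Names are compared case-insensitively because Windows file names are case-insensitive by default. Each safe name could then be passed to put_FileName, as in the rename loop of the answer, and the table kept for the post-processing step described in the question.

```cpp
#include <cctype>
#include <set>
#include <string>
#include <utility>
#include <vector>

// Hypothetical helper: replace each character Windows forbids with '_'.
// '/' is kept because zip entry names use it as the directory separator.
std::string sanitizeEntryName(const std::string& entryName)
{
    const std::string reserved = "<>:\"\\|?*";
    std::string out = entryName;
    for (char& c : out)
    {
        unsigned char uc = static_cast<unsigned char>(c);
        if (uc < 32 || reserved.find(c) != std::string::npos)
            c = '_';
    }
    return out;
}

// Lower-case copy, used for case-insensitive collision checks.
std::string lowerCopy(std::string s)
{
    for (char& c : s)
        c = static_cast<char>(std::tolower(static_cast<unsigned char>(c)));
    return s;
}

// Hypothetical helper: returns (original name, safe name) pairs. If two
// entries sanitize to the same name, "~1", "~2", ... is inserted before the
// extension so that no extracted file overwrites another.
std::vector<std::pair<std::string, std::string>>
buildRenameMap(const std::vector<std::string>& entryNames)
{
    std::vector<std::pair<std::string, std::string>> result;
    std::set<std::string> used;  // lower-cased names already assigned
    for (const std::string& original : entryNames)
    {
        std::string safe = sanitizeEntryName(original);
        // Split into stem and extension; only a '.' in the last path
        // component counts as the start of an extension.
        size_t slash = safe.rfind('/');
        size_t dot = safe.rfind('.');
        if (dot == std::string::npos || (slash != std::string::npos && dot < slash))
            dot = safe.size();
        std::string stem = safe.substr(0, dot);
        std::string ext = safe.substr(dot);
        std::string candidate = safe;
        for (int n = 1; used.count(lowerCopy(candidate)) != 0; ++n)
            candidate = stem + "~" + std::to_string(n) + ext;
        used.insert(lowerCopy(candidate));
        result.emplace_back(original, candidate);
    }
    return result;
}
```

For the entries created in the answer plus the "*.*" name from the question, this yields "_abc_.txt", "_xyz.txt" and "_._", and a second entry that also sanitizes to "_abc_.txt" would become "_abc_~1.txt".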