Bypassing misidentified MS Office files
Alex via amavis-users
amavis-users at amavis.org
Tue Oct 1 01:52:29 CEST 2013
Hi,
Some time ago I reported that MS Office .docx files are misidentified
by 'file' as Zip archives. I've tried upgrading to the latest 'file'
on my Fedora 17 (fc17) system (I compiled 5.15), and they are still
detected incorrectly. I'm unsure how to modify the magic file so that
it identifies them properly, so instead I'd like to bypass scanning of
files that contain the "[trash]/0001.dat" entries causing the problem.
Does someone have a working 'file' magic file that they could send me
to evaluate?
I've tried the following, and it doesn't appear to work:
$banned_filename_re = new_RE(
  ...
  [ qr'^\[trash\]/[0-9a-f]{4}\.dat$' => 0 ],  # don't ban [trash]/NNNN.dat members
  ...
);
I hope the following example makes the problem clear:
$ file report.docx
report.docx: Zip archive data, at least v2.0 to extract
$ unzip -v report.docx
Archive: report.docx
Length Method Size Cmpr Date Time CRC-32 Name
-------- ------ ------- ---- ---------- ----- -------- ----
412276 Defl:N 22707 95% 01-01-1980 00:00 88586817 word/document.xml
456 Stored 456 0% 01-01-1980 00:00 ffffffff [trash]/0001.dat
...
If I remove the [trash]/ files and re-zip the archive, it's properly detected:
$ file report1.docx
report1.docx: Microsoft OOXML
My idea is to exempt any [trash]/NNNN.dat member, where NNNN matches
[0-9a-f]{4}, but that doesn't appear to work.
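One way to narrow this down is to check the pattern on its own, outside amavis, against the member names exactly as 'unzip' lists them. The sketch below is a hypothetical standalone helper (the script name check_trash.pl and the function is_trash_entry are made up for illustration). It only shows whether the regex matches a name like "[trash]/0001.dat"; it assumes, without confirming, that amavis presents the member name to $banned_filename_re in that same form, directory prefix included.

```perl
#!/usr/bin/perl
# Hypothetical sanity check, not part of amavis: does the pattern from
# the $banned_filename_re entry match the archive member names?
use strict;
use warnings;

# Same pattern as the amavisd.conf entry (single-quoted qr, so the
# backslashes reach the regex engine unchanged).
my $trash_re = qr'^\[trash\]/[0-9a-f]{4}\.dat$';

# Returns 1 if the name looks like a [trash]/NNNN.dat member, else 0.
sub is_trash_entry {
    my ($name) = @_;
    return $name =~ $trash_re ? 1 : 0;
}

# With "-" as the only argument, read one member name per line from
# STDIN; otherwise fall back to the two names from the listing above.
my @names;
if (@ARGV && $ARGV[0] eq '-') {
    chomp(@names = <STDIN>);
} else {
    @names = ('word/document.xml', '[trash]/0001.dat');
}

for my $name (@names) {
    printf "%-20s %s\n", $name,
        is_trash_entry($name) ? 'matches' : 'no match';
}
```

Usage against a real attachment (unzip's zipinfo mode prints one member name per line):

    $ unzip -Z1 report.docx | perl check_trash.pl -

If "[trash]/0001.dat" reports "matches" here, the regex itself is fine and the question becomes what string amavis actually tests against $banned_filename_re for archive members.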
Thanks so much,
Alex