hachoir-parser 1.1

Package of Hachoir parsers used to open binary files

hachoir-parser is a package of parsers for the most common file formats, written for the Hachoir framework. Not all parsers are complete: some are very good and others are poor, for example parsing only the first level of the tree.

A perfect parser has no "raw" field: with a perfect parser you can tell the meaning of every bit. Some good (but not perfect ;-)) parsers:

  • Matroska video
  • Microsoft RIFF (AVI video, WAV audio, CDA file)
  • PNG picture
  • TAR and ZIP archive

Website: http://hachoir.org/wiki/hachoir-parser

What's new in hachoir-parser 1.1?

Main changes: add "EFI Platform Initialization Firmware Volume" (PIFV) and "Microsoft Windows Help" (HLP) parsers. Details:

  • MPEG audio:
    • add createContentSize() to support hachoir-subfile
    • support file starting with ID3v1
    • if file doesn't contain any frame, use ID3v1 or ID3v2 to create the description
  • EXIF:
    • use "count" field value
    • create RationalInt32 and RationalUInt32
    • fix for empty value
    • add GPS tags
  • JPEG:
    • support Ducky (APP12) chunk
    • support Comment chunk
    • improve validate(): make sure that first 3 chunk types are known
  • RPM: use bzip2 or gzip handler to decompress content
  • S3M: fix some parser bugs
  • OLE2: reject negative block index (or special block index)
  • ip2name(): catch KeyboardInterrupt and don't resolve next addresses
  • ELF: support big endian
  • PE: createContentSize() works on PE program, improve resource section detection
  • AMF: stop mixed array parser on empty key

What's new in hachoir-parser 1.0?

Changes:

  • OLE2: Support file bigger than 6 MB (support many DIFAT blocks)
  • OLE2: Add createContentSize() to guess content size
  • LNK: Improve parser (now able to parse the whole file)
  • EXE PE: Add more subsystem names
  • PYC: Support Python 2.5c2
  • Fix many spelling mistakes

Minor changes:

  • PYC: Fix long integer parser (negative number), add (disabled) code to disassemble bytecode, use self.code_info to avoid replacing self.info
  • OLE2: Add ".msi" file extension
  • OLE2: Fix to support documents generated on Mac
  • EXIF: set max IFD entry count to 1000 (instead of 200)
  • EXIF: don't limit BYTE/UNDEFINED IFD entry count
  • EXIF: add "User comment" tag
  • GIF: fix image and screen description
  • bzip2: catch decompressor error to be able to read trailing data
  • Fix file extensions of AIFF
  • Windows GUID use new TimestampUUID60 field type
  • RIFF: convert class constant names to upper case
  • Fix RIFF: don't replace self.info method
  • ISO9660: Write parser for terminator content

What's new in hachoir-parser 0.10?

New parsers:

  • Microsoft Archive parser (.mar)
  • Microsoft Windows animated icon (.ani): based on RIFF file format
  • Microsoft's HTML Help (.chm)
  • Windows Shortcut (.lnk)
  • X11 Portable Compiled Font (pcf)
  • Adobe Portable Document Format (PDF)

Major changes:

  • Convert many constants to Unicode
  • Set charset to ISO-8859-1 for many strings with no charset. Examples: filename in gzip, strings in ID3v1
  • MIME type is now in Unicode
  • Timestamps are stored as datetime.datetime() objects
  • Add MAC48_Address and NIC24 parser
  • Add IEEE 24-bit Organizationally unique identifiers list
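
To illustrate what storing timestamps as datetime.datetime() objects involves, here is a minimal sketch that converts two common raw encodings. It is not hachoir's code: the function names are hypothetical, and the only facts assumed are the usual epochs (1970-01-01 for Unix time in seconds, 1601-01-01 for 64-bit Windows time in 100 ns ticks).

```python
from datetime import datetime, timedelta

UNIX_EPOCH = datetime(1970, 1, 1)
WIN64_EPOCH = datetime(1601, 1, 1)


def unix_to_datetime(seconds):
    """Hypothetical helper: seconds since 1970-01-01 -> datetime."""
    return UNIX_EPOCH + timedelta(seconds=seconds)


def win64_to_datetime(ticks):
    """Hypothetical helper: 100 ns ticks since 1601-01-01 -> datetime."""
    # timedelta has microsecond resolution, so the last digit is dropped.
    return WIN64_EPOCH + timedelta(microseconds=ticks // 10)
```

Both helpers return naive datetime objects; no time zone handling is attempted in this sketch.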

Changes:

  • Disable QueryParser fallback feature
  • QueryParser accepts "class" tag
  • Split Parser in HachoirParser and Parser classes
  • OLE2:
    • Rewrite most of the code using SeekableFieldSet
    • Support FAT block chain
    • Able to parse fragmented streams
    • Add parser for component object and document summary
  • MKV: add method to convert date value to datetime.datetime() object
  • OGG: validate() checks magic string
  • Write PascalStringWin32 class
  • Add Win32 LANGUAGE_ID dictionary
  • Rewrite GUID class using RFC 4122:
    • Supports different GUID format versions
    • Able to read timestamp
    • Able to read network address
  • iTunesDB: support sort index type and playlist
  • BMP: move code to parse image data into a separate function, so the code can be reused; fix magic regex (reserved may be non-nul)
  • EXIF/TIFF: reject IFD entry with more than 300 values
  • MPEG audio:
    • Frame.isValid() also checks sync field
    • Add getNbChannel() method
    • findSynchronizeBits() uses stronger validation to avoid false positives
    • validate() checks first field name and not just if stream starts with bytes "ID3"
  • RIFF: text: truncate to nul byte and use ISO-8859-1 charset
  • JPEG: reject invalid component id or quantization index (instead of using a warning message)
  • JPEG: support all sort of start of scan (especially progressive jpeg)
  • JPEG: add magic string of JPEG starting with Adobe chunk
  • Photoshop metadata: add parser for version information
  • PNG: add method to get number of bits per pixel; do not format timestamp value
  • PNG: support transparency color
  • TTF: Reject chunk with more than 300 names
  • EXE: Reject PE program with more than 50 sections
  • EXE resource:
    • PE_Resource now uses SeekableFieldSet
    • Parse file flags
    • Read file subtype (for driver or font)
    • Reject header with more than 300 entries
    • Stop parser at depth 5
    • Write version information parser for NE program
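
The GUID rewrite listed above reads the format version, timestamp and network address defined by RFC 4122. The sketch below shows the same three pieces of information using Python's standard uuid module rather than hachoir's GUID class; describe_guid() is a hypothetical name, and the timestamp and node are only decoded for version 1 (time-based) identifiers.

```python
import uuid
from datetime import datetime, timedelta

# RFC 4122 timestamps count 100 ns ticks since the Gregorian reform.
GREGORIAN_EPOCH = datetime(1582, 10, 15)


def describe_guid(text):
    """Hypothetical helper: return version, timestamp and node of a GUID."""
    guid = uuid.UUID(text)
    info = {"version": guid.version}
    if guid.version == 1:
        # guid.time is the 60-bit timestamp; guid.node the 48-bit address.
        info["timestamp"] = GREGORIAN_EPOCH + timedelta(
            microseconds=guid.time // 10)
        info["node"] = ":".join(
            "%02x" % byte for byte in guid.node.to_bytes(6, "big"))
    return info
```

Note that GUIDs stored in Windows files are usually little-endian in their first three fields; uuid.UUID(bytes_le=...) handles that layout.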

Minor changes:

  • GIF: replace image marker warning with a parser error
  • IPTC: use charset UTF-8 and not ISO-8859-15
  • CAB: validate() rejects file with more than 30 folders and fix misuse of seekBit()
  • AU: fix end padding size

What's new in hachoir-parser 0.9?

New parsers:

  • ACE, CAB, RAR, MOD, S3M, XM, PSD, Torrent, TTF, PDF, NE, MPEG TS

Changes:

  • Add unique identifier and category to each parser
  • Use tags to choose the right parser
  • Create ParserList and QueryParser classes
  • Support magic string as regex ('magic_regex')
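
As a rough idea of how a magic string expressed as a regex can select a parser, here is a minimal standalone sketch. The MAGIC_REGEX table and guess_parser() are hypothetical and are not the ParserList or QueryParser API; only the byte signatures themselves (GIF, PNG, gzip) are standard.

```python
import re

# Hypothetical table: parser identifier -> pattern matched at offset 0.
MAGIC_REGEX = {
    "gif": re.compile(rb"GIF8[79]a"),               # GIF87a or GIF89a
    "png": re.compile(re.escape(b"\x89PNG\r\n\x1a\n")),
    "gzip": re.compile(rb"\x1f\x8b\x08"),           # magic + deflate method
}


def guess_parser(data):
    """Return the identifier of the first parser whose magic matches."""
    for parser_id, pattern in MAGIC_REGEX.items():
        if pattern.match(data):
            return parser_id
    return None
```

A regex is useful where a fixed string is not enough, as with the two GIF versions handled by a single pattern here.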

Improved parsers:

  • 7-zip: parse a lot of headers, just not start and signature headers
  • ZIP: support file without file size, support 64-bit structures
  • Ogg: support "video" chunk and add function to get last page

What's new in hachoir-parser 0.8.1?

New features:
  • Rewrite setup.py: uses distutils by default (instead of setuptools), doesn't depend on hachoir-core
  • ICO parser: fixes to support cursors
  • Parsers use the new HACHOIR_ERRORS constant
Bugfixes:
  • gzip: fix magic string
  • XCF: remove useless exceptions
  • RIFF: fix fourcc handler (when fourcc is a string and not Unicode)
  • FAT: catch ValueError when using string index() method
  • ASF: don't create empty fields and validate() checks header minimum size
  • EXE: validate() checks size_mod_512 in MSDOS header, add method to compute content size of MSDOS executable (not PE)

What's new in hachoir-parser 0.8?

New parsers:
  • 7-zip archive
  • Aldus Placeable Metafile (APM), variant of WMF
  • Audio Interchange File Format (AIFF)
  • Audio Interchange File Format Compressed (AIFC)
  • Linux swap file
  • LucasArts Font
  • New Technology File System (NTFS)
  • Microsoft Enhanced Metafile (EMF)
  • Microsoft Windows Metafile (WMF)
  • Musical Instrument Digital Interface (MIDI) audio file parser
  • Real Audio (.ra)
  • Real Media (.rm)
  • Truevision Targa Graphic (TGA) picture
New features:
  • Add method to compute real content size
  • Add magic string to find file start
  • Add method to get file extension (file name suffix)
  • Add method to choose the best MIME type
  • Much better file validation, sometimes using arbitrary limits to detect invalid files. Examples: 50 MB for maximum SWF file size, 6000 pixels for maximum GIF picture width, etc.
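
Validation with an arbitrary limit can be pictured with the GIF width example above. This is a minimal sketch, not the actual GIF parser: validate_gif_header() is a hypothetical function, and returning True or an error message is an assumption about the convention. The header layout (6-byte signature followed by a little-endian 16-bit width) comes from the GIF specification.

```python
import struct

MAX_GIF_WIDTH = 6000  # arbitrary sanity limit quoted in the list above


def validate_gif_header(data):
    """Return True if the header looks sane, else an error message."""
    if len(data) < 8:
        return "Header is too short"
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        return "Wrong magic string"
    (width,) = struct.unpack("<H", data[6:8])
    if not 1 <= width <= MAX_GIF_WIDTH:
        return "Invalid picture width (%u)" % width
    return True
```

The limit rejects random data that happens to start with the right magic string, at the cost of refusing unusually wide but valid pictures.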
Changes:
  • Lazy decompression for bzip2 and gzip parsers
  • ZIP: add more MIME types and file extensions
  • EXE: better PE detection
  • Set constant name to upper case
  • Always use a tuple for common file extensions
  • Bitmap: add padding to pixels if needed, fix size of pixels field
  • Tcpdump: display ARP layer info (if any) and reject file if link type is unknown
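
For the Bitmap padding change above, the arithmetic is presumably the usual BMP rule that each pixel row is stored on a 4-byte boundary; that this is exactly what the parser change covers is an assumption. A small sketch with hypothetical helper names:

```python
def bmp_row_size(width, bits_per_pixel):
    """Bytes used by one stored pixel row, rounded up to a multiple of 4."""
    return ((width * bits_per_pixel + 31) // 32) * 4


def bmp_row_padding(width, bits_per_pixel):
    """Padding bytes appended after the pixel data of one row."""
    used = (width * bits_per_pixel + 7) // 8
    return bmp_row_size(width, bits_per_pixel) - used
```

For example, a 3-pixel-wide 24-bit row holds 9 bytes of pixel data and is padded to 12.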

What's new in hachoir-parser 0.7?

New parsers:
  • AMF metadata, used in Flash video
  • Flash animation (SWF)
  • Flash video (FLV)
  • Java class
  • Ogg/Vorbis (audio)
  • Ogg/Theora (video)
  • Reiser file system version 3
Important parser improvements:
  • bzip2 and gzip parser are able to decompress file
  • JPEG picture:
    • Parse quantization table and restart interval
    • Write stronger validate method
  • GIF picture: support image comment, graphic control and netscape 2.0 extension
  • ID3v1: support ID3 version 1.1 and 1.1b (track number and genre)
  • MPEG audio:
    • Better file validation (less false positive), don't allow padding between frames anymore
    • Fix computation of frame size: now works with MPEG version 2 and 2.5
  • RIFF: parse AVI and ODML headers
  • Tcpdump: add parser for Unicast (layer 2)
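
The MPEG audio frame size fix above concerns a formula that depends on version and layer. The sketch below uses the commonly documented formula and is not copied from the parser; the function name and argument conventions (bit rate in bit/s, version given as 1, 2 or 2.5) are assumptions.

```python
def mpeg_audio_frame_size(version, layer, bit_rate, sample_rate, padding=0):
    """Frame size in bytes for one MPEG audio frame.

    version: 1, 2 or 2.5; layer: 1, 2 or 3; bit_rate in bit/s;
    sample_rate in Hz; padding: 0 or 1 (padding bit of the header).
    """
    if layer == 1:
        # Layer I uses 4-byte slots and 384 samples per frame.
        return (12 * bit_rate // sample_rate + padding) * 4
    if layer == 3 and version != 1:
        # MPEG 2 and 2.5 Layer III frames hold 576 samples, not 1152.
        coefficient = 72
    else:
        coefficient = 144
    return coefficient * bit_rate // sample_rate + padding
```

Using the MPEG 1 coefficient for a version 2 or 2.5 Layer III stream doubles the computed size, which makes a parser lose frame synchronization.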
Other parser improvements:
  • Photoshop metadata: fix header, "reserved" is a string not four nul bytes
  • Bitmap: support version 4
  • PNG: add background color parser
  • Sun/NeXT audio: add more codec description
  • Matroska video container: add ISO 639-2 language names
  • EXT2 file system: use bits for file mode (instead of 16-bit integer)
Developer changes:
  • Split run_testcase.py in three: download_testcase.py, run_testcase.py for hachoir-parser and run_testcase.py for hachoir-metadata
  • Update for hachoir-core 0.7:
    • Use NullBits/NullBytes for nul padding
    • Rename _createDescription() to createDescription()
    • Rename _createValue() to createValue()
  • Create function parseStream() to parse a stream
  • Palette is now PaletteRGB and is based on UserVector class
  • New Parser class based on the simple Parser class from hachoir-core

What's new in hachoir-parser 0.6?

News of version 0.6.2:

  • Fix Microsoft Office parser: misuse of new array() function
  • Fix SECT.display attribute (convert integer to string)

News of version 0.6.1:

  • Fix EXIF parser: SubFile import was missing

News of version 0.6:

  • hachoir-parser is now a separate component, so it's easier to release new versions and ship small bugfixes
  • New parsers:
    • 3DO model (by Cyril Zorin)
    • Abstract Syntax Notation One (ASN.1)
    • MPEG video
    • Spider-Man video (by Mike Melanson)
    • Tcpdump: Ethernet, IPv4, ARP, ICMP, TCP, UDP
    • TIFF image
    • ZSNES save (by Jason Gorski)
  • Better parsers:
    • MPEG audio: support padding between frames, better file validation, and guess if bit rate is constant (CBR) or variable (VBR)
    • Python PYC: rewritten from scratch, now supports Python 1.5 to 2.5
    • ID3v2: support picture in v2.3.0, safer charset code
  • Many small bugfixes in ID3, MPEG audio and other parsers

Since hachoir core 0.6 is able to "autofix" more bugs, hachoir-parser 0.6 is even stronger.

Parser list

Archive

  • 7zip: Compressed archive in 7z format
  • ace: ACE archive
  • bzip2: bzip2 archive
  • cab: Microsoft Cabinet archive
  • gzip: gzip archive
  • mar: Microsoft Archive
  • rar: Roshal archive (RAR)
  • rpm: RPM package
  • tar: TAR archive
  • unix_archive: Unix archive
  • zip: ZIP archive

Audio

  • aiff: Audio Interchange File Format (AIFF)
  • fasttracker2: FastTracker2 module
  • itunesdb: iPod iTunesDB file
  • midi: MIDI audio
  • mod: Uncompressed amiga module
  • mpeg_audio: MPEG audio version 1, 2, 2.5
  • ptm: PolyTracker module (v1.17)
  • real_audio: Real audio (.ra)
  • s3m: ScreamTracker3 module
  • sun_next_snd: Sun/NeXT audio

Container

  • asn1: Abstract Syntax Notation One (ASN.1)
  • matroska: Matroska multimedia container
  • ogg: Ogg multimedia container
  • ogg_stream: Ogg logical stream
  • real_media: RealMedia (rm) Container File
  • riff: Microsoft RIFF container
  • swf: Macromedia Flash data

File System

  • ext2: EXT2/EXT3 file system
  • fat12: FAT12 filesystem
  • fat16: FAT16 filesystem
  • fat32: FAT32 filesystem
  • iso9660: ISO 9660 file system
  • linux_swap: Linux swap file
  • msdos_harddrive: MS-DOS hard drive with Master Boot Record (MBR)
  • ntfs: NTFS file system
  • reiserfs: ReiserFS file system

Game

  • lucasarts_font: LucasArts Font
  • spiderman_video: The Amazing Spider-Man vs. The Kingpin (Sega CD) FMV video
  • zsnes: ZSNES Save State File (only version 143)

Image

  • bmp: Microsoft bitmap (BMP) picture
  • gif: GIF picture
  • ico: Microsoft Windows icon or cursor
  • jpeg: JPEG picture
  • pcx: PC Paintbrush (PCX) picture
  • png: Portable Network Graphics (PNG) picture
  • psd: Photoshop (PSD) picture
  • targa: Truevision Targa Graphic (TGA)
  • tiff: TIFF picture
  • wmf: Microsoft Windows Metafile (WMF)
  • xcf: Gimp (XCF) picture

Misc

  • 3do: renderdroid 3d model.
  • 3ds: 3D Studio Max model
  • chm: Microsoft's HTML Help (.chm)
  • hlp: Microsoft Windows Help (HLP)
  • lnk: Windows Shortcut (.lnk)
  • ole2: Microsoft Office document
  • pcf: X11 Portable Compiled Font (pcf)
  • pdf: Portable Document Format (PDF) document
  • tcpdump: Tcpdump file (network)
  • torrent: Torrent metainfo file
  • ttf: TrueType font

Program

  • elf: ELF Unix/BSD program/library
  • exe: Microsoft Windows Portable Executable
  • java_class: Compiled Java class
  • pifv: EFI Platform Initialization Firmware Volume
  • python: Compiled Python script (.pyc/.pyo files)

Video

  • asf: Advanced Streaming Format (ASF), used for WMV (video) and WMA (audio)
  • flv: Macromedia Flash video
  • mov: Apple QuickTime movie
  • mpeg_ts: MPEG-2 Transport Stream
  • mpeg_video: MPEG video, version 1 or 2

Total: 72 parsers

File                                Type        Py Version  Size   # downloads
hachoir-parser-1.1.tar.gz (md5)     Source                  325KB  659
hachoir_parser-1.1-py2.5.egg (md5)  Python Egg  2.5         888KB  51
hachoir_parser-1.1-py2.4.egg (md5)  Python Egg  2.4         897KB  93