hachoir-parser 1.1
Package of Hachoir parsers used to open binary files
hachoir-parser is a package of parsers for the most common file formats, written for the Hachoir framework. Not all parsers are complete: some are very good, while others are poor and, for example, only parse the first level of the tree.
A perfect parser has no "raw" field: with a perfect parser you can know the meaning of each bit. Some good (but not perfect ;-)) parsers:
- Matroska video
- Microsoft RIFF (AVI video, WAV audio, CDA file)
- PNG picture
- TAR and ZIP archive
Website: http://hachoir.org/wiki/hachoir-parser
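The notion of a "perfect" parser can be made concrete by measuring how many bits end up in non-raw fields. The sketch below is only an illustration: `Field` and `described_ratio()` are hypothetical names, not part of the Hachoir API, and the field sizes are made up.

```python
from collections import namedtuple

# Hypothetical stand-in for a parsed field; this is NOT the Hachoir Field class.
Field = namedtuple("Field", "name size_bits is_raw")


def described_ratio(fields):
    """Return the fraction of bits that are covered by non-raw fields."""
    total = sum(field.size_bits for field in fields)
    if total == 0:
        return 0.0
    described = sum(field.size_bits for field in fields if not field.is_raw)
    return described / total


# Made-up field list: a parser that only describes the first level of the tree
# leaves most of the file in a single raw field.
partial = [
    Field("signature", 64, False),
    Field("header", 200, False),
    Field("raw[0]", 536, True),
]
# A "perfect" parser in the sense used above: no raw field at all.
perfect = [
    Field("signature", 64, False),
    Field("header", 200, False),
    Field("data", 536, False),
]

partial_ratio = described_ratio(partial)
perfect_ratio = described_ratio(perfect)
```

With this measure, a perfect parser scores 1.0 and any raw field lowers the score.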
What's new in hachoir-parser 1.1?
Main changes: add "EFI Platform Initialization Firmware Volume" (PIFV) and "Microsoft Windows Help" (HLP) parsers. Details:
- MPEG audio:
  - add createContentSize() to support hachoir-subfile
  - support file starting with ID3v1
  - if file doesn't contain any frame, use ID3v1 or ID3v2 to create the description
- EXIF:
  - use "count" field value
  - create RationalInt32 and RationalUInt32
  - fix for empty value
  - add GPS tags
- JPEG:
  - support Ducky (APP12) chunk
  - support Comment chunk
  - improve validate(): make sure that first 3 chunk types are known
- RPM: use bzip2 or gzip handler to decompress content
- S3M: fix some parser bugs
- OLE2: reject negative block index (or special block index)
- ip2name(): catch KeyboardInterrupt and don't resolve next addresses
- ELF: support big endian
- PE: createContentSize() works on PE program, improve resource section detection
- AMF: stop mixed array parser on empty key
What's new in hachoir-parser 1.0?
Changes:
- OLE2: Support files bigger than 6 MB (support many DIFAT blocks)
- OLE2: Add createContentSize() to guess content size
- LNK: Improve parser (now able to parse the whole file)
- EXE PE: Add more subsystem names
- PYC: Support Python 2.5c2
- Fix many spelling mistakes
Minor changes:
- PYC: Fix long integer parser (negative number), add (disabled) code to disassemble bytecode, use self.code_info to avoid replacing self.info
- OLE2: Add ".msi" file extension
- OLE2: Fix to support documents generated on Mac
- EXIF: set max IFD entry count to 1000 (instead of 200)
- EXIF: don't limit BYTE/UNDEFINED IFD entry count
- EXIF: add "User comment" tag
- GIF: fix image and screen description
- bzip2: catch decompressor error to be able to read trailing data
- Fix file extensions of AIFF
- Windows GUID use new TimestampUUID60 field type
- RIFF: convert class constant names to upper case
- Fix RIFF: don't replace self.info method
- ISO9660: Write parser for terminator content
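The bzip2 item above (reading trailing data after the compressed stream) can be illustrated with the Python standard library alone. This is a minimal sketch, not the hachoir-parser code: `split_bzip2()` is a hypothetical helper written for Python 3, and it relies on `bz2.BZ2Decompressor.unused_data` to recover the bytes that follow the end of the stream.

```python
import bz2


def split_bzip2(data):
    """Return (content, trailing) for a bzip2 stream followed by extra bytes.

    If the stream is corrupt, content is None and all input bytes are
    returned as trailing data, so the caller can still inspect them.
    """
    decompressor = bz2.BZ2Decompressor()
    try:
        content = decompressor.decompress(data)
    except OSError:
        # The decompressor rejected the data: keep the raw bytes readable.
        return None, data
    # Bytes found after the end-of-stream marker are kept in unused_data.
    return content, decompressor.unused_data


payload = bz2.compress(b"hello") + b"TRAILER"
content, trailing = split_bzip2(payload)
broken_content, broken_trailing = split_bzip2(b"not bzip2 data")
```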
What's new in hachoir-parser 0.10?
New parsers:
- Microsoft Archive parser (.mar)
- Microsoft Windows animated icon (.ani): based on RIFF file format
- Microsoft's HTML Help (.chm)
- Windows Shortcut (.lnk)
- X11 Portable Compiled Font (pcf)
- Adobe Portable Document Format (PDF)
Major changes:
- Convert many constants to Unicode
- Set charset to ISO-8859-1 for many strings with no charset. Examples: filename in gzip, strings in ID3v1
- MIME type is now in Unicode
- Timestamps are stored as datetime.datetime() objects
- Add MAC48_Address and NIC24 parser
- Add IEEE 24-bit Organizationally unique identifiers list
Changes:
- Disable QueryParser fallback feature
- QueryParser accepts "class" tag
- Split Parser in HachoirParser and Parser classes
- OLE2:
  - Rewrite most of the code using SeekableFieldSet
  - Support FAT block chain
  - Able to parse fragmented streams
  - Add parser for component object and document summary
- MKV: add method to convert date value to datetime.datetime() object
- OGG: validate() checks magic string
- Write PascalStringWin32 class
- Add Win32 LANGUAGE_ID dictionary
- Rewrite GUID class using RFC 4122:
  - Supports different GUID format versions
  - Able to read timestamp
  - Able to read network address
- iTunesDB: support sort index type and playlist
- BMP: move code to parse image data into a separate function, so the code can be reused; fix magic regex (reserved may not be nul)
- EXIF/TIFF: reject IFD entry with more than 300 values
- MPEG audio:
  - Frame.isValid() also checks sync field
  - Add getNbChannel() method
  - findSyncrhonizeBits() uses stronger validation to avoid false positive
  - validate() checks first field name and not just if stream starts with bytes "ID3"
- RIFF: text: truncate to nul byte and use ISO-8859-1 charset
- JPEG: reject invalid component id or quantization index (instead of using a warning message)
- JPEG: support all sorts of start of scan (especially progressive JPEG)
- JPEG: add magic string of JPEG starting with Adobe chunk
- Photoshop metadata: add parser for version information
- PNG: add method to get number of bits per pixel and do not format timestamp value
- PNG: support transparency color
- TTF: Reject chunk with more than 300 names
- EXE: Reject PE program with more than 50 sections
- EXE resource:
  - PE_Resource now uses SeekableFieldSet
  - Parse file flags
  - Read file subtype (for driver or font)
  - Reject header with more than 300 entries
  - Stop parser at depth 5
  - Write version information parser for NE program
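For the GUID item above, the timestamp and network address of a time-based (version 1) GUID can be read with the standard `uuid` module. The sketch below is not the hachoir-parser GUID class: `uuid1_datetime()` and `uuid1_mac()` are hypothetical helper names, and the sample GUID is constructed for the example.

```python
import uuid
from datetime import datetime, timedelta

# RFC 4122 timestamps count 100-nanosecond intervals since 1582-10-15.
UUID_EPOCH = datetime(1582, 10, 15)


def uuid1_datetime(guid):
    """Convert the 60-bit timestamp of a version 1 UUID to a datetime."""
    if guid.version != 1:
        raise ValueError("not a time-based (version 1) UUID")
    # guid.time is expressed in 100 ns units; 10 units make one microsecond.
    return UUID_EPOCH + timedelta(microseconds=guid.time // 10)


def uuid1_mac(guid):
    """Format the 48-bit node field as a MAC address string."""
    node = guid.node
    return ":".join("%02x" % ((node >> shift) & 0xFF)
                    for shift in range(40, -8, -8))


# Sample value built for this example: its timestamp field encodes 1970-01-01.
sample = uuid.UUID("13814000-1dd2-11b2-8000-0123456789ab")
when = uuid1_datetime(sample)
mac = uuid1_mac(sample)
```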
Minor changes:
- GIF: replace image marker warning with a parser error
- IPTC: use charset UTF-8 and not ISO-8859-15
- CAB: validate() rejects file with more than 30 folders and fix misuse of seekBit()
- AU: fix end padding size
What's new in hachoir-parser 0.9?
New parsers:
- ACE, CAB, RAR, MOD, S3M, XM, PSD, Torrent, TTF, PDF, NE, MPEG TS
Changes:
- Add unique identifier and category to each parser
- Use tags to choose the right parser
- Create ParserList and QueryParser classes
- Support magic string as regex ('magic_regex')
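The idea of choosing a parser from a magic string, either a fixed byte string or a regex, can be sketched as follows. The signatures are the well-known ones for each format, but the tables and `guess_parser()` are hypothetical and do not reproduce the ParserList or QueryParser classes.

```python
import re

# Fixed magic strings expected at offset 0 (illustrative table only).
MAGIC = {
    "png": b"\x89PNG\r\n\x1a\n",
    "gzip": b"\x1f\x8b",
    "zip": b"PK\x03\x04",
}

# Magic expressed as a regex: matches both GIF87a and GIF89a.
MAGIC_REGEX = {
    "gif": re.compile(rb"GIF8[79]a"),
}


def guess_parser(data):
    """Return the identifier of the first parser whose magic matches, or None."""
    for name, magic in MAGIC.items():
        if data.startswith(magic):
            return name
    for name, regex in MAGIC_REGEX.items():
        # match() anchors the regex at the start of the data.
        if regex.match(data):
            return name
    return None


gif_guess = guess_parser(b"GIF89a\x01\x00\x01\x00")
gzip_guess = guess_parser(b"\x1f\x8b\x08\x00")
unknown_guess = guess_parser(b"hello world")
```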
Improved parsers:
- 7-zip: parse a lot of headers, just not start and signature headers
- ZIP: support file without file size, support 64-bit structures
- Ogg: support "video" chunk and add function to get last page
What's new in hachoir-parser 0.8.1?
- New features:
- Rewrite setup.py: uses distutils by default (instead of setuptools), doesn't depend on hachoir-core
- ICO parser: fixes to support cursors
- Parsers use the new HACHOIR_ERRORS constant
- Bugfixes:
- gzip: fix magic string
- XCF: remove useless exceptions
- RIFF: fix fourcc handler (when fourcc is a string and not Unicode)
- FAT: catch ValueError when using string index() method
- ASF: don't create empty fields and validate() checks header minimum size
- EXE: validate() checks size_mod_512 in MSDOS header, add method to compute content size of MSDOS executable (not PE)
What's new in hachoir-parser 0.8?
- New parsers:
- 7-zip archive
- Aldus Placeable Metafile (APM), variant of WMF
- Audio Interchange File Format (AIFF)
- Audio Interchange File Format Compressed (AIFC)
- Linux swap file
- LucasArts Font
- New Technology File System (NTFS)
- Microsoft Enhanced Metafile (EMF)
- Microsoft Windows Metafile (WMF)
- Musical Instrument Digital Interface (MIDI) audio file parser
- Real Audio (.ra)
- Real Media (.rm)
- Truevision Targa Graphic (TGA) picture
- New features:
- Add method to compute real content size
- Add magic string to find file start
- Add method to get file extension (file name suffix)
- Add method to choose the best MIME type
- Much better file validation, sometimes using arbitrary limits to detect invalid files. Examples: 50 MB for maximum SWF file size, 6000 pixels for maximum GIF picture width, etc.
- Changes:
- Lazy decompression for bzip2 and gzip parsers
- ZIP: add more MIME types and file extensions
- EXE: better PE detection
- Set constant name to upper case
- Always use a tuple for common file extensions
- Bitmap: add padding to pixels if needed, fix size of pixels field
- Tcpdump: display ARP layer info (if any) and reject file if link type is unknown
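The file validation item above mentions arbitrary limits such as a maximum GIF width of 6000 pixels. A minimal sketch of that kind of check, using only the standard library, could look like this. `validate_gif()` is a hypothetical function, and returning either True or an error message is a convention assumed for the sketch; the real validate() methods are not reproduced here.

```python
import struct

MAX_GIF_WIDTH = 6000  # arbitrary limit quoted in the changelog


def validate_gif(data):
    """Return True if data starts with a plausible GIF header, else an error string."""
    if len(data) < 10:
        return "Header is too short"
    if data[:6] not in (b"GIF87a", b"GIF89a"):
        return "Wrong magic"
    # The logical screen width and height are little-endian 16-bit integers.
    width, height = struct.unpack("<HH", data[6:10])
    if width > MAX_GIF_WIDTH:
        return "Invalid picture width (%u pixels)" % width
    return True


good_header = b"GIF89a" + struct.pack("<HH", 640, 480)
wide_header = b"GIF89a" + struct.pack("<HH", 7000, 480)
good_result = validate_gif(good_header)
wide_result = validate_gif(wide_header)
```

A limit like this trades a few false rejections of unusual files for far fewer false positives when scanning arbitrary data.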
What's new in hachoir-parser 0.7?
- New parsers:
- AMF metadata, used in Flash video
- Flash animation (SWF)
- Flash video (FLV)
- Java class
- Ogg/Vorbis (audio)
- Ogg/Theora (video)
- Reiser file system version 3
- Important parser improvements:
- bzip2 and gzip parsers are able to decompress files
- JPEG picture:
  - Parse quantization table and restart interval
  - Write stronger validate method
- GIF picture: support image comment, graphic control and netscape 2.0 extension
- ID3v1: support ID3 version 1.1 and 1.1b (track number and genre)
- MPEG audio:
  - Better file validation (less false positive), don't allow padding between frames anymore
  - Fix computation of frame size: now works with MPEG version 2 and 2.5
- RIFF: parse AVI and ODML headers
- Tcpdump: add parser for Unicast (layer 2)
- Other parser improvements:
- Photoshop metadata: fix header, "reserved" is a string not four nul bytes
- Bitmap: support version 4
- PNG: add background color parser
- Sun/NeXT audio: add more codec description
- Matroska video container: add ISO 639-2 language names
- EXT2 file system: use bits for file mode (instead of 16-bit integer)
- Developer changes:
- Split run_testcase.py in three: download_testcase.py, run_testcase.py for hachoir-parser and run_testcase.py for hachoir-metadata
- Update for hachoir-core 0.7:
- Use NullBits/NullBytes for nul padding
- Rename _createDescription() to createDescription()
- Rename _createValue() to createValue()
- Create function parseStream() to parse a stream
- Palette is now PaletteRGB and is based on UserVector class
- New Parser class based on the simple Parser class from hachoir-core
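The MPEG audio frame size fix listed above (MPEG version 2 and 2.5) comes down to a different coefficient per version. The sketch below uses the commonly documented formula for Layer III only and is not taken from the hachoir-parser source: `layer3_frame_size()` is a hypothetical name, and Layer I and II are deliberately left out.

```python
def layer3_frame_size(version, bit_rate, sample_rate, padding=0):
    """Return the size in bytes of an MPEG audio Layer III frame.

    version is 1, 2 or 2.5; bit_rate is in bits per second;
    sample_rate is in Hz; padding is 0 or 1 (the padding bit).
    """
    if version not in (1, 2, 2.5):
        raise ValueError("unknown MPEG version: %r" % (version,))
    # A Layer III frame holds 1152 samples in MPEG-1 and 576 samples in
    # MPEG-2 and 2.5, which gives coefficients of 144 and 72 (samples / 8).
    coefficient = 144 if version == 1 else 72
    return coefficient * bit_rate // sample_rate + padding


mpeg1_size = layer3_frame_size(1, 128000, 44100)
mpeg1_padded = layer3_frame_size(1, 128000, 44100, padding=1)
mpeg2_size = layer3_frame_size(2, 64000, 22050)
```

Using the MPEG-1 coefficient for a version 2 or 2.5 stream doubles the computed frame size, so the next frame is looked for at the wrong offset.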
What's new in hachoir-parser 0.6?
News of version 0.6.2:
- Fix Microsoft Office parser: misuse of new array() function
- Fix SECT.display attribute (convert integer to string)
News of version 0.6.1:
- Fix EXIF parser: SubFile import was missing
News of version 0.6:
- hachoir-parser is now a separate component, so it's easier to release new versions and ship small bugfixes
- New parsers:
  - 3DO model (by Cyril Zorin)
  - Abstract Syntax Notation One (ASN.1)
  - MPEG video
  - Spider-Man video (by Mike Melanson)
  - Tcpdump: Ethernet, IPv4, ARP, ICMP, TCP, UDP
  - TIFF image
  - ZSNES save (by Jason Gorski)
- Better parsers:
  - MPEG audio: support padding between frames, better file validation, and guess if bit rate is constant (CBR) or variable (VBR)
  - Python PYC: rewritten from scratch, now supports Python 1.5 to 2.5
  - ID3v2: support picture in v2.3.0, safer charset code
- Many small bugfixes in ID3, MPEG audio and other parsers
Since hachoir core 0.6 is able to "autofix" more bugs, hachoir-parser 0.6 is even stronger.
Parser list
Archive
- 7zip: Compressed archive in 7z format
- ace: ACE archive
- bzip2: bzip2 archive
- cab: Microsoft Cabinet archive
- gzip: gzip archive
- mar: Microsoft Archive
- rar: Roshal archive (RAR)
- rpm: RPM package
- tar: TAR archive
- unix_archive: Unix archive
- zip: ZIP archive
Audio
- aiff: Audio Interchange File Format (AIFF)
- fasttracker2: FastTracker2 module
- itunesdb: iPod iTunesDB file
- midi: MIDI audio
- mod: Uncompressed Amiga module
- mpeg_audio: MPEG audio version 1, 2, 2.5
- ptm: PolyTracker module (v1.17)
- real_audio: Real audio (.ra)
- s3m: ScreamTracker3 module
- sun_next_snd: Sun/NeXT audio
Container
- asn1: Abstract Syntax Notation One (ASN.1)
- matroska: Matroska multimedia container
- ogg: Ogg multimedia container
- ogg_stream: Ogg logical stream
- real_media: RealMedia (rm) Container File
- riff: Microsoft RIFF container
- swf: Macromedia Flash data
File System
- ext2: EXT2/EXT3 file system
- fat12: FAT12 filesystem
- fat16: FAT16 filesystem
- fat32: FAT32 filesystem
- iso9660: ISO 9660 file system
- linux_swap: Linux swap file
- msdos_harddrive: MS-DOS hard drive with Master Boot Record (MBR)
- ntfs: NTFS file system
- reiserfs: ReiserFS file system
Game
- lucasarts_font: LucasArts Font
- spiderman_video: The Amazing Spider-Man vs. The Kingpin (Sega CD) FMV video
- zsnes: ZSNES Save State File (only version 143)
Image
- bmp: Microsoft bitmap (BMP) picture
- gif: GIF picture
- ico: Microsoft Windows icon or cursor
- jpeg: JPEG picture
- pcx: PC Paintbrush (PCX) picture
- png: Portable Network Graphics (PNG) picture
- psd: Photoshop (PSD) picture
- targa: Truevision Targa Graphic (TGA)
- tiff: TIFF picture
- wmf: Microsoft Windows Metafile (WMF)
- xcf: Gimp (XCF) picture
Misc
- 3do: renderdroid 3d model.
- 3ds: 3D Studio Max model
- chm: Microsoft's HTML Help (.chm)
- hlp: Microsoft Windows Help (HLP)
- lnk: Windows Shortcut (.lnk)
- ole2: Microsoft Office document
- pcf: X11 Portable Compiled Font (pcf)
- pdf: Portable Document Format (PDF) document
- tcpdump: Tcpdump file (network)
- torrent: Torrent metainfo file
- ttf: TrueType font
Program
- elf: ELF Unix/BSD program/library
- exe: Microsoft Windows Portable Executable
- java_class: Compiled Java class
- pifv: EFI Platform Initialization Firmware Volume
- python: Compiled Python script (.pyc/.pyo files)
Video
- asf: Advanced Streaming Format (ASF), used for WMV (video) and WMA (audio)
- flv: Macromedia Flash video
- mov: Apple QuickTime movie
- mpeg_ts: MPEG-2 Transport Stream
- mpeg_video: MPEG video, version 1 or 2
Total: 72 parsers
| File | Type | Py Version | Size | # downloads |
|---|---|---|---|---|
| hachoir-parser-1.1.tar.gz (md5) | Source | | 325KB | 659 |
| hachoir_parser-1.1-py2.5.egg (md5) | Python Egg | 2.5 | 888KB | 51 |
| hachoir_parser-1.1-py2.4.egg (md5) | Python Egg | 2.4 | 897KB | 93 |
- Author: Hachoir team (see AUTHORS file)
- Home Page: http://hachoir.org/wiki/hachoir-parser
- Download URL: http://hachoir.org/wiki/hachoir-parser
- License: GNU GPL v2
- Package Index Owner: haypo
- DOAP record: hachoir-parser-1.1.xml
