Several utilities (chiefly Disk Utility and hdiutil) can be used on the
Macintosh to produce and manage disk images. These disk images exist in several
formats, some of which are declared obsolete (for instance the images of floppy
disks).
Disk images can be real images of real disks, that is, a plain sector-by-sector
copy of the real contents of the disk. They can also be containers used as
virtual disks, handled and managed like a real disk, but holding for instance
the contents of a specific folder.
For this reason, disk images are very often used to distribute software
packages.
The Disk Utility software can produce several kinds of empty images. The
most interesting are the so-called MBR images. Sector 0 contains a classical MBR,
with the 0xAA55 signature at the end. The system code is 0xAF. The whole image
is formatted according to the so-called floppy mode (that is, without a partition
table, with the volume header in sector 2).
It is also possible to produce an EFI/GPT image. This image uses the same MBR
scheme, but this time the system code is 0xEE. Sector 0 is followed by
a classical GUID partition table (see the Wiki page for more
information).
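As a minimal sketch of how these two layouts can be told apart, the following Python function inspects sector 0 of an uncompressed image. It assumes the standard MBR layout, which the text above does not spell out: the first partition entry starts at byte 446, its system code is the byte at offset 4 within that entry, and the 0xAA55 signature is stored little-endian at byte 510. The function name is hypothetical.

```python
import struct

SECTOR_SIZE = 512            # assumed sector size
FIRST_ENTRY_OFFSET = 446     # assumption: standard MBR layout
TYPE_OFFSET_IN_ENTRY = 4     # assumption: system code byte inside the entry


def classify_sector0(sector: bytes) -> str:
    """Classify sector 0 by its MBR signature and first system code."""
    if len(sector) < SECTOR_SIZE:
        raise ValueError("need one full 512-byte sector")
    # 0xAA55 read as a little-endian 16-bit value, i.e. bytes 0x55 0xAA.
    (signature,) = struct.unpack_from("<H", sector, 510)
    if signature != 0xAA55:
        return "no MBR signature"
    system_code = sector[FIRST_ENTRY_OFFSET + TYPE_OFFSET_IN_ENTRY]
    if system_code == 0xAF:
        return "MBR, system code 0xAF"
    if system_code == 0xEE:
        return "EFI/GPT, system code 0xEE"
    return "MBR, system code 0x%02X" % system_code
```

To use it, read the first 512 bytes of a UDRW image opened in binary mode and pass them to the function.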
On the other hand, the following list of image formats is given in the hdiutil manual page:
UDRW - UDIF read/write image
UDRO - UDIF read-only image
UDCO - UDIF ADC-compressed image
UDZO - UDIF zlib-compressed image
UDBZ - UDIF bzip2-compressed image (OS X 10.4+ only)
UFBI - UDIF entire image with MD5 checksum
UDRo - UDIF read-only (obsolete format)
UDCo - UDIF compressed (obsolete format)
UDTO - DVD/CD-R master for export
UDxx - UDIF stub image
UDSP - SPARSE (grows with content)
RdWr - NDIF read/write image (deprecated)
Rdxx - NDIF read-only image (Disk Copy 6.3.3 format)
ROCo - NDIF compressed image (deprecated)
Rken - NDIF compressed (obsolete format)
DC42 - Disk Copy 4.2 image
Apart from the last five formats, which are clearly of historical nature, the UDRW
image (UDIF read/write image) and the UDTO image (DVD/CD-R master for export)
are uncompressed formats, that is, plain images of a real or virtual disk. They can
be opened by utilities like our MacImage and can
be mounted under Linux, etc.
The UDSP format is an image in which not all of the space occupied by the disk
is allocated yet. These so-called sparse images and files are quite a different
matter and we won't go further here.
The compressed disk images use several compression schemes, from the simplest,
where sectors containing nothing but zeros are not copied to the compressed
file, to the highly sophisticated, using some of the best compression schemes,
but always at a cost: it takes time.
A good compromise must be found between the time used for the compression, the
time used for the decompression and the space taken by the image on a storage
disk.
In the compressed images, one can find three different parts: the data part, the Block Table and a binary block.
The data part and the table have a variable length. The binary block, at the very end of the file, has a fixed length of 512 bytes.
The 'koly' binary block is so called because it contains the 'koly'
signature in its first 32-bit integer. This block often contains many stray
data remnants because the buffer was not cleared before the relevant
data was written to it. However, one can easily observe that two int64 integers
hold the offset of the Block Table in the compressed file, and that the second
one is followed by another int64 integer holding the length of this table. Towards the end of the block,
one can easily spot the CRC32 of the image and, a bit further, the number of
sectors in the plain image.
I don't know the meaning of the first few int32 integers, but I think they could
be the number of partitions, the size of the sectors, etc.
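The observations above can be sketched in Python as follows. This is only an illustration under stated assumptions: the byte positions used here (216 and 224 for the Block Table offset and length, 492 for the sector count) and the big-endian byte order are not given in this article; they are taken from publicly available reverse-engineered descriptions of the format and should be checked against real images. The function name is hypothetical.

```python
import struct

KOLY_SIZE = 512  # fixed length of the trailing block, as stated in the text

# Assumed field positions inside the block (not given in the text).
XML_OFFSET_POS = 216    # int64: offset of the Block Table (XML)
XML_LENGTH_POS = 224    # int64: length of the Block Table
SECTOR_COUNT_POS = 492  # int64: number of sectors in the plain image


def read_koly(image: bytes) -> dict:
    """Extract a few fields from the 'koly' block at the end of an image."""
    block = image[-KOLY_SIZE:]
    if len(block) != KOLY_SIZE or block[:4] != b"koly":
        raise ValueError("no 'koly' block at the end of the data")
    # Fields are assumed to be stored big-endian.
    xml_offset, xml_length = struct.unpack_from(">QQ", block, XML_OFFSET_POS)
    (sector_count,) = struct.unpack_from(">Q", block, SECTOR_COUNT_POS)
    return {
        "xml_offset": xml_offset,
        "xml_length": xml_length,
        "sector_count": sector_count,
    }
```

With these values, the XML text would be the slice `image[xml_offset:xml_offset + xml_length]`. For large images it is preferable to seek to the last 512 bytes of the file rather than read the whole file into memory.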
The Block Table is stored as XML data, often called the plist file
because it uses the property-list format commonly used on the Macintosh, notably
for preference files.
It is also called the 'resource-fork' file because this string is present in the
file.
One should note that an XML data segment in a data file should not be called a
resource fork, but this is the way it is. Let's go further.
The Block Table contains several data runs, generally four, but I have already observed
a disk image with a single meaningful data run and a second empty one. Those data
runs are stored between <data> and </data> tags. In the case of a disk image made from
a Macintosh disk or folder, there are generally four data runs, also called partitions
in the XML file.
The first so-called partition contains sector 0 of the disk. The Partition
Table partition contains sector 1 and the following sectors, up to the volume
header block. The Apple_HFS partition contains the useful contents of the image.
The Apple_Free partition contains the last few free sectors of the disk image.
Those data runs are stored in the XML file as base64-encoded sequences. For more
information on the base64 encoding scheme, see for instance the
Wikipedia page.
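As a minimal sketch of this step, the following Python function collects everything stored between <data> and </data> tags and decodes it from base64. It deliberately uses a simple regular expression instead of a full plist parser, so it assumes that these tags appear only around the data runs; the function name is hypothetical.

```python
import base64
import re

# Non-greedy match of everything between <data> and </data>, across lines.
DATA_RE = re.compile(rb"<data>(.*?)</data>", re.DOTALL)


def decode_data_runs(xml: bytes) -> list:
    """Return the decoded bytes of every <data> element found in the XML."""
    runs = []
    for encoded in DATA_RE.findall(xml):
        # b64decode (default, non-validating mode) discards the line breaks
        # and tabs that usually wrap the base64 text inside the plist.
        runs.append(base64.b64decode(encoded))
    return runs
```

Each element of the returned list is one decoded data run, ready to be examined as described in the next paragraph.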
After decoding, those data runs appear as a header block of 204 bytes, beginning
with the 'mish' signature, followed by blocks of 40 bytes, one for each
compressed data segment in the Data part of the image. The structure of this
block is as follows:
This block is used to manage the data segments. The block types identified so far are as follows:
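A decoded data run can be walked through with a short Python sketch such as the one below. The 204-byte header, the 'mish' signature and the 40-byte entries come from the description above. The field order inside each entry (type, comment, first sector, sector count, offset and length in the data part), the big-endian byte order and the type codes are assumptions based on what open-source tools such as dmg2img report, and should be checked against real images. All names are hypothetical.

```python
import struct

MISH_HEADER_SIZE = 204   # from the text
ENTRY_SIZE = 40          # from the text

# Assumed entry layout: two uint32 followed by four uint64, big-endian.
ENTRY = struct.Struct(">IIQQQQ")

# Assumed type codes, as reported by open-source tools.
KINDS = {
    0x00000000: "zero-fill",
    0x00000001: "raw",
    0x80000004: "adc",
    0x80000005: "zlib",
    0x80000006: "bzip2",
    0xFFFFFFFF: "end",
}


def parse_mish(run: bytes) -> list:
    """List the 40-byte entries that follow the 204-byte 'mish' header."""
    if run[:4] != b"mish":
        raise ValueError("missing 'mish' signature")
    entries = []
    last_start = len(run) - ENTRY_SIZE
    for pos in range(MISH_HEADER_SIZE, last_start + 1, ENTRY_SIZE):
        kind, _comment, sector, count, offset, length = ENTRY.unpack_from(run, pos)
        entries.append({
            "kind": KINDS.get(kind, hex(kind)),  # unknown codes shown in hex
            "first_sector": sector,
            "sector_count": count,
            "data_offset": offset,
            "data_length": length,
        })
    return entries
```

Under these assumptions, a "zlib" or "bzip2" entry could then be expanded with the standard `zlib.decompress` or `bz2.decompress` functions applied to the `data_length` bytes found at `data_offset` in the data part.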
To gather, analyze and pin down the information given above, we mainly spent
time looking at real images produced by Disk Utility and hdiutil. This allowed
us to identify the Zlib and Bzlib compression schemes rather quickly and to
find libraries which implement those schemes, namely Zlib.net, a
library written by Jean-Loup Gailly and Mark Adler, and
Bzlib.org, a library written by Julian Seward.
I want to thank them warmly for their work and for all the information given in their
code.
Several welcome confirmations and some further details were found in the libraries
dmg2iso and dmg2img, distributed on the site vultur.eu.org
and written by Vu1tur and Jean-Pierre Demailly.
The hardest and longest part was decoding the ADC compression scheme.
It seems that no official documentation is available on the Web. Nevertheless,
I managed to write working code in a couple of days.
The ADC (Apple Data Compression) scheme relies on both run-length coding and references to data in a sliding dictionary. The best way to explain the scheme is to express it as pseudo-code:
Read a byte.
If bit 7 is set, this is a data run, whose length is the rest of the bits, plus one. Copy that many bytes to the target buffer.
If bit 6 is set, this is a three-byte code. Strip bit 6 and add 4 to get the length. The following two bytes encode the offset of the data to be used. This offset is counted backwards from the target pointer. Set an offset pointer to this address. If the distance between the offset pointer and the target pointer is large enough to hold the data length, just do a plain copy from the offset pointer to the target pointer. If not, use memset or the like to copy a single byte n times at the target pointer.
If neither bit is set, this is a two-byte code. The length is coded in bits 2345 and the offset is coded in bits 01 of the first byte and in the other byte. Add 3 to the length (that is, 0000 codes 3, 0001 codes 4, and so on). As with the three-byte codes, depending on the distance between the offset pointer and the target pointer, use memcpy or memset.
Repeat as long as there is still data to decode.
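The pseudo-code above can be sketched in Python as follows. This is an illustration, not the code used in MacImage. It makes three assumptions that the description does not state: the stored offset is the backward distance minus one (as open-source decoders such as dmg2img do), the two offset bytes of a three-byte code are in big-endian order, and the bits 01 of a two-byte code are the high bits of its offset. Instead of choosing between memcpy and memset, it copies byte by byte, which covers both cases, since a source range that overlaps the target simply repeats the bytes just written. The function name is hypothetical.

```python
def adc_decompress(src: bytes) -> bytes:
    """Decode an ADC-compressed buffer (sketch under the stated assumptions)."""
    out = bytearray()
    i = 0
    while i < len(src):
        code = src[i]
        if code & 0x80:
            # Data run: the low 7 bits plus one give the number of literal bytes.
            length = (code & 0x7F) + 1
            out += src[i + 1:i + 1 + length]
            i += 1 + length
            continue
        if code & 0x40:
            # Three-byte code: 6-bit length plus 4, 16-bit offset.
            length = (code & 0x3F) + 4
            offset = (src[i + 1] << 8) | src[i + 2]
            i += 3
        else:
            # Two-byte code: bits 2-5 plus 3 give the length, bits 0-1 and
            # the next byte give the offset.
            length = ((code >> 2) & 0x0F) + 3
            offset = ((code & 0x03) << 8) | src[i + 1]
            i += 2
        start = len(out) - offset - 1  # assumption: distance = offset + 1
        if start < 0:
            raise ValueError("offset points before the start of the output")
        for k in range(length):
            # Byte-by-byte copy; an overlapping source repeats recent output.
            out.append(out[start + k])
    return bytes(out)
```

For example, under these assumptions the five input bytes 0x82 'a' 'b' 'c' followed by the two-byte code 0x00 0x02 decode to "abcabc": three literal bytes, then three bytes copied from three positions back.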
Our MacImage utility can manage such image files (decompress them, display their content to copy some files, etc.).