As research for a project (-cough- Delicious Library 2 -cough-) on which I'm slaving away, night and day, I did some timing tests to see how Apple's implementation of JPEG2000 performed, especially when creating a thumbnail image from a larger image (as compared to JPEG, which I've used a lot).
As background, I should mention that Delicious Library (1.x) currently stores three separate JPEGs for each cover image it downloads from Amazon, pre-rendered (with gloss) and downsampled into three different sizes (small, medium, large). (I store the downsampled images because anti-aliased downsampling turned out to be very expensive.) Every time we draw a cover, we actually draw directly from its appropriately-sized JPEG data, instead of keeping a cached image around, because caching the unpacked images was blowing up memory. (A library of 2,000 items with uncompressed covers would easily take up more than 1GB of RAM, and we'd start thrashing like mad.) The real beauty of this scheme was that since the images on disk were JPEGs and we were drawing directly from the data on disk (without changing it in any way), if we ran out of physical memory and some of our textures got paged out, they didn't have to be written to the swap space on disk; Mach is smart enough to realize that if it has a page it mapped directly from a file on disk, there's no point in writing that page out to the swapfile; it can always read the page from the original file again when it needs to. Since writing to disk is the single slowest thing a program can do, this is a HUGE speed win (if you've got more textures than you have memory).
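That file-backed paging trick is just plain mmap() under the hood: a read-only, file-backed mapping gives you pages the kernel can simply drop and re-read later instead of swapping out. A minimal POSIX sketch (the name map_file is mine, not anything from Delicious Library):

```c
#include <assert.h>
#include <fcntl.h>
#include <stdio.h>
#include <string.h>
#include <sys/mman.h>
#include <unistd.h>

/* Map an entire file read-only. Pages are demand-paged in on first touch,
   and since they're clean and file-backed, the VM system can evict them
   without ever writing to swap. Returns NULL on failure. */
static const char *map_file(const char *path, size_t *len_out)
{
    int fd = open(path, O_RDONLY);
    if (fd < 0)
        return NULL;

    off_t len = lseek(fd, 0, SEEK_END);
    if (len <= 0) {
        close(fd);
        return NULL;
    }

    const char *p = mmap(NULL, (size_t)len, PROT_READ, MAP_PRIVATE, fd, 0);
    close(fd); /* the mapping stays valid after the descriptor is closed */
    if (p == MAP_FAILED)
        return NULL;

    *len_out = (size_t)len;
    return p; /* caller munmap()s when done */
}
```

Touching only the first few bytes of the returned pointer physically reads only the first page or two off disk — which is exactly the behavior the progressive-refinement idea below wants to exploit.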
However, keeping all those fiddly images (small, medium, and large) also increases our memory footprint, not to mention taking up a fair chunk of extra disk space. Wouldn't it be nice if we could just store ONE, full-size image, and then just decompress the first part of it for small images, more of it for medium, and the whole thing for large images? (Progressive refinement, as it were.)
Well, as it happens, this is part of what JPEG2000 was born to do (much like Kris and "warming it up"). Not only is JPEG2000 more efficient than JPEG (e.g., prettier files are smaller), but you can also create a beautiful low-res version of a JPEG2000 file by reading only its first few bytes, and as you read more you can construct larger and larger images, until, when you've read the last byte, you can reconstruct the full-size image. This would be really convenient with Mac OS X's mapped files ([NSData dataWithContentsOfMappedFile:]), because you can map in the entire JPEG2000 file safe in the knowledge that only the first couple of pages are going to be physically read off the disk if you just want a thumbnail (because mapped files lazily demand-page themselves into memory as they are accessed). (And, remember, disk access is the slowest thing your program can do, so less disk access is better.)
And, conveniently, Mac OS X 10.4 ("Tiger") has a new framework for reading and writing images, and it understands JPEG2000 natively (based on the commercial, C++ Kakadu library). And, in this new Apple framework, there's a function CGImageSourceCreateThumbnailAtIndex() where you can create a thumbnail of an image you haven't read in yet, by specifying a maximum side length (kCGImageSourceThumbnailMaxPixelSize). This would be EXACTLY the kind of call in which one would implement the partial reading of JPEG2000s in order to quickly read in low-rez versions of high-rez files.
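In case you haven't used it, a sketch of what that call looks like in practice (CreateThumbnail is my own wrapper name; kCGImageSourceCreateThumbnailFromImageAlways asks ImageIO to build the thumbnail from the actual image data rather than grabbing any embedded thumbnail). Mac-only, obviously — compile with -framework ApplicationServices:

```c
#include <ApplicationServices/ApplicationServices.h>

/* Create a thumbnail whose longer side is at most maxSide pixels,
   without first decoding the full image ourselves. Caller owns the
   returned CGImageRef (CGImageRelease() it when done). */
CGImageRef CreateThumbnail(CFURLRef url, int maxSide)
{
    CGImageSourceRef source = CGImageSourceCreateWithURL(url, NULL);
    if (!source)
        return NULL;

    CFNumberRef maxSize = CFNumberCreate(NULL, kCFNumberIntType, &maxSide);
    const void *keys[] = { kCGImageSourceCreateThumbnailFromImageAlways,
                           kCGImageSourceThumbnailMaxPixelSize };
    const void *values[] = { kCFBooleanTrue, maxSize };
    CFDictionaryRef options = CFDictionaryCreate(NULL, keys, values, 2,
        &kCFTypeDictionaryKeyCallBacks, &kCFTypeDictionaryValueCallBacks);

    /* Index 0 = first image in the file (multi-image formats exist). */
    CGImageRef thumbnail = CGImageSourceCreateThumbnailAtIndex(source, 0, options);

    CFRelease(options);
    CFRelease(maxSize);
    CFRelease(source);
    return thumbnail;
}
```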
At least, that's the theory. Because I'm a skeptic, I ran timing tests, and it turns out JPEG2000 is, well... you've already read the title of this post. It's slow. 10x slower than JPEG for a standard decode.
But, that's not all. It turns out that exactly the OPPOSITE of the optimization I thought would happen is happening; with JPEG files, smaller thumbnails are actually MUCH faster to create than large ones, whereas with JPEG2000 files, creating a thumbnail takes about the same amount of time no matter what size it is. That is, Apple has optimized their JPEG decoder to create thumbnails (and done a GREAT job), but they haven't done so with their JPEG2000 decoder (or they did, but did a REALLY poor job; I prefer to think they just haven't tried yet).
The timing tests below were done with a 1292x1319 RGB image. JPEG2000 and JPEG files were created at full resolution with varying levels of compression to try to be comparable in file size (final file sizes given below). Images were unpacked using CGImageSourceCreateThumbnailAtIndex() with the specified maximum side length (kCGImageSourceThumbnailMaxPixelSize) and then CFRelease()'d, 100 times for each test.
[Graphs courtesy of Mike Lee.]
So, good news, bad news. The good news is, with the new 10.4 image reading framework (CGImageSource...), I can read JPEGs and make beautiful, anti-aliased smaller versions of them EXTREMELY quickly compared to 10.3. Under 10.3, I had to read in the whole JPEG, decompress the whole thing into a full-size buffer, create a small thumbnail buffer, and downsample the whole thing. Obviously, there are a lot of slow parts in that. 10.4's CGImageSourceCreateThumbnailAtIndex() is a godsend in this case -- I can store one large JPEG per book/DVD/whatever and use only the bit of the JPEG that I need for a given size.
The bad news is, right now, JPEG2000 is just not fast enough to use in real time. I've heard that, in general, it's just a slow format, so I'm not bashing Apple here; their implementation may be 40x faster than anyone else's, for all I know. But it's still 10-20x as slow as their JPEG decompression for thumbnails, and that's what matters to me in this case. (But, for example, if I were distributing an online game and were going to decompress my graphic assets only once, I'd use JPEG2000, since it's higher quality and smaller and has full alpha.)
I ran various other tests on JPEG2000s to see if the data I got was anomalous, but apparently it is not. One interesting note for me was that more compressed JPEG2000s don't decompress faster than less compressed JPEG2000s at the same image size. Who knew? Now you do.
Followup (9/19): The news gets even better, as it turns out 10.4's CGImageSourceCreateThumbnailAtIndex() is 3-7x faster at creating JPEG thumbnails than the method I had to use under 10.3: creating a CGImageRef using CGImageCreateWithJPEGDataProvider() and then drawing it (with antialiasing turned on) into a smaller window.
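For reference, the 10.3-era path looked roughly like this (my sketch, my function name; compile with -framework ApplicationServices). Note that step one decodes the ENTIRE image, which is exactly the work CGImageSourceCreateThumbnailAtIndex() gets to skip:

```c
#include <ApplicationServices/ApplicationServices.h>

/* Decode a whole JPEG, then draw it antialiased into a smaller bitmap.
   Caller owns the returned CGImageRef. */
CGImageRef CreateDownsampledJPEG(CFURLRef url, size_t width, size_t height)
{
    CGDataProviderRef provider = CGDataProviderCreateWithURL(url);
    if (!provider)
        return NULL;

    /* Decompresses the full-size image -- the slow part. */
    CGImageRef full = CGImageCreateWithJPEGDataProvider(provider, NULL, true,
                                                        kCGRenderingIntentDefault);
    CGDataProviderRelease(provider);
    if (!full)
        return NULL;

    CGColorSpaceRef rgb = CGColorSpaceCreateDeviceRGB();
    CGContextRef ctx = CGBitmapContextCreate(NULL, width, height, 8, width * 4,
                                             rgb, kCGImageAlphaPremultipliedFirst);
    CGContextSetInterpolationQuality(ctx, kCGInterpolationHigh);
    CGContextDrawImage(ctx, CGRectMake(0, 0, width, height), full);

    CGImageRef small = CGBitmapContextCreateImage(ctx);
    CGContextRelease(ctx);
    CGColorSpaceRelease(rgb);
    CGImageRelease(full);
    return small;
}
```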
How often do you get to speed up your most time-critical portion of code by 3-7x? Go Apple!