September 18, 2005

JPEG2000: Cool but SLOW. [OBSOLETE on 10.5]

[October 29, 2007 - Note this post was written for 10.4. Under 10.5, our tests indicate JPEG2000 is now actually faster than JPEG to decode at any size; sometimes a TON faster. This is awesome news. Most of the following post is now just plain wrong, but I'm leaving it for posterity.]

--

As research for a project (-cough- Delicious Library 2 -cough-) on which I'm slaving away, night and day, I did some timing tests to see how Apple's implementation of JPEG2000 performed, especially when creating a thumbnail image from a larger image (as compared to JPEG, which I've used a lot).

As background, I should mention that Delicious Library (1.x) currently stores three separate JPEGs for each cover image it downloads from Amazon, pre-rendered (with gloss) and downsampled into three different sizes (small, medium, large). (I store the downsampled images because anti-aliased downsampling turned out to be very expensive.) Every time we draw a cover, we actually draw directly from its appropriately-sized JPEG data, instead of keeping a cached image around, because caching the unpacked images was blowing up memory. (A library of 2,000 items with uncompressed covers would easily take up more than 1GB of RAM, and we'd start thrashing like mad.) The real beauty of this scheme was that since the images on disk were JPEGs and we were drawing directly from the data on disk (without changing it in any way), if we ran out of physical memory and some of our textures got paged out, they didn't have to be written to the swap space on disk; Mach is smart enough to realize that if it has a page it mapped directly from a file on disk, there's no point in writing that page out to the swapfile; it can always read the page from the original file again when it needs to. Since writing to disk is the single slowest thing a program can do, this is a HUGE speed win (if you've got more textures than you have memory).
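
(For the curious, the drawing path looks roughly like the sketch below; the names coverPath, coverRect, and context are made up for illustration:)

    // Map the JPEG file; the compressed bytes stay file-backed, so Mach can
    // toss these pages under memory pressure and re-read them later for free.
    NSData *jpegData = [NSData dataWithContentsOfMappedFile:coverPath];

    // Wrap the mapped bytes in a data provider. (The NSData has to outlive
    // the provider, since we're not copying the bytes.)
    CGDataProviderRef provider = CGDataProviderCreateWithData(NULL,
        [jpegData bytes], [jpegData length], NULL);

    // Create a CGImage straight from the compressed JPEG data and draw it.
    CGImageRef cover = CGImageCreateWithJPEGDataProvider(provider, NULL,
        true, kCGRenderingIntentDefault);
    CGContextDrawImage(context, coverRect, cover);

    CGImageRelease(cover);
    CGDataProviderRelease(provider);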

However, keeping all those fiddly images (small, medium, and large) also increases our memory footprint, not to mention taking up a fair chunk of extra disk space. Wouldn't it be nice if we could just store ONE, full-size image, and then decompress just the first part of it for small images, more of it for medium, and the whole thing for large images? (Progressive refinement, as it were.)

Well, as it happens, this is part of what JPEG2000 was born to do (much like Kris and "warming it up"). Not only is JPEG2000 more efficient than JPEG (e.g., equally pretty files are smaller), but you can also create a beautiful low-res version of a JPEG2000 file by reading only its first few bytes, and as you read more you can construct larger and larger images, until, when you've read the last byte, you can reconstruct the full-size image. This would be really convenient with Mac OS X's mapped files ([NSData dataWithContentsOfMappedFile:]), because you can map in the entire JPEG2000 file safe in the knowledge that only the first couple of pages are going to be physically read off the disk if you just want a thumbnail (because mapped files lazily demand-page themselves into memory as they are accessed). (And, remember, disk access is the slowest thing your program can do, so less disk access is better.)
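
(In code, the mapping is a one-liner; the file name below is made up for illustration:)

    // Map the whole JPEG2000 file. Nothing is physically read off the disk
    // yet; pages get faulted in only as the decoder actually touches them.
    NSData *jp2Data = [NSData dataWithContentsOfMappedFile:
        @"/Covers/SomeBigCover.jp2"];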

And, conveniently, Mac OS X 10.4 ("Tiger") has a new framework for reading and writing images, and it understands JPEG2000 natively (based on the commercial, C++ Kakadu library). And, in this new Apple framework, there's a function, CGImageSourceCreateThumbnailAtIndex(), with which you can create a thumbnail of an image you haven't read in yet by specifying a maximum side length (kCGImageSourceThumbnailMaxPixelSize). This would be EXACTLY the kind of call in which one would implement the partial reading of JPEG2000s in order to quickly read in low-rez versions of high-rez files.
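
(Here's roughly what that looks like, continuing from the mapped jp2Data above; a sketch only, with the 256-pixel limit as my own example value:)

    // Hand the mapped data to an image source, then ask for a thumbnail no
    // bigger than 256 pixels on its longest side, rendered from the full
    // image rather than from any embedded thumbnail.
    CGImageSourceRef source = CGImageSourceCreateWithData(
        (CFDataRef)jp2Data, NULL);
    NSDictionary *options = [NSDictionary dictionaryWithObjectsAndKeys:
        [NSNumber numberWithInt:256],
            (id)kCGImageSourceThumbnailMaxPixelSize,
        (id)kCFBooleanTrue,
            (id)kCGImageSourceCreateThumbnailFromImageAlways,
        nil];
    CGImageRef thumbnail = CGImageSourceCreateThumbnailAtIndex(source, 0,
        (CFDictionaryRef)options);
    // ... draw the thumbnail ...
    CGImageRelease(thumbnail);
    CFRelease(source);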

At least, that's the theory. Because I'm a skeptic, I ran timing tests, and it turns out JPEG2000 is, well... you've already read the title of this post. It's slow. 10x slower than JPEG for a standard decode.

But, that's not all. It turns out that exactly the OPPOSITE of the optimization I thought would happen is happening; with JPEG files smaller thumbnails are actually MUCH faster to create than large ones, whereas with JPEG2000 files creating a thumbnail takes about the same amount of time no matter what size it is. That is, Apple has optimized their JPEG decoder to create thumbnails (and done a GREAT job), but they haven't done so with their JPEG2000 decoder (or they did, but did a REALLY poor job; I prefer to think they just haven't tried yet).

The timing tests below were done with a 1292x1319 RGB image. JPEG2000 and JPEG files were created at full resolution with varying levels of compression to try to be comparable in file size (final file sizes given below). Images were unpacked using CGImageSourceCreateThumbnailAtIndex() with the specified maximum side length (kCGImageSourceThumbnailMaxPixelSize) and then CFRelease()'d, 100 times for each test.
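
(Each test was basically a loop like this; a rough sketch, with imageData and options standing in for the particular file and thumbnail size being tested:)

    // One test run: decode a fresh thumbnail from the compressed data 100
    // times, releasing everything each pass so nothing carries over between
    // iterations.
    CFAbsoluteTime start = CFAbsoluteTimeGetCurrent();
    for (unsigned i = 0; i < 100; i++) {
        CGImageSourceRef source = CGImageSourceCreateWithData(
            (CFDataRef)imageData, NULL);
        CGImageRef thumbnail = CGImageSourceCreateThumbnailAtIndex(source, 0,
            (CFDictionaryRef)options);
        CGImageRelease(thumbnail);
        CFRelease(source);
    }
    NSLog(@"100 thumbnails in %g seconds",
        CFAbsoluteTimeGetCurrent() - start);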

(Graphs courtesy of Mike Lee.)

So, good news, bad news. The good news is, with the new 10.4 image reading framework (CGImageSource...), I can read JPEGs and make beautiful, anti-aliased smaller versions of them EXTREMELY quickly compared to 10.3. Under 10.3, I had to read in the whole JPEG, decompress the whole thing into a full-size buffer, create a small thumbnail buffer, and downsample the whole thing. Obviously, there are a lot of slow parts in that. 10.4's CGImageSourceCreateThumbnailAtIndex() is a godsend in this case -- I can store one large JPEG per book/DVD/whatever and use only the bit of the JPEG that I need for a given size.

The bad news is, right now, JPEG2000 is just not fast enough to use in real time. I've heard that, in general, it's just a slow format, so I'm not bashing Apple here; their implementation may be 40x faster than anyone else's, for all I know. But it's still 10-20x as slow as their JPEG decompression for thumbnails, and that's what matters to me in this case. (But, for example, if I were distributing an online game and were going to decompress my graphic assets only once, I'd use JPEG2000, since it's higher quality and smaller and has full alpha.)


I ran various other tests on JPEG2000s to see if the data I got was anomalous, but apparently it is not. One interesting note for me was that more-compressed JPEG2000s don't decompress any faster than less-compressed JPEG2000s at the same image size. Who knew? Now you do.

--

Followup (9/19): The news gets even better, as it turns out 10.4's CGImageSourceCreateThumbnailAtIndex() is 3-7x faster at creating JPEG thumbnails than the method I had to use under 10.3: creating a CGImageRef using CGImageCreateWithJPEGDataProvider() and then drawing it (with antialiasing turned on) into a smaller window.
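
(For reference, the 10.3-era approach was roughly the sketch below: decompress the entire JPEG, then let Quartz downsample it as it draws into a smaller rect. The provider, context, and smallRect names are just placeholders:)

    // The old way: decode the WHOLE image, then draw it scaled down with
    // high-quality (antialiased) interpolation into a thumbnail-sized rect.
    CGImageRef fullImage = CGImageCreateWithJPEGDataProvider(provider, NULL,
        true, kCGRenderingIntentDefault);
    CGContextSetInterpolationQuality(context, kCGInterpolationHigh);
    CGContextDrawImage(context, smallRect, fullImage);
    CGImageRelease(fullImage);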

How often do you get to speed up your most time-critical portion of code by 3-7x? Go Apple!

15 Comments:

Anonymous pat said...

Yeah, who cares about JPEG2000... I just want more Kris Kross references.

September 18, 2005 9:24 PM

 
Anonymous pat said...

Sorry: Kris Kross

September 18, 2005 9:27 PM

 
Anonymous Anonymous said...

Wil, you're aware that this is not a feature specific to JPEG2000, right? That's just an aspect of DCT-based image storage. The Enlightenment guys created a libepeg library that allows progressive refinement for regular JPEGs. Dig it up, it looks like it's what you want. Though, of course, that's not built into OS X...

September 19, 2005 1:55 AM

 
Anonymous Anonymous said...

Wil, go down the street to LizardTech and convince them to release DjVu with a sane license scheme, and wrap that into your next test suite.

September 19, 2005 8:49 AM

 
Anonymous Anonymous said...

You have filed a bug on this, I hope?

-jcr

September 19, 2005 2:04 PM

 
Anonymous Anonymous said...

The key with JPEG-2000 and efficient multi-resolution access is that the image must be compressed with multiple iterations of the wavelet transform. For n iterations, you get n+1 resolutions, each scaled by 1/4 from the previous. You can extract these resolutions simply by truncating the file at the right point, if that is how you choose to compress it.

If you really want to see the power of JPEG-2000, check out the free demo apps on the Kakadu web site, especially the interactive and progressive delivery using JPIP. It is neat.

September 19, 2005 3:34 PM

 
Anonymous Anonymous said...

I don't believe that it is optimized. They appear to be using a relatively old version of Kakadu, and I don't think there was any AltiVec optimization in that version of the library. In Kakadu 4.5, I compressed a rather large (10,726 x 9,147 pixel) file. Using default settings on the kdu_compress tool and the CPU timing flag, the total time was almost 24 seconds. Remember that image is significantly larger than yours. The BMP version of my test file is almost 281 MB.

September 19, 2005 5:37 PM

 
Anonymous Anonymous said...

The other thing to note is that Apple has optimized the shit out of their standard JPEG decompressor. Last time I checked it was the fastest by far. With video, Photo JPEG just screams.

September 20, 2005 1:43 AM

 
Anonymous Anonymous said...

There are some major cheats you can do to speed up JPEG thumbnail generation. The most obvious one is just to read a few coefficients of each DCT block of the image to decode a lower-resolution image. Just reading the DC coefficient gives you a poor 8x downsampling, but then further downsamplings can be done more carefully to give a nice-looking small thumbnail.

Have you checked out the JasPer implementation of J2K? It is used in ImageMagick and I think there is a Fink package for it.

September 21, 2005 3:58 PM

 
Blogger Wil Shipley said...

I did check out JasPer. It has a bit of a reputation for being slow, but I wasn't able to test it myself, as it choked trying to read JPEG2Ks with alpha written by Mac OS X, _and_ I couldn't find any obvious API for easily getting quick thumbnails.

At that point my figuring was: the library Apple licensed is supposed to be _faster_ than JasPer, and yet Apple's JPEG thumbnail decompression is still 20x faster than their JPEG2Ks, so there's no point in messing further with JasPer.

-W

September 21, 2005 4:37 PM

 
Blogger The Peter Files Blog of Comedy said...

Thanks. As someone who does not have the time to test this kind of stuff, and on some levels barely understands it (and on others is with you), I really appreciated this post.

I have been using the older JPEG encoder/decoder for a long time and had not even bothered to check for an upgrade since iPhoto came out.

Now I will look for it, but not the 2000, I think. Perhaps both are worth having; since I do not design games, perhaps the time difference is negligible for me.

Thanks again!

Found you through the Blogger Buzz Page!

Peter,
The Peter Files Blog of Comedy, Satire, Jokes and Commentary

January 21, 2007 11:14 PM

 
Anonymous Anonymous said...

Is this create-thumbnail thing faster than using Core Image when you have the image data in memory?

I'm using CILanczosScaleTransform to scale full-screen captures to 640x480 pixels. It is quite fast on a G5, but much, much slower on a Mac mini or MacBook (using the on-board GPU) or older Macs.

BTW, there is a huge quality/performance difference between CILanczosScaleTransform (looks great, slow) and simple scaling with CIAffineTransform (looks like crap, fast).

February 03, 2007 5:56 AM

 
Blogger Wil Shipley said...

Is this create-thumbnail thing faster than using Core Image when you have the image data in memory?

Last I tested, yes, quite a bit faster than downsampling. Things change all the time, though, and the quality will be different.

February 04, 2007 7:20 PM

 
Blogger edward said...

Interesting post.

What I'm wondering, though, is why [NSData dataWithContentsOfMappedFile:] (which is very cool for JPEG processing, I agree) leaks buckets of memory on 10.5 GM when using 2 threads and GC? Turn off GC, or use one thread, and there's no problem.

Saw you at WWDC07 btw, well done with the award :)

October 29, 2007 10:28 AM

 
