How to make PNG encoding much faster? I'm working with large medical images, and after a bit of work we can do all the needed processing in under a second (numpy/scipy methods). But then the encoding to PNG takes 9-15 seconds. As a result we have to pre-render all possible configurations and put them on S3, because we can't do the processing on demand in a web request.

Is there a way to use multiple threads or the GPU to encode PNGs? I haven't been able to find anything. The images are 3500x3500 px and compress from roughly 50 MB to 15 MB with maximum compression (so don't say to use lower compression).
gred|4 years ago
Now, zlib specifically is focused on correctness and stability, to the point of ignoring some fairly obvious opportunities to improve performance. This has led to frustration, and that frustration has led to performance-focused zlib forks. The folks at AWS published a performance-focused survey [1] of the zlib fork landscape fairly recently. If your stack uses zlib, you may be able to find a way to swap in a different (faster) fork. If your stack does not use zlib, you may at least find a few ideas for next steps.
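Before swapping libraries, it's worth confirming that the DEFLATE step really is where the time goes. A minimal timing sketch (synthetic bytes stand in for real pixel data; the buffer size and levels here are illustrative, not from the AWS survey):

```python
import time
import zlib

# Synthetic stand-in for a raw pixel buffer (~2.5 MB of semi-repetitive
# bytes); substitute your real image data here.
raw = bytes(range(256)) * 10_000

for level in (1, 6, 9):
    t0 = time.perf_counter()
    out = zlib.compress(raw, level)
    dt = time.perf_counter() - t0
    print(f"level {level}: {len(out) / 1e6:.2f} MB in {dt * 1000:.0f} ms")

# Whatever level you pick, the stream must round-trip losslessly.
assert zlib.decompress(out) == raw
```

If most of the wall time lands inside `zlib.compress`, a faster fork (or a lower level) attacks the right bottleneck.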
[1] https://aws.amazon.com/blogs/opensource/improving-zlib-cloud...
lightcatcher|4 years ago
Although these don't directly solve the PNG encoding performance problem, maybe some of these ideas could help?
* If users will be using the app in an environment with plenty of bandwidth and you don't mind paying for server bandwidth, could you serve the PNGs with less compression? Max compression takes 15 s and saves 35 MB. If the users have a 50 Mbit connection, it only takes 5.6 s to transmit the extra 35 MB, so you could come out about 10 s ahead by not compressing. (Yes, I see your comment about "don't say to use lower compression", but there's no reason to be killed by compression CPU cost if the bandwidth is available.)
* Initially show the user a lossy image (could be a downsized PNG) that can be generated quickly. You could then upgrade to the full-quality image once you finish encoding the PNG, or, if server bandwidth/CPU usage is an issue, only upgrade when the user clicks a "high-quality" button or something. If server CPU is the constraint, the low-then-high-quality approach would also let you turn down the compression setting and save some CPU at the cost of bandwidth and user latency.
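The preview idea can be sketched with numpy and zlib (zlib standing in for the PNG IDAT compression step; the 1000x1000 uint16 frame and 4x decimation factor are illustrative assumptions):

```python
import zlib
import numpy as np

# Hypothetical grayscale frame (random uint16 as a placeholder; a real
# medical image would be 3500x3500 and far more compressible).
rng = np.random.default_rng(0)
img = rng.integers(0, 2**16, size=(1000, 1000), dtype=np.uint16)

# Quick 4x decimation: 16x fewer pixels, so the preview both encodes
# and transfers an order of magnitude faster than the full frame.
preview = img[::4, ::4].copy()

# zlib stands in for the PNG IDAT compression step here.
full_payload = zlib.compress(img.tobytes(), 1)         # fast, larger
preview_payload = zlib.compress(preview.tobytes(), 6)  # tiny either way

print(len(full_payload), len(preview_payload))
```

The preview can go out immediately while the full-quality encode runs in the background.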
minhmeoke|4 years ago
[1] https://blender.stackexchange.com/questions/148231/what-imag...
[2] https://github.com/brion/mtpng
[3] https://emmaliu.info/15418-Final-Project/
Const-me|4 years ago
Not terribly hard if you only need 1-2 formats supported, e.g. RGBA8 only. You don't need to port the complete codec, only the initial portion of the pipeline, then stream the data back from the GPU; the last step, lossless compression of the stream, isn't a good fit for GPUs.
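As a sketch of which stage is GPU-friendly: PNG's per-row filters are embarrassingly parallel, while the DEFLATE stage that follows is not. Here's the "Up" filter vectorized with numpy as a stand-in for a GPU kernel (assumes 8-bit grayscale):

```python
import numpy as np

def up_filter(img: np.ndarray) -> np.ndarray:
    """PNG 'Up' filter: each row minus the row above, modulo 256.

    Each output row depends on only two input rows, so this stage is
    embarrassingly parallel -- the part worth moving to a GPU. The
    DEFLATE stage that follows is sequential and stays on the CPU.
    """
    out = img.copy()
    out[1:] = img[1:] - img[:-1]  # uint8 arithmetic wraps mod 256
    return out

def up_unfilter(filtered: np.ndarray) -> np.ndarray:
    # Inverse: cumulative sum down the rows, again mod 256.
    return np.cumsum(filtered, axis=0, dtype=np.int64).astype(np.uint8)

rng = np.random.default_rng(1)
img = rng.integers(0, 256, size=(512, 512), dtype=np.uint8)
assert np.array_equal(up_unfilter(up_filter(img)), img)
```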
If you want the code to run on a web server, then after you've debugged the encoder your next problem is where to deploy it. NVidia Teslas are frickin' expensive. If you want to run on public clouds, I'd consider their VMs with AMD GPUs.
physicles|4 years ago
Here's what that custom format might look like.
(I'm guessing these images are grayscale, so the "raw" format is uint16 or uint32.)
First, take the raw data and delta encode it. This is similar to PNG's concept of "filters" -- little processors that massage the data a bit to make it more compressible. Then, since most of the compression algorithms operate on unsigned ints, you'll need to apply zigzag encoding (this is superior to allowing integer underflow, as benchmarks will show).
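A minimal sketch of the delta + zigzag step for uint16 grayscale samples (numpy; the function names are my own):

```python
import numpy as np

def delta_zigzag(samples: np.ndarray) -> np.ndarray:
    """Delta-encode uint16 samples, then zigzag-map the signed deltas
    onto unsigned ints (0, -1, 1, -2, ... -> 0, 1, 2, 3, ...) so small
    magnitudes get small codes."""
    s = samples.view(np.int16)           # reinterpret; arithmetic wraps mod 2^16
    d = np.diff(s, prepend=np.int16(0))  # d[0] is the first sample itself
    return ((d << 1) ^ (d >> 15)).view(np.uint16)

def undelta_unzigzag(encoded: np.ndarray) -> np.ndarray:
    # Inverse zigzag, then prefix-sum to undo the delta step.
    d = (encoded >> 1).astype(np.int16) ^ -(encoded & 1).astype(np.int16)
    return np.cumsum(d, dtype=np.int64).astype(np.uint16)

rng = np.random.default_rng(2)
samples = rng.integers(0, 2**16, size=4096, dtype=np.uint16)
assert np.array_equal(undelta_unzigzag(delta_zigzag(samples)), samples)
```

Because neighboring pixels are usually close in value, the zigzagged deltas cluster near zero, which is exactly what the integer compressors below exploit.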
Then, take a look at some of the dedicated integer compression algorithms. Examples: FastPFor (or TurboPFor), BP32, snappy, simple8b, and good ol' run length encoding. These are blazing fast compared to gzip.
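Good ol' run-length encoding is easy to sketch in vectorized numpy (assumes a non-empty 1-D array):

```python
import numpy as np

def rle_encode(a: np.ndarray):
    """Plain run-length encoding of a non-empty 1-D array:
    returns (run_values, run_lengths)."""
    change = np.flatnonzero(np.diff(a)) + 1  # indices where a new run starts
    starts = np.concatenate(([0], change))
    lengths = np.diff(np.concatenate((starts, [a.size])))
    return a[starts], lengths

def rle_decode(values: np.ndarray, lengths: np.ndarray) -> np.ndarray:
    return np.repeat(values, lengths)

a = np.array([5, 5, 7, 7, 7, 1], dtype=np.uint16)
values, lengths = rle_encode(a)
assert np.array_equal(rle_decode(values, lengths), a)
```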
In my use case, I didn't care how slow compression was, so I wrote an adaptive compressor that would try all compression profiles and select the smallest one.
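An adaptive "try everything, keep the smallest" compressor can be sketched with stdlib codecs (zlib/bz2/lzma here are stand-ins for whatever profiles you actually benchmark):

```python
import bz2
import lzma
import zlib

DECODERS = {
    "zlib9": zlib.decompress,
    "bz2": bz2.decompress,
    "lzma": lzma.decompress,
    "raw": lambda b: b,  # stored as-is if nothing helps
}

def smallest_payload(raw: bytes) -> tuple[str, bytes]:
    """Compress with every profile and keep the smallest result.
    Encoding is slow (it runs every codec), but the decode side only
    needs the tag to pick the right decompressor."""
    candidates = {
        "zlib9": zlib.compress(raw, 9),
        "bz2": bz2.compress(raw),
        "lzma": lzma.compress(raw),
        "raw": raw,
    }
    name = min(candidates, key=lambda k: len(candidates[k]))
    return name, candidates[name]

data = bytes(range(64)) * 1000
tag, payload = smallest_payload(data)
assert DECODERS[tag](payload) == data
```

Shipping the winning tag alongside the payload keeps the client-side decoder a simple dictionary lookup.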
Of course, benchmark everything.
bjornlouser|4 years ago
Maybe you could write the PNG without compression, compress chunks of the image in parallel using 7z, then reconstitute and decompress them on the client side.
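A sketch of the parallel-chunk idea, using stdlib zlib in place of 7z (`zlib.compress` releases the GIL on large buffers, so plain threads parallelize):

```python
import zlib
from concurrent.futures import ThreadPoolExecutor

def compress_chunks(raw: bytes, n_chunks: int = 8) -> list[bytes]:
    """DEFLATE fixed-size slices of the buffer in parallel.
    zlib.compress releases the GIL on large inputs, so plain threads
    give real parallelism; each slice becomes its own zlib stream."""
    step = -(-len(raw) // n_chunks)  # ceiling division
    chunks = [raw[i:i + step] for i in range(0, len(raw), step)]
    with ThreadPoolExecutor(max_workers=len(chunks)) as pool:
        return list(pool.map(lambda c: zlib.compress(c, 9), chunks))

def decompress_chunks(parts: list[bytes]) -> bytes:
    # Client side: decompress each stream and concatenate.
    return b"".join(zlib.decompress(p) for p in parts)

data = bytes(range(256)) * 4096  # ~1 MB stand-in buffer
parts = compress_chunks(data)
assert decompress_chunks(parts) == data
```

Splitting costs a little compression ratio (each chunk starts with a fresh dictionary), which is the usual tradeoff for parallel DEFLATE.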
yboris|4 years ago
Another neat feature is that it's designed to be progressive: you could host a single 10 MB original file, and the client can download just the first 1 MB (up to whatever quality they're comfortable with).
Take a look: https://jpegxl.info/