convert -page A4 image.jpg out.pdf
The resulting PDF would simply embed the JPEG image, making it only a few KB larger. However, now with v6.5.4 of ImageMagick, the default behaviour of this command is to decompress the JPEG and store it in a lossless format. In my case, a scanned A4 page jumped from 150 KB to 7 MB.
Given the lack of documentation on ImageMagick's output format settings, it took a bit of experimenting to find that the new command to embed a JPEG is:
convert -page A4 -compress jpeg image.jpg out.pdf
The additional -compress option reproduces the original results.
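A quick way to check the effect on your own files is to build the PDF both ways and compare the sizes. This is only a minimal sketch, assuming ImageMagick's convert is on the PATH; the function name pdf_sizes and the temporary file names are made up for illustration.

```shell
#!/bin/bash
# Hypothetical helper: build a PDF from one JPEG with and without
# "-compress jpeg" and print the resulting file sizes in bytes.
pdf_sizes() {
    local src tmp rc
    src=$1
    tmp=$(mktemp -d) || return 1
    rc=0
    # First call uses the default compression, second one asks for JPEG explicitly.
    if convert -page A4 "$src" "$tmp/default.pdf" &&
       convert -page A4 -compress jpeg "$src" "$tmp/jpeg.pdf"; then
        printf 'source:  %s bytes\n' "$(wc -c < "$src" | tr -d ' ')"
        printf 'default: %s bytes\n' "$(wc -c < "$tmp/default.pdf" | tr -d ' ')"
        printf 'jpeg:    %s bytes\n' "$(wc -c < "$tmp/jpeg.pdf" | tr -d ' ')"
    else
        rc=1
    fi
    rm -rf "$tmp"
    return "$rc"
}
```

Calling it as pdf_sizes image.jpg prints three lines, so a jump like the 150 KB to 7 MB one described above shows up immediately.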
This is part of a larger custom bash shell script that automates "one click" scanning of sequential paper pages into a PDF on Ubuntu. Simple, fixed settings, no messing around.
 
Thank you VERY MUCH! It was exactly what I needed!
5 JPEG images of about 800 KB each were becoming a 123 MB PDF file! LOL
Now, how do I convert a few JPEG files into a single PDF?
I use pdftk to join multiple PDFs into one file. In its most basic usage:
pdftk *.pdf cat output outfile.pdf
To convert multiple Jpeg files:
try
convert *.jpeg test.pdf
or
convert *.jpg test.pdf
@Matthew, that works okay for a couple of pages. Last time I tried that with 20 JPEG pages, ImageMagick jammed up trying to request a few GB of RAM.
pdftk adds some great additional functionality that's worth investigating: http://www.accesspdf.com/pdftk
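One way to sidestep the memory problem is to combine the two suggestions above: convert each JPEG to its own single-page PDF, then join the pages with pdftk. A minimal sketch, assuming convert and pdftk are both installed; the function name jpegs_to_pdf and the temporary page-NNNN.pdf naming are illustrative and not taken from the script mentioned in the post.

```shell
#!/bin/bash
# Hypothetical helper: jpegs_to_pdf OUT.pdf page1.jpg page2.jpg ...
jpegs_to_pdf() {
    local out tmp n page rc
    [ "$#" -ge 2 ] || { echo "usage: jpegs_to_pdf OUT.pdf IMAGE..." >&2; return 2; }
    out=$1
    shift
    tmp=$(mktemp -d) || return 1
    n=0
    for page in "$@"; do
        n=$((n + 1))
        # One convert process per page, so only one image is loaded at a time.
        convert -page A4 -compress jpeg "$page" \
            "$(printf '%s/page-%04d.pdf' "$tmp" "$n")" || { rm -rf "$tmp"; return 1; }
    done
    # Zero-padded names make the glob expand in page order.
    pdftk "$tmp"/page-*.pdf cat output "$out"
    rc=$?
    rm -rf "$tmp"
    return "$rc"
}
```

Pages end up in the order the files are given on the command line, for example jpegs_to_pdf scan.pdf *.jpg (scan.pdf being a placeholder name).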
Hi Rob, I would like to see your bash script. I am interested.
@Manuel, I have posted the complete script I have been using at http://www.rrfx.net/2009/11/batch-scanning-paper-documents-to-pdf.html ...let me know if it helps you out!
Hello,
I've observed that:
convert -compress jpeg in.jpg out.pdf
won't simply put the JPEG image into the output document, but it will instead *recompress* it, thereby losing data.
Is there a way around this?
Now this is odd:
tlon:~/pdf-jpeg-test$ convert -compress jpeg original.jpg original.pdf
tlon:~/pdf-jpeg-test$ v
total 304
-rw-r--r-- 1 orbis tertius 186761 2009-11-25 17:59 original.jpg
-rw-r--r-- 1 orbis tertius 113360 2009-11-25 18:01 original.pdf
Note that the PDF file is smaller than the JPEG. Extracting the JPEG with pdfimages -j and then comparing it with the original one shows visible differences.
On the other hand, (re)compressing the JPEG picture before "converting" it into PDF results in the PDF containing the unmodified JPEG data:
tlon:~/pdf-jpeg-test$ convert -quality 99 original.jpg 99original.jpg
tlon:~/pdf-jpeg-test$ convert -compress jpeg 99original.jpg 99original.pdf
tlon:~/pdf-jpeg-test$ v 99*
-rw-r--r-- 1 orbis tertius 201099 2009-11-25 18:01 99original.jpg
-rw-r--r-- 1 orbis tertius 207282 2009-11-25 18:02 99original.pdf
tlon:~/pdf-jpeg-test$ convert -quality 50 original.jpg 50original.jpg
tlon:~/pdf-jpeg-test$ convert -compress jpeg 50original.jpg 50original.pdf
tlon:~/pdf-jpeg-test$ v 50*
-rw-r--r-- 1 orbis tertius 76878 2009-11-25 18:02 50original.jpg
-rw-r--r-- 1 orbis tertius 79395 2009-11-25 18:02 50original.pdf
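The round trip in the transcript above can be wrapped in a small function that checks whether the JPEG extracted from the PDF is byte-for-byte identical to the input. This is a rough sketch, assuming convert, pdfimages and cmp are installed; roundtrip_check is a made-up name. A byte difference alone does not prove that pixel data was lost, it only shows the stream was rewritten.

```shell
#!/bin/bash
# Hypothetical helper: returns 0 if the JPEG survives the PDF round trip
# unchanged, 1 if the extracted file differs, 2 if a tool failed.
roundtrip_check() {
    local src tmp rc
    src=$1
    tmp=$(mktemp -d) || return 2
    if convert -compress jpeg "$src" "$tmp/out.pdf" &&
       pdfimages -j "$tmp/out.pdf" "$tmp/img"; then
        # pdfimages names the first extracted image <prefix>-000.jpg
        if cmp -s "$src" "$tmp/img-000.jpg"; then rc=0; else rc=1; fi
    else
        rc=2
    fi
    rm -rf "$tmp"
    return "$rc"
}
```

For example, roundtrip_check 99original.jpg && echo "embedded unchanged" reports only when the extracted file matches exactly.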
Hi Orbis, I was about to (re)post a long reply to that effect. Unfortunately Firefox 3.5.5 is a buggy piece of crap and it crashed while I was waiting for Kdiff3.
A binary diff between an original test JPEG (8 MB) and the one extracted from a PDF with "pdfimages -j" was identical for the first 40% of the file, and completely different for the other 60%. Odd, but it makes sense that a single bit difference would then make the rest of the JPEG data different.
I remember doing tests like this way back when I first set myself up for scanning paper documents. Enough tests to be convinced that the JPEG was, for practical purposes, being stored as-is. Progressive scan JPEGs were converted to baseline first.
It seems like ImageMagick stores the quality level in the JPEG. I've noticed the same behaviour in Gimp when you hit "save as" on a JPEG, close it, reopen it and hit "save as" again. However, Gimp doesn't pick up the quality level that ImageMagick seems to have written to the file.
Given I'm using ImageMagick for all my postprocessing, I've not found it to be a problem. Cheers.
For anyone wanting to test this:
#convert -quality 66 dsc07857.jpg test.jpg
#convert -compress jpeg test.jpg test.pdf
#pdfimages -j test.pdf out
#ls -l (reordered source->jpg->pdf->extracted jpg)
-rw-r--r-- 1 rob rob 52352 2008-04-05 14:02 dsc07857e800.jpg
-rw-r--r-- 1 rob rob 29499 2009-11-26 01:46 test.jpg
-rw-r--r-- 1 rob rob 32418 2009-11-26 01:46 test.pdf
-rw-r--r-- 1 rob rob 29481 2009-11-26 01:47 out-000.jpg
The extracted jpeg is almost the same file size. Binary diff:
#kdiff3 test.jpg out-000.jpg
In this case it shows the first 20% of the binary JPEG data to be the same.
To verify the image data is the same, convert both files to a bitmap and binary diff them **:
#convert test.jpg test.bmp
#convert out-000.jpg out-000.bmp
#kdiff3 test.bmp out-000.bmp
Here the bitmap headers differ; however, the image data is identical.
** Don't try this on large JPEG files.
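The same comparison can be scripted by hashing the decoded pixels instead of diffing bitmaps by eye. This is a sketch under assumptions: ImageMagick's convert and identify plus coreutils' sha256sum are installed, and pixel_hash and same_pixels are illustrative names. The rgb:- output is raw pixel data with no header, so header differences like the one noted above do not affect the result. It still decodes the whole image to a temporary file, so the warning about large files applies here too.

```shell
#!/bin/bash
# Hypothetical helper: print a hash of the image dimensions plus raw 8-bit RGB pixels.
pixel_hash() {
    local tmp rc
    tmp=$(mktemp) || return 1
    if identify -format '%wx%h\n' "$1" > "$tmp" &&
       convert "$1" -depth 8 rgb:- >> "$tmp"; then
        sha256sum < "$tmp" | cut -d ' ' -f 1
        rc=0
    else
        rc=1
    fi
    rm -f "$tmp"
    return "$rc"
}

# Returns 0 if both files decode to the same pixels, 1 if not, 2 on error.
same_pixels() {
    local a b
    a=$(pixel_hash "$1") || return 2
    b=$(pixel_hash "$2") || return 2
    [ "$a" = "$b" ]
}
```

With the files from the test above, same_pixels test.jpg out-000.jpg && echo "pixel data identical" would replace the two bitmap conversions and the kdiff3 step.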
Thanks a lot, I use a GNOME Nautilus script with Ubuntu:
#!/bin/bash
# Split the newline-delimited selection on newlines only, so paths with spaces survive
IFS='
'
convert -page a4 -quality 50 -compress jpeg $NAUTILUS_SCRIPT_SELECTED_FILE_PATHS photos.pdf
Hi, thanks for your observation. I have a few images downloaded from the internet and I want to retain their quality. I used the following command:
convert image.jpg image.pdf
I observed that doing
convert -page A4 image.jpg out.pdf and
convert -page A4 -compress jpeg image.jpg out.pdf
made no difference between the two resulting PDFs. The size of the image is 209.7 KB and the resulting PDF in both cases is 204.3 KB. I see a bit of quality loss in the converted PDF. Is it possible to retain the image quality somehow?
Your converted PDF is smaller than the original JPEG probably because it used "progressive" encoding for the stored JPEG. This should be lossless, see Wikipedia: http://en.wikipedia.org/wiki/JPEG
"It has been found that Baseline Progressive JPEG encoding usually gives better compression as compared to Baseline Sequential JPEG due to the ability to use different Huffman tables"
and
"It is also possible to transform between baseline and progressive formats without any loss of quality, since the only difference is the order in which the coefficients are placed in the file"