[Dcmlib] [Fwd: pdf parser for generating XML like document]

Mathieu Malaterre mathieu.malaterre at kitware.com
Tue Oct 25 21:10:53 CEST 2005


One more thing: KWord has a pretty decent PDF importer. You can even
select a range of pages to import. Open source is cool :)

Mathieu

Mathieu Malaterre wrote:
> 
> No, PDF has no concept of tables, as such.  It's just commands to select
> fonts and draw text, and some other commands to draw horizontal lines,
> etc.
> 
> I don't know of any easy way to convert PDF to XML for the sort of
> application you're working on, sorry.
> 
> - Derek
> 
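To make Derek's point concrete, here is a minimal Python sketch that walks a hand-written content-stream fragment. The fragment, its coordinates and the font resource name `/F1` are made up for illustration and are not taken from the DICOM PDF. What a reader sees as one table row with a rule underneath is only "select font (Tf), move to (x, y) (Td), show string (Tj)" plus "move to (m), line to (l), stroke (S)". No operator says row, cell or table. The tokenizer is deliberately naive: real pages are usually Flate-compressed and also use TJ arrays, Tm matrices and escaped or nested strings, none of which it handles.

```python
import re
from typing import List, Tuple

# Illustrative fragment only: two "cells" of one visual row, then a ruling line.
CONTENT = b"""
BT /F1 9 Tf 72 700 Td (SOP Class UID) Tj ET
BT /F1 9 Tf 300 700 Td (UI) Tj ET
0.5 w 72 695 m 540 695 l S
"""

TOKEN = re.compile(
    rb"\((?P<string>[^()\\]*)\)"       # literal string (no escapes or nested parens)
    rb"|(?P<name>/[^\s/()]+)"          # name object such as /F1
    rb"|(?P<number>-?\d+(?:\.\d+)?)"   # integer or real operand
    rb"|(?P<op>[A-Za-z*]+)"            # operator keyword such as Tf, Tj, m, S
)


def interpret(stream: bytes) -> List[Tuple]:
    """Turn the fragment into a flat list of drawing events."""
    operands: list = []
    events: List[Tuple] = []
    font = None
    x = y = 0.0
    path: list = []
    for m in TOKEN.finditer(stream):
        kind = m.lastgroup
        value = m.group(kind)
        if kind == "number":
            operands.append(float(value))
        elif kind in ("string", "name"):
            operands.append(value.decode("latin-1"))
        else:
            op = value.decode("ascii")
            if op == "BT":
                x = y = 0.0  # text position is reset at the start of each text object
            elif op == "Tf":
                font = (operands[0], operands[1])  # font resource name and size
            elif op == "Td":
                x, y = x + operands[0], y + operands[1]
            elif op == "Tj":
                events.append(("text", x, y, font, operands[0]))
            elif op in ("m", "l"):
                path.append((operands[0], operands[1]))
            elif op == "S":
                events.append(("line", tuple(path)))
                path = []
            operands = []  # every operator consumes its operands
    return events


if __name__ == "__main__":
    for event in interpret(CONTENT):
        print(event)
```

Rebuilding a table from such events means inferring columns from the x coordinates, which is a heuristic rather than something the file states.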
> -------- Original Message --------
> Subject: pdf parser for generating XML like document
> Date: Sun, 23 Oct 2005 17:31:56 -0400
> 
> Hello,
> 
>     I searched for a mailing list on the following website:
> http://www.foolabs.com/xpdf/
>
>     but since I could not find one, I am writing to you directly.
> 
>     I have the following problem. DICOM is a file format that is specified
> by NEMA at:
> 
> http://medical.nema.org/dicom/2004.html
> 
>     In particular, look at the following document (1):
> http://medical.nema.org/dicom/2004/04_06PU.PDF
> 
>  The spec is huge, so I am using pdftotext plus a Python script to
> generate a custom output. You can find everything here:
> 
> The Python script (it basically takes as input the output of
> `pdftotext -raw -nopgbrk`):
> http://cvs.creatis.insa-lyon.fr/viewcvs/viewcvs.cgi/gdcm/Dicts/ParseDict.py
> 
> And here is the cleaned-up output (Python script plus manual editing):
> http://cvs.creatis.insa-lyon.fr/viewcvs/viewcvs.cgi/gdcm/Dicts/dicomV3.dic
> 
> This is very difficult to maintain, as a new version of the spec is released every year.
> 
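A minimal sketch of such a pipeline, assuming each data-dictionary row comes out of `pdftotext -raw -nopgbrk` on a single line in the form `(gggg,eeee) Name VR VM`. That layout, the regular expression, the extra `-enc UTF-8` option and the helper names `pdf_to_raw_text`/`parse_dictionary` are assumptions for illustration, not what ParseDict.py actually does. Lines that do not fit the pattern (for example names wrapped onto a second line) are simply skipped and would still need manual fixing.

```python
import re
import subprocess
import sys
from typing import Iterator, NamedTuple


class DictEntry(NamedTuple):
    group: str    # kept as text: repeating groups are written like 50xx
    element: str
    name: str
    vr: str       # may be a choice such as "US or SS"
    vm: str


# Assumed row layout in the raw text: "(0008,0016) SOP Class UID UI 1"
ROW = re.compile(
    r"^\((?P<group>[0-9A-Fa-fx]{4}),(?P<element>[0-9A-Fa-fx]{4})\)\s+"
    r"(?P<name>.+?)\s+(?P<vr>[A-Z]{2}(?: or [A-Z]{2})*)\s+"
    r"(?P<vm>\d+(?:-(?:\d+n?|n))?)$"
)


def pdf_to_raw_text(pdf_path: str) -> str:
    """Run pdftotext with the options from the mail; '-' sends the text to stdout."""
    result = subprocess.run(
        ["pdftotext", "-raw", "-nopgbrk", "-enc", "UTF-8", pdf_path, "-"],
        check=True, capture_output=True, encoding="utf-8",
    )
    return result.stdout


def parse_dictionary(text: str) -> Iterator[DictEntry]:
    """Yield an entry for every line that looks like a dictionary row; skip the rest."""
    for line in text.splitlines():
        m = ROW.match(line.strip())
        if m:
            yield DictEntry(m["group"], m["element"], m["name"], m["vr"], m["vm"])


if __name__ == "__main__" and len(sys.argv) > 1:
    for entry in parse_dictionary(pdf_to_raw_text(sys.argv[1])):
        print(entry)
```

Keeping extraction and parsing in separate functions means the parser can be re-run on saved text when next year's PDF changes the layout.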
>     Therefore I was wondering if you could give me some advice on how to
> parse the PDF document (1). Is there some table start/end marker in the
> PDF file that I can use? Is there any API in the PDF library that would
> allow me to generate an 'XML'-like description of the PDF in a neutral
> way?
> 
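Once rows have been recovered as plain text, producing a neutral XML description is the easy part. A sketch using Python's standard `xml.etree.ElementTree`, assuming entries arrive as `(group, element, name, vr, vm)` tuples; the element and attribute names (`dictionary`, `entry`, `vr`, and so on) are invented here and do not follow any DICOM or xpdf schema. It does not solve the harder problem: the table structure still has to be inferred from the text, because the PDF itself does not carry it.

```python
import xml.etree.ElementTree as ET
from typing import Iterable, Tuple

# (group, element, name, vr, vm), e.g. ("0008", "0016", "SOP Class UID", "UI", "1")
Entry = Tuple[str, str, str, str, str]


def entries_to_xml(entries: Iterable[Entry]) -> str:
    """Serialize rows as <dictionary><entry ...>name</entry>...</dictionary> (hypothetical schema)."""
    root = ET.Element("dictionary")
    for group, element, name, vr, vm in entries:
        node = ET.SubElement(root, "entry", group=group, element=element, vr=vr, vm=vm)
        node.text = name  # the attribute name, e.g. "SOP Class UID"
    return ET.tostring(root, encoding="unicode")


if __name__ == "__main__":
    print(entries_to_xml([
        ("0008", "0016", "SOP Class UID", "UI", "1"),
        ("0028", "0010", "Rows", "US", "1"),
    ]))
```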
> Thanks so much for your time,
> Mathieu
> PS: If such a mailing list exists, forgive me, and please give me the
> reference so that I can ask this question there.
> 
> _______________________________________________
> Dcmlib mailing list
> Dcmlib at creatis.insa-lyon.fr
> http://www.creatis.insa-lyon.fr/mailman/listinfo/dcmlib
> 
