[Rtk-users] Stream divisions in rtkfdk

Tue May 27 00:12:50 CEST 2014

Hi Chao,
Thanks for the detailed report.

On Thu, May 22, 2014 at 10:06 AM, Chao Wu <wuchao04 at gmail.com> wrote:

> Hi Simon,
>
> Thanks for the suggestions.
>
> The problem can be reproduced here (8 GB RAM, 1.5 GB GRAM, RTK 1.0.0) by:
>
> rtksimulatedgeometry -n 30 -o geometry.xml --sdd=1536 --sid=384
> rtkprojectgeometricphantom -g geometry.xml -o projections.nii --spacing 0.6 --dimension 1944,1536 --phantomfile SheppLogan.txt
> rtkfdk -p . -r projections.nii -o fdk.nii -g geometry.xml --spacing 0.4 --dimension 640,250,640 --hardware=cuda -v -l
>
> With #define VERBOSE (by the way, I found it in itkCudaDataManager.cxx
> instead of itkCudaImageDataManager.hxx) I now have a better view of the
> GRAM usage.
> I found that the size of the volume data in the GRAM can be reduced with
> --divisions, but the amount of projection data sent to the GRAM is not
> influenced by the --lowmem switch.
>
After looking at the code again: lowmem acts on the reading, so it affects
CPU memory, not GPU memory. Sorry about that. The reconstruction algorithm
does stream the projections, but by default it processes 16 projections at
a time. You can change this in rtkFDKConeBeamReconstructionFilter.txx,
line 28, to, e.g., 2. This will reduce your GPU memory consumption (I
checked and it works for me). Let me know if it works for you and whether
you think this should be made an option of rtkfdk.
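
As a rough illustration of the two quantities discussed here, the sketch below estimates the raw GPU footprint of one projection batch and of one volume piece for the dimensions of the test case above. It assumes 4-byte float pixels and voxels and ignores FFT zero-padding, cuFFT work areas and any other buffers, so real consumption will be higher. The function names are made up for this example and are not part of RTK.

```python
BYTES_PER_FLOAT = 4  # assumption: pixels and voxels are stored as 32-bit floats
MIB = 1024 ** 2


def projection_batch_bytes(width, height, batch_size):
    """Raw size of `batch_size` projections, without any FFT padding."""
    return width * height * batch_size * BYTES_PER_FLOAT


def volume_piece_bytes(dim_x, dim_y, dim_z, divisions):
    """Approximate raw size of one volume piece: total size / divisions, rounded up."""
    total_voxels = dim_x * dim_y * dim_z
    voxels_per_piece = -(-total_voxels // divisions)  # ceiling division
    return voxels_per_piece * BYTES_PER_FLOAT


if __name__ == "__main__":
    # Projection batches: the default of 16 versus the suggested 2.
    for batch in (16, 2):
        size = projection_batch_bytes(1944, 1536, batch)
        print(f"{batch:2d} projections: {size / MIB:7.1f} MiB")

    # Volume pieces for a few values of --divisions.
    for divisions in (1, 2, 4):
        size = volume_piece_bytes(640, 250, 640, divisions)
        print(f"{divisions} division(s): {size / MIB:7.1f} MiB per piece")
```

With these assumptions, 16 projections of 1944x1536 take about 182 MiB before padding, against about 23 MiB for 2 projections, while the full 640x250x640 volume takes about 391 MiB.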

> So --divisions does not help much if it is mainly the projection data
> that takes up GRAM, while --lowmem does not help at all. I did not look
> into the earlier stages of the code, so I am not sure whether this is the
> intended behaviour.
>
> On the other hand, I am also looking for possibilities to reduce GRAM used
> in the CUDA ramp filter. At least one thing should be changed, and one
> thing may be considered:
> - in rtkCudaFFTRampImageFilter.cu the forward FFT plan (fftFwd) should be
> destroyed earlier, right after the plan has been executed. A plan takes up
> at least the same amount of memory as the data.
>
Good point, I changed it:
https://github.com/SimonRit/RTK/commit/bbba5ccd86d34ab8b4d9bc47b3ce6e2e176afc35
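
To see why the order of plan destruction matters, here is a toy accounting of live GPU allocations. It is only a sketch: the sizes are arbitrary units, the "plan costs as much as the data" figure is the estimate quoted above rather than a measurement, and the sequence of steps is an assumed ordering, not a transcription of rtkCudaFFTRampImageFilter.cu.

```python
def peak_usage(events):
    """Return the highest running total over a list of (label, delta) events."""
    live = 0
    peak = 0
    for _label, delta in events:
        live += delta
        peak = max(peak, live)
    return peak


DATA = 100   # hypothetical size of the projection buffers, arbitrary units
PLAN = DATA  # assumption taken from the quoted estimate: a plan costs about as much as the data

# Assumed ordering in which both plans are released only at the end.
late_destroy = [
    ("allocate data", +DATA),
    ("create fftFwd", +PLAN),
    ("execute forward FFT", 0),
    ("create inverse plan", +PLAN),
    ("execute inverse FFT", 0),
    ("destroy fftFwd", -PLAN),
    ("destroy inverse plan", -PLAN),
]

# Same steps, but fftFwd is released right after it has been executed.
early_destroy = [
    ("allocate data", +DATA),
    ("create fftFwd", +PLAN),
    ("execute forward FFT", 0),
    ("destroy fftFwd", -PLAN),
    ("create inverse plan", +PLAN),
    ("execute inverse FFT", 0),
    ("destroy inverse plan", -PLAN),
]

if __name__ == "__main__":
    print("late destroy peak: ", peak_usage(late_destroy))
    print("early destroy peak:", peak_usage(early_destroy))
```

Under these assumptions the two plans never coexist when fftFwd is destroyed early, so the peak drops by the size of one plan.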

> - cufftExecR2C and cufftExecC2R can be in-place. However I do not have a
> clear idea about how to pad deviceProjection to the required size of
> its cufftComplex counterpart.
>
I'm not sure it should be done in-place since rtk::FFTRampImageFilter is
not an itk::InPlaceImageFilter. It might be possible but I would have to
check. Let me know if you investigate this further.
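
On the padding question: for an in-place real-to-complex transform, cuFFT (like FFTW) expects each row of the real input to be padded to 2*(nx/2+1) floats, where nx is the fastest-varying dimension, so that the row can hold its nx/2+1 complex outputs. The sketch below only works out that layout on the host with plain Python lists. The function names are hypothetical, and nx stands for the width of the projection as it is handed to the FFT (after any zero-padding done by the ramp filter).

```python
def r2c_inplace_layout(nx):
    """Return (complex_per_row, padded_reals_per_row, extra_reals_per_row).

    nx is the number of real samples per row (fastest-varying dimension).
    """
    complex_per_row = nx // 2 + 1
    padded_reals_per_row = 2 * complex_per_row
    return complex_per_row, padded_reals_per_row, padded_reals_per_row - nx


def pad_for_inplace_r2c(image):
    """Copy a list of equal-length rows into one flat, row-padded float buffer."""
    nx = len(image[0])
    _, padded, _ = r2c_inplace_layout(nx)
    buffer = [0.0] * (padded * len(image))
    for y, row in enumerate(image):
        start = y * padded
        buffer[start:start + nx] = row  # the trailing pad samples stay at 0.0
    return buffer, padded


if __name__ == "__main__":
    print(r2c_inplace_layout(1944))
    flat, row_stride = pad_for_inplace_r2c([[1.0, 2.0, 3.0, 4.0],
                                            [5.0, 6.0, 7.0, 8.0]])
    print(row_stride, flat)
```

The padding costs one or two extra floats per row, which is small compared with keeping a separate cufftComplex buffer alongside deviceProjection.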
Thanks again,
Simon

>
> Any comments?
>
> Best regards,
> Chao
>
>
>
> 2014-05-21 14:30 GMT+02:00 Simon Rit <simon.rit at creatis.insa-lyon.fr>:
>
>> Since it fails in cufft, it's the memory of the projections that is a
>> problem. Therefore, it is not surprising that --divisions has no
>> influence. But --lowmem should have an influence. I would suggest:
>> - to uncomment
>> //#define VERBOSE
>> in itkCudaImageDataManager.hxx and try to see how much memory
>> is requested.
>> - to try to reproduce the problem with simulated data so that we can
>> help you in finding a solution.
>> Simon
>>
>> On Wed, May 21, 2014 at 2:21 PM, Chao Wu <wuchao04 at gmail.com> wrote:
>> > Hi Simon,
>> >
>> > Yes, I switched the --lowmem option on and off and it has no influence
>> > on the behaviour I mentioned.
>> > In my case the system memory is sufficient to handle the projections
>> > plus the volume.
>> > The major bottleneck is the amount of graphics memory.
>> > If I reconstruct a few more slices than the limit that I found with
>> > one stream, the allocation of GPU resources for CUFFT in the
>> > CudaFFTRampImageFilter fails (which was more or less expected).
>> > However, with --divisions > 1 it is indeed able to reconstruct more
>> > slices, but only very few more; otherwise CUFFT fails again.
>> > I would expect the limit on the number of slices to be approximately
>> > proportional to the number of streams, or am I missing something about
>> > stream division?
>> >
>> > Thanks,
>> > Chao
>> >
>> >
>> >
>> > 2014-05-21 13:43 GMT+02:00 Simon Rit <simon.rit at creatis.insa-lyon.fr>:
>> >
>> >> Hi Chao,
>> >> There are two things that use memory, the volume and the projections.
>> >> The --divisions option divides the volume only. The --lowmem option
>> >> works on a subset of projections at a time. Did you try this?
>> >> Simon
>> >>
>> >> On Wed, May 21, 2014 at 12:18 PM, Chao Wu <wuchao04 at gmail.com> wrote:
>> >> > Hi,
>> >> >
>> >> > I may need some hints about how the stream division works in rtkfdk.
>> >> > I noticed that the StreamingImageFilter from ITK is used, but I cannot
>> >> > quickly figure out how the division is performed.
>> >> > I did some tests reconstructing 400 1500x1200 projections into a
>> >> > 640xNx640 volume (the pixel and voxel sizes are comparable).
>> >> > The reconstructions were executed by rtkfdk with CUDA.
>> >> > When I leave the origin of the volume at the center by default, I can
>> >> > reconstruct up to N=200 slices with --divisions=1 due to the
>> >> > limitation of the graphics memory. When I increase the number of
>> >> > divisions to 2, I can only reconstruct up to 215 slices, and with 3
>> >> > divisions only up to 219 slices. Does anyone have an idea why it
>> >> > scales like this?
>> >> > Thanks in advance.
>> >> >
>> >> > Best regards,
>> >> > Chao