Raytracing revisited

August 27th, 2009

Name? bla.
Sex? Male.
Hobby? Raytracing!
Duh. ;-/

Another program written for my university; this time the main idea was to write something multitasking in x86 assembler. The program raytraces only quaternion Julia fractals and is much more fun from a benchmarking point of view. It was written in C first; then, using OProfile, I checked which functions took most of the CPU time, and those got rewritten in asm.
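For readers who have not met quaternion Julia sets: the usual test iterates q <- q^2 + c and checks whether the point escapes. Below is a minimal sketch of that iteration in C. It is not the JuliaTracer code; the type and function names, the use of double, and the bailout radius of 2 are assumptions made for illustration.

```c
/* A quaternion w + x*i + y*j + z*k. (Hypothetical type, for illustration.) */
typedef struct { double w, x, y, z; } quat;

/* Square of a quaternion: q*q = (w^2 - x^2 - y^2 - z^2, 2wx, 2wy, 2wz). */
quat quat_square(quat q)
{
    quat r;
    r.w = q.w * q.w - q.x * q.x - q.y * q.y - q.z * q.z;
    r.x = 2.0 * q.w * q.x;
    r.y = 2.0 * q.w * q.y;
    r.z = 2.0 * q.w * q.z;
    return r;
}

/* Squared length of a quaternion. */
double quat_norm2(quat q)
{
    return q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z;
}

/* Iterate q <- q^2 + c. Returns the number of steps taken before |q|^2
 * exceeds 4 (assumed bailout radius 2), or max_iter if that never happens,
 * in which case the point is treated as belonging to the set. */
int julia_iterations(quat q, quat c, int max_iter)
{
    int i;
    for (i = 0; i < max_iter; i++) {
        if (quat_norm2(q) > 4.0)
            return i;
        q = quat_square(q);
        q.w += c.w;
        q.x += c.x;
        q.y += c.y;
        q.z += c.z;
    }
    return max_iter;
}
```

A raytracer would call something like julia_iterations for sample points along each ray; how the real program marches along rays is not described in this post.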

The algorithm has a C version (x86/x64), an optimized SSE2 assembler version (x86), a plain CUDA version (tested on x64 only, no cross-compilation) and a mixed version, CUDA + C. They are selectable using compile-time macros. What makes it more fun is that the C version can be compiled by GCC using either x87 arithmetic or (naive) SSE2. Pictures, links and a benchmark summary follow.
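Selecting a variant at compile time can look roughly like the sketch below. The macro names (JT_USE_CUDA, JT_USE_C, JT_USE_ASM) and the backend_name function are made up for illustration; the real project may name and combine its macros differently.

```c
/* Hypothetical macro names; the actual JuliaTracer macros are not listed
 * in this post. Exactly one backend label is chosen by the preprocessor. */
#if defined(JT_USE_CUDA) && defined(JT_USE_C)
#  define JT_BACKEND "cuda+c"     /* mixed version: GPU and CPU together */
#elif defined(JT_USE_CUDA)
#  define JT_BACKEND "cuda"       /* plain CUDA version */
#elif defined(JT_USE_ASM)
#  define JT_BACKEND "sse2-asm"   /* hand-written SSE2 assembler */
#else
#  define JT_BACKEND "c"          /* default: plain C */
#endif

/* Returns the label of the backend this binary was built with. */
const char *backend_name(void)
{
    return JT_BACKEND;
}
```

With GCC, a macro like this would be set on the command line with -D (for example -DJT_USE_ASM). For the plain C version, the choice between x87 and SSE2 arithmetic is normally made with -mfpmath=387 versus -msse2 -mfpmath=sse; which flags the original build used is not stated here.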

Example of generated image:
Julia

When the image is generated with both CPU and GPU, it can easily be seen that the GPU uses a simplified IEEE-754 (float) implementation (the upper part is CUDA).
Benchmark

I tested the program on an 8-core 3 GHz computer with a "rather slow" graphics card, a Quadro NVS 290 (it had to run in a PCIe x1 slot, and such cards are expensive and not game-oriented).

When only one core was at work, the GPU clearly won; second was my asm, yay. The GCC-generated 32-bit and 64-bit versions, when both use SSE, don't seem to diverge; 64-bit was a tiny bit faster. x87 lost.
Benchmark

When run with one thread per core, the CPUs worked much faster than my GPU, and my asm was mostly faster than the plain GCC-SSE version, yet 8 CPUs + GPU outperformed them all. Pity CUDA didn't support cross-compilation; I'd have tried my asm + CUDA.
Benchmark
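The post does not say how the image was divided among the threads. As a rough sketch, assuming one worker per core and a split into contiguous bands of rows (the function name and the banding scheme are illustrative, not taken from JuliaTracer):

```c
/* Compute the half-open row range [*first, *last) rendered by worker `tid`
 * when `height` rows are split among `nthreads` workers. The first
 * (height % nthreads) workers each get one extra row, so band sizes
 * differ by at most one. */
void rows_for_thread(int height, int nthreads, int tid, int *first, int *last)
{
    int base = height / nthreads;   /* rows every worker gets */
    int extra = height % nthreads;  /* leftover rows to hand out */

    *first = tid * base + (tid < extra ? tid : extra);
    *last = *first + base + (tid < extra ? 1 : 0);
}
```

Each thread would then render only its own rows. For fractals the cost per row varies a lot, so interleaving rows between threads may balance the load better than bands; that is a design choice this sketch does not explore.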

Rendering 200 frames of a movie. Not sure why CPU+GPU takes such a long time. I've probably messed something up.
Benchmark

Profiler output. The first six functions (one bar = one function) were implemented in asm. The vertical axis shows the percentage of total program time.
Benchmark

Presentation (In Polish! But has some math and other pictures; made with LaTeX-Beamer): presentation.pdf
Project summary (In Polish!): sprawozdanie.pdf

Pack of code (without docs etc., GPL3+): JuliaTracer.tar.bz2 and signature.

Disclaimer: This was my first CUDA program ever. It was fun to write, but I'm sure any CUDA magician could have written this part much better, so don't take these benchmarks too seriously.
