Name? bla.
Sex? Male.
Hobby? Raytracing!
Duh. ;-/
Another program written for my university; this time the main idea was to write something multitasking in x86 assembler. The program raytraces only quaternion Julia fractals, and it is much more fun from a benchmarking point of view. The program was written in C first; then I used OProfile to check which functions took the most CPU time, and those got rewritten in asm.
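For readers who have not met quaternion Julia sets: a point belongs to the set if repeatedly applying q -> q*q + c never escapes to infinity. Below is a minimal C sketch of that escape-time test, not the code from this program. The names `quat`, `quat_sq`, `quat_norm2` and `julia_iterations`, and the escape radius of 2, are my assumptions for illustration.

```c
/* Hypothetical sketch of the quaternion Julia escape-time test.
 * All names here are illustrative, not taken from the actual program. */

typedef struct { double w, x, y, z; } quat;

/* Square of a quaternion: (w, v)^2 = (w^2 - |v|^2, 2*w*v). */
quat quat_sq(quat q)
{
    quat r;
    r.w = q.w * q.w - q.x * q.x - q.y * q.y - q.z * q.z;
    r.x = 2.0 * q.w * q.x;
    r.y = 2.0 * q.w * q.y;
    r.z = 2.0 * q.w * q.z;
    return r;
}

/* Squared length, so the escape check needs no sqrt(). */
double quat_norm2(quat q)
{
    return q.w * q.w + q.x * q.x + q.y * q.y + q.z * q.z;
}

/* Number of iterations of q -> q*q + c before |q| exceeds 2
 * (assumed escape radius). Returns max_iter if the point never
 * escapes, i.e. it is treated as being inside the set. */
int julia_iterations(quat q, quat c, int max_iter)
{
    int i;
    for (i = 0; i < max_iter; i++) {
        if (quat_norm2(q) > 4.0)
            return i;
        q = quat_sq(q);
        q.w += c.w;
        q.x += c.x;
        q.y += c.y;
        q.z += c.z;
    }
    return max_iter;
}
```

A raytracer would call something like `julia_iterations` for sample points along each ray. A small inner loop of this kind, run a huge number of times, is what a profiler tends to flag as hot.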
The algorithm has a C version (x86/x64), an optimized SSE2 assembler version (x86), a plain CUDA version (tested on x64 only, no cross-compilation) and a mixed version, CUDA + C. They are selectable using compile-time macros. What makes it more fun is that the C version can be compiled by GCC using either x87 arithmetic or (naive) SSE2. Pictures, links and a benchmark summary follow.
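To show what selecting a version with compile-time macros can look like, here is a small C sketch. The macro names `USE_ASM_SSE2`, `USE_CUDA` and `USE_CUDA_MIXED` and the function `backend_name` are made up for this example; the real program may use different ones.

```c
/* Hypothetical sketch of compile-time backend selection.
 * The USE_* macro names are assumptions, not the program's real ones.
 *
 * The x87 vs. SSE2 choice for the plain C version is made with GCC
 * flags rather than macros, for example on 32-bit x86:
 *   gcc -m32 -mfpmath=387 ...         (x87 arithmetic)
 *   gcc -m32 -msse2 -mfpmath=sse ...  (scalar SSE2 arithmetic)
 */

#if defined(USE_CUDA_MIXED)
#  define BACKEND_NAME "cuda+c"
#elif defined(USE_CUDA)
#  define BACKEND_NAME "cuda"
#elif defined(USE_ASM_SSE2)
#  define BACKEND_NAME "asm-sse2"
#else
#  define BACKEND_NAME "c"   /* default: portable C version */
#endif

/* Reports which version was compiled in, e.g. for benchmark logs. */
const char *backend_name(void)
{
    return BACKEND_NAME;
}
```

Building with something like `-DUSE_ASM_SSE2` would then pick the assembler path, and building with no macro falls back to plain C.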