Nice job!
I would guess that if you used the 3D hardware rather than the framebuffer, you could just update a 32x128 texture for the actual DMD and do the drawing in a shader, which should make performance much less dependent on output resolution.
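To make the texture idea a bit more concrete, here is a rough CPU-side sketch in Python of what a fragment shader might compute per output pixel. This is not your code and not real GPU code: the names `shade_pixel` and `render`, the 128-column by 32-row layout, and the round-dot look with `dot_radius` are all assumptions I made up for illustration. The point is only that the DMD data stays tiny and the per-pixel work is a lookup plus a distance check.

```python
DMD_W, DMD_H = 128, 32  # assumed DMD layout: 128 columns by 32 rows


def shade_pixel(dmd, u, v, dot_radius=0.4):
    """Brightness at normalized output coordinate (u, v), both in [0, 1).

    Hypothetical stand-in for a fragment shader: `dmd` plays the role of
    the small texture, indexed as dmd[row][col].
    """
    # Scale into DMD cell space, like a nearest-filtered texture lookup.
    x = u * DMD_W
    y = v * DMD_H
    col = min(int(x), DMD_W - 1)
    row = min(int(y), DMD_H - 1)
    # Offset from the centre of the cell, in cell units (-0.5 .. 0.5).
    dx = x - col - 0.5
    dy = y - row - 0.5
    # Draw each DMD pixel as a round dot; the gaps between dots stay dark.
    inside_dot = dx * dx + dy * dy <= dot_radius * dot_radius
    return dmd[row][col] if inside_dot else 0


def render(dmd, out_w, out_h):
    """CPU reference: evaluate shade_pixel at the centre of every output pixel."""
    return [
        [shade_pixel(dmd, (px + 0.5) / out_w, (py + 0.5) / out_h)
         for px in range(out_w)]
        for py in range(out_h)
    ]
```

On the GPU the loop in `render` disappears, since the hardware runs the per-pixel function in parallel, and each new frame only needs the small DMD texture re-uploaded, whatever the output resolution is.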
Also, I haven't really looked at the RPi documentation much, but doesn't it have some sort of hardware serial that can automatically latch the input on that changing edge?