AIR,
Here are the results of your C submission.
Yep, it's not the most optimized way to implement this, but you didn't want predefined buffers. This implementation allocates a new buffer every time 'join' is called and concatenates the data into it, which is expensive.
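The C submission itself isn't reproduced here, so this is only a minimal sketch of the allocate-per-call pattern, assuming 'join' concatenates two NUL-terminated strings. The names join and join_all and their signatures are hypothetical. It shows where the cost comes from: every call copies all the data built so far into a brand-new buffer.

```c
#include <assert.h>
#include <stdlib.h>
#include <string.h>

/* Hypothetical join: allocates a fresh buffer on every call (no predefined
 * or reusable buffers). The caller owns the result and must free() it. */
char *join(const char *a, const char *b)
{
    assert(a != NULL && b != NULL);
    size_t la = strlen(a);
    size_t lb = strlen(b);
    char *out = malloc(la + lb + 1);   /* room for both strings plus '\0' */
    if (out == NULL)
        return NULL;
    memcpy(out, a, la);                /* copy the first string */
    memcpy(out + la, b, lb + 1);       /* copy the second, including its '\0' */
    return out;
}

/* Hypothetical helper: builds a string by calling join() once per piece.
 * Each call recopies everything accumulated so far, so for pieces of similar
 * size the total bytes copied grow roughly with the square of the result length. */
char *join_all(const char *const *parts, size_t n)
{
    char *acc = malloc(1);
    if (acc == NULL)
        return NULL;
    acc[0] = '\0';
    for (size_t i = 0; i < n; i++) {
        char *next = join(acc, parts[i]);
        free(acc);                     /* the previous buffer is discarded every time */
        if (next == NULL)
            return NULL;
        acc = next;
    }
    return acc;
}
```

A growable buffer that doubles its capacity when full would avoid most of that recopying, but it means keeping a reusable buffer around, which is the trade-off you chose not to make.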
BTW, your Swift code compiles and runs fine on my Mac Mini. The release build runs noticeably faster than the debug build (the default):
[riveraa@mini ~/Projects/Swift/blah] $ time .build/x86_64-apple-macosx10.10/release/blah
r LEN: 999986
Front: ZYXWVUTSRQPONMLKJIHGFEDCBA
Back: ZYXWVUTSRQPONMLKJIHGFEDCBA
real 0m1.979s
user 0m1.846s
sys 0m0.130s
[riveraa@mini ~/Projects/Swift/blah] $ time .build/x86_64-apple-macosx10.10/debug/blah
r LEN: 999986
Front: ZYXWVUTSRQPONMLKJIHGFEDCBA
Back: ZYXWVUTSRQPONMLKJIHGFEDCBA
real 0m2.678s
user 0m2.541s
sys 0m0.132s

For comparison, here is the C code run on my Mac Mini:
r LEN: 999986
Front: ZYXWVUTSRQPONMLKJIHGFEDCBA
Back: ZYXWVUTSRQPONMLKJIHGFEDCBA
UBVal: 1000000
real 0m2.554s
user 0m2.405s
sys 0m0.145s