core     time        Run (gnu compiler, O3)
4096     2.754E-04   N1024_16x16x16
8192     1.194E-04   S256k4_16_1em18a
16384    6.046E-05   N1024_16x32x32
32768    3.953E-05   N1024_32x32x32