I briefly looked over the code for your CPU detection, and it's kind of bloated. You're also passing -O3 together with -mtune in a few spots. With GCC that mostly just increases the size of the binary; Clang does a lot better with -O3.
If you're compiling code on 'customer' machines, you could just drop the -mtune=generic stuff and stick with -march=native throughout, which would save a lot of overhead on guessing CPU types. It would also eliminate a bunch of backend code. For example, you could set a global define like this:
$ CFLAGS=$(gcc -march=native -E -v - </dev/null 2>&1 | grep cc1 | sed 's/^.*\(-march.*$\).*$/\1/'); echo "$CFLAGS"
-march=ivybridge -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mno-movbe -maes -mno-sha -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2 -msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=15360 -mtune=ivybridge
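Wrapped up as a function, that one-liner could look something like the sketch below. The names extract_march_flags and native_cflags are made up for illustration, and the parsing assumes the cc1 line in the gcc -v output contains a single -march option, as in the output above.

```shell
#!/usr/bin/env bash
# Hypothetical helper: reads `gcc -v` output on stdin and prints everything
# from -march onwards on the cc1 line.
extract_march_flags() {
    grep cc1 | sed 's/^.*\(-march.*\)$/\1/'
}

# Hypothetical wrapper: prints the flags that -march=native expands to on
# this machine, or returns 1 if gcc is not installed.
native_cflags() {
    command -v gcc >/dev/null 2>&1 || return 1
    gcc -march=native -E -v - </dev/null 2>&1 | extract_march_flags
}
```

Usage would be something like CFLAGS=$(native_cflags) || CFLAGS='-O2 -pipe', so a missing compiler falls back to plain flags instead of an empty string.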
Also, why aren't you using -mtune=native for AMD processors?
I also spotted this:
set_intelflags() {
  if [[ "$INTELOPT" = [yY] ]]; then
    if [[ "$(uname -m)" == 'x86_64' && $(grep Intel /proc/cpuinfo) ]]; then
      CFLAGS='-O2 -m64 -march=native -pipe -g -mmmx -msse3'
      CXXFLAGS='-O2 -m64 -march=native -pipe -g -mmmx -msse3'
      export CFLAGS
      export CXXFLAGS
    elif [[ "$(uname -m)" != 'x86_64' && $(grep Intel /proc/cpuinfo) ]]; then
      CFLAGS='-O2 -m32 -march=native -pipe -g -mmmx -msse3'
      CXXFLAGS='-O2 -m32 -march=native -pipe -g -mmmx -msse3'
      export CFLAGS
      export CXXFLAGS
    fi
  fi
}
This is actually confusing. You're passing -m64/-m32 in CFLAGS, but autoconf does this check automatically anyway, unless someone wants to build IA32 code on AMD64/x86_64 CPUs. You're also forcing -march=native and then forcing -mmmx and -msse3 on top of it, which -march=native already enables on any CPU that supports them. If you had a 32-bit CPU from, say, 2000 and tried to use the second block of code, you would get SIGILLs, because the forced -msse3 lets the compiler emit instructions that CPU doesn't have. That said, most modern CPUs used on servers have MMX/SSE3.
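If the explicit flags have to stay for some reason, a guard along these lines would at least avoid handing -msse3 to a CPU that lacks it. This is only a sketch for Linux hosts: cpu_has_flag and extra_simd_flags are hypothetical names, and the optional file argument exists purely so the check can be tried against something other than /proc/cpuinfo. Note that the kernel lists SSE3 as "pni".

```shell
#!/usr/bin/env bash
# Hypothetical helper: succeeds if the named flag appears on the first
# "flags" line of a cpuinfo-style file (defaults to /proc/cpuinfo).
cpu_has_flag() {
    local flag="$1" cpuinfo="${2:-/proc/cpuinfo}"
    grep -m1 '^flags' "$cpuinfo" 2>/dev/null | grep -qw -- "$flag"
}

# Hypothetical helper: prints only the extra SIMD options the CPU reports.
extra_simd_flags() {
    local cpuinfo="${1:-/proc/cpuinfo}" out=""
    cpu_has_flag mmx "$cpuinfo" && out+=" -mmmx"
    cpu_has_flag pni "$cpuinfo" && out+=" -msse3"   # SSE3 shows up as "pni"
    echo "${out# }"
}
```

With that, something like CFLAGS="-O2 -march=native -pipe -g $(extra_simd_flags)" would only carry the options the machine can actually run.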
GEN_MTUNEOPT="-m${CCM} -march=native"
# if only 1 cpu thread use -O2 to keep compile times sane
if [[ "$CPUS" = '1' ]]; then
  export CFLAGS="-O2 $GEN_MTUNEOPT -pipe"
else
  export CFLAGS="-O3 $GEN_MTUNEOPT -pipe"
fi
According to the comment, you're dropping from -O3 to -O2 on single-CPU machines to keep compile times down. That might seem reasonable, since at -O3 gcc does more loop optimization and increases instruction counts for functions that -O2 left alone (branch optimization gets particularly aggressive).
But in testing, it makes no measurable difference to compile time as far as I can see:
With -O2 on PHP 5.6.30 with standard configure flags:
real 3m7.799s
user 2m05.429s
sys 0m20.391s
With -O3, after a mount -o remount / to blow away the VFS cache (so -pipe et al. isn't served from cache):
real 3m8.007s
user 2m07.017s
sys 0m18.870s
I'm not trying to say this product is a bad idea; I was just curious about what you were doing with the compiler flags.
if [[ "$CPUVENDOR" != 'GenuineIntel' ]]; then
  CPUCCOPT="--with-cc-opt="-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -m${CCM} -mtune=generic""
else
  CPUCCOPT="--with-cc-opt="-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -m${CCM} -mtune=native""
fi
This is another confusing block of code. You're applying -mtune=generic to every non-Intel CPU and -mtune=native only to Intel CPUs. I don't see why the vendor matters here, since -mtune=native works just as well on AMD.
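Without the vendor branch, that block could collapse to something like the following. It's a rough sketch: build_cc_opt is a made-up name, its argument stands in for your CCM variable, and the flag list is copied from the block above with -mtune=native used for every vendor.

```shell
#!/usr/bin/env bash
# Hypothetical helper: builds the --with-cc-opt argument once, for any vendor.
# $1 is the word size (64 or 32), standing in for the script's CCM variable.
build_cc_opt() {
    local ccm="$1"
    local flags="-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions"
    flags+=" -fstack-protector-strong --param=ssp-buffer-size=4"
    flags+=" -m${ccm} -mtune=native"
    printf '%s\n' "--with-cc-opt=${flags}"
}

CPUCCOPT=$(build_cc_opt 64)
# Pass it to configure as one quoted word: ./configure "$CPUCCOPT"
```

Keeping the whole option in one variable and quoting it at the call site also sidesteps the nested double quotes in the current version.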
Honestly, you should really consider redoing all the backend code for GCC/Clang so that it simply uses -march=native, if you are going to compile on customers' servers anyway. FYI, some instructions are not available inside containers, so you might even get SIGILLs there as well (I've never seen this in practice, only busted glibc versions where AVX was announced by the CPU but disabled in glibc due to an xsave() bug).
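If you want a cheap safety net for that, one option is to build and run a throwaway program with the candidate flags and fall back to plain flags if anything fails. The sketch below is only a smoke test under my own assumptions: flags_work and choose_cflags are hypothetical names, and a program this trivial will only catch the compiler rejecting the flags or the binary failing at startup, not every instruction that could fault later.

```shell
#!/usr/bin/env bash
# Hypothetical helper: returns 0 if a trivial C program builds and runs
# with the given flags, using $CC if set and gcc otherwise.
flags_work() {
    local cc="${CC:-gcc}" tmpdir rc
    tmpdir=$(mktemp -d) || return 1
    printf 'int main(void) { return 0; }\n' > "$tmpdir/t.c"
    # $1 is deliberately unquoted so the flag string splits into words.
    if "$cc" $1 -o "$tmpdir/t" "$tmpdir/t.c" >/dev/null 2>&1 && "$tmpdir/t"; then
        rc=0
    else
        rc=1
    fi
    rm -rf "$tmpdir"
    return "$rc"
}

# Hypothetical helper: prefers -march=native, falls back to plain flags.
choose_cflags() {
    if flags_work "-march=native"; then
        echo "-O2 -march=native -pipe"
    else
        echo "-O2 -pipe"
    fi
}
```

Usage would be export CFLAGS="$(choose_cflags)" near the top of the build script.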