
PHP 7.2.0 Alpha 1 Out Now

eva2000

Active Member
The next PHP version, 7.2.0 Alpha 1, has been released: http://php.net/archive/2017.php#id2017-06-08-2

There's a long way to go before stable (http://wiki.php.net/todo/php72#timetable), but it's looking good in the quick benchmarks I ran while adding multi-version PHP-FPM support via the Remi SCL PHP-FPM Yum repo to my Centmin Mod LEMP installer: https://community.centminmod.com/threads/php-7-2-0-alpha-1.11940/

Benchmarks

phpbenchmarks-110617.png

WordPress 4.8 Benchmarks

Added some WordPress 4.8.0 load testing benchmark comparisons using Blitz.io with 1,000 users from Virginia to an OVH MC-32 server in BHS. The site was set up with Centmin Mod 123.09beta01's centmin.sh menu option 22 WordPress auto installer, but with all the default WP plugins it installs disabled and all WP caching disabled, i.e. WP Super Cache, KeyCDN Cache Enabler and Redis Nginx level caching.

blitzio-table-01.png
 

Monk

New Member
Those numbers look odd - I wonder if they would improve with -march=native on the binary for the older PHP versions.
 

Jonathan

Woohoo
Verified Provider
Interesting results. I sure love that they've been focusing on performance.

I hope developers don't get lazier, however, and simply write crappier code now instead of optimizing it... not that many devs actually optimize code anyway.
 

HBAndrei

Active Member
Verified Provider
Is it just me, or do they seem to be spitting out these new versions so much faster than before?
 

Monk

New Member
'not many devs actually optimize code'

Actually, a developer's first goal is writing something that works and doesn't error out: write a function now, and let GCC/clang optimize it. Sometimes you'll need to write inline asm so gcc/clang doesn't do silly things, or profile for performance and make adjustments. Writing good code is one thing; writing fast code is another thing altogether. For example, there are a bunch of different ways to do a memcpy() on x86-64: let gcc do it, or write your own inline assembly function to override the compiler-specific stuff, which, again, MIGHT cause performance problems on some CPUs due to things like pipeline queue depth, etc.

For example, here's some test code that does some math functions, similar to the PHP script in the first post. Look at the differences in execution time between processors, and GCC versions/flags:

Code:
Processor (System-on-Chip)             Compiler   Time (-O2)  Time (-Os)  Fastest
AMD Opteron 8350                       gcc-4.8.1    0.704s      0.896s      -O2
AMD FX-6300                            gcc-4.8.1    0.392s      0.340s      -Os
AMD E2-1800                            gcc-4.7.2    0.740s      0.832s      -O2
Intel Xeon E5405                       gcc-4.8.1    0.603s      0.804s      -O2
Intel Xeon E5-2603                     gcc-4.4.7    1.121s      1.122s       -
Intel Core i3-3217U                    gcc-4.6.4    0.709s      0.709s       -
Intel Core i3-3217U                    gcc-4.7.3    0.708s      0.822s      -O2
Intel Core i3-3217U                    gcc-4.8.1    0.708s      0.944s      -O2
Intel Core i7-4770K                    gcc-4.8.1    0.296s      0.288s      -Os
Intel Atom 330                         gcc-4.8.1    2.003s      2.007s      -O2
ARM 1176JZF-S (Broadcom BCM2835)       gcc-4.6.3    3.470s      3.480s      -O2
ARM Cortex-A8 (TI OMAP DM3730)         gcc-4.6.3    2.727s      2.727s       -
ARM Cortex-A9 (TI OMAP 4460)           gcc-4.6.3    1.648s      1.648s       -
ARM Cortex-A9 (Samsung Exynos 4412)    gcc-4.6.3    1.250s      1.250s       -
ARM Cortex-A15 (Samsung Exynos 5250)   gcc-4.7.2    0.700s      0.700s       -
Qualcomm Snapdragon APQ8060A           gcc-4.8       1.53s       1.52s      -Os
 

eva2000

Active Member
Those numbers look odd - I wonder if they would improve with -march=native on the binary for the older PHP versions.
It definitely would, but that's harder to do with the generally available RPM-provided versions. Centmin Mod's php-fpm is source compiled and auto-detects whether an Intel processor is present for march=native :)
Is it just me, or do they seem to be spitting out these new versions so much faster than before?
I believe it's a monthly affair these days, for minor branch updates at least :)
For example, here's some test code that does some math functions, similar to the PHP script in the first post. Look at the differences in execution time between processors, and GCC versions/flags:
There is definitely some correlation between them and GCC compiler flags and options. That is why Centmin Mod's php-fpm is generally faster, as is its version of Nginx :D
 

Monk

New Member
-march=native tells the compiler to call cpuid() to get a list of the current CPU's features/flags and L1/L2/L3 cache sizes, and to optimize for that specific processor. Indeed, the resulting code isn't portable, and a huge drawback of RPM-packaged languages like PHP, perl, etc. is that they are generally built with -O2/-O0.

Adding on-the-fly patching to work around performance problems will add a large amount of code to the existing PHP base. (Linux did boot time patching for optimized memcpy(?))

Are you saying 'centmin' autodetects the CPU, via walking cpuinfo and applies march=native automatically, or PHP-FPM does?
 

eva2000

Active Member
Are you saying 'centmin' autodetects the CPU, via walking cpuinfo and applies march=native automatically, or PHP-FPM does?
Centmin Mod's nginx and php-fpm source compile routines detect whether the server is using an Intel CPU and use GCC to dynamically apply march=native for supported Intel CPUs only :) This also allows Centmin Mod's PHP 7 routines to optionally support Intel Profile Guided optimisations too. I will be adding the same for AMD Zen compiler routines whenever AMD Zen/Ryzen is an offering in the hosting space :)

Centmin Mod also supports the GCC native to CentOS plus GCC 5.3.1 and GCC 6.2.1, and in future GCC 7, for the nginx and php-fpm routines, so as to keep up with the latest CPU offerings as they come. Nginx can also be built with either LibreSSL or OpenSSL, at the user's choice: https://community.centminmod.com/th...bressl-openssl-support-in-123-09beta01.11122/ :)
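
For readers who want to picture that kind of check, here is a minimal sketch. It is not Centmin Mod's actual routine: `cpu_vendor` and `is_intel_cpu` are hypothetical names, and it assumes an x86 Linux /proc/cpuinfo layout with a `vendor_id` line.

```shell
# Hypothetical helpers, not taken from Centmin Mod.
# Prints the first vendor_id found in a cpuinfo-style file
# (defaults to /proc/cpuinfo when no path is given).
cpu_vendor() {
    awk -F': *' '/^vendor_id/ { print $2; exit }' "${1:-/proc/cpuinfo}"
}

# Succeeds only when the reported vendor string is GenuineIntel.
is_intel_cpu() {
    [ "$(cpu_vendor "$1")" = "GenuineIntel" ]
}
```

A build script could then do something like `if is_intel_cpu; then CFLAGS="$CFLAGS -march=native"; fi` before running configure.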
 

Monk

New Member
I briefly looked over the code for your CPU detection stuff. It's actually kind of bloated in a way. You're also passing -O3 in a few spots with mtune. That just increases the size of the binary in most cases with GCC; Clang is a lot better with -O3.

If you're compiling code on 'customer' machines, you should just drop the mtune=generic stuff and stick with march overall, which would save a lot of overhead on guessing CPU types. It would also eliminate a bunch of backend things. For example, you could set a global define like this:

$ CFLAGS=$(gcc -march=native -E -v - </dev/null 2>&1 | grep cc1 | sed 's/^.*\(-march.*$\).*$/\1/'); echo $CFLAGS
-march=ivybridge -mmmx -mno-3dnow -msse -msse2 -msse3 -mssse3 -mno-sse4a -mcx16 -msahf -mno-movbe -maes -mno-sha -mpclmul -mpopcnt -mno-abm -mno-lwp -mno-fma -mno-fma4 -mno-xop -mno-bmi -mno-bmi2 -mno-tbm -mavx -mno-avx2 -msse4.2 -msse4.1 -mno-lzcnt -mno-rtm -mno-hle -mrdrnd -mf16c -mfsgsbase -mno-rdseed -mno-prfchw -mno-adx -mfxsr -mxsave -mxsaveopt -mno-avx512f -mno-avx512er -mno-avx512cd -mno-avx512pf -mno-prefetchwt1 --param l1-cache-size=32 --param l1-cache-line-size=64 --param l2-cache-size=15360 -mtune=ivybridge
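
The one-liner above can be split into two small functions so the parsing step can be checked without a compiler. This is only a sketch: `extract_march_flags` and `native_cflags` are made-up names, and it assumes gcc prints a single cc1 line containing one `-march=` option, as in the output shown.

```shell
# Reads gcc's verbose output on stdin and prints everything from
# -march= to the end of the cc1 command line.
extract_march_flags() {
    sed -n '/cc1/ s/^.* \(-march=.*\)$/\1/p'
}

# Asks the local gcc what -march=native expands to (requires gcc).
native_cflags() {
    gcc -march=native -E -v - </dev/null 2>&1 | extract_march_flags
}
```

With that in place, `CFLAGS=$(native_cflags)` gives the same expanded flag list, and the result can be logged or diffed between machines.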

Also - Why aren't you using mtune=native for AMD processors?

I also spotted this, ie:

set_intelflags() {
  if [[ "$INTELOPT" = [yY] ]]; then
    if [[ "$(uname -m)" == 'x86_64' && $(grep Intel /proc/cpuinfo) ]]; then
      CFLAGS='-O2 -m64 -march=native -pipe -g -mmmx -msse3'
      CXXFLAGS='-O2 -m64 -march=native -pipe -g -mmmx -msse3'
      export CFLAGS
      export CXXFLAGS
    elif [[ "$(uname -m)" != 'x86_64' && $(grep Intel /proc/cpuinfo) ]]; then
      CFLAGS='-O2 -m32 -march=native -pipe -g -mmmx -msse3'
      CXXFLAGS='-O2 -m32 -march=native -pipe -g -mmmx -msse3'
      export CFLAGS
      export CXXFLAGS
    fi
  fi
}

This is actually confusing. You're passing -m64/-m32 in CFLAGS, but autoconf automatically does this check anyways, unless someone wants to run IA32 on AMD64/x86_64 CPUs. You're also forcing -march=native and then forcing -mmmx and -msse3 on top of it, which -march=native would already enable where the CPU supports them. If you had a 32bit CPU from say, 2000, and you tried to use the second block of code, you would get SIGILLs. Most modern CPUs that are used on servers have MMX/SSE3.
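
If the explicit flags are kept for some reason, one way to avoid the SIGILL case is to add them only when the CPU advertises the feature. A rough sketch, assuming an x86 Linux /proc/cpuinfo where SSE3 is listed as `pni`; `cpu_has_flag` and `extra_simd_flags` are hypothetical helpers, not part of any existing script.

```shell
# Succeeds if the first "flags" line of a cpuinfo-style file
# (default /proc/cpuinfo) lists the given feature flag.
cpu_has_flag() {
    awk -v want="$1" '
        /^flags/ {
            # Fields 1 and 2 are "flags" and ":", so start at 3.
            for (i = 3; i <= NF; i++) if ($i == want) found = 1
            exit
        }
        END { exit !found }
    ' "${2:-/proc/cpuinfo}"
}

# Prints -mmmx and/or -msse3 only for features the CPU reports.
extra_simd_flags() {
    out=""
    if cpu_has_flag mmx "$1"; then out="$out -mmmx"; fi
    if cpu_has_flag pni "$1"; then out="$out -msse3"; fi
    echo "${out# }"
}
```

On a CPU without SSE3 this leaves -msse3 out, so the compiler never emits instructions the processor cannot run.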

GEN_MTUNEOPT="-m${CCM} -march=native"
# if only 1 cpu thread use -O2 to keep compile times sane
if [[ "$CPUS" = '1' ]]; then
  export CFLAGS="-O2 $GEN_MTUNEOPT -pipe"
else
  export CFLAGS="-O3 $GEN_MTUNEOPT -pipe"
fi

According to the comment, you're lowering the optimization level in CFLAGS based on the number of CPUs to reduce compile times. That might seem OK in principle, since with -O3 gcc will do a few more loop optimizations, increase instruction counts, etc. for functions that -O2 skipped over (branch optimization is extreme in this case).

But in testing, this doesn't do anything at all as far as I can see:

With -O2 on PHP 5.6.30 with standard configure flags:

real 3m7.799s
user 2m05.429s
sys 0m20.391s

With -O3 and a mount -o remount / to blow up the VFS cache (so -pipe et al isn't VFS cached)

real 3m8.007s
user 2m07.017s
sys 0m18.870s
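
To put a number on that, the two `real` times can be converted to seconds and compared. A small sketch using awk for the arithmetic; the function names are made up for illustration.

```shell
# Converts a `time`-style value such as 3m7.799s into seconds.
to_seconds() {
    echo "$1" | awk -F'[ms]' '{ printf "%.3f\n", $1 * 60 + $2 }'
}

# Prints how much slower (in percent) the second timing is than the first.
percent_slower() {
    a=$(to_seconds "$1")
    b=$(to_seconds "$2")
    awk -v a="$a" -v b="$b" 'BEGIN { printf "%.2f\n", (b - a) / a * 100 }'
}
```

`percent_slower 3m7.799s 3m8.007s` prints 0.11, so the -O3 build took roughly a tenth of a percent longer in wall-clock time.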

I'm not trying to say this product is a bad idea, I just was curious on what compiler stuff you were doing.

if [[ "$CPUVENDOR" != 'GenuineIntel' ]]; then
  CPUCCOPT="--with-cc-opt=\"-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -m${CCM} -mtune=generic\""
else
  CPUCCOPT="--with-cc-opt=\"-O2 -g -pipe -Wall -Wp,-D_FORTIFY_SOURCE=2 -fexceptions -fstack-protector-strong --param=ssp-buffer-size=4 -m${CCM} -mtune=native\""
fi

This is another confusing block of code - You're applying mtune=generic to an Intel CPU, but not non-Intel CPUs, which you would think would be the other way around.

Honestly, you should really consider redoing all the backend code for 'GCC/CLANG' so it simply uses march=native, if you are just going to compile it on customers' servers. FYI, some instructions are not available inside containers and you might even get SIGILLs as well (I've never seen this in practice, only busted glibc versions where AVX was announced by the CPU, but disabled in glibc due to an xsave() bug).
 

eva2000

Active Member
Thanks @Monk for that feedback, and yes, it's a bit messier than I'd like, as I have to deal with both CentOS 6 GCC 4.4 and CentOS 7 GCC 4.8. Some of it is legacy, or trying to work with legacy code/servers, i.e. some older Intel Xeon CPUs didn't like march=native (Illegal instruction errors during compile). Yes, the extra detection is to narrow down the specific CPU family.

Also, Nginx actually defaults to a Clang compile, with the option to switch to GCC if Centmin Mod users want to.

As to why Intel only and not AMD: I had no access to AMD servers to test what I use, whereas Intel servers are easier to access. I try to test every Intel family out there, or go by feedback from Centmin Mod users as to what works etc., and these days it's mainly all Intel :)

As to reducing optimisations based on CPU count, I'm just trying to follow the logic that a low CPU core count, i.e. 1 CPU, would usually also mean low memory capacity and low system specs. So rather than overwhelm a low end VPS with full on optimisations, which would dramatically increase compile time and memory usage, I reduce them, e.g. on a 512MB RAM, 1 CPU VPS on a 5+ year old Intel Xeon CPU.
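
That reasoning could also be expressed against memory directly rather than inferring it from the CPU count. A sketch only: `pick_opt_level` is a hypothetical function and the 1024 MB cut-off is an arbitrary assumption, not a Centmin Mod setting.

```shell
# Picks an optimisation level from CPU count and total memory in MB.
# The 1024 MB threshold is an assumed value for illustration.
pick_opt_level() {
    cpus=$1
    mem_mb=$2
    if [ "$cpus" -le 1 ] || [ "$mem_mb" -lt 1024 ]; then
        echo "-O2"
    else
        echo "-O3"
    fi
}
```

On Linux the inputs could come from `nproc` and the MemTotal line of /proc/meminfo (which is reported in kB), for example `pick_opt_level "$(nproc)" "$(awk '/MemTotal/ {print int($2/1024)}' /proc/meminfo)"`.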

This is another confusing block of code - You're applying mtune=generic to an Intel CPU, but not non-Intel CPUs, which you would think would be the other way around.
Actually, you misread that routine: != GenuineIntel gets mtune=generic, while = GenuineIntel gets mtune=native.

I'd like to just use march=native, but in practice not all Intel CPUs like it, so I have to cater for them all, as I have no idea what Centmin Mod users will use. At one time I did use march=native only, but Centmin Mod users reported those Illegal instruction errors during compile on specific older Intel CPUs. This means I had to add more bloated logic to figure out the specific Intel CPU family being used. Centmin Mod users reported the issue resolved after that :)
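
Reduced to just the branch, the logic reads as follows. The function name is hypothetical and the rest of the --with-cc-opt string is left out; only the vendor test and the two -mtune values come from the quoted routine.

```shell
# Mirrors the vendor branch of the quoted routine:
# anything that is not GenuineIntel gets generic tuning.
mtune_for_vendor() {
    if [ "$1" != "GenuineIntel" ]; then
        echo "-mtune=generic"
    else
        echo "-mtune=native"
    fi
}
```

So `mtune_for_vendor AuthenticAMD` gives -mtune=generic and `mtune_for_vendor GenuineIntel` gives -mtune=native.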

You're passing -m64/-m32 in CFLAGS, but autoconf automatically does this check anyways
yes i know but figured having it there (it was legacy code) doesn't hurt anything anyway or does it ? i.e. passing a flag that already defaults to the same value ?

If you had a 32bit CPU from say, 2000, and you tried to use the second block of code, you would get SIGILLs
Noted and removed https://github.com/centminmod/centminmod/commit/17245e8d93910bfdd50430389e8876ea456a6b15 :) Luckily, there are not many 32bit CentOS users out there now, though I have a few Centmin Mod 128MB VPSes with CentOS 6 32bit heh

Centmin Mod users can also opt out of the PHP-FPM Intel CPU optimisations by setting GCCINTEL_PHP='n', as opposed to the default GCCINTEL_PHP='y', in the persistent config file at /etc/centminmod/custom_config.inc, which can override Centmin Mod default settings and persists through Centmin Mod's git backed update routines.
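
How Centmin Mod actually loads that file isn't shown in this thread, so the following is only a generic sketch of the default-then-override pattern. `load_overrides` is a hypothetical name; the variable and the file path are the ones mentioned above.

```shell
# Default used when the override file does not set the variable.
GCCINTEL_PHP='y'

# Sources an override file if it exists; assignments in it replace
# the defaults set earlier in the script.
load_overrides() {
    if [ -f "$1" ]; then
        . "$1"
    fi
}
```

Calling `load_overrides /etc/centminmod/custom_config.inc` after the defaults means a line such as GCCINTEL_PHP='n' in that file wins, and because the file lives outside the git checkout it is untouched by updates.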
 

eva2000

Active Member
FYI, the difference between a PHP-FPM 5.6.30 compile with GCCINTEL_PHP='n' vs GCCINTEL_PHP='y' is approximately 5% faster on a 2 CPU Intel E5-1650v3 based OpenVZ VPS with 2GB RAM on CentOS 7.3 64bit.

Centmin Mod also logs sar stats during nginx and php-fpm compiles.

for GCCINTEL_PHP='y'
Code:
00:03:14    kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
00:03:18            0   2097152    100.00         0   1463076         0      0.00    558400    968484      1812
00:03:21            0   2097152    100.00         0   1469252         0      0.00    562832    970252      7456
00:05:01            0   2097152    100.00         0   1462868         0      0.00    678540    907900     22640
00:06:00            0   2097152    100.00         0   1577576         0      0.00    727448    932444     40468
00:07:42            0   2097152    100.00         0   1579632         0      0.00    769516    861356     28548
00:07:46            0   2097152    100.00         0   1578916         0      0.00    762652    866988     19956
00:09:30            0   2097152    100.00         0   1415040         0      0.00    639356    846420      1172
00:10:01            0   2097152    100.00         0   1427020         0      0.00    840764    843548      7392
00:15:01        24396   2072756     98.84         0   1357916         0      0.00    794872    650072      9860
00:17:24        34432   2062720     98.36         0   1402684         0      0.00    774556    666240     67872
00:17:35            0   2097152    100.00         0   1449288         0      0.00    775896    711592    136476
00:17:36            0   2097152    100.00         0   1424600         0      0.00    783476    701044     94180
00:17:48            0   2097152    100.00         0   1447504         0      0.00    785628    721804    113156
00:17:50            0   2097152    100.00         0   1447876         0      0.00    786264    721624     73864
00:17:53            0   2097152    100.00         0   1450548         0      0.00    794768    715792     76124
00:17:54            0   2097152    100.00         0   1450916         0      0.00    795856    715076     76456
00:17:57            0   2097152    100.00         0   1451044         0      0.00    814044    697164     77664
00:17:59            0   2097152    100.00         0   1448304         0      0.00    811860    696628     34868
00:18:00            0   2097152    100.00         0   1453316         0      0.00    816424    697072     39816
00:18:09            0   2097152    100.00         0   1435248         0      0.00    818136    677464     20976
Average:         2941   2094211     99.86         0   1459631         0      0.00    754564    778448     47538
started around 00:08
Code:
00:03:14      runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
00:03:18            0        62      0.24      0.20      0.11         0
00:03:21            0        62      0.24      0.20      0.11         0
00:05:01            3        83      2.54      0.94      0.38         0
00:06:00            0        62      2.48      1.25      0.52         0
00:07:42            0        62      2.87      1.74      0.77         0
00:07:46            0        59      2.87      1.74      0.77         0
00:09:30            0        62      1.32      1.48      0.77         0
00:10:01            2        74      1.59      1.53      0.81         0
00:15:01            2        68      2.42      2.03      1.21         0
00:17:24            0        62      2.04      2.07      1.34         0
00:17:35            0        62      1.88      2.03      1.34         0
00:17:36            0        62      1.88      2.03      1.34         0
00:17:48            0        62      1.68      1.98      1.33         0
00:17:50            0        62      1.68      1.98      1.33         0
00:17:53            0        62      1.79      2.00      1.34         0
00:17:54            0        62      1.79      2.00      1.34         0
00:17:57            0        62      1.79      2.00      1.34         0
00:17:59            0        62      1.73      1.98      1.34         0
00:18:00            0        62      1.73      1.98      1.34         0
00:18:09            0        62      1.61      1.95      1.33         0
Average:            0        64      1.81      1.66      1.01         0
for GCCINTEL_PHP='n'
Code:
00:20:01    kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
00:20:37       104304   1992848     95.03         0   1257232         0      0.00    653904    676176       820
00:25:01            0   2097152    100.00         0   1371136         0      0.00    827100    686936     22648
00:28:11            0   2097152    100.00         0   1486704         0      0.00    874724    623628    133324
00:28:24            0   2097152    100.00         0   1590404         0      0.00    880456    721968    258112
00:28:25            0   2097152    100.00         0   1531396         0      0.00    886648    678524    178324
00:28:39            0   2097152    100.00         0   1575388         0      0.00    889744    719428    129348
00:28:40            0   2097152    100.00         0   1593380         0      0.00    887436    739808    131684
00:28:43            0   2097152    100.00         0   1595860         0      0.00    892656    737064    133772
00:28:52            0   2097152    100.00         0   1591568         0      0.00    901692    720304     54520
00:28:55            0   2097152    100.00         0   1588824         0      0.00    899108    720156     54600
00:28:56            0   2097152    100.00         0   1593864         0      0.00    903228    721072     59520
00:29:01            0   2097152    100.00         0   1558300         0      0.00    908668    680232     20424
Average:         8692   2088460     99.59         0   1527838         0      0.00    867114    702108     98091
Code:
00:20:01      runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
00:20:37            0        63      0.96      1.44      1.22         0
00:25:01            2        73      2.11      1.83      1.43         0
00:28:11            0        63      1.93      1.91      1.53         0
00:28:24            0        63      1.73      1.86      1.52         0
00:28:25            0        63      1.73      1.86      1.52         0
00:28:39            0        63      1.57      1.82      1.51         0
00:28:40            0        63      1.57      1.82      1.51         0
00:28:43            0        63      1.60      1.82      1.52         0
00:28:52            0        63      1.47      1.79      1.51         0
00:28:55            0        63      1.43      1.78      1.50         0
00:28:56            0        63      1.43      1.78      1.50         0
00:29:01            0        63      1.40      1.77      1.50         0
00:30:01            0        61      0.51      1.44      1.41         0
Average:            0        64      1.50      1.76      1.48         0
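
For comparing runs like these, a column can be averaged straight from the pasted text. A small sketch, assuming the input looks like the tables above (one header line, then data rows, with an optional Average: row); `avg_column` is a made-up helper name.

```shell
# Averages one named column of sar-style output read from stdin.
# The first line is taken as the header; "Average:" rows are skipped.
avg_column() {
    awk -v name="$1" '
        NR == 1 { for (i = 1; i <= NF; i++) if ($i == name) col = i; next }
        $1 == "Average:" { next }
        col && NF >= col { sum += $col; n++ }
        END { if (n) printf "%.2f\n", sum / n }
    '
}
```

Saving one of the load tables to a file and running `avg_column ldavg-1 < file` recomputes the average for just the rows of interest, for example only those captured after the compile started.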
 

Monk

New Member
Did you invalidate the VFS cache before you ran tests? A 'mount -o remount /' will usually do that (or whatever filesystem you are compiling on).
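
Another way to get a cold cache between runs is the kernel's drop_caches sysctl. A sketch only: `drop_fs_caches` is a hypothetical wrapper, it needs root on a real system, and it may not be permitted inside a container. The path argument exists purely so the function can be dry-run against a scratch file.

```shell
# Flushes dirty pages, then writes 3 to drop_caches, which asks the
# kernel to drop the page cache plus dentries and inodes.
drop_fs_caches() {
    sync
    echo 3 > "${1:-/proc/sys/vm/drop_caches}"
}
```

Running it before each timed build keeps the first and second compile on an equal footing as far as cached source files go.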

yes i know but figured having it there (it was legacy code) doesn't hurt anything anyway or does it ? i.e. passing a flag that already defaults to the same value ?

No, it doesn't hurt it. It's confusing, though. A lot of Linux distros are apparently starting to drop, or suggesting dropping, support for 32bit overall (about time). So this means that PAE and other hacks can finally go away. If I were you, I'd completely drop 32bit support. It's slow, and there are tons of disadvantages to it.
 

eva2000

Active Member
The PHP compile tests are run on completely fresh CentOS OS reinstalls, so that's not needed.

Yeah 32bit is going the way of the dodo eventually :)
 

Monk

New Member
I don't see anything in the changelog or the code changes that would affect performance. Looking at your benchmark output, I see a variation of a few percentage points overall between the latest beta builds, which would probably match up with context switching, CPU scheduler latency, or other noise. That's not really enough to justify saying 'improved performance' without eliminating the aforementioned issues you have to contend with.

Are you running these tests on a VPS container, or a dedicated server?
 

eva2000

Active Member