PHP 7.2.0 Alpha 1 Out Now

Discussion in 'Coding, Scripting & Programming' started by eva2000, Jun 11, 2017.

  1. eva2000

    eva2000 Active Member

    327
    90
    May 22, 2013
    The next PHP version, 7.2.0 Alpha 1, has been released: http://php.net/archive/2017.php#id2017-06-08-2

    There is a long way to go before stable (http://wiki.php.net/todo/php72#timetable), but it is looking good in the quick benchmarks I ran while adding multi PHP-FPM version support via the Remi SCL PHP-FPM Yum repo to my Centmin Mod LEMP installer: https://community.centminmod.com/threads/php-7-2-0-alpha-1.11940/

    Benchmarks

    phpbenchmarks-110617.png

    WordPress 4.8 Benchmarks

    Added some WordPress 4.8.0 load testing benchmark comparisons using Blitz.io with 1,000 users from Virginia to an OVH MC-32 BHS server. WordPress was installed with Centmin Mod 123.09beta01's centmin.sh menu option 22 WordPress auto installer, but with all the default WP plugins it installs disabled and all WP caching disabled, i.e. WP Super Cache, KeyCDN Cache Enabler and Redis Nginx level caching.

    blitzio-table-01.png
     
  2. Monk

    Monk New Member

    11
    2
    Jun 13, 2017
    Those numbers look odd - I wonder if they would improve with -march=native on the binary for the older PHP versions.
     
  3. Jonathan

    Jonathan Woohoo Administrator Verified Provider

    377
    196
    May 27, 2013
    jonspw
    Interesting results. I sure love that they've been focusing on performance.

    I hope developers don't get lazier however, and simply make crappier code now instead of optimizing it...not that many devs actually optimize code anyway.
     
  4. HBAndrei

    HBAndrei Active Member Verified Provider

    159
    59
    May 1, 2014
    Is it just me, or do they seem to be spitting out these new versions so much faster than before?
     
  5. Monk

    Monk New Member

    11
    2
    Jun 13, 2017
    'not many devs actually optimize code'

    Actually, a developer's first goal is writing something that works and doesn't error out: write a function now and let GCC/clang optimize it. Sometimes you'll need inline asm so gcc/clang doesn't do silly things, or you'll need to profile for performance and make adjustments. Writing good code is fine. Writing fast code is another thing altogether. For example, there are a bunch of different ways to do a memcpy() on x86-64: let gcc do it, or write your own inline assembly function to override compiler-specific behaviour, which again MIGHT cause performance problems on some CPUs due to things like pipeline queue depth, etc.

    For example, here's some test code that does some math functions, similar to the PHP script in the first post. Look at the differences in execution time between processors, and GCC versions/flags:

    Code:
    Processor (System-on-Chip)             Compiler   Time (-O2)  Time (-Os)  Fastest
    AMD Opteron 8350                       gcc-4.8.1    0.704s      0.896s      -O2
    AMD FX-6300                            gcc-4.8.1    0.392s      0.340s      -Os
    AMD E2-1800                            gcc-4.7.2    0.740s      0.832s      -O2
    Intel Xeon E5405                       gcc-4.8.1    0.603s      0.804s      -O2
    Intel Xeon E5-2603                     gcc-4.4.7    1.121s      1.122s       -
    Intel Core i3-3217U                    gcc-4.6.4    0.709s      0.709s       -
    Intel Core i3-3217U                    gcc-4.7.3    0.708s      0.822s      -O2
    Intel Core i3-3217U                    gcc-4.8.1    0.708s      0.944s      -O2
    Intel Core i7-4770K                    gcc-4.8.1    0.296s      0.288s      -Os
    Intel Atom 330                         gcc-4.8.1    2.003s      2.007s      -O2
    ARM 1176JZF-S (Broadcom BCM2835)       gcc-4.6.3    3.470s      3.480s      -O2
    ARM Cortex-A8 (TI OMAP DM3730)         gcc-4.6.3    2.727s      2.727s       -
    ARM Cortex-A9 (TI OMAP 4460)           gcc-4.6.3    1.648s      1.648s       -
    ARM Cortex-A9 (Samsung Exynos 4412)    gcc-4.6.3    1.250s      1.250s       -
    ARM Cortex-A15 (Samsung Exynos 5250)   gcc-4.7.2    0.700s      0.700s       -
    Qualcomm Snapdragon APQ8060A           gcc-4.8       1.53s       1.52s      -Os
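
    To make the -O2 vs -Os gap in the table easier to read, here is a minimal Python sketch that turns a few of the rows above into a percentage difference. The timings are copied from the table; the function name and the choice of rows are only illustrative.

```python
# (-O2 seconds, -Os seconds) copied from a few rows of the table above.
timings = {
    "AMD Opteron 8350": (0.704, 0.896),
    "AMD FX-6300": (0.392, 0.340),
    "Intel Core i7-4770K": (0.296, 0.288),
    "Intel Atom 330": (2.003, 2.007),
}


def os_vs_o2_percent(o2_seconds, os_seconds):
    """Return how much slower (+) or faster (-) -Os is than -O2, in percent."""
    return (os_seconds - o2_seconds) / o2_seconds * 100.0


for cpu, (o2, os_) in timings.items():
    print(f"{cpu:22s} -Os vs -O2: {os_vs_o2_percent(o2, os_):+6.1f}%")
```

    On these numbers the Opteron 8350 is roughly 27% slower with -Os, while the FX-6300 is roughly 13% faster with it, which is the point: the same flag can help on one CPU and hurt on another.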
     
  6. eva2000

    eva2000 Active Member

    327
    90
    May 22, 2013
    Definitely would, but that's harder to do with generally available RPM-provided versions. Centmin Mod's php-fpm is source compiled and auto-detects whether an Intel processor is present for march=native :)
    I believe it's a monthly affair these days, for minor branch updates at least :)
    Definitely there is some correlation between them and GCC compiler flags and options. That is why Centmin Mod's php-fpm is generally faster, as is its version of Nginx :D
     
  7. Monk

    Monk New Member

    11
    2
    Jun 13, 2017
    -march=native tells the compiler to call cpuid() to get a list of the current CPU's features/flags and L1/L2/L3 cache sizes, and to optimize for that specific processor. Indeed, the resulting code isn't portable, and a huge drawback of RPM-packaged languages like PHP, Perl, etc. is that they are generally built with -O2/-O0.

    Adding on-the-fly patching to work around performance problems will add a large amount of code to the existing PHP base. (Linux did boot time patching for optimized memcpy(?))

    Are you saying 'centmin' autodetects the CPU, via walking cpuinfo and applies march=native automatically, or PHP-FPM does?
     
  8. eva2000

    eva2000 Active Member

    327
    90
    May 22, 2013
    Centmin Mod's nginx and php-fpm source compile routines detect whether the server is using an Intel CPU and use GCC to dynamically apply march=native for supported Intel CPUs only :) This also allows Centmin Mod's PHP 7 routines to optionally support Intel Profile Guided Optimisations too. I will be adding the same for AMD Zen compiler routines whenever AMD Zen/Ryzen is an offering in the hosting space :)

    Centmin Mod also supports the GCC native to CentOS plus GCC 5.3.1 and GCC 6.2.1, and in future GCC 7, for nginx and php-fpm routines, so as to keep up with the latest CPU offerings as they come. Nginx can also be built with either LibreSSL or OpenSSL at the user's choice: https://community.centminmod.com/th...bressl-openssl-support-in-123-09beta01.11122/ :)
     
    Last edited: Jun 14, 2017
  9. Monk

    Monk New Member

    11
    2
    Jun 13, 2017
    I briefly looked over the code for your CPU detection stuff. It's actually kind of bloated in a way. You're also passing -O3 in a few spots with mtune. That just increases the size of the binary in most cases with GCC; Clang is a lot better with -O3.

    If you're compiling code on 'customer' machines, you should just drop the mtune=generic stuff and stick with march overall, which would save a lot of overhead on guessing CPU types. It also would eliminate a bunch of backend things. For example, you could set a global define like this, ie:

    Also - Why aren't you using mtune=native for AMD processors?

    I also spotted this, ie:

    This is actually confusing. You're passing -m64/-m32 in CFLAGS, but autoconf automatically does this check anyway, unless someone wants to run IA32 on AMD64/x86_64 CPUs. You're also forcing mtune=native, and then forcing mmx and msse3 on top of it, which mtune would already enable. If you had a 32bit CPU from, say, 2000, and you tried to use the second block of code, you would get SIGILLs. Most modern CPUs that are used on servers have MMX/SSE3.

    According to the comment, you're actually reducing the CFLAG based on the number of CPUs to lower compile times; that might seem OK in practice, since gcc will do a little more loop optimizations, and increase instruction counts, etc for functions that -O2 skipped over (branch optimization is extreme in this case)

    But in testing, this doesn't do anything at all as far as I can see:

    With -O2 on PHP 5.6.30 with standard configure flags:

    With -O3 and a mount -o remount / to blow up the VFS cache (so -pipe et al isn't VFS cached)

    I'm not trying to say this product is a bad idea, I just was curious on what compiler stuff you were doing.

    This is another confusing block of code. You're applying mtune=generic to an Intel CPU, but not to non-Intel CPUs, when you would think it would be the other way around.

    Honestly, you should really consider redoing all the backend code for 'GCC/CLANG' so it simply uses march=native, if you are going to compile it on customers' servers. FYI, some instructions are not available inside containers and you might even get SIGILLs as well (I've never seen this in practice, only busted glibc versions where AVX was announced by the CPU but disabled in glibc due to an xsave() bug).
     
    eva2000 likes this.
  10. eva2000

    eva2000 Active Member

    327
    90
    May 22, 2013
    Thanks @Monk for that feedback, and yes, it's a bit messier than I'd like as I have to deal with both CentOS 6 GCC 4.4 and CentOS 7 GCC 4.8. Some of it is legacy, or trying to work with legacy code/servers, i.e. some older Intel Xeon CPUs didn't like march=native (Illegal instruction errors during compile). Yes, the extra detection is to narrow down the specific CPU family.

    Also Nginx actually defaults to Clang compile with option to switch to GCC if centmin mod users want to.

    As to why Intel only and not AMD: I had no access to AMD servers to test what I use, whereas Intel servers are easier to access. I try to test every Intel family out there, or go by feedback from Centmin Mod users as to what works, and these days it's mainly all Intel :)

    As to reducing optimisations based on CPU count, I'm just following the logic that low CPU core counts, i.e. 1 CPU, usually also mean low memory capacity and low system specs. So rather than overwhelm a low end VPS with full-on optimisations, which would dramatically increase compile time and memory usage, I reduce them, e.g. for a 512MB RAM, 1 CPU VPS on a 5+ year old Intel Xeon CPU.

    Actually, you misread that routine: vendor != GenuineIntel gets mtune=generic, while vendor = GenuineIntel gets mtune=native.
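
    To spell out the vendor rule just described, here is a minimal Python sketch. It is not Centmin Mod's actual routine (that lives in its shell scripts); the function name, the fallback for a missing vendor_id line, and the sample cpuinfo text are assumptions for illustration only.

```python
def tuning_flag(cpuinfo_text):
    """Return -mtune=native for GenuineIntel, -mtune=generic for anything else.

    cpuinfo_text is expected to look like the contents of /proc/cpuinfo.
    """
    for line in cpuinfo_text.splitlines():
        key, sep, value = line.partition(":")
        if sep and key.strip() == "vendor_id":
            vendor = value.strip()
            return "-mtune=native" if vendor == "GenuineIntel" else "-mtune=generic"
    # Assumption: if no vendor_id line is found, fall back to the safe choice.
    return "-mtune=generic"


# Hypothetical sample input; on a real Linux box you would read /proc/cpuinfo.
sample = "processor\t: 0\nvendor_id\t: GenuineIntel\nmodel name\t: example\n"
print(tuning_flag(sample))
```

    The older-Xeon special cases mentioned in this thread would sit on top of this as extra checks on the CPU family or model, which is where the additional detection logic comes from.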

    I'd like to just use march=native, but in practice not all Intel CPUs like it, so I have to cater for them all as I have no idea what Centmin Mod users will use. At one time I did use march=native only, but Centmin Mod users reported Illegal instruction errors during compile on specific older Intel CPUs. This meant I had to add more bloated logic to figure out the specific Intel CPU family being used. Centmin Mod users reported the issue resolved after that :)

    Yes, I know, but I figured having it there (it was legacy code) doesn't hurt anything anyway, or does it? I.e. passing a flag that already defaults to the same value?

    noted and removed https://github.com/centminmod/centminmod/commit/17245e8d93910bfdd50430389e8876ea456a6b15 :) Luckily, not many 32bit CentOS users out there now - though I have a few centmin mod 128MB VPS with centos 6 32bit heh

    Centmin Mod users can also opt out of PHP-FPM intel cpu optimisations when they set GCCINTEL_PHP='n' as opposed to default GCCINTEL_PHP='y' in persistent config file at /etc/centminmod/custom_config.inc which can override centmin mod default settings and persist through centmin mod git backed update routines.
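
    The override behaviour described above (a default of GCCINTEL_PHP='y' that a persistent config file can flip to 'n') can be sketched in Python as below. This assumes the file contains simple shell-style KEY='value' lines; the function name and parsing rules are illustrative and not how centmin.sh itself reads the file.

```python
DEFAULTS = {"GCCINTEL_PHP": "y"}  # default value as stated in the post above


def load_overrides(config_text, defaults=None):
    """Overlay simple KEY='value' lines from config_text onto the defaults."""
    settings = dict(DEFAULTS if defaults is None else defaults)
    for raw in config_text.splitlines():
        line = raw.strip()
        # Skip blanks, comments, and anything that is not an assignment.
        if not line or line.startswith("#") or "=" not in line:
            continue
        key, _, value = line.partition("=")
        settings[key.strip()] = value.strip().strip("'\"")
    return settings


# Example content a user might place in /etc/centminmod/custom_config.inc.
print(load_overrides("# opt out of Intel optimisations\nGCCINTEL_PHP='n'\n"))
```

    Because the overrides live in a separate file from the defaults, updating the defaults does not clobber the user's choice, which matches the persistence described above.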
     
    Last edited: Jun 15, 2017
  11. eva2000

    eva2000 Active Member

    327
    90
    May 22, 2013
    FYI, the difference between PHP-FPM 5.6.30 compiled with GCCINTEL_PHP='n' vs GCCINTEL_PHP='y' is approximately 5% faster on a 2 CPU Intel E5-1650v3 based OpenVZ with 2GB RAM on CentOS 7.3 64bit.

    centmin mod also logs sar stats during nginx and php-fpm compiles

    for GCCINTEL_PHP='y'
    Code:
    00:03:14    kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
    00:03:18            0   2097152    100.00         0   1463076         0      0.00    558400    968484      1812
    00:03:21            0   2097152    100.00         0   1469252         0      0.00    562832    970252      7456
    00:05:01            0   2097152    100.00         0   1462868         0      0.00    678540    907900     22640
    00:06:00            0   2097152    100.00         0   1577576         0      0.00    727448    932444     40468
    00:07:42            0   2097152    100.00         0   1579632         0      0.00    769516    861356     28548
    00:07:46            0   2097152    100.00         0   1578916         0      0.00    762652    866988     19956
    00:09:30            0   2097152    100.00         0   1415040         0      0.00    639356    846420      1172
    00:10:01            0   2097152    100.00         0   1427020         0      0.00    840764    843548      7392
    00:15:01        24396   2072756     98.84         0   1357916         0      0.00    794872    650072      9860
    00:17:24        34432   2062720     98.36         0   1402684         0      0.00    774556    666240     67872
    00:17:35            0   2097152    100.00         0   1449288         0      0.00    775896    711592    136476
    00:17:36            0   2097152    100.00         0   1424600         0      0.00    783476    701044     94180
    00:17:48            0   2097152    100.00         0   1447504         0      0.00    785628    721804    113156
    00:17:50            0   2097152    100.00         0   1447876         0      0.00    786264    721624     73864
    00:17:53            0   2097152    100.00         0   1450548         0      0.00    794768    715792     76124
    00:17:54            0   2097152    100.00         0   1450916         0      0.00    795856    715076     76456
    00:17:57            0   2097152    100.00         0   1451044         0      0.00    814044    697164     77664
    00:17:59            0   2097152    100.00         0   1448304         0      0.00    811860    696628     34868
    00:18:00            0   2097152    100.00         0   1453316         0      0.00    816424    697072     39816
    00:18:09            0   2097152    100.00         0   1435248         0      0.00    818136    677464     20976
    Average:         2941   2094211     99.86         0   1459631         0      0.00    754564    778448     47538
    
    started around 00:08
    Code:
    00:03:14      runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
    00:03:18            0        62      0.24      0.20      0.11         0
    00:03:21            0        62      0.24      0.20      0.11         0
    00:05:01            3        83      2.54      0.94      0.38         0
    00:06:00            0        62      2.48      1.25      0.52         0
    00:07:42            0        62      2.87      1.74      0.77         0
    00:07:46            0        59      2.87      1.74      0.77         0
    00:09:30            0        62      1.32      1.48      0.77         0
    00:10:01            2        74      1.59      1.53      0.81         0
    00:15:01            2        68      2.42      2.03      1.21         0
    00:17:24            0        62      2.04      2.07      1.34         0
    00:17:35            0        62      1.88      2.03      1.34         0
    00:17:36            0        62      1.88      2.03      1.34         0
    00:17:48            0        62      1.68      1.98      1.33         0
    00:17:50            0        62      1.68      1.98      1.33         0
    00:17:53            0        62      1.79      2.00      1.34         0
    00:17:54            0        62      1.79      2.00      1.34         0
    00:17:57            0        62      1.79      2.00      1.34         0
    00:17:59            0        62      1.73      1.98      1.34         0
    00:18:00            0        62      1.73      1.98      1.34         0
    00:18:09            0        62      1.61      1.95      1.33         0
    Average:            0        64      1.81      1.66      1.01         0
    
    for GCCINTEL_PHP='n'
    Code:
    00:20:01    kbmemfree kbmemused  %memused kbbuffers  kbcached  kbcommit   %commit  kbactive   kbinact   kbdirty
    00:20:37       104304   1992848     95.03         0   1257232         0      0.00    653904    676176       820
    00:25:01            0   2097152    100.00         0   1371136         0      0.00    827100    686936     22648
    00:28:11            0   2097152    100.00         0   1486704         0      0.00    874724    623628    133324
    00:28:24            0   2097152    100.00         0   1590404         0      0.00    880456    721968    258112
    00:28:25            0   2097152    100.00         0   1531396         0      0.00    886648    678524    178324
    00:28:39            0   2097152    100.00         0   1575388         0      0.00    889744    719428    129348
    00:28:40            0   2097152    100.00         0   1593380         0      0.00    887436    739808    131684
    00:28:43            0   2097152    100.00         0   1595860         0      0.00    892656    737064    133772
    00:28:52            0   2097152    100.00         0   1591568         0      0.00    901692    720304     54520
    00:28:55            0   2097152    100.00         0   1588824         0      0.00    899108    720156     54600
    00:28:56            0   2097152    100.00         0   1593864         0      0.00    903228    721072     59520
    00:29:01            0   2097152    100.00         0   1558300         0      0.00    908668    680232     20424
    Average:         8692   2088460     99.59         0   1527838         0      0.00    867114    702108     98091
    
    Code:
    00:20:01      runq-sz  plist-sz   ldavg-1   ldavg-5  ldavg-15   blocked
    00:20:37            0        63      0.96      1.44      1.22         0
    00:25:01            2        73      2.11      1.83      1.43         0
    00:28:11            0        63      1.93      1.91      1.53         0
    00:28:24            0        63      1.73      1.86      1.52         0
    00:28:25            0        63      1.73      1.86      1.52         0
    00:28:39            0        63      1.57      1.82      1.51         0
    00:28:40            0        63      1.57      1.82      1.51         0
    00:28:43            0        63      1.60      1.82      1.52         0
    00:28:52            0        63      1.47      1.79      1.51         0
    00:28:55            0        63      1.43      1.78      1.50         0
    00:28:56            0        63      1.43      1.78      1.50         0
    00:29:01            0        63      1.40      1.77      1.50         0
    00:30:01            0        61      0.51      1.44      1.41         0
    Average:            0        64      1.50      1.76      1.48         0
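
    For anyone wanting to compare the two runs from the raw rows rather than the Average lines, here is a small Python sketch that recomputes the mean of a column from the load average listing directly above. The rows are copied from that listing; the function name and column index convention are only illustrative.

```python
# Rows copied from the load average listing directly above (header omitted).
SAR_Q_ROWS = """\
00:20:37            0        63      0.96      1.44      1.22         0
00:25:01            2        73      2.11      1.83      1.43         0
00:28:11            0        63      1.93      1.91      1.53         0
00:28:24            0        63      1.73      1.86      1.52         0
00:28:25            0        63      1.73      1.86      1.52         0
00:28:39            0        63      1.57      1.82      1.51         0
00:28:40            0        63      1.57      1.82      1.51         0
00:28:43            0        63      1.60      1.82      1.52         0
00:28:52            0        63      1.47      1.79      1.51         0
00:28:55            0        63      1.43      1.78      1.50         0
00:28:56            0        63      1.43      1.78      1.50         0
00:29:01            0        63      1.40      1.77      1.50         0
00:30:01            0        61      0.51      1.44      1.41         0
"""


def mean_column(rows_text, column_index):
    """Average one whitespace-separated column (0 = timestamp, 3 = ldavg-1)."""
    values = [
        float(line.split()[column_index])
        for line in rows_text.splitlines()
        if line.strip()
    ]
    return sum(values) / len(values)


print(f"mean ldavg-1: {mean_column(SAR_Q_ROWS, 3):.2f}")
print(f"mean ldavg-5: {mean_column(SAR_Q_ROWS, 4):.2f}")
```

    The recomputed means agree with the Average line (1.50 and 1.76), so the same function can be pointed at the other listings to compare the runs column by column.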
    
     
  12. Monk

    Monk New Member

    11
    2
    Jun 13, 2017
    Did you invalidate the VFS cache before you ran tests? A 'mount -o remount /' will usually do that (or whatever filesystem you are compiling on).

    No, it doesn't hurt it. It's confusing, though. A lot of Linux distros are apparently starting to drop, or suggesting dropping, support for 32bit overall (about time). So this means that PAE and other hacks can finally go away. If I were you, I'd completely drop 32bit support. It's slow, and there are tons of disadvantages to it.
     
  13. eva2000

    eva2000 Active Member

    327
    90
    May 22, 2013
    For the PHP compile tests, each run was on a completely fresh CentOS OS reinstall, so that's not needed.

    Yeah 32bit is going the way of the dodo eventually :)
     
  14. Monk

    Monk New Member

    11
    2
    Jun 13, 2017
    For a minor speed increase, you could disable exception handling, which would usually resort to calling abort() on Linux. You could also use -ffast-math, but that does some really funky things to code generation (and it actually can cause problems: https://gcc.gnu.org/bugzilla/show_bug.cgi?id=55522)
     
    eva2000 likes this.
  15. eva2000

    eva2000 Active Member

    327
    90
    May 22, 2013
  16. Monk

    Monk New Member

    11
    2
    Jun 13, 2017
    I don't see anything in the changelog or the code changes that would affect performance. Looking at your benchmark output, I see a variation of a few percentage points overall between the latest beta builds, which would probably match up with context switching, CPU scheduler latency, or similar. That's not really anything to justify saying 'improved performance' without eliminating the aforementioned issues you have to contend with.

    Are you running these tests on a VPS container, or a dedicated server?
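
    One crude way to check whether a few percentage points is real or just run-to-run variation is to repeat each benchmark several times and compare the change in the mean against the spread of the runs. The Python sketch below does that; the timings in it are hypothetical placeholders, not numbers from this thread, and this is not a proper significance test.

```python
from statistics import mean, stdev


def relative_change_and_noise(baseline_runs, candidate_runs):
    """Return (change in mean, larger relative std deviation), both in percent."""
    base_mean = mean(baseline_runs)
    cand_mean = mean(candidate_runs)
    change_pct = (cand_mean - base_mean) / base_mean * 100.0
    noise_pct = max(
        stdev(baseline_runs) / base_mean,
        stdev(candidate_runs) / cand_mean,
    ) * 100.0
    return change_pct, noise_pct


# Hypothetical timings in seconds for two builds, five runs each.
baseline = [10.0, 10.4, 9.8, 10.2, 9.6]
candidate = [9.9, 10.3, 9.7, 10.1, 9.5]

change, noise = relative_change_and_noise(baseline, candidate)
print(f"change: {change:+.1f}%, run-to-run spread: about {noise:.1f}%")
if abs(change) <= noise:
    print("difference is within run-to-run variation")
```

    With these placeholder numbers the mean moves by about 1% while individual runs vary by about 3%, so the difference would not support a claim of improved performance on its own.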
     
  17. eva2000

    eva2000 Active Member

    327
    90
    May 22, 2013
    Last edited: Jun 26, 2017