GitLab: Time to Leave the Cloud (bare-metal vs shared-environment virtualization)

Discussion in 'The Pub (Off topic discussion)' started by fm7, Nov 13, 2016.

  1. fm7

    fm7 Active Member

    163
    61
    Jul 26, 2014
    What we found is that the cloud was not meant to provide the level of IOPS performance we needed to run an aggressive system like CephFS.


    ...


    The problem with CephFS is that in order to work, it needs a really performant underlying infrastructure, because it needs to read and write a lot of things really fast. If one of the hosts delays writing to the journal, then the rest of the fleet is waiting for that operation alone, and the whole file system is blocked. When this happens, all of the hosts halt, and you have a locked file system; no one can read or write anything, and that basically takes everything down.
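To make the "one slow host blocks everyone" point concrete, here is a minimal sketch in Python. It is not Ceph's actual protocol; it assumes a simplified model in which every write must be acknowledged by all hosts, so the write takes as long as the slowest host. The function names, the latencies, and the per-host slowdown probability are all made up for illustration.

```python
import random


def commit_latency_ms(host_latencies_ms):
    """Assumed model: a write completes only when the slowest host has finished."""
    return max(host_latencies_ms)


def stall_probability(num_hosts, slow_probability):
    """Chance that at least one of num_hosts is slow during a given write,
    assuming hosts slow down independently with the same probability."""
    return 1.0 - (1.0 - slow_probability) ** num_hosts


def simulate_stall_fraction(num_hosts, num_writes, slow_probability,
                            fast_ms=1.0, slow_ms=200.0, seed=0):
    """Monte Carlo check of the formula above; all numbers are hypothetical."""
    rng = random.Random(seed)
    stalled = 0
    for _ in range(num_writes):
        latencies = [
            slow_ms if rng.random() < slow_probability else fast_ms
            for _ in range(num_hosts)
        ]
        if commit_latency_ms(latencies) >= slow_ms:
            stalled += 1
    return stalled / num_writes


if __name__ == "__main__":
    for hosts in (1, 5, 20, 50):
        print(hosts, round(stall_probability(hosts, 0.01), 3))
```

Under these assumptions, a noisy-neighbour hiccup that affects a single host 1% of the time affects roughly 18% of writes once 20 hosts have to acknowledge each one, which is one way to read why shared cloud storage hurt so much here.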


    ...


    Recap: What We Learned
     

    1. CephFS gives us more scalability and ostensibly performance but did not work well in the cloud on shared resources, despite tweaking and tuning it to try to make it work.
    2. There is a threshold of performance on the cloud and if you need more, you will have to pay a lot more, be punished with latencies, or leave the cloud.
    3. Moving to dedicated hardware is more economical and reliable for the scale and performance of our application.
    4. Building an observable system by pulling and aggregating performance data into understandable dashboards helps us spot non-obvious trends and correlations, leading to addressing issues faster.
    5. Monitoring some things can be really application-specific, which is why we are building our own gitlab-monitor Prometheus exporter. We plan to ship this with GitLab CE soon.
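For anyone wondering what an application-specific Prometheus exporter (point 5) boils down to, here is a rough sketch using only the Python standard library. This is not gitlab-monitor and not its code; the metric name, labels, values, and port are hypothetical. It only shows the general shape: render gauges in the Prometheus text exposition format and serve them at /metrics.

```python
from http.server import BaseHTTPRequestHandler, HTTPServer


def escape_label_value(value):
    """Escape backslash, double quote, and newline in a label value."""
    return str(value).replace("\\", "\\\\").replace('"', '\\"').replace("\n", "\\n")


def render_gauge(name, help_text, samples):
    """Render one gauge in the Prometheus text format.

    samples is a list of (labels_dict, numeric_value) pairs.
    """
    lines = [f"# HELP {name} {help_text}", f"# TYPE {name} gauge"]
    for labels, value in samples:
        label_str = ",".join(
            f'{key}="{escape_label_value(val)}"' for key, val in sorted(labels.items())
        )
        series = f"{name}{{{label_str}}}" if label_str else name
        lines.append(f"{series} {value}")
    return "\n".join(lines) + "\n"


def collect():
    """Hypothetical application-specific measurement; replace with real data."""
    return render_gauge(
        "example_journal_write_seconds",  # made-up metric name
        "Last observed journal write latency per host.",
        [({"host": "storage-01"}, 0.004), ({"host": "storage-02"}, 0.210)],
    )


class MetricsHandler(BaseHTTPRequestHandler):
    def do_GET(self):
        if self.path != "/metrics":
            self.send_error(404)
            return
        body = collect().encode("utf-8")
        self.send_response(200)
        self.send_header("Content-Type", "text/plain; version=0.0.4; charset=utf-8")
        self.send_header("Content-Length", str(len(body)))
        self.end_headers()
        self.wfile.write(body)


if __name__ == "__main__":
    # Port 8000 is an arbitrary choice for this sketch.
    HTTPServer(("127.0.0.1", 8000), MetricsHandler).serve_forever()
```

A real exporter would more likely use an official Prometheus client library than hand-rolled formatting; the standard-library version is only meant to show how little is needed to get per-host numbers onto a dashboard.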



    https://about.gitlab.com/2016/11/10/why-choose-bare-metal/
     
    Last edited by a moderator: Nov 13, 2016
    HalfEatenPie likes this.
  2. HalfEatenPie

    HalfEatenPie The Irrational One Retired Staff

    2,890
    1,385
    Mar 25, 2013
    That's actually interesting, especially since right now many people are moving to cloud from bare metal. Most say that the reliability of cloud hardware compared with bare metal is what drives the move (along with the idea that cloud is much easier to scale than bare metal).
     
    fm7 likes this.
  3. graeme

    graeme Active Member

    146
    40
    Nov 20, 2013
    It is easier to scale at a trivial level (adding resources is easy). It makes the easy problems easier.


    I doubt it makes the hard problem of developing a scalable architecture easier.



    I have personally found that the easier a service is to get started with (particularly things like Heroku that have very quick setup), the more likely you are to run into things you cannot easily do.
     
    fm7 likes this.