Using Varnish as an HTTP cache

wlanboy

Content Contributor
This is not a planned tutorial, but the start of a discussion about how low end VPSes can serve as web frontends.

Many people like to use event-driven webservers like lighttpd or nginx; others prefer process-based webservers like Apache. Both have their advantages, but only the former have the image of low resource consumption.

This post is based on this discussion. Thank you vanarp for pointing me to this topic.

First of all: What is varnish?

Answer: There is a good video about it.

If you look at, e.g., a blog, most of the time the content delivered to the visitor does not change. But every time, Ruby or PHP calls the database and glues the pieces together. Varnish caches the HTML response to decrease the load on the server.
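The principle can be sketched in a few lines of Python; this is only a toy model of the idea, not how Varnish is actually implemented:

```python
import time

class TinyCache:
    """Toy HTML cache: remember a rendered page, expire it after a TTL."""

    def __init__(self, ttl_seconds):
        self.ttl = ttl_seconds
        self.store = {}  # url -> (expires_at, html)

    def get(self, url, render):
        entry = self.store.get(url)
        if entry and entry[0] > time.time():
            return entry[1]  # hit: no PHP/Ruby, no database call
        html = render(url)   # miss: the backend glues the pieces together
        self.store[url] = (time.time() + self.ttl, html)
        return html

backend_calls = []

def render(url):
    backend_calls.append(url)  # stands in for the PHP + MySQL work
    return "<html>post list</html>"

cache = TinyCache(ttl_seconds=60)
cache.get("/", render)
cache.get("/", render)
print(len(backend_calls))  # the backend rendered only once
```

Varnish does the same thing at the HTTP layer, keyed on the request and controlled by TTLs and response headers.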

So the basic scenario would be to buy a second VPS and run Varnish on it. I know that, given enough RAM, Varnish can run on the same server, but I want to point at something with this scenario.

These days we have two advantages - if you are using the right provider:

  • Offloaded MySQL servers
  • Local unmetered network

So a second VPS can run Varnish, which calls the first VPS over the local network:


visitor -> varnish vps -> application vps -> offloaded MySQL server

The second and third hops go over the local LAN.

It looks like you do not need a 512 MB box for a well-known blog.

Back to Varnish:

Installation is easy:


curl http://repo.varnish-cache.org/debian/GPG-key.txt | apt-key add -
echo "deb http://repo.varnish-cache.org/ubuntu/ lucid varnish-3.0" >> /etc/apt/sources.list
apt-get update
apt-get install varnish

The configuration is split between two files:

  • /etc/default/varnish
  • /etc/varnish/default.vcl
Basic configuration is handled in /etc/default/varnish:


DAEMON_OPTS="-a :80 \
-T localhost:6082 \
-f /etc/varnish/default.vcl \
-S /etc/varnish/secret \
-s file,/var/lib/varnish/$INSTANCE/varnish_storage.bin,1G"

# -a ${VARNISH_LISTEN_ADDRESS}:${VARNISH_LISTEN_PORT} \
# -T ${VARNISH_ADMIN_LISTEN_ADDRESS}:${VARNISH_ADMIN_LISTEN_PORT} \
# -f ${VARNISH_VCL_CONF} \
# -S ${VARNISH_SECRET_FILE} \
# -s ${VARNISH_STORAGE}"

If you want to use RAM for the cache storage, alter the last line to:


-s malloc,100M"
The size suffixes are:

  • K, k - the size is expressed in kilobytes
  • M, m - the size is expressed in megabytes
  • G, g - the size is expressed in gigabytes
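For illustration, here is what those suffixes amount to, as a quick Python sketch (a toy converter, not Varnish's own parser):

```python
def storage_size_to_bytes(spec):
    """Convert a size such as '100M', '1g' or '512' (plain bytes) to bytes."""
    multipliers = {"k": 1024, "m": 1024 ** 2, "g": 1024 ** 3}
    suffix = spec[-1].lower()
    if suffix in multipliers:
        return int(spec[:-1]) * multipliers[suffix]
    return int(spec)  # no suffix: plain byte count

print(storage_size_to_bytes("100M"))  # 104857600
print(storage_size_to_bytes("1g"))    # 1073741824
```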

The next file is /etc/varnish/default.vcl, which defines how the cache should work.

This is a configuration usable for WordPress:


backend default {
    .host = "192.168.10.10";
    .port = "80";
    .max_connections = 30;
    .connect_timeout = 4.0s;
    .first_byte_timeout = 600s;
    .between_bytes_timeout = 600s;
}

sub vcl_recv {
    # Add forwarded header
    if (req.restarts == 0) {
        if (req.http.x-forwarded-for) {
            set req.http.X-Forwarded-For =
                req.http.X-Forwarded-For + ", " + client.ip;
        } else {
            set req.http.X-Forwarded-For = client.ip;
        }
    }
    # Strip cookies so ordinary pages can be cached
    # (keep them for wp-login/wp-admin and previews)
    if (!(req.url ~ "wp-(login|admin)") &&
        !(req.url ~ "&preview=true")) {
        unset req.http.cookie;
    }
    if (req.http.Authorization || req.http.Cookie) {
        return (pass);
    }
    # Normalize encodings
    if (req.http.Accept-Encoding) {
        if (req.http.Accept-Encoding ~ "gzip") {
            # If the browser supports it, we'll use gzip.
            set req.http.Accept-Encoding = "gzip";
        } else if (req.http.Accept-Encoding ~ "deflate") {
            # Next, try deflate if it is supported.
            set req.http.Accept-Encoding = "deflate";
        } else {
            # Unknown algorithm. Remove it and send unencoded.
            unset req.http.Accept-Encoding;
        }
    }
}

sub vcl_fetch {
    if (req.url ~ "wp-(login|admin)" || req.url ~ "preview=true" || req.url ~ "xmlrpc.php") {
        return (hit_for_pass);
    }
    # Strip Set-Cookie so the response becomes cacheable
    if ((!(req.url ~ "(wp-(login|admin)|login)")) || (req.request == "GET")) {
        unset beresp.http.set-cookie;
        set beresp.ttl = 1h;
    }
    if (req.url ~ "\.(gif|jpg|jpeg|swf|css|js|flv|mp3|mp4|pdf|ico|png)(\?.*|)$") {
        set beresp.ttl = 7d;
    }
}
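The caching rules above boil down to three cases. Here is the same decision as a Python sketch (the regexes mirror the VCL; the helper name is my own):

```python
import re

# Mirrors the VCL: never cache login/admin/preview/XML-RPC requests
BYPASS = re.compile(r"wp-(login|admin)|preview=true|xmlrpc\.php")
# Mirrors the VCL: static assets get a long TTL
STATIC = re.compile(r"\.(gif|jpg|jpeg|swf|css|js|flv|mp3|mp4|pdf|ico|png)(\?.*)?$")

def cache_ttl(url):
    """Return the TTL in seconds this config would assign, or None to bypass."""
    if BYPASS.search(url):
        return None            # pass straight to the backend
    if STATIC.search(url):
        return 7 * 24 * 3600   # 7d, as in vcl_fetch
    return 3600                # 1h default for cacheable pages

print(cache_ttl("/wp-admin/index.php"))  # None
print(cache_ttl("/style.css"))           # 604800
print(cache_ttl("/2013/05/a-post/"))     # 3600
```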

There are three main sections:

  • backend default
    Target to cache
  • sub vcl_recv
    How to change requests
  • sub vcl_fetch
    How to change responses
A good resource for Varnish VCLs is mattiasgeniar's collection on GitHub:
 


A set of configuration samples used for Varnish 3.0. This includes templates for:

  • WordPress
  • Drupal (works decently for Drupal 7, depends on your addons obviously)
  • Joomla (WIP)
  • Fork CMS
  • OpenPhoto

And various configurations for:

  • Server-side URL rewriting
  • Clean error pages for debugging
  • Virtual host implementations
  • Various header normalizations
  • Cookie manipulations
  • 301/302 redirects from within Varnish

Set .host to the IP of the VPS you want to cache. After altering all the settings you can restart Varnish.

The last thing to do is to point the domain to the IP address of the second VPS.

If you want to run Varnish on the same VPS, you have to change the port of your webserver, since only one service can listen on port 80.

For run-of-the-mill blogs that add one post a day and use Disqus for comments, Varnish might be a great idea.

Ask yourself how often your frontpage changes. It is all about the hit ratio of the cache.
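To put rough numbers on that (hypothetical figures): only cache misses ever reach the application VPS.

```python
def backend_requests(total_requests, hit_ratio):
    """Requests that still hit the application VPS at a given cache hit ratio."""
    return round(total_requests * (1 - hit_ratio))

# 10,000 requests/day at a 90% hit ratio: the backend renders only 1,000 pages
print(backend_requests(10_000, 0.90))  # 1000
# A frontpage that changes constantly (low hit ratio) barely benefits
print(backend_requests(10_000, 0.10))  # 9000
```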
 

acd

New Member
I'm a fan of adding ulimit -v 192000 to /etc/default/varnish on pre-physpages (2.6.18 series) kernels. This will cause varnishd to shut down its child and restart it when it starts using too much virtual memory, preventing your other processes from getting an out-of-memory error. Not needed in 2.6.32 which properly handles virtual memory. Note this is only useful with disk-backed caching; a RAM backed cache will have to start from scratch on child-restart
 

TheLinuxBug

New Member
You're quite welcome for the idea. However, as I stated in my small tutorial, it is very important to make sure you know which version of Varnish you are working with, as the VCL language differs a bit between versions. I see you updated the .deb repo above; however, the vanilla repos do have Varnish. If you just apt-get install it on Debian 6 or lower, you will end up with version 2.1.x, and the above code will not all work. In some ways I have found version 2.1.x to actually be a bit more streamlined, and the code is a bit less complicated for some things as well, which is why it is still provided in the vanilla repos.

Cheers!
 

TheLinuxBug

New Member
# Add forwarded header
if (req.restarts == 0) {
    if (req.http.x-forwarded-for) {
        set req.http.X-Forwarded-For =
            req.http.X-Forwarded-For + ", " + client.ip;
    } else {
        set req.http.X-Forwarded-For = client.ip;
    }
}

Also, unless you installed mod_rpaf for Apache, this section will not do anything for you.  I mentioned how to install mod_rpaf in the previous thread. 

Cheers!
 

drmike

100% Tier-1 Gogent
My question for low resource boxes is: why even use Varnish? You should be able to accomplish the same static-caching functionality within your web server. My choice these days is Nginx. Nginx does the job more than adequately, and the configuration is greatly simplified, plus there is only one daemon and one set of config files to babysit and debug.

As far as the two VPSes from one provider scenario posed:

WORLD ---> VARNISH VPS ---> REAL VPS ---> OFFLOADED MYSQL

Varnish, like Nginx, is intended to be high performance. Offloading to another VPS really puts a dent in performance, even on the same network. It's doable, but it starts to add overhead delay.

The offloaded MySQL, meh, why do people do this again? I ran dedicated SQL servers years ago, and had for a number of years. The problem is that gigabit network speed (even bonded) isn't enough, and network overhead inflates page load time in a big way. It's a bad model unless you have a fancy storage interconnect with much higher speeds and get away from the typical network stack, which isn't exactly high performance.

One of my other gripes with Varnish is the ever-changing scripting language. Things do not go forward and keep working, as they keep changing features and structure, and that ends in breakage. It remains an excellent piece of software, and high performance.

Small RAM really isn't capable of storing many objects/files/data. You would do much better to push those elements out to a CDN like MaxCDN and eliminate front end caching like this. $40/1 TB good for 12 months = $3-4 a month for MaxCDN, certainly within the low price market.

That's my take.  Thanks @wlanboy for the tutorial and open conversation.
 

Marc M.

Phoenix VPS
Verified Provider
You also need to account for latency on your local network. Sometimes that can be a deal breaker.
 

TheLinuxBug

New Member
@, The difference between using Varnish and not is night and day. Also, I would still suggest using Varnish in front of nginx+php-fpm. The best use for Varnish with Apache may actually be on a cPanel server, where it can reduce load substantially.

To give you an example: I have been working on a medium-sized WordPress site and forum, running on a 2-core (2x2.0GHz), 2GB RAM SSD VPS that was constantly at 2.x+ load when busy. After placing Varnish in front of Apache, the load now averages around 0.31. If you tune Varnish well, you can get away with a lot fewer resources. I have a setup that works really well with Varnish as a reverse proxy cache, using round robin to pick between two backend servers. The two backend servers run MySQL in master-master replication, and I use rsync to sync the data across the two servers. Both backend servers are now 1 core, 1GB RAM, and they run at an average load of under 0.5 under load. Before using Varnish, this was 2 separate servers with 2 cores and 2GB RAM in hot failover, and the server, as mentioned above, usually hovered at a load of 2.

I personally wouldn't pay more for a CDN when you can set up a load-balanced Varnish configuration, unless you are talking about a really high end website, in which case you need to rethink your plan anyhow.

Cheers!
 

drmike

100% Tier-1 Gogent
If you tune Varnish well, you can get away with a lot less resources.
The same can be said about Nginx :) You can get Nginx up in front of Apache pretty easily and drop overhead/waste.

I have a setup that works really well with Varnish as a reverse proxy cache which uses round robin to pick between two back end servers.  
 

I do the same thing with Nginx :)

All that said, both Nginx and Varnish are capable. You can front end anything with anything in this mix (Varnish, Nginx, Apache, etc.). Varnish in particular is aimed at big RAM environments, but it will work in smaller RAM. Stacking Varnish + web server on a small resource server = wasted RAM and higher latency. To whatever degree it might seem minor, everything counts in mass / at scale.

I personally wouldn't pay more for a cdn when you can setup a load balanced varnish configuration,
While you can balance and cache with Nginx or Varnish, you can't / don't affordably geo-distribute files for a better user experience. A self-built CDN is possible, but at cost and complication. By removing the load from static elements and the resource contention from your servers (offloading it to a CDN), you will see quite a decrease in CPU load. Something like MaxCDN is idiot-simple to get running and the cost isn't much. End users will see a big difference compared to your single-homed, long-hauled traffic distribution.
 

TheLinuxBug

New Member
Stacking Varnish + web server on small resource server = wasted RAM and higher latency.
I have never experienced this; in fact, quite the opposite. Varnish does not require much memory to operate efficiently, and I am not sure in what ways you have been using it. Also, I would still use Varnish in front of an nginx setup; Varnish does way better as a cache and has a lot more options than nginx. When I started out my project, my intention was to use nginx as a cache, but that was quickly abandoned because it just lacks easily configurable features. You can have Varnish running quite well in as little as 64MB RAM with 1-2 small sites on Apache or Nginx and MySQL (using memory as the cache; using the hard drive as the cache you can work with even less memory). Apache does require a bit more modification on a lower end server to work well, especially regarding your PHP setup, as mentioned in the previous post.

I am not sure how you are seeing higher latency either, if anything initial page loads are much much faster in my experience.

Anyhow, not really trying to debate, but more so understand what wasn't working for you with varnish, as I haven't seemed to have had some of the experiences you are describing.

Cheers!
 

drmike

100% Tier-1 Gogent
Varnish does not require much memory to operate efficiently
 

Varnish works well when you get your config in order.  Now running Varnish on a 64MB box is kind of silly.  Can you do it?  Sure, but why not cache on the bottom layers and get the caching right on Nginx as your web server (if you can)?  The daemon itself has some RAM + CPU overhead, no matter how minor it may seem.

I've used Varnish (like others I know) on rather large and often dedicated Varnish-only servers.  Sometimes Varnish + Nginx + PHP on the same server (8GB and larger servers).

my intention was to use nginx as a cache, but that was quickly abandoned because it just lacks in easily configurable features.
No doubt, Varnish has a very large caching feature set and script ability.  The scripting is so powerful that simple setups often take eons to get perfected and lots of Varnish users are running things far from optimally. Nginx does plenty, but obviously less on caching since it isn't intended solely as a caching layer. 

I was dual running Varnish + Nginx, but did away with Varnish since the Nginx functionality was straightforward and more than adequate. One less piece of clunky software to maintain and deal with when something failed. I did mostly reverse proxying of real servers/app servers: Varnish ---> Nginx ---> other web software or app servers. That now looks like: Nginx ---> reverse proxy to web or app server.

I am not sure how you are seeing higher latency either, if anything initial page loads are much much faster in my experience.
Varnish + Nginx will introduce higher times per dynamically generated page. It will be a faster combo where the content is cachable and already stored in Varnish. If Varnish can't cache something, or hasn't cached it yet, then you have the overhead of Varnish doing its internal determination for the content, plus the communication to Nginx to fulfill the request, plus the return trip(s) with the data. Depending on your site's use, total pages, size, etc., that could be minor or major. Regardless, running requests through multiple front end servers increases load times, no matter how slightly. Aggregate that increase over 1000 users or more in a short period of time and you quickly start looking for ways to reduce it.
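To put rough numbers on the aggregate effect (all figures hypothetical):

```python
def expected_latency_ms(hit_ratio, hit_ms=5.0, proxy_overhead_ms=2.0, backend_ms=200.0):
    """Average response time with a cache in front: hits are cheap,
    misses pay for the extra hop *and* the backend render."""
    miss_ms = proxy_overhead_ms + backend_ms
    return hit_ratio * hit_ms + (1 - hit_ratio) * miss_ms

print(round(expected_latency_ms(0.9), 1))  # 24.7 -- mostly hits, big win
print(round(expected_latency_ms(0.0), 1))  # 202.0 -- no hits: strictly slower than the 200 ms direct path
```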

I must say, I am interested any time someone decides to dual-stack Varnish + Nginx on small RAM machines. It is always good to hear other folks' experiences. Software can often be a personal-situation thing, suited or ill-suited. I appreciate the comments :)
 

peterw

New Member
Lots of input and lots of opinions. I want to ask two questions, hoping for short answers.

1. Is offloaded MySQL always bad? This board is using that too, or? Should MySQL always be local, even if the VPS lacks CPU power and IO?
2. What is the best choice to run a cache: buying a VPS to run the cache, or expanding the existing VPS to run the cache locally? I think the first choice is cheaper, and I don't have to change any settings.

I don't want to talk about 4GB RAM servers, because I only run some private sites. I have two 128MB VPSes and a MySQL account from one provider, and want to know how I should use them to run three blogs and some static sites.
 

TheLinuxBug

New Member
@peterw, it all depends on the amount of latency you are willing to live with, which was one of @'s points. Yes, you can use offloaded MySQL, but you will also incur a slight delay in load times based on the latency to the MySQL server. For most smaller sites this really isn't an issue.

As for Varnish, I prefer to use it on an external server (however, I have multiple backends in round robin; this may not be as important on a single-homed server). HOWEVER, one thing you must consider is that if it is used on an external server, that server will be using 2x bandwidth, as it has to download the site from the backend and then serve it. If you have Varnish locally, you avoid the extra bandwidth overhead because everything is done locally. Using it on an external server means the backend server cannot take down your front end cache if it fails for any reason, and one of the interesting parts of Varnish is that, much like CloudFlare, it can serve your site from cache for a set amount of time if you want (this is set in the config file).

If you are just looking for the boost from Varnish and are not truly concerned with redundancy, especially when using cPanel, then running Varnish locally on the same server as Apache/Nginx will be just fine. In fact, for cPanel, if you're lazy, you can also get the Unixy Varnish plugin for I think $6/month; it is a really easy install and does most of the configuration for you.

TL;DR:

So as to make this as confusing as possible: it really depends on what you want from it. Running Varnish locally will be fine just to boost page loads from cache. If you are interested in building a front end that can pull from 2 redundant backend servers and/or still serve the cached site when the main server crashes, then running Varnish on a remote server (~256-384MB RAM) could be useful. For small sites, offloading MySQL shouldn't be an issue; for larger sites, a correctly optimized internal server may provide faster loading times. My suggestion: figure out what you want from it and then plan it out. Running it locally or on a separate server has different advantages/disadvantages, so it will need to be based on what you want out of it.

I hope this is not too confusing and is somewhat helpful.  If you have more questions or something is unclear please feel free to ask more questions.

Cheers!
 

peterw

New Member
You can use a module: https://www.varnish-cache.org/vmod/memcached

Code:
This VMOD provides a general purpose memcached client module for Varnish using libmemcached to access memcached servers. 
It implements the basic memcached operations of get, set, incr, decr. More features are sure to be added as we go along. 
Developed by Aaron Stone. See the readme on github for examples and details.
 

WebSearchingPro

VPS Peddler
Verified Provider
After experimenting with Varnish and plain Apache configurations, I have come to the conclusion that, regardless of the application, Varnish has far more pros than cons regarding latency vs CPU/RAM usage.

1000 concurrent connections to a vanilla Apache server render it non-operational, with a CPU load of ~40-70; somewhat of a Layer 7 DoS.

1000 concurrent connections to that same Apache server with Varnish in front don't even push the load past 0.5.

So in a sense this could be useful for someone who doesn't want to take the time to fool around with Nginx/Apache optimization and converting the .htaccess files over. 
 

acd

New Member
HOWEVER, one thing you must consider is that if used on an external server, that server will be using [up to] 2x bandwidth as it has  to download the site from the backend and then serve it.
ftfy. Ideally it would be using 1x bw + a bit of change.
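To put numbers on it (hypothetical figures): the cache VPS serves every response, but only fetches misses from the backend.

```python
def cache_vps_bandwidth_gb(site_traffic_gb, hit_ratio):
    """Total traffic through an external cache VPS."""
    served = site_traffic_gb                     # every response goes out to visitors
    fetched = site_traffic_gb * (1 - hit_ratio)  # only misses are pulled from the backend
    return served + fetched

print(round(cache_vps_bandwidth_gb(100, 0.0), 1))   # 200.0 -- cold cache, the 2x worst case
print(round(cache_vps_bandwidth_gb(100, 0.95), 1))  # 105.0 -- warm cache, 1x plus a bit of change
```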

I've found that for small installs, using nginx proxy_cache or fastcgi_cache is less of a pain to set up than Varnish, for similar performance, though VCL is a lot more expressive than nginx's own config language.
 

TheLinuxBug

New Member
@acd, You are correct. I guess I should have been more specific, but it was late when I was replying. What you said is correct though, it should read "up to 2x bandwidth". Thanks for pointing that out.

Cheers!
 

TheLinuxBug

New Member
@jcaleb, on Debian (I am not sure about CentOS) this is controlled in the file /etc/default/varnish


# # Default TTL used when the backend does not specify one
# VARNISH_TTL=120


Varnish acts like an RFC 2616 client-side cache by default, with the footnote that if no cacheability information is available, we use a default Time To Live (TTL) from the parameter default_ttl.

This means that Varnish will respect the s-maxage or max-age Cache-Control fields and will respect Expires headers.

Varnish leaves Expires: and Cache-Control: headers intact, and sets the Age: header to the number of seconds the object has been cached; therefore, any RFC 2616 client will do the right thing by default.
Edit: On a side note, in case you didn't know, doing a Ctrl-Refresh in your client will actually force it to refresh the content from the server anyhow. It automatically expires the cache on said page when you do that and pulls down the new information.
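The precedence described above can be sketched like this; a simplified model (real Varnish also honours Expires and a few other rules), with the function name my own:

```python
import re

def effective_ttl(cache_control, default_ttl=120):
    """TTL Varnish would use: s-maxage beats max-age, otherwise fall back
    to the configured default_ttl (simplified sketch)."""
    if cache_control:
        for directive in ("s-maxage", "max-age"):
            match = re.search(directive + r"=(\d+)", cache_control)
            if match:
                return int(match.group(1))
    return default_ttl

print(effective_ttl("max-age=300, s-maxage=600"))  # 600 -- s-maxage wins
print(effective_ttl("max-age=300"))                # 300
print(effective_ttl(None))                         # 120 -- the default_ttl fallback
```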
 

wlanboy

Content Contributer
After experimenting with Varnish and plain Apache configurations, I have come to the conclusion that, regardless of the application, Varnish has far more pros than cons regarding latency vs CPU/RAM usage.

1000 concurrent connections to a vanilla Apache server render it non-operational, with a CPU load of ~40-70; somewhat of a Layer 7 DoS.

1000 concurrent connections to that same Apache server with Varnish in front don't even push the load past 0.5.

So in a sense this could be useful for someone who doesn't want to take the time to fool around with Nginx/Apache optimization and converting the .htaccess files over.
Second that.

Just log the HTTP output of your webserver and realize how often your server is sending the exact same response.
 