Posts Tagged “cache”

Pre-req reading: Part 1

In this part we will cover setting up a backend. A backend is your application server, whether that is apache / nginx / iis (IIS – Is Inherently Stupid); you are telling varnish where it should send its requests.

Very basic configuration

backend app1 {
    .host = "127.0.0.1";
    .port = "8080";
}

For a quick start that’s really it: you tell varnish about a backend and the port to connect to it on. Just make sure you use it in vcl_recv (in 2.1 syntax, set req.backend = app1;). But you’re not here for a simple quick start, are you? Let’s add the following.

  • timeout settings
  • probe settings

Timeout settings

Your timeout settings define how long varnish should wait for a response from your backend.

backend app1 {
    .host = "127.0.0.1";
    .port = "8080";
    .connect_timeout = 0.05s;
    .first_byte_timeout = 2s;
    .between_bytes_timeout = 2s;
}
  • connect_timeout: wait 50ms for a TCP connection to be established
  • first_byte_timeout: wait 2s for the first byte of data to be sent from the backend
  • between_bytes_timeout: wait 2s if there is a pause mid data stream

Timeouts are a basic way of detecting a backend that is down or misbehaving. If you have multiple backends and timeouts occur, the backend is marked as sick and the other backends will be used.

Probe settings – Trust me I’m a doctor

backend app1 {
    .host = "127.0.0.1";
    .port = "8080";
    .connect_timeout = 0.05s;
    .first_byte_timeout = 2s;
    .between_bytes_timeout = 2s;
    .probe = {
        .url = "/status.html";
        .timeout = 0.05s;
        .window = 5;
        .threshold = 3; # 3 of the last 5 checks (60%) must have been OK for this backend to be healthy
        .interval = 2s; # how often to run the checks
    }
}
  • url: the URL to query; this must return a 200 OK response. You could use a PHP script to return a 500 on, say, a MySQL outage
  • timeout: how long to wait for a 200 OK response from the URL
  • window: keep the results of the last 5 probes in memory
  • threshold: how many of the probes in the window must be OK for the backend to be “healthy”
  • interval: how often to run the probe
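
To make the window / threshold relationship concrete, here is a minimal sketch in Python. It is a simplified model of the idea described above, not varnish's actual implementation; the function name is_healthy and the True/False encoding of probe results are assumptions made purely for illustration.

```python
from collections import deque


def is_healthy(results, window=5, threshold=3):
    """Return True if at least `threshold` of the last `window` probes passed.

    `results` is a list of booleans, oldest first (hypothetical encoding:
    True means the probe got a 200 OK within the timeout).
    """
    recent = deque(results, maxlen=window)  # keeps only the newest `window` entries
    return sum(recent) >= threshold


history = [True, True, False, False, True]
print(is_healthy(history))            # 3 of the last 5 probes passed
print(is_healthy(history + [False]))  # window slides: only 2 of the last 5 passed
```

With window = 5 and threshold = 3, a backend survives two failed probes in the window but is treated as sick on the third.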

And that about wraps up this post.


Part 1, what is varnish?

The varnish cache project is one you really need to get familiar with if you manage any high volume websites. It can mean the difference between a self-destructing web app that buckles under its own load and an apparently seamless web app serving thousands of concurrent connections with relative ease.

How does it work?

Varnish acts as a caching proxy server: when a user sends a GET request, varnish looks in its cache for a stored copy of the response. If it cannot find one, it passes the request to the “back end” (in this case an apache server) and then caches the response for subsequent requests.
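
The lookup, miss, fetch, store cycle can be sketched in a few lines of Python. This is only an illustration of the idea and says nothing about how varnish is implemented internally; TinyCache, its parameters and the fake backend function are all hypothetical names.

```python
import time


class TinyCache:
    """Toy TTL cache sitting in front of a slow backend function."""

    def __init__(self, fetch_from_backend, ttl=60.0, clock=time.monotonic):
        self.fetch = fetch_from_backend   # callable: url -> response body
        self.ttl = ttl                    # seconds an object stays fresh
        self.clock = clock                # injectable so it can be faked in tests
        self.store = {}                   # url -> (expires_at, body)
        self.backend_hits = 0

    def get(self, url):
        now = self.clock()
        entry = self.store.get(url)
        if entry is not None and entry[0] > now:
            return entry[1]               # cache hit: backend is not touched
        self.backend_hits += 1            # miss or expired: go to the backend
        body = self.fetch(url)
        self.store[url] = (now + self.ttl, body)
        return body


cache = TinyCache(lambda url: "rendered " + url, ttl=60.0)
for _ in range(1000):
    cache.get("/")
print(cache.backend_hits)  # only the first request reached the backend
```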

Now you may ask yourself why you need this. It boils down to what you are trying to achieve with your web application. If your application relies heavily on dynamic content and regularly gets some 400 concurrent users, for example, let’s assume the following:

  1. 400 concurrent unique users
  2. Average page render time is 0.85s

The Math

Based on this, if you were to place varnish in front of your application with a 60 second TTL (time to live, the length of time varnish will hold an object in cache):

  1. Varnish TTL: 60 seconds
  2. 400 / 0.85 = 470.59 requests/second
  3. 470.59 x 60 = 28235.29 requests/minute
  4. Factor of reduction to “back end”: x28235.29

So in the example above, simply by caching a page for as little as 60 seconds, the requests/minute reaching the back end are reduced from 28235.29 to 1 (assuming every request is for the same cacheable page). Even reducing the cache time to 10 seconds in this example would give a x4705.88 reduction.
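
If you want to play with the numbers yourself, the arithmetic above fits in a few lines of Python. This is just the same back-of-the-envelope model (every request is for one cacheable page, and only one request per TTL reaches the back end); the variable and function names are mine.

```python
concurrent_users = 400
render_time_s = 0.85

# Each user gets a page every 0.85s, so this is the request rate hitting the site
requests_per_second = concurrent_users / render_time_s


def backend_reduction(ttl_s):
    """Requests arriving during one TTL, of which only one reaches the back end."""
    return requests_per_second * ttl_s


print(round(requests_per_second, 2))    # 470.59
print(round(backend_reduction(60), 2))  # 28235.29
print(round(backend_reduction(10), 2))  # 4705.88
```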

Why is this reduction a good thing? Time on CPU, for one. Varnish, when configured correctly, is very, very fast, and even with an out-of-the-box configuration it is still going to be much faster than your dynamic web application.

Summary

So here ends a brief introduction to varnish and why you really want to start using it. In the following parts we will cover:

  • Configuration overview
    • brief overview of each sub section based on the 2.1 syntax
    • Advanced configuration
      • Load balancing
      • Failover handling
      • Raising cache hitrate
      • Pros and cons of each setup
      • Benchmarks


Sounds simple enough, right?

Use a cache to serve pages faster. Well yes, that is true, but people often do not understand the fundamentals of caching and how, if done badly, it can actually hurt performance.

The first thing you need to realize is that once you cache your content it is no longer dynamic … (short pause while we wait for the outrage in the back to die down).

The whole point of your cache is that it will be used instead of processing all your code. Why is this beneficial?

You have to remember that PHP is an interpreted language, meaning each request takes the following flow:

Apache -> mod_php -> Script -> Interpreter -> Bytecode -> Execution -> Output Buffer

Now there are two types of caching to consider. The first is complete output caching, which also yields the best performance. The second is opcode caching, which caches the bytecode generated by the interpreter, thus removing that step from the chain of execution.

With me so far? Ok take a deep breath because here we go …

Output caching

This option often yields the best performance, but at the cost of removing the dynamic element from your web app.
But this can be summed up in a single line: What good is dynamic content if you can serve all of 5% of your audience at a given time?

Another turn of phrase for this is “the Slashdot effect”. There are many options for output caching, and you should ideally provide both gzipped and plain cache files to your end users. On this blog, for instance, I use WP Super Cache and can highly recommend it: as new content is posted the relevant caches are regenerated. If you are writing your own web app, check for the “Accept-Encoding: gzip” header sent by the user’s browser.

For end user transparency couple this with some mod_rewrite voodoo

RewriteCond %{HTTP:Accept-Encoding} gzip
RewriteCond %{DOCUMENT_ROOT}/cache/%{HTTP_HOST}/%{REQUEST_FILENAME}.gz -f
RewriteRule ^(.*) "/cache/%{HTTP_HOST}/%{REQUEST_FILENAME}.gz" [L]

Line 1: If gzip is supported
Line 2: and the cache file exists
Line 3: serve the visitor the compressed cached file (an internal rewrite, not an HTTP redirect)

Your “chain of execution” is now

Apache -> readfile

To serve non-gzipped content:

RewriteCond %{HTTP:Accept-Encoding} !gzip
RewriteCond %{DOCUMENT_ROOT}/cache/%{HTTP_HOST}/%{REQUEST_FILENAME} -f
RewriteRule ^(.*) "/cache/%{HTTP_HOST}/%{REQUEST_FILENAME}" [L]
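
If you are writing your own web app rather than leaning on mod_rewrite, the same decision the two rule sets make can be sketched in application code. The Python below is a rough illustration only: pick_cache_file, the cache directory layout and the example.test host are assumptions, the gzip check is a crude substring match (just like the RewriteCond), and there is no sanitising of the request path.

```python
from pathlib import Path
import tempfile


def pick_cache_file(cache_root, host, request_path, accept_encoding):
    """Return (path, is_gzip) for a cached page, or None to fall through to the app."""
    base = Path(cache_root) / host / request_path.lstrip("/")
    gz = base.with_name(base.name + ".gz")
    wants_gzip = "gzip" in accept_encoding.lower()  # crude check, as in the rules above
    if wants_gzip and gz.is_file():
        return gz, True        # first rule set: gzip supported and .gz copy exists
    if not wants_gzip and base.is_file():
        return base, False     # second rule set: no gzip, plain copy exists
    return None                # no usable cache file: render dynamically


with tempfile.TemporaryDirectory() as root:
    page = Path(root) / "example.test" / "index.html"
    page.parent.mkdir(parents=True)
    page.write_text("<html>cached</html>")
    page.with_name("index.html.gz").write_bytes(b"")  # stand-in for the gzipped copy
    print(pick_cache_file(root, "example.test", "/index.html", "gzip, deflate"))
    print(pick_cache_file(root, "example.test", "/index.html", ""))
    print(pick_cache_file(root, "example.test", "/missing.html", "gzip"))
```

A real implementation would also need to send the Content-Encoding: gzip header when it serves the compressed copy.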

Now to clarify a point: you should not be caching images, CSS, JS etc. this way; we’re only covering dynamic content here. The above are only examples to get you started, and you should write rules to exclude content specific to your needs.

And before going off on any more of a tangent, here are some figures for you!

ab -c 100 -n 500 -g ./saiweb-nocache-nogzip.bpl http://www.saiweb.co.uk/

  • No caching
  • No Gzip

Server Hostname: www.saiweb.co.uk
Server Port: 80

Document Path: /
Document Length: 109086 bytes

Concurrency Level: 100
Time taken for tests: 123.304 seconds
Complete requests: 500
Failed requests: 0
Write errors: 0
Total transferred: 54831652 bytes
HTML transferred: 54692607 bytes
Requests per second: 4.06 [#/sec] (mean)
Time per request: 24660.828 [ms] (mean)
Time per request: 246.608 [ms] (mean, across all concurrent requests)
Transfer rate: 434.26 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 57 423 225.5 374 1837
Processing: 2331 20460 16701.2 17232 115192
Waiting: 270 1835 4155.8 576 38549
Total: 2656 20882 16648.1 17692 115421

Percentage of the requests served within a certain time (ms)
50% 17692
66% 20700
75% 24063
80% 25770
90% 35157
95% 53328
98% 82957
99% 101497
100% 115421 (longest request)

As can be seen, as the number of requests grew the response time began to increase sharply and the overall performance of the site degraded. Bear in mind these benchmarks are being made over my home DSL for the time being.


ab -c 100 -n 500 -g ./saiweb-cached.bpl http://www.saiweb.co.uk/

Server Hostname: www.saiweb.co.uk
Server Port: 80

Document Path: /
Document Length: 109086 bytes

Concurrency Level: 100
Time taken for tests: 79.212 seconds
Complete requests: 500
Failed requests: 0
Write errors: 0
Total transferred: 54889292 bytes
HTML transferred: 54705058 bytes
Requests per second: 6.31 [#/sec] (mean)
Time per request: 15842.342 [ms] (mean)
Time per request: 158.423 [ms] (mean, across all concurrent requests)
Transfer rate: 676.70 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 56 314 112.5 322 1341
Processing: 2545 14721 5116.7 14296 36677
Waiting: 216 1283 2228.2 351 13776
Total: 2647 15035 5108.9 14624 36897

Percentage of the requests served within a certain time (ms)
50% 14624
66% 16675
75% 18058
80% 19093
90% 21608
95% 23489
98% 27684
99% 29972
100% 36897 (longest request)

A much more consistent line here. However, as you can clearly see, response times are roughly equal; this is due to my DSL connection. So let’s run these tests from somewhere with a little more bandwidth, say the webserver itself using a loopback connection.


ab -c 100 -n 500 -g ./saiweb-cached.bpl http://www.saiweb.co.uk/

Server Hostname: www.saiweb.co.uk
Server Port: 80

Document Path: /
Document Length: 109086 bytes

Concurrency Level: 100
Time taken for tests: 0.262199 seconds
Complete requests: 500
Failed requests: 0
Write errors: 0
Total transferred: 54945406 bytes
HTML transferred: 54761172 bytes
Requests per second: 1906.95 [#/sec] (mean)
Time per request: 52.440 [ms] (mean)
Time per request: 0.524 [ms] (mean, across all concurrent requests)
Transfer rate: 204642.27 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 1 2.6 0 9
Processing: 4 45 10.3 49 58
Waiting: 1 38 9.9 41 50
Total: 9 47 9.5 50 64

Percentage of the requests served within a certain time (ms)
50% 50
66% 51
75% 52
80% 52
90% 54
95% 56
98% 59
99% 61
100% 64 (longest request)

In this case the response times rise and then plateau, after which no further degradation occurs.


ab -c 100 -n 500 -g ./saiweb-nocache.bpl http://www.saiweb.co.uk/

Server Hostname: www.saiweb.co.uk
Server Port: 80

Document Path: /
Document Length: 109086 bytes

Concurrency Level: 100
Time taken for tests: 8.919565 seconds
Complete requests: 500
Failed requests: 0
Write errors: 0
Total transferred: 54680788 bytes
HTML transferred: 54543000 bytes
Requests per second: 56.06 [#/sec] (mean)
Time per request: 1783.913 [ms] (mean)
Time per request: 17.839 [ms] (mean, across all concurrent requests)
Transfer rate: 5986.73 [Kbytes/sec] received

Connection Times (ms)
min mean[+/-sd] median max
Connect: 0 14 30.7 0 85
Processing: 246 1556 714.3 1365 6735
Waiting: 241 1539 707.8 1360 6731
Total: 250 1571 708.0 1368 6735

Percentage of the requests served within a certain time (ms)
50% 1368
66% 1451
75% 1550
80% 1700
90% 2658
95% 3121
98% 3491
99% 3638
100% 6735 (longest request)

Oh dear, oh dear. Let’s cut to the hard facts, shall we?

We’ve gone from serving 1906.95 requests a second to 56.06

  • a 97.1% decrease in throughput when removing caching
  • or a 3301.6% increase (roughly 34x the throughput) when implementing caching

We’ve gone from a response time of ~50ms to ~2000ms

  • a 3900% increase in response time (40x slower) when removing caching
  • or a 97.5% reduction in response time when caching is on
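
For anyone checking the sums, here is a small Python sketch that recomputes the percentages from the ab figures above. Note that a strict percent increase is (after - before) / before, so the throughput gain works out at about 3301.6% (a ratio of about 34x) and the response time gap at 3900% (a ratio of 40x). The helper name pct_change is mine.

```python
def pct_change(before, after):
    """Percentage change going from `before` to `after`."""
    return (after - before) / before * 100.0


cached_rps, uncached_rps = 1906.95, 56.06   # requests per second from ab
cached_ms, uncached_ms = 50.0, 2000.0       # approximate response times

print(round(pct_change(cached_rps, uncached_rps), 1))  # -97.1
print(round(pct_change(uncached_rps, cached_rps), 1))  # 3301.6
print(round(cached_rps / uncached_rps, 1))             # 34.0
print(round(pct_change(cached_ms, uncached_ms), 1))    # 3900.0
print(round(pct_change(uncached_ms, cached_ms), 1))    # -97.5
```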

Then there are the CPU and memory overheads to consider. In this case a more prolonged test is required to gather the relevant sar data. Let me tell you that intentionally getting a test like this to run over a 10 minute period with the correct caching on is a lot harder than it sounds; the tests were in fact completing far too quickly …

The problem I face is making ab run for long enough against the cached site. I know for a fact that uncached the server will fail under the load, so I have no way at present of capturing this reliably.

What I can tell you is that this command: ab -c 300 -n 1000000 -g ./saiweb-cached.bpl http://www.saiweb.co.uk/

caused a load average of 2.96, 1.9, 0.93 cached, and got as high as 21 before I killed it uncached.

Now I am going to bring this post to an end as it is getting quite long. I plan to cover the following in a second part:

  1. Opcode caching
  2. CPU & Memory usage, Cached vs. UNcached


Some users apparently do not know how to clear their Internet Explorer cache, so I have taken two minutes to do a screen cast here: http://screencast.com/t/FGDnc2gjcft
