Varnish Handbook (Chinese and English)

January 17, 2010, by 月影鹏鹏


.Setting up a backend server

backend www {
    .host = "www.example.com";
    .port = "http";
}

The backend object can later be used to select a backend at request time:

if (req.http.host ~ "^example.com$") {
    set req.backend = www;
}

////////////////////////////////////////////////////////////////////////////////////

.Load balancing across multiple backends

Can Varnish do load balancing?

Yes, Varnish allows backends to be grouped in a director, which directs requests to its members in a pre-defined fashion. Here is an example of a round robin director:

Define a backend group (director) with a round-robin policy:

director www-director round-robin {
    { .backend = www; }
    { .backend = { .host = "www2.example.com"; .port = "http"; } }
}

When this group is used, requests are spread across the backends in the group:

sub vcl_recv {
    if (req.http.host ~ "^(www.)?example.com$") {
        set req.backend = www-director;
    }
}

2)
backend b3 {
    .host = "b3";
    .port = "83";
}

director b2 random {
    {
        .backend = {
            .host = "b1";
            .port = "81";
        }
        .weight = 7;
    }
    {
        .backend = b3;
        .weight = 2;
    }
}
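
To get a feel for what the weights 7 and 2 mean, here is a rough Python sketch (not Varnish code). It assumes a backend's long-run share of requests is its weight divided by the sum of all weights; the names pick_backend and expected_share are made up for this illustration.

```python
import random

# Weights copied from the random director above: b1 has 7, b3 has 2.
WEIGHTS = {"b1": 7, "b3": 2}

def pick_backend(weights, rng):
    """Pick one backend name with probability weight / sum(weights)."""
    names = list(weights)
    return rng.choices(names, weights=[weights[n] for n in names], k=1)[0]

def expected_share(weights, name):
    """Fraction of requests a backend should receive in the long run."""
    return weights[name] / sum(weights.values())
```

Under that assumption b1 would take about 7/9 of the traffic and b3 about 2/9.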
///////////////////////////////////////////////////////////////////////

.Retrying on another backend when one backend cannot return data
Retrying with another backend if one backend reports a non-200 response.
1)
sub vcl_recv {
    if (req.restarts == 0) {
        set req.backend = b1;
    } else {
        set req.backend = b2;
    }
}

sub vcl_fetch {
    if (obj.status != 200) {
        restart;
    }
}

2)

backend b1 {
    .host = "fs.freebsd.dk";
    .port = "82";
}
backend b2 {
    .host = "fs.freebsd.dk";
    .port = "81";
}
backend b3 {
    .host = "fs.freebsd.dk";
    .port = "80";
}

sub vcl_recv {
    if (req.restarts == 0) {
        set req.backend = b1;
    } else if (req.restarts == 1) {
        set req.backend = b2;
    } else {
        set req.backend = b3;
    }
}

sub vcl_fetch {
    ## If the request to the backend returns a code other than 200, restart the loop
    ## If the number of restarts reaches the value of the parameter max_restarts,
    ## the request will be error’ed.  max_restarts defaults to 4.  This prevents
    ## an eternal loop in the event that, e.g., the object does not exist at all.
    if (obj.status != 200 && obj.status != 403 && obj.status != 404) {
        restart;
    }
}
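
The restart loop can be simulated in a few lines of Python. This is only a sketch: it assumes the request is given up as soon as the restart counter reaches max_restarts (4, as the comment above says), it models the final error as a plain 503, and the fetch callable is a hypothetical stand-in for the backend request.

```python
# Statuses that end the retry loop, mirroring the vcl_fetch condition above.
ACCEPTED = {200, 403, 404}
MAX_RESTARTS = 4  # default quoted in the VCL comment above

def choose_backend(restarts):
    """Mirror vcl_recv: b1 first, b2 on the first restart, b3 afterwards."""
    if restarts == 0:
        return "b1"
    if restarts == 1:
        return "b2"
    return "b3"

def handle(fetch):
    """Simulate the loop; `fetch` maps a backend name to an HTTP status.

    Returns (backend, status, restarts). When the restart budget is used up
    the result is (None, 503, restarts); the 503 is an assumption of this sketch.
    """
    restarts = 0
    while restarts < MAX_RESTARTS:
        backend = choose_backend(restarts)
        status = fetch(backend)
        if status in ACCEPTED:
            return backend, status, restarts
        restarts += 1
    return None, 503, restarts
```

With b1 and b2 failing and b3 healthy, the request ends up on b3 after two restarts; if every backend keeps failing, the loop stops instead of spinning forever.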

////////////////////////////////////////////////////////////////////////

.Blocking spiders
Preventing search engines from populating the cache with old documents

This can be done by checking the user-agent header in the HTTP request.

sub vcl_miss {
    if (req.http.user-agent ~ "spider") {
        error 503 "Not presently in cache";
    }
}

////////////////////////////////////////////////////////////////////////
.Common commands

Varnish has a set of command line tools and utilities to monitor and administer Varnish. These are:

* varnishncsa: Displays the varnishd shared memory logs in Apache / NCSA combined log format
* varnishlog: Reads and presents varnishd shared memory logs.
* varnishstat: Displays statistics from a running varnishd instance.
* varnishadm: Sends a command to the running varnishd instance.
* varnishhist: Reads varnishd shared memory logs and presents a continuously updated histogram showing the distribution of the last N requests by their processing.
* varnishtop: Reads varnishd shared memory logs and presents a continuously updated list of the most commonly occurring log entries.
* varnishreplay: Parses varnish logs and attempts to reproduce the traffic.

//////////////////////////////////////////////////////////////////////////

.pipe or pass?

Should I use pipe or pass in my VCL code? What is the difference?

When varnish does a pass it acts like a normal HTTP proxy. It reads the request and pushes it onto the backend. The next HTTP request can then be handled like any other.

pipe is only used when Varnish for some reason can’t handle the pass. pipe reads the request, pushes it onto the backend, and from then on _only_ pushes bytes back and forth, with no other actions taken.

Since most HTTP clients pipeline several requests into one connection, this might give you an undesirable result, as every subsequent request will reuse the existing pipe.

Varnish versions prior to 2.0 do not support handling a request body with pass mode, so in those releases pipe is required for correct handling.

In 2.0 and later, pass will handle the request body correctly.

////////////////////////////////////////////////////////////////////////////

.Refreshing or clearing the cache

How can I force a refresh on an object cached by varnish?

Refreshing is often called purging a document. You can purge at least 2 different ways in Varnish:

1. From the command line you can write:

url.purge  ^/$

to purge your / document. As you might see, url.purge takes a regular expression as its argument, hence the ^ and $ at the front and end. If the ^ is omitted, all the documents ending in a / in the cache would be deleted.

So to delete all the documents in the cache, write:

url.purge  .*

at the command line.
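
The effect of the anchors is easy to try out in Python. This sketch assumes the purge expression is applied as an unanchored regex search against each cached URL, as the explanation above implies; the URL list and the function name purged are invented for the example.

```python
import re

# Hypothetical cache contents, for illustration only.
CACHED_URLS = ["/", "/news/", "/news/today.html", "/img/logo.png"]

def purged(pattern, urls):
    """Return the URLs an unanchored regex search would match."""
    rx = re.compile(pattern)
    return [u for u in urls if rx.search(u)]
```

Here purged("^/$", CACHED_URLS) selects only the front page, purged("/$", CACHED_URLS) also catches /news/, and purged(".*", CACHED_URLS) selects everything.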

////////////////////////////////////////////////////////////////////////////

.Debugging a single client

How can I debug the requests of a single client?

The “varnishlog” utility may produce a horrendous amount of output. Being able to debug only our own traffic can be useful.

The ReqStart token will include the client IP address. To see log entries matching this, type:

$ varnishlog -c -o ReqStart 192.0.2.123

To see the backend requests generated by a client IP address, we can match on the TxHeader token, since the IP address of the client is included in the X-Forwarded-For header in the request sent to the backend.

At the shell command line, type:

$ varnishlog -b -o TxHeader 192.0.2.123

/////////////////////////////////////////////////////////////////////////////

.Rewriting URLs

How can I rewrite URLs before they are sent to the backend?

You can use the “regsub()” function to do this. Here’s an example for Zope, to rewrite URLs for the VirtualHostMonster:

if (req.http.host ~ "^(www.)?example.com") {
    set req.url = regsub(req.url, "^", "/VirtualHostBase/http/example.com:80/Sites/example.com/VirtualHostRoot");
}

//////////////////////////////////////////////////////////////////////////////

.Handling many hostnames

I have a site with many hostnames, how do I keep them from multiplying the cache?

You can do this by normalizing the “Host” header for all your hostnames. Here’s a VCL example:

if (req.http.host ~ "^(www.)?example.com") {
    set req.http.host = "example.com";
}
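
One thing worth checking is how loose that pattern is: the dots match any character and the end is not anchored. The Python sketch below compares it with a stricter variant. The STRICT pattern, the normalize helper and the test hostnames are my own additions, not part of the original recipe, and Python's regex engine is only standing in for the one Varnish uses.

```python
import re

LOOSE = re.compile(r"^(www.)?example.com")       # pattern as written above
STRICT = re.compile(r"^(www\.)?example\.com$")   # escaped dots, anchored end

def normalize(host, rx):
    """Return the canonical host if rx matches, else the host unchanged."""
    return "example.com" if rx.search(host) else host
```

With LOOSE, a host such as example.com.evil.test is also rewritten to example.com; with STRICT it is left alone.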

///////////////////////////////////////////////////////////////////////////////

.Logging the client IP on the backend

How can I log the client IP address on the backend?

All I see is the IP address of the varnish server. How can I log the client IP address?

We will need to add the IP address to a header used for the backend request, and configure the backend to log the content of this header instead of the address of the connecting client (which is the varnish server).

Varnish configuration:

sub vcl_recv {
    # Add a unique header containing the client address
    remove req.http.X-Forwarded-For;
    set    req.http.X-Forwarded-For = client.ip;
    # [...]
}

For the apache configuration, we copy the “combined” log format to a new one we call “varnishcombined”, for instance, and change the client IP field to use the content of the variable we set in the varnish configuration:

LogFormat "%{X-Forwarded-For}i %l %u %t \"%r\" %>s %b \"%{Referer}i\" \"%{User-Agent}i\"" varnishcombined

And so, in our virtualhost, you need to specify this format instead of “combined” (or “common”, or whatever else you use):

<VirtualHost *:80>
    ServerName www.example.com
    # [...]
    CustomLog /var/log/apache2/www.example.com/access.log varnishcombined
    # [...]
</VirtualHost>

The mod_extract_forwarded Apache module might also be useful.

////////////////////////////////////////////////////////////////////////////////////

.Adding HTTP headers

How do I add an HTTP header?

To add an HTTP header, unless you want to add something about the client/request, it is best done in vcl_fetch, as this means it will only be processed each time the object is fetched:

sub vcl_fetch {
    # Add a unique header containing the cache server's IP address:
    remove obj.http.X-Varnish-IP;
    set    obj.http.X-Varnish-IP = server.ip;
    # Another header:
    set    obj.http.Foo = "bar";
}

/////////////////////////////////////////////////////////////////////////////////////

.Altering the request going to the backend

How do I alter the request going to the backend?

You can use the bereq object for altering requests going to the backend, but from my experience you can only ’set’ values to it. So, if you need to change the requested URL, this doesn’t work:

sub vcl_miss {
    set bereq.url = regsub(bereq.url, "stream/", "/");
    fetch;
}

Because you cannot read from bereq.url (in the value part of the assignment). You will get:

mgt_run_cc(): failed to load compiled VCL program:
./vcl.1P9zoqAU.o: undefined symbol: VRT_r_bereq_url
VCL compilation failed

Instead, you have to use req.url:

sub vcl_miss {
    set bereq.url = regsub(req.url, "stream/", "/");
    fetch;
}

///////////////////////////////////////////////////////////////////////////////////////

.Forcing the backend to send Vary headers

How do I force the backend to send Vary headers?

We have anecdotal evidence of non-RFC2616 compliant backends, which support content negotiation, but which do not emit a Vary header unless the request contains Accept headers.

It may be appropriate to send no-op Accept headers to trick the backend into sending us the Vary header.

The following should be sufficient for most cases:

Accept: */*
Accept-Language: *
Accept-Charset: *
Accept-Encoding: identity

Note that Accept-Encoding cannot be set to *, as the backend might then send back a compressed response which the client would be unable to process.

This can of course be implemented in VCL.

////////////////////////////////////////////////////////////////////////////////////////

.Customizing error messages

How can I customize the error messages that Varnish returns?

A custom error page can be generated by adding a vcl_error to your configuration file. The default error page looks like this:

sub vcl_error {
    set obj.http.Content-Type = "text/html; charset=utf-8";

    synthetic {"
<?xml version="1.0" encoding="utf-8"?>
<!DOCTYPE html PUBLIC "-//W3C//DTD XHTML 1.0 Strict//EN"
 "http://www.w3.org/TR/xhtml1/DTD/xhtml1-strict.dtd">
<html>
  <head>
    <title>"} obj.status " " obj.response {"</title>
  </head>
  <body>
    <h1>Error "} obj.status " " obj.response {"</h1>
    <p>"} obj.response {"</p>
    <h3>Guru Meditation:</h3>
    <p>XID: "} req.xid {"</p>
    <address><a href="http://www.varnish-cache.org/">Varnish</a></address>
  </body>
</html>
"};
    deliver;
}

///////////////////////////////////////////////////////////////////////////////////////////

.Ignoring URL query parameters

How do I instruct varnish to ignore the query parameters and only cache one instance of an object?

This can be achieved by removing the query parameters using a regexp:

sub vcl_recv {
    set req.url = regsub(req.url, "\?.*", "");
}
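
To check what that expression does to a URL before putting it in production, the same substitution can be tried in Python. This is only an approximation: re.sub with count=1 stands in for regsub(), and strip_query is a made-up helper name.

```python
import re

def strip_query(url):
    """Drop everything from the first '?' onward."""
    return re.sub(r"\?.*", "", url, count=1)
```

For example, strip_query("/page?id=1&sort=asc") gives "/page", so every variant of the query string would share one cache entry.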

///////////////////////////////////////////////////////////////////////////////////////////

.Varnish settings

To see further description of these settings, also check param.show -l in the Varnish management interface.

-p thread_pool_max=4000 (default 1000)

This number should be as low as possible, but with an upwards margin. Do not set it much higher than you need; that only leads to thread pile-ups. The “correct” number is something like the 90th percentile of the number of concurrent requests when running your peak load. Since that is an incredibly tricky number to measure, I suggest you set it 10% over the highest number of threads you see during normal operation.

-p thread_pools=4 (default 1)

To reduce lock contention, you might want to increase this number a little. But just a little.

-p listen_depth=4096 (default 1024)

You may want to increase this, but there is little advantage to increasing it too high. Set it to your peak connection/second rate, so that you get a buffer of a full second if the acceptor gets busy. More than that is not going to do anything good.

============================================

Running with many objects

If you have many objects (more than 100000), you may need to set the following command line options:

-p lru_interval=3600 (default: 2 seconds)

If your cache servers cache most/all objects for a longer time, it makes sense to increase the period before an object is moved to the LRU list. This reduces the number of lock operations necessary for LRU list access.

-h classic,500009 (default: 16383)

To keep hash lookups fast, you should not have more than 10 objects per hash bucket. If you have 3 million objects, the number of buckets should be at least 300000. The number should be a prime number. You can generate one on http://www.prime-numbers.org/.
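
If you would rather compute the bucket count yourself, the rule above (at most 10 objects per bucket, prime number of buckets) fits in a short Python sketch. The function names and the per_bucket parameter are my own; the script only does the arithmetic and does not talk to Varnish.

```python
import math

def is_prime(n):
    """Trial division; fine for bucket counts in the millions."""
    if n < 2:
        return False
    if n % 2 == 0:
        return n == 2
    for d in range(3, math.isqrt(n) + 1, 2):
        if n % d == 0:
            return False
    return True

def bucket_count(objects, per_bucket=10):
    """Smallest prime >= objects / per_bucket (rounded up)."""
    n = max(2, math.ceil(objects / per_bucket))
    while not is_prime(n):
        n += 1
    return n
```

Calling bucket_count(3000000) returns the first prime at or above 300000, which can then be passed as the second argument of -h classic.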

-p obj_workspace=4096 (default: 8192)

For every object, this amount of memory is allocated for HTTP protocol header information. Try to decrease this setting; it will decrease the need for VM space to fit all your objects. Be aware that Varnish currently crashes if an object is too big for this limit (see #214).

-s malloc,50G

Try running with malloc storage if you experience VM hangs. You do this instead of setting up data files, and might have to increase the amount of swap space needed. You can set a limit for how much to allocate, which should be smaller than the available swap space on the machine. There is a possible benefit to not having any swap space on the OS/system disk.

======================================================

VCL Setting

Enable grace period (varnish serves stale (but cacheable) objects while retrieving the object from the backend)

in vcl_recv:

set req.grace = 30s;

in vcl_fetch:

set obj.grace = 30s;

//////////////////////////////////////////////////////////////////////////////////////////////////

.FreeBSD

* If using FreeBSD 7.0 or newer, try using SCHED_ULE instead of SCHED_4BSD in your kernel config.

* Turn off soft-updates on the filesystems where you keep your Varnish data files. It will not help Varnish.

* sysctl.conf settings (see tuning(7) manpage and http://www.freebsd.org/doc/en/books/handbook/configtuning-kernel-limits.html):

kern.ipc.nmbclusters=65536
kern.ipc.somaxconn=16384
kern.maxfiles=131072
kern.maxfilesperproc=104856
kern.threads.max_threads_per_proc=4096

* loader.conf settings:

kern.ipc.maxsockets="131072"
kern.ipc.maxpipekva="104857600" (only if you get the “kern.ipc.maxpipekva exceeded” messages in your logs, varnish does not use pipes for worker pool synchronization any more)

* If you run 32-bit FreeBSD, you will need to set kern.maxdsiz (maximum data size per process in number of bytes) in loader.conf to a larger number if you want to cache more than 512 MB (the default setting) of objects.

* If you use the malloc storage type, and your system hangs with “swap zone exhausted, increase kern.maxswzone” on the console, try increasing kern.maxswzone (default is 32 MB in FreeBSD 7.0) in loader.conf.

///////////////////////////////////////////////////////////////////////////////////////////////////

.Linux

Edit /etc/sysctl.conf

These are numbers from a highly loaded Varnish server serving about 4000-8000 req/s

(details: http://projects.linpro.no/pipermail/varnish-misc/2008-April/001769.html)

net.ipv4.ip_local_port_range = 1024 65536
net.core.rmem_max=16777216
net.core.wmem_max=16777216
net.ipv4.tcp_rmem=4096 87380 16777216
net.ipv4.tcp_wmem=4096 65536 16777216
net.ipv4.tcp_fin_timeout = 3
net.ipv4.tcp_tw_recycle = 1
net.core.netdev_max_backlog = 30000
net.ipv4.tcp_no_metrics_save=1
net.core.somaxconn = 262144
net.ipv4.tcp_syncookies = 0
net.ipv4.tcp_max_orphans = 262144
net.ipv4.tcp_max_syn_backlog = 262144
net.ipv4.tcp_synack_retries = 2
net.ipv4.tcp_syn_retries = 2

///////////////////////////////////////////////////////////////////////////////////////////////////

.All UNIX platforms

* Set the mount options noatime and nodiratime on the filesystems where you keep your Varnish data files. There is no point in keeping track of how often they are accessed; it will waste cycles and cause unnecessary disk activity.

//////////////////////////////////////////////////////////////////////////////////////////////////

.Caching pages with cookies

Caching, even when cookies are present

Please note that this might quite easily end up serving content meant for one user to another, with all the chaos which can follow.

By default (and design) Varnish does not cache content with cookies in it. This is the recommended behaviour, so please only use the following recipe if you are sure you want to cache even with cookies and changing the web application is not possible.

Adding the cookie to the hash

This causes a lookup for a given object to include the Cookie. This will give you a per-user cache, so expect a fairly low cache hit ratio; it also requires your system to not change the cookie on each page hit.

sub vcl_hash {
    set req.hash += req.http.cookie;
}

////////////////////////////////////////////////////////////////////////////////////////

.Removing cookies for jpg and other static files

Removing Set-Cookie from the backend (for a particular path)

In this case, we remove both the Cookie header and the Set-Cookie header for objects under a predefined path. This is quite common for images and similar static content.

sub vcl_recv {
    if (req.url ~ "^/images") {
        unset req.http.cookie;
    }
}

sub vcl_fetch {
    if (req.url ~ "^/images") {
        unset obj.http.set-cookie;
    }
}

Caching based on file extensions

Here we throw away the cookie the client supplied by forcing a lookup. The default VCL code is _not_ run after vcl_recv.

sub vcl_recv {
    if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {
        lookup;
    }
}

# strip the cookie before the image is inserted into cache.
sub vcl_fetch {
    if (req.url ~ "\.(png|gif|jpg|swf|css|js)$") {
        unset obj.http.set-cookie;
    }
}

/////////////////////////////////////////////////////////////////////////////////////////

.Cache lifetime on Varnish vs. on the client

How to cache things longer on Varnish than on the client

RFC2616 spends quite a lot of time explaining what the expiration rules are for normal client-side caches.

The explanation is not the best I have seen, and Varnish is not a client side cache anyway, so this is my attempt to set the record straight, or at least firmly crooked on the subject in a Varnish context.

At the sound of the tone…

In an ideal world, all computers would have clocks that show the correct time.

If they did, the Expires: header could be used to say when a given web-object should be thrown away, and my explanation would be done now.

Getting computer clocks in sync is a lot harder than it sounds and despite the valiant efforts of Prof. Dave Mills and his NTP gang, this is far from the situation on the internet.

The main obstacle is that people just do not care enough to do it, and the secondary obstacles are complicated rules for timekeeping, which involve not only time zones and daylight savings time but also leap seconds.

Fortunately, once upon a time it was predicted that some basic web-clients would not have a clock at all, and therefore the RFC2616 standard offers a way to control lifetime in relative terms (“throw away after 600 seconds”) instead of absolute terms (“throw away at 10:35:00 20-01-2008 UTC”).

RFC2616 specifies an algorithm in section 13.2.4 which combines the absolute information and the relative information, and then picks the earlier of the two resulting deadlines.

The Varnish complication

Varnish does not fit the model in RFC2616 for the simple reason that varnish is not a client side cache, but a part of the web-server.

Where a client cache must be defensive about everything, to not get in the way or change semantics for the client/server relationship, varnish is the server in the relationship and may be responsible for implementing content policies etc.

At the most basic level, how long varnish and the client can cache a given object may differ.

A website may very likely want varnish to cache an object forever, trusting the backend server to explicitly purge it, should it be updated.

But that does not mean that we want the clients to cache the object forever.

Because the backend’s purge requests cannot reach the clients, it is necessary to have the client check back after a reasonable amount of time, to see if the object has changed.

How it works

Varnish acts like an RFC2616 client side cache by default, with the footnote that if no cacheability information is available, we use a default Time To Live (TTL) from the parameter “default_ttl”.

This means that Varnish will respect the s-maxage or max-age Cache-Control fields and will respect Expires headers.

Varnish leaves Expires: and Cache-Control: headers intact, and sets the Age: header with the number of seconds the object has been cached, and therefore any RFC2616 client will do the right thing by default.

How it should work

It is very likely that you want to have Varnish cache objects longer than the clients do, and this is where RFC2616 comes up short: it offers no way to communicate the two different lifetimes from the backend.

The solution is to have the backend emit the objects with the desired headers for client use, and then set the obj.ttl in the VCL code to the longer duration.

But this is not quite enough to get the desired effect.

The Expires header from the backend must be removed, as it would pertain only to the direct client connection case, and it could in theory be replaced, by varnish, with a new header.

According to RFC2616, just issuing a max-age to the client should be just as precise as generating an Expires header, and it has the advantage of not expecting the client’s clock to be correct, so unless somebody informs me otherwise, my recommendation is to not bother with Expires.

Besides, we do not have a convenient way to generate this timestamp in Varnish presently.

The Age: header generated by Varnish must also be neutered, otherwise it would grow well beyond the max-age sent to the client.

A solution in VCL could look like this:

sub vcl_fetch {
    if (obj.cacheable) {
        /* Remove Expires from backend, it's not long enough */
        unset obj.http.expires;

        /* Set the clients TTL on this object */
        set obj.http.cache-control = "max-age=900";

        /* Set how long Varnish will keep it */
        set obj.ttl = 1w;

        /* marker for vcl_deliver to reset Age: */
        set obj.http.magicmarker = "1";
    }
}

sub vcl_deliver {
    if (resp.http.magicmarker) {
        /* Remove the magic marker */
        unset resp.http.magicmarker;

        /* By definition we have a fresh object */
        set resp.http.age = "0";
    }
}
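
Why the Age: header has to be reset becomes clear with a bit of arithmetic. The Python sketch below assumes a client treats a response as reusable for max-age minus Age seconds, which is a simplification of the RFC2616 freshness rules; client_fresh_for is a made-up name.

```python
def client_fresh_for(max_age, age):
    """Seconds a client may still reuse a response: max-age minus Age, floored at 0."""
    return max(0, max_age - age)
```

With the max-age of 900 seconds used in the VCL, an object that has sat in Varnish for a day (Age 86400) would already look stale to the client, so it would be refetched on every request; with Age reset to 0 the client gets the full 900 seconds.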

/////////////////////////////////////////////////////////////////////////////////////////////

.Removing some, but not all, cookies

In some cases, you might want to remove only a few, selected cookies, for example if you use Google Analytics.

Currently, this has to be done using regular expressions as follows:

sub vcl_recv {
    # Is it the first one?
    set req.http.cookie = regsub(req.http.cookie, "foo=[^;]+(; )?", "");

    # Or perhaps one in the middle or the last one?
    set req.http.cookie = regsub(req.http.cookie, "(; )?foo=[^;]+", "");
    if (req.http.cookie ~ "^ *$") {
        remove req.http.cookie;
    }
}

Replace foo with the name of the cookie you wish to remove.
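
Before rolling this out it is worth trying the two expressions on a few Cookie headers. The Python port below is a sketch only: it assumes regsub() replaces just the first match (hence count=1), uses Python's regex engine in place of the one in Varnish, and the function name strip_cookie is invented.

```python
import re

def strip_cookie(header, name="foo"):
    """Port of the two regsub() calls above; returns None when nothing is left."""
    n = re.escape(name)
    # regsub() replaces only the first match, hence count=1.
    header = re.sub(n + r"=[^;]+(; )?", "", header, count=1)
    header = re.sub(r"(; )?" + n + r"=[^;]+", "", header, count=1)
    if re.fullmatch(r" *", header):
        return None  # corresponds to "remove req.http.cookie"
    return header
```

In this port, "foo=1; bar=2" becomes "bar=2" and a lone "foo=1" removes the header entirely, but "bar=2; foo=1" comes out as "bar=2; " with the separator left behind, so test with your own cookies.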

//////////////////////////////////////////////////////////////////////////////////////////////

.Enable force-refresh from clients (when a client forces a browser refresh, e.g. Ctrl+F5, and sends a no-cache request)

When receiving a “force-refresh” request from a client, this configuration will fetch the requested element from the backend, update the cache and deliver it to the client.

sub vcl_recv {
    # Force lookup if the request is a no-cache request from the client
    if (req.http.Cache-Control ~ "no-cache") {
        purge_url(req.url);
    }
}

////////////////////////////////////////////////////////////////////////////////////////////////

.Redirecting URLs

If you, for some reason, don’t want to redirect on the backend, but prefer to do it in VCL, you can do it using one of the following recipes:

Redirect if the user agent matches a regex

sub vcl_recv {
    if (req.http.user-agent ~ "iP(hone|od)") {
        error 750 "Moved Temporarily";
    }
}

sub vcl_error {
    if (obj.status == 750) {
        set obj.http.Location = "http://www.example.com/iphoneversion/";
        set obj.status = 302;
        deliver;
    }
}

Redirect if the user agent matches a regex (multiple sites)

sub vcl_recv {
    if (req.http.user-agent ~ "lwp") {
        if (req.http.host ~ "example.com") {
            error 750 "example.com";
        } else {
            error 750 "localhost";
        }
    }
}

sub vcl_error {
    if (obj.status == 750) {
        if (obj.response ~ "example.com") {
            set obj.http.Location = "http://www.example.com/customversion";
        } elsif (obj.response ~ "localhost") {
            set obj.http.Location = "http://localhost/customversion";
        }
        set obj.status = 302;
        deliver;
    }
}
///////////////////////////////////////////////////////////////////////////////

.Ignoring backend cache headers

Ignore cache headers from the backend

Some backends send headers that tell varnish not to cache elements. Header examples are:

Cache-Control: no-cache, no-store, max-age=0, must-revalidate
Pragma: no-cache

To override these headers and still put the element into cache for 2 minutes, the following configuration may be used:

sub vcl_fetch {
    if (obj.ttl < 120s) {
        set obj.ttl = 120s;
    }
}
