1. User Agent
The user-agent is a string of text that browsers use to identify themselves to websites when making requests. The user-agent string typically gives the browser's name and version number, often its rendering engine and the operating system it is running on, and sometimes other details. Conkeror's default user-agent string looks like this, though the details will vary:
- Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120905 conkeror/1.0pre
The user-agent string is sent as part of the HTTP headers of the request. Websites can then use it for either useful, or horrible, purposes, to serve different versions of documents, [in]appropriate to the browser. This technique is generally known as user-agent sniffing, and is responsible for any number of maladies for those of us who are fond of web browsers outside of the mainstream.
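To make the idea of user-agent sniffing concrete, here is a minimal sketch of the kind of check a website might run. The function name sniffBrowser and the two regular expressions are hypothetical; real sites use many different and usually more elaborate tests.

```javascript
// Hypothetical sniffing routine of the sort a website might use.
// It only recognizes browsers it has been told about.
function sniffBrowser(userAgent) {
    if (/Firefox\//.test(userAgent)) return "firefox";
    if (/MSIE /.test(userAgent)) return "msie";
    return "unknown";
}

// Conkeror's default user-agent string, as shown above.
const conkerorUA =
    "Mozilla/5.0 (X11; Linux x86_64; rv:15.0) Gecko/20120905 conkeror/1.0pre";

console.log(sniffBrowser(conkerorUA));
```

A site that serves a degraded page for anything it reports as "unknown" will do so for Conkeror, even though Conkeror uses the same Gecko engine as Firefox.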
1.1. Changing Your User Agent String (Globally)
There are two main ways to change the user agent string globally. You can set the user agent to an arbitrary string, or you can use the Firefox Compat Mode pref.
You can set the user agent to any string you like with a utility that Conkeror provides, called set_user_agent. You could set it to an empty string if you don't want websites to know anything about the browser you're using, or you could look up the user agent string of another browser and use that. To use it, put a line like the following in your rc:
set_user_agent("foo bar baz");
1.1.2. Firefox Compat Mode
The Firefox Compat Mode pref is a way to just modify your default user agent string in a way that still shows that you are using Conkeror, but also includes the word "Firefox" so that some sites will be fooled into thinking you are running Firefox. A line like the following in the RC turns this on:
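A minimal sketch of such a line, assuming that the pref in question is Gecko's general.useragent.compatMode.firefox and that it is set with Conkeror's session_pref utility. Both are assumptions, so check the pref name against your Conkeror and XULRunner versions.

```javascript
// Assumption: the compat mode is controlled by this Gecko pref.
session_pref("general.useragent.compatMode.firefox", true);
```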
1.2. Changing Your User Agent String (Per-Site)
Conkeror lets you vary your user-agent string based on the URL you're requesting. This allows you to proudly use Conkeror's user-agent string for most sites, but a Firefox one (for example) for those few annoying sites that barf pea soup when they see a user-agent string that they don't recognize. The module that does this is called user-agent-policy, and it is a convenience API on top of http-request-hook. If you find that you want to do something with your user-agent string that is not within the capabilities of user-agent-policy, then you can probably do it with the lower-level http-request-hook instead.
User-agent-policy is very straightforward. Just put a block like this in your rc:
require("user-agent-policy");

user_agent_policy.define_policy("default",
    user_agent_firefox(),
    "images.google.com",
    build_url_regexp($domain = /(.*\.)?google/, $path = /images|search\?tbm=isch/),
    "plus.google.com");
Policies are defined with user_agent_policy.define_policy, and you can make as many of them as you want — each must have a different name. The order in which policies are tested is not specified. Each policy uses a single user-agent string, so all urls matched by the patterns in a given policy use that policy's user-agent string. The arguments to define_policy are as follows:
- The name can be any string. It only serves to allow the policy to be referenced for redefinition and update.
- The user-agent string to be used if any patterns of this policy match. Since the commonest case is to spoof as Firefox, a convenience utility is provided, user_agent_firefox(), that returns a Firefox-like user-agent string for your current OS and Gecko version. It's not exactly the same as what Firefox would give, in all details, in all cases, but it should be good enough for all practical purposes.
- The remaining arguments are patterns that this policy matches. Patterns can either be string literals, or RegExp objects. String literal patterns are matched against the host part of the url only. RegExp objects are matched against the entire url. This distinction was made for the sake of efficiency: your policies are tested for every single http request, and string literals are faster to match than RegExps, so you want to use them as often as possible, and only use RegExps for the tricky cases.
In the example above, there are two string literal patterns and one RegExp pattern. The RegExp pattern is built with another convenience utility provided by Conkeror, build_url_regexp. You can use it, or you can write RegExps in literal form; either works.
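The difference between the two pattern types can be modelled in a few lines of plain JavaScript. This is only a sketch of the rule described above, not Conkeror's implementation. The function matchesPolicy is hypothetical, the exact-equality comparison on the host is an assumption, and the literal RegExp is hand-written for illustration rather than the output of build_url_regexp.

```javascript
// Hypothetical model of pattern matching; not Conkeror's actual code.
function matchesPolicy(patterns, url) {
    const host = new URL(url).hostname;
    return patterns.some(function (pattern) {
        if (typeof pattern === "string")
            return pattern === host;  // assumption: exact host comparison
        return pattern.test(url);     // RegExps see the entire url
    });
}

const patterns = [
    "images.google.com",
    // Hand-written literal RegExp, roughly in the spirit of the example.
    /^https?:\/\/(.*\.)?google\.[^\/]+\/(images|search\?tbm=isch)/,
    "plus.google.com"
];

console.log(matchesPolicy(patterns, "https://plus.google.com/stream"));
console.log(matchesPolicy(patterns, "https://www.google.com/search?tbm=isch&q=cat"));
console.log(matchesPolicy(patterns, "https://www.google.com/maps"));
```

The first two urls match (one by host string, one by RegExp) and the third does not, because www.google.com is not one of the listed hosts and its path is neither images nor an image search.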