build_url_regexp

Before we start writing a page-mode, we need to cover build_url_regexp, an important utility function that is used heavily in page-modes. Part of writing a page-mode is constructing a test against which a URL can be matched, so that Conkeror knows when to turn the page-mode on. The test can be either a function or a regular expression, but regular expressions are more commonly used.
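To make the two kinds of test concrete, here is a minimal sketch in plain JavaScript. The helper urlPassesTest and both example tests are hypothetical names made up for illustration, and the sketch assumes the URL is available as a string; how Conkeror itself invokes a test is not covered here.

```javascript
// Hypothetical helper, for illustration only: apply a test that is
// either a predicate function or a RegExp to a URL given as a string.
function urlPassesTest(test, url) {
    if (typeof test === "function")
        return Boolean(test(url));   // function form: call it
    return test.test(url);           // regexp form: match it
}

// The same idea expressed both ways (example.com is a placeholder).
var regexpTest = /^https?:\/\/example\.com\//;
var functionTest = function (url) {
    return url.indexOf("://example.com/") !== -1;
};
```

Both forms answer the same yes-or-no question about a URL; the regexp form is simply more compact for the common case.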

Regular expressions that match URLs can be complex, so to help, we use the utility function build_url_regexp to create them. It takes several keyword arguments, each specifying one part of a URL, and produces a regexp accordingly.

$domain

A regexp or a literal string to match the domain name, not including the top-level domain (.com, .net, .org, etc.), and not including www. unless the www. is required.

$allow_www

Boolean. When true, the domain name may optionally be preceded by the www. subdomain. The default is false.

$tlds

A list of allowed top-level domains. The default is the list ["com"].

$path

A regexp or a literal string to match against the path portion of the URL, excluding the initial /. The default matches any path.
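As a rough illustration of how the four keywords could combine into one regexp, here is a sketch in plain JavaScript. It is not Conkeror's implementation: sketchBuildUrlRegexp, escapeLiteral, and toSource are hypothetical names, the keywords are passed as an ordinary options object rather than as Conkeror keyword arguments, and the http/https scheme prefix is an assumption.

```javascript
// Hypothetical stand-in for illustration only; not Conkeror's source.

// Escape regexp metacharacters so a literal string matches itself.
function escapeLiteral(s) {
    return s.replace(/[.*+?^${}()|[\]\\]/g, "\\$&");
}

// Accept either a RegExp (use its source) or a literal string (escape it).
function toSource(x) {
    return x instanceof RegExp ? x.source : escapeLiteral(String(x));
}

function sketchBuildUrlRegexp(opts) {
    var domain = toSource(opts.domain);
    var tlds = opts.tlds || ["com"];                    // default: ["com"]
    var path = opts.path === undefined
        ? ""                                            // default: any path
        : toSource(opts.path);

    if (opts.allow_www === true)                        // default: false
        domain = "(?:www\\.)?" + domain;

    var tldChoice = "(?:" + tlds.map(escapeLiteral).join("|") + ")";

    // Assumption: only http and https URLs are of interest.
    return new RegExp("^https?://" + domain + "\\." + tldChoice + "/" + path);
}

// Example: example.com or example.org, optional www., path starting "wiki/".
var re = sketchBuildUrlRegexp({
    domain: "example",
    allow_www: true,
    tlds: ["com", "org"],
    path: /wiki\//
});
```

With these settings, re matches "http://www.example.org/wiki/Page" but not "http://example.net/wiki/", since net is not in the list of allowed top-level domains. Literal strings are escaped before being spliced in, so a dot in a string-valued domain or path matches only a dot.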