MediaWikiDoc:Title.doc
Techniques: Software: MediaWiki: Developer Documents: Title.doc
Contents
The Wikipedia software's "Title" class represents article titles, which are used for many purposes: as the human-readable text title of the article, in the URL used to access the article, the wikitext link to the article, the key into the article database, and so on. The class in instantiated from one of these forms and can be queried for the others, and for other attributes of the title. This is intended to be an immutable "value" class, so there are no mutator functions.
To get a new instance, call one of the static factory methods WikiTitle::newFromURL(), WikiTitle::newFromDBKey(), or WikiTitle::newFromText(). Once instantiated, the other non-static accessor methods can be used, such as getText(), getDBKey(), getNamespace(), etc.
The prefix rules: a title consists of an optional Interwiki prefix (such as "m:" for meta or "de:" for German), followed by an optional namespace, followed by the remainder of the title. Both Interwiki prefixes and namespace prefixes have the same rules: they contain only letters, digits, space, and underscore, must start with a letter, are case insensitive, and spaces and underscores are interchangeable. Prefixes end with a ":". A prefix is only recognized if it is one of those specifically allowed by the software. For example, "de:name" is a link to the article "name" in the German Wikipedia, because "de" is recognized as one of the allowable interwikis. The title "talk:name" is a link to the article "name" in the "talk" namespace of the current wiki, because "talk" is a recognized namespace. Both may be present, and if so, the interwiki must come first, for example, "m:talk:name". If a title begins with a colon as its first character, no prefixes are scanned for, and the colon is just removed. Note that because of these rules, it is possible to have articles with colons in their names. "E. Coli 0157:H7" is a valid title, as is "2001: A Space Odyssey", because "E. Coli 0157" and "2001" are not valid interwikis or namespaces. Likewise, ":de:name" is a link to the article "de:name"--even though "de" is a valid interwiki, the initial colon stops all prefix matching.
Character mapping rules: Once prefixes have been stripped, the rest of the title processed this way: spaces and underscores are treated as equivalent and each is converted to the other in the appropriate context (underscore in URL and database keys, spaces in plain text). "Extended" characters in the 0x80..0xFF range are allowed in all places, and are valid characters. They are encoded in URLs. Other characters may be ASCII letters, digits, hyphen, comma, period, apostrophe, parentheses, and colon. No other ASCII characters are allowed, and will be deleted if found (they will probably cause a browser to misinterpret the URL). Extended characters are _not_ urlencoded when used as text or database keys.
Character encoding rules: TODO
Canonical forms: the canonical form of a title will always be returned by the object. In this form, the first (and only the first) character of the namespace and title will be uppercased; the rest of the namespace will be lowercased, while the title will be left as is. The text form will use spaces, the URL and DBkey forms will use underscores. Interwiki prefixes are all lowercase. The namespace will use underscores when returned alone; it will use spaces only when attached to the text title.
getArticleID() needs some explanation: for "internal" articles, it should return the "cur_id" field if the article exists, else it returns 0. For all external articles it returns 0. All of the IDs for all instances of Title created during a request are cached, so they can be looked up wuickly while rendering wiki text with lots of internal links.
Edit Log
- 2005-06-13 Transcribed from docs for MediaWiki version 1.4.5