Proper RFC 4122 UUIDs as GUIDs in WordPress

UUIDs (Universally Unique IDentifiers), also known as GUIDs (Globally Unique IDentifiers), are identifiers, usually written as strings, for pieces of information in computer systems. WordPress uses GUIDs to identify each individual post, but it uses (something like) URLs as GUIDs, and thus does not follow the standard definition of a UUID (or GUID) in RFC 4122.


A WordPress GUID: https://bjornjohansen.no/?p=1901
A proper RFC 4122 UUID as a URN: urn:uuid:65396530-3934-5930-a563-343736343835

As you can see, the WordPress GUID isn’t even the regular permalink to a post (if you have pretty permalinks enabled, and most people do). But since slugs in permalinks may change, and an identifier must be immutable to work as an identifier, we need a GUID that doesn’t change. So it makes sense that WordPress uses that URL. This also makes it easy to mistake the GUIDs in WordPress for URLs you can use for something other than identification. But the GUIDs in WordPress should never be treated as URLs. They simply are not. They are IDs that just happen to also be URLs in the default WordPress implementation, and they are not URLs we want to expose anywhere.

Both URLs and URNs are URIs, but a GUID should be a URN, as it names a resource rather than locating it. The difference between URI, URL and URN is well explained here.

To avoid any possible confusion about whether WordPress GUIDs are URLs, and to make them compatible with the UUID format that the rest of the world uses, we can use the wonderful Plugin API and hook into WordPress to use proper RFC 4122 UUIDs.

About UUIDs and versions (subtypes)

A UUID is 128 bits long, and requires no central registration process.

Adoption of UUIDs and GUIDs is widespread, with many computing platforms providing support for generating them, and for parsing their textual representation.

– Wikipedia (on RFC 4122 UUIDs)

RFC 4122 defines different versions, or subtypes, of UUIDs. Version 4 is the easiest to use, as it is based entirely on random (or pseudo-random) bits: 122 of its 128 bits are random, and the remaining 6 encode the version and variant. UUID v3 and v5 are name-based: they specify how to derive a UUID from a name within a namespace, and the URL namespace lets us use URLs as the names. The difference between v3 and v5 is that v3 uses MD5, whereas v5 uses SHA-1. RFC 4122 prefers SHA-1 where backwards compatibility with MD5 isn’t necessary.
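To make the version and variant bits concrete, here is a minimal sketch of building a version 4 UUID by hand. It assumes PHP 7.0 or later for random_bytes(), and the function name uuid_v4_sketch() is hypothetical; in WordPress you would normally just call wp_generate_uuid4().

```php
<?php
/**
 * Hypothetical helper: build an RFC 4122 version 4 UUID from a CSPRNG.
 * Assumes PHP 7.0+ for random_bytes().
 */
function uuid_v4_sketch() {
	// 16 random octets (128 bits) from the operating system's CSPRNG.
	$octets = random_bytes( 16 );

	// Octet 6: keep the low 4 bits, set the high 4 bits to 0100 (version 4).
	$octets[6] = chr( ( ord( $octets[6] ) & 0x0f ) | 0x40 );

	// Octet 8: keep the low 6 bits, set the high 2 bits to 10 (RFC 4122 variant).
	$octets[8] = chr( ( ord( $octets[8] ) & 0x3f ) | 0x80 );

	// Format the 32 lowercase hex digits as 8-4-4-4-12.
	return vsprintf( '%s%s-%s-%s-%s-%s%s%s', str_split( bin2hex( $octets ), 4 ) );
}

echo 'urn:uuid:' . uuid_v4_sketch(), "\n";
```

Only 6 of the 128 bits are fixed, which is where the 2^122 figure for version 4 comes from.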

UUID versions (subtypes) that are interesting to us:
UUID Version 4: Based on random bits. Gives us 2^122 different combinations, so collisions should never be an issue in practice. Really. There are 7.38e26 possible UUIDs for each human being on the planet.
UUID Version 5: Based on a SHA-1 hash of the URL namespace UUID and a URL. The same URL always gives the same UUID, and for two different URLs the chance of a collision is about 1 in 2^122.
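The figures above can be reproduced with a few lines of arithmetic. This sketch assumes a world population of about 7.2 billion, and its collision estimate uses the usual birthday approximation p ≈ n² / (2 · 2^122) for n random version 4 UUIDs, with a hypothetical n of one billion.

```php
<?php
// Version 4 UUIDs have 122 random bits (6 of the 128 bits are fixed).
$v4_space = 2 ** 122;

// Assumption: a world population of roughly 7.2 billion people.
$population = 7.2e9;
$per_person = $v4_space / $population;

// Birthday approximation for n random UUIDs, valid while the probability is small.
// Assumption: one billion generated GUIDs.
$n = 1e9;
$p_collision = ( $n * $n ) / ( 2 * $v4_space );

printf( "%.2e UUIDs per person\n", $per_person );        // 7.38e+26 UUIDs per person
printf( "%.2e collision probability\n", $p_collision );  // 9.40e-20 collision probability
```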

Hooking into WordPress

Because of how WordPress saves new posts, UUID v4 is the most efficient choice, as the UUID can be included in the SQL INSERT that creates the new post.

By default, WordPress sets the GUID with an SQL UPDATE after the first INSERT, as the new post ID is required to create the “permalink”. If we want to use UUID v5, based on a “permalink”, there is unfortunately no filter for that GUID update, so we have to hook in a little later (on the save_post action), check that the post is new, and then run yet another SQL UPDATE to set the GUID field to a proper UUID v5 string.

However, IMHO, since we are in fact dealing with articles that are assigned unique URLs (permalinks), we shouldn’t have to resort to using random UUIDs (v4). I think I’ll settle on using version 5, but it is up to you to make your own decision.

Using UUID version 4

This is the computationally most efficient option, as we filter the UUID into the post data before it is inserted into the database.

To use UUID version 4 for your GUIDs in WordPress, you can add this snippet, e.g. as an mu-plugin:

<?php
add_filter( 'wp_insert_post_data', function ( $data ) {
	// Only posts without a GUID (normally new ones) get a UUID; existing GUIDs are never touched.
	if ( '' === $data['guid'] ) {
		$data['guid'] = wp_slash( 'urn:uuid:' . wp_generate_uuid4() );
	}
	return $data;
} );

And that’s really everything that’s needed!

(Thanks to Dominik Schilling for pointing out to me that WordPress introduced wp_generate_uuid4() in version 4.7, so you don’t need to bring your own implementation.)

Using UUID version 5

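This version is not based on (pseudo-)randomness, and the UUIDs are unique as long as the URLs they are based on are unique (barring a SHA-1 collision). It does, however, require two SQL UPDATE queries after the initial INSERT: the default one from WordPress and our own. That should not really be an issue in most (any?) cases.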

The UUIDs are based on the URLs that WordPress uses for GUIDs by default, but they follow the standardized format for UUIDs as URNs and will not be mistaken for URLs.

To use UUID version 5 for your GUIDs in WordPress, you can add this snippet, e.g. as an mu-plugin:

<?php
add_action( 'save_post', function( $post_ID, $post = null, $update = false ) {

	/*
	 * We’ll only update the GUIDs when inserting new posts.
	 * A GUID should never be changed for an existing post.
	 */
	if ( ! $update ) {
		global $wpdb;

		$where = array(
			'ID' => $post_ID,
		);

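		$wpdb->update( $wpdb->posts, array(
			'guid' => 'urn:uuid:' . uuid_v5( get_permalink( $post_ID ) ),
		), $where );

		// The direct database write bypasses the object cache, so flush the cached post.
		clean_post_cache( $post_ID );
	}
}, 10, 3 ); // Accept all three arguments, otherwise $update would always be false.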

Unlike with UUID version 4, you need to bring your own UUID implementation (the uuid_v5() function in the example above).

UUID version 5 implementation

Here’s a ready-to-use, RFC 4122 compliant implementation of UUID version 5 (name-based, with SHA-1 hashing). Save it as an mu-plugin, e.g. uuid.php:

<?php
/**
 * RFC 4122 compliant UUIDs.
 *
 * The RFC 4122 specification defines a Uniform Resource Name namespace for
 * UUIDs (Universally Unique IDentifier), also known as GUIDs (Globally
 * Unique IDentifier).  A UUID is 128 bits long, and requires no central
 * registration process.
 *
 * @package UUID
 * @license https://www.gnu.org/licenses/gpl-2.0.txt GPLv2
 * @author bjornjohansen
 */

if ( ! function_exists( 'uuid_v5' ) ) {
	/**
	 * RFC 4122 compliant UUID version 5.
	 *
	 * @param  string $name    The name to generate the UUID from.
	 * @param  string $ns_uuid Namespace UUID. Default is for the NS when name string is a URL.
	 * @return string          The UUID string.
	 */
	function uuid_v5( $name, $ns_uuid = '6ba7b811-9dad-11d1-80b4-00c04fd430c8' ) {

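		// Convert the namespace UUID from its string form to 16 raw octets (network byte order).
		$ns_octets = hex2bin( str_replace( '-', '', $ns_uuid ) );

		// Compute the SHA-1 hash of the namespace octets concatenated with the name, as raw binary.
		$hash = sha1( $ns_octets . $name, true );

		// Initialize the octets with the first 16 octets of the hash, and adjust specific bits later.
		$octets = str_split( substr( $hash, 0, 16 ), 1 );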

		/*
		 * Set version to 0101 (UUID version 5).
		 *
		 * Set the four most significant bits (bits 12 through 15) of the
		 * time_hi_and_version field to the appropriate 4-bit version number
		 * from Section 4.1.3.
		 *
		 * That is 0101 for version 5.
		 * time_hi_and_version is octets 6–7
		 */
		$octets[6] = chr( ord( $octets[6] ) & 0x0f | 0x50 );

		/*
		 * Set the UUID variant to the one defined by RFC 4122, according to RFC 4122 section 4.1.1.
		 *
		 * Set the two most significant bits (bits 6 and 7) of the
		 * clock_seq_hi_and_reserved to zero and one, respectively.
		 *
		 * clock_seq_hi_and_reserved is octet 8
		 */
		$octets[8] = chr( ord( $octets[8] ) & 0x3f | 0x80 );

		// Hex encode the octets for string representation.
		$octets = array_map( 'bin2hex', $octets );

		// Return the octets in the format specified by the ABNF in RFC 4122 section 3.
		return vsprintf( '%s%s-%s-%s-%s-%s%s%s', str_split( implode( '', $octets ), 4 ) );
	}
}// End if().
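To sanity-check GUIDs, whether freshly generated or already stored in the database, a small format check can help. The helper below, uuid_rfc4122_version(), is a hypothetical name; it only verifies the textual layout, the RFC 4122 variant bits and the version digit, not that a version 5 UUID was hashed from the right name.

```php
<?php
/**
 * Hypothetical helper: return the version number of an RFC 4122 UUID,
 * or null if the string is not a UUID with the RFC 4122 variant.
 * An optional "urn:uuid:" prefix is accepted.
 */
function uuid_rfc4122_version( $uuid ) {
	// Strip an optional URN prefix (case-insensitive) before checking the format.
	$uuid = preg_replace( '/^urn:uuid:/i', '', $uuid );

	// 8-4-4-4-12 hex digits; the first digit of the fourth group (the variant) must be 8, 9, a or b.
	$pattern = '/^[0-9a-f]{8}-[0-9a-f]{4}-([0-9a-f])[0-9a-f]{3}-[89ab][0-9a-f]{3}-[0-9a-f]{12}$/i';
	if ( ! preg_match( $pattern, $uuid, $m ) ) {
		return null;
	}

	// The version is the first hex digit of the third group.
	return hexdec( $m[1] );
}

var_dump( uuid_rfc4122_version( 'urn:uuid:65396530-3934-5930-a563-343736343835' ) ); // int(5)
var_dump( uuid_rfc4122_version( 'https://bjornjohansen.no/?p=1901' ) );               // NULL
```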

One last word of caution

Please do not think that UUIDs have anything at all to do with security. Do not use them as such.

Do not assume that UUIDs are hard to guess; they should not be used as security capabilities (identifiers whose mere possession grants access), for example.

– RFC 4122



