URI, URL, URN


The Semantic Dōjō - This article is part of a series.

Part 1: This Article

Part 2: RTFM: Understanding W3C’s Semantic Web Standards

You most likely already know what a URL—or web address—is, as most people who have ever used a web browser have had to type or click on one to access a webpage. What you might not know, though, is that a URL is a type of URI, and also that you have already encountered many URNs without realizing it. In the following paragraphs, you will find an explanation of these three concepts, the differences between them, and why understanding them is important.

Resources Need to be Identified

In order to retrieve a resource on the internet (whether it is a website, a document, a PDF, an image, a video, or audio), you need to identify it, much like a government needs to identify its citizens. For example, ‘James Smith’ is the most common full name in America. While there may be many people with this name, each individual can be uniquely identified by their Social Security Number (SSN). The SSN ensures that one James Smith can be distinguished from another. If the government instead wants to send a document to one particular James Smith, it uses his mailing address, since neither the name nor the SSN is enough to locate him.

Now, you are working on a project on Heritage, and you need to read a book titled Methods and Methodologies in Heritage Studies by Rachel King and Trinidad Rico. There are likely many books and articles about Heritage studies, but this one can be uniquely identified by its ISBN (International Standard Book Number). The ISBN for this book is 978-1-80008-381-3, which is a unique number that helps to identify it. Think of it like a person’s Social Security Number—it ensures that there’s no mix-up between this book and any other.

To access a digital copy, you would use a website address, such as https://uclpress.co.uk/book/methods-and-methodologies-in-heritage-studies/. This works like a mailing address—it tells your browser exactly where to go to find the book online.

Now, what if you’re looking for the book’s digital version but the webpage changes over time? This is where the DOI (Digital Object Identifier) comes in. The DOI for this book is doi:10.14324/111.9781800083790. A DOI is a permanent identifier that remains the same even if the book’s online location changes. You can also use a DOI to locate a resource, just as you would use a web address.

Congratulations! You’ve just encountered URNs, URLs, and URIs. But what exactly is the difference between them? If both a DOI and an ISBN serve as unique identifiers, how do they differ? And how does a DOI compare to a URL? To answer these questions, let’s dive into a bit of technical mumbo-jumbo.

URIs

At the most basic level, a URI (Uniform Resource Identifier) is a compact sequence of characters used to identify an abstract or physical resource. A URI can reference any entity, whether abstract, such as the moral and aesthetic concept of purity, or concrete, like the Colosseum in Rome. The structure of URIs is defined in RFC 3986, a normative specification published by the Internet Engineering Task Force (IETF) in January 2005.

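Generally, URIs follow a common pattern that includes five components: scheme, authority, path, query, and fragment. These elements are hierarchical, meaning the order in which they appear matters. Only the scheme and the path are mandatory, which is why the urn: example in the diagram has no authority, query, or fragment.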

   foo://example.com:8042/over/there?name=ferret#nose
   \_/   \______________/\_________/ \_________/ \__/
    |           |            |            |        |
 scheme     authority       path        query   fragment
    |   _____________________|__
   / \ /                        \
   urn:example:animal:ferret:nose

Breaking Down the Components

Scheme: every URI starts with a scheme, followed by a colon (:). The scheme specifies the protocol or method for interpreting the rest of the URI. For example, the mailto scheme, as in mailto:user@example.com, instructs the computer to open the default email client and pre-fill the “To” field with the specified email address (trust me, it does). The URI doi:10.14324/111.9781800083790 has the doi: scheme, but it is a bare, or unresolved, URI: on its own it does not provide a direct link to the resource and serves only as an identifier. To access the resource, you need a resolver, that is, a service or system that translates the URI into a working URL. In the case of a DOI, the resolver is an online service that looks up the DOI and redirects you to the current URL of the resource. For example, entering doi:10.14324/111.9781800083790 in the DOI resolver will direct you to one of the resource’s locations.

Many identifiers are designed to be location-independent, meaning they remain valid even if the resource moves. A resolver is only needed when you want to resolve (i.e., map) the identifier to its current location or a URL. However, while some identifiers (like DOIs) are designed to be resolvable, others (like ISSNs) were not originally intended to be resolvable but can be mapped to a resource using external systems. ISSN resolvers do exist, but the ISSN is not natively resolvable by design.

Authority: preceded by two forward slashes (//), this part identifies the server or endpoint where the resource is located. It typically includes the domain name (e.g., example.com) and other optional elements. In the resolved URI https://doi.org/10.14324/111.9781800083790 the authority is doi.org, which is the server responsible for resolving the DOI to its full URL.
Path: specifies the location of the resource on the server. In the web address https://uclpress.co.uk/book/methods-and-methodologies-in-heritage-studies/ mentioned above, uclpress.co.uk is the authority and /book/methods-and-methodologies-in-heritage-studies/ is the path that leads to the resource.
Query and Fragment: The query, which follows a question mark (?), allows for additional parameters or data to be passed to the resource. The fragment comes after a hash (#) and refers to a specific section within a resource, often used for bookmarks or references to a specific part of a webpage or document. The URL https://en.wikipedia.org/wiki/Semantic_Web#History will take you to the history section of the Wikipedia page on the Semantic Web.
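To make the five components concrete, here is a minimal sketch that splits the two example URIs from the diagram. Using Python and its standard urllib.parse.urlsplit function is an assumption of this example, not something RFC 3986 prescribes; note that urlsplit calls the authority "netloc".

```python
from urllib.parse import urlsplit

# The two example URIs from the RFC 3986 diagram.
hierarchical = urlsplit("foo://example.com:8042/over/there?name=ferret#nose")
name_like = urlsplit("urn:example:animal:ferret:nose")

# urlsplit exposes the authority component under the name "netloc".
print(hierarchical.scheme)    # foo
print(hierarchical.netloc)    # example.com:8042
print(hierarchical.path)      # /over/there
print(hierarchical.query)     # name=ferret
print(hierarchical.fragment)  # nose

# The urn: example has only a scheme and a path; the authority is empty.
print(name_like.scheme)       # urn
print(repr(name_like.netloc)) # ''
print(name_like.path)         # example:animal:ferret:nose
```

The second result shows why the diagram maps everything after urn: onto the path: without the leading //, there is no authority to extract.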

URIs vs URLs vs URNs

URI is thus a broad term that refers to any string of characters that uniquely identifies a resource, whether or not it provides the means to locate it. Essentially, a URI serves as a unique identifier for a resource on the internet or in a broader context. You might ask what the benefits of defining something with a URI are: a URI provides a globally unique way to identify a resource, preventing ambiguity. With URIs, resources can be unambiguously related, persistently identified, and easily accessed by machines. However, while all URLs and URNs are URIs, not every URI is a URL, and not every URI is a URN.

A URL (Uniform Resource Locator) is a specific type of URI that not only identifies a resource but also specifies how to locate and retrieve it. Each URL is meant to point to a single resource, though the same resource can have multiple URLs. A URL typically includes a protocol (such as http, https, or ftp) and a domain (like www.example.com) to indicate where the resource is located, along with an optional path, query parameters, and a fragment identifier that specify exactly how to access or interact with the resource. For example, in https://uclpress.co.uk/book/methods-and-methodologies-in-heritage-studies/ the scheme is https, which tells us that the resource should be accessed via the secure HTTP protocol. The authority, introduced by //, is uclpress.co.uk, the domain that hosts the UCL Press website. The path /book/methods-and-methodologies-in-heritage-studies/ specifies the exact location of the book’s webpage on that website.

The prominent characteristic of URLs is that they are typically tied to the current location of a resource and can change if the resource is moved to a different website or publisher platform. While a URL provides the present location of the book on the UCL Press website, a DOI ensures that the book remains findable and uniquely identified, no matter where it is hosted in the future. DOIs serve as persistent, stable identifiers for resources. Even if an article, book, or dataset is relocated to a different server or platform, the DOI will always point to the correct content. This guarantees long-term access and citation accuracy.

A URN (Uniform Resource Name) identifies a resource but provides no immediate means to access it. As exemplified above, typical examples are the ISBN and the ISSN. From a technical perspective:

A DOI, in its base form, is a URN: a unique identifier that does not, by itself, say where the resource is located.
When a DOI is resolved (e.g., via https://doi.org/), it effectively functions as a URL.

In summary:

The book’s ISBN is a URN (not resolvable by design).
The book’s DOI is a URN (resolvable by design).
The book’s webpage is a URL (tied to its current location).

All of these are types of URIs (Uniform Resource Identifiers).
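The summary above can be sketched as a small lookup on the scheme. This is only an illustration of the article's taxonomy: the function name classify and the two scheme sets are hypothetical and far from exhaustive, and writing the ISBN as urn:isbn:… (the formal URN notation) is an assumption of this example.

```python
from urllib.parse import urlsplit

# Hypothetical, deliberately tiny lookup tables for this sketch.
LOCATOR_SCHEMES = {"http", "https", "ftp"}
NAME_SCHEMES = {"urn", "doi"}  # doi is treated as a name, as in the article

def classify(uri: str) -> str:
    """Label a URI as 'URL' or 'URN' following the article's taxonomy."""
    scheme = urlsplit(uri).scheme  # urlsplit already lowercases the scheme
    if scheme in LOCATOR_SCHEMES:
        return "URL"
    if scheme in NAME_SCHEMES:
        return "URN"
    return "URI"  # scheme not covered by this sketch

book_identifiers = [
    "urn:isbn:978-1-80008-381-3",
    "doi:10.14324/111.9781800083790",
    "https://uclpress.co.uk/book/methods-and-methodologies-in-heritage-studies/",
]
for uri in book_identifiers:
    print(classify(uri), uri)
```

Run on the book's three identifiers, the sketch labels the first two as URN and the last as URL.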

So, are there URIs that are not URLs or URNs? Technically, no. In the formal specification of URIs (RFC 3986, mentioned above), a URI can be classified as a locator, a name, or both. In common usage, you might hear the term “URI” used more generally, especially when discussing schemes or identifiers that don’t fit neatly into the categories of URL or URN. But from a technical standpoint, any URI you encounter will be a URL, a URN, or a combination of the two.

You might wonder what the point is of having both the URN doi:10.14324/111.9781800083790 and the resolved URL https://doi.org/10.14324/111.9781800083790. Wouldn’t https://doi.org/10.14324/111.9781800083790 be enough? The URN and the URL serve different purposes, and each has its distinct role in the broader ecosystem of digital resource identification and retrieval.
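The relationship between the two forms can be sketched in a few lines. The helper below only builds the resolver URL from the bare identifier; the actual lookup and redirect happen on the resolver's side and are not performed here. The name doi_to_url is hypothetical, and the resolver address is the https://doi.org/ one used in this article.

```python
from urllib.parse import quote

DOI_RESOLVER = "https://doi.org/"  # resolver address used in the article

def doi_to_url(doi_uri: str) -> str:
    """Turn 'doi:<name>' into a resolver URL; no network request is made."""
    prefix = "doi:"
    if not doi_uri.lower().startswith(prefix):
        raise ValueError(f"not a doi: URI: {doi_uri!r}")
    doi_name = doi_uri[len(prefix):]
    # Percent-encode characters that are unsafe in a URL path, keeping "/".
    return DOI_RESOLVER + quote(doi_name, safe="/")

print(doi_to_url("doi:10.14324/111.9781800083790"))
# https://doi.org/10.14324/111.9781800083790
```

The identifier itself never changes; only the resolver knows, at any given time, which location it currently maps to.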

Special Guest: IRIs

URIs are written exclusively with characters from the American Standard Code for Information Interchange (ASCII, pronounced /ˈæskiː/). In practice, this means the Latin alphabet without accents, the digits, and a few punctuation symbols: the unreserved characters (A-Z, a-z, 0-9, - . _ ~) plus reserved delimiters such as : / ? and #. ASCII has been in use since 1963 and is supported by virtually all computer systems, which ensures compatibility and prevents encoding issues.

Internationalized Resource Identifiers (IRIs) were introduced in 2005 as part of RFC 3987. They were created to allow web addresses (URIs) to include characters from a wider range of languages beyond the ASCII character set. While URIs were limited to a small set of characters (A-Z, a-z, 0-9, and a few symbols), IRIs support Unicode characters, making it possible to use languages such as Japanese, Arabic, Chinese, and many others directly in URLs. For example, in a typical URI, you would see a web address like https://en.wikipedia.org/wiki/Zeus, but with IRIs, you could have a web address like https://ja.wikipedia.org/wiki/ゼウス for the Japanese Wikipedia page about Zeus, which includes Katakana characters.

However, older systems or protocols that do not support IRIs still require percent-encoding for compatibility. This converts the Unicode characters into a format that only uses ASCII characters, which helps older systems interpret them. For instance, the IRI https://ja.wikipedia.org/wiki/ゼウス would be encoded as https://ja.wikipedia.org/wiki/%E3%82%BC%E3%82%A6%E3%82%B9.
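The conversion can be reproduced with a short sketch. Using Python's standard urllib.parse.quote and unquote is an assumption of this example; only the path of the IRI is encoded here, since that is the part containing the Katakana characters.

```python
from urllib.parse import quote, unquote

iri_path = "/wiki/ゼウス"

# quote() encodes the text as UTF-8 and writes each non-ASCII byte as %XX,
# leaving "/" untouched by default.
encoded_path = quote(iri_path)
print("https://ja.wikipedia.org" + encoded_path)
# https://ja.wikipedia.org/wiki/%E3%82%BC%E3%82%A6%E3%82%B9

# unquote() reverses the conversion.
print(unquote(encoded_path) == iri_path)  # True
```

Each Katakana character takes three bytes in UTF-8, which is why the three characters become nine %XX groups.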

Key Takeaways

URIs are the backbone of the internet, ensuring unique identification, access, interconnectivity, and data exchange across systems. Without them, the web wouldn’t exist as we know it.

In the Semantic Web, precise identification is essential. The term “semantic” implies that we want the web to reason and infer meaning from data. For this to work, information must be interconnected and unambiguously identifiable. Take The Night Watch as an example—one of Rembrandt’s most famous paintings and a masterpiece of the Dutch Golden Age. In the 17th century, a copy was commissioned from Gerrit Lundens. But look at how many more unrelated resources go by the same name.
