HTTP vs. HTTPS in resource identification

Created by Hentschke, Jana, last modified on 2019-01-31


In November 2018, a breakout session titled "Cool URIs might be insecure if they don't change" took place at the SWIB18 conference. Participants wanted a place to record the results and follow up on the discussion. This wiki page serves that purpose.

It is hosted by the German DINI-AG Competence Centre Interoperable Metadata (KIM). Anyone interested in the topic can continue the exchange through comments on this page or through the mailing list of KIM's Working Group "Identifiers" (registration, public archive).

SWIB18 Breakout Session: Cool URIs might be insecure if they don't change

2018-11-27. Bonn, Germany. Friedrich-Ebert-Stiftung.

Participants: Lars G. Svensson (Deutsche Nationalbibliothek) (Initiator), Jana Hentschke (Deutsche Nationalbibliothek) (Initiator), Raphaëlle Lapôtre (Bibliothèque nationale de France), Pascal Christoph (Hochschulbibliothekszentrum NRW (hbz)), Michele Casalini (Casalini Libri), Carsten Klee (Staatsbibliothek zu Berlin - Preußischer Kulturbesitz), Tom Baker (Dublin Core Metadata Initiative), Alexander Jahnke (Niedersächsische Staats- und Universitätsbibliothek Göttingen), Martin Scholz (Friedrich-Alexander-Universität Erlangen-Nürnberg), Joachim Laczny (Staatsbibliothek zu Berlin – Preußischer Kulturbesitz)

Teaser:

Many (most?) providers of linked data publish their resources using http URIs as identifiers. http, however, is an insecure protocol, and there is a movement towards making the web, and thus the Semantic Web, a more secure and trusted place by moving from http to https. For the cosmos of linked RDF data, the seemingly small addition of one character changes a lot: the URI of every resource. And cool URIs don't change ...

Redirects from http to https URIs seem to solve the problem at first glance. But do they really? As long as the initial request for a resource is directed at its cool http URI, data is exchanged unprotected and could be intercepted (or even altered). On the other hand, changing http://example.org/resource123 to https://example.org/resource123 might cause some discomfort on the data consumer side, as data stores need to be updated and queries amended.
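The redirect concern can be made concrete with a minimal Python sketch. It inspects a redirect chain, given as a list of URIs in the order a client would request them, and reports every hop that is sent over plain http. The function name `insecure_hops` and the example chain are illustrative assumptions, not part of the session material, and no network access is performed.

```python
from urllib.parse import urlsplit


def insecure_hops(chain):
    """Return the URIs in a redirect chain that are requested over plain http.

    `chain` is a list of URIs in the order a client would request them
    (hypothetical input; a real client would record it while following redirects).
    """
    return [uri for uri in chain if urlsplit(uri).scheme.lower() == "http"]


# Assumed example: the cool http URI redirects to its https counterpart.
chain = [
    "http://example.org/resource123",   # initial request, unencrypted
    "https://example.org/resource123",  # redirect target, encrypted
]

exposed = insecure_hops(chain)
print(exposed)
```

Even though the chain ends on https, the first request and the redirect response itself travel unencrypted, which is where interception or tampering could happen.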

How are "Cool URIs" to be weighed against the trustworthiness of data providers, the privacy of users ... ?

Introductory slides: Should the Semantic Web switch to HTTPS.pptx (by Jana)

Discussion notes (arranged by subtopic by Jana):

Why can identification and data transfer not be considered two different things?

Statement #1 of the Linked Data Principles says "Use URIs as names for things" and statement #2 says "Use HTTP URIs so that people can look up those names". This stresses that URIs are used for identification as well as for data transfer.

... #3 is less important here, but #4 ("include links") could be a problem, since you might only have http URIs to link to (not https).

... TLS requires the use of https, but "Cool URIs don't change". The problem is that http traffic can be intercepted, so redirects aren't a solution: the http request can be intercepted and, e.g., a 303 redirect can be replaced by a redirect to another resource.

... We learned from Halpin that there is no simple solution. owl:sameAs just blows up the data volume and doesn't solve the problem on the protocol level (on the data level it does, but inferencing is needed to exploit this extra information).

... Halpin also says that there is no way in RDF to say that one URI is equivalent to another URI (owl:sameAs links individuals/entities, but not the identifiers).
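The owl:sameAs point can be illustrated with a minimal Python sketch that uses plain tuples as triples rather than an RDF library. The example data, the predicate URI `http://example.org/title`, and the helper names are assumptions made for illustration. The sketch shows that the http and https URIs are distinct names, that bridging them costs one extra triple per resource, and that a lookup by the https URI only finds the data after a (here very naive) inference step.

```python
OWL_SAME_AS = "http://www.w3.org/2002/07/owl#sameAs"

HTTP_URI = "http://example.org/resource123"
HTTPS_URI = "https://example.org/resource123"

# Assumed example data: one statement published under the http URI.
triples = {
    (HTTP_URI, "http://example.org/title", "Some title"),
}


def describe(graph, subject):
    """Return all (predicate, object) pairs stated directly about `subject`."""
    return {(p, o) for s, p, o in graph if s == subject}


def with_same_as_inference(graph):
    """Naive owl:sameAs inference: copy statements between linked subjects.

    Only handles subjects and a single pass; real OWL reasoning does more.
    """
    inferred = set(graph)
    for s, p, o in graph:
        if p == OWL_SAME_AS:
            for s2, p2, o2 in graph:
                if p2 == OWL_SAME_AS:
                    continue
                if s2 == s:
                    inferred.add((o, p2, o2))
                elif s2 == o:
                    inferred.add((s, p2, o2))
    return inferred


# Without any bridge, the https URI names nothing in this graph.
print(describe(triples, HTTPS_URI))

# Adding the bridge grows the data by one triple per resource ...
bridged = triples | {(HTTPS_URI, OWL_SAME_AS, HTTP_URI)}
# ... and a plain lookup still only sees the sameAs statement itself.
print(describe(bridged, HTTPS_URI))

# Only after inference is the title reachable via the https URI.
print(describe(with_same_as_inference(bridged), HTTPS_URI))
```

None of this touches the transport: a client that dereferences the http URI still sends its first request unencrypted, which is the protocol-level gap noted above.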

Why do 301 redirects not solve the problem?

Why do owl:sameAs relations not solve the problem?

Why are identifiers not independent from protocol?

That is currently the way it is for URIs/IRIs. There are other
