Uncertainty

NB: The final version of this part of our project documentation is being produced in French. The English version will be completed and updated by the end of 2023.

These data and metadata have characteristics that make them “imperfect” in the eyes of computer scientists. This “imperfection” stems from their uncertainty, gaps and imprecision. It is worth pointing out that historians are used to working with this uncertainty, which is inherent in all traces of the past. It may prompt the historian to seek out other sources that bear witness to the practices of the time. For example, we might wonder about the changing tonnages of what we clearly consider to be the same ship: tonnages seem to be systematically overestimated in some ports and underestimated in others, so this “imperfection” can lead to a historical question (why and where was it possible to do so?).

The fact remains, however, that data processing requires these uncertainties, gaps and inaccuracies to be formalized in a coherent and explicit manner, so that first the files exported from the databases, and then the visualization tools, can take them properly into account. In some instances, the team members made a decision to resolve an uncertainty (is the last letter of a name an “s” or a “t”?), but more often we introduced specific markers that make our interventions on the data detectable. Knowing that a field’s content is derived by interpretation makes it easier to correct errors of understanding or interpretation on our side. It also enables us to qualify uncertainty and to visualize it.

A considerable part of the PORTIC program is devoted precisely to the visualization of data uncertainty, which often tends to “disappear” in visualization tools. PORTIC aims to give a visual account of the gaps and uncertainties surrounding the data collected, and not to “erase” their imprecise nature.

This section describes the various types of uncertainty we have identified, and how we have proceeded.

Uncertainty specific to the sources

The sources mobilized for Navigocorpus are incomplete, insofar as a number of registers of ship departures have not been preserved. The extent of this incompleteness is itself uncertain, insofar as for a (limited) number of ports we do not know whether the registers ever existed.

There are also questions of completeness and uncertainty within each register. Clerks did not always note down the same information: sometimes they omitted a variable, and sometimes they made mistakes.

They also followed administrative practices that they did not necessarily take the time to put down in writing, and which we must therefore interpret in order to use the source, at the risk of being wrong in our turn.

We still have open questions. This document gives you an idea of the challenges that we faced.

Incomplete or redundant sources

We are aware of a number of missing departure registers: other reports (comptes rendus) allow us to put a figure on the number of clearance permits (congés) issued, although we do not know the details of each of them. We integrated these ports into the visualizations to show the gaps in the corpus.

We know that some ports existed, but we have neither the registers nor the reports. This is the case, among others, for most of the colonies, but also for Libourne and other ports of a certain size, which we listed as far as possible on the basis of Chardon’s survey in the 1780s. For minor ports, however, we do not know whether captains took their permit on site or in the closest possible port.

In Marseille, historians have so far considered that double entries between the registers of small-scale coastal trade and the captains’ depositions were minimal. We found over 100 per year (ca. 5% of the total). We are exploring the causes of this phenomenon, and have processed the information to avoid double entries when querying the database.

Incomplete and imprecise register entries

Certain information is systematically present in some ports (e.g. home port) and systematically absent from others. The clerk may occasionally have omitted the name of the ship, the captain, the tonnage, or even the destination. In the “wizsources” visualizations, it is possible to see the proportion of known information for the main variables.
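The proportion of known information per variable and per port can be computed along the following lines. This is a minimal Python sketch, not the project's implementation: the record layout, the field names (`port`, `ship_name`, `tonnage`, `homeport`) and the sample values are assumptions made for illustration only.

```python
from collections import defaultdict

# Hypothetical register entries; None marks a value the clerk left out.
entries = [
    {"port": "Bordeaux", "ship_name": "Belle Poule", "tonnage": 120, "homeport": None},
    {"port": "Bordeaux", "ship_name": None, "tonnage": 80, "homeport": None},
    {"port": "Marseille", "ship_name": "Saint Jean", "tonnage": None, "homeport": "Marseille"},
    {"port": "Marseille", "ship_name": "Marie", "tonnage": 45, "homeport": "Agde"},
]

VARIABLES = ("ship_name", "tonnage", "homeport")


def known_share_by_port(rows, variables=VARIABLES):
    """Return {port: {variable: share of entries where the value is present}}."""
    totals = defaultdict(int)
    known = defaultdict(lambda: defaultdict(int))
    for row in rows:
        port = row["port"]
        totals[port] += 1
        for var in variables:
            if row.get(var) is not None:
                known[port][var] += 1
    return {
        port: {var: known[port][var] / totals[port] for var in variables}
        for port in totals
    }


shares = known_share_by_port(entries)
```

In this invented sample, the home port is never recorded in Bordeaux and always recorded in Marseille, which mirrors the "systematically present or absent" pattern described above.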

We have also noted a number of inaccuracies in the sources. Some are due to the fact that the clerks obviously noted down what they heard orally, without always having a written document in front of them. Their transcriptions of place names and personal names are odd whenever the term sounded strange to their ears (such as Breton or foreign names, or foreign localities rarely heard of in their port). In Les Sables-d’Olonne, Aber-Ildut is called “La Berlduque”. In some ports, foreign ship names are systematically translated into French, with possible errors. The variations in tonnage intrigued us most of all, and we processed them with specific data mining procedures.

Ontological uncertainty: the future of the past

By definition, the future is uncertain. So any declaration of a future destination points to the uncertain status of the information. The ship may have been wrecked, captured or hit by bad weather, forcing it to change course.

In a dedicated Navigocorpus field (pointcall_status), we have therefore recorded the status of the information for each location visited by the ship by means of a marker (“P” for the past, “F” for the future; in the screenshot below, the status is in the last field, to the right of the port name). This status is relative to the point at which the voyage was observed.

The “certainty” indicated by the second letter “C” is therefore relative to the documentary entry (the indication provided by a given source for a given ship at a given date), and should not be understood in an absolute sense. Thus, if the captain declared that he was going to Nantes on leaving Bordeaux, Nantes will be associated with an “FC” status (“future certain”: that is, “certain” for the captain at the date of the declaration). The question of whether other sources confirm or invalidate this declaration is a different one, and has therefore been dealt with separately in another field of the database (see below, section: route uncertainty). “Certain” in this context is therefore not synonymous with “proven”, and this is made clear to users who query the online visualizations.

However, we are also able to confirm whether or not this ship actually arrived in Nantes, as we have the Nantes congés. It is therefore also possible to measure the degree to which declared intentions were fulfilled, at least for French destinations whose records have been preserved.
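One way to measure how often declared intentions were fulfilled is sketched below in Python. It is an illustration under stated assumptions, not the project's method: the record layouts, the field names (`declared_on`, `seen_on`, `destination`, `port`) and the sample data are hypothetical, and a declaration is counted as fulfilled as soon as the same `ship_id` is observed in the declared port on or after the date of the declaration.

```python
from datetime import date

# Hypothetical declared destinations (status FC).
declared = [
    {"ship_id": "S1", "destination": "Nantes", "declared_on": date(1787, 3, 1)},
    {"ship_id": "S2", "destination": "Nantes", "declared_on": date(1787, 3, 5)},
    {"ship_id": "S3", "destination": "Bordeaux", "declared_on": date(1787, 4, 2)},
]
# Hypothetical observed presences (status PC) in ports whose registers survive.
observed = [
    {"ship_id": "S1", "port": "Nantes", "seen_on": date(1787, 3, 20)},
    {"ship_id": "S3", "port": "La Rochelle", "seen_on": date(1787, 4, 15)},
]


def is_fulfilled(declaration, observations):
    """True if the same ship is later observed in the declared destination."""
    return any(
        obs["ship_id"] == declaration["ship_id"]
        and obs["port"] == declaration["destination"]
        and obs["seen_on"] >= declaration["declared_on"]
        for obs in observations
    )


def fulfilment_rate(declarations, observations):
    """Share of declarations confirmed by a later observation (None if empty)."""
    if not declarations:
        return None
    confirmed = sum(is_fulfilled(d, observations) for d in declarations)
    return confirmed / len(declarations)


rate = fulfilment_rate(declared, observed)
```

Such a rate is only meaningful for destinations whose registers are preserved; a declaration towards a port without surviving records can never be confirmed this way.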

When the intended destination is contradicted by the same source that declares the intention (in the case, for example, of a capture at sea), the second letter “U” indicates the unfulfilled intention. For example, the vessel La Ville d’Yverdun, which left Marseille on December 28, 1787, bound for Smyrna, returned to Marseille four days later, after suffering damage on the Frioul islands. The Smyrna destination is therefore associated with the status “PU”, indicating an unfulfilled past intention.
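The two-letter status described above can be decoded as follows. This is a minimal Python sketch: the class and function names are ours, and the parser accepts any combination of the documented letters (P or F, then C or U), including combinations that this page does not explicitly discuss.

```python
from dataclasses import dataclass

TENSE = {"P": "past", "F": "future"}
OUTCOME = {"C": "certain", "U": "unfulfilled"}


@dataclass(frozen=True)
class PointcallStatus:
    tense: str    # "past" or "future", relative to the moment of observation
    outcome: str  # "certain" (for the declarant) or "unfulfilled"


def parse_status(code: str) -> PointcallStatus:
    """Decode a two-letter status such as "PC", "FC" or "PU"."""
    code = code.strip().upper()
    if len(code) != 2 or code[0] not in TENSE or code[1] not in OUTCOME:
        raise ValueError(f"unrecognised pointcall status: {code!r}")
    return PointcallStatus(TENSE[code[0]], OUTCOME[code[1]])


smyrna = parse_status("PU")  # the Smyrna destination of La Ville d'Yverdun
```

Here `smyrna.tense` is "past" and `smyrna.outcome` is "unfulfilled". As the text stresses, "certain" describes the declaration, not a proven fact.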

The certainty of the proven voyage is reflected in all the associated data. Thus, in the Bordeaux to Nantes case above, if the departure of a tonnage from Bordeaux is certain, as attested by the source, the arrival of this tonnage (or this captain, or this product loaded in Bordeaux, etc.) is uncertain. An uncertainty value is therefore assigned to each variable for display purposes, to enable the user to assess whether the information is certain or not.

Limits which are inherent to the nature of sources

The legislation in force in France obliged captains to take out a departure permit (congé). There were, however, exceptions, and there were different types of congés, which complicated our work of reconstructing ship itineraries:

– for a round trip within the same admiralty office, a single outward permit was sufficient. It is therefore the existence of a subsequent permit for the same ship and captain that attests to a return voyage, which does not exist as such in the Navigocorpus database. While the existence of the voyage can thus be indirectly attested, neither its date nor its cargo is known. There is therefore a systematic underestimation of the flow of goods and ships, which can be partly identified.

– some admiralty offices issued fixed-term permits (ranging from 3 to 12 months). This was the case, for example, in Brittany and Normandy, for ships sailing within the province. Permits for fresh-fish fishing were generally valid for a whole year. As the sources do not always specify the duration of the congé, we will endeavor to flag this type of permit whenever possible, and to take into account the underestimation of port movements that it entails.
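The first rule above (a return voyage attested only by a subsequent permit for the same ship and captain) can be illustrated with a short Python sketch. Everything here is an assumption made for the example: the field names (`office`, `issued_on`), the sample permits, and the simplification that any two consecutive permits issued by the same office to the same `ship_id` and `captain_id` imply one return in between. The return is recorded with a date bracket only, since neither its date nor its cargo is known.

```python
from datetime import date
from itertools import groupby

# Hypothetical permits (congés); identifiers and dates are invented.
permits = [
    {"ship_id": "S1", "captain_id": "C1", "office": "Bordeaux", "issued_on": date(1787, 1, 10)},
    {"ship_id": "S1", "captain_id": "C1", "office": "Bordeaux", "issued_on": date(1787, 2, 14)},
    {"ship_id": "S2", "captain_id": "C7", "office": "Bordeaux", "issued_on": date(1787, 1, 20)},
]


def group_key(row):
    return (row["ship_id"], row["captain_id"], row["office"])


def implied_returns(rows):
    """Infer one undated return between consecutive permits of the same group."""
    ordered = sorted(rows, key=lambda r: (group_key(r), r["issued_on"]))
    inferred = []
    for (ship_id, captain_id, office), group in groupby(ordered, key=group_key):
        dates = [r["issued_on"] for r in group]
        for earlier, later in zip(dates, dates[1:]):
            inferred.append({
                "ship_id": ship_id,
                "captain_id": captain_id,
                "office": office,
                "returned_after": earlier,   # lower bound only
                "returned_before": later,    # upper bound only
                "cargo": None,               # unknown by construction
            })
    return inferred


returns = implied_returns(permits)
```

With the sample data, a single return is inferred for ship S1, somewhere between the two Bordeaux permits; ship S2 has only one permit, so nothing can be inferred for it.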

Starting in 1786, the Marseille Health Office’s collection includes the “cahiers du petit cabotage” (coastal trade notebooks), which record voyages not included in the registers of captains’ depositions, preserved from 1709 onwards. Entrances into Marseille are therefore structurally underestimated in the captains’ depositions. Our analysis of the cahiers du petit cabotage makes it possible to determine at least the criteria that characterize the missing entrances.

Uncertainty linked to identifiers

As with any database, we made a distinction between data entry, which respects as far as possible the original manuscript source and its spelling, and the attribution of identifiers in dedicated fields, which aims to standardize the information. This section explains the choices we have made, the degree of uncertainty involved, and the way we have chosen to treat or represent it.

Ship identifiers

We have at our disposal a very dense set of information from different sources, which makes it possible to identify ships and trace their itineraries over time. However, nothing in the sources confirms that two entries refer to one and the same ship. We assigned an identifier (ship_id) manually, taking into account proximity of name, similarity of tonnage, similarity of captain’s name, proximity of home port and any other contextual information.

This identification is, strictly speaking, systematically uncertain, and identification errors do exist, especially for very frequent ship names. We will continue to correct them throughout the program and thereafter, notably through an algorithm developed by the team (2021) to “predict” the possibility that two or more documentary entries refer to one and the same ship.
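The criteria listed above (name, tonnage, captain, home port) can be combined into a rough similarity score to suggest candidate matches for manual review. The Python sketch below is purely illustrative: it is not the algorithm developed by the team, which is not described here, and the weights, field names and sample entries are arbitrary assumptions. String proximity uses `difflib.SequenceMatcher` from the standard library.

```python
from difflib import SequenceMatcher

# Arbitrary illustrative weights; they sum to 1.
WEIGHTS = {"ship_name": 0.4, "captain_name": 0.3, "tonnage": 0.2, "homeport": 0.1}


def text_similarity(a, b):
    """Similarity in [0, 1] between two strings; 0 if either is missing."""
    if not a or not b:
        return 0.0
    return SequenceMatcher(None, a.lower(), b.lower()).ratio()


def tonnage_similarity(t1, t2):
    """Ratio of the smaller to the larger tonnage; 0 if either is missing."""
    if not t1 or not t2:
        return 0.0
    return min(t1, t2) / max(t1, t2)


def match_score(e1, e2):
    """Weighted similarity between two documentary entries, in [0, 1]."""
    parts = {
        "ship_name": text_similarity(e1.get("ship_name"), e2.get("ship_name")),
        "captain_name": text_similarity(e1.get("captain_name"), e2.get("captain_name")),
        "tonnage": tonnage_similarity(e1.get("tonnage"), e2.get("tonnage")),
        "homeport": text_similarity(e1.get("homeport"), e2.get("homeport")),
    }
    return sum(WEIGHTS[name] * value for name, value in parts.items())


# Invented entries for demonstration.
a = {"ship_name": "La Belle Poule", "captain_name": "Jean Durand", "tonnage": 120, "homeport": "Le Havre"}
b = {"ship_name": "Belle Poule", "captain_name": "Jean Durant", "tonnage": 118, "homeport": "Le Havre"}
c = {"ship_name": "Saint Pierre", "captain_name": "Louis Martin", "tonnage": 40, "homeport": "Marseille"}
```

Here `match_score(a, b)` is higher than `match_score(a, c)`. A score of this kind can only rank candidates; for very frequent ship names it does not replace the contextual judgment described above.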

Flag attribution [ship_flag]

The flag is rarely indicated by the sources we exploited.

In most cases, we have assigned a flag (recorded in the database in square brackets) on the basis of other variables, such as the ship’s homeport or, if this was missing, the captain’s residence (ships with a Saint-Tropez captain or a “Catalan” captain are thus assigned the [French] and [Spanish] flags respectively).

For ships that bought a departure permit (congés, subseries G5), additional information was provided by the nature of the congé (there are different series of congés: French, foreign, colonial, etc.). We ought to bear in mind, however, that in some French ports Spanish ships benefited from preferential treatment and could therefore buy the cheaper French permit. In general, ships holding a French congé have been given the flag indication [French], even in the absence of any other field that might give clues as to the ship’s flag.

The allocation of a flag that is indeterminate but not French [NonFrench] characterizes all ships that have acquired a foreign congé in France and for which no other element in the documentary unit or in the documentary units of the same ship (=same ship_id) makes it possible to assume the nationality.

The information in square brackets is qualified as uncertain in the visualizations [level -2].
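The attribution rules of this section can be summarized as a small decision procedure. The Python sketch below is an assumption-laden illustration rather than the project's code: the field names (`flag`, `homeport`, `captain_residence`, `conge_type`), the place-to-flag lookup table and the exact order of precedence between the rules are ours. Bracketed values are returned with the uncertainty level -2, a flag stated by the source with level 0, and the absence of any clue with level -4.

```python
# Hypothetical lookup from a place to the flag it suggests.
PLACE_TO_FLAG = {
    "Saint-Tropez": "French",
    "Le Havre": "French",
    "Barcelona": "Spanish",
}


def attribute_flag(entry):
    """Return (flag, uncertainty_level) for one documentary entry."""
    # 1. Flag stated by the source itself: no brackets.
    if entry.get("flag"):
        return entry["flag"], 0
    # 2. Flag derived from the homeport, else from the captain's residence.
    for field in ("homeport", "captain_residence"):
        flag = PLACE_TO_FLAG.get(entry.get(field))
        if flag:
            return f"[{flag}]", -2
    # 3. Flag derived from the series of the congé.
    conge = entry.get("conge_type")
    if conge == "French":
        return "[French]", -2
    if conge == "foreign":
        return "[NonFrench]", -2
    # 4. Nothing to go on.
    return None, -4
```

For example, `attribute_flag({"captain_residence": "Saint-Tropez"})` returns `("[French]", -2)`, and an entry with only a foreign congé returns `("[NonFrench]", -2)`. The caveat about Spanish ships buying French permits applies to rule 3 and is not modelled here.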

Captains' identifiers

Navigocorpus assigns a captain identifier (captain_id) to enable the captain’s trajectory to be traced over time. Strictly speaking, however, such identification is systematically uncertain, and errors do occur, especially in the case of very frequent names. We will continue to correct these throughout the program and thereafter.

When assigning the captain_id, which we did manually, we took into account the proximity of the name, loyalty to the same ship, similarity of ship tonnage and sailing areas when the same person seems to change ship, proximity to the home port and any other contextual information.

The thorniest cases are those where the captain has more than one first name. We attributed two different identifiers to Jean Dupont and Pierre Dupont, even though they served on the same ship, but when we also found a Jean Pierre Dupont on the same ship, we decided to assign a single identifier to all three names. Users should be aware that the transcription of proper names and the identification of captains are in themselves open to question. Our visualization tool for captains’ routes makes it possible to search for any spelling of a name within the database.

Product and Location identifiers

Text is currently being drafted. Please come back later.

Qualifying uncertainty

We have assigned to all the information contained in the sources and to all the identifiers we added a value that qualifies the degree of uncertainty of the information in question. These values are:

  • -4: information missing
  • -3: false, as denied by other documents or historical analysis
  • -2: the content of the field is uncertain, as it is derived from another source or contextual information
  • -1: unconfirmed (status of information relating to an unconfirmed future event – pointcall_status = FC)
  • 0: observed as present or past by the source (pointcall_status = PC)
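The scale above lends itself to an ordered enumeration. The following Python sketch is one possible encoding; the member names are ours, and only the two status mappings stated in the list (FC and PC) are included.

```python
from enum import IntEnum


class Uncertainty(IntEnum):
    """Uncertainty levels; the member names are illustrative."""
    MISSING = -4      # information missing
    FALSE = -3        # denied by other documents or historical analysis
    DERIVED = -2      # derived from another source or from context
    UNCONFIRMED = -1  # unconfirmed future event
    OBSERVED = 0      # observed as present or past by the source


# Only the mappings given in the list above.
STATUS_TO_LEVEL = {
    "PC": Uncertainty.OBSERVED,
    "FC": Uncertainty.UNCONFIRMED,
}


def level_for_status(status: str) -> Uncertainty:
    """Return the level for a pointcall_status, if the list above defines one."""
    try:
        return STATUS_TO_LEVEL[status]
    except KeyError:
        raise ValueError(f"no level documented for status {status!r}") from None
```

Because `IntEnum` members compare as integers, values can be sorted or thresholded directly, for instance to display only information with a level of -1 or higher.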

An example of level -3: On 1 January 1787, a congé was issued in the port of Bordeaux to a ship called the Belle Poule, captain Jean Durand, to sail to Saint-Domingue. Another congé was issued in the port of La Rochelle on 21 January 1787 to a ship called the Belle Poule, captain Jean Durand, to sail to Cap-Français. The two ships have the same homeport and a very similar tonnage. We believe they are the same ship, and therefore assign the same ship identifier (ship_id). In reconstructing the itineraries, we qualified the Bordeaux -> Saint-Domingue route as “false”, so as to reconstruct what we believe to be the “true” route: Bordeaux -> La Rochelle -> Cap-Français. We have a certainty (uncertainty value = 0) for the Bordeaux exit and the La Rochelle exit. We rate the destination Saint-Domingue from Bordeaux as -3 (false, as it is denied by other documents or historical analysis), and we rate the arrival at Cap-Français as -1. We also create a leg “Bordeaux -> La Rochelle” which is not in any of our sources.
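The Belle Poule example can be written out as data to show how the “true” route follows from the uncertainty values. This Python sketch uses a hypothetical record layout (`place`, `action`, `level`) and assumes the entries are already in chronological order; the levels are those given in the example.

```python
# The four pieces of information from the two congés, in chronological order.
pointcalls = [
    {"place": "Bordeaux", "action": "departure", "level": 0},
    {"place": "Saint-Domingue", "action": "destination", "level": -3},
    {"place": "La Rochelle", "action": "departure", "level": 0},
    {"place": "Cap-Français", "action": "destination", "level": -1},
]


def reconstruct_route(calls):
    """Drop places rated -3 (false) and chain the remaining ones into legs."""
    kept = [call["place"] for call in calls if call["level"] != -3]
    return list(zip(kept, kept[1:]))


route = reconstruct_route(pointcalls)
```

The result contains two legs, Bordeaux -> La Rochelle and La Rochelle -> Cap-Français. The first of these appears in neither congé: it exists only because the declared Saint-Domingue destination was set aside.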

An example of level -2: to a ship with Le Havre as homeport and taking a French congé, we assigned the flag [French]. This information is not to be found in the source. It is rated -2 because it is uncertain, being derived from contextual information. Another example: a documentary entry states that La Belle Poule’s home port is Le Havre. We assigned Le Havre as homeport to all documentary entries to which we have attributed the same ship identifier (ship_id). The homeport of these documentary entries is rated -2: uncertain, as it is derived from another source.
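The second example, propagating a homeport to every entry sharing a ship_id, might be sketched as follows in Python. The field name `homeport_level` and the sample rows are assumptions; a homeport stated by the entry itself is given level 0, a propagated one level -2, and a homeport unknown for the whole ship level -4. When entries of the same ship disagree, this simplified version keeps the first homeport encountered.

```python
def propagate_homeport(rows):
    """Fill missing homeports from other entries with the same ship_id."""
    known = {}
    for row in rows:
        if row.get("homeport"):
            known.setdefault(row["ship_id"], row["homeport"])

    result = []
    for row in rows:
        filled = dict(row)  # leave the input untouched
        if row.get("homeport"):
            filled["homeport_level"] = 0    # stated in this entry
        elif row["ship_id"] in known:
            filled["homeport"] = known[row["ship_id"]]
            filled["homeport_level"] = -2   # derived from another entry
        else:
            filled["homeport"] = None
            filled["homeport_level"] = -4   # missing everywhere
        result.append(filled)
    return result


# Invented entries for demonstration.
rows = [
    {"ship_id": "S1", "ship_name": "La Belle Poule", "homeport": "Le Havre"},
    {"ship_id": "S1", "ship_name": "Belle Poule", "homeport": None},
    {"ship_id": "S2", "ship_name": "Marie", "homeport": None},
]
filled_rows = propagate_homeport(rows)
```

Keeping the level alongside the value is what allows a visualization to distinguish the homeport read in the register from the one inferred through the identifier.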