Procedure for building "Globalvat data portal"

1. Objective

What is the goal of the Globalvat data portal?

  • data portal that can be searched to some degree (persons, groups, full text)

  • includes:

    • information on audiences for 10 years (March 1939 to December 1948)

    • information on who participates in them

      • some identified high-level persons/groups

        • cardinals, bishops

      • participant descriptions, with associated attributes:

        • with type (religious, other), number, nationality…

      • → allow for analysis of some particular aspects

    • detailed information on identified high-level persons/groups

 

Tasks

2. Procedure to build the data portal

 

A) Initial data sheet preparation

  1. finish Excel transcriptions (ongoing)

  2. clean errors in data sheets (ongoing)

    1. (add participations where needed)

    2. define standards to do so

  3. transmit data into joint work-bench

B) Conceptual Preparation of import

  1. decide structure of data tables to be produced for import and which key information to keep for each table (ongoing)

  2. decide which key persons and groups to identify by analysing the existing sheets (ongoing)

    1. decide which persons, groups, places have to be identified using group by / sort functions (e.g. OpenRefine)

C) Technical preparation of import

  1. create identified entities in Geovistory to produce identifiers; in doing so, check whether they already exist in the Geovistory database

  2. prepare import tables out of excel files for import

    1. prepare code to split initial data into defined import tables (ongoing)

    2. clearly identify the corresponding class for each column

    3. interlink all identified places, people and groups with their GV identifier

  3. import data into Geovistory Toolbox

D) Continued data enrichment in the Toolbox

  1. continue identifying additional persons and groups and creating them manually in GV

  2. continue enriching identified persons and groups

E) Build webpage for data portal

  1. specify structure of webpage

  2. develop webpage and enable data access


two ways of progressing:

  • have a go year by year/ month by month

  • combine all the years together and make one big import

 

Question:

  • how to produce a meaningful table of participant descriptions

  • how to define the linking between all the tables? (identifier of audience is unique only with the name of the file)

“Qualité personne” is additional information attached to a person's name, and there can be more than one per person; therefore “personne reçue” and “qualité personne” should be taken together.


A) Initial data sheet preparation (ongoing)

1. finish Excel transcriptions (ongoing)

2. clean errors in data sheets (add participations where needed)

 

Steps for cleaning and data preparation, as implemented by Jacopo on 8.3.24:

  • take the original Excel table and import it into Deepnote

  • process the table:

    • highlight the errors using simple analysis

    • produce additional columns such as day of the week

  • produce a copy of the Deepnote output and store it in Google Sheets

  • work manually in Google Sheets to clean the data (using the highlighted errors)

  • then import the cleaned Google Sheet via API into Deepnote and run the same code again

  • if needed, combine several monthly tables into a yearly (or larger) table to prepare for splitting

  • then split it in Deepnote into the final tables to be imported into GV (this is equivalent to step C-2.1)
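The two processing steps above (highlighting errors and adding a day-of-week column) could be sketched as follows. This is a minimal illustration in plain Python, not the code in the Deepnote notebook. The column names “Date audience”, “Heure” and “Jour semaine” come from the Audience table further down; the function name check_row, the “errors” column and the sample rows are assumptions made for this example.

```python
from datetime import datetime

# Italian weekday names, indexed by datetime.weekday() (Monday == 0).
WEEKDAYS_IT = ["Lunedì", "Martedì", "Mercoledì", "Giovedì",
               "Venerdì", "Sabato", "Domenica"]


def check_row(row):
    """Return a copy of `row` with 'Jour semaine' and an 'errors' list added."""
    out = dict(row)
    errors = []
    try:
        day = datetime.strptime(row.get("Date audience", ""), "%Y-%m-%d")
        out["Jour semaine"] = WEEKDAYS_IT[day.weekday()]
    except ValueError:
        # Malformed or impossible dates are flagged for manual cleaning.
        out["Jour semaine"] = ""
        errors.append("Date audience: expected YYYY-MM-DD")
    heure = row.get("Heure", "")
    if heure:
        try:
            datetime.strptime(heure, "%H:%M")
        except ValueError:
            errors.append("Heure: expected hh:mm")
    out["errors"] = errors
    return out


# Hypothetical sample rows: the second one contains two deliberate errors.
rows = [
    {"Date audience": "1939-04-01", "Heure": "09:30"},
    {"Date audience": "1939-04-31", "Heure": "9.30"},
]
checked = [check_row(r) for r in rows]
for r in checked:
    print(r["Date audience"], r["Jour semaine"], r["errors"])
```

Rows with a non-empty “errors” list are the ones to correct by hand in Google Sheets before the code is run again.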

next steps

task | assigned | status
complete initial Excel files for all months | @Jacopo Cossu | ongoing
advice on package size to be imported (month, years, all) | @Francesco Beretta | ongoing
test & improve cleaning workflow as described above | @Jacopo Cossu | backlog
check the data and correct them manually | @Jacopo Cossu | backlog

 

3. transmit data into joint work-bench

next steps

task | assigned | status
transmit data into joint work-bench in Yellow Postgres (priority; see steps below) | @Jacopo Cossu | ongoing

 

 

 

Steps to carry out

  • exchange with Gaétan on how to work with the Yellow database

  • clarify which code to use

  • clarify where to store credentials and how to ensure security (e.g. passwords)

  • share Google Drive links to the files with KleioLab & Vincent

  • make all treated Excel files available in the Yellow Postgres database at the same time as they are loaded into Google Drive, making explicit which files have been treated

  • Idea: create a table with all “audience” records in Yellow -> produce a primary key -> bring this key back into the tables
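The primary-key idea above could be sketched as follows. Because the calculated audience identifier is unique only within one Excel file, the pair (file name, local identifier) serves as the natural key from which a global key is derived. This is only an illustration in plain Python: in practice the key would presumably be generated by the Yellow Postgres database itself, and the names assign_audience_keys, source_file and audience_pk, as well as the file names, are assumptions made for this example.

```python
def assign_audience_keys(rows):
    """Add a global 'audience_pk' to every participation row.

    The local identifier is unique only within one file, so the pair
    (source file, local identifier) is used as the natural key.
    """
    pk_by_natural_key = {}
    keyed_rows = []
    for row in rows:
        natural_key = (row["source_file"], row["Identifiant audience calculé"])
        # The next integer is assigned the first time a pair is seen.
        pk = pk_by_natural_key.setdefault(natural_key, len(pk_by_natural_key) + 1)
        keyed_rows.append({**row, "audience_pk": pk})
    return keyed_rows, pk_by_natural_key


# Hypothetical rows: local identifier 1 occurs in two different files.
rows = [
    {"source_file": "1939-03.xlsx", "Identifiant audience calculé": 1},
    {"source_file": "1939-03.xlsx", "Identifiant audience calculé": 1},
    {"source_file": "1939-04.xlsx", "Identifiant audience calculé": 1},
]
keyed, audiences = assign_audience_keys(rows)
print([r["audience_pk"] for r in keyed])
```

The first two rows belong to the same audience and share a key; the third row receives a new key although its local identifier is also 1.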

B) Conceptual Preparation of import (ongoing)

1. Decide structure of data tables to be produced for import and which key information to keep for each table

Tables to be produced

  • audiences

  • participations (with foreign key to “table person” and to “audience table”)

  • participant descriptions linking some of them to created identified persons & groups

  • possibly tables with attributes of participant descriptions

 

current structure

excel table with participations in each line that belong to an audience

image-20240307-104535.png

simplified target structure (approx):

image-20240306-141156.png

 

conceptual target data model:

Open Questions:

  • one participant description per participation or can there be several?

    • Decision: if it is obvious that there are several entities and they are easy to split without much technical effort, create an additional participant description and participation

      • cases that are hard to split: add the tag “cases to be examined”

      • see example of results with current code: “il P. Clerici con un confratello”

        • it is hard to tell what type of confratello is meant; for this reason, we link it back to P. Clerici via “accompany by”

Splitting in this way takes us from 250 to 360 participations.
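The splitting decision above could be sketched as follows. This is not the project's current code: the rule used here (split once on the Italian connector “con”, and tag anything containing “e” or a comma for manual review) is an assumption chosen to keep the example short, as are the names split_mention and linked_to. The third sample mention is invented for illustration.

```python
import re

REVIEW_TAG = "cases to be examined"
# Assumption: only the connector "con" is treated as an easy split.
EASY_SPLIT = re.compile(r"\s+con\s+")
# Assumption: "e" or a comma signals a case that is hard to split.
HARD_MARKERS = re.compile(r"\s+e\s+|,")


def split_mention(mention):
    """Return one or two participant descriptions for a source mention."""
    text = mention.strip()
    parts = EASY_SPLIT.split(text)
    if len(parts) == 2 and not any(HARD_MARKERS.search(p) for p in parts):
        main, companion = parts
        return [
            {"mention": main, "linked_to": None, "tag": None},
            # The companion is linked back to the main person.
            {"mention": companion, "linked_to": main, "tag": None},
        ]
    needs_review = len(parts) > 1 or HARD_MARKERS.search(text) is not None
    tag = REVIEW_TAG if needs_review else None
    return [{"mention": text, "linked_to": None, "tag": tag}]


for m in ["il P. Clerici con un confratello",
          "S.E. Mons. Pisani",
          "Conte Rossi con la moglie e due figli"]:  # invented mention
    print(split_mention(m))
```

Mentions that cannot be split safely stay as a single participant description carrying the review tag, so no information is lost before manual inspection.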

Tables to be produced

Information to be retained per table

Audience:

previous discussion and work. Schema below

Columns: Information of interest | Relations/links | Examples | Description & Comments | Relevant to Research Questions

Audience identifier

PK

to Participation on “Audience identifier”

! the form it should take must be determined

Identifier of audience

Ensures the correct quantification of audiences and derived properties

Date audience

crm:E7 Activity “Audience” crm:P4 has time-span crm:E52 Time-Span “Date audience”

 1939-04-01; 1940-04-05; 1943-02-10

Date of the audience; always expressed in the format YYYY-MM-DD.

Makes it possible to fix audiences in time and to order them chronologically in all analyses

Jour semaine

“Lunedì”, “Martedì”, “Mercoledì” …

Day of the week in which the audience takes place; the data is automatically assigned by recourse to the “datetime” library.

VA: How are we going to manage this in GV? It could be a type, but how do we link it to the date?

Allows you to check for specific trends that can highlight possible audience planning strategies

Identifiant audience calculé

 

1; 2; 7; 30

Is used to identify each papal audience within, and only within, each excel file, consists of non-negative integers, and is assigned automatically

For the purposes of the research questions, it is replaced by “Audience identifier”

Heure

09:30; 10:30; 12:00

Indicates the time of reception; it should be transcribed as hh:mm

Makes it possible to evaluate how audiences are distributed over the day

Modalité de renseignement heure (Liste fixe)

VA: in the symogih import project, we manage this as “comments on date”: TeEn geov:P19 has comment geov:C20 Comment geov:P22 has comment type "Commentaire sur la date" (id GV 7953586).

"Selon la source"; “Postulée"

Records whether the audience time is present in the source ("according to the source") or whether it is "postulated" by the producer of the record. Postulating the time is normally used when several audiences are grouped under one slot, often in the case of Speciali or Generali, and have no slot assigned by the CdM.

 

Durée maximale (minutes)

VA: It's not possible to link a time entity directly to a duration, you have to go through the time-span. But this is not possible in GV. So how can we do it?

5; 15; 30; 45

Expressed in minutes as an integer; indicates the length of the audience, which must be calculated when two consecutive times are specified by the MdC. It can also be postulated. By convention, the duration of the last audience of the day is postulated as 15 minutes.

Allows the duration of audiences to be evaluated against other variables (type, participating subjects...)

Modalité de renseignement durée (Liste fixe)

 

“Postulée”; “Calculée”

Allows you to record whether the audience duration is present in the source or assumed.

VA: the same as for Modalité heure.

 

Type audience selon la source

VA: it could be managed as a comment on the audience with comment type “Type audience selon la source”.

“speciale”; “Alcune persone per il Bacio dell’Anello”

Records the different categories of audience recognized in the source.

 

 

Type audience (catégorie recherche)

crm:E7 Activity “Audience” sdh-sup:P19 has activity type sdh-sup:C3 Activity Type “Type audience”

'Privata - Curia', 'Eventi liturgici', 'Privata - corpo diplomatico', 'Privata - vescovi', 'Privata - superiori istituti religiosi', 'Privata - nobili', 'Privata - altri', 'Speciali', 'Privata - Roma pontificia', 'Privata - CC', 'Privata - movimenti cattolici / associazioni cattoliche', 'Bacio del S. Anello', 'Generale', 'Generale - Sposi'

Classification of the audience according to prearranged categories; attributable values are collected in a controlled vocabulary.

VA: Types could be created first as entities and replaced by their id in the table. cf. Udienze Privata - Curia Activity Type id 13257214 and Udienze Privata - vescovi Activity Type id 13257194.

Allows audiences to be analysed on the basis of a meaningful typology

Lieu

crm:E7 Activity “Audience” sdh:P6 took place sdh:C13 Geographical Place “Lieu”

“Sala degli Arazzi”; “Sala del Tronetto”

Records the place where the meeting was held. This information is rarely given in the source.

Allows an analysis on the spatiality of audiences

Source/archival reference

ref: tabella segnature?

crm:E7 Activity “Audience” geov:P26 is mentionned in geov:C28 Mentionning geov:P27 is mentionned in frbroo:F4 Manifestation Singleton “Busta”

  • AAV, Prefettura Casa Pont., Udienze private e speciali, b. 38, fasc. 3

 

Records, as a string, information about the archive, fonds, series, busta and file.

VA: the same as for types, entities should be created first.

Through the archival shelfmark, in association with “Folio dans le fascicule”, ensures proper referencing of the documentary data collected

Folio dans le fascicule

crm:E7 Activity “Audience” geov:P26 is mentionned in geov:C28 Mentionning [the same as above] geov:P28 at position crm:E73 Information Object “Folio”

328; 517 ; 499

Is the folio identifier assigned by the archivist; consists of non-negative integers

 

NB: The folio is numbered by busta and not by fascicolo

In association with “Source/archival reference”, ensures proper referencing of the documentary data collected

Détails

 

 

The "Détails" column makes it possible to record information that is useful but cannot be assigned to a specific category.

VA: useful in tables that will be imported into Yellow but not relevant in GV?

Provides supplementary information

Recommandation

 

"Fol. 388, Segreteria di Stato, attraverso la quale l'Ambasciata di Romania ha chiesto udienza per il Principe", 'Fol. 389 , Mons. Hudal ', "Fol. 394 Racc. Seg. St., Ambasciata d'Italia presso Santa Sede ", 'Fol. 395 Racc. Segreteria di Stato ', "Fol. 406, Racc. Hudal, rettore dell'Anima", "Fol. 415 Racc. Nemesio Dutra (Premier Secrétaire de l'Ambassade du Brésil) ", 'Fol. 420 S.C. Pro Ecclesia Orientali', "Fol. 428 Lettera dell'Ambasciata belga ", 'Fol. 431-2 Racc. Bernardo Era da Doesbury, Olanda', 'Fol. 444 Biglietto da visita di Giovanni Lottini',

Records the person who recommended the hearing. This information is present in the file, in the form of a letter of recommendation (some institutions have pre-filled forms, such as that of the Regia Legazione d'Ungheria alla SS, which is here).

 

Provides supplementary information.

VA: This issue came up late and was not taken into account in the model.
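The duration rule described in the rows “Durée maximale (minutes)” and “Modalité de renseignement durée” of the table above (duration calculated from two consecutive times, with the last audience of the day set to 15 minutes by convention) could be sketched as follows. The function name durations is hypothetical, and the sketch assumes that all start times of one day are known, given as hh:mm and already sorted.

```python
from datetime import datetime

LAST_AUDIENCE_MINUTES = 15  # convention for the last audience of the day


def durations(times):
    """Return (minutes, mode) per audience for one day's sorted start times."""
    parsed = [datetime.strptime(t, "%H:%M") for t in times]
    result = []
    # Each audience lasts at most until the next one starts.
    for current, following in zip(parsed, parsed[1:]):
        minutes = int((following - current).total_seconds() // 60)
        result.append((minutes, "Calculée"))
    if parsed:
        # No following time exists for the last audience of the day.
        result.append((LAST_AUDIENCE_MINUTES, "Postulée"))
    return result


day = durations(["09:30", "10:30", "12:00"])
print(day)
```

Each result pairs the duration with the value that would go into “Modalité de renseignement durée”, so calculated and postulated durations stay distinguishable in later analysis.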

Participations

previous work:

Columns: Information of interest | Relations/links | Examples | Description & Comments | Relevant to Research Questions

Audience identifier

to Audience on “Audience identifier”

! the form it should take must be determined

To link the participation event back to the audience

Ensures the correct quantification of participations per audience and of derived properties

Participation event identifier

PK

! the form it should take must be determined

 

 

Personne reçue (comme indiquée)

 

Emo Cardinale Sibilia; S.E. Mons. Pisani; S.E. Mons. Cesarini; P. Cordovani O.P.; Conte Stanislao Medolago Albani; Barone de Pfyffer d'Altishofen

Constitutes the mention of an individual subject received in the audience, as reported by the source.

 

Qualité personne (comme indiquée)

 

Assessore della S.C. per la Chiesa Orientale; Prefetto delle Cerimonie Pontificie; Maestro del Sacro Palazzo Apostolico; Sostituto della Segreteria di Stato; Segretario della S.C. dei Religiosi; Sotto-Segretario della S.C. dei Sacaramenti; Sotto-Segretario della S.C. dei Sacramenti per la vigilanza sui Tribunali matrimoniali; Tenente Colonello della Guardia Svizzera; Superiora dell'Istituto Volpicelli

Records the title, function, or more generally the description defining the person received on the title page of the physical file (documentary source)

 

Groupe reçu (comme indiqué)

 

cfr. https://deepnote.com/workspace/jacopo-cossu-f66a-69f0c44c-de05-429f-8c28-79b5017ff8b9/project/GLOBALVAT-526f87ec-0b51-4baf-b4c9-20a43afa690a/notebook/gruppi-597abd179c1e40d597f567b0ee42914e

It constitutes the mention of a set of people (formal or informal groups), otherwise unspecifiable, as reported by the source

 

Qualité groupe (comme indiquée)

 

Deprecated

Records the title, function or, more generally, description that defines the indistinct set of subjects as recorded by the source

 

Mention d’accompagnement (comme indiquée)

 

"e la famiglia"; "con due sorelle"; "e un gruppo di connazionali"

Records mention of unnamed companions

 

Accompagnement

 

“personne accompagnée”; “personne accompagnant”

Records whether a person is mentioned as being accompanied or as accompanying someone

 

Participant descriptions

Columns: Information of interest | Relations/links | Examples | Description & Comments

Identifier of the participation event & audience

 

 

 to link the participant description back to the participation event

Identifier of the audience event

 

 

to link the participant description back to the audience event

GV ID of identified group, if applicable

 

 

 

GV ID of identified person if applicable

 

 

 

Mention

 

 

mention of the entity according to the source

Quality

 

 

records the title, function, or more generally the description that defines the person or set of persons received

Type group vs person

 

"Personne reçue (comme indiquée)"; "Groupe reçu (comme indiqué)"

indicates which group of columns should be searched within the participation table; useful for differentiating individuals from groups

Type (religious, non religious)

 

! to be defined

tag is intended to characterize the entities mentioned; terms are handled by a controlled open vocabulary; to ensure good categorization it should be possible to assign multiple tags per single mention

Gender

 

 

 

Nationality

 

 

 

Number

 

 

 

challenges:

  • be very pragmatic in the identification of the attributes

next steps

tasks | assigned | status
complete the tables above as validated with principal investigator (priority) | @Jacopo Cossu | ongoing
complete the draw.io draft overview of tables to be created (priority) | @Jacopo Cossu | ongoing
comment on the tables produced by Jacopo and discuss them jointly, based on inspected data | @Vincent Alamercery, @Francesco Beretta | ongoing

 

2. decide which key persons and groups to be identified analysing the existing sheets

  1. decide which persons, groups, places have to be identified using group by / sort functions (e.g. OpenRefine)
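The group-by / sort step above could be sketched in plain Python as follows. This is a stand-in for what OpenRefine's faceting would show, not a replacement for it: mentions are lightly normalised and counted so that recurring visitors surface first. The names normalise and recurring_mentions and the sample list (which repeats example values from the Participations table) are assumptions made for this example.

```python
from collections import Counter


def normalise(mention):
    """Lower-case and collapse whitespace so trivial variants group together."""
    return " ".join(mention.lower().split())


def recurring_mentions(mentions, min_count=2):
    """Return (mention, count) pairs seen at least `min_count` times, most frequent first."""
    counts = Counter(normalise(m) for m in mentions if m and m.strip())
    return [(m, n) for m, n in counts.most_common() if n >= min_count]


# Hypothetical list of transcribed mentions with spelling variants.
mentions = [
    "S.E. Mons. Pisani",
    "s.e.  mons. pisani",
    "P. Cordovani O.P.",
    "S.E. Mons. Pisani ",
]
candidates = recurring_mentions(mentions)
print(candidates)
```

The resulting list is a starting point for deciding which persons, groups and places are worth identifying and creating in Geovistory.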

next steps

tasks | assigned | status
test new workflow for grouping and sorting the list of participant descriptions to get an overview of recurring visits (e.g. OpenRefine) (priority) | @Jacopo Cossu | ongoing
write down documentation of the process (priority) | @Jacopo Cossu | ongoing
discuss results with technical project team (Jacopo, Francesco, Vincent, David) | |
discuss results with principal investigator and agree on way forward | |
implement it | |

 

 

C) Technical preparation of import

1. create identified entities in Geovistory to produce identifiers; in doing so, check whether they already exist in the Geovistory database

 

2. prepare import tables out of excel files for import

2.1 prepare code to split initial data into defined import tables (ongoing)

Code created so far:

book 1: https://deepnote.com/workspace/jacopo-cossu-f66a-69f0c44c-de05-429f-8c28-79b5017ff8b9/project/GLOBALVAT-526f87ec-0b51-4baf-b4c9-20a43afa690a/notebook/Tabelle-34cc4b750ae744c0a2f201ba691c6cec

book 2: https://deepnote.com/workspace/jacopo-cossu-f66a-69f0c44c-de05-429f-8c28-79b5017ff8b9/project/GLOBALVAT-526f87ec-0b51-4baf-b4c9-20a43afa690a/notebook/Tabelle-34cc4b750ae744c0a2f201ba691c6cec#24a21377fa8f4bb2b1a1905f6820c9d2

steps

  • prepare first version of code to split into import tables

  • prepare code to represent the “sources” of the data

  • prepare code for the special audiences (“Speciali”)

  • refine code based on final conceptual model
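The first step above, splitting the flat sheet into import tables, could be sketched as follows. This is not the code in the linked Deepnote notebooks. It assumes that every participation row already carries a global audience_id, and it uses only a few column names taken from the tables in section B; the names split_tables, audience_id and participation_id are hypothetical, and the sample rows are invented combinations of example values.

```python
AUDIENCE_COLUMNS = ["Date audience", "Heure"]
PARTICIPATION_COLUMNS = ["Personne reçue (comme indiquée)"]


def split_tables(rows):
    """Split flat participation rows into an audiences and a participations table."""
    audiences = {}
    participations = []
    for number, row in enumerate(rows, start=1):
        audience_id = row["audience_id"]
        # Keep one audience record per identifier (first occurrence wins).
        audiences.setdefault(
            audience_id,
            {"audience_id": audience_id,
             **{c: row.get(c) for c in AUDIENCE_COLUMNS}},
        )
        # Every input row becomes one participation with a foreign key.
        participations.append(
            {"participation_id": number,
             "audience_id": audience_id,
             **{c: row.get(c) for c in PARTICIPATION_COLUMNS}},
        )
    return list(audiences.values()), participations


# Invented sample rows: two participations in audience 1, one in audience 2.
rows = [
    {"audience_id": 1, "Date audience": "1939-04-01", "Heure": "09:30",
     "Personne reçue (comme indiquée)": "S.E. Mons. Pisani"},
    {"audience_id": 1, "Date audience": "1939-04-01", "Heure": "09:30",
     "Personne reçue (comme indiquée)": "S.E. Mons. Cesarini"},
    {"audience_id": 2, "Date audience": "1940-04-05", "Heure": "10:30",
     "Personne reçue (comme indiquée)": "P. Cordovani O.P."},
]
audiences, participations = split_tables(rows)
print(len(audiences), len(participations))
```

The same pattern would extend to a participant-descriptions table once its columns are fixed in the conceptual model.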

2.2 clearly identify the corresponding class for each column

2.3 interlink all identified places, people and groups with their GV identifier

3. import data into Geovistory Toolbox

 

Possible moments:

14.3. at 15h