Procedure for building "Globalvat data portal"
1. Objective
What is the goal of the Globalvat data portal?
data portal that can be searched to some degree (persons, groups, full text)
includes:
information on audiences over 10 years (March 1939 to December 1948)
information on who participates in them
some identified high-level persons/groups
cardinals, bishops
participant descriptions, with associated attributes:
type (religious, other), number, nationality…
→ allow for analysis of some particular aspects
detailed information on identified high-level persons/groups
Tasks
- 1 complete initial excel files for all months @Jacopo Cossu
- 2 advise on package size to be imported (month, year, all) @Francesco Beretta
- 3 transmit data into joint work-bench in Yellow PostGres (priority) @Jacopo Cossu
- 4 complete the tables above as validated with principal investigator (priority) @Jacopo Cossu
- 5 complete the draw.io draft overview of tables to be created (priority) @Jacopo Cossu
- 6 comment on the tables produced by Jacopo and discuss them jointly @Vincent Alamercery @Francesco Beretta
- 7 test new workflow grouping and sorting list of participant descriptions to get an overview of recurring visits (e.g. OpenRefine) (priority) @Jacopo Cossu
- 8 write down documentation of the process (priority) @Jacopo Cossu
2. Procedure to build the data portal
A) Initial data sheet preparation
finish excel transcriptions ongoing
clean errors in data sheets ongoing
(add participations where needed)
define standards to do so
transmit data into joint work-bench
B) Conceptual Preparation of import
decide structure of data tables to be produced for import and which key information to keep for each table ongoing
decide which key persons and groups to be identified analysing the existing sheets ongoing
decide which persons, groups, places have to be identified using group by / sort functions (e.g. OpenRefine)
C) Technical preparation of import
create identified entities in Geovistory to produce identifiers; in doing so, check whether they already exist in the Geovistory database
prepare import tables out of excel files for import
prepare code to split initial data into defined import tables ongoing
clearly identifying for each column the corresponding class
interlinking all identified places, people and groups with GV identifier
import data into Geovistory Toolbox
D) Continued data enrichment in the Toolbox
continue identifying additional persons and groups and creating them manually in GV
continue enriching identified persons and groups
E) Build webpage for data portal
specify structure of webpage
develop webpage and enable data access
two ways of progressing:
have a go year by year/ month by month
combine all the years together and make one big import
Question:
how to produce a meaningful table of participant descriptions
how to define the linking between all the tables? (identifier of audience is unique only with the name of the file)
“qualité personne” is extra information that goes with a person name (so there can be more of it); thus: take “personne reçue” and “qualité personne” together.
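One possible answer to the linking question above is to build a portal-wide audience key from the file name plus the per-file audience number. The sketch below only illustrates that idea: the function name, the file names and the `<file stem>_<number>` format are assumptions, since the form of the identifier has not yet been decided.

```python
from pathlib import PurePath


def make_audience_id(file_name, local_id):
    """Combine the source file name with the per-file audience number.

    The "<file stem>_<zero-padded number>" format is an assumption made
    for illustration; the final form of the identifier is still open.
    """
    stem = PurePath(file_name).stem  # drop directory and extension
    return f"{stem}_{local_id:04d}"


# Hypothetical monthly files: the same local number 7 occurs in both.
rows = [
    {"file": "1939-03.xlsx", "identifiant_audience_calcule": 7},
    {"file": "1939-04.xlsx", "identifiant_audience_calcule": 7},
]
ids = [make_audience_id(r["file"], r["identifiant_audience_calcule"]) for r in rows]
```

With such a key, participations from different monthly files can be merged into one table without audience numbers colliding.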
A) Initial data sheet preparation ongoing
1. finish excel transcriptions
ongoing
2. clean errors in data sheets (add participations where needed)
steps for cleaning and data preparation as implemented by Jacopo on 8.3.24
take the original Excel table & import it into Deepnote
process the table:
highlight the errors using simple analysis
produce additional columns such as day of the week
produce a copy of the Deepnote output and store it in Google Sheets
work manually in Google Sheets to clean it (using the highlighted errors)
then: import the cleaned Google Sheet via API into Deepnote and
run the earlier code again in Deepnote
if needed: combine several month-tables into a yearly (or larger) table to prepare for splitting
then, split it in Deepnote into the final tables to be imported into GV (this is equivalent to step C-2.1)
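The "highlight errors" and "day of the week" steps above could look roughly like the following. This is a minimal sketch, not the notebook code actually used in Deepnote: the column names `date_audience` and `heure` and the error messages are assumptions, and only two simple checks are shown.

```python
import re
from datetime import date

ITALIAN_WEEKDAYS = ["Lunedì", "Martedì", "Mercoledì", "Giovedì",
                    "Venerdì", "Sabato", "Domenica"]
TIME_PATTERN = re.compile(r"([01]\d|2[0-3]):[0-5]\d")  # strict hh:mm


def check_row(row):
    """Return a copy of the row with 'jour_semaine' and an 'errors' list added."""
    out = dict(row)
    errors = []
    try:
        day = date.fromisoformat(row["date_audience"])
        out["jour_semaine"] = ITALIAN_WEEKDAYS[day.weekday()]  # Monday == 0
    except ValueError:
        out["jour_semaine"] = ""
        errors.append("date is not YYYY-MM-DD")
    heure = row.get("heure", "")
    if heure and not TIME_PATTERN.fullmatch(heure):
        errors.append("time is not hh:mm")
    out["errors"] = errors
    return out


checked = check_row({"date_audience": "1939-04-01", "heure": "09:30"})
flagged = check_row({"date_audience": "01/04/1939", "heure": "9h30"})
```

Rows whose `errors` list is non-empty are the ones to correct by hand in Google Sheets before re-running the code.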
next steps
task | assigned | status |
---|---|---|
complete initial excel files for all months | @Jacopo Cossu | ongoing |
advise on package size to be imported (month, year, all) | @Francesco Beretta | ongoing |
test & improve cleaning workflow as described above | @Jacopo Cossu | backlog |
check the data and correct them manually | @Jacopo Cossu | backlog |
3. transmit data into joint work-bench
next steps
task | assigned | status |
---|---|---|
Transmit data into joint work-bench in Yellow PostGres (priority; see steps below) | @Jacopo Cossu | ongoing |
Steps to carry out
B) Conceptual Preparation of import ongoing
1. Decide structure of data tables to be produced for import and which key information to keep for each table
Tables to be produced
audiences
participations (with foreign key to “table person” and to “audience table”)
participant descriptions linking some of them to created identified persons & groups
possibly: tables with attributes of participant descriptions
current structure
excel table with participations in each line that belong to an audience
simplified target structure (approx):
conceptual target data model:
Open Questions:
one participant description per participation or can there be several?
Decision: if a description obviously covers several entities and is easy to split without much technical effort, create an additional participant description and participation
cases hard to split: add the “tag” “cases to be examined”
see example of results with current code: “il P. Clerici con un confratello”;
it is hard to tell what type of confratello is meant, so we link the companion back to P. Clerici via “accompany by”
With this splitting, we go from 250 to 360 participations
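To make the splitting decision above concrete, here is a deliberately naive sketch. It is not the current project code: the function name, the dictionary keys and the rule "split once on the connector con" are assumptions chosen only to reproduce the “il P. Clerici con un confratello” example; anything more complicated is tagged for manual review.

```python
import re


def split_mention(mention):
    """Split a mention into a main participant and an unnamed companion."""
    text = mention.strip()
    parts = re.split(r"\s+con\s+", text)
    if len(parts) == 1:
        # Nothing to split: one participant description.
        return [{"mention": text, "role": None, "linked_to": None, "tag": None}]
    if len(parts) == 2:
        main, companion = parts
        return [
            {"mention": main, "role": "personne accompagnée",
             "linked_to": None, "tag": None},
            # The companion is linked back to the main participant.
            {"mention": companion, "role": "personne accompagnant",
             "linked_to": main, "tag": None},
        ]
    # Several connectors: too ambiguous to split automatically.
    return [{"mention": text, "role": None, "linked_to": None,
             "tag": "cases to be examined"}]


result = split_mention("il P. Clerici con un confratello")
```

Each returned dictionary would become one participant description with its own participation, which is why the number of participations grows after splitting.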
Tables to be produced
Information to be retained per table
Audience:
previous discussion and work. Schema below
Information of interest | Relations/ links | Examples | Description & Comments | Relevant to Research Questions |
---|---|---|---|---|
Audience identifier | to Participation on “Audience identifier” | ! the form it should take must be determined | Identifier of audience | Ensures the correct quantification of audiences and derived properties |
Date audience | crm:E7 Activity “Audience” crm:P4 has time-span crm:E52 Time-Span “Date audience” | 1939-04-01; 1940-04-05; 1943-02-10 | Date of the audience; always expressed in the format YYYY-MM-DD. | Ensures that audiences can be dated and ordered chronologically in all analysis processes |
Jour semaine | | “Lunedì”, “Martedì”, “Mercoledì” … | Day of the week on which the audience takes place; the value is assigned automatically using the “datetime” library. VA: How are we going to manage this in GV? It could be a type, but how do we link it to the date? | Allows checking for specific trends that may reveal audience planning strategies |
Identifiant audience calculé | | 1; 2; 7; 30 | Identifies each papal audience within, and only within, each Excel file; consists of non-negative integers and is assigned automatically | For the purpose of the research questions it is replaced by “Audience identifier” |
Heure | | 09:30; 10:30; 12:00 | Indicates the time of reception and should be transcribed as hh:mm | Allows the scheduling of audiences over the day to be evaluated |
Modalité de renseignement heure (Liste fixe) | VA: in the symogih import project, we manage this as “comments on date”: TeEn geov:P19 has comment geov:C20 Comment geov:P22 has comment type "Commentaire sur la date" (id GV 7953586). | "Selon la source"; "Postulée" | Records whether the audience time is present in the source ("according to the source") or whether the time is "postulated" by the producer of the record. Postulating the time is normally used when multiple audiences are grouped under one slot, often in the case of Specials or Generals, and do not have a slot assigned by the CdM. | |
Durée maximale (minutes) | VA: It's not possible to link a time entity directly to a duration, you have to go through the time-span. But this is not possible in GV. So how can we do it? | 5; 15; 30; 45 | Expressed in minutes as an integer; indicates the length of the audience, which must be calculated when two consecutive times are specified by the MdC. It can also be postulated. By convention, the last audience of the day is "postulated" at 15 minutes. | Allows the duration of audiences to be evaluated against other variables (type, participating subjects...) |
Modalité de renseignement durée (Liste fixe) | | “Postulée”; “Calculée” | Records whether the audience duration is present in the source or assumed. VA: the same as for Modalité heure. | |
Type audience selon la source | VA: it could be managed as a comment of the audience with comment type “Type audience selon la source”. | “speciale”; “Alcune persone per il Bacio dell’Anello” | Records the different categories of audience recognized in the source. | |
Type audience (catégorie recherche) | crm:E7 Activity “Audience” sdh-sup:P19 has activity type sdh-sup:C3 Activity Type “Type audience” | 'Privata - Curia'; 'Eventi liturgici'; 'Privata - corpo diplomatico'; 'Privata - vescovi'; 'Privata - superiori istituti religiosi'; 'Privata - nobili'; 'Privata - altri'; 'Speciali'; 'Privata - Roma pontificia'; 'Privata - CC'; 'Privata - movimenti cattolici / associazioni cattoliche'; 'Bacio del S. Anello'; 'Generale'; 'Generale - Sposi' | Classification of the audience according to predefined categories; attributable values are collected in a controlled vocabulary. VA: Types could be created first as entities and replaced by their id in the table. cf. Udienze Privata - Curia Activity Type id 13257214 and Udienze Privata - vescovi Activity Type id 13257194. | Allows audiences to be analyzed on the basis of a meaningful typology |
Lieu | crm:E7 Activity “Audience” sdh:P6 took place sdh:C13 Geographical Place “Lieu” | “Sala degli Arazzi”; “Sala del Tronetto” | Records the place where the audience was held. This information is rarely given in the source. | Allows an analysis of the spatial distribution of audiences |
Source/archival reference | ref: tabella segnature? crm:E7 Activity “Audience” geov:P26 is mentionned in geov:C28 Mentionning geov:P27 is mentionned in frbroo:F4 Manifestation Singleton “Busta” | | Records, as a string, information about archive, fonds, series, busta, file. VA: the same as types, entities should be created first. | Through the archival signature, in association with “Folio dans le fascicule", ensures proper referencing of the documentary data collected |
Folio dans le fascicule | crm:E7 Activity “Audience” geov:P26 is mentionned in geov:C28 Mentionning [the same as above] geov:P28 at position crm:E73 Information Object “Folio” | 328; 517; 499 | Folio identifier assigned by the archivist; consists of non-negative integers. NB: The folio is numbered by busta and not by fascicolo | In association with “Source/archival reference", ensures proper referencing of the documentary data collected |
Détails | | | The "Détails" column records information that is useful but cannot be assigned to a specific category. VA: useful in tables that will be imported into Yellow but not relevant in GV? | Provides supplementary information |
Recommandation | | "Fol. 388, Segreteria di Stato, attraverso la quale l'Ambasciata di Romania ha chiesto udienza per il Principe"; 'Fol. 389 , Mons. Hudal '; "Fol. 394 Racc. Seg. St., Ambasciata d'Italia presso Santa Sede "; 'Fol. 395 Racc. Segreteria di Stato '; "Fol. 406, Racc. Hudal, rettore dell'Anima"; "Fol. 415 Racc. Nemesio Dutra (Premier Secrétaire de l'Ambassade du Brésil) "; 'Fol. 420 S.C. Pro Ecclesia Orientali'; "Fol. 428 Lettera dell'Ambasciata belga "; 'Fol. 431-2 Racc. Bernardo Era da Doesbury, Olanda'; 'Fol. 444 Biglietto da visita di Giovanni Lottini' | Records the person who recommended the audience. This information is present in the file, in the form of a letter of recommendation (some institutions have pre-filled forms, such as that of the Regia Legazione d'Ungheria alla SS, which is here). | Provides supplementary information. VA: This issue came up late and was not taken into account in the model. |
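The rule described for “Durée maximale (minutes)” can be sketched as follows. This assumes a reading of the table in which the duration is the gap to the next audience on the same day ("Calculée") and the last audience of a day receives a conventional 15 minutes ("Postulée"); the field names are hypothetical and this is not the project's own code.

```python
from datetime import datetime

POSTULATED_MINUTES = 15  # convention for the last audience of the day (assumed reading)


def add_durations(audiences):
    """audiences: list of dicts with 'date' (YYYY-MM-DD) and 'heure' (hh:mm)."""
    ordered = sorted(audiences, key=lambda a: (a["date"], a["heure"]))
    result = []
    # Pair every audience with the one that follows it (None for the last).
    for current, following in zip(ordered, ordered[1:] + [None]):
        row = dict(current)
        if following is not None and following["date"] == current["date"]:
            start = datetime.strptime(current["heure"], "%H:%M")
            end = datetime.strptime(following["heure"], "%H:%M")
            row["duree_max_minutes"] = int((end - start).total_seconds() // 60)
            row["modalite_duree"] = "Calculée"
        else:
            row["duree_max_minutes"] = POSTULATED_MINUTES
            row["modalite_duree"] = "Postulée"
        result.append(row)
    return result


sample = [
    {"date": "1939-04-01", "heure": "10:00"},
    {"date": "1939-04-01", "heure": "09:30"},
    {"date": "1939-04-01", "heure": "10:45"},
    {"date": "1939-04-02", "heure": "09:00"},
]
durations = add_durations(sample)
```

Audiences grouped under the same time slot would get a calculated duration of 0 with this sketch, so such cases would still need a postulated value or manual review.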
Participations
previous work: Modélisation de l'événement "Audience" | Participation
Information of interest | Relations/ links | Examples | Description & Comments | Relevant to Research Questions |
---|---|---|---|---|
Audience identifier | to Audience on “Audience identifier” | ! the form it should take must be determined | To link the participation event back to the audience | Ensures the correct quantification of participations per audience and derived properties |
Participation event identifier | | ! the form it should take must be determined | | |
Personne reçue (comme indiquée) | | Emo Cardinale Sibilia; S.E. Mons. Pisani; S.E. Mons. Cesarini; P. Cordovani O.P.; Conte Stanislao Medolago Albani; Barone de Pfyffer d'Altishofen | Constitutes the mention of an individual subject received in the audience, as reported by the source. | |
Qualité personne (comme indiquée) | | Assessore della S.C. per la Chiesa Orientale; Prefetto delle Cerimonie Pontificie; Maestro del Sacro Palazzo Apostolico; Sostituto della Segreteria di Stato; Segretario della S.C. dei Religiosi; Sotto-Segretario della S.C. dei Sacaramenti; Sotto-Segretario della S.C. dei Sacramenti per la vigilanza sui Tribunali matrimoniali; Tenente Colonello della Guardia Svizzera; Superiora dell'Istituto Volpicelli | Records the title, function, or more generally the description defining the person received on the title page of the physical file (documentary source) | |
Groupe reçu (comme indiqué) | | cfr. https://deepnote.com/workspace/jacopo-cossu-f66a-69f0c44c-de05-429f-8c28-79b5017ff8b9/project/GLOBALVAT-526f87ec-0b51-4baf-b4c9-20a43afa690a/notebook/gruppi-597abd179c1e40d597f567b0ee42914e | Constitutes the mention of a set of people (formal or informal groups), otherwise unspecifiable, as reported by the source | |
Qualité groupe (comme indiquée) | | Deprecated | Records the title, function or, more generally, description that defines the indistinct set of subjects as recorded by the source | |
Mention d’accompagnement (comme indiquée) | | "e la famiglia"; "con due sorelle"; "e un gruppo di connazionali" | Records mention of unnamed companions | |
Accompagnement | | “personne accompagnée”; “personne accompagnant” | Records whether a person is mentioned as accompanied or as accompanying | |
Participant descriptions
Information of interest | Relations/ links | Examples | Description & Comments |
---|---|---|---|
Identifier of the participation event & audience | | | to link the participant description back to the participation event |
Identifier of the audience event | | | to link the participant description back to the audience event |
GV-ID of identified groups if applicable | | | |
GV-ID of identified person if applicable | | | |
Mention | | | mention of the entity according to the source |
Quality | | | records the title, function, or more generally the description that defines the person or set of persons received |
Type group vs person | | "Personne reçue (comme indiquée)"; "Groupe reçu (comme indiqué)" | indicates which group of columns within participation should be searched; useful for differentiating individuals from sets |
Type (religious, non religious) | | ! to be defined | tag intended to characterize the entities mentioned; terms are handled by an open controlled vocabulary; to ensure good categorization, it should be possible to assign multiple tags to a single mention |
Gender | | | |
Nationality | | | |
Number | | | |
challenges:
be very pragmatic in the identification of the attributes
next steps
tasks | assigned | status |
---|---|---|
complete the tables above as validated with principal investigator (priority) | @Jacopo Cossu | ongoing |
complete the draw.io draft overview of tables to be created (priority) | @Jacopo Cossu | ongoing |
comment on the tables produced by Jacopo and discuss them jointly, based on inspected data | @Vincent Alamercery @Francesco Beretta | ongoing |
2. decide which key persons and groups to be identified analysing the existing sheets
decide which persons, groups, places have to be identified using group by / sort functions (e.g. OpenRefine)
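The group by / sort step could also be prototyped in a few lines of Python before moving to OpenRefine. The sketch below is only an illustration: the function names and the normalisation rule (collapse whitespace, ignore case) are assumptions, and real mentions will need more careful clustering.

```python
from collections import Counter


def normalise(mention):
    """Collapse whitespace and case so trivially different spellings group together."""
    return " ".join(mention.split()).casefold()


def recurring_mentions(mentions, min_count=2):
    """Return (normalised mention, count) pairs, most frequent first."""
    counts = Counter(normalise(m) for m in mentions if m and m.strip())
    return [(m, n) for m, n in counts.most_common() if n >= min_count]


mentions = [
    "S.E. Mons. Pisani",
    "s.e. mons.  pisani",
    "P. Cordovani O.P.",
    "S.E. Mons. Pisani ",
    "P. Cordovani O.P.",
    "Conte Stanislao Medolago Albani",
]
overview = recurring_mentions(mentions)
```

The most frequent mentions at the top of such a list are natural candidates for identification as key persons or groups.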
next steps
tasks | assigned | status |
---|---|---|
test new workflow grouping and sorting list of participant descriptions to get an overview of recurring visits (e.g. OpenRefine) (priority) | @Jacopo Cossu | ongoing |
write down documentation of the process (priority) | @Jacopo Cossu | ongoing |
discuss results with technical project team (Jacopo, Francesco, Vincent, David) | | |
discuss results with principal investigator and agree on way forward | | |
implement it | | |
C) Technical preparation of import
1. create identified entities in Geovistory to produce identifiers; in doing so, check whether they already exist in the Geovistory database
2. prepare import tables out of excel files for import
2.1 prepare code to split initial data into defined import tables ongoing
Code created so far:
steps
2.2. clearly identifying for each column the corresponding class
2.3 interlinking all identified places, people and groups with GV identifier
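As a rough illustration of step 2.1, the sketch below splits a flat sheet (one participation per row) into an audience table and a participation table joined by a foreign key. It is not the code created so far: all field names, the sample file label and the identifier format are assumptions made for the example.

```python
def split_tables(rows):
    """rows: flat list of dicts, one participation per row."""
    audiences = {}
    participations = []
    for number, row in enumerate(rows, start=1):
        # Hypothetical key: file label plus the per-file audience number.
        audience_id = f'{row["file"]}_{row["local_id"]:04d}'
        audiences.setdefault(audience_id, {
            "audience_id": audience_id,
            "date": row["date"],
            "heure": row["heure"],
        })
        participations.append({
            "participation_id": number,
            "audience_id": audience_id,  # foreign key to the audience table
            "personne_recue": row["personne_recue"],
            "qualite_personne": row["qualite_personne"],
        })
    return list(audiences.values()), participations


flat = [
    {"file": "1939-04", "local_id": 1, "date": "1939-04-01", "heure": "09:30",
     "personne_recue": "S.E. Mons. Pisani", "qualite_personne": ""},
    {"file": "1939-04", "local_id": 1, "date": "1939-04-01", "heure": "09:30",
     "personne_recue": "P. Cordovani O.P.",
     "qualite_personne": "Maestro del Sacro Palazzo Apostolico"},
    {"file": "1939-04", "local_id": 2, "date": "1939-04-01", "heure": "10:30",
     "personne_recue": "Emo Cardinale Sibilia", "qualite_personne": ""},
]
audience_table, participation_table = split_tables(flat)
```

Columns holding identified persons, groups and places would later be replaced by their GV identifiers (step 2.3) before import.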
3. import data into Geovistory Toolbox
Possible moments:
14.3. at 15h