Part 1: Import Procedure for the Huguenot Database

 

Each step below lists its objective, the people involved, and its current status.

Step 0: Further digitize the Huguenot data
Objective: The remaining information of the Huguenot Database is digitized and ready to be imported.
People: Mathilde, Oriane, Vincent, Morgane
Status: ongoing

Step 1: Establish internal work database
Objective: We have an infrastructure to internally accommodate a database that we can use to inspect and analyze the data.
People: Gaétan & Francesco
Status: done

Step 2: Import & transform Huguenot data into the work database
Objective: Data is imported from several tables into one table, making it easy to inspect and analyze the data (a minimal consolidation sketch follows below).
People: Gaétan & Francesco
Status: done
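
A minimal sketch of such a consolidation, assuming the digitized tables are available as CSV files; the file and column names are hypothetical, and SQLite stands in for the internal work database of step 1:

```python
import sqlite3

import pandas as pd

# Hypothetical source files; the actual digitized tables may differ.
sources = ["donations_geneva.csv", "donations_zurich.csv"]

# Read each table, record its origin, and stack everything into one table.
frames = []
for path in sources:
    df = pd.read_csv(path)
    df["source_table"] = path  # provenance column for later inspection
    frames.append(df)
combined = pd.concat(frames, ignore_index=True)

# Write the combined table to the work database (the file is created if absent).
with sqlite3.connect("work.db") as conn:
    combined.to_sql("huguenot_records", conn, if_exists="replace", index=False)
```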

Step 3: Inspect the data & describe potential & challenges
Objective: Inspect the data so that we understand how the different main classes are expressed:
- places
- people
- value of donations
- professions
answering the following questions (a possible inspection sketch follows below):
- what are the different types of these classes?
- are there different ways of writing them?
- are they
People: Gaétan, Oriane; supervision: Francesco
Status: open
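
One possible way to surface value types and spelling variants, assuming the combined work table from step 2; the column names 'place' and 'profession' are assumptions:

```python
import sqlite3

import pandas as pd

with sqlite3.connect("work.db") as conn:
    df = pd.read_sql("SELECT * FROM huguenot_records", conn)

# Frequency of raw values per main class: frequent near-duplicates
# ("Genève", "Geneve", "Genf", ...) hint at spelling variants to reconcile.
for column in ["place", "profession"]:  # hypothetical column names
    print(df[column].value_counts().head(20))

# A crude variant check: values that collide after simple normalization.
normalized = df["place"].str.lower().str.strip()
variants = df.groupby(normalized)["place"].unique()
print(variants[variants.apply(len) > 1])
```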

Step 4: Exchange with PI/researchers on data potential & challenges, and decide priorities
Objective: Exchange between the data team and the principal investigator/researchers so that we understand:
- what kind of data do we have, and what can we therefore use for research?
- what are the detailed research questions?
Out of this, it is decided which information is (a) cleaned and (b) aggregated, and how. For example, we might have interesting data on:
- quantitative values of donations by institution over time
- migration of people and families (grouping their appearances)
- information on origins and professions
- information on family structures
Based on the feedback in this step, different data aggregation approaches might be chosen (focusing on a specific aspect of the data).
People: Mathilde, Gaétan, Oriane, Francesco

 

Step 5: Build initial data model
Objective: Based on this exchange, reflect on how the different main classes identified can be modelled in the current data model.
People: Francesco, with support from Mathilde, Oriane, Gaétan

 

Step 6: Test data aggregation workflows based on priorities
Objective: In the work database, the Huguenot data is grouped by clustering algorithms (publishing the data on the fly on a SPARQL endpoint); a grouping sketch follows below. Clusters could include:
- family structures
- life journeys
People: Gaétan, Oriane; supervision: Francesco
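
A hedged sketch of a first, deliberately naive grouping pass; a real workflow would use a proper record-linkage or clustering library, and the 'surname' and 'place' columns are assumptions:

```python
import sqlite3

import pandas as pd

with sqlite3.connect("work.db") as conn:
    df = pd.read_sql("SELECT * FROM huguenot_records", conn)

# Candidate clusters: records sharing a normalized surname and place.
# This is only a stand-in for the actual clustering algorithm to be chosen.
df["cluster_key"] = (
    df["surname"].str.lower().str.strip()
    + "|"
    + df["place"].str.lower().str.strip()
)
cluster_sizes = df.groupby("cluster_key").size().sort_values(ascending=False)
print(cluster_sizes.head(10))  # the largest candidate family clusters
```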

 

Step 7: Inspect the aggregation clusters
Objective: Based on the draft clusters, first analysis results become visible and allow the research team to understand the analysis potential. This step is important for deciding on the data-matching and wrangling algorithms to be used for the data import.
People: Mathilde, Gaétan, Oriane, Francesco

 

Step 8: Adjust & improve aggregation clusters
Objective: Adjust the aggregation algorithms according to the discussions.
People: Gaétan, Oriane, Francesco

 

Step 9: Build controlled vocabularies & lists of main classes & relate them to the work database
Objective: In this step, "clean" lists of specific classes (in the sense of controlled vocabularies) are built in order to prepare for the import. These vocabularies and their identifiers are then associated with the values in the database (a vocabulary sketch follows below). This can for example include:
- a list of all places
- a list of all professions
- a list of all institutions
- a list of currencies
- etc.
People: Oriane & Mathilde
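
A minimal sketch of building a controlled place vocabulary and linking its identifiers back to the raw values; the 'place' column and the table names are assumptions:

```python
import sqlite3

import pandas as pd

with sqlite3.connect("work.db") as conn:
    df = pd.read_sql("SELECT * FROM huguenot_records", conn)

    # Normalize the raw values, then build the controlled list: one row per
    # distinct place with a stable identifier. In practice the list would be
    # curated by hand, merging variants under a preferred label.
    df["place_norm"] = df["place"].str.strip().str.lower()
    labels = sorted(df["place_norm"].dropna().unique())
    vocab = pd.DataFrame({
        "place_id": [f"place_{i:04d}" for i in range(len(labels))],
        "label": labels,
    })
    vocab.to_sql("vocab_places", conn, if_exists="replace", index=False)

    # Associate the vocabulary identifiers with the work-table values.
    df = df.merge(vocab, left_on="place_norm", right_on="label", how="left")
    df.to_sql("huguenot_records_linked", conn, if_exists="replace", index=False)
```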

 

Step 10: Build detailed semantic data model
Objective: In this step, based on the research priorities decided above, the data model is fine-tuned (an illustrative sketch follows below).
People: Francesco, with support from Mathilde, Oriane, Gaétan
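
Purely illustrative: a small rdflib sketch of how one main class could be expressed; the class and property names are invented placeholders, not the actual Geovistory or CIDOC-CRM terms:

```python
from rdflib import Graph, Namespace
from rdflib.namespace import OWL, RDF, RDFS

EX = Namespace("https://example.org/huguenot/model/")  # placeholder

g = Graph()
# Declare the main classes identified in step 3 ...
for name in ("Person", "Place", "Donation", "Profession"):
    g.add((EX[name], RDF.type, OWL.Class))
# ... and one relation a donation record would need.
g.add((EX.donor, RDF.type, OWL.ObjectProperty))
g.add((EX.donor, RDFS.domain, EX.Donation))
g.add((EX.donor, RDFS.range, EX.Person))
print(g.serialize(format="turtle"))
```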

 

Step 11: Final cleaning of all data selected for import
Objective: The data in the internal database is then cleaned (a cleaning sketch follows below).
People: Mathilde, Gaétan, Oriane, Francesco
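
A hedged sketch of the kind of cleaning pass meant here; the rules and column names are illustrative, and the real rules follow the priorities decided in step 4:

```python
import sqlite3

import pandas as pd

with sqlite3.connect("work.db") as conn:
    df = pd.read_sql("SELECT * FROM huguenot_records", conn)

# Normalize whitespace in the main text columns (hypothetical names).
for col in ["surname", "place", "profession"]:
    df[col] = df[col].str.strip().str.replace(r"\s+", " ", regex=True)

# Donation values: keep parseable numbers, flag the rest for manual review.
df["donation_value"] = pd.to_numeric(df["donation_value"], errors="coerce")
needs_review = df[df["donation_value"].isna()]
print(f"{len(needs_review)} rows need manual review")
```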

 

Step 12: Data mapping, matching & import
Objective: Data is then mapped onto the data model, merged according to the selected & tested merging/aggregation algorithms, and imported into the Geovistory environment (as facts or factoids, depending on the strategy chosen); a mapping sketch follows below.
People: Gaétan, Francesco
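
A minimal mapping sketch using rdflib, with an invented namespace standing in for the actual Geovistory/CIDOC-CRM model; it only shows triple generation, not the Geovistory import itself, and all class, property, and column names are assumptions:

```python
import sqlite3

import pandas as pd
from rdflib import Graph, Literal, Namespace
from rdflib.namespace import RDF, RDFS

EX = Namespace("https://example.org/huguenot/")  # placeholder namespace

with sqlite3.connect("work.db") as conn:
    df = pd.read_sql("SELECT * FROM huguenot_records_linked", conn)

g = Graph()
for i, row in df.iterrows():
    # One resource per record; a real import would first merge records
    # according to the aggregation clusters tested in steps 6-8.
    person = EX[f"person/{i}"]
    g.add((person, RDF.type, EX.Person))
    g.add((person, RDFS.label, Literal(row["surname"])))
    if pd.notna(row.get("place_id")):  # identifier linked in step 9
        g.add((person, EX.linkedToPlace, EX[row["place_id"]]))

g.serialize("huguenot_export.ttl", format="turtle")
```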