Part 1: Import Procedure for the Huguenot Database
Steps | Objective | People | Status |
0. Further digitize the Huguenot data | The remaining information in the Huguenot Database is digitized and ready to be imported. | Mathilde, Oriane, Vincent, Morgane | ongoing |
1. Establish internal work database | We have an infrastructure to host a database internally that we can use to inspect and analyze the data | Gaétan & Francesco | Done |
2. Import & transform Huguenot data into work database | Data from several tables is imported into a single table, making it easy to inspect and analyze | Gaétan & Francesco | Done |
3. Inspect the data & describe “potential” & “challenges” | Inspect the data to understand how the main classes (places, people, values of donations, professions, …) are expressed, answering the following questions: What are the different types within these classes? Are there different ways of writing them? Are they … | Gaétan, Oriane, Supervision: Francesco | open |
4. Exchange with PI/researchers on data potential & challenges & decide priorities | Exchange between the data team and the principal investigator/researchers so that we understand: What kind of data do we have, and which of it can we therefore use for research? What are the detailed research questions? From this, we decide which information is (a) cleaned and (b) aggregated, and how. For example, we might have interesting data on: quantitative values of donations by institution over time; migration of people and families (grouping their appearances); information on origin and professions; information on family structures. Based on this feedback, different data aggregation approaches might be chosen (each focusing on a specific aspect of the data). | Mathilde, Gaétan, Oriane, Francesco | |
5. Build initial data model | Based on this exchange, reflect on how the main classes identified can be modelled in the current data model | Francesco with support of Mathilde, Oriane, Gaétan | |
6. Test data aggregation workflows based on priorities | In the work database, the Huguenot data is grouped using clustering algorithms (publishing the data on the fly on a SPARQL endpoint). Clusters could include: family structures, life journeys, … | Gaétan, Oriane, Supervision: Francesco | |
7. Inspect the aggregation clusters | Based on the draft clusters, first analysis results become visible and allow the research team to understand the analysis potential. This step is important for deciding which data-matching and wrangling algorithms to use for the data import | Mathilde, Gaétan, Oriane, Francesco | |
8. Adjust & improve aggregation clusters | Adjust the aggregation algorithms according to the discussions | Gaétan, Oriane, Francesco | |
9. Build controlled vocabularies & lists of main classes & relate them to the work database | In this step, “clean” lists of specific classes (in the sense of controlled vocabularies) are built to prepare for the import. These vocabularies and their identifiers are then associated with the values in the database. This can include, for example: a list of all places, a list of all professions, a list of all institutions, a list of currencies, etc. | Oriane & Mathilde | |
10. Build detailed semantic data model | In this step, the data model is fine-tuned based on the research priorities decided above. | Francesco with support of Mathilde, Oriane, Gaétan | |
11. Final cleaning of all data selected for import | The data in the internal database is cleaned | Mathilde, Gaétan, Oriane, Francesco | |
12. Data mapping, matching & import | Data is mapped onto the data model, merged according to the selected and tested merging/aggregation algorithms, and imported into the Geovistory environment (as facts or factoids, depending on the chosen strategy) | Gaétan, Francesco | |
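Step 6 groups appearances of the same person into clusters. The plan does not specify which clustering algorithm will be used. As a hypothetical illustration, the sketch below swaps in a simple rule-based linkage: two appearances are linked when their normalized names agree and they share a place of origin or a profession. Union-find then merges the links transitively into clusters. The record fields (`id`, `name`, `origin`, `profession`), the linking rule, and all function names are assumptions for illustration, not the team's chosen method.

```python
import unicodedata


def normalize(value: str) -> str:
    """Lower-case, strip accents and collapse whitespace (e.g. 'Nîmes' -> 'nimes')."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.casefold().split())


class UnionFind:
    """Disjoint-set structure used to merge linked appearances transitively."""

    def __init__(self, items):
        self.parent = {item: item for item in items}

    def find(self, item):
        while self.parent[item] != item:
            self.parent[item] = self.parent[self.parent[item]]  # path halving
            item = self.parent[item]
        return item

    def union(self, a, b):
        root_a, root_b = self.find(a), self.find(b)
        if root_a != root_b:
            self.parent[root_b] = root_a


def cluster_appearances(records):
    """Hypothetical rule: same normalized name AND a shared origin or profession."""
    uf = UnionFind(r["id"] for r in records)
    first_with_key = {}  # linking key -> id of the first record that produced it
    for r in records:
        name = normalize(r["name"])
        for field in ("origin", "profession"):
            if r.get(field):  # missing values never create links
                key = (name, field, normalize(r[field]))
                if key in first_with_key:
                    uf.union(first_with_key[key], r["id"])
                else:
                    first_with_key[key] = r["id"]
    clusters = {}
    for r in records:
        clusters.setdefault(uf.find(r["id"]), []).append(r["id"])
    return sorted(sorted(ids) for ids in clusters.values())


if __name__ == "__main__":
    # Invented sample records, not taken from the Huguenot Database.
    records = [
        {"id": 1, "name": "Jean Dupont", "origin": "Nîmes", "profession": None},
        {"id": 2, "name": "jean dupont", "origin": "Nimes", "profession": "tailleur"},
        {"id": 3, "name": "Jean Dupont", "origin": None, "profession": "Tailleur"},
        {"id": 4, "name": "Jean Dupont", "origin": "Uzès", "profession": None},
        {"id": 5, "name": "Marie Dupont", "origin": "Nîmes", "profession": None},
    ]
    print(cluster_appearances(records))  # [[1, 2, 3], [4], [5]]
```

Appearance 3 joins 1 and 2 only through the profession it shares with 2. This transitive chaining is what makes union-find useful for life journeys. It is also a risk, since one wrong link merges two people, which is why step 7 inspects the draft clusters before any import.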
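Step 9 associates raw database values with entries in a controlled vocabulary. The following is a minimal sketch, assuming each vocabulary entry has an identifier and a list of accepted spellings. It normalizes case, accents and whitespace, looks each raw value up, and collects unmatched values for manual review. The identifiers (`place:001`, …), the vocabulary contents and the function names are hypothetical.

```python
import unicodedata


def normalize(value: str) -> str:
    """Lower-case, strip accents and collapse whitespace so spelling variants compare equal."""
    decomposed = unicodedata.normalize("NFKD", value)
    without_accents = "".join(c for c in decomposed if not unicodedata.combining(c))
    return " ".join(without_accents.casefold().split())


def build_lookup(vocabulary: dict[str, list[str]]) -> dict[str, str]:
    """Map every normalized label or variant to its vocabulary identifier."""
    lookup = {}
    for identifier, labels in vocabulary.items():
        for label in labels:
            lookup[normalize(label)] = identifier
    return lookup


def link_values(raw_values, lookup):
    """Return (raw value -> identifier) for matches, plus unmatched values for manual review."""
    linked, unmatched = {}, []
    for raw in raw_values:
        identifier = lookup.get(normalize(raw))
        if identifier is None:
            unmatched.append(raw)
        else:
            linked[raw] = identifier
    return linked, unmatched


if __name__ == "__main__":
    # Hypothetical place vocabulary with known spelling variants.
    places = {"place:001": ["Genève", "Geneve", "Genf"], "place:002": ["Lausanne"]}
    linked, unmatched = link_values(["GENEVE", "Genf ", "Lausane"], build_lookup(places))
    print(linked)     # {'GENEVE': 'place:001', 'Genf ': 'place:001'}
    print(unmatched)  # ['Lausane']
```

Exact matching after normalization is deliberately conservative. A misspelling such as "Lausane" is reported rather than guessed, and can then be added as a variant to the vocabulary.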