We suggest an incremental approach where new sources are added to a daily build and explored with ad-hoc queries before choosing a more permanent representation as nodes and relations.
Extract to Files. Write each source to one or more JSON files.
Merge and Transform. Convert the extracts into CSV node and relation files.
Attach Source to Relations. Give each relation a source attribute.
Cautiously Merge. Take care when merging new sources into familiar nodes.
Match with Heuristics. Match using heuristics that are tabulated and reviewed.
Count Matches and Mismatches. Use the counts to prioritize.
Save and Revise Queries. Keep queries and revise them against work in progress.
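
The steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not the build itself: the record keys `from`, `to`, and `type`, the function names, and the lowercase-name matching heuristic are all hypothetical and would be replaced by whatever each real source provides.

```python
import csv
import io
import json
from collections import Counter


def load_extracts(texts):
    """Parse per-source JSON extracts: {source name: JSON text} -> {source name: records}."""
    return {source: json.loads(text) for source, text in texts.items()}


def transform(sources):
    """Merge extracts into node rows and relation rows.

    Each record is assumed (hypothetically) to look like
    {"from": ..., "to": ..., "type": ...}. Every relation keeps the
    name of the source it came from in a `source` attribute.
    """
    nodes = {}
    relations = []
    for source, records in sources.items():
        for rec in records:
            for name in (rec["from"], rec["to"]):
                nodes.setdefault(name, {"name": name})
            relations.append(
                {"from": rec["from"], "to": rec["to"],
                 "type": rec["type"], "source": source}
            )
    return list(nodes.values()), relations


def to_csv(rows, fieldnames):
    """Render rows as CSV text with a header line."""
    buf = io.StringIO()
    writer = csv.DictWriter(buf, fieldnames=fieldnames, lineterminator="\n")
    writer.writeheader()
    writer.writerows(rows)
    return buf.getvalue()


def count_matches(known_names, new_names, normalize=str.lower):
    """Tally how many new names match a familiar node under one heuristic.

    The default heuristic (case-insensitive equality) is an assumption;
    pass a different `normalize` function to try another one.
    """
    known = {normalize(name) for name in known_names}
    return Counter(
        "match" if normalize(name) in known else "mismatch"
        for name in new_names
    )
```

Nodes are deliberately not merged across spellings in `transform`: `count_matches` only reports how a heuristic would behave, so the tally can be reviewed before any merge is made permanent.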