TDG Carlos R.Rivero Committed to research! TDG Site Manager 1.1

MostoDEx

A technical report regarding MostoDEx can be found here.

Tool

  • You may find a research prototype of MostoDEx here, and two examples here and here.
  • By right-clicking on the source dataset, you may load the source data ("DBpedia.n3").
  • By right-clicking on the target dataset, you may load the target data ("Freebase.n3").
  • By right-clicking on the correspondences, you may load them ("correspondences.txt").
  • Then, you may navigate through the tabs to perform the steps of generating schema mappings.

 

Validation

Preliminaries

  • Download the Preprocess data, Data exchanger, and Target comparator utilities.
  • Place them in the same folder as the Configuration files.
  • Inside the "conf" folder, configure "schema-mapping-problem.properties" for the machine on which the utilities will run. The first group of properties (all.*) refers to general settings, such as a temporary file used to exchange data or the location of the datasets. The remaining groups of properties deal with the data exchange problems that our repository comprises.
  • Inside the "no_transformations" folder, you may find the files that we use for each data exchange problem. For instance, "dbp_fb_film" contains the data related to the DF-F and FD-F problems: "Freebase.n3" and "DBpedia.n3" comprise the source and target triples of the single exchange sample; "correspondences.txt" comprises the correspondences in these problems, which may be bidirectional (--) or unidirectional (<- or ->); and "Fb2DBp.qry" and "DBp2Fb.qry" comprise the handcrafted schema mappings in SPARQL.
  • The equivalences between the names in these files and the names in our research are the following: dfpp = DF-P; fdpp = FD-P; dffl = DF-F; fdfl = FD-F; dfts = DF-TS; fdts = FD-TS; dfun = DF-U; fdun = FD-U; dg = DG; and gd = GD.
  • Note that, to add a new data exchange problem, it is necessary to specify all of the previous files.
  • The data regarding the data exchange problems of our repository can be found in the following links: DF-P, FD-P, DF-F, FD-F, DF-TS, FD-TS, DF-U, FD-U, DG, and GD.
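
The direction markers described above for "correspondences.txt" can be interpreted with a few lines of Python. This is a minimal sketch and not the parser that MostoDEx uses: the layout of one correspondence per line with whitespace around the marker (for example, "dbp:Film -- fb:film") and the function names are assumptions, since the exact syntax of the file is not described on this page.

```python
from typing import List, Tuple

# Direction markers named in the text: bidirectional (--) or unidirectional (<- or ->).
ARROWS = ("--", "->", "<-")


def parse_correspondence(line: str) -> List[Tuple[str, str]]:
    """Return the (from, to) pairs implied by one correspondence line.

    Assumed (hypothetical) layout: "<left> <marker> <right>", split on whitespace.
    """
    left, arrow, right = line.split()
    if arrow not in ARROWS:
        raise ValueError(f"unknown direction marker: {arrow!r}")
    if arrow == "->":
        return [(left, right)]
    if arrow == "<-":
        return [(right, left)]
    # "--" is bidirectional, so it yields one pair per direction.
    return [(left, right), (right, left)]


def parse_correspondences(text: str) -> List[Tuple[str, str]]:
    """Parse the content of a whole file, skipping blank lines."""
    pairs: List[Tuple[str, str]] = []
    for line in text.splitlines():
        if line.strip():
            pairs.extend(parse_correspondence(line))
    return pairs
```

A bidirectional correspondence is expanded into two directed pairs so that later steps can treat every correspondence uniformly.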

Preprocessing data

  • Note that these steps are only necessary if you want to include a new data exchange problem that has DBpedia, Freebase, or GovWILD as the source dataset.
  • We use DBpedia 3.8 (Ontology Infobox Types and Ontology Infobox Properties, both in English), Freebase, and GovWILD datasets.
  • To preprocess a source dataset so that it only retains those triples that relate the entities specified in the correspondences, use "run-preprocess-data.bat". In this script, it is necessary to state the source dataset (dbpedia, freebase, or govwild), the data exchange problem at hand, and the location of the full RDF dump.
  • When this script finishes, it stores the desired data in the corresponding folder, which is specified in file "schema-mapping-problem.properties".
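
The filtering idea behind the preprocessing step can be sketched as follows. This is an illustration under stated assumptions, not the logic of "run-preprocess-data.bat": it assumes that triples are already available as (subject, predicate, object) string tuples, and that a triple "relates" a corresponded entity when its predicate is one of the entities or when it types a subject with one of them. The names filter_triples and RDF_TYPE are hypothetical.

```python
from typing import Iterable, Iterator, Set, Tuple

Triple = Tuple[str, str, str]

# Standard IRI of the rdf:type property.
RDF_TYPE = "http://www.w3.org/1999/02/22-rdf-syntax-ns#type"


def filter_triples(triples: Iterable[Triple], entities: Set[str]) -> Iterator[Triple]:
    """Yield only the triples that mention a corresponded entity.

    Assumption: a triple is kept if its predicate is a corresponded property,
    or if it is an rdf:type statement whose object is a corresponded class.
    """
    for s, p, o in triples:
        if p in entities:
            yield (s, p, o)
        elif p == RDF_TYPE and o in entities:
            yield (s, p, o)
```

Because the function is a generator, it can be applied to a large dump one triple at a time without loading the whole dataset into memory.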

Exchanging data

  • To run a data exchange problem, use "run-data-exchanger.bat", in which the problem has to be specified together with the number of times it is to be run.
  • When this script finishes, it prints the measures to the console.

Comparing target data

  • To compare the target data produced by the automatically-generated schema mappings with the target data produced by the handcrafted schema mappings, run "run-target-comparator.bat", in which the problem has to be specified.
  • When this script finishes, it prints to the console whether or not the two sets of generated target data are the same.
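
The comparison above amounts to checking whether two sets of triples are equal. The following is a minimal sketch of that check, not the implementation of the Target comparator utility: it assumes that both results are available as (subject, predicate, object) tuples and that they contain no blank nodes, which would require a graph isomorphism check instead of plain set equality. The name compare_targets is hypothetical.

```python
from typing import Dict, Iterable, Set, Tuple, Union

Triple = Tuple[str, str, str]


def compare_targets(
    generated: Iterable[Triple], handcrafted: Iterable[Triple]
) -> Dict[str, Union[bool, Set[Triple]]]:
    """Compare two target datasets as sets of triples.

    Assumption: no blank nodes, so equal graphs have identical triple sets.
    """
    gen, hand = set(generated), set(handcrafted)
    return {
        "same": gen == hand,
        "missing": hand - gen,      # produced only by the handcrafted mappings
        "unexpected": gen - hand,   # produced only by the generated mappings
    }
```

Reporting the two differences, and not only a yes/no answer, makes it easier to see which triples cause a mismatch.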

 

Scalability

  • Download the Evaluator utility.
  • The data exchange problems that we have used to evaluate MostoDEx, which have been generated using MostoBM, can be found here.
  • Use "run-evaluator.bat" to run the evaluation. In this script, it is necessary to specify the folder that comprises the data exchange problems to evaluate, the data exchange pattern (2 = Lift Properties, 3 = Sink Properties, 5 = Extract Subclasses, and 6 = Extract Superclasses), and the number of problems that we wish to evaluate (in the IF NOT line).
  • When this script finishes, it creates a file called "output.txt" inside the folder that comprises the data exchange problems; this file contains the evaluation results.
  • The scalability results of our proposal can be found here.