TDG HassanA. Sleiman Committed to research! TDG Site Manager 1.1

Blog

V 1.0 of the Framework is now Available!

Tuesday, April 05, 2011 - 09:36:48 AM

We have published version 1.0 of our framework for the research community. We are working on the user manuals and the Javadocs, which will be published as soon as possible. We have also published a collection of datasets that can be used to assess and compare new proposals.

To use the framework, you may use one of these three commands:

Usage:
-------------------------------------1/3---------------------------------------
java -jar IE-1.0.0.jar learner technique trainingset rules [tokeniser]
-------------------------------------2/3---------------------------------------
java -jar IE-1.0.0.jar tester rules testset
-------------------------------------3/3---------------------------------------
java -jar IE-1.0.0.jar interpreter dataset rules [output]

 

For more information, please contact me!

Coming soon

Tuesday, November 16, 2010 - 12:02:43 PM

A first version of the information extraction framework will be released soon. This version will support working with datasets and will include a default information extraction algorithm, along with many utilities for building and testing new proposals.

Besides the framework, a set of datasets to train and test algorithms will be available too. For more information please do not hesitate to contact me.

Reification in Jena

Tuesday, July 27, 2010 - 10:01:14 AM

Reification aims at attaching properties to other properties, or more precisely to the statements that use them. For example, suppose we would like to add a ModifiedBy property to each property of our ontology, recording the last user who modified it. This is possible with statement reification, which converts a statement into an individual with a unique URI that can have its own properties.

Although reification is very important when working with ontologies, Jena [1] still has a bug when statements are reified. We recently found this bug, which was reported to the Jena bug list [2].

Here we can see how the reification is lost after combining more than one OntModel:

import com.hp.hpl.jena.ontology.*;   // Jena 2.x package names
import com.hp.hpl.jena.rdf.model.*;

public class ReificationLost {
    public static void main(String[] args) {
        // Sub-model: a Book class, a title property and one individual
        OntModel subModel = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
        OntClass bookClass = subModel.createClass("foo:Book");
        DatatypeProperty hasTitleProperty = subModel.createDatatypeProperty("foo:hasTitle");
        DatatypeProperty hasLocator = subModel.createDatatypeProperty("foo:hasLocator");
        hasTitleProperty.setDomain(bookClass);
        Individual book1 = subModel.createIndividual(bookClass);
        book1.addProperty(hasTitleProperty, "Title1");

        // Reify the title statement and attach a property to the reification
        StmtIterator stmtIt = subModel.listStatements(book1, hasTitleProperty, "Title1");
        while (stmtIt.hasNext()) {
            Statement propertyStatement = stmtIt.nextStatement();
            ReifiedStatement reifiedStatement = propertyStatement.createReifiedStatement();
            reifiedStatement.addProperty(hasLocator, "10");
        }
        System.out.println("Number of reified statements before adding it to a Model :"
                + subModel.listReifiedStatements().toList().size());

        // Add the sub-model to a fresh model and count the reified statements again
        OntModel superModel = ModelFactory.createOntologyModel(OntModelSpec.OWL_MEM);
        superModel.addSubModel(subModel);
        System.out.println("Number of reified statements after adding it to a Model :"
                + superModel.listReifiedStatements().toList().size());
    }
}

The result obtained is:

Number of reified statements before adding it to a Model :1
Number of reified statements after adding it to a Model :0

[1] http://jena.sourceforge.net/

[2] http://sourceforge.net/mailarchive/forum.php?forum_name=jena-bugs&max_rows=25&style=ultimate&viewmonth=20100

 

 

Regular Expression Libraries

Wednesday, July 21, 2010 - 01:45:06 PM

Not all existing regular expression libraries satisfy our needs. The main problems are that many of them use a recursive implementation rather than an automaton to search for matches, and that they match greedily. Another problem we discovered recently: in a disjunctive regular expression, the longest match should be selected, whereas most existing libraries return the match of the first alternative of the disjunction that succeeds. An example:

String regex = "([A-Z][a-z]+)|([A-Z]+)|(\\w+)|([0-9]+)";
String input = "XaaaaYXaaaaYXaaaaYXaaaaY";

The correct match should be "XaaaaYXaaaaYXaaaaYXaaaaY", which is only returned by the gnu-regexp-1.1.4 library. Below are some of the libraries we tried and their results:

 

System.out.println("JINT");
System.out.println("...............................................................");
            
kmy.regex.util.Regex regexpr = kmy.regex.util.Regex.createRegex(regex);
if(regexpr.matches(input))
   System.out.println(regexpr.getMatchString()); 

//Result:

JINT
...............................................................
Xaaaa

Another library:

System.out.println("dk.brics.automaton");
System.out.println("...............................................................");

// Caveat: dk.brics.automaton.RegExp defines no \w class; a backslash only
// quotes the next character, so the (\\w+) alternative matches literal 'w'
// characters in this library.
dk.brics.automaton.RegExp reg = new dk.brics.automaton.RegExp(regex);
dk.brics.automaton.Automaton automaton = reg.toAutomaton();
dk.brics.automaton.RunAutomaton runAutomaton = new dk.brics.automaton.RunAutomaton(automaton);
dk.brics.automaton.AutomatonMatcher autoMatcher = runAutomaton.newMatcher(input);

// Print every match found in the input
while (autoMatcher.find()) {
    System.out.println(autoMatcher.group());
}

//Result:

dk.brics.automaton
...............................................................
Xaaaa
YX
YX
YX
Y
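
For comparison, the difference between "first alternative wins" and "longest alternative wins" can be sketched with the JDK's own java.util.regex, which is not one of the libraries tested above. The class name LongestAlternative and the helper longestMatchAt are hypothetical names introduced only for this illustration. The helper tries every alternative separately at the same start position and keeps the longest result. This is only a rough approximation of longest-match semantics: it relies on each alternative's own greedy match being its longest one, which holds for the simple alternatives used here but not for arbitrary patterns.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

public class LongestAlternative {

    // Hypothetical helper: tries each alternative anchored at 'from' and
    // returns the longest text matched by any of them, or null if none match.
    static String longestMatchAt(String input, int from, String... alternatives) {
        String best = null;
        for (String alt : alternatives) {
            Matcher m = Pattern.compile(alt).matcher(input);
            m.region(from, input.length());
            // lookingAt() requires the match to begin at the region start
            if (m.lookingAt() && (best == null || m.group().length() > best.length())) {
                best = m.group();
            }
        }
        return best;
    }

    public static void main(String[] args) {
        String regex = "([A-Z][a-z]+)|([A-Z]+)|(\\w+)|([0-9]+)";
        String input = "XaaaaYXaaaaYXaaaaYXaaaaY";

        // java.util.regex takes the first alternative that succeeds
        Matcher m = Pattern.compile(regex).matcher(input);
        if (m.find()) {
            System.out.println(m.group()); // prints Xaaaa
        }

        // Trying the alternatives one by one and keeping the longest match
        System.out.println(longestMatchAt(input, 0,
                "[A-Z][a-z]+", "[A-Z]+", "\\w+", "[0-9]+")); // prints the whole input
    }
}
```

Compiling one pattern per call keeps the sketch short. In real use the alternatives would be compiled once and reused.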