Releases: rdfio/rdf2smw

v0.6 - Bugfix: Don't allow commas in facts and titles, as the comma is used as a separator

05 Apr 10:41

This release mainly fixes a bug with commas in titles. See commit b25a31.
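
As a rough illustration of why commas are a problem: if several values for the same variable are joined into one comma-separated list (as the v0.3 notes describe), a comma inside a single value would be read as a separator. The sketch below is not the code from the commit; `stripCommas` is a hypothetical helper, and replacing commas with spaces is only one possible policy.

```go
package main

import (
	"fmt"
	"strings"
)

// stripCommas is a hypothetical helper (not taken from rdf2smw). It replaces
// every comma with a space and then collapses runs of whitespace, so that the
// result can safely be used as one item in a comma-separated list.
func stripCommas(s string) string {
	return strings.Join(strings.Fields(strings.ReplaceAll(s, ",", " ")), " ")
}

func main() {
	fmt.Println(stripCommas("Berlin, Germany"))
}
```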

v0.5 - More solid inferencing of template name

17 Aug 18:48

This release is very similar to v0.4, but greatly improves the logic that figures out which template name to use when writing a page's facts: it now uses the page's category with the longest chain of super-categories, so as to get as specific a template/category name as possible.
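
The selection rule can be sketched as follows. This is a minimal sketch under assumptions, not the actual implementation: the `superCategories` map, `chainLength`, and `mostSpecificCategory` are hypothetical names, and the example categories are made up.

```go
package main

import "fmt"

// superCategories maps a category to its direct super-categories.
// This data structure is an assumption made for the sketch.
type superCategories map[string][]string

// chainLength returns the length of the longest chain of super-categories
// above cat. The seen map guards against cycles in the category graph.
func chainLength(cat string, supers superCategories, seen map[string]bool) int {
	if seen[cat] {
		return 0
	}
	seen[cat] = true
	defer delete(seen, cat)

	longest := 0
	for _, parent := range supers[cat] {
		if n := 1 + chainLength(parent, supers, seen); n > longest {
			longest = n
		}
	}
	return longest
}

// mostSpecificCategory picks the category with the longest super-category
// chain. On a tie, the first category in the slice wins.
func mostSpecificCategory(cats []string, supers superCategories) string {
	best, bestLen := "", -1
	for _, c := range cats {
		if n := chainLength(c, supers, map[string]bool{}); n > bestLen {
			best, bestLen = c, n
		}
	}
	return best
}

func main() {
	supers := superCategories{
		"Enzyme":  {"Protein"},
		"Protein": {"Molecule"},
	}
	// "Enzyme" has two super-categories above it, so it is the most specific.
	fmt.Println(mostSpecificCategory([]string{"Molecule", "Enzyme", "Protein"}, supers))
}
```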

See v0.4 release notes for more info.

v0.4 - Powerful new features

17 Aug 00:34
Pre-release

Latest release, including some powerful new features and fixes.

Some highlights:

  • Infer and write type info to property pages (e4d84ac)
  • Write template pages, based on properties used (32b7873)
    • Facts are written using a nicely formatted table by default, but templates can of course be fine-tuned "by hand" to your liking after the import is done.
  • Write templates and properties to separate files, for more efficient importing (also 32b7873)

These additions should decrease the amount of manual work needed even more.
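
To give an idea of what inferring type info for a property page could look like, here is a minimal sketch. The mapping from XSD datatypes to Semantic MediaWiki types is an assumption made for illustration and is not taken from the rdf2smw source; `smwType` and `propertyPageText` are hypothetical names. The `[[Has type::...]]` annotation is the standard Semantic MediaWiki way to declare a property's type.

```go
package main

import "fmt"

const xsd = "http://www.w3.org/2001/XMLSchema#"

// smwType guesses a Semantic MediaWiki datatype from the object of a triple.
// isURI is true when the object is a resource rather than a literal.
// The mapping below is an illustrative assumption, not rdf2smw's own table.
func smwType(xsdDatatype string, isURI bool) string {
	if isURI {
		return "Page"
	}
	switch xsdDatatype {
	case xsd + "integer", xsd + "decimal", xsd + "double", xsd + "float":
		return "Number"
	case xsd + "date", xsd + "dateTime":
		return "Date"
	case xsd + "boolean":
		return "Boolean"
	default:
		return "Text"
	}
}

// propertyPageText returns the wikitext that declares the type on a
// property page.
func propertyPageText(typeName string) string {
	return "[[Has type::" + typeName + "]]"
}

func main() {
	fmt.Println(propertyPageText(smwType(xsd+"integer", false)))
}
```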

v0.3 - Many bug fixes and important performance improvements

15 Aug 22:31

This release marks the point where somewhat usable results, with reasonable processing time (< 20 s), have been achieved for datasets of around 0.5M triples.

See the commit history for more details, but some highlights:

  • Don't add duplicate facts or categories
  • Shorten titles to MediaWiki's max
  • Fixed silly code that allocated insane amounts of memory
  • Better RDF parsing error checking
  • Collapse multiple arguments to the same variable into a comma-separated list
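
Three of these points (no duplicates, shortened titles, comma-separated values) can be sketched as below. This is an illustration under assumptions rather than the project's code: `truncateTitle` and `collapseValues` are hypothetical names, and the 255-byte limit is MediaWiki's documented maximum title length as far as I know, stated here as an assumption.

```go
package main

import (
	"fmt"
	"strings"
	"unicode/utf8"
)

// maxTitleBytes is assumed to be MediaWiki's title limit (255 bytes).
const maxTitleBytes = 255

// truncateTitle shortens a title to at most maxTitleBytes bytes, backing up
// to a rune boundary so that no multi-byte UTF-8 character is cut in half.
func truncateTitle(title string) string {
	if len(title) <= maxTitleBytes {
		return title
	}
	cut := maxTitleBytes
	for cut > 0 && !utf8.RuneStart(title[cut]) {
		cut--
	}
	return title[:cut]
}

// collapseValues drops duplicate values (keeping first-seen order) and joins
// the rest into one comma-separated string.
func collapseValues(values []string) string {
	seen := make(map[string]bool, len(values))
	unique := make([]string, 0, len(values))
	for _, v := range values {
		if !seen[v] {
			seen[v] = true
			unique = append(unique, v)
		}
	}
	return strings.Join(unique, ",")
}

func main() {
	fmt.Println(collapseValues([]string{"red", "green", "red"}))
	fmt.Println(len(truncateTitle(strings.Repeat("x", 300))))
}
```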

The usage is also slightly updated, with a dedicated flag for the out-file:

./rdf2smw --in mydataset.nt --out mydataset.xml

v0.2: Writing facts via template calls and better performance

05 Aug 15:49

The main new stuff in this release is:

Writing facts as template calls

Facts can now be written as a call to the template named after the RDF subject's first category (often there is only one anyway), with the template's variable names taken from the names of the properties.

This gives you a lot of freedom in how to format the presentation of the data: you can go in later and implement the corresponding template (it does not have to be implemented beforehand).
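
A minimal sketch of what such a template call could look like when generated. The exact layout that rdf2smw emits may differ; `templateCall` is a hypothetical helper, and the example category and properties are made up. Only the general `{{Template|name=value}}` syntax is standard MediaWiki.

```go
package main

import (
	"fmt"
	"sort"
	"strings"
)

// templateCall renders a MediaWiki template call, using the category name as
// the template name and one parameter per property. Parameters are sorted by
// name so that the output is deterministic.
func templateCall(category string, facts map[string]string) string {
	names := make([]string, 0, len(facts))
	for name := range facts {
		names = append(names, name)
	}
	sort.Strings(names)

	var b strings.Builder
	b.WriteString("{{" + category + "\n")
	for _, name := range names {
		b.WriteString("|" + name + "=" + facts[name] + "\n")
	}
	b.WriteString("}}")
	return b.String()
}

func main() {
	fmt.Println(templateCall("Person", map[string]string{
		"name": "Alice",
		"age":  "42",
	}))
}
```

Because the page only contains the call, the template named `Person` in this example can be created or restyled after the import without touching the imported pages.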

Better performance

After removing some silly code that unnecessarily split XML chunks into lines, we got a 50% speedup and are now up to the following numbers:

720324 triples were converted into 181184 wiki pages in ca 18.5 sec. This gives:

  • ~ 38.9K RDF triples/s ... converted into:
  • ~ 9.8K wiki pages/s
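
The rates above follow directly from the counts and timings. The small sketch below recomputes them, and also compares against the 28 s run reported for v0.1; `perSecond` is just an illustrative helper.

```go
package main

import "fmt"

// perSecond returns how many items were processed per second.
func perSecond(items int, seconds float64) float64 {
	return float64(items) / seconds
}

func main() {
	const triples, pages = 720324, 181184

	fmt.Printf("v0.2: %.1fK triples/s, %.1fK pages/s\n",
		perSecond(triples, 18.5)/1000, perSecond(pages, 18.5)/1000)
	fmt.Printf("v0.1: %.1fK triples/s, %.1fK pages/s\n",
		perSecond(triples, 28)/1000, perSecond(pages, 28)/1000)

	// Throughput gain of the 18.5 s run over the 28 s run, in percent.
	fmt.Printf("throughput gain: %.0f%%\n", (28.0/18.5-1)*100)
}
```

Going from 28 s to 18.5 s for the same input is a throughput gain of roughly 51%, which matches the "50% speedup" mentioned above.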

v0.1: Basic RDF to Semantic MediaWiki conversion works

02 Aug 18:21

This initial release marks the first version featuring a fully working conversion from RDF N-Triples to Semantic MediaWiki facts in MediaWiki XML dump format, which can easily be imported into a Semantic MediaWiki installation.

Some planned features, like adding facts via templates, remain to be implemented.

Some performance numbers to give a hint (zero optimizations done so far):

720324 triples were converted into 181184 wiki pages in ca 28 sec. This gives:

  • ~26000 triples / sec
  • ~6500 pages / sec

Note: The included binary is compatible with 64 bit Linux only.

EDIT 4 Aug 2016: Updated performance numbers. Something strange (maybe too little spare memory) seems to have been causing the previous, much lower numbers.