In progress

MP3Importer (works for me)

Things to do

  • Adaptation for music domain
    • finish skiptrax ontology (see Music Ontology) - 1-3 weeks
      • handle remixes/live/unplugged (song variants)
      • discern song and file - model file ownership
    • Proper GUI with the following functions:
      • metadata importers/exporters (ID3, LastFM, etc.) - 1-2 weeks
      • (adapter for own audio collection)
      • simplify entering song/band information - 1 week
      • (generate/play playlists, possibly with changing style over time (AutoDJ) - 2-6 weeks)
  • Metadata-based functionality
    • different sharing rights for contacts (e.g. A only ontologies, B also metadata, C everything) - <1 week
    • expert recommender (ask friends about people having knowledge about X/interests in Y) - 1-3 weeks
    • trust visualization
    • scalable algorithms - scalable = incremental, i.e. of the form void algo(Graph oldFacts, Graph additionalFacts, Datastructure internalData, Notifications interestingThings)
      • computation of feature summaries - 1 week
      • feature/item similarity measure - 2 weeks
      • structured search (index)
      • implicit trust (= peer competence) metric - 2-5 weeks
      • explicit representation of trust (see Skippies ontology) - 2-5 weeks
        • user interface changes for that
        • trust metric changes for that
        • by combining computed and explicit trust ratings, identifying shared interests/fields of competence should be possible.
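
The "scalable = void algo(...)" signature in the list above could be read as an incremental-update contract. The following is only a sketch under that reading: Fact, IncrementalAlgorithm and FeatureCounter are hypothetical names, and plain Java lists and maps stand in for the Graph, Datastructure and Notifications types, which are not defined on this page.

```java
import java.util.List;
import java.util.Map;

/** Hypothetical stand-in for one RDF statement: a user says an item has (or lacks) a feature. */
final class Fact {
    final String user;
    final String item;
    final String feature;
    final boolean positive;

    Fact(String user, String item, String feature, boolean positive) {
        this.user = user;
        this.item = item;
        this.feature = feature;
        this.positive = positive;
    }
}

/** Assumed shape of a "scalable" algorithm: it is handed only the delta plus its own state. */
interface IncrementalAlgorithm<S> {
    void update(List<Fact> oldFacts, List<Fact> additionalFacts,
                S internalData, List<String> interestingThings);
}

/** Toy instance: keeps a count of positive annotations per feature. */
final class FeatureCounter implements IncrementalAlgorithm<Map<String, Integer>> {
    @Override
    public void update(List<Fact> oldFacts, List<Fact> additionalFacts,
                       Map<String, Integer> counts, List<String> interestingThings) {
        // Only the new facts are scanned; oldFacts is available but never re-read here.
        for (Fact f : additionalFacts) {
            if (!f.positive) {
                continue;
            }
            int n = counts.merge(f.feature, 1, Integer::sum);
            if (n == 1) {
                interestingThings.add("new feature seen: " + f.feature);
            }
        }
    }
}
```

The cost of one call depends on the size of additionalFacts only, which is presumably the point of the signature.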



Skipforward is intended to provide content-based recommendations, but there might be some cool variants of combining it with collaborative filtering/recommendation:

Collaborative filtering per feature

Instead of the boring "people who like this also like that" recommendations, this could be adapted in the following way:

People who like A also like B => People who think A has feature F also think that B has feature F
People who like the same items as you also like this => People who annotated items the same way as you annotated this item that way

The last strategy could be used to rate items that the user has not reviewed yet but their friends have.
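
A minimal sketch of the last strategy, assuming annotations are stored as +1/-1 values keyed by a hypothetical "item|feature" string. The class name PerFeatureCf, the agreement measure (share of identical co-annotations) and the weighted mean are illustrative choices, not the Skipforward implementation.

```java
import java.util.Map;

final class PerFeatureCf {
    /** Share of co-annotated keys on which both users gave the same value; 0 if there is no overlap. */
    static double agreement(Map<String, Integer> a, Map<String, Integer> b) {
        int common = 0;
        int same = 0;
        for (Map.Entry<String, Integer> e : a.entrySet()) {
            Integer other = b.get(e.getKey());
            if (other == null) {
                continue;
            }
            common++;
            if (other.equals(e.getValue())) {
                same++;
            }
        }
        return common == 0 ? 0.0 : (double) same / common;
    }

    /**
     * Agreement-weighted mean of the friends' values for one key.
     * Returns 0 if no friend with non-zero agreement annotated that key.
     */
    static double predict(Map<String, Integer> me,
                          Map<String, Map<String, Integer>> friends, String key) {
        double weighted = 0.0;
        double total = 0.0;
        for (Map<String, Integer> friend : friends.values()) {
            Integer value = friend.get(key);
            if (value == null) {
                continue;
            }
            double w = agreement(me, friend);
            weighted += w * value;
            total += w;
        }
        return total == 0.0 ? 0.0 : weighted / total;
    }
}
```

A friend who annotated everything like the user counts fully, a friend who agreed on half of the shared annotations counts half.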

Music Ontology


Artist collaboration

At first glance, one might say that only one class Artist is sufficient, but it quickly reveals its limits. A few examples:

  1. Chicane feat. Tom Jones - Stoned in Love
  2. Cerf, Mitiska & Jaren - Light The Skies

Let's forget for a moment that these are in fact releases and not songs and concentrate on the artists: in the first case there are two artists, Chicane and Tom Jones. Or is there one artist called Chicane feat. Tom Jones? MusicBrainz has a special FeaturingArtistStyle, whereas Discogs just has multiple artists.

The second case is a bit more difficult, as there is an artist on Discogs called ''Cerf, Mitiska & Jaren'', but it consists of three members: Jaren, Shawn Mitiska and Matt Cerf.

Discogs has the following guidelines for artists, which say that "Artists which commonly collaborate together should be listed as one artist" and "Do NOT attempt to split artists who regularly collaborate. (Regular collaboration consists of 3 or more collaborations (different releases), excluding remix EPs)".

Artist Aliases

A different problem is the one of the same artist having different names. The problem is rather easy with music groups, because it can be modelled accurately by two different music groups having the same members.

But with solo artists (persons), one has to distinguish name variations (DJ Tiësto, Tiësto, Tiesto, ...) from different names (aliases) under which the artist releases music (Drumfire is an alias of DJ Tiësto).

To make this mess even worse, there are fictitious artists, e.g. Bernd das Brot.
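
To make the distinctions above concrete, here is one possible shape for the data, not a decided design. Person, MusicGroup and all field and method names are hypothetical; the only ideas taken from the text are that a group is defined by its members, that name variations differ from aliases, and that an artist may be fictitious.

```java
import java.util.LinkedHashSet;
import java.util.Set;

/** A real or fictitious person with one canonical name. */
final class Person {
    final String canonicalName;
    final boolean fictitious;
    /** Different spellings of the same name. */
    final Set<String> nameVariations = new LinkedHashSet<>();
    /** Distinct names under which the person releases music. */
    final Set<String> aliases = new LinkedHashSet<>();

    Person(String canonicalName, boolean fictitious) {
        this.canonicalName = canonicalName;
        this.fictitious = fictitious;
    }

    boolean isKnownAs(String name) {
        return canonicalName.equals(name)
                || nameVariations.contains(name)
                || aliases.contains(name);
    }
}

/** A group is a name plus a set of members; collaborations can be modelled the same way. */
final class MusicGroup {
    final String name;
    final Set<Person> members = new LinkedHashSet<>();

    MusicGroup(String name) {
        this.name = name;
    }

    /** Two differently named groups with identical members are the same line-up. */
    boolean sameLineUpAs(MusicGroup other) {
        return members.equals(other.members);
    }
}
```

Person does not override equals, so the line-up comparison relies on both groups sharing the same Person objects.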


Even though the existing music databases have put much thought into it, they couldn't cover all cases and there are always exceptions. We have to come up with a solution that is specific enough to fit our cause, but simple enough to be used by non-nerds.

more to follow




  • Agenda
    • Thesis parts (finished):
      • chapter on evaluation finished
      • all chapters beautified
    • Software:
      • all bugs fixed
      • all classes and methods commented
      • cruft removed
      • TODOs/hacks marked clearly
    • Evaluation:
      • finished


  • Agenda
    • Thesis parts:
      • chapter on trust
      • chapter user-similarity complete
      • chapter MP3-Importer and Skiptrax ontology complete
      • started beautifying other chapters
    • Software functionality (fully implemented):
      • implicit and explicit trust in other users (edit and view) TODO
      • benchmarking of algorithms (testcases on real data)
      • from now on only bugfixing
    • Evaluation:
      • all test persons are finished annotating and using recommendations
      • feedback gathered


  • Agenda
    • Thesis parts:
      • content of chapter user-similarity complete (moved to 2008-11-28)
      • content of chapter MP3-Importer and Skiptrax ontology complete (moved to 2008-11-28)
    • Software functionality:
      • Possibility to trigger recalculation of similarities from outside
      • Logging
        • automatic comparing of incremental and recalculated user-similarities (not so important)
        • feedback for recommended items ("I don't like this" on recommendations)
      • user-similarity used in prediction of usefulness (done)
        • mapping from correlation [-1..+1] to weight [0..+1] (done)
        • persistence of values (done)
        • inference implemented (done)
        • calculate confidence of user similarity
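
The mapping from correlation [-1..+1] to weight [0..+1] mentioned above is not spelled out on this page. As an illustration only, the sketch below computes a Pearson correlation over co-rated items and maps it linearly with (r + 1) / 2. Both the class name UserSimilarity and the choice of a linear mapping are assumptions; clamping negative correlations to 0 would be another option.

```java
final class UserSimilarity {
    /** Pearson correlation of two users' ratings on their co-rated items; 0 if undefined. */
    static double pearson(double[] x, double[] y) {
        if (x.length != y.length) {
            throw new IllegalArgumentException("rating arrays must have the same length");
        }
        int n = x.length;
        if (n == 0) {
            return 0.0;
        }
        double meanX = 0.0;
        double meanY = 0.0;
        for (int i = 0; i < n; i++) {
            meanX += x[i];
            meanY += y[i];
        }
        meanX /= n;
        meanY /= n;
        double sxy = 0.0;
        double sxx = 0.0;
        double syy = 0.0;
        for (int i = 0; i < n; i++) {
            double dx = x[i] - meanX;
            double dy = y[i] - meanY;
            sxy += dx * dy;
            sxx += dx * dx;
            syy += dy * dy;
        }
        double denominator = Math.sqrt(sxx * syy);
        // A user with constant ratings has no defined correlation; treat as "no information".
        return denominator == 0.0 ? 0.0 : sxy / denominator;
    }

    /** Assumed linear mapping: -1 becomes 0, 0 becomes 0.5, +1 becomes 1. */
    static double toWeight(double correlation) {
        double clamped = Math.max(-1.0, Math.min(1.0, correlation));
        return (clamped + 1.0) / 2.0;
    }
}
```

With this mapping an uncorrelated user still gets weight 0.5, which may or may not be what the prediction of usefulness wants.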


  • Agenda
    • Thesis parts:
      • complete list of metrics/similarity measures including short description and examples
      • all equations and formulas are in TeX
      • terminology (done)
      • raw structure for each chapter
      • chapter describing Skipforward (done)
      • chapter describing user-similarity started (done)
    • Software functionality:
      • Skiptrax ontology completely labeled (done)
      • Importer gives simple explanations for not-importable songs
      • make recommendations filterable (exclude own items or those used for recommendation)
    • Evaluation
      • 2 test persons finished annotating 25 songs (done)
      • Instructions for test persons finished based on feedback of first 2 (done)
      • all other test persons instructed and they have already begun work (done)


  • Agenda
    • Show importer and current state of competence metric
    • Discuss roadmap
      • computation of feature summaries?
      • feature/item similarity measure?
      • magnet?


  • fix mp3 set (done)
  • let Malte test mp3 importer (2008-10-31)


  • start thesis tex (2008-10-24)
    • contents (2008-10-24)
    • terminology
    • function signatures (mathematical) (2008-10-24)
    • usage of these functions (what combinations for what purposes) (2008-10-24)
    • design music domain evaluation (2008-10-24, start)
  • test with min() as co-rated weight (2008-10-24)
  • commit code (2008-10-24) (done)
    • RDF<->FactsDb<->Item/Feature/User matrix<->ExpRecAlg<->API (no inference: 2008-10-24)
      • Matrix is used for "inference": Push positive features up, negative features down, end/start with abstract classes
    • speed/memory benchmarking logs (2008-10-24)
  • talk with Raffael/Darko (done)
  • @Malte: Proper diffs (done)
  • @Malte: portforwarding/new group on
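
The matrix "inference" noted above (push positive features up, negative features down) could look roughly like this, assuming features form a single-parent class hierarchy. FeatureHierarchy and the feature names in the usage note are hypothetical, and the sketch works on feature names rather than on the actual item/feature matrix.

```java
import java.util.ArrayDeque;
import java.util.Deque;
import java.util.HashMap;
import java.util.LinkedHashSet;
import java.util.Map;
import java.util.Set;

final class FeatureHierarchy {
    /** child feature -> more abstract parent feature */
    private final Map<String, String> parentOf = new HashMap<>();

    void addSubclass(String child, String parent) {
        parentOf.put(child, parent);
    }

    /** A positive feature implies itself and all of its more abstract ancestors. */
    Set<String> pushUp(String feature) {
        Set<String> result = new LinkedHashSet<>();
        // result.add returns false on a repeat, which also stops on accidental cycles.
        for (String f = feature; f != null && result.add(f); f = parentOf.get(f)) {
            // walking towards the root
        }
        return result;
    }

    /** A negative feature implies itself and all of its more specific descendants. */
    Set<String> pushDown(String feature) {
        Set<String> result = new LinkedHashSet<>();
        Deque<String> todo = new ArrayDeque<>();
        todo.push(feature);
        while (!todo.isEmpty()) {
            String current = todo.pop();
            if (!result.add(current)) {
                continue;
            }
            for (Map.Entry<String, String> e : parentOf.entrySet()) {
                if (current.equals(e.getValue())) {
                    todo.push(e.getKey());
                }
            }
        }
        return result;
    }
}
```

With HardRock below Rock below Genre, a positive HardRock annotation also counts for Rock and Genre, while a negative Rock annotation also rules out HardRock.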


  • @Malte: Syncing between own nodes (done)
  • Expert Recommender/Item Opinion distance metric
    • have a look at Dempster-Shafer
    • do not forget confidence values (use them as weights when aggregating)
  • start with .tex (done)
  • look at evaluations in related papers
  • test algorithm (and simplifications) with testing data (done)
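
For the note on using confidence values as weights when aggregating, a plain confidence-weighted mean is the simplest reading. This sketch is not Dempster-Shafer; OpinionAggregator is a hypothetical name, and opinions and confidences are assumed to be parallel arrays with confidences in [0..1].

```java
final class OpinionAggregator {
    /**
     * Confidence-weighted mean of the given opinion values.
     * Returns 0 if the total confidence is 0 (nobody is sure about anything).
     */
    static double weightedMean(double[] values, double[] confidences) {
        if (values.length != confidences.length) {
            throw new IllegalArgumentException("values and confidences must have the same length");
        }
        double weighted = 0.0;
        double total = 0.0;
        for (int i = 0; i < values.length; i++) {
            weighted += confidences[i] * values[i];
            total += confidences[i];
        }
        return total == 0.0 ? 0.0 : weighted / total;
    }
}
```

An opinion of +1 held with confidence 0.9 and an opinion of -1 held with confidence 0.1 aggregate to 0.8.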


  • (minor) fix bugs of importer (done)
  • add possibility in GUI to annotate multiple items at a time (web frontend or Swing GUI)
    • possibility to copy all annotations from one item to another (maybe copy/paste)
  • write "competence calculator" (expert recommender) (done)
    • public float calculateCompetence(String jid, String featureNamespace)
    • incremental
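
How competence is defined is not stated here, so the following only illustrates the "incremental" part of the calculateCompetence signature above. It assumes, purely for the example, that competence is the running mean of agreement scores in [0..1] observed for a user within one feature namespace; the class name and the observe method are hypothetical.

```java
import java.util.HashMap;
import java.util.Map;

final class CompetenceCalculator {
    /** key -> {number of observations, running mean} */
    private final Map<String, double[]> stats = new HashMap<>();

    private static String key(String jid, String featureNamespace) {
        // Assumes bare JIDs without spaces, so a space is a safe separator.
        return jid + " " + featureNamespace;
    }

    /** Folds one new agreement score into the running mean without revisiting old scores. */
    void observe(String jid, String featureNamespace, double agreement) {
        double[] s = stats.computeIfAbsent(key(jid, featureNamespace), k -> new double[2]);
        s[0] += 1.0;
        s[1] += (agreement - s[1]) / s[0];
    }

    /** Signature as on this page; returns 0 for users that were never observed. */
    public float calculateCompetence(String jid, String featureNamespace) {
        double[] s = stats.get(key(jid, featureNamespace));
        return s == null ? 0.0f : (float) s[1];
    }
}
```

Each update is O(1), so new facts can be folded in as they arrive.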


  • Keep Trac pages and tickets updated
  • Evaluation...
    • Song annotation: Let users annotate one set of fixed songs (to get good overlap) and other songs of their choice (to test recommendations)
    • Same for feature sets to annotate: Create one certificate that must be completed and let users complete other features freely
    • Evaluating recall will be a problem and probably less relevant than evaluating precision (see here). Still, do not omit recall completely; best collect user comments ("I'd have expected to get X here").


  • Create paper prototype for importer/annotator GUI
  • Write/complete wiki page for importer


  • Assemble MP3 file test collection
  • Manually create baseline
  • Create unit tests for importer


  • Put global file id data in skip:// namespace (in normal store) - including md5sum, size, bitrate, file basename (normal OS info; perhaps also specialized "core md5sum/core mp3 size")
    • ItemName is basename/Skipmedia:File
    • Global file id data is useful for fast lookup of file (without "importing" them) - lurker use case
  • Put local file id data in file:// namespace (again, in normal store) - won't be shared. Include localpath, modifieddate, etc. Use own properties/ontology for this.
  • Importer should keep "open questions" list (and persist this in RDF store, again as local non-shared info - proper import might take several weeks)
  • Idea: Annotate certificates and recommendations with "first issued" date
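
The global file id data listed above (md5sum, size, basename) can be collected with the Java standard library alone. A minimal sketch follows; GlobalFileId is a hypothetical name, and the bitrate is left out because it would require parsing the MP3 headers.

```java
import java.io.IOException;
import java.io.InputStream;
import java.nio.file.Files;
import java.nio.file.Path;
import java.security.MessageDigest;
import java.security.NoSuchAlgorithmException;

final class GlobalFileId {
    final String basename;
    final long size;
    final String md5sum;

    private GlobalFileId(String basename, long size, String md5sum) {
        this.basename = basename;
        this.size = size;
        this.md5sum = md5sum;
    }

    /** Reads the file once in chunks and returns its location-independent id data. */
    static GlobalFileId of(Path file) throws IOException, NoSuchAlgorithmException {
        MessageDigest md5 = MessageDigest.getInstance("MD5");
        byte[] buffer = new byte[8192];
        try (InputStream in = Files.newInputStream(file)) {
            int read;
            while ((read = in.read(buffer)) != -1) {
                md5.update(buffer, 0, read);
            }
        }
        StringBuilder hex = new StringBuilder();
        for (byte b : md5.digest()) {
            hex.append(String.format("%02x", b));
        }
        return new GlobalFileId(file.getFileName().toString(), Files.size(file), hex.toString());
    }
}
```

None of the three values depends on where the file is stored, which fits the split between shared global data and non-shared local data such as localpath and modifieddate.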


  • Skipmedia: Thing->MediaInstance->File->(MP3,OGG,FLAC) - bitrate,md5sum attributes (not Features!) of File.
  • Some XMPP/bot discussions. Music importer should import into user's namespace.


  • Skiptrax (Album, Record, etc. part) looks fine.
  • TODO
    • Importer: Have a look at GNAT and MusicBrainz services.
    • Create Skipmedia ontology (File, md5sum, etc.).
    • Implement Jabber chat bot as preliminary frontend?
    • Think about GUI.
Last modified on 11/21/08 11:50:25