wiki:FloriansDiplomaThesis

Version 12 (modified by kiesel, 16 years ago)

Meeting 2008-08-22

In progress

MP3Importer

Things to do

  • Adaptation for music domain
    • finish skiptrax ontology (see Music Ontology) - 1-3 weeks
      • handle remixes/live/unplugged (song variants)
      • discern song and file - model file ownership
    • Proper GUI with the following functions:
      • metadata importers/exporters (ID3, LastFM, etc.) - 1-2 weeks
      • (adapter for own audio collection)
      • simplify entering song/band information - 1 week
      • (generate/play playlists, possibly with changing style over time (AutoDJ) - 2-6 weeks)
  • Metadata-based functionality
    • different sharing rights for contacts (e.g. A only ontologies, B also metadata, C everything) - <1 week
    • expert recommender (ask friends about people having knowledge about X/interests in Y) - 1-3 weeks
    • trust visualization
    • scalable algorithms - here "scalable" means having the form: void algo(Graph oldFacts, Graph additionalFacts, Datastructure internalData, Notifications interestingThings)
      • computation of feature summaries - 1 week
      • feature/item similarity measure - 2 weeks
      • structured search (index)
      • implicit trust (= peer competence) metric - 2-5 weeks
      • explicit representation of trust (see Skippies ontology) - 2-5 weeks
        • user interface changes for that
        • trust metric changes for that
        • by combining computed and explicit trust ratings, identifying shared interests/fields of competence should be possible.
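
The "scalable" signature in the list above can be read as an incremental update: the algorithm is handed only the newly added facts plus its own persistent state, so it never has to recompute from scratch. Below is a minimal Python sketch for the feature-summary case. It assumes that facts are (user, item, feature, value) tuples, that a summary is the mean value per (item, feature), and that a summary becomes "interesting" when it first reaches a threshold. These representations, the threshold and all names are illustrative assumptions, not Skipforward's actual data model.

```python
from collections import defaultdict

def update_feature_summaries(old_facts, additional_facts, internal_data,
                             notifications, threshold=0.5):
    """Fold additional_facts into running means kept in internal_data.

    old_facts is accepted only to mirror the signature from the notes; it is
    not re-scanned, because internal_data already summarizes it.
    """
    for _user, item, feature, value in additional_facts:
        key = (item, feature)
        count, total = internal_data[key]
        old_mean = total / count if count else None
        count, total = count + 1, total + value
        internal_data[key] = (count, total)
        new_mean = total / count
        # Notify only when a summary newly reaches the (assumed) threshold.
        if new_mean >= threshold and (old_mean is None or old_mean < threshold):
            notifications.append((item, feature, new_mean))

facts = []                               # everything seen so far
state = defaultdict(lambda: (0, 0.0))    # (item, feature) -> (count, sum)
events = []                              # collected notifications

for batch in ([("alice", "song1", "melancholic", 1.0)],
              [("bob", "song1", "melancholic", 0.0)]):
    update_feature_summaries(facts, batch, state, events)
    facts.extend(batch)
```

The cost of each call depends on the size of the new batch only, which is the property the signature seems to be after.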

[RelatedWork]

Ideas

Skipforward is intended to provide content-based recommendations, but there might be some cool variants that combine it with collaborative filtering/recommendation:

Collaborative filtering per feature

Instead of the boring "people who like this also like that" recommendations, this could be adapted in the following way:

People who like A also like B => People who think A has feature F also think that B has feature F
People who like the same items as you also like this => People who annotated items the same way as you did annotated this item that way

The last strategy could be used to rate items that have not been reviewed by the user but have been reviewed by the user's friends.
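
A minimal Python sketch of the second strategy: weight each friend's annotation of an unreviewed item by how often that friend agreed with the user on items both have annotated. The boolean annotation format, the agreement measure and all names are assumptions made for illustration. The notes do not fix a concrete metric.

```python
def agreement(mine, theirs):
    """Fraction of shared (item, feature) pairs annotated identically."""
    shared = mine.keys() & theirs.keys()
    if not shared:
        return 0.0
    return sum(mine[k] == theirs[k] for k in shared) / len(shared)

def predict(user, friends, annotations, item, feature):
    """Agreement-weighted vote of friends on whether item has feature.

    Returns a score in [0, 1], or None if no friend with non-zero
    agreement has annotated this (item, feature) pair.
    """
    mine = annotations[user]
    weighted_sum = weight_total = 0.0
    for friend in friends:
        theirs = annotations[friend]
        if (item, feature) not in theirs:
            continue
        weight = agreement(mine, theirs)
        weighted_sum += weight * (1.0 if theirs[(item, feature)] else 0.0)
        weight_total += weight
    return weighted_sum / weight_total if weight_total else None

annotations = {
    "me": {("A", "fast"): True, ("B", "fast"): False},
    "f1": {("A", "fast"): True, ("B", "fast"): False, ("C", "fast"): True},
    "f2": {("A", "fast"): False, ("B", "fast"): True, ("C", "fast"): False},
}
score = predict("me", ["f1", "f2"], annotations, "C", "fast")
```

Here f1 agrees with the user on every shared annotation and f2 on none, so only f1's opinion about item C counts.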

Music Ontology

Artists

Artist collaboration

At first glance, a single class Artist might seem sufficient, but this quickly reveals its limits. A few examples:

  1. Chicane feat. Tom Jones - Stoned in Love (at discogs.com) (at musicbrainz.com)
  2. Cerf, Mitiska & Jaren - Light The Skies (at discogs.com)

Let's forget for a moment that these are in fact releases and not songs, and concentrate on the artists. In the first case there are two artists, Chicane and Tom Jones. Or is there one artist called Chicane feat. Tom Jones? MusicBrainz has a special FeaturingArtistStyle, whereas Discogs just lists multiple artists.

The second case is a bit more difficult: there is an artist on Discogs called ''Cerf, Mitiska & Jaren'', but it consists of three members: Jaren, Shawn Mitiska and Matt Cerf.

Discogs has the following guidelines for artists, which say that "Artists which commonly collaborate together should be listed as one artist" and "Do NOT attempt to split artists who regularly collaborate. (Regular collaboration consists of 3 or more collaborations (different releases), excluding remix EPs)".
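
The quoted rule is mechanical enough to sketch in Python. The Release class and the function below are hypothetical helpers, not part of any Discogs API. Only "Light The Skies" comes from the examples above. The other release titles are placeholders.

```python
from dataclasses import dataclass

@dataclass(frozen=True)
class Release:
    title: str
    artists: frozenset
    is_remix_ep: bool = False

def is_regular_collaboration(artists, releases, minimum=3):
    """True if the artists appear together on at least `minimum`
    different releases, not counting remix EPs."""
    wanted = frozenset(artists)
    joint = {r for r in releases
             if wanted <= r.artists and not r.is_remix_ep}
    return len(joint) >= minimum

trio = {"Matt Cerf", "Shawn Mitiska", "Jaren"}
releases = [
    Release("Light The Skies", frozenset(trio)),
    Release("Placeholder Release B", frozenset(trio)),
    Release("Placeholder Remix EP", frozenset(trio), is_remix_ep=True),
]
before = is_regular_collaboration(trio, releases)   # remix EP is ignored
releases.append(Release("Placeholder Release C", frozenset(trio)))
after = is_regular_collaboration(trio, releases)
```

Under this rule the trio would be listed as one artist only once a third regular release exists.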

Artist Aliases

A different problem is that the same artist can have different names. This is rather easy with music groups, because it can be modelled accurately by two different music groups having the same members.
With solo artists (persons), however, one has to distinguish name variations (DJ Tiësto, Tiësto, Tiesto, ...) from different names (aliases) under which the artist releases music (Drumfire is an alias of DJ Tiësto).

To make this mess even worse, there are fictitious artists, e.g. Bernd das Brot.

http://musicbrainz.org/doc/ArtistAlias http://musicbrainz.org/doc/FeaturingArtistStyle
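
One possible way to keep name variations and aliases apart is sketched below in Python. All class and field names are assumptions for illustration and do not reflect the skiptrax ontology or the MusicBrainz/Discogs schemas.

```python
from dataclasses import dataclass, field

@dataclass
class ArtistName:
    canonical: str
    variations: set = field(default_factory=set)   # spellings of the same name

    def matches(self, text):
        return text == self.canonical or text in self.variations

@dataclass
class SoloArtist:
    primary: ArtistName
    aliases: list = field(default_factory=list)    # genuinely different names
    fictitious: bool = False

    def resolve(self, text):
        """Return 'primary', 'alias', or None if the name is unknown."""
        if self.primary.matches(text):
            return "primary"
        if any(alias.matches(text) for alias in self.aliases):
            return "alias"
        return None

@dataclass
class MusicGroup:
    name: str
    members: frozenset

def same_lineup(a, b):
    """Two differently named groups are the same act if their members match."""
    return a.members == b.members

tiesto = SoloArtist(
    primary=ArtistName("DJ Tiësto", {"Tiësto", "Tiesto"}),
    aliases=[ArtistName("Drumfire")],
)
bernd = SoloArtist(primary=ArtistName("Bernd das Brot"), fictitious=True)
```

With this split, tiesto.resolve("Tiesto") reports a variation of the primary name, while tiesto.resolve("Drumfire") reports an alias.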

Conclusion

Even though Discogs and MusicBrainz have put much thought into this, they cannot cover all cases, and there are always exceptions. We have to come up with a solution that is specific enough to fit our purpose, but simple enough to be used by non-nerds.

more to follow

Meetings

2008-08-22

  • Keep Trac pages and tickets updated
  • Evaluation...
    • Song annotation: Let users annotate one set of fixed songs (to get good overlap) and other songs of their choice (to test recommendations)
    • Same for feature sets to annotate: Create one certificate that must be completed and let users complete other features freely
    • Evaluating recall will be a problem and probably less relevant than evaluating precision (see here). Still, do not omit recall completely; best collect user comments ("I'd have expected to get X here").
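
To make the precision/recall point concrete, here is a small Python sketch. It assumes the set of relevant songs is approximated by the recommended songs the user liked plus the songs the user said were missing. Because users will not name every missing song, the resulting recall is an optimistic estimate. The function and variable names are illustrative.

```python
def precision_recall(recommended, relevant):
    """Precision and recall of a recommendation list against a relevant set."""
    recommended, relevant = set(recommended), set(relevant)
    hits = recommended & relevant
    precision = len(hits) / len(recommended) if recommended else 0.0
    recall = len(hits) / len(relevant) if relevant else 0.0
    return precision, recall

recommended = {"s1", "s2", "s3", "s4"}
liked = {"s1", "s2"}          # recommended songs the user approved of
missing = {"s9"}              # from comments: "I'd have expected to get X here"

precision, recall = precision_recall(recommended, liked | missing)
```

Precision needs only the user's verdict on what was shown, which is why it is the easier of the two to evaluate.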

2008-08-18

  • Create paper prototype for importer/annotator GUI
  • Write/complete wiki page for importer

2008-07-30

  • Assemble MP3 file test collection
  • Manually create baseline
  • Create unit tests for importer

2008-07-11

  • Put global file id data in skip:// namespace (in normal store) - including md5sum, size, bitrate, file basename (normal OS info; perhaps also specialized "core md5sum/core mp3 size")
    • ItemName is basename/Skipmedia:File
    • Global file id data is useful for fast lookup of file (without "importing" them) - lurker use case
  • Put local file id data in file:// namespace (again, in normal store) - won't be shared. Include localpath, modifieddate, etc. Use own properties/ontology for this.
  • Importer should keep "open questions" list (and persist this in RDF store, again as local non-shared info - proper import might take several weeks)
  • Idea: Annotate certificates and recommendations with "first issued" date
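
The split between shareable and local file data could look roughly like this in Python. The dictionary keys follow the property names in the notes. How these map onto skip:// and file:// URIs is left open here. Bitrate is omitted because it requires parsing the MP3 stream.

```python
import hashlib
import os
from datetime import datetime, timezone

def global_file_data(path):
    """Shareable identification data: independent of where the file lives."""
    md5 = hashlib.md5()
    with open(path, "rb") as handle:
        # Hash in chunks so large audio files need not fit in memory.
        for chunk in iter(lambda: handle.read(65536), b""):
            md5.update(chunk)
    return {
        "md5sum": md5.hexdigest(),
        "size": os.path.getsize(path),
        "basename": os.path.basename(path),
    }

def local_file_data(path):
    """Machine-specific data that should never be shared."""
    stat = os.stat(path)
    modified = datetime.fromtimestamp(stat.st_mtime, tz=timezone.utc)
    return {
        "localpath": os.path.abspath(path),
        "modifieddate": modified.isoformat(),
    }
```

A lookup by md5sum and size against the shared data would then identify a file without importing it, which matches the lurker use case.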

2008-07-04

  • Skipmedia: Thing->MediaInstance->File->(MP3,OGG,FLAC) - bitrate and md5sum are attributes (not Features!) of File.
  • Some XMPP/bot discussions. Music importer should import into user's namespace.
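
The Skipmedia hierarchy noted above, written out as plain Python classes for illustration. The real ontology would live in RDF. The name field and the kbit/s unit for bitrate are assumptions.

```python
from dataclasses import dataclass

@dataclass
class Thing:
    name: str

@dataclass
class MediaInstance(Thing):
    pass

@dataclass
class File(MediaInstance):
    # Plain attributes of the file, not Features to be annotated by users.
    bitrate: int      # assumed to be in kbit/s
    md5sum: str

@dataclass
class MP3(File):
    pass

@dataclass
class OGG(File):
    pass

@dataclass
class FLAC(File):
    pass

example = MP3(name="track01.mp3", bitrate=192,
              md5sum="d41d8cd98f00b204e9800998ecf8427e")
```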

2008-06-27

  • Skiptrax (Album, Record, etc. part) looks fine.
  • TODO
    • Importer: Have a look at GNAT and MusicBrainz services.
    • Create Skipmedia ontology (File, md5sum, etc.).
    • Implement Jabber chat bot as preliminary frontend?
    • Think about GUI.