| 92 | === Comparing items === |
| 93 | |
| 94 | The similarity of two items is defined through the similarity of their features. The outline of a potential algorithm looks like this: |
| 95 | |
| 96 | * The similarity of two items is the arithmetic mean of the similarities of all features |
| 97 | * Features that are annotated in neither item are ignored |
| 98 | * If a feature is annotated in only one item, its value in the other item needs to be inferred from the hierarchy until the two can be compared |
| 99 | |
| 100 | Example: |
| 101 | |
| 102 | Consider a simple feature hierarchy: |
| 103 | {{{ |
| 104 |   A |
| 105 |  / \ |
| 106 | B   C |
| 107 | }}} |
| 108 | |
| 109 | Example similarities ("-" means "not annotated"): |
| 110 | {{{ |
| 111 |   1        -1 |
| 112 |  / \   ;   / \    => similarity = 0 |
| 113 | -   -     -   - |
| 114 | }}} |
| 115 | {{{ |
| 116 |    1         1 |
| 117 |   / \   ;   / \    => similarity = (1 + 0) / 2 = 0.5 |
| 118 | -1   -     1   - |
| 119 | }}} |
| 120 | |
| 121 | Some non-trivial cases: |
| 122 | {{{ |
| 123 |    -        -1                               -         - |
| 124 |   / \   ;   / \    => similarity = ?        / \   ;   / \    => similarity = ? |
| 125 | -1   -     -   -                          +1   -     -   -1 |
| 126 | }}} |
| 127 | |
| 128 | Suggestion: Propagate possible values as intervals up or down the hierarchy |
| 129 | |
| 130 | We extend the distance metric to intervals. Let x1 and x2 denote the bounds of the interval x = [x1;x2]; if x1 = x2, we simply write [x1], which is equivalent to the single value x1. |
| 131 | {{{ |
| 132 | dist(x,y) = (|x1-y1| + |x2-y2|) / 2 |
| 133 | }}} |
| 134 | |
| 135 | With this metric, the first of the non-trivial cases above can be compared once the annotated values have been propagated as intervals: |
| 136 | {{{ |
| 137 |    -        -1           [-1;+1]       -1 |
| 138 |   / \   ;   / \    =>      / \    ;    / \    => similarity = (sim(-1,[-1]) + sim([-1;+1],-1)) / 2 = (1 + 0.5) / 2 = 0.75 |
| 139 | -1   -     -   -         -1   -    [-1]   - |
| 140 | }}} |
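
The interval distance and the averaging step could be sketched in Python roughly as follows. This is only an illustration under stated assumptions, not a reference implementation: the names `to_interval`, `sim` and `item_similarity` are hypothetical, and the mapping `sim = 1 - dist / 2` is not given in the text above. It is assumed here because feature values appear to lie in [-1;+1] (so the largest possible distance is 2) and because it reproduces the similarities 0, 0.5 and 0.75 from the examples. Propagation through the hierarchy is not implemented; the function expects values that have already been propagated.

```python
from statistics import mean


def to_interval(x):
    """Return x as a (lower, upper) pair; a plain number v becomes (v, v)."""
    if isinstance(x, (int, float)):
        return (float(x), float(x))
    lower, upper = x
    return (float(lower), float(upper))


def dist(x, y):
    """Interval distance from the text: (|x1 - y1| + |x2 - y2|) / 2."""
    (x1, x2), (y1, y2) = to_interval(x), to_interval(y)
    return (abs(x1 - y1) + abs(x2 - y2)) / 2


def sim(x, y):
    """Assumed mapping from distance to similarity for values in [-1, +1]."""
    return 1 - dist(x, y) / 2


def item_similarity(a, b):
    """Arithmetic mean of the feature similarities of two items.

    a and b map feature names to a value or a (lower, upper) interval.
    A missing key or None means "not annotated"; features that are not
    available in both items are ignored. Returns None if nothing is comparable.
    """
    shared = [f for f in a if a.get(f) is not None and b.get(f) is not None]
    if not shared:
        return None
    return mean(sim(a[f], b[f]) for f in shared)


# Worked example from the text, after propagation:
item1 = {"A": (-1, 1), "B": -1}
item2 = {"A": -1, "B": (-1, -1)}
print(item_similarity(item1, item2))  # 0.75
```

Representing an unannotated feature as a missing key (or `None`) keeps the "ignore unannotated features" rule separate from the metric itself, so a propagation step could be added later without changing `dist` or `sim`.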