=== Comparing items ===

The similarity of two items is defined through the similarity of their features. A possible algorithm is outlined below:

 * The similarity of two items is the arithmetic mean of the similarities of all compared features.
 * Features that are not annotated in either item are ignored.
 * If a feature is annotated in only one of the two items, feature values have to be inferred until the two items can be compared.
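
The first two points can be sketched as follows. This is only an illustration under assumptions that are not stated on this page: items are represented as dictionaries from feature name to a value in [-1, 1], unannotated features are simply absent, and the per-feature similarity is taken to be `1 - |a - b| / 2`. The names `value_similarity` and `item_similarity` are hypothetical. Features annotated in only one item are skipped here, because they need the inference step first.

```python
from statistics import mean


def value_similarity(a, b):
    """Assumed mapping for values in [-1, 1]: equal values -> 1, opposite values -> 0."""
    return 1 - abs(a - b) / 2


def item_similarity(item1, item2):
    """Arithmetic mean of the similarities of all features annotated in both items.

    Returns None when the items share no annotated feature, i.e. when
    nothing can be compared without inferring values first.
    """
    shared = [feature for feature in item1 if feature in item2]
    if not shared:
        return None
    return mean(value_similarity(item1[f], item2[f]) for f in shared)


print(item_similarity({"A": 1}, {"A": -1}))                   # 0.0
print(item_similarity({"A": 1, "B": -1}, {"A": 1, "B": 1}))   # 0.5
```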

Example:

Let there be a simple feature hierarchy as follows:
{{{
  A
 / \
B   C
}}}

Example similarities ("-" means "not annotated"):
{{{
  1        -1
 / \   ;   / \    => similarity = 0
-   -     -   -
}}}
{{{
  1         1
 / \   ;   / \    => similarity = (1 + 0) / 2 = 0.5
-1  -     1   -
}}}

Some non-trivial cases:
{{{
  -        -1
 / \   ;   / \    => similarity = ?
-1  -     -   -

  -         -
 / \   ;   / \    => similarity = ?
+1  -     -   -1
}}}

Suggestion: propagate the possible values as intervals up or down the hierarchy.

We extend the distance metric to intervals, where x1 and x2 denote the lower and upper bounds of the interval x = [x1;x2]. If x1 = x2, we simply write [x1], which is equivalent to the single value x1.
{{{
dist(x,y) = (|x1-y1| + |x2-y2|) / 2
}}}
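
The first non-trivial case can then be compared. In the first item, B = -1 is propagated up to the unannotated A as the interval [-1;+1]; in the second item, A = -1 is propagated down to B as [-1]:
{{{
  -        -1          [-1;+1]       -1
 / \   ;   / \    =>     / \     ;   / \
-1  -     -   -        -1   -     [-1]  -

similarity = (sim(-1,[-1]) + sim([-1;+1],-1)) / 2 = (1 + 0.5) / 2 = 0.75
}}}
Here dist([-1;+1],[-1]) = (|-1 - (-1)| + |+1 - (-1)|) / 2 = 1, which corresponds to the similarity of 0.5 used for A.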
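
The calculation above can be reproduced with a short sketch. It makes two assumptions that are not spelled out on this page: intervals are represented as `(lower, upper)` tuples, and similarity is derived from distance as `sim = 1 - dist / 2`, which is the mapping consistent with the numbers in the examples. The propagation step itself is not implemented; the propagated intervals are entered by hand, and the helper name `as_interval` is hypothetical.

```python
def as_interval(x):
    """Treat a plain number x as the degenerate interval [x; x]."""
    return x if isinstance(x, tuple) else (x, x)


def dist(x, y):
    """Interval distance: mean absolute difference of lower and upper bounds."""
    (x1, x2), (y1, y2) = as_interval(x), as_interval(y)
    return (abs(x1 - y1) + abs(x2 - y2)) / 2


def sim(x, y):
    """Assumed mapping from distance to similarity: dist 0 -> 1, dist 2 -> 0."""
    return 1 - dist(x, y) / 2


# Items after propagation (values entered by hand, see the diagram above).
item1 = {"A": (-1, 1), "B": -1}    # B = -1 propagated up to A as [-1;+1]
item2 = {"A": -1, "B": (-1, -1)}   # A = -1 propagated down to B as [-1]

features = ["A", "B"]
similarity = sum(sim(item1[f], item2[f]) for f in features) / len(features)
print(similarity)  # 0.75
```

For degenerate intervals, `dist` reduces to the plain absolute difference `|x1 - y1|`, so point values and intervals can be mixed freely in the same comparison.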