RichardBerg : KeyFinderNeuralNetwork


Mathematical Foundation


As discussed, the key center is the focal point of a piece or section of tonal music. Harmony is founded on the relationships between this central note and the other notes (and the chords built on them). These relationships are preserved when the notes and the center are "translated" together. For example, when the key center is A, the note/chord E is called the V [Roman five]. Likewise, when the key center is C, G is the V. The relationship is clearest when each note is assigned a number in order, whereupon E - A = G - C. The system "wraps" after the last letter G, so that A is the V of D. Every other abstract construction in tonal theory shares these translational and wrapping properties.

There are 12 notes from an A up to (but not including) the next A' above it. (Ditto B and B', etc., by the properties above.) Of these, Western music most commonly picks 7 to define a scale built on A. The notes of a piece in A Major will largely be drawn from the "major scale" that begins on A and ends on A'. When we map A to 0, these notes become {0, 2, 4, 5, 7, 9, 11}. Using the translational and wrapping properties, the C Major scale (C = 3) is {3, 5, 7, 8, 10, 0, 2}, and so on. In this way, we see that a scale may also be defined by the intervals between its notes; a "major scale" has intervals {2, 2, 1, 2, 2, 2}, with a final step of 1 returning to the starting note an octave up. Returning to the example of finding the V of a key center, we note for completeness that it is the fifth note of the major scale built on that key: in our numbering E is 7 (the V of A = 0) and G is 10 (the V of C = 3), so the interval between any key and its V is 7.
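The arithmetic above can be checked with a short sketch. This is an illustrative Python fragment, not the paper's C code; the names NOTE_NAMES, MAJOR_STEPS, major_scale and dominant are assumptions introduced here.

```python
# Pitch classes numbered in semitone order with A = 0, as in the text.
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]
MAJOR_STEPS = [2, 2, 1, 2, 2, 2]  # intervals between successive scale notes


def major_scale(root):
    """Return the 7 pitch classes of the major scale built on `root`."""
    notes = [root % 12]
    for step in MAJOR_STEPS:
        notes.append((notes[-1] + step) % 12)  # wrap past G# back to A
    return notes


def dominant(root):
    """Return the V of `root`: the fifth note of its major scale."""
    return major_scale(root)[4]


a_major = major_scale(NOTE_NAMES.index("A"))
c_major = major_scale(NOTE_NAMES.index("C"))
v_of_d = NOTE_NAMES[dominant(NOTE_NAMES.index("D"))]
```

Here a_major is the A Major set given in the text, and v_of_d comes out as "A", matching the wrapping example. For every root the difference between dominant(root) and root is 7 (mod 12).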

Methodology: Data Collection


Entering the raw note values from each sonata was quickly rejected as intractable for this exploratory project. Instead, we sampled the chords at 50 regular intervals from the beginning of each piece.* The 8 pieces in all were ordered from Beethoven's earliest to latest compositions, then interleaved into training and test sets. The pieces were all selected to be "in major key" (i.e. to use primarily major scales) and to represent a fast sonata-form movement (which has many implications, but in sum ensures similar styles). Each has its full details shown in comments, including the popular nickname of several works like the Moonlight Sonata.
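The interleaving step might look like the following Python sketch. The piece labels are hypothetical placeholders, and which set receives the earliest piece is an assumption; the text does not say.

```python
def interleave_split(pieces):
    """Alternate chronologically ordered pieces between training and test sets."""
    training = pieces[0::2]  # 1st, 3rd, 5th, ... (assumed to go to training)
    test = pieces[1::2]      # 2nd, 4th, 6th, ...
    return training, test


# Hypothetical placeholder labels, ordered earliest to latest.
pieces = [f"piece_{i}" for i in range(1, 9)]
training, test = interleave_split(pieces)
```

Alternating in this way gives each set four pieces spread across the whole chronological range, so neither set is dominated by early or late works.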

Methodology: Benchmarks


Using the earlier math we can quickly build some software to represent scales and intervals. First we translate incoming chords into numbers from 0 to 11. We next create a lookup table containing a scale starting on each note, derived from the intervals in the major scale. With these we can create an array of counters: the C counter is incremented whenever the current chord is contained in the C Major scale, and so on. Iterating over the test data and then finding the maximum value in the array gives us our most basic benchmark, since notes and chords tend to be constructed from the scale of their key center. We call the program implementing this logic ScaleFinder.
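A minimal sketch of this counting scheme follows, in Python for brevity and not taken from the paper's C source. It assumes chords are already reduced to numbers 0 to 11 with A = 0; build_scale_table and scale_finder are names introduced here, and ties are resolved in favor of the lowest-numbered key, which is also an assumption.

```python
MAJOR_STEPS = [2, 2, 1, 2, 2, 2]  # intervals of the major scale


def build_scale_table():
    """Lookup table: entry k is the set of pitch classes in the major scale on k."""
    table = []
    for root in range(12):
        note = root
        scale = {root}
        for step in MAJOR_STEPS:
            note = (note + step) % 12
            scale.add(note)
        table.append(scale)
    return table


def scale_finder(chords):
    """Return (best_key, counters) for a sequence of chords numbered 0-11."""
    table = build_scale_table()
    counters = [0] * 12
    for chord in chords:
        for key in range(12):
            if chord in table[key]:
                counters[key] += 1
    # max() keeps the first maximum, so ties go to the lowest-numbered key.
    best_key = max(range(12), key=lambda k: counters[k])
    return best_key, counters


# A, E, D, A, G#, E, A: every chord lies in A Major (key 0).
best_key, counters = scale_finder([0, 7, 5, 0, 11, 7, 0])
```

In this example the A counter reaches 7 while the D and E counters reach 6. Closely related keys share most of their notes, so the margin between them is often small.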

We create a finer measurement by using another property of intervals: the movement of a chord from V to I (e.g. from E to A) is the strongest way a composer can signal that the current key center is I. When examining an incoming chord sample, additional points are given to that chord's key score if the preceding chord was 7 notes higher (mod 12). This technique should allow reliable detection of the key of nearly any Classical piece through Beethoven's time period. We name it TheoryFinder.
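The V-to-I bonus can be sketched as an extension of the scale-counting idea. The fragment below is an illustrative Python sketch, not the paper's C code; the bonus size (cadence_bonus=3) is an assumption, since the text does not state how many additional points are awarded.

```python
MAJOR_STEPS = [2, 2, 1, 2, 2, 2]


def major_scale_set(root):
    """Set of pitch classes (A = 0) in the major scale built on `root`."""
    note, scale = root, {root}
    for step in MAJOR_STEPS:
        note = (note + step) % 12
        scale.add(note)
    return scale


SCALES = [major_scale_set(root) for root in range(12)]


def theory_finder(chords, cadence_bonus=3):
    """Scale counting plus a bonus for each V-to-I motion (bonus size assumed)."""
    counters = [0] * 12
    previous = None
    for chord in chords:
        for key in range(12):
            if chord in SCALES[key]:
                counters[key] += 1
        # Preceding chord 7 semitones above the current one: V -> I into `chord`.
        if previous is not None and (previous - chord) % 12 == 7:
            counters[chord] += cadence_bonus
        previous = chord
    best_key = max(range(12), key=lambda k: counters[k])
    return best_key, counters


# D, E, A, F#, E, A: all six chords belong to both A Major and D Major.
best_key, counters = theory_finder([5, 7, 0, 9, 7, 0])
```

Scale membership alone scores A and D equally at 6 in this example. The two E-to-A motions add the bonus twice to A, giving it 12 and breaking the tie.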

Methodology: Neural Networks

(Multi-Layer Perceptrons, as it were)

Our neural network is a multi-layer perceptron with 50 inputs, a variable number of intermediate nodes, and one output. The most naive feature selection sends the 50 chords directly to the inputs by mapping A to 0, A# to 1, and so on up to 11. This sequence corresponds to the chords' relative pitch, as in the order of notes on a piano. A human looking at the features in this way would begin to see patterns: the key center would appear several times, along with the chords lying on its scale. However, we anticipate that the neural network will have trouble, even though there is technically more information in the data than is provided to either of the benchmark programs. Its results will be classified under MLPChordFinder.

Thus, we provide another representation of the same data. One property of a 12-tone system is that the interval of a fifth (V-I), when applied repeatedly, will include every note exactly once before returning to the original tone. That is, every tone can be represented uniquely as [base tone] + 7 * n, mod 12, for a given base and some n from 0 to 11. In music theory this construction is known as the "circle of fifths." Choosing A=0 for a base, we now have a new mapping of chord tones to numbers, where E=1, B=2, and so on. This has the advantage that neighboring chords (in the harmonic sense) have adjacent numbers. Thus, a neural network that can approximate an averaging function should be able to choose a key center that is at worst close to the true center. This algorithm is dubbed MLPCircleFinder.
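The remapping can be computed directly, as in this illustrative Python sketch (the function name circle_position is an assumption, not from the paper's code). It relies on the fact that 7 * 7 = 49 = 1 (mod 12), so multiplying a chromatic pitch number by 7 recovers the n for which pitch = 7 * n (mod 12).

```python
NOTE_NAMES = ["A", "A#", "B", "C", "C#", "D", "D#", "E", "F", "F#", "G", "G#"]


def circle_position(pitch_class):
    """Map a chromatic pitch class (A = 0) to its circle-of-fifths index n,
    where pitch_class == (7 * n) % 12."""
    return (pitch_class * 7) % 12  # 7 is its own inverse modulo 12


# Walk the circle upward from A by listing notes in order of their position.
circle_order = sorted(NOTE_NAMES, key=lambda name: circle_position(NOTE_NAMES.index(name)))
```

The list circle_order begins A, E, B, F#, matching E=1 and B=2 in the text, and it ends with D at position 11, one fifth below A.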

Nevertheless, since the input nodes of a neural network do not influence each other, it may prove unlikely for their weights to align. This is especially troublesome when the space of solutions is so wide. It could in theory take thousands of training examples to adequately search what is effectively a 50-dimensional space; increasing the number of hidden nodes to combat underfitting would only serve to further increase the complexity of the function without any added ability to determine its parameters. Thus, for a final feature selection we choose to dramatically reduce the quantity of data provided to the neural network while hopefully keeping its essential character. Starting from the same mapping as MLPCircleFinder, we first transform it into frequency space, reducing the 50 chords in time order to a count of the instances of each of the 12 chords. (This is similar in principle to the point-counting methods.) From there only the 3 most frequent chords are chosen and inserted (in sorted order) into an array. The neural network should then be able to learn how these most common chords, which will almost always be neighboring tones given the "circle" mapping, surround and usually determine the key center. In the C code this method is found under MLPFreqFinder.
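The feature reduction might be sketched as follows. This is an illustrative Python fragment, not the C implementation. It assumes "sorted order" means ascending circle-of-fifths position and breaks frequency ties toward the lower position; neither detail is specified in the text.

```python
from collections import Counter


def freq_features(chords, top_n=3):
    """Reduce a chord sequence (chromatic, A = 0) to its top_n most frequent
    circle-of-fifths positions, returned in ascending order."""
    circle = [(chord * 7) % 12 for chord in chords]  # chromatic -> circle index
    counts = Counter(circle)
    # Rank by descending frequency; ties go to the lower position (assumption).
    ranked = sorted(counts, key=lambda pos: (-counts[pos], pos))
    return sorted(ranked[:top_n])


# 50 samples: 20 x A, 15 x E, 10 x D, 5 x B (hypothetical data for illustration).
features = freq_features([0] * 20 + [7] * 15 + [5] * 10 + [2] * 5)
```

For this input the result is [0, 1, 11]: A, E and D. Positions 11 and 0 are neighbors on the circle even though they are numerically far apart, a wrap-around that the network would have to learn. A sequence with fewer than three distinct chords yields a shorter list in this sketch.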

Elements common to each neural network:

These values were tweaked in the initial experimentation phase but no combination gave appreciably better results than those suggested in the cited references.



*Interpreting chords from a score is a project in itself that lies beyond the scope of this paper. We'll trust that these intermediate measurements represent something useful without trivializing the task at hand.
