This document explains how Kabuto's GCR scheme works. It's a work in progress, meant to be understandable by any experienced coder. Feel free to contribute or get back to me (preferably on CSDB) if you don't understand a section :)

Version 0.1

GCR encoding is needed due to how the drive electronics work. A bit of prerequisite: the drive electronics write bits to disk with a small coil that magnetises the disk media. To encode bits it can feed the coil with either positive or negative current, magnetising the media in that direction. However, the engineers chose not to write one direction for zeroes and the other direction for ones; instead they keep the direction the same whenever a 0 is written and reverse it whenever a 1 is written. To give you an example of what the magnetic flux looks like for a given bitstream:

| 1 | 0 | 1 | 1 | 0 | 0 | 1 | 0 | 1 | 0 | 1 |
>> <<<<<<< >>> <<<<<<<<<<< >>>>>>> <<<<<<< >>

When reading the disk again, a short pulse is induced in the read/write head whenever it comes across a flux change (but no current is induced while the flux stays constant!):

>> <<<<<<< >>> <<<<<<<<<<< >>>>>>> <<<<<<< >>
--v-------^---v-----------^-------v-------^--

In the next step the signal is lowpass filtered to transform that back into a rectangular wave that represents the actual flux on the disk:

--v-------^---v-----------^-------v-------^--
   _______     ___________         _______
__/       \___/           \_______/       \__

And then edges are detected:

   _______     ___________         _______
__/       \___/           \_______/       \__
  |       |   |           |       |       |

Finally the drive electronics output a 1 for every edge they detected and a 0 whenever they had to wait a certain time after the last edge:

  |       |   |           |       |       |
  1   0   1   1   0   0   1   0   1   0   1

Internally the drive outputs the first 0 once it has waited for 1.5 times the nominal distance between two consecutive 1s, and the second 0 after 2.5 times that distance, so there's a bit of tolerance for speed changes and jitter (edges detected slightly too early or too late). As you can see there's no global clock telling the drive when to output bits; it deduces that information directly from the data on the disk.

Long sequences of 0s pose a few problems: they don't contain clock info (as there's no flux change while reading them), so if the disk isn't spinning at exactly the speed it was written at, the chances of detecting an edge too early or too late increase. Also, the circuitry that transforms pulses back into a rectangular wave can only cope with gaps up to a certain length, as otherwise it could mistake noise for a signal (yes, the input signal from the read/write head is very noisy). Thus Commodore said: "there must never be 3 or more consecutive 0's".

As the bitstream is very fast (up to slightly more than one bit every 4 CPU cycles) the CPU can't deal with the bits directly, so the drive electronics accept and deliver whole bytes and handle splitting them into / recombining them from bits internally. But how should the drive know where a byte starts when reading bit after bit? That's why Commodore reserved sequences of 10 or more 1's as a "sync": whenever the last 10 bits read were 1's, the bit accumulator is reset, so it starts assembling the next byte at the next 0 it encounters (yes, the first byte following a sync must always start with a 0 bit, as any leading 1 bits would be considered part of the sync). Thus any sequence of data bits written to disk must not contain 10 or more consecutive 1's either.
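To make these two constraints concrete, here's a tiny Python sketch (the helper names are just for illustration) that checks whether a given bit sequence may be written to disk:

def max_run(bits, value):
    # length of the longest run of `value` in the bit sequence
    best = run = 0
    for b in bits:
        run = run + 1 if b == value else 0
        best = max(best, run)
    return best

def is_writable(bits):
    # the drive's constraints: never 3 or more consecutive 0s,
    # and never 10 or more consecutive 1s (that would look like a sync)
    return max_run(bits, 0) <= 2 and max_run(bits, 1) <= 9

print(is_writable([0] * 8))                             # False - a raw zero byte
print(is_writable([1, 0, 1, 1, 0, 0, 1, 0, 1, 0, 1]))   # True - the stream from the diagram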
How can you write arbitrary data to disk under these constraints? This is where GCR encoding comes into play: for every 4 bits of raw data the drive looks up 5 bits to actually write to disk:

Data  Written to disk
0000  01010
0001  01011
0010  10010
0011  10011
0100  01110
0101  01111
0110  10110
0111  10111
1000  01001
1001  11001
1010  11010
1011  11011
1100  01101
1101  11101
1110  11110
1111  10101

(Source: Wikipedia)

As you can see, no code contains more than two consecutive 0's. Also, no code has more than one 0 or four 1's at either end, so if you concatenate any two of them you'll never get a sequence of more than two 0's or eight 1's either, thus adhering to the limits explained earlier.

This scheme means that you need to write 5/4 as many bits to disk as your original data contains. But it's not optimal, so I wondered what could be done. Doing a bit of math revealed that the theoretical limit is encoding about 7.01 bits into every 8 bits on disk. Fractional bits - sounds like some evil magic and very advanced maths, probably way too advanced for a poor old Commodore to handle! (TODO include formula)

After thinking about it a lot I came up with a scheme that's simple enough to allow decoding in realtime. (For comparison, only very recently did lft come up with a way of decoding Commodore's own GCR in realtime, and doing so requires a number of advanced programming tricks beyond what Commodore dared to use back in the day.) Still, the encoding part is quite difficult. But maybe it can be optimised as well.

First of all I wrote down all the bytes one could write to disk without violating the rules above, or rather their bit patterns:

00100100
00100101
00100110
00100111
00101001
00101010
...
01001001
...
10010010
...
11001001
...
11111100
11111101
11111110
11111111

Then I classified these bit patterns by the sequences of identical bits they start and end with:

# 00100100 #
# 00100101 1
# 00100110 0
# 00100111 3
# 00101001 1
# 00101010 0
# ...
0 01001001 1
0 ...
1 10010010 0
1 ...
2 11001001 1
...
6 11111100 #
6 11111101 1
7 11111110 0
8 11111111 8

Legend:
# 00
0 0
1 1
2 11
3 111
4 1111
5 11111
6 111111
7 1111111
8 11111111

This classification is needed for choosing adjacent bytes. Unfortunately there's a small problem: when we see a 11111111 byte, we don't know whether the previous byte ended with 01 (i.e. we have a run of 9 1's) or with 0 (i.e. 8 1's) - a problem that doesn't exist for any other byte, since there's always an internal bit that terminates the respective end sequence. But there's a solution: we disallow sequences of 9 1's. This still allows us to encode 7 raw bits into 8 bits on disk, but this way we can be sure that whenever we see such a byte, the previous byte ended with a 0. This leads to the following adjacency table of which previous byte endings are compatible with which following byte starts (- = no, * = yes):

 #012345678
#--********
0-*********
1*********-
2********--
3*******---
4******----
5*****-----
6****------
7***-------
8**--------

Fortunately, depending on the last byte's ending there's always a *range* of legal successor byte starts, and since the byte list is already sorted by prefixes, we can directly select a range of legal successor bytes using a lookup on the previous byte. E.g. if the current byte is 11010111, its end is classified as "3", and the table says that any starting sequence classified from "#" to "5" is legal, thus any byte from 00100100 to 11111011, inclusive.
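Here's a small Python sketch of that classification and range lookup (helper names are just for illustration); it enumerates the legal byte patterns and reproduces the 11010111 example:

def legal_bytes():
    # all 8-bit patterns that never contain 3 consecutive 0s
    return [b for b in range(256) if '000' not in format(b, '08b')]

def may_follow(prev, nxt):
    # across the byte boundary we must not create 3 consecutive 0s
    # or 9 consecutive 1s (9 is already disallowed, see above)
    s = format(prev, '08b') + format(nxt, '08b')
    return '000' not in s and '1' * 9 not in s

patterns = legal_bytes()                    # sorted: 00100100 ... 11111111

# The legal successors of any byte form one contiguous range at the start
# of that sorted list, e.g. for the 11010111 example from the text:
succ = [b for b in patterns if may_follow(0b11010111, b)]
print(format(succ[0], '08b'), format(succ[-1], '08b'))  # 00100100 11111011
print(succ == patterns[:len(succ)])                     # True - really a prefix range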
Still that doesn't tell us how to encode. There are way more than 128 legal byte values we could write, but depending on the previous byte there can be far fewer than 128 legal successors. Especially after 11111111 there are far fewer than 128 legal byte values, since they all need to start with a 0. But we can see that it's still theoretically possible to encode enough values in the long run: write down all legal byte values. This gives you TODO maybe around 170 states. Then replace each byte value with all byte values that might follow it. And do so again and again. Of course not on paper (that would be way too much work), but with a computer and clever programming (combine identical byte values with a counter instead of writing them down multiple times). And for every step write down the factor by which the total number of states increased. You'll see that this factor converges to about 129: even though some of the encoded bytes allow far fewer than 128 follow-up bytes, that's more than outweighed by the encoded bytes that allow more.

To demonstrate that, I'll use a simplified version that uses bytes of 2 bits each and allows sequences of 1's of arbitrary length, but still has the limit of at most two consecutive 0's.

All legal states:

|00|01|10|11|

Add all legal follow-up states:

|00   |01         |10      |11         |
|10|11|00|01|10|11|01|10|11|00|01|10|11|

And doing that one more time:

|00                  |01                                    |10                              |11                                    |
|10      |11         |00   |01         |10      |11         |01         |10      |11         |00   |01         |10      |11         |
|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|00|01|10|11|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|

And we can do that as often as we like, but of course it'll quickly exceed the pixel density of your monitor; let's do this one more time:

|00                  |01                                    |10                              |11                                    |
|10      |11         |00   |01         |10      |11         |01         |10      |11         |00   |01         |10      |11         |
|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|00|01|10|11|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||

We can't see much yet, but of course we can zoom in to reveal more detail:

|00                  |01                                    |10                              |11                                    |
|10      |11         |00   |01         |10      |11         |01         |10      |11         |00   |01         |10      |11         |
|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|00|01|10|11|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|
|||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||||
__________________/ \_____________________________________________________________________
/                                                                                                                \
|01                                                                                                              |
|00                  |01                                    |10                              |11                                    |
|10      |11         |00   |01         |10      |11         |01         |10      |11         |00   |01         |10      |11         |
|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|00|01|10|11|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|

We're interested in the number of states in each row. If we keep the order 00,01,10,11 we can write the initial counts as 1,1,1,1. Determining the follow-up states of each:

|00| -> |10|11|       = 1,0,0,0 -> 0,0,1,1
|01| -> |00|01|10|11| = 0,1,0,0 -> 1,1,1,1
|10| -> |01|10|11|    = 0,0,1,0 -> 0,1,1,1
|11| -> |00|01|10|11| = 0,0,0,1 -> 1,1,1,1

we can easily compute the number of occurrences of each state in each row:

1,1,1,1
2,3,4,4
7,11,13,13
24,37,44,44
81,125,149,149
274,423,504,504
927,1431,1705,1705
3136,4841,5768,5768
10609,16377,19513,19513
35890,55403,66012,66012
...

The total number of states in each row is:

4,13,44,149,504,1705,5768,19513,66012,223317,...

And the ratio compared to the predecessor is:

3.25000000...
3.38461538...
3.38636363...
3.38255033...
3.38293650...
3.38299120...
3.38297503...
3.38297545...
3.38297582...
...
3.38297576...
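The same counting can be done mechanically; here's a small Python sketch (illustrative only) that reproduces the occurrence counts, totals and ratios above for the 2-bit toy alphabet:

followers = {
    '00': ['10', '11'],
    '01': ['00', '01', '10', '11'],
    '10': ['01', '10', '11'],
    '11': ['00', '01', '10', '11'],
}

counts = {s: 1 for s in followers}          # first row: one of each state -> 1,1,1,1
previous = None
for _ in range(12):
    total = sum(counts.values())
    if previous:
        print(list(counts.values()), total, total / previous)
    else:
        print(list(counts.values()), total)
    previous = total
    nxt = {s: 0 for s in followers}
    for state, n in counts.items():         # every occurrence spawns all of its followers
        for f in followers[state]:
            nxt[f] += n
    counts = nxt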
As you can see this quickly converges: every step eventually increases the number of possible states by a constant factor of 3.38297576... . States equal bits, and the dual logarithm tells us the number of bits we could encode theoretically (e.g. 8 bits = 256 states, log2(256) = 8): each encoded "byte" adds 1.75829283... raw bits. And if we repeat all of that with our 8-bits-per-byte encoding, we come to the same conclusion as earlier: we get about 7.01 raw bits per encoded byte.

But how can we encode actual data with that? All we have so far is just theory... Let's go back to appending bytes in legal ways. Imagine you've already appended a large number of bytes:

|00                  |01                                    |10                              |11                                    |
|10      |11         |00   |01         |10      |11         |01         |10      |11         |00   |01         |10      |11         |
|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|00|01|10|11|01|10|11|00|01|10|11|10|11|00|01|10|11|01|10|11|00|01|10|11|

Just notice that each original state has a range of different length! And the zoom factor we used for zooming in earlier is exactly that value 3.38297576...

The idea here is to assume a length of 1 for the whole range and assign each byte value the fraction of that length its own range takes. We also write down the offset (i.e. the sum of the lengths of all previous states), so we end up with ranges filling the space between 0 and 1 completely:

Value Length  Offset
|00|  0.16071 0.00000
|01|  0.29560 0.16071
|10|  0.24809 0.45631
|11|  0.29560 0.70440

Now if we have a byte sequence we could take the first byte's offset, add the second byte's offset scaled down by the scale factor, add the third byte's offset scaled down twice by the scale factor and so on... wait! After e.g. |10| not all bytes are permitted - here |00| is not permitted - so all further ranges have different offsets. Still, as we saw above, all permitted ranges are always adjacent, so we just need to subtract the offset of the first permitted range (scaled down, of course). For speed-up we create a third column of offsets with the offset of the first permitted follow-up byte already scaled down and subtracted, yielding the "adjusted offset":

Value Length  Offset  AdjustedOffset
|00|  0.16071 0.00000 -0.13488
|01|  0.29560 0.16071  0.16071
|10|  0.24809 0.45631  0.40880
|11|  0.29560 0.70440  0.70440

So to find out where in the range a given byte sequence will end up we use the following algorithm (adjustedOffset[byte] means "the adjusted offset of the byte just read"):

* scale=1, pos=0
* For each byte:
** pos += adjustedOffset[byte]*scale
** scale /= 3.38297576...

Let's have a look at the encoding algorithm again, but this time for 8-bit bytes:

* scale=1, pos=0
* For each byte:
** pos += adjustedOffset[byte]*scale
** scale /= 129...
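Here's a Python sketch (illustrative, using the 2-bit toy alphabet since the 8-bit tables would be large) that derives the Length / Offset / AdjustedOffset table by power iteration and then runs the floating-point encoding loop above:

states = ['00', '01', '10', '11']
followers = {
    '00': ['10', '11'],
    '01': ['00', '01', '10', '11'],
    '10': ['01', '10', '11'],
    '11': ['00', '01', '10', '11'],
}

# A state's length is proportional to the number of continuations it allows;
# the normalisation factor converges to 3.38297576...
length = {s: 0.25 for s in states}
factor = 0.0
for _ in range(200):
    new = {s: sum(length[f] for f in followers[s]) for s in states}
    factor = sum(new.values())
    length = {s: v / factor for s, v in new.items()}

offset = {}
acc = 0.0
for s in states:                         # states are already in sorted order
    offset[s] = acc
    acc += length[s]

# adjusted offset = own offset minus the (scaled-down) offset of the first
# byte value that may legally follow
adjusted = {s: offset[s] - offset[followers[s][0]] / factor for s in states}

for s in states:
    print(s, round(length[s], 5), round(offset[s], 5), round(adjusted[s], 5))

def encode_position(sequence):
    # the algorithm from the text: where in the range does this sequence land?
    scale, pos = 1.0, 0.0
    for s in sequence:
        pos += adjusted[s] * scale
        scale /= factor
    return pos

print(encode_position(['01', '10', '11', '00']))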
Of course we don't want to deal with huge floating-point numbers, especially not on the C64. What if we just divide by 128 instead of 129? This means that after each step the ranges will overlap each other a little. Of course the adjusted offsets must be recomputed with the changed scale factor - otherwise we might get gaps as well - yielding a table we'll just name "adjustedOffset2":

* scale=1, pos=0
* For each byte:
** pos += adjustedOffset2[byte]*scale
** scale /= 128

Also, who says that we need to divide? We can just increase the global scale instead of decreasing the scale of the offsets being added:

* scale=1, pos=0
* For each byte:
** pos += adjustedOffset2[byte]
** pos *= 128
* pos /= pow(128,bytecount)

Note that we no longer need the scale variable. We could also move the "add offset" step after the "multiply" step if we pre-multiply adjustedOffset2 by 128, yielding adjustedOffset3:

* pos=0
* For each byte:
** pos *= 128
** pos += adjustedOffset3[byte]
* pos /= pow(128,bytecount)

Also we could scale adjustedOffset3 up by e.g. a factor of 20, which yields table adjustedOffset4:

* pos=0
* For each byte:
** pos *= 128
** pos += adjustedOffset4[byte]
* pos /= pow(128,bytecount)*20

Remember how I mentioned that ranges are now slightly overlapping? Adjusted offsets were originally values between 0 and 1, but after pre-multiplication they're rather in the range between 0 and 2500. Chances are that we can adjust them enough to make them integers while still not creating gaps between ranges - since they're slightly overlapping we can move them around a bit.

Because the scale factor is a power of two we could in theory extract finished bits while we're adding offsets and scaling up. But as old offsets are constantly scaled up, a newly added offset could introduce a carry that climbs all the way to the top in the worst case... that wouldn't be very nice. But there's a solution. We go back one step and imagine we were reading bytes the other way round - easily possible, since addition doesn't care about order:

* pos=0
* For each byte in reverse order:
** pos += adjustedOffset4[byte]
** pos /= 128

Reading bytes in reverse order? Not so nice, and also not necessary. We just reverse the whole sequence of encoded bits: this means that not only the byte order is reversed (and thus normal again) but also the bit order within each byte - which we solve by introducing table adjustedOffset5, which is just adjustedOffset4 indexed with the bits of the index reversed. Also, dividing the offset by 128 is THE opportunity for extracting bits! No risk of bits ever carrying over :)

* pos=0
* For each byte:
** pos += adjustedOffset5[byte]
** Store pos & 127
** pos >>= 7

Since the C64 is an 8-bit machine we split the offsets into 2 tables, one carrying the lower 7 bits, one carrying the upper 7 bits:

* pos=0
* For each byte:
** pos += adjustedOffset5Upper[byte]<<7
** pos += adjustedOffset5Lower[byte]
** Store pos & 127
** pos >>= 7

And rearrange: since the upper value doesn't affect the bits to be stored, we can move adding it to after updating pos:

* pos=0
* For each byte:
** pos += adjustedOffset5Lower[byte]
** Store pos & 127
** pos >>= 7
** pos += adjustedOffset5Upper[byte]

Introducing a "carry":

* pos=0, carry=0
* For each byte:
** pos += adjustedOffset5Lower[byte] + carry
** Store pos & 127
** carry = pos >> 7
** pos = adjustedOffset5Upper[byte]

Just one small bit left: this way we read 7-bit values, but we want 8-bit values. And adding two 7-bit values gives you the carry in bit 7 instead of the carry flag... We fix this as follows:

* The first 7-bit value decoded is shifted to the left (so the carry in bit 7 becomes the carry flag) and stored in a temporary location
* The following 7 7-bit values are each shifted to the left with one bit shifted in from the temporary location and then stored as actually decoded data
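Written out as a Python sketch (the real thing would of course be 6502 assembly; the two offset tables are assumed to have been prepared as described above, and the exact bit order of the repacking is just one possible reading of the steps listed):

def decode_block(encoded, lower, upper):
    # the carry-based inner loop from the pseudo-code above;
    # `lower`/`upper` stand for adjustedOffset5Lower/adjustedOffset5Upper
    pos, carry = 0, 0
    values = []
    for b in encoded:
        pos += lower[b] + carry
        values.append(pos & 127)     # "store pos & 127"
        carry = pos >> 7
        pos = upper[b]
    return values                    # one 7-bit value per encoded byte

def repack_7_to_8(values):
    # one possible reading of the repacking: the first 7-bit value is parked
    # in a temporary, and each of the following seven values gets one of its
    # bits shifted in from the top of that temporary
    temp = values[0] << 1
    out = []
    for v in values[1:]:
        temp <<= 1
        bit = (temp >> 8) & 1        # next bit off the top of the temporary
        out.append(((v << 1) | bit) & 0xFF)
    return out                       # 7 decoded bytes per 8 encoded bytes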
But how do you write a good encoder? I wrote a JavaScript version that brute-forces its way through the ranges and offsets to create a byte stream that decodes properly. Maybe one day someone will write a proper encoder that even works on the C64 :)