The aim of this blog is to walk through an example of compressing a data sample using RDP 6.0 Compression. In a future blog I will provide a walkthrough of decoding the compressed stream back to the original data stream. In order to better understand the example I want to first establish some key points. Most of this blog will center around [MS-RDPEGDI] Remote Desktop Protocol: Graphics Device Interface (GDI) Acceleration Extensions Section 3.1.8 Bulk Data Compression. This blog will mainly include only details that are important to the example walkthrough itself. For more inquisitive minds, other important details not covered here can be found in section 3.1.8. RDP version 6.0 supports an extension to the compression techniques described in [MS-RDPBCGR] Remote Desktop Protocol: Basic Connectivity and Graphics Remoting Specification section 3.1.8. This extension is called "RDP 6.0 Bulk Compression" (RDP6.0-BC) and is only supported for server-to-client traffic.
[MS-RDPEGDI] http://msdn.microsoft.com/en-us/library/cc241537(PROT.10).aspx
[MS-RDPBCGR] http://msdn.microsoft.com/en-us/library/cc240445(v=PROT.10).aspx
Let’s take a look at some important concepts before we look at the example. The shared state necessary to support the transmission and reception of RDP6.0-BC compressed data between a client and server requires a history buffer and a current offset into the history buffer (HistoryOffset). The size of the history buffer is 64 KB. The HistoryOffset MUST start initialized to zero, while the history buffer MUST be filled with zeros. After it has been initialized, the entire history buffer is immediately regarded as valid.
In addition to the history buffer and HistoryOffset, a small cache MUST also be managed by the client and server endpoints. This cache is referred to as the OffsetCache and is used to store the last four unique copy-offsets encountered during data compression (copy-offsets are described in [MS-RDPBCGR] section 3.1.8.1). This saves on bandwidth in cases where there are many repeated copy-offsets. Whenever the history buffer is initialized or reinitialized, the OffsetCache MUST be emptied. The uncompressed data is first inserted into the local history buffer at the position indicated by HistoryOffset by the sender. The compressor then runs through the length of newly added uncompressed data to be sent and produces as output a sequence of literals (bytes to be sent uncompressed) or copy-tuples which consists of a <copy-offset, length-of-match> pair. The copy-offset component of the copy-tuple is an index into the history buffer (counting backwards from the current byte being compressed in the history buffer towards the start of the buffer) where there is a match to the data to be sent.
There are 6 tables and a flowchart that are needed to walk through the sample data. These can be found under [MS-RDPEGDI] 3.1.8.1.4.1 Literal, EOS, and Copy-Offset Tables and [MS-RDPEGDI] 3.1.8.1.4.3 Encoding the Logically Compressed Stream respectively.
Here are the tables and flowcharts that we will be using. You may find it easier to print these off in order to follow the walkthrough.
Table 1: Bit-lengths for the 294 Huffman-encoded LiteralOrEosOrCopyOffset codes
Table 2: Huffman codebook for the 294 Huffman-encoded LiteralOrEosOrCopyOffset codes
Table 3: Bit count and base value lookup tables to encode and decode copy-offset values
Table 4: Table 4: Bit lengths for the 32 Huffman-encoded LengthOfMatch codes
Table 5: Huffman codebook for the 32 Huffman-encoded LengthOfMatch codes
Table 6: Bit count and base value lookup tables to encode and decode length-of-match values
Flowchart:
Code index
0
1
2
3
4
5
6
7
0x6
0x7
8
0x8
16
0x9
24
32
0xa
40
48
56
64
72
80
88
96
104
112
120
128
136
144
152
160
168
0xd
176
0xb
184
192
200
208
216
224
232
240
248
256
264
0x5
272
280
288
0xD
For example, it can be determined from the previous table that the 0th Huffman-encoded LiteralOrEosOrCopyOffset code has a length of 6 bits, and the 131st Huffman-encoded LiteralOrEosOrCopyOffset code has a length of 10 (0x0A) bits.
0x0004
0x0024
0x0014
0x0011
0x0051
0x0031
0x0071
0x0009
0x0049
0x0029
0x0069
0x0015
0x0095
0x0055
0x00d5
0x0035
0x00b5
0x0075
0x001d
0x00f5
0x011d
0x009d
0x019d
0x005d
0x000d
0x008d
0x015d
0x00dd
0x01dd
0x003d
0x013d
0x00bd
0x004d
0x01bd
0x007d
0x006b
0x017d
0x00fd
0x01fd
0x0003
0x0103
0x0083
0x0183
0x026b
0x0043
0x016b
0x036b
0x00eb
0x0143
0x00c3
0x02eb
0x01c3
0x01eb
0x0023
0x03eb
0x0123
0x00a3
0x01a3
0x001b
0x021b
0x0063
0x011b
0x0163
0x00e3
0x00cd
0x01e3
0x0013
0x0113
0x0093
0x031b
0x009b
0x029b
0x0193
0x0053
0x019b
0x039b
0x005b
0x025b
0x015b
0x035b
0x0153
0x00d3
0x00db
0x02db
0x01db
0x03db
0x003b
0x023b
0x013b
0x01d3
0x033b
0x00bb
0x02bb
0x01bb
0x03bb
0x007b
0x002d
0x027b
0x017b
0x037b
0x00fb
0x02fb
0x01fb
0x03fb
0x0007
0x0207
0x0107
0x0307
0x0087
0x0287
0x0187
0x0387
0x0033
0x0047
0x0247
0x0147
0x0347
0x00c7
0x02c7
0x01c7
0x0133
0x03c7
0x0027
0x0227
0x0127
0x0327
0x00a7
0x00b3
0x0019
0x01b3
0x0073
0x02a7
0x0173
0x01a7
0x03a7
0x0067
0x00f3
0x0267
0x0167
0x0367
0x00e7
0x02e7
0x01e7
0x03e7
0x01f3
0x0017
0x0217
0x0117
0x0317
0x0097
0x0297
0x0197
0x0397
0x0057
0x0257
0x0157
0x0357
0x00d7
0x02d7
0x01d7
0x03d7
0x0037
0x0237
0x0137
0x0337
0x00b7
0x02b7
0x01b7
0x03b7
0x0077
0x0277
0x07ff
0x0177
0x0377
0x00f7
0x02f7
0x01f7
0x03f7
0x03ff
0x000f
0x020f
0x010f
0x030f
0x008f
0x028f
0x018f
0x038f
0x004f
0x024f
0x014f
0x034f
0x00cf
0x000b
0x02cf
0x01cf
0x03cf
0x002f
0x022f
0x010b
0x012f
0x032f
0x00af
0x02af
0x01af
0x008b
0x03af
0x006f
0x026f
0x018b
0x016f
0x036f
0x00ef
0x02ef
0x01ef
0x03ef
0x001f
0x021f
0x011f
0x031f
0x009f
0x029f
0x019f
0x039f
0x005f
0x004b
0x025f
0x015f
0x035f
0x00df
0x02df
0x01df
0x03df
0x003f
0x023f
0x013f
0x033f
0x00bf
0x02bf
0x014b
0x01bf
0x00ad
0x00cb
0x01cb
0x03bf
0x002b
0x007f
0x027f
0x017f
0x012b
0x037f
0x00ff
0x02ff
0x00ab
0x01ab
0x006d
0x0059
0x17ff
0x0fff
0x0039
0x0079
0x01ff
0x0005
0x0045
0x0034
0x000c
0x002c
0x001c
0x0000
0x003c
0x0002
0x0022
0x0010
0x0012
0x0008
0x0032
0x000a
0x002a
0x001a
0x003a
0x0006
0x0026
0x0016
0x0036
0x000e
0x002e
0x001e
0x003e
0x0001
0x00ed
0x0018
0x0021
0x0025
0x0065
0x1fff
For example, it can be determined from the previous table that the 0th Huffman-encoded LiteralOrEosOrCopyOffset code has a value of 0x0004. Applying the bit length information from the first table in this section (6 bits), it can be determined that the Huffman code in binary MUST be 000100.
As another example, it can be determined from the previous table that the 131st Huffman-encoded LiteralOrEosOrCopyOffset code has a value of 0x02a7. Applying the bit length information from the first table in this section (10 bits), it can be determined that the Huffman code in binary MUST be 1010100111.
Table 3: Bit count and base value lookup tables to encode and decode copy-offset values.
Index
CopyOffsetBitsLUT
CopyOffsetBaseLUT
9
13
17
25
10
33
11
49
12
65
97
14
129
15
193
257
385
18
513
19
769
20
1025
21
1537
22
2049
23
3073
4097
6145
26
8193
27
12289
28
16385
29
24577
30
32769
31
49153
65537
Table 4: Bit lengths for the 32 Huffman-encoded LengthOfMatch codes
0x4
0x2
0x3
For example, it can be determined from the previous table that the 0th Huffman-encoded LengthOfMatch code has a length of 4 bits while the 21st Huffman-encoded LengthOfMatch code has a length of 8 bits.
Using the canonical Huffman algorithm (for more information, see [CANONHUFF]), the Huffman codebook table shown in the following table (HuffCodeL) can be obtained. The bit lengths in the previous table MUST be used to mask out the appropriate bits.
For example, it can be determined from the previous table that the 0th Huffman-encoded LengthOfMatch code has a value of 0x0001. Applying the bit length information from the first table in this section (4 bits), it can be determined that the Huffman code in binary MUST be 0001.
As another example, it can be determined from the previous table that the 21st Huffman-encoded LengthOfMatch code has a value of 0x009f. Applying the bit length information from the first table in this section (8 bits), it can be determined that the Huffman code in binary MUST be 10011111.
LoMBitsLUT
LoMBaseLUT
34
42
50
58
66
82
98
114
130
194
258
514
The following flow chart describes how literals and copy-tuples in a logically compressed stream are encoded.
Figure 8: Encoding a logically compressed stream
Literals are merely encoded using the Huffman Tables HuffCodeLEC (table 2) and HuffLengthL (table 1). Copy-tuples are encoded using the four lookup tables in table 3 and table 6 (CopyOffsetBaseLUT, CopyOffsetBitsLUT, LoMBaseLUT, and LoMBitsLUT) and the Huffman Tables.
The flowchart shows that the initial decision is whether we are dealing with a literal or not. If we are working with a literal then the process is fairly simple. Encode the literal using Huffman tables using table 1 and 2 and output the Encoded bits. If the data does not represent a literal then we are dealing with outputting a copy-tuple which consists of a <copy-offset, length-of-match> pair, as previously mentioned.
For the copy-tuple there are two encodings that need to be performed, one for the copy-offset and one for the length-of-match (LoM).
Encoding the Length of Match:
Encoding of the length-of-match is shown in the previous figure by the Action B item. The following describes the algorithm for encoding the length-of-match.
LUTIndex = IndexOfEqualOrSmallerEntry(LoM, LoMBaseLUT)
OutputBits(HuffCodeL[LUTIndex], HuffLengthL[LUTIndex])
ExtraBitsLength = LoMBitsLUT[LUTIndex]
ExtraBits = (LoM - 2) & ((2 ^ ExtraBitsLength) - 1)
OutputBits (ExtraBits, ExtraBitsLength)
Sample Walkthrough:
Now that we are armed with all of this information and perhaps a little dangerous lets walk through the sample data to see what data the encoder should spit out. The sample data we will use is:
Offset 0 1 2 3 4 5 6 7 8 9 10 11 12 13 14 15
Bytes 01 00 00 00 0a 00 0a 00 20 00 20 00 80 00 80 00
Step 1 Encoding a literal
01 For this data we use tables one and 2. We use the value of 1 to index into both tables. Table 1 gives us a value of 6 for the length and 0x24 for the value. Doing so we get the output of 0x24:6
literal 6 bit code 36 (0x24)
Output 0x24:6
Step 2 Encoding a literal
00 Again it’s a literal so use tables 1 and 2 in the same manner. Output is 0x04:6
Step 3 Encoding the Copy-Offset for the match – Action A in the Flowchart
00 00
Match (non cached)
At 1, length 2
(Back 1 position) Looking back at the history buffer we see that we have a match at index 1 (back 1 position) and the match has a length of 2. We need to use the algorithm for encoding a copy-tuple to obtain the copy-offset and Extra match bits if any. Here is that algorithm:
Encoding the Copy-Offset:
LUTIndex = IndexOfEqualOrSmallerEntry(CopyOffset + 1, CopyOffsetBaseLUT)
HuffmanIndex = LUTIndex + 257
OutputBits(HuffCodeLEC[HuffmanIndex], HuffLengthLEC[HuffmanIndex])
ExtraBitsLength = CopyOffsetBitsLUT[LUTIndex]
ExtraBits = CopyOffset & ((2 ^ ExtraBitsLen) - 1)
OutputBits(ExtraBits, ExtraBitsLength)
The IndexOfEqualOrSmallerEntry function searches through the specified LUT table and returns the index of the entry that contains a value of equal or lesser value than the first parameter. The OutputBits function outputs the bits specified by the first parameter in the appropriate order (the number of bits to output is given by the second parameter). "^" is the exponentiation operator, and "&" is the bitwise AND operator.
Insert our values, note that the IndexOfEqualOrSmallerEntry function uses the amount of the offset (backup position) +1 as its first parameter.
LUTIndex = 1 (Table 3)
HuffmanIndex = 1 + 257 = 258
OutputBits(0x0039, 7) obtain the value and length from Huffman tables 1 and 2
ExtraBitsLength = 2
ExtraBits = 8 & ((2 ^ 2) - 1) = 8 & 3 = bin:1000 & bin:0011 = 0
OutputBits(0x00,
ExtraMatchBits = 0 (from table 3 for LUTIndex 1)
Output 0x39:7
Step 4 Encoding the Length of the match – Action B in the flowchart
Encoding LoM
Length = 2
Using the following algorithm we calculate LoM:
Calculation using our data:
LUTIndex = 0 This comes from Table 6 using out LOM of 2
HuffmanCode = 0x0001 (Index 0 of Table 5)
LoMBits = 4 (Index of 0 Table 4)
OutputBits(0x0001, 4) Output of Huffman code and length from Tables 5 and 4 respectively.
ExtraBitsLength = 0 (from table 3 for LUTIndex 0)
No extra bits to output
Step 5 Encode Literal
0a
Using the value a (dec 10) as the index into Tables 1 and 2 we obtain a length of 7 and a code of 0x69
Output 0x69:7
Step 6 Found a Match Encode the Copy-Offset and LoM
00 0a 00
Match
At 3, length 3 (non cached)
(Back 2 positions)
Encoding Copy-Offset
LUTIndex = 2 (Table 3)
HuffmanIndex = 2 + 257 = 259 (0x103)
Index 0x103 -> Huffman Code 0x0079 (from table 2)
Bits for Index 259 -> 0x7 (from table 1)
ExtraMatchBits = 0 (from Table 3 for LUTIndex 2)
Output 0x79:7
Length = 3
LUTIndex = 1 (Table 6)
HuffmanCode for LoM 0x0000 (Index 1 Table 5)
LoMBits = 0x2 (index 1 Table 4)
ExtraMatchBits = 0 (from Table 3 for LUTIndex 1)
Output 0x0:2
Step 6 Encode Literal
literal 8 bit code 77 (0x4d)
Output 0x4d:8
Step 7 Found a Match [Cached] Encode the Copy-Offset and LoM
00 20 00
Match (cached)-- Last Offset used was also back 2 postions so we have a match with LRU cache index 0
Looking back at the flowchart you can see what happens when we get a cache hit.
At 7, length 3
Encode OffsetCacheIndex
Index + 289 = 0 + 289 = 289 (0x121)
HuffmanCode for 289 -> 0x0018 (table 2)
Bits for Index 289 -> 0x5 (from Table 1)
Output 0x18:5
Encode the LoM as previously performed.
Step 8 Encode Literal
literal 7 bit code 25 (0x0019)
Output 0x19:7
00 80 00
Match (cached)
At 11 (0xb), length 3
Last Offset used was also back 2 postions so we have a match with LRU cache index 0
Encoding Finished!
Note Wire Format:
The compressed stream consists of a bit-encoded sequence of literals, <copy-offset, length-of-match> tuples (also known as copy-tuples), and a terminating End-of-Stream (EOS) marker. The convention used on the stream is that the sender MUST output bytes in little-endian order with the most significant bit first.
Input : 01 00 00 00 0a 00 0a 00 20 00 20 00 80 00 80 00
Output: 0x24:6
0x04:6
0x39:7
0x01:4
0x69:7
0x79:7
0x00:2
0x4d:8
0x18:5
0x19:7
Output of the Encoder: 24 91 8b 74 9e 26 4c 06 13 be bf b6 80 06 4c 77 4c 4f 4c 2f 2c 01 d1 6b 41…
There you have it! This stream will be byte transposed into little-endian format and then sent over the wire to the decoder. Look for Part 2 RDP 6.0 Decompressing Data Walkthrough coming soon to a blog near you.