Warm tip: This article is reproduced from serverfault.com, please click

How to improve time efficiency of CRC-5 calculation?

发布于 2020-12-06 15:11:14

I have just started to study the CRC and how to implement it in software. My information source is mainly following document. Here is mentioned some simple algorithm for calculating CRC for any generator polynomial. I have attempted to write this algorithm in C++ language. I have tested it for generator polynomial x^5 + x^4 + x^2 + 1 (CRC-5) (generator polynomial used by chip) with usage of this online calculator and it works.

#include <iostream>
using namespace std;

int main() {
   
    uint8_t data_byte = 0x31;
    // polynom x^5 + x^4 + x^2 + 1 
    uint16_t polynom = 0x35;

    // register contains 0 at the beginning
    uint32_t crc = 0;
    uint32_t message = 0;
    // shift the message byte to left by so many bits which is needed for generator polynomial
    message = (data_byte << 5);
    // now the message byte is 13 bits long
    uint8_t processed_bit = 13;
    while(processed_bit > 0) {
                    
        // prepare free space for new bit from the message byte
        crc = crc << 1;
        // find out value of current msb in the message byte
        message = message << 1;
        if(message & 0x2000) {
            // msb in message byte is "1"
            // lsb in register is set to "1"
            crc |= 1U;
        } else {
            // msb in message byte is "0"
            // lsb in register is set to "0"
            crc &= ~1U;
        }
        // remove msb from message byte
        message = message & ~0x2000;
        
        if(crc & 0x20) {
            // subtract current multiple of the generator polynomial
            crc = crc ^ polynom;
        }
        // remove msb from the register
        crc = crc & ~0x20;
        
        processed_bit--;
    }
                
    cout << "CRC: " << (int)crc << endl;
    
    return 0;
}  

It is obvious that this program is uneffective as far as execution time. So I have been thinking about a possibility how to improve it in this perspective. I know that there is a variant to use the look-up table containing the precalculated reminders but I would like to avoid this method. Does anybody know how to improve the above mentioned algorithm from the execution time perspective? Thanks in advance for any suggestions.

Questioner
L3sek
Viewed
11
Mark Adler 2020-12-07 12:59:21

Just a quick glance shows several unnecessary statements. You don't need crc &= ~1U;, since the crc = crc << 1; already put a zero there. You don't need message = message & ~0x2000;, since you are only ever looking at one bit in there. Just let the other bits shift up and away. You don't need the crc = crc & ~0x20;, since the exclusive-or with the polynomial already did that.

If you read the document you linked, you will find that you do not need to process five more bits (13 total). You only need to process the eight message bits. Also reading that document, you do not need to feed in the message bits one at a time. You can exclusive-or the message byte directly onto the CRC register, and then process the eight bits all in the register.

Finally, you can speed up the calculation significantly with a table look up, processing eight bits at a time instead of one bit at a time. This is also described beautifully in the document you linked. You can find code here to automatically generate the table and C code for the calculation.

In the end though, none of this matters if you're not calculating the right thing to begin with. You need to verify the calculation with data from the chip first. I found this document with details on the CRC calculation for that chip. You need to spend some time with it and understand it in detail.

To answer your question directly, here is some code that does what your code does, but is much simpler. Also it is extended to work on n bits, not just eight. It does n loops instead of n+5 loops:

// Return a CRC-5 of the low n bits of data. The remaining bits of data must be
// zero. n must be in [5..32].
uint8_t crc5(uint32_t data, int n) {
    int shift = n - 5;
    uint32_t poly = (uint32_t)0x15 << shift;
    uint32_t top = (uint32_t)1 << (n - 1);
    do {
        data = data & top ? (data << 1) ^ poly : data << 1;
    } while (--n);
    return (data >> shift) & 0x1f;
}

Simpler and faster still is the equivalent of yours restricted to eight bits, unrolled:

uint8_t crc5_8(uint8_t data) {
    data = data & 0x80 ? (data << 1) ^ 0xa8 : data << 1;
    data = data & 0x80 ? (data << 1) ^ 0xa8 : data << 1;
    data = data & 0x80 ? (data << 1) ^ 0xa8 : data << 1;
    data = data & 0x80 ? (data << 1) ^ 0xa8 : data << 1;
    data = data & 0x80 ? (data << 1) ^ 0xa8 : data << 1;
    data = data & 0x80 ? (data << 1) ^ 0xa8 : data << 1;
    data = data & 0x80 ? (data << 1) ^ 0xa8 : data << 1;
    data = data & 0x80 ? (data << 1) ^ 0xa8 : data << 1;
    return data >> 3;
}

However neither can calculate what you need for your chip.