Endianness

Endianness is the order in which a machine stores or transmits the bytes (or bits) of a multi-byte value.

Byte order endianness

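Big Endian (Network Order) - Left to right

The most significant byte is stored at the lowest address, so the bytes appear in memory in the same order as the digits are written.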

Examples:

  • IBM's 370 mainframes
  • Motorola microprocessors (such as the 68000 family)
  • TCP/IP
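
To make "left to right" concrete, here is a minimal sketch that writes and reads a 32-bit value in big-endian order using shifts, so it behaves the same on any host. The helper names put_be32 and get_be32 are hypothetical, chosen for this illustration.

```c
#include <stdint.h>

/* Store value in buf in big-endian (network) order:
   the most significant byte goes to the lowest address. */
void put_be32(uint8_t buf[4], uint32_t value)
{
    buf[0] = (uint8_t)(value >> 24);
    buf[1] = (uint8_t)(value >> 16);
    buf[2] = (uint8_t)(value >> 8);
    buf[3] = (uint8_t)value;
}

/* Rebuild the value from four big-endian bytes. */
uint32_t get_be32(const uint8_t buf[4])
{
    return ((uint32_t)buf[0] << 24) |
           ((uint32_t)buf[1] << 16) |
           ((uint32_t)buf[2] << 8)  |
            (uint32_t)buf[3];
}
```

Calling put_be32(buf, 0x01234567) leaves the bytes 01 23 45 67 in buf, regardless of the byte order of the machine running the code.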

Little Endian (Host Order on little-endian machines such as x86) - Right to left

The least significant byte is stored at the lowest address.

When to use Little Endian over Big Endian

If a number later needs more digits, a little-endian layout can simply reserve the additional higher-order bytes after the existing ones; the bytes already stored keep their addresses. Big Endian would require either allocating memory for the higher-order byte first or moving the existing bytes before the new digit can be incorporated.
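
The widening argument can be checked with a small sketch. The decoders get_le and get_be below are hypothetical helpers that interpret the first n bytes at an address as an unsigned integer in each byte order.

```c
#include <stddef.h>
#include <stdint.h>

/* Decode the first n bytes (n <= 4) at p as a little-endian integer. */
uint32_t get_le(const uint8_t *p, size_t n)
{
    uint32_t v = 0;
    for (size_t i = 0; i < n; i++)
        v |= (uint32_t)p[i] << (8 * i);
    return v;
}

/* Decode the first n bytes (n <= 4) at p as a big-endian integer. */
uint32_t get_be(const uint8_t *p, size_t n)
{
    uint32_t v = 0;
    for (size_t i = 0; i < n; i++)
        v = (v << 8) | p[i];
    return v;
}
```

With the value 42 stored little-endian as 2A 00 00 00, get_le returns 42 whether it reads 1, 2, or 4 bytes from the same address. With 42 stored big-endian as 00 00 00 2A, get_be returns 42 only for the full 4-byte read; a 1-byte read from the same address returns 0.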

Addition proceeds from the least significant byte (LSB) so that carries can propagate upward, which suits little-endian processors: the LSB sits at the operand's starting address. On a big-endian processor, the addressing unit has to be told how big the addition is going to be so that it can hop forward to the LSB, then count back down towards the most significant byte (MSB). On the other hand, arithmetic division is done starting from the MSB, so it is more natural for big-endian processors. However, high-performance processors usually fetch typical multi-byte operands from memory in the same amount of time they would have fetched a single byte, so the complexity of the hardware is not affected by the byte ordering. [1]

Examples:

  • Intel x86 CPUs
  • DEC Alphas
  • Domain Names (www.google.com)
  • hexdump (because the dumping program is unable to know what kind of data it is dumping, the only orientation it can observe is monotonically increasing addresses.)

Middle/Mixed Endian

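On the PDP-11, a 16-bit little-endian machine, 32-bit values were stored with the 16-bit halves swapped from the expected little-endian order: the high-order half comes first, and each half is itself little-endian. This ordering is known as PDP-endian.

Example: the 32-bit value 0x0A0B0C0D is stored at increasing addresses as the bytes 0B 0A 0D 0C.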
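
A minimal sketch of the PDP-endian layout, assuming the usual description of it (high 16-bit half first, each half stored little-endian). The function name put_pdp32 is hypothetical.

```c
#include <stdint.h>

/* Store value in buf in PDP-endian order: the high 16-bit half
   comes first, and each half is written low byte first. */
void put_pdp32(uint8_t buf[4], uint32_t value)
{
    uint16_t hi = (uint16_t)(value >> 16);
    uint16_t lo = (uint16_t)(value & 0xFFFFu);

    buf[0] = (uint8_t)(hi & 0xFFu);  /* low byte of high half  */
    buf[1] = (uint8_t)(hi >> 8);     /* high byte of high half */
    buf[2] = (uint8_t)(lo & 0xFFu);  /* low byte of low half   */
    buf[3] = (uint8_t)(lo >> 8);     /* high byte of low half  */
}
```

For 0x0A0B0C0D this produces 0B 0A 0D 0C, compared with 0D 0C 0B 0A for plain little endian and 0A 0B 0C 0D for big endian.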

Conversion between byte orders

The Berkeley sockets API provides functions that convert 16-bit and 32-bit integers from host to network byte order (htons/htonl) and from network to host byte order (ntohs/ntohl). On a big-endian host these functions leave the value unchanged.
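
As an illustration, the sketch below packs a 16-bit port and a 32-bit address into a 6-byte buffer in network order and unpacks them again. It assumes a POSIX system that provides <arpa/inet.h>; the 6-byte layout and the names pack_header and unpack_header are made up for this example and do not correspond to any real protocol header.

```c
#include <arpa/inet.h>
#include <stdint.h>
#include <string.h>

/* Write port and addr into out in network (big-endian) byte order. */
void pack_header(uint8_t out[6], uint16_t port, uint32_t addr)
{
    uint16_t nport = htons(port);   /* host -> network, 16 bit */
    uint32_t naddr = htonl(addr);   /* host -> network, 32 bit */
    memcpy(out, &nport, sizeof nport);
    memcpy(out + 2, &naddr, sizeof naddr);
}

/* Read port and addr back from in, converting to host byte order. */
void unpack_header(const uint8_t in[6], uint16_t *port, uint32_t *addr)
{
    uint16_t nport;
    uint32_t naddr;
    memcpy(&nport, in, sizeof nport);
    memcpy(&naddr, in + 2, sizeof naddr);
    *port = ntohs(nport);           /* network -> host, 16 bit */
    *addr = ntohl(naddr);           /* network -> host, 32 bit */
}
```

Packing port 8080 (0x1F90) and address 0xC0A80001 yields the bytes 1F 90 C0 A8 00 01 on both little-endian and big-endian hosts.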

Bi-Endianness

Some architectures (including ARM versions 3 and above, PowerPC, Alpha, SPARC V9, MIPS, PA-RISC, SuperH SH-4 and IA-64) feature a setting which allows for switchable endianness in data fetches and stores, instruction fetches, or both. This feature can improve performance or simplify the logic of networking devices and software. The word bi-endian, when said of hardware, denotes the capability of the machine to compute or pass data in either endian format.

Bit Order endianness

While the high-level network protocols usually consider the byte (mostly meant as octet) as their atomic unit, the lowest network protocols may deal with ordering of bits within a byte.

Here endianness refers to the ordering of bits within a byte rather than of bytes within a word. Bit ordering matters during physical-layer transmission and is normally invisible to application-level code.

Bit endianness matters in:

  • Serialization and deserialization: the sender and receiver must agree on the order in which the bits of each byte are transmitted.

  • CRC (Cyclic Redundancy Check) and other error-checking schemes: these validate the stream of bits, so the CRC logic must know which bit endianness is used during transmission and where in the stream the checksum is placed.
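
When two sides disagree on bit order, each byte has to be bit-reversed before it is interpreted. The following is a minimal sketch of such a reversal; reverse_bits8 is a hypothetical helper name, and real hardware or CRC libraries often use a lookup table or a dedicated instruction instead.

```c
#include <stdint.h>

/* Reverse the order of the 8 bits in b (bit 0 <-> bit 7, and so on)
   by swapping nibbles, then bit pairs, then adjacent bits. */
uint8_t reverse_bits8(uint8_t b)
{
    b = (uint8_t)(((b & 0xF0u) >> 4) | ((b & 0x0Fu) << 4));
    b = (uint8_t)(((b & 0xCCu) >> 2) | ((b & 0x33u) << 2));
    b = (uint8_t)(((b & 0xAAu) >> 1) | ((b & 0x55u) << 1));
    return b;
}
```

For example, 0xB4 (1011 0100) becomes 0x2D (0010 1101), and applying the function twice returns the original byte.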

Endianness with Binary Files

Endianness becomes a problem when a binary file created on one computer is read on another computer with a different endianness. This can be handled in several ways:

  • CPU instruction assist: byte-swap instructions such as REV (ARM) and BSWAP (x86)

  • Compiler assisted: byte-swap built-ins such as __builtin_bswap32 in GCC and Clang

  • Data encoded with a fixed endianness: e.g. the XLS file format is little endian and the JPEG file format is big endian.

  • Data encoded in formats whose basic unit is a single byte is independent of endianness: e.g. ASCII files.

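  • Endianness recorded in the data itself

    • Unicode text can optionally start with a byte order mark (BOM). Its code point is U+FEFF, and the order in which its bytes appear reveals the byte order of the text.

    • Store information about the endianness of the data as part of the file metadata. E.g. TIFF image files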
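
The two metadata approaches can be sketched as follows. A UTF-16 stream that starts with the bytes FE FF is big endian and one that starts with FF FE is little endian; a TIFF file starts with "II" for little endian or "MM" for big endian. The enum and function names are hypothetical, and the checks are deliberately simplified: a full TIFF reader would also verify the magic number 42 that follows, and FF FE can also be the start of a UTF-32 little-endian BOM.

```c
#include <stddef.h>
#include <stdint.h>

enum byte_order { ORDER_UNKNOWN, ORDER_LITTLE, ORDER_BIG };

/* Inspect the first two bytes of a UTF-16 stream for a BOM (U+FEFF). */
enum byte_order utf16_bom_order(const uint8_t *p, size_t n)
{
    if (n < 2)
        return ORDER_UNKNOWN;
    if (p[0] == 0xFE && p[1] == 0xFF)
        return ORDER_BIG;
    if (p[0] == 0xFF && p[1] == 0xFE)
        return ORDER_LITTLE;
    return ORDER_UNKNOWN;   /* no BOM present */
}

/* Inspect the first two bytes of a TIFF header. */
enum byte_order tiff_order(const uint8_t *p, size_t n)
{
    if (n < 2)
        return ORDER_UNKNOWN;
    if (p[0] == 'I' && p[1] == 'I')
        return ORDER_LITTLE;
    if (p[0] == 'M' && p[1] == 'M')
        return ORDER_BIG;
    return ORDER_UNKNOWN;   /* not a TIFF byte-order mark */
}
```

Once the order is known, the reader decodes every multi-byte field in that order instead of relying on the host's native layout.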

Exercise: Check Endianness

#include <stdio.h>
int main() 
{
   unsigned int i = 1;
   char *c = (char*) &i;
   /* 
       character pointer 'c' is pointing to integer i.
       size of character is 1 byte.
       if machine is little endian then *c is 1 
          because LSB is stored in lowest address.
       if machine is big endian then *c is 0 
          because MSB is stored in lowest address.
   */

   if (*c) 
   {
       printf("Little endian");
   }
   else 
   {
       printf("Big endian");
   }
   return 0;
}

Exercise: What is the output of this?

#include <stdio.h>

/* function to show bytes in memory, from location start to start+n*/
void show_mem_rep(char *start, int n) 
{
    int i;
    for (i = 0; i < n; i++)
         printf(" %.2x", (unsigned char)start[i]); /* cast avoids sign extension of bytes >= 0x80 */
    printf("\n");
}

/* Main function to call above function for 0x01234567 */
int main()
{
   int i = 0x01234567;
   show_mem_rep((char *)&i, sizeof(i));
   getchar();
   return 0;
}

Output:
For Big Endian: 01 23 45 67
For Little Endian: 67 45 23 01

Exercise: Does endianness matter with type casting?

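#include <stdio.h>
#include <string.h>
int main()
{
    unsigned char arr[2] = {0x01, 0x00};
    unsigned short int x = 0;
    /* Reinterpret the two bytes as a short. memcpy is used instead of
       dereferencing a cast pointer, which can break alignment and
       aliasing rules. Assumes short int is 2 bytes. */
    memcpy(&x, arr, sizeof(arr));
    printf("%d", x);
    getchar();
    return 0;
}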

Explanation:
The character array contains 2 elements:
        the 1st element is 0x01 and
        the 2nd element is 0x00.
Therefore the 2 bytes of data in memory look like this: 0x01 0x00
Reinterpreting the character array as a "short int" (which is 2 bytes here)
    reads both bytes as one value:
    In Big Endian: the value is 0x0100, so the program prints 256.
    In Little Endian: the value is 0x0001, so the program prints 1.

Exercise: Convert from Big Endian format to Little Endian format?

#include <stdint.h>
#include <stdio.h>
uint32_t convert_big_to_little (uint32_t data);
int main()
{
    uint32_t big_data = 0x12345678;
    uint32_t little_data = convert_big_to_little(big_data);
    printf("0x%08x\n", (unsigned int) little_data); /* prints 0x78563412 */
    return 0;
}

/* Reverse the four bytes of data: byte 0 <-> byte 3, byte 1 <-> byte 2.
   The same function also converts little endian back to big endian. */
uint32_t convert_big_to_little (uint32_t data)
{
    uint32_t result = 0;
    result |= ( data & 0x000000FF ) << 24;
    result |= ( data & 0x0000FF00 ) << 8;
    result |= ( data & 0x00FF0000 ) >> 8;
    result |= ( data & 0xFF000000 ) >> 24;
    return result;
}
