Diff for "BitManipulation"

Differences between revisions 23 and 24

Here is some information and goals related to Python bit manipulation, binary manipulation.

Some tasks include:

Turn "11011000111101..." into bytes, (padded left or right, 0 or 1,) and vice versa.
Slice ranges of bits
Rotate bits, addressed by the bit. That is, say: "rotate bits 13-17, wrapping around the edges," or, "rotate bits 13-17, lose bits on the one side, set all new bits to 0."
Similarly, revert regions of bits, apply logic to regions of bits, etc.,.
Switch Endianness, with different block sizes.
Apply operations in block groupings: ex: apply XOR 10101 (5 bits) repeatedly across a field.

Relevant libraries include:

Some simple code is at ASPN: bit-field manipulation.

Here are some other examples.

Manipulations

To integer.

Toggle line numbers

   1 >>> print int('00100001', 2)
   2 33

To hex string. Note that you don't need to use x8 bits.

Toggle line numbers

   1 >>> print "0x%x" % int('11111111', 2)
   2 0xff
   3 >>> print "0x%x" % int('0110110110', 2)
   4 0x1b6
   5 >>> print "0x%x" % int('0010101110101100111010101101010111110101010101', 2)
   6 0xaeb3ab57d55

To character. 8 bits max.

Toggle line numbers

   1 >>> chr(int('111011', 2))
   2 ';'
   3 >>> chr(int('1110110', 2))
   4 'v'
   5 >>> chr(int('11101101', 2))
   6 '\xed'

Characters to integers, but not to strings of 1's and 0's.

Toggle line numbers

   1 >>> int('01110101', 2)
   2 117
   3 >>> chr(int('01110101', 2))
   4 'u'
   5 >>> ord('u')
   6 117

Individual bits.

Toggle line numbers

   1 >>> 1 << 0
   2 1
   3 >>> 1 << 1
   4 2
   5 >>> 1 << 2
   6 4
   7 >>> 1 << 3
   8 8
   9 >>> 1 << 4
  10 16
  11 >>> 1 << 5
  12 32
  13 >>> 1 << 6
  14 64
  15 >>> 1 << 7
  16 128

Transformations Summary

Strings to Integers:

"1011101101": int(str, 2)
"m": ord(str)
"0xdecafbad": int(str, 16) (known to work in Python 2.4)
"decafbad": int(str, 16) (known to work in Python 2.4)

Integers to Strings:

"1011101101": built-in to Python 3 (see below)
"m": chr(str)
"0xdecafbad": hex(val)
"decafbad": "%x" % val

We are still left without a technique for producing binary strings, and decyphering hex strings.

Hex String to Integer

The simplest approach is to use the int type with the base argument.

Toggle line numbers

   1 >>> int('0xff',16)
   2 255
   3 >>> int('d484fa894e',16)
   4 912764078414

Another approach to decyphering "0xdecafbad" style hex strings, is to use eval:

Toggle line numbers

   1 >>> eval("0xdecafbad ")
   2 3737844653L

However, this could be dangerous, depending on where you're getting your data from. Q. Why could this be dangerous? What data makes this dangerous?

Here's a function that is safer:

Toggle line numbers

   1 def hex_to_integer(h):
   2     """Convert a hex string to an integer.
   3 
   4     The hex string can be any length. It can start with an 0x, or not.
   5     Unrecognized characters will raise a ValueError.
   6 
   7     This function released into the public domain by it's author, Lion
   8     Kimbro.
   9     """
  10     num = 0  # Resulting integer
  11     h = h.lower()  # Hex string
  12     if h[:2] == "0x":
  13         h = h[2:]
  14     for c in h:  # Hex character
  15         num *= 16
  16         if "0" <= c <= "9":
  17             num += ord(c) - ord("0")
  18         elif "a" <= c <= "f":
  19             num += ord(c) - ord("a") + 10
  20         else:
  21             raise ValueError(c)

Or just use this:

int('0xdecafbad', 16)

Integer to Bin String

Python 3 supports binary literals (e.g. 0b10011000) and has a bin() function. For older versions:

Toggle line numbers

   1 >>> def bin(a):
   2         s=''
   3         t={'0':'000','1':'001','2':'010','3':'011',
   4            '4':'100','5':'101','6':'110','7':'111'}
   5         for c in oct(a)[1:]:
   6                 s+=t[c]
   7         return s

or better:

Toggle line numbers

   1 def bin(s):
   2     return str(s) if s<=1 else bin(s>>1) + str(s&1)

Python Integers

From "The Python Language Reference" page on the Data Model:

"Integers (int) These represent numbers in an unlimited range, subject to available (virtual) memory only. For the purpose of shift and mask operations, a binary representation is assumed, and negative numbers are represented in a variant of 2’s complement which gives the illusion of an infinite string of sign bits extending to the left."

Prior to Python 3.1, there was no easy way to determine how Python represented a specific integer internally, i.e. how many bits were used. Python 3.1 adds a bit_length() method to the int type that does exactly that.

Unless you know you are working with numbers that are less than a certain length, for instance numbers from arrays of integers, shifts, rotations, etc. may give unexpected results.

The number of the highest bit set is the highest power of 2 less than or equal to the input integer. This is the same as the exponent of the floating point representation of the integer, and is also called its "integer log base 2".(ref.1)

In versions before 3.1, the easiest way to determine the highest bit set is*:

* There is a long discussion on this topic, and why this method is not good, in "Issue 3439" at Python.org: http://bugs.python.org/issue3439 This discussion led up to the addition of bit_length() in Python 3.1.

Toggle line numbers

   1 import math
   2 
   3 hiBit = math.floor(math.log(int_type, 2))

An input less than or equal to 0 results in a "ValueError: math domain error"

The section "Finding integer log base 2 of an integer" on the "Bit Twiddling Hacks"(ref.1) web page includes a number of methods for determining this value for integers of known magnitude, presumably when no math coprocessor is available. The only method generally applicable to Python integers of unknown magnitude is the "obvious way" of counting the number of bitwise shift operations needed to reduce the input to 0.

Bit Length Of a Python Integer

bitLen() counts the actual bit length of a Python integer, that is, the number of the highest non-zero bit plus 1. Zero, with no non-zero bit, returns 0. As should be expected from the quote above about "the illusion of an infinite string of sign bits extending to the left," a negative number throws the computer into an infinite loop.

The function can return any result up to the length of the largest integer your computer's memory can hold.

Toggle line numbers

   1 def bitLen(int_type):
   2     length = 0
   3     while (int_type):
   4         int_type >>= 1
   5         length += 1
   6     return(length)
   7 
   8 for i in range(17):
   9      print(bitLen(i))
  10 
  11 # results: 0, 1, 2, 2, 3, 3, 3, 3, 4, 4, 4, 4, 4, 4, 4, 4, 5

The method using the math module is much faster, especially on huge numbers with hundreds of decimal digits.

bitLenCount()

In common usage, the "bit count" of an integer is the number of set (1) bits, not the bit length of the integer described above. bitLen() can be modified to also provide the count of the number of set bits in the integer. There are faster methods to get the count below.

Toggle line numbers

   1 def bitLenCount(int_type):
   2     length = 0
   3     count = 0
   4     while (int_type):
   5         count += (int_type & 1)
   6         length += 1
   7         int_type >>= 1
   8     return(length, count)

Operations on Integers of Unknown Magnitude

Some procedures don't need to know the magnitude of an integer to give meaningful results.

bitCount()

The procedure and the information below were found in "Bit Twiddling Hacks"(ref.1)

- - - - - - - - - - - - - - - - - - - - - - - -

Counting bits set, Brian Kernighan's way*

unsigned int v;          // count the number of bits set in v
unsigned int c;          // c accumulates the total bits set in v
for (c = 0; v; c++)
{   v &= v - 1;  }       //clear the least significant bit set

This method goes through as many iterations as there are set bits. So if we have a 32-bit word with only the high bit set, then it will only go once through the loop.

* The C Programming Language 2nd Ed., Kernighan & Ritchie, 1988.

Don Knuth pointed out that this method was published by Peter Wegner in CACM 3 (1960), 322. Also discovered independently by Derrick Lehmer and published in 1964 in a book edited by Beckenbach.

- - - - - - - - - - - - - - - - - - - - - - - -

Kernighan and Knuth, potent endorsements!

This works because each subtraction "borrows" from the lowest 1-bit. For example:

Toggle line numbers

   1 #       loop pass 1                 loop pass 2
   2 #      101000     101000           100000     100000
   3 #    -   #!python
   4   & 100111         -   #!python
   5   & 011111
   6 #    = 100111   = 100000         = 011111   =      0

It is an excellent technique for Python, since the size of the integer need not be determined beforehand.

Toggle line numbers

   1 def bitCount(int_type):
   2     count = 0
   3     while(int_type):
   4         int_type &= int_type - 1
   5         count += 1
   6     return(count)

parityOf()

From "Bit Twiddling Hacks"

Code almost identical to bitCount(), above, calculates the parity of an integer, returning 0 if there are an even number of set bits, and -1 if there are an odd number. In fact, counting the bits and checking whether the result is odd with bitcount & 1 is about the same speed as the parity function.

Toggle line numbers

   1 def parityOf(int_type):
   2     parity = 0
   3     while (int_type):
   4         parity = ~parity
   5         int_type = int_type & (int_type - 1)
   6     return(parity)

lowestSet()

To determine the bit number of the lowest bit set in an integer, in twos-complement notation i & -i zeroes all but the lowest set bit. The bitLen() proceedure then determines its position. Obviously, negative numbers return the same result as their opposite. In this version, an input of 0 returns -1, in effect an error condition.

Toggle line numbers

   1 For example:
   2 #    00111000     # 56
   3 #    11001000     # twos complement, -56
   4 # &= 00001000

Toggle line numbers

   1 def lowestSet(int_type):
   2     low = (int_type & -int_type)
   3     lowBit = -1
   4     while (low):
   5         low >>= 1
   6         lowBit += 1
   7     return(lowBit)

Single bits

The usual single-bit operations will work on any Python integer. It is up to the programmer to be sure that the value of 'offset' makes sense in the context of the program.

Toggle line numbers

   1 # testBit() returns a nonzero result, 2**offset, if the bit at 'offset' is one.
   2 
   3 def testBit(int_type, offset):
   4     mask = 1 << offset
   5     return(int_type & mask)
   6 
   7 # setBit() returns an integer with the bit at 'offset' set to 1.
   8 
   9 def setBit(int_type, offset):
  10     mask = 1 << offset
  11     return(int_type | mask)
  12 
  13 # clearBit() returns an integer with the bit at 'offset' cleared.
  14 
  15 def clearBit(int_type, offset):
  16     mask = ~(1 << offset)
  17     return(int_type & mask)
  18 
  19 # toggleBit() returns an integer with the bit at 'offset' inverted, 0 -> 1 and 1 -> 0.
  20 
  21 def toggleBit(int_type, offset):
  22     mask = 1 << offset
  23     return(int_type ^ mask)

Bit fields, e.g. for communication protocols

If you need to interpret individual bits in some data, e.g. a byte stream in a communications protocol, you can use the ctypes module.

import ctypes
c_uint8 = ctypes.c_uint8

class Flags_bits(ctypes.LittleEndianStructure):
    _fields_ = [
            ("logout", c_uint8, 1),
            ("userswitch", c_uint8, 1),
            ("suspend", c_uint8, 1),
            ("idle", c_uint8, 1),
            ]

class Flags(ctypes.Union):
    _fields_ = [("b", Flags_bits),
            ("asbyte", c_uint8)]

flags = Flags()
flags.asbyte = 0xc
print(flags.b.idle)
print(flags.b.suspend)
print(flags.b.userswitch)
print(flags.b.logout)

References

ref.1. "Bit Twiddling Hacks" By Sean Eron Anderson

http://graphics.stanford.edu/~seander/bithacks.html

ref.2. "The Art of Assembly Language" by Randall Hyde

http://webster.cs.ucr.edu/AoA/index.html Volume 4, Chapter 5 "Bit Manipulation"

ref.3. Hacker's Delight

http://www.hackersdelight.org/

Research Links

this is the sort of thing we're looking for:

ASPN: bit-field manipulation

related modules:

array module -- (issued with Python)
struct module -- (issued with Python)
binascii module -- (issued with Python)
pySerial module -- access the serial port

-  ⇤ ← Revision 23 as of 2012-08-02 11:18:50 → 
  Size: 13557
  Editor: cpc1-darl4-0-0-cust170
  Comment: wiki restore 2013-01-23
+   ← Revision 24 as of 2013-09-04 15:55:40 → ⇥
  Size: 13040
  Editor: Feuermurmel
  Comment:
-Deletions are marked like this.
+Additions are marked like this.
 Line 1:
-Line 4:
+Line 3:
-Line 6:
+Line 4:
-Line 15:
+Line 12:
-Line 17:
+Line 13:
-Line 23:
+Line 18:
-Line 26:
+Line 20:
-Line 29:
+Line 22:
-Line 31:
+Line 23:
-Line 34:
+Line 24:
-Line 45:
+Line 29:
-Line 49:
+Line 30:
-Line 64:
+Line 39:
-Line 68:
+Line 40:
-Line 83:
+Line 49:
-Line 87:
+Line 50:
-Line 102:
+Line 59:
-Line 106:
+Line 60:
-Line 131:
+Line 79:
-Line 135:
+Line 80:
-Line 139:
+Line 82:
- * {{{"1011101101"}}}: {{{int(str, 2)}}}
+ * {{{"1011101101"}}}: {{{int(str, 2)}}}
-Line 142:
+Line 84:
  * {{{"0xdecafbad"}}}: {{{int(str, 16)}}} (known to work in Python 2.4)
 * {{{"decafbad"}}}: {{{int(str, 16)}}} (known to work in Python 2.4)
-Line 147:
+Line 88:
-Line 152:
+Line 92:
  * {{{"decafbad"}}}: {{{"%x" % val}}}
-Line 157:
+Line 96:
-Line 159:
+Line 97:
-Line 162:
+Line 98:
-Line 172:
+Line 105:
-Line 176:
+Line 106:
-Line 187:
+Line 111:
-Line 192:
+Line 113:
-Line 194:
+Line 114:
-Line 223:
+Line 137:
-    return num
}}}
+}}}
Or just use this:

{{{
int('0xdecafbad', 16)
}}}
-Line 229:
+Line 144:
-Line 232:
+Line 145:
-Line 248:
+Line 155:
-Line 252:
+Line 156:
-Line 263:
+Line 161:
-Line 267:
+Line 162:
-Line 271:
+Line 164:
-Line 274:
+Line 166:
-Line 277:
+Line 168:
-Line 280:
+Line 170:
-Line 283:
+Line 172:
-Line 286:
+Line 174:
-Line 289:
+Line 176:
-Line 300:
+Line 181:
-An input less than or equal to 0 results in a "[[ValueError|ValueError]]: math domain error"
+An input less than or equal to 0 results in a "ValueError: math domain error"
-Line 308:
+Line 185:
-Line 310:
+Line 186:
-Line 314:
+Line 188:
-Line 316:
+Line 189:
-Line 336:
+Line 203:
-Line 341:
+Line 205:
-Line 343:
+Line 206:
-Line 346:
+Line 207:
-Line 363:
+Line 218:
-Line 367:
+Line 219:
-Line 371:
+Line 221:
-Line 373:
+Line 222:
-Line 377:
+Line 224:
-Line 380:
+Line 226:
-Line 382:
+Line 227:
-Line 393:
+Line 234:
-Line 397:
+Line 236:
-Line 400:
+Line 238:
-Line 403:
+Line 240:
-Line 406:
+Line 242:
-Line 409:
+Line 244:
-Line 411:
+Line 245:
-Line 426:
+Line 254:
-Line 430:
+Line 255:
-Line 445:
+Line 264:
-Line 449:
+Line 265:
-Line 453:
+Line 267:
-Line 455:
+Line 268:
-Line 470:
+Line 277:
-Line 474:
+Line 278:
-Line 477:
+Line 279:
-Line 490:
+Line 286:
-Line 507:
+Line 295:
-Line 511:
+Line 296:
-Line 514:
+Line 297:
-Line 546:
+Line 323:
-Line 550:
+Line 324:
-Line 553:
+Line 325:
-Line 581:
+Line 349:
-Line 584:
+Line 350:
-Line 588:
+Line 352:
- .
 http://graphics.stanford.edu/~seander/bithacks.html
+ . http://graphics.stanford.edu/~seander/bithacks.html
-Line 596:
+Line 356:
- .
 http://webster.cs.ucr.edu/AoA/index.html Volume 4, Chapter 5 "Bit Manipulation"
+ . http://webster.cs.ucr.edu/AoA/index.html Volume 4, Chapter 5 "Bit Manipulation"
-Line 604:
+Line 360:
- .
 http://www.hackersdelight.org/
+ . http://www.hackersdelight.org/
-Line 611:
+Line 363:
-Line 615:
+Line 365:
-Line 618:
+Line 367:
-Line 620:
+Line 368:
-Line 627:
+Line 374:
-see also: [[BitwiseOperators|BitwiseOperators]]
+see also: BitwiseOperators

Page

User

Manipulations

Transformations Summary

Hex String to Integer

Integer to Bin String

Python Integers

Bit Length Of a Python Integer

bitLenCount()

Operations on Integers of Unknown Magnitude

bitCount()

parityOf()

lowestSet()

Single bits

Bit fields, e.g. for communication protocols

References

Research Links