Monday, November 09, 2020

raw string in Python

 Normal strings use the backslash character as an escape character for special characters (like newlines):


>>> print('this is \n a test')
this is 
 a test

The r prefix tells the interpreter not to do this:


>>> print(r'this is \n a test')
this is \n a test

This is important in regular expressions, because the backslash has to reach the re module intact. In particular, \b in a pattern matches the empty string at the start or end of a word. re expects the two-character sequence \b, but in a normal string literal '\b' is converted to the ASCII backspace character, so you need to either escape the backslash explicitly ('\\b') or tell Python it is a raw string (r'\b').
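
As a small sketch of the difference (the sample sentence and the variable names are made up for illustration), the same pattern can be written three ways and passed to re.findall:

```python
import re

text = "cat concatenate cat"

# Raw string: the two characters \ and b reach the re module,
# so the pattern means "cat" surrounded by word boundaries.
raw_matches = re.findall(r"\bcat\b", text)

# Normal string: each '\b' is turned into a backspace character (0x08)
# before re ever sees it, so the pattern looks for backspaces in the text.
plain_matches = re.findall("\bcat\b", text)

# Escaping the backslash by hand is equivalent to using the raw string.
escaped_matches = re.findall("\\bcat\\b", text)

print(raw_matches)            # ['cat', 'cat']
print(plain_matches)          # []
print(escaped_matches)        # ['cat', 'cat']
print(len(r"\b"), len("\b"))  # 2 1
```

The "cat" inside "concatenate" is not matched by the raw pattern because it does not sit on a word boundary.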


Ref: https://stackoverflow.com/questions/21104476/what-does-the-r-in-pythons-re-compiler-pattern-flags-mean/21104539#:~:text=According%20to%20this%20the%20%22r,literal%20prefixed%20with%20'r'.

bytes and str in Python

 To store anything in a computer, you must first encode it, i.e. convert it to bytes. For example:


  • If you want to store music, you must first encode it using MP3, WAV, etc.
  • If you want to store a picture, you must first encode it using PNG, JPEG, etc.
  • If you want to store text, you must first encode it using ASCII, UTF-8, etc.

MP3, WAV, PNG, JPEG, ASCII and UTF-8 are examples of encodings. An encoding is a format for representing audio, images, text, etc. as bytes.


In Python, a byte string (type bytes) is just that: a sequence of bytes. It isn't human-readable. Under the hood, everything must be converted to a byte string before it can be stored in a computer.


On the other hand, a character string (type str), often just called a "string", is a sequence of characters. It is human-readable. A character string can't be stored in a computer directly; it has to be encoded first (converted into a byte string). There are multiple encodings through which a character string can be converted into a byte string, such as ASCII and UTF-8.


'I am a string'.encode('ASCII')

The above Python code encodes the string 'I am a string' using the ASCII encoding. The result is a byte string. If you print it, Python shows it as b'I am a string'. Remember, however, that byte strings aren't inherently human-readable: Python simply displays every byte in the printable ASCII range as the corresponding character and shows all other bytes as escapes such as \xe9. In Python, a byte string is written as a b followed by this ASCII representation in quotes.


A byte string can be decoded back into a character string if you know the encoding that was used to encode it.


b'I am a string'.decode('ASCII')

The above code will return the original string 'I am a string'.
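
The choice of encoding only becomes visible once the text contains non-ASCII characters. The following is a minimal sketch, assuming the sample word "café" and the Latin-1 encoding as illustrations (neither comes from the referenced answer):

```python
text = "café"

# The same character string produces different bytes under different encodings.
utf8_bytes = text.encode("utf-8")
latin1_bytes = text.encode("latin-1")

print(utf8_bytes)    # b'caf\xc3\xa9'
print(latin1_bytes)  # b'caf\xe9'
print(len(text), len(utf8_bytes), len(latin1_bytes))  # 4 5 4

# Decoding with the encoding that was used gives back the original string.
restored = utf8_bytes.decode("utf-8")
print(restored == text)  # True

# Decoding with a different encoding silently gives the wrong text.
wrong = utf8_bytes.decode("latin-1")
print(wrong)  # cafÃ©

# ASCII has no code for 'é', so encoding with it fails.
try:
    text.encode("ascii")
    ascii_failed = False
except UnicodeEncodeError:
    ascii_failed = True
print(ascii_failed)  # True
```

This is why you need to know the original encoding before calling decode: the bytes themselves do not record it.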


Encoding and decoding are inverse operations. Everything must be encoded before it can be written to disk, and it must be decoded before it can be read by a human.
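
The same round trip happens when text is written to a file. Below is a rough sketch, assuming a temporary file named example.txt and UTF-8 as the encoding (both chosen only for illustration): text mode encodes and decodes for you, while binary mode exposes the bytes that are actually on disk.

```python
import os
import tempfile

text = "naïve text"

with tempfile.TemporaryDirectory() as tmp_dir:
    path = os.path.join(tmp_dir, "example.txt")  # hypothetical file name

    # Text mode: Python encodes the str to bytes on write.
    with open(path, "w", encoding="utf-8") as f:
        f.write(text)

    # Binary mode: no decoding, we get the stored bytes as they are.
    with open(path, "rb") as f:
        raw = f.read()

    # Text mode again: Python decodes the bytes back into a str on read.
    with open(path, "r", encoding="utf-8") as f:
        round_trip = f.read()

print(raw)                 # b'na\xc3\xafve text'
print(round_trip == text)  # True
```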


Ref: https://stackoverflow.com/questions/6224052/what-is-the-difference-between-a-string-and-a-byte-string#:~:text=In%20Python%2C%20a%20byte%20string,is%20a%20sequence%20of%20characters.

Variadic Function in Python

Variadic functions accept a variable number of arguments.

Naming convention: *args collects the extra positional arguments (as a tuple) and **kwargs collects the extra keyword arguments (as a dict).

 

# Function definition
def foo(*args, **kwargs):
    return args, kwargs

# Function calls
foo(1, 2, eleven=11, twelve=12)
# Output
# ((1, 2), {'eleven': 11, 'twelve': 12})


foo(*range(5,7), **{'thirteen': 13})
# Output
# ((5, 6), {'thirteen': 13})


mylist = [3,4]
mydict = {'fourteen': 14}
foo(*mylist, **mydict)
# Output
# ((3, 4), {'fourteen': 14})
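
*args and **kwargs can also be combined with ordinary parameters. Here is a minimal sketch, using a hypothetical helper named describe (not part of the examples above), with one required parameter and one keyword-only parameter:

```python
def describe(first, *args, sep=", ", **kwargs):
    """Join all arguments into one string.

    first    -- required positional parameter
    *args    -- any extra positional arguments (a tuple)
    sep      -- keyword-only, because it comes after *args
    **kwargs -- any extra keyword arguments (a dict)
    """
    parts = [str(first)]
    parts.extend(str(a) for a in args)
    parts.extend(f"{key}={value}" for key, value in kwargs.items())
    return sep.join(parts)


print(describe(1))                                # 1
print(describe(1, 2, 3, x=10))                    # 1, 2, 3, x=10
print(describe("a", "b", sep=" | ", flag=True))   # a | b | flag=True
```

Anything that matches a named parameter (first, sep) is bound to it; only the leftovers end up in args and kwargs.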