Wilson Mar

The rules shown in samples using Keywords, arguments, Exception Handling, OS commands, Strings, Lists, Sets, Tuples, Files, Timers

Overview

This is Therapy for me

I wrote this because I have a mental block about programming Python. It’s like I’m afraid of snakes.

Maybe it’s fear of not doing well on coding interviews. That’s weird because as an SRE I don’t have a job where I’m programming Python every day. Yet employers make people go through coding challenges anyway like it was a fraternity hazing ritual.

So to get over my Python phobia, like any other aversion therapy, I needed to de-sensitize myself and do the very thing I fear. My notes on the various activities:

References

https://learnpython.com/blog/9-best-python-online-resources-start-learning/

Setup an IDE

The most popular IDEs for Python are:

  • VSCode from Microsoft (free)
  • PyCharm (FREE or PRO $89/$71/$53 year)
  • Cloud9 free on-line on AWS (which automatically generates new credentials every 5 minutes or on browser Reset)

BLOG: Setup VSCode for Python Development https://code.visualstudio.com/docs/editor/extension-marketplace

In an IDE such as VSCode, the debugger shows variables as key/value pairs without print statements in your code, like an X-ray machine:

  1. Click next to a line number at the left to set a Breakpoint.
  2. Click “RUN AND DEBUG” to see variables: Locals and Globals.
  3. To expand and contract, click “>” and “V” in front of items.
  4. “special variables” are dunder (double underscore) variables.
  5. Under each item are its “function variables” and special variables of its own. For a list, the function variables are append, clear, copy, etc.
  6. Under Globals are the module's special variables (such as __file__ for the file path of the program) and class variables, plus an entry for each module or class in the code (such as unittest).

Scan for vulnerable code

https://github.com/PyCQA/bandit

Reserved Keywords

Listed alphabetically below are words that Python’s reserved for itself, so you can’t use them as custom variables.

PROTIP: Research and find out what each is about:

  • and
  • as
  • assert
  • async
  • await
  • break - force escape from for/while loop
  • class
  • continue - force loop again next iteration
  • def - define function
  • del list1[2] # delete 3rd list item
  • elif - else if
  • else
  • except
  • False - boolean
  • finally - of a try
  • for
  • from
  • global
  • if
  • import
  • in
  • is
  • lambda - define a small anonymous function in one line
  • None
  • nonlocal
  • not
  • or
  • pass - instruction to do nothing (instead of return or yield with value)
  • raise NotImplementedError() throws an exception purposely
  • return
  • True - Boolean
  • try - https://www.youtube.com/watch?v=NIWwJbo-9_8
  • while
  • with
  • yield - resumes after returning a value back to the caller to produce a series of values over time.

The list above can be retrieved (as a Python list) by this code after typing python for the REPL (Read Evaluate Print Loop) interactive prompt:

python
>>> import keyword
>>> keyword.kwlist
>>> exit()

Press control+D to exit anytime.

Use Not None Reserved Word

Returning 0 on error can be confused with the number 0 as a valid response.

To avoid the confusion, return the Python reserved word “None”:

result = safe_square_root(4)
if result is not None:   # happy path:
    value = result.pop()  # pop the answer off the stack.
    print(value)
else:  # result is None, which cannot be confused with a valid 0.
    # the error was handled inside the function,
    # so the calling code needs no try/except of its own:
    print("unable to compute square root")

Function (define it before the calling code above):

import math

def safe_square_root(x):
    try:
        return [math.sqrt(x)]   # wrapped in a list used as a stack.
    except ValueError:          # raised for negative numbers.
        return None   # using reserved word.

The parameter (x) is what is declared going into the function.

The value passed through when calling the function is called an argument.

Operators

Floor division Operators

11 // 5 uses “floor division” to return just the integer (integral part) of 2, discarding the remainder. This can be useful to efficiently solve the “Prefix Sums CountDiv” coding interview challenge: “Write a function … that, given three integers A, B and K, returns the number of integers within the range [A..B] that are divisible by K”:

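def solution(a, b, k):
    # multiples of k in [0..b] minus multiples of k below a.
    # No special case is needed for b == 0: zero itself is divisible by k.
    return b // k - (a - 1) // k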

Instead of a “brute force” approach which has linear time complexity — O(n), the solution using floor division is constant time - O(1).
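The difference between the two approaches can be checked with a small sketch. The function names count_div_brute and count_div_fast are made up for this illustration, and the code assumes K > 0 and 0 ≤ A ≤ B, as the challenge states:

```python
def count_div_brute(a, b, k):
    """O(n): test every integer in [a..b]."""
    return sum(1 for n in range(a, b + 1) if n % k == 0)


def count_div_fast(a, b, k):
    """O(1): multiples of k up to b, minus multiples of k below a."""
    return b // k - (a - 1) // k


print(count_div_fast(6, 11, 2))    # 3 (the numbers 6, 8 and 10)
print(count_div_brute(6, 11, 2))   # 3
```

Both return the same counts, but the brute-force version slows down as the range widens while the floor-division version does not.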

Modulo operator

11 % 5 uses the percent sign, the modulo operator, to divide 11 by the divisor 5 and return 1, because two 5s go into 11, leaving 1 left over: the remainder. Modulus is used in circular buffers and hashing algorithms.

def solution(A, K):
    # A is the array.
    # K is the increment to move.
    result = [None] * len(A)   # initialize result array for # items in array
 
    for i in range(len(A)):
        # Use % modulo operator to wrap the new index position around to 0 .. len(A)-1:
        result[(i + K) % len(A)] = A[i]   
        print(f'i={i} A[i]={A[i]} K={K} result={result} ')
    return result
 
print(solution([7, 2, 8, 3, 5], 2))

Duration calculations

There are several ways to present date and time. The ISO 8601 format is: 2022-02-22T07:53:19.051615-05:00

There are several ways to capture how long a particular function or the whole program took to run.

To time the difference between calculation strategies, Python offers several clocks. New in Python 3.7, PEP 564 added nanosecond variants that return integers, such as time.time_ns() and time.perf_counter_ns().

time.perf_counter() (abbreviation of performance counter) is meant for measuring short durations because it uses the clock with the highest available resolution (82 nanoseconds was measured on Fedora with Linux kernel 4.12). It is based on Wall-Clock Time which includes time elapsed during sleep and is system-wide. The reference point of the returned value is undefined, so that only the difference between the results of consecutive calls is valid. See https://docs.python.org/3/library/time.html#time.perf_counter
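A minimal sketch of timing one function call with time.perf_counter(). The helper name timed is hypothetical, and summing a range is just a stand-in for real work:

```python
import time


def timed(func, *args):
    """Return (result, elapsed_seconds) for one call of func(*args)."""
    start = time.perf_counter()
    result = func(*args)
    elapsed = time.perf_counter() - start   # only the difference is meaningful
    return result, elapsed


total, seconds = timed(sum, range(1_000_000))
print(f"sum={total} took {seconds:.6f} seconds")
```

The elapsed value varies from run to run, so compare strategies by repeating each measurement several times.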

time.clock is no longer available since Python 3.8.

time.time() returns the number of seconds since the epoch as a floating point number. In a measurement period between start and stop times, if the system clock is changed (manually or by a time synchronization service), the measurement is disrupted. time.time() resolution will only become larger (worse) as years pass since every day adds 86,400,000,000,000 nanoseconds to the system clock, which increases the precision loss of the float. It is called “non-monotonic” because setting the system clock back would cause it to report time going backwards:

    import time
    start_time = time.time()
    # your code
    e = int(time.time() - start_time)   # whole seconds, because {:02d} rejects floats
    print(time.strftime("%H:%M:%S", time.gmtime(e)))  # for hours:minutes:seconds
    print('{:02d}:{:02d}:{:02d}'.format(e // 3600, (e % 3600 // 60), e % 60))
    

timeit.default_timer() combined with datetime.timedelta provides a nice output format of 0:00:01.946339 for almost 2 seconds. See https://docs.python.org/3/library/timeit.html and https://www.guru99.com/timeit-python-examples.html

    from timeit import default_timer as timer
    from datetime import timedelta
    start = timer()
    # do some stuff ...
    end = timer()
    print(timedelta(seconds=end-start))
    

PEP-418 in Python 3.3 added three timers:

time.process_time() offers 1 nano-second resolution on Linux 4.12. It does not include time during sleep.

import time
t = time.process_time()
# do some stuff ...
elapsed_time = time.process_time() - t

time.monotonic() is used for measurements on the order of hours/days, when you don’t care about sub-second resolution. It has 81 ns resolution on Fedora 4.12. BTW “monotonic” = only goes forward. See https://docs.python.org/3/library/time.html#time.monotonic

datetime.datetime.now() provides microsecond precision:

    import datetime
    start = datetime.datetime.now()
    # do some stuff ...
    end = datetime.datetime.now()
    elapsed = end - start
    print(elapsed)
    # or
    print(elapsed.seconds,":",elapsed.microseconds) 
    

References:

  • https://stackoverflow.com/questions/7370801/how-to-measure-elapsed-time-in-python
  • https://stackoverflow.com/questions/3620943/measuring-elapsed-time-with-the-time-module/47637891#47637891
  • https://docs.python.org/3/library/datetime.html#strftime-and-strptime-format-codes
  • https://www.codingeek.com/tutorials/python/datetime-strftime/
  • For the time a file was created (on macOS and some other systems), use the .st_birthtime attribute of the result of a call to os.stat().

Timezone handling

NOTE: On macOS, timezone data are in a binary file at /etc/localtime.

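Once a datetime has a tzinfo (it is “aware”), its astimezone() method returns a new datetime in the time zone you pass in, adjusting the date and time so that it still represents the same instant.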
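A minimal sketch of astimezone(), using a fixed UTC-5 offset so that no time zone database is needed. The name eastern and the sample date are chosen only for illustration; a fixed offset does not follow daylight saving rules:

```python
from datetime import datetime, timedelta, timezone

eastern = timezone(timedelta(hours=-5), "EST")   # fixed offset, an assumption for this example
utc_time = datetime(2022, 2, 22, 12, 53, 19, tzinfo=timezone.utc)
local_time = utc_time.astimezone(eastern)

print(local_time.isoformat())   # 2022-02-22T07:53:19-05:00
print(local_time == utc_time)   # True: same instant, different wall-clock reading
```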

Timing Attacks

Timing Attacks make malicious use of precise (microsecond) timing: the attacker measures how long an application takes to check a password or signature, and uses the differences to infer the secret value piece by piece. In the case of the Keyczar vulnerability found by Nate Lawson, a simple break-on-inequality algorithm was used to compare a candidate HMAC digest with the calculated digest. A value which shares no bytes in common with the secret digest returns immediately; a value which shares the first 15 bytes will return 15 compares later.

PROTIP: Use the secrets.compare_digest() function (the secrets module was introduced in Python 3.6) to check passwords and other private values. Its running time does not depend on how many leading characters match.

Functions hmac.compare_digest() and secrets.compare_digest() are designed to mitigate against timing attacks.
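A minimal sketch of checking an HMAC signature with hmac.compare_digest() rather than ==. The key and the helper names sign and is_valid are made up for this example:

```python
import hashlib
import hmac

SECRET_KEY = b"hypothetical-secret-key"   # made up for this example


def sign(message: bytes) -> str:
    """Return the hex HMAC-SHA256 signature of message."""
    return hmac.new(SECRET_KEY, message, hashlib.sha256).hexdigest()


def is_valid(message: bytes, candidate: str) -> bool:
    expected = sign(message)
    # Unlike ==, compare_digest does not stop at the first mismatched character.
    return hmac.compare_digest(expected, candidate)


good = sign(b"hello")
print(is_valid(b"hello", good))       # True
print(is_valid(b"hello", "0" * 64))   # False
```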

http://pypi.python.org/pypi/profilehooks

Depth-First Search (DFS) uses a stack, whereas Breadth-First Search (BFS) uses a queue.
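The stack versus queue difference can be sketched on a small graph. GRAPH is a hypothetical adjacency list, and the two functions differ only in which end of the pending collection they take the next node from:

```python
from collections import deque

GRAPH = {            # hypothetical graph, as an adjacency list
    "A": ["B", "C"],
    "B": ["D"],
    "C": ["D"],
    "D": [],
}


def dfs(graph, start):
    visited, stack = [], [start]
    while stack:
        node = stack.pop()                 # LIFO: take the newest node
        if node not in visited:
            visited.append(node)
            stack.extend(reversed(graph[node]))
    return visited


def bfs(graph, start):
    visited, queue = [], deque([start])
    while queue:
        node = queue.popleft()             # FIFO: take the oldest node
        if node not in visited:
            visited.append(node)
            queue.extend(graph[node])
    return visited


print(dfs(GRAPH, "A"))   # ['A', 'B', 'D', 'C']
print(bfs(GRAPH, "A"))   # ['A', 'B', 'C', 'D']
```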

Time Complexity

Time complexity analysis estimates how long it will take for an algorithm to complete its assigned job based on its structure.

The array rotation using Modulus above results in “O(n)” (linear) growth in time to run as the dataset grows, because it visits every element once.

Searches of balanced trees grow more slowly, with logarithmic, O(log n), Time Complexity:

[Figure: chart of Time Complexity growth rates]

See https://bigocheatsheet.com for a list of Big O values for sorting algorithms.

Sorting

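To swap values, here’s a straight-forward function using tuple unpacking:

def swap1(var1,var2):
    var1,var2 = var2,var1
    return var1, var2
>>> swap1(10,20)
(20, 10)

For integers, the swap can also be done without tuple unpacking, using the bitwise XOR operator:

def swap2(x,y):
    x = x ^ y
    y = x ^ y
    x = x ^ y
    return x, y
>>> swap2(10,20)
(20, 10)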

Reduce Time Complexity with Dynamic programming

Calculations with nested loops or repeated recursion are often used to show how to reduce run times with techniques that use more memory space, rather than “brute-force” repetitive computations. An example is the calculation of Fibonacci numbers, where each number is by definition the sum of the two numbers preceding it:

    fib(5) = fib(4) + fib(3)

Memoization (sounds like memorization) is the technique of writing a function that remembers the results of previous computations.
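As a minimal sketch, the standard library's functools.lru_cache decorator can do the remembering. This version assumes the common convention that fib(0) is 0 and fib(1) is 1:

```python
from functools import lru_cache


@lru_cache(maxsize=None)
def fib(n):
    """Fibonacci with fib(0) = 0 and fib(1) = 1."""
    if n < 2:
        return n
    return fib(n - 1) + fib(n - 2)   # each sub-result is computed once, then cached


print(fib(5))    # 5
print(fib(50))   # 12586269025
```

Without the cache, fib(50) would repeat the same sub-calls billions of times; with it, each value from 0 to 50 is computed once.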

Longest Increasing Subsequence (LIS)

Memoization is a technique of “Dynamic Programming” (See https://www.wikiwand.com/en/Dynamic_programming)

Dynamic programming is a catch phrase for solutions based on solving successively similar but smaller problems, using algorithmic tasks in which the solution of a bigger problem is relatively easy to find, if we have solutions for its sub-problems.

Making change
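Making change is a classic example: the fewest coins for an amount is found from the answers already worked out for smaller amounts. This is a minimal sketch, with a made-up function name and US coin values assumed for the sample call:

```python
def min_coins(amount, coins):
    """Fewest coins that add up to amount, or None if it cannot be made."""
    best = [0] + [None] * amount        # best[a] = fewest coins for amount a
    for a in range(1, amount + 1):
        # Reuse the solved sub-problems best[a - c] for every usable coin c.
        options = [best[a - c] for c in coins
                   if c <= a and best[a - c] is not None]
        if options:
            best[a] = min(options) + 1
    return best[amount]


print(min_coins(63, [1, 5, 10, 25]))   # 6 (25 + 25 + 10 + 1 + 1 + 1)
print(min_coins(3, [2]))               # None
```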


Built-in Methods/Functions

https://docs.python.org/3/library/functions.html

  • abs()
  • any()
  • all()
  • ascii()
  • bin()
  • bool()
  • bytearray()
  • callable()
  • bytes()
  • chr()
  • compile()
  • classmethod()
  • complex()
  • delattr()
  • dict()
  • dir()
  • divmod()
  • enumerate()
  • staticmethod()
  • filter()
  • eval()
  • float()
  • format()
  • frozenset()
  • getattr()
  • globals()
  • exec()
  • hasattr()
  • help()
  • hex()
  • hash()
  • input()
  • id()
  • isinstance() - checks if the object (first argument) is an instance or subclass of classinfo class (second argument). True/False
  • int()
  • issubclass()
  • iter()
  • list()
  • locals()
  • len([1, 2, 3]) is 3.
  • max()
  • min()
  • map()
  • next()
  • memoryview()
  • object()
  • oct()
  • ord()
  • open()
  • pow()
  • print()
  • property()
  • range()
  • repr()
  • reversed()
  • round()
  • set()
  • setattr()
  • slice() - creates a slice object, used to extract part of a sequence such as a substring
  • sorted()
  • str()
  • sum()
  • tuple()
  • type()
  • vars()
  • zip() - combine two or more iterables item by item
  • __import__()
  • super()

class functions

using .maketrans() and .translate()

  • a.find('a') returns the lowest index where 'a' is found in the string a, or -1 if it is not found.

if/then/else

Avoid divide by zero errors

Use this in every division to ensure that a zero denominator results in falling into “else 0” rather than a “ZeroDivisionError” at run-time:

def weird_division(n, d):
    # n=numerator, d=denominator.
    return n / d if d else 0

Environment Variables

To read a file named “.env” at the $HOME folder, and obtain the value from “MY_EMAIL”:

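import os
# read KEY=value lines from the .env file in the $HOME folder:
with open(os.path.expanduser('~/.env')) as env_file:
    for line in env_file:
        line = line.strip()
        if line and not line.startswith('#') and '=' in line:
            key, value = line.split('=', 1)   # split at the first "=" only
            os.environ[key] = value
 
print(os.environ.get('MY_EMAIL'))   # containing "johndoe@gmail.com"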

This code is important because it keeps secrets in your $HOME folder, away from folders that get pushed up to GitHub.

There is the “python-dotenv” package, whose load_dotenv() function can do the above, but using native commands means less exposure to potential attacks.

Remember that attackers can use directory traversal sequences (../) to fetch the sensitive files from the server.

Sanitize user input before it is passed to a shell command by using shlex.quote().
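A minimal sketch of what shlex.quote() does to a hostile string. The file name and the injected command are hypothetical, and shlex.quote() is designed for POSIX shells only:

```python
import shlex

user_input = "notes.txt; rm -rf ~"        # hypothetical hostile input
command = "cat " + shlex.quote(user_input)
print(command)   # cat 'notes.txt; rm -rf ~'

# The shell now sees one (odd) file name instead of two commands.
# Safer still: pass a list and skip the shell entirely, for example
# subprocess.run(["cat", user_input]) with the default shell=False.
```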


Blob vs. File vs. Text

A “BLOB” (Binary Large OBject) is a data type that stores binary data such as mp4 videos, mp3 audio, pictures, pdf. So usually large, up to 2 GB (2,147,483,647 bytes) in some databases.

https://github.com/googleapis/google-cloud-python/issues/1216

https://towardsdatascience.com/image-processing-blob-detection-204dc6428dd

Azure storage

https://github.com/yokawasa/azure-functions-python-samples

https://chriskingdon.com/2020/11/24/the-definitive-guide-to-azure-functions-in-python-part-1/

https://chriskingdon.com/2020/11/30/the-definitive-guide-to-azure-functions-in-python-part-2-unit-testing/

https://github.com/Azure/azure-storage-python/blob/master/tests/blob/test_blob_storage_account.py

https://docs.microsoft.com/en-us/azure/storage/blobs/storage-quickstart-blobs-python

Azure Blobs

NOTE: Version 12 of azure-storage-blob replaces the older BlockBlobService class with BlobServiceClient, ContainerClient, and BlobClient.

VIDEO: https://pypi.org/project/azure-storage-blob/

https://www.educative.io/edpresso/how-to-download-files-from-azure-blob-storage-using-python

https://github.com/Azure/azure-sdk-for-python/issues/12744 exists() new feature

import asyncio
from azure.storage.blob.aio import BlobClient

async def check():
    blob = BlobClient.from_connection_string(conn_str="my_connection_string", container_name="mycontainer", blob_name="myblob")
    async with blob:
        exists = await blob.exists()
        print(exists)

asyncio.run(check())   # a coroutine does nothing until it is run

Azure Streams

https://blog.siliconvalve.com/2020/10/29/reading-and-writing-binary-files-with-python-with-azure-functions-input-and-output-bindings/ Reading and writing binary files with Python with Azure Functions input and output bindings

GCP

https://gcloud.readthedocs.io/en/latest/storage-blobs.html

https://cloud.google.com/appengine/docs/standard/python/blobstore

OpenCV

https://learnopencv.com/blob-detection-using-opencv-python-c/

Scikit-Image

https://towardsdatascience.com/image-processing-with-python-blob-detection-using-scikit-image-5df9a8380ade

GIS

https://gsp.humboldt.edu/olm/Courses/GSP_318/11_B_91_Blob.html


String Handling

Regular Expressions

  • https://www.tutorialspoint.com/python/python_reg_expressions.htm
  • https://www.udemy.com/course/python-quiz/learn/quiz/4649042#overview within quiz

Handle Strings safely

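Python has four different ways to format strings: %-formatting, str.format(), f-strings, and Template strings.

Using str.format() with a (potentially malicious) user-supplied format string can be exploited, because a format string can reach into the attributes of the objects passed to it. Template strings are the safer choice:

from string import Template
...
greeting_template = Template("Hello World, my name is $name.")
greeting = greeting_template.substitute(name="Hayley")
   

So for strings supplied by users, use a way that’s less flexible with types and doesn’t evaluate Python expressions.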

Slicing

Python 3 strings are Unicode, so slicing counts characters rather than bytes. That makes it work the same with other alphabets such as the Cyrillic (Russian) character set. To return just the first 3 characters of a string:

letters = "abcdef"
first_part = letters[:3]
   

Unicode Superscript & Subscript characters

# Specify Unicode characters:
# superscript
print("x\u00b2 + y\u00b2 = 2")  # x² + y² = 2
 
# subscript
print(u'H\u2082SO\u2084')  # H₂SO₄

Superscript

# super-sub-script.py converts to superscript:
def conv_superscript(x):
    normal = "ABCDEFGHIJKLMNOPQRSTUVWXYZabcdefghijklmnopqrstuvwxyz0123456789+-=()"
    super_s = "ᴬᴮᶜᴰᴱᶠᴳᴴᴵᴶᴷᴸᴹᴺᴼᴾᴾᴿˢᵀᵁⱽᵂˣʸᶻᵃᵇᶜᵈᵉᶠᵍʰᶦʲᵏˡᵐⁿᵒᵖ۹ʳˢᵗᵘᵛʷˣʸᶻ⁰¹²³⁴⁵⁶⁷⁸⁹⁺⁻⁼⁽⁾"
    res = str.maketrans(normal, super_s)  # both strings must be the same length
    return x.translate(res)
 
print(conv_superscript('Convert all this2'))
# Or you can simply copy the text

Functions

Internationalization & Localization (I18N & L10N)

Internationalization, aka i18n for the 18 characters between i and n, is the process of adapting coding to support various linguistic and cultural settings:

  • date and time zone calculations
  • numbers and currency
  • Pluralization

  1. Install

    Nothing needs to be installed with pip for this: the gettext module is part of the Python standard library.

    NOTE: pip is a recursive acronym that stands for either “Pip Installs Packages” or “Pip Installs Python”.

  2. Create a folder for each locale in the ./locales folder:

    locales/
    ├── el
    │   └── LC_MESSAGES
    │       └── base.po
    └── en
        └── LC_MESSAGES
            └── base.po

  3. Use Lokalise utility to manage translations through a GUI. It also has a CLI tool to automate the process of managing translations. https://lokalise.com/blog/lokalise-apiv2-in-practice/
    
  4. Add the library

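    import gettext
    # Set the locale directory
    localedir = './locales'
    # Set up your magic function. The domain 'base' matches the base.po files,
    # which must be compiled to base.mo (for example with msgfmt) to be found.
    translate = gettext.translation('base', localedir, fallback=True)
    _ = translate.gettext
    # Translate message
    print(_("Hello World"))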
    

    See https://phrase.com/blog/posts/translate-python-gnu-gettext/

  5. Store the master list of translatable strings in a Portable Object Template (POT) file. Each locale's .po file starts as a copy of it, with the msgstr entries filled in by a translator:

    #: src/main.py:12
    msgid "Hello World"
    msgstr "Translation in different language"
    
    In Python 3, every str is Unicode, and encoding it produces bytes:

    >>> unicode_string = "Fu\u00dfb\u00e4lle"
    >>> unicode_string
    'Fußbälle'
    >>> type(unicode_string)
    <class 'str'>
    >>> utf8_string = unicode_string.encode("utf-8")
    >>> utf8_string
    b'Fu\xc3\x9fb\xc3\xa4lle'
    >>> type(utf8_string)
    <class 'bytes'>

ALTERNATIVE: Babel formats numbers for a locale. See http://babel.pocoo.org/en/latest/numbers.html

Requires: pip install Babel

from babel import numbers

numbers.format_decimal(.2345, locale='en_US')

Internationalization of dates: http://babel.pocoo.org/en/latest/dates.html

from babel import Locale

NOTE: Babel generally recommends storing times as naive datetime objects and treating them as UTC.

from datetime import date
from babel.dates import format_date, format_datetime, format_time

d = date(2007, 4, 1)

format_date(d, locale='en') # 'Apr 1, 2007'

format_date(d, locale='de_DE') # '01.04.2007'

Switch language in browsers

Ensure that your program works correctly when another human language (such as “es” for Spanish, “ko” for Korean, “de” for German, etc.) is configured by the user:

A. English was selected in browser’s Preferences, but the app displays another language.

B. Another language was selected in browser’s preferences, and the app displays that language.

To simulate selecting another language in the browser’s Preferences in Firefox, using Selenium WebDriver (shown here in Java):

FirefoxOptions options = new FirefoxOptions();
options.addPreference("intl.accept_languages", language);
driver = new FirefoxDriver(options);

Alternately, in Chrome:

HashMap<String, Object> chromePrefs = new HashMap<String, Object>();
chromePrefs.put("intl.accept_languages", language);
ChromeOptions options = new ChromeOptions();
options.setExperimentalOption("prefs", chromePrefs);
driver = new ChromeDriver(options);

Excel handling using Dictionary object

The XlsxWriter Python library for working with Excel spreadsheets has utility functions that translate between Excel cell addresses (such as “A1”) and zero-based Python (row, column) tuples:

from xlsxwriter.utility import xl_rowcol_to_cell, xl_cell_to_rowcol, xl_col_to_name
cell = xl_rowcol_to_cell(0, 0, row_abs=True, col_abs=True)  # $A$1
(row, col) = xl_cell_to_rowcol('A1')    # (0, 0)
column = xl_col_to_name(1, True)   # $B

However, if you want to avoid adding a dependency, this function defines a dictionary to convert Excel column letters (up to three of them) to a number:

def letter_to_number(letters):
    letters = letters.lower()
    dictionary = {'a':1,'b':2,'c':3,'d':4,'e':5,'f':6,'g':7,'h':8,'i':9,'j':10,'k':11,'l':12,'m':13,'n':14,'o':15,'p':16,'q':17,'r':18,'s':19,'t':20,'u':21,'v':22,'w':23,'x':24,'y':25,'z':26}
    strlen = len(letters)
    if strlen == 1:
        number = dictionary[letters]
    elif strlen == 2:
        first_letter = letters[0]
        first_number = dictionary[first_letter]
        second_letter = letters[1]
        second_number = dictionary[second_letter]
        number = (first_number * 26) + second_number
    elif strlen == 3:
        first_letter = letters[0]
        first_number = dictionary[first_letter]
        second_letter = letters[1]
        second_number = dictionary[second_letter]
        third_letter = letters[2]
        third_number = dictionary[third_letter]
        number = (first_number * 26 * 26) + (second_number * 26) + third_number
    return number

REMEMBER: Square brackets are used to look up a dictionary value by its key.

Instead of defining a dictionary, you can use a property of the ASCII character set, in that the Latin alphabet begins from its 65th position for “A” and its 97th character for “a”, obtained using the ordinal function:

ord('a')  # returns 97
ord('A')  # returns 65

This returns ‘a’ :

chr(97)
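With ord(), the column conversion needs no dictionary and no limit on the number of letters. This is a minimal sketch that treats the letters as digits in base 26; the function name column_to_number is made up for this example:

```python
def column_to_number(letters):
    """Convert Excel column letters to a 1-based number: A=1, Z=26, AA=27."""
    number = 0
    for ch in letters.upper():
        if not "A" <= ch <= "Z":
            raise ValueError(f"not a column letter: {ch!r}")
        number = number * 26 + (ord(ch) - ord("A") + 1)
    return number


print(column_to_number("A"))     # 1
print(column_to_number("AA"))    # 27
print(column_to_number("XFD"))   # 16384
```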

More dictionaries:

# Eastern European countries and their populations:
ee_countries={"Ukraine": "43.7M", "Russia": "143.8M", "Poland": "38.1M", "Romania": "19.5M", "Bulgaria": "6.9M", "Hungary": "9.6M", "Moldova": "4.1M"}
float(ee_countries["Moldova"].rstrip("M"))  # 4.1
ee_countries.get("Moldova")   # '4.1M'
len(ee_countries.items())     # 7 key/value pairs in dictionary
min(ee_countries.items())     # ('Bulgaria', '6.9M') the first key alphabetically, not the smallest country
max(ee_countries.values())  # '9.6M' because strings compare character by character, not numerically
max(ee_countries.keys())    # 'Ukraine' the last key alphabetically, not the longest
max(ee_countries, key=lambda k: float(ee_countries[k].rstrip("M")))  # 'Russia' the largest country
sorted(ee_countries.keys(),reverse=True) # ['Ukraine', 'Russia', 'Romania', 'Poland', 'Moldova', 'Hungary', 'Bulgaria']
 
ee_countries.pop("Estonia", None)  # no error if absent, unlike del ee_countries["Estonia"]
ee_countries.pop("Bulgaria")       # removes and returns '6.9M'
ee_countries["Latvia"] = "1.9M"
ee_countries.update([['Lithuania', '2.8M'],['Belarus' , '9.4M']])
ee_countries.popitem()     # remove item last added: ('Belarus', '9.4M')
len(ee_countries.items())  # 8
ee_countries["Bulgaria"]="7M"
 
ee2=ee_countries.copy()
ee_countries.clear()  # remove all
print(ee_countries)   # {} means empty 

https://www.codesansar.com/python-programming-examples/sorting-dictionary-value.htm

File open() modes

The Python runtime does not enforce type annotations introduced with Python version 3.5. But type checkers, IDEs, linters, SASTs, and other tools can benefit from the developer being more explicit.

A type checker such as mypy can use this annotation to discover when the parameter is outside the allowed set and warn you:

from typing import Literal

MODE = Literal['r', 'rb', 'w', 'wb']
def open_helper(file: str, mode: MODE) -> str:
    ...

open_helper('/some/path', 'r')  # Passes type check
open_helper('/other/path', 'typo')  # Error in type checker

BTW Literal[…] was introduced with version 3.8 and is not enforced by the runtime (you can pass whatever string you want in our example).

PROTIP: Be explicit about using text (vs. binary) mode.

with open("D:\\myfile.txt", "w") as myfile:
    myfile.write("Hello")

Character   Meaning
b           binary mode
t           text mode (default)
r           read-only (the default)
+           open for updating (read and write)
w           write-only after truncating the file
a           append
a+          opens a file for both appending and reading at the same time
x           open for exclusive creation, failing if file already exists
U           universal newlines mode (used to upgrade older code; removed in Python 3.11)

myfile.write() returns the count of codepoints (characters in the string), not the number of bytes.

myfile.read(3) returns up to 3 characters from the current position. A line ending (\n) counts as one character.

myfile.readlines() returns a list where each element of the list is a line in the file.

myfile.truncate(12) keeps the first 12 characters in the file and deletes the remainder of the file.

myfile.close() flushes pending writes to disk and releases the file. A “with” block calls it automatically.

myfile.tell() tells the current position of the cursor.
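The file methods above can be tried together in a minimal sketch. It writes to a throwaway file in a temporary folder (the path is generated, not a real project file) and assumes plain ASCII text:

```python
import os
import tempfile

path = os.path.join(tempfile.mkdtemp(), "myfile.txt")   # throwaway file

with open(path, "w", encoding="utf-8") as myfile:
    written = myfile.write("Hello\nWorld\n")     # 12 characters written

with open(path, "r+", encoding="utf-8") as myfile:
    first3 = myfile.read(3)                      # 'Hel'
    myfile.seek(0)                               # back to the start of the file
    lines = myfile.readlines()                   # ['Hello\n', 'World\n']
    myfile.truncate(5)                           # keep only 'Hello'

with open(path, encoding="utf-8") as myfile:
    remaining = myfile.read()                    # 'Hello'

print(written, first3, lines, remaining)
```

Each “with” block closes the file on exit, so no explicit close() call is needed.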

File Copy commands

The shutil package provides fine-grained control for copying files:

    import shutil

This table summarizes the differences among shutil commands:

                     Dest. can be dir.  Copies metadata  Preserves permissions  Accepts file object
shutil.copyfile      -                  -                -                      -
shutil.copyfileobj   -                  -                -                      Yes
shutil.copy          Yes                -                Yes                    -
shutil.copy2         Yes                Yes              Yes                    -

See https://docs.python.org/3/library/filesys.html

File Metadata

Metadata includes Last modified and Last accessed info (mtime and atime). Such information is maintained by the file system for each file and folder.

For all commands, if the destination location is not writable, an OSError exception is raised (IOError is an alias of OSError in Python 3).

  • To copy a file's contents to a destination file name (dst must be a complete file name, not a folder):

    shutil.copyfile(src, dst)

  • To copy between file-like objects that are already open (anything with a read or write method, such as StringIO), in buffered chunks:

    shutil.copyfileobj(fsrc, fdst)

Notice both individual file copy commands above do not copy over permissions from the source file. The folder-level copy commands shutil.copy() and shutil.copy2() accept a folder as the destination and carry over permissions.

CAUTION: folder-level copy commands take paths rather than open file objects, so they give you no control over buffering.

  • To copy a file's contents and NOT retain metadata, using open file objects:

    file_src = 'source.txt'
    file_dest = 'destination.txt'
    with open(file_src, 'rb') as f_src, open(file_dest, 'wb') as f_dest:
        shutil.copyfileobj(f_src, f_dest)

    The destination needs to specify a full path.

  • PROTIP: To copy a file to another folder and retain metadata:

    shutil.copy2(src, "/usr", follow_symlinks=True)
  • You can use the operating system shell copy command, but there is the overhead of opening a pipe, system shell, or subprocess, and it poses a potential security risk.

    # In Unix/Linux
    os.system('cp source.txt destination.txt')  # https://docs.python.org/3/library/os.html#os.system
    status = subprocess.call('cp source.txt destination.txt', shell=True) 
     
    # In Windows
    os.system('copy source.txt destination.txt')
    status = subprocess.call('copy source.txt destination.txt', shell=True)  # https://docs.python.org/3/library/subprocess.html
    
  • Opening a pipe with os.popen() is discouraged in favor of the subprocess module, which it is built on. https://docs.python.org/3/library/os.html#os.popen

    # In Unix/Linux
    os.popen('cp source.txt destination.txt')
     
    # In Windows
    os.popen('copy source.txt destination.txt')
    

Error Exception handling

Handle the exception raised when a folder is not found:

# create a directory inside the folder at path, then return to the original folder:
import os
import sys
 
def make_at(path, dir_name):
    original_path = os.getcwd()
    try:
        os.chdir(path)
        os.mkdir(dir_name)
    except OSError as e:
        print(e, file=sys.stderr)
        raise
    finally:  #clean-up no matter what:
        os.chdir(original_path)

Operating system

There are platform-specific modules:

  • Windows msvcrt (Visual C run-time)
  • macOS and Linux: tty, termios, etc.

To determine what operating system is in use (for example, to choose how to wait for a keypress), use sys.platform, which has finer granularity than os.name:

    # https://docs.python.org/library/sys.html#sys.platform
    from sys import platform
    if platform.startswith("linux"):
        os_name = "Linux"
    elif platform == "darwin":
        os_name = "macOS"
    elif platform == "win32":
        os_name = "Windows"
    elif platform == "cygwin":
        os_name = "Windows running cygwin Linux emulator"
       

http://code.google.com/p/psutil/ to do more in-depth research.

PROTIP: This is an example of Python code issuing a Linux operating system command:

import subprocess
result = subprocess.run(["which", "python3"], capture_output=True, text=True)
if result.stdout.find("venv") == -1:
    print("not executed from venv")

SECURITY PROTIP: Avoid using the built-in Python function “eval” to execute a string. There are no controls on that operation, allowing malicious code to be executed without limits in the context of the user that loaded the interpreter (really dangerous):

    import sys
    import os
    try:
        eval("__import__('os').system('clear')", {})
        #eval("__import__('os').system('cls')", {})
        print("Module OS loaded by eval")
    except Exception as e:
        print(repr(e))
       

Command generator

Create custom CLI commands by parsing a command help text into cli code that implements it.

Brilliant.

See docopt from https://github.com/docopt/docopt described at http://docopt.org

CLI code enhancement

Python’s built-in mechanism for coding Command-line menus, etc. is difficult to understand. So some have offered alternatives:

  • cement - CLI Application Framework for Python.
  • click - A package for creating beautiful command line interfaces in a composable way.
  • cliff - A framework for creating command-line programs with multi-level commands.
  • docopt - Pythonic command line arguments parser.
  • python-fire - A library for creating command line interfaces from absolutely any Python object.
  • python-prompt-toolkit - A library for building powerful interactive command lines.

Handling Arguments

For parsing parameters supplied by invoking a Python program, the command-line arguments and options/flags:

    python myprogram.py -v -LOG=info

The argparse package comes with Python 3.2+ (replacing the optparse package that came with Python 2), but some find it difficult to understand and limited in functionality.
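For comparison with the alternatives, here is a minimal argparse sketch. The option names (-v/--verbose and --log) and the list of levels are assumptions made for this example, not taken from a real program:

```python
import argparse


def parse_args(argv=None):
    """Parse argv (or sys.argv[1:] when argv is None)."""
    parser = argparse.ArgumentParser(description="Hypothetical example program")
    parser.add_argument("-v", "--verbose", action="store_true",
                        help="print more detail")
    parser.add_argument("--log", default="warning",
                        choices=["debug", "info", "warning", "error"],
                        help="logging level")
    return parser.parse_args(argv)


args = parse_args(["-v", "--log=info"])
print(args.verbose, args.log)   # True info
```

Passing a list to parse_args() makes the parser easy to test. In a real program, call parse_args() with no argument to read the actual command line.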

https://www.geeksforgeeks.org/argparse-vs-docopt-vs-click-comparing-python-command-line-parsing-libraries/

Alternatives to Argparse are Docopt, Click, Client, argh, and many more.

Instead, Dan Bader recommends the use of the click custom package (from Armin Ronacher). See click.pocoo.org/6/why

Click is a Command Line Interface Creation Kit for arbitrary nesting of commands, automatic help page generation. It supports lazy loading of subcommands at runtime. It comes with common helpers (getting terminal dimensions, ANSI colors, fetching direct keyboard input, screen clearing, finding config paths, launching apps and editors, etc.)

Click provides decorators, which make the code very easy to read.

The “@click.command()” decorator turns a function into a command:

# cli.py
import click
 
@click.command()
def main():
    print("I'm a beautiful CLI ✨")
 
if __name__ == "__main__":
    main()
   

Python in the Cloud

On AWS:

Tutorials:

On Azure:

pip install azure
  • https://docs.microsoft.com/python/azure/
  • https://azure.microsoft.com/resources/samples/?platform=python
  • https://github.com/Azure/azure-sdk-for-python/wiki/Contributing-to-the-tests
  • https://azure.microsoft.com/en-us/support/community/

Sets: Day of week Set handling

set([3,2,3,1,5]) # {1, 2, 3, 5} duplicates removed; sets are unordered

day_of_week_en = ["Sun","Mon","Tue","Wed","Thu","Fri","Sat"]
day_of_week_en.append("Luv")
days_in_week=len(day_of_week_en)
print(f"{days_in_week} days a week" )
print(day_of_week_en)
 
for index, day in enumerate(day_of_week_en):
    print("{0}={1}".format(day, index))

Lists

Use a list instead for a collection of similar objects.

Tuples

A function returns its result as a single value. So to pass multiple values of various types to or from a function, we use a tuple - a fixed-sized collection of related items (akin to a “struct” in C or a “record”).

  1. REMEMBER: When storing a single value in a Tuple, the comma at the end makes it a tuple. Without the comma, (50) is just the integer 50 in parentheses:

    mytuple=(50,) 
    type(mytuple)
    
    <class 'tuple'>
  2. Store several items in a single variable:

    person = ('john', 'doe', 40)
    (a, b, c) = person
    person
    a
    person[0::2]  # every 2nd item, starting from the 1st  =  ('john', 40)
    person.index(40)  # index of item containing 40 = 2
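
To show a tuple carrying several values back from a function, here is a small sketch. The function name min_max_mean is made up for this example:

```python
def min_max_mean(values):
    """Return three related results at once as a tuple."""
    return min(values), max(values), sum(values) / len(values)

result = min_max_mean([3, 1, 4, 1, 5])   # one variable holds the whole tuple
low, high, mean = result                 # or unpack into separate names

print(result)      # (1, 5, 2.8)
print(low, high)   # 1 5
```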
    

Range

myrange=range(3)
type(myrange)
myrange  # range(0, 3)
print(myrange)  # range(0, 3)
list(myrange)   # [0, 1, 2] from zero
myrange=range(1,5)
list(myrange)   # [1, 2, 3, 4] # excluding 5!
myrange=range(3,15,2)
list(myrange)         # [3, 5, 7, 9, 11, 13]  # step of 2
list(myrange)[2]      # 7
print( range(5,15,4)[::-1] )  # range(13, 1, -4)
   

The type(myrange) call above returns <class 'range'>.

List comprehension

squares = [x * x for x in range(10)]

would output:

[0, 1, 4, 9, 16, 25, 36, 49, 64, 81]
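
A comprehension can also filter. This sketch (my own variation on the squares example) keeps only even numbers and checks the result against the equivalent loop:

```python
even_squares = [x * x for x in range(10) if x % 2 == 0]
print(even_squares)   # [0, 4, 16, 36, 64]

# The equivalent loop, for comparison:
result = []
for x in range(10):
    if x % 2 == 0:
        result.append(x * x)
print(result == even_squares)   # True
```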

Classes and Objects

Encapsulation is a software design practice of bundling the data and the methods that operate on that data.

Methods encode behavior (programmed logic) of an object and are represented by functions.

Attributes encode the state of an object and are represented by variables.

MNEMONIC: Scopes: LEGB

  • Local - Inside the current function
  • Enclosing - Inside enclosing functions
  • Global - At the top level of the module
  • Built-in - In the special builtins module
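
The lookup order can be sketched like this. All function and variable names here are made up for illustration:

```python
name = "global"                 # Global: top level of the module

def outer():
    name = "enclosing"          # Enclosing: local to outer(), visible to inner functions

    def inner():
        name = "local"          # Local: inside the current function
        return name

    def inner_no_local():
        return name             # no local name, so the enclosing one is found

    return inner(), inner_no_local()

def use_global():
    return name                 # no local or enclosing name, so the global is found

print(outer())        # ('local', 'enclosing')
print(use_global())   # global
print(len("abc"))     # 3 (len is found last, in the built-in scope)
```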

Metaclasses

metaclasses: 18:50

metaclasses(explained): 40:40

Decorators

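A decorator is the name starting with “@” on the line before a function definition.

Decorators allow changes in a function's behavior without changing the function's own code.

Decorators take advantage of Python being dynamic: functions are objects created at run time, so one function can wrap or replace another.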
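
As a minimal sketch (log_calls and add are hypothetical names, not from any library), a decorator that counts and announces calls without touching the decorated function:

```python
import functools

def log_calls(func):
    """Hypothetical decorator: record each call without editing func itself."""
    @functools.wraps(func)            # keep func's name and docstring
    def wrapper(*args, **kwargs):
        wrapper.calls += 1
        print(f"calling {func.__name__}{args}")
        return func(*args, **kwargs)
    wrapper.calls = 0
    return wrapper

@log_calls                            # same as: add = log_calls(add)
def add(a, b):
    return a + b

print(add(2, 3))      # prints "calling add(2, 3)" and then 5
print(add.calls)      # 1
```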

There are limitations, though.

By default, functions within a class need to supply “self” as the first parameter.

    class MyClass:
       attribute = "class attribute"
       ...
       def afunction(self,text_in):
           self.attribute = text_in  # "cls" is not defined here; this sets the attribute on the instance only
       

VIDEO: However, the @classmethod decorator enables “cls” to be accepted as the first argument:

    @classmethod
    def afunction(cls,text_in):
           cls.attribute = text_in
       

The @classmethod decorator is used for access to the class object, to call other class methods or the constructor.

There is also @staticmethod, for functions that need access to neither the class nor the instance object.
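
Putting the three kinds of method side by side, as a sketch (the method names are my own, chosen for illustration):

```python
class MyClass:
    attribute = "class attribute"

    def set_on_instance(self, text_in):
        self.attribute = text_in        # affects only this instance

    @classmethod
    def set_on_class(cls, text_in):
        cls.attribute = text_in         # affects the class, shared by all instances

    @staticmethod
    def shout(text_in):
        return text_in.upper()          # needs neither self nor cls

a, b = MyClass(), MyClass()
a.set_on_instance("only a")
print(a.attribute, "|", b.attribute)    # only a | class attribute
MyClass.set_on_class("everyone")
print(a.attribute, "|", b.attribute)    # only a | everyone
print(MyClass.shout("hi"))              # HI
```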

Generators

  • VIDEO
  • https://www.youtube.com/watch?v=bD05uGo_sVI
  • https://www.youtube.com/watch?v=vBH6GRJ1REM Python dataclasses will save you HOURS, also featuring attrs

generator: 1:04:30
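
For reference while watching, a minimal generator sketch (countdown is a made-up name). A function containing yield hands back one value at a time instead of building a whole list:

```python
def countdown(n):
    """Yield n, n-1, ..., 1 one value at a time."""
    while n > 0:
        yield n        # pause here and hand back one value
        n -= 1

gen = countdown(3)
print(next(gen))       # 3
print(list(gen))       # [2, 1] (the remaining values)
print(sum(x * x for x in range(4)))   # 14 (a generator expression)
```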

Context Manager

context manager: 1:22:37
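
For reference while watching, a minimal context manager sketch using contextlib from the standard library (tracked and events are made-up names). Code before the yield runs on entering the “with” block, and code after it runs on leaving:

```python
from contextlib import contextmanager

events = []

@contextmanager
def tracked(label):
    events.append(f"enter {label}")      # setup, runs at the start of "with"
    try:
        yield label
    finally:
        events.append(f"exit {label}")   # cleanup, runs even if the body raises

with tracked("job") as name:
    events.append(f"working on {name}")

print(events)   # ['enter job', 'working on job', 'exit job']
```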


https://www.codementor.io/alibabacloud/how-to-create-and-deploy-a-pre-trained-word2vec-deep-learning-rest-api-oekpbfqpj


Secure coding

https://snyk.io/blog/python-security-best-practices-cheat-sheet/

  1. Always sanitize external data

  2. Scan your code

  3. Be careful when downloading packages

  4. Review your dependency licenses

  5. Do not use the system standard version of Python

  6. Use Python’s capability for virtual environments

  7. Set DEBUG = False in production

  8. Be careful with string formatting

  9. (De)serialize very cautiously

  10. Use Python type annotations
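
To illustrate items 1 and 8, here is a sketch using the standard sqlite3 module. The table, rows, and hostile input are all made up for the example. String formatting lets external data rewrite a query, while a placeholder keeps it as plain data:

```python
import sqlite3

conn = sqlite3.connect(":memory:")       # throwaway in-memory database
conn.execute("CREATE TABLE users (name TEXT, role TEXT)")
conn.executemany("INSERT INTO users VALUES (?, ?)",
                 [("alice", "admin"), ("bob", "user")])

external_input = "bob' OR '1'='1"        # hostile value posing as a user name

# Unsafe: string formatting lets the input rewrite the query.
unsafe_sql = f"SELECT name FROM users WHERE name = '{external_input}'"
print(len(conn.execute(unsafe_sql).fetchall()))    # 2 (every row comes back)

# Safer: a "?" placeholder keeps the input as data, never as SQL.
safe_sql = "SELECT name FROM users WHERE name = ?"
print(conn.execute(safe_sql, (external_input,)).fetchall())   # []
```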

Insecure code in Pygoat

https://awesomeopensource.com/project/guardrailsio/awesome-python-security

https://github.com/mpirnat/lets-be-bad-guys from 2017

https://github.com/fportantier/vulpy from 2020 in Brazil

OWASP’s PyGoat is written using Python with the Django web framework. Its code intentionally contains both traditional web application vulnerabilities (e.g. XSS, SQLi) and OWASP vulnerabilities. The top 10 OWASP vulnerabilities (2017 edition, which was still current in 2020) are:

  • A1:2017-Injection
  • A2:2017-Broken Authentication
  • A3:2017-Sensitive Data Exposure
  • A4:2017-XML External Entities (XXE)
  • A5:2017-Broken Access Control
  • A6:2017-Security Misconfiguration
  • A7:2017-Cross-Site Scripting (XSS)
  • A8:2017-Insecure Deserialization
  • A9:2017-Using Components with Known Vulnerabilities
  • A10:2017-Insufficient Logging & Monitoring

Instructions at https://github.com/adeyosemanputra/pygoat

  1. Obtain the Docker image:

    docker pull pygoat/pygoat
    docker run --rm -p 8000:8000 pygoat/pygoat
    
    Watching for file changes with StatReloader
    Performing system checks...
     
    System check identified no issues (0 silenced).
    November 05, 2021 - 14:57:11
    Django version 3.0.14, using settings 'pygoat.settings'
    Starting development server at http://127.0.0.1:8000/
    Quit the server with CONTROL-C.
    
  2. In the browser, open localhost:

    http://127.0.0.1:8000
    

To learn how to code securely, PyGoat has an area where you can see the source code to determine where the mistake was made that caused the vulnerability and allows you to make changes to secure it.

https://owasp.org/www-pdf-archive/OWASP-AppSecEU08-Petukhov.pdf

https://rules.sonarsource.com/python/tag/owasp/RSPEC-4529 3400+ static analysis rules across 27 programming languages

Logging for Monitoring

  • https://github.com/python/cpython/tree/3.6/Lib/logging
  • https://realpython.com/python-logging-source-code/
  • https://infosecwriteups.com/most-common-python-vulnerabilities-and-how-to-avoid-them-5bbd22e2c360
  • https://docs.python.org/3/howto/logging.html#configuring-logging


It is estimated that about 200 days, and often longer, can pass between an attack and its detection by the victim. In the meantime, attackers can tamper with servers, corrupt databases, and steal confidential information.

“Insufficient Logging and Monitoring” is among the OWASP top 10.

The vulnerability includes ineffective integration of security systems, which gives attackers a way to pivot to other parts of the system and maintain persistent threats.

Prevent that by emitting a log entry for each activity such as: add, change/update, delete.

Use the Python logging module:

import logging

To emit each log entry, use the logging method that matches its level, so that logs can be filtered by level. In order of severity, from highest to lowest:

logging.critical("CRITICAL - Can't ... Aborting!") # A serious error. The program itself may be unable to continue running. Displayed even in production runs.
logging.error("ERROR - Program cannot do it!") # A serious problem: the software has not been able to perform some function. Displayed even in production runs.
logging.warning("WARNING - unexpected!")  # The software is still working as expected. But may be a problem in the near future (e.g. ‘disk space low’). 
logging.info("INFO - version xxx")  # Provides confirmation that things are working as expected.
logging.debug('DEBUG - detailed information such as each iteration in a loop used during troubleshooting at the lowest level of detail.')
   

At run-time, specify the lowest severity level to display during that run (messages at that level and above are shown):

python3 pylogging.py --log=INFO
   
  • CRITICAL = 50
  • FATAL = CRITICAL
  • ERROR = 40
  • WARNING = 30
  • WARN = WARNING
  • INFO = 20
  • DEBUG = 10
  • NOTSET = 0
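
Here is a sketch of how a --log option like the one above could be turned into one of these numbers. It follows the getattr() approach from the Python logging HOWTO. The function name configure_logging is my own, and force=True assumes Python 3.8 or later:

```python
import argparse
import logging

def configure_logging(argv):
    """Sketch: map a --log=LEVEL option to a numeric logging level."""
    parser = argparse.ArgumentParser()
    parser.add_argument("--log", default="WARNING")
    args = parser.parse_args(argv)

    numeric_level = getattr(logging, args.log.upper(), None)
    if not isinstance(numeric_level, int):
        raise ValueError(f"Invalid log level: {args.log}")
    # force=True (Python 3.8+) replaces any handlers configured earlier
    logging.basicConfig(level=numeric_level, force=True)
    return numeric_level

level = configure_logging(["--log=INFO"])   # in a script: configure_logging(sys.argv[1:])
print(level)                                # 20
logging.info("shown, because INFO is at or above the configured level")
logging.debug("hidden, because DEBUG is below the configured level")
```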

With the default level, CRITICAL, FATAL, and ERROR messages are always shown, because they rank above WARNING.

WARN (WARNING) is the default verbosity level. Set the default:

    # basicConfig() does nothing once the root logger has a handler, so set everything in one call:
    logging.basicConfig(level=logging.WARNING,
                        format='%(asctime)s %(levelname)s - %(message)s', datefmt='%H:%M:%S')
    #logging.basicConfig(level=logging.DEBUG,filename='example.log')
    

Also, provide a run-time option for outputting to a file:

logging.basicConfig(filename='app.log', filemode='w', format='%(name)s - %(levelname)s - %(message)s')
   

CAUTION: Be careful to not disclose sensitive information in logs. Encrypt plaintext.

The logging module also allows you to capture the full stack traces in an application.
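
A minimal sketch of capturing a stack trace. The logger name stacktrace_demo is made up, and the output goes to an in-memory stream only so it can be inspected:

```python
import io
import logging

stream = io.StringIO()                         # capture output so it can be inspected
logger = logging.getLogger("stacktrace_demo")  # hypothetical logger name
logger.addHandler(logging.StreamHandler(stream))
logger.setLevel(logging.ERROR)
logger.propagate = False                       # keep the demo out of the root logger

try:
    1 / 0
except ZeroDivisionError:
    # logger.exception() logs at ERROR level and appends the traceback;
    # logger.error("...", exc_info=True) does the same.
    logger.exception("Division failed")

print(stream.getvalue())
```

The printed text starts with the message, followed by the “Traceback (most recent call last):” lines for the ZeroDivisionError.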

-q (for -quiet) suppresses INFO headings.

-v (for -verbose) displays DEBUG messages.

-vv to display TRACE messages.
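
One way these flags could be mapped to levels, as a sketch under assumptions. Python's logging module has no built-in TRACE level, so the value 5 is my own choice, and so are the long option names and the function name level_from_flags:

```python
import argparse
import logging

TRACE = 5                                   # assumed value, below DEBUG (10)
logging.addLevelName(TRACE, "TRACE")

def level_from_flags(argv):
    """Translate -q / -v / -vv into a numeric logging level."""
    parser = argparse.ArgumentParser()
    parser.add_argument("-q", "--quiet", action="store_true")
    parser.add_argument("-v", "--verbose", action="count", default=0)
    args = parser.parse_args(argv)
    if args.quiet:
        return logging.WARNING              # suppress INFO
    if args.verbose >= 2:
        return TRACE                        # -vv
    if args.verbose == 1:
        return logging.DEBUG                # -v
    return logging.INFO                     # no flags

print(level_from_flags(["-q"]))    # 30
print(level_from_flags([]))        # 20
print(level_from_flags(["-v"]))    # 10
print(level_from_flags(["-vv"]))   # 5
```

Messages at the custom level can then be emitted with logging.log(TRACE, "...").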

Use assert only during testing

PROTIP: By default, Python executes with __debug__ set to True, so assert statements are processed by the Python interpreter. But in production, when the program is run in optimized mode (python -O), __debug__ is False, so assert statements are ignored.

So avoid code like the sample below, which relies on assert as an access check (the string after the comma is only the message shown when the assertion fails):

def get_clients(user):
    assert is_superuser(user), "user is not a member of superuser group"
    return db.lookup('clients')

In the above code, the assert disappears in optimized mode, so any user ends up with access to a resource that has no working authorization control.

Instead (to remediate), use if-else logic to handle the true and false conditions, and raise an exception when the check fails.
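
A sketch of that remediation. SUPERUSERS and CLIENTS are hypothetical stand-ins for the real group lookup and for db.lookup('clients'):

```python
SUPERUSERS = {"root"}                       # hypothetical stand-in for a real group lookup
CLIENTS = ["acme", "globex"]                # hypothetical stand-in for db.lookup('clients')

def is_superuser(user):
    return user in SUPERUSERS

def get_clients(user):
    # An explicit check still runs under "python -O", unlike assert.
    if not is_superuser(user):
        raise PermissionError(f"{user} is not a member of the superuser group")
    return CLIENTS

print(get_clients("root"))                  # ['acme', 'globex']
try:
    get_clients("guest")
except PermissionError as err:
    print(err)                              # guest is not a member of the superuser group
```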

https://app.pluralsight.com/library/courses/using-unit-testing-python/table-of-contents

Concurrency Programming

https://app.pluralsight.com/library/courses/python-concurrency-getting-started

Bit-wise operators

https://app.pluralsight.com/course-player?clipId=5802d30b-69a9-4679-8594-53854739368a

https://techstudyslack.com/ a Slack for people studying tech

Steganography

https://packetstormsecurity.com/files/165102/Stegano-0.10.1.html Stegano implements two methods of hiding: using the red portion of a pixel to hide ASCII messages, and using the Least Significant Bit (LSB) technique. It is possible to use a more advanced LSB method based on integers sets. The sets (Sieve of Eratosthenes, Fermat, Carmichael numbers, etc.) are used to select the pixels used to hide the information.
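
To show the Least Significant Bit idea itself, here is a plain-Python sketch. It does not use the Stegano API. The hide and reveal functions and the made-up “pixel” values are my own assumptions for illustration:

```python
def hide(pixels, message):
    """Write each bit of message into the lowest bit of successive pixel values."""
    bits = [(byte >> i) & 1 for byte in message.encode("ascii") for i in range(7, -1, -1)]
    if len(bits) > len(pixels):
        raise ValueError("message too long for this cover data")
    out = bytearray(pixels)
    for i, bit in enumerate(bits):
        out[i] = (out[i] & 0b11111110) | bit     # clear the LSB, then set it to the bit
    return out

def reveal(pixels, length):
    """Read length characters back from the lowest bits."""
    chars = []
    for c in range(length):
        byte = 0
        for bit_index in range(8):
            byte = (byte << 1) | (pixels[c * 8 + bit_index] & 1)
        chars.append(chr(byte))
    return "".join(chars)

cover = bytearray(range(100, 140))    # 40 made-up pixel values
stego = hide(cover, "Hi")
print(reveal(stego, 2))               # Hi
print(max(abs(a - b) for a, b in zip(cover, stego)))   # 1 (no value changes by more than one)
```

Because only the lowest bit changes, each pixel value moves by at most 1, which is why the hidden message is hard to see.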

Parallel Computing

Multithreading, Multiprocessing, Concurrency & Parallel programming in Python for high performance.

Use multiple threads, processes, mutexes, barriers, waitgroups, queues, pipes, condition variables, deadlocks, and more.

https://www.udemy.com/course/parallel-computing-in-python/

On LinkedIn Learning: “Python Parallel and Concurrent Programming” (2h 11m) Part 1 and Part 2 (Advanced, using Python 3.7.3 on Windows PC machines) by Barron Stone and Olivia Chiu Stone.

  • A Mutex can only be acquired/released by the same thread.
    A Semaphore can be acquired/released by different threads.
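
Python's standard library has no class named Mutex. As a hedged sketch, threading.RLock is used here as the closest owner-checked stand-in (a plain threading.Lock can be released from any thread), next to threading.Semaphore:

```python
import threading

rlock = threading.RLock()      # owner-tracking lock, standing in for a mutex
sem = threading.Semaphore(0)   # counter starts at 0, so acquire() waits for a release()

results = {}

def worker():
    # Releasing an RLock that this thread does not own raises RuntimeError.
    try:
        rlock.release()
        results["rlock"] = "released"
    except RuntimeError:
        results["rlock"] = "RuntimeError"
    # A semaphore has no owner, so any thread may release it.
    sem.release()
    results["semaphore"] = "released"

rlock.acquire()                      # the main thread owns the lock
t = threading.Thread(target=worker)
t.start()
acquired = sem.acquire(timeout=5)    # succeeds once the worker releases
t.join()
rlock.release()                      # only the owner can release
print(results, acquired)
```

The worker fails to release the lock it does not own, but its release of the semaphore is what lets the main thread continue.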

More about Python

This is one of a series about Python:

  1. Python install on MacOS
  2. Python install on MacOS using Pyenv
  3. Python install on Raspberry Pi for IoT

  4. Python tutorials
  5. Python Examples
  6. Python coding notes
  7. Pulumi controls cloud using Python, etc.
  8. Jupyter Notebooks provide commentary to Python

  9. Python certifications

  10. Test Python using Pytest BDD Selenium framework
  11. Test Python using Robot testing framework
  12. Testing AI uses Python code

  13. Microsoft Azure Machine Learning makes use of Python

  14. Python REST API programming using the Flask library
  15. Python coding for AWS Lambda Serverless programming
  16. Streamlit visualization framework powered by Python
  17. Web scraping using Scrapy, powered by Python
  18. Neo4j graph databases accessed from Python