Python dictionary Vs class variables

Thread Starter

strantor

Joined Oct 3, 2010
6,798
I have a rather large multi-threaded Python script (using PyQt5's threading provisions: QRunnable, QThreadPool, etc.) which performs at least 10 repetitive/concurrent functions, running in perpetuity. For each of these threaded functions, I pass class variables from the PyQt GUI parent thread as arguments into the function, put the function into the queue/QThreadPool, and when the function runs it emits signals which I copy back into class variables. When the function ends, repeat.

The latest function that I added to the script is one where I can send an email to request the status of various things (the values of various class variables) and it will reply with whatever I asked for. There are hundreds of class variables, probably >1,000. Formatting this data for email is proving to be quite a chore. I have to deliberately add every single variable that I want to send. For example, let's say I request the state of variables related to the "override" function by sending an email containing "OVERRIDE INFO":

Python:
if self.emailReceived:
    emailStatusData = {}
    if self.dataRequested == "OVERRIDE INFO":
        emailStatusData["systemsEnabled"] = self.systemsEnabled
        emailStatusData["waitingOnApproval"] = self.waitingOnApproval
        emailStatusData["overrideMode"] = self.overrideMode
        emailStatusData["overrideState"] = self.overrideState
        (etc. many more variables)
    elif self.dataRequested == "CLEANOUT INFO":
        emailStatusData["hopperEmpty"] = self.hopperEmpty
        emailStatusData["cleanoutRequired"] = self.cleanoutRequired
        emailStatusData["cleanoutPerformed"] = self.cleanoutPerformed
        emailStatusData["cleanoutRunning"] = self.cleanoutRunning
        (etc. many more variables)
    elif....
      (etc., many more options)
    self.sendReply(emailStatusData)
I would get the values of [systemsEnabled, waitingOnApproval, overrideMode, overrideState, et al.].


As I expand the options for these email requests, I have to go scour through my code, collect all the variables I want to be part of the groups, and add them to the appropriate cluster of "emailStatusData["thing"] = self.thing". It's a pain. And I'm adding new class variables all the time (this script is under perpetual development), so every time I create a new one I have to remember to go add it to one of these clusters.



So to make things easier, what I'm thinking of doing, is replacing all of my class variables with a single (nested) dictionary, so:

Code:
self.systemsEnabled     becomes     self.systemVariables["overrideData"]["systemsEnabled"]
self.waitingOnApproval  becomes     self.systemVariables["overrideData"]["waitingOnApproval"]
self.overrideMode       becomes     self.systemVariables["overrideData"]["overrideMode"]
self.overrideState      becomes     self.systemVariables["overrideData"]["overrideState"]
(etc. many more variables)

self.hopperEmpty        becomes     self.systemVariables["cleanoutData"]["hopperEmpty"]
self.cleanoutRequired   becomes     self.systemVariables["cleanoutData"]["cleanoutRequired"]
self.cleanoutPerformed  becomes     self.systemVariables["cleanoutData"]["cleanoutPerformed"]
self.cleanoutRunning    becomes     self.systemVariables["cleanoutData"]["cleanoutRunning"]
(etc. many more variables)
I could write a quick script to automate the changeover, and then from there, my variables are all grouped logically, and sending the right data in the email reply becomes as easy as:

Python:
if self.emailReceived:
    if self.dataRequested == "OVERRIDE INFO":
        self.sendReply(self.systemVariables["overrideData"])
    elif self.dataRequested == "CLEANOUT INFO":
        self.sendReply(self.systemVariables["cleanoutData"])
    elif....
        (etc., many more options, INCLUDING...)
    elif self.dataRequested == "ALL SYSTEM INFO":
        self.sendReply(self.systemVariables)

It seems like a fine idea to me, but what I don't know (among most other things) is the performance characteristics of Python's dictionary objects vs class variables. These class variables are being read by and written by multiple threads at once, some of them change once per day, some of them change thousands of times per second, etc.

By putting them all into a single dictionary would I be creating a bottleneck where only one variable can be written at a time?
(or is that how it is already?)
By putting them all into a single dictionary would I be creating a bottleneck where only one variable can be read at a time?
(or is that how it is already?)
Would I be adding some delay each time a variable is accessed, as now it has to be looked up in the dictionary?
(or is that how it is already?)
I don't really know what's going on beneath all the layers of the Python onion, so I don't know what to expect.
I don't even know if these are the right questions to be asking.
What else is there to consider?
 

hrs

Joined Jun 13, 2014
400
I think class variables (attributes) are implemented as a dictionary anyway.
Code:
>>> class Meh:
    a = 1

>>> Meh.__dict__
mappingproxy({'__module__': '__main__',
              'a': 1,
              '__dict__': <attribute '__dict__' of 'Meh' objects>,
              '__weakref__': <attribute '__weakref__' of 'Meh' objects>,
              '__doc__': None})
So with your proposed solution you'll end up with a dict in a dict in a dict. Xzibit would approve but it seems a bit convoluted. And wouldn't it start to show up everywhere? Instead of the nice and clean self.systemsEnabled you'll get
self.systemVariables["overrideData"]["systemsEnabled"] everywhere when you need to use it?

Maybe you could try something like this untested code:
Code:
self.dataGroups = {"OVERRIDE INFO": ["systemsEnabled", "waitingOnApproval", "overrideMode", "overrideState"],
              "CLEANOUT INFO": ["hopperEmpty", "cleanoutRequired", "cleanoutPerformed", "cleanoutRunning"]}

if self.emailReceived:
    emailStatusData = {}
    try:
         dataAttributes = self.dataGroups[self.dataRequested]
         for attr in dataAttributes:
             emailStatusData[attr] = getattr(self, attr)
    except KeyError:
        print(self.dataRequested, "not available")
    self.sendReply(emailStatusData)
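For reference, the same idea as a self-contained sketch that can be run on its own. The `StatusReporter` class, its placeholder values, and the `build_status` method are hypothetical stand-ins for the real GUI class, not part of the original script:

```python
class StatusReporter:
    """Hypothetical stand-in for the GUI class; names are illustrative only."""

    # Map each request string to the attribute names that belong to it.
    DATA_GROUPS = {
        "OVERRIDE INFO": ["systemsEnabled", "waitingOnApproval",
                          "overrideMode", "overrideState"],
        "CLEANOUT INFO": ["hopperEmpty", "cleanoutRequired",
                          "cleanoutPerformed", "cleanoutRunning"],
    }

    def __init__(self):
        # Placeholder values; the real script would set these elsewhere.
        self.systemsEnabled = True
        self.waitingOnApproval = False
        self.overrideMode = "manual"
        self.overrideState = 0
        self.hopperEmpty = False
        self.cleanoutRequired = True
        self.cleanoutPerformed = False
        self.cleanoutRunning = False

    def build_status(self, data_requested):
        """Return {name: value} for the requested group, or {} if unknown."""
        names = self.DATA_GROUPS.get(data_requested, [])
        # getattr with a default avoids an AttributeError for a misspelled name.
        return {name: getattr(self, name, "<missing>") for name in names}


reporter = StatusReporter()
print(reporter.build_status("OVERRIDE INFO"))
print(reporter.build_status("NO SUCH GROUP"))
```

With this layout, adding a new variable to an email group means adding one string to `DATA_GROUPS`; the attributes themselves stay as plain `self.name` attributes.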
 

strantor
I think class variables (attributes) are implemented as a dictionary anyway.
Yes, you are correct. I have been doing some research, and I also re-posted this on the python.org forum, where it evolved into a discussion about the speed of accessing class attributes (that is, accessing the class's dictionary through attribute syntax) vs. the speed of accessing entries of a dictionary that you create yourself (as a single class attribute). Supposedly (at least, according to this), accessing class attributes should be faster because "Dictionaries of classes are protected by mappingproxy. The proxy checks that all attribute names are strings, which helps to speed-up attribute lookups." I wrote a script to test this and found the opposite to be true (more details at the end).
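One detail that may matter when reading the mappingproxy remark: it describes the `__dict__` of the class itself. Attributes assigned as `self.something` are stored per instance, and an instance's `__dict__` is an ordinary `dict`. A minimal sketch, using a made-up `Machine` class that is not from the original script:

```python
import types


class Machine:
    units = "psi"              # class attribute: stored on the class

    def __init__(self):
        self.overrideMode = 2  # instance attribute: stored on the instance


m = Machine()

# The class namespace is exposed through a read-only mappingproxy...
print(type(Machine.__dict__))
# ...but attributes assigned via self.<name> live in the instance's own dict.
print(type(vars(m)))
print(vars(m))

# Writing through the instance dict is equivalent to setting the attribute.
vars(m)["overrideMode"] = 3
print(m.overrideMode)
```

So a script that only uses `self.name` attributes is already reading and writing a regular dictionary; `vars(self)` returns that dictionary directly.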

So with your proposed solution you'll end up with a dict in a dict in a dict. Xzibit would approve but it seems a bit convoluted. And wouldn't it start to show up everywhere? Instead of the nice and clean self.systemsEnabled you'll get
self.systemVariables["overrideData"]["systemsEnabled"] everywhere when you need to use it?
I don't think the dict within a dict (within a dict (within a dict (within...))) is so convoluted. That's the way everything is in the background anyway, and I like the organization that it offers. However, I am totally in agreement with your point about accessing that nested data (self.systemVariables["overrideData"]["systemsEnabled"]). For the sake of simplicity I only discussed two levels of dicts here, but I am planning at least 3 for most objects, and some of those objects will actually be, you guessed it, nested dictionaries. So there may be something as clumsy as self.systemVariables["guiData"]["displayData"]["RFID"]["statusLog"][timestamp]["rfidTopStatus"]["header"]["KiB Mem"]["free"] hanging out somewhere in my script. I'm not thrilled about that aspect of it, but I can start by simplifying my attribute names (they won't need to be so descriptive since they'll be categorized), and from there, there are "dot." notation solutions I am evaluating. So self.systemVariables["overrideData"]["systemsEnabled"] could instead be self.v.override.sysEnabled
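One possible "dot." notation approach, sketched with the standard library's `types.SimpleNamespace`. The helper names `to_namespace` and `to_dict` and the sample keys are assumptions made for illustration; this is not necessarily one of the solutions being evaluated above:

```python
from types import SimpleNamespace


def to_namespace(data):
    """Recursively convert nested dicts into SimpleNamespace objects."""
    if isinstance(data, dict):
        return SimpleNamespace(**{k: to_namespace(v) for k, v in data.items()})
    return data


def to_dict(ns):
    """Recursively convert nested SimpleNamespace objects back into dicts."""
    if isinstance(ns, SimpleNamespace):
        return {k: to_dict(v) for k, v in vars(ns).items()}
    return ns


v = to_namespace({
    "override": {"sysEnabled": True, "mode": 2},
    "cleanout": {"hopperEmpty": False},
})

v.override.sysEnabled = False      # dotted write
print(v.override.mode)             # dotted read
print(to_dict(v.override))         # plain dict, ready for the email reply
```

A limitation of this sketch: every key has to be a string, and only keys that are valid Python identifiers can be reached with dot syntax. Keys such as "KiB Mem" or a timestamp would still need dictionary-style access.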

Maybe you could try something like this untested code:
I do like that. I will have to down-shift a few gears and really think about it because I've currently got a lot of inertia moving in a different direction.



So here's the script I wrote. It creates 4 million variables/attributes and measures the time taken. It compares the creation and random read/write of:
1. classA(): 100,000 class attributes
2. classB(): a single class attribute (single-level dictionary with 100,000 entries)
3. classC(): 100,000 class attributes again (but the script is different)
4. classD(): a single class attribute (5-level dictionary with 100,000 entries)

attributeSpeedTest.py:
import time
import random

class classA:

    def __init__(self):
        self.doThing1()
        self.doThing2()

    def doThing1(self):
        # create 100,000 new class attributes and assign a value of something other than 1,000
        for i in range (0,100000):
            name = "attribute" + str(i)
            value = random.randint(0,1000)
            if value == 1000:
                value = 999
            setattr(self,name,value)

    def doThing2(self):
        # 100,000 times, look up a random attribute and change its value to 1,000
        for i in range (0,100000):
            name = "attribute" + str(random.randint(0,10000))
            setattr(self,name,1000)


class classB:

    def __init__(self):
        self.attribute = {}
        self.doThing1()
        self.doThing2()

    def doThing1(self):
        # create a single one-level nested dictionary attribute with 100,000 entries and assign
        # a value of something other than 1,000 to each
        for i in range (0,100000):
            name = str(i)
            value = random.randint(0,1000)
            if value == 1000:
                value = 999
            self.attribute[name] = value

    def doThing2(self):
        # 100,000 times, look up a random key in the dict attribute and change its value to 1,000
        for i in range (0,100000):
            name = str(random.randint(0,10000))
            self.attribute[name] = 1000

class classC:

    def __init__(self):
        self.doThing1()
        self.doThing2()

    def doThing1(self):
        # create 100,000 new class attributes and assign a value of something other than 1,000 to each
        # (There are simpler ways to do this, e.g. classA, but it is written this way to stay consistent with classD.)
        for i1 in range (0,10):
            for i2 in range(0, 10):
                for i3 in range(0, 10):
                    for i4 in range(0, 10):
                        for i5 in range(0, 10):
                            name = "attribute"+str(i1)+str(i2)+str(i3)+str(i4)+str(i5)
                            value = random.randint(0,1000)
                            if value == 1000:
                                value = 999
                            setattr(self,name,value)


    def doThing2(self):
        # 100,000 times, look up a random class attribute and change its value to 1,000
        for i1 in range (0,10):
            tier1 = str(random.randint(0,9))
            for i2 in range(0, 10):
                tier2 = str(random.randint(0, 9))
                for i3 in range(0, 10):
                    tier3 = str(random.randint(0, 9))
                    for i4 in range(0, 10):
                        tier4 = str(random.randint(0, 9))
                        for i5 in range(0, 10):
                            tier5 = str(random.randint(0, 9))
                            name = "attribute" + str(tier1)+str(tier2)+str(tier3)+str(tier4)+str(tier5)
                            setattr(self,name,1000)

class classD:

    def __init__(self):
        self.attribute = {}
        self.doThing1()
        self.doThing2()

    def doThing1(self):
        # create a single 5-level nested dictionary attribute with 100,000 entries and assign
        # a value of something other than 1,000 to each
        for i1 in range (0,10):
            self.attribute[str(i1)] = {}
            for i2 in range (0,10):
                self.attribute[str(i1)][str(i2)] = {}
                for i3 in range(0, 10):
                    self.attribute[str(i1)][str(i2)][str(i3)] = {}
                    for i4 in range(0, 10):
                        self.attribute[str(i1)][str(i2)][str(i3)][str(i4)] = {}
                        for i5 in range(0, 10):
                            value = random.randint(0,1000)
                            if value == 1000:
                                value = 999
                            self.attribute[str(i1)][str(i2)][str(i3)][str(i4)][str(i5)] = value

    def doThing2(self):
        # 100,000 times, look up a random key in the nested dict attribute and change its value to 1,000
        for i1 in range (0,10):
            tier1 = str(random.randint(0,9))
            for i2 in range(0, 10):
                tier2 = str(random.randint(0, 9))
                for i3 in range(0, 10):
                    tier3 = str(random.randint(0, 9))
                    for i4 in range(0, 10):
                        tier4 = str(random.randint(0, 9))
                        for i5 in range(0, 10):
                            tier5 = str(random.randint(0, 9))
                            self.attribute[tier1][tier2][tier3][tier4][tier5] = 1000

classAruntimes = []
classBruntimes = []
classCruntimes = []
classDruntimes = []
iters = 10
for iteration in range(0,iters):
    start = time.time()
    for i in range (0,10):
        myStuff = classA()
    runtime = round(time.time()-start,5)
    classAruntimes.append(runtime)

    start = time.time()
    for i in range (0,10):
        myStuff = classB()
    runtime = round(time.time()-start,5)
    classBruntimes.append(runtime)

    start = time.time()
    for i in range(0, 10):
        myStuff = classC()
    runtime = round(time.time() - start, 5)
    classCruntimes.append(runtime)

    start = time.time()
    for i in range(0, 10):
        myStuff = classD()
    runtime = round(time.time() - start, 5)
    classDruntimes.append(runtime)
    print("Iteration",(iteration+1),"of",iters,"complete.")

def avg(timeList):
    avg = 0
    for value in timeList:
        avg += value
    avg = avg/len(timeList)
    return round(avg,5)
print("class A runtimes:", classAruntimes, "average:",avg(classAruntimes))
print("class B runtimes:", classBruntimes, "average:",avg(classBruntimes))
print("class C runtimes:", classCruntimes, "average:",avg(classCruntimes))
print("class D runtimes:", classDruntimes, "average:",avg(classDruntimes))
Here is the result:
Code:
class A runtimes: [4.56826, 5.18130, 5.09429, 4.61726, 5.35331, 4.69626, 4.44063, 4.44963, 4.40563, 4.89768] average: 4.77043
class B runtimes: [4.50026, 4.50126, 4.03561, 3.91660, 5.42831, 4.02703, 4.03323, 3.91722, 3.83122, 4.08623] average: 4.22770
class C runtimes: [6.81964, 6.81471, 6.66738, 6.66138, 6.83636, 6.48337, 6.56238, 6.46137, 6.43137, 6.61538] average: 6.63533
class D runtimes: [5.82065, 5.58732, 5.66032, 6.01334, 5.67632, 5.41231, 5.37531, 5.37131, 5.38068, 5.44031] average: 5.57379
Breaking down the data:
  • classA (100,000 class attributes) Vs. classB (single-level dictionary with 100,000 entries):
    • It is 12.85% faster to use a single class attribute (single-level dictionary) to store 100,000 variables than it is to use 100,000 class attributes
  • classC (100,000 class attributes) Vs. classD (5-level dictionary with 100,000 entries):
    • There is probably a simpler way to write classD but I didn't find it.
    • Because classD is a bit convoluted, I wrote a rev of classA in the same convoluted way as classD, and called it classC.
    • classD was 19.05% faster than classC
  • classA (100,000 class attributes, simple) Vs. classC (100,000 class attributes, convoluted):
    • The only point of classC is to isolate that portion of [classD's speed increase over classB] which is due to the convoluted way I wrote it
    • classA and classC are doing the exact same thing, but classC took 1.8649 seconds longer than classA
  • classB (single-level dictionary with 100,000 entries) Vs. classD (5-level dictionary with 100,000 entries):
    • Since 1.8649 seconds can be attributed to convoluted code, classD's "corrected" time would be 3.70889 seconds.
    • classE (theoretical, 5-level dictionary with 100,000 entries, NOT convoluted) would probably be:
      • the winner of all of these classes.
      • 14.00% faster than classB (single-level dictionary with 100,000 entries)
      • 28.62% faster than classA (100,000 class attributes)

So, my conclusion is that contrary to my own suspicions and contrary to what other, more knowledgeable people have said, a single dictionary class attribute is appreciably faster than multiple class attributes. And even more counterintuitively, the deeper you nest your data, the faster it is to access it.

-OR, (equally or more likely) I'm such a bad programmer that I can't even write an effective script to answer a simple question. But I don't think this is the case because another guy on the Python forum did his own test, and it seemed to be about in line with my results (although he didn't test multi-level nested dictionaries).
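A caveat on the script above: each timed loop also includes `random.randint`, `str()`, and string concatenation, and the attribute versions build longer names ("attribute" + digits) than the dictionary versions (digits only). Some of the measured difference may therefore come from that overhead and not from the access itself. A hedged sketch of a narrower test using the standard `timeit` module; the three small classes and the repeat counts are arbitrary choices for illustration, and no particular result is implied:

```python
import timeit


class Attrs:
    """Holds the value as a plain instance attribute."""
    def __init__(self):
        self.value = 0


class OneDict:
    """Holds the value inside a single dict attribute."""
    def __init__(self):
        self.data = {"value": 0}


class NestedDict:
    """Holds the value three dict levels deep."""
    def __init__(self):
        self.data = {"a": {"b": {"value": 0}}}


a, b, c = Attrs(), OneDict(), NestedDict()
N = 200_000

# Each statement performs exactly one write and nothing else, so name
# building and random number generation are not part of the measurement.
statements = {
    "instance attribute": "a.value = 1",
    "single dict": 'b.data["value"] = 1',
    "3-level dict": 'c.data["a"]["b"]["value"] = 1',
}

namespace = {"a": a, "b": b, "c": c}
results = {}
for label, stmt in statements.items():
    # Take the best of 5 runs to reduce noise from other processes.
    results[label] = min(timeit.repeat(stmt, globals=namespace, number=N, repeat=5))
    print(f"{label:20s} {results[label]:.4f} s per {N:,} writes")
```

Timings will vary by machine and Python version, so the numbers are only meaningful relative to each other on the same system.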

So at this point I am in favor of the single class attribute of nested dictionaries but I need to:
  • test out some of the "dot." notation solutions and see how they impact performance
  • think hard about the solution suggested here by hrs and see if it would address most of my needs and desires
 

strantor
It sounds like you have an impressive project on your hands.
Impressive in scale maybe, but if a real Python programmer ever sifted through it they would probably want to take a bath afterwards.

For that level of nesting you may look at a recursive way of traversing the dictionaries where it is to your advantage. Here's a link for some inspiration:
https://stackoverflow.com/questions/15436318/traversing-a-dictionary-recursively
Thanks. I have a function which does this, but it is not as elegant as ones found in the link you provided. It only goes 6 levels deep, because it's like (posting from phone):
for k, v in level1.items():
....for k, v in level2.items():
........for k, v in level3.items():
............and so on
So far, 6 levels has been enough, but I knew that I would outgrow it at some point and would have to look for a more legitimate solution, as I don't want to just keep piling on "for k, v in levelxx.items()" statements. So thanks for the link, it is probably about to come in handy.
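A minimal sketch of the recursive alternative to stacking one loop per level. The generator name `walk` and the sample data are made up for illustration:

```python
def walk(nested, path=()):
    """Yield (path, value) for every non-dict leaf, at any nesting depth."""
    for key, value in nested.items():
        if isinstance(value, dict):
            # Recurse instead of adding another hard-coded loop level.
            yield from walk(value, path + (key,))
        else:
            yield path + (key,), value


sample = {
    "rfid": {"status": "Running", "mem": {"free": 245240}},
    "weather": {"temp": 72.39},
}

for path, value in walk(sample):
    print(".".join(str(p) for p in path), "=", value)
```

Each leaf comes back with the full tuple of keys that leads to it, so the depth is limited only by Python's recursion limit. Empty dictionaries produce no output in this sketch.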
Also, are you familiar with the cProfile module? It allows you to have a more fine grained look into the performance of your programs.
No, I'm not, but I will look into it on your suggestion. Although I must admit, the name frightens me a bit. Messing with wrapped C stuff has proven to be poking the bear in my case. I've spent countless hours, days, and weeks trying to troubleshoot things that just <CRASH> Python: no exception thrown, just "Python has stopped working" with the little swirly wheel.
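For what it's worth, cProfile ships with the standard library and is driven entirely from ordinary Python; the "c" only refers to how the profiler itself is implemented. A minimal sketch, where `build_attributes` is a hypothetical workload invented for the example:

```python
import cProfile
import io
import pstats


def build_attributes(count=50_000):
    """Hypothetical workload: set many attributes on a throwaway object."""
    class Holder:
        pass
    holder = Holder()
    for i in range(count):
        setattr(holder, "attribute" + str(i), i)
    return holder


profiler = cProfile.Profile()
profiler.enable()
build_attributes()
profiler.disable()

# Sort by cumulative time and print the ten most expensive entries.
report = io.StringIO()
pstats.Stats(profiler, stream=report).sort_stats("cumulative").print_stats(10)
print(report.getvalue())
```

The report lists call counts and time per function, which can show whether the time goes into attribute access or into surrounding work such as building names.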
 

strantor
It sounds like you have an impressive project on your hands. For that level of nesting you may look at a recursive way of traversing the dictionaries where it is to your advantage. Here's a link for some inspiration:
https://stackoverflow.com/questions/15436318/traversing-a-dictionary-recursively
Here's what I came up with based on the link you provided:
Python:
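def unNestDict(inputDict, thisTier=0, maxTier=10):
    # Render a nested dict as indented text, descending at most maxTier levels.
    thisTier += 1
    op = ""
    for key, val in inputDict.items():
        # Two dots of indentation per nesting level below the top.
        op += "\n" + (".." * (thisTier - 1)) + str(key) + ":  "
        if isinstance(val, dict):
            if thisTier + 1 <= maxTier:
                # Recurse into the child dict; every call returns a string.
                op += unNestDict(val, thisTier, maxTier)
            else:
                # Depth limit reached: show a placeholder instead of recursing.
                op += "{dict}"
        else:
            op += str(val)
    return op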

mylib = {}
mylib["key1"] = "afdasdf"
mylib["key2"] = "fgjfgh"
mylib["key3"] = {}
mylib["key3"]["key1"] = "qewdsaf"
mylib["key3"]["key2"] = {}
mylib["key3"]["key2"]["key1"] = "ktyhrgfsd"
mylib["key4"] = {}
mylib["key4"]["key1"] = {}
mylib["key4"]["key1"]["key1"] = "sdfghetyh"
mylib["key5"] = "sdfheytj"
mylib["key6"] = "sdfheytj"
print(unNestDict(mylib,maxTier=1),"\n\n")
print(unNestDict(mylib,maxTier=2),"\n\n")
print(unNestDict(mylib,maxTier=3),"\n\n")
print(unNestDict(mylib,maxTier=4),"\n\n")
It returns human-readable, indent-formatted text that can be sent as an attachment in an email.
output:

Code:
key1:  afdasdf
key2:  fgjfgh
key3:  {dict}
key4:  {dict}
key5:  sdfheytj
key6:  sdfheytj



key1:  afdasdf
key2:  fgjfgh
key3:
..key1:  qewdsaf
..key2:  {dict}
key4:
..key1:  {dict}
key5:  sdfheytj
key6:  sdfheytj



key1:  afdasdf
key2:  fgjfgh
key3:
..key1:  qewdsaf
..key2:
....key1:  ktyhrgfsd
key4:
..key1:
....key1:  sdfghetyh
key5:  sdfheytj
key6:  sdfheytj



key1:  afdasdf
key2:  fgjfgh
key3:
..key1:  qewdsaf
..key2:
....key1:  ktyhrgfsd
key4:
..key1:
....key1:  sdfghetyh
key5:  sdfheytj
key6:  sdfheytj



Process finished with exit code 0
Real-world example output: a single entry (every-5-minute log) dict from one of my scripts, which monitors the health of a running RFID reader and fetches local weather to see if there is any correlation between periodic failures and ambient conditions or process memory usage:

Code:
2021-04-19 16:12:38.353666:
..localWeather:
....coord:
......lon:  -xx.xxxx
......lat:  yy.yyyy
....weather:
......id:  800
......main:  Clear
......description:  clear sky
......icon:  01d
....base:  stations
....main:
......temp:  72.39
......feels_like:  71.08
......temp_min:  69.8
......temp_max:  73.99
......pressure:  1016
......humidity:  37
....visibility:  10000
....wind:
......speed:  9.22
......deg:  70
....clouds:
......all:  1
....dt:  1618866630
....sys:
......type:  1
......id:  3944
......country:  US
......sunrise:  1618832925
......sunset:  1618879746
....timezone:  -18000
....id:  4704108
....name:  La Porte
....cod:  200
..rfidWebStatus:
....deviceName:  Bagline1Izar
....temperature:  47
....region:  North America
....readerOS:  5.7.0.19 (2020-07-09T04:07:33-0400)
....webUI:  5.7.0.19 (2020-07-09T04:24:24-0400)
....apps:  5.7.0.19 (2020-07-09T04:07:33-0400)
....afe:  M6e Micro HWVer:20.00.00.01 BootVer:12.12.13.00 AppVer:01.0B.03.07 AppDate:2019.02.20
....status:  Running
..rfidTopStatus:
....header:
......time:  16:12:40
......up:  8:50
......user:  1
......load average:  [2.79, 2.41, 2.46]
......Tasks:
........total:  109
........running:  1
........sleeping:  108
........stopped:  0
........zombie:  0
......%Cpu(s):
........us:  21.9
........sy:  42.5
........ni:  0.0
........id:  34.6
........wa:  0.0
........hi:  0.0
........si:  1.0
........st:  0.0
......KiB Mem:
........total:  508296
........used:  263056
........free:  245240
........buffers:  26324
......KiB Swap:
........total:  0
........used:  0
........free:  0
........cached:  81740
....body:
......1744:
........PID:  1744
........USER:  root
........PR:  20
........NI:  0
........VIRT:  63116
........RES:  3428
........SHR:  2288
........S:  S
........%CPU:  15.0
........%MEM:  0.7
........TIME+:  75:18.32
........COMMAND:  tmmpd.bin
(hundreds more PIDs)
 