Parsing unique message patterns from a .txt file in Python

Thread Starter

JimmyCho

Joined Aug 1, 2020
109
Hello, world of Python programming.

I am struggling to parse a series of messages from a text file, where each message follows one of several distinct patterns, and to save the matching ones to new txt files using Python.


The input txt file looks like this:

Code:
    [#11:23][INFO][0x0015a] it's here and it's optimally required start index[1] , length[15]
    INFO isn't NULL
    [#12:25][PERFECT][0x0015a] process returned as NULL load index[1] , length[20] , type[0]
    [#13:3][INFO][0x0015a] it's here and it's optimally required start index[1] , length[15]
    PERFECT isn't NULL
    [#4:23][INFO][0x0015a] it's here and it's optimally required start index[1] , length[15]
    Time is here [Tick:135055] , Time:  17, index: 608, CastedType:20002, area :0
    [#15:23][INFO][0x0015a] it's here and it's optimally required start index[1] , length[15]
    [#16:25][PERFECT][0x0015a] process returned as NULL load index[1] , length[20] , type[0]
    [#17:3][INFO][0x0015a] it's here and it's optimally required start index[1] , length[15]
    [#8:23][INFO][0x0015a] it's here and it's optimally required start index[1] , length[15]
    time is here [Tick:135055] , Time:  17, index: 608, CastedType:20002, area :0
    [#16:23][INFO][0x0015a] it's here and it's optimally required start index[1] , length[15]
    [#14:25][PERFECT][0x0015a] process returned as NULL load index[1] , length[20] , type[0]
    [#18:3][INFO][0x0015a] it's here and it's optimally required start index[1] , length[15]
    [#6:23][INFO][0x0015a] it's here and it's optimally required start index[1] , length[15]
    Time is here [Tick:135055] , Time:  17, index: 608, CastedType:20002, area :0

These are all the row formats that appear in the txt file. Each format is repeated throughout the file and has its own unique pattern, as shown above. The keywords [INFO] and [PERFECT] are fixed: they do not change from one message to the next within their pattern.
Treat each row as a new message: every row starts a message that is unrelated to the others.


What I'm trying to implement in Python is a function that takes the txt file as input, reads it line by line (the rows follow the different message patterns mentioned above), and dumps all rows/messages that contain the keyword [PERFECT], such as:

Code:
    [#12:25][PERFECT][0x0015a] process returned as NULL load index[1] , length[20] , type[0]

to another txt file. If I then open that txt file, I should see only rows of this message type:

Code:
    [#12:25][PERFECT][0x0015a] process returned as NULL load index[1] , length[20] , type[0]

After filtering the [PERFECT] messages out of the input txt file, I need to read the newly generated txt file line by line, take the load index values, and dump them into another txt file that contains just the load index values.


Function description:

The function is given a txt file as input. That file contains all the message patterns mentioned above, with a new message on each row:

Code:
    [#11:23][INFO][0x0015a] it's here and it's optimally required start index[1] , length[15]
    INFO isn't NULL
    [#12:25][PERFECT][0x0015a] process returned as NULL load index[1] , length[20] , type[0]
    [#13:3][INFO][0x0015a] it's here and it's optimally required start index[1] , length[15]
    PERFECT isn't NULL
    [#4:23][INFO][0x0015a] it's here and it's optimally required start index[1] , length[15]
    Time is here [Tick:135055] , Time:  17, index: 608, CastedType:20002, area :0
    [#15:23][INFO][0x0015a] it's here and it's optimally required start index[1] , length[15]
    [#16:25][PERFECT][0x0015a] process returned as NULL load index[1] , length[20] , type[0]
    [#17:3][INFO][0x0015a] it's here and it's optimally required start index[1] , length[15]
    [#8:23][INFO][0x0015a] it's here and it's optimally required start index[1] , length[15]
    time is here [Tick:135055] , Time:  17, index: 608, CastedType:20002, area :0
    [#16:23][INFO][0x0015a] it's here and it's optimally required start index[1] , length[15]
    [#14:25][PERFECT][0x0015a] process returned as NULL load index[1] , length[20] , type[0]
    [#18:3][INFO][0x0015a] it's here and it's optimally required start index[1] , length[15]
    [#6:23][INFO][0x0015a] it's here and it's optimally required start index[1] , length[15]
    Time is here [Tick:135055] , Time:  17, index: 608, CastedType:20002, area :0


Results/output of the function:


1. Generate a txt file containing all rows of the **certain pattern** explained above, i.e. all rows that contain the word **[PERFECT]**, so the generated txt file has every message/row containing **[PERFECT]**:

Code:
    [#12:25][PERFECT][0x0015a] process returned as NULL load index[1] , length[20] , type[0]
    [#16:25][PERFECT][0x0015a] process returned as NULL load index[1] , length[20] , type[0]
    [#14:25][PERFECT][0x0015a] process returned as NULL load index[1] , length[20] , type[0]

2. Then generate another new txt file for the load index values. In my case the value sits inside the [ ] after the words load index ( load index[value] ), so the function should dump the load index values **as a column** into this second generated txt file:


1
1
1




How can I parse the file and implement this function in Python?




Thanks a lot for any help!
 

djsfantasi

Joined Apr 11, 2010
9,163
Disclosure: I'm not familiar with Python, but I am very good at parsing.

First, I note that each line of your input file contains three tokens followed by a variable-length description (more on that later).

I’d create a token array of four elements, each holding a string. The actual data type definition depends on the availability of strings in Python. I’d also initialize a token index to one less than the index of the first element of the array.

I’d walk the input character by character. If an open bracket is found, I’d increment the token index and set a flag that I’m in a token. Otherwise, if I’m already in a token, I’d append that char to the current token element.

If I find a close bracket or a newline character (or otherwise reach the end of a line), I’d turn off the flag indicating I was in a token.

Finally, if the token index shows that I’m looking for the fourth token, I’d also set the flag that I’m in a token.

Looping through these last three steps will parse out the necessary data. Creating your intermediate file is just a matter of formatting your output.

You can use this array to directly output your final file… you only have to parse the fourth token. It looks like the value you want will always be in the same position, so you could trim the fourth token to that position and then build a string of the characters before the closing bracket.

These techniques can be adapted to almost any situation. I use them to parse code and run-time parameters in a language that I developed.
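
For illustration only, here is a minimal Python sketch of the character-by-character approach described above. It is not djsfantasi's actual code: the function names `split_tokens` and `load_index` are hypothetical, and the sketch assumes that the first three bracketed fields are the tokens and that everything after the third closing bracket is the fourth token (the description).

```python
def split_tokens(line):
    """Split one log line into three bracketed tokens plus the description."""
    tokens = ["", "", "", ""]
    index = -1          # one less than the index of the first element
    in_token = False    # True while we are between '[' and ']'
    for ch in line.rstrip("\n"):
        if index == 3:
            # Fourth token: keep everything, brackets included.
            tokens[3] += ch
        elif in_token:
            if ch == "]":
                in_token = False
                if index == 2:
                    index = 3   # third token closed, description starts
            else:
                tokens[index] += ch
        elif ch == "[":
            index += 1
            in_token = True
    return tokens


def load_index(description):
    """Return the text inside 'load index[...]', or None if it is absent."""
    marker = "load index["
    start = description.find(marker)
    if start == -1:
        return None
    start += len(marker)
    end = description.find("]", start)
    return description[start:end] if end != -1 else None


line = "[#12:25][PERFECT][0x0015a] process returned as NULL load index[1] , length[20] , type[0]"
tokens = split_tokens(line)
if tokens[1] == "PERFECT":
    print(load_index(tokens[3]))  # prints 1
```

Lines without three bracketed fields, such as "INFO isn't NULL", simply leave the later tokens empty, so the `tokens[1] == "PERFECT"` test skips them.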
 

Thread Starter

JimmyCho

Joined Aug 1, 2020
109
Thanks very much for the explanation. I'm still confused and afraid I didn't follow you...
Could you write it in a language you're familiar with (any language at all)? Maybe then I can understand the logic.
 

djsfantasi

Joined Apr 11, 2010
9,163
I’m looking for the code on my phone. I might have to boot up my laptop to find it. Give me a while…
 

Ya’akov

Joined Jan 27, 2019
9,165
This is not really a direct answer to your question because I am not a Python programmer, but part of your solution is to use regular expressions, which are a very powerful way of writing textual patterns to test strings and capture parts of them.

The code below is a perl "one-liner" that would sit between reading line by line and writing out the files. The salient points in the code are that I am testing to see if the line contains "[PERFECT]" and at the same time capturing the value of the load index.

The =~ operator is the regular expression test in perl; it will be something else in Python. The // surrounding the text is the match operator, and the stuff inside is the regular expression itself. The () captures whatever part of the string matches the pattern between them and assigns it to a special variable ($1 for the first, $2 for the second, etc.). If the test for [PERFECT] succeeds, the capture will contain the load index value.

The print statement is pseudocode for writing out the two files; there is no reason for two passes. The two lines below the command are the program's output.

Perl:
perl -wle 'my $line = "\[#14:25\]\[PERFECT\]\[0x0015a\] process returned as NULL load index\[1\] , length\[20\] , type\[0\]"; if ($line =~ /.+\[PERFECT\].+load index\[(.+)\] , length\[/) { my $load_index = $1; print "WRITE \"$line\" TO lines.txt\nWRITE \"$load_index\" TO indices.txt\n"; }'

WRITE "[#14:25][PERFECT][0x0015a] process returned as NULL load index[1] , length[20] , type[0]" TO lines.txt
WRITE "1" TO indices.txt
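
For readers working in Python, a rough equivalent of the test-and-capture step can be sketched with the standard `re` module. The pattern name `PERFECT_RE` and the function name `perfect_load_index` are made up for this example, and the pattern assumes the load index always appears as `load index[value]` somewhere after `[PERFECT]`.

```python
import re

# Matches a [PERFECT] line and captures whatever sits inside "load index[...]".
PERFECT_RE = re.compile(r"\[PERFECT\].*load index\[([^\]]+)\]")


def perfect_load_index(line):
    """Return the load index of a [PERFECT] line, or None for any other line."""
    match = PERFECT_RE.search(line)
    return match.group(1) if match else None


sample = "[#14:25][PERFECT][0x0015a] process returned as NULL load index[1] , length[20] , type[0]"
print(perfect_load_index(sample))                # prints 1
print(perfect_load_index("PERFECT isn't NULL"))  # prints None
```

Here `match.group(1)` plays the role of `$1` in the perl version, and `search` returning `None` corresponds to the `=~` test failing.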
 

hrs

Joined Jun 13, 2014
397
Here is a basic python scheme but with printing instead of writing to a file.
Code:
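#!/usr/bin/env python

# 'Text File.txt' is the input file name used in this example.
with open('Text File.txt') as my_file:
    for line in my_file:
        # Keep only the messages that carry the [PERFECT] keyword.
        if "[PERFECT]" in line:
            # The line already ends with a newline, so strip it before printing.
            print(line.rstrip('\n'))
            for word in line.split():
                # "load index[1]" splits into the words "load" and "index[1]".
                if word.startswith('index['):
                    # Drop the leading "index[" (6 characters) and the closing "]".
                    print(word[6:].strip(']'))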
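
One possible variation on this scheme writes to files rather than printing, using the `write` method of a file opened in `"w"` mode. This is only a sketch: the function name `extract_perfect` and its three path parameters are assumptions made for the example, and it handles both output files in a single pass over the input.

```python
def extract_perfect(input_path, lines_path, indices_path):
    """Copy [PERFECT] lines to lines_path and their load index values to
    indices_path (one value per row). Returns the number of lines copied."""
    count = 0
    with open(input_path) as src, \
         open(lines_path, "w") as lines_out, \
         open(indices_path, "w") as indices_out:
        for line in src:
            if "[PERFECT]" not in line:
                continue
            # Make sure every copied message ends with exactly one newline.
            lines_out.write(line if line.endswith("\n") else line + "\n")
            for word in line.split():
                if word.startswith("index["):
                    # Strip the leading "index[" and the closing "]".
                    value = word[len("index["):].rstrip("]")
                    indices_out.write(value + "\n")
            count += 1
    return count
```

It could be called as `extract_perfect("Text File.txt", "perfect_lines.txt", "load_indices.txt")`, where the two output file names are placeholders. Opening with `"w"` overwrites any existing file of that name; `"a"` would append instead.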
 

Thread Starter

JimmyCho

Joined Aug 1, 2020
109
Appreciated! To write to a specific txt output file instead, what should I do? What should I replace print with in order to generate and write into a txt file?
 

djsfantasi

Joined Apr 11, 2010
9,163
I’m not sending my code, because you now have sample code in Python. Good luck.

UPDATE: I tried to find the code on my laptop. I remembered that it was written in FreeBASIC. But my laptop is “hosed”: each click takes 5-10 minutes to execute, so I couldn’t access the code.
 