# Returning required subString from a given string

#### JimmyCho

Joined Aug 1, 2020
97
Hi guys !
In brief, I want to build a function in c++ that it's input is a given string with size 300, and the second input is the specified substring. this function is finds if there's my inversion of a given substring (second input) is found in a given string (first input) and if it's found then the function returns new string which its beginIndex is starting from the last index of appearance/occurrence of a substring in my given string and the endIndex of the returning new string is (last index of apperance/occurance of substring + 10offset).
assumptions :
1)the string is binary data this mean "0101111......111"
2)my substring-first input- is constant string and it's "1010" and my string -second input- is always starts with "1010" and could be repeated many times continuously in my string like "101010101010 .. " here 3 times repeated .. , so because my substring is constant which it's "1010", the inversion substring is alwaso constant which it's the invert of "1010" .... in my acase the inversion substring is "0101".
3) the substring is constant and already known in my question it's "0101" so the inversion substring string is also already known and constant which it's "1010" (the inversion of "0101").

According to what I explained above, I will clarify more by an example :
lets assume I have string like this "10101010101001010000000111" , and my substring is as what I said above constant "1010" , I want by my function coded in c++ to search for first occurrence of the inversion of my substring ("1010") which it's "0101" -I marked it by deep black color in my example- and then I will retrun as new string the followed 10 offset data from the last index of occurrence of my inversion substring, in my example I will return
"0000000111" , in other words I will step 10 offset in my string from last index of occurrence of my inversion substring (what I marked in red color)

I hope I explained my problem in detail, and just a note the number of occurences of my substring is uncountable .. could be repeated many times in my given string ... but once the inversion substring detected I must return the new string according to what I explained above, it could be more than inversion words repeated so I need to return all the new strings that are followed by the inversion substrings, for instance:
"1010101010100101000000011110101010101001010000001111" here there's two times my inversion substring occurred,

so I need to return followed 10 offset data of my string (the red data) , the output is an array of string (what I marked in red in my string example)
: {"0000000111", "0000001111" }

If the searched inversion substring in my string not found, then return -1 or whatever logically acceptable ..

thanks alot for any help/assistance guys!

#### WBahn

Joined Mar 31, 2012
26,398
It sounds like the first thing you need to do is decide what this function is supposed to return. Is it returning a string? Is it returning an array of strings? Is it returning an error code?

You need to decide up front what type of data is going to be returned and then make sure that you can return an appropriate value of that type.

You also have conflicting information. In part of your description you said that you want to search for first occurrence, but then you talk about 10 offset from the last index of occurrence, and then you talk about returning each occurrence.

You really need to clearly decide exactly what it is you want to do.

For instance, given the string that you used in your example, which of the following should and should not be returned and why?

1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111

What, if anything, should be returned if there aren't ten characters in the string following the substring being sought?

#### JimmyCho

Joined Aug 1, 2020
97
It sounds like the first thing you need to do is decide what this function is supposed to return. Is it returning a string? Is it returning an array of strings? Is it returning an error code?

You need to decide up front what type of data is going to be returned and then make sure that you can return an appropriate value of that type.

You also have conflicting information. In part of your description you said that you want to search for first occurrence, but then you talk about 10 offset from the last index of occurrence, and then you talk about returning each occurrence.

You really need to clearly decide exactly what it is you want to do.

For instance, given the string that you used in your example, which of the following should and should not be returned and why?

1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111

What, if anything, should be returned if there aren't ten characters in the string following the substring being sought?
Sorry about the missing understandings, Im not searching for just first occurrence otherwise Im searching for any occurrences once appeared ... at every occucurrence I want to retrun a string that's length 10 (10 offset from the last index of occurrence)
you can assume there's a fixed input this means that always there's 10 offset data from the last index of occurrence the substring.
the output is an array of strings, which it depends on how many times the inversion substring that I explained above is occurred/appeared in my given string.
my function is outputs the strings (array of strings) which every string is 10 offset from the last index of every occurrence of my inversion substring...

if there's no occurrence of inversion substring then return -1 or Null or any logically returned value ..

#### JimmyCho

Joined Aug 1, 2020
97
About the given string, assume a fixed correctness on the given string, this means in the given string always if appears/occurs the substring there's 10 offset data from the last index of occurrence the substring in my string and couldn't happen any next occurrence in those 10 offset data that's found from the last index of substring's occurrence....

#### WBahn

Joined Mar 31, 2012
26,398
Could you please answer the question of which of those strings listed should be returned and for any that shouldn't, exactly why not?

#### JimmyCho

Joined Aug 1, 2020
97
It sounds like the first thing you need to do is decide what this function is supposed to return. Is it returning a string? Is it returning an array of strings? Is it returning an error code?

You need to decide up front what type of data is going to be returned and then make sure that you can return an appropriate value of that type.

You also have conflicting information. In part of your description you said that you want to search for first occurrence, but then you talk about 10 offset from the last index of occurrence, and then you talk about returning each occurrence.

You really need to clearly decide exactly what it is you want to do.

For instance, given the string that you used in your example, which of the following should and should not be returned and why?

1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111
1010101010100101000000011110101010101001010000001111

What, if anything, should be returned if there aren't ten characters in the string following the substring being sought?
Aha now I got you, sorry for the miss understandings, the inversion of my substring which it's "1010" must occurred immediately after occurrences of my substring "0101", so in your given strings there're not allowed to be as inputs to me function according to what I said, more over it must first appear at least once "0101" and then the inversion of it which it's "1010" , this means can't appear "1010" immediately without at least once appeared "0101". if the inversion substring "1010" occurred in my given string and the substring "0101" wasn't occurred once, then I wouldn't return anything .. the function return an empty string. In addition there's always 10offest data from the last occurrence of my inversion substring , this means if my inversion substring occurred then there's 10 offest data found from the last index of my inversion substring occurrence .

I will try to update your examples and answer what's the the output of my function and why:

one example: 1010101010101101011110101010101001010000001111 , the output is an array of strings :{"1111010101"} (what I marked in blue marker) why? because we start from left to right on the given string, we at the beginning of string it starts with the substring "0101" three times then immediately followed by the inversion of my substring which it's "1010" so I return 10 offset from the last index of occurrence of my inversion substring("1010") ... the red color is showing where my inversion substring appeared immediately after times of occurrences of my substring "0101" .

second example: 10101010110100000011110101010101001010000001111, the output is an array of strings: {"0000011110"} (what I marked in blue marker) why? because we start from left to right on the given string, we at the beginning of string it starts with the substring "0101" two times then immediately followed by the inversion of my substring which it's "1010" so I return 10 offset from the last index of occurrence of my inversion substring("1010") ... the red color is showing where my inversion substring appeared immediately after times of occurrences of my substring "0101" .

third example: 101010101010110101111010101010100101000000111110101010110100000011110101010101001010000001111
here the output is an array of strings :{"1111010101", "0000011110"}-those substrings are the same what I marked in the blue color in my string , first string in my output array is first blue marked substring in my given string, because inversion of substring("0101") which it's "1010" appeared so 10offset data after last index of inversion string occurrence("1010") is "1111010101" ....if we complete we see that there's another replication of "0101" and immediately we see "1010" after the replication finished .. so the output will also include in the second array element the 10 offset data from last occurrence of "1010" in the second time its occurred ..

fourth example:
11111111111111111111010101 output : {} empty array , here "1010" occurred but there's no occurrence of "0101" (at least one) happen before occurrence of "1010" ..so the output is empty array..

hope now it's more clear

#### WBahn

Joined Mar 31, 2012
26,398
That's a bit clearer, but still need to get it nailed down a bit tighter.

You are saying that the inverted string must appear after the non-inverted string. But does it have to appear immediately after the non-inverted string, or just at some point after it?

If it must appear immediately after it, then you might be able to tighten things up a lot by saying that given a string s, define (~s) to be the inverse of s and w = s·(~s) to be s followed by its inverse.

So if s = 0101, then w = 01011010

What you want to do is then find each occurrence of w and return the ten characters immediately following it.

This still has some issues:

What if there are fewer than ten characters following it?

Example: 11110101101011010?

Do you return anything?

What if things overlap?

Example: 11101011010110101101000101

What should be returned?

#### JimmyCho

Joined Aug 1, 2020
97
That's a bit clearer, but still need to get it nailed down a bit tighter.

You are saying that the inverted string must appear after the non-inverted string. But does it have to appear immediately after the non-inverted string, or just at some point after it?

If it must appear immediately after it, then you might be able to tighten things up a lot by saying that given a string s, define (~s) to be the inverse of s and w = s·(~s) to be s followed by its inverse.

So if s = 0101, then w = 01011010

What you want to do is then find each occurrence of w and return the ten characters immediately following it.

This still has some issues:

What if there are fewer than ten characters following it?

Example: 11110101101011010?

Do you return anything?

What if things overlap?

Example: 11101011010110101101000101

What should be returned?
for your first question, it must appear immediately after the non-inverted sub string, not at somepoint after the non-inverted substring..

following you next explanation that's included w/s/s·(~s) .. it's right exactly that's what I want to implement ..

following your next question about if it's fewer than ten character, this case couldn't happen, because I said assume a correct
willingness input, this means if the inverted occurred in a string then there's at least following 10 ten characters .. so there's no case if it's fewer according to a correct willingness input (I mean by this that there's like assumption which the inputs must have at least 10 ten characters followed by the inverted substring ... so must the input string must has at least 10 ten characters followed by inverted substring according to my explanation .. ) . moreover, when the input is according to what I explained ...I return 10 ten following characters from last index of occurrence of my inverted substring.

Following your last question, this is gooooooooood question!
for me just the first occurrence of my inverted sub string is significant and what's following are considered characters or data that I return implicitly from them 10 ten characters because they are following the inverted substring ..so in your example I return 11101011010110101101000101 .. so in your example I return what I marked in blue (following ten characters after occurrence my inverted substring) : 1101011010 .
once again if after those at least 10 ten characters occurred the inverted substring then I must also return what's following it, and ofcourse it follows at least 10 ten characters ..

to overall, once the inverted substring occurred there's at least follow 10 ten characters and doesn't matter where it occurs in my given string, there's definitely if occurred there's at least 10 characters following it.

I hope my explanation is clearly understandable.

#### WBahn

Joined Mar 31, 2012
26,398
It's actually very hard to follow your explanations, but I suspect most of that is simply English as a second language issues, which we will work through as needed.

So it sounds like your system is the following:

Given a string s, string y is any string of the form

y = (s)·(~s)·(p) such that |p| = 10

Given an input string x, identify each occurrence of y that is a substring of x and return a list consisting of each string p from each occurrence of y.

Assume that no two occurrences of y overlap and that any occurrence of (s)(~s) will be following by a valid string p (i.e., a string consisting of ten characters).

#### JimmyCho

Joined Aug 1, 2020
97
It's actually very hard to follow your explanations, but I suspect most of that is simply English as a second language issues, which we will work through as needed.

So it sounds like your system is the following:

Given a string s, string y is any string of the form

y = (s)·(~s)·(p) such that |p| = 10

Given an input string x, identify each occurrence of y that is a substring of x and return a list consisting of each string p from each occurrence of y.

Assume that no two occurrences of y overlap and that any occurrence of (s)(~s) will be following by a valid string p (i.e., a string consisting of ten characters).
exactly!

#### WBahn

Joined Mar 31, 2012
26,398
So now that we have a clear and concise description of the problem, can you get a start on implementing a solution?

Take it one piece at a time. I'd recommend first writing a function that accepts a pointer to a string and returns the index of either the first occurrence of (s)·(~s) within that string or the index of the first character after the first occurrence. If no occurrence is found, return -1. With this function in hand, write another function that goes through the string and prints out the indices of all of the strings p that should be returned. Then modify that function to print out all of the strings themselves, then modify that function to return a list of all of those strings (returning NULL of the list is empty).

#### andrewmm

Joined Feb 25, 2011
1,456
just for clarity, by inverted, which invert do you mean, left to right or bit inverted ?

i.e. 1001 inverted by bit is 0110 , by left right is 1001 .
1100 inverted by bit is 0011 , by left right is 0011
1000 inverted by bit is 0111, by left right is 0001

#### WBahn

Joined Mar 31, 2012
26,398
just for clarity, by inverted, which invert do you mean, left to right or bit inverted ?

i.e. 1001 inverted by bit is 0110 , by left right is 1001 .
1100 inverted by bit is 0011 , by left right is 0011
1000 inverted by bit is 0111, by left right is 0001
He defines this in a previous post. His strings consist of nothing but the characters '0' and '1' and each is the inverse of the other.

#### JimmyCho

Joined Aug 1, 2020
97
So now that we have a clear and concise description of the problem, can you get a start on implementing a solution?

Take it one piece at a time. I'd recommend first writing a function that accepts a pointer to a string and returns the index of either the first occurrence of (s)·(~s) within that string or the index of the first character after the first occurrence. If no occurrence is found, return -1. With this function in hand, write another function that goes through the string and prints out the indices of all of the strings p that should be returned. Then modify that function to print out all of the strings themselves, then modify that function to return a list of all of those strings (returning NULL of the list is empty).
Yes ofcourse I didn't just fall up on this forum without attempting to code, what I tried is this but it doesn't work as I want:
Code:
using Match = std::air<std::string::size_type, std::string>;
using MatchVec = std::vector<Match>;

MatchVec findPattern(const std::string &str, const std::string &pattern)
{
MatchVec vec;
Match m;

std::string::size_type index, pos = 0;

while ((index = str.find(pattern, pos)) != std::string::npos) {
m.first = index;
index += pattern.length();
m.second = str.substr(index, 10);
vec.push_back(m);
pos = index;
}
return vec;
}

int main()
{
std::string str1 ="101010101010110101111010101010100101000000111110101010110100000011110101010101001010000001111 ";
std::string str2 = "1010"; //this is the inverted syncword

auto matches = findPattern(str1, str2);
for(auto &match : matches) {
std::cout << "Match found at position: " << match.first << std::endl;
std::cout << "String is " << match.second << std::endl;
}
return 0;
}

#### andrewmm

Joined Feb 25, 2011
1,456
Code is clear, but is it efficient ?
a sting uses 8 bits to store a 1 or 0,

I know, I'm old, don't program in C to much, like elegant solutions if its in a function that's re used with out modifications,
and just wonder if a mathematical method might work ?

wondering what would happen if things were stored compressed into bytes, and one did a bunch or OR / ands.

probably off topic, but a thought,,

#### JimmyCho

Joined Aug 1, 2020
97
Code is clear, but is it efficient ?
a sting uses 8 bits to store a 1 or 0,

I know, I'm old, don't program in C to much, like elegant solutions if its in a function that's re used with out modifications,
and just wonder if a mathematical method might work ?

wondering what would happen if things were stored compressed into bytes, and one did a bunch or OR / ands.

probably off topic, but a thought,,
Sorry OP , I didn't understand you at all and I don't know what you're talking about, sounds you're not understanding my problem ..

#### WBahn

Joined Mar 31, 2012
26,398
I'm not very familiar with C++ specific syntax, so it's a bit hard for me to be sure of what you are doing. But while I see where you define your "inverted" sync word, I don't see where you are doing anything with the non-inverted sync word.

It also appears that your findPattern() function requires that two values become exactly equal in order for the while loop to terminate, but I don't see how that is guaranteed to actually happen.

#### xox

Joined Sep 8, 2017
563
Yes ofcourse I didn't just fall up on this forum without attempting to code, what I tried is this but it doesn't work as I want:
Code:
using Match = std::air<std::string::size_type, std::string>;
using MatchVec = std::vector<Match>;

MatchVec findPattern(const std::string &str, const std::string &pattern)
{
MatchVec vec;
Match m;

std::string::size_type index, pos = 0;

while ((index = str.find(pattern, pos)) != std::string::npos) {
m.first = index;
index += pattern.length();
m.second = str.substr(index, 10);
vec.push_back(m);
pos = index;
}
return vec;
}

int main()
{
std::string str1 ="101010101010110101111010101010100101000000111110101010110100000011110101010101001010000001111 ";
std::string str2 = "1010"; //this is the inverted syncword

auto matches = findPattern(str1, str2);
for(auto &match : matches) {
std::cout << "Match found at position: " << match.first << std::endl;
std::cout << "String is " << match.second << std::endl;
}
return 0;
}
It doesn't work because your specifications is too ambiguous. The pattern "1010" appears in the target string many times throughout its length.

What exactly are you trying to achieve anyway, what is the actual application?