What's the longest rotational word in the English language?

MrAl

Joined Jun 17, 2014
8,065
The first example is a palindrome. The second involves inversion of the word, but the words are different, so it is a semordnilap, not a palindrome.

A rotate keeps the same order of letters for those that are rotated. It is not reading the word backwards. For example:

"abcdefgh" rotated right (last letter to first letter) 4 times gives "efghabcd". It is like the "swapf" PIC instruction.

Here's some enhanced midrange code that does the same:

Code:
     movlw     b'10000001'    ;WREG = b'10000001'
     bsf       STATUS,0
     btfss     WREG,0
     bcf       STATUS,0
     rrf       WREG
     bra       $-4            ;after 4 rotations WREG = b'00011000'
     nop

;alternatively
     movlw     b'10000001'    ;WREG = b'10000001'
     swapf     WREG
     nop                      ;WREG = b'00011000'
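
As a cross-check on the claim that four right rotations match swapf, here is a minimal Bash sketch (not PIC code). The function names rotate_right8 and swap_nibbles are invented for illustration, and it assumes Bash arithmetic with the 2# binary literal syntax:

```bash
#!/bin/bash

# rotate_right8 VALUE COUNT: rotate the low 8 bits of VALUE right by COUNT bits.
# Hypothetical helper: models a plain 8-bit rotate, which is what the PIC loop
# above achieves by preloading the carry flag with bit 0 before each rrf.
rotate_right8() {
    local v=$(( $1 & 0xFF )) n=$(( $2 % 8 ))
    echo $(( ((v >> n) | (v << (8 - n))) & 0xFF ))
}

# swap_nibbles VALUE: exchange the high and low 4 bits, as swapf does.
swap_nibbles() {
    local v=$(( $1 & 0xFF ))
    echo $(( ((v >> 4) | (v << 4)) & 0xFF ))
}

rotate_right8 $(( 2#10000001 )) 4    # prints 24, i.e. b'00011000'
swap_nibbles  $(( 2#10000001 ))      # prints 24 as well
```

Both calls give the same value for any 8-bit input, since rotating by half the register width is the same as swapping its two halves.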

So I guess what you are saying then is that centroidal rotations are not allowed, just character-by-character rotations.

But I recently found in the Bible Code that centroidal rotations are allowed (ha ha).

Oh is haha a word? How about hahaha, or hahahahahahahahahahaha, etc.
 

MrAl

Joined Jun 17, 2014
8,065
Hello again,

I could write up a program to check rotations and hits in maybe 1/2 hour, but I have trouble finding a practical application for this.

Then I thought of one...maybe...

You know sometimes passwords can be hard to remember especially when you have a lot of them. So how about taking a regular phrase or word and rotating it to make it more obscure?

A really dumb example is:
"PASSWORD"

Obviously nobody would want to use that, or at least I hope not (ha ha).
But rotate it by some number you prefer and you have maybe a real password.
For that example, if we rotate by n=3 on the left then we get:
"SWORDPAS"

or if we rotate it by n=3 from the right we get:
"ORDPASSW"

Now when you go to recall your password, you remember "PASSWORD" with no problem; then you just have to remember which 'n' you used to rotate it, and you have your real password again.

These examples are still too simple but I think the illustration is clear.
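
For anyone who wants to try the two rotations above, a minimal Bash sketch follows. The function names rotate_left and rotate_right are made up for illustration, and the sketch assumes a non-empty word and a non-negative count:

```bash
#!/bin/bash

# rotate_left WORD N: move the first N characters of WORD to the end.
# Assumes WORD is non-empty and N >= 0 (N is reduced modulo the word length).
rotate_left() {
    local word=$1 len=${#1} n
    n=$(( $2 % len ))
    printf '%s\n' "${word:n}${word:0:n}"
}

# rotate_right WORD N: move the last N characters of WORD to the front.
# Implemented as a left rotation by (length - N).
rotate_right() {
    local word=$1 len=${#1}
    rotate_left "$word" $(( len - $2 % len ))
}

rotate_left  PASSWORD 3    # prints SWORDPAS
rotate_right PASSWORD 3    # prints ORDPASSW
rotate_right abcdefgh 4    # prints efghabcd
```

Note that a word of length L has only L distinct rotations, so the 'n' adds very little secrecy on its own.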
 

jpanhalt

Joined Jan 18, 2008
11,088
MrAl said:
So i guess what you are saying then is that centroidal rotations are not allowed. Just character by character rotations.
That's for WBahn to define.

In my opinion, the intent was character-by-character rotations, or group rotations that can be obtained by character-by-character rotation (e.g., the swapf instruction: the last four letters become the first four letters in the same order). Rotating whole words, as in housework >> workhouse, was later considered a weak example. NB: Those examples all fit the single-character rotation intent.

To make it clear, given the word, "abcdefgh," what rotamers* do you consider valid?

*Words obtained by rotation by analogy to the use of that term in chemistry.
 

joeyd999

Joined Jun 6, 2011
4,477
MrAl said:
You know sometimes passwords can be hard to remember especially when you have a lot of them. So how about taking a regular phrase or word and rotating it to make it more obscure?
I prefer phrases I can remember:

"I ate three dollars worth of cookies before bedtime!"

I83$wocb4bt!
 

MrAl

Joined Jun 17, 2014
8,065
jpanhalt said:
To make it clear, given the word, "abcdefgh," what rotamers* do you consider valid?

*Words obtained by rotation by analogy to the use of that term in chemistry.
I was thinking of other rotations too, like badcfehg, but maybe it's better to stick with the single char-by-char rotations as suggested by WBahn.
I think there would be too many variations if we don't draw the line somewhere. That's partly why I brought up the so-called Bible Code, because there you can find just about anything you want to find depending on what chars you skip, etc.
 

joeyd999

Joined Jun 6, 2011
4,477
A set of easily-remembered passwords.
In that case:

"The dog ate my secure password for AAC." -- Td8mspw4AAC.
"The dog ate my secure password for Google." -- Td8mspw4G.
"The dog ate my secure password for the New York Times." -- Td8mspw4tNYT.

etc.
 

joeyd999

Joined Jun 6, 2011
4,477
Are you sure?
To find JOEY 1178 chars had to be skipped.
I mean -- no skipping between characters. Yes, you have to index to the first character of the string you are looking for.

Theoretically, if pi is a normal number (widely conjectured but unproven), all the works of Shakespeare should be hidden somewhere in its infinite string of digits.
 

joeyd999

Joined Jun 6, 2011
4,477
Here is my final code. A few changes:

1. I changed the source dictionary to usa2.txt from http://www.gwicks.net/dictionaries.htm. It is a USA English dictionary with no nonsense (AFAICT). Note that "sexploitation" is not a word in this dictionary.
2. I added code to pre-sanitize the dictionary file, allowing only whole words with only two or more lowercase letters. Presumably, this rejects acronyms, abbreviations, hyphenated words, and proper nouns.
3. Rather than grepping the whole file for each dictionary word, I now only grep the lines after the line containing the current word being grepped. This reduces the search space by one line each iteration, and avoids the "weewee"s.
4. Since I pre-sanitize, I don't need to check each word for validity prior to each grep.

For the usa2.txt dictionary, the entire run time is:

real 8m46.443s
user 8m43.483s
sys 2m58.861s

Resulting output file attached.

@nsaspook: please run this on your best box. And report back the execution time. Thanks.

Bash:
#!/bin/bash

#Rotodrome Finder
#JoeyD999@AAC
#December 2019

#Select desired dictionary (or add your own)

#SOURCEDICT=/usr/share/dict/words    #Ubuntu Linux Dictionary
#SOURCEDICT=words.txt                #from https://github.com/dwyl/english-words -- contains nonsense, abbreviations, proper names, and acronyms.
SOURCEDICT=usa2.txt                  #from http://www.gwicks.net/dictionaries.htm -- just USA English words

DICT=infile.txt
TEMP=tempfile.txt
OUT=outfile.txt

#Clean start -- delete old temporary and output files

rm -f $TEMP $OUT

#avoid file not found error during dup check

touch $TEMP

#Sanitize SOURCEDICT to DICT:  Create sorted DICT with words length >= 2 that contain only lowercase english alphabet letters

grep -E "^[a-z][a-z]+$" $SOURCEDICT | sort > $DICT

#Set up a source dictionary line counter to limit search space

LINE=1

#Iterate over each word in the dictionary file

cat "$DICT" | while read -r WORD || [[ -n $WORD ]]; do

    LINE=$((LINE+1))    #search space is all source file lines below this one
    LENGTH=${#WORD}     #get word length

    if ! grep -iwq "$WORD" "$TEMP"; then    #only process words that do not exist in $TEMP file

        #Construct a regexp to compare all rotations simultaneously
        #i.e. "joeyd" => "^(oeydj|eydjo|ydjoe|djoey)$"

        REGEX="^("

        for ((i=1 ; i < LENGTH; i++)); do

            if ((i>1)); then
                REGEX="${REGEX}|${WORD:i}${WORD:0:i}"
            else
                REGEX="${REGEX}${WORD:i}${WORD:0:i}"
            fi

        done

        REGEX="${REGEX})$"  #close the regexp
       
        #Match all lines in $DICT beginning with current word line+1

        unset MATCHLIST

        for MATCH in $(tail -n +"$LINE" "$DICT" | grep -iE "$REGEX"); do
            MATCHLIST="$MATCHLIST $MATCH"
        done
       
        #Append each successful set of matches to $TEMP file

        if [[ -v MATCHLIST ]]; then
            echo "$LENGTH $WORD $MATCHLIST" | tee -a $TEMP
        fi

    fi
done

#sort the final $TEMP into $OUT

sort -n $TEMP > $OUT
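
For comparison, here is a sketch of a different technique from the script above (it is not joeyd999's method): give every word a canonical key, namely its alphabetically smallest rotation, then sort on that key so that words which are rotations of each other end up on adjacent lines. The function names canonical_rotation and group_rotations are hypothetical, and the input is assumed to be an already sanitized file with one lowercase word per line:

```bash
#!/bin/bash

# canonical_rotation WORD: print the smallest rotation of WORD in string order.
# Two words are rotations of each other exactly when they share this key.
canonical_rotation() {
    local word=$1 best=$1 rot i
    for ((i = 1; i < ${#word}; i++)); do
        rot=${word:i}${word:0:i}
        if [[ $rot < $best ]]; then
            best=$rot
        fi
    done
    printf '%s\n' "$best"
}

# group_rotations FILE: print one line per set of two or more words in FILE
# that are rotations of each other (assumes one lowercase word per line).
group_rotations() {
    local word
    while read -r word; do
        printf '%s %s\n' "$(canonical_rotation "$word")" "$word"
    done < "$1" | LC_ALL=C sort | awk '
        $1 != key { if (n > 1) print line; key = $1; line = $2; n = 1; next }
        { line = line " " $2; n++ }
        END { if (n > 1) print line }'
}
```

Called as group_rotations infile.txt, this makes a single pass over the dictionary plus one sort, instead of one grep per word. A word that is only a rotation of itself, like "weewee", forms a group of one and is not printed.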
 


Thread Starter

WBahn

Joined Mar 31, 2012
26,398
Thanks, @joeyd999.

Your output reveals another pretty obvious weak class -- words that start with 's' and produce a plural ending in 's' when rotated left by 1. I don't think this is as weak as compound words since the two words are much more independent. But it does seem like starting with an 's' stacks the odds significantly.
 

nsaspook

Joined Aug 27, 2009
8,168
joeyd999 said:
@nsaspook: please run this on your best box. And report back the execution time. Thanks.
For multi-processor boxes to make much improvement on your script, it has to be converted into a form that GNU Parallel can use.
https://en.wikipedia.org/wiki/GNU_parallel
https://www.gnu.org/software/parallel/parallel_tutorial.html
 

MrAl

Joined Jun 17, 2014
8,065
joeyd999 said:
I mean -- no skipping between characters. Yes, you have to index to the first character of the string you are looking for.

Theoretically, all the works of Shakespeare should be hidden somewhere in the infinite string of Pi characters.
Hi,

Yes, so you know what that in turn means right?
That means that an infinite number of monkeys that were given an infinite amount of time created the constant we now know as pi :)
 

MrAl

Joined Jun 17, 2014
8,065
No thanks, I've real problems to work on.
Hi,

Just break the dictionary up into N roughly equal parts, where N is the number of cores (and therefore of sub-dictionaries). Then run the same program on each core, each with a different output file and one of the sub-dictionaries as its word list. That way each core operates on a different section of the dictionary. The hit-test dictionary stays as the original, though.
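
The split-per-core idea can be sketched in Bash roughly as follows. This is only an illustration under stated assumptions: find_rotations and parallel_rotations are hypothetical names, the dictionary is assumed to be sanitized already (one lowercase word per line), and "split -n l/N" is the GNU coreutils form, so it may not exist on other systems:

```bash
#!/bin/bash

# find_rotations CHUNK DICT: for each word in CHUNK, print the word followed by
# every other word in DICT that is a rotation of it (nothing if there are none).
find_rotations() {
    local chunk=$1 dict=$2 word regex matches i
    while read -r word; do
        regex=""
        for ((i = 1; i < ${#word}; i++)); do
            regex+="${regex:+|}${word:i}${word:0:i}"
        done
        [[ -n $regex ]] || continue    # skip one-letter words
        if matches=$(grep -E "^($regex)$" "$dict" | grep -vxF "$word" | paste -sd ' ' -) \
                && [[ -n $matches ]]; then
            printf '%s %s\n' "$word" "$matches"
        fi
    done < "$chunk"
}

# parallel_rotations DICT N: split DICT into N line-aligned parts, search each
# part in its own background job against the full DICT, then join the outputs.
parallel_rotations() {
    local dict=$1 n=$2 workdir part
    workdir=$(mktemp -d)
    split -n "l/$n" "$dict" "$workdir/part."    # GNU split: N chunks, lines kept whole
    for part in "$workdir"/part.*; do
        find_rotations "$part" "$dict" > "$part.out" &
    done
    wait                                        # block until every job has finished
    cat "$workdir"/part.*.out
    rm -rf "$workdir"
}
```

A typical call would be parallel_rotations infile.txt "$(nproc)". Because every part is tested against the whole dictionary, each pair shows up twice (once from each word), so the combined output would still need a de-duplication pass.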
 