# What's the longest rotational word in the English language?

#### joeyd999

Joined Jun 6, 2011
4,477
That means that an infinite number of monkeys that were given an infinite amount of time created the constant we now know as pi
That is deep. Figuratively and literally.

#### MrAl

Joined Jun 17, 2014
8,069
I mean, if you memorise one rotational word to use as a base password, the others are easily derived by simple rotation, so all you have to remember for those is e.g. base>>3.
Hi,

Oh yes i see now. That would help too.
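A quick sketch of that derivation, reading base>>3 as "rotate the base word left by 3 letters". The `rotate` helper is hypothetical, invented for illustration, not code from the thread:

```shell
#!/bin/bash

#Hypothetical helper (not from the thread): left-rotate a word by k letters,
#using bash substring expansion to move the first k characters to the end.
rotate() {
    local word=$1 k=$2
    echo "${word:k}${word:0:k}"
}

rotate "dioramas" 3     # -> ramasdio
```

With a rotational word as the base, each rotation is itself a valid word, which is what makes the scheme memorable.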

#### MrAl

Joined Jun 17, 2014
8,069
That is deep. Figuratively and literally.
Hi,

Yeah, but i feel sorry for whoever had to deal with what must have been a LOT of feces

#### joeyd999

Joined Jun 6, 2011
4,477

#### MrAl

Joined Jun 17, 2014
8,069
Here is my final code. A few changes:

1. I changed the source dictionary to usa2.txt from http://www.gwicks.net/dictionaries.htm. It is a USA English dictionary with no nonsense (AFAICT). Note that "sexploitation" is not a word in this dictionary.
2. I added code to pre-sanitize the dictionary file, allowing only whole words with only two or more lowercase letters. Presumably, this rejects acronyms, abbreviations, hyphenated words, and proper nouns.
3. Rather than grepping the whole file for each dictionary word, I now only grep the lines after the line containing the current word being grepped. This reduces the search space by one line each iteration, and avoids the "weewee"s.
4. Since I pre-sanitize, I don't need to check each word for validity prior to each grep.
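The step-2 sanitizing filter can be tried on its own; the sample words below are invented for illustration:

```shell
#!/bin/bash

#Illustrative run of the step-2 sanitizer: only whole words of two or more
#lowercase letters survive; acronyms, hyphenated words, one-letter words,
#and capitalized proper nouns are all rejected.
printf '%s\n' "NASA" "e-mail" "Boston" "a" "tables" "ok" \
    | grep -E '^[a-z][a-z]+$' | sort
# -> ok
# -> tables
```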

For the usa2.txt dictionary, the entire run time is:

real 8m46.443s
user 8m43.483s
sys 2m58.861s

Resulting output file attached.

@nsaspook: please run this on your best box and report back the execution time. Thanks.

Bash:
#!/bin/bash

#Rotodrome Finder
#JoeyD999@AAC
#December 2019

#SOURCEDICT=/usr/share/dict/words                                #Ubuntu Linux Dictionary
#SOURCEDICT=words.txt                           #from https://github.com/dwyl/english-words -- contains nonsense, abbreviations, proper names, and acronyms.
SOURCEDICT=usa2.txt                                                            #from http://www.gwicks.net/dictionaries.htm -- just USA English words

DICT=infile.txt
TEMP=tempfile.txt
OUT=outfile.txt

#Clean start -- delete old temporary and output files

rm -f $TEMP $OUT

touch $TEMP

#Sanitize SOURCEDICT to DICT: Create sorted DICT with words length >= 2
#that contain only lowercase english alphabet letters

grep -E "^[a-z][a-z]+$" $SOURCEDICT | sort >$DICT

#Set up a source dictionary line counter to limit search space

LINE=1

#Iterate over each word in the dictionary file

cat $DICT | while read WORD || [[ -n $WORD ]]; do

LINE=$(($LINE+1))                                                            #search space is all source file lines below this one
LENGTH=${#WORD}                                                            #get word length

if (! grep -iwq $WORD $TEMP); then                      #only process words that do not exist in $TEMP file

#Construct a regexp to compare all permutations simultaneously
#i.e. "joeyd" => "^(oeydj|eydjo|ydjoe|djoey)$"

REGEX="^("
for ((i=1 ; i < LENGTH; i++)); do
    if ((i>1)); then
        REGEX=${REGEX}\|${WORD:$i}${WORD:0:((i))}
    else
        REGEX=${REGEX}${WORD:$i}${WORD:0:((i))}
    fi
done
REGEX="${REGEX})$"                                                            #close the regexp

#Match all lines in $DICT beginning with current word line+1

unset MATCHLIST

for MATCH in $(tail -n +$LINE $DICT | grep -iE $REGEX); do
    MATCHLIST="$MATCHLIST $MATCH"
done

#Append each successful set of matches to $TEMP file

if [[ -v MATCHLIST ]]; then
echo "$LENGTH $WORD$MATCHLIST" | tee -a $TEMP
fi

fi
done

#sort the final $TEMP into$OUT

sort -n $TEMP >$OUT
Another question that comes up is if we have "tables" and find "stable" should we list:
"stable tables"

alone or also list:
"tables stable"

as a separate entry, since both words have their valid rotations, so maybe that counts as 2 instead of just 1.
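As a side note, the rotation test itself does not need the alternation regexp: a word is a rotation of another exactly when the lengths match and it appears inside the other word doubled. A sketch of that check (the `is_rotation` helper is mine, not from the script above):

```shell
#!/bin/bash

#Hypothetical rotation test via the "doubling trick" (not from the script):
#b is a rotation of a iff lengths match and b occurs inside a repeated twice.
is_rotation() {
    local a=$1 b=$2
    [[ ${#a} -eq ${#b} && "$a$a" == *"$b"* ]]
}

is_rotation "tables" "stable" && echo "rotation"        # -> rotation
is_rotation "tables" "basket" || echo "not a rotation"  # -> not a rotation
```

For a dictionary-scale run, the alternation regexp lets a single grep scan the whole file in one pass, so this is a per-pair alternative rather than a drop-in replacement.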

#### joeyd999

Joined Jun 6, 2011
4,477
Another question that comes up is if we have "tables" and find "stable" should we list:
"stable tables"

alone or also list:
"tables stable"

as a separate entry, since both words have their valid rotations, so maybe that counts as 2 instead of just 1.
This test:

Bash:
if (! grep -iwq $WORD $TEMP); then                      #only process words that do not exist in $TEMP file
explicitly prevents such (IMO) duplication. Remove it if you wish.

#### MrAl

Joined Jun 17, 2014
8,069
I found the following alpha distribution of finds:

Code:
 a   b   c   d   e   f  g   h   i  j  k   l   m  n   o   p  q   r   s    t   u   v   w  x  y  z
{315,147,243,145,287,65,89,201,132,7,88,232,163,91,212,286,22,213,1166,455,104,39,173,7,49,12}

Clearly, words that start with 's' are most abundant, while 't' is second and 'a' is third. Looks like the 'x' group came in because of some Roman numerals.

This distribution comes from counting each found rotation, so "abc bca" would come out to two entries, not one, where one word begins with 'a' and the other word begins with 'b', and so would list as:

abc bca
bca abc

so the distribution counts would increase by 1 for 'a' and by 1 for 'b' (if those were real words, of course).

SIDE NOTE: For some reason, every time i edit this post i have to put the 'space' back in front of the 'a' in the first line. Editing removes that first space every time.

Last edited:

#### joeyd999

Joined Jun 6, 2011
4,477
Attached is the output of the big dictionary at https://github.com/dwyl/english-words after using the "sanitizing" code.

A glaring omission is "no" and "on". For some reason, the dictionary spells "no" with a capital "N", so it gets sanitized out.

The processing time was:

real 73m41.120s
user 72m17.594s
sys 20m28.459s

The longest words are:

11 anarthropod arthropodan
11 decimosexto sextodecimo
11 deviscerate eviscerated
11 ethological lethologica
11 grasshopper hoppergrass
11 housemother motherhouse
11 peculations speculation
11 stabulation tabulations
12 breakweather weatherbreak
12 manslaughter slaughterman
13 exploitations sexploitation

#### Attachments

• 37.5 KB Views: 0

#### MrAl

Joined Jun 17, 2014
8,069
Attached is the output of the big dictionary at https://github.com/dwyl/english-words after using the "sanitizing" code.
A glaring omission is "no" and "on". For some reason, the dictionary spells "no" with a capital "N", so it gets sanitized out.
I noticed they used capitals for some words, so i used:

List=lower(List)

which makes them all lower case before i start.

#### joeyd999

Joined Jun 6, 2011
4,477
I noticed they used capitals for some words, so i used:

List=lower(List)

which makes them all lower case before i start.
The point of "pre-sanitizing" the dictionary was to remove acronyms, abbreviations, hyphenated words, and proper nouns prior to the search -- to produce a result with less garbage. I could allow capital letters -- maybe only on the first letter (which will let in proper nouns).

The preceding code automatically ignores character case during the search -- without requiring a "lower()" function. That's what the -i option does in:

Bash:
for MATCH in $(tail -n +$LINE $DICT | grep -iE $REGEX); do
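The per-letter tally discussed earlier in the thread can be reproduced from an output file in the script's "<length> <word> <matches...>" line format. The demo file below is invented; substitute the real outfile.txt:

```shell
#!/bin/bash

#Invented demo input in the thread's output format: "<length> <word> <matches...>".
printf '%s\n' "3 abc bca" "3 bca abc" > outfile_demo.txt

#Tally how many listed words begin with each letter, counting every word on
#each line -- matching how the distribution in the post was described.
cut -d' ' -f2- outfile_demo.txt | tr ' ' '\n' | cut -c1 | sort | uniq -c
# -> 2 a
# -> 2 b
```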
I'll run it and report back later so you can see the difference.

#### joeyd999

Joined Jun 6, 2011
4,477
I'll run it and report back later so you can see the difference.
Attached is the output allowing initial capitalization (i.e. proper nouns and names).

Processing time:

real 98m55.727s
user 96m44.686s
sys 26m33.463s

Here's the code:

Bash:
#!/bin/bash

#Rotodrome Finder
#JoeyD999@AAC
#December 2019

#SOURCEDICT=/usr/share/dict/words                                #Ubuntu Linux Dictionary
SOURCEDICT=words.txt                           #from https://github.com/dwyl/english-words -- contains nonsense, abbreviations, proper names, and acronyms.
#SOURCEDICT=usa2.txt                                                            #from http://www.gwicks.net/dictionaries.htm -- just USA English words

DICT=infile.txt
TEMP=tempfile.txt
OUT=outfile.txt

#Clean start -- delete old temporary and output files

rm -f $TEMP $OUT

touch $TEMP

#Sanitize SOURCEDICT to DICT: Create sorted DICT with words length >= 2
#that contain only lowercase english alphabet letters
# -- or an initial capital (i.e. proper nouns)

grep -E "^[a-zA-Z][a-z]+$" $SOURCEDICT | sort >$DICT

#Set up a source dictionary line counter to limit search space

LINE=1

#Iterate over each word in the dictionary file

cat $DICT | while read WORD || [[ -n $WORD ]]; do

LINE=$(($LINE+1))                                                            #search space is all source file lines below this one
LENGTH=${#WORD}                                                            #get word length

if (! grep -iwq $WORD $TEMP); then                      #only process words that do not exist in $TEMP file

#Construct a regexp to compare all permutations simultaneously
#i.e. "joeyd" => "^(oeydj|eydjo|ydjoe|djoey)$"

REGEX="^("
for ((i=1 ; i < LENGTH; i++)); do
    if ((i>1)); then
        REGEX=${REGEX}\|${WORD:$i}${WORD:0:((i))}
    else
        REGEX=${REGEX}${WORD:$i}${WORD:0:((i))}
    fi
done
REGEX="${REGEX})$"                                                            #close the regexp

#Match all lines in $DICT beginning with current word line+1

unset MATCHLIST

for MATCH in $(tail -n +$LINE $DICT | grep -iE $REGEX); do
    MATCHLIST="$MATCHLIST $MATCH"
done

#Append each successful set of matches to $TEMP file

if [[ -v MATCHLIST ]]; then
echo "$LENGTH $WORD$MATCHLIST" | tee -a $TEMP
fi

fi
done

#sort the final $TEMP into$OUT

sort -n $TEMP >$OUT

#### Attachments

• 73.9 KB Views: 3

#### ballsystemlord

Joined Nov 19, 2018
39
If you're into palindromes, how about a book?
Lid Off A Daffodil A Book of Palindromes by John Pool Illustrations by Peter Brookes
Not that I'm really into such things...

#### MrAl

Joined Jun 17, 2014
8,069
If you're into palindromes, how about a book?
Lid Off A Daffodil A Book of Palindromes by John Pool Illustrations by Peter Brookes
Not that I'm really into such things...
Sounds like an interesting read on a cold winter's night, curled up in front of the fireplace.
Or you could just take a sleeping pill; it will work just as well.

#### atferrari

Joined Jan 6, 2004
4,251
@joeyd999
Processing time

Could you explain the difference between real, user and sys?

??

#### joeyd999

Joined Jun 6, 2011
4,477
@joeyd999
Processing time

Could you explain the difference between real, user and sys?

??
From https://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1

Real is wall clock time - time from start to finish of the call. This is all elapsed time including time slices used by other processes and time the process spends blocked (for example if it is waiting for I/O to complete).

User is the amount of CPU time spent in user-mode code (outside the kernel) within the process. This is only actual CPU time used in executing the process. Other processes and time the process spends blocked do not count towards this figure.

Sys is the amount of CPU time spent in the kernel within the process. This means executing CPU time spent in system calls within the kernel, as opposed to library code, which is still running in user-space. Like 'user', this is only CPU time used by the process.
Note that I ran the code on a quad-core Intel i7, which can run up to 8 threads simultaneously. Therefore, the combined User and Sys time can exceed the elapsed Real time.
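This effect is easy to see by timing two CPU-bound jobs run in parallel; on a multi-core machine the summed CPU time approaches double the wall-clock time. The workload below is invented purely for illustration:

```shell
#!/bin/bash

#Invented workload: two CPU-bound pipelines running in parallel. On a
#multi-core machine the summed user+sys CPU time that `time` reports
#can exceed the real (wall-clock) time, since both cores accrue CPU time
#during the same elapsed interval.
time ( seq 1 500000 | sha256sum > /dev/null &
       seq 1 500000 | sha256sum > /dev/null &
       wait )
```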