What's the longest rotational word in the English language?

MrAl

Joined Jun 17, 2014
8,063
Here is my final code. A few changes:

1. I changed the source dictionary to usa2.txt from http://www.gwicks.net/dictionaries.htm. It is a USA English dictionary with no nonsense (AFAICT). Note that "sexploitation" is not a word in this dictionary.
2. I added code to pre-sanitize the dictionary file, allowing only whole words with only two or more lowercase letters. Presumably, this rejects acronyms, abbreviations, hyphenated words, and proper nouns.
3. Rather than grepping the whole file for each dictionary word, I now grep only the lines after the line containing the current word. This shrinks the search space by one line each iteration and avoids duplicate self-matches like the "weewee"s.
4. Since I pre-sanitize, I don't need to check each word for validity prior to each grep.
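As a quick illustration of item 2, the sanitizing step is just a grep. Against a made-up mini-dictionary (the file name and contents here are hypothetical), it drops everything except whole words of two or more lowercase letters:

```shell
# Hypothetical mini-dictionary illustrating the sanitizing filter.
printf 'NASA\nco-op\nParis\na\ntables\nstable\n' > mini.txt

# Keep only whole words of two or more lowercase letters, sorted.
grep -E '^[a-z][a-z]+$' mini.txt | sort
# prints:
# stable
# tables
```

"NASA" (all caps), "co-op" (hyphen), "Paris" (initial capital), and "a" (one letter) are all rejected.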

For the usa2.txt dictionary, the entire run time is:

real 8m46.443s
user 8m43.483s
sys 2m58.861s

Resulting output file attached.

@nsaspook: please run this on your best box and report back the execution time. Thanks.

Bash:
#!/bin/bash

#Rotodrome Finder
#JoeyD999@AAC
#December 2019

#Select desired dictionary (or add your own)

#SOURCEDICT=/usr/share/dict/words                                #Ubuntu Linux Dictionary
#SOURCEDICT=words.txt                           #from https://github.com/dwyl/english-words -- contains nonsense, abbreviations, proper names, and acronyms.
SOURCEDICT=usa2.txt                                                            #from http://www.gwicks.net/dictionaries.htm -- just USA English words

DICT=infile.txt
TEMP=tempfile.txt
OUT=outfile.txt

#Clean start -- delete old temporary and output files

rm -f $TEMP $OUT

#avoid file not found error during dup check

touch $TEMP

#Sanitize SOURCEDICT to DICT:  Create sorted DICT with words length >= 2 that contain only lowercase english alphabet letters

grep -E "^[a-z][a-z]+$" $SOURCEDICT | sort > $DICT

#Set up a source dictionary line counter to limit search space

LINE=1

#Iterate over each word in the dictionary file

cat $DICT | while read WORD || [[ -n $WORD ]]; do

    LINE=$(($LINE+1))                                                            #search space is all source file lines below this one
    LENGTH=${#WORD}                                                       #get word length

    if (! grep -iwq $WORD $TEMP); then                      #only process words that do not exist in $TEMP file

        #Construct a regexp to compare all permutations simultaneously
        #i.e. "joeyd" => "^(oeydj|eydjo|ydjoe|djoey)$"

        REGEX="^("

        for ((i=1 ; i < LENGTH; i++)); do

            if ((i>1)); then
                REGEX=${REGEX}\|${WORD:$i}${WORD:0:((i))}
            else
                REGEX=${REGEX}${WORD:$i}${WORD:0:((i))}
            fi

        done

        REGEX="${REGEX})$"  #close the regexp
      
        #Match all lines in $DICT beginning with current word line+1

        unset MATCHLIST

        for MATCH in `tail -n +$LINE $DICT | grep -iE $REGEX`; do
            MATCHLIST="$MATCHLIST $MATCH"
        done
      
        #Append each successful set of matches to $TEMP file

        if [[ -v MATCHLIST ]]; then
            echo "$LENGTH $WORD $MATCHLIST" | tee -a $TEMP
        fi

    fi
done

#sort the final $TEMP into $OUT

sort -n $TEMP > $OUT
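The rotation-regexp construction in the script can be checked in isolation. This sketch reproduces just that loop for the example word from the script's own comment:

```shell
# Build the all-rotations regexp for one word, exactly as the script does.
WORD=joeyd
LENGTH=${#WORD}
REGEX="^("

for ((i = 1; i < LENGTH; i++)); do
    if ((i > 1)); then
        REGEX=${REGEX}\|${WORD:i}${WORD:0:i}    # later rotations, pipe-separated
    else
        REGEX=${REGEX}${WORD:i}${WORD:0:i}      # first rotation, no leading pipe
    fi
done

REGEX="${REGEX})$"  #close the regexp
echo "$REGEX"
# prints: ^(oeydj|eydjo|ydjoe|djoey)$
```

Each alternative is the word with its first i letters moved to the end, so one grep pass tests all non-trivial rotations at once.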
Another question that comes up is if we have "tables" and find "stable" should we list:
"stable tables"

alone or also list:
"tables stable"

as a separate entry since both words have their valid rotations so maybe that counts as 2 instead of just 1.
 

joeyd999

Joined Jun 6, 2011
4,477
Another question that comes up is if we have "tables" and find "stable" should we list:
"stable tables"

alone or also list:
"tables stable"

as a separate entry since both words have their valid rotations so maybe that counts as 2 instead of just 1.
This test:

Bash:
if (! grep -iwq $WORD $TEMP); then                      #only process words that do not exist in $TEMP file
explicitly prevents such (IMO) duplication.

Remove it if you wish.
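A minimal sketch of that check (the temp-file contents here are hypothetical): once "stable tables" has been written, the word "tables" already appears in the temp file, so grep -iwq finds it and the word is skipped as a new key:

```shell
# Hypothetical temp file state after "stable" has been processed.
printf '6 stable tables\n' > tempfile.txt

# The script's duplicate check: skip any word already present in the file.
if ! grep -iwq tables tempfile.txt; then
    echo "tables not seen yet -- would be processed"
else
    echo "tables already listed -- skipped"
fi
# prints: tables already listed -- skipped
```

The -w flag matches whole words only, so "table" would not falsely suppress "tables".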
 

MrAl

Joined Jun 17, 2014
8,063
I found the following alpha distribution of finds:

Code:
 a    b    c    d    e    f   g   h    i    j  k   l    m    n   o    p    q   r    s     t    u    v   w    x  y   z
{315, 147, 243, 145, 287, 65, 89, 201, 132, 7, 88, 232, 163, 91, 212, 286, 22, 213, 1166, 455, 104, 39, 173, 7, 49, 12}
Clearly words that start with 's' are most abundant, while 't' is second, and 'a' is third.
Looks like the 'x' group came in because of some Roman numerals.
This distribution counts each found rotation separately, so "abc" and "bca" would come out as two entries, not one: one entry leads with the word beginning in 'a' and the other with the word beginning in 'b', listed as:
abc bca
bca abc

so the distribution counts would increase by 1 for 'a' and by 1 for 'b' (if those were real words, of course).
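One way to reproduce such a first-letter tally from an output file is with cut/sort/uniq. The sample lines and file name here are hypothetical, using the script's "<length> <word> <matches...>" format:

```shell
# Hypothetical sample of result lines in the script's output format.
printf '3 abc bca\n3 bca abc\n6 stable tables\n' > finds.txt

# Tally entries by the first letter of the lead word (field 2).
cut -d' ' -f2 finds.txt | cut -c1 | sort | uniq -c
# prints a count of 1 each for a, b, and s
```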


SIDE NOTE:
For some reason, every time I edit this post I have to put the space back in front of the 'a' in the first line. Editing removes that first space every time.
 
Last edited:

joeyd999

Joined Jun 6, 2011
4,477
Attached is the output of the big dictionary at https://github.com/dwyl/english-words after using the "sanitizing" code. A glaring omission is the pair "no" and "on". For some reason, the dictionary spells "no" with a capital "N", so it gets sanitized out.

The processing time was:

real 73m41.120s
user 72m17.594s
sys 20m28.459s


The longest words are:

11 anarthropod arthropodan
11 decimosexto sextodecimo
11 deviscerate eviscerated
11 ethological lethologica
11 grasshopper hoppergrass
11 housemother motherhouse
11 peculations speculation
11 stabulation tabulations
12 breakweather weatherbreak
12 manslaughter slaughterman
13 exploitations sexploitation
 

MrAl

Joined Jun 17, 2014
8,063
Attached is the output of the big dictionary at https://github.com/dwyl/english-words after using the "sanitizing" code. A glaring omission is the pair "no" and "on". For some reason, the dictionary spells "no" with a capital "N", so it gets sanitized out.

The processing time was:

real 73m41.120s
user 72m17.594s
sys 20m28.459s


The longest words are:

11 anarthropod arthropodan
11 decimosexto sextodecimo
11 deviscerate eviscerated
11 ethological lethologica
11 grasshopper hoppergrass
11 housemother motherhouse
11 peculations speculation
11 stabulation tabulations
12 breakweather weatherbreak
12 manslaughter slaughterman
13 exploitations sexploitation
I noticed they used capitals for some words, so I used:
List=lower(List)
which makes them all lower case before I start.
 

joeyd999

Joined Jun 6, 2011
4,477
I noticed they used capitals for some words, so I used:
List=lower(List)
which makes them all lower case before I start.
The point of "pre-sanitizing" the dictionary was to remove acronyms, abbreviations, hyphenated words, and proper nouns prior to the search -- to produce a result with less garbage.

I could allow capital letters -- maybe only on the first letter (which would let in proper nouns). The preceding code already ignores character case during the search -- without requiring a "lower()" function. That's what the -i option does in:

Bash:
for MATCH in `tail -n +$LINE $DICT | grep -iE $REGEX`; do
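For instance (the two-line file here is hypothetical), a rotation built from "No" would be "oN", and -i still lets it match the lowercase "on":

```shell
# Hypothetical dictionary fragment with mixed case.
printf 'No\non\n' > pair.txt

# The rotation of "No" is "oN"; -i matches "on" regardless of case.
grep -iE '^(oN)$' pair.txt
# prints: on
```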
I'll run it and report back later so you can see the difference.
 

joeyd999

Joined Jun 6, 2011
4,477
I'll run it and report back later so you can see the difference.
Attached is the output allowing initial capitalization (i.e. proper nouns and names).

Processing time:

real 98m55.727s
user 96m44.686s
sys 26m33.463s

Here's the code:

Bash:
#!/bin/bash

#Rotodrome Finder
#JoeyD999@AAC
#December 2019

#Select desired dictionary (or add your own)

#SOURCEDICT=/usr/share/dict/words                                #Ubuntu Linux Dictionary
SOURCEDICT=words.txt                           #from https://github.com/dwyl/english-words -- contains nonsense, abbreviations, proper names, and acronyms.
#SOURCEDICT=usa2.txt                                                            #from http://www.gwicks.net/dictionaries.htm -- just USA English words

DICT=infile.txt
TEMP=tempfile.txt
OUT=outfile.txt

#Clean start -- delete old temporary and output files

rm -f $TEMP $OUT

#avoid file not found error during dup check

touch $TEMP

#Sanitize SOURCEDICT to DICT:  Create sorted DICT with words length >= 2 that contain only lowercase english alphabet letters
# -- or an initial capital (i.e. proper nouns)

grep -E "^[a-zA-Z][a-z]+$" $SOURCEDICT | sort > $DICT

#Set up a source dictionary line counter to limit search space

LINE=1

#Iterate over each word in the dictionary file

cat $DICT | while read WORD || [[ -n $WORD ]]; do
 
    LINE=$(($LINE+1))                                                            #search space is all source file lines below this one
    LENGTH=${#WORD}                                                       #get word length
  
    if (! grep -iwq $WORD $TEMP); then                      #only process words that do not exist in $TEMP file

        #Construct a regexp to compare all permutations simultaneously
        #i.e. "joeyd" => "^(oeydj|eydjo|ydjoe|djoey)$"

        REGEX="^("

        for ((i=1 ; i < LENGTH; i++)); do

            if ((i>1)); then
                REGEX=${REGEX}\|${WORD:$i}${WORD:0:((i))}
            else
                REGEX=${REGEX}${WORD:$i}${WORD:0:((i))}
            fi

        done
 
        REGEX="${REGEX})$"  #close the regexp
        
        #Match all lines in $DICT beginning with current word line+1

        unset MATCHLIST

        for MATCH in `tail -n +$LINE $DICT | grep -iE $REGEX`; do
            MATCHLIST="$MATCHLIST $MATCH"
        done
        
        #Append each successful set of matches to $TEMP file

        if [[ -v MATCHLIST ]]; then
            echo "$LENGTH $WORD $MATCHLIST" | tee -a $TEMP
        fi

    fi
done

#sort the final $TEMP into $OUT

sort -n $TEMP > $OUT
 

If you're into palindromes, how about a book?
Lid Off A Daffodil A Book of Palindromes by John Pool Illustrations by Peter Brookes
Not that I'm really into such things...
 

MrAl

Joined Jun 17, 2014
8,063
If you're into palindromes, how about a book?
Lid Off A Daffodil A Book of Palindromes by John Pool Illustrations by Peter Brookes
Not that I'm really into such things...
Sounds like an interesting read on a cold winter's night, curled up in front of the fireplace.
Or you could just take a sleeping pill; it will work just as well. :)
 

joeyd999

Joined Jun 6, 2011
4,477
@joeyd999
Processing time

Could you explain the difference between real, user and sys?

From https://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1

Real is wall clock time - time from start to finish of the call. This is all elapsed time including time slices used by other processes and time the process spends blocked (for example if it is waiting for I/O to complete).

User is the amount of CPU time spent in user-mode code (outside the kernel) within the process. This is only actual CPU time used in executing the process. Other processes and time the process spends blocked do not count towards this figure.

Sys is the amount of CPU time spent in the kernel within the process. This means executing CPU time spent in system calls within the kernel, as opposed to library code, which is still running in user-space. Like 'user', this is only CPU time used by the process.
Note that I ran the code on a quad-core Intel i7, which can run up to 8 threads simultaneously. Therefore, the combined User and Sys time can exceed the elapsed Real time.
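Converting the reported times from the 73-minute run to seconds shows the combined CPU time really does exceed the wall-clock time:

```shell
# Convert the reported times (73m41.120s real, 72m17.594s user,
# 20m28.459s sys) to seconds and compare CPU time against wall-clock time.
awk 'BEGIN {
    real = 73*60 + 41.120
    user = 72*60 + 17.594
    sys  = 20*60 + 28.459
    printf "real = %.3f s, user+sys = %.3f s\n", real, user + sys
}'
# prints: real = 4421.120 s, user+sys = 5566.053 s
```

About 1145 seconds of CPU work was overlapped across cores.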
 