That is deep. Figuratively and literally.

That means that an infinite number of monkeys, given an infinite amount of time, created the constant we now know as pi.
Hi, I mean if you memorise one rotational word to use as a base password, others are easily derived by simple rotation, so all you have to remember for those is e.g. base>>3.
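The base>>3 idea can be sketched in bash. This is a minimal illustration only, assuming ">>3" means "rotate the base word right by 3 characters"; the `rotate_right` helper is hypothetical, not something from the thread:

```shell
# Hypothetical helper: rotate a memorized base word right by n characters,
# i.e. move the last n characters to the front.
rotate_right() {
    local word=$1 n=$2 len=${#1}
    n=$(( n % len ))
    echo "${word:len-n}${word:0:len-n}"
}

rotate_right "joeyd999" 3   # -> 999joeyd
```

With one memorized base, each derived password is just the base plus a remembered rotation count.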
Hi,

That is deep. Figuratively and literally.
Yeah. A whole pi's worth. What must have been a LOT of feces...
Another question that comes up is if we have "tables" and find "stable" should we list...

Here is my final code. A few changes:
1. I changed the source dictionary to usa2.txt from http://www.gwicks.net/dictionaries.htm. It is a USA English dictionary with no nonsense (AFAICT). Note that "sexploitation" is not a word in this dictionary.
2. I added code to pre-sanitize the dictionary file, allowing only whole words of two or more lowercase letters. Presumably, this rejects acronyms, abbreviations, hyphenated words, and proper nouns.
3. Rather than grepping the whole file for each dictionary word, I now only grep the lines after the line containing the current word being grepped. This reduces the search space by one line each iteration, and avoids the "weewee"s.
4. Since I pre-sanitize, I don't need to check each word for validity prior to each grep.
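Change 3 can be sketched on a toy sorted list (the file name and words here are illustrative only). For the word on line N, only lines N+1 and below are searched; any pair involving an earlier word was already reported, and the word never matches itself:

```shell
# Toy demo of the shrinking search space: "ate" sits on line 1 of the
# sorted list, so the search starts at line 2. The regex holds the
# rotations of "ate" (tea, eat); "ate" itself is excluded.
printf '%s\n' ate eat tea > demo_words.txt
LINE=2   # first line below "ate"
tail -n +$LINE demo_words.txt | grep -iE '^(tea|eat)$'   # prints: eat, tea
rm -f demo_words.txt
```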
For the usa2.txt dictionary, the entire run time is:
real 8m46.443s
user 8m43.483s
sys 2m58.861s
Resulting output file attached.
@nsaspook: please run this on your best box. And report back the execution time. Thanks.
Bash:
#!/bin/bash
#Rotodrome Finder
#JoeyD999@AAC
#December 2019

#Select desired dictionary (or add your own)
#SOURCEDICT=/usr/share/dict/words #Ubuntu Linux Dictionary
#SOURCEDICT=words.txt #from https://github.com/dwyl/english-words -- contains nonsense, abbreviations, proper names, and acronyms.
SOURCEDICT=usa2.txt #from http://www.gwicks.net/dictionaries.htm -- just USA English words
DICT=infile.txt
TEMP=tempfile.txt
OUT=outfile.txt

#Clean start -- delete old temporary and output files
rm -f $TEMP $OUT
#avoid file not found error during dup check
touch $TEMP

#Sanitize SOURCEDICT to DICT: create sorted DICT with words of length >= 2 that contain only lowercase English alphabet letters
grep -E "^[a-z][a-z]+$" $SOURCEDICT | sort > $DICT

#Set up a source dictionary line counter to limit search space
LINE=1

#Iterate over each word in the dictionary file
cat $DICT | while read WORD || [[ -n $WORD ]]; do
    LINE=$(($LINE+1)) #search space is all source file lines below this one
    LENGTH=${#WORD} #get word length
    if ! grep -iwq "$WORD" $TEMP; then #only process words that do not exist in $TEMP file
        #Construct a regexp to compare all rotations simultaneously
        #i.e. "joeyd" => "^(oeydj|eydjo|ydjoe|djoey)$"
        REGEX="^("
        for ((i=1 ; i < LENGTH; i++)); do
            if ((i>1)); then
                REGEX=${REGEX}\|${WORD:$i}${WORD:0:$i}
            else
                REGEX=${REGEX}${WORD:$i}${WORD:0:$i}
            fi
        done
        REGEX="${REGEX})$" #close the regexp
        #Match all lines in $DICT starting at the current word's line + 1
        unset MATCHLIST
        for MATCH in `tail -n +$LINE $DICT | grep -iE "$REGEX"`; do
            MATCHLIST="$MATCHLIST $MATCH"
        done
        #Append each successful set of matches to $TEMP file
        if [[ -v MATCHLIST ]]; then
            echo "$LENGTH $WORD $MATCHLIST" | tee -a $TEMP
        fi
    fi
done

#sort the final $TEMP into $OUT
sort -n $TEMP > $OUT
This test:

Another question that comes up is if we have "tables" and find "stable" should we list:
"stable tables"
alone or also list:
"tables stable"
as a separate entry since both words have their valid rotations so maybe that counts as 2 instead of just 1.
if (! grep -iwq $WORD $TEMP); then #only process words that do not exist in $TEMP file
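That guard is what answers the "stable tables" vs "tables stable" question: once a set of matches is written to the temp file, every word in it is skipped in later iterations, so the reversed pair never gets its own line. A minimal sketch (temp file contents invented for the demo):

```shell
# Once "6 stable tables" has been recorded, the -w (whole-word) check
# finds "tables" in the temp file and the word is skipped.
TEMP=$(mktemp)
echo "6 stable tables" > "$TEMP"
if ! grep -iwq "tables" "$TEMP"; then
    echo "tables would be processed"
else
    echo "tables is skipped as a duplicate"   # this branch runs
fi
rm -f "$TEMP"
```

Note that -w matters here: "stab" would not match inside "stable", so only exact whole words are suppressed.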
a=315 b=147 c=243 d=145 e=287 f=65 g=89 h=201 i=132 j=7 k=88 l=232 m=163
n=91 o=212 p=286 q=22 r=213 s=1166 t=455 u=104 v=39 w=173 x=7 y=49 z=12
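The thread doesn't say how that per-letter tally was produced. Assuming it counts result words by starting letter (a guess), a similar tally can be sketched with a cut/sort/uniq pipeline, shown here on a tiny inline sample standing in for the real "LENGTH WORD MATCHES" output file:

```shell
# Hypothetical reconstruction of the tally: take the second field of
# each output line (the word), keep its first character, count each letter.
printf '%s\n' "6 stable tables" "11 peculations speculation" "11 stabulation tabulations" \
    | cut -d' ' -f2 | cut -c1 | sort | uniq -c
```

For the sample this reports 1 word starting with "p" and 2 starting with "s" (uniq -c column padding varies by platform).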
I noticed they used capitals for some words so I used:

Attached is the output of the big dictionary at https://github.com/dwyl/english-words after using the "sanitizing" code. A glaring omission is "no" and "on": for some reason, the dictionary spells "no" with a capital "N", so it gets sanitized out.
The processing time was:
real 73m41.120s
user 72m17.594s
sys 20m28.459s
The longest words are:
11 anarthropod arthropodan
11 decimosexto sextodecimo
11 deviscerate eviscerated
11 ethological lethologica
11 grasshopper hoppergrass
11 housemother motherhouse
11 peculations speculation
11 stabulation tabulations
12 breakweather weatherbreak
12 manslaughter slaughterman
13 exploitations sexploitation
The point of "pre-sanitizing" the dictionary was to remove acronyms, abbreviations, hyphenated words, and proper nouns prior to the search -- to produce a result with less garbage.

I noticed they used capitals for some words so I used:
List=lower(List)
which makes them all lowercase before I start.
for MATCH in `tail -n +$LINE $DICT | grep -iE $REGEX`; do
Attached is the output allowing initial capitalization (i.e. proper nouns and names).

I'll run it and report back later so you can see the difference.
#!/bin/bash
#Rotodrome Finder
#JoeyD999@AAC
#December 2019
#Select desired dictionary (or add your own)
#SOURCEDICT=/usr/share/dict/words #Ubuntu Linux Dictionary
SOURCEDICT=words.txt #from https://github.com/dwyl/english-words -- contains nonsense, abbreviations, proper names, and acronyms.
#SOURCEDICT=usa2.txt #from http://www.gwicks.net/dictionaries.htm -- just USA English words
DICT=infile.txt
TEMP=tempfile.txt
OUT=outfile.txt
#Clean start -- delete old temporary and output files
rm -f $TEMP $OUT
#avoid file not found error during dup check
touch $TEMP
#Sanitize SOURCEDICT to DICT: Create sorted DICT with words of length >= 2 that contain only lowercase english alphabet letters
# -- or an initial capital (i.e. proper nouns)
grep -E "^[a-zA-Z][a-z]+$" $SOURCEDICT | sort > $DICT
#Set up a source dictionary line counter to limit search space
LINE=1
#Iterate over each word in the dictionary file
cat $DICT | while read WORD || [[ -n $WORD ]]; do
    LINE=$(($LINE+1)) #search space is all source file lines below this one
    LENGTH=${#WORD} #get word length
    if ! grep -iwq "$WORD" $TEMP; then #only process words that do not exist in $TEMP file
        #Construct a regexp to compare all rotations simultaneously
        #i.e. "joeyd" => "^(oeydj|eydjo|ydjoe|djoey)$"
        REGEX="^("
        for ((i=1 ; i < LENGTH; i++)); do
            if ((i>1)); then
                REGEX=${REGEX}\|${WORD:$i}${WORD:0:$i}
            else
                REGEX=${REGEX}${WORD:$i}${WORD:0:$i}
            fi
        done
        REGEX="${REGEX})$" #close the regexp
        #Match all lines in $DICT starting at the current word's line + 1
        unset MATCHLIST
        for MATCH in `tail -n +$LINE $DICT | grep -iE "$REGEX"`; do
            MATCHLIST="$MATCHLIST $MATCH"
        done
        #Append each successful set of matches to $TEMP file
        if [[ -v MATCHLIST ]]; then
            echo "$LENGTH $WORD $MATCHLIST" | tee -a $TEMP
        fi
    fi
done
#sort the final $TEMP into $OUT
sort -n $TEMP > $OUT
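The rotation-regex builder from the script can be run standalone to see what it produces, using the comment's own example word:

```shell
# Build the rotation regex for "joeyd" exactly as the script does:
# each alternative moves i leading characters to the end of the word.
WORD=joeyd
LENGTH=${#WORD}
REGEX="^("
for ((i=1; i<LENGTH; i++)); do
    ((i>1)) && REGEX="${REGEX}|"
    REGEX="${REGEX}${WORD:i}${WORD:0:i}"
done
REGEX="${REGEX})$"
echo "$REGEX"   # prints ^(oeydj|eydjo|ydjoe|djoey)$
```

Any word matching this regex is a rotation of "joeyd" other than "joeyd" itself, which is why a word never matches its own entry.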
Sounds like an interesting read on a cold winter's night curled up in front of the fireplace.

If you're into palindromes, how about a book?
Lid Off A Daffodil: A Book of Palindromes, by John Pool, illustrations by Peter Brookes.
Not that I'm really into such things...
From https://stackoverflow.com/questions/556405/what-do-real-user-and-sys-mean-in-the-output-of-time1
Note that I ran the code on a quad-core Intel i7, which can run up to 8 threads simultaneously. Therefore, the User and Sys time can exceed the elapsed Real time.

Real is wall clock time - time from start to finish of the call. This is all elapsed time including time slices used by other processes and time the process spends blocked (for example if it is waiting for I/O to complete).
User is the amount of CPU time spent in user-mode code (outside the kernel) within the process. This is only actual CPU time used in executing the process. Other processes and time the process spends blocked do not count towards this figure.
Sys is the amount of CPU time spent in the kernel within the process. This means executing CPU time spent in system calls within the kernel, as opposed to library code, which is still running in user-space. Like 'user', this is only CPU time used by the process.
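The effect described above can be seen with bash's own `time` keyword. A minimal sketch (the loop count is arbitrary): four busy loops run in parallel, so on a multi-core machine the summed user CPU time can exceed the real wall-clock time, just as in the runs quoted earlier.

```shell
# Four CPU-bound subshells in parallel; TIMEFORMAT controls how the
# bash `time` keyword reports real/user/sys.
TIMEFORMAT='real %R  user %U  sys %S'
time {
    for i in 1 2 3 4; do
        ( n=0; while (( n < 100000 )); do (( n++ )); done ) &
    done
    wait
}
```

On a single-core box, user would instead come out roughly equal to real.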