Download of large blogs?

Discussion in 'Feedback and Suggestions' started by Hypatia's Protege, Sep 30, 2018.

  1. Hypatia's Protege

    Thread Starter Distinguished Member

    Mar 1, 2015
    3,134
    2,083
    Respectfully, to whom it may concern:

    It has come to my attention that this site's 'download my blog' feature does not support 'large' blogs -- specifically, the download stream terminates at 47.9 MB, resulting in a failed download and the deposition of a truncated 'blog.zip.part' file in my downloads folder (at present, my blog's compressed size ≈ 53 MB)...

    While said difficulty is, of course, readily circumvented by employing a 'scraper', I'm hoping for a more 'elegant' approach...

    Any advice/assistance will be greatly appreciated!

    Best regards and many advance thanks!
    HP:)
     
    Last edited: Oct 1, 2018
  2. Aleph(0)

    Active Member

    Mar 14, 2015
    562
    849
    HP, I don't know why, but the blog feature on most sites (not just AAC or XenForo) is usually third-party code, and so prone to being very buggy :(

    Also, it's ridiculous how the file-transfer routines in most software packages (including every version of Windows) simply fail partway through when a file-size limit is exceeded or storage space runs out, instead of checking the source file size before starting the transfer :mad:
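
    For example, here is a rough shell sketch of the kind of pre-flight check I mean (the URL is just a placeholder):

        # Ask the server how big the file is before downloading it (HEAD request)
        curl -sI "https://example.com/blog.zip" | grep -i "^content-length"

        # Compare that against the free space in the download folder
        df -h ~/Downloads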

    HP, the only problem with a scraper is that the scraper-bot will only have guest access, because it runs as a completely separate process from your logged-in session. That may not matter, since you set your images as viewable by all visitors, but before doing a backup you absolutely need to make sure every post you want backed up is set to public view!

    HP, I completely agree it would be easier if the blog download just worked properly! But if that isn't happening, I wouldn't worry about 'elegance' :rolleyes: Good scrapers (like HTTrack) work very well because they follow all downward links and ignore upward links, so they get the blog, the whole blog, and nothing but the blog, with no chance of runaway recursion :D As you always say: if it works, it works :)
     
    Hypatia's Protege likes this.
  3. Hypatia's Protege

    Thread Starter Distinguished Member

    Mar 1, 2015
    3,134
    2,083
    @bertus? @jrap?

    Any ideas? Although this is non-urgent (inasmuch as alternatives exist), I prefer the on-site 'blog back-up' feature for its ease of use and archive organization.

    In case it's of any assistance, I expect the problem owes to size limitations imposed by the archiving software -- Is there, perhaps, a method of downloading large blogs 'piecemeal'?

    Very best regards
    HP:)
     
  4. joeyd999

    AAC Fanatic!

    Jun 6, 2011
    3,983
    5,451
    Why not just wget?
     
    Hypatia's Protege likes this.
  5. Hypatia's Protege

    Thread Starter Distinguished Member

    Mar 1, 2015
    3,134
    2,083
    Indeed! Employment of 'site-retrieval' utilities is our 'plan B' should the difficulty with the on-site feature prove intractable...

    Best regards
    HP:)
     
  6. Hypatia's Protege

    Thread Starter Distinguished Member

    Mar 1, 2015
    3,134
    2,083
    --Emphasis added--

    A brief 'how-to' would be nice?

    I supplied the following address:
    https://forum.allaboutcircuits.com/xfa-blogs/hypatias-protege.262666/

    All that's downloaded is the first page of my blog -- to wit: the 'spider' fails to follow the links to subsequent pages despite my selecting the following options/settings (an approximate command-line equivalent follows the list):

    -Follow ALL (downward) links.
    -Ignore 'robots.txt'.
    -Get all files related to a link, including non-html...
    -Attempt to detect all links (even in unknown tags/Javascript code).
    Etc, etc, etc!
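
    For the record, my best guess at the command-line equivalent of those settings -- from memory, so please verify each switch against 'httrack --help'; the output folder is arbitrary:

        httrack "https://forum.allaboutcircuits.com/xfa-blogs/hypatias-protege.262666/" -O ~/aac-blog-backup "+https://forum.allaboutcircuits.com/xfa-blogs/hypatias-protege.262666/*" -s0 -n

    (Here '-s0' ignores robots.txt, '-n' also fetches non-HTML files 'near' a page such as images, and the '+' filter confines the spider to the blog's own path.)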

    Any assistance from anyone will be greatly appreciated!:)

    Best regards
    HP
     
    Last edited: Oct 6, 2018
  7. Aleph(0)

    Active Member

    Mar 14, 2015
    562
    849
    HP, I didn't mean to say HTTrack is best for what you want to do (just backing up your content). HTTrack is great for downloading web/cloud apps so they actually work offline :) But for backing up non-executable data like pictures and text, I agree it's basically a fail.

    No matter what the 'max depth' and 'direction' settings are, if you set the 'global travel mode' to 'stay on the same address' it behaves exactly as you describe -- it won't follow any links at all (including thumbnails to full-size pictures). And if you ease the travel restriction to 'stay on the same domain', it tries to download the entire site :mad: Whether that's a bug or just a convoluted settings interface, either way it's too much trouble to get working as a simple data-backup utility.

    HP, for your purpose please have a look at BackStreet Browser or PageNest or something similar. Whatever you decide on, make sure it's set to automatically change links to relative addresses, and ALWAYS test the offline archive after the download by disabling Wi-Fi and clearing the browser cache!

    @joeyd999, is there a Windows version of that? Also, to be sure, it needs to automatically follow server-side image maps (i.e. pages with the 'ismap' attribute set), because otherwise the links in pictures in the offline archive point to resources on the AAC servers, which is just a shortcut, NOT a backup :rolleyes: So joeyd999, if you think wget can back up an entire blog, including links to internal content (by which I mean content on the blog being backed up), please let us know, because a straightforward tool is best :)
     
    Last edited: Oct 8, 2018
  8. joeyd999

    AAC Fanatic!

    Jun 6, 2011
    3,983
    5,451
    Worst case: install Cygwin.

    There are lots of command line options. There is likely a combo that does what you want.

    Any remotely accessible (via http) data will be accessible to wget. Internal, protected, and hidden stuff likely not.
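
    Off the top of my head, something along these lines should come close (untested -- check the man page before trusting it with your only copy; the output folder name is arbitrary):

        wget --recursive --level=10 --no-parent \
             --page-requisites --convert-links --adjust-extension \
             --wait=1 -e robots=off \
             --directory-prefix=aac-blog-backup \
             "https://forum.allaboutcircuits.com/xfa-blogs/hypatias-protege.262666/"

    --no-parent keeps it from wandering up into the rest of the forum, --page-requisites pulls in the images and stylesheets each page needs, and --convert-links rewrites the links so the archive works offline.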
     
    Aleph(0) and Hypatia's Protege like this.