Download of large blogs?

Thread Starter

Hypatia's Protege

Joined Mar 1, 2015
3,228
Respectfully, to whom it may concern:

It has come to my attention that this site's 'download my blog' feature does not support 'large' blogs. Specifically: the 'download stream' terminates at 47.9 MB, resulting in a failed download and the deposition of a truncated/'crippled' "blog.zip.part" file in my downloads folder (at present, my blog's compressed size is ≈ 53 MB)...

While said difficulty is, of course, readily circumvented via employment of a 'scraper', I'm hoping for a more 'elegant' approach...

Any advice/assistance will be greatly appreciated!

Best regards and many advance thanks!
HP:)
 

Aleph(0)

Joined Mar 14, 2015
597
HP, I don't know why it is, but the _blog_ feature on most sites (not just AAC or XenForo) is usually third-party code and so prone to being very buggy :(

Also, I say it's ridiculous how the file-transfer routines in most software packages (including every version of the Windows OS) just fail partway through (when a file-size limit is exceeded or storage space runs out) instead of simply checking the source file size before the transfer :mad:!
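
(For what it's worth, a single request is enough to learn how big the file is supposed to be. Assuming the server reports a Content-Length header, something like the line below prints the advertised size without downloading anything; the URL is only a placeholder for the actual blog-download link.)

# Placeholder URL; substitute the real 'download my blog' link.
# -sI asks curl for the response headers only, with no body download.
curl -sI "https://forum.allaboutcircuits.com/your-blog-download-link" | grep -i content-length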

While said difficulty is, of course, readily circumvented via employment of a 'scraper'
HP, the only problem with that is the scraper will only have _guest_ access, because it runs as a completely separate process from your logged-in session! Maybe that won't matter, since you've set your images as _viewable by all visitors_, but before doing backups you need to make sure every post you want backed up is set to public view!

I'm hoping for a more 'elegant' approach...
HP, I totally agree it would be easier if the blog download just worked properly! But if that's not happening, I say you shouldn't be worrying about _elegance_ :rolleyes:! Good scrapers (like HTTrack) work very well because they follow all _downward_ links and ignore _upward_ links, so they get the blog, the whole blog, and nothing but the blog, with no chance of runaway recursion :D! So it's just like you're always saying: if it works, it works :)!
 

Thread Starter

Hypatia's Protege

Joined Mar 1, 2015
3,228
@bertus? @jrap?

Any ideas? Although this is non-urgent (inasmuch as alternatives exist), I prefer the on-site 'blog back-up' feature for its ease of use and its archive organization.

In case it's of any assistance, I expect the problem owes to size limitations imposed by the archiving software -- Is there, perhaps, a method of downloading large blogs 'piecemeal'?

Very best regards
HP:)
 

Thread Starter

Hypatia's Protege

Joined Mar 1, 2015
3,228
HP, I totally agree it would be easier if the blog download just worked properly! But if that's not happening, I say you shouldn't be worrying about _elegance_ :rolleyes:! Good scrapers (like HTTrack) work very well because they follow all _downward_ links
--Emphasis added--

A brief 'how-to' would be nice?

Supplied the following address:
https://forum.allaboutcircuits.com/xfa-blogs/hypatias-protege.262666/

All that's downloaded is the first page of my blog -- To wit: the 'spider' fails to follow the links to subsequent pages despite election of the following options/settings:

-Follow ALL (downward) links.
-Ignore 'robots.txt'.
-Get all files related to a link, including non-html...
-Attempt to detect all links (even in unknown tags/Javascript code).
Etc, etc, etc!

Any assistance from anyone will be greatly appreciated!:)

Best regards
HP
 

Aleph(0)

Joined Mar 14, 2015
597
A brief 'how-to' would be nice?

Supplied the following address:
https://forum.allaboutcircuits.com/xfa-blogs/hypatias-protege.262666/

All that's downloaded is the first page of my blog -- To wit: the 'spider' fails to follow the links to subsequent pages despite election of the following options/settings:

-Follow ALL (downward) links.
-Ignore 'robots.txt'.
-Get all files related to a link, including non-html...
-Attempt to detect all links (even in unknown tags/Javascript code).
Etc, etc, etc!
HP, I didn't mean to say HTTrack is best for what you want to do (just backing up your content)! HTTrack is perfect for downloading web/cloud apps so they actually work offline :), but for backing up non-executable data like pictures and text I agree it's basically a fail. No matter what the _max depth_ and _direction_ settings are, if you set _global travel mode_ to _stay on the same address_ then it's exactly as you say: it won't follow any links at all (including thumbnails to full-size pictures!). And if you ease the travel restriction to just _stay on the same domain_, it tries to download the entire site :mad:! So whether it's a bug or just a convoluted settings interface, either way I say it's far too much trouble to get it working as a plain data-backup utility!
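
(If anyone still wants to wrestle with it, the command-line version lets you scope the crawl with scan rules instead of the travel-mode presets. A rough sketch is below; the option letters are as I recall them from the httrack manual, so verify them against your installed version.)

# -O sets the output folder; the "+..." scan rule keeps the crawl inside the
# blog's own path; -s0 ignores robots.txt (like the GUI setting quoted above).
httrack "https://forum.allaboutcircuits.com/xfa-blogs/hypatias-protege.262666/" \
  -O ./blog-backup \
  "+forum.allaboutcircuits.com/xfa-blogs/hypatias-protege.262666/*" \
  -s0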

HP, for your purpose I say look at _BackStreet Browser_ or _PageNest_ or something like that, but whatever you decide on, make sure it's set to automatically rewrite links to relative addresses, and ALWAYS test the offline archive after the download by disabling your Wi-Fi and clearing the browser cache!
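
(One quick sanity check, assuming the archive landed in a ./blog-backup folder: search it for leftover absolute links back to the live site. If the line below lists any files, those pages are still pointing at AAC's servers rather than at the local copy.)

# -r searches recursively, -l just lists the files that contain a match.
grep -rl "forum.allaboutcircuits.com" ./blog-backup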

Why not just wget?
@joeyd999, is there a Windows version of that? Also, to be sure, it needs to automatically follow server-side image maps (re: pages with the _ismap_ attribute set), because otherwise the links in pictures (in the offline archive) point to resources on AAC's servers, which is just a shortcut, NOT a backup :rolleyes:. So @joeyd999, if you think wget can back up an entire blog including links to internal content (by which I mean content on the blog being backed up), please let us know, because a straightforward tool is best :)!
 

joeyd999

Joined Jun 6, 2011
5,286
@joeyd999, is there a Windows version of that?
Worst case: install Cygwin.

Also, to be sure, it needs to automatically follow server-side image maps...
There are lots of command-line options. There is likely a combo that does what you want.
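
Untested against AAC specifically, so treat it as a starting point (see 'man wget' for the details), but a reasonable combo would look something like this:

# --mirror = recursive download with timestamping; --no-parent keeps it from
# climbing above the blog's path; --convert-links rewrites links for offline
# viewing; --page-requisites grabs images/CSS; --adjust-extension adds .html
# where needed; --wait=1 is just to be polite to the server.
wget --mirror --no-parent --convert-links --page-requisites \
     --adjust-extension --wait=1 \
     "https://forum.allaboutcircuits.com/xfa-blogs/hypatias-protege.262666/"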

So @joeyd999, if you think wget can back up an entire blog including links to internal content (by which I mean content on the blog being backed up), please let us know, because a straightforward tool is best :)!
Any remotely accessible (via http) data will be accessible to wget. Internal, protected, and hidden stuff likely not.
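
If some of it is behind your login, wget can usually reuse your browser session: export the cookies in Netscape cookies.txt format (various browser extensions do this) and add --load-cookies. For example:

# cookies.txt is a placeholder for a cookie file exported from the
# logged-in browser session.
wget --load-cookies cookies.txt --mirror --no-parent --convert-links \
     "https://forum.allaboutcircuits.com/xfa-blogs/hypatias-protege.262666/"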
 