Download of large blogs?

Discussion in 'Feedback and Suggestions' started by Hypatia's Protege, Sep 30, 2018.

  1. Hypatia's Protege

    Thread Starter Distinguished Member

    Mar 1, 2015
    3,134
    2,083
    Respectfully, to whom it may concern:

    It has come to my attention that this site's 'download my blog' feature does not support 'large' blogs -- specifically, the download stream terminates at 47.9 MB, resulting in a failed download and the deposition of a truncated 'blog.zip.part' file in my downloads folder (at present, my blog's compressed size ≈ 53 MB)...

    While said difficulty is, of course, readily circumvented by employing a 'scraper', I'm hoping for a more 'elegant' approach...

    Any advice/assistance will be greatly appreciated!

    Best regards and many advance thanks!
    HP:)
     
    Last edited: Oct 1, 2018
  2. Aleph(0)

    Active Member

    Mar 14, 2015
    562
    849
    HP, I don't know why, but the blog feature on most sites (not just AAC or XenForo) is usually third-party code, and so prone to being very buggy :(

    Also, it's ridiculous how the file-transfer routines in most software packages (including every version of Windows) simply fail partway through when a file-size limit is exceeded or storage space runs out, instead of checking the source file size before starting the transfer :mad:
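
    For example, here is a rough shell sketch of the kind of pre-flight check I mean (the URL is just a placeholder):

        # Ask the server how big the file is before downloading it (HEAD request)
        curl -sI "https://example.com/blog.zip" | grep -i "^content-length"

        # Compare that against the free space in the download folder
        df -h ~/Downloads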

    HP, the only problem with a scraper is that the scraper-bot will only have guest access, because it runs as a completely separate process from your logged-in session. That may not matter, since you set your images as viewable by all visitors, but before doing a backup you absolutely need to make sure every post you want backed up is set to public view!

    HP, I completely agree it would be easier if the blog download just worked properly! But if that isn't happening, I wouldn't worry about 'elegance' :rolleyes: Good scrapers (like HTTrack) work very well because they follow all downward links and ignore upward links, so they get the blog, the whole blog, and nothing but the blog, with no chance of runaway recursion :D As you always say: if it works, it works :)
     
    Hypatia's Protege likes this.
  3. Hypatia's Protege

    Thread Starter Distinguished Member

    Mar 1, 2015
    3,134
    2,083
    @bertus? @jrap?

    Any ideas? Although this is non-urgent (inasmuch as alternatives exist), I prefer the on-site 'blog back-up' feature for its ease of use and archive organization.

    In case it's of any assistance, I expect the problem owes to size limitations imposed by the archiving software -- Is there, perhaps, a method of downloading large blogs 'piecemeal'?

    Very best regards
    HP:)
     
  4. joeyd999

    AAC Fanatic!

    Jun 6, 2011
    3,983
    5,451
    Why not just wget?
     
    Hypatia's Protege likes this.
  5. Hypatia's Protege

    Thread Starter Distinguished Member

    Mar 1, 2015
    3,134
    2,083
    Indeed! Employment of 'site-retrieval' utilities is our 'plan B' should the difficulty with the on-site feature prove intractable...

    Best regards
    HP:)
     
  6. Hypatia's Protege

    Thread Starter Distinguished Member

    Mar 1, 2015
    3,134
    2,083
    --Emphasis added--

    A brief 'how-to' would be nice?

    I supplied the following address:
    https://forum.allaboutcircuits.com/xfa-blogs/hypatias-protege.262666/

    All that's downloaded is the first page of my blog -- to wit: the 'spider' fails to follow the links to subsequent pages despite my selecting the following options/settings (an approximate command-line equivalent follows the list):

    -Follow ALL (downward) links.
    -Ignore 'robots.txt'.
    -Get all files related to a link, including non-html...
    -Attempt to detect all links (even in unknown tags/Javascript code).
    Etc, etc, etc!
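
    For the record, my best guess at the command-line equivalent of those settings -- from memory, so please verify each switch against 'httrack --help'; the output folder is arbitrary:

        httrack "https://forum.allaboutcircuits.com/xfa-blogs/hypatias-protege.262666/" -O ~/aac-blog-backup "+https://forum.allaboutcircuits.com/xfa-blogs/hypatias-protege.262666/*" -s0 -n

    (Here '-s0' ignores robots.txt, '-n' also fetches non-HTML files 'near' a page such as images, and the '+' filter confines the spider to the blog's own path.)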

    Any assistance from anyone will be greatly appreciated!:)

    Best regards
    HP
     
    Last edited: Oct 6, 2018
  7. Aleph(0)

    Active Member

    Mar 14, 2015
    562
    849
    HP, I didn't mean to say HTTrack is best for what you want to do (just backing up your content). HTTrack is great for downloading web/cloud apps so they actually work offline :) But for backing up non-executable data like pictures and text, I agree it's basically a fail.

    No matter what the 'max depth' and 'direction' settings are, if you set the 'global travel mode' to 'stay on the same address' it behaves exactly as you describe -- it won't follow any links at all (including thumbnails to full-size pictures). And if you ease the travel restriction to 'stay on the same domain', it tries to download the entire site :mad: Whether that's a bug or just a convoluted settings interface, either way it's too much trouble to get working as a simple data-backup utility.

    HP, for your purpose please have a look at BackStreet Browser or PageNest or something similar. Whatever you decide on, make sure it's set to automatically change links to relative addresses, and ALWAYS test the offline archive after the download by disabling Wi-Fi and clearing the browser cache!

    @joeyd999, is there a Windows version of that? Also, to be sure, it needs to automatically follow server-side image maps (i.e. pages with the 'ismap' attribute set), because otherwise the links in pictures in the offline archive point to resources on the AAC servers, which is just a shortcut, NOT a backup :rolleyes: So joeyd999, if you think wget can back up an entire blog, including links to internal content (by which I mean content on the blog being backed up), please let us know, because a straightforward tool is best :)
     
    Last edited: Oct 8, 2018
  8. joeyd999

    AAC Fanatic!

    Jun 6, 2011
    3,983
    5,451
    Worst case: install Cygwin.

    There are lots of command line options. There is likely a combo that does what you want.

    Any remotely accessible (via http) data will be accessible to wget. Internal, protected, and hidden stuff likely not.
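
    Off the top of my head, something along these lines should come close (untested -- check the man page before trusting it with your only copy; the output folder name is arbitrary):

        wget --recursive --level=10 --no-parent \
             --page-requisites --convert-links --adjust-extension \
             --wait=1 -e robots=off \
             --directory-prefix=aac-blog-backup \
             "https://forum.allaboutcircuits.com/xfa-blogs/hypatias-protege.262666/"

    --no-parent keeps it from wandering up into the rest of the forum, --page-requisites pulls in the images and stylesheets each page needs, and --convert-links rewrites the links so the archive works offline.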
     
    Aleph(0) and Hypatia's Protege like this.