Menu links of the downloaded site point to the on-line pages #418

Open · j-balint opened this issue Nov 11, 2024 · 2 comments

j-balint commented Nov 11, 2024

Monolith would be an ideal tool for me to download a complete website. It downloads my WordPress-based website quickly and the home page works perfectly, but unfortunately all the menu links point to the online pages. I could not get it to work fully offline.
I used it like this on Manjaro Linux/KDE:
monolith https://site-URL/ -b /home/balint/Desktop/B4X/B4X.html -o /home/balint/Desktop/B4X/B4X.html
Is this really not possible, or did I parameterize it wrong?

RaphGL commented Nov 23, 2024

Not the developer; I came here to create this same issue.

I glanced quickly at the source code and the available flags, and there doesn't seem to be any functionality for this.
The program simply walks through the page, embeds the resources it finds, and outputs a single document.

You can see here that it only resolves the anchor tag's href against the document URL, so relative menu links become absolute links back to the live site:

monolith/src/html.rs

Lines 1014 to 1036 in 2a8d5d7

"a" | "area" => {
    if let Some(anchor_attr_href_value) = get_node_attr(node, "href") {
        if anchor_attr_href_value
            .clone()
            .trim()
            .starts_with("javascript:")
        {
            if options.no_js {
                // Replace with empty JS call to preserve original behavior
                set_node_attr(node, "href", Some("javascript:;".to_string()));
            }
        } else {
            // Don't touch mailto: links or hrefs which begin with a hash sign
            if !anchor_attr_href_value.clone().starts_with('#')
                && !is_url_and_has_protocol(&anchor_attr_href_value.clone())
            {
                let href_full_url: Url =
                    resolve_url(document_url, &anchor_attr_href_value);
                set_node_attr(node, "href", Some(href_full_url.to_string()));
            }
        }
    }
}
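
For illustration, here is a standalone sketch of what that resolution effectively does (not monolith code; it assumes resolve_url behaves like url::Url::join from the url crate, and the URL is a placeholder):

```rust
// Standalone sketch (assumption: resolve_url behaves like url::Url::join).
// A relative menu href is absolutized against the document URL, which is why
// the saved file's menu still points at the live site.
use url::Url;

fn main() {
    let document_url = Url::parse("https://site-url/").unwrap();
    let menu_href = "about/"; // relative link as it appears in the page source
    let resolved = document_url.join(menu_href).unwrap();
    assert_eq!(resolved.as_str(), "https://site-url/about/");
    println!("{menu_href} -> {resolved}"); // about/ -> https://site-url/about/
}
```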

If the program recursively walked the links and built a local document tree, it would greatly increase how useful it is, imo.

snshn (Member) commented Dec 2, 2024

Hi @j-balint,

The -b option there is meant mostly for https:// URLs, e.g. to pull more resources from the internet when converting a conventionally saved file+folder HTML page locally. I think it could in theory work for file:// links. What if you try: monolith https://site-url/ -b file:///home/balint/Desktop/B4X/ -o /home/balint/Desktop/B4X/B4X.html

And also hello Ralph, you are absolutely correct, monolith is extremely dumb, but let's look on the bright side: at least it won't take over the world and travel through time to try and kill its creator, right?
Archiving child pages is something that's been requested since day one, and there are programs that do that already, but I can see how it could be handy to have every separate linked page in its own .html file, even if it means some overhead. One of the problems would be sharing one of those documents, since it would try to link locally instead of externally.
There is work being done on making it possible to use monolith as a crate (library) rather than only a stand-alone CLI, so that scrapers can follow links; that would power browser extensions and server-side software, and promote the creation of monolith-based scrapers capable of archiving whole websites. I hope it sees the light of day soon.
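
In the meantime, a rough workaround is to run the existing CLI once per page; links between the saved files will still point at the live site, which is exactly the limitation described above. A minimal sketch (the page list and output paths are made-up placeholders; it only relies on the existing monolith binary and its -o flag):

```rust
// Rough workaround sketch: call the monolith CLI once per page so every
// linked page ends up in its own .html file. Page URLs and output paths
// below are placeholders; cross-page links are NOT rewritten to local files.
use std::process::Command;

fn main() {
    let pages = [
        "https://site-url/",
        "https://site-url/about/",
        "https://site-url/contact/",
    ];
    for (i, page) in pages.iter().enumerate() {
        let output = format!("/home/balint/Desktop/B4X/page-{i}.html");
        let status = Command::new("monolith")
            .arg(page)
            .arg("-o")
            .arg(&output)
            .status()
            .expect("failed to launch monolith");
        if !status.success() {
            eprintln!("monolith exited with an error for {page}");
        }
    }
}
```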
