Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Export space #1458

Open
aleksvujic opened this issue Sep 26, 2024 · 5 comments
Open

Export space #1458

aleksvujic opened this issue Sep 26, 2024 · 5 comments

Comments

@aleksvujic
Copy link
Contributor

Can you please implement "Export space" functionality available via GUI in your Python library as well? It is available in GUI by clicking on Space settings in the left sidebar and clicking on Export space in the Manage space section.

image
image

@gkowalc
Copy link
Contributor

gkowalc commented Oct 1, 2024

I was considering this idea some time ago, but I couldn't find a working solution. Executing the endpoint to trigger the export is feasible. However, the issue arises after sending the POST request to initiate the export. Another form is displayed to update the status (via AJAX requests). Once the status update reaches 100%, another request is sent with the link to download the generated export file. Does anyone have any ideas on how to capture these AJAX requests that originate from the initial POST HTTP request from the site?

@aleksvujic
Copy link
Contributor Author

aleksvujic commented Oct 1, 2024

@gkowalc This is how I solved it eventually by inspecting HTTP requests that Confluence sends. My script uses HTML as hard-coded format but I am sure it can be further parameterized if needed. This function returns a direct URL to the zipped content which can be downloaded by sending HTTP GET request. I used get_pdf_download_url_for_confluence_cloud method as an inspiration.

def __get_space_html_download_url(self, space_key: str) -> str:
    try:
        url = f"spaces/exportspacehtml.action?key={space_key}"
        response = self.confluence_client.get(url, advanced_mode=True)
        parsed_html = BeautifulSoup(response.text, "html.parser")
        atl_token = parsed_html.find("input", { "name": "atl_token" }).get("value")
        
        form_data = {
            "atl_token": atl_token,
            "exportType": "TYPE_HTML",
            "contentOption": "visibleOnly",
            "includeComments": True,
            "confirm": "Export"
        }
        # bypass self.confluence_client.post method because it serializes form data as JSON which is wrong
        url = self.confluence_client.url_joiner(url=self.confluence_client.url, path=f"spaces/doexportspace.action?key={space_key}")
        response = self.confluence_client.session.post(url, headers=self.confluence_client.form_token_headers, data=form_data)
        parsed_html = BeautifulSoup(response.text, "html.parser")
        poll_url = parsed_html.find("meta", { "name": "ajs-pollURI" }).get("content")
        
        running_task = True
        while running_task:
            progress_response = self.confluence_client.get(poll_url)
            print(progress_response)
            if progress_response['complete']:
                parsed_html = BeautifulSoup(progress_response['message'], "html.parser")
                download_url = parsed_html.find("a", { "class": "space-export-download-path" }).get("href")
                return download_url
            time.sleep(1)
        
        return None
    except Exception as e:
        print(e)
        return None

Maybe someone can tweak it further, make it more general (choice of export format and what to export) and create a PR so it becomes a part of the library. 🙂

@gkowalc
Copy link
Contributor

gkowalc commented Oct 1, 2024

"Hmm, I was setting up a session using the requests library to utilize an object from atl_token, but your approach with Beautiful Soup (BS4) looks promising. I will try to experiment with the code based on your idea, and if I come up with a working solution, I will send a pull request (PR) soon. Thx for sharing your idea.

@aleksvujic
Copy link
Contributor Author

Using BeautifulSoup is not a must here, it can easily be replaced with simple regular expressions. I used it so the code is more readable.

Feel free to experiment with the snippet.

@gkowalc
Copy link
Contributor

gkowalc commented Oct 15, 2024

Hello @aleksvujic and others interested in this feature.
I added a PR #1466 that implements the method. I run my tests on Confluence Cloud (it might not work as expected in confluence server/data-center). but feel free to run your tests, also on bigger space sizes.

I think we might hit an issue when export time takes too long (CSRF token might expire), I run my tests on relatively small space exports that were done in couple of minutes.

Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

No branches or pull requests

2 participants