Schema.org structured data support (example with org-mode file). #39

Open
MorphicResonance opened this issue Nov 5, 2021 · 5 comments

MorphicResonance commented Nov 5, 2021

Processing org-mode metadata is part of my attempt to build org-structured, "content first", search-engine-friendly web pages. But the question is broader and concerns all users.
With soupault we chose the conventional approach of plain HTML for design, and abandoned templates, their variations, and other opinionated conventions.
The widely used, conventional vocabulary for machine-readable content markup is the one developed by schema.org.
After some progress, I think these are the simple requirements for a plugin that should transfer meta tags into the <head> section of a web page.

  1. Data for meta tags such as the title and meta description.
  2. Machine-readable data for search engine robots, as JSON-LD.

The good news is that it is likely possible to convert just the input text, without having to write a separate YAML block for JSON-LD. I talked to the developers of Stencila, and they will take care of it.

So only the first task, extracting data for the meta tags, remains for the plugin. The second item is handled by the converter. So:

  • If the user does not want JSON-LD on their web page, only microdata formatting, then the second task is not needed. Just extract the meta tags and run the converter to HTML.
  • If the user chooses the JSON-LD format, then the JSON-LD has to be taken from the converter's output and put in a defined section of the page (usually in the <head>). The sequence is:
    extract meta tags --> run the converter to convert the input into JSON --> run the converter again to convert the input into HTML.
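
For the JSON-LD case, the last step (putting the converter's output into the page head) could look roughly like the sketch below. It assumes the converter's JSON-LD output is already available as a Python dict. The `jsonld_script_tag` helper and the sample data are hypothetical; only the `<script type="application/ld+json">` wrapper is the standard way to embed JSON-LD in HTML.

```python
import json


def jsonld_script_tag(data: dict) -> str:
    """Wrap JSON-LD data in the script element used to embed it in HTML."""
    payload = json.dumps(data, ensure_ascii=False, indent=2)
    # Escape "</" so the payload cannot close the script element early.
    # "\/" is a valid JSON escape for "/", so the data stays unchanged.
    payload = payload.replace("</", "<\\/")
    return f'<script type="application/ld+json">\n{payload}\n</script>'


# Hypothetical converter output for the article used in this issue.
article = {
    "@context": "https://schema.org",
    "@type": "Article",
    "headline": "A simple Org Mode article for testing",
    "author": {"@type": "Person", "name": "Nokome Bentley"},
}

print(jsonld_script_tag(article))
```

The resulting string is what would be inserted into the <head>; how the converter is invoked to produce the data is left out on purpose.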

My case is without JSON-LD and works with microdata; I mention the JSON-LD case just for the record.
This is the input file:
#+begin_example

#+meta_title: this is a title of the page
#+meta_description: this is a metadescription of the page
#+title: A simple Org Mode article for testing
#+author: Nokome Bentley

* Introduction

A simple Org Mode article for testing. When making changes please note
that test snapshots based on this fixture may need to be updated.

* Methods

This is the methods section.

* Results

The results include a table (Table 1).

| Group | Value |
|-------+-------|
| A     | 1.1   |
| B     | 2.2   |

* Discussion

This is the discussion section.

#+end_example

The plugin should take "this is a title of the page" from #+meta_title: and "this is a metadescription of the page" from #+meta_description:.
If #+meta_title: does not exist, take the value from #+title: instead (which means the web page title and the article title will be identical in this case).

Then delete the #+meta_... lines completely and leave the others as they are (#+title: should stay). The other properties will be used by the converter for microdata markup.
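
The extraction rules above can be sketched as follows. This is only an illustration of the logic in Python, not soupault plugin code (soupault plugins and hooks are written in Lua); the `extract_meta` function and the `META_LINE` pattern are hypothetical names, and the sketch assumes every metadata entry sits on its own line in the form `#+key: value`.

```python
import re

# Matches a whole "#+key: value" line; key and value split at the first colon.
META_LINE = re.compile(r"^#\+(\w+):[ \t]*(.*)$")


def extract_meta(source: str) -> tuple:
    """Return (page metadata, source with the #+meta_... lines removed)."""
    keywords = {}
    kept_lines = []
    for line in source.splitlines():
        match = META_LINE.match(line)
        if match:
            key = match.group(1).lower()
            keywords[key] = match.group(2).strip()
            # Drop only the #+meta_... lines; #+title: and the other
            # keywords stay in place for the converter.
            if key.startswith("meta_"):
                continue
        kept_lines.append(line)

    meta = {
        # Fall back to #+title: when #+meta_title: is absent.
        "meta_title": keywords.get("meta_title", keywords.get("title", "")),
        "meta_description": keywords.get("meta_description", ""),
    }
    return meta, "\n".join(kept_lines) + "\n"


source = """#+meta_title: this is a title of the page
#+meta_description: this is a metadescription of the page
#+title: A simple Org Mode article for testing
#+author: Nokome Bentley

* Introduction
"""

meta, cleaned = extract_meta(source)
print(meta)
print(cleaned)
```

The cleaned text is what would be handed to the converter, and `meta` holds the two values destined for the <head>.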

Then place the title and meta description values into the corresponding tags of the page:

<head>
....
<title>{{meta_title}}</title>
<meta name="description" content="{{meta_description}}" />
....
</head>
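
Filling the placeholders could be sketched like this. It is a Python illustration only: `render_head` and `HEAD_TEMPLATE` are hypothetical names, and the `{{name}}` syntax is treated as plain text substitution rather than as the syntax of any particular template engine. The values are HTML-escaped because they end up in element text and in an attribute.

```python
import html
import re

HEAD_TEMPLATE = """<title>{{meta_title}}</title>
<meta name="description" content="{{meta_description}}" />"""


def render_head(template: str, meta: dict) -> str:
    """Substitute {{name}} placeholders with HTML-escaped values."""

    def substitute(match):
        # html.escape also escapes quotes, so the value is safe inside
        # the content="..." attribute. Unknown names become empty.
        return html.escape(meta.get(match.group(1), ""))

    return re.sub(r"\{\{(\w+)\}\}", substitute, template)


print(render_head(HEAD_TEMPLATE, {
    "meta_title": "this is a title of the page",
    "meta_description": "this is a metadescription of the page",
}))
```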

This is a basic version of the plugin, since converting from org-mode to HTML with Stencila is still in development. But it is clear how the plugin should be written; I don't think there will be much difference from the above.

@dmbaturin
Collaborator

I haven't forgotten your request.

Please remind me: the title field should go to the page <title> in its <head>, but what exactly do you want to do with the other fields?

Ideally, I'd like to see examples of source pages in the Org format and hand-written mockups of output pages you want to produce from them.

@MorphicResonance

This comment was marked as outdated.

@dmbaturin
Collaborator

Could you confirm or deny the following: an org-mode metadata entry will always start with #+, will always contain a string, and will always end with a newline? That is, will #\+(.*)\n be a safe regex for extracting metadata entries?
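
One way to check the proposed pattern is to run it against a small sample. The sketch below uses Python's `re` module purely for illustration (the hook itself would be Lua, and the regex flavor available there may behave differently); the sample text is made up. It shows two things worth keeping in mind: the pattern also matches block markers such as `#+begin_src`, and it misses a final line that has no trailing newline. The `STRICT` variant is an assumption added for comparison, not part of the proposal.

```python
import re

# The pattern as proposed above.
PROPOSED = re.compile(r"#\+(.*)\n")

sample = (
    "#+title: A simple Org Mode article for testing\n"
    "#+begin_src python\n"
    "print('hello')\n"
    "#+end_src\n"
    "#+author: Nokome Bentley"  # last line, no trailing newline
)

print(PROPOSED.findall(sample))

# A stricter variant (an assumption, not part of the proposal): anchor to
# the line start, require "key: value", and accept the end of input.
STRICT = re.compile(r"^#\+(\w+):[ \t]*(.*)$", re.MULTILINE)

print(STRICT.findall(sample))
```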

Since soupault 4.0.0 supports a pre-parse hook, it's now possible to reimplement various types of front matter with that hook. Since that hook works on the page source before it's parsed and before it's decided whether it will be indexed or not, it will also have to produce text.

Does something like this look good to you? I assume the plugin should always put the rendered HTML before the page body. Let me know what you think.

[hooks.pre-parse]
  file = "hooks/org-mode-metadata.lua"
  template = """
    <h1 id="post-title">{{title}}</h1>
    ...
  """

@MorphicResonance changed the title from "Provide plugin that gets org-mode metadata from html file." to "Schema.org structured data support (example with org-mode file)." on Apr 21, 2022
@MorphicResonance

This comment was marked as duplicate.

@MorphicResonance
Author

MorphicResonance commented Aug 5, 2022

Yes, metadata entries always start with #+ and end with a newline. Names are separated from values by a colon (:).
I don't see how this can be done with a pre-parse hook, since we need to extract the values for the meta tags, save them in some kind of global variables, delete those lines, and then send the values into the HTML tree.
So the pre-parse hook only works for extracting/deleting the lines with meta tags.
I see the render variant as a unified version of pandoc's "in the middle" Lua filters. But it is just the same dance with fake tags that I wrote about a long time ago.
