Schema.org structured data support (example with org-mode file). #39

MorphicResonance · 2021-11-05T21:37:57Z

Processing metadata from org-mode is part of my attempt to create org-structured, "content first", search-engine-friendly web pages. But the question is broader and relevant to all users.
With soupault we chose plain, conventional HTML for design and abandoned templates, their variations, and other opinionated conventions.
The widely used, conventional way to mark up content in machine-readable form is the one developed by schema.org.
After some progress, I think these are the simple requirements for a plugin that transfers meta tags into the <head> section of a web page:

Data for meta tags such as the title and meta description.
Machine-readable data for search engine robots, as JSON-LD.

The good news is that it will likely be possible to convert the input text directly, without having to write a separate YAML block for JSON-LD. I talked to the developers from Stencila, and they will take care of it.

That leaves only the first task, extracting data for the meta tags. The second item is handled by the converter. So:

If the user does not want JSON-LD in the web page, only microdata formatting, then the second task is not needed. Just extract the meta tags and run the converter to HTML.
If the user chooses the JSON-LD format, then it needs to be taken from the converter's output and placed in the designated section of the page (usually the <head>).
Extract meta tags --> run the converter to convert the input into JSON --> run the converter again to convert the input into HTML.
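For the JSON-LD case, the "put it in the <head>" step could look roughly like the sketch below. It is written in Python only for illustration (an actual soupault plugin would be written in Lua), and the function name `embed_json_ld` is hypothetical. It assumes the converter's JSON output has already been parsed into a dictionary, and it relies on the standard `<script type="application/ld+json">` element for embedding JSON-LD in HTML.

```python
import json


def embed_json_ld(html: str, data: dict) -> str:
    """Insert a JSON-LD <script> element just before </head>.

    Hypothetical helper: `data` stands in for whatever JSON the
    converter produces for the page.
    """
    # Escape "</" so a string value cannot close the <script> element early.
    # "\/" is a valid JSON escape for "/", so the payload still parses.
    payload = json.dumps(data, indent=2).replace("</", "<\\/")
    script = f'<script type="application/ld+json">\n{payload}\n</script>\n'

    head_end = html.find("</head>")
    if head_end == -1:
        raise ValueError("page has no </head> to insert before")
    return html[:head_end] + script + html[head_end:]
```

String slicing keeps the sketch short; a real plugin would more likely insert the element through the HTML tree that soupault already builds.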

My case is without JSON-LD and uses microdata. I am just noting the JSON-LD case for reference.
This is the input file:
#+begin_example

#+meta_title: this is a title of the page
#+meta_description: this is a metadescription of the page
#+title: A simple Org Mode article for testing
#+author: Nokome Bentley

* Introduction

A simple Org Mode article for testing. When making changes please note
that test snapshots based on this fixture may need to be updated.

* Methods

This is the methods section.

* Results

The results include a table (Table 1).

| Group | Value |
|-------+-------|
| A     | 1.1   |
| B     | 2.2   |

* Discussion

This is the discussion section.

#+end_example

The plugin should take "this is a title of the page" from #+meta_title: and "this is a metadescription of the page" from #+meta_description:.
If #+meta_title: does not exist, take the value from #+title: instead (in that case the web page title and the article title will be identical).

Then delete the #+meta_... lines completely and leave everything else as is (#+title: should stay). The other properties will be used by the converter for microdata markup.

Then place the values of the title/metadescription variables into the title and meta description tags of the page.

<head>
....
<title>{{meta_title}}</title>
<meta name="description" content="{{meta_description}}" />
....
</head>
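The extraction steps described above can be sketched as follows. This is an illustration in Python, not the plugin itself (a soupault plugin would be written in Lua), and the names `split_meta` and `render_head` are hypothetical. It assumes every metadata entry sits on its own line in the form `#+name: value`, and it does not try to detect whether such a line is inside an example block.

```python
import html
import re

# Assumed line shapes: "#+meta_<name>: value" and "#+title: value".
META_LINE = re.compile(r"^#\+(meta_\w+):[ \t]*(.*)$", re.IGNORECASE)
TITLE_LINE = re.compile(r"^#\+title:[ \t]*(.*)$", re.IGNORECASE)


def split_meta(org_text):
    """Return (meta, remaining_text) with all #+meta_... lines removed."""
    meta = {}
    kept = []
    title = None
    for line in org_text.splitlines():
        m = META_LINE.match(line)
        if m:
            meta[m.group(1).lower()] = m.group(2).strip()
            continue  # drop the #+meta_... line from the output
        t = TITLE_LINE.match(line)
        if t and title is None:
            title = t.group(1).strip()  # remember it, but keep the line
        kept.append(line)
    # Fall back to #+title: when no #+meta_title: was given.
    if "meta_title" not in meta and title is not None:
        meta["meta_title"] = title
    return meta, "\n".join(kept) + "\n"


def render_head(meta):
    """Fill the <title> and meta description tags, escaping the values."""
    title = html.escape(meta.get("meta_title", ""))
    desc = html.escape(meta.get("meta_description", ""), quote=True)
    return f'<title>{title}</title>\n<meta name="description" content="{desc}" />'
```

The remaining text, with `#+title:` and `#+author:` intact, is what would be handed to the converter for microdata markup.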

This is a basic version of the plugin, since conversion from org-mode to HTML by Stencila is still in development. But it is clear how the plugin should be written, and I don't think it will differ much from the above.

dmbaturin · 2022-02-05T07:29:03Z

I haven't forgotten your request.

Please remind me: the title field should go into the page <title> in its <head>, but what exactly do you want to do with the other fields?

Ideally, I'd like to see examples of source pages in the Org format and hand-written mockups of output pages you want to produce from them.

dmbaturin · 2022-04-06T09:46:47Z

Could you confirm or deny the following: an org-mode metadata entry will always start with #+, will always contain a string, and will always end with a newline? That is, will #\+(.*)\n be a safe regex for extracting metadata entries?
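One way to check the proposed pattern is to run it against a few org-mode lines. The sketch below uses Python's `re` module purely for illustration; the stricter pattern `STRICT` is an assumption of this example, not something proposed in the thread. It shows that the loose pattern also picks up block delimiters such as `#+begin_example`, which have no `name: value` shape.

```python
import re

# The pattern from the question above.
LOOSE = re.compile(r"#\+(.*)\n")

# Hypothetical stricter variant: anchored at line start and
# requiring a "name:" part before the value.
STRICT = re.compile(r"^#\+([A-Za-z_]+):[ \t]*(.*)$", re.MULTILINE)

sample = (
    "#+title: A simple Org Mode article for testing\n"
    "#+begin_example\n"
    "some text\n"
    "#+end_example\n"
)

loose_hits = LOOSE.findall(sample)
strict_hits = STRICT.findall(sample)

print(loose_hits)   # three hits, including the two block delimiters
print(strict_hits)  # only the title entry, split into name and value
```

Whether block delimiters should count as metadata entries is a design decision for the hook; the stricter pattern simply leaves them in the page source.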

Since soupault 4.0.0 supports a pre-parse hook, it's now possible to reimplement various types of front matter with that hook. Since that hook works on the page source before it's parsed and before it's decided whether it will be indexed or not, it will also have to produce text.

Does something like this look good to you? I assume the plugin should always put the rendered HTML before the page body. Let me know what you think.

[hooks.pre-parse]
  file = "hooks/org-mode-metadata.lua"
  template = """
    <h1 id="post-title">{{title}}</h1>
    ...
  """

MorphicResonance · 2022-08-05T00:23:56Z

Yes, meta tags always start with #+ and end with a newline. Names are separated from values by a colon (:).
I don't see how it can be done with the pre-parse hook, since we need to extract the values for the meta tags, save them in some kind of global variables, delete those lines, and then insert the values into the HTML tree.
So the pre-parse hook works only for extracting and deleting the lines with meta tags.
I see the render variant as a unified version of pandoc's "in the middle" Lua filters. But it is just the same dance with fake tags that I wrote about a long time ago.

MorphicResonance changed the title ~~Provide plugin that gets org-mode metadata from html file.~~ Schema.org structured data support (example with org-mode file). Apr 21, 2022
