ski is a tool written in Golang for extracting structured data.
ski uses YAML to define data-extracting Executors, which are executed sequentially, like a pipeline.
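The pipeline idea can be sketched in plain Go. This is a minimal illustration, not ski's implementation: the `Executor` function type, `Pipe`, `trim`, `toInt`, and `runDemo` are hypothetical names invented for this example.

```go
package main

import (
	"context"
	"fmt"
	"strconv"
	"strings"
)

// Executor is a hypothetical stand-in for one pipeline step (not ski's type):
// it receives the previous step's result and returns its own.
type Executor func(ctx context.Context, arg any) (any, error)

// Pipe runs the steps in order, feeding each result into the next step.
func Pipe(steps ...Executor) Executor {
	return func(ctx context.Context, arg any) (any, error) {
		var err error
		for _, step := range steps {
			if arg, err = step(ctx, arg); err != nil {
				return nil, err
			}
		}
		return arg, nil
	}
}

// trim is an illustrative step that strips surrounding whitespace.
func trim(_ context.Context, arg any) (any, error) {
	s, ok := arg.(string)
	if !ok {
		return nil, fmt.Errorf("trim: want string, got %T", arg)
	}
	return strings.TrimSpace(s), nil
}

// toInt is an illustrative step that parses a string as an int.
func toInt(_ context.Context, arg any) (any, error) {
	s, ok := arg.(string)
	if !ok {
		return nil, fmt.Errorf("toInt: want string, got %T", arg)
	}
	n, err := strconv.Atoi(s)
	if err != nil {
		return nil, err
	}
	return n, nil
}

// runDemo chains the two steps and runs them on a sample input.
func runDemo() (any, error) {
	return Pipe(trim, toInt)(context.Background(), " 42 ")
}

func main() {
	out, err := runDemo()
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // 42
}
```

Each step only sees the output of the step before it, which is why the order of keys in a YAML definition matters.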
Here's a simple example that extracts the title and author of selected books from an HTML document.
$gq.elements: .books .select
$each:
  $map:
    title:
      $gq: .title
    author:
      $gq: .author
output:
[{"title":"Book 1","author":"Author 1"},{"title":"Book 2","author":"Author 2"}]
$fetch
fetches the resource from the network. The default method is GET.
$fetch: https://example.com
$kind
converts the argument to the specified type.
$raw: 123
$kind: int
$list.of
returns a list of Executor results.
$list.of:
- 123
- 456
$str.join
joins strings with the specified separator.
$list.of:
- 123
- 456
$str.join: ~
$str.split
splits a string on the specified separator.
$raw: 123~456
$str.split: ~
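For readers who think in Go, the `$str.join` and `$str.split` examples above behave roughly like `strings.Join` and `strings.Split`. The sketch below uses only the standard library and assumes the list items are already strings; `joinThenSplit` is a hypothetical helper, not part of ski.

```go
package main

import (
	"fmt"
	"strings"
)

// joinThenSplit joins the items with sep, then splits the joined
// string on the same sep, mirroring the two YAML examples above.
func joinThenSplit(items []string, sep string) (string, []string) {
	joined := strings.Join(items, sep)
	return joined, strings.Split(joined, sep)
}

func main() {
	joined, parts := joinThenSplit([]string{"123", "456"}, "~")
	fmt.Println(joined) // 123~456
	fmt.Println(parts)  // [123 456]
}
```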
$map
returns a map of Executor results. Keys and values alternate: [k1, v1, k2, v2, ...]
$map:
- 123
- 456
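The alternating key/value layout can be illustrated with a small Go helper. This is a sketch under assumptions: `pairsToMap` is a hypothetical name, and converting keys to strings with `fmt.Sprint` is a choice made for this example, not documented ski behavior.

```go
package main

import "fmt"

// pairsToMap builds a map from a flat [k1, v1, k2, v2, ...] list.
// Keys are stringified with fmt.Sprint (an assumption of this sketch).
func pairsToMap(pairs []any) (map[string]any, error) {
	if len(pairs)%2 != 0 {
		return nil, fmt.Errorf("odd number of items: %d", len(pairs))
	}
	m := make(map[string]any, len(pairs)/2)
	for i := 0; i < len(pairs); i += 2 {
		m[fmt.Sprint(pairs[i])] = pairs[i+1]
	}
	return m, nil
}

func main() {
	m, err := pairsToMap([]any{123, 456})
	if err != nil {
		panic(err)
	}
	fmt.Println(m) // map[123:456]
}
```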
$each
loops over the slice argument and executes the Executor on each item.
$list.of:
- 123
- 456
$each:
  $kind: int
$or
executes a slice of Executors in order and returns the first result that is not nil.
$or:
- $raw:
- 456
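The fallback behavior described for `$or` can be sketched as follows. The `Step` type, `Or`, `raw`, and `demoOr` are hypothetical names for this illustration, and stopping at the first error is an assumption; ski may handle errors differently.

```go
package main

import "fmt"

// Step is a hypothetical single-argument executor used only in this sketch.
type Step func(arg any) (any, error)

// Or tries each step in order and returns the first non-nil result.
// Aborting on the first error is an assumption of this sketch.
func Or(steps ...Step) Step {
	return func(arg any) (any, error) {
		for _, step := range steps {
			out, err := step(arg)
			if err != nil {
				return nil, err
			}
			if out != nil {
				return out, nil
			}
		}
		return nil, nil
	}
}

// raw returns a step that ignores its input and yields v.
func raw(v any) Step {
	return func(any) (any, error) { return v, nil }
}

// demoOr mirrors the YAML example: an empty value, then 456.
func demoOr() any {
	out, _ := Or(raw(nil), raw(456))(nil)
	return out
}

func main() {
	fmt.Println(demoOr()) // 456
}
```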
Filter the strings that contain "2" and convert them to int. Output: [123, 234]
$list.of:
- 123
- 234
- 345
$each:
  $pipe:
    $if.contains: 2
    $kind: int
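The same filter-then-convert logic, written out in plain Go for comparison. This is an equivalent sketch using the standard library, not what ski does internally; `filterToInt` is a hypothetical helper.

```go
package main

import (
	"fmt"
	"strconv"
	"strings"
)

// filterToInt keeps the items that contain sub and converts them to int.
func filterToInt(items []string, sub string) ([]int, error) {
	var out []int
	for _, s := range items {
		if !strings.Contains(s, sub) {
			continue // dropped, like an item that fails $if.contains
		}
		n, err := strconv.Atoi(s)
		if err != nil {
			return nil, err
		}
		out = append(out, n)
	}
	return out, nil
}

func main() {
	out, err := filterToInt([]string{"123", "234", "345"}, "2")
	if err != nil {
		panic(err)
	}
	fmt.Println(out) // [123 234]
}
```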
Filter the strings that contain "bar". Output: {"bar": "some value"}
$list.of:
- foo
- bar
- baz
$map:
  $if.contains: bar
  $raw: some value
- gq: similar to jQuery expressions.
- jq: JSONPath expressions.
- js: JavaScript expressions.
- regex: regular expressions.
- xpath: XPath expressions.
gq syntax consists of a selector followed by functions, separated by ->.
$gq
returns the text of the elements matched by the selector. If exactly one node matches, returns its text rather than a list.
$gq: .books .title -> text
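To make the `selector -> function` shape concrete, here is a naive Go sketch that splits such an expression into its parts. `parseGQ` is a hypothetical helper, not ski's parser; it does not handle a `->` that appears inside a quoted argument.

```go
package main

import (
	"fmt"
	"strings"
)

// parseGQ splits "selector -> fn1 -> fn2" into the selector and the
// function names. Naive: it splits on every "->" it finds.
func parseGQ(expr string) (string, []string) {
	parts := strings.Split(expr, "->")
	selector := strings.TrimSpace(parts[0])
	funcs := make([]string, 0, len(parts)-1)
	for _, p := range parts[1:] {
		funcs = append(funcs, strings.TrimSpace(p))
	}
	return selector, funcs
}

func main() {
	selector, funcs := parseGQ(".books .title -> text")
	fmt.Printf("%q %q\n", selector, funcs) // ".books .title" ["text"]
}
```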
$gq.element
returns the first element of the selector.
$gq.element: .books .select
$gq.elements
returns all elements of the selector.
$gq.elements: .books
$jq
returns the value of the JSONPath expression.
$jq: $.books[0].author
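To show what the path `$.books[0].author` selects, the sketch below resolves the same location with Go's `encoding/json`. The sample document, the `library` type, and `firstAuthor` are illustrative assumptions; ski evaluates the JSONPath expression itself rather than decoding into a struct.

```go
package main

import (
	"encoding/json"
	"errors"
	"fmt"
)

// library models only the fields the path touches.
type library struct {
	Books []struct {
		Author string `json:"author"`
	} `json:"books"`
}

// firstAuthor returns the value at $.books[0].author.
func firstAuthor(doc string) (string, error) {
	var lib library
	if err := json.Unmarshal([]byte(doc), &lib); err != nil {
		return "", err
	}
	if len(lib.Books) == 0 {
		return "", errors.New("no books")
	}
	return lib.Books[0].Author, nil
}

func main() {
	// Sample input invented for this example.
	doc := `{"books":[{"author":"Author 1"},{"author":"Author 2"}]}`
	author, err := firstAuthor(doc)
	if err != nil {
		panic(err)
	}
	fmt.Println(author) // Author 1
}
```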
$js
returns the value of the JavaScript expression.
$js: export default (ctx) => ctx.get('content')
available regex flags:
- i Ignore case
- m Multiple line
- n Explicit capture
- c Compiled
- s Single line
- x Ignore pattern whitespace
- r Right to left
- d Debug
- e ECMAScript
- u Unicode
$regex.replace
/expr/replace/flags{start,count}
replaces matches of the pattern in the string.
$regex.replace: /[^\d]/
$regex.match
/expr/flags{start,count}
returns the matches of the pattern in the string.
$regex.match: /\\//1
$regex.assert
/expr/message/flags
asserts that the string matches the pattern, failing with the message otherwise.
$regex.assert: /\d+/number not found/
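The replace and assert examples can be approximated with Go's standard `regexp` package. This is only a rough analogue: the standard package uses RE2 syntax and does not support all of the flags listed above (for example right-to-left matching), and the empty replacement string is an assumption. `stripNonDigits` and `assertNumber` are hypothetical helpers.

```go
package main

import (
	"errors"
	"fmt"
	"regexp"
)

var (
	nonDigit = regexp.MustCompile(`[^\d]`)
	digits   = regexp.MustCompile(`\d+`)
)

// stripNonDigits removes every character that is not a digit,
// similar in spirit to the $regex.replace example.
func stripNonDigits(s string) string {
	return nonDigit.ReplaceAllString(s, "")
}

// assertNumber fails with a message when s contains no digits,
// similar in spirit to the $regex.assert example.
func assertNumber(s string) error {
	if !digits.MatchString(s) {
		return errors.New("number not found")
	}
	return nil
}

func main() {
	fmt.Println(stripNonDigits("tel: 123-456")) // 123456
	fmt.Println(assertNumber("abc"))            // number not found
}
```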
$xpath
returns the text of the elements matched by the XPath expression. If exactly one node matches, returns its text rather than a list.
$xpath: //div//p
$xpath.element
returns the first element matched by the XPath expression.
$xpath.element: //div//p
$xpath.elements
returns all elements matched by the XPath expression.
$xpath.elements: //div//p
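package main

import (
	"context"
	"fmt"

	"github.com/shiroyk/ski"
)

// content is the input document passed to the compiled Executor.
const content = `...`

// source is the YAML definition of the Executors.
const source = ``

func main() {
	// Compile the YAML source into an Executor.
	executor, err := ski.Compile(source)
	if err != nil {
		panic(err)
	}

	// Run the Executor against the content.
	result, err := executor.Exec(context.Background(), content)
	if err != nil {
		panic(err)
	}
	fmt.Println(result)
}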
ski is distributed under the MIT license.