Warm tip: This article is reproduced from serverfault.com, please click

python-javascript,Hugo:将无序列表转换为 JSON

(python - javascript, Hugo: convert unordered list to the JSON)

发布于 2021-01-05 03:42:05

TLDR

我想转换下面的代码

<nav id="TableOfContents">
  <ul>
    <li><a href="#js">JS</a>
      <ul>
        <li><a href="#h3-1">H3-1</a></li>
        <li><a href="#h3-2">H3-2</a></li>
      </ul>
    </li>
    <li><a href="#python">Python</a>
      <ul>
        <li><a href="#h3-1-1">H3-1</a></li>
        <li><a href="#h3-2-1">H3-2</a></li>
      </ul>
    </li>
  </ul>
</nav>

使用 Javascript(或 Hugo)转换为以下 JSON 格式

{"t": "root", "d": 0, "v": "", "c": [
    {"t": "h", "d": 1, "v": "<a href=\"#js\">JS</a>", "c": [
            {"t": "h", "d": 2, "v": "<a href=\"#h3-1\">H3-1</a>"}, 
            {"t": "h", "d": 2, "v": "<a href=\"#h3-2\">H3-2</a>"}
        ]
    }, 
    {"t": "h", "d": 1, "v": "<a href=\"#python\">Python</a>", "c": [
            {"t": "h", "d": 2, "v": "<a href=\"#h3-1-1\">H3-1</a>"}, 
            {"t": "h", "d": 2, "v": "<a href=\"#h3-2-1\">H3-2</a>"}
        ]
     }
]}

上面只是一个例子,你提供的函数应该可以把任何类似的结构转换成像上面那样的JSON字符串没有任何问题(其实唯一的要求是不要使用硬编码来完成)

你可以像下面这样输出你的代码

<!DOCTYPE html>
<html>
<head>
  <script>
    (()=>{
      var input_text = `<nav id="TableOfContents">...`;
      var output = my_func(input_text);
      console.log(output); /* expected result: {"t": "root", "d": 0, "v": "", "c": [ ... */
    }
    )()

    function my_func(text) {
      /* ... */
    }
  </script>
</head>
</html>

很长的故事

我想做的事?

我想Hugo使用思维导图(markmapmarkmap-github并将其应用于文章(single.html)作为目录

Hugo 已经通过TableOfContents提供了 TOC

即我的.md

## JS

### H3-1

### H3-2

## Python

### H3-1

### H3-2

然后{{ .TableOfContents }}将输出

<nav id="TableOfContents">
  <ul>
    <li><a href="#js">JS</a>
      <ul>
        <li><a href="#h3-1">H3-1</a></li>
        <li><a href="#h3-2">H3-2</a></li>
      </ul>
    </li>
    <li><a href="#python">Python</a>
      <ul>
        <li><a href="#h3-1-1">H3-1</a></li>
        <li><a href="#h3-2-1">H3-2</a></li>
      </ul>
    </li>
  </ul>
</nav>

但是,如果我使用标记映射,那么我必须向它提供一个 JSON 字符串,如下所示,

{"t": "root", "d": 0, "v": "", "c": [
    {"t": "h", "d": 1, "v": "<a href=\"#js\">JS</a>", "c": [
            {"t": "h", "d": 2, "v": "<a href=\"#h3-1\">H3-1</a>"}, 
            {"t": "h", "d": 2, "v": "<a href=\"#h3-2\">H3-2</a>"}
        ]
    }, 
    {"t": "h", "d": 1, "v": "<a href=\"#python\">Python</a>", "c": [
            {"t": "h", "d": 2, "v": "<a href=\"#h3-1-1\">H3-1</a>"}, 
            {"t": "h", "d": 2, "v": "<a href=\"#h3-2-1\">H3-2</a>"}
        ]
     }
]}

我做了什么?

我想只用 Hugo 的函数来做,但失败了(逻辑上可行,但语法上难以实现)

所以我把希望寄托在javascript上,但一直以来,我的JS都是找别人的代码修改,这个例子对我来说是一个全新的例子;我无法开始(坦率地说,我是 JS 的门外汉)。

下面是我用Python实现的。

from bs4 import BeautifulSoup, Tag
from typing import Union
import os
from pathlib import Path
import json


def main():
    data = """<nav id="TableOfContents">
      <ul>
        <li><a href="#js">JS</a>
          <ul>
            <li><a href="#h3-1">H3-1</a></li>
            <li><a href="#h3-2">H3-2</a></li>
          </ul>
        </li>
        <li><a href="#python">Python</a>
          <ul>
            <li><a href="#h3-1-1">H3-1</a></li>
            <li><a href="#h3-2-1">H3-2</a></li>
          </ul>
        </li>
      </ul>
    </nav>"""
    soup = BeautifulSoup(data, "lxml")

    sub_list = []
    cur_level = 0
    dict_tree = dict(t='root', d=cur_level, v='', c=sub_list)
    root_ul: Tag = soup.find('ul')
    
    toc2json(root_ul, cur_level + 1, sub_list)  # <-- core
    
    toc_json: str = json.dumps(dict_tree)
    print(toc_json)
    output_file = Path('test.temp.html')
    create_markmap_html(toc_json, output_file)
    os.startfile(output_file)


def toc2json(tag: Tag, cur_level: int, sub_list):
    for li in tag:
        if not isinstance(li, Tag):
            continue
        a: Tag = li.find('a')
        ul: Union[Tag, None] = li.find('ul')
        if ul:
            new_sub_list = []
            toc2json(ul, cur_level + 1, new_sub_list)
            cur_obj = dict(t='h', d=cur_level, v=str(a), c=new_sub_list)
        else:
            cur_obj = dict(t='h', d=cur_level, v=str(a))
        sub_list.append(cur_obj)


def create_markmap_html(json_string: str, output_file: Path):
    import jinja2
    markmap_template = jinja2.Template("""
    <!DOCTYPE html>
    <html lang=en>
    <head>
      <script src="https://d3js.org/d3.v6.min.js"></script>
      <script src="https://cdn.jsdelivr.net/npm/markmap-view@0.2.0"></script>
      <style>      
      .mindmap {
        width: 100vw;
        height: 100vh;
      }
      </style>
    </head>
    <body>
      <main>
        <svg id="mindmap-test" class="mindmap"></svg>
      </main>
      
      <script>
        (
          (e, json_data)=>{
            const{Markmap:r}=e();
            window.mm=r.create("svg#mindmap-test",null,json_data);
          }
        )(
          ()=>window.markmap,
          {{ input_json }}
         );
      </script>
    </body>
    </html>
    """)

    html_string = markmap_template.render(dict(input_json=json_string))

    with open(output_file, 'w', encoding='utf-8') as f:
        f.write(html_string)


if __name__ == '__main__':
    main()

为了这小小的努力,请帮帮我。

Questioner
Carson
Viewed
11
1,772 2021-01-07 18:00:33

如有必要,我很乐意继续改进代码:)

class Parser {
  constructor(htmlExpr){
    this.result = {t: 'root', d: 0, v: "", c:[]};
    this.data = htmlExpr.split('\n').map(row => row.trim()) // to lines ?
    this.open = RegExp('^<(?<main>[^\/>]*)>')           // some open tag
    this.close = RegExp('^<\/(?<main>[^>]*)>')          // some close tag
    this.target = RegExp('(?<main><a[^>]*>[^<]*<\/a>)') // what we looking for
  }
  test(str){
    let [o, c, v] = [ // all matches
      this.open.exec(str),
      this.close.exec(str),
      this.target.exec(str)
    ];
    // extract tagNames and value
    if (o) o = o.groups.main
    if (c) c = c.groups.main
    if (v) v = v.groups.main
    return [o, c, v];
  }
  parse(){
    const parents = [];
    let level = 0;
    let lastNode = this.result;
    let length = this.data.length;
    for(let i = 0; i < length; i++){
      const [o, c, v] = this.test(this.data[i])
      if (o === 'ul') {
        level +=1;
        lastNode.c=[]
        parents.push(lastNode.c)
      } else if (c === 'ul') {
        level -=1;
        parents.pop()
      }
      if (v) {
        lastNode = {t: 'h', d: level, v}; // template
        parents[level - 1].push(lastNode) // insert to result
      }
    }
    return this.result;
  }
}

let htmlExpr = `<nav id="TableOfContents">
      <ul>
        <li><a href="#js">JS</a>
          <ul>
            <li><a href="#h3-1">H3-1</a></li>
            <li><a href="#h3-2">H3-2</a></li>
          </ul>
        </li>
        <li><a href="#python">Python</a>
          <ul>
            <li><a href="#h3-1-1">H3-1</a></li>
            <li><a href="#h3-2-1">H3-2</a></li>
          </ul>
        </li>
      </ul>
    </nav>`
const parser = new Parser(htmlExpr);
const test = parser.parse();
console.log(JSON.stringify(test, null, 2))