Note: This article is reproduced from serverfault.com.

escaping headless javascript unicode decoding

Unicode characters cannot be decoded

Posted on 2020-11-29 11:48:49

I use browserless.js (headless Chrome) to fetch the HTML of a website, and then use a regular expression to find certain image URLs.

One example is the following:

https://vignette.wikia.nocookie.net/moviepedia/images/8/88/Adrien_Brody.jpg/revision/latest/top-crop/width/360/height/450?cb\u003d20141113231800\u0026path-prefix\u003dde

The URL contains escaped Unicode sequences such as \u003d, which should be decoded (in this case to =). I want to embed these images in a site, and some of them cannot be displayed without decoding (the one above, for example: pasting the URL as-is returns broken-image.webp).

I have tried several things, but none of them worked:

JSON.parse(JSON.stringify(...))
String.prototype.normalize()
decodeURIComponent

Curiously, a regular expression for "\u003d" (i.e. "\\u003d" in JS) does not match the string above, but "u003d" does.
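One likely explanation for the regex behaviour, sketched under the assumption that the pattern was built from the string "\\u003d": that string is the six-character pattern \u003d, which the regex engine itself treats as an escape for "=", so it never looks for a literal backslash. The variable names below are illustrative only.

```javascript
// The scraped text contains a literal backslash followed by "u003d".
// In a JS string literal that backslash has to be written as "\\".
const raw = "cb\\u003d20141113231800";

// Pattern text is \u003d: the regex engine reads this as the character "=".
const asEscape = new RegExp("\\u003d");

// Pattern text is \\u003d: an escaped backslash, then the plain characters u003d.
const asLiteral = new RegExp("\\\\u003d");

console.log(asEscape.test(raw));                 // false
console.log(asLiteral.test(raw));                // true
console.log(/u003d/.test(raw));                  // true
console.log(asEscape.test("cb=20141113231800")); // true
```

So matching the escaped form needs a doubled backslash in the pattern: "\\\\u003d" as a string, or /\\u003d/ as a regex literal.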

This is all very strange. My current guess is that browserless does some formatting behind the scenes: when I log the URL to the console and copy-paste it somewhere else, every method mentioned above decodes it.
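A possible reading of the copy-paste observation, as a sketch rather than a diagnosis of browserless: when the logged text is pasted into source code as a string literal, the JavaScript parser decodes \u003d before any of the listed methods run, so they only appear to work. On the scraped runtime string, which holds a real backslash, all three leave the text unchanged. The names `scraped` and `pasted` are made up for this example.

```javascript
// Runtime string as scraped: holds a literal backslash (written "\\" in source).
const scraped = "cb\\u003d20141113231800";

// The same text pasted into source code: the JS parser turns \u003d into "=".
const pasted = "cb\u003d20141113231800";

// stringify escapes the backslash and parse restores it, so this is a round trip.
console.log(JSON.parse(JSON.stringify(scraped)) === scraped); // true

// normalize() only applies Unicode normalization forms; it ignores backslash escapes.
console.log(scraped.normalize() === scraped);                 // true

// decodeURIComponent only decodes percent-encoded sequences such as %3D.
console.log(decodeURIComponent(scraped) === scraped);         // true

// The pasted literal was already decoded at parse time.
console.log(pasted === "cb=20141113231800");                  // true
```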

I hope that someone can help me on this.

Questioner

Martin Brandenburg

community wiki 2021-03-11 15:59:40

Just to mark this one as answered. Thomas replied:

JSON.parse(`"${url}"`)
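For illustration, here is how that one-liner behaves on the URL from the question. Wrapping the text in double quotes turns it into a JSON string literal, and the JSON parser decodes the \uXXXX sequences. This assumes the URL contains no double quotes or stray backslashes, which would make JSON.parse throw. The helper `decodeUnicodeEscapes` is a hypothetical alternative, not part of the original answer, that touches only \uXXXX sequences.

```javascript
// Hypothetical helper: decode only \uXXXX sequences and leave everything else alone.
function decodeUnicodeEscapes(text) {
  return text.replace(/\\u([0-9a-fA-F]{4})/g, (_, hex) =>
    String.fromCharCode(parseInt(hex, 16))
  );
}

// The scraped URL; "\\" in source code stands for one literal backslash at runtime.
const url =
  "https://vignette.wikia.nocookie.net/moviepedia/images/8/88/Adrien_Brody.jpg" +
  "/revision/latest/top-crop/width/360/height/450" +
  "?cb\\u003d20141113231800\\u0026path-prefix\\u003dde";

// Approach from the answer: let the JSON parser decode the escapes.
const viaJson = JSON.parse(`"${url}"`);

// Alternative sketch: replace the escapes with a regular expression.
const viaRegex = decodeUnicodeEscapes(url);

console.log(viaJson);              // ends with ?cb=20141113231800&path-prefix=de
console.log(viaJson === viaRegex); // true
```

The regex variant may be preferable when the scraped text is not guaranteed to be a valid JSON string body.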
