Warm tip: This article is reproduced from serverfault.com, please click

BigQuery REGEX get text between 2 characters

发布于 2020-11-30 20:14:31

From this string :

gs://analytics_models_from_g_rdd_ga/9g653798d-7e13-4238-55443-6db72ace9911$$$$events_today.csv

I'm trying to extract this text :

9g653798d-7e13-4238-55443-6db72ace9911

and in separate REGEX function events_today

The rolls for the first string:

  1. The string will always end with $$$$.
  2. The string will always start with ga/.
  3. The extracted text length can be changed.

I sew several tutorials and red regex references but still not succeed I thought the solution will be /\\s*(.*?)\\s*$$$$ but is not.

In general, to extract a string between 2 strings

Questioner
idan
Viewed
0
Wiktor Stribiżew 2020-12-01 04:51:29

You can use

REGEXP_REPLACE(col, r".*ga/([^/$]*)\${4}.*", r"\1")

See the regex demo.

The identical, but a bit simplified since .* will become redundant, pattern can be used in REGEGEXP_EXTRACT:

REGEXP_EXTRACT(col, r"ga/([^/$]*)\${4}")

Pattern details:

  • .* - any zero or more chars other than line break chars, as many as possible
  • ga/ - ga and a slash
  • ([^/$]*) - Group 1: any zero or more chars other than / and $
  • \${4} - four $ symbols
  • .* - any zero or more chars other than line break chars, as many as possible.

The replacement is the Group 1 backreference, \1, that replaces the whole match with the contents of the group.