JavaScript regex: extracting the filename from a Content-Disposition header

Question

20

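The Content-Disposition header contains the filename, which is easy to extract, but the name is sometimes wrapped in double quotes, sometimes unquoted, and other variants probably exist too. Could someone write a regular expression that works in all of these cases?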

Content-Disposition: attachment; filename=content.txt

The following are possible target strings:

attachment; filename=content.txt
attachment; filename*=UTF-8''filename.txt
attachment; filename="EURO rates"; filename*=utf-8''%e2%82%ac%20rates
attachment; filename="omáèka.jpg"
and some other combinations might also be there

- adnan kamili

7 Answers

19

Slightly modified to fit my use case (strips all quotes and UTF tags): filename\*?=['"]?(?:UTF-\d['"]*)?([^;\r\n"']*)['"]?;?

Reference link: https://regex101.com/r/UhCzyI/3
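
To make the behavior concrete, here is a minimal sketch that runs the pattern above against the strings from the question. The helper name `extractStripped` is hypothetical and not part of the answer; the pattern itself is copied unchanged.

```javascript
// Pattern copied from the answer above; the helper name is hypothetical.
const STRIPPING_RE = /filename\*?=['"]?(?:UTF-\d['"]*)?([^;\r\n"']*)['"]?;?/;

// Returns the first filename found in the header value, or null.
function extractStripped(headerValue) {
  const match = STRIPPING_RE.exec(headerValue);
  return match ? match[1] : null;
}

console.log(extractStripped('attachment; filename=content.txt'));          // content.txt
console.log(extractStripped("attachment; filename*=UTF-8''filename.txt")); // filename.txt
console.log(extractStripped('attachment; filename="omáèka.jpg"'));         // omáèka.jpg

// Group 1 excludes both quote characters, so an apostrophe cuts the name short.
console.log(extractStripped('attachment; filename="it\'s.txt"'));          // it
```

Because the capturing group excludes `'` and `"`, no post-processing is needed to drop quotes, at the cost of truncating names that contain either character.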

- h0wXD

1

Fails if the filename contains '. - FeeFiFoFum

10

/filename[^;=\n]*=(?:(\\?['"])(.*?)\1|(?:[^\s]+'.*?')?([^;\n]*))/i

https://regex101.com/r/hJ7tS6/51

Edit: you can also use this parser: https://github.com/Rob--W/open-in-browser/blob/master/extension/content-disposition.js
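
As a rough usage sketch for the regex in this answer: a quoted value lands in group 2 and an unquoted value (with any charset'language' prefix skipped) lands in group 3, so a caller has to check both. The helper name `extractByGroups` is an assumption made for illustration.

```javascript
// Pattern copied from the answer above; the helper name is hypothetical.
const GROUPED_RE = /filename[^;=\n]*=(?:(\\?['"])(.*?)\1|(?:[^\s]+'.*?')?([^;\n]*))/i;

// Returns the first filename found in the header value, or null.
function extractByGroups(headerValue) {
  const match = GROUPED_RE.exec(headerValue);
  if (!match) return null;
  // Group 2 is set for quoted values, group 3 for unquoted ones.
  return match[2] !== undefined ? match[2] : match[3];
}

console.log(extractByGroups('attachment; filename=content.txt'));          // content.txt
console.log(extractByGroups("attachment; filename*=UTF-8''filename.txt")); // filename.txt
console.log(extractByGroups('attachment; filename="omáèka.jpg"'));         // omáèka.jpg
```

Only the first `filename` parameter is returned here, so a header carrying both `filename` and `filename*` yields whichever comes first.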

- def00111

5

filename[^;\n]*=(UTF-\d['"]*)?((['"]).*?[.]$\2|[^;\n]*)?

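I have upgraded Robin's solution to do two more things:

Capture the filename even if it contains escaped double quotes.
Capture the UTF-8'' part as a separate group.

This is an ECMAScript solution.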

https://regex101.com/r/7Csdp4/3/

- kiripk

I modified your regex to allow whitespace after the equals sign and before the name: filename[^;\n]*=\s*(UTF-\d['"]*)?((['"]).*?[.]$\2|[^;\n]*)? - Carlos Araujo

3

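Disclaimer: the following answer only works with PCRE-style engines (e.g. Python/PHP). If you must use JavaScript, use Robin's answer.

This modified version of Robin's regex strips the quotes: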

filename[^;\n=]*=(['\"])*(?:utf-8\'\')?(.*)(?(1)\1|)

filename        # match filename, followed by
[^;=\n]*        # anything but a ;, a = or a newline
=
(['"])*         # either single or double quote, put it in capturing group 1
(?:utf-8\'\')?  # removes the utf-8 part from the match
(.*)            # second capturing group, will contain the filename
(?(1)\1|)       # if clause: if first capturing group is not empty,
                # match it again (the quotes), else match nothing

The filename is in the second capturing group. Link: https://regex101.com/r/hJ7tS6/28

- Antoine

This requires a PCRE-style regex engine; the OP asked for JS. - miqh

@miqid True, sorry; I have edited my answer. Although I was using Python, I think my version can serve as a general solution for those not using JavaScript. - Antoine

res = re.search(r"filename[^;\n=]*=(['\"])*(.*)(?(1)\1|)", string); res.group(2) - maq

1

Here is my regex. It works in JavaScript.

filename\*?=((['"])[\s\S]*?\2|[^;\n]*)

I have used this in my project.
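
A short sketch of how this pattern might be used, since group 1 keeps the surrounding quotes and, for `filename*`, the charset prefix. The helper name `extractRawFilename` and the quote-stripping step are assumptions added for illustration, not part of the answer.

```javascript
// Pattern copied from the answer above; the helper name is hypothetical.
const RAW_RE = /filename\*?=((['"])[\s\S]*?\2|[^;\n]*)/;

// Returns the first filename value with surrounding quotes removed, or null.
function extractRawFilename(headerValue) {
  const match = RAW_RE.exec(headerValue);
  if (!match) return null;
  // Group 2 holds the opening quote only when the value was quoted.
  return match[2] ? match[1].slice(1, -1) : match[1];
}

console.log(extractRawFilename('attachment; filename=content.txt'));   // content.txt
console.log(extractRawFilename('attachment; filename="EURO rates"'));  // EURO rates

// The charset'language' prefix of filename* is left in place by this pattern.
console.log(extractRawFilename("attachment; filename*=UTF-8''filename.txt")); // UTF-8''filename.txt
```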

- Herbert Young

0

I wrote a regex that uses a named group called filename to find these names.

/(?<=filename(?:=|\*=(?:[\w\-]+'')))["']?(?<filename>[^"';\n]+)["']?/g

const regex = /(?<=filename(?:=|\*=(?:[\w\-]+'')))["']?(?<filename>[^"';\n]+)["']?/g

const filenames = `
attachment; filename=content.txt
attachment; filename*=UTF-8''filename.txt
attachment; filename="EURO rates"; filename*=utf-8''%e2%82%ac%20rates
attachment; filename="omáèka.jpg"
`

function logMatches(){
  const array = []

  filenames.split("\n").forEach(line => {
    if(!line.trim()) return

    const matches = line.matchAll(regex)
    const groups = Array.from(matches).map(match => match?.groups?.filename)

    array.push(groups.length === 1 ? groups[0] : groups)
  })

  console.log(array)
}

logMatches()

- Alphka

Content provided by Stack Overflow.

- Robin · Accepted Answer

You can try it this way:

filename[^;=\n]*=((['"]).*?\2|[^;\n]*)

filename      # match filename, followed by
[^;=\n]*      # anything but a ;, a = or a newline
=
(             # first capturing group
    (['"])    # either single or double quote, put it in capturing group 2
    .*?       # anything up until the first...
    \2        # matching quote (single if we found single, double if we find double)
|             # OR
    [^;\n]*   # anything but a ; or a newline
)

Your filename is in the first capturing group: http://regex101.com/r/hJ7tS6
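
The following is a minimal sketch of one way to apply this pattern in JavaScript, not a definitive implementation. It makes several assumptions beyond the answer: the `g` flag is added so `matchAll` can visit every filename parameter, the helper names are hypothetical, surrounding quotes are stripped after matching, a `filename*` value is preferred over a plain `filename` when both are present, and only UTF-8 percent-encoding is decoded.

```javascript
// Robin's pattern with the `g` flag added for matchAll (an addition for this sketch).
const FILENAME_RE = /filename[^;=\n]*=((['"]).*?\2|[^;\n]*)/g;

// Hypothetical helper: decodes the charset'language'value form used by filename*.
function decodeExtendedValue(value) {
  const parts = /^([\w-]+)'[^']*'(.*)$/.exec(value);
  if (!parts) return value;
  const [, charset, encoded] = parts;
  if (charset.toLowerCase() !== 'utf-8') return encoded; // only UTF-8 is handled here
  try {
    return decodeURIComponent(encoded);
  } catch {
    return encoded; // malformed percent-encoding: keep the raw text
  }
}

// Hypothetical helper: returns the decoded filename* if present, else filename, else null.
function filenameFromContentDisposition(headerValue) {
  let plain = null;
  let extended = null;
  for (const match of headerValue.matchAll(FILENAME_RE)) {
    // Group 2 is set only when the value was quoted; drop the quotes in that case.
    const value = match[2] ? match[1].slice(1, -1) : match[1];
    if (match[0].startsWith('filename*')) {
      extended = decodeExtendedValue(value);
    } else {
      plain = value;
    }
  }
  return extended ?? plain;
}

console.log(filenameFromContentDisposition('attachment; filename=content.txt')); // content.txt
console.log(filenameFromContentDisposition(
  `attachment; filename="EURO rates"; filename*=utf-8''%e2%82%ac%20rates`
)); // € rates
```

Matching is still case-sensitive, as in the original pattern, so a header that spells the parameter `Filename` would need the `i` flag as well.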