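In a Bash script, I would like to split a line into pieces and store them in an array.
For example, given this line:
Paris, France, Europe
I would like the resulting array to look like this:
array[0] = Paris
array[1] = France
array[2] = Europe
A simple implementation is preferable; speed does not matter. How can I do it?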
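Since there are so many ways to solve this, let's start by defining what we want to see in our solution.
1. Bash provides the builtin readarray command for this purpose. Let's use it.
2. Avoid tricks such as changing IFS, looping, using eval, or adding an extra element and then removing it.
The readarray command is easiest to use with newlines as the delimiter. With other delimiters it may add an extra element to the array. The cleanest approach is to first adapt the input into a form that works well with readarray before passing it in.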
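To make the delimiter issue concrete, here is a minimal sketch of what happens when readarray is given a comma delimiter directly. It assumes Bash 4.4 or later (the -d option is not available in older versions), and the array names with_newline and clean are made up for this illustration. A here-string always appends a newline, and with a comma delimiter that newline is no longer stripped by -t, so it ends up in the last element.

```bash
#!/usr/bin/env bash
# Assumption: Bash >= 4.4, where readarray accepts -d.

# The here-string appends a newline, which is not the delimiter here,
# so it stays attached to the last element.
readarray -d ',' -t with_newline <<< 'Paris,France,Europe'

# Feeding the input without a trailing newline avoids the problem.
readarray -d ',' -t clean < <(printf '%s' 'Paris,France,Europe')

declare -p with_newline clean
```

This is the kind of cleanup that splitting into lines first makes unnecessary.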
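The input in this example does not have a multi-character delimiter. If we apply a little common sense, it is best understood as comma-separated input in which each element may need to be trimmed. My solution is to split the input on commas into multiple lines, trim each element, and pass it all to readarray.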
string=' Paris,France , All of Europe '
readarray -t foo < <(tr ',' '\n' <<< "$string" |sed 's/^ *//' |sed 's/ *$//')
# Result:
declare -p foo
# declare -a foo='([0]="Paris" [1]="France" [2]="All of Europe")'
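Edit: My solution allows inconsistent spacing around the comma separators, while also allowing elements to contain spaces. Few other solutions handle these special cases.
I also avoid approaches that look like hacks, such as creating an extra array element and then removing it. If you do not agree that this is the best answer, please leave a comment to explain.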
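As a quick check of those special cases, the sketch below runs the same tr, sed and readarray pipeline over a few inputs. The inputs and the variable names inputs and results are invented for this illustration; the third input also includes an empty field to show that it is kept as an empty element.

```bash
#!/usr/bin/env bash
inputs=(
    'Paris, France, Europe'
    ' Paris,France , All of Europe '
    'Paris ,France,,  Europe'
)
results=()
for string in "${inputs[@]}"; do
    # Same approach as above: commas become newlines, then each line is trimmed.
    readarray -t foo < <(tr ',' '\n' <<< "$string" | sed -e 's/^ *//' -e 's/ *$//')
    # Wrap each element in brackets so stray whitespace would be visible.
    results+=("$(printf '[%s]' "${foo[@]}")")
done
printf '%s\n' "${results[@]}"
# [Paris][France][Europe]
# [Paris][France][All of Europe]
# [Paris][France][][Europe]
```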
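If you would like to accomplish the same thing purely in Bash and with fewer subshells, it is possible. But the result is harder to read, and this optimization is probably unnecessary.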
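shopt -s extglob  # required for the +(...) patterns below
string=' Paris,France , All of Europe '
trimmed="${string#"${string%%[![:space:]]*}"}"    # strip leading whitespace
trimmed="${trimmed%"${trimmed##*[![:space:]]}"}"  # strip trailing whitespace
trimmed="${trimmed//+([[:space:]]),/,}"           # strip whitespace before each comma
trimmed="${trimmed//,+([[:space:]])/,}"           # strip whitespace after each comma
newline=$'\n'
readarray -t foo <<< "${trimmed//,/$newline}"     # one element per line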
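Here is my hack!
Splitting strings in Bash is a rather tedious thing to do. The approaches available either work only in a few cases (splitting by ";", "/", "." and so on) or produce output with various side effects.
The approach below requires a number of maneuvers, but I believe it will work for most of our needs!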
#!/bin/bash
# --------------------------------------
# SPLIT FUNCTION
# ----------------
F_SPLIT_R=()
f_split() {
    : 'Split a given string by a delimiter and return the pieces in an array.
    Args:
        TARGET_P (str): Target string to split.
        DELIMITER_P (Optional[str]): Delimiter used to split. If not
            given, the split is done on spaces.
    Returns:
        F_SPLIT_R (array): Array with the target string separated by the
            given delimiter.
    '
    F_SPLIT_R=()
    TARGET_P=$1
    DELIMITER_P=$2
    if [ -z "$DELIMITER_P" ] ; then
        DELIMITER_P=" "
    fi
    REMOVE_N=1
    if [ "$DELIMITER_P" == "\n" ] ; then
        REMOVE_N=0
    fi
    # NOTE: This was the only parameter that has been a problem so far!
    # By Questor
    # [Ref.: https://unix.stackexchange.com/a/390732/61742]
    if [ "$DELIMITER_P" == "./" ] ; then
        DELIMITER_P="[.]/"
    fi
    if [ ${REMOVE_N} -eq 1 ] ; then
        # NOTE: awk reads its input line by line, so line breaks inside the
        # target string are swapped for a placeholder before the split and
        # restored afterwards. Doing the swap with a Bash expansion, rather
        # than with awk fed by a here-string, avoids the extra placeholder
        # that the newline appended by the here-string would produce.
        TARGET_P=${TARGET_P//$'\n'/3F2C417D448C46918289218B7337FCAF}
    fi
    SPLIT_NOW=$(awk -F"$DELIMITER_P" '{for(i=1; i<=NF; i++){printf "%s\n", $i}}' <<< "${TARGET_P}")
    while IFS= read -r LINE_NOW ; do
        if [ ${REMOVE_N} -eq 1 ] ; then
            # NOTE: Restore the original line breaks in this piece.
            LN_NOW_WITH_N=${LINE_NOW//3F2C417D448C46918289218B7337FCAF/$'\n'}
            F_SPLIT_R+=("$LN_NOW_WITH_N")
        else
            F_SPLIT_R+=("$LINE_NOW")
        fi
    done <<< "$SPLIT_NOW"
}
# --------------------------------------
# HOW TO USE
# ----------------
STRING_TO_SPLIT="
* How do I list all databases and tables using psql?
\"
sudo -u postgres /usr/pgsql-9.4/bin/psql -c \"\l\"
sudo -u postgres /usr/pgsql-9.4/bin/psql <DB_NAME> -c \"\dt\"
\"
\"
\list or \l: list all databases
\dt: list all tables in the current database
\"
[Ref.: https://dba.stackexchange.com/questions/1285/how-do-i-list-all-databases-and-tables-using-psql]
"
f_split "$STRING_TO_SPLIT" "bin/psql -c"
# --------------------------------------
# OUTPUT AND TEST
# ----------------
ARR_LENGTH=${#F_SPLIT_R[@]}
for (( i=0; i<ARR_LENGTH; i++ )) ; do
    echo " > -----------------------------------------"
    echo "${F_SPLIT_R[$i]}"
    echo " < -----------------------------------------"
done
if [ "$STRING_TO_SPLIT" == "${F_SPLIT_R[0]}bin/psql -c${F_SPLIT_R[1]}" ] ; then
    echo " > -----------------------------------------"
    echo "The strings are the same!"
    echo " < -----------------------------------------"
fi
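For comparison only, a split on a multi-character delimiter can also be sketched with plain parameter expansion. This is not the f_split function above but a different technique: the delimiter is treated as a literal string rather than as an awk regular expression, and embedded line breaks need no placeholder. The names split_on and SPLIT_RESULT are hypothetical and exist only in this sketch.

```bash
#!/usr/bin/env bash
# split_on (hypothetical helper): split $1 on the literal delimiter $2
# and store the pieces in the global array SPLIT_RESULT.
split_on() {
    local rest=$1
    local delim="${2:- }"   # default to a single space when no delimiter is given
    SPLIT_RESULT=()
    while [[ $rest == *"$delim"* ]]; do
        SPLIT_RESULT+=("${rest%%"$delim"*}")  # text before the first delimiter
        rest=${rest#*"$delim"}                # drop that text and the delimiter
    done
    SPLIT_RESULT+=("$rest")                   # whatever remains is the last piece
}

split_on $'one\ntwo -- three -- four' ' -- '
declare -p SPLIT_RESULT
```

Here the first element keeps its embedded line break, and the array ends up with three elements.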
For multi-line elements, why not consider something like:
$ array=($(echo -e $'a a\nb b' | sed 's/ /§/g')) && array=("${array[@]//§/ }") && echo "${array[@]/%/ INTERELEMENT}"
a a INTERELEMENT b b INTERELEMENT
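Another way would be:
string="Paris, France, Europe"
IFS=', ' read -r -a arr <<< "$string"  # IFS is changed only for this read
Now your elements are stored in the "arr" array. To iterate over the elements:
for i in "${arr[@]}"; do echo "$i"; done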
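Because a space is part of IFS in that command, an element such as "All of Europe" would be split into three words. A minimal sketch of one way around this, assuming the elements should keep their inner spaces, is to split on the comma alone and trim each element afterwards. The sample string and the loop variable item are made up for this illustration.

```bash
#!/usr/bin/env bash
string='Paris, France, All of Europe'

# Split on commas only; the space after each comma stays in the element for now.
IFS=',' read -r -a arr <<< "$string"

for i in "${!arr[@]}"; do
    item=${arr[i]}
    item=${item#"${item%%[![:space:]]*}"}   # strip leading whitespace
    item=${item%"${item##*[![:space:]]}"}   # strip trailing whitespace
    arr[i]=$item
done

declare -p arr
```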
Another approach could be:
str="a, b, c, d" # assuming there is a space after ',' as in Q
arr=(${str//,/}) # delete all occurrences of ','
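After this line runs, "arr" is an array of four strings. This needs no explicit handling of IFS, read, or anything else special, so it is simpler and more direct. It does rely on the shell's default word splitting, so it only works when the elements themselves contain no whitespace or glob characters.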
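An unquoted expansion like this also undergoes pathname expansion, so an element such as * would be replaced by file names from the current directory. A small sketch of guarding against that, assuming the default IFS is in effect and using a made-up sample string:

```bash
#!/usr/bin/env bash
str="a, *, c, d"

set -f             # turn off pathname expansion so * stays literal
arr=(${str//,/})   # delete the commas, then let word splitting do the rest
set +f             # turn pathname expansion back on

declare -p arr
```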
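cut is a useful command that you can call from Bash, and it lets you define the delimiter too. https://en.wikibooks.org/wiki/Cut You can also extract data from a fixed-width record structure. https://en.wikipedia.org/wiki/Cut_(Unix) https://www.computerhope.com/unix/ucut.htm - JGFMK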
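A brief sketch of both uses of cut mentioned in that comment. The sample line and the variable names second and prefix are invented for this illustration.

```bash
#!/usr/bin/env bash
line='Paris,France,Europe'

# -d sets the field delimiter, -f picks the field number, counted from 1.
second=$(cut -d ',' -f 2 <<< "$line")

# -c selects by character position, which suits fixed-width records.
prefix=$(cut -c 1-5 <<< "$line")

printf '%s\n' "$second" "$prefix"
```

Note that cut extracts one selection per call, so filling a whole array this way takes a loop or a different tool.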