What is the best way to split a string?

3
I have a collection of strings like this:
Host: example.com, IP address: 37.0.122.151, SBL: SBL196170, status: unknown, level: 4, Malware: Citadel, AS: 198310, country: RU

I want each piece of data in this format:
$host = "example.com";
$ip = "37.0.122.151";
$SBL = "SBL196170";
$status = "unknown";
$level = "4";
$malware = "Citadel";
$as = "198310";
$country = "RU";

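What is the best way to get these values out of the string? Should I split on "," first and then on ":", or is there a solution that needs only one split? Thanks.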

1
This might be a good use case for regular expressions? - Mike Christensen
5 Answers

4

Like this:

$input = "Host: example.com, IP address: 37.0.122.151, SBL: SBL196170, status: unknown, level: 4, Malware: Citadel, AS: 198310, country: RU";
preg_match_all('/(\w+): ([\w.]+)/', $input, $matches);
print_r($matches);

Output:

Array
(
    [0] => Array
        (
            [0] => Host: example.com
            [1] => address: 37.0.122.151
            [2] => SBL: SBL196170
            [3] => status: unknown
            [4] => level: 4
            [5] => Malware: Citadel
            [6] => AS: 198310
            [7] => country: RU
        )

    [1] => Array
        (
            [0] => Host
            [1] => address
            [2] => SBL
            [3] => status
            [4] => level
            [5] => Malware
            [6] => AS
            [7] => country
        )

    [2] => Array
        (
            [0] => example.com
            [1] => 37.0.122.151
            [2] => SBL196170
            [3] => unknown
            [4] => 4
            [5] => Citadel
            [6] => 198310
            [7] => RU
        )

)

Then:

$mydata = array_combine($matches[1], $matches[2]);
print_r($mydata);

Which gives:

Array
(
    [Host] => example.com
    [address] => 37.0.122.151
    [SBL] => SBL196170
    [status] => unknown
    [level] => 4
    [Malware] => Citadel
    [AS] => 198310
    [country] => RU
)
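Note that the pattern above uses \w+ for the key, so "IP address" is captured as just "address", as the output shows. Here is a minimal sketch of one way to keep the full key, assuming keys contain only word characters and spaces. The adjusted pattern is an assumption for illustration, not part of the original answer:

```php
<?php
// Sample input copied from the question.
$input = "Host: example.com, IP address: 37.0.122.151, SBL: SBL196170, status: unknown, level: 4, Malware: Citadel, AS: 198310, country: RU";

// Key: starts with a word character, then any mix of word characters and spaces.
// Value: word characters and dots, same as in the pattern above.
preg_match_all('/(\w[\w ]*): ([\w.]+)/', $input, $matches);

// Pair each captured key with its captured value.
$mydata = array_combine($matches[1], $matches[2]);
print_r($mydata);
```

With this pattern the second entry should be keyed by "IP address" rather than "address". Values containing anything other than word characters and dots would still be cut short.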

1
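I think a regular expression is overkill for simple parsing, and in this case you end up using 4 arrays. A "problem" can arise with escape sequences where a key or value contains a comma or colon, e.g. key: "v,alue" - Diego C Nascimento
@DiegoCNascimento The input format does not specify quoted strings, and if we had to worry about "what happens if someone feeds my program bad input", nothing would ever get done. Garbage in, garbage out. - Sammitch
I agree with your first paragraph. But on the other point, absolutely not: you should always check for valid input, otherwise you hand malicious users a vulnerability. - Diego C Nascimento
I like this approach, and I don't think the issue Diego raises is very serious: if keys or values really could contain that kind of garbage, the chances of parsing the string reliably with any method are very small. If a string doesn't follow any rule or pattern, you really can't expect to use it reliably like this. - Chris Baker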

1
I would use a simple explode on the string and then, for each element, populate an array with the key/value information:
$string = 'Host: ...';
$raw_array = explode(',', $string);
$final_array = array();
foreach($raw_array as $item) {
    $item_array = explode(':', trim($item));
    $key = trim($item_array[0]);
    $value = trim($item_array[1]);
    $final_array[$key] = $value;
}
var_dump($final_array);

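Note that this does not use individual variables as requested in your question; instead it populates a single array keyed by the keys found in the string. This is a more flexible approach.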
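If values might themselves contain a colon, or an item might have no colon at all, the loop can be made a bit more defensive. This is a minimal sketch, assuming that skipping malformed items is acceptable (that policy and the sample string are assumptions for illustration):

```php
<?php
// Shortened sample input with one deliberately malformed item at the end.
$string = 'Host: example.com, IP address: 37.0.122.151, level: 4, malformed item';

$final_array = array();
foreach (explode(',', $string) as $item) {
    // Limit of 2: only the first colon separates the key from the value.
    $parts = explode(':', $item, 2);
    if (count($parts) !== 2) {
        continue; // no colon in this item, so skip it
    }
    $final_array[trim($parts[0])] = trim($parts[1]);
}
var_dump($final_array);
```

The third argument to explode keeps everything after the first colon together, and the count check avoids reading an index that does not exist.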

1
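You can use a regular expression replacement to turn it into a query-string-like string, then use parse_str to convert it into an associative array. No loops, just two lines of code!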
$string = preg_replace(array('/:/', '/, /'), array('=','&'), $string);
parse_str($string, $output);

var_dump($output);
/*
array(8) { ["Host"]=> string(8) " xxx.com" ["IP_address"]=> string(13) " 37.0.122.151" ["SBL"]=> string(10) " SBL196170" ["status"]=> string(8) " unknown" ["level"]=> string(2) " 4" ["Malware"]=> string(8) " Citadel" ["AS"]=> string(7) " 198310" ["country"]=> string(3) " RU" } 
*/

Try it here: http://codepad.viper-7.com/5gwWyC
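In the output above every value keeps a leading space, because only the colon is replaced and the space after it stays in place. Here is a minimal sketch of a variant that replaces the colon together with the space, using str_replace since no pattern matching is needed (this is a swapped-in technique, not the answer's own code):

```php
<?php
$string = 'Host: xxx.com, IP address: 37.0.122.151, SBL: SBL196170, status: unknown, level: 4, Malware: Citadel, AS: 198310, country: RU';

// Replace ": " (colon plus space) so the space does not end up in the value,
// and ", " so each pair is separated by "&" as in a query string.
$query = str_replace(array(': ', ', '), array('=', '&'), $string);

parse_str($query, $output);
var_dump($output);
```

Two caveats apply to either version: parse_str turns spaces in keys into underscores (hence IP_address), and it URL-decodes values, so a value containing "+" or "%" would come out altered.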



0
Throw in a bit of functional programming and you get the following code:
$string = 'Host: xxx.com, IP address: 37.0.122.151, SBL: SBL196170, status: unknown, level: 4, Malware: Citadel, AS: 198310, country: RU';
$result = array_reduce(explode(',', $string), function($result, $item) {
    $pair = explode(':', $item);
    $result[trim($pair[0])] = trim($pair[1]);
    return $result;
}, array());
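Since the question asked for individual variables, here is a minimal sketch of how the reduced array could be read back into them. The isset guards and the limit of 2 on the inner explode are assumptions added for illustration, not part of the original answer:

```php
<?php
$string = 'Host: xxx.com, IP address: 37.0.122.151, SBL: SBL196170, status: unknown, level: 4, Malware: Citadel, AS: 198310, country: RU';

$result = array_reduce(explode(',', $string), function ($carry, $item) {
    // Split on the first colon only; ignore items without a colon.
    $pair = explode(':', $item, 2);
    if (count($pair) === 2) {
        $carry[trim($pair[0])] = trim($pair[1]);
    }
    return $carry;
}, array());

// isset() guards against a key that is missing from the input.
$host    = isset($result['Host']) ? $result['Host'] : null;
$ip      = isset($result['IP address']) ? $result['IP address'] : null;
$country = isset($result['country']) ? $result['country'] : null;

var_dump($host, $ip, $country);
```

Reading keys explicitly like this keeps control over which variable names exist, which is harder to guarantee when variables are created automatically from input.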

0

Here is a very simple and handy function that I often use for things like this.

<?php

function get_string_between($string, $start, $end){
    $string = " ".$string;
    $ini = strpos($string,$start);
    if ($ini == 0) return "";
    $ini += strlen($start);
    $len = strpos($string,$end,$ini) - $ini;
    return substr($string,$ini,$len);
}

$src = "Host: xxx.com, IP address: 37.0.122.151, SBL: SBL196170, status: unknown, level: 4, Malware: Citadel, AS: 198310, country: RU";

//add a character to src to help identify the last field
$src = $src.",";

$host = get_string_between($src, "Host: ", ","); //this is grabbing any text between "Host: " and ","
$ip = get_string_between($src, "IP address: ", ",");
$SBL = get_string_between($src, "SBL: ", ",");
$status = get_string_between($src, "status: ", ",");
$level = get_string_between($src, "level: ", ",");
$malware = get_string_between($src, "Malware: ", ",");
$as = get_string_between($src, "AS: ", ",");
$country = get_string_between($src, "country: ", ",");

?>

Happy coding!
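The function above relies on a leading space and on appending "," to the source so the last field can be found. Here is a minimal sketch of a variant that handles a missing start or end marker explicitly. The function name get_string_between_or_null is hypothetical and used only for illustration:

```php
<?php
// Hypothetical variant of get_string_between(); the name is made up for this sketch.
function get_string_between_or_null($string, $start, $end) {
    $ini = strpos($string, $start);
    if ($ini === false) {
        return null; // start marker not found
    }
    $ini += strlen($start);
    $stop = strpos($string, $end, $ini);
    if ($stop === false) {
        return substr($string, $ini); // no end marker: take the rest of the string
    }
    return substr($string, $ini, $stop - $ini);
}

// Shortened sample input; no trailing "," is appended here.
$src = "Host: xxx.com, IP address: 37.0.122.151, country: RU";

var_dump(get_string_between_or_null($src, "Host: ", ","));
var_dump(get_string_between_or_null($src, "country: ", ","));
var_dump(get_string_between_or_null($src, "level: ", ","));
```

Comparing against false with === distinguishes "not found" from a match at position 0, so the leading-space trick is no longer needed.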

