Parsing mysql:// and sqlite:/// URLs

We have a small regex-based module that parses URLs with the following match:
if( my ($conn, $driver, $user, $pass, $host, $port, $dbname, $table_name, $tparam_name, $tparam_value, $conn_param_string) =
    $url =~ m{^((\w*)://(?:(\w+)(?:\:([^/\@]*))?\@)?(?:([\w\-\.]+)(?:\:(\d+))?)?/(\w*))(?:/(\w+)(?:\?(\w+)=(\w+))?)?((?:;(\w+)=(\w+))*)$} ) {

An example of a URL it handles:

mysql://anonymous@my.self.com:1234/dbname

Now we want to add support for parsing sqlite URLs, which can look like this:

sqlite:///dbname_which_is_a_file

However, absolute paths do not work, for example: sqlite:///tmp/dbname_which_is_a_file

So what is the right way to do this?


By the way, when dealing with long non-/x regexes like this one, you may find YAPE::Regex::Explain useful. I have given this specific regex as an example here. - Pablo Marin-Garcia

2 Answers

The CPAN module URI::Split is a better long-term choice than a fragile regex. Here is the synopsis from its POD:

use URI::Split qw(uri_split uri_join);
($scheme, $auth, $path, $query, $frag) = uri_split($uri);
$uri = uri_join($scheme, $auth, $path, $query, $frag);

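A more general module (more flexible, but also more complex) is URI, though its extra complexity may be unnecessary for simple uses.

By the way, URI stands for Uniform Resource Identifier and is a superset of URL: a URL is one specific kind of URI.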
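
As a rough sketch of how uri_split could replace the regex from the question, the helper below splits the authority and path by hand. The name parse_db_url and the returned keys are hypothetical, chosen to mirror the regex captures. The sqlite rule (one path segment is a bare file name, several segments are an absolute path) is an assumption about the convention wanted, not something the question defines, and the `?name=value` and `;name=value` parameters are left out.

```perl
use strict;
use warnings;
use URI::Split qw(uri_split);

# Hypothetical helper, not part of the original module: returns a hash ref
# with the same kind of keys that the question's regex captures.
sub parse_db_url {
    my ($url) = @_;
    my ($scheme, $auth, $path) = uri_split($url);
    return unless defined $scheme && defined $path;

    my %parsed = (driver => $scheme);

    if ($scheme eq 'sqlite') {
        # Assumption: a single path segment is a bare file name (or a special
        # name); anything with more segments is kept as an absolute path.
        (my $file = $path) =~ s{^/}{};
        $parsed{dbname} = $file =~ m{/} ? $path : $file;
        return \%parsed;
    }

    # Server-style URL: the authority is "user:pass@host:port", all optional.
    if (defined $auth && length $auth) {
        @parsed{qw(user pass host port)} =
            $auth =~ m{^(?:([^:\@]*)(?::([^\@]*))?\@)?([^:]*)(?::(\d+))?$};
    }

    # The first two path segments are the database and, optionally, the table.
    my (undef, $dbname, $table_name) = split m{/}, $path;
    @parsed{qw(dbname table_name)} = ($dbname, $table_name);
    return \%parsed;
}

# Prints: /tmp/dbname_which_is_a_file
print parse_db_url('sqlite:///tmp/dbname_which_is_a_file')->{dbname}, "\n";
```

With this sketch, mysql://anonymous@my.self.com:1234/dbname yields user, host, port and dbname, while sqlite:///dbname_which_is_a_file yields the bare file name as dbname.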


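The problem with the regex is that it cannot handle paths with more than two elements: it splits the path into a database name and, if present, a table name. It also cannot handle SQLite's special filenames such as ":memory:" (which are very useful for testing).
To keep a regex-based approach maintainable, the best option is a dispatch table keyed on the main protocols that need different parsing, with one subroutine per parsing method. Writing the regex with the /x modifier also helps, because it can then carry comments that make it easier to maintain: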
 sub test_re{
     my $url =shift;
     my $x={};
     @$x{qw(conn driver user pass host port dbname table_name tparam_name tparam_value conn_param_string)} =
         $url =~ m{
                ^(
                  (\w*)
                  ://
                  (?:
                    (\w+) # user
                    (?:
                      \:
                      ([^/\@]*) # password 
                    )?
                    \@
                  )? # user and password are optional
                  (?:
                    ([\w\-\.]+) #host
                    (?:
                      \:
                      (\d+) # port
                    )? # port optional
                  )? # host and port optional
                  / # becomes the third '/' when user, pass, host and port are all absent
                  (\w*) # get the db (only up to the first '/', if any). Will not work with full paths for sqlite.
                )
                (?:
                  /  # if tables
                  (\w+) # get table
                  (?:
                    \? # parameters
                    (\w+)
                    =
                   (\w+)
                  )? # the parameter is optional, but it always requires a table name
                )? # table and parameter are optional
                (
                  (?:
                    ;
                    (\w+)
                    =
                    (\w+)
                  )* # rest of parameters if any
                )
                $
             }x;
     return $x;
 }
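
The dispatch-table idea mentioned above could look roughly like the following. This is only a sketch: the names %parser_for, parse_url, parse_server_url and parse_file_url are hypothetical, the server regex is a trimmed version of the one above (no `?name=value` or `;name=value` parameters), and the sqlite rule for telling a bare file name from an absolute path is an assumption.

```perl
use strict;
use warnings;

# Hypothetical dispatch table: one parsing sub per scheme that needs its own rules.
my %parser_for = (
    mysql  => \&parse_server_url,
    sqlite => \&parse_file_url,
);

sub parse_url {
    my ($url) = @_;
    my ($driver, $rest) = $url =~ m{^(\w+)://(.*)$}s
        or return;
    my $parser = $parser_for{$driver}
        or return;                          # unknown scheme
    return $parser->($driver, $rest);
}

# Server-style URLs: [user[:pass]@][host[:port]]/dbname[/table]
sub parse_server_url {
    my ($driver, $rest) = @_;
    my %p = (driver => $driver);
    @p{qw(user pass host port dbname table_name)} = $rest =~ m{
        ^
        (?: (\w+) (?: : ([^/\@]*) )? \@ )?   # optional user and password
        (?: ([\w.-]+) (?: : (\d+) )? )?      # optional host and port
        / (\w*)                              # database name
        (?: / (\w+) )?                       # optional table name
        $
    }x or return;
    return \%p;
}

# File-style URLs: everything after the third '/' names the database file.
sub parse_file_url {
    my ($driver, $rest) = @_;
    my ($path) = $rest =~ m{^/(.+)$}
        or return;
    # Assumption: a single segment is a bare file name or special name;
    # a path containing '/' is treated as absolute.
    my $dbname = $path =~ m{/} ? "/$path" : $path;
    return { driver => $driver, dbname => $dbname };
}
```

Each scheme keeps its own small regex, so supporting another driver means adding one entry to the table and one sub, without touching the existing patterns.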

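But I would recommend using URI::Split (less verbose than URI) and then splitting the path as needed.

You can see the difference between the regex and URI::Split here: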

#!/usr/bin/env perl

use feature ':5.10';
use strict;
use URI::Split qw(uri_join uri_split);
use Data::Dumper;

my $urls = [qw(
             mysql://anonymous@my.self.com:1234/dbname
             mysql://anonymous@my.self.com:1234/dbname/tablename
             mysql://anonymous@my.self.com:1234/dbname/pathextra/tablename
             sqlite:///dbname_which_is_a_file
             sqlite:///tmp/dbname_which_is_a_file
             sqlite:///tmp/db/dbname_which_is_a_file
             sqlite:///:dbname_which_is_a_file
             sqlite:///:memory
             )];



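# test_re() is the sub shown above; include it in this script before running.
foreach my $url (@$urls) {
    say "== testing $url ==";
    say "- RE";
    print Dumper(test_re($url));
    say "- URI::Split";
    print Dumper(uri_split($url));
}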

Results:

 [...]
 == testing sqlite:///dbname_which_is_a_file ==
 - RE
 $VAR1 = {
           'pass' => undef,
           'port' => undef,
           'dbname' => 'dbname_which_is_a_file',
           'host' => undef,
           'conn_param_string' => '',
           'conn' => 'sqlite:///dbname_which_is_a_file',
           'tparam_name' => undef,
           'tparam_value' => undef,
           'user' => undef,
           'table_name' => undef,
           'driver' => 'sqlite'
         };

 - URI::Split
 $VAR1 = 'sqlite';
 $VAR2 = '';
 $VAR3 = '/dbname_which_is_a_file';
 $VAR4 = undef;
 $VAR5 = undef;

 == testing sqlite:///tmp/dbname_which_is_a_file ==
 - RE
 $VAR1 = {
           'pass' => undef,
           'port' => undef,
           'dbname' => 'tmp',
           'host' => undef,
           'conn_param_string' => '',
           'conn' => 'sqlite:///tmp',
           'tparam_name' => undef,
           'tparam_value' => undef,
           'user' => undef,
           'table_name' => 'dbname_which_is_a_file',
           'driver' => 'sqlite'
         };

 - URI::Split
 $VAR1 = 'sqlite';
 $VAR2 = '';
 $VAR3 = '/tmp/dbname_which_is_a_file';
 $VAR4 = undef;
 $VAR5 = undef;

== testing sqlite:///tmp/db/dbname_which_is_a_file ==
- RE
$VAR1 = {
          'pass' => undef,
          'port' => undef,
          'dbname' => undef,
          'host' => undef,
          'conn_param_string' => undef,
          'conn' => undef,
          'tparam_name' => undef,
          'tparam_value' => undef,
          'user' => undef,
          'table_name' => undef,
          'driver' => undef
        };

- URI::Split
$VAR1 = 'sqlite';
$VAR2 = '';
$VAR3 = '/tmp/db/dbname_which_is_a_file';
$VAR4 = undef;
$VAR5 = undef;

== testing sqlite:///:memory ==
- RE
$VAR1 = {
          'pass' => undef,
          'port' => undef,
          'dbname' => undef,
          'host' => undef,
          'conn_param_string' => undef,
          'conn' => undef,
          'tparam_name' => undef,
          'tparam_value' => undef,
          'user' => undef,
          'table_name' => undef,
          'driver' => undef
        };

- URI::Split
$VAR1 = 'sqlite';
$VAR2 = '';
$VAR3 = '/:memory';
$VAR4 = undef;
$VAR5 = undef;
