Split a URL into host, port and resource - C++

5
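I need to split a URL into host, port and resource. I looked through many references but found nothing that helped. This is the format I want:
For example: the URL is 1.2.3.4:5678/path1/path2.html and the required output is: host - 1.2.3.4, port - 5678, resource - /path1/path2.html
This is what I tried: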
#include <iostream>
 #include <cstddef>
 #include <string>
 using namespace std;

int main()
{
   string url="http://qwert.mjgug.ouhnbg:5678/path1/path2.html";
   size_t found = url.find_first_of("://");
   cout<<found<<endl;
   string protocol=url.substr(0,found);
   size_t found1 =url.find_first_of(":");
   cout<<found1<<endl;
   string host =url.substr(found+3,found1-found+1);
   size_t found2 = url.find_first_of(":/");
   string port1 =url.substr(found1+7,found2+found1-1);
   string port =url.substr(found2+1);
   cout<<protocol<<endl;
   cout<<host<<endl;
   cout<<port1<<endl;
   cout<<port;
   return 0;
}

The result I expect is:

Protocol - http
Host - qwert.mjgug.ouhnbg
Port - 5678
Resource - path1/path2.html

But the result I get is:

http:
qwert.mj
t.mjgug
//qwert.mjgug.ouhnbg:5678/path1/path2.html

What should I change?


Write the string into a std::stringstream and split it with std::getline(stream, hoststr, ':') and std::getline(stream, portstr, '/'). Whatever remains in the stream only needs a '/' prepended to give the path. - user4581301
2 Answers
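The stream-based approach from the comment above could look roughly like the sketch below. It assumes the input has the form host:port/resource with no scheme prefix; the names `UrlParts` and `splitWithStream` are made up for illustration and do not come from the thread.

```cpp
#include <sstream>
#include <string>

// Hypothetical container for the three pieces (illustration only).
struct UrlParts {
    std::string host;
    std::string port;
    std::string resource;
};

// Splits "host:port/resource" using std::getline with custom delimiters.
// Assumes there is no "scheme://" prefix in the input.
UrlParts splitWithStream(const std::string& url) {
    UrlParts parts;
    std::istringstream stream(url);
    std::getline(stream, parts.host, ':');  // everything before the first ':'
    std::getline(stream, parts.port, '/');  // everything between ':' and the next '/'
    std::string rest;
    std::getline(stream, rest);             // whatever is left in the stream
    parts.resource = "/" + rest;            // getline consumed the '/', so prepend it again
    return parts;
}
```

Calling `splitWithStream("1.2.3.4:5678/path1/path2.html")` fills the host, port and resource fields. A URL that starts with http:// would need the scheme stripped first, because the first ':' would otherwise be taken as the host/port separator.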

4
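Use string.find_first_of(":") to get the index of the first occurrence of any of the given characters, and string.substr(pos, len) to get the substring of length len that starts at index pos: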
 #include <iostream>
 #include <cstddef>
 #include <string>
 using namespace std;

int main()
{
   string url="1.2.3.4:5678/path1/path2.html";
   size_t found = url.find_first_of(":");
   string host=url.substr(0,found);
   size_t found1 =url.find_first_of("/");
   string port =url.substr(found+1,found1-found-1);
   string resource =url.substr(found1);
   cout<<host<<endl;
   cout<<port<<endl;
   cout<<resource;
   return 0;
}
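Two details of these calls are easy to miss and probably explain the repeated fragments in the question's output: find_first_of treats its argument as a set of characters rather than as a substring, and every search starts at index 0 unless a start position is passed. The helper names below are hypothetical and only serve to show the difference.

```cpp
#include <cstddef>
#include <string>

// Index of the first character that is ':' OR '/', searching from `from`.
// find_first_of matches any single character of its argument.
std::size_t firstColonOrSlash(const std::string& s, std::size_t from = 0) {
    return s.find_first_of(":/", from);
}

// Index where the exact substring "://" begins, or std::string::npos.
// find matches the whole sequence, not individual characters.
std::size_t schemeSeparator(const std::string& s) {
    return s.find("://");
}
```

For "http://h:80/p", `firstColonOrSlash` without a start position returns 4 (the colon after http) every time it is called; passing 7 as the start position skips the scheme and returns 8, the colon in front of the port.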

For a URL with an http or https scheme:

#include <iostream>
#include <cstddef>
#include <string>
using namespace std;

int main()
{
   string url="http://qwert.mjgug.ouhnbg:5678/path1/path2.html";
   size_t found = url.find_first_of(":");
   string protocol=url.substr(0,found);

   string url_new=url.substr(found+3); // url_new is the URL without the "http://" prefix
   size_t found1 =url_new.find_first_of(":");
   string host =url_new.substr(0,found1);

   size_t found2 = url_new.find_first_of("/");
   string port =url_new.substr(found1+1,found2-found1-1);
   string path =url_new.substr(found2);

   cout<<protocol<<endl;
   cout<<host<<endl;
   cout<<port<<endl;
   cout<<path;
   return 0;
}

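Thank you so much, Chandini. This was a huge help! I mean it. - Electronic Brat
It works the same way... url=https://1.2.3.4:5678/path1/path2.html. If you have http or https at the start, first find the occurrence of "/", call that index i, get the protocol part with protocol=url.substr(0,i+1), and then search for the host, port and resource in url_excluding_protocol = url.substr(i+2). - Chandini
I tried, but did not get the expected result. I don't know what went wrong. Please check the question above, I have edited it. - Electronic Brat
I tried many times, but I really don't understand what is wrong. Could you take a look at the code above for me? - Electronic Brat
Let us continue this discussion in chat. - Chandini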

0
Combining the two:
#include <iostream>
#include <string>
using namespace std;

int main()
{
    string url = "http://qwert.mjgug.ouhnbg:5678/path1/path2.html";
    size_t found = 0;
    string protocol;
    if (url.rfind("http", 0) == 0) {
        // URL starts with http[s]
        found = url.find_first_of(":");
        protocol = url.substr(0, found);
        found += 3; // Step over colon and slashes
    }
    size_t found1 = url.find_first_of(":", found);
    string host;
    string port;
    string path;
    if (string::npos != found1) {
        // Port found
        host = url.substr(found, found1 - found);
        size_t found2 = url.find_first_of("/", found1);
        port = url.substr(found1 + 1, found2 - found1 - 1);
        if (string::npos != found2) path = url.substr(found2); // substr(npos) would throw
    } else {
        // No port
        found1 = url.find_first_of("/", found);
        host = url.substr(found, found1 - found);
        if (string::npos != found1) path = url.substr(found1); // path stays empty without a '/'
    }
    cout << "protocol = [" << protocol << "]\n";
    cout << "host = [" << host << "]\n";
    cout << "port = [" << port << "]\n";
    cout << "path = [" << path << "]\n";
    return 0;
}
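If the same logic is needed in more than one place, it could be wrapped in a function along the lines of the sketch below. This is only one possible arrangement, not something taken from the answers: `ParsedUrl` and `parseUrl` are hypothetical names, and the sketch assumes simple URLs (no user info, no IPv6 literals in brackets, no separate query handling). It isolates the host[:port] part first, so a ':' that appears later in the path is not mistaken for a port separator.

```cpp
#include <cstddef>
#include <string>

// Hypothetical result type (illustration only).
struct ParsedUrl {
    std::string protocol;  // empty when the URL has no "scheme://" prefix
    std::string host;
    std::string port;      // empty when no explicit port is given
    std::string path;      // empty when there is no '/' after the host
};

ParsedUrl parseUrl(const std::string& url) {
    ParsedUrl out;
    std::size_t pos = 0;

    // Optional "scheme://" prefix, matched as a whole substring.
    const std::size_t sep = url.find("://");
    if (sep != std::string::npos) {
        out.protocol = url.substr(0, sep);
        pos = sep + 3;  // step over "://"
    }

    // host[:port] ends at the first '/' after the scheme, or at the end of the string.
    const std::size_t slash = url.find('/', pos);
    const std::string authority =
        url.substr(pos, slash == std::string::npos ? std::string::npos : slash - pos);
    if (slash != std::string::npos) {
        out.path = url.substr(slash);
    }

    // Look for ':' only inside host[:port].
    const std::size_t colon = authority.find(':');
    out.host = authority.substr(0, colon);
    if (colon != std::string::npos) {
        out.port = authority.substr(colon + 1);
    }
    return out;
}
```

With this shape, `parseUrl("https://example.com")` yields an empty port and an empty path instead of throwing, and the caller can decide which defaults to apply.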

Page content provided by Stack Overflow.