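I am developing an application that needs to fetch the source code of a web page from a link and parse that page's HTML.
Can you give me some examples or starting points to help me begin writing such an application?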
You can use HttpClient to perform an HTTP GET request and retrieve the HTML response, something like this:
HttpClient client = new DefaultHttpClient();
HttpGet request = new HttpGet(url);
HttpResponse response = client.execute(request);

// Read the response body line by line into a single string.
InputStream in = response.getEntity().getContent();
BufferedReader reader = new BufferedReader(new InputStreamReader(in));
StringBuilder str = new StringBuilder();
String line = null;
while ((line = reader.readLine()) != null)
{
    str.append(line);
}
in.close();
String html = str.toString();
The app also needs the INTERNET permission declared in AndroidManifest.xml:
<uses-permission android:name="android.permission.INTERNET" />
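- Michel
new URI("http://www.google.com/"), but I am running into a "NullReferenceException". Are any other permissions needed besides "android.permission.INTERNET"? - Kamran Ahmed
String html = EntityUtils.toString(response.getEntity()); ? - ben
This question is a bit old, but I think I should post my answer now, since DefaultHttpClient, HttpGet, etc. have all been deprecated. Given a URL, this function should fetch and return the HTML.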
public static String getHtml(String url) throws IOException {
    // Build and set timeout values for the request.
    URLConnection connection = (new URL(url)).openConnection();
    connection.setConnectTimeout(5000);
    connection.setReadTimeout(5000);
    connection.connect();

    // Read and store the result line by line then return the entire string.
    // try-with-resources closes the reader (and its stream) even if reading fails.
    try (BufferedReader reader = new BufferedReader(new InputStreamReader(connection.getInputStream()))) {
        StringBuilder html = new StringBuilder();
        for (String line; (line = reader.readLine()) != null; ) {
            html.append(line);
        }
        return html.toString();
    }
}
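The snippets above append each line without its line break and decode the bytes with the default charset. As a minimal sketch of one way to keep the line breaks and state the encoding explicitly, the helper below reads a stream into a string. The names `HtmlReader` and `readAll` are hypothetical, and which charset a given page uses is something the caller has to know or assume.

```java
import java.io.BufferedReader;
import java.io.IOException;
import java.io.InputStream;
import java.io.InputStreamReader;
import java.nio.charset.Charset;

// Hypothetical helper, not part of any Android or Apache API.
final class HtmlReader {
    /** Reads the whole stream as text, writing a '\n' after every line. */
    static String readAll(InputStream in, Charset charset) throws IOException {
        StringBuilder html = new StringBuilder();
        // try-with-resources closes the reader and the wrapped stream.
        try (BufferedReader reader = new BufferedReader(new InputStreamReader(in, charset))) {
            for (String line; (line = reader.readLine()) != null; ) {
                html.append(line).append('\n');
            }
        }
        return html.toString();
    }
}
```

Inside getHtml this could be called as `HtmlReader.readAll(connection.getInputStream(), StandardCharsets.UTF_8)`, assuming the page is UTF-8.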
public class RetrieveSiteData extends AsyncTask<String, Void, String> {
    @Override
    protected String doInBackground(String... urls) {
        StringBuilder builder = new StringBuilder(100000);
        for (String url : urls) {
            DefaultHttpClient client = new DefaultHttpClient();
            HttpGet httpGet = new HttpGet(url);
            try {
                HttpResponse execute = client.execute(httpGet);
                InputStream content = execute.getEntity().getContent();
                BufferedReader buffer = new BufferedReader(new InputStreamReader(content));
                String s = "";
                while ((s = buffer.readLine()) != null) {
                    builder.append(s);
                }
            } catch (Exception e) {
                e.printStackTrace();
            }
        }
        return builder.toString();
    }

    @Override
    protected void onPostExecute(String result) {
    }
}
Call it like this:
new RetrieveFeedTask(new OnTaskFinished()
{
    @Override
    public void onFeedRetrieved(String feeds)
    {
        //do whatever you want to do with the feeds
    }
}).execute("http://enterurlhere.com");
RetrieveFeedTask.class
class RetrieveFeedTask extends AsyncTask<String, Void, String>
{
    String HTML_response= "";
    OnTaskFinished onOurTaskFinished;

    public RetrieveFeedTask(OnTaskFinished onTaskFinished)
    {
        onOurTaskFinished = onTaskFinished;
    }

    @Override
    protected void onPreExecute()
    {
        super.onPreExecute();
    }

    @Override
    protected String doInBackground(String... urls)
    {
        try
        {
            URL url = new URL(urls[0]); // enter your url here which to download
            URLConnection conn = url.openConnection();
            // open the stream and put it into BufferedReader
            BufferedReader br = new BufferedReader(new InputStreamReader(conn.getInputStream()));
            String inputLine;
            while ((inputLine = br.readLine()) != null)
            {
                // System.out.println(inputLine);
                HTML_response += inputLine;
            }
            br.close();
            System.out.println("Done");
        }
        catch (MalformedURLException e)
        {
            e.printStackTrace();
        }
        catch (IOException e)
        {
            e.printStackTrace();
        }
        return HTML_response;
    }

    @Override
    protected void onPostExecute(String feed)
    {
        onOurTaskFinished.onFeedRetrieved(feed);
    }
}
OnTaskFinished.java
public interface OnTaskFinished
{
    public void onFeedRetrieved(String feeds);
}
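AsyncTask has since been deprecated on Android, so the same callback idea can also be sketched with a plain `ExecutorService`. This is only an illustration under stated assumptions, not a drop-in replacement: `BackgroundFetcher` is a hypothetical name, the download step is passed in as a `Callable<String>` so the sketch runs without a network, and the callback fires on the worker thread (on Android you would typically hand the result over to the main thread before touching any views).

```java
import java.util.concurrent.Callable;
import java.util.concurrent.ExecutorService;
import java.util.concurrent.Executors;
import java.util.concurrent.Future;

// Same shape as the OnTaskFinished interface above, repeated so the sketch is self-contained.
interface OnTaskFinished {
    void onFeedRetrieved(String feeds);
}

// Hypothetical class name, used for illustration only.
final class BackgroundFetcher {
    private final ExecutorService executor = Executors.newSingleThreadExecutor();

    /** Runs the download off the calling thread and passes the result to the listener. */
    Future<?> fetch(Callable<String> download, OnTaskFinished listener) {
        return executor.submit(() -> {
            String html;
            try {
                html = download.call();
            } catch (Exception e) {
                html = ""; // fall back to an empty result on failure
            }
            // Assumption: this runs on the worker thread, not on the UI thread.
            listener.onFeedRetrieved(html);
        });
    }

    /** Stops accepting new work; call this when the fetcher is no longer needed. */
    void shutdown() {
        executor.shutdown();
    }
}
```

The `Callable` would wrap whichever download code you prefer, for example the body of doInBackground above.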
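Answers from other SO posts helped me. This does not read line by line; imagine the HTML file had a null line in the middle. As a prerequisite, add the dependency "com.koushikdutta.ion:ion:2.2.1" in your project settings, and implement this code in an AsyncTask. If you want the returned -something- on the UI thread, pass it to a shared interface.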
Ion.with(getApplicationContext())
    .load("https://google.com/hashbrowns")
    .asString()
    .setCallback(new FutureCallback<String>()
    {
        @Override
        public void onCompleted(Exception e, String result) {
            //int s = result.lastIndexOf("user_id")+9;
            // String st = result.substring(s,s+5);
            // Log.e("USERID",st); //something
        }
    });
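The commented-out lines in the callback pull a value out of the page with `lastIndexOf` and `substring`. As a hedged illustration of the parsing half of the question, here is a minimal sketch that extracts the text of the `<title>` element with a regular expression. `HtmlTitle` and `extractTitle` are hypothetical names. A regex like this only copes with simple markup, and a dedicated HTML parser is usually the more robust choice for anything beyond it.

```java
import java.util.regex.Matcher;
import java.util.regex.Pattern;

// Hypothetical helper, for illustration only.
final class HtmlTitle {
    // Case-insensitive, and DOTALL so a title that spans several lines still matches.
    private static final Pattern TITLE =
            Pattern.compile("<title[^>]*>(.*?)</title>", Pattern.CASE_INSENSITIVE | Pattern.DOTALL);

    /** Returns the trimmed text of the first title element, or null if none is found. */
    static String extractTitle(String html) {
        Matcher m = TITLE.matcher(html);
        return m.find() ? m.group(1).trim() : null;
    }
}
```

Inside onCompleted this could be used as `String title = HtmlTitle.extractTitle(result);` after checking that `e` is null.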
public class DownloadTask extends AsyncTask<String, Void, String> {
    @Override
    protected String doInBackground(String... urls) {
        String result = "";
        URL url;
        HttpsURLConnection urlConnection = null;
        try {
            url = new URL(urls[0]);
            urlConnection = (HttpsURLConnection) url.openConnection();
            BufferedReader br = new BufferedReader(new InputStreamReader(urlConnection.getInputStream()));
            String inputLine;
            while ((inputLine = br.readLine()) != null)
            {
                // System.out.println(inputLine);
                result += inputLine;
            }
            br.close();
            return result;
        } catch (Exception e) {
            e.printStackTrace();
            return "failed";
        }
    }
}
@Override
protected void onCreate(Bundle savedInstanceState) {
    super.onCreate(savedInstanceState);
    setContentView(R.layout.activity_main);
    DownloadTask task = new DownloadTask();
    String result = null;
    try {
        // Note: get() blocks the calling (UI) thread until the download finishes.
        result = task.execute("https://www.example.com").get();
    } catch (Exception e) {
        e.printStackTrace();
    }
    Log.i("Result", result);
}
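Returning the string "failed" on error means the caller cannot tell a failed download from a page whose body happens to be that text. One possible way around this, sketched under the assumption that a small holder class is acceptable in your project, is to return either the HTML or the exception. `FetchResult` and its methods are hypothetical names, not part of any Android API.

```java
// Hypothetical holder class: exactly one of html / error is non-null.
final class FetchResult {
    private final String html;      // set on success
    private final Exception error;  // set on failure

    private FetchResult(String html, Exception error) {
        this.html = html;
        this.error = error;
    }

    static FetchResult success(String html) {
        return new FetchResult(html, null);
    }

    static FetchResult failure(Exception error) {
        return new FetchResult(null, error);
    }

    boolean isSuccess() {
        return error == null;
    }

    /** Returns the HTML on success, or the given fallback when the download failed. */
    String htmlOr(String fallback) {
        return isSuccess() ? html : fallback;
    }

    /** Returns the exception that caused the failure, or null on success. */
    Exception error() {
        return error;
    }
}
```

With this, doInBackground could return `FetchResult.success(result)` or `FetchResult.failure(e)`, with the task declared as `AsyncTask<String, Void, FetchResult>`.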