Hi everyone,
using cURL I save an HTML page locally as a text file; from this page I need to extract some data and then save it to another text file.
Unfortunately the file is fairly complex and I'm having trouble with the extraction. Can anyone give me some suggestions?
Sample file
Code:
<html>
<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<head>
<title>PROXY LISTS - Free Anonymous Proxies and Proxy Tools</title>
<META http-equiv=Content-Language content=en-us>
<META http-equiv=Content-Type content="text/html; charset=windows-1252">
<META name=description content="Free anonymous proxy and socks lists with a massive friendly forum. Free Privacy tools and Proxy software, List sorted into Anonymous, Transparent, HTTP, HTTPS, SSL, CONNECT, IRC. Detailed host and country details. Fastest Proxy Lists.">
<META name=keywords content="free proxies, proxy lists, proxy list, public proxies, free proxy, fastest, proxies, anonymous proxies, anonymous proxy, proxy server, proxy forum, free, socks, anonymous, public, irc, free anonymous proxy, free proxy server">
<META name="revisit-after" content="1 days">
<script type="text/javascript" src="/ajax.js"></script>
<style type="text/css">
.proxbo {COLOR: #000000; FONT-FAMILY: Verdana; FONT-SIZE: 16px; FONT-WEIGHT: none; TEXT-DECORATION: none}
</style>
<LINK REL="SHORTCUT ICON" HREF="http://www.digitalcybersoft.com/favicon.ico">
</head>
<body bgcolor="ffffff" link="black" vlink="black" alink="black">
<center>
<table cellspacing=0 cellpadding=0 border=0 width=950>
<tr>
<td>
[img]/images/title.gif[/img]
<center>
<script type="text/javascript">
<!--
google_ad_client = "pub-3868464957071471";
google_ad_width = 336;
google_ad_height = 280;
google_ad_format = "336x280_as";
google_ad_type = "text_image";
google_ad_channel ="";
google_color_border = "ffffff";
google_color_bg = "ffffff";
google_color_link = "3366cc";
google_color_text = "666666";
google_color_url = "666666";
//--></script>
<script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script>
<script type="text/javascript">
<!--
google_ad_client = "pub-3868464957071471";
google_ad_width = 336;
google_ad_height = 280;
google_ad_format = "336x280_as";
google_ad_type = "text_image";
google_ad_channel ="";
google_color_border = "ffffff";
google_color_bg = "ffffff";
google_color_link = "3366cc";
google_color_text = "666666";
google_color_url = "666666";
//--></script>
<script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script>
</center>
<table cellspacing=5 cellpadding=0 border=0 width=100% height=20 background="/images/titlebar.gif">
<tr>
<td width=20> </td>
<td width=150><font color=white face=Arial style="text-decoration:none">IRC Insomnia</font></td>
<td width=200><font color=white face=Arial style="text-decoration:none">Binary Proxy Server</font></td>
<td width=120><font color=white face=Arial style="text-decoration:none">Proxy List</font></td>
<td width=200><font color=white face=Arial style="text-decoration:none">Digital Cyber Server</font></td>
<td width=150><font color=white face=Arial style="text-decoration:none">Links</font></td>
<td> </td>
</tr>
</table>
</td></tr>
<tr bgcolor="FFFFFF" height=5><td></td></tr>
<tr bgcolor="FFFFFF"><td>
<font face='Verdana'>
<table>
<tr>
<td VALIGN="TOP">
<table width="185" height="190" border="0" style="background-color:#f4f6f6">
<tr height="30"><td>
<font style="font-size: 14px; font-weight:bold;text-decoration:none;
font-family:arial,sans-serif;">
• <font color=#3366cc>Fresh Proxy List</font>
<font style="font-size: 10px">The main lists of Proxies, get all your open Proxies here!</font>
• <font color=#3366cc>Proxy Forum</font>
<font style="font-size: 10px">More Lists and Proxy Discussion</font>
• <font color=#3366cc>Proxy Checker</font>
<font style="font-size: 10px">Check if your Proxies work</font>
• <font color=#3366cc>List Leecher</font>
<font style="font-size: 10px">Use our script to leech Lists from other sites</font>
• <font color=#3366cc>SwitchProxy</font>
<font style="font-size: 10px">Make Firefox Anonymous the easy way</font>
• <font color=#3366cc>Judge Me</font>
<font style="font-size: 10px">Is your browser proxy worthy of your use?</font>
• <font color=#3366cc>Documents</font>
<font style="font-size: 10px">Various documents about Proxies</font>
</tr>
</table>
</td>
<td VALIGN="TOP">
<font face="Verdana" style="font-size: 11pt">
<center>
Proxies By Port: ALL 80 81 1080 3128 8000 8080
Proxies By Level: Level 1 Level 2 Level 3 Level 4 Level 5
Proxies By Type: Anonymous Transparent
Proxies By Method: GET HTTPS CONNECT
Proxies By Format: TEXT HTML
</center>
Listing Only Port: 8080
<pre>
222.223.82.137:8080 Transparent Unknown
118.182.246.56:8080 Anonymous Unknown
62.84.13.37:8080 Transparent Unknown
193.87.164.120:8080 Transparent Unknown
210.101.131.232:8080 Transparent Unknown
189.72.230.140:8080 Transparent Unknown
123.232.99.216:8080 Transparent Unknown
210.52.58.10:8080 Transparent Unknown
124.107.16.50:8080 Transparent Unknown
92.61.178.107:8080 Transparent Unknown
221.204.246.150:8080 Transparent Unknown
</pre>
45 Working Proxies Found...
</td>
<td VALIGN="TOP">
</td></tr></table>
</td>
</tr>
</table>
The result should be:
222.223.82.137:8080
118.182.246.56:8080
62.84.13.37:8080
193.87.164.120:8080
210.101.131.232:8080
189.72.230.140:8080
123.232.99.216:8080
210.52.58.10:8080
124.107.16.50:8080
92.61.178.107:8080
221.204.246.150:8080
Can I extract just the lines contained between the <pre></pre> tags?
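One possible way to do this, as a minimal sketch rather than a definitive solution: Python is an assumption here, since the post does not say which language is in use, and the names `extract_proxies`, `convert`, `page.txt` and `proxies.txt` are hypothetical. The idea is to isolate the text between `<pre>` and `</pre>`, then keep only the leading `ip:port` token of each line.

```python
import re

# An IPv4-looking address followed by a port at the start of a line,
# e.g. "222.223.82.137:8080 Transparent Unknown" -> "222.223.82.137:8080"
PROXY_RE = re.compile(r"^\s*(\d{1,3}(?:\.\d{1,3}){3}:\d{1,5})\b")


def extract_proxies(html):
    """Return the ip:port entries found inside <pre>...</pre> blocks."""
    proxies = []
    # DOTALL lets '.' span newlines; IGNORECASE also accepts <PRE>.
    blocks = re.findall(r"<pre\b[^>]*>(.*?)</pre>", html, re.DOTALL | re.IGNORECASE)
    for block in blocks:
        for line in block.splitlines():
            match = PROXY_RE.match(line)
            if match:
                proxies.append(match.group(1))
    return proxies


def convert(src_path, dst_path, encoding="windows-1252"):
    """Read the saved page, extract the proxies, write one per line."""
    # The encoding default follows the charset declared in the sample page.
    with open(src_path, encoding=encoding, errors="replace") as src:
        html = src.read()
    proxies = extract_proxies(html)
    with open(dst_path, "w", encoding="utf-8") as dst:
        dst.write("\n".join(proxies) + "\n")
    return proxies
```

Assuming the page was saved by cURL as `page.txt`, calling `convert("page.txt", "proxies.txt")` would write the addresses to `proxies.txt`. A regular expression is enough here because the `<pre>` block contains plain text only; if the markup inside it were more complex, a real HTML parser would be the safer choice.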
Thanks