Hi everyone,
I use cURL to save an HTML page locally as text. From that page I need to extract some data, which I then have to save to another text file.
Unfortunately the file is fairly complex and I'm having trouble with the extraction. Can anyone give me some suggestions?

Example file
code:
<html>

<!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
<head>
<title>PROXY LISTS - Free Anonymous Proxies and Proxy Tools</title>
<META http-equiv=Content-Language content=en-us>
<META http-equiv=Content-Type content="text/html; charset=windows-1252">
<META name=description content="Free anonymous proxy and socks lists with a massive friendly forum. Free Privacy tools and Proxy software, List sorted into Anonymous, Transparent, HTTP, HTTPS, SSL, CONNECT, IRC. Detailed host and country details. Fastest Proxy Lists.">
<META name=keywords content="free proxies, proxy lists, proxy list, public proxies, free proxy, fastest, proxies, anonymous proxies, anonymous proxy, proxy server, proxy forum, free, socks, anonymous, public, irc, free anonymous proxy, free proxy server">
<META name="revisit-after" content="1 days">
<script type="text/javascript" src="/ajax.js"></script>
 
<style type="text/css">
 .proxbo {COLOR: #000000; FONT-FAMILY: Verdana; FONT-SIZE: 16px; FONT-WEIGHT: none; TEXT-DECORATION: none}
</style>

<LINK REL="SHORTCUT ICON" HREF="http://www.digitalcybersoft.com/favicon.ico"> 

</head>

<body bgcolor="ffffff" link="black" vlink="black" alink="black">



<center>
<table cellspacing=0 cellpadding=0 border=0 width=950>
<tr>
<td>
[img]/images/title.gif[/img]

<center>

<script type="text/javascript">
<!--
google_ad_client = "pub-3868464957071471";
google_ad_width = 336;
google_ad_height = 280;
google_ad_format = "336x280_as";
google_ad_type = "text_image";
google_ad_channel ="";
google_color_border = "ffffff";
google_color_bg = "ffffff";
google_color_link = "3366cc";
google_color_text = "666666";
google_color_url = "666666";
//--></script>
<script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script>

<script type="text/javascript">
<!--
google_ad_client = "pub-3868464957071471";
google_ad_width = 336;
google_ad_height = 280;
google_ad_format = "336x280_as";
google_ad_type = "text_image";
google_ad_channel ="";
google_color_border = "ffffff";
google_color_bg = "ffffff";
google_color_link = "3366cc";
google_color_text = "666666";
google_color_url = "666666";
//--></script>
<script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
</script>
</center>

<table cellspacing=5 cellpadding=0 border=0 width=100% height=20 background="/images/titlebar.gif">
<tr>
<td width=20> </td>
<td width=150><font color=white face=Arial style="text-decoration:none">IRC Insomnia</font></td>
<td width=200><font color=white face=Arial style="text-decoration:none">Binary Proxy Server</font></td>
<td width=120><font color=white face=Arial style="text-decoration:none">Proxy List</font></td>
<td width=200><font color=white face=Arial style="text-decoration:none">Digital Cyber Server</font></td>
<td width=150><font color=white face=Arial style="text-decoration:none">Links</font></td>
<td> </td>
</tr>
</table>
</td></tr>
<tr bgcolor="FFFFFF" height=5><td></td></tr>

<tr bgcolor="FFFFFF"><td>
<font face='Verdana'>
 


<table>
<tr>
<td VALIGN="TOP">

<table width="185" height="190" border="0" style="background-color:#f4f6f6">
<tr height="30"><td>


<font style="font-size: 14px; font-weight:bold;text-decoration:none;
font-family:arial,sans-serif;">

&bull; <font color=#3366cc>Fresh Proxy List</font>

<font style="font-size: 10px">The main lists of Proxies, get all your open Proxies here!</font>


&bull; <font color=#3366cc>Proxy Forum</font>

<font style="font-size: 10px">More Lists and Proxy Discussion</font>


&bull; <font color=#3366cc>Proxy Checker</font>

<font style="font-size: 10px">Check if your Proxies work</font>


&bull; <font color=#3366cc>List Leecher</font>

<font style="font-size: 10px">Use our script to leech Lists from other sites</font>


&bull; <font color=#3366cc>SwitchProxy</font>

<font style="font-size: 10px">Make Firefox Anonymous the easy way</font>


&bull; <font color=#3366cc>Judge Me</font>

<font style="font-size: 10px">Is your browser proxy worthy of your use?</font>


&bull; <font color=#3366cc>Documents</font>

<font style="font-size: 10px">Various documents about Proxies</font>





</tr>

</table>
 

</td>

<td VALIGN="TOP">

<font face="Verdana"  style="font-size: 11pt">


<center>
Proxies By Port: ALL 80 81 1080 3128 8000  8080

Proxies By Level: Level 1 Level 2 Level 3 Level 4 Level 5

Proxies By Type: Anonymous Transparent

Proxies By Method: GET HTTPS CONNECT

Proxies By Format: TEXT HTML



</center>


Listing Only Port: 8080

<pre>
222.223.82.137:8080     Transparent   Unknown
118.182.246.56:8080     Anonymous     Unknown
62.84.13.37:8080       Transparent   Unknown
193.87.164.120:8080     Transparent   Unknown
210.101.131.232:8080    Transparent   Unknown
189.72.230.140:8080    Transparent   Unknown
123.232.99.216:8080     Transparent   Unknown
210.52.58.10:8080       Transparent   Unknown
124.107.16.50:8080      Transparent   Unknown
92.61.178.107:8080      Transparent   Unknown
221.204.246.150:8080    Transparent   Unknown
</pre>

45 Working Proxies Found...


</td>
<td VALIGN="TOP">





</td></tr></table>
 

</td>
</tr>
</table>
The result should be:

222.223.82.137:8080
118.182.246.56:8080
62.84.13.37:8080
193.87.164.120:8080
210.101.131.232:8080
189.72.230.140:8080
123.232.99.216:8080
210.52.58.10:8080
124.107.16.50:8080
92.61.178.107:8080
221.204.246.150:8080

Can I extract the lines contained between the <pre></pre> tags?

Thanks
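
One possible approach, sketched in Python (the post does not say which language is used with cURL, so the language choice and the function name `extract_proxies` are assumptions for illustration only): grab everything between `<pre>` and `</pre>` with a regular expression, then keep the leading `ip:port` field of each line. This relies on the page keeping the list in a plain `<pre>` block as in the sample above. A full HTML parser would be more robust if the layout changes.

```python
import re


def extract_proxies(html):
    """Return the ip:port entries found inside <pre>...</pre> blocks."""
    proxies = []
    # Capture the text between <pre> and </pre>, ignoring case and
    # letting "." match newlines so multi-line blocks are captured.
    blocks = re.findall(r"<pre\b[^>]*>(.*?)</pre>", html,
                        flags=re.IGNORECASE | re.DOTALL)
    for block in blocks:
        for line in block.splitlines():
            # Keep only lines that start with something shaped like ip:port.
            match = re.match(r"\s*(\d{1,3}(?:\.\d{1,3}){3}:\d+)", line)
            if match:
                proxies.append(match.group(1))
    return proxies


# Small excerpt of the sample page from the post.
sample = """
Listing Only Port: 8080
<pre>
222.223.82.137:8080     Transparent   Unknown
118.182.246.56:8080     Anonymous     Unknown
62.84.13.37:8080       Transparent   Unknown
</pre>
45 Working Proxies Found...
"""

for proxy in extract_proxies(sample):
    print(proxy)
```

To save the result to another text file, the returned list can be joined with newlines and written out with the usual `open(path, "w")` call. The input would be the contents of the file saved by cURL instead of the `sample` string.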