Hi everyone,
I use cURL to save an HTML page locally as a text file. From this page I need to extract some data and then save it to another text file.
Unfortunately the file is fairly complex and I'm having trouble with the extraction. Can anyone give me some suggestions?
Example file:
code: <html> <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN"> <head> <title>PROXY LISTS - Free Anonymous Proxies and Proxy Tools</title> <META http-equiv=Content-Language content=en-us> <META http-equiv=Content-Type content="text/html; charset=windows-1252"> <META name=description content="Free anonymous proxy and socks lists with a massive friendly forum. Free Privacy tools and Proxy software, List sorted into Anonymous, Transparent, HTTP, HTTPS, SSL, CONNECT, IRC. Detailed host and country details. Fastest Proxy Lists."> <META name=keywords content="free proxies, proxy lists, proxy list, public proxies, free proxy, fastest, proxies, anonymous proxies, anonymous proxy, proxy server, proxy forum, free, socks, anonymous, public, irc, free anonymous proxy, free proxy server"> <META name="revisit-after" content="1 days"> <script type="text/javascript" src="/ajax.js"></script> <style type="text/css"> .proxbo {COLOR: #000000; FONT-FAMILY: Verdana; FONT-SIZE: 16px; FONT-WEIGHT: none; TEXT-DECORATION: none} </style> <LINK REL="SHORTCUT ICON" HREF="http://www.digitalcybersoft.com/favicon.ico"> </head> <body bgcolor="ffffff" link="black" vlink="black" alink="black"> <center> <table cellspacing=0 cellpadding=0 border=0 width=950> <tr> <td> [img]/images/title.gif[/img] <center> <script type="text/javascript"> <!-- google_ad_client = "pub-3868464957071471"; google_ad_width = 336; google_ad_height = 280; google_ad_format = "336x280_as"; google_ad_type = "text_image"; google_ad_channel =""; google_color_border = "ffffff"; google_color_bg = "ffffff"; google_color_link = "3366cc"; google_color_text = "666666"; google_color_url = "666666"; //--></script> <script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"> </script> <script type="text/javascript"> <!-- google_ad_client = "pub-3868464957071471"; google_ad_width = 336; google_ad_height = 280; google_ad_format = "336x280_as"; google_ad_type = 
"text_image"; google_ad_channel =""; google_color_border = "ffffff"; google_color_bg = "ffffff"; google_color_link = "3366cc"; google_color_text = "666666"; google_color_url = "666666"; //--></script> <script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js"> </script> </center> <table cellspacing=5 cellpadding=0 border=0 width=100% height=20 background="/images/titlebar.gif"> <tr> <td width=20> </td> <td width=150><font color=white face=Arial style="text-decoration:none">IRC Insomnia</font></td> <td width=200><font color=white face=Arial style="text-decoration:none">Binary Proxy Server</font></td> <td width=120><font color=white face=Arial style="text-decoration:none">Proxy List</font></td> <td width=200><font color=white face=Arial style="text-decoration:none">Digital Cyber Server</font></td> <td width=150><font color=white face=Arial style="text-decoration:none">Links</font></td> <td> </td> </tr> </table> </td></tr> <tr bgcolor="FFFFFF" height=5><td></td></tr> <tr bgcolor="FFFFFF"><td> <font face='Verdana'> <table> <tr> <td VALIGN="TOP"> <table width="185" height="190" border="0" style="background-color:#f4f6f6"> <tr height="30"><td> <font style="font-size: 14px; font-weight:bold;text-decoration:none; font-family:arial,sans-serif;"> • <font color=#3366cc>Fresh Proxy List</font> <font style="font-size: 10px">The main lists of Proxies, get all your open Proxies here!</font> • <font color=#3366cc>Proxy Forum</font> <font style="font-size: 10px">More Lists and Proxy Discussion</font> • <font color=#3366cc>Proxy Checker</font> <font style="font-size: 10px">Check if your Proxies work</font> • <font color=#3366cc>List Leecher</font> <font style="font-size: 10px">Use our script to leech Lists from other sites</font> • <font color=#3366cc>SwitchProxy</font> <font style="font-size: 10px">Make Firefox Anonymous the easy way</font> • <font color=#3366cc>Judge Me</font> <font style="font-size: 10px">Is your browser proxy worthy of your 
use?</font> • <font color=#3366cc>Documents</font> <font style="font-size: 10px">Various documents about Proxies</font> </tr> </table> </td> <td VALIGN="TOP"> <font face="Verdana" style="font-size: 11pt"> <center> Proxies By Port: ALL 80 81 1080 3128 8000 8080 Proxies By Level: Level 1 Level 2 Level 3 Level 4 Level 5 Proxies By Type: Anonymous Transparent Proxies By Method: GET HTTPS CONNECT Proxies By Format: TEXT HTML </center> Listing Only Port: 8080 <pre> 222.223.82.137:8080 Transparent Unknown 118.182.246.56:8080 Anonymous Unknown 62.84.13.37:8080 Transparent Unknown 193.87.164.120:8080 Transparent Unknown 210.101.131.232:8080 Transparent Unknown 189.72.230.140:8080 Transparent Unknown 123.232.99.216:8080 Transparent Unknown 210.52.58.10:8080 Transparent Unknown 124.107.16.50:8080 Transparent Unknown 92.61.178.107:8080 Transparent Unknown 221.204.246.150:8080 Transparent Unknown </pre> 45 Working Proxies Found... </td> <td VALIGN="TOP"> </td></tr></table> </td> </tr> </table>
The result should be:
222.223.82.137:8080
118.182.246.56:8080
62.84.13.37:8080
193.87.164.120:8080
210.101.131.232:8080
189.72.230.140:8080
123.232.99.216:8080
210.52.58.10:8080
124.107.16.50:8080
92.61.178.107:8080
221.204.246.150:8080
Can I extract the lines contained between the <pre></pre> tags?
Thanks
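
One possible way to do this, sketched in Python purely as an assumption because the post does not say which language the cURL call is made from: take the text between the first <pre> and </pre>, then keep only the tokens shaped like address:port. The function names and the file paths are hypothetical. The windows-1252 charset is taken from the META tag in the sample page.

```python
import re


def extract_proxies(html: str) -> list[str]:
    """Return the address:port entries found inside the first <pre>...</pre> block."""
    # Capture everything between <pre> and </pre>; DOTALL lets '.' span newlines.
    block = re.search(r"<pre[^>]*>(.*?)</pre>", html, re.IGNORECASE | re.DOTALL)
    if block is None:
        return []
    # Keep only tokens shaped like an IPv4 address followed by ":port".
    return re.findall(r"\b(?:\d{1,3}\.){3}\d{1,3}:\d{1,5}\b", block.group(1))


def save_proxies(src_path: str, dst_path: str) -> int:
    """Read the saved page, write one address:port per line, return the count."""
    # Both paths are hypothetical; pass whatever file names you actually use.
    with open(src_path, encoding="windows-1252", errors="replace") as f:
        proxies = extract_proxies(f.read())
    with open(dst_path, "w", encoding="utf-8") as f:
        f.write("\n".join(proxies) + ("\n" if proxies else ""))
    return len(proxies)
```

Calling save_proxies("page.txt", "proxies.txt") would write the list to the second file. A regular expression is enough here because the page has a single <pre> block with plain text inside; for messier pages an HTML parser would be the safer choice.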

