  1. #1

    Extracting lines from a complex text file

    Hi everyone,
    I use cURL to save an HTML page locally as text, and I need to extract some data from that page and save it to another text file.
    Unfortunately the file is fairly complex and I'm having trouble with the extraction. Can anyone give me some suggestions?

    Sample file
    code:
    <html>
    
    <!DOCTYPE HTML PUBLIC "-//W3C//DTD HTML 4.0 Transitional//EN">
    <head>
    <title>PROXY LISTS - Free Anonymous Proxies and Proxy Tools</title>
    <META http-equiv=Content-Language content=en-us>
    <META http-equiv=Content-Type content="text/html; charset=windows-1252">
    <META name=description content="Free anonymous proxy and socks lists with a massive friendly forum. Free Privacy tools and Proxy software, List sorted into Anonymous, Transparent, HTTP, HTTPS, SSL, CONNECT, IRC. Detailed host and country details. Fastest Proxy Lists.">
    <META name=keywords content="free proxies, proxy lists, proxy list, public proxies, free proxy, fastest, proxies, anonymous proxies, anonymous proxy, proxy server, proxy forum, free, socks, anonymous, public, irc, free anonymous proxy, free proxy server">
    <META name="revisit-after" content="1 days">
    <script type="text/javascript" src="/ajax.js"></script>
     
    <style type="text/css">
     .proxbo {COLOR: #000000; FONT-FAMILY: Verdana; FONT-SIZE: 16px; FONT-WEIGHT: none; TEXT-DECORATION: none}
    </style>
    
    <LINK REL="SHORTCUT ICON" HREF="http://www.digitalcybersoft.com/favicon.ico"> 
    
    </head>
    
    <body bgcolor="ffffff" link="black" vlink="black" alink="black">
    
    
    
    <center>
    <table cellspacing=0 cellpadding=0 border=0 width=950>
    <tr>
    <td>
    [img]/images/title.gif[/img]
    
    <center>
    
    <script type="text/javascript">
    <!--
    google_ad_client = "pub-3868464957071471";
    google_ad_width = 336;
    google_ad_height = 280;
    google_ad_format = "336x280_as";
    google_ad_type = "text_image";
    google_ad_channel ="";
    google_color_border = "ffffff";
    google_color_bg = "ffffff";
    google_color_link = "3366cc";
    google_color_text = "666666";
    google_color_url = "666666";
    //--></script>
    <script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
    </script>
    
    <script type="text/javascript">
    <!--
    google_ad_client = "pub-3868464957071471";
    google_ad_width = 336;
    google_ad_height = 280;
    google_ad_format = "336x280_as";
    google_ad_type = "text_image";
    google_ad_channel ="";
    google_color_border = "ffffff";
    google_color_bg = "ffffff";
    google_color_link = "3366cc";
    google_color_text = "666666";
    google_color_url = "666666";
    //--></script>
    <script type="text/javascript" src="http://pagead2.googlesyndication.com/pagead/show_ads.js">
    </script>
    </center>
    
    <table cellspacing=5 cellpadding=0 border=0 width=100% height=20 background="/images/titlebar.gif">
    <tr>
    <td width=20> </td>
    <td width=150><font color=white face=Arial style="text-decoration:none">IRC Insomnia</font></td>
    <td width=200><font color=white face=Arial style="text-decoration:none">Binary Proxy Server</font></td>
    <td width=120><font color=white face=Arial style="text-decoration:none">Proxy List</font></td>
    <td width=200><font color=white face=Arial style="text-decoration:none">Digital Cyber Server</font></td>
    <td width=150><font color=white face=Arial style="text-decoration:none">Links</font></td>
    <td> </td>
    </tr>
    </table>
    </td></tr>
    <tr bgcolor="FFFFFF" height=5><td></td></tr>
    
    <tr bgcolor="FFFFFF"><td>
    <font face='Verdana'>
     
    
    
    <table>
    <tr>
    <td VALIGN="TOP">
    
    <table width="185" height="190" border="0" style="background-color:#f4f6f6">
    <tr height="30"><td>
    
    
    <font style="font-size: 14px; font-weight:bold;text-decoration:none;
    font-family:arial,sans-serif;">
    
    &bull; <font color=#3366cc>Fresh Proxy List</font>
    
    <font style="font-size: 10px">The main lists of Proxies, get all your open Proxies here!</font>
    
    
    &bull; <font color=#3366cc>Proxy Forum</font>
    
    <font style="font-size: 10px">More Lists and Proxy Discussion</font>
    
    
    &bull; <font color=#3366cc>Proxy Checker</font>
    
    <font style="font-size: 10px">Check if your Proxies work</font>
    
    
    &bull; <font color=#3366cc>List Leecher</font>
    
    <font style="font-size: 10px">Use our script to leech Lists from other sites</font>
    
    
    &bull; <font color=#3366cc>SwitchProxy</font>
    
    <font style="font-size: 10px">Make Firefox Anonymous the easy way</font>
    
    
    &bull; <font color=#3366cc>Judge Me</font>
    
    <font style="font-size: 10px">Is your browser proxy worthy of your use?</font>
    
    
    &bull; <font color=#3366cc>Documents</font>
    
    <font style="font-size: 10px">Various documents about Proxies</font>
    
    
    
    
    
    </tr>
    
    </table>
     
    
    </td>
    
    <td VALIGN="TOP">
    
    <font face="Verdana"  style="font-size: 11pt">
    
    
    <center>
    Proxies By Port: ALL 80 81 1080 3128 8000  8080
    
    Proxies By Level: Level 1 Level 2 Level 3 Level 4 Level 5
    
    Proxies By Type: Anonymous Transparent
    
    Proxies By Method: GET HTTPS CONNECT
    
    Proxies By Format: TEXT HTML
    
    
    
    </center>
    
    
    Listing Only Port: 8080
    
    <pre>
    222.223.82.137:8080     Transparent   Unknown
    118.182.246.56:8080     Anonymous     Unknown
    62.84.13.37:8080       Transparent   Unknown
    193.87.164.120:8080     Transparent   Unknown
    210.101.131.232:8080    Transparent   Unknown
    189.72.230.140:8080    Transparent   Unknown
    123.232.99.216:8080     Transparent   Unknown
    210.52.58.10:8080       Transparent   Unknown
    124.107.16.50:8080      Transparent   Unknown
    92.61.178.107:8080      Transparent   Unknown
    221.204.246.150:8080    Transparent   Unknown
    </pre>
    
    45 Working Proxies Found...
    
    
    </td>
    <td VALIGN="TOP">
    
    
    
    
    
    </td></tr></table>
     
    
    </td>
    </tr>
    </table>
    The result should be:

    222.223.82.137:8080
    118.182.246.56:8080
    62.84.13.37:8080
    193.87.164.120:8080
    210.101.131.232:8080
    189.72.230.140:8080
    123.232.99.216:8080
    210.52.58.10:8080
    124.107.16.50:8080
    92.61.178.107:8080
    221.204.246.150:8080

    Can I extract just the lines between the <pre></pre> tags?

    Thanks
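One way to do what the question asks is to isolate the `<pre>` block first and then keep only the first column of each line. The following is a minimal sketch, not a definitive implementation: the function name `extractProxies` is hypothetical, and it assumes each data line starts with an `ip:port` token followed by whitespace, as in the sample file.

```php
<?php
// Hypothetical helper: returns every "ip:port" entry found inside <pre> blocks.
function extractProxies(string $html): array
{
    $proxies = [];
    // Capture the body of each <pre>...</pre> block (case-insensitive, spans lines).
    if (!preg_match_all('#<pre\b[^>]*>(.*?)</pre>#is', $html, $blocks)) {
        return $proxies;
    }
    foreach ($blocks[1] as $block) {
        // Split the block into lines, tolerating any newline style.
        foreach (preg_split('/\R/', $block) as $line) {
            // The first whitespace-delimited token is the "ip:port" column.
            $token = strtok(trim($line), " \t");
            // Keep it only if it looks like an IPv4 address plus a port.
            if ($token !== false && preg_match('/^\d{1,3}(\.\d{1,3}){3}:\d+$/', $token)) {
                $proxies[] = $token;
            }
        }
    }
    return $proxies;
}

// Usage with two lines taken from the sample file.
$sample = "<pre>\n222.223.82.137:8080     Transparent   Unknown\n"
        . "118.182.246.56:8080     Anonymous     Unknown\n</pre>";
print_r(extractProxies($sample));
```

Because the match is limited to the `<pre>` block, this variant does not depend on the port number and ignores anything address-like elsewhere in the page.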

  2. #2
    marketto (Apache Server moderator)
    code:
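    // one octet of the address: 1 to 3 digits
    $n = '\d{1,3}';
    // four dot-separated octets followed by port 8080
    $p = "#({$n}\.{$n}\.{$n}\.{$n}:8080)#";
    preg_match_all( $p, $s, $ris );
    // $ris[1] holds the captured "ip:8080" strings
    print_r( $ris[1] );
    where $s is a string containing the page to check.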
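To cover the rest of the original request (reading the page saved by cURL and writing the addresses to another text file), the same pattern can be wrapped as below. This is only a sketch: the function name `saveProxies` and the file names in the usage comment are hypothetical, and it keeps the port fixed at 8080 as in the pattern above.

```php
<?php
// Hypothetical helper: reads the saved page, writes one "ip:8080" per line
// to the output file, and returns how many addresses were written.
function saveProxies(string $inputFile, string $outputFile): int
{
    $s = file_get_contents($inputFile);
    if ($s === false) {
        throw new RuntimeException("Cannot read $inputFile");
    }

    $n = '\d{1,3}';                              // one octet: 1 to 3 digits
    $p = "#({$n}\.{$n}\.{$n}\.{$n}:8080)#";      // IPv4 address plus port 8080
    preg_match_all($p, $s, $ris);

    // One address per line; write an empty file when nothing matched.
    $text = $ris[1] ? implode(PHP_EOL, $ris[1]) . PHP_EOL : '';
    file_put_contents($outputFile, $text);

    return count($ris[1]);
}

// Example call (file names are placeholders):
// saveProxies('page.html', 'proxies.txt');
```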

    Give it a try.

  3. #3
    Thank you so much for the help!
