I started by building the spider that collects the URLs found on a given page.
I found a piece of code that does what I need at http://www.orebla.it/forum/viewtopic...art&view=print
Then I replaced split() with explode().
PHP code:
<?php
ini_set('error_reporting', E_ALL);
ini_set("display_errors", 1);
function get_links($content,$url) {
$found=array();
$l=0;
for ($i=0; $i<count($content); $i++) {
$line=explode(" ",$content[$i]);
for ($j=0; $j<count($line); $j++) {
if (isset($line[$j+1]) && strstr($line[$j],"<a") && strstr($line[$j+1],"href=")) {
$link=explode('"',$line[$j+1]);
if (!strstr($link[1],"http://")) {
$fields=explode("/",$url);
$root="http:/";
for ($k=2; $k<count($fields)-1; $k++)
$root=$root."/".$fields[$k];
$link[1]=$root."/".$link[1];
}
$found[$l++]=$link[1];
}
}
}
return $found;
}
$url="http://www.miositoimmobliliare.com";
//$url=$_POST["url"];
$fp=fopen($url,"r");
if ($fp===false)
die ("Error reading from $url\n");
fclose($fp);
$found_1=get_links(file($url),$url);
print "<h2>Links found inside $url:</h2>\n";
print "<ul>\n";
for ($i=0; $i<count($found_1); $i++)
print "<li><a href=\"$found_1[$i]\">$found_1[$i]</a></li>\n";
print "</ul>\n";
for ($i=0; $i<count($found_1); $i++) {
$url=$found_1[$i];
$found_2=get_links(file($url),$url);
print "<h2>Links found inside $url:</h2>\n";
print "<ul>\n";
for ($j=0; $j<count($found_2); $j++)
print "<li><a href=\"$found_2[$j]\">$found_2[$j]</a></li>\n";
print "</ul>\n";
}
?>
Unfortunately, when I run the script I get several warnings:
Code:
Warning: file(): php_network_getaddresses: getaddrinfo failed: nodename nor servname provided, or not known in /Applications/MAMP/htdocs/magellano/get_links.php on line 61
What does this mean?
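A possible reading of the warning, offered as a sketch rather than a confirmed diagnosis: "getaddrinfo failed" means PHP could not resolve the host name in the URL passed to file(). When the start URL has no trailing slash, the root-building loop in get_links() appears to leave $root as "http:/", so a relative href such as page.html would become http://page.html, whose "host" does not exist. The helpers below are hypothetical (resolve_link() and host_resolves() are names invented here, not part of the original script) and assume PHP 7 or later. They build absolute URLs with parse_url() and check the host before anything is fetched.

```php
<?php
// Hypothetical helper: turn an href found in a page into an absolute URL,
// using the page's own URL ($base) as the reference point.
function resolve_link(string $base, string $href): string {
    // Already absolute: keep it unchanged.
    if (preg_match('#^https?://#i', $href)) {
        return $href;
    }
    $parts  = parse_url($base);
    $scheme = $parts['scheme'] ?? 'http';
    $host   = $parts['host'] ?? '';
    $root   = $scheme . '://' . $host;
    if (isset($parts['port'])) {
        $root .= ':' . $parts['port'];
    }
    // Root-relative link such as "/contact.html".
    if ($href !== '' && $href[0] === '/') {
        return $root . $href;
    }
    // Document-relative link: append it to the directory of the base path.
    $path = $parts['path'] ?? '/';
    $dir  = substr($path, 0, strrpos($path, '/') + 1);
    return $root . $dir . $href;
}

// Hypothetical helper: report whether the host part of a URL can be resolved.
function host_resolves(string $url): bool {
    $host = parse_url($url, PHP_URL_HOST);
    if (!is_string($host) || $host === '') {
        return false;
    }
    // A literal IP address needs no DNS lookup.
    if (filter_var($host, FILTER_VALIDATE_IP) !== false) {
        return true;
    }
    // gethostbyname() returns the unmodified host name when the lookup fails.
    return gethostbyname($host) !== $host;
}
```

Under these assumptions, resolve_link("http://www.miositoimmobliliare.com", "page.html") gives http://www.miositoimmobliliare.com/page.html, and the second loop could skip any $url for which host_resolves($url) is false before calling file($url).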