Google and new bots

**MarcoTuscany** · 29-09-2004, 08:40

I'm posting this verbatim from an international forum where they're discussing new Google bots that have started showing up. You can translate it yourselves.....please. (source: WebProNews.com)

Did Google Unleash Additional Googlebots?

Apparently, Google has begun using another spider to scan and index web sites. News of a second Googlebot came from a number of site owners who, while studying their site logs, noticed that two Google spiders with different IP address ranges had visited and scanned their sites.

Have you had visits from more than one Googlebot? Discuss at WebProWorld.

The additional Googlebot was first reported on the DigitalPoint forums, in a post by digitalpoint himself. In his post, digitalpoint noted that two Googlebots had visited his site and that each one had a different IP address:

"The normal one:

66.249.64.47 - - [15/Sep/2004:18:59:12 -0700] "GET /robots.txt HTTP/1.0" 404 1227 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"

and also this one:

66.249.66.129 - - [15/Sep/2004:18:12:51 -0700] "GET / HTTP/1.1" 200 38358 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"

Aside from the slightly different user agent, it's also HTTP 1.1. The IP address it uses is an IP block is normally just used for Mediapartners (AdSense spider), but it's spidering a site without any AdSense."
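
For anyone who wants to check their own logs for the same pattern, here is a minimal sketch that pulls the IP address, HTTP version and user agent out of lines like the two quoted above. It assumes the Apache "combined" log format, which is what those two lines appear to use; the names `LOG_RE` and `parse_line` are made up for this example.

```python
import re

# Assumed layout (Apache "combined" format):
# host ident user [time] "method path protocol" status bytes "referer" "user-agent"
LOG_RE = re.compile(
    r'(?P<ip>\S+) \S+ \S+ \[(?P<time>[^\]]+)\] '
    r'"(?P<method>\S+) (?P<path>\S+) (?P<protocol>[^"]+)" '
    r'(?P<status>\d{3}) (?P<size>\S+) '
    r'"(?P<referer>[^"]*)" "(?P<agent>[^"]*)"'
)

def parse_line(line):
    """Return a dict of fields for one log line, or None if it does not match."""
    m = LOG_RE.match(line)
    return m.groupdict() if m else None

# The two log lines quoted in the article.
lines = [
    '66.249.64.47 - - [15/Sep/2004:18:59:12 -0700] "GET /robots.txt HTTP/1.0" 404 1227 "-" "Googlebot/2.1 (+http://www.google.com/bot.html)"',
    '66.249.66.129 - - [15/Sep/2004:18:12:51 -0700] "GET / HTTP/1.1" 200 38358 "-" "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)"',
]

for line in lines:
    hit = parse_line(line)
    # Print only the fields that differ between the two spiders.
    print(hit["ip"], hit["protocol"], hit["agent"])
```

Servers configured with a custom log format would need a different pattern.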

Once this thread was launched, scores of other posters shared their encounters with the second Googlebot. A DigitalPoint member named Redleg also noticed several visits from the new spider and also recorded the IP ranges of the new visitors, "Don't remember the exact IP addresses (about 15-20 of them) but here's the IP ranges: 66.249.78.* 66.249.64.* 66.249.79.*…"
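
Ranges such as 66.249.64.* correspond to /24 blocks, so one rough way to see which ranges are hitting a site is to count visits per /24. The sketch below uses Python's standard `ipaddress` module; `block24`, `hits_per_block` and the sample list are invented for illustration (the first and last addresses come from the log lines in the article, the other two are hypothetical).

```python
from collections import Counter
import ipaddress

def block24(ip):
    """Return the /24 network an IPv4 address falls in, e.g. '66.249.64.0/24'."""
    return str(ipaddress.ip_network(f"{ip}/24", strict=False))

def hits_per_block(ips):
    """Count how many addresses fall in each /24 block."""
    return Counter(block24(ip) for ip in ips)

# Two addresses from the article plus two hypothetical ones.
sample_ips = ["66.249.64.47", "66.249.64.12", "66.249.65.3", "66.249.66.129"]

for block, count in sorted(hits_per_block(sample_ips).items()):
    print(block, count)
```

In practice the list of addresses would come from the server log rather than being typed in by hand.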

Many who checked their server logs noticed a number of visits from both Googlebots, with various IP ranges. Not only were there numerous visits, but also each bot performed a different kind of crawl than its "partner". Over at the WebmasterWorld forums, a poster named Gomer noticed that one bot performed a complete site crawl while the other did more of a surface-type crawl. According to Gomer:

"The 66.249.64.X series was requesting pages that were fully indexed i.e., they have a page title and description. The 66.249.65.X series was requesting pages that were only partially indexed… In my case, the 66.249.65.X were pages that exist on my server but I am trying to get Googlebot to stop indexing."

As the realization of an additional Googlebot set in, speculation began about the motive for having two bots scan sites. Because Google likes to keep everything concerning its search index, its spiders, and anything else having to do with its search engine under tight wraps, educated guesses are all anyone can offer.

Brett Tabke posted an interesting thought concerning Google's extensive crawling, "looks like "panic" based spidering… as if an index needs to be rebuilt from the ground up in a short time period (aka: the old index didn't work)." Another member believed these scans are a part of the PR re-calculation for the next PageRank update. Another poster, idoc, also had an intriguing take on Google's actions:

"I expect a lot of cloaking and redirect sites will be dropped soon from these new bot IPs and this crawl. It's what I had in mind in the post about hijacks when I said I think Google is on it. They have been asking for file paths and filenames with extensions I have never used before. I am hopeful anyway."

Longtime WMW poster claus suggested that these events might be because Google is preparing a new datacenter, while others thought the index may contain a glitch. Liane, however, agreed with Brett that these deep crawls are out of the ordinary. She stated, "Something must be causing this feeding frenzy and it wouldn't surprise me if there was a glitch with the index. Google went nuts every day this past week on my site, but in the last 24 hours… only one hit. Never had that before. Not that I can remember anyway… I smell a "major" update in the offing... once they get things sorted."

As it stands, the reasons behind Google's scanning efforts are unknown. The only things that are certain are that they are using more than one crawler and that at least one of them performs a complete site scan. Is Google repopulating their index, or are they hunting out cloaked/doorway pages? Or are they finally getting around to doing another PR update? Like so many others have said, time will tell.

**beke** · 29-09-2004, 12:34

Take a look at the stats page of my first experiment.

**xnavigator** · 29-09-2004, 12:43

Originally posted by MarcoTuscany
You can translate it yourselves.....please.

yes yes ... please translate it.. even just the gist

**maniladisco** · 30-09-2004, 03:28

Translated by Google (a bit rough and ready...)

[Google's automatic Italian translation of the WebProNews article quoted in the first post, "Did Google Unleash Additional Googlebots?". It repeats that text sentence by sentence, including the two log lines and the quotes from digitalpoint, Redleg, Gomer, Brett Tabke, idoc, claus and Liane.]

**chisono** · 06-04-2005, 17:19

come on, soon they'll be saying that if Google changes its logo on certain days of the year it's because it wants to block SEOs

:maLOL:

It seems obvious to me why the Mozilla agent was adopted:

- to avoid layers positioned outside the visible page area
- to better distinguish images and videos for the Googlebot-Image pass
- to check JavaScript

obviously they realized that archiving a site the way a user would really see it is better, and since Mozilla 5.0 is the most widespread agent among browsers they adopted the same platform.

the AdSense argument doesn't hold up, not at all.

**key** · 06-04-2005, 17:41

The new Google spider uses a slightly different user agent: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)".

This means that Googlebot now also accepts the HTTP 1.1 protocol. The new spider might be able to understand more content formats, including compressed HTML.
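
To make the "compressed HTML" point concrete: an HTTP/1.1 client can advertise gzip support through the `Accept-Encoding` request header, and a server may then send the page gzip-compressed with `Content-Encoding: gzip`. Below is a minimal sketch of the decoding side using Python's standard `gzip` module. Whether the new Googlebot actually sends this header is an assumption here, not something the article confirms; `decode_body` and `request_headers` are illustrative names.

```python
import gzip

def decode_body(body, content_encoding=""):
    """Return the response text, undoing gzip if the server says it applied it."""
    if content_encoding.lower() == "gzip":
        body = gzip.decompress(body)
    return body.decode("utf-8")

# Headers a compression-aware client might send (assumed, for illustration only).
request_headers = {
    "User-Agent": "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)",
    "Accept-Encoding": "gzip",
}

# Simulate a server compressing a repetitive HTML page.
html = "<html><body>" + "hello " * 200 + "</body></html>"
compressed = gzip.compress(html.encode("utf-8"))

print(len(html), "bytes raw,", len(compressed), "bytes gzipped")
print(decode_body(compressed, "gzip") == html)
```

The bandwidth saving is the usual reason for a crawler to accept compressed responses.
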
Why does Google do this?

Google hasn't revealed the reason for it yet. There are two main theories:

The first theory is that Google uses the new spider to spot web sites that use cloaking, JavaScript redirects and other dubious web site optimization techniques. As the new spider seems to be more powerful than the old spider, this sounds plausible.

The second theory is that Google's extensive crawling might be a panic reaction because the index needs to be rebuilt from the ground up in a short time period. The reason for this might be that the old index contains too many spam pages.
What does this mean to your web site?

If you use questionable techniques such as cloaking or JavaScript redirects, you might get into trouble. If Google really uses the new spider to detect spamming web sites, it's likely that these sites will be banned from the index.

To obtain long-term results on search engines, it's better to use ethical search engine optimization methods. General information about Google's web page spider can be found here.

It's likely that the new spider announces a major Google update. We'll have to see what this means in detail.

**chisono** · 06-04-2005, 18:01

[supersaibal]Originally posted by key
The new Google spider uses a slightly different user agent: "Mozilla/5.0 (compatible; Googlebot/2.1; +http://www.google.com/bot.html)".

This means that Googlebot now also accepts the HTTP 1.1 protocol. The new spider might be able to understand more content formats, including compressed HTML.
Why does Google do this? [/supersaibal]

and on this we all agree

Google hasn't revealed the reason for it yet. There are two main theories:

The first theory is that Google uses the new spider to spot web sites that use cloaking, JavaScript redirects and other dubious web site optimization techniques. As the new spider seems to be more powerful than the old spider, this sounds plausible.

that's what I said earlier: "check the JavaScript"

The second theory is that Google's extensive crawling might be a panic reaction because the index needs to be rebuilt from the ground up in a short time period. The reason for this might be that the old index contains too many spam pages.
What does this mean to your web site?

this one, on the other hand, smacks of science fiction: changing the indexing protocol to see which pages adapt and which don't :maLOL:
all they'd need to do is check the cache (which they already do anyway) to see which pages are no longer up to date.

If you use questionable techniques such as cloaking or JavaScript redirects, you might get into trouble. If Google really uses the new spider to detect spamming web sites, it's likely that these sites will be banned from the index.

and this one is nonsense too: these days cloaking is done server-side, there's no need for JS any more.
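
A toy illustration of the server-side point: the server picks the page from the request's `User-Agent` header, so nothing on the page itself (no JavaScript) gives the trick away. The sketch also shows one naive way a crawler could notice it, by requesting the same URL under two user agents and comparing the bodies. Everything here is hypothetical: the two "servers" are plain functions standing in for real sites, and `BROWSER_UA` is an invented browser string.

```python
BOT_UA = "Googlebot/2.1 (+http://www.google.com/bot.html)"
BROWSER_UA = "Mozilla/5.0 (Windows; U; Windows NT 5.1)"  # hypothetical browser string

def cloaking_server(user_agent):
    """Toy stand-in for a site that cloaks on the User-Agent header."""
    if "Googlebot" in user_agent:
        return "<html>keyword-stuffed page for spiders</html>"
    return "<html>page shown to human visitors</html>"

def honest_server(user_agent):
    """Toy stand-in for a site that serves everyone the same page."""
    return "<html>same page for everyone</html>"

def differs_by_agent(fetch, agents):
    """True if fetch() returns different bodies for different user agents."""
    bodies = {fetch(agent) for agent in agents}
    return len(bodies) > 1

agents = [BOT_UA, BROWSER_UA]
print(differs_by_agent(cloaking_server, agents))  # True
print(differs_by_agent(honest_server, agents))    # False
```

A site that cloaks on the visitor's IP address instead of the user agent would pass this check, which may be why a spider arriving from unfamiliar IP ranges drew so much attention.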

**key** · 06-04-2005, 18:07

HOW MANY PEOPLE KNOW about it or use it on the server... and not on the page?

**chisono** · 06-04-2005, 18:08

[supersaibal]Originally posted by key
HOW MANY PEOPLE KNOW about it or use it on the server... and not on the page? [/supersaibal]

come on, key

don't play dumb

everyone who ranks first does it server-side

and that should be enough

**key** · 06-04-2005, 18:20

for a ban over JS or over server-side?
Google isn't interested in "doing" anything; it's more interested in being talked about... if it wants to, it doesn't crawl anywhere at all.
with 8 billion pages in cache... it can get by for 100 years just by rotating them
