Google - how retrieved search results are stored.

**GENKO** · 17-05-2006, 15:48

Hi,
I was wondering how the results of a Google search are displayed.
That is, once the search query has pulled the results from the DB, how are they kept around so that we can move from page to page? Through one or more temporary tables where the records are stored and then served through successive paginations?
And if that is the case, when are those records deleted from the temporary tables?

Bye.

**Ferro9** · 17-05-2006, 17:40

Setting aside how useful the answers would be (it isn't clear to me what you would do with this information), your question has about the same chance of getting an answer as mine about which Lotto numbers will come up next Sunday...

The code Google uses is probably one of the biggest trade secrets in the world, and I don't think anyone is willing to reveal even a tiny, useless part of it.

**GENKO** · 18-05-2006, 09:07

It was simply a technical question prompted by curiosity, a topic of discussion like any other.
Beyond how Google itself works, I believe most current search engines use roughly the same technical approaches.
And in any case I can be almost certain, without diving into oceans of science fiction, that all searches/extractions are stored in temporary tables, probably archived on other machines at the same time or in a second phase for later analysis and statistics. Meanwhile, I believe the temporary tables are updated constantly to allow a steady flow of storage for the records extracted by searches.
In any case, if my question makes no sense, your answer makes even less.

Regards.

**Ferro9** · 18-05-2006, 09:21

Forgive me if I insulted your intelligence, but I still find a question like this misleading. This is the kind of post that, in my opinion, breeds urban legends and die-hard "certainties": it is well written, in pseudo-technical language, and many readers will take the assumptions it makes as fact.

That is not how things are, and since I am a realist I always try to make it clear to people approaching SEO for the first time that many discussions rest on hypothetical premises, and that theories built on hypothetical foundations are worth about as much as Wanna Marchi's salt.

Everyone is free to think what they like, of course, but you will allow me to express my opinion.

**GENKO** · 18-05-2006, 09:42

Hypothesis is part of human nature, and Google and all its science-fiction inner workings came out of it too, whereas assumption is the mother of all bullshit.
Mine are technical hypotheses/opinions and should be taken as such. I can't see why one shouldn't make hypotheses about how Google works; I'm not disputing the Gospel. Everyone is free to express their technical view on the matter, even if it turns out to be wrong.
I see this post, like many others, simply as a chance to expand one's technical knowledge of the possible ways a search engine might work.

**CiodoF** · 18-05-2006, 11:28

How would you implement it?

Sometimes the simplest deduction is the starting point from which to implement a complex, badass algorithm.

**GENKO** · 18-05-2006, 13:53

"How would you implement it?"

The subject is missing...
Implement what?

**annunciaaa** · 18-05-2006, 14:25

I think GG re-runs the search query each time, simply changing the offset of the results from page to page. So no temporary tables.

**GENKO** · 18-05-2006, 14:57

I can't work out what you mean by "simply changing the offset of the results from page to page".
Say someone searches Google for >guerra civile< (civil war). Do you think there could be a query along the lines of select * from tabellaxxx where search_key like '%guerra%civile%', with its output laid out on the fly across paginated pages? I don't think so.

Here is how Google runs a search:

taken from http://www.google.com/librariancente...s/0512_01.html
-------------------------------------------------------------------
In order to present and score the results, Google needs to do two things:

1. Find the set of pages that contain the user's query somewhere
2. Rank the matching pages in order of relevance

We've developed an interesting trick that speeds up the first step: instead of storing the entire index on one very powerful computer, Google uses hundreds of computers to do the job. Because the task is divided among many machines, the answer can be found much faster. To illustrate, let's suppose an index for a book was 30 pages long. If one person had to search for several pieces of information in the index, it would take at least several seconds for each search. But what if you gave each page of the index to a different person? Thirty people could search their portions of the index much more quickly than one person could search the entire index alone. Similarly, Google splits its data between many machines to find matching documents faster.

How do we find pages that contain the user's query? Let's return to our civil war example. The word "civil" was in documents 3, 8, 22, 56, 68, and 92; the word "war" was in documents 2, 8, 15, 22, 68, and 77. Let's write the documents across the page and look for those with both words.

| Term | Documents |
|---|---|
| civil | 3, 8, 22, 56, 68, 92 |
| war | 2, 8, 15, 22, 68, 77 |
| both words | 8, 22, 68 |

Arranging the documents this way makes clear that the words "civil" and "war" appear in three documents (8, 22, and 68). The list of documents that contain a word is called a "posting list," and looking for documents with both words is called "intersecting a posting list." (A fast way to intersect two posting lists is to walk down both at the same time. If one list skips from 22 to 68, you can skip ahead to document 68 on the other list as well.)

Ranking Results
Now we have the set of pages that contain the user's query somewhere, and it's time to rank them in terms of relevance. Google uses many factors in ranking. Of these, the PageRank algorithm might be the best known. PageRank evaluates two things: how many links there are to a web page from other pages, and the quality of the linking sites. With PageRank, five or six high-quality links from websites such as www.cnn.com and www.nytimes.com would be valued much more highly than twice as many links from less reputable or established sites.

But we use many factors besides PageRank. For example, if a document contains the words "civil" and "war" right next to each other, it might be more relevant than a document discussing the Revolutionary War that happens to use the word "civil" somewhere else on the page. Also, if a page includes the words "civil war" in its title, that's a hint that it might be more relevant than a document with the title "19th Century American Clothing." In the same way, if the words "civil war" appear several times throughout the page, that page is more likely to be about the civil war than if the words only appear once.

As a rule, Google tries to find pages that are both reputable and relevant. If two pages appear to have roughly the same amount of information matching a given query, we'll usually try to pick the page that more trusted websites have chosen to link to. Still, we'll often elevate a page with fewer links or lower PageRank if other signals suggest that the page is more relevant. For example, a web page dedicated entirely to the civil war is often more useful than an article that mentions the civil war in passing, even if the article is part of a reputable site such as Time.com.

Once we've made a list of documents and their scores, we take the documents with the highest scores as the best matches. Google does a little bit of extra work to try to show snippets – a few sentences – from each document that highlight the words that a user typed. Then we return the ranked URLs and the snippets to the user as results pages.

As you can see, running a search engine takes a lot of computing resources. For each search that someone types in, over 500 computers may work together to find the best documents, and it all happens in under half a second.
-----------------------------------------------------------------------------------
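
The "walk down both lists at the same time" idea from the excerpt above can be sketched in a few lines of Python. This is only an illustration of the technique on the excerpt's own example numbers: the function name `intersect_postings` is made up for this sketch, and nothing here claims to reflect Google's actual code.

```python
def intersect_postings(a, b):
    """Return the document IDs that appear in both sorted posting lists."""
    i, j = 0, 0
    both = []
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            # Same document in both lists: it contains both words.
            both.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            # a is behind: advance it toward b's current document.
            i += 1
        else:
            # b is behind: advance it toward a's current document.
            j += 1
    return both


# Posting lists taken from the "civil war" example in the excerpt.
civil = [3, 8, 22, 56, 68, 92]
war = [2, 8, 15, 22, 68, 77]

print(intersect_postings(civil, war))  # [8, 22, 68]
```

Because both lists are sorted, each one is read only once, so the cost grows with the combined length of the two lists rather than with their product.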

**annunciaaa** · 18-05-2006, 15:28

This is what I think...

If I run a search like:

http://www.google.it/search?hl=it&q=forum+html&meta=

which is equivalent to:

http://www.google.it/search?q=forum+...=&start=0&sa=N

what GG cares about is the q= parameter and the start= parameter (plus the initial hl=country).

If I change start and set it to =0 or =6, GG starts from that point.

If I copy and paste the URL into another browser, GG always gives me the same results.
Ergo, it seems that GG processes the string it receives from the browser every time.

I could be wrong, of course... but that's the impression I get.

As for parsing the parameters and returning results ranked by PR or whatever else, that is something it already has to do for the first search anyway.
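
The behaviour described above (same URL, same results, in any browser) is what you would expect from stateless, offset-based paging. Here is a minimal Python sketch under stated assumptions: `ranked_results` is a hypothetical stand-in for the real search, the page size of 10 is assumed, and the example URL is just put together from the `hl`, `q` and `start` parameters mentioned in the post. It says nothing about how Google actually does it.

```python
from urllib.parse import urlparse, parse_qs

RESULTS_PER_PAGE = 10  # assumed page size, not taken from the thread


def ranked_results(query):
    """Hypothetical stand-in for the search itself.

    Returns the same ranked list every time it is given the same query.
    """
    return [f"{query} result {n}" for n in range(1, 36)]


def results_page(url):
    """Build one page of results from the URL alone, with no stored state."""
    params = parse_qs(urlparse(url).query)
    query = params.get("q", [""])[0]             # "forum+html" decodes to "forum html"
    start = int(params.get("start", ["0"])[0])   # offset of the first result to show
    ranked = ranked_results(query)               # the query is re-run on every request
    return ranked[start:start + RESULTS_PER_PAGE]


url = "http://www.google.it/search?hl=it&q=forum+html&start=6"
page = results_page(url)
print(page[0])    # forum html result 7
print(len(page))  # 10
```

Since everything needed to build the page is in the URL, no temporary table per user is required; the trade-off is that the ranking work is repeated, or served from some shared cache, on every request.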
