GYM search engines
- index the most data and provide reliable sub-second responses
- provide high-quality answers, and rank and present results
- respond quickly to changes in content
- eliminate duplication, dead links, and off-topic spam
- speed
- politeness
- exclusion of content
- avoids duplication of content
- does a continuous crawl
- rejects spam
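The politeness and exclusion items above can be sketched with Python's standard `urllib.robotparser` module. This is only a minimal illustration, not how any particular search engine does it: the robots.txt text, the `example-crawler` agent name, and the helper names `may_fetch` and `delay_seconds` are all made up for the example.

```python
from urllib.robotparser import RobotFileParser

# Hypothetical robots.txt content, given inline so nothing is fetched.
ROBOTS_TXT = """\
User-agent: *
Disallow: /private/
Crawl-delay: 2
"""

AGENT = "example-crawler"  # hypothetical user-agent name


def make_policy(robots_text):
    """Parse robots.txt text into a policy object."""
    parser = RobotFileParser()
    parser.parse(robots_text.splitlines())
    return parser


policy = make_policy(ROBOTS_TXT)


def may_fetch(url):
    """Exclusion: only fetch URLs the site has not disallowed."""
    return policy.can_fetch(AGENT, url)


def delay_seconds(default=1):
    """Politeness: wait this long between requests to the same site."""
    delay = policy.crawl_delay(AGENT)
    return delay if delay is not None else default
```

A crawler loop would call `may_fetch(url)` before each request and sleep for `delay_seconds()` between requests to the same host.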
Part 2
An inverted file is used to rapidly identify which documents contain the terms in a search. The file is built in 2 steps:
- scanning: the text of each document is scanned
- for each indexable term, a posting is created with a document number and a term number. The postings are put into a temporary file in document number order.
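Here is a rough sketch of those steps on a toy collection. The function names `scan` and `invert` are mine, and a Python list stands in for the temporary file. The notes above only describe the scan; the sort into term number order in `invert` is my assumption about how the temporary file becomes the finished inverted file.

```python
from collections import defaultdict


def scan(documents):
    """Emit (doc_number, term_number) postings in document number order."""
    term_numbers = {}  # term -> term number (a tiny term dictionary)
    postings = []      # stands in for the temporary file
    for doc_number, text in enumerate(documents):
        for term in text.lower().split():
            term_number = term_numbers.setdefault(term, len(term_numbers))
            postings.append((doc_number, term_number))
    return term_numbers, postings


def invert(postings):
    """Sort postings by term number, then document number, and group them."""
    index = defaultdict(list)
    ordered = sorted(postings, key=lambda p: (p[1], p[0]))
    for doc_number, term_number in ordered:
        docs = index[term_number]
        if not docs or docs[-1] != doc_number:  # skip repeats within a document
            docs.append(doc_number)
    return dict(index)


docs = ["web search engines", "search engines crawl the web"]
term_numbers, postings = scan(docs)
inverted = invert(postings)
```

After this, `inverted[term_numbers["crawl"]]` lists the documents that contain "crawl".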
Indexes can store additional information in each posting.
Query processing algorithms look up each query term in the term dictionary and locate its posting list. This returns only the documents that contain the query words.
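A small sketch of that lookup, assuming a query must match all of its words (AND semantics, which is my reading of the notes). The toy `DICTIONARY` and `POSTING_LISTS` data and the function names are invented for the example.

```python
# Hypothetical toy data: term -> term number, term number -> sorted doc numbers.
DICTIONARY = {"web": 0, "search": 1, "engines": 2, "crawl": 3}
POSTING_LISTS = {0: [0, 1, 4], 1: [0, 1, 2, 4], 2: [1, 4], 3: [2]}


def intersect(a, b):
    """Merge two sorted posting lists, keeping document numbers found in both."""
    result, i, j = [], 0, 0
    while i < len(a) and j < len(b):
        if a[i] == b[j]:
            result.append(a[i])
            i += 1
            j += 1
        elif a[i] < b[j]:
            i += 1
        else:
            j += 1
    return result


def process_query(query, dictionary=DICTIONARY, posting_lists=POSTING_LISTS):
    """Return the documents that contain every word of the query."""
    lists = []
    for term in query.lower().split():
        term_number = dictionary.get(term)
        if term_number is None:
            return []  # an unknown term means no document can match
        lists.append(posting_lists[term_number])
    if not lists:
        return []
    lists.sort(key=len)  # start with the shortest list to keep work small
    result = lists[0]
    for other in lists[1:]:
        result = intersect(result, other)
    return result
```

For example, `process_query("web search engines")` only has to walk three short lists instead of scanning every document.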
Query processors speed up searches by:
- skipping
- early termination
- clever assignment of document numbers
- caching
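Two of these are easy to try out in a few lines. The sketch below is a simplification under my own assumptions: "early termination" here just means stopping once `k` matches are found, and caching uses Python's standard `functools.lru_cache`. The `POSTINGS` data and the `top_k` name are made up.

```python
from functools import lru_cache

# Hypothetical toy index: term -> sorted document numbers.
POSTINGS = {
    "web": (0, 1, 3, 4, 7),
    "search": (0, 1, 2, 4, 7, 9),
}


@lru_cache(maxsize=1024)  # caching: a repeated query skips the work below
def top_k(query, k=3):
    """Return up to k documents that contain every word of the query."""
    terms = query.lower().split()
    if not terms or any(t not in POSTINGS for t in terms):
        return ()
    others = [set(POSTINGS[t]) for t in terms[1:]]
    hits = []
    for doc in POSTINGS[terms[0]]:
        if all(doc in s for s in others):
            hits.append(doc)
            if len(hits) == k:  # early termination: stop once k are found
                break
    return tuple(hits)
```

Calling `top_k("web search")` twice does the scan once; `top_k.cache_info()` shows the second call as a cache hit.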
1 comment:
Hi Lori,
I also didn't get the OAI information that we read. It was a lot of stuff that I had no idea existed.
The search engine parts were kind of interesting to read about. How they acquired their information, and how they are not supposed to access some websites if that is posted somewhere on the site.
It was good meeting you last weekend. Can't believe we just got home last Sunday. Seems like it was a long time ago.
See you later.