You can search for any word or phrase on a Web site by typing
the word or phrase into a query form and clicking the button to execute
the query (for example, the Execute Query button on the sample query form).
Searches produce a list of files that contain the word or
phrase, no matter where it appears in the text. The following rules apply
when formulating queries:
- Consecutive words are treated as a phrase; they must
appear in the same order within a matching document.
- Queries are case-insensitive, so you can type your
query in uppercase or lowercase.
- You can search for any word except for those in the
exception list (for English, this includes a, an,
and, as, and other common words), which are ignored
during a search.
- Words in the exception list are treated as placeholders
in phrase and proximity queries. For example, if you searched for "Word
for Windows", the results could include "Word for Windows"
and "Word and Windows", because "for" is a noise word
and appears in the exception list.
- Punctuation marks such as the period (.), colon (:),
semicolon (;), and comma (,) are ignored during a search.
- To use specially treated characters such as &,
|, ^, #, @, $, (, and ) in a query, enclose your query in quotation marks
(").
- To search for a word or phrase containing quotation
marks, enclose the entire phrase in quotation marks and then double
the quotation marks around the word or words you want to surround with
quotes. For example, "World-Wide Web or ""Web""" searches
for World-Wide Web or "Web".
- You can insert Boolean operators
(AND, OR, and NOT)
and the proximity operator (NEAR)
to specify additional search information.
- The wildcard character (*)
can match words with a given prefix. The query esc* matches the terms
ESC, escape, and so on.
- Free-text queries can
be specified without regard to query syntax.
- Vector space queries can
be specified.
- ActiveX (OLE) and file attribute property
value queries can be issued.
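The noise-word rule in the list above can be illustrated with a minimal Python sketch. It is not how Index Server itself is implemented: the `NOISE_WORDS` set is a tiny hypothetical sample of the English exception list, and `tokenize` and `phrase_matches` are names invented for this illustration.

```python
import re

# Hypothetical sample only; the real exception list is longer and language-specific.
NOISE_WORDS = {"a", "an", "and", "as", "for"}


def tokenize(text):
    """Lowercase the text and split it into words, dropping punctuation."""
    return re.findall(r"[a-z0-9]+", text.lower())


def phrase_matches(query, document):
    """Return True if the query phrase occurs in the document.

    Noise words in the query act as placeholders that match any single word.
    """
    query_words = tokenize(query)
    doc_words = tokenize(document)
    for start in range(len(doc_words) - len(query_words) + 1):
        window = doc_words[start:start + len(query_words)]
        if all(q in NOISE_WORDS or q == d for q, d in zip(query_words, window)):
            return True
    return False
```

With this sketch, the query Word for Windows matches a document containing Word and Windows, because the middle word is only a placeholder.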
Boolean and
proximity operators can create a more precise query.
| To Search
For |
Example |
Results |
| Both terms in the same page |
access and basic
Or
access & basic |
Pages with both the words access and basic
|
| Either term in a page |
cgi or isapi
Or
cgi | isapi |
Pages with the words cgi or isapi
|
| The first term without the second term |
access and not
basic
Or
access & ! basic |
Pages with the word access but not basic
|
| Pages not matching a property value |
not @size = 100
Or
! @size = 100 |
Pages that are not 100 bytes |
| Both terms in the same page, close together |
excel near project
Or
excel ~ project |
Pages with the word excel near the word
project |
Hints:
- You can
add parentheses to nest expressions within a query. The expressions
in parentheses are evaluated before the rest of the query.
- Use double
quotes (") to indicate that a Boolean or NEAR
operator keyword should be ignored in your query. For example, "Abbott
and Costello" will match pages with the phrase, not pages that
match the Boolean expression. In addition to being an operator, the
word and is a noise word in English.
- The
NEAR operator is similar to the AND operator
in that NEAR returns a match if both words being searched
for are in the same page. However, the NEAR operator
differs from AND because the rank assigned by NEAR
depends on the proximity of words. That is, the rank of a page with
the searched-for words closer together is greater than or equal to the
rank of a page where the words are farther apart. If the searched-for
words are more than 50 words apart, they are not considered near enough,
and the page is assigned a rank of zero.
- The NOT
operator can be used only after an AND operator in
content queries; it can be used only to exclude pages that match a previous
content restriction. For property value queries, the NOT
operator can be used apart from the AND operator.
- The AND
operator has a higher precedence than OR. For example,
the first three queries are equal, but the fourth is not:
a AND b OR c
c OR a AND b
c OR (a AND b)
(c OR a) AND b
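The precedence rule above can be checked with a short Python sketch. It relies on the fact that Python's own `and` binds more tightly than `or`, which mirrors the rule described here; the names `QUERIES`, `truth_table`, and `TABLES` are invented for this illustration and say nothing about how the query engine is built.

```python
from itertools import product

# Each query is written with Python's and/or, which follow the same precedence.
QUERIES = {
    "a AND b OR c": lambda a, b, c: a and b or c,
    "c OR a AND b": lambda a, b, c: c or a and b,
    "c OR (a AND b)": lambda a, b, c: c or (a and b),
    "(c OR a) AND b": lambda a, b, c: (c or a) and b,
}


def truth_table(query):
    """Evaluate a query for all eight combinations of a, b, and c."""
    return tuple(bool(query(a, b, c)) for a, b, c in product([False, True], repeat=3))


TABLES = {text: truth_table(fn) for text, fn in QUERIES.items()}

for text, table in TABLES.items():
    print(f"{text:16} {table}")
```

The first three truth tables are identical. The fourth differs, for example when only c is true.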
Note The
symbols (&, |, !, ~) and the English keywords AND,
OR, NOT, and NEAR work
the same way in all languages supported by Index Server. Localized keywords
are also available when the browser locale is set to one of the following
six languages:
| Language | Keywords |
| German | UND, ODER, NICHT, NAH |
| French | ET, OU, SANS, PRES |
| Spanish | Y, O, NO, CERCA |
| Dutch | EN, OF, NIET, NABIJ |
| Swedish | OCH, ELLER, INTE, NÄRA |
| Italian | E, O, NO, VICINO |
Note The
NEAR operator can be applied only to words or phrases.
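The 50-word limit for the NEAR operator can be sketched in Python as follows. This is an illustration only: the text does not say exactly how distance is counted or how rank is computed, so the sketch assumes distance is the difference between word positions and it reports only whether two words are near, not a rank. The names `MAX_NEAR_DISTANCE`, `min_word_distance`, and `is_near` are hypothetical.

```python
import re

# From the text: words more than 50 words apart are not considered near.
MAX_NEAR_DISTANCE = 50


def min_word_distance(text, first, second):
    """Return the smallest gap, in word positions, between the two terms.

    Returns None if either term is missing from the text.
    """
    words = re.findall(r"\w+", text.lower())
    first_positions = [i for i, w in enumerate(words) if w == first.lower()]
    second_positions = [i for i, w in enumerate(words) if w == second.lower()]
    if not first_positions or not second_positions:
        return None
    return min(abs(i - j) for i in first_positions for j in second_positions)


def is_near(text, first, second):
    """Return True if both terms occur within MAX_NEAR_DISTANCE words of each other."""
    distance = min_word_distance(text, first, second)
    return distance is not None and distance <= MAX_NEAR_DISTANCE
```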
Wildcard operators help you find pages
containing words similar to a given word.
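As a rough illustration of prefix matching with the wildcard character, the following Python sketch treats a trailing asterisk as "any word starting with this prefix" and compares without regard to case. The function name `wildcard_matches` is invented here, and the sketch covers only the trailing-asterisk form described earlier on this page.

```python
def wildcard_matches(pattern, word):
    """Return True if the word matches the pattern, ignoring case.

    A trailing * matches any word that begins with the preceding prefix.
    """
    pattern = pattern.lower()
    word = word.lower()
    if pattern.endswith("*"):
        return word.startswith(pattern[:-1])
    return word == pattern
```

For example, the pattern esc* matches ESC and escape, but not rescue.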
The query engine finds pages
that best match the words and phrases in a free-text query. This is done
by automatically finding pages that match the meaning, not the exact wording,
of the query. Boolean, proximity, and wildcard operators are ignored within
a free-text query. Free-text queries are prefixed with $contents.
The query engine supports vector space queries. Vector
queries return pages that match a list of words and phrases. The rank
of each page indicates how well the page matched the query.
| To Search For | Example | Results |
| Pages that contain specific words | light, bulb | Files with words that best match the words being searched for |
| Pages that contain weighted prefixes, words, and phrases | invent*, light[50], bulb[10], "light bulb"[400] | Files that contain words prefixed by invent, the words light and bulb, and the phrase light bulb (the terms are weighted) |
- Components in vector queries are separated by commas.
- Components in vector queries can be weighted by using
the [weight] syntax.
- Pages returned by vector queries do not necessarily
match every term in the query.
- Vector queries work best when the results are sorted
by rank.
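The comma and [weight] syntax above can be taken apart with a small Python sketch. It is an illustration, not the engine's parser: `parse_vector_query` is a hypothetical name, a component without a weight is reported as None because the text does not state a default, and phrases that contain commas are not handled.

```python
import re

# One component: a quoted phrase or a bare term, optionally followed by [weight].
COMPONENT = re.compile(
    r'^\s*(?P<term>"[^"]*"|[^\[\],]+?)\s*(?:\[(?P<weight>\d+)\])?\s*$'
)


def parse_vector_query(query):
    """Split a vector query into (term, weight) pairs."""
    components = []
    for part in query.split(","):
        match = COMPONENT.match(part)
        if match is None:
            raise ValueError(f"cannot parse component: {part!r}")
        term = match.group("term").strip('"')
        weight = int(match.group("weight")) if match.group("weight") else None
        components.append((term, weight))
    return components


print(parse_vector_query('invent*, light[50], bulb[10], "light bulb"[400]'))
```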
With property value queries, you can find files that have
property values that match given criteria. The properties over which
you can query include basic file information like file name and file size,
and ActiveX properties including the document summary (information) that
is stored in files created by ActiveX-aware applications.
There are two types of property queries:
- Relational property
queries consist of an at character
(@), a property name,
a relational operator, and a property
value. For example, to find all of the files larger than one million
bytes, issue the query @size > 1000000.
- Regular expression property queries consist
of a number sign (#), a property name, and a regular
expression for the property value. For example, to find
all of the video (.avi) files, issue the query #filename *.avi. Regular
expressions will never match the special properties contents (#contents)
and all (#all). Properties that are not retrievable at query time cannot
be used in # queries. These include HTML META properties not stored
in the property cache.
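The two forms of property query described above can be told apart mechanically. The following Python sketch is a simplified illustration: `parse_property_query` is a hypothetical name, only the six comparison operators are recognized (the bitwise and vector operators are left out), and the equal sign reported for regular expression queries reflects the rule that = is assumed for them.

```python
import re

# @name, a comparison operator, then the value.
RELATIONAL = re.compile(r"^@(?P<name>\w+)\s*(?P<op><=|>=|!=|<|>|=)\s*(?P<value>.+)$")
# #name, whitespace, then the regular expression.
REGEX = re.compile(r"^#(?P<name>\w+)\s+(?P<value>.+)$")


def parse_property_query(query):
    """Return (kind, property name, operator, value) for a property query."""
    query = query.strip()
    match = RELATIONAL.match(query)
    if match:
        return ("relational", match.group("name"), match.group("op"),
                match.group("value").strip())
    match = REGEX.match(query)
    if match:
        return ("regex", match.group("name"), "=", match.group("value").strip())
    raise ValueError(f"not a property query handled by this sketch: {query!r}")


print(parse_property_query("@size > 1000000"))
print(parse_property_query("#filename *.avi"))
```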
Property names are preceded by either the at
(@) or number sign (#) character. Use @ for relational queries, and #
for regular expression queries.
If no property name is specified, @contents is
assumed.
Properties available for all files include:
| Property Name | Description |
| All | Matches words, phrases, and any property |
| Contents | Words and phrases in the file |
| Filename | Name of the file |
| Size | File size |
| Write | Last time the file was modified |
ActiveX property values can also be used in queries. Web
sites with files created by most ActiveX-aware applications can be queried
for these properties:
| Property Name | Description |
| DocTitle | Title of the document |
| DocSubject | Subject of the document |
| DocAuthor | The document's author |
| DocKeywords | Keywords for the document |
| DocComments | Comments about the document |
For a complete list of property names, see the List
of Property Names later on this page.
Relational operators are used in relational property queries.
| To
Search For |
Example |
Results |
| Property
values in relation to a fixed value |
@size
< 100
@size <= 100
@size = 100
@size != 100
@size >= 100
@size > 100 |
Files
whose size matches the query |
| Property
values with all of a set of bits on |
@attrib
^a 0x820 |
Compressed
files with the archive bit on |
| Property
values with some of a set of bits on |
@attrib
^s 0x20 |
Files
with the archive bit on |
| To
Search For |
Example |
Results |
| A
specific value |
@DocAuthor
= Bill Barnes |
Files
authored by Bill Barnes |
| Values
beginning with a prefix |
#DocAuthor
George* |
Files
whose author property begins with George |
| Files
with any of a set of extensions |
#filename
*.|(exe|,dll|,sys|) |
Files
with .exe, .dll, or .sys extensions |
| Files
modified after a certain date |
@write
> 96/2/14 10:00:00 |
Files
modified after February 14, 1996 at 10:00 GMT |
| Files
modified after a relative date |
@write
> -1d2h |
Files
modified in the last 26 hours |
| Vectors
matching a vector |
@vectorprop
= { 10, 15, 20 } |
ActiveX
documents with a vectorprop value of { 10, 15, 20 } |
| Vectors
where each value matches a criterion |
@vectorprop
>^a 15 |
ActiveX
documents with a vectorprop value in which all values in the vector
are greater than 15 |
| Vectors
where at least one value matches a criterion |
@vectorprop
=^s 15 |
ActiveX
documents with a vectorprop value in which at least one value is 15 |
- Be sure
to use the pound (#) character before the property name when using a
regular expression in a property value, and an at (@) character
otherwise. The equal (=) relational operator is assumed for regular-expression
queries.
- File name
(#filename) is the only property that efficiently supports regular expressions
with wildcards to the left of text.
- Date and
time values are of the form yyyy/mm/dd hh:mm:ss or yyyy-mm-dd
hh:mm:ss. The first two characters of the year and the entire time
can be omitted. If you omit the first two characters of the year, then
a value of 29 or less is interpreted as a year in the 2000s (for example, 05 is 2005),
and a value of 30 or greater is interpreted as a year in the 1900s (for example,
96 is 1996). All dates and times are in Greenwich Mean Time (GMT).
- Dates
and times relative to the current time can be expressed with a minus
(-) character followed by one or more integer and time unit
pairs. Time units are expressed as: (y) for years, (m) for months, (w)
for weeks, (d) for days, (h) for hours, (n) for minutes, and (s) for
seconds. For example, -1d2h is 26 hours before the current time.
- A three-digit millisecond value can be optionally specified
after the seconds value in date expressions. For example, 1997/12/8
10:10:03:452.
- Currency
values are of the form x.y, where x is the whole value
amount and y is the fractional amount. There is no assumption
about units.
- Boolean
values are (t) or (true) for TRUE and (f) or (false)
for FALSE.
- Vectors
(VT_VECTOR) are expressed as an opening brace ({), followed by a comma-separated
list of values, then a closing brace (}).
- Single-value
expressions that are compared against vectors are expressed as a relational
operator, then a (^a) for all of or a (^s) for some
of.
- Numeric
values can be in decimal or hexadecimal (preceded by 0x).
- The contents
property does not support relational operators. If a relational operator
is specified, no results will be found. For example, @contents Microsoft
will find documents containing Microsoft, but @contents=Microsoft
will find none.
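The date rules in the list above lend themselves to a short Python sketch. It is an illustration under stated assumptions: `expand_year` and `parse_relative` are hypothetical names, and years (y) and months (m) are rejected in relative expressions because their length in days is not fixed and the text does not say how they are counted.

```python
import re
from datetime import timedelta

# Fixed-length units only; y and m are deliberately left out of this sketch.
UNIT_SECONDS = {"w": 604800, "d": 86400, "h": 3600, "n": 60, "s": 1}


def expand_year(two_digit_year):
    """Apply the rule above: 29 or less is 20xx, 30 or greater is 19xx."""
    if not 0 <= two_digit_year <= 99:
        raise ValueError("expected a two-digit year")
    century = 2000 if two_digit_year <= 29 else 1900
    return century + two_digit_year


def parse_relative(expression):
    """Convert a relative time such as -1d2h into a timedelta before now."""
    if not expression.startswith("-"):
        raise ValueError("relative times start with a minus sign")
    body = expression[1:]
    pairs = re.findall(r"(\d+)([ymwdhns])", body)
    if "".join(count + unit for count, unit in pairs) != body:
        raise ValueError(f"malformed relative time: {expression!r}")
    seconds = 0
    for count, unit in pairs:
        if unit not in UNIT_SECONDS:
            raise ValueError("years and months are not handled in this sketch")
        seconds += int(count) * UNIT_SECONDS[unit]
    return timedelta(seconds=seconds)
```

Subtracting the returned timedelta from the current GMT time gives the cutoff that a query such as @write > -1d2h compares against.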
Regular expressions in property queries are defined as
follows:
- Any character except asterisk (*), period (.), question
mark (?), and vertical bar (|) defaults to matching just itself.
- Regular expressions can be enclosed in matching quotes
("), and must be enclosed in quotes if they contain a space ( )
or closing parenthesis ()).
- The characters *, ., and ? behave as they behave in
Windows; they match any number of characters, match (.) or end of string,
and match any one character, respectively.
- The character | is an escape character. After |, the
following characters have special meaning:
( opens a group. Must be followed by a matching ).
) closes a group. Must be preceded by a matching (.
[ opens a character class. Must be followed by a matching
(un-escaped) ].
{ opens a counted match. Must be followed by a matching
}.
} closes a counted match. Must be preceded by a matching
{.
, separates OR clauses.
* matches zero or more occurrences of the preceding
expression.
? matches zero or one occurrences of the preceding expression.
+ matches one or more occurrences of the preceding expression.
Anything else, including |, matches itself.
- Between square brackets ([]) the following characters
have special meaning:
^ matches everything but the following classes. Must be
the first character.
] matches ]. May only be preceded by ^; otherwise it
closes the class.
The hyphen (-) is the range operator. It must be preceded and followed by normal characters.
Anything else matches itself (or begins or ends a range
at itself).
- Between curly braces ({}) the following syntax applies:
|{m|} matches exactly m occurrences of the
preceding expression. (0 < m < 256).
|{m,|} matches at least m occurrences of the
preceding expression. (1 < m < 256).
|{m,n|} matches between m and n occurrences
of the preceding expression, inclusive. (0 < m < 256, 0 < n
< 256).
- To match *, ., and ?, enclose them in brackets (for
example, |[*]sample will match *sample).
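To make the escape rules above more concrete, the following Python sketch translates a subset of this syntax into the syntax of Python's `re` module. It is a partial illustration, not a full implementation: `translate` and `matches` are hypothetical names, character classes and counted matches raise an error rather than being translated, the "end of string" meaning of the period is ignored, and case-insensitive matching is an assumption.

```python
import re


def translate(pattern):
    """Translate a subset of the regular expression syntax above to Python re syntax."""
    out = []
    i = 0
    while i < len(pattern):
        ch = pattern[i]
        if ch == "|" and i + 1 < len(pattern):
            escaped = pattern[i + 1]
            i += 2
            if escaped in "()*?+":
                out.append(escaped)             # grouping and repetition
            elif escaped == ",":
                out.append("|")                 # separates OR clauses
            elif escaped in "[{}":
                raise NotImplementedError("classes and counted matches are not covered")
            else:
                out.append(re.escape(escaped))  # anything else matches itself
        else:
            i += 1
            if ch == "*":
                out.append(".*")                # any number of characters
            elif ch == "?":
                out.append(".")                 # any one character
            else:
                out.append(re.escape(ch))       # literal, including the period
    return "".join(out)


def matches(pattern, value):
    """Return True if the whole value matches the translated pattern."""
    return re.fullmatch(translate(pattern), value, re.IGNORECASE) is not None
```

For example, the pattern *.|(exe|,dll|,sys|) translates to a Python pattern that accepts any name ending in .exe, .dll, or .sys.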
| Example | Results |
| @size > 1000000 | Pages larger than one million bytes |
| @write > 95/12/23 | Pages modified after the date |
| Apple tree | Pages with the phrase apple tree |
| "apple tree" | Same as above |
| @contents apple tree | Same as above |
| Microsoft and @size > 1000000 | Pages with the word Microsoft that are larger than one million bytes |
| "microsoft and @size > 1000000" | Pages with the phrase specified (not the same as above) |
| #filename *.avi | Video files (the # prefix is used because the query contains a regular expression) |
| @attrib ^s 32 | Pages with the archive attribute bit on |
| @docauthor = John Smith | Pages with the given author |
| $contents why is the sky blue? | Pages that match the query |
| @size < 100 & #filename *.gif | Graphics Interchange Format (GIF) files less than 100 bytes in size |
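The @attrib example above depends on bit tests. A minimal Python sketch of the "all of" (^a) and "some of" (^s) comparisons follows; the function names are invented for this illustration, and the two constants are the Win32 file attribute values for the archive and compressed bits.

```python
# Win32 file attribute bits used in the examples on this page.
FILE_ATTRIBUTE_ARCHIVE = 0x20      # decimal 32
FILE_ATTRIBUTE_COMPRESSED = 0x800


def all_bits_on(value, mask):
    """Mirror ^a: every bit in the mask must be set in the value."""
    return (value & mask) == mask


def some_bits_on(value, mask):
    """Mirror ^s: at least one bit in the mask must be set in the value."""
    return (value & mask) != 0


compressed_archive = FILE_ATTRIBUTE_COMPRESSED | FILE_ATTRIBUTE_ARCHIVE  # 0x820
print(all_bits_on(compressed_archive, 0x820))       # both bits are set
print(some_bits_on(FILE_ATTRIBUTE_ARCHIVE, 0x820))  # only the archive bit is set
```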
These properties are always available for queries. Additional
properties may also be available depending on the configuration of the
Web server.
| Friendly
Name |
Datatype |
Property |
| A_HRef |
DBTYPE_WSTR
| DBTYPE_BYREF |
Text
of HTML HREF. This property name was created for Microsoft® Site
Server and corresponds with the Index Server property name HtmlHRef.
Can be queried but not retrieved. |
| Access |
VT_FILETIME |
Last
time file was accessed. |
| All |
(not
applicable) |
Searches
every property for a string. Can be queried but not retrieved. |
| AllocSize |
DBTYPE_I8 |
Size
of disk allocation for file. |
| Attrib |
DBTYPE_UI4 |
File
attributes. Documented in Win32 SDK. |
| ClassId |
DBTYPE_GUID |
Class
ID of object, for example, WordPerfect, Word, and so on. |
| Characterization |
DBTYPE_WSTR
| DBTYPE_BYREF |
Characterization,
or abstract, of document. Computed by Index Server. |
| Contents |
(not
applicable) |
Main
contents of file. Can be queried but not retrieved. |
| Create |
VT_FILETIME |
Time
file was created. |
| Directory |
DBTYPE_WSTR
| DBTYPE_BYREF |
Physical
path to the file, not including the file name. |
| DocAppName |
DBTYPE_WSTR
| DBTYPE_BYREF |
Name
of application that created the file. |
| DocAuthor |
DBTYPE_WSTR
| DBTYPE_BYREF |
Author
of document. |
| DocByteCount |
DBTYPE_I4 |
Number
of bytes in a document. |
| DocCategory |
DBTYPE_STR
| DBTYPE_BYREF |
Type
of document such as a memo, schedule, or whitepaper. |
| DocCharCount |
DBTYPE_I4 |
Number
of characters in document. |
| DocComments |
DBTYPE_WSTR
| DBTYPE_BYREF |
Comments
about document. |
| DocCompany |
DBTYPE_STR
| DBTYPE_BYREF |
Name
of the company for which the document was written. |
| DocCreatedTm |
VT_FILETIME |
Time
document was created. |
| DocEditTime |
VT_FILETIME |
Total
time spent editing document. |
| DocHiddenCount |
DBTYPE_I4 |
Number
of hidden slides in a Microsoft® PowerPoint document. |
| DocKeywords |
DBTYPE_WSTR
| DBTYPE_BYREF |
Document
keywords. |
| DocLastAuthor |
DBTYPE_WSTR
| DBTYPE_BYREF |
Most
recent user who edited document. |
| DocLastPrinted |
VT_FILETIME |
Time
document was last printed. |
| DocLastSavedTm |
VT_FILETIME |
Time
document was last saved. |
| DocLineCount |
DBTYPE_I4 |
Number
of lines contained in a document. |
| DocManager |
DBTYPE_STR
| DBTYPE_BYREF |
Name
of the manager of the document's author. |
| DocNoteCount |
DBTYPE_I4 |
Number
of pages with notes in a PowerPoint document. |
| DocPageCount |
DBTYPE_I4 |
Number
of pages in document. |
| DocParaCount |
DBTYPE_I4 |
Number
of paragraphs in a document. |
| DocPartTitles |
DBTYPE_STR
| DBTYPE_VECTOR |
Names
of document parts. For example, in Excel part titles are the names
of spreadsheets, in PowerPoint they are slide titles, and in Word for Windows
they are the names of the documents in the master document. |
| DocPresentationTarget |
DBTYPE_STR|DBTYPE_BYREF |
Target
format (35mm, printer, video, and so on) for a presentation in PowerPoint. |
| DocRevNumber |
DBTYPE_WSTR
| DBTYPE_BYREF |
Current
version number of document. |
| DocSlideCount |
DBTYPE_I4 |
Number
of slides in a PowerPoint document. |
| DocSubject |
DBTYPE_WSTR
| DBTYPE_BYREF |
Subject
of document. |
| DocTemplate |
DBTYPE_WSTR
| DBTYPE_BYREF |
Name
of template for document. |
| DocTitle |
DBTYPE_WSTR
| DBTYPE_BYREF |
Title
of document. |
| DocWordCount |
DBTYPE_I4 |
Number
of words in document. |
| FileIndex |
DBTYPE_I8 |
Unique
ID of file. |
| FileName |
DBTYPE_WSTR
| DBTYPE_BYREF |
Name
of file. |
| HitCount |
DBTYPE_I4 |
Number
of hits (words matching query) in file. |
| HtmlHRef |
DBTYPE_WSTR
| DBTYPE_BYREF |
Text
of HTML HREF. Can be queried but not retrieved. |
| HtmlHeading1 |
DBTYPE_WSTR
| DBTYPE_BYREF |
Text
of HTML document in style H1. Can be queried but not retrieved. |
| HtmlHeading2 |
DBTYPE_WSTR
| DBTYPE_BYREF |
Text
of HTML document in style H2. Can be queried but not retrieved. |
| HtmlHeading3 |
DBTYPE_WSTR
| DBTYPE_BYREF |
Text
of HTML document in style H3. Can be queried but not retrieved. |
| HtmlHeading4 |
DBTYPE_WSTR
| DBTYPE_BYREF |
Text
of HTML document in style H4. Can be queried but not retrieved. |
| HtmlHeading5 |
DBTYPE_WSTR
| DBTYPE_BYREF |
Text
of HTML document in style H5. Can be queried but not retrieved. |
| HtmlHeading6 |
DBTYPE_WSTR
| DBTYPE_BYREF |
Text
of HTML document in style H6. Can be queried but not retrieved. |
| Img_Alt |
DBTYPE_WSTR
| DBTYPE_BYREF |
Alternate
text for <IMG> tags. Can be queried but not retrieved. |
| Path |
DBTYPE_WSTR
| DBTYPE_BYREF |
Full
physical path to file, including file name. |
| Rank |
DBTYPE_I4 |
Rank
of row. Ranges from 0 to 1000. Larger numbers indicate better matches. |
| RankVector |
DBTYPE_I4
| DBTYPE_VECTOR |
Ranks
of individual components of a vector query. |
| ShortFileName |
DBTYPE_WSTR
| DBTYPE_BYREF |
Short
(8.3) file name. |
| Size |
DBTYPE_I8 |
Size
of file, in bytes. |
| USN |
DBTYPE_I8 |
Update
Sequence Number. NTFS drives only. |
| VPath |
DBTYPE_WSTR
| DBTYPE_BYREF |
Full
virtual path to file, including file name. If more than one possible
path, then the best match for the specific query is chosen. |
| WorkId |
DBTYPE_I4 |
Internal
ID for file. Used within Index Server. |
| Write |
VT_FILETIME |
Last
time file was written. |
To define
properties that are not in the previous list, you must list them in a
[Names] section in the .idq file. To use these properties in a restriction,
sort specification, or as a retrieved column, you must define them in
the .idq file, using the following format:
[Names]
#Properties that are not in the standard list
Propertyname ( Datatype ) = GUID ["Name" | propid]
In the syntax,
"Name" is the property name ("Sales" in the
following example), and propid is the property ID in hexadecimal.
Note that you need to surround the friendly name with quotation marks,
but the property ID does not take quotation marks.
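The [Names] line format described above can be checked with a small Python sketch. It is only an illustration of the syntax as stated here: `parse_names_line` is a hypothetical helper, the sample line reuses the MetaDescription definition from this page, and the second line in the usage note (with a numeric property ID) is invented to show the other form.

```python
import re
import uuid

NAMES_LINE = re.compile(
    r'^\s*(?P<name>\w+)\s*\(\s*(?P<datatype>\w+)\s*\)\s*=\s*'
    r'(?P<guid>[0-9a-fA-F-]{36})\s+'
    r'(?:"(?P<prop_name>[^"]*)"|(?P<propid>(?:0x)?[0-9a-fA-F]+))\s*$'
)


def parse_names_line(line):
    """Split one [Names] entry into its parts."""
    match = NAMES_LINE.match(line)
    if match is None:
        raise ValueError(f"not a valid [Names] entry: {line!r}")
    propid = match.group("propid")
    return {
        "friendly_name": match.group("name"),
        "datatype": match.group("datatype"),
        "guid": str(uuid.UUID(match.group("guid"))),  # also validates the GUID
        "property_name": match.group("prop_name"),    # quoted form, else None
        "property_id": int(propid, 16) if propid else None,  # hexadecimal form
    }


entry = parse_names_line(
    'MetaDescription(DBTYPE_WSTR) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 "Sales"'
)
print(entry)
```

A line that ends in an unquoted value, such as a hypothetical MyProp(DBTYPE_I4) entry ending in 0x5, is read as a property ID instead of a name.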
For example,
suppose you want to define an HTML meta tag as a property name that somebody
can search for. The property you want to define is Sales.
To define
the Sales property
- In the
.idq file, under the [Names] section, add the following line.
MetaDescription(DBTYPE_WSTR) = d1b5d3f0-c0b3-11cf-9a92-00a0c908dbf1 "Sales"
The GUID
number comes from the MetaTagClsid parameter in the
registry, at the following location:
HKEY_LOCAL_MACHINE\SYSTEM\CurrentControlSet\Control\HtmlFilter\MetaTagClsid
- Then,
in the HTML files where you want the tag to appear, define the meta
description.
For example,
say you want to search for all files that give sales projections for
the future:
In File1.htm:
<META NAME="Sales" CONTENT="Projections for 1998">
In File2.htm:
<META NAME="Sales" CONTENT="Projections for 1999">
In File3.htm:
<META NAME="Sales" CONTENT="Sales in 1997">
Note Be
sure to add your META NAME tags between the <head> and </head>
HTML tags at the beginning of the file.
You can now
search for all files that show sales projections. Send the following query:
@metadescription projections
This query
returns all the files with the word projections in the CONTENT
field of the meta tag. In this example, File1.htm and File2.htm are returned.
But suppose
you want to search for sales by year, for example a list of sales in 1997.
Send the following query:
@metadescription 1997
File3.htm
is returned.
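The outcome of the two queries above can be reproduced offline with a short Python sketch. It does not use Index Server at all: it simply reads the META tags with Python's standard `html.parser` module and looks for a word in the CONTENT value. The names `MetaCollector`, `files_matching`, and `FILES` are invented for this illustration.

```python
import re
from html.parser import HTMLParser


class MetaCollector(HTMLParser):
    """Collect the CONTENT values of META tags with a given NAME."""

    def __init__(self, meta_name):
        super().__init__()
        self.meta_name = meta_name.lower()
        self.contents = []

    def handle_starttag(self, tag, attrs):
        # HTMLParser lowercases tag and attribute names before calling this.
        if tag != "meta":
            return
        attributes = dict(attrs)
        if (attributes.get("name") or "").lower() == self.meta_name:
            self.contents.append(attributes.get("content") or "")


def files_matching(files, meta_name, word):
    """Return the names of files whose META content contains the word."""
    matches = []
    for filename, html in files.items():
        collector = MetaCollector(meta_name)
        collector.feed(html)
        words = re.findall(r"\w+", " ".join(collector.contents).lower())
        if word.lower() in words:
            matches.append(filename)
    return sorted(matches)


FILES = {
    "File1.htm": '<head><META NAME="Sales" CONTENT="Projections for 1998"></head>',
    "File2.htm": '<head><META NAME="Sales" CONTENT="Projections for 1999"></head>',
    "File3.htm": '<head><META NAME="Sales" CONTENT="Sales in 1997"></head>',
}

print(files_matching(FILES, "Sales", "projections"))
print(files_matching(FILES, "Sales", "1997"))
```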
© 1997 by Microsoft Corporation. All rights reserved.