Search manual

The VOICE CLARIAH search manual gives an overview of the search functions supported by VOICE 3.0 Online BETA. The enhanced web tool developed for VOICE 3.0 Online BETA also displays the CQL of each search query below the search field. If you encounter any difficulties with a search query, please save its CQL and share it with the VOICE CLARIAH team via the VOICE CLARIAH survey.

1. Token search

2. POS and lemma search

3. Mark-up search – NEW!

4. Phrase search – partly NEW!

5. Expert search – NEW!

6. Examples of combined searches – NEW!

Useful links

1. Token search

SEARCH

EXPLANATION

Example Search

Finds


General remark

Tokens are to be searched for with lower case characters (e.g. i speak french). (Capital letters indicate a POS search, see below.)

1.1. Simple token search

token

Search for a particular token

 

Contracted forms (e.g. wanna, gonna, don’t, it’s) need to be searched for with a space inserted before the contracted part.

manage

manage

wan na

wanna

do nt

dont

1.2. Token search with wildcards

General remark

Note to users of previous versions of VOICE Online: the syntax of wildcard search has changed in VOICE 3.0. As a general rule, you now need to insert a full stop before any wildcard to obtain the search results you are familiar with from previous versions of VOICE Online, e.g. .*, .+ or .? (see examples in this section).

 

token.* (no space)

Token plus zero or more characters

 

 

.*ment

 

department

environment

segment

manage.*

manage

MANAGED

manager

management

token.? (no space)

Token plus zero or one character

behave.?

behave

behaves

token.+ (no space)

Token plus one or more characters

house.+

                 

houses

household
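The three wildcards have the same shape as regular-expression quantifiers. The sketch below is only an illustration in Python, not the VOICE search engine: it assumes that a pattern has to match a whole token, and the helper name `matching_tokens` and the word list are made up for the example.

```python
import re

def matching_tokens(pattern, tokens):
    """Return the tokens that the pattern matches in full (hypothetical helper)."""
    compiled = re.compile(pattern)
    return [token for token in tokens if compiled.fullmatch(token)]

tokens = ["behave", "behaves", "behaviour", "house", "houses", "household", "segment"]

print(matching_tokens(r"behave.?", tokens))  # token plus zero or one character
print(matching_tokens(r"house.+", tokens))   # token plus one or more characters
print(matching_tokens(r".*ment", tokens))    # any token ending in -ment
```

Under these assumptions, house.+ does not return house itself, because .+ requires at least one further character.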

 

2. POS and lemma search


General remark

POS tags are searched for with capital letters, e.g. VVP. (Lower case characters are used for token searches, see above).

Also note that all tokens in VOICE are tagged with a POS tag for morphological form and, in parentheses, a POS tag for syntactic function. The two are often, though not always, identical. If a tag is entered without further specification, both positions are searched. Alternatively, the form position or the function position can be searched separately (see below). Note, however, that the POS view in the web interface currently shows only one tag if the form and function tags are identical, e.g. you_PP rather than you_PP(PP), and shows both tags only if they differ, e.g. neuropa_PVC(NP).

2.1. Simple POS search

POS

(equivalent to pf:POS)

 

All tokens with a particular part-of-speech tag (POS) in form or function position

 

NB. General searches for a POS tag in the whole corpus are likely to yield many hits and might slow down the search engine.

JJ

professional_JJ(JJ)

p_non-formal_PVC(JJ)

full-time_JJ(RB)

present_JJ(JJ)/VV(VV)

 

p:POS

All tokens with a part-of-speech tag in form position

p:JJ

(adj. in form position)

professional_JJ(JJ)

full-time_JJ(RB)

f:POS

All tokens with a part-of-speech tag in function position

f:JJ

(adj. in function position)

professional_JJ(JJ)

p_non-formal_PVC(JJ)
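The difference between POS, p:POS and f:POS can be illustrated with a small Python sketch. It is not part of VOICE: it assumes annotations written as token_FORM(FUNCTION), as in the examples above, and the helper names `parse` and `hits` are hypothetical.

```python
import re

# One FORM(FUNCTION) pair; several pairs may be joined by "/".
TAG = re.compile(r"([A-Z]+)\(([A-Z]+)\)")

def parse(annotated):
    """Split 'token_FORM(FUNCTION)' into (token, form tags, function tags)."""
    token, _, tags = annotated.rpartition("_")
    pairs = TAG.findall(tags)
    return token, {form for form, _ in pairs}, {function for _, function in pairs}

def hits(query, annotated_tokens):
    """Emulate POS, p:POS and f:POS for a single tag (illustration only)."""
    result = []
    for item in annotated_tokens:
        _, forms, functions = parse(item)
        if query.startswith("p:"):
            found = query[2:] in forms          # form position only
        elif query.startswith("f:"):
            found = query[2:] in functions      # function position only
        else:
            found = query in forms or query in functions
        if found:
            result.append(item)
    return result

sample = ["professional_JJ(JJ)", "p_non-formal_PVC(JJ)", "full-time_JJ(RB)", "you_PP(PP)"]
print(hits("JJ", sample))
print(hits("p:JJ", sample))
print(hits("f:JJ", sample))
```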

2.2. POS search with wildcards

POS.*

 

POS tag with wildcard

 

NB. Using wildcards with POS tags is meaningful for POS categories which are sub-divided into finer categories, e.g. verbs, adjectives, nouns, adverbs. Users may also want to narrow down results by adding sub-specifications, e.g. go,V.* (see 5. Expert search).

V.*

Verb-tag with wildcard

(all verb forms, e.g. VV, VBP, …)

want_VVP(VVP)

to ask_VV(VV)

saw_VVD(VVD)

is_VBZ(VBZ)

J.*

Adjective-tag with wildcard

(all adjective forms, i.e. JJ, JJR, JJS)

big_JJ(JJ)

cheaper_JJR(JJR)

most_JJS(JJS)

N.*

Noun-tag with wildcard

(all noun forms, e.g. NN, NNS, …)

airport_NN(NN)

ideas_NNS(NNS)

london_NP(NP)

netherlands_NPS(NPS)

2.3. Lemma search

l:lemma

Finds all tokens of a particular lemma

l:walk

 

walk

walking

walked

 

3. Mark-up search – NEW!


General remarks

Mark-up can be searched via several means in VOICE 3.0 Online BETA.

Apart from POS tags for conversational features like pauses (PA) or breath (BR) that were available already in VOICE 2.0 POS Online (see short POS tag set), VOICE 3.0 Online BETA introduces new possibilities to search for pauses, laughter and mark-up between pointed brackets (e.g. speaking modes, overlaps, tags for non-English speech) in more sophisticated ways.
(For descriptions of the mark-up categories available in VOICE transcripts, see the VOICE Mark-up Conventions.)

3.1. Pauses

_0

_1

_2

Pauses of different lengths
(Numbers indicate length in seconds as transcribed. _0 indicates a short pause of up to a half second, see VOICE Mark-up conventions.)

 

NB. Be mindful that especially short pauses are rather frequent and are thus best searched in combination (e.g. with tokens or POS tags, see 4.4. and below).

 

NB. In order to find pauses irrespective of their length, we recommend you use the POS tag PA (see 3.7. Mark-up searches via POS tags and special queries, and the POS short tag set).

_0

_1

_2

 

 

 

 

 

(.)

(1)

(2)
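The correspondence between pause queries and the notation in the transcripts can be written down as a tiny Python function. This is a sketch based only on the three examples above; the function name `pause_in_transcript` is made up.

```python
def pause_in_transcript(query):
    """Map a pause query such as _0 or _2 to its transcript notation."""
    if not (query.startswith("_") and query[1:].isdigit()):
        raise ValueError(f"not a pause query: {query!r}")
    seconds = int(query[1:])
    # _0 is the short pause written (.); longer pauses show their length.
    return "(.)" if seconds == 0 else f"({seconds})"

for query in ("_0", "_1", "_2"):
    print(query, "finds", pause_in_transcript(query))
```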

 

 

 

3.2. Laughter

_@

_@@

 _@@@

Laughter. Each @ symbol stands for one syllable of laughter (e.g. ha ha = @@, see VOICE Mark-up Conventions).

_@@

@@

@@ @@

 

 

 

_@+

 

Laughter strings with at least one “@”, i.e. laughter strings with any number of syllables.

_@+

@

@@

@@@@

@@@@@@@

_@{2,4}

 

Laughter with a defined string length (see 5.3. Context search)

_@{2,4}

(sequences of minimum 2 and maximum 4 repetitions of the @-character)

@@

@@@

@@@@
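The laughter queries read like regular expressions over the @ character once the leading underscore is removed. The following Python sketch shows that reading; it assumes whole-string matching and is an illustration rather than the actual search engine (`laughter_hits` is a hypothetical helper).

```python
import re

def laughter_hits(query, laughs):
    """Match laughter strings against a query such as _@+ or _@{2,4}."""
    pattern = re.compile(query.lstrip("_"))  # drop the leading underscore
    return [item for item in laughs if pattern.fullmatch(item)]

laughs = ["@", "@@", "@@@", "@@@@", "@@@@@@@"]

print(laughter_hits("_@@", laughs))      # exactly two syllables
print(laughter_hits("_@+", laughs))      # one or more syllables
print(laughter_hits("_@{2,4}", laughs))  # two to four syllables
```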

3.3. Speaking modes

<@/>

<fast/>

<slow/>

<loud/>

<soft/>

Stretch of speech marked <xyz> token token </xyz>

 

NB. For the full list of speaking modes see the VOICE Mark-up Conventions.

 

<@/>

(speaking mode: laughingly)

<@> yeah yeah </@>

<soft/>

(speaking mode: soft)

<soft> okay </soft>

3.4. Non-English Speech

<L/>

 

All stretches of transcription in languages marked as non-English speech (L1, LN or LQ; see VOICE Mark-up Conventions).

<L/>

<LNger> diesen leberknoedel {this liver dumpling} </LNger>

<L1slo> xxxx </L1slo>

<LQslo> dobre? {good} </LQslo>

<L1/>

All stretches of speech in a speaker's first language (L1) other than English

<L1/>

<L1mlt> mara {woman} </L1mlt>

<L1rum> securitate </L1rum>

<LN/>

All stretches of speech in a language that is neither English nor the speaker's first language

<LN/>

<LNger> senf. {mustard} </LNger>

<LNita> toscana? {name of pizza} </LNita>

<LQ/>

All stretches of speech where it is not known whether the language is the speaker's first language or a foreign language

<LQ/>

<LQfre> melange {mixture} </LQfre>

<LQger> danke {thanks} </LQger>

<L translation="token"/>

 

Finds tokens in any translation tag (L1, LN or LQ)

<L translation="yes"/>

<L1scc> jeste {yes} </L1scc>

<LNita> s:i. {yes} </LNita>

<L1 translation="token"/>

<LN translation="token"/>

<LQ translation="token"/>

 

 

Finds tokens in translations within an L1, LN or LQ tag, respectively

<LN translation="yes"/>

<LNger> ja {yes} </LNger>

<LNita> s:i. {yes} </LNita>

<LNfre> oui {yes} </LNfre>

<L1language/>

<LNlanguage/>

<LQlanguage/>

Finds and highlights a stretch of a particular language tag

NB: Languages are abbreviated according to the ISO 639-2 codes.

<L1ger/>

<L1ger> nein danke {no thanks} </L1ger>

<LNita/>

<LNita> grazie {thanks} </LNita>

3.5. Overlaps

<ol/>

Overlaps

NB: This search is best narrowed down, e.g. by using within or containing (see example on the right and 5.4.1. Tokens within Mark-up)

<ol/>

<1> what is it </1>

<3> yeah </3>

<6> we have that </6>

 

 

okay within <ol/>

<2> okay </2>

<8> oh okay </8>

3.6. Additional mark-up searches

<ono/>

Onomatopoeia

<ono/>

<ono> wəʊəʊ: </ono>

<ono> brbrm </ono>

<clears throat/>

<whistles/>

Speaker noises

NB: For the full list of speaker noises see the VOICE Mark-up Conventions.

<clears throat/>

 

<clears throat>

3.7. Mark-up searches via POS tags and special queries

FW

f_.*

 

All foreign (i.e. non-English) tokens

f_.*

<LNbul>  rakia_FW(FW) {raki} </LNbul>

<L1ger> tschuldigung_FW(FW) {sorry} </L1ger>

<LNger> schottentor?_FW(FW) {place in vienna} </LNger>

PA

All pauses (POS tag PA)

PA

(.)

(1)

(4)

PVC

p_.*

 

All pronunciation variations and coinages

p_.*

<pvc> creativitly_PVC(NN)/PVC(RB) {creatively} </pvc>

<pvc> frauding_PVC(VVG) </pvc>

ONO

All onomatopoeia

ONO

<ono> bvuff_ONO(ONO) </ono>

<ono> lalala_ONO(ONO) </ono>

SP

s_.*

All spelt items

NB. While spelt tokens are annotated with different POS tags (e.g. SP, CD, NN), they can be retrieved through the common prefix s_.

s_.*

 

<spel> p h d_NN(NN) </spel>

<spel> a_LS(LS) </spel>

<spel> e u_NP(NP) </spel>

<spel> a m_RB(RB) </spel>

<spel> s_p_SP(SP) </spel>

 

4. Phrase search – partly NEW!


General remarks

Any combination or sequence of tokens (i.e. character strings/lexical searches), tags or searchable mark-up can be searched as phrases when each item is separated by a space.

Phrase searches are only carried out within individual utterances. In consequence, phrases that go beyond utterance boundaries will not be found.

Conversational mark-up such as pauses, laughter, breathing and tags for overlapping speech is ignored (i.e. it does not break up lexical phrases) unless mark-up items are explicitly included in the phrase search.

4.1. Lexical phrases (tokens)

Token plus token

 

 

token token

 

Finds a particular sequence of tokens

and the

and the

a:nd the

(and) (1) the

and hh the

and the </@> hh and the
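Why all of the hits above count as the phrase and the can be made concrete with a short Python sketch. It is a simplified illustration, not the VOICE implementation: it assumes that pauses, breathing (hh), laughter and tags in pointed brackets are skipped, and that lengthening colons and brackets around uncertain words are disregarded. The names `lexical_tokens` and `contains_phrase` are made up.

```python
import re

# Items treated as mark-up in this sketch: pauses such as (.) or (1),
# breathing (hh), laughter (@, @@, ...) and tags in pointed brackets.
MARKUP = re.compile(r"\((\.|\d+)\)|hh|@+|</?[^>]+>")

def lexical_tokens(utterance):
    """Drop mark-up items, lengthening colons and uncertainty brackets."""
    words = []
    for item in utterance.split():
        if MARKUP.fullmatch(item):
            continue
        words.append(item.replace(":", "").strip("()"))
    return words

def contains_phrase(utterance, phrase):
    """True if the phrase occurs as adjacent lexical tokens in the utterance."""
    words, wanted = lexical_tokens(utterance), phrase.split()
    return any(words[i:i + len(wanted)] == wanted
               for i in range(len(words) - len(wanted) + 1))

for utterance in ["a:nd the", "(and) (1) the", "and hh the", "and then the"]:
    print(utterance, "->", contains_phrase(utterance, "and the"))
```

Only the last utterance is rejected here, because a lexical token (then) stands between and and the.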

4.2. Part-of-speech and lemma combinations

POS tag plus POS tag / lemma

 

 

POS1 POS2 POS3

 

 

Finds a particular sequence of POS tags

DT JJ NN

(Determiner followed by adjective followed by noun)

a hu:ge university

the other way

a good soccer

POS1 POS2 lemma1

Finds a particular sequence of POS tags and/or lemma tags

 

DT JJ l:university

a hu:ge university

a (.) modern university

the private universities

4.3. Word, POS, lemma combinations

Token plus POS or lemma

 

 

token POS

POS token

lemma POS

token1 POS1 POS2

POS1 token1 token2

Finds sequences of tokens, POS tags and lemmas

whenever PP

(token whenever plus personal pronoun)

whenever you

whenever they

whenever we

PVC er

(pronunciation variation and coinage plus token er)

<pvc> preferently </pvc> er

<pvc> (knowledges) </pvc> er

you MD VV

(token you followed by modal verb and base verb)

you will go

you can get

play the NN

(tokens play and the followed by singular noun)

play the card

play the doorman

play the map

4.4. Word and mark-up sequences

Token plus mark-up

 

 

token <speaking mode/>

Token followed by speaking mode

yeah <soft/>

(token yeah followed by mark-up indicating softly spoken)

yeah <soft> okay okay <1> i understand </1> </soft>

token <L/>

 

Token followed by non-English speech

say <L/>

(token say followed by any language tag)

can say <LNger> vermissen {to miss} </LNger> (.)

how do you say <LNfre> subvention {subvention, subsidy} </LNfre>

now we say (.) <L1nor> trettito {thirty-two} </L1nor>

is <L1/>

is <L1ger> garnisongasse {street name} </L1ger>

_@ token

Laughter followed by token

_@+   yes

(any number of laughter-syllables followed by token yes)

@ yes

@@ yes

@@@ <1> yes </1>

token _1

Token followed by pause

i _1

(token i followed by a 1-second pause)

no i (1) i just

what i: (1) would like to

4.5. POS and mark-up sequences

POS tag plus mark-up

 

 

<@/> POS

Speaking mode followed by POS

<@/> UH

(laughingly spoken followed by interjection)

<@> no </@> @ ah

<@> okay </@> (1) erm

<@> well </@> (1) wow.

<L/> POS

Tag indicating non-English speech followed by POS

<L/> PVC

(language tag followed by PVC)

<L1scc> xx x </L1scc> <pvc> sympatic </pvc>

POS <ol/>

POS tag followed by overlap

UH <ol/>

(interjection followed by overlap)

er <4> reaction </4>

a:h <2> well yes </2>

huh? (.) <3> and the: </3>

4.6. Phrase search with wildcards

 

General remark

For phrase search, wildcards need to be separated from all other tokens, mark-up, POS tags, lemmas etc. by a space.

 .*

 

Token/POS/lemma plus token with one or more (n-)characters

manage .*

manage to

manage with

manage i

 

 

.* NN .*

a student here

.?

Token/POS/lemma plus token with one character

 

go .?

go i

.+

 

Token/POS/lemma plus token with one or more characters

NB. Search results with .+ are identical to .* in phrase search.

austria .+

austria from

austria i

austria the

 

5. Expert search – NEW!


5.1. Fine-tuning searches (and)

,

Meaning: and

Finds sub-specifications of tokens with POS tags or lemmas. (The items before and after the comma can be given in any order.)

token,POS

 

Token tagged with a particular POS tag

walk,NN

(token walk as noun)

a five minute walk_NN(NN)

 

RB,real

(token real as adverb)

real_RB(RB) beautiful

l:lemma,POS

All tokens of a particular lemma tagged with a particular POS tag

l:go,VVZ

(all tokens with lemma go and tagged with verb-tag present tense 3rd person singular)

everybody goes_VVZ(VVZ)

who <@> loses go_V(VVZ) <8> for drinks

5.2. Fine-tuning searches (or)

|

Meaning: alternation (or)

Finds any of the options to the left or the right of the pipe character |. (Any sequence of tokens, lemmas or POS tags is possible before and after |.)

token|token

Finds either one of these tokens

 

say|mean that

mean that

say that

POS1|POS2

Finds either one of these POS tags

 

VHD|VBD

(verb have or be, past tense)

was_VBD

were_VBD

had_VHD

token|POS

Finds either this token or POS tag

EX|you

(existential there or you)

there_EX(EX)

you_PP(PP)

l:lemma1|l:lemma2

Finds either one of these lemmas

 

l:say|l:mean that

(lemma say or lemma mean plus token that)

say that

said that

saying that

mean that

means that

token|l:lemma

 

Finds either this token or this lemma

man|l:house

 

man

house

houses

token1|token2 l:lemma1

Token1 or token2 followed by lemma1

never|always l:say

(token never or always followed by lemma say)

always say

never said

always saying

token1 POS1|POS2

Token1 followed by POS1 or POS2

RE|UH i

(response marker or interjection followed by token i)

yeah i

er i

mhm: i

always VBZ|VHZ|VVZ

(token always followed by third pers. singular form of be, have or other verbs)

always is

always has

always depends

always does

_@ POS1|POS2

Laughter followed by POS1 or POS2

_@ UH|RE

(one syllable of laughter followed by interjection or response marker)

 

@ er

@ yeah

@ ah
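The way the comma (and) and the pipe (or) work on a single token position can be sketched in Python. This is an illustration under assumptions, not the VOICE query parser: tokens are plain dictionaries, upper-case terms are read as POS tags and lower-case terms as tokens (as described in the general remarks), and the comma is assumed to bind more tightly than the pipe, which the manual does not state. The helper names are hypothetical.

```python
def matches_term(term, tok):
    """One term: l:lemma, an upper-case POS tag, or a lower-case token."""
    if term.startswith("l:"):
        return tok["lemma"] == term[2:]
    if term.isupper():
        return term in (tok["form"], tok["function"])
    return tok["token"] == term

def matches(query, tok):
    """'|' separates alternatives; ',' joins conditions that must all hold."""
    return any(all(matches_term(term, tok) for term in alternative.split(","))
               for alternative in query.split("|"))

walk_noun = {"token": "walk", "lemma": "walk", "form": "NN", "function": "NN"}
walk_verb = {"token": "walk", "lemma": "walk", "form": "VV", "function": "VV"}
goes = {"token": "goes", "lemma": "go", "form": "VVZ", "function": "VVZ"}
there = {"token": "there", "lemma": "there", "form": "EX", "function": "EX"}

print(matches("walk,NN", walk_noun), matches("walk,NN", walk_verb))
print(matches("l:go,VVZ", goes))
print(matches("EX|you", there))
```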

 

5.3. Context search: Defining range of context

General remark

As with any phrase search, in context search only search results within individual utterances are found.

{number}

{minimal number,maximal number}

Specifies an exact number or a range of minimal and maximal number.

 

NB. If used without a space the number in {…} refers to its immediate left neighbour, e.g. _@{2} finds exactly two @-syllables (@@).

 

NB. If used with space, _@ {2} finds one syllable of laughter (@) followed by any two tokens.

 

 

token {2}

Token followed by any two tokens in the same utterance

really {2}

really low and

really strong hm

really good at

token {0,3}

 

Token followed by any zero to three tokens in the same utterance

house {0,3}

house in like say

{1,2} token {1,2}

 

Token preceded and followed by any one to two tokens in the same utterance

{1,2} house {1,2}

have a house in each

on the house on the

to your house again

token1 {0,3} token2

 

Zero to three tokens between token1 and token2 in the same utterance

i {0,3} go

i must go

i decided to go

i only want to go

{1,2} POS tag {1,2}

 

A POS tag preceded and followed by one or two tokens in the same utterance

{1,2} PVC {1,2}

in er <pvc> maltesan {maltese} </pvc> english?

of (.) european <pvc> reintegration </pvc> you know?

.*{number}

 

Wildcard which defines a particular number of placeholder tokens.

NB. This type of query only yields meaningful results when narrowed down e.g. by phrase search (see example to the right).

a .*{1} house

a neutral house

a retirement house

a lovely house

.*{minimal,maximal}

Wildcard defining a particular number range of placeholder tokens.

go .*{1,2} university

go to university

go to the university

go to state university
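A query of the form token1 {min,max} token2 can be pictured as a search for two tokens with a bounded gap between them inside one utterance. The Python sketch below shows that idea; it is an illustration with a made-up function name (`gap_hits`) and treats an utterance as a plain list of tokens.

```python
def gap_hits(words, first, last, low, high):
    """Find 'first {low,high} last' within one utterance (a list of tokens)."""
    hits = []
    for i, word in enumerate(words):
        if word != first:
            continue
        # Try every permitted number of intervening tokens.
        for gap in range(low, high + 1):
            j = i + 1 + gap
            if j < len(words) and words[j] == last:
                hits.append(" ".join(words[i:j + 1]))
    return hits

print(gap_hits("i must go".split(), "i", "go", 0, 3))
print(gap_hits("i only want to go".split(), "i", "go", 0, 3))
print(gap_hits("i wanted to finally just go".split(), "i", "go", 0, 3))
```

The last utterance yields no hit in this sketch, because four tokens stand between i and go.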

5.4. Search within: Find tokens and POS/lemmas within mark-up

within

Finds and highlights individual tokens/tags or combinations of tokens, POS tags or lemmas within a mark-up tag (in pointed brackets).

5.4.1. Tokens within Mark-up

token within <speaking mode/>

 

Token within pointed brackets, e.g. Speaking mode

go within <soft/>

<soft> have to go: </soft>

yeah within <@/>

<@> yeah yeah yeah </@>

token within <L1/LN/LQ/>

Token within tag for non-English speech

nein within <L1ger/>

<L1ger> nei:n {no:} </L1ger>

token within <ol/>

Token within overlapping speech

really within <ol/>

<3> really strong. (1) hm? </3>

<4> really? </4>

<2> not really </2>

_@ within <speaking mode/>

Laughter within speaking mode

_@ within <loud/>

<loud> @ </loud>

5.4.2. POS within Mark-up

POS within <speaking mode/>

POS tag within speaking mode

RE within <loud/>

<loud> yeah_RE(RE) </loud>

<loud> okay?_RE(RE) </loud>

POS within <ol/>

POS tag within overlap

FI within <ol/>

<4> sorry_FI(FI) </4>

<7> oh_FI(FI) my_FI(FI) gosh_FI(FI) </7>

<8> youre_FI(FI) welcome_FI(FI) </8>

<4> bye-bye_FI(FI) </4>

5.4.3. Lemma within Mark-up

l:lemma within <speaking mode/>

Lemma within speaking mode

l:be within <imitating/>

<imitating> be: the members of the working groups <8>

l:lemma within <ol/>

Lemma within overlap tag

l:say within <ol/>

<6> say </6>

<2> am i saying</2>

<4> hed say </4>

5.5. Search for containing: Find stretches of speech with particular mark-up that contain particular tokens/POS/lemmas

containing

 

5.5.1. Mark-up containing token

 

 

<ol/> containing token

 

Overlap containing token

<ol/> containing funny

<7> so funny </7>

<7> a little bit funny </7>

<6> thats funny</6>

<L/> containing token

Language tags marked as non-English speech containing token

<L1/> containing ja

<L1ger> ja tust du (weiter) {do you hurry up} </L1ger>

<L1ger> ja? {yeah} </L1ger>

<soft/> containing token

Speaking mode containing token

<soft/> containing okay

<soft> okay </soft>

<soft> okay its my turn? </soft>

5.5.2. Mark-up containing POS

 

 

<speaking mode/> containing POS

Speaking mode containing POS tag

<loud/> containing RE

<loud> no dont </loud>

<loud> yeah. </loud>

<loud> yes </loud>

<loud> okay there is coffee </loud>

5.5.3. Mark-up containing lemma

 

 

<@/> containing l:lemma

Speaking mode laughingly spoken containing lemma

<@/> containing l:go

<@> when and where to go </@>

<@> you went shopping </@>
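The difference between within (5.4.) and containing (5.5.) lies in what counts as the hit: the token inside the mark-up, or the marked stretch as a whole. The Python sketch below illustrates this reading of the two operators. It is not the VOICE implementation; stretches are represented as hypothetical (tag, tokens) pairs and both function names are made up.

```python
stretches = [
    ("soft", ["okay"]),
    ("soft", ["okay", "its", "my", "turn"]),
    ("loud", ["okay", "there", "is", "coffee"]),
    ("soft", ["have", "to", "go"]),
]

def within(token, tag, stretches):
    """Return each matching token together with the index of its stretch."""
    return [(i, word)
            for i, (name, words) in enumerate(stretches) if name == tag
            for word in words if word == token]

def containing(tag, token, stretches):
    """Return every whole stretch with this tag that includes the token."""
    return [words for name, words in stretches if name == tag and token in words]

print(within("okay", "soft", stretches))      # the tokens are the hits
print(containing("soft", "okay", stretches))  # the stretches are the hits
```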

 

6. Examples of combined searches – NEW!


General remark

This section provides a non-exhaustive selection of possible search combinations for illustration and inspiration.

 

6.1. Combined searches with wildcards and fine-tuning

 

 

token .* .* .*

Token followed by one wildcard per further token (each wildcard stands for one token of any length)

i really .* .* .*

i really feel so old

i really appreciate talking to

i really think that you

.+ token .*

Token preceded by wildcard and followed by wildcard

.+ i really .* .*

what i really liked was

e:r i really hope that

i i really dont

p:POS,f:POS

Combination of particular form and function-POS tags

p:JJ,f:RB

(token tagged adjective in form-position and adverb in function position)

you grew up (.) bilingual_JJ(RB).

perform good_JJ(RB) in another language

token.*,POS

.*token,POS

Token with wildcard tagged with a particular POS tag

thank.*,FI

(thank with wildcard as formulaic item)

thanks

thank you

.*ness,PVC

(all tokens ending in -ness tagged PVC)

competiveness

healthness

europeanness

business

l:lemma,POS.*

All instances of a lemma tagged with a particular POS tag with wildcard.

 

NB. In phrases, this type of search can be useful to retrieve all POS tags of a superordinate POS category, e.g. V.* (all verbs) or N.* (all nouns).

 

l:see,V.*

(see, all verb-forms)

they saw_VVD(VVD) plays

you dont see_VV(VV) it

were seeing_VVG(VVG) a growing gap

token1 POS1 token2 .*

Combinations of token and POS tag plus a wildcard (standing for any token with one or more characters)

i RB think .*

(i followed by adverb followed by think followed by any token)

i also_RB think that

POS1 POS2 token,POS.*

Sequence of POS tags followed by a token with a sub-specification

PP RB think,V.*

(Personal pronoun followed by adverb followed by think as verb)

 i_PP also_RB think_VVP

could you_PP maybe_RB think_VV

POS1 POS2.* POS3

Sequence of POS tags including wildcards

RB V.* PP

(Adverb followed by any verb-form followed by personal pronoun)

just smell it

nt put it

then leave it

l:lemma,POS.*

Token of a particular lemma sub-specified with POS tag

l:show,V.*

(Lemma show tagged as any verb form)

show

showing

showed

shown

l:thought,NN.*

(Lemma thought as singular or plural noun)

thoughts

thought

l:lemma.*,POS.*

Token of a lemma with wildcard sub-specified with POS tag with wildcard.

l:re.*,V.*

(Lemma starting with re- tagged as any verb-form)

recording

read

related

registering

POS l:lemma POS,.*token

POS tag followed by lemma followed by POS tag sub-specified with a token with wildcard

DT l:good NN,.*ion

(Determiner followed by lemma good followed by a singular noun ending in -ion)

a better situation

the: good discussion

the best solution

POS1|POS2 token1

Either POS tag1 or POS tag2 followed by token1

RB|JJ good

(Adverb or adjective followed by good)

very good

no good

good good

many good

token1|token2|token3 POS1

Either token1, token2 or token3 followed by POS tag1

yes|yeah|yah UH

(Tokens yes, yeah or yah followed by POS category interjection)

yes o:h

yah?  er

yeah. ooph

6.2. Combined searches with within or containing

<speaking mode/> containing token

Speaking mode containing token

<soft/> containing well

<7> <soft> well you know </soft> </7>

<soft> mhm (2) very well </soft>

<soft> on Thursday as well </soft>

POS1|POS2|POS3 within <ol/>

Either POS tag1, 2 or 3 within overlap-tag

FI|RE|UH within <ol/>

(Formulaic item or response marker or interjection within overlap tag)

<3> thanks_FI(FI) </3>

<5> ye:s_RE(RE) </5>

<10> er:_UH(UH) </10>

laughter within <ol/>

Laughter within overlap-tag

_@@ within <ol/>

(Two syllables of laughter within overlap-tag)

<8> @@ </8>

<1> hi @@ </1>

<speaking mode/> containing token,POS

Speaking mode tag containing a token sub-specified with a POS tag

<soft/> containing well,DM

(Speaking mode soft containing token well POS tagged as discourse marker)

<soft> well_DM(DM) you know </soft>

<soft> well_DM(DM) (then) yeah of course but </soft>

<ol/> containing token,POS

Overlap-tag containing a token sub-specified with a POS tag

<ol/> containing you,FI

(Overlap containing token you tagged as formulaic item)

<6> thank you_FI(FI) @@@ </6>

<11> see you_FI(FI) </11>

<7> you_FI(FI)re welcome </7>

<ol/> containing l:lemma,POS.*

Overlap-tag containing all tokens of a lemma sub-specified with a POS tag with wildcard

<ol/> containing l:good,RB.*

(Overlap tag containing all tokens of the lemma good as any type of adverb, i.e. RB, RBR, RBS)

is going <6> (good)_RB(RB) </6>

<1> much better_RBR(RBR) </1>

 

6.3. Combined searches with context

token1 {0,1} POS1 POS2

Token1 followed by a defined range of context followed by POS tag1 and 2

the {0,2} JJ NN

(Token the followed by 0-2 tokens followed by adjective and noun)

the main building

the second third lesson

the the legal stuff

the legal erm legal clinic

<speaking mode/> containing token {0,1}

Speaking mode tag containing a token followed by a defined range of tokens

<soft/> containing yes {1,5}

(Speaking mode soft containing token yes followed by one to five tokens)

<soft> yes okay </soft>

<soft> a:h yes. (.) [name2]</soft>

<soft> yes they must be calibrated </soft>

6.4. Combined mark-up searches

Combination of mark-up searches via POS tags and new mark-up searches in pointed brackets.

 

 

PA <speaking mode/>

Any pause followed by a speaking mode tag

PA <fast/>

(all pauses followed by fast speech)

(.) <fast> keep that in mind </fast>

PVC within <speaking mode/>

All pronunciation variations and coinages which occur within a speaking mode tag

PVC within <soft/>

<soft> <pvc> unconcrete </pvc> </soft>

<soft> a balloon <pvc> wobbler? </pvc> </soft>

 

<ol/> containing SP

 

All overlaps containing spelt tokens

<ol/> containing SP

<9> <spel> s p </spel> </9>

 

 
