Many, I’m sure, know what stop words are and how they differ from negative keywords.
“Stop words are function words and pronouns, as well as any words that carry no additional meaning; they are automatically excluded from the user’s query when ads are selected for display.” (Direct Help)
The main difference is that stop words form a fixed, predefined list, whereas any word can be a negative keyword. The defining feature of stop words is that they carry no meaning of their own. However, this notion is relative. There are “ambiguous” stop words that coincide with meaningful words, for example “that” (maintenance), “tech” (specifications = technical specifications), “themes” (many interesting topics). And stop words often radically change the meaning of a phrase. This is the main idea of this article.
The exact list of Yandex.Direct stop words is unknown and is constantly changing. According to my observations, all Ukrainian stop words were recently removed from it.
How to identify stop words?
I know four ways:
- By setting up an ad group in the interface: a phrase with a stop word and the same phrase without it “collapse”, and only the phrase without the stop word remains.
- Through cross-minusing (adding cross negative keywords) and removing duplicates in Direct Commander.
- Through the Budget Forecast in the interface: when you request the frequency, the system complains: “A key phrase cannot only consist of stop words: conjunctions, prepositions, particles.” Unlike Wordstat, it does not accept such a phrase even with operators.
- Via Yandex Wordstat: it shows 0 impressions for the word.
The surprising thing is that these methods give different results: they are slightly out of sync. I took the Budget Forecast in the interface as the source of truth, since I believe it is the highest-priority product. So far I have found 295 such words:
a about all an and any are as at be but by can do for from have i if in is it my no not of on one or so that the there they this to was we what which will with would you would be if would be would be would be in you all of you all all all all all all all all all all all all of you yes for before him her her her her her her if he eats there is still her from or to them them them to how whom to when who whom whom which which which which which which which which which which which which who me me might might might might might might might might may my mine my my my my my may can my my my my my my my ability my we on us us our our our our our our our our our our our our not he her her he he he is not her them them but oh one one one one one one one one one one one one one one he she they are from to with himself by their very themselves their own theirs theirs theirs theirs theirs theirs theirs theirs theirs themselves so so so so so so so so so so so so so so you that that only that that that that that you have something than something that this this this this this this this this this this this me
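The “collapse” effect used in the first two methods can be imitated offline once a list like the one above is available. Below is a minimal Python sketch, not the way Direct itself works: the small `STOP_WORDS` set and the sample phrases are hypothetical, and the comparison ignores word forms and word order.

```python
# Hypothetical sample; a real list would come from the checks described above.
STOP_WORDS = {"for", "in", "on", "the", "with"}


def strip_stop_words(phrase, stop_words=STOP_WORDS):
    """Return the phrase lowercased, with stop words dropped."""
    return " ".join(w for w in phrase.lower().split() if w not in stop_words)


def find_collapsing_phrases(phrases, stop_words=STOP_WORDS):
    """Group phrases that become identical once stop words are removed."""
    groups = {}
    for phrase in phrases:
        key = strip_stop_words(phrase, stop_words)
        groups.setdefault(key, []).append(phrase)
    # Only groups with more than one member actually "collapse".
    return {key: group for key, group in groups.items() if len(group) > 1}


phrases = ["tires for car", "tires car", "buy tires"]
collapsed = find_collapsing_phrases(phrases)
print(collapsed)
```

Phrases that end up in the same group differ only by stop words, so without operators they would compete with each other as duplicates.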
For Google Ads, the list of stop words can be wider, because Ads sets no restrictions of its own here. In effect, Ads can treat as a stop word any word that you do not prefix with a broad match modifier, so the choice of stop words is yours.
Working with operators
Working with stop words means adding or removing modifiers in front of them.
You may need to remove modifiers in several cases:
- A key phrase in Yandex.Direct already contains seven words not counting stop words; with modifiers on the stop words, the system will not accept it.
- There is a risk of losing reach, because users may omit stop words in their queries and there are no equivalent phrases without stop words.
- Stop words were added intentionally to embellish template headlines.
In all other cases, modifiers should be used.
Since Direct and Ads treat stop words differently, I made two lists of stop words in my add-in: a general one and one for Direct only. Each list can be used in macros: delete stop words, delete the operators in front of them, or add the “!” or “+” operator. Both operators are offered because some stop words inflect, for example all, everything, everyone, etc.
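The three macros mentioned above can be illustrated with a short Python sketch. It is not the add-in's actual code: the function names and the tiny `DIRECT_STOP_WORDS` sample are hypothetical, and phrases are assumed to be plain space-separated words with an optional “!” or “+” prefix.

```python
# Hypothetical sample list; the real lists live in the add-in.
DIRECT_STOP_WORDS = {"for", "in", "on", "with", "all"}

OPERATORS = "!+"


def _bare(token):
    """Token without leading operators, lowercased for comparison."""
    return token.lstrip(OPERATORS).lower()


def delete_stop_words(phrase, stop_words=DIRECT_STOP_WORDS):
    """Drop stop words (with or without operators) from the phrase."""
    return " ".join(t for t in phrase.split() if _bare(t) not in stop_words)


def delete_operators(phrase, stop_words=DIRECT_STOP_WORDS):
    """Remove operators in front of stop words only; other words keep theirs."""
    return " ".join(
        t.lstrip(OPERATORS) if _bare(t) in stop_words else t
        for t in phrase.split()
    )


def put_operator(phrase, operator, stop_words=DIRECT_STOP_WORDS):
    """Put "!" or "+" in front of every stop word, replacing any existing one."""
    if operator not in ("!", "+"):
        raise ValueError("operator must be '!' or '+'")
    return " ".join(
        operator + t.lstrip(OPERATORS) if _bare(t) in stop_words else t
        for t in phrase.split()
    )


print(put_operator("tires for car", "+"))
print(delete_operators("tires +for !car"))
print(delete_stop_words("tires +for car"))
```

Keeping the operator as a parameter reflects the choice between “!” and “+” for stop words that inflect.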
Stop words as an intent marker
Stop words can be classified by intent. I discovered this trick a long time ago and have used it when working on negative keywords. The idea is that a stop word combined with the promoted entity (a service or product) can mark the user’s query as relevant or irrelevant. On the customer journey map, a fundamental parameter that affects interaction with a product or service is time. Put crudely, there is a “before” and an “after”. The user may also have doubts and look for alternatives, which happens during the main search.
Based on this, I marked the stop words by intent in order to identify irrelevant queries with their help.
The “before” stage includes many queries related to human fears, doubts and the desire to dispel them by turning to search. This is a “warm” audience. As a rule, it does not convert well, but with skillful work it can be profitable, since your competitors often prefer not to bother with it and leave it for later.
- what / which … + entity
- o / v + service / product
- with + service (in the service sector)
- whether (is it painful, is it harmful, is it worth it, is it necessary, is it possible, is it good, is it true …)
- and, of course, before + service (in the service sector)
The “after” stage includes search queries that point to problems the user has after purchasing a product or service. These may be defects in the goods or the consequences of poorly rendered services, the need to replace, return or repair the goods, or a search for advice (what to do and how to act in the new situation).
- for + entity
- under + entity
- to + entity (except purchase markers: product price, product discounts)
- in + entity
- k / ko + entity
- from + entity
- entity + not
- entity + verb (excluding purchase marker verbs)
- how (in product semantics, except for phrases with purchase markers)
- after + entity (in the service sector)
The “alternatives” case is simple: the user is either not our potential client at all, or almost certainly not one. He is looking for an alternative to our product, and not necessarily a paid one. There are many kinds of such interests and activities, and the user portraits can be completely different:
- Student or specialist. Searches for articles, abstracts, coursework, courses, educational institutions, etc.
- DIY enthusiast. Looking for manuals and instructions, trying to do everything with his own hands.
- Lover of porn.
- Seeker of meaning. Interested in dream books, horoscopes, omens, prayers, fortune telling, love spells, etc.
- An avid onliner. His behavior echoes some of the above. Looks for anecdotes, jokes, vidyahi, drivers, software, desktop wallpapers and other similar entities. These words are not stop words, but without them the non-target semantics would be worked through less effectively.
Stop word patterns typical of such portraits:
- without + entity
- instead of + entity
- why + entity or entity + why
- or + entity or entity + or
- whether + entity or entity + whether
This list is far from complete and will be extended substantially.
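The “stop word + entity” patterns listed above lend themselves to a mechanical check. Here is a minimal Python sketch under stated assumptions: the `PATTERNS` table holds only a few of the patterns, the entity “massage” is a made-up example, and matching is exact (no morphology).

```python
# (intent, marker words, position of the marker relative to the entity)
# A hypothetical subset of the patterns from the lists above.
PATTERNS = [
    ("before", ("before",), "prefix"),
    ("after", ("after",), "prefix"),
    ("after", ("not",), "suffix"),
    ("alternative", ("without",), "prefix"),
    ("alternative", ("instead", "of"), "prefix"),
]


def mark_intent(query, entity):
    """Return the intent marked by a stop word next to the entity, or None."""
    words = query.lower().split()
    entity_words = tuple(entity.lower().split())
    n = len(entity_words)
    for i in range(len(words) - n + 1):
        if tuple(words[i:i + n]) != entity_words:
            continue
        for intent, marker, position in PATTERNS:
            m = len(marker)
            # Marker immediately before the entity.
            if position == "prefix" and i >= m and tuple(words[i - m:i]) == marker:
                return intent
            # Marker immediately after the entity.
            if position == "suffix" and tuple(words[i + n:i + n + m]) == marker:
                return intent
    return None


for q in ["what to eat before massage", "bruises after massage",
          "instead of massage", "buy massage chair"]:
    print(q, "->", mark_intent(q, "massage"))
```

The first matching pattern wins, so the order of `PATTERNS` matters if one query could fit several intents.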
Negative keywords through stop words: a detailed search algorithm
Knowing the intent that stop words give to phrases, and understanding how relevant that intent is, we can automate the collection of irrelevant semantics. The result is a kind of clustering of phrases by intent via stop words.
How it is done:
- Choose an irrelevant intent.
- Select the stop words that characterize it.
- Analyze the order of the stop words and the promoted entity.
- Select all phrases with these fixed sequences from the semantic core. There are two options at this stage:
  - more accurate: take strictly one word before / after the stop words;
  - less accurate, but with more words in the output: compile a frequency dictionary of the selected semantics.
- Delete obviously useful words: purchase markers, epithets, geomarkers, stop words.
- Profit! In practice, you still need to review the list by eye.
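The steps above can be sketched in Python roughly as follows. This is an illustration, not the SEMTools script: the function name, the `strict` flag and the sample data are assumptions, and the fixed sequences are passed in ready-made as word tuples.

```python
from collections import Counter


def collect_negative_candidates(phrases, sequences, useful_words, strict=True):
    """Count candidate negative keywords around fixed stop word + entity sequences.

    strict=True  -> only one word directly before/after the sequence (more accurate)
    strict=False -> every other word of the matched phrase (frequency dictionary)
    """
    counts = Counter()
    for phrase in phrases:
        words = phrase.lower().split()
        for seq in sequences:
            seq = tuple(seq)
            n = len(seq)
            for i in range(len(words) - n + 1):
                if tuple(words[i:i + n]) != seq:
                    continue
                if strict:
                    neighbours = words[max(i - 1, 0):i] + words[i + n:i + n + 1]
                else:
                    neighbours = words[:i] + words[i + n:]
                # Drop obviously useful words: purchase markers, geomarkers, etc.
                counts.update(w for w in neighbours if w not in useful_words)
    return counts


# Hypothetical semantic core and settings.
phrases = ["bruises after massage", "pain after massage back",
           "buy massage moscow", "massage price"]
sequences = [("after", "massage")]
useful_words = {"buy", "price", "moscow", "after"}

candidates = collect_negative_candidates(phrases, sequences, useful_words)
print(candidates.most_common())
```

The output is a list of candidates sorted by frequency, which still has to be reviewed manually before it goes into the negative keyword list.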
Non-obvious difficulties and their solution
- It is not easy to identify the part of speech (a verb).
- Sometimes there is another word between the stop word and the promoted entity (for example, an epithet). If you do not delete it first, the phrase will not be picked up by the filter.
- The service can appear in a query in any grammatical case, so you need either morphological analysis or a list of the service name in all its case forms. Sacrificing a bit of usability to simplify development, I chose the second option.
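The last two difficulties can be handled with a simple normalization pass before matching. A minimal Python sketch, assuming a single-word entity: the list of forms, the epithet set and the `<entity>` placeholder are all hypothetical.

```python
def normalize(phrase, entity_forms, epithets, placeholder="<entity>"):
    """Delete epithets and replace every listed entity form with one placeholder."""
    words = [w for w in phrase.lower().split() if w not in epithets]
    return [placeholder if w in entity_forms else w for w in words]


# Hypothetical inputs: every form of the entity is listed by hand
# instead of relying on morphological analysis.
entity_forms = {"massage", "massages"}
epithets = {"thai", "good"}

tokens = normalize("bruises after thai massages", entity_forms, epithets)
print(tokens)
```

After this pass the stop word and the entity are adjacent again, so a fixed sequence such as “after” followed by the placeholder can be matched regardless of the form used in the query.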
Where there is an algorithm, there is a script! Words can be selected by intent in one click in SEMTools. The release of the add-in with this script coincides with my talk at SEMConf on September 14, 2018.