Robots.txt for sites on Bitrix

Many robots.txt templates for sites based on 1C-Bitrix circulate on the internet, but some are outdated and others contain errors. Below is a robots.txt that is current as of 2019 and accounts for the features of recent Bitrix versions, as well as the specifics of the popular Aspro Next, Deluxe, and Nextype Magnit solutions and Aspro's corporate-site solutions. In preparing it, besides the search engines' recommendations and an analysis of demo sites built on these solutions, we studied the indexing of real sites and which of their pages ended up excluded from the index.

Features of the proposed robots.txt for Bitrix sites

  • Handling of sorting parameters
  • Handling of filters
  • Handling of pagination
  • Cleaning of GET parameters
  • Closing of service pages
  • Closing of the personal account section
  • Handling of the /local/ folder
  • Opening the style and script files that search engines need for rendering
  • Blocking the most active useless bots and limiting the crawl rate for everyone except Yandex and Google to reduce server load

Robots.txt is a text file that contains site indexing parameters: instructions for search engine robots. Search engines support the Robots Exclusion Protocol with extended features.

The purpose of the robots.txt file

As Yandex explains, the robots.txt file is used to prohibit indexing of site sections or individual pages. For example, you can close from indexing:

  • pages with confidential data;
  • pages with site search results;
  • website traffic statistics;
  • duplicate pages;
  • various logs;
  • service pages.

However, keep in mind that Google assigns robots.txt a slightly different purpose: its main goal is not to prohibit indexing but to reduce the crawling load on the site.

“The robots.txt file is not meant to block the display of web pages in Google search results. If other sites have links to your page containing its description, then it can still be indexed, even if Googlebot is prohibited from visiting it. To exclude a page from search results, you must use a different method, such as password protection or the noindex directive. If your robots.txt file prevents Googlebot from processing a web page, it can still appear on Google. Other methods should be used to exclude a page from Google Search entirely.”
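As the quote notes, the reliable way to keep a page out of Google's index is a noindex directive on a page that remains crawlable, not a robots.txt Disallow. A minimal example of the robots meta tag (it goes into the page's head):

```html
<!-- The page must NOT be blocked in robots.txt,
     otherwise Googlebot never sees this tag. -->
<meta name="robots" content="noindex">
```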

Bitrix robots.txt example

User-Agent: *
Disallow: */index.php$
Disallow: /bitrix/
Disallow: /personal/
Disallow: */cgi-bin/
Disallow: /local/
Disallow: /test/
Disallow: /*show_include_exec_time=
Disallow: /*show_page_exec_time=
Disallow: /*show_sql_stat=
Disallow: /*bitrix_include_areas=
Disallow: /*clear_cache=
Disallow: /*clear_cache_session=
Disallow: /*ADD_TO_COMPARE_LIST
Disallow: /*ORDER_BY
Disallow: /*?print=
Disallow: /*?list_style=
Disallow: /*?sort=
Disallow: /*?sort_by=
Disallow: /*?set_filter=
Disallow: /*?arrFilter=
Disallow: /*?order=
Disallow: /*&print=
Disallow: /*print_course=
Disallow: /*?action=
Disallow: /*&action=
Disallow: /*register=
Disallow: /*forgot_password=
Disallow: /*change_password=
Disallow: /*login=
Disallow: /*logout=
Disallow: /*auth=
Disallow: */auth/
Disallow: /*backurl=
Disallow: /*back_url=
Disallow: /*BACKURL=
Disallow: /*BACK_URL=
Disallow: /*back_url_admin=
Disallow: /*?utm_source=
Disallow: */order/
Disallow: /*download
Disallow: /test.php
Disallow: */filter/*/apply/
Disallow: /*setreg=
Disallow: /*logout
Disallow: */filter/
Disallow: /*back_url_admin
Disallow: /*sphrase_id
Disallow: */search/
Disallow: /*type=
Disallow: /*?product_id=
Disallow: /*?display=
Disallow: /*?view_mode=
Disallow: /*min_price=
Disallow: /*max_price=
Disallow: /*&page=
Disallow: /*?path=
Disallow: /*?route=
Disallow: /*?products_on_page=
Disallow: /*?PAGEN_1=1$
Disallow: /*?PAGEN_1=1/$
Disallow: /*?new=Y
Disallow: /*?edit=
Disallow: /*?preview=
Disallow: /*SHOWALL=
Disallow: /*SHOW_ALL=
Disallow: /*SHOWBY=
Disallow: /*SPHRASE_ID=
Disallow: /*TYPE=
Disallow: /*?utm*=
Disallow: /*&utm*=
Disallow: /*?ei=
Disallow: /*?p=
Disallow: /*?q=
Disallow: /*?VIEW=
Disallow: /*?SORT_TO=
Disallow: /*?SORT_FIELD=
Disallow: /*set_filter=
Disallow: */auth.php
Disallow: /*?alfaction=
Disallow: /*?oid=
Disallow: /*?name=
Disallow: /*?form_id=
Disallow: /*&form_id=
Disallow: /*?bxajaxid=
Disallow: /*&bxajaxid=
Disallow: /*?view_result=
Disallow: /*&view_result=
Disallow: */resize_cache/
Allow: /bitrix/components/
Allow: /bitrix/cache/
Allow: /bitrix/js/
Allow: /bitrix/templates/
Disallow: /bitrix/panel/
Allow: /local/components/
Allow: /local/cache/
Allow: /local/js/
Allow: /local/templates/
Crawl-delay: 30
Sitemap: https://XXXXCC/sitemap.xml

User-Agent: Yandex
Disallow: */index.php$
Disallow: /bitrix/
Disallow: /personal/
Disallow: */cgi-bin/
Disallow: /local/
Disallow: /test/
Disallow: /*show_include_exec_time=
Disallow: /*show_page_exec_time=
Disallow: /*show_sql_stat=
Disallow: /*bitrix_include_areas=
Disallow: /*clear_cache=
Disallow: /*clear_cache_session=
Disallow: /*ADD_TO_COMPARE_LIST
Disallow: /*ORDER_BY
Disallow: /*?print=
Disallow: /*?list_style=
Disallow: /*?sort=
Disallow: /*?sort_by=
Disallow: /*?set_filter=
Disallow: /*?arrFilter=
Disallow: /*?order=
Disallow: /*&print=
Disallow: /*print_course=
Disallow: /*?action=
Disallow: /*&action=
Disallow: /*register=
Disallow: /*forgot_password=
Disallow: /*change_password=
Disallow: /*login=
Disallow: /*logout=
Disallow: /*auth=
Disallow: */auth/
Disallow: /*backurl=
Disallow: /*back_url=
Disallow: /*BACKURL=
Disallow: /*BACK_URL=
Disallow: /*back_url_admin=
Disallow: /*?utm_source=
Disallow: */order/
Disallow: /*download
Disallow: /test.php
Disallow: */filter/*/apply/
Disallow: /*setreg=
Disallow: /*logout
Disallow: */filter/
Disallow: /*back_url_admin
Disallow: /*sphrase_id
Disallow: */search/
Disallow: /*type=
Disallow: /*?product_id=
Disallow: /*?display=
Disallow: /*?view_mode=
Disallow: /*min_price=
Disallow: /*max_price=
Disallow: /*&page=
Disallow: /*?path=
Disallow: /*?route=
Disallow: /*?products_on_page=
Disallow: /*?PAGEN_1=1$
Disallow: /*?PAGEN_1=1/$
Disallow: /*?new=Y
Disallow: /*?edit=
Disallow: /*?preview=
Disallow: /*SHOWALL=
Disallow: /*SHOW_ALL=
Disallow: /*SHOWBY=
Disallow: /*SPHRASE_ID=
Disallow: /*TYPE=
Disallow: /*?utm*=
Disallow: /*&utm*=
Disallow: /*?ei=
Disallow: /*?p=
Disallow: /*?q=
Disallow: /*?VIEW=
Disallow: /*?SORT_TO=
Disallow: /*?SORT_FIELD=
Disallow: /*set_filter=
Disallow: */auth.php
Disallow: /*?alfaction=
Disallow: /*?oid=
Disallow: /*?name=
Disallow: /*?form_id=
Disallow: /*&form_id=
Disallow: /*?bxajaxid=
Disallow: /*&bxajaxid=
Disallow: /*?view_result=
Disallow: /*&view_result=
Disallow: */resize_cache/
Allow: /bitrix/components/
Allow: /bitrix/cache/
Allow: /bitrix/js/
Allow: /bitrix/templates/
Disallow: /bitrix/panel/
Allow: /local/components/
Allow: /local/cache/
Allow: /local/js/
Allow: /local/templates/
Host: https://XXXXXC
Clean-param: setreg&back_url_admin&logout&sphrase_id&action&utm_source&openstat&sort&sort_by&arrFilter&display&bxajaxid&view_mode&set_filter&alfaction&SORT_TO&SORT_FIELD&VIEW&bitrix_include_areas&clear_cache

User-Agent: Googlebot
Disallow: */index.php$
Disallow: /bitrix/
Disallow: /personal/
Disallow: */cgi-bin/
Disallow: /local/
Disallow: /test/
Disallow: /*show_include_exec_time=
Disallow: /*show_page_exec_time=
Disallow: /*show_sql_stat=
Disallow: /*bitrix_include_areas=
Disallow: /*clear_cache=
Disallow: /*clear_cache_session=
Disallow: /*ADD_TO_COMPARE_LIST
Disallow: /*ORDER_BY
Disallow: /*?print=
Disallow: /*?list_style=
Disallow: /*?sort=
Disallow: /*?sort_by=
Disallow: /*?set_filter=
Disallow: /*?arrFilter=
Disallow: /*?order=
Disallow: /*&print=
Disallow: /*print_course=
Disallow: /*?action=
Disallow: /*&action=
Disallow: /*register=
Disallow: /*forgot_password=
Disallow: /*change_password=
Disallow: /*login=
Disallow: /*logout=
Disallow: /*auth=
Disallow: */auth/
Disallow: /*backurl=
Disallow: /*back_url=
Disallow: /*BACKURL=
Disallow: /*BACK_URL=
Disallow: /*back_url_admin=
Disallow: /*?utm_source=
Disallow: */order/
Disallow: /*download
Disallow: /test.php
Disallow: */filter/*/apply/
Disallow: /*setreg=
Disallow: /*logout
Disallow: */filter/
Disallow: /*back_url_admin
Disallow: /*sphrase_id
Disallow: */search/
Disallow: /*type=
Disallow: /*?product_id=
Disallow: /*?display=
Disallow: /*?view_mode=
Disallow: /*min_price=
Disallow: /*max_price=
Disallow: /*&page=
Disallow: /*?path=
Disallow: /*?route=
Disallow: /*?products_on_page=
Disallow: /*?PAGEN_1=1$
Disallow: /*?PAGEN_1=1/$
Disallow: /*?new=Y
Disallow: /*?edit=
Disallow: /*?preview=
Disallow: /*SHOWALL=
Disallow: /*SHOW_ALL=
Disallow: /*SHOWBY=
Disallow: /*SPHRASE_ID=
Disallow: /*TYPE=
Disallow: /*?utm*=
Disallow: /*&utm*=
Disallow: /*?ei=
Disallow: /*?p=
Disallow: /*?q=
Disallow: /*?VIEW=
Disallow: /*?SORT_TO=
Disallow: /*?SORT_FIELD=
Disallow: /*set_filter=
Disallow: */auth.php
Disallow: /*?alfaction=
Disallow: /*?oid=
Disallow: /*?name=
Disallow: /*?form_id=
Disallow: /*&form_id=
Disallow: /*?bxajaxid=
Disallow: /*&bxajaxid=
Disallow: /*?view_result=
Disallow: /*&view_result=
Disallow: */resize_cache/
Allow: /bitrix/components/
Allow: /bitrix/cache/
Allow: /bitrix/js/
Allow: /bitrix/templates/
Disallow: /bitrix/panel/
Allow: /local/components/
Allow: /local/cache/
Allow: /local/js/
Allow: /local/templates/

User-Agent: SemrushBot
Disallow: /
User-Agent: MJ12bot
Disallow: /
User-Agent: AhrefsBot
Disallow: /
User-agent: gigabot
Disallow: /
User-agent: Gigabot/2.0
Disallow: /
User-agent: msnbot
Disallow: /
User-agent: msnbot/1.0
Disallow: /
User-agent: ia_archiver
Disallow: /
User-agent: libwww-perl
Disallow: /
User-agent: NetStat.Ru Agent
Disallow: /
User-agent: WebAlta Crawler/1.3.25
Disallow: /
User-agent: Yahoo!-MMCrawler/3.x
Disallow: /
User-agent: MMCrawler/3.x
Disallow: /
User-agent: NG/2.0
Disallow: /
User-agent: slurp
Disallow: /
User-agent: aipbot
Disallow: /
User-agent: Alexibot
Disallow: /
User-agent: GameSpyHTTP/1.0
Disallow: /
User-agent: Aqua_Products
Disallow: /
User-agent: asterias
Disallow: /
User-agent: b2w/0.1
Disallow: /
User-agent: BackDoorBot/1.0
Disallow: /
User-agent: becomebot
Disallow: /
User-agent: BlowFish/1.0
Disallow: /
User-agent: Bookmark search tool
Disallow: /
User-agent: BotALot
Disallow: /
User-agent: BotRightHere
Disallow: /
User-agent: BuiltBotTough
Disallow: /
User-agent: Bullseye/1.0
Disallow: /
User-agent: BunnySlippers
Disallow: /
User-agent: CheeseBot
Disallow: /
User-agent: CherryPicker
Disallow: /
User-agent: CherryPickerElite/1.0
Disallow: /
User-agent: CherryPickerSE/1.0
Disallow: /
User-agent: Copernic
Disallow: /
User-agent: CopyRightCheck
Disallow: /
User-agent: cosmos
Disallow: /
User-agent: Crescent
Disallow: /
User-agent: Crescent Internet ToolPak HTTP OLE Control v.1.0
Disallow: /
User-agent: DittoSpyder
Disallow: /
User-agent: EmailCollector
Disallow: /
User-agent: EmailSiphon
Disallow: /
User-agent: EmailWolf
Disallow: /
User-agent: EroCrawler
Disallow: /
User-agent: ExtractorPro
Disallow: /
User-agent: FairAd Client
Disallow: /
User-agent: Fasterfox
Disallow: /
User-agent: Flaming AttackBot
Disallow: /
User-agent: Foobot
Disallow: /
User-agent: Gaisbot
Disallow: /
User-agent: GetRight/4.2
Disallow: /
User-agent: Harvest/1.5
Disallow: /
User-agent: hloader
Disallow: /
User-agent: httplib
Disallow: /
User-agent: HTTrack 3.0
Disallow: /
User-agent: humanlinks
Disallow: /
User-agent: IconSurf
Disallow: /
User-agent: InfoNaviRobot
Disallow: /
User-agent: Iron33/1.0.2
Disallow: /
User-agent: JennyBot
Disallow: /
User-agent: Kenjin Spider
Disallow: /
User-agent: Keyword Density/0.9
Disallow: /
User-agent: larbin
Disallow: /
User-agent: LexiBot
Disallow: /
User-agent: libWeb/clsHTTP
Disallow: /
User-agent: LinkextractorPro
Disallow: /
User-agent: LinkScan/8.1a Unix
Disallow: /
User-agent: LinkWalker
Disallow: /
User-agent: LNSpiderguy
Disallow: /
User-agent: lwp-trivial
Disallow: /
User-agent: lwp-trivial/1.34
Disallow: /
User-agent: Mata Hari
Disallow: /
User-agent: Microsoft URL Control
Disallow: /
User-agent: Microsoft URL Control - 5.01.4511
Disallow: /
User-agent: Microsoft URL Control - 6.00.8169
Disallow: /
User-agent: MIIxpc
Disallow: /
User-agent: MIIxpc/4.2
Disallow: /
User-agent: Mister PiX
Disallow: /
User-agent: moget
Disallow: /
User-agent: moget/2.1
Disallow: /
User-agent: MSIECrawler
Disallow: /
User-agent: NetAnts
Disallow: /
User-agent: NICErsPRO
Disallow: /
User-agent: Offline Explorer
Disallow: /
User-agent: Openbot
Disallow: /
User-agent: Openfind
Disallow: /
User-agent: Openfind data gatherer
Disallow: /
User-agent: Oracle Ultra Search
Disallow: /
User-agent: PerMan
Disallow: /
User-agent: ProPowerBot/2.14
Disallow: /
User-agent: ProWebWalker
Disallow: /
User-agent: psbot
Disallow: /
User-agent: Python-urllib
Disallow: /
User-agent: QueryN Metasearch
Disallow: /
User-agent: Radiation Retriever 1.1
Disallow: /
User-agent: RepoMonkey
Disallow: /
User-agent: RepoMonkey Bait & Tackle/v1.01
Disallow: /
User-agent: RMA
Disallow: /
User-agent: searchpreview
Disallow: /
User-agent: SiteSnagger
Disallow: /
User-agent: SpankBot
Disallow: /
User-agent: spanner
Disallow: /
User-agent: SurveyBot
Disallow: /
User-agent: suzuran
Disallow: /
User-agent: Szukacz/1.4
Disallow: /
User-agent: Teleport
Disallow: /
User-agent: TeleportPro
Disallow: /
User-agent: Telesoft
Disallow: /
User-agent: The Intraformant
Disallow: /
User-agent: TheNomad
Disallow: /
User-agent: TightTwatBot
Disallow: /
User-agent: toCrawl/UrlDispatcher
Disallow: /
User-agent: True_Robot
Disallow: /
User-agent: True_Robot/1.0
Disallow: /
User-agent: turingos
Disallow: /
User-agent: TurnitinBot
Disallow: /
User-agent: TurnitinBot/1.5
Disallow: /
User-agent: URL Control
Disallow: /
User-agent: URL_Spider_Pro
Disallow: /
User-agent: URLy Warning
Disallow: /
User-agent: VCI
Disallow: /
User-agent: VCI WebViewer VCI WebViewer Win32
Disallow: /
User-agent: Web Image Collector
Disallow: /
User-agent: WebAuto
Disallow: /
User-agent: WebBandit
Disallow: /
User-agent: WebBandit/3.50
Disallow: /
User-agent: WebCapture 2.0
Disallow: /
User-agent: WebCopier
Disallow: /
User-agent: WebCopier v.2.2
Disallow: /
User-agent: WebCopier v3.2a
Disallow: /
User-agent: WebEnhancer
Disallow: /
User-agent: WebSauger
Disallow: /
User-agent: Website Quester
Disallow: /
User-agent: Webster Pro
Disallow: /
User-agent: WebStripper
Disallow: /
User-agent: WebZip
Disallow: /
User-agent: WebZip/4.0
Disallow: /
User-agent: WebZIP/4.21
Disallow: /
User-agent: WebZIP/5.0
Disallow: /
User-agent: Wget
Disallow: /
User-agent: wget
Disallow: /
User-agent: Wget/1.5.3
Disallow: /
User-agent: Wget/1.6
Disallow: /
User-agent: WWW-Collector-E
Disallow: /
User-agent: Xenu's
Disallow: /
User-agent: Xenu's Link Sleuth 1.1c
Disallow: /
User-agent: Zeus
Disallow: /
User-agent: Zeus 32297 Webster Pro V2.9 Win32
Disallow: /
User-agent: Zeus Link Scout
Disallow: /

Setting up robots.txt

Starting with version 14 of the Search Engine Optimization module, you no longer need to create the robots.txt file for your site manually. It can be generated with a built-in tool on the robots.txt management page (Marketing > Search Engine Optimization > robots.txt setup). The form on that page lets you create, manage, and monitor the site's robots.txt file. In the Bitrix administration panel the file can be generated automatically or corrected by hand; the settings are available at /bitrix/admin/seo_robots.php?lang=ru

If several sites run in one installation, use the context panel button to switch to the site whose robots.txt file you want to view or create.


On the General Rules tab, you create instructions that apply to all search engine bots. The required rules are generated using the form's buttons.

On the Yandex and Google tabs, the rules for the Yandex and Google bots are configured respectively. Special rules for specific bots are set up in the same way as the general rules, except that the basic rule set and the path to the sitemap file are not specified for them. In addition, links at the bottom of the form lead to the Yandex and Google documentation on robots.txt.

Common robots.txt errors

Closing pagination pages

  • In the proposed file, pagination pages remain open for indexing; only the duplicate of the first page is excluded, since it often opens both with the PAGEN parameter and without it. No other pagination pages are prohibited from indexing.

Olga Leontyeva, Marketing Specialist, APRIORUM GROUP:
Leave pagination pages open for indexing, but close their duplicates. Duplicates appear when the “show by” selector or sorting is used on the site. Duplicates must also be handled correctly when both paging and a “show all” option are present at the same time. Additionally, it is advisable to remove the duplicated category description from every pagination page except the first, and to add a unique page-specific element to the meta tags (for example, “Page 2 of 100”, “25 spare parts out of 1000”, or “Catalog page 2”).

Blank lines between directives

  • Empty lines between the User-agent, Disallow and Allow directives within a single block are not allowed.
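For example, the parser reads each group up to the first blank line, so a stray blank line breaks the group (a minimal illustration):

```
# Wrong: the blank line splits the group, and the Disallow
# rule is no longer tied to the User-agent above it.
User-agent: *

Disallow: /bitrix/

# Correct: directives of one group follow without blank lines;
# a blank line separates only different User-agent groups.
User-agent: *
Disallow: /bitrix/
```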

Incorrect file name case

  • The file name must be all lowercase: robots.txt.

Incorrect case of paths in the file

  • Robots treat substrings (file names, paths, robot names) as case-sensitive, while directive names are case-insensitive.

Closing the robots.txt file itself from crawling

  • If the robots.txt file itself is blocked, search robots ignore its directives.

Cyrillic usage

  • Cyrillic characters are not allowed in the robots.txt file or in the server's HTTP headers.

Use Punycode to specify domain names.

Invalid protocol

  • The protocol in the Sitemap directive must be updated after the site moves from http to https.

Extra features

Crawl-Delay

  • Yandex recommends using the crawl-rate setting in Yandex.Webmaster instead of the Crawl-delay directive.

Directives for Google from 2019

  • Since September 1, 2019, Google has stopped following directives that are not supported by or published in the Robots Exclusion Protocol (such as noindex in robots.txt). The change was announced on the company's blog.

Dynamic robots.txt for multi-regional or multisite setups

  • Instructions for setting up robots.txt for multisite configurations on subdomains (including multi-city sites) can be found in the 1C-Bitrix tutorials and, for example, in the instructions for Aspro solutions.
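Outside of Bitrix itself, the idea behind a dynamic robots.txt can be sketched as a simple per-host template substitution (a minimal illustration with hypothetical domain names and template contents, not actual Bitrix API code):

```python
# Sketch of a dynamic robots.txt for a multisite on subdomains:
# every city subdomain gets its own Host and Sitemap lines.
ROBOTS_TEMPLATE = """User-agent: *
Disallow: /bitrix/
Disallow: /personal/

Host: https://{host}
Sitemap: https://{host}/sitemap.xml
"""

def robots_for_host(host: str) -> str:
    """Return the robots.txt body for the requested subdomain."""
    return ROBOTS_TEMPLATE.format(host=host)

print(robots_for_host("spb.example.com"))
```

In a real setup the web server (or a Bitrix rewrite rule) would route requests for /robots.txt to a handler that calls such a function with the current host name.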

Robots.txt check

  • After publishing the file, check it: for example, you can validate robots.txt online with the Yandex.Webmaster tool.
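A quick local sanity check is also possible with Python's standard urllib.robotparser. Two caveats: it implements only the basic standard (no * or $ wildcards), and it applies the first matching rule, so an Allow override must be listed before the Disallow it relaxes:

```python
from urllib import robotparser

# A wildcard-free fragment of the proposed file.
ROBOTS = """\
User-agent: *
Allow: /bitrix/templates/
Disallow: /bitrix/
Disallow: /personal/
"""

rp = robotparser.RobotFileParser()
rp.parse(ROBOTS.splitlines())

print(rp.can_fetch("*", "/bitrix/templates/main/styles.css"))  # allowed by the override
print(rp.can_fetch("*", "/bitrix/admin/"))                     # blocked
print(rp.can_fetch("*", "/catalog/phones/"))                   # no rule applies, allowed
```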

Frequently asked questions about the Disallow directive

A frequent question is whether the following directives are equivalent:

Disallow: /auth/
Disallow: */auth/
Disallow: /auth
Disallow: /auth/*

The directives are not fully equivalent. Disallow: /auth/ prohibits the section http://site.ru/auth/ (starting from the first URL level), while pages like https://site.ru/info/auth/help/page remain available under this rule. Disallow: /auth/* is an alternative notation of the same directive. Disallow: /auth prohibits all URLs that start with /auth, so a page like http://site.ru/authentication is also blocked. Disallow: */auth/ correctly prohibits indexing of an /auth/ section at any nesting level.
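The difference can be reproduced with a small helper that interprets robots.txt patterns the way Yandex and Google do: rules match URL prefixes, * matches any character sequence, and a trailing $ anchors the end of the URL (a simplified sketch, not a full robots.txt parser):

```python
import re

def rule_matches(rule: str, path: str) -> bool:
    """True if a Disallow/Allow pattern applies to the given URL path.

    Prefix semantics: '*' matches any sequence of characters,
    a trailing '$' anchors the end of the URL, and anything
    may follow an unanchored rule.
    """
    anchored = rule.endswith("$")
    if anchored:
        rule = rule[:-1]
    regex = ".*".join(re.escape(part) for part in rule.split("*"))
    if not anchored:
        regex += ".*"
    return re.fullmatch(regex, path) is not None

print(rule_matches("/auth/", "/info/auth/help/page"))   # False: not a prefix
print(rule_matches("*/auth/", "/info/auth/help/page"))  # True: any level
print(rule_matches("/auth", "/authentication"))         # True: prefix match
```

The same helper shows why /*?PAGEN_1=1$ blocks only the exact duplicate of the first page: the anchored rule matches ?PAGEN_1=1 but not ?PAGEN_1=10.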

