The Basic Workflow and Principles of Search Engines | 威海佰年网络技术有限公司


The Basic Workflow and Principles of Search Engines

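A search engine works in four basic steps:
1. Web crawling: the search engine first uses web crawlers to fetch pages and parse them, extracting the text, images, links, and other information they contain.
2. Index storage: it then stores this information in an index so that later searches can be served from it.
3. Keyword search: when a user enters keywords, the search engine segments them into terms, looks those terms up, finds the matching pages, and sorts them by relevance.
4. Result display: finally, the search engine presents the results to the user according to a given algorithm, usually as a list.

The underlying principles draw mainly on the theory and techniques of information retrieval, including the following:
1. Word segmentation: the natural-language text entered by the user is split according to a set of rules into the corresponding terms.
2. Inverted index: the text fetched by the crawler is split into terms, and each term is stored under its own index entry. The result is an inverted index table covering all terms, which speeds up lookups.
3. Similarity calculation: the segmented query terms are matched against the entries in the inverted index to compute how similar each page is to the query, which is what makes relevance ranking possible.
4. Algorithm optimization: the key techniques in the search engine are tuned continuously to improve accuracy and speed and to keep up with what users search for.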
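
The segmentation, inverted index, and similarity steps above can be sketched in a few lines of Python. This is only an illustration under stated assumptions: the sample documents, the function names `tokenize`, `build_index`, and `search`, and the use of a simple TF-IDF score are all choices made here, not something the article specifies. The tokenizer also only handles space-separated Latin text; Chinese text would need a real word segmenter.

```python
import math
import re
from collections import Counter, defaultdict


def tokenize(text):
    # Naive tokenizer (assumption): lowercase, split on non-alphanumerics.
    # Chinese text would need a dedicated word segmenter instead.
    return re.findall(r"[a-z0-9]+", text.lower())


def build_index(docs):
    # Inverted index: term -> {doc_id: term frequency in that document}.
    index = defaultdict(dict)
    for doc_id, text in docs.items():
        for term, count in Counter(tokenize(text)).items():
            index[term][doc_id] = count
    return index


def search(index, num_docs, query):
    # Score each matching document by summing TF-IDF over the query terms.
    scores = defaultdict(float)
    for term in tokenize(query):
        postings = index.get(term, {})
        if not postings:
            continue
        idf = math.log(1 + num_docs / len(postings))
        for doc_id, tf in postings.items():
            scores[doc_id] += tf * idf
    # Highest score first; ties broken by doc_id for a stable order.
    return sorted(scores, key=lambda d: (-scores[d], d))


# Hypothetical sample documents, for illustration only.
docs = {
    "a": "search engines crawl pages and build an index",
    "b": "an inverted index maps each term to the pages that contain it",
    "c": "cooking pasta at home",
}
index = build_index(docs)
results = search(index, len(docs), "inverted index")
```

Here document `b` matches both query terms and so ranks ahead of `a`, which matches only `index`; document `c` matches nothing and is never scored. Production engines use far richer ranking signals than this single score.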

Public @ 2023-04-02 03:00:29

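Overview of the Search Engine Retrieval System

An earlier article briefly introduced the search engine's indexing system. In practice, the last stage of building an inverted index also includes a step that writes the index to storage. For efficiency, this step stores all terms and their offsets in the file header and compresses the data; the details are too technical to cover here. This article briefly introduces the retrieval system that comes after indexing. The retrieval system has five main parts, as shown in the figure below: 索引&检索.jpg (1) Query segmentation: the user's query is segmented into terms in preparation for the lookup that follows. Taking “1…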

Public @ 2011-11-07 16:21:49

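Data Analysis: How to Track a Visitor's Original Source

Understanding how a site is performing and who its users are is an important basis for keeping the site healthy and growing, so reading the data and analyzing it is something site optimizers must do every day. Last week the platform published the "Website Analytics White Paper (Webmaster Edition)". This week the editor came across another very good hands-on article, "How to Track a Visitor's Original Source in Google Analytics". Its author, 马骏, is a website visitor behavior analyst certified under Google Analytics IQ. On learning that the platform would repost the article, he thoughtfully converted all of the English content in the original into Chinese…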

Public @ 2020-09-06 16:21:48

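Crawling, Fetching, Indexing, Inclusion: What Do They Each Mean?

A reader left this comment on the post "What Is Spider Crawl Budget": "That's not right. The index tag tells the spider it may fetch the page, so doesn't noindex mean the page may not be fetched?! Then why do the notes at the end of the article say 'The noindex tag cannot save crawl budget. For the search engine to know that a page carries a noindex tag, it first has to fetch the page, so no crawl budget is saved.'" The comment shows that this reader has not quite understood what fetching is, what indexing is, or what the index and noindex tags mean.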
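
The point in the quoted note can be made concrete with a small sketch: a `noindex` directive lives inside the page's HTML, so a crawler only learns about it after the page has been fetched and parsed. The class `RobotsMetaParser`, the function `may_index`, and the two sample pages are hypothetical names and data introduced here for illustration; only Python's standard `html.parser` module is used.

```python
from html.parser import HTMLParser


class RobotsMetaParser(HTMLParser):
    """Collects directives from <meta name="robots" content="..."> tags."""

    def __init__(self):
        super().__init__()
        self.directives = set()

    def handle_starttag(self, tag, attrs):
        if tag != "meta":
            return
        attr = dict(attrs)
        if (attr.get("name") or "").lower() == "robots":
            content = attr.get("content") or ""
            for part in content.split(","):
                if part.strip():
                    self.directives.add(part.strip().lower())


def may_index(html):
    # The HTML passed in has already been fetched. Only at this point can
    # a noindex directive be discovered, so fetching was not avoided.
    parser = RobotsMetaParser()
    parser.feed(html)
    return "noindex" not in parser.directives


# Hypothetical pages, for illustration only.
page_a = '<html><head><meta name="robots" content="noindex, follow"></head><body>x</body></html>'
page_b = "<html><head><title>t</title></head><body>y</body></html>"

index_a = may_index(page_a)
index_b = may_index(page_b)
```

In this sketch `page_a` is fetched and parsed but then kept out of the index, while `page_b` is eligible for indexing. Both pages cost one fetch, which is why the tag does not save crawl budget.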

Public @ 2021-09-23 16:21:52

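Search Engine Workflow

The search engine workflow consists mainly of data collection, data preprocessing, data processing, and result display. These stages rely on web crawlers, Chinese word segmentation, big data processing, data mining, and related techniques. A web crawler, also called a spider or web robot, is a key component of the search engine's fetching system. Following a set of rules, it starts from certain seed sites and traverses the internet through the hyperlinks on each page, using URL references and a breadth-first traversal strategy to move from one HTML document to the next and fetch information. Chinese word segmentation is…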
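
The breadth-first traversal described above can be sketched without any network access by treating the web as an in-memory link graph. The function name `bfs_crawl`, the `max_pages` limit, and the sample graph are assumptions made for this example; a real crawler would fetch each URL, extract its links, and respect politeness rules such as robots.txt.

```python
from collections import deque


def bfs_crawl(link_graph, seeds, max_pages=100):
    # link_graph maps a URL to the list of URLs it links to (assumption:
    # this stands in for fetching the page and extracting its links).
    visited = []
    seen = set(seeds)
    queue = deque(seeds)
    while queue and len(visited) < max_pages:
        url = queue.popleft()          # FIFO order gives breadth-first traversal
        visited.append(url)
        for link in link_graph.get(url, []):
            if link not in seen:       # never enqueue the same URL twice
                seen.add(link)
                queue.append(link)
    return visited


# Hypothetical site structure, for illustration only.
link_graph = {
    "/": ["/a", "/b"],
    "/a": ["/c", "/"],
    "/b": ["/c", "/d"],
    "/c": [],
    "/d": ["/a"],
}
order = bfs_crawl(link_graph, ["/"])
```

Pages one link away from the seed (`/a`, `/b`) are visited before pages two links away (`/c`, `/d`), and the `seen` set keeps link cycles from causing repeat visits.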

Public @ 2017-09-27 16:22:24
