liyilin / intel_promotion_api
Commit c56a641a, authored Jul 10, 2024 by 刘帅阳
Modify
Parent: 6cf5d5fc
Showing 2 changed files with 30 additions and 17 deletions
CrawlerController.java (...cisoft/business/crawler/controller/CrawlerController.java): +16 -16
WebsiteCrawlerServiceImpl.java (...iness/crawler/service/impl/WebsiteCrawlerServiceImpl.java): +14 -1
src/main/java/org/rcisoft/business/crawler/controller/CrawlerController.java
@@ -46,7 +46,7 @@ public class CrawlerController {
      * Crawl page data from the web client + scheduled crawling
      */
     @GetMapping(value = "/start")
-    public CyResult start() {
+    public CyResult start() throws Exception {
         CmsTask cmsTask = cmsTaskService.add("web" + new Date());
         // Retrieve the current user
         String authenBusinessId = CyUserUtil.getAuthenBusinessId();
src/main/java/org/rcisoft/business/crawler/service/impl/WebsiteCrawlerServiceImpl.java
@@ -54,6 +54,7 @@ public class WebsiteCrawlerServiceImpl implements WebsiteCrawlerService {
     public static final String CS_COM_CN = "https://www.cs.com.cn/";
     public static final String CBIMC_CN = "http://www.cbimc.cn/";
     public static final String E_CHINALIFE_COM = "https://www.e-chinalife.com/";
+    public static final String PEOPLEAPP_COM = "https://www.peopleapp.com/";
     /**
      * Specified URL
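Each supported site adds one constant in this hunk and one `else if (articleUrl.contains(...))` branch further down in the commit. If the list keeps growing, a lookup table is one possible alternative. The sketch below assumes a parser can be expressed as a function from a page's HTML string to a field map; `ParserRegistry`, `register`, and `find` are hypothetical names that do not appear in the commit, and the real parsers take a Jsoup `Document` rather than a `String`.

```java
import java.util.LinkedHashMap;
import java.util.Map;
import java.util.Optional;
import java.util.function.Function;

// Hypothetical registry, not part of the commit: maps a site prefix to its parser.
final class ParserRegistry {

    // LinkedHashMap keeps registration order, so earlier entries win, like the else-if chain.
    private final Map<String, Function<String, Map<String, String>>> parsers = new LinkedHashMap<>();

    void register(String sitePrefix, Function<String, Map<String, String>> parser) {
        parsers.put(sitePrefix, parser);
    }

    // Returns the first parser whose prefix occurs in the URL, mirroring articleUrl.contains(...).
    Optional<Function<String, Map<String, String>>> find(String articleUrl) {
        for (Map.Entry<String, Function<String, Map<String, String>>> entry : parsers.entrySet()) {
            if (articleUrl.contains(entry.getKey())) {
                return Optional.of(entry.getValue());
            }
        }
        return Optional.empty();
    }
}
```

An empty `Optional` makes the "no parser matched" case explicit. In the else-if chain that case depends on how `map` was initialised, which this diff does not show.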
@@ -366,10 +367,12 @@ public class WebsiteCrawlerServiceImpl implements WebsiteCrawlerService {
                 map = getFinancePeople(doc);
             } else if (articleUrl.contains(FINANCE_CHINA_COM_CN)) {
                 map = getFinanceChina(doc);
+            } else if (articleUrl.contains(PEOPLEAPP_COM)) {
+                map = getPeopleAppCom(doc);
             }
             // Use the title to check whether the current article duplicates one already in the database
             String title = cmsNewsService.getNewsByTitleByTitle(map.get("title"));
-            if (title == null) {
+            if (title == null && map.containsKey(title) && map.containsKey("content")) {
                 // Image conversion
                 Document parse = Jsoup.parse(map.get("content"));
                 replaceImgSrc(parse);
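One thing worth checking in the new condition: it only proceeds when `title == null`, so `map.containsKey(title)` looks up a `null` key rather than the `"title"` entry. For a `HashMap` with no `null` key, such as the one `getPeopleAppCom` returns, that lookup is false and the branch would not run. Below is a minimal sketch of what the guard may have been meant to check, assuming the intent is "no duplicate in the database, and the parser produced a non-empty title and content". `ArticleGuard`, `shouldStore`, and `committedCondition` are hypothetical names, not part of the commit.

```java
import java.util.Map;

// Hypothetical helper, not part of the commit: decides whether a parsed article should be stored.
final class ArticleGuard {

    private ArticleGuard() {
    }

    /**
     * @param existingTitle result of the duplicate lookup; null means no stored article has this title
     * @param parsed        map produced by a site-specific parser; may be null if no parser matched
     */
    static boolean shouldStore(String existingTitle, Map<String, String> parsed) {
        if (existingTitle != null || parsed == null) {
            return false;
        }
        return hasText(parsed.get("title")) && hasText(parsed.get("content"));
    }

    private static boolean hasText(String value) {
        return value != null && !value.trim().isEmpty();
    }

    // Reproduces the condition from the diff so the null-key lookup can be compared with the sketch.
    static boolean committedCondition(String title, Map<String, String> map) {
        return title == null && map.containsKey(title) && map.containsKey("content");
    }
}
```

Checking values instead of keys also covers parsers that always `put` both entries: `containsKey` would be true for them even when the selector matched nothing and the value is an empty string.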
@@ -395,6 +398,16 @@ public class WebsiteCrawlerServiceImpl implements WebsiteCrawlerService {
     }
+    private Map<String, String> getPeopleAppCom(Document document) {
+        Map<String, String> map = new HashMap<>();
+        String title = document.select("div.title").html();
+        String content = document.select("body").html();
+        map.put("title", title);
+        map.put("content", content);
+        return map;
+    }
     /**
      * Image conversion, to guard against hotlinking
      *