liyilin / intel_promotion_api

Commit c56a641a, authored Jul 10, 2024 by 刘帅阳
Parent: 6cf5d5fc

Commit message: 修改 ("Modify")

Showing 2 changed files with 30 additions and 17 deletions:
CrawlerController.java: +16 -16
WebsiteCrawlerServiceImpl.java: +14 -1
src/main/java/org/rcisoft/business/crawler/controller/CrawlerController.java

```diff
@@ -46,7 +46,7 @@ public class CrawlerController {
      * Web-side page crawling + scheduled crawling
      */
     @GetMapping(value = "/start")
-    public CyResult start() {
+    public CyResult start() throws Exception {
         CmsTask cmsTask = cmsTaskService.add("web" + new Date());
         // Get the current user
         String authenBusinessId = CyUserUtil.getAuthenBusinessId();
```
In the next hunk, the 15 lines from `try {` through the closing brace of the `catch` block are counted as changed, but the old and new versions appear identical apart from whitespace, so the change looks like re-indentation after the `new Thread(() -> { ... }).start()` wrapper was commented out. Those lines are shown once below.

```diff
@@ -111,22 +111,22 @@ public class CrawlerController {
         String authenBusinessId = CyUserUtil.getAuthenBusinessId();
 //        new Thread(() -> {
         // Call the interface
         try {
             Integer count = publicAccountCrawlerService.scanImage(publicAccountNames);
             cmsTask.setStatus("1");
             cmsTask.setUpdateBy(authenBusinessId);
             cmsTask.setNum(count);
             cmsTaskService.merge(cmsTask);
         } catch (Exception e) {
             // Exception thrown
             e.printStackTrace();
             System.out.println("------------------" + e);
             // On exception, update the task record in the database and return
             cmsTask.setStatus("2");
             cmsTask.setUpdateBy(authenBusinessId);
             cmsTaskService.merge(cmsTask);
         }
 //        }).start();
         return CyResultGenUtil.builder(new CyPersistModel(1),
                 CyMessCons.MESSAGE_ALERT_SUCCESS,
 ...
```
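The hunk above runs the crawl synchronously (the `new Thread` wrapper stays commented out) and records the outcome on the task: status `"1"` plus the crawl count on success, status `"2"` on failure. The following is a minimal, self-contained sketch of that pattern, not the project's code: `CmsTaskSketch` and `CrawlTaskRunner` are hypothetical stand-ins for the `CmsTask` entity and the controller logic, the `Callable` argument stands in for `publicAccountCrawlerService.scanImage(...)`, and persistence via `cmsTaskService.merge` is omitted.

```java
import java.util.concurrent.Callable;

// Hypothetical stand-in for the project's CmsTask entity; only the fields
// touched in this hunk (status, updateBy, num) are modelled.
class CmsTaskSketch {
    String status;   // "1" = success, "2" = failure, as used in the diff
    String updateBy;
    Integer num;
}

public class CrawlTaskRunner {
    // Runs the crawl on the calling thread and records the outcome on the task,
    // mirroring the try/catch in CrawlerController after this commit.
    static CmsTaskSketch run(Callable<Integer> crawl, String userId) {
        CmsTaskSketch task = new CmsTaskSketch();
        try {
            Integer count = crawl.call();
            task.status = "1";
            task.updateBy = userId;
            task.num = count;
        } catch (Exception e) {
            System.out.println("------------------" + e);
            task.status = "2";
            task.updateBy = userId;
            // num is left unset on failure, as in the original catch block
        }
        // The real controller would persist the task here (cmsTaskService.merge).
        return task;
    }

    public static void main(String[] args) {
        CmsTaskSketch ok = run(() -> 5, "user-1");
        System.out.println(ok.status + " " + ok.num);          // prints "1 5"
        CmsTaskSketch failed = run(() -> { throw new Exception("crawl failed"); }, "user-1");
        System.out.println(failed.status + " " + failed.num);  // prints "2 null"
    }
}
```

Because the work now runs on the request thread, the endpoint only returns its success result after the crawl has finished (or failed).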
src/main/java/org/rcisoft/business/crawler/service/impl/WebsiteCrawlerServiceImpl.java

```diff
@@ -54,6 +54,7 @@ public class WebsiteCrawlerServiceImpl implements WebsiteCrawlerService {
     public static final String CS_COM_CN = "https://www.cs.com.cn/";
     public static final String CBIMC_CN = "http://www.cbimc.cn/";
     public static final String E_CHINALIFE_COM = "https://www.e-chinalife.com/";
+    public static final String PEOPLEAPP_COM = "https://www.peopleapp.com/";
     /**
      * Specified URL
```
```diff
@@ -366,10 +367,12 @@ public class WebsiteCrawlerServiceImpl implements WebsiteCrawlerService {
                 map = getFinancePeople(doc);
             } else if (articleUrl.contains(FINANCE_CHINA_COM_CN)) {
                 map = getFinanceChina(doc);
+            } else if (articleUrl.contains(PEOPLEAPP_COM)) {
+                map = getPeopleAppCom(doc);
             }
             // Use the title to check whether the current article already exists in the database
             String title = cmsNewsService.getNewsByTitleByTitle(map.get("title"));
-            if (title == null) {
+            if (title == null && map.containsKey(title) && map.containsKey("content")) {
                 // Image conversion
                 Document parse = Jsoup.parse(map.get("content"));
                 replaceImgSrc(parse);
```
@@ -395,6 +398,16 @@ public class WebsiteCrawlerServiceImpl implements WebsiteCrawlerService {
...
@@ -395,6 +398,16 @@ public class WebsiteCrawlerServiceImpl implements WebsiteCrawlerService {
}
}
private
Map
<
String
,
String
>
getPeopleAppCom
(
Document
document
)
{
Map
<
String
,
String
>
map
=
new
HashMap
<>();
String
title
=
document
.
select
(
"div.title"
).
html
();
String
content
=
document
.
select
(
"body"
).
html
();
map
.
put
(
"title"
,
title
);
map
.
put
(
"content"
,
content
);
return
map
;
}
/**
/**
* 图片转换,防止盗链
* 图片转换,防止盗链
*
*
...
...