9 changes: 9 additions & 0 deletions lib/routes/cna/utils.ts
@@ -14,6 +14,15 @@ export async function getFullText(item) {
const content = load(detailResponse.data);
content('div.SubscriptionInner').remove();
content('.gmailNews').remove();

// These boxes explain terms. They are injected inline and interrupt reading.
// Readers who want to learn about a term can look it up online.
content('.dictionary-box').remove();
// These are the "延伸閱讀" (further reading) links. On the web page, they are styled with "display:block;".
// However, some RSS readers, such as TT-RSS, sanitize HTML aggressively and strip <style> tags and style= attributes.
// Wrap each link in a div so that it still renders on its own line.
content('.moreArticle-link').wrap('<div></div>');
Collaborator

Remove .moreArticle as well

Contributor Author

Do you mean removing the entire .moreArticle block and all the links inside it? I would prefer to keep them, as those links help readers understand the bigger picture through related events.

Collaborator

Yes, that content isn't part of the main article.

Contributor Author

I agree that those links are not part of the main article, but I believe keeping them is better for cna. Otherwise, useful information that appears on the web page is stripped from the feed. https://rsshub.netlify.app/joinus/new-rss/start-code#better-reading-experience gives an example of full article extraction that "will have a similar reading experience to the original website", and my idea roughly follows that spirit.

Ideally, the description box would be preserved as well, but I cannot figure out a way to represent it properly, particularly in restrictive RSS readers like TT-RSS. I therefore remove it to keep the reading experience closer to the web page: an uninterrupted reading flow.
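
To make the sanitizer concern in this thread concrete, here is a minimal, dependency-free sketch. It is not the code in this PR: the sample markup, `stripStyles`, and `wrapMoreArticleLinks` are hypothetical names, and the regular expressions are only rough stand-ins for a reader-side sanitizer and for the `.wrap('<div></div>')` call in the diff. It assumes each related link is a single `<a class="moreArticle-link">` element.

```typescript
// Hypothetical sample markup; the real cna page structure may differ.
const sample: string =
    '<p>Body text.</p>' +
    '<a class="moreArticle-link" href="/news/1" style="display:block;">Related story 1</a>' +
    '<a class="moreArticle-link" href="/news/2" style="display:block;">Related story 2</a>';

// Rough stand-in for an aggressive sanitizer (assumption: it drops
// <style> elements and style= attributes, as described for TT-RSS above).
function stripStyles(html: string): string {
    return html
        .replace(/<style\b[^>]*>[\s\S]*?<\/style>/gi, '')
        .replace(/\sstyle\s*=\s*("[^"]*"|'[^']*')/gi, '');
}

// Regex stand-in for content('.moreArticle-link').wrap('<div></div>'):
// each matching <a> element is wrapped in its own <div>.
function wrapMoreArticleLinks(html: string): string {
    return html.replace(
        /<a\b[^>]*class="moreArticle-link"[^>]*>[\s\S]*?<\/a>/g,
        (link: string) => `<div>${link}</div>`
    );
}

// Without wrapping, the links lose "display:block;" and become adjacent inline elements.
const unwrapped: string = stripStyles(sample);

// With wrapping, each link keeps its own block-level container after sanitizing.
const wrapped: string = stripStyles(wrapMoreArticleLinks(sample));
```

In `unwrapped`, the two anchors sit directly next to each other with no styling left, so a reader would typically render them on one line. In `wrapped`, each anchor is enclosed in a `<div>`, which is block-level by default and therefore does not depend on the stripped `style=` attribute.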


const topImage = content('.fullPic').html();

item.description = (topImage === null ? '' : topImage) + content('.paragraph').eq(0).html();