HTML Method
The HTML method extracts feed URLs from HTML content by scanning link elements and anchor tags.
It follows the RSS Board Autodiscovery and WHATWG Feed Autodiscovery specifications.
How It Works
The HTML method scans two types of elements:
Link Elements
Looks for <link> elements that advertise feeds:
<!-- rel="alternate" with feed MIME type -->
<link rel="alternate" type="application/rss+xml" href="/feed.xml" />
<!-- rel="feed" (WHATWG spec) -->
<link rel="feed" href="/feed" />
Anchor Elements
Scans <a> tags for feed links using two strategies:
- URI matching: checks if href contains common feed paths like /feed or /rss.xml.
- Label matching: checks if link text contains words like "RSS", "Feed", "Subscribe".
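The two strategies above can be sketched roughly as follows. This is a minimal illustration, not Feedscout's actual implementation: the AnchorLike type and the matchesUri, matchesLabel, and isFeedAnchor helpers are hypothetical names, and the case-insensitive substring comparison is an assumption.

```typescript
// Hypothetical shape for an anchor; not part of Feedscout's API.
type AnchorLike = { href: string; text: string }

// URI matching: does the href contain one of the known feed paths?
// Assumption: plain, case-insensitive substring comparison.
const matchesUri = (href: string, uris: string[]): boolean => {
  const lowerHref = href.toLowerCase()
  return uris.some((uri) => lowerHref.indexOf(uri.toLowerCase()) !== -1)
}

// Label matching: does the visible link text contain a feed-related word?
const matchesLabel = (text: string, labels: string[]): boolean => {
  const lowerText = text.toLowerCase()
  return labels.some((label) => lowerText.indexOf(label.toLowerCase()) !== -1)
}

// An anchor is a candidate if either strategy matches.
const isFeedAnchor = (anchor: AnchorLike, uris: string[], labels: string[]): boolean =>
  matchesUri(anchor.href, uris) || matchesLabel(anchor.text, labels)
```

Under these assumptions, an anchor pointing at /feed.xml is accepted by its URI even when its text is just "XML", while an anchor pointing at /subscribe is accepted only because its text mentions "RSS".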
<!-- Matched by URI -->
<a href="/feed.xml">XML</a>
<!-- Matched by label -->
<a href="/subscribe">RSS Feed</a>
Configuration
Feedscout comes with reasonable defaults, but you can customize how HTML is parsed if needed.
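To make the selector shape used in the options below more concrete, here is a minimal sketch of how a { rel, types } selector could be evaluated against a <link> element. It is an illustration under stated assumptions, not Feedscout's implementation: the LinkLike type, the matchesSelector and isFeedLink helpers, the case-insensitive comparison, and the rule that a selector without types accepts any type are all assumptions. The two MIME types stand in for the library's mimeTypes list.

```typescript
// Hypothetical types mirroring the { rel, types } shape from the config examples.
type LinkSelector = { rel: string; types?: string[] }
type LinkLike = { rel: string; type?: string; href: string }

const matchesSelector = (link: LinkLike, selector: LinkSelector): boolean => {
  // In HTML, rel is a space-separated token list compared case-insensitively.
  const relTokens = link.rel.toLowerCase().split(/\s+/).filter(Boolean)
  const wantedRel = selector.rel.toLowerCase()
  if (!relTokens.some((token) => token === wantedRel)) return false
  // Assumption: a selector without `types` accepts any (or no) type attribute.
  if (!selector.types) return true
  const type = (link.type ?? '').trim().toLowerCase()
  return selector.types.some((candidate) => candidate.toLowerCase() === type)
}

// Stand-in selector list; the MIME types here are illustrative only.
const sketchSelectors: LinkSelector[] = [
  { rel: 'alternate', types: ['application/rss+xml', 'application/atom+xml'] },
  { rel: 'feed' },
]

// A link counts as a feed link if any selector matches it.
const isFeedLink = (link: LinkLike): boolean =>
  sketchSelectors.some((selector) => matchesSelector(link, selector))
```

With this model, rel="alternate" only counts when paired with a listed feed MIME type, so an alternate link to a translated HTML page is ignored, while rel="feed" is accepted regardless of type.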
Link Selectors
Control which <link> elements are matched:
import { mimeTypes } from 'feedscout/feeds'
const feeds = await discoverFeeds(url, {
  methods: {
    html: {
      linkSelectors: [
        { rel: 'alternate', types: mimeTypes },
        { rel: 'feed' },
      ],
    },
  },
})
Anchor URIs
Specify URI patterns to match in anchor href attributes:
const feeds = await discoverFeeds(url, {
  methods: {
    html: {
      anchorUris: ['/feed', '/rss', '/atom', '/rss.xml', '/feed.xml'],
    },
  },
})
Anchor Labels
Specify text patterns to match in anchor content:
const feeds = await discoverFeeds(url, {
  methods: {
    html: {
      anchorLabels: ['rss', 'feed', 'atom', 'subscribe'],
    },
  },
})
Ignored URIs
Exclude certain URI patterns from anchor matching:
const feeds = await discoverFeeds(url, {
  methods: {
    html: {
      anchorIgnoredUris: ['wp-json/oembed/', 'wp-json/wp/'],
    },
  },
})
Default Values
You can import the default HTML options:
import { defaultHtmlOptions } from 'feedscout/feeds'
The defaults include comprehensive anchor URIs and common feed-related labels. You can also import individual pieces:
import {
  linkSelectors,
  anchorLabels,
  urisComprehensive,
  ignoredUris,
} from 'feedscout/feeds'
Using Directly
You can use the HTML discovery function directly to get URIs without validation:
import { discoverUrisFromHtml } from 'feedscout/methods'
const uris = discoverUrisFromHtml(htmlContent, {
  baseUrl: 'https://example.com',
  linkSelectors: [{ rel: 'alternate', types: ['application/rss+xml'] }],
  anchorUris: ['/feed'],
  anchorLabels: ['rss'],
  anchorIgnoredUris: [],
})
// [
//   'https://example.com/feed.xml',
//   'https://example.com/rss',
// ]
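The absolute URLs in the result above suggest that relative href values are resolved against baseUrl. A minimal sketch of that step, using the standard URL constructor, might look like this. The resolveUris helper is a hypothetical name, and both the de-duplication and the silent skipping of unparseable values are assumptions rather than documented Feedscout behavior.

```typescript
// Hypothetical helper: resolve hrefs against a base URL and drop duplicates.
const resolveUris = (hrefs: string[], baseUrl: string): string[] => {
  const resolved: string[] = []
  for (const href of hrefs) {
    try {
      // The WHATWG URL constructor resolves relative references against the base.
      const absolute = new URL(href, baseUrl).href
      // Assumption: each absolute URL is reported only once.
      if (resolved.indexOf(absolute) === -1) resolved.push(absolute)
    } catch {
      // Assumption: hrefs that cannot be parsed as URLs are skipped.
    }
  }
  return resolved
}

const resolvedUris = resolveUris(['/feed.xml', 'rss', '/feed.xml'], 'https://example.com')
```

Here resolvedUris holds two entries, https://example.com/feed.xml and https://example.com/rss, because the repeated /feed.xml collapses into one.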