Skip to content
New issue

Have a question about this project? Sign up for a free GitHub account to open an issue and contact its maintainers and the community.

By clicking “Sign up for GitHub”, you agree to our terms of service and privacy statement. We’ll occasionally send you account related emails.

Already on GitHub? Sign in to your account

Introduce two new options to enabled/disabled the Private/ICANN domains. #123

Closed
wants to merge 1 commit into from

Conversation

remusao
Copy link
Collaborator

@remusao remusao commented Mar 18, 2018

When given as an option to fromUserSettings:

  • allowIcann set to false will ignore the ICANN section of the list.
  • allowPrivate set to false will ignore the PRIVATE section of the list.

This is an attempt to address the recurring issue about non-intuitive results on some domains. In particular, it allows to disable the use of the PRIVATE section from public suffix list (You can also disable the use of ICANN domains but it's probably less useful).

Instantiate tld.js with the custom option:

> const tldjs = require('./index.js')
> const custom = tld.fromUserSettings({ allowPrivate: false })

#120

> custom.parse('acc1sub1-dot-moisestest.appspot.com')
{ hostname: 'acc1sub1-dot-moisestest.appspot.com',
  isValid: true,
  isIp: false,
  tldExists: true,
  publicSuffix: 'com',
  domain: 'appspot.com',
  subdomain: 'acc1sub1-dot-moisestest' }

#117

> custom.parse('https://github.io')
{ hostname: 'github.io',
  isValid: true,
  isIp: false,
  tldExists: true,
  publicSuffix: 'io',
  domain: 'github.io',
  subdomain: '' }
> custom.parse('https://ngrok.io')
{ hostname: 'ngrok.io',
  isValid: true,
  isIp: false,
  tldExists: true,
  publicSuffix: 'io',
  domain: 'ngrok.io',
  subdomain: '' }
> custom.parse('https://sandcats.io')
{ hostname: 'sandcats.io',
  isValid: true,
  isIp: false,
  tldExists: true,
  publicSuffix: 'io',
  domain: 'sandcats.io',
  subdomain: '' }

#111

> custom.parse('http://bntp-assets.global.ssl.fastly.net/')
{ hostname: 'bntp-assets.global.ssl.fastly.net',
  isValid: true,
  isIp: false,
  tldExists: true,
  publicSuffix: 'net',
  domain: 'fastly.net',
  subdomain: 'bntp-assets.global.ssl' }
> custom.parse('http://cres-wpstatic.freetls.fastly.net/')
{ hostname: 'cres-wpstatic.freetls.fastly.net',
  isValid: true,
  isIp: false,
  tldExists: true,
  publicSuffix: 'net',
  domain: 'fastly.net',
  subdomain: 'cres-wpstatic.freetls' }
> custom.parse('http://r3engage-live.global.ssl.fastly.net/')
{ hostname: 'r3engage-live.global.ssl.fastly.net',
  isValid: true,
  isIp: false,
  tldExists: true,
  publicSuffix: 'net',
  domain: 'fastly.net',
  subdomain: 'r3engage-live.global.ssl' }
> custom.parse('http://ticket-magic-ember-herokuapp-com.global.ssl.fastly.net/')
{ hostname: 'ticket-magic-ember-herokuapp-com.global.ssl.fastly.net',
  isValid: true,
  isIp: false,
  tldExists: true,
  publicSuffix: 'net',
  domain: 'fastly.net',
  subdomain: 'ticket-magic-ember-herokuapp-com.global.ssl' }

#25

> custom.parse('blogspot.co.uk')
{ hostname: 'blogspot.co.uk',
  isValid: true,
  isIp: false,
  tldExists: true,
  publicSuffix: 'co.uk',
  domain: 'blogspot.co.uk',
  subdomain: '' }

So most of the issues are solved at this point. There are a few corner cases not handled by this change, but they could be with a few more options to customize the behavior of tld.js. In particular, #78 could benefit from an option to disable the use of wildcard rules (such as *.er which will consider google.er to have public suffix google.er).

@oncletom Do you think it could be useful to also add two fields in the resulting object from parse: isIcann and isPrivate. This would allow to know what kind of rule was used to parse a given URL/hostname.

Also, this PR introduces a breaking change in the way the trie is serialized/represented. Instead of having value 0 as leaf values, it now has either 1 (ICANN) or 2 (Private).

@thom4parisot
Copy link
Owner

Super nice!


writing out loud

My initial intention with this library was to use PublicSuffix as a database to identify subdomains and make sure TLD were well-known.

I think a lot of confusion comes from the fact some functions are here to serve the purpose of PublicSuffix (in a good way), and some are pure utility functions (independant from PublicSuffix).


Technically speaking, it's all good.

I'd rather give a meaningful name to the options rather than having make users to understand whether things are part of a Public or Private part of a given database.
I feel it would be more portable too.

So for example when you write

allowIcann set to false will ignore the ICANN section of the list.
allowPrivate set to false will ignore the PRIVATE section of the list.

What do both lines mean from a library perspective? What difference does it make in term of results?

@@ -12,17 +12,25 @@ var extractTldFromHost = require('./from-host.js');
* @param {string} hostname
* @return {string}
*/
module.exports = function getPublicSuffix(rules, hostname) {
module.exports = function getPublicSuffix(rules, hostname, allowIcann, allowPrivate) {
Copy link
Owner

Choose a reason for hiding this comment

The reason will be displayed to describe this comment to others. Learn more.

Here, I'd change allowIcann, allowPrivate in an options object.

At the same time, if it's not part of the public API, it does not matter as much.

@remusao
Copy link
Collaborator Author

remusao commented Mar 19, 2018

Thanks for the review! At the beginning I was wondering if we could have a set of options to tweak the behavior of tld.js (disable Private section, disable wildcard, etc.), which could be behind a more user-friendly flag (not sure about the right name for this), and would result overall in a less surprising behavior for tld.js. I'm a bit torn between the two options:

  1. In one way, I agree that it would be super nice to have more high-level names for the options, to communicate what it means in terms of behavior of tld.js.
  2. On the other hand, since there is no silver bullet and because we might need to combine several tweaks to make the overall behavior less surprising (which still might not work in all cases), it would be nice to leave users the possibility to control the options in a granular way.

Maybe there is a way to get the best of both worlds? By having some low-level options as well as high-level ones, which will translate to lower-level options when enabled? Or we could make the distinction between the public API exposed by tld.js, which should not be about the implementation details with public suffix list, and then the implementation in terms of public suffix list, where both would have their own set of options (one public and one internal), and the first one would translate into the other somehow. I'm just a bit worried about the added complexity here, both for development and users.

Do you have suggestions about how this could look like?

* 'allowIcann' set to 'false' will ignore the ICANN section of the list.
* 'allowPrivate' set to 'false' will ignore the PRIVATE section of the list.
@remusao remusao closed this Jan 10, 2019
@remusao remusao deleted the icann-vs-private branch January 10, 2019 20:00
Sign up for free to join this conversation on GitHub. Already have an account? Sign in to comment
Labels
None yet
Projects
None yet
Development

Successfully merging this pull request may close these issues.

None yet

2 participants